Sample records for multiple regression statistical

  1. Beyond Multiple Regression: Using Commonality Analysis to Better Understand R[superscript 2] Results

    ERIC Educational Resources Information Center

    Warne, Russell T.

    2011-01-01

    Multiple regression is one of the most common statistical methods used in quantitative educational research. Despite the versatility and easy interpretability of multiple regression, it has some shortcomings in the detection of suppressor variables and for somewhat arbitrarily assigning values to the structure coefficients of correlated…

  2. Advanced Statistics for Exotic Animal Practitioners.

    PubMed

    Hodsoll, John; Hellier, Jennifer M; Ryan, Elizabeth G

    2017-09-01

    Correlation and regression assess the association between 2 or more variables. This article reviews the core knowledge needed to understand these analyses, moving from visual analysis in scatter plots through correlation, simple and multiple linear regression, and logistic regression. Correlation estimates the strength and direction of a relationship between 2 variables. Regression can be considered more general and quantifies the numerical relationships between an outcome and 1 or multiple variables in terms of a best-fit line, allowing predictions to be made. Each technique is discussed with examples and the statistical assumptions underlying their correct application. Copyright © 2017 Elsevier Inc. All rights reserved.

  3. General Nature of Multicollinearity in Multiple Regression Analysis.

    ERIC Educational Resources Information Center

    Liu, Richard

    1981-01-01

    Discusses multiple regression, a very popular statistical technique in the field of education. One of the basic assumptions in regression analysis requires that independent variables in the equation should not be highly correlated. The problem of multicollinearity and some of the solutions to it are discussed. (Author)
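
    One common way to quantify the multicollinearity this record describes is the variance inflation factor (VIF). The sketch below is not taken from the article; it assumes the simplest case of two predictors, where the VIF of either predictor reduces to 1 / (1 - r^2), with r the Pearson correlation between them. The function names and the data are hypothetical.

```python
import math

def pearson_r(x, y):
    """Pearson correlation between two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

def vif_two_predictors(x1, x2):
    """VIF shared by both predictors in a two-predictor model: 1 / (1 - r^2)."""
    r = pearson_r(x1, x2)
    return 1.0 / (1.0 - r * r)

# Hypothetical data: x2 is nearly a linear function of x1,
# so the VIF comes out far above the value of 1 seen for uncorrelated predictors.
x1 = [1.0, 2.0, 3.0, 4.0, 5.0]
x2 = [2.1, 3.9, 6.2, 7.8, 10.1]
print(vif_two_predictors(x1, x2))
```

    A VIF of 1 means the predictors are uncorrelated; a frequently quoted rule of thumb treats values above about 10 as a warning sign, though that threshold is a convention rather than a test.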

  4. RAWS II: A MULTIPLE REGRESSION ANALYSIS PROGRAM,

    DTIC Science & Technology

    This memorandum gives instructions for the use and operation of a revised version of RAWS, a multiple regression analysis program. The program...of preprocessed data, the directed retention of variable, listing of the matrix of the normal equations and its inverse, and the bypassing of the regression analysis to provide the input variable statistics only. (Author)

  5. Using the Coefficient of Determination "R"[superscript 2] to Test the Significance of Multiple Linear Regression

    ERIC Educational Resources Information Center

    Quinino, Roberto C.; Reis, Edna A.; Bessegato, Lupercio F.

    2013-01-01

    This article proposes the use of the coefficient of determination as a statistic for hypothesis testing in multiple linear regression based on distributions acquired by beta sampling. (Contains 3 figures.)
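
    For orientation, the standard textbook link between the coefficient of determination and the overall F-test can be written as follows. This is a sketch under the usual assumptions (normally distributed errors, an intercept plus k predictors, n observations, and a null hypothesis that all slopes are zero); the symbols n and k are introduced here for illustration, and the article's own derivation may differ.

```latex
F \;=\; \frac{R^{2}/k}{\left(1 - R^{2}\right)/\left(n - k - 1\right)}
\;\sim\; F_{k,\;n-k-1},
\qquad\text{equivalently}\qquad
R^{2} \;\sim\; \mathrm{Beta}\!\left(\frac{k}{2},\; \frac{n-k-1}{2}\right).
```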

  6. Advanced statistics: linear regression, part II: multiple linear regression.

    PubMed

    Marill, Keith A

    2004-01-01

    The applications of simple linear regression in medical research are limited, because in most situations, there are multiple relevant predictor variables. Univariate statistical techniques such as simple linear regression use a single predictor variable, and they often may be mathematically correct but clinically misleading. Multiple linear regression is a mathematical technique used to model the relationship between multiple independent predictor variables and a single dependent outcome variable. It is used in medical research to model observational data, as well as in diagnostic and therapeutic studies in which the outcome is dependent on more than one factor. Although the technique generally is limited to data that can be expressed with a linear function, it benefits from a well-developed mathematical framework that yields unique solutions and exact confidence intervals for regression coefficients. Building on Part I of this series, this article acquaints the reader with some of the important concepts in multiple regression analysis. These include multicollinearity, interaction effects, and an expansion of the discussion of inference testing, leverage, and variable transformations to multivariate models. Examples from the first article in this series are expanded on using a primarily graphic, rather than mathematical, approach. The importance of the relationships among the predictor variables and the dependence of the multivariate model coefficients on the choice of these variables are stressed. Finally, concepts in regression model building are discussed.
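
    The modeling step this abstract describes, several predictors and one outcome, can be sketched with the least-squares normal equations. This is an illustrative implementation using only the Python standard library, not code from the article; the function names and the data are hypothetical.

```python
def solve_linear_system(a, b):
    """Solve a @ x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    # Work on an augmented copy so the caller's matrices are not modified.
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        if abs(m[pivot][col]) < 1e-12:
            raise ValueError("singular system: predictors may be perfectly collinear")
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    # Back substitution on the upper-triangular system.
    x = [0.0] * n
    for i in range(n - 1, -1, -1):
        s = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - s) / m[i][i]
    return x

def fit_multiple_regression(rows, y):
    """Return [intercept, b1, ..., bk] that minimize the sum of squared errors."""
    design = [[1.0] + list(r) for r in rows]  # prepend the intercept column
    p = len(design[0])
    xtx = [[sum(d[i] * d[j] for d in design) for j in range(p)] for i in range(p)]
    xty = [sum(d[i] * yi for d, yi in zip(design, y)) for i in range(p)]
    return solve_linear_system(xtx, xty)

# Hypothetical data generated exactly from y = 1 + 2*x1 + 3*x2.
rows = [(0, 0), (1, 0), (0, 1), (1, 1), (2, 1)]
y = [1 + 2 * x1 + 3 * x2 for x1, x2 in rows]
coefs = fit_multiple_regression(rows, y)
print([round(c, 6) for c in coefs])
```

    The singular-system check is where perfect multicollinearity would surface: if one predictor is an exact linear combination of the others, the normal equations have no unique solution.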

  7. SOME STATISTICAL ISSUES RELATED TO MULTIPLE LINEAR REGRESSION MODELING OF BEACH BACTERIA CONCENTRATIONS

    EPA Science Inventory

    As a fast and effective technique, the multiple linear regression (MLR) method has been widely used in modeling and prediction of beach bacteria concentrations. Among previous works on this subject, however, several issues were insufficiently or inconsistently addressed. Those is...

  8. Statistical analysis of water-quality data containing multiple detection limits: S-language software for regression on order statistics

    USGS Publications Warehouse

    Lee, L.; Helsel, D.

    2005-01-01

    Trace contaminants in water, including metals and organics, often are measured at sufficiently low concentrations to be reported only as values below the instrument detection limit. Interpretation of these "less thans" is complicated when multiple detection limits occur. Statistical methods for multiply censored, or multiple-detection limit, datasets have been developed for medical and industrial statistics, and can be employed to estimate summary statistics or model the distributions of trace-level environmental data. We describe S-language-based software tools that perform robust linear regression on order statistics (ROS). The ROS method has been evaluated as one of the most reliable procedures for developing summary statistics of multiply censored data. It is applicable to any dataset that has 0 to 80% of its values censored. These tools are a part of a software library, or add-on package, for the R environment for statistical computing. This library can be used to generate ROS models and associated summary statistics, plot modeled distributions, and predict exceedance probabilities of water-quality standards. © 2005 Elsevier Ltd. All rights reserved.

  9. Applied Statistics: From Bivariate through Multivariate Techniques [with CD-ROM]

    ERIC Educational Resources Information Center

    Warner, Rebecca M.

    2007-01-01

    This book provides a clear introduction to widely used topics in bivariate and multivariate statistics, including multiple regression, discriminant analysis, MANOVA, factor analysis, and binary logistic regression. The approach is applied and does not require formal mathematics; equations are accompanied by verbal explanations. Students are asked…

  10. INTRODUCTION TO A COMBINED MULTIPLE LINEAR REGRESSION AND ARMA MODELING APPROACH FOR BEACH BACTERIA PREDICTION

    EPA Science Inventory

    Due to the complexity of the processes contributing to beach bacteria concentrations, many researchers rely on statistical modeling, among which multiple linear regression (MLR) modeling is most widely used. Despite its ease of use and interpretation, there may be time dependence...

  11. Analysis and Interpretation of Findings Using Multiple Regression Techniques

    ERIC Educational Resources Information Center

    Hoyt, William T.; Leierer, Stephen; Millington, Michael J.

    2006-01-01

    Multiple regression and correlation (MRC) methods form a flexible family of statistical techniques that can address a wide variety of different types of research questions of interest to rehabilitation professionals. In this article, we review basic concepts and terms, with an emphasis on interpretation of findings relevant to research questions…

  12. Multiple Regression: A Leisurely Primer.

    ERIC Educational Resources Information Center

    Daniel, Larry G.; Onwuegbuzie, Anthony J.

    Multiple regression is a useful statistical technique when the researcher is considering situations in which variables of interest are theorized to be multiply caused. It may also be useful in those situations in which the researcher is interested in studies of predictability of phenomena of interest. This paper provides an introduction to…

  13. The Development and Demonstration of Multiple Regression Models for Operant Conditioning Questions.

    ERIC Educational Resources Information Center

    Fanning, Fred; Newman, Isadore

    Based on the assumption that inferential statistics can make the operant conditioner more sensitive to possible significant relationships, regression models were developed to test the statistical significance between slopes and Y intercepts of the experimental and control group subjects. These results were then compared to the traditional operant…

  14. Assistive Technologies for Second-Year Statistics Students Who Are Blind

    ERIC Educational Resources Information Center

    Erhardt, Robert J.; Shuman, Michael P.

    2015-01-01

    At Wake Forest University, a student who is blind enrolled in a second course in statistics. The course covered simple and multiple regression, model diagnostics, model selection, data visualization, and elementary logistic regression. These topics required that the student both interpret and produce three sets of materials: mathematical writing,…

  15. An improved multiple linear regression and data analysis computer program package

    NASA Technical Reports Server (NTRS)

    Sidik, S. M.

    1972-01-01

    NEWRAP, an improved version of a previous multiple linear regression program called RAPIER, CREDUC, and CRSPLT, allows for a complete regression analysis including cross plots of the independent and dependent variables, correlation coefficients, regression coefficients, analysis of variance tables, t-statistics and their probability levels, rejection of independent variables, plots of residuals against the independent and dependent variables, and a canonical reduction of quadratic response functions useful in optimum seeking experimentation. A major improvement over RAPIER is that all regression calculations are done in double precision arithmetic.

  16. Testing Mediation Using Multiple Regression and Structural Equation Modeling Analyses in Secondary Data

    ERIC Educational Resources Information Center

    Li, Spencer D.

    2011-01-01

    Mediation analysis in child and adolescent development research is possible using large secondary data sets. This article provides an overview of two statistical methods commonly used to test mediated effects in secondary analysis: multiple regression and structural equation modeling (SEM). Two empirical studies are presented to illustrate the…

  17. Conjoint Analysis: A Study of the Effects of Using Person Variables.

    ERIC Educational Resources Information Center

    Fraas, John W.; Newman, Isadore

    Three statistical techniques--conjoint analysis, a multiple linear regression model, and a multiple linear regression model with a surrogate person variable--were used to estimate the relative importance of five university attributes for students in the process of selecting a college. The five attributes include: availability and variety of…

  18. Statistical experiments using the multiple regression research for prediction of proper hardness in areas of phosphorus cast-iron brake shoes manufacturing

    NASA Astrophysics Data System (ADS)

    Kiss, I.; Cioată, V. G.; Ratiu, S. A.; Rackov, M.; Penčić, M.

    2018-01-01

    Multivariate research is important in the manufacturing of cast-iron brake shoes because many variables interact simultaneously. This article focuses on a multiple linear regression model that relates hardness to the chemical composition of the phosphorus cast irons used for brake shoes, bearing in mind that the regression coefficients illustrate the separate contributions of each independent variable to predicting the dependent variable. To establish the multiple correlations between the hardness of the cast-iron brake shoes and their chemical composition, several regression equations have been proposed. A mathematical solution is sought that can determine the optimum chemical composition for the desired hardness values. Starting from these premises, two new statistical experiments are carried out on the values of phosphorus [P], manganese [Mn], and silicon [Si]. The regression equations that describe the mathematical dependency between these elements and the hardness are then determined. As a result, several correlation charts are presented.

  19. Multiple linear regression analysis

    NASA Technical Reports Server (NTRS)

    Edwards, T. R.

    1980-01-01

    Program rapidly selects best-suited set of coefficients. User supplies only vectors of independent and dependent data and specifies confidence level required. Program uses stepwise statistical procedure for relating minimal set of variables to set of observations; final regression contains only most statistically significant coefficients. Program is written in FORTRAN IV for batch execution and has been implemented on NOVA 1200.

  20. A Spreadsheet Tool for Learning the Multiple Regression F-Test, T-Tests, and Multicollinearity

    ERIC Educational Resources Information Center

    Martin, David

    2008-01-01

    This note presents a spreadsheet tool that allows teachers the opportunity to guide students towards answering on their own questions related to the multiple regression F-test, the t-tests, and multicollinearity. The note demonstrates approaches for using the spreadsheet that might be appropriate for three different levels of statistics classes,…

  1. Predicting Final GPA of Graduate School Students: Comparing Artificial Neural Networking and Simultaneous Multiple Regression

    ERIC Educational Resources Information Center

    Anderson, Joan L.

    2006-01-01

    Data from graduate student applications at a large Western university were used to determine which factors were the best predictors of success in graduate school, as defined by cumulative graduate grade point average. Two statistical models were employed and compared: artificial neural networking and simultaneous multiple regression. Both models…

  2. Statistical power analyses using G*Power 3.1: tests for correlation and regression analyses.

    PubMed

    Faul, Franz; Erdfelder, Edgar; Buchner, Axel; Lang, Albert-Georg

    2009-11-01

    G*Power is a free power analysis program for a variety of statistical tests. We present extensions and improvements of the version introduced by Faul, Erdfelder, Lang, and Buchner (2007) in the domain of correlation and regression analyses. In the new version, we have added procedures to analyze the power of tests based on (1) single-sample tetrachoric correlations, (2) comparisons of dependent correlations, (3) bivariate linear regression, (4) multiple linear regression based on the random predictor model, (5) logistic regression, and (6) Poisson regression. We describe these new features and provide a brief introduction to their scope and handling.

  3. Regression Commonality Analysis: A Technique for Quantitative Theory Building

    ERIC Educational Resources Information Center

    Nimon, Kim; Reio, Thomas G., Jr.

    2011-01-01

    When it comes to multiple linear regression analysis (MLR), it is common for social and behavioral science researchers to rely predominately on beta weights when evaluating how predictors contribute to a regression model. Presenting an underutilized statistical technique, this article describes how organizational researchers can use commonality…

  4. Interpretation of commonly used statistical regression models.

    PubMed

    Kasza, Jessica; Wolfe, Rory

    2014-01-01

    A review of some regression models commonly used in respiratory health applications is provided in this article. Simple linear regression, multiple linear regression, logistic regression and ordinal logistic regression are considered. The focus of this article is on the interpretation of the regression coefficients of each model, which are illustrated through the application of these models to a respiratory health research study. © 2013 The Authors. Respirology © 2013 Asian Pacific Society of Respirology.

  5. Tools to Support Interpreting Multiple Regression in the Face of Multicollinearity

    PubMed Central

    Kraha, Amanda; Turner, Heather; Nimon, Kim; Zientek, Linda Reichwein; Henson, Robin K.

    2012-01-01

    While multicollinearity may increase the difficulty of interpreting multiple regression (MR) results, it should not cause undue problems for the knowledgeable researcher. In the current paper, we argue that rather than using one technique to investigate regression results, researchers should consider multiple indices to understand the contributions that predictors make not only to a regression model, but to each other as well. Some of the techniques to interpret MR effects include, but are not limited to, correlation coefficients, beta weights, structure coefficients, all possible subsets regression, commonality coefficients, dominance weights, and relative importance weights. This article will review a set of techniques to interpret MR effects, identify the elements of the data on which the methods focus, and identify statistical software to support such analyses. PMID:22457655

  6. Tools to support interpreting multiple regression in the face of multicollinearity.

    PubMed

    Kraha, Amanda; Turner, Heather; Nimon, Kim; Zientek, Linda Reichwein; Henson, Robin K

    2012-01-01

    While multicollinearity may increase the difficulty of interpreting multiple regression (MR) results, it should not cause undue problems for the knowledgeable researcher. In the current paper, we argue that rather than using one technique to investigate regression results, researchers should consider multiple indices to understand the contributions that predictors make not only to a regression model, but to each other as well. Some of the techniques to interpret MR effects include, but are not limited to, correlation coefficients, beta weights, structure coefficients, all possible subsets regression, commonality coefficients, dominance weights, and relative importance weights. This article will review a set of techniques to interpret MR effects, identify the elements of the data on which the methods focus, and identify statistical software to support such analyses.

  7. [Quantitative structure-gas chromatographic retention relationship of polycyclic aromatic sulfur heterocycles using molecular electronegativity-distance vector].

    PubMed

    Li, Zhenghua; Cheng, Fansheng; Xia, Zhining

    2011-01-01

    The chemical structures of 114 polycyclic aromatic sulfur heterocycles (PASHs) have been studied using the molecular electronegativity-distance vector (MEDV). The linear relationships between the gas chromatographic retention index and the MEDV have been established by a multiple linear regression (MLR) model. Variable selection by stepwise multiple regression (SMR), together with leave-one-out cross-validation of predictive ability, showed that the optimized model, with a correlation coefficient (R) of 0.9947 and a cross-validated correlation coefficient (Rcv) of 0.9940, had the best statistical quality. Furthermore, when the 114 PASHs were divided into calibration and test sets in a ratio of 2:1, the statistical analysis showed that the models possess almost equal statistical quality, very similar regression coefficients, and good robustness. The quantitative structure-retention relationship (QSRR) model established may provide a convenient and powerful method for predicting the gas chromatographic retention of PASHs.
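
    Leave-one-out cross-validation, which this abstract uses to appraise predictive ability, can be sketched as follows. The example assumes a single predictor for brevity and reports a cross-validated Q^2 = 1 - PRESS / SS_total, a common summary that is not necessarily the Rcv statistic the authors computed. The function names and the data are hypothetical.

```python
def fit_line(x, y):
    """Ordinary least-squares fit of y = intercept + slope * x."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((a - mx) ** 2 for a in x)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    slope = sxy / sxx
    return my - slope * mx, slope

def loo_q2(x, y):
    """Cross-validated Q^2: refit without each point, then predict that point."""
    press = 0.0  # predicted residual error sum of squares
    for i in range(len(x)):
        x_train = x[:i] + x[i + 1:]
        y_train = y[:i] + y[i + 1:]
        intercept, slope = fit_line(x_train, y_train)
        press += (y[i] - (intercept + slope * x[i])) ** 2
    my = sum(y) / len(y)
    ss_total = sum((b - my) ** 2 for b in y)
    return 1.0 - press / ss_total

# Hypothetical, slightly noisy data.
x = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
y = [2.1, 3.9, 6.2, 7.8, 10.1, 11.9]
print(loo_q2(x, y))
```

    Because every prediction is made for a point the model never saw, Q^2 cannot exceed the ordinary R^2 by overfitting, which is why a small gap between the two is read as a sign of robustness.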

  8. Introductory Statistics in the Garden

    ERIC Educational Resources Information Center

    Wagaman, John C.

    2017-01-01

    This article describes four semesters of introductory statistics courses that incorporate service learning and gardening into the curriculum with applications of the binomial distribution, least squares regression and hypothesis testing. The activities span multiple semesters and are iterative in nature.

  9. A Quantile Regression Approach to Understanding the Relations Between Morphological Awareness, Vocabulary, and Reading Comprehension in Adult Basic Education Students

    PubMed Central

    Tighe, Elizabeth L.; Schatschneider, Christopher

    2015-01-01

    The purpose of this study was to investigate the joint and unique contributions of morphological awareness and vocabulary knowledge at five reading comprehension levels in Adult Basic Education (ABE) students. We introduce the statistical technique of multiple quantile regression, which enabled us to assess the predictive utility of morphological awareness and vocabulary knowledge at multiple points (quantiles) along the continuous distribution of reading comprehension. To demonstrate the efficacy of our multiple quantile regression analysis, we compared and contrasted our results with a traditional multiple regression analytic approach. Our results indicated that morphological awareness and vocabulary knowledge accounted for a large portion of the variance (82-95%) in reading comprehension skills across all quantiles. Morphological awareness exhibited the greatest unique predictive ability at lower levels of reading comprehension whereas vocabulary knowledge exhibited the greatest unique predictive ability at higher levels of reading comprehension. These results indicate the utility of using multiple quantile regression to assess trajectories of component skills across multiple levels of reading comprehension. The implications of our findings for ABE programs are discussed. PMID:25351773
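
    Quantile regression differs from ordinary least squares in the loss it minimizes: the asymmetric "pinball" (check) loss rather than squared error. The sketch below is not the authors' analysis; it shows only the intercept-only case, where minimizing that loss over a constant recovers a sample quantile. The function names and the data are hypothetical.

```python
def pinball_loss(residual, tau):
    """Check loss: weight tau on positive residuals, (1 - tau) on negative ones."""
    return tau * residual if residual >= 0 else (tau - 1) * residual

def best_constant(y, tau):
    """Search the observed values for the constant with the lowest total pinball loss."""
    return min(y, key=lambda c: sum(pinball_loss(v - c, tau) for v in y))

# Hypothetical data with one extreme value.
y = [1, 2, 3, 4, 100]
print(best_constant(y, 0.5))   # tau = 0.5 targets the median
print(best_constant(y, 0.25))  # a lower quantile
```

    Fitting a full quantile regression replaces the constant with a linear function of the predictors and repeats the minimization for each quantile of interest, which is what lets the predictors' contributions differ across the outcome distribution.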

  10. Advanced statistics: linear regression, part I: simple linear regression.

    PubMed

    Marill, Keith A

    2004-01-01

    Simple linear regression is a mathematical technique used to model the relationship between a single independent predictor variable and a single dependent outcome variable. In this, the first of a two-part series exploring concepts in linear regression analysis, the four fundamental assumptions and the mechanics of simple linear regression are reviewed. The most common technique used to derive the regression line, the method of least squares, is described. The reader will be acquainted with other important concepts in simple linear regression, including: variable transformations, dummy variables, relationship to inference testing, and leverage. Simplified clinical examples with small datasets and graphic models are used to illustrate the points. This will provide a foundation for the second article in this series: a discussion of multiple linear regression, in which there are multiple predictor variables.

  11. Using regression equations built from summary data in the psychological assessment of the individual case: extension to multiple regression.

    PubMed

    Crawford, John R; Garthwaite, Paul H; Denham, Annie K; Chelune, Gordon J

    2012-12-01

    Regression equations have many useful roles in psychological assessment. Moreover, there is a large reservoir of published data that could be used to build regression equations; these equations could then be employed to test a wide variety of hypotheses concerning the functioning of individual cases. This resource is currently underused because (a) not all psychologists are aware that regression equations can be built not only from raw data but also using only basic summary data for a sample, and (b) the computations involved are tedious and prone to error. In an attempt to overcome these barriers, Crawford and Garthwaite (2007) provided methods to build and apply simple linear regression models using summary statistics as data. In the present study, we extend this work to set out the steps required to build multiple regression models from sample summary statistics and the further steps required to compute the associated statistics for drawing inferences concerning an individual case. We also develop, describe, and make available a computer program that implements these methods. Although there are caveats associated with the use of the methods, these need to be balanced against pragmatic considerations and against the alternative of either entirely ignoring a pertinent data set or using it informally to provide a clinical "guesstimate." Upgraded versions of earlier programs for regression in the single case are also provided; these add the point and interval estimates of effect size developed in the present article.
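
    The idea of building a regression equation from summary data alone can be illustrated in the one-predictor case using textbook identities. This is a sketch only, not the authors' program or their multiple-regression extension; it assumes sample standard deviations computed with an n - 1 denominator, and the function names and numbers are hypothetical.

```python
import math

def regression_from_summary(mean_x, sd_x, mean_y, sd_y, r):
    """Return (intercept, slope) of the least-squares line from summary statistics."""
    slope = r * sd_y / sd_x
    intercept = mean_y - slope * mean_x
    return intercept, slope

def standard_error_of_estimate(sd_y, r, n):
    """Residual standard deviation with n - 2 degrees of freedom."""
    return sd_y * math.sqrt((1 - r * r) * (n - 1) / (n - 2))

# Hypothetical summary statistics for a predictor and an outcome.
intercept, slope = regression_from_summary(mean_x=100, sd_x=15, mean_y=50, sd_y=10, r=0.6)
predicted = intercept + slope * 115  # predicted outcome for a case scoring 115
print(intercept, slope, predicted)
print(standard_error_of_estimate(sd_y=10, r=0.6, n=102))
```

    Comparing an individual's observed score with the predicted value, scaled by the standard error of estimate, is the usual starting point for the single-case inferences the abstract mentions; the exact test statistics are set out in the article itself.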

  12. A Statistical Multimodel Ensemble Approach to Improving Long-Range Forecasting in Pakistan

    DTIC Science & Technology

    2012-03-01

    Impact of global warming on monsoon variability in Pakistan. J. Anim. Pl. Sci., 21, no. 1, 107–110. Gillies, S., T. Murphree, and D. Meyer, 2012...are generated by multiple regression models that relate globally distributed oceanic and atmospheric predictors to local predictands. The...generated by multiple regression models that relate globally distributed oceanic and atmospheric predictors to local predictands. The predictands are

  13. Correlation and simple linear regression.

    PubMed

    Eberly, Lynn E

    2007-01-01

    This chapter highlights important steps in using correlation and simple linear regression to address scientific questions about the association of two continuous variables with each other. These steps include estimation and inference, assessing model fit, the connection between regression and ANOVA, and study design. Examples in microbiology are used throughout. This chapter provides a framework that is helpful in understanding more complex statistical techniques, such as multiple linear regression, linear mixed effects models, logistic regression, and proportional hazards regression.

  14. Logistic Regression with Multiple Random Effects: A Simulation Study of Estimation Methods and Statistical Packages.

    PubMed

    Kim, Yoonsang; Choi, Young-Ku; Emery, Sherry

    2013-08-01

    Several statistical packages are capable of estimating generalized linear mixed models and these packages provide one or more of three estimation methods: penalized quasi-likelihood, Laplace, and Gauss-Hermite. Many studies have investigated these methods' performance for the mixed-effects logistic regression model. However, the authors focused on models with one or two random effects and assumed a simple covariance structure between them, which may not be realistic. When there are multiple correlated random effects in a model, the computation becomes intensive, and often an algorithm fails to converge. Moreover, in our analysis of smoking status and exposure to anti-tobacco advertisements, we have observed that when a model included multiple random effects, parameter estimates varied considerably from one statistical package to another even when using the same estimation method. This article presents a comprehensive review of the advantages and disadvantages of each estimation method. In addition, we compare the performances of the three methods across statistical packages via simulation, which involves two- and three-level logistic regression models with at least three correlated random effects. We apply our findings to a real dataset. Our results suggest that two packages, SAS GLIMMIX Laplace and SuperMix Gaussian quadrature, perform well in terms of accuracy, precision, convergence rates, and computing speed. We also discuss the strengths and weaknesses of the two packages in regard to sample sizes.

  15. Logistic Regression with Multiple Random Effects: A Simulation Study of Estimation Methods and Statistical Packages

    PubMed Central

    Kim, Yoonsang; Emery, Sherry

    2013-01-01

    Several statistical packages are capable of estimating generalized linear mixed models and these packages provide one or more of three estimation methods: penalized quasi-likelihood, Laplace, and Gauss-Hermite. Many studies have investigated these methods’ performance for the mixed-effects logistic regression model. However, the authors focused on models with one or two random effects and assumed a simple covariance structure between them, which may not be realistic. When there are multiple correlated random effects in a model, the computation becomes intensive, and often an algorithm fails to converge. Moreover, in our analysis of smoking status and exposure to anti-tobacco advertisements, we have observed that when a model included multiple random effects, parameter estimates varied considerably from one statistical package to another even when using the same estimation method. This article presents a comprehensive review of the advantages and disadvantages of each estimation method. In addition, we compare the performances of the three methods across statistical packages via simulation, which involves two- and three-level logistic regression models with at least three correlated random effects. We apply our findings to a real dataset. Our results suggest that two packages—SAS GLIMMIX Laplace and SuperMix Gaussian quadrature—perform well in terms of accuracy, precision, convergence rates, and computing speed. We also discuss the strengths and weaknesses of the two packages in regard to sample sizes. PMID:24288415

  16. Predicting recreational water quality advisories: A comparison of statistical methods

    USGS Publications Warehouse

    Brooks, Wesley R.; Corsi, Steven R.; Fienen, Michael N.; Carvin, Rebecca B.

    2016-01-01

    Epidemiological studies indicate that fecal indicator bacteria (FIB) in beach water are associated with illnesses among people having contact with the water. In order to mitigate public health impacts, many beaches are posted with an advisory when the concentration of FIB exceeds a beach action value. The most commonly used method of measuring FIB concentration takes 18–24 h before returning a result. In order to avoid the 24 h lag, it has become common to "nowcast" the FIB concentration using statistical regressions on environmental surrogate variables. Most commonly, nowcast models are estimated using ordinary least squares regression, but other regression methods from the statistical and machine learning literature are sometimes used. This study compares 14 regression methods across 7 Wisconsin beaches to identify which consistently produces the most accurate predictions. A random forest model is identified as the most accurate, followed by multiple regression fit using the adaptive LASSO.

  17. Transfer Student Success: Educationally Purposeful Activities Predictive of Undergraduate GPA

    ERIC Educational Resources Information Center

    Fauria, Renee M.; Fuller, Matthew B.

    2015-01-01

    Researchers evaluated the effects of Educationally Purposeful Activities (EPAs) on transfer and nontransfer students' cumulative GPAs. Hierarchical, linear, and multiple regression models yielded seven statistically significant educationally purposeful items that influenced undergraduate student GPAs. Statistically significant positive EPAs for…

  18. Epidemiologic programs for computers and calculators. A microcomputer program for multiple logistic regression by unconditional and conditional maximum likelihood methods.

    PubMed

    Campos-Filho, N; Franco, E L

    1989-02-01

    A frequent procedure in matched case-control studies is to report results from the multivariate unmatched analyses if they do not differ substantially from the ones obtained after conditioning on the matching variables. Although conceptually simple, this rule requires that an extensive series of logistic regression models be evaluated by both the conditional and unconditional maximum likelihood methods. Most computer programs for logistic regression employ only one maximum likelihood method, which requires that the analyses be performed in separate steps. This paper describes a Pascal microcomputer (IBM PC) program that performs multiple logistic regression by both maximum likelihood estimation methods, which obviates the need for switching between programs to obtain relative risk estimates from both matched and unmatched analyses. The program calculates most standard statistics and allows factoring of categorical or continuous variables by two distinct methods of contrast. A built-in, descriptive statistics option allows the user to inspect the distribution of cases and controls across categories of any given variable.

  19. Regression modeling of ground-water flow

    USGS Publications Warehouse

    Cooley, R.L.; Naff, R.L.

    1985-01-01

    Nonlinear multiple regression methods are developed to model and analyze groundwater flow systems. Complete descriptions of regression methodology as applied to groundwater flow models allow scientists and engineers engaged in flow modeling to apply the methods to a wide range of problems. Organization of the text proceeds from an introduction that discusses the general topic of groundwater flow modeling, to a review of basic statistics necessary to properly apply regression techniques, and then to the main topic: exposition and use of linear and nonlinear regression to model groundwater flow. Statistical procedures are given to analyze and use the regression models. A number of exercises and answers are included to exercise the student on nearly all the methods that are presented for modeling and statistical analysis. Three computer programs implement the more complex methods. These three are a general two-dimensional, steady-state regression model for flow in an anisotropic, heterogeneous porous medium, a program to calculate a measure of model nonlinearity with respect to the regression parameters, and a program to analyze model errors in computed dependent variables such as hydraulic head. (USGS)

  20. A Quantile Regression Approach to Understanding the Relations Among Morphological Awareness, Vocabulary, and Reading Comprehension in Adult Basic Education Students.

    PubMed

    Tighe, Elizabeth L; Schatschneider, Christopher

    2016-07-01

    The purpose of this study was to investigate the joint and unique contributions of morphological awareness and vocabulary knowledge at five reading comprehension levels in adult basic education (ABE) students. We introduce the statistical technique of multiple quantile regression, which enabled us to assess the predictive utility of morphological awareness and vocabulary knowledge at multiple points (quantiles) along the continuous distribution of reading comprehension. To demonstrate the efficacy of our multiple quantile regression analysis, we compared and contrasted our results with a traditional multiple regression analytic approach. Our results indicated that morphological awareness and vocabulary knowledge accounted for a large portion of the variance (82%-95%) in reading comprehension skills across all quantiles. Morphological awareness exhibited the greatest unique predictive ability at lower levels of reading comprehension whereas vocabulary knowledge exhibited the greatest unique predictive ability at higher levels of reading comprehension. These results indicate the utility of using multiple quantile regression to assess trajectories of component skills across multiple levels of reading comprehension. The implications of our findings for ABE programs are discussed. © Hammill Institute on Disabilities 2014.
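    As a brief illustration of what distinguishes quantile regression from ordinary least squares, the sketch below implements the check (pinball) loss that quantile regression minimises at a chosen quantile level. It is not the authors' analysis: the function names and the scores are hypothetical, and only a constant prediction is fitted, so it shows the loss function rather than a full multiple quantile regression.

```python
def pinball_loss(residual, tau):
    """Check (pinball) loss for one residual (observed - predicted) at level tau."""
    if not 0.0 < tau < 1.0:
        raise ValueError("tau must lie strictly between 0 and 1")
    # Positive residuals are weighted by tau, negative ones by (1 - tau).
    return tau * residual if residual >= 0 else (tau - 1.0) * residual


def total_loss(y, prediction, tau):
    """Sum of check losses when every observation gets the same constant prediction."""
    return sum(pinball_loss(value - prediction, tau) for value in y)


# Hypothetical reading-comprehension scores, for illustration only.
scores = [10, 20, 30, 40, 50]

# At tau = 0.5 the candidate with the smallest total loss is the median.
best = min(scores, key=lambda c: total_loss(scores, c, tau=0.5))
print(best)  # 30
```

    Changing `tau` shifts the minimiser toward lower or higher parts of the distribution, which is how separate coefficient estimates arise at each quantile.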

  1. Use of Thematic Mapper for water quality assessment

    NASA Technical Reports Server (NTRS)

    Horn, E. M.; Morrissey, L. A.

    1984-01-01

    The evaluation of simulated TM data obtained on an ER-2 aircraft at twenty-five predesignated sample sites for mapping water quality factors such as conductivity, pH, suspended solids, turbidity, temperature, and depth, is discussed. Using a multiple regression for the seven TM bands, an equation is developed for the suspended solids. TM bands 1, 2, 3, 4, and 6 are used with logarithm conductivity in a multiple regression. The assessment of regression equations for a high coefficient of determination (R-squared) and statistical significance is considered. Confidence intervals about the mean regression point are calculated in order to assess the robustness of the regressions used for mapping conductivity, turbidity, and suspended solids, and by regressing random subsamples of sites and comparing the resultant range of R-squared, cross validation is conducted.

  2. Functional Regression Models for Epistasis Analysis of Multiple Quantitative Traits.

    PubMed

    Zhang, Futao; Xie, Dan; Liang, Meimei; Xiong, Momiao

    2016-04-01

    To date, most genetic analyses of phenotypes have focused on analyzing single traits or analyzing each phenotype independently. However, joint epistasis analysis of multiple complementary traits will increase statistical power and improve our understanding of the complicated genetic structure of the complex diseases. Despite their importance in uncovering the genetic structure of complex traits, the statistical methods for identifying epistasis in multiple phenotypes remain fundamentally unexplored. To fill this gap, we formulate a test for interaction between two genes in multiple quantitative trait analysis as a multiple functional regression (MFRG) in which the genotype functions (genetic variant profiles) are defined as a function of the genomic position of the genetic variants. We use large-scale simulations to calculate Type I error rates for testing interaction between two genes with multiple phenotypes and to compare the power with multivariate pairwise interaction analysis and single trait interaction analysis by a single variate functional regression model. To further evaluate performance, the MFRG for epistasis analysis is applied to five phenotypes of exome sequence data from the NHLBI's Exome Sequencing Project (ESP) to detect pleiotropic epistasis. A total of 267 pairs of genes that formed a genetic interaction network showed significant evidence of epistasis influencing five traits. The results demonstrate that the joint interaction analysis of multiple phenotypes has a much higher power to detect interaction than the interaction analysis of a single trait and may open a new direction to fully uncovering the genetic structure of multiple phenotypes.

  3. Applications of statistics to medical science, II overview of statistical procedures for general use.

    PubMed

    Watanabe, Hiroshi

    2012-01-01

    Procedures of statistical analysis are reviewed to provide an overview of applications of statistics for general use. Topics that are dealt with are inference on a population, comparison of two populations with respect to means and probabilities, and multiple comparisons. This study is the second part of a series in which we survey medical statistics. Arguments related to statistical associations and regressions will be made in subsequent papers.

  4. Statistical Tutorial | Center for Cancer Research

    Cancer.gov

    Recent advances in cancer biology have resulted in the need for increased statistical analysis of research data. ST is designed as a follow-up to Statistical Analysis of Research Data (SARD) held in April 2018. The tutorial will apply the general principles of statistical analysis of research data including descriptive statistics, z- and t-tests of means and mean differences, simple and multiple linear regression, ANOVA tests, and Chi-Squared distribution.

  5. Multiple regression and Artificial Neural Network for long-term rainfall forecasting using large scale climate modes

    NASA Astrophysics Data System (ADS)

    Mekanik, F.; Imteaz, M. A.; Gato-Trinidad, S.; Elmahdi, A.

    2013-10-01

    In this study, the application of Artificial Neural Networks (ANN) and Multiple regression analysis (MR) to forecast long-term seasonal spring rainfall in Victoria, Australia was investigated using lagged El Nino Southern Oscillation (ENSO) and Indian Ocean Dipole (IOD) as potential predictors. The use of dual (combined lagged ENSO-IOD) input sets for calibrating and validating ANN and MR Models is proposed to investigate the simultaneous effect of past values of these two major climate modes on long-term spring rainfall prediction. The MR models that did not violate the limits of statistical significance and multicollinearity were selected for future spring rainfall forecast. The ANN was developed in the form of multilayer perceptron using Levenberg-Marquardt algorithm. Both MR and ANN modelling were assessed statistically using mean square error (MSE), mean absolute error (MAE), Pearson correlation (r) and Willmott index of agreement (d). The developed MR and ANN models were tested on out-of-sample test sets; the MR models showed very poor generalisation ability for east Victoria with correlation coefficients of -0.99 to -0.90 compared to ANN with correlation coefficients of 0.42-0.93; ANN models also showed better generalisation ability for central and west Victoria with correlation coefficients of 0.68-0.85 and 0.58-0.97 respectively. The ability of multiple regression models to forecast out-of-sample sets is compatible with ANN for Daylesford in central Victoria and Kaniva in west Victoria (r = 0.92 and 0.67 respectively). The errors of the testing sets for ANN models are generally lower compared to multiple regression models. The statistical analysis suggests the potential of ANN over MR models for rainfall forecasting using large scale climate modes.
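    The four assessment statistics named in this abstract can be computed directly from paired observed and predicted values. The following is a minimal sketch, not the authors' code: the function name and the rainfall values are hypothetical, and the index of agreement is assumed to be Willmott's original (1981) form, d = 1 - Σ(P - O)² / Σ(|P - Ō| + |O - Ō|)².

```python
from math import sqrt


def forecast_scores(observed, predicted):
    """Return MSE, MAE, Pearson r and Willmott's d for paired series."""
    if len(observed) != len(predicted) or len(observed) < 2:
        raise ValueError("need two series of equal length with at least 2 values")
    n = len(observed)
    pairs = list(zip(observed, predicted))
    o_mean = sum(observed) / n
    p_mean = sum(predicted) / n

    sq_err = sum((p - o) ** 2 for o, p in pairs)
    mse = sq_err / n
    mae = sum(abs(p - o) for o, p in pairs) / n

    # Pearson correlation (undefined if either series is constant).
    cov = sum((o - o_mean) * (p - p_mean) for o, p in pairs)
    o_ss = sum((o - o_mean) ** 2 for o in observed)
    p_ss = sum((p - p_mean) ** 2 for p in predicted)
    r = cov / sqrt(o_ss * p_ss)

    # Willmott's d: 1 minus squared error over "potential error" about the observed mean.
    potential = sum((abs(p - o_mean) + abs(o - o_mean)) ** 2 for o, p in pairs)
    d = 1.0 - sq_err / potential
    return {"mse": mse, "mae": mae, "r": r, "d": d}


# Hypothetical seasonal rainfall totals (mm), for illustration only.
observed = [120.0, 95.0, 140.0, 80.0, 110.0]
predicted = [110.0, 100.0, 130.0, 90.0, 115.0]
print(forecast_scores(observed, predicted))
```

    A strongly negative r on a test set, as reported here for the MR models in east Victoria, means the predictions move in the opposite direction to the observations.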

  6. Statistical Prediction in Proprietary Rehabilitation.

    ERIC Educational Resources Information Center

    Johnson, Kurt L.; And Others

    1987-01-01

    Applied statistical methods to predict case expenditures for low back pain rehabilitation cases in proprietary rehabilitation. Extracted predictor variables from case records of 175 workers compensation claimants with some degree of permanent disability due to back injury. Performed several multiple regression analyses resulting in a formula that…

  7. Advances in Testing the Statistical Significance of Mediation Effects

    ERIC Educational Resources Information Center

    Mallinckrodt, Brent; Abraham, W. Todd; Wei, Meifen; Russell, Daniel W.

    2006-01-01

    P. A. Frazier, A. P. Tix, and K. E. Barron (2004) highlighted a normal theory method popularized by R. M. Baron and D. A. Kenny (1986) for testing the statistical significance of indirect effects (i.e., mediator variables) in multiple regression contexts. However, simulation studies suggest that this method lacks statistical power relative to some…

  8. Identifying the Factors That Influence Change in SEBD Using Logistic Regression Analysis

    ERIC Educational Resources Information Center

    Camilleri, Liberato; Cefai, Carmel

    2013-01-01

    Multiple linear regression and ANOVA models are widely used in applications since they provide effective statistical tools for assessing the relationship between a continuous dependent variable and several predictors. However these models rely heavily on linearity and normality assumptions and they do not accommodate categorical dependent…

  9. A Constrained Linear Estimator for Multiple Regression

    ERIC Educational Resources Information Center

    Davis-Stober, Clintin P.; Dana, Jason; Budescu, David V.

    2010-01-01

    "Improper linear models" (see Dawes, Am. Psychol. 34:571-582, "1979"), such as equal weighting, have garnered interest as alternatives to standard regression models. We analyze the general circumstances under which these models perform well by recasting a class of "improper" linear models as "proper" statistical models with a single predictor. We…

  10. On using summary statistics from an external calibration sample to correct for covariate measurement error.

    PubMed

    Guo, Ying; Little, Roderick J; McConnell, Daniel S

    2012-01-01

    Covariate measurement error is common in epidemiologic studies. Current methods for correcting measurement error with information from external calibration samples are insufficient to provide valid adjusted inferences. We consider the problem of estimating the regression of an outcome Y on covariates X and Z, where Y and Z are observed, X is unobserved, but a variable W that measures X with error is observed. Information about measurement error is provided in an external calibration sample where data on X and W (but not Y and Z) are recorded. We describe a method that uses summary statistics from the calibration sample to create multiple imputations of the missing values of X in the regression sample, so that the regression coefficients of Y on X and Z and associated standard errors can be estimated using simple multiple imputation combining rules, yielding valid statistical inferences under the assumption of a multivariate normal distribution. The proposed method is shown by simulation to provide better inferences than existing methods, namely the naive method, classical calibration, and regression calibration, particularly for correction for bias and achieving nominal confidence levels. We also illustrate our method with an example using linear regression to examine the relation between serum reproductive hormone concentrations and bone mineral density loss in midlife women in the Michigan Bone Health and Metabolism Study. Existing methods fail to adjust appropriately for bias due to measurement error in the regression setting, particularly when measurement error is substantial. The proposed method corrects this deficiency.
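    The "simple multiple imputation combining rules" mentioned above are usually taken to be Rubin's rules. The sketch below assumes that reading and pools one regression coefficient across imputed data sets; the function name and the numbers are hypothetical, and the step that generates the imputations from calibration-sample summary statistics is not shown.

```python
from statistics import mean, variance


def pool_estimates(estimates, variances):
    """Combine one coefficient across m imputed data sets using Rubin's rules.

    estimates: the coefficient estimated in each imputed data set
    variances: the squared standard error of that coefficient in each data set
    Returns (pooled estimate, total variance).
    """
    m = len(estimates)
    if m < 2 or len(variances) != m:
        raise ValueError("need at least two imputations and matching lengths")
    pooled = mean(estimates)          # average of the point estimates
    within = mean(variances)          # average within-imputation variance
    between = variance(estimates)     # sample variance across imputations (divisor m - 1)
    total = within + (1 + 1 / m) * between
    return pooled, total


# Hypothetical coefficients and squared standard errors from m = 3 imputations.
pooled, total = pool_estimates([1.0, 2.0, 3.0], [0.5, 0.5, 0.5])
print(pooled, total ** 0.5)  # pooled estimate and its standard error
```

    The between-imputation term is what carries the extra uncertainty due to the unobserved X into the final standard error.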

  11. STATLIB: NSWC Library of Statistical Programs and Subroutines

    DTIC Science & Technology

    1989-08-01

    Uncorrelated Weighted Polynomial Regression 41 .WEPORC Correlated Weighted Polynomial Regression 45 MROP Multiple Regression Using Orthogonal Polynomials ...could not and should not be converted [NSWC TR 89-97] to the new general purpose computer (the current CDC 995). Some were designed to compute...personal computers. They are referred to as SPSSPC+, BMDPC, and SASPC and in general are less comprehensive than their mainframe counterparts. The basic

  12. Determination of osteoporosis risk factors using a multiple logistic regression model in postmenopausal Turkish women.

    PubMed

    Akkus, Zeki; Camdeviren, Handan; Celik, Fatma; Gur, Ali; Nas, Kemal

    2005-09-01

    To determine the risk factors of osteoporosis using a multiple binary logistic regression method and to assess the risk variables for osteoporosis, which is a major and growing health problem in many countries. We presented a case-control study, consisting of 126 postmenopausal healthy women as the control group and 225 postmenopausal osteoporotic women as the case group. The study was carried out in the Department of Physical Medicine and Rehabilitation, Dicle University, Diyarbakir, Turkey between 1999-2002. The data from the 351 participants were collected using a standard questionnaire that contains 43 variables. A multiple logistic regression model was then used to evaluate the data and to find the best regression model. We correctly classified 80.1% (281/351) of the participants using the regression model. Furthermore, the specificity value of the model was 67% (84/126) of the control group while the sensitivity value was 88% (197/225) of the case group. We found the distribution of residual values standardized for final model to be exponential using the Kolmogorov-Smirnov test (p=0.193). The receiver operating characteristic curve was found successful to predict patients with risk for osteoporosis. This study suggests that low levels of dietary calcium intake, physical activity, education, and longer duration of menopause are independent predictors of the risk of low bone density in our population. Adequate dietary calcium intake in combination with maintaining a daily physical activity, increasing educational level, decreasing birth rate, and duration of breast-feeding may contribute to healthy bones and play a role in practical prevention of osteoporosis in Southeast Anatolia. In addition, the findings of the present study indicate that the use of a multivariate statistical method such as multiple logistic regression in osteoporosis, which may be influenced by many variables, is better than univariate statistical evaluation.
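    The classification figures in this abstract follow from the reported counts. A small worked check is sketched below; the function name is illustrative, and the true/false positive and negative counts are derived from the 197/225 and 84/126 fractions given above.

```python
def classification_summary(tp, fn, tn, fp):
    """Return (sensitivity, specificity, accuracy) as fractions."""
    sensitivity = tp / (tp + fn)              # cases correctly identified
    specificity = tn / (tn + fp)              # controls correctly identified
    accuracy = (tp + tn) / (tp + fn + tn + fp)
    return sensitivity, specificity, accuracy


# 197 of 225 cases and 84 of 126 controls were classified correctly.
sens, spec, acc = classification_summary(tp=197, fn=225 - 197, tn=84, fp=126 - 84)
print(f"sensitivity={sens:.3f} specificity={spec:.3f} accuracy={acc:.3f}")
# sensitivity=0.876 specificity=0.667 accuracy=0.801
```

    The overall figure of 281/351 is the sum of the two correctly classified groups (197 + 84) over all participants.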

  13. Using Multilevel Modeling in Language Assessment Research: A Conceptual Introduction

    ERIC Educational Resources Information Center

    Barkaoui, Khaled

    2013-01-01

    This article critiques traditional single-level statistical approaches (e.g., multiple regression analysis) to examining relationships between language test scores and variables in the assessment setting. It highlights the conceptual, methodological, and statistical problems associated with these techniques in dealing with multilevel or nested…

  14. Modelling nitrate pollution pressure using a multivariate statistical approach: the case of Kinshasa groundwater body, Democratic Republic of Congo

    NASA Astrophysics Data System (ADS)

    Mfumu Kihumba, Antoine; Ndembo Longo, Jean; Vanclooster, Marnik

    2016-03-01

    A multivariate statistical modelling approach was applied to explain the anthropogenic pressure of nitrate pollution on the Kinshasa groundwater body (Democratic Republic of Congo). Multiple regression and regression tree models were compared and used to identify major environmental factors that control the groundwater nitrate concentration in this region. The analyses were made in terms of physical attributes related to the topography, land use, geology and hydrogeology in the capture zone of different groundwater sampling stations. For the nitrate data, groundwater datasets from two different surveys were used. The statistical models identified the topography, the residential area, the service land (cemetery), and the surface-water land-use classes as major factors explaining nitrate occurrence in the groundwater. Also, groundwater nitrate pollution depends not on one single factor but on the combined influence of factors representing nitrogen loading sources and aquifer susceptibility characteristics. The groundwater nitrate pressure was better predicted with the regression tree model than with the multiple regression model. Furthermore, the results elucidated the sensitivity of the model performance towards the method of delineation of the capture zones. For pollution modelling at the monitoring points, therefore, it is better to identify capture-zone shapes based on a conceptual hydrogeological model rather than to adopt arbitrary circular capture zones.

  15. [Application of SAS macro to evaluate multiplicative and additive interaction in logistic and Cox regression in clinical practices].

    PubMed

    Nie, Z Q; Ou, Y Q; Zhuang, J; Qu, Y J; Mai, J Z; Chen, J M; Liu, X Q

    2016-05-01

    Conditional and unconditional logistic regression are commonly used in case-control studies, whereas the Cox proportional hazards model is often used in survival data analysis. Most of the literature refers only to main-effect models; however, generalized linear models differ from general linear models, and interaction comprises both multiplicative and additive interaction. The former has only statistical significance, whereas the latter has biological significance. In this paper, macros were written in SAS 9.4 to calculate the contrast ratio, the attributable proportion due to interaction, and the synergy index alongside the interaction terms of logistic and Cox regression; Wald, delta, and profile likelihood confidence intervals were used to evaluate additive interaction. The macros are intended as a reference for big data analysis in clinical epidemiology and for the analysis of genetic multiplicative and additive interactions.
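    The three additive-interaction measures named above have standard closed forms in terms of relative risks (or odds or hazard ratios used as approximations). The sketch below is in Python rather than SAS and is not the authors' macro. It assumes the abstract's "contrast ratio" is the interaction contrast ratio, also known as the relative excess risk due to interaction (RERI). The function name and input values are hypothetical, and the confidence-interval methods are not shown.

```python
def additive_interaction(rr10, rr01, rr11):
    """Additive-interaction measures for two binary exposures A and B.

    rr10: relative risk for A only, rr01: for B only, rr11: for both,
    each relative to the group with neither exposure.
    Returns (RERI, attributable proportion, synergy index).
    """
    reri = rr11 - rr10 - rr01 + 1.0
    ap = reri / rr11
    excess_separate = (rr10 - 1.0) + (rr01 - 1.0)
    # The synergy index is undefined when the separate excess risks sum to zero.
    synergy = (rr11 - 1.0) / excess_separate if excess_separate != 0 else float("nan")
    return reri, ap, synergy


# Hypothetical relative risks, for illustration only.
reri, ap, s = additive_interaction(rr10=2.0, rr01=3.0, rr11=6.0)
print(reri, ap, s)
```

    Under exact additivity (for example rr11 = 4.0 with the same single-exposure risks), RERI and the attributable proportion are 0 and the synergy index is 1.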

  16. Multiple regression for physiological data analysis: the problem of multicollinearity.

    PubMed

    Slinker, B K; Glantz, S A

    1985-07-01

    Multiple linear regression, in which several predictor variables are related to a response variable, is a powerful statistical tool for gaining quantitative insight into complex in vivo physiological systems. For these insights to be correct, all predictor variables must be uncorrelated. However, in many physiological experiments the predictor variables cannot be precisely controlled and thus change in parallel (i.e., they are highly correlated). There is a redundancy of information about the response, a situation called multicollinearity, that leads to numerical problems in estimating the parameters in regression equations; the parameters are often of incorrect magnitude or sign or have large standard errors. Although multicollinearity can be avoided with good experimental design, not all interesting physiological questions can be studied without encountering multicollinearity. In these cases various ad hoc procedures have been proposed to mitigate multicollinearity. Although many of these procedures are controversial, they can be helpful in applying multiple linear regression to some physiological problems.
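    One common way to quantify the redundancy described in this abstract is the variance inflation factor (VIF), which the abstract itself does not name. The sketch below covers only the two-predictor case, where VIF = 1 / (1 - r²) and r is the correlation between the predictors; the function names and data are hypothetical.

```python
from math import sqrt


def pearson_r(x, y):
    """Pearson correlation between two equal-length numeric sequences."""
    n = len(x)
    x_mean, y_mean = sum(x) / n, sum(y) / n
    cov = sum((a - x_mean) * (b - y_mean) for a, b in zip(x, y))
    x_ss = sum((a - x_mean) ** 2 for a in x)
    y_ss = sum((b - y_mean) ** 2 for b in y)
    return cov / sqrt(x_ss * y_ss)


def vif_two_predictors(x1, x2):
    """Variance inflation factor shared by both predictors in a two-predictor model."""
    r2 = pearson_r(x1, x2) ** 2
    # Perfect collinearity makes the coefficient variances unbounded.
    return float("inf") if r2 >= 1.0 else 1.0 / (1.0 - r2)


# Hypothetical predictors that change almost in parallel.
x1 = [1.0, 2.0, 3.0, 4.0, 5.0]
x2 = [1.1, 1.9, 3.2, 3.8, 5.1]
print(vif_two_predictors(x1, x2))
```

    A VIF of 1 means the predictors are uncorrelated; large values indicate that coefficient standard errors are inflated, which is the symptom the abstract describes.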

  17. MULTIVARIATE STATISTICAL MODELS FOR EFFECTS OF PM AND COPOLLUTANTS IN A DAILY TIME SERIES EPIDEMIOLOGY STUDY

    EPA Science Inventory

    Most analyses of daily time series epidemiology data relate mortality or morbidity counts to PM and other air pollutants by means of single-outcome regression models using multiple predictors, without taking into account the complex statistical structure of the predictor variable...

  18. Using Artificial Neural Networks in Educational Research: Some Comparisons with Linear Statistical Models.

    ERIC Educational Resources Information Center

    Everson, Howard T.; And Others

    This paper explores the feasibility of neural computing methods such as artificial neural networks (ANNs) and abductory induction mechanisms (AIM) for use in educational measurement. ANNs and AIMS methods are contrasted with more traditional statistical techniques, such as multiple regression and discriminant function analyses, for making…

  19. Does transport time help explain the high trauma mortality rates in rural areas? New and traditional predictors assessed by new and traditional statistical methods

    PubMed Central

    Røislien, Jo; Lossius, Hans Morten; Kristiansen, Thomas

    2015-01-01

    Background: Trauma is a leading global cause of death. Trauma mortality rates are higher in rural areas, constituting a challenge for quality and equality in trauma care. The aim of the study was to explore population density and transport time to hospital care as possible predictors of geographical differences in mortality rates, and to what extent choice of statistical method might affect the analytical results and accompanying clinical conclusions. Methods: Using data from the Norwegian Cause of Death registry, deaths from external causes 1998–2007 were analysed. Norway consists of 434 municipalities, and municipality population density and travel time to hospital care were entered as predictors of municipality mortality rates in univariate and multiple regression models of increasing model complexity. We fitted linear regression models with continuous and categorised predictors, as well as piecewise linear and generalised additive models (GAMs). Models were compared using Akaike's information criterion (AIC). Results: Population density was an independent predictor of trauma mortality rates, while the contribution of transport time to hospital care was highly dependent on choice of statistical model. A multiple GAM or piecewise linear model was superior, and similar, in terms of AIC. However, while transport time was statistically significant in multiple models with piecewise linear or categorised predictors, it was not in GAM or standard linear regression. Conclusions: Population density is an independent predictor of trauma mortality rates. The added explanatory value of transport time to hospital care is marginal and model-dependent, highlighting the importance of exploring several statistical models when studying complex associations in observational data. PMID:25972600
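    To show how an AIC comparison of the kind described here works, the sketch below compares an intercept-only model with a simple linear model. It is an illustration only, not the study's analysis: it assumes Gaussian errors, for which AIC equals n·ln(RSS/n) + 2k up to an additive constant shared by all models fitted to the same data, and it counts the error variance among the k parameters. The data and function names are hypothetical.

```python
from math import log


def gaussian_aic(rss, n, k):
    """AIC (up to a constant) for a least-squares fit with k estimated parameters."""
    return n * log(rss / n) + 2 * k


def rss_constant(y):
    """Residual sum of squares for the intercept-only model."""
    y_mean = sum(y) / len(y)
    return sum((v - y_mean) ** 2 for v in y)


def rss_linear(x, y):
    """Residual sum of squares for y = intercept + slope * x fitted by least squares."""
    n = len(x)
    x_mean, y_mean = sum(x) / n, sum(y) / n
    slope = sum((a - x_mean) * (b - y_mean) for a, b in zip(x, y)) / sum(
        (a - x_mean) ** 2 for a in x
    )
    intercept = y_mean - slope * x_mean
    return sum((b - intercept - slope * a) ** 2 for a, b in zip(x, y))


# Hypothetical predictor and outcome values, for illustration only.
x = [0.0, 1.0, 2.0, 3.0, 4.0, 5.0]
y = [0.1, 0.9, 2.2, 2.8, 4.1, 5.0]
n = len(y)

aic_constant = gaussian_aic(rss_constant(y), n, k=2)  # mean + error variance
aic_linear = gaussian_aic(rss_linear(x, y), n, k=3)   # intercept + slope + error variance
print(aic_constant, aic_linear)  # the lower value indicates the preferred model
```

    Because the penalty term 2k grows with model complexity, a more flexible model such as a GAM is preferred only if its improvement in fit outweighs its additional parameters.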

  20. An Investigation of the Fit of Linear Regression Models to Data from an SAT[R] Validity Study. Research Report 2011-3

    ERIC Educational Resources Information Center

    Kobrin, Jennifer L.; Sinharay, Sandip; Haberman, Shelby J.; Chajewski, Michael

    2011-01-01

    This study examined the adequacy of a multiple linear regression model for predicting first-year college grade point average (FYGPA) using SAT[R] scores and high school grade point average (HSGPA). A variety of techniques, both graphical and statistical, were used to examine if it is possible to improve on the linear regression model. The results…

  1. 77 FR 13691 - Qualification of Drivers; Exemption Applications; Vision

    Federal Register 2010, 2011, 2012, 2013, 2014

    2012-03-07

    ..., ocular hypertension, retinal detachment, cataracts and corneal scarring. In most cases, their eye... Application of Multiple Regression Analysis of a Poisson Process,'' Journal of American Statistical...

  2. Methods for estimating the magnitude and frequency of peak streamflows at ungaged sites in and near the Oklahoma Panhandle

    USGS Publications Warehouse

    Smith, S. Jerrod; Lewis, Jason M.; Graves, Grant M.

    2015-09-28

    Generalized-least-squares multiple-linear regression analysis was used to formulate regression relations between peak-streamflow frequency statistics and basin characteristics. Contributing drainage area was the only basin characteristic determined to be statistically significant for all percentage of annual exceedance probabilities and was the only basin characteristic used in regional regression equations for estimating peak-streamflow frequency statistics on unregulated streams in and near the Oklahoma Panhandle. The regression model pseudo-coefficient of determination, converted to percent, for the Oklahoma Panhandle regional regression equations ranged from about 38 to 63 percent. The standard errors of prediction and the standard model errors for the Oklahoma Panhandle regional regression equations ranged from about 84 to 148 percent and from about 76 to 138 percent, respectively. These errors were comparable to those reported for regional peak-streamflow frequency regression equations for the High Plains areas of Texas and Colorado. The root mean square errors for the Oklahoma Panhandle regional regression equations (ranging from 3,170 to 92,000 cubic feet per second) were less than the root mean square errors for the Oklahoma statewide regression equations (ranging from 18,900 to 412,000 cubic feet per second); therefore, the Oklahoma Panhandle regional regression equations produce more accurate peak-streamflow statistic estimates for the irrigated period of record in the Oklahoma Panhandle than do the Oklahoma statewide regression equations. The regression equations developed in this report are applicable to streams that are not substantially affected by regulation, impoundment, or surface-water withdrawals. These regression equations are intended for use for stream sites with contributing drainage areas less than or equal to about 2,060 square miles, the maximum value for the independent variable used in the regression analysis.

  3. Content and Method in the Teaching of Marketing Research Revisited

    ERIC Educational Resources Information Center

    Wilson, Holt; Neeley, Concha; Niedzwiecki, Kelly

    2009-01-01

    This paper presents the findings from a survey of marketing research faculty. The study finds that SPSS is the most used statistical software and that cross tabulation, single, independent, and dependent t-tests, and ANOVA are among the most important statistical tools according to respondents. Bivariate and multiple regression are also considered…

  4. A New Sample Size Formula for Regression.

    ERIC Educational Resources Information Center

    Brooks, Gordon P.; Barcikowski, Robert S.

    The focus of this research was to determine the efficacy of a new method of selecting sample sizes for multiple linear regression. A Monte Carlo simulation was used to study both empirical predictive power rates and empirical statistical power rates of the new method and seven other methods: those of C. N. Park and A. L. Dudycha (1974); J. Cohen…

  5. Analysis and prediction of flow from local source in a river basin using a Neuro-fuzzy modeling tool.

    PubMed

    Aqil, Muhammad; Kita, Ichiro; Yano, Akira; Nishiyama, Soichi

    2007-10-01

    Traditionally, the multiple linear regression technique has been one of the most widely used models in simulating hydrological time series. However, when the nonlinear phenomenon is significant, multiple linear regression will fail to develop an appropriate predictive model. Recently, neuro-fuzzy systems have gained much popularity for calibrating the nonlinear relationships. This study evaluated the potential of a neuro-fuzzy system as an alternative to the traditional statistical regression technique for the purpose of predicting flow from a local source in a river basin. The effectiveness of the proposed identification technique was demonstrated through a simulation study of the river flow time series of the Citarum River in Indonesia. Furthermore, in order to provide the uncertainty associated with the estimation of river flow, a Monte Carlo simulation was performed. As a comparison, a multiple linear regression analysis that was being used by the Citarum River Authority was also examined using various statistical indices. The simulation results using 95% confidence intervals indicated that the neuro-fuzzy model consistently underestimated the magnitude of high flow while the low and medium flow magnitudes were estimated closer to the observed data. The comparison of the prediction accuracy of the neuro-fuzzy and linear regression methods indicated that the neuro-fuzzy approach was more accurate in predicting river flow dynamics. The neuro-fuzzy model was able to improve the root mean square error (RMSE) and mean absolute percentage error (MAPE) values of the multiple linear regression forecasts by about 13.52% and 10.73%, respectively. Considering its simplicity and efficiency, the neuro-fuzzy model is recommended as an alternative tool for modeling of flow dynamics in the study area.

  6. Primary Factors Related to Multiple Placements for Children in Out-of-Home Care

    ERIC Educational Resources Information Center

    Eggertsen, Lars

    2008-01-01

    Using an ecological framework, this study identified which factors related to out-of-home placements significantly influenced multiple placements for children in Utah during 2000, 2001, and 2002. Multinomial logistic regression statistical procedures and a geographical information system (GIS) were used to analyze the data. The final model…

  7. Normalization Ridge Regression in Practice II: The Estimation of Multiple Feedback Linkages.

    ERIC Educational Resources Information Center

    Bulcock, J. W.

    The use of the two-stage least squares (2 SLS) procedure for estimating nonrecursive social science models is often impractical when multiple feedback linkages are required. This is because 2 SLS is extremely sensitive to multicollinearity. The standard statistical solution to the multicollinearity problem is a biased, variance reduced procedure…

  8. Do clinical and translational science graduate students understand linear regression? Development and early validation of the REGRESS quiz.

    PubMed

    Enders, Felicity

    2013-12-01

    Although regression is widely used for reading and publishing in the medical literature, no instruments were previously available to assess students' understanding. The goal of this study was to design and assess such an instrument for graduate students in Clinical and Translational Science and Public Health. A 27-item REsearch on Global Regression Expectations in StatisticS (REGRESS) quiz was developed through an iterative process. Consenting students taking a course on linear regression in a Clinical and Translational Science program completed the quiz pre- and postcourse. Student results were compared to practicing statisticians with a master's or doctoral degree in statistics or a closely related field. Fifty-two students responded precourse, 59 postcourse, and 22 practicing statisticians completed the quiz. The mean (SD) score was 9.3 (4.3) for students precourse and 19.0 (3.5) postcourse (P < 0.001). Postcourse students had similar results to practicing statisticians (mean (SD) of 20.1 (3.5); P = 0.21). Students also showed significant improvement pre/postcourse in each of six domain areas (P < 0.001). The REGRESS quiz was internally reliable (Cronbach's alpha 0.89). The initial validation is quite promising with statistically significant and meaningful differences across time and study populations. Further work is needed to validate the quiz across multiple institutions. © 2013 Wiley Periodicals, Inc.

  9. A Powerful Test for Comparing Multiple Regression Functions.

    PubMed

    Maity, Arnab

    2012-09-01

    In this article, we address the important problem of comparison of two or more population regression functions. Recently, Pardo-Fernández, Van Keilegom and González-Manteiga (2007) developed test statistics for simple nonparametric regression models: Y(ij) = θ(j)(Z(ij)) + σ(j)(Z(ij))ε(ij), based on empirical distributions of the errors in each population j = 1, …, J. In this paper, we propose a test for equality of the θ(j)(·) based on the concept of generalized likelihood ratio type statistics. We also generalize our test for other nonparametric regression setups, e.g., nonparametric logistic regression, where the loglikelihood for population j is any general smooth function [Formula: see text]. We describe a resampling procedure to obtain the critical values of the test. In addition, we present a simulation study to evaluate the performance of the proposed test and compare our results to those in Pardo-Fernández et al. (2007).

  10. An open-access CMIP5 pattern library for temperature and precipitation: Description and methodology

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Lynch, Cary D.; Hartin, Corinne A.; Bond-Lamberty, Benjamin

    Pattern scaling is used to efficiently emulate general circulation models and explore uncertainty in climate projections under multiple forcing scenarios. Pattern scaling methods assume that local climate changes scale with a global mean temperature increase, allowing for spatial patterns to be generated for multiple models for any future emission scenario. For uncertainty quantification and probabilistic statistical analysis, a library of patterns with descriptive statistics for each file would be beneficial, but such a library does not presently exist. Of the possible techniques used to generate patterns, the two most prominent are the delta and least squares regression methods. We explore the differences and statistical significance between patterns generated by each method and assess performance of the generated patterns across methods and scenarios. Differences in patterns across seasons between methods and epochs were largest in high latitudes (60-90°N/S). Bias and mean errors between modeled and pattern predicted output from the linear regression method were smaller than patterns generated by the delta method. Across scenarios, differences in the linear regression method patterns were more statistically significant, especially at high latitudes. We found that pattern generation methodologies were able to approximate the forced signal of change to within ≤ 0.5°C, but choice of pattern generation methodology for pattern scaling purposes should be informed by user goals and criteria. As a result, this paper describes our library of least squares regression patterns from all CMIP5 models for temperature and precipitation on an annual and sub-annual basis, along with the code used to generate these patterns.

  11. An open-access CMIP5 pattern library for temperature and precipitation: Description and methodology

    DOE PAGES

    Lynch, Cary D.; Hartin, Corinne A.; Bond-Lamberty, Benjamin; ...

    2017-05-15

    Pattern scaling is used to efficiently emulate general circulation models and explore uncertainty in climate projections under multiple forcing scenarios. Pattern scaling methods assume that local climate changes scale with a global mean temperature increase, allowing for spatial patterns to be generated for multiple models for any future emission scenario. For uncertainty quantification and probabilistic statistical analysis, a library of patterns with descriptive statistics for each file would be beneficial, but such a library does not presently exist. Of the possible techniques used to generate patterns, the two most prominent are the delta and least squares regression methods. We explore the differences and statistical significance between patterns generated by each method and assess performance of the generated patterns across methods and scenarios. Differences in patterns across seasons between methods and epochs were largest in high latitudes (60-90°N/S). Bias and mean errors between modeled and pattern predicted output from the linear regression method were smaller than patterns generated by the delta method. Across scenarios, differences in the linear regression method patterns were more statistically significant, especially at high latitudes. We found that pattern generation methodologies were able to approximate the forced signal of change to within ≤ 0.5°C, but choice of pattern generation methodology for pattern scaling purposes should be informed by user goals and criteria. As a result, this paper describes our library of least squares regression patterns from all CMIP5 models for temperature and precipitation on an annual and sub-annual basis, along with the code used to generate these patterns.

  12. Trends in Mortality After Primary Cytoreductive Surgery for Ovarian Cancer: A Systematic Review and Metaregression of Randomized Clinical Trials and Observational Studies.

    PubMed

    Di Donato, Violante; Kontopantelis, Evangelos; Aletti, Giovanni; Casorelli, Assunta; Piacenti, Ilaria; Bogani, Giorgio; Lecce, Francesca; Benedetti Panici, Pierluigi

    2017-06-01

    Primary cytoreductive surgery (PDS) followed by platinum-based chemotherapy is the cornerstone of treatment and the absence of residual tumor after PDS is universally considered the most important prognostic factor. The aim of the present analysis was to evaluate trend and predictors of 30-day mortality in patients undergoing primary cytoreduction for ovarian cancer. Literature was searched for records reporting 30-day mortality after PDS. All cohorts were rated for quality. Simple and multiple Poisson regression models were used to quantify the association between 30-day mortality and the following: overall or severe complications, proportion of patients with stage IV disease, median age, year of publication, and weighted surgical complexity index. Using the multiple regression model, we calculated the risk of perioperative mortality at different levels for statistically significant covariates of interest. Simple regression identified median age and proportion of patients with stage IV disease as statistically significant predictors of 30-day mortality. When included in the multiple Poisson regression model, both remained statistically significant, with an incidence rate ratio of 1.087 for median age and 1.017 for stage IV disease. Disease stage was a strong predictor, with the risk estimated to increase from 2.8% (95% confidence interval 2.02-3.66) for stage III to 16.1% (95% confidence interval 6.18-25.93) for stage IV, for a cohort with a median age of 65 years. Metaregression demonstrated that increased age and advanced clinical stage were independently associated with an increased risk of mortality, and the combined effects of both factors greatly increased the risk.
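
    The incidence rate ratios reported in this record can be read multiplicatively, which a short worked calculation makes concrete. In a Poisson regression with a log link, a coefficient b gives IRR = exp(b) per one-unit change in the covariate, so a change of d units multiplies the rate by IRR ** d. The sketch below assumes the reported 1.087 is per one-year increase in median age; the abstract does not state the unit, and the helper name `rate_ratio` is hypothetical.

```python
import math


def rate_ratio(irr_per_unit, delta):
    """Rate ratio implied by a covariate change of `delta` units.

    Under a log-link Poisson model, log(rate) is linear in the covariate,
    so the ratio compounds: exp(log(IRR) * delta) == IRR ** delta.
    """
    return math.exp(math.log(irr_per_unit) * delta)


irr_age = 1.087  # from the abstract; "per year of median age" is an assumption
one_year = rate_ratio(irr_age, 1)
ten_years = rate_ratio(irr_age, 10)
print(round(one_year, 3), round(ten_years, 2))
```

    Under that assumption, a cohort whose median age is ten years higher would have roughly 2.3 times the 30-day mortality rate, other covariates held fixed.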

  13. The comparison between several robust ridge regression estimators in the presence of multicollinearity and multiple outliers

    NASA Astrophysics Data System (ADS)

    Zahari, Siti Meriam; Ramli, Norazan Mohamed; Moktar, Balkiah; Zainol, Mohammad Said

    2014-09-01

    In the presence of multicollinearity and multiple outliers, statistical inference for a linear regression model using ordinary least squares (OLS) estimators is severely affected and produces misleading results. To overcome this, many approaches have been investigated. These include robust methods, which were reported to be less sensitive to the presence of outliers. In addition, the ridge regression technique was employed to tackle the multicollinearity problem. In order to mitigate both problems, a combination of ridge regression and robust methods was discussed in this study. The superiority of this approach was examined when multicollinearity and multiple outliers occurred simultaneously in multiple linear regression. This study aimed to look at the performance of several well-known robust estimators: M, MM, RIDGE and robust ridge regression estimators, namely Weighted Ridge M-estimator (WRM), Weighted Ridge MM (WRMM), and Ridge MM (RMM), in such a situation. Results of the study showed that in the presence of simultaneous multicollinearity and multiple outliers (in both x- and y-direction), the RMM and RIDGE are more or less similar in terms of superiority over the other estimators, regardless of the number of observations, level of collinearity and percentage of outliers used. However, when outliers occurred in only a single direction (y-direction), the WRMM estimator is the most superior among the robust ridge regression estimators, producing the least variance. In conclusion, robust ridge regression is the best alternative compared to robust and conventional least squares estimators when dealing with the simultaneous presence of multicollinearity and outliers.
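
    The ridge component discussed in this record can be sketched in a few lines. The example below solves the ridge normal equations (X'X + lam*I) b = X'y for two nearly collinear, centered predictors. It covers plain ridge only, not the robust M/MM weighting the study combines it with, and the data and the name `ridge_2d` are hypothetical.

```python
import math


def ridge_2d(x1, x2, y, lam):
    """Ridge coefficients for two centered predictors, no intercept.

    Solves the 2x2 system (X'X + lam*I) b = X'y by Cramer's rule.
    lam = 0 reproduces ordinary least squares.
    """
    a = sum(u * u for u in x1) + lam
    b = sum(u * v for u, v in zip(x1, x2))
    d = sum(v * v for v in x2) + lam
    c1 = sum(u * w for u, w in zip(x1, y))
    c2 = sum(v * w for v, w in zip(x2, y))
    det = a * d - b * b
    return ((d * c1 - b * c2) / det, (a * c2 - b * c1) / det)


# Hypothetical data: x2 is almost a copy of x1 (strong collinearity).
x1 = [-2.0, -1.0, 0.0, 1.0, 2.0]
x2 = [-2.1, -0.9, 0.1, 0.9, 2.0]
y = [-4.2, -1.8, 0.1, 2.1, 3.9]

ols = ridge_2d(x1, x2, y, lam=0.0)
ridge = ridge_2d(x1, x2, y, lam=1.0)
ols_norm = math.hypot(*ols)
ridge_norm = math.hypot(*ridge)
print(ols, ridge)
```

    The penalty lam trades a small bias for lower variance: the length of the coefficient vector shrinks as lam grows, which is what stabilizes estimates when X'X is close to singular.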

  14. Automating approximate Bayesian computation by local linear regression.

    PubMed

    Thornton, Kevin R

    2009-07-07

    In several biological contexts, parameter inference often relies on computationally-intensive techniques. "Approximate Bayesian Computation", or ABC, methods based on summary statistics have become increasingly popular. A particular flavor of ABC based on using a linear regression to approximate the posterior distribution of the parameters, conditional on the summary statistics, is computationally appealing, yet no standalone tool exists to automate the procedure. Here, I describe a program to implement the method. The software package ABCreg implements the local linear-regression approach to ABC. The advantages are: 1. The code is standalone, and fully-documented. 2. The program will automatically process multiple data sets, and create unique output files for each (which may be processed immediately in R), facilitating the testing of inference procedures on simulated data, or the analysis of multiple data sets. 3. The program implements two different transformation methods for the regression step. 4. Analysis options are controlled on the command line by the user, and the program is designed to output warnings for cases where the regression fails. 5. The program does not depend on any particular simulation machinery (coalescent, forward-time, etc.), and therefore is a general tool for processing the results from any simulation. 6. The code is open-source, and modular. Examples of applying the software to empirical data from Drosophila melanogaster, and testing the procedure on simulated data, are shown. In practice, the ABCreg simplifies implementing ABC based on local-linear regression.

  15. A Quantile Regression Approach to Understanding the Relations among Morphological Awareness, Vocabulary, and Reading Comprehension in Adult Basic Education Students

    ERIC Educational Resources Information Center

    Tighe, Elizabeth L.; Schatschneider, Christopher

    2016-01-01

    The purpose of this study was to investigate the joint and unique contributions of morphological awareness and vocabulary knowledge at five reading comprehension levels in adult basic education (ABE) students. We introduce the statistical technique of multiple quantile regression, which enabled us to assess the predictive utility of morphological…

  16. Statistical methods and regression analysis of stratospheric ozone and meteorological variables in Isfahan

    NASA Astrophysics Data System (ADS)

    Hassanzadeh, S.; Hosseinibalam, F.; Omidvari, M.

    2008-04-01

    Data of seven meteorological variables (relative humidity, wet temperature, dry temperature, maximum temperature, minimum temperature, ground temperature and sun radiation time) and ozone values have been used for statistical analysis. Meteorological variables and ozone values were analyzed using both multiple linear regression and principal component methods. Data for the period 1999-2004 are analyzed jointly using both methods. For all periods, temperature dependent variables were highly correlated, but were all negatively correlated with relative humidity. Multiple regression analysis was used to fit the meteorological variables using the meteorological variables as predictors. A variable selection method based on high loading of varimax rotated principal components was used to obtain subsets of the predictor variables to be included in the linear regression model of the meteorological variables. In 1999, 2001 and 2002 one of the meteorological variables was weakly influenced predominantly by the ozone concentrations. However, the model did not predict that the meteorological variables for the year 2000 were not influenced predominantly by the ozone concentrations that point to variation in sun radiation. This could be due to other factors that were not explicitly considered in this study.

  17. Which Variables Associated with Data-Driven Instruction Are Believed to Best Predict Urban Student Achievement?

    ERIC Educational Resources Information Center

    Greer, Wil

    2013-01-01

    This study identified the variables associated with data-driven instruction (DDI) that are perceived to best predict student achievement. Of the DDI variables discussed in the literature, 51 of them had a sufficient enough research base to warrant statistical analysis. Of them, 26 were statistically significant. Multiple regression and an…

  18. Cognition of and Demand for Education and Teaching in Medical Statistics in China: A Systematic Review and Meta-Analysis

    PubMed Central

    Li, Gaoming; Yi, Dali; Wu, Xiaojiao; Liu, Xiaoyu; Zhang, Yanqi; Liu, Ling; Yi, Dong

    2015-01-01

    Background: Although a substantial number of studies focus on the teaching and application of medical statistics in China, few studies comprehensively evaluate the recognition of and demand for medical statistics. In addition, the results of these various studies differ and are insufficiently comprehensive and systematic. Objectives: This investigation aimed to evaluate the general cognition of and demand for medical statistics by undergraduates, graduates, and medical staff in China. Methods: We performed a comprehensive database search related to the cognition of and demand for medical statistics from January 2007 to July 2014 and conducted a meta-analysis of non-controlled studies with sub-group analysis for undergraduates, graduates, and medical staff. Results: There are substantial differences with respect to the cognition of theory in medical statistics among undergraduates (73.5%), graduates (60.7%), and medical staff (39.6%). The demand for theory in medical statistics is high among graduates (94.6%), undergraduates (86.1%), and medical staff (88.3%). Regarding specific statistical methods, the cognition of basic statistical methods is higher than of advanced statistical methods. The demand for certain advanced statistical methods, including (but not limited to) multiple analysis of variance (ANOVA), multiple linear regression, and logistic regression, is higher than that for basic statistical methods. The use rates of the Statistical Package for the Social Sciences (SPSS) software and statistical analysis software (SAS) are only 55% and 15%, respectively. Conclusion: The overall statistical competence of undergraduates, graduates, and medical staff is insufficient, and their ability to practically apply their statistical knowledge is limited, which constitutes an unsatisfactory state of affairs for medical statistics education. Because the demand for skills in this area is increasing, the need to reform medical statistics education in China has become urgent. PMID:26053876

  19. Cognition of and Demand for Education and Teaching in Medical Statistics in China: A Systematic Review and Meta-Analysis.

    PubMed

    Wu, Yazhou; Zhou, Liang; Li, Gaoming; Yi, Dali; Wu, Xiaojiao; Liu, Xiaoyu; Zhang, Yanqi; Liu, Ling; Yi, Dong

    2015-01-01

    Although a substantial number of studies focus on the teaching and application of medical statistics in China, few studies comprehensively evaluate the recognition of and demand for medical statistics. In addition, the results of these various studies differ and are insufficiently comprehensive and systematic. This investigation aimed to evaluate the general cognition of and demand for medical statistics by undergraduates, graduates, and medical staff in China. We performed a comprehensive database search related to the cognition of and demand for medical statistics from January 2007 to July 2014 and conducted a meta-analysis of non-controlled studies with sub-group analysis for undergraduates, graduates, and medical staff. There are substantial differences with respect to the cognition of theory in medical statistics among undergraduates (73.5%), graduates (60.7%), and medical staff (39.6%). The demand for theory in medical statistics is high among graduates (94.6%), undergraduates (86.1%), and medical staff (88.3%). Regarding specific statistical methods, the cognition of basic statistical methods is higher than of advanced statistical methods. The demand for certain advanced statistical methods, including (but not limited to) multiple analysis of variance (ANOVA), multiple linear regression, and logistic regression, is higher than that for basic statistical methods. The use rates of the Statistical Package for the Social Sciences (SPSS) software and statistical analysis software (SAS) are only 55% and 15%, respectively. The overall statistical competence of undergraduates, graduates, and medical staff is insufficient, and their ability to practically apply their statistical knowledge is limited, which constitutes an unsatisfactory state of affairs for medical statistics education. Because the demand for skills in this area is increasing, the need to reform medical statistics education in China has become urgent.

  20. Individual risk factors for deep infection and compromised fracture healing after intramedullary nailing of tibial shaft fractures: a single centre experience of 480 patients.

    PubMed

    Metsemakers, W-J; Handojo, K; Reynders, P; Sermon, A; Vanderschot, P; Nijs, S

    2015-04-01

    Despite modern advances in the treatment of tibial shaft fractures, complications including nonunion, malunion, and infection remain relatively frequent. A better understanding of these injuries and their complications could lead to prevention rather than treatment strategies. A retrospective study was performed to identify risk factors for deep infection and compromised fracture healing after intramedullary nailing (IMN) of tibial shaft fractures. Between January 2000 and January 2012, 480 consecutive patients with 486 tibial shaft fractures were enrolled in the study. Statistical analysis was performed to determine predictors of deep infection and compromised fracture healing. Compromised fracture healing was subdivided into delayed union and nonunion. The following independent variables were selected for analysis: age, sex, smoking, obesity, diabetes, American Society of Anaesthesiologists (ASA) classification, polytrauma, fracture type, open fractures, Gustilo type, primary external fixation (EF), time to nailing (TTN) and reaming. As primary statistical evaluation we performed a univariate analysis, followed by a multiple logistic regression model. Univariate regression analysis revealed similar risk factors for delayed union and nonunion, including fracture type, open fractures and Gustilo type. Factors affecting the occurrence of deep infection in this model were primary EF, a prolonged TTN, open fractures and Gustilo type. Multiple logistic regression analysis revealed polytrauma as the single risk factor for nonunion. With respect to delayed union, no risk factors could be identified. In the same statistical model, deep infection was correlated with primary EF. The purpose of this study was to evaluate risk factors of poor outcome after IMN of tibial shaft fractures. The univariate regression analysis showed that the nature of complications after tibial shaft nailing could be multifactorial. This was not confirmed in a multiple logistic regression model, which only revealed polytrauma and primary EF as risk factors for nonunion and deep infection, respectively. Future strategies should focus on prevention in high-risk populations such as polytrauma patients treated with EF. Copyright © 2014 Elsevier Ltd. All rights reserved.

  1. Linear regression analysis: part 14 of a series on evaluation of scientific publications.

    PubMed

    Schneider, Astrid; Hommel, Gerhard; Blettner, Maria

    2010-11-01

    Regression analysis is an important statistical method for the analysis of medical data. It enables the identification and characterization of relationships among multiple factors. It also enables the identification of prognostically relevant risk factors and the calculation of risk scores for individual prognostication. This article is based on selected textbooks of statistics, a selective review of the literature, and our own experience. After a brief introduction of the uni- and multivariable regression models, illustrative examples are given to explain what the important considerations are before a regression analysis is performed, and how the results should be interpreted. The reader should then be able to judge whether the method has been used correctly and interpret the results appropriately. The performance and interpretation of linear regression analysis are subject to a variety of pitfalls, which are discussed here in detail. The reader is made aware of common errors of interpretation through practical examples. Both the opportunities for applying linear regression analysis and its limitations are presented.

  2. Accounting for Multiple Births in Neonatal and Perinatal Trials: Systematic Review and Case Study

    PubMed Central

    Hibbs, Anna Maria; Black, Dennis; Palermo, Lisa; Cnaan, Avital; Luan, Xianqun; Truog, William E; Walsh, Michele C; Ballard, Roberta A

    2010-01-01

    Objectives: To determine the prevalence in the neonatal literature of statistical approaches accounting for the unique clustering patterns of multiple births. To explore the sensitivity of an actual trial to several analytic approaches to multiples. Methods: A systematic review of recent perinatal trials assessed the prevalence of studies accounting for clustering of multiples. The NO CLD trial served as a case study of the sensitivity of the outcome to several statistical strategies. We calculated odds ratios using non-clustered (logistic regression) and clustered (generalized estimating equations, multiple outputation) analyses. Results: In the systematic review, most studies did not describe the randomization of twins and did not account for clustering. Of those studies that did, exclusion of multiples and generalized estimating equations were the most common strategies. The NO CLD study included 84 infants with a sibling enrolled in the study. Multiples were more likely than singletons to be white and were born to older mothers (p<0.01). Analyses that accounted for clustering were statistically significant; analyses assuming independence were not. Conclusions: The statistical approach to multiples can influence the odds ratio and width of confidence intervals, thereby affecting the interpretation of a study outcome. A minority of perinatal studies address this issue. PMID:19969305

  3. Accounting for multiple births in neonatal and perinatal trials: systematic review and case study.

    PubMed

    Hibbs, Anna Maria; Black, Dennis; Palermo, Lisa; Cnaan, Avital; Luan, Xianqun; Truog, William E; Walsh, Michele C; Ballard, Roberta A

    2010-02-01

    To determine the prevalence in the neonatal literature of statistical approaches accounting for the unique clustering patterns of multiple births and to explore the sensitivity of an actual trial to several analytic approaches to multiples. A systematic review of recent perinatal trials assessed the prevalence of studies accounting for clustering of multiples. The Nitric Oxide to Prevent Chronic Lung Disease (NO CLD) trial served as a case study of the sensitivity of the outcome to several statistical strategies. We calculated odds ratios using nonclustered (logistic regression) and clustered (generalized estimating equations, multiple outputation) analyses. In the systematic review, most studies did not describe the random assignment of twins and did not account for clustering. Of those studies that did, exclusion of multiples and generalized estimating equations were the most common strategies. The NO CLD study included 84 infants with a sibling enrolled in the study. Multiples were more likely than singletons to be white and were born to older mothers (P < .01). Analyses that accounted for clustering were statistically significant; analyses assuming independence were not. The statistical approach to multiples can influence the odds ratio and width of confidence intervals, thereby affecting the interpretation of a study outcome. A minority of perinatal studies address this issue. Copyright 2010 Mosby, Inc. All rights reserved.

  4. Data Mining Methods Applied to Flight Operations Quality Assurance Data: A Comparison to Standard Statistical Methods

    NASA Technical Reports Server (NTRS)

    Stolzer, Alan J.; Halford, Carl

    2007-01-01

    In a previous study, multiple regression techniques were applied to Flight Operations Quality Assurance-derived data to develop parsimonious model(s) for fuel consumption on the Boeing 757 airplane. The present study examined several data mining algorithms, including neural networks, on the fuel consumption problem and compared them to the multiple regression results obtained earlier. Using regression methods, parsimonious models were obtained that explained approximately 85% of the variation in fuel flow. In general, data mining methods were more effective in predicting fuel consumption. Classification and Regression Tree methods reported correlation coefficients of .91 to .92, and General Linear Models and Multilayer Perceptron neural networks reported correlation coefficients of about .99. These data mining models show great promise for use in further examining large FOQA databases for operational and safety improvements.

  5. Simultaneous multiple non-crossing quantile regression estimation using kernel constraints

    PubMed Central

    Liu, Yufeng; Wu, Yichao

    2011-01-01

    Quantile regression (QR) is a very useful statistical tool for learning the relationship between the response variable and covariates. For many applications, one often needs to estimate multiple conditional quantile functions of the response variable given covariates. Although one can estimate multiple quantiles separately, it is of great interest to estimate them simultaneously. One advantage of simultaneous estimation is that multiple quantiles can share strength among them to gain better estimation accuracy than individually estimated quantile functions. Another important advantage of joint estimation is the feasibility of incorporating simultaneous non-crossing constraints of QR functions. In this paper, we propose a new kernel-based multiple QR estimation technique, namely simultaneous non-crossing quantile regression (SNQR). We use kernel representations for QR functions and apply constraints on the kernel coefficients to avoid crossing. Both unregularised and regularised SNQR techniques are considered. Asymptotic properties such as asymptotic normality of linear SNQR and oracle properties of the sparse linear SNQR are developed. Our numerical results demonstrate the competitive performance of our SNQR over the original individual QR estimation. PMID:22190842
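
    The loss function behind quantile regression can be shown with a minimal sketch. The code below evaluates the check (pinball) loss for an intercept-only model and picks the observed value that minimizes it, which recovers a sample quantile. It does not implement the kernel-based SNQR estimator described in this record; the data and the names `pinball_loss` and `best_constant` are hypothetical.

```python
def pinball_loss(y, q, tau):
    """Check loss of predicting the constant q for every value in y.

    Under-predictions are weighted by tau, over-predictions by (1 - tau).
    """
    return sum(tau * (v - q) if v >= q else (1 - tau) * (q - v) for v in y)


def best_constant(y, tau):
    """Observed value minimizing the pinball loss (a tau-th sample quantile)."""
    return min(sorted(y), key=lambda q: pinball_loss(y, q, tau))


# Hypothetical sample with one large value.
y = [1, 2, 3, 4, 100]
q10 = best_constant(y, 0.1)
q50 = best_constant(y, 0.5)
q90 = best_constant(y, 0.9)
print(q10, q50, q90)
```

    With a single constant per quantile level the fitted values are automatically ordered. Once covariates enter and each level is fitted separately, the fitted quantile functions can cross, which is the problem simultaneous non-crossing constraints are meant to prevent.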

  6. A novel variational Bayes multiple locus Z-statistic for genome-wide association studies with Bayesian model averaging

    PubMed Central

    Logsdon, Benjamin A.; Carty, Cara L.; Reiner, Alexander P.; Dai, James Y.; Kooperberg, Charles

    2012-01-01

    Motivation: For many complex traits, including height, the majority of variants identified by genome-wide association studies (GWAS) have small effects, leaving a significant proportion of the heritable variation unexplained. Although many penalized multiple regression methodologies have been proposed to increase the power to detect associations for complex genetic architectures, they generally lack mechanisms for false-positive control and diagnostics for model over-fitting. Our methodology is the first penalized multiple regression approach that explicitly controls Type I error rates and provides model over-fitting diagnostics through a novel normally distributed statistic defined for every marker within the GWAS, based on results from a variational Bayes spike regression algorithm. Results: We compare the performance of our method to the lasso and single marker analysis on simulated data and demonstrate that our approach has superior performance in terms of power and Type I error control. In addition, using the Women's Health Initiative (WHI) SNP Health Association Resource (SHARe) GWAS of African-Americans, we show that our method has power to detect additional novel associations with body height. These findings replicate by reaching a stringent cutoff of marginal association in a larger cohort. Availability: An R-package, including an implementation of our variational Bayes spike regression (vBsr) algorithm, is available at http://kooperberg.fhcrc.org/soft.html. Contact: blogsdon@fhcrc.org Supplementary information: Supplementary data are available at Bioinformatics online. PMID:22563072

  7. An open-access CMIP5 pattern library for temperature and precipitation: description and methodology

    NASA Astrophysics Data System (ADS)

    Lynch, Cary; Hartin, Corinne; Bond-Lamberty, Ben; Kravitz, Ben

    2017-05-01

    Pattern scaling is used to efficiently emulate general circulation models and explore uncertainty in climate projections under multiple forcing scenarios. Pattern scaling methods assume that local climate changes scale with a global mean temperature increase, allowing for spatial patterns to be generated for multiple models for any future emission scenario. For uncertainty quantification and probabilistic statistical analysis, a library of patterns with descriptive statistics for each file would be beneficial, but such a library does not presently exist. Of the possible techniques used to generate patterns, the two most prominent are the delta and least squares regression methods. We explore the differences and statistical significance between patterns generated by each method and assess performance of the generated patterns across methods and scenarios. Differences in patterns across seasons between methods and epochs were largest in high latitudes (60-90° N/S). Bias and mean errors between modeled and pattern-predicted output from the linear regression method were smaller than patterns generated by the delta method. Across scenarios, differences in the linear regression method patterns were more statistically significant, especially at high latitudes. We found that pattern generation methodologies were able to approximate the forced signal of change to within ≤ 0.5 °C, but the choice of pattern generation methodology for pattern scaling purposes should be informed by user goals and criteria. This paper describes our library of least squares regression patterns from all CMIP5 models for temperature and precipitation on an annual and sub-annual basis, along with the code used to generate these patterns. The dataset and netCDF data generation code are available at doi:10.5281/zenodo.495632.
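
    The two pattern-generation approaches named in this record can be sketched for a single grid cell. The sketch assumes the commonly described forms: the least squares pattern is the slope of local change regressed on global mean temperature change, and the delta pattern is the difference between late- and early-epoch local means per degree of global change. The function names, the epoch length `k` and the toy series are hypothetical; this is not the library's published code.

```python
def mean(values):
    return sum(values) / len(values)


def ls_pattern(global_t, local_t):
    """Least squares slope of local change on global mean temperature change."""
    gm, lm = mean(global_t), mean(local_t)
    sxy = sum((g - gm) * (v - lm) for g, v in zip(global_t, local_t))
    sxx = sum((g - gm) ** 2 for g in global_t)
    return sxy / sxx


def delta_pattern(global_t, local_t, k):
    """Late-minus-early epoch means (k points each), per degree of global change."""
    local_diff = mean(local_t[-k:]) - mean(local_t[:k])
    global_diff = mean(global_t[-k:]) - mean(global_t[:k])
    return local_diff / global_diff


# Hypothetical series: one grid cell warming at 1.5x the global mean rate.
global_t = [0.0, 0.5, 1.0, 1.5, 2.0, 2.5]
local_t = [1.5 * g for g in global_t]

ls_slope = ls_pattern(global_t, local_t)
delta_slope = delta_pattern(global_t, local_t, k=2)
print(ls_slope, delta_slope)
```

    On an exactly linear series both approaches return the same scaling factor. They diverge once the local response is noisy or nonlinear, because the regression uses every time step while the delta approach uses only the two epoch means.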

  8. Regression: The Apple Does Not Fall Far From the Tree.

    PubMed

    Vetter, Thomas R; Schober, Patrick

    2018-05-15

    Researchers and clinicians are frequently interested in either: (1) assessing whether there is a relationship or association between 2 or more variables and quantifying this association; or (2) determining whether 1 or more variables can predict another variable. The strength of such an association is mainly described by the correlation. However, regression analysis and regression models can be used not only to identify whether there is a significant relationship or association between variables but also to generate estimations of such a predictive relationship between variables. This basic statistical tutorial discusses the fundamental concepts and techniques related to the most common types of regression analysis and modeling, including simple linear regression, multiple regression, logistic regression, ordinal regression, and Poisson regression, as well as the common yet often underrecognized phenomenon of regression toward the mean. The various types of regression analysis are powerful statistical techniques, which when appropriately applied, can allow for the valid interpretation of complex, multifactorial data. Regression analysis and models can assess whether there is a relationship or association between 2 or more observed variables and estimate the strength of this association, as well as determine whether 1 or more variables can predict another variable. Regression is thus being applied more commonly in anesthesia, perioperative, critical care, and pain research. However, it is crucial to note that regression can identify plausible risk factors; it does not prove causation (a definitive cause and effect relationship). The results of a regression analysis instead identify independent (predictor) variable(s) associated with the dependent (outcome) variable. As with other statistical methods, applying regression requires that certain assumptions be met, which can be tested with specific diagnostics.
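
    Regression toward the mean, mentioned in this abstract, can be illustrated with a short simulation. The correlation of 0.5 between two measurements and the selection threshold of 1.0 are assumptions chosen for illustration only.

```python
import numpy as np

rng = np.random.default_rng(42)
n = 10_000
rho = 0.5  # assumed correlation between the first and second measurement

# Two standardized measurements on the same subjects, correlated at rho.
first = rng.standard_normal(n)
second = rho * first + np.sqrt(1 - rho**2) * rng.standard_normal(n)

# Select subjects with an extreme first measurement.
extreme = first > 1.0
mean_first = first[extreme].mean()
mean_second = second[extreme].mean()

print(f"mean of first measurement in selected group:  {mean_first:.2f}")
print(f"mean of second measurement in selected group: {mean_second:.2f}")
```

    The selected group's second measurement sits closer to the overall mean of zero even though nothing was done to the subjects between measurements.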

  9. Gene-Based Association Analysis for Censored Traits Via Fixed Effect Functional Regressions.

    PubMed

    Fan, Ruzong; Wang, Yifan; Yan, Qi; Ding, Ying; Weeks, Daniel E; Lu, Zhaohui; Ren, Haobo; Cook, Richard J; Xiong, Momiao; Swaroop, Anand; Chew, Emily Y; Chen, Wei

    2016-02-01

    Genetic studies of survival outcomes have been proposed and conducted recently, but statistical methods for identifying genetic variants that affect disease progression are rarely developed. Motivated by our ongoing real studies, here we develop Cox proportional hazard models using functional regression (FR) to perform gene-based association analysis of survival traits while adjusting for covariates. The proposed Cox models are fixed effect models where the genetic effects of multiple genetic variants are assumed to be fixed. We introduce likelihood ratio test (LRT) statistics to test for associations between the survival traits and multiple genetic variants in a genetic region. Extensive simulation studies demonstrate that the proposed Cox FR LRT statistics have well-controlled type I error rates. To evaluate power, we compare the Cox FR LRT with the previously developed burden test (BT) in a Cox model and sequence kernel association test (SKAT), which is based on mixed effect Cox models. The Cox FR LRT statistics have higher power than or similar power as Cox SKAT LRT except when 50%/50% causal variants had negative/positive effects and all causal variants are rare. In addition, the Cox FR LRT statistics have higher power than Cox BT LRT. The models and related test statistics can be useful in the whole genome and whole exome association studies. An age-related macular degeneration dataset was analyzed as an example. © 2016 WILEY PERIODICALS, INC.

  10. Gene-based Association Analysis for Censored Traits Via Fixed Effect Functional Regressions

    PubMed Central

    Fan, Ruzong; Wang, Yifan; Yan, Qi; Ding, Ying; Weeks, Daniel E.; Lu, Zhaohui; Ren, Haobo; Cook, Richard J; Xiong, Momiao; Swaroop, Anand; Chew, Emily Y.; Chen, Wei

    2015-01-01

    Summary: Genetic studies of survival outcomes have been proposed and conducted recently, but statistical methods for identifying genetic variants that affect disease progression are rarely developed. Motivated by our ongoing real studies, we develop here Cox proportional hazard models using functional regression (FR) to perform gene-based association analysis of survival traits while adjusting for covariates. The proposed Cox models are fixed effect models where the genetic effects of multiple genetic variants are assumed to be fixed. We introduce likelihood ratio test (LRT) statistics to test for associations between the survival traits and multiple genetic variants in a genetic region. Extensive simulation studies demonstrate that the proposed Cox FR LRT statistics have well-controlled type I error rates. To evaluate power, we compare the Cox FR LRT with the previously developed burden test (BT) in a Cox model and sequence kernel association test (SKAT), which is based on mixed effect Cox models. The Cox FR LRT statistics have higher power than or similar power as Cox SKAT LRT except when 50%/50% causal variants had negative/positive effects and all causal variants are rare. In addition, the Cox FR LRT statistics have higher power than Cox BT LRT. The models and related test statistics can be useful in the whole genome and whole exome association studies. An age-related macular degeneration dataset was analyzed as an example. PMID:26782979

  11. Multicollinearity is a red herring in the search for moderator variables: A guide to interpreting moderated multiple regression models and a critique of Iacobucci, Schneider, Popovich, and Bakamitsos (2016).

    PubMed

    McClelland, Gary H; Irwin, Julie R; Disatnik, David; Sivan, Liron

    2017-02-01

    Multicollinearity is irrelevant to the search for moderator variables, contrary to the implications of Iacobucci, Schneider, Popovich, and Bakamitsos (Behavior Research Methods, 2016, this issue). Multicollinearity is like the red herring in a mystery novel that distracts the statistical detective from the pursuit of a true moderator relationship. We show multicollinearity is completely irrelevant for tests of moderator variables. Furthermore, readers of Iacobucci et al. might be confused by a number of their errors. We note those errors, but more positively, we describe a variety of methods researchers might use to test and interpret their moderated multiple regression models, including two-stage testing, mean-centering, spotlighting, orthogonalizing, and floodlighting without regard to putative issues of multicollinearity. We cite a number of recent studies in the psychological literature in which the researchers used these methods appropriately to test, to interpret, and to report their moderated multiple regression models. We conclude with a set of recommendations for the analysis and reporting of moderated multiple regression that should help researchers better understand their models and facilitate generalizations across studies.
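
    The claim that mean-centering does not affect the test of a moderator can be checked numerically. The following is a minimal sketch on simulated data; the sample size, coefficients, and predictor means are assumptions, not values from the paper. Centering sharply reduces the correlation between a predictor and the product term, yet the fitted interaction coefficient is unchanged.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 500
x = rng.normal(5.0, 1.0, n)   # predictor with a non-zero mean
z = rng.normal(3.0, 1.0, n)   # moderator with a non-zero mean
y = 1.0 + 0.5 * x + 0.3 * z + 0.2 * x * z + rng.normal(0.0, 1.0, n)

def fit(x, z, y):
    """Ordinary least squares for y ~ 1 + x + z + x*z; returns the coefficients."""
    X = np.column_stack([np.ones_like(x), x, z, x * z])
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    return coef

xc, zc = x - x.mean(), z - z.mean()
raw = fit(x, z, y)
centered = fit(xc, zc, y)

# Correlation between a predictor and the product term, before and after centering.
r_raw = np.corrcoef(x, x * z)[0, 1]
r_centered = np.corrcoef(xc, xc * zc)[0, 1]

print(f"interaction coefficient, raw:      {raw[3]:.6f}")
print(f"interaction coefficient, centered: {centered[3]:.6f}")
print(f"corr(x, x*z) raw: {r_raw:.2f}, centered: {r_centered:.2f}")
```

    The lower-order coefficients do change under centering, because they then describe effects at the mean of the other variable rather than at zero.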

  12. Waste generated in high-rise buildings construction: a quantification model based on statistical multiple regression.

    PubMed

    Parisi Kern, Andrea; Ferreira Dias, Michele; Piva Kulakowski, Marlova; Paulo Gomes, Luciana

    2015-05-01

    Reducing construction waste is becoming a key environmental issue in the construction industry. The quantification of waste generation rates in the construction sector is an invaluable management tool in supporting mitigation actions. However, the quantification of waste can be a difficult process because of the specific characteristics and the wide range of materials used in different construction projects. Large variations are observed in the methods used to predict the amount of waste generated because of the range of variables involved in construction processes and the different contexts in which these methods are employed. This paper proposes a statistical model to determine the amount of waste generated in the construction of high-rise buildings by assessing the influence of design process and production system, often mentioned as the major culprits behind the generation of waste in construction. Multiple regression was used to conduct a case study based on multiple sources of data of eighteen residential buildings. The resulting statistical model produced dependent (i.e. amount of waste generated) and independent variables associated with the design and the production system used. The best regression model obtained from the sample data resulted in an adjusted R² value of 0.694, which means that it explains approximately 69% of the variance in the amount of waste generated in similar constructions. Most independent variables showed a low determination coefficient when assessed in isolation, which emphasizes the importance of assessing their joint influence on the response (dependent) variable. Copyright © 2015 Elsevier Ltd. All rights reserved.
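
    Adjusted R² discounts the ordinary R² for the number of predictors relative to the sample size, which matters with a sample as small as eighteen buildings. The sketch below uses the sample size from the abstract; the unadjusted R² of 0.80 and the count of five predictors are hypothetical, since the abstract does not report them.

```python
def adjusted_r2(r2, n, p):
    """Adjusted R-squared for a model with p predictors fitted to n observations."""
    if n - p - 1 <= 0:
        raise ValueError("need more observations than predictors plus one")
    return 1.0 - (1.0 - r2) * (n - 1) / (n - p - 1)

# n = 18 buildings (from the abstract); r2 and p are illustrative assumptions.
example = adjusted_r2(r2=0.80, n=18, p=5)
print(f"adjusted R^2 = {example:.3f}")
```

    With few observations, each added predictor widens the gap between R² and its adjusted value.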

  13. Regression Models and Fuzzy Logic Prediction of TBM Penetration Rate

    NASA Astrophysics Data System (ADS)

    Minh, Vu Trieu; Katushin, Dmitri; Antonov, Maksim; Veinthal, Renno

    2017-03-01

    This paper presents statistical analyses of rock engineering properties and the measured penetration rate of tunnel boring machine (TBM) based on the data of an actual project. The aim of this study is to analyze the influence of rock engineering properties including uniaxial compressive strength (UCS), Brazilian tensile strength (BTS), rock brittleness index (BI), the distance between planes of weakness (DPW), and the alpha angle (Alpha) between the tunnel axis and the planes of weakness on the TBM rate of penetration (ROP). Four (4) statistical regression models (two linear and two nonlinear) are built to predict the ROP of TBM. Finally a fuzzy logic model is developed as an alternative method and compared to the four statistical regression models. Results show that the fuzzy logic model provides better estimations and can be applied to predict the TBM performance. The fuzzy logic model attains the highest R-squared value (R2), 0.714, ahead of the runner-up value of 0.667 from the multiple variables nonlinear regression model.

  14. The multiple imputation method: a case study involving secondary data analysis.

    PubMed

    Walani, Salimah R; Cleland, Charles M

    2015-05-01

    To illustrate with the example of a secondary data analysis study the use of the multiple imputation method to replace missing data. Most large public datasets have missing data, which need to be handled by researchers conducting secondary data analysis studies. Multiple imputation is a technique widely used to replace missing values while preserving the sample size and sampling variability of the data. The 2004 National Sample Survey of Registered Nurses. The authors created a model to impute missing values using the chained equation method. They used imputation diagnostics procedures and conducted regression analysis of imputed data to determine the differences between the log hourly wages of internationally educated and US-educated registered nurses. The authors used multiple imputation procedures to replace missing values in a large dataset with 29,059 observations. Five multiple imputed datasets were created. Imputation diagnostics using time series and density plots showed that imputation was successful. The authors also present an example of the use of multiple imputed datasets to conduct regression analysis to answer a substantive research question. Multiple imputation is a powerful technique for imputing missing values in large datasets while preserving the sample size and variance of the data. Even though the chained equation method involves complex statistical computations, recent innovations in software and computation have made it possible for researchers to conduct this technique on large datasets. The authors recommend nurse researchers use multiple imputation methods for handling missing data to improve the statistical power and external validity of their studies.
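
    After a regression is fitted to each imputed dataset, the per-dataset results are commonly pooled with Rubin's rules. The abstract does not state which pooling rule the authors used, so the sketch below is an assumption about that step, and the five coefficient estimates and variances are made-up numbers.

```python
from statistics import mean, variance

def pool_rubin(estimates, variances):
    """Pool one coefficient across m imputed datasets using Rubin's rules.

    Returns the pooled estimate and its total variance, which combines the
    average within-imputation variance with the between-imputation variance.
    """
    m = len(estimates)
    pooled = mean(estimates)
    within = mean(variances)
    between = variance(estimates)          # sample variance, divisor m - 1
    total = within + (1 + 1 / m) * between
    return pooled, total

# Hypothetical results from five imputed datasets (not from the study).
estimates = [0.10, 0.12, 0.11, 0.09, 0.13]
variances = [0.0004] * 5
pooled, total = pool_rubin(estimates, variances)
print(f"pooled estimate = {pooled:.3f}, pooled SE = {total ** 0.5:.4f}")
```

    The between-imputation term is what carries the extra uncertainty due to the missing values into the final standard error.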

  15. Statistical Evaluation of Time Series Analysis Techniques

    NASA Technical Reports Server (NTRS)

    Benignus, V. A.

    1973-01-01

    The performance of a modified version of NASA's multivariate spectrum analysis program is discussed. A multiple regression model was used to make the revisions. Performance improvements were documented and compared to the standard fast Fourier transform by Monte Carlo techniques.

  16. Using Logistic Regression To Predict the Probability of Debris Flows Occurring in Areas Recently Burned By Wildland Fires

    USGS Publications Warehouse

    Rupert, Michael G.; Cannon, Susan H.; Gartner, Joseph E.

    2003-01-01

    Logistic regression was used to predict the probability of debris flows occurring in areas recently burned by wildland fires. Multiple logistic regression is conceptually similar to multiple linear regression because statistical relations between one dependent variable and several independent variables are evaluated. In logistic regression, however, the dependent variable is transformed to a binary variable (debris flow did or did not occur), and the actual probability of the debris flow occurring is statistically modeled. Data from 399 basins located within 15 wildland fires that burned during 2000-2002 in Colorado, Idaho, Montana, and New Mexico were evaluated. More than 35 independent variables describing the burn severity, geology, land surface gradient, rainfall, and soil properties were evaluated. The models were developed as follows: (1) Basins that did and did not produce debris flows were delineated from National Elevation Data using a Geographic Information System (GIS). (2) Data describing the burn severity, geology, land surface gradient, rainfall, and soil properties were determined for each basin. These data were then downloaded to a statistics software package for analysis using logistic regression. (3) Relations between the occurrence/non-occurrence of debris flows and burn severity, geology, land surface gradient, rainfall, and soil properties were evaluated and several preliminary multivariate logistic regression models were constructed. All possible combinations of independent variables were evaluated to determine which combination produced the most effective model. The multivariate model that best predicted the occurrence of debris flows was selected. (4) The multivariate logistic regression model was entered into a GIS, and a map showing the probability of debris flows was constructed. 
    The most effective model incorporates the percentage of each basin with slope greater than 30 percent, percentage of land burned at medium and high burn severity in each basin, particle size sorting, average storm intensity (millimeters per hour), soil organic matter content, soil permeability, and soil drainage. The results of this study demonstrate that logistic regression is a valuable tool for predicting the probability of debris flows occurring in recently-burned landscapes.
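
    The way a fitted multiple logistic regression turns basin characteristics into a probability can be sketched as follows. The intercept, coefficients, and basin values are hypothetical placeholders; the abstract does not report the fitted model.

```python
import math

def debris_flow_probability(intercept, coefficients, values):
    """Logistic model: probability = 1 / (1 + exp(-(b0 + sum(b_i * x_i))))."""
    linear = intercept + sum(b * x for b, x in zip(coefficients, values))
    return 1.0 / (1.0 + math.exp(-linear))

# Hypothetical coefficients for two predictors (illustration only):
# percent of basin with slope > 30 percent, average storm intensity in mm/h.
intercept = -4.0
coefficients = [0.03, 0.20]

gentle_basin = debris_flow_probability(intercept, coefficients, [20.0, 5.0])
steep_basin = debris_flow_probability(intercept, coefficients, [80.0, 5.0])
print(f"gentle basin: {gentle_basin:.2f}, steep basin: {steep_basin:.2f}")
```

    Evaluating such a function for every basin is, in outline, what produces a probability map once the model is entered into a GIS.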

  17. Criteria for the use of regression analysis for remote sensing of sediment and pollutants

    NASA Technical Reports Server (NTRS)

    Whitlock, C. H.; Kuo, C. Y.; Lecroy, S. R.

    1982-01-01

    An examination of limitations, requirements, and precision of the linear multiple-regression technique for quantification of marine environmental parameters is conducted. Both environmental and optical physics conditions have been defined for which an exact solution to the signal response equations is of the same form as the multiple regression equation. Various statistical parameters are examined to define criteria for selection of an unbiased fit when upwelled radiance values contain error and are correlated with each other. Field experimental data are examined to define data smoothing requirements in order to satisfy the criteria of Daniel and Wood (1971). Recommendations are made concerning improved selection of ground-truth locations to maximize variance and to minimize physical errors associated with the remote sensing experiment.

  18. Genetic Programming Transforms in Linear Regression Situations

    NASA Astrophysics Data System (ADS)

    Castillo, Flor; Kordon, Arthur; Villa, Carlos

    The chapter summarizes the use of Genetic Programming (GP) in Multiple Linear Regression (MLR) to address multicollinearity and Lack of Fit (LOF). The basis of the proposed method is applying appropriate input transforms (model respecification) that deal with these issues while preserving the information content of the original variables. The transforms are selected from symbolic regression models with optimal trade-off between accuracy of prediction and expressional complexity, generated by multiobjective Pareto-front GP. The chapter includes a comparative study of the GP-generated transforms with Ridge Regression, a variant of ordinary Multiple Linear Regression, which has been a useful and commonly employed approach for reducing multicollinearity. The advantages of GP-generated model respecification are clearly defined and demonstrated. Some recommendations for transforms selection are given as well. The application benefits of the proposed approach are illustrated with a real industrial application in one of the broadest empirical modeling areas in manufacturing - robust inferential sensors. The chapter contributes to increasing the awareness of the potential of GP in statistical model building by MLR.

  19. Valid Statistical Analysis for Logistic Regression with Multiple Sources

    NASA Astrophysics Data System (ADS)

    Fienberg, Stephen E.; Nardi, Yuval; Slavković, Aleksandra B.

    Considerable effort has gone into understanding issues of privacy protection of individual information in single databases, and various solutions have been proposed depending on the nature of the data, the ways in which the database will be used and the precise nature of the privacy protection being offered. Once data are merged across sources, however, the nature of the problem becomes far more complex and a number of privacy issues arise for the linked individual files that go well beyond those that are considered with regard to the data within individual sources. In the paper, we propose an approach that gives full statistical analysis on the combined database without actually combining it. We focus mainly on logistic regression, but the method and tools described may be applied essentially to other statistical models as well.

  20. Forecasting defoliation by the gypsy moth in oak stands

    Treesearch

    Robert W. Campbell; Joseph P. Standaert

    1974-01-01

    A multiple-regression model is presented that reflects statistically significant correlations between defoliation by the gypsy moth, the dependent variable, and a series of biotic and physical independent variables. Both possible uses and shortcomings of this model are discussed.

  1. Black Male Labor Force Participation.

    ERIC Educational Resources Information Center

    Baer, Roger K.

    This study attempts to test (via multiple regression analysis) hypothesized relationships between designated independent variables and age specific incidences of labor force participation for black male subpopulations in 54 Standard Metropolitan Statistical Areas. Leading independent variables tested include net migration, earnings, unemployment,…

  2. Association analysis of multiple traits by an approach of combining P values.

    PubMed

    Chen, Lili; Wang, Yong; Zhou, Yajing

    2018-03-01

    Increasing evidence shows that one variant can affect multiple traits, which is a widespread phenomenon in complex diseases. Joint analysis of multiple traits can increase statistical power of association analysis and uncover the underlying genetic mechanism. Although there are many statistical methods to analyse multiple traits, most of these methods are usually suitable for detecting common variants associated with multiple traits. However, because of the low minor allele frequency of rare variants, these methods are not optimal for rare variant association analysis. In this paper, we extend an adaptive combination of P values method (termed ADA) for single trait to test association between multiple traits and rare variants in the given region. For a given region, we use a reverse regression model to test each rare variant associated with multiple traits and obtain the P value of single-variant test. Further, we take the weighted combination of these P values as the test statistic. Extensive simulation studies show that our approach is more powerful than several other comparison methods in most cases and is robust to the inclusion of a high proportion of neutral variants and the different directions of effects of causal variants.
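
    The general idea of combining single-variant P values into one region-level statistic can be shown with Fisher's method. This is a simpler, unweighted technique swapped in for illustration; it is not the adaptive weighted ADA procedure of the paper, and it assumes the P values are independent.

```python
import math

def fisher_combined_p(p_values):
    """Fisher's method: X = -2 * sum(ln p) follows a chi-square distribution
    with 2k degrees of freedom under the null, for k independent P values."""
    k = len(p_values)
    half_stat = -sum(math.log(p) for p in p_values)   # equals X / 2
    # Closed-form chi-square survival function for even degrees of freedom:
    # P(X >= x) = exp(-x/2) * sum_{i=0}^{k-1} (x/2)^i / i!
    tail = sum(half_stat ** i / math.factorial(i) for i in range(k))
    return math.exp(-half_stat) * tail

# Hypothetical single-variant P values for one region (illustration only).
region_p = fisher_combined_p([0.05, 0.05])
print(f"combined P value = {region_p:.4f}")
```

    Two individually borderline P values combine into stronger joint evidence, which is the motivation for region-level tests of rare variants.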

  3. Methods for identifying SNP interactions: a review on variations of Logic Regression, Random Forest and Bayesian logistic regression.

    PubMed

    Chen, Carla Chia-Ming; Schwender, Holger; Keith, Jonathan; Nunkesser, Robin; Mengersen, Kerrie; Macrossan, Paula

    2011-01-01

    Due to advancements in computational ability, enhanced technology and a reduction in the price of genotyping, more data are being generated for understanding genetic associations with diseases and disorders. However, with the availability of large data sets come the inherent challenges of new methods of statistical analysis and modeling. Considering a complex phenotype may be the effect of a combination of multiple loci, various statistical methods have been developed for identifying genetic epistasis effects. Among these methods, logic regression (LR) is an intriguing approach incorporating tree-like structures. Various methods have built on the original LR to improve different aspects of the model. In this study, we review four variations of LR, namely Logic Feature Selection, Monte Carlo Logic Regression, Genetic Programming for Association Studies, and Modified Logic Regression-Gene Expression Programming, and investigate the performance of each method using simulated and real genotype data. We contrast these with another tree-like approach, namely Random Forests, and a Bayesian logistic regression with stochastic search variable selection.

  4. Additive hazards regression and partial likelihood estimation for ecological monitoring data across space.

    PubMed

    Lin, Feng-Chang; Zhu, Jun

    2012-01-01

    We develop continuous-time models for the analysis of environmental or ecological monitoring data such that subjects are observed at multiple monitoring time points across space. Of particular interest are additive hazards regression models where the baseline hazard function can take on flexible forms. We consider time-varying covariates and take into account spatial dependence via autoregression in space and time. We develop statistical inference for the regression coefficients via partial likelihood. Asymptotic properties, including consistency and asymptotic normality, are established for parameter estimates under suitable regularity conditions. Feasible algorithms utilizing existing statistical software packages are developed for computation. We also consider a simpler additive hazards model with homogeneous baseline hazard and develop hypothesis testing for homogeneity. A simulation study demonstrates that the statistical inference using partial likelihood has sound finite-sample properties and offers a viable alternative to maximum likelihood estimation. For illustration, we analyze data from an ecological study that monitors bark beetle colonization of red pines in a plantation of Wisconsin.

  5. BAYESIAN LARGE-SCALE MULTIPLE REGRESSION WITH SUMMARY STATISTICS FROM GENOME-WIDE ASSOCIATION STUDIES

    PubMed Central

    Zhu, Xiang; Stephens, Matthew

    2017-01-01

    Bayesian methods for large-scale multiple regression provide attractive approaches to the analysis of genome-wide association studies (GWAS). For example, they can estimate heritability of complex traits, allowing for both polygenic and sparse models; and by incorporating external genomic data into the priors, they can increase power and yield new biological insights. However, these methods require access to individual genotypes and phenotypes, which are often not easily available. Here we provide a framework for performing these analyses without individual-level data. Specifically, we introduce a “Regression with Summary Statistics” (RSS) likelihood, which relates the multiple regression coefficients to univariate regression results that are often easily available. The RSS likelihood requires estimates of correlations among covariates (SNPs), which also can be obtained from public databases. We perform Bayesian multiple regression analysis by combining the RSS likelihood with previously proposed prior distributions, sampling posteriors by Markov chain Monte Carlo. In a wide range of simulations RSS performs similarly to analyses using the individual data, both for estimating heritability and detecting associations. We apply RSS to a GWAS of human height that contains 253,288 individuals typed at 1.06 million SNPs, for which analyses of individual-level data are practically impossible. Estimates of heritability (52%) are consistent with, but more precise than, previous results using subsets of these data. We also identify many previously unreported loci that show evidence for association with height in our analyses. Software is available at https://github.com/stephenslab/rss. PMID:29399241

  6. Efficacy of Social Media Adoption on Client Growth for Independent Management Consultants

    DTIC Science & Technology

    2017-02-01

    design, a linear multiple regression with three predictor variables and one dependent variable per testing were used. Under those circumstances...regression test was used to compare the social media adoption of two groups on a single measure to determine if there was a statistical difference...number and types of social media platforms used and their influence on client growth was examined in this research design that used a descriptive

  7. Partial Least Squares Regression Can Aid in Detecting Differential Abundance of Multiple Features in Sets of Metagenomic Samples

    PubMed Central

    Libiger, Ondrej; Schork, Nicholas J.

    2015-01-01

    It is now feasible to examine the composition and diversity of microbial communities (i.e., “microbiomes”) that populate different human organs and orifices using DNA sequencing and related technologies. To explore the potential links between changes in microbial communities and various diseases in the human body, it is essential to test associations involving different species within and across microbiomes, environmental settings and disease states. Although a number of statistical techniques exist for carrying out relevant analyses, it is unclear which of these techniques exhibit the greatest statistical power to detect associations given the complexity of most microbiome datasets. We compared the statistical power of principal component regression, partial least squares regression, regularized regression, distance-based regression, Hill's diversity measures, and a modified test implemented in the popular and widely used microbiome analysis methodology “Metastats” across a wide range of simulated scenarios involving changes in feature abundance between two sets of metagenomic samples. For this purpose, simulation studies were used to change the abundance of microbial species in a real dataset from a published study examining human hands. Each technique was applied to the same data, and its ability to detect the simulated change in abundance was assessed. We hypothesized that a small subset of methods would outperform the rest in terms of the statistical power. Indeed, we found that the Metastats technique modified to accommodate multivariate analysis and partial least squares regression yielded high power under the models and data sets we studied. The statistical power of diversity measure-based tests, distance-based regression and regularized regression was significantly lower. 
    Our results provide insight into powerful analysis strategies that utilize information on species counts from large microbiome data sets exhibiting skewed frequency distributions obtained on a small to moderate number of samples. PMID:26734061

  8. DEVELOPMENT OF THE VIRTUAL BEACH MODEL, PHASE 1: AN EMPIRICAL MODEL

    EPA Science Inventory

    With increasing attention focused on the use of multiple linear regression (MLR) modeling of beach fecal bacteria concentration, the validity of the entire statistical process should be carefully evaluated to assure satisfactory predictions. This work aims to identify pitfalls an...

  9. Mathematics Readiness of First-Year University Students

    ERIC Educational Resources Information Center

    Atuahene, Francis; Russell, Tammy A.

    2016-01-01

    The majority of high school students, particularly underrepresented minorities (URMs) from low socioeconomic backgrounds are graduating from high school less prepared academically for advanced-level college mathematics. Using 2009 and 2010 course enrollment data, several statistical analyses (multiple linear regression, Cochran Mantel Haenszel…

  10. Introduction to the use of regression models in epidemiology.

    PubMed

    Bender, Ralf

    2009-01-01

    Regression modeling is one of the most important statistical techniques used in analytical epidemiology. By means of regression models the effect of one or several explanatory variables (e.g., exposures, subject characteristics, risk factors) on a response variable such as mortality or cancer can be investigated. From multiple regression models, adjusted effect estimates can be obtained that take the effect of potential confounders into account. Regression methods can be applied in all epidemiologic study designs so that they represent a universal tool for data analysis in epidemiology. Different kinds of regression models have been developed in dependence on the measurement scale of the response variable and the study design. The most important methods are linear regression for continuous outcomes, logistic regression for binary outcomes, Cox regression for time-to-event data, and Poisson regression for frequencies and rates. This chapter provides a nontechnical introduction to these regression models with illustrating examples from cancer research.

  11. Test anxiety and academic performance in chiropractic students.

    PubMed

    Zhang, Niu; Henderson, Charles N R

    2014-01-01

    Objective: We assessed the level of students' test anxiety, and the relationship between test anxiety and academic performance. Methods: We recruited 166 third-quarter students. The Test Anxiety Inventory (TAI) was administered to all participants. Total scores from written examinations and objective structured clinical examinations (OSCEs) were used as response variables. Results: Multiple regression analysis shows that there was a modest, but statistically significant negative correlation between TAI scores and written exam scores, but not OSCE scores. Worry and emotionality were the best predictive models for written exam scores. Mean total anxiety and emotionality scores for females were significantly higher than those for males, but not worry scores. Conclusion: Moderate-to-high test anxiety was observed in 85% of the chiropractic students examined. However, total test anxiety, as measured by the TAI score, was a very weak predictive model for written exam performance. Multiple regression analysis demonstrated that replacing total anxiety (TAI) with worry and emotionality (TAI subscales) produces a much more effective predictive model of written exam performance. Sex, age, highest current academic degree, and ethnicity contributed little additional predictive power in either regression model. Moreover, TAI scores were not found to be statistically significant predictors of physical exam skill performance, as measured by OSCEs.

  12. Fusing Data Mining, Machine Learning and Traditional Statistics to Detect Biomarkers Associated with Depression

    PubMed Central

    Dipnall, Joanna F.

    2016-01-01

    Background Atheoretical large-scale data mining techniques using machine learning algorithms have promise in the analysis of large epidemiological datasets. This study illustrates the use of a hybrid methodology for variable selection that took account of missing data and complex survey design to identify key biomarkers associated with depression from a large epidemiological study. Methods The study used a three-step methodology amalgamating multiple imputation, a machine learning boosted regression algorithm and logistic regression, to identify key biomarkers associated with depression in the National Health and Nutrition Examination Study (2009–2010). Depression was measured using the Patient Health Questionnaire-9 and 67 biomarkers were analysed. Covariates in this study included gender, age, race, smoking, food security, Poverty Income Ratio, Body Mass Index, physical activity, alcohol use, medical conditions and medications. The final imputed weighted multiple logistic regression model included possible confounders and moderators. Results After the creation of 20 imputation data sets from multiple chained regression sequences, machine learning boosted regression initially identified 21 biomarkers associated with depression. Using traditional logistic regression methods, including controlling for possible confounders and moderators, a final set of three biomarkers were selected. The final three biomarkers from the novel hybrid variable selection methodology were red cell distribution width (OR 1.15; 95% CI 1.01, 1.30), serum glucose (OR 1.01; 95% CI 1.00, 1.01) and total bilirubin (OR 0.12; 95% CI 0.05, 0.28). Significant interactions were found between total bilirubin with Mexican American/Hispanic group (p = 0.016), and current smokers (p<0.001). 
Conclusion The systematic use of a hybrid methodology for variable selection, fusing data mining techniques using a machine learning algorithm with traditional statistical modelling, accounted for missing data and complex survey sampling methodology and was demonstrated to be a useful tool for detecting three biomarkers associated with depression for future hypothesis generation: red cell distribution width, serum glucose and total bilirubin. PMID:26848571

  13. Fusing Data Mining, Machine Learning and Traditional Statistics to Detect Biomarkers Associated with Depression.

    PubMed

    Dipnall, Joanna F; Pasco, Julie A; Berk, Michael; Williams, Lana J; Dodd, Seetal; Jacka, Felice N; Meyer, Denny

    2016-01-01

    Atheoretical large-scale data mining techniques using machine learning algorithms have promise in the analysis of large epidemiological datasets. This study illustrates the use of a hybrid methodology for variable selection that took account of missing data and complex survey design to identify key biomarkers associated with depression from a large epidemiological study. The study used a three-step methodology amalgamating multiple imputation, a machine learning boosted regression algorithm and logistic regression, to identify key biomarkers associated with depression in the National Health and Nutrition Examination Study (2009-2010). Depression was measured using the Patient Health Questionnaire-9 and 67 biomarkers were analysed. Covariates in this study included gender, age, race, smoking, food security, Poverty Income Ratio, Body Mass Index, physical activity, alcohol use, medical conditions and medications. The final imputed weighted multiple logistic regression model included possible confounders and moderators. After the creation of 20 imputation data sets from multiple chained regression sequences, machine learning boosted regression initially identified 21 biomarkers associated with depression. Using traditional logistic regression methods, including controlling for possible confounders and moderators, a final set of three biomarkers were selected. The final three biomarkers from the novel hybrid variable selection methodology were red cell distribution width (OR 1.15; 95% CI 1.01, 1.30), serum glucose (OR 1.01; 95% CI 1.00, 1.01) and total bilirubin (OR 0.12; 95% CI 0.05, 0.28). Significant interactions were found between total bilirubin with Mexican American/Hispanic group (p = 0.016), and current smokers (p<0.001). 
The systematic use of a hybrid methodology for variable selection, fusing data mining techniques using a machine learning algorithm with traditional statistical modelling, accounted for missing data and complex survey sampling methodology and was demonstrated to be a useful tool for detecting three biomarkers associated with depression for future hypothesis generation: red cell distribution width, serum glucose and total bilirubin.

  14. Quantitative assessment of cervical vertebral maturation using cone beam computed tomography in Korean girls.

    PubMed

    Byun, Bo-Ram; Kim, Yong-Il; Yamaguchi, Tetsutaro; Maki, Koutaro; Son, Woo-Sung

    2015-01-01

    This study aimed to examine the correlation between skeletal maturation status and parameters from the odontoid process/body of the second vertebra and the bodies of third and fourth cervical vertebrae and simultaneously build multiple regression models to be able to estimate skeletal maturation status in Korean girls. Hand-wrist radiographs and cone beam computed tomography (CBCT) images were obtained from 74 Korean girls (6-18 years of age). CBCT-generated cervical vertebral maturation (CVM) was used to demarcate the odontoid process and the body of the second cervical vertebra, based on the dentocentral synchondrosis. Correlation coefficient analysis and multiple linear regression analysis were used for each parameter of the cervical vertebrae (P < 0.05). Forty-seven of 64 parameters from CBCT-generated CVM (independent variables) exhibited statistically significant correlations (P < 0.05). The multiple regression model with the greatest R² had six parameters (PH2/W2, UW2/W2, (OH+AH2)/LW2, UW3/LW3, D3, and H4/W4) as independent variables with a variance inflation factor (VIF) of <2. CBCT-generated CVM was able to include parameters from the second cervical vertebral body and odontoid process, respectively, for the multiple regression models. This suggests that quantitative analysis might be used to estimate skeletal maturation status.
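
    The abstract above reports a variance inflation factor (VIF) below 2 for the selected predictors. As a minimal sketch of what that number measures: VIF_j = 1 / (1 - R_j²), where R_j² comes from regressing predictor j on the remaining predictors. With only two predictors, R_j² reduces to their squared Pearson correlation, which is the simplified case shown here. The variable names and measurements are invented for illustration and are not taken from the study.

```python
import math


def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def vif_two_predictors(x1, x2):
    """VIF shared by both predictors in a two-predictor model: 1 / (1 - r^2)."""
    r = pearson_r(x1, x2)
    return 1.0 / (1.0 - r ** 2)


# Hypothetical vertebral ratios (illustrative values only).
ratio_a = [0.82, 0.91, 0.77, 1.05, 0.98, 0.88]
ratio_b = [1.10, 1.02, 1.21, 1.15, 0.95, 1.08]

vif = vif_two_predictors(ratio_a, ratio_b)
print(f"VIF = {vif:.3f}")
```

    A VIF of 1 means the predictors are uncorrelated; larger values mean the variance of a coefficient estimate is inflated by collinearity. For more than two predictors, R_j² has to come from a full auxiliary regression rather than a single correlation.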

  15. Order Selection for General Expression of Nonlinear Autoregressive Model Based on Multivariate Stepwise Regression

    NASA Astrophysics Data System (ADS)

    Shi, Jinfei; Zhu, Songqing; Chen, Ruwen

    2017-12-01

    An order selection method based on multiple stepwise regression is proposed for the General Expression of Nonlinear Autoregressive (GNAR) model, which converts the model order problem into variable selection for a multiple linear regression equation. The partial autocorrelation function is adopted to define the linear terms in the GNAR model. The result is set as the initial model, and then the nonlinear terms are introduced gradually. Statistics are chosen to assess how both the newly introduced and the existing variables improve the model characteristics, and these statistics determine which model variables to retain or eliminate. The optimal model is then obtained through a goodness-of-fit measure or a significance test. Simulation and classic time-series data experiments show that the proposed method is simple and reliable and can be applied to practical engineering.

  16. Statistical tools for transgene copy number estimation based on real-time PCR.

    PubMed

    Yuan, Joshua S; Burris, Jason; Stewart, Nathan R; Mentewab, Ayalew; Stewart, C Neal

    2007-11-01

    As compared with traditional transgene copy number detection technologies such as Southern blot analysis, real-time PCR provides a fast, inexpensive and high-throughput alternative. However, the real-time PCR based transgene copy number estimation tends to be ambiguous and subjective stemming from the lack of proper statistical analysis and data quality control to render a reliable estimation of copy number with a prediction value. Despite the recent progresses in statistical analysis of real-time PCR, few publications have integrated these advancements in real-time PCR based transgene copy number determination. Three experimental designs and four data quality control integrated statistical models are presented. For the first method, external calibration curves are established for the transgene based on serially-diluted templates. The Ct number from a control transgenic event and putative transgenic event are compared to derive the transgene copy number or zygosity estimation. Simple linear regression and two group T-test procedures were combined to model the data from this design. For the second experimental design, standard curves were generated for both an internal reference gene and the transgene, and the copy number of transgene was compared with that of internal reference gene. Multiple regression models and ANOVA models can be employed to analyze the data and perform quality control for this approach. In the third experimental design, transgene copy number is compared with reference gene without a standard curve, but rather, is based directly on fluorescence data. Two different multiple regression models were proposed to analyze the data based on two different approaches of amplification efficiency integration. Our results highlight the importance of proper statistical treatment and quality control integration in real-time PCR-based transgene copy number determination. 
These statistical methods allow the real-time PCR-based transgene copy number estimation to be more reliable and precise with a proper statistical estimation. Proper confidence intervals are necessary for unambiguous prediction of transgene copy number. The four different statistical methods are compared for their advantages and disadvantages. Moreover, the statistical methods can also be applied for other real-time PCR-based quantification assays including transfection efficiency analysis and pathogen quantification.
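
    The first experimental design above relies on an external calibration curve built from serially diluted templates and modeled with simple linear regression. The sketch below illustrates that idea only: it fits Ct against log10 template quantity by ordinary least squares and reads an unknown quantity back off the line. The efficiency formula E = 10^(-1/slope) - 1 is the commonly used relation for standard curves and is an assumption here, not something stated in the abstract. All Ct values and function names are invented.

```python
import math


def fit_line(x, y):
    """Ordinary least squares fit of y = intercept + slope * x."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((a - mean_x) ** 2 for a in x)
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


# Hypothetical 10-fold dilution series: template copies and measured Ct.
copies = [1e2, 1e3, 1e4, 1e5, 1e6]
ct = [31.9, 28.6, 25.2, 21.9, 18.5]

log_copies = [math.log10(c) for c in copies]
slope, intercept = fit_line(log_copies, ct)

# Amplification efficiency implied by the slope (1.0 would be perfect doubling).
efficiency = 10 ** (-1.0 / slope) - 1.0


def estimate_copies(ct_value):
    """Invert the calibration line to estimate template quantity from a Ct."""
    return 10 ** ((ct_value - intercept) / slope)


print(f"slope = {slope:.3f}, intercept = {intercept:.3f}")
print(f"efficiency = {efficiency:.3f}")
print(f"estimated copies at Ct 24.0 = {estimate_copies(24.0):.0f}")
```

    A point estimate from the inverted line says nothing about its uncertainty, which is the gap the abstract highlights when it calls for proper confidence intervals.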

  17. Feminist identity as a predictor of eating disorder diagnostic status.

    PubMed

    Green, Melinda A; Scott, Norman A; Riopel, Cori M; Skaggs, Anna K

    2008-06-01

    Passive Acceptance (PA) and Active Commitment (AC) subscales of the Feminist Identity Development Scale (FIDS) were examined as predictors of eating disorder diagnostic status as assessed by the Questionnaire for Eating Disorder Diagnoses (Q-EDD). Results of a hierarchical regression analysis revealed PA and AC scores were not statistically significant predictors of ED diagnostic status after controlling for diagnostic subtype. Results of a multiple regression analysis revealed FIDS as a statistically significant predictor of ED diagnostic status when failing to control for ED diagnostic subtype. Discrepancies suggest ED diagnostic subtype may serve as a moderator variable in the relationship between ED diagnostic status and FIDS. (c) 2008 Wiley Periodicals, Inc.

  18. BrightStat.com: free statistics online.

    PubMed

    Stricker, Daniel

    2008-10-01

    Powerful software for statistical analysis is expensive. Here I present BrightStat, a statistical software running on the Internet which is free of charge. BrightStat's goals, its main capabilities and functionalities are outlined. Three different sample runs, a Friedman test, a chi-square test, and a step-wise multiple regression are presented. The results obtained by BrightStat are compared with results computed by SPSS, one of the global leaders in providing statistical software, and VassarStats, a collection of scripts for data analysis running on the Internet. Elementary statistics is an inherent part of academic education and BrightStat is an alternative to commercial products.

  19. Trend Analysis Using Microcomputers.

    ERIC Educational Resources Information Center

    Berger, Carl F.

    A trend analysis statistical package and additional programs for the Apple microcomputer are presented. They illustrate strategies of data analysis suitable to the graphics and processing capabilities of the microcomputer. The programs analyze data sets using examples of: (1) analysis of variance with multiple linear regression; (2) exponential…

  20. The prediction of intelligence in preschool children using alternative models to regression.

    PubMed

    Finch, W Holmes; Chang, Mei; Davis, Andrew S; Holden, Jocelyn E; Rothlisberg, Barbara A; McIntosh, David E

    2011-12-01

    Statistical prediction of an outcome variable using multiple independent variables is a common practice in the social and behavioral sciences. For example, neuropsychologists are sometimes called upon to provide predictions of preinjury cognitive functioning for individuals who have suffered a traumatic brain injury. Typically, these predictions are made using standard multiple linear regression models with several demographic variables (e.g., gender, ethnicity, education level) as predictors. Prior research has shown conflicting evidence regarding the ability of such models to provide accurate predictions of outcome variables such as full-scale intelligence (FSIQ) test scores. The present study had two goals: (1) to demonstrate the utility of a set of alternative prediction methods that have been applied extensively in the natural sciences and business but have not been frequently explored in the social sciences and (2) to develop models that can be used to predict premorbid cognitive functioning in preschool children. Predictions of Stanford-Binet 5 FSIQ scores for preschool-aged children are used to compare the performance of a multiple regression model with several of these alternative methods. Results demonstrate that classification and regression trees provided more accurate predictions of FSIQ scores than did the more traditional regression approach. Implications of these results are discussed.

  1. Assessing risk factors for periodontitis using regression

    NASA Astrophysics Data System (ADS)

    Lobo Pereira, J. A.; Ferreira, Maria Cristina; Oliveira, Teresa

    2013-10-01

    Multivariate statistical analysis is indispensable to assess the associations and interactions between different factors and the risk of periodontitis. Among others, regression analysis is a statistical technique widely used in healthcare to investigate and model the relationship between variables. In our work we study the impact of socio-demographic, medical and behavioral factors on periodontal health. Using linear and logistic regression models, we can assess the relevance, as risk factors for periodontitis, of the following independent variables (IVs): Age, Gender, Diabetic Status, Education, Smoking status and Plaque Index. The multiple linear regression model was built to evaluate the influence of the IVs on mean Attachment Loss (AL). Thus, the regression coefficients are obtained along with the respective p-values from the significance tests. The classification of a case (individual) adopted in the logistic model was the extent of the destruction of periodontal tissues, defined by an Attachment Loss greater than or equal to 4 mm in 25% (AL≥4mm/≥25%) of sites surveyed. The association measures include the Odds Ratios together with the corresponding 95% confidence intervals.
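
    The abstract above reports odds ratios with 95% confidence intervals from a logistic model. As a minimal sketch of how such figures are usually derived from a fitted coefficient: the odds ratio is exp(beta) and a Wald interval is exp(beta ± z·SE). The coefficient and standard error below are invented and do not come from the study. The choice of a Wald interval is an assumption, since the abstract does not say which interval method was used.

```python
import math
from statistics import NormalDist


def odds_ratio_ci(beta, se, level=0.95):
    """Return (odds ratio, lower, upper) using a Wald interval on the log-odds scale."""
    z = NormalDist().inv_cdf(0.5 + level / 2.0)  # about 1.96 for a 95% interval
    return math.exp(beta), math.exp(beta - z * se), math.exp(beta + z * se)


# Hypothetical logistic regression coefficient for a binary risk factor.
beta_smoking = 0.47
se_smoking = 0.18

odds_ratio, lower, upper = odds_ratio_ci(beta_smoking, se_smoking)
print(f"OR = {odds_ratio:.2f} (95% CI {lower:.2f}, {upper:.2f})")
```

    An interval that excludes 1 corresponds to a coefficient that differs significantly from zero at the matching level.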

  2. Most Likely to Succeed: Exploring Predictor Variables for the Counselor Preparation Comprehensive Examination

    ERIC Educational Resources Information Center

    Hartwig, Elizabeth Kjellstrand; Van Overschelde, James P.

    2016-01-01

    The authors investigated predictor variables for the Counselor Preparation Comprehensive Examination (CPCE) to examine whether academic variables, demographic variables, and test version were associated with graduate counseling students' CPCE scores. Multiple regression analyses revealed all 3 variables were statistically significant predictors of…

  3. Influence of Family Structure on Health among Youths with Diabetes.

    ERIC Educational Resources Information Center

    Thompson, Sanna J.; Auslander, Wendy F.; White, Neil H.

    2001-01-01

    Discusses the extent to which family structure is significantly associated with health in youth with Type 1 diabetes. Multiple regression analyses demonstrated that family structure remains a significant predictor of youth's health when statistically controlling for race, child's age, family socioeconomic status, and adherence. (BF)

  4. Factor analysis and multiple regression between topography and precipitation on Jeju Island, Korea

    NASA Astrophysics Data System (ADS)

    Um, Myoung-Jin; Yun, Hyeseon; Jeong, Chang-Sam; Heo, Jun-Haeng

    2011-11-01

    Summary: In this study, new factors that influence precipitation were extracted from geographic variables using factor analysis, which allows for an accurate estimation of orographic precipitation. Correlation analysis was also used to examine the relationship between nine topographic variables from digital elevation models (DEMs) and the precipitation in Jeju Island. In addition, a spatial analysis was performed in order to verify the validity of the regression model. From the results of the correlation analysis, it was found that all of the topographic variables had a positive correlation with the precipitation. The relations between the variables also changed in accordance with a change in the precipitation duration. However, upon examining the correlation matrix, no significant relationship between the latitude and the aspect was found. According to the factor analysis, eight topographic variables (latitude being the exception) were found to have a direct influence on the precipitation. Three factors were then extracted from the eight topographic variables. By directly comparing the multiple regression model with the factors (model 1) to the multiple regression model with the topographic variables (model 3), it was found that model 1 did not violate the limits of statistical significance and multicollinearity. As such, model 1 was considered to be appropriate for estimating the precipitation when taking into account the topography. In the study of model 1, the multiple regression model using factor analysis was found to be the best method for estimating the orographic precipitation on Jeju Island.

  5. Survival Data and Regression Models

    NASA Astrophysics Data System (ADS)

    Grégoire, G.

    2014-12-01

    We start this chapter by introducing some basic elements for the analysis of censored survival data. Then we focus on right censored data and develop two types of regression models. The first one concerns the so-called accelerated failure time models (AFT), which are parametric models where a function of a parameter depends linearly on the covariables. The second one is a semiparametric model, where the covariables enter in a multiplicative form in the expression of the hazard rate function. The main statistical tool for analysing these regression models is the maximum likelihood methodology and, although we recall some essential results about the ML theory, we refer to the chapter "Logistic Regression" for a more detailed presentation.

  6. Two SPSS programs for interpreting multiple regression results.

    PubMed

    Lorenzo-Seva, Urbano; Ferrando, Pere J; Chico, Eliseo

    2010-02-01

    When multiple regression is used in explanation-oriented designs, it is very important to determine both the usefulness of the predictor variables and their relative importance. Standardized regression coefficients are routinely provided by commercial programs. However, they generally function rather poorly as indicators of relative importance, especially in the presence of substantially correlated predictors. We provide two user-friendly SPSS programs that implement currently recommended techniques and recent developments for assessing the relevance of the predictors. The programs also allow the user to take into account the effects of measurement error. The first program, MIMR-Corr.sps, uses a correlation matrix as input, whereas the second program, MIMR-Raw.sps, uses the raw data and computes bootstrap confidence intervals of different statistics. The SPSS syntax, a short manual, and data files related to this article are available as supplemental materials from http://brm.psychonomic-journals.org/content/supplemental.

  7. 78 FR 13508 - Interpreting Nondiscrimination Requirements of Executive Order 11246 With Respect to Systemic...

    Federal Register 2010, 2011, 2012, 2013, 2014

    2013-02-28

    ...-2.17(b)-(d). Nevertheless, Bureau of Labor Statistics data and numerous research studies indicate... Affairs 193 (2011). Ultimately, the research literature still finds an unexplained gap exists even after... multiple regression as potential evidence of discrimination.\\22\\ Similarly, published research on...

  8. Combining data visualization and statistical approaches for interpreting measurements and meta-data: Integrating heatmaps, variable clustering, and mixed regression models

    EPA Science Inventory

    The advent of new higher throughput analytical instrumentation has put a strain on interpreting and explaining the results from complex studies. Contemporary human, environmental, and biomonitoring data sets are comprised of tens or hundreds of analytes, multiple repeat measures...

  9. Statistical considerations in the analysis of data from replicated bioassays

    USDA-ARS's Scientific Manuscript database

    Multiple-dose bioassay is generally the preferred method for characterizing virulence of insect pathogens. Linear regression of probit mortality on log dose enables estimation of LD50/LC50 and slope, the latter having substantial effect on LD90/95s (doses of considerable interest in pest management)...
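
    The record above mentions linear regression of probit mortality on log dose, with the slope strongly affecting LD90/95 estimates. The sketch below illustrates that point under stated assumptions: a fitted line probit(p) = intercept + slope · log10(dose) on the z-score scale (without the historical +5 offset), so that LD_p = 10^((z_p - intercept) / slope). The two sets of coefficients are invented so that both lines share the same LD50.

```python
from statistics import NormalDist


def lethal_dose(p, intercept, slope):
    """Dose expected to kill a fraction p, given a probit line on log10(dose)."""
    z = NormalDist().inv_cdf(p)  # probit of p on the z-score scale
    return 10 ** ((z - intercept) / slope)


# Two hypothetical fitted lines, both crossing probit 0 at log10(dose) = 2.
steep = {"intercept": -4.0, "slope": 2.0}
shallow = {"intercept": -2.0, "slope": 1.0}

for name, line in (("steep", steep), ("shallow", shallow)):
    ld50 = lethal_dose(0.50, **line)
    ld90 = lethal_dose(0.90, **line)
    print(f"{name}: LD50 = {ld50:.1f}, LD90 = {ld90:.1f}")
```

    Both lines give the same LD50, but the shallower slope pushes the LD90 much higher, which is why the slope matters for the high-mortality doses used in pest management.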

  10. Artificial Neural Networks in Policy Research: A Current Assessment.

    ERIC Educational Resources Information Center

    Woelfel, Joseph

    1993-01-01

    Suggests that artificial neural networks (ANNs) exhibit properties that promise usefulness for policy researchers. Notes that ANNs have found extensive use in areas once reserved for multivariate statistical programs such as regression and multiple classification analysis and are developing an extensive community of advocates for processing text…

  11. Child Mortality in a Developing Country: A Statistical Analysis

    ERIC Educational Resources Information Center

    Uddin, Md. Jamal; Hossain, Md. Zakir; Ullah, Mohammad Ohid

    2009-01-01

    This study uses data from the "Bangladesh Demographic and Health Survey (BDHS) 1999-2000" to investigate the predictors of child (age 1-4 years) mortality in a developing country like Bangladesh. The cross-tabulation and multiple logistic regression techniques have been used to estimate the predictors of child mortality. The…

  12. Quantitatively Assessing Reported Crime versus Enrollment among Selected Higher Education Institutions

    ERIC Educational Resources Information Center

    Doss, Daniel; Lackey, Hilliard; McElreath, David; Gokaraju, Balakrishna; Tesiero, Raymond; Jones, Don; Lusk, Glenna

    2017-01-01

    This study uses multiple regressions to examine campus safety and campus security from the perspective of societal crime that occurs external to an institution of higher education versus institutional enrollment. The findings herein showed one statistically significant outcome involving the crime of aggravated assault. Student affairs and other…

  13. Watershed Regressions for Pesticides (WARP) models for predicting stream concentrations of multiple pesticides

    USGS Publications Warehouse

    Stone, Wesley W.; Crawford, Charles G.; Gilliom, Robert J.

    2013-01-01

    Watershed Regressions for Pesticides for multiple pesticides (WARP-MP) are statistical models developed to predict concentration statistics for a wide range of pesticides in unmonitored streams. The WARP-MP models use the national atrazine WARP models in conjunction with an adjustment factor for each additional pesticide. The WARP-MP models perform best for pesticides with application timing and methods similar to those used with atrazine. For other pesticides, WARP-MP models tend to overpredict concentration statistics for the model development sites. For WARP and WARP-MP, the less-than-ideal sampling frequency for the model development sites leads to underestimation of the shorter-duration concentration; hence, the WARP models tend to underpredict 4- and 21-d maximum moving-average concentrations, with median errors ranging from 9 to 38%. As a result of this sampling bias, pesticides that performed well with the model development sites are expected to have predictions that are biased low for these shorter-duration concentration statistics. The overprediction by WARP-MP apparent for some of the pesticides is variably offset by underestimation of the model development concentration statistics. Of the 112 pesticides used in the WARP-MP application to stream segments nationwide, 25 were predicted to have concentration statistics with a 50% or greater probability of exceeding one or more aquatic life benchmarks in one or more stream segments. Geographically, many of the modeled streams in the Corn Belt Region were predicted to have one or more pesticides that exceeded an aquatic life benchmark during 2009, indicating the potential vulnerability of streams in this region.

  14. Comparison of Adaline and Multiple Linear Regression Methods for Rainfall Forecasting

    NASA Astrophysics Data System (ADS)

    Sutawinaya, IP; Astawa, INGA; Hariyanti, NKD

    2018-01-01

    Heavy rainfall can cause disasters, so forecasts are needed to predict rainfall intensity. The main cause of flooding is high rainfall intensity that pushes rivers beyond their capacity, which floods the surrounding area. Rainfall is a dynamic factor, which makes it interesting to study. Methods that can support rainfall forecasting range from Artificial Intelligence (AI) to statistics. In this research, we used Adaline as the AI method and regression as the statistical method. The method that gives the more accurate forecast is considered the better one for forecasting rainfall. By comparing these methods, we expected to identify the best method for rainfall forecasting here.
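
    The abstract above compares Adaline with regression. As a rough illustration of how the two relate, the sketch below trains a single linear unit with the least-mean-squares (Widrow-Hoff) rule and compares it with the closed-form least squares line on the same data. The data are synthetic and noise-free, not rainfall measurements, and the learning rate and epoch count are arbitrary illustrative choices.

```python
def ols_line(x, y):
    """Closed-form least squares fit of y = intercept + slope * x."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((a - mean_x) ** 2 for a in x)
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def adaline_fit(x, y, learning_rate=0.1, epochs=2000):
    """Train one linear unit with the LMS rule, updating after every sample."""
    weight, bias = 0.0, 0.0
    for _ in range(epochs):
        for xi, yi in zip(x, y):
            error = yi - (weight * xi + bias)
            weight += learning_rate * error * xi
            bias += learning_rate * error
    return weight, bias


# Synthetic inputs scaled to [0, 1]; targets follow y = 1 + 2x exactly.
inputs = [0.0, 0.25, 0.5, 0.75, 1.0]
targets = [1.0 + 2.0 * xi for xi in inputs]

ols_slope, ols_intercept = ols_line(inputs, targets)
ada_weight, ada_bias = adaline_fit(inputs, targets)
print(f"OLS:     slope = {ols_slope:.4f}, intercept = {ols_intercept:.4f}")
print(f"Adaline: weight = {ada_weight:.4f}, bias = {ada_bias:.4f}")
```

    On a linear problem like this one the iterative unit approaches the least squares solution, so any accuracy difference between the two in practice depends on inputs, scaling and training settings rather than on the model form.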

  15. Multivariate research in areas of phosphorus cast-iron brake shoes manufacturing using the statistical analysis and the multiple regression equations

    NASA Astrophysics Data System (ADS)

    Kiss, I.; Cioată, V. G.; Alexa, V.; Raţiu, S. A.

    2017-05-01

    The braking system is one of the most important and complex subsystems of railway vehicles, especially when it comes to safety. Therefore, installing efficient, safe brakes on modern railway vehicles is essential. Attention is now devoted to solving problems connected with the use of high-performance brake materials and their impact on the thermal and mechanical loading of railway wheels. The main factor that influences the selection of a friction material for railway applications is the performance criterion, because the interaction between the brake block and the wheel produces complex thermo-mechanical phenomena. In this work, the investigated subjects are the cast-iron brake shoes, which are still widely used on freight wagons. Therefore, the cast-iron brake shoes - with lamellar graphite and with a high content of phosphorus (0.8-1.1%) - need a special investigation. In order to establish the optimal condition for the cast-iron brake shoes we proposed a mathematical modelling study using statistical analysis and multiple regression equations. Multivariate research is important in cast-iron brake shoe manufacturing, because many variables interact with each other simultaneously. Multivariate visualization comes to the fore when researchers have difficulties in comprehending many dimensions at one time. Technological data (hardness and chemical composition) obtained from cast-iron brake shoes were used for this purpose. In order to establish the multiple correlation between the hardness of the cast-iron brake shoes and the elements of their chemical composition, several types of regression equation models have been proposed. 
Because a three-dimensional surface with variables on three axes is a common way to illustrate multivariate data, in which the maximum and minimum values are easily highlighted, we plotted graphical representations of the regression equations in order to explain the interaction of the variables and locate the optimal level of each variable for maximal response. Matlab software was used to calculate the regression coefficients, dispersion and correlation coefficients.

  16. Regional Regression Equations to Estimate Flow-Duration Statistics at Ungaged Stream Sites in Connecticut

    USGS Publications Warehouse

    Ahearn, Elizabeth A.

    2010-01-01

    Multiple linear regression equations for determining flow-duration statistics were developed to estimate select flow exceedances ranging from 25- to 99-percent for six 'bioperiods'-Salmonid Spawning (November), Overwinter (December-February), Habitat Forming (March-April), Clupeid Spawning (May), Resident Spawning (June), and Rearing and Growth (July-October)-in Connecticut. Regression equations also were developed to estimate the 25- and 99-percent flow exceedances without reference to a bioperiod. In total, 32 equations were developed. The predictive equations were based on regression analyses relating flow statistics from streamgages to GIS-determined basin and climatic characteristics for the drainage areas of those streamgages. Thirty-nine streamgages (and an additional 6 short-term streamgages and 28 partial-record sites for the non-bioperiod 99-percent exceedance) in Connecticut and adjacent areas of neighboring States were used in the regression analysis. Weighted least squares regression analysis was used to determine the predictive equations; weights were assigned based on record length. The basin characteristics-drainage area, percentage of area with coarse-grained stratified deposits, percentage of area with wetlands, mean monthly precipitation (November), mean seasonal precipitation (December, January, and February), and mean basin elevation-are used as explanatory variables in the equations. Standard errors of estimate of the 32 equations ranged from 10.7 to 156 percent with medians of 19.2 and 55.4 percent to predict the 25- and 99-percent exceedances, respectively. Regression equations to estimate high and median flows (25- to 75-percent exceedances) are better predictors (smaller variability of the residual values around the regression line) than the equations to estimate low flows (less than 75-percent exceedance). The Habitat Forming (March-April) bioperiod had the smallest standard errors of estimate, ranging from 10.7 to 20.9 percent. 
In contrast, the Rearing and Growth (July-October) bioperiod had the largest standard errors, ranging from 30.9 to 156 percent. The adjusted coefficient of determination of the equations ranged from 77.5 to 99.4 percent with medians of 98.5 and 90.6 percent to predict the 25- and 99-percent exceedances, respectively. Descriptive information on the streamgages used in the regression, measured basin and climatic characteristics, and estimated flow-duration statistics are provided in this report. Flow-duration statistics and the 32 regression equations for estimating flow-duration statistics in Connecticut are stored on the U.S. Geological Survey World Wide Web application "StreamStats" (http://water.usgs.gov/osw/streamstats/index.html). The regression equations developed in this report can be used to produce unbiased estimates of select flow exceedances statewide.
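
    The report above fits its equations by weighted least squares, with weights assigned from streamgage record length. The sketch below shows that idea for a single explanatory variable only: it minimizes the weighted sum of squared residuals Σ w_i (y_i - a - b·x_i)², so gages with longer records pull the line harder. The values, the log10 transformation and the variable names are invented for illustration; the actual equations use several basin characteristics.

```python
def wls_line(x, y, w):
    """Weighted least squares fit of y = intercept + slope * x."""
    total_w = sum(w)
    mean_x = sum(wi * xi for wi, xi in zip(w, x)) / total_w
    mean_y = sum(wi * yi for wi, yi in zip(w, y)) / total_w
    sxx = sum(wi * (xi - mean_x) ** 2 for wi, xi in zip(w, x))
    sxy = sum(wi * (xi - mean_x) * (yi - mean_y) for wi, xi, yi in zip(w, x, y))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


# Hypothetical streamgages: log10 drainage area, log10 flow statistic,
# and years of record used as the weight.
log_area = [0.8, 1.1, 1.5, 1.9, 2.3]
log_flow = [0.10, 0.42, 0.95, 1.20, 1.78]
record_years = [12, 45, 30, 8, 60]

slope, intercept = wls_line(log_area, log_flow, record_years)
print(f"weighted fit: slope = {slope:.3f}, intercept = {intercept:.3f}")
```

    With equal weights the function reduces to ordinary least squares, and a weight of zero removes a gage from the fit entirely.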

  17. Quantitative Assessment of Cervical Vertebral Maturation Using Cone Beam Computed Tomography in Korean Girls

    PubMed Central

    Byun, Bo-Ram; Kim, Yong-Il; Maki, Koutaro; Son, Woo-Sung

    2015-01-01

    This study aimed to examine the correlation between skeletal maturation status and parameters from the odontoid process/body of the second vertebra and the bodies of the third and fourth cervical vertebrae, and to build multiple regression models for estimating skeletal maturation status in Korean girls. Hand-wrist radiographs and cone beam computed tomography (CBCT) images were obtained from 74 Korean girls (6–18 years of age). CBCT-generated cervical vertebral maturation (CVM) was used to demarcate the odontoid process and the body of the second cervical vertebra, based on the dentocentral synchondrosis. Correlation coefficient analysis and multiple linear regression analysis were used for each parameter of the cervical vertebrae (P < 0.05). Forty-seven of 64 parameters from CBCT-generated CVM (independent variables) exhibited statistically significant correlations (P < 0.05). The multiple regression model with the greatest R² had six parameters (PH2/W2, UW2/W2, (OH+AH2)/LW2, UW3/LW3, D3, and H4/W4) as independent variables with a variance inflation factor (VIF) of <2. CBCT-generated CVM was able to include parameters from the second cervical vertebral body and odontoid process, respectively, for the multiple regression models. This suggests that quantitative analysis might be used to estimate skeletal maturation status. PMID:25878721
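
    For reference, the variance inflation factor cited in the abstract above is conventionally defined from the coefficient of determination obtained when predictor j is regressed on the remaining predictors. The notation below is generic and is not taken from the article.

```latex
\mathrm{VIF}_j = \frac{1}{1 - R_j^2},
\qquad
\mathrm{VIF}_j < 2 \iff R_j^2 < 0.5
```

    Under this definition, a VIF below 2 means that less than half of the variance of each predictor is explained by the other predictors in the model.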

  18. Application of Multiregressive Linear Models, Dynamic Kriging Models and Neural Network Models to Predictive Maintenance of Hydroelectric Power Systems

    NASA Astrophysics Data System (ADS)

    Lucifredi, A.; Mazzieri, C.; Rossi, M.

    2000-05-01

    Because the operating conditions of a hydroelectric unit can vary over a wide range, the monitoring system must distinguish variations of the monitored variable caused by changes in operating conditions from those caused by the onset and progression of failures and misoperations. The paper aims to identify the best technique to adopt for the monitoring system. Three methods were implemented and compared. Two of them use statistical techniques: the first, linear multiple regression, expresses the monitored variable as a linear function of the process parameters (independent variables), while the second, the dynamic kriging technique, is a modified form of multiple linear regression that represents the monitored variable as a linear combination of the process variables in such a way as to minimize the variance of the estimation error. The third is based on neural networks. Tests have shown that the monitoring system based on the kriging technique is not affected by some problems common to the other two models, e.g., the need for a large amount of data for tuning (both for training the neural network and for defining the optimum plane for the multiple regression), not only in the system start-up phase but also after a trivial maintenance operation involving the replacement of machinery components that directly affect the observed variable, and the need for different models to describe satisfactorily the different operating ranges of the plant. The monitoring system based on the kriging statistical technique overcomes these difficulties: it does not require a large amount of data to be tuned and is immediately operational (given two points, the third can be immediately estimated); in addition, the model follows the system without adapting itself to it.
The results of the experimentation performed seem to indicate that a model based on a neural network or on a linear multiple regression is not optimal, and that a different approach is necessary to reduce the amount of work during the learning phase using, when available, all the information stored during the initial phase of the plant to build the reference baseline, elaborating, if it is the case, the raw information available. A mixed approach using the kriging statistical technique and neural network techniques could optimise the result.

  19. Performance of an Axisymmetric Rocket Based Combined Cycle Engine During Rocket Only Operation Using Linear Regression Analysis

    NASA Technical Reports Server (NTRS)

    Smith, Timothy D.; Steffen, Christopher J., Jr.; Yungster, Shaye; Keller, Dennis J.

    1998-01-01

    The all rocket mode of operation is shown to be a critical factor in the overall performance of a rocket based combined cycle (RBCC) vehicle. An axisymmetric RBCC engine was used to determine specific impulse efficiency values based upon both full flow and gas generator configurations. Design of experiments methodology was used to construct a test matrix and multiple linear regression analysis was used to build parametric models. The main parameters investigated in this study were: rocket chamber pressure, rocket exit area ratio, injected secondary flow, mixer-ejector inlet area, mixer-ejector area ratio, and mixer-ejector length-to-inlet diameter ratio. A perfect gas computational fluid dynamics analysis, using both the Spalart-Allmaras and k-omega turbulence models, was performed with the NPARC code to obtain values of vacuum specific impulse. Results from the multiple linear regression analysis showed that for both the full flow and gas generator configurations increasing mixer-ejector area ratio and rocket area ratio increase performance, while increasing mixer-ejector inlet area ratio and mixer-ejector length-to-diameter ratio decrease performance. Increasing injected secondary flow increased performance for the gas generator analysis, but was not statistically significant for the full flow analysis. Chamber pressure was found to be not statistically significant.

  20. Methods for estimating flow-duration and annual mean-flow statistics for ungaged streams in Oklahoma

    USGS Publications Warehouse

    Esralew, Rachel A.; Smith, S. Jerrod

    2010-01-01

    Flow statistics can be used to provide decision makers with surface-water information needed for activities such as water-supply permitting, flow regulation, and other water rights issues. Flow statistics could be needed at any location along a stream. Most often, streamflow statistics are needed at ungaged sites, where no flow data are available to compute the statistics. Methods are presented in this report for estimating flow-duration and annual mean-flow statistics for ungaged streams in Oklahoma. Flow statistics included the (1) annual (period of record), (2) seasonal (summer-autumn and winter-spring), and (3) 12 monthly duration statistics, including the 20th, 50th, 80th, 90th, and 95th percentile flow exceedances, and the annual mean-flow (mean of daily flows for the period of record). Flow statistics were calculated from daily streamflow information collected from 235 streamflow-gaging stations throughout Oklahoma and areas in adjacent states. A drainage-area ratio method is the preferred method for estimating flow statistics at an ungaged location that is on a stream near a gage. The method generally is reliable only if the drainage-area ratio of the two sites is between 0.5 and 1.5. Regression equations that relate flow statistics to drainage-basin characteristics were developed for the purpose of estimating selected flow-duration and annual mean-flow statistics for ungaged streams that are not near gaging stations on the same stream. Regression equations were developed from flow statistics and drainage-basin characteristics for 113 unregulated gaging stations. Separate regression equations were developed by using U.S. Geological Survey streamflow-gaging stations in regions with similar drainage-basin characteristics. These equations can increase the accuracy of regression equations used for estimating flow-duration and annual mean-flow statistics at ungaged stream locations in Oklahoma. 
Streamflow-gaging stations were grouped by selected drainage-basin characteristics by using a k-means cluster analysis. Three regions were identified for Oklahoma on the basis of the clustering of gaging stations and a manual delineation of distinguishable hydrologic and geologic boundaries: Region 1 (western Oklahoma excluding the Oklahoma and Texas Panhandles), Region 2 (north- and south-central Oklahoma), and Region 3 (eastern and central Oklahoma). A total of 228 regression equations (225 flow-duration regressions and three annual mean-flow regressions) were developed using ordinary least-squares and left-censored (Tobit) multiple-regression techniques. These equations can be used to estimate 75 flow-duration statistics and annual mean-flow for ungaged streams in the three regions. Drainage-basin characteristics that were statistically significant independent variables in the regression analyses were (1) contributing drainage area; (2) station elevation; (3) mean drainage-basin elevation; (4) channel slope; (5) percentage of forested canopy; (6) mean drainage-basin hillslope; (7) soil permeability; and (8) mean annual, seasonal, and monthly precipitation. The accuracy of flow-duration regression equations generally decreased from high-flow exceedance (low-exceedance probability) to low-flow exceedance (high-exceedance probability). This decrease may have happened because a greater uncertainty exists for low-flow estimates and low-flow is largely affected by localized geology that was not quantified by the drainage-basin characteristics selected. The standard errors of estimate of regression equations for Region 1 (western Oklahoma) were substantially larger than those standard errors for other regions, especially for low-flow exceedances. These errors may be a result of greater variability in low flow because of increased irrigation activities in this region. 
Regression equations may not be reliable for sites where the drainage-basin characteristics are outside the range of values of independent vari
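
    The drainage-area ratio method mentioned in the abstract above can be sketched as follows. The abstract gives only the reliability range (ratio between 0.5 and 1.5); the unit-exponent scaling used here is the common textbook form and is an assumption, as are the function and parameter names.

```python
def drainage_area_ratio_estimate(gaged_flow, gaged_area, ungaged_area,
                                 min_ratio=0.5, max_ratio=1.5):
    """Scale a flow statistic from a gaged site to a nearby ungaged site.

    Assumes simple proportional scaling by drainage area (exponent of 1),
    which is an illustrative choice and not quoted from the report.
    """
    if gaged_area <= 0 or ungaged_area <= 0:
        raise ValueError("drainage areas must be positive")
    ratio = ungaged_area / gaged_area
    # The abstract states the method is generally reliable only in this range.
    if not (min_ratio <= ratio <= max_ratio):
        raise ValueError(f"drainage-area ratio {ratio:.2f} is outside "
                         f"[{min_ratio}, {max_ratio}]")
    return gaged_flow * ratio


# Hypothetical values: flow statistic at the gage and the two drainage areas.
print(drainage_area_ratio_estimate(100.0, 200.0, 250.0))
```

    Raising an error outside the stated range, instead of returning a value silently, reflects the abstract's caution that the method is reliable only for sites of comparable drainage area on the same stream.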

  1. On the Stationarity of Multiple Autoregressive Approximants: Theory and Algorithms

    DTIC Science & Technology

    1976-08-01

    a I (3.4) Hannan and Terrell (1972) consider problems of a similar nature. Efficient estimates A(1),..., A(p), and i of A(1),..., A(p) and..."Autoregressive model fitting for control," Ann. Inst. Statist. Math., 23, 163-180. Hannan, E. J. (1970), Multiple Time Series, New York, John Wiley...Hannan, E. J. and Terrell, R. D. (1972), "Time series regression with linear constraints," International Economic Review, 13, 189-200. Masani, P

  2. Estimation of stature from the foot and its segments in a sub-adult female population of North India

    PubMed Central

    2011-01-01

    Background: Establishing personal identity is one of the main concerns in forensic investigations. Estimation of stature forms a basic domain of the investigation process in unknown and co-mingled human remains in forensic anthropology case work. The objective of the present study was to set up standards for estimation of stature from the foot and its segments in a sub-adult female population. Methods: The sample for the study constituted 149 young females from the Northern part of India. The participants were aged between 13 and 18 years. Besides stature, seven anthropometric measurements that included length of the foot from each toe (T1, T2, T3, T4, and T5 respectively), foot breadth at ball (BBAL) and foot breadth at heel (BHEL) were measured on both feet in each participant using standard methods and techniques. Results: The results indicated that statistically significant differences (p < 0.05) between left and right feet occur in both the foot breadth measurements (BBAL and BHEL). Foot length measurements (T1 to T5 lengths) did not show any statistically significant bilateral asymmetry. The correlation between stature and all the foot measurements was found to be positive and statistically significant (p-value < 0.001). Linear regression models and multiple regression models were derived for estimation of stature from the measurements of the foot. The present study indicates that anthropometric measurements of foot and its segments are valuable in the estimation of stature. Foot length measurements estimate stature with greater accuracy when compared to foot breadth measurements. Conclusions: The present study concluded that foot measurements have a strong relationship with stature in the sub-adult female population of North India. Hence, the stature of an individual can be successfully estimated from the foot and its segments using different regression models derived in the study. 
The regression models derived in the study may be applied successfully for the estimation of stature in sub-adult females, whenever foot remains are brought for forensic examination. Stepwise multiple regression models tend to estimate stature more accurately than linear regression models in female sub-adults. PMID:22104433

  3. Estimation of stature from the foot and its segments in a sub-adult female population of North India.

    PubMed

    Krishan, Kewal; Kanchan, Tanuj; Passi, Neelam

    2011-11-21

    Establishing personal identity is one of the main concerns in forensic investigations. Estimation of stature forms a basic domain of the investigation process in unknown and co-mingled human remains in forensic anthropology case work. The objective of the present study was to set up standards for estimation of stature from the foot and its segments in a sub-adult female population. The sample for the study constituted 149 young females from the Northern part of India. The participants were aged between 13 and 18 years. Besides stature, seven anthropometric measurements that included length of the foot from each toe (T1, T2, T3, T4, and T5 respectively), foot breadth at ball (BBAL) and foot breadth at heel (BHEL) were measured on both feet in each participant using standard methods and techniques. The results indicated that statistically significant differences (p < 0.05) between left and right feet occur in both the foot breadth measurements (BBAL and BHEL). Foot length measurements (T1 to T5 lengths) did not show any statistically significant bilateral asymmetry. The correlation between stature and all the foot measurements was found to be positive and statistically significant (p-value < 0.001). Linear regression models and multiple regression models were derived for estimation of stature from the measurements of the foot. The present study indicates that anthropometric measurements of foot and its segments are valuable in the estimation of stature. Foot length measurements estimate stature with greater accuracy when compared to foot breadth measurements. The present study concluded that foot measurements have a strong relationship with stature in the sub-adult female population of North India. Hence, the stature of an individual can be successfully estimated from the foot and its segments using different regression models derived in the study. 
The regression models derived in the study may be applied successfully for the estimation of stature in sub-adult females, whenever foot remains are brought for forensic examination. Stepwise multiple regression models tend to estimate stature more accurately than linear regression models in female sub-adults.

  4. Birthweight Related Factors in Northwestern Iran: Using Quantile Regression Method.

    PubMed

    Fallah, Ramazan; Kazemnejad, Anoshirvan; Zayeri, Farid; Shoghli, Alireza

    2015-11-18

    Birthweight is one of the most important predicting indicators of the health status in adulthood. Having a balanced birthweight is one of the priorities of the health system in most of the industrial and developed countries. This indicator is used to assess the growth and health status of the infants. The aim of this study was to assess the birthweight of the neonates by using quantile regression in Zanjan province. This analytical descriptive study was carried out using pre-registered (March 2010 - March 2012) data of neonates in urban/rural health centers of Zanjan province using multiple-stage cluster sampling. Data were analyzed using multiple linear regression and the quantile regression method in SAS 9.2 statistical software. Of 8456 newborns, 4146 (49%) were female. The mean age of the mothers was 27.1±5.4 years. The mean birthweight of the neonates was 3104 ± 431 grams. Five hundred and seventy-three (6.8%) of the neonates weighed less than 2500 grams. In all quantiles, gestational age of neonates (p<0.05) and weight and educational level of the mothers (p<0.05) showed a significant linear relationship with the birthweight of the neonates. However, sex and birth rank of the neonates, mother's age, place of residence (urban/rural) and career were not significant in all quantiles (p>0.05). This study revealed that the results of multiple linear regression and quantile regression were not identical. We strongly recommend the use of quantile regression when an asymmetric response variable or data with outliers is available.

  5. Birthweight Related Factors in Northwestern Iran: Using Quantile Regression Method

    PubMed Central

    Fallah, Ramazan; Kazemnejad, Anoshirvan; Zayeri, Farid; Shoghli, Alireza

    2016-01-01

    Introduction: Birthweight is one of the most important predicting indicators of the health status in adulthood. Having a balanced birthweight is one of the priorities of the health system in most of the industrial and developed countries. This indicator is used to assess the growth and health status of the infants. The aim of this study was to assess the birthweight of the neonates by using quantile regression in Zanjan province. Methods: This analytical descriptive study was carried out using pre-registered (March 2010 - March 2012) data of neonates in urban/rural health centers of Zanjan province using multiple-stage cluster sampling. Data were analyzed using multiple linear regression and the quantile regression method in SAS 9.2 statistical software. Results: Of 8456 newborns, 4146 (49%) were female. The mean age of the mothers was 27.1±5.4 years. The mean birthweight of the neonates was 3104 ± 431 grams. Five hundred and seventy-three (6.8%) of the neonates weighed less than 2500 grams. In all quantiles, gestational age of neonates (p<0.05) and weight and educational level of the mothers (p<0.05) showed a significant linear relationship with the birthweight of the neonates. However, sex and birth rank of the neonates, mother's age, place of residence (urban/rural) and career were not significant in all quantiles (p>0.05). Conclusion: This study revealed that the results of multiple linear regression and quantile regression were not identical. We strongly recommend the use of quantile regression when an asymmetric response variable or data with outliers is available. PMID:26925889
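
    To illustrate why quantile regression is less sensitive to outliers than mean regression, as the abstract above recommends, the sketch below evaluates the check (pinball) loss that quantile regression minimizes. It is an intercept-only toy case with made-up numbers, not the covariate model or the SAS procedure used in the study; all names are hypothetical.

```python
def pinball_loss(residual, tau):
    """Check loss: tau * r for r >= 0 and (tau - 1) * r for r < 0."""
    return residual * (tau - (1.0 if residual < 0 else 0.0))


def constant_quantile_fit(y, tau):
    """Return the observed value that minimizes total pinball loss.

    This is quantile regression with an intercept only; the minimizer
    is a sample quantile of y at level tau.
    """
    return min(sorted(y),
               key=lambda c: sum(pinball_loss(yi - c, tau) for yi in y))


# Made-up responses with one extreme value.
values = [1, 2, 3, 4, 100]
mean_fit = sum(values) / len(values)             # pulled toward the outlier
median_fit = constant_quantile_fit(values, 0.5)  # stays with the bulk of the data
print(mean_fit, median_fit)
```

    With covariates, the same loss is minimized over regression coefficients at each chosen quantile level, which usually requires a linear-programming or iterative solver instead of the direct search used here.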

  6. Modeling Longitudinal Data Containing Non-Normal Within Subject Errors

    NASA Technical Reports Server (NTRS)

    Feiveson, Alan; Glenn, Nancy L.

    2013-01-01

    The mission of the National Aeronautics and Space Administration’s (NASA) human research program is to advance safe human spaceflight. This involves conducting experiments, collecting data, and analyzing data. The data are longitudinal and come from relatively few subjects, typically 10 to 20. A longitudinal study refers to an investigation where participant outcomes and possibly treatments are collected at multiple follow-up times. Standard statistical designs such as mean regression with random effects and mixed-effects regression are inadequate for such data because the population is typically not approximately normally distributed. Hence, more advanced data analysis methods are necessary. This research focuses on four such methods for longitudinal data analysis: the recently proposed linear quantile mixed models (lqmm) by Geraci and Bottai (2013), quantile regression, multilevel mixed-effects linear regression, and robust regression. This research also provides computational algorithms for longitudinal data that scientists can directly use for human spaceflight and other longitudinal data applications, then presents statistical evidence that verifies which method is best for specific situations. This advances the study of longitudinal data in a broad range of applications including applications in the sciences, technology, engineering and mathematics fields.

  7. Using Algal Metrics and Biomass to Evaluate Multiple Ways of Defining Concentration-Based Nutrient Criteria in Streams and their Ecological Relevance

    EPA Science Inventory

    We examined the utility of nutrient criteria derived solely from total phosphorus (TP) concentrations in streams (regression models and percentile distributions) and evaluated their ecological relevance to diatom and algal biomass responses. We used a variety of statistics to cha...

  8. Modeling Success: Using Preenrollment Data to Identify Academically At-Risk Students

    ERIC Educational Resources Information Center

    Gansemer-Topf, Ann M.; Compton, Jonathan; Wohlgemuth, Darin; Forbes, Greg; Ralston, Ekaterina

    2015-01-01

    Improving student success and degree completion is one of the core principles of strategic enrollment management. To address this principle, institutional data were used to develop a statistical model to identify academically at-risk students. The model employs multiple linear regression techniques to predict students at risk of earning below a…

  9. Adaptive variation in Pinus ponderosa from Intermountain regions. II. Middle Columbia River system

    Treesearch

    Gerald Rehfeldt

    1986-01-01

    Seedling populations were grown and compared in common environments. Statistical analyses detected genetic differences between populations for numerous traits reflecting growth potential and periodicity of shoot elongation. Multiple regression models described an adaptive landscape in which populations from low elevations have a high growth potential while those from...

  10. Electronic Resource Expenditure and the Decline in Reference Transaction Statistics in Academic Libraries

    ERIC Educational Resources Information Center

    Dubnjakovic, Ana

    2012-01-01

    The current study investigates factors influencing increase in reference transactions in a typical week in academic libraries across the United States of America. Employing multiple regression analysis and general linear modeling, variables of interest from the "Academic Library Survey (ALS) 2006" survey (sample size 3960 academic libraries) were…

  11. DOE Office of Scientific and Technical Information (OSTI.GOV)

    Smith, Kandler; Shi, Ying; Santhanagopalan, Shriram

    Predictive models of Li-ion battery lifetime must consider a multiplicity of electrochemical, thermal, and mechanical degradation modes experienced by batteries in application environments. To complicate matters, Li-ion batteries can experience different degradation trajectories that depend on storage and cycling history of the application environment. Rates of degradation are controlled by factors such as temperature history, electrochemical operating window, and charge/discharge rate. We present a generalized battery life prognostic model framework for battery systems design and control. The model framework consists of trial functions that are statistically regressed to Li-ion cell life datasets wherein the cells have been aged under different levels of stress. Degradation mechanisms and rate laws dependent on temperature, storage, and cycling condition are regressed to the data, with multiple model hypotheses evaluated and the best model down-selected based on statistics. The resulting life prognostic model, implemented in state variable form, is extensible to arbitrary real-world scenarios. The model is applicable in real-time control algorithms to maximize battery life and performance. We discuss efforts to reduce lifetime prediction error and accommodate its inevitable impact in controller design.

  12. Bayesian correction for covariate measurement error: A frequentist evaluation and comparison with regression calibration.

    PubMed

    Bartlett, Jonathan W; Keogh, Ruth H

    2018-06-01

    Bayesian approaches for handling covariate measurement error are well established and yet arguably are still relatively little used by researchers. For some this is likely due to unfamiliarity or disagreement with the Bayesian inferential paradigm. For others a contributory factor is the inability of standard statistical packages to perform such Bayesian analyses. In this paper, we first give an overview of the Bayesian approach to handling covariate measurement error, and contrast it with regression calibration, arguably the most commonly adopted approach. We then argue why the Bayesian approach has a number of statistical advantages compared to regression calibration and demonstrate that implementing the Bayesian approach is usually quite feasible for the analyst. Next, we describe the closely related maximum likelihood and multiple imputation approaches and explain why we believe the Bayesian approach to generally be preferable. We then empirically compare the frequentist properties of regression calibration and the Bayesian approach through simulation studies. The flexibility of the Bayesian approach to handle both measurement error and missing data is then illustrated through an analysis of data from the Third National Health and Nutrition Examination Survey.

  13. Combined statistical analyses for long-term stability data with multiple storage conditions: a simulation study.

    PubMed

    Almalik, Osama; Nijhuis, Michiel B; van den Heuvel, Edwin R

    2014-01-01

    Shelf-life estimation usually requires that at least three registration batches are tested for stability at multiple storage conditions. The shelf-life estimates are often obtained by linear regression analysis per storage condition, an approach implicitly suggested by ICH guideline Q1E. A linear regression analysis combining all data from multiple storage conditions was recently proposed in the literature when variances are homogeneous across storage conditions. The combined analysis is expected to perform better than the separate analysis per storage condition, since pooling data would lead to an improved estimate of the variation and higher numbers of degrees of freedom, but this is not evident for shelf-life estimation. Indeed, the two approaches treat the observed initial batch results, the intercepts in the model, and poolability of batches differently, which may eliminate or reduce the expected advantage of the combined approach with respect to the separate approach. Therefore, a simulation study was performed to compare the distribution of simulated shelf-life estimates on several characteristics between the two approaches and to quantify the difference in shelf-life estimates. In general, the combined statistical analysis does estimate the true shelf life more consistently and precisely than the analysis per storage condition, but it did not outperform the separate analysis in all circumstances.

  14. Guidelines and Procedures for Computing Time-Series Suspended-Sediment Concentrations and Loads from In-Stream Turbidity-Sensor and Streamflow Data

    USGS Publications Warehouse

    Rasmussen, Patrick P.; Gray, John R.; Glysson, G. Douglas; Ziegler, Andrew C.

    2009-01-01

    In-stream continuous turbidity and streamflow data, calibrated with measured suspended-sediment concentration data, can be used to compute a time series of suspended-sediment concentration and load at a stream site. Development of a simple linear (ordinary least squares) regression model for computing suspended-sediment concentrations from instantaneous turbidity data is the first step in the computation process. If the model standard percentage error (MSPE) of the simple linear regression model meets a minimum criterion, this model should be used to compute a time series of suspended-sediment concentrations. Otherwise, a multiple linear regression model using paired instantaneous turbidity and streamflow data is developed and compared to the simple regression model. If the inclusion of the streamflow variable proves to be statistically significant and the uncertainty associated with the multiple regression model results in an improvement over that for the simple linear model, the turbidity-streamflow multiple linear regression model should be used to compute a suspended-sediment concentration time series. The computed concentration time series is subsequently used with its paired streamflow time series to compute suspended-sediment loads by standard U.S. Geological Survey techniques. Once an acceptable regression model is developed, it can be used to compute suspended-sediment concentration beyond the period of record used in model development with proper ongoing collection and analysis of calibration samples. Regression models to compute suspended-sediment concentrations are generally site specific and should never be considered static, but they represent a set period in a continually dynamic system in which additional data will help verify any change in sediment load, type, and source.
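
    The model-selection sequence described in the abstract above can be sketched as a small decision function. The abstract gives no numeric criterion, so the threshold, the significance level, and the reading of "meets a minimum criterion" as "MSPE at or below a threshold" are all assumptions; the function and argument names are hypothetical.

```python
def choose_sediment_model(simple_mspe, multiple_mspe, streamflow_p_value,
                          mspe_criterion, alpha=0.05):
    """Pick between turbidity-only and turbidity-streamflow regression.

    simple_mspe / multiple_mspe: model standard percentage errors.
    mspe_criterion and alpha are placeholder values, not from the report.
    """
    # Step 1: keep the simple turbidity model if its error is acceptable.
    if simple_mspe <= mspe_criterion:
        return "simple"
    # Step 2: otherwise prefer the multiple model only if streamflow is
    # statistically significant and the model uncertainty improves.
    if streamflow_p_value < alpha and multiple_mspe < simple_mspe:
        return "multiple"
    # The abstract does not say what to do in the remaining case.
    return "undetermined"


print(choose_sediment_model(simple_mspe=30.0, multiple_mspe=22.0,
                            streamflow_p_value=0.01, mspe_criterion=20.0))
```

    Returning a separate label for the unspecified case avoids implying a fallback rule that the abstract does not state.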

  15. A Method of Trigonometric Modelling of Seasonal Variation Demonstrated with Multiple Sclerosis Relapse Data.

    PubMed

    Spelman, Tim; Gray, Orla; Lucas, Robyn; Butzkueven, Helmut

    2015-12-09

    This report describes a novel Stata-based application of trigonometric regression modelling to 55 years of multiple sclerosis relapse data from 46 clinical centers across 20 countries located in both hemispheres. Central to the success of this method was the strategic use of plot analysis to guide and corroborate the statistical regression modelling. Initial plot analysis was necessary for establishing realistic hypotheses regarding the presence and structural form of seasonal and latitudinal influences on relapse probability and then testing the performance of the resultant models. Trigonometric regression was then necessary to quantify these relationships, adjust for important confounders and provide a measure of certainty as to how plausible these associations were. Synchronization of graphing techniques with regression modelling permitted a systematic refinement of models until best-fit convergence was achieved, enabling novel inferences to be made regarding the independent influence of both season and latitude in predicting relapse onset timing in MS. These methods have the potential for application across other complex disease and epidemiological phenomena suspected or known to vary systematically with season and/or geographic location.
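
    A common form of trigonometric regression, assumed here for illustration and not taken from the report, models a seasonal outcome as y = b0 + b_sin*sin(2*pi*t/P) + b_cos*cos(2*pi*t/P) with period P. Once the two seasonal coefficients are estimated, the amplitude and the timing of the seasonal peak follow directly, as the sketch below shows; the function name and the monthly period are hypothetical choices.

```python
import math


def seasonal_amplitude_and_peak(beta_sin, beta_cos, period=12.0):
    """Convert sine/cosine coefficients to amplitude and peak time.

    Uses the identity b_sin*sin(x) + b_cos*cos(x) = A*cos(x - phi),
    with A = sqrt(b_sin**2 + b_cos**2) and phi = atan2(b_sin, b_cos).
    """
    amplitude = math.hypot(beta_sin, beta_cos)
    # Wrap the phase into [0, 2*pi) so the peak time falls within one period.
    phase = math.atan2(beta_sin, beta_cos) % (2.0 * math.pi)
    peak_time = phase * period / (2.0 * math.pi)
    return amplitude, peak_time


# Hypothetical coefficients from a fitted monthly model.
print(seasonal_amplitude_and_peak(beta_sin=1.0, beta_cos=0.0))
```

    Reporting the amplitude and peak time, in addition to the raw coefficients, makes the seasonal pattern easier to compare against a plot of the observed data.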

  16. Common Scientific and Statistical Errors in Obesity Research

    PubMed Central

    George, Brandon J.; Beasley, T. Mark; Brown, Andrew W.; Dawson, John; Dimova, Rositsa; Divers, Jasmin; Goldsby, TaShauna U.; Heo, Moonseong; Kaiser, Kathryn A.; Keith, Scott; Kim, Mimi Y.; Li, Peng; Mehta, Tapan; Oakes, J. Michael; Skinner, Asheley; Stuart, Elizabeth; Allison, David B.

    2015-01-01

    We identify 10 common errors and problems in the statistical analysis, design, interpretation, and reporting of obesity research and discuss how they can be avoided. The 10 topics are: 1) misinterpretation of statistical significance, 2) inappropriate testing against baseline values, 3) excessive and undisclosed multiple testing and “p-value hacking,” 4) mishandling of clustering in cluster randomized trials, 5) misconceptions about nonparametric tests, 6) mishandling of missing data, 7) miscalculation of effect sizes, 8) ignoring regression to the mean, 9) ignoring confirmation bias, and 10) insufficient statistical reporting. We hope that discussion of these errors can improve the quality of obesity research by helping researchers to implement proper statistical practice and to know when to seek the help of a statistician. PMID:27028280

  17. A Statistical Method for Synthesizing Mediation Analyses Using the Product of Coefficient Approach Across Multiple Trials

    PubMed Central

    Huang, Shi; MacKinnon, David P.; Perrino, Tatiana; Gallo, Carlos; Cruden, Gracelyn; Brown, C Hendricks

    2016-01-01

    Mediation analysis often requires larger sample sizes than main effect analysis to achieve the same statistical power. Combining results across similar trials may be the only practical option for increasing statistical power for mediation analysis in some situations. In this paper, we propose a method to estimate: 1) marginal means for mediation path a, the relation of the independent variable to the mediator; 2) marginal means for path b, the relation of the mediator to the outcome, across multiple trials; and 3) the between-trial level variance-covariance matrix based on a bivariate normal distribution. We present the statistical theory and an R computer program to combine regression coefficients from multiple trials to estimate a combined mediated effect and confidence interval under a random effects model. Values of coefficients a and b, along with their standard errors from each trial are the input for the method. This marginal likelihood based approach with Monte Carlo confidence intervals provides more accurate inference than the standard meta-analytic approach. We discuss computational issues, apply the method to two real-data examples and make recommendations for the use of the method in different settings. PMID:28239330
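
    The Monte Carlo confidence interval mentioned above can be sketched for the simplest case of a single trial. This is only an illustration, not the authors' R program or their multi-trial marginal likelihood method: the function name `monte_carlo_mediation_ci`, the assumption that the estimates of a and b are independent and normally distributed, and the example inputs are all hypothetical.

```python
import random

def monte_carlo_mediation_ci(a, se_a, b, se_b, draws=20000, level=0.95, seed=1):
    """Percentile interval for the mediated effect a*b from simulated draws.

    Assumes (for illustration) that the estimates of a and b are independent
    and each approximately normal with the given standard errors.
    """
    rng = random.Random(seed)
    products = sorted(rng.gauss(a, se_a) * rng.gauss(b, se_b) for _ in range(draws))
    tail = (1.0 - level) / 2.0
    lower = products[int(tail * draws)]
    upper = products[int((1.0 - tail) * draws) - 1]
    return a * b, lower, upper

# Hypothetical path coefficients and standard errors from one trial.
effect, ci_lower, ci_upper = monte_carlo_mediation_ci(0.5, 0.1, 0.4, 0.1)
```

    Because the product of two normal variables is skewed, the resulting interval is generally not symmetric around a*b, which is one reason a simulation-based interval is used instead of a normal approximation.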

  18. Statistical Methods for Generalized Linear Models with Covariates Subject to Detection Limits.

    PubMed

    Bernhardt, Paul W; Wang, Huixia J; Zhang, Daowen

    2015-05-01

    Censored observations are a common occurrence in biomedical data sets. Although a large amount of research has been devoted to estimation and inference for data with censored responses, very little research has focused on proper statistical procedures when predictors are censored. In this paper, we consider statistical methods for dealing with multiple predictors subject to detection limits within the context of generalized linear models. We investigate and adapt several conventional methods and develop a new multiple imputation approach for analyzing data sets with predictors censored due to detection limits. We establish the consistency and asymptotic normality of the proposed multiple imputation estimator and suggest a computationally simple and consistent variance estimator. We also demonstrate that the conditional mean imputation method often leads to inconsistent estimates in generalized linear models, while several other methods are either computationally intensive or lead to parameter estimates that are biased or more variable compared to the proposed multiple imputation estimator. In an extensive simulation study, we assess the bias and variability of different approaches within the context of a logistic regression model and compare variance estimation methods for the proposed multiple imputation estimator. Lastly, we apply several methods to analyze the data set from a recently-conducted GenIMS study.

  19. [Factors of psychiatric treatment satisfaction in inpatients with neurotic and depressive disorders].

    PubMed

    Tsygankov, B D; Malygin, Ya V; Gatin, F F

    2015-01-01

    Factors of patients' satisfaction with medical care vary depending on the level of care and medical specialty. Patient satisfaction with psychiatric care is understudied. The aim of the present study was to identify the factors of satisfaction with psychiatric care in inpatients with neurotic and depressive disorders. The sample included 356 inpatients suffering from neurotic or depressive disorders. The patients were surveyed using a PAPI questionnaire designed for this study. Statistical analysis was performed using multiple regression. Key factors of satisfaction with medical care included the quality of work of nurses and psychiatrists, hospital ward comfort, the number and quality of psychotherapeutic sessions, and psychiatrists' empathy and aptitude to provide the patient with information about the disease and treatment. The multiple regression equation explained 81% of the variance in patients' satisfaction.

  20. Applied immuno-epidemiological research: an approach for integrating existing knowledge into the statistical analysis of multiple immune markers.

    PubMed

    Genser, Bernd; Fischer, Joachim E; Figueiredo, Camila A; Alcântara-Neves, Neuza; Barreto, Mauricio L; Cooper, Philip J; Amorim, Leila D; Saemann, Marcus D; Weichhart, Thomas; Rodrigues, Laura C

    2016-05-20

    Immunologists often measure several correlated immunological markers, such as concentrations of different cytokines produced by different immune cells and/or measured under different conditions, to draw insights from complex immunological mechanisms. Although there have been recent methodological efforts to improve the statistical analysis of immunological data, a framework is still needed for the simultaneous analysis of multiple, often correlated, immune markers. This framework would allow the immunologists' hypotheses about the underlying biological mechanisms to be integrated. We present an analytical approach for statistical analysis of correlated immune markers, such as those commonly collected in modern immuno-epidemiological studies. We demonstrate i) how to deal with interdependencies among multiple measurements of the same immune marker, ii) how to analyse association patterns among different markers, iii) how to aggregate different measures and/or markers to immunological summary scores, iv) how to model the inter-relationships among these scores, and v) how to use these scores in epidemiological association analyses. We illustrate the application of our approach to multiple cytokine measurements from 818 children enrolled in a large immuno-epidemiological study (SCAALA Salvador), which aimed to quantify the major immunological mechanisms underlying atopic diseases or asthma. We demonstrate how to aggregate systematically the information captured in multiple cytokine measurements to immunological summary scores aimed at reflecting the presumed underlying immunological mechanisms (Th1/Th2 balance and immune regulatory network). We show how these aggregated immune scores can be used as predictors in regression models with outcomes of immunological studies (e.g. specific IgE) and compare the results to those obtained by a traditional multivariate regression approach. 
    The proposed analytical approach may be especially useful to quantify complex immune responses in immuno-epidemiological studies, where investigators examine the relationship among epidemiological patterns, immune response, and disease outcomes.

  1. Experimental Investigations of Non-Stationary Properties In Radiometer Receivers Using Measurements of Multiple Calibration References

    NASA Technical Reports Server (NTRS)

    Racette, Paul; Lang, Roger; Zhang, Zhao-Nan; Zacharias, David; Krebs, Carolyn A. (Technical Monitor)

    2002-01-01

    Radiometers must be periodically calibrated because the receiver response fluctuates. Many techniques exist to correct for the time-varying response of a radiometer receiver. An analytical technique has been developed that uses generalized least squares regression (LSR) to predict the performance of a wide variety of calibration algorithms. The total measurement uncertainty including the uncertainty of the calibration can be computed using LSR. The uncertainties of the calibration samples used in the regression are based upon treating the receiver fluctuations as non-stationary processes. Signals originating from the different sources of emission are treated as simultaneously existing random processes. Thus, the radiometer output is a series of samples obtained from these random processes. The samples are treated as random variables but because the underlying processes are non-stationary the statistics of the samples are treated as non-stationary. The statistics of the calibration samples depend upon the time for which the samples are to be applied. The statistics of the random variables are equated to the mean statistics of the non-stationary processes over the interval defined by the time of the calibration sample and when it is applied. This analysis opens the opportunity for experimental investigation into the underlying properties of receiver non-stationarity through the use of multiple calibration references. In this presentation we will discuss the application of LSR to the analysis of various calibration algorithms, requirements for experimental verification of the theory, and preliminary results from analyzing experimental measurements.

  2. Improved spatial regression analysis of diffusion tensor imaging for lesion detection during longitudinal progression of multiple sclerosis in individual subjects

    NASA Astrophysics Data System (ADS)

    Liu, Bilan; Qiu, Xing; Zhu, Tong; Tian, Wei; Hu, Rui; Ekholm, Sven; Schifitto, Giovanni; Zhong, Jianhui

    2016-03-01

    Subject-specific longitudinal DTI study is vital for investigation of pathological changes of lesions and disease evolution. Spatial Regression Analysis of Diffusion tensor imaging (SPREAD) is a non-parametric permutation-based statistical framework that combines spatial regression and resampling techniques to achieve effective detection of localized longitudinal diffusion changes within the whole brain at individual level without a priori hypotheses. However, boundary blurring and dislocation limit its sensitivity, especially towards detecting lesions of irregular shapes. In the present study, we propose an improved SPREAD (dubbed improved SPREAD, or iSPREAD) method by incorporating a three-dimensional (3D) nonlinear anisotropic diffusion filtering method, which provides edge-preserving image smoothing through a nonlinear scale space approach. The statistical inference based on iSPREAD was evaluated and compared with the original SPREAD method using both simulated and in vivo human brain data. Results demonstrated that the sensitivity and accuracy of the SPREAD method has been improved substantially by adapting nonlinear anisotropic filtering. iSPREAD identifies subject-specific longitudinal changes in the brain with improved sensitivity, accuracy, and enhanced statistical power, especially when the spatial correlation is heterogeneous among neighboring image pixels in DTI.

  3. Antecedents of students' achievement in statistics

    NASA Astrophysics Data System (ADS)

    Awaludin, Izyan Syazana; Razak, Ruzanna Ab; Harris, Hezlin; Selamat, Zarehan

    2015-02-01

    Statistics is applied widely across most fields. Many degree programmes at local universities require students to enroll in at least one statistics course. The standard of these courses varies across degree programmes because students come from diverse academic backgrounds, some of which are far removed from the field of statistics. The high failure rate in statistics courses among non-science stream students has been a recurring concern every year. The purpose of this research is to investigate the antecedents of students' achievement in statistics. A total of 272 students participated in the survey. Multiple linear regression was applied to examine the relationship between the factors and achievement. We found that statistics anxiety was a significant predictor of students' achievement. We also found that students' age has a significant effect on achievement: older students are more likely to achieve lower scores in statistics. Students' level of study also has a significant impact on their achievement in statistics.

  4. SU-F-R-20: Image Texture Features Correlate with Time to Local Failure in Lung SBRT Patients

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Andrews, M; Abazeed, M; Woody, N

    Purpose: To explore possible correlation between CT image-based texture and histogram features and time-to-local-failure in early stage non-small cell lung cancer (NSCLC) patients treated with stereotactic body radiotherapy (SBRT). Methods and Materials: From an IRB-approved lung SBRT registry for patients treated between 2009–2013 we selected 48 (20 male, 28 female) patients with local failure. Median patient age was 72.3±10.3 years. Mean time to local failure was 15 ± 7.1 months. Physician-contoured gross tumor volumes (GTV) on the planning CT images were processed and 3D gray-level co-occurrence matrix (GLCM) based texture and histogram features were calculated in Matlab. Data were exported to R and a multiple linear regression model was used to examine the relationship between texture features and time-to-local-failure. Results: Multiple linear regression revealed that entropy (p=0.0233, multiple R²=0.60) from GLCM-based texture analysis and the standard deviation (p=0.0194, multiple R²=0.60) from the histogram-based features were statistically significantly correlated with the time-to-local-failure. Conclusion: Image-based texture analysis can be used to predict certain aspects of treatment outcomes of NSCLC patients treated with SBRT. We found entropy and standard deviation calculated for the GTV on the CT images displayed a statistically significant correlation with time-to-local-failure in lung SBRT patients.

  5. Meta-analysis of prediction model performance across multiple studies: Which scale helps ensure between-study normality for the C-statistic and calibration measures?

    PubMed

    Snell, Kym Ie; Ensor, Joie; Debray, Thomas Pa; Moons, Karel Gm; Riley, Richard D

    2017-01-01

    If individual participant data are available from multiple studies or clusters, then a prediction model can be externally validated multiple times. This allows the model's discrimination and calibration performance to be examined across different settings. Random-effects meta-analysis can then be used to quantify overall (average) performance and heterogeneity in performance. This typically assumes a normal distribution of 'true' performance across studies. We conducted a simulation study to examine this normality assumption for various performance measures relating to a logistic regression prediction model. We simulated data across multiple studies with varying degrees of variability in baseline risk or predictor effects and then evaluated the shape of the between-study distribution in the C-statistic, calibration slope, calibration-in-the-large, and E/O statistic, and possible transformations thereof. We found that a normal between-study distribution was usually reasonable for the calibration slope and calibration-in-the-large; however, the distributions of the C-statistic and E/O were often skewed across studies, particularly in settings with large variability in the predictor effects. Normality was vastly improved when using the logit transformation for the C-statistic and the log transformation for E/O, and therefore we recommend these scales to be used for meta-analysis. An illustrated example is given using a random-effects meta-analysis of the performance of QRISK2 across 25 general practices.
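
    The recommendation to pool the C-statistic on the logit scale can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' analysis: the function name `pool_c_statistics`, the delta-method standard error on the logit scale, the DerSimonian-Laird estimate of between-study variance, and the example inputs are all choices made here, since the abstract does not specify an estimator.

```python
import math

def logit(p):
    return math.log(p / (1.0 - p))

def inv_logit(x):
    return 1.0 / (1.0 + math.exp(-x))

def pool_c_statistics(c_values, se_values):
    """Random-effects pooling of C-statistics on the logit scale.

    Standard errors are moved to the logit scale with the delta method,
    se_logit = se / (c * (1 - c)); between-study variance uses the
    DerSimonian-Laird moment estimator (an assumption of this sketch).
    Returns (pooled C, lower 95% limit, upper 95% limit) on the original scale.
    """
    y = [logit(c) for c in c_values]
    v = [(se / (c * (1.0 - c))) ** 2 for c, se in zip(c_values, se_values)]
    w = [1.0 / vi for vi in v]
    y_fixed = sum(wi * yi for wi, yi in zip(w, y)) / sum(w)
    q = sum(wi * (yi - y_fixed) ** 2 for wi, yi in zip(w, y))
    scale = sum(w) - sum(wi ** 2 for wi in w) / sum(w)
    tau2 = max(0.0, (q - (len(y) - 1)) / scale)
    w_re = [1.0 / (vi + tau2) for vi in v]
    mu = sum(wi * yi for wi, yi in zip(w_re, y)) / sum(w_re)
    se_mu = math.sqrt(1.0 / sum(w_re))
    return inv_logit(mu), inv_logit(mu - 1.96 * se_mu), inv_logit(mu + 1.96 * se_mu)

# Hypothetical C-statistics and standard errors from three validation studies.
pooled_c, c_lower, c_upper = pool_c_statistics([0.70, 0.75, 0.80], [0.02, 0.03, 0.02])
```

    Back-transforming the limits with the inverse logit keeps the interval inside (0, 1), which a meta-analysis on the raw C-statistic scale does not guarantee.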

  6. Models for predicting the mass of lime fruits by some engineering properties.

    PubMed

    Miraei Ashtiani, Seyed-Hassan; Baradaran Motie, Jalal; Emadi, Bagher; Aghkhani, Mohammad-Hosein

    2014-11-01

    Grading fruits by mass is important in packaging: it reduces waste and increases the market value of agricultural produce. The aim of this study was to model the mass of two major cultivars of Iranian limes from their engineering attributes. Models were classified into three groups: (1) single and multiple variable regressions of lime mass on dimensional characteristics; (2) single and multiple variable regressions of lime mass on projected areas; (3) single regression of lime mass on its actual volume and on volumes calculated by assuming ellipsoid and prolate spheroid shapes. All properties considered in the current study were found to be statistically significant (p < 0.01). The results indicated that the models based on minor diameter and on first projected area were the most appropriate in the first and second classifications, respectively. In the third classification, the best model was obtained on the basis of the prolate spheroid volume. It was finally concluded that a suitable grading system for lime mass is one based on prolate spheroid volume.
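
    The third class of models, regressing mass on a calculated volume, might look like the sketch below. The volume formulas are the standard geometric ones for an ellipsoid and a prolate spheroid expressed in diameters; the function names, the fruit dimensions, and the masses are hypothetical and are not taken from the study.

```python
import math

def ellipsoid_volume(length, width, thickness):
    """Volume of an ellipsoid from its three mutually perpendicular diameters."""
    return math.pi / 6.0 * length * width * thickness

def prolate_spheroid_volume(major, minor):
    """Volume of a prolate spheroid from its major and minor diameters."""
    return math.pi / 6.0 * major * minor ** 2

def fit_line(x, y):
    """Ordinary least squares for y = intercept + slope * x."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return mean_y - slope * mean_x, slope

# Hypothetical fruits: (major, minor) diameters in cm and masses in g.
dimensions = [(4.0, 3.5), (4.5, 3.8), (5.0, 4.2), (5.4, 4.6)]
masses = [24.0, 32.0, 43.5, 56.0]
volumes = [prolate_spheroid_volume(major, minor) for major, minor in dimensions]
intercept, slope = fit_line(volumes, masses)
```

    A grading line would then predict mass from two measured diameters by way of `prolate_spheroid_volume`, without weighing each fruit.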

  7. A new multiple regression model to identify multi-family houses with a high prevalence of sick building symptoms "SBS", within the healthy sustainable house study in Stockholm (3H).

    PubMed

    Engvall, Karin; Hult, M; Corner, R; Lampa, E; Norbäck, D; Emenius, G

    2010-01-01

    The aim was to develop a new model to identify residential buildings with higher frequencies of "SBS" than expected, "risk buildings". In 2005, 481 multi-family buildings with 10,506 dwellings in Stockholm were studied by a new stratified random sampling. A standardised self-administered questionnaire was used to assess "SBS", atopy and personal factors. The response rate was 73%. Statistical analysis was performed by multiple logistic regression. Dwellers owning their building reported less "SBS" than those renting. There was a strong relationship between socio-economic factors and ownership. The regression model ended up with high explanatory values for age, gender, atopy and ownership. Applying our model, 9% of all residential buildings in Stockholm were classified as "risk buildings", with the highest proportion in houses built 1961-1975 (26%) and the lowest in houses built 1985-1990 (4%). To identify "risk buildings", it is necessary to adjust for ownership and population characteristics.

  8. Relationship between a Belief in a Just World and Social Justice Advocacy Attitudes of School Counselors

    ERIC Educational Resources Information Center

    Parikh, Sejal B.; Post, Phyllis; Flowers, Claudia

    2011-01-01

    The purpose of this study was to examine how belief in a just world (BJW), political ideology, religious ideology, socioeconomic status of origin, and race relate to social justice advocacy attitudes among school counseling professionals. A sequential multiple regression indicated that political ideology and BJW were statistically significant…

  9. Using the Graded Response Model to Control Spurious Interactions in Moderated Multiple Regression

    ERIC Educational Resources Information Center

    Morse, Brendan J.; Johanson, George A.; Griffeth, Rodger W.

    2012-01-01

    Recent simulation research has demonstrated that using simple raw score to operationalize a latent construct can result in inflated Type I error rates for the interaction term of a moderated statistical model when the interaction (or lack thereof) is proposed at the latent variable level. Rescaling the scores using an appropriate item response…

  10. Using Regression Analysis To Determine If Faculty Salaries Are Overly Compressed. AIR 1997 Annual Forum Paper.

    ERIC Educational Resources Information Center

    Toutkoushian, Robert K.

    This paper proposes a five-step process by which to analyze whether the salary ratio between junior and senior college faculty exhibits salary compression, a term used to describe an unusually small differential between faculty with different levels of experience. The procedure utilizes commonly used statistical techniques (multiple regression…

  11. Geographical variation in the spatial synchrony of a forest-defoliating insect: isolation of environmental and spatial drivers

    Treesearch

    Kyle J. Haynes; Ottar N. Bjornstad; Andrew J. Allstadt; Andrew M. Liebhold

    2012-01-01

    Despite the pervasiveness of spatial synchrony of population fluctuations in virtually every taxon, it remains difficult to disentangle its underlying mechanisms, such as environmental perturbations and dispersal. We used multiple regression of distance matrices (MRMs) to statistically partition the importance of several factors potentially synchronizing the dynamics...

  12. A Mixed-Methods Study Investigating the Relationship between Media Multitasking Orientation and Grade Point Average

    ERIC Educational Resources Information Center

    Lee, Jennifer

    2012-01-01

    The intent of this study was to examine the relationship between media multitasking orientation and grade point average. The study utilized a mixed-methods approach to investigate the research questions. In the quantitative section of the study, the primary method of statistical analyses was multiple regression. The independent variables for the…

  13. The Effect of Attending Tutoring on Course Grades in Calculus I

    ERIC Educational Resources Information Center

    Rickard, Brian; Mills, Melissa

    2018-01-01

    Tutoring centres are common in universities in the United States, but there are few published studies that statistically examine the effects of tutoring on student success. This study utilizes multiple regression analysis to model the effect of tutoring attendance on final course grades in Calculus I. Our model predicted that every three visits to…

  14. Helping Students Assess the Relative Importance of Different Intermolecular Interactions

    ERIC Educational Resources Information Center

    Jasien, Paul G.

    2008-01-01

    A semi-quantitative model has been developed to estimate the relative effects of dispersion, dipole-dipole interactions, and H-bonding on the normal boiling points ("T[subscript b]") for a subset of simple organic systems. The model is based upon a statistical analysis using multiple linear regression on a series of straight-chain organic…

  15. Prenatal Exposure to Alcohol, Caffeine, Tobacco, and Aspirin: Effects on Fine and Gross Motor Performance in 4-Year-Old Children.

    ERIC Educational Resources Information Center

    Barr, Helen M.; And Others

    1990-01-01

    Multiple regression analyses of data from 449 children indicated statistically significant relationships between moderate levels of prenatal alcohol exposure and increased errors, increased latency, and increased total time on the Wisconsin Fine Motor Steadiness Battery and poorer balance on the Gross Motor Scale. (RH)

  16. Estimation of aboveground biomass in Mediterranean forests by statistical modelling of ASTER fraction images

    NASA Astrophysics Data System (ADS)

    Fernández-Manso, O.; Fernández-Manso, A.; Quintano, C.

    2014-09-01

    Aboveground biomass (AGB) estimation from optical satellite data is usually based on regression models of original or synthetic bands. To overcome the poor relation between AGB and spectral bands due to mixed-pixels when a medium spatial resolution sensor is considered, we propose to base the AGB estimation on fraction images from Linear Spectral Mixture Analysis (LSMA). Our study area is a managed Mediterranean pine woodland (Pinus pinaster Ait.) in central Spain. A total of 1033 circular field plots were used to estimate AGB from Advanced Spaceborne Thermal Emission and Reflection Radiometer (ASTER) optical data. We applied Pearson correlation statistics and stepwise multiple regression to identify suitable predictors from the set of variables of original bands, fraction imagery, Normalized Difference Vegetation Index and Tasselled Cap components. Four linear models and one nonlinear model were tested. A linear combination of ASTER band 2 (red, 0.630-0.690 μm), band 8 (short wave infrared 5, 2.295-2.365 μm) and green vegetation fraction (from LSMA) was the best AGB predictor (adjusted R² = 0.632; the root-mean-squared error of estimated AGB was 13.3 Mg ha⁻¹ (or 37.7%), resulting from cross-validation), rather than other combinations of the above cited independent variables. Results indicated that using ASTER fraction images in regression models improves the AGB estimation in Mediterranean pine forests. The spatial distribution of the estimated AGB, based on a multiple linear regression model, may be used as baseline information for forest managers in future studies, such as quantifying the regional carbon budget, fuel accumulation or monitoring of management practices.

  17. Periodicity analysis of tourist arrivals to Banda Aceh using smoothing SARIMA approach

    NASA Astrophysics Data System (ADS)

    Miftahuddin; Helida, Desri; Sofyan, Hizir

    2017-11-01

    Forecasting the number of tourist arrivals to a region is needed for tourism businesses and for economic and industrial policy, so statistical modeling needs to be conducted. Banda Aceh is the capital of Aceh province, where much economic activity is driven by the services sector, including tourism. Therefore, predicting the number of tourist arrivals is needed to develop further policies. The identification results indicate that the data on foreign tourist arrivals to Banda Aceh contain trend and seasonal components. The number of arrivals is presumably influenced by external factors, such as economics, politics, and the holiday season, which cause structural breaks in the data. Trend patterns are detected using polynomial regression with quadratic and cubic approaches, while seasonal patterns are detected using periodic polynomial regression with quadratic and cubic approaches. To model data that have seasonal effects, one of the statistical methods that can be used is SARIMA (Seasonal Autoregressive Integrated Moving Average). The results showed that, with smoothing, the method for detecting the trend pattern is the cubic polynomial regression approach, with the modified model and a multiplicative periodicity of 12 months; the AIC value obtained was 70.52. The method for detecting the seasonal pattern is the periodic cubic polynomial regression approach, with the modified model and a multiplicative periodicity of 12 months; the AIC value obtained was 73.37. Furthermore, the best model for predicting the number of foreign tourist arrivals to Banda Aceh in 2017 to 2018 is SARIMA (0,1,1)(1,1,0), with a MAPE of 26%.
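
    Two pieces of the SARIMA workflow above are easy to show in isolation: the differencing implied by d = 1 and D = 1 with a 12-month period, and the MAPE used to score forecasts. The sketch below is only illustrative; it does not fit the moving-average or seasonal autoregressive terms, and the helper names and the synthetic arrival series are hypothetical.

```python
def difference(series, lag=1):
    """Return series[t] - series[t - lag] for every t >= lag."""
    return [series[t] - series[t - lag] for t in range(lag, len(series))]

def mape(actual, forecast):
    """Mean absolute percentage error in percent (actual values must be nonzero)."""
    return 100.0 * sum(abs((a - f) / a) for a, f in zip(actual, forecast)) / len(actual)

# Hypothetical monthly arrivals: a linear trend plus a fixed 12-month pattern.
pattern = [5, 3, 0, -2, -4, -1, 6, 8, 2, -3, -5, 4]
arrivals = [100 + 2 * t + pattern[t % 12] for t in range(48)]

# d = 1 and D = 1 with s = 12 correspond to applying (1 - B)(1 - B^12):
# a seasonal difference at lag 12 followed by a first difference.
stationary = difference(difference(arrivals, lag=12), lag=1)
```

    For this synthetic series the twice-differenced values are all zero, because the seasonal difference removes the repeating pattern and the first difference removes what is left of the trend; real arrival data would leave a residual series for the ARMA terms to model.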

  18. Modification of the USLE K factor for soil erodibility assessment on calcareous soils in Iran

    NASA Astrophysics Data System (ADS)

    Ostovari, Yaser; Ghorbani-Dashtaki, Shoja; Bahrami, Hossein-Ali; Naderi, Mehdi; Dematte, Jose Alexandre M.; Kerry, Ruth

    2016-11-01

    The measurement of soil erodibility (K) in the field is tedious, time-consuming and expensive; therefore, its prediction through pedotransfer functions (PTFs) could be far less costly and time-consuming. The aim of this study was to develop new PTFs to estimate the K factor using multiple linear regression, Mamdani fuzzy inference systems, and artificial neural networks. For this purpose, K was measured in 40 erosion plots with natural rainfall. Various soil properties including the soil particle size distribution, calcium carbonate equivalent, organic matter, permeability, and wet-aggregate stability were measured. The results showed that the mean measured K was 0.014 t h MJ⁻¹ mm⁻¹ and 2.08 times less than the estimated mean K (0.030 t h MJ⁻¹ mm⁻¹) using the USLE model. Permeability, wet-aggregate stability, very fine sand, and calcium carbonate were selected as independent variables by forward stepwise regression in order to assess the ability of multiple linear regression, Mamdani fuzzy inference systems and artificial neural networks to predict K. The calcium carbonate equivalent, which is not accounted for in the USLE model, had a significant impact on K in multiple linear regression due to its strong influence on the stability of aggregates and soil permeability. Statistical indices in validation and calibration datasets determined that the artificial neural networks method with the highest R², lowest RMSE, and lowest ME was the best model for estimating the K factor. A strong correlation (R² = 0.81, n = 40, p < 0.05) between the estimated K from multiple linear regression and measured K indicates that the use of calcium carbonate equivalent as a predictor variable gives a better estimation of K in areas with calcareous soils.
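
    The three indices used above to rank the models can be computed as in the following sketch. The definitions are assumptions, since the abstract does not spell them out: R² is taken as the coefficient of determination, RMSE as the root-mean-squared error, and ME as the mean error (predicted minus observed). The function name and the K values are hypothetical.

```python
import math

def fit_metrics(observed, predicted):
    """Return R^2 (coefficient of determination), RMSE and mean error (bias)."""
    n = len(observed)
    mean_observed = sum(observed) / n
    errors = [p - o for o, p in zip(observed, predicted)]
    ss_res = sum(e * e for e in errors)
    ss_tot = sum((o - mean_observed) ** 2 for o in observed)
    return {
        "r2": 1.0 - ss_res / ss_tot,
        "rmse": math.sqrt(ss_res / n),
        "me": sum(errors) / n,
    }

# Hypothetical measured and predicted K values (t h MJ^-1 mm^-1).
measured_k = [0.010, 0.014, 0.018]
predicted_k = [0.012, 0.014, 0.016]
metrics = fit_metrics(measured_k, predicted_k)
```

    Computing the same dictionary for each candidate model on the validation set gives a like-for-like comparison; a mean error near zero alongside a large RMSE indicates errors that cancel on average but are individually large.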

  19. A Critical Examination of Figure of Merit (FOM). Assessing the Goodness-of-Fit in Gamma/X-ray Peak Analysis

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Croft, S.; Favalli, Andrea; Weaver, Brian Phillip

    2015-10-06

    In this paper we develop and investigate several criteria for assessing how well a proposed spectral form fits observed spectra. We consider the classical improved figure of merit (FOM) along with several modifications, as well as criteria motivated by Poisson regression from the statistical literature. We also develop a new FOM that is based on the statistical idea of the bootstrap. A spectral simulator has been developed to assess the performance of these different criteria under multiple data configurations.

  20. Disconcordance in Statistical Models of Bisphenol A and Chronic Disease Outcomes in NHANES 2003-08

    PubMed Central

    Casey, Martin F.; Neidell, Matthew

    2013-01-01

    Background: Bisphenol A (BPA), a high production chemical commonly found in plastics, has drawn great attention from researchers due to the substance’s potential toxicity. Using data from three National Health and Nutrition Examination Survey (NHANES) cycles, we explored the consistency and robustness of BPA’s reported effects on coronary heart disease and diabetes. Methods and Findings: We report the use of three different statistical models in the analysis of BPA: (1) logistic regression, (2) log-linear regression, and (3) dose-response logistic regression. In each variation, confounders were added in six blocks to account for demographics, urinary creatinine, source of BPA exposure, healthy behaviours, and phthalate exposure. Results were sensitive to the variations in functional form of our statistical models, but no single model yielded consistent results across NHANES cycles. Reported ORs were also found to be sensitive to inclusion/exclusion criteria. Further, observed effects, which were most pronounced in NHANES 2003-04, could not be explained away by confounding. Conclusions: Limitations in the NHANES data and a poor understanding of the mode of action of BPA have made it difficult to develop informative statistical models. Given the sensitivity of effect estimates to functional form, researchers should report results using multiple specifications with different assumptions about BPA measurement, thus allowing for the identification of potential discrepancies in the data. PMID:24223205

  1. Comparing methods of analysing datasets with small clusters: case studies using four paediatric datasets.

    PubMed

    Marston, Louise; Peacock, Janet L; Yu, Keming; Brocklehurst, Peter; Calvert, Sandra A; Greenough, Anne; Marlow, Neil

    2009-07-01

    Studies of prematurely born infants contain a relatively large percentage of multiple births, so the resulting data have a hierarchical structure with small clusters of size 1, 2 or 3. Ignoring the clustering may lead to incorrect inferences. The aim of this study was to compare statistical methods which can be used to analyse such data: generalised estimating equations, multilevel models, multiple linear regression and logistic regression. Four datasets which differed in total size and in percentage of multiple births (n = 254, multiple 18%; n = 176, multiple 9%; n = 10 098, multiple 3%; n = 1585, multiple 8%) were analysed. With the continuous outcome, two-level models produced similar results in the larger dataset, while generalised least squares multilevel modelling (ML GLS 'xtreg' in Stata) and maximum likelihood multilevel modelling (ML MLE 'xtmixed' in Stata) produced divergent estimates using the smaller dataset. For the dichotomous outcome, most methods, except generalised least squares multilevel modelling (ML GH 'xtlogit' in Stata) gave similar odds ratios and 95% confidence intervals within datasets. For the continuous outcome, our results suggest using multilevel modelling. We conclude that generalised least squares multilevel modelling (ML GLS 'xtreg' in Stata) and maximum likelihood multilevel modelling (ML MLE 'xtmixed' in Stata) should be used with caution when the dataset is small. Where the outcome is dichotomous and there is a relatively large percentage of non-independent data, it is recommended that these are accounted for in analyses using logistic regression with adjusted standard errors or multilevel modelling. If, however, the dataset has a small percentage of clusters greater than size 1 (e.g. a population dataset of children where there are few multiples) there appears to be less need to adjust for clustering.

  2. The role of enamel thickness and refractive index on human tooth colour.

    PubMed

    Oguro, Rena; Nakajima, Masatoshi; Seki, Naoko; Sadr, Alireza; Tagami, Junji; Sumi, Yasunori

    2016-08-01

    To investigate the role of enamel thickness and refractive index (n) on tooth colour. The colour and enamel thickness of fifteen extracted human central incisors were determined according to CIELab colour scale using spectrophotometer (Crystaleye) and swept-source optical coherence tomography (SS-OCT), respectively. Subsequently, labial enamel was trimmed by approximately 100μm, and the colour and remaining enamel thickness were investigated again. This cycle was repeated until dentin appeared. Enamel blocks were prepared from the same teeth and their n were obtained using SS-OCT. Multiple regression analysis was performed to reveal any effects of enamel thickness and n on colour difference (ΔE00) and differences in colour parameters with CIELCh and CIELab colour scales. Multiple regression analysis revealed that enamel thickness (p=0.02) and n of enamel (p<0.001) were statistically significant predictors of ΔE00 after complete enamel trimming. The n was also a significant predictor of ΔH' (p=0.01). Enamel thickness and n were not statistically significant predictors of ΔL', ΔC', Δa* and Δb*. Enamel affected tooth colour, in which n was a statistically significant predictor for tooth colour change. Understanding the role of enamel in tooth colour could contribute to development of aesthetic restorative materials that mimic the colour of natural tooth with minimal reduction of the existing enamel. Copyright © 2016 Elsevier Ltd. All rights reserved.

  3. A comparison of large-scale climate signals and the North American Multi-Model Ensemble (NMME) for drought prediction in China

    NASA Astrophysics Data System (ADS)

    Xu, Lei; Chen, Nengcheng; Zhang, Xiang

    2018-02-01

    Drought is an extreme natural disaster that can lead to huge socioeconomic losses. Predicting drought months ahead is helpful for early drought warning and preparation. In this study, we developed a statistical model, two weighted dynamic models and a statistical-dynamic (hybrid) model for 1-6 month lead drought prediction in China. Specifically, the statistical component refers to climate signals weighted by support vector regression (SVR), the dynamic components consist of the ensemble mean (EM) and Bayesian model averaging (BMA) of the North American Multi-Model Ensemble (NMME) climatic models, and the hybrid part denotes a combination of the statistical and dynamic components, with weights assigned based on their historical performance. The results indicate that the statistical and hybrid models show better rainfall predictions than the NMME-EM and NMME-BMA models, which have good predictability only in southern China. In the 2011 China winter-spring drought event, the statistical model predicted the spatial extent and severity of drought nationwide well, although the severity was underestimated in the mid-lower reaches of the Yangtze River (MLRYR) region. The NMME-EM and NMME-BMA models largely overestimated rainfall in northern and western China in the 2011 drought. In the 2013 China summer drought, the NMME-EM model forecasted the drought extent and severity in eastern China well, while the statistical and hybrid models falsely detected negative precipitation anomaly (NPA) in some areas. Model ensembles such as multiple statistical approaches, multiple dynamic models or multiple hybrid models for drought predictions were highlighted. These conclusions may be helpful for drought prediction and early drought warnings in China.

  4. Prevalence of consistent condom use with various types of sex partners and associated factors among money boys in Changsha, China.

    PubMed

    Wang, Lian-Hong; Yan, Jin; Yang, Guo-Li; Long, Shuo; Yu, Yong; Wu, Xi-Lin

    2015-04-01

    Money boys with inconsistent condom use (less than 100% of the time) are at high risk of infection by human immunodeficiency virus (HIV) or sexually transmitted infection (STI), but relatively little research has examined their risk behaviors. We investigated the prevalence of consistent condom use (100% of the time) and associated factors among money boys. A cross-sectional study using a structured questionnaire was conducted among money boys in Changsha, China, between July 2012 and January 2013. Independent variables included socio-demographic data, substance abuse history, work characteristics, and self-reported HIV and STI history. Dependent variables included consistent condom use with different types of sex partners. Among the participants, 82.4% used condoms consistently with male clients, 80.2% with male sex partners, and 77.1% with female sex partners in the past 3 months. A multiple stepwise logistic regression model identified four statistically significant factors associated with lower likelihoods of consistent condom use with male clients: age group, substance abuse, lack of an "employment" arrangement, and having no HIV test within the prior 6 months. In a similar model for male sex partners, only one factor was significantly associated with lower likelihoods of consistent condom use: having no HIV test within the prior 6 months. As for female sex partners, two variables were statistically significant in the multiple stepwise logistic regression analysis: having no HIV test within the prior 6 months and having STI history. Interventions which are linked with more realistic and acceptable HIV prevention methods are greatly warranted and should increase risk awareness and the behavior of consistent condom use in both commercial and personal relationships. © 2015 International Society for Sexual Medicine.

  5. Malignant testicular tumour incidence and mortality trends

    PubMed Central

    Wojtyła-Buciora, Paulina; Więckowska, Barbara; Krzywinska-Wiewiorowska, Małgorzata; Gromadecka-Sutkiewicz, Małgorzata

    2016-01-01

    Aim of the study: In Poland testicular tumours are the most frequent cancer among men aged 20–44 years. Testicular tumour incidence since the 1980s and 1990s has been diversified geographically, with an increased risk of mortality in Wielkopolska Province, which was highlighted at the turn of the 1980s and 1990s. The aim of the study was the comparative analysis of the tendencies in incidence and death rates due to malignant testicular tumours observed among men in Poland and in Wielkopolska Province. Material and methods: Data from the National Cancer Registry were used for calculations. The incidence/mortality rates among men due to malignant testicular cancer as well as the tendencies in incidence/death ratio observed in Poland and Wielkopolska were established based on regression equation. The analysis was deepened by adopting the multiple linear regression model. A p-value < 0.05 was arbitrarily adopted as the criterion of statistical significance, and for multiple comparisons it was modified according to the Bonferroni adjustment to a value of p < 0.0028. Calculations were performed with the use of PQStat v1.4.8 package. Results: The incidence of malignant testicular neoplasms observed among men in Poland and in Wielkopolska Province indicated a significant rising tendency. The multiple linear regression model confirmed that the year variable is a strong incidence forecast factor only within the territory of Poland. A corresponding analysis of mortality rates among men in Poland and in Wielkopolska Province did not show any statistically significant correlations. Conclusions: Late diagnosis of Polish patients calls for undertaking appropriate educational activities that would facilitate earlier reporting of the patients, thus increasing their chances for recovery. Introducing preventive examinations in the regions of increased risk of testicular tumour may allow earlier diagnosis. PMID:27095941
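
    The Bonferroni adjustment mentioned in this abstract divides the overall significance level by the number of comparisons. The sketch below illustrates that arithmetic only. The count of 18 comparisons is an assumption, inferred because 0.05 / 18 rounds to the reported 0.0028; the abstract does not state it, and the function name is hypothetical.

```python
def bonferroni_threshold(alpha: float, n_comparisons: int) -> float:
    """Per-comparison significance threshold under the Bonferroni adjustment."""
    if n_comparisons < 1:
        raise ValueError("n_comparisons must be at least 1")
    return alpha / n_comparisons


# Assumption: 18 comparisons (not stated in the abstract; chosen because
# 0.05 / 18 is consistent with the reported threshold of p < 0.0028).
threshold = bonferroni_threshold(0.05, 18)
print(round(threshold, 4))  # 0.0028
```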

  6. Fundamental Analysis of the Linear Multiple Regression Technique for Quantification of Water Quality Parameters from Remote Sensing Data. Ph.D. Thesis - Old Dominion Univ.

    NASA Technical Reports Server (NTRS)

    Whitlock, C. H., III

    1977-01-01

    Constituents with linear radiance gradients with concentration may be quantified from signals which contain nonlinear atmospheric and surface reflection effects for both homogeneous and non-homogeneous water bodies provided accurate data can be obtained and nonlinearities are constant with wavelength. Statistical parameters must be used which give an indication of bias as well as total squared error to insure that an equation with an optimum combination of bands is selected. It is concluded that the effect of error in upwelled radiance measurements is to reduce the accuracy of the least square fitting process and to increase the number of points required to obtain a satisfactory fit. The problem of obtaining a multiple regression equation that is extremely sensitive to error is discussed.

  7. Health Service Access across Racial/Ethnic Groups of Children in the Child Welfare System

    ERIC Educational Resources Information Center

    Wells, Rebecca; Hillemeier, Marianne M.; Bai, Yu; Belue, Rhonda

    2009-01-01

    Objective: This study examined health service access among children of different racial/ethnic groups in the child welfare system in an attempt to identify and explain disparities. Methods: Data were from the National Survey of Child and Adolescent Well-Being (NSCAW). N for descriptive statistics = 2,505. N for multiple regression model = 537.…

  8. Application of Bayesian methods to habitat selection modeling of the northern spotted owl in California: new statistical methods for wildlife research

    Treesearch

    Howard B. Stauffer; Cynthia J. Zabel; Jeffrey R. Dunk

    2005-01-01

    We compared a set of competing logistic regression habitat selection models for Northern Spotted Owls (Strix occidentalis caurina) in California. The habitat selection models were estimated, compared, evaluated, and tested using multiple sample datasets collected on federal forestlands in northern California. We used Bayesian methods in interpreting...

  9. Response Rate and Teaching Effectiveness in Institutional Student Evaluation of Teaching: A Multiple Linear Regression Study

    ERIC Educational Resources Information Center

    Al-Maamari, Faisal

    2015-01-01

    It is important to consider the question of whether teacher-, course-, and student-related factors affect student ratings of instructors in Student Evaluation of Teaching (SET) in English Language Teaching (ELT). This paper reports on a statistical analysis of SET in two large EFL programmes at a university setting in the Sultanate of Oman. I…

  10. Groundwater-level prediction using multiple linear regression and artificial neural network techniques: a comparative assessment

    NASA Astrophysics Data System (ADS)

    Sahoo, Sasmita; Jha, Madan K.

    2013-12-01

    The potential of multiple linear regression (MLR) and artificial neural network (ANN) techniques in predicting transient water levels over a groundwater basin were compared. MLR and ANN modeling was carried out at 17 sites in Japan, considering all significant inputs: rainfall, ambient temperature, river stage, 11 seasonal dummy variables, and influential lags of rainfall, ambient temperature, river stage and groundwater level. Seventeen site-specific ANN models were developed, using multi-layer feed-forward neural networks trained with Levenberg-Marquardt backpropagation algorithms. The performance of the models was evaluated using statistical and graphical indicators. Comparison of the goodness-of-fit statistics of the MLR models with those of the ANN models indicated that there is better agreement between the ANN-predicted groundwater levels and the observed groundwater levels at all the sites, compared to the MLR. This finding was supported by the graphical indicators and the residual analysis. Thus, it is concluded that the ANN technique is superior to the MLR technique in predicting spatio-temporal distribution of groundwater levels in a basin. However, considering the practical advantages of the MLR technique, it is recommended as an alternative and cost-effective groundwater modeling tool.

  11. [Factors associated with physical activity among Chinese immigrant women].

    PubMed

    Cho, Sung-Hye; Lee, Hyeonkyeong

    2013-12-01

    This study was done to assess the level of physical activity among Chinese immigrant women and to determine the relationships of physical activity with individual characteristics and behavior-specific cognition. A cross-sectional descriptive study was conducted with 161 Chinese immigrant women living in Busan. A health promotion model of physical activity adapted from Pender's Health Promotion Model was used. Self-administered questionnaires were used to collect data during the period from September 25 to November 20, 2012. Using SPSS 18.0 program, descriptive statistics, t-test, analysis of variance, correlation analysis, and multiple regression analysis were done. The average level of physical activity of the Chinese immigrant women was 1,050.06 ± 686.47 MET-min/week and the minimum activity among types of physical activity was most dominant (59.6%). As a result of multiple regression analysis, it was confirmed that self-efficacy and acculturation were statistically significant variables in the model (p<.001), with an explanatory power of 23.7%. The results indicate that the development and application of intervention strategies to increase acculturation and self-efficacy for immigrant women will aid in increasing the physical activity in Chinese immigrant women.

  12. The relationship between quality of work life and turnover intention of primary health care nurses in Saudi Arabia.

    PubMed

    Almalki, Mohammed J; FitzGerald, Gerry; Clark, Michele

    2012-09-12

    Quality of work life (QWL) has been found to influence the commitment of health professionals, including nurses. However, reliable information on QWL and turnover intention of primary health care (PHC) nurses is limited. The aim of this study was to examine the relationship between QWL and turnover intention of PHC nurses in Saudi Arabia. A cross-sectional survey was used in this study. Data were collected using Brooks' survey of Quality of Nursing Work Life, the Anticipated Turnover Scale and demographic data questions. A total of 508 PHC nurses in the Jazan Region, Saudi Arabia, completed the questionnaire (RR = 87%). Descriptive statistics, t-test, ANOVA, General Linear Model (GLM) univariate analysis, standard multiple regression, and hierarchical multiple regression were applied for analysis using SPSS v17 for Windows. Findings suggested that the respondents were dissatisfied with their work life, with almost 40% indicating a turnover intention from their current PHC centres. Turnover intention was significantly related to QWL. Using standard multiple regression, 26% of the variance in turnover intention was explained by QWL, p < 0.001, with R2 = .263. Further analysis using hierarchical multiple regression found that the total variance explained by the model as a whole (demographics and QWL) was 32.1%, p < 0.001. QWL explained an additional 19% of the variance in turnover intention, after controlling for demographic variables. Creating and maintaining a healthy work life for PHC nurses is very important to improve their work satisfaction, reduce turnover, enhance productivity and improve nursing care outcomes.
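
    The hierarchical regression figures above can be related through the change in R². The sketch below shows that arithmetic and the standard F-change formula. The predictor counts passed to `f_change` are hypothetical, since the abstract does not report how many demographic or QWL predictors entered the model, so no F value from the study is implied.

```python
def f_change(r2_full: float, r2_reduced: float, n: int, k_full: int, k_added: int) -> float:
    """F statistic for the R-squared increase when k_added predictors join a model.

    n is the sample size and k_full the number of predictors in the full model.
    """
    numerator = (r2_full - r2_reduced) / k_added
    denominator = (1.0 - r2_full) / (n - k_full - 1)
    return numerator / denominator


r2_full = 0.321   # demographics + QWL, from the abstract
delta_r2 = 0.19   # additional variance explained by QWL, from the abstract
implied_demographics_r2 = r2_full - delta_r2
print(round(implied_demographics_r2, 3))  # 0.131

# Hypothetical predictor counts (k_full=6, k_added=1); n=508 is from the abstract.
print(f_change(r2_full, implied_demographics_r2, n=508, k_full=6, k_added=1))
```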

  13. The relationship between quality of work life and turnover intention of primary health care nurses in Saudi Arabia

    PubMed Central

    2012-01-01

    Background: Quality of work life (QWL) has been found to influence the commitment of health professionals, including nurses. However, reliable information on QWL and turnover intention of primary health care (PHC) nurses is limited. The aim of this study was to examine the relationship between QWL and turnover intention of PHC nurses in Saudi Arabia. Methods: A cross-sectional survey was used in this study. Data were collected using Brooks’ survey of Quality of Nursing Work Life, the Anticipated Turnover Scale and demographic data questions. A total of 508 PHC nurses in the Jazan Region, Saudi Arabia, completed the questionnaire (RR = 87%). Descriptive statistics, t-test, ANOVA, General Linear Model (GLM) univariate analysis, standard multiple regression, and hierarchical multiple regression were applied for analysis using SPSS v17 for Windows. Results: Findings suggested that the respondents were dissatisfied with their work life, with almost 40% indicating a turnover intention from their current PHC centres. Turnover intention was significantly related to QWL. Using standard multiple regression, 26% of the variance in turnover intention was explained by QWL, p < 0.001, with R2 = .263. Further analysis using hierarchical multiple regression found that the total variance explained by the model as a whole (demographics and QWL) was 32.1%, p < 0.001. QWL explained an additional 19% of the variance in turnover intention, after controlling for demographic variables. Conclusions: Creating and maintaining a healthy work life for PHC nurses is very important to improve their work satisfaction, reduce turnover, enhance productivity and improve nursing care outcomes. PMID:22970764

  14. High-throughput quantitative biochemical characterization of algal biomass by NIR spectroscopy; multiple linear regression and multivariate linear regression analysis.

    PubMed

    Laurens, L M L; Wolfrum, E J

    2013-12-18

    One of the challenges associated with microalgal biomass characterization and the comparison of microalgal strains and conversion processes is the rapid determination of the composition of algae. We have developed and applied a high-throughput screening technology based on near-infrared (NIR) spectroscopy for the rapid and accurate determination of algal biomass composition. We show that NIR spectroscopy can accurately predict the full composition using multivariate linear regression analysis of varying lipid, protein, and carbohydrate content of algal biomass samples from three strains. We also demonstrate a high quality of predictions of an independent validation set. A high-throughput 96-well configuration for spectroscopy gives equally good prediction relative to a ring-cup configuration, and thus, spectra can be obtained from as little as 10-20 mg of material. We found that lipids exhibit a dominant, distinct, and unique fingerprint in the NIR spectrum that allows for the use of single and multiple linear regression of respective wavelengths for the prediction of the biomass lipid content. This is not the case for carbohydrate and protein content, and thus, the use of multivariate statistical modeling approaches remains necessary.

  15. Using multiple logistic regression and GIS technology to predict landslide hazard in northeast Kansas, USA

    USGS Publications Warehouse

    Ohlmacher, G.C.; Davis, J.C.

    2003-01-01

    Landslides in the hilly terrain along the Kansas and Missouri rivers in northeastern Kansas have caused millions of dollars in property damage during the last decade. To address this problem, a statistical method called multiple logistic regression has been used to create a landslide-hazard map for Atchison, Kansas, and surrounding areas. Data included digitized geology, slopes, and landslides, manipulated using ArcView GIS. Logistic regression relates predictor variables to the occurrence or nonoccurrence of landslides within geographic cells and uses the relationship to produce a map showing the probability of future landslides, given local slopes and geologic units. Results indicated that slope is the most important variable for estimating landslide hazard in the study area. Geologic units consisting mostly of shale, siltstone, and sandstone were most susceptible to landslides. Soil type and aspect ratio were considered but excluded from the final analysis because these variables did not significantly add to the predictive power of the logistic regression. Soil types were highly correlated with the geologic units, and no significant relationships existed between landslides and slope aspect. © 2003 Elsevier Science B.V. All rights reserved.
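
    To make the mapping step concrete, the sketch below shows how a fitted logistic regression turns a cell's predictors into a probability. The coefficients, the single geologic-unit indicator, and the function name are all hypothetical placeholders; the abstract does not report the fitted model.

```python
import math


def landslide_probability(
    slope_deg: float,
    susceptible_unit: bool,
    b0: float = -6.0,      # hypothetical intercept
    b_slope: float = 0.25, # hypothetical coefficient per degree of slope
    b_unit: float = 1.5,   # hypothetical coefficient for a susceptible geologic unit
) -> float:
    """Probability of landslide occurrence in one cell under a logistic model."""
    z = b0 + b_slope * slope_deg + b_unit * (1.0 if susceptible_unit else 0.0)
    return 1.0 / (1.0 + math.exp(-z))


# Illustrative cells only; a hazard map would apply this to every grid cell.
for slope, unit in [(5.0, False), (24.0, False), (24.0, True)]:
    print(slope, unit, round(landslide_probability(slope, unit), 3))
```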

  16. Computing mammographic density from a multiple regression model constructed with image-acquisition parameters from a full-field digital mammographic unit

    PubMed Central

    Lu, Lee-Jane W.; Nishino, Thomas K.; Khamapirad, Tuenchit; Grady, James J; Leonard, Morton H.; Brunder, Donald G.

    2009-01-01

    Breast density (the percentage of fibroglandular tissue in the breast) has been suggested to be a useful surrogate marker for breast cancer risk. It is conventionally measured using screen-film mammographic images by a labor intensive histogram segmentation method (HSM). We have adapted and modified the HSM for measuring breast density from raw digital mammograms acquired by full-field digital mammography. Multiple regression model analyses showed that many of the instrument parameters for acquiring the screening mammograms (e.g. breast compression thickness, radiological thickness, radiation dose, compression force, etc) and image pixel intensity statistics of the imaged breasts were strong predictors of the observed threshold values (model R2=0.93) and %density (R2=0.84). The intra-class correlation coefficient of the %-density for duplicate images was estimated to be 0.80, using the regression model-derived threshold values, and 0.94 if estimated directly from the parameter estimates of the %-density prediction regression model. Therefore, with additional research, these mathematical models could be used to compute breast density objectively, automatically bypassing the HSM step, and could greatly facilitate breast cancer research studies. PMID:17671343

  17. Multiple regression technique for Pth degree polynominals with and without linear cross products

    NASA Technical Reports Server (NTRS)

    Davis, J. W.

    1973-01-01

    A multiple regression technique was developed by which the nonlinear behavior of specified independent variables can be related to a given dependent variable. The polynomial expression can be of Pth degree and can incorporate N independent variables. Two cases are treated such that mathematical models can be studied both with and without linear cross products. The resulting surface fits can be used to summarize trends for a given phenomenon and provide a mathematical relationship for subsequent analysis. To implement this technique, separate computer programs were developed for the case without linear cross products and for the case incorporating such cross products which evaluate the various constants in the model regression equation. In addition, the significance of the estimated regression equation is considered and the standard deviation, the F statistic, the maximum absolute percent error, and the average of the absolute values of the percent of error evaluated. The computer programs and their manner of utilization are described. Sample problems are included to illustrate the use and capability of the technique which show the output formats and typical plots comparing computer results to each set of input data.
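
    One way to picture the two model forms is through the terms that make up each row of the regression design matrix. The sketch below assumes that "linear cross products" means pairwise first-order products x_i * x_j; the report may define them differently, and the function name is hypothetical.

```python
from itertools import combinations


def design_row(x: list[float], degree: int, cross_products: bool = False) -> list[float]:
    """Build one design-matrix row: intercept, powers 1..degree of each variable,
    and optionally the pairwise products of distinct variables."""
    row = [1.0]
    for xi in x:
        for p in range(1, degree + 1):
            row.append(xi ** p)
    if cross_products:
        # Assumed interpretation of "linear cross products": x_i * x_j for i < j.
        for xi, xj in combinations(x, 2):
            row.append(xi * xj)
    return row


print(design_row([2.0, 3.0], degree=2))
# [1.0, 2.0, 4.0, 3.0, 9.0]
print(design_row([2.0, 3.0], degree=2, cross_products=True))
# [1.0, 2.0, 4.0, 3.0, 9.0, 6.0]
```

    Stacking such rows for every observation gives the matrix on which an ordinary least-squares fit would estimate the constants in the regression equation.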

  18. A Study of the Effect of the Front-End Styling of Sport Utility Vehicles on Pedestrian Head Injuries

    PubMed Central

    Qin, Qin; Chen, Zheng; Bai, Zhonghao; Cao, Libo

    2018-01-01

    Background: The number of sport utility vehicles (SUVs) on the Chinese market is continuously increasing. It is necessary to investigate the relationships between the front-end styling features of SUVs and head injuries at the styling design stage to improve pedestrian protection performance and product development efficiency. Methods: Styling feature parameters were extracted from the SUV side contour line, and simplified finite element models were established based on 78 SUV side contour lines. Pedestrian headform impact simulations were performed and validated. The head injury criterion of 15 ms (HIC15) at four wrap-around distances was obtained. A multiple linear regression analysis method was employed to describe the relationships between the styling feature parameters and the HIC15 at each impact point. Results: The relationship between the selected styling features and the HIC15 showed reasonable correlations, and the regression models and the selected independent variables showed statistical significance. Conclusions: The regression equations obtained by multiple linear regression can be used to assess the performance of SUV styling in protecting pedestrians' heads and provide styling designers with technical guidance regarding their artistic creations.

  19. Treatments of Missing Values in Large National Data Affect Conclusions: The Impact of Multiple Imputation on Arthroplasty Research.

    PubMed

    Ondeck, Nathaniel T; Fu, Michael C; Skrip, Laura A; McLynn, Ryan P; Su, Edwin P; Grauer, Jonathan N

    2018-03-01

    Despite the advantages of large, national datasets, one continuing concern is missing data values. Complete case analysis, where only cases with complete data are analyzed, is commonly used rather than more statistically rigorous approaches such as multiple imputation. This study characterizes the potential selection bias introduced using complete case analysis and compares the results of common regressions using both techniques following unicompartmental knee arthroplasty. Patients undergoing unicompartmental knee arthroplasty were extracted from the 2005 to 2015 National Surgical Quality Improvement Program. As examples, the demographics of patients with and without missing preoperative albumin and hematocrit values were compared. Missing data were then treated with both complete case analysis and multiple imputation (an approach that reproduces the variation and associations that would have been present in a full dataset) and the conclusions of common regressions for adverse outcomes were compared. A total of 6117 patients were included, of which 56.7% were missing at least one value. Younger, female, and healthier patients were more likely to have missing preoperative albumin and hematocrit values. The use of complete case analysis removed 3467 patients from the study in comparison with multiple imputation which included all 6117 patients. The 2 methods of handling missing values led to differing associations of low preoperative laboratory values with commonly studied adverse outcomes. The use of complete case analysis can introduce selection bias and may lead to different conclusions in comparison with the statistically rigorous multiple imputation approach. Joint surgeons should consider the methods of handling missing values when interpreting arthroplasty research. Copyright © 2017 Elsevier Inc. All rights reserved.
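
    The sketch below illustrates what complete case analysis does to a dataset and checks that the counts in the abstract agree with the reported percentage. The toy records and the function name are invented for illustration. Multiple imputation itself is not implemented here, since it needs a statistical model for the missing values.

```python
def complete_cases(records: list[dict], required: list[str]) -> list[dict]:
    """Keep only records where every required field is present (not None)."""
    return [r for r in records if all(r.get(k) is not None for k in required)]


# Toy records, not data from the study.
toy = [
    {"id": 1, "albumin": 4.1, "hematocrit": 41.0},
    {"id": 2, "albumin": None, "hematocrit": 38.5},
    {"id": 3, "albumin": 3.9, "hematocrit": None},
    {"id": 4, "albumin": 4.4, "hematocrit": 44.2},
]
kept = complete_cases(toy, ["albumin", "hematocrit"])
print([r["id"] for r in kept])  # [1, 4]

# Consistency check on the abstract's own numbers: 3467 of 6117 patients removed.
print(round(100 * 3467 / 6117, 1))  # 56.7
```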

  20. Efficient Regressions via Optimally Combining Quantile Information*

    PubMed Central

    Zhao, Zhibiao; Xiao, Zhijie

    2014-01-01

    We develop a generally applicable framework for constructing efficient estimators of regression models via quantile regressions. The proposed method is based on optimally combining information over multiple quantiles and can be applied to a broad range of parametric and nonparametric settings. When combining information over a fixed number of quantiles, we derive an upper bound on the distance between the efficiency of the proposed estimator and the Fisher information. As the number of quantiles increases, this upper bound decreases and the asymptotic variance of the proposed estimator approaches the Cramér-Rao lower bound under appropriate conditions. In the case of non-regular statistical estimation, the proposed estimator leads to super-efficient estimation. We illustrate the proposed method for several widely used regression models. Both asymptotic theory and Monte Carlo experiments show the superior performance over existing methods. PMID:25484481

  1. VoxelStats: A MATLAB Package for Multi-Modal Voxel-Wise Brain Image Analysis.

    PubMed

    Mathotaarachchi, Sulantha; Wang, Seqian; Shin, Monica; Pascoal, Tharick A; Benedet, Andrea L; Kang, Min Su; Beaudry, Thomas; Fonov, Vladimir S; Gauthier, Serge; Labbe, Aurélie; Rosa-Neto, Pedro

    2016-01-01

    In healthy individuals, behavioral outcomes are highly associated with the variability on brain regional structure or neurochemical phenotypes. Similarly, in the context of neurodegenerative conditions, neuroimaging reveals that cognitive decline is linked to the magnitude of atrophy, neurochemical declines, or concentrations of abnormal protein aggregates across brain regions. However, modeling the effects of multiple regional abnormalities as determinants of cognitive decline at the voxel level remains largely unexplored by multimodal imaging research, given the high computational cost of estimating regression models for every single voxel from various imaging modalities. VoxelStats is a voxel-wise computational framework to overcome these computational limitations and to perform statistical operations on multiple scalar variables and imaging modalities at the voxel level. VoxelStats package has been developed in MATLAB® and supports imaging formats such as Nifti-1, ANALYZE, and MINC v2. Prebuilt functions in VoxelStats enable the user to perform voxel-wise general and generalized linear models and mixed effect models with multiple volumetric covariates. Importantly, VoxelStats can recognize scalar values or image volumes as response variables and can accommodate volumetric statistical covariates as well as their interaction effects with other variables. Furthermore, this package includes built-in functionality to perform voxel-wise receiver operating characteristic analysis and paired and unpaired group contrast analysis. Validation of VoxelStats was conducted by comparing the linear regression functionality with existing toolboxes such as glim_image and RMINC. The validation results were identical to existing methods and the additional functionality was demonstrated by generating feature case assessments (t-statistics, odds ratio, and true positive rate maps). In summary, VoxelStats expands the current methods for multimodal imaging analysis by allowing the estimation of advanced regional association metrics at the voxel level.

  2. HAPRAP: a haplotype-based iterative method for statistical fine mapping using GWAS summary statistics.

    PubMed

    Zheng, Jie; Rodriguez, Santiago; Laurin, Charles; Baird, Denis; Trela-Larsen, Lea; Erzurumluoglu, Mesut A; Zheng, Yi; White, Jon; Giambartolomei, Claudia; Zabaneh, Delilah; Morris, Richard; Kumari, Meena; Casas, Juan P; Hingorani, Aroon D; Evans, David M; Gaunt, Tom R; Day, Ian N M

    2017-01-01

    Fine mapping is a widely used approach for identifying the causal variant(s) at disease-associated loci. Standard methods (e.g. multiple regression) require individual level genotypes. Recent fine mapping methods using summary-level data require the pairwise correlation coefficients ([Formula: see text]) of the variants. However, haplotypes rather than pairwise [Formula: see text], are the true biological representation of linkage disequilibrium (LD) among multiple loci. In this article, we present an empirical iterative method, HAPlotype Regional Association analysis Program (HAPRAP), that enables fine mapping using summary statistics and haplotype information from an individual-level reference panel. Simulations with individual-level genotypes show that the results of HAPRAP and multiple regression are highly consistent. In simulation with summary-level data, we demonstrate that HAPRAP is less sensitive to poor LD estimates. In a parametric simulation using Genetic Investigation of ANthropometric Traits height data, HAPRAP performs well with a small training sample size (N < 2000) while other methods become suboptimal. Moreover, HAPRAP's performance is not affected substantially by single nucleotide polymorphisms (SNPs) with low minor allele frequencies. We applied the method to existing quantitative trait and binary outcome meta-analyses (human height, QTc interval and gallbladder disease); all previous reported association signals were replicated and two additional variants were independently associated with human height. Due to the growing availability of summary level data, the value of HAPRAP is likely to increase markedly for future analyses (e.g. functional prediction and identification of instruments for Mendelian randomization). 
The HAPRAP package and documentation are available at http://apps.biocompute.org.uk/haprap/. Contact: jie.zheng@bristol.ac.uk or tom.gaunt@bristol.ac.uk. Supplementary information: Supplementary data are available at Bioinformatics online. © The Author 2016. Published by Oxford University Press.

  3. A method to estimate the contribution of regional genetic associations to complex traits from summary association statistics.

    PubMed

    Pare, Guillaume; Mao, Shihong; Deng, Wei Q

    2016-06-08

    Despite considerable efforts, known genetic associations only explain a small fraction of predicted heritability. Regional associations combine information from multiple contiguous genetic variants and can improve variance explained at established association loci. However, regional associations are not easily amenable to estimation using summary association statistics because of sensitivity to linkage disequilibrium (LD). We now propose a novel method, LD Adjusted Regional Genetic Variance (LARGV), to estimate phenotypic variance explained by regional associations using summary statistics while accounting for LD. Our method is asymptotically equivalent to a multiple linear regression model when no interaction or haplotype effects are present. It has several applications, such as ranking of genetic regions according to variance explained or comparison of variance explained by two or more regions. Using height and BMI data from the Health Retirement Study (N = 7,776), we show that most genetic variance lies in a small proportion of the genome and that previously identified linkage peaks have higher than expected regional variance.

  4. A method to estimate the contribution of regional genetic associations to complex traits from summary association statistics

    PubMed Central

    Pare, Guillaume; Mao, Shihong; Deng, Wei Q.

    2016-01-01

    Despite considerable efforts, known genetic associations only explain a small fraction of predicted heritability. Regional associations combine information from multiple contiguous genetic variants and can improve variance explained at established association loci. However, regional associations are not easily amenable to estimation using summary association statistics because of sensitivity to linkage disequilibrium (LD). We now propose a novel method, LD Adjusted Regional Genetic Variance (LARGV), to estimate phenotypic variance explained by regional associations using summary statistics while accounting for LD. Our method is asymptotically equivalent to a multiple linear regression model when no interaction or haplotype effects are present. It has several applications, such as ranking of genetic regions according to variance explained or comparison of variance explained by two or more regions. Using height and BMI data from the Health Retirement Study (N = 7,776), we show that most genetic variance lies in a small proportion of the genome and that previously identified linkage peaks have higher than expected regional variance. PMID:27273519

  5. Relationship between academic motivation and mathematics achievement among Indian adolescents in Canada and India.

    PubMed

    Areepattamannil, Shaljan

    2014-01-01

    This study examined the relationships between academic motivation (intrinsic motivation, extrinsic motivation, amotivation) and mathematics achievement among 363 Indian adolescents in India and 355 Indian immigrant adolescents in Canada. Results of hierarchical multiple regression analyses showed that intrinsic motivation, extrinsic motivation, and amotivation were not statistically significantly related to mathematics achievement among Indian adolescents in India. In contrast, both intrinsic motivation and extrinsic motivation were statistically significantly related to mathematics achievement among Indian immigrant adolescents in Canada. While intrinsic motivation was a statistically significant positive predictor of mathematics achievement among Indian immigrant adolescents in Canada, extrinsic motivation was a statistically significant negative predictor of mathematics achievement among Indian immigrant adolescents in Canada. Amotivation was not statistically significantly related to mathematics achievement among Indian immigrant adolescents in Canada. Implications of the findings for pedagogy and practice are discussed.

  6. Comment on "Cosmic-ray-driven reaction and greenhouse effect of halogenated molecules: Culprits for atmospheric ozone depletion and global climate change"

    NASA Astrophysics Data System (ADS)

    Nuccitelli, Dana; Cowtan, Kevin; Jacobs, Peter; Richardson, Mark; Way, Robert G.; Blackburn, Anne-Marie; Stolpe, Martin B.; Cook, John

    2014-04-01

    Lu (2013) (L13) argued that solar effects and anthropogenic halogenated gases can explain most of the observed warming of global mean surface air temperatures since 1850, with virtually no contribution from atmospheric carbon dioxide (CO2) concentrations. Here we show that this conclusion is based on assumptions about the saturation of the CO2-induced greenhouse effect that have been experimentally falsified. L13 also confuses equilibrium and transient response, and relies on data sources that have been superseded due to known inaccuracies. Furthermore, the statistical approach of sequential linear regression artificially shifts variance onto the first predictor. L13's artificial choice of regression order and neglect of other relevant data is the fundamental cause of the incorrect main conclusion. Consideration of more modern data and a more parsimonious multiple regression model leads to contradiction with L13's statistical results. Finally, the correlation arguments in L13 are falsified by considering either the more appropriate metric of global heat accumulation, or data on longer timescales.
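The point about sequential regression shifting variance onto the first predictor can be illustrated with a small sketch. It assumes that "sequential" means regressing the response on the first predictor alone and then regressing the leftover residuals on the second; the data and function names below are invented for illustration and are not taken from L13 or from the comment.

```python
# Toy data (hypothetical): two correlated predictors that contribute equally.
x1 = [1, 2, 3, 4, 5, 6]
x2 = [1, 3, 2, 5, 4, 6]
y = [a + b for a, b in zip(x1, x2)]  # true coefficients are 1 and 1


def center(values):
    """Subtract the mean from every value."""
    m = sum(values) / len(values)
    return [v - m for v in values]


def dot(a, b):
    return sum(p * q for p, q in zip(a, b))


def simple_slope(x, response):
    """Slope of a one-predictor least-squares fit (with intercept)."""
    xc, rc = center(x), center(response)
    return dot(xc, rc) / dot(xc, xc)


def sequential_slopes(first, second, response):
    """Fit the first predictor alone, then fit its residuals on the second."""
    b_first = simple_slope(first, response)
    resid = [r - b_first * f for f, r in zip(first, response)]
    b_second = simple_slope(second, resid)
    return b_first, b_second


def multiple_slopes(first, second, response):
    """Joint least-squares slopes for two predictors (2x2 normal equations)."""
    a, b, rc = center(first), center(second), center(response)
    s11, s22, s12 = dot(a, a), dot(b, b), dot(a, b)
    s1y, s2y = dot(a, rc), dot(b, rc)
    det = s11 * s22 - s12 * s12
    return (s22 * s1y - s12 * s2y) / det, (s11 * s2y - s12 * s1y) / det


seq = sequential_slopes(x1, x2, y)    # about (1.89, 0.22): first predictor inflated
joint = multiple_slopes(x1, x2, y)    # (1.0, 1.0) up to rounding
print("sequential:", seq)
print("multiple:  ", joint)
```

Swapping the order of `x1` and `x2` in `sequential_slopes` inflates whichever predictor comes first, which is why the choice of regression order matters when predictors are correlated.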

  7. A methodology for the design of experiments in computational intelligence with multiple regression models.

    PubMed

    Fernandez-Lozano, Carlos; Gestal, Marcos; Munteanu, Cristian R; Dorado, Julian; Pazos, Alejandro

    2016-01-01

    The design of experiments and the validation of the results achieved with them are vital in any research study. This paper focuses on the use of different Machine Learning approaches for regression tasks in the field of Computational Intelligence and especially on a correct comparison between the different results provided for different methods, as those techniques are complex systems that require further study to be fully understood. A methodology commonly accepted in Computational intelligence is implemented in an R package called RRegrs. This package includes ten simple and complex regression models to carry out predictive modeling using Machine Learning and well-known regression algorithms. The framework for experimental design presented herein is evaluated and validated against RRegrs. Our results are different for three out of five state-of-the-art simple datasets and it can be stated that the selection of the best model according to our proposal is statistically significant and relevant. It is of relevance to use a statistical approach to indicate whether the differences are statistically significant using this kind of algorithms. Furthermore, our results with three real complex datasets report different best models than with the previously published methodology. Our final goal is to provide a complete methodology for the use of different steps in order to compare the results obtained in Computational Intelligence problems, as well as from other fields, such as for bioinformatics, cheminformatics, etc., given that our proposal is open and modifiable.

  8. A methodology for the design of experiments in computational intelligence with multiple regression models

    PubMed Central

    Gestal, Marcos; Munteanu, Cristian R.; Dorado, Julian; Pazos, Alejandro

    2016-01-01

    The design of experiments and the validation of the results achieved with them are vital in any research study. This paper focuses on the use of different Machine Learning approaches for regression tasks in the field of Computational Intelligence and especially on a correct comparison between the different results provided for different methods, as those techniques are complex systems that require further study to be fully understood. A methodology commonly accepted in Computational intelligence is implemented in an R package called RRegrs. This package includes ten simple and complex regression models to carry out predictive modeling using Machine Learning and well-known regression algorithms. The framework for experimental design presented herein is evaluated and validated against RRegrs. Our results are different for three out of five state-of-the-art simple datasets and it can be stated that the selection of the best model according to our proposal is statistically significant and relevant. It is of relevance to use a statistical approach to indicate whether the differences are statistically significant using this kind of algorithms. Furthermore, our results with three real complex datasets report different best models than with the previously published methodology. Our final goal is to provide a complete methodology for the use of different steps in order to compare the results obtained in Computational Intelligence problems, as well as from other fields, such as for bioinformatics, cheminformatics, etc., given that our proposal is open and modifiable. PMID:27920952

  9. The extraction of simple relationships in growth factor-specific multiple-input and multiple-output systems in cell-fate decisions by backward elimination PLS regression.

    PubMed

    Akimoto, Yuki; Yugi, Katsuyuki; Uda, Shinsuke; Kudo, Takamasa; Komori, Yasunori; Kubota, Hiroyuki; Kuroda, Shinya

    2013-01-01

    Cells use common signaling molecules for the selective control of downstream gene expression and cell-fate decisions. The relationship between signaling molecules and downstream gene expression and cellular phenotypes is a multiple-input and multiple-output (MIMO) system and is difficult to understand due to its complexity. For example, it has been reported that, in PC12 cells, different types of growth factors activate MAP kinases (MAPKs) including ERK, JNK, and p38, and CREB, for selective protein expression of immediate early genes (IEGs) such as c-FOS, c-JUN, EGR1, JUNB, and FOSB, leading to cell differentiation, proliferation and cell death; however, how multiple-inputs such as MAPKs and CREB regulate multiple-outputs such as expression of the IEGs and cellular phenotypes remains unclear. To address this issue, we employed a statistical method called partial least squares (PLS) regression, which involves a reduction of the dimensionality of the inputs and outputs into latent variables and a linear regression between these latent variables. We measured 1,200 data points for MAPKs and CREB as the inputs and 1,900 data points for IEGs and cellular phenotypes as the outputs, and we constructed the PLS model from these data. The PLS model highlighted the complexity of the MIMO system and growth factor-specific input-output relationships of cell-fate decisions in PC12 cells. Furthermore, to reduce the complexity, we applied a backward elimination method to the PLS regression, in which 60 input variables were reduced to 5 variables, including the phosphorylation of ERK at 10 min, CREB at 5 min and 60 min, AKT at 5 min and JNK at 30 min. The simple PLS model with only 5 input variables demonstrated a predictive ability comparable to that of the full PLS model. The 5 input variables effectively extracted the growth factor-specific simple relationships within the MIMO system in cell-fate decisions in PC12 cells.

  10. Data article on disposition towards enhancing SMEs' performance through entrepreneurial orientations: Perspectives from a developing economy.

    PubMed

    Ibidunni, Ayodotun Stephen; Ibidunni, Oyebisi Mary; Olokundun, Maxwell Ayodele; Falola, Hezekiah Olubusayo; Salau, Odunayo Paul; Borishade, Taiye Tairat

    2018-06-01

    This article presents data on the disposition of SME operators towards enhancing SMEs' performance through entrepreneurial orientations. Copies of a structured questionnaire were administered to 102 SME owners/managers. Using descriptive and standard multiple regression statistical analysis, the data described how proactiveness, risk-taking and autonomy orientations significantly influenced SMEs' profitability, sales growth, customer satisfaction and new product success.

  11. Precipitation-snowmelt timing and snowmelt augmentation of large peak flow events, western Cascades, Oregon

    Treesearch

    Keith Jennings; Julia A. Jones

    2015-01-01

    This study tested multiple hydrologic mechanisms to explain snowpack dynamics in extreme rain-on-snow floods, which occur widely in the temperate and polar regions. We examined 26, 10 day large storm events over the period 1992–2012 in the H.J. Andrews Experimental Forest in western Oregon, using statistical analyses (regression, ANOVA, and wavelet coherence) of hourly...

  12. A New Mathematical Framework for Design Under Uncertainty

    DTIC Science & Technology

    2016-05-05

    blending multiple information sources via auto-regressive stochastic modeling. A computationally efficient machine learning framework is developed based on...sion and machine learning approaches; see Fig. 1. This will lead to a comprehensive description of system performance with less uncertainty than in the...Bayesian optimization of super-cavitating hydrofoils The goal of this study is to demonstrate the capabilities of statistical learning and

  13. Effect of partition board color on mood and autonomic nervous function.

    PubMed

    Sakuragi, Sokichi; Sugiyama, Yoshiki

    2011-12-01

    The purpose of this study was to evaluate the effects of the presence or absence (control) of a partition board and its color (red, yellow, blue) on subjective mood ratings and changes in autonomic nervous system indicators induced by a video game task. The increase in the mean Profile of Mood States (POMS) Fatigue score and mean Oppressive feeling rating after the task was lowest with the blue partition board. Multiple-regression analysis identified oppressive feeling and error scores on the second half of the task as statistically significant contributors to Fatigue. While explanatory variables were limited to the physiological indices, multiple-regression analysis identified a significant contribution of autonomic reactivity (assessed by heart rate variability) to Fatigue. These results suggest that a blue partition board would reduce task-induced subjective fatigue, in part by lowering the oppressive feeling of being enclosed during the task, possibly by increasing autonomic reactivity.

  14. Assessment of Communications-related Admissions Criteria in a Three-year Pharmacy Program

    PubMed Central

    Tejada, Frederick R.; Lang, Lynn A.; Purnell, Miriam; Acedera, Lisa; Ngonga, Ferdinand

    2015-01-01

    Objective. To determine if there is a correlation between TOEFL and other admissions criteria that assess communications skills (ie, PCAT variables: verbal, reading, essay, and composite), interview, and observational scores and to evaluate TOEFL and these admissions criteria as predictors of academic performance. Methods. Statistical analyses included two sample t tests, multiple regression and Pearson’s correlations for parametric variables, and Mann-Whitney U for nonparametric variables, which were conducted on the retrospective data of 162 students, 57 of whom were foreign-born. Results. The multiple regression model of the other admissions criteria on TOEFL was significant. There was no significant correlation between TOEFL scores and academic performance. However, significant correlations were found between the other admissions criteria and academic performance. Conclusion. Since TOEFL is not a significant predictor of either communication skills or academic success of foreign-born PharmD students in the program, it may be eliminated as an admissions criterion. PMID:26430273

  15. Assessment of Communications-related Admissions Criteria in a Three-year Pharmacy Program.

    PubMed

    Parmar, Jayesh R; Tejada, Frederick R; Lang, Lynn A; Purnell, Miriam; Acedera, Lisa; Ngonga, Ferdinand

    2015-08-25

    To determine if there is a correlation between TOEFL and other admissions criteria that assess communications skills (ie, PCAT variables: verbal, reading, essay, and composite), interview, and observational scores and to evaluate TOEFL and these admissions criteria as predictors of academic performance. Statistical analyses included two sample t tests, multiple regression and Pearson's correlations for parametric variables, and Mann-Whitney U for nonparametric variables, which were conducted on the retrospective data of 162 students, 57 of whom were foreign-born. The multiple regression model of the other admissions criteria on TOEFL was significant. There was no significant correlation between TOEFL scores and academic performance. However, significant correlations were found between the other admissions criteria and academic performance. Since TOEFL is not a significant predictor of either communication skills or academic success of foreign-born PharmD students in the program, it may be eliminated as an admissions criterion.

  16. Time Series Analysis of Soil Radon Data Using Multiple Linear Regression and Artificial Neural Network in Seismic Precursory Studies

    NASA Astrophysics Data System (ADS)

    Singh, S.; Jaishi, H. P.; Tiwari, R. P.; Tiwari, R. C.

    2017-07-01

    This paper reports the analysis of soil radon data recorded in the seismic zone-V, located in the northeastern part of India (latitude 23.73N, longitude 92.73E). Continuous measurements of soil-gas emission along the Chite fault in Mizoram (India) were carried out with the replacement of solid-state nuclear track detectors at weekly intervals. The present study was done for the period from March 2013 to May 2015 using LR-115 Type II detectors, manufactured by Kodak Pathe, France. In order to reduce the influence of meteorological parameters, statistical analysis tools such as multiple linear regression and artificial neural network have been used. A decrease in radon concentration was recorded prior to some earthquakes that occurred during the observation period. Some false anomalies were also recorded, which may be attributed to ongoing crustal deformation that was not large enough to produce an earthquake.
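One way to read "reducing the influence of meteorological parameters" with multiple linear regression is to regress the radon series on the meteorological series and inspect the residuals. The sketch below follows that reading with two hypothetical predictors (`temperature`, `pressure`) and fully synthetic numbers; the variable names, the values and the single injected drop are assumptions for illustration, not data or procedure from the study.

```python
# Synthetic weekly series (hypothetical units and values).
temperature = [10, 12, 14, 16, 18, 20, 22, 24]
pressure = [5, 3, 6, 2, 7, 1, 8, 4]

# Radon driven only by the two meteorological series ...
radon = [50 + 2 * t - 3 * p for t, p in zip(temperature, pressure)]
# ... plus one injected decrease at week index 3 (illustrative anomaly).
radon[3] -= 20


def fit_two_predictors(x1, x2, y):
    """Least-squares intercept and slopes for y ~ b0 + b1*x1 + b2*x2."""
    n = len(y)
    m1, m2, my = sum(x1) / n, sum(x2) / n, sum(y) / n
    a = [v - m1 for v in x1]
    b = [v - m2 for v in x2]
    c = [v - my for v in y]
    s11 = sum(p * p for p in a)
    s22 = sum(p * p for p in b)
    s12 = sum(p * q for p, q in zip(a, b))
    s1y = sum(p * q for p, q in zip(a, c))
    s2y = sum(p * q for p, q in zip(b, c))
    det = s11 * s22 - s12 * s12
    b1 = (s22 * s1y - s12 * s2y) / det
    b2 = (s11 * s2y - s12 * s1y) / det
    b0 = my - b1 * m1 - b2 * m2
    return b0, b1, b2


def regression_residuals(x1, x2, y):
    """Observed minus fitted values: the meteorology-adjusted signal."""
    b0, b1, b2 = fit_two_predictors(x1, x2, y)
    return [yi - (b0 + b1 * u + b2 * v) for u, v, yi in zip(x1, x2, y)]


adjusted = regression_residuals(temperature, pressure, radon)
largest = max(range(len(adjusted)), key=lambda i: abs(adjusted[i]))
print("week with largest adjusted deviation:", largest)
```

Because the injected drop also pulls the fitted plane toward itself, the residual at that week is smaller in magnitude than the drop, yet it remains the largest deviation in this toy series.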

  17. Computed statistics at streamgages, and methods for estimating low-flow frequency statistics and development of regional regression equations for estimating low-flow frequency statistics at ungaged locations in Missouri

    USGS Publications Warehouse

    Southard, Rodney E.

    2013-01-01

    The weather and precipitation patterns in Missouri vary considerably from year to year. In 2008, the statewide average rainfall was 57.34 inches and in 2012, the statewide average rainfall was 30.64 inches. This variability in precipitation and resulting streamflow in Missouri underlies the necessity for water managers and users to have reliable streamflow statistics and a means to compute select statistics at ungaged locations for a better understanding of water availability. Knowledge of surface-water availability is dependent on the streamflow data that have been collected and analyzed by the U.S. Geological Survey for more than 100 years at approximately 350 streamgages throughout Missouri. The U.S. Geological Survey, in cooperation with the Missouri Department of Natural Resources, computed streamflow statistics at streamgages through the 2010 water year, defined periods of drought and defined methods to estimate streamflow statistics at ungaged locations, and developed regional regression equations to compute selected streamflow statistics at ungaged locations. Streamflow statistics and flow durations were computed for 532 streamgages in Missouri and in neighboring States of Missouri. For streamgages with more than 10 years of record, Kendall’s tau was computed to evaluate for trends in streamflow data. If trends were detected, the variable length method was used to define the period of no trend. Water years were removed from the dataset from the beginning of the record for a streamgage until no trend was detected. Low-flow frequency statistics were then computed for the entire period of record and for the period of no trend if 10 or more years of record were available for each analysis. Three methods are presented for computing selected streamflow statistics at ungaged locations. The first method uses power curve equations developed for 28 selected streams in Missouri and neighboring States that have multiple streamgages on the same streams. 
Statistical estimates on one of these streams can be calculated at an ungaged location that has a drainage area that is between 40 percent of the drainage area of the farthest upstream streamgage and within 150 percent of the drainage area of the farthest downstream streamgage along the stream of interest. The second method may be used on any stream with a streamgage that has operated for 10 years or longer and for which anthropogenic effects have not changed the low-flow characteristics at the ungaged location since collection of the streamflow data. A ratio of drainage area of the stream at the ungaged location to the drainage area of the stream at the streamgage was computed to estimate the statistic at the ungaged location. The range of applicability is between 40- and 150-percent of the drainage area of the streamgage, and the ungaged location must be located on the same stream as the streamgage. The third method uses regional regression equations to estimate selected low-flow frequency statistics for unregulated streams in Missouri. This report presents regression equations to estimate frequency statistics for the 10-year recurrence interval and for the N-day durations of 1, 2, 3, 7, 10, 30, and 60 days. Basin and climatic characteristics were computed using geographic information system software and digital geospatial data. A total of 35 characteristics were computed for use in preliminary statewide and regional regression analyses based on existing digital geospatial data and previous studies. Spatial analyses for geographical bias in the predictive accuracy of the regional regression equations defined three low-flow regions with the State representing the three major physiographic provinces in Missouri. Region 1 includes the Central Lowlands, Region 2 includes the Ozark Plateaus, and Region 3 includes the Mississippi Alluvial Plain. A total of 207 streamgages were used in the regression analyses for the regional equations. Of the 207 U.S. 
Geological Survey streamgages, 77 were located in Region 1, 120 were located in Region 2, and 10 were located in Region 3. Streamgages located outside of Missouri were selected to extend the range of data used for the independent variables in the regression analyses. Streamgages included in the regression analyses had 10 or more years of record and were considered to be affected minimally by anthropogenic activities or trends. Regional regression analyses identified three characteristics as statistically significant for the development of regional equations. For Region 1, drainage area, longest flow path, and streamflow-variability index were statistically significant. The range in the standard error of estimate for Region 1 is 79.6 to 94.2 percent. For Region 2, drainage area and streamflow variability index were statistically significant, and the range in the standard error of estimate is 48.2 to 72.1 percent. For Region 3, drainage area and streamflow-variability index also were statistically significant with a range in the standard error of estimate of 48.1 to 96.2 percent. Limitations on the use of estimating low-flow frequency statistics at ungaged locations are dependent on the method used. The first method outlined for use in Missouri, power curve equations, were developed to estimate the selected statistics for ungaged locations on 28 selected streams with multiple streamgages located on the same stream. A second method uses a drainage-area ratio to compute statistics at an ungaged location using data from a single streamgage on the same stream with 10 or more years of record. Ungaged locations on these streams may use the ratio of the drainage area at an ungaged location to the drainage area at a streamgage location to scale the selected statistic value from the streamgage location to the ungaged location. This method can be used if the drainage area of the ungaged location is within 40 to 150 percent of the streamgage drainage area. 
The third method is the use of the regional regression equations. The limits for the use of these equations are based on the ranges of the characteristics used as independent variables and that streams must be affected minimally by anthropogenic activities.
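The drainage-area ratio method described above can be sketched in a few lines. The function name and argument names are hypothetical, and the direct proportional scaling (no exponent on the ratio) is an assumption based on the abstract's wording; only the 40 to 150 percent applicability range is taken from the text.

```python
def estimate_at_ungaged_site(statistic_at_gage, area_at_gage, area_ungaged,
                             min_ratio=0.40, max_ratio=1.50):
    """Scale a streamflow statistic from a streamgage to an ungaged site
    on the same stream by the ratio of drainage areas.

    Assumes simple proportional scaling; raises ValueError when the ratio
    falls outside the stated 40 to 150 percent range of applicability.
    """
    if area_at_gage <= 0 or area_ungaged <= 0:
        raise ValueError("drainage areas must be positive")
    ratio = area_ungaged / area_at_gage
    if not (min_ratio <= ratio <= max_ratio):
        raise ValueError(
            f"drainage-area ratio {ratio:.2f} is outside "
            f"{min_ratio:.2f} to {max_ratio:.2f}"
        )
    return statistic_at_gage * ratio


# Hypothetical example: a low-flow statistic of 10.0 (any flow unit) at a
# streamgage draining 200 square miles, transferred to a site draining 150.
print(estimate_at_ungaged_site(10.0, 200.0, 150.0))  # ratio 0.75 -> 7.5
```

The range check is kept inside the function so that an out-of-range site produces an error rather than a silently extrapolated estimate.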

  18. Methods for estimating low-flow statistics for Massachusetts streams

    USGS Publications Warehouse

    Ries, Kernell G.; Friesz, Paul J.

    2000-01-01

    Methods and computer software are described in this report for determining flow duration, low-flow frequency statistics, and August median flows. These low-flow statistics can be estimated for unregulated streams in Massachusetts using different methods depending on whether the location of interest is at a streamgaging station, a low-flow partial-record station, or an ungaged site where no data are available. Low-flow statistics for streamgaging stations can be estimated using standard U.S. Geological Survey methods described in the report. The MOVE.1 mathematical method and a graphical correlation method can be used to estimate low-flow statistics for low-flow partial-record stations. The MOVE.1 method is recommended when the relation between measured flows at a partial-record station and daily mean flows at a nearby, hydrologically similar streamgaging station is linear, and the graphical method is recommended when the relation is curved. Equations are presented for computing the variance and equivalent years of record for estimates of low-flow statistics for low-flow partial-record stations when either a single or multiple index stations are used to determine the estimates. The drainage-area ratio method or regression equations can be used to estimate low-flow statistics for ungaged sites where no data are available. The drainage-area ratio method is generally as accurate as or more accurate than regression estimates when the drainage-area ratio for an ungaged site is between 0.3 and 1.5 times the drainage area of the index data-collection site. Regression equations were developed to estimate the natural, long-term 99-, 98-, 95-, 90-, 85-, 80-, 75-, 70-, 60-, and 50-percent duration flows; the 7-day, 2-year and the 7-day, 10-year low flows; and the August median flow for ungaged sites in Massachusetts. Streamflow statistics and basin characteristics for 87 to 133 streamgaging stations and low-flow partial-record stations were used to develop the equations. 
The streamgaging stations had from 2 to 81 years of record, with a mean record length of 37 years. The low-flow partial-record stations had from 8 to 36 streamflow measurements, with a median of 14 measurements. All basin characteristics were determined from digital map data. The basin characteristics that were statistically significant in most of the final regression equations were drainage area, the area of stratified-drift deposits per unit of stream length plus 0.1, mean basin slope, and an indicator variable that was 0 in the eastern region and 1 in the western region of Massachusetts. The equations were developed by use of weighted-least-squares regression analyses, with weights assigned proportional to the years of record and inversely proportional to the variances of the streamflow statistics for the stations. Standard errors of prediction ranged from 70.7 to 17.5 percent for the equations to predict the 7-day, 10-year low flow and 50-percent duration flow, respectively. The equations are not applicable for use in the Southeast Coastal region of the State, or where basin characteristics for the selected ungaged site are outside the ranges of those for the stations used in the regression analyses. A World Wide Web application was developed that provides streamflow statistics for data collection stations from a data base and for ungaged sites by measuring the necessary basin characteristics for the site and solving the regression equations. Output provided by the Web application for ungaged sites includes a map of the drainage-basin boundary determined for the site, the measured basin characteristics, the estimated streamflow statistics, and 90-percent prediction intervals for the estimates. An equation is provided for combining regression and correlation estimates to obtain improved estimates of the streamflow statistics for low-flow partial-record stations. 
An equation is also provided for combining regression and drainage-area ratio estimates to obtain improved e

  19. Statistical research using the multiple regression analysis in areas of the cast hipereutectoid steel rolls manufacturing

    NASA Astrophysics Data System (ADS)

    Kiss, I.; Alexa, V.; Serban, S.; Rackov, M.; Čavić, M.

    2018-01-01

    Cast hypereutectoid steel (usually called Adamite) is a roll-manufacturing material whose mechanical and chemical properties and Carbon [C] content lie between those of steel and iron, with alloying elements such as Nickel [Ni], Chrome [Cr], Molybdenum [Mo] and/or others. Adamite rolls are basically alloy steel rolls (a kind of high carbon steel) with hardness ranging from 40 to 55 degrees Shore C and Carbon [C] content ranging from 1.35% to 2% (usually between 1.2 and 2.3%); the extra Carbon [C] and the special alloying elements give extra wear resistance and strength. The prominent feature of the Adamite roll is the small variation in hardness of the working surface, together with good abrasion resistance and bite performance. This paper reviews key aspects of roll material properties and presents an analysis of the influence of chemical composition on the mechanical properties (hardness) of cast hypereutectoid steel rolls (Adamite). Using multiple regression analysis (the double and triple regression equations), some mathematical correlations between the rolls' chemical composition and the obtained hardness are presented. Several results and evidence obtained from actual experiments are presented. Thus, several variation boundaries for the chemical composition of cast hypereutectoid steel rolls, with a view to obtaining proper hardness values, are revealed. The Matlab software was used for the multiple regression equations, correlation coefficients and graphical representations.

  20. Calculating stage duration statistics in multistage diseases.

    PubMed

    Komarova, Natalia L; Thalhauser, Craig J

    2011-01-01

    Many human diseases are characterized by multiple stages of progression. While the typical sequence of disease progression can be identified, there may be large individual variations among patients. Identifying mean stage durations and their variations is critical for statistical hypothesis testing needed to determine if treatment is having a significant effect on the progression, or if a new therapy is showing a delay of progression through a multistage disease. In this paper we focus on two methods for extracting stage duration statistics from longitudinal datasets: an extension of the linear regression technique, and a counting algorithm. Both are non-iterative, non-parametric and computationally cheap methods, which makes them invaluable tools for studying the epidemiology of diseases, with a goal of identifying different patterns of progression by using bioinformatics methodologies. Here we show that the regression method performs well for calculating the mean stage durations under a wide variety of assumptions; however, its generalization to variance calculations fails under realistic assumptions about the data collection procedure. On the other hand, the counting method yields reliable estimations for both means and variances of stage durations. Applications to Alzheimer disease progression are discussed.

  1. Normality of raw data in general linear models: The most widespread myth in statistics

    USGS Publications Warehouse

    Kery, Marc; Hatfield, Jeff S.

    2003-01-01

    In years of statistical consulting for ecologists and wildlife biologists, by far the most common misconception we have come across has been the one about normality in general linear models. These comprise a very large part of the statistical models used in ecology and include t tests, simple and multiple linear regression, polynomial regression, and analysis of variance (ANOVA) and covariance (ANCOVA). There is a widely held belief that the normality assumption pertains to the raw data rather than to the model residuals. We suspect that this error may also occur in countless published studies, whenever the normality assumption is tested prior to analysis. This may lead to the use of nonparametric alternatives (if there are any), when parametric tests would indeed be appropriate, or to use of transformations of raw data, which may introduce hidden assumptions such as multiplicative effects on the natural scale in the case of log-transformed data. Our aim here is to dispel this myth. We very briefly describe relevant theory for two cases of general linear models to show that the residuals need to be normally distributed if tests requiring normality are to be used, such as t and F tests. We then give two examples demonstrating that the distribution of the response variable may be nonnormal, and yet the residuals are well behaved. We do not go into the issue of how to test normality; instead we display the distributions of response variables and residuals graphically.
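
    The point made in the abstract above can be illustrated with a small simulation. This is only a sketch under stated assumptions, not the authors' own example: the two-group design, the group means, the sample sizes and the use of excess kurtosis as a rough shape measure are all chosen here for illustration. The pooled response is strongly bimodal (non-normal), while the residuals from the group means are close to normal.

```python
import random
import statistics


def excess_kurtosis(values):
    """Sample excess kurtosis; approximately 0 for normally distributed data."""
    mean = statistics.fmean(values)
    n = len(values)
    m2 = sum((v - mean) ** 2 for v in values) / n
    m4 = sum((v - mean) ** 4 for v in values) / n
    return m4 / m2 ** 2 - 3.0


# Hypothetical two-group design (the simplest general linear model, a t test).
rng = random.Random(0)
group_a = [rng.gauss(0.0, 1.0) for _ in range(2000)]
group_b = [rng.gauss(10.0, 1.0) for _ in range(2000)]

# Raw response: both groups pooled, which gives a bimodal distribution.
response = group_a + group_b

# Residuals: each observation minus the fitted value (its own group mean).
mean_a = statistics.fmean(group_a)
mean_b = statistics.fmean(group_b)
residuals = [y - mean_a for y in group_a] + [y - mean_b for y in group_b]

raw_kurtosis = excess_kurtosis(response)
residual_kurtosis = excess_kurtosis(residuals)
print(f"excess kurtosis, raw response: {raw_kurtosis:.2f}")
print(f"excess kurtosis, residuals:    {residual_kurtosis:.2f}")
```

    A normality check applied to the raw response would reject normality here, even though the model assumption concerning the residuals is satisfied.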

  2. Ergonomics study on mobile phones for thumb physiology discomfort

    NASA Astrophysics Data System (ADS)

    Bendero, J. M. S.; Doon, M. E. R.; Quiogue, K. C. A.; Soneja, L. C.; Ong, N. R.; Sauli, Z.; Vairavan, R.

    2017-09-01

    The study was conducted on Filipino undergraduate college students and aimed to identify the significant factors associated with mobile phone usage and its effect on thumb pain. A correlation-prediction analysis and Multiple Linear Regression were adopted as the main tools for determining the significant factors and developing predictive models of thumb-related pain. Using the Statistical Package for the Social Sciences (SPSS) software to conduct the linear regression, 2 significant factors in thumb-related pain (percentage of time using portrait as screen orientation when text messaging, amount of time playing games using one hand in a day) were found.

  3. Functional capacity following univentricular repair--midterm outcome.

    PubMed

    Sen, Supratim; Bandyopadhyay, Biswajit; Eriksson, Peter; Chattopadhyay, Amitabha

    2012-01-01

    Previous studies have seldom compared functional capacity in children following Fontan procedure alongside those with Glenn operation as destination therapy. We hypothesized that Fontan circulation enables better midterm submaximal exercise capacity as compared to Glenn physiology and evaluated this using the 6-minute walk test. Fifty-seven children aged 5-18 years with Glenn (44) or Fontan (13) operations were evaluated with standard 6-minute walk protocols. Baseline SpO₂ was significantly lower in Glenn patients younger than 10 years compared to Fontan counterparts and similar in the two groups in older children. Postexercise SpO₂ fell significantly in Glenn patients compared to the Fontan group. There was no statistically significant difference in baseline, postexercise, or postrecovery heart rates (HRs), or 6-minute walk distances in the two groups. Multiple regression analysis revealed lower resting HR, higher resting SpO₂, and younger age at latest operation to be significant determinants of longer 6-minute walk distance. Multiple regression analysis also established that younger age at operation, higher resting SpO₂, Fontan operation, lower resting HR, and lower postexercise HR were significant determinants of higher postexercise SpO₂. Younger age at operation and exercise, lower resting HR and postexercise HR, higher resting SpO₂ and postexercise SpO₂, and dominant ventricular morphology being left ventricular or indeterminate/mixed had significant association with better 6-minute work on multiple regression analysis. Lower resting HR had linear association with longer 6-minute walk distances in the Glenn patients. Compared to Glenn physiology, Fontan operation did not have better submaximal exercise capacity assessed by walk distance or work on multiple regression analysis. Lower resting HR, higher resting SpO₂, and younger age at operation were factors uniformly associated with better submaximal exercise capacity. © 2012 Wiley Periodicals, Inc.

  4. Construction of multiple linear regression models using blood biomarkers for selecting against abdominal fat traits in broilers.

    PubMed

    Dong, J Q; Zhang, X Y; Wang, S Z; Jiang, X F; Zhang, K; Ma, G W; Wu, M Q; Li, H; Zhang, H

    2018-01-01

    Plasma very low-density lipoprotein (VLDL) can be used to select for low body fat or abdominal fat (AF) in broilers, but its correlation with AF is limited. We investigated whether any other biochemical indicator can be used in combination with VLDL for a better selective effect. Nineteen plasma biochemical indicators were measured in male chickens from the Northeast Agricultural University broiler lines divergently selected for AF content (NEAUHLF) in the fed state at 46 and 48 d of age. The average concentration of every parameter for the 2 d was used for statistical analysis. Levels of these 19 plasma biochemical parameters were compared between the lean and fat lines. The phenotypic correlations between these plasma biochemical indicators and AF traits were analyzed. Then, multiple linear regression models were constructed to select the best model for selecting against AF content, and the heritabilities of the plasma indicators contained in the best models were estimated. The results showed that 11 plasma biochemical indicators (triglycerides, total bile acid, total protein, globulin, albumin/globulin, aspartate transaminase, alanine transaminase, gamma-glutamyl transpeptidase, uric acid, creatinine, and VLDL) differed significantly between the lean and fat lines (P < 0.01), and correlated significantly with AF traits (P < 0.05). The best multiple linear regression models, based on albumin/globulin, VLDL, triglycerides, globulin, total bile acid, and uric acid, had a higher R² (0.73) than the model based only on VLDL (0.21). The plasma parameters included in the best models had moderate heritability estimates (0.21 ≤ h² ≤ 0.43). These results indicate that these multiple linear regression models can be used to select for lean broiler chickens. © 2017 Poultry Science Association Inc.

  5. A Systematic Comparison of Linear Regression-Based Statistical Methods to Assess Exposome-Health Associations.

    PubMed

    Agier, Lydiane; Portengen, Lützen; Chadeau-Hyam, Marc; Basagaña, Xavier; Giorgis-Allemand, Lise; Siroux, Valérie; Robinson, Oliver; Vlaanderen, Jelle; González, Juan R; Nieuwenhuijsen, Mark J; Vineis, Paolo; Vrijheid, Martine; Slama, Rémy; Vermeulen, Roel

    2016-12-01

    The exposome constitutes a promising framework to improve understanding of the effects of environmental exposures on health by explicitly considering multiple testing and avoiding selective reporting. However, exposome studies are challenged by the simultaneous consideration of many correlated exposures. We compared the performances of linear regression-based statistical methods in assessing exposome-health associations. In a simulation study, we generated 237 exposure covariates with a realistic correlation structure and with a health outcome linearly related to 0 to 25 of these covariates. Statistical methods were compared primarily in terms of false discovery proportion (FDP) and sensitivity. On average over all simulation settings, the elastic net and sparse partial least-squares regression showed a sensitivity of 76% and an FDP of 44%; Graphical Unit Evolutionary Stochastic Search (GUESS) and the deletion/substitution/addition (DSA) algorithm revealed a sensitivity of 81% and an FDP of 34%. The environment-wide association study (EWAS) underperformed these methods in terms of FDP (average FDP, 86%) despite a higher sensitivity. Performances decreased considerably when assuming an exposome exposure matrix with high levels of correlation between covariates. Correlation between exposures is a challenge for exposome research, and the statistical methods investigated in this study were limited in their ability to efficiently differentiate true predictors from correlated covariates in a realistic exposome context. Although GUESS and DSA provided a marginally better balance between sensitivity and FDP, they did not outperform the other multivariate methods across all scenarios and properties examined, and computational complexity and flexibility should also be considered when choosing between these methods. 
Citation: Agier L, Portengen L, Chadeau-Hyam M, Basagaña X, Giorgis-Allemand L, Siroux V, Robinson O, Vlaanderen J, González JR, Nieuwenhuijsen MJ, Vineis P, Vrijheid M, Slama R, Vermeulen R. 2016. A systematic comparison of linear regression-based statistical methods to assess exposome-health associations. Environ Health Perspect 124:1848-1856; http://dx.doi.org/10.1289/EHP172.
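
    The two performance measures named in this abstract, false discovery proportion (FDP) and sensitivity, can be computed from the set of exposures a method selects and the set that truly affect the outcome. The sketch below uses the usual definitions; the function name and the exposure labels are hypothetical, and the convention of reporting an FDP of 0 when nothing is selected is an assumption made here.

```python
def fdp_and_sensitivity(selected, true_predictors):
    """Return (FDP, sensitivity) for one variable-selection result.

    FDP         = false positives / number of selected covariates
    sensitivity = true positives / number of true predictors
    """
    selected = set(selected)
    true_predictors = set(true_predictors)
    true_positives = selected & true_predictors
    false_positives = selected - true_predictors

    # Assumed convention: with no selections there are no false discoveries.
    fdp = len(false_positives) / len(selected) if selected else 0.0
    sensitivity = (
        len(true_positives) / len(true_predictors)
        if true_predictors
        else float("nan")
    )
    return fdp, sensitivity


# Hypothetical example: 4 true exposures, 5 selected, 3 of them correct.
truth = {"x1", "x2", "x3", "x4"}
picked = {"x1", "x2", "x3", "x9", "x10"}
fdp, sensitivity = fdp_and_sensitivity(picked, truth)
print(f"FDP = {fdp:.2f}, sensitivity = {sensitivity:.2f}")
```

    A method that selects many covariates can reach a high sensitivity while also having a high FDP, which is the trade-off the abstract reports for EWAS.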

  6. Across-Platform Imputation of DNA Methylation Levels Incorporating Nonlocal Information Using Penalized Functional Regression.

    PubMed

    Zhang, Guosheng; Huang, Kuan-Chieh; Xu, Zheng; Tzeng, Jung-Ying; Conneely, Karen N; Guan, Weihua; Kang, Jian; Li, Yun

    2016-05-01

    DNA methylation is a key epigenetic mark involved in both normal development and disease progression. Recent advances in high-throughput technologies have enabled genome-wide profiling of DNA methylation. However, DNA methylation profiling often employs different designs and platforms with varying resolution, which hinders joint analysis of methylation data from multiple platforms. In this study, we propose a penalized functional regression model to impute missing methylation data. By incorporating functional predictors, our model utilizes information from nonlocal probes to improve imputation quality. Here, we compared the performance of our functional model to linear regression and the best single probe surrogate in real data and via simulations. Specifically, we applied different imputation approaches to an acute myeloid leukemia dataset consisting of 194 samples and our method showed higher imputation accuracy, manifested, for example, by a 94% relative increase in information content and up to 86% more CpG sites passing post-imputation filtering. Our simulated association study further demonstrated that our method substantially improves the statistical power to identify trait-associated methylation loci. These findings indicate that the penalized functional regression model is a convenient and valuable imputation tool for methylation data, and it can boost statistical power in downstream epigenome-wide association study (EWAS). © 2016 WILEY PERIODICALS, INC.

  7. Classification and regression tree analysis vs. multivariable linear and logistic regression methods as statistical tools for studying haemophilia.

    PubMed

    Henrard, S; Speybroeck, N; Hermans, C

    2015-11-01

    Haemophilia is a rare genetic haemorrhagic disease characterized by partial or complete deficiency of coagulation factor VIII, for haemophilia A, or IX, for haemophilia B. As in any other medical research domain, the field of haemophilia research is increasingly concerned with finding factors associated with binary or continuous outcomes through multivariable models. Traditional models include multiple logistic regressions, for binary outcomes, and multiple linear regressions, for continuous outcomes. Yet these regression models are at times difficult to implement, especially for non-statisticians, and can be difficult to interpret. The present paper sought to didactically explain how, why, and when to use classification and regression tree (CART) analysis for haemophilia research. The CART method is non-parametric and non-linear, based on the repeated partitioning of a sample into subgroups according to a certain criterion. Breiman developed this method in 1984. Classification trees (CTs) are used to analyse categorical outcomes and regression trees (RTs) to analyse continuous ones. The CART methodology has become increasingly popular in the medical field, yet only a few examples of studies using this methodology specifically in haemophilia have been published to date. Two previously published examples using CART analysis in this field are didactically explained in detail. There is increasing interest in using CART analysis in the health domain, primarily due to its ease of implementation, use, and interpretation, thus facilitating medical decision-making. This method should be promoted for analysing continuous or categorical outcomes in haemophilia, when applicable. © 2015 John Wiley & Sons Ltd.
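
    The "repeated partitioning" at the core of CART can be sketched for a single step of a regression tree. The code below is an illustration only: it assumes one continuous predictor and the sum of squared errors as the splitting criterion, and the data are invented. A full tree would apply the same search recursively to each resulting subgroup.

```python
def sum_squared_errors(values):
    """Sum of squared deviations from the mean (0.0 for an empty group)."""
    if not values:
        return 0.0
    mean = sum(values) / len(values)
    return sum((v - mean) ** 2 for v in values)


def best_split(xs, ys):
    """Find the threshold on x that minimises the total within-group error."""
    pairs = sorted(zip(xs, ys))
    best_threshold, best_cost = None, float("inf")
    for i in range(1, len(pairs)):
        if pairs[i - 1][0] == pairs[i][0]:
            continue  # no threshold can separate identical predictor values
        threshold = (pairs[i - 1][0] + pairs[i][0]) / 2
        left = [y for _, y in pairs[:i]]
        right = [y for _, y in pairs[i:]]
        cost = sum_squared_errors(left) + sum_squared_errors(right)
        if cost < best_cost:
            best_threshold, best_cost = threshold, cost
    return best_threshold, best_cost


# Invented data: the outcome jumps between x = 3 and x = 4.
x_values = [1, 2, 3, 4, 5, 6]
y_values = [1.0, 1.2, 0.9, 5.0, 5.2, 4.9]
threshold, cost = best_split(x_values, y_values)
print(f"split at x <= {threshold}, remaining error {cost:.3f}")
```

    For a classification tree the same search would use an impurity measure such as the Gini index in place of the squared error.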

  8. Dynamics and regulation of the southern brook trout (Salvelinus fontinalis) population in an Appalachian stream

    Treesearch

    Gary D. Grossman; Robert E. Ratajczak; C. Michael Wagner; J. Todd Petty

    2010-01-01

    1. We used information theoretic statistics [Akaike’s Information Criterion (AIC)] and regression analysis in a multiple hypothesis testing approach to assess the processes capable of explaining long-term demographic variation in a lightly exploited brook trout population in Ball Creek, NC. We sampled a 100-m-long second-order site during both spring and autumn 1991–...

  9. Effects of Coaching on Standardized Admission Examinations. Revised Statistical Analyses of Data Gathered By Boston Regional Office of the Federal Trade Commission.

    ERIC Educational Resources Information Center

    Federal Trade Commission, Washington, DC. Bureau of Consumer Protection.

    The effect of commercial coaching on Scholastic Aptitude Test (SAT) scores was analyzed, using 1974-1977 test results of 2,500 non-coached students and 1,568 enrollees in two coaching schools. (The Stanley H. Kaplan Educational Center, Inc., and the Test Preparation Center, Inc.). Multiple regression analysis was used to control for student…

  10. Catalog of Air Force Weather Technical Documents, 1941-2006

    DTIC Science & Technology

    2006-05-19

    radiosondes in current use in USA. Elementary discussion of statistical terms and concepts used for expressing accuracy or error is discussed. AWS TR 105...Techniques, Appendix B: Vorticity—An Elementary Discussion of the Concept, August 1956, 27pp. Formerly AWSM 105– 50/1A. Provides the necessary back...steps involved in ordinary multiple linear regression. Conditional probability is calculated using transnormalized variables in the multivariate normal

  11. Prediction system of hydroponic plant growth and development using algorithm Fuzzy Mamdani method

    NASA Astrophysics Data System (ADS)

    Sudana, I. Made; Purnawirawan, Okta; Arief, Ulfa Mediaty

    2017-03-01

    Hydroponics is a method of farming without soil. One hydroponic plant is watercress (Nasturtium officinale). The growth and development of hydroponic watercress are influenced by nutrient level, acidity and temperature. These independent variables can be used as system input variables to predict the level of plant growth and development. The prediction system uses the Mamdani fuzzy algorithm. It was built by implementing a Fuzzy Inference System (FIS) with the Fuzzy Logic Toolbox (FLT) in MATLAB R2007b. FIS is a computing system that works on the principle of fuzzy reasoning, which is similar to human reasoning. Basically, FIS consists of four units: a fuzzification unit, a fuzzy logic reasoning unit, a knowledge base unit and a defuzzification unit. In addition, the effect of the independent variables on plant growth and development can be visualized with the three-dimensional FIS output surface diagram, and statistical tests on the data from the prediction system were performed with the multiple linear regression method, including multiple linear regression analysis, t test, F test, the coefficient of determination and predictor contributions, calculated using the SPSS (Statistical Product and Service Solutions) software.

  12. Trend analysis of the long-term Swiss ozone measurements

    NASA Technical Reports Server (NTRS)

    Staehelin, Johannes; Bader, Juerg; Gelpke, Verena

    1994-01-01

    Trend analyses, assuming a linear trend which started at 1970, were performed from total ozone measurements from Arosa (Switzerland, 1926-1991). Decreases in monthly mean values were statistically significant for October through April showing decreases of about 2.0-4 percent per decade. For the period 1947-91, total ozone trends were further investigated using a multiple regression model. Temperature of a mountain peak in Switzerland (Mt. Santis), the F10.7 solar flux series, the QBO series (quasi biennial oscillation), and the southern oscillation index (SOI) were included as explanatory variables. Trends in the monthly mean values were statistically significant for December through April. The same multiple regression model was applied to investigate the ozone trends at various altitudes using the ozone balloon soundings from Payerne (1967-1989) and the Umkehr measurements from Arosa (1947-1989). The results show four different vertical trend regimes: On a relative scale changes were largest in the troposphere (increase of about 10 percent per decade). On an absolute scale the largest trends were obtained in the lower stratosphere (decrease of approximately 6 per decade at an altitude of about 18 to 22 km). No significant trends were observed at approximately 30 km, whereas stratospheric ozone decreased in the upper stratosphere.

  13. Depression in non-Korean women residing in South Korea following marriage to Korean men.

    PubMed

    Kim, Hyun-Sil; Kim, Hun-Soo

    2013-06-01

    The purpose of the study was to examine the roles of acculturative stress, life satisfaction, and language literacy in depression in non-Korean women residing in South Korea following marriage to Korean men. A cross-sectional study was performed, using an anonymous, self-reporting questionnaire. A total of 173 women were selected using a proportional stratified random sampling method. The relation between acculturation, depression, language literacy, life satisfaction and socio-demographic variables and the predictors of depression among participants were analyzed. The analysis included descriptive statistics and hierarchical multiple regression. Of the participants, 9.2% had depression, which was almost twice the rate of depression found in the general Korean population. In hierarchical multiple regression analysis, acculturative stress (beta=-.325, P<.001) and life satisfaction (beta=-.282, P=.003) were significantly associated with the level of depression. This final model was statistically significant, and life satisfaction, acculturative stress, and language literacy accounted for 31.0% (adjusted R²) of the variance in the depression score (P<.001). Elevated acculturative stress and less life satisfaction were significantly associated with a higher level of depression in migrant wives in Korea. Implications for practice and research are discussed. Copyright © 2013 Elsevier Inc. All rights reserved.

  14. A prospective study of differential sources of school-related social support and adolescent global life satisfaction.

    PubMed

    Siddall, James; Huebner, E Scott; Jiang, Xu

    2013-01-01

    This study examined the cross-sectional and prospective relationships between three sources of school-related social support (parent involvement, peer support for learning, and teacher-student relationships) and early adolescents' global life satisfaction. The participants were 597 middle school students from 1 large school in the southeastern United States who completed measures of school social climate and life satisfaction on 2 occasions, 5 months apart. The results revealed that school-related experiences in terms of social support for learning contributed substantial amounts of variance to individual differences in adolescents' satisfaction with their lives as a whole. Cross-sectional multiple regression analyses of the differential contributions of the sources of support demonstrated that family and peer support for learning contributed statistically significant, unique variance to global life satisfaction reports. Prospective multiple regression analyses demonstrated that only family support for learning continued to contribute statistically significant, unique variance to the global life satisfaction reports at Time 2. The results suggest that school-related experiences, especially family-school interactions, spill over into adolescents' overall evaluations of their lives at a time when direct parental involvement in schooling and adolescents' global life satisfaction are generally declining. Recommendations for future research and educational policies and practices are discussed. © 2013 American Orthopsychiatric Association.

  15. Changes in aerobic power of men, ages 25-70 yr

    NASA Technical Reports Server (NTRS)

    Jackson, A. S.; Beard, E. F.; Wier, L. T.; Ross, R. M.; Stuteville, J. E.; Blair, S. N.

    1995-01-01

    This study quantified and compared the cross-sectional and longitudinal influence of age, self-report physical activity (SR-PA), and body composition (%fat) on the decline of maximal aerobic power (VO2peak). The cross-sectional sample consisted of 1,499 healthy men ages 25-70 yr. The 156 men of the longitudinal sample were from the same population and examined twice; the mean time between tests was 4.1 (+/- 1.2) yr. Peak oxygen uptake was determined by indirect calorimetry during a maximal treadmill exercise test. The zero-order correlations between VO2peak and %fat (r = -0.62) and SR-PA (r = 0.58) were significantly (P < 0.05) higher than the age correlation (r = -0.45). Linear regression defined the cross-sectional age-related decline in VO2peak at 0.46 ml·kg⁻¹·min⁻¹·yr⁻¹. Multiple regression analysis (R = 0.79) showed that nearly 50% of this cross-sectional decline was due to %fat and SR-PA; adding these lifestyle variables to the multiple regression model reduced the age regression weight to -0.26 ml·kg⁻¹·min⁻¹·yr⁻¹. Statistically controlling for time differences between tests, general linear models analysis showed that longitudinal changes in aerobic power were due to independent changes in %fat and SR-PA, confirming the cross-sectional results.

  16. Wheat flour dough Alveograph characteristics predicted by Mixolab regression models.

    PubMed

    Codină, Georgiana Gabriela; Mironeasa, Silvia; Mironeasa, Costel; Popa, Ciprian N; Tamba-Berehoiu, Radiana

    2012-02-01

    In Romania, the Alveograph is the most used device to evaluate the rheological properties of wheat flour dough, but lately the Mixolab device has begun to play an important role in the breadmaking industry. These two instruments are based on different principles, but some correlations can be found between the parameters determined by the Mixolab and the rheological properties of wheat dough measured with the Alveograph. Statistical analysis of 80 wheat flour samples using the backward stepwise multiple regression method showed that Mixolab values obtained with the ‘Chopin S’ protocol (40 samples) and the ‘Chopin +’ protocol (40 samples) can be used to develop predictive models for estimating the rheological properties of wheat dough: baking strength (W), dough tenacity (P) and extensibility (L). The correlation analysis confirmed significant relationships (P < 0.05 and P < 0.01) between the parameters of wheat dough studied by the Mixolab and its rheological properties measured with the Alveograph. Six predictive linear equations were obtained. Linear regression models gave multiple regression coefficients with R²(adjusted) > 0.70 for P, R²(adjusted) > 0.70 for W and R²(adjusted) > 0.38 for L, at a 95% confidence interval. Copyright © 2011 Society of Chemical Industry.

  17. Species Composition at the Sub-Meter Level in Discontinuous Permafrost in Subarctic Sweden

    NASA Astrophysics Data System (ADS)

    Anderson, S. M.; Palace, M. W.; Layne, M.; Varner, R. K.; Crill, P. M.

    2013-12-01

    Northern latitudes are experiencing rapid warming. Wetlands underlain by permafrost are particularly vulnerable to warming, which results in changes in vegetative cover. Specific species have been associated with greenhouse gas emissions; therefore, knowledge of species compositional shift allows for the systematic quantification of emissions and of changes in such emissions. Species composition varies on the sub-meter scale based on topography and other microsite environmental parameters. This complexity and the need to scale vegetation to the landscape level prove vital in our estimation of carbon dioxide (CO2) and methane (CH4) emissions and dynamics. Stordalen Mire (68°21'N, 18°49'E) in Abisko is located at the edge of the discontinuous permafrost zone. This provides a unique opportunity to analyze multiple vegetation communities in close proximity. To do this, we randomly selected 25 1x1 meter plots that were representative of five major cover types: semi-wet, wet, hummock, tall graminoid, and tall shrub. We used a quadrat with 64 sub plots and measured areal percent cover for 24 species. We collected ground-based remote sensing (RS) data at each plot to determine species composition using an ADC-lite (near infrared, red, green) and a GoPro (red, blue, green). We normalized each image based on a Teflon white chip placed in each image. Textural analysis was conducted on each image for entropy, angular second moment, and lacunarity. A logistic regression was developed to examine vegetation cover types and remote sensing parameters. We used a multiple linear regression with forward stepwise variable selection. We found statistical differences in species composition and diversity indices between vegetation cover types. In addition, we were able to build regression models to significantly estimate vegetation cover type as well as percent cover for specific key vegetative species.
    This ground-based remote sensing allows for quick quantification of vegetation cover and species and also provides the framework for scaling to satellite image data to estimate species composition and shift on the landscape level. To determine diversity within our plots we calculated species richness and the Shannon Index. We found that species composition differed statistically between vegetation cover types and also determined which species were indicative of each cover type. Our logistic regression was able to significantly classify vegetation cover types based on RS parameters. Our multiple regression analysis indicated that Betula nana (Dwarf Birch) (r2=0.48, p<0.0001) and Sphagnum (r2=0.59, p<0.0001) were statistically significant with respect to RS parameters. We suggest that ground-based remote sensing methods may provide a unique and efficient method to quantify vegetation across the landscape in northern latitude wetlands.
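
    The two diversity measures named in this abstract, species richness and the Shannon Index, can be computed from per-species percent cover in a plot. This is a minimal sketch using the standard definitions (richness as the number of species present, Shannon Index as H = -Σ p ln p over cover proportions); the cover values are invented and the choice of the natural logarithm is an assumption, since the abstract does not state the base used.

```python
import math


def species_richness(cover):
    """Number of species with non-zero cover in the plot."""
    return sum(1 for c in cover if c > 0)


def shannon_index(cover):
    """Shannon Index H = -sum(p * ln p) over species present in the plot."""
    total = sum(cover)
    proportions = [c / total for c in cover if c > 0]
    return -sum(p * math.log(p) for p in proportions)


# Invented percent-cover values for four species in one 1x1 meter plot.
plot_cover = [50.0, 25.0, 25.0, 0.0]
richness = species_richness(plot_cover)
diversity = shannon_index(plot_cover)
print(f"richness = {richness}, Shannon Index = {diversity:.3f}")
```

    A plot dominated by a single species has an index near 0, while equal cover across S species gives the maximum value ln S.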

  18. Stature estimation from the lengths of the growing foot-a study on North Indian adolescents.

    PubMed

    Krishan, Kewal; Kanchan, Tanuj; Passi, Neelam; DiMaggio, John A

    2012-12-01

    Stature estimation is considered as one of the basic parameters of the investigation process in unknown and commingled human remains in medico-legal case work. Race, age and sex are the other parameters which help in this process. Stature estimation is of the utmost importance as it completes the biological profile of a person along with the other three parameters of identification. The present research is intended to formulate standards for stature estimation from foot dimensions in adolescent males from North India and study the pattern of foot growth during the growing years. 154 male adolescents from the Northern part of India were included in the study. Besides stature, five anthropometric measurements that included the length of the foot from each toe (T1, T2, T3, T4, and T5 respectively) to pternion were measured on each foot. The data was analyzed statistically using Student's t-test, Pearson's correlation, linear and multiple regression analysis for estimation of stature and growth of foot during ages 13-18 years. Correlation coefficients between stature and all the foot measurements were found to be highly significant and positively correlated. Linear regression models and multiple regression models (with age as a co-variable) were derived for estimation of stature from the different measurements of the foot. Multiple regression models (with age as a co-variable) estimate stature with greater accuracy than the regression models for 13-18 years age group. The study shows the growth pattern of feet in North Indian adolescents and indicates that anthropometric measurements of the foot and its segments are valuable in estimation of stature in growing individuals of that population. Copyright © 2012 Elsevier Ltd. All rights reserved.

  19. Predicting perceptual quality of images in realistic scenario using deep filter banks

    NASA Astrophysics Data System (ADS)

    Zhang, Weixia; Yan, Jia; Hu, Shiyong; Ma, Yang; Deng, Dexiang

    2018-03-01

    Classical image perceptual quality assessment models usually resort to natural scene statistic methods, which are based on an assumption that certain reliable statistical regularities hold on undistorted images and will be corrupted by introduced distortions. However, these models usually fail to accurately predict degradation severity of images in realistic scenarios since complex, multiple, and interactive authentic distortions usually appear on them. We propose a quality prediction model based on convolutional neural network. Quality-aware features extracted from filter banks of multiple convolutional layers are aggregated into the image representation. Furthermore, an easy-to-implement and effective feature selection strategy is used to further refine the image representation and finally a linear support vector regression model is trained to map image representation into images' subjective perceptual quality scores. The experimental results on benchmark databases present the effectiveness and generalizability of the proposed model.

  20. Data Mining CMMSs: How to Convert Data into Knowledge.

    PubMed

    Fennigkoh, Larry; Nanney, D Courtney

    2018-01-01

    Although the healthcare technology management (HTM) community has decades of accumulated medical device-related maintenance data, little knowledge has been gleaned from these data. Finding and extracting such knowledge requires the use of the well-established, but admittedly somewhat foreign to HTM, application of inferential statistics. This article sought to provide a basic background on inferential statistics and describe a case study of their application, limitations, and proper interpretation. The research question associated with this case study involved examining the effects of ventilator preventive maintenance (PM) labor hours, age, and manufacturer on needed unscheduled corrective maintenance (CM) labor hours. The study sample included more than 21,000 combined PM inspections and CM work orders on 2,045 ventilators from 26 manufacturers during a five-year period (2012-16). A multiple regression analysis revealed that device age, manufacturer, and accumulated PM inspection labor hours all influenced the amount of CM labor significantly (P < 0.001). In essence, CM labor hours increased with increasing PM labor. However, and despite the statistical significance of these predictors, the regression analysis also indicated that ventilator age, manufacturer, and PM labor hours only explained approximately 16% of all variability in CM labor, with the remainder (84%) caused by other factors that were not included in the study. As such, the regression model obtained here is not suitable for predicting ventilator CM labor hours.

  1. Biological Parametric Mapping: A Statistical Toolbox for Multi-Modality Brain Image Analysis

    PubMed Central

    Casanova, Ramon; Ryali, Srikanth; Baer, Aaron; Laurienti, Paul J.; Burdette, Jonathan H.; Hayasaka, Satoru; Flowers, Lynn; Wood, Frank; Maldjian, Joseph A.

    2006-01-01

    In recent years multiple brain MR imaging modalities have emerged; however, analysis methodologies have mainly remained modality specific. In addition, when comparing across imaging modalities, most researchers have been forced to rely on simple region-of-interest type analyses, which do not allow the voxel-by-voxel comparisons necessary to answer more sophisticated neuroscience questions. To overcome these limitations, we developed a toolbox for multimodal image analysis called biological parametric mapping (BPM), based on a voxel-wise use of the general linear model. The BPM toolbox incorporates information obtained from other modalities as regressors in a voxel-wise analysis, thereby permitting investigation of more sophisticated hypotheses. The BPM toolbox has been developed in MATLAB with a user friendly interface for performing analyses, including voxel-wise multimodal correlation, ANCOVA, and multiple regression. It has a high degree of integration with the SPM (statistical parametric mapping) software relying on it for visualization and statistical inference. Furthermore, statistical inference for a correlation field, rather than a widely-used T-field, has been implemented in the correlation analysis for more accurate results. An example with in-vivo data is presented demonstrating the potential of the BPM methodology as a tool for multimodal image analysis. PMID:17070709

  2. RRegrs: an R package for computer-aided model selection with multiple regression models.

    PubMed

    Tsiliki, Georgia; Munteanu, Cristian R; Seoane, Jose A; Fernandez-Lozano, Carlos; Sarimveis, Haralambos; Willighagen, Egon L

    2015-01-01

    Predictive regression models can be created with many different modelling approaches. Choices need to be made for data set splitting, cross-validation methods, specific regression parameters and best model criteria, as they all affect the accuracy and efficiency of the produced predictive models, thereby raising model reproducibility and comparison issues. Cheminformatics and bioinformatics use predictive modelling extensively and exhibit a need for standardization of these methodologies in order to assist model selection and speed up the process of predictive model development. A tool accessible to all users, irrespective of their statistical knowledge, would be valuable if it tests several simple and complex regression models and validation schemes, produces unified reports, and offers the option to be integrated into more extensive studies. Additionally, such methodology should be implemented as a free programming package, in order to be continuously adapted and redistributed by others. We propose an integrated framework for creating multiple regression models, called RRegrs. The tool offers the option of ten simple and complex regression methods combined with repeated 10-fold and leave-one-out cross-validation. Methods include Multiple Linear regression, Generalized Linear Model with Stepwise Feature Selection, Partial Least Squares regression, Lasso regression, and Support Vector Machines Recursive Feature Elimination. The new framework is an automated fully validated procedure which produces standardized reports to quickly oversee the impact of choices in modelling algorithms and assess the model and cross-validation results. The methodology was implemented as an open source R package, available at https://www.github.com/enanomapper/RRegrs, by reusing and extending the caret package. The universality of the new methodology is demonstrated using five standard data sets from different scientific fields. 
Its efficiency in cheminformatics and QSAR modelling is shown with three use cases: proteomics data for surface-modified gold nanoparticles, nano-metal oxides descriptor data, and molecular descriptors for acute aquatic toxicity data. The results show that for all data sets RRegrs reports models with equal or better performance for both training and test sets than those reported in the original publications. Its good performance as well as its adaptability in terms of parameter optimization could make RRegrs a popular framework to assist the initial exploration of predictive models, and with that, the design of more comprehensive in silico screening applications. Graphical abstract: RRegrs is a computer-aided model selection framework for R multiple regression models; this is a fully validated procedure with application to QSAR modelling.
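
    The repeated 10-fold and leave-one-out cross-validation mentioned above can be sketched in a few lines. This is not the RRegrs API (RRegrs is an R package built on caret); it is a standalone Python illustration using a one-predictor linear model, and the names k_fold_indices, fit_line, and repeated_cv_rmse are hypothetical.

```python
import math
import random


def k_fold_indices(n, k, rng):
    """Shuffle 0..n-1 and deal the indices into k nearly equal folds."""
    order = list(range(n))
    rng.shuffle(order)
    return [order[i::k] for i in range(k)]


def fit_line(x, y):
    """Ordinary least squares for y = intercept + slope * x."""
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return mean_y - slope * mean_x, slope


def repeated_cv_rmse(x, y, k=10, repeats=3, seed=0):
    """Average out-of-fold RMSE over several reshuffled k-fold splits."""
    rng = random.Random(seed)
    n = len(x)
    scores = []
    for _ in range(repeats):
        squared_error = 0.0
        for fold in k_fold_indices(n, k, rng):
            held_out = set(fold)
            train = [i for i in range(n) if i not in held_out]
            intercept, slope = fit_line([x[i] for i in train],
                                        [y[i] for i in train])
            # Score only on observations the model did not see.
            squared_error += sum((y[i] - (intercept + slope * x[i])) ** 2
                                 for i in fold)
        scores.append(math.sqrt(squared_error / n))
    return sum(scores) / len(scores)


# Made-up data: a straight line with a small alternating disturbance.
x = list(range(30))
y = [1.0 + 2.0 * xi + (0.5 if xi % 2 else -0.5) for xi in x]

print("repeated 10-fold RMSE:", repeated_cv_rmse(x, y, k=10, repeats=3))
print("leave-one-out RMSE:   ", repeated_cv_rmse(x, y, k=len(x), repeats=1))
```

    Setting k equal to the number of observations turns the same routine into leave-one-out cross-validation, which is why the two schemes are often offered together.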

  3. Association of Emotional Labor and Occupational Stressors with Depressive Symptoms among Women Sales Workers at a Clothing Shopping Mall in the Republic of Korea: A Cross-Sectional Study

    PubMed Central

    Chung, Yuh-Jin; Jung, Woo-Chul

    2017-01-01

    In the distribution service industry, sales people often experience multiple occupational stressors such as excessive emotional labor, workplace mistreatment, and job insecurity. The present study aimed to explore the associations of these stressors with depressive symptoms among women sales workers at a clothing shopping mall in Korea. A cross-sectional study was conducted on 583 women, consisting of clothing sales workers and manual workers, using a structured questionnaire to assess demographic factors, occupational stressors, and depressive symptoms. Multiple regression analyses were performed to explore the association of these stressors with depressive symptoms. Scores for job stress subscales such as job demand, job control, and job insecurity were higher among sales workers than among manual workers (p < 0.01). The multiple regression analysis revealed the association between occupation and depressive symptoms after controlling for age, educational level, cohabiting status, and occupational stressors (sβ = 0.08, p = 0.04). A significant interaction effect between occupation and social support was also observed in this model (sβ = −0.09, p = 0.02). The multiple regression analysis stratified by occupation showed that job demand, job insecurity, and workplace mistreatment were significantly associated with depressive symptoms in both occupations (p < 0.05), although the strength of the statistical associations was slightly different. We found negative associations of social support (sβ = −0.22, p < 0.01) and emotional effort (sβ = −0.17, p < 0.01) with depressive symptoms in another multiple regression model for sales workers. Emotional dissonance (sβ = 0.23, p < 0.01) showed positive association with depressive symptoms in this model. The result of this study indicated that reducing occupational stressors would be effective for women sales workers to prevent depressive symptoms. 
In particular, promoting social support could be the most effective way to promote women sales workers’ mental health. PMID:29168777

  4. Association of Emotional Labor and Occupational Stressors with Depressive Symptoms among Women Sales Workers at a Clothing Shopping Mall in the Republic of Korea: A Cross-Sectional Study.

    PubMed

    Chung, Yuh-Jin; Jung, Woo-Chul; Kim, Hyunjoo; Cho, Seong-Sik

    2017-11-23

    In the distribution service industry, sales people often experience multiple occupational stressors such as excessive emotional labor, workplace mistreatment, and job insecurity. The present study aimed to explore the associations of these stressors with depressive symptoms among women sales workers at a clothing shopping mall in Korea. A cross-sectional study was conducted on 583 women, consisting of clothing sales workers and manual workers, using a structured questionnaire to assess demographic factors, occupational stressors, and depressive symptoms. Multiple regression analyses were performed to explore the association of these stressors with depressive symptoms. Scores for job stress subscales such as job demand, job control, and job insecurity were higher among sales workers than among manual workers (p < 0.01). The multiple regression analysis revealed the association between occupation and depressive symptoms after controlling for age, educational level, cohabiting status, and occupational stressors (sβ = 0.08, p = 0.04). A significant interaction effect between occupation and social support was also observed in this model (sβ = -0.09, p = 0.02). The multiple regression analysis stratified by occupation showed that job demand, job insecurity, and workplace mistreatment were significantly associated with depressive symptoms in both occupations (p < 0.05), although the strength of the statistical associations was slightly different. We found negative associations of social support (sβ = -0.22, p < 0.01) and emotional effort (sβ = -0.17, p < 0.01) with depressive symptoms in another multiple regression model for sales workers. Emotional dissonance (sβ = 0.23, p < 0.01) showed positive association with depressive symptoms in this model. The result of this study indicated that reducing occupational stressors would be effective for women sales workers to prevent depressive symptoms. 
In particular, promoting social support could be the most effective way to promote women sales workers' mental health.

  5. Upper extremity disorders in heavy industry workers in Greece.

    PubMed

    Tsouvaltzidou, Thomaella; Alexopoulos, Evangelos; Fragkakis, Ioannis; Jelastopulu, Eleni

    2017-06-18

    To investigate the disability due to musculoskeletal disorders of the upper extremities in heavy industry workers. The population under study consisted of 802 employees, both white- and blue-collar, working in a shipyard industry in Athens, Greece. Data were collected through the distribution of questionnaires and the recording of individual and job-related characteristics during the period 2006-2009. The questionnaires used were the Quick Disabilities of the Arm, Shoulder and Hand (QD) Outcome Measure, the Work Ability Index (WAI) and the Short-Form-36 (SF-36) Health Survey. The QD was divided into three parameters - movement restrictions in everyday activities, work and sports/music activities - and the SF-36 into two items, physical and emotional. Multiple linear regression analysis was performed by means of the SPSS v.22 for Windows Statistical Package. The answers given by the participants for the QD did not reveal great discomfort regarding the execution of manual tasks, with the majority of the participants scoring under 5%, meaning no disability. After conducting multiple linear regression, age revealed a positive association with the parameter of restrictions in everyday activities (b = 0.64, P = 0.000). Basic education showed a statistically significant association regarding restrictions during leisure activities, with b = 2.140 (P = 0.029) for compulsory education graduates. WAI's final score showed a negative association in the regression analysis of all three parameters, with b = -0.142 (P = 0.0), b = -0.099 (P = 0.055) and b = -0.376 (P = 0.001) respectively, while the physical and emotional components of SF-36 were associated with movement restrictions only in daily activities and work. The participants' specialty showed no statistically significant association with any of the three parameters of the QD. 
Increased musculoskeletal disorders of the upper extremity are associated with older age, lower basic education and physical and mental/emotional health and reduced working ability.

  6. Discovering human germ cell mutagens with whole genome sequencing: Insights from power calculations reveal the importance of controlling for between-family variability.

    PubMed

    Webster, R J; Williams, A; Marchetti, F; Yauk, C L

    2018-07-01

    Mutations in germ cells pose potential genetic risks to offspring. However, de novo mutations are rare events that are spread across the genome and are difficult to detect. Thus, studies in this area have generally been under-powered, and no human germ cell mutagen has been identified. Whole Genome Sequencing (WGS) of human pedigrees has been proposed as an approach to overcome these technical and statistical challenges. WGS enables analysis of a much wider breadth of the genome than traditional approaches. Here, we performed power analyses to determine the feasibility of using WGS in human families to identify germ cell mutagens. Different statistical models were compared in the power analyses (ANOVA and multiple regression for one-child families, and mixed effect model sampling between two to four siblings per family). Assumptions were made based on parameters from the existing literature, such as the mutation-by-paternal age effect. We explored two scenarios: a constant effect due to an exposure that occurred in the past, and an accumulating effect where the exposure is continuing. Our analysis revealed the importance of modeling inter-family variability of the mutation-by-paternal age effect. Statistical power was improved by models accounting for the family-to-family variability. Our power analyses suggest that sufficient statistical power can be attained with 4-28 four-sibling families per treatment group, when the increase in mutations ranges from 40 to 10% respectively. Modeling family variability using mixed effect models provided a reduction in sample size compared to a multiple regression approach. Much larger sample sizes were required to detect an interaction effect between environmental exposures and paternal age. These findings inform study design and statistical modeling approaches to improve power and reduce sequencing costs for future studies in this area. Crown Copyright © 2018. Published by Elsevier B.V. All rights reserved.

  7. Estimates of Flow Duration, Mean Flow, and Peak-Discharge Frequency Values for Kansas Stream Locations

    USGS Publications Warehouse

    Perry, Charles A.; Wolock, David M.; Artman, Joshua C.

    2004-01-01

    Streamflow statistics of flow duration and peak-discharge frequency were estimated for 4,771 individual locations on streams listed on the 1999 Kansas Surface Water Register. These statistics included the flow-duration values of 90, 75, 50, 25, and 10 percent, as well as the mean flow value. Peak-discharge frequency values were estimated for the 2-, 5-, 10-, 25-, 50-, and 100-year floods. Least-squares multiple regression techniques were used, along with Tobit analyses, to develop equations for estimating flow-duration values of 90, 75, 50, 25, and 10 percent and the mean flow for uncontrolled flow stream locations. The contributing-drainage areas of 149 U.S. Geological Survey streamflow-gaging stations in Kansas and parts of surrounding States that had flow uncontrolled by Federal reservoirs and used in the regression analyses ranged from 2.06 to 12,004 square miles. Logarithmic transformations of climatic and basin data were performed to yield the best linear relation for developing equations to compute flow durations and mean flow. In the regression analyses, the significant climatic and basin characteristics, in order of importance, were contributing-drainage area, mean annual precipitation, mean basin permeability, and mean basin slope. The analyses yielded a model standard error of prediction range of 0.43 logarithmic units for the 90-percent duration analysis to 0.15 logarithmic units for the 10-percent duration analysis. The model standard error of prediction was 0.14 logarithmic units for the mean flow. Regression equations used to estimate peak-discharge frequency values were obtained from a previous report, and estimates for the 2-, 5-, 10-, 25-, 50-, and 100-year floods were determined for this report. The regression equations and an interpolation procedure were used to compute flow durations, mean flow, and estimates of peak-discharge frequency for locations along uncontrolled flow streams on the 1999 Kansas Surface Water Register. 
Flow durations, mean flow, and peak-discharge frequency values determined at available gaging stations were used to interpolate the regression-estimated flows for the stream locations where available. Streamflow statistics for locations that had uncontrolled flow were interpolated using data from gaging stations weighted according to the drainage area and the bias between the regression-estimated and gaged flow information. On controlled reaches of Kansas streams, the streamflow statistics were interpolated between gaging stations using only gaged data weighted by drainage area.
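
    The logarithmic transformation described above turns a power-law relation between flow and a basin characteristic into a straight line that least squares can fit. The sketch below uses a single predictor (drainage area) and reports a residual standard error in log10 units; the report itself used several characteristics and a standard error of prediction, so this is a simplified illustration. The data values and the name fit_power_law are hypothetical.

```python
import math


def fit_power_law(area, flow):
    """Fit log10(flow) = log10(a) + b * log10(area) by least squares.

    Returns (a, b, se_log), where se_log is the residual standard error
    in log10 units with n - 2 degrees of freedom.
    """
    lx = [math.log10(v) for v in area]
    ly = [math.log10(v) for v in flow]
    n = len(lx)
    mean_x = sum(lx) / n
    mean_y = sum(ly) / n
    b = (sum((u - mean_x) * (v - mean_y) for u, v in zip(lx, ly))
         / sum((u - mean_x) ** 2 for u in lx))
    log_a = mean_y - b * mean_x
    residuals = [v - (log_a + b * u) for u, v in zip(lx, ly)]
    se_log = math.sqrt(sum(r * r for r in residuals) / (n - 2))
    return 10 ** log_a, b, se_log


# Made-up drainage areas (square miles) and mean flows (cubic feet per second).
areas = [2.5, 14.0, 60.0, 210.0, 950.0, 4800.0]
flows = [1.1, 6.3, 19.0, 85.0, 240.0, 1500.0]

a, b, se_log = fit_power_law(areas, flows)
print(f"flow ~ {a:.3f} * area^{b:.3f}, residual SE = {se_log:.3f} log10 units")
```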

  8. Spatial analysis of relative humidity during ungauged periods in a mountainous region

    NASA Astrophysics Data System (ADS)

    Um, Myoung-Jin; Kim, Yeonjoo

    2017-08-01

    Although atmospheric humidity influences environmental and agricultural conditions, thereby influencing plant growth, human health, and air pollution, efforts to develop spatial maps of atmospheric humidity using statistical approaches have thus far been limited. This study therefore aims to develop statistical approaches for inferring the spatial distribution of relative humidity (RH) for a mountainous island, for which data are not uniformly available across the region. A multiple regression analysis based on various mathematical models was used to identify the optimal model for estimating monthly RH by incorporating not only temperature but also location and elevation. Based on the regression analysis, we extended the monthly RH data from weather stations to cover the ungauged periods when no RH observations were available. Then, two different types of station-based data, the observational data and the data extended via the regression model, were used to form grid-based data with a resolution of 100 m. The grid-based data that used the extended station-based data captured the increasing RH trend along an elevation gradient. Furthermore, annual RH values averaged over the regions were examined. Decreasing temporal trends were found in most cases, with magnitudes varying based on the season and region.

  9. Quasi-Likelihood Techniques in a Logistic Regression Equation for Identifying Simulium damnosum s.l. Larval Habitats Intra-cluster Covariates in Togo.

    PubMed

    Jacob, Benjamin G; Novak, Robert J; Toe, Laurent; Sanfo, Moussa S; Afriyie, Abena N; Ibrahim, Mohammed A; Griffith, Daniel A; Unnasch, Thomas R

    2012-01-01

    The standard methods for regression analyses of clustered riverine larval habitat data of Simulium damnosum s.l., a major black-fly vector of onchocerciasis, postulate models relating observational ecological-sampled parameter estimators to prolific habitats without accounting for residual intra-cluster error correlation effects. Generally, this correlation comes from two sources: (1) the design of the random effects and their assumed covariance from the multiple levels within the regression model; and, (2) the correlation structure of the residuals. Unfortunately, inconspicuous errors in residual intra-cluster correlation estimates can overstate precision in forecasted S. damnosum s.l. riverine larval habitat explanatory attributes regardless of how they are treated (e.g., independent, autoregressive, Toeplitz, etc.). In this research, the geographical locations for multiple riverine-based S. damnosum s.l. larval ecosystem habitats sampled from 2 pre-established epidemiological sites in Togo were identified and recorded from July 2009 to June 2010. Initially the data was aggregated into PROC GENMOD. An agglomerative hierarchical residual cluster-based analysis was then performed. The sampled clustered study site data was then analyzed for statistical correlations using Monthly Biting Rates (MBR). Euclidean distance measurements and terrain-related geomorphological statistics were then generated in ArcGIS. A digital overlay was then performed also in ArcGIS using the georeferenced ground coordinates of high and low density clusters stratified by Annual Biting Rates (ABR). This data was overlain onto multitemporal sub-meter pixel resolution satellite data (i.e., QuickBird 0.61 m wavebands). Orthogonal spatial filter eigenvectors were then generated in SAS/GIS. 
Univariate and non-linear regression-based models (i.e., Logistic, Poisson and Negative Binomial) were also employed to determine probability distributions and to identify statistically significant parameter estimators from the sampled data. Thereafter, Durbin-Watson test statistics were used to test the null hypothesis that the regression residuals were not autocorrelated against the alternative that the residuals followed an autoregressive process in AUTOREG. Bayesian uncertainty matrices were also constructed employing normal priors for each of the sampled estimators in PROC MCMC. The residuals revealed both spatially structured and unstructured error effects in the high and low ABR-stratified clusters. The analyses also revealed that the estimators, levels of turbidity and presence of rocks were statistically significant for the high-ABR-stratified clusters, while the estimators distance between habitats and floating vegetation were important for the low-ABR-stratified cluster. Varying and constant coefficient regression models, ABR-stratified GIS-generated clusters, sub-meter resolution satellite imagery, a robust residual intra-cluster diagnostic test, MBR-based histograms, eigendecomposition spatial filter algorithms and Bayesian matrices can enable accurate autoregressive estimation of latent uncertainty effects and other residual error probabilities (i.e., heteroskedasticity) for testing correlations between georeferenced S. damnosum s.l. riverine larval habitat estimators. The asymptotic distribution of the resulting residual adjusted intra-cluster predictor error autocovariate coefficients can thereafter be established while estimates of the asymptotic variance can lead to the construction of approximate confidence intervals for accurately targeting productive S. damnosum s.l. habitats based on spatiotemporal field-sampled count data.
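
    The Durbin-Watson statistic used above to check regression residuals for autocorrelation is simple to compute by hand. The sketch below is a generic Python illustration, not the SAS AUTOREG procedure used in the study, and the residual series are made up.

```python
def durbin_watson(residuals):
    """Durbin-Watson statistic for an ordered series of regression residuals.

    Values near 2 are consistent with no first-order autocorrelation;
    values toward 0 suggest positive autocorrelation and values toward 4
    suggest negative autocorrelation.
    """
    numerator = sum((residuals[t] - residuals[t - 1]) ** 2
                    for t in range(1, len(residuals)))
    denominator = sum(e * e for e in residuals)
    return numerator / denominator


# Made-up residual series for illustration.
slowly_drifting = [0.9, 1.0, 0.8, 0.6, 0.1, -0.3, -0.7, -0.9, -1.0, -0.6]
sign_flipping = [1.0, -1.0] * 5

print("drifting residuals:     ", durbin_watson(slowly_drifting))
print("sign-flipping residuals:", durbin_watson(sign_flipping))
```

    Turning the statistic into a formal test requires the Durbin-Watson critical bounds for the sample size and number of regressors, which this sketch does not cover.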

  10. Application of factor analysis of infrared spectra for quantitative determination of beta-tricalcium phosphate in calcium hydroxylapatite.

    PubMed

    Arsenyev, P A; Trezvov, V V; Saratovskaya, N V

    1997-01-01

    This work presents a method for determining the phase composition of calcium hydroxylapatite from its infrared spectrum. The method uses factor analysis of the spectral data of a calibration set of samples to determine the minimal number of factors required to reproduce the spectra within experimental error. Multiple linear regression is applied to establish the correlation between the factor scores of the calibration standards and their properties. The regression equations can then be used to predict the property value of an unknown sample. A regression model was built for determining the beta-tricalcium phosphate content in hydroxylapatite, and the quality of the model was assessed statistically. Applying factor analysis to the spectral data increases the accuracy of the beta-tricalcium phosphate determination and extends the range of determination toward lower concentrations. Reproducibility of results is retained.

  11. Estimating peak-flow frequency statistics for selected gaged and ungaged sites in naturally flowing streams and rivers in Idaho

    USGS Publications Warehouse

    Wood, Molly S.; Fosness, Ryan L.; Skinner, Kenneth D.; Veilleux, Andrea G.

    2016-06-27

    The U.S. Geological Survey, in cooperation with the Idaho Transportation Department, updated regional regression equations to estimate peak-flow statistics at ungaged sites on Idaho streams using recent streamflow (flow) data and new statistical techniques. Peak-flow statistics with 80-, 67-, 50-, 43-, 20-, 10-, 4-, 2-, 1-, 0.5-, and 0.2-percent annual exceedance probabilities (1.25-, 1.50-, 2.00-, 2.33-, 5.00-, 10.0-, 25.0-, 50.0-, 100-, 200-, and 500-year recurrence intervals, respectively) were estimated for 192 streamgages in Idaho and bordering States with at least 10 years of annual peak-flow record through water year 2013. The streamgages were selected from drainage basins with little or no flow diversion or regulation. The peak-flow statistics were estimated by fitting a log-Pearson type III distribution to records of annual peak flows and applying two additional statistical methods: (1) the Expected Moments Algorithm to help describe uncertainty in annual peak flows and to better represent missing and historical record; and (2) the generalized Multiple Grubbs Beck Test to screen out potentially influential low outliers and to better fit the upper end of the peak-flow distribution. Additionally, a new regional skew was estimated for the Pacific Northwest and used to weight at-station skew at most streamgages. The streamgages were grouped into six regions (numbered 1_2, 3, 4, 5, 6_8, and 7, to maintain consistency in region numbering with a previous study), and the estimated peak-flow statistics were related to basin and climatic characteristics to develop regional regression equations using a generalized least squares procedure. Four out of 24 evaluated basin and climatic characteristics were selected for use in the final regional peak-flow regression equations. Overall, the standard error of prediction for the regional peak-flow regression equations ranged from 22 to 132 percent. 
Among all regions, regression model fit was best for region 4 in west-central Idaho (average standard error of prediction=46.4 percent; pseudo-R²>92 percent) and region 5 in central Idaho (average standard error of prediction=30.3 percent; pseudo-R²>95 percent). Regression model fit was poor for region 7 in southern Idaho (average standard error of prediction=103 percent; pseudo-R²<78 percent) compared to other regions because few streamgages in region 7 met the criteria for inclusion in the study, and the region’s semi-arid climate and associated variability in precipitation patterns causes substantial variability in peak flows. A drainage area ratio-adjustment method, using ratio exponents estimated using generalized least-squares regression, was presented as an alternative to the regional regression equations if peak-flow estimates are desired at an ungaged site that is close to a streamgage selected for inclusion in this study. The alternative drainage area ratio-adjustment method is appropriate for use when the drainage area ratio between the ungaged and gaged sites is between 0.5 and 1.5. The updated regional peak-flow regression equations had lower total error (standard error of prediction) than all regression equations presented in a 1982 study and in four of six regions presented in 2002 and 2003 studies in Idaho. A more extensive streamgage screening process used in the current study resulted in fewer streamgages used in the current study than in the 1982, 2002, and 2003 studies. Fewer streamgages used and the selection of different explanatory variables were likely causes of increased error in some regions compared to previous studies, but overall, regional peak-flow regression model fit was generally improved for Idaho. 
The revised statistical procedures and increased streamgage screening applied in the current study most likely resulted in a more accurate representation of natural peak-flow conditions. The updated, regional peak-flow regression equations will be integrated in the U.S. Geological Survey StreamStats program to allow users to estimate basin and climatic characteristics and peak-flow statistics at ungaged locations of interest. StreamStats estimates peak-flow statistics with quantifiable certainty only when used at sites with basin and climatic characteristics within the range of input variables used to develop the regional regression equations. Both the regional regression equations and StreamStats should be used to estimate peak-flow statistics only in naturally flowing, relatively unregulated streams without substantial local influences to flow, such as large seeps, springs, or other groundwater-surface water interactions that are not widespread or characteristic of the respective region.
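
    Two pieces of arithmetic in the abstract above can be made concrete. The recurrence interval in years is the reciprocal of the annual exceedance probability, and a drainage area ratio adjustment scales a gaged peak flow by the area ratio raised to an exponent. The sketch below assumes the commonly used form Q_ungaged = Q_gaged * (A_ungaged / A_gaged)^b; the abstract does not give the equation or the fitted exponents, so the form, the exponent value, and the function names are assumptions made for illustration.

```python
def recurrence_interval_years(aep_percent):
    """Recurrence interval implied by an annual exceedance probability (%)."""
    return 100.0 / aep_percent


def area_ratio_estimate(q_gaged, area_ungaged, area_gaged, exponent):
    """Transfer a gaged peak flow to a nearby ungaged site on the same stream.

    Assumed form: q_ungaged = q_gaged * (area_ungaged / area_gaged) ** exponent.
    The 0.5 to 1.5 limit on the area ratio follows the range stated in the
    abstract; the exponent must come from a fitted regional regression.
    """
    ratio = area_ungaged / area_gaged
    if not 0.5 <= ratio <= 1.5:
        raise ValueError("drainage area ratio outside the 0.5 to 1.5 range")
    return q_gaged * ratio ** exponent


for aep in (80, 50, 20, 10, 4, 2, 1, 0.5, 0.2):
    print(f"{aep}% AEP -> {recurrence_interval_years(aep):.2f}-year event")

# Hypothetical inputs: 1,200 cfs at a gage draining 100 square miles,
# transferred to a site draining 80 square miles with an exponent of 0.7.
print(area_ratio_estimate(1200.0, 80.0, 100.0, 0.7))
```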

  12. Markov chains and semi-Markov models in time-to-event analysis.

    PubMed

    Abner, Erin L; Charnigo, Richard J; Kryscio, Richard J

    2013-10-25

    A variety of statistical methods are available to investigators for analysis of time-to-event data, often referred to as survival analysis. Kaplan-Meier estimation and Cox proportional hazards regression are commonly employed tools but are not appropriate for all studies, particularly in the presence of competing risks and when multiple or recurrent outcomes are of interest. Markov chain models can accommodate censored data, competing risks (informative censoring), multiple outcomes, recurrent outcomes, frailty, and non-constant survival probabilities. Markov chain models, though often overlooked by investigators in time-to-event analysis, have long been used in clinical studies and have widespread application in other fields.
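
    As a rough illustration of how a Markov chain represents competing risks and recurrent outcomes, the sketch below propagates state probabilities through a discrete-time chain with two absorbing states. The states, the transition probabilities, and the function names are hypothetical and are not taken from the article.

```python
def step(distribution, transition):
    """Advance the state probabilities by one time step."""
    n = len(distribution)
    return [sum(distribution[i] * transition[i][j] for i in range(n))
            for j in range(n)]


def occupancy_after(distribution, transition, steps):
    """State probabilities after the given number of steps."""
    for _ in range(steps):
        distribution = step(distribution, transition)
    return distribution


# Hypothetical states: 0 = healthy, 1 = ill (can recur),
# 2 = death from cause A, 3 = death from cause B (both absorbing).
transition = [
    [0.85, 0.10, 0.03, 0.02],
    [0.20, 0.60, 0.15, 0.05],
    [0.00, 0.00, 1.00, 0.00],
    [0.00, 0.00, 0.00, 1.00],
]

start = [1.0, 0.0, 0.0, 0.0]  # everyone begins healthy
for years in (1, 5, 10):
    probs = occupancy_after(start, transition, years)
    print(years, [round(p, 3) for p in probs])
```

    The two absorbing states play the role of competing risks, and the healthy-to-ill-to-healthy loop stands in for a recurrent outcome. A semi-Markov model would additionally let the time spent in a state affect the transition probabilities, which this sketch does not attempt.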

  13. Markov chains and semi-Markov models in time-to-event analysis

    PubMed Central

    Abner, Erin L.; Charnigo, Richard J.; Kryscio, Richard J.

    2014-01-01

    A variety of statistical methods are available to investigators for analysis of time-to-event data, often referred to as survival analysis. Kaplan-Meier estimation and Cox proportional hazards regression are commonly employed tools but are not appropriate for all studies, particularly in the presence of competing risks and when multiple or recurrent outcomes are of interest. Markov chain models can accommodate censored data, competing risks (informative censoring), multiple outcomes, recurrent outcomes, frailty, and non-constant survival probabilities. Markov chain models, though often overlooked by investigators in time-to-event analysis, have long been used in clinical studies and have widespread application in other fields. PMID:24818062

  14. Legitimate Techniques for Improving the R-Square and Related Statistics of a Multiple Regression Model

    DTIC Science & Technology

    1981-01-01

    explanatory variable has been omitted. Ramsey (1974) has developed a rather interesting test for detecting specification errors using estimates of the...Peter. (1979) A Guide to Econometrics, Cambridge, MA: The MIT Press. Ramsey, J.B. (1974), "Classical Model Selection Through Specification Error... Tests," in P. Zarembka, Ed. Frontiers in Econometrics, New York: Academic Press. Theil, Henri. (1971), Principles of Econometrics, New York: John Wiley

  15. Estimating Required Contingency Funds for Construction Projects using Multiple Linear Regression

    DTIC Science & Technology

    2006-03-01

    Breusch-Pagan test, in which the null hypothesis states that the residuals have constant variance. The alternate hypothesis is that the residuals do not...variance, the Breusch-Pagan test provides statistical evidence that the assumption is justified. For the proposed model, the p-value is 0.173...entire test sample. v Acknowledgments First, I would like to acknowledge the influence and help of Greg Hoffman. His work served as the
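
    The constant-variance check mentioned in the excerpt above can be sketched as follows. This is a simplified illustration, not the thesis's analysis: it uses a single predictor and the studentized (Koenker) form of the Breusch-Pagan statistic, n times the R-squared of an auxiliary regression of squared residuals on the predictor, referred to a chi-square distribution with one degree of freedom. The data and the function names are made up.

```python
import math


def simple_ols(x, y):
    """Intercept and slope of the least-squares line through (x, y)."""
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    slope = (sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
             / sum((a - mean_x) ** 2 for a in x))
    return mean_y - slope * mean_x, slope


def breusch_pagan_one_predictor(x, y):
    """Studentized Breusch-Pagan statistic and p-value for one predictor.

    Null hypothesis: the residual variance is constant. A small p-value
    is evidence of heteroskedasticity.
    """
    intercept, slope = simple_ols(x, y)
    squared = [(b - (intercept + slope * a)) ** 2 for a, b in zip(x, y)]
    # Auxiliary regression: do the squared residuals vary with x?
    aux_intercept, aux_slope = simple_ols(x, squared)
    mean_sq = sum(squared) / len(squared)
    ss_tot = sum((s - mean_sq) ** 2 for s in squared)
    ss_res = sum((s - (aux_intercept + aux_slope * a)) ** 2
                 for a, s in zip(x, squared))
    lm = max(0.0, len(x) * (1.0 - ss_res / ss_tot))
    # Upper tail of a chi-square with 1 degree of freedom.
    p_value = math.erfc(math.sqrt(lm / 2.0))
    return lm, p_value


# Made-up data whose scatter grows with x, so constant variance should fail.
x = [float(i) for i in range(1, 41)]
y = [1.0 + 2.0 * v + (0.5 * v if int(v) % 2 else -0.5 * v) for v in x]

lm, p_value = breusch_pagan_one_predictor(x, y)
print(f"LM = {lm:.2f}, p-value = {p_value:.4g}")
```

    A p-value above the chosen significance level, such as the 0.173 quoted in the excerpt, means the constant-variance hypothesis is not rejected.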

  16. Reporting quality of statistical methods in surgical observational studies: protocol for systematic review.

    PubMed

    Wu, Robert; Glen, Peter; Ramsay, Tim; Martel, Guillaume

    2014-06-28

    Observational studies dominate the surgical literature. Statistical adjustment is an important strategy to account for confounders in observational studies. Research has shown that published articles are often poor in statistical quality, which may jeopardize their conclusions. The Statistical Analyses and Methods in the Published Literature (SAMPL) guidelines have been published to help establish standards for statistical reporting. This study will seek to determine whether the quality of statistical adjustment and the reporting of these methods are adequate in surgical observational studies. We hypothesize that incomplete reporting will be found in all surgical observational studies, and that the quality and reporting of these methods will be of lower quality in surgical journals when compared with medical journals. Finally, this work will seek to identify predictors of high-quality reporting. This work will examine the top five general surgical and medical journals, based on a 5-year impact factor (2007-2012). All observational studies investigating an intervention related to an essential component area of general surgery (defined by the American Board of Surgery), with an exposure, outcome, and comparator, will be included in this systematic review. Essential elements related to statistical reporting and quality were extracted from the SAMPL guidelines and include domains such as intent of analysis, primary analysis, multiple comparisons, numbers and descriptive statistics, association and correlation analyses, linear regression, logistic regression, Cox proportional hazard analysis, analysis of variance, survival analysis, propensity analysis, and independent and correlated analyses. Each article will be scored as a proportion based on fulfilling criteria in relevant analyses used in the study. A logistic regression model will be built to identify variables associated with high-quality reporting. 
A comparison will be made between the scores of surgical observational studies published in medical versus surgical journals. Secondary outcomes will pertain to individual domains of analysis. Sensitivity analyses will be conducted. This study will explore the reporting and quality of statistical analyses in surgical observational studies published in the most referenced surgical and medical journals in 2013 and examine whether variables (including the type of journal) can predict high-quality reporting.

  17. DOE Office of Scientific and Technical Information (OSTI.GOV)

    Penna, M.L.; Duchiade, M.P.

    The authors report the results of an investigation into the possible association between air pollution and infant mortality from pneumonia in the Rio de Janeiro Metropolitan Area. This investigation employed multiple linear regression analysis (stepwise method) for infant mortality from pneumonia in 1980, including the study population's areas of residence, incomes, and pollution exposure as independent variables. With the income variable included in the regression, a statistically significant association was observed between the average annual level of particulates and infant mortality from pneumonia. While this finding should be accepted with caution, it does suggest a biological association between these variables. The authors' conclusion is that air quality indicators should be included in studies of acute respiratory infections in developing countries.

  18. Digital literacy of youth and young adults with intellectual disability predicted by support needs and social maturity.

    PubMed

    Seok, Soonhwa; DaCosta, Boaventura

    2017-01-01

    This study investigated relationships between digital propensity and support needs as well as predictors of digital propensity in the context of support intensity, age, gender, and social maturity. A total of 118 special education teachers rated the support intensity, digital propensity, and social maturity of 352 students with intellectual disability. Leveraging the Digital Propensity Index, Supports Intensity Scale, and the Social Maturity Scale, descriptive statistics, correlations, multiple regressions, and regression analyses were employed. The findings revealed significant relationships between digital propensity and support needs. In addition, significant predictors of digital propensity were found with regard to support intensity, age, gender, and social maturity.

  19. Financial Management and Control for Decision Making in Urban Local Bodies in India Using Statistical Techniques

    NASA Astrophysics Data System (ADS)

    Bhattacharyya, Sidhakam; Bandyopadhyay, Gautam

    2010-10-01

    The council of most of the Urban Local Bodies (ULBs) has a limited scope for decision making in the absence of an appropriate financial control mechanism. The information about the expected amount of own fund during a particular period is of great importance for decision making. Therefore, in this paper, efforts are being made to present a set of findings and to establish a model for estimating receipts of own sources and payments thereof using multiple regression analysis. Data for sixty months from a reputed ULB in West Bengal have been considered for ascertaining the regression models. This can be used as a part of the financial management and control procedure by the council to estimate the effect on own fund. In our study we have considered two models using multiple regression analysis. "Model I" comprises total adjusted receipts as the dependent variable and selected individual receipts as the independent variables. Similarly, "Model II" consists of total adjusted payments as the dependent variable and selected individual payments as independent variables. The resultant of Model I and Model II is the surplus or deficit affecting own fund. This may be applied for decision-making purposes by the council.

  20. Missing data treatments matter: an analysis of multiple imputation for anterior cervical discectomy and fusion procedures.

    PubMed

    Ondeck, Nathaniel T; Fu, Michael C; Skrip, Laura A; McLynn, Ryan P; Cui, Jonathan J; Basques, Bryce A; Albert, Todd J; Grauer, Jonathan N

    2018-04-09

    The presence of missing data is a limitation of large datasets, including the National Surgical Quality Improvement Program (NSQIP). In addressing this issue, most studies use complete case analysis, which excludes cases with missing data, thus potentially introducing selection bias. Multiple imputation, a statistically rigorous approach that approximates missing data and preserves sample size, may be an improvement over complete case analysis. The present study aims to evaluate the impact of using multiple imputation in comparison with complete case analysis for assessing the associations between preoperative laboratory values and adverse outcomes following anterior cervical discectomy and fusion (ACDF) procedures. This is a retrospective review of prospectively collected data. Patients undergoing one-level ACDF were identified in NSQIP 2012-2015. Perioperative adverse outcome variables assessed included the occurrence of any adverse event, severe adverse events, and hospital readmission. Missing preoperative albumin and hematocrit values were handled using complete case analysis and multiple imputation. These preoperative laboratory levels were then tested for associations with 30-day postoperative outcomes using logistic regression. A total of 11,999 patients were included. Of this cohort, 63.5% of patients had missing preoperative albumin and 9.9% had missing preoperative hematocrit. When using complete case analysis, only 4,311 patients were studied. The removed patients were significantly younger, healthier, of a common body mass index, and male. Logistic regression analysis failed to identify either preoperative hypoalbuminemia or preoperative anemia as significantly associated with adverse outcomes. When employing multiple imputation, all 11,999 patients were included. Preoperative hypoalbuminemia was significantly associated with the occurrence of any adverse event and severe adverse events. 
Preoperative anemia was significantly associated with the occurrence of any adverse event, severe adverse events, and hospital readmission. Multiple imputation is a rigorous statistical procedure that is being increasingly used to address missing values in large datasets. Using this technique for ACDF avoided the loss of cases that may have affected the representativeness and power of the study and led to different results than complete case analysis. Multiple imputation should be considered for future spine studies. Copyright © 2018 Elsevier Inc. All rights reserved.
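    The abstract above contrasts complete case analysis with multiple imputation. As a minimal sketch of one step of that technique, the function below pools per-imputation results with Rubin's rules: the pooled estimate is the mean of the per-imputation estimates, and the total variance is the within-imputation variance plus (1 + 1/m) times the between-imputation variance. The function name `pool_rubin` and its inputs are hypothetical and are not taken from the study, which does not report its implementation.

```python
from statistics import mean, variance


def pool_rubin(estimates, variances):
    """Pool m per-imputation estimates with Rubin's rules.

    estimates: one coefficient estimate per imputed data set (hypothetical input)
    variances: the matching squared standard errors
    Returns (pooled_estimate, total_variance).
    """
    m = len(estimates)
    if m < 2 or m != len(variances):
        raise ValueError("need at least two imputations with matching variances")
    pooled = mean(estimates)          # average of the per-imputation estimates
    within = mean(variances)          # average sampling variance
    between = variance(estimates)     # sample variance across imputations (m - 1 denominator)
    total = within + (1 + 1 / m) * between
    return pooled, total
```

    The between-imputation term is what a single imputed data set would miss: it carries the extra uncertainty due to the missing values into the pooled standard error.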

  1. Effects of metal- and fiber-reinforced composite root canal posts on flexural properties.

    PubMed

    Kim, Su-Hyeon; Oh, Tack-Oon; Kim, Ju-Young; Park, Chun-Woong; Baek, Seung-Ho; Park, Eun-Seok

    2016-01-01

    The aim of this study was to observe the effects of different test conditions on the flexural properties of root canal post. Metal- and fiber-reinforced composite root canal posts of various diameters were measured to determine flexural properties using a threepoint bending test at different conditions. In this study, the span length/post diameter ratio of root canal posts varied from 3.0 to 10.0. Multiple regression models for maximum load as a dependent variable were statistically significant. The models for flexural properties as dependent variables were statistically significant, but linear regression models could not be fitted to data sets. At a low span length/post diameter ratio, the flexural properties were distorted by occurrence of shear stress in short samples. It was impossible to obtain high span length/post diameter ratio with root canal posts. The addition of parameters or coefficients is necessary to appropriately represent the flexural properties of root canal posts.

  2. Spatial interpolation schemes of daily precipitation for hydrologic modeling

    USGS Publications Warehouse

    Hwang, Y.; Clark, M.R.; Rajagopalan, B.; Leavesley, G.

    2012-01-01

    Distributed hydrologic models typically require spatial estimates of precipitation interpolated from sparsely located observational points to the specific grid points. We compare and contrast the performance of regression-based statistical methods for the spatial estimation of precipitation in two hydrologically different basins and confirm that widely used regression-based estimation schemes fail to describe the realistic spatial variability of the daily precipitation field. The methods assessed are: (1) inverse distance weighted average; (2) multiple linear regression (MLR); (3) climatological MLR (CMLR); and (4) locally weighted polynomial regression (LWP). In order to improve the performance of the interpolations, the authors propose a two-step regression technique for effective daily precipitation estimation. In this simple two-step estimation process, precipitation occurrence is first generated via a logistic regression model before estimating the amount of precipitation separately on wet days. This process generated the precipitation occurrence, amount, and spatial correlation effectively. A distributed hydrologic model (PRMS) was used for the impact analysis in daily time step simulation. Multiple simulations suggested noticeable differences between the input alternatives generated by three different interpolation schemes. Differences are shown in overall simulation error against the observations, degree of explained variability, and seasonal volumes. Simulated streamflows also showed different characteristics in mean, maximum, minimum, and peak flows. Given the same parameter optimization technique, LWP input showed least streamflow error in Alapaha basin and CMLR input showed least error (still very close to LWP) in Animas basin. All of the two-step interpolation inputs resulted in lower streamflow error compared to the directly interpolated inputs. © 2011 Springer-Verlag.
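    The first method listed in the abstract above, the inverse distance weighted average, can be sketched in a few lines. This is an illustrative version only: the function name `idw_estimate`, the planar coordinates, and the default exponent of 2 are assumptions, not details reported by the authors.

```python
import math


def idw_estimate(stations, target, power=2.0):
    """Inverse distance weighted average at a target point.

    stations: list of ((x, y), value) gauge observations (hypothetical layout)
    target:   (x, y) grid point to estimate
    power:    distance exponent; 2.0 is a common choice, assumed here
    """
    weighted_sum = 0.0
    weight_total = 0.0
    for (x, y), value in stations:
        distance = math.hypot(x - target[0], y - target[1])
        if distance == 0.0:
            return value  # target coincides with a gauge: use its value directly
        weight = 1.0 / distance ** power
        weighted_sum += weight * value
        weight_total += weight
    if weight_total == 0.0:
        raise ValueError("at least one station is required")
    return weighted_sum / weight_total
```

    Because every weight is positive, the estimate always lies between the smallest and largest gauge values, which is one reason such a scheme smooths the daily precipitation field.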

  3. Water and solute absorption from carbohydrate-electrolyte solutions in the human proximal small intestine: a review and statistical analysis.

    PubMed

    Shi, Xiaocai; Passe, Dennis H

    2010-10-01

    The purpose of this study is to summarize water, carbohydrate (CHO), and electrolyte absorption from carbohydrate-electrolyte (CHO-E) solutions based on all of the triple-lumen-perfusion studies in humans since the early 1960s. The current statistical analysis included 30 reports from which were obtained information on water absorption, CHO absorption, total solute absorption, CHO concentration, CHO type, osmolality, sodium concentration, and sodium absorption in the different gut segments during exercise and at rest. Mean differences were assessed using independent-samples t tests. Exploratory multiple-regression analyses were conducted to create prediction models for intestinal water absorption. The factors influencing water and solute absorption are carefully evaluated and extensively discussed. The authors suggest that in the human proximal small intestine, water absorption is related to both total solute and CHO absorption; osmolality exerts various impacts on water absorption in the different segments; the multiple types of CHO in the ingested CHO-E solutions play a critical role in stimulating CHO, sodium, total solute, and water absorption; CHO concentration is negatively related to water absorption; and exercise may result in greater water absorption than rest. A potential regression model for predicting water absorption is also proposed for future research and practical application. In conclusion, water absorption in the human small intestine is influenced by osmolality, solute absorption, and the anatomical structures of gut segments. Multiple types of CHO in a CHO-E solution facilitate water absorption by stimulating CHO and solute absorption and lowering osmolality in the intestinal lumen.

  4. Younger age, female sex, and high number of awakenings and arousals predict fatigue in patients with sleep disorders: a retrospective polysomnographic observational study

    PubMed Central

    Veauthier, Christian

    2013-01-01

    Background: The Fatigue Severity Scale (FSS) is widely used to assess fatigue, not only in the context of multiple sclerosis-related fatigue, but also in many other medical conditions. Some polysomnographic studies have shown high FSS values in sleep-disordered patients without multiple sclerosis. The Modified Fatigue Impact Scale (MFIS) has increasingly been used in order to assess fatigue, but polysomnographic data investigating sleep-disordered patients are thus far unavailable. Moreover, the pathophysiological link between sleep architecture and fatigue measured with the MFIS and the FSS has not been previously investigated. Methods: This was a retrospective observational study (n = 410) with subgroups classified according to sleep diagnosis. The statistical analysis included nonparametric correlation between questionnaire results and polysomnographic data, age and sex, and univariate and multiple logistic regression. Results: The multiple logistic regression showed a significant relationship between FSS/MFIS values and younger age and female sex. Moreover, there was a significant relationship between FSS values and number of arousals and between MFIS values and number of awakenings. Conclusion: Younger age, female sex, and high number of awakenings and arousals are predictive of fatigue in sleep-disordered patients. Further investigations are needed to find the pathophysiological explanation for these relationships. PMID:24109185

  5. General Framework for Meta-analysis of Rare Variants in Sequencing Association Studies

    PubMed Central

    Lee, Seunggeun; Teslovich, Tanya M.; Boehnke, Michael; Lin, Xihong

    2013-01-01

    We propose a general statistical framework for meta-analysis of gene- or region-based multimarker rare variant association tests in sequencing association studies. In genome-wide association studies, single-marker meta-analysis has been widely used to increase statistical power by combining results via regression coefficients and standard errors from different studies. In analysis of rare variants in sequencing studies, region-based multimarker tests are often used to increase power. We propose meta-analysis methods for commonly used gene- or region-based rare variants tests, such as burden tests and variance component tests. Because estimation of regression coefficients of individual rare variants is often unstable or not feasible, the proposed method avoids this difficulty by calculating score statistics instead that only require fitting the null model for each study and then aggregating these score statistics across studies. Our proposed meta-analysis rare variant association tests are conducted based on study-specific summary statistics, specifically score statistics for each variant and between-variant covariance-type (linkage disequilibrium) relationship statistics for each gene or region. The proposed methods are able to incorporate different levels of heterogeneity of genetic effects across studies and are applicable to meta-analysis of multiple ancestry groups. We show that the proposed methods are essentially as powerful as joint analysis by directly pooling individual level genotype data. We conduct extensive simulations to evaluate the performance of our methods by varying levels of heterogeneity across studies, and we apply the proposed methods to meta-analysis of rare variant effects in a multicohort study of the genetics of blood lipid levels. PMID:23768515

  6. Power and instrument strength requirements for Mendelian randomization studies using multiple genetic variants.

    PubMed

    Pierce, Brandon L; Ahsan, Habibul; Vanderweele, Tyler J

    2011-06-01

    Mendelian Randomization (MR) studies assess the causality of an exposure-disease association using genetic determinants [i.e. instrumental variables (IVs)] of the exposure. Power and IV strength requirements for MR studies using multiple genetic variants have not been explored. We simulated cohort data sets consisting of a normally distributed disease trait, a normally distributed exposure that affects this trait, and a biallelic genetic variant that affects the exposure. We estimated power to detect an effect of exposure on disease for varying allele frequencies, effect sizes and sample sizes (using two-stage least squares regression on 10,000 data sets; Stage 1 is a regression of exposure on the variant, and Stage 2 is a regression of disease on the fitted exposure). Similar analyses were conducted using multiple genetic variants (5, 10, 20) as independent or combined IVs. We assessed IV strength using the first-stage F statistic. Simulations of realistic scenarios indicate that MR studies will require large (n > 1000), often very large (n > 10,000), sample sizes. In many cases, so-called 'weak IV' problems arise when using multiple variants as independent IVs (even with as few as five), resulting in biased effect estimates. Combining genetic factors into fewer IVs results in modest power decreases, but alleviates weak IV problems. Ideal methods for combining genetic factors depend upon knowledge of the genetic architecture underlying the exposure. The feasibility of well-powered, unbiased MR studies will depend upon the amount of variance in the exposure that can be explained by known genetic factors and the 'strength' of the IV set derived from these genetic factors.
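    The two-stage least squares procedure described in the abstract above can be illustrated with a single simulated variant. The sketch below is not the authors' simulation code: the function names, the allele frequency of 0.3, the effect sizes, and the added unmeasured confounder are all assumptions chosen only to show why the instrumented estimate differs from a naive regression of disease on exposure.

```python
import random


def ols(x, y):
    """Return (intercept, slope) of the least-squares line y = a + b * x."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return mean_y - slope * mean_x, slope


def simulate(n, beta, seed=0, allele_freq=0.3, gamma=0.5):
    """Simulate variant g, exposure x and disease trait y (all values assumed)."""
    rng = random.Random(seed)
    g, x, y = [], [], []
    for _ in range(n):
        gi = int(rng.random() < allele_freq) + int(rng.random() < allele_freq)
        u = rng.gauss(0, 1)                    # unmeasured confounder of x and y
        xi = gamma * gi + u + rng.gauss(0, 1)  # variant affects exposure
        yi = beta * xi + u + rng.gauss(0, 1)   # exposure affects disease trait
        g.append(gi)
        x.append(xi)
        y.append(yi)
    return g, x, y


def two_stage_least_squares(g, x, y):
    """Return (causal effect estimate, first-stage F statistic)."""
    a1, b1 = ols(g, x)                         # Stage 1: exposure on variant
    fitted = [a1 + b1 * gi for gi in g]
    _, effect = ols(fitted, y)                 # Stage 2: disease on fitted exposure
    n = len(x)
    mean_x = sum(x) / n
    ss_reg = sum((fi - mean_x) ** 2 for fi in fitted)
    ss_res = sum((xi - fi) ** 2 for xi, fi in zip(x, fitted))
    f_stat = ss_reg / (ss_res / (n - 2))       # one instrument, so one numerator df
    return effect, f_stat
```

    Running the second stage by hand like this gives the correct point estimate, but the standard errors reported by an ordinary second-stage regression are not valid, so a dedicated IV routine would normally be used for inference.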

  7. Artificial neural networks and multiple linear regression model using principal components to estimate rainfall over South America

    NASA Astrophysics Data System (ADS)

    Soares dos Santos, T.; Mendes, D.; Rodrigues Torres, R.

    2016-01-01

    Several studies have been devoted to dynamic and statistical downscaling for analysis of both climate variability and climate change. This paper introduces an application of artificial neural networks (ANNs) and multiple linear regression (MLR) by principal components to estimate rainfall in South America. This method is proposed for downscaling monthly precipitation time series over South America for three regions: the Amazon; northeastern Brazil; and the La Plata Basin, which is one of the regions of the planet that will be most affected by the climate change projected for the end of the 21st century. The downscaling models were developed and validated using CMIP5 model output and observed monthly precipitation. We used general circulation model (GCM) experiments for the 20th century (RCP historical; 1970-1999) and two scenarios (RCP 2.6 and 8.5; 2070-2100). The model test results indicate that the ANNs significantly outperform the MLR downscaling of monthly precipitation variability.

  8. Artificial neural networks and multiple linear regression model using principal components to estimate rainfall over South America

    NASA Astrophysics Data System (ADS)

    dos Santos, T. S.; Mendes, D.; Torres, R. R.

    2015-08-01

    Several studies have been devoted to dynamic and statistical downscaling for analysis of both climate variability and climate change. This paper introduces an application of artificial neural networks (ANN) and multiple linear regression (MLR) by principal components to estimate rainfall in South America. This method is proposed for downscaling monthly precipitation time series over South America for three regions: the Amazon, Northeastern Brazil and the La Plata Basin, which is one of the regions of the planet that will be most affected by the climate change projected for the end of the 21st century. The downscaling models were developed and validated using CMIP5 model output and observed monthly precipitation. We used GCM experiments for the 20th century (RCP Historical; 1970-1999) and two scenarios (RCP 2.6 and 8.5; 2070-2100). The model test results indicate that the ANN significantly outperforms the MLR downscaling of monthly precipitation variability.

  9. SOCIAL STABILITY AND HIV RISK BEHAVIOR: EVALUATING THE ROLE OF ACCUMULATED VULNERABILITY

    PubMed Central

    German, Danielle; Latkin, Carl A.

    2011-01-01

    This study evaluated a cumulative and syndromic relationship among commonly co-occurring vulnerabilities (homelessness, incarceration, low-income, residential transition) in association with HIV-related risk behaviors among 635 low-income women in Baltimore. Analysis included descriptive statistics, logistic regression, latent class analysis and latent class regression. Both methods of assessing multidimensional instability showed significant associations with risk indicators. Risk of multiple partners, sex exchange, and drug use decreased significantly with each additional domain. Higher stability class membership (77%) was associated with decreased likelihood of multiple partners, exchange partners, recent drug use, and recent STI. Multidimensional social vulnerabilities were cumulatively and synergistically linked to HIV risk behavior. Independent instability measures may miss important contextual determinants of risk. Social stability offers a useful framework to understand the synergy of social vulnerabilities that shape sexual risk behavior. Social policies and programs aiming to enhance housing and overall social stability are likely to be beneficial for HIV prevention. PMID:21259043

  10. The severity of Minamata disease declined in 25 years: temporal profile of the neurological findings analyzed by multiple logistic regression model.

    PubMed

    Uchino, Makoto; Hirano, Teruyuki; Satoh, Hiroshi; Arimura, Kimiyoshi; Nakagawa, Masanori; Wakamiya, Jyunji

    2005-01-01

    Minamata disease (MD) was caused by ingestion of seafood from the methylmercury-contaminated areas. Although 50 years have passed since the discovery of MD, there have been only a few studies on the temporal profile of neurological findings in certified MD patients. Thus, we evaluated changes in neurological symptoms and signs of MD using discriminants by multiple logistic regression analysis. The severity of predictive index declined in 25 years in most of the patients. Only a few patients showed aggravation of neurological findings, which was due to complications such as spino-cerebellar degeneration. Patients with chronic MD aged over 45 years had several concomitant diseases so that their clinical pictures were complicated. It was difficult to differentiate chronic MD using statistically established discriminants based on sensory disturbance alone. In conclusion, the severity of MD declined in 25 years along with the modification by age-related concomitant disorders.

  11. Multivariate meta-analysis for non-linear and other multi-parameter associations

    PubMed Central

    Gasparrini, A; Armstrong, B; Kenward, M G

    2012-01-01

    In this paper, we formalize the application of multivariate meta-analysis and meta-regression to synthesize estimates of multi-parameter associations obtained from different studies. This modelling approach extends the standard two-stage analysis used to combine results across different sub-groups or populations. The most straightforward application is for the meta-analysis of non-linear relationships, described for example by regression coefficients of splines or other functions, but the methodology easily generalizes to any setting where complex associations are described by multiple correlated parameters. The modelling framework of multivariate meta-analysis is implemented in the package mvmeta within the statistical environment R. As an illustrative example, we propose a two-stage analysis for investigating the non-linear exposure–response relationship between temperature and non-accidental mortality using time-series data from multiple cities. Multivariate meta-analysis represents a useful analytical tool for studying complex associations through a two-stage procedure. Copyright © 2012 John Wiley & Sons, Ltd. PMID:22807043

  12. Exact and Approximate Statistical Inference for Nonlinear Regression and the Estimating Equation Approach.

    PubMed

    Demidenko, Eugene

    2017-09-01

    The exact density distribution of the nonlinear least squares estimator in the one-parameter regression model is derived in closed form and expressed through the cumulative distribution function of the standard normal variable. Several proposals to generalize this result are discussed. The exact density is extended to the estimating equation (EE) approach and the nonlinear regression with an arbitrary number of linear parameters and one intrinsically nonlinear parameter. For a very special nonlinear regression model, the derived density coincides with the distribution of the ratio of two normally distributed random variables previously obtained by Fieller (1932), unlike other approximations previously suggested by other authors. Approximations to the density of the EE estimators are discussed in the multivariate case. Numerical complications associated with the nonlinear least squares are illustrated, such as nonexistence and/or multiple solutions, as major factors contributing to poor density approximation. The nonlinear Markov-Gauss theorem is formulated based on the near exact EE density approximation.

  13. Systematic review of statistical approaches to quantify, or correct for, measurement error in a continuous exposure in nutritional epidemiology.

    PubMed

    Bennett, Derrick A; Landry, Denise; Little, Julian; Minelli, Cosetta

    2017-09-19

    Several statistical approaches have been proposed to assess and correct for exposure measurement error. We aimed to provide a critical overview of the most common approaches used in nutritional epidemiology. MEDLINE, EMBASE, BIOSIS and CINAHL were searched for reports published in English up to May 2016 in order to ascertain studies that described methods aimed to quantify and/or correct for measurement error for a continuous exposure in nutritional epidemiology using a calibration study. We identified 126 studies, 43 of which described statistical methods and 83 that applied any of these methods to a real dataset. The statistical approaches in the eligible studies were grouped into: a) approaches to quantify the relationship between different dietary assessment instruments and "true intake", which were mostly based on correlation analysis and the method of triads; b) approaches to adjust point and interval estimates of diet-disease associations for measurement error, mostly based on regression calibration analysis and its extensions. Two approaches (multiple imputation and moment reconstruction) were identified that can deal with differential measurement error. For regression calibration, the most common approach to correct for measurement error used in nutritional epidemiology, it is crucial to ensure that its assumptions and requirements are fully met. Analyses that investigate the impact of departures from the classical measurement error model on regression calibration estimates can be helpful to researchers in interpreting their findings. With regard to the possible use of alternative methods when regression calibration is not appropriate, the choice of method should depend on the measurement error model assumed, the availability of suitable calibration study data and the potential for bias due to violation of the classical measurement error model assumptions. 
On the basis of this review, we provide some practical advice for the use of methods to assess and adjust for measurement error in nutritional epidemiology.

  14. The Multivariate Regression Statistics Strategy to Investigate Content-Effect Correlation of Multiple Components in Traditional Chinese Medicine Based on a Partial Least Squares Method.

    PubMed

    Peng, Ying; Li, Su-Ning; Pei, Xuexue; Hao, Kun

    2018-03-01

    A multivariate regression statistics strategy was developed to clarify the multi-component content-effect correlation of panax ginseng saponins extract and predict the pharmacological effect by component content. In example 1, firstly, we compared pharmacological effects between panax ginseng saponins extract and individual saponin combinations. Secondly, we examined the anti-platelet aggregation effect in seven different saponin combinations of ginsenoside Rb1, Rg1, Rh, Rd, Ra3 and notoginsenoside R1. Finally, the correlation between anti-platelet aggregation and the content of multiple components was analyzed by a partial least squares algorithm. In example 2, firstly, 18 common peaks were identified in ten different batches of panax ginseng saponins extracts from different origins. Then, we investigated the anti-myocardial ischemia reperfusion injury effects of the ten different panax ginseng saponins extracts. Finally, the correlation between the fingerprints and the cardioprotective effects was analyzed by a partial least squares algorithm. In both examples 1 and 2, the relationship between the component content and pharmacological effect was modeled well by the partial least squares regression equations. Importantly, the predicted effect curve was close to the observed data points plotted on the partial least squares regression model. This study provides evidence that multi-component content is promising information for predicting the pharmacological effects of traditional Chinese medicine.

  15. Magnitude, frequency, and trends of floods at gaged and ungaged sites in Washington, based on data through water year 2014

    USGS Publications Warehouse

    Mastin, Mark C.; Konrad, Christopher P.; Veilleux, Andrea G.; Tecca, Alison E.

    2016-09-20

    An investigation into the magnitude and frequency of floods in Washington State computed the annual exceedance probability (AEP) statistics for 648 U.S. Geological Survey unregulated streamgages in and near the borders of Washington using the recorded annual peak flows through water year 2014. This report updates a previous report published in 1998 that used annual peak flows through water year 1996. New in this report, a regional skew coefficient was developed for the Pacific Northwest region that includes areas in Oregon, Washington, Idaho and western Montana within the Columbia River drainage basin south of the United States-Canada border, the coastal areas of Oregon and western Washington, and watersheds draining into Puget Sound, Washington. The skew coefficient is an important term in the Log Pearson Type III equation used to define the distribution of the log-transformed annual peaks. The Expected Moments Algorithm was used to fit historical and censored peak-flow data to the log Pearson Type III distribution. A Multiple Grubbs-Beck test was employed to censor low outliers of annual peak flows to improve on the frequency distribution. This investigation also includes a section on observed trends in annual peak flows that showed significant trends (p-value < 0.05) in 21 of 83 long-term sites, but with small magnitude Kendall tau values suggesting a limited monotonic trend in the time series of annual peaks.
Most of the significant trends at sites in western Washington were positive, and all three sites with significant trends in eastern Washington were negative. Multivariate regression analysis with measured basin characteristics and the AEP statistics at long-term, unregulated, and un-urbanized (defined as drainage basins with less than 5 percent impervious land cover for this investigation) streamgages within Washington and some in Idaho and Oregon that are near the Washington border was used to develop equations to estimate AEP statistics at ungaged basins. Washington was divided into four regions to improve the accuracy of the regression equations; a set of equations for eight selected AEPs was constructed for each region. Selected AEP statistics included the annual peak flows that were equaled or exceeded 50, 20, 10, 4, 2, 1, 0.5, and 0.2 percent of the time, equivalent to peak flows with 2-, 5-, 10-, 25-, 50-, 100-, 200-, and 500-year recurrence intervals, respectively. Annual precipitation and drainage area were the significant basin characteristics in the regression equations for all four regression regions in Washington, and forest cover was significant for the two regression regions in eastern Washington. Average standard error of prediction for the regional regression equations ranged from 70.19 to 125.72 percent for Regression Regions 1 and 2 on the eastern side of the Cascade Mountains and from 43.22 to 58.04 percent for Regression Regions 3 and 4 on the western side of the Cascade Mountains. The pseudo coefficient of determination (where a value of 100 signifies a perfect regression model) ranged from 68.39 to 90.68 for Regression Regions 1 and 2, and from 92.35 to 95.44 for Regression Regions 3 and 4. The calculated AEP statistics for the streamgages and the regional regression equations are expected to be incorporated into StreamStats after the publication of this report. StreamStats is the interactive Web-based map tool created by the U.S. Geological Survey that allows the user to choose a streamgage and obtain published statistics, or to choose an ungaged location where the program automatically applies the regional regression equations and computes estimates of the AEP statistics.
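
    The correspondence between the eight AEPs and the recurrence intervals listed in this abstract follows from the usual reciprocal relationship, recurrence interval = 100 / AEP (in percent). The sketch below only illustrates that arithmetic; the function name is hypothetical and is not taken from the report or from StreamStats.

```python
def recurrence_interval_years(aep_percent: float) -> float:
    """Return the recurrence interval in years for an AEP given in percent."""
    if not 0 < aep_percent <= 100:
        raise ValueError("AEP must be in (0, 100] percent")
    return 100.0 / aep_percent


# The eight AEPs (in percent) named in the abstract.
AEP_PERCENT = [50, 20, 10, 4, 2, 1, 0.5, 0.2]

# Round to whole years to absorb floating-point noise in the division.
intervals = [round(recurrence_interval_years(p)) for p in AEP_PERCENT]

for aep, years in zip(AEP_PERCENT, intervals):
    print(f"AEP {aep}% -> {years}-year recurrence interval")
```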

  16. Reforming the Military Health Care System

    DTIC Science & Technology

    1988-01-01

    Population Model and its Application ," International Journal of Health Services, vol. 10, no. 4 (1980). 7. "Understanding Variations in the Use of... Financial Management (November 1986), pp. 26- 34. 21. Based on the following multiple regression equation: OP/NOR= 0.51 + 0.35x(POP/NOR)-6.84x(CIV/NORxPOP) (t...Military Beneficiary Health Care Survey 95 B Actual and Expected Admission Rates 99 C The Statistical Model of Family Use 103 D The Capitation Budgeting

  17. The interaction between stratospheric monthly mean regional winds and sporadic-E

    NASA Astrophysics Data System (ADS)

    Çetin, Kenan; Özcan, Osman; Korlaelçi, Serhat

    2017-03-01

    In the present study, a statistical investigation is carried out to explore whether there is a relationship between the critical frequency (foEs) of the sporadic-E layer that is occasionally seen on the E region of the ionosphere and the quasi-biennial oscillation (QBO) that flows in the east-west direction in the equatorial stratosphere. Multiple regression model as a statistical tool was used to determine the relationship between variables. In this model, the stationarity of the variables (foEs and QBO) was firstly analyzed for each station (Cocos Island, Gibilmanna, Niue Island, and Tahiti). Then, a co-integration test was made to determine the existence of a long-term relationship between QBO and foEs. After verifying the presence of a long-term relationship between the variables, the magnitude of the relationship between variables was further determined using the multiple regression model. As a result, it is concluded that the variations in foEs were explainable with QBO measured at 10 hPa altitude at the rate of 69%, 94%, 79%, and 58% for Cocos Island, Gibilmanna, Niue Island, and Tahiti stations, respectively. It is observed that the variations in foEs were explainable with QBO measured at 70 hPa altitude at the rate of 66%, 69%, 53%, and 47% for Cocos Island, Gibilmanna, Niue Island, and Tahiti stations, respectively.

  18. Analysis of methods to estimate spring flows in a karst aquifer

    USGS Publications Warehouse

    Sepulveda, N.

    2009-01-01

    Hydraulically and statistically based methods were analyzed to identify the most reliable method to predict spring flows in a karst aquifer. Measured water levels at nearby observation wells, measured spring pool altitudes, and the distance between observation wells and the spring pool were the parameters used to match measured spring flows. Measured spring flows at six Upper Floridan aquifer springs in central Florida were used to assess the reliability of these methods to predict spring flows. Hydraulically based methods involved the application of the Theis, Hantush-Jacob, and Darcy-Weisbach equations, whereas the statistically based methods were the multiple linear regressions and the technology of artificial neural networks (ANNs). Root mean square errors between measured and predicted spring flows using the Darcy-Weisbach method ranged between 5% and 15% of the measured flows, lower than the 7% to 27% range for the Theis or Hantush-Jacob methods. Flows at all springs were estimated to be turbulent based on the Reynolds number derived from the Darcy-Weisbach equation for conduit flow. The multiple linear regression and the Darcy-Weisbach methods had similar spring flow prediction capabilities. The ANNs provided the lowest residuals between measured and predicted spring flows, ranging from 1.6% to 5.3% of the measured flows. The model prediction efficiency criteria also indicated that the ANNs were the most accurate method predicting spring flows in a karst aquifer. © 2008 National Ground Water Association.

  19. Analysis of methods to estimate spring flows in a karst aquifer.

    PubMed

    Sepúlveda, Nicasio

    2009-01-01

    Hydraulically and statistically based methods were analyzed to identify the most reliable method to predict spring flows in a karst aquifer. Measured water levels at nearby observation wells, measured spring pool altitudes, and the distance between observation wells and the spring pool were the parameters used to match measured spring flows. Measured spring flows at six Upper Floridan aquifer springs in central Florida were used to assess the reliability of these methods to predict spring flows. Hydraulically based methods involved the application of the Theis, Hantush-Jacob, and Darcy-Weisbach equations, whereas the statistically based methods were the multiple linear regressions and the technology of artificial neural networks (ANNs). Root mean square errors between measured and predicted spring flows using the Darcy-Weisbach method ranged between 5% and 15% of the measured flows, lower than the 7% to 27% range for the Theis or Hantush-Jacob methods. Flows at all springs were estimated to be turbulent based on the Reynolds number derived from the Darcy-Weisbach equation for conduit flow. The multiple linear regression and the Darcy-Weisbach methods had similar spring flow prediction capabilities. The ANNs provided the lowest residuals between measured and predicted spring flows, ranging from 1.6% to 5.3% of the measured flows. The model prediction efficiency criteria also indicated that the ANNs were the most accurate method predicting spring flows in a karst aquifer.

  20. Nonparametric rank regression for analyzing water quality concentration data with multiple detection limits.

    PubMed

    Fu, Liya; Wang, You-Gan

    2011-02-15

    Environmental data usually include measurements, such as water quality data, which fall below detection limits, because of limitations of the instruments or of certain analytical methods used. The fact that some responses are not detected needs to be properly taken into account in statistical analysis of such data. However, it is well-known that it is challenging to analyze a data set with detection limits, and we often have to rely on the traditional parametric methods or simple imputation methods. Distributional assumptions can lead to biased inference and justification of distributions is often not possible when the data are correlated and there is a large proportion of data below detection limits. The extent of bias is usually unknown. To draw valid conclusions and hence provide useful advice for environmental management authorities, it is essential to develop and apply an appropriate statistical methodology. This paper proposes rank-based procedures for analyzing non-normally distributed data collected at different sites over a period of time in the presence of multiple detection limits. To take account of temporal correlations within each site, we propose an optimal linear combination of estimating functions and apply the induced smoothing method to reduce the computational burden. Finally, we apply the proposed method to the water quality data collected at Susquehanna River Basin in United States of America, which clearly demonstrates the advantages of the rank regression models.

  1. [Evaluation of using statistical methods in selected national medical journals].

    PubMed

    Sych, Z

    1996-01-01

    The paper covers the performed evaluation of frequency with which the statistical methods were applied in analyzed works having been published in six selected, national medical journals in the years 1988-1992. For analysis the following journals were chosen, namely: Klinika Oczna, Medycyna Pracy, Pediatria Polska, Polski Tygodnik Lekarski, Roczniki Państwowego Zakładu Higieny, Zdrowie Publiczne. Appropriate number of works up to the average in the remaining medical journals was randomly selected from respective volumes of Pol. Tyg. Lek. The studies did not include works wherein the statistical analysis was not implemented, which referred both to national and international publications. That exemption was also extended to review papers, casuistic ones, reviews of books, handbooks, monographies, reports from scientific congresses, as well as papers on historical topics. The number of works was defined in each volume. Next, analysis was performed to establish the mode of finding out a suitable sample in respective studies, differentiating two categories: random and target selections. Attention was also paid to the presence of control sample in the individual works. In the analysis attention was also focussed on the existence of sample characteristics, setting up three categories: complete, partial and lacking. In evaluating the analyzed works an effort was made to present the results of studies in tables and figures (Tab. 1, 3). Analysis was accomplished with regard to the rate of employing statistical methods in analyzed works in relevant volumes of six selected, national medical journals for the years 1988-1992, simultaneously determining the number of works, in which no statistical methods were used. Concurrently the frequency of applying the individual statistical methods was analyzed in the scrutinized works. 
Prominence was given to fundamental statistical methods in the field of descriptive statistics (measures of position, measures of dispersion) as well as the most important methods of mathematical statistics, such as parametric tests of significance, analysis of variance (in single and dual classifications), non-parametric tests of significance, correlation, and regression. Works that used multiple correlation, multiple regression, or more complex methods of studying the relationship among two or more variables were grouped with the works whose statistical methods comprised correlation and regression as well as other methods, e.g. statistical methods used in epidemiology (coefficients of incidence and morbidity, standardization of coefficients, survival tables), factor analysis conducted by Jacobi-Hotelling's method, taxonomic methods, and others. On the basis of the performed studies it was established that statistical methods were employed in 61.1-66.0% of the analyzed works in the six selected national medical journals in the years 1988-1992 (Tab. 3), a frequency generally similar to that reported for English-language medical journals. On the whole, no significant differences were found in the frequency of applied statistical methods (Tab. 4) or in the frequency of random tests (Tab. 3) in the analyzed works appearing in the medical journals in the respective years 1988-1992. The most frequently used statistical methods in the analyzed works for 1988-1992 were measures of position (44.2-55.6%), measures of dispersion (32.5-38.5%), and parametric tests of significance (26.3-33.1% of the works analyzed) (Tab. 4). To increase the frequency and reliability of the statistical methods used, teaching of biostatistics should be expanded in medical studies and in postgraduate training designed for physicians and scientific-didactic workers.

  2. The Digital Shoreline Analysis System (DSAS) Version 4.0 - An ArcGIS extension for calculating shoreline change

    USGS Publications Warehouse

    Thieler, E. Robert; Himmelstoss, Emily A.; Zichichi, Jessica L.; Ergul, Ayhan

    2009-01-01

    The Digital Shoreline Analysis System (DSAS) version 4.0 is a software extension to ESRI ArcGIS v.9.2 and above that enables a user to calculate shoreline rate-of-change statistics from multiple historic shoreline positions. A user-friendly interface of simple buttons and menus guides the user through the major steps of shoreline change analysis. Components of the extension and user guide include (1) instruction on the proper way to define a reference baseline for measurements, (2) automated and manual generation of measurement transects and metadata based on user-specified parameters, and (3) output of calculated rates of shoreline change and other statistical information. DSAS computes shoreline rates of change using four different methods: (1) endpoint rate, (2) simple linear regression, (3) weighted linear regression, and (4) least median of squares. The standard error, correlation coefficient, and confidence interval are also computed for the simple and weighted linear-regression methods. The results of all rate calculations are output to a table that can be linked to the transect file by a common attribute field. DSAS is intended to facilitate the shoreline change-calculation process and to provide rate-of-change information and the statistical data necessary to establish the reliability of the calculated results. The software is also suitable for any generic application that calculates positional change over time, such as assessing rates of change of glacier limits in sequential aerial photos, river edge boundaries, land-cover changes, and so on.
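
    Two of the four rate methods named in this abstract can be sketched in a few lines: the endpoint rate divides the distance between the oldest and most recent shoreline positions by the elapsed time, and the simple linear regression rate is the least-squares slope of position against date. The code below is an illustrative sketch only, not DSAS code; the function names and the transect data (shoreline distance from a baseline, in metres, by survey year) are hypothetical.

```python
def endpoint_rate(years, distances):
    """Net shoreline movement between the oldest and newest survey per year."""
    pairs = sorted(zip(years, distances))
    (t_old, d_old), (t_new, d_new) = pairs[0], pairs[-1]
    return (d_new - d_old) / (t_new - t_old)


def linear_regression_rate(years, distances):
    """Least-squares slope of shoreline distance regressed on survey year."""
    n = len(years)
    t_bar = sum(years) / n
    d_bar = sum(distances) / n
    s_td = sum((t - t_bar) * (d - d_bar) for t, d in zip(years, distances))
    s_tt = sum((t - t_bar) ** 2 for t in years)
    return s_td / s_tt


# Hypothetical transect: distance from the baseline (m) at four survey dates.
years = [1950, 1970, 1990, 2010]
distances = [0.0, -12.0, -18.0, -30.0]

epr = endpoint_rate(years, distances)
lrr = linear_regression_rate(years, distances)
print(f"endpoint rate: {epr:.2f} m/yr, regression rate: {lrr:.2f} m/yr")
```

    The endpoint rate uses only two shorelines, whereas the regression rate uses every survey, which is why the two values can differ for the same transect.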

  3. Reversed inverse regression for the univariate linear calibration and its statistical properties derived using a new methodology

    NASA Astrophysics Data System (ADS)

    Kang, Pilsang; Koo, Changhoi; Roh, Hokyu

    2017-11-01

    Since simple linear regression theory was established at the beginning of the 1900s, it has been used in a variety of fields. Unfortunately, it cannot be used directly for calibration. In practical calibrations, the observed measurements (the inputs) are subject to errors, and hence they vary, thus violating the assumption that the inputs are fixed. Therefore, in the case of calibration, the regression line fitted using the method of least squares is not consistent with the statistical properties of simple linear regression as already established based on this assumption. To resolve this problem, "classical regression" and "inverse regression" have been proposed. However, they do not completely resolve the problem. As a fundamental solution, we introduce "reversed inverse regression" along with a new methodology for deriving its statistical properties. In this study, the statistical properties of this regression are derived using the "error propagation rule" and the "method of simultaneous error equations" and are compared with those of the existing regression approaches. The accuracy of the statistical properties thus derived is investigated in a simulation study. We conclude that the newly proposed regression and methodology constitute the complete regression approach for univariate linear calibrations.
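
    For orientation, the two existing approaches that this abstract contrasts with its proposal can be sketched as follows: the "classical" estimator regresses the reading y on the standard x and inverts the fitted line, whereas the "inverse" estimator regresses x on y directly. This is a minimal sketch under those standard definitions; it does not implement the authors' "reversed inverse regression", and the function names and data are hypothetical.

```python
from statistics import mean


def fit_line(u, v):
    """Ordinary least squares fit v = intercept + slope * u."""
    u_bar, v_bar = mean(u), mean(v)
    s_uv = sum((ui - u_bar) * (vi - v_bar) for ui, vi in zip(u, v))
    s_uu = sum((ui - u_bar) ** 2 for ui in u)
    slope = s_uv / s_uu
    return v_bar - slope * u_bar, slope


def classical_estimate(x, y, y0):
    """Regress y on x, then invert the fitted line at the new reading y0."""
    a, b = fit_line(x, y)
    return (y0 - a) / b


def inverse_estimate(x, y, y0):
    """Regress x on y directly and predict x at the new reading y0."""
    c, d = fit_line(y, x)
    return c + d * y0


# Hypothetical calibration data: known standards x, noisy readings y.
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [3.1, 4.9, 7.2, 8.8, 11.1]
y0 = 6.0  # new reading whose underlying x is to be estimated

x_classical = classical_estimate(x, y, y0)
x_inverse = inverse_estimate(x, y, y0)
print(f"classical: {x_classical:.4f}, inverse: {x_inverse:.4f}")
```

    With noisy data the two estimates differ slightly, and the inverse estimate always lies at least as close to the mean of the standards as the classical one.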

  4. Father and adolescent son variables related to son's HIV prevention.

    PubMed

    Glenn, Betty L; Demi, Alice; Kimble, Laura P

    2008-02-01

    The purpose of this study was to examine the relationship between fathers' influences and African American male adolescents' perceptions of self-efficacy to reduce high-risk sexual behavior. A convenience sample of 70 fathers was recruited from churches in a large metropolitan area in the South. Hierarchical multiple linear regression analysis indicated father-related factors and son-related factors were associated with 26.1% of the variance in son's self-efficacy to be abstinent. In the regression model greater son's perception of the communication of sexual standards and greater father's perception of his son's self-efficacy were significantly related to greater son's self-efficacy for abstinence. The second regression model with son's self-efficacy for safer sex as the criterion was not statistically significant. Data support the need for fathers to express confidence in their sons' ability to be abstinent or practice safer sex and to communicate with their sons regarding sexual issues and standards.

  5. A statistical methodology for estimating transport parameters: Theory and applications to one-dimensional advective-dispersive systems

    USGS Publications Warehouse

    Wagner, Brian J.; Gorelick, Steven M.

    1986-01-01

    A simulation nonlinear multiple-regression methodology for estimating parameters that characterize the transport of contaminants is developed and demonstrated. Finite difference contaminant transport simulation is combined with a nonlinear weighted least squares multiple-regression procedure. The technique provides optimal parameter estimates and gives statistics for assessing the reliability of these estimates under certain general assumptions about the distributions of the random measurement errors. Monte Carlo analysis is used to estimate parameter reliability for a hypothetical homogeneous soil column for which concentration data contain large random measurement errors. The value of data collected spatially versus data collected temporally was investigated for estimation of velocity, dispersion coefficient, effective porosity, first-order decay rate, and zero-order production. The use of spatial data gave estimates that were 2–3 times more reliable than estimates based on temporal data for all parameters except velocity. Comparison of estimated linear and nonlinear confidence intervals based upon Monte Carlo analysis showed that the linear approximation is poor for dispersion coefficient and zero-order production coefficient when data are collected over time. In addition, examples demonstrate transport parameter estimation for two real one-dimensional systems. First, the longitudinal dispersivity and effective porosity of an unsaturated soil are estimated using laboratory column data. We compare the reliability of estimates based upon data from individual laboratory experiments versus estimates based upon pooled data from several experiments. Second, the simulation nonlinear regression procedure is extended to include an additional governing equation that describes delayed storage during contaminant transport. The model is applied to analyze the trends, variability, and interrelationship of parameters in a mountain stream in northern California.

  6. Impact of wearing fixed orthodontic appliances on quality of life among adolescents: Case-control study.

    PubMed

    Costa, Andréa A; Serra-Negra, Júnia M; Bendo, Cristiane B; Pordeus, Isabela A; Paiva, Saul M

    2016-01-01

    To investigate the impact of wearing a fixed orthodontic appliance on oral health-related quality of life (OHRQoL) among adolescents. A case-control study (1:2) was carried out with a population-based randomized sample of 327 adolescents aged 11 to 14 years enrolled at public and private schools in the City of Brumadinho, southeast of Brazil. The case group (n = 109) was made up of adolescents with a high negative impact on OHRQoL, and the control group (n = 218) was made up of adolescents with a low negative impact. The outcome variable was the impact on OHRQoL measured by the Brazilian version of the Child Perceptions Questionnaire (CPQ 11-14) - Impact Short Form (ISF:16). The main independent variable was wearing fixed orthodontic appliances. Malocclusion and the type of school were identified as possible confounding variables. Bivariate and multiple conditional logistic regressions were employed in the statistical analysis. A multiple conditional logistic regression model demonstrated that adolescents wearing fixed orthodontic appliances had a 4.88-fold greater chance of presenting high negative impact on OHRQoL (95% CI: 2.93-8.13; P < .001) than those who did not wear fixed orthodontic appliances. A bivariate conditional logistic regression demonstrated that malocclusion was significantly associated with OHRQoL (P = .017), whereas no statistically significant association was found between the type of school and OHRQoL (P = .108). Adolescents who wore fixed orthodontic appliances had a greater chance of reporting a negative impact on OHRQoL than those who did not wear such appliances.
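
    As a worked check on the reported interval, the sketch below recovers the standard error implied by the 95% CI on the log-odds scale. It assumes the interval is a Wald interval of the form exp(log(OR) ± 1.96 × SE); the abstract does not state how the interval was computed, so this is an assumption, and the variable names are illustrative.

```python
import math

# Values reported in the abstract above.
odds_ratio = 4.88
ci_low, ci_high = 2.93, 8.13

# Assumption: a 95% Wald interval, symmetric on the log-odds scale.
z = 1.96
log_low, log_high = math.log(ci_low), math.log(ci_high)

# Half the interval width on the log scale equals z * SE.
se_log_or = (log_high - log_low) / (2 * z)

# The geometric midpoint of the limits should sit near the point estimate.
center = math.exp((log_low + log_high) / 2)

print(f"implied SE of log(OR): {se_log_or:.3f}")
print(f"geometric midpoint of the interval: {center:.2f}")
```

    Under that assumption the geometric midpoint of the limits falls very close to the reported 4.88, which is what a log-scale symmetric interval would produce.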

  7. Survival analysis in hematologic malignancies: recommendations for clinicians

    PubMed Central

    Delgado, Julio; Pereira, Arturo; Villamor, Neus; López-Guillermo, Armando; Rozman, Ciril

    2014-01-01

    The widespread availability of statistical packages has undoubtedly helped hematologists worldwide in the analysis of their data, but has also led to the inappropriate use of statistical methods. In this article, we review some basic concepts of survival analysis and also make recommendations about how and when to perform each particular test using SPSS, Stata and R. In particular, we describe a simple way of defining cut-off points for continuous variables and the appropriate and inappropriate uses of the Kaplan-Meier method and Cox proportional hazard regression models. We also provide practical advice on how to check the proportional hazards assumption and briefly review the role of relative survival and multiple imputation. PMID:25176982

  8. A twelve-year profile of students' SAT scores, GPAs, and MCAT scores from a small university's premedical program.

    PubMed

    Montague, J R; Frei, J K

    1993-04-01

    To determine whether significant correlations existed among quantitative and qualitative predictors of students' academic success and quantitative outcomes of such success over a 12-year period in a small university's premedical program. A database was assembled from information on the 199 graduates who earned BS degrees in biology from Barry University's School of Natural and Health Sciences from 1980 through 1991. The quantitative variables were year of BS degree, total score on the Scholastic Aptitude Test (SAT), various measures of undergraduate grade-point averages (GPAs), and total score on the Medical College Admission Test (MCAT); and the qualitative variables were minority (54% of the students) or majority status and transfer (about one-third of the students) or nontransfer status. The statistical methods were multiple analysis of variance and stepwise multiple regression. Statistically significant positive correlations were found among SAT total scores, final GPAs, biology GPAs versus nonbiology GPAs, and MCAT total scores. These correlations held for transfer versus nontransfer students and for minority versus majority students. Over the 12-year period there were significant fluctuations in mean MCAT scores. The students' SAT scores and GPAs proved to be statistically reliable predictors of MCAT scores, but the minority or majority status and the transfer or nontransfer status of the students were statistically insignificant.

  9. Estimating the impact of mineral aerosols on crop yields in food insecure regions using statistical crop models

    NASA Astrophysics Data System (ADS)

    Hoffman, A.; Forest, C. E.; Kemanian, A.

    2016-12-01

    A significant number of food-insecure nations exist in regions of the world where dust plays a large role in the climate system. While the impacts of common climate variables (e.g. temperature, precipitation, ozone, and carbon dioxide) on crop yields are relatively well understood, the impact of mineral aerosols on yields has not yet been thoroughly investigated. This research aims to develop the data and tools to advance our understanding of mineral aerosol impacts on crop yields. Suspended dust affects crop yields by altering the amount and type of radiation reaching the plant and by modifying local temperature and precipitation, while dust events (i.e. dust storms) affect crop yields by depleting the soil of nutrients or by defoliation via particle abrasion. The impact of dust on yields is modeled statistically because we are uncertain which impacts will dominate the response on the national and regional scales considered in this study. Multiple linear regression is used in a number of large-scale statistical crop modeling studies to estimate yield responses to various climate variables. In alignment with previous work, we develop linear crop models, but build upon this simple method of regression with machine-learning techniques (e.g. random forests) to identify important statistical predictors and isolate how dust affects yields on the scales of interest. To perform this analysis, we develop a crop-climate dataset for maize, soybean, groundnut, sorghum, rice, and wheat for the regions of West Africa, East Africa, South Africa, and the Sahel. Random forest regression models consistently model historic crop yields better than the linear models. In several instances, the random forest models accurately capture the temperature and precipitation threshold behavior in crops. Additionally, improving agricultural technology has caused a well-documented positive trend that dominates time series of global and regional yields. 
This trend is often removed before regression with traditional crop models, but likely at the cost of removing climate information. Our random forest models consistently discover the positive trend without removing any additional data. The application of random forests as a statistical crop model provides insight into understanding the impact of dust on yields in marginal food producing regions.

  10. Chronic atrophic gastritis in association with hair mercury level.

    PubMed

    Xue, Zeyun; Xue, Huiping; Jiang, Jianlan; Lin, Bing; Zeng, Si; Huang, Xiaoyun; An, Jianfu

    2014-11-01

    The objective of this study was to explore hair mercury level in association with chronic atrophic gastritis, a precancerous stage of gastric cancer (GC), and thus provide a brand new angle of view on the timely intervention of precancerous stage of GC. We recruited 149 healthy volunteers as controls and 152 patients suffering from chronic gastritis as cases. The controls denied upper gastrointestinal discomforts, and the cases were diagnosed as chronic superficial gastritis (n=68) or chronic atrophic gastritis (n=84). We utilized Mercury Automated Analyzer (NIC MA-3000) to detect hair mercury level of both healthy controls and cases of chronic gastritis. The statistic of measurement data was expressed as mean ± standard deviation, which was analyzed using Levene variance equality test and t test. Pearson correlation analysis was employed to determine associated factors affecting hair mercury levels, and multiple stepwise regression analysis was performed to deduce regression equations. Statistical significance is considered if p value is less than 0.05. The overall hair mercury level was 0.908949 ± 0.8844490 ng/g [mean ± standard deviation (SD)] in gastritis cases and 0.460198 ± 0.2712187 ng/g (mean±SD) in healthy controls; the former level was significantly higher than the latter one (p=0.000<0.01). The hair mercury level in chronic atrophic gastritis subgroup was 1.155220 ± 0.9470246 ng/g (mean ± SD) and that in chronic superficial gastritis subgroup was 0.604732 ± 0.6942509 ng/g (mean ± SD); the former level was significantly higher than the latter level (p<0.01). The hair mercury level in chronic superficial gastritis cases was significantly higher than that in healthy controls (p<0.05). The hair mercury level in chronic atrophic gastritis cases was significantly higher than that in healthy controls (p<0.01). 
Stratified analysis indicated that the hair mercury level in healthy controls who ate seafood was significantly higher than that in healthy controls who did not (p<0.01) and that the hair mercury level in chronic atrophic gastritis cases was significantly higher than that in chronic superficial gastritis cases (p<0.01). Pearson correlation analysis indicated that eating seafood was the factor most strongly, and positively, correlated with hair mercury level in the healthy controls and that the severity of gastritis was the factor most strongly, and positively, correlated with hair mercury level in the gastritis cases. Multiple stepwise regression analysis indicated that the regression equation of hair mercury level in controls could be expressed as 0.262 multiplied by the value for eating seafood plus 0.434, a model that was statistically significant (p<0.01). Multiple stepwise regression analysis also indicated that the regression equation of hair mercury level in gastritis cases could be expressed as 0.305 multiplied by the severity of gastritis, a model that was also statistically significant (p<0.01). The graphs of regression standardized residuals for both controls and cases conformed to a normal distribution. The main positively correlated factor affecting the hair mercury level is eating seafood in healthy people, whereas the predominant positively correlated factor affecting the hair mercury level is the severity of gastritis in chronic gastritis patients. That is to say, the severity of chronic gastritis is positively correlated with the level of hair mercury. The steadily increasing level of hair mercury possibly reflects the development of gastritis from normal stomach to superficial gastritis and to atrophic gastritis. The detection of hair mercury is potentially a means to predict the severity of chronic gastritis and possibly to indicate the environmental mercury threat to human health in terms of gastritis or even carcinogenesis.

  11. No-Reference Video Quality Assessment Based on Statistical Analysis in 3D-DCT Domain.

    PubMed

    Li, Xuelong; Guo, Qun; Lu, Xiaoqiang

    2016-05-13

    Designing models for universal no-reference video quality assessment (NR-VQA) is an important task in many video processing and computer vision applications. However, most existing NR-VQA metrics are designed for specific distortion types, which are often not known in advance in practical applications. A further deficiency is that the spatial and temporal information of videos is rarely considered simultaneously. In this paper, we propose a new NR-VQA metric based on the spatiotemporal natural video statistics (NVS) in the 3D discrete cosine transform (3D-DCT) domain. In the proposed method, a set of features is first extracted based on the statistical analysis of 3D-DCT coefficients to characterize the spatiotemporal statistics of videos in different views. These features are then used to predict the perceived video quality via an efficient linear support vector regression (SVR) model. The contributions of this paper are: 1) we explore the spatiotemporal statistics of videos in the 3D-DCT domain, which has an inherent spatiotemporal encoding advantage over other widely used 2D transformations; 2) we extract a small set of simple but effective statistical features for video visual quality prediction; 3) the proposed method is universal for multiple types of distortions and robust across different databases. The proposed method is tested on four widely used video databases. Extensive experimental results demonstrate that the proposed method is competitive with the state-of-the-art NR-VQA metrics and the top-performing FR-VQA and RR-VQA metrics.

  12. Reduction of interferences in graphite furnace atomic absorption spectrometry by multiple linear regression modelling

    NASA Astrophysics Data System (ADS)

    Grotti, Marco; Abelmoschi, Maria Luisa; Soggia, Francesco; Tiberiade, Christian; Frache, Roberto

    2000-12-01

    The multivariate effects of Na, K, Mg and Ca as nitrates on the electrothermal atomisation of manganese, cadmium and iron were studied by multiple linear regression modelling. Since the models proved to efficiently predict the effects of the considered matrix elements in a wide range of concentrations, they were applied to correct the interferences occurring in the determination of trace elements in seawater after pre-concentration of the analytes. In order to obtain a statistically significant number of samples, a large volume of the certified seawater reference materials CASS-3 and NASS-3 was treated with Chelex-100 resin; then, the chelating resin was separated from the solution, divided into several sub-samples, each of them was eluted with nitric acid and analysed by electrothermal atomic absorption spectrometry (for trace element determinations) and inductively coupled plasma optical emission spectrometry (for matrix element determinations). To minimise any other systematic error besides that due to matrix effects, accuracy of the pre-concentration step and contamination levels of the procedure were checked by inductively coupled plasma mass spectrometric measurements. Analytical results obtained by applying the multiple linear regression models were compared with those obtained with other calibration methods, such as external calibration using acid-based standards, external calibration using matrix-matched standards and the analyte addition technique. Empirical models proved to efficiently reduce interferences occurring in the analysis of real samples, allowing an improvement of accuracy better than for other calibration methods.

  13. Using an innovative multiple regression procedure in a cancer population (Part 1): detecting and probing relationships of common interacting symptoms (pain, fatigue/weakness, sleep problems) as a strategy to discover influential symptom pairs and clusters

    PubMed Central

    Francoeur, Richard B

    2015-01-01

    Background: The majority of patients with advanced cancer experience symptom pairs or clusters among pain, fatigue, and insomnia. Improved methods are needed to detect and interpret interactions among symptoms or disease markers to reveal influential pairs or clusters. In prior work, I developed and validated sequential residual centering (SRC), a method that improves the sensitivity of multiple regression to detect interactions among predictors, by conditioning for multicollinearity (shared variation) among interactions and component predictors. Materials and methods: Using a hypothetical three-way interaction among pain, fatigue, and sleep to predict depressive affect, I derive and explain SRC multiple regression. Subsequently, I estimate raw and SRC multiple regressions using real data for these symptoms from 268 palliative radiation outpatients. Results: Unlike raw regression, SRC reveals that the three-way interaction (pain × fatigue/weakness × sleep problems) is statistically significant. In follow-up analyses, the relationship between pain and depressive affect is aggravated (magnified) within two partial ranges: 1) complete-to-some control over fatigue/weakness when there is complete control over sleep problems (ie, a subset of the pain–fatigue/weakness symptom pair), and 2) no control over fatigue/weakness when there is some-to-no control over sleep problems (ie, a subset of the pain–fatigue/weakness–sleep problems symptom cluster). Otherwise, the relationship weakens (buffering) as control over fatigue/weakness or sleep problems diminishes. Conclusion: By reducing the standard error, SRC unmasks a three-way interaction comprising a symptom pair and cluster. Low-to-moderate levels of the moderator variable for fatigue/weakness magnify the relationship between pain and depressive affect. However, when the comoderator variable for sleep problems accompanies fatigue/weakness, only frequent or unrelenting levels of both symptoms magnify the relationship. 
These findings suggest that a countervailing mechanism involving depressive affect could account for the effectiveness of a cognitive behavioral intervention to reduce the severity of a pain, fatigue, and sleep disturbance cluster in a previous randomized trial. PMID:25565865

  14. Using an innovative multiple regression procedure in a cancer population (Part 1): detecting and probing relationships of common interacting symptoms (pain, fatigue/weakness, sleep problems) as a strategy to discover influential symptom pairs and clusters.

    PubMed

    Francoeur, Richard B

    2015-01-01

    The majority of patients with advanced cancer experience symptom pairs or clusters among pain, fatigue, and insomnia. Improved methods are needed to detect and interpret interactions among symptoms or disease markers to reveal influential pairs or clusters. In prior work, I developed and validated sequential residual centering (SRC), a method that improves the sensitivity of multiple regression to detect interactions among predictors, by conditioning for multicollinearity (shared variation) among interactions and component predictors. Using a hypothetical three-way interaction among pain, fatigue, and sleep to predict depressive affect, I derive and explain SRC multiple regression. Subsequently, I estimate raw and SRC multiple regressions using real data for these symptoms from 268 palliative radiation outpatients. Unlike raw regression, SRC reveals that the three-way interaction (pain × fatigue/weakness × sleep problems) is statistically significant. In follow-up analyses, the relationship between pain and depressive affect is aggravated (magnified) within two partial ranges: 1) complete-to-some control over fatigue/weakness when there is complete control over sleep problems (ie, a subset of the pain-fatigue/weakness symptom pair), and 2) no control over fatigue/weakness when there is some-to-no control over sleep problems (ie, a subset of the pain-fatigue/weakness-sleep problems symptom cluster). Otherwise, the relationship weakens (buffering) as control over fatigue/weakness or sleep problems diminishes. By reducing the standard error, SRC unmasks a three-way interaction comprising a symptom pair and cluster. Low-to-moderate levels of the moderator variable for fatigue/weakness magnify the relationship between pain and depressive affect. However, when the comoderator variable for sleep problems accompanies fatigue/weakness, only frequent or unrelenting levels of both symptoms magnify the relationship. 
These findings suggest that a countervailing mechanism involving depressive affect could account for the effectiveness of a cognitive behavioral intervention to reduce the severity of a pain, fatigue, and sleep disturbance cluster in a previous randomized trial.

  15. Statistical primer: how to deal with missing data in scientific research?

    PubMed

    Papageorgiou, Grigorios; Grant, Stuart W; Takkenberg, Johanna J M; Mokhles, Mostafa M

    2018-05-10

    Missing data are a common challenge encountered in research which can compromise the results of statistical inference when not handled appropriately. This paper aims to introduce basic concepts of missing data to a non-statistical audience, list and compare some of the most popular approaches for handling missing data in practice and provide guidelines and recommendations for dealing with and reporting missing data in scientific research. Complete case analysis and single imputation are simple approaches for handling missing data and are popular in practice, however, in most cases they are not guaranteed to provide valid inferences. Multiple imputation is a robust and general alternative which is appropriate for data missing at random, surpassing the disadvantages of the simpler approaches, but should always be conducted with care. The aforementioned approaches are illustrated and compared in an example application using Cox regression.
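
    The pooling step of multiple imputation can be sketched as follows. This is a minimal illustration of Rubin's rules, not the authors' analysis: the function name `pool_rubin` and the example numbers are hypothetical, and the per-imputation estimates are assumed to be something like log hazard ratios from a Cox model fitted separately to each imputed dataset.

```python
from statistics import mean, variance


def pool_rubin(estimates, variances):
    """Pool per-imputation estimates with Rubin's rules.

    estimates: one point estimate per imputed dataset
    variances: the squared standard error of each estimate
    Returns (pooled estimate, total variance).
    """
    m = len(estimates)
    q_bar = mean(estimates)          # pooled point estimate
    within = mean(variances)         # average within-imputation variance
    between = variance(estimates)    # between-imputation variance (divides by m - 1)
    total = within + (1 + 1 / m) * between
    return q_bar, total


# Hypothetical estimates from m = 3 imputed datasets.
pooled, total_var = pool_rubin([1.0, 1.2, 1.4], [0.04, 0.04, 0.04])
print(pooled, total_var ** 0.5)  # pooled estimate and its standard error
```

    The between-imputation term is what single imputation leaves out: it inflates the standard error to reflect uncertainty about the missing values themselves.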

  16. Analyzing the effect of selected control policy measures and sociodemographic factors on alcoholic beverage consumption in Europe within the AMPHORA project: statistical methods.

    PubMed

    Baccini, Michela; Carreras, Giulia

    2014-10-01

    This paper describes the methods used to investigate variations in total alcoholic beverage consumption as related to selected control intervention policies and other socioeconomic factors (unplanned factors) within 12 European countries involved in the AMPHORA project. The analysis presented several critical points: presence of missing values, strong correlation among the unplanned factors, and long-term waves or trends in both the time series of alcohol consumption and the time series of the main explanatory variables. These difficulties were addressed by implementing a multiple imputation procedure for filling in missing values, then specifying for each country a multiple regression model which accounted for time trend, policy measures and a limited set of unplanned factors, selected in advance on the basis of sociological and statistical considerations. This approach allowed estimating the "net" effect of the selected control policies on alcohol consumption, but not the association between each unplanned factor and the outcome.

  17. The Impact of Fire on Active Layer Thickness

    NASA Astrophysics Data System (ADS)

    Schaefer, K. M.; Parsekian, A.; Natali, S.; Ludwig, S.; Michaelides, R. J.; Zebker, H. A.; Chen, J.

    2016-12-01

    Fire influences permafrost thermodynamics by darkening the surface to increase solar absorption and removing insulating moss and organic soil, resulting in an increase in Active Layer Thickness (ALT). The summer of 2015 was one of the worst fire years on record in Alaska with multiple fires in the Yukon-Kuskokwim (YK) Delta. To understand the impacts of fire on permafrost, we need large-scale, extensive measurements of ALT both within and outside the fire zones. In August 2016, we surveyed ALT across multiple fire zones in the YK Delta using Ground Penetrating Radar (GPR) and mechanical probing. GPR uses pulsed, radio-frequency electromagnetic waves to noninvasively image the subsurface and is an effective tool to quickly map ALT over large areas. We supplemented this ALT data with measurements of Volumetric Water Content (VWC), Organic Layer Thickness (OLT), and burn severity. We quantified the impacts of fire by statistically comparing the measurements inside and outside the fire zones and statistically regressing ALT against VWC, change in OLT, and burn severity.

  18. Determinants of health care expenditures and the contribution of associated factors: 16 cities and provinces in Korea, 2003-2010.

    PubMed

    Han, Kimyoung; Cho, Minho; Chun, Kihong

    2013-11-01

    The purpose of this study was to classify determinants of cost increases into two categories, negotiable factors and non-negotiable factors, in order to identify the determinants of health care expenditure increases and to clarify the contribution of associated factors selected based on a literature review. The data in this analysis were drawn from the statistical yearbooks of the National Health Insurance Service, the Economic Index from Statistics Korea and regional statistical yearbooks. The unit of analysis was the annual growth rate of variables of 16 cities and provinces from 2003 to 2010. First, multiple regression was used to identify the determinants of health care expenditures. We then used hierarchical multiple regression to calculate the contribution of associated factors. The changes in the coefficient of determination (R²) as predictors were entered step by step, in an order based on the empirical evidence of the investigator, could explain the contribution of predictors to increased medical cost. Health spending was mainly associated with the proportion of the elderly population, but the Medicare Economic Index (MEI) showed an inverse association. The contribution of predictors was as follows: the proportion of elderly in the population (22.4%), gross domestic product (GDP) per capita (4.5%), MEI (-12%), and other predictors (less than 1%). As Baby Boomers enter retirement, an increasing proportion of the population aged 65 and over and the GDP will continue to increase, thus accelerating the inflation of health care expenditures and precipitating a crisis in the health insurance system. Policy makers should consider providing comprehensive health services by an accountable care organization to achieve cost savings while ensuring high-quality care.
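
    The step-by-step R² comparison used in hierarchical multiple regression can be sketched as below. This is an illustrative sketch only: the helper names (`solve`, `r_squared`), the variable names and the eight data points are all hypothetical and are not taken from the study. It fits ordinary least squares through the normal equations and reports how much R² rises when a second predictor is entered.

```python
def solve(a, b):
    """Solve the linear system a x = b by Gauss-Jordan elimination."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]  # augmented matrix
    for col in range(n):
        # Partial pivoting: bring the largest entry in this column to the diagonal.
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(n):
            if r != col:
                f = m[r][col] / m[col][col]
                m[r] = [v - f * p for v, p in zip(m[r], m[col])]
    return [m[i][n] / m[i][i] for i in range(n)]


def r_squared(columns, y):
    """R^2 of an OLS fit of y on the given predictor columns plus an intercept."""
    n = len(y)
    X = [[1.0] + [c[i] for c in columns] for i in range(n)]
    k = len(X[0])
    xtx = [[sum(X[i][p] * X[i][q] for i in range(n)) for q in range(k)]
           for p in range(k)]
    xty = [sum(X[i][p] * y[i] for i in range(n)) for p in range(k)]
    beta = solve(xtx, xty)
    fitted = [sum(b * x for b, x in zip(beta, row)) for row in X]
    mean_y = sum(y) / n
    ss_res = sum((yi - fi) ** 2 for yi, fi in zip(y, fitted))
    ss_tot = sum((yi - mean_y) ** 2 for yi in y)
    return 1.0 - ss_res / ss_tot


# Hypothetical annual growth rates (not from the study).
elderly = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0]
gdp = [2.0, 1.0, 4.0, 3.0, 6.0, 5.0, 8.0, 7.0]
spending = [2.1, 2.9, 4.2, 4.8, 6.1, 7.2, 7.9, 9.1]

step1 = r_squared([elderly], spending)        # step 1: elderly share only
step2 = r_squared([elderly, gdp], spending)   # step 2: add GDP per capita
print("R2 step 1:", step1)
print("R2 change when GDP is added:", step2 - step1)
```

    The R² change attributed to a predictor depends on the order of entry, which is why the entry order has to be justified in advance.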

  19. Levels of soluble TREM-1 in children with newly diagnosed type 1 diabetes and their siblings without type 1 diabetes: a Danish case-control study.

    PubMed

    Thorsen, Steffen U; Pipper, Christian B; Mortensen, Henrik B; Skogstrand, Kristin; Pociot, Flemming; Johannesen, Jesper; Svensson, Jannet

    2017-12-01

    Type 1 diabetes (T1D) is an organ-specific autoimmune disease with an increase in incidence worldwide including Denmark. The triggering receptor expressed on myeloid cells-1 (TREM-1) is a potent amplifier of pro-inflammatory responses and has been linked to autoimmunity, severe psychiatric disorders, sepsis, and cancer. Our primary hypothesis was that levels of soluble TREM-1 (sTREM-1) differed between newly diagnosed children with T1D and their siblings without T1D. Since 1996, the Danish Childhood Diabetes Register has collected data on all patients who have developed T1D before the age of 18 years. Four hundred and eighty-one patients and 478 siblings with measurements of sTREM-1 (blood samples were taken within 3 months after onset) were available for statistical analyses. The sampling period was from 1997 through 2005. A robust log-normal regression model was used, which takes into account that measurements are left censored and accounts for correlation within siblings from the same family. In the multiple regression model (case status, gender, age, HLA-risk, season, and period of sampling), levels of sTREM-1 were found to be significantly higher in patients (relative change [95% CI], 1.5 [1.1; 2.2], P = 0.02), but after adjustment for multiple testing our result was no longer statistically significant (adjusted P = 0.1). We observed a statistically significant temporal increase in levels of sTREM-1. Our results need to be replicated by independent studies, but our study suggests that the TREM-1 pathway may have a role in T1D pathogenesis. © 2016 John Wiley & Sons A/S. Published by John Wiley & Sons Ltd.

  20. [A comparison of the associations of dynapenia and sarcopenia with fear of falling in elderly diabetic patients].

    PubMed

    Ida, Satoshi; Murata, Kazuya; Ishihara, Yuki; Imataka, Kanako; Kaneko, Ryutaro; Fujiwara, Ryoko; Takahashi, Hiroka

    2017-01-01

    To comparatively investigate whether dynapenia and sarcopenia, as defined by the Asian Working Group for Sarcopenia (AWGS), are associated with fear of falling in elderly patients with diabetes. The subjects were outpatients with diabetes who were at least 65 years of age when they visited our hospital. Sarcopenia was evaluated based on the AWGS definition. The cutoff values for the appendicular skeletal mass index (multi-frequency bioelectrical impedance method), grip strength, and walking speed were, respectively, 7.0 kg/m² for men and 5.7 kg/m² for women, 26 kg for men and 18 kg for women, and ≤0.8 m/s for both men and women. Those with grip strength of less than or equal to the cutoff value were considered to have dynapenia. Fear of falling was assessed by a self-administered questionnaire survey with the Fall Efficacy Scale (FES) Japanese version. A multiple regression analysis was conducted using the FES score as a dependent variable and dynapenia or sarcopenia and moderators as explanatory variables. A total of 202 patients (male, n=127; female, n=75) were analyzed in this study. The FES scores of the patients with and without sarcopenia did not differ to a statistically significant extent in either male or female patients. The multiple regression analysis revealed a statistically significant association between dynapenia and the FES score in men (P=0.028). In elderly outpatients with diabetes, no association was found between sarcopenia and the fear of falling in either men or women. In contrast, a statistically significant association was found between dynapenia and fear of falling in men. This suggests the importance of paying attention to the fear of falling when examining elderly male diabetes patients with dynapenia.

  1. Substituting values for censored data from Texas, USA, reservoirs inflated and obscured trends in analyses commonly used for water quality target development.

    PubMed

    Grantz, Erin; Haggard, Brian; Scott, J Thad

    2018-06-12

    We calculated four median datasets (chlorophyll a, Chl a; total phosphorus, TP; and transparency) using multiple approaches to handling censored observations, including substituting fractions of the quantification limit (QL; dataset 1 = 1QL, dataset 2 = 0.5QL) and statistical methods for censored datasets (datasets 3-4) for approximately 100 Texas, USA reservoirs. Trend analyses of differences between dataset 1 and 3 medians indicated percent difference increased linearly above thresholds in percent censored data (%Cen). This relationship was extrapolated to estimate medians for site-parameter combinations with %Cen > 80%, which were combined with dataset 3 as dataset 4. Changepoint analysis of Chl a- and transparency-TP relationships indicated threshold differences up to 50% between datasets. Recursive analysis identified secondary thresholds in dataset 4. Threshold differences show that information introduced via substitution or missing due to limitations of statistical methods biased values, underestimated error, and inflated the strength of TP thresholds identified in datasets 1-3. Analysis of covariance identified differences in linear regression models relating transparency-TP between datasets 1, 2, and the more statistically robust datasets 3-4. Study findings identify high-risk scenarios for biased analytical outcomes when using substitution. These include high probability of median overestimation when %Cen > 50-60% for a single QL, or when %Cen is as low as 16% for multiple QLs. Changepoint analysis was uniquely vulnerable to substitution effects when using medians from sites with %Cen > 50%. Linear regression analysis was less sensitive to substitution and missing data effects, but differences in model parameters for transparency cannot be discounted and could be magnified by log-transformation of the variables.
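
    Why substitution matters once more than half of the observations are censored can be seen in a small sketch. The function name `substituted_median`, the quantification limit and the sample values are hypothetical; censored observations are represented as `None`, and only the two substitution rules named in the abstract (1QL and 0.5QL) are shown, not the censored-data statistical methods.

```python
from statistics import median


def substituted_median(values, ql, fraction):
    """Median after replacing censored observations (None) with fraction * QL."""
    return median(fraction * ql if v is None else v for v in values)


QL = 2.0  # hypothetical quantification limit

# 60% censored: the median falls on a substituted value.
heavy = [None, None, None, 2.5, 4.0]
# 20% censored: the median falls on a measured value.
light = [None, 2.5, 3.0, 4.0, 5.0]

for name, sample in (("60% censored", heavy), ("20% censored", light)):
    m_full = substituted_median(sample, QL, 1.0)   # dataset 1 rule: 1QL
    m_half = substituted_median(sample, QL, 0.5)   # dataset 2 rule: 0.5QL
    print(name, m_full, m_half)
```

    With a single QL the median only depends on the substitution rule when censored values reach the middle of the sorted sample; with several QLs the ordering itself can change at much lower censoring rates.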

  2. A Comparison of Variable Selection Criteria for Multiple Linear Regression: A Second Simulation Study

    DTIC Science & Technology

    1993-03-01

    statistical mathematics, began in the late 1800's when Sir Francis Galton first attempted to use practical mathematical techniques to investigate the...randomly collected (sampled) many pairs of parent/child height measurements (data), Galton observed that for a given parent-height average, the...ty only Maximum Adjusted R² will be discussed. However, Maximum Adjusted R² and Minimum MSE test exactly the same thing. Adjusted R² is related to R

  3. Compositional data analysis for physical activity, sedentary time and sleep research.

    PubMed

    Dumuid, Dorothea; Stanford, Tyman E; Martin-Fernández, Josep-Antoni; Pedišić, Željko; Maher, Carol A; Lewis, Lucy K; Hron, Karel; Katzmarzyk, Peter T; Chaput, Jean-Philippe; Fogelholm, Mikael; Hu, Gang; Lambert, Estelle V; Maia, José; Sarmiento, Olga L; Standage, Martyn; Barreira, Tiago V; Broyles, Stephanie T; Tudor-Locke, Catrine; Tremblay, Mark S; Olds, Timothy

    2017-01-01

    The health effects of daily activity behaviours (physical activity, sedentary time and sleep) are widely studied. While previous research has largely examined activity behaviours in isolation, recent studies have adjusted for multiple behaviours. However, the inclusion of all activity behaviours in traditional multivariate analyses has not been possible due to the perfect multicollinearity of 24-h time budget data. The ensuing lack of adjustment for known effects on the outcome undermines the validity of study findings. We describe a statistical approach that enables the inclusion of all daily activity behaviours, based on the principles of compositional data analysis. Using data from the International Study of Childhood Obesity, Lifestyle and the Environment, we demonstrate the application of compositional multiple linear regression to estimate adiposity from children's daily activity behaviours expressed as isometric log-ratio coordinates. We present a novel method for predicting change in a continuous outcome based on relative changes within a composition, and for calculating associated confidence intervals to allow for statistical inference. The compositional data analysis presented overcomes the lack of adjustment that has plagued traditional statistical methods in the field, and provides robust and reliable insights into the health effects of daily activity behaviours.
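
    The isometric log-ratio step can be sketched as follows. This is a minimal illustration under stated assumptions: it uses pivot coordinates, which are one common ilr basis and not necessarily the basis chosen by the authors, and the function name `ilr_pivot` and the 24-h time budget are hypothetical.

```python
import math


def ilr_pivot(parts):
    """Isometric log-ratio (pivot) coordinates of a composition.

    parts: strictly positive amounts, e.g. minutes per day in each behaviour.
    Returns len(parts) - 1 coordinates that are free of the sum constraint.
    """
    d = len(parts)
    logs = [math.log(p) for p in parts]
    coords = []
    for i in range(d - 1):
        rest = logs[i + 1:]
        r = len(rest)
        mean_rest = sum(rest) / r  # log of the geometric mean of the remaining parts
        coords.append(math.sqrt(r / (r + 1)) * (logs[i] - mean_rest))
    return coords


# Hypothetical day: sleep, sedentary time, physical activity (minutes, sum = 1440).
day = [540.0, 660.0, 240.0]
print(ilr_pivot(day))
```

    Three behaviours that always sum to 24 h are perfectly collinear as raw predictors, whereas the two coordinates returned here can enter a multiple linear regression together. The coordinates depend only on ratios between parts, so minutes and hours give the same result.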

  4. Reporting Practices and Use of Quantitative Methods in Canadian Journal Articles in Psychology.

    PubMed

    Counsell, Alyssa; Harlow, Lisa L

    2017-05-01

    With recent focus on the state of research in psychology, it is essential to assess the nature of the statistical methods and analyses used and reported by psychological researchers. To that end, we investigated the prevalence of different statistical procedures and the nature of statistical reporting practices in recent articles from the four major Canadian psychology journals. The majority of authors evaluated their research hypotheses through the use of analysis of variance (ANOVA), t-tests, and multiple regression. Multivariate approaches were less common. Null hypothesis significance testing remains a popular strategy, but the majority of authors reported a standardized or unstandardized effect size measure alongside their significance test results. Confidence intervals on effect sizes were infrequently employed. Many authors provided minimal details about their statistical analyses and less than a third of the articles reported on data complications such as missing data and violations of statistical assumptions. Strengths of and areas needing improvement for reporting quantitative results are highlighted. The paper concludes with recommendations for how researchers and reviewers can improve comprehension and transparency in statistical reporting.

  5. Risk factors for retinal breaks in patients with symptom of floaters.

    PubMed

    Singalavanija, Apichart; Amornrattanapan, Chutiwan; Nitiruangjarus, Kanjanee; Tongsai, Sasima

    2010-06-01

    To identify the risk factors of retinal breaks in patients with the symptom of floaters, and to determine the association between those risk factors and retinal breaks. A retrospective analytic study of 184 patients (55 males and 129 females) that included 220 eyes was conducted. Patient information such as age, symptoms (multiple floaters, flashing), duration of symptom, refractive error, history of cataract surgery, family history of retinal detachment, and complete eye examination were recorded. The patients were divided into two groups, the first group (control group) had symptoms of floaters and no retinal breaks, the second group (retinal breaks group) had symptoms of floaters with retinal breaks. Chi-square test, and the multiple logistic regression were used for statistical analysis. Two hundred twenty eyes, 175 eyes of the control group and 45 eyes of the retinal breaks group were examined and included in this study. The multiple logistic regression analysis revealed that patients with multiple floaters, and floaters and flashing increased the risk of retinal breaks to 5.8 and 4.3 times, respectively, when compared to patients with single floater or floaters alone. Lattice degeneration increased the risk of retinal breaks to 5.9 times when compared to eyes that did not have lattice degeneration. Multiple floaters, flashing and lattice degeneration are risk factors of retinal breaks in patients with symptoms of floaters. Therefore, it is important for the ophthalmologists to be aware of these risk factors and the patients at risk should have follow-up examinations.

  6. Investigation of marital satisfaction and its relationship with job stress and general health of nurses in Qazvin, Iran.

    PubMed

    Azimian, Jalil; Piran, Pegah; Jahanihashemi, Hassan; Dehghankar, Leila

    2017-04-01

    Pressures in nursing can affect family life and marital problems, disrupt common social problems, increase work-family conflicts and endanger people's general health. To determine marital satisfaction and its relationship with job stress and general health of nurses. This descriptive and cross-sectional study was done in 2015 in medical educational centers of Qazvin by using an ENRICH marital satisfaction scale and General Health and Job Stress questionnaires completed by 123 nurses. Analysis was done by SPSS version 19 using descriptive and analytical statistics (Pearson correlation, t-test, ANOVA, Chi-square, regression line, multiple regression analysis). The findings showed that 64.4% of nurses had marital satisfaction. There was a significant relationship between age (p=0.03), job experience (p=0.01), age of spouse (p=0.01) and marital satisfaction. The results showed that there was a significant relationship between marital satisfaction and general health (p<0.0001). Multiple regression analysis showed that there was a significant relationship between depression (p=0.012) and anxiety (p=0.001) with marital satisfaction. Given the high levels of job stress, impaired general health and low marital satisfaction among nurses, running health promotion programs and paying attention to these dimensions can help the work and family health of nurses.

  7. Confidence intervals for distinguishing ordinal and disordinal interactions in multiple regression.

    PubMed

    Lee, Sunbok; Lei, Man-Kit; Brody, Gene H

    2015-06-01

    Distinguishing between ordinal and disordinal interaction in multiple regression is useful in testing many interesting theoretical hypotheses. Because the distinction is made based on the location of a crossover point of 2 simple regression lines, confidence intervals of the crossover point can be used to distinguish ordinal and disordinal interactions. This study examined 2 factors that need to be considered in constructing confidence intervals of the crossover point: (a) the assumption about the sampling distribution of the crossover point, and (b) the possibility of abnormally wide confidence intervals for the crossover point. A Monte Carlo simulation study was conducted to compare 6 different methods for constructing confidence intervals of the crossover point in terms of the coverage rate, the proportion of true values that fall to the left or right of the confidence intervals, and the average width of the confidence intervals. The methods include the reparameterization, delta, Fieller, basic bootstrap, percentile bootstrap, and bias-corrected accelerated bootstrap methods. The results of our Monte Carlo simulation study suggest that statistical inference using confidence intervals to distinguish ordinal and disordinal interaction requires sample sizes of more than 500 to provide sufficiently narrow confidence intervals to identify the location of the crossover point. (c) 2015 APA, all rights reserved.
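
    The point estimate behind these confidence intervals can be sketched in a few lines. The sketch assumes the usual two-predictor interaction model y = b0 + b_x*x + b_z*z + b_xz*x*z; the function names, coefficient values and x range are hypothetical, and none of the six interval methods compared in the study is implemented here.

```python
def crossover_point(b_z, b_xz):
    """x value at which the simple regression lines of y on x intersect.

    Assumed model: y = b0 + b_x*x + b_z*z + b_xz*x*z.
    Lines for any two values of z differ by (b_z + b_xz*x) times the gap in z,
    so they meet where b_z + b_xz*x = 0.
    """
    if b_xz == 0:
        raise ValueError("no interaction term: the lines are parallel")
    return -b_z / b_xz


def classify_interaction(b_z, b_xz, x_min, x_max):
    """'disordinal' if the lines cross inside [x_min, x_max], else 'ordinal'."""
    c = crossover_point(b_z, b_xz)
    return "disordinal" if x_min <= c <= x_max else "ordinal"


# Hypothetical coefficients and observed range of x.
print(crossover_point(2.0, -0.5))
print(classify_interaction(2.0, -0.5, 0.0, 10.0))
print(classify_interaction(2.0, -0.5, 5.0, 10.0))
```

    Because the crossover is a ratio of two estimated coefficients, its sampling distribution can be heavy-tailed when b_xz is close to zero, which is consistent with the abnormally wide intervals the abstract warns about.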

  8. A Technique of Fuzzy C-Mean in Multiple Linear Regression Model toward Paddy Yield

    NASA Astrophysics Data System (ADS)

    Syazwan Wahab, Nur; Saifullah Rusiman, Mohd; Mohamad, Mahathir; Amira Azmi, Nur; Che Him, Norziha; Ghazali Kamardan, M.; Ali, Maselan

    2018-04-01

    In this paper, we propose a hybrid model which is a combination of a multiple linear regression model and the fuzzy c-means method. This research involved a relationship between 20 variates of the top soil that are analyzed prior to planting of paddy yields at standard fertilizer rates. Data used were from the multi-location trials for rice carried out by MARDI at major paddy granaries in Peninsular Malaysia during the period from 2009 to 2012. Missing observations were estimated using mean estimation techniques. The data were analyzed using a multiple linear regression model and a combination of the multiple linear regression model and the fuzzy c-means method. Analysis of normality and multicollinearity indicates that the data are normally distributed without multicollinearity among independent variables. Fuzzy c-means analysis clusters the yield of paddy into two clusters before the multiple linear regression model is applied. The comparison between the two methods indicates that the hybrid of the multiple linear regression model and the fuzzy c-means method outperforms the multiple linear regression model, with a lower mean square error.

  9. A consistent framework for Horton regression statistics that leads to a modified Hack's law

    USGS Publications Warehouse

    Furey, P.R.; Troutman, B.M.

    2008-01-01

    A statistical framework is introduced that resolves important problems with the interpretation and use of traditional Horton regression statistics. The framework is based on a univariate regression model that leads to an alternative expression for Horton ratio, connects Horton regression statistics to distributional simple scaling, and improves the accuracy in estimating Horton plot parameters. The model is used to examine data for drainage area A and mainstream length L from two groups of basins located in different physiographic settings. Results show that confidence intervals for the Horton plot regression statistics are quite wide. Nonetheless, an analysis of covariance shows that regression intercepts, but not regression slopes, can be used to distinguish between basin groups. The univariate model is generalized to include n > 1 dependent variables. For the case where the dependent variables represent ln A and ln L, the generalized model performs somewhat better at distinguishing between basin groups than two separate univariate models. The generalized model leads to a modification of Hack's law where L depends on both A and Strahler order ω. Data show that ω plays a statistically significant role in the modified Hack's law expression. © 2008 Elsevier B.V.

  10. On the interannual oscillations in the northern temperate total ozone

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Krzyscin, J.W.

    1994-07-01

    The interannual variations in total ozone are studied using revised Dobson total ozone records (1961-1990) from 17 stations located within the latitude band 30 deg N - 60 deg N. To obtain the quasi-biennial oscillation (QBO), El Nino-Southern Oscillation (ENSO), and 11-year solar cycle manifestation in the "northern temperate" total ozone data, various multiple regression models are constructed by the least squares fitting to the observed ozone. The statistical relationships between the selected indices of the atmospheric variabilities and total ozone are described in the linear and nonlinear regression models. Nonlinear relationships to the predictor variables are found. That is, the total ozone variations are statistically modeled by nonlinear terms accounting for the coupling between QBO and ENSO, QBO and solar activity, and ENSO and solar activity. It is suggested that large reduction of total ozone values over the "northern temperate" region occurs in cold season when a strong ENSO warm event meets the west phase of the QBO during the period of high solar activity.

  11. Peak-flow characteristics of Wyoming streams

    USGS Publications Warehouse

    Miller, Kirk A.

    2003-01-01

    Peak-flow characteristics for unregulated streams in Wyoming are described in this report. Frequency relations for annual peak flows through water year 2000 at 364 streamflow-gaging stations in and near Wyoming were evaluated and revised or updated as needed. Analyses of historical floods, temporal trends, and generalized skew were included in the evaluation. Physical and climatic basin characteristics were determined for each gaging station using a geographic information system. Gaging stations with similar peak-flow and basin characteristics were grouped into six hydrologic regions. Regional statistical relations between peak-flow and basin characteristics were explored using multiple-regression techniques. Generalized least squares regression equations for estimating magnitudes of annual peak flows with selected recurrence intervals from 1.5 to 500 years were developed for each region. Average standard errors of estimate range from 34 to 131 percent. Average standard errors of prediction range from 35 to 135 percent. Several statistics for evaluating and comparing the errors in these estimates are described. Limitations of the equations are described. Methods for applying the regional equations for various circumstances are listed and examples are given.

  12. A quantitative study of factors influencing quality of life in rural Mexican women diagnosed with HIV.

    PubMed

    Holtz, Carol; Sowell, Richard; VanBrackle, Lewis; Velasquez, Gabriela; Hernandez-Alonso, Virginia

    2014-01-01

    This quantitative study explored the level of Quality of Life (QoL) in indigenous Mexican women and identified psychosocial factors that significantly influenced their QoL, using face-to-face interviews with 101 women accessing care in an HIV clinic in Oaxaca, Mexico. Variables included demographic characteristics, levels of depression, coping style, family functioning, HIV-related beliefs, and QoL. Descriptive statistics were used to analyze participant characteristics, and women's scores on data collection instruments. Pearson's R correlational statistics were used to determine the level of significance between study variables. Multiple regression analysis examined all variables that were significantly related to QoL. Pearson's correlational analysis of relationships between Spirituality, Educating Self about HIV, Family Functioning, Emotional Support, Physical Care, and Staying Positive demonstrated positive correlation to QoL. Stigma, depression, and avoidance coping were significantly and negatively associated with QoL. The final regression model indicated that depression and avoidance coping were the best predictor variables for QoL. Copyright © 2014 Association of Nurses in AIDS Care. Published by Elsevier Inc. All rights reserved.

  13. Applications of statistics to medical science, III. Correlation and regression.

    PubMed

    Watanabe, Hiroshi

    2012-01-01

    In this third part of a series surveying medical statistics, the concepts of correlation and regression are reviewed. In particular, methods of linear regression and logistic regression are discussed. Arguments related to survival analysis will be made in a subsequent paper.

  14. QSAR study of curcumine derivatives as HIV-1 integrase inhibitors.

    PubMed

    Gupta, Pawan; Sharma, Anju; Garg, Prabha; Roy, Nilanjan

    2013-03-01

    A QSAR study was performed on curcumine derivatives as HIV-1 integrase inhibitors using multiple linear regression. The statistically significant model was developed with squared correlation coefficients (r(2)) 0.891 and cross validated r(2) (r(2) cv) 0.825. The developed model revealed that electronic, shape, size, geometry, substitution's information and hydrophilicity were important atomic properties for determining the inhibitory activity of these molecules. The model was also tested successfully for external validation (r(2) pred = 0.849) as well as Tropsha's test for model predictability. Furthermore, the domain analysis was carried out to evaluate the prediction reliability of external set molecules. The model was statistically robust and had good predictive power which can be successfully utilized for screening of new molecules.

  15. Approach to addressing missing data for electronic medical records and pharmacy claims data research.

    PubMed

    Bounthavong, Mark; Watanabe, Jonathan H; Sullivan, Kevin M

    2015-04-01

    The complete capture of all values for each variable of interest in pharmacy research studies remains aspirational. The absence of these possibly influential values is a common problem for pharmacist investigators. Failure to account for missing data may translate to biased study findings and conclusions. Our goal in this analysis was to apply validated statistical methods for missing data to a previously analyzed data set and compare results when missing data methods were implemented versus standard analytics that ignore missing data effects. Using data from a retrospective cohort study, the statistical method of multiple imputation was used to provide regression-based estimates of the missing values to improve available data usable for study outcomes measurement. These findings were then contrasted with a complete-case analysis that restricted estimation to subjects in the cohort that had no missing values. Odds ratios were compared to assess differences in findings of the analyses. A nonadjusted regression analysis ("crude analysis") was also performed as a reference for potential bias. Veterans Integrated Systems Network that includes VA facilities in the Southern California and Nevada regions. New statin users between November 30, 2006, and December 2, 2007, with a diagnosis of dyslipidemia. We compared the odds ratios (ORs) and 95% confidence intervals (CIs) for the crude, complete-case, and multiple imputation analyses for the end points of a 25% or greater reduction in atherogenic lipids. Data were missing for 21.5% of identified patients (1665 subjects of 7739). Regression model results were similar for the crude, complete-case, and multiple imputation analyses with overlap of 95% confidence limits at each end point. The crude, complete-case, and multiple imputation ORs (95% CIs) for a 25% or greater reduction in low-density lipoprotein cholesterol were 3.5 (95% CI 3.1-3.9), 4.3 (95% CI 3.8-4.9), and 4.1 (95% CI 3.7-4.6), respectively. 
The crude, complete-case, and multiple imputation ORs (95% CIs) for a 25% or greater reduction in non-high-density lipoprotein cholesterol were 3.5 (95% CI 3.1-3.9), 4.5 (95% CI 4.0-5.2), and 4.4 (95% CI 3.9-4.9), respectively. The crude, complete-case, and multiple imputation ORs (95% CIs) for 25% or greater reduction in TGs were 3.1 (95% CI 2.8-3.6), 4.0 (95% CI 3.5-4.6), and 4.1 (95% CI 3.6-4.6), respectively. The use of the multiple imputation method to account for missing data did not alter conclusions based on a complete-case analysis. Given the frequency of missing data in research using electronic health records and pharmacy claims data, multiple imputation may play an important role in the validation of study findings. © 2015 Pharmacotherapy Publications, Inc.

  16. Quantifying the impact of between-study heterogeneity in multivariate meta-analyses

    PubMed Central

    Jackson, Dan; White, Ian R; Riley, Richard D

    2012-01-01

    Measures that quantify the impact of heterogeneity in univariate meta-analysis, including the very popular I2 statistic, are now well established. Multivariate meta-analysis, where studies provide multiple outcomes that are pooled in a single analysis, is also becoming more commonly used. The question of how to quantify heterogeneity in the multivariate setting is therefore raised. It is the univariate R2 statistic, the ratio of the variance of the estimated treatment effect under the random and fixed effects models, that generalises most naturally, so this statistic provides our basis. This statistic is then used to derive a multivariate analogue of I2, which we call I2_R. We also provide a multivariate H2 statistic, the ratio of a generalisation of Cochran's heterogeneity statistic and its associated degrees of freedom, with an accompanying generalisation of the usual I2 statistic, I2_H. Our proposed heterogeneity statistics can be used alongside all the usual estimates and inferential procedures used in multivariate meta-analysis. We apply our methods to some real datasets and show how our statistics are equally appropriate in the context of multivariate meta-regression, where study level covariate effects are included in the model. Our heterogeneity statistics may be used when applying any procedure for fitting the multivariate random effects model. Copyright © 2012 John Wiley & Sons, Ltd. PMID:22763950
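    As a point of reference for the statistics named in this abstract, the following is a minimal sketch of the standard univariate quantities only: Cochran's Q, H2 = Q/df and I2 = (H2 - 1)/H2, floored at zero. It is not the authors' multivariate generalisation; the function name and the inverse-variance (fixed-effect) weighting are assumptions made for illustration.

```python
def heterogeneity_stats(effects, variances):
    """Univariate Cochran's Q, H2 and I2 (illustrative sketch, not the multivariate method)."""
    if len(effects) != len(variances) or len(effects) < 2:
        raise ValueError("need at least two studies, each with an effect and a variance")
    # Inverse-variance weights, as in a fixed-effect meta-analysis.
    weights = [1.0 / v for v in variances]
    pooled = sum(w * y for w, y in zip(weights, effects)) / sum(weights)
    # Cochran's Q: weighted squared deviations from the pooled estimate.
    q = sum(w * (y - pooled) ** 2 for w, y in zip(weights, effects))
    df = len(effects) - 1
    h2 = q / df
    # I2 is the share of variation attributed to heterogeneity, floored at 0.
    i2 = max(0.0, (h2 - 1.0) / h2) if h2 > 0 else 0.0
    return q, h2, i2
```

    For three hypothetical studies with effects 0.1, 0.3 and 0.5 and a common variance of 0.01, this gives Q of about 8, H2 of about 4 and I2 of about 0.75.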

  17. Drug treatment rates with beta-blockers and ACE-inhibitors/angiotensin receptor blockers and recurrences in takotsubo cardiomyopathy: A meta-regression analysis.

    PubMed

    Brunetti, Natale Daniele; Santoro, Francesco; De Gennaro, Luisa; Correale, Michele; Gaglione, Antonio; Di Biase, Matteo

    2016-07-01

    In a recent paper, Singh et al. analyzed the effect of drug treatment on recurrence of takotsubo cardiomyopathy (TTC) in a comprehensive meta-analysis. The study found that recurrence rates were independent of clinical use of BB prescription but inversely correlated with ACEi/ARB prescription; the authors therefore concluded that ACEi/ARB rather than BB may reduce the risk of recurrence. We aimed to re-analyze the data reported in the study, now weighted for population size, in a meta-regression analysis. After multiple meta-regression analysis, we found a significant regression between rates of prescription of ACEi and rates of recurrence of TTC; regression was not statistically significant for BBs. On the basis of our re-analysis, we confirm that rates of recurrence of TTC are lower in populations of patients with higher rates of treatment with ACEi/ARB. This does not necessarily imply that ACEi prevent recurrence of TTC, but merely that, for example, rates of recurrence are lower in cohorts that are more compliant with therapy or are prescribed ACEi more often because they are followed more carefully. Randomized prospective studies are warranted. Copyright © 2016 Elsevier Ireland Ltd. All rights reserved.
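    The idea of a regression "weighted for populations' size" can be illustrated with a weighted least-squares fit of one study-level outcome on one study-level covariate. This is a hedged sketch only: the function name is hypothetical, a single covariate is assumed, and the abstract does not state which software or weighting scheme the authors actually used.

```python
def weighted_linear_fit(x, y, weights):
    """Weighted least-squares fit of y = intercept + slope * x (illustrative sketch)."""
    if not (len(x) == len(y) == len(weights)) or len(x) < 2:
        raise ValueError("need at least two studies with x, y and a weight each")
    total = sum(weights)
    # Weighted means of the covariate and the outcome.
    x_bar = sum(w * xi for w, xi in zip(weights, x)) / total
    y_bar = sum(w * yi for w, yi in zip(weights, y)) / total
    sxx = sum(w * (xi - x_bar) ** 2 for w, xi in zip(weights, x))
    if sxx == 0:
        raise ValueError("the covariate must vary across studies")
    sxy = sum(w * (xi - x_bar) * (yi - y_bar) for w, xi, yi in zip(weights, x, y))
    slope = sxy / sxx
    intercept = y_bar - slope * x_bar
    return intercept, slope
```

    Here x would hold each study's prescription rate, y its recurrence rate, and weights its sample size; a negative slope would correspond to the inverse association described above.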

  18. Crude oil price forecasting based on hybridizing wavelet multiple linear regression model, particle swarm optimization techniques, and principal component analysis.

    PubMed

    Shabri, Ani; Samsudin, Ruhaidah

    2014-01-01

    Crude oil prices play a significant role in the global economy and are a key input into option pricing formulas, portfolio allocation, and risk measurement. In this paper, a hybrid model integrating wavelet and multiple linear regression (MLR) is proposed for crude oil price forecasting. In this model, the Mallat wavelet transform is first used to decompose an original time series into several subseries at different scales. Then, principal component analysis (PCA) is used to process the subseries data in MLR for crude oil price forecasting. Particle swarm optimization (PSO) is used to obtain the optimal parameters of the MLR model. To assess the effectiveness of this model, the daily West Texas Intermediate (WTI) crude oil market is used as the case study. The time series prediction performance of the WMLR model is compared with the MLR, ARIMA, and GARCH models using various statistical measures. The experimental results show that the proposed model outperforms the individual models in forecasting the crude oil price series.

  19. Exposure to Pre- and Perinatal Risk Factors Partially Explains Mean Differences in Self-Regulation between Races.

    PubMed

    Barnes, J C; Boutwell, Brian B; Miller, J Mitchell; DeShay, Rashaan A; Beaver, Kevin M; White, Norman

    2016-01-01

    To examine whether differential exposure to pre- and perinatal risk factors explained differences in levels of self-regulation between children of different races (White, Black, Hispanic, Asian, and Other). Multiple regression models based on data from the Early Childhood Longitudinal Study, Birth Cohort (n ≈ 9,850) were used to analyze the impact of pre- and perinatal risk factors on the development of self-regulation at age 2 years. Racial differences in levels of self-regulation were observed. Racial differences were also observed for 9 of the 12 pre-/perinatal risk factors. Multiple regression analyses revealed that a portion of the racial differences in self-regulation was explained by differential exposure to several of the pre-/perinatal risk factors. Specifically, maternal age at childbirth, gestational timing, and the family's socioeconomic status were significantly related to the child's level of self-regulation. These factors accounted for a statistically significant portion of the racial differences observed in self-regulation. The findings indicate racial differences in self-regulation may be, at least partially, explained by racial differences in exposure to pre- and perinatal risk factors.

  20. Artificial neural networks environmental forecasting in comparison with multiple linear regression technique: From heavy metals to organic micropollutants screening in agricultural soils

    NASA Astrophysics Data System (ADS)

    Bonelli, Maria Grazia; Ferrini, Mauro; Manni, Andrea

    2016-12-01

    The assessment of metal and organic micropollutant contamination in agricultural soils is a difficult challenge because of the extensive area over which a very large number of samples must be collected and analyzed. For dioxin and dioxin-like PCB measurement methods and the subsequent treatment of data, the European Community advises the development of low-cost and fast methods that allow routine analysis of a great number of samples, providing rapid measurement of these compounds in the environment, feeds and food. The aim of the present work was to find a method suitable for describing the relations between organic and inorganic contaminants and to use the values of the latter to forecast the former. In practice, the use of a portable metal soil analyzer coupled with an efficient statistical procedure enables the required objective to be achieved. Compared to Multiple Linear Regression, the Artificial Neural Networks technique proved to be an excellent forecasting method, even though there is no linear correlation between the variables to be analyzed.

  1. Quantification of the effects of quality investment on the Cost of Poor Quality: A quasi-experimental study

    NASA Astrophysics Data System (ADS)

    Tamimi, Abdallah Ibrahim

    Quality management is a fundamental challenge facing businesses. This research attempted to quantify the effect of quality investment on the Cost of Poor Quality (COPQ) in an aerospace company utilizing 3 years of quality data at United Launch Alliance, a Boeing -- Lockheed Martin Joint Venture Company. Statistical analysis tools, like multiple regressions, were used to quantify the relationship between quality investments and COPQ. Strong correlations were evident by the high correlation coefficient R2 and very small p-values in multiple regression analysis. The models in the study helped produce an Excel macro that based on preset constraints, optimized the level of quality spending to minimize COPQ. The study confirmed that as quality investments were increased, the COPQ decreased steadily until a point of diminishing return was reached. The findings may be used to develop an approach to reduce the COPQ and enhance product performance. Achieving superior quality in rocket launching enhances the accuracy, reliability, and mission success of delivering satellites to their precise orbits in pursuit of knowledge, peace, and freedom while assuring safety for the end user.

  2. Crude Oil Price Forecasting Based on Hybridizing Wavelet Multiple Linear Regression Model, Particle Swarm Optimization Techniques, and Principal Component Analysis

    PubMed Central

    Shabri, Ani; Samsudin, Ruhaidah

    2014-01-01

    Crude oil prices play a significant role in the global economy and are a key input into option pricing formulas, portfolio allocation, and risk measurement. In this paper, a hybrid model integrating wavelet and multiple linear regression (MLR) is proposed for crude oil price forecasting. In this model, the Mallat wavelet transform is first used to decompose an original time series into several subseries at different scales. Then, principal component analysis (PCA) is used to process the subseries data in MLR for crude oil price forecasting. Particle swarm optimization (PSO) is used to obtain the optimal parameters of the MLR model. To assess the effectiveness of this model, the daily West Texas Intermediate (WTI) crude oil market is used as the case study. The time series prediction performance of the WMLR model is compared with the MLR, ARIMA, and GARCH models using various statistical measures. The experimental results show that the proposed model outperforms the individual models in forecasting the crude oil price series. PMID:24895666

  3. Hydrology and trout populations of cold-water rivers of Michigan and Wisconsin

    USGS Publications Warehouse

    Hendrickson, G.E.; Knutilla, R.L.

    1974-01-01

    Statistical multiple-regression analyses showed significant relationships between trout populations and hydrologic parameters. Parameters showing the higher levels of significance were temperature, hardness of water, percentage of gravel bottom, percentage of bottom vegetation, variability of streamflow, and discharge per unit drainage area. Trout populations increase with lower levels of annual maximum water temperatures, with increase in water hardness, and with increase in percentage of gravel and bottom vegetation. Trout populations also increase with decrease in variability of streamflow, and with increase in discharge per unit drainage area. Most hydrologic parameters were significant when evaluated collectively, but no parameter, by itself, showed a high degree of correlation with trout populations in regression analyses that included all the streams sampled. Regression analyses of stream segments that were restricted to certain limits of hardness, temperature, or percentage of gravel bottom showed improvements in correlation. Analyses of trout populations, in pounds per acre and pounds per mile and hydrologic parameters resulted in regression equations from which trout populations could be estimated with standard errors of 89 and 84 per cent, respectively.

  4. Climatological Modeling of Monthly Air Temperature and Precipitation in Egypt through GIS Techniques

    NASA Astrophysics Data System (ADS)

    El Kenawy, A.

    2009-09-01

    This paper describes a method for modeling and mapping four climatic variables (maximum temperature, minimum temperature, mean temperature and total precipitation) in Egypt using a multiple regression approach implemented in a GIS environment. In this model, a set of variables including latitude, longitude, elevation within a distance of 5, 10 and 15 km, slope, aspect, distance to the Mediterranean Sea, distance to the Red Sea, distance to the Nile, ratio between land and water masses within a radius of 5, 10 and 15 km, the Normalized Difference Vegetation Index (NDVI), the Normalized Difference Water Index (NDWI), the Normalized Difference Temperature Index (NDTI) and reflectance are included as independent variables. These variables were integrated as raster layers in MiraMon software at a spatial resolution of 1 km. Climatic variables were considered as dependent variables and averaged from 39 quality-controlled and homogenized series distributed across the entire country for the period 1957-2006. For each climatic variable, digital and objective maps were finally obtained using the multiple regression coefficients at monthly, seasonal and annual timescales. The accuracy of these maps was assessed through cross-validation between predicted and observed values using a set of statistics including the coefficient of determination (R2), root mean square error (RMSE), mean absolute error (MAE), mean bias error (MBE) and Willmott's D statistic. These maps are valuable in terms of their spatial resolution as well as the number of observatories involved in the current analysis.
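    The validation statistics listed in this abstract can be computed from paired observed and predicted values. The sketch below uses the common textbook definitions: R2 as the squared Pearson correlation, MBE as the mean of predicted minus observed, and Willmott's index of agreement d. The function name is hypothetical, and the abstract does not confirm which exact variants the author used.

```python
import math

def validation_stats(observed, predicted):
    """R2, RMSE, MAE, MBE and Willmott's d for paired values (illustrative sketch)."""
    if len(observed) != len(predicted) or len(observed) < 2:
        raise ValueError("need at least two paired values")
    n = len(observed)
    o_mean = sum(observed) / n
    p_mean = sum(predicted) / n
    errors = [p - o for o, p in zip(observed, predicted)]
    mbe = sum(errors) / n                      # mean bias error (sign shows over/under-prediction)
    mae = sum(abs(e) for e in errors) / n      # mean absolute error
    sq_err = sum(e * e for e in errors)
    rmse = math.sqrt(sq_err / n)               # root mean square error
    # Squared Pearson correlation between observed and predicted values.
    sxy = sum((o - o_mean) * (p - p_mean) for o, p in zip(observed, predicted))
    sxx = sum((o - o_mean) ** 2 for o in observed)
    syy = sum((p - p_mean) ** 2 for p in predicted)
    r2 = sxy ** 2 / (sxx * syy)
    # Willmott's index of agreement: 1 means perfect agreement.
    potential = sum((abs(p - o_mean) + abs(o - o_mean)) ** 2
                    for o, p in zip(observed, predicted))
    d = 1.0 - sq_err / potential
    return {"r2": r2, "rmse": rmse, "mae": mae, "mbe": mbe, "d": d}
```

    In a leave-one-out setting, `predicted` would hold the value estimated for each station from a model fitted without that station.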

  5. MAGMA: Generalized Gene-Set Analysis of GWAS Data

    PubMed Central

    de Leeuw, Christiaan A.; Mooij, Joris M.; Heskes, Tom; Posthuma, Danielle

    2015-01-01

    By aggregating data for complex traits in a biologically meaningful way, gene and gene-set analysis constitute a valuable addition to single-marker analysis. However, although various methods for gene and gene-set analysis currently exist, they generally suffer from a number of issues. Statistical power for most methods is strongly affected by linkage disequilibrium between markers, multi-marker associations are often hard to detect, and the reliance on permutation to compute p-values tends to make the analysis computationally very expensive. To address these issues we have developed MAGMA, a novel tool for gene and gene-set analysis. The gene analysis is based on a multiple regression model, to provide better statistical performance. The gene-set analysis is built as a separate layer around the gene analysis for additional flexibility. This gene-set analysis also uses a regression structure to allow generalization to analysis of continuous properties of genes and simultaneous analysis of multiple gene sets and other gene properties. Simulations and an analysis of Crohn’s Disease data are used to evaluate the performance of MAGMA and to compare it to a number of other gene and gene-set analysis tools. The results show that MAGMA has significantly more power than other tools for both the gene and the gene-set analysis, identifying more genes and gene sets associated with Crohn’s Disease while maintaining a correct type 1 error rate. Moreover, the MAGMA analysis of the Crohn’s Disease data was found to be considerably faster as well. PMID:25885710

  6. MAGMA: generalized gene-set analysis of GWAS data.

    PubMed

    de Leeuw, Christiaan A; Mooij, Joris M; Heskes, Tom; Posthuma, Danielle

    2015-04-01

    By aggregating data for complex traits in a biologically meaningful way, gene and gene-set analysis constitute a valuable addition to single-marker analysis. However, although various methods for gene and gene-set analysis currently exist, they generally suffer from a number of issues. Statistical power for most methods is strongly affected by linkage disequilibrium between markers, multi-marker associations are often hard to detect, and the reliance on permutation to compute p-values tends to make the analysis computationally very expensive. To address these issues we have developed MAGMA, a novel tool for gene and gene-set analysis. The gene analysis is based on a multiple regression model, to provide better statistical performance. The gene-set analysis is built as a separate layer around the gene analysis for additional flexibility. This gene-set analysis also uses a regression structure to allow generalization to analysis of continuous properties of genes and simultaneous analysis of multiple gene sets and other gene properties. Simulations and an analysis of Crohn's Disease data are used to evaluate the performance of MAGMA and to compare it to a number of other gene and gene-set analysis tools. The results show that MAGMA has significantly more power than other tools for both the gene and the gene-set analysis, identifying more genes and gene sets associated with Crohn's Disease while maintaining a correct type 1 error rate. Moreover, the MAGMA analysis of the Crohn's Disease data was found to be considerably faster as well.

  7. Association between osteoporosis and periodontal disease among postmenopausal Indian women.

    PubMed

    Richa; R, Yashoda; Puranik, Manjunath P; Shrivastava, Amit

    2017-08-01

    The aim of the present study was to determine the association between osteoporosis and periodontal disease among postmenopausal Indian women. A cross-sectional comparative study was conducted among postmenopausal women aged 45-65 years attending various hospitals in Bangalore, India. The examination was performed using the plaque index, gingival index, modified sulcus bleeding index, and community periodontal index. The women then underwent a bone mineral density (BMD) test using an ultrasonometer. Based on the BMD scores, participants were divided into osteoporotic and non-osteoporotic groups. For the statistical analysis, the χ2-test, Student's t-test, and multiple regression analysis were applied. The mean plaque, gingival, and bleeding scores were significantly higher among osteoporotic women (1.83 ± 0.47, 1.73 ± 0.49, 1.82 ± 0.52) compared to the non-osteoporotic women (1.31 ± 0.40, 1.09 ± 0.52, 1.25 ± 0.50). The mean number of sextants affected for codes 3 and 4 of the community periodontal index and codes 1, 2, and 3 of loss of attachment were significantly higher among the osteoporotic group compared to the non-osteoporotic group. Multiple logistic regression tests confirmed the statistically significant association between osteoporosis and menopause duration, loss of attachment, bleeding, and gingivitis scores. Skeletal BMD is related to clinical attachment loss, bleeding, and gingivitis, which suggests that there is an association between osteoporosis and periodontal diseases. © 2016 John Wiley & Sons Australia, Ltd.

  8. Chair-side detection of Prevotella Intermedia in mature dental plaque by its fluorescence.

    PubMed

    Nomura, Yoshiaki; Takeuchi, Hiroaki; Okamoto, Masaaki; Sogabe, Kaoru; Okada, Ayako; Hanada, Nobuhiro

    2017-06-01

    Prevotella intermedia/nigrescens is one of the well-known pathogens causing periodontal diseases, and the red fluorescence emitted by protoporphyrin IX in the bacterial cells under visible blue light could be useful for chair-side detection. The aim of this study was to evaluate levels of periodontal pathogens, especially P. intermedia, in clinical samples of red fluorescent dental plaque. The fluorescence of thirty-two supragingival plaque samples from six individuals was measured at a wavelength of 640 nm under excitation at 409 nm. Periodontopathic bacteria were counted by the Invader PLUS PCR assay. Correlations between fluorescence intensity and bacterial counts were analyzed by Pearson's correlation coefficient and simple and multiple regression analysis. Positive and negative predictive values of the fluorescence intensity for the presence or absence of P. intermedia in supragingival plaque were calculated. When relative fluorescence units (RFU) were logarithmically transformed, statistically significant linear relations between RFU and bacterial counts were obtained for P. intermedia, Porphyromonas gingivalis and Tannerella forsythia. In the multiple regression analysis, only P. intermedia had a statistically significant correlation with fluorescence intensity. All of the fluorescent dental plaque samples contained P. intermedia. In contrast, 28% of non-fluorescent plaques contained P. intermedia. Checking for fluorescent dental plaque in the oral cavity could be a simple chair-side screening for mature dental plaque before examining periodontal pathogens, especially P. intermedia, by the PCR method. Copyright © 2017 Elsevier B.V. All rights reserved.

  9. Job stress, achievement motivation and occupational burnout among male nurses.

    PubMed

    Hsu, Hsiu-Yueh; Chen, Sheng-Hwang; Yu, Hsing-Yi; Lou, Jiunn-Horng

    2010-07-01

    This paper is a report of an exploration of job stress, achievement motivation and occupational burnout in male nurses and to identify predictors of occupational burnout. Since the Nightingale era, the nursing profession has been recognized as 'women's work'. The data indicate that there are more female nurses than male nurses in Taiwan. However, the turnover rate for male nurses is twice that of female nurses. Understanding the factors that affect occupational burnout of male nurses may help researchers find ways to reduce the likelihood that they will quit. A survey was conducted in Taiwan in 2008 using a cross-sectional design. A total of 121 male nurses participated in the study. Mailed questionnaires were used to collect data, which were analysed using descriptive statistics and stepwise multiple regression. The job stress of male nurses was strongly correlated with occupational burnout (r = 0.64, P < 0.001). Stepwise multiple regression analyses indicated that job stress was the only factor to have a statistically significant direct influence on occupational burnout, accounting for 45.8% of the variance in this. Job stress was comprised of three dimensions, of which role conflict accounted for 40.8% of the variance in occupational burnout. The contribution of job stress to occupational burnout of male nurses was confirmed. As occupational burnout may influence the quality of care by these nurses, nurse managers should strive to decrease male nurses' job stress as this should lead to a reduction of negative outcomes of occupational burnout.

  10. GIS-based spatial statistical analysis of risk areas for liver flukes in Surin Province of Thailand.

    PubMed

    Rujirakul, Ratana; Ueng-arporn, Naporn; Kaewpitoon, Soraya; Loyd, Ryan J; Kaewthani, Sarochinee; Kaewpitoon, Natthawut

    2015-01-01

    It is urgently necessary to be aware of the distribution and risk areas of the liver fluke, Opisthorchis viverrini, for proper allocation of prevention and control measures. This study aimed to investigate the human behavior and environmental factors influencing the distribution in Surin Province of Thailand, and to build a model using stepwise multiple regression analysis with a geographic information system (GIS) on environment and climate data. Human attitudes (<50%; X111), population density (148-169 pop/km2; X73), and land use as wetland (X64) were correlated with the liver fluke disease distribution at the 0.000, 0.034, and 0.006 levels, respectively. The multiple regression equation OV = -0.599 + 0.005 X73 + 0.040 X111 + 0.022 X64, where X73 is population density (148-169 pop/km2), X111 is human attitude (<50%), X64 is land use (wetland), and OV denotes patients with liver fluke infection, was used to predict the distribution of liver fluke (R Square = 0.878, Adjusted R Square = 0.849). By GIS analysis, we found Si Narong, Sangkha, Phanom Dong Rak, Mueang Surin, Non Narai, Samrong Thap, Chumphon Buri, and Rattanaburi to have the highest distributions in Surin province. In conclusion, the combination of GIS and statistical analysis can help simulate the spatial distribution and risk areas of liver fluke, and thus may be an important tool for future planning of prevention and control measures.
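    The fitted equation reported in this abstract can be written as a small function to show how a prediction is formed from the three predictors. The coefficients are taken from the abstract; the function and parameter names are hypothetical, and the units or coding of each predictor are not given in the abstract, so any inputs used here are placeholders rather than real data.

```python
def predict_ov(x73_population_density, x111_human_attitude, x64_wetland):
    """Evaluate OV = -0.599 + 0.005*X73 + 0.040*X111 + 0.022*X64 (coefficients from the abstract)."""
    return (-0.599
            + 0.005 * x73_population_density   # X73: population density class (148-169 pop/km2)
            + 0.040 * x111_human_attitude      # X111: human attitude class (<50%)
            + 0.022 * x64_wetland)             # X64: land use as wetland
```

    Because every slope is positive, a larger value of any predictor raises the predicted OV, which matches the abstract's description of these factors as risk indicators.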

  11. Risk factors for repetitive strain injuries among school teachers in Thailand.

    PubMed

    Chaiklieng, Sunisa; Suggaravetsiri, Pornnapa

    2012-01-01

    Prolonged posture, static work and repetition have previously been reported as causes of repetitive strain injuries (RSIs) among workers, including teachers. This cross-sectional analytic study aimed to investigate the prevalence and risk factors of RSIs among school teachers. Participants were 452 full-time school teachers in Thailand. Data were collected using structured questionnaires, illuminance measurements and physical fitness tests. Descriptive statistics and inferential statistics (the Chi-square test and multiple logistic regression analysis) were used. Most teachers in this study were female (57.3%), and the mean work experience was 22.6 ± 10.4 years. The six-month prevalence of RSIs was 73.7%. The univariate analysis identified the following risk factors for RSIs: chronic disease (OR=1.8; 95% CI = 1.16-2.73), history of trauma (OR=2.0; 95% CI = 1.02-4.01), a family member with RSIs (OR=2.0; 95% CI = 1.02-4.01), stretching to write on the board (OR=1.7; 95% CI = 1.06-1.70) and wearing high-heeled shoes >2 inches (OR=1.6; 95% CI = 1.03-2.51). Multiple logistic regression analysis showed that chronic disease and high-heeled shoes >2 inches were significantly related to the development of RSIs. Poor grip strength and back muscle flexibility significantly affected RSIs in teachers. In conclusion, RSIs were highly prevalent among school teachers, who should be made aware of health promotion to prevent RSIs.

  12. [Regression on order statistics and its application in estimating nondetects for food exposure assessment].

    PubMed

    Yu, Xiaojin; Liu, Pei; Min, Jie; Chen, Qiguang

    2009-01-01

    To explore the application of regression on order statistics (ROS) in estimating nondetects for food exposure assessment. Regression on order statistics was applied to a cadmium residue data set from global food contaminant monitoring; the mean residue was estimated using SAS programming and compared with the results from substitution methods. The results show that the ROS method clearly performs better than substitution methods, being robust and convenient for subsequent analysis. Regression on order statistics is worth adopting, but more effort should be devoted to the details of applying this method.
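    To make the ROS idea concrete, the following is a simplified sketch written in Python rather than the SAS used in the study. It assumes a single detection limit, a lognormal distribution, and Blom plotting positions: detected values are regressed (on the log scale) against normal quantiles, and the fitted line is used to impute the nondetects, which occupy the lowest ranks. The function name is hypothetical, and data with several detection limits would need a more careful plotting-position scheme than shown here.

```python
import math
from statistics import NormalDist

def ros_mean(detected, n_nondetects):
    """Simplified ROS mean for one detection limit (illustrative sketch)."""
    if len(detected) < 2 or any(v <= 0 for v in detected):
        raise ValueError("need at least two positive detected values")
    values = sorted(detected)
    n = len(values) + n_nondetects
    norm = NormalDist()

    def z_score(rank):
        # Blom plotting position converted to a standard normal quantile.
        return norm.inv_cdf((rank - 0.375) / (n + 0.25))

    # Nondetects take ranks 1..k, detected values take ranks k+1..n.
    z_det = [z_score(n_nondetects + i + 1) for i in range(len(values))]
    logs = [math.log(v) for v in values]
    z_bar = sum(z_det) / len(z_det)
    l_bar = sum(logs) / len(logs)
    slope = (sum((z - z_bar) * (l - l_bar) for z, l in zip(z_det, logs))
             / sum((z - z_bar) ** 2 for z in z_det))
    intercept = l_bar - slope * z_bar
    # Impute each nondetect from the fitted line at its own plotting position.
    imputed = [math.exp(intercept + slope * z_score(r))
               for r in range(1, n_nondetects + 1)]
    return sum(values + imputed) / n, imputed
```

    In contrast to substitution (replacing every nondetect with, say, half the detection limit), each nondetect here receives its own value based on the distribution of the detected data.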

  13. White matter tract abnormalities are associated with cognitive dysfunction in secondary progressive multiple sclerosis.

    PubMed

    Meijer, Kim A; Muhlert, Nils; Cercignani, Mara; Sethi, Varun; Ron, Maria A; Thompson, Alan J; Miller, David H; Chard, Declan; Geurts, Jeroen Jg; Ciccarelli, Olga

    2016-10-01

    While our knowledge of white matter (WM) pathology underlying cognitive impairment in relapsing remitting multiple sclerosis (MS) is increasing, equivalent understanding in those with secondary progressive (SP) MS lags behind. The aim of this study is to examine whether the extent and severity of WM tract damage differ between cognitively impaired (CI) and cognitively preserved (CP) secondary progressive multiple sclerosis (SPMS) patients. Conventional magnetic resonance imaging (MRI) and diffusion MRI were acquired from 30 SPMS patients and 32 healthy controls (HC). Cognitive domains commonly affected in MS patients were assessed. Linear regression was used to predict cognition. Diffusion measures were compared between groups using tract-based spatial statistics (TBSS). A total of 12 patients were classified as CI, and processing speed was the most commonly affected domain. The final regression model including demographic variables and radial diffusivity explained the greatest variance of cognitive performance (R2 = 0.48, p = 0.002). SPMS patients showed widespread loss of WM integrity throughout the WM skeleton when compared with HC. When compared with CP patients, CI patients showed more extensive and severe damage of several WM tracts, including the fornix, superior longitudinal fasciculus and forceps major. Loss of WM integrity assessed using TBSS helps to explain cognitive decline in SPMS patients. © The Author(s), 2016.

  14. Pareto fronts for multiobjective optimization design on materials data

    NASA Astrophysics Data System (ADS)

    Gopakumar, Abhijith; Balachandran, Prasanna; Gubernatis, James E.; Lookman, Turab

    Optimizing multiple properties simultaneously is vital in materials design. Here we apply information-driven, statistical optimization strategies blended with machine learning methods to address multi-objective optimization tasks on materials data. These strategies aim to find the Pareto front consisting of non-dominated data points from a set of candidate compounds with known characteristics. The objective is to find the Pareto front in as few additional measurements or calculations as possible. We show how exploration of the data space to find the front is achieved by using uncertainties in predictions from regression models. We test our proposed design strategies on multiple, independent data sets including those from computations as well as experiments. These include data sets for MAX phases, piezoelectrics and multicomponent alloys.
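
    The notion of a Pareto front of non-dominated points mentioned in this abstract can be illustrated with a minimal sketch. It assumes that every objective is to be maximized, and the candidate values are hypothetical, not taken from the study. It does not reproduce the authors' uncertainty-driven search; it only shows how the front is identified once property values are known.

```python
import numpy as np

def pareto_front(points):
    """Return the indices of non-dominated rows, assuming every objective is maximized."""
    pts = np.asarray(points, dtype=float)
    keep = []
    for i, p in enumerate(pts):
        # p is dominated if another point is >= p in all objectives and > p in at least one.
        dominated = np.any(np.all(pts >= p, axis=1) & np.any(pts > p, axis=1))
        if not dominated:
            keep.append(i)
    return keep

# Hypothetical candidates, each with two property values to maximize.
candidates = [[1.0, 5.0], [2.0, 4.0], [3.0, 1.0], [2.0, 2.0], [0.5, 0.5]]
front = pareto_front(candidates)
print(front)
```

    The quadratic loop is adequate for small candidate sets; larger sets would usually call for a sort-based method.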

  15. Two Paradoxes in Linear Regression Analysis.

    PubMed

    Feng, Ge; Peng, Jing; Tu, Dongke; Zheng, Julia Z; Feng, Changyong

    2016-12-25

    Regression is one of the favorite tools in applied statistics. However, misuse and misinterpretation of results from regression analysis are common in biomedical research. In this paper we use statistical theory and simulation studies to clarify some paradoxes around this popular statistical method. In particular, we show that a widely used model selection procedure employed in many publications in top medical journals is wrong. Formal procedures based on solid statistical theory should be used in model selection.

  16. SOCR Analyses - an Instructional Java Web-based Statistical Analysis Toolkit.

    PubMed

    Chu, Annie; Cui, Jenny; Dinov, Ivo D

    2009-03-01

    The Statistical Online Computational Resource (SOCR) designs web-based tools for educational use in a variety of undergraduate courses (Dinov 2006). Several studies have demonstrated that these resources significantly improve students' motivation and learning experiences (Dinov et al. 2008). SOCR Analyses is a new component that concentrates on data modeling and analysis using parametric and non-parametric techniques supported with graphical model diagnostics. Currently implemented analyses include commonly used models in undergraduate statistics courses like linear models (Simple Linear Regression, Multiple Linear Regression, One-Way and Two-Way ANOVA). In addition, we implemented tests for sample comparisons, such as t-test in the parametric category; and Wilcoxon rank sum test, Kruskal-Wallis test, Friedman's test, in the non-parametric category. SOCR Analyses also include several hypothesis test models, such as Contingency tables, Friedman's test and Fisher's exact test. The code itself is open source (http://socr.googlecode.com/), hoping to contribute to the efforts of the statistical computing community. The code includes functionality for each specific analysis model and it has general utilities that can be applied in various statistical computing tasks. For example, concrete methods with API (Application Programming Interface) have been implemented in statistical summary, least-squares solutions of general linear models, rank calculations, etc. HTML interfaces, tutorials, source code, activities, and data are freely available via the web (www.SOCR.ucla.edu). Code examples for developers and demos for educators are provided on the SOCR Wiki website. In this article, the pedagogical utilization of the SOCR Analyses is discussed, as well as the underlying design framework. As the SOCR project is on-going and more functions and tools are being added to it, these resources are constantly improved. The reader is strongly encouraged to check the SOCR site for the most up-to-date information and newly added models.

  17. Selected Streamflow Statistics and Regression Equations for Predicting Statistics at Stream Locations in Monroe County, Pennsylvania

    USGS Publications Warehouse

    Thompson, Ronald E.; Hoffman, Scott A.

    2006-01-01

    A suite of 28 streamflow statistics, ranging from extreme low to high flows, was computed for 17 continuous-record streamflow-gaging stations and predicted for 20 partial-record stations in Monroe County and contiguous counties in north-eastern Pennsylvania. The predicted statistics for the partial-record stations were based on regression analyses relating intermittent flow measurements made at the partial-record stations indexed to concurrent daily mean flows at continuous-record stations during base-flow conditions. The same statistics also were predicted for 134 ungaged stream locations in Monroe County on the basis of regression analyses relating the statistics to GIS-determined basin characteristics for the continuous-record station drainage areas. The prediction methodology for developing the regression equations used to estimate statistics was developed for estimating low-flow frequencies. This study and a companion study found that the methodology also has application potential for predicting intermediate- and high-flow statistics. The statistics included mean monthly flows, mean annual flow, 7-day low flows for three recurrence intervals, nine flow durations, mean annual base flow, and annual mean base flows for two recurrence intervals. Low standard errors of prediction and high coefficients of determination (R2) indicated good results in using the regression equations to predict the statistics. Regression equations for the larger flow statistics tended to have lower standard errors of prediction and higher coefficients of determination (R2) than equations for the smaller flow statistics. The report discusses the methodologies used in determining the statistics and the limitations of the statistics and the equations used to predict the statistics. Caution is indicated in using the predicted statistics for small drainage area situations. Study results constitute input needed by water-resource managers in Monroe County for planning purposes and evaluation of water-resources availability.

  18. Statistical-learning strategies generate only modestly performing predictive models for urinary symptoms following external beam radiotherapy of the prostate: A comparison of conventional and machine-learning methods

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Yahya, Noorazrul, E-mail: noorazrul.yahya@research.uwa.edu.au; Ebert, Martin A.; Bulsara, Max

    Purpose: Given the paucity of available data concerning radiotherapy-induced urinary toxicity, it is important to ensure derivation of the most robust models with superior predictive performance. This work explores multiple statistical-learning strategies for prediction of urinary symptoms following external beam radiotherapy of the prostate. Methods: The performance of logistic regression, elastic-net, support-vector machine, random forest, neural network, and multivariate adaptive regression splines (MARS) to predict urinary symptoms was analyzed using data from 754 participants accrued by TROG03.04-RADAR. Predictive features included dose-surface data, comorbidities, and medication-intake. Four symptoms were analyzed: dysuria, haematuria, incontinence, and frequency, each with three definitions (grade ≥ 1, grade ≥ 2 and longitudinal) with event rate between 2.3% and 76.1%. Repeated cross-validations producing matched models were implemented. A synthetic minority oversampling technique was utilized in endpoints with rare events. Parameter optimization was performed on the training data. Area under the receiver operating characteristic curve (AUROC) was used to compare performance using sample size to detect differences of ≥0.05 at the 95% confidence level. Results: Logistic regression, elastic-net, random forest, MARS, and support-vector machine were the highest-performing statistical-learning strategies in 3, 3, 3, 2, and 1 endpoints, respectively. Logistic regression, MARS, elastic-net, random forest, neural network, and support-vector machine were the best, or were not significantly worse than the best, in 7, 7, 5, 5, 3, and 1 endpoints. The best-performing statistical model was for dysuria grade ≥ 1 with AUROC ± standard deviation of 0.649 ± 0.074 using MARS. For longitudinal frequency and dysuria grade ≥ 1, all strategies produced AUROC>0.6 while all haematuria endpoints and longitudinal incontinence models produced AUROC<0.6. Conclusions: Logistic regression and MARS were most likely to be the best-performing strategy for the prediction of urinary symptoms with elastic-net and random forest producing competitive results. The predictive power of the models was modest and endpoint-dependent. New features, including spatial dose maps, may be necessary to achieve better models.

  19. Optimizing methods for linking cinematic features to fMRI data.

    PubMed

    Kauttonen, Janne; Hlushchuk, Yevhen; Tikka, Pia

    2015-04-15

    One of the challenges of naturalistic neurosciences using movie-viewing experiments is how to interpret observed brain activations in relation to the multiplicity of time-locked stimulus features. As previous studies have shown less inter-subject synchronization across viewers of random video footage than story-driven films, new methods need to be developed for analysis of less story-driven contents. To optimize the linkage between our fMRI data collected during viewing of a deliberately non-narrative silent film 'At Land' by Maya Deren (1944) and its annotated content, we combined the method of elastic-net regularization with the model-driven linear regression and the well-established data-driven independent component analysis (ICA) and inter-subject correlation (ISC) methods. In the linear regression analysis, both IC and region-of-interest (ROI) time-series were fitted with time-series of a total of 36 binary-valued and one real-valued tactile annotation of film features. The elastic-net regularization and cross-validation were applied in the ordinary least-squares linear regression in order to avoid over-fitting due to the multicollinearity of regressors; the results were compared against both the partial least-squares (PLS) regression and the un-regularized full-model regression. A non-parametric permutation testing scheme was applied to evaluate the statistical significance of regression. We found statistically significant correlation between the annotation model and 9 ICs out of 40 ICs. Regression analysis was also repeated for a large set of cubic ROIs covering the grey matter. Both IC- and ROI-based regression analyses revealed activations in parietal and occipital regions, with additional smaller clusters in the frontal lobe. Furthermore, we found elastic-net based regression more sensitive than PLS and un-regularized regression since it detected a larger number of significant ICs and ROIs. Along with the ISC ranking methods, our regression analysis proved a feasible method for ordering the ICs based on their functional relevance to the annotated cinematic features. The novelty of our method is - in comparison to the hypothesis-driven manual pre-selection and observation of some individual regressors biased by choice - in applying a data-driven approach to all content features simultaneously. We found the combination of regularized regression and ICA especially useful when analyzing fMRI data obtained using a non-narrative movie stimulus with a large set of complex and correlated features. Copyright © 2015. Published by Elsevier Inc.

  20. Two Paradoxes in Linear Regression Analysis

    PubMed Central

    FENG, Ge; PENG, Jing; TU, Dongke; ZHENG, Julia Z.; FENG, Changyong

    2016-01-01

    Summary Regression is one of the favorite tools in applied statistics. However, misuse and misinterpretation of results from regression analysis are common in biomedical research. In this paper we use statistical theory and simulation studies to clarify some paradoxes around this popular statistical method. In particular, we show that a widely used model selection procedure employed in many publications in top medical journals is wrong. Formal procedures based on solid statistical theory should be used in model selection. PMID:28638214

  1. The comparison of robust partial least squares regression with robust principal component regression on a real

    NASA Astrophysics Data System (ADS)

    Polat, Esra; Gunay, Suleyman

    2013-10-01

    One of the problems encountered in Multiple Linear Regression (MLR) is multicollinearity, which causes overestimation of the regression parameters and an increase in the variance of these parameters. Hence, when multicollinearity is present, biased estimation procedures such as classical Principal Component Regression (CPCR) and Partial Least Squares Regression (PLSR) are performed. The SIMPLS algorithm is the leading PLSR algorithm because of its speed and efficiency and because its results are easier to interpret. However, both CPCR and SIMPLS yield very unreliable results when the data set contains outlying observations. Therefore, Hubert and Vanden Branden (2003) presented a robust PCR (RPCR) method and a robust PLSR (RPLSR) method called RSIMPLS. In RPCR, a robust Principal Component Analysis (PCA) method for high-dimensional data is first applied to the independent variables; then the dependent variables are regressed on the scores using a robust regression method. RSIMPLS has been constructed from a robust covariance matrix for high-dimensional data and robust linear regression. The purpose of this study is to show the usage of the RPCR and RSIMPLS methods on an econometric data set, and hence to compare the two methods on an inflation model of Turkey. The considered methods have been compared in terms of predictive ability and goodness of fit by using a robust Root Mean Squared Error of Cross-validation (R-RMSECV), a robust R2 value and the Robust Component Selection (RCS) statistic.
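
    As a rough illustration of the classical (non-robust) principal component regression that this abstract takes as its starting point, the sketch below regresses a response on the leading principal component scores and maps the result back to coefficients on the original predictors. It is not the RPCR or RSIMPLS method of the study: no robust PCA or robust regression step is included, and the function name and simulated data are assumptions made for the example.

```python
import numpy as np

def pcr_fit(X, y, n_components):
    """Classical PCR sketch: regress y on the leading principal component scores of X."""
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    x_mean, y_mean = X.mean(axis=0), y.mean()
    Xc, yc = X - x_mean, y - y_mean
    # The rows of Vt are the principal directions of the centered predictors.
    _, s, Vt = np.linalg.svd(Xc, full_matrices=False)
    V = Vt[:n_components].T
    scores = Xc @ V
    # The score columns are orthogonal, so each regression coefficient is a simple ratio.
    gamma = (scores.T @ yc) / (s[:n_components] ** 2)
    beta = V @ gamma                      # coefficients on the original predictors
    intercept = y_mean - x_mean @ beta
    return intercept, beta

# Hypothetical data with two nearly collinear predictors.
rng = np.random.default_rng(0)
x1 = rng.normal(size=200)
x2 = x1 + 0.05 * rng.normal(size=200)     # almost a copy of x1
x3 = rng.normal(size=200)
X = np.column_stack([x1, x2, x3])
y = 1.0 + x1 + x2 + 0.5 * x3 + rng.normal(scale=0.3, size=200)

b0_reduced, b_reduced = pcr_fit(X, y, n_components=2)  # drops the unstable direction
b0_full, b_full = pcr_fit(X, y, n_components=3)        # keeps all directions
```

    Keeping every component reproduces ordinary least squares, so any stabilizing effect comes only from dropping low-variance directions.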

  2. SU-F-T-386: Analysis of Three QA Methods for Predicting Dose Deviation Pass Percentage for Lung SBRT VMAT Plans

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Hardin, M; To, D; Giaddui, T

    2016-06-15

    Purpose: To investigate the significance of using pinpoint ionization chambers (IC) and RadCalc (RC) in determining the quality of lung SBRT VMAT plans with low dose deviation pass percentage (DDPP) as reported by ScandiDos Delta4 (D4). To quantify the relationship between DDPP and point dose deviations determined by IC (ICDD), RadCalc (RCDD), and median dose deviation reported by D4 (D4DD). Methods: Point dose deviations and D4 DDPP were compiled for 45 SBRT VMAT plans. Eighteen patients were treated on Varian Truebeam linear accelerators (linacs); the remaining 27 were treated on Elekta Synergy linacs with Agility collimators. A one-way analysis of variance (ANOVA) was performed to determine if there were any statistically significant differences between D4DD, ICDD, and RCDD. Tukey’s test was used to determine which pair of means was statistically different from each other. Multiple regression analysis was performed to determine if D4DD, ICDD, or RCDD are statistically significant predictors of DDPP. Results: Median DDPP, D4DD, ICDD, and RCDD were 80.5% (47.6%–99.2%), −0.3% (−2.0%–1.6%), 0.2% (−7.5%–6.3%), and 2.9% (−4.0%–19.7%), respectively. The ANOVA showed a statistically significant difference between D4DD, ICDD, and RCDD for a 95% confidence interval (p < 0.001). Tukey’s test revealed a statistically significant difference between two pairs of groups, RCDD-D4DD and RCDD-ICDD (p < 0.001), but no difference between ICDD-D4DD (p = 0.485). Multiple regression analysis revealed that ICDD (p = 0.04) and D4DD (p = 0.03) are statistically significant predictors of DDPP with an adjusted r² of 0.115. Conclusion: This study shows ICDD predicts trends in D4 DDPP; however this trend is highly variable as shown by our low r². This work suggests that ICDD can be used as a method to verify DDPP in delivery of lung SBRT VMAT plans. RCDD may not validate low DDPP discovered in D4 QA for small field SBRT treatments.

  3. Multiple Correlation versus Multiple Regression.

    ERIC Educational Resources Information Center

    Huberty, Carl J.

    2003-01-01

    Describes differences between multiple correlation analysis (MCA) and multiple regression analysis (MRA), showing how these approaches involve different research questions and study designs, different inferential approaches, different analysis strategies, and different reported information. (SLD)

  4. Forecasting daily patient volumes in the emergency department.

    PubMed

    Jones, Spencer S; Thomas, Alun; Evans, R Scott; Welch, Shari J; Haug, Peter J; Snow, Gregory L

    2008-02-01

    Shifts in the supply of and demand for emergency department (ED) resources make the efficient allocation of ED resources increasingly important. Forecasting is a vital activity that guides decision-making in many areas of economic, industrial, and scientific planning, but has gained little traction in the health care industry. There are few studies that explore the use of forecasting methods to predict patient volumes in the ED. The goals of this study are to explore and evaluate the use of several statistical forecasting methods to predict daily ED patient volumes at three diverse hospital EDs and to compare the accuracy of these methods to the accuracy of a previously proposed forecasting method. Daily patient arrivals at three hospital EDs were collected for the period January 1, 2005, through March 31, 2007. The authors evaluated the use of seasonal autoregressive integrated moving average, time series regression, exponential smoothing, and artificial neural network models to forecast daily patient volumes at each facility. Forecasts were made for horizons ranging from 1 to 30 days in advance. The forecast accuracy achieved by the various forecasting methods was compared to the forecast accuracy achieved when using a benchmark forecasting method already available in the emergency medicine literature. All time series methods considered in this analysis provided improved in-sample model goodness of fit. However, post-sample analysis revealed that time series regression models that augment linear regression models by accounting for serial autocorrelation offered only small improvements in terms of post-sample forecast accuracy, relative to multiple linear regression models, while seasonal autoregressive integrated moving average, exponential smoothing, and artificial neural network forecasting models did not provide consistently accurate forecasts of daily ED volumes. This study confirms the widely held belief that daily demand for ED services is characterized by seasonal and weekly patterns. The authors compared several time series forecasting methods to a benchmark multiple linear regression model. The results suggest that the existing methodology proposed in the literature, multiple linear regression based on calendar variables, is a reasonable approach to forecasting daily patient volumes in the ED. However, the authors conclude that regression-based models that incorporate calendar variables, account for site-specific special-day effects, and allow for residual autocorrelation provide a more appropriate, informative, and consistently accurate approach to forecasting daily ED patient volumes.
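
    A multiple linear regression on calendar variables, the benchmark approach named in this abstract, might look like the following minimal sketch. It uses only day-of-week indicator variables and simulated arrival counts; the special-day effects and residual autocorrelation terms that the authors recommend are omitted, and the function names and numbers are hypothetical.

```python
import numpy as np

def calendar_design(day_of_week):
    """Intercept plus six indicator columns; day 0 is the reference category."""
    dow = np.asarray(day_of_week)
    indicators = [(dow == d).astype(float) for d in range(1, 7)]
    return np.column_stack([np.ones(len(dow))] + indicators)

def fit_calendar_model(day_of_week, volumes):
    """Ordinary least squares fit of daily volumes on day-of-week indicators."""
    coef, *_ = np.linalg.lstsq(calendar_design(day_of_week),
                               np.asarray(volumes, dtype=float), rcond=None)
    return coef

def predict_calendar_model(coef, day_of_week):
    return calendar_design(day_of_week) @ coef

# Simulated (hypothetical) daily arrivals over ten weeks.
rng = np.random.default_rng(3)
days = np.arange(70) % 7
weekday_mean = np.array([120.0, 100.0, 98.0, 97.0, 99.0, 105.0, 115.0])
volumes = weekday_mean[days] + rng.normal(0.0, 3.0, size=70)

coef = fit_calendar_model(days, volumes)
forecast = predict_calendar_model(coef, np.arange(7))  # one forecast per weekday
```

    With indicators alone, each forecast is simply the historical mean for that weekday, which is why added terms for holidays and autocorrelated residuals can improve on it.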

  5. Importance of Preserving Cross-correlation in developing Statistically Downscaled Climate Forcings and in estimating Land-surface Fluxes and States

    NASA Astrophysics Data System (ADS)

    Das Bhowmik, R.; Arumugam, S.

    2015-12-01

    Multivariate downscaling techniques have exhibited superiority over univariate regression schemes in terms of preserving cross-correlations between multiple variables (precipitation and temperature) from GCMs. This study focuses on two aspects: (a) developing an analytical solution for estimating biases in cross-correlations from univariate downscaling approaches and (b) quantifying the uncertainty in land-surface states and fluxes due to biases in cross-correlations in downscaled climate forcings. Both aspects are evaluated using climate forcings available from both historical climate simulations and CMIP5 hindcasts over the entire US. The analytical solution relates the univariate regression parameters, the coefficient of determination of the regression, and the covariance ratio between GCM and downscaled values. The analytical solutions are compared with the downscaled univariate forcings by choosing the desired p-value (Type-1 error) in preserving the observed cross-correlation. For quantifying the impacts of biases in cross-correlation on estimating streamflow and groundwater, we corrupt the downscaled climate forcings with different cross-correlation structures.

  6. [The Influence of Subjective Health Status, Post-Traumatic Growth, and Social Support on Successful Aging in Middle-Aged Women].

    PubMed

    Lee, Seung Hee; Jang, Hyung Suk; Yang, Young Hee

    2016-10-01

    This study was done to investigate factors influencing successful aging in middle-aged women. A convenience sample of 103 middle-aged women was selected from the community. Data were collected using a structured questionnaire and analyzed using descriptive statistics, two-sample t-test, one-way ANOVA, Kruskal Wallis test, Pearson correlations, Spearman correlations and multiple regression analysis with the SPSS/WIN 22.0 program. Results of regression analysis showed that significant factors influencing successful aging were post-traumatic growth and social support. This regression model explained 48% of the variance in successful aging. Findings show that the concept 'post-traumatic growth' is an important factor influencing successful aging in middle-aged women. In addition, social support from friends/co-workers had greater influence on successful aging than social support from family. Thus, we need to consider the positive impact of post-traumatic growth and increase the chances of social participation in a successful aging program for middle-aged women.

  7. Linkage mapping of beta 2 EEG waves via non-parametric regression.

    PubMed

    Ghosh, Saurabh; Begleiter, Henri; Porjesz, Bernice; Chorlian, David B; Edenberg, Howard J; Foroud, Tatiana; Goate, Alison; Reich, Theodore

    2003-04-01

    Parametric linkage methods for analyzing quantitative trait loci are sensitive to violations in trait distributional assumptions. Non-parametric methods are relatively more robust. In this article, we modify the non-parametric regression procedure proposed by Ghosh and Majumder [2000: Am J Hum Genet 66:1046-1061] to map Beta 2 EEG waves using genome-wide data generated in the COGA project. Significant linkage findings are obtained on chromosomes 1, 4, 5, and 15 with findings at multiple regions on chromosomes 4 and 15. We analyze the data both with and without incorporating alcoholism as a covariate. We also test for epistatic interactions between regions of the genome exhibiting significant linkage with the EEG phenotypes and find evidence of epistatic interactions between a region each on chromosome 1 and chromosome 4 with one region on chromosome 15. While regressing out the effect of alcoholism does not affect the linkage findings, the epistatic interactions become statistically insignificant. Copyright 2003 Wiley-Liss, Inc.

  8. The Detection and Interpretation of Interaction Effects between Continuous Variables in Multiple Regression.

    ERIC Educational Resources Information Center

    Jaccard, James; And Others

    1990-01-01

    Issues in the detection and interpretation of interaction effects between quantitative variables in multiple regression analysis are discussed. Recent discussions associated with problems of multicollinearity are reviewed in the context of the conditional nature of multiple regression with product terms. (TJH)
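
    The "conditional nature" of a regression with a product term can be made concrete with a short sketch. It fits the same interaction model on raw and on mean-centered predictors, using simulated data and hypothetical coefficient values. Centering is shown only as one commonly discussed device; the abstract does not state which remedies the authors favor.

```python
import numpy as np

def fit_with_product(a, b, y):
    """OLS fit of y on an intercept, a, b and the product term a*b."""
    design = np.column_stack([np.ones(len(a)), a, b, a * b])
    coef, *_ = np.linalg.lstsq(design, y, rcond=None)
    return coef  # [intercept, slope of a, slope of b, interaction]

# Simulated (hypothetical) data with a true interaction of 0.2.
rng = np.random.default_rng(1)
n = 500
x1 = rng.normal(5.0, 1.0, n)
x2 = rng.normal(10.0, 2.0, n)
y = 1.0 + 0.5 * x1 + 0.3 * x2 + 0.2 * x1 * x2 + rng.normal(0.0, 0.5, n)

raw = fit_with_product(x1, x2, y)
x1c, x2c = x1 - x1.mean(), x2 - x2.mean()
centered = fit_with_product(x1c, x2c, y)

# Correlation between a predictor and the product term, before and after centering.
corr_raw = np.corrcoef(x1, x1 * x2)[0, 1]
corr_centered = np.corrcoef(x1c, x1c * x2c)[0, 1]

# The slope of x1 is conditional on x2: at x2 = v it equals raw[1] + raw[3] * v.
slope_at_mean_x2 = raw[1] + raw[3] * x2.mean()
```

    The interaction coefficient is the same in both fits, while the lower-order coefficient of the centered fit equals the raw simple slope evaluated at the mean of the other predictor.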

  9. Modeling of tropospheric NO2 column over different climatic zones and land use/land cover types in South Asia

    NASA Astrophysics Data System (ADS)

    ul-Haq, Zia; Rana, Asim Daud; Tariq, Salman; Mahmood, Khalid; Ali, Muhammad; Bashir, Iqra

    2018-03-01

    We have applied regression analyses for the modeling of tropospheric NO2 (tropo-NO2) as a function of anthropogenic nitrogen oxides (NOx) emissions, aerosol optical depth (AOD), and some important meteorological parameters such as temperature (Temp), precipitation (Preci), relative humidity (RH), wind speed (WS), cloud fraction (CLF) and outgoing long-wave radiation (OLR) over different climatic zones and land use/land cover types in South Asia during October 2004-December 2015. Simple linear regression shows that, over South Asia, tropo-NO2 variability is significantly linked to AOD, WS, NOx, Preci and CLF. Also, zone-5, consisting of tropical monsoon areas of eastern India and Myanmar, is the only study zone over which all the selected parameters show their influence on tropo-NO2 at statistical significance levels. In stepwise multiple linear modeling, the tropo-NO2 column over the landmass of South Asia is significantly predicted by the combination of RH (standardized regression coefficient, β = - 49), AOD (β = 0.42) and NOx (β = 0.25). The leading predictors of tropo-NO2 columns over zones 1-5 are OLR, AOD, Temp, OLR, and RH respectively. Overall, as revealed by the higher correlation coefficients (r), the multiple regressions provide reasonable models for tropo-NO2 over South Asia (r = 0.82), zone-4 (r = 0.90) and zone-5 (r = 0.93). The lowest r (of 0.66) has been found for the hot semi-arid region in the northwestern Indus-Ganges Basin (zone-2). The highest value of β for urban area AOD (of 0.42) is observed for megacity Lahore, located in warm semi-arid zone-2 with large scale crop-residue burning, indicating strong influence of aerosols on the modeled tropo-NO2 column. A statistically significant correlation (r = 0.22) at the 0.05 level is found between tropo-NO2 and AOD over Lahore. Also, NOx emissions appear as the highest contributor (β = 0.59) for the modeled tropo-NO2 column over megacity Dhaka.
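
    Standardized regression coefficients (β) such as those reported in this abstract are conventionally obtained by z-scoring the response and every predictor before fitting. The sketch below shows that convention on simulated data; the variable names and values are hypothetical and unrelated to the study's measurements, and whether the authors computed β in exactly this way is an assumption.

```python
import numpy as np

def standardized_betas(X, y):
    """Regress z-scored y on z-scored predictors; no intercept is needed after centering."""
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    Xz = (X - X.mean(axis=0)) / X.std(axis=0, ddof=1)
    yz = (y - y.mean()) / y.std(ddof=1)
    beta, *_ = np.linalg.lstsq(Xz, yz, rcond=None)
    return beta

# Simulated (hypothetical) predictors on very different scales.
rng = np.random.default_rng(5)
humidity = rng.normal(60.0, 10.0, 300)
aerosol = rng.normal(0.5, 0.1, 300)
response = 2.0 - 0.03 * humidity + 4.0 * aerosol + rng.normal(0.0, 0.3, 300)

betas = standardized_betas(np.column_stack([humidity, aerosol]), response)
single = standardized_betas(humidity[:, None], response)  # one-predictor case
```

    With a single predictor the standardized coefficient equals the Pearson correlation, which is one reason β values are usually read on a scale comparable to r.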

  10. Bone mineral density and correlation factor analysis in normal Taiwanese children.

    PubMed

    Shu, San-Ging

    2007-01-01

    Our aim was to establish reference data and linear regression equations for lumbar bone mineral density (BMD) in normal Taiwanese children. Several influencing factors of lumbar BMD were investigated. Two hundred fifty-seven healthy children recruited from schools (136 boys and 121 girls, aged 4-18 years) were enrolled on a voluntary basis with written consent. Their height, weight, blood pressure, puberty stage, bone age and lumbar BMD (L2-4) by dual energy x-ray absorptiometry (DEXA) were measured. Data were analyzed using Pearson correlation and stepwise regression tests. All measurements increased with age. Prior to age 8, there was no gender difference. Parameters such as height, weight, and bone age (BA) in girls surpassed boys between ages 8-13 without statistical significance (p ≥ 0.05). This was reversed subsequently after age 14 in height (p<0.05). BMD difference had the same trend but was not statistically significant either. The influencing power of puberty stage and bone age over BMD was almost equal to or higher than that of height and weight. All the other factors correlated with BMD to variable powers. Multiple linear regression equations for boys and girls were formulated. BMD reference data is provided and can be used to monitor childhood pathological conditions. However, BMD in those with abnormal bone age or pubertal development could need modifications to ensure accuracy.

  11. Missing Data and Multiple Imputation: An Unbiased Approach

    NASA Technical Reports Server (NTRS)

    Foy, M.; VanBaalen, M.; Wear, M.; Mendez, C.; Mason, S.; Meyers, V.; Alexander, D.; Law, J.

    2014-01-01

    The default method of dealing with missing data in statistical analyses is to only use the complete observations (complete case analysis), which can lead to unexpected bias when data do not meet the assumption of missing completely at random (MCAR). For the assumption of MCAR to be met, missingness cannot be related to either the observed or unobserved variables. A less stringent assumption, missing at random (MAR), requires that missingness not be associated with the value of the missing variable itself, but can be associated with the other observed variables. When data are truly MAR as opposed to MCAR, the default complete case analysis method can lead to biased results. There are statistical options available to adjust for data that are MAR, including multiple imputation (MI) which is consistent and efficient at estimating effects. Multiple imputation uses informing variables to determine statistical distributions for each piece of missing data. Then multiple datasets are created by randomly drawing on the distributions for each piece of missing data. Since MI is efficient, only a limited number, usually less than 20, of imputed datasets are required to get stable estimates. Each imputed dataset is analyzed using standard statistical techniques, and then results are combined to get overall estimates of effect. A simulation study will be demonstrated to show the results of using the default complete case analysis, and MI in a linear regression of MCAR and MAR simulated data. Further, MI was successfully applied to the association study of CO2 levels and headaches when initial analysis showed there may be an underlying association between missing CO2 levels and reported headaches. Through MI, we were able to show that there is a strong association between average CO2 levels and the risk of headaches. Each unit increase in CO2 (mmHg) resulted in a doubling in the odds of reported headaches.
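
    The final step described in this abstract, combining the results from each imputed dataset into overall estimates, is often done with Rubin's rules. The abstract does not say which pooling rule was used, so the sketch below is an assumption, and the per-dataset estimates and variances are hypothetical numbers chosen for illustration.

```python
import numpy as np

def pool_rubin(estimates, variances):
    """Pool one coefficient across m imputed datasets using Rubin's rules."""
    q = np.asarray(estimates, dtype=float)
    u = np.asarray(variances, dtype=float)
    m = len(q)
    pooled_estimate = q.mean()
    within = u.mean()            # average sampling variance within each dataset
    between = q.var(ddof=1)      # variance of the estimates across datasets
    total_variance = within + (1.0 + 1.0 / m) * between
    return pooled_estimate, total_variance

# Hypothetical log-odds estimates and squared standard errors from five imputations.
estimates = [0.70, 0.66, 0.74, 0.68, 0.72]
variances = [0.04, 0.04, 0.04, 0.04, 0.04]
pooled, total_var = pool_rubin(estimates, variances)
```

    The between-imputation term is what separates this from simple averaging: it inflates the standard error to reflect uncertainty about the missing values.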

  12. Gender interactions and success.

    PubMed

    Wiggins, Carla; Peterson, Teri

    2004-01-01

    Does gender by itself, or does gender's interaction with career variables, better explain the difference between women and men's careers in healthcare management? US healthcare managers were surveyed regarding career and personal experiences. Gender was statistically interacted with explanatory variables. Multiple regression with backwards selection systematically removed non-significant variables. All gender interaction variables were non-significant. Much of the literature proposes that work and career factors impact working women differently than working men. We find that while gender alone is a significant predictor of income, it does not significantly interact with other career variables.

  13. An economic approach to abortion demand.

    PubMed

    Rothstein, D S

    1992-01-01

    "This paper uses econometric multiple regression techniques in order to analyze the socioeconomic factors affecting the demand for abortion for the year 1985. A cross-section of the 50 [U.S.] states and Washington D.C. is examined and a household choice theoretical framework is utilized. The results suggest that average price of abortion, disposable personal per capita income, percentage of single women, whether abortions are state funded, unemployment rate, divorce rate, and if the state is located in the far West, are statistically significant factors in the determination of the demand for abortion." excerpt

  14. Multiplication factor versus regression analysis in stature estimation from hand and foot dimensions.

    PubMed

    Krishan, Kewal; Kanchan, Tanuj; Sharma, Abhilasha

    2012-05-01

    Estimation of stature is an important parameter in identification of human remains in forensic examinations. The present study aims to compare the reliability and accuracy of stature estimation and to demonstrate the variability between estimated stature and actual stature using multiplication factor and regression analysis methods. The study is based on a sample of 246 subjects (123 males and 123 females) from North India aged between 17 and 20 years. Four anthropometric measurements (hand length, hand breadth, foot length and foot breadth), taken on the left side in each subject, were included in the study. Stature was measured using standard anthropometric techniques. Multiplication factors were calculated and linear regression models were derived for estimation of stature from hand and foot dimensions. Derived multiplication factors and regression formulae were applied to the hand and foot measurements in the study sample. The estimated stature from the multiplication factors and regression analysis was compared with the actual stature to find the error in estimated stature. The results indicate that the range of error in estimation of stature from the regression analysis method is less than that of the multiplication factor method, thus confirming that the regression analysis method is better than multiplication factor analysis in stature estimation. Copyright © 2012 Elsevier Ltd and Faculty of Forensic and Legal Medicine. All rights reserved.
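
    The two estimation methods compared in this abstract can be contrasted in a small sketch. It assumes that the multiplication factor is the mean ratio of stature to the measured dimension, and it uses simulated foot lengths and statures rather than the study's data; all numeric values are hypothetical.

```python
import numpy as np

# Simulated (hypothetical) foot lengths and statures in centimetres.
rng = np.random.default_rng(42)
foot = rng.normal(25.0, 1.2, size=100)
stature = 80.0 + 3.5 * foot + rng.normal(0.0, 3.0, size=100)

# Multiplication factor method: a single ratio, i.e. a line forced through the origin.
mult_factor = np.mean(stature / foot)
stature_mf = mult_factor * foot

# Regression method: a least-squares line with a free intercept.
slope, intercept = np.polyfit(foot, stature, 1)
stature_reg = intercept + slope * foot

def rmse(estimated, actual):
    """Root mean squared error of the estimates."""
    return float(np.sqrt(np.mean((estimated - actual) ** 2)))

rmse_mf = rmse(stature_mf, stature)
rmse_reg = rmse(stature_reg, stature)
```

    On the data used for fitting, the least-squares line cannot have a larger squared error than any ratio-based line, which is consistent with the direction of the study's finding; a fair comparison on new subjects would need held-out data.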

  15. Hypothesis testing in functional linear regression models with Neyman's truncation and wavelet thresholding for longitudinal data.

    PubMed

    Yang, Xiaowei; Nie, Kun

    2008-03-15

    Longitudinal data sets in biomedical research often consist of large numbers of repeated measures. In many cases, the trajectories do not look globally linear or polynomial, making it difficult to summarize the data or test hypotheses using standard longitudinal data analysis based on various linear models. An alternative approach is to apply the approaches of functional data analysis, which directly target the continuous nonlinear curves underlying discretely sampled repeated measures. For the purposes of data exploration, many functional data analysis strategies have been developed based on various schemes of smoothing, but fewer options are available for making causal inferences regarding predictor-outcome relationships, a common task seen in hypothesis-driven medical studies. To compare groups of curves, two testing strategies with good power have been proposed for high-dimensional analysis of variance: the Fourier-based adaptive Neyman test and the wavelet-based thresholding test. Using a smoking cessation clinical trial data set, this paper demonstrates how to extend the strategies for hypothesis testing into the framework of functional linear regression models (FLRMs) with continuous functional responses and categorical or continuous scalar predictors. The analysis procedure consists of three steps: first, apply the Fourier or wavelet transform to the original repeated measures; then fit a multivariate linear model in the transformed domain; and finally, test the regression coefficients using either adaptive Neyman or thresholding statistics. Since a FLRM can be viewed as a natural extension of the traditional multiple linear regression model, the development of this model and computational tools should enhance the capacity of medical statistics for longitudinal data.

  16. Post-processing method for wind speed ensemble forecast using wind speed and direction

    NASA Astrophysics Data System (ADS)

    Sofie Eide, Siri; Bjørnar Bremnes, John; Steinsland, Ingelin

    2017-04-01

    Statistical methods are widely applied to enhance the quality of both deterministic and ensemble NWP forecasts. In many situations, like wind speed forecasting, most of the predictive information is contained in one variable in the NWP models. However, in statistical calibration of deterministic forecasts it is often seen that including more variables can further improve forecast skill. For ensembles this is rarely taken advantage of, mainly because it is generally not straightforward to include multiple variables. In this study, it is demonstrated how multiple variables can be included in Bayesian model averaging (BMA) by using a flexible regression method for estimating the conditional means. The method is applied to wind speed forecasting at 204 Norwegian stations based on wind speed and direction forecasts from the ECMWF ensemble system. At about 85 % of the sites the ensemble forecasts were improved in terms of CRPS by adding wind direction as a predictor compared to only using wind speed. On average the improvements were about 5 %, but mainly for moderate to strong wind situations. For weak wind speeds, adding wind direction had a more or less neutral impact.
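
    The abstract evaluates forecasts with the continuous ranked probability score (CRPS). As a minimal sketch, the CRPS of a raw ensemble can be computed from its members with the standard empirical formula below. The function names and the wind-speed numbers are hypothetical, and the code does not reproduce the BMA post-processing itself.

```python
def crps_ensemble(members, obs):
    """Empirical CRPS of an ensemble forecast for a single observation.

    CRPS = mean|x_i - y| - 0.5 * mean|x_i - x_j|  (lower is better).
    """
    m = len(members)
    spread_to_obs = sum(abs(x - obs) for x in members) / m
    internal_spread = sum(abs(xi - xj) for xi in members for xj in members) / (2 * m * m)
    return spread_to_obs - internal_spread


def mean_crps(forecasts, observations):
    """Average CRPS over a set of forecast cases."""
    scores = [crps_ensemble(f, y) for f, y in zip(forecasts, observations)]
    return sum(scores) / len(scores)


def relative_improvement(crps_reference, crps_new):
    """Percentage reduction in CRPS relative to a reference forecast."""
    return 100.0 * (crps_reference - crps_new) / crps_reference


# Hypothetical wind speed ensembles (m/s) and observations.
speed_only = [[4.0, 5.0, 6.5], [9.0, 10.0, 12.0]]
speed_and_direction = [[4.5, 5.0, 6.0], [10.0, 10.5, 11.5]]
observed = [5.2, 11.0]

reference = mean_crps(speed_only, observed)
candidate = mean_crps(speed_and_direction, observed)
print("CRPS improvement (%):", relative_improvement(reference, candidate))
```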

  17. Correlates of motivation to change in pathological gamblers completing cognitive-behavioral group therapy.

    PubMed

    Gómez-Peña, Mónica; Penelo, Eva; Granero, Roser; Fernández-Aranda, Fernando; Alvarez-Moya, Eva; Santamaría, Juan José; Moragas, Laura; Neus Aymamí, Maria; Gunnard, Katarina; Menchón, José M; Jimenez-Murcia, Susana

    2012-07-01

    The present study analyzes the association between the motivation to change and the cognitive-behavioral group intervention, in terms of dropouts and relapses, in a sample of male pathological gamblers. The specific objectives were as follows: (a) to estimate the predictive value of baseline University of Rhode Island Change Assessment scale (URICA) scores (i.e., at the start of the study) as regards the risk of relapse and dropout during treatment and (b) to assess the incremental predictive ability of URICA scores, as regards the mean change produced in the clinical status of patients between the start and finish of treatment. The relationship between the URICA and the response to treatment was analyzed by means of a pre-post design applied to a sample of 191 patients who were consecutively receiving cognitive-behavioral group therapy. The statistical analysis included logistic regression models and hierarchical multiple linear regression models. The discriminative ability of the models including the four URICA scores regarding the likelihood of relapse and dropout was acceptable (area under the receiver operating characteristic curve: .73 and .71, respectively). No significant predictive ability was found as regards the differences between baseline and posttreatment scores (changes in R² below 5% in the multiple regression models). The availability of useful measures of motivation to change would enable treatment outcomes to be optimized through the application of specific therapeutic interventions. © 2012 Wiley Periodicals, Inc.

  18. Occupational exposure to potentially infectious biological material in a dental teaching environment.

    PubMed

    Machado-Carvalhais, Helenaura P; Ramos-Jorge, Maria L; Auad, Sheyla M; Martins, Laura H P M; Paiva, Saul M; Pordeus, Isabela A

    2008-10-01

    The aims of this cross-sectional study were to determine the prevalence of occupational accidents with exposure to biological material among undergraduate students of dentistry and to estimate potential risk factors associated with exposure to blood. Data were collected through a self-administered questionnaire (86.4 percent return rate), which was completed by a sample of 286 undergraduate dental students (mean age 22.4 +/-2.4 years). The students were enrolled in the clinical component of the curriculum, which corresponds to the final six semesters of study. Descriptive, bivariate, simple logistic regression and multiple logistic regression (Forward Stepwise Procedure) analyses were performed. The level of statistical significance was set at 5 percent. Percutaneous and mucous exposures to potentially infectious biological material were reported by 102 individuals (35.6 percent); 26.8 percent reported the occurrence of multiple episodes of exposure. The logistic regression analyses revealed that the incomplete use of individual protection equipment (OR=3.7; 95 percent CI 1.5-9.3), disciplines where surgical procedures are carried out (OR=16.3; 95 percent CI 7.1-37.2), and handling sharp instruments (OR=4.4; 95 percent CI 2.1-9.1), more specifically, hollow-bore needles (OR=6.8; 95 percent CI 2.1-19.0), were independently associated with exposure to blood. Policies of reviewing the procedures during clinical practice are recommended in order to reduce occupational exposure.
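
    The odds ratios and 95 percent confidence intervals quoted above come from multiple logistic regression. As a simplified sketch of how such figures are read, the unadjusted odds ratio for a single 2x2 table and its Wald (Woolf) interval can be computed as below. The counts are hypothetical, and an unadjusted estimate like this will generally differ from the adjusted values reported in the abstract.

```python
import math


def odds_ratio_ci(a, b, c, d, z=1.96):
    """Unadjusted odds ratio with a Wald (Woolf) confidence interval.

    a: exposed with outcome      b: exposed without outcome
    c: unexposed with outcome    d: unexposed without outcome
    z: normal quantile (1.96 gives an approximate 95% interval)
    """
    odds_ratio = (a * d) / (b * c)
    se_log_or = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)
    lower = math.exp(math.log(odds_ratio) - z * se_log_or)
    upper = math.exp(math.log(odds_ratio) + z * se_log_or)
    return odds_ratio, lower, upper


# Hypothetical counts, e.g. incomplete vs complete use of protective equipment.
estimate, low, high = odds_ratio_ci(20, 10, 10, 20)
print(f"OR = {estimate:.2f}, 95% CI {low:.2f}-{high:.2f}")
```

    An interval that excludes 1 corresponds to an association that is significant at the 5 percent level, which is how the intervals in the abstract are interpreted.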

  19. Investigation of marital satisfaction and its relationship with job stress and general health of nurses in Qazvin, Iran

    PubMed Central

    Azimian, Jalil; Piran, Pegah; Jahanihashemi, Hassan; Dehghankar, Leila

    2017-01-01

    Background: Pressures in nursing can affect family life and marital problems, disrupt common social problems, increase work-family conflicts and endanger people’s general health. Aim: To determine marital satisfaction and its relationship with job stress and general health of nurses. Methods: This descriptive and cross-sectional study was done in 2015 in medical educational centers of Qazvin by using an ENRICH marital satisfaction scale and General Health and Job Stress questionnaires completed by 123 nurses. Analysis was done by SPSS version 19 using descriptive and analytical statistics (Pearson correlation, t-test, ANOVA, Chi-square, regression line, multiple regression analysis). Results: The findings showed that 64.4% of nurses had marital satisfaction. There was a significant relationship between age (p=0.03), job experience (p=0.01), age of spouse (p=0.01) and marital satisfaction. The results showed that there was a significant relationship between marital satisfaction and general health (p<0.0001). Multiple regression analysis showed that there was a significant relationship between depression (p=0.012) and anxiety (p=0.001) with marital satisfaction. Conclusions: Given the high levels of job stress, impaired general health and low marital satisfaction among nurses, running health promotion programs and paying attention to their dimensions can help improve the work and family health of nurses. PMID:28607660

  20. Comparing the index-flood and multiple-regression methods using L-moments

    NASA Astrophysics Data System (ADS)

    Malekinezhad, H.; Nachtnebel, H. P.; Klik, A.

    In arid and semi-arid regions, the length of records is usually too short to ensure reliable quantile estimates. Comparing index-flood and multiple-regression analyses based on L-moments was the main objective of this study. Factor analysis was applied to determine the main variables influencing flood magnitude. Ward’s cluster and L-moments approaches were applied to several sites in the Namak-Lake basin in central Iran to delineate homogeneous regions based on site characteristics. A homogeneity test was done using L-moments-based measures. Several distributions were fitted to the regional flood data, and the index-flood and multiple-regression methods, as two regional flood frequency methods, were compared. The results of factor analysis showed that length of main waterway, compactness coefficient, mean annual precipitation, and mean annual temperature were the main variables affecting flood magnitude. The study area was divided into three regions based on Ward’s clustering method. The homogeneity test based on L-moments showed that all three regions were acceptably homogeneous. Five distributions were fitted to the annual peak flood data of the three homogeneous regions. Using the L-moment ratios and the Z-statistic criteria, the GEV distribution was identified as the most robust distribution among the five candidate distributions for all the proposed sub-regions of the study area, and in general, it was concluded that the generalised extreme value distribution was the best-fit distribution for all three regions. The relative root mean square error (RRMSE) measure was applied for evaluating the performance of the index-flood and multiple-regression methods in comparison with the curve fitting (plotting position) method. In general, the index-flood method gives more reliable estimations for various flood magnitudes of different recurrence intervals. Therefore, this method should be adopted as the regional flood frequency method for the study area and the Namak-Lake basin in central Iran. To estimate floods of various return periods for gauged catchments in the study area, the mean annual peak flood of the catchments may be multiplied by corresponding values of the growth factors, computed using the GEV distribution.
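
    The final step of the abstract, multiplying the mean annual peak flood by a GEV growth factor, can be sketched as follows. The sketch assumes the Hosking parameterization of the GEV quantile function (location `xi`, scale `alpha`, shape `k`) and one common definition of RRMSE. All parameter values are hypothetical rather than taken from the study.

```python
import math


def gev_quantile(F, xi, alpha, k):
    """GEV quantile at non-exceedance probability F (Hosking's convention)."""
    y = -math.log(F)
    if abs(k) < 1e-12:  # the Gumbel limit as k -> 0
        return xi - alpha * math.log(y)
    return xi + alpha * (1.0 - y ** k) / k


def index_flood_estimate(mean_annual_flood, return_period, xi, alpha, k):
    """Flood quantile = index flood (mean annual peak) x regional growth factor."""
    growth_factor = gev_quantile(1.0 - 1.0 / return_period, xi, alpha, k)
    return mean_annual_flood * growth_factor


def rrmse(estimates, references):
    """Relative root mean square error of estimates against reference values."""
    n = len(estimates)
    return math.sqrt(sum(((e - r) / r) ** 2 for e, r in zip(estimates, references)) / n)


# Hypothetical regional growth-curve parameters and index flood (m^3/s).
XI, ALPHA, K = 0.80, 0.35, -0.10
for T in (10, 50, 100):
    print(T, round(index_flood_estimate(120.0, T, XI, ALPHA, K), 1))
```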

  1. Application of Linear Mixed-Effects Models in Human Neuroscience Research: A Comparison with Pearson Correlation in Two Auditory Electrophysiology Studies.

    PubMed

    Koerner, Tess K; Zhang, Yang

    2017-02-27

    Neurophysiological studies are often designed to examine relationships between measures from different testing conditions, time points, or analysis techniques within the same group of participants. Appropriate statistical techniques that can take into account repeated measures and multivariate predictor variables are integral and essential to successful data analysis and interpretation. This work implements and compares conventional Pearson correlations and linear mixed-effects (LME) regression models using data from two recently published auditory electrophysiology studies. For the specific research questions in both studies, the Pearson correlation test is inappropriate for determining strengths between the behavioral responses for speech-in-noise recognition and the multiple neurophysiological measures as the neural responses across listening conditions were simply treated as independent measures. In contrast, the LME models allow a systematic approach to incorporate both fixed-effect and random-effect terms to deal with the categorical grouping factor of listening conditions, between-subject baseline differences in the multiple measures, and the correlational structure among the predictor variables. Together, the comparative data demonstrate the advantages as well as the necessity to apply mixed-effects models to properly account for the built-in relationships among the multiple predictor variables, which has important implications for proper statistical modeling and interpretation of human behavior in terms of neural correlates and biomarkers.

  2. Quantile regression models of animal habitat relationships

    USGS Publications Warehouse

    Cade, Brian S.

    2003-01-01

    Typically, all factors that limit an organism are not measured and included in statistical models used to investigate relationships with their environment. If important unmeasured variables interact multiplicatively with the measured variables, the statistical models often will have heterogeneous response distributions with unequal variances. Quantile regression is an approach for estimating the conditional quantiles of a response variable distribution in the linear model, providing a more complete view of possible causal relationships between variables in ecological processes. Chapter 1 introduces quantile regression and discusses the ordering characteristics, interval nature, sampling variation, weighting, and interpretation of estimates for homogeneous and heterogeneous regression models. Chapter 2 evaluates performance of quantile rankscore tests used for hypothesis testing and constructing confidence intervals for linear quantile regression estimates (0 ≤ τ ≤ 1). A permutation F test maintained better Type I errors than the Chi-square T test for models with smaller n, greater number of parameters p, and more extreme quantiles τ. Both versions of the test required weighting to maintain correct Type I errors when there was heterogeneity under the alternative model. An example application related trout densities to stream channel width:depth. Chapter 3 evaluates a drop in dispersion, F-ratio like permutation test for hypothesis testing and constructing confidence intervals for linear quantile regression estimates (0 ≤ τ ≤ 1). Chapter 4 simulates from a large (N = 10,000) finite population representing grid areas on a landscape to demonstrate various forms of hidden bias that might occur when the effect of a measured habitat variable on some animal was confounded with the effect of another unmeasured variable (spatially and not spatially structured). 
    Depending on whether interactions of the measured habitat and unmeasured variable were negative (interference interactions) or positive (facilitation interactions), either upper (τ > 0.5) or lower (τ < 0.5) quantile regression parameters were less biased than mean rate parameters. Sampling (n = 20 - 300) simulations demonstrated that confidence intervals constructed by inverting rankscore tests provided valid coverage of these biased parameters. Quantile regression was used to estimate effects of physical habitat resources on a bivalve mussel (Macomona liliana) in a New Zealand harbor by modeling the spatial trend surface as a cubic polynomial of location coordinates.
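
    To make the idea of a conditional quantile concrete, the sketch below fits a straight line for a chosen τ by minimizing the check (pinball) loss. It relies on the linear-programming property that, for a two-parameter model, some optimal line passes through two data points, so a brute-force search over point pairs suffices for small samples. The wedge-shaped data are hypothetical and the function names are assumptions; production analyses would use a dedicated solver.

```python
def check_loss(residual, tau):
    """Check (pinball) loss: tau * r for r >= 0, (tau - 1) * r for r < 0."""
    return residual * (tau - (1.0 if residual < 0 else 0.0))


def total_loss(xs, ys, a, b, tau):
    """Sum of check losses for the line y = a + b * x."""
    return sum(check_loss(y - (a + b * x), tau) for x, y in zip(xs, ys))


def fit_quantile_line(xs, ys, tau):
    """Brute-force linear quantile regression over all lines through two points."""
    best = None
    n = len(xs)
    for i in range(n):
        for j in range(i + 1, n):
            if xs[i] == xs[j]:
                continue  # a vertical line is not a valid candidate
            b = (ys[j] - ys[i]) / (xs[j] - xs[i])
            a = ys[i] - b * xs[i]
            loss = total_loss(xs, ys, a, b, tau)
            if best is None or loss < best[0]:
                best = (loss, a, b)
    return best[1], best[2]


# Hypothetical wedge-shaped data: the spread of y grows with x, as happens
# when an unmeasured factor limits the response at some sites.
xs = [float(x) for x in range(1, 9) for _ in range(2)]
ys = [factor * x for x in range(1, 9) for factor in (0.5, 2.0)]

a_low, b_low = fit_quantile_line(xs, ys, tau=0.1)
a_high, b_high = fit_quantile_line(xs, ys, tau=0.9)
print("tau=0.1 slope:", b_low, " tau=0.9 slope:", b_high)
```

    With heterogeneous data like these, the upper and lower quantile slopes differ markedly, which a single mean regression slope would hide.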

  3. Prevalence and risk factors of non-carious cervical lesions related to occupational exposure to acid mists.

    PubMed

    Bomfim, Rafael Aiello; Crosato, Edgard; Mazzilli, Luiz Eugênio Nigro; Frias, Antonio Carlos

    2015-01-01

    This study evaluates the prevalence and risk factors of non-carious cervical lesions (NCCLs) in a Brazilian population of workers exposed and non-exposed to acid mists and chemical products. One hundred workers (46 exposed and 54 non-exposed) were evaluated in a Centro de Referência em Saúde do Trabalhador - CEREST (Worker's Health Reference Center). The workers responded to questionnaires regarding their personal information and about alcohol consumption and tobacco use. A clinical examination was conducted to evaluate the presence of NCCLs, according to WHO parameters. Statistical analyses were performed by unconditional logistic regression and multiple linear regression, with the critical level of p < 0.05. NCCLs were significantly associated with age groups (18-34, 35-44, 45-68 years). The unconditional logistic regression showed that the presence of NCCLs was better explained by age group (OR = 4.04; CI 95% 1.77-9.22) and occupational exposure to acid mists and chemical products (OR = 3.84; CI 95% 1.10-13.49), whereas the linear multiple regression revealed that NCCLs were better explained by years of smoking (p = 0.01) and age group (p = 0.04). The prevalence of NCCLs in the study population was particularly high (76.84%), and the risk factors for NCCLs were age, exposure to acid mists and smoking habit. Controlling risk factors through preventive and educative measures, allied to the use of personal protective equipment to prevent the occupational exposure to acid mists, may contribute to minimizing the prevalence of NCCLs.

  4. Testing in Microbiome-Profiling Studies with MiRKAT, the Microbiome Regression-Based Kernel Association Test

    PubMed Central

    Zhao, Ni; Chen, Jun; Carroll, Ian M.; Ringel-Kulka, Tamar; Epstein, Michael P.; Zhou, Hua; Zhou, Jin J.; Ringel, Yehuda; Li, Hongzhe; Wu, Michael C.

    2015-01-01

    High-throughput sequencing technology has enabled population-based studies of the role of the human microbiome in disease etiology and exposure response. Distance-based analysis is a popular strategy for evaluating the overall association between microbiome diversity and outcome, wherein the phylogenetic distance between individuals’ microbiome profiles is computed and tested for association via permutation. Despite their practical popularity, distance-based approaches suffer from important challenges, especially in selecting the best distance and extending the methods to alternative outcomes, such as survival outcomes. We propose the microbiome regression-based kernel association test (MiRKAT), which directly regresses the outcome on the microbiome profiles via the semi-parametric kernel machine regression framework. MiRKAT allows for easy covariate adjustment and extension to alternative outcomes while non-parametrically modeling the microbiome through a kernel that incorporates phylogenetic distance. It uses a variance-component score statistic to test for the association with analytical p value calculation. The model also allows simultaneous examination of multiple distances, alleviating the problem of choosing the best distance. Our simulations demonstrated that MiRKAT provides correctly controlled type I error and adequate power in detecting overall association. “Optimal” MiRKAT, which considers multiple candidate distances, is robust in that it suffers from little power loss in comparison to when the best distance is used and can achieve tremendous power gain in comparison to when a poor distance is chosen. Finally, we applied MiRKAT to real microbiome datasets to show that microbial communities are associated with smoking and with fecal protease levels after confounders are controlled for. PMID:25957468

  5. Apical root resorption in orthodontically treated adults.

    PubMed

    Baumrind, S; Korn, E L; Boyd, R L

    1996-09-01

    This study analyzed the relationship in orthodontically treated adults between upper central incisor displacement measured on lateral cephalograms and apical root resorption measured on anterior periapical x-ray films. A multiple linear regression examined incisor displacements in four directions (retraction, advancement, intrusion, and extrusion) as independent variables, attempting to account for observed differences in the dependent variable, resorption. Mean apical resorption was 1.36 mm (sd +/- 1.46, n = 73). Mean horizontal displacement of the apex was -0.83 mm (sd +/- 1.74, n = 67); mean vertical displacement was 0.19 mm (sd +/- 1.48, n = 67). The regression coefficients for the intercept and for retraction were highly significant; those for extrusion, intrusion, and advancement were not. At the 95% confidence level, an average of 0.99 mm (se = +/- 0.34) of resorption was implied in the absence of root displacement and an average of 0.49 mm (se = +/- 0.14) of resorption was implied per millimeter of retraction. R2 for all four directional displacement variables (DDVs) taken together was only 0.20, which implied that only a relatively small portion of the observed apical resorption could be accounted for by tooth displacement alone. In a secondary set of univariate analyses, the associations between apical resorption and each of 14 additional treatment-related variables were examined. Only Gender, Elapsed Time, and Total Apical Displacement displayed statistically significant associations with apical resorption. Additional multiple regressions were then performed in which the data for each of these three statistically significant variables were considered separately, with the data for the four directional displacement variables. The addition of information on Elapsed Time or Total Apical Displacement did not explain a significant additional portion of the variability in apical resorption. 
    On the other hand, the addition of information on Gender to the information on the four directional displacement variables yielded an R2 value of 0.35, which indicated that these variables taken together could account for approximately a third of the observed variability in apical resorption in this sample.

  6. Economic Expansion Is a Major Determinant of Physician Supply and Utilization

    PubMed Central

    Cooper, Richard A; Getzen, Thomas E; Laud, Prakash

    2003-01-01

    Objective: To assess the relationship between levels of economic development and the supply and utilization of physicians. Data Sources: Data were obtained from the American Medical Association, American Osteopathic Association, Organization for Economic Cooperation and Development (OECD), Bureau of Health Professions, Bureau of Labor Statistics, Bureau of Economic Analysis, Census Bureau, Health Care Financing Administration, and historical sources. Study Design: Economic development, expressed as real per capita gross domestic product (GDP) or personal income, was correlated with per capita health care labor and physician supply within countries and states over periods of time spanning 25–70 years and across countries, states, and metropolitan statistical areas (MSAs) at multiple points in time over periods of up to 30 years. Longitudinal data were analyzed in four complementary ways: (1) simple univariate regressions; (2) regressions in which temporal trends were partialled out; (3) time series comparing percentage differences across segments of time; and (4) a bivariate Granger causality test. Cross-sectional data were assessed at multiple time points by means of univariate regression analyses. Principal Findings: Under each analytic scenario, physician supply correlated with differences in GDP or personal income. Longitudinal correlations were associated with temporal lags of approximately 5 years for health employment and 10 years for changes in physician supply. The magnitude of changes in per capita physician supply in the United States was equivalent to differences of approximately 0.75 percent for each 1.0 percent difference in GDP. The greatest effects of economic expansion were on the medical specialties, whereas the surgical and hospital-based specialties were affected to a lesser degree, and levels of economic expansion had little influence on family/general practice. Conclusions: Economic expansion has a strong, lagged relationship with changes in physician supply. This suggests that economic projections could serve as a gauge for projecting the future utilization of physician services. PMID:12785567

  7. Wastewater-Based Epidemiology of Stimulant Drugs: Functional Data Analysis Compared to Traditional Statistical Methods.

    PubMed

    Salvatore, Stefania; Bramness, Jørgen Gustav; Reid, Malcolm J; Thomas, Kevin Victor; Harman, Christopher; Røislien, Jo

    2015-01-01

    Wastewater-based epidemiology (WBE) is a new methodology for estimating the drug load in a population. Simple summary statistics and specification tests have typically been used to analyze WBE data, comparing differences between weekday and weekend loads. Such standard statistical methods may, however, overlook important nuanced information in the data. In this study, we apply functional data analysis (FDA) to WBE data and compare the results to those obtained from more traditional summary measures. We analysed temporal WBE data from 42 European cities, using sewage samples collected daily for one week in March 2013. For each city, the main temporal features of two selected drugs were extracted using functional principal component (FPC) analysis, along with simpler measures such as the area under the curve (AUC). The individual cities' scores on each of the temporal FPCs were then used as outcome variables in multiple linear regression analysis with various city and country characteristics as predictors. The results were compared to those of functional analysis of variance (FANOVA). The first three FPCs explained more than 99% of the temporal variation. The first component (FPC1) represented the level of the drug load, while the second and third temporal components represented the level and the timing of a weekend peak. AUC was highly correlated with FPC1, but other temporal characteristics were not captured by the simple summary measures. FANOVA was less flexible than the FPCA-based regression, and even showed concordance results. Geographical location was the main predictor for the general level of the drug load. FDA of WBE data extracts more detailed information about drug load patterns during the week which are not identified by more traditional statistical methods. Results also suggest that regression based on FPC results is a valuable addition to FANOVA for estimating associations between temporal patterns and covariate information.

  8. [Comparison of application of Cochran-Armitage trend test and linear regression analysis for rate trend analysis in epidemiology study].

    PubMed

    Wang, D Z; Wang, C; Shen, C F; Zhang, Y; Zhang, H; Song, G D; Xue, X D; Xu, Z L; Zhang, S; Jiang, G H

    2017-05-10

    We described the time trend of the acute myocardial infarction (AMI) incidence rate in Tianjin from 1999 to 2013 with the Cochran-Armitage trend (CAT) test and linear regression analysis, and compared the results. Based on actual population, CAT test had much stronger statistical power than linear regression analysis for both overall incidence trend and age specific incidence trend (Cochran-Armitage trend P value
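
    A minimal sketch of the Cochran-Armitage trend statistic for rates follows, using the usual large-sample normal approximation. The yearly case counts and population sizes are hypothetical, and equally spaced scores (0, 1, 2, ...) are assumed unless others are supplied.

```python
import math


def cochran_armitage_z(cases, totals, scores=None):
    """Z statistic of the Cochran-Armitage test for a linear trend in proportions."""
    if scores is None:
        scores = list(range(len(cases)))  # equally spaced scores (assumption)
    n_total = sum(totals)
    p_bar = sum(cases) / n_total  # pooled proportion under the null hypothesis
    t_stat = sum(s * (r - n * p_bar) for s, r, n in zip(scores, cases, totals))
    sum_ns2 = sum(n * s * s for s, n in zip(scores, totals))
    sum_ns = sum(n * s for s, n in zip(scores, totals))
    variance = p_bar * (1.0 - p_bar) * (sum_ns2 - sum_ns ** 2 / n_total)
    return t_stat / math.sqrt(variance)


def two_sided_p(z):
    """Two-sided p value from the standard normal distribution."""
    return math.erfc(abs(z) / math.sqrt(2.0))


# Hypothetical cases and person counts for three consecutive periods.
z = cochran_armitage_z(cases=[10, 20, 30], totals=[100, 100, 100])
print("z =", round(z, 3), " p =", two_sided_p(z))
```

    Because the test uses the population denominators directly, it can detect a trend in a rate from large populations with far more power than a linear regression fitted to a handful of yearly rate values, which is consistent with the comparison reported above.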

  9. Lithium and neuroleptics in combination: is there enhancement of neurotoxicity leading to permanent sequelae?

    PubMed

    Goldman, S A

    1996-10-01

    Neurotoxicity in relation to concomitant administration of lithium and neuroleptic drugs, particularly haloperidol, has been an ongoing issue. This study examined whether use of lithium with neuroleptic drugs enhances neurotoxicity leading to permanent sequelae. The Spontaneous Reporting System database of the United States Food and Drug Administration and extant literature were reviewed for spectrum cases of lithium/neuroleptic neurotoxicity. Groups taking lithium alone (Li), lithium/haloperidol (LiHal) and lithium/nonhaloperidol neuroleptics (LiNeuro), each paired for recovery and sequelae, were established for 237 cases. Statistical analyses included pairwise comparisons of lithium levels using the Wilcoxon Rank Sum procedure and logistic regression to analyze the relationship between independent variables and development of sequelae. The Li and LiNeuro groups showed significant statistical differences in median lithium levels between recovery and sequelae pairs, whereas the LiHal pair did not differ significantly. Lithium level was associated with sequelae development overall and within the Li and LiNeuro groups; no such association was evident in the LiHal group. On multivariable logistic regression analysis, lithium level and taking lithium/haloperidol were significant factors in the development of sequelae, with multiple possibly confounding factors (e.g., age, sex) not statistically significant. Multivariable logistic regression analyses with neuroleptic dose as five discrete dose ranges or actual dose did not show an association between development of sequelae and dose. Database limitations notwithstanding, the lack of apparent impact of serum lithium level on the development of sequelae in patients treated with haloperidol contrasts notably with results in the Li and LiNeuro groups. These findings may suggest a possible effect of pharmacodynamic factors in lithium/neuroleptic combination therapy.

  10. Estimating labile particulate iron concentrations in coastal waters from remote sensing data

    NASA Astrophysics Data System (ADS)

    McGaraghan, Anna R.; Kudela, Raphael M.

    2012-02-01

    Owing to the difficulties inherent in measuring trace metals and the importance of iron as a limiting nutrient for biological systems, the ability to monitor particulate iron concentration remotely is desirable. This study examines the relationship between labile particulate iron, described here as weak acid leachable particulate iron or total dissolvable iron, and easily obtained bio-optical measurements. We develop a bio-optical proxy that can be used to estimate large-scale patterns of labile iron concentrations in surface waters, and we extend this by including other environmental variables in a multiple linear regression statistical model. By utilizing a ratio of optical backscatter and fluorescence obtained by satellite, we identify patterns in iron concentrations confirmed by traditional shipboard sampling. This basic relationship is improved with the addition of other environmental parameters in the statistical linear regression model. The optical proxy detects known temporal and spatial trends in average surface iron concentrations in Monterey Bay. The proxy is robust in that similar performance was obtained using two independent particulate iron data sets, but it exhibits weaker correlations than the full statistical model. This proxy will be a valuable tool for oceanographers seeking to monitor iron concentrations in coastal regions and allows for better understanding of the variability of labile particulate iron in surface waters to complement direct measurement of leachable particulate or total dissolvable iron.

  11. Daily Suspended Sediment Discharge Prediction Using Multiple Linear Regression and Artificial Neural Network

    NASA Astrophysics Data System (ADS)

    Uca; Toriman, Ekhwan; Jaafar, Othman; Maru, Rosmini; Arfan, Amal; Saleh Ahmar, Ansari

    2018-01-01

    Prediction of suspended sediment discharge in a catchment area is very important because it can be used to evaluate erosion hazard, to manage water resources, water quality and hydrology projects (dams, reservoirs, and irrigation), and to determine the extent of damage that has occurred in the catchment. Multiple linear regression analysis and artificial neural networks can be used to predict the amount of daily suspended sediment discharge. The regression analysis uses the least squares method, whereas the artificial neural networks use a Radial Basis Function (RBF) network and a feedforward multilayer perceptron with three learning algorithms, namely Levenberg-Marquardt (LM), Scaled Conjugate Gradient (SCG) and Broyden-Fletcher-Goldfarb-Shanno Quasi-Newton (BFGS). The hidden layer has three to sixteen neurons, while the output layer has only one neuron because there is only one output target. In terms of mean absolute error (MAE), root mean square error (RMSE), coefficient of determination (R2) and coefficient of efficiency (CE), multiple linear regression (MLRg) Model 2 (6 independent input variables) has the lowest MAE and RMSE (0.0000002 and 13.6039) and the highest R2 and CE (0.9971 and 0.9971). Compared with LM, SCG and RBF, the BFGS model with structure 3-7-1 is better and more accurate at predicting suspended sediment discharge in the Jenderam catchment. In the testing process its MAE and RMSE (13.5769 and 17.9011) are the smallest and its R2 and CE (0.9999 and 0.9998) the highest when compared with the other BFGS Quasi-Newton models (6-3-1, 9-10-1 and 12-12-1). Based on the performance statistics, MLRg, LM, SCG, BFGS and RBF are suitable and accurate for prediction, modeling the non-linear complex behavior of suspended sediment responses to rainfall, water depth and discharge. In the comparison between artificial neural networks (ANN) and MLRg, MLRg Model 2 accurately predicts suspended sediment discharge (kg/day) in the Jenderam catchment area.
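
    The four performance statistics named in this abstract can be sketched as follows. The abstract does not give their formulas, so this sketch assumes that R2 is the squared Pearson correlation between observed and predicted values and that CE is the Nash-Sutcliffe coefficient of efficiency; the function name and the sample numbers are hypothetical and are not taken from the study.

```python
import math

def fit_metrics(observed, predicted):
    """Return MAE, RMSE, R2 and CE for paired observed/predicted values.

    Assumptions (not stated in the abstract): R2 is the squared Pearson
    correlation and CE is the Nash-Sutcliffe coefficient of efficiency.
    """
    n = len(observed)
    errors = [p - o for o, p in zip(observed, predicted)]
    mae = sum(abs(e) for e in errors) / n
    sse = sum(e * e for e in errors)
    rmse = math.sqrt(sse / n)

    mean_o = sum(observed) / n
    mean_p = sum(predicted) / n
    ss_o = sum((o - mean_o) ** 2 for o in observed)
    ss_p = sum((p - mean_p) ** 2 for p in predicted)
    cov = sum((o - mean_o) * (p - mean_p) for o, p in zip(observed, predicted))

    r2 = cov * cov / (ss_o * ss_p)   # squared Pearson correlation
    ce = 1.0 - sse / ss_o            # Nash-Sutcliffe efficiency
    return {"MAE": mae, "RMSE": rmse, "R2": r2, "CE": ce}

# Made-up values, for illustration only.
obs = [10.0, 20.0, 30.0, 40.0]
pred = [12.0, 18.0, 33.0, 39.0]
print(fit_metrics(obs, pred))
```

    Under these definitions R2 and CE coincide only for an unbiased least-squares fit evaluated on its own training data, which may be why the abstract reports both.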

  12. Quantification and regionalization of groundwater recharge in South-Central Kansas: Integrating field characterization, statistical analysis, and GIS

    USGS Publications Warehouse

    Sophocleous, M.

    2000-01-01

    A practical methodology for recharge characterization was developed based on several years of field-oriented research at 10 sites in the Great Bend Prairie of south-central Kansas. This methodology combines the soil-water budget on a storm-by-storm year-round basis with the resulting watertable rises. The estimated 1985-1992 average annual recharge was less than 50 mm/year with a range from 15 mm/year (during the 1988 drought) to 178 mm/year (during the 1993 flood year). Most of this recharge occurs during the spring months. To regionalize these site-specific estimates, an additional methodology based on multiple (forward) regression analysis combined with classification and GIS overlay analyses was developed and implemented. The multiple regression analysis showed that the most influential variables were, in order of decreasing importance, total annual precipitation, average maximum springtime soil-profile water storage, average shallowest springtime depth to watertable, and average springtime precipitation rate. Therefore, four GIS (ARC/INFO) data "layers" or coverages were constructed for the study region based on these four variables, and each such coverage was classified into the same number of data classes to avoid biasing the results. The normalized regression coefficients were employed to weigh the class rankings of each recharge-affecting variable. This approach resulted in recharge zonations that agreed well with the site recharge estimates. During the "Great Flood of 1993," when rainfall totals exceeded normal levels by ~200% in the northern portion of the study region, the developed regionalization methodology was tested against such extreme conditions, and proved to be both practical, based on readily available or easily measurable data, and robust. It was concluded that the combination of multiple regression and GIS overlay analyses is a powerful and practical approach to regionalizing small samples of recharge estimates.

  13. Factors influencing health information system adoption in American hospitals.

    PubMed

    Wang, Bill B; Wan, Thomas T H; Burke, Darrell E; Bazzoli, Gloria J; Lin, Blossom Y J

    2005-01-01

    To study the number of health information systems (HISs), applicable to administrative, clinical, and executive decision support functionalities, adopted by acute care hospitals and to examine how hospital market, organizational, and financial factors influence HIS adoption. A cross-sectional analysis was performed with 1441 hospitals selected from metropolitan statistical areas in the United States. Multiple data sources were merged. Six hypotheses were empirically tested by multiple regression analysis. HIS adoption was influenced by the hospital market, organizational, and financial factors. Larger, system-affiliated, and for-profit hospitals with more preferred provider organization contracts are more likely to adopt managerial information systems than their counterparts. Operating revenue is positively associated with HIS adoption. The study concludes that hospital organizational and financial factors influence hospitals' strategic adoption of clinical, administrative, and managerial information systems.

  14. Prediction of hearing outcomes by multiple regression analysis in patients with idiopathic sudden sensorineural hearing loss.

    PubMed

    Suzuki, Hideaki; Tabata, Takahisa; Koizumi, Hiroki; Hohchi, Nobusuke; Takeuchi, Shoko; Kitamura, Takuro; Fujino, Yoshihisa; Ohbuchi, Toyoaki

    2014-12-01

    This study aimed to create a multiple regression model for predicting hearing outcomes of idiopathic sudden sensorineural hearing loss (ISSNHL). The participants were 205 consecutive patients (205 ears) with ISSNHL (hearing level ≥ 40 dB, interval between onset and treatment ≤ 30 days). They received systemic steroid administration combined with intratympanic steroid injection. Data were examined by simple and multiple regression analyses. Three hearing indices (percentage hearing improvement, hearing gain, and posttreatment hearing level [HLpost]) and 7 prognostic factors (age, days from onset to treatment, initial hearing level, initial hearing level at low frequencies, initial hearing level at high frequencies, presence of vertigo, and contralateral hearing level) were included in the multiple regression analysis as dependent and explanatory variables, respectively. In the simple regression analysis, the percentage hearing improvement, hearing gain, and HLpost showed significant correlation with 2, 5, and 6 of the 7 prognostic factors, respectively. The multiple correlation coefficients were 0.396, 0.503, and 0.714 for the percentage hearing improvement, hearing gain, and HLpost, respectively. Predicted values of HLpost calculated by the multiple regression equation were reliable with 70% probability with a 40-dB-width prediction interval. Prediction of HLpost by the multiple regression model may be useful to estimate the hearing prognosis of ISSNHL. © The Author(s) 2014.

  15. Quantile regression for the statistical analysis of immunological data with many non-detects.

    PubMed

    Eilers, Paul H C; Röder, Esther; Savelkoul, Huub F J; van Wijk, Roy Gerth

    2012-07-07

    Immunological parameters are hard to measure. A well-known problem is the occurrence of values below the detection limit, the non-detects. Non-detects are a nuisance, because classical statistical analyses, like ANOVA and regression, cannot be applied. The more advanced statistical techniques currently available for the analysis of datasets with non-detects can only be used if a small percentage of the data are non-detects. Quantile regression, a generalization of percentiles to regression models, models the median or higher percentiles and tolerates very high numbers of non-detects. We present a non-technical introduction and illustrate it with an implementation to real data from a clinical trial. We show that by using quantile regression, groups can be compared and that meaningful linear trends can be computed, even if more than half of the data consists of non-detects. Quantile regression is a valuable addition to the statistical methods that can be used for the analysis of immunological datasets with non-detects.
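
    The property that lets quantile-based methods tolerate non-detects can be illustrated with plain order statistics. This is not a quantile regression fit, only a minimal sketch of the idea behind it: a quantile that lies above the non-detect fraction does not depend on what value is substituted for the non-detects. The detection limit, the measurements and the nearest-rank quantile rule are all assumptions made for the illustration.

```python
import math

def quantile(values, q):
    """Empirical q-quantile using the nearest-rank rule (an assumed convention)."""
    ordered = sorted(values)
    rank = max(1, math.ceil(q * len(ordered)))
    return ordered[rank - 1]

DETECTION_LIMIT = 1.0              # hypothetical detection limit
detected = [1.4, 2.2, 3.1, 5.0]    # hypothetical values above the limit
N_NONDETECTS = 6                   # 6 of 10 samples: more than half are non-detects

def with_substitute(substitute):
    """Replace every non-detect with one value below the detection limit."""
    assert substitute < DETECTION_LIMIT
    return [substitute] * N_NONDETECTS + detected

for substitute in (0.0, 0.5, 0.99):
    data = with_substitute(substitute)
    # The median falls among the non-detects and moves with the substitute;
    # the 75th percentile falls among the detected values and does not.
    print(substitute, quantile(data, 0.5), quantile(data, 0.75))
```

    With 60% non-detects the median is itself a non-detect, so a higher percentile has to be modeled, in line with the abstract's remark that quantile regression "models the median or higher percentiles".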

  16. Adverse Effects of Prolonged Sitting Behavior on the General Health of Office Workers.

    PubMed

    Daneshmandi, Hadi; Choobineh, Alireza; Ghaem, Haleh; Karimi, Mehran

    2017-07-01

    Excessive sitting behavior is a risk factor for many adverse health outcomes. This study aimed to survey the prevalence of sitting behavior and its adverse effects among Iranian office workers. This cross-sectional study included 447 Iranian office workers. A two-part questionnaire was used as the data collection tool. The first part surveyed the demographic characteristics and general health of the respondents, while the second part contained the Nordic Musculoskeletal Questionnaire (NMQ) to assess symptoms. Statistical analyses were performed using the Statistical Package for the Social Sciences software using Mann-Whitney U and Chi-square tests and multiple logistic regression analysis. The respondents spent an average of 6.29 hours of an 8-hour working shift in a sitting position. The results showed that 48.8% of the participants did not feel comfortable with their workstations and 73.6% felt exhausted during the workday. Additionally, 6.3% suffered from hypertension, and 11.2% of them reported hyperlipidemia. The results of the NMQ showed that neck (53.5%), lower back (53.2%) and shoulder (51.6%) symptoms were the most prevalent problem among office workers. Based upon a multiple logistic regression, only sex had a significant association with prolonged sitting behavior (odds ratio = 3.084). Our results indicated that long sitting times were associated with exhaustion during the working day, decreased job satisfaction, hypertension, and musculoskeletal disorder symptoms in the shoulders, lower back, thighs, and knees of office workers. Sitting behavior had adverse effects on office workers. Active workstations are therefore recommended to improve working conditions.

  17. Radiographic assessment of third molars development and it's relation to dental and chronological age in an Iranian population.

    PubMed

    Monirifard, Mohamad; Yaraghi, Navid; Vali, Ava; Vali, Asana; Vali, Amrita

    2015-01-01

    The aim of the present study was to estimate chronological age based on third molar development and to determine the association between dental age and third molar calcification stages. In this cross-sectional study, 505 digital panoramic radiographs of 223 males (44.2%) and 282 females (55.8%) between the age of 6 and 17 were selected from patients who were treated in Departments of Pediatrics and Orthodontics of Isfahan University of Medical Sciences between the years of 2009 and 2013. Correlation between chronological age and third molar development was analyzed with SPSS 21 using Spearman's Rank correlation coefficient, Chi-square test and multiple regression statistical tests (P < 0.05). All third molars demonstrated a highly significant correlation with dental age (P < 0.001). The teeth showing the highest relationship with dental age were mandibular left third molar in males and mandibular right third molar in females (r_s = 0.072). When multiple regression was used to predict dental age based on molar calcification stage, the only significant correlation was for the maxillary left third molar in males (P < 0.05). There was no statistically significant correlation for any of third molars in females. Relationship between chronological age and molars development stage was significant in all age subgroups and in both genders (P < 0.001). Strong correlation was observed between left third molars and dental age in males. Results showed that third molar calcification stage can be used as an age predictor and in general mandibular teeth seem to be more reliable for this purpose in both genders and in all ages.

  18. Dual diagnosis vs. triple diagnosis in HIV: a comparative study to evaluate the differences in psychopathology and suicidal risk in HIV positive male subjects.

    PubMed

    Gupta, M; Kumar, K; Garg, P D

    2013-12-01

    The problem of triple diagnosis of HIV, substance abuse and psychiatric disorders is a complex one with difficult solutions. HIV disease progression is affected by substance use as well as psychiatric illness burden due to both direct as well as indirect factors. Continuing substance abuse with poor drug adherence coexists with psychiatric disorders leading to increased morbidity and mortality. A total of 100 HIV positive subjects comprising two groups of 50 subjects each, with and without substance abuse, were assessed using detailed history, mental state examination, WHO schedule for clinical assessment in neuropsychiatry (SCAN 2.0) and Beck's Scale for Suicidal Ideation (BSS). Statistical analysis used Chi-Square test, Fisher's exact test, Student's t-test, Pearson's correlation coefficient, univariate and multiple regression analysis, univariate and multiple logistic regression analysis. p-Value<0.05 was considered to denote statistical significance. Subjects with substance use disorder had higher rates of psychiatric morbidity (52% vs. 24%, 95% CI=0.5200, p<0.05). The rate of antiretroviral therapy default was almost double in subjects with substance abuse, as compared to subjects without substance use. Suicidal risk was significantly increased (p<0.05) in subjects with co-morbid medical disorders but substance abuse did not increase the risk. Substance abuse inflicts a much greater burden on HIV positive individuals as compared to subjects without substance use. Concomitant substance abuse resulted in significantly increased duration of illness and psychiatric morbidity. Copyright © 2013 Elsevier B.V. All rights reserved.

  19. False Positives in Multiple Regression: Unanticipated Consequences of Measurement Error in the Predictor Variables

    ERIC Educational Resources Information Center

    Shear, Benjamin R.; Zumbo, Bruno D.

    2013-01-01

    Type I error rates in multiple regression, and hence the chance for false positive research findings, can be drastically inflated when multiple regression models are used to analyze data that contain random measurement error. This article shows the potential for inflated Type I error rates in commonly encountered scenarios and provides new…

  20. Using Robust Standard Errors to Combine Multiple Regression Estimates with Meta-Analysis

    ERIC Educational Resources Information Center

    Williams, Ryan T.

    2012-01-01

    Combining multiple regression estimates with meta-analysis has continued to be a difficult task. A variety of methods have been proposed and used to combine multiple regression slope estimates with meta-analysis; however, most of these methods have serious methodological and practical limitations. The purpose of this study was to explore the use…

  1. Use of Multiple Regression and Use-Availability Analyses in Determining Habitat Selection by Gray Squirrels (Sciurus Carolinensis)

    Treesearch

    John W. Edwards; Susan C. Loeb; David C. Guynn

    1994-01-01

    Multiple regression and use-availability analyses are two methods for examining habitat selection. Use-availability analysis is commonly used to evaluate macrohabitat selection whereas multiple regression analysis can be used to determine microhabitat selection. We compared these techniques using behavioral observations (n = 5534) and telemetry locations (n = 2089) of...

  2. Demirjian's method in the estimation of age: A study on human third molars.

    PubMed

    Lewis, Amitha J; Boaz, Karen; Nagesh, K R; Srikant, N; Gupta, Neha; Nandita, K P; Manaktala, Nidhi

    2015-01-01

    The primary aim of the following study is to estimate the chronological age based on the stages of third molar development following the eight stages (A to H) method of Demirjian et al. (along with two modifications-Orhan) and secondary aim is to compare third molar development with sex and age. The sample consisted of 115 orthopantomograms from South Indian subjects with known chronological age and gender. Multiple regression analysis was performed with chronological age as the dependent variable and third molar root development as independent variable. All the statistical analysis was performed using the SPSS 11.0 package (IBM ® Corporation). Statistically no significant differences were found in third molar development between males and females. Depending on the available number of wisdom teeth in an individual, R2 varied for males from 0.21 to 0.48 and for females from 0.16 to 0.38. New equations were derived for estimating the chronological age. The chronological age of a South Indian individual between 14 and 22 years may be estimated based on the regression formulae. However, additional studies with a larger study population must be conducted to meet the need for population-based information on third molar development.

  3. Case-mix groups for VA hospital-based home care.

    PubMed

    Smith, M E; Baker, C R; Branch, L G; Walls, R C; Grimes, R M; Karklins, J M; Kashner, M; Burrage, R; Parks, A; Rogers, P

    1992-01-01

    The purpose of this study is to group hospital-based home care (HBHC) patients homogeneously by their characteristics with respect to cost of care to develop alternative case mix methods for management and reimbursement (allocation) purposes. Six Veterans Affairs (VA) HBHC programs in Fiscal Year (FY) 1986 that maximized patient, program, and regional variation were selected, all of which agreed to participate. All HBHC patients active in each program on October 1, 1987, in addition to all new admissions through September 30, 1988 (FY88), comprised the sample of 874 unique patients. Statistical methods include the use of classification and regression trees (CART software: Statistical Software; Lafayette, CA), analysis of variance, and multiple linear regression techniques. The resulting algorithm is a three-factor model that explains 20% of the cost variance (R2 = 20%, with a cross validation R2 of 12%). Similar classifications such as the RUG-II, which is utilized for VA nursing home and intermediate care, the VA outpatient resource allocation model, and the RUG-HHC, utilized in some states for reimbursing home health care in the private sector, explained less of the cost variance and, therefore, are less adequate for VA home care resource allocation.

  4. Risk Factors Associated with Mortality and Increased Drug Costs in Nonvariceal Upper Gastrointestinal Bleeding.

    PubMed

    Lu, Mingliang; Sun, Gang; Zhang, Xiu-li; Zhang, Xiao-mei; Liu, Qing-sen; Huang, Qi-yang; Lau, James W Y; Yang, Yun-sheng

    2015-06-01

    To determine risk factors associated with mortality and increased drug costs in patients with nonvariceal upper gastrointestinal bleeding. We retrospectively analyzed data from patients hospitalized with nonvariceal upper gastrointestinal bleeding between January 2001 and December 2011. Demographic and clinical characteristics and drug costs were documented. Univariate analysis determined possible risk factors for mortality. Statistically significant variables were analyzed using a logistic regression model. Multiple linear regression analyzed factors influencing drug costs. p < 0.05 was considered statistically significant. The study included data from 627 patients. Risk factors associated with increased mortality were age > 60, systolic blood pressure < 100 mmHg, lack of endoscopic examination, comorbidities, blood transfusion, and rebleeding. Drug costs were higher in patients with rebleeding, blood transfusion, and prolonged hospital stay. In this patient cohort, the rebleeding rate was 11.20% and mortality was 5.74%. The mortality risk in patients with comorbidities was higher than in patients without comorbidities, and was higher in patients requiring blood transfusion than in patients not requiring transfusion. Rebleeding was associated with mortality. Rebleeding, blood transfusion, and prolonged hospital stay were associated with increased drug costs, whereas bleeding from lesions in the esophagus and duodenum was associated with lower drug costs.

  5. Building Regression Models: The Importance of Graphics.

    ERIC Educational Resources Information Center

    Dunn, Richard

    1989-01-01

    Points out reasons for using graphical methods to teach simple and multiple regression analysis. Argues that a graphically oriented approach has considerable pedagogic advantages in the exposition of simple and multiple regression. Shows that graphical methods may play a central role in the process of building regression models. (Author/LS)

  6. Testing Different Model Building Procedures Using Multiple Regression.

    ERIC Educational Resources Information Center

    Thayer, Jerome D.

    The stepwise regression method of selecting predictors for computer assisted multiple regression analysis was compared with forward, backward, and best subsets regression, using 16 data sets. The results indicated the stepwise method was preferred because of its practical nature, when the models chosen by different selection methods were similar…

  7. Decreasing Multicollinearity: A Method for Models with Multiplicative Functions.

    ERIC Educational Resources Information Center

    Smith, Kent W.; Sasaki, M. S.

    1979-01-01

    A method is proposed for overcoming the problem of multicollinearity in multiple regression equations where multiplicative independent terms are entered. The method is not a ridge regression solution. (JKS)

  8. Factors affecting match performance in professional Australian football.

    PubMed

    Sullivan, Courtney; Bilsborough, Johann C; Cianciosi, Michael; Hocking, Joel; Cordy, Justin T; Coutts, Aaron J

    2014-05-01

    To determine the physical activity measures and skill-performance characteristics that contribute to coaches' perception of performance and player performance rank in professional Australian Football (AF). Prospective, longitudinal. Physical activity profiles were assessed via microtechnology (GPS and accelerometer) from 40 professional AF players from the same team during 15 Australian Football League games. Skill-performance measure and player-rank scores (Champion Data Rank) were provided by a commercial statistical provider. The physical-performance variables, skill involvements, and individual player performance scores were expressed relative to playing time for each quarter. A stepwise multiple regression was used to examine the contribution of physical activity and skill involvements to coaches' perception of performance and player rank in AF. Stepwise multiple-regression analysis revealed that 42.2% of the variance in coaches' perception of a player's performance could be explained by the skill-performance characteristics (player rank/min, effective kicks/min, pressure points/min, handballs/min, and running bounces/min), with a small contribution from physical activity measures (accelerations/min) (adjusted R2 = .422, F(6,282) = 36.054, P < .001). Multiple regression also revealed that 66.4% of the adjusted variance in player rank could be explained by total disposals/min, effective kicks/min, pressure points/min, kick clangers/min, marks/min, speed (m/min), and peak speed (adjusted R2 = .664, F(7,281) = 82.289, P < .001). Increased physical activity throughout a match (speed [m/min] β = -0.097 and peak speed β = -0.116) negatively affects player rank in AF. Skill performance rather than increased physical activity is more important to coaches' perception of performance and player rank in professional AF.

  9. [Association between hours of television watched, physical activity, sleep and excess weight among young adults].

    PubMed

    Martínez-Moyá, María; Navarrete-Muñoz, Eva M; García de la Hera, Manuela; Giménez-Monzo, Daniel; González-Palacios, Sandra; Valera-Gran, Desirée; Sempere-Orts, María; Vioque, Jesús

    2014-01-01

    To explore the association between excess weight or body mass index (BMI) and the time spent watching television, self-reported physical activity and sleep duration in a young adult population. We analyzed cross-sectional baseline data of 1,135 participants (17-35 years old) from the project Dieta, salud y antropometría en población universitaria (Diet, Health and Anthropometric Variables in University Students). Information about time spent watching television, sleep duration, self-reported physical activity and self-reported height and weight was provided by a baseline questionnaire. BMI was calculated as kg/m2 and excess weight was defined as BMI ≥25. We used multiple logistic regression to explore the association between excess weight (no/yes) and independent variables, and multiple linear regression for BMI. The prevalence of excess weight was 13.7% (11.2% were overweight and 2.5% were obese). A significant positive association was found between excess weight and a greater amount of time spent watching television. Participants who reported watching television >2h a day had a higher risk of excess weight than those who watched television ≤1h a day (OR=2.13; 95%CI: 1.37-3.36; p-trend: 0.002). A lower level of physical activity was associated with an increased risk of excess weight, although the association was statistically significant only in multiple linear regression (p=0.037). No association was observed with sleep duration. A greater number of hours spent watching television and lower physical activity were significantly associated with a higher BMI in young adults. Both factors are potentially modifiable with preventive strategies. Copyright © 2013 SESPAS. Published by Elsevier Espana. All rights reserved.
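
    The two outcome variables used in this abstract, BMI as a continuous value for the linear model and excess weight (no/yes) for the logistic model, can be derived from self-reported height and weight as sketched below. The function names and sample values are hypothetical; only the kg/m2 definition and the threshold of 25 come from the abstract.

```python
def bmi(weight_kg, height_m):
    """Body mass index in kg/m2."""
    return weight_kg / height_m ** 2

def has_excess_weight(weight_kg, height_m, threshold=25.0):
    """Binary outcome used for the logistic model: BMI at or above the threshold."""
    return bmi(weight_kg, height_m) >= threshold

# Made-up participants, for illustration only.
for weight_kg, height_m in [(80.0, 1.75), (60.0, 1.70)]:
    print(round(bmi(weight_kg, height_m), 2), has_excess_weight(weight_kg, height_m))
```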

  10. [Stature estimation for Sichuan Han nationality female based on X-ray technology with measurement of lumbar vertebrae].

    PubMed

    Qing, Si-han; Chang, Yun-feng; Dong, Xiao-ai; Li, Yuan; Chen, Xiao-gang; Shu, Yong-kang; Deng, Zhen-hua

    2013-10-01

    To establish mathematical models of stature estimation for Sichuan Han females from X-ray measurements of the lumbar vertebrae, providing essential data for forensic anthropology research. The samples, 206 Sichuan Han females, were divided by age into three groups, A, B and C. Group A (206 samples) comprised all ages, group B (116 samples) was 20-45 years old, and group C was the 90 samples over 45 years old. The lumbar vertebrae of all samples were examined through CR technology, recording for the five centrums (L1-L5) the anterior border, posterior border and central heights (x1-x15), the total central height of the lumbar spine (x16), and the real height of every sample. Linear regression analysis was performed on these parameters to establish the mathematical models of stature estimation. Sixty-two trained subjects were tested to verify the accuracy of the mathematical models. The established mathematical models were statistically significant by hypothesis testing of the linear regression equations (P<0.05). The standard errors of the equations were 2.982-5.004 cm, while correlation coefficients were 0.370-0.779 and multiple correlation coefficients were 0.533-0.834. The return tests of the highest correlation coefficient and multiple correlation coefficient of each group showed that the highest accuracy of the multiple regression equation, y = 100.33 + 1.489 x3 - 0.548 x6 + 0.772 x9 + 0.058 x12 + 0.645 x15, in group A was 80.6% (+/- 1SE) and 100% (+/- 2SE). The established mathematical models in this study could be applied to stature estimation for Sichuan Han females.
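
    The group A equation quoted in this abstract can be evaluated directly, as in the sketch below. The coefficients are copied from the abstract; the function name, the dictionary-based interface and the sample measurements are hypothetical. The abstract does not state the measurement units of x3, x6, x9, x12 and x15, nor which vertebra and height each index refers to, so the inputs must be in whatever units the original study used.

```python
# Intercept and coefficients of the group A equation as printed in the abstract.
INTERCEPT = 100.33
COEFFICIENTS = {"x3": 1.489, "x6": -0.548, "x9": 0.772, "x12": 0.058, "x15": 0.645}

def estimate_stature(measurements):
    """Evaluate y = 100.33 + 1.489 x3 - 0.548 x6 + 0.772 x9 + 0.058 x12 + 0.645 x15.

    `measurements` maps each variable name to its value; units are those of
    the original study, which the abstract does not specify.
    """
    return INTERCEPT + sum(
        coefficient * measurements[name]
        for name, coefficient in COEFFICIENTS.items()
    )

# Hypothetical measurements, for illustration only.
sample = {"x3": 27.0, "x6": 27.0, "x9": 27.0, "x12": 27.0, "x15": 27.0}
print(estimate_stature(sample))
```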

  11. Comparison of Neural Network and Linear Regression Models in Statistically Predicting Mental and Physical Health Status of Breast Cancer Survivors

    DTIC Science & Technology

    2015-07-15

    Long-term effects on cancer survivors' quality of life of physical training versus physical training combined with cognitive-behavioral therapy ...

  12. Multivariate Linear Regression and CART Regression Analysis of TBM Performance at Abu Hamour Phase-I Tunnel

    NASA Astrophysics Data System (ADS)

    Jakubowski, J.; Stypulkowski, J. B.; Bernardeau, F. G.

    2017-12-01

    The first phase of the Abu Hamour drainage and storm tunnel was completed in early 2017. The 9.5 km long, 3.7 m diameter tunnel was excavated with two Earth Pressure Balance (EPB) Tunnel Boring Machines from Herrenknecht. TBM operation processes were monitored and recorded by Data Acquisition and Evaluation System. The authors coupled collected TBM drive data with available information on rock mass properties, cleansed, completed with secondary variables and aggregated by weeks and shifts. Correlations and descriptive statistics charts were examined. Multivariate Linear Regression and CART regression tree models linking TBM penetration rate (PR), penetration per revolution (PPR) and field penetration index (FPI) with TBM operational and geotechnical characteristics were performed for the conditions of the weak/soft rock of Doha. Both regression methods are interpretable and the data were screened with different computational approaches allowing enriched insight. The primary goal of the analysis was to investigate empirical relations between multiple explanatory and responding variables, to search for best subsets of explanatory variables and to evaluate the strength of linear and non-linear relations. For each of the penetration indices, a predictive model coupling both regression methods was built and validated. The resultant models appeared to be stronger than constituent ones and indicated an opportunity for more accurate and robust TBM performance predictions.

  13. A Method for Calculating the Probability of Successfully Completing a Rocket Propulsion Ground Test

    NASA Technical Reports Server (NTRS)

    Messer, Bradley

    2007-01-01

    Propulsion ground test facilities face the daily challenge of scheduling multiple customers into limited facility space and successfully completing their propulsion test projects. Over the last decade NASA's propulsion test facilities have performed hundreds of tests, collected thousands of seconds of test data, and exceeded the capabilities of numerous test facility and test article components. A logistic regression mathematical modeling technique has been developed to predict the probability of successfully completing a rocket propulsion test. A logistic regression model is a mathematical modeling approach that can be used to describe the relationship of several independent predictor variables X1, X2, ..., Xk to a binary or dichotomous dependent variable Y, where Y can only be one of two possible outcomes, in this case Success or Failure of accomplishing a full duration test. The use of logistic regression modeling is not new; however, modeling propulsion ground test facilities using logistic regression is both a new and unique application of the statistical technique. Results from this type of model provide project managers with insight and confidence into the effectiveness of rocket propulsion ground testing.
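
    The general form of the logistic regression model described in this abstract, mapping k predictor values to a probability of the Success outcome, can be sketched as below. The abstract does not list the predictors or the fitted coefficients, so every number and name here is a hypothetical placeholder rather than a value from the report.

```python
import math

def success_probability(intercept, coefficients, predictors):
    """Logistic model: P(Y = Success) = 1 / (1 + exp(-(b0 + b1*X1 + ... + bk*Xk)))."""
    z = intercept + sum(b * x for b, x in zip(coefficients, predictors))
    # Branch on the sign of z so that exp() never receives a large positive argument.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)

# Hypothetical intercept, coefficients and predictor values, for illustration only.
b0 = -1.0
b = [0.8, -0.5, 0.3]
x = [2.0, 1.0, 0.0]
print(success_probability(b0, b, x))
```

    Because the output is always strictly between 0 and 1, it can be read as a probability of completing a full duration test, and a positive coefficient raises that probability as its predictor increases.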

  14. Regression modeling and prediction of road sweeping brush load characteristics from finite element analysis and experimental results.

    PubMed

    Wang, Chong; Sun, Qun; Wahab, Magd Abdel; Zhang, Xingyu; Xu, Limin

    2015-09-01

    Rotary cup brushes mounted on each side of a road sweeper undertake heavy debris removal tasks, but their characteristics were not well known until recently. A Finite Element (FE) model that can analyze brush deformation and predict brush characteristics has been developed to investigate the sweeping efficiency and to assist the controller design. However, the FE model requires a large amount of CPU time to simulate each brush design and operating scenario, which may affect its applications in a real-time system. This study develops a mathematical regression model to summarize the FE modeled results. The complex brush load characteristic curves were statistically analyzed to quantify the effects of cross-section, length, mounting angle, displacement, rotational speed, etc. The data were then fitted by a multiple variable regression model using the maximum likelihood method. The fitted results showed good agreement with the FE analysis results and experimental results, suggesting that the mathematical regression model may be directly used in a real-time system to predict characteristics of different brushes under varying operating conditions. The methodology may also be used in the design and optimization of rotary brush tools. Copyright © 2015 Elsevier Ltd. All rights reserved.

  15. Application of Statistical Downscaling Techniques to Predict Rainfall and Its Spatial Analysis Over Subansiri River Basin of Assam, India

    NASA Astrophysics Data System (ADS)

    Barman, S.; Bhattacharjya, R. K.

    2017-12-01

    The River Subansiri is the major north bank tributary of river Brahmaputra. It originates from the range of Himalayas beyond the Great Himalayan range at an altitude of approximately 5340m. Subansiri basin extends from tropical to temperate zones and hence exhibits a great diversity in rainfall characteristics. In the Northern and Central Himalayan tracts, precipitation is scarce on account of high altitudes. On the other hand, Southeast part of the Subansiri basin comprising the sub-Himalayan and the plain tract in Arunachal Pradesh and Assam, lies in the tropics. Due to Northeast as well as Southwest monsoon, precipitation occurs in this region in abundant quantities. Particularly, Southwest monsoon causes very heavy precipitation in the entire Subansiri basin during May to October. In this study, the rainfall over Subansiri basin has been studied at 24 different locations by multiple linear and non-linear regression based statistical downscaling techniques and by Artificial Neural Network based model. APHRODITE's gridded rainfall data of 0.25˚ x 0.25˚ resolutions and climatic parameters of HadCM3 GCM of resolution 2.5˚ x 3.75˚ (latitude by longitude) have been used in this study. It has been found that multiple non-linear regression based statistical downscaling technique outperformed the other techniques. Using this method, the future rainfall pattern over the Subansiri basin has been analyzed up to the year 2099 for four different time periods, viz., 2020-39, 2040-59, 2060-79, and 2080-99 at all the 24 locations. On the basis of historical rainfall, the months have been categorized as wet months, months with moderate rainfall and dry months. The spatial changes in rainfall patterns for all these three types of months have also been analyzed over the basin. Potential decrease of rainfall in the wet months and months with moderate rainfall and increase of rainfall in the dry months are observed for the future rainfall pattern of the Subansiri basin.

  16. Predictive validity of the UK clinical aptitude test in the final years of medical school: a prospective cohort study.

    PubMed

    Husbands, Adrian; Mathieson, Alistair; Dowell, Jonathan; Cleland, Jennifer; MacKenzie, Rhoda

    2014-04-23

    The UK Clinical Aptitude Test (UKCAT) was designed to address issues identified with traditional methods of selection. This study aims to examine the predictive validity of the UKCAT and compare this to traditional selection methods in the senior years of medical school. This was a follow-up study of two cohorts of students from two medical schools who had previously taken part in a study examining the predictive validity of the UKCAT in first year. The sample consisted of 4th and 5th Year students who commenced their studies at the University of Aberdeen or University of Dundee medical schools in 2007. Data collected were: demographics (gender and age group), UKCAT scores; Universities and Colleges Admissions Service (UCAS) form scores; admission interview scores; Year 4 and 5 degree examination scores. Pearson's correlations were used to examine the relationships between admissions variables, examination scores, gender and age group, and to select variables for multiple linear regression analysis to predict examination scores. Ninety-nine and 89 students at Aberdeen medical school from Years 4 and 5 respectively, and 51 Year 4 students in Dundee, were included in the analysis. Neither UCAS form nor interview scores were statistically significant predictors of examination performance. Conversely, the UKCAT yielded statistically significant validity coefficients between .24 and .36 in four of five assessments investigated. Multiple regression analysis showed the UKCAT made a statistically significant unique contribution to variance in examination performance in the senior years. Results suggest the UKCAT appears to predict performance better in the later years of medical school compared to earlier years and provides modest supportive evidence for the UKCAT's role in student selection within these institutions. 
Further research is needed to assess the predictive validity of the UKCAT against professional and behavioural outcomes as the cohort commences working life.

  17. Predictive validity of the UK clinical aptitude test in the final years of medical school: a prospective cohort study

    PubMed Central

    2014-01-01

    Background The UK Clinical Aptitude Test (UKCAT) was designed to address issues identified with traditional methods of selection. This study aims to examine the predictive validity of the UKCAT and compare this to traditional selection methods in the senior years of medical school. This was a follow-up study of two cohorts of students from two medical schools who had previously taken part in a study examining the predictive validity of the UKCAT in first year. Methods The sample consisted of 4th and 5th Year students who commenced their studies at the University of Aberdeen or University of Dundee medical schools in 2007. Data collected were: demographics (gender and age group), UKCAT scores; Universities and Colleges Admissions Service (UCAS) form scores; admission interview scores; Year 4 and 5 degree examination scores. Pearson’s correlations were used to examine the relationships between admissions variables, examination scores, gender and age group, and to select variables for multiple linear regression analysis to predict examination scores. Results Ninety-nine and 89 students at Aberdeen medical school from Years 4 and 5 respectively, and 51 Year 4 students in Dundee, were included in the analysis. Neither UCAS form nor interview scores were statistically significant predictors of examination performance. Conversely, the UKCAT yielded statistically significant validity coefficients between .24 and .36 in four of five assessments investigated. Multiple regression analysis showed the UKCAT made a statistically significant unique contribution to variance in examination performance in the senior years. Conclusions Results suggest the UKCAT appears to predict performance better in the later years of medical school compared to earlier years and provides modest supportive evidence for the UKCAT’s role in student selection within these institutions. 
Further research is needed to assess the predictive validity of the UKCAT against professional and behavioural outcomes as the cohort commences working life. PMID:24762134

  18. Conceptual and statistical problems associated with the use of diversity indices in ecology.

    PubMed

    Barrantes, Gilbert; Sandoval, Luis

    2009-09-01

    Diversity indices, particularly the Shannon-Wiener index, have been used extensively in analyzing patterns of diversity at different geographic and ecological scales. These indices have serious conceptual and statistical problems which make comparisons of species richness or species abundances across communities nearly impossible. There is often no single statistical method that retains all information needed to answer even a simple question. However, multivariate analyses, such as cluster analyses or multiple regressions, could be used instead of diversity indices. More complex multivariate analyses, such as Canonical Correspondence Analysis, provide very valuable information on environmental variables associated with the presence and abundance of the species in a community. In addition, particular hypotheses associated with changes in species richness across localities, or changes in abundance of one species or a group of species, can be tested using univariate, bivariate, and/or rarefaction statistical tests. The rarefaction method has proved to be robust for standardizing all samples to a common size. Even the simplest method, such as reporting the number of species per taxonomic category, possibly provides more information than a diversity index value.

  19. A Statistical Method for Reducing Sidelobe Clutter for the Ku-Band Precipitation Radar on Board the GPM Core Observatory

    NASA Technical Reports Server (NTRS)

    Kubota, Takuji; Iguchi, Toshio; Kojima, Masahiro; Liao, Liang; Masaki, Takeshi; Hanado, Hiroshi; Meneghini, Robert; Oki, Riko

    2016-01-01

    A statistical method to reduce the sidelobe clutter of the Ku-band precipitation radar (KuPR) of the Dual-Frequency Precipitation Radar (DPR) on board the Global Precipitation Measurement (GPM) Core Observatory is described and evaluated using DPR observations. The KuPR sidelobe clutter was much more severe than that of the Precipitation Radar on board the Tropical Rainfall Measuring Mission (TRMM), and it has caused the misidentification of precipitation. The statistical method to reduce sidelobe clutter was constructed by subtracting the estimated sidelobe power, based upon a multiple regression model with explanatory variables of the normalized radar cross section (NRCS) of surface, from the received power of the echo. The saturation of the NRCS at near-nadir angles, resulting from strong surface scattering, was considered in the calculation of the regression coefficients. The method was implemented in the KuPR algorithm and applied to KuPR-observed data. It was found that the received power from sidelobe clutter over the ocean was largely reduced by using the developed method, although some of the received power from the sidelobe clutter still remained. From the statistical results of the evaluations, it was shown that the number of KuPR precipitation events in the clutter region, after the method was applied, was comparable to that in the clutter-free region. This confirms the reasonable performance of the method in removing sidelobe clutter. For further improving the effectiveness of the method, it is necessary to improve the consideration of the NRCS saturation, which will be explored in future work.

  20. Motor excitability measurements: the influence of gender, body mass index, age and temperature in healthy controls.

    PubMed

    Casanova, I; Diaz, A; Pinto, S; de Carvalho, M

    2014-04-01

    The technique of threshold tracking to test axonal excitability gives information about nodal and internodal ion channel function. We aimed to investigate variability of the motor excitability measurements in healthy controls, taking into account age, gender, body mass index (BMI) and small changes in skin temperature. We examined the left median nerve of 47 healthy controls using the automated threshold-tracking program, QTRAC. Statistical multiple regression analysis was applied to test the relationship between nerve excitability measurements and subject variables. Comparisons between genders did not find any significant difference (P>0.2 for all comparisons). Multiple regression analysis showed that motor amplitude decreases with age and temperature, stimulus-response slope decreases with age and BMI, and that accommodation half-time decreases with age and temperature. The changes related to demographic features on TRONDE protocol parameters are small and less important than in conventional nerve conduction studies. Nonetheless, our results underscore the relevance of careful temperature control, and indicate that interpretation of stimulus-response slope and accommodation half-time should take into account age and BMI. In contrast, gender is not of major relevance to axonal threshold findings in motor nerves. Copyright © 2014 Elsevier Masson SAS. All rights reserved.

  1. Factors associated with preventable infant death: a multiple logistic regression.

    PubMed

    Vidal E Silva, Sandra Maria Cunha; Tuon, Rogério Antonio; Probst, Livia Fernandes; Gondinho, Brunna Verna Castro; Pereira, Antonio Carlos; Meneghim, Marcelo de Castro; Cortellazzi, Karine Laura; Ambrosano, Glaucia Maria Bovi

    2018-01-01

    OBJECTIVE To identify and analyze factors associated with preventable child deaths. METHODS This analytical cross-sectional study had preventable child mortality as dependent variable. From a population of 34,284 live births, we have selected a systematic sample of 4,402 children who did not die compared to 272 children who died from preventable causes during the period studied. The independent variables were analyzed in four hierarchical blocks: sociodemographic factors, the characteristics of the mother, prenatal and delivery care, and health conditions of the patient and neonatal care. We performed a descriptive statistical analysis and estimated multiple hierarchical logistic regression models. RESULTS Approximately 35.3% of the deaths could have been prevented with the early diagnosis and treatment of diseases during pregnancy and 26.8% of them could have been prevented with better care conditions for pregnant women. CONCLUSIONS The following characteristics of the mother are determinant for the higher mortality of children before the first year of life: living in neighborhoods with an average family income lower than four minimum wages, being aged ≤ 19 years, having one or more alive children, having a child with low APGAR level at the fifth minute of life, and having a child with low birth weight.

  2. Predictors of Dropout by Female Obese Patients Treated with a Group Cognitive Behavioral Therapy to Promote Weight Loss.

    PubMed

    Sawamoto, Ryoko; Nozaki, Takehiro; Furukawa, Tomokazu; Tanahashi, Tokusei; Morita, Chihiro; Hata, Tomokazu; Komaki, Gen; Sudo, Nobuyuki

    2016-01-01

    To investigate predictors of dropout from a group cognitive behavioral therapy (CBT) intervention for overweight or obese women. 119 overweight and obese Japanese women aged 25-65 years who attended an outpatient weight loss intervention were followed throughout the 7-month weight loss phase. Somatic characteristics, socioeconomic status, obesity-related diseases, diet and exercise habits, and psychological variables (depression, anxiety, self-esteem, alexithymia, parenting style, perfectionism, and eating attitude) were assessed at baseline. Significant variables, extracted by univariate statistical analysis, were then used as independent variables in a stepwise multiple logistic regression analysis with dropout as the dependent variable. 90 participants completed the weight loss phase, giving a dropout rate of 24.4%. The multiple logistic regression analysis demonstrated that compared to completers the dropouts had significantly stronger body shape concern, tended to not have jobs, perceived their mothers to be less caring, and were more disorganized in temperament. Of all these factors, the best predictor of dropout was shape concern. Shape concern, job condition, parenting care, and organization predicted dropout from the group CBT weight loss intervention for overweight or obese Japanese women. © 2016 S. Karger GmbH, Freiburg.
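
    The dropout rate quoted above follows directly from the two counts given in the abstract (119 enrolled, 90 completers). A minimal sketch of that arithmetic; the function name `dropout_rate` is only illustrative and does not come from the study:

```python
def dropout_rate(enrolled, completed):
    """Return the fraction of enrolled participants who did not complete."""
    if enrolled <= 0 or not 0 <= completed <= enrolled:
        raise ValueError("need 0 <= completed <= enrolled and enrolled > 0")
    return (enrolled - completed) / enrolled


# Counts taken from the abstract: 119 women enrolled, 90 completed.
rate = dropout_rate(119, 90)
print(f"dropouts: {119 - 90}, rate: {rate:.1%}")  # 29 of 119, i.e. 24.4%
```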

  3. The impact of green stormwater infrastructure installation on surrounding health and safety.

    PubMed

    Kondo, Michelle C; Low, Sarah C; Henning, Jason; Branas, Charles C

    2015-03-01

    We investigated the health and safety effects of urban green stormwater infrastructure (GSI) installments. We conducted a difference-in-differences analysis of the effects of GSI installments on health (e.g., blood pressure, cholesterol and stress levels) and safety (e.g., felonies, nuisance and property crimes, narcotics crimes) outcomes from 2000 to 2012 in Philadelphia, Pennsylvania. We used mixed-effects regression models to compare differences in pre- and posttreatment measures of outcomes for treatment sites (n=52) and randomly chosen, matched control sites (n=186) within multiple geographic extents surrounding GSI sites. Regression-adjusted models showed consistent and statistically significant reductions in narcotics possession (18%-27% less) within 16th-mile, quarter-mile, half-mile (P<.001), and eighth-mile (P<.01) distances from treatment sites and at the census tract level (P<.01). Narcotics manufacture and burglaries were also significantly reduced at multiple scales. Nonsignificant reductions in homicides, assaults, thefts, public drunkenness, and narcotics sales were associated with GSI installation in at least 1 geographic extent. Health and safety considerations should be included in future assessments of GSI programs. Subsequent studies should assess mechanisms of this association.
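
    The core idea of a difference-in-differences comparison can be sketched as the change at treatment sites minus the change at control sites. The snippet below is a simplified, unadjusted illustration only: the study itself used mixed-effects regression models, and the counts here are hypothetical values invented for the example, not data from the paper.

```python
def mean(values):
    """Arithmetic mean of a non-empty sequence."""
    return sum(values) / len(values)


def diff_in_diff(treat_pre, treat_post, ctrl_pre, ctrl_post):
    """Unadjusted difference-in-differences estimate:
    (change at treated sites) minus (change at control sites)."""
    treated_change = mean(treat_post) - mean(treat_pre)
    control_change = mean(ctrl_post) - mean(ctrl_pre)
    return treated_change - control_change


# Hypothetical incident counts per site (assumed for illustration only).
treat_pre, treat_post = [10, 12, 14], [7, 9, 11]
ctrl_pre, ctrl_post = [11, 13, 15], [10, 12, 14]

effect = diff_in_diff(treat_pre, treat_post, ctrl_pre, ctrl_post)
print(effect)  # treated sites fell by 3, controls by 1, so the estimate is -2.0
```

    Subtracting the control-site change removes trends that affect both groups, which is why the design needs matched control sites.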

  4. Dental calculus is associated with death from heart infarction.

    PubMed

    Söder, Birgitta; Meurman, Jukka H; Söder, Per-Östen

    2014-01-01

    We studied whether the amount of dental calculus is associated with death from heart infarction in the dental infection-atherosclerosis paradigm. Participants were 1676 healthy young Swedes followed up from 1985 to 2011. At the beginning of the study all subjects underwent oral clinical examination including dental calculus registration scored with calculus index (CI). Outcome measure was cause of death classified according to WHO International Classification of Diseases. Unpaired t-test, Chi-square tests, and multiple logistic regressions were used. Of the 1676 participants, 2.8% had died during follow-up. Women died at a mean age of 61.5 years and men at 61.7 years. The difference in the CI index score between the survivors versus deceased patients was significant by the year 2009 (P < 0.01). In multiple regression analysis of the relationship between death from heart infarction as a dependent variable and CI as independent variable with controlling for age, gender, dental visits, dental plaque, periodontal pockets, education, income, socioeconomic status, and pack-years of smoking, CI score appeared to be associated with an odds ratio of 2.3 for cardiac death. The results confirmed our study hypothesis by showing that dental calculus was indeed associated statistically with cardiac death due to infarction.
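
    In logistic regression an odds ratio is the exponential of the fitted coefficient, so the value of 2.3 reported above corresponds to a coefficient of ln(2.3). The sketch below shows that conversion and how an odds ratio acts on a baseline risk. The 1% baseline probability is a hypothetical figure chosen for illustration, not a number from the study.

```python
import math


def odds_ratio(beta):
    """Odds ratio implied by a logistic regression coefficient."""
    return math.exp(beta)


def probability_to_odds(p):
    return p / (1.0 - p)


def odds_to_probability(odds):
    return odds / (1.0 + odds)


beta = math.log(2.3)          # coefficient that corresponds to OR = 2.3
baseline_p = 0.01             # hypothetical baseline risk (assumption)
new_p = odds_to_probability(probability_to_odds(baseline_p) * odds_ratio(beta))

print(f"beta = {beta:.3f}, risk moves from {baseline_p:.3f} to {new_p:.4f}")
```

    An odds ratio multiplies odds, not probabilities, so the risk in this example rises by somewhat less than a factor of 2.3.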

  5. Predictors of Dropout by Female Obese Patients Treated with a Group Cognitive Behavioral Therapy to Promote Weight Loss

    PubMed Central

    Sawamoto, Ryoko; Nozaki, Takehiro; Furukawa, Tomokazu; Tanahashi, Tokusei; Morita, Chihiro; Hata, Tomokazu; Komaki, Gen; Sudo, Nobuyuki

    2016-01-01

    Objective To investigate predictors of dropout from a group cognitive behavioral therapy (CBT) intervention for overweight or obese women. Methods 119 overweight and obese Japanese women aged 25-65 years who attended an outpatient weight loss intervention were followed throughout the 7-month weight loss phase. Somatic characteristics, socioeconomic status, obesity-related diseases, diet and exercise habits, and psychological variables (depression, anxiety, self-esteem, alexithymia, parenting style, perfectionism, and eating attitude) were assessed at baseline. Significant variables, extracted by univariate statistical analysis, were then used as independent variables in a stepwise multiple logistic regression analysis with dropout as the dependent variable. Results 90 participants completed the weight loss phase, giving a dropout rate of 24.4%. The multiple logistic regression analysis demonstrated that compared to completers the dropouts had significantly stronger body shape concern, tended to not have jobs, perceived their mothers to be less caring, and were more disorganized in temperament. Of all these factors, the best predictor of dropout was shape concern. Conclusion Shape concern, job condition, parenting care, and organization predicted dropout from the group CBT weight loss intervention for overweight or obese Japanese women. PMID:26745715

  6. Female homicide in Rio Grande do Sul, Brazil.

    PubMed

    Leites, Gabriela Tomedi; Meneghel, Stela Nazareth; Hirakata, Vania Noemi

    2014-01-01

    This study aimed to assess the female homicide rate due to aggression in Rio Grande do Sul, Brazil, using this as a "proxy" of femicide. This was an ecological study which correlated the female homicide rate due to aggression in Rio Grande do Sul, according to the 35 microregions defined by the Brazilian Institute of Geography and Statistics (IBGE), with socioeconomic and demographic variables and with access and health indicators. Pearson's correlation test was performed with the selected variables. After this, multiple linear regressions were performed with variables with p < 0.20. The standardized average of female homicide rate due to aggression in the period from 2003 to 2007 was 3.1 deaths per 100 thousand. After multiple regression analysis, the final model included male mortality due to aggression (p = 0.016), the percentage of hospital admissions for alcohol (p = 0.005) and the proportion of ill-defined deaths (p = 0.015). The model has an explanatory power of 39% (adjusted r2 = 0.391). The results are consistent with other studies and indicate a strong relationship between structural violence in society and violence against women, in addition to a higher incidence of female deaths in places with high alcohol hospitalization.
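
    The adjusted r2 quoted above penalizes the ordinary r2 for the number of predictors. A minimal sketch of the standard formula follows. It assumes the final model was fitted to all 35 microregions with the three listed predictors, which the abstract suggests but does not state outright, and the unadjusted r2 it back-calculates is therefore an inference, not a reported result.

```python
def adjusted_r2(r2, n, p):
    """Adjusted R^2 for a model with p predictors fitted to n observations."""
    return 1.0 - (1.0 - r2) * (n - 1) / (n - p - 1)


def r2_from_adjusted(adj_r2, n, p):
    """Invert the adjustment to recover the ordinary R^2."""
    return 1.0 - (1.0 - adj_r2) * (n - p - 1) / (n - 1)


# Assumed inputs: n = 35 microregions, p = 3 predictors, adjusted r2 = 0.391.
r2 = r2_from_adjusted(0.391, n=35, p=3)
print(f"implied unadjusted r2: {r2:.3f}")   # roughly 0.44 under these assumptions
print(f"round trip: {adjusted_r2(r2, 35, 3):.3f}")
```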

  7. The Impact of Green Stormwater Infrastructure Installation on Surrounding Health and Safety

    PubMed Central

    Low, Sarah C.; Henning, Jason; Branas, Charles C.

    2015-01-01

    Objectives. We investigated the health and safety effects of urban green stormwater infrastructure (GSI) installments. Methods. We conducted a difference-in-differences analysis of the effects of GSI installments on health (e.g., blood pressure, cholesterol and stress levels) and safety (e.g., felonies, nuisance and property crimes, narcotics crimes) outcomes from 2000 to 2012 in Philadelphia, Pennsylvania. We used mixed-effects regression models to compare differences in pre- and posttreatment measures of outcomes for treatment sites (n = 52) and randomly chosen, matched control sites (n = 186) within multiple geographic extents surrounding GSI sites. Results. Regression-adjusted models showed consistent and statistically significant reductions in narcotics possession (18%–27% less) within 16th-mile, quarter-mile, half-mile (P < .001), and eighth-mile (P < .01) distances from treatment sites and at the census tract level (P < .01). Narcotics manufacture and burglaries were also significantly reduced at multiple scales. Nonsignificant reductions in homicides, assaults, thefts, public drunkenness, and narcotics sales were associated with GSI installation in at least 1 geographic extent. Conclusions. Health and safety considerations should be included in future assessments of GSI programs. Subsequent studies should assess mechanisms of this association. PMID:25602887

  8. Fast Quantitative Analysis Of Museum Objects Using Laser-Induced Breakdown Spectroscopy And Multiple Regression Algorithms

    NASA Astrophysics Data System (ADS)

    Lorenzetti, G.; Foresta, A.; Palleschi, V.; Legnaioli, S.

    2009-09-01

    The recent development of mobile instrumentation, specifically devoted to in situ analysis and study of museum objects, allows the acquisition of many LIBS spectra in a very short time. However, such a large amount of data calls for new analytical approaches which would guarantee a prompt analysis of the results obtained. In this communication, we will present and discuss the advantages of statistical analytical methods, such as Partial Least Squares Multiple Regression algorithms, vs. the classical calibration curve approach. PLS algorithms make it possible to obtain information on the composition of the objects under study in real time; this feature of the method, compared to the traditional off-line analysis of the data, is extremely useful for the optimization of the measurement times and number of points associated with the analysis. In fact, the real-time availability of the compositional information gives the possibility of concentrating the attention on the most 'interesting' parts of the object, without over-sampling the zones which would not provide useful information for the scholars or the conservators. Some examples of the applications of this method will be presented, including the studies recently performed by the researchers of the Applied Laser Spectroscopy Laboratory on museum bronze objects.

  9. gsSKAT: Rapid gene set analysis and multiple testing correction for rare-variant association studies using weighted linear kernels.

    PubMed

    Larson, Nicholas B; McDonnell, Shannon; Cannon Albright, Lisa; Teerlink, Craig; Stanford, Janet; Ostrander, Elaine A; Isaacs, William B; Xu, Jianfeng; Cooney, Kathleen A; Lange, Ethan; Schleutker, Johanna; Carpten, John D; Powell, Isaac; Bailey-Wilson, Joan E; Cussenot, Olivier; Cancel-Tassin, Geraldine; Giles, Graham G; MacInnis, Robert J; Maier, Christiane; Whittemore, Alice S; Hsieh, Chih-Lin; Wiklund, Fredrik; Catalona, William J; Foulkes, William; Mandal, Diptasri; Eeles, Rosalind; Kote-Jarai, Zsofia; Ackerman, Michael J; Olson, Timothy M; Klein, Christopher J; Thibodeau, Stephen N; Schaid, Daniel J

    2017-05-01

    Next-generation sequencing technologies have afforded unprecedented characterization of low-frequency and rare genetic variation. Due to low power for single-variant testing, aggregative methods are commonly used to combine observed rare variation within a single gene. Causal variation may also aggregate across multiple genes within relevant biomolecular pathways. Kernel-machine regression and adaptive testing methods for aggregative rare-variant association testing have been demonstrated to be powerful approaches for pathway-level analysis, although these methods tend to be computationally intensive at high-variant dimensionality and require access to complete data. An additional analytical issue in scans of large pathway definition sets is multiple testing correction. Gene set definitions may exhibit substantial genic overlap, and the impact of the resultant correlation in test statistics on Type I error rate control for large agnostic gene set scans has not been fully explored. Herein, we first outline a statistical strategy for aggregative rare-variant analysis using component gene-level linear kernel score test summary statistics as well as derive simple estimators of the effective number of tests for family-wise error rate control. We then conduct extensive simulation studies to characterize the behavior of our approach relative to direct application of kernel and adaptive methods under a variety of conditions. We also apply our method to two case-control studies, respectively, evaluating rare variation in hereditary prostate cancer and schizophrenia. Finally, we provide open-source R code for public use to facilitate easy application of our methods to existing rare-variant analysis results. © 2017 WILEY PERIODICALS, INC.

  10. Multiple-Instance Regression with Structured Data

    NASA Technical Reports Server (NTRS)

    Wagstaff, Kiri L.; Lane, Terran; Roper, Alex

    2008-01-01

    We present a multiple-instance regression algorithm that models internal bag structure to identify the items most relevant to the bag labels. Multiple-instance regression (MIR) operates on a set of bags with real-valued labels, each containing a set of unlabeled items, in which the relevance of each item to its bag label is unknown. The goal is to predict the labels of new bags from their contents. Unlike previous MIR methods, MI-ClusterRegress can operate on bags that are structured in that they contain items drawn from a number of distinct (but unknown) distributions. MI-ClusterRegress simultaneously learns a model of the bag's internal structure, the relevance of each item, and a regression model that accurately predicts labels for new bags. We evaluated this approach on the challenging MIR problem of crop yield prediction from remote sensing data. MI-ClusterRegress provided predictions that were more accurate than those obtained with non-multiple-instance approaches or MIR methods that do not model the bag structure.

  11. Generic Feature Selection with Short Fat Data

    PubMed Central

    Clarke, B.; Chu, J.-H.

    2014-01-01

    SUMMARY Consider a regression problem in which there are many more explanatory variables than data points, i.e., p ≫ n. Essentially, without reducing the number of variables inference is impossible. So, we group the p explanatory variables into blocks by clustering, evaluate statistics on the blocks and then regress the response on these statistics under a penalized error criterion to obtain estimates of the regression coefficients. We examine the performance of this approach for a variety of choices of n, p, classes of statistics, clustering algorithms, penalty terms, and data types. When n is not large, the discrimination over number of statistics is weak, but computations suggest regressing on approximately [n/K] statistics where K is the number of blocks formed by a clustering algorithm. Small deviations from this are observed when the blocks of variables are of very different sizes. Larger deviations are observed when the penalty term is an Lq norm with high enough q. PMID:25346546

  12. Bankfull characteristics of Ohio streams and their relation to peak streamflows

    USGS Publications Warehouse

    Sherwood, James M.; Huitger, Carrie A.

    2005-01-01

    Regional curves, simple-regression equations, and multiple-regression equations were developed to estimate bankfull width, bankfull mean depth, bankfull cross-sectional area, and bankfull discharge of rural, unregulated streams in Ohio. The methods are based on geomorphic, basin, and flood-frequency data collected at 50 study sites on unregulated natural alluvial streams in Ohio, of which 40 sites are near streamflow-gaging stations. The regional curves and simple-regression equations relate the bankfull characteristics to drainage area. The multiple-regression equations relate the bankfull characteristics to drainage area, main-channel slope, main-channel elevation index, median bed-material particle size, bankfull cross-sectional area, and local-channel slope. Average standard errors of prediction for bankfull width equations range from 20.6 to 24.8 percent; for bankfull mean depth, 18.8 to 20.6 percent; for bankfull cross-sectional area, 25.4 to 30.6 percent; and for bankfull discharge, 27.0 to 78.7 percent. The simple-regression (drainage-area only) equations have the highest average standard errors of prediction. The multiple-regression equations in which the explanatory variables included drainage area, main-channel slope, main-channel elevation index, median bed-material particle size, bankfull cross-sectional area, and local-channel slope have the lowest average standard errors of prediction. Field surveys were done at each of the 50 study sites to collect the geomorphic data. Bankfull indicators were identified and evaluated, cross-section and longitudinal profiles were surveyed, and bed- and bank-material were sampled. Field data were analyzed to determine various geomorphic characteristics such as bankfull width, bankfull mean depth, bankfull cross-sectional area, bankfull discharge, streambed slope, and bed- and bank-material particle-size distribution. 
The various geomorphic characteristics were analyzed by means of a combination of graphical and statistical techniques. The logarithms of the annual peak discharges for the 40 gaged study sites were fit by a Pearson Type III frequency distribution to develop flood-peak discharges associated with recurrence intervals of 2, 5, 10, 25, 50, and 100 years. The peak-frequency data were related to geomorphic, basin, and climatic variables by multiple-regression analysis. Simple-regression equations were developed to estimate 2-, 5-, 10-, 25-, 50-, and 100-year flood-peak discharges of rural, unregulated streams in Ohio from bankfull channel cross-sectional area. The average standard errors of prediction are 31.6, 32.6, 35.9, 41.5, 46.2, and 51.2 percent, respectively. The study and methods developed are intended to improve understanding of the relations between geomorphic, basin, and flood characteristics of streams in Ohio and to aid in the design of hydraulic structures, such as culverts and bridges, where stability of the stream and structure is an important element of the design criteria. The study was done in cooperation with the Ohio Department of Transportation and the U.S. Department of Transportation, Federal Highway Administration.
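
    A simple-regression equation relating a bankfull characteristic to drainage area can be sketched as a least-squares line fitted in log-log space. The power-law form, the function name `fit_power_law`, and the synthetic data below are assumptions made for illustration; the abstract does not give the fitted equations or the functional form used in the report.

```python
import math


def fit_power_law(areas, values):
    """Fit value = a * area**b by ordinary least squares on log10-transformed data."""
    xs = [math.log10(a) for a in areas]
    ys = [math.log10(v) for v in values]
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    b = sxy / sxx                      # slope in log space = exponent
    a = 10 ** (mean_y - b * mean_x)    # intercept in log space -> coefficient
    return a, b


# Synthetic, noise-free data generated from width = 12 * area**0.4 (hypothetical).
areas = [1.0, 10.0, 100.0, 1000.0]
widths = [12.0 * area ** 0.4 for area in areas]

a, b = fit_power_law(areas, widths)
print(f"width = {a:.2f} * area^{b:.2f}")
```

    With real survey data the points scatter around the line, and that scatter is what the standard errors of prediction quoted in the abstract summarize.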

  13. Spatial Autocorrelation Approaches to Testing Residuals from Least Squares Regression.

    PubMed

    Chen, Yanguang

    2016-01-01

    In geo-statistics, the Durbin-Watson test is frequently employed to detect the presence of residual serial correlation from least squares regression analyses. However, the Durbin-Watson statistic is only suitable for ordered time or spatial series. If the variables comprise cross-sectional data coming from spatial random sampling, the test will be ineffectual because the value of Durbin-Watson's statistic depends on the sequence of data points. This paper develops two new statistics for testing serial correlation of residuals from least squares regression based on spatial samples. By analogy with the new form of Moran's index, an autocorrelation coefficient is defined with a standardized residual vector and a normalized spatial weight matrix. Then by analogy with the Durbin-Watson statistic, two types of new serial correlation indices are constructed. As a case study, the two newly presented statistics are applied to a spatial sample of 29 China's regions. These results show that the new spatial autocorrelation models can be used to test the serial correlation of residuals from regression analysis. In practice, the new statistics can make up for the deficiencies of the Durbin-Watson test.
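
    The order dependence of the Durbin-Watson statistic described above is easy to see numerically: the same residuals give very different values once their sequence changes. The sketch below uses the textbook definition of the statistic and a small set of made-up residuals. It does not implement the two new spatial statistics proposed in the paper.

```python
def durbin_watson(residuals):
    """Durbin-Watson statistic: sum of squared successive differences
    divided by the sum of squared residuals."""
    numerator = sum(
        (residuals[i] - residuals[i - 1]) ** 2 for i in range(1, len(residuals))
    )
    denominator = sum(e * e for e in residuals)
    return numerator / denominator


# Hypothetical residuals in a smooth, trending order (positive autocorrelation).
ordered = [1.0, 0.8, 0.5, 0.1, -0.2, -0.6, -0.9, -0.7]
# The same eight values rearranged so that the signs alternate.
reordered = [1.0, -0.9, 0.8, -0.7, 0.5, -0.6, 0.1, -0.2]

print(round(durbin_watson(ordered), 3))    # well below 2
print(round(durbin_watson(reordered), 3))  # well above 2
```

    For cross-sectional spatial samples there is no natural ordering of the points, so either value could be obtained from the same regression, which is the deficiency the paper's spatial weight matrix is meant to address.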

  14. Biostatistics Series Module 10: Brief Overview of Multivariate Methods.

    PubMed

    Hazra, Avijit; Gogtay, Nithya

    2017-01-01

    Multivariate analysis refers to statistical techniques that simultaneously look at three or more variables in relation to the subjects under investigation with the aim of identifying or clarifying the relationships between them. These techniques have been broadly classified as dependence techniques, which explore the relationship between one or more dependent variables and their independent predictors, and interdependence techniques, that make no such distinction but treat all variables equally in a search for underlying relationships. Multiple linear regression models a situation where a single numerical dependent variable is to be predicted from multiple numerical independent variables. Logistic regression is used when the outcome variable is dichotomous in nature. The log-linear technique models count type of data and can be used to analyze cross-tabulations where more than two variables are included. Analysis of covariance is an extension of analysis of variance (ANOVA), in which an additional independent variable of interest, the covariate, is brought into the analysis. It tries to examine whether a difference persists after "controlling" for the effect of the covariate that can impact the numerical dependent variable of interest. Multivariate analysis of variance (MANOVA) is a multivariate extension of ANOVA used when multiple numerical dependent variables have to be incorporated in the analysis. Interdependence techniques are more commonly applied to psychometrics, social sciences and market research. Exploratory factor analysis and principal component analysis are related techniques that seek to extract from a larger number of metric variables, a smaller number of composite factors or components, which are linearly related to the original variables. Cluster analysis aims to identify, in a large number of cases, relatively homogeneous groups called clusters, without prior information about the groups. 
The calculation-intensive nature of multivariate analysis has so far precluded most researchers from using these techniques routinely. The situation is now changing with the wider availability and increasing sophistication of statistical software, and researchers should no longer shy away from exploring the applications of multivariate methods to real-life data sets.

  15. DOE Office of Scientific and Technical Information (OSTI.GOV)

    Boutilier, Justin J., E-mail: j.boutilier@mail.utoronto.ca; Lee, Taewoo; Craig, Tim

    Purpose: To develop and evaluate the clinical applicability of advanced machine learning models that simultaneously predict multiple optimization objective function weights from patient geometry for intensity-modulated radiation therapy of prostate cancer. Methods: A previously developed inverse optimization method was applied retrospectively to determine optimal objective function weights for 315 treated patients. The authors used an overlap volume ratio (OV) of bladder and rectum for different PTV expansions and overlap volume histogram slopes (OVSR and OVSB for the rectum and bladder, respectively) as explanatory variables that quantify patient geometry. Using the optimal weights as ground truth, the authors trained and applied three prediction models: logistic regression (LR), multinomial logistic regression (MLR), and weighted K-nearest neighbor (KNN). The population average of the optimal objective function weights was also calculated. Results: The OV at 0.4 cm and OVSR at 0.1 cm features were found to be the most predictive of the weights. The authors observed comparable performance (i.e., no statistically significant difference) between LR, MLR, and KNN methodologies, with LR appearing to perform the best. All three machine learning models outperformed the population average by a statistically significant amount over a range of clinical metrics including bladder/rectum V53Gy, bladder/rectum V70Gy, and dose to the bladder, rectum, CTV, and PTV. When comparing the weights directly, the LR model predicted bladder and rectum weights that had, on average, a 73% and 74% relative improvement over the population average weights, respectively. The treatment plans resulting from the LR weights had, on average, a rectum V70Gy that was 35% closer to the clinical plan and a bladder V70Gy that was 29% closer, compared to the population average weights. Similar results were observed for all other clinical metrics.
Conclusions: The authors demonstrated that the KNN and MLR weight prediction methodologies perform comparably to the LR model and can produce clinical quality treatment plans by simultaneously predicting multiple weights that capture trade-offs associated with sparing multiple OARs.

  16. Impacts of education level and employment status on health-related quality of life in multiple sclerosis patients.

    PubMed

    Šabanagić-Hajrić, Selma; Alajbegović, Azra

    2015-02-01

    To evaluate the impacts of education level and employment status on health-related quality of life (HRQoL) in multiple sclerosis patients. This study included 100 multiple sclerosis patients treated at the Department of Neurology, Clinical Center of the University of Sarajevo. Inclusion criteria were an Expanded Disability Status Scale (EDSS) score between 1.0 and 6.5, age between 18 and 65 years, and stable disease on enrollment. Quality of life (QoL) was evaluated by the Multiple Sclerosis Quality of Life-54 questionnaire (MSQoL-54). Mann-Whitney and Kruskal-Wallis tests were used for comparisons. Linear regression analyses were performed to evaluate the value of educational level and employment status in predicting MSQoL-54 physical and mental composite scores. Full employment status had a positive impact on physical health (54.85 vs. 37.90; p < 0.001) and mental health (59.55 vs. 45.90; p < 0.001) composite scores. Employment status retained its independent predictability for both physical (r² = 0.105) and mental (r² = 0.076) composite scores in linear regression analysis. Patients with a college degree had slightly higher median values of physical (49.36 vs. 45.30) and mental health composite scores (66.74 vs. 55.62) compared to others, without a statistically significant difference. Employment proved to be an important factor in predicting quality of life in multiple sclerosis patients. Higher education level may determine better QoL but without significant predictive value. Sustained employment and development of vocational rehabilitation programs for MS patients living in a country with a high unemployment level is an important factor in improving both physical and mental health outcomes in MS patients.

  17. An Update on Statistical Boosting in Biomedicine.

    PubMed

    Mayr, Andreas; Hofner, Benjamin; Waldmann, Elisabeth; Hepp, Tobias; Meyer, Sebastian; Gefeller, Olaf

    2017-01-01

    Statistical boosting algorithms have triggered a lot of research during the last decade. They combine a powerful machine learning approach with classical statistical modelling, offering various practical advantages like automated variable selection and implicit regularization of effect estimates. They are extremely flexible, as the underlying base-learners (regression functions defining the type of effect for the explanatory variables) can be combined with any kind of loss function (target function to be optimized, defining the type of regression setting). In this review article, we highlight the most recent methodological developments on statistical boosting regarding variable selection, functional regression, and advanced time-to-event modelling. Additionally, we provide a short overview on relevant applications of statistical boosting in biomedicine.

  18. Statistical methods for detecting and comparing periodic data and their application to the nycthemeral rhythm of bodily harm: A population based study

    PubMed Central

    2010-01-01

    Background Animals, including humans, exhibit a variety of biological rhythms. This article describes a method for the detection and simultaneous comparison of multiple nycthemeral rhythms. Methods A statistical method for detecting periodic patterns in time-related data via harmonic regression is described. The method is particularly capable of detecting nycthemeral rhythms in medical data. Additionally a method for simultaneously comparing two or more periodic patterns is described, which derives from the analysis of variance (ANOVA). This method statistically confirms or rejects equality of periodic patterns. Mathematical descriptions of the detecting method and the comparing method are displayed. Results Nycthemeral rhythms of incidents of bodily harm in Middle Franconia are analyzed in order to demonstrate both methods. Every day of the week showed a significant nycthemeral rhythm of bodily harm. These seven patterns of the week were compared to each other revealing only two different nycthemeral rhythms, one for Friday and Saturday and one for the other weekdays. PMID:21059197
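    The harmonic-regression step described in this abstract can be illustrated with a minimal sketch. It assumes a single 24-hour cosine component fitted by ordinary least squares; the function names, the synthetic hourly counts and the choice of one harmonic are illustrative assumptions, not the authors' published procedure, and the ANOVA-based comparison of several rhythms is not reproduced here.

```python
import math


def solve_linear(a, b):
    """Solve the square system a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        s = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - s) / m[i][i]
    return x


def fit_harmonic(hours, counts, period=24.0):
    """Least-squares fit of y = mesor + a*cos(w t) + b*sin(w t).

    Returns (mesor, amplitude, peak_hour), where peak_hour is the time
    of the fitted maximum within one period.
    """
    w = 2 * math.pi / period
    rows = [[1.0, math.cos(w * t), math.sin(w * t)] for t in hours]
    k = 3
    # Normal equations (X'X) beta = X'y.
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(k)] for i in range(k)]
    xty = [sum(r[i] * y for r, y in zip(rows, counts)) for i in range(k)]
    mesor, a, b = solve_linear(xtx, xty)
    amplitude = math.hypot(a, b)
    peak_hour = (math.atan2(b, a) / w) % period
    return mesor, amplitude, peak_hour


# Synthetic (hypothetical) hourly incident counts over two days, peaking at 22:00.
demo_hours = list(range(48))
demo_counts = [10 + 4 * math.cos(2 * math.pi * (t - 22) / 24) for t in demo_hours]
print(fit_harmonic(demo_hours, demo_counts))
```

    A rhythm is usually judged present when the fitted amplitude is significantly different from zero; that test is omitted from this sketch.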

  19. Statistically Controlling for Confounding Constructs Is Harder than You Think

    PubMed Central

    Westfall, Jacob; Yarkoni, Tal

    2016-01-01

    Social scientists often seek to demonstrate that a construct has incremental validity over and above other related constructs. However, these claims are typically supported by measurement-level models that fail to consider the effects of measurement (un)reliability. We use intuitive examples, Monte Carlo simulations, and a novel analytical framework to demonstrate that common strategies for establishing incremental construct validity using multiple regression analysis exhibit extremely high Type I error rates under parameter regimes common in many psychological domains. Counterintuitively, we find that error rates are highest—in some cases approaching 100%—when sample sizes are large and reliability is moderate. Our findings suggest that a potentially large proportion of incremental validity claims made in the literature are spurious. We present a web application (http://jakewestfall.org/ivy/) that readers can use to explore the statistical properties of these and other incremental validity arguments. We conclude by reviewing SEM-based statistical approaches that appropriately control the Type I error rate when attempting to establish incremental validity. PMID:27031707

  20. Parametric Analysis to Study the Influence of Aerogel-Based Renders' Components on Thermal and Mechanical Performance.

    PubMed

    Ximenes, Sofia; Silva, Ana; Soares, António; Flores-Colen, Inês; de Brito, Jorge

    2016-05-04

    Statistical models using multiple linear regression are some of the most widely used methods to study the influence of independent variables in a given phenomenon. This study's objective is to understand the influence of the various components of aerogel-based renders on their thermal and mechanical performance, namely cement (three types), fly ash, aerial lime, silica sand, expanded clay, type of aerogel, expanded cork granules, expanded perlite, air entrainers, resins (two types), and rheological agent. The statistical analysis was performed using SPSS (Statistical Package for Social Sciences), based on 85 mortar mixes produced in the laboratory and on their values of thermal conductivity and compressive strength obtained using tests in small-scale samples. The results showed that aerial lime assumes the main role in improving the thermal conductivity of the mortars. Aerogel type, fly ash, expanded perlite and air entrainers are also relevant components for a good thermal conductivity. Expanded clay can improve the mechanical behavior and aerogel has the opposite effect.

  1. The mediating effect of calling on the relationship between medical school students' academic burnout and empathy.

    PubMed

    Chae, Su Jin; Jeong, So Mi; Chung, Yoon-Sok

    2017-09-01

    This study is aimed at identifying the relationships between medical school students' academic burnout, empathy, and calling, and determining whether their calling has a mediating effect on the relationship between academic burnout and empathy. A mixed method study was conducted. One hundred twenty-seven medical students completed a survey. Scales measuring academic burnout, medical students' empathy, and calling were utilized. For statistical analysis, correlation analysis, descriptive statistics analysis, and hierarchical multiple regression analyses were conducted. For qualitative approach, eight medical students participated in a focus group interview. The study found that empathy has a statistically significant, negative correlation with academic burnout, while having a significant, positive correlation with calling. Sense of calling proved to be an effective mediator of the relationship between academic burnout and empathy. This result demonstrates that calling is a key variable that mediates the relationship between medical students' academic burnout and empathy. As such, this study provides baseline data for an education that could improve medical students' empathy skills.

  2. Parametric Analysis to Study the Influence of Aerogel-Based Renders’ Components on Thermal and Mechanical Performance

    PubMed Central

    Ximenes, Sofia; Silva, Ana; Soares, António; Flores-Colen, Inês; de Brito, Jorge

    2016-01-01

    Statistical models using multiple linear regression are some of the most widely used methods to study the influence of independent variables in a given phenomenon. This study’s objective is to understand the influence of the various components of aerogel-based renders on their thermal and mechanical performance, namely cement (three types), fly ash, aerial lime, silica sand, expanded clay, type of aerogel, expanded cork granules, expanded perlite, air entrainers, resins (two types), and rheological agent. The statistical analysis was performed using SPSS (Statistical Package for Social Sciences), based on 85 mortar mixes produced in the laboratory and on their values of thermal conductivity and compressive strength obtained using tests in small-scale samples. The results showed that aerial lime assumes the main role in improving the thermal conductivity of the mortars. Aerogel type, fly ash, expanded perlite and air entrainers are also relevant components for a good thermal conductivity. Expanded clay can improve the mechanical behavior and aerogel has the opposite effect. PMID:28773460

  3. Simplified estimation of age-specific reference intervals for skewed data.

    PubMed

    Wright, E M; Royston, P

    1997-12-30

    Age-specific reference intervals are commonly used in medical screening and clinical practice, where interest lies in the detection of extreme values. Many different statistical approaches have been published on this topic. The advantages of a parametric method are that they necessarily produce smooth centile curves, the entire density is estimated and an explicit formula is available for the centiles. The method proposed here is a simplified version of a recent approach proposed by Royston and Wright. Basic transformations of the data and multiple regression techniques are combined to model the mean, standard deviation and skewness. Using these simple tools, which are implemented in almost all statistical computer packages, age-specific reference intervals may be obtained. The scope of the method is illustrated by fitting models to several real data sets and assessing each model using goodness-of-fit techniques.

  4. Investigation of Relationship between QBO and Ionospheric Neutral Temperature

    NASA Astrophysics Data System (ADS)

    Saǧır, Selçuk; Atıcı, Ramazan; Özcan, Osman

    2016-07-01

    The relationship between the Quasi-Biennial Oscillation (QBO) measured at the 10 hPa level and the neutral temperature obtained from the NRLMSIS-00 model for 90 km altitude in the ionosphere is statistically investigated. A multiple-regression model is used for this study. To see the effect of QBO direction on neutral temperature, dummy variables are added to the established model. The analysis shows that the QBO has an effect on the neutral temperature of the ionosphere. It is determined that 57% of the variation in neutral temperature can be explained by the QBO. According to the established model, the statistical explainable ratio was determined to be 1%, which is the highest rate. It is also seen that an increase/decrease of 1 meter per second in the QBO gives rise to an increase/decrease of 0.07 K in neutral temperature.

  5. The effect of attending tutoring on course grades in Calculus I

    NASA Astrophysics Data System (ADS)

    Rickard, Brian; Mills, Melissa

    2018-04-01

    Tutoring centres are common in universities in the United States, but there are few published studies that statistically examine the effects of tutoring on student success. This study utilizes multiple regression analysis to model the effect of tutoring attendance on final course grades in Calculus I. Our model predicted that every three visits to the tutoring centre are correlated with an increase in a student's course grade of one per cent, after controlling for prior academic ability. We also found that for lower-achieving students, attending tutoring had a greater impact on final grades.
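    What "after controlling for prior academic ability" means in a multiple regression can be shown with a small sketch. The data below are synthetic and constructed so that three visits correspond to exactly one grade point; the variable names (`visits`, `prior_gpa`, `grade`) and the two-predictor closed-form solution are illustrative assumptions, not the study's data or model.

```python
def fit_two_predictors(x1, x2, y):
    """Ordinary least squares for y = b0 + b1*x1 + b2*x2 using centred sums."""
    n = len(y)
    m1, m2, my = sum(x1) / n, sum(x2) / n, sum(y) / n
    s11 = sum((a - m1) ** 2 for a in x1)
    s22 = sum((b - m2) ** 2 for b in x2)
    s12 = sum((a - m1) * (b - m2) for a, b in zip(x1, x2))
    s1y = sum((a - m1) * (c - my) for a, c in zip(x1, y))
    s2y = sum((b - m2) * (c - my) for b, c in zip(x2, y))
    det = s11 * s22 - s12 ** 2  # zero if the predictors are perfectly collinear
    b1 = (s22 * s1y - s12 * s2y) / det
    b2 = (s11 * s2y - s12 * s1y) / det
    b0 = my - b1 * m1 - b2 * m2
    return b0, b1, b2


# Hypothetical students: tutoring visits, prior GPA, and final grade (per cent).
visits = [0, 3, 6, 9, 12, 3]
prior_gpa = [2.0, 3.5, 2.5, 3.0, 4.0, 2.2]
grade = [50 + v / 3 + 10 * g for v, g in zip(visits, prior_gpa)]

b0, b_visits, b_gpa = fit_two_predictors(visits, prior_gpa, grade)
print("grade change per three visits, holding prior GPA fixed:", 3 * b_visits)
```

    The coefficient on `visits` is the expected grade difference between students with the same prior GPA who differ by one visit, which is why it is read as an association net of prior ability rather than as a causal effect.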

  6. Data on the relationships between financing strategies, entrepreneurial competencies and business growth of technology-based SMEs in Nigeria.

    PubMed

    Ibidunni, Ayodotun Stephen; Kehinde, Oladele Joseph; Ibidunni, Oyebisi Mary; Olokundun, Maxwell Ayodele; Olubusayo, Falola Hezekiah; Salau, Odunayo Paul; Borishade, Taiye Tairat; Fred, Peter

    2018-06-01

    The article presents data on the relationship between financing strategies, entrepreneurial competencies and business growth of technology-based SMEs in Nigeria. Copies of a structured questionnaire were administered to 233 SME owners and financial managers. Using descriptive and standard multiple regression statistical analysis, the data revealed that venture capital and business donations significantly influence profit growth of technology-based SMEs. Moreover, the data revealed that technology-based firms can enhance their access to financing through capacity building in entrepreneurial competencies, such as acquiring the right skills and attitude.

  7. Limb-darkening and the structure of the Jovian atmosphere

    NASA Technical Reports Server (NTRS)

    Newman, W. I.; Sagan, C.

    1978-01-01

    By observing the transit of various cloud features across the Jovian disk, limb-darkening curves were constructed for three regions in the 4.6 to 5.1 μm band. Several models currently employed in describing the radiative or dynamical properties of planetary atmospheres are here examined to understand their implications for limb-darkening. The statistical problem of fitting these models to the observed data is reviewed and methods for applying multiple regression analysis are discussed. Analysis of variance techniques are introduced to test the viability of a given physical process as a cause of the observed limb-darkening.

  8. Comparison of Nine Statistical Model Based Warfarin Pharmacogenetic Dosing Algorithms Using the Racially Diverse International Warfarin Pharmacogenetic Consortium Cohort Database

    PubMed Central

    Liu, Rong; Li, Xi; Zhang, Wei; Zhou, Hong-Hao

    2015-01-01

    Objective Multiple linear regression (MLR) and machine learning techniques in pharmacogenetic algorithm-based warfarin dosing have been reported. However, performances of these algorithms in racially diverse groups have never been objectively evaluated and compared. In this literature-based study, we compared the performances of eight machine learning techniques with those of MLR in a large, racially diverse cohort. Methods MLR, artificial neural network (ANN), regression tree (RT), multivariate adaptive regression splines (MARS), boosted regression tree (BRT), support vector regression (SVR), random forest regression (RFR), lasso regression (LAR) and Bayesian additive regression trees (BART) were applied in warfarin dose algorithms in a cohort from the International Warfarin Pharmacogenetics Consortium database. Covariates obtained by stepwise regression from 80% of randomly selected patients were used to develop algorithms. To compare the performances of these algorithms, the mean percentage of patients whose predicted dose fell within 20% of the actual dose (mean percentage within 20%) and the mean absolute error (MAE) were calculated in the remaining 20% of patients. The performances of these techniques in different races, as well as across the dose ranges of therapeutic warfarin, were compared. Robust results were obtained after 100 rounds of resampling. Results BART, MARS and SVR were statistically indistinguishable and significantly outperformed all the other approaches in the whole cohort (MAE: 8.84–8.96 mg/week, mean percentage within 20%: 45.88%–46.35%). In the White population, MARS and BART showed higher mean percentage within 20% and lower mean MAE than those of MLR (all p values < 0.05). In the Asian population, SVR, BART, MARS and LAR performed the same as MLR. MLR and LAR performed best in the Black population.
When patients were grouped in terms of warfarin dose range, all machine learning techniques except ANN and LAR showed significantly higher mean percentage within 20%, and lower MAE (all p values < 0.05) than MLR in the low- and high-dose ranges. Conclusion Overall, the machine learning-based techniques BART, MARS and SVR performed better than MLR in warfarin pharmacogenetic dosing. Differences in the algorithms' performances exist among the races. Moreover, machine learning-based algorithms tended to perform better than MLR in the low- and high-dose ranges. PMID:26305568
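    The two evaluation metrics named in this abstract, mean absolute error and percentage of patients whose predicted dose falls within 20% of the actual dose, are simple to compute. The sketch below is an illustration only; the function names and the example doses are hypothetical and are not taken from the consortium data.

```python
def mean_absolute_error(actual, predicted):
    """Average absolute difference between actual and predicted doses."""
    return sum(abs(a - p) for a, p in zip(actual, predicted)) / len(actual)


def percent_within_20(actual, predicted):
    """Percentage of patients whose predicted dose is within 20% of the actual dose."""
    hits = sum(1 for a, p in zip(actual, predicted) if abs(p - a) <= 0.2 * a)
    return 100.0 * hits / len(actual)


# Hypothetical weekly warfarin doses (mg/week) for four held-out patients.
actual_dose = [20, 30, 40, 50]
predicted_dose = [22, 40, 41, 39]
print(mean_absolute_error(actual_dose, predicted_dose))
print(percent_within_20(actual_dose, predicted_dose))
```

    In a design like the one described, both functions would be applied to the held-out 20% of patients in each resampling round and the results averaged over rounds.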

  9. A Highly Efficient Design Strategy for Regression with Outcome Pooling

    PubMed Central

    Mitchell, Emily M.; Lyles, Robert H.; Manatunga, Amita K.; Perkins, Neil J.; Schisterman, Enrique F.

    2014-01-01

    The potential for research involving biospecimens can be hindered by the prohibitive cost of performing laboratory assays on individual samples. To mitigate this cost, strategies such as randomly selecting a portion of specimens for analysis or randomly pooling specimens prior to performing laboratory assays may be employed. These techniques, while effective in reducing cost, are often accompanied by a considerable loss of statistical efficiency. We propose a novel pooling strategy based on the k-means clustering algorithm to reduce laboratory costs while maintaining a high level of statistical efficiency when predictor variables are measured on all subjects, but the outcome of interest is assessed in pools. We perform simulations motivated by the BioCycle study to compare this k-means pooling strategy with current pooling and selection techniques under simple and multiple linear regression models. While all of the methods considered produce unbiased estimates and confidence intervals with appropriate coverage, pooling under k-means clustering provides the most precise estimates, closely approximating results from the full data and losing minimal precision as the total number of pools decreases. The benefits of k-means clustering evident in the simulation study are then applied to an analysis of the BioCycle dataset. In conclusion, when the number of lab tests is limited by budget, pooling specimens based on k-means clustering prior to performing lab assays can be an effective way to save money with minimal information loss in a regression setting. PMID:25220822

  10. A highly efficient design strategy for regression with outcome pooling.

    PubMed

    Mitchell, Emily M; Lyles, Robert H; Manatunga, Amita K; Perkins, Neil J; Schisterman, Enrique F

    2014-12-10

    The potential for research involving biospecimens can be hindered by the prohibitive cost of performing laboratory assays on individual samples. To mitigate this cost, strategies such as randomly selecting a portion of specimens for analysis or randomly pooling specimens prior to performing laboratory assays may be employed. These techniques, while effective in reducing cost, are often accompanied by a considerable loss of statistical efficiency. We propose a novel pooling strategy based on the k-means clustering algorithm to reduce laboratory costs while maintaining a high level of statistical efficiency when predictor variables are measured on all subjects, but the outcome of interest is assessed in pools. We perform simulations motivated by the BioCycle study to compare this k-means pooling strategy with current pooling and selection techniques under simple and multiple linear regression models. While all of the methods considered produce unbiased estimates and confidence intervals with appropriate coverage, pooling under k-means clustering provides the most precise estimates, closely approximating results from the full data and losing minimal precision as the total number of pools decreases. The benefits of k-means clustering evident in the simulation study are then applied to an analysis of the BioCycle dataset. In conclusion, when the number of lab tests is limited by budget, pooling specimens based on k-means clustering prior to performing lab assays can be an effective way to save money with minimal information loss in a regression setting. Copyright © 2014 John Wiley & Sons, Ltd.

  11. Relationship Between Column-Density and Surface Mixing Ratio: Statistical Analysis of O3 and NO2 Data from the July 2011 Maryland DISCOVER-AQ Mission

    NASA Technical Reports Server (NTRS)

    Flynn, Clare; Pickering, Kenneth E.; Crawford, James H.; Lamsol, Lok; Krotkov, Nickolay; Herman, Jay; Weinheimer, Andrew; Chen, Gao; Liu, Xiong; Szykman, James

    2014-01-01

    To investigate the ability of column (or partial column) information to represent surface air quality, results of linear regression analyses between surface mixing ratio data and column abundances for O3 and NO2 are presented for the July 2011 Maryland deployment of the DISCOVER-AQ mission. Data collected by the P-3B aircraft, ground-based Pandora spectrometers, Aura/OMI satellite instrument, and simulations for July 2011 from the CMAQ air quality model during this deployment provide a large and varied data set, allowing this problem to be approached from multiple perspectives. O3 columns typically exhibited a statistically significant and high degree of correlation with surface data (R² > 0.64) in the P-3B data set, a moderate degree of correlation (0.16 < R² < 0.64) in the CMAQ data set, and a low degree of correlation (R² < 0.16) in the Pandora and OMI data sets. NO2 columns typically exhibited a low to moderate degree of correlation with surface data in each data set. The results of linear regression analyses for O3 exhibited smaller errors relative to the observations than NO2 regressions. These results suggest that O3 partial column observations from future satellite instruments with sufficient sensitivity to the lower troposphere can be meaningful for surface air quality analysis.

  12. Associations between low back pain, urinary incontinence, and abdominal muscle recruitment as assessed via ultrasonography in the elderly.

    PubMed

    Figueiredo, Vânia F; Amorim, Juleimar S C; Pereira, Aline M; Ferreira, Paulo H; Pereira, Leani S M

    2015-01-01

    Low back pain (LBP) and urinary incontinence (UI) are highly prevalent among elderly individuals. In young adults, changes in trunk muscle recruitment, as assessed via ultrasound imaging, may be associated with lumbar spine stability. To assess the associations between LBP, UI, and the pattern of transversus abdominis (TrA), internal (IO), and external oblique (EO) muscle recruitment in the elderly as evaluated by ultrasound imaging. Fifty-four elderly individuals (mean age: 72±5.2 years) who complained of LBP and/or UI as assessed by the McGill Pain Questionnaire, Incontinence Questionnaire-Short Form, and ultrasound imaging were included in the study. The statistical analysis comprised a multiple linear regression model, and a p-value <0.05 was considered significant. The regression models for the TrA, IO, and EO muscle thickness levels explained 2.0% (R²=0.02; F=0.47; p=0.628), 10.6% (R²=0.106; F=3.03; p=0.057), and 10.1% (R²=0.101; F=2.70; p=0.077) of the variability, respectively. None of the regression models developed for the abdominal muscles exhibited statistical significance. A significant and negative association (p=0.018; β=-0.0343) was observed only between UI and IO recruitment. These results suggest that age-related factors may have interfered with the findings of the study, thus emphasizing the need to perform ultrasound imaging-based studies to measure abdominal muscle recruitment in the elderly.

  13. [Research on relations among self-esteem, self-harmony and interpersonal-harmony of university students].

    PubMed

    Zhang, Hualing

    2014-03-01

    To examine the characteristics of, and relationships among, self-esteem, self-harmony and interpersonal harmony in university students, in order to provide a basis for mental health education. Using stratified cluster random sampling, a questionnaire survey was conducted among 820 university students from 16 classes at four universities, chosen from 30 universities in Anhui Province. The Rosenberg Self-esteem Scale, Self-harmony Scale and Interpersonal-harmony Diagnostic Scale were used for assessment. The average self-esteem score was (30.71 +/- 4.77), higher than the theoretical median of 25, and differences were statistically significant by gender (P = 0.004), origin (P = 0.038) and only-child status (P = 0.005). The average self-harmony score was (98.66 +/- 8.69); 112 students (13.7%) fell in the low-score group, 442 (53.95%) in the middle-score group and 265 (32.33%) in the high-score group. There were no statistically significant differences in the total self-harmony score or in most subscale scores by gender or origin, but there was a statistically significant difference by only-child status (P = 0.004). On the "stereotype" subscale, the difference between students from urban and rural areas was statistically significant (P = 0.006). Every dimension of self-esteem, self-harmony and interpersonal harmony was significantly correlated. Multiple regression analysis found that when self-esteem was entered as a variable, the proportion of variance in interpersonal conversation difficulties explained by self-harmony dropped from 22.6% to 12%, with the standardized regression coefficient changing from 0.087 to 0.035. For difficulties in interpersonal dating, it fell from 27.6% to 13.1%, with the standardized regression coefficient changing from 0.104 to 0.019.
For difficulties in treating people, it fell from 30.9% to 15%, with the standardized regression coefficient changing from 0.079 to 0.020. For problems with heterosexual contact, it fell from 23.4% to 17.3%, with the standardized regression coefficient changing from 0.095 to 0.024. Self-esteem was a mediator variable between self-harmony and interpersonal harmony. Cultivating university students' self-esteem to promote self-harmony and interpersonal harmony can improve their mental health.

  14. A Comparison of the Performance of Advanced Statistical Techniques for the Refinement of Day-ahead and Longer NWP-based Wind Power Forecasts

    NASA Astrophysics Data System (ADS)

    Zack, J. W.

    2015-12-01

    Predictions from Numerical Weather Prediction (NWP) models are the foundation for wind power forecasts for day-ahead and longer forecast horizons. The NWP models directly produce three-dimensional wind forecasts on their respective computational grids. These can be interpolated to the location and time of interest. However, these direct predictions typically contain significant systematic errors ("biases"). This is due to a variety of factors including the limited space-time resolution of the NWP models and shortcomings in the model's representation of physical processes. It has become common practice to attempt to improve the raw NWP forecasts by statistically adjusting them through a procedure that is widely known as Model Output Statistics (MOS). The challenge is to identify complex patterns of systematic errors and then use this knowledge to adjust the NWP predictions. The MOS-based improvements are the basis for much of the value added by commercial wind power forecast providers. There are an enormous number of statistical approaches that can be used to generate the MOS adjustments to the raw NWP forecasts. In order to obtain insight into the potential value of some of the newer and more sophisticated statistical techniques often referred to as "machine learning methods" a MOS-method comparison experiment has been performed for wind power generation facilities in 6 wind resource areas of California. The underlying NWP models that provided the raw forecasts were the two primary operational models of the US National Weather Service: the GFS and NAM models. The focus was on 1- and 2-day ahead forecasts of the hourly wind-based generation. 
The statistical methods evaluated included: (1) screening multiple linear regression, which served as a baseline method, (2) artificial neural networks, (3) a decision-tree approach called random forests, (4) gradient boosted regression based upon a decision-tree algorithm, (5) support vector regression and (6) analog ensemble, which is a case-matching scheme. The presentation will provide (1) an overview of each method and the experimental design, (2) performance comparisons based on standard metrics such as bias, MAE and RMSE, (3) a summary of the performance characteristics of each approach and (4) a preview of further experiments to be conducted.
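    The idea behind a regression-based MOS adjustment, and the bias, MAE and RMSE metrics used to compare methods, can be sketched in a few lines. This is a deliberately simplified illustration with one predictor (the raw forecast) and made-up numbers; the screening multiple linear regression described in the abstract selects among many predictors, and all names below are hypothetical.

```python
import math


def fit_line(x, y):
    """Simple least-squares line y = intercept + slope * x."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((xi - mx) ** 2 for xi in x)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return my - slope * mx, slope


def scores(observed, forecast):
    """Return (bias, MAE, RMSE) of forecast minus observed."""
    err = [f - o for f, o in zip(forecast, observed)]
    n = len(err)
    bias = sum(err) / n
    mae = sum(abs(e) for e in err) / n
    rmse = math.sqrt(sum(e * e for e in err) / n)
    return bias, mae, rmse


# Hypothetical training sample: raw NWP-based power forecasts and observed generation (MW).
raw_train = [2.0, 4.0, 6.0, 8.0]
obs_train = [2.0, 3.0, 4.0, 5.0]
intercept, slope = fit_line(raw_train, obs_train)

# Apply the fitted adjustment to the raw forecasts and compare error statistics.
adjusted = [intercept + slope * r for r in raw_train]
print("raw     :", scores(obs_train, raw_train))
print("adjusted:", scores(obs_train, adjusted))
```

    In practice the adjustment would be fitted on a training period and scored on separate forecasts; scoring on the training sample, as done here for brevity, overstates the improvement.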

  15. CADDIS Volume 4. Data Analysis: PECBO Appendix - R Scripts for Non-Parametric Regressions

    EPA Pesticide Factsheets

    Script for computing nonparametric regression analysis. Overview of using scripts to infer environmental conditions from biological observations, statistically estimating species-environment relationships, statistical scripts.

  16. DOE Office of Scientific and Technical Information (OSTI.GOV)

    Penna, M.L.; Duchiade, M.P.

    This study examines the relationship between air pollution, measured as concentration of suspended particulates in the atmosphere, and infant mortality due to pneumonia in the metropolitan area of Rio de Janeiro. Multiple linear regression (progressive or stepwise method) was used to analyze infant mortality due to pneumonia, diarrhea, and all causes in 1980, by geographic area, income level, and degree of contamination. While the variable proportion of families with income equivalent to more than two minimum wages was included in the regressions corresponding to the three types of infant mortality, the average contamination index had a statistically significant coefficient (b = 0.2208; t = 2.670; P = 0.0137) only in the case of mortality due to pneumonia. This would suggest a biological association, but, as in any ecological study, such conclusions should be viewed with caution. The authors believe that air quality indicators are essential to consider in studies of acute respiratory infections in developing countries.

  17. Quantitative analysis of aircraft multispectral-scanner data and mapping of water-quality parameters in the James River in Virginia

    NASA Technical Reports Server (NTRS)

    Johnson, R. W.; Bahn, G. S.

    1977-01-01

    Statistical analysis techniques were applied to develop quantitative relationships between in situ river measurements and the remotely sensed data that were obtained over the James River in Virginia on 28 May 1974. The remotely sensed data were collected with a multispectral scanner and with photographs taken from an aircraft platform. Concentration differences among water quality parameters such as suspended sediment, chlorophyll a, and nutrients indicated significant spectral variations. Calibrated equations from the multiple regression analysis were used to develop maps that indicated the quantitative distributions of water quality parameters and the dispersion characteristics of a pollutant plume entering the turbid river system. Results from further analyses that use only three preselected multispectral scanner bands of data indicated that regression coefficients and standard errors of estimate were not appreciably degraded compared with results from the 10-band analysis.

  18. A Model for Oil-Gas Pipelines Cost Prediction Based on a Data Mining Process

    NASA Astrophysics Data System (ADS)

    Batzias, Fragiskos A.; Spanidis, Phillip-Mark P.

    2009-08-01

    This paper addresses the problems associated with the cost estimation of oil/gas pipelines during the elaboration of feasibility assessments. Techno-economic parameters, i.e., cost, length and diameter, are critical for such studies at the preliminary design stage. A methodology for the development of a cost prediction model based on Data Mining (DM) process is proposed. The design and implementation of a Knowledge Base (KB), maintaining data collected from various disciplines of the pipeline industry, are presented. The formulation of a cost prediction equation is demonstrated by applying multiple regression analysis using data sets extracted from the KB. Following the methodology proposed, a learning context is inductively developed as background pipeline data are acquired, grouped and stored in the KB, and through a linear regression model provide statistically substantial results, useful for project managers or decision makers.

  19. ℓ(p)-Norm multikernel learning approach for stock market price forecasting.

    PubMed

    Shao, Xigao; Wu, Kun; Liao, Bifeng

    2012-01-01

    Linear multiple kernel learning model has been used for predicting financial time series. However, ℓ(1)-norm multiple support vector regression is rarely observed to outperform trivial baselines in practical applications. To allow for robust kernel mixtures that generalize well, we adopt ℓ(p)-norm multiple kernel support vector regression (1 ≤ p < ∞) as a stock price prediction model. The optimization problem is decomposed into smaller subproblems, and the interleaved optimization strategy is employed to solve the regression model. The model is evaluated on forecasting the daily stock closing prices of Shanghai Stock Index in China. Experimental results show that our proposed model performs better than ℓ(1)-norm multiple support vector regression model.

  20. Dose-response relationships between internally-deposited uranium and select health outcomes in gaseous diffusion plant workers, 1948-2011.

    PubMed

    Yiin, James H; Anderson, Jeri L; Bertke, Stephen J; Tollerud, David J

    2018-05-09

    To examine dose-response relationships between internal uranium exposures and select outcomes among a cohort of uranium enrichment workers. Cox regression was conducted to examine associations between selected health outcomes and cumulative internal uranium with consideration for external ionizing radiation, work-related medical X-rays and contaminant radionuclides technetium (⁹⁹Tc) and plutonium (²³⁹Pu) as potential confounders. Elevated and monotonically increasing mortality risks were observed for kidney cancer, chronic renal diseases, and multiple myeloma, and the association with internal uranium absorbed organ dose was statistically significant for multiple myeloma. Adjustment for potential confounders had minimal impact on the risk estimates. Kidney cancer, chronic renal disease, and multiple myeloma mortality risks were elevated with increasing internal uranium absorbed organ dose. The findings add to evidence of an association between internal exposure to uranium and cancer. Future investigation includes a study of cancer incidence in this cohort. © 2018 Wiley Periodicals, Inc.

  1. Gradients in Depressive Symptoms by Socioeconomic Position Among Men Who Have Sex With Men in the EXPLORE Study.

    PubMed

    Pakula, Basia; Marshall, Brandon D L; Shoveller, Jean A; Chesney, Margaret A; Coates, Thomas J; Koblin, Beryl; Mayer, Kenneth; Mimiaga, Matthew; Operario, Don

    2016-08-01

    This study examines gradients in depressive symptoms by socioeconomic position (SEP; i.e., income, education, employment) in a sample of men who have sex with men (MSM). Data were used from EXPLORE, a randomized, controlled behavioral HIV prevention trial for HIV-uninfected MSM in six U.S. cities (n = 4,277). Depressive symptoms were assessed using the Center for Epidemiologic Studies Depression scale (short form). Multiple linear regressions were fitted with interaction terms to assess additive and multiplicative relationships between SEP and depressive symptoms. Depressive symptoms were more prevalent among MSM with lower income, lower educational attainment, and those in the unemployed/other employment category. Income, education, and employment made significant contributions in additive models after adjustment. The employment-income interaction was statistically significant, indicating a multiplicative effect. This study revealed gradients in depressive symptoms across SEP of MSM, pointing to income and employment status and, to a lesser extent, education as key factors for understanding heterogeneity of depressive symptoms.

  2. Nonparametric methods for drought severity estimation at ungauged sites

    NASA Astrophysics Data System (ADS)

    Sadri, S.; Burn, D. H.

    2012-12-01

    The objective in frequency analysis is, given extreme events such as drought severity or duration, to estimate the relationship between that event and the associated return periods at a catchment. Neural networks and other artificial intelligence approaches in function estimation and regression analysis are relatively new techniques in engineering, providing an attractive alternative to traditional statistical models. There are, however, few applications of neural networks and support vector machines in the area of severity quantile estimation for drought frequency analysis. In this paper, we compare three methods for this task: multiple linear regression, radial basis function neural networks, and least squares support vector regression (LS-SVR). The area selected for this study includes 32 catchments in the Canadian Prairies. From each catchment drought severities are extracted and fitted to a Pearson type III distribution, which act as observed values. For each method-duration pair, we use a jackknife algorithm to produce estimated values at each site. The results from these three approaches are compared and analyzed, and it is found that LS-SVR provides the best quantile estimates and extrapolating capacity.
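
    The jackknife step described above (hold out one site, fit on the rest, estimate the held-out site) can be sketched as follows. This is a simplified illustration rather than the authors' procedure: it uses a single predictor and ordinary least squares as a stand-in for the multiple regression and kernel methods compared in the paper, and the function names and data are hypothetical.

```python
def fit_line(xs, ys):
    """Ordinary least-squares fit of y = intercept + slope * x."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return mean_y - slope * mean_x, slope

def jackknife_predictions(xs, ys):
    """For each site i, fit on all other sites and predict the value at site i."""
    predictions = []
    for i in range(len(xs)):
        train_x = xs[:i] + xs[i + 1:]   # leave site i out
        train_y = ys[:i] + ys[i + 1:]
        intercept, slope = fit_line(train_x, train_y)
        predictions.append(intercept + slope * xs[i])
    return predictions

# Hypothetical catchment attribute (x) and severity quantile (y); exactly linear here.
site_x = [1.0, 2.0, 3.0, 4.0, 5.0]
site_y = [3.0, 5.0, 7.0, 9.0, 11.0]
estimates = jackknife_predictions(site_x, site_y)
```

    Comparing `estimates` with the values fitted at each site gives an error measure that mimics estimation at an ungauged location, because the held-out site never contributes to its own fit.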

  3. "Suicide shall cease to be a crime": suicide and undetermined death trends 1970-2000 before and after the decriminalization of suicide in Ireland 1993.

    PubMed

    Osman, Mugtaba; Parnell, Andrew C; Haley, Clifford

    2017-02-01

    Suicide is criminalized in more than 100 countries around the world. A dearth of research exists into the effect of suicide legislation on suicide rates and available statistics are mixed. This study investigates 10,353 suicide deaths in Ireland that took place between 1970 and 2000. Irish 1970-2000 annual suicide data were obtained from the Central Statistics Office and modelled via a negative binomial regression approach. We examined the effect of suicide legislation on different age groups and on both sexes. We used Bonferroni correction for multiple modelling. Statistical analysis was performed using the R statistical package version 3.1.2. The coefficient for the effect of the suicide act on overall suicide deaths was -9.094 (95 % confidence interval (CI) -34.086 to 15.899), statistically non-significant (p = 0.476). The coefficient for the effect of the suicide act on undetermined deaths was statistically significant (p < 0.001) and was estimated to be -644.4 (95 % CI -818.6 to -469.9). The results of our study indicate that legalization of suicide is not associated with a significant increase in subsequent suicide deaths. However, undetermined death verdict rates have significantly dropped following legalization of suicide.

  4. Validating the absolute reliability of a fat free mass estimate equation in hemodialysis patients using near-infrared spectroscopy.

    PubMed

    Kono, Kenichi; Nishida, Yusuke; Moriyama, Yoshihumi; Taoka, Masahiro; Sato, Takashi

    2015-06-01

    The assessment of nutritional states using fat free mass (FFM) measured with near-infrared spectroscopy (NIRS) is clinically useful. This measurement should incorporate the patient's post-dialysis weight ("dry weight"), in order to exclude the effects of any change in water mass. We therefore used NIRS to investigate the regression, independent variables, and absolute reliability of FFM in dry weight. The study included 47 outpatients from the hemodialysis unit. Body weight was measured before dialysis, and FFM was measured using NIRS before and after dialysis treatment. Multiple regression analysis was used to estimate the FFM in dry weight as the dependent variable. The measured FFM before dialysis treatment (Mw-FFM), and the difference between measured and dry weight (Mw-Dw) were independent variables. We performed Bland-Altman analysis to detect errors between the statistically estimated FFM and the measured FFM after dialysis treatment. The multiple regression equation to estimate the FFM in dry weight was: Dw-FFM = 0.038 + (0.984 × Mw-FFM) + (-0.571 × [Mw-Dw]) (R² = 0.99). There was no systematic bias between the estimated and the measured values of FFM in dry weight. Using NIRS, FFM in dry weight can be calculated by an equation including FFM in measured weight and the difference between the measured weight and the dry weight. © 2015 The Authors. Therapeutic Apheresis and Dialysis © 2015 International Society for Apheresis.
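
    As a worked illustration, the regression equation reported above can be written as a small Python function. The coefficients are taken from the abstract; the function name, the argument names, the sample values, and the assumption that all quantities are in kilograms are not from the source.

```python
def estimate_dry_weight_ffm(mw_ffm, measured_weight, dry_weight):
    """Estimate fat free mass at dry weight (Dw-FFM) from the reported equation.

    mw_ffm          -- FFM measured before dialysis (Mw-FFM), assumed kg
    measured_weight -- body weight measured before dialysis (Mw), assumed kg
    dry_weight      -- post-dialysis target weight (Dw), assumed kg
    """
    weight_difference = measured_weight - dry_weight   # the [Mw-Dw] term
    return 0.038 + 0.984 * mw_ffm - 0.571 * weight_difference

# Hypothetical patient values, for illustration only.
dw_ffm = estimate_dry_weight_ffm(mw_ffm=45.0, measured_weight=62.0, dry_weight=60.0)
```

    With these made-up inputs the estimate is about 43.18, slightly below the pre-dialysis FFM, which matches the negative coefficient on the weight difference.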

  5. [Trend in mortality from external causes in pregnant and postpartum women and its relationship to socioeconomic factors in Colombia, 1998-2010].

    PubMed

    Salazar, Edwin; Buitrago, Carolina; Molina, Federico; Alzate, Catalina Arango

    2015-05-01

    Determine the trend in mortality from external causes in pregnant and postpartum women and its relationship to socioeconomic factors. Descriptive study, based on the official registries of deaths reported by the National Statistics Agency, 1998-2010. The trend was analyzed using Poisson regressions. Bivariate correlations and multiple linear regression models were constructed to explore the relationship between mortality and socioeconomic factors: human development index, Gini index, gross domestic product, unsatisfied basic needs, unemployment rate, poverty, extreme poverty, quality of life index, illiteracy rate, and percentage of affiliation to the Social Security System. A total of 2 223 female deaths from external causes were recorded, of which 1 429 occurred during pregnancy and 794 in the postpartum period. The gross mortality rate dropped from 30.7 per 100 000 live births plus fetal deaths in 1998 to 16.7 in 2010. A downward curve with no significant inflection points was shown in the risk of dying from this cause. The multiple linear regression model showed a correlation between mortality and extreme poverty and the illiteracy rate, suggesting that these indicators could explain 89.4% of the change in mortality from external causes in pregnant and postpartum women each year in Colombia. Mortality from external causes in pregnant and postpartum women showed a significant downward trend that may be explained by important socioeconomic changes in the country, including a decrease in extreme poverty and in the illiteracy rate.

  6. Inflammation, homocysteine and carotid intima-media thickness.

    PubMed

    Baptista, Alexandre P; Cacdocar, Sanjiva; Palmeiro, Hugo; Faísca, Marília; Carrasqueira, Herménio; Morgado, Elsa; Sampaio, Sandra; Cabrita, Ana; Silva, Ana Paula; Bernardo, Idalécio; Gome, Veloso; Neves, Pedro L

    2008-01-01

    Cardiovascular disease is the main cause of morbidity and mortality in chronic renal patients. Carotid intima-media thickness (CIMT) is one of the most accurate markers of atherosclerosis risk. In this study, the authors set out to evaluate a population of chronic renal patients to determine which factors are associated with an increase in intima-media thickness. We included 56 patients (F=22, M=34), with a mean age of 68.6 years, and an estimated glomerular filtration rate of 15.8 ml/min (calculated by the MDRD equation). Various laboratory and inflammatory parameters (hsCRP, IL-6 and TNF-alpha) were evaluated. All subjects underwent measurement of internal carotid artery intima-media thickness by high-resolution real-time B-mode ultrasonography using a 10 MHz linear transducer. Intima-media thickness was used as a dependent variable in a simple linear regression model, with the various laboratory parameters as independent variables. Only parameters showing a significant correlation with CIMT were evaluated in a multiple regression model: age (p=0.001), hemoglobin (p=00.3), logCRP (p=0.042), logIL-6 (p=0.004) and homocysteine (p=0.002). In the multiple regression model we found that age (p=0.001) and homocysteine (p=0.027) were independently correlated with CIMT. LogIL-6 did not reach statistical significance (p=0.057), probably due to the small population size. The authors conclude that age and homocysteine correlate with carotid intima-media thickness, and thus can be considered as markers/risk factors in chronic renal patients.

  7. Methods for estimating annual exceedance-probability discharges for streams in Iowa, based on data through water year 2010

    USGS Publications Warehouse

    Eash, David A.; Barnes, Kimberlee K.; Veilleux, Andrea G.

    2013-01-01

    A statewide study was performed to develop regional regression equations for estimating selected annual exceedance-probability statistics for ungaged stream sites in Iowa. The study area comprises streamgages located within Iowa and 50 miles beyond the State’s borders. Annual exceedance-probability estimates were computed for 518 streamgages by using the expected moments algorithm to fit a Pearson Type III distribution to the logarithms of annual peak discharges for each streamgage using annual peak-discharge data through 2010. The estimation of the selected statistics included a Bayesian weighted least-squares/generalized least-squares regression analysis to update regional skew coefficients for the 518 streamgages. Low-outlier and historic information were incorporated into the annual exceedance-probability analyses, and a generalized Grubbs-Beck test was used to detect multiple potentially influential low flows. Also, geographic information system software was used to measure 59 selected basin characteristics for each streamgage. Regional regression analysis, using generalized least-squares regression, was used to develop a set of equations for each flood region in Iowa for estimating discharges for ungaged stream sites with 50-, 20-, 10-, 4-, 2-, 1-, 0.5-, and 0.2-percent annual exceedance probabilities, which are equivalent to annual flood-frequency recurrence intervals of 2, 5, 10, 25, 50, 100, 200, and 500 years, respectively. A total of 394 streamgages were included in the development of regional regression equations for three flood regions (regions 1, 2, and 3) that were defined for Iowa based on landform regions and soil regions. Average standard errors of prediction range from 31.8 to 45.2 percent for flood region 1, 19.4 to 46.8 percent for flood region 2, and 26.5 to 43.1 percent for flood region 3. 
The pseudo coefficients of determination for the generalized least-squares equations range from 90.8 to 96.2 percent for flood region 1, 91.5 to 97.9 percent for flood region 2, and 92.4 to 96.0 percent for flood region 3. The regression equations are applicable only to stream sites in Iowa with flows not significantly affected by regulation, diversion, channelization, backwater, or urbanization and with basin characteristics within the range of those used to develop the equations. These regression equations will be implemented within the U.S. Geological Survey StreamStats Web-based geographic information system tool. StreamStats allows users to click on any ungaged site on a river and compute estimates of the eight selected statistics; in addition, 90-percent prediction intervals and the measured basin characteristics for the ungaged sites also are provided by the Web-based tool. StreamStats also allows users to click on any streamgage in Iowa and estimates computed for these eight selected statistics are provided for the streamgage.
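
    The correspondence between annual exceedance probability and recurrence interval quoted above is a reciprocal relationship (recurrence interval in years = 100 divided by the probability in percent). A minimal Python sketch, with a hypothetical function name, reproduces the eight pairs listed in the abstract:

```python
def recurrence_interval_years(aep_percent):
    """Convert an annual exceedance probability in percent to a recurrence interval in years."""
    if not 0 < aep_percent <= 100:
        raise ValueError("annual exceedance probability must be in (0, 100] percent")
    return 100.0 / aep_percent

# The eight annual exceedance probabilities (percent) listed in the abstract.
aep_percents = [50, 20, 10, 4, 2, 1, 0.5, 0.2]
intervals = [recurrence_interval_years(p) for p in aep_percents]
```

    The rounded results are 2, 5, 10, 25, 50, 100, 200, and 500 years, so a "100-year flood" is simply the discharge with a 1-percent chance of being exceeded in any given year.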

  8. SOCR Analyses – an Instructional Java Web-based Statistical Analysis Toolkit

    PubMed Central

    Chu, Annie; Cui, Jenny; Dinov, Ivo D.

    2011-01-01

    The Statistical Online Computational Resource (SOCR) designs web-based tools for educational use in a variety of undergraduate courses (Dinov 2006). Several studies have demonstrated that these resources significantly improve students' motivation and learning experiences (Dinov et al. 2008). SOCR Analyses is a new component that concentrates on data modeling and analysis using parametric and non-parametric techniques supported with graphical model diagnostics. Currently implemented analyses include commonly used models in undergraduate statistics courses like linear models (Simple Linear Regression, Multiple Linear Regression, One-Way and Two-Way ANOVA). In addition, we implemented tests for sample comparisons, such as t-test in the parametric category; and Wilcoxon rank sum test, Kruskal-Wallis test, Friedman's test, in the non-parametric category. SOCR Analyses also include several hypothesis test models, such as Contingency tables, Friedman's test and Fisher's exact test. The code itself is open source (http://socr.googlecode.com/), hoping to contribute to the efforts of the statistical computing community. The code includes functionality for each specific analysis model and it has general utilities that can be applied in various statistical computing tasks. For example, concrete methods with API (Application Programming Interface) have been implemented in statistical summary, least square solutions of general linear models, rank calculations, etc. HTML interfaces, tutorials, source code, activities, and data are freely available via the web (www.SOCR.ucla.edu). Code examples for developers and demos for educators are provided on the SOCR Wiki website. In this article, the pedagogical utilization of the SOCR Analyses is discussed, as well as the underlying design framework. As the SOCR project is on-going and more functions and tools are being added to it, these resources are constantly improved. 
The reader is strongly encouraged to check the SOCR site for most updated information and newly added models. PMID:21546994

  9. Detection of multiple perturbations in multi-omics biological networks.

    PubMed

    Griffin, Paula J; Zhang, Yuqing; Johnson, William Evan; Kolaczyk, Eric D

    2018-05-17

    Cellular mechanism-of-action is of fundamental concern in many biological studies. It is of particular interest for identifying the cause of disease and learning the way in which treatments act against disease. However, pinpointing such mechanisms is difficult, due to the fact that small perturbations to the cell can have wide-ranging downstream effects. Given a snapshot of cellular activity, it can be challenging to tell where a disturbance originated. The presence of an ever-greater variety of high-throughput biological data offers an opportunity to examine cellular behavior from multiple angles, but also presents the statistical challenge of how to effectively analyze data from multiple sources. In this setting, we propose a method for mechanism-of-action inference by extending network filtering to multi-attribute data. We first estimate a joint Gaussian graphical model across multiple data types using penalized regression and filter for network effects. We then apply a set of likelihood ratio tests to identify the most likely site of the original perturbation. In addition, we propose a conditional testing procedure to allow for detection of multiple perturbations. We demonstrate this methodology on paired gene expression and methylation data from The Cancer Genome Atlas (TCGA). © 2018, The International Biometric Society.

  10. Logistic regression applied to natural hazards: rare event logistic regression with replications

    NASA Astrophysics Data System (ADS)

    Guns, M.; Vanacker, V.

    2012-06-01

    Statistical analysis of natural hazards needs particular attention, as most of these phenomena are rare events. This study shows that the ordinary rare event logistic regression, as it is now commonly used in geomorphologic studies, does not always lead to a robust detection of controlling factors, as the results can be strongly sample-dependent. In this paper, we introduce some concepts of Monte Carlo simulations in rare event logistic regression. This technique, so-called rare event logistic regression with replications, combines the strength of probabilistic and statistical methods, and allows overcoming some of the limitations of previous developments through robust variable selection. This technique was here developed for the analyses of landslide controlling factors, but the concept is widely applicable for statistical analyses of natural hazards.

  11. Application of Linear Mixed-Effects Models in Human Neuroscience Research: A Comparison with Pearson Correlation in Two Auditory Electrophysiology Studies

    PubMed Central

    Koerner, Tess K.; Zhang, Yang

    2017-01-01

    Neurophysiological studies are often designed to examine relationships between measures from different testing conditions, time points, or analysis techniques within the same group of participants. Appropriate statistical techniques that can take into account repeated measures and multivariate predictor variables are integral and essential to successful data analysis and interpretation. This work implements and compares conventional Pearson correlations and linear mixed-effects (LME) regression models using data from two recently published auditory electrophysiology studies. For the specific research questions in both studies, the Pearson correlation test is inappropriate for determining strengths between the behavioral responses for speech-in-noise recognition and the multiple neurophysiological measures as the neural responses across listening conditions were simply treated as independent measures. In contrast, the LME models allow a systematic approach to incorporate both fixed-effect and random-effect terms to deal with the categorical grouping factor of listening conditions, between-subject baseline differences in the multiple measures, and the correlational structure among the predictor variables. Together, the comparative data demonstrate the advantages as well as the necessity to apply mixed-effects models to properly account for the built-in relationships among the multiple predictor variables, which has important implications for proper statistical modeling and interpretation of human behavior in terms of neural correlates and biomarkers. PMID:28264422

  12. Modeling Polytomous Item Responses Using Simultaneously Estimated Multinomial Logistic Regression Models

    ERIC Educational Resources Information Center

    Anderson, Carolyn J.; Verkuilen, Jay; Peyton, Buddy L.

    2010-01-01

    Survey items with multiple response categories and multiple-choice test questions are ubiquitous in psychological and educational research. We illustrate the use of log-multiplicative association (LMA) models that are extensions of the well-known multinomial logistic regression model for multiple dependent outcome variables to reanalyze a set of…

  13. Statistical analysis of water-quality data containing multiple detection limits II: S-language software for nonparametric distribution modeling and hypothesis testing

    USGS Publications Warehouse

    Lee, L.; Helsel, D.

    2007-01-01

    Analysis of low concentrations of trace contaminants in environmental media often results in left-censored data that are below some limit of analytical precision. Interpretation of values becomes complicated when there are multiple detection limits in the data, perhaps as a result of changing analytical precision over time. Parametric and semi-parametric methods, such as maximum likelihood estimation and robust regression on order statistics, can be employed to model distributions of multiply censored data and provide estimates of summary statistics. However, these methods are based on assumptions about the underlying distribution of data. Nonparametric methods provide an alternative that does not require such assumptions. A standard nonparametric method for estimating summary statistics of multiply-censored data is the Kaplan-Meier (K-M) method. This method has seen widespread usage in the medical sciences within a general framework termed "survival analysis" where it is employed with right-censored time-to-failure data. However, K-M methods are equally valid for the left-censored data common in the geosciences. Our S-language software provides an analytical framework based on K-M methods that is tailored to the needs of the earth and environmental sciences community. This includes routines for the generation of empirical cumulative distribution functions, prediction or exceedance probabilities, and related confidence limits computation. Additionally, our software contains K-M-based routines for nonparametric hypothesis testing among an unlimited number of grouping variables. A primary characteristic of K-M methods is that they do not perform extrapolation and interpolation. Thus, these routines cannot be used to model statistics beyond the observed data range or when linear interpolation is desired. For such applications, the aforementioned parametric and semi-parametric methods must be used.

  14. Strategies for Testing Statistical and Practical Significance in Detecting DIF with Logistic Regression Models

    ERIC Educational Resources Information Center

    Fidalgo, Angel M.; Alavi, Seyed Mohammad; Amirian, Seyed Mohammad Reza

    2014-01-01

    This study examines three controversial aspects in differential item functioning (DIF) detection by logistic regression (LR) models: first, the relative effectiveness of different analytical strategies for detecting DIF; second, the suitability of the Wald statistic for determining the statistical significance of the parameters of interest; and…

  15. Efficiency Analysis: Enhancing the Statistical and Evaluative Power of the Regression-Discontinuity Design.

    ERIC Educational Resources Information Center

    Madhere, Serge

    An analytic procedure, efficiency analysis, is proposed for improving the utility of quantitative program evaluation for decision making. The three features of the procedure are explained: (1) for statistical control, it adopts and extends the regression-discontinuity design; (2) for statistical inferences, it de-emphasizes hypothesis testing in…

  16. Applying Regression Analysis to Problems in Institutional Research.

    ERIC Educational Resources Information Center

    Bohannon, Tom R.

    1988-01-01

    Regression analysis is one of the most frequently used statistical techniques in institutional research. Principles of least squares, model building, residual analysis, influence statistics, and multi-collinearity are described and illustrated. (Author/MSE)

  17. Spatial Autocorrelation Approaches to Testing Residuals from Least Squares Regression

    PubMed Central

    Chen, Yanguang

    2016-01-01

    In geo-statistics, the Durbin-Watson test is frequently employed to detect the presence of residual serial correlation from least squares regression analyses. However, the Durbin-Watson statistic is only suitable for ordered time or spatial series. If the variables comprise cross-sectional data coming from spatial random sampling, the test will be ineffectual because the value of Durbin-Watson’s statistic depends on the sequence of data points. This paper develops two new statistics for testing serial correlation of residuals from least squares regression based on spatial samples. By analogy with the new form of Moran’s index, an autocorrelation coefficient is defined with a standardized residual vector and a normalized spatial weight matrix. Then by analogy with the Durbin-Watson statistic, two types of new serial correlation indices are constructed. As a case study, the two newly presented statistics are applied to a spatial sample of 29 China’s regions. These results show that the new spatial autocorrelation models can be used to test the serial correlation of residuals from regression analysis. In practice, the new statistics can make up for the deficiencies of the Durbin-Watson test. PMID:26800271
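
    The order dependence of the Durbin-Watson statistic noted above is easy to see from its textbook definition (sum of squared successive differences of the residuals divided by the sum of squared residuals). The sketch below implements only that classical statistic, not the new spatial indices proposed in the paper; the residual values are made up for illustration.

```python
def durbin_watson(residuals):
    """Classical Durbin-Watson statistic for an ordered sequence of residuals."""
    if len(residuals) < 2:
        raise ValueError("need at least two residuals")
    # Numerator: squared differences between consecutive residuals.
    numerator = sum(
        (residuals[i] - residuals[i - 1]) ** 2 for i in range(1, len(residuals))
    )
    # Denominator: sum of squared residuals.
    denominator = sum(e * e for e in residuals)
    return numerator / denominator

# The same four hypothetical residuals in two different orders.
ordered = [1.0, 2.0, -1.0, -2.0]
reordered = [1.0, -1.0, 2.0, -2.0]
dw_ordered = durbin_watson(ordered)
dw_reordered = durbin_watson(reordered)
```

    The two orderings give 1.1 and 2.9 even though the residuals are identical, which is why the statistic is only meaningful when the observations have a natural sequence such as time.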

  18. Estimating flood magnitude and frequency at gaged and ungaged sites on streams in Alaska and conterminous basins in Canada, based on data through water year 2012

    USGS Publications Warehouse

    Curran, Janet H.; Barth, Nancy A.; Veilleux, Andrea G.; Ourso, Robert T.

    2016-03-16

    Estimates of the magnitude and frequency of floods are needed across Alaska for engineering design of transportation and water-conveyance structures, flood-insurance studies, flood-plain management, and other water-resource purposes. This report updates methods for estimating flood magnitude and frequency in Alaska and conterminous basins in Canada. Annual peak-flow data through water year 2012 were compiled from 387 streamgages on unregulated streams with at least 10 years of record. Flood-frequency estimates were computed for each streamgage using the Expected Moments Algorithm to fit a Pearson Type III distribution to the logarithms of annual peak flows. A multiple Grubbs-Beck test was used to identify potentially influential low floods in the time series of peak flows for censoring in the flood frequency analysis. For two new regional skew areas, flood-frequency estimates using station skew were computed for stations with at least 25 years of record for use in a Bayesian least-squares regression analysis to determine a regional skew value. The consideration of basin characteristics as explanatory variables for regional skew resulted in improvements in precision too small to warrant the additional model complexity, and a constant model was adopted. Regional Skew Area 1 in eastern-central Alaska had a regional skew of 0.54 and an average variance of prediction of 0.45, corresponding to an effective record length of 22 years. Regional Skew Area 2, encompassing coastal areas bordering the Gulf of Alaska, had a regional skew of 0.18 and an average variance of prediction of 0.12, corresponding to an effective record length of 59 years. Station flood-frequency estimates for study sites in regional skew areas were then recomputed using a weighted skew incorporating the station skew and regional skew.
In a new regional skew exclusion area outside the regional skew areas, the density of long-record streamgages was too sparse for regional analysis and station skew was used for all estimates. Final station flood frequency estimates for all study streamgages are presented for the 50-, 20-, 10-, 4-, 2-, 1-, 0.5-, and 0.2-percent annual exceedance probabilities. Regional multiple-regression analysis was used to produce equations for estimating flood frequency statistics from explanatory basin characteristics. Basin characteristics, including physical and climatic variables, were updated for all study streamgages using a geographical information system and geospatial source data. Screening for similar-sized nested basins eliminated hydrologically redundant sites, and screening for eligibility for analysis of explanatory variables eliminated regulated peaks, outburst peaks, and sites with indeterminate basin characteristics. An ordinary least-squares regression used flood-frequency statistics and basin characteristics for 341 streamgages (284 in Alaska and 57 in Canada) to determine the most suitable combination of basin characteristics for a flood-frequency regression model and to explore regional grouping of streamgages for explaining variability in flood-frequency statistics across the study area. The most suitable model for explaining flood frequency used drainage area and mean annual precipitation as explanatory variables for the entire study area as a region. Final regression equations for estimating the 50-, 20-, 10-, 4-, 2-, 1-, 0.5-, and 0.2-percent annual exceedance probability discharge in Alaska and conterminous basins in Canada were developed using a generalized least-squares regression.
The average standard error of prediction for the regression equations for the various annual exceedance probabilities ranged from 69 to 82 percent, and the pseudo-coefficient of determination (pseudo-R²) ranged from 85 to 91 percent. The regional regression equations from this study were incorporated into the U.S. Geological Survey StreamStats program for a limited area of the State, the Cook Inlet Basin. StreamStats is a national web-based geographic information system application that facilitates retrieval of streamflow statistics and associated information. StreamStats retrieves published data for gaged sites and, for user-selected ungaged sites, delineates drainage areas from topographic and hydrographic data, computes basin characteristics, and computes flood frequency estimates using the regional regression equations.
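
    The weighted skew described in this abstract combines a station skew with a regional skew. The abstract does not state the weighting formula, so the sketch below assumes inverse mean-square-error weighting, in which the less certain estimate receives the smaller weight. The regional values (skew 0.54, average variance of prediction 0.45, treated here as the regional mean square error) are the Regional Skew Area 1 figures quoted above; the station skew and its mean square error are hypothetical.

```python
def weighted_skew(station_skew, station_mse, regional_skew, regional_mse):
    """Combine station and regional skew, weighting each by the other's MSE.

    Assumption: inverse-MSE weighting; the abstract does not give the formula.
    """
    total = station_mse + regional_mse
    if total <= 0:
        raise ValueError("mean square errors must be positive")
    # The estimate with the larger MSE gets the smaller weight.
    return (regional_mse * station_skew + station_mse * regional_skew) / total


# Hypothetical station values; regional values are from Regional Skew Area 1.
example = weighted_skew(station_skew=0.90, station_mse=0.30,
                        regional_skew=0.54, regional_mse=0.45)
```

    With these inputs the station skew carries a weight of 0.45/0.75 and the regional skew a weight of 0.30/0.75, so the result lies between the two but closer to the station value.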

  19. Statistics in biomedical laboratory and clinical science: applications, issues and pitfalls.

    PubMed

    Ludbrook, John

    2008-01-01

    This review is directed at biomedical scientists who want to gain a better understanding of statistics: what tests to use, when, and why. In my view, even during the planning stage of a study it is very important to seek the advice of a qualified biostatistician. When designing and analyzing a study, it is important to construct and test global hypotheses, rather than to make multiple tests on the data. If the latter cannot be avoided, it is essential to control the risk of making false-positive inferences by applying multiple comparison procedures. For comparing two means or two proportions, it is best to use exact permutation tests rather than the better known, classical, ones. For comparing many means, analysis of variance, often of a complex type, is the most powerful approach. The correlation coefficient should never be used to compare the performances of two methods of measurement, or two measures, because it does not detect bias. Instead the Altman-Bland method of differences or least-products linear regression analysis should be preferred. Finally, the educational value to investigators of interaction with a biostatistician, before, during and after a study, cannot be overemphasized. (c) 2007 S. Karger AG, Basel.
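
    The point that a correlation coefficient does not detect bias between two measurement methods can be illustrated with a small sketch. The paired readings below are invented for illustration, and the 1.96 multiplier for the limits of agreement is the conventional choice rather than something stated in the abstract.

```python
import math
import statistics


def pearson_r(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    mx, my = statistics.mean(x), statistics.mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def method_of_differences(method_a, method_b):
    """Return the mean difference (bias) and approximate 95% limits of agreement."""
    diffs = [b - a for a, b in zip(method_a, method_b)]
    bias = statistics.mean(diffs)
    sd = statistics.stdev(diffs)  # sample standard deviation of the differences
    return bias, (bias - 1.96 * sd, bias + 1.96 * sd)


# Hypothetical paired readings: method B reads about 2 units higher than A.
method_a = [10, 12, 15, 18, 20]
method_b = [11, 15, 16, 21, 22]

r = pearson_r(method_a, method_b)
bias, limits = method_of_differences(method_a, method_b)
```

    Here the correlation is close to 1 even though method B systematically exceeds method A; the bias only becomes visible in the mean of the paired differences.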

  20. Meteorological Contribution to Variability in Particulate Matter Concentrations

    NASA Astrophysics Data System (ADS)

    Woods, H. L.; Spak, S. N.; Holloway, T.

    2006-12-01

    Local concentrations of fine particulate matter (PM) are driven by a number of processes, including emissions of aerosols and gaseous precursors, atmospheric chemistry, and meteorology at local, regional, and global scales. We apply statistical downscaling methods, typically used for regional climate analysis, to estimate the contribution of regional scale meteorology to PM mass concentration variability at a range of sites in the Upper Midwestern U.S. Multiple years of daily PM10 and PM2.5 data, reported by the U.S. Environmental Protection Agency (EPA), are correlated with large-scale meteorology over the region from the National Centers for Environmental Prediction (NCEP) reanalysis data. We use two statistical downscaling methods (multiple linear regression, MLR, and analog) to identify which processes have the greatest impact on aerosol concentration variability. Empirical Orthogonal Functions of the NCEP meteorological data are correlated with PM timeseries at measurement sites. We examine which meteorological variables exert the greatest influence on PM variability, and which sites exhibit the greatest response to regional meteorology. To evaluate model performance, measurement data are withheld for limited periods, and compared with model results. Preliminary results suggest that regional meteorological processes account for over 50% of aerosol concentration variability at study sites.

  1. Robust biological parametric mapping: an improved technique for multimodal brain image analysis

    NASA Astrophysics Data System (ADS)

    Yang, Xue; Beason-Held, Lori; Resnick, Susan M.; Landman, Bennett A.

    2011-03-01

    Mapping the quantitative relationship between structure and function in the human brain is an important and challenging problem. Numerous volumetric, surface, region of interest and voxelwise image processing techniques have been developed to statistically assess potential correlations between imaging and non-imaging metrics. Recently, biological parametric mapping has extended the widely popular statistical parametric approach to enable application of the general linear model to multiple image modalities (both for regressors and regressands) along with scalar valued observations. This approach offers great promise for direct, voxelwise assessment of structural and functional relationships with multiple imaging modalities. However, as presented, the biological parametric mapping approach is not robust to outliers and may lead to invalid inferences (e.g., artifactual low p-values) due to slight mis-registration or variation in anatomy between subjects. To enable widespread application of this approach, we introduce robust regression and robust inference in the neuroimaging context of application of the general linear model. Through simulation and empirical studies, we demonstrate that our robust approach reduces sensitivity to outliers without substantial degradation in power. The robust approach and associated software package provides a reliable way to quantitatively assess voxelwise correlations between structural and functional neuroimaging modalities.

  2. Can Meditation Influence Quality of Life, Depression, and Disease Outcome in Multiple Sclerosis? Findings from a Large International Web-Based Study

    PubMed Central

    Levin, Adam B.; Hadgkiss, Emily J.; Weiland, Tracey J.; Marck, Claudia H.; van der Meer, Dania M.; Pereira, Naresh G.; Jelinek, George A.

    2014-01-01

    Objectives. To explore the association between meditation and health related quality of life (HRQOL), depression, fatigue, disability level, relapse rates, and disease activity in a large international sample of people with multiple sclerosis (MS). Methods. Participants were invited to take part in an online survey and answer questions relating to HRQOL, depression, fatigue, disability, relapse rates, and their involvement in meditation practices. Results. Statistically and potentially clinically significant differences between those who meditated once a week or more and participants who never meditated were present for mean mental health composite (MHC) scores, cognitive function scale, and health perception scale. The MHC results remained statistically significant on multivariate regression modelling when covariates were accounted for. Physical health composite (PHC) scores were higher in those that meditated; however, the differences were probably not clinically significant. Among those who meditated, fewer screened positive for depression, but there was no relationship with fatigue or relapse rate. Those with worsened disability levels were more likely to meditate. Discussion. The study reveals a significant association between meditation, lower risk of depression, and improved HRQOL in people with MS. PMID:25477709

  3. Methods for estimating selected low-flow frequency statistics and harmonic mean flows for streams in Iowa

    USGS Publications Warehouse

    Eash, David A.; Barnes, Kimberlee K.

    2017-01-01

    A statewide study was conducted to develop regression equations for estimating six selected low-flow frequency statistics and harmonic mean flows for ungaged stream sites in Iowa. The estimation equations developed for the six low-flow frequency statistics include: the annual 1-, 7-, and 30-day mean low flows for a recurrence interval of 10 years, the annual 30-day mean low flow for a recurrence interval of 5 years, and the seasonal (October 1 through December 31) 1- and 7-day mean low flows for a recurrence interval of 10 years. Estimation equations also were developed for the harmonic-mean-flow statistic. Estimates of these seven selected statistics are provided for 208 U.S. Geological Survey continuous-record streamgages using data through September 30, 2006. The study area comprises streamgages located within Iowa and 50 miles beyond the State's borders. Because trend analyses indicated statistically significant positive trends when considering the entire period of record for the majority of the streamgages, the longest, most recent period of record without a significant trend was determined for each streamgage for use in the study. The median number of years of record used to compute each of these seven selected statistics was 35. Geographic information system software was used to measure 54 selected basin characteristics for each streamgage. Following the removal of two streamgages from the initial data set, data collected for 206 streamgages were compiled to investigate three approaches for regionalization of the seven selected statistics. Regionalization, a process using statistical regression analysis, provides a relation for efficiently transferring information from a group of streamgages in a region to ungaged sites in the region. The three regionalization approaches tested included statewide, regional, and region-of-influence regressions. 
For the regional regression, the study area was divided into three low-flow regions on the basis of hydrologic characteristics, landform regions, and soil regions. A comparison of root mean square errors and average standard errors of prediction for the statewide, regional, and region-of-influence regressions determined that the regional regression provided the best estimates of the seven selected statistics at ungaged sites in Iowa. Because a significant number of streams in Iowa reach zero flow as their minimum flow during low-flow years, four different types of regression analyses were used: left-censored, logistic, generalized-least-squares, and weighted-least-squares regression. A total of 192 streamgages were included in the development of 27 regression equations for the three low-flow regions. For the northeast and northwest regions, a censoring threshold was used to develop 12 left-censored regression equations to estimate the 6 low-flow frequency statistics for each region. For the southern region a total of 12 regression equations were developed; 6 logistic regression equations were developed to estimate the probability of zero flow for the 6 low-flow frequency statistics and 6 generalized least-squares regression equations were developed to estimate the 6 low-flow frequency statistics, if nonzero flow is estimated first by use of the logistic equations. A weighted-least-squares regression equation was developed for each region to estimate the harmonic-mean-flow statistic. Average standard errors of estimate for the left-censored equations for the northeast region range from 64.7 to 88.1 percent and for the northwest region range from 85.8 to 111.8 percent. Misclassification percentages for the logistic equations for the southern region range from 5.6 to 14.0 percent. 
Average standard errors of prediction for generalized least-squares equations for the southern region range from 71.7 to 98.9 percent and pseudo coefficients of determination for the generalized-least-squares equations range from 87.7 to 91.8 percent. Average standard errors of prediction for weighted-least-squares equations developed for estimating the harmonic-mean-flow statistic for each of the three regions range from 66.4 to 80.4 percent. The regression equations are applicable only to stream sites in Iowa with low flows not significantly affected by regulation, diversion, or urbanization and with basin characteristics within the range of those used to develop the equations. If the equations are used at ungaged sites on regulated streams, or on streams affected by water-supply and agricultural withdrawals, then the estimates will need to be adjusted by the amount of regulation or withdrawal to estimate the actual flow conditions if that is of interest. Caution is advised when applying the equations for basins with characteristics near the applicable limits of the equations and for basins located in karst topography. A test of two drainage-area ratio methods using 31 pairs of streamgages, for the annual 7-day mean low-flow statistic for a recurrence interval of 10 years, indicates a weighted drainage-area ratio method provides better estimates than regional regression equations for an ungaged site on a gaged stream in Iowa when the drainage-area ratio is between 0.5 and 1.4. These regression equations will be implemented within the U.S. Geological Survey StreamStats web-based geographic-information-system tool. StreamStats allows users to click on any ungaged site on a river and compute estimates of the seven selected statistics; in addition, 90-percent prediction intervals and the measured basin characteristics for the ungaged sites also are provided. 
StreamStats also allows users to click on any streamgage in Iowa and estimates computed for these seven selected statistics are provided for the streamgage.
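
    The drainage-area ratio idea mentioned in this abstract can be sketched as follows. The abstract tests a weighted variant whose weights are not given, so this sketch shows only the simple unweighted form, scaling the gaged statistic by the ratio of drainage areas. The function name and the example values are hypothetical; the 0.5 to 1.4 ratio range is the one quoted above.

```python
def drainage_area_ratio_estimate(gaged_flow, gaged_area, ungaged_area,
                                 min_ratio=0.5, max_ratio=1.4):
    """Scale a low-flow statistic from a gaged site to an ungaged site.

    Returns None when the area ratio falls outside [min_ratio, max_ratio],
    signalling that another method (e.g. a regression equation) is needed.
    """
    if gaged_area <= 0 or ungaged_area <= 0:
        raise ValueError("drainage areas must be positive")
    ratio = ungaged_area / gaged_area
    if not (min_ratio <= ratio <= max_ratio):
        return None
    return gaged_flow * ratio


# Hypothetical example: gaged 7-day, 10-year low flow of 10.0 (any flow unit),
# gaged drainage area 200.0 and ungaged drainage area 150.0 (same area unit).
estimate = drainage_area_ratio_estimate(10.0, 200.0, 150.0)
```

    Returning None outside the supported range keeps the caller from silently extrapolating beyond the ratios for which the abstract reports the method performing well.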

  4. Isolating and Examining Sources of Suppression and Multicollinearity in Multiple Linear Regression

    ERIC Educational Resources Information Center

    Beckstead, Jason W.

    2012-01-01

    The presence of suppression (and multicollinearity) in multiple regression analysis complicates interpretation of predictor-criterion relationships. The mathematical conditions that produce suppression in regression analysis have received considerable attention in the methodological literature but until now nothing in the way of an analytic…

  5. ℓ p-Norm Multikernel Learning Approach for Stock Market Price Forecasting

    PubMed Central

    Shao, Xigao; Wu, Kun; Liao, Bifeng

    2012-01-01

    Linear multiple kernel learning model has been used for predicting financial time series. However, ℓ 1-norm multiple support vector regression is rarely observed to outperform trivial baselines in practical applications. To allow for robust kernel mixtures that generalize well, we adopt ℓ p-norm multiple kernel support vector regression (1 ≤ p < ∞) as a stock price prediction model. The optimization problem is decomposed into smaller subproblems, and the interleaved optimization strategy is employed to solve the regression model. The model is evaluated on forecasting the daily stock closing prices of Shanghai Stock Index in China. Experimental results show that our proposed model performs better than ℓ 1-norm multiple support vector regression model. PMID:23365561
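
    The role of the ℓ p-norm in a kernel mixture can be sketched in a few lines. This is only an illustration of the constraint on the mixing weights, assuming non-negative weights scaled to unit ℓ p-norm and a weighted sum of precomputed kernel matrices; it is not the authors' optimization procedure, and the function names and toy matrices are hypothetical.

```python
def normalize_lp(weights, p):
    """Scale non-negative kernel weights so that their l_p norm equals 1."""
    if p < 1:
        raise ValueError("p must satisfy p >= 1")
    if any(w < 0 for w in weights):
        raise ValueError("kernel weights must be non-negative")
    norm = sum(w ** p for w in weights) ** (1.0 / p)
    if norm == 0:
        raise ValueError("at least one weight must be positive")
    return [w / norm for w in weights]


def combine_kernels(kernels, weights):
    """Return the weighted sum of equally sized square kernel matrices."""
    n = len(kernels[0])
    return [[sum(w * k[i][j] for w, k in zip(weights, kernels))
             for j in range(n)]
            for i in range(n)]


# Two toy 2x2 kernel matrices (hypothetical values).
k_identity = [[1.0, 0.0], [0.0, 1.0]]
k_ones = [[1.0, 1.0], [1.0, 1.0]]

theta = normalize_lp([3.0, 4.0], p=2)        # unit l_2 norm
mixed = combine_kernels([k_identity, k_ones], theta)
```

    With p = 1 the same raw weights would be scaled to sum to 1, which tends to favor sparse mixtures; larger p spreads weight more evenly across kernels.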

  6. Predictors of psychological distress after diagnosis in breast cancer patients and patients with benign breast problems.

    PubMed

    Ando, Noriko; Iwamitsu, Yumi; Kuranami, Masaru; Okazaki, Shigemi; Nakatani, Yuki; Yamamoto, Kenji; Watanabe, Masahiko; Miyaoka, Hitoshi

    2011-01-01

    The objective of this study was to determine how age and psychological characteristics assessed prior to diagnosis could predict psychological distress in outpatients immediately after disclosure of their diagnosis. This is a longitudinal and prospective study, and participants were breast cancer patients and patients with benign breast problems (BBP). Patients were asked to complete questionnaires to determine levels of the following: trait anxiety (State-Trait Anxiety Inventory), negative emotional suppression (Courtauld Emotional Control Scale), life stress events (Life Experiences Survey), and psychological distress (Profile of Mood Status) prior to diagnosis. They were asked to complete a questionnaire measuring psychological distress after being told their diagnosis. We analyzed a total of 38 women diagnosed with breast cancer and 95 women diagnosed with a BBP. A two-way analysis of variance (prior to, after diagnosis × cancer, benign) showed that psychological distress after diagnosis among breast cancer patients was significantly higher than in patients with a BBP. The multiple regression model accounted for a significant amount of variance in the breast cancer group (model adjusted R² = 0.545, p < 0.001), and only trait anxiety was statistically significant (β = 0.778, p < 0.001). In the BBP group, the multiple regression analysis yielded a significant result (model adjusted R² = 0.462, p < 0.001), with trait anxiety and negative life changes as statistically significant factors (β = 0.449 and 0.324 respectively; p < 0.01). In both groups, trait anxiety assessed prior to diagnosis was the significant predictor of psychological distress after diagnosis, and might have prospects as a screening method for psychologically vulnerable women. Copyright © 2011 The Academy of Psychosomatic Medicine. Published by Elsevier Inc. All rights reserved.

  7. Prevalence and of smoking and associated factors among Malaysian University students.

    PubMed

    Al-Naggar, Redhwan Ahmed; Al-Dubai, Sami Abdo Radman; Al-Naggar, Thekra Hamoud; Chen, Robert; Al-Jashamy, Karim

    2011-01-01

    The objectives were to determine the prevalence and associated factors for smoking among university students in Malaysia. A cross-sectional study was conducted among 199 students in the period from December of academic year 2009 until April of academic year 2010 in Management and Science University (MSU), Shah Alam, Selangor, Malaysia. The questionnaire was distributed randomly to all faculties of MSU by choosing one of every 3 lecture rooms, as well as the library and cafeterias of the campus randomly by choosing one from every 3 tables. Questions concerned socio-demographic variables, knowledge, attitudes and practice toward smoking. Participants' consent was obtained and ethical approval was provided by the ethics committee of the University. Data entry and analysis were performed using descriptive statistics, chi square test, Student t-test and logistic multiple regression with the SPSS version 13.0, statistical significance being concluded at p < 0.05. About one third of students were smokers (29%). The most important reason for smoking was stress (20%) followed by 'influenced by friends' (16%). Prevalence of smoking was significantly higher among males and those in advanced semesters (p = >0.001, p = 0.047). Smokers had a low level of knowledge (p < 0.05), had wrong beliefs on smoking (p < 0.05), and a negative attitude toward tobacco control policies compared to non-smokers (p < 0.05). On multiple logistic regression, significant predictors of smoking in the model were gender (p = 0.025), age (p = 0.037), semester of study (p = 0.025) and attitude toward smoking (p < 0.001). This study found that 29% of university students were smokers. Males and students in advanced semesters were more likely to smoke. The results provide baseline data to develop an anti-smoking program to limit smoking in the university by implementing policies against smoking.

  8. Assessing the Relationship between Airlines' Maintenance Outsourcing and Aviation Professionals' Job Satisfaction

    NASA Astrophysics Data System (ADS)

    McCamey, Rotorua

    The current economic and security challenges placed an additional burden on U.S. airlines to provide optimum service at reasonable costs to the flying public. In efforts to stay competitive, U.S. airlines increased foreign-based outsourcing of aircraft major repair and overhaul (MRO) mainly to reduce labor costs and conserve capital. This concentrated focus on outsourcing and restructuring ignored job dissatisfaction among remaining employees, which could reduce or eliminate an airline's competitiveness. The purpose of this quantitative study was (a) to assess the relationship between increased levels of foreign-based MRO outsourcing and aviation professionals' job satisfaction (Y1); (b) to assess the influence of increased levels of foreign-based outsourcing on MRO control (Y2), MRO error rate (Y3), and MRO technical punctuality (Y4) as perceived by aviation professionals; and (c) to assess the influence of increased levels of foreign-based MRO outsourcing on technical skills (Y5) and morale (Y6) as perceived by aviation professionals. The survey instrument was based on Paul Spector's Job Satisfaction Questionnaire and MRO specific questions. A random sample of 300 U.S. airline participants was requested via MarketTools to meet the required sample size of 110 as determined through a priori power analysis. Study data rendered 198 useable surveys of 213 total responses, and correlation, multiple regression, and ANOVA methods were used to test study hypotheses. The Spearman's rho for (Y1) was statistically significant, p = .010, and multiple regression was statistically significant, p < .001. A one-way ANOVA indicated participants differed in their opinions of (Y2) through (Y6). Recommendations for future research include contrasting domestic and global MRO providers, and examining global aircraft parts suppliers and aviation technical training.

  9. Combining ultrasonography and noncontrast helical computerized tomography to evaluate Holmium laser lithotripsy

    PubMed Central

    Mi, Jia; Li, Jie; Zhang, Qinglu; Wang, Xing; Liu, Hongyu; Cao, Yanlu; Liu, Xiaoyan; Sun, Xiao; Shang, Mengmeng; Liu, Qing

    2016-01-01

    The purpose of the study was to establish a mathematical model for correlating the combination of ultrasonography and noncontrast helical computerized tomography (NCHCT) with the total energy of Holmium laser lithotripsy. In this study, from March 2013 to February 2014, 180 patients with single urinary calculus were examined using ultrasonography and NCHCT before Holmium laser lithotripsy. The calculus location and size, acoustic shadowing (AS) level, twinkling artifact intensity (TAI), and CT value were all documented. The total energy of lithotripsy (TEL) and the calculus composition were also recorded postoperatively. Data were analyzed using Spearman's rank correlation coefficient, with the SPSS 17.0 software package. Multiple linear regression was also used for further statistical analysis. A significant difference in the TEL was observed between renal calculi and ureteral calculi (r = –0.565, P < 0.001), and there was a strong correlation between the calculus size and the TEL (r = 0.675, P < 0.001). The difference in the TEL between the calculi with and without AS was highly significant (r = 0.325, P < 0.001). The CT value of the calculi was significantly correlated with the TEL (r = 0.386, P < 0.001). A correlation between the TAI and TEL was also observed (r = 0.391, P < 0.001). Multiple linear regression analysis revealed that the location, size, and TAI of the calculi were related to the TEL, and the location and size were statistically significant predictors (adjusted r² = 0.498, P < 0.001). A mathematical model correlating the combination of ultrasonography and NCHCT with TEL was established; this model may provide a foundation to guide the use of energy in Holmium laser lithotripsy. The TEL can be estimated by the location, size, and TAI of the calculus. PMID:27930563

  10. Radiographic assessment of third molars development and it's relation to dental and chronological age in an Iranian population

    PubMed Central

    Monirifard, Mohamad; Yaraghi, Navid; Vali, Ava; Vali, Asana; Vali, Amrita

    2015-01-01

    Background: The aim of the present study was to estimate chronological age based on third molar development and to determine the association between dental age and third molar calcification stages. Materials and Methods: In this cross-sectional study, 505 digital panoramic radiographs of 223 males (44.2%) and 282 females (55.8%) between the ages of 6 and 17 were selected from patients who were treated in the Departments of Pediatrics and Orthodontics of Isfahan University of Medical Sciences between 2009 and 2013. Correlation between chronological age and third molar development was analyzed with SPSS 21 using Spearman's rank correlation coefficient, Chi-square test and multiple regression statistical tests (P < 0.05). Results: All third molars demonstrated a highly significant correlation with dental age (P < 0.001). The teeth showing the highest relationship with dental age were mandibular left third molar in males and mandibular right third molar in females (rs = 0.072). When multiple regression was used to predict dental age based on molar calcification stage, the only significant correlation was for the maxillary left third molar in males (P < 0.05). There was no statistically significant correlation for any of the third molars in females. The relationship between chronological age and molar development stage was significant in all age subgroups and in both genders (P < 0.001). Conclusion: Strong correlation was observed between left third molars and dental age in males. Results showed that third molar calcification stage can be used as an age predictor and in general mandibular teeth seem to be more reliable for this purpose in both genders and in all ages. PMID:25709677

  11. Personality, Driving Behavior and Mental Disorders Factors as Predictors of Road Traffic Accidents Based on Logistic Regression.

    PubMed

    Alavi, Seyyed Salman; Mohammadi, Mohammad Reza; Souri, Hamid; Mohammadi Kalhori, Soroush; Jannatifard, Fereshteh; Sepahbodi, Ghazal

    2017-01-01

    The aim of this study was to evaluate the effect of variables such as personality traits, driving behavior and mental illness on road traffic accidents among the drivers with accidents and those without road crash. In this cohort study, 800 bus and truck drivers were recruited. Participants were selected among drivers who referred to Imam Sajjad Hospital (Tehran, Iran) during 2013-2015. The Manchester driving behavior questionnaire (MDBQ), big five personality test (NEO personality inventory) and semi-structured interview (schizophrenia and affective disorders scale) were used. After two years, we surveyed all accidents due to human factors that involved the recruited drivers. The data were analyzed using the SPSS software by performing the descriptive statistics, t-test, and multiple logistic regression analysis methods. P values less than 0.05 were considered statistically significant. In terms of controlling the effective and demographic variables, the findings revealed significant differences between the two groups of drivers that were and were not involved in road accidents. In addition, it was found that depression and anxiety could increase the odds ratio (OR) of road accidents by 2.4- and 2.7-folds, respectively (P=0.04, P=0.004). It is noteworthy to mention that neuroticism alone can increase the odds of road accidents by 1.1-fold (P=0.009), but other personality factors did not have a significant effect on the equation. The results revealed that some mental disorders affect the incidence of road collisions. Considering the importance and sensitivity of driving behavior, it is necessary to evaluate multiple psychological factors influencing drivers before and after receiving or renewing their driver's license.
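
    The odds ratios reported in this abstract act on odds, not directly on probabilities. A minimal sketch of the conversion follows, using the 2.4-fold odds ratio quoted above for depression; the 10% baseline accident probability is a hypothetical value chosen only to make the arithmetic concrete.

```python
def apply_odds_ratio(baseline_probability, odds_ratio):
    """Return the probability implied by multiplying the baseline odds by an OR."""
    if not 0 < baseline_probability < 1:
        raise ValueError("baseline probability must lie strictly between 0 and 1")
    if odds_ratio <= 0:
        raise ValueError("odds ratio must be positive")
    baseline_odds = baseline_probability / (1 - baseline_probability)
    new_odds = baseline_odds * odds_ratio
    return new_odds / (1 + new_odds)


# Hypothetical 10% baseline risk combined with the reported OR of 2.4.
risk_with_factor = apply_odds_ratio(0.10, 2.4)
```

    Under this assumed baseline, a 2.4-fold increase in odds corresponds to a probability of roughly 21%, which is less than 2.4 times the baseline probability; the gap between odds ratio and risk ratio widens as the baseline risk grows.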

  12. The influence of daily stress and resilience on successful ageing.

    PubMed

    Byun, J; Jung, D

    2016-09-01

    The aim of this study was to identify the effects of daily stress and resilience on successful ageing among community-dwelling older adults. Ageing can be a positive experience if there is good adaptation to ageing processes. Positive ageing needs to be a basis of nursing care, health promotion and education within community settings. Data were collected in March and April of 2014 from 262 older adults living in Seoul and Jeju, South Korea. We used a four-part survey consisting of demographic data, daily stress, resilience and successful ageing scales, in total 91 items. Data were analysed using descriptive statistics, t-test, one-way ANOVA, Tukey HSD test, Pearson's correlation coefficient and hierarchical multiple regression analysis to identify the influence of variables on successful ageing. Successful ageing had a significant negative correlation with daily stress and a positive correlation with resilience. Daily stress had a negative correlation with resilience. Findings of hierarchical multiple regression analysis indicated that resilience and subjective economic status had an effect on successful ageing. Furthermore, these variables accounted for 41.6% of the variance in successful ageing. Data were collected in only two cities of Korea based on convenience sampling. The findings of the study suggest that daily stress and resilience have a statistically significant relationship with successful ageing. Furthermore, resilience is an important influential factor and a much-needed personal characteristic for one's successful ageing. Nurses can advocate joining with health and social policy makers to implement policies on healthy ageing, including evaluation of stress, education programmes and implementation of self-help groups to enhance resilience in older people. © 2016 International Council of Nurses.

  13. Personality, Driving Behavior and Mental Disorders Factors as Predictors of Road Traffic Accidents Based on Logistic Regression

    PubMed Central

    Alavi, Seyyed Salman; Mohammadi, Mohammad Reza; Souri, Hamid; Mohammadi Kalhori, Soroush; Jannatifard, Fereshteh; Sepahbodi, Ghazal

    2017-01-01

    Background: The aim of this study was to evaluate the effect of variables such as personality traits, driving behavior and mental illness on road traffic accidents among the drivers with accidents and those without road crash. Methods: In this cohort study, 800 bus and truck drivers were recruited. Participants were selected among drivers who referred to Imam Sajjad Hospital (Tehran, Iran) during 2013-2015. The Manchester driving behavior questionnaire (MDBQ), big five personality test (NEO personality inventory) and semi-structured interview (schizophrenia and affective disorders scale) were used. After two years, we surveyed all accidents due to human factors that involved the recruited drivers. The data were analyzed using the SPSS software by performing the descriptive statistics, t-test, and multiple logistic regression analysis methods. P values less than 0.05 were considered statistically significant. Results: In terms of controlling the effective and demographic variables, the findings revealed significant differences between the two groups of drivers that were and were not involved in road accidents. In addition, it was found that depression and anxiety could increase the odds ratio (OR) of road accidents by 2.4- and 2.7-folds, respectively (P=0.04, P=0.004). It is noteworthy to mention that neuroticism alone can increase the odds of road accidents by 1.1-fold (P=0.009), but other personality factors did not have a significant effect on the equation. Conclusion: The results revealed that some mental disorders affect the incidence of road collisions. Considering the importance and sensitivity of driving behavior, it is necessary to evaluate multiple psychological factors influencing drivers before and after receiving or renewing their driver’s license. PMID:28293047

  14. Empowerment of women and its association with the health of the community.

    PubMed

    Varkey, Prathibha; Mbbs; Kureshi, Sarah; Lesnick, Timothy

    2010-01-01

    Empowerment and opportunities to experience power and control in one's life contribute to health and wellness. Although studies have assessed specific factors related to women's empowerment and their influence on health outcomes, there is a dearth of published literature assessing the relationship of the empowerment of women with the overall health of a community. By means of this article, we aim to assess the relationship of women's empowerment with health in 75 countries. We used the gender empowerment measure (GEM), a composite index measuring gender inequality in economic participation and decision making, political participation and decision making, and power over economic resources. All 75 countries with GEM values in the 2006 Human Development Report (HDR) were included in the study. Association between the GEM values and seven health indicators was evaluated using descriptive statistics, scatter plots, and simple and multiple linear regression models. We also controlled for gross domestic product (GDP) as a possible confounding factor and included this variable in the multiple regression models. When GDP was not considered, GEM had a statistically significant association with all health indicator variables except for proportion of 1-year-olds immunized against measles (correlation coefficient 0.063, p = 0.597). After adjusting for GDP, GEM was significantly associated with low birth weight, fertility rate, infant mortality, and age

  15. Uterine Artery Embolization in 101 Cases of Uterine Fibroids: Do Size, Location, and Number of Fibroids Affect Therapeutic Success and Complications?

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Firouznia, Kavous; Ghanaati, Hossein; Sanaati, Mina

    The purpose of this study was to evaluate whether the size, location, or number of fibroids affects therapeutic efficacy or complications of uterine artery embolization (UAE). Patients with symptomatic uterine fibroids (n = 101) were treated by selective bilateral UAE using 500- to 710-μm polyvinyl alcohol (PVA) particles. Baseline measures of clinical symptoms, sonography, and MRI taken before the procedure were compared to those taken 1, 3, 6, and 12 months later. Complications and outcomes were analyzed for associations with fibroid size, location, and number. Reductions in mean fibroid volume were similar in patients with single (66.6 ± 21.5%) and multiple (67.4 ± 25.0%) fibroids (p-value = 0.83). Menstrual improvement occurred in patients with single (93.3%) and multiple (72.2%) fibroids (p = 0.18). Changes in submucosal and other fibroids were not significantly different between the two groups (p's > 0.56). Linear regression analysis between primary fibroid volume as independent variable and percentage reduction of fibroid volume after 1 year yielded an R² of 0.083 and the model coefficient was not statistically significant (p = 0.072). Multivariate regression models revealed no statistically or clinically significant coefficients or odds ratios for three independent variables (primary fibroid size, total number, and fibroid location) and all outcome variables (percent reduction of uterus and fibroid volumes in 1 year, improvement of clinical symptoms [menstrual, bulk related, and urinary] in 1 year, and complications after UAE). In conclusion, neither the success rate nor the probability of complications was affected by the primary fibroid size, location, or total number of fibroids.

  16. A study of home deaths in Japan from 1951 to 2002

    PubMed Central

    Yang, Limin; Sakamoto, Naoko; Marui, Eiji

    2006-01-01

    Background Several surveys in Japan have indicated that most terminally ill Japanese patients would prefer to die at home or in a homelike setting. However, there is a great disparity between this stated preference and the reality, since most Japanese die in hospital. We report here national changes in home deaths in Japan over the last 5 decades. Using prefecture data, we also examined the factors in the medical service associated with home death in Japan. Methods Published data on place of death was obtained from the vital statistics compiled by the Ministry of Health, Labor and Welfare of Japan. We analyzed trends of home deaths from 1951 to 2002, and describe the changes in the proportion of home deaths by region, sex, age, and cause of death. Joinpoint regression analysis was used for trend analysis. Logistic regression analysis was performed to identify secular trends in home deaths, and the impact of age, sex, year of deaths and cause of deaths on home death. We also examined the association between home death and medical service factors by multiple regression analysis, using home death rate by prefectures in 2002 as a dependent variable. Results A significant decrease in the percentage of patients dying at home was observed in the results of joinpoint regression analysis. Older patients and males were more likely to die at home. Patients who died from cancer were less likely to die at home. The results of multiple regression analysis indicated that home death was related to the number of beds in hospital, ratio of daily occupied beds in general hospital, the number of families in which the elderly were living alone, and dwelling rooms. Conclusion The pattern of the place of death has not only been determined by social and demographic characteristics of the decedent, but also associated with the medical service in the community. PMID:16524485

  17. Sample size determination for logistic regression on a logit-normal distribution.

    PubMed

    Kim, Seongho; Heath, Elisabeth; Heilbrun, Lance

    2017-06-01

    Although the sample size for simple logistic regression can be readily determined using currently available methods, the sample size calculation for multiple logistic regression requires some additional information, such as the coefficient of determination ([Formula: see text]) of a covariate of interest with other covariates, which is often unavailable in practice. The response variable of logistic regression follows a logit-normal distribution which can be generated from a logistic transformation of a normal distribution. Using this property of logistic regression, we propose new methods of determining the sample size for simple and multiple logistic regressions using a normal transformation of outcome measures. Simulation studies and a motivating example show several advantages of the proposed methods over the existing methods: (i) no need for [Formula: see text] for multiple logistic regression, (ii) available interim or group-sequential designs, and (iii) much smaller required sample size.
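
    The logit-normal property mentioned in this abstract can be illustrated with a minimal sketch: draw a normal variate and pass it through the logistic function. The function names, the default parameters, and the seed are assumptions made for illustration only; this is not the authors' sample-size method.

```python
import math
import random


def logistic(z: float) -> float:
    """Logistic (inverse-logit) transformation, mapping the real line to (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))


def sample_logit_normal(n: int, mu: float = 0.0, sigma: float = 1.0, seed: int = 0) -> list:
    """Draw n logit-normal values by transforming normal draws (hypothetical helper)."""
    rng = random.Random(seed)  # fixed seed so the sketch is reproducible
    return [logistic(rng.gauss(mu, sigma)) for _ in range(n)]


draws = sample_logit_normal(1000)
print(min(draws), max(draws))  # every value lies strictly between 0 and 1
```

    Applying the logit, log(p / (1 - p)), to these draws recovers the underlying normal values, which is the kind of normal transformation the abstract refers to.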

  18. Estimating Flow-Duration and Low-Flow Frequency Statistics for Unregulated Streams in Oregon

    USGS Publications Warehouse

    Risley, John; Stonewall, Adam J.; Haluska, Tana

    2008-01-01

    Flow statistical datasets, basin-characteristic datasets, and regression equations were developed to provide decision makers with surface-water information needed for activities such as water-quality regulation, water-rights adjudication, biological habitat assessment, infrastructure design, and water-supply planning and management. The flow statistics, which included annual and monthly period of record flow durations (5th, 10th, 25th, 50th, and 95th percent exceedances) and annual and monthly 7-day, 10-year (7Q10) and 7-day, 2-year (7Q2) low flows, were computed at 466 streamflow-gaging stations at sites with unregulated flow conditions throughout Oregon and adjacent areas of neighboring States. Regression equations, created from the flow statistics and basin characteristics of the stations, can be used to estimate flow statistics at ungaged stream sites in Oregon. The study area was divided into 10 regression modeling regions based on ecological, topographic, geologic, hydrologic, and climatic criteria. In total, 910 annual and monthly regression equations were created to predict the 7 flow statistics in the 10 regions. Equations to predict the five flow-duration exceedance percentages and the two low-flow frequency statistics were created with Ordinary Least Squares and Generalized Least Squares regression, respectively. The standard errors of estimate of the equations created to predict the 5th and 95th percent exceedances had medians of 42.4 and 64.4 percent, respectively. The standard errors of prediction of the equations created to predict the 7Q2 and 7Q10 low-flow statistics had medians of 51.7 and 61.2 percent, respectively. Standard errors for regression equations for sites in western Oregon were smaller than those in eastern Oregon partly because of a greater density of available streamflow-gaging stations in western Oregon than eastern Oregon. 
High-flow regression equations (such as the 5th and 10th percent exceedances) also generally were more accurate than the low-flow regression equations (such as the 95th percent exceedance and 7Q10 low-flow statistic). The regression equations predict unregulated flow conditions in Oregon. Flow estimates need to be adjusted if they are used at ungaged sites that are regulated by reservoirs or affected by water-supply and agricultural withdrawals if actual flow conditions are of interest. The regression equations are installed in the USGS StreamStats Web-based tool (http://water.usgs.gov/osw/streamstats/index.html, accessed July 16, 2008). StreamStats provides users with a set of annual and monthly flow-duration and low-flow frequency estimates for ungaged sites in Oregon in addition to the basin characteristics for the sites. Prediction intervals at the 90-percent confidence level also are automatically computed.
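
    As a rough illustration of a flow-duration statistic, the sketch below computes the flow that is equaled or exceeded a given percentage of the time from a list of daily flows. The function name, the linear-interpolation convention, and the sample data are assumptions; the report's own computation procedure is not described in the abstract and may differ.

```python
def exceedance_flow(flows, percent):
    """Flow equaled or exceeded `percent` percent of the time.

    Assumed convention: sort flows from largest to smallest and interpolate
    linearly between neighboring ranked values.
    """
    if not flows:
        raise ValueError("at least one flow value is required")
    if not 0 <= percent <= 100:
        raise ValueError("percent must be between 0 and 100")
    ordered = sorted(flows, reverse=True)  # largest flow first
    pos = percent * (len(ordered) - 1) / 100.0
    lo = int(pos)
    hi = min(lo + 1, len(ordered) - 1)
    frac = pos - lo
    return ordered[lo] + frac * (ordered[hi] - ordered[lo])


# Hypothetical daily flows of 1, 2, ..., 101 (arbitrary units).
daily_flows = list(range(1, 102))
for p in (5, 10, 25, 50, 95):
    print(p, exceedance_flow(daily_flows, p))
```

    Under this convention a small exceedance percentage (such as 5) returns a high flow and a large one (such as 95) returns a low flow, matching the abstract's use of the 5th and 10th percent exceedances as high-flow statistics.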

  19. A cross-sectional study examining multiple mobility device use and fall status among middle-aged and older adults with multiple sclerosis.

    PubMed

    Finlayson, Marcia L; Peterson, Elizabeth W; Asano, Miho

    2014-01-01

    To document the prevalence of multiple mobility device use among adults with multiple sclerosis (MS) (≥ 55 years) and examine the association between falls status (faller/non-faller) and the number of mobility devices used. Cross-sectional data generated through telephone interviews with 353 participants was used for this secondary analysis. Descriptive statistics were used to address the first study purpose. Multiple device use was measured by the number of devices used, which ranged from 0 (never use a cane, walker, manual wheelchair, or power wheelchair/scooter) to 4 (use all four mobility devices at least some of the time). Logistic regression analysis was used to address the second purpose, with fall status used as the dependent variable (non-fallers [<1 per year] versus fallers [≥ 1 per year]). Just under 60% of participants reported the use of at least two mobility devices. For each additional mobility device used, the odds of being a faller increased by 1.47 times (95% CI = 1.14-1.90). Multiple mobility device use was common and the greater number of devices used, the greater the likelihood of being a faller. To prevent falls, this association requires further research to determine directionality.
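
    The reported per-device odds ratio can be read as a multiplier that compounds with each additional device. The sketch below assumes the log-odds are linear in the device count, as a single logistic-regression coefficient for a count predictor implies; the function name is hypothetical.

```python
# Odds ratio per additional mobility device, as reported in the abstract.
OR_PER_DEVICE = 1.47


def odds_multiplier(n_devices: int, or_per_device: float = OR_PER_DEVICE) -> float:
    """Multiplicative change in the odds of being a faller relative to 0 devices."""
    if not 0 <= n_devices <= 4:
        raise ValueError("the device count in the study ranged from 0 to 4")
    return or_per_device ** n_devices


for k in range(5):
    print(k, round(odds_multiplier(k), 2))
```

    These multipliers apply to odds, not probabilities, and they describe an association only; as the abstract notes, directionality is undetermined.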

  20. Stepwise versus Hierarchical Regression: Pros and Cons

    ERIC Educational Resources Information Center

    Lewis, Mitzi

    2007-01-01

    Multiple regression is commonly used in social and behavioral data analysis. In multiple regression contexts, researchers are very often interested in determining the "best" predictors in the analysis. This focus may stem from a need to identify those predictors that are supportive of theory. Alternatively, the researcher may simply be interested…

  1. Using Statistics and Data Mining Approaches to Analyze Male Sexual Behaviors and Use of Erectile Dysfunction Drugs Based on Large Questionnaire Data.

    PubMed

    Qiao, Zhi; Li, Xiang; Liu, Haifeng; Zhang, Lei; Cao, Junyang; Xie, Guotong; Qin, Nan; Jiang, Hui; Lin, Haocheng

    2017-01-01

    The prevalence of erectile dysfunction (ED) has been extensively studied worldwide. Erectile dysfunction drugs have shown great efficacy in preventing male erectile dysfunction. To help doctors understand patients' drug-taking preferences and prescribe better, it is crucial to analyze who actually takes erectile dysfunction drugs and the relation between sexual behaviors and drug use. Existing clinical studies have usually used descriptive statistics and regression analysis based on small volumes of data. In this paper, based on a large volume of data (48,630 questionnaires), we use data mining approaches in addition to statistics and regression analysis to comprehensively analyze the relation between male sexual behaviors and use of erectile dysfunction drugs, in order to unravel the characteristics of patients who take erectile dysfunction drugs. We first analyze the impact of multiple sexual behavior factors on whether erectile dysfunction drugs are used. Then, we mine Decision Rules for Stratification to discover patients who are more likely to take drugs. Based on the decision rules, the patients can be partitioned into four potential groups for use of erectile dysfunction drugs: high potential group, intermediate potential-1 group, intermediate potential-2 group and low potential group. Experimental results show that 1) the sexual behavior factors erectile hardness and time length to prepare (how long the patient prepares for sexual behaviors ahead of time) have bigger impacts both in correlation analysis and in discovering potential drug-taking patients; 2) the odds ratio between patients identified as low potential and high potential was 6.098 (95% confidence interval, 5.159-7.209), with statistically significant differences in drug-taking potential detected between all potential groups.

  2. Explaining nitrate pollution pressure on the groundwater resource in Kinshasa using a multivariate statistical modelling approach

    NASA Astrophysics Data System (ADS)

    Mfumu Kihumba, Antoine; Vanclooster, Marnik

    2013-04-01

    Drinking water in Kinshasa, the capital of the Democratic Republic of Congo, is provided by extracting groundwater from the local aquifer, particularly in peripheral areas. The exploited groundwater body is mainly unconfined and located within a continuous detrital aquifer, primarily composed of sedimentary formations. However, the aquifer is subjected to an increasing threat of anthropogenic pollution pressure. Understanding the detailed origin of this pollution pressure is important for sustainable drinking water management in Kinshasa. The present study aims to explain the observed nitrate pollution problem, nitrate being considered as a good tracer for other pollution threats. The analysis is made in terms of physical attributes that are readily available using a statistical modelling approach. For the nitrate data, use was made of a historical groundwater quality assessment study, for which the data were re-analysed. The physical attributes are related to the topography, land use, geology and hydrogeology of the region. Prior to the statistical modelling, intrinsic and specific vulnerability for nitrate pollution was assessed. This vulnerability assessment showed that the alluvium area in the northern part of the region is the most vulnerable area. This area consists of urban land use with poor sanitation. Re-analysis of the nitrate pollution data demonstrated that the spatial variability of nitrate concentrations in the groundwater body is high, and coherent with the fragmented land use of the region and the intrinsic and specific vulnerability maps. For the statistical modeling use was made of multiple regression and regression tree analysis. The results demonstrated the significant impact of land use variables on the Kinshasa groundwater nitrate pollution and the need for a detailed delineation of groundwater capture zones around the monitoring stations. Key words: Groundwater , Isotopic, Kinshasa, Modelling, Pollution, Physico-chemical.

  3. Model for predicting the injury severity score.

    PubMed

    Hagiwara, Shuichi; Oshima, Kiyohiro; Murata, Masato; Kaneko, Minoru; Aoki, Makoto; Kanbe, Masahiko; Nakamura, Takuro; Ohyama, Yoshio; Tamura, Jun'ichi

    2015-07-01

    To determine the formula that predicts the injury severity score from parameters that are obtained in the emergency department at arrival. We reviewed the medical records of trauma patients who were transferred to the emergency department of Gunma University Hospital between January 2010 and December 2010. The injury severity score, age, mean blood pressure, heart rate, Glasgow coma scale, hemoglobin, hematocrit, red blood cell count, platelet count, fibrinogen, international normalized ratio of prothrombin time, activated partial thromboplastin time, and fibrin degradation products were examined in those patients on arrival. To determine the formula that predicts the injury severity score, multiple linear regression analysis was carried out. The injury severity score was set as the dependent variable, and the other parameters were set as candidate explanatory variables. IBM SPSS Statistics 20 was used for the statistical analysis. Statistical significance was set at P < 0.05. To select explanatory variables, the stepwise method was used. A total of 122 patients were included in this study. The formula for predicting the injury severity score (ISS) was as follows: ISS = 13.252 - 0.078 × (mean blood pressure) + 0.12 × (fibrin degradation products). The P-value of this formula from analysis of variance was <0.001, and the multiple correlation coefficient (R) was 0.739 (R² = 0.546). The multiple correlation coefficient adjusted for the degrees of freedom was 0.538. The Durbin-Watson ratio was 2.200. A formula for predicting the injury severity score in trauma patients was developed with ordinary parameters such as fibrin degradation products and mean blood pressure. This formula is useful because we can predict the injury severity score easily in the emergency department.
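
    The regression formula quoted in this abstract can be written as a small function. The coefficients are taken from the abstract; the function name, the argument names, and the example inputs are hypothetical, and the measurement units are not stated in the abstract, so the inputs are assumed to be in the units the study used.

```python
def predict_iss(mean_blood_pressure: float, fibrin_degradation_products: float) -> float:
    """Predicted injury severity score from the formula quoted in the abstract.

    Units are those used in the study (not given in the abstract).
    """
    return 13.252 - 0.078 * mean_blood_pressure + 0.12 * fibrin_degradation_products


# Hypothetical input values, chosen only to show the arithmetic.
example = predict_iss(mean_blood_pressure=90.0, fibrin_degradation_products=50.0)
print(round(example, 3))

# Consistency check on the reported fit: R = 0.739 implies R squared of about 0.546.
print(round(0.739 ** 2, 3))
```

    The negative coefficient means a lower mean blood pressure raises the predicted score, while higher fibrin degradation products raise it.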

  4. Neonatal Risk Factors for Treatment-Demanding Retinopathy of Prematurity: A Danish National Study.

    PubMed

    Slidsborg, Carina; Jensen, Aksel; Forman, Julie Lyng; Rasmussen, Steen; Bangsgaard, Regitze; Fledelius, Hans Callø; Greisen, Gorm; la Cour, Morten

    2016-04-01

    One goal of the study was to identify "new" statistically independent risk factors for treatment-demanding retinopathy of prematurity (ROP). Another goal was to evaluate whether any new risk factors could explain the increase in the incidence of treatment-demanding ROP over time in Denmark. A retrospective, register-based cohort study. The study included premature infants (n = 6490) born in Denmark from 1997 to 2008. The study sample and the 31 candidate risk factors were identified in 3 national registers. Data were linked through a unique civil registration number. Each of the 31 candidate risk factors were evaluated in univariate analyses, while adjusted for known risk factors (i.e., gestational age [GA] at delivery, small for gestational age [SGA], multiple births, and male sex). Significant outcomes were analyzed thereafter in a backward selection multiple logistic regression model. Treatment-demanding ROP and its associations to candidate risk factors. Mechanical ventilation (odds ratio [OR], 2.84; 95% confidence interval [CI], 1.99-4.08; P < 0.01) and blood transfusion (OR, 1.97; 95% CI, 1.20-3.14; P = 0.01) were the only new statistically independent risk factors, in addition to GA at delivery, SGA, multiple births, and male sex. Modification in these prognostic factors for ROP did not cause an increase in treatment-demanding ROP. In a large study population, blood transfusion and mechanical ventilation were the only new statistically independent risk factors to predict the development of treatment-demanding ROP. Modification in the neonatal treatment with mechanical ventilation or blood transfusion did not cause the observed increase in the incidence of preterm infants with treatment-demanding ROP during a recent birth period (2003-2008). Copyright © 2016 American Academy of Ophthalmology. Published by Elsevier Inc. All rights reserved.

  5. Robust Tests for Additive Gene-Environment Interaction in Case-Control Studies Using Gene-Environment Independence.

    PubMed

    Liu, Gang; Mukherjee, Bhramar; Lee, Seunggeun; Lee, Alice W; Wu, Anna H; Bandera, Elisa V; Jensen, Allan; Rossing, Mary Anne; Moysich, Kirsten B; Chang-Claude, Jenny; Doherty, Jennifer A; Gentry-Maharaj, Aleksandra; Kiemeney, Lambertus; Gayther, Simon A; Modugno, Francesmary; Massuger, Leon; Goode, Ellen L; Fridley, Brooke L; Terry, Kathryn L; Cramer, Daniel W; Ramus, Susan J; Anton-Culver, Hoda; Ziogas, Argyrios; Tyrer, Jonathan P; Schildkraut, Joellen M; Kjaer, Susanne K; Webb, Penelope M; Ness, Roberta B; Menon, Usha; Berchuck, Andrew; Pharoah, Paul D; Risch, Harvey; Pearce, Celeste Leigh

    2018-02-01

    There have been recent proposals advocating the use of additive gene-environment interaction instead of the widely used multiplicative scale, as a more relevant public health measure. Using gene-environment independence enhances statistical power for testing multiplicative interaction in case-control studies. However, under departure from this assumption, substantial bias in the estimates and inflated type I error in the corresponding tests can occur. In this paper, we extend the empirical Bayes (EB) approach previously developed for multiplicative interaction, which trades off between bias and efficiency in a data-adaptive way, to the additive scale. An EB estimator of the relative excess risk due to interaction is derived, and the corresponding Wald test is proposed with a general regression setting under a retrospective likelihood framework. We study the impact of gene-environment association on the resultant test with case-control data. Our simulation studies suggest that the EB approach uses the gene-environment independence assumption in a data-adaptive way and provides a gain in power compared with the standard logistic regression analysis and better control of type I error when compared with the analysis assuming gene-environment independence. We illustrate the methods with data from the Ovarian Cancer Association Consortium. © The Author(s) 2017. Published by Oxford University Press on behalf of the Johns Hopkins Bloomberg School of Public Health. All rights reserved. For permissions, please e-mail: journals.permissions@oup.com.
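
    For orientation, the relative excess risk due to interaction (RERI) named in this abstract has a standard textbook definition on the additive scale, sketched below. This is only the plain definition, not the empirical Bayes estimator or Wald test the paper proposes; the function name and the example relative risks are hypothetical. In case-control data, odds ratios are commonly used in place of relative risks.

```python
def reri(rr_gene: float, rr_env: float, rr_both: float) -> float:
    """Relative excess risk due to interaction on the additive scale.

    rr_gene: relative risk for genetic factor only (versus neither exposure)
    rr_env:  relative risk for environmental factor only
    rr_both: relative risk for both factors together
    """
    return rr_both - rr_gene - rr_env + 1.0


# Hypothetical values: exactly multiplicative joint effect (6 = 2 * 3).
print(reri(2.0, 3.0, 6.0))  # positive: departure from additivity
# Hypothetical values: exactly additive joint effect (4 = 2 + 3 - 1).
print(reri(2.0, 3.0, 4.0))  # zero: no additive interaction
```

    The first case shows why the choice of scale matters: a joint effect with no multiplicative interaction can still show positive interaction on the additive scale.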

  6. Methods for estimating selected spring and fall low-flow frequency statistics for ungaged stream sites in Iowa, based on data through June 2014

    USGS Publications Warehouse

    Eash, David A.; Barnes, Kimberlee K.; O'Shea, Padraic S.

    2016-09-19

    A statewide study was led to develop regression equations for estimating three selected spring and three selected fall low-flow frequency statistics for ungaged stream sites in Iowa. The estimation equations developed for the six low-flow frequency statistics include spring (April through June) 1-, 7-, and 30-day mean low flows for a recurrence interval of 10 years and fall (October through December) 1-, 7-, and 30-day mean low flows for a recurrence interval of 10 years. Estimates of the three selected spring statistics are provided for 241 U.S. Geological Survey continuous-record streamgages, and estimates of the three selected fall statistics are provided for 238 of these streamgages, using data through June 2014. Because only 9 years of fall streamflow record were available, three streamgages included in the development of the spring regression equations were not included in the development of the fall regression equations. Because of regulation, diversion, or urbanization, 30 of the 241 streamgages were not included in the development of the regression equations. The study area includes Iowa and adjacent areas within 50 miles of the Iowa border. Because trend analyses indicated statistically significant positive trends when considering the period of record for most of the streamgages, the longest, most recent period of record without a significant trend was determined for each streamgage for use in the study. Geographic information system software was used to measure 63 selected basin characteristics for each of the 211 streamgages used to develop the regional regression equations. 
The study area was divided into three low-flow regions that were defined in a previous study for the development of regional regression equations. Because several streamgages included in the development of regional regression equations have estimates of zero flow calculated from observed streamflow for selected spring and fall low-flow frequency statistics, the final equations for the three low-flow regions were developed using two types of regression analyses: left-censored and generalized-least-squares regression analyses. A total of 211 streamgages were included in the development of nine spring regression equations (three equations for each of the three low-flow regions). A total of 208 streamgages were included in the development of nine fall regression equations (three equations for each of the three low-flow regions). A censoring threshold was used to develop 15 left-censored regression equations to estimate the three fall low-flow frequency statistics for each of the three low-flow regions and to estimate the three spring low-flow frequency statistics for the southern and northwest regions. For the northeast region, generalized-least-squares regression was used to develop three equations to estimate the three spring low-flow frequency statistics. For the northeast region, average standard errors of prediction range from 32.4 to 48.4 percent for the spring equations and average standard errors of estimate range from 56.4 to 73.8 percent for the fall equations. For the northwest region, average standard errors of estimate range from 58.9 to 62.1 percent for the spring equations and from 83.2 to 109.4 percent for the fall equations. 
For the southern region, average standard errors of estimate range from 43.2 to 64.0 percent for the spring equations and from 78.1 to 78.7 percent for the fall equations. The regression equations are applicable only to stream sites in Iowa with low flows not substantially affected by regulation, diversion, or urbanization and with basin characteristics within the range of those used to develop the equations. The regression equations will be implemented within the U.S. Geological Survey StreamStats Web-based geographic information system application. StreamStats allows users to click on any ungaged stream site and compute estimates of the six selected spring and fall low-flow statistics; in addition, 90-percent prediction intervals and the measured basin characteristics for the ungaged site are provided. StreamStats also allows users to click on any Iowa streamgage to obtain computed estimates for the six selected spring and fall low-flow statistics.

  7. A Method of Calculating Functional Independence Measure at Discharge from Functional Independence Measure Effectiveness Predicted by Multiple Regression Analysis Has a High Degree of Predictive Accuracy.

    PubMed

    Tokunaga, Makoto; Watanabe, Susumu; Sonoda, Shigeru

    2017-09-01

    Multiple linear regression analysis is often used to predict the outcome of stroke rehabilitation. However, the predictive accuracy may not be satisfactory. The objective of this study was to elucidate the predictive accuracy of a method of calculating motor Functional Independence Measure (mFIM) at discharge from mFIM effectiveness predicted by multiple regression analysis. The subjects were 505 patients with stroke who were hospitalized in a convalescent rehabilitation hospital. The formula "mFIM at discharge = mFIM effectiveness × (91 points - mFIM at admission) + mFIM at admission" was used. By including the predicted mFIM effectiveness obtained through multiple regression analysis in this formula, we obtained the predicted mFIM at discharge (A). We also used multiple regression analysis to directly predict mFIM at discharge (B). The correlation between the predicted and the measured values of mFIM at discharge was compared between A and B. The correlation coefficients were .916 for A and .878 for B. Calculating mFIM at discharge from mFIM effectiveness predicted by multiple regression analysis had a higher degree of predictive accuracy of mFIM at discharge than that directly predicted. Copyright © 2017 National Stroke Association. Published by Elsevier Inc. All rights reserved.
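
    The formula quoted in this abstract, together with its algebraic inverse, can be sketched as follows. The constant 91 is the maximum motor FIM score used in the abstract's formula; the function names and example scores are hypothetical, and the regression that predicts mFIM effectiveness is not reproduced here.

```python
MFIM_MAX = 91  # maximum motor FIM score, as used in the abstract's formula


def mfim_effectiveness(admission: float, discharge: float) -> float:
    """Share of the possible improvement actually achieved (inverse of the formula)."""
    if admission >= MFIM_MAX:
        raise ValueError("effectiveness is undefined when admission mFIM is at the maximum")
    return (discharge - admission) / (MFIM_MAX - admission)


def mfim_at_discharge(effectiveness: float, admission: float) -> float:
    """mFIM at discharge = effectiveness * (91 - mFIM at admission) + mFIM at admission."""
    return effectiveness * (MFIM_MAX - admission) + admission


# Hypothetical patient: admitted at 41 points with a predicted effectiveness of 0.5.
print(mfim_at_discharge(0.5, 41))  # 66.0
```

    In the study's method A, the effectiveness passed to the second function would be the value predicted by multiple regression rather than a fixed number.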

  8. Demirjian's method in the estimation of age: A study on human third molars

    PubMed Central

    Lewis, Amitha J.; Boaz, Karen; Nagesh, K. R; Srikant, N; Gupta, Neha; Nandita, K. P; Manaktala, Nidhi

    2015-01-01

    Aim: The primary aim of the following study is to estimate the chronological age based on the stages of third molar development following the eight stages (A to H) method of Demirjian et al. (along with two modifications-Orhan), and the secondary aim is to compare third molar development with sex and age. Materials and Methods: The sample consisted of 115 orthopantomograms from South Indian subjects with known chronological age and gender. Multiple regression analysis was performed with chronological age as the dependent variable and third molar root development as the independent variable. All the statistical analysis was performed using the SPSS 11.0 package (IBM® Corporation). Results: No statistically significant differences were found in third molar development between males and females. Depending on the available number of wisdom teeth in an individual, R² varied for males from 0.21 to 0.48 and for females from 0.16 to 0.38. New equations were derived for estimating the chronological age. Conclusion: The chronological age of a South Indian individual between 14 and 22 years may be estimated based on the regression formulae. However, additional studies with a larger study population must be conducted to meet the need for population-based information on third molar development. PMID:26005306

  9. Adjustment of geochemical background by robust multivariate statistics

    USGS Publications Warehouse

    Zhou, D.

    1985-01-01

    Conventional analyses of exploration geochemical data assume that the background is a constant or slowly changing value, equivalent to a plane or a smoothly curved surface. However, it is better to regard the geochemical background as a rugged surface, varying with changes in geology and environment. This rugged surface can be estimated from observed geological, geochemical and environmental properties by using multivariate statistics. A method of background adjustment was developed and applied to groundwater and stream sediment reconnaissance data collected from the Hot Springs Quadrangle, South Dakota, as part of the National Uranium Resource Evaluation (NURE) program. Source-rock lithology appears to be a dominant factor controlling the chemical composition of groundwater or stream sediments. The most efficacious adjustment procedure is to regress uranium concentration on selected geochemical and environmental variables for each lithologic unit, and then to delineate anomalies by a common threshold set as a multiple of the standard deviation of the combined residuals. Robust versions of regression and RQ-mode principal components analysis techniques were used rather than ordinary techniques to guard against distortion caused by outliers. Anomalies delineated by this background adjustment procedure correspond with uranium prospects much better than do anomalies delineated by conventional procedures. The procedure should be applicable to geochemical exploration at different scales for other metals. © 1985.
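
    The adjustment procedure described in this abstract (a regression per lithologic unit, then one threshold on the pooled residuals) can be sketched in simplified form. Note the substitutions: this sketch uses ordinary least squares with a single predictor, whereas the paper used robust regression on several variables. The function names, the multiplier k = 2, and the sample data are all hypothetical.

```python
from statistics import mean, pstdev


def _ols(xs, ys):
    """Ordinary least-squares slope and intercept for one predictor."""
    mx, my = mean(xs), mean(ys)
    sxx = sum((x - mx) ** 2 for x in xs)
    slope = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sxx
    return slope, my - slope * mx


def flag_anomalies(samples, k=2.0):
    """Return indices of samples whose residual exceeds k standard deviations.

    samples: list of (lithologic_unit, predictor, uranium) tuples. Each unit
    needs at least two samples with differing predictor values.
    """
    by_unit = {}
    for i, (unit, x, u) in enumerate(samples):
        by_unit.setdefault(unit, []).append((i, x, u))
    residuals = {}
    for rows in by_unit.values():  # one regression per lithologic unit
        slope, intercept = _ols([x for _, x, _ in rows], [u for _, _, u in rows])
        for i, x, u in rows:
            residuals[i] = u - (intercept + slope * x)
    threshold = k * pstdev(list(residuals.values()))  # common threshold, pooled residuals
    return sorted(i for i, r in residuals.items() if r > threshold)


# Hypothetical data: unit "A" follows u = 2x except for one high sample (index 5).
samples = [("A", 1, 2), ("A", 2, 4), ("A", 3, 6), ("A", 4, 8), ("A", 5, 10),
           ("A", 3, 16), ("B", 1, 5), ("B", 2, 5), ("B", 3, 5), ("B", 4, 5)]
print(flag_anomalies(samples))  # [5]
```

    In this toy data the single high sample shifts the ordinary least-squares fit for its unit, which is the kind of distortion the abstract cites as the reason for preferring robust estimators.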

  10. [Sanitation and racial inequality conditions in urban Brazil: an analysis focused on the indigenous population based on the 2010 Population Census].

    PubMed

    Raupp, Ludimila; Fávaro, Thatiana Regina; Cunha, Geraldo Marcelo; Santos, Ricardo Ventura

    2017-01-01

    The aims of this study were to analyze and describe the presence and infrastructure of basic sanitation in the urban areas of Brazil, contrasting indigenous with non-indigenous households. Methods: A cross-sectional study based on microdata from the 2010 Census was conducted. The analyses were based on descriptive statistics (prevalence) and the construction of multiple logistic regression models (adjusted by socioeconomic and demographic covariates). The odds ratios were estimated for the association between the explanatory variables (covariates) and the outcome variables (water supply, sewage, garbage collection, and adequate sanitation). The statistical significance level established was 5%. Among the analyzed services, sewage proved to be the most precarious. Regarding race or color, indigenous households presented the lowest rate of sanitary infrastructure in urban Brazil. The adjusted regression showed that, in general, indigenous households were at a disadvantage when compared to other categories of race or color, especially in terms of the presence of garbage collection services. These inequalities were much more pronounced in the South and Southeastern regions. The analyses of this study not only confirm the profile of poor conditions and infrastructure of the basic sanitation of indigenous households in urban areas, but also demonstrate the persistence of inequalities associated with race or color in the country.

  11. Does emotion and its daily fluctuation correlate with depression? A cross-cultural analysis among six developing countries.

    PubMed

    Chan, Derwin K C; Zhang, Xin; Fung, Helene H; Hagger, Martin S

    2015-03-01

    Utilizing a World Health Organization (WHO) multi-national dataset, the present study examined the relationships between emotion, affective variability (i.e., the fluctuation of emotional status), and depression across six developing countries, including China (N=15,050); Ghana (N=5,573); India (N=12,198); Mexico (N=5,448); South Africa (N=4,227); and Russia (N=4,947). Using moderated logistic regression and hierarchical multiple regression, the effects of emotion, affective variability, culture, and their interactions on depression and depressive symptoms were examined when statistically controlling for a number of external factors (i.e., age, gender, marital status, education level, income, smoking, alcohol drinking, physical activity, sedentary behavior, and diet). The results revealed that negative emotion was a statistically significant predictor of depressive symptoms, but the strength of association was smaller in countries with a lower incidence of depression (i.e., China and Ghana). The association between negative affective variability and the risk of depression was higher in India and lower in Ghana. Findings suggested that culture not only was associated with the incidence of depression, but it could also moderate the effects of emotion and affective variability on depression or the experience of depressive symptoms. Copyright © 2014 Ministry of Health, Saudi Arabia. Published by Elsevier Ltd. All rights reserved.

  12. A methodology for treating missing data applied to daily rainfall data in the Candelaro River Basin (Italy).

    PubMed

    Lo Presti, Rossella; Barca, Emanuele; Passarella, Giuseppe

    2010-01-01

    Environmental time series are often affected by the "presence" of missing data, but when dealing statistically with data, the need to fill in the gaps estimating the missing values must be considered. At present, a large number of statistical techniques are available to achieve this objective; they range from very simple methods, such as using the sample mean, to very sophisticated ones, such as multiple imputation. A brand new methodology for missing data estimation is proposed, which tries to merge the obvious advantages of the simplest techniques (e.g. their vocation to be easily implemented) with the strength of the newest techniques. The proposed method consists in the application of two consecutive stages: once it has been ascertained that a specific monitoring station is affected by missing data, the "most similar" monitoring stations are identified among neighbouring stations on the basis of a suitable similarity coefficient; in the second stage, a regressive method is applied in order to estimate the missing data. In this paper, four different regressive methods are applied and compared, in order to determine which is the most reliable for filling in the gaps, using rainfall data series measured in the Candelaro River Basin located in South Italy.
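
    The two-stage idea in this abstract (pick the most similar neighbouring station, then estimate the gaps by regression) can be illustrated with a minimal sketch. The abstract does not specify the similarity coefficient or the four regressive methods, so Pearson correlation and a simple linear regression are used here purely as assumptions; `fill_gaps` is a hypothetical name.

```python
import numpy as np

def fill_gaps(target, neighbours):
    """Fill NaN gaps in `target` from the most similar neighbouring station.

    target     : (n_times,) series with NaN where data are missing
    neighbours : (n_times, n_stations) complete series at nearby stations
    """
    target = np.asarray(target, dtype=float)
    neighbours = np.asarray(neighbours, dtype=float)
    observed = ~np.isnan(target)

    # Stage 1: rank neighbours by similarity (assumed: Pearson correlation
    # computed over the time steps where the target was observed).
    sims = np.array([
        np.corrcoef(target[observed], neighbours[observed, j])[0, 1]
        for j in range(neighbours.shape[1])
    ])
    best = int(np.argmax(sims))

    # Stage 2: regress the target on the best neighbour, then predict gaps.
    slope, intercept = np.polyfit(neighbours[observed, best], target[observed], 1)
    filled = target.copy()
    filled[~observed] = intercept + slope * neighbours[~observed, best]
    return filled, best
```

    For rainfall, a linear fit can return negative values for dry days, so a real application would likely need to clip or transform the estimates.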

  13. Multivariate analysis of cytokine profiles in pregnancy complications.

    PubMed

    Azizieh, Fawaz; Dingle, Kamaludin; Raghupathy, Raj; Johnson, Kjell; VanderPlas, Jacob; Ansari, Ali

    2018-03-01

    The immunoregulation to tolerate the semiallogeneic fetus during pregnancy includes a harmonious dynamic balance between anti- and pro-inflammatory cytokines. Several earlier studies reported significantly different levels and/or ratios of several cytokines in complicated pregnancy as compared to normal pregnancy. However, as cytokines operate in networks with potentially complex interactions, it is also interesting to compare groups with multi-cytokine data sets, with multivariate analysis. Such analysis will further examine how great the differences are, and which cytokines are more different than others. Various multivariate statistical tools, such as Cramer test, classification and regression trees, partial least squares regression figures, 2-dimensional Kolmogorov-Smirnov test, principal component analysis and gap statistic, were used to compare cytokine data of normal vs anomalous groups of different pregnancy complications. Multivariate analysis assisted in examining if the groups were different, how strongly they differed, in what ways they differed and further reported evidence for subgroups in 1 group (pregnancy-induced hypertension), possibly indicating multiple causes for the complication. This work contributes to a better understanding of cytokine interactions and may have important implications on targeting cytokine balance modulation or design of future medications or interventions that best direct management or prevention from an immunological approach. © 2018 The Authors. American Journal of Reproductive Immunology Published by John Wiley & Sons Ltd.

  14. Case study on prediction of remaining methane potential of landfilled municipal solid waste by statistical analysis of waste composition data.

    PubMed

    Sel, İlker; Çakmakcı, Mehmet; Özkaya, Bestamin; Suphi Altan, H

    2016-10-01

    The main objective of this study was to develop a statistical model for easier and faster Biochemical Methane Potential (BMP) prediction of landfilled municipal solid waste by analyzing waste composition of excavated samples from 12 sampling points and three waste depths representing different landfilling ages of closed and active sections of a sanitary landfill site located in İstanbul, Turkey. Results of Principal Component Analysis (PCA) were used as a decision support tool to evaluate and describe the waste composition variables. Four principal components were extracted, describing 76% of data set variance. The most effective components were determined as PCB, PO, T, D, W, FM, moisture and BMP for the data set. Multiple Linear Regression (MLR) models were built by original compositional data and transformed data to determine differences. It was observed that, although residual plots were better for the transformed data, the R² and adjusted R² values were not improved significantly. The best preliminary BMP prediction models consisted of D, W, T and FM waste fractions for both versions of regressions. Adjusted R² values of the raw and transformed models were determined as 0.69 and 0.57, respectively. Copyright © 2016 Elsevier Ltd. All rights reserved.

  15. Evaluation of nonpoint-source contamination, Wisconsin: water year 1999

    USGS Publications Warehouse

    Walker, John F.; Graczyk, D.J.; Corsi, Steven R.; Wierl, J.A.; Owens, D.W.

    2001-01-01

    For two of the eight rural streams (Rattlesnake and Kuenster Creeks) minimal BMP implementation has occurred, hence a comparison of pre-BMP data and data collected after BMP implementation began is not warranted. For two other rural streams (Brewery and Garfoot Creeks), BMP implementation is complete. For the four remaining rural streams (Bower, Otter, Eagle, and Joos Valley Creeks), the pre-BMP load data were compared to the transitional data to determine if significant reductions in the loads have occurred as a result of the BMP implementation to date. For all sites, the actual constituent loads for suspended solids and total phosphorus exhibit no statistically significant reductions after BMP installation. Multiple regressions were used to remove some of the natural variability in the data. Based on the residual analysis, for Otter Creek, there is a significant difference in the suspended-solids regression residuals between the pre-BMP and transitional periods, indicating a potential reduction as a result of the BMP implementation after accounting for natural variability. For Joos Valley Creek, the residuals for suspended solids and total phosphorus both show a significant reduction after accounting for natural variability. It is possible that the other sites will also show statistically significant reductions in suspended solids and total phosphorus if additional BMPs are implemented.
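
    The residual analysis in this abstract can be pictured with a small sketch: regress the load on variables that capture natural variability, then compare the residuals of the two periods. The abstract does not say which predictors or which significance test were used, so the pooled ordinary least-squares fit below and the name `period_residuals` are assumptions for illustration only, and the sketch stops at splitting the residuals by period.

```python
import numpy as np

def period_residuals(X, y, is_post):
    """Split regression residuals into pre-BMP and later-period groups.

    X       : (n_samples, n_vars) predictors describing natural variability
    y       : (n_samples,) constituent load (often log-transformed)
    is_post : (n_samples,) boolean, True for samples after BMP work began
    """
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    is_post = np.asarray(is_post, dtype=bool)

    # One pooled regression removes variability explained by the predictors.
    A = np.column_stack([np.ones(len(y)), X])
    coef, *_ = np.linalg.lstsq(A, y, rcond=None)
    resid = y - A @ coef

    # A shift between the two groups suggests a change not explained by X.
    return resid[~is_post], resid[is_post]
```

    A two-sample test on the returned groups would then indicate whether the shift is statistically significant; the choice of test is left open here.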

  16. Commentary on the statistical properties of noise and its implication on general linear models in functional near-infrared spectroscopy.

    PubMed

    Huppert, Theodore J

    2016-01-01

    Functional near-infrared spectroscopy (fNIRS) is a noninvasive neuroimaging technique that uses low levels of light to measure changes in cerebral blood oxygenation levels. In the majority of NIRS functional brain studies, analysis of this data is based on a statistical comparison of hemodynamic levels between a baseline and task or between multiple task conditions by means of a linear regression model: the so-called general linear model. Although these methods are similar to their implementation in other fields, particularly for functional magnetic resonance imaging, the specific application of these methods in fNIRS research differs in several key ways related to the sources of noise and artifacts unique to fNIRS. In this brief communication, we discuss the application of linear regression models in fNIRS and the modifications needed to generalize these models in order to deal with structured (colored) noise due to systemic physiology and noise heteroscedasticity due to motion artifacts. The objective of this work is to present an overview of these noise properties in the context of the linear model as it applies to fNIRS data. This work is aimed at explaining these mathematical issues to the general fNIRS experimental researcher but is not intended to be a complete mathematical treatment of these concepts.

  17. Estimation of the residual bromine concentration after disinfection of cooling water by statistical evaluation.

    PubMed

    Megalopoulos, Fivos A; Ochsenkuehn-Petropoulou, Maria T

    2015-01-01

    A statistical model based on multiple linear regression is developed to estimate the bromine residual that can be expected after the bromination of cooling water. Make-up water sampled from a power plant in the Greek territory was used for the creation of the various cooling water matrices under investigation. The amount of bromine fed to the circuit, as well as other important operational parameters such as concentration at the cooling tower, temperature, organic load and contact time are taken as the independent variables. It is found that the highest contribution to the model's predictive ability comes from cooling water's organic load concentration, followed by the amount of bromine fed to the circuit, the water's mean temperature, the duration of the bromination period and finally its conductivity. Comparison of the model results with the experimental data confirms its ability to predict residual bromine given specific bromination conditions.

  18. Complexities and potential pitfalls of clinical study design and data analysis in assisted reproduction.

    PubMed

    Patounakis, George; Hill, Micah J

    2018-06-01

    The purpose of the current review is to describe the common pitfalls in design and statistical analysis of reproductive medicine studies. It serves to guide both authors and reviewers toward reducing the incidence of spurious statistical results and erroneous conclusions. The large amount of data gathered in IVF cycles leads to problems with multiplicity, multicollinearity, and overfitting of regression models. Furthermore, the use of the word 'trend' to describe nonsignificant results has increased in recent years. Finally, methods to accurately account for female age in infertility research models are becoming more common and necessary. The pitfalls of study design and analysis reviewed provide a framework for authors and reviewers to approach clinical research in the field of reproductive medicine. By providing a more rigorous approach to study design and analysis, the literature in reproductive medicine will have more reliable conclusions that can stand the test of time.

  19. Aging, not menopause, is associated with higher prevalence of hyperuricemia among older women.

    PubMed

    Krishnan, Eswar; Bennett, Mihoko; Chen, Linjun

    2014-11-01

    This work aims to study the associations, if any, of hyperuricemia, gout, and menopause status in the US population. Using multiyear data from the National Health and Nutrition Examination Survey, we performed unmatched comparisons and one to three age-matched comparisons of women aged 20 to 70 years with and without hyperuricemia (serum urate ≥6 mg/dL). Analyses were performed using survey-weighted multiple logistic regression and conditional logistic regression, respectively. Overall, there were 1,477 women with hyperuricemia. Age and serum urate were significantly correlated. In unmatched analyses (n = 9,573 controls), postmenopausal women were older, were heavier, and had higher prevalence of renal impairment, hypertension, diabetes, and hyperlipidemia. In multivariable regression, after accounting for age, body mass index, glomerular filtration rate, and diuretic use, menopause was associated with hyperuricemia (odds ratio, 1.36; 95% CI, 1.05-1.76; P = 0.002). In corresponding multivariable regression using age-matched data (n = 4,431 controls), the odds ratio for menopause was 0.94 (95% CI, 0.83-1.06). Current use of hormone therapy was not associated with prevalent hyperuricemia in either the unmatched or the matched analyses. Age is a better statistical explanation for the higher prevalence of hyperuricemia among older women than menopause status.

  20. Use of Empirical Estimates of Shrinkage in Multiple Regression: A Caution.

    ERIC Educational Resources Information Center

    Kromrey, Jeffrey D.; Hines, Constance V.

    1995-01-01

    The accuracy of four empirical techniques to estimate shrinkage in multiple regression was studied through Monte Carlo simulation. None of the techniques provided unbiased estimates of the population squared multiple correlation coefficient, but the normalized jackknife and bootstrap techniques demonstrated marginally acceptable performance with…
