Regression: The Apple Does Not Fall Far From the Tree.
Vetter, Thomas R; Schober, Patrick
2018-05-15
Researchers and clinicians are frequently interested in either: (1) assessing whether there is a relationship or association between 2 or more variables and quantifying this association; or (2) determining whether 1 or more variables can predict another variable. The strength of such an association is mainly described by the correlation. However, regression analysis and regression models can be used not only to identify whether there is a significant relationship or association between variables but also to generate estimations of such a predictive relationship between variables. This basic statistical tutorial discusses the fundamental concepts and techniques related to the most common types of regression analysis and modeling, including simple linear regression, multiple regression, logistic regression, ordinal regression, and Poisson regression, as well as the common yet often underrecognized phenomenon of regression toward the mean. The various types of regression analysis are powerful statistical techniques, which when appropriately applied, can allow for the valid interpretation of complex, multifactorial data. Regression analysis and models can assess whether there is a relationship or association between 2 or more observed variables and estimate the strength of this association, as well as determine whether 1 or more variables can predict another variable. Regression is thus being applied more commonly in anesthesia, perioperative, critical care, and pain research. However, it is crucial to note that regression can identify plausible risk factors; it does not prove causation (a definitive cause and effect relationship). The results of a regression analysis instead identify independent (predictor) variable(s) associated with the dependent (outcome) variable. As with other statistical methods, applying regression requires that certain assumptions be met, which can be tested with specific diagnostics.
An improved multiple linear regression and data analysis computer program package
NASA Technical Reports Server (NTRS)
Sidik, S. M.
1972-01-01
NEWRAP, an improved version of a previous multiple linear regression program called RAPIER, CREDUC, and CRSPLT, allows for a complete regression analysis including cross plots of the independent and dependent variables, correlation coefficients, regression coefficients, analysis of variance tables, t-statistics and their probability levels, rejection of independent variables, plots of residuals against the independent and dependent variables, and a canonical reduction of quadratic response functions useful in optimum seeking experimentation. A major improvement over RAPIER is that all regression calculations are done in double precision arithmetic.
Regression Analysis with Dummy Variables: Use and Interpretation.
ERIC Educational Resources Information Center
Hinkle, Dennis E.; Oliver, J. Dale
1986-01-01
Multiple regression analysis (MRA) may be used when both continuous and categorical variables are included as independent research variables. The use of MRA with categorical variables involves dummy coding, that is, assigning zeros and ones to levels of categorical variables. Caution is urged in results interpretation. (Author/CH)
RAWS II: A MULTIPLE REGRESSION ANALYSIS PROGRAM,
This memorandum gives instructions for the use and operation of a revised version of RAWS, a multiple regression analysis program. The program...of preprocessed data, the directed retention of variable, listing of the matrix of the normal equations and its inverse, and the bypassing of the regression analysis to provide the input variable statistics only. (Author)
Functional Relationships and Regression Analysis.
ERIC Educational Resources Information Center
Preece, Peter F. W.
1978-01-01
Using a degenerate multivariate normal model for the distribution of organismic variables, the form of least-squares regression analysis required to estimate a linear functional relationship between variables is derived. It is suggested that the two conventional regression lines may be considered to describe functional, not merely statistical,…
NASA Astrophysics Data System (ADS)
Prahutama, Alan; Suparti; Wahyu Utami, Tiani
2018-03-01
Regression analysis is an analysis to model the relationship between response variables and predictor variables. The parametric approach to the regression model is very strict with the assumption, but nonparametric regression model isn’t need assumption of model. Time series data is the data of a variable that is observed based on a certain time, so if the time series data wanted to be modeled by regression, then we should determined the response and predictor variables first. Determination of the response variable in time series is variable in t-th (yt), while the predictor variable is a significant lag. In nonparametric regression modeling, one developing approach is to use the Fourier series approach. One of the advantages of nonparametric regression approach using Fourier series is able to overcome data having trigonometric distribution. In modeling using Fourier series needs parameter of K. To determine the number of K can be used Generalized Cross Validation method. In inflation modeling for the transportation sector, communication and financial services using Fourier series yields an optimal K of 120 parameters with R-square 99%. Whereas if it was modeled by multiple linear regression yield R-square 90%.
Factor analysis and multiple regression between topography and precipitation on Jeju Island, Korea
NASA Astrophysics Data System (ADS)
Um, Myoung-Jin; Yun, Hyeseon; Jeong, Chang-Sam; Heo, Jun-Haeng
2011-11-01
SummaryIn this study, new factors that influence precipitation were extracted from geographic variables using factor analysis, which allow for an accurate estimation of orographic precipitation. Correlation analysis was also used to examine the relationship between nine topographic variables from digital elevation models (DEMs) and the precipitation in Jeju Island. In addition, a spatial analysis was performed in order to verify the validity of the regression model. From the results of the correlation analysis, it was found that all of the topographic variables had a positive correlation with the precipitation. The relations between the variables also changed in accordance with a change in the precipitation duration. However, upon examining the correlation matrix, no significant relationship between the latitude and the aspect was found. According to the factor analysis, eight topographic variables (latitude being the exception) were found to have a direct influence on the precipitation. Three factors were then extracted from the eight topographic variables. By directly comparing the multiple regression model with the factors (model 1) to the multiple regression model with the topographic variables (model 3), it was found that model 1 did not violate the limits of statistical significance and multicollinearity. As such, model 1 was considered to be appropriate for estimating the precipitation when taking into account the topography. In the study of model 1, the multiple regression model using factor analysis was found to be the best method for estimating the orographic precipitation on Jeju Island.
Common pitfalls in statistical analysis: Linear regression analysis
Aggarwal, Rakesh; Ranganathan, Priya
2017-01-01
In a previous article in this series, we explained correlation analysis which describes the strength of relationship between two continuous variables. In this article, we deal with linear regression analysis which predicts the value of one continuous variable from another. We also discuss the assumptions and pitfalls associated with this analysis. PMID:28447022
Regression Analysis by Example. 5th Edition
ERIC Educational Resources Information Center
Chatterjee, Samprit; Hadi, Ali S.
2012-01-01
Regression analysis is a conceptually simple method for investigating relationships among variables. Carrying out a successful application of regression analysis, however, requires a balance of theoretical results, empirical rules, and subjective judgment. "Regression Analysis by Example, Fifth Edition" has been expanded and thoroughly…
NASA Astrophysics Data System (ADS)
Hassanzadeh, S.; Hosseinibalam, F.; Omidvari, M.
2008-04-01
Data of seven meteorological variables (relative humidity, wet temperature, dry temperature, maximum temperature, minimum temperature, ground temperature and sun radiation time) and ozone values have been used for statistical analysis. Meteorological variables and ozone values were analyzed using both multiple linear regression and principal component methods. Data for the period 1999-2004 are analyzed jointly using both methods. For all periods, temperature dependent variables were highly correlated, but were all negatively correlated with relative humidity. Multiple regression analysis was used to fit the meteorological variables using the meteorological variables as predictors. A variable selection method based on high loading of varimax rotated principal components was used to obtain subsets of the predictor variables to be included in the linear regression model of the meteorological variables. In 1999, 2001 and 2002 one of the meteorological variables was weakly influenced predominantly by the ozone concentrations. However, the model did not predict that the meteorological variables for the year 2000 were not influenced predominantly by the ozone concentrations that point to variation in sun radiation. This could be due to other factors that were not explicitly considered in this study.
Variable Selection in Logistic Regression.
1987-06-01
23 %. AUTIOR(.) S. CONTRACT OR GRANT NUMBE Rf.i %Z. D. Bai, P. R. Krishnaiah and . C. Zhao F49620-85- C-0008 " PERFORMING ORGANIZATION NAME AND AOORESS...d I7 IOK-TK- d 7 -I0 7’ VARIABLE SELECTION IN LOGISTIC REGRESSION Z. D. Bai, P. R. Krishnaiah and L. C. Zhao Center for Multivariate Analysis...University of Pittsburgh Center for Multivariate Analysis University of Pittsburgh Y !I VARIABLE SELECTION IN LOGISTIC REGRESSION Z- 0. Bai, P. R. Krishnaiah
Moderation analysis using a two-level regression model.
Yuan, Ke-Hai; Cheng, Ying; Maxwell, Scott
2014-10-01
Moderation analysis is widely used in social and behavioral research. The most commonly used model for moderation analysis is moderated multiple regression (MMR) in which the explanatory variables of the regression model include product terms, and the model is typically estimated by least squares (LS). This paper argues for a two-level regression model in which the regression coefficients of a criterion variable on predictors are further regressed on moderator variables. An algorithm for estimating the parameters of the two-level model by normal-distribution-based maximum likelihood (NML) is developed. Formulas for the standard errors (SEs) of the parameter estimates are provided and studied. Results indicate that, when heteroscedasticity exists, NML with the two-level model gives more efficient and more accurate parameter estimates than the LS analysis of the MMR model. When error variances are homoscedastic, NML with the two-level model leads to essentially the same results as LS with the MMR model. Most importantly, the two-level regression model permits estimating the percentage of variance of each regression coefficient that is due to moderator variables. When applied to data from General Social Surveys 1991, NML with the two-level model identified a significant moderation effect of race on the regression of job prestige on years of education while LS with the MMR model did not. An R package is also developed and documented to facilitate the application of the two-level model.
Correlative and multivariate analysis of increased radon concentration in underground laboratory.
Maletić, Dimitrije M; Udovičić, Vladimir I; Banjanac, Radomir M; Joković, Dejan R; Dragić, Aleksandar L; Veselinović, Nikola B; Filipović, Jelena
2014-11-01
The results of analysis using correlative and multivariate methods, as developed for data analysis in high-energy physics and implemented in the Toolkit for Multivariate Analysis software package, of the relations of the variation of increased radon concentration with climate variables in shallow underground laboratory is presented. Multivariate regression analysis identified a number of multivariate methods which can give a good evaluation of increased radon concentrations based on climate variables. The use of the multivariate regression methods will enable the investigation of the relations of specific climate variable with increased radon concentrations by analysis of regression methods resulting in 'mapped' underlying functional behaviour of radon concentrations depending on a wide spectrum of climate variables. © The Author 2014. Published by Oxford University Press. All rights reserved. For Permissions, please email: journals.permissions@oup.com.
Advanced statistics: linear regression, part I: simple linear regression.
Marill, Keith A
2004-01-01
Simple linear regression is a mathematical technique used to model the relationship between a single independent predictor variable and a single dependent outcome variable. In this, the first of a two-part series exploring concepts in linear regression analysis, the four fundamental assumptions and the mechanics of simple linear regression are reviewed. The most common technique used to derive the regression line, the method of least squares, is described. The reader will be acquainted with other important concepts in simple linear regression, including: variable transformations, dummy variables, relationship to inference testing, and leverage. Simplified clinical examples with small datasets and graphic models are used to illustrate the points. This will provide a foundation for the second article in this series: a discussion of multiple linear regression, in which there are multiple predictor variables.
Wang, Wen-Cheng; Cho, Wen-Chien; Chen, Yin-Jen
2014-01-01
It is estimated that mainland Chinese tourists travelling to Taiwan can bring annual revenues of 400 billion NTD to the Taiwan economy. Thus, how the Taiwanese Government formulates relevant measures to satisfy both sides is the focus of most concern. Taiwan must improve the facilities and service quality of its tourism industry so as to attract more mainland tourists. This paper conducted a questionnaire survey of mainland tourists and used grey relational analysis in grey mathematics to analyze the satisfaction performance of all satisfaction question items. The first eight satisfaction items were used as independent variables, and the overall satisfaction performance was used as a dependent variable for quantile regression model analysis to discuss the relationship between the dependent variable under different quantiles and independent variables. Finally, this study further discussed the predictive accuracy of the least mean regression model and each quantile regression model, as a reference for research personnel. The analysis results showed that other variables could also affect the overall satisfaction performance of mainland tourists, in addition to occupation and age. The overall predictive accuracy of quantile regression model Q0.25 was higher than that of the other three models. PMID:24574916
Wang, Wen-Cheng; Cho, Wen-Chien; Chen, Yin-Jen
2014-01-01
It is estimated that mainland Chinese tourists travelling to Taiwan can bring annual revenues of 400 billion NTD to the Taiwan economy. Thus, how the Taiwanese Government formulates relevant measures to satisfy both sides is the focus of most concern. Taiwan must improve the facilities and service quality of its tourism industry so as to attract more mainland tourists. This paper conducted a questionnaire survey of mainland tourists and used grey relational analysis in grey mathematics to analyze the satisfaction performance of all satisfaction question items. The first eight satisfaction items were used as independent variables, and the overall satisfaction performance was used as a dependent variable for quantile regression model analysis to discuss the relationship between the dependent variable under different quantiles and independent variables. Finally, this study further discussed the predictive accuracy of the least mean regression model and each quantile regression model, as a reference for research personnel. The analysis results showed that other variables could also affect the overall satisfaction performance of mainland tourists, in addition to occupation and age. The overall predictive accuracy of quantile regression model Q0.25 was higher than that of the other three models.
Bayesian Adaptive Lasso for Ordinal Regression with Latent Variables
ERIC Educational Resources Information Center
Feng, Xiang-Nan; Wu, Hao-Tian; Song, Xin-Yuan
2017-01-01
We consider an ordinal regression model with latent variables to investigate the effects of observable and latent explanatory variables on the ordinal responses of interest. Each latent variable is characterized by correlated observed variables through a confirmatory factor analysis model. We develop a Bayesian adaptive lasso procedure to conduct…
L.R. Grosenbaugh
1967-01-01
Describes an expansible computerized system that provides data needed in regression or covariance analysis of as many as 50 variables, 8 of which may be dependent. Alternatively, it can screen variously generated combinations of independent variables to find the regression with the smallest mean-squared-residual, which will be fitted if desired. The user can easily...
NASA Astrophysics Data System (ADS)
Denli, H. H.; Koc, Z.
2015-12-01
Estimation of real properties depending on standards is difficult to apply in time and location. Regression analysis construct mathematical models which describe or explain relationships that may exist between variables. The problem of identifying price differences of properties to obtain a price index can be converted into a regression problem, and standard techniques of regression analysis can be used to estimate the index. Considering regression analysis for real estate valuation, which are presented in real marketing process with its current characteristics and quantifiers, the method will help us to find the effective factors or variables in the formation of the value. In this study, prices of housing for sale in Zeytinburnu, a district in Istanbul, are associated with its characteristics to find a price index, based on information received from a real estate web page. The associated variables used for the analysis are age, size in m2, number of floors having the house, floor number of the estate and number of rooms. The price of the estate represents the dependent variable, whereas the rest are independent variables. Prices from 60 real estates have been used for the analysis. Same price valued locations have been found and plotted on the map and equivalence curves have been drawn identifying the same valued zones as lines.
An improved strategy for regression of biophysical variables and Landsat ETM+ data.
Warren B. Cohen; Thomas K. Maiersperger; Stith T. Gower; David P. Turner
2003-01-01
Empirical models are important tools for relating field-measured biophysical variables to remote sensing data. Regression analysis has been a popular empirical method of linking these two types of data to provide continuous estimates for variables such as biomass, percent woody canopy cover, and leaf area index (LAI). Traditional methods of regression are not...
Tu, Yu-Kang; Krämer, Nicole; Lee, Wen-Chung
2012-07-01
In the analysis of trends in health outcomes, an ongoing issue is how to separate and estimate the effects of age, period, and cohort. As these 3 variables are perfectly collinear by definition, regression coefficients in a general linear model are not unique. In this tutorial, we review why identification is a problem, and how this problem may be tackled using partial least squares and principal components regression analyses. Both methods produce regression coefficients that fulfill the same collinearity constraint as the variables age, period, and cohort. We show that, because the constraint imposed by partial least squares and principal components regression is inherent in the mathematical relation among the 3 variables, this leads to more interpretable results. We use one dataset from a Taiwanese health-screening program to illustrate how to use partial least squares regression to analyze the trends in body heights with 3 continuous variables for age, period, and cohort. We then use another dataset of hepatocellular carcinoma mortality rates for Taiwanese men to illustrate how to use partial least squares regression to analyze tables with aggregated data. We use the second dataset to show the relation between the intrinsic estimator, a recently proposed method for the age-period-cohort analysis, and partial least squares regression. We also show that the inclusion of all indicator variables provides a more consistent approach. R code for our analyses is provided in the eAppendix.
USAF (United States Air Force) Stability and Control DATCOM (Data Compendium)
1978-04-01
regression analysis involves the study of a group of variables to determine their effect on a given parameter. Because of the empirical nature of this...regression analysis of mathematical statistics. In general, a regression analysis involves the study of a group of variables to determine their effect on a...Excperiment, OSR TN 58-114, MIT Fluid Dynamics Research Group Rapt. 57-5, 1957. (U) 90. Kennet, H., and Ashley, H.: Review of Unsteady Aerodynamic Studies in
Kanada, Yoshikiyo; Sakurai, Hiroaki; Sugiura, Yoshito; Arai, Tomoaki; Koyama, Soichiro; Tanabe, Shigeo
2017-11-01
[Purpose] To create a regression formula in order to estimate 1RM for knee extensors, based on the maximal isometric muscle strength measured using a hand-held dynamometer and data regarding the body composition. [Subjects and Methods] Measurement was performed in 21 healthy males in their twenties to thirties. Single regression analysis was performed, with measurement values representing 1RM and the maximal isometric muscle strength as dependent and independent variables, respectively. Furthermore, multiple regression analysis was performed, with data regarding the body composition incorporated as another independent variable, in addition to the maximal isometric muscle strength. [Results] Through single regression analysis with the maximal isometric muscle strength as an independent variable, the following regression formula was created: 1RM (kg)=0.714 + 0.783 × maximal isometric muscle strength (kgf). On multiple regression analysis, only the total muscle mass was extracted. [Conclusion] A highly accurate regression formula to estimate 1RM was created based on both the maximal isometric muscle strength and body composition. Using a hand-held dynamometer and body composition analyzer, it was possible to measure these items in a short time, and obtain clinically useful results.
Variable Selection for Regression Models of Percentile Flows
NASA Astrophysics Data System (ADS)
Fouad, G.
2017-12-01
Percentile flows describe the flow magnitude equaled or exceeded for a given percent of time, and are widely used in water resource management. However, these statistics are normally unavailable since most basins are ungauged. Percentile flows of ungauged basins are often predicted using regression models based on readily observable basin characteristics, such as mean elevation. The number of these independent variables is too large to evaluate all possible models. A subset of models is typically evaluated using automatic procedures, like stepwise regression. This ignores a large variety of methods from the field of feature (variable) selection and physical understanding of percentile flows. A study of 918 basins in the United States was conducted to compare an automatic regression procedure to the following variable selection methods: (1) principal component analysis, (2) correlation analysis, (3) random forests, (4) genetic programming, (5) Bayesian networks, and (6) physical understanding. The automatic regression procedure only performed better than principal component analysis. Poor performance of the regression procedure was due to a commonly used filter for multicollinearity, which rejected the strongest models because they had cross-correlated independent variables. Multicollinearity did not decrease model performance in validation because of a representative set of calibration basins. Variable selection methods based strictly on predictive power (numbers 2-5 from above) performed similarly, likely indicating a limit to the predictive power of the variables. Similar performance was also reached using variables selected based on physical understanding, a finding that substantiates recent calls to emphasize physical understanding in modeling for predictions in ungauged basins. The strongest variables highlighted the importance of geology and land cover, whereas widely used topographic variables were the weakest predictors. Variables suffered from a high degree of multicollinearity, possibly illustrating the co-evolution of climatic and physiographic conditions. Given the ineffectiveness of many variables used here, future work should develop new variables that target specific processes associated with percentile flows.
A Simulation Investigation of Principal Component Regression.
ERIC Educational Resources Information Center
Allen, David E.
Regression analysis is one of the more common analytic tools used by researchers. However, multicollinearity between the predictor variables can cause problems in using the results of regression analyses. Problems associated with multicollinearity include entanglement of relative influences of variables due to reduced precision of estimation,…
ERIC Educational Resources Information Center
Jaccard, James; And Others
1990-01-01
Issues in the detection and interpretation of interaction effects between quantitative variables in multiple regression analysis are discussed. Recent discussions associated with problems of multicollinearity are reviewed in the context of the conditional nature of multiple regression with product terms. (TJH)
NASA Astrophysics Data System (ADS)
Ceppi, C.; Mancini, F.; Ritrovato, G.
2009-04-01
This study aim at the landslide susceptibility mapping within an area of the Daunia (Apulian Apennines, Italy) by a multivariate statistical method and data manipulation in a Geographical Information System (GIS) environment. Among the variety of existing statistical data analysis techniques, the logistic regression was chosen to produce a susceptibility map all over an area where small settlements are historically threatened by landslide phenomena. By logistic regression a best fitting between the presence or absence of landslide (dependent variable) and the set of independent variables is performed on the basis of a maximum likelihood criterion, bringing to the estimation of regression coefficients. The reliability of such analysis is therefore due to the ability to quantify the proneness to landslide occurrences by the probability level produced by the analysis. The inventory of dependent and independent variables were managed in a GIS, where geometric properties and attributes have been translated into raster cells in order to proceed with the logistic regression by means of SPSS (Statistical Package for the Social Sciences) package. A landslide inventory was used to produce the bivariate dependent variable whereas the independent set of variable concerned with slope, aspect, elevation, curvature, drained area, lithology and land use after their reductions to dummy variables. The effect of independent parameters on landslide occurrence was assessed by the corresponding coefficient in the logistic regression function, highlighting a major role played by the land use variable in determining occurrence and distribution of phenomena. Once the outcomes of the logistic regression are determined, data are re-introduced in the GIS to produce a map reporting the proneness to landslide as predicted level of probability. As validation of results and regression model a cell-by-cell comparison between the susceptibility map and the initial inventory of landslide events was performed and an agreement at 75% level achieved.
Understanding logistic regression analysis.
Sperandei, Sandro
2014-01-01
Logistic regression is used to obtain odds ratio in the presence of more than one explanatory variable. The procedure is quite similar to multiple linear regression, with the exception that the response variable is binomial. The result is the impact of each variable on the odds ratio of the observed event of interest. The main advantage is to avoid confounding effects by analyzing the association of all variables together. In this article, we explain the logistic regression procedure using examples to make it as simple as possible. After definition of the technique, the basic interpretation of the results is highlighted and then some special issues are discussed.
The process and utility of classification and regression tree methodology in nursing research
Kuhn, Lisa; Page, Karen; Ward, John; Worrall-Carter, Linda
2014-01-01
Aim This paper presents a discussion of classification and regression tree analysis and its utility in nursing research. Background Classification and regression tree analysis is an exploratory research method used to illustrate associations between variables not suited to traditional regression analysis. Complex interactions are demonstrated between covariates and variables of interest in inverted tree diagrams. Design Discussion paper. Data sources English language literature was sourced from eBooks, Medline Complete and CINAHL Plus databases, Google and Google Scholar, hard copy research texts and retrieved reference lists for terms including classification and regression tree* and derivatives and recursive partitioning from 1984–2013. Discussion Classification and regression tree analysis is an important method used to identify previously unknown patterns amongst data. Whilst there are several reasons to embrace this method as a means of exploratory quantitative research, issues regarding quality of data as well as the usefulness and validity of the findings should be considered. Implications for Nursing Research Classification and regression tree analysis is a valuable tool to guide nurses to reduce gaps in the application of evidence to practice. With the ever-expanding availability of data, it is important that nurses understand the utility and limitations of the research method. Conclusion Classification and regression tree analysis is an easily interpreted method for modelling interactions between health-related variables that would otherwise remain obscured. Knowledge is presented graphically, providing insightful understanding of complex and hierarchical relationships in an accessible and useful way to nursing and other health professions. PMID:24237048
The process and utility of classification and regression tree methodology in nursing research.
Kuhn, Lisa; Page, Karen; Ward, John; Worrall-Carter, Linda
2014-06-01
This paper presents a discussion of classification and regression tree analysis and its utility in nursing research. Classification and regression tree analysis is an exploratory research method used to illustrate associations between variables not suited to traditional regression analysis. Complex interactions are demonstrated between covariates and variables of interest in inverted tree diagrams. Discussion paper. English language literature was sourced from eBooks, Medline Complete and CINAHL Plus databases, Google and Google Scholar, hard copy research texts and retrieved reference lists for terms including classification and regression tree* and derivatives and recursive partitioning from 1984-2013. Classification and regression tree analysis is an important method used to identify previously unknown patterns amongst data. Whilst there are several reasons to embrace this method as a means of exploratory quantitative research, issues regarding quality of data as well as the usefulness and validity of the findings should be considered. Classification and regression tree analysis is a valuable tool to guide nurses to reduce gaps in the application of evidence to practice. With the ever-expanding availability of data, it is important that nurses understand the utility and limitations of the research method. Classification and regression tree analysis is an easily interpreted method for modelling interactions between health-related variables that would otherwise remain obscured. Knowledge is presented graphically, providing insightful understanding of complex and hierarchical relationships in an accessible and useful way to nursing and other health professions. © 2013 The Authors. Journal of Advanced Nursing Published by John Wiley & Sons Ltd.
Conjoint Analysis: A Study of the Effects of Using Person Variables.
ERIC Educational Resources Information Center
Fraas, John W.; Newman, Isadore
Three statistical techniques--conjoint analysis, a multiple linear regression model, and a multiple linear regression model with a surrogate person variable--were used to estimate the relative importance of five university attributes for students in the process of selecting a college. The five attributes include: availability and variety of…
Biostatistics Series Module 6: Correlation and Linear Regression.
Hazra, Avijit; Gogtay, Nithya
2016-01-01
Correlation and linear regression are the most commonly used techniques for quantifying the association between two numeric variables. Correlation quantifies the strength of the linear relationship between paired variables, expressing this as a correlation coefficient. If both variables x and y are normally distributed, we calculate Pearson's correlation coefficient ( r ). If normality assumption is not met for one or both variables in a correlation analysis, a rank correlation coefficient, such as Spearman's rho (ρ) may be calculated. A hypothesis test of correlation tests whether the linear relationship between the two variables holds in the underlying population, in which case it returns a P < 0.05. A 95% confidence interval of the correlation coefficient can also be calculated for an idea of the correlation in the population. The value r 2 denotes the proportion of the variability of the dependent variable y that can be attributed to its linear relation with the independent variable x and is called the coefficient of determination. Linear regression is a technique that attempts to link two correlated variables x and y in the form of a mathematical equation ( y = a + bx ), such that given the value of one variable the other may be predicted. In general, the method of least squares is applied to obtain the equation of the regression line. Correlation and linear regression analysis are based on certain assumptions pertaining to the data sets. If these assumptions are not met, misleading conclusions may be drawn. The first assumption is that of linear relationship between the two variables. A scatter plot is essential before embarking on any correlation-regression analysis to show that this is indeed the case. Outliers or clustering within data sets can distort the correlation coefficient value. Finally, it is vital to remember that though strong correlation can be a pointer toward causation, the two are not synonymous.
Biostatistics Series Module 6: Correlation and Linear Regression
Hazra, Avijit; Gogtay, Nithya
2016-01-01
Correlation and linear regression are the most commonly used techniques for quantifying the association between two numeric variables. Correlation quantifies the strength of the linear relationship between paired variables, expressing this as a correlation coefficient. If both variables x and y are normally distributed, we calculate Pearson's correlation coefficient (r). If normality assumption is not met for one or both variables in a correlation analysis, a rank correlation coefficient, such as Spearman's rho (ρ) may be calculated. A hypothesis test of correlation tests whether the linear relationship between the two variables holds in the underlying population, in which case it returns a P < 0.05. A 95% confidence interval of the correlation coefficient can also be calculated for an idea of the correlation in the population. The value r2 denotes the proportion of the variability of the dependent variable y that can be attributed to its linear relation with the independent variable x and is called the coefficient of determination. Linear regression is a technique that attempts to link two correlated variables x and y in the form of a mathematical equation (y = a + bx), such that given the value of one variable the other may be predicted. In general, the method of least squares is applied to obtain the equation of the regression line. Correlation and linear regression analysis are based on certain assumptions pertaining to the data sets. If these assumptions are not met, misleading conclusions may be drawn. The first assumption is that of linear relationship between the two variables. A scatter plot is essential before embarking on any correlation-regression analysis to show that this is indeed the case. Outliers or clustering within data sets can distort the correlation coefficient value. Finally, it is vital to remember that though strong correlation can be a pointer toward causation, the two are not synonymous. PMID:27904175
AN IMPROVED STRATEGY FOR REGRESSION OF BIOPHYSICAL VARIABLES AND LANDSAT ETM+ DATA. (R828309)
Empirical models are important tools for relating field-measured biophysical variables to remote sensing data. Regression analysis has been a popular empirical method of linking these two types of data to provide continuous estimates for variables such as biomass, percent wood...
Bootstrap investigation of the stability of a Cox regression model.
Altman, D G; Andersen, P K
1989-07-01
We describe a bootstrap investigation of the stability of a Cox proportional hazards regression model resulting from the analysis of a clinical trial of azathioprine versus placebo in patients with primary biliary cirrhosis. We have considered stability to refer both to the choice of variables included in the model and, more importantly, to the predictive ability of the model. In stepwise Cox regression analyses of 100 bootstrap samples using 17 candidate variables, the most frequently selected variables were those selected in the original analysis, and no other important variable was identified. Thus there was no reason to doubt the model obtained in the original analysis. For each patient in the trial, bootstrap confidence intervals were constructed for the estimated probability of surviving two years. It is shown graphically that these intervals are markedly wider than those obtained from the original model.
Ryberg, Karen R.
2007-01-01
This report presents the results of a study by the U.S. Geological Survey, done in cooperation with the North Dakota State Water Commission, to estimate water-quality constituent concentrations at seven sites on the Sheyenne River, N. Dak. Regression analysis of water-quality data collected in 1980-2006 was used to estimate concentrations for hardness, dissolved solids, calcium, magnesium, sodium, and sulfate. The explanatory variables examined for the regression relations were continuously monitored streamflow, specific conductance, and water temperature. For the conditions observed in 1980-2006, streamflow was a significant explanatory variable for some constituents. Specific conductance was a significant explanatory variable for all of the constituents, and water temperature was not a statistically significant explanatory variable for any of the constituents in this study. The regression relations were evaluated using common measures of variability, including R2, the proportion of variability in the estimated constituent concentration explained by the explanatory variables and regression equation. R2 values ranged from 0.784 for calcium to 0.997 for dissolved solids. The regression relations also were evaluated by calculating the median relative percentage difference (RPD) between measured constituent concentration and the constituent concentration estimated by the regression equations. Median RPDs ranged from 1.7 for dissolved solids to 11.5 for sulfate. The regression relations also may be used to estimate daily constituent loads. The relations should be monitored for change over time, especially at sites 2 and 3 which have a short period of record. In addition, caution should be used when the Sheyenne River is affected by ice or when upstream sites are affected by isolated storm runoff. Almost all of the outliers and highly influential samples removed from the analysis were made during periods when the Sheyenne River might be affected by ice.
An Effect Size for Regression Predictors in Meta-Analysis
ERIC Educational Resources Information Center
Aloe, Ariel M.; Becker, Betsy Jane
2012-01-01
A new effect size representing the predictive power of an independent variable from a multiple regression model is presented. The index, denoted as r[subscript sp], is the semipartial correlation of the predictor with the outcome of interest. This effect size can be computed when multiple predictor variables are included in the regression model…
Principal component regression analysis with SPSS.
Liu, R X; Kuang, J; Gong, Q; Hou, X L
2003-06-01
The paper introduces all indices of multicollinearity diagnoses, the basic principle of principal component regression and determination of 'best' equation method. The paper uses an example to describe how to do principal component regression analysis with SPSS 10.0: including all calculating processes of the principal component regression and all operations of linear regression, factor analysis, descriptives, compute variable and bivariate correlations procedures in SPSS 10.0. The principal component regression analysis can be used to overcome disturbance of the multicollinearity. The simplified, speeded up and accurate statistical effect is reached through the principal component regression analysis with SPSS.
Troutman, Brent M.
1982-01-01
Errors in runoff prediction caused by input data errors are analyzed by treating precipitation-runoff models as regression (conditional expectation) models. Independent variables of the regression consist of precipitation and other input measurements; the dependent variable is runoff. In models using erroneous input data, prediction errors are inflated and estimates of expected storm runoff for given observed input variables are biased. This bias in expected runoff estimation results in biased parameter estimates if these parameter estimates are obtained by a least squares fit of predicted to observed runoff values. The problems of error inflation and bias are examined in detail for a simple linear regression of runoff on rainfall and for a nonlinear U.S. Geological Survey precipitation-runoff model. Some implications for flood frequency analysis are considered. A case study using a set of data from Turtle Creek near Dallas, Texas illustrates the problems of model input errors.
Brown, C. Erwin
1993-01-01
Correlation analysis in conjunction with principal-component and multiple-regression analyses were applied to laboratory chemical and petrographic data to assess the usefulness of these techniques in evaluating selected physical and hydraulic properties of carbonate-rock aquifers in central Pennsylvania. Correlation and principal-component analyses were used to establish relations and associations among variables, to determine dimensions of property variation of samples, and to filter the variables containing similar information. Principal-component and correlation analyses showed that porosity is related to other measured variables and that permeability is most related to porosity and grain size. Four principal components are found to be significant in explaining the variance of data. Stepwise multiple-regression analysis was used to see how well the measured variables could predict porosity and (or) permeability for this suite of rocks. The variation in permeability and porosity is not totally predicted by the other variables, but the regression is significant at the 5% significance level. ?? 1993.
Iterative Strain-Gage Balance Calibration Data Analysis for Extended Independent Variable Sets
NASA Technical Reports Server (NTRS)
Ulbrich, Norbert Manfred
2011-01-01
A new method was developed that makes it possible to use an extended set of independent calibration variables for an iterative analysis of wind tunnel strain gage balance calibration data. The new method permits the application of the iterative analysis method whenever the total number of balance loads and other independent calibration variables is greater than the total number of measured strain gage outputs. Iteration equations used by the iterative analysis method have the limitation that the number of independent and dependent variables must match. The new method circumvents this limitation. It simply adds a missing dependent variable to the original data set by using an additional independent variable also as an additional dependent variable. Then, the desired solution of the regression analysis problem can be obtained that fits each gage output as a function of both the original and additional independent calibration variables. The final regression coefficients can be converted to data reduction matrix coefficients because the missing dependent variables were added to the data set without changing the regression analysis result for each gage output. Therefore, the new method still supports the application of the two load iteration equation choices that the iterative method traditionally uses for the prediction of balance loads during a wind tunnel test. An example is discussed in the paper that illustrates the application of the new method to a realistic simulation of temperature dependent calibration data set of a six component balance.
Introduction to the use of regression models in epidemiology.
Bender, Ralf
2009-01-01
Regression modeling is one of the most important statistical techniques used in analytical epidemiology. By means of regression models the effect of one or several explanatory variables (e.g., exposures, subject characteristics, risk factors) on a response variable such as mortality or cancer can be investigated. From multiple regression models, adjusted effect estimates can be obtained that take the effect of potential confounders into account. Regression methods can be applied in all epidemiologic study designs so that they represent a universal tool for data analysis in epidemiology. Different kinds of regression models have been developed in dependence on the measurement scale of the response variable and the study design. The most important methods are linear regression for continuous outcomes, logistic regression for binary outcomes, Cox regression for time-to-event data, and Poisson regression for frequencies and rates. This chapter provides a nontechnical introduction to these regression models with illustrating examples from cancer research.
The use of cognitive ability measures as explanatory variables in regression analysis.
Junker, Brian; Schofield, Lynne Steuerle; Taylor, Lowell J
2012-12-01
Cognitive ability measures are often taken as explanatory variables in regression analysis, e.g., as a factor affecting a market outcome such as an individual's wage, or a decision such as an individual's education acquisition. Cognitive ability is a latent construct; its true value is unobserved. Nonetheless, researchers often assume that a test score , constructed via standard psychometric practice from individuals' responses to test items, can be safely used in regression analysis. We examine problems that can arise, and suggest that an alternative approach, a "mixed effects structural equations" (MESE) model, may be more appropriate in many circumstances.
Gender effects in gaming research: a case for regression residuals?
Pfister, Roland
2011-10-01
Numerous recent studies have examined the impact of video gaming on various dependent variables, including the players' affective reactions, positive as well as detrimental cognitive effects, and real-world aggression. These target variables are typically analyzed as a function of game characteristics and player attributes-especially gender. However, findings on the uneven distribution of gaming experience between males and females, on the one hand, and the effect of gaming experience on several target variables, on the other hand, point at a possible confound when gaming experiments are analyzed with a standard analysis of variance. This study uses simulated data to exemplify analysis of regression residuals as a potentially beneficial data analysis strategy for such datasets. As the actual impact of gaming experience on each of the various dependent variables differs, the ultimate benefits of analysis of regression residuals entirely depend on the research question, but it offers a powerful statistical approach to video game research whenever gaming experience is a confounding factor.
Water quality parameter measurement using spectral signatures
NASA Technical Reports Server (NTRS)
White, P. E.
1973-01-01
Regression analysis is applied to the problem of measuring water quality parameters from remote sensing spectral signature data. The equations necessary to perform regression analysis are presented and methods of testing the strength and reliability of a regression are described. An efficient algorithm for selecting an optimal subset of the independent variables available for a regression is also presented.
Advanced Statistics for Exotic Animal Practitioners.
Hodsoll, John; Hellier, Jennifer M; Ryan, Elizabeth G
2017-09-01
Correlation and regression assess the association between 2 or more variables. This article reviews the core knowledge needed to understand these analyses, moving from visual analysis in scatter plots through correlation, simple and multiple linear regression, and logistic regression. Correlation estimates the strength and direction of a relationship between 2 variables. Regression can be considered more general and quantifies the numerical relationships between an outcome and 1 or multiple variables in terms of a best-fit line, allowing predictions to be made. Each technique is discussed with examples and the statistical assumptions underlying their correct application. Copyright © 2017 Elsevier Inc. All rights reserved.
General Nature of Multicollinearity in Multiple Regression Analysis.
ERIC Educational Resources Information Center
Liu, Richard
1981-01-01
Discusses multiple regression, a very popular statistical technique in the field of education. One of the basic assumptions in regression analysis requires that independent variables in the equation should not be highly correlated. The problem of multicollinearity and some of the solutions to it are discussed. (Author)
Logistic Regression: Concept and Application
ERIC Educational Resources Information Center
Cokluk, Omay
2010-01-01
The main focus of logistic regression analysis is classification of individuals in different groups. The aim of the present study is to explain basic concepts and processes of binary logistic regression analysis intended to determine the combination of independent variables which best explain the membership in certain groups called dichotomous…
Quantile regression applied to spectral distance decay
Rocchini, D.; Cade, B.S.
2008-01-01
Remotely sensed imagery has long been recognized as a powerful support for characterizing and estimating biodiversity. Spectral distance among sites has proven to be a powerful approach for detecting species composition variability. Regression analysis of species similarity versus spectral distance allows us to quantitatively estimate the amount of turnover in species composition with respect to spectral and ecological variability. In classical regression analysis, the residual sum of squares is minimized for the mean of the dependent variable distribution. However, many ecological data sets are characterized by a high number of zeroes that add noise to the regression model. Quantile regressions can be used to evaluate trend in the upper quantiles rather than a mean trend across the whole distribution of the dependent variable. In this letter, we used ordinary least squares (OLS) and quantile regressions to estimate the decay of species similarity versus spectral distance. The achieved decay rates were statistically nonzero (p < 0.01), considering both OLS and quantile regressions. Nonetheless, the OLS regression estimate of the mean decay rate was only half the decay rate indicated by the upper quantiles. Moreover, the intercept value, representing the similarity reached when the spectral distance approaches zero, was very low compared with the intercepts of the upper quantiles, which detected high species similarity when habitats are more similar. In this letter, we demonstrated the power of using quantile regressions applied to spectral distance decay to reveal species diversity patterns otherwise lost or underestimated by OLS regression. ?? 2008 IEEE.
Spectral distance decay: Assessing species beta-diversity by quantile regression
Rocchinl, D.; Nagendra, H.; Ghate, R.; Cade, B.S.
2009-01-01
Remotely sensed data represents key information for characterizing and estimating biodiversity. Spectral distance among sites has proven to be a powerful approach for detecting species composition variability. Regression analysis of species similarity versus spectral distance may allow us to quantitatively estimate how beta-diversity in species changes with respect to spectral and ecological variability. In classical regression analysis, the residual sum of squares is minimized for the mean of the dependent variable distribution. However, many ecological datasets are characterized by a high number of zeroes that can add noise to the regression model. Quantile regression can be used to evaluate trend in the upper quantiles rather than a mean trend across the whole distribution of the dependent variable. In this paper, we used ordinary least square (ols) and quantile regression to estimate the decay of species similarity versus spectral distance. The achieved decay rates were statistically nonzero (p < 0.05) considering both ols and quantile regression. Nonetheless, ols regression estimate of mean decay rate was only half the decay rate indicated by the upper quantiles. Moreover, the intercept value, representing the similarity reached when spectral distance approaches zero, was very low compared with the intercepts of upper quantiles, which detected high species similarity when habitats are more similar. In this paper we demonstrated the power of using quantile regressions applied to spectral distance decay in order to reveal species diversity patterns otherwise lost or underestimated by ordinary least square regression. ?? 2009 American Society for Photogrammetry and Remote Sensing.
The use of cognitive ability measures as explanatory variables in regression analysis
Junker, Brian; Schofield, Lynne Steuerle; Taylor, Lowell J
2015-01-01
Cognitive ability measures are often taken as explanatory variables in regression analysis, e.g., as a factor affecting a market outcome such as an individual’s wage, or a decision such as an individual’s education acquisition. Cognitive ability is a latent construct; its true value is unobserved. Nonetheless, researchers often assume that a test score, constructed via standard psychometric practice from individuals’ responses to test items, can be safely used in regression analysis. We examine problems that can arise, and suggest that an alternative approach, a “mixed effects structural equations” (MESE) model, may be more appropriate in many circumstances. PMID:26998417
Regression Analysis of Stage Variability for West-Central Florida Lakes
Sacks, Laura A.; Ellison, Donald L.; Swancar, Amy
2008-01-01
The variability in a lake's stage depends upon many factors, including surface-water flows, meteorological conditions, and hydrogeologic characteristics near the lake. An understanding of the factors controlling lake-stage variability for a population of lakes may be helpful to water managers who set regulatory levels for lakes. The goal of this study is to determine whether lake-stage variability can be predicted using multiple linear regression and readily available lake and basin characteristics defined for each lake. Regressions were evaluated for a recent 10-year period (1996-2005) and for a historical 10-year period (1954-63). Ground-water pumping is considered to have affected stage at many of the 98 lakes included in the recent period analysis, and not to have affected stage at the 20 lakes included in the historical period analysis. For the recent period, regression models had coefficients of determination (R2) values ranging from 0.60 to 0.74, and up to five explanatory variables. Standard errors ranged from 21 to 37 percent of the average stage variability. Net leakage was the most important explanatory variable in regressions describing the full range and low range in stage variability for the recent period. The most important explanatory variable in the model predicting the high range in stage variability was the height over median lake stage at which surface-water outflow would occur. Other explanatory variables in final regression models for the recent period included the range in annual rainfall for the period and several variables related to local and regional hydrogeology: (1) ground-water pumping within 1 mile of each lake, (2) the amount of ground-water inflow (by category), (3) the head gradient between the lake and the Upper Floridan aquifer, and (4) the thickness of the intermediate confining unit. Many of the variables in final regression models are related to hydrogeologic characteristics, underscoring the importance of ground-water exchange in controlling the stage of karst lakes in Florida. Regression equations were used to predict lake-stage variability for the recent period for 12 additional lakes, and the median difference between predicted and observed values ranged from 11 to 23 percent. Coefficients of determination for the historical period were considerably lower (maximum R2 of 0.28) than for the recent period. Reasons for these low R2 values are probably related to the small number of lakes (20) with stage data for an equivalent time period that were unaffected by ground-water pumping, the similarity of many of the lake types (large surface-water drainage lakes), and the greater uncertainty in defining historical basin characteristics. The lack of lake-stage data unaffected by ground-water pumping and the poor regression results obtained for that group of lakes limit the ability to predict natural lake-stage variability using this method in west-central Florida.
Categorical Variables in Multiple Regression: Some Cautions.
ERIC Educational Resources Information Center
O'Grady, Kevin E.; Medoff, Deborah R.
1988-01-01
Limitations of dummy coding and nonsense coding as methods of coding categorical variables for use as predictors in multiple regression analysis are discussed. The combination of these approaches often yields estimates and tests of significance that are not intended by researchers for inclusion in their models. (SLD)
Sparse partial least squares regression for simultaneous dimension reduction and variable selection
Chun, Hyonho; Keleş, Sündüz
2010-01-01
Partial least squares regression has been an alternative to ordinary least squares for handling multicollinearity in several areas of scientific research since the 1960s. It has recently gained much attention in the analysis of high dimensional genomic data. We show that known asymptotic consistency of the partial least squares estimator for a univariate response does not hold with the very large p and small n paradigm. We derive a similar result for a multivariate response regression with partial least squares. We then propose a sparse partial least squares formulation which aims simultaneously to achieve good predictive performance and variable selection by producing sparse linear combinations of the original predictors. We provide an efficient implementation of sparse partial least squares regression and compare it with well-known variable selection and dimension reduction approaches via simulation experiments. We illustrate the practical utility of sparse partial least squares regression in a joint analysis of gene expression and genomewide binding data. PMID:20107611
Advanced statistics: linear regression, part II: multiple linear regression.
Marill, Keith A
2004-01-01
The applications of simple linear regression in medical research are limited, because in most situations, there are multiple relevant predictor variables. Univariate statistical techniques such as simple linear regression use a single predictor variable, and they often may be mathematically correct but clinically misleading. Multiple linear regression is a mathematical technique used to model the relationship between multiple independent predictor variables and a single dependent outcome variable. It is used in medical research to model observational data, as well as in diagnostic and therapeutic studies in which the outcome is dependent on more than one factor. Although the technique generally is limited to data that can be expressed with a linear function, it benefits from a well-developed mathematical framework that yields unique solutions and exact confidence intervals for regression coefficients. Building on Part I of this series, this article acquaints the reader with some of the important concepts in multiple regression analysis. These include multicollinearity, interaction effects, and an expansion of the discussion of inference testing, leverage, and variable transformations to multivariate models. Examples from the first article in this series are expanded on using a primarily graphic, rather than mathematical, approach. The importance of the relationships among the predictor variables and the dependence of the multivariate model coefficients on the choice of these variables are stressed. Finally, concepts in regression model building are discussed.
Prediction by regression and intrarange data scatter in surface-process studies
Toy, T.J.; Osterkamp, W.R.; Renard, K.G.
1993-01-01
Modeling is a major component of contemporary earth science, and regression analysis occupies a central position in the parameterization, calibration, and validation of geomorphic and hydrologic models. Although this methodology can be used in many ways, we are primarily concerned with the prediction of values for one variable from another variable. Examination of the literature reveals considerable inconsistency in the presentation of the results of regression analysis and the occurrence of patterns in the scatter of data points about the regression line. Both circumstances confound utilization and evaluation of the models. Statisticians are well aware of various problems associated with the use of regression analysis and offer improved practices; often, however, their guidelines are not followed. After a review of the aforementioned circumstances and until standard criteria for model evaluation become established, we recommend, as a minimum, inclusion of scatter diagrams, the standard error of the estimate, and sample size in reporting the results of regression analyses for most surface-process studies. ?? 1993 Springer-Verlag.
NASA Astrophysics Data System (ADS)
Nishidate, Izumi; Wiswadarma, Aditya; Hase, Yota; Tanaka, Noriyuki; Maeda, Takaaki; Niizeki, Kyuichi; Aizu, Yoshihisa
2011-08-01
In order to visualize melanin and blood concentrations and oxygen saturation in human skin tissue, a simple imaging technique based on multispectral diffuse reflectance images acquired at six wavelengths (500, 520, 540, 560, 580 and 600nm) was developed. The technique utilizes multiple regression analysis aided by Monte Carlo simulation for diffuse reflectance spectra. Using the absorbance spectrum as a response variable and the extinction coefficients of melanin, oxygenated hemoglobin, and deoxygenated hemoglobin as predictor variables, multiple regression analysis provides regression coefficients. Concentrations of melanin and total blood are then determined from the regression coefficients using conversion vectors that are deduced numerically in advance, while oxygen saturation is obtained directly from the regression coefficients. Experiments with a tissue-like agar gel phantom validated the method. In vivo experiments with human skin of the human hand during upper limb occlusion and of the inner forearm exposed to UV irradiation demonstrated the ability of the method to evaluate physiological reactions of human skin tissue.
A Latent-Variable Causal Model of Faculty Reputational Ratings.
ERIC Educational Resources Information Center
King, Suzanne; Wolfle, Lee M.
A reanalysis was conducted of Saunier's research (1985) on sources of variation in the National Research Council (NRC) reputational ratings of university faculty. Saunier conducted a stepwise regression analysis using 12 predictor variables. Due to problems with multicollinearity and because of the atheoretical nature of stepwise regression,…
Lorenzo-Seva, Urbano; Ferrando, Pere J
2011-03-01
We provide an SPSS program that implements currently recommended techniques and recent developments for selecting variables in multiple linear regression analysis via the relative importance of predictors. The approach consists of: (1) optimally splitting the data for cross-validation, (2) selecting the final set of predictors to be retained in the equation regression, and (3) assessing the behavior of the chosen model using standard indices and procedures. The SPSS syntax, a short manual, and data files related to this article are available as supplemental materials from brm.psychonomic-journals.org/content/supplemental.
Least Principal Components Analysis (LPCA): An Alternative to Regression Analysis.
ERIC Educational Resources Information Center
Olson, Jeffery E.
Often, all of the variables in a model are latent, random, or subject to measurement error, or there is not an obvious dependent variable. When any of these conditions exist, an appropriate method for estimating the linear relationships among the variables is Least Principal Components Analysis. Least Principal Components are robust, consistent,…
Meta-regression analysis of commensal and pathogenic Escherichia coli survival in soil and water.
Franz, Eelco; Schijven, Jack; de Roda Husman, Ana Maria; Blaak, Hetty
2014-06-17
The extent to which pathogenic and commensal E. coli (respectively PEC and CEC) can survive, and which factors predominantly determine the rate of decline, are crucial issues from a public health point of view. The goal of this study was to provide a quantitative summary of the variability in E. coli survival in soil and water over a broad range of individual studies and to identify the most important sources of variability. To that end, a meta-regression analysis on available literature data was conducted. The considerable variation in reported decline rates indicated that the persistence of E. coli is not easily predictable. The meta-analysis demonstrated that for soil and water, the type of experiment (laboratory or field), the matrix subtype (type of water and soil), and temperature were the main factors included in the regression analysis. A higher average decline rate in soil of PEC compared with CEC was observed. The regression models explained at best 57% of the variation in decline rate in soil and 41% of the variation in decline rate in water. This indicates that additional factors, not included in the current meta-regression analysis, are of importance but rarely reported. More complete reporting of experimental conditions may allow future inference on the global effects of these variables on the decline rate of E. coli.
NASA Astrophysics Data System (ADS)
Oguntunde, Philip G.; Lischeid, Gunnar; Dietrich, Ottfried
2018-03-01
This study examines the variations of climate variables and rice yield and quantifies the relationships among them using multiple linear regression, principal component analysis, and support vector machine (SVM) analysis in southwest Nigeria. The climate and yield data used was for a period of 36 years between 1980 and 2015. Similar to the observed decrease ( P < 0.001) in rice yield, pan evaporation, solar radiation, and wind speed declined significantly. Eight principal components exhibited an eigenvalue > 1 and explained 83.1% of the total variance of predictor variables. The SVM regression function using the scores of the first principal component explained about 75% of the variance in rice yield data and linear regression about 64%. SVM regression between annual solar radiation values and yield explained 67% of the variance. Only the first component of the principal component analysis (PCA) exhibited a clear long-term trend and sometimes short-term variance similar to that of rice yield. Short-term fluctuations of the scores of the PC1 are closely coupled to those of rice yield during the 1986-1993 and the 2006-2013 periods thereby revealing the inter-annual sensitivity of rice production to climate variability. Solar radiation stands out as the climate variable of highest influence on rice yield, and the influence was especially strong during monsoon and post-monsoon periods, which correspond to the vegetative, booting, flowering, and grain filling stages in the study area. The outcome is expected to provide more in-depth regional-specific climate-rice linkage for screening of better cultivars that can positively respond to future climate fluctuations as well as providing information that may help optimized planting dates for improved radiation use efficiency in the study area.
Clustering performance comparison using K-means and expectation maximization algorithms.
Jung, Yong Gyu; Kang, Min Soo; Heo, Jun
2014-11-14
Clustering is an important means of data mining based on separating data categories by similar features. Unlike the classification algorithm, clustering belongs to the unsupervised type of algorithms. Two representatives of the clustering algorithms are the K -means and the expectation maximization (EM) algorithm. Linear regression analysis was extended to the category-type dependent variable, while logistic regression was achieved using a linear combination of independent variables. To predict the possibility of occurrence of an event, a statistical approach is used. However, the classification of all data by means of logistic regression analysis cannot guarantee the accuracy of the results. In this paper, the logistic regression analysis is applied to EM clusters and the K -means clustering method for quality assessment of red wine, and a method is proposed for ensuring the accuracy of the classification results.
Analysis of the labor productivity of enterprises via quantile regression
NASA Astrophysics Data System (ADS)
Türkan, Semra
2017-07-01
In this study, we have analyzed the factors that affect the performance of Turkey's Top 500 Industrial Enterprises using quantile regression. The variable about labor productivity of enterprises is considered as dependent variable, the variableabout assets is considered as independent variable. The distribution of labor productivity of enterprises is right-skewed. If the dependent distribution is skewed, linear regression could not catch important aspects of the relationships between the dependent variable and its predictors due to modeling only the conditional mean. Hence, the quantile regression, which allows modelingany quantilesof the dependent distribution, including the median,appears to be useful. It examines whether relationships between dependent and independent variables are different for low, medium, and high percentiles. As a result of analyzing data, the effect of total assets is relatively constant over the entire distribution, except the upper tail. It hasa moderately stronger effect in the upper tail.
Brenn, T; Arnesen, E
1985-01-01
For comparative evaluation, discriminant analysis, logistic regression and Cox's model were used to select risk factors for total and coronary deaths among 6595 men aged 20-49 followed for 9 years. Groups with mortality between 5 and 93 per 1000 were considered. Discriminant analysis selected variable sets only marginally different from the logistic and Cox methods which always selected the same sets. A time-saving option, offered for both the logistic and Cox selection, showed no advantage compared with discriminant analysis. Analysing more than 3800 subjects, the logistic and Cox methods consumed, respectively, 80 and 10 times more computer time than discriminant analysis. When including the same set of variables in non-stepwise analyses, all methods estimated coefficients that in most cases were almost identical. In conclusion, discriminant analysis is advocated for preliminary or stepwise analysis, otherwise Cox's method should be used.
ERIC Educational Resources Information Center
Baylor, Carolyn; Yorkston, Kathryn; Bamer, Alyssa; Britton, Deanna; Amtmann, Dagmar
2010-01-01
Purpose: To explore variables associated with self-reported communicative participation in a sample (n = 498) of community-dwelling adults with multiple sclerosis (MS). Method: A battery of questionnaires was administered online or on paper per participant preference. Data were analyzed using multiple linear backward stepwise regression. The…
Logarithmic Transformations in Regression: Do You Transform Back Correctly?
ERIC Educational Resources Information Center
Dambolena, Ismael G.; Eriksen, Steven E.; Kopcso, David P.
2009-01-01
The logarithmic transformation is often used in regression analysis for a variety of purposes such as the linearization of a nonlinear relationship between two or more variables. We have noticed that when this transformation is applied to the response variable, the computation of the point estimate of the conditional mean of the original response…
A method for fitting regression splines with varying polynomial order in the linear mixed model.
Edwards, Lloyd J; Stewart, Paul W; MacDougall, James E; Helms, Ronald W
2006-02-15
The linear mixed model has become a widely used tool for longitudinal analysis of continuous variables. The use of regression splines in these models offers the analyst additional flexibility in the formulation of descriptive analyses, exploratory analyses and hypothesis-driven confirmatory analyses. We propose a method for fitting piecewise polynomial regression splines with varying polynomial order in the fixed effects and/or random effects of the linear mixed model. The polynomial segments are explicitly constrained by side conditions for continuity and some smoothness at the points where they join. By using a reparameterization of this explicitly constrained linear mixed model, an implicitly constrained linear mixed model is constructed that simplifies implementation of fixed-knot regression splines. The proposed approach is relatively simple, handles splines in one variable or multiple variables, and can be easily programmed using existing commercial software such as SAS or S-plus. The method is illustrated using two examples: an analysis of longitudinal viral load data from a study of subjects with acute HIV-1 infection and an analysis of 24-hour ambulatory blood pressure profiles.
Correlation and simple linear regression.
Zou, Kelly H; Tuncali, Kemal; Silverman, Stuart G
2003-06-01
In this tutorial article, the concepts of correlation and regression are reviewed and demonstrated. The authors review and compare two correlation coefficients, the Pearson correlation coefficient and the Spearman rho, for measuring linear and nonlinear relationships between two continuous variables. In the case of measuring the linear relationship between a predictor and an outcome variable, simple linear regression analysis is conducted. These statistical concepts are illustrated by using a data set from published literature to assess a computed tomography-guided interventional technique. These statistical methods are important for exploring the relationships between variables and can be applied to many radiologic studies.
Cephalometric landmark detection in dental x-ray images using convolutional neural networks
NASA Astrophysics Data System (ADS)
Lee, Hansang; Park, Minseok; Kim, Junmo
2017-03-01
In dental X-ray images, an accurate detection of cephalometric landmarks plays an important role in clinical diagnosis, treatment and surgical decisions for dental problems. In this work, we propose an end-to-end deep learning system for cephalometric landmark detection in dental X-ray images, using convolutional neural networks (CNN). For detecting 19 cephalometric landmarks in dental X-ray images, we develop a detection system using CNN-based coordinate-wise regression systems. By viewing x- and y-coordinates of all landmarks as 38 independent variables, multiple CNN-based regression systems are constructed to predict the coordinate variables from input X-ray images. First, each coordinate variable is normalized by the length of either height or width of an image. For each normalized coordinate variable, a CNN-based regression system is trained on training images and corresponding coordinate variable, which is a variable to be regressed. We train 38 regression systems with the same CNN structure on coordinate variables, respectively. Finally, we compute 38 coordinate variables with these trained systems from unseen images and extract 19 landmarks by pairing the regressed coordinates. In experiments, the public database from the Grand Challenges in Dental X-ray Image Analysis in ISBI 2015 was used and the proposed system showed promising performance by successfully locating the cephalometric landmarks within considerable margins from the ground truths.
Parental education predicts change in intelligence quotient after childhood epilepsy surgery.
Meekes, Joost; van Schooneveld, Monique M J; Braams, Olga B; Jennekens-Schinkel, Aag; van Rijen, Peter C; Hendriks, Marc P H; Braun, Kees P J; van Nieuwenhuizen, Onno
2015-04-01
To know whether change in the intelligence quotient (IQ) of children who undergo epilepsy surgery is associated with the educational level of their parents. Retrospective analysis of data obtained from a cohort of children who underwent epilepsy surgery between January 1996 and September 2010. We performed simple and multiple regression analyses to identify predictors associated with IQ change after surgery. In addition to parental education, six variables previously demonstrated to be associated with IQ change after surgery were included as predictors: age at surgery, duration of epilepsy, etiology, presurgical IQ, reduction of antiepileptic drugs, and seizure freedom. We used delta IQ (IQ 2 years after surgery minus IQ shortly before surgery) as the primary outcome variable, but also performed analyses with pre- and postsurgical IQ as outcome variables to support our findings. To validate the results we performed simple regression analysis with parental education as the predictor in specific subgroups. The sample for regression analysis included 118 children (60 male; median age at surgery 9.73 years). Parental education was significantly associated with delta IQ in simple regression analysis (p = 0.004), and also contributed significantly to postsurgical IQ in multiple regression analysis (p = 0.008). Additional analyses demonstrated that parental education made a unique contribution to prediction of delta IQ, that is, it could not be replaced by the illness-related variables. Subgroup analyses confirmed the association of parental education with IQ change after surgery for most groups. Children whose parents had higher education demonstrate on average a greater increase in IQ after surgery and a higher postsurgical--but not presurgical--IQ than children whose parents completed at most lower secondary education. Parental education--and perhaps other environmental variables--should be considered in the prognosis of cognitive function after childhood epilepsy surgery. Wiley Periodicals, Inc. © 2015 International League Against Epilepsy.
Effect of Contact Damage on the Strength of Ceramic Materials.
1982-10-01
variables that are important to erosion, and a multivariate , linear regression analysis is used to fit the data to the dimensional analysis. The...of Equations 7 and 8 by a multivariable regression analysis (room tem- perature data) Exponent Regression Standard error Computed coefficient of...1980) 593. WEAVER, Proc. Brit. Ceram. Soc. 22 (1973) 125. 39. P. W. BRIDGMAN, "Dimensional Analaysis ", (Yale 18. R. W. RICE, S. W. FREIMAN and P. F
Poisson Regression Analysis of Illness and Injury Surveillance Data
DOE Office of Scientific and Technical Information (OSTI.GOV)
Frome E.L., Watkins J.P., Ellis E.D.
2012-12-12
The Department of Energy (DOE) uses illness and injury surveillance to monitor morbidity and assess the overall health of the work force. Data collected from each participating site include health events and a roster file with demographic information. The source data files are maintained in a relational data base, and are used to obtain stratified tables of health event counts and person time at risk that serve as the starting point for Poisson regression analysis. The explanatory variables that define these tables are age, gender, occupational group, and time. Typical response variables of interest are the number of absences duemore » to illness or injury, i.e., the response variable is a count. Poisson regression methods are used to describe the effect of the explanatory variables on the health event rates using a log-linear main effects model. Results of fitting the main effects model are summarized in a tabular and graphical form and interpretation of model parameters is provided. An analysis of deviance table is used to evaluate the importance of each of the explanatory variables on the event rate of interest and to determine if interaction terms should be considered in the analysis. Although Poisson regression methods are widely used in the analysis of count data, there are situations in which over-dispersion occurs. This could be due to lack-of-fit of the regression model, extra-Poisson variation, or both. A score test statistic and regression diagnostics are used to identify over-dispersion. A quasi-likelihood method of moments procedure is used to evaluate and adjust for extra-Poisson variation when necessary. Two examples are presented using respiratory disease absence rates at two DOE sites to illustrate the methods and interpretation of the results. In the first example the Poisson main effects model is adequate. In the second example the score test indicates considerable over-dispersion and a more detailed analysis attributes the over-dispersion to extra-Poisson variation. The R open source software environment for statistical computing and graphics is used for analysis. Additional details about R and the data that were used in this report are provided in an Appendix. Information on how to obtain R and utility functions that can be used to duplicate results in this report are provided.« less
Comparison of methods for the analysis of relatively simple mediation models.
Rijnhart, Judith J M; Twisk, Jos W R; Chinapaw, Mai J M; de Boer, Michiel R; Heymans, Martijn W
2017-09-01
Statistical mediation analysis is an often used method in trials, to unravel the pathways underlying the effect of an intervention on a particular outcome variable. Throughout the years, several methods have been proposed, such as ordinary least square (OLS) regression, structural equation modeling (SEM), and the potential outcomes framework. Most applied researchers do not know that these methods are mathematically equivalent when applied to mediation models with a continuous mediator and outcome variable. Therefore, the aim of this paper was to demonstrate the similarities between OLS regression, SEM, and the potential outcomes framework in three mediation models: 1) a crude model, 2) a confounder-adjusted model, and 3) a model with an interaction term for exposure-mediator interaction. Secondary data analysis of a randomized controlled trial that included 546 schoolchildren. In our data example, the mediator and outcome variable were both continuous. We compared the estimates of the total, direct and indirect effects, proportion mediated, and 95% confidence intervals (CIs) for the indirect effect across OLS regression, SEM, and the potential outcomes framework. OLS regression, SEM, and the potential outcomes framework yielded the same effect estimates in the crude mediation model, the confounder-adjusted mediation model, and the mediation model with an interaction term for exposure-mediator interaction. Since OLS regression, SEM, and the potential outcomes framework yield the same results in three mediation models with a continuous mediator and outcome variable, researchers can continue using the method that is most convenient to them.
Multicollinearity and Regression Analysis
NASA Astrophysics Data System (ADS)
Daoud, Jamal I.
2017-12-01
In regression analysis it is obvious to have a correlation between the response and predictor(s), but having correlation among predictors is something undesired. The number of predictors included in the regression model depends on many factors among which, historical data, experience, etc. At the end selection of most important predictors is something objective due to the researcher. Multicollinearity is a phenomena when two or more predictors are correlated, if this happens, the standard error of the coefficients will increase [8]. Increased standard errors means that the coefficients for some or all independent variables may be found to be significantly different from In other words, by overinflating the standard errors, multicollinearity makes some variables statistically insignificant when they should be significant. In this paper we focus on the multicollinearity, reasons and consequences on the reliability of the regression model.
Neural Network and Regression Methods Demonstrated in the Design Optimization of a Subsonic Aircraft
NASA Technical Reports Server (NTRS)
Hopkins, Dale A.; Lavelle, Thomas M.; Patnaik, Surya
2003-01-01
The neural network and regression methods of NASA Glenn Research Center s COMETBOARDS design optimization testbed were used to generate approximate analysis and design models for a subsonic aircraft operating at Mach 0.85 cruise speed. The analytical model is defined by nine design variables: wing aspect ratio, engine thrust, wing area, sweep angle, chord-thickness ratio, turbine temperature, pressure ratio, bypass ratio, fan pressure; and eight response parameters: weight, landing velocity, takeoff and landing field lengths, approach thrust, overall efficiency, and compressor pressure and temperature. The variables were adjusted to optimally balance the engines to the airframe. The solution strategy included a sensitivity model and the soft analysis model. Researchers generated the sensitivity model by training the approximators to predict an optimum design. The trained neural network predicted all response variables, within 5-percent error. This was reduced to 1 percent by the regression method. The soft analysis model was developed to replace aircraft analysis as the reanalyzer in design optimization. Soft models have been generated for a neural network method, a regression method, and a hybrid method obtained by combining the approximators. The performance of the models is graphed for aircraft weight versus thrust as well as for wing area and turbine temperature. The regression method followed the analytical solution with little error. The neural network exhibited 5-percent maximum error over all parameters. Performance of the hybrid method was intermediate in comparison to the individual approximators. Error in the response variable is smaller than that shown in the figure because of a distortion scale factor. The overall performance of the approximators was considered to be satisfactory because aircraft analysis with NASA Langley Research Center s FLOPS (Flight Optimization System) code is a synthesis of diverse disciplines: weight estimation, aerodynamic analysis, engine cycle analysis, propulsion data interpolation, mission performance, airfield length for landing and takeoff, noise footprint, and others.
Predictive equations for the estimation of body size in seals and sea lions (Carnivora: Pinnipedia)
Churchill, Morgan; Clementz, Mark T; Kohno, Naoki
2014-01-01
Body size plays an important role in pinniped ecology and life history. However, body size data is often absent for historical, archaeological, and fossil specimens. To estimate the body size of pinnipeds (seals, sea lions, and walruses) for today and the past, we used 14 commonly preserved cranial measurements to develop sets of single variable and multivariate predictive equations for pinniped body mass and total length. Principal components analysis (PCA) was used to test whether separate family specific regressions were more appropriate than single predictive equations for Pinnipedia. The influence of phylogeny was tested with phylogenetic independent contrasts (PIC). The accuracy of these regressions was then assessed using a combination of coefficient of determination, percent prediction error, and standard error of estimation. Three different methods of multivariate analysis were examined: bidirectional stepwise model selection using Akaike information criteria; all-subsets model selection using Bayesian information criteria (BIC); and partial least squares regression. The PCA showed clear discrimination between Otariidae (fur seals and sea lions) and Phocidae (earless seals) for the 14 measurements, indicating the need for family-specific regression equations. The PIC analysis found that phylogeny had a minor influence on relationship between morphological variables and body size. The regressions for total length were more accurate than those for body mass, and equations specific to Otariidae were more accurate than those for Phocidae. Of the three multivariate methods, the all-subsets approach required the fewest number of variables to estimate body size accurately. We then used the single variable predictive equations and the all-subsets approach to estimate the body size of two recently extinct pinniped taxa, the Caribbean monk seal (Monachus tropicalis) and the Japanese sea lion (Zalophus japonicus). Body size estimates using single variable regressions generally under or over-estimated body size; however, the all-subset regression produced body size estimates that were close to historically recorded body length for these two species. This indicates that the all-subset regression equations developed in this study can estimate body size accurately. PMID:24916814
Difficulties with Regression Analysis of Age-Adjusted Rates.
1982-09-01
variables used in those analyses, such as death rates in various states, have been age adjusted, whereas the predictor variables have not been age adjusted...The use of crude state death rates as the outcome variable with crude covariates and age as predictors can avoid the problem, at least under some...should be regressed on age-adjusted exposure Z+B+ Although age-specific death rates , Yas+’ may be available, it is often difficult to obtain age
MULGRES: a computer program for stepwise multiple regression analysis
A. Jeff Martin
1971-01-01
MULGRES is a computer program source deck that is designed for multiple regression analysis employing the technique of stepwise deletion in the search for most significant variables. The features of the program, along with inputs and outputs, are briefly described, with a note on machine compatibility.
Determination of riverbank erosion probability using Locally Weighted Logistic Regression
NASA Astrophysics Data System (ADS)
Ioannidou, Elena; Flori, Aikaterini; Varouchakis, Emmanouil A.; Giannakis, Georgios; Vozinaki, Anthi Eirini K.; Karatzas, George P.; Nikolaidis, Nikolaos
2015-04-01
Riverbank erosion is a natural geomorphologic process that affects the fluvial environment. The most important issue concerning riverbank erosion is the identification of the vulnerable locations. An alternative to the usual hydrodynamic models to predict vulnerable locations is to quantify the probability of erosion occurrence. This can be achieved by identifying the underlying relations between riverbank erosion and the geomorphological or hydrological variables that prevent or stimulate erosion. Thus, riverbank erosion can be determined by a regression model using independent variables that are considered to affect the erosion process. The impact of such variables may vary spatially, therefore, a non-stationary regression model is preferred instead of a stationary equivalent. Locally Weighted Regression (LWR) is proposed as a suitable choice. This method can be extended to predict the binary presence or absence of erosion based on a series of independent local variables by using the logistic regression model. It is referred to as Locally Weighted Logistic Regression (LWLR). Logistic regression is a type of regression analysis used for predicting the outcome of a categorical dependent variable (e.g. binary response) based on one or more predictor variables. The method can be combined with LWR to assign weights to local independent variables of the dependent one. LWR allows model parameters to vary over space in order to reflect spatial heterogeneity. The probabilities of the possible outcomes are modelled as a function of the independent variables using a logistic function. Logistic regression measures the relationship between a categorical dependent variable and, usually, one or several continuous independent variables by converting the dependent variable to probability scores. Then, a logistic regression is formed, which predicts success or failure of a given binary variable (e.g. erosion presence or absence) for any value of the independent variables. The erosion occurrence probability can be calculated in conjunction with the model deviance regarding the independent variables tested. The most straightforward measure for goodness of fit is the G statistic. It is a simple and effective way to study and evaluate the Logistic Regression model efficiency and the reliability of each independent variable. The developed statistical model is applied to the Koiliaris River Basin on the island of Crete, Greece. Two datasets of river bank slope, river cross-section width and indications of erosion were available for the analysis (12 and 8 locations). Two different types of spatial dependence functions, exponential and tricubic, were examined to determine the local spatial dependence of the independent variables at the measurement locations. The results show a significant improvement when the tricubic function is applied as the erosion probability is accurately predicted at all eight validation locations. Results for the model deviance show that cross-section width is more important than bank slope in the estimation of erosion probability along the Koiliaris riverbanks. The proposed statistical model is a useful tool that quantifies the erosion probability along the riverbanks and can be used to assist managing erosion and flooding events. Acknowledgements This work is part of an on-going THALES project (CYBERSENSORS - High Frequency Monitoring System for Integrated Water Resources Management of Rivers). The project has been co-financed by the European Union (European Social Fund - ESF) and Greek national funds through the Operational Program "Education and Lifelong Learning" of the National Strategic Reference Framework (NSRF) - Research Funding Program: THALES. Investing in knowledge society through the European Social Fund.
Multiple regression for physiological data analysis: the problem of multicollinearity.
Slinker, B K; Glantz, S A
1985-07-01
Multiple linear regression, in which several predictor variables are related to a response variable, is a powerful statistical tool for gaining quantitative insight into complex in vivo physiological systems. For these insights to be correct, all predictor variables must be uncorrelated. However, in many physiological experiments the predictor variables cannot be precisely controlled and thus change in parallel (i.e., they are highly correlated). There is a redundancy of information about the response, a situation called multicollinearity, that leads to numerical problems in estimating the parameters in regression equations; the parameters are often of incorrect magnitude or sign or have large standard errors. Although multicollinearity can be avoided with good experimental design, not all interesting physiological questions can be studied without encountering multicollinearity. In these cases various ad hoc procedures have been proposed to mitigate multicollinearity. Although many of these procedures are controversial, they can be helpful in applying multiple linear regression to some physiological problems.
Influences on Academic Achievement Across High and Low Income Countries: A Re-Analysis of IEA Data.
ERIC Educational Resources Information Center
Heyneman, S.; Loxley, W.
Previous international studies of science achievement put the data through a process of winnowing to decide which variables to keep in the final regressions. Variables were allowed to enter the final regressions if they met a minimum beta coefficient criterion of 0.05 averaged across rich and poor countries alike. The criterion was an average…
Lawrence, Stephen J.
2012-01-01
Regression analyses show that E. coli density in samples was strongly related to turbidity, streamflow characteristics, and season at both sites. The regression equation chosen for the Norcross data showed that 78 percent of the variability in E. coli density (in log base 10 units) was explained by the variability in turbidity values (in log base 10 units), streamflow event (dry-weather flow or stormflow), season (cool or warm), and an interaction term that is the cross product of streamflow event and turbidity. The regression equation chosen for the Atlanta data showed that 76 percent of the variability in E. coli density (in log base 10 units) was explained by the variability in turbidity values (in log base 10 units), water temperature, streamflow event, and an interaction term that is the cross product of streamflow event and turbidity. Residual analysis and model confirmation using new data indicated the regression equations selected at both sites predicted E. coli density within the 90 percent prediction intervals of the equations and could be used to predict E. coli density in real time at both sites.
Detrended fluctuation analysis as a regression framework: Estimating dependence at different scales
NASA Astrophysics Data System (ADS)
Kristoufek, Ladislav
2015-02-01
We propose a framework combining detrended fluctuation analysis with standard regression methodology. The method is built on detrended variances and covariances and it is designed to estimate regression parameters at different scales and under potential nonstationarity and power-law correlations. The former feature allows for distinguishing between effects for a pair of variables from different temporal perspectives. The latter ones make the method a significant improvement over the standard least squares estimation. Theoretical claims are supported by Monte Carlo simulations. The method is then applied on selected examples from physics, finance, environmental science, and epidemiology. For most of the studied cases, the relationship between variables of interest varies strongly across scales.
NASA Astrophysics Data System (ADS)
Koshigai, Masaru; Marui, Atsunao
Water table provides important information for the evaluation of groundwater resource. Recently, the estimation of water table in wide area is required for effective evaluation of groundwater resources. However, evaluation process is met with difficulties due to technical and economic constraints. Regression analysis for the prediction of groundwater levels based on geomorphologic and geologic conditions is considered as a reliable tool for the estimation of water table of wide area. Data of groundwater levels were extracted from the public database of geotechnical information. It was observed that changes in groundwater level depend on climate conditions. It was also observed and confirmed that there exist variations of groundwater levels according to geomorphologic and geologic conditions. The objective variable of the regression analysis was groundwater level. And the explanatory variables were elevation and the dummy variable consisting of group number. The constructed regression formula was significant according to the determination coefficients and analysis of the variance. Therefore, combining the regression formula and mesh map, the statistical method to estimate the water table based on geomorphologic and geologic condition for the whole country could be established.
Independent contrasts and PGLS regression estimators are equivalent.
Blomberg, Simon P; Lefevre, James G; Wells, Jessie A; Waterhouse, Mary
2012-05-01
We prove that the slope parameter of the ordinary least squares regression of phylogenetically independent contrasts (PICs) conducted through the origin is identical to the slope parameter of the method of generalized least squares (GLSs) regression under a Brownian motion model of evolution. This equivalence has several implications: 1. Understanding the structure of the linear model for GLS regression provides insight into when and why phylogeny is important in comparative studies. 2. The limitations of the PIC regression analysis are the same as the limitations of the GLS model. In particular, phylogenetic covariance applies only to the response variable in the regression and the explanatory variable should be regarded as fixed. Calculation of PICs for explanatory variables should be treated as a mathematical idiosyncrasy of the PIC regression algorithm. 3. Since the GLS estimator is the best linear unbiased estimator (BLUE), the slope parameter estimated using PICs is also BLUE. 4. If the slope is estimated using different branch lengths for the explanatory and response variables in the PIC algorithm, the estimator is no longer the BLUE, so this is not recommended. Finally, we discuss whether or not and how to accommodate phylogenetic covariance in regression analyses, particularly in relation to the problem of phylogenetic uncertainty. This discussion is from both frequentist and Bayesian perspectives.
Hu, Wenbiao; Tong, Shilu; Mengersen, Kerrie; Connell, Des
2007-09-01
Few studies have examined the relationship between weather variables and cryptosporidiosis in Australia. This paper examines the potential impact of weather variability on the transmission of cryptosporidiosis and explores the possibility of developing an empirical forecast system. Data on weather variables, notified cryptosporidiosis cases, and population size in Brisbane were supplied by the Australian Bureau of Meteorology, Queensland Department of Health, and Australian Bureau of Statistics for the period of January 1, 1996-December 31, 2004, respectively. Time series Poisson regression and seasonal auto-regression integrated moving average (SARIMA) models were performed to examine the potential impact of weather variability on the transmission of cryptosporidiosis. Both the time series Poisson regression and SARIMA models show that seasonal and monthly maximum temperature at a prior moving average of 1 and 3 months were significantly associated with cryptosporidiosis disease. It suggests that there may be 50 more cases a year for an increase of 1 degrees C maximum temperature on average in Brisbane. Model assessments indicated that the SARIMA model had better predictive ability than the Poisson regression model (SARIMA: root mean square error (RMSE): 0.40, Akaike information criterion (AIC): -12.53; Poisson regression: RMSE: 0.54, AIC: -2.84). Furthermore, the analysis of residuals shows that the time series Poisson regression appeared to violate a modeling assumption, in that residual autocorrelation persisted. The results of this study suggest that weather variability (particularly maximum temperature) may have played a significant role in the transmission of cryptosporidiosis. A SARIMA model may be a better predictive model than a Poisson regression model in the assessment of the relationship between weather variability and the incidence of cryptosporidiosis.
Assessing risk factors for periodontitis using regression
NASA Astrophysics Data System (ADS)
Lobo Pereira, J. A.; Ferreira, Maria Cristina; Oliveira, Teresa
2013-10-01
Multivariate statistical analysis is indispensable to assess the associations and interactions between different factors and the risk of periodontitis. Among others, regression analysis is a statistical technique widely used in healthcare to investigate and model the relationship between variables. In our work we study the impact of socio-demographic, medical and behavioral factors on periodontal health. Using regression, linear and logistic models, we can assess the relevance, as risk factors for periodontitis disease, of the following independent variables (IVs): Age, Gender, Diabetic Status, Education, Smoking status and Plaque Index. The multiple linear regression analysis model was built to evaluate the influence of IVs on mean Attachment Loss (AL). Thus, the regression coefficients along with respective p-values will be obtained as well as the respective p-values from the significance tests. The classification of a case (individual) adopted in the logistic model was the extent of the destruction of periodontal tissues defined by an Attachment Loss greater than or equal to 4 mm in 25% (AL≥4mm/≥25%) of sites surveyed. The association measures include the Odds Ratios together with the correspondent 95% confidence intervals.
Beyond Multiple Regression: Using Commonality Analysis to Better Understand R[superscript 2] Results
ERIC Educational Resources Information Center
Warne, Russell T.
2011-01-01
Multiple regression is one of the most common statistical methods used in quantitative educational research. Despite the versatility and easy interpretability of multiple regression, it has some shortcomings in the detection of suppressor variables and for somewhat arbitrarily assigning values to the structure coefficients of correlated…
NASA Astrophysics Data System (ADS)
Rajab, Jasim M.; MatJafri, M. Z.; Lim, H. S.
2013-06-01
This study encompasses columnar ozone modelling in the peninsular Malaysia. Data of eight atmospheric parameters [air surface temperature (AST), carbon monoxide (CO), methane (CH4), water vapour (H2Ovapour), skin surface temperature (SSKT), atmosphere temperature (AT), relative humidity (RH), and mean surface pressure (MSP)] data set, retrieved from NASA's Atmospheric Infrared Sounder (AIRS), for the entire period (2003-2008) was employed to develop models to predict the value of columnar ozone (O3) in study area. The combined method, which is based on using both multiple regressions combined with principal component analysis (PCA) modelling, was used to predict columnar ozone. This combined approach was utilized to improve the prediction accuracy of columnar ozone. Separate analysis was carried out for north east monsoon (NEM) and south west monsoon (SWM) seasons. The O3 was negatively correlated with CH4, H2Ovapour, RH, and MSP, whereas it was positively correlated with CO, AST, SSKT, and AT during both the NEM and SWM season periods. Multiple regression analysis was used to fit the columnar ozone data using the atmospheric parameter's variables as predictors. A variable selection method based on high loading of varimax rotated principal components was used to acquire subsets of the predictor variables to be comprised in the linear regression model of the atmospheric parameter's variables. It was found that the increase in columnar O3 value is associated with an increase in the values of AST, SSKT, AT, and CO and with a drop in the levels of CH4, H2Ovapour, RH, and MSP. The result of fitting the best models for the columnar O3 value using eight of the independent variables gave about the same values of the R (≈0.93) and R2 (≈0.86) for both the NEM and SWM seasons. The common variables that appeared in both regression equations were SSKT, CH4 and RH, and the principal precursor of the columnar O3 value in both the NEM and SWM seasons was SSKT.
ERIC Educational Resources Information Center
Campbell, S. Duke; Greenberg, Barry
The development of a predictive equation capable of explaining a significant percentage of enrollment variability at Florida International University is described. A model utilizing trend analysis and a multiple regression approach to enrollment forecasting was adapted to investigate enrollment dynamics at the university. Four independent…
Multiple Logistic Regression Analysis of Cigarette Use among High School Students
ERIC Educational Resources Information Center
Adwere-Boamah, Joseph
2011-01-01
A binary logistic regression analysis was performed to predict high school students' cigarette smoking behavior from selected predictors from 2009 CDC Youth Risk Behavior Surveillance Survey. The specific target student behavior of interest was frequent cigarette use. Five predictor variables included in the model were: a) race, b) frequency of…
Ito, Yukiko; Hattori, Reiko; Mase, Hiroki; Watanabe, Masako; Shiotani, Itaru
2008-12-01
Pollen information is indispensable for allergic individuals and clinicians. This study aimed to develop forecasting models for the total annual count of airborne pollen grains based on data monitored over the last 20 years at the Mie Chuo Medical Center, Tsu, Mie, Japan. Airborne pollen grains were collected using a Durham sampler. Total annual pollen count and pollen count from October to December (OD pollen count) of the previous year were transformed to logarithms. Regression analysis of the total pollen count was performed using variables such as the OD pollen count and the maximum temperature for mid-July of the previous year. Time series analysis revealed an alternate rhythm of the series of total pollen count. The alternate rhythm consisted of a cyclic alternation of an "on" year (high pollen count) and an "off" year (low pollen count). This rhythm was used as a dummy variable in regression equations. Of the three models involving the OD pollen count, a multiple regression equation that included the alternate rhythm variable and the interaction of this rhythm with OD pollen count showed a high coefficient of determination (0.844). Of the three models involving the maximum temperature for mid-July, those including the alternate rhythm variable and the interaction of this rhythm with maximum temperature had the highest coefficient of determination (0.925). An alternate pollen dispersal rhythm represented by a dummy variable in the multiple regression analysis plays a key role in improving forecasting models for the total annual sugi pollen count.
Smith, David V.; Utevsky, Amanda V.; Bland, Amy R.; Clement, Nathan; Clithero, John A.; Harsch, Anne E. W.; Carter, R. McKell; Huettel, Scott A.
2014-01-01
A central challenge for neuroscience lies in relating inter-individual variability to the functional properties of specific brain regions. Yet, considerable variability exists in the connectivity patterns between different brain areas, potentially producing reliable group differences. Using sex differences as a motivating example, we examined two separate resting-state datasets comprising a total of 188 human participants. Both datasets were decomposed into resting-state networks (RSNs) using a probabilistic spatial independent components analysis (ICA). We estimated voxelwise functional connectivity with these networks using a dual-regression analysis, which characterizes the participant-level spatiotemporal dynamics of each network while controlling for (via multiple regression) the influence of other networks and sources of variability. We found that males and females exhibit distinct patterns of connectivity with multiple RSNs, including both visual and auditory networks and the right frontal-parietal network. These results replicated across both datasets and were not explained by differences in head motion, data quality, brain volume, cortisol levels, or testosterone levels. Importantly, we also demonstrate that dual-regression functional connectivity is better at detecting inter-individual variability than traditional seed-based functional connectivity approaches. Our findings characterize robust—yet frequently ignored—neural differences between males and females, pointing to the necessity of controlling for sex in neuroscience studies of individual differences. Moreover, our results highlight the importance of employing network-based models to study variability in functional connectivity. PMID:24662574
Prediction of performance on the RCMP physical ability requirement evaluation.
Stanish, H I; Wood, T M; Campagna, P
1999-08-01
The Royal Canadian Mounted Police use the Physical Ability Requirement Evaluation (PARE) for screening applicants. The purposes of this investigation were to identify those field tests of physical fitness that were associated with PARE performance and determine which most accurately classified successful and unsuccessful PARE performers. The participants were 27 female and 21 male volunteers. Testing included measures of aerobic power, anaerobic power, agility, muscular strength, muscular endurance, and body composition. Multiple regression analysis revealed a three-variable model for males (70-lb bench press, standing long jump, and agility) explaining 79% of the variability in PARE time, whereas a one-variable model (agility) explained 43% of the variability for females. Analysis of the classification accuracy of the males' data was prohibited because 91% of the males passed the PARE. Classification accuracy of the females' data, using logistic regression, produced a two-variable model (agility, 1.5-mile endurance run) with 93% overall classification accuracy.
Choe, Jee-Hwan; Choi, Mi-Hee; Rhee, Min-Suk; Kim, Byoung-Chul
2016-01-01
This study investigated the degree to which instrumental measurements explain the variation in pork loin tenderness as assessed by the sensory evaluation of trained panelists. Warner-Bratzler shear force (WBS) had a significant relationship with the sensory tenderness variables, such as softness, initial tenderness, chewiness, and rate of breakdown. In a regression analysis, WBS could account variations in these sensory variables, though only to a limited proportion of variation. On the other hand, three parameters from texture profile analysis (TPA)—hardness, gumminess, and chewiness—were significantly correlated with all sensory evaluation variables. In particular, from the result of stepwise regression analysis, TPA hardness alone explained over 15% of variation in all sensory evaluation variables, with the exception of perceptible residue. Based on these results, TPA analysis was found to be better than WBS measurement, with the TPA parameter hardness likely to prove particularly useful, in terms of predicting pork loin tenderness as rated by trained panelists. However, sensory evaluation should be conducted to investigate practical pork tenderness perceived by consumer, because both instrumental measurements could explain only a small portion (less than 20%) of the variability in sensory evaluation. PMID:26954174
Choe, Jee-Hwan; Choi, Mi-Hee; Rhee, Min-Suk; Kim, Byoung-Chul
2016-07-01
This study investigated the degree to which instrumental measurements explain the variation in pork loin tenderness as assessed by the sensory evaluation of trained panelists. Warner-Bratzler shear force (WBS) had a significant relationship with the sensory tenderness variables, such as softness, initial tenderness, chewiness, and rate of breakdown. In a regression analysis, WBS could account variations in these sensory variables, though only to a limited proportion of variation. On the other hand, three parameters from texture profile analysis (TPA)-hardness, gumminess, and chewiness-were significantly correlated with all sensory evaluation variables. In particular, from the result of stepwise regression analysis, TPA hardness alone explained over 15% of variation in all sensory evaluation variables, with the exception of perceptible residue. Based on these results, TPA analysis was found to be better than WBS measurement, with the TPA parameter hardness likely to prove particularly useful, in terms of predicting pork loin tenderness as rated by trained panelists. However, sensory evaluation should be conducted to investigate practical pork tenderness perceived by consumer, because both instrumental measurements could explain only a small portion (less than 20%) of the variability in sensory evaluation.
Ryberg, Karen R.
2006-01-01
This report presents the results of a study by the U.S. Geological Survey, done in cooperation with the Bureau of Reclamation, U.S. Department of the Interior, to estimate water-quality constituent concentrations in the Red River of the North at Fargo, North Dakota. Regression analysis of water-quality data collected in 2003-05 was used to estimate concentrations and loads for alkalinity, dissolved solids, sulfate, chloride, total nitrite plus nitrate, total nitrogen, total phosphorus, and suspended sediment. The explanatory variables examined for regression relation were continuously monitored physical properties of water-streamflow, specific conductance, pH, water temperature, turbidity, and dissolved oxygen. For the conditions observed in 2003-05, streamflow was a significant explanatory variable for all estimated constituents except dissolved solids. pH, water temperature, and dissolved oxygen were not statistically significant explanatory variables for any of the constituents in this study. Specific conductance was a significant explanatory variable for alkalinity, dissolved solids, sulfate, and chloride. Turbidity was a significant explanatory variable for total phosphorus and suspended sediment. For the nutrients, total nitrite plus nitrate, total nitrogen, and total phosphorus, cosine and sine functions of time also were used to explain the seasonality in constituent concentrations. The regression equations were evaluated using common measures of variability, including R2, or the proportion of variability in the estimated constituent explained by the regression equation. R2 values ranged from 0.703 for total nitrogen concentration to 0.990 for dissolved-solids concentration. The regression equations also were evaluated by calculating the median relative percentage difference (RPD) between measured constituent concentration and the constituent concentration estimated by the regression equations. Median RPDs ranged from 1.1 for dissolved solids to 35.2 for total nitrite plus nitrate. Regression equations also were used to estimate daily constituent loads. Load estimates can be used by water-quality managers for comparison of current water-quality conditions to water-quality standards expressed as total maximum daily loads (TMDLs). TMDLs are a measure of the maximum amount of chemical constituents that a water body can receive and still meet established water-quality standards. The peak loads generally occurred in June and July when streamflow also peaked.
Kim, Seong-Gil
2018-01-01
Background The purpose of this study was to investigate the effect of ankle ROM and lower-extremity muscle strength on static balance control ability in young adults. Material/Methods This study was conducted with 65 young adults, but 10 young adults dropped out during the measurement, so 55 young adults (male: 19, female: 36) completed the study. Postural sway (length and velocity) was measured with eyes open and closed, and ankle ROM (AROM and PROM of dorsiflexion and plantarflexion) and lower-extremity muscle strength (flexor and extensor of hip, knee, and ankle joint) were measured. Pearson correlation coefficient was used to examine the correlation between variables and static balance ability. Simple linear regression analysis and multiple linear regression analysis were used to examine the effect of variables on static balance ability. Results In correlation analysis, plantarflexion ROM (AROM and PROM) and lower-extremity muscle strength (except hip extensor) were significantly correlated with postural sway (p<0.05). In simple correlation analysis, all variables that passed the correlation analysis procedure had significant influence (p<0.05). In multiple linear regression analysis, plantar flexion PROM with eyes open significantly influenced sway length (B=0.681) and sway velocity (B=0.011). Conclusions Lower-extremity muscle strength and ankle plantarflexion ROM influenced static balance control ability, with ankle plantarflexion PROM showing the greatest influence. Therefore, both contractile structures and non-contractile structures should be of interest when considering static balance control ability improvement. PMID:29760375
Kim, Seong-Gil; Kim, Wan-Soo
2018-05-15
BACKGROUND The purpose of this study was to investigate the effect of ankle ROM and lower-extremity muscle strength on static balance control ability in young adults. MATERIAL AND METHODS This study was conducted with 65 young adults, but 10 young adults dropped out during the measurement, so 55 young adults (male: 19, female: 36) completed the study. Postural sway (length and velocity) was measured with eyes open and closed, and ankle ROM (AROM and PROM of dorsiflexion and plantarflexion) and lower-extremity muscle strength (flexor and extensor of hip, knee, and ankle joint) were measured. Pearson correlation coefficient was used to examine the correlation between variables and static balance ability. Simple linear regression analysis and multiple linear regression analysis were used to examine the effect of variables on static balance ability. RESULTS In correlation analysis, plantarflexion ROM (AROM and PROM) and lower-extremity muscle strength (except hip extensor) were significantly correlated with postural sway (p<0.05). In simple correlation analysis, all variables that passed the correlation analysis procedure had significant influence (p<0.05). In multiple linear regression analysis, plantar flexion PROM with eyes open significantly influenced sway length (B=0.681) and sway velocity (B=0.011). CONCLUSIONS Lower-extremity muscle strength and ankle plantarflexion ROM influenced static balance control ability, with ankle plantarflexion PROM showing the greatest influence. Therefore, both contractile structures and non-contractile structures should be of interest when considering static balance control ability improvement.
Biostatistics Series Module 10: Brief Overview of Multivariate Methods.
Hazra, Avijit; Gogtay, Nithya
2017-01-01
Multivariate analysis refers to statistical techniques that simultaneously look at three or more variables in relation to the subjects under investigation with the aim of identifying or clarifying the relationships between them. These techniques have been broadly classified as dependence techniques, which explore the relationship between one or more dependent variables and their independent predictors, and interdependence techniques, that make no such distinction but treat all variables equally in a search for underlying relationships. Multiple linear regression models a situation where a single numerical dependent variable is to be predicted from multiple numerical independent variables. Logistic regression is used when the outcome variable is dichotomous in nature. The log-linear technique models count type of data and can be used to analyze cross-tabulations where more than two variables are included. Analysis of covariance is an extension of analysis of variance (ANOVA), in which an additional independent variable of interest, the covariate, is brought into the analysis. It tries to examine whether a difference persists after "controlling" for the effect of the covariate that can impact the numerical dependent variable of interest. Multivariate analysis of variance (MANOVA) is a multivariate extension of ANOVA used when multiple numerical dependent variables have to be incorporated in the analysis. Interdependence techniques are more commonly applied to psychometrics, social sciences and market research. Exploratory factor analysis and principal component analysis are related techniques that seek to extract from a larger number of metric variables, a smaller number of composite factors or components, which are linearly related to the original variables. Cluster analysis aims to identify, in a large number of cases, relatively homogeneous groups called clusters, without prior information about the groups. The calculation intensive nature of multivariate analysis has so far precluded most researchers from using these techniques routinely. The situation is now changing with wider availability, and increasing sophistication of statistical software and researchers should no longer shy away from exploring the applications of multivariate methods to real-life data sets.
NASA Astrophysics Data System (ADS)
Dalkilic, Turkan Erbay; Apaydin, Aysen
2009-11-01
In a regression analysis, it is assumed that the observations come from a single class in a data cluster and the simple functional relationship between the dependent and independent variables can be expressed using the general model; Y=f(X)+[epsilon]. However; a data cluster may consist of a combination of observations that have different distributions that are derived from different clusters. When faced with issues of estimating a regression model for fuzzy inputs that have been derived from different distributions, this regression model has been termed the [`]switching regression model' and it is expressed with . Here li indicates the class number of each independent variable and p is indicative of the number of independent variables [J.R. Jang, ANFIS: Adaptive-network-based fuzzy inference system, IEEE Transaction on Systems, Man and Cybernetics 23 (3) (1993) 665-685; M. Michel, Fuzzy clustering and switching regression models using ambiguity and distance rejects, Fuzzy Sets and Systems 122 (2001) 363-399; E.Q. Richard, A new approach to estimating switching regressions, Journal of the American Statistical Association 67 (338) (1972) 306-310]. In this study, adaptive networks have been used to construct a model that has been formed by gathering obtained models. There are methods that suggest the class numbers of independent variables heuristically. Alternatively, in defining the optimal class number of independent variables, the use of suggested validity criterion for fuzzy clustering has been aimed. In the case that independent variables have an exponential distribution, an algorithm has been suggested for defining the unknown parameter of the switching regression model and for obtaining the estimated values after obtaining an optimal membership function, which is suitable for exponential distribution.
Wang, Qingliang; Li, Xiaojie; Hu, Kunpeng; Zhao, Kun; Yang, Peisheng; Liu, Bo
2015-05-12
To explore the risk factors of portal hypertensive gastropathy (PHG) in patients with hepatitis B associated cirrhosis and establish a Logistic regression model of noninvasive prediction. The clinical data of 234 hospitalized patients with hepatitis B associated cirrhosis from March 2012 to March 2014 were analyzed retrospectively. The dependent variable was the occurrence of PHG while the independent variables were screened by binary Logistic analysis. Multivariate Logistic regression was used for further analysis of significant noninvasive independent variables. Logistic regression model was established and odds ratio was calculated for each factor. The accuracy, sensitivity and specificity of model were evaluated by the curve of receiver operating characteristic (ROC). According to univariate Logistic regression, the risk factors included hepatic dysfunction, albumin (ALB), bilirubin (TB), prothrombin time (PT), platelet (PLT), white blood cell (WBC), portal vein diameter, spleen index, splenic vein diameter, diameter ratio, PLT to spleen volume ratio, esophageal varices (EV) and gastric varices (GV). Multivariate analysis showed that hepatic dysfunction (X1), TB (X2), PLT (X3) and splenic vein diameter (X4) were the major occurring factors for PHG. The established regression model was Logit P=-2.667+2.186X1-2.167X2+0.725X3+0.976X4. The accuracy of model for PHG was 79.1% with a sensitivity of 77.2% and a specificity of 80.8%. Hepatic dysfunction, TB, PLT and splenic vein diameter are risk factors for PHG and the noninvasive predicted Logistic regression model was Logit P=-2.667+2.186X1-2.167X2+0.725X3+0.976X4.
Determinants of Crime in Virginia: An Empirical Analysis
ERIC Educational Resources Information Center
Ali, Abdiweli M.; Peek, Willam
2009-01-01
This paper is an empirical analysis of the determinants of crime in Virginia. Over a dozen explanatory variables that current literature suggests as important determinants of crime are collected. The data is from 1970 to 2000. These include economic, fiscal, demographic, political, and social variables. The regression results indicate that crime…
Determining Directional Dependency in Causal Associations
Pornprasertmanit, Sunthud; Little, Todd D.
2014-01-01
Directional dependency is a method to determine the likely causal direction of effect between two variables. This article aims to critique and improve upon the use of directional dependency as a technique to infer causal associations. We comment on several issues raised by von Eye and DeShon (2012), including: encouraging the use of the signs of skewness and excessive kurtosis of both variables, discouraging the use of D’Agostino’s K2, and encouraging the use of directional dependency to compare variables only within time points. We offer improved steps for determining directional dependency that fix the problems we note. Next, we discuss how to integrate directional dependency into longitudinal data analysis with two variables. We also examine the accuracy of directional dependency evaluations when several regression assumptions are violated. Directional dependency can suggest the direction of a relation if (a) the regression error in population is normal, (b) an unobserved explanatory variable correlates with any variables equal to or less than .2, (c) a curvilinear relation between both variables is not strong (standardized regression coefficient ≤ .2), (d) there are no bivariate outliers, and (e) both variables are continuous. PMID:24683282
NASA Astrophysics Data System (ADS)
Jakubowski, J.; Stypulkowski, J. B.; Bernardeau, F. G.
2017-12-01
The first phase of the Abu Hamour drainage and storm tunnel was completed in early 2017. The 9.5 km long, 3.7 m diameter tunnel was excavated with two Earth Pressure Balance (EPB) Tunnel Boring Machines from Herrenknecht. TBM operation processes were monitored and recorded by Data Acquisition and Evaluation System. The authors coupled collected TBM drive data with available information on rock mass properties, cleansed, completed with secondary variables and aggregated by weeks and shifts. Correlations and descriptive statistics charts were examined. Multivariate Linear Regression and CART regression tree models linking TBM penetration rate (PR), penetration per revolution (PPR) and field penetration index (FPI) with TBM operational and geotechnical characteristics were performed for the conditions of the weak/soft rock of Doha. Both regression methods are interpretable and the data were screened with different computational approaches allowing enriched insight. The primary goal of the analysis was to investigate empirical relations between multiple explanatory and responding variables, to search for best subsets of explanatory variables and to evaluate the strength of linear and non-linear relations. For each of the penetration indices, a predictive model coupling both regression methods was built and validated. The resultant models appeared to be stronger than constituent ones and indicated an opportunity for more accurate and robust TBM performance predictions.
Isolating the Effects of Training Using Simple Regression Analysis: An Example of the Procedure.
ERIC Educational Resources Information Center
Waugh, C. Keith
This paper provides a case example of simple regression analysis, a forecasting procedure used to isolate the effects of training from an identified extraneous variable. This case example focuses on results of a three-day sales training program to improve bank loan officers' knowledge, skill-level, and attitude regarding solicitation and sale of…
ERIC Educational Resources Information Center
Brabant, Marie-Eve; Hebert, Martine; Chagnon, Francois
2013-01-01
This study explored the clinical profiles of 77 female teenager survivors of sexual abuse and examined the association of abuse-related and personal variables with suicidal ideations. Analyses revealed that 64% of participants experienced suicidal ideations. Findings from classification and regression tree analysis indicated that depression,…
Air Leakage of US Homes: Regression Analysis and Improvements from Retrofit
DOE Office of Scientific and Technical Information (OSTI.GOV)
Chan, Wanyu R.; Joh, Jeffrey; Sherman, Max H.
2012-08-01
LBNL Residential Diagnostics Database (ResDB) contains blower door measurements and other diagnostic test results of homes in United States. Of these, approximately 134,000 single-family detached homes have sufficient information for the analysis of air leakage in relation to a number of housing characteristics. We performed regression analysis to consider the correlation between normalized leakage and a number of explanatory variables: IECC climate zone, floor area, height, year built, foundation type, duct location, and other characteristics. The regression model explains 68% of the observed variability in normalized leakage. ResDB also contains the before and after retrofit air leakage measurements of approximatelymore » 23,000 homes that participated in weatherization assistant programs (WAPs) or residential energy efficiency programs. The two types of programs achieve rather similar reductions in normalized leakage: 30% for WAPs and 20% for other energy programs.« less
The ice age cycle and the deglaciations: an application of nonlinear regression modelling
NASA Astrophysics Data System (ADS)
Dalgleish, A. N.; Boulton, G. S.; Renshaw, E.
2000-03-01
We have applied the nonlinear regression technique known as additivity and variance stabilisation (AVAS) to time series which reflect Earth's climate over the last 600 ka. AVAS estimates a smooth, nonlinear transform for each variable, under the assumption of an additive model. The Earth's orbital parameters and insolation variations have been used as regression variables. Analysis of the contribution of each variable shows that the deglaciations are characterised by periods of increasing obliquity and perihelion approaching the vernal equinox, but not by any systematic change in eccentricity. The magnitude of insolation changes also plays no role. By approximating the transforms we can obtain a future prediction, with a glacial maximum at 60 ka AP, and a subsequent obliquity and precession forced deglaciation.
Poisson Mixture Regression Models for Heart Disease Prediction.
Mufudza, Chipo; Erol, Hamza
2016-01-01
Early heart disease control can be achieved by high disease prediction and diagnosis efficiency. This paper focuses on the use of model based clustering techniques to predict and diagnose heart disease via Poisson mixture regression models. Analysis and application of Poisson mixture regression models is here addressed under two different classes: standard and concomitant variable mixture regression models. Results show that a two-component concomitant variable Poisson mixture regression model predicts heart disease better than both the standard Poisson mixture regression model and the ordinary general linear Poisson regression model due to its low Bayesian Information Criteria value. Furthermore, a Zero Inflated Poisson Mixture Regression model turned out to be the best model for heart prediction over all models as it both clusters individuals into high or low risk category and predicts rate to heart disease componentwise given clusters available. It is deduced that heart disease prediction can be effectively done by identifying the major risks componentwise using Poisson mixture regression model.
Poisson Mixture Regression Models for Heart Disease Prediction
Erol, Hamza
2016-01-01
Early heart disease control can be achieved by high disease prediction and diagnosis efficiency. This paper focuses on the use of model based clustering techniques to predict and diagnose heart disease via Poisson mixture regression models. Analysis and application of Poisson mixture regression models is here addressed under two different classes: standard and concomitant variable mixture regression models. Results show that a two-component concomitant variable Poisson mixture regression model predicts heart disease better than both the standard Poisson mixture regression model and the ordinary general linear Poisson regression model due to its low Bayesian Information Criteria value. Furthermore, a Zero Inflated Poisson Mixture Regression model turned out to be the best model for heart prediction over all models as it both clusters individuals into high or low risk category and predicts rate to heart disease componentwise given clusters available. It is deduced that heart disease prediction can be effectively done by identifying the major risks componentwise using Poisson mixture regression model. PMID:27999611
Regression Analysis of Optical Coherence Tomography Disc Variables for Glaucoma Diagnosis.
Richter, Grace M; Zhang, Xinbo; Tan, Ou; Francis, Brian A; Chopra, Vikas; Greenfield, David S; Varma, Rohit; Schuman, Joel S; Huang, David
2016-08-01
To report diagnostic accuracy of optical coherence tomography (OCT) disc variables using both time-domain (TD) and Fourier-domain (FD) OCT, and to improve the use of OCT disc variable measurements for glaucoma diagnosis through regression analyses that adjust for optic disc size and axial length-based magnification error. Observational, cross-sectional. In total, 180 normal eyes of 112 participants and 180 eyes of 138 participants with perimetric glaucoma from the Advanced Imaging for Glaucoma Study. Diagnostic variables evaluated from TD-OCT and FD-OCT were: disc area, rim area, rim volume, optic nerve head volume, vertical cup-to-disc ratio (CDR), and horizontal CDR. These were compared with overall retinal nerve fiber layer thickness and ganglion cell complex. Regression analyses were performed that corrected for optic disc size and axial length. Area-under-receiver-operating curves (AUROC) were used to assess diagnostic accuracy before and after the adjustments. An index based on multiple logistic regression that combined optic disc variables with axial length was also explored with the aim of improving diagnostic accuracy of disc variables. Comparison of diagnostic accuracy of disc variables, as measured by AUROC. The unadjusted disc variables with the highest diagnostic accuracies were: rim volume for TD-OCT (AUROC=0.864) and vertical CDR (AUROC=0.874) for FD-OCT. Magnification correction significantly worsened diagnostic accuracy for rim variables, and while optic disc size adjustments partially restored diagnostic accuracy, the adjusted AUROCs were still lower. Axial length adjustments to disc variables in the form of multiple logistic regression indices led to a slight but insignificant improvement in diagnostic accuracy. Our various regression approaches were not able to significantly improve disc-based OCT glaucoma diagnosis. However, disc rim area and vertical CDR had very high diagnostic accuracy, and these disc variables can serve to complement additional OCT measurements for diagnosis of glaucoma.
Miozzo, Michele; Pulvermüller, Friedemann; Hauk, Olaf
2015-01-01
The time course of brain activation during word production has become an area of increasingly intense investigation in cognitive neuroscience. The predominant view has been that semantic and phonological processes are activated sequentially, at about 150 and 200–400 ms after picture onset. Although evidence from prior studies has been interpreted as supporting this view, these studies were arguably not ideally suited to detect early brain activation of semantic and phonological processes. We here used a multiple linear regression approach to magnetoencephalography (MEG) analysis of picture naming in order to investigate early effects of variables specifically related to visual, semantic, and phonological processing. This was combined with distributed minimum-norm source estimation and region-of-interest analysis. Brain activation associated with visual image complexity appeared in occipital cortex at about 100 ms after picture presentation onset. At about 150 ms, semantic variables became physiologically manifest in left frontotemporal regions. In the same latency range, we found an effect of phonological variables in the left middle temporal gyrus. Our results demonstrate that multiple linear regression analysis is sensitive to early effects of multiple psycholinguistic variables in picture naming. Crucially, our results suggest that access to phonological information might begin in parallel with semantic processing around 150 ms after picture onset. PMID:25005037
Spatial regression analysis on 32 years of total column ozone data
NASA Astrophysics Data System (ADS)
Knibbe, J. S.; van der A, R. J.; de Laat, A. T. J.
2014-08-01
Multiple-regression analyses have been performed on 32 years of total ozone column data that was spatially gridded with a 1 × 1.5° resolution. The total ozone data consist of the MSR (Multi Sensor Reanalysis; 1979-2008) and 2 years of assimilated SCIAMACHY (SCanning Imaging Absorption spectroMeter for Atmospheric CHartographY) ozone data (2009-2010). The two-dimensionality in this data set allows us to perform the regressions locally and investigate spatial patterns of regression coefficients and their explanatory power. Seasonal dependencies of ozone on regressors are included in the analysis. A new physically oriented model is developed to parameterize stratospheric ozone. Ozone variations on nonseasonal timescales are parameterized by explanatory variables describing the solar cycle, stratospheric aerosols, the quasi-biennial oscillation (QBO), El Niño-Southern Oscillation (ENSO) and stratospheric alternative halogens which are parameterized by the effective equivalent stratospheric chlorine (EESC). For several explanatory variables, seasonally adjusted versions of these explanatory variables are constructed to account for the difference in their effect on ozone throughout the year. To account for seasonal variation in ozone, explanatory variables describing the polar vortex, geopotential height, potential vorticity and average day length are included. Results of this regression model are compared to that of a similar analysis based on a more commonly applied statistically oriented model. The physically oriented model provides spatial patterns in the regression results for each explanatory variable. The EESC has a significant depleting effect on ozone at mid- and high latitudes, the solar cycle affects ozone positively mostly in the Southern Hemisphere, stratospheric aerosols affect ozone negatively at high northern latitudes, the effect of QBO is positive and negative in the tropics and mid- to high latitudes, respectively, and ENSO affects ozone negatively between 30° N and 30° S, particularly over the Pacific. The contribution of explanatory variables describing seasonal ozone variation is generally large at mid- to high latitudes. We observe ozone increases with potential vorticity and day length and ozone decreases with geopotential height and variable ozone effects due to the polar vortex in regions to the north and south of the polar vortices. Recovery of ozone is identified globally. However, recovery rates and uncertainties strongly depend on choices that can be made in defining the explanatory variables. The application of several trend models, each with their own pros and cons, yields a large range of recovery rate estimates. Overall these results suggest that care has to be taken in determining ozone recovery rates, in particular for the Antarctic ozone hole.
Vocational Teacher Stress and the Educational System.
ERIC Educational Resources Information Center
Adams, Elaine; Heath-Camp, Betty; Camp, William G.
1999-01-01
A multiple regression analysis of data from 235 secondary vocational teachers in Virginia found that educational system-related variables explained most teacher stress. The most important explanatory variables were task stress and role overload. (SK)
2015-01-01
different PRBC transfusion volumes. We performed multivariate regression analysis using HRV metrics and routine vital signs to test the hypothesis that...study sponsors did not have any role in the study design, data collection, analysis and interpretation of data, report writing, or the decision to...primary outcome was hemorrhagic injury plus different PRBC transfusion volumes. We performed multivariate regression analysis using HRV metrics and
Flora, David B.; LaBrish, Cathy; Chalmers, R. Philip
2011-01-01
We provide a basic review of the data screening and assumption testing issues relevant to exploratory and confirmatory factor analysis along with practical advice for conducting analyses that are sensitive to these concerns. Historically, factor analysis was developed for explaining the relationships among many continuous test scores, which led to the expression of the common factor model as a multivariate linear regression model with observed, continuous variables serving as dependent variables, and unobserved factors as the independent, explanatory variables. Thus, we begin our paper with a review of the assumptions for the common factor model and data screening issues as they pertain to the factor analysis of continuous observed variables. In particular, we describe how principles from regression diagnostics also apply to factor analysis. Next, because modern applications of factor analysis frequently involve the analysis of the individual items from a single test or questionnaire, an important focus of this paper is the factor analysis of items. Although the traditional linear factor model is well-suited to the analysis of continuously distributed variables, commonly used item types, including Likert-type items, almost always produce dichotomous or ordered categorical variables. We describe how relationships among such items are often not well described by product-moment correlations, which has clear ramifications for the traditional linear factor analysis. An alternative, non-linear factor analysis using polychoric correlations has become more readily available to applied researchers and thus more popular. Consequently, we also review the assumptions and data-screening issues involved in this method. Throughout the paper, we demonstrate these procedures using an historic data set of nine cognitive ability variables. PMID:22403561
Smith, David V; Utevsky, Amanda V; Bland, Amy R; Clement, Nathan; Clithero, John A; Harsch, Anne E W; McKell Carter, R; Huettel, Scott A
2014-07-15
A central challenge for neuroscience lies in relating inter-individual variability to the functional properties of specific brain regions. Yet, considerable variability exists in the connectivity patterns between different brain areas, potentially producing reliable group differences. Using sex differences as a motivating example, we examined two separate resting-state datasets comprising a total of 188 human participants. Both datasets were decomposed into resting-state networks (RSNs) using a probabilistic spatial independent component analysis (ICA). We estimated voxel-wise functional connectivity with these networks using a dual-regression analysis, which characterizes the participant-level spatiotemporal dynamics of each network while controlling for (via multiple regression) the influence of other networks and sources of variability. We found that males and females exhibit distinct patterns of connectivity with multiple RSNs, including both visual and auditory networks and the right frontal-parietal network. These results replicated across both datasets and were not explained by differences in head motion, data quality, brain volume, cortisol levels, or testosterone levels. Importantly, we also demonstrate that dual-regression functional connectivity is better at detecting inter-individual variability than traditional seed-based functional connectivity approaches. Our findings characterize robust-yet frequently ignored-neural differences between males and females, pointing to the necessity of controlling for sex in neuroscience studies of individual differences. Moreover, our results highlight the importance of employing network-based models to study variability in functional connectivity. Copyright © 2014 Elsevier Inc. All rights reserved.
Age estimation using pulp/tooth area ratio in maxillary canines-A digital image analysis.
Juneja, Manjushree; Devi, Yashoda B K; Rakesh, N; Juneja, Saurabh
2014-09-01
Determination of age of a subject is one of the most important aspects of medico-legal cases and anthropological research. Radiographs can be used to indirectly measure the rate of secondary dentine deposition which is depicted by reduction in the pulp area. In this study, 200 patients of Karnataka aged between 18-72 years were selected for the study. Panoramic radiographs were made and indirectly digitized. Radiographic images of maxillary canines (RIC) were processed using a computer-aided drafting program (ImageJ). The variables pulp/root length (p), pulp/tooth length (r), pulp/root width at enamel-cementum junction (ECJ) level (a), pulp/root width at mid-root level (c), pulp/root width at midpoint level between ECJ level and mid-root level (b) and pulp/tooth area ratio (AR) were recorded. All the morphological variables including gender were statistically analyzed to derive regression equation for estimation of age. It was observed that 2 variables 'AR' and 'b' contributed significantly to the fit and were included in the regression model, yielding the formula: Age = 87.305-480.455(AR)+48.108(b). Statistical analysis indicated that the regression equation with selected variables explained 96% of total variance with the median of the residuals of 0.1614 years and standard error of estimate of 3.0186 years. There is significant correlation between age and morphological variables 'AR' and 'b' and the derived population specific regression equation can be potentially used for estimation of chronological age of individuals of Karnataka origin.
ERIC Educational Resources Information Center
Galaz-Fontes, Jesus Francisco; Gil-Anton, Manuel
This study examined overall job satisfaction among college faculty in Mexico. The study used data from a 1992-93 Carnegie International Faculty Survey. Secondary multiple regression analysis identified predictor variables for several faculty subgroups. Results were interpreted by differentiating between work-related and intrinsic factors, as well…
The contextual effects of social capital on health: a cross-national instrumental variable analysis.
Kim, Daniel; Baum, Christopher F; Ganz, Michael L; Subramanian, S V; Kawachi, Ichiro
2011-12-01
Past research on the associations between area-level/contextual social capital and health has produced conflicting evidence. However, interpreting this rapidly growing literature is difficult because estimates using conventional regression are prone to major sources of bias including residual confounding and reverse causation. Instrumental variable (IV) analysis can reduce such bias. Using data on up to 167,344 adults in 64 nations in the European and World Values Surveys and applying IV and ordinary least squares (OLS) regression, we estimated the contextual effects of country-level social trust on individual self-rated health. We further explored whether these associations varied by gender and individual levels of trust. Using OLS regression, we found higher average country-level trust to be associated with better self-rated health in both women and men. Instrumental variable analysis yielded qualitatively similar results, although the estimates were more than double in size in both sexes when country population density and corruption were used as instruments. The estimated health effects of raising the percentage of a country's population that trusts others by 10 percentage points were at least as large as the estimated health effects of an individual developing trust in others. These findings were robust to alternative model specifications and instruments. Conventional regression and to a lesser extent IV analysis suggested that these associations are more salient in women and in women reporting social trust. In a large cross-national study, our findings, including those using instrumental variables, support the presence of beneficial effects of higher country-level trust on self-rated health. Previous findings for contextual social capital using traditional regression may have underestimated the true associations. Given the close linkages between self-rated health and all-cause mortality, the public health gains from raising social capital within and across countries may be large. Copyright © 2011 Elsevier Ltd. All rights reserved.
The contextual effects of social capital on health: a cross-national instrumental variable analysis
Kim, Daniel; Baum, Christopher F; Ganz, Michael; Subramanian, S V; Kawachi, Ichiro
2011-01-01
Past observational studies of the associations of area-level/contextual social capital with health have revealed conflicting findings. However, interpreting this rapidly growing literature is difficult because estimates using conventional regression are prone to major sources of bias including residual confounding and reverse causation. Instrumental variable (IV) analysis can reduce such bias. Using data on up to 167 344 adults in 64 nations in the European and World Values Surveys and applying IV and ordinary least squares (OLS) regression, we estimated the contextual effects of country-level social trust on individual self-rated health. We further explored whether these associations varied by gender and individual levels of trust. Using OLS regression, we found higher average country-level trust to be associated with better self-rated health in both women and men. Instrumental variable analysis yielded qualitatively similar results, although the estimates were more than double in size in women and men using country population density and corruption as instruments. The estimated health effects of raising the percentage of a country's population that trusts others by 10 percentage points were at least as large as the estimated health effects of an individual developing trust in others. These findings were robust to alternative model specifications and instruments. Conventional regression and to a lesser extent IV analysis suggested that these associations are more salient in women and in women reporting social trust. In a large cross-national study, our findings, including those using instrumental variables, support the presence of beneficial effects of higher country-level trust on self-rated health. Past findings for contextual social capital using traditional regression may have underestimated the true associations. Given the close linkages between self-rated health and all-cause mortality, the public health gains from raising social capital within countries may be large. PMID:22078106
Barnwell-Ménard, Jean-Louis; Li, Qing; Cohen, Alan A
2015-03-15
The loss of signal associated with categorizing a continuous variable is well known, and previous studies have demonstrated that this can lead to an inflation of Type-I error when the categorized variable is a confounder in a regression analysis estimating the effect of an exposure on an outcome. However, it is not known how the Type-I error may vary under different circumstances, including logistic versus linear regression, different distributions of the confounder, and different categorization methods. Here, we analytically quantified the effect of categorization and then performed a series of 9600 Monte Carlo simulations to estimate the Type-I error inflation associated with categorization of a confounder under different regression scenarios. We show that Type-I error is unacceptably high (>10% in most scenarios and often 100%). The only exception was when the variable categorized was a continuous mixture proxy for a genuinely dichotomous latent variable, where both the continuous proxy and the categorized variable are error-ridden proxies for the dichotomous latent variable. As expected, error inflation was also higher with larger sample size, fewer categories, and stronger associations between the confounder and the exposure or outcome. We provide online tools that can help researchers estimate the potential error inflation and understand how serious a problem this is. Copyright © 2014 John Wiley & Sons, Ltd.
A Computer Program for Preliminary Data Analysis
Dennis L. Schweitzer
1967-01-01
ABSTRACT. -- A computer program written in FORTRAN has been designed to summarize data. Class frequencies, means, and standard deviations are printed for as many as 100 independent variables. Cross-classifications of an observed dependent variable and of a dependent variable predicted by a multiple regression equation can also be generated.
NASA Astrophysics Data System (ADS)
Maguen, Ezra I.; Papaioannou, Thanassis; Nesburn, Anthony B.; Salz, James J.; Warren, Cathy; Grundfest, Warren S.
1996-05-01
Multivariable regression analysis was used to evaluate the combined effects of some preoperative and operative variables on the change of refraction following excimer laser photorefractive keratectomy for myopia (PRK). This analysis was performed on 152 eyes (at 6 months postoperatively) and 156 eyes (at 12 months postoperatively). The following variables were considered: intended refractive correction, patient age, treatment zone, central corneal thickness, average corneal curvature, and intraocular pressure. At 6 months after surgery, the cumulative R2 was 0.43 with 0.38 attributed to the intended correction and 0.06 attributed to the preoperative corneal curvature. At 12 months, the cumulative R2 was 0.37 where 0.33 was attributed to the intended correction, 0.02 to the preoperative corneal curvature, and 0.01 to both preoperative corneal thickness and to the patient age. Further model augmentation is necessary to account for the remaining variability and the behavior of the residuals.
Oliver, A; Mendizabal, J A; Ripoll, G; Albertí, P; Purroy, A
2010-04-01
The SEUROP system is currently in use for carcass classification in Europe. Image analysis and other new technologies are being developed to enhance and supplement this classification system. After slaughtering, 91 carcasses of local Spanish beef breeds were weighed and classified according to the SEUROP system. Two digital photographs (a side and a dorsal view) were taken of the left carcass sides, and a total of 33 morphometric measurements (lengths, perimeters, areas) were made. Commercial butchering of these carcasses took place 24 h postmortem, and the different cuts were grouped according to four commercial meat cut quality categories: extra, first, second, and third. Multiple regression analysis of carcass weight and the SEUROP conformation score (x variables) on meat yield and the four commercial cut quality category yields (y variables) was performed as a measure of the accuracy of the SEUROP system. Stepwise regression analysis of carcass weight and the 33 morphometric image analysis measurements (x variables) and meat yield and yields of the four commercial cut quality categories (y variables) was carried out. Higher accuracy was achieved using image analysis than using only the current SEUROP conformation score. The regression coefficient values were between R(2)=0.66 and R(2)=0.93 (P<0.001) for the SEUROP system and between R(2)=0.81 and R(2)=0.94 (P<0.001) for the image analysis method. These results suggest that the image analysis method should be helpful as a means of supplementing and enhancing the SEUROP system for grading beef carcasses. 2009 Elsevier Ltd. All rights reserved.
Rębacz-Maron, Ewa; Parafiniuk, Mirosław
2014-01-01
The aim of this paper was to examine the extent to which socioeconomic factors, anthropological data and somatic indices influenced the results of spirometric measurements (FEV1 and FVC) in Tanzanian youth. The population studied were young black Bantu men aged 12.8-24.0 years. Analysis was performed for the whole data set (n = 255), as well as separately for two age groups: under 17.5 years (n = 168) and 17.5 + (n = 87). A backward stepwise multiple regression analysis was performed for FEV1 and FVC as dependent variables on socioeconomic and anthropometric data. Multiple regression analysis for the whole group revealed that the socioeconomic and anthropometric data under analysis accounted for 38% of the variation in FEV1. In addition the analysis demonstrated that 34% of the variation in FVC could be accounted for by the variables used in the regression. A significant impact in explaining the variability of FVC was exhibited by the thorax mobility, financial situation of the participants and Pignet-Verwaecka Index. Analysis of the data indicates the significant role of selected socio-economic factors on the development of the biological specimens investigated. There were no perceptible pathologies, and the results can be treated as a credible interpretation of the influence exerted by the environment in which the teenagers under study grew up.
NASA Astrophysics Data System (ADS)
Neiles, Kelly Y.
There is great concern in the scientific community that students in the United States, when compared with other countries, are falling behind in their scientific achievement. Increasing students' reading comprehension of scientific text may be one of the components involved in students' science achievement. To investigate students' reading comprehension this quantitative study examined the effects of different reader characteristics, namely, students' logical reasoning ability, factual chemistry knowledge, working memory capacity, and schema of the chemistry concepts, on reading comprehension of a chemistry text. Students' reading comprehension was measured through their ability to encode the text, access the meanings of words (lexical access), make bridging and elaborative inferences, and integrate the text with their existing schemas to make a lasting mental representation of the text (situational model). Students completed a series of tasks that measured the reader characteristic and reading comprehension variables. Some of the variables were measured using new technologies and software to investigate different cognitive processes. These technologies and software included eye tracking to investigate students' lexical accessing and a Pathfinder program to investigate students' schema of the chemistry concepts. The results from this study were analyzed using canonical correlation and regression analysis. The canonical correlation analysis allows for the ten variables described previously to be included in one multivariate analysis. Results indicate that the relationship between the reader characteristic variables and the reading comprehension variables is significant. The resulting canonical function accounts for a greater amount of variance in students' responses then any individual variable. Regression analysis was used to further investigate which reader characteristic variables accounted for the differences in students' responses for each reading comprehension variable. The results from this regression analysis indicated that the two schema measures (measured by the Pathfinder program) accounted for the greatest amount of variance in four of the reading comprehension variables (encoding the text, bridging and elaborative inferences, and delayed recall of a general summary). This research suggest that providing students with background information on chemistry concepts prior to having them read the text may result in better understanding and more effective incorporation of the chemistry concepts into their schema.
Pease, J M; Morselli, M F
1987-01-01
This paper deals with a computer program adapted to a statistical method for analyzing an unlimited quantity of binary recorded data of an independent circular variable (e.g. wind direction), and a linear variable (e.g. maple sap flow volume). Circular variables cannot be statistically analyzed with linear methods, unless they have been transformed. The program calculates a critical quantity, the acrophase angle (PHI, phi o). The technique is adapted from original mathematics [1] and is written in Fortran 77 for easier conversion between computer networks. Correlation analysis can be performed following the program or regression which, because of the circular nature of the independent variable, becomes periodic regression. The technique was tested on a file of approximately 4050 data pairs.
Yampolskaya, Yulia A
2005-07-01
This study is concerned with long-term anthropometric examinations of children and adolescents aged 3-17 years in Moscow (over 10,500 persons, longitudinal and cross-sectional). Population variability of physical development was analyzed by means of regional estimation tables, which were developed on the basis of a regression analysis (scale of the regression of body mass to body length within a range from M - 1sigmaR to M + 2 sigmaR) and used for individual and group diagnostics taking into account age and sex. Such an approach allowed for the determination of the dynamics of the variability of Moscow schoolchildren from decade to decade (inter-population variability) and variations due to social differences (intra-population variability).
NASA Astrophysics Data System (ADS)
Peterson, K. T.; Wulamu, A.
2017-12-01
Water, essential to all living organisms, is one of the Earth's most precious resources. Remote sensing offers an ideal approach to monitor water quality over traditional in-situ techniques that are highly time and resource consuming. Utilizing a multi-scale approach, incorporating data from handheld spectroscopy, UAS based hyperspectal, and satellite multispectral images were collected in coordination with in-situ water quality samples for the two midwestern watersheds. The remote sensing data was modeled and correlated to the in-situ water quality variables including chlorophyll content (Chl), turbidity, and total dissolved solids (TDS) using Normalized Difference Spectral Indices (NDSI) and Partial Least Squares Regression (PLSR). The results of the study supported the original hypothesis that correlating water quality variables with remotely sensed data benefits greatly from the use of more complex modeling and regression techniques such as PLSR. The final results generated from the PLSR analysis resulted in much higher R2 values for all variables when compared to NDSI. The combination of NDSI and PLSR analysis also identified key wavelengths for identification that aligned with previous study's findings. This research displays the advantages and future for complex modeling and machine learning techniques to improve water quality variable estimation from spectral data.
Bias due to two-stage residual-outcome regression analysis in genetic association studies.
Demissie, Serkalem; Cupples, L Adrienne
2011-11-01
Association studies of risk factors and complex diseases require careful assessment of potential confounding factors. Two-stage regression analysis, sometimes referred to as residual- or adjusted-outcome analysis, has been increasingly used in association studies of single nucleotide polymorphisms (SNPs) and quantitative traits. In this analysis, first, a residual-outcome is calculated from a regression of the outcome variable on covariates and then the relationship between the adjusted-outcome and the SNP is evaluated by a simple linear regression of the adjusted-outcome on the SNP. In this article, we examine the performance of this two-stage analysis as compared with multiple linear regression (MLR) analysis. Our findings show that when a SNP and a covariate are correlated, the two-stage approach results in biased genotypic effect and loss of power. Bias is always toward the null and increases with the squared-correlation between the SNP and the covariate (). For example, for , 0.1, and 0.5, two-stage analysis results in, respectively, 0, 10, and 50% attenuation in the SNP effect. As expected, MLR was always unbiased. Since individual SNPs often show little or no correlation with covariates, a two-stage analysis is expected to perform as well as MLR in many genetic studies; however, it produces considerably different results from MLR and may lead to incorrect conclusions when independent variables are highly correlated. While a useful alternative to MLR under , the two -stage approach has serious limitations. Its use as a simple substitute for MLR should be avoided. © 2011 Wiley Periodicals, Inc.
Schultz, K K; Bennett, T B; Nordlund, K V; Döpfer, D; Cook, N B
2016-09-01
Transition cow management has been tracked via the Transition Cow Index (TCI; AgSource Cooperative Services, Verona, WI) since 2006. Transition Cow Index was developed to measure the difference between actual and predicted milk yield at first test day to evaluate the relative success of the transition period program. This project aimed to assess TCI in relation to all commonly used Dairy Herd Improvement (DHI) metrics available through AgSource Cooperative Services. Regression analysis was used to isolate variables that were relevant to TCI, and then principal components analysis and network analysis were used to determine the relative strength and relatedness among variables. Finally, cluster analysis was used to segregate herds based on similarity of relevant variables. The DHI data were obtained from 2,131 Wisconsin dairy herds with test-day mean ≥30 cows, which were tested ≥10 times throughout the 2014 calendar year. The original list of 940 DHI variables was reduced through expert-driven selection and regression analysis to 23 variables. The K-means cluster analysis produced 5 distinct clusters. Descriptive statistics were calculated for the 23 variables per cluster grouping. Using principal components analysis, cluster analysis, and network analysis, 4 parameters were isolated as most relevant to TCI; these were energy-corrected milk, 3 measures of intramammary infection (dry cow cure rate, linear somatic cell count score in primiparous cows, and new infection rate), peak ratio, and days in milk at peak milk production. These variables together with cow and newborn calf survival measures form a group of metrics that can be used to assist in the evaluation of overall transition period performance. Copyright © 2016 American Dairy Science Association. Published by Elsevier Inc. All rights reserved.
Procedures for adjusting regional regression models of urban-runoff quality using local data
Hoos, A.B.; Sisolak, J.K.
1993-01-01
Statistical operations termed model-adjustment procedures (MAP?s) can be used to incorporate local data into existing regression models to improve the prediction of urban-runoff quality. Each MAP is a form of regression analysis in which the local data base is used as a calibration data set. Regression coefficients are determined from the local data base, and the resulting `adjusted? regression models can then be used to predict storm-runoff quality at unmonitored sites. The response variable in the regression analyses is the observed load or mean concentration of a constituent in storm runoff for a single storm. The set of explanatory variables used in the regression analyses is different for each MAP, but always includes the predicted value of load or mean concentration from a regional regression model. The four MAP?s examined in this study were: single-factor regression against the regional model prediction, P, (termed MAP-lF-P), regression against P,, (termed MAP-R-P), regression against P, and additional local variables (termed MAP-R-P+nV), and a weighted combination of P, and a local-regression prediction (termed MAP-W). The procedures were tested by means of split-sample analysis, using data from three cities included in the Nationwide Urban Runoff Program: Denver, Colorado; Bellevue, Washington; and Knoxville, Tennessee. The MAP that provided the greatest predictive accuracy for the verification data set differed among the three test data bases and among model types (MAP-W for Denver and Knoxville, MAP-lF-P and MAP-R-P for Bellevue load models, and MAP-R-P+nV for Bellevue concentration models) and, in many cases, was not clearly indicated by the values of standard error of estimate for the calibration data set. A scheme to guide MAP selection, based on exploratory data analysis of the calibration data set, is presented and tested. The MAP?s were tested for sensitivity to the size of a calibration data set. As expected, predictive accuracy of all MAP?s for the verification data set decreased as the calibration data-set size decreased, but predictive accuracy was not as sensitive for the MAP?s as it was for the local regression models.
NASA Astrophysics Data System (ADS)
Mansouri, Edris; Feizi, Faranak; Jafari Rad, Alireza; Arian, Mehran
2018-03-01
This paper uses multivariate regression to create a mathematical model for iron skarn exploration in the Sarvian area, central Iran, using multivariate regression for mineral prospectivity mapping (MPM). The main target of this paper is to apply multivariate regression analysis (as an MPM method) to map iron outcrops in the northeastern part of the study area in order to discover new iron deposits in other parts of the study area. Two types of multivariate regression models using two linear equations were employed to discover new mineral deposits. This method is one of the reliable methods for processing satellite images. ASTER satellite images (14 bands) were used as unique independent variables (UIVs), and iron outcrops were mapped as dependent variables for MPM. According to the results of the probability value (p value), coefficient of determination value (R2) and adjusted determination coefficient (Radj2), the second regression model (which consistent of multiple UIVs) fitted better than other models. The accuracy of the model was confirmed by iron outcrops map and geological observation. Based on field observation, iron mineralization occurs at the contact of limestone and intrusive rocks (skarn type).
NASA Astrophysics Data System (ADS)
Bhattacharyya, Sidhakam; Bandyopadhyay, Gautam
2010-10-01
The council of most of the Urban Local Bodies (ULBs) has a limited scope for decision making in the absence of appropriate financial control mechanism. The information about expected amount of own fund during a particular period is of great importance for decision making. Therefore, in this paper, efforts are being made to present set of findings and to establish a model of estimating receipts of own sources and payments thereof using multiple regression analysis. Data for sixty months from a reputed ULB in West Bengal have been considered for ascertaining the regression models. This can be used as a part of financial management and control procedure by the council to estimate the effect on own fund. In our study we have considered two models using multiple regression analysis. "Model I" comprises of total adjusted receipt as the dependent variable and selected individual receipts as the independent variables. Similarly "Model II" consists of total adjusted payments as the dependent variable and selected individual payments as independent variables. The resultant of Model I and Model II is the surplus or deficit effecting own fund. This may be applied for decision making purpose by the council.
DOE Office of Scientific and Technical Information (OSTI.GOV)
Temple, P.J.; Mutters, R.J.; Adams, C.
1995-06-01
Biomass sampling plots were established at 29 locations within the dominant vegetation zones of the study area. Estimates of foliar biomass were made for each plot by three independent methods: regression analysis on the basis of tree diameter, calculation of the amount of light intercepted by the leaf canopy, and extrapolation from branch leaf area. Multivariate regression analysis was used to relate these foliar biomass estimates for oak plots and conifer plots to several independent predictor variables, including elevation, slope, aspect, temperature, precipitation, and soil chemical characteristics.
A single determinant dominates the rate of yeast protein evolution.
Drummond, D Allan; Raval, Alpan; Wilke, Claus O
2006-02-01
A gene's rate of sequence evolution is among the most fundamental evolutionary quantities in common use, but what determines evolutionary rates has remained unclear. Here, we carry out the first combined analysis of seven predictors (gene expression level, dispensability, protein abundance, codon adaptation index, gene length, number of protein-protein interactions, and the gene's centrality in the interaction network) previously reported to have independent influences on protein evolutionary rates. Strikingly, our analysis reveals a single dominant variable linked to the number of translation events which explains 40-fold more variation in evolutionary rate than any other, suggesting that protein evolutionary rate has a single major determinant among the seven predictors. The dominant variable explains nearly half the variation in the rate of synonymous and protein evolution. We show that the two most commonly used methods to disentangle the determinants of evolutionary rate, partial correlation analysis and ordinary multivariate regression, produce misleading or spurious results when applied to noisy biological data. We overcome these difficulties by employing principal component regression, a multivariate regression of evolutionary rate against the principal components of the predictor variables. Our results support the hypothesis that translational selection governs the rate of synonymous and protein sequence evolution in yeast.
Robust analysis of trends in noisy tokamak confinement data using geodesic least squares regression
DOE Office of Scientific and Technical Information (OSTI.GOV)
Verdoolaege, G., E-mail: geert.verdoolaege@ugent.be; Laboratory for Plasma Physics, Royal Military Academy, B-1000 Brussels; Shabbir, A.
Regression analysis is a very common activity in fusion science for unveiling trends and parametric dependencies, but it can be a difficult matter. We have recently developed the method of geodesic least squares (GLS) regression that is able to handle errors in all variables, is robust against data outliers and uncertainty in the regression model, and can be used with arbitrary distribution models and regression functions. We here report on first results of application of GLS to estimation of the multi-machine scaling law for the energy confinement time in tokamaks, demonstrating improved consistency of the GLS results compared to standardmore » least squares.« less
Multiple linear regression models are often used to predict levels of fecal indicator bacteria (FIB) in recreational swimming waters based on independent variables (IVs) such as meteorologic, hydrodynamic, and water-quality measures. The IVs used for these analyses are traditiona...
Follow-Up Imaging of Inflammatory Myofibroblastic Tumor of the Uterus and Its Spontaneous Regression
Markovic Vasiljkovic, Biljana; Plesinac Karapandzic, Vesna; Pejcic, Tomislav; Djuric Stefanovic, Aleksandra; Milosevic, Zorica; Plesinac, Snezana
2016-01-01
Inflammatory myofibroblastic tumor (IMT) is an aggressive benign mass that may arise from various tissues and organs with a great variability of histological and clinical appearances. Due to variable and nonspecific imaging findings, diagnosis of IMT is not obtained before surgery. The aim of this paper is to present CT and MRI findings during four-year follow-up of complete, spontaneous regression of IMT of the uterus. The diagnosis was made by histology and immunohistochemistry analysis of the open excisional biopsy specimen. At that time, the organ of origin was not specified. After analysis of the follow-up imaging findings and the mode of tumor regression, the uterus was proclaimed as the probable site of origin. IMT of the uterus is extremely rare and has been reported in ten cases up to now. The gradual, complete regression of uterine IMT documented by CT and MRI may contribute to understanding of its nature. PMID:27110328
Lapolla, Annunziata; Piarulli, Francesco; Sartore, Giovanni; Ceriello, Antonio; Ragazzi, Eugenio; Reitano, Rachele; Baccarin, Lorenzo; Laverda, Barbara; Fedele, Domenico
2007-03-01
Advanced glycation end products (AGEs), pentosidine and malondialdehyde (MDA), are elevated in type 2 diabetic subjects with coronary and carotid angiopathy. We investigated the relationship of AGEs, MDA, total reactive antioxidant potentials (TRAPs), and vitamin E in type 2 diabetic patients with and without peripheral artery disease (PAD). AGEs, pentosidine, MDA, TRAP, vitamin E, and ankle-brachial index (ABI) were measured in 99 consecutive type 2 diabetic subjects and 20 control subjects. AGEs, pentosidine, and MDA were higher and vitamin E and TRAP were lower in patients with PAD (ABI <0.9) than in patients without PAD (ABI >0.9) (P < 0.001). After multiple regression analysis, a correlation between AGEs and pentosidine, as independent variables, and ABI, as the dependent variable, was found in both patients with and without PAD (r = 0.9198, P < 0.001 and r = 0.5764, P < 0.001, respectively) but not in control subjects. When individual regression coefficients were evaluated, only that due to pentosidine was confirmed as significant. For patients with PAD, considering TRAP, vitamin E, and MDA as independent variables and ABI as the dependent variable produced an overall significant regression (r = 0.6913, P < 0.001). The regression coefficients for TRAP and vitamin E were not significant, indicating that the model is best explained by a single linear regression between MDA and ABI. These findings were also confirmed by principal component analysis. Results show that pentosidine and MDA are strongly associated with PAD in type 2 diabetic patients.
Sternberg, Maya R; Schleicher, Rosemary L; Pfeiffer, Christine M
2013-06-01
The collection of articles in this supplement issue provides insight into the association of various covariates with concentrations of biochemical indicators of diet and nutrition (biomarkers), beyond age, race, and sex, using linear regression. We studied 10 specific sociodemographic and lifestyle covariates in combination with 29 biomarkers from NHANES 2003-2006 for persons aged ≥ 20 y. The covariates were organized into 2 sets or "chunks": sociodemographic (age, sex, race-ethnicity, education, and income) and lifestyle (dietary supplement use, smoking, alcohol consumption, BMI, and physical activity) and fit in hierarchical fashion by using each category or set of related variables to determine how covariates, jointly, are related to biomarker concentrations. In contrast to many regression modeling applications, all variables were retained in a full regression model regardless of significance to preserve the interpretation of the statistical properties of β coefficients, P values, and CIs and to keep the interpretation consistent across a set of biomarkers. The variables were preselected before data analysis, and the data analysis plan was designed at the outset to minimize the reporting of false-positive findings by limiting the amount of preliminary hypothesis testing. Although we generally found that demographic differences seen in biomarkers were over- or underestimated when ignoring other key covariates, the demographic differences generally remained significant after adjusting for sociodemographic and lifestyle variables. These articles are intended to provide a foundation to researchers to help them generate hypotheses for future studies or data analyses and/or develop predictive regression models using the wealth of NHANES data.
Lewis, Jason M.
2010-01-01
Peak-streamflow regression equations were determined for estimating flows with exceedance probabilities from 50 to 0.2 percent for the state of Oklahoma. These regression equations incorporate basin characteristics to estimate peak-streamflow magnitude and frequency throughout the state by use of a generalized least squares regression analysis. The most statistically significant independent variables required to estimate peak-streamflow magnitude and frequency for unregulated streams in Oklahoma are contributing drainage area, mean-annual precipitation, and main-channel slope. The regression equations are applicable for watershed basins with drainage areas less than 2,510 square miles that are not affected by regulation. The resulting regression equations had a standard model error ranging from 31 to 46 percent. Annual-maximum peak flows observed at 231 streamflow-gaging stations through water year 2008 were used for the regression analysis. Gage peak-streamflow estimates were used from previous work unless 2008 gaging-station data were available, in which new peak-streamflow estimates were calculated. The U.S. Geological Survey StreamStats web application was used to obtain the independent variables required for the peak-streamflow regression equations. Limitations on the use of the regression equations and the reliability of regression estimates for natural unregulated streams are described. Log-Pearson Type III analysis information, basin and climate characteristics, and the peak-streamflow frequency estimates for the 231 gaging stations in and near Oklahoma are listed. Methodologies are presented to estimate peak streamflows at ungaged sites by using estimates from gaging stations on unregulated streams. For ungaged sites on urban streams and streams regulated by small floodwater retarding structures, an adjustment of the statewide regression equations for natural unregulated streams can be used to estimate peak-streamflow magnitude and frequency.
Development of LACIE CCEA-1 weather/wheat yield models. [regression analysis
NASA Technical Reports Server (NTRS)
Strommen, N. D.; Sakamoto, C. M.; Leduc, S. K.; Umberger, D. E. (Principal Investigator)
1979-01-01
The advantages and disadvantages of the casual (phenological, dynamic, physiological), statistical regression, and analog approaches to modeling for grain yield are examined. Given LACIE's primary goal of estimating wheat production for the large areas of eight major wheat-growing regions, the statistical regression approach of correlating historical yield and climate data offered the Center for Climatic and Environmental Assessment the greatest potential return within the constraints of time and data sources. The basic equation for the first generation wheat-yield model is given. Topics discussed include truncation, trend variable, selection of weather variables, episodic events, strata selection, operational data flow, weighting, and model results.
No rationale for 1 variable per 10 events criterion for binary logistic regression analysis.
van Smeden, Maarten; de Groot, Joris A H; Moons, Karel G M; Collins, Gary S; Altman, Douglas G; Eijkemans, Marinus J C; Reitsma, Johannes B
2016-11-24
Ten events per variable (EPV) is a widely advocated minimal criterion for sample size considerations in logistic regression analysis. Of three previous simulation studies that examined this minimal EPV criterion only one supports the use of a minimum of 10 EPV. In this paper, we examine the reasons for substantial differences between these extensive simulation studies. The current study uses Monte Carlo simulations to evaluate small sample bias, coverage of confidence intervals and mean square error of logit coefficients. Logistic regression models fitted by maximum likelihood and a modified estimation procedure, known as Firth's correction, are compared. The results show that besides EPV, the problems associated with low EPV depend on other factors such as the total sample size. It is also demonstrated that simulation results can be dominated by even a few simulated data sets for which the prediction of the outcome by the covariates is perfect ('separation'). We reveal that different approaches for identifying and handling separation leads to substantially different simulation results. We further show that Firth's correction can be used to improve the accuracy of regression coefficients and alleviate the problems associated with separation. The current evidence supporting EPV rules for binary logistic regression is weak. Given our findings, there is an urgent need for new research to provide guidance for supporting sample size considerations for binary logistic regression analysis.
Hossain, Md Golam; Saw, Aik; Alam, Rashidul; Ohtsuki, Fumio; Kamarul, Tunku
2013-09-01
Cephalic index (CI), the ratio of head breadth to head length, is widely used to categorise human populations. The aim of this study was to access the impact of anthropometric measurements on the CI of male Japanese university students. This study included 1,215 male university students from Tokyo and Kyoto, selected using convenient sampling. Multiple regression analysis was used to determine the effect of anthropometric measurements on CI. The variance inflation factor (VIF) showed no evidence of a multicollinearity problem among independent variables. The coefficients of the regression line demonstrated a significant positive relationship between CI and minimum frontal breadth (p < 0.01), bizygomatic breadth (p < 0.01) and head height (p < 0.05), and a negative relationship between CI and morphological facial height (p < 0.01) and head circumference (p < 0.01). Moreover, the coefficient and odds ratio of logistic regression analysis showed a greater likelihood for minimum frontal breadth (p < 0.01) and bizygomatic breadth (p < 0.01) to predict round-headedness, and morphological facial height (p < 0.05) and head circumference (p < 0.01) to predict long-headedness. Stepwise regression analysis revealed bizygomatic breadth, head circumference, minimum frontal breadth, head height and morphological facial height to be the best predictor craniofacial measurements with respect to CI. The results suggest that most of the variables considered in this study appear to influence the CI of adult male Japanese students.
Taljaard, Monica; McKenzie, Joanne E; Ramsay, Craig R; Grimshaw, Jeremy M
2014-06-19
An interrupted time series design is a powerful quasi-experimental approach for evaluating effects of interventions introduced at a specific point in time. To utilize the strength of this design, a modification to standard regression analysis, such as segmented regression, is required. In segmented regression analysis, the change in intercept and/or slope from pre- to post-intervention is estimated and used to test causal hypotheses about the intervention. We illustrate segmented regression using data from a previously published study that evaluated the effectiveness of a collaborative intervention to improve quality in pre-hospital ambulance care for acute myocardial infarction (AMI) and stroke. In the original analysis, a standard regression model was used with time as a continuous variable. We contrast the results from this standard regression analysis with those from segmented regression analysis. We discuss the limitations of the former and advantages of the latter, as well as the challenges of using segmented regression in analysing complex quality improvement interventions. Based on the estimated change in intercept and slope from pre- to post-intervention using segmented regression, we found insufficient evidence of a statistically significant effect on quality of care for stroke, although potential clinically important effects for AMI cannot be ruled out. Segmented regression analysis is the recommended approach for analysing data from an interrupted time series study. Several modifications to the basic segmented regression analysis approach are available to deal with challenges arising in the evaluation of complex quality improvement interventions.
Rosswog, Carolina; Schmidt, Rene; Oberthuer, André; Juraeva, Dilafruz; Brors, Benedikt; Engesser, Anne; Kahlert, Yvonne; Volland, Ruth; Bartenhagen, Christoph; Simon, Thorsten; Berthold, Frank; Hero, Barbara; Faldum, Andreas; Fischer, Matthias
2017-12-01
Current risk stratification systems for neuroblastoma patients consider clinical, histopathological, and genetic variables, and additional prognostic markers have been proposed in recent years. We here sought to select highly informative covariates in a multistep strategy based on consecutive Cox regression models, resulting in a risk score that integrates hazard ratios of prognostic variables. A cohort of 695 neuroblastoma patients was divided into a discovery set (n=75) for multigene predictor generation, a training set (n=411) for risk score development, and a validation set (n=209). Relevant prognostic variables were identified by stepwise multivariable L1-penalized least absolute shrinkage and selection operator (LASSO) Cox regression, followed by backward selection in multivariable Cox regression, and then integrated into a novel risk score. The variables stage, age, MYCN status, and two multigene predictors, NB-th24 and NB-th44, were selected as independent prognostic markers by LASSO Cox regression analysis. Following backward selection, only the multigene predictors were retained in the final model. Integration of these classifiers in a risk scoring system distinguished three patient subgroups that differed substantially in their outcome. The scoring system discriminated patients with diverging outcome in the validation cohort (5-year event-free survival, 84.9±3.4 vs 63.6±14.5 vs 31.0±5.4; P<.001), and its prognostic value was validated by multivariable analysis. We here propose a translational strategy for developing risk assessment systems based on hazard ratios of relevant prognostic variables. Our final neuroblastoma risk score comprised two multigene predictors only, supporting the notion that molecular properties of the tumor cells strongly impact clinical courses of neuroblastoma patients. Copyright © 2017 The Authors. Published by Elsevier Inc. All rights reserved.
Introduction to uses and interpretation of principal component analyses in forest biology.
J. G. Isebrands; Thomas R. Crow
1975-01-01
The application of principal component analysis for interpretation of multivariate data sets is reviewed with emphasis on (1) reduction of the number of variables, (2) ordination of variables, and (3) applications in conjunction with multiple regression.
Black Male Labor Force Participation.
ERIC Educational Resources Information Center
Baer, Roger K.
This study attempts to test (via multiple regression analysis) hypothesized relationships between designated independent variables and age specific incidences of labor force participation for black male subpopulations in 54 Standard Metropolitan Statistical Areas. Leading independent variables tested include net migration, earnings, unemployment,…
Enhance-Synergism and Suppression Effects in Multiple Regression
ERIC Educational Resources Information Center
Lipovetsky, Stan; Conklin, W. Michael
2004-01-01
Relations between pairwise correlations and the coefficient of multiple determination in regression analysis are considered. The conditions for the occurrence of enhance-synergism and suppression effects when multiple determination becomes bigger than the total of squared correlations of the dependent variable with the regressors are discussed. It…
Villarrasa-Sapiña, Israel; Álvarez-Pitti, Julio; Cabeza-Ruiz, Ruth; Redón, Pau; Lurbe, Empar; García-Massó, Xavier
2018-02-01
Excess body weight during childhood causes reduced motor functionality and problems in postural control, a negative influence which has been reported in the literature. Nevertheless, no information regarding the effect of body composition on the postural control of overweight and obese children is available. The objective of this study was therefore to establish these relationships. A cross-sectional design was used to establish relationships between body composition and postural control variables obtained in bipedal eyes-open and eyes-closed conditions in twenty-two children. Centre of pressure signals were analysed in the temporal and frequency domains. Pearson correlations were applied to establish relationships between variables. Principal component analysis was applied to the body composition variables to avoid potential multicollinearity in the regression models. These principal components were used to perform a multiple linear regression analysis, from which regression models were obtained to predict postural control. Height and leg mass were the body composition variables that showed the highest correlation with postural control. Multiple regression models were also obtained and several of these models showed a higher correlation coefficient in predicting postural control than simple correlations. These models revealed that leg and trunk mass were good predictors of postural control. More equations were found in the eyes-open than eyes-closed condition. Body weight and height are negatively correlated with postural control. However, leg and trunk mass are better postural control predictors than arm or body mass. Finally, body composition variables are more useful in predicting postural control when the eyes are open. Copyright © 2017 Elsevier Ltd. All rights reserved.
Background stratified Poisson regression analysis of cohort data.
Richardson, David B; Langholz, Bryan
2012-03-01
Background stratified Poisson regression is an approach that has been used in the analysis of data derived from a variety of epidemiologically important studies of radiation-exposed populations, including uranium miners, nuclear industry workers, and atomic bomb survivors. We describe a novel approach to fit Poisson regression models that adjust for a set of covariates through background stratification while directly estimating the radiation-disease association of primary interest. The approach makes use of an expression for the Poisson likelihood that treats the coefficients for stratum-specific indicator variables as 'nuisance' variables and avoids the need to explicitly estimate the coefficients for these stratum-specific parameters. Log-linear models, as well as other general relative rate models, are accommodated. This approach is illustrated using data from the Life Span Study of Japanese atomic bomb survivors and data from a study of underground uranium miners. The point estimate and confidence interval obtained from this 'conditional' regression approach are identical to the values obtained using unconditional Poisson regression with model terms for each background stratum. Moreover, it is shown that the proposed approach allows estimation of background stratified Poisson regression models of non-standard form, such as models that parameterize latency effects, as well as regression models in which the number of strata is large, thereby overcoming the limitations of previously available statistical software for fitting background stratified Poisson regression models.
NASA Astrophysics Data System (ADS)
Kamaruddin, Ainur Amira; Ali, Zalila; Noor, Norlida Mohd.; Baharum, Adam; Ahmad, Wan Muhamad Amir W.
2014-07-01
Logistic regression analysis examines the influence of various factors on a dichotomous outcome by estimating the probability of the event's occurrence. Logistic regression, also called a logit model, is a statistical procedure used to model dichotomous outcomes. In the logit model the log odds of the dichotomous outcome is modeled as a linear combination of the predictor variables. The log odds ratio in logistic regression provides a description of the probabilistic relationship of the variables and the outcome. In conducting logistic regression, selection procedures are used in selecting important predictor variables, diagnostics are used to check that assumptions are valid which include independence of errors, linearity in the logit for continuous variables, absence of multicollinearity, and lack of strongly influential outliers and a test statistic is calculated to determine the aptness of the model. This study used the binary logistic regression model to investigate overweight and obesity among rural secondary school students on the basis of their demographics profile, medical history, diet and lifestyle. The results indicate that overweight and obesity of students are influenced by obesity in family and the interaction between a student's ethnicity and routine meals intake. The odds of a student being overweight and obese are higher for a student having a family history of obesity and for a non-Malay student who frequently takes routine meals as compared to a Malay student.
Are your covariates under control? How normalization can re-introduce covariate effects.
Pain, Oliver; Dudbridge, Frank; Ronald, Angelica
2018-04-30
Many statistical tests rely on the assumption that the residuals of a model are normally distributed. Rank-based inverse normal transformation (INT) of the dependent variable is one of the most popular approaches to satisfy the normality assumption. When covariates are included in the analysis, a common approach is to first adjust for the covariates and then normalize the residuals. This study investigated the effect of regressing covariates against the dependent variable and then applying rank-based INT to the residuals. The correlation between the dependent variable and covariates at each stage of processing was assessed. An alternative approach was tested in which rank-based INT was applied to the dependent variable before regressing covariates. Analyses based on both simulated and real data examples demonstrated that applying rank-based INT to the dependent variable residuals after regressing out covariates re-introduces a linear correlation between the dependent variable and covariates, increasing type-I errors and reducing power. On the other hand, when rank-based INT was applied prior to controlling for covariate effects, residuals were normally distributed and linearly uncorrelated with covariates. This latter approach is therefore recommended in situations were normality of the dependent variable is required.
Landslide Hazard Mapping in Rwanda Using Logistic Regression
NASA Astrophysics Data System (ADS)
Piller, A.; Anderson, E.; Ballard, H.
2015-12-01
Landslides in the United States cause more than $1 billion in damages and 50 deaths per year (USGS 2014). Globally, figures are much more grave, yet monitoring, mapping and forecasting of these hazards are less than adequate. Seventy-five percent of the population of Rwanda earns a living from farming, mostly subsistence. Loss of farmland, housing, or life, to landslides is a very real hazard. Landslides in Rwanda have an impact at the economic, social, and environmental level. In a developing nation that faces challenges in tracking, cataloging, and predicting the numerous landslides that occur each year, satellite imagery and spatial analysis allow for remote study. We have focused on the development of a landslide inventory and a statistical methodology for assessing landslide hazards. Using logistic regression on approximately 30 test variables (i.e. slope, soil type, land cover, etc.) and a sample of over 200 landslides, we determine which variables are statistically most relevant to landslide occurrence in Rwanda. A preliminary predictive hazard map for Rwanda has been produced, using the variables selected from the logistic regression analysis.
Panayi, Efstathios; Peters, Gareth W; Kyriakides, George
2017-01-01
Quantifying the effects of environmental factors over the duration of the growing process on Agaricus Bisporus (button mushroom) yields has been difficult, as common functional data analysis approaches require fixed length functional data. The data available from commercial growers, however, is of variable duration, due to commercial considerations. We employ a recently proposed regression technique termed Variable-Domain Functional Regression in order to be able to accommodate these irregular-length datasets. In this way, we are able to quantify the contribution of covariates such as temperature, humidity and water spraying volumes across the growing process, and for different lengths of growing processes. Our results indicate that optimal oxygen and temperature levels vary across the growing cycle and we propose environmental schedules for these covariates to optimise overall yields.
Panayi, Efstathios; Kyriakides, George
2017-01-01
Quantifying the effects of environmental factors over the duration of the growing process on Agaricus Bisporus (button mushroom) yields has been difficult, as common functional data analysis approaches require fixed length functional data. The data available from commercial growers, however, is of variable duration, due to commercial considerations. We employ a recently proposed regression technique termed Variable-Domain Functional Regression in order to be able to accommodate these irregular-length datasets. In this way, we are able to quantify the contribution of covariates such as temperature, humidity and water spraying volumes across the growing process, and for different lengths of growing processes. Our results indicate that optimal oxygen and temperature levels vary across the growing cycle and we propose environmental schedules for these covariates to optimise overall yields. PMID:28961254
Milner, Allison; Aitken, Zoe; Kavanagh, Anne; LaMontagne, Anthony D; Pega, Frank; Petrie, Dennis
2017-06-23
Previous studies suggest that poor psychosocial job quality is a risk factor for mental health problems, but they use conventional regression analytic methods that cannot rule out reverse causation, unmeasured time-invariant confounding and reporting bias. This study combines two quasi-experimental approaches to improve causal inference by better accounting for these biases: (i) linear fixed effects regression analysis and (ii) linear instrumental variable analysis. We extract 13 annual waves of national cohort data including 13 260 working-age (18-64 years) employees. The exposure variable is self-reported level of psychosocial job quality. The instruments used are two common workplace entitlements. The outcome variable is the Mental Health Inventory (MHI-5). We adjust for measured time-varying confounders. In the fixed effects regression analysis adjusted for time-varying confounders, a 1-point increase in psychosocial job quality is associated with a 1.28-point improvement in mental health on the MHI-5 scale (95% CI: 1.17, 1.40; P < 0.001). When the fixed effects was combined with the instrumental variable analysis, a 1-point increase psychosocial job quality is related to 1.62-point improvement on the MHI-5 scale (95% CI: -0.24, 3.48; P = 0.088). Our quasi-experimental results provide evidence to confirm job stressors as risk factors for mental ill health using methods that improve causal inference. © The Author 2017. Published by Oxford University Press on behalf of Faculty of Public Health. All rights reserved. For permissions, please e-mail: journals.permissions@oup.com
Iorgulescu, E; Voicu, V A; Sârbu, C; Tache, F; Albu, F; Medvedovici, A
2016-08-01
The influence of the experimental variability (instrumental repeatability, instrumental intermediate precision and sample preparation variability) and data pre-processing (normalization, peak alignment, background subtraction) on the discrimination power of multivariate data analysis methods (Principal Component Analysis -PCA- and Cluster Analysis -CA-) as well as a new algorithm based on linear regression was studied. Data used in the study were obtained through positive or negative ion monitoring electrospray mass spectrometry (+/-ESI/MS) and reversed phase liquid chromatography/UV spectrometric detection (RPLC/UV) applied to green tea extracts. Extractions in ethanol and heated water infusion were used as sample preparation procedures. The multivariate methods were directly applied to mass spectra and chromatograms, involving strictly a holistic comparison of shapes, without assignment of any structural identity to compounds. An alternative data interpretation based on linear regression analysis mutually applied to data series is also discussed. Slopes, intercepts and correlation coefficients produced by the linear regression analysis applied on pairs of very large experimental data series successfully retain information resulting from high frequency instrumental acquisition rates, obviously better defining the profiles being compared. Consequently, each type of sample or comparison between samples produces in the Cartesian space an ellipsoidal volume defined by the normal variation intervals of the slope, intercept and correlation coefficient. Distances between volumes graphically illustrates (dis)similarities between compared data. The instrumental intermediate precision had the major effect on the discrimination power of the multivariate data analysis methods. Mass spectra produced through ionization from liquid state in atmospheric pressure conditions of bulk complex mixtures resulting from extracted materials of natural origins provided an excellent data basis for multivariate analysis methods, equivalent to data resulting from chromatographic separations. The alternative evaluation of very large data series based on linear regression analysis produced information equivalent to results obtained through application of PCA an CA. Copyright © 2016 Elsevier B.V. All rights reserved.
Study of process variables associated with manufacturing hermetically-sealed nickel-cadmium cells
NASA Technical Reports Server (NTRS)
Miller, L.; Doan, D. J.; Carr, E. S.
1971-01-01
A program to determine and study the critical process variables associated with the manufacture of aerospace, hermetically-sealed, nickel-cadmium cells is described. The determination and study of the process variables associated with the positive and negative plaque impregnation/polarization process are emphasized. The experimental data resulting from the implementation of fractional factorial design experiments are analyzed by means of a linear multiple regression analysis technique. This analysis permits the selection of preferred levels for certain process variables to achieve desirable impregnated plaque characteristics.
Solvency supervision based on a total balance sheet approach
NASA Astrophysics Data System (ADS)
Pitselis, Georgios
2009-11-01
In this paper we investigate the adequacy of the own funds a company requires in order to remain healthy and avoid insolvency. Two methods are applied here; the quantile regression method and the method of mixed effects models. Quantile regression is capable of providing a more complete statistical analysis of the stochastic relationship among random variables than least squares estimation. The estimated mixed effects line can be considered as an internal industry equation (norm), which explains a systematic relation between a dependent variable (such as own funds) with independent variables (e.g. financial characteristics, such as assets, provisions, etc.). The above two methods are implemented with two data sets.
Hao, Yong; Sun, Xu-Dong; Yang, Qiang
2012-12-01
Variables selection strategy combined with local linear embedding (LLE) was introduced for the analysis of complex samples by using near infrared spectroscopy (NIRS). Three methods include Monte Carlo uninformation variable elimination (MCUVE), successive projections algorithm (SPA) and MCUVE connected with SPA were used for eliminating redundancy spectral variables. Partial least squares regression (PLSR) and LLE-PLSR were used for modeling complex samples. The results shown that MCUVE can both extract effective informative variables and improve the precision of models. Compared with PLSR models, LLE-PLSR models can achieve more accurate analysis results. MCUVE combined with LLE-PLSR is an effective modeling method for NIRS quantitative analysis.
NASA Astrophysics Data System (ADS)
Hasyim, M.; Prastyo, D. D.
2018-03-01
Survival analysis performs relationship between independent variables and survival time as dependent variable. In fact, not all survival data can be recorded completely by any reasons. In such situation, the data is called censored data. Moreover, several model for survival analysis requires assumptions. One of the approaches in survival analysis is nonparametric that gives more relax assumption. In this research, the nonparametric approach that is employed is Multivariate Regression Adaptive Spline (MARS). This study is aimed to measure the performance of private university’s lecturer. The survival time in this study is duration needed by lecturer to obtain their professional certificate. The results show that research activities is a significant factor along with developing courses material, good publication in international or national journal, and activities in research collaboration.
Identifying the Factors That Influence Change in SEBD Using Logistic Regression Analysis
ERIC Educational Resources Information Center
Camilleri, Liberato; Cefai, Carmel
2013-01-01
Multiple linear regression and ANOVA models are widely used in applications since they provide effective statistical tools for assessing the relationship between a continuous dependent variable and several predictors. However these models rely heavily on linearity and normality assumptions and they do not accommodate categorical dependent…
Multilevel Models for Binary Data
ERIC Educational Resources Information Center
Powers, Daniel A.
2012-01-01
The methods and models for categorical data analysis cover considerable ground, ranging from regression-type models for binary and binomial data, count data, to ordered and unordered polytomous variables, as well as regression models that mix qualitative and continuous data. This article focuses on methods for binary or binomial data, which are…
The dynamic relationship between Bursa Malaysia composite index and macroeconomic variables
NASA Astrophysics Data System (ADS)
Ismail, Mohd Tahir; Rose, Farid Zamani Che; Rahman, Rosmanjawati Abd.
2017-08-01
This study investigates and analyzes the long run and short run relationships between Bursa Malaysia Composite index (KLCI) and nine macroeconomic variables in a VAR/VECM framework. After regression analysis seven out the nine macroeconomic variables are chosen for further analysis. The use of Johansen-Juselius Cointegration and Vector Error Correction Model (VECM) technique indicate that there are long run relationships between the seven macroeconomic variables and KLCI. Meanwhile, Granger causality test shows that bidirectional relationship between KLCI and oil price. Furthermore, after 12 months the shock on KLCI are explained by innovations of the seven macroeconomic variables. This indicate the close relationship between macroeconomic variables and KLCI.
Avalos, Marta; Adroher, Nuria Duran; Lagarde, Emmanuel; Thiessard, Frantz; Grandvalet, Yves; Contrand, Benjamin; Orriols, Ludivine
2012-09-01
Large data sets with many variables provide particular challenges when constructing analytic models. Lasso-related methods provide a useful tool, although one that remains unfamiliar to most epidemiologists. We illustrate the application of lasso methods in an analysis of the impact of prescribed drugs on the risk of a road traffic crash, using a large French nationwide database (PLoS Med 2010;7:e1000366). In the original case-control study, the authors analyzed each exposure separately. We use the lasso method, which can simultaneously perform estimation and variable selection in a single model. We compare point estimates and confidence intervals using (1) a separate logistic regression model for each drug with a Bonferroni correction and (2) lasso shrinkage logistic regression analysis. Shrinkage regression had little effect on (bias corrected) point estimates, but led to less conservative results, noticeably for drugs with moderate levels of exposure. Carbamates, carboxamide derivative and fatty acid derivative antiepileptics, drugs used in opioid dependence, and mineral supplements of potassium showed stronger associations. Lasso is a relevant method in the analysis of databases with large number of exposures and can be recommended as an alternative to conventional strategies.
NASA Technical Reports Server (NTRS)
Kalton, G.
1983-01-01
A number of surveys were conducted to study the relationship between the level of aircraft or traffic noise exposure experienced by people living in a particular area and their annoyance with it. These surveys generally employ a clustered sample design which affects the precision of the survey estimates. Regression analysis of annoyance on noise measures and other variables is often an important component of the survey analysis. Formulae are presented for estimating the standard errors of regression coefficients and ratio of regression coefficients that are applicable with a two- or three-stage clustered sample design. Using a simple cost function, they also determine the optimum allocation of the sample across the stages of the sample design for the estimation of a regression coefficient.
ERIC Educational Resources Information Center
Greer, Wil
2013-01-01
This study identified the variables associated with data-driven instruction (DDI) that are perceived to best predict student achievement. Of the DDI variables discussed in the literature, 51 of them had a sufficient enough research base to warrant statistical analysis. Of them, 26 were statistically significant. Multiple regression and an…
Preliminary results of spatial modeling of selected forest health variables in Georgia
Brock Stewart; Chris J. Cieszewski
2009-01-01
Variables relating to forest health monitoring, such as mortality, are difficult to predict and model. We present here the results of fitting various spatial regression models to these variables. We interpolate plot-level values compiled from the Forest Inventory and Analysis National Information Management System (FIA-NIMS) data that are related to forest health....
Creating a non-linear total sediment load formula using polynomial best subset regression model
NASA Astrophysics Data System (ADS)
Okcu, Davut; Pektas, Ali Osman; Uyumaz, Ali
2016-08-01
The aim of this study is to derive a new total sediment load formula which is more accurate and which has less application constraints than the well-known formulae of the literature. 5 most known stream power concept sediment formulae which are approved by ASCE are used for benchmarking on a wide range of datasets that includes both field and flume (lab) observations. The dimensionless parameters of these widely used formulae are used as inputs in a new regression approach. The new approach is called Polynomial Best subset regression (PBSR) analysis. The aim of the PBRS analysis is fitting and testing all possible combinations of the input variables and selecting the best subset. Whole the input variables with their second and third powers are included in the regression to test the possible relation between the explanatory variables and the dependent variable. While selecting the best subset a multistep approach is used that depends on significance values and also the multicollinearity degrees of inputs. The new formula is compared to others in a holdout dataset and detailed performance investigations are conducted for field and lab datasets within this holdout data. Different goodness of fit statistics are used as they represent different perspectives of the model accuracy. After the detailed comparisons are carried out we figured out the most accurate equation that is also applicable on both flume and river data. Especially, on field dataset the prediction performance of the proposed formula outperformed the benchmark formulations.
Modified Regression Correlation Coefficient for Poisson Regression Model
NASA Astrophysics Data System (ADS)
Kaengthong, Nattacha; Domthong, Uthumporn
2017-09-01
This study gives attention to indicators in predictive power of the Generalized Linear Model (GLM) which are widely used; however, often having some restrictions. We are interested in regression correlation coefficient for a Poisson regression model. This is a measure of predictive power, and defined by the relationship between the dependent variable (Y) and the expected value of the dependent variable given the independent variables [E(Y|X)] for the Poisson regression model. The dependent variable is distributed as Poisson. The purpose of this research was modifying regression correlation coefficient for Poisson regression model. We also compare the proposed modified regression correlation coefficient with the traditional regression correlation coefficient in the case of two or more independent variables, and having multicollinearity in independent variables. The result shows that the proposed regression correlation coefficient is better than the traditional regression correlation coefficient based on Bias and the Root Mean Square Error (RMSE).
Rupert, Michael G.; Cannon, Susan H.; Gartner, Joseph E.
2003-01-01
Logistic regression was used to predict the probability of debris flows occurring in areas recently burned by wildland fires. Multiple logistic regression is conceptually similar to multiple linear regression because statistical relations between one dependent variable and several independent variables are evaluated. In logistic regression, however, the dependent variable is transformed to a binary variable (debris flow did or did not occur), and the actual probability of the debris flow occurring is statistically modeled. Data from 399 basins located within 15 wildland fires that burned during 2000-2002 in Colorado, Idaho, Montana, and New Mexico were evaluated. More than 35 independent variables describing the burn severity, geology, land surface gradient, rainfall, and soil properties were evaluated. The models were developed as follows: (1) Basins that did and did not produce debris flows were delineated from National Elevation Data using a Geographic Information System (GIS). (2) Data describing the burn severity, geology, land surface gradient, rainfall, and soil properties were determined for each basin. These data were then downloaded to a statistics software package for analysis using logistic regression. (3) Relations between the occurrence/non-occurrence of debris flows and burn severity, geology, land surface gradient, rainfall, and soil properties were evaluated and several preliminary multivariate logistic regression models were constructed. All possible combinations of independent variables were evaluated to determine which combination produced the most effective model. The multivariate model that best predicted the occurrence of debris flows was selected. (4) The multivariate logistic regression model was entered into a GIS, and a map showing the probability of debris flows was constructed. The most effective model incorporates the percentage of each basin with slope greater than 30 percent, percentage of land burned at medium and high burn severity in each basin, particle size sorting, average storm intensity (millimeters per hour), soil organic matter content, soil permeability, and soil drainage. The results of this study demonstrate that logistic regression is a valuable tool for predicting the probability of debris flows occurring in recently-burned landscapes.
Anxiety, affect, self-esteem, and stress: mediation and moderation effects on depression.
Nima, Ali Al; Rosenberg, Patricia; Archer, Trevor; Garcia, Danilo
2013-01-01
Mediation analysis investigates whether a variable (i.e., mediator) changes in regard to an independent variable, in turn, affecting a dependent variable. Moderation analysis, on the other hand, investigates whether the statistical interaction between independent variables predict a dependent variable. Although this difference between these two types of analysis is explicit in current literature, there is still confusion with regard to the mediating and moderating effects of different variables on depression. The purpose of this study was to assess the mediating and moderating effects of anxiety, stress, positive affect, and negative affect on depression. Two hundred and two university students (males = 93, females = 113) completed questionnaires assessing anxiety, stress, self-esteem, positive and negative affect, and depression. Mediation and moderation analyses were conducted using techniques based on standard multiple regression and hierarchical regression analyses. The results indicated that (i) anxiety partially mediated the effects of both stress and self-esteem upon depression, (ii) that stress partially mediated the effects of anxiety and positive affect upon depression, (iii) that stress completely mediated the effects of self-esteem on depression, and (iv) that there was a significant interaction between stress and negative affect, and between positive affect and negative affect upon depression. The study highlights different research questions that can be investigated depending on whether researchers decide to use the same variables as mediators and/or moderators.
Velocity structure in long period variable star atmospheres
NASA Technical Reports Server (NTRS)
Pilachowski, C.; Wallerstein, G.; Willson, L. A.
1980-01-01
A regression analysis of the dependence of absorption line velocities on wavelength, line strength, excitation potential, and ionization potential is presented. The method determines the region of formation of the absorption lines for a given data and wavelength region. It is concluded that the scatter which is frequently found in velocity measurements of absorption lines in long period variables is probably the result of a shock of moderate amplitude located in or near the reversing layer and that the frequently observed correlation of velocity with excitation and ionization are a result of the velocity gradients produced by this shock in the atmosphere. A simple interpretation of the signs of the coefficients of the regression analysis is presented in terms of preshock, post shock, or across the shock, together with criteria for evaluating the validity of the fit. The amplitude of the reversing layer shock is estimated from an analysis of a series of plates for four long period variable stars along with the most probable stellar velocity for these stars.
The value of a statistical life: a meta-analysis with a mixed effects regression model.
Bellavance, François; Dionne, Georges; Lebeau, Martin
2009-03-01
The value of a statistical life (VSL) is a very controversial topic, but one which is essential to the optimization of governmental decisions. We see a great variability in the values obtained from different studies. The source of this variability needs to be understood, in order to offer public decision-makers better guidance in choosing a value and to set clearer guidelines for future research on the topic. This article presents a meta-analysis based on 39 observations obtained from 37 studies (from nine different countries) which all use a hedonic wage method to calculate the VSL. Our meta-analysis is innovative in that it is the first to use the mixed effects regression model [Raudenbush, S.W., 1994. Random effects models. In: Cooper, H., Hedges, L.V. (Eds.), The Handbook of Research Synthesis. Russel Sage Foundation, New York] to analyze studies on the value of a statistical life. We conclude that the variability found in the values studied stems in large part from differences in methodologies.
Analytics For Distracted Driver Behavior Modeling in Dilemma Zone
DOE Office of Scientific and Technical Information (OSTI.GOV)
Li, Jan-Mou; Malikopoulos, Andreas; Thakur, Gautam
2014-01-01
In this paper, we present the results obtained and insights gained through the analysis of TRB contest data. We used exploratory analysis, regression, and clustering models for gaining insights into the driver behavior in a dilemma zone while driving under distraction. While simple exploratory analysis showed the distinguishing driver behavior patterns among different popu- lation groups in the dilemma zone, regression analysis showed statically signification relationships between groups of variables. In addition to analyzing the contest data, we have also looked into the possible impact of distracted driving on the fuel economy.
Using exogenous variables in testing for monotonic trends in hydrologic time series
Alley, William M.
1988-01-01
One approach that has been used in performing a nonparametric test for monotonic trend in a hydrologic time series consists of a two-stage analysis. First, a regression equation is estimated for the variable being tested as a function of an exogenous variable. A nonparametric trend test such as the Kendall test is then performed on the residuals from the equation. By analogy to stagewise regression and through Monte Carlo experiments, it is demonstrated that this approach will tend to underestimate the magnitude of the trend and to result in some loss in power as a result of ignoring the interaction between the exogenous variable and time. An alternative approach, referred to as the adjusted variable Kendall test, is demonstrated to generally have increased statistical power and to provide more reliable estimates of the trend slope. In addition, the utility of including an exogenous variable in a trend test is examined under selected conditions.
Psychosocial variables and time to injury onset: a hurdle regression analysis model.
Sibold, Jeremy; Zizzi, Samuel
2012-01-01
Psychological variables have been shown to be related to athletic injury and time missed from participation in sport. We are unaware of any empirical examination of the influence of psychological variables on time to onset of injury. To examine the influence of orthopaedic and psychosocial variables on time to injury in college athletes. One hundred seventy-seven (men 5 116, women 5 61; age 5 19.45 6 1.39 years) National Collegiate Athletic Association Division II athletes. Hurdle regression analysis (HRA) was used to determine the influence of predictor variables on days to first injury. Worry (z = 2.98, P = .003), concentration disruption (z = -3.95, P < .001), and negative life-event stress (z = 5.02, P < .001) were robust predictors of days to injury. Orthopaedic risk score was not a predictor (z = 1.28, P = .20). These findings support previous research on the stress-injury relationship, and our group is the first to use HRA in athletic injury data. These data support the addition of psychological screening as part of preseason health examinations for collegiate athletes.
Granato, Gregory E.
2012-01-01
A nationwide study to better define triangular-hydrograph statistics for use with runoff-quality and flood-flow studies was done by the U.S. Geological Survey (USGS) in cooperation with the Federal Highway Administration. Although the triangular hydrograph is a simple linear approximation, the cumulative distribution of stormflow with a triangular hydrograph is a curvilinear S-curve that closely approximates the cumulative distribution of stormflows from measured data. The temporal distribution of flow within a runoff event can be estimated using the basin lagtime, (which is the time from the centroid of rainfall excess to the centroid of the corresponding runoff hydrograph) and the hydrograph recession ratio (which is the ratio of the duration of the falling limb to the rising limb of the hydrograph). This report documents results of the study, methods used to estimate the variables, and electronic files that facilitate calculation of variables. Ten viable multiple-linear regression equations were developed to estimate basin lagtimes from readily determined drainage basin properties using data published in 37 stormflow studies. Regression equations using the basin lag factor (BLF, which is a variable calculated as the main-channel length, in miles, divided by the square root of the main-channel slope in feet per mile) and two variables describing development in the drainage basin were selected as the best candidates, because each equation explains about 70 percent of the variability in the data. The variables describing development are the USGS basin development factor (BDF, which is a function of the amount of channel modifications, storm sewers, and curb-and-gutter streets in a basin) and the total impervious area variable (IMPERV) in the basin. Two datasets were used to develop regression equations. The primary dataset included data from 493 sites that have values for the BLF, BDF, and IMPERV variables. This dataset was used to develop the best-fit regression equation using the BLF and BDF variables. The secondary dataset included data from 896 sites that have values for the BLF and IMPERV variables. This dataset was used to develop the best-fit regression equation using the BLF and IMPERV variables. Analysis of hydrograph recession ratios and basin characteristics for 41 sites indicated that recession ratios are random variables. Thus, recession ratios cannot be estimated quantitatively using multiple linear regression equations developed using the data available for these sites. The minimums of recession ratios for different streamgages are well characterized by a value of one. The most probable values and maximum values of recession ratios for different streamgages are, however, more variable than the minimums. The most probable values of recession ratios for the 41 streamgages analyzed ranged from 1.0 to 3.52 and had a median of 1.85. The maximum values ranged from 2.66 to 11.3 and had a median of 4.36.
Hasegawa, Daisuke; Onishi, Hideo; Matsutomo, Norikazu
2016-02-01
This study aimed to evaluate the novel index of hepatic receptor (IHR) on the regression analysis derived from time activity curve of the liver for hepatic functional reserve. Sixty patients had undergone (99m)Tc-galactosyl serum albumin ((99m)Tc-GSA) scintigraphy in the retrospective clinical study. Time activity curves for liver were obtained by region of interest (ROI) on the whole liver. A novel hepatic functional predictor was calculated with multiple regression analysis of time activity curves. In the multiple regression function, the objective variables were the indocyanine green (ICG) retention rate at 15 min, and the explanatory variables were the liver counts in 3-min intervals until end from beginning. Then, this result was defined by IHR, and we analyzed the correlation between IHR and ICG, uptake ratio of the heart at 15 minutes to that at 3 minutes (HH15), uptake ratio of the liver to the liver plus heart at 15 minutes (LHL15), and index of convexity (IOC). Regression function of IHR was derived as follows: IHR=0.025×L(6)-0.052×L(12)+0.027×L(27). The multiple regression analysis indicated that liver counts at 6 min, 12 min, and 27 min were significantly related to objective variables. The correlation coefficient between IHR and ICG was 0.774, and the correlation coefficient between ICG and conventional indices (HH15, LHL15, and IOC) were 0.837, 0.773, and 0.793, respectively. IHR had good correlation with HH15, LHL15, and IOC. The finding results suggested that IHR would provide clinical benefit for hepatic functional assessment in the (99m)Tc-GSA scintigraphy.
Emerson, Douglas G.; Vecchia, Aldo V.; Dahl, Ann L.
2005-01-01
The drainage-area ratio method commonly is used to estimate streamflow for sites where no streamflow data were collected. To evaluate the validity of the drainage-area ratio method and to determine if an improved method could be developed to estimate streamflow, a multiple-regression technique was used to determine if drainage area, main channel slope, and precipitation were significant variables for estimating streamflow in the Red River of the North Basin. A separate regression analysis was performed for streamflow for each of three seasons-- winter, spring, and summer. Drainage area and summer precipitation were the most significant variables. However, the regression equations generally overestimated streamflows for North Dakota stations and underestimated streamflows for Minnesota stations. To correct the bias in the residuals for the two groups of stations, indicator variables were included to allow both the intercept and the coefficient for the logarithm of drainage area to depend on the group. Drainage area was the only significant variable in the revised regression equations. The exponents for the drainage-area ratio were 0.85 for the winter season, 0.91 for the spring season, and 1.02 for the summer season.
Tan, Ge; Yuan, Ruozhen; Wei, ChenChen; Xu, Mangmang; Liu, Ming
2018-05-26
Association between serum calcium and magnesium versus hemorrhagic transformation (HT) remains to be identified. A total of 1212 non-thrombolysis patients with serum calcium and magnesium collected within 24 h from stroke onset were enrolled. Backward stepwise multivariate logistic regression analysis was conducted to investigate association between calcium and magnesium versus HT. Calcium and magnesium were entered into logistic regression analysis in two models, separately: model 1, as continuous variable (per 1-mmol/L increase), and model 2, as four-categorized variable (being collapsed into quartiles). HT occurred in 140 patients (11.6%). Serum calcium was slightly lower in patients with HT than in patient without HT (P = 0.273). But serum magnesium was significantly lower in patients with HT than in patients without HT (P = 0.007). In logistic regression analysis, calcium displayed no association with HT. Magnesium, as either continuous or four-categorized variable, was independently and inversely associated with HT in stroke overall and stroke of large-artery atherosclerosis (LAA). The results demonstrated that serum calcium had no association with HT in patients without thrombolysis after acute ischemic stroke. Serum magnesium in low level was independently associated with increasing HT in stroke overall and particularly in stroke of LAA.
NASA Astrophysics Data System (ADS)
Ibrahim, Elsy; Kim, Wonkook; Crawford, Melba; Monbaliu, Jaak
2017-02-01
Remote sensing has been successfully utilized to distinguish and quantify sediment properties in the intertidal environment. Classification approaches of imagery are popular and powerful yet can lead to site- and case-specific results. Such specificity creates challenges for temporal studies. Thus, this paper investigates the use of regression models to quantify sediment properties instead of classifying them. Two regression approaches, namely multiple regression (MR) and support vector regression (SVR), are used in this study for the retrieval of bio-physical variables of intertidal surface sediment of the IJzermonding, a Belgian nature reserve. In the regression analysis, mud content, chlorophyll a concentration, organic matter content, and soil moisture are estimated using radiometric variables of two airborne sensors, namely airborne hyperspectral sensor (AHS) and airborne prism experiment (APEX) and and using field hyperspectral acquisitions by analytical spectral device (ASD). The performance of the two regression approaches is best for the estimation of moisture content. SVR attains the highest accuracy without feature reduction while MR achieves good results when feature reduction is carried out. Sediment property maps are successfully obtained using the models and hyperspectral imagery where SVR used with all bands achieves the best performance. The study also involves the extraction of weights identifying the contribution of each band of the images in the quantification of each sediment property when MR and principal component analysis are used.
Modeling Governance KB with CATPCA to Overcome Multicollinearity in the Logistic Regression
NASA Astrophysics Data System (ADS)
Khikmah, L.; Wijayanto, H.; Syafitri, U. D.
2017-04-01
The problem often encounters in logistic regression modeling are multicollinearity problems. Data that have multicollinearity between explanatory variables with the result in the estimation of parameters to be bias. Besides, the multicollinearity will result in error in the classification. In general, to overcome multicollinearity in regression used stepwise regression. They are also another method to overcome multicollinearity which involves all variable for prediction. That is Principal Component Analysis (PCA). However, classical PCA in only for numeric data. Its data are categorical, one method to solve the problems is Categorical Principal Component Analysis (CATPCA). Data were used in this research were a part of data Demographic and Population Survey Indonesia (IDHS) 2012. This research focuses on the characteristic of women of using the contraceptive methods. Classification results evaluated using Area Under Curve (AUC) values. The higher the AUC value, the better. Based on AUC values, the classification of the contraceptive method using stepwise method (58.66%) is better than the logistic regression model (57.39%) and CATPCA (57.39%). Evaluation of the results of logistic regression using sensitivity, shows the opposite where CATPCA method (99.79%) is better than logistic regression method (92.43%) and stepwise (92.05%). Therefore in this study focuses on major class classification (using a contraceptive method), then the selected model is CATPCA because it can raise the level of the major class model accuracy.
Kupek, Emil
2006-03-15
Structural equation modelling (SEM) has been increasingly used in medical statistics for solving a system of related regression equations. However, a great obstacle for its wider use has been its difficulty in handling categorical variables within the framework of generalised linear models. A large data set with a known structure among two related outcomes and three independent variables was generated to investigate the use of Yule's transformation of odds ratio (OR) into Q-metric by (OR-1)/(OR+1) to approximate Pearson's correlation coefficients between binary variables whose covariance structure can be further analysed by SEM. Percent of correctly classified events and non-events was compared with the classification obtained by logistic regression. The performance of SEM based on Q-metric was also checked on a small (N = 100) random sample of the data generated and on a real data set. SEM successfully recovered the generated model structure. SEM of real data suggested a significant influence of a latent confounding variable which would have not been detectable by standard logistic regression. SEM classification performance was broadly similar to that of the logistic regression. The analysis of binary data can be greatly enhanced by Yule's transformation of odds ratios into estimated correlation matrix that can be further analysed by SEM. The interpretation of results is aided by expressing them as odds ratios which are the most frequently used measure of effect in medical statistics.
Statistical methods for astronomical data with upper limits. II - Correlation and regression
NASA Technical Reports Server (NTRS)
Isobe, T.; Feigelson, E. D.; Nelson, P. I.
1986-01-01
Statistical methods for calculating correlations and regressions in bivariate censored data where the dependent variable can have upper or lower limits are presented. Cox's regression and the generalization of Kendall's rank correlation coefficient provide significant levels of correlations, and the EM algorithm, under the assumption of normally distributed errors, and its nonparametric analog using the Kaplan-Meier estimator, give estimates for the slope of a regression line. Monte Carlo simulations demonstrate that survival analysis is reliable in determining correlations between luminosities at different bands. Survival analysis is applied to CO emission in infrared galaxies, X-ray emission in radio galaxies, H-alpha emission in cooling cluster cores, and radio emission in Seyfert galaxies.
A Comparison between Multiple Regression Models and CUN-BAE Equation to Predict Body Fat in Adults
Fuster-Parra, Pilar; Bennasar-Veny, Miquel; Tauler, Pedro; Yañez, Aina; López-González, Angel A.; Aguiló, Antoni
2015-01-01
Background Because the accurate measure of body fat (BF) is difficult, several prediction equations have been proposed. The aim of this study was to compare different multiple regression models to predict BF, including the recently reported CUN-BAE equation. Methods Multi regression models using body mass index (BMI) and body adiposity index (BAI) as predictors of BF will be compared. These models will be also compared with the CUN-BAE equation. For all the analysis a sample including all the participants and another one including only the overweight and obese subjects will be considered. The BF reference measure was made using Bioelectrical Impedance Analysis. Results The simplest models including only BMI or BAI as independent variables showed that BAI is a better predictor of BF. However, adding the variable sex to both models made BMI a better predictor than the BAI. For both the whole group of participants and the group of overweight and obese participants, using simple models (BMI, age and sex as variables) allowed obtaining similar correlations with BF as when the more complex CUN-BAE was used (ρ = 0:87 vs. ρ = 0:86 for the whole sample and ρ = 0:88 vs. ρ = 0:89 for overweight and obese subjects, being the second value the one for CUN-BAE). Conclusions There are simpler models than CUN-BAE equation that fits BF as well as CUN-BAE does. Therefore, it could be considered that CUN-BAE overfits. Using a simple linear regression model, the BAI, as the only variable, predicts BF better than BMI. However, when the sex variable is introduced, BMI becomes the indicator of choice to predict BF. PMID:25821960
A comparison between multiple regression models and CUN-BAE equation to predict body fat in adults.
Fuster-Parra, Pilar; Bennasar-Veny, Miquel; Tauler, Pedro; Yañez, Aina; López-González, Angel A; Aguiló, Antoni
2015-01-01
Because the accurate measure of body fat (BF) is difficult, several prediction equations have been proposed. The aim of this study was to compare different multiple regression models to predict BF, including the recently reported CUN-BAE equation. Multi regression models using body mass index (BMI) and body adiposity index (BAI) as predictors of BF will be compared. These models will be also compared with the CUN-BAE equation. For all the analysis a sample including all the participants and another one including only the overweight and obese subjects will be considered. The BF reference measure was made using Bioelectrical Impedance Analysis. The simplest models including only BMI or BAI as independent variables showed that BAI is a better predictor of BF. However, adding the variable sex to both models made BMI a better predictor than the BAI. For both the whole group of participants and the group of overweight and obese participants, using simple models (BMI, age and sex as variables) allowed obtaining similar correlations with BF as when the more complex CUN-BAE was used (ρ = 0:87 vs. ρ = 0:86 for the whole sample and ρ = 0:88 vs. ρ = 0:89 for overweight and obese subjects, being the second value the one for CUN-BAE). There are simpler models than CUN-BAE equation that fits BF as well as CUN-BAE does. Therefore, it could be considered that CUN-BAE overfits. Using a simple linear regression model, the BAI, as the only variable, predicts BF better than BMI. However, when the sex variable is introduced, BMI becomes the indicator of choice to predict BF.
Confidence Intervals for Squared Semipartial Correlation Coefficients: The Effect of Nonnormality
ERIC Educational Resources Information Center
Algina, James; Keselman, H. J.; Penfield, Randall D.
2010-01-01
The increase in the squared multiple correlation coefficient ([delta]R[superscript 2]) associated with a variable in a regression equation is a commonly used measure of importance in regression analysis. Algina, Keselman, and Penfield found that intervals based on asymptotic principles were typically very inaccurate, even though the sample size…
Sibling dilution hypothesis: a regression surface analysis.
Marjoribanks, K
2001-08-01
This study examined relationships between sibship size (the number of children in a family), birth order, and measures of academic performance, academic self-concept, and educational aspirations at different levels of family educational resources. As part of a national longitudinal study of Australian secondary school students data were collected from 2,530 boys and 2,450 girls in Years 9 and 10. Regression surfaces were constructed from models that included terms to account for linear, interaction, and curvilinear associations among the variables. Analysis suggests the general propositions (a) family educational resources have significant associations with children's school-related outcomes at different levels of sibling variables, the relationships for girls being curvilinear, and (b) sibling variables continue to have small significant associations with affective and cognitive outcomes, after taking into account variations in family educational resources. That is, the investigation provides only partial support for the sibling dilution hypothesis.
Social network type and morale in old age.
Litwin, H
2001-08-01
The aim of this research was to derive network types among an elderly population and to examine the relationship of network type to morale. Secondary analysis of data compiled by the Israeli Central Bureau of Statistics (n = 2,079) was employed, and network types were derived through K-means cluster analysis. Respondents' morale scores were regressed on network types, controlling for background and health variables. Five network types were derived. Respondents in diverse or friends networks reported the highest morale; those in exclusively family or restricted networks had the lowest. Multivariate regression analysis underscored that certain network types were second among the study variables in predicting respondents' morale, preceded only by disability level (Adjusted R(2) =.41). Classification of network types allows consideration of the interpersonal environments of older people in relation to outcomes of interest. The relative effects on morale of elective versus obligated social ties, evident in the current analysis, is a case in point.
Principal components analysis in clinical studies.
Zhang, Zhongheng; Castelló, Adela
2017-09-01
In multivariate analysis, independent variables are usually correlated to each other which can introduce multicollinearity in the regression models. One approach to solve this problem is to apply principal components analysis (PCA) over these variables. This method uses orthogonal transformation to represent sets of potentially correlated variables with principal components (PC) that are linearly uncorrelated. PCs are ordered so that the first PC has the largest possible variance and only some components are selected to represent the correlated variables. As a result, the dimension of the variable space is reduced. This tutorial illustrates how to perform PCA in R environment, the example is a simulated dataset in which two PCs are responsible for the majority of the variance in the data. Furthermore, the visualization of PCA is highlighted.
Neural correlates of gait variability in people with multiple sclerosis with fall history.
Kalron, Alon; Allali, Gilles; Achiron, Anat
2018-05-28
Investigate the association between step time variability and related brain structures in accordance with fall status in people with multiple sclerosis (PwMS). The study included 225 PwMS. A whole-brain MRI was performed by a high-resolution 3.0-Telsa MR scanner in addition to volumetric analysis based on 3D T1-weighted images using the FreeSurfer image analysis suite. Step time variability was measured by an electronic walkway. Participants were defined as "fallers" (at least two falls during the previous year) and "non-fallers". One hundred and five PwMS were defined as fallers and had a greater step time variability compared to non-fallers (5.6% (S.D.=3.4) vs. 3.4% (S.D.=1.5); p=0.001). MS fallers exhibited a reduced volume in the left caudate and both cerebellum hemispheres compared to non-fallers. By using a linear regression analysis no association was found between gait variability and related brain structures in the total cohort and non-fallers group. However, the analysis found an association between the left hippocampus and left putamen volumes with step time variability in the faller group; p=0.031, 0.048, respectively, controlling for total cranial volume, walking speed, disability, age and gender. Nevertheless, according to the hierarchical regression model, the contribution of these brain measures to predict gait variability was relatively small compared to walking speed. An association between low left hippocampal, putamen volumes and step time variability was found in PwMS with a history of falls, suggesting brain structural characteristics may be related to falls and increased gait variability in PwMS. This article is protected by copyright. All rights reserved. This article is protected by copyright. All rights reserved.
Krishan, Kewal; Kanchan, Tanuj; Sharma, Abhilasha
2012-05-01
Estimation of stature is an important parameter in identification of human remains in forensic examinations. The present study is aimed to compare the reliability and accuracy of stature estimation and to demonstrate the variability in estimated stature and actual stature using multiplication factor and regression analysis methods. The study is based on a sample of 246 subjects (123 males and 123 females) from North India aged between 17 and 20 years. Four anthropometric measurements; hand length, hand breadth, foot length and foot breadth taken on the left side in each subject were included in the study. Stature was measured using standard anthropometric techniques. Multiplication factors were calculated and linear regression models were derived for estimation of stature from hand and foot dimensions. Derived multiplication factors and regression formula were applied to the hand and foot measurements in the study sample. The estimated stature from the multiplication factors and regression analysis was compared with the actual stature to find the error in estimated stature. The results indicate that the range of error in estimation of stature from regression analysis method is less than that of multiplication factor method thus, confirming that the regression analysis method is better than multiplication factor analysis in stature estimation. Copyright © 2012 Elsevier Ltd and Faculty of Forensic and Legal Medicine. All rights reserved.
DOE Office of Scientific and Technical Information (OSTI.GOV)
Penna, M.L.; Duchiade, M.P.
The authors report the results of an investigation into the possible association between air pollution and infant mortality from pneumonia in the Rio de Janeiro Metropolitan Area. This investigation employed multiple linear regression analysis (stepwise method) for infant mortality from pneumonia in 1980, including the study population's areas of residence, incomes, and pollution exposure as independent variables. With the income variable included in the regression, a statistically significant association was observed between the average annual level of particulates and infant mortality from pneumonia. While this finding should be accepted with caution, it does suggest a biological association between these variables.more » The authors' conclusion is that air quality indicators should be included in studies of acute respiratory infections in developing countries.« less
Improving Space Project Cost Estimating with Engineering Management Variables
NASA Technical Reports Server (NTRS)
Hamaker, Joseph W.; Roth, Axel (Technical Monitor)
2001-01-01
Current space project cost models attempt to predict space flight project cost via regression equations, which relate the cost of projects to technical performance metrics (e.g. weight, thrust, power, pointing accuracy, etc.). This paper examines the introduction of engineering management parameters to the set of explanatory variables. A number of specific engineering management variables are considered and exploratory regression analysis is performed to determine if there is statistical evidence for cost effects apart from technical aspects of the projects. It is concluded that there are other non-technical effects at work and that further research is warranted to determine if it can be shown that these cost effects are definitely related to engineering management.
[Associated factors in newborns with intrauterine growth retardation].
Thompson-Chagoyán, Oscar C; Vega-Franco, Leopoldo
2008-01-01
To identify the risk factors implicated in the intrauterine growth retardation (IUGR) of neonates born in a social security institution. Case controls design study in 376 neonates: 188 with IUGR (weight < 10 percentile) and 188 without IUGR. When they born, information about 30 variables of risk for IUGR were obtained from mothers. Risk analysis and logistical regression (stepwise) were used. Odds ratios were significant for 12 of the variables. The model obtains by stepwise regression included: weight gain at pregnancy, prenatal care attendance, toxemia, chocolate ingestion, father's weight, and the environmental house. Must of the variables included in the model are related to socioeconomic disadvantages related to the risk of RCIU in the population.
NASA Astrophysics Data System (ADS)
García-Rodríguez, M. J.; Malpica, J. A.; Benito, B.; Díaz, M.
2008-03-01
This work has evaluated the probability of earthquake-triggered landslide occurrence in the whole of El Salvador, with a Geographic Information System (GIS) and a logistic regression model. Slope gradient, elevation, aspect, mean annual precipitation, lithology, land use, and terrain roughness are the predictor variables used to determine the dependent variable of occurrence or non-occurrence of landslides within an individual grid cell. The results illustrate the importance of terrain roughness and soil type as key factors within the model — using only these two variables the analysis returned a significance level of 89.4%. The results obtained from the model within the GIS were then used to produce a map of relative landslide susceptibility.
An Analysis of San Diego's Housing Market Using a Geographically Weighted Regression Approach
NASA Astrophysics Data System (ADS)
Grant, Christina P.
San Diego County real estate transaction data was evaluated with a set of linear models calibrated by ordinary least squares and geographically weighted regression (GWR). The goal of the analysis was to determine whether the spatial effects assumed to be in the data are best studied globally with no spatial terms, globally with a fixed effects submarket variable, or locally with GWR. 18,050 single-family residential sales which closed in the six months between April 2014 and September 2014 were used in the analysis. Diagnostic statistics including AICc, R2, Global Moran's I, and visual inspection of diagnostic plots and maps indicate superior model performance by GWR as compared to both global regressions.
Kronholm, Scott C.; Capel, Paul D.; Terziotti, Silvia
2016-01-01
Accurate estimation of total nitrogen loads is essential for evaluating conditions in the aquatic environment. Extrapolation of estimates beyond measured streams will greatly expand our understanding of total nitrogen loading to streams. Recursive partitioning and random forest regression were used to assess 85 geospatial, environmental, and watershed variables across 636 small (<585 km2) watersheds to determine which variables are fundamentally important to the estimation of annual loads of total nitrogen. Initial analysis led to the splitting of watersheds into three groups based on predominant land use (agricultural, developed, and undeveloped). Nitrogen application, agricultural and developed land area, and impervious or developed land in the 100-m stream buffer were commonly extracted variables by both recursive partitioning and random forest regression. A series of multiple linear regression equations utilizing the extracted variables were created and applied to the watersheds. As few as three variables explained as much as 76 % of the variability in total nitrogen loads for watersheds with predominantly agricultural land use. Catchment-scale national maps were generated to visualize the total nitrogen loads and yields across the USA. The estimates provided by these models can inform water managers and help identify areas where more in-depth monitoring may be beneficial.
NASA Astrophysics Data System (ADS)
Lee, Jangho; Kim, Kwang-Yul
2018-02-01
CSEOF analysis is applied for the springtime (March, April, May) daily PM10 concentrations measured at 23 Ministry of Environment stations in Seoul, Korea for the period of 2003-2012. Six meteorological variables at 12 pressure levels are also acquired from the ERA Interim reanalysis datasets. CSEOF analysis is conducted for each meteorological variable over East Asia. Regression analysis is conducted in CSEOF space between the PM10 concentrations and individual meteorological variables to identify associated atmospheric conditions for each CSEOF mode. By adding the regressed loading vectors with the mean meteorological fields, the daily atmospheric conditions are obtained for the first five CSEOF modes. Then, HYSPLIT model is run with the atmospheric conditions for each CSEOF mode in order to back trace the air parcels and dust reaching Seoul. The K-means clustering algorithm is applied to identify major source regions for each CSEOF mode of the PM10 concentrations in Seoul. Three main source regions identified based on the mean fields are: (1) northern Taklamakan Desert (NTD), (2) Gobi Desert and (GD), and (3) East China industrial area (ECI). The main source regions for the mean meteorological fields are consistent with those of previous study; 41% of the source locations are located in GD followed by ECI (37%) and NTD (21%). Back trajectory calculations based on CSEOF analysis of meteorological variables identify distinct source characteristics associated with each CSEOF mode and greatly facilitate the interpretation of the PM10 variability in Seoul in terms of transportation route and meteorological conditions including the source area.
Fish assemblages at 16 sites in the upper French Broad River basin, North Carolina were related to environmental variables using detrended correspondence analysis (DCA) and linear regression. This study was conducted at the landscape scale because regional variables are controlle...
Directional Dependence in Developmental Research
ERIC Educational Resources Information Center
von Eye, Alexander; DeShon, Richard P.
2012-01-01
In this article, we discuss and propose methods that may be of use to determine direction of dependence in non-normally distributed variables. First, it is shown that standard regression analysis is unable to distinguish between explanatory and response variables. Then, skewness and kurtosis are discussed as tools to assess deviation from…
Centering Effects in HLM Level-1 Predictor Variables.
ERIC Educational Resources Information Center
Schumacker, Randall E.; Bembry, Karen
Research has suggested that important research questions can be addressed with meaningful interpretations using hierarchical linear modeling (HLM). The proper interpretation of results, however, is invariably linked to the choice of centering for the Level-1 predictor variables that produce the outcome measure for the Level-2 regression analysis.…
Predicting Adaptive Functioning of Mentally Retarded Persons in Community Settings.
ERIC Educational Resources Information Center
Hull, John T.; Thompson, Joy C.
1980-01-01
The impact of a variety of individual, residential, and community variables on adaptive functioning of 369 retarded persons (18 to 73 years old) was examined using a multiple regression analysis. Individual characteristics (especially IQ) accounted for 21 percent of the variance, while environmental variables, primarily those related to…
Brabant, Marie-Eve; Hébert, Martine; Chagnon, François
2013-01-01
This study explored the clinical profiles of 77 female teenager survivors of sexual abuse and examined the association of abuse-related and personal variables with suicidal ideations. Analyses revealed that 64% of participants experienced suicidal ideations. Findings from classification and regression tree analysis indicated that depression, posttraumatic stress symptoms, and hopelessness discriminated profiles of suicidal and nonsuicidal survivors. The elevated prevalence of suicidal ideations among adolescent survivors of sexual abuse underscores the importance of investigating the presence of suicidal ideations in sexual abuse survivors. However, suicidal ideation is not the sole variable that needs to be investigated; depression, hopelessness and posttraumatic stress symptoms are also related to suicidal ideations in survivors and could therefore guide interventions.
ERIC Educational Resources Information Center
Games, Paul A.
1975-01-01
A brief introduction is presented on how multiple regression and linear model techniques can handle data analysis situations that most educators and psychologists think of as appropriate for analysis of variance. (Author/BJG)
Sawamoto, Ryoko; Nozaki, Takehiro; Furukawa, Tomokazu; Tanahashi, Tokusei; Morita, Chihiro; Hata, Tomokazu; Komaki, Gen; Sudo, Nobuyuki
2016-01-01
To investigate predictors of dropout from a group cognitive behavioral therapy (CBT) intervention for overweight or obese women. 119 overweight and obese Japanese women aged 25-65 years who attended an outpatient weight loss intervention were followed throughout the 7-month weight loss phase. Somatic characteristics, socioeconomic status, obesity-related diseases, diet and exercise habits, and psychological variables (depression, anxiety, self-esteem, alexithymia, parenting style, perfectionism, and eating attitude) were assessed at baseline. Significant variables, extracted by univariate statistical analysis, were then used as independent variables in a stepwise multiple logistic regression analysis with dropout as the dependent variable. 90 participants completed the weight loss phase, giving a dropout rate of 24.4%. The multiple logistic regression analysis demonstrated that compared to completers the dropouts had significantly stronger body shape concern, tended to not have jobs, perceived their mothers to be less caring, and were more disorganized in temperament. Of all these factors, the best predictor of dropout was shape concern. Shape concern, job condition, parenting care, and organization predicted dropout from the group CBT weight loss intervention for overweight or obese Japanese women. © 2016 S. Karger GmbH, Freiburg.
Sawamoto, Ryoko; Nozaki, Takehiro; Furukawa, Tomokazu; Tanahashi, Tokusei; Morita, Chihiro; Hata, Tomokazu; Komaki, Gen; Sudo, Nobuyuki
2016-01-01
Objective To investigate predictors of dropout from a group cognitive behavioral therapy (CBT) intervention for overweight or obese women. Methods 119 overweight and obese Japanese women aged 25-65 years who attended an outpatient weight loss intervention were followed throughout the 7-month weight loss phase. Somatic characteristics, socioeconomic status, obesity-related diseases, diet and exercise habits, and psychological variables (depression, anxiety, self-esteem, alexithymia, parenting style, perfectionism, and eating attitude) were assessed at baseline. Significant variables, extracted by univariate statistical analysis, were then used as independent variables in a stepwise multiple logistic regression analysis with dropout as the dependent variable. Results 90 participants completed the weight loss phase, giving a dropout rate of 24.4%. The multiple logistic regression analysis demonstrated that compared to completers the dropouts had significantly stronger body shape concern, tended to not have jobs, perceived their mothers to be less caring, and were more disorganized in temperament. Of all these factors, the best predictor of dropout was shape concern. Conclusion Shape concern, job condition, parenting care, and organization predicted dropout from the group CBT weight loss intervention for overweight or obese Japanese women. PMID:26745715
NASA Astrophysics Data System (ADS)
Stigter, T. Y.; Ribeiro, L.; Dill, A. M. M. Carvalho
2008-07-01
SummaryFactorial regression models, based on correspondence analysis, are built to explain the high nitrate concentrations in groundwater beneath an agricultural area in the south of Portugal, exceeding 300 mg/l, as a function of chemical variables, electrical conductivity (EC), land use and hydrogeological setting. Two important advantages of the proposed methodology are that qualitative parameters can be involved in the regression analysis and that multicollinearity is avoided. Regression is performed on eigenvectors extracted from the data similarity matrix, the first of which clearly reveals the impact of agricultural practices and hydrogeological setting on the groundwater chemistry of the study area. Significant correlation exists between response variable NO3- and explanatory variables Ca 2+, Cl -, SO42-, depth to water, aquifer media and land use. Substituting Cl - by the EC results in the most accurate regression model for nitrate, when disregarding the four largest outliers (model A). When built solely on land use and hydrogeological setting, the regression model (model B) is less accurate but more interesting from a practical viewpoint, as it is based on easily obtainable data and can be used to predict nitrate concentrations in groundwater in other areas with similar conditions. This is particularly useful for conservative contaminants, where risk and vulnerability assessment methods, based on assumed rather than established correlations, generally produce erroneous results. Another purpose of the models can be to predict the future evolution of nitrate concentrations under influence of changes in land use or fertilization practices, which occur in compliance with policies such as the Nitrates Directive. Model B predicts a 40% decrease in nitrate concentrations in groundwater of the study area, when horticulture is replaced by other land use with much lower fertilization and irrigation rates.
Brennan, Angela K.; Cross, Paul C.; Creely, Scott
2015-01-01
Synthesis and applications. Our analysis of elk group size distributions using quantile regression suggests that private land, irrigation, open habitat, elk density and wolf abundance can affect large elk group sizes. Thus, to manage larger groups by removal or dispersal of individuals, we recommend incentivizing hunting on private land (particularly if irrigated) during the regular and late hunting seasons, promoting tolerance of wolves on private land (if elk aggregate in these areas to avoid wolves) and creating more winter range and varied habitats. Relationships to the variables of interest also differed by quantile, highlighting the importance of using quantile regression to examine response variables more completely to uncover relationships important to conservation and management.
Factors Controlling Sediment Load in The Central Anatolia Region of Turkey: Ankara River Basin.
Duru, Umit; Wohl, Ellen; Ahmadi, Mehdi
2017-05-01
Better understanding of the factors controlling sediment load at a catchment scale can facilitate estimation of soil erosion and sediment transport rates. The research summarized here enhances understanding of correlations between potential control variables on suspended sediment loads. The Soil and Water Assessment Tool was used to simulate flow and sediment at the Ankara River basin. Multivariable regression analysis and principal component analysis were then performed between sediment load and controlling variables. The physical variables were either directly derived from a Digital Elevation Model or from field maps or computed using established equations. Mean observed sediment rate is 6697 ton/year and mean sediment yield is 21 ton/y/km² from the gage. Soil and Water Assessment Tool satisfactorily simulated observed sediment load with Nash-Sutcliffe efficiency, relative error, and coefficient of determination (R²) values of 0.81, -1.55, and 0.93, respectively in the catchment. Therefore, parameter values from the physically based model were applied to the multivariable regression analysis as well as principal component analysis. The results indicate that stream flow, drainage area, and channel width explain most of the variability in sediment load among the catchments. The implications of the results, efficient siltation management practices in the catchment should be performed to stream flow, drainage area, and channel width.
Factors Controlling Sediment Load in The Central Anatolia Region of Turkey: Ankara River Basin
NASA Astrophysics Data System (ADS)
Duru, Umit; Wohl, Ellen; Ahmadi, Mehdi
2017-05-01
Better understanding of the factors controlling sediment load at a catchment scale can facilitate estimation of soil erosion and sediment transport rates. The research summarized here enhances understanding of correlations between potential control variables on suspended sediment loads. The Soil and Water Assessment Tool was used to simulate flow and sediment at the Ankara River basin. Multivariable regression analysis and principal component analysis were then performed between sediment load and controlling variables. The physical variables were either directly derived from a Digital Elevation Model or from field maps or computed using established equations. Mean observed sediment rate is 6697 ton/year and mean sediment yield is 21 ton/y/km² from the gage. Soil and Water Assessment Tool satisfactorily simulated observed sediment load with Nash-Sutcliffe efficiency, relative error, and coefficient of determination ( R²) values of 0.81, -1.55, and 0.93, respectively in the catchment. Therefore, parameter values from the physically based model were applied to the multivariable regression analysis as well as principal component analysis. The results indicate that stream flow, drainage area, and channel width explain most of the variability in sediment load among the catchments. The implications of the results, efficient siltation management practices in the catchment should be performed to stream flow, drainage area, and channel width.
Sparse modeling of spatial environmental variables associated with asthma
Chang, Timothy S.; Gangnon, Ronald E.; Page, C. David; Buckingham, William R.; Tandias, Aman; Cowan, Kelly J.; Tomasallo, Carrie D.; Arndt, Brian G.; Hanrahan, Lawrence P.; Guilbert, Theresa W.
2014-01-01
Geographically distributed environmental factors influence the burden of diseases such as asthma. Our objective was to identify sparse environmental variables associated with asthma diagnosis gathered from a large electronic health record (EHR) dataset while controlling for spatial variation. An EHR dataset from the University of Wisconsin’s Family Medicine, Internal Medicine and Pediatrics Departments was obtained for 199,220 patients aged 5–50 years over a three-year period. Each patient’s home address was geocoded to one of 3,456 geographic census block groups. Over one thousand block group variables were obtained from a commercial database. We developed a Sparse Spatial Environmental Analysis (SASEA). Using this method, the environmental variables were first dimensionally reduced with sparse principal component analysis. Logistic thin plate regression spline modeling was then used to identify block group variables associated with asthma from sparse principal components. The addresses of patients from the EHR dataset were distributed throughout the majority of Wisconsin’s geography. Logistic thin plate regression spline modeling captured spatial variation of asthma. Four sparse principal components identified via model selection consisted of food at home, dog ownership, household size, and disposable income variables. In rural areas, dog ownership and renter occupied housing units from significant sparse principal components were associated with asthma. Our main contribution is the incorporation of sparsity in spatial modeling. SASEA sequentially added sparse principal components to Logistic thin plate regression spline modeling. This method allowed association of geographically distributed environmental factors with asthma using EHR and environmental datasets. SASEA can be applied to other diseases with environmental risk factors. PMID:25533437
Sparse modeling of spatial environmental variables associated with asthma.
Chang, Timothy S; Gangnon, Ronald E; David Page, C; Buckingham, William R; Tandias, Aman; Cowan, Kelly J; Tomasallo, Carrie D; Arndt, Brian G; Hanrahan, Lawrence P; Guilbert, Theresa W
2015-02-01
Geographically distributed environmental factors influence the burden of diseases such as asthma. Our objective was to identify sparse environmental variables associated with asthma diagnosis gathered from a large electronic health record (EHR) dataset while controlling for spatial variation. An EHR dataset from the University of Wisconsin's Family Medicine, Internal Medicine and Pediatrics Departments was obtained for 199,220 patients aged 5-50years over a three-year period. Each patient's home address was geocoded to one of 3456 geographic census block groups. Over one thousand block group variables were obtained from a commercial database. We developed a Sparse Spatial Environmental Analysis (SASEA). Using this method, the environmental variables were first dimensionally reduced with sparse principal component analysis. Logistic thin plate regression spline modeling was then used to identify block group variables associated with asthma from sparse principal components. The addresses of patients from the EHR dataset were distributed throughout the majority of Wisconsin's geography. Logistic thin plate regression spline modeling captured spatial variation of asthma. Four sparse principal components identified via model selection consisted of food at home, dog ownership, household size, and disposable income variables. In rural areas, dog ownership and renter occupied housing units from significant sparse principal components were associated with asthma. Our main contribution is the incorporation of sparsity in spatial modeling. SASEA sequentially added sparse principal components to Logistic thin plate regression spline modeling. This method allowed association of geographically distributed environmental factors with asthma using EHR and environmental datasets. SASEA can be applied to other diseases with environmental risk factors. Copyright © 2014 Elsevier Inc. All rights reserved.
Explaining the heterogeneous scrapie surveillance figures across Europe: a meta-regression approach.
Del Rio Vilas, Victor J; Hopp, Petter; Nunes, Telmo; Ru, Giuseppe; Sivam, Kumar; Ortiz-Pelaez, Angel
2007-06-28
Two annual surveys, the abattoir and the fallen stock, monitor the presence of scrapie across Europe. A simple comparison between the prevalence estimates in different countries reveals that, in 2003, the abattoir survey appears to detect more scrapie in some countries. This is contrary to evidence suggesting the greater ability of the fallen stock survey to detect the disease. We applied meta-analysis techniques to study this apparent heterogeneity in the behaviour of the surveys across Europe. Furthermore, we conducted a meta-regression analysis to assess the effect of country-specific characteristics on the variability. We have chosen the odds ratios between the two surveys to inform the underlying relationship between them and to allow comparisons between the countries under the meta-regression framework. Baseline risks, those of the slaughtered populations across Europe, and country-specific covariates, available from the European Commission Report, were inputted in the model to explain the heterogeneity. Our results show the presence of significant heterogeneity in the odds ratios between countries and no reduction in the variability after adjustment for the different risks in the baseline populations. Three countries contributed the most to the overall heterogeneity: Germany, Ireland and The Netherlands. The inclusion of country-specific covariates did not, in general, reduce the variability except for one variable: the proportion of the total adult sheep population sampled as fallen stock by each country. A large residual heterogeneity remained in the model indicating the presence of substantial effect variability between countries. The meta-analysis approach was useful to assess the level of heterogeneity in the implementation of the surveys and to explore the reasons for the variation between countries.
Byun, Bo-Ram; Kim, Yong-Il; Yamaguchi, Tetsutaro; Maki, Koutaro; Son, Woo-Sung
2015-01-01
This study was aimed to examine the correlation between skeletal maturation status and parameters from the odontoid process/body of the second vertebra and the bodies of third and fourth cervical vertebrae and simultaneously build multiple regression models to be able to estimate skeletal maturation status in Korean girls. Hand-wrist radiographs and cone beam computed tomography (CBCT) images were obtained from 74 Korean girls (6-18 years of age). CBCT-generated cervical vertebral maturation (CVM) was used to demarcate the odontoid process and the body of the second cervical vertebra, based on the dentocentral synchondrosis. Correlation coefficient analysis and multiple linear regression analysis were used for each parameter of the cervical vertebrae (P < 0.05). Forty-seven of 64 parameters from CBCT-generated CVM (independent variables) exhibited statistically significant correlations (P < 0.05). The multiple regression model with the greatest R (2) had six parameters (PH2/W2, UW2/W2, (OH+AH2)/LW2, UW3/LW3, D3, and H4/W4) as independent variables with a variance inflation factor (VIF) of <2. CBCT-generated CVM was able to include parameters from the second cervical vertebral body and odontoid process, respectively, for the multiple regression models. This suggests that quantitative analysis might be used to estimate skeletal maturation status.
Lannau, B; Van Geyt, C; Van Maele, G; Beele, H
2015-03-01
During the procurement of musculoskeletal grafts contamination may occur. As this might be detrimental for the acceptor, it is important to know which variables influence this occurrence and to alter procurement protocols accordingly. From 2004 to 2012 we gathered information on 6,428 allografts obtained from 291 donors. Using a multiple regression model we attempted to determine the factors that influence the contamination risk during procurement. We used the following variables: cause of death, type of hospital (i.e. university hospital vs. general hospital), previous blood vessel donation, previous organ donation, donor age, time between death and the start of the procurement, duration of the procurement, number of people attending the procurement and the number of procured grafts. The multiple regression model was only able to explain 5 % of the variability of the used outcome variable. None of the variables examined appear to have an important influence on the contamination risk.
Deng, Yingyuan; Wang, Tianfu; Chen, Siping; Liu, Weixiang
2017-01-01
The aim of the study is to screen the significant sonographic features by logistic regression analysis and fit a model to diagnose thyroid nodules. A total of 525 pathological thyroid nodules were retrospectively analyzed. All the nodules underwent conventional ultrasonography (US), strain elastosonography (SE), and contrast -enhanced ultrasound (CEUS). Those nodules’ 12 suspicious sonographic features were used to assess thyroid nodules. The significant features of diagnosing thyroid nodules were picked out by logistic regression analysis. All variables that were statistically related to diagnosis of thyroid nodules, at a level of p < 0.05 were embodied in a logistic regression analysis model. The significant features in the logistic regression model of diagnosing thyroid nodules were calcification, suspected cervical lymph node metastasis, hypoenhancement pattern, margin, shape, vascularity, posterior acoustic, echogenicity, and elastography score. According to the results of logistic regression analysis, the formula that could predict whether or not thyroid nodules are malignant was established. The area under the receiver operating curve (ROC) was 0.930 and the sensitivity, specificity, accuracy, positive predictive value, and negative predictive value were 83.77%, 89.56%, 87.05%, 86.04%, and 87.79% respectively. PMID:29228030
Pang, Tiantian; Huang, Leidan; Deng, Yingyuan; Wang, Tianfu; Chen, Siping; Gong, Xuehao; Liu, Weixiang
2017-01-01
The aim of the study is to screen the significant sonographic features by logistic regression analysis and fit a model to diagnose thyroid nodules. A total of 525 pathological thyroid nodules were retrospectively analyzed. All the nodules underwent conventional ultrasonography (US), strain elastosonography (SE), and contrast -enhanced ultrasound (CEUS). Those nodules' 12 suspicious sonographic features were used to assess thyroid nodules. The significant features of diagnosing thyroid nodules were picked out by logistic regression analysis. All variables that were statistically related to diagnosis of thyroid nodules, at a level of p < 0.05 were embodied in a logistic regression analysis model. The significant features in the logistic regression model of diagnosing thyroid nodules were calcification, suspected cervical lymph node metastasis, hypoenhancement pattern, margin, shape, vascularity, posterior acoustic, echogenicity, and elastography score. According to the results of logistic regression analysis, the formula that could predict whether or not thyroid nodules are malignant was established. The area under the receiver operating curve (ROC) was 0.930 and the sensitivity, specificity, accuracy, positive predictive value, and negative predictive value were 83.77%, 89.56%, 87.05%, 86.04%, and 87.79% respectively.
ERIC Educational Resources Information Center
Strang, Kenneth David
2009-01-01
This paper discusses how a seldom-used statistical procedure, recursive regression (RR), can numerically and graphically illustrate data-driven nonlinear relationships and interaction of variables. This routine falls into the family of exploratory techniques, yet a few interesting features make it a valuable compliment to factor analysis and…
ERIC Educational Resources Information Center
Luna, Andrew L.; Brennan, Kelly A.
2009-01-01
This study uses a regression model to determine if a significant difference exists between the actual budget allocation that an academic department received and the model's predicted budget allocation for that same department. Budget data from a Southeastern Master's/Comprehensive state university were used as the dependent variable, and the…
Multiple linear regression analysis
NASA Technical Reports Server (NTRS)
Edwards, T. R.
1980-01-01
Program rapidly selects best-suited set of coefficients. User supplies only vectors of independent and dependent data and specifies confidence level required. Program uses stepwise statistical procedure for relating minimal set of variables to set of observations; final regression contains only most statistically significant coefficients. Program is written in FORTRAN IV for batch execution and has been implemented on NOVA 1200.
Forest type mapping of the Interior West
Bonnie Ruefenacht; Gretchen G. Moisen; Jock A. Blackard
2004-01-01
This paper develops techniques for the mapping of forest types in Arizona, New Mexico, and Wyoming. The methods involve regression-tree modeling using a variety of remote sensing and GIS layers along with Forest Inventory Analysis (FIA) point data. Regression-tree modeling is a fast and efficient technique of estimating variables for large data sets with high accuracy...
ERIC Educational Resources Information Center
Beauducel, Andre
2007-01-01
It was investigated whether commonly used factor score estimates lead to the same reproduced covariance matrix of observed variables. This was achieved by means of Schonemann and Steiger's (1976) regression component analysis, since it is possible to compute the reproduced covariance matrices of the regression components corresponding to different…
Handling nonnormality and variance heterogeneity for quantitative sublethal toxicity tests.
Ritz, Christian; Van der Vliet, Leana
2009-09-01
The advantages of using regression-based techniques to derive endpoints from environmental toxicity data are clear, and slowly, this superior analytical technique is gaining acceptance. As use of regression-based analysis becomes more widespread, some of the associated nuances and potential problems come into sharper focus. Looking at data sets that cover a broad spectrum of standard test species, we noticed that some model fits to data failed to meet two key assumptions-variance homogeneity and normality-that are necessary for correct statistical analysis via regression-based techniques. Failure to meet these assumptions often is caused by reduced variance at the concentrations showing severe adverse effects. Although commonly used with linear regression analysis, transformation of the response variable only is not appropriate when fitting data using nonlinear regression techniques. Through analysis of sample data sets, including Lemna minor, Eisenia andrei (terrestrial earthworm), and algae, we show that both the so-called Box-Cox transformation and use of the Poisson distribution can help to correct variance heterogeneity and nonnormality and so allow nonlinear regression analysis to be implemented. Both the Box-Cox transformation and the Poisson distribution can be readily implemented into existing protocols for statistical analysis. By correcting for nonnormality and variance heterogeneity, these two statistical tools can be used to encourage the transition to regression-based analysis and the depreciation of less-desirable and less-flexible analytical techniques, such as linear interpolation.
NASA Astrophysics Data System (ADS)
Ronsmans, Gaétane; Wespes, Catherine; Hurtmans, Daniel; Clerbaux, Cathy; Coheur, Pierre-François
2018-04-01
This study aims to understand the spatial and temporal variability of HNO3 total columns in terms of explanatory variables. To achieve this, multiple linear regressions are used to fit satellite-derived time series of HNO3 daily averaged total columns. First, an analysis of the IASI 9-year time series (2008-2016) is conducted based on various equivalent latitude bands. The strong and systematic denitrification of the southern polar stratosphere is observed very clearly. It is also possible to distinguish, within the polar vortex, three regions which are differently affected by the denitrification. Three exceptional denitrification episodes in 2011, 2014 and 2016 are also observed in the Northern Hemisphere, due to unusually low arctic temperatures. The time series are then fitted by multivariate regressions to identify what variables are responsible for HNO3 variability in global distributions and time series, and to quantify their respective influence. Out of an ensemble of proxies (annual cycle, solar flux, quasi-biennial oscillation, multivariate ENSO index, Arctic and Antarctic oscillations and volume of polar stratospheric clouds), only the those defined as significant (p value < 0.05) by a selection algorithm are retained for each equivalent latitude band. Overall, the regression gives a good representation of HNO3 variability, with especially good results at high latitudes (60-80 % of the observed variability explained by the model). The regressions show the dominance of annual variability in all latitudinal bands, which is related to specific chemistry and dynamics depending on the latitudes. We find that the polar stratospheric clouds (PSCs) also have a major influence in the polar regions, and that their inclusion in the model improves the correlation coefficients and the residuals. However, there is still a relatively large portion of HNO3 variability that remains unexplained by the model, especially in the intertropical regions, where factors not included in the regression model (such as vegetation fires or lightning) may be at play.
Anxiety, Affect, Self-Esteem, and Stress: Mediation and Moderation Effects on Depression
Nima, Ali Al; Rosenberg, Patricia; Archer, Trevor; Garcia, Danilo
2013-01-01
Background Mediation analysis investigates whether a variable (i.e., mediator) changes in regard to an independent variable, in turn, affecting a dependent variable. Moderation analysis, on the other hand, investigates whether the statistical interaction between independent variables predict a dependent variable. Although this difference between these two types of analysis is explicit in current literature, there is still confusion with regard to the mediating and moderating effects of different variables on depression. The purpose of this study was to assess the mediating and moderating effects of anxiety, stress, positive affect, and negative affect on depression. Methods Two hundred and two university students (males = 93, females = 113) completed questionnaires assessing anxiety, stress, self-esteem, positive and negative affect, and depression. Mediation and moderation analyses were conducted using techniques based on standard multiple regression and hierarchical regression analyses. Main Findings The results indicated that (i) anxiety partially mediated the effects of both stress and self-esteem upon depression, (ii) that stress partially mediated the effects of anxiety and positive affect upon depression, (iii) that stress completely mediated the effects of self-esteem on depression, and (iv) that there was a significant interaction between stress and negative affect, and between positive affect and negative affect upon depression. Conclusion The study highlights different research questions that can be investigated depending on whether researchers decide to use the same variables as mediators and/or moderators. PMID:24039896
Identification of extremely premature infants at high risk of rehospitalization.
Ambalavanan, Namasivayam; Carlo, Waldemar A; McDonald, Scott A; Yao, Qing; Das, Abhik; Higgins, Rosemary D
2011-11-01
Extremely low birth weight infants often require rehospitalization during infancy. Our objective was to identify at the time of discharge which extremely low birth weight infants are at higher risk for rehospitalization. Data from extremely low birth weight infants in Eunice Kennedy Shriver National Institute of Child Health and Human Development Neonatal Research Network centers from 2002-2005 were analyzed. The primary outcome was rehospitalization by the 18- to 22-month follow-up, and secondary outcome was rehospitalization for respiratory causes in the first year. Using variables and odds ratios identified by stepwise logistic regression, scoring systems were developed with scores proportional to odds ratios. Classification and regression-tree analysis was performed by recursive partitioning and automatic selection of optimal cutoff points of variables. A total of 3787 infants were evaluated (mean ± SD birth weight: 787 ± 136 g; gestational age: 26 ± 2 weeks; 48% male, 42% black). Forty-five percent of the infants were rehospitalized by 18 to 22 months; 14.7% were rehospitalized for respiratory causes in the first year. Both regression models (area under the curve: 0.63) and classification and regression-tree models (mean misclassification rate: 40%-42%) were moderately accurate. Predictors for the primary outcome by regression were shunt surgery for hydrocephalus, hospital stay of >120 days for pulmonary reasons, necrotizing enterocolitis stage II or higher or spontaneous gastrointestinal perforation, higher fraction of inspired oxygen at 36 weeks, and male gender. By classification and regression-tree analysis, infants with hospital stays of >120 days for pulmonary reasons had a 66% rehospitalization rate compared with 42% without such a stay. The scoring systems and classification and regression-tree analysis models identified infants at higher risk of rehospitalization and might assist planning for care after discharge.
Identification of Extremely Premature Infants at High Risk of Rehospitalization
Carlo, Waldemar A.; McDonald, Scott A.; Yao, Qing; Das, Abhik; Higgins, Rosemary D.
2011-01-01
OBJECTIVE: Extremely low birth weight infants often require rehospitalization during infancy. Our objective was to identify at the time of discharge which extremely low birth weight infants are at higher risk for rehospitalization. METHODS: Data from extremely low birth weight infants in Eunice Kennedy Shriver National Institute of Child Health and Human Development Neonatal Research Network centers from 2002–2005 were analyzed. The primary outcome was rehospitalization by the 18- to 22-month follow-up, and secondary outcome was rehospitalization for respiratory causes in the first year. Using variables and odds ratios identified by stepwise logistic regression, scoring systems were developed with scores proportional to odds ratios. Classification and regression-tree analysis was performed by recursive partitioning and automatic selection of optimal cutoff points of variables. RESULTS: A total of 3787 infants were evaluated (mean ± SD birth weight: 787 ± 136 g; gestational age: 26 ± 2 weeks; 48% male, 42% black). Forty-five percent of the infants were rehospitalized by 18 to 22 months; 14.7% were rehospitalized for respiratory causes in the first year. Both regression models (area under the curve: 0.63) and classification and regression-tree models (mean misclassification rate: 40%–42%) were moderately accurate. Predictors for the primary outcome by regression were shunt surgery for hydrocephalus, hospital stay of >120 days for pulmonary reasons, necrotizing enterocolitis stage II or higher or spontaneous gastrointestinal perforation, higher fraction of inspired oxygen at 36 weeks, and male gender. By classification and regression-tree analysis, infants with hospital stays of >120 days for pulmonary reasons had a 66% rehospitalization rate compared with 42% without such a stay. CONCLUSIONS: The scoring systems and classification and regression-tree analysis models identified infants at higher risk of rehospitalization and might assist planning for care after discharge. PMID:22007016
NASA Astrophysics Data System (ADS)
Madhu, B.; Ashok, N. C.; Balasubramanian, S.
2014-11-01
Multinomial logistic regression analysis was used to develop statistical model that can predict the probability of breast cancer in Southern Karnataka using the breast cancer occurrence data during 2007-2011. Independent socio-economic variables describing the breast cancer occurrence like age, education, occupation, parity, type of family, health insurance coverage, residential locality and socioeconomic status of each case was obtained. The models were developed as follows: i) Spatial visualization of the Urban- rural distribution of breast cancer cases that were obtained from the Bharat Hospital and Institute of Oncology. ii) Socio-economic risk factors describing the breast cancer occurrences were complied for each case. These data were then analysed using multinomial logistic regression analysis in a SPSS statistical software and relations between the occurrence of breast cancer across the socio-economic status and the influence of other socio-economic variables were evaluated and multinomial logistic regression models were constructed. iii) the model that best predicted the occurrence of breast cancer were identified. This multivariate logistic regression model has been entered into a geographic information system and maps showing the predicted probability of breast cancer occurrence in Southern Karnataka was created. This study demonstrates that Multinomial logistic regression is a valuable tool for developing models that predict the probability of breast cancer Occurrence in Southern Karnataka.
Unitary Response Regression Models
ERIC Educational Resources Information Center
Lipovetsky, S.
2007-01-01
The dependent variable in a regular linear regression is a numerical variable, and in a logistic regression it is a binary or categorical variable. In these models the dependent variable has varying values. However, there are problems yielding an identity output of a constant value which can also be modelled in a linear or logistic regression with…
Fossum, Kenneth D.; O'Day, Christie M.; Wilson, Barbara J.; Monical, Jim E.
2001-01-01
Stormwater and streamflow in Maricopa County were monitored to (1) describe the physical, chemical, and toxicity characteristics of stormwater from areas having different land uses, (2) describe the physical, chemical, and toxicity characteristics of streamflow from areas that receive urban stormwater, and (3) estimate constituent loads in stormwater. Urban stormwater and streamflow had similar ranges in most constituent concentrations. The mean concentration of dissolved solids in urban stormwater was lower than in streamflow from the Salt River and Indian Bend Wash. Urban stormwater, however, had a greater chemical oxygen demand and higher concentrations of most nutrients. Mean seasonal loads and mean annual loads of 11 constituents and volumes of runoff were estimated for municipalities in the metropolitan Phoenix area, Arizona, by adjusting regional regression equations of loads. This adjustment procedure uses the original regional regression equation and additional explanatory variables that were not included in the original equation. The adjusted equations had standard errors that ranged from 161 to 196 percent. The large standard errors of the prediction result from the large variability of the constituent concentration data used in the regression analysis. Adjustment procedures produced unsatisfactory results for nine of the regressions?suspended solids, dissolved solids, total phosphorus, dissolved phosphorus, total recoverable cadmium, total recoverable copper, total recoverable lead, total recoverable zinc, and storm runoff. These equations had no consistent direction of bias and no other additional explanatory variables correlated with the observed loads. A stepwise-multiple regression or a three-variable regression (total storm rainfall, drainage area, and impervious area) and local data were used to develop local regression equations for these nine constituents. These equations had standard errors from 15 to 183 percent.
Chen, Chau-Kuang; Bruce, Michelle; Tyler, Lauren; Brown, Claudine; Garrett, Angelica; Goggins, Susan; Lewis-Polite, Brandy; Weriwoh, Mirabel L; Juarez, Paul D.; Hood, Darryl B.; Skelton, Tyler
2014-01-01
The goal of this study was to analyze a 54-item instrument for assessment of perception of exposure to environmental contaminants within the context of the built environment, or exposome. This exposome was defined in five domains to include 1) home and hobby, 2) school, 3) community, 4) occupation, and 5) exposure history. Interviews were conducted with child-bearing-age minority women at Metro Nashville General Hospital at Meharry Medical College. Data were analyzed utilizing DTReg software for Support Vector Machine (SVM) modeling followed by an SPSS package for a logistic regression model. The target (outcome) variable of interest was respondent's residence by ZIP code. The results demonstrate that the rank order of important variables with respect to SVM modeling versus traditional logistic regression models is almost identical. This is the first study documenting that SVM analysis has discriminate power for determination of higher-ordered spatial relationships on an environmental exposure history questionnaire. PMID:23395953
Chen, Chau-Kuang; Bruce, Michelle; Tyler, Lauren; Brown, Claudine; Garrett, Angelica; Goggins, Susan; Lewis-Polite, Brandy; Weriwoh, Mirabel L; Juarez, Paul D; Hood, Darryl B; Skelton, Tyler
2013-02-01
The goal of this study was to analyze a 54-item instrument for assessment of perception of exposure to environmental contaminants within the context of the built environment, or exposome. This exposome was defined in five domains to include 1) home and hobby, 2) school, 3) community, 4) occupation, and 5) exposure history. Interviews were conducted with child-bearing-age minority women at Metro Nashville General Hospital at Meharry Medical College. Data were analyzed utilizing DTReg software for Support Vector Machine (SVM) modeling followed by an SPSS package for a logistic regression model. The target (outcome) variable of interest was respondent's residence by ZIP code. The results demonstrate that the rank order of important variables with respect to SVM modeling versus traditional logistic regression models is almost identical. This is the first study documenting that SVM analysis has discriminate power for determination of higher-ordered spatial relationships on an environmental exposure history questionnaire.
Above-ground biomass of mangrove species. I. Analysis of models
NASA Astrophysics Data System (ADS)
Soares, Mário Luiz Gomes; Schaeffer-Novelli, Yara
2005-10-01
This study analyzes the above-ground biomass of Rhizophora mangle and Laguncularia racemosa located in the mangroves of Bertioga (SP) and Guaratiba (RJ), Southeast Brazil. Its purpose is to determine the best regression model to estimate the total above-ground biomass and compartment (leaves, reproductive parts, twigs, branches, trunk and prop roots) biomass, indirectly. To do this, we used structural measurements such as height, diameter at breast-height (DBH), and crown area. A combination of regression types with several compositions of independent variables generated 2.272 models that were later tested. Subsequent analysis of the models indicated that the biomass of reproductive parts, branches, and prop roots yielded great variability, probably because of environmental factors and seasonality (in the case of reproductive parts). It also indicated the superiority of multiple regression to estimate above-ground biomass as it allows researchers to consider several aspects that affect above-ground biomass, specially the influence of environmental factors. This fact has been attested to the models that estimated the biomass of crown compartments.
Comparison of cranial sex determination by discriminant analysis and logistic regression.
Amores-Ampuero, Anabel; Alemán, Inmaculada
2016-04-05
Various methods have been proposed for estimating dimorphism. The objective of this study was to compare sex determination results from cranial measurements using discriminant analysis or logistic regression. The study sample comprised 130 individuals (70 males) of known sex, age, and cause of death from San José cemetery in Granada (Spain). Measurements of 19 neurocranial dimensions and 11 splanchnocranial dimensions were subjected to discriminant analysis and logistic regression, and the percentages of correct classification were compared between the sex functions obtained with each method. The discriminant capacity of the selected variables was evaluated with a cross-validation procedure. The percentage accuracy with discriminant analysis was 78.2% for the neurocranium (82.4% in females and 74.6% in males) and 73.7% for the splanchnocranium (79.6% in females and 68.8% in males). These percentages were higher with logistic regression analysis: 85.7% for the neurocranium (in both sexes) and 94.1% for the splanchnocranium (100% in females and 91.7% in males).
Flood-frequency prediction methods for unregulated streams of Tennessee, 2000
Law, George S.; Tasker, Gary D.
2003-01-01
Up-to-date flood-frequency prediction methods for unregulated, ungaged rivers and streams of Tennessee have been developed. Prediction methods include the regional-regression method and the newer region-of-influence method. The prediction methods were developed using stream-gage records from unregulated streams draining basins having from 1 percent to about 30 percent total impervious area. These methods, however, should not be used in heavily developed or storm-sewered basins with impervious areas greater than 10 percent. The methods can be used to estimate 2-, 5-, 10-, 25-, 50-, 100-, and 500-year recurrence-interval floods of most unregulated rural streams in Tennessee. A computer application was developed that automates the calculation of flood frequency for unregulated, ungaged rivers and streams of Tennessee. Regional-regression equations were derived by using both single-variable and multivariable regional-regression analysis. Contributing drainage area is the explanatory variable used in the single-variable equations. Contributing drainage area, main-channel slope, and a climate factor are the explanatory variables used in the multivariable equations. Deleted-residual standard error for the single-variable equations ranged from 32 to 65 percent. Deleted-residual standard error for the multivariable equations ranged from 31 to 63 percent. These equations are included in the computer application to allow easy comparison of results produced by the different methods. The region-of-influence method calculates multivariable regression equations for each ungaged site and recurrence interval using basin characteristics from 60 similar sites selected from the study area. Explanatory variables that may be used in regression equations computed by the region-of-influence method include contributing drainage area, main-channel slope, a climate factor, and a physiographic-region factor. Deleted-residual standard error for the region-of-influence method tended to be only slightly smaller than those for the regional-regression method and ranged from 27 to 62 percent.
Advanced microwave soil moisture studies. [Big Sioux River Basin, Iowa
NASA Technical Reports Server (NTRS)
Dalsted, K. J.; Harlan, J. C.
1983-01-01
Comparisons of low level L-band brightness temperature (TB) and thermal infrared (TIR) data as well as the following data sets: soil map and land cover data; direct soil moisture measurement; and a computer generated contour map were statistically evaluated using regression analysis and linear discriminant analysis. Regression analysis of footprint data shows that statistical groupings of ground variables (soil features and land cover) hold promise for qualitative assessment of soil moisture and for reducing variance within the sampling space. Dry conditions appear to be more conductive to producing meaningful statistics than wet conditions. Regression analysis using field averaged TB and TIR data did not approach the higher sq R values obtained using within-field variations. The linear discriminant analysis indicates some capacity to distinguish categories with the results being somewhat better on a field basis than a footprint basis.
Statistical downscaling modeling with quantile regression using lasso to estimate extreme rainfall
NASA Astrophysics Data System (ADS)
Santri, Dewi; Wigena, Aji Hamim; Djuraidah, Anik
2016-02-01
Rainfall is one of the climatic elements with high diversity and has many negative impacts especially extreme rainfall. Therefore, there are several methods that required to minimize the damage that may occur. So far, Global circulation models (GCM) are the best method to forecast global climate changes include extreme rainfall. Statistical downscaling (SD) is a technique to develop the relationship between GCM output as a global-scale independent variables and rainfall as a local- scale response variable. Using GCM method will have many difficulties when assessed against observations because GCM has high dimension and multicollinearity between the variables. The common method that used to handle this problem is principal components analysis (PCA) and partial least squares regression. The new method that can be used is lasso. Lasso has advantages in simultaneuosly controlling the variance of the fitted coefficients and performing automatic variable selection. Quantile regression is a method that can be used to detect extreme rainfall in dry and wet extreme. Objective of this study is modeling SD using quantile regression with lasso to predict extreme rainfall in Indramayu. The results showed that the estimation of extreme rainfall (extreme wet in January, February and December) in Indramayu could be predicted properly by the model at quantile 90th.
A use of regression analysis in acoustical diagnostics of gear drives
NASA Technical Reports Server (NTRS)
Balitskiy, F. Y.; Genkin, M. D.; Ivanova, M. A.; Kobrinskiy, A. A.; Sokolova, A. G.
1973-01-01
A study is presented of components of the vibration spectrum as the filtered first and second harmonics of the tooth frequency which permits information to be obtained on the physical characteristics of the vibration excitation process, and an approach to be made to comparison of models of the gearing. Regression analysis of two random processes has shown a strong dependence of the second harmonic on the first, and independence of the first from the second. The nature of change in the regression line, with change in loading moment, gives rise to the idea of a variable phase shift between the first and second harmonics.
ERIC Educational Resources Information Center
Kapes, Jerome T.; And Others
Three models of multiple regression analysis (MRA): single equation, commonality analysis, and path analysis, were applied to longitudinal data from the Pennsylvania Vocational Development Study. Variables influencing weekly income of vocational education students one year after high school graduation were examined: grade point averages (grades…
2015-10-28
techniques such as regression analysis, correlation, and multicollinearity assessment to identify the change and error on the input to the model...between many of the independent or predictor variables, the issue of multicollinearity may arise [18]. VII. SUMMARY Accurate decisions concerning
NASA Astrophysics Data System (ADS)
Solimun
2017-05-01
The aim of this research is to model survival data from kidney-transplant patients using the partial least squares (PLS)-Cox regression, which can both meet and not meet the no-multicollinearity assumption. The secondary data were obtained from research entitled "Factors affecting the survival of kidney-transplant patients". The research subjects comprised 250 patients. The predictor variables consisted of: age (X1), sex (X2); two categories, prior hemodialysis duration (X3), diabetes (X4); two categories, prior transplantation number (X5), number of blood transfusions (X6), discrepancy score (X7), use of antilymphocyte globulin(ALG) (X8); two categories, while the response variable was patient survival time (in months). Partial least squares regression is a model that connects the predictor variables X and the response variable y and it initially aims to determine the relationship between them. Results of the above analyses suggest that the survival of kidney transplant recipients ranged from 0 to 55 months, with 62% of the patients surviving until they received treatment that lasted for 55 months. The PLS-Cox regression analysis results revealed that patients' age and the use of ALG significantly affected the survival time of patients. The factor of patients' age (X1) in the PLS-Cox regression model merely affected the failure probability by 1.201. This indicates that the probability of dying for elderly patients with a kidney transplant is 1.152 times higher than that for younger patients.
Modelling space of spread Dengue Hemorrhagic Fever (DHF) in Central Java use spatial durbin model
NASA Astrophysics Data System (ADS)
Ispriyanti, Dwi; Prahutama, Alan; Taryono, Arkadina PN
2018-05-01
Dengue Hemorrhagic Fever is one of the major public health problems in Indonesia. From year to year, DHF causes Extraordinary Event in most parts of Indonesia, especially Central Java. Central Java consists of 35 districts or cities where each region is close to each other. Spatial regression is an analysis that suspects the influence of independent variables on the dependent variables with the influences of the region inside. In spatial regression modeling, there are spatial autoregressive model (SAR), spatial error model (SEM) and spatial autoregressive moving average (SARMA). Spatial Durbin model is the development of SAR where the dependent and independent variable have spatial influence. In this research dependent variable used is number of DHF sufferers. The independent variables observed are population density, number of hospitals, residents and health centers, and mean years of schooling. From the multiple regression model test, the variables that significantly affect the spread of DHF disease are the population and mean years of schooling. By using queen contiguity and rook contiguity, the best model produced is the SDM model with queen contiguity because it has the smallest AIC value of 494,12. Factors that generally affect the spread of DHF in Central Java Province are the number of population and the average length of school.
ERIC Educational Resources Information Center
Bozpolat, Ebru
2017-01-01
The purpose of this study is to determine whether Cumhuriyet University Faculty of Education students' levels of speaking anxiety are predicted by the variables of gender, department, grade, such sub-dimensions of "Speaking Self-Efficacy Scale for Pre-Service Teachers" as "public speaking," "effective speaking,"…
Impact of Preadmission Variables on USMLE Step 1 and Step 2 Performance
ERIC Educational Resources Information Center
Kleshinski, James; Khuder, Sadik A.; Shapiro, Joseph I.; Gold, Jeffrey P.
2009-01-01
Purpose: To examine the predictive ability of preadmission variables on United States Medical Licensing Examinations (USMLE) step 1 and step 2 performance, incorporating the use of a neural network model. Method: Preadmission data were collected on matriculants from 1998 to 2004. Linear regression analysis was first used to identify predictors of…
Early Career Determinants of Research Productivity
ERIC Educational Resources Information Center
Clemente, Frank
1973-01-01
The publication records of 2,205 holders of the Ph.D. in sociology are examined for the period 1940-70. The predictive efficiency of six independent variables is assessed via regression analysis. A seventh variable is used as a control. Results of the study and directions for future research are presented and discussed. (SM)
Holtz, Carol; Sowell, Richard; VanBrackle, Lewis; Velasquez, Gabriela; Hernandez-Alonso, Virginia
2014-01-01
This quantitative study explored the level of Quality of Life (QoL) in indigenous Mexican women and identified psychosocial factors that significantly influenced their QoL, using face-to-face interviews with 101 women accessing care in an HIV clinic in Oaxaca, Mexico. Variables included demographic characteristics, levels of depression, coping style, family functioning, HIV-related beliefs, and QoL. Descriptive statistics were used to analyze participant characteristics, and women's scores on data collection instruments. Pearson's R correlational statistics were used to determine the level of significance between study variables. Multiple regression analysis examined all variables that were significantly related to QoL. Pearson's correlational analysis of relationships between Spirituality, Educating Self about HIV, Family Functioning, Emotional Support, Physical Care, and Staying Positive demonstrated positive correlation to QoL. Stigma, depression, and avoidance coping were significantly and negatively associated with QoL. The final regression model indicated that depression and avoidance coping were the best predictor variables for QoL. Copyright © 2014 Association of Nurses in AIDS Care. Published by Elsevier Inc. All rights reserved.
Kadiyala, Akhil; Kaur, Devinder; Kumar, Ashok
2013-02-01
The present study developed a novel approach to modeling indoor air quality (IAQ) of a public transportation bus by the development of hybrid genetic-algorithm-based neural networks (also known as evolutionary neural networks) with input variables optimized from using the regression trees, referred as the GART approach. This study validated the applicability of the GART modeling approach in solving complex nonlinear systems by accurately predicting the monitored contaminants of carbon dioxide (CO2), carbon monoxide (CO), nitric oxide (NO), sulfur dioxide (SO2), 0.3-0.4 microm sized particle numbers, 0.4-0.5 microm sized particle numbers, particulate matter (PM) concentrations less than 1.0 microm (PM10), and PM concentrations less than 2.5 microm (PM2.5) inside a public transportation bus operating on 20% grade biodiesel in Toledo, OH. First, the important variables affecting each monitored in-bus contaminant were determined using regression trees. Second, the analysis of variance was used as a complimentary sensitivity analysis to the regression tree results to determine a subset of statistically significant variables affecting each monitored in-bus contaminant. Finally, the identified subsets of statistically significant variables were used as inputs to develop three artificial neural network (ANN) models. The models developed were regression tree-based back-propagation network (BPN-RT), regression tree-based radial basis function network (RBFN-RT), and GART models. Performance measures were used to validate the predictive capacity of the developed IAQ models. The results from this approach were compared with the results obtained from using a theoretical approach and a generalized practicable approach to modeling IAQ that included the consideration of additional independent variables when developing the aforementioned ANN models. The hybrid GART models were able to capture majority of the variance in the monitored in-bus contaminants. The genetic-algorithm-based neural network IAQ models outperformed the traditional ANN methods of the back-propagation and the radial basis function networks. The novelty of this research is the development of a novel approach to modeling vehicular indoor air quality by integration of the advanced methods of genetic algorithms, regression trees, and the analysis of variance for the monitored in-vehicle gaseous and particulate matter contaminants, and comparing the results obtained from using the developed approach with conventional artificial intelligence techniques of back propagation networks and radial basis function networks. This study validated the newly developed approach using holdout and threefold cross-validation methods. These results are of great interest to scientists, researchers, and the public in understanding the various aspects of modeling an indoor microenvironment. This methodology can easily be extended to other fields of study also.
Atmospheric mold spore counts in relation to meteorological parameters
NASA Astrophysics Data System (ADS)
Katial, R. K.; Zhang, Yiming; Jones, Richard H.; Dyer, Philip D.
Fungal spore counts of Cladosporium, Alternaria, and Epicoccum were studied during 8 years in Denver, Colorado. Fungal spore counts were obtained daily during the pollinating season by a Rotorod sampler. Weather data were obtained from the National Climatic Data Center. Daily averages of temperature, relative humidity, daily precipitation, barometric pressure, and wind speed were studied. A time series analysis was performed on the data to mathematically model the spore counts in relation to weather parameters. Using SAS PROC ARIMA software, a regression analysis was performed, regressing the spore counts on the weather variables assuming an autoregressive moving average (ARMA) error structure. Cladosporium was found to be positively correlated (P<0.02) with average daily temperature, relative humidity, and negatively correlated with precipitation. Alternaria and Epicoccum did not show increased predictability with weather variables. A mathematical model was derived for Cladosporium spore counts using the annual seasonal cycle and significant weather variables. The model for Alternaria and Epicoccum incorporated the annual seasonal cycle. Fungal spore counts can be modeled by time series analysis and related to meteorological parameters controlling for seasonallity; this modeling can provide estimates of exposure to fungal aeroallergens.
Cheng, Jian; Xu, Zhiwei; Bambrick, Hilary; Su, Hong; Tong, Shilu; Hu, Wenbiao
2017-12-01
Unstable weather, such as intra- and inter-day temperature variability, can impair the health and shorten the survival time of population around the world. Climate change will cause Earth's surface temperature rise, but has unclear effects on temperature variability, making it urgent to understand the characteristics of the burden of temperature variability on mortality, regionally and nationally. This paper aims to quantify the mortality risk of exposure to short-term temperature variability, estimate the resulting death toll and explore how the strength of temperature variability effects will vary as a function of city-level characteristics. Ten-year (2000-2009) time-series data on temperature and mortality were collected for five largest Australia's cities (Sydney, Melbourne, Brisbane, Perth and Adelaide), collectively registering 708,751 deaths in different climates. Short-term temperature variability was captured and represented as the hourly temperature standard deviation within two days. Three-stage analyses were used to assess the burden of temperature variability on mortality. First, we modelled temperature variability-mortality relation and estimated the relative risk of death for each city, using a time-series quasi-Poisson regression model. Second, we used meta-analysis to pool the city-specific estimates, and meta-regression to explore if some city-level factors will modify the population vulnerability to temperature variability. Finally, we calculated the city-specific deaths attributable to temperature variability, and applied such estimates to the whole of Australia as a reflection of the nation-wide death burden associated with temperature variability. We found evidence of significant associations between temperature variability and mortality in all cities assessed. Deaths associated with each 1°C rise in temperature variability elevated by 0.28% (95% confidence interval (CI): 0.05%, 0.52%) in Melbourne to 1.00% (95%CI: 0.52%, 1.48%) in Brisbane, with a pooled estimate of 0.51% (95%CI: 0.33%, 0.69%) for Australia. Subtropical and temperate regions showed no apparent difference in temperature variability impacts. Meta-regression analyses indicated that the mortality risk could be influenced by city-specific factors: latitude, mean temperature, population density and the prevalence of several chronic diseases. Taking account of contributions from the entire time-series, temperature variability was estimated to account for 0.99% to 3.24% of deaths across cities, with a nation-wide attributable fraction of 1.67% (9.59 deaths per 100, 000 population per year). Hourly temperature variability may be an important risk factor of weather-related deaths and led to a sizeable mortality burden. This study underscores the need for developing specific and effective interventions in Australia to lessen the health consequences of temperature variability. Copyright © 2017. Published by Elsevier Ltd.
Analysis of the Effects of the Commander’s Battle Positioning on Unit Combat Performance
1991-03-01
Analysis ......... .. 58 Logistic Regression Analysis ......... .. 61 Canonical Correlation Analysis ........ .. 62 Descriminant Analysis...entails classifying objects into two or more distinct groups, or responses. Dillon defines descriminant analysis as "deriving linear combinations of the...object given it’s predictor variables. The second objective is, through analysis of the parameters of the descriminant functions, determine those
Improving Cluster Analysis with Automatic Variable Selection Based on Trees
2014-12-01
regression trees Daisy DISsimilAritY PAM partitioning around medoids PMA penalized multivariate analysis SPC sparse principal components UPGMA unweighted...unweighted pair-group average method ( UPGMA ). This method measures dissimilarities between all objects in two clusters and takes the average value
DOE Office of Scientific and Technical Information (OSTI.GOV)
Smith, Braeton J.; Shaneyfelt, Calvin R.
A NISAC study on the economic effects of a hypothetical H1N1 pandemic was done in order to assess the differential impacts at the state and industry levels given changes in absenteeism, mortality, and consumer spending rates. Part of the analysis was to determine if there were any direct relationships between pandemic impacts and gross domestic product (GDP) losses. Multiple regression analysis was used because it shows very clearly which predictors are significant in their impact on GDP. GDP impact data taken from the REMI PI+ (Regional Economic Models, Inc., Policy Insight +) model was used to serve as the responsemore » variable. NISAC economists selected the average absenteeism rate, mortality rate, and consumer spending categories as the predictor variables. Two outliers were found in the data: Nevada and Washington, DC. The analysis was done twice, with the outliers removed for the second analysis. The second set of regressions yielded a cleaner model, but for the purposes of this study, the analysts deemed it not as useful because particular interest was placed on determining the differential impacts to states. Hospitals and accommodation were found to be the most important predictors of percentage change in GDP among the consumer spending variables.« less
Laboratory test variables useful for distinguishing upper from lower gastrointestinal bleeding.
Tomizawa, Minoru; Shinozaki, Fuminobu; Hasegawa, Rumiko; Shirai, Yoshinori; Motoyoshi, Yasufumi; Sugiyama, Takao; Yamamoto, Shigenori; Ishige, Naoki
2015-05-28
To distinguish upper from lower gastrointestinal (GI) bleeding. Patient records between April 2011 and March 2014 were analyzed retrospectively (3296 upper endoscopy, and 1520 colonoscopy). Seventy-six patients had upper GI bleeding (Upper group) and 65 had lower GI bleeding (Lower group). Variables were compared between the groups using one-way analysis of variance. Logistic regression was performed to identify variables significantly associated with the diagnosis of upper vs lower GI bleeding. Receiver-operator characteristic (ROC) analysis was performed to determine the threshold value that could distinguish upper from lower GI bleeding. Hemoglobin (P = 0.023), total protein (P = 0.0002), and lactate dehydrogenase (P = 0.009) were significantly lower in the Upper group than in the Lower group. Blood urea nitrogen (BUN) was higher in the Upper group than in the Lower group (P = 0.0065). Logistic regression analysis revealed that BUN was most strongly associated with the diagnosis of upper vs lower GI bleeding. ROC analysis revealed a threshold BUN value of 21.0 mg/dL, with a specificity of 93.0%. The threshold BUN value for distinguishing upper from lower GI bleeding was 21.0 mg/dL.
Laboratory test variables useful for distinguishing upper from lower gastrointestinal bleeding
Tomizawa, Minoru; Shinozaki, Fuminobu; Hasegawa, Rumiko; Shirai, Yoshinori; Motoyoshi, Yasufumi; Sugiyama, Takao; Yamamoto, Shigenori; Ishige, Naoki
2015-01-01
AIM: To distinguish upper from lower gastrointestinal (GI) bleeding. METHODS: Patient records between April 2011 and March 2014 were analyzed retrospectively (3296 upper endoscopy, and 1520 colonoscopy). Seventy-six patients had upper GI bleeding (Upper group) and 65 had lower GI bleeding (Lower group). Variables were compared between the groups using one-way analysis of variance. Logistic regression was performed to identify variables significantly associated with the diagnosis of upper vs lower GI bleeding. Receiver-operator characteristic (ROC) analysis was performed to determine the threshold value that could distinguish upper from lower GI bleeding. RESULTS: Hemoglobin (P = 0.023), total protein (P = 0.0002), and lactate dehydrogenase (P = 0.009) were significantly lower in the Upper group than in the Lower group. Blood urea nitrogen (BUN) was higher in the Upper group than in the Lower group (P = 0.0065). Logistic regression analysis revealed that BUN was most strongly associated with the diagnosis of upper vs lower GI bleeding. ROC analysis revealed a threshold BUN value of 21.0 mg/dL, with a specificity of 93.0%. CONCLUSION: The threshold BUN value for distinguishing upper from lower GI bleeding was 21.0 mg/dL. PMID:26034359
Yoo, Kyung Hee
2007-06-01
This study was conducted to investigate the correlation among uncertainty, mastery and appraisal of uncertainty in hospitalized children's mothers. Self report questionnaires were used to measure the variables. Variables were uncertainty, mastery and appraisal of uncertainty. In data analysis, the SPSSWIN 12.0 program was utilized for descriptive statistics, Pearson's correlation coefficients, and regression analysis. Reliability of the instruments was cronbach's alpha=.84~.94. Mastery negatively correlated with uncertainty(r=-.444, p=.000) and danger appraisal of uncertainty(r=-.514, p=.000). In regression of danger appraisal of uncertainty, uncertainty and mastery were significant predictors explaining 39.9%. Mastery was a significant mediating factor between uncertainty and danger appraisal of uncertainty in hospitalized children's mothers. Therefore, nursing interventions which improve mastery must be developed for hospitalized children's mothers.
A consistent framework for Horton regression statistics that leads to a modified Hack's law
Furey, P.R.; Troutman, B.M.
2008-01-01
A statistical framework is introduced that resolves important problems with the interpretation and use of traditional Horton regression statistics. The framework is based on a univariate regression model that leads to an alternative expression for Horton ratio, connects Horton regression statistics to distributional simple scaling, and improves the accuracy in estimating Horton plot parameters. The model is used to examine data for drainage area A and mainstream length L from two groups of basins located in different physiographic settings. Results show that confidence intervals for the Horton plot regression statistics are quite wide. Nonetheless, an analysis of covariance shows that regression intercepts, but not regression slopes, can be used to distinguish between basin groups. The univariate model is generalized to include n > 1 dependent variables. For the case where the dependent variables represent ln A and ln L, the generalized model performs somewhat better at distinguishing between basin groups than two separate univariate models. The generalized model leads to a modification of Hack's law where L depends on both A and Strahler order ??. Data show that ?? plays a statistically significant role in the modified Hack's law expression. ?? 2008 Elsevier B.V.
ERIC Educational Resources Information Center
Usta, H. Gonca
2016-01-01
This study aims to analyze the student and school level variables that affect students' self-efficacy levels in mathematics in China-Shanghai, Turkey, and Greece based on PISA 2012 results. In line with this purpose, the hierarchical linear regression model (HLM) was employed. The interschool variability is estimated at approximately 17% in…
2006-03-01
identify if an explanatory variable may have been omitted due to model misspecification ( Ramsey , 1979). The RESET test resulted in failure to...Prob > F 0.0094 This model was also regressed using Huber-White estimators. Again, the Ramsey RESET test was done to ensure relevant...Aircraft. Annapolis, MD: Naval Institute Press, 2004. Ramsey , J. B. “ Tests for Specification Errors in Classical Least-Squares Regression Analysis
Family context and externalizing correlates of childhood animal cruelty in adjudicated delinquents.
Walters, Glenn D; Noon, Alexandria
2015-05-01
The purpose of this study was to determine whether childhood animal cruelty is primarily a feature of family context or of externalizing behavior. Twenty measures of family context and proactive (fearlessness) and reactive (disinhibition) externalizing behavior were correlated with the retrospective accounts of childhood animal cruelty provided by 1,354 adjudicated delinquents. A cross-sectional analysis revealed that all 20 family context, proactive externalizing, and reactive externalizing variables correlated significantly with animal cruelty. Prospective analyses showed that when the animal cruelty variable was included in a regression equation with the 10 family context variables (parental arguing and fighting, parental drug use, parental hostility, and parental knowledge and monitoring of offspring behavior) or in a regression equation with the five reactive externalizing variables (interpersonal hostility, secondary psychopathy, weak impulse control, weak suppression of aggression, and short time horizon), it continued to predict future violent and income (property + drug) offending. The animal cruelty variable no longer predicted offending, however, when included in a regression equation with the five proactive externalizing variables (early onset behavioral problems, primary psychopathy, moral disengagement, positive outcome expectancies for crime, and lack of consideration for others). These findings suggest that while animal cruelty correlates with a wide range of family context and externalizing variables, it may serve as a marker of violent and nonviolent offending by virtue of its position on the proactive subdimension of the externalizing spectrum. © The Author(s) 2014.
NASA Astrophysics Data System (ADS)
Mahmood, Ehab A.; Rana, Sohel; Hussin, Abdul Ghapor; Midi, Habshah
2016-06-01
The circular regression model may contain one or more data points which appear to be peculiar or inconsistent with the main part of the model. This may be occur due to recording errors, sudden short events, sampling under abnormal conditions etc. The existence of these data points "outliers" in the data set cause lot of problems in the research results and the conclusions. Therefore, we should identify them before applying statistical analysis. In this article, we aim to propose a statistic to identify outliers in the both of the response and explanatory variables of the simple circular regression model. Our proposed statistic is robust circular distance RCDxy and it is justified by the three robust measurements such as proportion of detection outliers, masking and swamping rates.
Time series regression studies in environmental epidemiology.
Bhaskaran, Krishnan; Gasparrini, Antonio; Hajat, Shakoor; Smeeth, Liam; Armstrong, Ben
2013-08-01
Time series regression studies have been widely used in environmental epidemiology, notably in investigating the short-term associations between exposures such as air pollution, weather variables or pollen, and health outcomes such as mortality, myocardial infarction or disease-specific hospital admissions. Typically, for both exposure and outcome, data are available at regular time intervals (e.g. daily pollution levels and daily mortality counts) and the aim is to explore short-term associations between them. In this article, we describe the general features of time series data, and we outline the analysis process, beginning with descriptive analysis, then focusing on issues in time series regression that differ from other regression methods: modelling short-term fluctuations in the presence of seasonal and long-term patterns, dealing with time varying confounding factors and modelling delayed ('lagged') associations between exposure and outcome. We finish with advice on model checking and sensitivity analysis, and some common extensions to the basic model.
Wheeler, David C; Czarnota, Jenna; Jones, Resa M
2017-01-01
Socioeconomic status (SES) is often considered a risk factor for health outcomes. SES is typically measured using individual variables of educational attainment, income, housing, and employment variables or a composite of these variables. Approaches to building the composite variable include using equal weights for each variable or estimating the weights with principal components analysis or factor analysis. However, these methods do not consider the relationship between the outcome and the SES variables when constructing the index. In this project, we used weighted quantile sum (WQS) regression to estimate an area-level SES index and its effect in a model of colonoscopy screening adherence in the Minnesota-Wisconsin Metropolitan Statistical Area. We considered several specifications of the SES index including using different spatial scales (e.g., census block group-level, tract-level) for the SES variables. We found a significant positive association (odds ratio = 1.17, 95% CI: 1.15-1.19) between the SES index and colonoscopy adherence in the best fitting model. The model with the best goodness-of-fit included a multi-scale SES index with 10 variables at the block group-level and one at the tract-level, with home ownership, race, and income among the most important variables. Contrary to previous index construction, our results were not consistent with an assumption of equal importance of variables in the SES index when explaining colonoscopy screening adherence. Our approach is applicable in any study where an SES index is considered as a variable in a regression model and the weights for the SES variables are not known in advance.
A psycholinguistic database for traditional Chinese character naming.
Chang, Ya-Ning; Hsu, Chun-Hsien; Tsai, Jie-Li; Chen, Chien-Liang; Lee, Chia-Ying
2016-03-01
In this study, we aimed to provide a large-scale set of psycholinguistic norms for 3,314 traditional Chinese characters, along with their naming reaction times (RTs), collected from 140 Chinese speakers. The lexical and semantic variables in the database include frequency, regularity, familiarity, consistency, number of strokes, homophone density, semantic ambiguity rating, phonetic combinability, semantic combinability, and the number of disyllabic compound words formed by a character. Multiple regression analyses were conducted to examine the predictive powers of these variables for the naming RTs. The results demonstrated that these variables could account for a significant portion of variance (55.8%) in the naming RTs. An additional multiple regression analysis was conducted to demonstrate the effects of consistency and character frequency. Overall, the regression results were consistent with the findings of previous studies on Chinese character naming. This database should be useful for research into Chinese language processing, Chinese education, or cross-linguistic comparisons. The database can be accessed via an online inquiry system (http://ball.ling.sinica.edu.tw/namingdatabase/index.html).
Meel-van den Abeelen, Aisha S.S.; Simpson, David M.; Wang, Lotte J.Y.; Slump, Cornelis H.; Zhang, Rong; Tarumi, Takashi; Rickards, Caroline A.; Payne, Stephen; Mitsis, Georgios D.; Kostoglou, Kyriaki; Marmarelis, Vasilis; Shin, Dae; Tzeng, Yu-Chieh; Ainslie, Philip N.; Gommer, Erik; Müller, Martin; Dorado, Alexander C.; Smielewski, Peter; Yelicich, Bernardo; Puppo, Corina; Liu, Xiuyun; Czosnyka, Marek; Wang, Cheng-Yen; Novak, Vera; Panerai, Ronney B.; Claassen, Jurgen A.H.R.
2014-01-01
Transfer function analysis (TFA) is a frequently used method to assess dynamic cerebral autoregulation (CA) using spontaneous oscillations in blood pressure (BP) and cerebral blood flow velocity (CBFV). However, controversies and variations exist in how research groups utilise TFA, causing high variability in interpretation. The objective of this study was to evaluate between-centre variability in TFA outcome metrics. 15 centres analysed the same 70 BP and CBFV datasets from healthy subjects (n = 50 rest; n = 20 during hypercapnia); 10 additional datasets were computer-generated. Each centre used their in-house TFA methods; however, certain parameters were specified to reduce a priori between-centre variability. Hypercapnia was used to assess discriminatory performance and synthetic data to evaluate effects of parameter settings. Results were analysed using the Mann–Whitney test and logistic regression. A large non-homogeneous variation was found in TFA outcome metrics between the centres. Logistic regression demonstrated that 11 centres were able to distinguish between normal and impaired CA with an AUC > 0.85. Further analysis identified TFA settings that are associated with large variation in outcome measures. These results indicate the need for standardisation of TFA settings in order to reduce between-centre variability and to allow accurate comparison between studies. Suggestions on optimal signal processing methods are proposed. PMID:24725709
NASA Astrophysics Data System (ADS)
Kiss, I.; Cioată, V. G.; Alexa, V.; Raţiu, S. A.
2017-05-01
The braking system is one of the most important and complex subsystems of railway vehicles, especially when it comes for safety. Therefore, installing efficient safe brakes on the modern railway vehicles is essential. Nowadays is devoted attention to solving problems connected with using high performance brake materials and its impact on thermal and mechanical loading of railway wheels. The main factor that influences the selection of a friction material for railway applications is the performance criterion, due to the interaction between the brake block and the wheel produce complex thermos-mechanical phenomena. In this work, the investigated subjects are the cast-iron brake shoes, which are still widely used on freight wagons. Therefore, the cast-iron brake shoes - with lamellar graphite and with a high content of phosphorus (0.8-1.1%) - need a special investigation. In order to establish the optimal condition for the cast-iron brake shoes we proposed a mathematical modelling study by using the statistical analysis and multiple regression equations. Multivariate research is important in areas of cast-iron brake shoes manufacturing, because many variables interact with each other simultaneously. Multivariate visualization comes to the fore when researchers have difficulties in comprehending many dimensions at one time. Technological data (hardness and chemical composition) obtained from cast-iron brake shoes were used for this purpose. In order to settle the multiple correlation between the hardness of the cast-iron brake shoes, and the chemical compositions elements several model of regression equation types has been proposed. Because a three-dimensional surface with variables on three axes is a common way to illustrate multivariate data, in which the maximum and minimum values are easily highlighted, we plotted graphical representation of the regression equations in order to explain interaction of the variables and locate the optimal level of each variable for maximal response. For the calculation of the regression coefficients, dispersion and correlation coefficients, the software Matlab was used.
NASA Astrophysics Data System (ADS)
Fouad, Geoffrey; Skupin, André; Hope, Allen
2016-04-01
The flow duration curve (FDC) is one of the most widely used tools to quantify streamflow. Its percentile flows are often required for water resource applications, but these values must be predicted for ungauged basins with insufficient or no streamflow data. Regional regression is a commonly used approach for predicting percentile flows that involves identifying hydrologic regions and calibrating regression models to each region. The independent variables used to describe the physiographic and climatic setting of the basins are a critical component of regional regression, yet few studies have investigated their effect on resulting predictions. In this study, the complexity of the independent variables needed for regional regression is investigated. Different levels of variable complexity are applied for a regional regression consisting of 918 basins in the US. Both the hydrologic regions and regression models are determined according to the different sets of variables, and the accuracy of resulting predictions is assessed. The different sets of variables include (1) a simple set of three variables strongly tied to the FDC (mean annual precipitation, potential evapotranspiration, and baseflow index), (2) a traditional set of variables describing the average physiographic and climatic conditions of the basins, and (3) a more complex set of variables extending the traditional variables to include statistics describing the distribution of physiographic data and temporal components of climatic data. The latter set of variables is not typically used in regional regression, and is evaluated for its potential to predict percentile flows. The simplest set of only three variables performed similarly to the other more complex sets of variables. Traditional variables used to describe climate, topography, and soil offered little more to the predictions, and the experimental set of variables describing the distribution of basin data in more detail did not improve predictions. These results are largely reflective of cross-correlation existing in hydrologic datasets, and highlight the limited predictive power of many traditionally used variables for regional regression. A parsimonious approach including fewer variables chosen based on their connection to streamflow may be more efficient than a data mining approach including many different variables. Future regional regression studies may benefit from having a hydrologic rationale for including different variables and attempting to create new variables related to streamflow.
Big Data Toolsets to Pharmacometrics: Application of Machine Learning for Time‐to‐Event Analysis
Gong, Xiajing; Hu, Meng
2018-01-01
Abstract Additional value can be potentially created by applying big data tools to address pharmacometric problems. The performances of machine learning (ML) methods and the Cox regression model were evaluated based on simulated time‐to‐event data synthesized under various preset scenarios, i.e., with linear vs. nonlinear and dependent vs. independent predictors in the proportional hazard function, or with high‐dimensional data featured by a large number of predictor variables. Our results showed that ML‐based methods outperformed the Cox model in prediction performance as assessed by concordance index and in identifying the preset influential variables for high‐dimensional data. The prediction performances of ML‐based methods are also less sensitive to data size and censoring rates than the Cox regression model. In conclusion, ML‐based methods provide a powerful tool for time‐to‐event analysis, with a built‐in capacity for high‐dimensional data and better performance when the predictor variables assume nonlinear relationships in the hazard function. PMID:29536640
Automatic identification of variables in epidemiological datasets using logic regression.
Lorenz, Matthias W; Abdi, Negin Ashtiani; Scheckenbach, Frank; Pflug, Anja; Bülbül, Alpaslan; Catapano, Alberico L; Agewall, Stefan; Ezhov, Marat; Bots, Michiel L; Kiechl, Stefan; Orth, Andreas
2017-04-13
For an individual participant data (IPD) meta-analysis, multiple datasets must be transformed in a consistent format, e.g. using uniform variable names. When large numbers of datasets have to be processed, this can be a time-consuming and error-prone task. Automated or semi-automated identification of variables can help to reduce the workload and improve the data quality. For semi-automation high sensitivity in the recognition of matching variables is particularly important, because it allows creating software which for a target variable presents a choice of source variables, from which a user can choose the matching one, with only low risk of having missed a correct source variable. For each variable in a set of target variables, a number of simple rules were manually created. With logic regression, an optimal Boolean combination of these rules was searched for every target variable, using a random subset of a large database of epidemiological and clinical cohort data (construction subset). In a second subset of this database (validation subset), this optimal combination rules were validated. In the construction sample, 41 target variables were allocated on average with a positive predictive value (PPV) of 34%, and a negative predictive value (NPV) of 95%. In the validation sample, PPV was 33%, whereas NPV remained at 94%. In the construction sample, PPV was 50% or less in 63% of all variables, in the validation sample in 71% of all variables. We demonstrated that the application of logic regression in a complex data management task in large epidemiological IPD meta-analyses is feasible. However, the performance of the algorithm is poor, which may require backup strategies.
Neural Networks for Readability Analysis.
ERIC Educational Resources Information Center
McEneaney, John E.
This paper describes and reports on the performance of six related artificial neural networks that have been developed for the purpose of readability analysis. Two networks employ counts of linguistic variables that simulate a traditional regression-based approach to readability. The remaining networks determine readability from "visual…
Variability in CKD stage in outpatients followed in two large renal clinics.
Sikaneta, Tabo; Abdolell, Mohamed; Taskapan, Hulya; Roscoe, Janet; Fung, Jason; Nagai, Gordon; Ting, Robert H; Ng, Paul; Wu, George; Oreopoulos, Dimitrios; Tam, Paul Y
2012-10-01
Chronic kidney disease (CKD) is staged by glomerular filtration rate (GFR). CKD stages sometimes vary between routine office visits, and it is unknown if this impacts renal and patient survival separately from a cross-sectional CKD stage value. We quantified and categorized CKD stage variability in a large group of outpatients and correlated this with clinical and demographic features and with renal and patient survival. All estimated GFRs were staged in the first observation period. CKD stages were then categorized as static, improving, worsening, or fluctuating. Logistic regression analysis was performed to identify clinical variables associated with CKD stage variability. Death and dialysis progression rates were then collected and analyzed using Cox proportional regression. During a 1.1-year observation period, 1,262 patients (mean age 71.25 years) had a mean 5 eGFR's. CKD stages were static in 60.4%, worsened in 14.4%, improved in 7.4%, and fluctuated in 17.2% of patients. Secondary analysis revealed heavy proteinuria and East Asian ethnicity to be negatively, and diabetes mellitus and previous acute kidney injury to be positively associated with improving CKD stages. Cox proportional regression of 902 patients analyzed 2.3 years later revealed a negative association with improving CKD stage and subsequent need for dialysis. CKD stage changed in 40% of 1,262 elderly patients when determined 5 times in just over 1 year. Improving CKD stage was the only variability pattern significantly associated with any of the clinical outcomes when assessed 2.3 years later, being unlikely to be linked with subsequent need for dialysis.
The use of generalized estimating equations in the analysis of motor vehicle crash data.
Hutchings, Caroline B; Knight, Stacey; Reading, James C
2003-01-01
The purpose of this study was to determine if it is necessary to use generalized estimating equations (GEEs) in the analysis of seat belt effectiveness in preventing injuries in motor vehicle crashes. The 1992 Utah crash dataset was used, excluding crash participants where seat belt use was not appropriate (n=93,633). The model used in the 1996 Report to Congress [Report to congress on benefits of safety belts and motorcycle helmets, based on data from the Crash Outcome Data Evaluation System (CODES). National Center for Statistics and Analysis, NHTSA, Washington, DC, February 1996] was analyzed for all occupants with logistic regression, one level of nesting (occupants within crashes), and two levels of nesting (occupants within vehicles within crashes) to compare the use of GEEs with logistic regression. When using one level of nesting compared to logistic regression, 13 of 16 variance estimates changed more than 10%, and eight of 16 parameter estimates changed more than 10%. In addition, three of the independent variables changed from significant to insignificant (alpha=0.05). With the use of two levels of nesting, two of 16 variance estimates and three of 16 parameter estimates changed more than 10% from the variance and parameter estimates in one level of nesting. One of the independent variables changed from insignificant to significant (alpha=0.05) in the two levels of nesting model; therefore, only two of the independent variables changed from significant to insignificant when the logistic regression model was compared to the two levels of nesting model. The odds ratio of seat belt effectiveness in preventing injuries was 12% lower when a one-level nested model was used. Based on these results, we stress the need to use a nested model and GEEs when analyzing motor vehicle crash data.
Application of Temperature Sensitivities During Iterative Strain-Gage Balance Calibration Analysis
NASA Technical Reports Server (NTRS)
Ulbrich, N.
2011-01-01
A new method is discussed that may be used to correct wind tunnel strain-gage balance load predictions for the influence of residual temperature effects at the location of the strain-gages. The method was designed for the iterative analysis technique that is used in the aerospace testing community to predict balance loads from strain-gage outputs during a wind tunnel test. The new method implicitly applies temperature corrections to the gage outputs during the load iteration process. Therefore, it can use uncorrected gage outputs directly as input for the load calculations. The new method is applied in several steps. First, balance calibration data is analyzed in the usual manner assuming that the balance temperature was kept constant during the calibration. Then, the temperature difference relative to the calibration temperature is introduced as a new independent variable for each strain--gage output. Therefore, sensors must exist near the strain--gages so that the required temperature differences can be measured during the wind tunnel test. In addition, the format of the regression coefficient matrix needs to be extended so that it can support the new independent variables. In the next step, the extended regression coefficient matrix of the original calibration data is modified by using the manufacturer specified temperature sensitivity of each strain--gage as the regression coefficient of the corresponding temperature difference variable. Finally, the modified regression coefficient matrix is converted to a data reduction matrix that the iterative analysis technique needs for the calculation of balance loads. Original calibration data and modified check load data of NASA's MC60D balance are used to illustrate the new method.
NASA Astrophysics Data System (ADS)
Wang, Yan-Jun; Liu, Qun
1999-03-01
Analysis of stock-recruitment (SR) data is most often done by fitting various SR relationship curves to the data. Fish population dynamics data often have stochastic variations and measurement errors, which usually result in a biased regression analysis. This paper presents a robust regression method, least median of squared orthogonal distance (LMD), which is insensitive to abnormal values in the dependent and independent variables in a regression analysis. Outliers that have significantly different variance from the rest of the data can be identified in a residual analysis. Then, the least squares (LS) method is applied to the SR data with defined outliers being down weighted. The application of LMD and LMD-based Reweighted Least Squares (RLS) method to simulated and real fisheries SR data is explored.
Do Our Means of Inquiry Match our Intentions?
Petscher, Yaacov
2016-01-01
A key stage of the scientific method is the analysis of data, yet despite the variety of methods that are available to researchers they are most frequently distilled to a model that focuses on the average relation between variables. Although research questions are frequently conceived with broad inquiry in mind, most regression methods are limited in comprehensively evaluating how observed behaviors are related to each other. Quantile regression is a largely unknown yet well-suited analytic technique similar to traditional regression analysis, but allows for a more systematic approach to understanding complex associations among observed phenomena in the psychological sciences. Data from the National Education Longitudinal Study of 1988/2000 are used to illustrate how quantile regression overcomes the limitations of average associations in linear regression by showing that psychological well-being and sex each differentially relate to reading achievement depending on one’s level of reading achievement. PMID:27486410
NASA Technical Reports Server (NTRS)
Ulbrich, N.; Volden, T.
2018-01-01
Analysis and use of temperature-dependent wind tunnel strain-gage balance calibration data are discussed in the paper. First, three different methods are presented and compared that may be used to process temperature-dependent strain-gage balance data. The first method uses an extended set of independent variables in order to process the data and predict balance loads. The second method applies an extended load iteration equation during the analysis of balance calibration data. The third method uses temperature-dependent sensitivities for the data analysis. Physical interpretations of the most important temperature-dependent regression model terms are provided that relate temperature compensation imperfections and the temperature-dependent nature of the gage factor to sets of regression model terms. Finally, balance calibration recommendations are listed so that temperature-dependent calibration data can be obtained and successfully processed using the reviewed analysis methods.
Nursing home cost and ownership type: evidence of interaction effects.
Arling, G; Nordquist, R H; Capitman, J A
1987-06-01
Due to steadily increasing public expenditures for nursing home care, much research has focused on factors that influence nursing home costs, especially for Medicaid patients. Nursing home cost function studies have typically used a number of predictor variables in a multiple regression analysis to determine the effect of these variables on operating cost. Although several authors have suggested that nursing home ownership types have different goal orientations, not necessarily based on economic factors, little attention has been paid to this issue in empirical research. In this study, data from 150 Virginia nursing homes were used in multiple regression analysis to examine factors accounting for nursing home operating costs. The context of the study was the Virginia Medicaid reimbursement system, which has intermediate care and skilled nursing facility (ICF and SNF) facility-specific per diem rates, set according to facility cost histories. The analysis revealed interaction effects between ownership and other predictor variables (e.g., percentage Medicaid residents, case mix, and region), with predictor variables having different effects on cost depending on ownership type. Conclusions are drawn about the goal orientations and behavior of chain-operated, individual for-profit, and public and nonprofit facilities. The implications of these findings for long-term care reimbursement policies are discussed.
Nursing home cost and ownership type: evidence of interaction effects.
Arling, G; Nordquist, R H; Capitman, J A
1987-01-01
Due to steadily increasing public expenditures for nursing home care, much research has focused on factors that influence nursing home costs, especially for Medicaid patients. Nursing home cost function studies have typically used a number of predictor variables in a multiple regression analysis to determine the effect of these variables on operating cost. Although several authors have suggested that nursing home ownership types have different goal orientations, not necessarily based on economic factors, little attention has been paid to this issue in empirical research. In this study, data from 150 Virginia nursing homes were used in multiple regression analysis to examine factors accounting for nursing home operating costs. The context of the study was the Virginia Medicaid reimbursement system, which has intermediate care and skilled nursing facility (ICF and SNF) facility-specific per diem rates, set according to facility cost histories. The analysis revealed interaction effects between ownership and other predictor variables (e.g., percentage Medicaid residents, case mix, and region), with predictor variables having different effects on cost depending on ownership type. Conclusions are drawn about the goal orientations and behavior of chain-operated, individual for-profit, and public and nonprofit facilities. The implications of these findings for long-term care reimbursement policies are discussed. PMID:3301746
Elovainio, Marko; Heponiemi, Tarja; Kuusio, Hannamaria; Jokela, Markus; Aalto, Anna-Mari; Pekkarinen, Laura; Noro, Anja; Finne-Soveri, Harriet; Kivimäki, Mika; Sinervo, Timo
2015-02-01
The association between psychosocial work environment and employee wellbeing has repeatedly been shown. However, as environmental evaluations have typically been self-reported, the observed associations may be attributable to reporting bias. Applying instrumental-variable regression, we used staffing level (the ratio of staff to residents) as an unconfounded instrument for self-reported job demands and job strain to predict various indicators of wellbeing (perceived stress, psychological distress and sleeping problems) among 1525 registered nurses, practical nurses and nursing assistants working in elderly care wards. In ordinary regression, higher self-reported job demands and job strain were associated with increased risk of perceived stress, psychological distress and sleeping problems. The effect estimates for the associations of these psychosocial factors with perceived stress and psychological distress were greater, but less precisely estimated, in an instrumental-variables analysis which took into account only the variation in self-reported job demands and job strain that was explained by staffing level. No association between psychosocial factors and sleeping problems was observed with the instrumental-variable analysis. These results support a causal interpretation of high self-reported job demands and job strain being risk factors for employee wellbeing. © The Author 2014. Published by Oxford University Press on behalf of the European Public Health Association. All rights reserved.
Distribution of cavity trees in midwestern old-growth and second-growth forests
Zhaofei Fan; Stephen R. Shifley; Martin A. Spetich; Frank R. Thompson; David R. Larsen
2003-01-01
We used classification and regression tree analysis to determine the primary variables associated with the occurrence of cavity trees and the hierarchical structure among those variables. We applied that information to develop logistic models predicting cavity tree probability as a function of diameter, species group, and decay class. Inventories of cavity abundance in...
Distribution of cavity trees in midwesternold-growth and second-growth forests
Zhaofei Fan; Stephen R. Shifley; Martin A. Spetich; Frank R., III Thompson; David R. Larsen
2003-01-01
We used classification and regression tree analysis to determine the primary variables associated with the occurrence of cavity trees and the hierarchical structure among those variables. We applied that information to develop logistic models predicting cavity tree probability as a function of diameter, species group, and decay class. Inventories of cavity abundance in...
ERIC Educational Resources Information Center
Guerra, Jorge
2012-01-01
The purpose of this research was to examine the relationship between teaching readiness and teaching excellence with three variables of preparedness of adjunct professors teaching career technical education courses through student surveys using a correlational design of two statistical techniques; least-squares regression and one-way analysis of…
Rainfall and streamflow from small tree-covered and fern-covered and burned watersheds in Hawaii
H. W. Anderson; P. D. Duffy; Teruo Yamamoto
1966-01-01
Streamflow from two 30-acre watersheds near Honolulu was studied by using principal components regression analysis. Models using data on monthly, storm, and peak discharges were tested against several variables expressing amount and intensity of rainfall, and against variables expressing antecedent rainfall. Explained variation ranged from 78 to 94 percent. The...
Finding and Developing Moderators and Directional Keys by Regression Analysis.
ERIC Educational Resources Information Center
Kokosh, John
A procedure for rapid screening of variables as potential moderators is presented and discussed. A moderator is defined as any variable which can be used to identify differentially predictable persons; or defined statistically by stating that if a predictor and a moderator are each divided into three or more categories and used as independent…
Regression and Geostatistical Techniques: Considerations and Observations from Experiences in NE-FIA
Rachel Riemann; Andrew Lister
2005-01-01
Maps of forest variables improve our understanding of the forest resource by allowing us to view and analyze it spatially. The USDA Forest Service's Northeastern Forest Inventory and Analysis unit (NE-FIA) has used geostatistical techniques, particularly stochastic simulation, to produce maps and spatial data sets of FIA variables. That work underscores the...
Zhang, Chao; Jia, Pengli; Yu, Liu; Xu, Chang
2018-05-01
Dose-response meta-analysis (DRMA) is widely applied to investigate the dose-specific relationship between independent and dependent variables. Such methods have been in use for over 30 years and are increasingly employed in healthcare and clinical decision-making. In this article, we give an overview of the methodology used in DRMA. We summarize the commonly used regression model and the pooled method in DRMA. We also use an example to illustrate how to employ a DRMA by these methods. Five regression models, linear regression, piecewise regression, natural polynomial regression, fractional polynomial regression, and restricted cubic spline regression, were illustrated in this article to fit the dose-response relationship. And two types of pooling approaches, that is, one-stage approach and two-stage approach are illustrated to pool the dose-response relationship across studies. The example showed similar results among these models. Several dose-response meta-analysis methods can be used for investigating the relationship between exposure level and the risk of an outcome. However the methodology of DRMA still needs to be improved. © 2018 Chinese Cochrane Center, West China Hospital of Sichuan University and John Wiley & Sons Australia, Ltd.
Ebhuoma, Osadolor; Gebreslasie, Michael
2016-06-14
Malaria is a serious public health threat in Sub-Saharan Africa (SSA), and its transmission risk varies geographically. Modelling its geographic characteristics is essential for identifying the spatial and temporal risk of malaria transmission. Remote sensing (RS) has been serving as an important tool in providing and assessing a variety of potential climatic/environmental malaria transmission variables in diverse areas. This review focuses on the utilization of RS-driven climatic/environmental variables in determining malaria transmission in SSA. A systematic search on Google Scholar and the Institute for Scientific Information (ISI) Web of Knowledge(SM) databases (PubMed, Web of Science and ScienceDirect) was carried out. We identified thirty-five peer-reviewed articles that studied the relationship between remotely-sensed climatic variable(s) and malaria epidemiological data in the SSA sub-regions. The relationship between malaria disease and different climatic/environmental proxies was examined using different statistical methods. Across the SSA sub-region, the normalized difference vegetation index (NDVI) derived from either the National Oceanic and Atmospheric Administration (NOAA) Advanced Very High Resolution Radiometer (AVHRR) or Moderate-resolution Imaging Spectrometer (MODIS) satellite sensors was most frequently returned as a statistically-significant variable to model both spatial and temporal malaria transmission. Furthermore, generalized linear models (linear regression, logistic regression and Poisson regression) were the most frequently-employed methods of statistical analysis in determining malaria transmission predictors in East, Southern and West Africa. By contrast, multivariate analysis was used in Central Africa. We stress that the utilization of RS in determining reliable malaria transmission predictors and climatic/environmental monitoring variables would require a tailored approach that will have cognizance of the geographical/climatic setting, the stage of malaria elimination continuum, the characteristics of the RS variables and the analytical approach, which in turn, would support the channeling of intervention resources sustainably.
Ebhuoma, Osadolor; Gebreslasie, Michael
2016-01-01
Malaria is a serious public health threat in Sub-Saharan Africa (SSA), and its transmission risk varies geographically. Modelling its geographic characteristics is essential for identifying the spatial and temporal risk of malaria transmission. Remote sensing (RS) has been serving as an important tool in providing and assessing a variety of potential climatic/environmental malaria transmission variables in diverse areas. This review focuses on the utilization of RS-driven climatic/environmental variables in determining malaria transmission in SSA. A systematic search on Google Scholar and the Institute for Scientific Information (ISI) Web of KnowledgeSM databases (PubMed, Web of Science and ScienceDirect) was carried out. We identified thirty-five peer-reviewed articles that studied the relationship between remotely-sensed climatic variable(s) and malaria epidemiological data in the SSA sub-regions. The relationship between malaria disease and different climatic/environmental proxies was examined using different statistical methods. Across the SSA sub-region, the normalized difference vegetation index (NDVI) derived from either the National Oceanic and Atmospheric Administration (NOAA) Advanced Very High Resolution Radiometer (AVHRR) or Moderate-resolution Imaging Spectrometer (MODIS) satellite sensors was most frequently returned as a statistically-significant variable to model both spatial and temporal malaria transmission. Furthermore, generalized linear models (linear regression, logistic regression and Poisson regression) were the most frequently-employed methods of statistical analysis in determining malaria transmission predictors in East, Southern and West Africa. By contrast, multivariate analysis was used in Central Africa. We stress that the utilization of RS in determining reliable malaria transmission predictors and climatic/environmental monitoring variables would require a tailored approach that will have cognizance of the geographical/climatic setting, the stage of malaria elimination continuum, the characteristics of the RS variables and the analytical approach, which in turn, would support the channeling of intervention resources sustainably. PMID:27314369
Contributions of sociodemographic factors to criminal behavior
Mundia, Lawrence; Matzin, Rohani; Mahalle, Salwa; Hamid, Malai Hayati; Osman, Ratna Suriani
2016-01-01
We explored the extent to which prisoner sociodemographic variables (age, education, marital status, employment, and whether their parents were married or not) influenced offending in 64 randomly selected Brunei inmates, comprising both sexes. A quantitative field survey design ideal for the type of participants used in a prison context was employed to investigate the problem. Hierarchical multiple regression analysis with backward elimination identified prisoner marital status and age groups as significantly related to offending. Furthermore, hierarchical multinomial logistic regression analysis with backward elimination indicated that prisoners’ age, primary level education, marital status, employment status, and parental marital status as significantly related to stealing offenses with high odds ratios. All 29 nonrecidivists were false negatives and predicted to reoffend upon release. Similarly, all 33 recidivists were projected to reoffend after release. Hierarchical binary logistic regression analysis revealed age groups (24–29 years and 30–35 years), employed prisoner, and primary level education as variables with high likelihood trends for reoffending. The results suggested that prisoner interventions (educational, counseling, and psychotherapy) in Brunei should treat not only antisocial personality, psychopathy, and mental health problems but also sociodemographic factors. The study generated offending patterns, trends, and norms that may inform subsequent investigations on Brunei prisoners. PMID:27382342
Henry, Stephen G.; Jerant, Anthony; Iosif, Ana-Maria; Feldman, Mitchell D.; Cipri, Camille; Kravitz, Richard L.
2015-01-01
Objective To identify factors associated with participant consent to record visits; to estimate effects of recording on patient-clinician interactions Methods Secondary analysis of data from a randomized trial studying communication about depression; participants were asked for optional consent to audio record study visits. Multiple logistic regression was used to model likelihood of patient and clinician consent. Multivariable regression and propensity score analyses were used to estimate effects of audio recording on 6 dependent variables: discussion of depressive symptoms, preventive health, and depression diagnosis; depression treatment recommendations; visit length; visit difficulty. Results Of 867 visits involving 135 primary care clinicians, 39% were recorded. For clinicians, only working in academic settings (P=0.003) and having worked longer at their current practice (P=0.02) were associated with increased likelihood of consent. For patients, white race (P=0.002) and diabetes (P=0.03) were associated with increased likelihood of consent. Neither multivariable regression nor propensity score analyses revealed any significant effects of recording on the variables examined. Conclusion Few clinician or patient characteristics were significantly associated with consent. Audio recording had no significant effect on any dependent variables. Practice Implications Benefits of recording clinic visits likely outweigh the risks of bias in this setting. PMID:25837372
Liu, Yan; Salvendy, Gavriel
2009-05-01
This paper aims to demonstrate the effects of measurement errors on psychometric measurements in ergonomics studies. A variety of sources can cause random measurement errors in ergonomics studies and these errors can distort virtually every statistic computed and lead investigators to erroneous conclusions. The effects of measurement errors on five most widely used statistical analysis tools have been discussed and illustrated: correlation; ANOVA; linear regression; factor analysis; linear discriminant analysis. It has been shown that measurement errors can greatly attenuate correlations between variables, reduce statistical power of ANOVA, distort (overestimate, underestimate or even change the sign of) regression coefficients, underrate the explanation contributions of the most important factors in factor analysis and depreciate the significance of discriminant function and discrimination abilities of individual variables in discrimination analysis. The discussions will be restricted to subjective scales and survey methods and their reliability estimates. Other methods applied in ergonomics research, such as physical and electrophysiological measurements and chemical and biomedical analysis methods, also have issues of measurement errors, but they are beyond the scope of this paper. As there has been increasing interest in the development and testing of theories in ergonomics research, it has become very important for ergonomics researchers to understand the effects of measurement errors on their experiment results, which the authors believe is very critical to research progress in theory development and cumulative knowledge in the ergonomics field.
Exhaustive Search for Sparse Variable Selection in Linear Regression
NASA Astrophysics Data System (ADS)
Igarashi, Yasuhiko; Takenaka, Hikaru; Nakanishi-Ohno, Yoshinori; Uemura, Makoto; Ikeda, Shiro; Okada, Masato
2018-04-01
We propose a K-sparse exhaustive search (ES-K) method and a K-sparse approximate exhaustive search method (AES-K) for selecting variables in linear regression. With these methods, K-sparse combinations of variables are tested exhaustively assuming that the optimal combination of explanatory variables is K-sparse. By collecting the results of exhaustively computing ES-K, various approximate methods for selecting sparse variables can be summarized as density of states. With this density of states, we can compare different methods for selecting sparse variables such as relaxation and sampling. For large problems where the combinatorial explosion of explanatory variables is crucial, the AES-K method enables density of states to be effectively reconstructed by using the replica-exchange Monte Carlo method and the multiple histogram method. Applying the ES-K and AES-K methods to type Ia supernova data, we confirmed the conventional understanding in astronomy when an appropriate K is given beforehand. However, we found the difficulty to determine K from the data. Using virtual measurement and analysis, we argue that this is caused by data shortage.
Ohlmacher, G.C.; Davis, J.C.
2003-01-01
Landslides in the hilly terrain along the Kansas and Missouri rivers in northeastern Kansas have caused millions of dollars in property damage during the last decade. To address this problem, a statistical method called multiple logistic regression has been used to create a landslide-hazard map for Atchison, Kansas, and surrounding areas. Data included digitized geology, slopes, and landslides, manipulated using ArcView GIS. Logistic regression relates predictor variables to the occurrence or nonoccurrence of landslides within geographic cells and uses the relationship to produce a map showing the probability of future landslides, given local slopes and geologic units. Results indicated that slope is the most important variable for estimating landslide hazard in the study area. Geologic units consisting mostly of shale, siltstone, and sandstone were most susceptible to landslides. Soil type and aspect ratio were considered but excluded from the final analysis because these variables did not significantly add to the predictive power of the logistic regression. Soil types were highly correlated with the geologic units, and no significant relationships existed between landslides and slope aspect. ?? 2003 Elsevier Science B.V. All rights reserved.
McClelland, Gary H; Irwin, Julie R; Disatnik, David; Sivan, Liron
2017-02-01
Multicollinearity is irrelevant to the search for moderator variables, contrary to the implications of Iacobucci, Schneider, Popovich, and Bakamitsos (Behavior Research Methods, 2016, this issue). Multicollinearity is like the red herring in a mystery novel that distracts the statistical detective from the pursuit of a true moderator relationship. We show multicollinearity is completely irrelevant for tests of moderator variables. Furthermore, readers of Iacobucci et al. might be confused by a number of their errors. We note those errors, but more positively, we describe a variety of methods researchers might use to test and interpret their moderated multiple regression models, including two-stage testing, mean-centering, spotlighting, orthogonalizing, and floodlighting without regard to putative issues of multicollinearity. We cite a number of recent studies in the psychological literature in which the researchers used these methods appropriately to test, to interpret, and to report their moderated multiple regression models. We conclude with a set of recommendations for the analysis and reporting of moderated multiple regression that should help researchers better understand their models and facilitate generalizations across studies.
Multiple regression technique for Pth degree polynominals with and without linear cross products
NASA Technical Reports Server (NTRS)
Davis, J. W.
1973-01-01
A multiple regression technique was developed by which the nonlinear behavior of specified independent variables can be related to a given dependent variable. The polynomial expression can be of Pth degree and can incorporate N independent variables. Two cases are treated such that mathematical models can be studied both with and without linear cross products. The resulting surface fits can be used to summarize trends for a given phenomenon and provide a mathematical relationship for subsequent analysis. To implement this technique, separate computer programs were developed for the case without linear cross products and for the case incorporating such cross products which evaluate the various constants in the model regression equation. In addition, the significance of the estimated regression equation is considered and the standard deviation, the F statistic, the maximum absolute percent error, and the average of the absolute values of the percent of error evaluated. The computer programs and their manner of utilization are described. Sample problems are included to illustrate the use and capability of the technique which show the output formats and typical plots comparing computer results to each set of input data.
Lamont, Andrea E.; Vermunt, Jeroen K.; Van Horn, M. Lee
2016-01-01
Regression mixture models are increasingly used as an exploratory approach to identify heterogeneity in the effects of a predictor on an outcome. In this simulation study, we test the effects of violating an implicit assumption often made in these models – i.e., independent variables in the model are not directly related to latent classes. Results indicated that the major risk of failing to model the relationship between predictor and latent class was an increase in the probability of selecting additional latent classes and biased class proportions. Additionally, this study tests whether regression mixture models can detect a piecewise relationship between a predictor and outcome. Results suggest that these models are able to detect piecewise relations, but only when the relationship between the latent class and the predictor is included in model estimation. We illustrate the implications of making this assumption through a re-analysis of applied data examining heterogeneity in the effects of family resources on academic achievement. We compare previous results (which assumed no relation between independent variables and latent class) to the model where this assumption is lifted. Implications and analytic suggestions for conducting regression mixture based on these findings are noted. PMID:26881956
Byun, Bo-Ram; Kim, Yong-Il; Maki, Koutaro; Son, Woo-Sung
2015-01-01
This study was aimed to examine the correlation between skeletal maturation status and parameters from the odontoid process/body of the second vertebra and the bodies of third and fourth cervical vertebrae and simultaneously build multiple regression models to be able to estimate skeletal maturation status in Korean girls. Hand-wrist radiographs and cone beam computed tomography (CBCT) images were obtained from 74 Korean girls (6–18 years of age). CBCT-generated cervical vertebral maturation (CVM) was used to demarcate the odontoid process and the body of the second cervical vertebra, based on the dentocentral synchondrosis. Correlation coefficient analysis and multiple linear regression analysis were used for each parameter of the cervical vertebrae (P < 0.05). Forty-seven of 64 parameters from CBCT-generated CVM (independent variables) exhibited statistically significant correlations (P < 0.05). The multiple regression model with the greatest R 2 had six parameters (PH2/W2, UW2/W2, (OH+AH2)/LW2, UW3/LW3, D3, and H4/W4) as independent variables with a variance inflation factor (VIF) of <2. CBCT-generated CVM was able to include parameters from the second cervical vertebral body and odontoid process, respectively, for the multiple regression models. This suggests that quantitative analysis might be used to estimate skeletal maturation status. PMID:25878721
[Multivariate Adaptive Regression Splines (MARS), an alternative for the analysis of time series].
Vanegas, Jairo; Vásquez, Fabián
Multivariate Adaptive Regression Splines (MARS) is a non-parametric modelling method that extends the linear model, incorporating nonlinearities and interactions between variables. It is a flexible tool that automates the construction of predictive models: selecting relevant variables, transforming the predictor variables, processing missing values and preventing overshooting using a self-test. It is also able to predict, taking into account structural factors that might influence the outcome variable, thereby generating hypothetical models. The end result could identify relevant cut-off points in data series. It is rarely used in health, so it is proposed as a tool for the evaluation of relevant public health indicators. For demonstrative purposes, data series regarding the mortality of children under 5 years of age in Costa Rica were used, comprising the period 1978-2008. Copyright © 2016 SESPAS. Publicado por Elsevier España, S.L.U. All rights reserved.
Worku, Yohannes; Muchie, Mammo
2012-01-01
Objective. The objective was to investigate factors that affect the efficient management of solid waste produced by commercial businesses operating in the city of Pretoria, South Africa. Methods. Data was gathered from 1,034 businesses. Efficiency in solid waste management was assessed by using a structural time-based model designed for evaluating efficiency as a function of the length of time required to manage waste. Data analysis was performed using statistical procedures such as frequency tables, Pearson's chi-square tests of association, and binary logistic regression analysis. Odds ratios estimated from logistic regression analysis were used for identifying key factors that affect efficiency in the proper disposal of waste. Results. The study showed that 857 of the 1,034 businesses selected for the study (83%) were found to be efficient enough with regards to the proper collection and disposal of solid waste. Based on odds ratios estimated from binary logistic regression analysis, efficiency in the proper management of solid waste was significantly influenced by 4 predictor variables. These 4 influential predictor variables are lack of adherence to waste management regulations, wrong perception, failure to provide customers with enough trash cans, and operation of businesses by employed managers, in a decreasing order of importance. PMID:23209483
Results of Database Studies in Spine Surgery Can Be Influenced by Missing Data.
Basques, Bryce A; McLynn, Ryan P; Fice, Michael P; Samuel, Andre M; Lukasiewicz, Adam M; Bohl, Daniel D; Ahn, Junyoung; Singh, Kern; Grauer, Jonathan N
2017-12-01
National databases are increasingly being used for research in spine surgery; however, one limitation of such databases that has received sparse mention is the frequency of missing data. Studies using these databases often do not emphasize the percentage of missing data for each variable used and do not specify how patients with missing data are incorporated into analyses. This study uses the American College of Surgeons National Surgical Quality Improvement Program (ACS-NSQIP) database to examine whether different treatments of missing data can influence the results of spine studies. (1) What is the frequency of missing data fields for demographics, medical comorbidities, preoperative laboratory values, operating room times, and length of stay recorded in ACS-NSQIP? (2) Using three common approaches to handling missing data, how frequently do those approaches agree in terms of finding particular variables to be associated with adverse events? (3) Do different approaches to handling missing data influence the outcomes and effect sizes of an analysis testing for an association with these variables with occurrence of adverse events? Patients who underwent spine surgery between 2005 and 2013 were identified from the ACS-NSQIP database. A total of 88,471 patients undergoing spine surgery were identified. The most common procedures were anterior cervical discectomy and fusion, lumbar decompression, and lumbar fusion. Demographics, comorbidities, and perioperative laboratory values were tabulated for each patient, and the percent of missing data was noted for each variable. These variables were tested for an association with "any adverse event" using three separate multivariate regressions that used the most common treatments for missing data. In the first regression, patients with any missing data were excluded. In the second regression, missing data were treated as a negative or "reference" value; for continuous variables, the mean of each variable's reference range was computed and imputed. In the third regression, any variables with > 10% rate of missing data were removed from the regression; among variables with ≤ 10% missing data, individual cases with missing values were excluded. The results of these regressions were compared to determine how the different treatments of missing data could affect the results of spine studies using the ACS-NSQIP database. Of the 88,471 patients, as many as 4441 (5%) had missing elements among demographic data, 69,184 (72%) among comorbidities, 70,892 (80%) among preoperative laboratory values, and 56,551 (64%) among operating room times. Considering the three different treatments of missing data, we found different risk factors for adverse events. Of 44 risk factors found to be associated with adverse events in any analysis, only 15 (34%) of these risk factors were common among the three regressions. The second treatment of missing data (assuming "normal" value) found the most risk factors (40) to be associated with any adverse event, whereas the first treatment (deleting patients with missing data) found the fewest associations at 20. Among the risk factors associated with any adverse event, the 10 with the greatest effect size (odds ratio) by each regression were ranked. Of the 15 variables in the top 10 for any regression, six of these were common among all three lists. Differing treatments of missing data can influence the results of spine studies using the ACS-NSQIP. The current study highlights the importance of considering how such missing data are handled. Until there are better guidelines on the best approaches to handle missing data, investigators should report how missing data were handled to increase the quality and transparency of orthopaedic database research. Readers of large database studies should note whether handling of missing data was addressed and consider potential bias with high rates or unspecified or weak methods for handling missing data.
Interpreting the Results of Weighted Least-Squares Regression: Caveats for the Statistical Consumer.
ERIC Educational Resources Information Center
Willett, John B.; Singer, Judith D.
In research, data sets often occur in which the variance of the distribution of the dependent variable at given levels of the predictors is a function of the values of the predictors. In this situation, the use of weighted least-squares (WLS) or techniques is required. Weights suitable for use in a WLS regression analysis must be estimated. A…
ERIC Educational Resources Information Center
Kitsantas, Anastasia; Kitsantas, Panagiota; Kitsantas, Thomas
2012-01-01
The purpose of this exploratory study was to assess the relative importance of a number of variables in predicting students' interest in math and/or computer science. Classification and regression trees (CART) were employed in the analysis of survey data collected from 276 college students enrolled in two U.S. and Greek universities. The results…
ERIC Educational Resources Information Center
Gelman, Andrew; Imbens, Guido
2014-01-01
It is common in regression discontinuity analysis to control for high order (third, fourth, or higher) polynomials of the forcing variable. We argue that estimators for causal effects based on such methods can be misleading, and we recommend researchers do not use them, and instead use estimators based on local linear or quadratic polynomials or…
Variable Selection for Nonparametric Quantile Regression via Smoothing Spline AN OVA
Lin, Chen-Yen; Bondell, Howard; Zhang, Hao Helen; Zou, Hui
2014-01-01
Quantile regression provides a more thorough view of the effect of covariates on a response. Nonparametric quantile regression has become a viable alternative to avoid restrictive parametric assumption. The problem of variable selection for quantile regression is challenging, since important variables can influence various quantiles in different ways. We tackle the problem via regularization in the context of smoothing spline ANOVA models. The proposed sparse nonparametric quantile regression (SNQR) can identify important variables and provide flexible estimates for quantiles. Our numerical study suggests the promising performance of the new procedure in variable selection and function estimation. Supplementary materials for this article are available online. PMID:24554792
The Relationship among Leisure Interests, Personality Traits, Affect, and Mood
ERIC Educational Resources Information Center
Wilkinson, Todd J.; Hansen, Jo-Ida C.
2006-01-01
The present study examined relationships between leisure interests and the Big Five personality traits, positive and negative affect, and moods. Regression analysis identified particular personality but not mood or affect variables as significant predictors of leisure factor scores. Further exploration through factor analysis revealed factor…
Correlates of Successful Aging: Are They Universal?
ERIC Educational Resources Information Center
Litwin, Howard
2005-01-01
The analysis compared differing correlates of life satisfaction among three diverse population groups in Israel, examining background and health status variables, social environment factors, and activity indicators. Multiple regression analysis revealed that veteran Jewish-Israelis (n = 2,043) had the largest set of predictors, the strongest of…
A Selective Review of Group Selection in High-Dimensional Models
Huang, Jian; Breheny, Patrick; Ma, Shuangge
2013-01-01
Grouping structures arise naturally in many statistical modeling problems. Several methods have been proposed for variable selection that respect grouping structure in variables. Examples include the group LASSO and several concave group selection methods. In this article, we give a selective review of group selection concerning methodological developments, theoretical properties and computational algorithms. We pay particular attention to group selection methods involving concave penalties. We address both group selection and bi-level selection methods. We describe several applications of these methods in nonparametric additive models, semiparametric regression, seemingly unrelated regressions, genomic data analysis and genome wide association studies. We also highlight some issues that require further study. PMID:24174707
Learning investment indicators through data extension
NASA Astrophysics Data System (ADS)
Dvořák, Marek
2017-07-01
Stock prices in the form of time series were analysed using single and multivariate statistical methods. After simple data preprocessing in the form of logarithmic differences, we augmented this single variate time series to a multivariate representation. This method makes use of sliding windows to calculate several dozen of new variables using simple statistic tools like first and second moments as well as more complicated statistic, like auto-regression coefficients and residual analysis, followed by an optional quadratic transformation that was further used for data extension. These were used as a explanatory variables in a regularized logistic LASSO regression which tried to estimate Buy-Sell Index (BSI) from real stock market data.
Peng, Yong; Peng, Shuangling; Wang, Xinghua; Tan, Shiyang
2018-06-01
This study aims to identify the effects of characteristics of vehicle, roadway, driver, and environment on fatality of drivers in vehicle-fixed object accidents on expressways in Changsha-Zhuzhou-Xiangtan district of Hunan province in China by developing multinomial logistic regression models. For this purpose, 121 vehicle-fixed object accidents from 2011-2017 are included in the modeling process. First, descriptive statistical analysis is made to understand the main characteristics of the vehicle-fixed object crashes. Then, 19 explanatory variables are selected, and correlation analysis of each two variables is conducted to choose the variables to be concluded. Finally, five multinomial logistic regression models including different independent variables are compared, and the model with best fitting and prediction capability is chosen as the final model. The results showed that the turning direction in avoiding fixed objects raised the possibility that drivers would die. About 64% of drivers died in the accident were found being ejected out of the car, of which 50% did not use a seatbelt before the fatal accidents. Drivers are likely to die when they encounter bad weather on the expressway. Drivers with less than 10 years of driving experience are more likely to die in these accidents. Fatigue or distracted driving is also a significant factor in fatality of drivers. Findings from this research provide an insight into reducing fatality of drivers in vehicle-fixed object accidents.
Söhn, Matthias; Alber, Markus; Yan, Di
2007-09-01
The variability of dose-volume histogram (DVH) shapes in a patient population can be quantified using principal component analysis (PCA). We applied this to rectal DVHs of prostate cancer patients and investigated the correlation of the PCA parameters with late bleeding. PCA was applied to the rectal wall DVHs of 262 patients, who had been treated with a four-field box, conformal adaptive radiotherapy technique. The correlated changes in the DVH pattern were revealed as "eigenmodes," which were ordered by their importance to represent data set variability. Each DVH is uniquely characterized by its principal components (PCs). The correlation of the first three PCs and chronic rectal bleeding of Grade 2 or greater was investigated with uni- and multivariate logistic regression analyses. Rectal wall DVHs in four-field conformal RT can primarily be represented by the first two or three PCs, which describe approximately 94% or 96% of the DVH shape variability, respectively. The first eigenmode models the total irradiated rectal volume; thus, PC1 correlates to the mean dose. Mode 2 describes the interpatient differences of the relative rectal volume in the two- or four-field overlap region. Mode 3 reveals correlations of volumes with intermediate doses ( approximately 40-45 Gy) and volumes with doses >70 Gy; thus, PC3 is associated with the maximal dose. According to univariate logistic regression analysis, only PC2 correlated significantly with toxicity. However, multivariate logistic regression analysis with the first two or three PCs revealed an increased probability of bleeding for DVHs with more than one large PC. PCA can reveal the correlation structure of DVHs for a patient population as imposed by the treatment technique and provide information about its relationship to toxicity. It proves useful for augmenting normal tissue complication probability modeling approaches.
Genome-wide regression and prediction with the BGLR statistical package.
Pérez, Paulino; de los Campos, Gustavo
2014-10-01
Many modern genomic data analyses require implementing regressions where the number of parameters (p, e.g., the number of marker effects) exceeds sample size (n). Implementing these large-p-with-small-n regressions poses several statistical and computational challenges, some of which can be confronted using Bayesian methods. This approach allows integrating various parametric and nonparametric shrinkage and variable selection procedures in a unified and consistent manner. The BGLR R-package implements a large collection of Bayesian regression models, including parametric variable selection and shrinkage methods and semiparametric procedures (Bayesian reproducing kernel Hilbert spaces regressions, RKHS). The software was originally developed for genomic applications; however, the methods implemented are useful for many nongenomic applications as well. The response can be continuous (censored or not) or categorical (either binary or ordinal). The algorithm is based on a Gibbs sampler with scalar updates and the implementation takes advantage of efficient compiled C and Fortran routines. In this article we describe the methods implemented in BGLR, present examples of the use of the package, and discuss practical issues emerging in real-data analysis. Copyright © 2014 by the Genetics Society of America.
Borgquist, Ola; Wise, Matt P; Nielsen, Niklas; Al-Subaie, Nawaf; Cranshaw, Julius; Cronberg, Tobias; Glover, Guy; Hassager, Christian; Kjaergaard, Jesper; Kuiper, Michael; Smid, Ondrej; Walden, Andrew; Friberg, Hans
2017-08-01
Dysglycemia and glycemic variability are associated with poor outcomes in critically ill patients. Targeted temperature management alters blood glucose homeostasis. We investigated the association between blood glucose concentrations and glycemic variability and the neurologic outcomes of patients randomized to targeted temperature management at 33°C or 36°C after cardiac arrest. Post hoc analysis of the multicenter TTM-trial. Primary outcome of this analysis was neurologic outcome after 6 months, referred to as "Cerebral Performance Category." Thirty-six sites in Europe and Australia. All 939 patients with out-of-hospital cardiac arrest of presumed cardiac cause that had been included in the TTM-trial. Targeted temperature management at 33°C or 36°C. Nonparametric tests as well as multiple logistic regression and mixed effects logistic regression models were used. Median glucose concentrations on hospital admission differed significantly between Cerebral Performance Category outcomes (p < 0.0001). Hyper- and hypoglycemia were associated with poor neurologic outcome (p = 0.001 and p = 0.054). In the multiple logistic regression models, the median glycemic level was an independent predictor of poor Cerebral Performance Category (Cerebral Performance Category, 3-5) with an odds ratio (OR) of 1.13 in the adjusted model (p = 0.008; 95% CI, 1.03-1.24). It was also a predictor in the mixed model, which served as a sensitivity analysis to adjust for the multiple time points. The proportion of hyperglycemia was higher in the 33°C group compared with the 36°C group. Higher blood glucose levels at admission and during the first 36 hours, and higher glycemic variability, were associated with poor neurologic outcome and death. More patients in the 33°C treatment arm had hyperglycemia.
NASA Technical Reports Server (NTRS)
Barrett, C. A.
1985-01-01
Multiple linear regression analysis was used to determine an equation for estimating hot corrosion attack for a series of Ni base cast turbine alloys. The U transform (i.e., 1/sin (% A/100) to the 1/2) was shown to give the best estimate of the dependent variable, y. A complete second degree equation is described for the centered" weight chemistries for the elements Cr, Al, Ti, Mo, W, Cb, Ta, and Co. In addition linear terms for the minor elements C, B, and Zr were added for a basic 47 term equation. The best reduced equation was determined by the stepwise selection method with essentially 13 terms. The Cr term was found to be the most important accounting for 60 percent of the explained variability hot corrosion attack.
Using Data Mining for Wine Quality Assessment
NASA Astrophysics Data System (ADS)
Cortez, Paulo; Teixeira, Juliana; Cerdeira, António; Almeida, Fernando; Matos, Telmo; Reis, José
Certification and quality assessment are crucial issues within the wine industry. Currently, wine quality is mostly assessed by physicochemical (e.g alcohol levels) and sensory (e.g. human expert evaluation) tests. In this paper, we propose a data mining approach to predict wine preferences that is based on easily available analytical tests at the certification step. A large dataset is considered with white vinho verde samples from the Minho region of Portugal. Wine quality is modeled under a regression approach, which preserves the order of the grades. Explanatory knowledge is given in terms of a sensitivity analysis, which measures the response changes when a given input variable is varied through its domain. Three regression techniques were applied, under a computationally efficient procedure that performs simultaneous variable and model selection and that is guided by the sensitivity analysis. The support vector machine achieved promising results, outperforming the multiple regression and neural network methods. Such model is useful for understanding how physicochemical tests affect the sensory preferences. Moreover, it can support the wine expert evaluations and ultimately improve the production.
Parrett, Charles; Omang, R.J.; Hull, J.A.
1983-01-01
Equations for estimating mean annual runoff and peak discharge from measurements of channel geometry were developed for western and northeastern Montana. The study area was divided into two regions for the mean annual runoff analysis, and separate multiple-regression equations were developed for each region. The active-channel width was determined to be the most important independent variable in each region. The standard error of estimate for the estimating equation using active-channel width was 61 percent in the Northeast Region and 38 percent in the West region. The study area was divided into six regions for the peak discharge analysis, and multiple regression equations relating channel geometry and basin characteristics to peak discharges having recurrence intervals of 2, 5, 10, 25, 50 and 100 years were developed for each region. The standard errors of estimate for the regression equations using only channel width as an independent variable ranged from 35 to 105 percent. The standard errors improved in four regions as basin characteristics were added to the estimating equations. (USGS)
Regression equations for disinfection by-products for the Mississippi, Ohio and Missouri rivers
Rathbun, R.E.
1996-01-01
Trihalomethane and nonpurgeable total organic-halide formation potentials were determined for the chlorination of water samples from the Mississippi, Ohio and Missouri Rivers. Samples were collected during the summer and fall of 1991 and the spring of 1992 at twelve locations on the Mississippi from New Orleans to Minneapolis, and on the Ohio and Missouri 1.6 km upstream from their confluences with the Mississippi. Formation potentials were determined as a function of pH, initial free-chlorine concentration, and reaction time. Multiple linear regression analysis of the data indicated that pH, reaction time, and the dissolved organic carbon concentration and/or the ultraviolet absorbance of the water were the most significant variables. The initial free-chlorine concentration had less significance and bromide concentration had little or no significance. Analysis of combinations of the dissolved organic carbon concentration and the ultraviolet absorbance indicated that use of the ultraviolet absorbance alone provided the best prediction of the experimental data. Regression coefficients for the variables were generally comparable to coefficients previously presented in the literature for waters from other parts of the United States.
Multiple correlates of cigarette use among high school students.
McDermott, R J; Sarvela, P D; Hoalt, P N; Bajracharya, S M; Marty, P J; Emery, E M
1992-04-01
A cross-sectional survey research design measured factors related to cigarette use among 2,212 senior high school students. Results showed 14.3% of the sample smoked cigarettes at least occasionally, with 5.3% reporting they were daily smokers. About 12.8% indicated they were ex-smokers. Males and females smoked at almost equal rates, and the percentage of 10th grade student smokers was slightly higher (16.4%) than the percentage of juniors and seniors who smoked. Approximately 22% of Hispanic students, 15% of Caucasian students, and 4.5% of African-American students reported smoking cigarettes at least occasionally. An initial regression analysis used 21 variables to predict cigarette smoking. A more parsimonious regression model (R2 = .28), using variables from the initial regression analysis with significance levels of .01 or less, indicated the most important predictors of cigarette use were ethnic group, attitude toward females who smoke, close friends' use of cigarettes, personal use of marijuana, best friend's use of cigarettes, personal use of alcohol, and school self-esteem. Implications for school health programs are addressed.
Reddy, M Srinivasa; Basha, Shaik; Joshi, H V; Sravan Kumar, V G; Jha, B; Ghosh, P K
2005-01-01
Alang-Sosiya is the largest ship-scrapping yard in the world, established in 1982. Every year an average of 171 ships having a mean weight of 2.10 x 10(6)(+/-7.82 x 10(5)) of light dead weight tonnage (LDT) being scrapped. Apart from scrapped metals, this yard generates a massive amount of combustible solid waste in the form of waste wood, plastic, insulation material, paper, glass wool, thermocol pieces (polyurethane foam material), sponge, oiled rope, cotton waste, rubber, etc. In this study multiple regression analysis was used to develop predictive models for energy content of combustible ship-scrapping solid wastes. The scope of work comprised qualitative and quantitative estimation of solid waste samples and performing a sequential selection procedure for isolating variables. Three regression models were developed to correlate the energy content (net calorific values (LHV)) with variables derived from material composition, proximate and ultimate analyses. The performance of these models for this particular waste complies well with the equations developed by other researchers (Dulong, Steuer, Scheurer-Kestner and Bento's) for estimating energy content of municipal solid waste.
Zhou, Jinzhe; Zhou, Yanbing; Cao, Shougen; Li, Shikuan; Wang, Hao; Niu, Zhaojian; Chen, Dong; Wang, Dongsheng; Lv, Liang; Zhang, Jian; Li, Yu; Jiao, Xuelong; Tan, Xiaojie; Zhang, Jianli; Wang, Haibo; Zhang, Bingyuan; Lu, Yun; Sun, Zhenqing
2016-01-01
Reporting of surgical complications is common, but few provide information about the severity and estimate risk factors of complications. If have, but lack of specificity. We retrospectively analyzed data on 2795 gastric cancer patients underwent surgical procedure at the Affiliated Hospital of Qingdao University between June 2007 and June 2012, established multivariate logistic regression model to predictive risk factors related to the postoperative complications according to the Clavien-Dindo classification system. Twenty-four out of 86 variables were identified statistically significant in univariate logistic regression analysis, 11 significant variables entered multivariate analysis were employed to produce the risk model. Liver cirrhosis, diabetes mellitus, Child classification, invasion of neighboring organs, combined resection, introperative transfusion, Billroth II anastomosis of reconstruction, malnutrition, surgical volume of surgeons, operating time and age were independent risk factors for postoperative complications after gastrectomy. Based on logistic regression equation, p=Exp∑BiXi / (1+Exp∑BiXi), multivariate logistic regression predictive model that calculated the risk of postoperative morbidity was developed, p = 1/(1 + e((4.810-1.287X1-0.504X2-0.500X3-0.474X4-0.405X5-0.318X6-0.316X7-0.305X8-0.278X9-0.255X10-0.138X11))). The accuracy, sensitivity and specificity of the model to predict the postoperative complications were 86.7%, 76.2% and 88.6%, respectively. This risk model based on Clavien-Dindo grading severity of complications system and logistic regression analysis can predict severe morbidity specific to an individual patient's risk factors, estimate patients' risks and benefits of gastric surgery as an accurate decision-making tool and may serve as a template for the development of risk models for other surgical groups.
Survival analysis: Part I — analysis of time-to-event
2018-01-01
Length of time is a variable often encountered during data analysis. Survival analysis provides simple, intuitive results concerning time-to-event for events of interest, which are not confined to death. This review introduces methods of analyzing time-to-event. The Kaplan-Meier survival analysis, log-rank test, and Cox proportional hazards regression modeling method are described with examples of hypothetical data. PMID:29768911
Olea, Pedro P.; Mateo-Tomás, Patricia; de Frutos, Ángel
2010-01-01
Background Hierarchical partitioning (HP) is an analytical method of multiple regression that identifies the most likely causal factors while alleviating multicollinearity problems. Its use is increasing in ecology and conservation by its usefulness for complementing multiple regression analysis. A public-domain software “hier.part package” has been developed for running HP in R software. Its authors highlight a “minor rounding error” for hierarchies constructed from >9 variables, however potential bias by using this module has not yet been examined. Knowing this bias is pivotal because, for example, the ranking obtained in HP is being used as a criterion for establishing priorities of conservation. Methodology/Principal Findings Using numerical simulations and two real examples, we assessed the robustness of this HP module in relation to the order the variables have in the analysis. Results indicated a considerable effect of the variable order on the amount of independent variance explained by predictors for models with >9 explanatory variables. For these models the nominal ranking of importance of the predictors changed with variable order, i.e. predictors declared important by its contribution in explaining the response variable frequently changed to be either most or less important with other variable orders. The probability of changing position of a variable was best explained by the difference in independent explanatory power between that variable and the previous one in the nominal ranking of importance. The lesser is this difference, the more likely is the change of position. Conclusions/Significance HP should be applied with caution when more than 9 explanatory variables are used to know ranking of covariate importance. The explained variance is not a useful parameter to use in models with more than 9 independent variables. The inconsistency in the results obtained by HP should be considered in future studies as well as in those already published. Some recommendations to improve the analysis with this HP module are given. PMID:20657734
The Impact of Managerial Coaching on Learning Outcomes within the Team Context: An Analysis
ERIC Educational Resources Information Center
Hagen, Marcia; Aguilar, Mariya Gavrilova
2012-01-01
This study investigates the relationship between coaching expertise, project difficulty, and team empowerment on team learning outcomes within the context of a high-performance work team. Variables were tested using multiple regression analysis. The data were analyzed for two groups--team leaders and team members--using t-tests, factor analysis,…
NASA Astrophysics Data System (ADS)
WU, Chunhung
2015-04-01
The research built the original logistic regression landslide susceptibility model (abbreviated as or-LRLSM) and landslide ratio-based ogistic regression landslide susceptibility model (abbreviated as lr-LRLSM), compared the performance and explained the error source of two models. The research assumes that the performance of the logistic regression model can be better if the distribution of landslide ratio and weighted value of each variable is similar. Landslide ratio is the ratio of landslide area to total area in the specific area and an useful index to evaluate the seriousness of landslide disaster in Taiwan. The research adopted the landside inventory induced by 2009 Typhoon Morakot in the Chishan watershed, which was the most serious disaster event in the last decade, in Taiwan. The research adopted the 20 m grid as the basic unit in building the LRLSM, and six variables, including elevation, slope, aspect, geological formation, accumulated rainfall, and bank erosion, were included in the two models. The six variables were divided as continuous variables, including elevation, slope, and accumulated rainfall, and categorical variables, including aspect, geological formation and bank erosion in building the or-LRLSM, while all variables, which were classified based on landslide ratio, were categorical variables in building the lr-LRLSM. Because the count of whole basic unit in the Chishan watershed was too much to calculate by using commercial software, the research took random sampling instead of the whole basic units. The research adopted equal proportions of landslide unit and not landslide unit in logistic regression analysis. The research took 10 times random sampling and selected the group with the best Cox & Snell R2 value and Nagelkerker R2 value as the database for the following analysis. Based on the best result from 10 random sampling groups, the or-LRLSM (lr-LRLSM) is significant at the 1% level with Cox & Snell R2 = 0.190 (0.196) and Nagelkerke R2 = 0.253 (0.260). The unit with the landslide susceptibility value > 0.5 (≦ 0.5) will be classified as a predicted landslide unit (not landslide unit). The AUC, i.e. the area under the relative operating characteristic curve, of or-LRLSM in the Chishan watershed is 0.72, while that of lr-LRLSM is 0.77. Furthermore, the average correct ratio of lr-LRLSM (73.3%) is better than that of or-LRLSM (68.3%). The research analyzed in detail the error sources from the two models. In continuous variables, using the landslide ratio-based classification in building the lr-LRLSM can let the distribution of weighted value more similar to distribution of landslide ratio in the range of continuous variable than that in building the or-LRLSM. In categorical variables, the meaning of using the landslide ratio-based classification in building the lr-LRLSM is to gather the parameters with approximate landslide ratio together. The mean correct ratio in continuous variables (categorical variables) by using the lr-LRLSM is better than that in or-LRLSM by 0.6 ~ 2.6% (1.7% ~ 6.0%). Building the landslide susceptibility model by using landslide ratio-based classification is practical and of better performance than that by using the original logistic regression.
A. C. Gellis; NO-VALUE
2013-01-01
The significant characteristics controlling the variability in storm-generated suspended-sediment loads and concentrations were analyzed for four basins of differing land use (forest, pasture, cropland, and urbanizing) in humid-tropical Puerto Rico. Statistical analysis involved stepwise regression on factor scores. The explanatory variables were attributes of flow,...
Regression Is a Univariate General Linear Model Subsuming Other Parametric Methods as Special Cases.
ERIC Educational Resources Information Center
Vidal, Sherry
Although the concept of the general linear model (GLM) has existed since the 1960s, other univariate analyses such as the t-test and the analysis of variance models have remained popular. The GLM produces an equation that minimizes the mean differences of independent variables as they are related to a dependent variable. From a computer printout…
Social determinants of childhood asthma symptoms: an ecological study in urban Latin America.
Fattore, Gisel L; Santos, Carlos A T; Barreto, Mauricio L
2014-04-01
Asthma is an important public health problem in urban Latin America. This study aimed to analyze the role of socioeconomic and environmental factors as potential determinants of asthma symptoms prevalence in children from Latin American (LA) urban centers. We selected 31 LA urban centers with complete data, and an ecological analysis was performed. According to our theoretical framework, the explanatory variables were classified in three levels: distal, intermediate, and proximate. The association between variables in the three levels and prevalence of asthma symptoms was examined by bivariate and multivariate linear regression analysis weighed by sample size. In a second stage, we fitted several linear regression models introducing sequentially the variables according to the predefined hierarchy. In the final hierarchical model Gini Index, crowding, sanitation, variation in infant mortality rates and homicide rates, explained great part of the variance in asthma prevalence between centers (R(2) = 75.0 %). We found a strong association between socioeconomic and environmental variables and prevalence of asthma symptoms in LA urban children, and according to our hierarchical framework and the results found we suggest that social inequalities (measured by the Gini Index) is a central determinant to explain high prevalence of asthma in LA.
A non-linear data mining parameter selection algorithm for continuous variables
Razavi, Marianne; Brady, Sean
2017-01-01
In this article, we propose a new data mining algorithm, by which one can both capture the non-linearity in data and also find the best subset model. To produce an enhanced subset of the original variables, a preferred selection method should have the potential of adding a supplementary level of regression analysis that would capture complex relationships in the data via mathematical transformation of the predictors and exploration of synergistic effects of combined variables. The method that we present here has the potential to produce an optimal subset of variables, rendering the overall process of model selection more efficient. This algorithm introduces interpretable parameters by transforming the original inputs and also a faithful fit to the data. The core objective of this paper is to introduce a new estimation technique for the classical least square regression framework. This new automatic variable transformation and model selection method could offer an optimal and stable model that minimizes the mean square error and variability, while combining all possible subset selection methodology with the inclusion variable transformations and interactions. Moreover, this method controls multicollinearity, leading to an optimal set of explanatory variables. PMID:29131829
DOE Office of Scientific and Technical Information (OSTI.GOV)
Kim, J.; Moon, T.J.; Howell, J.R.
This paper presents an analysis of the heat transfer occurring during an in-situ curing process for which infrared energy is provided on the surface of polymer composite during winding. The material system is Hercules prepreg AS4/3501-6. Thermoset composites have an exothermic chemical reaction during the curing process. An Eulerian thermochemical model is developed for the heat transfer analysis of helical winding. The model incorporates heat generation due to the chemical reaction. Several assumptions are made leading to a two-dimensional, thermochemical model. For simplicity, 360{degree} heating around the mandrel is considered. In order to generate the appropriate process windows, the developedmore » heat transfer model is combined with a simple winding time model. The process windows allow for a proper selection of process variables such as infrared energy input and winding velocity to give a desired end-product state. Steady-state temperatures are found for each combination of the process variables. A regression analysis is carried out to relate the process variables to the resulting steady-state temperatures. Using regression equations, process windows for a wide range of cylinder diameters are found. A general procedure to find process windows for Hercules AS4/3501-6 prepreg tape is coded in a FORTRAN program.« less
Patterson, Megan S; Goodson, Patricia
2017-05-01
Compulsive exercise, a form of unhealthy exercise often associated with prioritizing exercise and feeling guilty when exercise is missed, is a common precursor to and symptom of eating disorders. College-aged women are at high risk of exercising compulsively compared with other groups. Social network analysis (SNA) is a theoretical perspective and methodology allowing researchers to observe the effects of relational dynamics on the behaviors of people. SNA was used to assess the relationship between compulsive exercise and body dissatisfaction, physical activity, and network variables. Descriptive statistics were conducted using SPSS, and quadratic assignment procedure (QAP) analyses were conducted using UCINET. QAP regression analysis revealed a statistically significant model (R 2 = .375, P < .0001) predicting compulsive exercise behavior. Physical activity, body dissatisfaction, and network variables were statistically significant predictor variables in the QAP regression model. In our sample, women who are connected to "important" or "powerful" people in their network are likely to have higher compulsive exercise scores. This result provides healthcare practitioners key target points for intervention within similar groups of women. For scholars researching eating disorders and associated behaviors, this study supports looking into group dynamics and network structure in conjunction with body dissatisfaction and exercise frequency.
Rupert, Michael G.; Cannon, Susan H.; Gartner, Joseph E.; Michael, John A.; Helsel, Dennis R.
2008-01-01
Logistic regression was used to develop statistical models that can be used to predict the probability of debris flows in areas recently burned by wildfires by using data from 14 wildfires that burned in southern California during 2003-2006. Twenty-eight independent variables describing the basin morphology, burn severity, rainfall, and soil properties of 306 drainage basins located within those burned areas were evaluated. The models were developed as follows: (1) Basins that did and did not produce debris flows soon after the 2003 to 2006 fires were delineated from data in the National Elevation Dataset using a geographic information system; (2) Data describing the basin morphology, burn severity, rainfall, and soil properties were compiled for each basin. These data were then input to a statistics software package for analysis using logistic regression; and (3) Relations between the occurrence or absence of debris flows and the basin morphology, burn severity, rainfall, and soil properties were evaluated, and five multivariate logistic regression models were constructed. All possible combinations of independent variables were evaluated to determine which combinations produced the most effective models, and the multivariate models that best predicted the occurrence of debris flows were identified. Percentage of high burn severity and 3-hour peak rainfall intensity were significant variables in all models. Soil organic matter content and soil clay content were significant variables in all models except Model 5. Soil slope was a significant variable in all models except Model 4. The most suitable model can be selected from these five models on the basis of the availability of independent variables in the particular area of interest and field checking of probability maps. The multivariate logistic regression models can be entered into a geographic information system, and maps showing the probability of debris flows can be constructed in recently burned areas of southern California. This study demonstrates that logistic regression is a valuable tool for developing models that predict the probability of debris flows occurring in recently burned landscapes.
Modelling infant mortality rate in Central Java, Indonesia use generalized poisson regression method
NASA Astrophysics Data System (ADS)
Prahutama, Alan; Sudarno
2018-05-01
The infant mortality rate is the number of deaths under one year of age occurring among the live births in a given geographical area during a given year, per 1,000 live births occurring among the population of the given geographical area during the same year. This problem needs to be addressed because it is an important element of a country’s economic development. High infant mortality rate will disrupt the stability of a country as it relates to the sustainability of the population in the country. One of regression model that can be used to analyze the relationship between dependent variable Y in the form of discrete data and independent variable X is Poisson regression model. Recently The regression modeling used for data with dependent variable is discrete, among others, poisson regression, negative binomial regression and generalized poisson regression. In this research, generalized poisson regression modeling gives better AIC value than poisson regression. The most significant variable is the Number of health facilities (X1), while the variable that gives the most influence to infant mortality rate is the average breastfeeding (X9).
A Technique of Fuzzy C-Mean in Multiple Linear Regression Model toward Paddy Yield
NASA Astrophysics Data System (ADS)
Syazwan Wahab, Nur; Saifullah Rusiman, Mohd; Mohamad, Mahathir; Amira Azmi, Nur; Che Him, Norziha; Ghazali Kamardan, M.; Ali, Maselan
2018-04-01
In this paper, we propose a hybrid model which is a combination of multiple linear regression model and fuzzy c-means method. This research involved a relationship between 20 variates of the top soil that are analyzed prior to planting of paddy yields at standard fertilizer rates. Data used were from the multi-location trials for rice carried out by MARDI at major paddy granary in Peninsular Malaysia during the period from 2009 to 2012. Missing observations were estimated using mean estimation techniques. The data were analyzed using multiple linear regression model and a combination of multiple linear regression model and fuzzy c-means method. Analysis of normality and multicollinearity indicate that the data is normally scattered without multicollinearity among independent variables. Analysis of fuzzy c-means cluster the yield of paddy into two clusters before the multiple linear regression model can be used. The comparison between two method indicate that the hybrid of multiple linear regression model and fuzzy c-means method outperform the multiple linear regression model with lower value of mean square error.
Harris, Katherine M.; Koenig, Harold G.; Han, Xiaotong; Sullivan, Greer; Mattox, Rhonda; Tang, Lingqi
2009-01-01
Objective The negative association between religiosity (religious beliefs and church attendance) and the likelihood of substance use disorders is well established, but the mechanism(s) remain poorly understood. We investigated whether this association was mediated by social support or mental health status. Method We utilized cross-sectional data from the 2002 National Survey on Drug Use and Health (n = 36,370). We first used logistic regression to regress any alcohol use in the past year on sociodemographic and religiosity variables. Then, among individuals who drank in the past year, we regressed past year alcohol abuse/dependence on sociodemographic and religiosity variables. To investigate whether social support mediated the association between religiosity and alcohol use and alcohol abuse/dependence we repeated the above models, adding the social support variables. To the extent that these added predictors modified the magnitude of the effect of the religiosity variables, we interpreted social support as a possible mediator. We also formally tested for mediation using path analysis. We investigated the possible mediating role of mental health status analogously. Parallel sets of analyses were conducted for any drug use, and drug abuse/dependence among those using any drugs as the dependent variables. Results The addition of social support and mental health status variables to logistic regression models had little effect on the magnitude of the religiosity coefficients in any of the models. While some of the tests of mediation were significant in the path analyses, the results were not always in the expected direction, and the magnitude of the effects was small. Conclusions The association between religiosity and decreased likelihood of a substance use disorder does not appear to be substantively mediated by either social support or mental health status. PMID:19714282
Gonçalves, Michele Martins; Leles, Cláudio Rodrigues; Freire, Maria do Carmo Matias
2013-01-01
The objective of this ecological study was to investigate the association between caries experience in 5- and 12-year-old Brazilian children in 2010 and household sugar procurement in 2003 and the effects of exposure to water fluoridation and socioeconomic indicators. Sample units were all 27 Brazilian capital cities. Data were obtained from the National Surveys of Oral Health; the National Household Food Budget Survey; and the United Nations Program for Development. Data analysis included correlation coefficients, exploratory factor analysis, and linear regression. There were significant negative associations between caries experience and procurement of confectionery, fluoridated water, HDI, and per capita income. Procurement of confectionery and soft drinks was positively associated with HDI and per capita income. Exploratory factor analysis grouped the independent variables by reducing highly correlated variables into two uncorrelated component variables that explained 86.1% of total variance. The first component included income, HDI, water fluoridation, and procurement of confectionery, while the second included free sugar and procurement of soft drinks. Multiple regression analysis showed that caries is associated with the first component. Caries experience was associated with better socioeconomic indicators of a city and exposure to fluoridated water, which may affect the impact of sugars on the disease. PMID:24307900
Youth tobacco sales in a metropolitan county: factors associated with compliance.
Pearson, Dave C; Song, Lin; Valdez, Roger B; Angulo, Antoinette S
2007-08-01
To describe and identify factors associated with tobacco sales in a metropolitan county. King County, Washington is the largest county in Washington State with an estimated population of 1.8 million or about 30% of the state's population. The data analysis is based on compliance checks in King County between January 2001 and March 2005. The 8879 checks were conducted by 91 youth operatives aged 14-17. Analysis of data was completed in 2006. The outcome variable for this analysis was whether "a sale was made" to a youth operative during a compliance check. Associations between independent variables and the outcome variable were examined using 2 x 2 tables, univariate (unadjusted) logistic regression, and multivariate (adjusted) logistic regression analysis. Overall tobacco sales during the 4-year and 3-month period was 7.7%. Convenience stores selling gas were significantly more likely to sell tobacco products to minors, whereas restaurants, bars, and tobacco discount stores were less likely to sell to minors. Other factors that were significantly associated with sales are described. In a county that has adopted many of the required youth access laws, opportunities still exist to reduce sales of tobacco products to minors. Asking for age and photo identification still appears to be an effective strategy in reducing sales of tobacco products to minors.
Factors Affecting the Clinical Success Rate of Miniscrew Implants for Orthodontic Treatment.
Jing, Zheng; Wu, Yeke; Jiang, Wenlu; Zhao, Lixing; Jing, Dian; Zhang, Nian; Cao, Xiaoqing; Xu, Zhenrui; Zhao, Zhihe
2016-01-01
The purpose of this study was to evaluate the various factors that influence the success rate of miniscrew implants used as orthodontic anchorage. Potential confounding variables examined were sex, age, vertical (FMA) and sagittal (ANB) skeletal facial pattern, site of placement (labial and buccal, palatal, and retromandibular triangle), arch of placement (maxilla and mandible), placement soft tissue type, oral hygiene, diameter and length of miniscrew implants, insertion method (predrilled or drill-free), angle of placement, onset and strength of force application, and clinical purpose. The correlations between success rate and overall variables were investigated by logistic regression analysis, and the effect of each variable on the success rate was utilized by variance analysis. One hundred fourteen patients were included with a total of 253 miniscrew implants. The overall success rate was 88.54% with an average loading period of 9.5 months in successful cases. Age, oral hygiene, vertical skeletal facial pattern (FMA), and general placement sites (maxillary and mandibular) presented significant differences in success rates both by logistic regression analysis and variance analysis (P < .05). To minimize the failure of miniscrew implants, proper oral hygiene instruction and effective supervision should be given for patients, especially young (< 12 years) high-angle patients with miniscrew implants placed in the mandible.
Cervical Vertebral Body's Volume as a New Parameter for Predicting the Skeletal Maturation Stages.
Choi, Youn-Kyung; Kim, Jinmi; Yamaguchi, Tetsutaro; Maki, Koutaro; Ko, Ching-Chang; Kim, Yong-Il
2016-01-01
This study aimed to determine the correlation between the volumetric parameters derived from the images of the second, third, and fourth cervical vertebrae by using cone beam computed tomography with skeletal maturation stages and to propose a new formula for predicting skeletal maturation by using regression analysis. We obtained the estimation of skeletal maturation levels from hand-wrist radiographs and volume parameters derived from the second, third, and fourth cervical vertebrae bodies from 102 Japanese patients (54 women and 48 men, 5-18 years of age). We performed Pearson's correlation coefficient analysis and simple regression analysis. All volume parameters derived from the second, third, and fourth cervical vertebrae exhibited statistically significant correlations (P < 0.05). The simple regression model with the greatest R-square indicated the fourth-cervical-vertebra volume as an independent variable with a variance inflation factor less than ten. The explanation power was 81.76%. Volumetric parameters of cervical vertebrae using cone beam computed tomography are useful in regression models. The derived regression model has the potential for clinical application as it enables a simple and quantitative analysis to evaluate skeletal maturation level.
Cervical Vertebral Body's Volume as a New Parameter for Predicting the Skeletal Maturation Stages
Choi, Youn-Kyung; Kim, Jinmi; Maki, Koutaro; Ko, Ching-Chang
2016-01-01
This study aimed to determine the correlation between the volumetric parameters derived from the images of the second, third, and fourth cervical vertebrae by using cone beam computed tomography with skeletal maturation stages and to propose a new formula for predicting skeletal maturation by using regression analysis. We obtained the estimation of skeletal maturation levels from hand-wrist radiographs and volume parameters derived from the second, third, and fourth cervical vertebrae bodies from 102 Japanese patients (54 women and 48 men, 5–18 years of age). We performed Pearson's correlation coefficient analysis and simple regression analysis. All volume parameters derived from the second, third, and fourth cervical vertebrae exhibited statistically significant correlations (P < 0.05). The simple regression model with the greatest R-square indicated the fourth-cervical-vertebra volume as an independent variable with a variance inflation factor less than ten. The explanation power was 81.76%. Volumetric parameters of cervical vertebrae using cone beam computed tomography are useful in regression models. The derived regression model has the potential for clinical application as it enables a simple and quantitative analysis to evaluate skeletal maturation level. PMID:27340668
[From clinical judgment to linear regression model.
Palacios-Cruz, Lino; Pérez, Marcela; Rivas-Ruiz, Rodolfo; Talavera, Juan O
2013-01-01
When we think about mathematical models, such as linear regression model, we think that these terms are only used by those engaged in research, a notion that is far from the truth. Legendre described the first mathematical model in 1805, and Galton introduced the formal term in 1886. Linear regression is one of the most commonly used regression models in clinical practice. It is useful to predict or show the relationship between two or more variables as long as the dependent variable is quantitative and has normal distribution. Stated in another way, the regression is used to predict a measure based on the knowledge of at least one other variable. Linear regression has as it's first objective to determine the slope or inclination of the regression line: Y = a + bx, where "a" is the intercept or regression constant and it is equivalent to "Y" value when "X" equals 0 and "b" (also called slope) indicates the increase or decrease that occurs when the variable "x" increases or decreases in one unit. In the regression line, "b" is called regression coefficient. The coefficient of determination (R 2 ) indicates the importance of independent variables in the outcome.
Ozone and sulfur dioxide effects on three tall fescue cultivars
DOE Office of Scientific and Technical Information (OSTI.GOV)
Flagler, R.B.; Youngner, V.B.
Although many reports have been published concerning differential susceptibility of various crops and/or cultivars to air pollutants, most have used foliar injury instead of the marketable yield as the factor that determined susceptibility for the crop. In an examination of screening in terms of marketable yield, three cultivars of tall fescue (Festuca arundinacea Schreb.), 'Alta,' 'Fawn,' and 'Kentucky 31,' were exposed to 0-0.40 ppm O/sub 3/ or 0-0.50 ppm SO/sub 2/ 6 h/d, once a week, for 7 and 9 weeks, respectively. Experimental design was a randomized complete block with three replications. Statistical analysis was by standard analysis of variancemore » and regression techniques. Three variables were analyzed: top dry weight (yield), tiller number, and weight per tiller. Ozone had a significant effect on all three variables. Significant linear decreases in yield and weight per tiller occurred with increasing O/sub 3/ concentrations. Linear regressions of these variables on O/sub 3/ concentration produced significantly different regression coefficients. The coefficient for Kentucky 31 was significantly greater than Alta or Fawn, which did not differ from each other. This indicated that Kentucky 31 was more susceptible to O/sub 3/ than either of the other cultivars. Percent reductions in dry weight for the three cultivars at highest O/sub 3/ level were 35, 44, and 53%, respectively, for Fawn, Alta, and Kentucky 31. For weight per tiller, Kentucky 31 had a higher percent reduction than the other cultivars (59 vs. 46 and 44%). Tiller number was generally increased by O/sub 3/, but this variable was not useful for determining differential susceptibility to the pollutant. Sulfur dioxide treatments produced no significant effects on any of the variables analyzed.« less
Hip fracture in the elderly: a re-analysis of the EPIDOS study with causal Bayesian networks.
Caillet, Pascal; Klemm, Sarah; Ducher, Michel; Aussem, Alexandre; Schott, Anne-Marie
2015-01-01
Hip fractures commonly result in permanent disability, institutionalization or death in elderly. Existing hip-fracture predicting tools are underused in clinical practice, partly due to their lack of intuitive interpretation. By use of a graphical layer, Bayesian network models could increase the attractiveness of fracture prediction tools. Our aim was to study the potential contribution of a causal Bayesian network in this clinical setting. A logistic regression was performed as a standard control approach to check the robustness of the causal Bayesian network approach. EPIDOS is a multicenter study, conducted in an ambulatory care setting in five French cities between 1992 and 1996 and updated in 2010. The study included 7598 women aged 75 years or older, in which fractures were assessed quarterly during 4 years. A causal Bayesian network and a logistic regression were performed on EPIDOS data to describe major variables involved in hip fractures occurrences. Both models had similar association estimations and predictive performances. They detected gait speed and mineral bone density as variables the most involved in the fracture process. The causal Bayesian network showed that gait speed and bone mineral density were directly connected to fracture and seem to mediate the influence of all the other variables included in our model. The logistic regression approach detected multiple interactions involving psychotropic drug use, age and bone mineral density. Both approaches retrieved similar variables as predictors of hip fractures. However, Bayesian network highlighted the whole web of relation between the variables involved in the analysis, suggesting a possible mechanism leading to hip fracture. According to the latter results, intervention focusing concomitantly on gait speed and bone mineral density may be necessary for an optimal prevention of hip fracture occurrence in elderly people.
Using regression analysis to predict emergency patient volume at the Indianapolis 500 mile race.
Bowdish, G E; Cordell, W H; Bock, H C; Vukov, L F
1992-10-01
Emergency physicians often plan and provide on-site medical care for mass gatherings. Most of the mass gathering literature is descriptive. Only a few studies have looked at factors such as crowd size, event characteristics, or weather in predicting numbers and types of patients at mass gatherings. We used regression analysis to relate patient volume on Race Day at the Indianapolis Motor Speedway to weather conditions and race characteristics. Race Day weather data for the years 1983 to 1989 were obtained from the National Oceanic and Atmospheric Administration. Data regarding patients treated on 1983 to 1989 Race Days were obtained from the facility hospital (Hannah Emergency Medical Center) data base. Regression analysis was performed using weather factors and race characteristics as independent variables and number of patients seen as the dependent variable. Data from 1990 were used to test the validity of the model. There was a significant relationship between dew point (which is calculated from temperature and humidity) and patient load (P less than .01). Dew point, however, failed to predict patient load during the 1990 race. No relationships could be established between humidity, sunshine, wind, or race characteristics and number of patients. Although higher dew point was associated with higher patient load during the 1983 to 1989 races, dew point was a poor predictor of patient load during the 1990 race. Regression analysis may be useful in identifying relationships between event characteristics and patient load but is probably inadequate to explain the complexities of crowd behavior and too simplified to use as a prediction tool.
"Mad or bad?": burden on caregivers of patients with personality disorders.
Bauer, Rita; Döring, Antje; Schmidt, Tanja; Spießl, Hermann
2012-12-01
The burden on caregivers of patients with personality disorders is often greatly underestimated or completely disregarded. Possibilities for caregiver support have rarely been assessed. Thirty interviews were conducted with caregivers of such patients to assess illness-related burden. Responses were analyzed with a mixed method of qualitative and quantitative analysis in a sequential design. Patient and caregiver data, including sociodemographic and disease-related variables, were evaluated with regression analysis and regression trees. Caregiver statements (n = 404) were summarized into 44 global statements. The most frequent global statements were worries about the burden on other family members (70.0%), poor cooperation with clinical centers and other institutions (60.0%), financial burden (56.7%), worry about the patient's future (53.3%), and dissatisfaction with the patient's treatment and rehabilitation (53.3%). Linear regression and regression tree analysis identified predictors for more burdened caregivers. Caregivers of patients with personality disorders experience a variety of burdens, some disorder specific. Yet these caregivers often receive little attention or support.
MANCOVA for one way classification with homogeneity of regression coefficient vectors
NASA Astrophysics Data System (ADS)
Mokesh Rayalu, G.; Ravisankar, J.; Mythili, G. Y.
2017-11-01
The MANOVA and MANCOVA are the extensions of the univariate ANOVA and ANCOVA techniques to multidimensional or vector valued observations. The assumption of a Gaussian distribution has been replaced with the Multivariate Gaussian distribution for the vectors data and residual term variables in the statistical models of these techniques. The objective of MANCOVA is to determine if there are statistically reliable mean differences that can be demonstrated between groups later modifying the newly created variable. When randomization assignment of samples or subjects to groups is not possible, multivariate analysis of covariance (MANCOVA) provides statistical matching of groups by adjusting dependent variables as if all subjects scored the same on the covariates. In this research article, an extension has been made to the MANCOVA technique with more number of covariates and homogeneity of regression coefficient vectors is also tested.
Sentiment analysis in twitter data using data analytic techniques for predictive modelling
NASA Astrophysics Data System (ADS)
Razia Sulthana, A.; Jaithunbi, A. K.; Sai Ramesh, L.
2018-04-01
Sentiment analysis refers to the task of natural language processing to determine whether a piece of text contains subjective information and the kind of subjective information it expresses. The subjective information represents the attitude behind the text: positive, negative or neutral. Understanding the opinions behind user-generated content automatically is of great concern. We have made data analysis with huge amount of tweets taken as big data and thereby classifying the polarity of words, sentences or entire documents. We use linear regression for modelling the relationship between a scalar dependent variable Y and one or more explanatory variables (or independent variables) denoted X. We conduct a series of experiments to test the performance of the system.
NASA Astrophysics Data System (ADS)
Jones, William I.
This study examined the understanding of nature of science among participants in their final year of a 4-year undergraduate teacher education program at a Midwest liberal arts university. The Logic Model Process was used as an integrative framework to focus the collection, organization, analysis, and interpretation of the data for the purpose of (1) describing participant understanding of NOS and (2) to identify participant characteristics and teacher education program features related to those understandings. The Views of Nature of Science Questionnaire form C (VNOS-C) was used to survey participant understanding of 7 target aspects of Nature of Science (NOS). A rubric was developed from a review of the literature to categorize and score participant understanding of the target aspects of NOS. Participants' high school and college transcripts, planning guides for their respective teacher education program majors, and science content and science teaching methods course syllabi were examined to identify and categorize participant characteristics and teacher education program features. The R software (R Project for Statistical Computing, 2010) was used to conduct an exploratory analysis to determine correlations of the antecedent and transaction predictor variables with participants' scores on the 7 target aspects of NOS. Fourteen participant characteristics and teacher education program features were moderately and significantly ( p < .01) correlated with participant scores on the target aspects of NOS. The 6 antecedent predictor variables were entered into multiple regression analyses to determine the best-fit model of antecedent predictor variables for each target NOS aspect. The transaction predictor variables were entered into separate multiple regression analyses to determine the best-fit model of transaction predictor variables for each target NOS aspect. Variables from the best-fit antecedent and best-fit transaction models for each target aspect of NOS were then combined. A regression analysis for each of the combined models was conducted to determine the relative effect of these variables on the target aspects of NOS. Findings from the multiple regression analyses revealed that each of the fourteen predictor variables was present in the best-fit model for at least 1 of the 7 target aspects of NOS. However, not all of the predictor variables were statistically significant (p < .007) in the models and their effect (beta) varied. Participants in the teacher education program who had higher ACT Math scores, completed more high school science credits, and were enrolled either in the Middle Childhood with a science concentration program major or in the Adolescent/Young Adult Science Education program major were more likely to have an informed understanding on each of the 7 target aspects of NOS. Analyses of the planning guides and the course syllabi in each teacher education program major revealed differences between the program majors that may account for the results.
Asano, Elio Fernando; Rasera, Irineu; Shiraga, Elisabete Cristina
2012-12-01
This is an exploratory analysis of potential variables associated with open Roux-en-Y gastric bypass (RYGB) surgery hospitalization resource use pattern. Cross-sectional study based on an administrative database (DATASUS) records. Inclusion criteria were adult patients undergoing RYGB between Jan/2008 and Jun/2011. Dependent variables were length of stay (LoS) and ICU need. Independent variables were: gender, age, region, hospital volume, surgery at certified center of excellence (CoE) by the Surgical Review Corporation (SRC), teaching hospital, and year of hospitalization. Univariate and multivariate analysis (logistic regression for ICU need and linear regression for length of stay) were performed. Data from 13,069 surgeries were analyzed. In crude analysis, hospital volume was the most impactful variable associated with log-transformed LoS (1.312 ± 0.302 high volume vs. 1.670 ± 0.581 low volume, p < 0.001), whereas for ICU need it was certified CoE (odds ratio (OR), 0.016; 95% confidence interval (CI), 0.010-0.026). After adjustment by logistic regression, certified CoE remained as the strongest predictor of ICU need (OR, 0.011; 95% CI, 0.007-0.018), followed by hospital volume (OR, 3.096; 95% CI, 2.861-3.350). Age group, male gender, and teaching hospital were also significantly associated (p < 0.001). For log-transformed LoS, final model includes hospital volume (coefficient, -0.223; 95% CI, -0.250 to -0.196) and teaching hospital (coefficient, 0.375; 95% CI, 0.351-0.398). Region of Brazil was not associated with any of the outcomes. High-volume hospital was the strongest predictor for shorter LoS, whereas SRC certification was the strongest predictor of lower ICU need. Public health policies targeting an increase of efficiency and patient access to the procedure should take into account these results.
Multivariate Regression Analysis of Winter Ozone Events in the Uinta Basin of Eastern Utah, USA
NASA Astrophysics Data System (ADS)
Mansfield, M. L.
2012-12-01
I report on a regression analysis of a number of variables that are involved in the formation of winter ozone in the Uinta Basin of Eastern Utah. One goal of the analysis is to develop a mathematical model capable of predicting the daily maximum ozone concentration from values of a number of independent variables. The dependent variable is the daily maximum ozone concentration at a particular site in the basin. Independent variables are (1) daily lapse rate, (2) daily "basin temperature" (defined below), (3) snow cover, (4) midday solar zenith angle, (5) monthly oil production, (6) monthly gas production, and (7) the number of days since the beginning of a multi-day inversion event. Daily maximum temperature and daily snow cover data are available at ten or fifteen different sites throughout the basin. The daily lapse rate is defined operationally as the slope of the linear least-squares fit to the temperature-altitude plot, and the "basin temperature" is defined as the value assumed by the same least-squares line at an altitude of 1400 m. A multi-day inversion event is defined as a set of consecutive days for which the lapse rate remains positive. The standard deviation in the accuracy of the model is about 10 ppb. The model has been combined with historical climate and oil & gas production data to estimate historical ozone levels.
do Prado, Mara Rúbia Maciel Cardoso; Oliveira, Fabiana de Cássia Carvalho; Assis, Karine Franklin; Ribeiro, Sarah Aparecida Vieira; do Prado, Pedro Paulo; Sant'Ana, Luciana Ferreira da Rocha; Priore, Silvia Eloiza; Franceschini, Sylvia do Carmo Castro
2015-01-01
Abstract Objective: To assess the prevalence of vitamin D deficiency and its associated factors in women and their newborns in the postpartum period. Methods: This cross-sectional study evaluated vitamin D deficiency/insufficiency in 226 women and their newborns in Viçosa (Minas Gerais, BR) between December 2011 and November 2012. Cord blood and venous maternal blood were collected to evaluate the following biochemical parameters: vitamin D, alkaline phosphatase, calcium, phosphorus and parathyroid hormone. Poisson regression analysis, with a confidence interval of 95%, was applied to assess vitamin D deficiency and its associated factors. Multiple linear regression analysis was performed to identify factors associated with 25(OH)D deficiency in the newborns and women from the study. The criteria for variable inclusion in the multiple linear regression model was the association with the dependent variable in the simple linear regression analysis, considering p<0.20. Significance level was α <5%. Results: From 226 women included, 200 (88.5%) were 20-44 years old; the median age was 28 years. Deficient/insufficient levels of vitamin D were found in 192 (85%) women and in 182 (80.5%) neonates. The maternal 25(OH)D and alkaline phosphatase levels were independently associated with vitamin D deficiency in infants. Conclusions: This study identified a high prevalence of vitamin D deficiency and insufficiency in women and newborns and the association between maternal nutritional status of vitamin D and their infants' vitamin D status. PMID:26100593
NASA Astrophysics Data System (ADS)
Kuchar, A.; Sacha, P.; Miksovsky, J.; Pisoft, P.
2015-06-01
This study focusses on the variability of temperature, ozone and circulation characteristics in the stratosphere and lower mesosphere with regard to the influence of the 11-year solar cycle. It is based on attribution analysis using multiple nonlinear techniques (support vector regression, neural networks) besides the multiple linear regression approach. The analysis was applied to several current reanalysis data sets for the 1979-2013 period, including MERRA, ERA-Interim and JRA-55, with the aim to compare how these types of data resolve especially the double-peaked solar response in temperature and ozone variables and the consequent changes induced by these anomalies. Equatorial temperature signals in the tropical stratosphere were found to be in qualitative agreement with previous attribution studies, although the agreement with observational results was incomplete, especially for JRA-55. The analysis also pointed to the solar signal in the ozone data sets (i.e. MERRA and ERA-Interim) not being consistent with the observed double-peaked ozone anomaly extracted from satellite measurements. The results obtained by linear regression were confirmed by the nonlinear approach through all data sets, suggesting that linear regression is a relevant tool to sufficiently resolve the solar signal in the middle atmosphere. The seasonal evolution of the solar response was also discussed in terms of dynamical causalities in the winter hemispheres. The hypothetical mechanism of a weaker Brewer-Dobson circulation at solar maxima was reviewed together with a discussion of polar vortex behaviour.
NASA Astrophysics Data System (ADS)
Usman; Ersti Yulika Sari, T.; Syaifuddin; Audina
2017-01-01
The regression and correlation technic was uses to evaluated the contribution of chlorophyll-a concentration on variation of longline skipjack tuna production. An analysis was performed by placing Chlorophyll-a as predictor and Skipjack (Katsuwonus pelamis, Linnaeus 1758) production as dependent variable, using Chlorophyll-a derived from NPP VIIRS, and CPUE derived from longline fisherman log books for the year of 2013. Chlorophyll-a distribution which derived from NPP VIIRS between 0.13-0.26 mg/m3 whereas maximum CPUE as much as 0,1875 kg/trip in April. The regression equation obtained was CPUE = -1.12 + 11.5 Chl-a. Correlation between chlorophyll-a and CPUE have moderate relationship (r=0.51). From regression equation for those variables showed that the variation of chlorophyll-a had affected about 26% on variation of CPUE, only.
An improved partial least-squares regression method for Raman spectroscopy
NASA Astrophysics Data System (ADS)
Momenpour Tehran Monfared, Ali; Anis, Hanan
2017-10-01
It is known that the performance of partial least-squares (PLS) regression analysis can be improved using the backward variable selection method (BVSPLS). In this paper, we further improve the BVSPLS based on a novel selection mechanism. The proposed method is based on sorting the weighted regression coefficients, and then the importance of each variable of the sorted list is evaluated using root mean square errors of prediction (RMSEP) criterion in each iteration step. Our Improved BVSPLS (IBVSPLS) method has been applied to leukemia and heparin data sets and led to an improvement in limit of detection of Raman biosensing ranged from 10% to 43% compared to PLS. Our IBVSPLS was also compared to the jack-knifing (simpler) and Genetic Algorithm (more complex) methods. Our method was consistently better than the jack-knifing method and showed either a similar or a better performance compared to the genetic algorithm.
Partial Least Squares Regression Models for the Analysis of Kinase Signaling.
Bourgeois, Danielle L; Kreeger, Pamela K
2017-01-01
Partial least squares regression (PLSR) is a data-driven modeling approach that can be used to analyze multivariate relationships between kinase networks and cellular decisions or patient outcomes. In PLSR, a linear model relating an X matrix of dependent variables and a Y matrix of independent variables is generated by extracting the factors with the strongest covariation. While the identified relationship is correlative, PLSR models can be used to generate quantitative predictions for new conditions or perturbations to the network, allowing for mechanisms to be identified. This chapter will provide a brief explanation of PLSR and provide an instructive example to demonstrate the use of PLSR to analyze kinase signaling.
VanEngelsdorp, Dennis; Speybroeck, Niko; Evans, Jay D; Nguyen, Bach Kim; Mullin, Chris; Frazier, Maryann; Frazier, Jim; Cox-Foster, Diana; Chen, Yanping; Tarpy, David R; Haubruge, Eric; Pettis, Jeffrey S; Saegerman, Claude
2010-10-01
Colony collapse disorder (CCD), a syndrome whose defining trait is the rapid loss of adult worker honey bees, Apis mellifera L., is thought to be responsible for a minority of the large overwintering losses experienced by U.S. beekeepers since the winter 2006-2007. Using the same data set developed to perform a monofactorial analysis (PloS ONE 4: e6481, 2009), we conducted a classification and regression tree (CART) analysis in an attempt to better understand the relative importance and interrelations among different risk variables in explaining CCD. Fifty-five exploratory variables were used to construct two CART models: one model with and one model without a cost of misclassifying a CCD-diagnosed colony as a non-CCD colony. The resulting model tree that permitted for misclassification had a sensitivity and specificity of 85 and 74%, respectively. Although factors measuring colony stress (e.g., adult bee physiological measures, such as fluctuating asymmetry or mass of head) were important discriminating values, six of the 19 variables having the greatest discriminatory value were pesticide levels in different hive matrices. Notably, coumaphos levels in brood (a miticide commonly used by beekeepers) had the highest discriminatory value and were highest in control (healthy) colonies. Our CART analysis provides evidence that CCD is probably the result of several factors acting in concert, making afflicted colonies more susceptible to disease. This analysis highlights several areas that warrant further attention, including the effect of sublethal pesticide exposure on pathogen prevalence and the role of variability in bee tolerance to pesticides on colony survivorship.
Effect of partition board color on mood and autonomic nervous function.
Sakuragi, Sokichi; Sugiyama, Yoshiki
2011-12-01
The purpose of this study was to evaluate the effects of the presence or absence (control) of a partition board and its color (red, yellow, blue) on subjective mood ratings and changes in autonomic nervous system indicators induced by a video game task. The increase in the mean Profile of Mood States (POMS) Fatigue score and mean Oppressive feeling rating after the task was lowest with the blue partition board. Multiple-regression analysis identified oppressive feeling and error scores on the second half of the task as statistically significant contributors to Fatigue. While explanatory variables were limited to the physiological indices, multiple-regression analysis identified a significant contribution of autonomic reactivity (assessed by heart rate variability) to Fatigue. These results suggest that a blue partition board would reduce task-induced subjective fatigue, in part by lowering the oppressive feeling of being enclosed during the task, possibly by increasing autonomic reactivity.
NASA Astrophysics Data System (ADS)
Zavala, Gabriel
This study aims to evaluate the relationship between oil income and the female labor force participation rate in California for the years of 1980, 1990, 2000 and 2010 using panel linear regression models. This study also aims to visualize the spatial patterns of both variables in California through Hot Spot analysis at the county level for the same years. The regression found no sign of a relationship between oil income and female labor force participation rate but did find evidence of a positive relationship between two income control variables and the female labor force participation rate. The hot spot analysis also found that female labor force participation cold spots are not spatially correlated with oil production hot spots. These findings contribute new methodologies at a finer scale to the very nuanced discussion of the resource curse in the United States.
High-level language ability in healthy individuals and its relationship with verbal working memory.
Antonsson, Malin; Longoni, Francesca; Einald, Christina; Hallberg, Lina; Kurt, Gabriella; Larsson, Kajsa; Nilsson, Tina; Hartelius, Lena
2016-01-01
The aims of the study were to investigate healthy subjects' performance on a clinical test of high-level language (HLL) and how it is related to demographic characteristics and verbal working memory (VWM). One hundred healthy subjects (20-79 years old) were assessed with the Swedish BeSS test (Laakso, Brunnegård, Hartelius, & Ahlsén, 2000) and two digit span tasks. Relationships between the demographic variables, VWM and BeSS were investigated both with bivariate correlations and multiple regression analysis. The results present the norms for BeSS. The correlations and multiple regression analysis show that demographic variables had limited influence on test performance. Measures of VWM were moderately related to total BeSS score and weakly to moderately correlated with five of the seven subtests. To conclude, education has an influence on the test as a whole but measures of VWM stood out as the most robust predictor of HLL.
NASA Astrophysics Data System (ADS)
Luna, Aderval S.; Gonzaga, Fabiano B.; da Rocha, Werickson F. C.; Lima, Igor C. A.
2018-01-01
Laser-induced breakdown spectroscopy (LIBS) analysis was carried out on eleven steel samples to quantify the concentrations of chromium, nickel, and manganese. LIBS spectral data were correlated to known concentrations of the samples using different strategies in partial least squares (PLS) regression models. For the PLS analysis, one predictive model was separately generated for each element, while different approaches were used for the selection of variables (VIP: variable importance in projection and iPLS: interval partial least squares) in the PLS model to quantify the contents of the elements. The comparison of the performance of the models showed that there was no significant statistical difference using the Wilcoxon signed rank test. The elliptical joint confidence region (EJCR) did not detect systematic errors in these proposed methodologies for each metal.
NASA Astrophysics Data System (ADS)
Szymanowski, Mariusz; Kryza, Maciej
2017-02-01
Our study examines the role of auxiliary variables in the process of spatial modelling and mapping of climatological elements, with air temperature in Poland used as an example. The multivariable algorithms are the most frequently applied for spatialization of air temperature, and their results in many studies are proved to be better in comparison to those obtained by various one-dimensional techniques. In most of the previous studies, two main strategies were used to perform multidimensional spatial interpolation of air temperature. First, it was accepted that all variables significantly correlated with air temperature should be incorporated into the model. Second, it was assumed that the more spatial variation of air temperature was deterministically explained, the better was the quality of spatial interpolation. The main goal of the paper was to examine both above-mentioned assumptions. The analysis was performed using data from 250 meteorological stations and for 69 air temperature cases aggregated on different levels: from daily means to 10-year annual mean. Two cases were considered for detailed analysis. The set of potential auxiliary variables covered 11 environmental predictors of air temperature. Another purpose of the study was to compare the results of interpolation given by various multivariable methods using the same set of explanatory variables. Two regression models: multiple linear (MLR) and geographically weighted (GWR) method, as well as their extensions to the regression-kriging form, MLRK and GWRK, respectively, were examined. Stepwise regression was used to select variables for the individual models and the cross-validation method was used to validate the results with a special attention paid to statistically significant improvement of the model using the mean absolute error (MAE) criterion. The main results of this study led to rejection of both assumptions considered. Usually, including more than two or three of the most significantly correlated auxiliary variables does not improve the quality of the spatial model. The effects of introduction of certain variables into the model were not climatologically justified and were seen on maps as unexpected and undesired artefacts. The results confirm, in accordance with previous studies, that in the case of air temperature distribution, the spatial process is non-stationary; thus, the local GWR model performs better than the global MLR if they are specified using the same set of auxiliary variables. If only GWR residuals are autocorrelated, the geographically weighted regression-kriging (GWRK) model seems to be optimal for air temperature spatial interpolation.
NASA Astrophysics Data System (ADS)
Pehnec, Gordana; Jakovljević, Ivana; Šišović, Anica; Bešlić, Ivan; Vađić, Vladimira
2016-04-01
Concentrations of ten polycyclic aromatic hydrocarbons (PAHs) in the PM10 particle fraction were measured together with ozone and meteorological parameters at an urban site (Zagreb, Croatia) over a one-year period. Data were subjected to regression analysis in order to determine the relationship between the measured pollutants and selected meteorological variables. All of the PAHs showed seasonal variations with high concentrations in winter and autumn and very low concentrations during summer and spring. All of the ten PAHs concentrations also correlated well with each other. A statistically significant negative correlation was found between the concentrations of PAHs and ozone concentrations and concentrations of PAHs and temperature, as well as a positive correlation between concentrations of PAHs and PM10 mass concentration and relative humidity. Multiple regression analysis showed that concentrations of PM10 and ozone, temperature, relative humidity and pressure accounted for 43-70% of PAHs variability. Concentrations of PM10 and temperature were significant variables for all of the measured PAH's concentrations in all seasons. Ozone concentrations were significant for only some of the PAHs, particularly 6-ring PAHs.
Yokoi, Masayuki; Tashiro, Takao
2014-01-01
We studied how the separation of dispensing and prescribing of medicines between pharmacies and clinics (the “separation system”) can reduce internal medicine costs. To do so, we obtained publicly available data by searching electronic databases and official web pages of the Japanese government and non-profit public service corporations on the Internet. For Japanese medical institutions, participation in the separation system is optional. Consequently, the expansion rate of the separation system for each of the administrative districts is highly variable. The data were subjected to multiple regression analysis; daily internal medicines were the objective variable and expansion rate of the separation system was the explanatory variable. A multiple regression analysis revealed that the expansion rate of the separation system and the rate of replacing brand name medicine with generic medicine showed a significant negative partial correlation with daily internal medicine costs. Thus, the separation system was as effective in reducing medicine costs as the use of generic medicines. Because of its medical economic efficiency, the separation system should be expanded, especially in Asian countries in which the system is underdeveloped. PMID:24999122
Can biomechanical variables predict improvement in crouch gait?
Hicks, Jennifer L.; Delp, Scott L.; Schwartz, Michael H.
2011-01-01
Many patients respond positively to treatments for crouch gait, yet surgical outcomes are inconsistent and unpredictable. In this study, we developed a multivariable regression model to determine if biomechanical variables and other subject characteristics measured during a physical exam and gait analysis can predict which subjects with crouch gait will demonstrate improved knee kinematics on a follow-up gait analysis. We formulated the model and tested its performance by retrospectively analyzing 353 limbs of subjects who walked with crouch gait. The regression model was able to predict which subjects would demonstrate ‘improved’ and ‘unimproved’ knee kinematics with over 70% accuracy, and was able to explain approximately 49% of the variance in subjects’ change in knee flexion between gait analyses. We found that improvement in stance phase knee flexion was positively associated with three variables that were drawn from knowledge about the biomechanical contributors to crouch gait: i) adequate hamstrings lengths and velocities, possibly achieved via hamstrings lengthening surgery, ii) normal tibial torsion, possibly achieved via tibial derotation osteotomy, and iii) sufficient muscle strength. PMID:21616666
Big Data Toolsets to Pharmacometrics: Application of Machine Learning for Time-to-Event Analysis.
Gong, Xiajing; Hu, Meng; Zhao, Liang
2018-05-01
Additional value can be potentially created by applying big data tools to address pharmacometric problems. The performances of machine learning (ML) methods and the Cox regression model were evaluated based on simulated time-to-event data synthesized under various preset scenarios, i.e., with linear vs. nonlinear and dependent vs. independent predictors in the proportional hazard function, or with high-dimensional data featured by a large number of predictor variables. Our results showed that ML-based methods outperformed the Cox model in prediction performance as assessed by concordance index and in identifying the preset influential variables for high-dimensional data. The prediction performances of ML-based methods are also less sensitive to data size and censoring rates than the Cox regression model. In conclusion, ML-based methods provide a powerful tool for time-to-event analysis, with a built-in capacity for high-dimensional data and better performance when the predictor variables assume nonlinear relationships in the hazard function. © 2018 The Authors. Clinical and Translational Science published by Wiley Periodicals, Inc. on behalf of American Society for Clinical Pharmacology and Therapeutics.
Myxomatosis in wild rabbit: design of control programs in Mediterranean ecosystems.
García-Bocanegra, Ignacio; Astorga, Rafael Jesús; Napp, Sebastián; Casal, Jordi; Huerta, Belén; Borge, Carmen; Arenas, Antonio
2010-01-01
A cross-sectional study was carried out in natural wild rabbit (Oryctolagus cuniculus) populations from southern Spain to identify risk factors associated to myxoma virus infection. Blood samples from 619 wild rabbits were collected, and questionnaires which included variables related to host, disease, game management and environment were completed. A logistic regression analysis was conducted to investigate the associations between myxomatosis seropositivity (dependent variable) across 7 hunting estates and an extensive set of explanatory variables obtained from the questionnaires. The prevalence of antibodies against myxomatosis virus was 56.4% (95% CI: 52.5-60.3) and ranged between 21.4% (95% CI: 9.0-33.8) and 70.2% (95% CI: 58.3-82.1) among the different sampling areas. The logistic regression analysis showed that autumn (OR 9.0), high abundance of mosquitoes (OR 8.2), reproductive activity (OR 4.1), warren's insecticide treatment (OR 3.7), rabbit haemorrhagic disease (RHD) seropositivity (OR 2.6), high hunting pressure (OR 6.3) and sheep presence (OR 6.4) were associated with seropositivity to myxomatosis. Based on the results, diverse management measures for myxomatosis control are proposed.
Yokoi, Masayuki; Tashiro, Takao
2014-04-07
We studied how the separation of dispensing and prescribing of medicines between pharmacies and clinics (the "separation system") can reduce internal medicine costs. To do so, we obtained publicly available data by searching electronic databases and official web pages of the Japanese government and non-profit public service corporations on the Internet. For Japanese medical institutions, participation in the separation system is optional. Consequently, the expansion rate of the separation system for each of the administrative districts is highly variable. The data were subjected to multiple regression analysis; daily internal medicines were the objective variable and expansion rate of the separation system was the explanatory variable. A multiple regression analysis revealed that the expansion rate of the separation system and the rate of replacing brand name medicine with generic medicine showed a significant negative partial correlation with daily internal medicine costs. Thus, the separation system was as effective in reducing medicine costs as the use of generic medicines. Because of its medical economic efficiency, the separation system should be expanded, especially in Asian countries in which the system is underdeveloped.
Nelson, Jon P
2014-01-01
Precise estimates of price elasticities are important for alcohol tax policy. Using meta-analysis, this paper corrects average beer elasticities for heterogeneity, dependence, and publication selection bias. A sample of 191 estimates is obtained from 114 primary studies. Simple and weighted means are reported. Dependence is addressed by restricting number of estimates per study, author-restricted samples, and author-specific variables. Publication bias is addressed using funnel graph, trim-and-fill, and Egger's intercept model. Heterogeneity and selection bias are examined jointly in meta-regressions containing moderator variables for econometric methodology, primary data, and precision of estimates. Results for fixed- and random-effects regressions are reported. Country-specific effects and sample time periods are unimportant, but several methodology variables help explain the dispersion of estimates. In models that correct for selection bias and heterogeneity, the average beer price elasticity is about -0.20, which is less elastic by 50% compared to values commonly used in alcohol tax policy simulations. Copyright © 2013 Elsevier B.V. All rights reserved.
Curran, Christopher A.; Eng, Ken; Konrad, Christopher P.
2012-01-01
Regional low-flow regression models for estimating Q7,10 at ungaged stream sites are developed from the records of daily discharge at 65 continuous gaging stations (including 22 discontinued gaging stations) for the purpose of evaluating explanatory variables. By incorporating the base-flow recession time constant τ as an explanatory variable in the regression model, the root-mean square error for estimating Q7,10 at ungaged sites can be lowered to 72 percent (for known values of τ), which is 42 percent less than if only basin area and mean annual precipitation are used as explanatory variables. If partial-record sites are included in the regression data set, τ must be estimated from pairs of discharge measurements made during continuous periods of declining low flows. Eight measurement pairs are optimal for estimating τ at partial-record sites, and result in a lowering of the root-mean square error by 25 percent. A low-flow survey strategy that includes paired measurements at partial-record sites requires additional effort and planning beyond a standard strategy, but could be used to enhance regional estimates of τ and potentially reduce the error of regional regression models for estimating low-flow characteristics at ungaged sites.
NASA Astrophysics Data System (ADS)
Zhai, Liang; Li, Shuang; Zou, Bin; Sang, Huiyong; Fang, Xin; Xu, Shan
2018-05-01
Considering the spatial non-stationary contributions of environment variables to PM2.5 variations, the geographically weighted regression (GWR) modeling method has been using to estimate PM2.5 concentrations widely. However, most of the GWR models in reported studies so far were established based on the screened predictors through pretreatment correlation analysis, and this process might cause the omissions of factors really driving PM2.5 variations. This study therefore developed a best subsets regression (BSR) enhanced principal component analysis-GWR (PCA-GWR) modeling approach to estimate PM2.5 concentration by fully considering all the potential variables' contributions simultaneously. The performance comparison experiment between PCA-GWR and regular GWR was conducted in the Beijing-Tianjin-Hebei (BTH) region over a one-year-period. Results indicated that the PCA-GWR modeling outperforms the regular GWR modeling with obvious higher model fitting- and cross-validation based adjusted R2 and lower RMSE. Meanwhile, the distribution map of PM2.5 concentration from PCA-GWR modeling also clearly depicts more spatial variation details in contrast to the one from regular GWR modeling. It can be concluded that the BSR enhanced PCA-GWR modeling could be a reliable way for effective air pollution concentration estimation in the coming future by involving all the potential predictor variables' contributions to PM2.5 variations.
Kandasamy, Jegen; Roane, Claire; Szalai, Alexander; Ambalavanan, Namasivayam
2015-11-01
Early systemic inflammation in extremely-low-birth-weight (ELBW) infants is associated with an increased risk of bronchopulmonary dysplasia (BPD). Our objective was to identify circulating biomarkers and develop prediction models for BPD/death soon after birth. Blood samples from postnatal day 1 were analyzed for C-reactive protein (CRP) by enzyme-linked immunosorbent assay and for 39 cytokines/chemokines by a multiplex assay in 152 ELBW infants. The primary outcome was physiologic BPD or death by 36 wk. CRP, cytokines, and clinical variables available at ≤24 h were used for forward stepwise regression and Classification and Regression Tree (CART) analysis to identify predictors of BPD/death. Overall, 24% developed BPD and 35% died or developed BPD. Regression analysis identified birth weight and eotaxin (CCL11) as the two most significant variables. CART identified FiO2 at 24 h (11% BPD/death if FiO2 ≤28%, 49% if >28%) and eotaxin in infants with FiO2 > 28% (29% BPD/death if eotaxin was ≤84 pg/ml; 65% if >84) as variables most associated with outcome. Eotaxin measured on the day of birth is useful for identifying ELBW infants at risk of BPD/death. Further investigation is required to determine if eotaxin is involved in lung injury and pathogenesis of BPD.
Psychosocial factors and financial literacy.
Murphy, John L
2013-01-01
This study uses data from the Health and Retirement Study (HRS) to analyze the psychological and social variables associated with financial literacy. The HRS is a nationally representative longitudinal survey of individuals older than age 50 and their spouses. An ordinary least squares linear regression analysis explores the relationship between financial literacy and several economic and psychosocial variables. After controlling for earnings, level of education, and other socioeconomic variables in this exploratory study, I find that financial satisfaction and religiosity are correlated with financial literacy.
Christoper J. Schmitt; A. Dennis Lemly; Parley V. Winger
1993-01-01
Data from several sources were collated and analyzed by correlation, regression, and principal components analysis to define surrrogate variables for use in the brook trout (Salvelinus fontinalis) habitat suitability index (HSI) model, and to evaluate the applicability of the model for assessing habitat in high elevation streams of the southern Blue Ridge Province (...
The Influence of the Student Mobility Rate on the Graduation Rate in the State of New Jersey
ERIC Educational Resources Information Center
Ross, Lavetta S.
2016-01-01
This study examined the influence of the student mobility rate on the high school graduation rate of schools in the state of New Jersey. Variables found to have an influence on the graduation rate in the extant literature were evaluated and reported. The analysis included multiple and hierarchical regression models for school variables (i.e.,…
Causal diagrams and multivariate analysis II: precision work.
Jupiter, Daniel C
2014-01-01
In this Investigators' Corner, I continue my discussion of when and why we researchers should include variables in multivariate regression. My examination focuses on studies comparing treatment groups and situations for which we can either exclude variables from multivariate analyses or include them for reasons of precision. Copyright © 2014 American College of Foot and Ankle Surgeons. Published by Elsevier Inc. All rights reserved.
Pressures, Stresses, Anxieties, and On-Job Safety of the School Superintendent.
ERIC Educational Resources Information Center
Chand, Krishan
Identification of the causes of job stress for public school superintendents, with a focus on personal-experiential and task variables, is the purpose of this study. Methodology involved a mail survey of 1,531 randomly selected superintendents. Canonical correlation analysis (CCA) and multiple regression correlation (MCR) analysis were used to…
ERIC Educational Resources Information Center
Shieh, Gwowen
2006-01-01
This paper considers the problem of analysis of correlation coefficients from a multivariate normal population. A unified theorem is derived for the regression model with normally distributed explanatory variables and the general results are employed to provide useful expressions for the distributions of simple, multiple, and partial-multiple…
Innovating patient care delivery: DSRIP's interrupted time series analysis paradigm.
Shenoy, Amrita G; Begley, Charles E; Revere, Lee; Linder, Stephen H; Daiger, Stephen P
2017-12-08
Adoption of Medicaid Section 1115 waiver is one of the many ways of innovating healthcare delivery system. The Delivery System Reform Incentive Payment (DSRIP) pool, one of the two funding pools of the waiver has four categories viz. infrastructure development, program innovation and redesign, quality improvement reporting and lastly, bringing about population health improvement. A metric of the fourth category, preventable hospitalization (PH) rate was analyzed in the context of eight conditions for two time periods, pre-reporting years (2010-2012) and post-reporting years (2013-2015) for two hospital cohorts, DSRIP participating and non-participating hospitals. The study explains how DSRIP impacted Preventable Hospitalization (PH) rates of eight conditions for both hospital cohorts within two time periods. Eight PH rates were regressed as the dependent variable with time, intervention and post-DSRIP Intervention as independent variables. PH rates of eight conditions were then consolidated into one rate for regressing with the above independent variables to evaluate overall impact of DSRIP. An interrupted time series regression was performed after accounting for auto-correlation, stationarity and seasonality in the dataset. In the individual regression model, PH rates showed statistically significant coefficients for seven out of eight conditions in DSRIP participating hospitals. In the combined regression model, the coefficient of the PH rate showed a statistically significant decrease with negative p-values for regression coefficients in DSRIP participating hospitals compared to positive/increased p-values for regression coefficients in DSRIP non-participating hospitals. Several macro- and micro-level factors may have likely contributed DSRIP hospitals outperforming DSRIP non-participating hospitals. Healthcare organization/provider collaboration, support from healthcare professionals, DSRIP's design, state reimbursement and coordination in care delivery methods may have led to likely success of DSRIP. IV, a retrospective cohort study based on longitudinal data. Copyright © 2017 Elsevier Inc. All rights reserved.
Galloway, Joel M.
2014-01-01
The Red River of the North (hereafter referred to as “Red River”) Basin is an important hydrologic region where water is a valuable resource for the region’s economy. Continuous water-quality monitors have been operated by the U.S. Geological Survey, in cooperation with the North Dakota Department of Health, Minnesota Pollution Control Agency, City of Fargo, City of Moorhead, City of Grand Forks, and City of East Grand Forks at the Red River at Fargo, North Dakota, from 2003 through 2012 and at Grand Forks, N.Dak., from 2007 through 2012. The purpose of the monitoring was to provide a better understanding of the water-quality dynamics of the Red River and provide a way to track changes in water quality. Regression equations were developed that can be used to estimate concentrations and loads for dissolved solids, sulfate, chloride, nitrate plus nitrite, total phosphorus, and suspended sediment using explanatory variables such as streamflow, specific conductance, and turbidity. Specific conductance was determined to be a significant explanatory variable for estimating dissolved solids concentrations at the Red River at Fargo and Grand Forks. The regression equations provided good relations between dissolved solid concentrations and specific conductance for the Red River at Fargo and at Grand Forks, with adjusted coefficients of determination of 0.99 and 0.98, respectively. Specific conductance, log-transformed streamflow, and a seasonal component were statistically significant explanatory variables for estimating sulfate in the Red River at Fargo and Grand Forks. Regression equations provided good relations between sulfate concentrations and the explanatory variables, with adjusted coefficients of determination of 0.94 and 0.89, respectively. For the Red River at Fargo and Grand Forks, specific conductance, streamflow, and a seasonal component were statistically significant explanatory variables for estimating chloride. For the Red River at Grand Forks, a time component also was a statistically significant explanatory variable for estimating chloride. The regression equations for chloride at the Red River at Fargo provided a fair relation between chloride concentrations and the explanatory variables, with an adjusted coefficient of determination of 0.66 and the equation for the Red River at Grand Forks provided a relatively good relation between chloride concentrations and the explanatory variables, with an adjusted coefficient of determination of 0.77. Turbidity and streamflow were statistically significant explanatory variables for estimating nitrate plus nitrite concentrations at the Red River at Fargo and turbidity was the only statistically significant explanatory variable for estimating nitrate plus nitrite concentrations at Grand Forks. The regression equation for the Red River at Fargo provided a relatively poor relation between nitrate plus nitrite concentrations, turbidity, and streamflow, with an adjusted coefficient of determination of 0.46. The regression equation for the Red River at Grand Forks provided a fair relation between nitrate plus nitrite concentrations and turbidity, with an adjusted coefficient of determination of 0.73. Some of the variability that was not explained by the equations might be attributed to different sources contributing nitrates to the stream at different times. Turbidity, streamflow, and a seasonal component were statistically significant explanatory variables for estimating total phosphorus at the Red River at Fargo and Grand Forks. The regression equation for the Red River at Fargo provided a relatively fair relation between total phosphorus concentrations, turbidity, streamflow, and season, with an adjusted coefficient of determination of 0.74. The regression equation for the Red River at Grand Forks provided a good relation between total phosphorus concentrations, turbidity, streamflow, and season, with an adjusted coefficient of determination of 0.87. For the Red River at Fargo, turbidity and streamflow were statistically significant explanatory variables for estimating suspended-sediment concentrations. For the Red River at Grand Forks, turbidity was the only statistically significant explanatory variable for estimating suspended-sediment concentration. The regression equation at the Red River at Fargo provided a good relation between suspended-sediment concentration, turbidity, and streamflow, with an adjusted coefficient of determination of 0.95. The regression equation for the Red River at Grand Forks provided a good relation between suspended-sediment concentration and turbidity, with an adjusted coefficient of determination of 0.96.
Predictors of condom use and refusal among the population of Free State province in South Africa
2012-01-01
Background This study investigated the extent and predictors of condom use and condom refusal in the Free State province in South Africa. Methods Through a household survey conducted in the Free Sate province of South Africa, 5,837 adults were interviewed. Univariate and multivariate survey logistic regressions and classification trees (CT) were used for analysing two response variables ‘ever used condom’ and ‘ever refused condom’. Results Eighty-three per cent of the respondents had ever used condoms, of which 38% always used them; 61% used them during the last sexual intercourse and 9% had ever refused to use them. The univariate logistic regression models and CT analysis indicated that a strong predictor of condom use was its perceived need. In the CT analysis, this variable was followed in importance by ‘knowledge of correct use of condom’, condom availability, young age, being single and higher education. ‘Perceived need’ for condoms did not remain significant in the multivariate analysis after controlling for other variables. The strongest predictor of condom refusal, as shown by the CT, was shame associated with condoms followed by the presence of sexual risk behaviour, knowing one’s HIV status, older age and lacking knowledge of condoms (i.e., ability to prevent sexually transmitted diseases and pregnancy, availability, correct and consistent use and existence of female condoms). In the multivariate logistic regression, age was not significant for condom refusal while affordability and perceived need were additional significant variables. Conclusions The use of complementary modelling techniques such as CT in addition to logistic regressions adds to a better understanding of condom use and refusal. Further improvement in correct and consistent use of condoms will require targeted interventions. In addition to existing social marketing campaigns, tailored approaches should focus on establishing the perceived need for condom-use and improving skills for correct use. They should also incorporate interventions to reduce the shame associated with condoms and individual counselling of those likely to refuse condoms. PMID:22639964
Wenzel, Tom
2013-10-01
The National Highway Traffic Safety Administration (NHTSA) recently updated its 2003 and 2010 logistic regression analyses of the effect of a reduction in light-duty vehicle mass on US societal fatality risk per vehicle mile traveled (VMT; Kahane, 2012). Societal fatality risk includes the risk to both the occupants of the case vehicle as well as any crash partner or pedestrians. The current analysis is the most thorough investigation of this issue to date. This paper replicates the Kahane analysis and extends it by testing the sensitivity of his results to changes in the definition of risk, and the data and control variables used in the regression models. An assessment by Lawrence Berkeley National Laboratory (LBNL) indicates that the estimated effect of mass reduction on risk is smaller than in Kahane's previous studies, and is statistically non-significant for all but the lightest cars (Wenzel, 2012a). The estimated effects of a reduction in mass or footprint (i.e. wheelbase times track width) are small relative to other vehicle, driver, and crash variables used in the regression models. The recent historical correlation between mass and footprint is not so large to prohibit including both variables in the same regression model; excluding footprint from the model, i.e. allowing footprint to decrease with mass, increases the estimated detrimental effect of mass reduction on risk in cars and crossover utility vehicles (CUVs)/minivans, but has virtually no effect on light trucks. Analysis by footprint deciles indicates that risk does not consistently increase with reduced mass for vehicles of similar footprint. Finally, the estimated effects of mass and footprint reduction are sensitive to the measure of exposure used (fatalities per induced exposure crash, rather than per VMT), as well as other changes in the data or control variables used. It appears that the safety penalty from lower mass can be mitigated with careful vehicle design, and that manufacturers can reduce mass as a strategy to increase their vehicles' fuel economy and reduce greenhouse gas emissions without necessarily compromising societal safety. Published by Elsevier Ltd.
NASA Astrophysics Data System (ADS)
Polat, Esra; Gunay, Suleyman
2013-10-01
One of the problems encountered in Multiple Linear Regression (MLR) is multicollinearity, which causes the overestimation of the regression parameters and increase of the variance of these parameters. Hence, in case of multicollinearity presents, biased estimation procedures such as classical Principal Component Regression (CPCR) and Partial Least Squares Regression (PLSR) are then performed. SIMPLS algorithm is the leading PLSR algorithm because of its speed, efficiency and results are easier to interpret. However, both of the CPCR and SIMPLS yield very unreliable results when the data set contains outlying observations. Therefore, Hubert and Vanden Branden (2003) have been presented a robust PCR (RPCR) method and a robust PLSR (RPLSR) method called RSIMPLS. In RPCR, firstly, a robust Principal Component Analysis (PCA) method for high-dimensional data on the independent variables is applied, then, the dependent variables are regressed on the scores using a robust regression method. RSIMPLS has been constructed from a robust covariance matrix for high-dimensional data and robust linear regression. The purpose of this study is to show the usage of RPCR and RSIMPLS methods on an econometric data set, hence, making a comparison of two methods on an inflation model of Turkey. The considered methods have been compared in terms of predictive ability and goodness of fit by using a robust Root Mean Squared Error of Cross-validation (R-RMSECV), a robust R2 value and Robust Component Selection (RCS) statistic.
Simms, Laura E.; Engebretson, Mark J.; Pilipenko, Viacheslav; ...
2016-04-07
The daily maximum relativistic electron flux at geostationary orbit can be predicted well with a set of daily averaged predictor variables including previous day's flux, seed electron flux, solar wind velocity and number density, AE index, IMF Bz, Dst, and ULF and VLF wave power. As predictor variables are intercorrelated, we used multiple regression analyses to determine which are the most predictive of flux when other variables are controlled. Empirical models produced from regressions of flux on measured predictors from 1 day previous were reasonably effective at predicting novel observations. Adding previous flux to the parameter set improves the predictionmore » of the peak of the increases but delays its anticipation of an event. Previous day's solar wind number density and velocity, AE index, and ULF wave activity are the most significant explanatory variables; however, the AE index, measuring substorm processes, shows a negative correlation with flux when other parameters are controlled. This may be due to the triggering of electromagnetic ion cyclotron waves by substorms that cause electron precipitation. VLF waves show lower, but significant, influence. The combined effect of ULF and VLF waves shows a synergistic interaction, where each increases the influence of the other on flux enhancement. Correlations between observations and predictions for this 1 day lag model ranged from 0.71 to 0.89 (average: 0.78). Furthermore, a path analysis of correlations between predictors suggests that solar wind and IMF parameters affect flux through intermediate processes such as ring current ( Dst), AE, and wave activity.« less
DOE Office of Scientific and Technical Information (OSTI.GOV)
Simms, Laura E.; Engebretson, Mark J.; Pilipenko, Viacheslav
The daily maximum relativistic electron flux at geostationary orbit can be predicted well with a set of daily averaged predictor variables including previous day's flux, seed electron flux, solar wind velocity and number density, AE index, IMF Bz, Dst, and ULF and VLF wave power. As predictor variables are intercorrelated, we used multiple regression analyses to determine which are the most predictive of flux when other variables are controlled. Empirical models produced from regressions of flux on measured predictors from 1 day previous were reasonably effective at predicting novel observations. Adding previous flux to the parameter set improves the predictionmore » of the peak of the increases but delays its anticipation of an event. Previous day's solar wind number density and velocity, AE index, and ULF wave activity are the most significant explanatory variables; however, the AE index, measuring substorm processes, shows a negative correlation with flux when other parameters are controlled. This may be due to the triggering of electromagnetic ion cyclotron waves by substorms that cause electron precipitation. VLF waves show lower, but significant, influence. The combined effect of ULF and VLF waves shows a synergistic interaction, where each increases the influence of the other on flux enhancement. Correlations between observations and predictions for this 1 day lag model ranged from 0.71 to 0.89 (average: 0.78). Furthermore, a path analysis of correlations between predictors suggests that solar wind and IMF parameters affect flux through intermediate processes such as ring current ( Dst), AE, and wave activity.« less
Assessing LULC changes over Chilika Lake watershed in Eastern India using Driving Force Analysis
NASA Astrophysics Data System (ADS)
Jadav, S.; Syed, T. H.
2017-12-01
Rapid population growth and industrial development has brought about significant changes in Land Use Land Cover (LULC) of many developing countries in the world. This study investigates LULC changes in the Chilika Lake watershed of Eastern India for the period of 1988 to 2016. The methodology involves pre-processing and classification of Landsat satellite images using support vector machine (SVM) supervised classification algorithm. Results reveal that `Cropland', `Emergent Vegetation' and `Settlement' has expanded over the study period by 284.61 km², 106.83 km² and 98.83 km² respectively. Contemporaneously, `Lake Area', `Vegetation' and `Scrub Land' have decreased by 121.62 km², 96.05 km² and 80.29 km² respectively. This study also analyzes five major driving force variables of socio-economic and climatological factors triggering LULC changes through a bivariate logistic regression model. The outcome gives credible relative operating characteristics (ROC) value of 0.76 that indicate goodness fit of logistic regression model. In addition, independent variables like distance to drainage network and average annual rainfall have negative regression coefficient values that represent decreased rate of dependent variable (changed LULC) whereas independent variables (population density, distance to road and distance to railway) have positive regression coefficient indicates increased rate of changed LULC . Results from this study will be crucial for planning and restoration of this vital lake water body that has major implications over the society and environment at large.
Ebrahimzadeh, Farzad; Hajizadeh, Ebrahim; Vahabi, Nasim; Almasian, Mohammad; Bakhteyar, Katayoon
2015-01-01
Background: Unwanted pregnancy not intended by at least one of the parents has undesirable consequences for the family and the society. In the present study, three classification models were used and compared to predict unwanted pregnancies in an urban population. Methods: In this cross-sectional study, 887 pregnant mothers referring to health centers in Khorramabad, Iran, in 2012 were selected by the stratified and cluster sampling; relevant variables were measured and for prediction of unwanted pregnancy, logistic regression, discriminant analysis, and probit regression models and SPSS software version 21 were used. To compare these models, indicators such as sensitivity, specificity, the area under the ROC curve, and the percentage of correct predictions were used. Results: The prevalence of unwanted pregnancies was 25.3%. The logistic and probit regression models indicated that parity and pregnancy spacing, contraceptive methods, household income and number of living male children were related to unwanted pregnancy. The performance of the models based on the area under the ROC curve was 0.735, 0.733, and 0.680 for logistic regression, probit regression, and linear discriminant analysis, respectively. Conclusion: Given the relatively high prevalence of unwanted pregnancies in Khorramabad, it seems necessary to revise family planning programs. Despite the similar accuracy of the models, if the researcher is interested in the interpretability of the results, the use of the logistic regression model is recommended. PMID:26793655
Ebrahimzadeh, Farzad; Hajizadeh, Ebrahim; Vahabi, Nasim; Almasian, Mohammad; Bakhteyar, Katayoon
2015-01-01
Unwanted pregnancy not intended by at least one of the parents has undesirable consequences for the family and the society. In the present study, three classification models were used and compared to predict unwanted pregnancies in an urban population. In this cross-sectional study, 887 pregnant mothers referring to health centers in Khorramabad, Iran, in 2012 were selected by the stratified and cluster sampling; relevant variables were measured and for prediction of unwanted pregnancy, logistic regression, discriminant analysis, and probit regression models and SPSS software version 21 were used. To compare these models, indicators such as sensitivity, specificity, the area under the ROC curve, and the percentage of correct predictions were used. The prevalence of unwanted pregnancies was 25.3%. The logistic and probit regression models indicated that parity and pregnancy spacing, contraceptive methods, household income and number of living male children were related to unwanted pregnancy. The performance of the models based on the area under the ROC curve was 0.735, 0.733, and 0.680 for logistic regression, probit regression, and linear discriminant analysis, respectively. Given the relatively high prevalence of unwanted pregnancies in Khorramabad, it seems necessary to revise family planning programs. Despite the similar accuracy of the models, if the researcher is interested in the interpretability of the results, the use of the logistic regression model is recommended.
Yang, Shun-hua; Zhang, Hai-tao; Guo, Long; Ren, Yan
2015-06-01
Relative elevation and stream power index were selected as auxiliary variables based on correlation analysis for mapping soil organic matter. Geographically weighted regression Kriging (GWRK) and regression Kriging (RK) were used for spatial interpolation of soil organic matter and compared with ordinary Kriging (OK), which acts as a control. The results indicated that soil or- ganic matter was significantly positively correlated with relative elevation whilst it had a significantly negative correlation with stream power index. Semivariance analysis showed that both soil organic matter content and its residuals (including ordinary least square regression residual and GWR resi- dual) had strong spatial autocorrelation. Interpolation accuracies by different methods were esti- mated based on a data set of 98 validation samples. Results showed that the mean error (ME), mean absolute error (MAE) and root mean square error (RMSE) of RK were respectively 39.2%, 17.7% and 20.6% lower than the corresponding values of OK, with a relative-improvement (RI) of 20.63. GWRK showed a similar tendency, having its ME, MAE and RMSE to be respectively 60.6%, 23.7% and 27.6% lower than those of OK, with a RI of 59.79. Therefore, both RK and GWRK significantly improved the accuracy of OK interpolation of soil organic matter due to their in- corporation of auxiliary variables. In addition, GWRK performed obviously better than RK did in this study, and its improved performance should be attributed to the consideration of sample spatial locations.
NASA Astrophysics Data System (ADS)
Lin, M.; Yang, Z.; Park, H.; Qian, S.; Chen, J.; Fan, P.
2017-12-01
Impervious surface area (ISA) has become an important indicator for studying urban environments, but mapping ISA at the regional or global scale is still challenging due to the complexity of impervious surface features. The Defense Meteorological Satellite Program's Operational Linescan System (DMSP-OLS) nighttime light data is (NTL) and Resolution Imaging Spectroradiometer (MODIS) are the major remote sensing data source for regional ISA mapping. A single regression relationship between fractional ISA and NTL or various index derived based on NTL and MODIS vegetation index (NDVI) data was established in many previous studies for regional ISA mapping. However, due to the varying geographical, climatic, and socio-economic characteristics of different cities, the same regression relationship may vary significantly across different cities in the same region in terms of both fitting performance (i.e. R2) and the rate of change (Slope). In this study, we examined the regression relationship between fractional ISA and Vegetation Adjusted Nighttime light Urban Index (VANUI) for 120 randomly selected cities around the world with a multilevel regression model. We found that indeed there is substantial variability of both the R2 (0.68±0.29) and slopes (0.64±0.40) among individual regressions, which suggests that multilevel/hierarchical models are needed for accuracy improvement of future regional ISA mapping .Further analysis also let us find the this substantial variability are affected by climate conditions, socio-economic status, and urban spatial structures. However, all these effects are nonlinear rather than linear, thus could not modeled explicitly in multilevel linear regression models.
Selecting minimum dataset soil variables using PLSR as a regressive multivariate method
NASA Astrophysics Data System (ADS)
Stellacci, Anna Maria; Armenise, Elena; Castellini, Mirko; Rossi, Roberta; Vitti, Carolina; Leogrande, Rita; De Benedetto, Daniela; Ferrara, Rossana M.; Vivaldi, Gaetano A.
2017-04-01
Long-term field experiments and science-based tools that characterize soil status (namely the soil quality indices, SQIs) assume a strategic role in assessing the effect of agronomic techniques and thus in improving soil management especially in marginal environments. Selecting key soil variables able to best represent soil status is a critical step for the calculation of SQIs. Current studies show the effectiveness of statistical methods for variable selection to extract relevant information deriving from multivariate datasets. Principal component analysis (PCA) has been mainly used, however supervised multivariate methods and regressive techniques are progressively being evaluated (Armenise et al., 2013; de Paul Obade et al., 2016; Pulido Moncada et al., 2014). The present study explores the effectiveness of partial least square regression (PLSR) in selecting critical soil variables, using a dataset comparing conventional tillage and sod-seeding on durum wheat. The results were compared to those obtained using PCA and stepwise discriminant analysis (SDA). The soil data derived from a long-term field experiment in Southern Italy. On samples collected in April 2015, the following set of variables was quantified: (i) chemical: total organic carbon and nitrogen (TOC and TN), alkali-extractable C (TEC and humic substances - HA-FA), water extractable N and organic C (WEN and WEOC), Olsen extractable P, exchangeable cations, pH and EC; (ii) physical: texture, dry bulk density (BD), macroporosity (Pmac), air capacity (AC), and relative field capacity (RFC); (iii) biological: carbon of the microbial biomass quantified with the fumigation-extraction method. PCA and SDA were previously applied to the multivariate dataset (Stellacci et al., 2016). PLSR was carried out on mean centered and variance scaled data of predictors (soil variables) and response (wheat yield) variables using the PLS procedure of SAS/STAT. In addition, variable importance for projection (VIP) statistics was used to quantitatively assess the predictors most relevant for response variable estimation and then for variable selection (Andersen and Bro, 2010). PCA and SDA returned TOC and RFC as influential variables both on the set of chemical and physical data analyzed separately as well as on the whole dataset (Stellacci et al., 2016). Highly weighted variables in PCA were also TEC, followed by K, and AC, followed by Pmac and BD, in the first PC (41.2% of total variance); Olsen P and HA-FA in the second PC (12.6%), Ca in the third (10.6%) component. Variables enabling maximum discrimination among treatments for SDA were WEOC, on the whole dataset, humic substances, followed by Olsen P, EC and clay, in the separate data analyses. The highest PLS-VIP statistics were recorded for Olsen P and Pmac, followed by TOC, TEC, pH and Mg for chemical variables and clay, RFC and AC for the physical variables. Results show that different methods may provide different ranking of the selected variables and the presence of a response variable, in regressive techniques, may affect variable selection. Further investigation with different response variables and with multi-year datasets would allow to better define advantages and limits of single or combined approaches. Acknowledgment The work was supported by the projects "BIOTILLAGE, approcci innovative per il miglioramento delle performances ambientali e produttive dei sistemi cerealicoli no-tillage", financed by PSR-Basilicata 2007-2013, and "DESERT, Low-cost water desalination and sensor technology compact module" financed by ERANET-WATERWORKS 2014. References Andersen C.M. and Bro R., 2010. Variable selection in regression - a tutorial. Journal of Chemometrics, 24 728-737. Armenise et al., 2013. Developing a soil quality index to compare soil fitness for agricultural use under different managements in the mediterranean environment. Soil and Tillage Research, 130:91-98. de Paul Obade et al., 2016. A standardized soil quality index for diverse field conditions. Sci. Total Env. 541:424-434. Pulido Moncada et al., 2014. Data-driven analysis of soil quality indicators using limited data. Geoderma, 235:271-278. Stellacci et al., 2016. Comparison of different multivariate methods to select key soil variables for soil quality indices computation. XLV Congress of the Italian Society of Agronomy (SIA), Sassari, 20-22 September 2016.
Ghasemi, Jahan B; Safavi-Sohi, Reihaneh; Barbosa, Euzébio G
2012-02-01
A quasi 4D-QSAR has been carried out on a series of potent Gram-negative LpxC inhibitors. This approach makes use of the molecular dynamics (MD) trajectories and topology information retrieved from the GROMACS package. This new methodology is based on the generation of a conformational ensemble profile, CEP, for each compound instead of only one conformation, followed by the calculation intermolecular interaction energies at each grid point considering probes and all aligned conformations resulting from MD simulations. These interaction energies are independent variables employed in a QSAR analysis. The comparison of the proposed methodology to comparative molecular field analysis (CoMFA) formalism was performed. This methodology explores jointly the main features of CoMFA and 4D-QSAR models. Step-wise multiple linear regression was used for the selection of the most informative variables. After variable selection, multiple linear regression (MLR) and partial least squares (PLS) methods used for building the regression models. Leave-N-out cross-validation (LNO), and Y-randomization were performed in order to confirm the robustness of the model in addition to analysis of the independent test set. Best models provided the following statistics: [Formula in text] (PLS) and [Formula in text] (MLR). Docking study was applied to investigate the major interactions in protein-ligand complex with CDOCKER algorithm. Visualization of the descriptors of the best model helps us to interpret the model from the chemical point of view, supporting the applicability of this new approach in rational drug design.
Rahman, Abdul; Perri, Andrea; Deegan, Avril; Kuntz, Jennifer; Cawthorpe, David
2018-01-01
There is a movement toward trauma-informed, trauma-focused psychiatric treatment. To examine Adverse Childhood Experiences (ACE) survey items by sex and by total scores by sex vs clinical measures of impairment to examine the clinical utility of the ACE survey as an index of trauma in a child and adolescent mental health care setting. Descriptive, polychoric factor analysis and regression analyses were employed to analyze cross-sectional ACE surveys (N = 2833) and registration-linked data using past admissions (N = 10,400) collected from November 2016 to March 2017 related to clinical data (28 independent variables), taking into account multicollinearity. Distinct ACE items emerged for males, females, and those with self-identified sex and for ACE total scores in regression analysis. In hierarchical regression analysis, the final models consisting of standard clinical measures and demographic and system variables (eg, repeated admissions) were associated with substantial ACE total score variance for females (44%) and males (38%). Inadequate sample size foreclosed on developing a reduced multivariable model for the self-identified sex group. The ACE scores relate to independent clinical measures and system and demographic variables. There are implications for clinical practice. For example, a child presenting with anxiety and a high ACE score likely requires treatment that is different from a child presenting with anxiety and an ACE score of zero. The ACE survey score is an important index of presenting clinical status that guides patient care planning and intervention in the progress toward a trauma-focused system of care.
NASA Astrophysics Data System (ADS)
Whitehead, James Joshua
The analysis documented herein provides an integrated approach for the conduct of optimization under uncertainty (OUU) using Monte Carlo Simulation (MCS) techniques coupled with response surface-based methods for characterization of mixture-dependent variables. This novel methodology provides an innovative means of conducting optimization studies under uncertainty in propulsion system design. Analytic inputs are based upon empirical regression rate information obtained from design of experiments (DOE) mixture studies utilizing a mixed oxidizer hybrid rocket concept. Hybrid fuel regression rate was selected as the target response variable for optimization under uncertainty, with maximization of regression rate chosen as the driving objective. Characteristic operational conditions and propellant mixture compositions from experimental efforts conducted during previous foundational work were combined with elemental uncertainty estimates as input variables. Response surfaces for mixture-dependent variables and their associated uncertainty levels were developed using quadratic response equations incorporating single and two-factor interactions. These analysis inputs, response surface equations and associated uncertainty contributions were applied to a probabilistic MCS to develop dispersed regression rates as a function of operational and mixture input conditions within design space. Illustrative case scenarios were developed and assessed using this analytic approach including fully and partially constrained operational condition sets over all of design mixture space. In addition, optimization sets were performed across an operationally representative region in operational space and across all investigated mixture combinations. These scenarios were selected as representative examples relevant to propulsion system optimization, particularly for hybrid and solid rocket platforms. Ternary diagrams, including contour and surface plots, were developed and utilized to aid in visualization. The concept of Expanded-Durov diagrams was also adopted and adapted to this study to aid in visualization of uncertainty bounds. Regions of maximum regression rate and associated uncertainties were determined for each set of case scenarios. Application of response surface methodology coupled with probabilistic-based MCS allowed for flexible and comprehensive interrogation of mixture and operating design space during optimization cases. Analyses were also conducted to assess sensitivity of uncertainty to variations in key elemental uncertainty estimates. The methodology developed during this research provides an innovative optimization tool for future propulsion design efforts.
Martial arts as a mental health intervention for children? Evidence from the ECLS-K
Strayhorn, Joseph M; Strayhorn, Jillian C
2009-01-01
Background Martial arts studios for children market their services as providing mental health outcomes such as self-esteem, self-confidence, concentration, and self-discipline. It appears that many parents enroll their children in martial arts in hopes of obtaining such outcomes. The current study used the data from the Early Childhood Longitudinal Study, Kindergarten class of 1998-1999, to assess the effects of martial arts upon such outcomes as rated by classroom teachers. Methods The Early Childhood Longitudinal Study used a multistage probability sampling design to gather a sample representative of U.S. children attending kindergarten beginning 1998. We made use of data collected in the kindergarten, 3rd grade, and 5th grade years. Classroom behavior was measured by a rating scale completed by teachers; participation in martial arts was assessed as part of a parent interview. The four possible combinations of participation and nonparticipation in martial arts at time 1 and time 2 for each analysis were coded into three dichotomous variables; the set of three variables constituted the measure of participation studied through regression. Multiple regression was used to estimate the association between martial arts participation and change in classroom behavior from one measurement occasion to the next. The change from kindergarten to third grade was studied as a function of martial arts participation, and the analysis was replicated studying behavior change from third grade to fifth grade. Cohen's f2 effect sizes were derived from these regressions. Results The martial arts variable failed to show a statistically significant effect on behavior, in either of the regression analyses; in fact, the f2 effect size for martial arts was 0.000 for both analyses. The 95% confidence intervals for regression coefficients for martial arts variables have upper and lower bounds that are all close to zero. The analyses not only fail to reject the null hypothesis, but also render unlikely a population effect size that differs greatly from zero. Conclusion The data from the ECLS-K fail to support enrolling children in martial arts to improve mental health outcomes as measured by classroom teachers. PMID:19828027
Martial arts as a mental health intervention for children? Evidence from the ECLS-K.
Strayhorn, Joseph M; Strayhorn, Jillian C
2009-10-14
Martial arts studios for children market their services as providing mental health outcomes such as self-esteem, self-confidence, concentration, and self-discipline. It appears that many parents enroll their children in martial arts in hopes of obtaining such outcomes. The current study used the data from the Early Childhood Longitudinal Study, Kindergarten class of 1998-1999, to assess the effects of martial arts upon such outcomes as rated by classroom teachers. The Early Childhood Longitudinal Study used a multistage probability sampling design to gather a sample representative of U.S. children attending kindergarten beginning 1998. We made use of data collected in the kindergarten, 3rd grade, and 5th grade years. Classroom behavior was measured by a rating scale completed by teachers; participation in martial arts was assessed as part of a parent interview. The four possible combinations of participation and nonparticipation in martial arts at time 1 and time 2 for each analysis were coded into three dichotomous variables; the set of three variables constituted the measure of participation studied through regression. Multiple regression was used to estimate the association between martial arts participation and change in classroom behavior from one measurement occasion to the next. The change from kindergarten to third grade was studied as a function of martial arts participation, and the analysis was replicated studying behavior change from third grade to fifth grade. Cohen's f2 effect sizes were derived from these regressions. The martial arts variable failed to show a statistically significant effect on behavior, in either of the regression analyses; in fact, the f2 effect size for martial arts was 0.000 for both analyses. The 95% confidence intervals for regression coefficients for martial arts variables have upper and lower bounds that are all close to zero. The analyses not only fail to reject the null hypothesis, but also render unlikely a population effect size that differs greatly from zero. The data from the ECLS-K fail to support enrolling children in martial arts to improve mental health outcomes as measured by classroom teachers.
NASA Astrophysics Data System (ADS)
Duman, T. Y.; Can, T.; Gokceoglu, C.; Nefeslioglu, H. A.; Sonmez, H.
2006-11-01
As a result of industrialization, throughout the world, cities have been growing rapidly for the last century. One typical example of these growing cities is Istanbul, the population of which is over 10 million. Due to rapid urbanization, new areas suitable for settlement and engineering structures are necessary. The Cekmece area located west of the Istanbul metropolitan area is studied, because the landslide activity is extensive in this area. The purpose of this study is to develop a model that can be used to characterize landslide susceptibility in map form using logistic regression analysis of an extensive landslide database. A database of landslide activity was constructed using both aerial-photography and field studies. About 19.2% of the selected study area is covered by deep-seated landslides. The landslides that occur in the area are primarily located in sandstones with interbedded permeable and impermeable layers such as claystone, siltstone and mudstone. About 31.95% of the total landslide area is located at this unit. To apply logistic regression analyses, a data matrix including 37 variables was constructed. The variables used in the forwards stepwise analyses are different measures of slope, aspect, elevation, stream power index (SPI), plan curvature, profile curvature, geology, geomorphology and relative permeability of lithological units. A total of 25 variables were identified as exerting strong influence on landslide occurrence, and included by the logistic regression equation. Wald statistics values indicate that lithology, SPI and slope are more important than the other parameters in the equation. Beta coefficients of the 25 variables included the logistic regression equation provide a model for landslide susceptibility in the Cekmece area. This model is used to generate a landslide susceptibility map that correctly classified 83.8% of the landslide-prone areas.
A note on variance estimation in random effects meta-regression.
Sidik, Kurex; Jonkman, Jeffrey N
2005-01-01
For random effects meta-regression inference, variance estimation for the parameter estimates is discussed. Because estimated weights are used for meta-regression analysis in practice, the assumed or estimated covariance matrix used in meta-regression is not strictly correct, due to possible errors in estimating the weights. Therefore, this note investigates the use of a robust variance estimation approach for obtaining variances of the parameter estimates in random effects meta-regression inference. This method treats the assumed covariance matrix of the effect measure variables as a working covariance matrix. Using an example of meta-analysis data from clinical trials of a vaccine, the robust variance estimation approach is illustrated in comparison with two other methods of variance estimation. A simulation study is presented, comparing the three methods of variance estimation in terms of bias and coverage probability. We find that, despite the seeming suitability of the robust estimator for random effects meta-regression, the improved variance estimator of Knapp and Hartung (2003) yields the best performance among the three estimators, and thus may provide the best protection against errors in the estimated weights.
NASA Technical Reports Server (NTRS)
Hague, D. S.; Woodbury, N. W.
1975-01-01
The Mars system is a tool for rapid prediction of aircraft or engine characteristics based on correlation-regression analysis of past designs stored in the data bases. An example of output obtained from the MARS system, which involves derivation of an expression for gross weight of subsonic transport aircraft in terms of nine independent variables is given. The need is illustrated for careful selection of correlation variables and for continual review of the resulting estimation equations. For Vol. 1, see N76-10089.
TV watching, soap opera and happiness.
Lu, L; Argyle, M
1993-09-01
One hundred and fourteen subjects reported the amount of time they spent watching television in general, and soap opera in particular. They also completed scales measuring happiness and other personality variables, such as extraversion and cooperativeness. In the multiple regression analysis, having controlled for the demographic variables, watching TV was related to unhappiness, whereas watching soap opera was related to happiness. Discriminant analysis showed that females, higher happiness and extraversion distinguished regular soap watchers (who nevertheless watched little TV in general) from irregular soap watchers (who nevertheless watched a lot of TV in general).
Bayesian Group Bridge for Bi-level Variable Selection.
Mallick, Himel; Yi, Nengjun
2017-06-01
A Bayesian bi-level variable selection method (BAGB: Bayesian Analysis of Group Bridge) is developed for regularized regression and classification. This new development is motivated by grouped data, where generic variables can be divided into multiple groups, with variables in the same group being mechanistically related or statistically correlated. As an alternative to frequentist group variable selection methods, BAGB incorporates structural information among predictors through a group-wise shrinkage prior. Posterior computation proceeds via an efficient MCMC algorithm. In addition to the usual ease-of-interpretation of hierarchical linear models, the Bayesian formulation produces valid standard errors, a feature that is notably absent in the frequentist framework. Empirical evidence of the attractiveness of the method is illustrated by extensive Monte Carlo simulations and real data analysis. Finally, several extensions of this new approach are presented, providing a unified framework for bi-level variable selection in general models with flexible penalties.
Bertrand-Krajewski, J L
2004-01-01
In order to replace traditional sampling and analysis techniques, turbidimeters can be used to estimate TSS concentration in sewers, by means of sensor and site specific empirical equations established by linear regression of on-site turbidity Tvalues with TSS concentrations C measured in corresponding samples. As the ordinary least-squares method is not able to account for measurement uncertainties in both T and C variables, an appropriate regression method is used to solve this difficulty and to evaluate correctly the uncertainty in TSS concentrations estimated from measured turbidity. The regression method is described, including detailed calculations of variances and covariance in the regression parameters. An example of application is given for a calibrated turbidimeter used in a combined sewer system, with data collected during three dry weather days. In order to show how the established regression could be used, an independent 24 hours long dry weather turbidity data series recorded at 2 min time interval is used, transformed into estimated TSS concentrations, and compared to TSS concentrations measured in samples. The comparison appears as satisfactory and suggests that turbidity measurements could replace traditional samples. Further developments, including wet weather periods and other types of sensors, are suggested.
Studying Canonical Analysis: A Reply to Thorndike's Comments
ERIC Educational Resources Information Center
Barcikowski, Robert S.; Stevens, James P.
1976-01-01
This article is a rejoinder to TM 502 249. Each of Thorndike's comments are examined. A possible solution to the large number of subjects necessary for stable weights and variate-variable correlations using ridge regression procedures is suggested. (RC)
Kepler AutoRegressive Planet Search
NASA Astrophysics Data System (ADS)
Feigelson, Eric
NASA's Kepler mission is the source of more exoplanets than any other instrument, but the discovery depends on complex statistical analysis procedures embedded in the Kepler pipeline. A particular challenge is mitigating irregular stellar variability without loss of sensitivity to faint periodic planetary transits. This proposal presents a two-stage alternative analysis procedure. First, parametric autoregressive ARFIMA models, commonly used in econometrics, remove most of the stellar variations. Second, a novel matched filter is used to create a periodogram from which transit-like periodicities are identified. This analysis procedure, the Kepler AutoRegressive Planet Search (KARPS), is confirming most of the Kepler Objects of Interest and is expected to identify additional planetary candidates. The proposed research will complete application of the KARPS methodology to the prime Kepler mission light curves of 200,000: stars, and compare the results with Kepler Objects of Interest obtained with the Kepler pipeline. We will then conduct a variety of astronomical studies based on the KARPS results. Important subsamples will be extracted including Habitable Zone planets, hot super-Earths, grazing-transit hot Jupiters, and multi-planet systems. Groundbased spectroscopy of poorly studied candidates will be performed to better characterize the host stars. Studies of stellar variability will then be pursued based on KARPS analysis. The autocorrelation function and nonstationarity measures will be used to identify spotted stars at different stages of autoregressive modeling. Periodic variables with folded light curves inconsistent with planetary transits will be identified; they may be eclipsing or mutually-illuminating binary star systems. Classification of stellar variables with KARPS-derived statistical properties will be attempted. KARPS procedures will then be applied to archived K2 data to identify planetary transits and characterize stellar variability.
Kim, Sun Mi; Han, Heon; Park, Jeong Mi; Choi, Yoon Jung; Yoon, Hoi Soo; Sohn, Jung Hee; Baek, Moon Hee; Kim, Yoon Nam; Chae, Young Moon; June, Jeon Jong; Lee, Jiwon; Jeon, Yong Hwan
2012-10-01
To determine which Breast Imaging Reporting and Data System (BI-RADS) descriptors for ultrasound are predictors for breast cancer using logistic regression (LR) analysis in conjunction with interobserver variability between breast radiologists, and to compare the performance of artificial neural network (ANN) and LR models in differentiation of benign and malignant breast masses. Five breast radiologists retrospectively reviewed 140 breast masses and described each lesion using BI-RADS lexicon and categorized final assessments. Interobserver agreements between the observers were measured by kappa statistics. The radiologists' responses for BI-RADS were pooled. The data were divided randomly into train (n = 70) and test sets (n = 70). Using train set, optimal independent variables were determined by using LR analysis with forward stepwise selection. The LR and ANN models were constructed with the optimal independent variables and the biopsy results as dependent variable. Performances of the models and radiologists were evaluated on the test set using receiver-operating characteristic (ROC) analysis. Among BI-RADS descriptors, margin and boundary were determined as the predictors according to stepwise LR showing moderate interobserver agreement. Area under the ROC curves (AUC) for both of LR and ANN were 0.87 (95% CI, 0.77-0.94). AUCs for the five radiologists ranged 0.79-0.91. There was no significant difference in AUC values among the LR, ANN, and radiologists (p > 0.05). Margin and boundary were found as statistically significant predictors with good interobserver agreement. Use of the LR and ANN showed similar performance to that of the radiologists for differentiation of benign and malignant breast masses.
Kepler AutoRegressive Planet Search: Motivation & Methodology
NASA Astrophysics Data System (ADS)
Caceres, Gabriel; Feigelson, Eric; Jogesh Babu, G.; Bahamonde, Natalia; Bertin, Karine; Christen, Alejandra; Curé, Michel; Meza, Cristian
2015-08-01
The Kepler AutoRegressive Planet Search (KARPS) project uses statistical methodology associated with autoregressive (AR) processes to model Kepler lightcurves in order to improve exoplanet transit detection in systems with high stellar variability. We also introduce a planet-search algorithm to detect transits in time-series residuals after application of the AR models. One of the main obstacles in detecting faint planetary transits is the intrinsic stellar variability of the host star. The variability displayed by many stars may have autoregressive properties, wherein later flux values are correlated with previous ones in some manner. Auto-Regressive Moving-Average (ARMA) models, Generalized Auto-Regressive Conditional Heteroskedasticity (GARCH), and related models are flexible, phenomenological methods used with great success to model stochastic temporal behaviors in many fields of study, particularly econometrics. Powerful statistical methods are implemented in the public statistical software environment R and its many packages. Modeling involves maximum likelihood fitting, model selection, and residual analysis. These techniques provide a useful framework to model stellar variability and are used in KARPS with the objective of reducing stellar noise to enhance opportunities to find as-yet-undiscovered planets. Our analysis procedure consisting of three steps: pre-processing of the data to remove discontinuities, gaps and outliers; ARMA-type model selection and fitting; and transit signal search of the residuals using a new Transit Comb Filter (TCF) that replaces traditional box-finding algorithms. We apply the procedures to simulated Kepler-like time series with known stellar and planetary signals to evaluate the effectiveness of the KARPS procedures. The ARMA-type modeling is effective at reducing stellar noise, but also reduces and transforms the transit signal into ingress/egress spikes. A periodogram based on the TCF is constructed to concentrate the signal of these periodic spikes. When a periodic transit is found, the model is displayed on a standard period-folded averaged light curve. We also illustrate the efficient coding in R.
Linear regression in astronomy. II
NASA Technical Reports Server (NTRS)
Feigelson, Eric D.; Babu, Gutti J.
1992-01-01
A wide variety of least-squares linear regression procedures used in observational astronomy, particularly investigations of the cosmic distance scale, are presented and discussed. The classes of linear models considered are (1) unweighted regression lines, with bootstrap and jackknife resampling; (2) regression solutions when measurement error, in one or both variables, dominates the scatter; (3) methods to apply a calibration line to new data; (4) truncated regression models, which apply to flux-limited data sets; and (5) censored regression models, which apply when nondetections are present. For the calibration problem we develop two new procedures: a formula for the intercept offset between two parallel data sets, which propagates slope errors from one regression to the other; and a generalization of the Working-Hotelling confidence bands to nonstandard least-squares lines. They can provide improved error analysis for Faber-Jackson, Tully-Fisher, and similar cosmic distance scale relations.
Lago-Ballesteros, Joaquin; Lago-Peñas, Carlos; Rey, Ezequiel
2012-01-01
The aim of this study was to analyse the influence of playing tactics, opponent interaction and situational variables on achieving score-box possessions in professional soccer. The sample was constituted by 908 possessions obtained by a team from the Spanish soccer league in 12 matches played during the 2009-2010 season. Multidimensional qualitative data obtained from 12 ordered categorical variables were used. Sampled matches were registered by the AMISCO PRO system. Data were analysed using chi-square analysis and multiple logistic regression analysis. Of 908 possessions, 303 (33.4%) produced score-box possessions, 477 (52.5%) achieved progression and 128 (14.1%) failed to reach any sort of progression. Multiple logistic regression showed that, for the main variable "team possession type", direct attacks and counterattacks were three times more effective than elaborate attacks for producing a score-box possession (P < 0.05). Team possession originating from the middle zones and playing against less than six defending players (P < 0.001) registered a higher success than those started in the defensive zone with a balanced defence. When the team was drawing or winning, the probability of reaching the score-box decreased by 43 and 53 percent, respectively, compared with the losing situation (P < 0.05). Accounting for opponent interactions and situational variables is critical to evaluate the effectiveness of offensive playing tactics on producing score-box possessions.
Schwab, Bianca; Daniel, Heloisa Silveira; Lutkemeyer, Carine; Neves, João Arthur Lange Lins; Zilli, Louise Nassif; Guarnieri, Ricardo; Diaz, Alexandre Paim; Michels, Ana Maria Maykot Prates
2015-01-01
Health-related quality of life (HRQOL) assessment tools have been broadly used in the medical context. These tools are used to measure the subjective impact of the disease on patients. The objective of this study was to evaluate the variables associated with HRQOL in a Brazilian sample of patients followed up in a tertiary outpatient clinic for depression and anxiety disorders. Cross-sectional study. Independent variables were those included in a sociodemographic questionnaire and the Hospital Anxiety and Depression Scale (HADS) scores. Dependent variables were those included in the short version of the World Health Organization Quality of Life (WHOQOL-BREF) and the scores for its subdomains (overall quality of life and general health, physical health, psychological health, social relationships, and environment). A multiple linear regression analysis was used to find the variables independently associated with each outcome. Seventy-five adult patients were evaluated. After multiple linear regression analysis, the HADS scores were associated with all outcomes, except social relationships (p = 0.08). Female gender was associated with poor total scores, as well as psychological health and environment. Unemployment was associated with poor physical health. Identifying the factors associated with HRQOL and recognizing that depression and anxiety are major factors are essential to improve the care of patients.
The repeatability of mean defect with size III and size V standard automated perimetry.
Wall, Michael; Doyle, Carrie K; Zamba, K D; Artes, Paul; Johnson, Chris A
2013-02-15
The mean defect (MD) of the visual field is a global statistical index used to monitor overall visual field change over time. Our goal was to investigate the relationship of MD and its variability for two clinically used strategies (Swedish Interactive Threshold Algorithm [SITA] standard size III and full threshold size V) in glaucoma patients and controls. We tested one eye, at random, for 46 glaucoma patients and 28 ocularly healthy subjects with Humphrey program 24-2 SITA standard for size III and full threshold for size V each five times over a 5-week period. The standard deviation of MD was regressed against the MD for the five repeated tests, and quantile regression was used to show the relationship of variability and MD. A Wilcoxon test was used to compare the standard deviations of the two testing methods following quantile regression. Both types of regression analysis showed increasing variability with increasing visual field damage. Quantile regression showed modestly smaller MD confidence limits. There was a 15% decrease in SD with size V in glaucoma patients (P = 0.10) and a 12% decrease in ocularly healthy subjects (P = 0.08). The repeatability of size V MD appears to be slightly better than size III SITA testing. When using MD to determine visual field progression, a change of 1.5 to 4 decibels (dB) is needed to be outside the normal 95% confidence limits, depending on the size of the stimulus and the amount of visual field damage.
Forecasting municipal solid waste generation using prognostic tools and regression analysis.
Ghinea, Cristina; Drăgoi, Elena Niculina; Comăniţă, Elena-Diana; Gavrilescu, Marius; Câmpean, Teofil; Curteanu, Silvia; Gavrilescu, Maria
2016-11-01
For an adequate planning of waste management systems the accurate forecast of waste generation is an essential step, since various factors can affect waste trends. The application of predictive and prognosis models are useful tools, as reliable support for decision making processes. In this paper some indicators such as: number of residents, population age, urban life expectancy, total municipal solid waste were used as input variables in prognostic models in order to predict the amount of solid waste fractions. We applied Waste Prognostic Tool, regression analysis and time series analysis to forecast municipal solid waste generation and composition by considering the Iasi Romania case study. Regression equations were determined for six solid waste fractions (paper, plastic, metal, glass, biodegradable and other waste). Accuracy Measures were calculated and the results showed that S-curve trend model is the most suitable for municipal solid waste (MSW) prediction. Copyright © 2016 Elsevier Ltd. All rights reserved.
Logic regression and its extensions.
Schwender, Holger; Ruczinski, Ingo
2010-01-01
Logic regression is an adaptive classification and regression procedure, initially developed to reveal interacting single nucleotide polymorphisms (SNPs) in genetic association studies. In general, this approach can be used in any setting with binary predictors, when the interaction of these covariates is of primary interest. Logic regression searches for Boolean (logic) combinations of binary variables that best explain the variability in the outcome variable, and thus, reveals variables and interactions that are associated with the response and/or have predictive capabilities. The logic expressions are embedded in a generalized linear regression framework, and thus, logic regression can handle a variety of outcome types, such as binary responses in case-control studies, numeric responses, and time-to-event data. In this chapter, we provide an introduction to the logic regression methodology, list some applications in public health and medicine, and summarize some of the direct extensions and modifications of logic regression that have been proposed in the literature. Copyright © 2010 Elsevier Inc. All rights reserved.
A Primer on Logistic Regression.
ERIC Educational Resources Information Center
Woldbeck, Tanya
This paper introduces logistic regression as a viable alternative when the researcher is faced with variables that are not continuous. If one is to use simple regression, the dependent variable must be measured on a continuous scale. In the behavioral sciences, it may not always be appropriate or possible to have a measured dependent variable on a…
Tutorial on Using Regression Models with Count Outcomes Using R
ERIC Educational Resources Information Center
Beaujean, A. Alexander; Morgan, Grant B.
2016-01-01
Education researchers often study count variables, such as times a student reached a goal, discipline referrals, and absences. Most researchers that study these variables use typical regression methods (i.e., ordinary least-squares) either with or without transforming the count variables. In either case, using typical regression for count data can…
Binary logistic regression modelling: Measuring the probability of relapse cases among drug addict
NASA Astrophysics Data System (ADS)
Ismail, Mohd Tahir; Alias, Siti Nor Shadila
2014-07-01
For many years Malaysia faced the drug addiction issues. The most serious case is relapse phenomenon among treated drug addict (drug addict who have under gone the rehabilitation programme at Narcotic Addiction Rehabilitation Centre, PUSPEN). Thus, the main objective of this study is to find the most significant factor that contributes to relapse to happen. The binary logistic regression analysis was employed to model the relationship between independent variables (predictors) and dependent variable. The dependent variable is the status of the drug addict either relapse, (Yes coded as 1) or not, (No coded as 0). Meanwhile the predictors involved are age, age at first taking drug, family history, education level, family crisis, community support and self motivation. The total of the sample is 200 which the data are provided by AADK (National Antidrug Agency). The finding of the study revealed that age and self motivation are statistically significant towards the relapse cases..
Applications of MIDAS regression in analysing trends in water quality
NASA Astrophysics Data System (ADS)
Penev, Spiridon; Leonte, Daniela; Lazarov, Zdravetz; Mann, Rob A.
2014-04-01
We discuss novel statistical methods in analysing trends in water quality. Such analysis uses complex data sets of different classes of variables, including water quality, hydrological and meteorological. We analyse the effect of rainfall and flow on trends in water quality utilising a flexible model called Mixed Data Sampling (MIDAS). This model arises because of the mixed frequency in the data collection. Typically, water quality variables are sampled fortnightly, whereas the rain data is sampled daily. The advantage of using MIDAS regression is in the flexible and parsimonious modelling of the influence of the rain and flow on trends in water quality variables. We discuss the model and its implementation on a data set from the Shoalhaven Supply System and Catchments in the state of New South Wales, Australia. Information criteria indicate that MIDAS modelling improves upon simplistic approaches that do not utilise the mixed data sampling nature of the data.
NASA Astrophysics Data System (ADS)
Varouchakis, Emmanouil; Kourgialas, Nektarios; Karatzas, George; Giannakis, Georgios; Lilli, Maria; Nikolaidis, Nikolaos
2014-05-01
Riverbank erosion affects the river morphology and the local habitat and results in riparian land loss, damage to property and infrastructures, ultimately weakening flood defences. An important issue concerning riverbank erosion is the identification of the areas vulnerable to erosion, as it allows for predicting changes and assists with stream management and restoration. One way to predict the vulnerable to erosion areas is to determine the erosion probability by identifying the underlying relations between riverbank erosion and the geomorphological and/or hydrological variables that prevent or stimulate erosion. A statistical model for evaluating the probability of erosion based on a series of independent local variables and by using logistic regression is developed in this work. The main variables affecting erosion are vegetation index (stability), the presence or absence of meanders, bank material (classification), stream power, bank height, river bank slope, riverbed slope, cross section width and water velocities (Luppi et al. 2009). In statistics, logistic regression is a type of regression analysis used for predicting the outcome of a categorical dependent variable, e.g. binary response, based on one or more predictor variables (continuous or categorical). The probabilities of the possible outcomes are modelled as a function of independent variables using a logistic function. Logistic regression measures the relationship between a categorical dependent variable and, usually, one or several continuous independent variables by converting the dependent variable to probability scores. Then, a logistic regression is formed, which predicts success or failure of a given binary variable (e.g. 1 = "presence of erosion" and 0 = "no erosion") for any value of the independent variables. The regression coefficients are estimated by using maximum likelihood estimation. The erosion occurrence probability can be calculated in conjunction with the model deviance regarding the independent variables tested (Atkinson et al. 2003). The developed statistical model is applied to the Koiliaris River Basin in the island of Crete, Greece. The aim is to determine the probability of erosion along the Koiliaris' riverbanks considering a series of independent geomorphological and/or hydrological variables. Data for the river bank slope and for the river cross section width are available at ten locations along the river. The riverbank has indications of erosion at six of the ten locations while four has remained stable. Based on a recent work, measurements for the two independent variables and data regarding bank stability are available at eight different locations along the river. These locations were used as validation points for the proposed statistical model. The results show a very close agreement between the observed erosion indications and the statistical model as the probability of erosion was accurately predicted at seven out of the eight locations. The next step is to apply the model at more locations along the riverbanks. In November 2013, stakes were inserted at selected locations in order to be able to identify the presence or absence of erosion after the winter period. In April 2014 the presence or absence of erosion will be identified and the model results will be compared to the field data. Our intent is to extend the model by increasing the number of independent variables in order to indentify the key factors favouring erosion along the Koiliaris River. We aim at developing an easy to use statistical tool that will provide a quantified measure of the erosion probability along the riverbanks, which could consequently be used to prevent erosion and flooding events. Atkinson, P. M., German, S. E., Sear, D. A. and Clark, M. J. 2003. Exploring the relations between riverbank erosion and geomorphological controls using geographically weighted logistic regression. Geographical Analysis, 35 (1), 58-82. Luppi, L., Rinaldi, M., Teruggi, L. B., Darby, S. E. and Nardi, L. 2009. Monitoring and numerical modelling of riverbank erosion processes: A case study along the Cecina River (central Italy). Earth Surface Processes and Landforms, 34 (4), 530-546. Acknowledgements This work is part of an on-going THALES project (CYBERSENSORS - High Frequency Monitoring System for Integrated Water Resources Management of Rivers). The project has been co-financed by the European Union (European Social Fund - ESF) and Greek national funds through the Operational Program "Education and Lifelong Learning" of the National Strategic Reference Framework (NSRF) - Research Funding Program: THALES. Investing in knowledge society through the European Social Fund.
Exploring public databases to characterize urban flood risks in Amsterdam
NASA Astrophysics Data System (ADS)
Gaitan, Santiago; ten Veldhuis, Marie-claire; van de Giesen, Nick
2015-04-01
Cities worldwide are challenged by increasing urban flood risks. Precise and realistic measures are required to decide upon investment to reduce their impacts. Obvious flooding factors affecting flood risk include sewer systems performance and urban topography. However, currently implemented sewer and topographic models do not provide realistic predictions of local flooding occurrence during heavy rain events. Assessing other factors such as spatially distributed rainfall and socioeconomic characteristics may help to explain probability and impacts of urban flooding. Several public databases were analyzed: complaints about flooding made by citizens, rainfall depths (15 min and 100 Ha spatio-temporal resolution), grids describing number of inhabitants, income, and housing price (1Ha and 25Ha resolution); and buildings age. Data analysis was done using Python and GIS programming, and included spatial indexing of data, cluster analysis, and multivariate regression on the complaints. Complaints were used as a proxy to characterize flooding impacts. The cluster analysis, run for all the variables except the complaints, grouped part of the grid-cells of central Amsterdam into a highly differentiated group, covering 10% of the analyzed area, and accounting for 25% of registered complaints. The configuration of the analyzed variables in central Amsterdam coincides with a high complaint count. Remaining complaints were evenly dispersed along other groups. An adjusted R2 of 0.38 in the multivariate regression suggests that explaining power can improve if additional variables are considered. While rainfall intensity explained 4% of the incidence of complaints, population density and building age significantly explained around 20% each. Data mining of public databases proved to be a valuable tool to identify factors explaining variability in occurrence of urban pluvial flooding, though additional variables must be considered to fully explain flood risk variability.
Analysis of Sting Balance Calibration Data Using Optimized Regression Models
NASA Technical Reports Server (NTRS)
Ulbrich, N.; Bader, Jon B.
2010-01-01
Calibration data of a wind tunnel sting balance was processed using a candidate math model search algorithm that recommends an optimized regression model for the data analysis. During the calibration the normal force and the moment at the balance moment center were selected as independent calibration variables. The sting balance itself had two moment gages. Therefore, after analyzing the connection between calibration loads and gage outputs, it was decided to choose the difference and the sum of the gage outputs as the two responses that best describe the behavior of the balance. The math model search algorithm was applied to these two responses. An optimized regression model was obtained for each response. Classical strain gage balance load transformations and the equations of the deflection of a cantilever beam under load are used to show that the search algorithm s two optimized regression models are supported by a theoretical analysis of the relationship between the applied calibration loads and the measured gage outputs. The analysis of the sting balance calibration data set is a rare example of a situation when terms of a regression model of a balance can directly be derived from first principles of physics. In addition, it is interesting to note that the search algorithm recommended the correct regression model term combinations using only a set of statistical quality metrics that were applied to the experimental data during the algorithm s term selection process.
Ramsthaler, Frank; Kettner, Mattias; Verhoff, Marcel A
2014-01-01
In forensic anthropological casework, estimating age-at-death is key to profiling unknown skeletal remains. The aim of this study was to examine the reliability of a new, simple, fast, and inexpensive digital odontological method for age-at-death estimation. The method is based on the original Lamendin method, which is a widely used technique in the repertoire of odontological aging methods in forensic anthropology. We examined 129 single root teeth employing a digital camera and imaging software for the measurement of the luminance of the teeth's translucent root zone. Variability in luminance detection was evaluated using statistical technical error of measurement analysis. The method revealed stable values largely unrelated to observer experience, whereas requisite formulas proved to be camera-specific and should therefore be generated for an individual recording setting based on samples of known chronological age. Multiple regression analysis showed a highly significant influence of the coefficients of the variables "arithmetic mean" and "standard deviation" of luminance for the regression formula. For the use of this primer multivariate equation for age-at-death estimation in casework, a standard error of the estimate of 6.51 years was calculated. Step-by-step reduction of the number of embedded variables to linear regression analysis employing the best contributor "arithmetic mean" of luminance yielded a regression equation with a standard error of 6.72 years (p < 0.001). The results of this study not only support the premise of root translucency as an age-related phenomenon, but also demonstrate that translucency reflects a number of other influencing factors in addition to age. This new digital measuring technique of the zone of dental root luminance can broaden the array of methods available for estimating chronological age, and furthermore facilitate measurement and age classification due to its low dependence on observer experience.
Sophocleous, M.
2000-01-01
A practical methodology for recharge characterization was developed based on several years of field-oriented research at 10 sites in the Great Bend Prairie of south-central Kansas. This methodology combines the soil-water budget on a storm-by-storm year-round basis with the resulting watertable rises. The estimated 1985-1992 average annual recharge was less than 50mm/year with a range from 15 mm/year (during the 1998 drought) to 178 mm/year (during the 1993 flood year). Most of this recharge occurs during the spring months. To regionalize these site-specific estimates, an additional methodology based on multiple (forward) regression analysis combined with classification and GIS overlay analyses was developed and implemented. The multiple regression analysis showed that the most influential variables were, in order of decreasing importance, total annual precipitation, average maximum springtime soil-profile water storage, average shallowest springtime depth to watertable, and average springtime precipitation rate. Therefore, four GIS (ARC/INFO) data "layers" or coverages were constructed for the study region based on these four variables, and each such coverage was classified into the same number of data classes to avoid biasing the results. The normalized regression coefficients were employed to weigh the class rankings of each recharge-affecting variable. This approach resulted in recharge zonations that agreed well with the site recharge estimates. During the "Great Flood of 1993," when rainfall totals exceeded normal levels by -200% in the northern portion of the study region, the developed regionalization methodology was tested against such extreme conditions, and proved to be both practical, based on readily available or easily measurable data, and robust. It was concluded that the combination of multiple regression and GIS overlay analyses is a powerful and practical approach to regionalizing small samples of recharge estimates.
A gentle introduction to quantile regression for ecologists
Cade, B.S.; Noon, B.R.
2003-01-01
Quantile regression is a way to estimate the conditional quantiles of a response variable distribution in the linear model that provides a more complete view of possible causal relationships between variables in ecological processes. Typically, all the factors that affect ecological processes are not measured and included in the statistical models used to investigate relationships between variables associated with those processes. As a consequence, there may be a weak or no predictive relationship between the mean of the response variable (y) distribution and the measured predictive factors (X). Yet there may be stronger, useful predictive relationships with other parts of the response variable distribution. This primer relates quantile regression estimates to prediction intervals in parametric error distribution regression models (eg least squares), and discusses the ordering characteristics, interval nature, sampling variation, weighting, and interpretation of the estimates for homogeneous and heterogeneous regression models.
Screening and clustering of sparse regressions with finite non-Gaussian mixtures.
Zhang, Jian
2017-06-01
This article proposes a method to address the problem that can arise when covariates in a regression setting are not Gaussian, which may give rise to approximately mixture-distributed errors, or when a true mixture of regressions produced the data. The method begins with non-Gaussian mixture-based marginal variable screening, followed by fitting a full but relatively smaller mixture regression model to the selected data with help of a new penalization scheme. Under certain regularity conditions, the new screening procedure is shown to possess a sure screening property even when the population is heterogeneous. We further prove that there exists an elbow point in the associated scree plot which results in a consistent estimator of the set of active covariates in the model. By simulations, we demonstrate that the new procedure can substantially improve the performance of the existing procedures in the content of variable screening and data clustering. By applying the proposed procedure to motif data analysis in molecular biology, we demonstrate that the new method holds promise in practice. © 2016, The International Biometric Society.
Qiao, Yuanhua; Keren, Nir; Mannan, M Sam
2009-08-15
Risk assessment and management of transportation of hazardous materials (HazMat) require the estimation of accident frequency. This paper presents a methodology to estimate hazardous materials transportation accident frequency by utilizing publicly available databases and expert knowledge. The estimation process addresses route-dependent and route-independent variables. Negative binomial regression is applied to an analysis of the Department of Public Safety (DPS) accident database to derive basic accident frequency as a function of route-dependent variables, while the effects of route-independent variables are modeled by fuzzy logic. The integrated methodology provides the basis for an overall transportation risk analysis, which can be used later to develop a decision support system.
Metrics to Compare Aircraft Operating and Support Costs in the Department of Defense
2015-01-01
a phenomenon in regression analysis called multicollinear - ity, which makes problematic the interpretation of the coefficient esti- mates of highly...indicating a very high amount of multicollinearity and suggesting that the magnitude of the coefficients on those variables should be treated with caution... multicollinearity between these independent variables, one must be cautious when interpreting the statistical relationship between flying hours and cost. The
A comparison of two microscale laboratory reporting methods in a secondary chemistry classroom
NASA Astrophysics Data System (ADS)
Martinez, Lance Michael
This study attempted to determine if there was a difference between the laboratory achievement of students who used a modified reporting method and those who used traditional laboratory reporting. The study also determined the relationships between laboratory performance scores and the independent variables score on the Group Assessment of Logical Thinking (GALT) test, chronological age in months, gender, and ethnicity for each of the treatment groups. The study was conducted using 113 high school students who were enrolled in first-year general chemistry classes at Pueblo South High School in Colorado. The research design used was the quasi-experimental Nonequivalent Control Group Design. The statistical treatment consisted of the Multiple Regression Analysis and the Analysis of Covariance. Based on the GALT, students in the two groups were generally in the concrete and transitional stages of the Piagetian cognitive levels. The findings of the study revealed that the traditional and the modified methods of laboratory reporting did not have any effect on the laboratory performance outcome of the subjects. However, the students who used the traditional method of reporting showed a higher laboratory performance score when evaluation was conducted using the New Standards rubric recommended by the state. Multiple Regression Analysis revealed that there was a significant relationship between the criterion variable student laboratory performance outcome of individuals who employed traditional laboratory reporting methods and the composite set of predictor variables. On the contrary, there was no significant relationship between the criterion variable student laboratory performance outcome of individuals who employed modified laboratory reporting methods and the composite set of predictor variables.
Zhao, G; Li, S H; Tan, X
2016-03-01
To investigate the relationship between autonomic nervous function and arteriosclerosis in patients with essential hypertension. From January 2011 to December 2013, a total of 269 patients with essential hypertension hospitalized in Chang'an Branch of First People's Hospital of Liangshan were divided into normal PWV group (PWV<9 m/s, n=178) and high PWV group (PWV≥9 m/s, n=91) via the results of carotid-femoral pulse wave velocity (PWV). Synchronic 24 hours ambulatory blood pressure monitoring and dynamic electrocardiogram were performed for all participants to simultaneously monitor the heart rate variability (HRV) and blood pressure variability (BPV) in these patients. Pearson single factor analysis and multivariate logistic regression analysis were performed to define the relationship between PWV and HRV, BPV respectively. The level of nHR/dHR (index of heart rate variability), 24 hour'sSSD, dSSD, nSSD (indexes of blood pressure variability) increased significantly (all P<0.05), while the level of SDANN (index of heart rate variability) decreased significantly (P<0.05) in high PWV group compared with normal PWV group. Multiple linear regression analysis showed that PWV was positively correlated with 24 hour'sSSD, 24 hour'sPP, LF, LF/HF and night/day heart rate ratio (all P<0.05). HRV (LF, LF/HF, nHR/dHR) and BPV (24 hours'SSD, dSSD, nSSD) are positively correlated to arteriosclerosis in patients with essential hypertension. Our results show that sympathetic activation and vascular injury are closely related in patients with essential hypertension.
Relationship between Gender Roles and Sexual Assertiveness in Married Women.
Azmoude, Elham; Firoozi, Mahbobe; Sadeghi Sahebzad, Elahe; Asgharipour, Neghar
2016-10-01
Evidence indicates that sexual assertiveness is one of the important factors affecting sexual satisfaction. According to some studies, traditional gender norms conflict with women's capability in expressing sexual desires. This study examined the relationship between gender roles and sexual assertiveness in married women in Mashhad, Iran. This cross-sectional study was conducted on 120 women who referred to Mashhad health centers through convenient sampling in 2014-15. Data were collected using Bem Sex Role Inventory (BSRI) and Hulbert index of sexual assertiveness. Data were analyzed using SPSS 16 by Pearson and Spearman's correlation tests and linear Regression Analysis. The mean scores of sexual assertiveness was 54.93±13.20. According to the findings, there was non-significant correlation between Femininity and masculinity score with sexual assertiveness (P=0.069 and P=0.080 respectively). Linear regression analysis indicated that among the predictor variables, only Sexual function satisfaction was identified as the sexual assertiveness summary predictor variables (P=0.001). Based on the results, sexual assertiveness in married women does not comply with gender role, but it is related to Sexual function satisfaction. So, counseling psychologists need to consider this variable when designing intervention programs for modifying sexual assertiveness and find other variables that affect sexual assertiveness.
Relationship between Gender Roles and Sexual Assertiveness in Married Women
Azmoude, Elham; Firoozi, Mahbobe; Sadeghi Sahebzad, Elahe; Asgharipour, Neghar
2016-01-01
ABSTRACT Background: Evidence indicates that sexual assertiveness is one of the important factors affecting sexual satisfaction. According to some studies, traditional gender norms conflict with women’s capability in expressing sexual desires. This study examined the relationship between gender roles and sexual assertiveness in married women in Mashhad, Iran. Methods: This cross-sectional study was conducted on 120 women who referred to Mashhad health centers through convenient sampling in 2014-15. Data were collected using Bem Sex Role Inventory (BSRI) and Hulbert index of sexual assertiveness. Data were analyzed using SPSS 16 by Pearson and Spearman’s correlation tests and linear Regression Analysis. Results: The mean scores of sexual assertiveness was 54.93±13.20. According to the findings, there was non-significant correlation between Femininity and masculinity score with sexual assertiveness (P=0.069 and P=0.080 respectively). Linear regression analysis indicated that among the predictor variables, only Sexual function satisfaction was identified as the sexual assertiveness summary predictor variables (P=0.001). Conclusion: Based on the results, sexual assertiveness in married women does not comply with gender role, but it is related to Sexual function satisfaction. So, counseling psychologists need to consider this variable when designing intervention programs for modifying sexual assertiveness and find other variables that affect sexual assertiveness. PMID:27713899
NASA Astrophysics Data System (ADS)
Visser, H.; Molenaar, J.
1995-05-01
The detection of trends in climatological data has become central to the discussion on climate change due to the enhanced greenhouse effect. To prove detection, a method is needed (i) to make inferences on significant rises or declines in trends, (ii) to take into account natural variability in climate series, and (iii) to compare output from GCMs with the trends in observed climate data. To meet these requirements, flexible mathematical tools are needed. A structural time series model is proposed with which a stochastic trend, a deterministic trend, and regression coefficients can be estimated simultaneously. The stochastic trend component is described using the class of ARIMA models. The regression component is assumed to be linear. However, the regression coefficients corresponding with the explanatory variables may be time dependent to validate this assumption. The mathematical technique used to estimate this trend-regression model is the Kaiman filter. The main features of the filter are discussed.Examples of trend estimation are given using annual mean temperatures at a single station in the Netherlands (1706-1990) and annual mean temperatures at Northern Hemisphere land stations (1851-1990). The inclusion of explanatory variables is shown by regressing the latter temperature series on four variables: Southern Oscillation index (SOI), volcanic dust index (VDI), sunspot numbers (SSN), and a simulated temperature signal, induced by increasing greenhouse gases (GHG). In all analyses, the influence of SSN on global temperatures is found to be negligible. The correlations between temperatures and SOI and VDI appear to be negative. For SOI, this correlation is significant, but for VDI it is not, probably because of a lack of volcanic eruptions during the sample period. The relation between temperatures and GHG is positive, which is in agreement with the hypothesis of a warming climate because of increasing levels of greenhouse gases. The prediction performance of the model is rather poor, and possible explanations are discussed.
Logistic regression applied to natural hazards: rare event logistic regression with replications
NASA Astrophysics Data System (ADS)
Guns, M.; Vanacker, V.
2012-06-01
Statistical analysis of natural hazards needs particular attention, as most of these phenomena are rare events. This study shows that the ordinary rare event logistic regression, as it is now commonly used in geomorphologic studies, does not always lead to a robust detection of controlling factors, as the results can be strongly sample-dependent. In this paper, we introduce some concepts of Monte Carlo simulations in rare event logistic regression. This technique, so-called rare event logistic regression with replications, combines the strength of probabilistic and statistical methods, and allows overcoming some of the limitations of previous developments through robust variable selection. This technique was here developed for the analyses of landslide controlling factors, but the concept is widely applicable for statistical analyses of natural hazards.
Air Pollutants, Climate, and the Prevalence of Pediatric Asthma in Urban Areas of China
Zhang, Juanjuan; Yan, Li; Fu, Wenlong; Yi, Jing; Chen, Yuzhi; Liu, Chuanhe; Xu, Dongqun; Wang, Qiang
2016-01-01
Background. Prevalence of childhood asthma varies significantly among regions, while its reasons are not clear yet with only a few studies reporting relevant causes for this variation. Objective. To investigate the potential role of city-average levels of air pollutants and climatic factors in order to distinguish differences in asthma prevalence in China and explain their reasons. Methods. Data pertaining to 10,777 asthmatic patients were obtained from the third nationwide survey of childhood asthma in China's urban areas. Annual mean concentrations of air pollutants and other climatic factors were obtained for the same period from several government departments. Data analysis was implemented with descriptive statistics, Pearson correlation coefficient, and multiple regression analysis. Results. Pearson correlation analysis showed that the situation of childhood asthma was strongly linked with SO2, relative humidity, and hours of sunshine (p < 0.05). Multiple regression analysis indicated that, among the predictor variables in the final step, SO2 was found to be the most powerful predictor variable amongst all (β = −19.572, p < 0.05). Furthermore, results had shown that hours of sunshine (β = −0.014, p < 0.05) was a significant component summary predictor variable. Conclusion. The findings of this study do not suggest that air pollutants or climate, at least in terms of children, plays a major role in explaining regional differences in asthma prevalence in China. PMID:27556031
[Obesity in Brazilian women: association with parity and socioeconomic status].
Ferreira, Regicely Aline Brandão; Benicio, Maria Helena D'Aquino
2015-05-01
To determine the influence of reproductive history on the prevalence of obesity in Brazilian women and the possible modifying effect of socioeconomic variables on the association between parity and excess weight. A retrospective analysis of complex sample data collected as part of the 2006 Brazilian National Survey on Demography and Health, which included a group representative of women of childbearing age in Brazil was conducted. The study included 11 961 women aged 20 to 49 years. The association between the study factor (parity) and the outcome of interest (obesity) was tested using logistic regression analysis. The adjusted effect of parity on obesity was assessed in a multiple regression model containing control variables: age, family purchasing power, as defined by the Brazilian Association of Research Enterprises (ABEP), schooling, and health care. Significance level was set at below 0.05. The prevalence of obesity in the study population was 18.6%. The effect of parity on obesity was significant (P for trend < 0.001). Unadjusted analysis showed a positive association of obesity with parity and age. Family purchase power had a significant odds ratio for obesity only in the unadjusted analysis. In the adjusted model, this variable did not explain obesity. The present findings suggest that parity has an influence on obesity in Brazilian women of childbearing age, with higher prevalence in women vs. without children.
Parametric Methods for Dynamic 11C-Phenytoin PET Studies.
Mansor, Syahir; Yaqub, Maqsood; Boellaard, Ronald; Froklage, Femke E; de Vries, Anke; Bakker, Esther D M; Voskuyl, Rob A; Eriksson, Jonas; Schwarte, Lothar A; Verbeek, Joost; Windhorst, Albert D; Lammertsma, Adriaan A
2017-03-01
In this study, the performance of various methods for generating quantitative parametric images of dynamic 11 C-phenytoin PET studies was evaluated. Methods: Double-baseline 60-min dynamic 11 C-phenytoin PET studies, including online arterial sampling, were acquired for 6 healthy subjects. Parametric images were generated using Logan plot analysis, a basis function method, and spectral analysis. Parametric distribution volume (V T ) and influx rate ( K 1 ) were compared with those obtained from nonlinear regression analysis of time-activity curves. In addition, global and regional test-retest (TRT) variability was determined for parametric K 1 and V T values. Results: Biases in V T observed with all parametric methods were less than 5%. For K 1 , spectral analysis showed a negative bias of 16%. The mean TRT variabilities of V T and K 1 were less than 10% for all methods. Shortening the scan duration to 45 min provided similar V T and K 1 with comparable TRT performance compared with 60-min data. Conclusion: Among the various parametric methods tested, the basis function method provided parametric V T and K 1 values with the least bias compared with nonlinear regression data and showed TRT variabilities lower than 5%, also for smaller volume-of-interest sizes (i.e., higher noise levels) and shorter scan duration. © 2017 by the Society of Nuclear Medicine and Molecular Imaging.
Koper, Olga Martyna; Kamińska, Joanna; Milewska, Anna; Sawicki, Karol; Mariak, Zenon; Kemona, Halina; Matowicka-Karna, Joanna
2018-05-18
The influence of isoform A of reticulon-4 (Nogo-A), also known as neurite outgrowth inhibitor, on primary brain tumor development was reported. Therefore the aim was the evaluation of Nogo-A concentrations in cerebrospinal fluid (CSF) and serum of brain tumor patients compared with non-tumoral individuals. All serum results, except for two cases, obtained both in brain tumors and non-tumoral individuals, were below the lower limit of ELISA detection. Cerebrospinal fluid Nogo-A concentrations were significantly lower in primary brain tumor patients compared to non-tumoral individuals. The univariate linear regression analysis found that if white blood cell count increases by 1 × 10 3 /μL, the mean cerebrospinal fluid Nogo-A concentration value decreases 1.12 times. In the model of multiple linear regression analysis predictor variables influencing cerebrospinal fluid Nogo-A concentrations included: diagnosis, sex, and sodium level. The mean cerebrospinal fluid Nogo-A concentration value was 1.9 times higher for women in comparison to men. In the astrocytic brain tumor group higher sodium level occurs with lower cerebrospinal fluid Nogo-A concentrations. We found the opposite situation in non-tumoral individuals. Univariate linear regression analysis revealed, that cerebrospinal fluid Nogo-A concentrations change in relation to white blood cell count. In the created model of multiple linear regression analysis we found, that within predictor variables influencing CSF Nogo-A concentrations were diagnosis, sex, and sodium level. Results may be relevant to the search for cerebrospinal fluid biomarkers and potential therapeutic targets in primary brain tumor patients. Nogo-A concentrations were tested by means of enzyme-linked immunosorbent assay (ELISA).
A rotor optimization using regression analysis
NASA Technical Reports Server (NTRS)
Giansante, N.
1984-01-01
The design and development of helicopter rotors is subject to the many design variables and their interactions that effect rotor operation. Until recently, selection of rotor design variables to achieve specified rotor operational qualities has been a costly, time consuming, repetitive task. For the past several years, Kaman Aerospace Corporation has successfully applied multiple linear regression analysis, coupled with optimization and sensitivity procedures, in the analytical design of rotor systems. It is concluded that approximating equations can be developed rapidly for a multiplicity of objective and constraint functions and optimizations can be performed in a rapid and cost effective manner; the number and/or range of design variables can be increased by expanding the data base and developing approximating functions to reflect the expanded design space; the order of the approximating equations can be expanded easily to improve correlation between analyzer results and the approximating equations; gradients of the approximating equations can be calculated easily and these gradients are smooth functions reducing the risk of numerical problems in the optimization; the use of approximating functions allows the problem to be started easily and rapidly from various initial designs to enhance the probability of finding a global optimum; and the approximating equations are independent of the analysis or optimization codes used.
Thematic Mapper Analysis of Blue Oak (Quercus douglasii) in Central California
Paul A. Lefebvre Jr.; Frank W. Davis; Mark Borchert
1991-01-01
Digital Thematic Mapper (TM) satellite data from September 1986 and December 1985 were analyzed to determine seasonal reflectance properties of blue oak rangeland in the La Panza mountains of San Luis Obispo County. Linear regression analysis was conducted to examine relationships between TM reflectance and oak canopy cover, basal area, and site topographic variables....
Economics of Education and Work Life Demand in Terms of Earnings and Skills
ERIC Educational Resources Information Center
Xia, Belle Selene; Liitiäinen, Elia
2014-01-01
This article uses data from a major international survey to construct earnings functions in terms of learning outcomes and variables related to working life in different European countries. In order to complement the extended earnings regression model, the authors have used partial correlation analysis and the analysis of covariance (ANCOVA) to…
NASA Technical Reports Server (NTRS)
Murphy, M. R.; Awe, C. A.
1986-01-01
Six professionally active, retired captains rated the coordination and decisionmaking performances of sixteen aircrews while viewing videotapes of a simulated commercial air transport operation. The scenario featured a required diversion and a probable minimum fuel situation. Seven point Likert-type scales were used in rating variables on the basis of a model of crew coordination and decisionmaking. The variables were based on concepts of, for example, decision difficulty, efficiency, and outcome quality; and leader-subordin ate concepts such as person and task-oriented leader behavior, and competency motivation of subordinate crewmembers. Five-front-end variables of the model were in turn dependent variables for a hierarchical regression procedure. The variance in safety performance was explained 46%, by decision efficiency, command reversal, and decision quality. The variance of decision quality, an alternative substantive dependent variable to safety performance, was explained 60% by decision efficiency and the captain's quality of within-crew communications. The variance of decision efficiency, crew coordination, and command reversal were in turn explained 78%, 80%, and 60% by small numbers of preceding independent variables. A principle component, varimax factor analysis supported the model structure suggested by regression analyses.
Drivers and potential predictability of summer time North Atlantic polar front jet variability
NASA Astrophysics Data System (ADS)
Hall, Richard J.; Jones, Julie M.; Hanna, Edward; Scaife, Adam A.; Erdélyi, Róbert
2017-06-01
The variability of the North Atlantic polar front jet stream is crucial in determining summer weather around the North Atlantic basin. Recent extreme summers in western Europe and North America have highlighted the need for greater understanding of this variability, in order to aid seasonal forecasting and mitigate societal, environmental and economic impacts. Here we find that simple linear regression and composite models based on a few predictable factors are able to explain up to 35 % of summertime jet stream speed and latitude variability from 1955 onwards. Sea surface temperature forcings impact predominantly on jet speed, whereas solar and cryospheric forcings appear to influence jet latitude. The cryospheric associations come from the previous autumn, suggesting the survival of an ice-induced signal through the winter season, whereas solar influences lead jet variability by a few years. Regression models covering the earlier part of the twentieth century are much less effective, presumably due to decreased availability of data, and increased uncertainty in observational reanalyses. Wavelet coherence analysis identifies that associations fluctuate over the study period but it is not clear whether this is just internal variability or genuine non-stationarity. Finally we identify areas for future research.
Comparison of Survival Models for Analyzing Prognostic Factors in Gastric Cancer Patients
Habibi, Danial; Rafiei, Mohammad; Chehrei, Ali; Shayan, Zahra; Tafaqodi, Soheil
2018-03-27
Objective: There are a number of models for determining risk factors for survival of patients with gastric cancer. This study was conducted to select the model showing the best fit with available data. Methods: Cox regression and parametric models (Exponential, Weibull, Gompertz, Log normal, Log logistic and Generalized Gamma) were utilized in unadjusted and adjusted forms to detect factors influencing mortality of patients. Comparisons were made with Akaike Information Criterion (AIC) by using STATA 13 and R 3.1.3 softwares. Results: The results of this study indicated that all parametric models outperform the Cox regression model. The Log normal, Log logistic and Generalized Gamma provided the best performance in terms of AIC values (179.2, 179.4 and 181.1, respectively). On unadjusted analysis, the results of the Cox regression and parametric models indicated stage, grade, largest diameter of metastatic nest, largest diameter of LM, number of involved lymph nodes and the largest ratio of metastatic nests to lymph nodes, to be variables influencing the survival of patients with gastric cancer. On adjusted analysis, according to the best model (log normal), grade was found as the significant variable. Conclusion: The results suggested that all parametric models outperform the Cox model. The log normal model provides the best fit and is a good substitute for Cox regression. Creative Commons Attribution License
Parameters Estimation of Geographically Weighted Ordinal Logistic Regression (GWOLR) Model
NASA Astrophysics Data System (ADS)
Zuhdi, Shaifudin; Retno Sari Saputro, Dewi; Widyaningsih, Purnami
2017-06-01
A regression model is the representation of relationship between independent variable and dependent variable. The dependent variable has categories used in the logistic regression model to calculate odds on. The logistic regression model for dependent variable has levels in the logistics regression model is ordinal. GWOLR model is an ordinal logistic regression model influenced the geographical location of the observation site. Parameters estimation in the model needed to determine the value of a population based on sample. The purpose of this research is to parameters estimation of GWOLR model using R software. Parameter estimation uses the data amount of dengue fever patients in Semarang City. Observation units used are 144 villages in Semarang City. The results of research get GWOLR model locally for each village and to know probability of number dengue fever patient categories.
NASA Astrophysics Data System (ADS)
Lü, Chengxu; Jiang, Xunpeng; Zhou, Xingfan; Zhang, Yinqiao; Zhang, Naiqian; Wei, Chongfeng; Mao, Wenhua
2017-10-01
Wet gluten is a useful quality indicator for wheat, and short wave near infrared spectroscopy (NIRS) is a high performance technique with the advantage of economic rapid and nondestructive test. To study the feasibility of short wave NIRS analyzing wet gluten directly from wheat seed, 54 representative wheat seed samples were collected and scanned by spectrometer. 8 spectral pretreatment method and genetic algorithm (GA) variable selection method were used to optimize analysis. Both quantitative and qualitative model of wet gluten were built by partial least squares regression and discriminate analysis. For quantitative analysis, normalization is the optimized pretreatment method, 17 wet gluten sensitive variables are selected by GA, and GA model performs a better result than that of all variable model, with R2V=0.88, and RMSEV=1.47. For qualitative analysis, automatic weighted least squares baseline is the optimized pretreatment method, all variable models perform better results than those of GA models. The correct classification rates of 3 class of <24%, 24-30%, >30% wet gluten content are 95.45, 84.52, and 90.00%, respectively. The short wave NIRS technique shows potential for both quantitative and qualitative analysis of wet gluten for wheat seed.
The influence of climate variables on dengue in Singapore.
Pinto, Edna; Coelho, Micheline; Oliver, Leuda; Massad, Eduardo
2011-12-01
In this work we correlated dengue cases with climatic variables for the city of Singapore. This was done through a Poisson Regression Model (PRM) that considers dengue cases as the dependent variable and the climatic variables (rainfall, maximum and minimum temperature and relative humidity) as independent variables. We also used Principal Components Analysis (PCA) to choose the variables that influence in the increase of the number of dengue cases in Singapore, where PC₁ (Principal component 1) is represented by temperature and rainfall and PC₂ (Principal component 2) is represented by relative humidity. We calculated the probability of occurrence of new cases of dengue and the relative risk of occurrence of dengue cases influenced by climatic variable. The months from July to September showed the highest probabilities of the occurrence of new cases of the disease throughout the year. This was based on an analysis of time series of maximum and minimum temperature. An interesting result was that for every 2-10°C of variation of the maximum temperature, there was an average increase of 22.2-184.6% in the number of dengue cases. For the minimum temperature, we observed that for the same variation, there was an average increase of 26.1-230.3% in the number of the dengue cases from April to August. The precipitation and the relative humidity, after analysis of correlation, were discarded in the use of Poisson Regression Model because they did not present good correlation with the dengue cases. Additionally, the relative risk of the occurrence of the cases of the disease under the influence of the variation of temperature was from 1.2-2.8 for maximum temperature and increased from 1.3-3.3 for minimum temperature. Therefore, the variable temperature (maximum and minimum) was the best predictor for the increased number of dengue cases in Singapore.
Suzuki, Hideaki; Tabata, Takahisa; Koizumi, Hiroki; Hohchi, Nobusuke; Takeuchi, Shoko; Kitamura, Takuro; Fujino, Yoshihisa; Ohbuchi, Toyoaki
2014-12-01
This study aimed to create a multiple regression model for predicting hearing outcomes of idiopathic sudden sensorineural hearing loss (ISSNHL). The participants were 205 consecutive patients (205 ears) with ISSNHL (hearing level ≥ 40 dB, interval between onset and treatment ≤ 30 days). They received systemic steroid administration combined with intratympanic steroid injection. Data were examined by simple and multiple regression analyses. Three hearing indices (percentage hearing improvement, hearing gain, and posttreatment hearing level [HLpost]) and 7 prognostic factors (age, days from onset to treatment, initial hearing level, initial hearing level at low frequencies, initial hearing level at high frequencies, presence of vertigo, and contralateral hearing level) were included in the multiple regression analysis as dependent and explanatory variables, respectively. In the simple regression analysis, the percentage hearing improvement, hearing gain, and HLpost showed significant correlation with 2, 5, and 6 of the 7 prognostic factors, respectively. The multiple correlation coefficients were 0.396, 0.503, and 0.714 for the percentage hearing improvement, hearing gain, and HLpost, respectively. Predicted values of HLpost calculated by the multiple regression equation were reliable with 70% probability with a 40-dB-width prediction interval. Prediction of HLpost by the multiple regression model may be useful to estimate the hearing prognosis of ISSNHL. © The Author(s) 2014.
Adjusted variable plots for Cox's proportional hazards regression model.
Hall, C B; Zeger, S L; Bandeen-Roche, K J
1996-01-01
Adjusted variable plots are useful in linear regression for outlier detection and for qualitative evaluation of the fit of a model. In this paper, we extend adjusted variable plots to Cox's proportional hazards model for possibly censored survival data. We propose three different plots: a risk level adjusted variable (RLAV) plot in which each observation in each risk set appears, a subject level adjusted variable (SLAV) plot in which each subject is represented by one point, and an event level adjusted variable (ELAV) plot in which the entire risk set at each failure event is represented by a single point. The latter two plots are derived from the RLAV by combining multiple points. In each point, the regression coefficient and standard error from a Cox proportional hazards regression is obtained by a simple linear regression through the origin fit to the coordinates of the pictured points. The plots are illustrated with a reanalysis of a dataset of 65 patients with multiple myeloma.
Analysis of Sting Balance Calibration Data Using Optimized Regression Models
NASA Technical Reports Server (NTRS)
Ulbrich, Norbert; Bader, Jon B.
2009-01-01
Calibration data of a wind tunnel sting balance was processed using a search algorithm that identifies an optimized regression model for the data analysis. The selected sting balance had two moment gages that were mounted forward and aft of the balance moment center. The difference and the sum of the two gage outputs were fitted in the least squares sense using the normal force and the pitching moment at the balance moment center as independent variables. The regression model search algorithm predicted that the difference of the gage outputs should be modeled using the intercept and the normal force. The sum of the two gage outputs, on the other hand, should be modeled using the intercept, the pitching moment, and the square of the pitching moment. Equations of the deflection of a cantilever beam are used to show that the search algorithm s two recommended math models can also be obtained after performing a rigorous theoretical analysis of the deflection of the sting balance under load. The analysis of the sting balance calibration data set is a rare example of a situation when regression models of balance calibration data can directly be derived from first principles of physics and engineering. In addition, it is interesting to see that the search algorithm recommended the same regression models for the data analysis using only a set of statistical quality metrics.
do Prado, Mara Rúbia Maciel Cardoso; Oliveira, Fabiana de Cássia Carvalho; Assis, Karine Franklin; Ribeiro, Sarah Aparecida Vieira; do Prado Junior, Pedro Paulo; Sant'Ana, Luciana Ferreira da Rocha; Priore, Silvia Eloiza; Franceschini, Sylvia do Carmo Castro
2015-01-01
To assess the prevalence of vitamin D deficiency and its associated factors in women and their newborns in the postpartum period. This cross-sectional study evaluated vitamin D deficiency/insufficiency in 226 women and their newborns in Viçosa (Minas Gerais, BR) between December 2011 and November 2012. Cord blood and venous maternal blood were collected to evaluate the following biochemical parameters: vitamin D, alkaline phosphatase, calcium, phosphorus and parathyroid hormone. Poisson regression analysis, with a confidence interval of 95% was applied to assess vitamin D deficiency and its associated factors. Multiple linear regression analysis was performed to identify factors associated with 25(OH)D deficiency in the newborns and women from the study. The criteria for variable inclusion in the multiple linear regression model was the association with the dependent variable in the simple linear regression analysis, considering p<0.20. Significance level was α<5%. From 226 women included, 200 (88.5%) were 20 to 44 years old; the median age was 28 years. Deficient/insufficient levels of vitamin D were found in 192 (85%) women and in 182 (80.5%) neonates. The maternal 25(OH)D and alkaline phosphatase levels were independently associated with vitamin D deficiency in infants. This study identified a high prevalence of vitamin D deficiency and insufficiency in women and newborns and the association between maternal nutritional status of vitamin D and their infants' vitamin D status. Copyright © 2015 Sociedade de Pediatria de São Paulo. Publicado por Elsevier Editora Ltda. All rights reserved.
NASA Astrophysics Data System (ADS)
Hasan, Haliza; Ahmad, Sanizah; Osman, Balkish Mohd; Sapri, Shamsiah; Othman, Nadirah
2017-08-01
In regression analysis, missing covariate data has been a common problem. Many researchers use ad hoc methods to overcome this problem due to the ease of implementation. However, these methods require assumptions about the data that rarely hold in practice. Model-based methods such as Maximum Likelihood (ML) using the expectation maximization (EM) algorithm and Multiple Imputation (MI) are more promising when dealing with difficulties caused by missing data. Then again, inappropriate methods of missing value imputation can lead to serious bias that severely affects the parameter estimates. The main objective of this study is to provide a better understanding regarding missing data concept that can assist the researcher to select the appropriate missing data imputation methods. A simulation study was performed to assess the effects of different missing data techniques on the performance of a regression model. The covariate data were generated using an underlying multivariate normal distribution and the dependent variable was generated as a combination of explanatory variables. Missing values in covariate were simulated using a mechanism called missing at random (MAR). Four levels of missingness (10%, 20%, 30% and 40%) were imposed. ML and MI techniques available within SAS software were investigated. A linear regression analysis was fitted and the model performance measures; MSE, and R-Squared were obtained. Results of the analysis showed that MI is superior in handling missing data with highest R-Squared and lowest MSE when percent of missingness is less than 30%. Both methods are unable to handle larger than 30% level of missingness.
Prediction equations for maximal respiratory pressures of Brazilian adolescents.
Mendes, Raquel E F; Campos, Tania F; Macêdo, Thalita M F; Borja, Raíssa O; Parreira, Verônica F; Mendonça, Karla M P P
2013-01-01
The literature emphasizes the need for studies to provide reference values and equations able to predict respiratory muscle strength of Brazilian subjects at different ages and from different regions of Brazil. To develop prediction equations for maximal respiratory pressures (MRP) of Brazilian adolescents. In total, 182 healthy adolescents (98 boys and 84 girls) aged between 12 and 18 years, enrolled in public and private schools in the city of Natal-RN, were evaluated using an MVD300 digital manometer (Globalmed®) according to a standardized protocol. Statistical analysis was performed using SPSS Statistics 17.0 software, with a significance level of 5%. Data normality was verified using the Kolmogorov-Smirnov test, and descriptive analysis results were expressed as the mean and standard deviation. To verify the correlation between the MRP and the independent variables (age, weight, height and sex), the Pearson correlation test was used. To obtain the prediction equations, stepwise multiple linear regression was used. The variables height, weight and sex were correlated to MRP. However, weight and sex explained part of the variability of MRP, and the regression analysis in this study indicated that these variables contributed significantly in predicting maximal inspiratory pressure, and only sex contributed significantly to maximal expiratory pressure. This study provides reference values and two models of prediction equations for maximal inspiratory and expiratory pressures and sets the necessary normal lower limits for the assessment of the respiratory muscle strength of Brazilian adolescents.
Variables influencing allocation of capital expenditure in Indonesia
NASA Astrophysics Data System (ADS)
Muda, Iskandar; Naibaho, Revmianson
2018-03-01
The purpose of this study is to examine the factors affecting capital expenditure in Indonesia. The independent variables used are The Effects of Financing Surplus, Total Population and Regional Sizes and the dependent variable used is The Effects of Financing Surplus. This type of research is a causal associative research. The type of data used is secondary data in severals provinces in Indonesia with multiple regression analysis. The results show significantly the determinants of capital expenditure allocation in Indonesia are affected by Financing Surplus, Total Population and Regional Sizes.
Stanworth, R D; Akhtar, S; Channer, K S; Jones, T H
2014-02-01
The TIMES2 (testosterone replacement in hypogonadal men with either metabolic syndrome or type 2 diabetes) study reported beneficial effects of testosterone replacement therapy (TRT) on insulin resistance and other variables in men with diabetes or metabolic syndrome. The androgen receptor CAG repeat polymorphism (AR CAG) is known to affect stimulated AR activity and has been linked to various clinically relevant variables. To assess the role of AR CAG in the alteration of clinical response to TRT in the TIMES2 study. Subgroup analysis from a multicentre, randomised, double-blind, placebo-controlled and parallel group study. Outpatient study recruiting from secondary and primary care. A total of 139 men with hypogonadism and type 2 diabetes or metabolic syndrome, of which 73 received testosterone during the TIMES2 study. Testosterone 2% transdermal gel vs placebo. Regression coefficient of AR CAG from linear regression models for each variable. AR CAG was independently positively associated with change in fasting insulin, triglycerides and diastolic blood pressure during TRT with a trend to association with HOMA-IR - the primary outcome variable. There was a trend to negative association between AR CAG and change in PSA. There was no association of AR CAG with change in other glycaemic variables, other lipid variables or obesity. AR CAG affected the response of some variables to TRT in the TIMES2 study, although the association with HOMA-IR did not reach significance. Various factors may have limited the power of our study to detect the significant associations between AR CAG, testosterone levels and change in variables with testosterone treatment. Analysis of similar data sets from other clinical trials is warranted.
NASA Astrophysics Data System (ADS)
Mulyani, Sri; Andriyana, Yudhie; Sudartianto
2017-03-01
Mean regression is a statistical method to explain the relationship between the response variable and the predictor variable based on the central tendency of the data (mean) of the response variable. The parameter estimation in mean regression (with Ordinary Least Square or OLS) generates a problem if we apply it to the data with a symmetric, fat-tailed, or containing outlier. Hence, an alternative method is necessary to be used to that kind of data, for example quantile regression method. The quantile regression is a robust technique to the outlier. This model can explain the relationship between the response variable and the predictor variable, not only on the central tendency of the data (median) but also on various quantile, in order to obtain complete information about that relationship. In this study, a quantile regression is developed with a nonparametric approach such as smoothing spline. Nonparametric approach is used if the prespecification model is difficult to determine, the relation between two variables follow the unknown function. We will apply that proposed method to poverty data. Here, we want to estimate the Percentage of Poor People as the response variable involving the Human Development Index (HDI) as the predictor variable.
Robust Variable Selection with Exponential Squared Loss.
Wang, Xueqin; Jiang, Yunlu; Huang, Mian; Zhang, Heping
2013-04-01
Robust variable selection procedures through penalized regression have been gaining increased attention in the literature. They can be used to perform variable selection and are expected to yield robust estimates. However, to the best of our knowledge, the robustness of those penalized regression procedures has not been well characterized. In this paper, we propose a class of penalized robust regression estimators based on exponential squared loss. The motivation for this new procedure is that it enables us to characterize its robustness that has not been done for the existing procedures, while its performance is near optimal and superior to some recently developed methods. Specifically, under defined regularity conditions, our estimators are [Formula: see text] and possess the oracle property. Importantly, we show that our estimators can achieve the highest asymptotic breakdown point of 1/2 and that their influence functions are bounded with respect to the outliers in either the response or the covariate domain. We performed simulation studies to compare our proposed method with some recent methods, using the oracle method as the benchmark. We consider common sources of influential points. Our simulation studies reveal that our proposed method performs similarly to the oracle method in terms of the model error and the positive selection rate even in the presence of influential points. In contrast, other existing procedures have a much lower non-causal selection rate. Furthermore, we re-analyze the Boston Housing Price Dataset and the Plasma Beta-Carotene Level Dataset that are commonly used examples for regression diagnostics of influential points. Our analysis unravels the discrepancies of using our robust method versus the other penalized regression method, underscoring the importance of developing and applying robust penalized regression methods.
Robust Variable Selection with Exponential Squared Loss
Wang, Xueqin; Jiang, Yunlu; Huang, Mian; Zhang, Heping
2013-01-01
Robust variable selection procedures through penalized regression have been gaining increased attention in the literature. They can be used to perform variable selection and are expected to yield robust estimates. However, to the best of our knowledge, the robustness of those penalized regression procedures has not been well characterized. In this paper, we propose a class of penalized robust regression estimators based on exponential squared loss. The motivation for this new procedure is that it enables us to characterize its robustness that has not been done for the existing procedures, while its performance is near optimal and superior to some recently developed methods. Specifically, under defined regularity conditions, our estimators are n-consistent and possess the oracle property. Importantly, we show that our estimators can achieve the highest asymptotic breakdown point of 1/2 and that their influence functions are bounded with respect to the outliers in either the response or the covariate domain. We performed simulation studies to compare our proposed method with some recent methods, using the oracle method as the benchmark. We consider common sources of influential points. Our simulation studies reveal that our proposed method performs similarly to the oracle method in terms of the model error and the positive selection rate even in the presence of influential points. In contrast, other existing procedures have a much lower non-causal selection rate. Furthermore, we re-analyze the Boston Housing Price Dataset and the Plasma Beta-Carotene Level Dataset that are commonly used examples for regression diagnostics of influential points. Our analysis unravels the discrepancies of using our robust method versus the other penalized regression method, underscoring the importance of developing and applying robust penalized regression methods. PMID:23913996
NASA Astrophysics Data System (ADS)
Wilson, Barry T.; Knight, Joseph F.; McRoberts, Ronald E.
2018-03-01
Imagery from the Landsat Program has been used frequently as a source of auxiliary data for modeling land cover, as well as a variety of attributes associated with tree cover. With ready access to all scenes in the archive since 2008 due to the USGS Landsat Data Policy, new approaches to deriving such auxiliary data from dense Landsat time series are required. Several methods have previously been developed for use with finer temporal resolution imagery (e.g. AVHRR and MODIS), including image compositing and harmonic regression using Fourier series. The manuscript presents a study, using Minnesota, USA during the years 2009-2013 as the study area and timeframe. The study examined the relative predictive power of land cover models, in particular those related to tree cover, using predictor variables based solely on composite imagery versus those using estimated harmonic regression coefficients. The study used two common non-parametric modeling approaches (i.e. k-nearest neighbors and random forests) for fitting classification and regression models of multiple attributes measured on USFS Forest Inventory and Analysis plots using all available Landsat imagery for the study area and timeframe. The estimated Fourier coefficients developed by harmonic regression of tasseled cap transformation time series data were shown to be correlated with land cover, including tree cover. Regression models using estimated Fourier coefficients as predictor variables showed a two- to threefold increase in explained variance for a small set of continuous response variables, relative to comparable models using monthly image composites. Similarly, the overall accuracies of classification models using the estimated Fourier coefficients were approximately 10-20 percentage points higher than the models using the image composites, with corresponding individual class accuracies between six and 45 percentage points higher.
Kono, Kenichi; Nishida, Yusuke; Moriyama, Yoshihumi; Taoka, Masahiro; Sato, Takashi
2015-06-01
The assessment of nutritional states using fat free mass (FFM) measured with near-infrared spectroscopy (NIRS) is clinically useful. This measurement should incorporate the patient's post-dialysis weight ("dry weight"), in order to exclude the effects of any change in water mass. We therefore used NIRS to investigate the regression, independent variables, and absolute reliability of FFM in dry weight. The study included 47 outpatients from the hemodialysis unit. Body weight was measured before dialysis, and FFM was measured using NIRS before and after dialysis treatment. Multiple regression analysis was used to estimate the FFM in dry weight as the dependent variable. The measured FFM before dialysis treatment (Mw-FFM), and the difference between measured and dry weight (Mw-Dw) were independent variables. We performed Bland-Altman analysis to detect errors between the statistically estimated FFM and the measured FFM after dialysis treatment. The multiple regression equation to estimate the FFM in dry weight was: Dw-FFM = 0.038 + (0.984 × Mw-FFM) + (-0.571 × [Mw-Dw]); R(2) = 0.99). There was no systematic bias between the estimated and the measured values of FFM in dry weight. Using NIRS, FFM in dry weight can be calculated by an equation including FFM in measured weight and the difference between the measured weight and the dry weight. © 2015 The Authors. Therapeutic Apheresis and Dialysis © 2015 International Society for Apheresis.
Intraurban Differences in the Use of Ambulatory Health Services in a Large Brazilian City
Lima-Costa, Maria Fernanda; Proietti, Fernando Augusto; Cesar, Cibele C.; Macinko, James
2010-01-01
A major goal of health systems is to reduce inequities in access to services, that is, to ensure that health care is provided based on health needs rather than social or economic factors. This study aims to identify the determinants of health services utilization among adults in a large Brazilian city and intraurban disparities in health care use. We combine household survey data with census-derived classification of social vulnerability of each household’s census tract. The dependent variable was utilization of physician services in the prior 12 months, and the independent variables included predisposing factors, health needs, enabling factors, and context. Prevalence ratios and 95% confidence intervals were estimated by the Hurdle regression model, which combined Poisson regression analysis of factors associated with any doctor visits (dichotomous variable) and zero-truncated negative binomial regression for the analysis of factors associated with the number of visits among those who had at least one. Results indicate that the use of health services was greater among women and increased with age, and was determined primarily by health needs and whether the individual had a regular doctor, even among those living in areas of the city with the worst socio-environmental indicators. The experience of Belo Horizonte may have implications for other world cities, particularly in the development and use of a comprehensive index to identify populations at risk and in order to guide expansion of primary health care services as a means of enhancing equity in health. PMID:21104332
Teixeira, Juliana Araujo; Baggio, Maria Luiza; Fisberg, Regina Mara; Marchioni, Dirce Maria Lobo
2010-12-01
The objective of this study was to estimate the regressions calibration for the dietary data that were measured using the quantitative food frequency questionnaire (QFFQ) in the Natural History of HPV Infection in Men: the HIM Study in Brazil. A sample of 98 individuals from the HIM study answered one QFFQ and three 24-hour recalls (24HR) at interviews. The calibration was performed using linear regression analysis in which the 24HR was the dependent variable and the QFFQ was the independent variable. Age, body mass index, physical activity, income and schooling were used as adjustment variables in the models. The geometric means between the 24HR and the calibration-corrected QFFQ were statistically equal. The dispersion graphs between the instruments demonstrate increased correlation after making the correction, although there is greater dispersion of the points with worse explanatory power of the models. Identification of the regressions calibration for the dietary data of the HIM study will make it possible to estimate the effect of the diet on HPV infection, corrected for the measurement error of the QFFQ.
NASA Astrophysics Data System (ADS)
Hadley, Brian Christopher
This dissertation assessed remotely sensed data and geospatial modeling technique(s) to map the spatial distribution of total above-ground biomass present on the surface of the Savannah River National Laboratory's (SRNL) Mixed Waste Management Facility (MWMF) hazardous waste landfill. Ordinary least squares (OLS) regression, regression kriging, and tree-structured regression were employed to model the empirical relationship between in-situ measured Bahia (Paspalum notatum Flugge) and Centipede [Eremochloa ophiuroides (Munro) Hack.] grass biomass against an assortment of explanatory variables extracted from fine spatial resolution passive optical and LIDAR remotely sensed data. Explanatory variables included: (1) discrete channels of visible, near-infrared (NIR), and short-wave infrared (SWIR) reflectance, (2) spectral vegetation indices (SVI), (3) spectral mixture analysis (SMA) modeled fractions, (4) narrow-band derivative-based vegetation indices, and (5) LIDAR derived topographic variables (i.e. elevation, slope, and aspect). Results showed that a linear combination of the first- (1DZ_DGVI), second- (2DZ_DGVI), and third-derivative of green vegetation indices (3DZ_DGVI) calculated from hyperspectral data recorded over the 400--960 nm wavelengths of the electromagnetic spectrum explained the largest percentage of statistical variation (R2 = 0.5184) in the total above-ground biomass measurements. In general, the topographic variables did not correlate well with the MWMF biomass data, accounting for less than five percent of the statistical variation. It was concluded that tree-structured regression represented the optimum geospatial modeling technique due to a combination of model performance and efficiency/flexibility factors.
DOE Office of Scientific and Technical Information (OSTI.GOV)
Ozaki, Toshiro, E-mail: ganronbun@amail.plala.or.jp; Seki, Hiroshi; Shiina, Makoto
2009-09-15
The purpose of the present study was to elucidate a method for predicting the intrahepatic arteriovenous shunt rate from computed tomography (CT) images and biochemical data, instead of from arterial perfusion scintigraphy, because adverse exacerbated systemic effects may be induced in cases where a high shunt rate exists. CT and arterial perfusion scintigraphy were performed in patients with liver metastases from gastric or colorectal cancer. Biochemical data and tumor marker levels of 33 enrolled patients were measured. The results were statistically verified by multiple regression analysis. The total metastatic hepatic tumor volume (V{sub metastasized}), residual hepatic parenchyma volume (V{sub residual};more » calculated from CT images), and biochemical data were treated as independent variables; the intrahepatic arteriovenous (IHAV) shunt rate (calculated from scintigraphy) was treated as a dependent variable. The IHAV shunt rate was 15.1 {+-} 11.9%. Based on the correlation matrixes, the best correlation coefficient of 0.84 was established between the IHAV shunt rate and V{sub metastasized} (p < 0.01). In the multiple regression analysis with the IHAV shunt rate as the dependent variable, the coefficient of determination (R{sup 2}) was 0.75, which was significant at the 0.1% level with two significant independent variables (V{sub metastasized} and V{sub residual}). The standardized regression coefficients ({beta}) of V{sub metastasized} and V{sub residual} were significant at the 0.1 and 5% levels, respectively. Based on this result, we can obtain a predicted value of IHAV shunt rate (p < 0.001) using CT images. When a high shunt rate was predicted, beneficial and consistent clinical monitoring can be initiated in, for example, hepatic arterial infusion chemotherapy.« less
Kim, Min-Uk; Moon, Kyong Whan; Sohn, Jong-Ryeul; Byeon, Sang-Hoon
2018-05-18
We studied sensitive weather variables for consequence analysis, in the case of chemical leaks on the user side of offsite consequence analysis (OCA) tools. We used OCA tools Korea Offsite Risk Assessment (KORA) and Areal Location of Hazardous Atmospheres (ALOHA) in South Korea and the United States, respectively. The chemicals used for this analysis were 28% ammonia (NH₃), 35% hydrogen chloride (HCl), 50% hydrofluoric acid (HF), and 69% nitric acid (HNO₃). The accident scenarios were based on leakage accidents in storage tanks. The weather variables were air temperature, wind speed, humidity, and atmospheric stability. Sensitivity analysis was performed using the Statistical Package for the Social Sciences (SPSS) program for dummy regression analysis. Sensitivity analysis showed that impact distance was not sensitive to humidity. Impact distance was most sensitive to atmospheric stability, and was also more sensitive to air temperature than wind speed, according to both the KORA and ALOHA tools. Moreover, the weather variables were more sensitive in rural conditions than in urban conditions, with the ALOHA tool being more influenced by weather variables than the KORA tool. Therefore, if using the ALOHA tool instead of the KORA tool in rural conditions, users should be careful not to cause any differences in impact distance due to input errors of weather variables, with the most sensitive one being atmospheric stability.
Shifts of environmental and phytoplankton variables in a regulated river: A spatial-driven analysis.
Sabater-Liesa, Laia; Ginebreda, Antoni; Barceló, Damià
2018-06-18
The longitudinal structure of the environmental and phytoplankton variables was investigated in the Ebro River (NE Spain), which is heavily affected by water abstraction and regulation. A first exploration indicated that the phytoplankton community did not resist the impact of reservoirs and barely recovered downstream of them. The spatial analysis showed that the responses of the phytoplankton and environmental variables were not uniform. The two set of variables revealed spatial variability discontinuities and river fragmentation upstream and downstream from the reservoirs. Reservoirs caused the replacement of spatially heterogeneous habitats by homogeneous spatially distributed water bodies, these new environmental conditions downstream benefiting the opportunist and cosmopolitan algal taxa. The application of a spatial auto-regression model to algal biomass (chlorophyll-a) permitted to capture the relevance and contribution of extra-local influences in the river ecosystem. Copyright © 2018 The Authors. Published by Elsevier B.V. All rights reserved.
Demand analysis of flood insurance by using logistic regression model and genetic algorithm
NASA Astrophysics Data System (ADS)
Sidi, P.; Mamat, M. B.; Sukono; Supian, S.; Putra, A. S.
2018-03-01
Citarum River floods in the area of South Bandung Indonesia, often resulting damage to some buildings belonging to the people living in the vicinity. One effort to alleviate the risk of building damage is to have flood insurance. The main obstacle is not all people in the Citarum basin decide to buy flood insurance. In this paper, we intend to analyse the decision to buy flood insurance. It is assumed that there are eight variables that influence the decision of purchasing flood assurance, include: income level, education level, house distance with river, building election with road, flood frequency experience, flood prediction, perception on insurance company, and perception towards government effort in handling flood. The analysis was done by using logistic regression model, and to estimate model parameters, it is done with genetic algorithm. The results of the analysis shows that eight variables analysed significantly influence the demand of flood insurance. These results are expected to be considered for insurance companies, to influence the decision of the community to be willing to buy flood insurance.
Classification of Dust Days by Satellite Remotely Sensed Aerosol Products
NASA Technical Reports Server (NTRS)
Sorek-Hammer, M.; Cohen, A.; Levy, Robert C.; Ziv, B.; Broday, D. M.
2013-01-01
Considerable progress in satellite remote sensing (SRS) of dust particles has been seen in the last decade. From an environmental health perspective, such an event detection, after linking it to ground particulate matter (PM) concentrations, can proxy acute exposure to respirable particles of certain properties (i.e. size, composition, and toxicity). Being affected considerably by atmospheric dust, previous studies in the Eastern Mediterranean, and in Israel in particular, have focused on mechanistic and synoptic prediction, classification, and characterization of dust events. In particular, a scheme for identifying dust days (DD) in Israel based on ground PM10 (particulate matter of size smaller than 10 nm) measurements has been suggested, which has been validated by compositional analysis. This scheme requires information regarding ground PM10 levels, which is naturally limited in places with sparse ground-monitoring coverage. In such cases, SRS may be an efficient and cost-effective alternative to ground measurements. This work demonstrates a new model for identifying DD and non-DD (NDD) over Israel based on an integration of aerosol products from different satellite platforms (Moderate Resolution Imaging Spectroradiometer (MODIS) and Ozone Monitoring Instrument (OMI)). Analysis of ground-monitoring data from 2007 to 2008 in southern Israel revealed 67 DD, with more than 88 percent occurring during winter and spring. A Classification and Regression Tree (CART) model that was applied to a database containing ground monitoring (the dependent variable) and SRS aerosol product (the independent variables) records revealed an optimal set of binary variables for the identification of DD. These variables are combinations of the following primary variables: the calendar month, ground-level relative humidity (RH), the aerosol optical depth (AOD) from MODIS, and the aerosol absorbing index (AAI) from OMI. A logistic regression that uses these variables, coded as binary variables, demonstrated 93.2 percent correct classifications of DD and NDD. Evaluation of the combined CART-logistic regression scheme in an adjacent geographical region (Gush Dan) demonstrated good results. Using SRS aerosol products for DD and NDD, identification may enable us to distinguish between health, ecological, and environmental effects that result from exposure to these distinct particle populations.
Estimating the causes of traffic accidents using logistic regression and discriminant analysis.
Karacasu, Murat; Ergül, Barış; Altin Yavuz, Arzu
2014-01-01
Factors that affect traffic accidents have been analysed in various ways. In this study, we use the methods of logistic regression and discriminant analysis to determine the damages due to injury and non-injury accidents in the Eskisehir Province. Data were obtained from the accident reports of the General Directorate of Security in Eskisehir; 2552 traffic accidents between January and December 2009 were investigated regarding whether they resulted in injury. According to the results, the effects of traffic accidents were reflected in the variables. These results provide a wealth of information that may aid future measures toward the prevention of undesired results.
Howard, Elizabeth J; Harville, Emily; Kissinger, Patricia; Xiong, Xu
2013-07-01
There is growing interest in the application of propensity scores (PS) in epidemiologic studies, especially within the field of reproductive epidemiology. This retrospective cohort study assesses the impact of a short interpregnancy interval (IPI) on preterm birth and compares the results of the conventional logistic regression analysis with analyses utilizing a PS. The study included 96,378 singleton infants from Louisiana birth certificate data (1995-2007). Five regression models designed for methods comparison are presented. Ten percent (10.17 %) of all births were preterm; 26.83 % of births were from a short IPI. The PS-adjusted model produced a more conservative estimate of the exposure variable compared to the conventional logistic regression method (β-coefficient: 0.21 vs. 0.43), as well as a smaller standard error (0.024 vs. 0.028), odds ratio and 95 % confidence intervals [1.15 (1.09, 1.20) vs. 1.23 (1.17, 1.30)]. The inclusion of more covariate and interaction terms in the PS did not change the estimates of the exposure variable. This analysis indicates that PS-adjusted regression may be appropriate for validation of conventional methods in a large dataset with a fairly common outcome. PS's may be beneficial in producing more precise estimates, especially for models with many confounders and effect modifiers and where conventional adjustment with logistic regression is unsatisfactory. Short intervals between pregnancies are associated with preterm birth in this population, according to either technique. Birth spacing is an issue that women have some control over. Educational interventions, including birth control, should be applied during prenatal visits and following delivery.
Seasonal and Regional Variability in North Pacific Upper-Ocean Turbulence
NASA Astrophysics Data System (ADS)
Najjar, R.; Creedon, R.; Cronin, M. F.
2016-02-01
Turbulent diffusion at marine mixed layer base (MLB) plays a fundamental role in the transport of energy between the upper and abyssal ocean. Recent investigations of North Pacific mooring data at Ocean Climate Stations (OCS) Papa (50.1N,144.9W) and KEO (32.3N,144.6E) suggest seasonal and regional variability in thermal diffusivity (κT). In this investigation, it is hypothesized that these observed differences in κT are directly associated with synoptic variability in net surface heat flux (Q0), surface wind stress (τ), mixed layer depth (h), and density stratification at MLB (∂zσ|-h). To test this hypothesis, daily-averaged time series of κT are regressed against those of Q0, τ, h, and ∂zσ|-h at both Papa and KEO over a six year time period (2007-2013). Seasonality of each time series is removed before regression to capture synoptic variability of each variable. Preliminary results of the regression analysis suggest statistically significant correlations between κT and all forcing parameters at both mooring sites. These correlations have well-determined orders of magnitude and signs consistent with the hypothesis. As a result, differences in κT between Papa and KEO may be recast in terms of differences in their correlation coefficients. In order to continue investigation of these parameters and their effects on mean seasonal differences between the two regions, these results will be compared with turbulence predicted by the K-Profile Parameterization ocean turbulence model.
Explicit criteria for prioritization of cataract surgery
Ma Quintana, José; Escobar, Antonio; Bilbao, Amaia
2006-01-01
Background Consensus techniques have been used previously to create explicit criteria to prioritize cataract extraction; however, the appropriateness of the intervention was not included explicitly in previous studies. We developed a prioritization tool for cataract extraction according to the RAND method. Methods Criteria were developed using a modified Delphi panel judgment process. A panel of 11 ophthalmologists was assembled. Ratings were analyzed regarding the level of agreement among panelists. We studied the effect of all variables on the final panel score using general linear and logistic regression models. Priority scoring systems were developed by means of optimal scaling and general linear models. The explicit criteria developed were summarized by means of regression tree analysis. Results Eight variables were considered to create the indications. Of the 310 indications that the panel evaluated, 22.6% were considered high priority, 52.3% intermediate priority, and 25.2% low priority. Agreement was reached for 31.9% of the indications and disagreement for 0.3%. Logistic regression and general linear models showed that the preoperative visual acuity of the cataractous eye, visual function, and anticipated visual acuity postoperatively were the most influential variables. Alternative and simple scoring systems were obtained by optimal scaling and general linear models where the previous variables were also the most important. The decision tree also shows the importance of the previous variables and the appropriateness of the intervention. Conclusion Our results showed acceptable validity as an evaluation and management tool for prioritizing cataract extraction. It also provides easy algorithms for use in clinical practice. PMID:16512893
Shi, Yuan; Lau, Kevin Ka-Lun; Ng, Edward
2017-08-01
Urban air quality serves as an important function of the quality of urban life. Land use regression (LUR) modelling of air quality is essential for conducting health impacts assessment but more challenging in mountainous high-density urban scenario due to the complexities of the urban environment. In this study, a total of 21 LUR models are developed for seven kinds of air pollutants (gaseous air pollutants CO, NO 2 , NO x , O 3 , SO 2 and particulate air pollutants PM 2.5 , PM 10 ) with reference to three different time periods (summertime, wintertime and annual average of 5-year long-term hourly monitoring data from local air quality monitoring network) in Hong Kong. Under the mountainous high-density urban scenario, we improved the traditional LUR modelling method by incorporating wind availability information into LUR modelling based on surface geomorphometrical analysis. As a result, 269 independent variables were examined to develop the LUR models by using the "ADDRESS" independent variable selection method and stepwise multiple linear regression (MLR). Cross validation has been performed for each resultant model. The results show that wind-related variables are included in most of the resultant models as statistically significant independent variables. Compared with the traditional method, a maximum increase of 20% was achieved in the prediction performance of annual averaged NO 2 concentration level by incorporating wind-related variables into LUR model development. Copyright © 2017 Elsevier Inc. All rights reserved.
Serum Irisin Predicts Mortality Risk in Acute Heart Failure Patients.
Shen, Shutong; Gao, Rongrong; Bei, Yihua; Li, Jin; Zhang, Haifeng; Zhou, Yanli; Yao, Wenming; Xu, Dongjie; Zhou, Fang; Jin, Mengchao; Wei, Siqi; Wang, Kai; Xu, Xuejuan; Li, Yongqin; Xiao, Junjie; Li, Xinli
2017-01-01
Irisin is a peptide hormone cleaved from a plasma membrane protein fibronectin type III domain containing protein 5 (FNDC5). Emerging studies have indicated association between serum irisin and many major chronic diseases including cardiovascular diseases. However, the role of serum irisin as a predictor for mortality risk in acute heart failure (AHF) patients is not clear. AHF patients were enrolled and serum was collected at the admission and all patients were followed up for 1 year. Enzyme-linked immunosorbent assay was used to measure serum irisin levels. To explore predictors for AHF mortality, the univariate and multivariate logistic regression analysis, and receiver-operator characteristic (ROC) curve analysis were used. To determine the role of serum irisin levels in predicting survival, Kaplan-Meier survival analysis was used. In this study, 161 AHF patients were enrolled and serum irisin level was found to be significantly higher in patients deceased in 1-year follow-up. The univariate logistic regression analysis identified 18 variables associated with all-cause mortality in AHF patients, while the multivariate logistic regression analysis identified 2 variables namely blood urea nitrogen and serum irisin. ROC curve analysis indicated that blood urea nitrogen and the most commonly used biomarker, NT-pro-BNP, displayed poor prognostic value for AHF (AUCs ≤ 0.700) compared to serum irisin (AUC = 0.753). Kaplan-Meier survival analysis demonstrated that AHF patients with higher serum irisin had significantly higher mortality (P<0.001). Collectively, our study identified serum irisin as a predictive biomarker for 1-year all-cause mortality in AHF patients though large multicenter studies are highly needed. © 2017 The Author(s). Published by S. Karger AG, Basel.
Effect of duration of denervation on outcomes of ansa-recurrent laryngeal nerve reinnervation.
Li, Meng; Chen, Shicai; Wang, Wei; Chen, Donghui; Zhu, Minhui; Liu, Fei; Zhang, Caiyun; Li, Yan; Zheng, Hongliang
2014-08-01
To investigate the efficacy of laryngeal reinnervation with ansa cervicalis among unilateral vocal fold paralysis (UVFP) patients with different denervation durations. We retrospectively reviewed 349 consecutive UVFP cases of delayed ansa cervicalis to the recurrent laryngeal nerve (RLN) anastomosis. Potential influencing factors were analyzed in multivariable logistic regression analysis. Stratification analysis performed was aimed at one of the identified significant variables: denervation duration. Videostroboscopy, perceptual evaluation, acoustic analysis, maximum phonation time (MPT), and laryngeal electromyography (EMG) were performed preoperatively and postoperatively. Gender, age, preoperative EMG status and denervation duration were analyzed in multivariable logistic regression analysis. Stratification analysis was performed on denervation duration, which was divided into three groups according to the interval between RLN injury and reinnervation: group A, 6 to 12 months; group B, 12 to 24 months; and group C, > 24 months. Age, preoperative EMG, and denervation duration were identified as significant variables in multivariable logistic regression analysis. Stratification analysis on denervation duration showed significant differences between group A and C and between group B and C (P < 0.05)-but showed no significant difference between group A and B (P > 0.05) with regard to parameters overall grade, jitter, shimmer, noise-to-harmonics ratio, MPT, and postoperative EMG. In addition, videostroboscopic and laryngeal EMG data, perceptual and acoustic parameters, and MPT values were significantly improved postoperatively in each denervation duration group (P < 0.01). Although delayed laryngeal reinnervation is proved valid for UVFP, surgical outcome is better if the procedure is performed within 2 years after nerve injury than that over 2 years. © 2014 The American Laryngological, Rhinological and Otological Society, Inc.
Comparative analysis on the probability of being a good payer
NASA Astrophysics Data System (ADS)
Mihova, V.; Pavlov, V.
2017-10-01
Credit risk assessment is crucial for the bank industry. The current practice uses various approaches for the calculation of credit risk. The core of these approaches is the use of multiple regression models, applied in order to assess the risk associated with the approval of people applying for certain products (loans, credit cards, etc.). Based on data from the past, these models try to predict what will happen in the future. Different data requires different type of models. This work studies the causal link between the conduct of an applicant upon payment of the loan and the data that he completed at the time of application. A database of 100 borrowers from a commercial bank is used for the purposes of the study. The available data includes information from the time of application and credit history while paying off the loan. Customers are divided into two groups, based on the credit history: Good and Bad payers. Linear and logistic regression are applied in parallel to the data in order to estimate the probability of being good for new borrowers. A variable, which contains value of 1 for Good borrowers and value of 0 for Bad candidates, is modeled as a dependent variable. To decide which of the variables listed in the database should be used in the modelling process (as independent variables), a correlation analysis is made. Due to the results of it, several combinations of independent variables are tested as initial models - both with linear and logistic regression. The best linear and logistic models are obtained after initial transformation of the data and following a set of standard and robust statistical criteria. A comparative analysis between the two final models is made and scorecards are obtained from both models to assess new customers at the time of application. A cut-off level of points, bellow which to reject the applications and above it - to accept them, has been suggested for both the models, applying the strategy to keep the same Accept Rate as in the current data.
Use of streamflow data to estimate base flowground-water recharge for Wisconsin
Gebert, W.A.; Radloff, M.J.; Considine, E.J.; Kennedy, J.L.
2007-01-01
The average annual base flow/recharge was determined for streamflow-gaging stations throughout Wisconsin by base-flow separation. A map of the State was prepared that shows the average annual base flow for the period 1970-99 for watersheds at 118 gaging stations. Trend analysis was performed on 22 of the 118 streamflow-gaging stations that had long-term records, unregulated flow, and provided aerial coverage of the State. The analysis found that a statistically significant increasing trend was occurring for watersheds where the primary land use was agriculture. Most gaging stations where the land cover was forest had no significant trend. A method to estimate the average annual base flow at ungaged sites was developed by multiple-regression analysis using basin characteristics. The equation with the lowest standard error of estimate, 9.5%, has drainage area, soil infiltration and base flow factor as independent variables. To determine the average annual base flow for smaller watersheds, estimates were made at low-flow partial-record stations in 3 of the 12 major river basins in Wisconsin. Regression equations were developed for each of the three major river basins using basin characteristics. Drainage area, soil infiltration, basin storage and base-flow factor were the independent variables in the regression equations with the lowest standard error of estimate. The standard error of estimate ranged from 17% to 52% for the three river basins. ?? 2007 American Water Resources Association.
77 FR 3121 - Program Integrity: Gainful Employment-Debt Measures; Correction
Federal Register 2010, 2011, 2012, 2013, 2014
2012-01-23
...On June 13, 2011, the Secretary of Education (Secretary) published a notice of final regulations in the Federal Register for Program Integrity: Gainful Employment--Debt Measures (Gainful Employment--Debt Measures) (76 FR 34386). In the preamble of the final regulations, we used the wrong data to calculate the percent of total variance in institutions' repayment rates that may be explained by race/ethnicity. Our intent was to use the data that included all minority students per institution. However, we mistakenly used the data for a subset of minority students per institution. We have now recalculated the total variance using the data that includes all minority students. Through this document, we correct, in the preamble of the Gainful Employment--Debt Measures final regulations, the errors resulting from this misapplication. We do not change the regression analysis model itself; we are using the same model with the appropriate data. Through this notice we also correct, in the preamble of the Gainful Employment--Debt Measures final regulations, our description of one component of the regression analysis. The preamble referred to use of an institutional variable measuring acceptance rates. This description was incorrect; in fact we used an institutional variable measuring retention rates. Correcting this language does not change the regression analysis model itself or the variance explained by the model. The text of the final regulations remains unchanged.
NASA Astrophysics Data System (ADS)
Wellen, Christopher; Arhonditsis, George B.; Labencki, Tanya; Boyd, Duncan
2012-10-01
Regression-type, hybrid empirical/process-based models (e.g., SPARROW, PolFlow) have assumed a prominent role in efforts to estimate the sources and transport of nutrient pollution at river basin scales. However, almost no attempts have been made to explicitly accommodate interannual nutrient loading variability in their structure, despite empirical and theoretical evidence indicating that the associated source/sink processes are quite variable at annual timescales. In this study, we present two methodological approaches to accommodate interannual variability with the Spatially Referenced Regressions on Watershed attributes (SPARROW) nonlinear regression model. The first strategy uses the SPARROW model to estimate a static baseline load and climatic variables (e.g., precipitation) to drive the interannual variability. The second approach allows the source/sink processes within the SPARROW model to vary at annual timescales using dynamic parameter estimation techniques akin to those used in dynamic linear models. Model parameterization is founded upon Bayesian inference techniques that explicitly consider calibration data and model uncertainty. Our case study is the Hamilton Harbor watershed, a mixed agricultural and urban residential area located at the western end of Lake Ontario, Canada. Our analysis suggests that dynamic parameter estimation is the more parsimonious of the two strategies tested and can offer insights into the temporal structural changes associated with watershed functioning. Consistent with empirical and theoretical work, model estimated annual in-stream attenuation rates varied inversely with annual discharge. Estimated phosphorus source areas were concentrated near the receiving water body during years of high in-stream attenuation and dispersed along the main stems of the streams during years of low attenuation, suggesting that nutrient source areas are subject to interannual variability.
ERIC Educational Resources Information Center
Woolley, Kristin K.
Many researchers are unfamiliar with suppressor variables and how they operate in multiple regression analyses. This paper describes the role suppressor variables play in a multiple regression model and provides practical examples that explain how they can change research results. A variable that when added as another predictor increases the total…
Curran, Janet H.; Barth, Nancy A.; Veilleux, Andrea G.; Ourso, Robert T.
2016-03-16
Estimates of the magnitude and frequency of floods are needed across Alaska for engineering design of transportation and water-conveyance structures, flood-insurance studies, flood-plain management, and other water-resource purposes. This report updates methods for estimating flood magnitude and frequency in Alaska and conterminous basins in Canada. Annual peak-flow data through water year 2012 were compiled from 387 streamgages on unregulated streams with at least 10 years of record. Flood-frequency estimates were computed for each streamgage using the Expected Moments Algorithm to fit a Pearson Type III distribution to the logarithms of annual peak flows. A multiple Grubbs-Beck test was used to identify potentially influential low floods in the time series of peak flows for censoring in the flood frequency analysis.For two new regional skew areas, flood-frequency estimates using station skew were computed for stations with at least 25 years of record for use in a Bayesian least-squares regression analysis to determine a regional skew value. The consideration of basin characteristics as explanatory variables for regional skew resulted in improvements in precision too small to warrant the additional model complexity, and a constant model was adopted. Regional Skew Area 1 in eastern-central Alaska had a regional skew of 0.54 and an average variance of prediction of 0.45, corresponding to an effective record length of 22 years. Regional Skew Area 2, encompassing coastal areas bordering the Gulf of Alaska, had a regional skew of 0.18 and an average variance of prediction of 0.12, corresponding to an effective record length of 59 years. Station flood-frequency estimates for study sites in regional skew areas were then recomputed using a weighted skew incorporating the station skew and regional skew. In a new regional skew exclusion area outside the regional skew areas, the density of long-record streamgages was too sparse for regional analysis and station skew was used for all estimates. Final station flood frequency estimates for all study streamgages are presented for the 50-, 20-, 10-, 4-, 2-, 1-, 0.5-, and 0.2-percent annual exceedance probabilities.Regional multiple-regression analysis was used to produce equations for estimating flood frequency statistics from explanatory basin characteristics. Basin characteristics, including physical and climatic variables, were updated for all study streamgages using a geographical information system and geospatial source data. Screening for similar-sized nested basins eliminated hydrologically redundant sites, and screening for eligibility for analysis of explanatory variables eliminated regulated peaks, outburst peaks, and sites with indeterminate basin characteristics. An ordinary least‑squares regression used flood-frequency statistics and basin characteristics for 341 streamgages (284 in Alaska and 57 in Canada) to determine the most suitable combination of basin characteristics for a flood-frequency regression model and to explore regional grouping of streamgages for explaining variability in flood-frequency statistics across the study area. The most suitable model for explaining flood frequency used drainage area and mean annual precipitation as explanatory variables for the entire study area as a region. Final regression equations for estimating the 50-, 20-, 10-, 4-, 2-, 1-, 0.5-, and 0.2-percent annual exceedance probability discharge in Alaska and conterminous basins in Canada were developed using a generalized least-squares regression. The average standard error of prediction for the regression equations for the various annual exceedance probabilities ranged from 69 to 82 percent, and the pseudo-coefficient of determination (pseudo-R2) ranged from 85 to 91 percent.The regional regression equations from this study were incorporated into the U.S. Geological Survey StreamStats program for a limited area of the State—the Cook Inlet Basin. StreamStats is a national web-based geographic information system application that facilitates retrieval of streamflow statistics and associated information. StreamStats retrieves published data for gaged sites and, for user-selected ungaged sites, delineates drainage areas from topographic and hydrographic data, computes basin characteristics, and computes flood frequency estimates using the regional regression equations.
Rahman, Abdul; Perri, Andrea; Deegan, Avril; Kuntz, Jennifer; Cawthorpe, David
2018-01-01
Context There is a movement toward trauma-informed, trauma-focused psychiatric treatment. Objective To examine Adverse Childhood Experiences (ACE) survey items by sex and by total scores by sex vs clinical measures of impairment to examine the clinical utility of the ACE survey as an index of trauma in a child and adolescent mental health care setting. Design Descriptive, polychoric factor analysis and regression analyses were employed to analyze cross-sectional ACE surveys (N = 2833) and registration-linked data using past admissions (N = 10,400) collected from November 2016 to March 2017 related to clinical data (28 independent variables), taking into account multicollinearity. Results Distinct ACE items emerged for males, females, and those with self-identified sex and for ACE total scores in regression analysis. In hierarchical regression analysis, the final models consisting of standard clinical measures and demographic and system variables (eg, repeated admissions) were associated with substantial ACE total score variance for females (44%) and males (38%). Inadequate sample size foreclosed on developing a reduced multivariable model for the self-identified sex group. Conclusion The ACE scores relate to independent clinical measures and system and demographic variables. There are implications for clinical practice. For example, a child presenting with anxiety and a high ACE score likely requires treatment that is different from a child presenting with anxiety and an ACE score of zero. The ACE survey score is an important index of presenting clinical status that guides patient care planning and intervention in the progress toward a trauma-focused system of care. PMID:29401055
A Meta-analysis on Resting State High-frequency Heart Rate Variability in Bulimia Nervosa.
Peschel, Stephanie K V; Feeling, Nicole R; Vögele, Claus; Kaess, Michael; Thayer, Julian F; Koenig, Julian
2016-09-01
Autonomic nervous system function is altered in eating disorders. We aimed to quantify differences in resting state vagal activity, indexed by high-frequency heart rate variability comparing patients with bulimia nervosa (BN) and healthy controls. A systematic search of the literature to identify studies eligible for inclusion and meta-analytical methods were applied. Meta-regression was used to identify potential covariates. Eight studies reporting measures of resting high-frequency heart rate variability in individuals with BN (n = 137) and controls (n = 190) were included. Random-effects meta-analysis revealed a sizeable main effect (Z = 2.22, p = .03; Hedge's g = 0.52, 95% CI [0.06;0.98]) indicating higher resting state vagal activity in individuals with BN. Meta-regression showed that body mass index and medication intake are significant covariates. Findings suggest higher vagal activity in BN at rest, particularly in unmedicated samples with lower body mass index. Potential mechanisms underlying these findings and implications for routine clinical care are discussed. Copyright © 2016 John Wiley & Sons, Ltd and Eating Disorders Association. Copyright © 2016 John Wiley & Sons, Ltd and Eating Disorders Association.
Factors associated with vocal fold pathologies in teachers.
Souza, Carla Lima de; Carvalho, Fernando Martins; Araújo, Tânia Maria de; Reis, Eduardo José Farias Borges Dos; Lima, Verônica Maria Cadena; Porto, Lauro Antonio
2011-10-01
To analyze factors associated with the prevalence of the medical diagnosis of vocal fold pathologies in teachers. A census-based epidemiological, cross-sectional study was conducted with 4,495 public primary and secondary school teachers in the city of Salvador, Northeastern Brazil, between March and April 2006. The dependent variable was the self-reported medical diagnosis of vocal fold pathologies and the independent variables were sociodemographic characteristics; professional activity; work organization/interpersonal relationships; physical work environment characteristics; frequency of common mental disorders, measured by the Self-Reporting Questionnaire-20 (SRQ-20 >7); and general health conditions. Descriptive statistical, bivariate and multiple logistic regression analysis techniques were used. The prevalence of self-reported medical diagnosis of vocal fold pathologies was 18.9%. In the logistic regression analysis, the variables that remained associated with this medical diagnosis were as follows: being female, having worked as a teacher for more than seven years, excessive voice use, reporting more than five unfavorable physical work environment characteristics and presence of common mental disorders. The presence of self-reported vocal fold pathologies was associated with factors that point out the need of actions that promote teachers' vocal health and changes in their work structure and organization.
Zhou, Yan; Wang, Pei; Wang, Xianlong; Zhu, Ji; Song, Peter X-K
2017-01-01
The multivariate regression model is a useful tool to explore complex associations between two kinds of molecular markers, which enables the understanding of the biological pathways underlying disease etiology. For a set of correlated response variables, accounting for such dependency can increase statistical power. Motivated by integrative genomic data analyses, we propose a new methodology-sparse multivariate factor analysis regression model (smFARM), in which correlations of response variables are assumed to follow a factor analysis model with latent factors. This proposed method not only allows us to address the challenge that the number of association parameters is larger than the sample size, but also to adjust for unobserved genetic and/or nongenetic factors that potentially conceal the underlying response-predictor associations. The proposed smFARM is implemented by the EM algorithm and the blockwise coordinate descent algorithm. The proposed methodology is evaluated and compared to the existing methods through extensive simulation studies. Our results show that accounting for latent factors through the proposed smFARM can improve sensitivity of signal detection and accuracy of sparse association map estimation. We illustrate smFARM by two integrative genomics analysis examples, a breast cancer dataset, and an ovarian cancer dataset, to assess the relationship between DNA copy numbers and gene expression arrays to understand genetic regulatory patterns relevant to the disease. We identify two trans-hub regions: one in cytoband 17q12 whose amplification influences the RNA expression levels of important breast cancer genes, and the other in cytoband 9q21.32-33, which is associated with chemoresistance in ovarian cancer. © 2016 WILEY PERIODICALS, INC.
Frank W. Davis; Mark Borchert; L. E. Harvey; Joel C. Michaelsen
1991-01-01
Blue oak seedling mortality was studied in relation to vertebrate predators, initial acorn planting position, slope and aspect, and oak canopy cover at two sites in the Central Coast Ranges of California. Seedling survival rates (Psd) were related to treatment variables using logistic regression analysis. Analysis of 2842 seedlings for 3 years following establishment...
ERIC Educational Resources Information Center
Fernandez-Sainz, A.; García-Merino, J. D.; Urionabarrenetxea, S.
2016-01-01
This paper seeks to discover whether the performance of university students has improved in the wake of the changes in higher education introduced by the Bologna Declaration of 1999 and the construction of the European Higher Education Area. A principal component analysis is used to construct a multi-dimensional performance variable called the…
Women's perceptions of their male batterers' characteristics and level of violence.
Torres, Sara; Han, Hae-Ra
2003-01-01
This article describes the characteristics of male perpetrators of domestic violence and their relationship to the level of violence. The data about the male partners obtained from 151 battered women were used for this analysis. Using multiple regression, demographic variables and three behavioral indicators, including use of alcohol before a violent episode, history of arrests, and the generality of violence, were examined together for their relationship with the violence scores. With the level of violence as measured by the Conflict Tactics Scale (CTS) as the dependent variable, demographic variables explained 19.1% of the variability, with the behavioral indicators accounting for an additional 4.6% of the variability. Several research and clinical implications are addressed.
NASA Astrophysics Data System (ADS)
Chen, Hui; Tan, Chao; Lin, Zan; Wu, Tong
2018-01-01
Milk is among the most popular nutrient source worldwide, which is of great interest due to its beneficial medicinal properties. The feasibility of the classification of milk powder samples with respect to their brands and the determination of protein concentration is investigated by NIR spectroscopy along with chemometrics. Two datasets were prepared for experiment. One contains 179 samples of four brands for classification and the other contains 30 samples for quantitative analysis. Principal component analysis (PCA) was used for exploratory analysis. Based on an effective model-independent variable selection method, i.e., minimal-redundancy maximal-relevance (MRMR), only 18 variables were selected to construct a partial least-square discriminant analysis (PLS-DA) model. On the test set, the PLS-DA model based on the selected variable set was compared with the full-spectrum PLS-DA model, both of which achieved 100% accuracy. In quantitative analysis, the partial least-square regression (PLSR) model constructed by the selected subset of 260 variables outperforms significantly the full-spectrum model. It seems that the combination of NIR spectroscopy, MRMR and PLS-DA or PLSR is a powerful tool for classifying different brands of milk and determining the protein content.
REJEKI, Dwi Sarwani Sri; NURHAYATI, Nunung; AJI, Budi; MURHANDARWATI, E. Elsa Herdiana; KUSNANTO, Hari
2018-01-01
Background: Climatic and weather factors become important determinants of vector-borne diseases transmission like malaria. This study aimed to prove relationships between weather factors with considering human migration and previous case findings and malaria cases in endemic areas in Purworejo during 2005–2014. Methods: This study employed ecological time series analysis by using monthly data. The independent variables were the maximum temperature, minimum temperature, maximum humidity, minimum humidity, precipitation, human migration, and previous malaria cases, while the dependent variable was positive malaria cases. Three models of count data regression analysis i.e. Poisson model, quasi-Poisson model, and negative binomial model were applied to measure the relationship. The least Akaike Information Criteria (AIC) value was also performed to find the best model. Negative binomial regression analysis was considered as the best model. Results: The model showed that humidity (lag 2), precipitation (lag 3), precipitation (lag 12), migration (lag1) and previous malaria cases (lag 12) had a significant relationship with malaria cases. Conclusion: Weather, migration and previous malaria cases factors need to be considered as prominent indicators for the increase of malaria case projection. PMID:29900134
Wiley, J.B.; Atkins, John T.; Tasker, Gary D.
2000-01-01
Multiple and simple least-squares regression models for the log10-transformed 100-year discharge with independent variables describing the basin characteristics (log10-transformed and untransformed) for 267 streamflow-gaging stations were evaluated, and the regression residuals were plotted as areal distributions that defined three regions of the State, designated East, North, and South. Exploratory data analysis procedures identified 31 gaging stations at which discharges are different than would be expected for West Virginia. Regional equations for the 2-, 5-, 10-, 25-, 50-, 100-, 200-, and 500-year peak discharges were determined by generalized least-squares regression using data from 236 gaging stations. Log10-transformed drainage area was the most significant independent variable for all regions.Equations developed in this study are applicable only to rural, unregulated, streams within the boundaries of West Virginia. The accuracy of estimating equations is quantified by measuring the average prediction error (from 27.7 to 44.7 percent) and equivalent years of record (from 1.6 to 20.0 years).
Ordinal logistic regression analysis on the nutritional status of children in KarangKitri village
NASA Astrophysics Data System (ADS)
Ohyver, Margaretha; Yongharto, Kimmy Octavian
2015-09-01
Ordinal logistic regression is a statistical technique that can be used to describe the relationship between ordinal response variable with one or more independent variables. This method has been used in various fields including in the health field. In this research, ordinal logistic regression is used to describe the relationship between nutritional status of children with age, gender, height, and family status. Nutritional status of children in this research is divided into over nutrition, well nutrition, less nutrition, and malnutrition. The purpose for this research is to describe the characteristics of children in the KarangKitri Village and to determine the factors that influence the nutritional status of children in the KarangKitri village. There are three things that obtained from this research. First, there are still children who are not categorized as well nutritional status. Second, there are children who come from sufficient economic level which include in not normal status. Third, the factors that affect the nutritional level of children are age, family status, and height.
Age Estimation of Infants Through Metric Analysis of Developing Anterior Deciduous Teeth.
Viciano, Joan; De Luca, Stefano; Irurita, Javier; Alemán, Inmaculada
2018-01-01
This study provides regression equations for estimation of age of infants from the dimensions of their developing deciduous teeth. The sample comprises 97 individuals of known sex and age (62 boys, 35 girls), aged between 2 days and 1,081 days. The age-estimation equations were obtained for the sexes combined, as well as for each sex separately, thus including "sex" as an independent variable. The values of the correlations and determination coefficients obtained for each regression equation indicate good fits for most of the equations obtained. The "sex" factor was statistically significant when included as an independent variable in seven of the regression equations. However, the "sex" factor provided an advantage for age estimation in only three of the equations, compared to those that did not include "sex" as a factor. These data suggest that the ages of infants can be accurately estimated from measurements of their developing deciduous teeth. © 2017 American Academy of Forensic Sciences.
NASA Astrophysics Data System (ADS)
Wibowo, Wahyu; Wene, Chatrien; Budiantara, I. Nyoman; Permatasari, Erma Oktania
2017-03-01
Multiresponse semiparametric regression is simultaneous equation regression model and fusion of parametric and nonparametric model. The regression model comprise several models and each model has two components, parametric and nonparametric. The used model has linear function as parametric and polynomial truncated spline as nonparametric component. The model can handle both linearity and nonlinearity relationship between response and the sets of predictor variables. The aim of this paper is to demonstrate the application of the regression model for modeling of effect of regional socio-economic on use of information technology. More specific, the response variables are percentage of households has access to internet and percentage of households has personal computer. Then, predictor variables are percentage of literacy people, percentage of electrification and percentage of economic growth. Based on identification of the relationship between response and predictor variable, economic growth is treated as nonparametric predictor and the others are parametric predictors. The result shows that the multiresponse semiparametric regression can be applied well as indicate by the high coefficient determination, 90 percent.
MULTIVARIATE ANALYSIS OF DRINKING BEHAVIOUR IN A RURAL POPULATION
Mathrubootham, N.; Bashyam, V.S.P.; Shahjahan
1997-01-01
This study was carried out to find out the drinking pattern in a rural population, using multivariate techniques. 386 current users identified in a community were assessed with regard to their drinking behaviours using a structured interview. For purposes of the study the questions were condensed into 46 meaningful variables. In bivariate analysis, 14 variables including dependent variables such as dependence, MAST & CAGE (measuring alcoholic status), Q.F. Index and troubled drinking were found to be significant. Taking these variables and other multivariate techniques too such as ANOVA, correlation, regression analysis and factor analysis were done using both SPSS PC + and HCL magnum mainframe computer with FOCUS package and UNIX systems. Results revealed that number of factors such as drinking style, duration of drinking, pattern of abuse, Q.F. Index and various problems influenced drinking and some of them set up a vicious circle. Factor analysis revealed mainly 3 factors, abuse, dependence and social drinking factors. Dependence could be divided into low/moderate dependence. The implications and practical applications of these tests are also discussed. PMID:21584077
The 2011 heat wave in Greater Houston: Effects of land use on temperature.
Zhou, Weihe; Ji, Shuang; Chen, Tsun-Hsuan; Hou, Yi; Zhang, Kai
2014-11-01
Effects of land use on temperatures during severe heat waves have been rarely studied. This paper examines land use-temperature associations during the 2011 heat wave in Greater Houston. We obtained high resolution of satellite-derived land use data from the US National Land Cover Database, and temperature observations at 138 weather stations from Weather Underground, Inc (WU) during the August of 2011, which was the hottest month in Houston since 1889. Land use regression and quantile regression methods were applied to the monthly averages of daily maximum/mean/minimum temperatures and 114 land use-related predictors. Although selected variables vary with temperature metric, distance to the coastline consistently appears among all models. Other variables are generally related to high developed intensity, open water or wetlands. In addition, our quantile regression analysis shows that distance to the coastline and high developed intensity areas have larger impacts on daily average temperatures at higher quantiles, and open water area has greater impacts on daily minimum temperatures at lower quantiles. By utilizing both land use regression and quantile regression on a recent heat wave in one of the largest US metropolitan areas, this paper provides a new perspective on the impacts of land use on temperatures. Our models can provide estimates of heat exposures for epidemiological studies, and our findings can be combined with demographic variables, air conditioning and relevant diseases information to identify 'hot spots' of population vulnerability for public health interventions to reduce heat-related health effects during heat waves. Copyright © 2014 Elsevier Inc. All rights reserved.
Karabatsos, George
2017-02-01
Most of applied statistics involves regression analysis of data. In practice, it is important to specify a regression model that has minimal assumptions which are not violated by data, to ensure that statistical inferences from the model are informative and not misleading. This paper presents a stand-alone and menu-driven software package, Bayesian Regression: Nonparametric and Parametric Models, constructed from MATLAB Compiler. Currently, this package gives the user a choice from 83 Bayesian models for data analysis. They include 47 Bayesian nonparametric (BNP) infinite-mixture regression models; 5 BNP infinite-mixture models for density estimation; and 31 normal random effects models (HLMs), including normal linear models. Each of the 78 regression models handles either a continuous, binary, or ordinal dependent variable, and can handle multi-level (grouped) data. All 83 Bayesian models can handle the analysis of weighted observations (e.g., for meta-analysis), and the analysis of left-censored, right-censored, and/or interval-censored data. Each BNP infinite-mixture model has a mixture distribution assigned one of various BNP prior distributions, including priors defined by either the Dirichlet process, Pitman-Yor process (including the normalized stable process), beta (two-parameter) process, normalized inverse-Gaussian process, geometric weights prior, dependent Dirichlet process, or the dependent infinite-probits prior. The software user can mouse-click to select a Bayesian model and perform data analysis via Markov chain Monte Carlo (MCMC) sampling. After the sampling completes, the software automatically opens text output that reports MCMC-based estimates of the model's posterior distribution and model predictive fit to the data. Additional text and/or graphical output can be generated by mouse-clicking other menu options. This includes output of MCMC convergence analyses, and estimates of the model's posterior predictive distribution, for selected functionals and values of covariates. The software is illustrated through the BNP regression analysis of real data.
DOE Office of Scientific and Technical Information (OSTI.GOV)
Kim, Yangho; Lee, Byung-Kook, E-mail: bklee@sch.ac.kr
Introduction: The objective of this study was to evaluate associations between blood lead, cadmium, and mercury levels with estimated glomerular filtration rate in a general population of South Korean adults. Methods: This was a cross-sectional study based on data obtained in the Korean National Health and Nutrition Examination Survey (KNHANES) (2008-2010). The final analytical sample consisted of 5924 participants. Estimated glomerular filtration rate (eGFR) was calculated using the MDRD Study equation as an indicator of glomerular function. Results: In multiple linear regression analysis of log2-transformed blood lead as a continuous variable on eGFR, after adjusting for covariates including cadmium andmore » mercury, the difference in eGFR levels associated with doubling of blood lead were -2.624 mL/min per 1.73 m Superscript-Two (95% CI: -3.803 to -1.445). In multiple linear regression analysis using quartiles of blood lead as the independent variable, the difference in eGFR levels comparing participants in the highest versus the lowest quartiles of blood lead was -3.835 mL/min per 1.73 m Superscript-Two (95% CI: -5.730 to -1.939). In a multiple linear regression analysis using blood cadmium and mercury, as continuous or categorical variables, as independent variables, neither metal was a significant predictor of eGFR. Odds ratios (ORs) and 95% CI values for reduced eGFR calculated for log2-transformed blood metals and quartiles of the three metals showed similar trends after adjustment for covariates. Discussion: In this large, representative sample of South Korean adults, elevated blood lead level was consistently associated with lower eGFR levels and with the prevalence of reduced eGFR even in blood lead levels below 10 {mu}g/dL. In conclusion, elevated blood lead level was associated with lower eGFR in a Korean general population, supporting the role of lead as a risk factor for chronic kidney disease.« less
Ling, Ru; Liu, Jiawang
2011-12-01
To construct prediction model for health workforce and hospital beds in county hospitals of Hunan by multiple linear regression. We surveyed 16 counties in Hunan with stratified random sampling according to uniform questionnaires,and multiple linear regression analysis with 20 quotas selected by literature view was done. Independent variables in the multiple linear regression model on medical personnels in county hospitals included the counties' urban residents' income, crude death rate, medical beds, business occupancy, professional equipment value, the number of devices valued above 10 000 yuan, fixed assets, long-term debt, medical income, medical expenses, outpatient and emergency visits, hospital visits, actual available bed days, and utilization rate of hospital beds. Independent variables in the multiple linear regression model on county hospital beds included the the population of aged 65 and above in the counties, disposable income of urban residents, medical personnel of medical institutions in county area, business occupancy, the total value of professional equipment, fixed assets, long-term debt, medical income, medical expenses, outpatient and emergency visits, hospital visits, actual available bed days, utilization rate of hospital beds, and length of hospitalization. The prediction model shows good explanatory and fitting, and may be used for short- and mid-term forecasting.
Regression and Sentinel Lymph Node Status in Melanoma Progression
Letca, Alina Florentina; Ungureanu, Loredana; Şenilă, Simona Corina; Grigore, Lavinia Elena; Pop, Ştefan; Fechete, Oana; Vesa, Ştefan Cristian
2018-01-01
Background The purpose of this study was to assess the role of regression and other clinical and histological features for the prognosis and the progression of cutaneous melanoma. Material/Methods Between 2005 and 2016, 403 patients with melanoma were treated and followed at our Department of Dermatology. Of the 403 patients, 173 patients had cutaneous melanoma and underwent sentinel lymph node (SLN) biopsy and thus were included in this study. Results Histological regression was found in 37 cases of melanoma (21.3%). It was significantly associated with marked and moderate tumor-infiltrating lymphocyte (TIL) and with negative SLN. Progression of the disease occurred in 42 patients (24.2%). On multivariate analysis, we found that a positive lymph node and a Breslow index higher than 2 mm were independent variables associated with disease free survival (DFS). These variables together with a mild TIL were significantly correlated with overall survival (OS). The presence of regression was not associated with DFS or OS. Conclusions We could not demonstrate an association between regression and the outcome of patients with cutaneous melanoma. Tumor thickness greater than 2 mm and a positive SLN were associated with recurrence. Survival was influenced by a Breslow thickness >2 mm, the presence of a mild TIL and a positive SLN status. PMID:29507279