Sample records for obtained multiple regression

  1. Multiple-Instance Regression with Structured Data

    NASA Technical Reports Server (NTRS)

    Wagstaff, Kiri L.; Lane, Terran; Roper, Alex

    2008-01-01

    We present a multiple-instance regression algorithm that models internal bag structure to identify the items most relevant to the bag labels. Multiple-instance regression (MIR) operates on a set of bags with real-valued labels, each containing a set of unlabeled items, in which the relevance of each item to its bag label is unknown. The goal is to predict the labels of new bags from their contents. Unlike previous MIR methods, MI-ClusterRegress can operate on bags that are structured in that they contain items drawn from a number of distinct (but unknown) distributions. MI-ClusterRegress simultaneously learns a model of the bag's internal structure, the relevance of each item, and a regression model that accurately predicts labels for new bags. We evaluated this approach on the challenging MIR problem of crop yield prediction from remote sensing data. MI-ClusterRegress provided predictions that were more accurate than those obtained with non-multiple-instance approaches or MIR methods that do not model the bag structure.

  2. A general equation to obtain multiple cut-off scores on a test from multinomial logistic regression.

    PubMed

    Bersabé, Rosa; Rivas, Teresa

    2010-05-01

    The authors derive a general equation to compute multiple cut-offs on a total test score in order to classify individuals into more than two ordinal categories. The equation is derived from the multinomial logistic regression (MLR) model, which is an extension of the binary logistic regression (BLR) model to accommodate polytomous outcome variables. From this analytical procedure, cut-off scores are established at the test score (the predictor variable) at which an individual is as likely to be in category j as in category j+1 of an ordinal outcome variable. The application of the complete procedure is illustrated by an example with data from an actual study on eating disorders. In this example, two cut-off scores on the Eating Attitudes Test (EAT-26) scores are obtained in order to classify individuals into three ordinal categories: asymptomatic, symptomatic and eating disorder. Diagnoses were made from the responses to a self-report (Q-EDD) that operationalises DSM-IV criteria for eating disorders. Alternatives to the MLR model to set multiple cut-off scores are discussed.

  3. Multiple Correlation versus Multiple Regression.

    ERIC Educational Resources Information Center

    Huberty, Carl J.

    2003-01-01

    Describes differences between multiple correlation analysis (MCA) and multiple regression analysis (MRA), showing how these approaches involve different research questions and study designs, different inferential approaches, different analysis strategies, and different reported information. (SLD)

  4. Advanced statistics: linear regression, part II: multiple linear regression.

    PubMed

    Marill, Keith A

    2004-01-01

    The applications of simple linear regression in medical research are limited, because in most situations, there are multiple relevant predictor variables. Univariate statistical techniques such as simple linear regression use a single predictor variable, and they often may be mathematically correct but clinically misleading. Multiple linear regression is a mathematical technique used to model the relationship between multiple independent predictor variables and a single dependent outcome variable. It is used in medical research to model observational data, as well as in diagnostic and therapeutic studies in which the outcome is dependent on more than one factor. Although the technique generally is limited to data that can be expressed with a linear function, it benefits from a well-developed mathematical framework that yields unique solutions and exact confidence intervals for regression coefficients. Building on Part I of this series, this article acquaints the reader with some of the important concepts in multiple regression analysis. These include multicollinearity, interaction effects, and an expansion of the discussion of inference testing, leverage, and variable transformations to multivariate models. Examples from the first article in this series are expanded on using a primarily graphic, rather than mathematical, approach. The importance of the relationships among the predictor variables and the dependence of the multivariate model coefficients on the choice of these variables are stressed. Finally, concepts in regression model building are discussed.

  5. Multiple imputation for cure rate quantile regression with censored data.

    PubMed

    Wu, Yuanshan; Yin, Guosheng

    2017-03-01

    The main challenge in the context of cure rate analysis is that one never knows whether censored subjects are cured or uncured, or whether they are susceptible or insusceptible to the event of interest. Considering the susceptible indicator as missing data, we propose a multiple imputation approach to cure rate quantile regression for censored data with a survival fraction. We develop an iterative algorithm to estimate the conditionally uncured probability for each subject. By utilizing this estimated probability and Bernoulli sample imputation, we can classify each subject as cured or uncured, and then employ the locally weighted method to estimate the quantile regression coefficients with only the uncured subjects. Repeating the imputation procedure multiple times and taking an average over the resultant estimators, we obtain consistent estimators for the quantile regression coefficients. Our approach relaxes the usual global linearity assumption, so that we can apply quantile regression to any particular quantile of interest. We establish asymptotic properties for the proposed estimators, including both consistency and asymptotic normality. We conduct simulation studies to assess the finite-sample performance of the proposed multiple imputation method and apply it to a lung cancer study as an illustration. © 2016, The International Biometric Society.

  6. Incremental Net Effects in Multiple Regression

    ERIC Educational Resources Information Center

    Lipovetsky, Stan; Conklin, Michael

    2005-01-01

    A regular problem in regression analysis is estimating the comparative importance of the predictors in the model. This work considers the 'net effects', or shares of the predictors in the coefficient of the multiple determination, which is a widely used characteristic of the quality of a regression model. Estimation of the net effects can be a…

  7. Testing Different Model Building Procedures Using Multiple Regression.

    ERIC Educational Resources Information Center

    Thayer, Jerome D.

    The stepwise regression method of selecting predictors for computer assisted multiple regression analysis was compared with forward, backward, and best subsets regression, using 16 data sets. The results indicated the stepwise method was preferred because of its practical nature, when the models chosen by different selection methods were similar…

  8. Multiple Regression: A Leisurely Primer.

    ERIC Educational Resources Information Center

    Daniel, Larry G.; Onwuegbuzie, Anthony J.

    Multiple regression is a useful statistical technique when the researcher is considering situations in which variables of interest are theorized to be multiply caused. It may also be useful in those situations in which the researchers is interested in studies of predictability of phenomena of interest. This paper provides an introduction to…

  9. RAWS II: A MULTIPLE REGRESSION ANALYSIS PROGRAM,

    DTIC Science & Technology

    This memorandum gives instructions for the use and operation of a revised version of RAWS, a multiple regression analysis program. The program...of preprocessed data, the directed retention of variable, listing of the matrix of the normal equations and its inverse, and the bypassing of the regression analysis to provide the input variable statistics only. (Author)

  10. Regression Models for the Analysis of Longitudinal Gaussian Data from Multiple Sources

    PubMed Central

    O’Brien, Liam M.; Fitzmaurice, Garrett M.

    2006-01-01

    We present a regression model for the joint analysis of longitudinal multiple source Gaussian data. Longitudinal multiple source data arise when repeated measurements are taken from two or more sources, and each source provides a measure of the same underlying variable and on the same scale. This type of data generally produces a relatively large number of observations per subject; thus estimation of an unstructured covariance matrix often may not be possible. We consider two methods by which parsimonious models for the covariance can be obtained for longitudinal multiple source data. The methods are illustrated with an example of multiple informant data arising from a longitudinal interventional trial in psychiatry. PMID:15726666

  11. Simple and multiple linear regression: sample size considerations.

    PubMed

    Hanley, James A

    2016-11-01

    The suggested "two subjects per variable" (2SPV) rule of thumb in the Austin and Steyerberg article is a chance to bring out some long-established and quite intuitive sample size considerations for both simple and multiple linear regression. This article distinguishes two of the major uses of regression models that imply very different sample size considerations, neither served well by the 2SPV rule. The first is etiological research, which contrasts mean Y levels at differing "exposure" (X) values and thus tends to focus on a single regression coefficient, possibly adjusted for confounders. The second research genre guides clinical practice. It addresses Y levels for individuals with different covariate patterns or "profiles." It focuses on the profile-specific (mean) Y levels themselves, estimating them via linear compounds of regression coefficients and covariates. By drawing on long-established closed-form variance formulae that lie beneath the standard errors in multiple regression, and by rearranging them for heuristic purposes, one arrives at quite intuitive sample size considerations for both research genres. Copyright © 2016 Elsevier Inc. All rights reserved.

  12. The Geometry of Enhancement in Multiple Regression

    ERIC Educational Resources Information Center

    Waller, Niels G.

    2011-01-01

    In linear multiple regression, "enhancement" is said to occur when R[superscript 2] = b[prime]r greater than r[prime]r, where b is a p x 1 vector of standardized regression coefficients and r is a p x 1 vector of correlations between a criterion y and a set of standardized regressors, x. When p = 1 then b [is congruent to] r and…

  13. Optimization of fixture layouts of glass laser optics using multiple kernel regression.

    PubMed

    Su, Jianhua; Cao, Enhua; Qiao, Hong

    2014-05-10

    We aim to build an integrated fixturing model to describe the structural properties and thermal properties of the support frame of glass laser optics. Therefore, (a) a near global optimal set of clamps can be computed to minimize the surface shape error of the glass laser optic based on the proposed model, and (b) a desired surface shape error can be obtained by adjusting the clamping forces under various environmental temperatures based on the model. To construct the model, we develop a new multiple kernel learning method and call it multiple kernel support vector functional regression. The proposed method uses two layer regressions to group and order the data sources by the weights of the kernels and the factors of the layers. Because of that, the influences of the clamps and the temperature can be evaluated by grouping them into different layers.

  14. General Nature of Multicollinearity in Multiple Regression Analysis.

    ERIC Educational Resources Information Center

    Liu, Richard

    1981-01-01

    Discusses multiple regression, a very popular statistical technique in the field of education. One of the basic assumptions in regression analysis requires that independent variables in the equation should not be highly correlated. The problem of multicollinearity and some of the solutions to it are discussed. (Author)

  15. Using Weighted Least Squares Regression for Obtaining Langmuir Sorption Constants

    USDA-ARS?s Scientific Manuscript database

    One of the most commonly used models for describing phosphorus (P) sorption to soils is the Langmuir model. To obtain model parameters, the Langmuir model is fit to measured sorption data using least squares regression. Least squares regression is based on several assumptions including normally dist...

  16. Enhance-Synergism and Suppression Effects in Multiple Regression

    ERIC Educational Resources Information Center

    Lipovetsky, Stan; Conklin, W. Michael

    2004-01-01

    Relations between pairwise correlations and the coefficient of multiple determination in regression analysis are considered. The conditions for the occurrence of enhance-synergism and suppression effects when multiple determination becomes bigger than the total of squared correlations of the dependent variable with the regressors are discussed. It…

  17. Double Cross-Validation in Multiple Regression: A Method of Estimating the Stability of Results.

    ERIC Educational Resources Information Center

    Rowell, R. Kevin

    In multiple regression analysis, where resulting predictive equation effectiveness is subject to shrinkage, it is especially important to evaluate result replicability. Double cross-validation is an empirical method by which an estimate of invariance or stability can be obtained from research data. A procedure for double cross-validation is…

  18. Noninvasive spectral imaging of skin chromophores based on multiple regression analysis aided by Monte Carlo simulation

    NASA Astrophysics Data System (ADS)

    Nishidate, Izumi; Wiswadarma, Aditya; Hase, Yota; Tanaka, Noriyuki; Maeda, Takaaki; Niizeki, Kyuichi; Aizu, Yoshihisa

    2011-08-01

    In order to visualize melanin and blood concentrations and oxygen saturation in human skin tissue, a simple imaging technique based on multispectral diffuse reflectance images acquired at six wavelengths (500, 520, 540, 560, 580 and 600nm) was developed. The technique utilizes multiple regression analysis aided by Monte Carlo simulation for diffuse reflectance spectra. Using the absorbance spectrum as a response variable and the extinction coefficients of melanin, oxygenated hemoglobin, and deoxygenated hemoglobin as predictor variables, multiple regression analysis provides regression coefficients. Concentrations of melanin and total blood are then determined from the regression coefficients using conversion vectors that are deduced numerically in advance, while oxygen saturation is obtained directly from the regression coefficients. Experiments with a tissue-like agar gel phantom validated the method. In vivo experiments with human skin of the human hand during upper limb occlusion and of the inner forearm exposed to UV irradiation demonstrated the ability of the method to evaluate physiological reactions of human skin tissue.

  19. The M Word: Multicollinearity in Multiple Regression.

    ERIC Educational Resources Information Center

    Morrow-Howell, Nancy

    1994-01-01

    Notes that existence of substantial correlation between two or more independent variables creates problems of multicollinearity in multiple regression. Discusses multicollinearity problem in social work research in which independent variables are usually intercorrelated. Clarifies problems created by multicollinearity, explains detection of…

  20. BAYESIAN LARGE-SCALE MULTIPLE REGRESSION WITH SUMMARY STATISTICS FROM GENOME-WIDE ASSOCIATION STUDIES1

    PubMed Central

    Zhu, Xiang; Stephens, Matthew

    2017-01-01

    Bayesian methods for large-scale multiple regression provide attractive approaches to the analysis of genome-wide association studies (GWAS). For example, they can estimate heritability of complex traits, allowing for both polygenic and sparse models; and by incorporating external genomic data into the priors, they can increase power and yield new biological insights. However, these methods require access to individual genotypes and phenotypes, which are often not easily available. Here we provide a framework for performing these analyses without individual-level data. Specifically, we introduce a “Regression with Summary Statistics” (RSS) likelihood, which relates the multiple regression coefficients to univariate regression results that are often easily available. The RSS likelihood requires estimates of correlations among covariates (SNPs), which also can be obtained from public databases. We perform Bayesian multiple regression analysis by combining the RSS likelihood with previously proposed prior distributions, sampling posteriors by Markov chain Monte Carlo. In a wide range of simulations RSS performs similarly to analyses using the individual data, both for estimating heritability and detecting associations. We apply RSS to a GWAS of human height that contains 253,288 individuals typed at 1.06 million SNPs, for which analyses of individual-level data are practically impossible. Estimates of heritability (52%) are consistent with, but more precise, than previous results using subsets of these data. We also identify many previously unreported loci that show evidence for association with height in our analyses. Software is available at https://github.com/stephenslab/rss. PMID:29399241

  1. Categorical Variables in Multiple Regression: Some Cautions.

    ERIC Educational Resources Information Center

    O'Grady, Kevin E.; Medoff, Deborah R.

    1988-01-01

    Limitations of dummy coding and nonsense coding as methods of coding categorical variables for use as predictors in multiple regression analysis are discussed. The combination of these approaches often yields estimates and tests of significance that are not intended by researchers for inclusion in their models. (SLD)

  2. Novel applications of multitask learning and multiple output regression to multiple genetic trait prediction.

    PubMed

    He, Dan; Kuhn, David; Parida, Laxmi

    2016-06-15

    Given a set of biallelic molecular markers, such as SNPs, with genotype values encoded numerically on a collection of plant, animal or human samples, the goal of genetic trait prediction is to predict the quantitative trait values by simultaneously modeling all marker effects. Genetic trait prediction is usually represented as linear regression models. In many cases, for the same set of samples and markers, multiple traits are observed. Some of these traits might be correlated with each other. Therefore, modeling all the multiple traits together may improve the prediction accuracy. In this work, we view the multitrait prediction problem from a machine learning angle: as either a multitask learning problem or a multiple output regression problem, depending on whether different traits share the same genotype matrix or not. We then adapted multitask learning algorithms and multiple output regression algorithms to solve the multitrait prediction problem. We proposed a few strategies to improve the least square error of the prediction from these algorithms. Our experiments show that modeling multiple traits together could improve the prediction accuracy for correlated traits. The programs we used are either public or directly from the referred authors, such as MALSAR (http://www.public.asu.edu/~jye02/Software/MALSAR/) package. The Avocado data set has not been published yet and is available upon request. dhe@us.ibm.com. © The Author 2016. Published by Oxford University Press.

  3. Floating Data and the Problem with Illustrating Multiple Regression.

    ERIC Educational Resources Information Center

    Sachau, Daniel A.

    2000-01-01

    Discusses how to introduce basic concepts of multiple regression by creating a large-scale, three-dimensional regression model using the classroom walls and floor. Addresses teaching points that should be covered and reveals student reaction to the model. Finds that the greatest benefit of the model is the low fear, walk-through, nonmathematical…

  4. A Technique of Fuzzy C-Mean in Multiple Linear Regression Model toward Paddy Yield

    NASA Astrophysics Data System (ADS)

    Syazwan Wahab, Nur; Saifullah Rusiman, Mohd; Mohamad, Mahathir; Amira Azmi, Nur; Che Him, Norziha; Ghazali Kamardan, M.; Ali, Maselan

    2018-04-01

    In this paper, we propose a hybrid model which is a combination of multiple linear regression model and fuzzy c-means method. This research involved a relationship between 20 variates of the top soil that are analyzed prior to planting of paddy yields at standard fertilizer rates. Data used were from the multi-location trials for rice carried out by MARDI at major paddy granary in Peninsular Malaysia during the period from 2009 to 2012. Missing observations were estimated using mean estimation techniques. The data were analyzed using multiple linear regression model and a combination of multiple linear regression model and fuzzy c-means method. Analysis of normality and multicollinearity indicate that the data is normally scattered without multicollinearity among independent variables. Analysis of fuzzy c-means cluster the yield of paddy into two clusters before the multiple linear regression model can be used. The comparison between two method indicate that the hybrid of multiple linear regression model and fuzzy c-means method outperform the multiple linear regression model with lower value of mean square error.

  5. Multiplication factor versus regression analysis in stature estimation from hand and foot dimensions.

    PubMed

    Krishan, Kewal; Kanchan, Tanuj; Sharma, Abhilasha

    2012-05-01

    Estimation of stature is an important parameter in identification of human remains in forensic examinations. The present study is aimed to compare the reliability and accuracy of stature estimation and to demonstrate the variability in estimated stature and actual stature using multiplication factor and regression analysis methods. The study is based on a sample of 246 subjects (123 males and 123 females) from North India aged between 17 and 20 years. Four anthropometric measurements; hand length, hand breadth, foot length and foot breadth taken on the left side in each subject were included in the study. Stature was measured using standard anthropometric techniques. Multiplication factors were calculated and linear regression models were derived for estimation of stature from hand and foot dimensions. Derived multiplication factors and regression formula were applied to the hand and foot measurements in the study sample. The estimated stature from the multiplication factors and regression analysis was compared with the actual stature to find the error in estimated stature. The results indicate that the range of error in estimation of stature from regression analysis method is less than that of multiplication factor method thus, confirming that the regression analysis method is better than multiplication factor analysis in stature estimation. Copyright © 2012 Elsevier Ltd and Faculty of Forensic and Legal Medicine. All rights reserved.

  6. Tools to Support Interpreting Multiple Regression in the Face of Multicollinearity

    PubMed Central

    Kraha, Amanda; Turner, Heather; Nimon, Kim; Zientek, Linda Reichwein; Henson, Robin K.

    2012-01-01

    While multicollinearity may increase the difficulty of interpreting multiple regression (MR) results, it should not cause undue problems for the knowledgeable researcher. In the current paper, we argue that rather than using one technique to investigate regression results, researchers should consider multiple indices to understand the contributions that predictors make not only to a regression model, but to each other as well. Some of the techniques to interpret MR effects include, but are not limited to, correlation coefficients, beta weights, structure coefficients, all possible subsets regression, commonality coefficients, dominance weights, and relative importance weights. This article will review a set of techniques to interpret MR effects, identify the elements of the data on which the methods focus, and identify statistical software to support such analyses. PMID:22457655

  7. Tools to support interpreting multiple regression in the face of multicollinearity.

    PubMed

    Kraha, Amanda; Turner, Heather; Nimon, Kim; Zientek, Linda Reichwein; Henson, Robin K

    2012-01-01

    While multicollinearity may increase the difficulty of interpreting multiple regression (MR) results, it should not cause undue problems for the knowledgeable researcher. In the current paper, we argue that rather than using one technique to investigate regression results, researchers should consider multiple indices to understand the contributions that predictors make not only to a regression model, but to each other as well. Some of the techniques to interpret MR effects include, but are not limited to, correlation coefficients, beta weights, structure coefficients, all possible subsets regression, commonality coefficients, dominance weights, and relative importance weights. This article will review a set of techniques to interpret MR effects, identify the elements of the data on which the methods focus, and identify statistical software to support such analyses.

  8. Using Robust Standard Errors to Combine Multiple Regression Estimates with Meta-Analysis

    ERIC Educational Resources Information Center

    Williams, Ryan T.

    2012-01-01

    Combining multiple regression estimates with meta-analysis has continued to be a difficult task. A variety of methods have been proposed and used to combine multiple regression slope estimates with meta-analysis, however, most of these methods have serious methodological and practical limitations. The purpose of this study was to explore the use…

  9. Using the Ridge Regression Procedures to Estimate the Multiple Linear Regression Coefficients

    NASA Astrophysics Data System (ADS)

    Gorgees, HazimMansoor; Mahdi, FatimahAssim

    2018-05-01

    This article concerns with comparing the performance of different types of ordinary ridge regression estimators that have been already proposed to estimate the regression parameters when the near exact linear relationships among the explanatory variables is presented. For this situations we employ the data obtained from tagi gas filling company during the period (2008-2010). The main result we reached is that the method based on the condition number performs better than other methods since it has smaller mean square error (MSE) than the other stated methods.

  10. A novel simple QSAR model for the prediction of anti-HIV activity using multiple linear regression analysis.

    PubMed

    Afantitis, Antreas; Melagraki, Georgia; Sarimveis, Haralambos; Koutentis, Panayiotis A; Markopoulos, John; Igglessi-Markopoulou, Olga

    2006-08-01

    A quantitative-structure activity relationship was obtained by applying Multiple Linear Regression Analysis to a series of 80 1-[2-hydroxyethoxy-methyl]-6-(phenylthio) thymine (HEPT) derivatives with significant anti-HIV activity. For the selection of the best among 37 different descriptors, the Elimination Selection Stepwise Regression Method (ES-SWR) was utilized. The resulting QSAR model (R (2) (CV) = 0.8160; S (PRESS) = 0.5680) proved to be very accurate both in training and predictive stages.

  11. Statistical research using the multiple regression analysis in areas of the cast hipereutectoid steel rolls manufacturing

    NASA Astrophysics Data System (ADS)

    Kiss, I.; Alexa, V.; Serban, S.; Rackov, M.; Čavić, M.

    2018-01-01

    The cast hipereutectoid steel (usually named Adamite) is a roll manufacturing destined material, having mechanical, chemical properties and Carbon [C] content of which stands between steelandiron, along-withitsalloyelements such as Nickel [Ni], Chrome [Cr], Molybdenum [Mo] and/or other alloy elements. Adamite Rolls are basically alloy steel rolls (a kind of high carbon steel) having hardness ranging from 40 to 55 degrees Shore C, with Carbon [C] percentage ranging from 1.35% until to 2% (usually between 1.2˜2.3%), the extra Carbon [C] and the special alloying element giving an extra wear resistance and strength. First of all the Adamite roll’s prominent feature is the small variation in hardness of the working surface, and has a good abrasion resistance and bite performance. This paper reviews key aspects of roll material properties and presents an analysis of the influences of chemical composition upon the mechanical properties (hardness) of the cast hipereutectoid steel rolls (Adamite). Using the multiple regression analysis (the double and triple regression equations), some mathematical correlations between the cast hipereutectoid steel rolls’ chemical composition and the obtained hardness are presented. In this work several results and evidence obtained by actual experiments are presented. Thus, several variation boundaries for the chemical composition of cast hipereutectoid steel rolls, in view the obtaining the proper values of the hardness, are revealed. For the multiple regression equations, correlation coefficients and graphical representations the software Matlab was used.

  12. The Detection and Interpretation of Interaction Effects between Continuous Variables in Multiple Regression.

    ERIC Educational Resources Information Center

    Jaccard, James; And Others

    1990-01-01

    Issues in the detection and interpretation of interaction effects between quantitative variables in multiple regression analysis are discussed. Recent discussions associated with problems of multicollinearity are reviewed in the context of the conditional nature of multiple regression with product terms. (TJH)

  13. Multiple Imputation of a Randomly Censored Covariate Improves Logistic Regression Analysis.

    PubMed

    Atem, Folefac D; Qian, Jing; Maye, Jacqueline E; Johnson, Keith A; Betensky, Rebecca A

    2016-01-01

    Randomly censored covariates arise frequently in epidemiologic studies. The most commonly used methods, including complete case and single imputation or substitution, suffer from inefficiency and bias. They make strong parametric assumptions or they consider limit of detection censoring only. We employ multiple imputation, in conjunction with semi-parametric modeling of the censored covariate, to overcome these shortcomings and to facilitate robust estimation. We develop a multiple imputation approach for randomly censored covariates within the framework of a logistic regression model. We use the non-parametric estimate of the covariate distribution or the semiparametric Cox model estimate in the presence of additional covariates in the model. We evaluate this procedure in simulations, and compare its operating characteristics to those from the complete case analysis and a survival regression approach. We apply the procedures to an Alzheimer's study of the association between amyloid positivity and maternal age of onset of dementia. Multiple imputation achieves lower standard errors and higher power than the complete case approach under heavy and moderate censoring and is comparable under light censoring. The survival regression approach achieves the highest power among all procedures, but does not produce interpretable estimates of association. Multiple imputation offers a favorable alternative to complete case analysis and ad hoc substitution methods in the presence of randomly censored covariates within the framework of logistic regression.

  14. Reduction of interferences in graphite furnace atomic absorption spectrometry by multiple linear regression modelling

    NASA Astrophysics Data System (ADS)

    Grotti, Marco; Abelmoschi, Maria Luisa; Soggia, Francesco; Tiberiade, Christian; Frache, Roberto

    2000-12-01

    The multivariate effects of Na, K, Mg and Ca as nitrates on the electrothermal atomisation of manganese, cadmium and iron were studied by multiple linear regression modelling. Since the models proved to efficiently predict the effects of the considered matrix elements in a wide range of concentrations, they were applied to correct the interferences occurring in the determination of trace elements in seawater after pre-concentration of the analytes. In order to obtain a statistically significant number of samples, a large volume of the certified seawater reference materials CASS-3 and NASS-3 was treated with Chelex-100 resin; then, the chelating resin was separated from the solution, divided into several sub-samples, each of them was eluted with nitric acid and analysed by electrothermal atomic absorption spectrometry (for trace element determinations) and inductively coupled plasma optical emission spectrometry (for matrix element determinations). To minimise any other systematic error besides that due to matrix effects, accuracy of the pre-concentration step and contamination levels of the procedure were checked by inductively coupled plasma mass spectrometric measurements. Analytical results obtained by applying the multiple linear regression models were compared with those obtained with other calibration methods, such as external calibration using acid-based standards, external calibration using matrix-matched standards and the analyte addition technique. Empirical models proved to efficiently reduce interferences occurring in the analysis of real samples, allowing an improvement of accuracy better than for other calibration methods.

  15. Beyond Multiple Regression: Using Commonality Analysis to Better Understand R[superscript 2] Results

    ERIC Educational Resources Information Center

    Warne, Russell T.

    2011-01-01

    Multiple regression is one of the most common statistical methods used in quantitative educational research. Despite the versatility and easy interpretability of multiple regression, it has some shortcomings in the detection of suppressor variables and for somewhat arbitrarily assigning values to the structure coefficients of correlated…

  16. Simultaneous multiple non-crossing quantile regression estimation using kernel constraints

    PubMed Central

    Liu, Yufeng; Wu, Yichao

    2011-01-01

    Quantile regression (QR) is a very useful statistical tool for learning the relationship between the response variable and covariates. For many applications, one often needs to estimate multiple conditional quantile functions of the response variable given covariates. Although one can estimate multiple quantiles separately, it is of great interest to estimate them simultaneously. One advantage of simultaneous estimation is that multiple quantiles can share strength among them to gain better estimation accuracy than individually estimated quantile functions. Another important advantage of joint estimation is the feasibility of incorporating simultaneous non-crossing constraints of QR functions. In this paper, we propose a new kernel-based multiple QR estimation technique, namely simultaneous non-crossing quantile regression (SNQR). We use kernel representations for QR functions and apply constraints on the kernel coefficients to avoid crossing. Both unregularised and regularised SNQR techniques are considered. Asymptotic properties such as asymptotic normality of linear SNQR and oracle properties of the sparse linear SNQR are developed. Our numerical results demonstrate the competitive performance of our SNQR over the original individual QR estimation. PMID:22190842

  17. Multiple regression for physiological data analysis: the problem of multicollinearity.

    PubMed

    Slinker, B K; Glantz, S A

    1985-07-01

    Multiple linear regression, in which several predictor variables are related to a response variable, is a powerful statistical tool for gaining quantitative insight into complex in vivo physiological systems. For these insights to be correct, all predictor variables must be uncorrelated. However, in many physiological experiments the predictor variables cannot be precisely controlled and thus change in parallel (i.e., they are highly correlated). There is a redundancy of information about the response, a situation called multicollinearity, that leads to numerical problems in estimating the parameters in regression equations; the parameters are often of incorrect magnitude or sign or have large standard errors. Although multicollinearity can be avoided with good experimental design, not all interesting physiological questions can be studied without encountering multicollinearity. In these cases various ad hoc procedures have been proposed to mitigate multicollinearity. Although many of these procedures are controversial, they can be helpful in applying multiple linear regression to some physiological problems.

  18. An improved multiple linear regression and data analysis computer program package

    NASA Technical Reports Server (NTRS)

    Sidik, S. M.

    1972-01-01

    NEWRAP, an improved version of a previous multiple linear regression program called RAPIER, CREDUC, and CRSPLT, allows for a complete regression analysis including cross plots of the independent and dependent variables, correlation coefficients, regression coefficients, analysis of variance tables, t-statistics and their probability levels, rejection of independent variables, plots of residuals against the independent and dependent variables, and a canonical reduction of quadratic response functions useful in optimum seeking experimentation. A major improvement over RAPIER is that all regression calculations are done in double precision arithmetic.

  19. A Method of Calculating Functional Independence Measure at Discharge from Functional Independence Measure Effectiveness Predicted by Multiple Regression Analysis Has a High Degree of Predictive Accuracy.

    PubMed

    Tokunaga, Makoto; Watanabe, Susumu; Sonoda, Shigeru

    2017-09-01

    Multiple linear regression analysis is often used to predict the outcome of stroke rehabilitation. However, the predictive accuracy may not be satisfactory. The objective of this study was to elucidate the predictive accuracy of a method of calculating motor Functional Independence Measure (mFIM) at discharge from mFIM effectiveness predicted by multiple regression analysis. The subjects were 505 patients with stroke who were hospitalized in a convalescent rehabilitation hospital. The formula "mFIM at discharge = mFIM effectiveness × (91 points - mFIM at admission) + mFIM at admission" was used. By including the predicted mFIM effectiveness obtained through multiple regression analysis in this formula, we obtained the predicted mFIM at discharge (A). We also used multiple regression analysis to directly predict mFIM at discharge (B). The correlation between the predicted and the measured values of mFIM at discharge was compared between A and B. The correlation coefficients were .916 for A and .878 for B. Calculating mFIM at discharge from mFIM effectiveness predicted by multiple regression analysis had a higher degree of predictive accuracy of mFIM at discharge than that directly predicted. Copyright © 2017 National Stroke Association. Published by Elsevier Inc. All rights reserved.

  20. Predicting MHC-II binding affinity using multiple instance regression

    PubMed Central

    EL-Manzalawy, Yasser; Dobbs, Drena; Honavar, Vasant

    2011-01-01

    Reliably predicting the ability of antigen peptides to bind to major histocompatibility complex class II (MHC-II) molecules is an essential step in developing new vaccines. Uncovering the amino acid sequence correlates of the binding affinity of MHC-II binding peptides is important for understanding pathogenesis and immune response. The task of predicting MHC-II binding peptides is complicated by the significant variability in their length. Most existing computational methods for predicting MHC-II binding peptides focus on identifying a nine amino acids core region in each binding peptide. We formulate the problems of qualitatively and quantitatively predicting flexible length MHC-II peptides as multiple instance learning and multiple instance regression problems, respectively. Based on this formulation, we introduce MHCMIR, a novel method for predicting MHC-II binding affinity using multiple instance regression. We present results of experiments using several benchmark datasets that show that MHCMIR is competitive with the state-of-the-art methods for predicting MHC-II binding peptides. An online web server that implements the MHCMIR method for MHC-II binding affinity prediction is freely accessible at http://ailab.cs.iastate.edu/mhcmir. PMID:20855923

  1. Functional Regression Models for Epistasis Analysis of Multiple Quantitative Traits.

    PubMed

    Zhang, Futao; Xie, Dan; Liang, Meimei; Xiong, Momiao

    2016-04-01

    To date, most genetic analyses of phenotypes have focused on analyzing single traits or analyzing each phenotype independently. However, joint epistasis analysis of multiple complementary traits will increase statistical power and improve our understanding of the complicated genetic structure of the complex diseases. Despite their importance in uncovering the genetic structure of complex traits, the statistical methods for identifying epistasis in multiple phenotypes remains fundamentally unexplored. To fill this gap, we formulate a test for interaction between two genes in multiple quantitative trait analysis as a multiple functional regression (MFRG) in which the genotype functions (genetic variant profiles) are defined as a function of the genomic position of the genetic variants. We use large-scale simulations to calculate Type I error rates for testing interaction between two genes with multiple phenotypes and to compare the power with multivariate pairwise interaction analysis and single trait interaction analysis by a single variate functional regression model. To further evaluate performance, the MFRG for epistasis analysis is applied to five phenotypes of exome sequence data from the NHLBI's Exome Sequencing Project (ESP) to detect pleiotropic epistasis. A total of 267 pairs of genes that formed a genetic interaction network showed significant evidence of epistasis influencing five traits. The results demonstrate that the joint interaction analysis of multiple phenotypes has a much higher power to detect interaction than the interaction analysis of a single trait and may open a new direction to fully uncovering the genetic structure of multiple phenotypes.

  2. Use of Empirical Estimates of Shrinkage in Multiple Regression: A Caution.

    ERIC Educational Resources Information Center

    Kromrey, Jeffrey D.; Hines, Constance V.

    1995-01-01

    The accuracy of four empirical techniques to estimate shrinkage in multiple regression was studied through Monte Carlo simulation. None of the techniques provided unbiased estimates of the population squared multiple correlation coefficient, but the normalized jackknife and bootstrap techniques demonstrated marginally acceptable performance with…

  3. Modeling Pan Evaporation for Kuwait by Multiple Linear Regression

    PubMed Central

    Almedeij, Jaber

    2012-01-01

    Evaporation is an important parameter for many projects related to hydrology and water resources systems. This paper constitutes the first study conducted in Kuwait to obtain empirical relations for the estimation of daily and monthly pan evaporation as functions of available meteorological data of temperature, relative humidity, and wind speed. The data used here for the modeling are daily measurements of substantial continuity coverage, within a period of 17 years between January 1993 and December 2009, which can be considered representative of the desert climate of the urban zone of the country. Multiple linear regression technique is used with a procedure of variable selection for fitting the best model forms. The correlations of evaporation with temperature and relative humidity are also transformed in order to linearize the existing curvilinear patterns of the data by using power and exponential functions, respectively. The evaporation models suggested with the best variable combinations were shown to produce results that are in a reasonable agreement with observation values. PMID:23226984

  4. MULGRES: a computer program for stepwise multiple regression analysis

    Treesearch

    A. Jeff Martin

    1971-01-01

    MULGRES is a computer program source deck that is designed for multiple regression analysis employing the technique of stepwise deletion in the search for most significant variables. The features of the program, along with inputs and outputs, are briefly described, with a note on machine compatibility.

  5. Estimating air drying times of lumber with multiple regression

    Treesearch

    William T. Simpson

    2004-01-01

    In this study, the applicability of a multiple regression equation for estimating air drying times of red oak, sugar maple, and ponderosa pine lumber was evaluated. The equation allows prediction of estimated air drying times from historic weather records of temperature and relative humidity at any desired location.

  6. Isolating and Examining Sources of Suppression and Multicollinearity in Multiple Linear Regression

    ERIC Educational Resources Information Center

    Beckstead, Jason W.

    2012-01-01

    The presence of suppression (and multicollinearity) in multiple regression analysis complicates interpretation of predictor-criterion relationships. The mathematical conditions that produce suppression in regression analysis have received considerable attention in the methodological literature but until now nothing in the way of an analytic…

  7. Neural network and multiple linear regression to predict school children dimensions for ergonomic school furniture design.

    PubMed

    Agha, Salah R; Alnahhal, Mohammed J

    2012-11-01

    The current study investigates the possibility of obtaining the anthropometric dimensions, critical to school furniture design, without measuring all of them. The study first selects some anthropometric dimensions that are easy to measure. Two methods are then used to check if these easy-to-measure dimensions can predict the dimensions critical to the furniture design. These methods are multiple linear regression and neural networks. Each dimension that is deemed necessary to ergonomically design school furniture is expressed as a function of some other measured anthropometric dimensions. Results show that out of the five dimensions needed for chair design, four can be related to other dimensions that can be measured while children are standing. Therefore, the method suggested here would definitely save time and effort and avoid the difficulty of dealing with students while measuring these dimensions. In general, it was found that neural networks perform better than multiple linear regression in the current study. Copyright © 2012 Elsevier Ltd and The Ergonomics Society. All rights reserved.

  8. Analysis and Interpretation of Findings Using Multiple Regression Techniques

    ERIC Educational Resources Information Center

    Hoyt, William T.; Leierer, Stephen; Millington, Michael J.

    2006-01-01

    Multiple regression and correlation (MRC) methods form a flexible family of statistical techniques that can address a wide variety of different types of research questions of interest to rehabilitation professionals. In this article, we review basic concepts and terms, with an emphasis on interpretation of findings relevant to research questions…

  9. Modeling daily soil temperature over diverse climate conditions in Iran—a comparison of multiple linear regression and support vector regression techniques

    NASA Astrophysics Data System (ADS)

    Delbari, Masoomeh; Sharifazari, Salman; Mohammadi, Ehsan

    2018-02-01

    The knowledge of soil temperature at different depths is important for agricultural industry and for understanding climate change. The aim of this study is to evaluate the performance of a support vector regression (SVR)-based model in estimating daily soil temperature at 10, 30 and 100 cm depth at different climate conditions over Iran. The obtained results were compared to those obtained from a more classical multiple linear regression (MLR) model. The correlation sensitivity for the input combinations and periodicity effect were also investigated. Climatic data used as inputs to the models were minimum and maximum air temperature, solar radiation, relative humidity, dew point, and the atmospheric pressure (reduced to see level), collected from five synoptic stations Kerman, Ahvaz, Tabriz, Saghez, and Rasht located respectively in the hyper-arid, arid, semi-arid, Mediterranean, and hyper-humid climate conditions. According to the results, the performance of both MLR and SVR models was quite well at surface layer, i.e., 10-cm depth. However, SVR performed better than MLR in estimating soil temperature at deeper layers especially 100 cm depth. Moreover, both models performed better in humid climate condition than arid and hyper-arid areas. Further, adding a periodicity component into the modeling process considerably improved the models' performance especially in the case of SVR.

  10. False Positives in Multiple Regression: Unanticipated Consequences of Measurement Error in the Predictor Variables

    ERIC Educational Resources Information Center

    Shear, Benjamin R.; Zumbo, Bruno D.

    2013-01-01

    Type I error rates in multiple regression, and hence the chance for false positive research findings, can be drastically inflated when multiple regression models are used to analyze data that contain random measurement error. This article shows the potential for inflated Type I error rates in commonly encountered scenarios and provides new…

  11. High-throughput quantitative biochemical characterization of algal biomass by NIR spectroscopy; multiple linear regression and multivariate linear regression analysis.

    PubMed

    Laurens, L M L; Wolfrum, E J

    2013-12-18

    One of the challenges associated with microalgal biomass characterization and the comparison of microalgal strains and conversion processes is the rapid determination of the composition of algae. We have developed and applied a high-throughput screening technology based on near-infrared (NIR) spectroscopy for the rapid and accurate determination of algal biomass composition. We show that NIR spectroscopy can accurately predict the full composition using multivariate linear regression analysis of varying lipid, protein, and carbohydrate content of algal biomass samples from three strains. We also demonstrate a high quality of predictions of an independent validation set. A high-throughput 96-well configuration for spectroscopy gives equally good prediction relative to a ring-cup configuration, and thus, spectra can be obtained from as little as 10-20 mg of material. We found that lipids exhibit a dominant, distinct, and unique fingerprint in the NIR spectrum that allows for the use of single and multiple linear regression of respective wavelengths for the prediction of the biomass lipid content. This is not the case for carbohydrate and protein content, and thus, the use of multivariate statistical modeling approaches remains necessary.

  12. A Powerful Test for Comparing Multiple Regression Functions.

    PubMed

    Maity, Arnab

    2012-09-01

    In this article, we address the important problem of comparison of two or more population regression functions. Recently, Pardo-Fernández, Van Keilegom and González-Manteiga (2007) developed test statistics for simple nonparametric regression models: Y(ij) = θ(j)(Z(ij)) + σ(j)(Z(ij))∊(ij), based on empirical distributions of the errors in each population j = 1, … , J. In this paper, we propose a test for equality of the θ(j)(·) based on the concept of generalized likelihood ratio type statistics. We also generalize our test for other nonparametric regression setups, e.g, nonparametric logistic regression, where the loglikelihood for population j is any general smooth function [Formula: see text]. We describe a resampling procedure to obtain the critical values of the test. In addition, we present a simulation study to evaluate the performance of the proposed test and compare our results to those in Pardo-Fernández et al. (2007).

  13. Some Applied Research Concerns Using Multiple Linear Regression Analysis.

    ERIC Educational Resources Information Center

    Newman, Isadore; Fraas, John W.

    The intention of this paper is to provide an overall reference on how a researcher can apply multiple linear regression in order to utilize the advantages that it has to offer. The advantages and some concerns expressed about the technique are examined. A number of practical ways by which researchers can deal with such concerns as…

  14. Prediction of hearing outcomes by multiple regression analysis in patients with idiopathic sudden sensorineural hearing loss.

    PubMed

    Suzuki, Hideaki; Tabata, Takahisa; Koizumi, Hiroki; Hohchi, Nobusuke; Takeuchi, Shoko; Kitamura, Takuro; Fujino, Yoshihisa; Ohbuchi, Toyoaki

    2014-12-01

    This study aimed to create a multiple regression model for predicting hearing outcomes of idiopathic sudden sensorineural hearing loss (ISSNHL). The participants were 205 consecutive patients (205 ears) with ISSNHL (hearing level ≥ 40 dB, interval between onset and treatment ≤ 30 days). They received systemic steroid administration combined with intratympanic steroid injection. Data were examined by simple and multiple regression analyses. Three hearing indices (percentage hearing improvement, hearing gain, and posttreatment hearing level [HLpost]) and 7 prognostic factors (age, days from onset to treatment, initial hearing level, initial hearing level at low frequencies, initial hearing level at high frequencies, presence of vertigo, and contralateral hearing level) were included in the multiple regression analysis as dependent and explanatory variables, respectively. In the simple regression analysis, the percentage hearing improvement, hearing gain, and HLpost showed significant correlation with 2, 5, and 6 of the 7 prognostic factors, respectively. The multiple correlation coefficients were 0.396, 0.503, and 0.714 for the percentage hearing improvement, hearing gain, and HLpost, respectively. Predicted values of HLpost calculated by the multiple regression equation were reliable with 70% probability with a 40-dB-width prediction interval. Prediction of HLpost by the multiple regression model may be useful to estimate the hearing prognosis of ISSNHL. © The Author(s) 2014.

  15. Mean centering, multicollinearity, and moderators in multiple regression: The reconciliation redux.

    PubMed

    Iacobucci, Dawn; Schneider, Matthew J; Popovich, Deidre L; Bakamitsos, Georgios A

    2017-02-01

    In this article, we attempt to clarify our statements regarding the effects of mean centering. In a multiple regression with predictors A, B, and A × B (where A × B serves as an interaction term), mean centering A and B prior to computing the product term can clarify the regression coefficients (which is good) and the overall model fit R 2 will remain undisturbed (which is also good).

  16. A Solution to Separation and Multicollinearity in Multiple Logistic Regression

    PubMed Central

    Shen, Jianzhao; Gao, Sujuan

    2010-01-01

    In dementia screening tests, item selection for shortening an existing screening test can be achieved using multiple logistic regression. However, maximum likelihood estimates for such logistic regression models often experience serious bias or even non-existence because of separation and multicollinearity problems resulting from a large number of highly correlated items. Firth (1993, Biometrika, 80(1), 27–38) proposed a penalized likelihood estimator for generalized linear models and it was shown to reduce bias and the non-existence problems. The ridge regression has been used in logistic regression to stabilize the estimates in cases of multicollinearity. However, neither solves the problems for each other. In this paper, we propose a double penalized maximum likelihood estimator combining Firth’s penalized likelihood equation with a ridge parameter. We present a simulation study evaluating the empirical performance of the double penalized likelihood estimator in small to moderate sample sizes. We demonstrate the proposed approach using a current screening data from a community-based dementia study. PMID:20376286

  17. A Solution to Separation and Multicollinearity in Multiple Logistic Regression.

    PubMed

    Shen, Jianzhao; Gao, Sujuan

    2008-10-01

    In dementia screening tests, item selection for shortening an existing screening test can be achieved using multiple logistic regression. However, maximum likelihood estimates for such logistic regression models often experience serious bias or even non-existence because of separation and multicollinearity problems resulting from a large number of highly correlated items. Firth (1993, Biometrika, 80(1), 27-38) proposed a penalized likelihood estimator for generalized linear models and it was shown to reduce bias and the non-existence problems. The ridge regression has been used in logistic regression to stabilize the estimates in cases of multicollinearity. However, neither solves the problems for each other. In this paper, we propose a double penalized maximum likelihood estimator combining Firth's penalized likelihood equation with a ridge parameter. We present a simulation study evaluating the empirical performance of the double penalized likelihood estimator in small to moderate sample sizes. We demonstrate the proposed approach using a current screening data from a community-based dementia study.

  18. Two SPSS programs for interpreting multiple regression results.

    PubMed

    Lorenzo-Seva, Urbano; Ferrando, Pere J; Chico, Eliseo

    2010-02-01

    When multiple regression is used in explanation-oriented designs, it is very important to determine both the usefulness of the predictor variables and their relative importance. Standardized regression coefficients are routinely provided by commercial programs. However, they generally function rather poorly as indicators of relative importance, especially in the presence of substantially correlated predictors. We provide two user-friendly SPSS programs that implement currently recommended techniques and recent developments for assessing the relevance of the predictors. The programs also allow the user to take into account the effects of measurement error. The first program, MIMR-Corr.sps, uses a correlation matrix as input, whereas the second program, MIMR-Raw.sps, uses the raw data and computes bootstrap confidence intervals of different statistics. The SPSS syntax, a short manual, and data files related to this article are available as supplemental materials from http://brm.psychonomic-journals.org/content/supplemental.

  19. Predictors of postoperative outcomes of cubital tunnel syndrome treatments using multiple logistic regression analysis.

    PubMed

    Suzuki, Taku; Iwamoto, Takuji; Shizu, Kanae; Suzuki, Katsuji; Yamada, Harumoto; Sato, Kazuki

    2017-05-01

    This retrospective study was designed to investigate prognostic factors for postoperative outcomes for cubital tunnel syndrome (CubTS) using multiple logistic regression analysis with a large number of patients. Eighty-three patients with CubTS who underwent surgeries were enrolled. The following potential prognostic factors for disease severity were selected according to previous reports: sex, age, type of surgery, disease duration, body mass index, cervical lesion, presence of diabetes mellitus, Workers' Compensation status, preoperative severity, and preoperative electrodiagnostic testing. Postoperative severity of disease was assessed 2 years after surgery by Messina's criteria which is an outcome measure specifically for CubTS. Bivariate analysis was performed to select candidate prognostic factors for multiple linear regression analyses. Multiple logistic regression analysis was conducted to identify the association between postoperative severity and selected prognostic factors. Both bivariate and multiple linear regression analysis revealed only preoperative severity as an independent risk factor for poor prognosis, while other factors did not show any significant association. Although conflicting results exist regarding prognosis of CubTS, this study supports evidence from previous studies and concludes early surgical intervention portends the most favorable prognosis. Copyright © 2017 The Japanese Orthopaedic Association. Published by Elsevier B.V. All rights reserved.

  20. Application of stepwise multiple regression techniques to inversion of Nimbus 'IRIS' observations.

    NASA Technical Reports Server (NTRS)

    Ohring, G.

    1972-01-01

    Exploratory studies with Nimbus-3 infrared interferometer-spectrometer (IRIS) data indicate that, in addition to temperature, such meteorological parameters as geopotential heights of pressure surfaces, tropopause pressure, and tropopause temperature can be inferred from the observed spectra with the use of simple regression equations. The technique of screening the IRIS spectral data by means of stepwise regression to obtain the best radiation predictors of meteorological parameters is validated. The simplicity of application of the technique and the simplicity of the derived linear regression equations - which contain only a few terms - suggest usefulness for this approach. Based upon the results obtained, suggestions are made for further development and exploitation of the stepwise regression analysis technique.

  1. Interactions between cadmium and decabrominated diphenyl ether on blood cells count in rats-Multiple factorial regression analysis.

    PubMed

    Curcic, Marijana; Buha, Aleksandra; Stankovic, Sanja; Milovanovic, Vesna; Bulat, Zorica; Đukić-Ćosić, Danijela; Antonijević, Evica; Vučinić, Slavica; Matović, Vesna; Antonijevic, Biljana

    2017-02-01

    The objective of this study was to assess toxicity of Cd and BDE-209 mixture on haematological parameters in subacutely exposed rats and to determine the presence and type of interactions between these two chemicals using multiple factorial regression analysis. Furthermore, for the assessment of interaction type, an isobologram based methodology was applied and compared with multiple factorial regression analysis. Chemicals were given by oral gavage to the male Wistar rats weighing 200-240g for 28days. Animals were divided in 16 groups (8/group): control vehiculum group, three groups of rats were treated with 2.5, 7.5 or 15mg Cd/kg/day. These doses were chosen on the bases of literature data and reflect relatively high Cd environmental exposure, three groups of rats were treated with 1000, 2000 or 4000mg BDE-209/kg/bw/day, doses proved to induce toxic effects in rats. Furthermore, nine groups of animals were treated with different mixtures of Cd and BDE-209 containing doses of Cd and BDE-209 stated above. Blood samples were taken at the end of experiment and red blood cells, white blood cells and platelets counts were determined. For interaction assessment multiple factorial regression analysis and fitted isobologram approach were used. In this study, we focused on multiple factorial regression analysis as a method for interaction assessment. We also investigated the interactions between Cd and BDE-209 by the derived model for the description of the obtained fitted isobologram curves. Current study indicated that co-exposure to Cd and BDE-209 can result in significant decrease in RBC count, increase in WBC count and decrease in PLT count, when compared with controls. Multiple factorial regression analysis used for the assessment of interactions type between Cd and BDE-209 indicated synergism for the effect on RBC count and no interactions i.e. additivity for the effects on WBC and PLT counts. On the other hand, isobologram based approach showed slight antagonism

  2. Variables Associated with Communicative Participation in People with Multiple Sclerosis: A Regression Analysis

    ERIC Educational Resources Information Center

    Baylor, Carolyn; Yorkston, Kathryn; Bamer, Alyssa; Britton, Deanna; Amtmann, Dagmar

    2010-01-01

    Purpose: To explore variables associated with self-reported communicative participation in a sample (n = 498) of community-dwelling adults with multiple sclerosis (MS). Method: A battery of questionnaires was administered online or on paper per participant preference. Data were analyzed using multiple linear backward stepwise regression. The…

  3. Cross Validation of Selection of Variables in Multiple Regression.

    DTIC Science & Technology

    1979-12-01

    55 vii CROSS VALIDATION OF SELECTION OF VARIABLES IN MULTIPLE REGRESSION I Introduction Background Long term DoD planning gcals...028545024 .31109000 BF * SS - .008700618 .0471961 Constant - .70977903 85.146786 55 had adequate predictive capabilities; the other two models (the...71ZCO F111D Control 54 73EGO FlIID Computer, General Purpose 55 73EPO FII1D Converter-Multiplexer 56 73HAO flllD Stabilizer Platform 57 73HCO F1ID

  4. Estimate the contribution of incubation parameters influence egg hatchability using multiple linear regression analysis

    PubMed Central

    Khalil, Mohamed H.; Shebl, Mostafa K.; Kosba, Mohamed A.; El-Sabrout, Karim; Zaki, Nesma

    2016-01-01

    Aim: This research was conducted to determine the most affecting parameters on hatchability of indigenous and improved local chickens’ eggs. Materials and Methods: Five parameters were studied (fertility, early and late embryonic mortalities, shape index, egg weight, and egg weight loss) on four strains, namely Fayoumi, Alexandria, Matrouh, and Montazah. Multiple linear regression was performed on the studied parameters to determine the most influencing one on hatchability. Results: The results showed significant differences in commercial and scientific hatchability among strains. Alexandria strain has the highest significant commercial hatchability (80.70%). Regarding the studied strains, highly significant differences in hatching chick weight among strains were observed. Using multiple linear regression analysis, fertility made the greatest percent contribution (71.31%) to hatchability, and the lowest percent contributions were made by shape index and egg weight loss. Conclusion: A prediction of hatchability using multiple regression analysis could be a good tool to improve hatchability percentage in chickens. PMID:27651666

  5. Epidemiologic programs for computers and calculators. A microcomputer program for multiple logistic regression by unconditional and conditional maximum likelihood methods.

    PubMed

    Campos-Filho, N; Franco, E L

    1989-02-01

    A frequent procedure in matched case-control studies is to report results from the multivariate unmatched analyses if they do not differ substantially from the ones obtained after conditioning on the matching variables. Although conceptually simple, this rule requires that an extensive series of logistic regression models be evaluated by both the conditional and unconditional maximum likelihood methods. Most computer programs for logistic regression employ only one maximum likelihood method, which requires that the analyses be performed in separate steps. This paper describes a Pascal microcomputer (IBM PC) program that performs multiple logistic regression by both maximum likelihood estimation methods, which obviates the need for switching between programs to obtain relative risk estimates from both matched and unmatched analyses. The program calculates most standard statistics and allows factoring of categorical or continuous variables by two distinct methods of contrast. A built-in, descriptive statistics option allows the user to inspect the distribution of cases and controls across categories of any given variable.

  6. Estimating Optimal Transformations for Multiple Regression and Correlation.

    DTIC Science & Technology

    1982-07-01

    S w.EECTli1Z"", , J OCT 0 11982 u! !for Public its... .. . ESTIMATING OPTIMAL TRANSFORMATIONS FOR MULTIPLE REGRESSION AND CORRELATION by Leo...in the plot lb of *(yk) versus 1 < k < 200. Figure lc is a plot of $*(xk) versus xk. These plots clearly suggest the transformati " s 6(y) = log(y) and...direct .814 .022 ACE .808 .031 -13- Figure la6L ’ ’ I . . . S " ’ ’ . . I ’ 6- - - .4...... Co o • . o ’ 0 0.2 0.4 0.5 0.8 1 Fi gure lb2 2 2 // II / / -/

  7. The comparison between several robust ridge regression estimators in the presence of multicollinearity and multiple outliers

    NASA Astrophysics Data System (ADS)

    Zahari, Siti Meriam; Ramli, Norazan Mohamed; Moktar, Balkiah; Zainol, Mohammad Said

    2014-09-01

    In the presence of multicollinearity and multiple outliers, statistical inference of linear regression model using ordinary least squares (OLS) estimators would be severely affected and produces misleading results. To overcome this, many approaches have been investigated. These include robust methods which were reported to be less sensitive to the presence of outliers. In addition, ridge regression technique was employed to tackle multicollinearity problem. In order to mitigate both problems, a combination of ridge regression and robust methods was discussed in this study. The superiority of this approach was examined when simultaneous presence of multicollinearity and multiple outliers occurred in multiple linear regression. This study aimed to look at the performance of several well-known robust estimators; M, MM, RIDGE and robust ridge regression estimators, namely Weighted Ridge M-estimator (WRM), Weighted Ridge MM (WRMM), Ridge MM (RMM), in such a situation. Results of the study showed that in the presence of simultaneous multicollinearity and multiple outliers (in both x and y-direction), the RMM and RIDGE are more or less similar in terms of superiority over the other estimators, regardless of the number of observation, level of collinearity and percentage of outliers used. However, when outliers occurred in only single direction (y-direction), the WRMM estimator is the most superior among the robust ridge regression estimators, by producing the least variance. In conclusion, the robust ridge regression is the best alternative as compared to robust and conventional least squares estimators when dealing with simultaneous presence of multicollinearity and outliers.

  8. Factor analysis and multiple regression between topography and precipitation on Jeju Island, Korea

    NASA Astrophysics Data System (ADS)

    Um, Myoung-Jin; Yun, Hyeseon; Jeong, Chang-Sam; Heo, Jun-Haeng

    2011-11-01

    SummaryIn this study, new factors that influence precipitation were extracted from geographic variables using factor analysis, which allow for an accurate estimation of orographic precipitation. Correlation analysis was also used to examine the relationship between nine topographic variables from digital elevation models (DEMs) and the precipitation in Jeju Island. In addition, a spatial analysis was performed in order to verify the validity of the regression model. From the results of the correlation analysis, it was found that all of the topographic variables had a positive correlation with the precipitation. The relations between the variables also changed in accordance with a change in the precipitation duration. However, upon examining the correlation matrix, no significant relationship between the latitude and the aspect was found. According to the factor analysis, eight topographic variables (latitude being the exception) were found to have a direct influence on the precipitation. Three factors were then extracted from the eight topographic variables. By directly comparing the multiple regression model with the factors (model 1) to the multiple regression model with the topographic variables (model 3), it was found that model 1 did not violate the limits of statistical significance and multicollinearity. As such, model 1 was considered to be appropriate for estimating the precipitation when taking into account the topography. In the study of model 1, the multiple regression model using factor analysis was found to be the best method for estimating the orographic precipitation on Jeju Island.

  9. RRegrs: an R package for computer-aided model selection with multiple regression models.

    PubMed

    Tsiliki, Georgia; Munteanu, Cristian R; Seoane, Jose A; Fernandez-Lozano, Carlos; Sarimveis, Haralambos; Willighagen, Egon L

    2015-01-01

    Predictive regression models can be created with many different modelling approaches. Choices need to be made for data set splitting, cross-validation methods, specific regression parameters and best model criteria, as they all affect the accuracy and efficiency of the produced predictive models, and therefore, raising model reproducibility and comparison issues. Cheminformatics and bioinformatics are extensively using predictive modelling and exhibit a need for standardization of these methodologies in order to assist model selection and speed up the process of predictive model development. A tool accessible to all users, irrespectively of their statistical knowledge, would be valuable if it tests several simple and complex regression models and validation schemes, produce unified reports, and offer the option to be integrated into more extensive studies. Additionally, such methodology should be implemented as a free programming package, in order to be continuously adapted and redistributed by others. We propose an integrated framework for creating multiple regression models, called RRegrs. The tool offers the option of ten simple and complex regression methods combined with repeated 10-fold and leave-one-out cross-validation. Methods include Multiple Linear regression, Generalized Linear Model with Stepwise Feature Selection, Partial Least Squares regression, Lasso regression, and Support Vector Machines Recursive Feature Elimination. The new framework is an automated fully validated procedure which produces standardized reports to quickly oversee the impact of choices in modelling algorithms and assess the model and cross-validation results. The methodology was implemented as an open source R package, available at https://www.github.com/enanomapper/RRegrs, by reusing and extending on the caret package. The universality of the new methodology is demonstrated using five standard data sets from different scientific fields. Its efficiency in cheminformatics and QSAR

  10. Use of Multiple Regression and Use-Availability Analyses in Determining Habitat Selection by Gray Squirrels (Sciurus Carolinensis)

    Treesearch

    John W. Edwards; Susan C. Loeb; David C. Guynn

    1994-01-01

    Multiple regression and use-availability analyses are two methods for examining habitat selection. Use-availability analysis is commonly used to evaluate macrohabitat selection whereas multiple regression analysis can be used to determine microhabitat selection. We compared these techniques using behavioral observations (n = 5534) and telemetry locations (n = 2089) of...

  11. Understanding logistic regression analysis.

    PubMed

    Sperandei, Sandro

    2014-01-01

    Logistic regression is used to obtain odds ratio in the presence of more than one explanatory variable. The procedure is quite similar to multiple linear regression, with the exception that the response variable is binomial. The result is the impact of each variable on the odds ratio of the observed event of interest. The main advantage is to avoid confounding effects by analyzing the association of all variables together. In this article, we explain the logistic regression procedure using examples to make it as simple as possible. After definition of the technique, the basic interpretation of the results is highlighted and then some special issues are discussed.

  12. Parametric optimization of multiple quality characteristics in laser cutting of Inconel-718 by using hybrid approach of multiple regression analysis and genetic algorithm

    NASA Astrophysics Data System (ADS)

    Shrivastava, Prashant Kumar; Pandey, Arun Kumar

    2018-06-01

    Inconel-718 has found high demand in different industries due to their superior mechanical properties. The traditional cutting methods are facing difficulties for cutting these alloys due to their low thermal potential, lower elasticity and high chemical compatibility at inflated temperature. The challenges of machining and/or finishing of unusual shapes and/or sizes in these materials have also faced by traditional machining. Laser beam cutting may be applied for the miniaturization and ultra-precision cutting and/or finishing by appropriate control of different process parameter. This paper present multi-objective optimization the kerf deviation, kerf width and kerf taper in the laser cutting of Incone-718 sheet. The second order regression models have been developed for different quality characteristics by using the experimental data obtained through experimentation. The regression models have been used as objective function for multi-objective optimization based on the hybrid approach of multiple regression analysis and genetic algorithm. The comparison of optimization results to experimental results shows an improvement of 88%, 10.63% and 42.15% in kerf deviation, kerf width and kerf taper, respectively. Finally, the effects of different process parameters on quality characteristics have also been discussed.

  13. Cox regression analysis with missing covariates via nonparametric multiple imputation.

    PubMed

    Hsu, Chiu-Hsieh; Yu, Mandi

    2018-01-01

    We consider the situation of estimating Cox regression in which some covariates are subject to missing, and there exists additional information (including observed event time, censoring indicator and fully observed covariates) which may be predictive of the missing covariates. We propose to use two working regression models: one for predicting the missing covariates and the other for predicting the missing probabilities. For each missing covariate observation, these two working models are used to define a nearest neighbor imputing set. This set is then used to non-parametrically impute covariate values for the missing observation. Upon the completion of imputation, Cox regression is performed on the multiply imputed datasets to estimate the regression coefficients. In a simulation study, we compare the nonparametric multiple imputation approach with the augmented inverse probability weighted (AIPW) method, which directly incorporates the two working models into estimation of Cox regression, and the predictive mean matching imputation (PMM) method. We show that all approaches can reduce bias due to non-ignorable missing mechanism. The proposed nonparametric imputation method is robust to mis-specification of either one of the two working models and robust to mis-specification of the link function of the two working models. In contrast, the PMM method is sensitive to misspecification of the covariates included in imputation. The AIPW method is sensitive to the selection probability. We apply the approaches to a breast cancer dataset from Surveillance, Epidemiology and End Results (SEER) Program.

  14. [Prediction model of health workforce and beds in county hospitals of Hunan by multiple linear regression].

    PubMed

    Ling, Ru; Liu, Jiawang

    2011-12-01

    To construct prediction model for health workforce and hospital beds in county hospitals of Hunan by multiple linear regression. We surveyed 16 counties in Hunan with stratified random sampling according to uniform questionnaires,and multiple linear regression analysis with 20 quotas selected by literature view was done. Independent variables in the multiple linear regression model on medical personnels in county hospitals included the counties' urban residents' income, crude death rate, medical beds, business occupancy, professional equipment value, the number of devices valued above 10 000 yuan, fixed assets, long-term debt, medical income, medical expenses, outpatient and emergency visits, hospital visits, actual available bed days, and utilization rate of hospital beds. Independent variables in the multiple linear regression model on county hospital beds included the the population of aged 65 and above in the counties, disposable income of urban residents, medical personnel of medical institutions in county area, business occupancy, the total value of professional equipment, fixed assets, long-term debt, medical income, medical expenses, outpatient and emergency visits, hospital visits, actual available bed days, utilization rate of hospital beds, and length of hospitalization. The prediction model shows good explanatory and fitting, and may be used for short- and mid-term forecasting.

  15. Using Robust Variance Estimation to Combine Multiple Regression Estimates with Meta-Analysis

    ERIC Educational Resources Information Center

    Williams, Ryan

    2013-01-01

    The purpose of this study was to explore the use of robust variance estimation for combining commonly specified multiple regression models and for combining sample-dependent focal slope estimates from diversely specified models. The proposed estimator obviates traditionally required information about the covariance structure of the dependent…

  16. [Difference between perinatal mortality in multiple pregnancies obtained spontaneously versus assisted reproduction].

    PubMed

    del Rayo Rivas-Ortiz, Yazmín; Hernández-Herrera, Ricardo Jorge

    2010-06-01

    Recently assisted reproduction techniques are more common, which increases multiple pregnancies and adverse perinatal outcomes. Some authors report increased mortality in multiple pregnancies products obtained by techniques of assisted reproduction vs. conceived spontaneously, although other authors found no significant difference. To evaluate mortality rate of multiple pregnancies comparing those obtained by assisted reproduction vs. spontaneous conception. Retrospective, observational and comparative study. We included pregnant women with 3 or more products that went to the Unidad Médica de Alta Especialidad No. 23, IMSS, in Monterrey, NL (Mexico), between 2002-2008. We compared the number of complicated pregnancies and dead products obtained by a technique of assisted reproduction vs. spontaneous. 68 multiple pregnancies were included. On average, spontaneously conceived fetuses had more weeks of gestation and more birth weight than those achieved by assisted reproduction techniques (p = ns). 20.5% (14/68) of multiple pregnancies had one or more fatal events: 10/40 (25%) by assisted reproduction techniques vs. 4/28 (14%) of spontaneous multiple pregnancies (p = 0.22). 21/134 (16%) of the products conceived by assisted reproduction techniques and 6/88 (7%) of spontaneous (p < 0.03) died. 60% of all multiple pregnancies were obtained by a technique of assisted reproduction and 21% of the cases had one or more fatal events (11% more in pregnancies achieved by assisted reproduction techniques). 12% of the products of multiple pregnancies died (9% more in those obtained by a technique of assisted reproduction).

  17. Estimation of nutrients and organic matter in Korean swine slurry using multiple regression analysis of physical and chemical properties.

    PubMed

    Suresh, Arumuganainar; Choi, Hong Lim

    2011-10-01

    Swine waste land application has increased due to organic fertilization, but excess application in an arable system can cause environmental risk. Therefore, in situ characterizations of such resources are important prior to application. To explore this, 41 swine slurry samples were collected from Korea, and wide differences were observed in the physico-biochemical properties. However, significant (P<0.001) multiple property correlations (R²) were obtained between nutrients with specific gravity (SG), electrical conductivity (EC), total solids (TS) and pH. The different combinations of hydrometer, EC meter, drying oven and pH meter were found useful to estimate Mn, Fe, Ca, K, Al, Na, N and 5-day biochemical oxygen demands (BOD₅) at improved R² values of 0.83, 0.82, 0.77, 0.75, 0.67, 0.47, 0.88 and 0.70, respectively. The results from this study suggest that multiple property regressions can facilitate the prediction of micronutrients and organic matter much better than a single property regression for livestock waste. Copyright © 2011 Elsevier Ltd. All rights reserved.

  18. Development of multiple regression analysis instruments to predict success in advanced placement chemistry

    NASA Astrophysics Data System (ADS)

    Wagner, Kurt Collins

    2001-10-01

    This research asks the fundamental question: "What is the profile of the successful AP chemistry student?" Two populations of students are studied. The first population is comprised of students who attend or attended the South Carolina Governor's School for Science and Mathematics, a specialized high school for high ability students, and who have taken the Advanced Placement (AP) chemistry examination in the past five years. The second population is comprised of the 581 South Carolina public school students at 46 high schools who took the AP chemistry examination in 2000. The first part of the study is intended to be useful in recruitment and placement decisions for schools in the National Consortium for Specialized Secondary Schools of Mathematics, Science and Technology. The second part of the study is intended to facilitate AP chemistry recruitment in South Carolina public schools. The first part of the study was conducted by ex post facto searches of teacher and school records at the South Carolina Governor's School for Science and Mathematics. The second part of the study was conducted by obtaining school participation information from the SC Department of Education and soliciting data from the public schools. Data were collected from 440 of 581 (75.7%) of students in 35 of 46 (76.1%) of schools. Intercorrelational and Multiple Regression Analyses (MRA) have yielded different results for these two populations. For the specialized school population, the significant predictors for success in AP chemistry are PSAT Math, placement test, and PSAT Writing. For the population of SC students, significant predictors for success are PSAT Math, count of prior science courses, and PSAT Writing. Multiple Regressions have been successfully developed for the two populations studied. Recommendations for their application are made.

  19. Using regression equations built from summary data in the psychological assessment of the individual case: extension to multiple regression.

    PubMed

    Crawford, John R; Garthwaite, Paul H; Denham, Annie K; Chelune, Gordon J

    2012-12-01

    Regression equations have many useful roles in psychological assessment. Moreover, there is a large reservoir of published data that could be used to build regression equations; these equations could then be employed to test a wide variety of hypotheses concerning the functioning of individual cases. This resource is currently underused because (a) not all psychologists are aware that regression equations can be built not only from raw data but also using only basic summary data for a sample, and (b) the computations involved are tedious and prone to error. In an attempt to overcome these barriers, Crawford and Garthwaite (2007) provided methods to build and apply simple linear regression models using summary statistics as data. In the present study, we extend this work to set out the steps required to build multiple regression models from sample summary statistics and the further steps required to compute the associated statistics for drawing inferences concerning an individual case. We also develop, describe, and make available a computer program that implements these methods. Although there are caveats associated with the use of the methods, these need to be balanced against pragmatic considerations and against the alternative of either entirely ignoring a pertinent data set or using it informally to provide a clinical "guesstimate." Upgraded versions of earlier programs for regression in the single case are also provided; these add the point and interval estimates of effect size developed in the present article.

  20. Estimation of lung tumor position from multiple anatomical features on 4D-CT using multiple regression analysis.

    PubMed

    Ono, Tomohiro; Nakamura, Mitsuhiro; Hirose, Yoshinori; Kitsuda, Kenji; Ono, Yuka; Ishigaki, Takashi; Hiraoka, Masahiro

    2017-09-01

    To estimate the lung tumor position from multiple anatomical features on four-dimensional computed tomography (4D-CT) data sets using single regression analysis (SRA) and multiple regression analysis (MRA) approach and evaluate an impact of the approach on internal target volume (ITV) for stereotactic body radiotherapy (SBRT) of the lung. Eleven consecutive lung cancer patients (12 cases) underwent 4D-CT scanning. The three-dimensional (3D) lung tumor motion exceeded 5 mm. The 3D tumor position and anatomical features, including lung volume, diaphragm, abdominal wall, and chest wall positions, were measured on 4D-CT images. The tumor position was estimated by SRA using each anatomical feature and MRA using all anatomical features. The difference between the actual and estimated tumor positions was defined as the root-mean-square error (RMSE). A standard partial regression coefficient for the MRA was evaluated. The 3D lung tumor position showed a high correlation with the lung volume (R = 0.92 ± 0.10). Additionally, ITVs derived from SRA and MRA approaches were compared with ITV derived from contouring gross tumor volumes on all 10 phases of the 4D-CT (conventional ITV). The RMSE of the SRA was within 3.7 mm in all directions. Also, the RMSE of the MRA was within 1.6 mm in all directions. The standard partial regression coefficient for the lung volume was the largest and had the most influence on the estimated tumor position. Compared with conventional ITV, average percentage decrease of ITV were 31.9% and 38.3% using SRA and MRA approaches, respectively. The estimation accuracy of lung tumor position was improved by the MRA approach, which provided smaller ITV than conventional ITV. © 2017 The Authors. Journal of Applied Clinical Medical Physics published by Wiley Periodicals, Inc. on behalf of American Association of Physicists in Medicine.

  1. SOME STATISTICAL ISSUES RELATED TO MULTIPLE LINEAR REGRESSION MODELING OF BEACH BACTERIA CONCENTRATIONS

    EPA Science Inventory

    As a fast and effective technique, the multiple linear regression (MLR) method has been widely used in modeling and prediction of beach bacteria concentrations. Among previous works on this subject, however, several issues were insufficiently or inconsistently addressed. Those is...

  2. An Evaluation of Residual Feed Intake Estimates Obtained with Computer Models Versus Empirical Regression

    USDA-ARS?s Scientific Manuscript database

    Data on individual daily feed intake, bi-weekly BW, and carcass composition were obtained on 1,212 crossbred steers, in Cycle VII of the Germplasm Evaluation Project at the U.S. Meat Animal Research Center. Within animal regressions of cumulative feed intake and BW on linear and quadratic days on fe...

  3. Predicting flight delay based on multiple linear regression

    NASA Astrophysics Data System (ADS)

    Ding, Yi

    2017-08-01

    Delay of flight has been regarded as one of the toughest difficulties in aviation control. How to establish an effective model to handle the delay prediction problem is a significant work. To solve the problem that the flight delay is difficult to predict, this study proposes a method to model the arriving flights and a multiple linear regression algorithm to predict delay, comparing with Naive-Bayes and C4.5 approach. Experiments based on a realistic dataset of domestic airports show that the accuracy of the proposed model approximates 80%, which is further improved than the Naive-Bayes and C4.5 approach approaches. The result testing shows that this method is convenient for calculation, and also can predict the flight delays effectively. It can provide decision basis for airport authorities.

  4. Waste generated in high-rise buildings construction: a quantification model based on statistical multiple regression.

    PubMed

    Parisi Kern, Andrea; Ferreira Dias, Michele; Piva Kulakowski, Marlova; Paulo Gomes, Luciana

    2015-05-01

    Reducing construction waste is becoming a key environmental issue in the construction industry. The quantification of waste generation rates in the construction sector is an invaluable management tool in supporting mitigation actions. However, the quantification of waste can be a difficult process because of the specific characteristics and the wide range of materials used in different construction projects. Large variations are observed in the methods used to predict the amount of waste generated because of the range of variables involved in construction processes and the different contexts in which these methods are employed. This paper proposes a statistical model to determine the amount of waste generated in the construction of high-rise buildings by assessing the influence of design process and production system, often mentioned as the major culprits behind the generation of waste in construction. Multiple regression was used to conduct a case study based on multiple sources of data of eighteen residential buildings. The resulting statistical model produced dependent (i.e. amount of waste generated) and independent variables associated with the design and the production system used. The best regression model obtained from the sample data resulted in an adjusted R(2) value of 0.694, which means that it predicts approximately 69% of the factors involved in the generation of waste in similar constructions. Most independent variables showed a low determination coefficient when assessed in isolation, which emphasizes the importance of assessing their joint influence on the response (dependent) variable. Copyright © 2015 Elsevier Ltd. All rights reserved.

  5. Parameter estimation of multivariate multiple regression model using bayesian with non-informative Jeffreys’ prior distribution

    NASA Astrophysics Data System (ADS)

    Saputro, D. R. S.; Amalia, F.; Widyaningsih, P.; Affan, R. C.

    2018-05-01

    Bayesian method is a method that can be used to estimate the parameters of multivariate multiple regression model. Bayesian method has two distributions, there are prior and posterior distributions. Posterior distribution is influenced by the selection of prior distribution. Jeffreys’ prior distribution is a kind of Non-informative prior distribution. This prior is used when the information about parameter not available. Non-informative Jeffreys’ prior distribution is combined with the sample information resulting the posterior distribution. Posterior distribution is used to estimate the parameter. The purposes of this research is to estimate the parameters of multivariate regression model using Bayesian method with Non-informative Jeffreys’ prior distribution. Based on the results and discussion, parameter estimation of β and Σ which were obtained from expected value of random variable of marginal posterior distribution function. The marginal posterior distributions for β and Σ are multivariate normal and inverse Wishart. However, in calculation of the expected value involving integral of a function which difficult to determine the value. Therefore, approach is needed by generating of random samples according to the posterior distribution characteristics of each parameter using Markov chain Monte Carlo (MCMC) Gibbs sampling algorithm.

  6. Interpret with caution: multicollinearity in multiple regression of cognitive data.

    PubMed

    Morrison, Catriona M

    2003-08-01

    Shibihara and Kondo in 2002 reported a reanalysis of the 1997 Kanji picture-naming data of Yamazaki, Ellis, Morrison, and Lambon-Ralph in which independent variables were highly correlated. Their addition of the variable visual familiarity altered the previously reported pattern of results, indicating that visual familiarity, but not age of acquisition, was important in predicting Kanji naming speed. The present paper argues that caution should be taken when drawing conclusions from multiple regression analyses in which the independent variables are so highly correlated, as such multicollinearity can lead to unreliable output.

  7. Advanced statistics: linear regression, part I: simple linear regression.

    PubMed

    Marill, Keith A

    2004-01-01

    Simple linear regression is a mathematical technique used to model the relationship between a single independent predictor variable and a single dependent outcome variable. In this, the first of a two-part series exploring concepts in linear regression analysis, the four fundamental assumptions and the mechanics of simple linear regression are reviewed. The most common technique used to derive the regression line, the method of least squares, is described. The reader will be acquainted with other important concepts in simple linear regression, including: variable transformations, dummy variables, relationship to inference testing, and leverage. Simplified clinical examples with small datasets and graphic models are used to illustrate the points. This will provide a foundation for the second article in this series: a discussion of multiple linear regression, in which there are multiple predictor variables.

  8. Comparing the index-flood and multiple-regression methods using L-moments

    NASA Astrophysics Data System (ADS)

    Malekinezhad, H.; Nachtnebel, H. P.; Klik, A.

    In arid and semi-arid regions, the length of records is usually too short to ensure reliable quantile estimates. Comparing index-flood and multiple-regression analyses based on L-moments was the main objective of this study. Factor analysis was applied to determine main influencing variables on flood magnitude. Ward’s cluster and L-moments approaches were applied to several sites in the Namak-Lake basin in central Iran to delineate homogeneous regions based on site characteristics. Homogeneity test was done using L-moments-based measures. Several distributions were fitted to the regional flood data and index-flood and multiple-regression methods as two regional flood frequency methods were compared. The results of factor analysis showed that length of main waterway, compactness coefficient, mean annual precipitation, and mean annual temperature were the main variables affecting flood magnitude. The study area was divided into three regions based on the Ward’s method of clustering approach. The homogeneity test based on L-moments showed that all three regions were acceptably homogeneous. Five distributions were fitted to the annual peak flood data of three homogeneous regions. Using the L-moment ratios and the Z-statistic criteria, GEV distribution was identified as the most robust distribution among five candidate distributions for all the proposed sub-regions of the study area, and in general, it was concluded that the generalised extreme value distribution was the best-fit distribution for every three regions. The relative root mean square error (RRMSE) measure was applied for evaluating the performance of the index-flood and multiple-regression methods in comparison with the curve fitting (plotting position) method. In general, index-flood method gives more reliable estimations for various flood magnitudes of different recurrence intervals. Therefore, this method should be adopted as regional flood frequency method for the study area and the Namak-Lake basin

  9. Testing Mediation Using Multiple Regression and Structural Equation Modeling Analyses in Secondary Data

    ERIC Educational Resources Information Center

    Li, Spencer D.

    2011-01-01

    Mediation analysis in child and adolescent development research is possible using large secondary data sets. This article provides an overview of two statistical methods commonly used to test mediated effects in secondary analysis: multiple regression and structural equation modeling (SEM). Two empirical studies are presented to illustrate the…

  10. Determination of osteoporosis risk factors using a multiple logistic regression model in postmenopausal Turkish women.

    PubMed

    Akkus, Zeki; Camdeviren, Handan; Celik, Fatma; Gur, Ali; Nas, Kemal

    2005-09-01

    To determine the risk factors of osteoporosis using a multiple binary logistic regression method and to assess the risk variables for osteoporosis, which is a major and growing health problem in many countries. We presented a case-control study, consisting of 126 postmenopausal healthy women as control group and 225 postmenopausal osteoporotic women as the case group. The study was carried out in the Department of Physical Medicine and Rehabilitation, Dicle University, Diyarbakir, Turkey between 1999-2002. The data from the 351 participants were collected using a standard questionnaire that contains 43 variables. A multiple logistic regression model was then used to evaluate the data and to find the best regression model. We classified 80.1% (281/351) of the participants using the regression model. Furthermore, the specificity value of the model was 67% (84/126) of the control group while the sensitivity value was 88% (197/225) of the case group. We found the distribution of residual values standardized for final model to be exponential using the Kolmogorow-Smirnow test (p=0.193). The receiver operating characteristic curve was found successful to predict patients with risk for osteoporosis. This study suggests that low levels of dietary calcium intake, physical activity, education, and longer duration of menopause are independent predictors of the risk of low bone density in our population. Adequate dietary calcium intake in combination with maintaining a daily physical activity, increasing educational level, decreasing birth rate, and duration of breast-feeding may contribute to healthy bones and play a role in practical prevention of osteoporosis in Southeast Anatolia. In addition, the findings of the present study indicate that the use of multivariate statistical method as a multiple logistic regression in osteoporosis, which maybe influenced by many variables, is better than univariate statistical evaluation.

  11. Suppression Situations in Multiple Linear Regression

    ERIC Educational Resources Information Center

    Shieh, Gwowen

    2006-01-01

    This article proposes alternative expressions for the two most prevailing definitions of suppression without resorting to the standardized regression modeling. The formulation provides a simple basis for the examination of their relationship. For the two-predictor regression, the author demonstrates that the previous results in the literature are…

  12. Multiple linear regression analysis

    NASA Technical Reports Server (NTRS)

    Edwards, T. R.

    1980-01-01

    Program rapidly selects best-suited set of coefficients. User supplies only vectors of independent and dependent data and specifies confidence level required. Program uses stepwise statistical procedure for relating minimal set of variables to set of observations; final regression contains only most statistically significant coefficients. Program is written in FORTRAN IV for batch execution and has been implemented on NOVA 1200.

  13. MULTIPLE REGRESSION MODELS FOR HINDCASTING AND FORECASTING MIDSUMMER HYPOXIA IN THE GULF OF MEXICO

    EPA Science Inventory

    A new suite of multiple regression models were developed that describe the relationship between the area of bottom water hypoxia along the northern Gulf of Mexico and Mississippi-Atchafalaya River nitrate concentration, total phosphorus (TP) concentration, and discharge. Variabil...

  14. Building a new predictor for multiple linear regression technique-based corrective maintenance turnaround time.

    PubMed

    Cruz, Antonio M; Barr, Cameron; Puñales-Pozo, Elsa

    2008-01-01

    This research's main goals were to build a predictor for a turnaround time (TAT) indicator for estimating its values and use a numerical clustering technique for finding possible causes of undesirable TAT values. The following stages were used: domain understanding, data characterisation and sample reduction and insight characterisation. Building the TAT indicator multiple linear regression predictor and clustering techniques were used for improving corrective maintenance task efficiency in a clinical engineering department (CED). The indicator being studied was turnaround time (TAT). Multiple linear regression was used for building a predictive TAT value model. The variables contributing to such model were clinical engineering department response time (CE(rt), 0.415 positive coefficient), stock service response time (Stock(rt), 0.734 positive coefficient), priority level (0.21 positive coefficient) and service time (0.06 positive coefficient). The regression process showed heavy reliance on Stock(rt), CE(rt) and priority, in that order. Clustering techniques revealed the main causes of high TAT values. This examination has provided a means for analysing current technical service quality and effectiveness. In doing so, it has demonstrated a process for identifying areas and methods of improvement and a model against which to analyse these methods' effectiveness.

  15. Multiple linear regression and regression with time series error models in forecasting PM10 concentrations in Peninsular Malaysia.

    PubMed

    Ng, Kar Yong; Awang, Norhashidah

    2018-01-06

    Frequent haze occurrences in Malaysia have made the management of PM 10 (particulate matter with aerodynamic less than 10 μm) pollution a critical task. This requires knowledge on factors associating with PM 10 variation and good forecast of PM 10 concentrations. Hence, this paper demonstrates the prediction of 1-day-ahead daily average PM 10 concentrations based on predictor variables including meteorological parameters and gaseous pollutants. Three different models were built. They were multiple linear regression (MLR) model with lagged predictor variables (MLR1), MLR model with lagged predictor variables and PM 10 concentrations (MLR2) and regression with time series error (RTSE) model. The findings revealed that humidity, temperature, wind speed, wind direction, carbon monoxide and ozone were the main factors explaining the PM 10 variation in Peninsular Malaysia. Comparison among the three models showed that MLR2 model was on a same level with RTSE model in terms of forecasting accuracy, while MLR1 model was the worst.

  16. Assessing the Impact of Influential Observations on Multiple Regression Analysis on Human Resource Research.

    ERIC Educational Resources Information Center

    Bates, Reid A.; Holton, Elwood F., III; Burnett, Michael F.

    1999-01-01

    A case study of learning transfer demonstrates the possible effect of influential observation on linear regression analysis. A diagnostic method that tests for violation of assumptions, multicollinearity, and individual and multiple influential observations helps determine which observation to delete to eliminate bias. (SK)

  17. Fast Quantitative Analysis Of Museum Objects Using Laser-Induced Breakdown Spectroscopy And Multiple Regression Algorithms

    NASA Astrophysics Data System (ADS)

    Lorenzetti, G.; Foresta, A.; Palleschi, V.; Legnaioli, S.

    2009-09-01

    The recent development of mobile instrumentation, specifically devoted to in situ analysis and study of museum objects, allows the acquisition of many LIBS spectra in very short time. However, such large amount of data calls for new analytical approaches which would guarantee a prompt analysis of the results obtained. In this communication, we will present and discuss the advantages of statistical analytical methods, such as Partial Least Squares Multiple Regression algorithms vs. the classical calibration curve approach. PLS algorithms allows to obtain in real time the information on the composition of the objects under study; this feature of the method, compared to the traditional off-line analysis of the data, is extremely useful for the optimization of the measurement times and number of points associated with the analysis. In fact, the real time availability of the compositional information gives the possibility of concentrating the attention on the most `interesting' parts of the object, without over-sampling the zones which would not provide useful information for the scholars or the conservators. Some example on the applications of this method will be presented, including the studies recently performed by the researcher of the Applied Laser Spectroscopy Laboratory on museum bronze objects.

  18. A Simple and Convenient Method of Multiple Linear Regression to Calculate Iodine Molecular Constants

    ERIC Educational Resources Information Center

    Cooper, Paul D.

    2010-01-01

    A new procedure using a student-friendly least-squares multiple linear-regression technique utilizing a function within Microsoft Excel is described that enables students to calculate molecular constants from the vibronic spectrum of iodine. This method is advantageous pedagogically as it calculates molecular constants for ground and excited…

  19. Multiple Regression Redshift Calibration for Clusters of Galaxies

    NASA Astrophysics Data System (ADS)

    Kalinkov, M.; Kuneva, I.; Valtchanov, I.

    A new procedure for calibration of distances to ACO (Abell et al.1989) clusters of galaxies has been developed. In the previous version of the Reference Catalog of ACO Clusters of Galaxies (Kalinkov & Kuneva 1992) an attempt has been made to compare various calibration schemes. For the Version 93 we have made some refinements. Many improvements from the early days of the photometric calibration have been made --- from Rowan-Robinson (1972), Corwin (1974), Kalinkov & Kuneva (1975), Mills Hoskins (1977) to more complicated --- Leir & van den Bergh (1977), Postman et al.(1985), Kalinkov Kuneva (1985, 1986, 1990), Scaramella et al.(1991), Zucca et al. (1993). It was shown that it is impossible to use the same calibration relation for northern (A) and southern (ACO) clusters of galaxies. Therefore the calibration have to be made separately for both catalogs. Moreover it is better if one could find relations for the 274 A-clusters, studied by the authors of ACO. We use the luminosity distance for H0=100km/s/Mpc and q0 = 0.5 and we have 1200 clusters with measured redshifts. The first step is to fit log(z) on m10 (magnitude of the tenth rank galaxy) for A-clusters and on m1, m3 and m10 for ACO clusters. The second step is to take into account the K-correction and the Scott effect (Postman et al.1985) with iterative process. To avoid the initial errors of the redshift estimates in A- and ACO catalogs we adopt Hubble's law for the apparent radial distribution of galaxies in clusters. This enable us to calculate a new cluster richness from preliminary redshift estimate. This is the third step. Further continues the study of the correlation matrix between log(z) and prospective predictors --- new richness groups, BM, RS and A types, radio and X-ray fluxes, apparent separations between the first three brightest galaxies, mean population (gal/sq.deg), Multiple linear as well as nonlinear regression estimators are found. Many clusters that deviate by more than 2.5 sigmas are

  20. Using the Coefficient of Determination "R"[superscript 2] to Test the Significance of Multiple Linear Regression

    ERIC Educational Resources Information Center

    Quinino, Roberto C.; Reis, Edna A.; Bessegato, Lupercio F.

    2013-01-01

    This article proposes the use of the coefficient of determination as a statistic for hypothesis testing in multiple linear regression based on distributions acquired by beta sampling. (Contains 3 figures.)

  1. INTRODUCTION TO A COMBINED MULTIPLE LINEAR REGRESSION AND ARMA MODELING APPROACH FOR BEACH BACTERIA PREDICTION

    EPA Science Inventory

    Due to the complexity of the processes contributing to beach bacteria concentrations, many researchers rely on statistical modeling, among which multiple linear regression (MLR) modeling is most widely used. Despite its ease of use and interpretation, there may be time dependence...

  2. A Spreadsheet Tool for Learning the Multiple Regression F-Test, T-Tests, and Multicollinearity

    ERIC Educational Resources Information Center

    Martin, David

    2008-01-01

    This note presents a spreadsheet tool that allows teachers the opportunity to guide students towards answering on their own questions related to the multiple regression F-test, the t-tests, and multicollinearity. The note demonstrates approaches for using the spreadsheet that might be appropriate for three different levels of statistics classes,…

  3. Toward customer-centric organizational science: A common language effect size indicator for multiple linear regressions and regressions with higher-order terms.

    PubMed

    Krasikova, Dina V; Le, Huy; Bachura, Eric

    2018-06-01

    To address a long-standing concern regarding a gap between organizational science and practice, scholars called for more intuitive and meaningful ways of communicating research results to users of academic research. In this article, we develop a common language effect size index (CLβ) that can help translate research results to practice. We demonstrate how CLβ can be computed and used to interpret the effects of continuous and categorical predictors in multiple linear regression models. We also elaborate on how the proposed CLβ index is computed and used to interpret interactions and nonlinear effects in regression models. In addition, we test the robustness of the proposed index to violations of normality and provide means for computing standard errors and constructing confidence intervals around its estimates. (PsycINFO Database Record (c) 2018 APA, all rights reserved).

  4. Time-localized wavelet multiple regression and correlation

    NASA Astrophysics Data System (ADS)

    Fernández-Macho, Javier

    2018-02-01

    This paper extends wavelet methodology to handle comovement dynamics of multivariate time series via moving weighted regression on wavelet coefficients. The concept of wavelet local multiple correlation is used to produce one single set of multiscale correlations along time, in contrast with the large number of wavelet correlation maps that need to be compared when using standard pairwise wavelet correlations with rolling windows. Also, the spectral properties of weight functions are investigated and it is argued that some common time windows, such as the usual rectangular rolling window, are not satisfactory on these grounds. The method is illustrated with a multiscale analysis of the comovements of Eurozone stock markets during this century. It is shown how the evolution of the correlation structure in these markets has been far from homogeneous both along time and across timescales featuring an acute divide across timescales at about the quarterly scale. At longer scales, evidence from the long-term correlation structure can be interpreted as stable perfect integration among Euro stock markets. On the other hand, at intramonth and intraweek scales, the short-term correlation structure has been clearly evolving along time, experiencing a sharp increase during financial crises which may be interpreted as evidence of financial 'contagion'.

  5. Overcoming multicollinearity in multiple regression using correlation coefficient

    NASA Astrophysics Data System (ADS)

    Zainodin, H. J.; Yap, S. J.

    2013-09-01

    Multicollinearity happens when there are high correlations among independent variables. In this case, it would be difficult to distinguish between the contributions of these independent variables to that of the dependent variable as they may compete to explain much of the similar variance. Besides, the problem of multicollinearity also violates the assumption of multiple regression: that there is no collinearity among the possible independent variables. Thus, an alternative approach is introduced in overcoming the multicollinearity problem in achieving a well represented model eventually. This approach is accomplished by removing the multicollinearity source variables on the basis of the correlation coefficient values based on full correlation matrix. Using the full correlation matrix can facilitate the implementation of Excel function in removing the multicollinearity source variables. It is found that this procedure is easier and time-saving especially when dealing with greater number of independent variables in a model and a large number of all possible models. Hence, in this paper detailed insight of the procedure is shown, compared and implemented.

  6. Multiple Ordinal Regression by Maximizing the Sum of Margins

    PubMed Central

    Hamsici, Onur C.; Martinez, Aleix M.

    2016-01-01

    Human preferences are usually measured using ordinal variables. A system whose goal is to estimate the preferences of humans and their underlying decision mechanisms requires to learn the ordering of any given sample set. We consider the solution of this ordinal regression problem using a Support Vector Machine algorithm. Specifically, the goal is to learn a set of classifiers with common direction vectors and different biases correctly separating the ordered classes. Current algorithms are either required to solve a quadratic optimization problem, which is computationally expensive, or are based on maximizing the minimum margin (i.e., a fixed margin strategy) between a set of hyperplanes, which biases the solution to the closest margin. Another drawback of these strategies is that they are limited to order the classes using a single ranking variable (e.g., perceived length). In this paper, we define a multiple ordinal regression algorithm based on maximizing the sum of the margins between every consecutive class with respect to one or more rankings (e.g., perceived length and weight). We provide derivations of an efficient, easy-to-implement iterative solution using a Sequential Minimal Optimization procedure. We demonstrate the accuracy of our solutions in several datasets. In addition, we provide a key application of our algorithms in estimating human subjects’ ordinal classification of attribute associations to object categories. We show that these ordinal associations perform better than the binary one typically employed in the literature. PMID:26529784

  7. [Establishment of multiple regression model for virulence factors of Saccharomyces albicans by random amplified polymorphic DNA bands].

    PubMed

    Liu, Qi; Wu, Youcong; Yuan, Youhua; Bai, Li; Niu, Kun

    2011-12-01

    To research the relationship between the virulence factors of Saccharomyces albicans (S. albicans) and the random amplified polymorphic DNA (RAPD) bands of them, and establish the regression model by multiple regression analysis. Extracellular phospholipase, secreted proteinase, ability to generate germ tubes and adhere to oral mucosal cells of 92 strains of S. albicans were measured in vitro; RAPD-polymerase chain reaction (RAPD-PCR) was used to get their bands. Multiple regression for virulence factors of S. albicans and RAPD-PCR bands was established. The extracellular phospholipase activity was associated with 4 RAPD bands: 350, 450, 650 and 1 300 bp (P < 0.05); secreted proteinase activity of S. albicans was associated with 2 bands: 350 and 1 200 bp (P < 0.05); the ability of germ tube produce was associated with 2 bands: 400 and 550 bp (P < 0.05). Some RAPD bands will reflect the virulence factors of S. albicans indirectly. These bands would contain some important messages for regulation of S. albicans virulence factors.

  8. Persistence of Multiple Tumor-Specific T-Cell Clones Is Associated with Complete Tumor Regression in a Melanoma Patient Receiving Adoptive Cell Transfer Therapy

    PubMed Central

    Zhou, Juhua; Dudley, Mark E.; Rosenberg, Steven A.; Robbins, Paul F.

    2007-01-01

    Summary The authors recently reported that adoptive immunotherapy with autologous tumor-reactive tumor infiltrating lymphocytes (TILs) immediately following a conditioning nonmyeloablative chemotherapy regimen resulted in an enhanced clinical response rate in patients with metastatic melanoma. These observations led to the current studies, which are focused on a detailed analysis of the T-cell antigen reactivity as well as the in vivo persistence of T cells in melanoma patient 2098, who experienced a complete regression of all metastatic lesions in lungs and soft tissues following therapy. Screening of an autologous tumor cell cDNA library using transferred TILs resulted in the identification of novel mutated growth arrest-specific gene 7 (GAS7) and glyceral-dehyde-3-phosphate dehydrogenase (GAPDH) gene transcripts. Direct sequence analysis of the expressed T-cell receptor beta chain variable regions showed that the transferred TILs contained multiple T-cell clonotypes, at least six of which persisted in peripheral blood for a month or more following transfer. The persistent T cells recognized both the mutated GAS7 and GAPDH. These persistent tumor-reactive T-cell clones were detected in tumor cell samples obtained from the patient following adoptive cell transfer and appeared to be represented at higher levels in the tumor sample obtained 1 month following transfer than in the peripheral blood obtained at the same time. Overall, these results indicate that multiple tumor-reactive T cells can persist in the peripheral blood and at the tumor site for prolonged times following adoptive transfer and thus may be responsible for the complete tumor regression in this patient. PMID:15614045

  9. Adjusted variable plots for Cox's proportional hazards regression model.

    PubMed

    Hall, C B; Zeger, S L; Bandeen-Roche, K J

    1996-01-01

    Adjusted variable plots are useful in linear regression for outlier detection and for qualitative evaluation of the fit of a model. In this paper, we extend adjusted variable plots to Cox's proportional hazards model for possibly censored survival data. We propose three different plots: a risk level adjusted variable (RLAV) plot in which each observation in each risk set appears, a subject level adjusted variable (SLAV) plot in which each subject is represented by one point, and an event level adjusted variable (ELAV) plot in which the entire risk set at each failure event is represented by a single point. The latter two plots are derived from the RLAV by combining multiple points. In each point, the regression coefficient and standard error from a Cox proportional hazards regression is obtained by a simple linear regression through the origin fit to the coordinates of the pictured points. The plots are illustrated with a reanalysis of a dataset of 65 patients with multiple myeloma.

  10. [Application of SAS macro to evaluated multiplicative and additive interaction in logistic and Cox regression in clinical practices].

    PubMed

    Nie, Z Q; Ou, Y Q; Zhuang, J; Qu, Y J; Mai, J Z; Chen, J M; Liu, X Q

    2016-05-01

    Conditional logistic regression analysis and unconditional logistic regression analysis are commonly used in case control study, but Cox proportional hazard model is often used in survival data analysis. Most literature only refer to main effect model, however, generalized linear model differs from general linear model, and the interaction was composed of multiplicative interaction and additive interaction. The former is only statistical significant, but the latter has biological significance. In this paper, macros was written by using SAS 9.4 and the contrast ratio, attributable proportion due to interaction and synergy index were calculated while calculating the items of logistic and Cox regression interactions, and the confidence intervals of Wald, delta and profile likelihood were used to evaluate additive interaction for the reference in big data analysis in clinical epidemiology and in analysis of genetic multiplicative and additive interactions.

  11. Correlation Weights in Multiple Regression

    ERIC Educational Resources Information Center

    Waller, Niels G.; Jones, Jeff A.

    2010-01-01

    A general theory on the use of correlation weights in linear prediction has yet to be proposed. In this paper we take initial steps in developing such a theory by describing the conditions under which correlation weights perform well in population regression models. Using OLS weights as a comparison, we define cases in which the two weighting…

  12. Overall Preference of Running Shoes Can Be Predicted by Suitable Perception Factors Using a Multiple Regression Model.

    PubMed

    Tay, Cheryl Sihui; Sterzing, Thorsten; Lim, Chen Yen; Ding, Rui; Kong, Pui Wah

    2017-05-01

    This study examined (a) the strength of four individual footwear perception factors to influence the overall preference of running shoes and (b) whether these perception factors satisfied the nonmulticollinear assumption in a regression model. Running footwear must fulfill multiple functional criteria to satisfy its potential users. Footwear perception factors, such as fit and cushioning, are commonly used to guide shoe design and development, but it is unclear whether running-footwear users are able to differentiate one factor from another. One hundred casual runners assessed four running shoes on a 15-cm visual analogue scale for four footwear perception factors (fit, cushioning, arch support, and stability) as well as for overall preference during a treadmill running protocol. Diagnostic tests showed an absence of multicollinearity between factors, where values for tolerance ranged from .36 to .72, corresponding to variance inflation factors of 2.8 to 1.4. The multiple regression model of these four footwear perception variables accounted for 77.7% to 81.6% of variance in overall preference, with each factor explaining a unique part of the total variance. Casual runners were able to rate each footwear perception factor separately, thus assigning each factor a true potential to improve overall preference for the users. The results also support the use of a multiple regression model of footwear perception factors to predict overall running shoe preference. Regression modeling is a useful tool for running-shoe manufacturers to more precisely evaluate how individual factors contribute to the subjective assessment of running footwear.

  13. Predicting Final GPA of Graduate School Students: Comparing Artificial Neural Networking and Simultaneous Multiple Regression

    ERIC Educational Resources Information Center

    Anderson, Joan L.

    2006-01-01

    Data from graduate student applications at a large Western university were used to determine which factors were the best predictors of success in graduate school, as defined by cumulative graduate grade point average. Two statistical models were employed and compared: artificial neural networking and simultaneous multiple regression. Both models…

  14. Computational Tools for Probing Interactions in Multiple Linear Regression, Multilevel Modeling, and Latent Curve Analysis

    ERIC Educational Resources Information Center

    Preacher, Kristopher J.; Curran, Patrick J.; Bauer, Daniel J.

    2006-01-01

    Simple slopes, regions of significance, and confidence bands are commonly used to evaluate interactions in multiple linear regression (MLR) models, and the use of these techniques has recently been extended to multilevel or hierarchical linear modeling (HLM) and latent curve analysis (LCA). However, conducting these tests and plotting the…

  15. Multiple network-constrained regressions expand insights into influenza vaccination responses.

    PubMed

    Avey, Stefan; Mohanty, Subhasis; Wilson, Jean; Zapata, Heidi; Joshi, Samit R; Siconolfi, Barbara; Tsang, Sui; Shaw, Albert C; Kleinstein, Steven H

    2017-07-15

    Systems immunology leverages recent technological advancements that enable broad profiling of the immune system to better understand the response to infection and vaccination, as well as the dysregulation that occurs in disease. An increasingly common approach to gain insights from these large-scale profiling experiments involves the application of statistical learning methods to predict disease states or the immune response to perturbations. However, the goal of many systems studies is not to maximize accuracy, but rather to gain biological insights. The predictors identified using current approaches can be biologically uninterpretable or present only one of many equally predictive models, leading to a narrow understanding of the underlying biology. Here we show that incorporating prior biological knowledge within a logistic modeling framework by using network-level constraints on transcriptional profiling data significantly improves interpretability. Moreover, incorporating different types of biological knowledge produces models that highlight distinct aspects of the underlying biology, while maintaining predictive accuracy. We propose a new framework, Logistic Multiple Network-constrained Regression (LogMiNeR), and apply it to understand the mechanisms underlying differential responses to influenza vaccination. Although standard logistic regression approaches were predictive, they were minimally interpretable. Incorporating prior knowledge using LogMiNeR led to models that were equally predictive yet highly interpretable. In this context, B cell-specific genes and mTOR signaling were associated with an effective vaccination response in young adults. Overall, our results demonstrate a new paradigm for analyzing high-dimensional immune profiling data in which multiple networks encoding prior knowledge are incorporated to improve model interpretability. The R source code described in this article is publicly available at https

  16. Fundamental Analysis of the Linear Multiple Regression Technique for Quantification of Water Quality Parameters from Remote Sensing Data. Ph.D. Thesis - Old Dominion Univ.

    NASA Technical Reports Server (NTRS)

    Whitlock, C. H., III

    1977-01-01

    Constituents with linear radiance gradients with concentration may be quantified from signals which contain nonlinear atmospheric and surface reflection effects for both homogeneous and non-homogeneous water bodies provided accurate data can be obtained and nonlinearities are constant with wavelength. Statistical parameters must be used which give an indication of bias as well as total squared error to insure that an equation with an optimum combination of bands is selected. It is concluded that the effect of error in upwelled radiance measurements is to reduce the accuracy of the least square fitting process and to increase the number of points required to obtain a satisfactory fit. The problem of obtaining a multiple regression equation that is extremely sensitive to error is discussed.

  17. Using Multiple and Logistic Regression to Estimate the Median WillCost and Probability of Cost and Schedule Overrun for Program Managers

    DTIC Science & Technology

    2017-03-23

    PUBLIC RELEASE; DISTRIBUTION UNLIMITED Using Multiple and Logistic Regression to Estimate the Median Will- Cost and Probability of Cost and... Cost and Probability of Cost and Schedule Overrun for Program Managers Ryan C. Trudelle Follow this and additional works at: https://scholar.afit.edu...afit.edu. Recommended Citation Trudelle, Ryan C., "Using Multiple and Logistic Regression to Estimate the Median Will- Cost and Probability of Cost and

  18. Multiple regression based imputation for individualizing template human model from a small number of measured dimensions.

    PubMed

    Nohara, Ryuki; Endo, Yui; Murai, Akihiko; Takemura, Hiroshi; Kouchi, Makiko; Tada, Mitsunori

    2016-08-01

    Individual human models are usually created by direct 3D scanning or deforming a template model according to the measured dimensions. In this paper, we propose a method to estimate all the necessary dimensions (full set) for the human model individualization from a small number of measured dimensions (subset) and human dimension database. For this purpose, we solved multiple regression equation from the dimension database given full set dimensions as the objective variable and subset dimensions as the explanatory variables. Thus, the full set dimensions are obtained by simply multiplying the subset dimensions to the coefficient matrix of the regression equation. We verified the accuracy of our method by imputing hand, foot, and whole body dimensions from their dimension database. The leave-one-out cross validation is employed in this evaluation. The mean absolute errors (MAE) between the measured and the estimated dimensions computed from 4 dimensions (hand length, breadth, middle finger breadth at proximal, and middle finger depth at proximal) in the hand, 2 dimensions (foot length, breadth, and lateral malleolus height) in the foot, and 1 dimension (height) and weight in the whole body are computed. The average MAE of non-measured dimensions were 4.58% in the hand, 4.42% in the foot, and 3.54% in the whole body, while that of measured dimensions were 0.00%.

  19. A Comparison between Multiple Regression Models and CUN-BAE Equation to Predict Body Fat in Adults

    PubMed Central

    Fuster-Parra, Pilar; Bennasar-Veny, Miquel; Tauler, Pedro; Yañez, Aina; López-González, Angel A.; Aguiló, Antoni

    2015-01-01

    Background Because the accurate measure of body fat (BF) is difficult, several prediction equations have been proposed. The aim of this study was to compare different multiple regression models to predict BF, including the recently reported CUN-BAE equation. Methods Multi regression models using body mass index (BMI) and body adiposity index (BAI) as predictors of BF will be compared. These models will be also compared with the CUN-BAE equation. For all the analysis a sample including all the participants and another one including only the overweight and obese subjects will be considered. The BF reference measure was made using Bioelectrical Impedance Analysis. Results The simplest models including only BMI or BAI as independent variables showed that BAI is a better predictor of BF. However, adding the variable sex to both models made BMI a better predictor than the BAI. For both the whole group of participants and the group of overweight and obese participants, using simple models (BMI, age and sex as variables) allowed obtaining similar correlations with BF as when the more complex CUN-BAE was used (ρ = 0:87 vs. ρ = 0:86 for the whole sample and ρ = 0:88 vs. ρ = 0:89 for overweight and obese subjects, being the second value the one for CUN-BAE). Conclusions There are simpler models than CUN-BAE equation that fits BF as well as CUN-BAE does. Therefore, it could be considered that CUN-BAE overfits. Using a simple linear regression model, the BAI, as the only variable, predicts BF better than BMI. However, when the sex variable is introduced, BMI becomes the indicator of choice to predict BF. PMID:25821960

  20. A comparison between multiple regression models and CUN-BAE equation to predict body fat in adults.

    PubMed

    Fuster-Parra, Pilar; Bennasar-Veny, Miquel; Tauler, Pedro; Yañez, Aina; López-González, Angel A; Aguiló, Antoni

    2015-01-01

    Because the accurate measure of body fat (BF) is difficult, several prediction equations have been proposed. The aim of this study was to compare different multiple regression models to predict BF, including the recently reported CUN-BAE equation. Multi regression models using body mass index (BMI) and body adiposity index (BAI) as predictors of BF will be compared. These models will be also compared with the CUN-BAE equation. For all the analysis a sample including all the participants and another one including only the overweight and obese subjects will be considered. The BF reference measure was made using Bioelectrical Impedance Analysis. The simplest models including only BMI or BAI as independent variables showed that BAI is a better predictor of BF. However, adding the variable sex to both models made BMI a better predictor than the BAI. For both the whole group of participants and the group of overweight and obese participants, using simple models (BMI, age and sex as variables) allowed obtaining similar correlations with BF as when the more complex CUN-BAE was used (ρ = 0:87 vs. ρ = 0:86 for the whole sample and ρ = 0:88 vs. ρ = 0:89 for overweight and obese subjects, being the second value the one for CUN-BAE). There are simpler models than CUN-BAE equation that fits BF as well as CUN-BAE does. Therefore, it could be considered that CUN-BAE overfits. Using a simple linear regression model, the BAI, as the only variable, predicts BF better than BMI. However, when the sex variable is introduced, BMI becomes the indicator of choice to predict BF.

  1. Introduction to the use of regression models in epidemiology.

    PubMed

    Bender, Ralf

    2009-01-01

    Regression modeling is one of the most important statistical techniques used in analytical epidemiology. By means of regression models the effect of one or several explanatory variables (e.g., exposures, subject characteristics, risk factors) on a response variable such as mortality or cancer can be investigated. From multiple regression models, adjusted effect estimates can be obtained that take the effect of potential confounders into account. Regression methods can be applied in all epidemiologic study designs so that they represent a universal tool for data analysis in epidemiology. Different kinds of regression models have been developed in dependence on the measurement scale of the response variable and the study design. The most important methods are linear regression for continuous outcomes, logistic regression for binary outcomes, Cox regression for time-to-event data, and Poisson regression for frequencies and rates. This chapter provides a nontechnical introduction to these regression models with illustrating examples from cancer research.

  2. Multiple regression analysis in nomogram development for myopic wavefront laser in situ keratomileusis: Improving astigmatic outcomes.

    PubMed

    Allan, Bruce D; Hassan, Hala; Ieong, Alvin

    2015-05-01

    To describe and evaluate a new multiple regression-derived nomogram for myopic wavefront laser in situ keratomileusis (LASIK). Moorfields Eye Hospital, London, United Kingdom. Prospective comparative case series. Multiple regression modeling was used to derive a simplified formula for adjusting attempted spherical correction in myopic LASIK. An adaptation of Thibos' power vector method was then applied to derive adjustments to attempted cylindrical correction in eyes with 1.0 diopter (D) or more of preoperative cylinder. These elements were combined in a new nomogram (nomogram II). The 3-month refractive results for myopic wavefront LASIK (spherical equivalent ≤11.0 D; cylinder ≤4.5 D) were compared between 299 consecutive eyes treated using the earlier nomogram (nomogram I) in 2009 and 2010 and 414 eyes treated using nomogram II in 2011 and 2012. There was no significant difference in treatment accuracy (variance in the postoperative manifest refraction spherical equivalent error) between nomogram I and nomogram II (P = .73, Bartlett test). Fewer patients treated with nomogram II had more than 0.5 D of residual postoperative astigmatism (P = .0001, Fisher exact test). There was no significant coupling between adjustments to the attempted cylinder and the achieved sphere (P = .18, t test). Discarding marginal influences from a multiple regression-derived nomogram for myopic wavefront LASIK had no clinically significant effect on treatment accuracy. Thibos' power vector method can be used to guide adjustments to the treatment cylinder alongside nomograms designed to optimize postoperative spherical equivalent results in myopic LASIK. mentioned. Copyright © 2015 ASCRS and ESCRS. Published by Elsevier Inc. All rights reserved.

  3. Multiple regression technique for Pth degree polynominals with and without linear cross products

    NASA Technical Reports Server (NTRS)

    Davis, J. W.

    1973-01-01

    A multiple regression technique was developed by which the nonlinear behavior of specified independent variables can be related to a given dependent variable. The polynomial expression can be of Pth degree and can incorporate N independent variables. Two cases are treated such that mathematical models can be studied both with and without linear cross products. The resulting surface fits can be used to summarize trends for a given phenomenon and provide a mathematical relationship for subsequent analysis. To implement this technique, separate computer programs were developed for the case without linear cross products and for the case incorporating such cross products which evaluate the various constants in the model regression equation. In addition, the significance of the estimated regression equation is considered and the standard deviation, the F statistic, the maximum absolute percent error, and the average of the absolute values of the percent of error evaluated. The computer programs and their manner of utilization are described. Sample problems are included to illustrate the use and capability of the technique which show the output formats and typical plots comparing computer results to each set of input data.

  4. Hierarchical multiple regression modelling on predictors of behavior and sexual practices at Takoradi Polytechnic, Ghana.

    PubMed

    Turkson, Anthony Joe; Otchey, James Eric

    2015-01-14

    Various psychosocial studies on health related lifestyles lay emphasis on the fact that the perception one has of himself as being at risk of HIV/AIDS infection was a necessary condition for preventive behaviors to be adopted. Hierarchical Multiple Regression models was used to examine the relationship between eight independent variables and one dependent variable to isolate predictors which have significant influence on behavior and sexual practices. A Cross-sectional design was used for the study. Structured close-ended interviewer-administered questionnaire was used to collect primary data. Multistage stratified technique was used to sample views from 380 students from Takoradi Polytechnic, Ghana. A Hierarchical multiple regression model was used to ascertain the significance of certain predictors of sexual behavior and practices. The variables that were extracted from the multiple regression were; for the constant; Beta=14.202, t=2.279, p=0.023, variable is significant; for the marital status; Beta=0.092, t=1.996, p<0.05, variable is significant; for the knowledge on AIDs; Beta=0.090, t=1.996, p<0.05, variable is significant; for the attitude towards HIV/AIDs; =0.486, t=10.575, p<0.001, variable is highly significant. Thus, the best fitting model for predicting behavior and sexual practices was a linear combination of the constant, one's marital status, knowledge on HIV/AIDs and Attitude towards HIV/AIDs., Y(Behavior and sexual practies)= Beta0+Beta1(Marital status)+Beta2(Knowledge on HIV/AIDs issues)+Beta3(Attitude towards HIV/AIDs issues) Beta0, Beta1, Beta2 and Beta3 are respectively 14.201, 2.038, 0.148 and 0.486; the higher the better. Attitude and behavior change education on HIV/AIDs should be intensified in the institution so that students could adopt better lifestyles.

  5. The Use of Multiple Regression and Trend Analysis to Understand Enrollment Fluctuations. AIR Forum 1979 Paper.

    ERIC Educational Resources Information Center

    Campbell, S. Duke; Greenberg, Barry

    The development of a predictive equation capable of explaining a significant percentage of enrollment variability at Florida International University is described. A model utilizing trend analysis and a multiple regression approach to enrollment forecasting was adapted to investigate enrollment dynamics at the university. Four independent…

  6. Isolating and Examining Sources of Suppression and Multicollinearity in Multiple Linear Regression.

    PubMed

    Beckstead, Jason W

    2012-03-30

    The presence of suppression (and multicollinearity) in multiple regression analysis complicates interpretation of predictor-criterion relationships. The mathematical conditions that produce suppression in regression analysis have received considerable attention in the methodological literature but until now nothing in the way of an analytic strategy to isolate, examine, and remove suppression effects has been offered. In this article such an approach, rooted in confirmatory factor analysis theory and employing matrix algebra, is developed. Suppression is viewed as the result of criterion-irrelevant variance operating among predictors. Decomposition of predictor variables into criterion-relevant and criterion-irrelevant components using structural equation modeling permits derivation of regression weights with the effects of criterion-irrelevant variance omitted. Three examples with data from applied research are used to illustrate the approach: the first assesses child and parent characteristics to explain why some parents of children with obsessive-compulsive disorder accommodate their child's compulsions more so than do others, the second examines various dimensions of personal health to explain individual differences in global quality of life among patients following heart surgery, and the third deals with quantifying the relative importance of various aptitudes for explaining academic performance in a sample of nursing students. The approach is offered as an analytic tool for investigators interested in understanding predictor-criterion relationships when complex patterns of intercorrelation among predictors are present and is shown to augment dominance analysis.

  7. Multivariate research in areas of phosphorus cast-iron brake shoes manufacturing using the statistical analysis and the multiple regression equations

    NASA Astrophysics Data System (ADS)

    Kiss, I.; Cioată, V. G.; Alexa, V.; Raţiu, S. A.

    2017-05-01

    The braking system is one of the most important and complex subsystems of railway vehicles, especially when it comes for safety. Therefore, installing efficient safe brakes on the modern railway vehicles is essential. Nowadays is devoted attention to solving problems connected with using high performance brake materials and its impact on thermal and mechanical loading of railway wheels. The main factor that influences the selection of a friction material for railway applications is the performance criterion, due to the interaction between the brake block and the wheel produce complex thermos-mechanical phenomena. In this work, the investigated subjects are the cast-iron brake shoes, which are still widely used on freight wagons. Therefore, the cast-iron brake shoes - with lamellar graphite and with a high content of phosphorus (0.8-1.1%) - need a special investigation. In order to establish the optimal condition for the cast-iron brake shoes we proposed a mathematical modelling study by using the statistical analysis and multiple regression equations. Multivariate research is important in areas of cast-iron brake shoes manufacturing, because many variables interact with each other simultaneously. Multivariate visualization comes to the fore when researchers have difficulties in comprehending many dimensions at one time. Technological data (hardness and chemical composition) obtained from cast-iron brake shoes were used for this purpose. In order to settle the multiple correlation between the hardness of the cast-iron brake shoes, and the chemical compositions elements several model of regression equation types has been proposed. Because a three-dimensional surface with variables on three axes is a common way to illustrate multivariate data, in which the maximum and minimum values are easily highlighted, we plotted graphical representation of the regression equations in order to explain interaction of the variables and locate the optimal level of each variable for

  8. Single versus multiple sets of resistance exercise: a meta-regression.

    PubMed

    Krieger, James W

    2009-09-01

    There has been considerable debate over the optimal number of sets per exercise to improve musculoskeletal strength during a resistance exercise program. The purpose of this study was to use hierarchical, random-effects meta-regression to compare the effects of single and multiple sets per exercise on dynamic strength. English-language studies comparing single with multiple sets per exercise, while controlling for other variables, were considered eligible for inclusion. The analysis comprised 92 effect sizes (ESs) nested within 30 treatment groups and 14 studies. Multiple sets were associated with a larger ES than a single set (difference = 0.26 +/- 0.05; confidence interval [CI]: 0.15, 0.37; p < 0.0001). In a dose-response model, 2 to 3 sets per exercise were associated with a significantly greater ES than 1 set (difference = 0.25 +/- 0.06; CI: 0.14, 0.37; p = 0.0001). There was no significant difference between 1 set per exercise and 4 to 6 sets per exercise (difference = 0.35 +/- 0.25; CI: -0.05, 0.74; p = 0.17) or between 2 to 3 sets per exercise and 4 to 6 sets per exercise (difference = 0.09 +/- 0.20; CI: -0.31, 0.50; p = 0.64). There were no interactions between set volume and training program duration, subject training status, or whether the upper or lower body was trained. Sensitivity analysis revealed no highly influential studies, and no evidence of publication bias was observed. In conclusion, 2 to 3 sets per exercise are associated with 46% greater strength gains than 1 set, in both trained and untrained subjects.

  9. Applied Multiple Linear Regression: A General Research Strategy

    ERIC Educational Resources Information Center

    Smith, Brandon B.

    1969-01-01

    Illustrates some of the basic concepts and procedures for using regression analysis in experimental design, analysis of variance, analysis of covariance, and curvilinear regression. Applications to evaluation of instruction and vocational education programs are illustrated. (GR)

  10. Evaluation of the comprehensive palatability of Japanese sake paired with dishes by multiple regression analysis based on subdomains.

    PubMed

    Nakamura, Ryo; Nakano, Kumiko; Tamura, Hiroyasu; Mizunuma, Masaki; Fushiki, Tohru; Hirata, Dai

    2017-08-01

    Many factors contribute to palatability. In order to evaluate the palatability of Japanese alcohol sake paired with certain dishes by integrating multiple factors, here we applied an evaluation method previously reported for palatability of cheese by multiple regression analysis based on 3 subdomain factors (rewarding, cultural, and informational). We asked 94 Japanese participants/subjects to evaluate the palatability of sake (1st evaluation/E1 for the first cup, 2nd/E2 and 3rd/E3 for the palatability with aftertaste/afterglow of certain dishes) and to respond to a questionnaire related to 3 subdomains. In E1, 3 factors were extracted by a factor analysis, and the subsequent multiple regression analyses indicated that the palatability of sake was interpreted by mainly the rewarding. Further, the results of attribution-dissections in E1 indicated that 2 factors (rewarding and informational) contributed to the palatability. Finally, our results indicated that the palatability of sake was influenced by the dish eaten just before drinking.

  11. Retro-regression--another important multivariate regression improvement.

    PubMed

    Randić, M

    2001-01-01

    We review the serious problem associated with instabilities of the coefficients of regression equations, referred to as the MRA (multivariate regression analysis) "nightmare of the first kind". This is manifested when in a stepwise regression a descriptor is included or excluded from a regression. The consequence is an unpredictable change of the coefficients of the descriptors that remain in the regression equation. We follow with consideration of an even more serious problem, referred to as the MRA "nightmare of the second kind", arising when optimal descriptors are selected from a large pool of descriptors. This process typically causes at different steps of the stepwise regression a replacement of several previously used descriptors by new ones. We describe a procedure that resolves these difficulties. The approach is illustrated on boiling points of nonanes which are considered (1) by using an ordered connectivity basis; (2) by using an ordering resulting from application of greedy algorithm; and (3) by using an ordering derived from an exhaustive search for optimal descriptors. A novel variant of multiple regression analysis, called retro-regression (RR), is outlined showing how it resolves the ambiguities associated with both "nightmares" of the first and the second kind of MRA.

  12. What Is Wrong with ANOVA and Multiple Regression? Analyzing Sentence Reading Times with Hierarchical Linear Models

    ERIC Educational Resources Information Center

    Richter, Tobias

    2006-01-01

    Most reading time studies using naturalistic texts yield data sets characterized by a multilevel structure: Sentences (sentence level) are nested within persons (person level). In contrast to analysis of variance and multiple regression techniques, hierarchical linear models take the multilevel structure of reading time data into account. They…

  13. Physical and Cognitive-Affective Factors Associated with Fatigue in Individuals with Fibromyalgia: A Multiple Regression Analysis

    ERIC Educational Resources Information Center

    Muller, Veronica; Brooks, Jessica; Tu, Wei-Mo; Moser, Erin; Lo, Chu-Ling; Chan, Fong

    2015-01-01

    Purpose: The main objective of this study was to determine the extent to which physical and cognitive-affective factors are associated with fibromyalgia (FM) fatigue. Method: A quantitative descriptive design using correlation techniques and multiple regression analysis. The participants consisted of 302 members of the National Fibromyalgia &…

  14. Determining the Spatial and Seasonal Variability in OM/OC Ratios across the U.S. Using Multiple Regression

    EPA Science Inventory

    Data from the Interagency Monitoring of Protected Visual Environments (IMPROVE) network are used to estimate organic mass to organic carbon (OM/OC) ratios across the United States by extending previously published multiple regression techniques. Our new methodology addresses com...

  15. Comparison of Multiple Linear Regressions and Neural Networks based QSAR models for the design of new antitubercular compounds.

    PubMed

    Ventura, Cristina; Latino, Diogo A R S; Martins, Filomena

    2013-01-01

    The performance of two QSAR methodologies, namely Multiple Linear Regressions (MLR) and Neural Networks (NN), towards the modeling and prediction of antitubercular activity was evaluated and compared. A data set of 173 potentially active compounds belonging to the hydrazide family and represented by 96 descriptors was analyzed. Models were built with Multiple Linear Regressions (MLR), single Feed-Forward Neural Networks (FFNNs), ensembles of FFNNs and Associative Neural Networks (AsNNs) using four different data sets and different types of descriptors. The predictive ability of the different techniques used were assessed and discussed on the basis of different validation criteria and results show in general a better performance of AsNNs in terms of learning ability and prediction of antitubercular behaviors when compared with all other methods. MLR have, however, the advantage of pinpointing the most relevant molecular characteristics responsible for the behavior of these compounds against Mycobacterium tuberculosis. The best results for the larger data set (94 compounds in training set and 18 in test set) were obtained with AsNNs using seven descriptors (R(2) of 0.874 and RMSE of 0.437 against R(2) of 0.845 and RMSE of 0.472 in MLRs, for test set). Counter-Propagation Neural Networks (CPNNs) were trained with the same data sets and descriptors. From the scrutiny of the weight levels in each CPNN and the information retrieved from MLRs, a rational design of potentially active compounds was attempted. Two new compounds were synthesized and tested against M. tuberculosis showing an activity close to that predicted by the majority of the models. Copyright © 2013 Elsevier Masson SAS. All rights reserved.

  16. Stepwise multiple regression method of greenhouse gas emission modeling in the energy sector in Poland.

    PubMed

    Kolasa-Wiecek, Alicja

    2015-04-01

    The energy sector in Poland is the source of 81% of greenhouse gas (GHG) emissions. Poland, among other European Union countries, occupies a leading position with regard to coal consumption. Polish energy sector actively participates in efforts to reduce GHG emissions to the atmosphere, through a gradual decrease of the share of coal in the fuel mix and development of renewable energy sources. All evidence which completes the knowledge about issues related to GHG emissions is a valuable source of information. The article presents the results of modeling of GHG emissions which are generated by the energy sector in Poland. For a better understanding of the quantitative relationship between total consumption of primary energy and greenhouse gas emission, multiple stepwise regression model was applied. The modeling results of CO2 emissions demonstrate a high relationship (0.97) with the hard coal consumption variable. Adjustment coefficient of the model to actual data is high and equal to 95%. The backward step regression model, in the case of CH4 emission, indicated the presence of hard coal (0.66), peat and fuel wood (0.34), solid waste fuels, as well as other sources (-0.64) as the most important variables. The adjusted coefficient is suitable and equals R2=0.90. For N2O emission modeling the obtained coefficient of determination is low and equal to 43%. A significant variable influencing the amount of N2O emission is the peat and wood fuel consumption. Copyright © 2015. Published by Elsevier B.V.

  17. Confidence intervals for distinguishing ordinal and disordinal interactions in multiple regression.

    PubMed

    Lee, Sunbok; Lei, Man-Kit; Brody, Gene H

    2015-06-01

    Distinguishing between ordinal and disordinal interaction in multiple regression is useful in testing many interesting theoretical hypotheses. Because the distinction is made based on the location of a crossover point of 2 simple regression lines, confidence intervals of the crossover point can be used to distinguish ordinal and disordinal interactions. This study examined 2 factors that need to be considered in constructing confidence intervals of the crossover point: (a) the assumption about the sampling distribution of the crossover point, and (b) the possibility of abnormally wide confidence intervals for the crossover point. A Monte Carlo simulation study was conducted to compare 6 different methods for constructing confidence intervals of the crossover point in terms of the coverage rate, the proportion of true values that fall to the left or right of the confidence intervals, and the average width of the confidence intervals. The methods include the reparameterization, delta, Fieller, basic bootstrap, percentile bootstrap, and bias-corrected accelerated bootstrap methods. The results of our Monte Carlo simulation study suggest that statistical inference using confidence intervals to distinguish ordinal and disordinal interaction requires sample sizes more than 500 to be able to provide sufficiently narrow confidence intervals to identify the location of the crossover point. (c) 2015 APA, all rights reserved).

  18. Obtaining eigensolutions for multiple frequency ranges in a single NASTRAN execution

    NASA Technical Reports Server (NTRS)

    Pamidi, P. R.; Brown, W. K.

    1990-01-01

    A novel and general procedure for obtaining eigenvalues and eigenvectors for multiple frequency ranges in a single NASTRAN execution is presented. The scheme is applicable to normal modes analyzes employing the FEER and Inverse Power methods of eigenvalue extraction. The procedure is illustrated by examples.

  19. Development of Multiple Regression Equations To Predict Fourth Graders' Achievement in Reading and Selected Content Areas.

    ERIC Educational Resources Information Center

    Hafner, Lawrence E.

    A study developed a multiple regression prediction equation for each of six selected achievement variables in a popular standardized test of achievement. Subjects, 42 fourth-grade pupils randomly selected across several classes in a large elementary school in a north Florida city, were administered several standardized tests to determine predictor…

  20. Latent Variable Regression 4-Level Hierarchical Model Using Multisite Multiple-Cohorts Longitudinal Data. CRESST Report 801

    ERIC Educational Resources Information Center

    Choi, Kilchan

    2011-01-01

    This report explores a new latent variable regression 4-level hierarchical model for monitoring school performance over time using multisite multiple-cohorts longitudinal data. This kind of data set has a 4-level hierarchical structure: time-series observation nested within students who are nested within different cohorts of students. These…

  1. Obtaining Predictions from Models Fit to Multiply Imputed Data

    ERIC Educational Resources Information Center

    Miles, Andrew

    2016-01-01

    Obtaining predictions from regression models fit to multiply imputed data can be challenging because treatments of multiple imputation seldom give clear guidance on how predictions can be calculated, and because available software often does not have built-in routines for performing the necessary calculations. This research note reviews how…

  2. Comparing cluster-level dynamic treatment regimens using sequential, multiple assignment, randomized trials: Regression estimation and sample size considerations.

    PubMed

    NeCamp, Timothy; Kilbourne, Amy; Almirall, Daniel

    2017-08-01

    Cluster-level dynamic treatment regimens can be used to guide sequential treatment decision-making at the cluster level in order to improve outcomes at the individual or patient-level. In a cluster-level dynamic treatment regimen, the treatment is potentially adapted and re-adapted over time based on changes in the cluster that could be impacted by prior intervention, including aggregate measures of the individuals or patients that compose it. Cluster-randomized sequential multiple assignment randomized trials can be used to answer multiple open questions preventing scientists from developing high-quality cluster-level dynamic treatment regimens. In a cluster-randomized sequential multiple assignment randomized trial, sequential randomizations occur at the cluster level and outcomes are observed at the individual level. This manuscript makes two contributions to the design and analysis of cluster-randomized sequential multiple assignment randomized trials. First, a weighted least squares regression approach is proposed for comparing the mean of a patient-level outcome between the cluster-level dynamic treatment regimens embedded in a sequential multiple assignment randomized trial. The regression approach facilitates the use of baseline covariates which is often critical in the analysis of cluster-level trials. Second, sample size calculators are derived for two common cluster-randomized sequential multiple assignment randomized trial designs for use when the primary aim is a between-dynamic treatment regimen comparison of the mean of a continuous patient-level outcome. The methods are motivated by the Adaptive Implementation of Effective Programs Trial which is, to our knowledge, the first-ever cluster-randomized sequential multiple assignment randomized trial in psychiatry.

  3. Daily Suspended Sediment Discharge Prediction Using Multiple Linear Regression and Artificial Neural Network

    NASA Astrophysics Data System (ADS)

    Uca; Toriman, Ekhwan; Jaafar, Othman; Maru, Rosmini; Arfan, Amal; Saleh Ahmar, Ansari

    2018-01-01

    Prediction of suspended sediment discharge in a catchments area is very important because it can be used to evaluation the erosion hazard, management of its water resources, water quality, hydrology project management (dams, reservoirs, and irrigation) and to determine the extent of the damage that occurred in the catchments. Multiple Linear Regression analysis and artificial neural network can be used to predict the amount of daily suspended sediment discharge. Regression analysis using the least square method, whereas artificial neural networks using Radial Basis Function (RBF) and feedforward multilayer perceptron with three learning algorithms namely Levenberg-Marquardt (LM), Scaled Conjugate Descent (SCD) and Broyden-Fletcher-Goldfarb-Shanno Quasi-Newton (BFGS). The number neuron of hidden layer is three to sixteen, while in output layer only one neuron because only one output target. The mean absolute error (MAE), root mean square error (RMSE), coefficient of determination (R2 ) and coefficient of efficiency (CE) of the multiple linear regression (MLRg) value Model 2 (6 input variable independent) has the lowest the value of MAE and RMSE (0.0000002 and 13.6039) and highest R2 and CE (0.9971 and 0.9971). When compared between LM, SCG and RBF, the BFGS model structure 3-7-1 is the better and more accurate to prediction suspended sediment discharge in Jenderam catchment. The performance value in testing process, MAE and RMSE (13.5769 and 17.9011) is smallest, meanwhile R2 and CE (0.9999 and 0.9998) is the highest if it compared with the another BFGS Quasi-Newton model (6-3-1, 9-10-1 and 12-12-1). Based on the performance statistics value, MLRg, LM, SCG, BFGS and RBF suitable and accurately for prediction by modeling the non-linear complex behavior of suspended sediment responses to rainfall, water depth and discharge. The comparison between artificial neural network (ANN) and MLRg, the MLRg Model 2 accurately for to prediction suspended sediment discharge (kg

  4. Model selection with multiple regression on distance matrices leads to incorrect inferences.

    PubMed

    Franckowiak, Ryan P; Panasci, Michael; Jarvis, Karl J; Acuña-Rodriguez, Ian S; Landguth, Erin L; Fortin, Marie-Josée; Wagner, Helene H

    2017-01-01

    In landscape genetics, model selection procedures based on Information Theoretic and Bayesian principles have been used with multiple regression on distance matrices (MRM) to test the relationship between multiple vectors of pairwise genetic, geographic, and environmental distance. Using Monte Carlo simulations, we examined the ability of model selection criteria based on Akaike's information criterion (AIC), its small-sample correction (AICc), and the Bayesian information criterion (BIC) to reliably rank candidate models when applied with MRM while varying the sample size. The results showed a serious problem: all three criteria exhibit a systematic bias toward selecting unnecessarily complex models containing spurious random variables and erroneously suggest a high level of support for the incorrectly ranked best model. These problems effectively increased with increasing sample size. The failure of AIC, AICc, and BIC was likely driven by the inflated sample size and different sum-of-squares partitioned by MRM, and the resulting effect on delta values. Based on these findings, we strongly discourage the continued application of AIC, AICc, and BIC for model selection with MRM.

  5. The Use of Multiple Regression Models to Determine if Conjoint Analysis Should Be Conducted on Aggregate Data.

    ERIC Educational Resources Information Center

    Fraas, John W.; Newman, Isadore

    1996-01-01

    In a conjoint-analysis consumer-preference study, researchers must determine whether the product factor estimates, which measure consumer preferences, should be calculated and interpreted for each respondent or collectively. Multiple regression models can determine whether to aggregate data by examining factor-respondent interaction effects. This…

  6. Multiple regression and Artificial Neural Network for long-term rainfall forecasting using large scale climate modes

    NASA Astrophysics Data System (ADS)

    Mekanik, F.; Imteaz, M. A.; Gato-Trinidad, S.; Elmahdi, A.

    2013-10-01

    In this study, the application of Artificial Neural Networks (ANN) and Multiple regression analysis (MR) to forecast long-term seasonal spring rainfall in Victoria, Australia was investigated using lagged El Nino Southern Oscillation (ENSO) and Indian Ocean Dipole (IOD) as potential predictors. The use of dual (combined lagged ENSO-IOD) input sets for calibrating and validating ANN and MR Models is proposed to investigate the simultaneous effect of past values of these two major climate modes on long-term spring rainfall prediction. The MR models that did not violate the limits of statistical significance and multicollinearity were selected for future spring rainfall forecast. The ANN was developed in the form of multilayer perceptron using Levenberg-Marquardt algorithm. Both MR and ANN modelling were assessed statistically using mean square error (MSE), mean absolute error (MAE), Pearson correlation (r) and Willmott index of agreement (d). The developed MR and ANN models were tested on out-of-sample test sets; the MR models showed very poor generalisation ability for east Victoria with correlation coefficients of -0.99 to -0.90 compared to ANN with correlation coefficients of 0.42-0.93; ANN models also showed better generalisation ability for central and west Victoria with correlation coefficients of 0.68-0.85 and 0.58-0.97 respectively. The ability of multiple regression models to forecast out-of-sample sets is compatible with ANN for Daylesford in central Victoria and Kaniva in west Victoria (r = 0.92 and 0.67 respectively). The errors of the testing sets for ANN models are generally lower compared to multiple regression models. The statistical analysis suggest the potential of ANN over MR models for rainfall forecasting using large scale climate modes.

  7. The extraction of simple relationships in growth factor-specific multiple-input and multiple-output systems in cell-fate decisions by backward elimination PLS regression.

    PubMed

    Akimoto, Yuki; Yugi, Katsuyuki; Uda, Shinsuke; Kudo, Takamasa; Komori, Yasunori; Kubota, Hiroyuki; Kuroda, Shinya

    2013-01-01

    Cells use common signaling molecules for the selective control of downstream gene expression and cell-fate decisions. The relationship between signaling molecules and downstream gene expression and cellular phenotypes is a multiple-input and multiple-output (MIMO) system and is difficult to understand due to its complexity. For example, it has been reported that, in PC12 cells, different types of growth factors activate MAP kinases (MAPKs) including ERK, JNK, and p38, and CREB, for selective protein expression of immediate early genes (IEGs) such as c-FOS, c-JUN, EGR1, JUNB, and FOSB, leading to cell differentiation, proliferation and cell death; however, how multiple-inputs such as MAPKs and CREB regulate multiple-outputs such as expression of the IEGs and cellular phenotypes remains unclear. To address this issue, we employed a statistical method called partial least squares (PLS) regression, which involves a reduction of the dimensionality of the inputs and outputs into latent variables and a linear regression between these latent variables. We measured 1,200 data points for MAPKs and CREB as the inputs and 1,900 data points for IEGs and cellular phenotypes as the outputs, and we constructed the PLS model from these data. The PLS model highlighted the complexity of the MIMO system and growth factor-specific input-output relationships of cell-fate decisions in PC12 cells. Furthermore, to reduce the complexity, we applied a backward elimination method to the PLS regression, in which 60 input variables were reduced to 5 variables, including the phosphorylation of ERK at 10 min, CREB at 5 min and 60 min, AKT at 5 min and JNK at 30 min. The simple PLS model with only 5 input variables demonstrated a predictive ability comparable to that of the full PLS model. The 5 input variables effectively extracted the growth factor-specific simple relationships within the MIMO system in cell-fate decisions in PC12 cells.

  8. Using Spatial Multiple Regression to Identify Intrinsic Connectivity Networks Involved in Working Memory Performance

    PubMed Central

    Gordon, Evan M.; Stollstorff, Melanie; Vaidya, Chandan J.

    2012-01-01

    Many researchers have noted that the functional architecture of the human brain is relatively invariant during task performance and the resting state. Indeed, intrinsic connectivity networks (ICNs) revealed by resting-state functional connectivity analyses are spatially similar to regions activated during cognitive tasks. This suggests that patterns of task-related activation in individual subjects may result from the engagement of one or more of these ICNs; however, this has not been tested. We used a novel analysis, spatial multiple regression, to test whether the patterns of activation during an N-back working memory task could be well described by a linear combination of ICNs delineated using Independent Components Analysis at rest. We found that across subjects, the cingulo-opercular Set Maintenance ICN, as well as right and left Frontoparietal Control ICNs, were reliably activated during working memory, while Default Mode and Visual ICNs were reliably deactivated. Further, involvement of Set Maintenance, Frontoparietal Control, and Dorsal Attention ICNs was sensitive to varying working memory load. Finally, the degree of left Frontoparietal Control network activation predicted response speed, while activation in both left Frontoparietal Control and Dorsal Attention networks predicted task accuracy. These results suggest that a close relationship between resting-state networks and task-evoked activation is functionally relevant for behavior, and that spatial multiple regression analysis is a suitable method for revealing that relationship. PMID:21761505

  9. Statistical experiments using the multiple regression research for prediction of proper hardness in areas of phosphorus cast-iron brake shoes manufacturing

    NASA Astrophysics Data System (ADS)

    Kiss, I.; Cioată, V. G.; Ratiu, S. A.; Rackov, M.; Penčić, M.

    2018-01-01

    Multivariate research is important in areas of cast-iron brake shoes manufacturing, because many variables interact with each other simultaneously. This article focuses on expressing the multiple linear regression model related to the hardness assurance by the chemical composition of the phosphorous cast irons destined to the brake shoes, having in view that the regression coefficients will illustrate the unrelated contributions of each independent variable towards predicting the dependent variable. In order to settle the multiple correlations between the hardness of the cast-iron brake shoes, and their chemical compositions several regression equations has been proposed. Is searched a mathematical solution which can determine the optimum chemical composition for the hardness desirable values. Starting from the above-mentioned affirmations two new statistical experiments are effectuated related to the values of Phosphorus [P], Manganese [Mn] and Silicon [Si]. Therefore, the regression equations, which describe the mathematical dependency between the above-mentioned elements and the hardness, are determined. As result, several correlation charts will be revealed.

  10. Building Regression Models: The Importance of Graphics.

    ERIC Educational Resources Information Center

    Dunn, Richard

    1989-01-01

    Points out reasons for using graphical methods to teach simple and multiple regression analysis. Argues that a graphically oriented approach has considerable pedagogic advantages in the exposition of simple and multiple regression. Shows that graphical methods may play a central role in the process of building regression models. (Author/LS)

  11. Screening for ketosis using multiple logistic regression based on milk yield and composition

    PubMed Central

    KAYANO, Mitsunori; KATAOKA, Tomoko

    2015-01-01

    Multiple logistic regression was applied to milk yield and composition data for 632 records of healthy cows and 61 records of ketotic cows in Hokkaido, Japan. The purpose was to diagnose ketosis based on milk yield and composition, simultaneously. The cows were divided into two groups: (1) multiparous, including 314 healthy cows and 45 ketotic cows and (2) primiparous, including 318 healthy cows and 16 ketotic cows, since nutritional status, milk yield and composition are affected by parity. Multiple logistic regression was applied to these groups separately. For multiparous cows, milk yield (kg/day/cow) and protein-to-fat (P/F) ratio in milk were significant factors (P<0.05) for the diagnosis of ketosis. For primiparous cows, lactose content (%), solid not fat (SNF) content (%) and milk urea nitrogen (MUN) content (mg/dl) were significantly associated with ketosis (P<0.01). A diagnostic rule was constructed for each group of cows: (1) 9.978 × P/F ratio + 0.085 × milk yield <10 and (2) 2.327 × SNF − 2.703 × lactose + 0.225 × MUN <10. The sensitivity, specificity and the area under the curve (AUC) of the diagnostic rules were (1) 0.800, 0.729 and 0.811; (2) 0.813, 0.730 and 0.787, respectively. The P/F ratio, which is a widely used measure of ketosis, provided the sensitivity, specificity and AUC values of (1) 0.711, 0.726 and 0.781; and (2) 0.678, 0.767 and 0.738, respectively. PMID:26074408

  12. Screening for ketosis using multiple logistic regression based on milk yield and composition.

    PubMed

    Kayano, Mitsunori; Kataoka, Tomoko

    2015-11-01

    Multiple logistic regression was applied to milk yield and composition data for 632 records of healthy cows and 61 records of ketotic cows in Hokkaido, Japan. The purpose was to diagnose ketosis based on milk yield and composition, simultaneously. The cows were divided into two groups: (1) multiparous, including 314 healthy cows and 45 ketotic cows and (2) primiparous, including 318 healthy cows and 16 ketotic cows, since nutritional status, milk yield and composition are affected by parity. Multiple logistic regression was applied to these groups separately. For multiparous cows, milk yield (kg/day/cow) and protein-to-fat (P/F) ratio in milk were significant factors (P<0.05) for the diagnosis of ketosis. For primiparous cows, lactose content (%), solid not fat (SNF) content (%) and milk urea nitrogen (MUN) content (mg/dl) were significantly associated with ketosis (P<0.01). A diagnostic rule was constructed for each group of cows: (1) 9.978 × P/F ratio + 0.085 × milk yield <10 and (2) 2.327 × SNF - 2.703 × lactose + 0.225 × MUN <10. The sensitivity, specificity and the area under the curve (AUC) of the diagnostic rules were (1) 0.800, 0.729 and 0.811; (2) 0.813, 0.730 and 0.787, respectively. The P/F ratio, which is a widely used measure of ketosis, provided the sensitivity, specificity and AUC values of (1) 0.711, 0.726 and 0.781; and (2) 0.678, 0.767 and 0.738, respectively.

  13. Binary Logistic Regression Versus Boosted Regression Trees in Assessing Landslide Susceptibility for Multiple-Occurring Regional Landslide Events: Application to the 2009 Storm Event in Messina (Sicily, southern Italy).

    NASA Astrophysics Data System (ADS)

    Lombardo, L.; Cama, M.; Maerker, M.; Parisi, L.; Rotigliano, E.

    2014-12-01

    This study aims at comparing the performances of Binary Logistic Regression (BLR) and Boosted Regression Trees (BRT) methods in assessing landslide susceptibility for multiple-occurrence regional landslide events within the Mediterranean region. A test area was selected in the north-eastern sector of Sicily (southern Italy), corresponding to the catchments of the Briga and the Giampilieri streams both stretching for few kilometres from the Peloritan ridge (eastern Sicily, Italy) to the Ionian sea. This area was struck on the 1st October 2009 by an extreme climatic event resulting in thousands of rapid shallow landslides, mainly of debris flows and debris avalanches types involving the weathered layer of a low to high grade metamorphic bedrock. Exploiting the same set of predictors and the 2009 landslide archive, BLR- and BRT-based susceptibility models were obtained for the two catchments separately, adopting a random partition (RP) technique for validation; besides, the models trained in one of the two catchments (Briga) were tested in predicting the landslide distribution in the other (Giampilieri), adopting a spatial partition (SP) based validation procedure. All the validation procedures were based on multi-folds tests so to evaluate and compare the reliability of the fitting, the prediction skill, the coherence in the predictor selection and the precision of the susceptibility estimates. All the obtained models for the two methods produced very high predictive performances, with a general congruence between BLR and BRT in the predictor importance. In particular, the research highlighted that BRT-models reached a higher prediction performance with respect to BLR-models, for RP based modelling, whilst for the SP-based models the difference in predictive skills between the two methods dropped drastically, converging to an analogous excellent performance. However, when looking at the precision of the probability estimates, BLR demonstrated to produce more robust

  14. Practical Session: Multiple Linear Regression

    NASA Astrophysics Data System (ADS)

    Clausel, M.; Grégoire, G.

    2014-12-01

    Three exercises are proposed to illustrate the simple linear regression. In the first one investigates the influence of several factors on atmospheric pollution. It has been proposed by D. Chessel and A.B. Dufour in Lyon 1 (see Sect. 6 of http://pbil.univ-lyon1.fr/R/pdf/tdr33.pdf) and is based on data coming from 20 cities of U.S. Exercise 2 is an introduction to model selection whereas Exercise 3 provides a first example of analysis of variance. Exercises 2 and 3 have been proposed by A. Dalalyan at ENPC (see Exercises 2 and 3 of http://certis.enpc.fr/~dalalyan/Download/TP_ENPC_5.pdf).

  15. Comparison of two-concentration with multi-concentration linear regressions: Retrospective data analysis of multiple regulated LC-MS bioanalytical projects.

    PubMed

    Musuku, Adrien; Tan, Aimin; Awaiye, Kayode; Trabelsi, Fethi

    2013-09-01

    Linear calibration is usually performed using eight to ten calibration concentration levels in regulated LC-MS bioanalysis because a minimum of six are specified in regulatory guidelines. However, we have previously reported that two-concentration linear calibration is as reliable as or even better than using multiple concentrations. The purpose of this research is to compare two-concentration with multiple-concentration linear calibration through retrospective data analysis of multiple bioanalytical projects that were conducted in an independent regulated bioanalytical laboratory. A total of 12 bioanalytical projects were randomly selected: two validations and two studies for each of the three most commonly used types of sample extraction methods (protein precipitation, liquid-liquid extraction, solid-phase extraction). When the existing data were retrospectively linearly regressed using only the lowest and the highest concentration levels, no extra batch failure/QC rejection was observed and the differences in accuracy and precision between the original multi-concentration regression and the new two-concentration linear regression are negligible. Specifically, the differences in overall mean apparent bias (square root of mean individual bias squares) are within the ranges of -0.3% to 0.7% and 0.1-0.7% for the validations and studies, respectively. The differences in mean QC concentrations are within the ranges of -0.6% to 1.8% and -0.8% to 2.5% for the validations and studies, respectively. The differences in %CV are within the ranges of -0.7% to 0.9% and -0.3% to 0.6% for the validations and studies, respectively. The average differences in study sample concentrations are within the range of -0.8% to 2.3%. With two-concentration linear regression, an average of 13% of time and cost could have been saved for each batch together with 53% of saving in the lead-in for each project (the preparation of working standard solutions, spiking, and aliquoting). Furthermore

  16. FIRE: an SPSS program for variable selection in multiple linear regression analysis via the relative importance of predictors.

    PubMed

    Lorenzo-Seva, Urbano; Ferrando, Pere J

    2011-03-01

    We provide an SPSS program that implements currently recommended techniques and recent developments for selecting variables in multiple linear regression analysis via the relative importance of predictors. The approach consists of: (1) optimally splitting the data for cross-validation, (2) selecting the final set of predictors to be retained in the equation regression, and (3) assessing the behavior of the chosen model using standard indices and procedures. The SPSS syntax, a short manual, and data files related to this article are available as supplemental materials from brm.psychonomic-journals.org/content/supplemental.

  17. Construction of multiple linear regression models using blood biomarkers for selecting against abdominal fat traits in broilers.

    PubMed

    Dong, J Q; Zhang, X Y; Wang, S Z; Jiang, X F; Zhang, K; Ma, G W; Wu, M Q; Li, H; Zhang, H

    2018-01-01

    Plasma very low-density lipoprotein (VLDL) can be used to select for low body fat or abdominal fat (AF) in broilers, but its correlation with AF is limited. We investigated whether any other biochemical indicator can be used in combination with VLDL for a better selective effect. Nineteen plasma biochemical indicators were measured in male chickens from the Northeast Agricultural University broiler lines divergently selected for AF content (NEAUHLF) in the fed state at 46 and 48 d of age. The average concentration of every parameter for the 2 d was used for statistical analysis. Levels of these 19 plasma biochemical parameters were compared between the lean and fat lines. The phenotypic correlations between these plasma biochemical indicators and AF traits were analyzed. Then, multiple linear regression models were constructed to select the best model used for selecting against AF content. and the heritabilities of plasma indicators contained in the best models were estimated. The results showed that 11 plasma biochemical indicators (triglycerides, total bile acid, total protein, globulin, albumin/globulin, aspartate transaminase, alanine transaminase, gamma-glutamyl transpeptidase, uric acid, creatinine, and VLDL) differed significantly between the lean and fat lines (P < 0.01), and correlated significantly with AF traits (P < 0.05). The best multiple linear regression models based on albumin/globulin, VLDL, triglycerides, globulin, total bile acid, and uric acid, had higher R2 (0.73) than the model based only on VLDL (0.21). The plasma parameters included in the best models had moderate heritability estimates (0.21 ≤ h2 ≤ 0.43). These results indicate that these multiple linear regression models can be used to select for lean broiler chickens. © 2017 Poultry Science Association Inc.

  18. Logistic Regression with Multiple Random Effects: A Simulation Study of Estimation Methods and Statistical Packages.

    PubMed

    Kim, Yoonsang; Choi, Young-Ku; Emery, Sherry

    2013-08-01

    Several statistical packages are capable of estimating generalized linear mixed models and these packages provide one or more of three estimation methods: penalized quasi-likelihood, Laplace, and Gauss-Hermite. Many studies have investigated these methods' performance for the mixed-effects logistic regression model. However, the authors focused on models with one or two random effects and assumed a simple covariance structure between them, which may not be realistic. When there are multiple correlated random effects in a model, the computation becomes intensive, and often an algorithm fails to converge. Moreover, in our analysis of smoking status and exposure to anti-tobacco advertisements, we have observed that when a model included multiple random effects, parameter estimates varied considerably from one statistical package to another even when using the same estimation method. This article presents a comprehensive review of the advantages and disadvantages of each estimation method. In addition, we compare the performances of the three methods across statistical packages via simulation, which involves two- and three-level logistic regression models with at least three correlated random effects. We apply our findings to a real dataset. Our results suggest that two packages-SAS GLIMMIX Laplace and SuperMix Gaussian quadrature-perform well in terms of accuracy, precision, convergence rates, and computing speed. We also discuss the strengths and weaknesses of the two packages in regard to sample sizes.

  19. Logistic Regression with Multiple Random Effects: A Simulation Study of Estimation Methods and Statistical Packages

    PubMed Central

    Kim, Yoonsang; Emery, Sherry

    2013-01-01

    Several statistical packages are capable of estimating generalized linear mixed models and these packages provide one or more of three estimation methods: penalized quasi-likelihood, Laplace, and Gauss-Hermite. Many studies have investigated these methods’ performance for the mixed-effects logistic regression model. However, the authors focused on models with one or two random effects and assumed a simple covariance structure between them, which may not be realistic. When there are multiple correlated random effects in a model, the computation becomes intensive, and often an algorithm fails to converge. Moreover, in our analysis of smoking status and exposure to anti-tobacco advertisements, we have observed that when a model included multiple random effects, parameter estimates varied considerably from one statistical package to another even when using the same estimation method. This article presents a comprehensive review of the advantages and disadvantages of each estimation method. In addition, we compare the performances of the three methods across statistical packages via simulation, which involves two- and three-level logistic regression models with at least three correlated random effects. We apply our findings to a real dataset. Our results suggest that two packages—SAS GLIMMIX Laplace and SuperMix Gaussian quadrature—perform well in terms of accuracy, precision, convergence rates, and computing speed. We also discuss the strengths and weaknesses of the two packages in regard to sample sizes. PMID:24288415

  20. Using multiple logistic regression and GIS technology to predict landslide hazard in northeast Kansas, USA

    USGS Publications Warehouse

    Ohlmacher, G.C.; Davis, J.C.

    2003-01-01

    Landslides in the hilly terrain along the Kansas and Missouri rivers in northeastern Kansas have caused millions of dollars in property damage during the last decade. To address this problem, a statistical method called multiple logistic regression has been used to create a landslide-hazard map for Atchison, Kansas, and surrounding areas. Data included digitized geology, slopes, and landslides, manipulated using ArcView GIS. Logistic regression relates predictor variables to the occurrence or nonoccurrence of landslides within geographic cells and uses the relationship to produce a map showing the probability of future landslides, given local slopes and geologic units. Results indicated that slope is the most important variable for estimating landslide hazard in the study area. Geologic units consisting mostly of shale, siltstone, and sandstone were most susceptible to landslides. Soil type and aspect ratio were considered but excluded from the final analysis because these variables did not significantly add to the predictive power of the logistic regression. Soil types were highly correlated with the geologic units, and no significant relationships existed between landslides and slope aspect. ?? 2003 Elsevier Science B.V. All rights reserved.

  1. Correlation and simple linear regression.

    PubMed

    Eberly, Lynn E

    2007-01-01

    This chapter highlights important steps in using correlation and simple linear regression to address scientific questions about the association of two continuous variables with each other. These steps include estimation and inference, assessing model fit, the connection between regression and ANOVA, and study design. Examples in microbiology are used throughout. This chapter provides a framework that is helpful in understanding more complex statistical techniques, such as multiple linear regression, linear mixed effects models, logistic regression, and proportional hazards regression.

  2. Early Parallel Activation of Semantics and Phonology in Picture Naming: Evidence from a Multiple Linear Regression MEG Study

    PubMed Central

    Miozzo, Michele; Pulvermüller, Friedemann; Hauk, Olaf

    2015-01-01

    The time course of brain activation during word production has become an area of increasingly intense investigation in cognitive neuroscience. The predominant view has been that semantic and phonological processes are activated sequentially, at about 150 and 200–400 ms after picture onset. Although evidence from prior studies has been interpreted as supporting this view, these studies were arguably not ideally suited to detect early brain activation of semantic and phonological processes. We here used a multiple linear regression approach to magnetoencephalography (MEG) analysis of picture naming in order to investigate early effects of variables specifically related to visual, semantic, and phonological processing. This was combined with distributed minimum-norm source estimation and region-of-interest analysis. Brain activation associated with visual image complexity appeared in occipital cortex at about 100 ms after picture presentation onset. At about 150 ms, semantic variables became physiologically manifest in left frontotemporal regions. In the same latency range, we found an effect of phonological variables in the left middle temporal gyrus. Our results demonstrate that multiple linear regression analysis is sensitive to early effects of multiple psycholinguistic variables in picture naming. Crucially, our results suggest that access to phonological information might begin in parallel with semantic processing around 150 ms after picture onset. PMID:25005037

  3. Estimation of streamflow, base flow, and nitrate-nitrogen loads in Iowa using multiple linear regression models

    USGS Publications Warehouse

    Schilling, K.E.; Wolter, C.F.

    2005-01-01

    Nineteen variables, including precipitation, soils and geology, land use, and basin morphologic characteristics, were evaluated to develop Iowa regression models to predict total streamflow (Q), base flow (Qb), storm flow (Qs) and base flow percentage (%Qb) in gauged and ungauged watersheds in the state. Discharge records from a set of 33 watersheds across the state for the 1980 to 2000 period were separated into Qb and Qs. Multiple linear regression found that 75.5 percent of long term average Q was explained by rainfall, sand content, and row crop percentage variables, whereas 88.5 percent of Qb was explained by these three variables plus permeability and floodplain area variables. Qs was explained by average rainfall and %Qb was a function of row crop percentage, permeability, and basin slope variables. Regional regression models developed for long term average Q and Qb were adapted to annual rainfall and showed good correlation between measured and predicted values. Combining the regression model for Q with an estimate of mean annual nitrate concentration, a map of potential nitrate loads in the state was produced. Results from this study have important implications for understanding geomorphic and land use controls on streamflow and base flow in Iowa watersheds and similar agriculture dominated watersheds in the glaciated Midwest. (JAWRA) (Copyright ?? 2005).

  4. Use of principal-component, correlation, and stepwise multiple-regression analyses to investigate selected physical and hydraulic properties of carbonate-rock aquifers

    USGS Publications Warehouse

    Brown, C. Erwin

    1993-01-01

    Correlation analysis in conjunction with principal-component and multiple-regression analyses were applied to laboratory chemical and petrographic data to assess the usefulness of these techniques in evaluating selected physical and hydraulic properties of carbonate-rock aquifers in central Pennsylvania. Correlation and principal-component analyses were used to establish relations and associations among variables, to determine dimensions of property variation of samples, and to filter the variables containing similar information. Principal-component and correlation analyses showed that porosity is related to other measured variables and that permeability is most related to porosity and grain size. Four principal components are found to be significant in explaining the variance of data. Stepwise multiple-regression analysis was used to see how well the measured variables could predict porosity and (or) permeability for this suite of rocks. The variation in permeability and porosity is not totally predicted by the other variables, but the regression is significant at the 5% significance level. ?? 1993.

  5. Combining multiple regression and principal component analysis for accurate predictions for column ozone in Peninsular Malaysia

    NASA Astrophysics Data System (ADS)

    Rajab, Jasim M.; MatJafri, M. Z.; Lim, H. S.

    2013-06-01

    This study encompasses columnar ozone modelling in the peninsular Malaysia. Data of eight atmospheric parameters [air surface temperature (AST), carbon monoxide (CO), methane (CH4), water vapour (H2Ovapour), skin surface temperature (SSKT), atmosphere temperature (AT), relative humidity (RH), and mean surface pressure (MSP)] data set, retrieved from NASA's Atmospheric Infrared Sounder (AIRS), for the entire period (2003-2008) was employed to develop models to predict the value of columnar ozone (O3) in study area. The combined method, which is based on using both multiple regressions combined with principal component analysis (PCA) modelling, was used to predict columnar ozone. This combined approach was utilized to improve the prediction accuracy of columnar ozone. Separate analysis was carried out for north east monsoon (NEM) and south west monsoon (SWM) seasons. The O3 was negatively correlated with CH4, H2Ovapour, RH, and MSP, whereas it was positively correlated with CO, AST, SSKT, and AT during both the NEM and SWM season periods. Multiple regression analysis was used to fit the columnar ozone data using the atmospheric parameter's variables as predictors. A variable selection method based on high loading of varimax rotated principal components was used to acquire subsets of the predictor variables to be comprised in the linear regression model of the atmospheric parameter's variables. It was found that the increase in columnar O3 value is associated with an increase in the values of AST, SSKT, AT, and CO and with a drop in the levels of CH4, H2Ovapour, RH, and MSP. The result of fitting the best models for the columnar O3 value using eight of the independent variables gave about the same values of the R (≈0.93) and R2 (≈0.86) for both the NEM and SWM seasons. The common variables that appeared in both regression equations were SSKT, CH4 and RH, and the principal precursor of the columnar O3 value in both the NEM and SWM seasons was SSKT.

  6. Multiple pharmacy use and types of pharmacies used to obtain prescriptions.

    PubMed

    Look, Kevin A; Mott, David A

    2013-01-01

    To evaluate trends and patterns in the prevalence of multiple pharmacy use (MPU) and to describe the number and types of pharmacies used by multiple pharmacy users from 2003 to 2009. Retrospective, cross-sectional, descriptive study. United States from 2003 to 2009. 89,941 responses to the Medical Expenditure Panel Survey over 7 years. Analysis of respondent pharmacy use behaviors. Annual use of more than one pharmacy and number and types of pharmacies used. MPU among patients using medications increased significantly during the study period (from 36.4% [95% CI 35.2-37.6] in 2003 to 43.2% [41.9-44.4] in 2009)-a relative increase of 18.7% ( P = 0.01). Multiple pharmacy users used between 2 and 17 different pharmacies per year to obtain prescription medications. Although approximately 70% of multiple pharmacy users used only two pharmacies, the proportion using three or more pharmacies increased from 24.1% (22.5-25.7) in 2003 to 29.1% (27.4-30.8) in 2009. Mail service pharmacy use had the largest relative increase among multiple pharmacy users during the study period (27.2%), and MPU was nearly twice as high (75%) among mail service users compared with non-mail service users. MPU is common on a national level and has increased greatly in recent years. Patient use of pharmacies that have the potential to share medication information electronically is low among multiple pharmacy users, suggesting increased workload for pharmacists and potential medication safety concerns. This has important implications for pharmacists, as it potentially impedes their ability to maintain accurate medication profiles for patients.

  7. Multiple regression analysis of anthropometric measurements influencing the cephalic index of male Japanese university students.

    PubMed

    Hossain, Md Golam; Saw, Aik; Alam, Rashidul; Ohtsuki, Fumio; Kamarul, Tunku

    2013-09-01

    Cephalic index (CI), the ratio of head breadth to head length, is widely used to categorise human populations. The aim of this study was to access the impact of anthropometric measurements on the CI of male Japanese university students. This study included 1,215 male university students from Tokyo and Kyoto, selected using convenient sampling. Multiple regression analysis was used to determine the effect of anthropometric measurements on CI. The variance inflation factor (VIF) showed no evidence of a multicollinearity problem among independent variables. The coefficients of the regression line demonstrated a significant positive relationship between CI and minimum frontal breadth (p < 0.01), bizygomatic breadth (p < 0.01) and head height (p < 0.05), and a negative relationship between CI and morphological facial height (p < 0.01) and head circumference (p < 0.01). Moreover, the coefficient and odds ratio of logistic regression analysis showed a greater likelihood for minimum frontal breadth (p < 0.01) and bizygomatic breadth (p < 0.01) to predict round-headedness, and morphological facial height (p < 0.05) and head circumference (p < 0.01) to predict long-headedness. Stepwise regression analysis revealed bizygomatic breadth, head circumference, minimum frontal breadth, head height and morphological facial height to be the best predictor craniofacial measurements with respect to CI. The results suggest that most of the variables considered in this study appear to influence the CI of adult male Japanese students.

  8. A non-linear regression analysis program for describing electrophysiological data with multiple functions using Microsoft Excel.

    PubMed

    Brown, Angus M

    2006-04-01

    The objective of this present study was to demonstrate a method for fitting complex electrophysiological data with multiple functions using the SOLVER add-in of the ubiquitous spreadsheet Microsoft Excel. SOLVER minimizes the difference between the sum of the squares of the data to be fit and the function(s) describing the data using an iterative generalized reduced gradient method. While it is a straightforward procedure to fit data with linear functions, and we have previously demonstrated a method of non-linear regression analysis of experimental data based upon a single function, it is more complex to fit data with multiple functions, usually requiring specialized expensive computer software. In this paper we describe an easily understood program for fitting experimentally acquired data, in this case the stimulus-evoked compound action potential from the mouse optic nerve, with multiple Gaussian functions. The program is flexible and can be applied to describe data with a wide variety of user-input functions.

  9. Stepwise versus Hierarchical Regression: Pros and Cons

    ERIC Educational Resources Information Center

    Lewis, Mitzi

    2007-01-01

    Multiple regression is commonly used in social and behavioral data analysis. In multiple regression contexts, researchers are very often interested in determining the "best" predictors in the analysis. This focus may stem from a need to identify those predictors that are supportive of theory. Alternatively, the researcher may simply be interested…

  10. A Constrained Linear Estimator for Multiple Regression

    ERIC Educational Resources Information Center

    Davis-Stober, Clintin P.; Dana, Jason; Budescu, David V.

    2010-01-01

    "Improper linear models" (see Dawes, Am. Psychol. 34:571-582, "1979"), such as equal weighting, have garnered interest as alternatives to standard regression models. We analyze the general circumstances under which these models perform well by recasting a class of "improper" linear models as "proper" statistical models with a single predictor. We…

  11. Height and Weight Estimation From Anthropometric Measurements Using Machine Learning Regressions

    PubMed Central

    Fernandes, Bruno J. T.; Roque, Alexandre

    2018-01-01

    Height and weight are measurements explored to tracking nutritional diseases, energy expenditure, clinical conditions, drug dosages, and infusion rates. Many patients are not ambulant or may be unable to communicate, and a sequence of these factors may not allow accurate estimation or measurements; in those cases, it can be estimated approximately by anthropometric means. Different groups have proposed different linear or non-linear equations which coefficients are obtained by using single or multiple linear regressions. In this paper, we present a complete study of the application of different learning models to estimate height and weight from anthropometric measurements: support vector regression, Gaussian process, and artificial neural networks. The predicted values are significantly more accurate than that obtained with conventional linear regressions. In all the cases, the predictions are non-sensitive to ethnicity, and to gender, if more than two anthropometric parameters are analyzed. The learning model analysis creates new opportunities for anthropometric applications in industry, textile technology, security, and health care. PMID:29651366

  12. Assessing risk factors for periodontitis using regression

    NASA Astrophysics Data System (ADS)

    Lobo Pereira, J. A.; Ferreira, Maria Cristina; Oliveira, Teresa

    2013-10-01

    Multivariate statistical analysis is indispensable to assess the associations and interactions between different factors and the risk of periodontitis. Among others, regression analysis is a statistical technique widely used in healthcare to investigate and model the relationship between variables. In our work we study the impact of socio-demographic, medical and behavioral factors on periodontal health. Using regression, linear and logistic models, we can assess the relevance, as risk factors for periodontitis disease, of the following independent variables (IVs): Age, Gender, Diabetic Status, Education, Smoking status and Plaque Index. The multiple linear regression analysis model was built to evaluate the influence of IVs on mean Attachment Loss (AL). Thus, the regression coefficients along with respective p-values will be obtained as well as the respective p-values from the significance tests. The classification of a case (individual) adopted in the logistic model was the extent of the destruction of periodontal tissues defined by an Attachment Loss greater than or equal to 4 mm in 25% (AL≥4mm/≥25%) of sites surveyed. The association measures include the Odds Ratios together with the correspondent 95% confidence intervals.

  13. Regression Analysis: Legal Applications in Institutional Research

    ERIC Educational Resources Information Center

    Frizell, Julie A.; Shippen, Benjamin S., Jr.; Luna, Andrew L.

    2008-01-01

    This article reviews multiple regression analysis, describes how its results should be interpreted, and instructs institutional researchers on how to conduct such analyses using an example focused on faculty pay equity between men and women. The use of multiple regression analysis will be presented as a method with which to compare salaries of…

  14. The use of artificial neural networks and multiple linear regression to predict rate of medical waste generation

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Jahandideh, Sepideh; Jahandideh, Samad; Asadabadi, Ebrahim Barzegari

    2009-11-15

    Prediction of the amount of hospital waste production will be helpful in the storage, transportation and disposal of hospital waste management. Based on this fact, two predictor models including artificial neural networks (ANNs) and multiple linear regression (MLR) were applied to predict the rate of medical waste generation totally and in different types of sharp, infectious and general. In this study, a 5-fold cross-validation procedure on a database containing total of 50 hospitals of Fars province (Iran) were used to verify the performance of the models. Three performance measures including MAR, RMSE and R{sup 2} were used to evaluate performancemore » of models. The MLR as a conventional model obtained poor prediction performance measure values. However, MLR distinguished hospital capacity and bed occupancy as more significant parameters. On the other hand, ANNs as a more powerful model, which has not been introduced in predicting rate of medical waste generation, showed high performance measure values, especially 0.99 value of R{sup 2} confirming the good fit of the data. Such satisfactory results could be attributed to the non-linear nature of ANNs in problem solving which provides the opportunity for relating independent variables to dependent ones non-linearly. In conclusion, the obtained results showed that our ANN-based model approach is very promising and may play a useful role in developing a better cost-effective strategy for waste management in future.« less

  15. The Overall Odds Ratio as an Intuitive Effect Size Index for Multiple Logistic Regression: Examination of Further Refinements

    ERIC Educational Resources Information Center

    Le, Huy; Marcus, Justin

    2012-01-01

    This study used Monte Carlo simulation to examine the properties of the overall odds ratio (OOR), which was recently introduced as an index for overall effect size in multiple logistic regression. It was found that the OOR was relatively independent of study base rate and performed better than most commonly used R-square analogs in indexing model…

  16. Testing a single regression coefficient in high dimensional linear models

    PubMed Central

    Zhong, Ping-Shou; Li, Runze; Wang, Hansheng; Tsai, Chih-Ling

    2017-01-01

    In linear regression models with high dimensional data, the classical z-test (or t-test) for testing the significance of each single regression coefficient is no longer applicable. This is mainly because the number of covariates exceeds the sample size. In this paper, we propose a simple and novel alternative by introducing the Correlated Predictors Screening (CPS) method to control for predictors that are highly correlated with the target covariate. Accordingly, the classical ordinary least squares approach can be employed to estimate the regression coefficient associated with the target covariate. In addition, we demonstrate that the resulting estimator is consistent and asymptotically normal even if the random errors are heteroscedastic. This enables us to apply the z-test to assess the significance of each covariate. Based on the p-value obtained from testing the significance of each covariate, we further conduct multiple hypothesis testing by controlling the false discovery rate at the nominal level. Then, we show that the multiple hypothesis testing achieves consistent model selection. Simulation studies and empirical examples are presented to illustrate the finite sample performance and the usefulness of the proposed method, respectively. PMID:28663668

  17. Testing a single regression coefficient in high dimensional linear models.

    PubMed

    Lan, Wei; Zhong, Ping-Shou; Li, Runze; Wang, Hansheng; Tsai, Chih-Ling

    2016-11-01

    In linear regression models with high dimensional data, the classical z -test (or t -test) for testing the significance of each single regression coefficient is no longer applicable. This is mainly because the number of covariates exceeds the sample size. In this paper, we propose a simple and novel alternative by introducing the Correlated Predictors Screening (CPS) method to control for predictors that are highly correlated with the target covariate. Accordingly, the classical ordinary least squares approach can be employed to estimate the regression coefficient associated with the target covariate. In addition, we demonstrate that the resulting estimator is consistent and asymptotically normal even if the random errors are heteroscedastic. This enables us to apply the z -test to assess the significance of each covariate. Based on the p -value obtained from testing the significance of each covariate, we further conduct multiple hypothesis testing by controlling the false discovery rate at the nominal level. Then, we show that the multiple hypothesis testing achieves consistent model selection. Simulation studies and empirical examples are presented to illustrate the finite sample performance and the usefulness of the proposed method, respectively.

  18. Development of a Multiple Linear Regression Model to Forecast Facility Electrical Consumption at an Air Force Base.

    DTIC Science & Technology

    1981-09-01

    corresponds to the same square footage that consumed the electrical energy. 3. The basic assumptions of multiple linear regres- sion, as enumerated in...7. Data related to the sample of bases is assumed to be representative of bases in the population. Limitations Basic limitations on this research were... Ratemaking --Overview. Rand Report R-5894, Santa Monica CA, May 1977. Chatterjee, Samprit, and Bertram Price. Regression Analysis by Example. New York: John

  19. Factors associated with preventable infant death: a multiple logistic regression.

    PubMed

    Vidal E Silva, Sandra Maria Cunha; Tuon, Rogério Antonio; Probst, Livia Fernandes; Gondinho, Brunna Verna Castro; Pereira, Antonio Carlos; Meneghim, Marcelo de Castro; Cortellazzi, Karine Laura; Ambrosano, Glaucia Maria Bovi

    2018-01-01

    OBJECTIVE To identify and analyze factors associated with preventable child deaths. METHODS This analytical cross-sectional study had preventable child mortality as dependent variable. From a population of 34,284 live births, we have selected a systematic sample of 4,402 children who did not die compared to 272 children who died from preventable causes during the period studied. The independent variables were analyzed in four hierarchical blocks: sociodemographic factors, the characteristics of the mother, prenatal and delivery care, and health conditions of the patient and neonatal care. We performed a descriptive statistical analysis and estimated multiple hierarchical logistic regression models. RESULTS Approximatelly 35.3% of the deaths could have been prevented with the early diagnosis and treatment of diseases during pregnancy and 26.8% of them could have been prevented with better care conditions for pregnant women. CONCLUSIONS The following characteristics of the mother are determinant for the higher mortality of children before the first year of life: living in neighborhoods with an average family income lower than four minimum wages, being aged ≤ 19 years, having one or more alive children, having a child with low APGAR level at the fifth minute of life, and having a child with low birth weight.

  20. Factors associated with preventable infant death: a multiple logistic regression

    PubMed Central

    Vidal e Silva, Sandra Maria Cunha; Tuon, Rogério Antonio; Probst, Livia Fernandes; Gondinho, Brunna Verna Castro; Pereira, Antonio Carlos; Meneghim, Marcelo de Castro; Cortellazzi, Karine Laura; Ambrosano, Glaucia Maria Bovi

    2018-01-01

    ABSTRACT OBJECTIVE To identify and analyze factors associated with preventable child deaths. METHODS This analytical cross-sectional study had preventable child mortality as dependent variable. From a population of 34,284 live births, we have selected a systematic sample of 4,402 children who did not die compared to 272 children who died from preventable causes during the period studied. The independent variables were analyzed in four hierarchical blocks: sociodemographic factors, the characteristics of the mother, prenatal and delivery care, and health conditions of the patient and neonatal care. We performed a descriptive statistical analysis and estimated multiple hierarchical logistic regression models. RESULTS Approximatelly 35.3% of the deaths could have been prevented with the early diagnosis and treatment of diseases during pregnancy and 26.8% of them could have been prevented with better care conditions for pregnant women. CONCLUSIONS The following characteristics of the mother are determinant for the higher mortality of children before the first year of life: living in neighborhoods with an average family income lower than four minimum wages, being aged ≤ 19 years, having one or more alive children, having a child with low APGAR level at the fifth minute of life, and having a child with low birth weight. PMID:29723389

  1. Multiple Regression Analysis of mRNA-miRNA Associations in Colorectal Cancer Pathway

    PubMed Central

    Wang, Fengfeng; Wong, S. C. Cesar; Chan, Lawrence W. C.; Cho, William C. S.; Yip, S. P.; Yung, Benjamin Y. M.

    2014-01-01

    Background. MicroRNA (miRNA) is a short and endogenous RNA molecule that regulates posttranscriptional gene expression. It is an important factor for tumorigenesis of colorectal cancer (CRC), and a potential biomarker for diagnosis, prognosis, and therapy of CRC. Our objective is to identify the related miRNAs and their associations with genes frequently involved in CRC microsatellite instability (MSI) and chromosomal instability (CIN) signaling pathways. Results. A regression model was adopted to identify the significantly associated miRNAs targeting a set of candidate genes frequently involved in colorectal cancer MSI and CIN pathways. Multiple linear regression analysis was used to construct the model and find the significant mRNA-miRNA associations. We identified three significantly associated mRNA-miRNA pairs: BCL2 was positively associated with miR-16 and SMAD4 was positively associated with miR-567 in the CRC tissue, while MSH6 was positively associated with miR-142-5p in the normal tissue. As for the whole model, BCL2 and SMAD4 models were not significant, and MSH6 model was significant. The significant associations were different in the normal and the CRC tissues. Conclusion. Our results have laid down a solid foundation in exploration of novel CRC mechanisms, and identification of miRNA roles as oncomirs or tumor suppressor mirs in CRC. PMID:24895601

  2. Interpretation of commonly used statistical regression models.

    PubMed

    Kasza, Jessica; Wolfe, Rory

    2014-01-01

    A review of some regression models commonly used in respiratory health applications is provided in this article. Simple linear regression, multiple linear regression, logistic regression and ordinal logistic regression are considered. The focus of this article is on the interpretation of the regression coefficients of each model, which are illustrated through the application of these models to a respiratory health research study. © 2013 The Authors. Respirology © 2013 Asian Pacific Society of Respirology.

  3. Multiple Linear Regression for Reconstruction of Gene Regulatory Networks in Solving Cascade Error Problems

    PubMed Central

    Zainudin, Suhaila; Arif, Shereena M.

    2017-01-01

    Gene regulatory network (GRN) reconstruction is the process of identifying regulatory gene interactions from experimental data through computational analysis. One of the main reasons for the reduced performance of previous GRN methods had been inaccurate prediction of cascade motifs. Cascade error is defined as the wrong prediction of cascade motifs, where an indirect interaction is misinterpreted as a direct interaction. Despite the active research on various GRN prediction methods, the discussion on specific methods to solve problems related to cascade errors is still lacking. In fact, the experiments conducted by the past studies were not specifically geared towards proving the ability of GRN prediction methods in avoiding the occurrences of cascade errors. Hence, this research aims to propose Multiple Linear Regression (MLR) to infer GRN from gene expression data and to avoid wrongly inferring of an indirect interaction (A → B → C) as a direct interaction (A → C). Since the number of observations of the real experiment datasets was far less than the number of predictors, some predictors were eliminated by extracting the random subnetworks from global interaction networks via an established extraction method. In addition, the experiment was extended to assess the effectiveness of MLR in dealing with cascade error by using a novel experimental procedure that had been proposed in this work. The experiment revealed that the number of cascade errors had been very minimal. Apart from that, the Belsley collinearity test proved that multicollinearity did affect the datasets used in this experiment greatly. All the tested subnetworks obtained satisfactory results, with AUROC values above 0.5. PMID:28250767

  4. Regression in autistic spectrum disorders.

    PubMed

    Stefanatos, Gerry A

    2008-12-01

    A significant proportion of children diagnosed with Autistic Spectrum Disorder experience a developmental regression characterized by a loss of previously-acquired skills. This may involve a loss of speech or social responsitivity, but often entails both. This paper critically reviews the phenomena of regression in autistic spectrum disorders, highlighting the characteristics of regression, age of onset, temporal course, and long-term outcome. Important considerations for diagnosis are discussed and multiple etiological factors currently hypothesized to underlie the phenomenon are reviewed. It is argued that regressive autistic spectrum disorders can be conceptualized on a spectrum with other regressive disorders that may share common pathophysiological features. The implications of this viewpoint are discussed.

  5. Precision Efficacy Analysis for Regression.

    ERIC Educational Resources Information Center

    Brooks, Gordon P.

    When multiple linear regression is used to develop a prediction model, sample size must be large enough to ensure stable coefficients. If the derivation sample size is inadequate, the model may not predict well for future subjects. The precision efficacy analysis for regression (PEAR) method uses a cross- validity approach to select sample sizes…

  6. Multiple regression equations modelling of groundwater of Ajmer-Pushkar railway line region, Rajasthan (India).

    PubMed

    Mathur, Praveen; Sharma, Sarita; Soni, Bhupendra

    2010-01-01

    In the present work, an attempt is made to formulate multiple regression equations using all possible regressions method for groundwater quality assessment of Ajmer-Pushkar railway line region in pre- and post-monsoon seasons. Correlation studies revealed the existence of linear relationships (r 0.7) for electrical conductivity (EC), total hardness (TH) and total dissolved solids (TDS) with other water quality parameters. The highest correlation was found between EC and TDS (r = 0.973). EC showed highly significant positive correlation with Na, K, Cl, TDS and total solids (TS). TH showed highest correlation with Ca and Mg. TDS showed significant correlation with Na, K, SO4, PO4 and Cl. The study indicated that most of the contamination present was water soluble or ionic in nature. Mg was present as MgCl2; K mainly as KCl and K2SO4, and Na was present as the salts of Cl, SO4 and PO4. On the other hand, F and NO3 showed no significant correlations. The r2 values and F values (at 95% confidence limit, alpha = 0.05) for the modelled equations indicated high degree of linearity among independent and dependent variables. Also the error % between calculated and experimental values was contained within +/- 15% limit.

  7. Wheat flour dough Alveograph characteristics predicted by Mixolab regression models.

    PubMed

    Codină, Georgiana Gabriela; Mironeasa, Silvia; Mironeasa, Costel; Popa, Ciprian N; Tamba-Berehoiu, Radiana

    2012-02-01

    In Romania, the Alveograph is the most used device to evaluate the rheological properties of wheat flour dough, but lately the Mixolab device has begun to play an important role in the breadmaking industry. These two instruments are based on different principles but there are some correlations that can be found between the parameters determined by the Mixolab and the rheological properties of wheat dough measured with the Alveograph. Statistical analysis on 80 wheat flour samples using the backward stepwise multiple regression method showed that Mixolab values using the ‘Chopin S’ protocol (40 samples) and ‘Chopin + ’ protocol (40 samples) can be used to elaborate predictive models for estimating the value of the rheological properties of wheat dough: baking strength (W), dough tenacity (P) and extensibility (L). The correlation analysis confirmed significant findings (P < 0.05 and P < 0.01) between the parameters of wheat dough studied by the Mixolab and its rheological properties measured with the Alveograph. A number of six predictive linear equations were obtained. Linear regression models gave multiple regression coefficients with R²(adjusted) > 0.70 for P, R²(adjusted) > 0.70 for W and R²(adjusted) > 0.38 for L, at a 95% confidence interval. Copyright © 2011 Society of Chemical Industry.

  8. Multiple linear regression to estimate time-frequency electrophysiological responses in single trials

    PubMed Central

    Hu, L.; Zhang, Z.G.; Mouraux, A.; Iannetti, G.D.

    2015-01-01

    Transient sensory, motor or cognitive event elicit not only phase-locked event-related potentials (ERPs) in the ongoing electroencephalogram (EEG), but also induce non-phase-locked modulations of ongoing EEG oscillations. These modulations can be detected when single-trial waveforms are analysed in the time-frequency domain, and consist in stimulus-induced decreases (event-related desynchronization, ERD) or increases (event-related synchronization, ERS) of synchrony in the activity of the underlying neuronal populations. ERD and ERS reflect changes in the parameters that control oscillations in neuronal networks and, depending on the frequency at which they occur, represent neuronal mechanisms involved in cortical activation, inhibition and binding. ERD and ERS are commonly estimated by averaging the time-frequency decomposition of single trials. However, their trial-to-trial variability that can reflect physiologically-important information is lost by across-trial averaging. Here, we aim to (1) develop novel approaches to explore single-trial parameters (including latency, frequency and magnitude) of ERP/ERD/ERS; (2) disclose the relationship between estimated single-trial parameters and other experimental factors (e.g., perceived intensity). We found that (1) stimulus-elicited ERP/ERD/ERS can be correctly separated using principal component analysis (PCA) decomposition with Varimax rotation on the single-trial time-frequency distributions; (2) time-frequency multiple linear regression with dispersion term (TF-MLRd) enhances the signal-to-noise ratio of ERP/ERD/ERS in single trials, and provides an unbiased estimation of their latency, frequency, and magnitude at single-trial level; (3) these estimates can be meaningfully correlated with each other and with other experimental factors at single-trial level (e.g., perceived stimulus intensity and ERP magnitude). The methods described in this article allow exploring fully non-phase-locked stimulus-induced cortical

  9. Detection of epistatic effects with logic regression and a classical linear regression model.

    PubMed

    Malina, Magdalena; Ickstadt, Katja; Schwender, Holger; Posch, Martin; Bogdan, Małgorzata

    2014-02-01

    To locate multiple interacting quantitative trait loci (QTL) influencing a trait of interest within experimental populations, usually methods as the Cockerham's model are applied. Within this framework, interactions are understood as the part of the joined effect of several genes which cannot be explained as the sum of their additive effects. However, if a change in the phenotype (as disease) is caused by Boolean combinations of genotypes of several QTLs, this Cockerham's approach is often not capable to identify them properly. To detect such interactions more efficiently, we propose a logic regression framework. Even though with the logic regression approach a larger number of models has to be considered (requiring more stringent multiple testing correction) the efficient representation of higher order logic interactions in logic regression models leads to a significant increase of power to detect such interactions as compared to a Cockerham's approach. The increase in power is demonstrated analytically for a simple two-way interaction model and illustrated in more complex settings with simulation study and real data analysis.

  10. The use of regression analysis in determining reference intervals for low hematocrit and thrombocyte count in multiple electrode aggregometry and platelet function analyzer 100 testing of platelet function.

    PubMed

    Kuiper, Gerhardus J A J M; Houben, Rik; Wetzels, Rick J H; Verhezen, Paul W M; Oerle, Rene van; Ten Cate, Hugo; Henskens, Yvonne M C; Lancé, Marcus D

    2017-11-01

    Low platelet counts and hematocrit levels hinder whole blood point-of-care testing of platelet function. Thus far, no reference ranges for MEA (multiple electrode aggregometry) and PFA-100 (platelet function analyzer 100) devices exist for low ranges. Through dilution methods of volunteer whole blood, platelet function at low ranges of platelet count and hematocrit levels was assessed on MEA for four agonists and for PFA-100 in two cartridges. Using (multiple) regression analysis, 95% reference intervals were computed for these low ranges. Low platelet counts affected MEA in a positive correlation (all agonists showed r 2 ≥ 0.75) and PFA-100 in an inverse correlation (closure times were prolonged with lower platelet counts). Lowered hematocrit did not affect MEA testing, except for arachidonic acid activation (ASPI), which showed a weak positive correlation (r 2 = 0.14). Closure time on PFA-100 testing was inversely correlated with hematocrit for both cartridges. Regression analysis revealed different 95% reference intervals in comparison with originally established intervals for both MEA and PFA-100 in low platelet or hematocrit conditions. Multiple regression analysis of ASPI and both tests on the PFA-100 for combined low platelet and hematocrit conditions revealed that only PFA-100 testing should be adjusted for both thrombocytopenia and anemia. 95% reference intervals were calculated using multiple regression analysis. However, coefficients of determination of PFA-100 were poor, and some variance remained unexplained. Thus, in this pilot study using (multiple) regression analysis, we could establish reference intervals of platelet function in anemia and thrombocytopenia conditions on PFA-100 and in thrombocytopenia conditions on MEA.

  11. Selection of higher order regression models in the analysis of multi-factorial transcription data.

    PubMed

    Prazeres da Costa, Olivia; Hoffman, Arthur; Rey, Johannes W; Mansmann, Ulrich; Buch, Thorsten; Tresch, Achim

    2014-01-01

    Many studies examine gene expression data that has been obtained under the influence of multiple factors, such as genetic background, environmental conditions, or exposure to diseases. The interplay of multiple factors may lead to effect modification and confounding. Higher order linear regression models can account for these effects. We present a new methodology for linear model selection and apply it to microarray data of bone marrow-derived macrophages. This experiment investigates the influence of three variable factors: the genetic background of the mice from which the macrophages were obtained, Yersinia enterocolitica infection (two strains, and a mock control), and treatment/non-treatment with interferon-γ. We set up four different linear regression models in a hierarchical order. We introduce the eruption plot as a new practical tool for model selection complementary to global testing. It visually compares the size and significance of effect estimates between two nested models. Using this methodology we were able to select the most appropriate model by keeping only relevant factors showing additional explanatory power. Application to experimental data allowed us to qualify the interaction of factors as either neutral (no interaction), alleviating (co-occurring effects are weaker than expected from the single effects), or aggravating (stronger than expected). We find a biologically meaningful gene cluster of putative C2TA target genes that appear to be co-regulated with MHC class II genes. We introduced the eruption plot as a tool for visual model comparison to identify relevant higher order interactions in the analysis of expression data obtained under the influence of multiple factors. We conclude that model selection in higher order linear regression models should generally be performed for the analysis of multi-factorial microarray data.

  12. Melanin and blood concentration in human skin studied by multiple regression analysis: experiments

    NASA Astrophysics Data System (ADS)

    Shimada, M.; Yamada, Y.; Itoh, M.; Yatagai, T.

    2001-09-01

    Knowledge of the mechanism of human skin colour and measurement of melanin and blood concentration in human skin are needed in the medical and cosmetic fields. The absorbance spectrum from reflectance at the visible wavelength of human skin increases under several conditions such as a sunburn or scalding. The change of the absorbance spectrum from reflectance including the scattering effect does not correspond to the molar absorption spectrum of melanin and blood. The modified Beer-Lambert law is applied to the change in the absorbance spectrum from reflectance of human skin as the change in melanin and blood is assumed to be small. The concentration of melanin and blood was estimated from the absorbance spectrum reflectance of human skin using multiple regression analysis. Estimated concentrations were compared with the measured one in a phantom experiment and this method was applied to in vivo skin.

  13. Order Selection for General Expression of Nonlinear Autoregressive Model Based on Multivariate Stepwise Regression

    NASA Astrophysics Data System (ADS)

    Shi, Jinfei; Zhu, Songqing; Chen, Ruwen

    2017-12-01

    An order selection method based on multiple stepwise regressions is proposed for General Expression of Nonlinear Autoregressive model which converts the model order problem into the variable selection of multiple linear regression equation. The partial autocorrelation function is adopted to define the linear term in GNAR model. The result is set as the initial model, and then the nonlinear terms are introduced gradually. Statistics are chosen to study the improvements of both the new introduced and originally existed variables for the model characteristics, which are adopted to determine the model variables to retain or eliminate. So the optimal model is obtained through data fitting effect measurement or significance test. The simulation and classic time-series data experiment results show that the method proposed is simple, reliable and can be applied to practical engineering.

  14. Spectral regression and correlation coefficients of some benzaldimines and salicylaldimines in different solvents

    NASA Astrophysics Data System (ADS)

    Hammud, Hassan H.; Ghannoum, Amer; Masoud, Mamdouh S.

    2006-02-01

    Sixteen Schiff bases obtained from the condensation of benzaldehyde or salicylaldehyde with various amines (aniline, 4-carboxyaniline, phenylhydrazine, 2,4-dinitrophenylhydrazine, ethylenediamine, hydrazine, o-phenylenediamine and 2,6-pyridinediamine) are studied with UV-vis spectroscopy to observe the effect of solvents, substituents and other structural factors on the spectra. The bands involving different electronic transitions are interpreted. Computerized analysis and multiple regression techniques were applied to calculate the regression and correlation coefficients based on the equation that relates peak position λmax to the solvent parameters that depend on the H-bonding ability, refractive index and dielectric constant of solvents.

  15. Validity and Reliability of Scores Obtained on Multiple-Choice Questions: Why Functioning Distractors Matter

    ERIC Educational Resources Information Center

    Ali, Syed Haris; Carr, Patrick A.; Ruit, Kenneth G.

    2016-01-01

    Plausible distractors are important for accurate measurement of knowledge via multiple-choice questions (MCQs). This study demonstrates the impact of higher distractor functioning on validity and reliability of scores obtained on MCQs. Freeresponse (FR) and MCQ versions of a neurohistology practice exam were given to four cohorts of Year 1 medical…

  16. Prediction of octanol-water partition coefficients of organic compounds by multiple linear regression, partial least squares, and artificial neural network.

    PubMed

    Golmohammadi, Hassan

    2009-11-30

    A quantitative structure-property relationship (QSPR) study was performed to develop models those relate the structure of 141 organic compounds to their octanol-water partition coefficients (log P(o/w)). A genetic algorithm was applied as a variable selection tool. Modeling of log P(o/w) of these compounds as a function of theoretically derived descriptors was established by multiple linear regression (MLR), partial least squares (PLS), and artificial neural network (ANN). The best selected descriptors that appear in the models are: atomic charge weighted partial positively charged surface area (PPSA-3), fractional atomic charge weighted partial positive surface area (FPSA-3), minimum atomic partial charge (Qmin), molecular volume (MV), total dipole moment of molecule (mu), maximum antibonding contribution of a molecule orbital in the molecule (MAC), and maximum free valency of a C atom in the molecule (MFV). The result obtained showed the ability of developed artificial neural network to prediction of partition coefficients of organic compounds. Also, the results revealed the superiority of ANN over the MLR and PLS models. Copyright 2009 Wiley Periodicals, Inc.

  17. Time-resolved flow reconstruction with indirect measurements using regression models and Kalman-filtered POD ROM

    NASA Astrophysics Data System (ADS)

    Leroux, Romain; Chatellier, Ludovic; David, Laurent

    2018-01-01

    This article is devoted to the estimation of time-resolved particle image velocimetry (TR-PIV) flow fields using a time-resolved point measurements of a voltage signal obtained by hot-film anemometry. A multiple linear regression model is first defined to map the TR-PIV flow fields onto the voltage signal. Due to the high temporal resolution of the signal acquired by the hot-film sensor, the estimates of the TR-PIV flow fields are obtained with a multiple linear regression method called orthonormalized partial least squares regression (OPLSR). Subsequently, this model is incorporated as the observation equation in an ensemble Kalman filter (EnKF) applied on a proper orthogonal decomposition reduced-order model to stabilize it while reducing the effects of the hot-film sensor noise. This method is assessed for the reconstruction of the flow around a NACA0012 airfoil at a Reynolds number of 1000 and an angle of attack of {20}°. Comparisons with multi-time delay-modified linear stochastic estimation show that both the OPLSR and EnKF combined with OPLSR are more accurate as they produce a much lower relative estimation error, and provide a faithful reconstruction of the time evolution of the velocity flow fields.

  18. Brain networks of temporal preparation: A multiple regression analysis of neuropsychological data.

    PubMed

    Triviño, Mónica; Correa, Ángel; Lupiáñez, Juan; Funes, María Jesús; Catena, Andrés; He, Xun; Humphreys, Glyn W

    2016-11-15

    There are only a few studies on the brain networks involved in the ability to prepare in time, and most of them followed a correlational rather than a neuropsychological approach. The present neuropsychological study performed multiple regression analysis to address the relationship between both grey and white matter (measured by magnetic resonance imaging in patients with brain lesion) and different effects in temporal preparation (Temporal orienting, Foreperiod and Sequential effects). Two versions of a temporal preparation task were administered to a group of 23 patients with acquired brain injury. In one task, the cue presented (a red versus green square) to inform participants about the time of appearance (early versus late) of a target stimulus was blocked, while in the other task the cue was manipulated on a trial-by-trial basis. The duration of the cue-target time intervals (400 versus 1400ms) was always manipulated within blocks in both tasks. Regression analysis were conducted between either the grey matter lesion size or the white matter tracts disconnection and the three temporal preparation effects separately. The main finding was that each temporal preparation effect was predicted by a different network of structures, depending on cue expectancy. Specifically, the Temporal orienting effect was related to both prefrontal and temporal brain areas. The Foreperiod effect was related to right and left prefrontal structures. Sequential effects were predicted by both parietal cortex and left subcortical structures. These findings show a clear dissociation of brain circuits involved in the different ways to prepare in time, showing for the first time the involvement of temporal areas in the Temporal orienting effect, as well as the parietal cortex in the Sequential effects. Copyright © 2016 Elsevier Inc. All rights reserved.

  19. A methodology for the design of experiments in computational intelligence with multiple regression models.

    PubMed

    Fernandez-Lozano, Carlos; Gestal, Marcos; Munteanu, Cristian R; Dorado, Julian; Pazos, Alejandro

    2016-01-01

    The design of experiments and the validation of the results achieved with them are vital in any research study. This paper focuses on the use of different Machine Learning approaches for regression tasks in the field of Computational Intelligence and especially on a correct comparison between the different results provided for different methods, as those techniques are complex systems that require further study to be fully understood. A methodology commonly accepted in Computational intelligence is implemented in an R package called RRegrs. This package includes ten simple and complex regression models to carry out predictive modeling using Machine Learning and well-known regression algorithms. The framework for experimental design presented herein is evaluated and validated against RRegrs. Our results are different for three out of five state-of-the-art simple datasets and it can be stated that the selection of the best model according to our proposal is statistically significant and relevant. It is of relevance to use a statistical approach to indicate whether the differences are statistically significant using this kind of algorithms. Furthermore, our results with three real complex datasets report different best models than with the previously published methodology. Our final goal is to provide a complete methodology for the use of different steps in order to compare the results obtained in Computational Intelligence problems, as well as from other fields, such as for bioinformatics, cheminformatics, etc., given that our proposal is open and modifiable.

  20. A methodology for the design of experiments in computational intelligence with multiple regression models

    PubMed Central

    Gestal, Marcos; Munteanu, Cristian R.; Dorado, Julian; Pazos, Alejandro

    2016-01-01

    The design of experiments and the validation of the results achieved with them are vital in any research study. This paper focuses on the use of different Machine Learning approaches for regression tasks in the field of Computational Intelligence and especially on a correct comparison between the different results provided for different methods, as those techniques are complex systems that require further study to be fully understood. A methodology commonly accepted in Computational intelligence is implemented in an R package called RRegrs. This package includes ten simple and complex regression models to carry out predictive modeling using Machine Learning and well-known regression algorithms. The framework for experimental design presented herein is evaluated and validated against RRegrs. Our results are different for three out of five state-of-the-art simple datasets and it can be stated that the selection of the best model according to our proposal is statistically significant and relevant. It is of relevance to use a statistical approach to indicate whether the differences are statistically significant using this kind of algorithms. Furthermore, our results with three real complex datasets report different best models than with the previously published methodology. Our final goal is to provide a complete methodology for the use of different steps in order to compare the results obtained in Computational Intelligence problems, as well as from other fields, such as for bioinformatics, cheminformatics, etc., given that our proposal is open and modifiable. PMID:27920952

  1. Prediction of random-regression coefficient for daily milk yield after 305 days in milk by using the regression-coefficient estimates from the first 305 days.

    PubMed

    Yamazaki, Takeshi; Takeda, Hisato; Hagiya, Koichi; Yamaguchi, Satoshi; Sasaki, Osamu

    2018-03-13

    Because lactation periods in dairy cows lengthen with increasing total milk production, it is important to predict individual productivities after 305 days in milk (DIM) to determine the optimal lactation period. We therefore examined whether the random regression (RR) coefficient from 306 to 450 DIM (M2) can be predicted from those during the first 305 DIM (M1) by using a random regression model. We analyzed test-day milk records from 85690 Holstein cows in their first lactations and 131727 cows in their later (second to fifth) lactations. Data in M1 and M2 were analyzed separately by using different single-trait RR animal models. We then performed a multiple regression analysis of the RR coefficients of M2 on those of M1 during the first and later lactations. The first-order Legendre polynomials were practical covariates of random regression for the milk yields of M2. All RR coefficients for the additive genetic (AG) effect and the intercept for the permanent environmental (PE) effect of M2 had moderate to strong correlations with the intercept for the AG effect of M1. The coefficients of determination for multiple regression of the combined intercepts for the AG and PE effects of M2 on the coefficients for the AG effect of M1 were moderate to high. The daily milk yields of M2 predicted by using the RR coefficients for the AG effect of M1 were highly correlated with those obtained by using the coefficients of M2. Milk production after 305 DIM can be predicted by using the RR coefficient estimates of the AG effect during the first 305 DIM.

  2. An Effect Size for Regression Predictors in Meta-Analysis

    ERIC Educational Resources Information Center

    Aloe, Ariel M.; Becker, Betsy Jane

    2012-01-01

    A new effect size representing the predictive power of an independent variable from a multiple regression model is presented. The index, denoted as r[subscript sp], is the semipartial correlation of the predictor with the outcome of interest. This effect size can be computed when multiple predictor variables are included in the regression model…

  3. Integrating multiple fitting regression and Bayes decision for cancer diagnosis with transcriptomic data from tumor-educated blood platelets.

    PubMed

    Huang, Guangzao; Yuan, Mingshun; Chen, Moliang; Li, Lei; You, Wenjie; Li, Hanjie; Cai, James J; Ji, Guoli

    2017-10-07

    The application of machine learning in cancer diagnostics has shown great promise and is of importance in clinic settings. Here we consider applying machine learning methods to transcriptomic data derived from tumor-educated platelets (TEPs) from individuals with different types of cancer. We aim to define a reliability measure for diagnostic purposes to increase the potential for facilitating personalized treatments. To this end, we present a novel classification method called MFRB (for Multiple Fitting Regression and Bayes decision), which integrates the process of multiple fitting regression (MFR) with Bayes decision theory. MFR is first used to map multidimensional features of the transcriptomic data into a one-dimensional feature. The probability density function of each class in the mapped space is then adjusted using the Gaussian probability density function. Finally, the Bayes decision theory is used to build a probabilistic classifier with the estimated probability density functions. The output of MFRB can be used to determine which class a sample belongs to, as well as to assign a reliability measure for a given class. The classical support vector machine (SVM) and probabilistic SVM (PSVM) are used to evaluate the performance of the proposed method with simulated and real TEP datasets. Our results indicate that the proposed MFRB method achieves the best performance compared to SVM and PSVM, mainly due to its strong generalization ability for limited, imbalanced, and noisy data.

  4. Practical Session: Simple Linear Regression

    NASA Astrophysics Data System (ADS)

    Clausel, M.; Grégoire, G.

    2014-12-01

    Two exercises are proposed to illustrate the simple linear regression. The first one is based on the famous Galton's data set on heredity. We use the lm R command and get coefficients estimates, standard error of the error, R2, residuals …In the second example, devoted to data related to the vapor tension of mercury, we fit a simple linear regression, predict values, and anticipate on multiple linear regression. This pratical session is an excerpt from practical exercises proposed by A. Dalalyan at EPNC (see Exercises 1 and 2 of http://certis.enpc.fr/~dalalyan/Download/TP_ENPC_4.pdf).

  5. Sample size determination for logistic regression on a logit-normal distribution.

    PubMed

    Kim, Seongho; Heath, Elisabeth; Heilbrun, Lance

    2017-06-01

    Although the sample size for simple logistic regression can be readily determined using currently available methods, the sample size calculation for multiple logistic regression requires some additional information, such as the coefficient of determination ([Formula: see text]) of a covariate of interest with other covariates, which is often unavailable in practice. The response variable of logistic regression follows a logit-normal distribution which can be generated from a logistic transformation of a normal distribution. Using this property of logistic regression, we propose new methods of determining the sample size for simple and multiple logistic regressions using a normal transformation of outcome measures. Simulation studies and a motivating example show several advantages of the proposed methods over the existing methods: (i) no need for [Formula: see text] for multiple logistic regression, (ii) available interim or group-sequential designs, and (iii) much smaller required sample size.

  6. Statistical methods and regression analysis of stratospheric ozone and meteorological variables in Isfahan

    NASA Astrophysics Data System (ADS)

    Hassanzadeh, S.; Hosseinibalam, F.; Omidvari, M.

    2008-04-01

    Data of seven meteorological variables (relative humidity, wet temperature, dry temperature, maximum temperature, minimum temperature, ground temperature and sun radiation time) and ozone values have been used for statistical analysis. Meteorological variables and ozone values were analyzed using both multiple linear regression and principal component methods. Data for the period 1999-2004 are analyzed jointly using both methods. For all periods, temperature dependent variables were highly correlated, but were all negatively correlated with relative humidity. Multiple regression analysis was used to fit the meteorological variables using the meteorological variables as predictors. A variable selection method based on high loading of varimax rotated principal components was used to obtain subsets of the predictor variables to be included in the linear regression model of the meteorological variables. In 1999, 2001 and 2002 one of the meteorological variables was weakly influenced predominantly by the ozone concentrations. However, the model did not predict that the meteorological variables for the year 2000 were not influenced predominantly by the ozone concentrations that point to variation in sun radiation. This could be due to other factors that were not explicitly considered in this study.

  7. Using Regression Equations Built from Summary Data in the Psychological Assessment of the Individual Case: Extension to Multiple Regression

    ERIC Educational Resources Information Center

    Crawford, John R.; Garthwaite, Paul H.; Denham, Annie K.; Chelune, Gordon J.

    2012-01-01

    Regression equations have many useful roles in psychological assessment. Moreover, there is a large reservoir of published data that could be used to build regression equations; these equations could then be employed to test a wide variety of hypotheses concerning the functioning of individual cases. This resource is currently underused because…

  8. Relationship between postoperative refractive outcomes and cataract density: multiple regression analysis.

    PubMed

    Ueda, Tetsuo; Ikeda, Hitoe; Ota, Takeo; Matsuura, Toyoaki; Hara, Yoshiaki

    2010-05-01

    To evaluate the relationship between cataract density and the deviation from the predicted refraction. Department of Ophthalmology, Nara Medical University, Kashihara, Japan. Axial length (AL) was measured in eyes with mainly nuclear cataract using partial coherence interferometry (IOLMaster). The postoperative AL was measured in pseudophakic mode. The AL difference was calculated by subtracting the postoperative AL from the preoperative AL. Cataract density was measured with the pupil dilated using anterior segment Scheimpflug imaging (EAS-1000). The predicted postoperative refraction was calculated using the SRK/T formula. The subjective refraction 3 months postoperatively was also measured. The mean absolute prediction error (MAE) (mean of absolute difference between predicted postoperative refraction and spherical equivalent of postoperative subjective refraction) was calculated. The relationship between the MAE and cataract density, age, preoperative visual acuity, anterior chamber depth, corneal radius of curvature, and AL difference was evaluated using multiple regression analysis. In the 96 eyes evaluated, the MAE was correlated with cataract density (r = 0.37, P = .001) and the AL difference (r = 0.34, P = .003) but not with the other parameters. The AL difference was correlated with cataract density (r = 0.53, P<.0001). The postoperative refractive outcome was affected by cataract density. This should be taken into consideration in eyes with a higher density cataract. (c) 2010 ASCRS and ESCRS. Published by Elsevier Inc. All rights reserved.

  9. Combinations of Multiple Neuroimaging Markers using Logistic Regression for Auxiliary Diagnosis of Alzheimer Disease and Mild Cognitive Impairment.

    PubMed

    Mao, Nini; Liu, Yunting; Chen, Kewei; Yao, Li; Wu, Xia

    2018-06-05

    Multiple neuroimaging modalities have been developed providing various aspects of information on the human brain. Used together and properly, these complementary multimodal neuroimaging data integrate multisource information which can facilitate a diagnosis and improve the diagnostic accuracy. In this study, 3 types of brain imaging data (sMRI, FDG-PET, and florbetapir-PET) were fused in the hope to improve diagnostic accuracy, and multivariate methods (logistic regression) were applied to these trimodal neuroimaging indices. Then, the receiver-operating characteristic (ROC) method was used to analyze the outcomes of the logistic classifier, with either each index, multiples from each modality, or all indices from all 3 modalities, to investigate their differential abilities to identify the disease. With increasing numbers of indices within each modality and across modalities, the accuracy of identifying Alzheimer disease (AD) increases to varying degrees. For example, the area under the ROC curve is above 0.98 when all the indices from the 3 imaging data types are combined. Using a combination of different indices, the results confirmed the initial hypothesis that different biomarkers were potentially complementary, and thus the conjoint analysis of multiple information from multiple sources would improve the capability to identify diseases such as AD and mild cognitive impairment. © 2018 S. Karger AG, Basel.

  10. Wavelet regression model in forecasting crude oil price

    NASA Astrophysics Data System (ADS)

    Hamid, Mohd Helmie; Shabri, Ani

    2017-05-01

    This study presents the performance of wavelet multiple linear regression (WMLR) technique in daily crude oil forecasting. WMLR model was developed by integrating the discrete wavelet transform (DWT) and multiple linear regression (MLR) model. The original time series was decomposed to sub-time series with different scales by wavelet theory. Correlation analysis was conducted to assist in the selection of optimal decomposed components as inputs for the WMLR model. The daily WTI crude oil price series has been used in this study to test the prediction capability of the proposed model. The forecasting performance of WMLR model were also compared with regular multiple linear regression (MLR), Autoregressive Moving Average (ARIMA) and Generalized Autoregressive Conditional Heteroscedasticity (GARCH) using root mean square errors (RMSE) and mean absolute errors (MAE). Based on the experimental results, it appears that the WMLR model performs better than the other forecasting technique tested in this study.

  11. Risk Assessment and Prediction of Flyrock Distance by Combined Multiple Regression Analysis and Monte Carlo Simulation of Quarry Blasting

    NASA Astrophysics Data System (ADS)

    Armaghani, Danial Jahed; Mahdiyar, Amir; Hasanipanah, Mahdi; Faradonbeh, Roohollah Shirani; Khandelwal, Manoj; Amnieh, Hassan Bakhshandeh

    2016-09-01

    Flyrock is considered as one of the main causes of human injury, fatalities, and structural damage among all undesirable environmental impacts of blasting. Therefore, it seems that the proper prediction/simulation of flyrock is essential, especially in order to determine blast safety area. If proper control measures are taken, then the flyrock distance can be controlled, and, in return, the risk of damage can be reduced or eliminated. The first objective of this study was to develop a predictive model for flyrock estimation based on multiple regression (MR) analyses, and after that, using the developed MR model, flyrock phenomenon was simulated by the Monte Carlo (MC) approach. In order to achieve objectives of this study, 62 blasting operations were investigated in Ulu Tiram quarry, Malaysia, and some controllable and uncontrollable factors were carefully recorded/calculated. The obtained results of MC modeling indicated that this approach is capable of simulating flyrock ranges with a good level of accuracy. The mean of simulated flyrock by MC was obtained as 236.3 m, while this value was achieved as 238.6 m for the measured one. Furthermore, a sensitivity analysis was also conducted to investigate the effects of model inputs on the output of the system. The analysis demonstrated that powder factor is the most influential parameter on fly rock among all model inputs. It is noticeable that the proposed MR and MC models should be utilized only in the studied area and the direct use of them in the other conditions is not recommended.

  12. Modeling the energy content of combustible ship-scrapping waste at Alang-Sosiya, India, using multiple regression analysis.

    PubMed

    Reddy, M Srinivasa; Basha, Shaik; Joshi, H V; Sravan Kumar, V G; Jha, B; Ghosh, P K

    2005-01-01

    Alang-Sosiya is the largest ship-scrapping yard in the world, established in 1982. Every year an average of 171 ships having a mean weight of 2.10 x 10(6)(+/-7.82 x 10(5)) of light dead weight tonnage (LDT) being scrapped. Apart from scrapped metals, this yard generates a massive amount of combustible solid waste in the form of waste wood, plastic, insulation material, paper, glass wool, thermocol pieces (polyurethane foam material), sponge, oiled rope, cotton waste, rubber, etc. In this study multiple regression analysis was used to develop predictive models for energy content of combustible ship-scrapping solid wastes. The scope of work comprised qualitative and quantitative estimation of solid waste samples and performing a sequential selection procedure for isolating variables. Three regression models were developed to correlate the energy content (net calorific values (LHV)) with variables derived from material composition, proximate and ultimate analyses. The performance of these models for this particular waste complies well with the equations developed by other researchers (Dulong, Steuer, Scheurer-Kestner and Bento's) for estimating energy content of municipal solid waste.

  13. Multiple regression analysis of factors influencing dominant hand grip strength in an adult Malaysian population.

    PubMed

    Hossain, M G; Zyroul, R; Pereira, B P; Kamarul, T

    2012-01-01

    Grip strength is an important measure used to monitor the progression of a condition, and to evaluate outcomes of treatment. We assessed how various physical and social factors predict normal grip strength in an adult Malaysian population of mixed Asian ethnicity (254 men, 246 women). Grip strength was recorded using the Jamar dynamometer. The mean grip strength for the dominant hand was 29.8 kg for men and 17.6 kg for women. Multiple regression analysis demonstrated that the dominant hand grip strength was positively associated with height and body mass index, and negatively associated with age for both sexes. Dominant hand grip strength was related to work status for men (p < 0.05) but not for women. However, there was no difference in grip strength among ethnic groups.

  14. Ridge: a computer program for calculating ridge regression estimates

    Treesearch

    Donald E. Hilt; Donald W. Seegrist

    1977-01-01

    Least-squares coefficients for multiple-regression models may be unstable when the independent variables are highly correlated. Ridge regression is a biased estimation procedure that produces stable estimates of the coefficients. Ridge regression is discussed, and a computer program for calculating the ridge coefficients is presented.

  15. Sequential Processing and the Matching-Stimulus Interval Effect in ERP Components: An Exploration of the Mechanism Using Multiple Regression

    PubMed Central

    Steiner, Genevieve Z.; Barry, Robert J.; Gonsalvez, Craig J.

    2016-01-01

    In oddball tasks, increasing the time between stimuli within a particular condition (target-to-target interval, TTI; nontarget-to-nontarget interval, NNI) systematically enhances N1, P2, and P300 event-related potential (ERP) component amplitudes. This study examined the mechanism underpinning these effects in ERP components recorded from 28 adults who completed a conventional three-tone oddball task. Bivariate correlations, partial correlations and multiple regression explored component changes due to preceding ERP component amplitudes and intervals found within the stimulus series, rather than constraining the task with experimentally constructed intervals, which has been adequately explored in prior studies. Multiple regression showed that for targets, N1 and TTI predicted N2, TTI predicted P3a and P3b, and Processing Negativity (PN), P3b, and TTI predicted reaction time. For rare nontargets, P1 predicted N1, NNI predicted N2, and N1 predicted Slow Wave (SW). Findings show that the mechanism is operating on separate stages of stimulus-processing, suggestive of either increased activation within a number of stimulus-specific pathways, or very long component generator recovery cycles. These results demonstrate the extent to which matching-stimulus intervals influence ERP component amplitudes and behavior in a three-tone oddball task, and should be taken into account when designing similar studies. PMID:27445774

  16. Sequential Processing and the Matching-Stimulus Interval Effect in ERP Components: An Exploration of the Mechanism Using Multiple Regression.

    PubMed

    Steiner, Genevieve Z; Barry, Robert J; Gonsalvez, Craig J

    2016-01-01

    In oddball tasks, increasing the time between stimuli within a particular condition (target-to-target interval, TTI; nontarget-to-nontarget interval, NNI) systematically enhances N1, P2, and P300 event-related potential (ERP) component amplitudes. This study examined the mechanism underpinning these effects in ERP components recorded from 28 adults who completed a conventional three-tone oddball task. Bivariate correlations, partial correlations and multiple regression explored component changes due to preceding ERP component amplitudes and intervals found within the stimulus series, rather than constraining the task with experimentally constructed intervals, which has been adequately explored in prior studies. Multiple regression showed that for targets, N1 and TTI predicted N2, TTI predicted P3a and P3b, and Processing Negativity (PN), P3b, and TTI predicted reaction time. For rare nontargets, P1 predicted N1, NNI predicted N2, and N1 predicted Slow Wave (SW). Findings show that the mechanism is operating on separate stages of stimulus-processing, suggestive of either increased activation within a number of stimulus-specific pathways, or very long component generator recovery cycles. These results demonstrate the extent to which matching-stimulus intervals influence ERP component amplitudes and behavior in a three-tone oddball task, and should be taken into account when designing similar studies.

  17. Estimating Optimal Transformations for Multiple Regression and Correlation.

    DTIC Science & Technology

    1982-07-01

    algorithm; we minimize (2.4) e2 (,,, ...,) = E[e(Y) - 1I (X 2 j=l j 2holding EO =1, E6 = E0, =.-. =Ecp = 0, through a series of single function minimizations...X, x = INU = lIVe . Then (5.16) THEOREM. If 6*, p* is an optimal transformation for regression, then = ue*o Conversely, if e satisfies Xe = U6, Nll1...Stanford University, Tech. Report ORIONOO6. Gasser, T. and Rosenblatt, M. (eds.) (1979). Smoothing Techniques for Curve Estimation, in Lecture Notes in

  18. Empirical predictive models of daily relativistic electron flux at geostationary orbit: Multiple regression analysis

    DOE PAGES

    Simms, Laura E.; Engebretson, Mark J.; Pilipenko, Viacheslav; ...

    2016-04-07

    The daily maximum relativistic electron flux at geostationary orbit can be predicted well with a set of daily averaged predictor variables including previous day's flux, seed electron flux, solar wind velocity and number density, AE index, IMF Bz, Dst, and ULF and VLF wave power. As predictor variables are intercorrelated, we used multiple regression analyses to determine which are the most predictive of flux when other variables are controlled. Empirical models produced from regressions of flux on measured predictors from 1 day previous were reasonably effective at predicting novel observations. Adding previous flux to the parameter set improves the predictionmore » of the peak of the increases but delays its anticipation of an event. Previous day's solar wind number density and velocity, AE index, and ULF wave activity are the most significant explanatory variables; however, the AE index, measuring substorm processes, shows a negative correlation with flux when other parameters are controlled. This may be due to the triggering of electromagnetic ion cyclotron waves by substorms that cause electron precipitation. VLF waves show lower, but significant, influence. The combined effect of ULF and VLF waves shows a synergistic interaction, where each increases the influence of the other on flux enhancement. Correlations between observations and predictions for this 1 day lag model ranged from 0.71 to 0.89 (average: 0.78). Furthermore, a path analysis of correlations between predictors suggests that solar wind and IMF parameters affect flux through intermediate processes such as ring current ( Dst), AE, and wave activity.« less

  19. Empirical predictive models of daily relativistic electron flux at geostationary orbit: Multiple regression analysis

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Simms, Laura E.; Engebretson, Mark J.; Pilipenko, Viacheslav

    The daily maximum relativistic electron flux at geostationary orbit can be predicted well with a set of daily averaged predictor variables including previous day's flux, seed electron flux, solar wind velocity and number density, AE index, IMF Bz, Dst, and ULF and VLF wave power. As predictor variables are intercorrelated, we used multiple regression analyses to determine which are the most predictive of flux when other variables are controlled. Empirical models produced from regressions of flux on measured predictors from 1 day previous were reasonably effective at predicting novel observations. Adding previous flux to the parameter set improves the predictionmore » of the peak of the increases but delays its anticipation of an event. Previous day's solar wind number density and velocity, AE index, and ULF wave activity are the most significant explanatory variables; however, the AE index, measuring substorm processes, shows a negative correlation with flux when other parameters are controlled. This may be due to the triggering of electromagnetic ion cyclotron waves by substorms that cause electron precipitation. VLF waves show lower, but significant, influence. The combined effect of ULF and VLF waves shows a synergistic interaction, where each increases the influence of the other on flux enhancement. Correlations between observations and predictions for this 1 day lag model ranged from 0.71 to 0.89 (average: 0.78). Furthermore, a path analysis of correlations between predictors suggests that solar wind and IMF parameters affect flux through intermediate processes such as ring current ( Dst), AE, and wave activity.« less

  20. Partial Least Squares Regression Can Aid in Detecting Differential Abundance of Multiple Features in Sets of Metagenomic Samples

    PubMed Central

    Libiger, Ondrej; Schork, Nicholas J.

    2015-01-01

    It is now feasible to examine the composition and diversity of microbial communities (i.e., “microbiomes”) that populate different human organs and orifices using DNA sequencing and related technologies. To explore the potential links between changes in microbial communities and various diseases in the human body, it is essential to test associations involving different species within and across microbiomes, environmental settings and disease states. Although a number of statistical techniques exist for carrying out relevant analyses, it is unclear which of these techniques exhibit the greatest statistical power to detect associations given the complexity of most microbiome datasets. We compared the statistical power of principal component regression, partial least squares regression, regularized regression, distance-based regression, Hill's diversity measures, and a modified test implemented in the popular and widely used microbiome analysis methodology “Metastats” across a wide range of simulated scenarios involving changes in feature abundance between two sets of metagenomic samples. For this purpose, simulation studies were used to change the abundance of microbial species in a real dataset from a published study examining human hands. Each technique was applied to the same data, and its ability to detect the simulated change in abundance was assessed. We hypothesized that a small subset of methods would outperform the rest in terms of the statistical power. Indeed, we found that the Metastats technique modified to accommodate multivariate analysis and partial least squares regression yielded high power under the models and data sets we studied. The statistical power of diversity measure-based tests, distance-based regression and regularized regression was significantly lower. Our results provide insight into powerful analysis strategies that utilize information on species counts from large microbiome data sets exhibiting skewed frequency distributions

  1. A Modified Double Multiple Nonlinear Regression Constitutive Equation for Modeling and Prediction of High Temperature Flow Behavior of BFe10-1-2 Alloy

    NASA Astrophysics Data System (ADS)

    Cai, Jun; Wang, Kuaishe; Shi, Jiamin; Wang, Wen; Liu, Yingying

    2018-01-01

    Constitutive analysis for hot working of BFe10-1-2 alloy was carried out by using experimental stress-strain data from isothermal hot compression tests, in a wide range of temperature of 1,023 1,273 K, and strain rate range of 0.001 10 s-1. A constitutive equation based on modified double multiple nonlinear regression was proposed considering the independent effects of strain, strain rate, temperature and their interrelation. The predicted flow stress data calculated from the developed equation was compared with the experimental data. Correlation coefficient (R), average absolute relative error (AARE) and relative errors were introduced to verify the validity of the developed constitutive equation. Subsequently, a comparative study was made on the capability of strain-compensated Arrhenius-type constitutive model. The results showed that the developed constitutive equation based on modified double multiple nonlinear regression could predict flow stress of BFe10-1-2 alloy with good correlation and generalization.

  2. Evidencing the association between swimming capacities and performance indicators in water polo: a multiple regression study.

    PubMed

    Kontic, Dean; Zenic, Natasa; Uljevic, Ognjen; Sekulic, Damir; Lesnik, Blaz

    2017-06-01

    Swimming capacities are hypothesized to be important determinants of water polo performance but there is an evident lack of studies examining different swimming capacities in relation to specific offensive and defensive performance variables in this sport. The aim of this study was to determine the relationship between five swimming capacities and six performance determinants in water polo. The sample comprised 79 high-level youth water polo players (all males, 17-18 years of age). The variables included six performance-related variables (agility in offence and defense, efficacy in offence and defense, polyvalence in offence and defense), and five swimming-capacity tests (water polo sprint test [15 m], swimming sprint test [25 m], short-distance [100 m], aerobic endurance [400 m] and an anaerobic lactate endurance test [4× 50 m]). First, multiple regressions were calculated for one-half of the sample of subjects which were then validated with the remaining half of the sample. The 25-m swim was not included in the regression analyses due to the multicollinearity with other predictors. The originally calculated regression models were validated for defensive agility (R=0.67 and R=0.55 for the original regression calculation and validation subsample, respectively) offensive agility (R=0.59 and R=0.61), and offensive efficacy (R=0.64 and R=0.58). Anaerobic lactate endurance is a significant predictor of offensive and defensive agility, while 15 m sprint significantly contributes to offensive efficacy. Swimming capacities are not found to be related to the polyvalence of the players. The most superior offensive performance can be expected from those players with a high level of anaerobic lactate endurance and advanced sprinting capacity, while anaerobic lactate endurance is recognized as most important quality in defensive duties. Future studies should observe players' polyvalence in relation to (theoretical) knowledge of technical and tactical tasks. Results reinforce

  3. Genetic Programming Transforms in Linear Regression Situations

    NASA Astrophysics Data System (ADS)

    Castillo, Flor; Kordon, Arthur; Villa, Carlos

    The chapter summarizes the use of Genetic Programming (GP) inMultiple Linear Regression (MLR) to address multicollinearity and Lack of Fit (LOF). The basis of the proposed method is applying appropriate input transforms (model respecification) that deal with these issues while preserving the information content of the original variables. The transforms are selected from symbolic regression models with optimal trade-off between accuracy of prediction and expressional complexity, generated by multiobjective Pareto-front GP. The chapter includes a comparative study of the GP-generated transforms with Ridge Regression, a variant of ordinary Multiple Linear Regression, which has been a useful and commonly employed approach for reducing multicollinearity. The advantages of GP-generated model respecification are clearly defined and demonstrated. Some recommendations for transforms selection are given as well. The application benefits of the proposed approach are illustrated with a real industrial application in one of the broadest empirical modeling areas in manufacturing - robust inferential sensors. The chapter contributes to increasing the awareness of the potential of GP in statistical model building by MLR.

  4. Survival Data and Regression Models

    NASA Astrophysics Data System (ADS)

    Grégoire, G.

    2014-12-01

    We start this chapter by introducing some basic elements for the analysis of censored survival data. Then we focus on right censored data and develop two types of regression models. The first one concerns the so-called accelerated failure time models (AFT), which are parametric models where a function of a parameter depends linearly on the covariables. The second one is a semiparametric model, where the covariables enter in a multiplicative form in the expression of the hazard rate function. The main statistical tool for analysing these regression models is the maximum likelihood methodology and, in spite we recall some essential results about the ML theory, we refer to the chapter "Logistic Regression" for a more detailed presentation.

  5. Towards molecular design using 2D-molecular contour maps obtained from PLS regression coefficients

    NASA Astrophysics Data System (ADS)

    Borges, Cleber N.; Barigye, Stephen J.; Freitas, Matheus P.

    2017-12-01

    The multivariate image analysis descriptors used in quantitative structure-activity relationships are direct representations of chemical structures as they are simply numerical decodifications of pixels forming the 2D chemical images. These MDs have found great utility in the modeling of diverse properties of organic molecules. Given the multicollinearity and high dimensionality of the data matrices generated with the MIA-QSAR approach, modeling techniques that involve the projection of the data space onto orthogonal components e.g. Partial Least Squares (PLS) have been generally used. However, the chemical interpretation of the PLS-based MIA-QSAR models, in terms of the structural moieties affecting the modeled bioactivity has not been straightforward. This work describes the 2D-contour maps based on the PLS regression coefficients, as a means of assessing the relevance of single MIA predictors to the response variable, and thus allowing for the structural, electronic and physicochemical interpretation of the MIA-QSAR models. A sample study to demonstrate the utility of the 2D-contour maps to design novel drug-like molecules is performed using a dataset of some anti-HIV-1 2-amino-6-arylsulfonylbenzonitriles and derivatives, and the inferences obtained are consistent with other reports in the literature. In addition, the different schemes for encoding atomic properties in molecules are discussed and evaluated.

  6. Panel regressions to estimate low-flow response to rainfall variability in ungaged basins

    NASA Astrophysics Data System (ADS)

    Bassiouni, Maoya; Vogel, Richard M.; Archfield, Stacey A.

    2016-12-01

    Multicollinearity and omitted-variable bias are major limitations to developing multiple linear regression models to estimate streamflow characteristics in ungaged areas and varying rainfall conditions. Panel regression is used to overcome limitations of traditional regression methods, and obtain reliable model coefficients, in particular to understand the elasticity of streamflow to rainfall. Using annual rainfall and selected basin characteristics at 86 gaged streams in the Hawaiian Islands, regional regression models for three stream classes were developed to estimate the annual low-flow duration discharges. Three panel-regression structures (random effects, fixed effects, and pooled) were compared to traditional regression methods, in which space is substituted for time. Results indicated that panel regression generally was able to reproduce the temporal behavior of streamflow and reduce the standard errors of model coefficients compared to traditional regression, even for models in which the unobserved heterogeneity between streams is significant and the variance inflation factor for rainfall is much greater than 10. This is because both spatial and temporal variability were better characterized in panel regression. In a case study, regional rainfall elasticities estimated from panel regressions were applied to ungaged basins on Maui, using available rainfall projections to estimate plausible changes in surface-water availability and usable stream habitat for native species. The presented panel-regression framework is shown to offer benefits over existing traditional hydrologic regression methods for developing robust regional relations to investigate streamflow response in a changing climate.

  7. Panel regressions to estimate low-flow response to rainfall variability in ungaged basins

    USGS Publications Warehouse

    Bassiouni, Maoya; Vogel, Richard M.; Archfield, Stacey A.

    2016-01-01

    Multicollinearity and omitted-variable bias are major limitations to developing multiple linear regression models to estimate streamflow characteristics in ungaged areas and varying rainfall conditions. Panel regression is used to overcome limitations of traditional regression methods, and obtain reliable model coefficients, in particular to understand the elasticity of streamflow to rainfall. Using annual rainfall and selected basin characteristics at 86 gaged streams in the Hawaiian Islands, regional regression models for three stream classes were developed to estimate the annual low-flow duration discharges. Three panel-regression structures (random effects, fixed effects, and pooled) were compared to traditional regression methods, in which space is substituted for time. Results indicated that panel regression generally was able to reproduce the temporal behavior of streamflow and reduce the standard errors of model coefficients compared to traditional regression, even for models in which the unobserved heterogeneity between streams is significant and the variance inflation factor for rainfall is much greater than 10. This is because both spatial and temporal variability were better characterized in panel regression. In a case study, regional rainfall elasticities estimated from panel regressions were applied to ungaged basins on Maui, using available rainfall projections to estimate plausible changes in surface-water availability and usable stream habitat for native species. The presented panel-regression framework is shown to offer benefits over existing traditional hydrologic regression methods for developing robust regional relations to investigate streamflow response in a changing climate.

  8. Multiple Linear Regression Analysis of Factors Affecting Real Property Price Index From Case Study Research In Istanbul/Turkey

    NASA Astrophysics Data System (ADS)

    Denli, H. H.; Koc, Z.

    2015-12-01

    Estimation of real properties depending on standards is difficult to apply in time and location. Regression analysis construct mathematical models which describe or explain relationships that may exist between variables. The problem of identifying price differences of properties to obtain a price index can be converted into a regression problem, and standard techniques of regression analysis can be used to estimate the index. Considering regression analysis for real estate valuation, which are presented in real marketing process with its current characteristics and quantifiers, the method will help us to find the effective factors or variables in the formation of the value. In this study, prices of housing for sale in Zeytinburnu, a district in Istanbul, are associated with its characteristics to find a price index, based on information received from a real estate web page. The associated variables used for the analysis are age, size in m2, number of floors having the house, floor number of the estate and number of rooms. The price of the estate represents the dependent variable, whereas the rest are independent variables. Prices from 60 real estates have been used for the analysis. Same price valued locations have been found and plotted on the map and equivalence curves have been drawn identifying the same valued zones as lines.

  9. Bayesian function-on-function regression for multilevel functional data.

    PubMed

    Meyer, Mark J; Coull, Brent A; Versace, Francesco; Cinciripini, Paul; Morris, Jeffrey S

    2015-09-01

    Medical and public health research increasingly involves the collection of complex and high dimensional data. In particular, functional data-where the unit of observation is a curve or set of curves that are finely sampled over a grid-is frequently obtained. Moreover, researchers often sample multiple curves per person resulting in repeated functional measures. A common question is how to analyze the relationship between two functional variables. We propose a general function-on-function regression model for repeatedly sampled functional data on a fine grid, presenting a simple model as well as a more extensive mixed model framework, and introducing various functional Bayesian inferential procedures that account for multiple testing. We examine these models via simulation and a data analysis with data from a study that used event-related potentials to examine how the brain processes various types of images. © 2015, The International Biometric Society.

  10. Association between response rates and survival outcomes in patients with newly diagnosed multiple myeloma. A systematic review and meta-regression analysis.

    PubMed

    Mainou, Maria; Madenidou, Anastasia-Vasiliki; Liakos, Aris; Paschos, Paschalis; Karagiannis, Thomas; Bekiari, Eleni; Vlachaki, Efthymia; Wang, Zhen; Murad, Mohammad Hassan; Kumar, Shaji; Tsapas, Apostolos

    2017-06-01

    We performed a systematic review and meta-regression analysis of randomized control trials to investigate the association between response to initial treatment and survival outcomes in patients with newly diagnosed multiple myeloma (MM). Response outcomes included complete response (CR) and the combined outcome of CR or very good partial response (VGPR), while survival outcomes were overall survival (OS) and progression-free survival (PFS). We used random-effect meta-regression models and conducted sensitivity analyses based on definition of CR and study quality. Seventy-two trials were included in the systematic review, 63 of which contributed data in meta-regression analyses. There was no association between OS and CR in patients without autologous stem cell transplant (ASCT) (regression coefficient: .02, 95% confidence interval [CI] -0.06, 0.10), in patients undergoing ASCT (-.11, 95% CI -0.44, 0.22) and in trials comparing ASCT with non-ASCT patients (.04, 95% CI -0.29, 0.38). Similarly, OS did not correlate with the combined metric of CR or VGPR, and no association was evident between response outcomes and PFS. Sensitivity analyses yielded similar results. This meta-regression analysis suggests that there is no association between conventional response outcomes and survival in patients with newly diagnosed MM. © 2017 John Wiley & Sons A/S. Published by John Wiley & Sons Ltd.

  11. Multicollinearity is a red herring in the search for moderator variables: A guide to interpreting moderated multiple regression models and a critique of Iacobucci, Schneider, Popovich, and Bakamitsos (2016).

    PubMed

    McClelland, Gary H; Irwin, Julie R; Disatnik, David; Sivan, Liron

    2017-02-01

    Multicollinearity is irrelevant to the search for moderator variables, contrary to the implications of Iacobucci, Schneider, Popovich, and Bakamitsos (Behavior Research Methods, 2016, this issue). Multicollinearity is like the red herring in a mystery novel that distracts the statistical detective from the pursuit of a true moderator relationship. We show multicollinearity is completely irrelevant for tests of moderator variables. Furthermore, readers of Iacobucci et al. might be confused by a number of their errors. We note those errors, but more positively, we describe a variety of methods researchers might use to test and interpret their moderated multiple regression models, including two-stage testing, mean-centering, spotlighting, orthogonalizing, and floodlighting without regard to putative issues of multicollinearity. We cite a number of recent studies in the psychological literature in which the researchers used these methods appropriately to test, to interpret, and to report their moderated multiple regression models. We conclude with a set of recommendations for the analysis and reporting of moderated multiple regression that should help researchers better understand their models and facilitate generalizations across studies.

  12. Asymptotics of nonparametric L-1 regression models with dependent data

    PubMed Central

    ZHAO, ZHIBIAO; WEI, YING; LIN, DENNIS K.J.

    2013-01-01

    We investigate asymptotic properties of least-absolute-deviation or median quantile estimates of the location and scale functions in nonparametric regression models with dependent data from multiple subjects. Under a general dependence structure that allows for longitudinal data and some spatially correlated data, we establish uniform Bahadur representations for the proposed median quantile estimates. The obtained Bahadur representations provide deep insights into the asymptotic behavior of the estimates. Our main theoretical development is based on studying the modulus of continuity of kernel weighted empirical process through a coupling argument. Progesterone data is used for an illustration. PMID:24955016

  13. Development of a predictive model for lead, cadmium and fluorine soil-water partition coefficients using sparse multiple linear regression analysis.

    PubMed

    Nakamura, Kengo; Yasutaka, Tetsuo; Kuwatani, Tatsu; Komai, Takeshi

    2017-11-01

    In this study, we applied sparse multiple linear regression (SMLR) analysis to clarify the relationships between soil properties and adsorption characteristics for a range of soils across Japan and identify easily-obtained physical and chemical soil properties that could be used to predict K and n values of cadmium, lead and fluorine. A model was first constructed that can easily predict the K and n values from nine soil parameters (pH, cation exchange capacity, specific surface area, total carbon, soil organic matter from loss on ignition and water holding capacity, the ratio of sand, silt and clay). The K and n values of cadmium, lead and fluorine of 17 soil samples were used to verify the SMLR models by the root mean square error values obtained from 512 combinations of soil parameters. The SMLR analysis indicated that fluorine adsorption to soil may be associated with organic matter, whereas cadmium or lead adsorption to soil is more likely to be influenced by soil pH, IL. We found that an accurate K value can be predicted from more than three soil parameters for most soils. Approximately 65% of the predicted values were between 33 and 300% of their measured values for the K value; 76% of the predicted values were within ±30% of their measured values for the n value. Our findings suggest that adsorption properties of lead, cadmium and fluorine to soil can be predicted from the soil physical and chemical properties using the presented models. Copyright © 2017 Elsevier Ltd. All rights reserved.

  14. Weighted regression analysis and interval estimators

    Treesearch

    Donald W. Seegrist

    1974-01-01

    A method for deriving the weighted least squares estimators for the parameters of a multiple regression model. Confidence intervals for expected values, and prediction intervals for the means of future samples are given.

  15. Regression modeling of ground-water flow

    USGS Publications Warehouse

    Cooley, R.L.; Naff, R.L.

    1985-01-01

    Nonlinear multiple regression methods are developed to model and analyze groundwater flow systems. Complete descriptions of regression methodology as applied to groundwater flow models allow scientists and engineers engaged in flow modeling to apply the methods to a wide range of problems. Organization of the text proceeds from an introduction that discusses the general topic of groundwater flow modeling, to a review of basic statistics necessary to properly apply regression techniques, and then to the main topic: exposition and use of linear and nonlinear regression to model groundwater flow. Statistical procedures are given to analyze and use the regression models. A number of exercises and answers are included to exercise the student on nearly all the methods that are presented for modeling and statistical analysis. Three computer programs implement the more complex methods. These three are a general two-dimensional, steady-state regression model for flow in an anisotropic, heterogeneous porous medium, a program to calculate a measure of model nonlinearity with respect to the regression parameters, and a program to analyze model errors in computed dependent variables such as hydraulic head. (USGS)

  16. Computing mammographic density from a multiple regression model constructed with image-acquisition parameters from a full-field digital mammographic unit

    PubMed Central

    Lu, Lee-Jane W.; Nishino, Thomas K.; Khamapirad, Tuenchit; Grady, James J; Leonard, Morton H.; Brunder, Donald G.

    2009-01-01

    Breast density (the percentage of fibroglandular tissue in the breast) has been suggested to be a useful surrogate marker for breast cancer risk. It is conventionally measured using screen-film mammographic images by a labor intensive histogram segmentation method (HSM). We have adapted and modified the HSM for measuring breast density from raw digital mammograms acquired by full-field digital mammography. Multiple regression model analyses showed that many of the instrument parameters for acquiring the screening mammograms (e.g. breast compression thickness, radiological thickness, radiation dose, compression force, etc) and image pixel intensity statistics of the imaged breasts were strong predictors of the observed threshold values (model R2=0.93) and %density (R2=0.84). The intra-class correlation coefficient of the %-density for duplicate images was estimated to be 0.80, using the regression model-derived threshold values, and 0.94 if estimated directly from the parameter estimates of the %-density prediction regression model. Therefore, with additional research, these mathematical models could be used to compute breast density objectively, automatically bypassing the HSM step, and could greatly facilitate breast cancer research studies. PMID:17671343

  17. Quantile Regression in the Study of Developmental Sciences

    ERIC Educational Resources Information Center

    Petscher, Yaacov; Logan, Jessica A. R.

    2014-01-01

    Linear regression analysis is one of the most common techniques applied in developmental research, but only allows for an estimate of the average relations between the predictor(s) and the outcome. This study describes quantile regression, which provides estimates of the relations between the predictor(s) and outcome, but across multiple points of…

  18. Wavelength dependence of aerosol backscatter coefficients obtained by multiple wavelength Lidar measurements

    NASA Technical Reports Server (NTRS)

    Sasano, Y.; Browell, E. V.

    1986-01-01

    Aerosols are often classified into several general types according to their origins and composition, such as maritime, continental, and stratospheric aerosols, and these aerosol types generally have different characteristics in chemical and physical properties. The present study aims at demonstrating the potential for distinguishing these aerosol types by the wavelength dependence of their backscatter coefficients obtained from quantitative analyses of multiple wavelength lidar signals. Data from the NASA Airborne Differential Abosrption lidar (DIAL) S ystems, which can measure aerosol backscatter profiles at wavelenghts of 300, 600, and 1064 nm and ozone profiles of backscatter coefficients for these three wavelength were derived from the observations of aerosols of different types. Observations were performed over the Atlantic Ocean, the Southwestern United States, and French Guyana.

  19. QSRR modeling for the chromatographic retention behavior of some β-lactam antibiotics using forward and firefly variable selection algorithms coupled with multiple linear regression.

    PubMed

    Fouad, Marwa A; Tolba, Enas H; El-Shal, Manal A; El Kerdawy, Ahmed M

    2018-05-11

    The justified continuous emerging of new β-lactam antibiotics provokes the need for developing suitable analytical methods that accelerate and facilitate their analysis. A face central composite experimental design was adopted using different levels of phosphate buffer pH, acetonitrile percentage at zero time and after 15 min in a gradient program to obtain the optimum chromatographic conditions for the elution of 31 β-lactam antibiotics. Retention factors were used as the target property to build two QSRR models utilizing the conventional forward selection and the advanced nature-inspired firefly algorithm for descriptor selection, coupled with multiple linear regression. The obtained models showed high performance in both internal and external validation indicating their robustness and predictive ability. Williams-Hotelling test and student's t-test showed that there is no statistical significant difference between the models' results. Y-randomization validation showed that the obtained models are due to significant correlation between the selected molecular descriptors and the analytes' chromatographic retention. These results indicate that the generated FS-MLR and FFA-MLR models are showing comparable quality on both the training and validation levels. They also gave comparable information about the molecular features that influence the retention behavior of β-lactams under the current chromatographic conditions. We can conclude that in some cases simple conventional feature selection algorithm can be used to generate robust and predictive models comparable to that are generated using advanced ones. Copyright © 2018 Elsevier B.V. All rights reserved.

  20. Exact and Approximate Statistical Inference for Nonlinear Regression and the Estimating Equation Approach.

    PubMed

    Demidenko, Eugene

    2017-09-01

    The exact density distribution of the nonlinear least squares estimator in the one-parameter regression model is derived in closed form and expressed through the cumulative distribution function of the standard normal variable. Several proposals to generalize this result are discussed. The exact density is extended to the estimating equation (EE) approach and the nonlinear regression with an arbitrary number of linear parameters and one intrinsically nonlinear parameter. For a very special nonlinear regression model, the derived density coincides with the distribution of the ratio of two normally distributed random variables previously obtained by Fieller (1932), unlike other approximations previously suggested by other authors. Approximations to the density of the EE estimators are discussed in the multivariate case. Numerical complications associated with the nonlinear least squares are illustrated, such as nonexistence and/or multiple solutions, as major factors contributing to poor density approximation. The nonlinear Markov-Gauss theorem is formulated based on the near exact EE density approximation.

  1. Statistical analysis of water-quality data containing multiple detection limits: S-language software for regression on order statistics

    USGS Publications Warehouse

    Lee, L.; Helsel, D.

    2005-01-01

    Trace contaminants in water, including metals and organics, often are measured at sufficiently low concentrations to be reported only as values below the instrument detection limit. Interpretation of these "less thans" is complicated when multiple detection limits occur. Statistical methods for multiply censored, or multiple-detection limit, datasets have been developed for medical and industrial statistics, and can be employed to estimate summary statistics or model the distributions of trace-level environmental data. We describe S-language-based software tools that perform robust linear regression on order statistics (ROS). The ROS method has been evaluated as one of the most reliable procedures for developing summary statistics of multiply censored data. It is applicable to any dataset that has 0 to 80% of its values censored. These tools are a part of a software library, or add-on package, for the R environment for statistical computing. This library can be used to generate ROS models and associated summary statistics, plot modeled distributions, and predict exceedance probabilities of water-quality standards. ?? 2005 Elsevier Ltd. All rights reserved.

  2. A simple approach to power and sample size calculations in logistic regression and Cox regression models.

    PubMed

    Vaeth, Michael; Skovlund, Eva

    2004-06-15

    For a given regression problem it is possible to identify a suitably defined equivalent two-sample problem such that the power or sample size obtained for the two-sample problem also applies to the regression problem. For a standard linear regression model the equivalent two-sample problem is easily identified, but for generalized linear models and for Cox regression models the situation is more complicated. An approximately equivalent two-sample problem may, however, also be identified here. In particular, we show that for logistic regression and Cox regression models the equivalent two-sample problem is obtained by selecting two equally sized samples for which the parameters differ by a value equal to the slope times twice the standard deviation of the independent variable and further requiring that the overall expected number of events is unchanged. In a simulation study we examine the validity of this approach to power calculations in logistic regression and Cox regression models. Several different covariate distributions are considered for selected values of the overall response probability and a range of alternatives. For the Cox regression model we consider both constant and non-constant hazard rates. The results show that in general the approach is remarkably accurate even in relatively small samples. Some discrepancies are, however, found in small samples with few events and a highly skewed covariate distribution. Comparison with results based on alternative methods for logistic regression models with a single continuous covariate indicates that the proposed method is at least as good as its competitors. The method is easy to implement and therefore provides a simple way to extend the range of problems that can be covered by the usual formulas for power and sample size determination. Copyright 2004 John Wiley & Sons, Ltd.

  3. Regression Commonality Analysis: A Technique for Quantitative Theory Building

    ERIC Educational Resources Information Center

    Nimon, Kim; Reio, Thomas G., Jr.

    2011-01-01

    When it comes to multiple linear regression analysis (MLR), it is common for social and behavioral science researchers to rely predominately on beta weights when evaluating how predictors contribute to a regression model. Presenting an underutilized statistical technique, this article describes how organizational researchers can use commonality…

  4. TI-59 Programs for Multiple Regression.

    DTIC Science & Technology

    1980-05-01

    general linear hypothesis model of full rank [ Graybill , 19611 can be written as Y = x 8 + C , s-N(O,o 2I) nxl nxk kxl nxl where Y is the vector of n...a "reduced model " solution, and confidence intervals for linear functions of the coefficients can be obtained using (x’x) and a2, based on the t...O107)l UA.LLL. Library ModuIe NASTER -Puter 0NTINA Cards 1 PROGRAM DESCRIPTION (s s 2 ror the general linear hypothesis model Y - XO + C’ calculates

  5. Simultaneous determination of hydroquinone, resorcinol, phenol, m-cresol and p-cresol in untreated air samples using spectrofluorimetry and a custom multiple linear regression-successive projection algorithm.

    PubMed

    Pistonesi, Marcelo F; Di Nezio, María S; Centurión, María E; Lista, Adriana G; Fragoso, Wallace D; Pontes, Márcio J C; Araújo, Mário C U; Band, Beatriz S Fernández

    2010-12-15

    In this study, a novel, simple, and efficient spectrofluorimetric method to determine directly and simultaneously five phenolic compounds (hydroquinone, resorcinol, phenol, m-cresol and p-cresol) in air samples is presented. For this purpose, variable selection by the successive projections algorithm (SPA) is used in order to obtain simple multiple linear regression (MLR) models based on a small subset of wavelengths. For comparison, partial least square (PLS) regression is also employed in full-spectrum. The concentrations of the calibration matrix ranged from 0.02 to 0.2 mg L(-1) for hydroquinone, from 0.05 to 0.6 mg L(-1) for resorcinol, and from 0.05 to 0.4 mg L(-1) for phenol, m-cresol and p-cresol; incidentally, such ranges are in accordance with the Argentinean environmental legislation. To verify the accuracy of the proposed method a recovery study on real air samples of smoking environment was carried out with satisfactory results (94-104%). The advantage of the proposed method is that it requires only spectrofluorimetric measurements of samples and chemometric modeling for simultaneous determination of five phenols. With it, air is simply sampled and no pre-treatment sample is needed (i.e., separation steps and derivatization reagents are avoided) that means a great saving of time. Copyright © 2010 Elsevier B.V. All rights reserved.

  6. Modeling Polytomous Item Responses Using Simultaneously Estimated Multinomial Logistic Regression Models

    ERIC Educational Resources Information Center

    Anderson, Carolyn J.; Verkuilen, Jay; Peyton, Buddy L.

    2010-01-01

    Survey items with multiple response categories and multiple-choice test questions are ubiquitous in psychological and educational research. We illustrate the use of log-multiplicative association (LMA) models that are extensions of the well-known multinomial logistic regression model for multiple dependent outcome variables to reanalyze a set of…

  7. Ca analysis: An Excel based program for the analysis of intracellular calcium transients including multiple, simultaneous regression analysis☆

    PubMed Central

    Greensmith, David J.

    2014-01-01

    Here I present an Excel based program for the analysis of intracellular Ca transients recorded using fluorescent indicators. The program can perform all the necessary steps which convert recorded raw voltage changes into meaningful physiological information. The program performs two fundamental processes. (1) It can prepare the raw signal by several methods. (2) It can then be used to analyze the prepared data to provide information such as absolute intracellular Ca levels. Also, the rates of change of Ca can be measured using multiple, simultaneous regression analysis. I demonstrate that this program performs equally well as commercially available software, but has numerous advantages, namely creating a simplified, self-contained analysis workflow. PMID:24125908

  8. Classification and regression tree analysis vs. multivariable linear and logistic regression methods as statistical tools for studying haemophilia.

    PubMed

    Henrard, S; Speybroeck, N; Hermans, C

    2015-11-01

    Haemophilia is a rare genetic haemorrhagic disease characterized by partial or complete deficiency of coagulation factor VIII, for haemophilia A, or IX, for haemophilia B. As in any other medical research domain, the field of haemophilia research is increasingly concerned with finding factors associated with binary or continuous outcomes through multivariable models. Traditional models include multiple logistic regressions, for binary outcomes, and multiple linear regressions for continuous outcomes. Yet these regression models are at times difficult to implement, especially for non-statisticians, and can be difficult to interpret. The present paper sought to didactically explain how, why, and when to use classification and regression tree (CART) analysis for haemophilia research. The CART method is non-parametric and non-linear, based on the repeated partitioning of a sample into subgroups based on a certain criterion. Breiman developed this method in 1984. Classification trees (CTs) are used to analyse categorical outcomes and regression trees (RTs) to analyse continuous ones. The CART methodology has become increasingly popular in the medical field, yet only a few examples of studies using this methodology specifically in haemophilia have to date been published. Two examples using CART analysis and previously published in this field are didactically explained in details. There is increasing interest in using CART analysis in the health domain, primarily due to its ease of implementation, use, and interpretation, thus facilitating medical decision-making. This method should be promoted for analysing continuous or categorical outcomes in haemophilia, when applicable. © 2015 John Wiley & Sons Ltd.

  9. Investigating bias in squared regression structure coefficients

    PubMed Central

    Nimon, Kim F.; Zientek, Linda R.; Thompson, Bruce

    2015-01-01

    The importance of structure coefficients and analogs of regression weights for analysis within the general linear model (GLM) has been well-documented. The purpose of this study was to investigate bias in squared structure coefficients in the context of multiple regression and to determine if a formula that had been shown to correct for bias in squared Pearson correlation coefficients and coefficients of determination could be used to correct for bias in squared regression structure coefficients. Using data from a Monte Carlo simulation, this study found that squared regression structure coefficients corrected with Pratt's formula produced less biased estimates and might be more accurate and stable estimates of population squared regression structure coefficients than estimates with no such corrections. While our findings are in line with prior literature that identified multicollinearity as a predictor of bias in squared regression structure coefficients but not coefficients of determination, the findings from this study are unique in that the level of predictive power, number of predictors, and sample size were also observed to contribute bias in squared regression structure coefficients. PMID:26217273

  10. Quantile Regression in the Study of Developmental Sciences

    PubMed Central

    Petscher, Yaacov; Logan, Jessica A. R.

    2014-01-01

    Linear regression analysis is one of the most common techniques applied in developmental research, but only allows for an estimate of the average relations between the predictor(s) and the outcome. This study describes quantile regression, which provides estimates of the relations between the predictor(s) and outcome, but across multiple points of the outcome’s distribution. Using data from the High School and Beyond and U.S. Sustained Effects Study databases, quantile regression is demonstrated and contrasted with linear regression when considering models with: (a) one continuous predictor, (b) one dichotomous predictor, (c) a continuous and a dichotomous predictor, and (d) a longitudinal application. Results from each example exhibited the differential inferences which may be drawn using linear or quantile regression. PMID:24329596

  11. CAHOST: An Excel Workbook for Facilitating the Johnson-Neyman Technique for Two-Way Interactions in Multiple Regression.

    PubMed

    Carden, Stephen W; Holtzman, Nicholas S; Strube, Michael J

    2017-01-01

    When using multiple regression, researchers frequently wish to explore how the relationship between two variables is moderated by another variable; this is termed an interaction. Historically, two approaches have been used to probe interactions: the pick-a-point approach and the Johnson-Neyman (JN) technique. The pick-a-point approach has limitations that can be avoided using the JN technique. Currently, the software available for implementing the JN technique and creating corresponding figures lacks several desirable features-most notably, ease of use and figure quality. To fill this gap in the literature, we offer a free Microsoft Excel 2013 workbook, CAHOST (a concatenation of the first two letters of the authors' last names), that allows the user to seamlessly create publication-ready figures of the results of the JN technique.

  12. A multiple linear regression analysis of hot corrosion attack on a series of nickel base turbine alloys

    NASA Technical Reports Server (NTRS)

    Barrett, C. A.

    1985-01-01

    Multiple linear regression analysis was used to determine an equation for estimating hot corrosion attack for a series of Ni base cast turbine alloys. The U transform (i.e., 1/sin (% A/100) to the 1/2) was shown to give the best estimate of the dependent variable, y. A complete second degree equation is described for the centered" weight chemistries for the elements Cr, Al, Ti, Mo, W, Cb, Ta, and Co. In addition linear terms for the minor elements C, B, and Zr were added for a basic 47 term equation. The best reduced equation was determined by the stepwise selection method with essentially 13 terms. The Cr term was found to be the most important accounting for 60 percent of the explained variability hot corrosion attack.

  13. Linkage mapping of beta 2 EEG waves via non-parametric regression.

    PubMed

    Ghosh, Saurabh; Begleiter, Henri; Porjesz, Bernice; Chorlian, David B; Edenberg, Howard J; Foroud, Tatiana; Goate, Alison; Reich, Theodore

    2003-04-01

    Parametric linkage methods for analyzing quantitative trait loci are sensitive to violations in trait distributional assumptions. Non-parametric methods are relatively more robust. In this article, we modify the non-parametric regression procedure proposed by Ghosh and Majumder [2000: Am J Hum Genet 66:1046-1061] to map Beta 2 EEG waves using genome-wide data generated in the COGA project. Significant linkage findings are obtained on chromosomes 1, 4, 5, and 15 with findings at multiple regions on chromosomes 4 and 15. We analyze the data both with and without incorporating alcoholism as a covariate. We also test for epistatic interactions between regions of the genome exhibiting significant linkage with the EEG phenotypes and find evidence of epistatic interactions between a region each on chromosome 1 and chromosome 4 with one region on chromosome 15. While regressing out the effect of alcoholism does not affect the linkage findings, the epistatic interactions become statistically insignificant. Copyright 2003 Wiley-Liss, Inc.

  14. Standardized Regression Coefficients as Indices of Effect Sizes in Meta-Analysis

    ERIC Educational Resources Information Center

    Kim, Rae Seon

    2011-01-01

    When conducting a meta-analysis, it is common to find many collected studies that report regression analyses, because multiple regression analysis is widely used in many fields. Meta-analysis uses effect sizes drawn from individual studies as a means of synthesizing a collection of results. However, indices of effect size from regression analyses…

  15. Birthweight Related Factors in Northwestern Iran: Using Quantile Regression Method.

    PubMed

    Fallah, Ramazan; Kazemnejad, Anoshirvan; Zayeri, Farid; Shoghli, Alireza

    2015-11-18

    Birthweight is one of the most important predicting indicators of the health status in adulthood. Having a balanced birthweight is one of the priorities of the health system in most of the industrial and developed countries. This indicator is used to assess the growth and health status of the infants. The aim of this study was to assess the birthweight of the neonates by using quantile regression in Zanjan province. This analytical descriptive study was carried out using pre-registered (March 2010 - March 2012) data of neonates in urban/rural health centers of Zanjan province using multiple-stage cluster sampling. Data were analyzed using multiple linear regressions andquantile regression method and SAS 9.2 statistical software. From 8456 newborn baby, 4146 (49%) were female. The mean age of the mothers was 27.1±5.4 years. The mean birthweight of the neonates was 3104 ± 431 grams. Five hundred and seventy-three patients (6.8%) of the neonates were less than 2500 grams. In all quantiles, gestational age of neonates (p<0.05), weight and educational level of the mothers (p<0.05) showed a linear significant relationship with the i of the neonates. However, sex and birth rank of the neonates, mothers age, place of residence (urban/rural) and career were not significant in all quantiles (p>0.05). This study revealed the results of multiple linear regression and quantile regression were not identical. We strictly recommend the use of quantile regression when an asymmetric response variable or data with outliers is available.

  16. Birthweight Related Factors in Northwestern Iran: Using Quantile Regression Method

    PubMed Central

    Fallah, Ramazan; Kazemnejad, Anoshirvan; Zayeri, Farid; Shoghli, Alireza

    2016-01-01

    Introduction: Birthweight is one of the most important predicting indicators of the health status in adulthood. Having a balanced birthweight is one of the priorities of the health system in most of the industrial and developed countries. This indicator is used to assess the growth and health status of the infants. The aim of this study was to assess the birthweight of the neonates by using quantile regression in Zanjan province. Methods: This analytical descriptive study was carried out using pre-registered (March 2010 - March 2012) data of neonates in urban/rural health centers of Zanjan province using multiple-stage cluster sampling. Data were analyzed using multiple linear regressions andquantile regression method and SAS 9.2 statistical software. Results: From 8456 newborn baby, 4146 (49%) were female. The mean age of the mothers was 27.1±5.4 years. The mean birthweight of the neonates was 3104 ± 431 grams. Five hundred and seventy-three patients (6.8%) of the neonates were less than 2500 grams. In all quantiles, gestational age of neonates (p<0.05), weight and educational level of the mothers (p<0.05) showed a linear significant relationship with the i of the neonates. However, sex and birth rank of the neonates, mothers age, place of residence (urban/rural) and career were not significant in all quantiles (p>0.05). Conclusion: This study revealed the results of multiple linear regression and quantile regression were not identical. We strictly recommend the use of quantile regression when an asymmetric response variable or data with outliers is available. PMID:26925889

  17. Relationship between rice yield and climate variables in southwest Nigeria using multiple linear regression and support vector machine analysis

    NASA Astrophysics Data System (ADS)

    Oguntunde, Philip G.; Lischeid, Gunnar; Dietrich, Ottfried

    2018-03-01

    This study examines the variations of climate variables and rice yield and quantifies the relationships among them using multiple linear regression, principal component analysis, and support vector machine (SVM) analysis in southwest Nigeria. The climate and yield data used was for a period of 36 years between 1980 and 2015. Similar to the observed decrease ( P < 0.001) in rice yield, pan evaporation, solar radiation, and wind speed declined significantly. Eight principal components exhibited an eigenvalue > 1 and explained 83.1% of the total variance of predictor variables. The SVM regression function using the scores of the first principal component explained about 75% of the variance in rice yield data and linear regression about 64%. SVM regression between annual solar radiation values and yield explained 67% of the variance. Only the first component of the principal component analysis (PCA) exhibited a clear long-term trend and sometimes short-term variance similar to that of rice yield. Short-term fluctuations of the scores of the PC1 are closely coupled to those of rice yield during the 1986-1993 and the 2006-2013 periods thereby revealing the inter-annual sensitivity of rice production to climate variability. Solar radiation stands out as the climate variable of highest influence on rice yield, and the influence was especially strong during monsoon and post-monsoon periods, which correspond to the vegetative, booting, flowering, and grain filling stages in the study area. The outcome is expected to provide more in-depth regional-specific climate-rice linkage for screening of better cultivars that can positively respond to future climate fluctuations as well as providing information that may help optimized planting dates for improved radiation use efficiency in the study area.

  18. Single Image Super-Resolution Using Global Regression Based on Multiple Local Linear Mappings.

    PubMed

    Choi, Jae-Seok; Kim, Munchurl

    2017-03-01

    Super-resolution (SR) has become more vital, because of its capability to generate high-quality ultra-high definition (UHD) high-resolution (HR) images from low-resolution (LR) input images. Conventional SR methods entail high computational complexity, which makes them difficult to be implemented for up-scaling of full-high-definition input images into UHD-resolution images. Nevertheless, our previous super-interpolation (SI) method showed a good compromise between Peak-Signal-to-Noise Ratio (PSNR) performances and computational complexity. However, since SI only utilizes simple linear mappings, it may fail to precisely reconstruct HR patches with complex texture. In this paper, we present a novel SR method, which inherits the large-to-small patch conversion scheme from SI but uses global regression based on local linear mappings (GLM). Thus, our new SR method is called GLM-SI. In GLM-SI, each LR input patch is divided into 25 overlapped subpatches. Next, based on the local properties of these subpatches, 25 different local linear mappings are applied to the current LR input patch to generate 25 HR patch candidates, which are then regressed into one final HR patch using a global regressor. The local linear mappings are learned cluster-wise in our off-line training phase. The main contribution of this paper is as follows: Previously, linear-mapping-based conventional SR methods, including SI only used one simple yet coarse linear mapping to each patch to reconstruct its HR version. On the contrary, for each LR input patch, our GLM-SI is the first to apply a combination of multiple local linear mappings, where each local linear mapping is found according to local properties of the current LR patch. Therefore, it can better approximate nonlinear LR-to-HR mappings for HR patches with complex texture. Experiment results show that the proposed GLM-SI method outperforms most of the state-of-the-art methods, and shows comparable PSNR performance with much lower

  19. Influence diagnostics in meta-regression model.

    PubMed

    Shi, Lei; Zuo, ShanShan; Yu, Dalei; Zhou, Xiaohua

    2017-09-01

    This paper studies the influence diagnostics in meta-regression model including case deletion diagnostic and local influence analysis. We derive the subset deletion formulae for the estimation of regression coefficient and heterogeneity variance and obtain the corresponding influence measures. The DerSimonian and Laird estimation and maximum likelihood estimation methods in meta-regression are considered, respectively, to derive the results. Internal and external residual and leverage measure are defined. The local influence analysis based on case-weights perturbation scheme, responses perturbation scheme, covariate perturbation scheme, and within-variance perturbation scheme are explored. We introduce a method by simultaneous perturbing responses, covariate, and within-variance to obtain the local influence measure, which has an advantage of capable to compare the influence magnitude of influential studies from different perturbations. An example is used to illustrate the proposed methodology. Copyright © 2017 John Wiley & Sons, Ltd.

  20. Nonparametric rank regression for analyzing water quality concentration data with multiple detection limits.

    PubMed

    Fu, Liya; Wang, You-Gan

    2011-02-15

    Environmental data usually include measurements, such as water quality data, which fall below detection limits, because of limitations of the instruments or of certain analytical methods used. The fact that some responses are not detected needs to be properly taken into account in statistical analysis of such data. However, it is well-known that it is challenging to analyze a data set with detection limits, and we often have to rely on the traditional parametric methods or simple imputation methods. Distributional assumptions can lead to biased inference and justification of distributions is often not possible when the data are correlated and there is a large proportion of data below detection limits. The extent of bias is usually unknown. To draw valid conclusions and hence provide useful advice for environmental management authorities, it is essential to develop and apply an appropriate statistical methodology. This paper proposes rank-based procedures for analyzing non-normally distributed data collected at different sites over a period of time in the presence of multiple detection limits. To take account of temporal correlations within each site, we propose an optimal linear combination of estimating functions and apply the induced smoothing method to reduce the computational burden. Finally, we apply the proposed method to the water quality data collected at Susquehanna River Basin in United States of America, which clearly demonstrates the advantages of the rank regression models.

  1. The role of chemometrics in single and sequential extraction assays: a review. Part II. Cluster analysis, multiple linear regression, mixture resolution, experimental design and other techniques.

    PubMed

    Giacomino, Agnese; Abollino, Ornella; Malandrino, Mery; Mentasti, Edoardo

    2011-03-04

    Single and sequential extraction procedures are used for studying element mobility and availability in solid matrices, like soils, sediments, sludge, and airborne particulate matter. In the first part of this review we reported an overview on these procedures and described the applications of chemometric uni- and bivariate techniques and of multivariate pattern recognition techniques based on variable reduction to the experimental results obtained. The second part of the review deals with the use of chemometrics not only for the visualization and interpretation of data, but also for the investigation of the effects of experimental conditions on the response, the optimization of their values and the calculation of element fractionation. We will describe the principles of the multivariate chemometric techniques considered, the aims for which they were applied and the key findings obtained. The following topics will be critically addressed: pattern recognition by cluster analysis (CA), linear discriminant analysis (LDA) and other less common techniques; modelling by multiple linear regression (MLR); investigation of spatial distribution of variables by geostatistics; calculation of fractionation patterns by a mixture resolution method (Chemometric Identification of Substrates and Element Distributions, CISED); optimization and characterization of extraction procedures by experimental design; other multivariate techniques less commonly applied. Copyright © 2010 Elsevier B.V. All rights reserved.

  2. Cactus: An Introduction to Regression

    ERIC Educational Resources Information Center

    Hyde, Hartley

    2008-01-01

    When the author first used "VisiCalc," the author thought it a very useful tool when he had the formulas. But how could he design a spreadsheet if there was no known formula for the quantities he was trying to predict? A few months later, the author relates he learned to use multiple linear regression software and suddenly it all clicked into…

  3. Ca analysis: an Excel based program for the analysis of intracellular calcium transients including multiple, simultaneous regression analysis.

    PubMed

    Greensmith, David J

    2014-01-01

    Here I present an Excel based program for the analysis of intracellular Ca transients recorded using fluorescent indicators. The program can perform all the necessary steps which convert recorded raw voltage changes into meaningful physiological information. The program performs two fundamental processes. (1) It can prepare the raw signal by several methods. (2) It can then be used to analyze the prepared data to provide information such as absolute intracellular Ca levels. Also, the rates of change of Ca can be measured using multiple, simultaneous regression analysis. I demonstrate that this program performs equally well as commercially available software, but has numerous advantages, namely creating a simplified, self-contained analysis workflow. Copyright © 2013 The Author. Published by Elsevier Ireland Ltd.. All rights reserved.

  4. Interquantile Shrinkage in Regression Models

    PubMed Central

    Jiang, Liewen; Wang, Huixia Judy; Bondell, Howard D.

    2012-01-01

    Conventional analysis using quantile regression typically focuses on fitting the regression model at different quantiles separately. However, in situations where the quantile coefficients share some common feature, joint modeling of multiple quantiles to accommodate the commonality often leads to more efficient estimation. One example of common features is that a predictor may have a constant effect over one region of quantile levels but varying effects in other regions. To automatically perform estimation and detection of the interquantile commonality, we develop two penalization methods. When the quantile slope coefficients indeed do not change across quantile levels, the proposed methods will shrink the slopes towards constant and thus improve the estimation efficiency. We establish the oracle properties of the two proposed penalization methods. Through numerical investigations, we demonstrate that the proposed methods lead to estimations with competitive or higher efficiency than the standard quantile regression estimation in finite samples. Supplemental materials for the article are available online. PMID:24363546

  5. The comparison of robust partial least squares regression with robust principal component regression on a real

    NASA Astrophysics Data System (ADS)

    Polat, Esra; Gunay, Suleyman

    2013-10-01

    One of the problems encountered in Multiple Linear Regression (MLR) is multicollinearity, which causes the overestimation of the regression parameters and increase of the variance of these parameters. Hence, in case of multicollinearity presents, biased estimation procedures such as classical Principal Component Regression (CPCR) and Partial Least Squares Regression (PLSR) are then performed. SIMPLS algorithm is the leading PLSR algorithm because of its speed, efficiency and results are easier to interpret. However, both of the CPCR and SIMPLS yield very unreliable results when the data set contains outlying observations. Therefore, Hubert and Vanden Branden (2003) have been presented a robust PCR (RPCR) method and a robust PLSR (RPLSR) method called RSIMPLS. In RPCR, firstly, a robust Principal Component Analysis (PCA) method for high-dimensional data on the independent variables is applied, then, the dependent variables are regressed on the scores using a robust regression method. RSIMPLS has been constructed from a robust covariance matrix for high-dimensional data and robust linear regression. The purpose of this study is to show the usage of RPCR and RSIMPLS methods on an econometric data set, hence, making a comparison of two methods on an inflation model of Turkey. The considered methods have been compared in terms of predictive ability and goodness of fit by using a robust Root Mean Squared Error of Cross-validation (R-RMSECV), a robust R2 value and Robust Component Selection (RCS) statistic.

  6. Forecasting on the total volumes of Malaysia's imports and exports by multiple linear regression

    NASA Astrophysics Data System (ADS)

    Beh, W. L.; Yong, M. K. Au

    2017-04-01

    This study is to give an insight on the doubt of the important of macroeconomic variables that affecting the total volumes of Malaysia's imports and exports by using multiple linear regression (MLR) analysis. The time frame for this study will be determined by using quarterly data of the total volumes of Malaysia's imports and exports covering the period between 2000-2015. The macroeconomic variables will be limited to eleven variables which are the exchange rate of US Dollar with Malaysia Ringgit (USD-MYR), exchange rate of China Yuan with Malaysia Ringgit (RMB-MYR), exchange rate of European Euro with Malaysia Ringgit (EUR-MYR), exchange rate of Singapore Dollar with Malaysia Ringgit (SGD-MYR), crude oil prices, gold prices, producer price index (PPI), interest rate, consumer price index (CPI), industrial production index (IPI) and gross domestic product (GDP). This study has applied the Johansen Co-integration test to investigate the relationship among the total volumes to Malaysia's imports and exports. The result shows that crude oil prices, RMB-MYR, EUR-MYR and IPI play important roles in the total volumes of Malaysia's imports. Meanwhile crude oil price, USD-MYR and GDP play important roles in the total volumes of Malaysia's exports.

  7. Improved model of the retardance in citric acid coated ferrofluids using stepwise regression

    NASA Astrophysics Data System (ADS)

    Lin, J. F.; Qiu, X. R.

    2017-06-01

    Citric acid (CA) coated Fe3O4 ferrofluids (FFs) have been conducted for biomedical application. The magneto-optical retardance of CA coated FFs was measured by a Stokes polarimeter. Optimization and multiple regression of retardance in FFs were executed by Taguchi method and Microsoft Excel previously, and the F value of regression model was large enough. However, the model executed by Excel was not systematic. Instead we adopted the stepwise regression to model the retardance of CA coated FFs. From the results of stepwise regression by MATLAB, the developed model had highly predictable ability owing to F of 2.55897e+7 and correlation coefficient of one. The average absolute error of predicted retardances to measured retardances was just 0.0044%. Using the genetic algorithm (GA) in MATLAB, the optimized parametric combination was determined as [4.709 0.12 39.998 70.006] corresponding to the pH of suspension, molar ratio of CA to Fe3O4, CA volume, and coating temperature. The maximum retardance was found as 31.712°, close to that obtained by evolutionary solver in Excel and a relative error of -0.013%. Above all, the stepwise regression method was successfully used to model the retardance of CA coated FFs, and the maximum global retardance was determined by the use of GA.

  8. Automating approximate Bayesian computation by local linear regression.

    PubMed

    Thornton, Kevin R

    2009-07-07

    In several biological contexts, parameter inference often relies on computationally-intensive techniques. "Approximate Bayesian Computation", or ABC, methods based on summary statistics have become increasingly popular. A particular flavor of ABC based on using a linear regression to approximate the posterior distribution of the parameters, conditional on the summary statistics, is computationally appealing, yet no standalone tool exists to automate the procedure. Here, I describe a program to implement the method. The software package ABCreg implements the local linear-regression approach to ABC. The advantages are: 1. The code is standalone, and fully-documented. 2. The program will automatically process multiple data sets, and create unique output files for each (which may be processed immediately in R), facilitating the testing of inference procedures on simulated data, or the analysis of multiple data sets. 3. The program implements two different transformation methods for the regression step. 4. Analysis options are controlled on the command line by the user, and the program is designed to output warnings for cases where the regression fails. 5. The program does not depend on any particular simulation machinery (coalescent, forward-time, etc.), and therefore is a general tool for processing the results from any simulation. 6. The code is open-source, and modular.Examples of applying the software to empirical data from Drosophila melanogaster, and testing the procedure on simulated data, are shown. In practice, the ABCreg simplifies implementing ABC based on local-linear regression.

  9. Real estate value prediction using multivariate regression models

    NASA Astrophysics Data System (ADS)

    Manjula, R.; Jain, Shubham; Srivastava, Sharad; Rajiv Kher, Pranav

    2017-11-01

    The real estate market is one of the most competitive in terms of pricing and the same tends to vary significantly based on a lot of factors, hence it becomes one of the prime fields to apply the concepts of machine learning to optimize and predict the prices with high accuracy. Therefore in this paper, we present various important features to use while predicting housing prices with good accuracy. We have described regression models, using various features to have lower Residual Sum of Squares error. While using features in a regression model some feature engineering is required for better prediction. Often a set of features (multiple regressions) or polynomial regression (applying a various set of powers in the features) is used for making better model fit. For these models are expected to be susceptible towards over fitting ridge regression is used to reduce it. This paper thus directs to the best application of regression models in addition to other techniques to optimize the result.

  10. Multiple injections of electroporated autologous T cells expressing a chimeric antigen receptor mediate regression of human disseminated tumor.

    PubMed

    Zhao, Yangbing; Moon, Edmund; Carpenito, Carmine; Paulos, Chrystal M; Liu, Xiaojun; Brennan, Andrea L; Chew, Anne; Carroll, Richard G; Scholler, John; Levine, Bruce L; Albelda, Steven M; June, Carl H

    2010-11-15

    Redirecting T lymphocyte antigen specificity by gene transfer can provide large numbers of tumor-reactive T lymphocytes for adoptive immunotherapy. However, safety concerns associated with viral vector production have limited clinical application of T cells expressing chimeric antigen receptors (CAR). T lymphocytes can be gene modified by RNA electroporation without integration-associated safety concerns. To establish a safe platform for adoptive immunotherapy, we first optimized the vector backbone for RNA in vitro transcription to achieve high-level transgene expression. CAR expression and function of RNA-electroporated T cells could be detected up to a week after electroporation. Multiple injections of RNA CAR-electroporated T cells mediated regression of large vascularized flank mesothelioma tumors in NOD/scid/γc(-/-) mice. Dramatic tumor reduction also occurred when the preexisting intraperitoneal human-derived tumors, which had been growing in vivo for >50 days, were treated by multiple injections of autologous human T cells electroporated with anti-mesothelin CAR mRNA. This is the first report using matched patient tumor and lymphocytes showing that autologous T cells from cancer patients can be engineered to provide an effective therapy for a disseminated tumor in a robust preclinical model. Multiple injections of RNA-engineered T cells are a novel approach for adoptive cell transfer, providing flexible platform for the treatment of cancer that may complement the use of retroviral and lentiviral engineered T cells. This approach may increase the therapeutic index of T cells engineered to express powerful activation domains without the associated safety concerns of integrating viral vectors. Copyright © 2010 AACR.

  11. Multiple injections of electroporated autologous T cells expressing a chimeric antigen receptor mediate regression of human disseminated tumor

    PubMed Central

    Zhao, Yangbing; Moon, Edmund; Carpenito, Carmine; Paulos, Chrystal M.; Liu, Xiaojun; Brennan, Andrea L; Chew, Anne; Carroll, Richard G.; Scholler, John; Levine, Bruce L.; Albelda, Steven M.; June, Carl H.

    2010-01-01

    Redirecting T lymphocyte antigen specificity by gene transfer can provide large numbers of tumor reactive T lymphocytes for adoptive immunotherapy. However, safety concerns associated with viral vector production have limited clinical application of T cells expressing chimeric antigen receptors (CARs). T lymphocytes can be gene modified by RNA electroporation without integration-associated safety concerns. To establish a safe platform for adoptive immunotherapy, we first optimized the vector backbone for RNA in vitro transcription to achieve high level transgene expression. CAR expression and function of RNA-electroporated T cells could be detected up to a week post electroporation. Multiple injections of RNA CAR electroporated T cells mediated regression of large vascularized flank mesothelioma tumors in NOD/scid/γc(−/−) mice. Dramatic tumor reduction also occurred when the pre-existing intraperitoneal human-derived tumors, that had been growing in vivo for over 50 days, were treated by multiple injections of autologous human T cells electroporated with anti-mesothelin CAR mRNA. This is the first report using matched patient tumor and lymphocytes demonstrating that autologous T cells from cancer patients can be engineered to provide an effective therapy for a disseminated tumor in a robust preclinical model. Multiple injections of RNA engineered T cells are a novel approach for adoptive cell transfer, providing flexible platform for the treatment of cancer that may complement the use of retroviral and lentiviral engineered T cells. This approach may increase the therapeutic index of T cells engineered to express powerful activation domains without the associated safety concerns of integrating viral vectors. PMID:20926399

  12. Hierarchical Multiple Regression in Counseling Research: Common Problems and Possible Remedies.

    ERIC Educational Resources Information Center

    Petrocelli, John V.

    2003-01-01

    A brief content analysis was conducted on the use of hierarchical regression in counseling research published in the "Journal of Counseling Psychology" and the "Journal of Counseling & Development" during the years 1997-2001. Common problems are cited and possible remedies are described. (Contains 43 references and 3 tables.) (Author)

  13. Methods for identifying SNP interactions: a review on variations of Logic Regression, Random Forest and Bayesian logistic regression.

    PubMed

    Chen, Carla Chia-Ming; Schwender, Holger; Keith, Jonathan; Nunkesser, Robin; Mengersen, Kerrie; Macrossan, Paula

    2011-01-01

    Due to advancements in computational ability, enhanced technology and a reduction in the price of genotyping, more data are being generated for understanding genetic associations with diseases and disorders. However, with the availability of large data sets comes the inherent challenges of new methods of statistical analysis and modeling. Considering a complex phenotype may be the effect of a combination of multiple loci, various statistical methods have been developed for identifying genetic epistasis effects. Among these methods, logic regression (LR) is an intriguing approach incorporating tree-like structures. Various methods have built on the original LR to improve different aspects of the model. In this study, we review four variations of LR, namely Logic Feature Selection, Monte Carlo Logic Regression, Genetic Programming for Association Studies, and Modified Logic Regression-Gene Expression Programming, and investigate the performance of each method using simulated and real genotype data. We contrast these with another tree-like approach, namely Random Forests, and a Bayesian logistic regression with stochastic search variable selection.

  14. Multiple Logistic Regression Analysis of Cigarette Use among High School Students

    ERIC Educational Resources Information Center

    Adwere-Boamah, Joseph

    2011-01-01

    A binary logistic regression analysis was performed to predict high school students' cigarette smoking behavior from selected predictors from 2009 CDC Youth Risk Behavior Surveillance Survey. The specific target student behavior of interest was frequent cigarette use. Five predictor variables included in the model were: a) race, b) frequency of…

  15. Multiple regression and inverse moments improve the characterization of the spatial scaling behavior of daily streamflows in the Southeast United States

    USGS Publications Warehouse

    Farmer, William H.; Over, Thomas M.; Vogel, Richard M.

    2015-01-01

    Understanding the spatial structure of daily streamflow is essential for managing freshwater resources, especially in poorly-gaged regions. Spatial scaling assumptions are common in flood frequency prediction (e.g., index-flood method) and the prediction of continuous streamflow at ungaged sites (e.g. drainage-area ratio), with simple scaling by drainage area being the most common assumption. In this study, scaling analyses of daily streamflow from 173 streamgages in the southeastern US resulted in three important findings. First, the use of only positive integer moment orders, as has been done in most previous studies, captures only the probabilistic and spatial scaling behavior of flows above an exceedance probability near the median; negative moment orders (inverse moments) are needed for lower streamflows. Second, assessing scaling by using drainage area alone is shown to result in a high degree of omitted-variable bias, masking the true spatial scaling behavior. Multiple regression is shown to mitigate this bias, controlling for regional heterogeneity of basin attributes, especially those correlated with drainage area. Previous univariate scaling analyses have neglected the scaling of low-flow events and may have produced biased estimates of the spatial scaling exponent. Third, the multiple regression results show that mean flows scale with an exponent of one, low flows scale with spatial scaling exponents greater than one, and high flows scale with exponents less than one. The relationship between scaling exponents and exceedance probabilities may be a fundamental signature of regional streamflow. This signature may improve our understanding of the physical processes generating streamflow at different exceedance probabilities. 

  16. Groundwater-level prediction using multiple linear regression and artificial neural network techniques: a comparative assessment

    NASA Astrophysics Data System (ADS)

    Sahoo, Sasmita; Jha, Madan K.

    2013-12-01

    The potential of multiple linear regression (MLR) and artificial neural network (ANN) techniques in predicting transient water levels over a groundwater basin were compared. MLR and ANN modeling was carried out at 17 sites in Japan, considering all significant inputs: rainfall, ambient temperature, river stage, 11 seasonal dummy variables, and influential lags of rainfall, ambient temperature, river stage and groundwater level. Seventeen site-specific ANN models were developed, using multi-layer feed-forward neural networks trained with Levenberg-Marquardt backpropagation algorithms. The performance of the models was evaluated using statistical and graphical indicators. Comparison of the goodness-of-fit statistics of the MLR models with those of the ANN models indicated that there is better agreement between the ANN-predicted groundwater levels and the observed groundwater levels at all the sites, compared to the MLR. This finding was supported by the graphical indicators and the residual analysis. Thus, it is concluded that the ANN technique is superior to the MLR technique in predicting spatio-temporal distribution of groundwater levels in a basin. However, considering the practical advantages of the MLR technique, it is recommended as an alternative and cost-effective groundwater modeling tool.

  17. Children with Multiple Disabilities and Minimal Motor Behavior Using Chin Movements to Operate Microswitches to Obtain Environmental Stimulation

    ERIC Educational Resources Information Center

    Lancioni, Giulio E.; O'Reilly, Mark F.; Singh, Nirbhay N.; Sigafoos, Jeff; Tota, Alessia; Antonucci, Massimo; Oliva, Doretta

    2006-01-01

    In these two studies, two children with multiple disabilities and minimal motor behavior were assessed to see if they could use chin movements to operate microswitches to obtain environmental stimulation. In Study I, we applied an adapted version of a recently introduced electronic microswitch [Lancioni, G. E., O'Reilly, M. F., Singh, N. N.,…

  18. Nonelectrophoretic bidirectional transfer of a single SDS-PAGE gel with multiple antigens to obtain 12 immunoblots.

    PubMed

    Kurien, Biji T; Scofield, R Hal

    2009-01-01

    Protein blotting is an invaluable technique in immunology to detect and characterize proteins of low abundance. Proteins resolved on sodium dodecyl sulfate (SDS) polyacrylamide gels are normally transferred electrophoretically to adsorbent membranes such as nitrocellulose or polyvinylidene diflouride membranes. Here, we describe the nonelectrophroretic transfer of the Ro 60 (or SSA) autoantigen, 220- and 240-kD spectrin antigens, and prestained molecular weight standards from SDS polyacrylamide gels to obtain up to 12 immunoblots from a single gel and multiple sera.

  19. Hidden Connections between Regression Models of Strain-Gage Balance Calibration Data

    NASA Technical Reports Server (NTRS)

    Ulbrich, Norbert

    2013-01-01

    Hidden connections between regression models of wind tunnel strain-gage balance calibration data are investigated. These connections become visible whenever balance calibration data is supplied in its design format and both the Iterative and Non-Iterative Method are used to process the data. First, it is shown how the regression coefficients of the fitted balance loads of a force balance can be approximated by using the corresponding regression coefficients of the fitted strain-gage outputs. Then, data from the manual calibration of the Ames MK40 six-component force balance is chosen to illustrate how estimates of the regression coefficients of the fitted balance loads can be obtained from the regression coefficients of the fitted strain-gage outputs. The study illustrates that load predictions obtained by applying the Iterative or the Non-Iterative Method originate from two related regression solutions of the balance calibration data as long as balance loads are given in the design format of the balance, gage outputs behave highly linear, strict statistical quality metrics are used to assess regression models of the data, and regression model term combinations of the fitted loads and gage outputs can be obtained by a simple variable exchange.

  20. Differentiating regressed melanoma from regressed lichenoid keratosis.

    PubMed

    Chan, Aegean H; Shulman, Kenneth J; Lee, Bonnie A

    2017-04-01

    Distinguishing regressed lichen planus-like keratosis (LPLK) from regressed melanoma can be difficult on histopathologic examination, potentially resulting in mismanagement of patients. We aimed to identify histopathologic features by which regressed melanoma can be differentiated from regressed LPLK. Twenty actively inflamed LPLK, 12 LPLK with regression and 15 melanomas with regression were compared and evaluated by hematoxylin and eosin staining as well as Melan-A, microphthalmia transcription factor (MiTF) and cytokeratin (AE1/AE3) immunostaining. (1) A total of 40% of regressed melanomas showed complete or near complete loss of melanocytes within the epidermis with Melan-A and MiTF immunostaining, while 8% of regressed LPLK exhibited this finding. (2) Necrotic keratinocytes were seen in the epidermis in 33% regressed melanomas as opposed to all of the regressed LPLK. (3) A dense infiltrate of melanophages in the papillary dermis was seen in 40% of regressed melanomas, a feature not seen in regressed LPLK. In summary, our findings suggest that a complete or near complete loss of melanocytes within the epidermis strongly favors a regressed melanoma over a regressed LPLK. In addition, necrotic epidermal keratinocytes and the presence of a dense band-like distribution of dermal melanophages can be helpful in differentiating these lesions. © 2016 John Wiley & Sons A/S. Published by John Wiley & Sons Ltd.

  1. Extrinsic local regression on manifold-valued data

    PubMed Central

    Lin, Lizhen; St Thomas, Brian; Zhu, Hongtu; Dunson, David B.

    2017-01-01

    We propose an extrinsic regression framework for modeling data with manifold valued responses and Euclidean predictors. Regression with manifold responses has wide applications in shape analysis, neuroscience, medical imaging and many other areas. Our approach embeds the manifold where the responses lie onto a higher dimensional Euclidean space, obtains a local regression estimate in that space, and then projects this estimate back onto the image of the manifold. Outside the regression setting both intrinsic and extrinsic approaches have been proposed for modeling i.i.d manifold-valued data. However, to our knowledge our work is the first to take an extrinsic approach to the regression problem. The proposed extrinsic regression framework is general, computationally efficient and theoretically appealing. Asymptotic distributions and convergence rates of the extrinsic regression estimates are derived and a large class of examples are considered indicating the wide applicability of our approach. PMID:29225385

  2. Interaction Models for Functional Regression.

    PubMed

    Usset, Joseph; Staicu, Ana-Maria; Maity, Arnab

    2016-02-01

    A functional regression model with a scalar response and multiple functional predictors is proposed that accommodates two-way interactions in addition to their main effects. The proposed estimation procedure models the main effects using penalized regression splines, and the interaction effect by a tensor product basis. Extensions to generalized linear models and data observed on sparse grids or with measurement error are presented. A hypothesis testing procedure for the functional interaction effect is described. The proposed method can be easily implemented through existing software. Numerical studies show that fitting an additive model in the presence of interaction leads to both poor estimation performance and lost prediction power, while fitting an interaction model where there is in fact no interaction leads to negligible losses. The methodology is illustrated on the AneuRisk65 study data.

  3. Assessment of triglyceride and cholesterol in overweight people based on multiple linear regression and artificial intelligence model.

    PubMed

    Ma, Jing; Yu, Jiong; Hao, Guangshu; Wang, Dan; Sun, Yanni; Lu, Jianxin; Cao, Hongcui; Lin, Feiyan

    2017-02-20

    The prevalence of high hyperlipemia is increasing around the world. Our aims are to analyze the relationship of triglyceride (TG) and cholesterol (TC) with indexes of liver function and kidney function, and to develop a prediction model of TG, TC in overweight people. A total of 302 adult healthy subjects and 273 overweight subjects were enrolled in this study. The levels of fasting indexes of TG (fs-TG), TC (fs-TC), blood glucose, liver function, and kidney function were measured and analyzed by correlation analysis and multiple linear regression (MRL). The back propagation artificial neural network (BP-ANN) was applied to develop prediction models of fs-TG and fs-TC. The results showed there was significant difference in biochemical indexes between healthy people and overweight people. The correlation analysis showed fs-TG was related to weight, height, blood glucose, and indexes of liver and kidney function; while fs-TC was correlated with age, indexes of liver function (P < 0.01). The MRL analysis indicated regression equations of fs-TG and fs-TC both had statistic significant (P < 0.01) when included independent indexes. The BP-ANN model of fs-TG reached training goal at 59 epoch, while fs-TC model achieved high prediction accuracy after training 1000 epoch. In conclusions, there was high relationship of fs-TG and fs-TC with weight, height, age, blood glucose, indexes of liver function and kidney function. Based on related variables, the indexes of fs-TG and fs-TC can be predicted by BP-ANN models in overweight people.

  4. Artificial neural networks and multiple linear regression model using principal components to estimate rainfall over South America

    NASA Astrophysics Data System (ADS)

    Soares dos Santos, T.; Mendes, D.; Rodrigues Torres, R.

    2016-01-01

    Several studies have been devoted to dynamic and statistical downscaling for analysis of both climate variability and climate change. This paper introduces an application of artificial neural networks (ANNs) and multiple linear regression (MLR) by principal components to estimate rainfall in South America. This method is proposed for downscaling monthly precipitation time series over South America for three regions: the Amazon; northeastern Brazil; and the La Plata Basin, which is one of the regions of the planet that will be most affected by the climate change projected for the end of the 21st century. The downscaling models were developed and validated using CMIP5 model output and observed monthly precipitation. We used general circulation model (GCM) experiments for the 20th century (RCP historical; 1970-1999) and two scenarios (RCP 2.6 and 8.5; 2070-2100). The model test results indicate that the ANNs significantly outperform the MLR downscaling of monthly precipitation variability.

  5. Artificial neural networks and multiple linear regression model using principal components to estimate rainfall over South America

    NASA Astrophysics Data System (ADS)

    dos Santos, T. S.; Mendes, D.; Torres, R. R.

    2015-08-01

    Several studies have been devoted to dynamic and statistical downscaling for analysis of both climate variability and climate change. This paper introduces an application of artificial neural networks (ANN) and multiple linear regression (MLR) by principal components to estimate rainfall in South America. This method is proposed for downscaling monthly precipitation time series over South America for three regions: the Amazon, Northeastern Brazil and the La Plata Basin, which is one of the regions of the planet that will be most affected by the climate change projected for the end of the 21st century. The downscaling models were developed and validated using CMIP5 model out- put and observed monthly precipitation. We used GCMs experiments for the 20th century (RCP Historical; 1970-1999) and two scenarios (RCP 2.6 and 8.5; 2070-2100). The model test results indicate that the ANN significantly outperforms the MLR downscaling of monthly precipitation variability.

  6. Obtaining appropriate interval estimates for age when multiple indicators are used: evaluation of an ad-hoc procedure.

    PubMed

    Fieuws, Steffen; Willems, Guy; Larsen-Tangmose, Sara; Lynnerup, Niels; Boldsen, Jesper; Thevissen, Patrick

    2016-03-01

    When an estimate of age is needed, typically multiple indicators are present as found in skeletal or dental information. There exists a vast literature on approaches to estimate age from such multivariate data. Application of Bayes' rule has been proposed to overcome drawbacks of classical regression models but becomes less trivial as soon as the number of indicators increases. Each of the age indicators can lead to a different point estimate ("the most plausible value for age") and a prediction interval ("the range of possible values"). The major challenge in the combination of multiple indicators is not the calculation of a combined point estimate for age but the construction of an appropriate prediction interval. Ignoring the correlation between the age indicators results in intervals being too small. Boldsen et al. (2002) presented an ad-hoc procedure to construct an approximate confidence interval without the need to model the multivariate correlation structure between the indicators. The aim of the present paper is to bring under attention this pragmatic approach and to evaluate its performance in a practical setting. This is all the more needed since recent publications ignore the need for interval estimation. To illustrate and evaluate the method, Köhler et al. (1995) third molar scores are used to estimate the age in a dataset of 3200 male subjects in the juvenile age range.

  7. An Application of Robust Method in Multiple Linear Regression Model toward Credit Card Debt

    NASA Astrophysics Data System (ADS)

    Amira Azmi, Nur; Saifullah Rusiman, Mohd; Khalid, Kamil; Roslan, Rozaini; Sufahani, Suliadi; Mohamad, Mahathir; Salleh, Rohayu Mohd; Hamzah, Nur Shamsidah Amir

    2018-04-01

    Credit card is a convenient alternative replaced cash or cheque, and it is essential component for electronic and internet commerce. In this study, the researchers attempt to determine the relationship and significance variables between credit card debt and demographic variables such as age, household income, education level, years with current employer, years at current address, debt to income ratio and other debt. The provided data covers 850 customers information. There are three methods that applied to the credit card debt data which are multiple linear regression (MLR) models, MLR models with least quartile difference (LQD) method and MLR models with mean absolute deviation method. After comparing among three methods, it is found that MLR model with LQD method became the best model with the lowest value of mean square error (MSE). According to the final model, it shows that the years with current employer, years at current address, household income in thousands and debt to income ratio are positively associated with the amount of credit debt. Meanwhile variables for age, level of education and other debt are negatively associated with amount of credit debt. This study may serve as a reference for the bank company by using robust methods, so that they could better understand their options and choice that is best aligned with their goals for inference regarding to the credit card debt.

  8. Retrieving relevant factors with exploratory SEM and principal-covariate regression: A comparison.

    PubMed

    Vervloet, Marlies; Van den Noortgate, Wim; Ceulemans, Eva

    2018-02-12

    Behavioral researchers often linearly regress a criterion on multiple predictors, aiming to gain insight into the relations between the criterion and predictors. Obtaining this insight from the ordinary least squares (OLS) regression solution may be troublesome, because OLS regression weights show only the effect of a predictor on top of the effects of other predictors. Moreover, when the number of predictors grows larger, it becomes likely that the predictors will be highly collinear, which makes the regression weights' estimates unstable (i.e., the "bouncing beta" problem). Among other procedures, dimension-reduction-based methods have been proposed for dealing with these problems. These methods yield insight into the data by reducing the predictors to a smaller number of summarizing variables and regressing the criterion on these summarizing variables. Two promising methods are principal-covariate regression (PCovR) and exploratory structural equation modeling (ESEM). Both simultaneously optimize reduction and prediction, but they are based on different frameworks. The resulting solutions have not yet been compared; it is thus unclear what the strengths and weaknesses are of both methods. In this article, we focus on the extents to which PCovR and ESEM are able to extract the factors that truly underlie the predictor scores and can predict a single criterion. The results of two simulation studies showed that for a typical behavioral dataset, ESEM (using the BIC for model selection) in this regard is successful more often than PCovR. Yet, in 93% of the datasets PCovR performed equally well, and in the case of 48 predictors, 100 observations, and large differences in the strengths of the factors, PCovR even outperformed ESEM.

  9. Risk factors for autistic regression: results of an ambispective cohort study.

    PubMed

    Zhang, Ying; Xu, Qiong; Liu, Jing; Li, She-chang; Xu, Xiu

    2012-08-01

    A subgroup of children diagnosed with autism experience developmental regression featured by a loss of previously acquired abilities. The pathogeny of autistic regression is unknown, although many risk factors likely exist. To better characterize autistic regression and investigate the association between autistic regression and potential influencing factors in Chinese autistic children, we conducted an ambispective study with a cohort of 170 autistic subjects. Analyses by multiple logistic regression showed significant correlations between autistic regression and febrile seizures (OR = 3.53, 95% CI = 1.17-10.65, P = .025), as well as with a family history of neuropsychiatric disorders (OR = 3.62, 95% CI = 1.35-9.71, P = .011). This study suggests that febrile seizures and family history of neuropsychiatric disorders are correlated with autistic regression.

  10. Valid Statistical Analysis for Logistic Regression with Multiple Sources

    NASA Astrophysics Data System (ADS)

    Fienberg, Stephen E.; Nardi, Yuval; Slavković, Aleksandra B.

    Considerable effort has gone into understanding issues of privacy protection of individual information in single databases, and various solutions have been proposed depending on the nature of the data, the ways in which the database will be used and the precise nature of the privacy protection being offered. Once data are merged across sources, however, the nature of the problem becomes far more complex and a number of privacy issues arise for the linked individual files that go well beyond those that are considered with regard to the data within individual sources. In the paper, we propose an approach that gives full statistical analysis on the combined database without actually combining it. We focus mainly on logistic regression, but the method and tools described may be applied essentially to other statistical models as well.

  11. Estimating Dbh of Trees Employing Multiple Linear Regression of the best Lidar-Derived Parameter Combination Automated in Python in a Natural Broadleaf Forest in the Philippines

    NASA Astrophysics Data System (ADS)

    Ibanez, C. A. G.; Carcellar, B. G., III; Paringit, E. C.; Argamosa, R. J. L.; Faelga, R. A. G.; Posilero, M. A. V.; Zaragosa, G. P.; Dimayacyac, N. A.

    2016-06-01

    Diameter-at-Breast-Height Estimation is a prerequisite in various allometric equations estimating important forestry indices like stem volume, basal area, biomass and carbon stock. LiDAR Technology has a means of directly obtaining different forest parameters, except DBH, from the behavior and characteristics of point cloud unique in different forest classes. Extensive tree inventory was done on a two-hectare established sample plot in Mt. Makiling, Laguna for a natural growth forest. Coordinates, height, and canopy cover were measured and types of species were identified to compare to LiDAR derivatives. Multiple linear regression was used to get LiDAR-derived DBH by integrating field-derived DBH and 27 LiDAR-derived parameters at 20m, 10m, and 5m grid resolutions. To know the best combination of parameters in DBH Estimation, all possible combinations of parameters were generated and automated using python scripts and additional regression related libraries such as Numpy, Scipy, and Scikit learn were used. The combination that yields the highest r-squared or coefficient of determination and lowest AIC (Akaike's Information Criterion) and BIC (Bayesian Information Criterion) was determined to be the best equation. The equation is at its best using 11 parameters at 10mgrid size and at of 0.604 r-squared, 154.04 AIC and 175.08 BIC. Combination of parameters may differ among forest classes for further studies. Additional statistical tests can be supplemented to help determine the correlation among parameters such as Kaiser- Meyer-Olkin (KMO) Coefficient and the Barlett's Test for Spherecity (BTS).

  12. Watershed Regressions for Pesticides (WARP) models for predicting stream concentrations of multiple pesticides

    USGS Publications Warehouse

    Stone, Wesley W.; Crawford, Charles G.; Gilliom, Robert J.

    2013-01-01

    Watershed Regressions for Pesticides for multiple pesticides (WARP-MP) are statistical models developed to predict concentration statistics for a wide range of pesticides in unmonitored streams. The WARP-MP models use the national atrazine WARP models in conjunction with an adjustment factor for each additional pesticide. The WARP-MP models perform best for pesticides with application timing and methods similar to those used with atrazine. For other pesticides, WARP-MP models tend to overpredict concentration statistics for the model development sites. For WARP and WARP-MP, the less-than-ideal sampling frequency for the model development sites leads to underestimation of the shorter-duration concentration; hence, the WARP models tend to underpredict 4- and 21-d maximum moving-average concentrations, with median errors ranging from 9 to 38% As a result of this sampling bias, pesticides that performed well with the model development sites are expected to have predictions that are biased low for these shorter-duration concentration statistics. The overprediction by WARP-MP apparent for some of the pesticides is variably offset by underestimation of the model development concentration statistics. Of the 112 pesticides used in the WARP-MP application to stream segments nationwide, 25 were predicted to have concentration statistics with a 50% or greater probability of exceeding one or more aquatic life benchmarks in one or more stream segments. Geographically, many of the modeled streams in the Corn Belt Region were predicted to have one or more pesticides that exceeded an aquatic life benchmark during 2009, indicating the potential vulnerability of streams in this region.

  13. Analytic model for the long-term evolution of circular Earth satellite orbits including lunar node regression

    NASA Astrophysics Data System (ADS)

    Zhu, Ting-Lei; Zhao, Chang-Yin; Zhang, Ming-Jiang

    2017-04-01

    This paper aims to obtain an analytic approximation to the evolution of circular orbits governed by the Earth's J2 and the luni-solar gravitational perturbations. Assuming that the lunar orbital plane coincides with the ecliptic plane, Allan and Cook (Proc. R. Soc. A, Math. Phys. Eng. Sci. 280(1380):97, 1964) derived an analytic solution to the orbital plane evolution of circular orbits. Using their result as an intermediate solution, we establish an approximate analytic model with lunar orbital inclination and its node regression be taken into account. Finally, an approximate analytic expression is derived, which is accurate compared to the numerical results except for the resonant cases when the period of the reference orbit approximately equals the integer multiples (especially 1 or 2 times) of lunar node regression period.

  14. Background stratified Poisson regression analysis of cohort data.

    PubMed

    Richardson, David B; Langholz, Bryan

    2012-03-01

    Background stratified Poisson regression is an approach that has been used in the analysis of data derived from a variety of epidemiologically important studies of radiation-exposed populations, including uranium miners, nuclear industry workers, and atomic bomb survivors. We describe a novel approach to fit Poisson regression models that adjust for a set of covariates through background stratification while directly estimating the radiation-disease association of primary interest. The approach makes use of an expression for the Poisson likelihood that treats the coefficients for stratum-specific indicator variables as 'nuisance' variables and avoids the need to explicitly estimate the coefficients for these stratum-specific parameters. Log-linear models, as well as other general relative rate models, are accommodated. This approach is illustrated using data from the Life Span Study of Japanese atomic bomb survivors and data from a study of underground uranium miners. The point estimate and confidence interval obtained from this 'conditional' regression approach are identical to the values obtained using unconditional Poisson regression with model terms for each background stratum. Moreover, it is shown that the proposed approach allows estimation of background stratified Poisson regression models of non-standard form, such as models that parameterize latency effects, as well as regression models in which the number of strata is large, thereby overcoming the limitations of previously available statistical software for fitting background stratified Poisson regression models.

  15. Classification of independent components of EEG into multiple artifact classes.

    PubMed

    Frølich, Laura; Andersen, Tobias S; Mørup, Morten

    2015-01-01

    In this study, we aim to automatically identify multiple artifact types in EEG. We used multinomial regression to classify independent components of EEG data, selecting from 65 spatial, spectral, and temporal features of independent components using forward selection. The classifier identified neural and five nonneural types of components. Between subjects within studies, high classification performances were obtained. Between studies, however, classification was more difficult. For neural versus nonneural classifications, performance was on par with previous results obtained by others. We found that automatic separation of multiple artifact classes is possible with a small feature set. Our method can reduce manual workload and allow for the selective removal of artifact classes. Identifying artifacts during EEG recording may be used to instruct subjects to refrain from activity causing them. Copyright © 2014 Society for Psychophysiological Research.

  16. Penalized nonparametric scalar-on-function regression via principal coordinates

    PubMed Central

    Reiss, Philip T.; Miller, David L.; Wu, Pei-Shien; Hua, Wen-Yu

    2016-01-01

    A number of classical approaches to nonparametric regression have recently been extended to the case of functional predictors. This paper introduces a new method of this type, which extends intermediate-rank penalized smoothing to scalar-on-function regression. In the proposed method, which we call principal coordinate ridge regression, one regresses the response on leading principal coordinates defined by a relevant distance among the functional predictors, while applying a ridge penalty. Our publicly available implementation, based on generalized additive modeling software, allows for fast optimal tuning parameter selection and for extensions to multiple functional predictors, exponential family-valued responses, and mixed-effects models. In an application to signature verification data, principal coordinate ridge regression, with dynamic time warping distance used to define the principal coordinates, is shown to outperform a functional generalized linear model. PMID:29217963

  17. Parts-Per-Billion Mass Measurement Accuracy Achieved through the Combination of Multiple Linear Regression and Automatic Gain Control in a Fourier Transform Ion Cyclotron Resonance Mass Spectrometer

    PubMed Central

    Williams, D. Keith; Muddiman, David C.

    2008-01-01

    Fourier transform ion cyclotron resonance mass spectrometry has the ability to achieve unprecedented mass measurement accuracy (MMA); MMA is one of the most significant attributes of mass spectrometric measurements as it affords extraordinary molecular specificity. However, due to space-charge effects, the achievable MMA significantly depends on the total number of ions trapped in the ICR cell for a particular measurement. Even through the use of automatic gain control (AGC), the total ion population is not constant between spectra. Multiple linear regression calibration in conjunction with AGC is utilized in these experiments to formally account for the differences in total ion population in the ICR cell between the external calibration spectra and experimental spectra. This ability allows for the extension of dynamic range of the instrument while allowing mean MMA values to remain less than 1 ppm. In addition, multiple linear regression calibration is used to account for both differences in total ion population in the ICR cell as well as relative ion abundance of a given species, which also affords mean MMA values at the parts-per-billion level. PMID:17539605

  18. Estimation of perceptible water vapor of atmosphere using artificial neural network, support vector machine and multiple linear regression algorithm and their comparative study

    NASA Astrophysics Data System (ADS)

    Shastri, Niket; Pathak, Kamlesh

    2018-05-01

    The water vapor content in atmosphere plays very important role in climate. In this paper the application of GPS signal in meteorology is discussed, which is useful technique that is used to estimate the perceptible water vapor of atmosphere. In this paper various algorithms like artificial neural network, support vector machine and multiple linear regression are use to predict perceptible water vapor. The comparative studies in terms of root mean square error and mean absolute errors are also carried out for all the algorithms.

  19. Engine With Regression and Neural Network Approximators Designed

    NASA Technical Reports Server (NTRS)

    Patnaik, Surya N.; Hopkins, Dale A.

    2001-01-01

    At the NASA Glenn Research Center, the NASA engine performance program (NEPP, ref. 1) and the design optimization testbed COMETBOARDS (ref. 2) with regression and neural network analysis-approximators have been coupled to obtain a preliminary engine design methodology. The solution to a high-bypass-ratio subsonic waverotor-topped turbofan engine, which is shown in the preceding figure, was obtained by the simulation depicted in the following figure. This engine is made of 16 components mounted on two shafts with 21 flow stations. The engine is designed for a flight envelope with 47 operating points. The design optimization utilized both neural network and regression approximations, along with the cascade strategy (ref. 3). The cascade used three algorithms in sequence: the method of feasible directions, the sequence of unconstrained minimizations technique, and sequential quadratic programming. The normalized optimum thrusts obtained by the three methods are shown in the following figure: the cascade algorithm with regression approximation is represented by a triangle, a circle is shown for the neural network solution, and a solid line indicates original NEPP results. The solutions obtained from both approximate methods lie within one standard deviation of the benchmark solution for each operating point. The simulation improved the maximum thrust by 5 percent. The performance of the linear regression and neural network methods as alternate engine analyzers was found to be satisfactory for the analysis and operation optimization of air-breathing propulsion engines (ref. 4).

  20. Decreasing Multicollinearity: A Method for Models with Multiplicative Functions.

    ERIC Educational Resources Information Center

    Smith, Kent W.; Sasaki, M. S.

    1979-01-01

    A method is proposed for overcoming the problem of multicollinearity in multiple regression equations where multiplicative independent terms are entered. The method is not a ridge regression solution. (JKS)

  1. Method and apparatus for obtaining stack traceback data for multiple computing nodes of a massively parallel computer system

    DOEpatents

    Gooding, Thomas Michael; McCarthy, Patrick Joseph

    2010-03-02

    A data collector for a massively parallel computer system obtains call-return stack traceback data for multiple nodes by retrieving partial call-return stack traceback data from each node, grouping the nodes in subsets according to the partial traceback data, and obtaining further call-return stack traceback data from a representative node or nodes of each subset. Preferably, the partial data is a respective instruction address from each node, nodes having identical instruction address being grouped together in the same subset. Preferably, a single node of each subset is chosen and full stack traceback data is retrieved from the call-return stack within the chosen node.

  2. A simplified calculation procedure for mass isotopomer distribution analysis (MIDA) based on multiple linear regression.

    PubMed

    Fernández-Fernández, Mario; Rodríguez-González, Pablo; García Alonso, J Ignacio

    2016-10-01

    We have developed a novel, rapid and easy calculation procedure for Mass Isotopomer Distribution Analysis based on multiple linear regression which allows the simultaneous calculation of the precursor pool enrichment and the fraction of newly synthesized labelled proteins (fractional synthesis) using linear algebra. To test this approach, we used the peptide RGGGLK as a model tryptic peptide containing three subunits of glycine. We selected glycine labelled in two 13 C atoms ( 13 C 2 -glycine) as labelled amino acid to demonstrate that spectral overlap is not a problem in the proposed methodology. The developed methodology was tested first in vitro by changing the precursor pool enrichment from 10 to 40% of 13 C 2 -glycine. Secondly, a simulated in vivo synthesis of proteins was designed by combining the natural abundance RGGGLK peptide and 10 or 20% 13 C 2 -glycine at 1 : 1, 1 : 3 and 3 : 1 ratios. Precursor pool enrichments and fractional synthesis values were calculated with satisfactory precision and accuracy using a simple spreadsheet. This novel approach can provide a relatively rapid and easy means to measure protein turnover based on stable isotope tracers. Copyright © 2016 John Wiley & Sons, Ltd. Copyright © 2016 John Wiley & Sons, Ltd.

  3. Efficient Regressions via Optimally Combining Quantile Information*

    PubMed Central

    Zhao, Zhibiao; Xiao, Zhijie

    2014-01-01

    We develop a generally applicable framework for constructing efficient estimators of regression models via quantile regressions. The proposed method is based on optimally combining information over multiple quantiles and can be applied to a broad range of parametric and nonparametric settings. When combining information over a fixed number of quantiles, we derive an upper bound on the distance between the efficiency of the proposed estimator and the Fisher information. As the number of quantiles increases, this upper bound decreases and the asymptotic variance of the proposed estimator approaches the Cramér-Rao lower bound under appropriate conditions. In the case of non-regular statistical estimation, the proposed estimator leads to super-efficient estimation. We illustrate the proposed method for several widely used regression models. Both asymptotic theory and Monte Carlo experiments show the superior performance over existing methods. PMID:25484481

  4. Combined genetic algorithm and multiple linear regression (GA-MLR) optimizer: Application to multi-exponential fluorescence decay surface.

    PubMed

    Fisz, Jacek J

    2006-12-07

    The optimization approach based on the genetic algorithm (GA) combined with multiple linear regression (MLR) method, is discussed. The GA-MLR optimizer is designed for the nonlinear least-squares problems in which the model functions are linear combinations of nonlinear functions. GA optimizes the nonlinear parameters, and the linear parameters are calculated from MLR. GA-MLR is an intuitive optimization approach and it exploits all advantages of the genetic algorithm technique. This optimization method results from an appropriate combination of two well-known optimization methods. The MLR method is embedded in the GA optimizer and linear and nonlinear model parameters are optimized in parallel. The MLR method is the only one strictly mathematical "tool" involved in GA-MLR. The GA-MLR approach simplifies and accelerates considerably the optimization process because the linear parameters are not the fitted ones. Its properties are exemplified by the analysis of the kinetic biexponential fluorescence decay surface corresponding to a two-excited-state interconversion process. A short discussion of the variable projection (VP) algorithm, designed for the same class of the optimization problems, is presented. VP is a very advanced mathematical formalism that involves the methods of nonlinear functionals, algebra of linear projectors, and the formalism of Fréchet derivatives and pseudo-inverses. Additional explanatory comments are added on the application of recently introduced the GA-NR optimizer to simultaneous recovery of linear and weakly nonlinear parameters occurring in the same optimization problem together with nonlinear parameters. The GA-NR optimizer combines the GA method with the NR method, in which the minimum-value condition for the quadratic approximation to chi(2), obtained from the Taylor series expansion of chi(2), is recovered by means of the Newton-Raphson algorithm. The application of the GA-NR optimizer to model functions which are multi

  5. The prediction of intelligence in preschool children using alternative models to regression.

    PubMed

    Finch, W Holmes; Chang, Mei; Davis, Andrew S; Holden, Jocelyn E; Rothlisberg, Barbara A; McIntosh, David E

    2011-12-01

    Statistical prediction of an outcome variable using multiple independent variables is a common practice in the social and behavioral sciences. For example, neuropsychologists are sometimes called upon to provide predictions of preinjury cognitive functioning for individuals who have suffered a traumatic brain injury. Typically, these predictions are made using standard multiple linear regression models with several demographic variables (e.g., gender, ethnicity, education level) as predictors. Prior research has shown conflicting evidence regarding the ability of such models to provide accurate predictions of outcome variables such as full-scale intelligence (FSIQ) test scores. The present study had two goals: (1) to demonstrate the utility of a set of alternative prediction methods that have been applied extensively in the natural sciences and business but have not been frequently explored in the social sciences and (2) to develop models that can be used to predict premorbid cognitive functioning in preschool children. Predictions of Stanford-Binet 5 FSIQ scores for preschool-aged children is used to compare the performance of a multiple regression model with several of these alternative methods. Results demonstrate that classification and regression trees provided more accurate predictions of FSIQ scores than does the more traditional regression approach. Implications of these results are discussed.

  6. Simple linear and multivariate regression models.

    PubMed

    Rodríguez del Águila, M M; Benítez-Parejo, N

    2011-01-01

    In biomedical research it is common to find problems in which we wish to relate a response variable to one or more variables capable of describing the behaviour of the former variable by means of mathematical models. Regression techniques are used to this effect, in which an equation is determined relating the two variables. While such equations can have different forms, linear equations are the most widely used form and are easy to interpret. The present article describes simple and multiple linear regression models, how they are calculated, and how their applicability assumptions are checked. Illustrative examples are provided, based on the use of the freely accessible R program. Copyright © 2011 SEICAP. Published by Elsevier Espana. All rights reserved.

  7. MultiDK: A Multiple Descriptor Multiple Kernel Approach for Molecular Discovery and Its Application to Organic Flow Battery Electrolytes.

    PubMed

    Kim, Sungjin; Jinich, Adrián; Aspuru-Guzik, Alán

    2017-04-24

    We propose a multiple descriptor multiple kernel (MultiDK) method for efficient molecular discovery using machine learning. We show that the MultiDK method improves both the speed and accuracy of molecular property prediction. We apply the method to the discovery of electrolyte molecules for aqueous redox flow batteries. Using multiple-type-as opposed to single-type-descriptors, we obtain more relevant features for machine learning. Following the principle of "wisdom of the crowds", the combination of multiple-type descriptors significantly boosts prediction performance. Moreover, by employing multiple kernels-more than one kernel function for a set of the input descriptors-MultiDK exploits nonlinear relations between molecular structure and properties better than a linear regression approach. The multiple kernels consist of a Tanimoto similarity kernel and a linear kernel for a set of binary descriptors and a set of nonbinary descriptors, respectively. Using MultiDK, we achieve an average performance of r 2 = 0.92 with a test set of molecules for solubility prediction. We also extend MultiDK to predict pH-dependent solubility and apply it to a set of quinone molecules with different ionizable functional groups to assess their performance as flow battery electrolytes.

  8. Monitoring heavy metal Cr in soil based on hyperspectral data using regression analysis

    NASA Astrophysics Data System (ADS)

    Zhang, Ningyu; Xu, Fuyun; Zhuang, Shidong; He, Changwei

    2016-10-01

    Heavy metal pollution in soils is one of the most critical problems in the global ecology and environment safety nowadays. Hyperspectral remote sensing and its application is capable of high speed, low cost, less risk and less damage, and provides a good method for detecting heavy metals in soil. This paper proposed a new idea of applying regression analysis of stepwise multiple regression between the spectral data and monitoring the amount of heavy metal Cr by sample points in soil for environmental protection. In the measurement, a FieldSpec HandHeld spectroradiometer is used to collect reflectance spectra of sample points over the wavelength range of 325-1075 nm. Then the spectral data measured by the spectroradiometer is preprocessed to reduced the influence of the external factors, and the preprocessed methods include first-order differential equation, second-order differential equation and continuum removal method. The algorithms of stepwise multiple regression are established accordingly, and the accuracy of each equation is tested. The results showed that the accuracy of first-order differential equation works best, which makes it feasible to predict the content of heavy metal Cr by using stepwise multiple regression.

  9. Comparison of multinomial logistic regression and logistic regression: which is more efficient in allocating land use?

    NASA Astrophysics Data System (ADS)

    Lin, Yingzhi; Deng, Xiangzheng; Li, Xing; Ma, Enjun

    2014-12-01

    Spatially explicit simulation of land use change is the basis for estimating the effects of land use and cover change on energy fluxes, ecology and the environment. At the pixel level, logistic regression is one of the most common approaches used in spatially explicit land use allocation models to determine the relationship between land use and its causal factors in driving land use change, and thereby to evaluate land use suitability. However, these models have a drawback in that they do not determine/allocate land use based on the direct relationship between land use change and its driving factors. Consequently, a multinomial logistic regression method was introduced to address this flaw, and thereby, judge the suitability of a type of land use in any given pixel in a case study area of the Jiangxi Province, China. A comparison of the two regression methods indicated that the proportion of correctly allocated pixels using multinomial logistic regression was 92.98%, which was 8.47% higher than that obtained using logistic regression. Paired t-test results also showed that pixels were more clearly distinguished by multinomial logistic regression than by logistic regression. In conclusion, multinomial logistic regression is a more efficient and accurate method for the spatial allocation of land use changes. The application of this method in future land use change studies may improve the accuracy of predicting the effects of land use and cover change on energy fluxes, ecology, and environment.

  10. A Numerical Method for Obtaining Monoenergetic Neutron Flux Distributions and Transmissions in Multiple-Region Slabs

    NASA Technical Reports Server (NTRS)

    Schneider, Harold

    1959-01-01

    This method is investigated for semi-infinite multiple-slab configurations of arbitrary width, composition, and source distribution. Isotropic scattering in the laboratory system is assumed. Isotropic scattering implies that the fraction of neutrons scattered in the i(sup th) volume element or subregion that will make their next collision in the j(sup th) volume element or subregion is the same for all collisions. These so-called "transfer probabilities" between subregions are calculated and used to obtain successive-collision densities from which the flux and transmission probabilities directly follow. For a thick slab with little or no absorption, a successive-collisions technique proves impractical because an unreasonably large number of collisions must be followed in order to obtain the flux. Here the appropriate integral equation is converted into a set of linear simultaneous algebraic equations that are solved for the average total flux in each subregion. When ordinary diffusion theory applies with satisfactory precision in a portion of the multiple-slab configuration, the problem is solved by ordinary diffusion theory, but the flux is plotted only in the region of validity. The angular distribution of neutrons entering the remaining portion is determined from the known diffusion flux and the remaining region is solved by higher order theory. Several procedures for applying the numerical method are presented and discussed. To illustrate the calculational procedure, a symmetrical slab ia vacuum is worked by the numerical, Monte Carlo, and P(sub 3) spherical harmonics methods. In addition, an unsymmetrical double-slab problem is solved by the numerical and Monte Carlo methods. The numerical approach proved faster and more accurate in these examples. Adaptation of the method to anisotropic scattering in slabs is indicated, although no example is included in this paper.

  11. Principal Covariates Clusterwise Regression (PCCR): Accounting for Multicollinearity and Population Heterogeneity in Hierarchically Organized Data.

    PubMed

    Wilderjans, Tom Frans; Vande Gaer, Eva; Kiers, Henk A L; Van Mechelen, Iven; Ceulemans, Eva

    2017-03-01

    In the behavioral sciences, many research questions pertain to a regression problem in that one wants to predict a criterion on the basis of a number of predictors. Although in many cases, ordinary least squares regression will suffice, sometimes the prediction problem is more challenging, for three reasons: first, multiple highly collinear predictors can be available, making it difficult to grasp their mutual relations as well as their relations to the criterion. In that case, it may be very useful to reduce the predictors to a few summary variables, on which one regresses the criterion and which at the same time yields insight into the predictor structure. Second, the population under study may consist of a few unknown subgroups that are characterized by different regression models. Third, the obtained data are often hierarchically structured, with for instance, observations being nested into persons or participants within groups or countries. Although some methods have been developed that partially meet these challenges (i.e., principal covariates regression (PCovR), clusterwise regression (CR), and structural equation models), none of these methods adequately deals with all of them simultaneously. To fill this gap, we propose the principal covariates clusterwise regression (PCCR) method, which combines the key idea's behind PCovR (de Jong & Kiers in Chemom Intell Lab Syst 14(1-3):155-164, 1992) and CR (Späth in Computing 22(4):367-373, 1979). The PCCR method is validated by means of a simulation study and by applying it to cross-cultural data regarding satisfaction with life.

  12. Biases and Standard Errors of Standardized Regression Coefficients

    ERIC Educational Resources Information Center

    Yuan, Ke-Hai; Chan, Wai

    2011-01-01

    The paper obtains consistent standard errors (SE) and biases of order O(1/n) for the sample standardized regression coefficients with both random and given predictors. Analytical results indicate that the formulas for SEs given in popular text books are consistent only when the population value of the regression coefficient is zero. The sample…

  13. The Development and Demonstration of Multiple Regression Models for Operant Conditioning Questions.

    ERIC Educational Resources Information Center

    Fanning, Fred; Newman, Isadore

    Based on the assumption that inferential statistics can make the operant conditioner more sensitive to possible significant relationships, regressions models were developed to test the statistical significance between slopes and Y intercepts of the experimental and control group subjects. These results were then compared to the traditional operant…

  14. Regression Analysis with Dummy Variables: Use and Interpretation.

    ERIC Educational Resources Information Center

    Hinkle, Dennis E.; Oliver, J. Dale

    1986-01-01

    Multiple regression analysis (MRA) may be used when both continuous and categorical variables are included as independent research variables. The use of MRA with categorical variables involves dummy coding, that is, assigning zeros and ones to levels of categorical variables. Caution is urged in results interpretation. (Author/CH)

  15. Modified Regression Correlation Coefficient for Poisson Regression Model

    NASA Astrophysics Data System (ADS)

    Kaengthong, Nattacha; Domthong, Uthumporn

    2017-09-01

    This study gives attention to indicators in predictive power of the Generalized Linear Model (GLM) which are widely used; however, often having some restrictions. We are interested in regression correlation coefficient for a Poisson regression model. This is a measure of predictive power, and defined by the relationship between the dependent variable (Y) and the expected value of the dependent variable given the independent variables [E(Y|X)] for the Poisson regression model. The dependent variable is distributed as Poisson. The purpose of this research was modifying regression correlation coefficient for Poisson regression model. We also compare the proposed modified regression correlation coefficient with the traditional regression correlation coefficient in the case of two or more independent variables, and having multicollinearity in independent variables. The result shows that the proposed regression correlation coefficient is better than the traditional regression correlation coefficient based on Bias and the Root Mean Square Error (RMSE).

  16. Multiple regression analysis in modelling of carbon dioxide emissions by energy consumption use in Malaysia

    NASA Astrophysics Data System (ADS)

    Keat, Sim Chong; Chun, Beh Boon; San, Lim Hwee; Jafri, Mohd Zubir Mat

    2015-04-01

    Climate change due to carbon dioxide (CO2) emissions is one of the most complex challenges threatening our planet. This issue considered as a great and international concern that primary attributed from different fossil fuels. In this paper, regression model is used for analyzing the causal relationship among CO2 emissions based on the energy consumption in Malaysia using time series data for the period of 1980-2010. The equations were developed using regression model based on the eight major sources that contribute to the CO2 emissions such as non energy, Liquefied Petroleum Gas (LPG), diesel, kerosene, refinery gas, Aviation Turbine Fuel (ATF) and Aviation Gasoline (AV Gas), fuel oil and motor petrol. The related data partly used for predict the regression model (1980-2000) and partly used for validate the regression model (2001-2010). The results of the prediction model with the measured data showed a high correlation coefficient (R2=0.9544), indicating the model's accuracy and efficiency. These results are accurate and can be used in early warning of the population to comply with air quality standards.

  17. Deep ensemble learning of sparse regression models for brain disease diagnosis

    PubMed Central

    Suk, Heung-Il; Lee, Seong-Whan; Shen, Dinggang

    2018-01-01

    Recent studies on brain imaging analysis witnessed the core roles of machine learning techniques in computer-assisted intervention for brain disease diagnosis. Of various machine-learning techniques, sparse regression models have proved their effectiveness in handling high-dimensional data but with a small number of training samples, especially in medical problems. In the meantime, deep learning methods have been making great successes by outperforming the state-of-the-art performances in various applications. In this paper, we propose a novel framework that combines the two conceptually different methods of sparse regression and deep learning for Alzheimer’s disease/mild cognitive impairment diagnosis and prognosis. Specifically, we first train multiple sparse regression models, each of which is trained with different values of a regularization control parameter. Thus, our multiple sparse regression models potentially select different feature subsets from the original feature set; thereby they have different powers to predict the response values, i.e., clinical label and clinical scores in our work. By regarding the response values from our sparse regression models as target-level representations, we then build a deep convolutional neural network for clinical decision making, which thus we call ‘ Deep Ensemble Sparse Regression Network.’ To our best knowledge, this is the first work that combines sparse regression models with deep neural network. In our experiments with the ADNI cohort, we validated the effectiveness of the proposed method by achieving the highest diagnostic accuracies in three classification tasks. We also rigorously analyzed our results and compared with the previous studies on the ADNI cohort in the literature. PMID:28167394

  18. A regression-based 3-D shoulder rhythm.

    PubMed

    Xu, Xu; Lin, Jia-hua; McGorry, Raymond W

    2014-03-21

    In biomechanical modeling of the shoulder, it is important to know the orientation of each bone in the shoulder girdle when estimating the loads on each musculoskeletal element. However, because of the soft tissue overlying the bones, it is difficult to accurately derive the orientation of the clavicle and scapula using surface markers during dynamic movement. The purpose of this study is to develop two regression models which predict the orientation of the clavicle and the scapula. The first regression model uses humerus orientation and individual factors such as age, gender, and anthropometry data as the predictors. The second regression model includes only the humerus orientation as the predictor. Thirty-eight participants performed 118 static postures covering the volume of the right hand reach. The orientation of the thorax, clavicle, scapula and humerus were measured with a motion tracking system. Regression analysis was performed on the Euler angles decomposed from the orientation of each bone from 26 randomly selected participants. The regression models were then validated with the remaining 12 participants. The results indicate that for the first model, the r(2) of the predicted orientation of the clavicle and the scapula ranged between 0.31 and 0.65, and the RMSE obtained from the validation dataset ranged from 6.92° to 10.39°. For the second model, the r(2) ranged between 0.19 and 0.57, and the RMSE obtained from the validation dataset ranged from 6.62° and 11.13°. The derived regression-based shoulder rhythm could be useful in future biomechanical modeling of the shoulder. Copyright © 2014 The Authors. Published by Elsevier Ltd.. All rights reserved.

  19. Multivariate Linear Regression and CART Regression Analysis of TBM Performance at Abu Hamour Phase-I Tunnel

    NASA Astrophysics Data System (ADS)

    Jakubowski, J.; Stypulkowski, J. B.; Bernardeau, F. G.

    2017-12-01

    The first phase of the Abu Hamour drainage and storm tunnel was completed in early 2017. The 9.5 km long, 3.7 m diameter tunnel was excavated with two Earth Pressure Balance (EPB) Tunnel Boring Machines from Herrenknecht. TBM operation processes were monitored and recorded by Data Acquisition and Evaluation System. The authors coupled collected TBM drive data with available information on rock mass properties, cleansed, completed with secondary variables and aggregated by weeks and shifts. Correlations and descriptive statistics charts were examined. Multivariate Linear Regression and CART regression tree models linking TBM penetration rate (PR), penetration per revolution (PPR) and field penetration index (FPI) with TBM operational and geotechnical characteristics were performed for the conditions of the weak/soft rock of Doha. Both regression methods are interpretable and the data were screened with different computational approaches allowing enriched insight. The primary goal of the analysis was to investigate empirical relations between multiple explanatory and responding variables, to search for best subsets of explanatory variables and to evaluate the strength of linear and non-linear relations. For each of the penetration indices, a predictive model coupling both regression methods was built and validated. The resultant models appeared to be stronger than constituent ones and indicated an opportunity for more accurate and robust TBM performance predictions.

  20. TWSVR: Regression via Twin Support Vector Machine.

    PubMed

    Khemchandani, Reshma; Goyal, Keshav; Chandra, Suresh

    2016-02-01

    Taking motivation from Twin Support Vector Machine (TWSVM) formulation, Peng (2010) attempted to propose Twin Support Vector Regression (TSVR) where the regressor is obtained via solving a pair of quadratic programming problems (QPPs). In this paper we argue that TSVR formulation is not in the true spirit of TWSVM. Further, taking motivation from Bi and Bennett (2003), we propose an alternative approach to find a formulation for Twin Support Vector Regression (TWSVR) which is in the true spirit of TWSVM. We show that our proposed TWSVR can be derived from TWSVM for an appropriately constructed classification problem. To check the efficacy of our proposed TWSVR we compare its performance with TSVR and classical Support Vector Regression(SVR) on various regression datasets. Copyright © 2015 Elsevier Ltd. All rights reserved.

  1. A New Sample Size Formula for Regression.

    ERIC Educational Resources Information Center

    Brooks, Gordon P.; Barcikowski, Robert S.

    The focus of this research was to determine the efficacy of a new method of selecting sample sizes for multiple linear regression. A Monte Carlo simulation was used to study both empirical predictive power rates and empirical statistical power rates of the new method and seven other methods: those of C. N. Park and A. L. Dudycha (1974); J. Cohen…

  2. Do clinical and translational science graduate students understand linear regression? Development and early validation of the REGRESS quiz.

    PubMed

    Enders, Felicity

    2013-12-01

    Although regression is widely used for reading and publishing in the medical literature, no instruments were previously available to assess students' understanding. The goal of this study was to design and assess such an instrument for graduate students in Clinical and Translational Science and Public Health. A 27-item REsearch on Global Regression Expectations in StatisticS (REGRESS) quiz was developed through an iterative process. Consenting students taking a course on linear regression in a Clinical and Translational Science program completed the quiz pre- and postcourse. Student results were compared to practicing statisticians with a master's or doctoral degree in statistics or a closely related field. Fifty-two students responded precourse, 59 postcourse , and 22 practicing statisticians completed the quiz. The mean (SD) score was 9.3 (4.3) for students precourse and 19.0 (3.5) postcourse (P < 0.001). Postcourse students had similar results to practicing statisticians (mean (SD) of 20.1(3.5); P = 0.21). Students also showed significant improvement pre/postcourse in each of six domain areas (P < 0.001). The REGRESS quiz was internally reliable (Cronbach's alpha 0.89). The initial validation is quite promising with statistically significant and meaningful differences across time and study populations. Further work is needed to validate the quiz across multiple institutions. © 2013 Wiley Periodicals, Inc.

  3. Using multiple calibration sets to improve the quantitative accuracy of partial least squares (PLS) regression on open-path fourier transform infrared (OP/FT-IR) spectra of ammonia over wide concentration ranges

    USDA-ARS?s Scientific Manuscript database

    A technique of using multiple calibration sets in partial least squares regression (PLS) was proposed to improve the quantitative determination of ammonia from open-path Fourier transform infrared spectra. The spectra were measured near animal farms, and the path-integrated concentration of ammonia...

  4. Moderation analysis using a two-level regression model.

    PubMed

    Yuan, Ke-Hai; Cheng, Ying; Maxwell, Scott

    2014-10-01

    Moderation analysis is widely used in social and behavioral research. The most commonly used model for moderation analysis is moderated multiple regression (MMR) in which the explanatory variables of the regression model include product terms, and the model is typically estimated by least squares (LS). This paper argues for a two-level regression model in which the regression coefficients of a criterion variable on predictors are further regressed on moderator variables. An algorithm for estimating the parameters of the two-level model by normal-distribution-based maximum likelihood (NML) is developed. Formulas for the standard errors (SEs) of the parameter estimates are provided and studied. Results indicate that, when heteroscedasticity exists, NML with the two-level model gives more efficient and more accurate parameter estimates than the LS analysis of the MMR model. When error variances are homoscedastic, NML with the two-level model leads to essentially the same results as LS with the MMR model. Most importantly, the two-level regression model permits estimating the percentage of variance of each regression coefficient that is due to moderator variables. When applied to data from General Social Surveys 1991, NML with the two-level model identified a significant moderation effect of race on the regression of job prestige on years of education while LS with the MMR model did not. An R package is also developed and documented to facilitate the application of the two-level model.

  5. The Multivariate Regression Statistics Strategy to Investigate Content-Effect Correlation of Multiple Components in Traditional Chinese Medicine Based on a Partial Least Squares Method.

    PubMed

    Peng, Ying; Li, Su-Ning; Pei, Xuexue; Hao, Kun

    2018-03-01

    Amultivariate regression statisticstrategy was developed to clarify multi-components content-effect correlation ofpanaxginseng saponins extract and predict the pharmacological effect by components content. In example 1, firstly, we compared pharmacological effects between panax ginseng saponins extract and individual saponin combinations. Secondly, we examined the anti-platelet aggregation effect in seven different saponin combinations of ginsenoside Rb1, Rg1, Rh, Rd, Ra3 and notoginsenoside R1. Finally, the correlation between anti-platelet aggregation and the content of multiple components was analyzed by a partial least squares algorithm. In example 2, firstly, 18 common peaks were identified in ten different batches of panax ginseng saponins extracts from different origins. Then, we investigated the anti-myocardial ischemia reperfusion injury effects of the ten different panax ginseng saponins extracts. Finally, the correlation between the fingerprints and the cardioprotective effects was analyzed by a partial least squares algorithm. Both in example 1 and 2, the relationship between the components content and pharmacological effect was modeled well by the partial least squares regression equations. Importantly, the predicted effect curve was close to the observed data of dot marked on the partial least squares regression model. This study has given evidences that themulti-component content is a promising information for predicting the pharmacological effects of traditional Chinese medicine.

  6. Genetic instrumental variable regression: Explaining socioeconomic and health outcomes in nonexperimental data

    PubMed Central

    DiPrete, Thomas A.; Burik, Casper A. P.; Koellinger, Philipp D.

    2018-01-01

    Identifying causal effects in nonexperimental data is an enduring challenge. One proposed solution that recently gained popularity is the idea to use genes as instrumental variables [i.e., Mendelian randomization (MR)]. However, this approach is problematic because many variables of interest are genetically correlated, which implies the possibility that many genes could affect both the exposure and the outcome directly or via unobserved confounding factors. Thus, pleiotropic effects of genes are themselves a source of bias in nonexperimental data that would also undermine the ability of MR to correct for endogeneity bias from nongenetic sources. Here, we propose an alternative approach, genetic instrumental variable (GIV) regression, that provides estimates for the effect of an exposure on an outcome in the presence of pleiotropy. As a valuable byproduct, GIV regression also provides accurate estimates of the chip heritability of the outcome variable. GIV regression uses polygenic scores (PGSs) for the outcome of interest which can be constructed from genome-wide association study (GWAS) results. By splitting the GWAS sample for the outcome into nonoverlapping subsamples, we obtain multiple indicators of the outcome PGSs that can be used as instruments for each other and, in combination with other methods such as sibling fixed effects, can address endogeneity bias from both pleiotropy and the environment. In two empirical applications, we demonstrate that our approach produces reasonable estimates of the chip heritability of educational attainment (EA) and show that standard regression and MR provide upwardly biased estimates of the effect of body height on EA. PMID:29686100

  7. Genetic instrumental variable regression: Explaining socioeconomic and health outcomes in nonexperimental data.

    PubMed

    DiPrete, Thomas A; Burik, Casper A P; Koellinger, Philipp D

    2018-05-29

    Identifying causal effects in nonexperimental data is an enduring challenge. One proposed solution that recently gained popularity is the idea to use genes as instrumental variables [i.e., Mendelian randomization (MR)]. However, this approach is problematic because many variables of interest are genetically correlated, which implies the possibility that many genes could affect both the exposure and the outcome directly or via unobserved confounding factors. Thus, pleiotropic effects of genes are themselves a source of bias in nonexperimental data that would also undermine the ability of MR to correct for endogeneity bias from nongenetic sources. Here, we propose an alternative approach, genetic instrumental variable (GIV) regression, that provides estimates for the effect of an exposure on an outcome in the presence of pleiotropy. As a valuable byproduct, GIV regression also provides accurate estimates of the chip heritability of the outcome variable. GIV regression uses polygenic scores (PGSs) for the outcome of interest which can be constructed from genome-wide association study (GWAS) results. By splitting the GWAS sample for the outcome into nonoverlapping subsamples, we obtain multiple indicators of the outcome PGSs that can be used as instruments for each other and, in combination with other methods such as sibling fixed effects, can address endogeneity bias from both pleiotropy and the environment. In two empirical applications, we demonstrate that our approach produces reasonable estimates of the chip heritability of educational attainment (EA) and show that standard regression and MR provide upwardly biased estimates of the effect of body height on EA. Copyright © 2018 the Author(s). Published by PNAS.

  8. Clifford support vector machines for classification, regression, and recurrence.

    PubMed

    Bayro-Corrochano, Eduardo Jose; Arana-Daniel, Nancy

    2010-11-01

    This paper introduces the Clifford support vector machines (CSVM) as a generalization of the real and complex-valued support vector machines using the Clifford geometric algebra. In this framework, we handle the design of kernels involving the Clifford or geometric product. In this approach, one redefines the optimization variables as multivectors. This allows us to have a multivector as output. Therefore, we can represent multiple classes according to the dimension of the geometric algebra in which we work. We show that one can apply CSVM for classification and regression and also to build a recurrent CSVM. The CSVM is an attractive approach for the multiple input multiple output processing of high-dimensional geometric entities. We carried out comparisons between CSVM and the current approaches to solve multiclass classification and regression. We also study the performance of the recurrent CSVM with experiments involving time series. The authors believe that this paper can be of great use for researchers and practitioners interested in multiclass hypercomplex computing, particularly for applications in complex and quaternion signal and image processing, satellite control, neurocomputation, pattern recognition, computer vision, augmented virtual reality, robotics, and humanoids.

  9. Time Series Analysis of Soil Radon Data Using Multiple Linear Regression and Artificial Neural Network in Seismic Precursory Studies

    NASA Astrophysics Data System (ADS)

    Singh, S.; Jaishi, H. P.; Tiwari, R. P.; Tiwari, R. C.

    2017-07-01

    This paper reports the analysis of soil radon data recorded in the seismic zone-V, located in the northeastern part of India (latitude 23.73N, longitude 92.73E). Continuous measurements of soil-gas emission along Chite fault in Mizoram (India) were carried out with the replacement of solid-state nuclear track detectors at weekly interval. The present study was done for the period from March 2013 to May 2015 using LR-115 Type II detectors, manufactured by Kodak Pathe, France. In order to reduce the influence of meteorological parameters, statistical analysis tools such as multiple linear regression and artificial neural network have been used. Decrease in radon concentration was recorded prior to some earthquakes that occurred during the observation period. Some false anomalies were also recorded which may be attributed to the ongoing crustal deformation which was not major enough to produce an earthquake.

  10. Examination of Parameters Affecting the House Prices by Multiple Regression Analysis and its Contributions to Earthquake-Based Urban Transformation

    NASA Astrophysics Data System (ADS)

    Denli, H. H.; Durmus, B.

    2016-12-01

    The purpose of this study is to examine the factors which may affect the apartment prices with multiple linear regression analysis models and visualize the results by value maps. The study is focused on a county of Istanbul - Turkey. Totally 390 apartments around the county Umraniye are evaluated due to their physical and locational conditions. The identification of factors affecting the price of apartments in the county with a population of approximately 600k is expected to provide a significant contribution to the apartment market.Physical factors are selected as the age, number of rooms, size, floor numbers of the building and the floor that the apartment is positioned in. Positional factors are selected as the distances to the nearest hospital, school, park and police station. Totally ten physical and locational parameters are examined by regression analysis.After the regression analysis has been performed, value maps are composed from the parameters age, price and price per square meters. The most significant of the composed maps is the price per square meters map. Results show that the location of the apartment has the most influence to the square meter price information of the apartment. A different practice is developed from the composed maps by searching the ability of using price per square meters map in urban transformation practices. By marking the buildings older than 15 years in the price per square meters map, a different and new interpretation has been made to determine the buildings, to which should be given priority during an urban transformation in the county.This county is very close to the North Anatolian Fault zone and is under the threat of earthquakes. By marking the apartments older than 15 years on the price per square meters map, both older and expensive square meters apartments list can be gathered. By the help of this list, the priority could be given to the selected higher valued old apartments to support the economy of the country

  11. A Quantile Regression Approach to Understanding the Relations Between Morphological Awareness, Vocabulary, and Reading Comprehension in Adult Basic Education Students

    PubMed Central

    Tighe, Elizabeth L.; Schatschneider, Christopher

    2015-01-01

    The purpose of this study was to investigate the joint and unique contributions of morphological awareness and vocabulary knowledge at five reading comprehension levels in Adult Basic Education (ABE) students. We introduce the statistical technique of multiple quantile regression, which enabled us to assess the predictive utility of morphological awareness and vocabulary knowledge at multiple points (quantiles) along the continuous distribution of reading comprehension. To demonstrate the efficacy of our multiple quantile regression analysis, we compared and contrasted our results with a traditional multiple regression analytic approach. Our results indicated that morphological awareness and vocabulary knowledge accounted for a large portion of the variance (82-95%) in reading comprehension skills across all quantiles. Morphological awareness exhibited the greatest unique predictive ability at lower levels of reading comprehension whereas vocabulary knowledge exhibited the greatest unique predictive ability at higher levels of reading comprehension. These results indicate the utility of using multiple quantile regression to assess trajectories of component skills across multiple levels of reading comprehension. The implications of our findings for ABE programs are discussed. PMID:25351773

  12. A Quantile Regression Approach to Understanding the Relations Among Morphological Awareness, Vocabulary, and Reading Comprehension in Adult Basic Education Students.

    PubMed

    Tighe, Elizabeth L; Schatschneider, Christopher

    2016-07-01

    The purpose of this study was to investigate the joint and unique contributions of morphological awareness and vocabulary knowledge at five reading comprehension levels in adult basic education (ABE) students. We introduce the statistical technique of multiple quantile regression, which enabled us to assess the predictive utility of morphological awareness and vocabulary knowledge at multiple points (quantiles) along the continuous distribution of reading comprehension. To demonstrate the efficacy of our multiple quantile regression analysis, we compared and contrasted our results with a traditional multiple regression analytic approach. Our results indicated that morphological awareness and vocabulary knowledge accounted for a large portion of the variance (82%-95%) in reading comprehension skills across all quantiles. Morphological awareness exhibited the greatest unique predictive ability at lower levels of reading comprehension whereas vocabulary knowledge exhibited the greatest unique predictive ability at higher levels of reading comprehension. These results indicate the utility of using multiple quantile regression to assess trajectories of component skills across multiple levels of reading comprehension. The implications of our findings for ABE programs are discussed. © Hammill Institute on Disabilities 2014.

  13. Sagittal and Vertical Craniofacial Growth Pattern and Timing of Circumpubertal Skeletal Maturation: A Multiple Regression Study

    PubMed Central

    Rosso, Luigi; Riatti, Riccardo

    2016-01-01

    The knowledge of the associations between the timing of skeletal maturation and craniofacial growth is of primary importance when planning a functional treatment for most of the skeletal malocclusions. This cross-sectional study was thus aimed at evaluating whether sagittal and vertical craniofacial growth has an association with the timing of circumpubertal skeletal maturation. A total of 320 subjects (160 females and 160 males) were included in the study (mean age, 12.3 ± 1.7 years; range, 7.6–16.7 years). These subjects were equally distributed in the circumpubertal cervical vertebral maturation (CVM) stages 2 to 5. Each CVM stage group also had equal number of females and males. Multiple regression models were run for each CVM stage group to assess the significance of the association of cephalometric parameters (ANB, SN/MP, and NSBa angles) with age of attainment of the corresponding CVM stage (in months). Significant associations were seen only for stage 3, where the SN/MP angle was negatively associated with age (β coefficient, −0.7). These results show that hyperdivergent and hypodivergent subjects may have an anticipated and delayed attainment of the pubertal CVM stage 3, respectively. However, such association remains of little entity and it would become clinically relevant only in extreme cases. PMID:27995136

  14. Sagittal and Vertical Craniofacial Growth Pattern and Timing of Circumpubertal Skeletal Maturation: A Multiple Regression Study.

    PubMed

    Perinetti, Giuseppe; Rosso, Luigi; Riatti, Riccardo; Contardo, Luca

    2016-01-01

    The knowledge of the associations between the timing of skeletal maturation and craniofacial growth is of primary importance when planning a functional treatment for most of the skeletal malocclusions. This cross-sectional study was thus aimed at evaluating whether sagittal and vertical craniofacial growth has an association with the timing of circumpubertal skeletal maturation. A total of 320 subjects (160 females and 160 males) were included in the study (mean age, 12.3 ± 1.7 years; range, 7.6-16.7 years). These subjects were equally distributed in the circumpubertal cervical vertebral maturation (CVM) stages 2 to 5. Each CVM stage group also had equal number of females and males. Multiple regression models were run for each CVM stage group to assess the significance of the association of cephalometric parameters (ANB, SN/MP, and NSBa angles) with age of attainment of the corresponding CVM stage (in months). Significant associations were seen only for stage 3, where the SN/MP angle was negatively associated with age (β coefficient, -0.7). These results show that hyperdivergent and hypodivergent subjects may have an anticipated and delayed attainment of the pubertal CVM stage 3, respectively. However, such association remains of little entity and it would become clinically relevant only in extreme cases.

  15. Deep ensemble learning of sparse regression models for brain disease diagnosis.

    PubMed

    Suk, Heung-Il; Lee, Seong-Whan; Shen, Dinggang

    2017-04-01

    Recent studies on brain imaging analysis witnessed the core roles of machine learning techniques in computer-assisted intervention for brain disease diagnosis. Of various machine-learning techniques, sparse regression models have proved their effectiveness in handling high-dimensional data but with a small number of training samples, especially in medical problems. In the meantime, deep learning methods have been making great successes by outperforming the state-of-the-art performances in various applications. In this paper, we propose a novel framework that combines the two conceptually different methods of sparse regression and deep learning for Alzheimer's disease/mild cognitive impairment diagnosis and prognosis. Specifically, we first train multiple sparse regression models, each of which is trained with different values of a regularization control parameter. Thus, our multiple sparse regression models potentially select different feature subsets from the original feature set; thereby they have different powers to predict the response values, i.e., clinical label and clinical scores in our work. By regarding the response values from our sparse regression models as target-level representations, we then build a deep convolutional neural network for clinical decision making, which thus we call 'Deep Ensemble Sparse Regression Network.' To our best knowledge, this is the first work that combines sparse regression models with deep neural network. In our experiments with the ADNI cohort, we validated the effectiveness of the proposed method by achieving the highest diagnostic accuracies in three classification tasks. We also rigorously analyzed our results and compared with the previous studies on the ADNI cohort in the literature. Copyright © 2017 Elsevier B.V. All rights reserved.

  16. Default Bayes Factors for Model Selection in Regression

    ERIC Educational Resources Information Center

    Rouder, Jeffrey N.; Morey, Richard D.

    2012-01-01

    In this article, we present a Bayes factor solution for inference in multiple regression. Bayes factors are principled measures of the relative evidence from data for various models or positions, including models that embed null hypotheses. In this regard, they may be used to state positive evidence for a lack of an effect, which is not possible…

  17. Regression analysis for LED color detection of visual-MIMO system

    NASA Astrophysics Data System (ADS)

    Banik, Partha Pratim; Saha, Rappy; Kim, Ki-Doo

    2018-04-01

    Color detection from a light emitting diode (LED) array using a smartphone camera is very difficult in a visual multiple-input multiple-output (visual-MIMO) system. In this paper, we propose a method to determine the LED color using a smartphone camera by applying regression analysis. We employ a multivariate regression model to identify the LED color. After taking a picture of an LED array, we select the LED array region, and detect the LED using an image processing algorithm. We then apply the k-means clustering algorithm to determine the number of potential colors for feature extraction of each LED. Finally, we apply the multivariate regression model to predict the color of the transmitted LEDs. In this paper, we show our results for three types of environmental light condition: room environmental light, low environmental light (560 lux), and strong environmental light (2450 lux). We compare the results of our proposed algorithm from the analysis of training and test R-Square (%) values, percentage of closeness of transmitted and predicted colors, and we also mention about the number of distorted test data points from the analysis of distortion bar graph in CIE1931 color space.

  18. A parallel implementation of the network identification by multiple regression (NIR) algorithm to reverse-engineer regulatory gene networks.

    PubMed

    Gregoretti, Francesco; Belcastro, Vincenzo; di Bernardo, Diego; Oliva, Gennaro

    2010-04-21

    The reverse engineering of gene regulatory networks using gene expression profile data has become crucial to gain novel biological knowledge. Large amounts of data that need to be analyzed are currently being produced due to advances in microarray technologies. Using current reverse engineering algorithms to analyze large data sets can be very computational-intensive. These emerging computational requirements can be met using parallel computing techniques. It has been shown that the Network Identification by multiple Regression (NIR) algorithm performs better than the other ready-to-use reverse engineering software. However it cannot be used with large networks with thousands of nodes--as is the case in biological networks--due to the high time and space complexity. In this work we overcome this limitation by designing and developing a parallel version of the NIR algorithm. The new implementation of the algorithm reaches a very good accuracy even for large gene networks, improving our understanding of the gene regulatory networks that is crucial for a wide range of biomedical applications.

  19. Analyzing hospitalization data: potential limitations of Poisson regression.

    PubMed

    Weaver, Colin G; Ravani, Pietro; Oliver, Matthew J; Austin, Peter C; Quinn, Robert R

    2015-08-01

    Poisson regression is commonly used to analyze hospitalization data when outcomes are expressed as counts (e.g. number of days in hospital). However, data often violate the assumptions on which Poisson regression is based. More appropriate extensions of this model, while available, are rarely used. We compared hospitalization data between 206 patients treated with hemodialysis (HD) and 107 treated with peritoneal dialysis (PD) using Poisson regression and compared results from standard Poisson regression with those obtained using three other approaches for modeling count data: negative binomial (NB) regression, zero-inflated Poisson (ZIP) regression and zero-inflated negative binomial (ZINB) regression. We examined the appropriateness of each model and compared the results obtained with each approach. During a mean 1.9 years of follow-up, 183 of 313 patients (58%) were never hospitalized (indicating an excess of 'zeros'). The data also displayed overdispersion (variance greater than mean), violating another assumption of the Poisson model. Using four criteria, we determined that the NB and ZINB models performed best. According to these two models, patients treated with HD experienced similar hospitalization rates as those receiving PD {NB rate ratio (RR): 1.04 [bootstrapped 95% confidence interval (CI): 0.49-2.20]; ZINB summary RR: 1.21 (bootstrapped 95% CI 0.60-2.46)}. Poisson and ZIP models fit the data poorly and had much larger point estimates than the NB and ZINB models [Poisson RR: 1.93 (bootstrapped 95% CI 0.88-4.23); ZIP summary RR: 1.84 (bootstrapped 95% CI 0.88-3.84)]. We found substantially different results when modeling hospitalization data, depending on the approach used. Our results argue strongly for a sound model selection process and improved reporting around statistical methods used for modeling count data. © The Author 2015. Published by Oxford University Press on behalf of ERA-EDTA. All rights reserved.

  20. Detecting outliers when fitting data with nonlinear regression – a new method based on robust nonlinear regression and the false discovery rate

    PubMed Central

    Motulsky, Harvey J; Brown, Ronald E

    2006-01-01

    Background Nonlinear regression, like linear regression, assumes that the scatter of data around the ideal curve follows a Gaussian or normal distribution. This assumption leads to the familiar goal of regression: to minimize the sum of the squares of the vertical or Y-value distances between the points and the curve. Outliers can dominate the sum-of-the-squares calculation, and lead to misleading results. However, we know of no practical method for routinely identifying outliers when fitting curves with nonlinear regression. Results We describe a new method for identifying outliers when fitting data with nonlinear regression. We first fit the data using a robust form of nonlinear regression, based on the assumption that scatter follows a Lorentzian distribution. We devised a new adaptive method that gradually becomes more robust as the method proceeds. To define outliers, we adapted the false discovery rate approach to handling multiple comparisons. We then remove the outliers, and analyze the data using ordinary least-squares regression. Because the method combines robust regression and outlier removal, we call it the ROUT method. When analyzing simulated data, where all scatter is Gaussian, our method detects (falsely) one or more outlier in only about 1–3% of experiments. When analyzing data contaminated with one or several outliers, the ROUT method performs well at outlier identification, with an average False Discovery Rate less than 1%. Conclusion Our method, which combines a new method of robust nonlinear regression with a new method of outlier identification, identifies outliers from nonlinear curve fits with reasonable power and few false positives. PMID:16526949

  1. Geographically weighted regression model on poverty indicator

    NASA Astrophysics Data System (ADS)

    Slamet, I.; Nugroho, N. F. T. A.; Muslich

    2017-12-01

    In this research, we applied geographically weighted regression (GWR) for analyzing the poverty in Central Java. We consider Gaussian Kernel as weighted function. The GWR uses the diagonal matrix resulted from calculating kernel Gaussian function as a weighted function in the regression model. The kernel weights is used to handle spatial effects on the data so that a model can be obtained for each location. The purpose of this paper is to model of poverty percentage data in Central Java province using GWR with Gaussian kernel weighted function and to determine the influencing factors in each regency/city in Central Java province. Based on the research, we obtained geographically weighted regression model with Gaussian kernel weighted function on poverty percentage data in Central Java province. We found that percentage of population working as farmers, population growth rate, percentage of households with regular sanitation, and BPJS beneficiaries are the variables that affect the percentage of poverty in Central Java province. In this research, we found the determination coefficient R2 are 68.64%. There are two categories of district which are influenced by different of significance factors.

  2. Heritability Across the Distribution: An Application of Quantile Regression

    PubMed Central

    Petrill, Stephen A.; Hart, Sara A.; Schatschneider, Christopher; Thompson, Lee A.; Deater-Deckard, Kirby; DeThorne, Laura S.; Bartlett, Christopher

    2016-01-01

    We introduce a new method for analyzing twin data called quantile regression. Through the application presented here, quantile regression is able to assess the genetic and environmental etiology of any skill or ability, at multiple points in the distribution of that skill or ability. This method is compared to the Cherny et al. (Behav Genet 22:153–162, 1992) method in an application to four different reading-related outcomes in 304 pairs of first-grade same sex twins enrolled in the Western Reserve Reading Project. Findings across the two methods were similar; both indicated some variation across the distribution of the genetic and shared environmental influences on non-word reading. However, quantile regression provides more details about the location and size of the measured effect. Applications of the technique are discussed. PMID:21877231

  3. High-Risk Obtainment of Prescription Drugs by Older Adults in New Jersey: The Role of Prescription Opioids.

    PubMed

    Gold, Sarah L; Powell, Kristen Gilmore; Eversman, Michael H; Peterson, N Andrew; Borys, Suzanne; Hallcom, Donald K

    2016-10-01

    To explore the high-risk ways in which older adults obtain prescription opioids and to identify predictors of obtaining prescription opioids from high-risk sources, such as obtaining the same drug from multiple doctors, sharing drugs, and stealing prescription pads. Logistic regression analyses of cross-sectional survey data from the New Jersey Older Adult Survey on Drug Use and Health, a representative random-sample survey. Adults aged 60 and older (N = 725). Items such as obtaining prescriptions for the same drug from more than one doctor and stealing prescription drugs were measured to determine high-risk obtainment of prescription opioids. Almost 15% of the sample used high-risk methods of obtaining prescription opioids. Adults who previously used a prescription opioid recreationally had three times the risk of high-risk obtainment of prescription opioids. These findings illustrate the importance of strengthening prescription drug monitoring programs to reduce high-risk use of prescription drugs in older adults by alerting doctors and pharmacists to potential prescription drug misuse and interactions. © 2016, Copyright the Authors Journal compilation © 2016, The American Geriatrics Society.

  4. The effect of different distance measures in detecting outliers using clustering-based algorithm for circular regression model

    NASA Astrophysics Data System (ADS)

    Di, Nur Faraidah Muhammad; Satari, Siti Zanariah

    2017-05-01

    Outlier detection in linear data sets has been done vigorously but only a small amount of work has been done for outlier detection in circular data. In this study, we proposed multiple outliers detection in circular regression models based on the clustering algorithm. Clustering technique basically utilizes distance measure to define distance between various data points. Here, we introduce the similarity distance based on Euclidean distance for circular model and obtain a cluster tree using the single linkage clustering algorithm. Then, a stopping rule for the cluster tree based on the mean direction and circular standard deviation of the tree height is proposed. We classify the cluster group that exceeds the stopping rule as potential outlier. Our aim is to demonstrate the effectiveness of proposed algorithms with the similarity distances in detecting the outliers. It is found that the proposed methods are performed well and applicable for circular regression model.

  5. Statistical power analyses using G*Power 3.1: tests for correlation and regression analyses.

    PubMed

    Faul, Franz; Erdfelder, Edgar; Buchner, Axel; Lang, Albert-Georg

    2009-11-01

    G*Power is a free power analysis program for a variety of statistical tests. We present extensions and improvements of the version introduced by Faul, Erdfelder, Lang, and Buchner (2007) in the domain of correlation and regression analyses. In the new version, we have added procedures to analyze the power of tests based on (1) single-sample tetrachoric correlations, (2) comparisons of dependent correlations, (3) bivariate linear regression, (4) multiple linear regression based on the random predictor model, (5) logistic regression, and (6) Poisson regression. We describe these new features and provide a brief introduction to their scope and handling.

  6. Identifying maternal and infant factors associated with newborn size in rural Bangladesh by partial least squares (PLS) regression analysis

    PubMed Central

    Rahman, Md. Jahanur; Shamim, Abu Ahmed; Klemm, Rolf D. W.; Labrique, Alain B.; Rashid, Mahbubur; Christian, Parul; West, Keith P.

    2017-01-01

    Birth weight, length and circumferences of the head, chest and arm are key measures of newborn size and health in developing countries. We assessed maternal socio-demographic factors associated with multiple measures of newborn size in a large rural population in Bangladesh using partial least squares (PLS) regression method. PLS regression, combining features from principal component analysis and multiple linear regression, is a multivariate technique with an ability to handle multicollinearity while simultaneously handling multiple dependent variables. We analyzed maternal and infant data from singletons (n = 14,506) born during a double-masked, cluster-randomized, placebo-controlled maternal vitamin A or β-carotene supplementation trial in rural northwest Bangladesh. PLS regression results identified numerous maternal factors (parity, age, early pregnancy MUAC, living standard index, years of education, number of antenatal care visits, preterm delivery and infant sex) significantly (p<0.001) associated with newborn size. Among them, preterm delivery had the largest negative influence on newborn size (Standardized β = -0.29 − -0.19; p<0.001). Scatter plots of the scores of first two PLS components also revealed an interaction between newborn sex and preterm delivery on birth size. PLS regression was found to be more parsimonious than both ordinary least squares regression and principal component regression. It also provided more stable estimates than the ordinary least squares regression and provided the effect measure of the covariates with greater accuracy as it accounts for the correlation among the covariates and outcomes. Therefore, PLS regression is recommended when either there are multiple outcome measurements in the same study, or the covariates are correlated, or both situations exist in a dataset. PMID:29261760

  7. Identifying maternal and infant factors associated with newborn size in rural Bangladesh by partial least squares (PLS) regression analysis.

    PubMed

    Kabir, Alamgir; Rahman, Md Jahanur; Shamim, Abu Ahmed; Klemm, Rolf D W; Labrique, Alain B; Rashid, Mahbubur; Christian, Parul; West, Keith P

    2017-01-01

    Birth weight, length and circumferences of the head, chest and arm are key measures of newborn size and health in developing countries. We assessed maternal socio-demographic factors associated with multiple measures of newborn size in a large rural population in Bangladesh using partial least squares (PLS) regression method. PLS regression, combining features from principal component analysis and multiple linear regression, is a multivariate technique with an ability to handle multicollinearity while simultaneously handling multiple dependent variables. We analyzed maternal and infant data from singletons (n = 14,506) born during a double-masked, cluster-randomized, placebo-controlled maternal vitamin A or β-carotene supplementation trial in rural northwest Bangladesh. PLS regression results identified numerous maternal factors (parity, age, early pregnancy MUAC, living standard index, years of education, number of antenatal care visits, preterm delivery and infant sex) significantly (p<0.001) associated with newborn size. Among them, preterm delivery had the largest negative influence on newborn size (Standardized β = -0.29 - -0.19; p<0.001). Scatter plots of the scores of first two PLS components also revealed an interaction between newborn sex and preterm delivery on birth size. PLS regression was found to be more parsimonious than both ordinary least squares regression and principal component regression. It also provided more stable estimates than the ordinary least squares regression and provided the effect measure of the covariates with greater accuracy as it accounts for the correlation among the covariates and outcomes. Therefore, PLS regression is recommended when either there are multiple outcome measurements in the same study, or the covariates are correlated, or both situations exist in a dataset.

  8. Locally linear regression for pose-invariant face recognition.

    PubMed

    Chai, Xiujuan; Shan, Shiguang; Chen, Xilin; Gao, Wen

    2007-07-01

    The variation of facial appearance due to the viewpoint (/pose) degrades face recognition systems considerably, which is one of the bottlenecks in face recognition. One of the possible solutions is generating virtual frontal view from any given nonfrontal view to obtain a virtual gallery/probe face. Following this idea, this paper proposes a simple, but efficient, novel locally linear regression (LLR) method, which generates the virtual frontal view from a given nonfrontal face image. We first justify the basic assumption of the paper that there exists an approximate linear mapping between a nonfrontal face image and its frontal counterpart. Then, by formulating the estimation of the linear mapping as a prediction problem, we present the regression-based solution, i.e., globally linear regression. To improve the prediction accuracy in the case of coarse alignment, LLR is further proposed. In LLR, we first perform dense sampling in the nonfrontal face image to obtain many overlapped local patches. Then, the linear regression technique is applied to each small patch for the prediction of its virtual frontal patch. Through the combination of all these patches, the virtual frontal view is generated. The experimental results on the CMU PIE database show distinct advantage of the proposed method over Eigen light-field method.

  9. Regression: The Apple Does Not Fall Far From the Tree.

    PubMed

    Vetter, Thomas R; Schober, Patrick

    2018-05-15

    Researchers and clinicians are frequently interested in either: (1) assessing whether there is a relationship or association between 2 or more variables and quantifying this association; or (2) determining whether 1 or more variables can predict another variable. The strength of such an association is mainly described by the correlation. However, regression analysis and regression models can be used not only to identify whether there is a significant relationship or association between variables but also to generate estimations of such a predictive relationship between variables. This basic statistical tutorial discusses the fundamental concepts and techniques related to the most common types of regression analysis and modeling, including simple linear regression, multiple regression, logistic regression, ordinal regression, and Poisson regression, as well as the common yet often underrecognized phenomenon of regression toward the mean. The various types of regression analysis are powerful statistical techniques, which when appropriately applied, can allow for the valid interpretation of complex, multifactorial data. Regression analysis and models can assess whether there is a relationship or association between 2 or more observed variables and estimate the strength of this association, as well as determine whether 1 or more variables can predict another variable. Regression is thus being applied more commonly in anesthesia, perioperative, critical care, and pain research. However, it is crucial to note that regression can identify plausible risk factors; it does not prove causation (a definitive cause and effect relationship). The results of a regression analysis instead identify independent (predictor) variable(s) associated with the dependent (outcome) variable. As with other statistical methods, applying regression requires that certain assumptions be met, which can be tested with specific diagnostics.

  10. Performance of an Axisymmetric Rocket Based Combined Cycle Engine During Rocket Only Operation Using Linear Regression Analysis

    NASA Technical Reports Server (NTRS)

    Smith, Timothy D.; Steffen, Christopher J., Jr.; Yungster, Shaye; Keller, Dennis J.

    1998-01-01

    The all rocket mode of operation is shown to be a critical factor in the overall performance of a rocket based combined cycle (RBCC) vehicle. An axisymmetric RBCC engine was used to determine specific impulse efficiency values based upon both full flow and gas generator configurations. Design of experiments methodology was used to construct a test matrix and multiple linear regression analysis was used to build parametric models. The main parameters investigated in this study were: rocket chamber pressure, rocket exit area ratio, injected secondary flow, mixer-ejector inlet area, mixer-ejector area ratio, and mixer-ejector length-to-inlet diameter ratio. A perfect gas computational fluid dynamics analysis, using both the Spalart-Allmaras and k-omega turbulence models, was performed with the NPARC code to obtain values of vacuum specific impulse. Results from the multiple linear regression analysis showed that for both the full flow and gas generator configurations increasing mixer-ejector area ratio and rocket area ratio increase performance, while increasing mixer-ejector inlet area ratio and mixer-ejector length-to-diameter ratio decrease performance. Increasing injected secondary flow increased performance for the gas generator analysis, but was not statistically significant for the full flow analysis. Chamber pressure was found to be not statistically significant.

  11. Stolon regression

    PubMed Central

    Cherry Vogt, Kimberly S

    2008-01-01

    Many colonial organisms encrust surfaces with feeding and reproductive polyps connected by vascular stolons. Such colonies often show a dichotomy between runner-like forms, with widely spaced polyps and long stolon connections, and sheet-like forms, with closely spaced polyps and short stolon connections. Generative processes, such as rates of polyp initiation relative to rates of stolon elongation, are typically thought to underlie this dichotomy. Regressive processes, such as tissue regression and cell death, may also be relevant. In this context, we have recently characterized the process of stolon regression in a colonial cnidarian, Podocoryna carnea. Stolon regression occurs naturally in these colonies. To characterize this process in detail, high levels of stolon regression were induced in experimental colonies by treatment with reactive oxygen and reactive nitrogen species (ROS and RNS). Either treatment results in stolon regression and is accompanied by high levels of endogenous ROS and RNS as well as morphological indications of cell death in the regressing stolon. The initiating step in regression appears to be a perturbation of normal colony-wide gastrovascular flow. This suggests more general connections between stolon regression and a wide variety of environmental effects. Here we summarize our results and further discuss such connections. PMID:19704785

  12. GLOBALLY ADAPTIVE QUANTILE REGRESSION WITH ULTRA-HIGH DIMENSIONAL DATA

    PubMed Central

    Zheng, Qi; Peng, Limin; He, Xuming

    2015-01-01

    Quantile regression has become a valuable tool to analyze heterogeneous covaraite-response associations that are often encountered in practice. The development of quantile regression methodology for high dimensional covariates primarily focuses on examination of model sparsity at a single or multiple quantile levels, which are typically prespecified ad hoc by the users. The resulting models may be sensitive to the specific choices of the quantile levels, leading to difficulties in interpretation and erosion of confidence in the results. In this article, we propose a new penalization framework for quantile regression in the high dimensional setting. We employ adaptive L1 penalties, and more importantly, propose a uniform selector of the tuning parameter for a set of quantile levels to avoid some of the potential problems with model selection at individual quantile levels. Our proposed approach achieves consistent shrinkage of regression quantile estimates across a continuous range of quantiles levels, enhancing the flexibility and robustness of the existing penalized quantile regression methods. Our theoretical results include the oracle rate of uniform convergence and weak convergence of the parameter estimators. We also use numerical studies to confirm our theoretical findings and illustrate the practical utility of our proposal. PMID:26604424

  13. Estimating soil temperature using neighboring station data via multi-nonlinear regression and artificial neural network models.

    PubMed

    Bilgili, Mehmet; Sahin, Besir; Sangun, Levent

    2013-01-01

    The aim of this study is to estimate the soil temperatures of a target station using only the soil temperatures of neighboring stations without any consideration of the other variables or parameters related to soil properties. For this aim, the soil temperatures were measured at depths of 5, 10, 20, 50, and 100 cm below the earth surface at eight measuring stations in Turkey. Firstly, the multiple nonlinear regression analysis was performed with the "Enter" method to determine the relationship between the values of target station and neighboring stations. Then, the stepwise regression analysis was applied to determine the best independent variables. Finally, an artificial neural network (ANN) model was developed to estimate the soil temperature of a target station. According to the derived results for the training data set, the mean absolute percentage error and correlation coefficient ranged from 1.45% to 3.11% and from 0.9979 to 0.9986, respectively, while corresponding ranges of 1.685-3.65% and 0.9988-0.9991, respectively, were obtained based on the testing data set. The obtained results show that the developed ANN model provides a simple and accurate prediction to determine the soil temperature. In addition, the missing data at the target station could be determined within a high degree of accuracy.

  14. Estimating the exceedance probability of rain rate by logistic regression

    NASA Technical Reports Server (NTRS)

    Chiu, Long S.; Kedem, Benjamin

    1990-01-01

    Recent studies have shown that the fraction of an area with rain intensity above a fixed threshold is highly correlated with the area-averaged rain rate. To estimate the fractional rainy area, a logistic regression model, which estimates the conditional probability that rain rate over an area exceeds a fixed threshold given the values of related covariates, is developed. The problem of dependency in the data in the estimation procedure is bypassed by the method of partial likelihood. Analyses of simulated scanning multichannel microwave radiometer and observed electrically scanning microwave radiometer data during the Global Atlantic Tropical Experiment period show that the use of logistic regression in pixel classification is superior to multiple regression in predicting whether rain rate at each pixel exceeds a given threshold, even in the presence of noisy data. The potential of the logistic regression technique in satellite rain rate estimation is discussed.

  15. Multiple Regression Analysis to Estimate the Unit Price of Hanwoo (Bos taurus coreanae) Beef

    PubMed Central

    Hur, Sun-Jin

    2017-01-01

    This study were estimated the contribution of carcass traits to unit price, to analyze the marbling score as a categorical variable rather than a numerical variable, and to develop an optimal model that also includes the holiday effect and the raising period. The data for this study were acquired from the Quality Evaluation of the Korea Institute for Animal Products, and consisted of the trading records of 1,613,699 heads at 12 wholesale markets from 2010 to 2014. The unit price of a cow was estimated from the following parameters: −52.50 Won/mm, 8.93 Won/cm2, 7.20 Won/kg, and −1.04 Won/day for backfat thickness, eye muscle area, carcass weight, and raising period, respectively. Parameters for the dummy variables of marbling scores varied from 0 to 8328.74 Won/kg, which means that each marbling score grade had a different price value. The unit price of a steer was estimated from the following parameters: −92.12 Won/mm, 20.22 Won/cm2, 1.30 Won/kg, and −1.72 Won/day for backfat thickness, eye muscle area, carcass weights, and raising period, respectively. Parameters for dummy variables of marbling scores varied from 0 to 7338.80 Won/kg, which means that the grades of each marbling score had different price values. The unit price of sales during traditional holidays was significantly higher (827.71 Won/kg for cows, and 645.15 Won/kg for steers) than during non-holidays.We conclude that the use of categorical values for marbling scores would be needed to evaluate the price of Hanwoo beef using multiple regression analysis based on carcass traits and environmental factors. PMID:29147089

  16. Comparison of Adaline and Multiple Linear Regression Methods for Rainfall Forecasting

    NASA Astrophysics Data System (ADS)

    Sutawinaya, IP; Astawa, INGA; Hariyanti, NKD

    2018-01-01

    Heavy rainfall can cause disaster, therefore need a forecast to predict rainfall intensity. Main factor that cause flooding is there is a high rainfall intensity and it makes the river become overcapacity. This will cause flooding around the area. Rainfall factor is a dynamic factor, so rainfall is very interesting to be studied. In order to support the rainfall forecasting, there are methods that can be used from Artificial Intelligence (AI) to statistic. In this research, we used Adaline for AI method and Regression for statistic method. The more accurate forecast result shows the method that used is good for forecasting the rainfall. Through those methods, we expected which is the best method for rainfall forecasting here.

  17. Regression: A Bibliography.

    ERIC Educational Resources Information Center

    Pedrini, D. T.; Pedrini, Bonnie C.

    Regression, another mechanism studied by Sigmund Freud, has had much research, e.g., hypnotic regression, frustration regression, schizophrenic regression, and infra-human-animal regression (often directly related to fixation). Many investigators worked with hypnotic age regression, which has a long history, going back to Russian reflexologists.…

  18. Generalized and synthetic regression estimators for randomized branch sampling

    Treesearch

    David L. R. Affleck; Timothy G. Gregoire

    2015-01-01

    In felled-tree studies, ratio and regression estimators are commonly used to convert more readily measured branch characteristics to dry crown mass estimates. In some cases, data from multiple trees are pooled to form these estimates. This research evaluates the utility of both tactics in the estimation of crown biomass following randomized branch sampling (...

  19. A Generalized Logistic Regression Procedure to Detect Differential Item Functioning among Multiple Groups

    ERIC Educational Resources Information Center

    Magis, David; Raiche, Gilles; Beland, Sebastien; Gerard, Paul

    2011-01-01

    We present an extension of the logistic regression procedure to identify dichotomous differential item functioning (DIF) in the presence of more than two groups of respondents. Starting from the usual framework of a single focal group, we propose a general approach to estimate the item response functions in each group and to test for the presence…

  20. A regression approach to the mapping of bio-physical characteristics of surface sediment using in situ and airborne hyperspectral acquisitions

    NASA Astrophysics Data System (ADS)

    Ibrahim, Elsy; Kim, Wonkook; Crawford, Melba; Monbaliu, Jaak

    2017-02-01

    Remote sensing has been successfully utilized to distinguish and quantify sediment properties in the intertidal environment. Classification approaches of imagery are popular and powerful yet can lead to site- and case-specific results. Such specificity creates challenges for temporal studies. Thus, this paper investigates the use of regression models to quantify sediment properties instead of classifying them. Two regression approaches, namely multiple regression (MR) and support vector regression (SVR), are used in this study for the retrieval of bio-physical variables of intertidal surface sediment of the IJzermonding, a Belgian nature reserve. In the regression analysis, mud content, chlorophyll a concentration, organic matter content, and soil moisture are estimated using radiometric variables of two airborne sensors, namely airborne hyperspectral sensor (AHS) and airborne prism experiment (APEX) and and using field hyperspectral acquisitions by analytical spectral device (ASD). The performance of the two regression approaches is best for the estimation of moisture content. SVR attains the highest accuracy without feature reduction while MR achieves good results when feature reduction is carried out. Sediment property maps are successfully obtained using the models and hyperspectral imagery where SVR used with all bands achieves the best performance. The study also involves the extraction of weights identifying the contribution of each band of the images in the quantification of each sediment property when MR and principal component analysis are used.

  1. Accounting for estimated IQ in neuropsychological test performance with regression-based techniques.

    PubMed

    Testa, S Marc; Winicki, Jessica M; Pearlson, Godfrey D; Gordon, Barry; Schretlen, David J

    2009-11-01

    Regression-based normative techniques account for variability in test performance associated with multiple predictor variables and generate expected scores based on algebraic equations. Using this approach, we show that estimated IQ, based on oral word reading, accounts for 1-9% of the variability beyond that explained by individual differences in age, sex, race, and years of education for most cognitive measures. These results confirm that adding estimated "premorbid" IQ to demographic predictors in multiple regression models can incrementally improve the accuracy with which regression-based norms (RBNs) benchmark expected neuropsychological test performance in healthy adults. It remains to be seen whether the incremental variance in test performance explained by estimated "premorbid" IQ translates to improved diagnostic accuracy in patient samples. We describe these methods, and illustrate the step-by-step application of RBNs with two cases. We also discuss the rationale, assumptions, and caveats of this approach. More broadly, we note that adjusting test scores for age and other characteristics might actually decrease the accuracy with which test performance predicts absolute criteria, such as the ability to drive or live independently.

  2. Estimating replicate time shifts using Gaussian process regression

    PubMed Central

    Liu, Qiang; Andersen, Bogi; Smyth, Padhraic; Ihler, Alexander

    2010-01-01

    Motivation: Time-course gene expression datasets provide important insights into dynamic aspects of biological processes, such as circadian rhythms, cell cycle and organ development. In a typical microarray time-course experiment, measurements are obtained at each time point from multiple replicate samples. Accurately recovering the gene expression patterns from experimental observations is made challenging by both measurement noise and variation among replicates' rates of development. Prior work on this topic has focused on inference of expression patterns assuming that the replicate times are synchronized. We develop a statistical approach that simultaneously infers both (i) the underlying (hidden) expression profile for each gene, as well as (ii) the biological time for each individual replicate. Our approach is based on Gaussian process regression (GPR) combined with a probabilistic model that accounts for uncertainty about the biological development time of each replicate. Results: We apply GPR with uncertain measurement times to a microarray dataset of mRNA expression for the hair-growth cycle in mouse back skin, predicting both profile shapes and biological times for each replicate. The predicted time shifts show high consistency with independently obtained morphological estimates of relative development. We also show that the method systematically reduces prediction error on out-of-sample data, significantly reducing the mean squared error in a cross-validation study. Availability: Matlab code for GPR with uncertain time shifts is available at http://sli.ics.uci.edu/Code/GPRTimeshift/ Contact: ihler@ics.uci.edu PMID:20147305

  3. Crude oil price forecasting based on hybridizing wavelet multiple linear regression model, particle swarm optimization techniques, and principal component analysis.

    PubMed

    Shabri, Ani; Samsudin, Ruhaidah

    2014-01-01

    Crude oil prices do play significant role in the global economy and are a key input into option pricing formulas, portfolio allocation, and risk measurement. In this paper, a hybrid model integrating wavelet and multiple linear regressions (MLR) is proposed for crude oil price forecasting. In this model, Mallat wavelet transform is first selected to decompose an original time series into several subseries with different scale. Then, the principal component analysis (PCA) is used in processing subseries data in MLR for crude oil price forecasting. The particle swarm optimization (PSO) is used to adopt the optimal parameters of the MLR model. To assess the effectiveness of this model, daily crude oil market, West Texas Intermediate (WTI), has been used as the case study. Time series prediction capability performance of the WMLR model is compared with the MLR, ARIMA, and GARCH models using various statistics measures. The experimental results show that the proposed model outperforms the individual models in forecasting of the crude oil prices series.

  4. Crude Oil Price Forecasting Based on Hybridizing Wavelet Multiple Linear Regression Model, Particle Swarm Optimization Techniques, and Principal Component Analysis

    PubMed Central

    Shabri, Ani; Samsudin, Ruhaidah

    2014-01-01

    Crude oil prices do play significant role in the global economy and are a key input into option pricing formulas, portfolio allocation, and risk measurement. In this paper, a hybrid model integrating wavelet and multiple linear regressions (MLR) is proposed for crude oil price forecasting. In this model, Mallat wavelet transform is first selected to decompose an original time series into several subseries with different scale. Then, the principal component analysis (PCA) is used in processing subseries data in MLR for crude oil price forecasting. The particle swarm optimization (PSO) is used to adopt the optimal parameters of the MLR model. To assess the effectiveness of this model, daily crude oil market, West Texas Intermediate (WTI), has been used as the case study. Time series prediction capability performance of the WMLR model is compared with the MLR, ARIMA, and GARCH models using various statistics measures. The experimental results show that the proposed model outperforms the individual models in forecasting of the crude oil prices series. PMID:24895666

  5. Prediction of monthly rainfall in Victoria, Australia: Clusterwise linear regression approach

    NASA Astrophysics Data System (ADS)

    Bagirov, Adil M.; Mahmood, Arshad; Barton, Andrew

    2017-05-01

    This paper develops the Clusterwise Linear Regression (CLR) technique for prediction of monthly rainfall. The CLR is a combination of clustering and regression techniques. It is formulated as an optimization problem and an incremental algorithm is designed to solve it. The algorithm is applied to predict monthly rainfall in Victoria, Australia using rainfall data with five input meteorological variables over the period of 1889-2014 from eight geographically diverse weather stations. The prediction performance of the CLR method is evaluated by comparing observed and predicted rainfall values using four measures of forecast accuracy. The proposed method is also compared with the CLR using the maximum likelihood framework by the expectation-maximization algorithm, multiple linear regression, artificial neural networks and the support vector machines for regression models using computational results. The results demonstrate that the proposed algorithm outperforms other methods in most locations.

  6. Lifespan development of pro- and anti-saccades: multiple regression models for point estimates.

    PubMed

    Klein, Christoph; Foerster, Friedrich; Hartnegg, Klaus; Fischer, Burkhart

    2005-12-07

    The comparative study of anti- and pro-saccade task performance contributes to our functional understanding of the frontal lobes, their alterations in psychiatric or neurological populations, and their changes during the life span. In the present study, we apply regression analysis to model life span developmental effects on various pro- and anti-saccade task parameters, using data of a non-representative sample of 327 participants aged 9 to 88 years. Development up to the age of about 27 years was dominated by curvilinear rather than linear effects of age. Furthermore, the largest developmental differences were found for intra-subject variability measures and the anti-saccade task parameters. Ageing, by contrast, had the shape of a global linear decline of the investigated saccade functions, lacking the differential effects of age observed during development. While these results do support the assumption that frontal lobe functions can be distinguished from other functions by their strong and protracted development, they do not confirm the assumption of disproportionate deterioration of frontal lobe functions with ageing. We finally show that the regression models applied here to quantify life span developmental effects can also be used for individual predictions in applied research contexts or clinical practice.

  7. Comparing least-squares and quantile regression approaches to analyzing median hospital charges.

    PubMed

    Olsen, Cody S; Clark, Amy E; Thomas, Andrea M; Cook, Lawrence J

    2012-07-01

    Emergency department (ED) and hospital charges obtained from administrative data sets are useful descriptors of injury severity and the burden to EDs and the health care system. However, charges are typically positively skewed due to costly procedures, long hospital stays, and complicated or prolonged treatment for few patients. The median is not affected by extreme observations and is useful in describing and comparing distributions of hospital charges. A least-squares analysis employing a log transformation is one approach for estimating median hospital charges, corresponding confidence intervals (CIs), and differences between groups; however, this method requires certain distributional properties. An alternate method is quantile regression, which allows estimation and inference related to the median without making distributional assumptions. The objective was to compare the log-transformation least-squares method to the quantile regression approach for estimating median hospital charges, differences in median charges between groups, and associated CIs. The authors performed simulations using repeated sampling of observed statewide ED and hospital charges and charges randomly generated from a hypothetical lognormal distribution. The median and 95% CI and the multiplicative difference between the median charges of two groups were estimated using both least-squares and quantile regression methods. Performance of the two methods was evaluated. In contrast to least squares, quantile regression produced estimates that were unbiased and had smaller mean square errors in simulations of observed ED and hospital charges. Both methods performed well in simulations of hypothetical charges that met least-squares method assumptions. When the data did not follow the assumed distribution, least-squares estimates were often biased, and the associated CIs had lower than expected coverage as sample size increased. Quantile regression analyses of hospital charges provide unbiased

  8. Regression Analysis of Stage Variability for West-Central Florida Lakes

    USGS Publications Warehouse

    Sacks, Laura A.; Ellison, Donald L.; Swancar, Amy

    2008-01-01

    The variability in a lake's stage depends upon many factors, including surface-water flows, meteorological conditions, and hydrogeologic characteristics near the lake. An understanding of the factors controlling lake-stage variability for a population of lakes may be helpful to water managers who set regulatory levels for lakes. The goal of this study is to determine whether lake-stage variability can be predicted using multiple linear regression and readily available lake and basin characteristics defined for each lake. Regressions were evaluated for a recent 10-year period (1996-2005) and for a historical 10-year period (1954-63). Ground-water pumping is considered to have affected stage at many of the 98 lakes included in the recent period analysis, and not to have affected stage at the 20 lakes included in the historical period analysis. For the recent period, regression models had coefficients of determination (R2) values ranging from 0.60 to 0.74, and up to five explanatory variables. Standard errors ranged from 21 to 37 percent of the average stage variability. Net leakage was the most important explanatory variable in regressions describing the full range and low range in stage variability for the recent period. The most important explanatory variable in the model predicting the high range in stage variability was the height over median lake stage at which surface-water outflow would occur. Other explanatory variables in final regression models for the recent period included the range in annual rainfall for the period and several variables related to local and regional hydrogeology: (1) ground-water pumping within 1 mile of each lake, (2) the amount of ground-water inflow (by category), (3) the head gradient between the lake and the Upper Floridan aquifer, and (4) the thickness of the intermediate confining unit. Many of the variables in final regression models are related to hydrogeologic characteristics, underscoring the importance of ground

  9. AucPR: an AUC-based approach using penalized regression for disease prediction with high-dimensional omics data.

    PubMed

    Yu, Wenbao; Park, Taesung

    2014-01-01

    It is common to get an optimal combination of markers for disease classification and prediction when multiple markers are available. Many approaches based on the area under the receiver operating characteristic curve (AUC) have been proposed. Existing works based on AUC in a high-dimensional context depend mainly on a non-parametric, smooth approximation of AUC, with no work using a parametric AUC-based approach, for high-dimensional data. We propose an AUC-based approach using penalized regression (AucPR), which is a parametric method used for obtaining a linear combination for maximizing the AUC. To obtain the AUC maximizer in a high-dimensional context, we transform a classical parametric AUC maximizer, which is used in a low-dimensional context, into a regression framework and thus, apply the penalization regression approach directly. Two kinds of penalization, lasso and elastic net, are considered. The parametric approach can avoid some of the difficulties of a conventional non-parametric AUC-based approach, such as the lack of an appropriate concave objective function and a prudent choice of the smoothing parameter. We apply the proposed AucPR for gene selection and classification using four real microarray and synthetic data. Through numerical studies, AucPR is shown to perform better than the penalized logistic regression and the nonparametric AUC-based method, in the sense of AUC and sensitivity for a given specificity, particularly when there are many correlated genes. We propose a powerful parametric and easily-implementable linear classifier AucPR, for gene selection and disease prediction for high-dimensional data. AucPR is recommended for its good prediction performance. Beside gene expression microarray data, AucPR can be applied to other types of high-dimensional omics data, such as miRNA and protein data.

  10. An Empirical Study of Eight Nonparametric Tests in Hierarchical Regression.

    ERIC Educational Resources Information Center

    Harwell, Michael; Serlin, Ronald C.

    When normality does not hold, nonparametric tests represent an important data-analytic alternative to parametric tests. However, the use of nonparametric tests in educational research has been limited by the absence of easily performed tests for complex experimental designs and analyses, such as factorial designs and multiple regression analyses,…

  11. Assessing the impact of local meteorological variables on surface ozone in Hong Kong during 2000-2015 using quantile and multiple line regression models

    NASA Astrophysics Data System (ADS)

    Zhao, Wei; Fan, Shaojia; Guo, Hai; Gao, Bo; Sun, Jiaren; Chen, Laiguo

    2016-11-01

    The quantile regression (QR) method has been increasingly introduced to atmospheric environmental studies to explore the non-linear relationship between local meteorological conditions and ozone mixing ratios. In this study, we applied QR for the first time, together with multiple linear regression (MLR), to analyze the dominant meteorological parameters influencing the mean, 10th percentile, 90th percentile and 99th percentile of maximum daily 8-h average (MDA8) ozone concentrations in 2000-2015 in Hong Kong. The dominance analysis (DA) was used to assess the relative importance of meteorological variables in the regression models. Results showed that the MLR models worked better at suburban and rural sites than at urban sites, and worked better in winter than in summer. QR models performed better in summer for 99th and 90th percentiles and performed better in autumn and winter for 10th percentile. And QR models also performed better in suburban and rural areas for 10th percentile. The top 3 dominant variables associated with MDA8 ozone concentrations, changing with seasons and regions, were frequently associated with the six meteorological parameters: boundary layer height, humidity, wind direction, surface solar radiation, total cloud cover and sea level pressure. Temperature rarely became a significant variable in any season, which could partly explain the peak of monthly average ozone concentrations in October in Hong Kong. And we found the effect of solar radiation would be enhanced during extremely ozone pollution episodes (i.e., the 99th percentile). Finally, meteorological effects on MDA8 ozone had no significant changes before and after the 2010 Asian Games.

  12. Multiple-trait random regression models for the estimation of genetic parameters for milk, fat, and protein yield in buffaloes.

    PubMed

    Borquis, Rusbel Raul Aspilcueta; Neto, Francisco Ribeiro de Araujo; Baldi, Fernando; Hurtado-Lugo, Naudin; de Camargo, Gregório M F; Muñoz-Berrocal, Milthon; Tonhati, Humberto

    2013-09-01

    In this study, genetic parameters for test-day milk, fat, and protein yield were estimated for the first lactation. The data analyzed consisted of 1,433 first lactations of Murrah buffaloes, daughters of 113 sires from 12 herds in the state of São Paulo, Brazil, with calvings from 1985 to 2007. Ten-month classes of lactation days were considered for the test-day yields. The (co)variance components for the 3 traits were estimated using the regression analyses by Bayesian inference applying an animal model by Gibbs sampling. The contemporary groups were defined as herd-year-month of the test day. In the model, the random effects were additive genetic, permanent environment, and residual. The fixed effects were contemporary group and number of milkings (1 or 2), the linear and quadratic effects of the covariable age of the buffalo at calving, as well as the mean lactation curve of the population, which was modeled by orthogonal Legendre polynomials of fourth order. The random effects for the traits studied were modeled by Legendre polynomials of third and fourth order for additive genetic and permanent environment, respectively, the residual variances were modeled considering 4 residual classes. The heritability estimates for the traits were moderate (from 0.21-0.38), with higher estimates in the intermediate lactation phase. The genetic correlation estimates within and among the traits varied from 0.05 to 0.99. The results indicate that the selection for any trait test day will result in an indirect genetic gain for milk, fat, and protein yield in all periods of the lactation curve. The accuracy associated with estimated breeding values obtained using multi-trait random regression was slightly higher (around 8%) compared with single-trait random regression. This difference may be because to the greater amount of information available per animal. Copyright © 2013 American Dairy Science Association. Published by Elsevier Inc. All rights reserved.

  13. The severity of Minamata disease declined in 25 years: temporal profile of the neurological findings analyzed by multiple logistic regression model.

    PubMed

    Uchino, Makoto; Hirano, Teruyuki; Satoh, Hiroshi; Arimura, Kimiyoshi; Nakagawa, Masanori; Wakamiya, Jyunji

    2005-01-01

    Minamata disease (MD) was caused by ingestion of seafood from the methylmercury-contaminated areas. Although 50 years have passed since the discovery of MD, there have been only a few studies on the temporal profile of neurological findings in certified MD patients. Thus, we evaluated changes in neurological symptoms and signs of MD using discriminants by multiple logistic regression analysis. The severity of predictive index declined in 25 years in most of the patients. Only a few patients showed aggravation of neurological findings, which was due to complications such as spino-cerebellar degeneration. Patients with chronic MD aged over 45 years had several concomitant diseases so that their clinical pictures were complicated. It was difficult to differentiate chronic MD using statistically established discriminants based on sensory disturbance alone. In conclusion, the severity of MD declined in 25 years along with the modification by age-related concomitant disorders.

  14. Alpha-synuclein levels in patients with multiple system atrophy: a meta-analysis.

    PubMed

    Yang, Fei; Li, Wan-Jun; Huang, Xu-Sheng

    2018-05-01

    This study evaluates the relationship between multiple system atrophy and α-synuclein levels in the cerebrospinal fluid, plasma and neural tissue. Literature search for relevant research articles was undertaken in electronic databases and study selection was based on a priori eligibility criteria. Random-effects meta-analyses of standardized mean differences in α-synuclein levels between multiple system atrophy patients and normal controls were conducted to obtain the overall and subgroup effect sizes. Meta-regression analyses were performed to evaluate the effect of age, gender and disease severity on standardized mean differences. Data were obtained from 11 studies involving 378 multiple system atrophy patients and 637 healthy controls (age: multiple system atrophy patients 64.14 [95% confidence interval 62.05, 66.23] years; controls 64.16 [60.06, 68.25] years; disease duration: 44.41 [26.44, 62.38] months). Cerebrospinal fluid α-synuclein levels were significantly lower in multiple system atrophy patients than in controls but in plasma and neural tissue, α-synuclein levels were significantly higher in multiple system atrophy patients (standardized mean difference: -0.99 [-1.65, -0.32]; p = 0.001). Percentage of male multiple system atrophy patients was significantly positively associated with the standardized mean differences of cerebrospinal fluid α-synuclein levels (p = 0.029) whereas the percentage of healthy males was not associated with the standardized mean differences of cerebrospinal fluid α-synuclein levels (p = 0.920). In multiple system atrophy patients, α-synuclein levels were significantly lower in the cerebrospinal fluid and were positively associated with the male gender.

  15. Regression of a vaginal leiomyoma after ovariohysterectomy in a dog: a case report.

    PubMed

    Sathya, Suresh; Linn, Kathleen

    2014-01-01

    An 11 yr old female mixed-breed Siberian husky was presented with a history of sanguineous vaginal discharge, swelling of the perineal area, decreased appetite, and lethargy. A single, large vaginal leiomyoma and multiple mammary tumors were diagnosed. Mastectomy and ovariohysterectomy were performed. The vaginal leiomyoma regressed completely after ovariohysterectomy. This is the first reported case of spontaneous regression of a vaginal leiomyoma after ovariohysterectomy in a dog.

  16. Quantum regression theorem and non-Markovianity of quantum dynamics

    NASA Astrophysics Data System (ADS)

    Guarnieri, Giacomo; Smirne, Andrea; Vacchini, Bassano

    2014-08-01

    We explore the connection between two recently introduced notions of non-Markovian quantum dynamics and the validity of the so-called quantum regression theorem. While non-Markovianity of a quantum dynamics has been defined looking at the behavior in time of the statistical operator, which determines the evolution of mean values, the quantum regression theorem makes statements about the behavior of system correlation functions of order two and higher. The comparison relies on an estimate of the validity of the quantum regression hypothesis, which can be obtained exactly evaluating two-point correlation functions. To this aim we consider a qubit undergoing dephasing due to interaction with a bosonic bath, comparing the exact evaluation of the non-Markovianity measures with the violation of the quantum regression theorem for a class of spectral densities. We further study a photonic dephasing model, recently exploited for the experimental measurement of non-Markovianity. It appears that while a non-Markovian dynamics according to either definition brings with itself violation of the regression hypothesis, even Markovian dynamics can lead to a failure of the regression relation.

  17. Regularized matrix regression

    PubMed Central

    Zhou, Hua; Li, Lexin

    2014-01-01

    Summary Modern technologies are producing a wealth of data with complex structures. For instance, in two-dimensional digital imaging, flow cytometry and electroencephalography, matrix-type covariates frequently arise when measurements are obtained for each combination of two underlying variables. To address scientific questions arising from those data, new regression methods that take matrices as covariates are needed, and sparsity or other forms of regularization are crucial owing to the ultrahigh dimensionality and complex structure of the matrix data. The popular lasso and related regularization methods hinge on the sparsity of the true signal in terms of the number of its non-zero coefficients. However, for the matrix data, the true signal is often of, or can be well approximated by, a low rank structure. As such, the sparsity is frequently in the form of low rank of the matrix parameters, which may seriously violate the assumption of the classical lasso. We propose a class of regularized matrix regression methods based on spectral regularization. A highly efficient and scalable estimation algorithm is developed, and a degrees-of-freedom formula is derived to facilitate model selection along the regularization path. Superior performance of the method proposed is demonstrated on both synthetic and real examples. PMID:24648830

  18. Age- and sex-dependent regression models for predicting the live weight of West African Dwarf goat from body measurements.

    PubMed

    Sowande, O S; Oyewale, B F; Iyasere, O S

    2010-06-01

    The relationships between live weight and eight body measurements of West African Dwarf (WAD) goats were studied using 211 animals under farm condition. The animals were categorized based on age and sex. Data obtained on height at withers (HW), heart girth (HG), body length (BL), head length (HL), and length of hindquarter (LHQ) were fitted into simple linear, allometric, and multiple-regression models to predict live weight from the body measurements according to age group and sex. Results showed that live weight, HG, BL, LHQ, HL, and HW increased with the age of the animals. In multiple-regression model, HG and HL best fit the model for goat kids; HG, HW, and HL for goat aged 13-24 months; while HG, LHQ, HW, and HL best fit the model for goats aged 25-36 months. Coefficients of determination (R(2)) values for linear and allometric models for predicting the live weight of WAD goat increased with age in all the body measurements, with HG being the most satisfactory single measurement in predicting the live weight of WAD goat. Sex had significant influence on the model with R(2) values consistently higher in females except the models for LHQ and HW.

  19. Evaluating differential effects using regression interactions and regression mixture models

    PubMed Central

    Van Horn, M. Lee; Jaki, Thomas; Masyn, Katherine; Howe, George; Feaster, Daniel J.; Lamont, Andrea E.; George, Melissa R. W.; Kim, Minjung

    2015-01-01

    Research increasingly emphasizes understanding differential effects. This paper focuses on understanding regression mixture models, a relatively new statistical methods for assessing differential effects by comparing results to using an interactive term in linear regression. The research questions which each model answers, their formulation, and their assumptions are compared using Monte Carlo simulations and real data analysis. The capabilities of regression mixture models are described and specific issues to be addressed when conducting regression mixtures are proposed. The paper aims to clarify the role that regression mixtures can take in the estimation of differential effects and increase awareness of the benefits and potential pitfalls of this approach. Regression mixture models are shown to be a potentially effective exploratory method for finding differential effects when these effects can be defined by a small number of classes of respondents who share a typical relationship between a predictor and an outcome. It is also shown that the comparison between regression mixture models and interactions becomes substantially more complex as the number of classes increases. It is argued that regression interactions are well suited for direct tests of specific hypotheses about differential effects and regression mixtures provide a useful approach for exploring effect heterogeneity given adequate samples and study design. PMID:26556903

  20. Regression and multivariate models for predicting particulate matter concentration level.

    PubMed

    Nazif, Amina; Mohammed, Nurul Izma; Malakahmad, Amirhossein; Abualqumboz, Motasem S

    2018-01-01

    The devastating health effects of particulate matter (PM 10 ) exposure by susceptible populace has made it necessary to evaluate PM 10 pollution. Meteorological parameters and seasonal variation increases PM 10 concentration levels, especially in areas that have multiple anthropogenic activities. Hence, stepwise regression (SR), multiple linear regression (MLR) and principal component regression (PCR) analyses were used to analyse daily average PM 10 concentration levels. The analyses were carried out using daily average PM 10 concentration, temperature, humidity, wind speed and wind direction data from 2006 to 2010. The data was from an industrial air quality monitoring station in Malaysia. The SR analysis established that meteorological parameters had less influence on PM 10 concentration levels having coefficient of determination (R 2 ) result from 23 to 29% based on seasoned and unseasoned analysis. While, the result of the prediction analysis showed that PCR models had a better R 2 result than MLR methods. The results for the analyses based on both seasoned and unseasoned data established that MLR models had R 2 result from 0.50 to 0.60. While, PCR models had R 2 result from 0.66 to 0.89. In addition, the validation analysis using 2016 data also recognised that the PCR model outperformed the MLR model, with the PCR model for the seasoned analysis having the best result. These analyses will aid in achieving sustainable air quality management strategies.

  1. Modeling lactation curves and estimation of genetic parameters in Holstein cows using multiple-trait random regression models.

    PubMed

    Kheirabadi, Khabat; Rashidi, Amir; Alijani, Sadegh; Imumorin, Ikhide

    2014-11-01

    We compared the goodness of fit of three mathematical functions (including: Legendre polynomials, Lidauer-Mäntysaari function and Wilmink function) for describing the lactation curve of primiparous Iranian Holstein cows by using multiple-trait random regression models (MT-RRM). Lactational submodels provided the largest daily additive genetic (AG) and permanent environmental (PE) variance estimates at the end and at the onset of lactation, respectively, as well as low genetic correlations between peripheral test-day records. For all models, heritability estimates were highest at the end of lactation (245 to 305 days) and ranged from 0.05 to 0.26, 0.03 to 0.12 and 0.04 to 0.24 for milk, fat and protein yields, respectively. Generally, the genetic correlations between traits depend on how far apart they are or whether they are on the same day in any two traits. On average, genetic correlations between milk and fat were the lowest and those between fat and protein were intermediate, while those between milk and protein were the highest. Results from all criteria (Akaike's and Schwarz's Bayesian information criterion, and -2*logarithm of the likelihood function) suggested that a model with 2 and 5 coefficients of Legendre polynomials for AG and PE effects, respectively, was the most adequate for fitting the data. © 2014 Japanese Society of Animal Science.

  2. Efficient Robust Regression via Two-Stage Generalized Empirical Likelihood

    PubMed Central

    Bondell, Howard D.; Stefanski, Leonard A.

    2013-01-01

    Large- and finite-sample efficiency and resistance to outliers are the key goals of robust statistics. Although often not simultaneously attainable, we develop and study a linear regression estimator that comes close. Efficiency obtains from the estimator’s close connection to generalized empirical likelihood, and its favorable robustness properties are obtained by constraining the associated sum of (weighted) squared residuals. We prove maximum attainable finite-sample replacement breakdown point, and full asymptotic efficiency for normal errors. Simulation evidence shows that compared to existing robust regression estimators, the new estimator has relatively high efficiency for small sample sizes, and comparable outlier resistance. The estimator is further illustrated and compared to existing methods via application to a real data set with purported outliers. PMID:23976805

  3. Subsonic Aircraft With Regression and Neural-Network Approximators Designed

    NASA Technical Reports Server (NTRS)

    Patnaik, Surya N.; Hopkins, Dale A.

    2004-01-01

    At the NASA Glenn Research Center, NASA Langley Research Center's Flight Optimization System (FLOPS) and the design optimization testbed COMETBOARDS with regression and neural-network-analysis approximators have been coupled to obtain a preliminary aircraft design methodology. For a subsonic aircraft, the optimal design, that is the airframe-engine combination, is obtained by the simulation. The aircraft is powered by two high-bypass-ratio engines with a nominal thrust of about 35,000 lbf. It is to carry 150 passengers at a cruise speed of Mach 0.8 over a range of 3000 n mi and to operate on a 6000-ft runway. The aircraft design utilized a neural network and a regression-approximations-based analysis tool, along with a multioptimizer cascade algorithm that uses sequential linear programming, sequential quadratic programming, the method of feasible directions, and then sequential quadratic programming again. Optimal aircraft weight versus the number of design iterations is shown. The central processing unit (CPU) time to solution is given. It is shown that the regression-method-based analyzer exhibited a smoother convergence pattern than the FLOPS code. The optimum weight obtained by the approximation technique and the FLOPS code differed by 1.3 percent. Prediction by the approximation technique exhibited no error for the aircraft wing area and turbine entry temperature, whereas it was within 2 percent for most other parameters. Cascade strategy was required by FLOPS as well as the approximators. The regression method had a tendency to hug the data points, whereas the neural network exhibited a propensity to follow a mean path. The performance of the neural network and regression methods was considered adequate. It was at about the same level for small, standard, and large models with redundancy ratios (defined as the number of input-output pairs to the number of unknown coefficients) of 14, 28, and 57, respectively. In an SGI octane workstation (Silicon Graphics

  4. The allometry of coarse root biomass: log-transformed linear regression or nonlinear regression?

    PubMed

    Lai, Jiangshan; Yang, Bo; Lin, Dunmei; Kerkhoff, Andrew J; Ma, Keping

    2013-01-01

    Precise estimation of root biomass is important for understanding carbon stocks and dynamics in forests. Traditionally, biomass estimates are based on allometric scaling relationships between stem diameter and coarse root biomass calculated using linear regression (LR) on log-transformed data. Recently, it has been suggested that nonlinear regression (NLR) is a preferable fitting method for scaling relationships. But while this claim has been contested on both theoretical and empirical grounds, and statistical methods have been developed to aid in choosing between the two methods in particular cases, few studies have examined the ramifications of erroneously applying NLR. Here, we use direct measurements of 159 trees belonging to three locally dominant species in east China to compare the LR and NLR models of diameter-root biomass allometry. We then contrast model predictions by estimating stand coarse root biomass based on census data from the nearby 24-ha Gutianshan forest plot and by testing the ability of the models to predict known root biomass values measured on multiple tropical species at the Pasoh Forest Reserve in Malaysia. Based on likelihood estimates for model error distributions, as well as the accuracy of extrapolative predictions, we find that LR on log-transformed data is superior to NLR for fitting diameter-root biomass scaling models. More importantly, inappropriately using NLR leads to grossly inaccurate stand biomass estimates, especially for stands dominated by smaller trees.

  5. Evaluating the performance of different predictor strategies in regression-based downscaling with a focus on glacierized mountain environments

    NASA Astrophysics Data System (ADS)

    Hofer, Marlis; Nemec, Johanna

    2016-04-01

    This study presents first steps towards verifying the hypothesis that uncertainty in global and regional glacier mass simulations can be reduced considerably by reducing the uncertainty in the high-resolution atmospheric input data. To this aim, we systematically explore the potential of different predictor strategies for improving the performance of regression-based downscaling approaches. The investigated local-scale target variables are precipitation, air temperature, wind speed, relative humidity and global radiation, all at a daily time scale. Observations of these target variables are assessed from three sites in geo-environmentally and climatologically very distinct settings, all within highly complex topography and in the close proximity to mountain glaciers: (1) the Vernagtbach station in the Northern European Alps (VERNAGT), (2) the Artesonraju measuring site in the tropical South American Andes (ARTESON), and (3) the Brewster measuring site in the Southern Alps of New Zealand (BREWSTER). As the large-scale predictors, ERA interim reanalysis data are used. In the applied downscaling model training and evaluation procedures, particular emphasis is put on appropriately accounting for the pitfalls of limited and/or patchy observation records that are usually the only (if at all) available data from the glacierized mountain sites. Generalized linear models and beta regression are investigated as alternatives to ordinary least squares regression for the non-Gaussian target variables. By analyzing results for the three different sites, five predictands and for different times of the year, we look for systematic improvements in the downscaling models' skill specifically obtained by (i) using predictor data at the optimum scale rather than the minimum scale of the reanalysis data, (ii) identifying the optimum predictor allocation in the vertical, and (iii) considering multiple (variable, level and/or grid point) predictor options combined with state

  6. [Gaussian process regression and its application in near-infrared spectroscopy analysis].

    PubMed

    Feng, Ai-Ming; Fang, Li-Min; Lin, Min

    2011-06-01

    Gaussian process (GP) is applied in the present paper as a chemometric method to explore the complicated relationship between the near infrared (NIR) spectra and ingredients. After the outliers were detected by Monte Carlo cross validation (MCCV) method and removed from dataset, different preprocessing methods, such as multiplicative scatter correction (MSC), smoothing and derivate, were tried for the best performance of the models. Furthermore, uninformative variable elimination (UVE) was introduced as a variable selection technique and the characteristic wavelengths obtained were further employed as input for modeling. A public dataset with 80 NIR spectra of corn was introduced as an example for evaluating the new algorithm. The optimal models for oil, starch and protein were obtained by the GP regression method. The performance of the final models were evaluated according to the root mean square error of calibration (RMSEC), root mean square error of cross-validation (RMSECV), root mean square error of prediction (RMSEP) and correlation coefficient (r). The models give good calibration ability with r values above 0.99 and the prediction ability is also satisfactory with r values higher than 0.96. The overall results demonstrate that GP algorithm is an effective chemometric method and is promising for the NIR analysis.

  7. Non-destructive evaluation of chlorophyll content in quinoa and amaranth leaves by simple and multiple regression analysis of RGB image components.

    PubMed

    Riccardi, M; Mele, G; Pulvento, C; Lavini, A; d'Andria, R; Jacobsen, S-E

    2014-06-01

    Leaf chlorophyll content provides valuable information about physiological status of plants; it is directly linked to photosynthetic potential and primary production. In vitro assessment by wet chemical extraction is the standard method for leaf chlorophyll determination. This measurement is expensive, laborious, and time consuming. Over the years alternative methods, rapid and non-destructive, have been explored. The aim of this work was to evaluate the applicability of a fast and non-invasive field method for estimation of chlorophyll content in quinoa and amaranth leaves based on RGB components analysis of digital images acquired with a standard SLR camera. Digital images of leaves from different genotypes of quinoa and amaranth were acquired directly in the field. Mean values of each RGB component were evaluated via image analysis software and correlated to leaf chlorophyll provided by standard laboratory procedure. Single and multiple regression models using RGB color components as independent variables have been tested and validated. The performance of the proposed method was compared to that of the widely used non-destructive SPAD method. Sensitivity of the best regression models for different genotypes of quinoa and amaranth was also checked. Color data acquisition of the leaves in the field with a digital camera was quick, more effective, and lower cost than SPAD. The proposed RGB models provided better correlation (highest R (2)) and prediction (lowest RMSEP) of the true value of foliar chlorophyll content and had a lower amount of noise in the whole range of chlorophyll studied compared with SPAD and other leaf image processing based models when applied to quinoa and amaranth.

  8. Predicting multi-level drug response with gene expression profile in multiple myeloma using hierarchical ordinal regression.

    PubMed

    Zhang, Xinyan; Li, Bingzong; Han, Huiying; Song, Sha; Xu, Hongxia; Hong, Yating; Yi, Nengjun; Zhuang, Wenzhuo

    2018-05-10

    Multiple myeloma (MM), like other cancers, is caused by the accumulation of genetic abnormalities. Heterogeneity exists in the patients' response to treatments, for example, bortezomib. This urges efforts to identify biomarkers from numerous molecular features and build predictive models for identifying patients that can benefit from a certain treatment scheme. However, previous studies treated the multi-level ordinal drug response as a binary response where only responsive and non-responsive groups are considered. It is desirable to directly analyze the multi-level drug response, rather than combining the response to two groups. In this study, we present a novel method to identify significantly associated biomarkers and then develop ordinal genomic classifier using the hierarchical ordinal logistic model. The proposed hierarchical ordinal logistic model employs the heavy-tailed Cauchy prior on the coefficients and is fitted by an efficient quasi-Newton algorithm. We apply our hierarchical ordinal regression approach to analyze two publicly available datasets for MM with five-level drug response and numerous gene expression measures. Our results show that our method is able to identify genes associated with the multi-level drug response and to generate powerful predictive models for predicting the multi-level response. The proposed method allows us to jointly fit numerous correlated predictors and thus build efficient models for predicting the multi-level drug response. The predictive model for the multi-level drug response can be more informative than the previous approaches. Thus, the proposed approach provides a powerful tool for predicting multi-level drug response and has important impact on cancer studies.

  9. A regression technique for evaluation and quantification for water quality parameters from remote sensing data

    NASA Technical Reports Server (NTRS)

    Whitlock, C. H.; Kuo, C. Y.

    1979-01-01

    The objective of this paper is to define optical physics and/or environmental conditions under which the linear multiple-regression should be applicable. An investigation of the signal-response equations is conducted and the concept is tested by application to actual remote sensing data from a laboratory experiment performed under controlled conditions. Investigation of the signal-response equations shows that the exact solution for a number of optical physics conditions is of the same form as a linearized multiple-regression equation, even if nonlinear contributions from surface reflections, atmospheric constituents, or other water pollutants are included. Limitations on achieving this type of solution are defined.

  10. Temporal Synchronization Analysis for Improving Regression Modeling of Fecal Indicator Bacteria Levels

    EPA Science Inventory

    Multiple linear regression models are often used to predict levels of fecal indicator bacteria (FIB) in recreational swimming waters based on independent variables (IVs) such as meteorologic, hydrodynamic, and water-quality measures. The IVs used for these analyses are traditiona...

  11. Regression Analysis of Optical Coherence Tomography Disc Variables for Glaucoma Diagnosis.

    PubMed

    Richter, Grace M; Zhang, Xinbo; Tan, Ou; Francis, Brian A; Chopra, Vikas; Greenfield, David S; Varma, Rohit; Schuman, Joel S; Huang, David

    2016-08-01

    To report diagnostic accuracy of optical coherence tomography (OCT) disc variables using both time-domain (TD) and Fourier-domain (FD) OCT, and to improve the use of OCT disc variable measurements for glaucoma diagnosis through regression analyses that adjust for optic disc size and axial length-based magnification error. Observational, cross-sectional. In total, 180 normal eyes of 112 participants and 180 eyes of 138 participants with perimetric glaucoma from the Advanced Imaging for Glaucoma Study. Diagnostic variables evaluated from TD-OCT and FD-OCT were: disc area, rim area, rim volume, optic nerve head volume, vertical cup-to-disc ratio (CDR), and horizontal CDR. These were compared with overall retinal nerve fiber layer thickness and ganglion cell complex. Regression analyses were performed that corrected for optic disc size and axial length. Area-under-receiver-operating curves (AUROC) were used to assess diagnostic accuracy before and after the adjustments. An index based on multiple logistic regression that combined optic disc variables with axial length was also explored with the aim of improving diagnostic accuracy of disc variables. Comparison of diagnostic accuracy of disc variables, as measured by AUROC. The unadjusted disc variables with the highest diagnostic accuracies were: rim volume for TD-OCT (AUROC=0.864) and vertical CDR (AUROC=0.874) for FD-OCT. Magnification correction significantly worsened diagnostic accuracy for rim variables, and while optic disc size adjustments partially restored diagnostic accuracy, the adjusted AUROCs were still lower. Axial length adjustments to disc variables in the form of multiple logistic regression indices led to a slight but insignificant improvement in diagnostic accuracy. Our various regression approaches were not able to significantly improve disc-based OCT glaucoma diagnosis. However, disc rim area and vertical CDR had very high diagnostic accuracy, and these disc variables can serve to complement

  12. Normalization Ridge Regression in Practice II: The Estimation of Multiple Feedback Linkages.

    ERIC Educational Resources Information Center

    Bulcock, J. W.

    The use of the two-stage least squares (2 SLS) procedure for estimating nonrecursive social science models is often impractical when multiple feedback linkages are required. This is because 2 SLS is extremely sensitive to multicollinearity. The standard statistical solution to the multicollinearity problem is a biased, variance reduced procedure…

  13. Quantifying components of the hydrologic cycle in Virginia using chemical hydrograph separation and multiple regression analysis

    USGS Publications Warehouse

    Sanford, Ward E.; Nelms, David L.; Pope, Jason P.; Selnick, David L.

    2012-01-01

    This study by the U.S. Geological Survey, prepared in cooperation with the Virginia Department of Environmental Quality, quantifies the components of the hydrologic cycle across the Commonwealth of Virginia. Long-term, mean fluxes were calculated for precipitation, surface runoff, infiltration, total evapotranspiration (ET), riparian ET, recharge, base flow (or groundwater discharge) and net total outflow. Fluxes of these components were first estimated on a number of real-time-gaged watersheds across Virginia. Specific conductance was used to distinguish and separate surface runoff from base flow. Specific-conductance data were collected every 15 minutes at 75 real-time gages for approximately 18 months between March 2007 and August 2008. Precipitation was estimated for 1971–2000 using PRISM climate data. Precipitation and temperature from the PRISM data were used to develop a regression-based relation to estimate total ET. The proportion of watershed precipitation that becomes surface runoff was related to physiographic province and rock type in a runoff regression equation. Component flux estimates from the watersheds were transferred to flux estimates for counties and independent cities using the ET and runoff regression equations. Only 48 of the 75 watersheds yielded sufficient data, and data from these 48 were used in the final runoff regression equation. The base-flow proportion for the 48 watersheds averaged 72 percent using specific conductance, a value that was substantially higher than the 61 percent average calculated using a graphical-separation technique (the USGS program PART). Final results for the study are presented as component flux estimates for all counties and independent cities in Virginia.

  14. Elbow joint angle and elbow movement velocity estimation using NARX-multiple layer perceptron neural network model with surface EMG time domain parameters.

    PubMed

    Raj, Retheep; Sivanandan, K S

    2017-01-01

    Estimation of elbow dynamics has been the object of numerous investigations. In this work a solution is proposed for estimating elbow movement velocity and elbow joint angle from Surface Electromyography (SEMG) signals. Here the Surface Electromyography signals are acquired from the biceps brachii muscle of human hand. Two time-domain parameters, Integrated EMG (IEMG) and Zero Crossing (ZC), are extracted from the Surface Electromyography signal. The relationship between the time domain parameters, IEMG and ZC with elbow angular displacement and elbow angular velocity during extension and flexion of the elbow are studied. A multiple input-multiple output model is derived for identifying the kinematics of elbow. A Nonlinear Auto Regressive with eXogenous inputs (NARX) structure based multiple layer perceptron neural network (MLPNN) model is proposed for the estimation of elbow joint angle and elbow angular velocity. The proposed NARX MLPNN model is trained using Levenberg-marquardt based algorithm. The proposed model is estimating the elbow joint angle and elbow movement angular velocity with appreciable accuracy. The model is validated using regression coefficient value (R). The average regression coefficient value (R) obtained for elbow angular displacement prediction is 0.9641 and for the elbow anglular velocity prediction is 0.9347. The Nonlinear Auto Regressive with eXogenous inputs (NARX) structure based multiple layer perceptron neural networks (MLPNN) model can be used for the estimation of angular displacement and movement angular velocity of the elbow with good accuracy.

  15. Ranking contributing areas of salt and selenium in the Lower Gunnison River Basin, Colorado, using multiple linear regression models

    USGS Publications Warehouse

    Linard, Joshua I.

    2013-01-01

    Mitigating the effects of salt and selenium on water quality in the Grand Valley and lower Gunnison River Basin in western Colorado is a major concern for land managers. Previous modeling indicated means to improve the models by including more detailed geospatial data and a more rigorous method for developing the models. After evaluating all possible combinations of geospatial variables, four multiple linear regression models resulted that could estimate irrigation-season salt yield, nonirrigation-season salt yield, irrigation-season selenium yield, and nonirrigation-season selenium yield. The adjusted r-squared and the residual standard error (in units of log-transformed yield) of the models were, respectively, 0.87 and 2.03 for the irrigation-season salt model, 0.90 and 1.25 for the nonirrigation-season salt model, 0.85 and 2.94 for the irrigation-season selenium model, and 0.93 and 1.75 for the nonirrigation-season selenium model. The four models were used to estimate yields and loads from contributing areas corresponding to 12-digit hydrologic unit codes in the lower Gunnison River Basin study area. Each of the 175 contributing areas was ranked according to its estimated mean seasonal yield of salt and selenium.

  16. Penalized regression procedures for variable selection in the potential outcomes framework

    PubMed Central

    Ghosh, Debashis; Zhu, Yeying; Coffman, Donna L.

    2015-01-01

    A recent topic of much interest in causal inference is model selection. In this article, we describe a framework in which to consider penalized regression approaches to variable selection for causal effects. The framework leads to a simple ‘impute, then select’ class of procedures that is agnostic to the type of imputation algorithm as well as penalized regression used. It also clarifies how model selection involves a multivariate regression model for causal inference problems, and that these methods can be applied for identifying subgroups in which treatment effects are homogeneous. Analogies and links with the literature on machine learning methods, missing data and imputation are drawn. A difference LASSO algorithm is defined, along with its multiple imputation analogues. The procedures are illustrated using a well-known right heart catheterization dataset. PMID:25628185

  17. Multi-Target Regression via Robust Low-Rank Learning.

    PubMed

    Zhen, Xiantong; Yu, Mengyang; He, Xiaofei; Li, Shuo

    2018-02-01

    Multi-target regression has recently regained great popularity due to its capability of simultaneously learning multiple relevant regression tasks and its wide applications in data mining, computer vision and medical image analysis, while great challenges arise from jointly handling inter-target correlations and input-output relationships. In this paper, we propose Multi-layer Multi-target Regression (MMR) which enables simultaneously modeling intrinsic inter-target correlations and nonlinear input-output relationships in a general framework via robust low-rank learning. Specifically, the MMR can explicitly encode inter-target correlations in a structure matrix by matrix elastic nets (MEN); the MMR can work in conjunction with the kernel trick to effectively disentangle highly complex nonlinear input-output relationships; the MMR can be efficiently solved by a new alternating optimization algorithm with guaranteed convergence. The MMR leverages the strength of kernel methods for nonlinear feature learning and the structural advantage of multi-layer learning architectures for inter-target correlation modeling. More importantly, it offers a new multi-layer learning paradigm for multi-target regression which is endowed with high generality, flexibility and expressive ability. Extensive experimental evaluation on 18 diverse real-world datasets demonstrates that our MMR can achieve consistently high performance and outperforms representative state-of-the-art algorithms, which shows its great effectiveness and generality for multivariate prediction.

  18. Evaluation of regression-based 3-D shoulder rhythms.

    PubMed

    Xu, Xu; Dickerson, Clark R; Lin, Jia-Hua; McGorry, Raymond W

    2016-08-01

    The movements of the humerus, the clavicle, and the scapula are not completely independent. The coupled pattern of movement of these bones is called the shoulder rhythm. To date, multiple studies have focused on providing regression-based 3-D shoulder rhythms, in which the orientations of the clavicle and the scapula are estimated by the orientation of the humerus. In this study, six existing regression-based shoulder rhythms were evaluated by an independent dataset in terms of their predictability. The datasets include the measured orientations of the humerus, the clavicle, and the scapula of 14 participants over 118 different upper arm postures. The predicted orientations of the clavicle and the scapula were derived from applying those regression-based shoulder rhythms to the humerus orientation. The results indicated that none of those regression-based shoulder rhythms provides consistently more accurate results than the others. For all the joint angles and all the shoulder rhythms, the RMSE are all greater than 5°. Among those shoulder rhythms, the scapula lateral/medial rotation has the strongest correlation between the predicted and the measured angles, while the other thoracoclavicular and thoracoscapular bone orientation angles only showed a weak to moderate correlation. Since the regression-based shoulder rhythm has been adopted for shoulder biomechanical models to estimate shoulder muscle activities and structure loads, there needs to be further investigation on how the predicted error from the shoulder rhythm affects the output of the biomechanical model. Copyright © 2015 The Authors. Published by Elsevier Ltd.. All rights reserved.

  19. Using an innovative multiple regression procedure in a cancer population (Part 1): detecting and probing relationships of common interacting symptoms (pain, fatigue/weakness, sleep problems) as a strategy to discover influential symptom pairs and clusters

    PubMed Central

    Francoeur, Richard B

    2015-01-01

    Background The majority of patients with advanced cancer experience symptom pairs or clusters among pain, fatigue, and insomnia. Improved methods are needed to detect and interpret interactions among symptoms or diesease markers to reveal influential pairs or clusters. In prior work, I developed and validated sequential residual centering (SRC), a method that improves the sensitivity of multiple regression to detect interactions among predictors, by conditioning for multicollinearity (shared variation) among interactions and component predictors. Materials and methods Using a hypothetical three-way interaction among pain, fatigue, and sleep to predict depressive affect, I derive and explain SRC multiple regression. Subsequently, I estimate raw and SRC multiple regressions using real data for these symptoms from 268 palliative radiation outpatients. Results Unlike raw regression, SRC reveals that the three-way interaction (pain × fatigue/weakness × sleep problems) is statistically significant. In follow-up analyses, the relationship between pain and depressive affect is aggravated (magnified) within two partial ranges: 1) complete-to-some control over fatigue/weakness when there is complete control over sleep problems (ie, a subset of the pain–fatigue/weakness symptom pair), and 2) no control over fatigue/weakness when there is some-to-no control over sleep problems (ie, a subset of the pain–fatigue/weakness–sleep problems symptom cluster). Otherwise, the relationship weakens (buffering) as control over fatigue/weakness or sleep problems diminishes. Conclusion By reducing the standard error, SRC unmasks a three-way interaction comprising a symptom pair and cluster. Low-to-moderate levels of the moderator variable for fatigue/weakness magnify the relationship between pain and depressive affect. However, when the comoderator variable for sleep problems accompanies fatigue/weakness, only frequent or unrelenting levels of both symptoms magnify the relationship

  20. Using an innovative multiple regression procedure in a cancer population (Part 1): detecting and probing relationships of common interacting symptoms (pain, fatigue/weakness, sleep problems) as a strategy to discover influential symptom pairs and clusters.

    PubMed

    Francoeur, Richard B

    2015-01-01

    The majority of patients with advanced cancer experience symptom pairs or clusters among pain, fatigue, and insomnia. Improved methods are needed to detect and interpret interactions among symptoms or diesease markers to reveal influential pairs or clusters. In prior work, I developed and validated sequential residual centering (SRC), a method that improves the sensitivity of multiple regression to detect interactions among predictors, by conditioning for multicollinearity (shared variation) among interactions and component predictors. Using a hypothetical three-way interaction among pain, fatigue, and sleep to predict depressive affect, I derive and explain SRC multiple regression. Subsequently, I estimate raw and SRC multiple regressions using real data for these symptoms from 268 palliative radiation outpatients. Unlike raw regression, SRC reveals that the three-way interaction (pain × fatigue/weakness × sleep problems) is statistically significant. In follow-up analyses, the relationship between pain and depressive affect is aggravated (magnified) within two partial ranges: 1) complete-to-some control over fatigue/weakness when there is complete control over sleep problems (ie, a subset of the pain-fatigue/weakness symptom pair), and 2) no control over fatigue/weakness when there is some-to-no control over sleep problems (ie, a subset of the pain-fatigue/weakness-sleep problems symptom cluster). Otherwise, the relationship weakens (buffering) as control over fatigue/weakness or sleep problems diminishes. By reducing the standard error, SRC unmasks a three-way interaction comprising a symptom pair and cluster. Low-to-moderate levels of the moderator variable for fatigue/weakness magnify the relationship between pain and depressive affect. However, when the comoderator variable for sleep problems accompanies fatigue/weakness, only frequent or unrelenting levels of both symptoms magnify the relationship. These findings suggest that a countervailing mechanism

  1. Regression techniques for oceanographic parameter retrieval using space-borne microwave radiometry

    NASA Technical Reports Server (NTRS)

    Hofer, R.; Njoku, E. G.

    1981-01-01

    Variations of conventional multiple regression techniques are applied to the problem of remote sensing of oceanographic parameters from space. The techniques are specifically adapted to the scanning multichannel microwave radiometer (SMRR) launched on the Seasat and Nimbus 7 satellites to determine ocean surface temperature, wind speed, and atmospheric water content. The retrievals are studied primarily from a theoretical viewpoint, to illustrate the retrieval error structure, the relative importances of different radiometer channels, and the tradeoffs between spatial resolution and retrieval accuracy. Comparisons between regressions using simulated and actual SMMR data are discussed; they show similar behavior.

  2. Optimized multiple linear mappings for single image super-resolution

    NASA Astrophysics Data System (ADS)

    Zhang, Kaibing; Li, Jie; Xiong, Zenggang; Liu, Xiuping; Gao, Xinbo

    2017-12-01

    Learning piecewise linear regression has been recognized as an effective way for example learning-based single image super-resolution (SR) in literature. In this paper, we employ an expectation-maximization (EM) algorithm to further improve the SR performance of our previous multiple linear mappings (MLM) based SR method. In the training stage, the proposed method starts with a set of linear regressors obtained by the MLM-based method, and then jointly optimizes the clustering results and the low- and high-resolution subdictionary pairs for regression functions by using the metric of the reconstruction errors. In the test stage, we select the optimal regressor for SR reconstruction by accumulating the reconstruction errors of m-nearest neighbors in the training set. Thorough experimental results carried on six publicly available datasets demonstrate that the proposed SR method can yield high-quality images with finer details and sharper edges in terms of both quantitative and perceptual image quality assessments.

  3. Forecasting Air Force Logistics Command Second Destination Transportation: An Application of Multiple Regression Analysis and Neural Networks

    DTIC Science & Technology

    1990-09-01

    without the help from the DSXR staff. William Lyons, Charles Ramsey , and Martin Meeks went above and beyond to help complete this research. Special...develop a valid forecasting model that is significantly more accurate than the one presently used by DSXR and suggested the development and testing of a...method, Strom tested DSXR’s iterative linear regression forecasting technique by examining P1 in the simple regression equation to determine whether

  4. Quantile regression models of animal habitat relationships

    USGS Publications Warehouse

    Cade, Brian S.

    2003-01-01

    Typically, all factors that limit an organism are not measured and included in statistical models used to investigate relationships with their environment. If important unmeasured variables interact multiplicatively with the measured variables, the statistical models often will have heterogeneous response distributions with unequal variances. Quantile regression is an approach for estimating the conditional quantiles of a response variable distribution in the linear model, providing a more complete view of possible causal relationships between variables in ecological processes. Chapter 1 introduces quantile regression and discusses the ordering characteristics, interval nature, sampling variation, weighting, and interpretation of estimates for homogeneous and heterogeneous regression models. Chapter 2 evaluates performance of quantile rankscore tests used for hypothesis testing and constructing confidence intervals for linear quantile regression estimates (0 ≤ τ ≤ 1). A permutation F test maintained better Type I errors than the Chi-square T test for models with smaller n, greater number of parameters p, and more extreme quantiles τ. Both versions of the test required weighting to maintain correct Type I errors when there was heterogeneity under the alternative model. An example application related trout densities to stream channel width:depth. Chapter 3 evaluates a drop in dispersion, F-ratio like permutation test for hypothesis testing and constructing confidence intervals for linear quantile regression estimates (0 ≤ τ ≤ 1). Chapter 4 simulates from a large (N = 10,000) finite population representing grid areas on a landscape to demonstrate various forms of hidden bias that might occur when the effect of a measured habitat variable on some animal was confounded with the effect of another unmeasured variable (spatially and not spatially structured). Depending on whether interactions of the measured habitat and unmeasured variable were negative

  5. Association analysis of multiple traits by an approach of combining P values.

    PubMed

    Chen, Lili; Wang, Yong; Zhou, Yajing

    2018-03-01

    Increasing evidence shows that one variant can affect multiple traits, which is a widespread phenomenon in complex diseases. Joint analysis of multiple traits can increase statistical power of association analysis and uncover the underlying genetic mechanism. Although there are many statistical methods to analyse multiple traits, most of these methods are usually suitable for detecting common variants associated with multiple traits. However, because of low minor allele frequency of rare variant, these methods are not optimal for rare variant association analysis. In this paper, we extend an adaptive combination of P values method (termed ADA) for single trait to test association between multiple traits and rare variants in the given region. For a given region, we use reverse regression model to test each rare variant associated with multiple traits and obtain the P value of single-variant test. Further, we take the weighted combination of these P values as the test statistic. Extensive simulation studies show that our approach is more powerful than several other comparison methods in most cases and is robust to the inclusion of a high proportion of neutral variants and the different directions of effects of causal variants.

  6. How Many Subjects Does It Take to Do a Regression Analysis?

    ERIC Educational Resources Information Center

    Green, Samuel B.

    1991-01-01

    An evaluation of the rules-of-thumb used to determine the minimum number of subjects required to conduct multiple regression analyses suggests that researchers who use a rule of thumb rather than power analyses trade simplicity of use for accuracy and specificity of response. Insufficient power is likely to result. (SLD)

  7. Criteria for the use of regression analysis for remote sensing of sediment and pollutants

    NASA Technical Reports Server (NTRS)

    Whitlock, C. H.; Kuo, C. Y.; Lecroy, S. R.

    1982-01-01

    An examination of limitations, requirements, and precision of the linear multiple-regression technique for quantification of marine environmental parameters is conducted. Both environmental and optical physics conditions have been defined for which an exact solution to the signal response equations is of the same form as the multiple regression equation. Various statistical parameters are examined to define a criteria for selection of an unbiased fit when upwelled radiance values contain error and are correlated with each other. Field experimental data are examined to define data smoothing requirements in order to satisfy the criteria of Daniel and Wood (1971). Recommendations are made concerning improved selection of ground-truth locations to maximize variance and to minimize physical errors associated with the remote sensing experiment.

  8. Bayesian quantile regression-based partially linear mixed-effects joint models for longitudinal data with multiple features.

    PubMed

    Zhang, Hanze; Huang, Yangxin; Wang, Wei; Chen, Henian; Langland-Orban, Barbara

    2017-01-01

    In longitudinal AIDS studies, it is of interest to investigate the relationship between HIV viral load and CD4 cell counts, as well as the complicated time effect. Most of common models to analyze such complex longitudinal data are based on mean-regression, which fails to provide efficient estimates due to outliers and/or heavy tails. Quantile regression-based partially linear mixed-effects models, a special case of semiparametric models enjoying benefits of both parametric and nonparametric models, have the flexibility to monitor the viral dynamics nonparametrically and detect the varying CD4 effects parametrically at different quantiles of viral load. Meanwhile, it is critical to consider various data features of repeated measurements, including left-censoring due to a limit of detection, covariate measurement error, and asymmetric distribution. In this research, we first establish a Bayesian joint models that accounts for all these data features simultaneously in the framework of quantile regression-based partially linear mixed-effects models. The proposed models are applied to analyze the Multicenter AIDS Cohort Study (MACS) data. Simulation studies are also conducted to assess the performance of the proposed methods under different scenarios.

  9. REGRESSION MODELS THAT RELATE STREAMS TO WATERSHEDS: COPING WITH NUMEROUS, COLLINEAR PEDICTORS

    EPA Science Inventory

    GIS efforts can produce a very large number of watershed variables (climate, land use/land cover and topography, all defined for multiple areas of influence) that could serve as candidate predictors in a regression model of reach-scale stream features. Invariably, many of these ...

  10. Modeling data for pancreatitis in presence of a duodenal diverticula using logistic regression

    NASA Astrophysics Data System (ADS)

    Dineva, S.; Prodanova, K.; Mlachkova, D.

    2013-12-01

    The presence of a periampullary duodenal diverticulum (PDD) is often observed during upper digestive tract barium meal studies and endoscopic retrograde cholangiopancreatography (ERCP). A few papers reported that the diverticulum had something to do with the incidence of pancreatitis. The aim of this study is to investigate if the presence of duodenal diverticula predisposes to the development of a pancreatic disease. A total 3966 patients who had undergone ERCP were studied retrospectively. They were divided into 2 groups-with and without PDD. Patients with a duodenal diverticula had a higher rate of acute pancreatitis. The duodenal diverticula is a risk factor for acute idiopathic pancreatitis. A multiple logistic regression to obtain adjusted estimate of odds and to identify if a PDD is a predictor of acute or chronic pancreatitis was performed. The software package STATISTICA 10.0 was used for analyzing the real data.

  11. A new multiple regression model to identify multi-family houses with a high prevalence of sick building symptoms "SBS", within the healthy sustainable house study in Stockholm (3H).

    PubMed

    Engvall, Karin; Hult, M; Corner, R; Lampa, E; Norbäck, D; Emenius, G

    2010-01-01

    The aim was to develop a new model to identify residential buildings with higher frequencies of "SBS" than expected, "risk buildings". In 2005, 481 multi-family buildings with 10,506 dwellings in Stockholm were studied by a new stratified random sampling. A standardised self-administered questionnaire was used to assess "SBS", atopy and personal factors. The response rate was 73%. Statistical analysis was performed by multiple logistic regressions. Dwellers owning their building reported less "SBS" than those renting. There was a strong relationship between socio-economic factors and ownership. The regression model, ended up with high explanatory values for age, gender, atopy and ownership. Applying our model, 9% of all residential buildings in Stockholm were classified as "risk buildings" with the highest proportion in houses built 1961-1975 (26%) and lowest in houses built 1985-1990 (4%). To identify "risk buildings", it is necessary to adjust for ownership and population characteristics.

  12. Improved Correction of Atmospheric Pressure Data Obtained by Smartphones through Machine Learning

    PubMed Central

    Kim, Yong-Hyuk; Ha, Ji-Hun; Kim, Na-Young; Im, Hyo-Hyuc; Sim, Sangjin; Choi, Reno K. Y.

    2016-01-01

    A correction method using machine learning aims to improve the conventional linear regression (LR) based method for correction of atmospheric pressure data obtained by smartphones. The method proposed in this study conducts clustering and regression analysis with time domain classification. Data obtained in Gyeonggi-do, one of the most populous provinces in South Korea surrounding Seoul with the size of 10,000 km2, from July 2014 through December 2014, using smartphones were classified with respect to time of day (daytime or nighttime) as well as day of the week (weekday or weekend) and the user's mobility, prior to the expectation-maximization (EM) clustering. Subsequently, the results were analyzed for comparison by applying machine learning methods such as multilayer perceptron (MLP) and support vector regression (SVR). The results showed a mean absolute error (MAE) 26% lower on average when regression analysis was performed through EM clustering compared to that obtained without EM clustering. For machine learning methods, the MAE for SVR was around 31% lower for LR and about 19% lower for MLP. It is concluded that pressure data from smartphones are as good as the ones from national automatic weather station (AWS) network. PMID:27524999

  13. Study of Rapid-Regression Liquefying Hybrid Rocket Fuels

    NASA Technical Reports Server (NTRS)

    Zilliac, Greg; DeZilwa, Shane; Karabeyoglu, M. Arif; Cantwell, Brian J.; Castellucci, Paul

    2004-01-01

    A report describes experiments directed toward the development of paraffin-based hybrid rocket fuels that burn at regression rates greater than those of conventional hybrid rocket fuels like hydroxyl-terminated butadiene. The basic approach followed in this development is to use materials such that a hydrodynamically unstable liquid layer forms on the melting surface of a burning fuel body. Entrainment of droplets from the liquid/gas interface can substantially increase the rate of fuel mass transfer, leading to surface regression faster than can be achieved using conventional fuels. The higher regression rate eliminates the need for the complex multi-port grain structures of conventional solid rocket fuels, making it possible to obtain acceptable performance from single-port structures. The high-regression-rate fuels contain no toxic or otherwise hazardous components and can be shipped commercially as non-hazardous commodities. Among the experiments performed on these fuels were scale-up tests using gaseous oxygen. The data from these tests were found to agree with data from small-scale, low-pressure and low-mass-flux laboratory tests and to confirm the expectation that these fuels would burn at high regression rates, chamber pressures, and mass fluxes representative of full-scale rocket motors.

  14. Use of probabilistic weights to enhance linear regression myoelectric control

    NASA Astrophysics Data System (ADS)

    Smith, Lauren H.; Kuiken, Todd A.; Hargrove, Levi J.

    2015-12-01

    Objective. Clinically available prostheses for transradial amputees do not allow simultaneous myoelectric control of degrees of freedom (DOFs). Linear regression methods can provide simultaneous myoelectric control, but frequently also result in difficulty with isolating individual DOFs when desired. This study evaluated the potential of using probabilistic estimates of categories of gross prosthesis movement, which are commonly used in classification-based myoelectric control, to enhance linear regression myoelectric control. Approach. Gaussian models were fit to electromyogram (EMG) feature distributions for three movement classes at each DOF (no movement, or movement in either direction) and used to weight the output of linear regression models by the probability that the user intended the movement. Eight able-bodied and two transradial amputee subjects worked in a virtual Fitts’ law task to evaluate differences in controllability between linear regression and probability-weighted regression for an intramuscular EMG-based three-DOF wrist and hand system. Main results. Real-time and offline analyses in able-bodied subjects demonstrated that probability weighting improved performance during single-DOF tasks (p < 0.05) by preventing extraneous movement at additional DOFs. Similar results were seen in experiments with two transradial amputees. Though goodness-of-fit evaluations suggested that the EMG feature distributions showed some deviations from the Gaussian, equal-covariance assumptions used in this experiment, the assumptions were sufficiently met to provide improved performance compared to linear regression control. Significance. Use of probability weights can improve the ability to isolate individual during linear regression myoelectric control, while maintaining the ability to simultaneously control multiple DOFs.

  15. Linear regression analysis: part 14 of a series on evaluation of scientific publications.

    PubMed

    Schneider, Astrid; Hommel, Gerhard; Blettner, Maria

    2010-11-01

    Regression analysis is an important statistical method for the analysis of medical data. It enables the identification and characterization of relationships among multiple factors. It also enables the identification of prognostically relevant risk factors and the calculation of risk scores for individual prognostication. This article is based on selected textbooks of statistics, a selective review of the literature, and our own experience. After a brief introduction of the uni- and multivariable regression models, illustrative examples are given to explain what the important considerations are before a regression analysis is performed, and how the results should be interpreted. The reader should then be able to judge whether the method has been used correctly and interpret the results appropriately. The performance and interpretation of linear regression analysis are subject to a variety of pitfalls, which are discussed here in detail. The reader is made aware of common errors of interpretation through practical examples. Both the opportunities for applying linear regression analysis and its limitations are presented.

  16. Identifying the Factors That Influence Change in SEBD Using Logistic Regression Analysis

    ERIC Educational Resources Information Center

    Camilleri, Liberato; Cefai, Carmel

    2013-01-01

    Multiple linear regression and ANOVA models are widely used in applications since they provide effective statistical tools for assessing the relationship between a continuous dependent variable and several predictors. However these models rely heavily on linearity and normality assumptions and they do not accommodate categorical dependent…

  17. Aspects of porosity prediction using multivariate linear regression

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Byrnes, A.P.; Wilson, M.D.

    1991-03-01

    Highly accurate multiple linear regression models have been developed for sandstones of diverse compositions. Porosity reduction or enhancement processes are controlled by the fundamental variables, Pressure (P), Temperature (T), Time (t), and Composition (X), where composition includes mineralogy, size, sorting, fluid composition, etc. The multiple linear regression equation, of which all linear porosity prediction models are subsets, takes the generalized form: Porosity = C{sub 0} + C{sub 1}(P) + C{sub 2}(T) + C{sub 3}(X) + C{sub 4}(t) + C{sub 5}(PT) + C{sub 6}(PX) + C{sub 7}(Pt) + C{sub 8}(TX) + C{sub 9}(Tt) + C{sub 10}(Xt) + C{sub 11}(PTX) + C{submore » 12}(PXt) + C{sub 13}(PTt) + C{sub 14}(TXt) + C{sub 15}(PTXt). The first four primary variables are often interactive, thus requiring terms involving two or more primary variables (the form shown implies interaction and not necessarily multiplication). The final terms used may also involve simple mathematic transforms such as log X, e{sup T}, X{sup 2}, or more complex transformations such as the Time-Temperature Index (TTI). The X term in the equation above represents a suite of compositional variable and, therefore, a fully expanded equation may include a series of terms incorporating these variables. Numerous published bivariate porosity prediction models involving P (or depth) or Tt (TTI) are effective to a degree, largely because of the high degree of colinearity between p and TTI. However, all such bivariate models ignore the unique contributions of P and Tt, as well as various X terms. These simpler models become poor predictors in regions where colinear relations change, were important variables have been ignored, or where the database does not include a sufficient range or weight distribution for the critical variables.« less

  18. Does Bootstrap Procedure Provide Biased Estimates? An Empirical Examination for a Case of Multiple Regression.

    ERIC Educational Resources Information Center

    Fan, Xitao

    This paper empirically and systematically assessed the performance of bootstrap resampling procedure as it was applied to a regression model. Parameter estimates from Monte Carlo experiments (repeated sampling from population) and bootstrap experiments (repeated resampling from one original bootstrap sample) were generated and compared. Sample…

  19. Robust Multiple Linear Regression.

    DTIC Science & Technology

    1982-12-01

    difficulty, but it might have more solutions corresponding to local minima. Influence Function of M-Estimates The influence function describes the effect...distributionn n function. In case of M-Estimates the influence function was found to be pro- portional to and given as T(X F)) " C(xpF,T) = .(X.T(F) F(dx...where the inverse of any distribution function F is defined in the usual way as F- (s) = inf{x IF(x) > s) 0<sə Influence Function of L-Estimates In a

  20. Feature Selection for Ridge Regression with Provable Guarantees.

    PubMed

    Paul, Saurabh; Drineas, Petros

    2016-04-01

    We introduce single-set spectral sparsification as a deterministic sampling-based feature selection technique for regularized least-squares classification, which is the classification analog to ridge regression. The method is unsupervised and gives worst-case guarantees of the generalization power of the classification function after feature selection with respect to the classification function obtained using all features. We also introduce leverage-score sampling as an unsupervised randomized feature selection method for ridge regression. We provide risk bounds for both single-set spectral sparsification and leverage-score sampling on ridge regression in the fixed design setting and show that the risk in the sampled space is comparable to the risk in the full-feature space. We perform experiments on synthetic and real-world data sets; a subset of TechTC-300 data sets, to support our theory. Experimental results indicate that the proposed methods perform better than the existing feature selection methods.

  1. Reliability of the Load-Velocity Relationship Obtained Through Linear and Polynomial Regression Models to Predict the One-Repetition Maximum Load.

    PubMed

    Pestaña-Melero, Francisco Luis; Haff, G Gregory; Rojas, Francisco Javier; Pérez-Castilla, Alejandro; García-Ramos, Amador

    2017-12-18

    This study aimed to compare the between-session reliability of the load-velocity relationship between (1) linear vs. polynomial regression models, (2) concentric-only vs. eccentric-concentric bench press variants, as well as (3) the within-participants vs. the between-participants variability of the velocity attained at each percentage of the one-repetition maximum (%1RM). The load-velocity relationship of 30 men (age: 21.2±3.8 y; height: 1.78±0.07 m, body mass: 72.3±7.3 kg; bench press 1RM: 78.8±13.2 kg) were evaluated by means of linear and polynomial regression models in the concentric-only and eccentric-concentric bench press variants in a Smith Machine. Two sessions were performed with each bench press variant. The main findings were: (1) first-order-polynomials (CV: 4.39%-4.70%) provided the load-velocity relationship with higher reliability than second-order-polynomials (CV: 4.68%-5.04%); (2) the reliability of the load-velocity relationship did not differ between the concentric-only and eccentric-concentric bench press variants; (3) the within-participants variability of the velocity attained at each %1RM was markedly lower than the between-participants variability. Taken together, these results highlight that, regardless of the bench press variant considered, the individual determination of the load-velocity relationship by a linear regression model could be recommended to monitor and prescribe the relative load in the Smith machine bench press exercise.

  2. Artificial neural networks environmental forecasting in comparison with multiple linear regression technique: From heavy metals to organic micropollutants screening in agricultural soils

    NASA Astrophysics Data System (ADS)

    Bonelli, Maria Grazia; Ferrini, Mauro; Manni, Andrea

    2016-12-01

    The assessment of metals and organic micropollutants contamination in agricultural soils is a difficult challenge due to the extensive area used to collect and analyze a very large number of samples. With Dioxins and dioxin-like PCBs measurement methods and subsequent the treatment of data, the European Community advises the develop low-cost and fast methods allowing routing analysis of a great number of samples, providing rapid measurement of these compounds in the environment, feeds and food. The aim of the present work has been to find a method suitable to describe the relations occurring between organic and inorganic contaminants and use the value of the latter in order to forecast the former. In practice, the use of a metal portable soil analyzer coupled with an efficient statistical procedure enables the required objective to be achieved. Compared to Multiple Linear Regression, the Artificial Neural Networks technique has shown to be an excellent forecasting method, though there is no linear correlation between the variables to be analyzed.

  3. Cytologic regression in women with atypical squamous cells of unknown significance and negative human papillomavirus test.

    PubMed

    Wang, Shu; Lang, Jing He; Cheng, Xue Mei

    2009-12-01

    The aim of this study was to investigate the cytologic regression in women with atypical squamous cells of unknown significance and negative high-risk human papillomavirus test. The 45 women with atypical squamous cells of unknown significance and negative high-risk human papillomavirus at baseline were analyzed about the cytologic regression during 2 years of follow-up. The cumulative rate of cytologic regression was calculated by Kaplan-Meier curves. Of 45 women, the cumulative rates were as follows: 55.6% obtained cytologic regression before 6 months, 84.4% by 1 year, and 95.6% at 2 years. Cytologic regression was not influenced by age, menopausal status, and baseline human papillomavirus load. However, the 1-year cumulative regression rate in women with previous cervical lesions was significantly lower than those without (P=.02), even much lower in women with high-grade intraepithelial neoplasia or worse (P=.008). Most women with atypical squamous cells of unknown significance and negative high-risk human papillomavirus could obtain cytologic regression within 2 years. Women with antecedent cervical lesions need longer time to reach this regression.

  4. Testing homogeneity in Weibull-regression models.

    PubMed

    Bolfarine, Heleno; Valença, Dione M

    2005-10-01

    In survival studies with families or geographical units it may be of interest testing whether such groups are homogeneous for given explanatory variables. In this paper we consider score type tests for group homogeneity based on a mixing model in which the group effect is modelled as a random variable. As opposed to hazard-based frailty models, this model presents survival times that conditioned on the random effect, has an accelerated failure time representation. The test statistics requires only estimation of the conventional regression model without the random effect and does not require specifying the distribution of the random effect. The tests are derived for a Weibull regression model and in the uncensored situation, a closed form is obtained for the test statistic. A simulation study is used for comparing the power of the tests. The proposed tests are applied to real data sets with censored data.

  5. Analysis of Multivariate Experimental Data Using A Simplified Regression Model Search Algorithm

    NASA Technical Reports Server (NTRS)

    Ulbrich, Norbert M.

    2013-01-01

    A new regression model search algorithm was developed that may be applied to both general multivariate experimental data sets and wind tunnel strain-gage balance calibration data. The algorithm is a simplified version of a more complex algorithm that was originally developed for the NASA Ames Balance Calibration Laboratory. The new algorithm performs regression model term reduction to prevent overfitting of data. It has the advantage that it needs only about one tenth of the original algorithm's CPU time for the completion of a regression model search. In addition, extensive testing showed that the prediction accuracy of math models obtained from the simplified algorithm is similar to the prediction accuracy of math models obtained from the original algorithm. The simplified algorithm, however, cannot guarantee that search constraints related to a set of statistical quality requirements are always satisfied in the optimized regression model. Therefore, the simplified algorithm is not intended to replace the original algorithm. Instead, it may be used to generate an alternate optimized regression model of experimental data whenever the application of the original search algorithm fails or requires too much CPU time. Data from a machine calibration of NASA's MK40 force balance is used to illustrate the application of the new search algorithm.

  6. Learning accurate and interpretable models based on regularized random forests regression

    PubMed Central

    2014-01-01

    Background Many biology related research works combine data from multiple sources in an effort to understand the underlying problems. It is important to find and interpret the most important information from these sources. Thus it will be beneficial to have an effective algorithm that can simultaneously extract decision rules and select critical features for good interpretation while preserving the prediction performance. Methods In this study, we focus on regression problems for biological data where target outcomes are continuous. In general, models constructed from linear regression approaches are relatively easy to interpret. However, many practical biological applications are nonlinear in essence where we can hardly find a direct linear relationship between input and output. Nonlinear regression techniques can reveal nonlinear relationship of data, but are generally hard for human to interpret. We propose a rule based regression algorithm that uses 1-norm regularized random forests. The proposed approach simultaneously extracts a small number of rules from generated random forests and eliminates unimportant features. Results We tested the approach on some biological data sets. The proposed approach is able to construct a significantly smaller set of regression rules using a subset of attributes while achieving prediction performance comparable to that of random forests regression. Conclusion It demonstrates high potential in aiding prediction and interpretation of nonlinear relationships of the subject being studied. PMID:25350120

  7. Climate variations and salmonellosis transmission in Adelaide, South Australia: a comparison between regression models

    NASA Astrophysics Data System (ADS)

    Zhang, Ying; Bi, Peng; Hiller, Janet

    2008-01-01

    This is the first study to identify appropriate regression models for the association between climate variation and salmonellosis transmission. A comparison between different regression models was conducted using surveillance data in Adelaide, South Australia. By using notified salmonellosis cases and climatic variables from the Adelaide metropolitan area over the period 1990-2003, four regression methods were examined: standard Poisson regression, autoregressive adjusted Poisson regression, multiple linear regression, and a seasonal autoregressive integrated moving average (SARIMA) model. Notified salmonellosis cases in 2004 were used to test the forecasting ability of the four models. Parameter estimation, goodness-of-fit and forecasting ability of the four regression models were compared. Temperatures occurring 2 weeks prior to cases were positively associated with cases of salmonellosis. Rainfall was also inversely related to the number of cases. The comparison of the goodness-of-fit and forecasting ability suggest that the SARIMA model is better than the other three regression models. Temperature and rainfall may be used as climatic predictors of salmonellosis cases in regions with climatic characteristics similar to those of Adelaide. The SARIMA model could, thus, be adopted to quantify the relationship between climate variations and salmonellosis transmission.

  8. Comparing Effects of Biologic Agents in Treating Patients with Rheumatoid Arthritis: A Multiple Treatment Comparison Regression Analysis.

    PubMed

    Tvete, Ingunn Fride; Natvig, Bent; Gåsemyr, Jørund; Meland, Nils; Røine, Marianne; Klemp, Marianne

    2015-01-01

    Rheumatoid arthritis patients have been treated with disease modifying anti-rheumatic drugs (DMARDs) and the newer biologic drugs. We sought to compare and rank the biologics with respect to efficacy. We performed a literature search identifying 54 publications encompassing 9 biologics. We conducted a multiple treatment comparison regression analysis letting the number experiencing a 50% improvement on the ACR score be dependent upon dose level and disease duration for assessing the comparable relative effect between biologics and placebo or DMARD. The analysis embraced all treatment and comparator arms over all publications. Hence, all measured effects of any biologic agent contributed to the comparison of all biologic agents relative to each other either given alone or combined with DMARD. We found the drug effect to be dependent on dose level, but not on disease duration, and the impact of a high versus low dose level was the same for all drugs (higher doses indicated a higher frequency of ACR50 scores). The ranking of the drugs when given without DMARD was certolizumab (ranked highest), etanercept, tocilizumab/ abatacept and adalimumab. The ranking of the drugs when given with DMARD was certolizumab (ranked highest), tocilizumab, anakinra/rituximab, golimumab/ infliximab/ abatacept, adalimumab/ etanercept [corrected]. Still, all drugs were effective. All biologic agents were effective compared to placebo, with certolizumab the most effective and adalimumab (without DMARD treatment) and adalimumab/ etanercept (combined with DMARD treatment) the least effective. The drugs were in general more effective, except for etanercept, when given together with DMARDs.

  9. Comparing Effects of Biologic Agents in Treating Patients with Rheumatoid Arthritis: A Multiple Treatment Comparison Regression Analysis

    PubMed Central

    Tvete, Ingunn Fride; Natvig, Bent; Gåsemyr, Jørund; Meland, Nils; Røine, Marianne; Klemp, Marianne

    2015-01-01

    Rheumatoid arthritis patients have been treated with disease modifying anti-rheumatic drugs (DMARDs) and the newer biologic drugs. We sought to compare and rank the biologics with respect to efficacy. We performed a literature search identifying 54 publications encompassing 9 biologics. We conducted a multiple treatment comparison regression analysis letting the number experiencing a 50% improvement on the ACR score be dependent upon dose level and disease duration for assessing the comparable relative effect between biologics and placebo or DMARD. The analysis embraced all treatment and comparator arms over all publications. Hence, all measured effects of any biologic agent contributed to the comparison of all biologic agents relative to each other either given alone or combined with DMARD. We found the drug effect to be dependent on dose level, but not on disease duration, and the impact of a high versus low dose level was the same for all drugs (higher doses indicated a higher frequency of ACR50 scores). The ranking of the drugs when given without DMARD was certolizumab (ranked highest), etanercept, tocilizumab/ abatacept and adalimumab. The ranking of the drugs when given with DMARD was certolizumab (ranked highest), tocilizumab, anakinra, rituximab, golimumab/ infliximab/ abatacept, adalimumab/ etanercept. Still, all drugs were effective. All biologic agents were effective compared to placebo, with certolizumab the most effective and adalimumab (without DMARD treatment) and adalimumab/ etanercept (combined with DMARD treatment) the least effective. The drugs were in general more effective, except for etanercept, when given together with DMARDs. PMID:26356639

  10. Predicting punching acceleration from selected strength and power variables in elite karate athletes: a multiple regression analysis.

    PubMed

    Loturco, Irineu; Artioli, Guilherme Giannini; Kobal, Ronaldo; Gil, Saulo; Franchini, Emerson

    2014-07-01

    This study investigated the relationship between punching acceleration and selected strength and power variables in 19 professional karate athletes from the Brazilian National Team (9 men and 10 women; age, 23 ± 3 years; height, 1.71 ± 0.09 m; and body mass [BM], 67.34 ± 13.44 kg). Punching acceleration was assessed under 4 different conditions in a randomized order: (a) fixed distance aiming to attain maximum speed (FS), (b) fixed distance aiming to attain maximum impact (FI), (c) self-selected distance aiming to attain maximum speed, and (d) self-selected distance aiming to attain maximum impact. The selected strength and power variables were as follows: maximal dynamic strength in bench press and squat-machine, squat and countermovement jump height, mean propulsive power in bench throw and jump squat, and mean propulsive velocity in jump squat with 40% of BM. Upper- and lower-body power and maximal dynamic strength variables were positively correlated to punch acceleration in all conditions. Multiple regression analysis also revealed predictive variables: relative mean propulsive power in squat jump (W·kg-1), and maximal dynamic strength 1 repetition maximum in both bench press and squat-machine exercises. An impact-oriented instruction and a self-selected distance to start the movement seem to be crucial to reach the highest acceleration during punching execution. This investigation, while demonstrating strong correlations between punching acceleration and strength-power variables, also provides important information for coaches, especially for designing better training strategies to improve punching speed.

  11. Skeletal height estimation from regression analysis of sternal lengths in a Northwest Indian population of Chandigarh region: a postmortem study.

    PubMed

    Singh, Jagmahender; Pathak, R K; Chavali, Krishnadutt H

    2011-03-20

    Skeletal height estimation from regression analysis of eight sternal lengths in the subjects of Chandigarh zone of Northwest India is the topic of discussion in this study. Analysis of eight sternal lengths (length of manubrium, length of mesosternum, combined length of manubrium and mesosternum, total sternal length and first four intercostals lengths of mesosternum) measured from 252 male and 91 female sternums obtained at postmortems revealed that mean cadaver stature and sternal lengths were more in North Indians and males than the South Indians and females. Except intercostal lengths, all the sternal lengths were positively correlated with stature of the deceased in both sexes (P < 0.001). The multiple regression analysis of sternal lengths was found more useful than the linear regression for stature estimation. Using multivariate regression analysis, the combined length of manubrium and mesosternum in both sexes and the length of manubrium along with 2nd and 3rd intercostal lengths of mesosternum in males were selected as best estimators of stature. Nonetheless, the stature of males can be predicted with SEE of 6.66 (R(2) = 0.16, r = 0.318) from combination of MBL+BL_3+LM+BL_2, and in females from MBL only, it can be estimated with SEE of 6.65 (R(2) = 0.10, r = 0.318), whereas from the multiple regression analysis of pooled data, stature can be known with SEE of 6.97 (R(2) = 0.387, r = 575) from the combination of MBL+LM+BL_2+TSL+BL_3. The R(2) and F-ratio were found to be statistically significant for almost all the variables in both the sexes, except 4th intercostal length in males and 2nd to 4th intercostal lengths in females. The 'major' sternal lengths were more useful than the 'minor' ones for stature estimation The universal regression analysis used by Kanchan et al. [39] when applied to sternal lengths, gave satisfactory estimates of stature for males only but female stature was comparatively better estimated from simple linear regressions. But they

  12. Prediction models for CO2 emission in Malaysia using best subsets regression and multi-linear regression

    NASA Astrophysics Data System (ADS)

    Tan, C. H.; Matjafri, M. Z.; Lim, H. S.

    2015-10-01

    This paper presents the prediction models which analyze and compute the CO2 emission in Malaysia. Each prediction model for CO2 emission will be analyzed based on three main groups which is transportation, electricity and heat production as well as residential buildings and commercial and public services. The prediction models were generated using data obtained from World Bank Open Data. Best subset method will be used to remove irrelevant data and followed by multi linear regression to produce the prediction models. From the results, high R-square (prediction) value was obtained and this implies that the models are reliable to predict the CO2 emission by using specific data. In addition, the CO2 emissions from these three groups are forecasted using trend analysis plots for observation purpose.

  13. Penalized spline estimation for functional coefficient regression models.

    PubMed

    Cao, Yanrong; Lin, Haiqun; Wu, Tracy Z; Yu, Yan

    2010-04-01

    The functional coefficient regression models assume that the regression coefficients vary with some "threshold" variable, providing appreciable flexibility in capturing the underlying dynamics in data and avoiding the so-called "curse of dimensionality" in multivariate nonparametric estimation. We first investigate the estimation, inference, and forecasting for the functional coefficient regression models with dependent observations via penalized splines. The P-spline approach, as a direct ridge regression shrinkage type global smoothing method, is computationally efficient and stable. With established fixed-knot asymptotics, inference is readily available. Exact inference can be obtained for fixed smoothing parameter λ, which is most appealing for finite samples. Our penalized spline approach gives an explicit model expression, which also enables multi-step-ahead forecasting via simulations. Furthermore, we examine different methods of choosing the important smoothing parameter λ: modified multi-fold cross-validation (MCV), generalized cross-validation (GCV), and an extension of empirical bias bandwidth selection (EBBS) to P-splines. In addition, we implement smoothing parameter selection using mixed model framework through restricted maximum likelihood (REML) for P-spline functional coefficient regression models with independent observations. The P-spline approach also easily allows different smoothness for different functional coefficients, which is enabled by assigning different penalty λ accordingly. We demonstrate the proposed approach by both simulation examples and a real data application.

  14. Functional mixture regression.

    PubMed

    Yao, Fang; Fu, Yuejiao; Lee, Thomas C M

    2011-04-01

    In functional linear models (FLMs), the relationship between the scalar response and the functional predictor process is often assumed to be identical for all subjects. Motivated by both practical and methodological considerations, we relax this assumption and propose a new class of functional regression models that allow the regression structure to vary for different groups of subjects. By projecting the predictor process onto its eigenspace, the new functional regression model is simplified to a framework that is similar to classical mixture regression models. This leads to the proposed approach named as functional mixture regression (FMR). The estimation of FMR can be readily carried out using existing software implemented for functional principal component analysis and mixture regression. The practical necessity and performance of FMR are illustrated through applications to a longevity analysis of female medflies and a human growth study. Theoretical investigations concerning the consistent estimation and prediction properties of FMR along with simulation experiments illustrating its empirical properties are presented in the supplementary material available at Biostatistics online. Corresponding results demonstrate that the proposed approach could potentially achieve substantial gains over traditional FLMs.

  15. Application of Multi-task Lasso Regression in the Parametrization of Stellar Spectra

    NASA Astrophysics Data System (ADS)

    Chang, Li-Na; Zhang, Pei-Ai

    2015-07-01

    The multi-task learning approaches have attracted the increasing attention in the fields of machine learning, computer vision, and artificial intelligence. By utilizing the correlations in tasks, learning multiple related tasks simultaneously is better than learning each task independently. An efficient multi-task Lasso (Least Absolute Shrinkage Selection and Operator) regression algorithm is proposed in this paper to estimate the physical parameters of stellar spectra. It not only can obtain the information about the common features of the different physical parameters, but also can preserve effectively their own peculiar features. Experiments were done based on the ELODIE synthetic spectral data simulated with the stellar atmospheric model, and on the SDSS data released by the American large-scale survey Sloan. The estimation precision of our model is better than those of the methods in the related literature, especially for the estimates of the gravitational acceleration (lg g) and the chemical abundance ([Fe/H]). In the experiments we changed the spectral resolution, and applied the noises with different signal-to-noise ratios (SNRs) to the spectral data, so as to illustrate the stability of the model. The results show that the model is influenced by both the resolution and the noise. But the influence of the noise is larger than that of the resolution. In general, the multi-task Lasso regression algorithm is easy to operate, it has a strong stability, and can also improve the overall prediction accuracy of the model.

  16. Accounting for measurement error in log regression models with applications to accelerated testing.

    PubMed

    Richardson, Robert; Tolley, H Dennis; Evenson, William E; Lunt, Barry M

    2018-01-01

    In regression settings, parameter estimates will be biased when the explanatory variables are measured with error. This bias can significantly affect modeling goals. In particular, accelerated lifetime testing involves an extrapolation of the fitted model, and a small amount of bias in parameter estimates may result in a significant increase in the bias of the extrapolated predictions. Additionally, bias may arise when the stochastic component of a log regression model is assumed to be multiplicative when the actual underlying stochastic component is additive. To account for these possible sources of bias, a log regression model with measurement error and additive error is approximated by a weighted regression model which can be estimated using Iteratively Re-weighted Least Squares. Using the reduced Eyring equation in an accelerated testing setting, the model is compared to previously accepted approaches to modeling accelerated testing data with both simulations and real data.

  17. A note on variance estimation in random effects meta-regression.

    PubMed

    Sidik, Kurex; Jonkman, Jeffrey N

    2005-01-01

    For random effects meta-regression inference, variance estimation for the parameter estimates is discussed. Because estimated weights are used for meta-regression analysis in practice, the assumed or estimated covariance matrix used in meta-regression is not strictly correct, due to possible errors in estimating the weights. Therefore, this note investigates the use of a robust variance estimation approach for obtaining variances of the parameter estimates in random effects meta-regression inference. This method treats the assumed covariance matrix of the effect measure variables as a working covariance matrix. Using an example of meta-analysis data from clinical trials of a vaccine, the robust variance estimation approach is illustrated in comparison with two other methods of variance estimation. A simulation study is presented, comparing the three methods of variance estimation in terms of bias and coverage probability. We find that, despite the seeming suitability of the robust estimator for random effects meta-regression, the improved variance estimator of Knapp and Hartung (2003) yields the best performance among the three estimators, and thus may provide the best protection against errors in the estimated weights.

  18. Higher-order Multivariable Polynomial Regression to Estimate Human Affective States

    NASA Astrophysics Data System (ADS)

    Wei, Jie; Chen, Tong; Liu, Guangyuan; Yang, Jiemin

    2016-03-01

    From direct observations, facial, vocal, gestural, physiological, and central nervous signals, estimating human affective states through computational models such as multivariate linear-regression analysis, support vector regression, and artificial neural network, have been proposed in the past decade. In these models, linear models are generally lack of precision because of ignoring intrinsic nonlinearities of complex psychophysiological processes; and nonlinear models commonly adopt complicated algorithms. To improve accuracy and simplify model, we introduce a new computational modeling method named as higher-order multivariable polynomial regression to estimate human affective states. The study employs standardized pictures in the International Affective Picture System to induce thirty subjects’ affective states, and obtains pure affective patterns of skin conductance as input variables to the higher-order multivariable polynomial model for predicting affective valence and arousal. Experimental results show that our method is able to obtain efficient correlation coefficients of 0.98 and 0.96 for estimation of affective valence and arousal, respectively. Moreover, the method may provide certain indirect evidences that valence and arousal have their brain’s motivational circuit origins. Thus, the proposed method can serve as a novel one for efficiently estimating human affective states.

  19. Higher-order Multivariable Polynomial Regression to Estimate Human Affective States

    PubMed Central

    Wei, Jie; Chen, Tong; Liu, Guangyuan; Yang, Jiemin

    2016-01-01

    From direct observations, facial, vocal, gestural, physiological, and central nervous signals, estimating human affective states through computational models such as multivariate linear-regression analysis, support vector regression, and artificial neural network, have been proposed in the past decade. In these models, linear models are generally lack of precision because of ignoring intrinsic nonlinearities of complex psychophysiological processes; and nonlinear models commonly adopt complicated algorithms. To improve accuracy and simplify model, we introduce a new computational modeling method named as higher-order multivariable polynomial regression to estimate human affective states. The study employs standardized pictures in the International Affective Picture System to induce thirty subjects’ affective states, and obtains pure affective patterns of skin conductance as input variables to the higher-order multivariable polynomial model for predicting affective valence and arousal. Experimental results show that our method is able to obtain efficient correlation coefficients of 0.98 and 0.96 for estimation of affective valence and arousal, respectively. Moreover, the method may provide certain indirect evidences that valence and arousal have their brain’s motivational circuit origins. Thus, the proposed method can serve as a novel one for efficiently estimating human affective states. PMID:26996254

  20. Radiologic assessment of third molar tooth and spheno-occipital synchondrosis for age estimation: a multiple regression analysis study.

    PubMed

    Demirturk Kocasarac, Husniye; Sinanoglu, Alper; Noujeim, Marcel; Helvacioglu Yigit, Dilek; Baydemir, Canan

    2016-05-01

    For forensic age estimation, radiographic assessment of third molar mineralization is important between 14 and 21 years which coincides with the legal age in most countries. The spheno-occipital synchondrosis (SOS) is an important growth site during development, and its use for age estimation is beneficial when combined with other markers. In this study, we aimed to develop a regression model to estimate and narrow the age range based on the radiologic assessment of third molar and SOS in a Turkish subpopulation. Panoramic radiographs and cone beam CT scans of 349 subjects (182 males, 167 females) with age between 8 and 25 were evaluated. Four-stage system was used to evaluate the fusion degree of SOS, and Demirjian's eight stages of development for calcification for third molars. The Pearson correlation indicated a strong positive relationship between age and third molar calcification for both sexes (r = 0.850 for females, r = 0.839 for males, P < 0.001) and also between age and SOS fusion for females (r = 0.814), but a moderate relationship was found for males (r = 0.599), P < 0.001). Based on the results obtained, an age determination formula using these scores was established.

  1. ERP correlates of word production predictors in picture naming: a trial by trial multiple regression analysis from stimulus onset to response.

    PubMed

    Valente, Andrea; Bürki, Audrey; Laganaro, Marina

    2014-01-01

    A major effort in cognitive neuroscience of language is to define the temporal and spatial characteristics of the core cognitive processes involved in word production. One approach consists in studying the effects of linguistic and pre-linguistic variables in picture naming tasks. So far, studies have analyzed event-related potentials (ERPs) during word production by examining one or two variables with factorial designs. Here we extended this approach by investigating simultaneously the effects of multiple theoretical relevant predictors in a picture naming task. High density EEG was recorded on 31 participants during overt naming of 100 pictures. ERPs were extracted on a trial by trial basis from picture onset to 100 ms before the onset of articulation. Mixed-effects regression models were conducted to examine which variables affected production latencies and the duration of periods of stable electrophysiological patterns (topographic maps). Results revealed an effect of a pre-linguistic variable, visual complexity, on an early period of stable electric field at scalp, from 140 to 180 ms after picture presentation, a result consistent with the proposal that this time period is associated with visual object recognition processes. Three other variables, word Age of Acquisition, Name Agreement, and Image Agreement influenced response latencies and modulated ERPs from ~380 ms to the end of the analyzed period. These results demonstrate that a topographic analysis fitted into the single trial ERPs and covering the entire processing period allows one to associate the cost generated by psycholinguistic variables to the duration of specific stable electrophysiological processes and to pinpoint the precise time-course of multiple word production predictors at once.

  2. A Semiparametric Change-Point Regression Model for Longitudinal Observations.

    PubMed

    Xing, Haipeng; Ying, Zhiliang

    2012-12-01

    Many longitudinal studies involve relating an outcome process to a set of possibly time-varying covariates, giving rise to the usual regression models for longitudinal data. When the purpose of the study is to investigate the covariate effects when experimental environment undergoes abrupt changes or to locate the periods with different levels of covariate effects, a simple and easy-to-interpret approach is to introduce change-points in regression coefficients. In this connection, we propose a semiparametric change-point regression model, in which the error process (stochastic component) is nonparametric and the baseline mean function (functional part) is completely unspecified, the observation times are allowed to be subject-specific, and the number, locations and magnitudes of change-points are unknown and need to be estimated. We further develop an estimation procedure which combines the recent advance in semiparametric analysis based on counting process argument and multiple change-points inference, and discuss its large sample properties, including consistency and asymptotic normality, under suitable regularity conditions. Simulation results show that the proposed methods work well under a variety of scenarios. An application to a real data set is also given.

  3. Association between resting-state brain network topological organization and creative ability: Evidence from a multiple linear regression model.

    PubMed

    Jiao, Bingqing; Zhang, Delong; Liang, Aiying; Liang, Bishan; Wang, Zengjian; Li, Junchao; Cai, Yuxuan; Gao, Mengxia; Gao, Zhenni; Chang, Song; Huang, Ruiwang; Liu, Ming

    2017-10-01

    Previous studies have indicated a tight linkage between resting-state functional connectivity of the human brain and creative ability. This study aimed to further investigate the association between the topological organization of resting-state brain networks and creativity. Therefore, we acquired resting-state fMRI data from 22 high-creativity participants and 22 low-creativity participants (as determined by their Torrance Tests of Creative Thinking scores). We then constructed functional brain networks for each participant and assessed group differences in network topological properties before exploring the relationships between respective network topological properties and creative ability. We identified an optimized organization of intrinsic brain networks in both groups. However, compared with low-creativity participants, high-creativity participants exhibited increased global efficiency and substantially decreased path length, suggesting increased efficiency of information transmission across brain networks in creative individuals. Using a multiple linear regression model, we further demonstrated that regional functional integration properties (i.e., the betweenness centrality and global efficiency) of brain networks, particularly the default mode network (DMN) and sensorimotor network (SMN), significantly predicted the individual differences in creative ability. Furthermore, the associations between network regional properties and creative performance were creativity-level dependent, where the difference in the resource control component may be important in explaining individual difference in creative performance. These findings provide novel insights into the neural substrate of creativity and may facilitate objective identification of creative ability. Copyright © 2017 Elsevier B.V. All rights reserved.

  4. The predictive information obtained by testing multiple software versions

    NASA Technical Reports Server (NTRS)

    Lee, Larry D.

    1987-01-01

    Multiversion programming is a redundancy approach to developing highly reliable software. In applications of this method, two or more versions of a program are developed independently by different programmers and the versions are combined to form a redundant system. One variation of this approach consists of developing a set of n program versions and testing the versions to predict the failure probability of a particular program or a system formed from a subset of the programs. The precision that might be obtained, and also the effect of programmer variability if predictions are made over repetitions of the process of generating different program versions, are examined.

  5. Censored Hurdle Negative Binomial Regression (Case Study: Neonatorum Tetanus Case in Indonesia)

    NASA Astrophysics Data System (ADS)

    Yuli Rusdiana, Riza; Zain, Ismaini; Wulan Purnami, Santi

    2017-06-01

    Hurdle negative binomial model regression is a method that can be used for discreate dependent variable, excess zero and under- and overdispersion. It uses two parts approach. The first part estimates zero elements from dependent variable is zero hurdle model and the second part estimates not zero elements (non-negative integer) from dependent variable is called truncated negative binomial models. The discrete dependent variable in such cases is censored for some values. The type of censor that will be studied in this research is right censored. This study aims to obtain the parameter estimator hurdle negative binomial regression for right censored dependent variable. In the assessment of parameter estimation methods used Maximum Likelihood Estimator (MLE). Hurdle negative binomial model regression for right censored dependent variable is applied on the number of neonatorum tetanus cases in Indonesia. The type data is count data which contains zero values in some observations and other variety value. This study also aims to obtain the parameter estimator and test statistic censored hurdle negative binomial model. Based on the regression results, the factors that influence neonatorum tetanus case in Indonesia is the percentage of baby health care coverage and neonatal visits.

  6. Partial Least Squares Regression Calibration of an Ultraviolet-Visible Spectrophotometer for Measurements of Chemical Oxygen Demand in Dye Wastewater

    NASA Astrophysics Data System (ADS)

    Mai, W.; Zhang, J.-F.; Zhao, X.-M.; Li, Z.; Xu, Z.-W.

    2017-11-01

    Wastewater from the dye industry is typically analyzed using a standard method for measurement of chemical oxygen demand (COD) or by a single-wavelength spectroscopic method. To overcome the disadvantages of these methods, ultraviolet-visible (UV-Vis) spectroscopy was combined with principal component regression (PCR) and partial least squares regression (PLSR) in this study. Unlike the standard method, this method does not require digestion of the samples for preparation. Experiments showed that the PLSR model offered high prediction performance for COD, with a mean relative error of about 5% for two dyes. This error is similar to that obtained with the standard method. In this study, the precision of the PLSR model decreased with the number of dye compounds present. It is likely that multiple models will be required in reality, and the complexity of a COD monitoring system would be greatly reduced if the PLSR model is used because it can include several dyes. UV-Vis spectroscopy with PLSR successfully enhanced the performance of COD prediction for dye wastewater and showed good potential for application in on-line water quality monitoring.

  7. Bootstrap investigation of the stability of a Cox regression model.

    PubMed

    Altman, D G; Andersen, P K

    1989-07-01

    We describe a bootstrap investigation of the stability of a Cox proportional hazards regression model resulting from the analysis of a clinical trial of azathioprine versus placebo in patients with primary biliary cirrhosis. We have considered stability to refer both to the choice of variables included in the model and, more importantly, to the predictive ability of the model. In stepwise Cox regression analyses of 100 bootstrap samples using 17 candidate variables, the most frequently selected variables were those selected in the original analysis, and no other important variable was identified. Thus there was no reason to doubt the model obtained in the original analysis. For each patient in the trial, bootstrap confidence intervals were constructed for the estimated probability of surviving two years. It is shown graphically that these intervals are markedly wider than those obtained from the original model.

  8. Disturbed Glucose Metabolism in Rat Neurons Exposed to Cerebrospinal Fluid Obtained from Multiple Sclerosis Subjects

    PubMed Central

    Mathur, Deepali; María-Lafuente, Eva; Ureña-Peralta, Juan R.; Sorribes, Lucas; Hernández, Alberto; Casanova, Bonaventura; López-Rodas, Gerardo; Coret-Ferrer, Francisco; Burgal-Marti, Maria

    2017-01-01

    Axonal damage is widely accepted as a major cause of permanent functional disability in Multiple Sclerosis (MS). In relapsing-remitting MS, there is a possibility of remyelination by myelin producing cells and restoration of neurological function. The purpose of this study was to delineate the pathophysiological mechanisms underpinning axonal injury through hitherto unknown factors present in cerebrospinal fluid (CSF) that may regulate axonal damage, remyelinate the axon and make functional recovery possible. We employed primary cultures of rat unmyelinated cerebellar granule neurons and treated them with CSF obtained from MS and Neuromyelitis optica (NMO) patients. We performed microarray gene expression profiling to study changes in gene expression in treated neurons as compared to controls. Additionally, we determined the influence of gene-gene interaction upon the whole metabolic network in our experimental conditions using the Search Tool for the Retrieval of Interacting Genes/Proteins (STRING) program. Our findings revealed the downregulated expression of genes involved in glucose metabolism in MS-derived CSF-treated neurons and upregulated expression of genes in NMO-derived CSF-treated neurons. We conclude that factors in the CSF of these patients caused a perturbation in metabolic gene(s) expression and suggest that MS appears to be linked with metabolic deformity. PMID:29267205

  9. Quantitative structure-activity relationship of the curcumin-related compounds using various regression methods

    NASA Astrophysics Data System (ADS)

    Khazaei, Ardeshir; Sarmasti, Negin; Seyf, Jaber Yousefi

    2016-03-01

    Quantitative structure activity relationship were used to study a series of curcumin-related compounds with inhibitory effect on prostate cancer PC-3 cells, pancreas cancer Panc-1 cells, and colon cancer HT-29 cells. Sphere exclusion method was used to split data set in two categories of train and test set. Multiple linear regression, principal component regression and partial least squares were used as the regression methods. In other hand, to investigate the effect of feature selection methods, stepwise, Genetic algorithm, and simulated annealing were used. In two cases (PC-3 cells and Panc-1 cells), the best models were generated by a combination of multiple linear regression and stepwise (PC-3 cells: r2 = 0.86, q2 = 0.82, pred_r2 = 0.93, and r2m (test) = 0.43, Panc-1 cells: r2 = 0.85, q2 = 0.80, pred_r2 = 0.71, and r2m (test) = 0.68). For the HT-29 cells, principal component regression with stepwise (r2 = 0.69, q2 = 0.62, pred_r2 = 0.54, and r2m (test) = 0.41) is the best method. The QSAR study reveals descriptors which have crucial role in the inhibitory property of curcumin-like compounds. 6ChainCount, T_C_C_1, and T_O_O_7 are the most important descriptors that have the greatest effect. With a specific end goal to design and optimization of novel efficient curcumin-related compounds it is useful to introduce heteroatoms such as nitrogen, oxygen, and sulfur atoms in the chemical structure (reduce the contribution of T_C_C_1 descriptor) and increase the contribution of 6ChainCount and T_O_O_7 descriptors. Models can be useful in the better design of some novel curcumin-related compounds that can be used in the treatment of prostate, pancreas, and colon cancers.

  10. A multiple linear regression analysis of factors affecting the simulated Basic Life Support (BLS) performance with Automated External Defibrillator (AED) in Flemish lifeguards.

    PubMed

    Iserbyt, Peter; Schouppe, Gilles; Charlier, Nathalie

    2015-04-01

    Research investigating lifeguards' performance of Basic Life Support (BLS) with Automated External Defibrillator (AED) is limited. Assessing simulated BLS/AED performance in Flemish lifeguards and identifying factors affecting this performance. Six hundred and sixteen (217 female and 399 male) certified Flemish lifeguards (aged 16-71 years) performed BLS with an AED on a Laerdal ResusciAnne manikin simulating an adult victim of drowning. Stepwise multiple linear regression analysis was conducted with BLS/AED performance as outcome variable and demographic data as explanatory variables. Mean BLS/AED performance for all lifeguards was 66.5%. Compression rate and depth adhered closely to ERC 2010 guidelines. Ventilation volume and flow rate exceeded the guidelines. A significant regression model, F(6, 415)=25.61, p<.001, ES=.38, explained 27% of the variance in BLS performance (R2=.27). Significant predictors were age (beta=-.31, p<.001), years of certification (beta=-.41, p<.001), time on duty per year (beta=-.25, p<.001), practising BLS skills (beta=.11, p=.011), and being a professional lifeguard (beta=-.13, p=.029). 71% of lifeguards reported not practising BLS/AED. Being young, recently certified, few days of employment per year, practising BLS skills and not being a professional lifeguard are factors associated with higher BLS/AED performance. Measures should be taken to prevent BLS/AED performances from decaying with age and longer certification. Refresher courses could include a formal skills test and lifeguards should be encouraged to practise their BLS/AED skills. Copyright © 2015 Elsevier Ireland Ltd. All rights reserved.

  11. Estimating Causal Effects of Education Interventions Using a Two-Rating Regression Discontinuity Design: Lessons from a Simulation Study

    ERIC Educational Resources Information Center

    Porter, Kristin E.; Reardon, Sean F.; Unlu, Fatih; Bloom, Howard S.; Robinson-Cimpian, Joseph P.

    2014-01-01

    A valuable extension of the single-rating regression discontinuity design (RDD) is a multiple-rating RDD (MRRDD). To date, four main methods have been used to estimate average treatment effects at the multiple treatment frontiers of an MRRDD: the "surface" method, the "frontier" method, the "binding-score" method, and…

  12. On the use of regression analysis for the estimation of human biological age.

    PubMed

    Krøll, J; Saxtrup, O

    2000-01-01

    The present investigation compares three linear regression procedures for the definition of human biological age (bioage). As a model system for bioage definition is used the variations with age of blood hemoglobin (B-hemoglobin) in males in the age range 50-95 years. The bioage measures compared are: 1: P-bioage; defined from regression of chronological age on B-hemoglobin results. 2: AC-bioage; obtained by indirect regression, using in reverse the equation describing the regression of B-hemoglobin on age in a reference population. 3: BC-bioage; defined by orthogonal regression on the reference regression line of B-hemoglobin on age. It is demonstrated that the P-bioage measure gives an overestimation of the bioage in the younger and an underestimation in the older individuals. This 'regression to the mean' is avoided using the indirect regression procedures. Here the relatively low SD of the BC-bioage measure results from the inclusion of individual chronological age in the orthogonal regression procedure. Observations on male blood donors illustrates the variation of the AC- and BC-bioage measures in the individual.

  13. Progressive and Regressive Aspects of Information Technology in Society: A Third Sector Perspective

    ERIC Educational Resources Information Center

    Miller, Kandace R.

    2009-01-01

    This dissertation explores the impact of information technology on progressive and regressive values in society from the perspective of one international foundation and four of its technology-related programs. Through a critical interpretive approach employing an instrumental multiple-case method, a framework to help explain the influence of…

  14. Multiple-trait structured antedependence model to study the relationship between litter size and birth weight in pigs and rabbits.

    PubMed

    David, Ingrid; Garreau, Hervé; Balmisse, Elodie; Billon, Yvon; Canario, Laurianne

    2017-01-20

    Some genetic studies need to take into account correlations between traits that are repeatedly measured over time. Multiple-trait random regression models are commonly used to analyze repeated traits but suffer from several major drawbacks. In the present study, we developed a multiple-trait extension of the structured antedependence model (SAD) to overcome this issue and validated its usefulness by modeling the association between litter size (LS) and average birth weight (ABW) over parities in pigs and rabbits. The single-trait SAD model assumes that a random effect at time [Formula: see text] can be explained by the previous values of the random effect (i.e. at previous times). The proposed multiple-trait extension of the SAD model consists in adding a cross-antedependence parameter to the single-trait SAD model. This model can be easily fitted using ASReml and the OWN Fortran program that we have developed. In comparison with the random regression model, we used our multiple-trait SAD model to analyze the LS and ABW of 4345 litters from 1817 Large White sows and 8706 litters from 2286 L-1777 does over a maximum of five successive parities. For both species, the multiple-trait SAD fitted the data better than the random regression model. The difference between AIC of the two models (AIC_random regression-AIC_SAD) were equal to 7 and 227 for pigs and rabbits, respectively. A similar pattern of heritability and correlation estimates was obtained for both species. Heritabilities were lower for LS (ranging from 0.09 to 0.29) than for ABW (ranging from 0.23 to 0.39). The general trend was a decrease of the genetic correlation for a given trait between more distant parities. Estimates of genetic correlations between LS and ABW were negative and ranged from -0.03 to -0.52 across parities. No correlation was observed between the permanent environmental effects, except between the permanent environmental effects of LS and ABW of the same parity, for which the estimate of

  15. Satellite rainfall retrieval by logistic regression

    NASA Technical Reports Server (NTRS)

    Chiu, Long S.

    1986-01-01

    The potential use of logistic regression in rainfall estimation from satellite measurements is investigated. Satellite measurements provide covariate information in terms of radiances from different remote sensors.The logistic regression technique can effectively accommodate many covariates and test their significance in the estimation. The outcome from the logistical model is the probability that the rainrate of a satellite pixel is above a certain threshold. By varying the thresholds, a rainrate histogram can be obtained, from which the mean and the variant can be estimated. A logistical model is developed and applied to rainfall data collected during GATE, using as covariates the fractional rain area and a radiance measurement which is deduced from a microwave temperature-rainrate relation. It is demonstrated that the fractional rain area is an important covariate in the model, consistent with the use of the so-called Area Time Integral in estimating total rain volume in other studies. To calibrate the logistical model, simulated rain fields generated by rainfield models with prescribed parameters are needed. A stringent test of the logistical model is its ability to recover the prescribed parameters of simulated rain fields. A rain field simulation model which preserves the fractional rain area and lognormality of rainrates as found in GATE is developed. A stochastic regression model of branching and immigration whose solutions are lognormally distributed in some asymptotic limits has also been developed.

  16. Variable selection and model choice in geoadditive regression models.

    PubMed

    Kneib, Thomas; Hothorn, Torsten; Tutz, Gerhard

    2009-06-01

    Model choice and variable selection are issues of major concern in practical regression analyses, arising in many biometric applications such as habitat suitability analyses, where the aim is to identify the influence of potentially many environmental conditions on certain species. We describe regression models for breeding bird communities that facilitate both model choice and variable selection, by a boosting algorithm that works within a class of geoadditive regression models comprising spatial effects, nonparametric effects of continuous covariates, interaction surfaces, and varying coefficients. The major modeling components are penalized splines and their bivariate tensor product extensions. All smooth model terms are represented as the sum of a parametric component and a smooth component with one degree of freedom to obtain a fair comparison between the model terms. A generic representation of the geoadditive model allows us to devise a general boosting algorithm that automatically performs model choice and variable selection.

  17. Parametric regression model for survival data: Weibull regression model as an example

    PubMed Central

    2016-01-01

    Weibull regression model is one of the most popular forms of parametric regression model that it provides estimate of baseline hazard function, as well as coefficients for covariates. Because of technical difficulties, Weibull regression model is seldom used in medical literature as compared to the semi-parametric proportional hazard model. To make clinical investigators familiar with Weibull regression model, this article introduces some basic knowledge on Weibull regression model and then illustrates how to fit the model with R software. The SurvRegCensCov package is useful in converting estimated coefficients to clinical relevant statistics such as hazard ratio (HR) and event time ratio (ETR). Model adequacy can be assessed by inspecting Kaplan-Meier curves stratified by categorical variable. The eha package provides an alternative method to model Weibull regression model. The check.dist() function helps to assess goodness-of-fit of the model. Variable selection is based on the importance of a covariate, which can be tested using anova() function. Alternatively, backward elimination starting from a full model is an efficient way for model development. Visualization of Weibull regression model after model development is interesting that it provides another way to report your findings. PMID:28149846

  18. Image interpolation via regularized local linear regression.

    PubMed

    Liu, Xianming; Zhao, Debin; Xiong, Ruiqin; Ma, Siwei; Gao, Wen; Sun, Huifang

    2011-12-01

    The linear regression model is a very attractive tool to design effective image interpolation schemes. Some regression-based image interpolation algorithms have been proposed in the literature, in which the objective functions are optimized by ordinary least squares (OLS). However, it is shown that interpolation with OLS may have some undesirable properties from a robustness point of view: even small amounts of outliers can dramatically affect the estimates. To address these issues, in this paper we propose a novel image interpolation algorithm based on regularized local linear regression (RLLR). Starting with the linear regression model where we replace the OLS error norm with the moving least squares (MLS) error norm leads to a robust estimator of local image structure. To keep the solution stable and avoid overfitting, we incorporate the l(2)-norm as the estimator complexity penalty. Moreover, motivated by recent progress on manifold-based semi-supervised learning, we explicitly consider the intrinsic manifold structure by making use of both measured and unmeasured data points. Specifically, our framework incorporates the geometric structure of the marginal probability distribution induced by unmeasured samples as an additional local smoothness preserving constraint. The optimal model parameters can be obtained with a closed-form solution by solving a convex optimization problem. Experimental results on benchmark test images demonstrate that the proposed method achieves very competitive performance with the state-of-the-art interpolation algorithms, especially in image edge structure preservation. © 2011 IEEE

  19. Evaluating Differential Effects Using Regression Interactions and Regression Mixture Models

    ERIC Educational Resources Information Center

    Van Horn, M. Lee; Jaki, Thomas; Masyn, Katherine; Howe, George; Feaster, Daniel J.; Lamont, Andrea E.; George, Melissa R. W.; Kim, Minjung

    2015-01-01

    Research increasingly emphasizes understanding differential effects. This article focuses on understanding regression mixture models, which are relatively new statistical methods for assessing differential effects by comparing results to using an interactive term in linear regression. The research questions which each model answers, their…

  20. DYNA3D/ParaDyn Regression Test Suite Inventory

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Lin, Jerry I.

    2016-09-01

    The following table constitutes an initial assessment of feature coverage across the regression test suite used for DYNA3D and ParaDyn. It documents the regression test suite at the time of preliminary release 16.1 in September 2016. The columns of the table represent groupings of functionalities, e.g., material models. Each problem in the test suite is represented by a row in the table. All features exercised by the problem are denoted by a check mark (√) in the corresponding column. The definition of “feature” has not been subdivided to its smallest unit of user input, e.g., algorithmic parameters specific to amore » particular type of contact surface. This represents a judgment to provide code developers and users a reasonable impression of feature coverage without expanding the width of the table by several multiples. All regression testing is run in parallel, typically with eight processors, except problems involving features only available in serial mode. Many are strictly regression tests acting as a check that the codes continue to produce adequately repeatable results as development unfolds; compilers change and platforms are replaced. A subset of the tests represents true verification problems that have been checked against analytical or other benchmark solutions. Users are welcomed to submit documented problems for inclusion in the test suite, especially if they are heavily exercising, and dependent upon, features that are currently underrepresented.« less

  1. Application of Multiple Regression and Design of Experiments for Modelling the Effect of Monoethylene Glycol in the Calcium Carbonate Scaling Process.

    PubMed

    Kartnaller, Vinicius; Venâncio, Fabrício; F do Rosário, Francisca; Cajaiba, João

    2018-04-10

    To avoid gas hydrate formation during oil and gas production, companies usually employ thermodynamic inhibitors consisting of hydroxyl compounds, such as monoethylene glycol (MEG). However, these inhibitors may cause other types of fouling during production such as inorganic salt deposits (scale). Calcium carbonate is one of the main scaling salts and is a great concern, especially for the new pre-salt wells being explored in Brazil. Hence, it is important to understand how using inhibitors to control gas hydrate formation may be interacting with the scale formation process. Multiple regression and design of experiments were used to mathematically model the calcium carbonate scaling process and its evolution in the presence of MEG. It was seen that MEG, although inducing the precipitation by increasing the supersaturation ratio, actually works as a scale inhibitor for calcium carbonate in concentrations over 40%. This effect was not due to changes in the viscosity, as suggested in the literature, but possibly to the binding of MEG to the CaCO₃ particles' surface. The interaction of the MEG inhibition effect with the system's variables was also assessed, when temperature' and calcium concentration were more relevant.

  2. An empirical study using permutation-based resampling in meta-regression

    PubMed Central

    2012-01-01

    Background In meta-regression, as the number of trials in the analyses decreases, the risk of false positives or false negatives increases. This is partly due to the assumption of normality that may not hold in small samples. Creation of a distribution from the observed trials using permutation methods to calculate P values may allow for less spurious findings. Permutation has not been empirically tested in meta-regression. The objective of this study was to perform an empirical investigation to explore the differences in results for meta-analyses on a small number of trials using standard large sample approaches verses permutation-based methods for meta-regression. Methods We isolated a sample of randomized controlled clinical trials (RCTs) for interventions that have a small number of trials (herbal medicine trials). Trials were then grouped by herbal species and condition and assessed for methodological quality using the Jadad scale, and data were extracted for each outcome. Finally, we performed meta-analyses on the primary outcome of each group of trials and meta-regression for methodological quality subgroups within each meta-analysis. We used large sample methods and permutation methods in our meta-regression modeling. We then compared final models and final P values between methods. Results We collected 110 trials across 5 intervention/outcome pairings and 5 to 10 trials per covariate. When applying large sample methods and permutation-based methods in our backwards stepwise regression the covariates in the final models were identical in all cases. The P values for the covariates in the final model were larger in 78% (7/9) of the cases for permutation and identical for 22% (2/9) of the cases. Conclusions We present empirical evidence that permutation-based resampling may not change final models when using backwards stepwise regression, but may increase P values in meta-regression of multiple covariates for relatively small amount of trials. PMID:22587815

  3. Do clinical data and human papilloma virus genotype influence spontaneous regression in grade I cervical intraepithelial neoplasia?

    PubMed

    Cortés-Alaguero, Caterina; González-Mirasol, Esteban; Morales-Roselló, José; Poblet-Martinez, Enrique

    2017-03-15

    To determine whether medical history, clinical examination and human papilloma virus (HPV) genotype influence spontaneous regression in cervical intraepithelial neoplasia grade I (CIN-I). We retrospectively evaluated 232 women who were histologically diagnosed as have CIN-I by means of Kaplan-Meier curves, the pattern of spontaneous regression according to the medical history, clinical examination, and HPV genotype. Spontaneous regression occurred in most patients and was influenced by the presence of multiple HPV genotypes but not by the HPV genotype itself. In addition, regression frequency was diminished when more than 50% of the cervix surface was affected or when an abnormal cytology was present at the beginning of follow-up. The frequency of regression in CIN-I is high, making long-term follow-up and conservative management advisable. Data from clinical examination and HPV genotyping might help to anticipate which lesions will regress.

  4. The mechanical properties of high speed GTAW weld and factors of nonlinear multiple regression model under external transverse magnetic field

    NASA Astrophysics Data System (ADS)

    Lu, Lin; Chang, Yunlong; Li, Yingmin; He, Youyou

    2013-05-01

    A transverse magnetic field was introduced to the arc plasma in the process of welding stainless steel tubes by high-speed Tungsten Inert Gas Arc Welding (TIG for short) without filler wire. The influence of external magnetic field on welding quality was investigated. 9 sets of parameters were designed by the means of orthogonal experiment. The welding joint tensile strength and form factor of weld were regarded as the main standards of welding quality. A binary quadratic nonlinear regression equation was established with the conditions of magnetic induction and flow rate of Ar gas. The residual standard deviation was calculated to adjust the accuracy of regression model. The results showed that, the regression model was correct and effective in calculating the tensile strength and aspect ratio of weld. Two 3D regression models were designed respectively, and then the impact law of magnetic induction on welding quality was researched.

  5. Spectroscopic determination of leaf biochemistry using band-depth analysis of absorption features and stepwise multiple linear regression

    USGS Publications Warehouse

    Kokaly, R.F.; Clark, R.N.

    1999-01-01

    We develop a new method for estimating the biochemistry of plant material using spectroscopy. Normalized band depths calculated from the continuum-removed reflectance spectra of dried and ground leaves were used to estimate their concentrations of nitrogen, lignin, and cellulose. Stepwise multiple linear regression was used to select wavelengths in the broad absorption features centered at 1.73 ??m, 2.10 ??m, and 2.30 ??m that were highly correlated with the chemistry of samples from eastern U.S. forests. Band depths of absorption features at these wavelengths were found to also be highly correlated with the chemistry of four other sites. A subset of data from the eastern U.S. forest sites was used to derive linear equations that were applied to the remaining data to successfully estimate their nitrogen, lignin, and cellulose concentrations. Correlations were highest for nitrogen (R2 from 0.75 to 0.94). The consistent results indicate the possibility of establishing a single equation capable of estimating the chemical concentrations in a wide variety of species from the reflectance spectra of dried leaves. The extension of this method to remote sensing was investigated. The effects of leaf water content, sensor signal-to-noise and bandpass, atmospheric effects, and background soil exposure were examined. Leaf water was found to be the greatest challenge to extending this empirical method to the analysis of fresh whole leaves and complete vegetation canopies. The influence of leaf water on reflectance spectra must be removed to within 10%. Other effects were reduced by continuum removal and normalization of band depths. If the effects of leaf water can be compensated for, it might be possible to extend this method to remote sensing data acquired by imaging spectrometers to give estimates of nitrogen, lignin, and cellulose concentrations over large areas for use in ecosystem studies.We develop a new method for estimating the biochemistry of plant material using

  6. Estimation of variance in Cox's regression model with shared gamma frailties.

    PubMed

    Andersen, P K; Klein, J P; Knudsen, K M; Tabanera y Palacios, R

    1997-12-01

    The Cox regression model with a shared frailty factor allows for unobserved heterogeneity or for statistical dependence between the observed survival times. Estimation in this model when the frailties are assumed to follow a gamma distribution is reviewed, and we address the problem of obtaining variance estimates for regression coefficients, frailty parameter, and cumulative baseline hazards using the observed nonparametric information matrix. A number of examples are given comparing this approach with fully parametric inference in models with piecewise constant baseline hazards.

  7. Regression Discontinuity Design in Gifted and Talented Education Research

    ERIC Educational Resources Information Center

    Matthews, Michael S.; Peters, Scott J.; Housand, Angela M.

    2012-01-01

    This Methodological Brief introduces the reader to the regression discontinuity design (RDD), which is a method that when used correctly can yield estimates of research treatment effects that are equivalent to those obtained through randomized control trials and can therefore be used to infer causality. However, RDD does not require the random…

  8. Estimating Time to Event From Longitudinal Categorical Data: An Analysis of Multiple Sclerosis Progression.

    PubMed

    Mandel, Micha; Gauthier, Susan A; Guttmann, Charles R G; Weiner, Howard L; Betensky, Rebecca A

    2007-12-01

    The expanded disability status scale (EDSS) is an ordinal score that measures progression in multiple sclerosis (MS). Progression is defined as reaching EDSS of a certain level (absolute progression) or increasing of one point of EDSS (relative progression). Survival methods for time to progression are not adequate for such data since they do not exploit the EDSS level at the end of follow-up. Instead, we suggest a Markov transitional model applicable for repeated categorical or ordinal data. This approach enables derivation of covariate-specific survival curves, obtained after estimation of the regression coefficients and manipulations of the resulting transition matrix. Large sample theory and resampling methods are employed to derive pointwise confidence intervals, which perform well in simulation. Methods for generating survival curves for time to EDSS of a certain level, time to increase of EDSS of at least one point, and time to two consecutive visits with EDSS greater than three are described explicitly. The regression models described are easily implemented using standard software packages. Survival curves are obtained from the regression results using packages that support simple matrix calculation. We present and demonstrate our method on data collected at the Partners MS center in Boston, MA. We apply our approach to progression defined by time to two consecutive visits with EDSS greater than three, and calculate crude (without covariates) and covariate-specific curves.

  9. Combining semiquantitative measures of fibrosis and qualitative features of parenchymal remodelling to identify fibrosis regression in hepatitis C: a multiple biopsy study.

    PubMed

    Pattullo, Venessa; Thein, Hla-Hla; Heathcote, Elizabeth Jenny; Guindi, Maha

    2012-09-01

    A fall in hepatic fibrosis stage may be observed in patients with chronic hepatitis C (CHC); however, parenchymal architectural changes may also signify hepatic remodelling associated with fibrosis regression. The aim of this study was to utilize semiquantitative and qualitative methods to report the prevalence and factors associated with fibrosis regression in CHC. Paired liver biopsies were scored for fibrosis (Ishak), and for the presence of eight qualitative features of parenchymal remodelling, to derive a qualitative regression score (QR score). Combined fibrosis regression was defined as ≥2-stage fall in Ishak stage (Reg-I) or <2-stage fall in Ishak stage with a rise in QR score (Reg-Qual). Among 159 patients (biopsy interval 5.4 ± 3.1 years), Reg-I was observed in 12 (7.5%) and Reg-Qual in 26 (16.4%) patients. The combined diagnostic criteria increased the diagnosis rate for fibrosis regression (38 patients, 23.9%) compared with use of Reg-I alone (P < 0.001). Combined fibrosis regression was observed in nine patients (50%) who achieved sustained virological response (SVR), and in 29 of 141 (21%) patients despite persistent viraemia. SVR was the only clinical factor associated independently with combined fibrosis regression (odds ratio 3.05). The combination of semiquantitative measures and qualitative features aids the identification of fibrosis regression in CHC. © 2012 Blackwell Publishing Ltd.

  10. Fuel Regression Characteristics of Cascaded Multistage Impinging-Jet (CAMUI) Type Hybrid Rocket

    NASA Astrophysics Data System (ADS)

    Itoh, Mitsunori; Maeda, Takenori; Kakikura, Akihito; Kaneko, Yudai; Mori, Kazuhiro; Nakashima, Takuji; Wakita, Masashi; Uematsu, Tsutomu; Totani, Tsuyoshi; Oshima, Nobuyuki; Nagata, Harunori

    A series of lab-scale firing tests was conducted to investigate the fuel regression characteristics of Cascaded Multistage Impinging-jet (CAMUI) type hybrid rocket. The alternative fuel grain used in this rocket consists of a number of cylindrical fuel blocks with two ports, which were aligned along the axis of the combustion chamber with a small gap. The ports are aligned staggered with respect to ones of neighboring blocks so that the combustion gas flow impinges on the forward-end surface of each block. In this fuel grain, forward-end surfaces, back-end surfaces and ports of fuel blocks contribute as burning surfaces. Polyethylene and LOX were used as a propellant, and the tests were conducted at the chamber pressure of 0.5 2MPa and the mass flux of 50 200kg/m2s. Main results obtained in this study are in the followings: The regression rate of each surface was obtained as a function of the propellant mass flux and local equivalent ratio of the combustion gas. At back-end surfaces the regression rate has a high sensitivity on the gap height of neighboring fuel blocks. These fuel regression characteristics will contribute as fundamental data to improve the optimum design of the fuel grain.

  11. Analysis of Multivariate Experimental Data Using A Simplified Regression Model Search Algorithm

    NASA Technical Reports Server (NTRS)

    Ulbrich, Norbert Manfred

    2013-01-01

    A new regression model search algorithm was developed in 2011 that may be used to analyze both general multivariate experimental data sets and wind tunnel strain-gage balance calibration data. The new algorithm is a simplified version of a more complex search algorithm that was originally developed at the NASA Ames Balance Calibration Laboratory. The new algorithm has the advantage that it needs only about one tenth of the original algorithm's CPU time for the completion of a search. In addition, extensive testing showed that the prediction accuracy of math models obtained from the simplified algorithm is similar to the prediction accuracy of math models obtained from the original algorithm. The simplified algorithm, however, cannot guarantee that search constraints related to a set of statistical quality requirements are always satisfied in the optimized regression models. Therefore, the simplified search algorithm is not intended to replace the original search algorithm. Instead, it may be used to generate an alternate optimized regression model of experimental data whenever the application of the original search algorithm either fails or requires too much CPU time. Data from a machine calibration of NASA's MK40 force balance is used to illustrate the application of the new regression model search algorithm.

  12. Characterizing Individual Differences in Functional Connectivity Using Dual-Regression and Seed-Based Approaches

    PubMed Central

    Smith, David V.; Utevsky, Amanda V.; Bland, Amy R.; Clement, Nathan; Clithero, John A.; Harsch, Anne E. W.; Carter, R. McKell; Huettel, Scott A.

    2014-01-01

    A central challenge for neuroscience lies in relating inter-individual variability to the functional properties of specific brain regions. Yet, considerable variability exists in the connectivity patterns between different brain areas, potentially producing reliable group differences. Using sex differences as a motivating example, we examined two separate resting-state datasets comprising a total of 188 human participants. Both datasets were decomposed into resting-state networks (RSNs) using a probabilistic spatial independent components analysis (ICA). We estimated voxelwise functional connectivity with these networks using a dual-regression analysis, which characterizes the participant-level spatiotemporal dynamics of each network while controlling for (via multiple regression) the influence of other networks and sources of variability. We found that males and females exhibit distinct patterns of connectivity with multiple RSNs, including both visual and auditory networks and the right frontal-parietal network. These results replicated across both datasets and were not explained by differences in head motion, data quality, brain volume, cortisol levels, or testosterone levels. Importantly, we also demonstrate that dual-regression functional connectivity is better at detecting inter-individual variability than traditional seed-based functional connectivity approaches. Our findings characterize robust—yet frequently ignored—neural differences between males and females, pointing to the necessity of controlling for sex in neuroscience studies of individual differences. Moreover, our results highlight the importance of employing network-based models to study variability in functional connectivity. PMID:24662574

  13. Normalization Ridge Regression in Practice I: Comparisons Between Ordinary Least Squares, Ridge Regression and Normalization Ridge Regression.

    ERIC Educational Resources Information Center

    Bulcock, J. W.

    The problem of model estimation when the data are collinear was examined. Though the ridge regression (RR) outperforms ordinary least squares (OLS) regression in the presence of acute multicollinearity, it is not a problem free technique for reducing the variance of the estimates. It is a stochastic procedure when it should be nonstochastic and it…

  14. Does vagotomy protect against multiple sclerosis?

    PubMed

    Sundbøll, Jens; Horváth-Puhó, Erzsébet; Adelborg, Kasper; Svensson, Elisabeth

    2017-07-01

    To examine the association between vagotomy and multiple sclerosis. We conducted a matched cohort study of all patients who underwent truncal or super-selective vagotomy and a comparison cohort, by linking Danish population-based medical registries (1977-1995). Hazard ratios (HRs) for multiple sclerosis, adjusting for potential confounders were computed by means of Cox regression analysis. Median age of multiple sclerosis onset corresponded to late onset multiple sclerosis. No association with multiple sclerosis was observed for truncal vagotomy (0-37 year adjusted HR=0.91, 95% confidence interval [CI]: 0.48-1.74) or super-selective vagotomy (0-37 year adjusted HR=1.28, 95% CI: 0.79-2.09) compared with the general population. We found no association between vagotomy and later risk of late onset multiple sclerosis. Copyright © 2017 Elsevier B.V. All rights reserved.

  15. Demonstration of a Fiber Optic Regression Probe

    NASA Technical Reports Server (NTRS)

    Korman, Valentin; Polzin, Kurt A.

    2010-01-01

    empirically anchoring any analysis geared towards lifetime qualification. Erosion rate data over an operating envelope could also be useful in the modeling detailed physical processes. The sensor has been embedded in many regressing media for the purposes of proof-of-concept testing. A gross demonstration of its capabilities was performed using a sanding wheel to remove layers of metal. A longer-term demonstration measurement involved the placement of the sensor in a brake pad, monitoring the removal of pad material associated with the normal wear-and-tear of driving. It was used to measure the regression rates of the combustable media in small model rocket motors and road flares. Finally, a test was performed using a sand blaster to remove small amounts of material at a time. This test was aimed at demonstrating the unit's present resolution, and is compared with laser profilometry data obtained simultaneously. At the lowest resolution levels, this unit should be useful in locally quantifying the erosion rates of the channel walls in plasma thrusters. .

  16. A rotor optimization using regression analysis

    NASA Technical Reports Server (NTRS)

    Giansante, N.

    1984-01-01

    The design and development of helicopter rotors is subject to the many design variables and their interactions that effect rotor operation. Until recently, selection of rotor design variables to achieve specified rotor operational qualities has been a costly, time consuming, repetitive task. For the past several years, Kaman Aerospace Corporation has successfully applied multiple linear regression analysis, coupled with optimization and sensitivity procedures, in the analytical design of rotor systems. It is concluded that approximating equations can be developed rapidly for a multiplicity of objective and constraint functions and optimizations can be performed in a rapid and cost effective manner; the number and/or range of design variables can be increased by expanding the data base and developing approximating functions to reflect the expanded design space; the order of the approximating equations can be expanded easily to improve correlation between analyzer results and the approximating equations; gradients of the approximating equations can be calculated easily and these gradients are smooth functions reducing the risk of numerical problems in the optimization; the use of approximating functions allows the problem to be started easily and rapidly from various initial designs to enhance the probability of finding a global optimum; and the approximating equations are independent of the analysis or optimization codes used.

  17. Sparse brain network using penalized linear regression

    NASA Astrophysics Data System (ADS)

    Lee, Hyekyoung; Lee, Dong Soo; Kang, Hyejin; Kim, Boong-Nyun; Chung, Moo K.

    2011-03-01

    Sparse partial correlation is a useful connectivity measure for brain networks when it is difficult to compute the exact partial correlation in the small-n large-p setting. In this paper, we formulate the problem of estimating partial correlation as a sparse linear regression with a l1-norm penalty. The method is applied to brain network consisting of parcellated regions of interest (ROIs), which are obtained from FDG-PET images of the autism spectrum disorder (ASD) children and the pediatric control (PedCon) subjects. To validate the results, we check their reproducibilities of the obtained brain networks by the leave-one-out cross validation and compare the clustered structures derived from the brain networks of ASD and PedCon.

  18. A regularization corrected score method for nonlinear regression models with covariate error.

    PubMed

    Zucker, David M; Gorfine, Malka; Li, Yi; Tadesse, Mahlet G; Spiegelman, Donna

    2013-03-01

    Many regression analyses involve explanatory variables that are measured with error, and failing to account for this error is well known to lead to biased point and interval estimates of the regression coefficients. We present here a new general method for adjusting for covariate error. Our method consists of an approximate version of the Stefanski-Nakamura corrected score approach, using the method of regularization to obtain an approximate solution of the relevant integral equation. We develop the theory in the setting of classical likelihood models; this setting covers, for example, linear regression, nonlinear regression, logistic regression, and Poisson regression. The method is extremely general in terms of the types of measurement error models covered, and is a functional method in the sense of not involving assumptions on the distribution of the true covariate. We discuss the theoretical properties of the method and present simulation results in the logistic regression setting (univariate and multivariate). For illustration, we apply the method to data from the Harvard Nurses' Health Study concerning the relationship between physical activity and breast cancer mortality in the period following a diagnosis of breast cancer. Copyright © 2013, The International Biometric Society.

  19. Reduced rank regression via adaptive nuclear norm penalization

    PubMed Central

    Chen, Kun; Dong, Hongbo; Chan, Kung-Sik

    2014-01-01

    Summary We propose an adaptive nuclear norm penalization approach for low-rank matrix approximation, and use it to develop a new reduced rank estimation method for high-dimensional multivariate regression. The adaptive nuclear norm is defined as the weighted sum of the singular values of the matrix, and it is generally non-convex under the natural restriction that the weight decreases with the singular value. However, we show that the proposed non-convex penalized regression method has a global optimal solution obtained from an adaptively soft-thresholded singular value decomposition. The method is computationally efficient, and the resulting solution path is continuous. The rank consistency of and prediction/estimation performance bounds for the estimator are established for a high-dimensional asymptotic regime. Simulation studies and an application in genetics demonstrate its efficacy. PMID:25045172

  20. Use of partial least squares regression to impute SNP genotypes in Italian cattle breeds.

    PubMed

    Dimauro, Corrado; Cellesi, Massimo; Gaspa, Giustino; Ajmone-Marsan, Paolo; Steri, Roberto; Marras, Gabriele; Macciotta, Nicolò P P

    2013-06-05

    The objective of the present study was to test the ability of the partial least squares regression technique to impute genotypes from low density single nucleotide polymorphisms (SNP) panels i.e. 3K or 7K to a high density panel with 50K SNP. No pedigree information was used. Data consisted of 2093 Holstein, 749 Brown Swiss and 479 Simmental bulls genotyped with the Illumina 50K Beadchip. First, a single-breed approach was applied by using only data from Holstein animals. Then, to enlarge the training population, data from the three breeds were combined and a multi-breed analysis was performed. Accuracies of genotypes imputed using the partial least squares regression method were compared with those obtained by using the Beagle software. The impact of genotype imputation on breeding value prediction was evaluated for milk yield, fat content and protein content. In the single-breed approach, the accuracy of imputation using partial least squares regression was around 90 and 94% for the 3K and 7K platforms, respectively; corresponding accuracies obtained with Beagle were around 85% and 90%. Moreover, computing time required by the partial least squares regression method was on average around 10 times lower than computing time required by Beagle. Using the partial least squares regression method in the multi-breed resulted in lower imputation accuracies than using single-breed data. The impact of the SNP-genotype imputation on the accuracy of direct genomic breeding values was small. The correlation between estimates of genetic merit obtained by using imputed versus actual genotypes was around 0.96 for the 7K chip. Results of the present work suggested that the partial least squares regression imputation method could be useful to impute SNP genotypes when pedigree information is not available.

  1. NCCS Regression Test Harness

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Tharrington, Arnold N.

    2015-09-09

    The NCCS Regression Test Harness is a software package that provides a framework to perform regression and acceptance testing on NCCS High Performance Computers. The package is written in Python and has only the dependency of a Subversion repository to store the regression tests.

  2. Factors Affecting Regression-Discontinuity.

    ERIC Educational Resources Information Center

    Schumacker, Randall E.

    The regression-discontinuity approach to evaluating educational programs is reviewed, and regression-discontinuity post-program mean differences under various conditions are discussed. The regression-discontinuity design is used to determine whether post-program differences exist between an experimental program and a control group. The difference…

  3. An Innovative Technique to Assess Spontaneous Baroreflex Sensitivity with Short Data Segments: Multiple Trigonometric Regressive Spectral Analysis.

    PubMed

    Li, Kai; Rüdiger, Heinz; Haase, Rocco; Ziemssen, Tjalf

    2018-01-01

    Objective: As the multiple trigonometric regressive spectral (MTRS) analysis is extraordinary in its ability to analyze short local data segments down to 12 s, we wanted to evaluate the impact of the data segment settings by applying the technique of MTRS analysis for baroreflex sensitivity (BRS) estimation using a standardized data pool. Methods: Spectral and baroreflex analyses were performed on the EuroBaVar dataset (42 recordings, including lying and standing positions). For this analysis, the technique of MTRS was used. We used different global and local data segment lengths, and chose the global data segments from different positions. Three global data segments of 1 and 2 min and three local data segments of 12, 20, and 30 s were used in MTRS analysis for BRS. Results: All the BRS-values calculated on the three global data segments were highly correlated, both in the supine and standing positions; the different global data segments provided similar BRS estimations. When using different local data segments, all the BRS-values were also highly correlated. However, in the supine position, using short local data segments of 12 s overestimated BRS compared with those using 20 and 30 s. In the standing position, the BRS estimations using different local data segments were comparable. There was no proportional bias for the comparisons between different BRS estimations. Conclusion: We demonstrate that BRS estimation by the MTRS technique is stable when using different global data segments, and MTRS is extraordinary in its ability to evaluate BRS in even short local data segments (20 and 30 s). Because of the non-stationary character of most biosignals, the MTRS technique would be preferable for BRS analysis especially in conditions when only short stationary data segments are available or when dynamic changes of BRS should be monitored.

  4. A comparison of multiple imputation methods for incomplete longitudinal binary data.

    PubMed

    Yamaguchi, Yusuke; Misumi, Toshihiro; Maruo, Kazushi

    2018-01-01

    Longitudinal binary data are commonly encountered in clinical trials. Multiple imputation is an approach for getting a valid estimation of treatment effects under an assumption of missing at random mechanism. Although there are a variety of multiple imputation methods for the longitudinal binary data, a limited number of researches have reported on relative performances of the methods. Moreover, when focusing on the treatment effect throughout a period that has often been used in clinical evaluations of specific disease areas, no definite investigations comparing the methods have been available. We conducted an extensive simulation study to examine comparative performances of six multiple imputation methods available in the SAS MI procedure for longitudinal binary data, where two endpoints of responder rates at a specified time point and throughout a period were assessed. The simulation study suggested that results from naive approaches of a single imputation with non-responders and a complete case analysis could be very sensitive against missing data. The multiple imputation methods using a monotone method and a full conditional specification with a logistic regression imputation model were recommended for obtaining unbiased and robust estimations of the treatment effect. The methods were illustrated with data from a mental health research.

  5. Regression Models and Fuzzy Logic Prediction of TBM Penetration Rate

    NASA Astrophysics Data System (ADS)

    Minh, Vu Trieu; Katushin, Dmitri; Antonov, Maksim; Veinthal, Renno

    2017-03-01

    This paper presents statistical analyses of rock engineering properties and the measured penetration rate of tunnel boring machine (TBM) based on the data of an actual project. The aim of this study is to analyze the influence of rock engineering properties including uniaxial compressive strength (UCS), Brazilian tensile strength (BTS), rock brittleness index (BI), the distance between planes of weakness (DPW), and the alpha angle (Alpha) between the tunnel axis and the planes of weakness on the TBM rate of penetration (ROP). Four (4) statistical regression models (two linear and two nonlinear) are built to predict the ROP of TBM. Finally a fuzzy logic model is developed as an alternative method and compared to the four statistical regression models. Results show that the fuzzy logic model provides better estimations and can be applied to predict the TBM performance. The R-squared value (R2) of the fuzzy logic model scores the highest value of 0.714 over the second runner-up of 0.667 from the multiple variables nonlinear regression model.

  6. A Factor Analytic and Regression Approach to Functional Age: Potential Effects of Race.

    ERIC Educational Resources Information Center

    Colquitt, Alan L.; And Others

    Factor analysis and multiple regression are two major approaches used to look at functional age, which takes account of the extensive variation in the rate of physiological and psychological maturation throughout life. To examine the role of racial or cultural influences on the measurement of functional age, a battery of 12 tests concentrating on…

  7. A Ten Year Study of Salary Differential by Sex through a Regression Methodology.

    ERIC Educational Resources Information Center

    Williams, John Delane; And Others

    A 10-year study of salary differential by sex was undertaken at the University of North Dakota using a multiple regression methodology, with rank, discipline, degree, years in department, years in current rank, and sex as predictors. The sex variable evidenced lower salaries for women when controlling for the other variables throughout the study…

  8. A primer for biomedical scientists on how to execute model II linear regression analysis.

    PubMed

    Ludbrook, John

    2012-04-01

    1. There are two very different ways of executing linear regression analysis. One is Model I, when the x-values are fixed by the experimenter. The other is Model II, in which the x-values are free to vary and are subject to error. 2. I have received numerous complaints from biomedical scientists that they have great difficulty in executing Model II linear regression analysis. This may explain the results of a Google Scholar search, which showed that the authors of articles in journals of physiology, pharmacology and biochemistry rarely use Model II regression analysis. 3. I repeat my previous arguments in favour of using least products linear regression analysis for Model II regressions. I review three methods for executing ordinary least products (OLP) and weighted least products (WLP) regression analysis: (i) scientific calculator and/or computer spreadsheet; (ii) specific purpose computer programs; and (iii) general purpose computer programs. 4. Using a scientific calculator and/or computer spreadsheet, it is easy to obtain correct values for OLP slope and intercept, but the corresponding 95% confidence intervals (CI) are inaccurate. 5. Using specific purpose computer programs, the freeware computer program smatr gives the correct OLP regression coefficients and obtains 95% CI by bootstrapping. In addition, smatr can be used to compare the slopes of OLP lines. 6. When using general purpose computer programs, I recommend the commercial programs systat and Statistica for those who regularly undertake linear regression analysis and I give step-by-step instructions in the Supplementary Information as to how to use loss functions. © 2011 The Author. Clinical and Experimental Pharmacology and Physiology. © 2011 Blackwell Publishing Asia Pty Ltd.

  9. Understanding poisson regression.

    PubMed

    Hayat, Matthew J; Higgins, Melinda

    2014-04-01

    Nurse investigators often collect study data in the form of counts. Traditional methods of data analysis have historically approached analysis of count data either as if the count data were continuous and normally distributed or with dichotomization of the counts into the categories of occurred or did not occur. These outdated methods for analyzing count data have been replaced with more appropriate statistical methods that make use of the Poisson probability distribution, which is useful for analyzing count data. The purpose of this article is to provide an overview of the Poisson distribution and its use in Poisson regression. Assumption violations for the standard Poisson regression model are addressed with alternative approaches, including addition of an overdispersion parameter or negative binomial regression. An illustrative example is presented with an application from the ENSPIRE study, and regression modeling of comorbidity data is included for illustrative purposes. Copyright 2014, SLACK Incorporated.

  10. Criteria for the use of regression analysis for remote sensing of sediment and pollutants

    NASA Technical Reports Server (NTRS)

    Whitlock, C. H.; Kuo, C. Y.; Lecroy, S. R. (Principal Investigator)

    1982-01-01

    Data analysis procedures for quantification of water quality parameters that are already identified and are known to exist within the water body are considered. The liner multiple-regression technique was examined as a procedure for defining and calibrating data analysis algorithms for such instruments as spectrometers and multispectral scanners.

  11. Catatonia in Down syndrome; a treatable cause of regression

    PubMed Central

    Ghaziuddin, Neera; Nassiri, Armin; Miles, Judith H

    2015-01-01

    Objective: The main aim of this case series report is to alert physicians to the occurrence of catatonia in Down syndrome (DS). A second aim is to stimulate the study of regression in DS and of catatonia. A subset of individuals with DS is noted to experience unexplained regression in behavior, mood, activities of daily living, motor activities, and intellectual functioning during adolescence or young adulthood. Depression, early onset Alzheimer’s, or just “the Down syndrome” are often blamed after general medical causes have been ruled out. Clinicians are generally unaware that catatonia, which can cause these symptoms, may occur in DS. Study design: Four DS adolescents who experienced regression are reported. Laboratory tests intended to rule out causes of motor and cognitive regression were within normal limits. Based on the presence of multiple motor disturbances (slowing and/or increased motor activity, grimacing, posturing), the individuals were diagnosed with unspecified catatonia and treated with anti-catatonic treatments (benzodiazepines and electroconvulsive therapy [ECT]). Results: All four cases were treated with a benzodiazepine combined with ECT and recovered their baseline functioning. Conclusion: We suspect catatonia is a common cause of unexplained deterioration in adolescents and young adults with DS. Moreover, pediatricians and others who care for individuals with DS are generally unfamiliar with the catatonia diagnosis outside schizophrenia, resulting in misdiagnosis and years of morbidity. Alerting physicians to catatonia in DS is essential to prompt diagnosis, appropriate treatment, and identification of the frequency and course of this disorder. PMID:25897230

  12. Composite marginal quantile regression analysis for longitudinal adolescent body mass index data.

    PubMed

    Yang, Chi-Chuan; Chen, Yi-Hau; Chang, Hsing-Yi

    2017-09-20

    Childhood and adolescenthood overweight or obesity, which may be quantified through the body mass index (BMI), is strongly associated with adult obesity and other health problems. Motivated by the child and adolescent behaviors in long-term evolution (CABLE) study, we are interested in individual, family, and school factors associated with marginal quantiles of longitudinal adolescent BMI values. We propose a new method for composite marginal quantile regression analysis for longitudinal outcome data, which performs marginal quantile regressions at multiple quantile levels simultaneously. The proposed method extends the quantile regression coefficient modeling method introduced by Frumento and Bottai (Biometrics 2016; 72:74-84) to longitudinal data accounting suitably for the correlation structure in longitudinal observations. A goodness-of-fit test for the proposed modeling is also developed. Simulation results show that the proposed method can be much more efficient than the analysis without taking correlation into account and the analysis performing separate quantile regressions at different quantile levels. The application to the longitudinal adolescent BMI data from the CABLE study demonstrates the practical utility of our proposal. Copyright © 2017 John Wiley & Sons, Ltd. Copyright © 2017 John Wiley & Sons, Ltd.

  13. A simple technique for obtaining future climate data inputs for natural resource models

    USDA-ARS?s Scientific Manuscript database

    Those conducting impact studies using natural resource models need to be able to quickly and easily obtain downscaled future climate data from multiple models, scenarios, and timescales for multiple locations. This paper describes a method of quickly obtaining future climate data over a wide range o...

  14. LAI inversion from optical reflectance using a neural network trained with a multiple scattering model

    NASA Technical Reports Server (NTRS)

    Smith, James A.

    1992-01-01

    The inversion of the leaf area index (LAI) canopy parameter from optical spectral reflectance measurements is obtained using a backpropagation artificial neural network trained using input-output pairs generated by a multiple scattering reflectance model. The problem of LAI estimation over sparse canopies (LAI < 1.0) with varying soil reflectance backgrounds is particularly difficult. Standard multiple regression methods applied to canopies within a single homogeneous soil type yield good results but perform unacceptably when applied across soil boundaries, resulting in absolute percentage errors of >1000 percent for low LAI. Minimization methods applied to merit functions constructed from differences between measured reflectances and predicted reflectances using multiple-scattering models are unacceptably sensitive to a good initial guess for the desired parameter. In contrast, the neural network reported generally yields absolute percentage errors of <30 percent when weighting coefficients trained on one soil type were applied to predicted canopy reflectance at a different soil background.

  15. Enhanced ID Pit Sizing Using Multivariate Regression Algorithm

    NASA Astrophysics Data System (ADS)

    Krzywosz, Kenji

    2007-03-01

    EPRI is funding a program to enhance and improve the reliability of inside diameter (ID) pit sizing for balance-of plant heat exchangers, such as condensers and component cooling water heat exchangers. More traditional approaches to ID pit sizing involve the use of frequency-specific amplitude or phase angles. The enhanced multivariate regression algorithm for ID pit depth sizing incorporates three simultaneous input parameters of frequency, amplitude, and phase angle. A set of calibration data sets consisting of machined pits of various rounded and elongated shapes and depths was acquired in the frequency range of 100 kHz to 1 MHz for stainless steel tubing having nominal wall thickness of 0.028 inch. To add noise to the acquired data set, each test sample was rotated and test data acquired at 3, 6, 9, and 12 o'clock positions. The ID pit depths were estimated using a second order and fourth order regression functions by relying on normalized amplitude and phase angle information from multiple frequencies. Due to unique damage morphology associated with the microbiologically-influenced ID pits, it was necessary to modify the elongated calibration standard-based algorithms by relying on the algorithm developed solely from the destructive sectioning results. This paper presents the use of transformed multivariate regression algorithm to estimate ID pit depths and compare the results with the traditional univariate phase angle analysis. Both estimates were then compared with the destructive sectioning results.

  16. Liquid electrolyte informatics using an exhaustive search with linear regression.

    PubMed

    Sodeyama, Keitaro; Igarashi, Yasuhiko; Nakayama, Tomofumi; Tateyama, Yoshitaka; Okada, Masato

    2018-06-14

    Exploring new liquid electrolyte materials is a fundamental target for developing new high-performance lithium-ion batteries. In contrast to solid materials, disordered liquid solution properties have been less studied by data-driven information techniques. Here, we examined the estimation accuracy and efficiency of three information techniques, multiple linear regression (MLR), least absolute shrinkage and selection operator (LASSO), and exhaustive search with linear regression (ES-LiR), by using coordination energy and melting point as test liquid properties. We then confirmed that ES-LiR gives the most accurate estimation among the techniques. We also found that ES-LiR can provide the relationship between the "prediction accuracy" and "calculation cost" of the properties via a weight diagram of descriptors. This technique makes it possible to choose the balance of the "accuracy" and "cost" when the search of a huge amount of new materials was carried out.

  17. Improving Global Models of Remotely Sensed Ocean Chlorophyll Content Using Partial Least Squares and Geographically Weighted Regression

    NASA Astrophysics Data System (ADS)

    Gholizadeh, H.; Robeson, S. M.

    2015-12-01

    Empirical models have been widely used to estimate global chlorophyll content from remotely sensed data. Here, we focus on the standard NASA empirical models that use blue-green band ratios. These band ratio ocean color (OC) algorithms are in the form of fourth-order polynomials and the parameters of these polynomials (i.e. coefficients) are estimated from the NASA bio-Optical Marine Algorithm Data set (NOMAD). Most of the points in this data set have been sampled from tropical and temperate regions. However, polynomial coefficients obtained from this data set are used to estimate chlorophyll content in all ocean regions with different properties such as sea-surface temperature, salinity, and downwelling/upwelling patterns. Further, the polynomial terms in these models are highly correlated. In sum, the limitations of these empirical models are as follows: 1) the independent variables within the empirical models, in their current form, are correlated (multicollinear), and 2) current algorithms are global approaches and are based on the spatial stationarity assumption, so they are independent of location. Multicollinearity problem is resolved by using partial least squares (PLS). PLS, which transforms the data into a set of independent components, can be considered as a combined form of principal component regression (PCR) and multiple regression. Geographically weighted regression (GWR) is also used to investigate the validity of spatial stationarity assumption. GWR solves a regression model over each sample point by using the observations within its neighbourhood. PLS results show that the empirical method underestimates chlorophyll content in high latitudes, including the Southern Ocean region, when compared to PLS (see Figure 1). Cluster analysis of GWR coefficients also shows that the spatial stationarity assumption in empirical models is not likely a valid assumption.

  18. Unitary Response Regression Models

    ERIC Educational Resources Information Center

    Lipovetsky, S.

    2007-01-01

    The dependent variable in a regular linear regression is a numerical variable, and in a logistic regression it is a binary or categorical variable. In these models the dependent variable has varying values. However, there are problems yielding an identity output of a constant value which can also be modelled in a linear or logistic regression with…

  19. Forecasting daily meteorological time series using ARIMA and regression models

    NASA Astrophysics Data System (ADS)

    Murat, Małgorzata; Malinowska, Iwona; Gos, Magdalena; Krzyszczak, Jaromir

    2018-04-01

    The daily air temperature and precipitation time series recorded between January 1, 1980 and December 31, 2010 in four European sites (Jokioinen, Dikopshof, Lleida and Lublin) from different climatic zones were modeled and forecasted. In our forecasting we used the methods of the Box-Jenkins and Holt- Winters seasonal auto regressive integrated moving-average, the autoregressive integrated moving-average with external regressors in the form of Fourier terms and the time series regression, including trend and seasonality components methodology with R software. It was demonstrated that obtained models are able to capture the dynamics of the time series data and to produce sensible forecasts.

  20. Brain enlargement is associated with regression in preschool-age boys with autism spectrum disorders

    PubMed Central

    Nordahl, Christine Wu; Lange, Nicholas; Li, Deana D.; Barnett, Lou Ann; Lee, Aaron; Buonocore, Michael H.; Simon, Tony J.; Rogers, Sally; Ozonoff, Sally; Amaral, David G.

    2011-01-01

    Autism is a heterogeneous disorder with multiple behavioral and biological phenotypes. Accelerated brain growth during early childhood is a well-established biological feature of autism. Onset pattern, i.e., early onset or regressive, is an intensely studied behavioral phenotype of autism. There is currently little known, however, about whether, or how, onset status maps onto the abnormal brain growth. We examined the relationship between total brain volume and onset status in a large sample of 2- to 4-y-old boys and girls with autism spectrum disorder (ASD) [n = 53, no regression (nREG); n = 61, regression (REG)] and a comparison group of age-matched typically developing controls (n = 66). We also examined retrospective head circumference measurements from birth through 18 mo of age. We found that abnormal brain enlargement was most commonly found in boys with regressive autism. Brain size in boys without regression did not differ from controls. Retrospective head circumference measurements indicate that head circumference in boys with regressive autism is normal at birth but diverges from the other groups around 4–6 mo of age. There were no differences in brain size in girls with autism (n = 22, ASD; n = 24, controls). These results suggest that there may be distinct neural phenotypes associated with different onsets of autism. For boys with regressive autism, divergence in brain size occurs well before loss of skills is commonly reported. Thus, rapid head growth may be a risk factor for regressive autism. PMID:22123952