applied regression including: Topics by Science.gov

Sample records for applied regression including

Applied Statistics: From Bivariate through Multivariate Techniques [with CD-ROM]

ERIC Educational Resources Information Center

Warner, Rebecca M.

2007-01-01

This book provides a clear introduction to widely used topics in bivariate and multivariate statistics, including multiple regression, discriminant analysis, MANOVA, factor analysis, and binary logistic regression. The approach is applied and does not require formal mathematics; equations are accompanied by verbal explanations. Students are asked…
Regression modeling of ground-water flow

USGS Publications Warehouse

Cooley, R.L.; Naff, R.L.

1985-01-01

Nonlinear multiple regression methods are developed to model and analyze groundwater flow systems. Complete descriptions of regression methodology as applied to groundwater flow models allow scientists and engineers engaged in flow modeling to apply the methods to a wide range of problems. Organization of the text proceeds from an introduction that discusses the general topic of groundwater flow modeling, to a review of basic statistics necessary to properly apply regression techniques, and then to the main topic: exposition and use of linear and nonlinear regression to model groundwater flow. Statistical procedures are given to analyze and use the regression models. A number of exercises and answers are included to exercise the student on nearly all the methods that are presented for modeling and statistical analysis. Three computer programs implement the more complex methods. These three are a general two-dimensional, steady-state regression model for flow in an anisotropic, heterogeneous porous medium, a program to calculate a measure of model nonlinearity with respect to the regression parameters, and a program to analyze model errors in computed dependent variables such as hydraulic head. (USGS)
Regression: The Apple Does Not Fall Far From the Tree.

PubMed

Vetter, Thomas R; Schober, Patrick

2018-05-15

Researchers and clinicians are frequently interested in either: (1) assessing whether there is a relationship or association between 2 or more variables and quantifying this association; or (2) determining whether 1 or more variables can predict another variable. The strength of such an association is mainly described by the correlation. However, regression analysis and regression models can be used not only to identify whether there is a significant relationship or association between variables but also to generate estimations of such a predictive relationship between variables. This basic statistical tutorial discusses the fundamental concepts and techniques related to the most common types of regression analysis and modeling, including simple linear regression, multiple regression, logistic regression, ordinal regression, and Poisson regression, as well as the common yet often underrecognized phenomenon of regression toward the mean. The various types of regression analysis are powerful statistical techniques, which when appropriately applied, can allow for the valid interpretation of complex, multifactorial data. Regression analysis and models can assess whether there is a relationship or association between 2 or more observed variables and estimate the strength of this association, as well as determine whether 1 or more variables can predict another variable. Regression is thus being applied more commonly in anesthesia, perioperative, critical care, and pain research. However, it is crucial to note that regression can identify plausible risk factors; it does not prove causation (a definitive cause and effect relationship). The results of a regression analysis instead identify independent (predictor) variable(s) associated with the dependent (outcome) variable. As with other statistical methods, applying regression requires that certain assumptions be met, which can be tested with specific diagnostics.
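As an illustration of the regression toward the mean mentioned in this abstract, the following is a minimal sketch and is not taken from the tutorial itself. It assumes simulated data in which two noisy measurements share a common underlying value; the function names and the simulation settings are hypothetical. For standardized variables the least-squares slope equals the Pearson correlation, so a subject 2 standard deviations above the mean on the first measurement is predicted to be less than 2 standard deviations above the mean on the second.

```python
import random
import statistics

def fit_simple_linear(x, y):
    """Ordinary least squares fit of y = intercept + slope * x."""
    mean_x, mean_y = statistics.fmean(x), statistics.fmean(y)
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return mean_y - slope * mean_x, slope

def standardize(values):
    """Convert values to z-scores (population standard deviation)."""
    mean = statistics.fmean(values)
    sd = statistics.pstdev(values)
    return [(v - mean) / sd for v in values]

# Simulated data (an assumption for illustration): two noisy
# measurements of the same underlying quantity.
rng = random.Random(0)
underlying = [rng.gauss(0, 1) for _ in range(2000)]
first = [u + rng.gauss(0, 1) for u in underlying]
second = [u + rng.gauss(0, 1) for u in underlying]

z_first, z_second = standardize(first), standardize(second)
intercept, slope = fit_simple_linear(z_first, z_second)

# Prediction for a subject 2 SD above the mean on the first measurement.
predicted_for_extreme = intercept + slope * 2.0
print("slope on standardized data:", round(slope, 3))
print("predicted z-score on second measurement:", round(predicted_for_extreme, 3))
```

Because the slope lies between 0 and 1 whenever the two measurements are imperfectly but positively correlated, the prediction is pulled toward the mean without any causal effect being involved.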
Application of Partial Least Squares (PLS) Regression to Determine Landscape-Scale Aquatic Resource Vulnerability in the Ozark Mountains

EPA Science Inventory

Partial least squares (PLS) analysis offers a number of advantages over the more traditionally used regression analyses applied in landscape ecology to study the associations among constituents of surface water and landscapes. Common data problems in ecological studies include: s...
Predicting 30-day Hospital Readmission with Publicly Available Administrative Database. A Conditional Logistic Regression Modeling Approach.

PubMed

Zhu, K; Lou, Z; Zhou, J; Ballester, N; Kong, N; Parikh, P

2015-01-01

This article is part of the Focus Theme of Methods of Information in Medicine on "Big Data and Analytics in Healthcare". Hospital readmissions raise healthcare costs and cause significant distress to providers and patients. It is, therefore, of great interest to healthcare organizations to predict which patients are at risk of being readmitted to their hospitals. However, current logistic regression based risk prediction models have limited prediction power when applied to hospital administrative data. Meanwhile, although decision trees and random forests have been applied, they tend to be too complex for hospital practitioners to understand. The objective was to explore the use of conditional logistic regression to increase the prediction accuracy. We analyzed an HCUP statewide inpatient discharge record dataset, which includes patient demographics, clinical and care utilization data from California. We extracted records of heart failure Medicare beneficiaries who had inpatient experience during an 11-month period. We corrected the data imbalance issue with under-sampling. In our study, we first applied standard logistic regression and a decision tree to obtain influential variables and derive practically meaningful decision rules. We then stratified the original data set accordingly and applied logistic regression on each data stratum. We further explored the effect of interacting variables in the logistic regression modeling. We conducted cross validation to assess the overall prediction performance of conditional logistic regression (CLR) and compared it with standard classification models. The developed CLR models outperformed several standard classification models (e.g., straightforward logistic regression, stepwise logistic regression, random forest, support vector machine). For example, the best CLR model improved the classification accuracy by nearly 20% over the straightforward logistic regression model. Furthermore, the developed CLR models tend to achieve better sensitivity, by more than 10%, than the standard classification models, which can be translated to correct labeling of an additional 400 to 500 readmissions for heart failure patients in the state of California over a year. Lastly, several key predictors identified from the HCUP data include the disposition location from discharge, the number of chronic conditions, and the number of acute procedures. It would be beneficial to apply simple decision rules obtained from the decision tree in an ad-hoc manner to guide the cohort stratification. It could be potentially beneficial to explore the effect of pairwise interactions between influential predictors when building the logistic regression models for different data strata. Judicious use of the ad-hoc CLR models developed offers insights into future development of prediction models for hospital readmissions, which can lead to better intuition in identifying high-risk patients and developing effective post-discharge care strategies. Lastly, this paper is expected to raise the awareness of collecting data on additional markers and developing necessary database infrastructure for larger-scale exploratory studies on readmission risk prediction.
Quantifying Vegetation Biophysical Variables from Imaging Spectroscopy Data: A Review on Retrieval Methods

NASA Astrophysics Data System (ADS)

Verrelst, Jochem; Malenovský, Zbyněk; Van der Tol, Christiaan; Camps-Valls, Gustau; Gastellu-Etchegorry, Jean-Philippe; Lewis, Philip; North, Peter; Moreno, Jose

2018-06-01

An unprecedented spectroscopic data stream will soon become available with forthcoming Earth-observing satellite missions equipped with imaging spectroradiometers. This data stream will open up a vast array of opportunities to quantify a diversity of biochemical and structural vegetation properties. The processing requirements for such large data streams require reliable retrieval techniques enabling the spatiotemporally explicit quantification of biophysical variables. With the aim of preparing for this new era of Earth observation, this review summarizes the state-of-the-art retrieval methods that have been applied in experimental imaging spectroscopy studies inferring all kinds of vegetation biophysical variables. Identified retrieval methods are categorized into: (1) parametric regression, including vegetation indices, shape indices and spectral transformations; (2) nonparametric regression, including linear and nonlinear machine learning regression algorithms; (3) physically based, including inversion of radiative transfer models (RTMs) using numerical optimization and look-up table approaches; and (4) hybrid regression methods, which combine RTM simulations with machine learning regression methods. For each of these categories, an overview of widely applied methods with application to mapping vegetation properties is given. In view of processing imaging spectroscopy data, a critical aspect involves the challenge of dealing with spectral multicollinearity. The ability to provide robust estimates, retrieval uncertainties and acceptable retrieval processing speed are other important aspects in view of operational processing. Recommendations towards new-generation spectroscopy-based processing chains for operational production of biophysical variables are given.
Suppressor Variables: The Difference between "Is" versus "Acting As"

ERIC Educational Resources Information Center

Ludlow, Larry; Klein, Kelsey

2014-01-01

Correlated predictors in regression models are a fact of life in applied social science research. The extent to which they are correlated will influence the estimates and statistics associated with the other variables they are modeled along with. These effects, for example, may include enhanced regression coefficients for the other variables--a…
A Comparison of Conventional Linear Regression Methods and Neural Networks for Forecasting Educational Spending.

ERIC Educational Resources Information Center

Baker, Bruce D.; Richards, Craig E.

1999-01-01

Applies neural network methods for forecasting 1991-95 per-pupil expenditures in U.S. public elementary and secondary schools. Forecasting models included the National Center for Education Statistics' multivariate regression model and three neural architectures. Regarding prediction accuracy, neural network results were comparable or superior to…
Data Mining Methods Applied to Flight Operations Quality Assurance Data: A Comparison to Standard Statistical Methods

NASA Technical Reports Server (NTRS)

Stolzer, Alan J.; Halford, Carl

2007-01-01

In a previous study, multiple regression techniques were applied to Flight Operations Quality Assurance-derived data to develop parsimonious model(s) for fuel consumption on the Boeing 757 airplane. The present study examined several data mining algorithms, including neural networks, on the fuel consumption problem and compared them to the multiple regression results obtained earlier. Using regression methods, parsimonious models were obtained that explained approximately 85% of the variation in fuel flow. In general data mining methods were more effective in predicting fuel consumption. Classification and Regression Tree methods reported correlation coefficients of .91 to .92, and General Linear Models and Multilayer Perceptron neural networks reported correlation coefficients of about .99. These data mining models show great promise for use in further examining large FOQA databases for operational and safety improvements.
Are your covariates under control? How normalization can re-introduce covariate effects.

PubMed

Pain, Oliver; Dudbridge, Frank; Ronald, Angelica

2018-04-30

Many statistical tests rely on the assumption that the residuals of a model are normally distributed. Rank-based inverse normal transformation (INT) of the dependent variable is one of the most popular approaches to satisfy the normality assumption. When covariates are included in the analysis, a common approach is to first adjust for the covariates and then normalize the residuals. This study investigated the effect of regressing covariates against the dependent variable and then applying rank-based INT to the residuals. The correlation between the dependent variable and covariates at each stage of processing was assessed. An alternative approach was tested in which rank-based INT was applied to the dependent variable before regressing covariates. Analyses based on both simulated and real data examples demonstrated that applying rank-based INT to the dependent variable residuals after regressing out covariates re-introduces a linear correlation between the dependent variable and covariates, increasing type-I errors and reducing power. On the other hand, when rank-based INT was applied prior to controlling for covariate effects, residuals were normally distributed and linearly uncorrelated with covariates. This latter approach is therefore recommended in situations where normality of the dependent variable is required.
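The two processing orders compared in this abstract can be sketched as follows. This is an illustrative sketch only, not the authors' code: the simulated skewed outcome, the single covariate, the rank offset of 0.5, and all function names are assumptions, and ties are not handled. The script prints the covariate correlation at each stage so that the two pipelines can be compared on whatever data is supplied.

```python
import math
import random
from statistics import NormalDist, fmean

def rank_int(values, offset=0.5):
    """Rank-based inverse normal transform (no tie handling in this sketch)."""
    n = len(values)
    order = sorted(range(n), key=lambda i: values[i])
    std_normal = NormalDist()
    transformed = [0.0] * n
    for rank, index in enumerate(order, start=1):
        # Map each rank to a probability strictly inside (0, 1).
        transformed[index] = std_normal.inv_cdf((rank - offset) / (n - 2 * offset + 1))
    return transformed

def residualize(y, x):
    """Residuals of y after least-squares regression on one covariate x."""
    mean_x, mean_y = fmean(x), fmean(y)
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    slope = sxy / sxx
    return [b - mean_y - slope * (a - mean_x) for a, b in zip(x, y)]

def correlation(a, b):
    """Pearson correlation coefficient."""
    mean_a, mean_b = fmean(a), fmean(b)
    cov = sum((p - mean_a) * (q - mean_b) for p, q in zip(a, b))
    var_a = sum((p - mean_a) ** 2 for p in a)
    var_b = sum((q - mean_b) ** 2 for q in b)
    return cov / math.sqrt(var_a * var_b)

# Hypothetical data: a skewed outcome that depends on the covariate.
rng = random.Random(1)
covariate = [rng.gauss(0, 1) for _ in range(500)]
outcome = [math.exp(0.5 * c + rng.gauss(0, 1)) for c in covariate]

# Order 1: adjust for the covariate, then normalize the residuals.
residuals = residualize(outcome, covariate)
adjust_then_int = rank_int(residuals)

# Order 2: normalize the outcome, then adjust for the covariate.
int_then_adjust = residualize(rank_int(outcome), covariate)

print("raw residuals vs covariate:", round(correlation(residuals, covariate), 4))
print("adjust then INT vs covariate:", round(correlation(adjust_then_int, covariate), 4))
print("INT then adjust vs covariate:", round(correlation(int_then_adjust, covariate), 4))
```

Least-squares residuals are linearly uncorrelated with the covariate by construction, so any nonzero correlation in the second printed line arises from the transformation applied after adjustment.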
Multiple imputation for cure rate quantile regression with censored data.

PubMed

Wu, Yuanshan; Yin, Guosheng

2017-03-01

The main challenge in the context of cure rate analysis is that one never knows whether censored subjects are cured or uncured, or whether they are susceptible or insusceptible to the event of interest. Considering the susceptible indicator as missing data, we propose a multiple imputation approach to cure rate quantile regression for censored data with a survival fraction. We develop an iterative algorithm to estimate the conditionally uncured probability for each subject. By utilizing this estimated probability and Bernoulli sample imputation, we can classify each subject as cured or uncured, and then employ the locally weighted method to estimate the quantile regression coefficients with only the uncured subjects. Repeating the imputation procedure multiple times and taking an average over the resultant estimators, we obtain consistent estimators for the quantile regression coefficients. Our approach relaxes the usual global linearity assumption, so that we can apply quantile regression to any particular quantile of interest. We establish asymptotic properties for the proposed estimators, including both consistency and asymptotic normality. We conduct simulation studies to assess the finite-sample performance of the proposed multiple imputation method and apply it to a lung cancer study as an illustration. © 2016, The International Biometric Society.
A Predictive Model for Readmissions Among Medicare Patients in a California Hospital.

PubMed

Duncan, Ian; Huynh, Nhan

2017-11-17

Predictive models for hospital readmission rates are in high demand because of the Centers for Medicare & Medicaid Services (CMS) Hospital Readmission Reduction Program (HRRP). The LACE index is one of the most popular predictive tools among hospitals in the United States. The LACE index is a simple tool with 4 parameters: Length of stay, Acuity of admission, Comorbidity, and Emergency visits in the previous 6 months. The authors applied logistic regression to develop a predictive model for a medium-sized not-for-profit community hospital in California using patient-level data with more specific patient information (including 13 explanatory variables). Specifically, the logistic regression is applied to 2 populations: a general population including all patients and the specific group of patients targeted by the CMS penalty (characterized as ages 65 or older with select conditions). The 2 resulting logistic regression models have a higher sensitivity rate compared to the sensitivity of the LACE index. The C statistic values of the model applied to both populations demonstrate moderate levels of predictive power. The authors also build an economic model to demonstrate the potential financial impact of the use of the model for targeting high-risk patients in a sample hospital and demonstrate that, on balance, whether the hospital gains or loses from reducing readmissions depends on its margin and the extent of its readmission penalties.
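The sensitivity and C statistic used in this abstract to compare models can be computed from predicted risks as in the sketch below. The four patients, their risk scores, and the 0.5 threshold are made-up values for illustration and do not come from the study; the function names are hypothetical.

```python
def sensitivity(labels, scores, threshold):
    """Share of actual readmissions (label 1) flagged at or above the threshold."""
    true_positives = sum(
        1 for label, score in zip(labels, scores) if label == 1 and score >= threshold
    )
    positives = sum(1 for label in labels if label == 1)
    return true_positives / positives

def c_statistic(labels, scores):
    """Probability that a readmitted patient scores higher than a
    non-readmitted patient, counting ties as one half."""
    positive_scores = [s for label, s in zip(labels, scores) if label == 1]
    negative_scores = [s for label, s in zip(labels, scores) if label == 0]
    concordant = sum(
        1.0 if p > n else 0.5 if p == n else 0.0
        for p in positive_scores
        for n in negative_scores
    )
    return concordant / (len(positive_scores) * len(negative_scores))

# Made-up example: 1 = readmitted, 0 = not readmitted.
readmitted = [1, 0, 1, 0]
predicted_risk = [0.9, 0.2, 0.4, 0.6]

model_sensitivity = sensitivity(readmitted, predicted_risk, threshold=0.5)
model_c = c_statistic(readmitted, predicted_risk)
print(model_sensitivity)  # 0.5: one of two readmissions is flagged
print(model_c)            # 0.75: three of four patient pairs are ordered correctly
```

A C statistic of 0.5 corresponds to ranking no better than chance, and 1.0 to perfect ranking.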
Integrative eQTL analysis of tumor and host omics data in individuals with bladder cancer.

PubMed

Pineda, Silvia; Van Steen, Kristel; Malats, Núria

2017-09-01

Integrative analyses of several omics data are emerging. The data are usually generated from the same source material (i.e., tumor sample) representing one level of regulation. However, integrating different regulatory levels (i.e., blood) with those from tumor may also reveal important knowledge about the human genetic architecture. To model this multilevel structure, an integrative-expression quantitative trait loci (eQTL) analysis applying two-stage regression (2SR) was proposed. This approach first regressed tumor gene expression levels with tumor markers and the adjusted residuals from the previous model were then regressed with the germline genotypes measured in blood. Previously, we demonstrated that penalized regression methods in combination with a permutation-based MaxT method (Global-LASSO) are a promising tool to fix some of the challenges that high-throughput omics data analysis imposes. Here, we assessed whether Global-LASSO can also be applied when tumor and blood omics data are integrated. We further compared our strategy with two 2SR-approaches, one using multiple linear regression (2SR-MLR) and the other using LASSO (2SR-LASSO). We applied the three models to integrate genomic, epigenomic, and transcriptomic data from tumor tissue with blood germline genotypes from 181 individuals with bladder cancer included in the TCGA Consortium. Global-LASSO provided a larger list of eQTLs than the 2SR methods, identified previously reported eQTLs in prostate stem cell antigen (PSCA), and provided further clues on the complexity of APBEC3B loci, with a minimal false-positive rate not achieved by 2SR-MLR. It also represents an important contribution for omics integrative analysis because it is easy to apply and adaptable to any type of data. © 2017 WILEY PERIODICALS, INC.
INNOVATIVE INSTRUMENTATION AND ANALYSIS OF THE TEMPERATURE MEASUREMENT FOR HIGH TEMPERATURE GASIFICATION

DOE Office of Scientific and Technical Information (OSTI.GOV)

Seong W. Lee

2004-10-01

The systematic tests of the gasifier simulator on the clean thermocouple were completed in this reporting period. Within the systematic tests on the clean thermocouple, five (5) factors were considered as the experimental parameters, including air flow rate, water flow rate, fine dust particle amount, ammonia addition and high/low frequency device (electric motor). The fractional factorial design method was used in the experiment design with sixteen (16) data sets of readings. Analysis of variance (ANOVA) was applied to the results from the systematic tests. The ANOVA results show that the un-balanced motor vibration frequency did not have a significant impact on the temperature changes in the gasifier simulator. For the fine dust particle testing, the amount of fine dust particles has a significant impact on the temperature measurements in the gasifier simulator. The effects of the air and water on the temperature measurements show the same results as reported in the previous report. The ammonia concentration was included as an experimental parameter for the reducing environment in this reporting period. The ammonia concentration does not seem to be a significant factor in the temperature changes. Linear regression analysis was applied to the temperature readings with five (5) factors. The accuracy of the linear regression is relatively low, at less than 10%. Nonlinear regression was also applied to the temperature readings with the same factors. Since the experiments were designed in two (2) levels, the nonlinear regression is not very effective with the dataset (16 readings). An extra central point test was conducted. With the data of the center point testing, the accuracy of the nonlinear regression is much better than that of the linear regression.
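A two-level design with five factors in sixteen runs, as described in this abstract, is a half fraction of the full 32-run factorial. The sketch below builds one such design and shows why a two-level design alone cannot support a curvature term. The generator E = ABCD and the factor labels are assumptions for illustration; the report does not state which generator was used.

```python
from itertools import product

# Hypothetical labels for the five factors named in the abstract.
FACTORS = ["air_flow", "water_flow", "dust_amount", "ammonia", "motor"]

def half_fraction_design():
    """Sixteen-run, five-factor, two-level design with coded levels -1 and +1.

    The fifth column is generated as E = A*B*C*D (an assumed generator).
    """
    runs = []
    for a, b, c, d in product((-1, 1), repeat=4):
        runs.append((a, b, c, d, a * b * c * d))
    return runs

design = half_fraction_design()

# At two levels every squared coded value equals 1, so a pure quadratic
# column is constant and cannot be separated from the intercept.
squared_levels = {level * level for run in design for level in run}

# Adding a center point (all factors at coded level 0) makes the squared
# columns vary, which is what allows a curvature check.
with_center = design + [(0, 0, 0, 0, 0)]
squared_levels_with_center = {level * level for run in with_center for level in run}

print(len(design), "runs for factors", FACTORS)
print("squared levels without center point:", squared_levels)
print("squared levels with center point:", squared_levels_with_center)
```

Note that a single center point makes the squared columns of all five factors identical to each other, so it can reveal overall curvature but cannot attribute it to an individual factor.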
Breeding value accuracy estimates for growth traits using random regression and multi-trait models in Nelore cattle.

PubMed

Boligon, A A; Baldi, F; Mercadante, M E Z; Lobo, R B; Pereira, R J; Albuquerque, L G

2011-06-28

We quantified the potential increase in accuracy of expected breeding value for weights of Nelore cattle, from birth to mature age, using multi-trait and random regression models on Legendre polynomials and B-spline functions. A total of 87,712 weight records from 8144 females were used, recorded every three months from birth to mature age from the Nelore Brazil Program. For random regression analyses, all female weight records from birth to eight years of age (data set I) were considered. From this general data set, a subset was created (data set II), which included only nine weight records: at birth, weaning, 365 and 550 days of age, and 2, 3, 4, 5, and 6 years of age. Data set II was analyzed using random regression and multi-trait models. The model of analysis included the contemporary group as fixed effects and age of dam as a linear and quadratic covariable. In the random regression analyses, average growth trends were modeled using a cubic regression on orthogonal polynomials of age. Residual variances were modeled by a step function with five classes. Legendre polynomials of fourth and sixth order were utilized to model the direct genetic and animal permanent environmental effects, respectively, while third-order Legendre polynomials were considered for maternal genetic and maternal permanent environmental effects. Quadratic polynomials were applied to model all random effects in random regression models on B-spline functions. Direct genetic and animal permanent environmental effects were modeled using three segments or five coefficients, and genetic maternal and maternal permanent environmental effects were modeled with one segment or three coefficients in the random regression models on B-spline functions. For both data sets (I and II), animals ranked differently according to expected breeding value obtained by random regression or multi-trait models. 
With random regression models, the highest gains in accuracy were obtained at ages with a low number of weight records. The results indicate that random regression models provide more accurate expected breeding values than the traditionally finite multi-trait models. Thus, higher genetic responses are expected for beef cattle growth traits by replacing a multi-trait model with random regression models for genetic evaluation. B-spline functions could be applied as an alternative to Legendre polynomials to model covariance functions for weights from birth to mature age.
Semisupervised Clustering by Iterative Partition and Regression with Neuroscience Applications

PubMed Central

Qian, Guoqi; Wu, Yuehua; Ferrari, Davide; Qiao, Puxue; Hollande, Frédéric

2016-01-01

Regression clustering is a statistical learning and data mining method that mixes unsupervised and supervised learning, and it is found in a wide range of applications including artificial intelligence and neuroscience. It performs unsupervised learning when it clusters the data according to their respective unobserved regression hyperplanes. The method also performs supervised learning when it fits regression hyperplanes to the corresponding data clusters. Applying regression clustering in practice requires means of determining the underlying number of clusters in the data, finding the cluster label of each data point, and estimating the regression coefficients of the model. In this paper, we review the estimation and selection issues in regression clustering with regard to the least squares and robust statistical methods. We also provide a model selection based technique to determine the number of regression clusters underlying the data. We further develop a computing procedure for regression clustering estimation and selection. Finally, simulation studies are presented for assessing the procedure, together with analyzing a real data set on RGB cell marking in neuroscience to illustrate and interpret the method. PMID:27212939
Eye pupil detection system using an ensemble of regression forest and fast radial symmetry transform with a near infrared camera

NASA Astrophysics Data System (ADS)

Jeong, Mira; Nam, Jae-Yeal; Ko, Byoung Chul

2017-09-01

In this paper, we focus on pupil center detection in various video sequences that include head poses and changes in illumination. To detect the pupil center, we first find four eye landmarks in each eye by using cascade local regression based on a regression forest. Based on the rough location of the pupil, a fast radial symmetric transform is applied using the previously found pupil location to rearrange the fine pupil center. As the final step, the pupil displacement is estimated between the previous frame and the current frame to maintain the level of accuracy against a false locating result occurring in a particular frame. We generated a new face dataset, called Keimyung University pupil detection (KMUPD), with infrared camera. The proposed method was successfully applied to the KMUPD dataset, and the results indicate that its pupil center detection capability is better than that of other methods and with a shorter processing time.
An Expert System for the Evaluation of Cost Models

DTIC Science & Technology

1990-09-01

contrast to the condition of equal error variance, called homoscedasticity. (Reference: Applied Linear Regression Models by John Neter - page 423...normal. (Reference: Applied Linear Regression Models by John Neter - page 125) Click Here to continue -> Autocorrelation Click Here for the index - Index...over time. Error terms correlated over time are said to be autocorrelated or serially correlated. (REFERENCE: Applied Linear Regression Models by John
Hierarchical Adaptive Regression Kernels for Regression with Functional Predictors.

PubMed

Woodard, Dawn B; Crainiceanu, Ciprian; Ruppert, David

2013-01-01

We propose a new method for regression using a parsimonious and scientifically interpretable representation of functional predictors. Our approach is designed for data that exhibit features such as spikes, dips, and plateaus whose frequency, location, size, and shape varies stochastically across subjects. We propose Bayesian inference of the joint functional and exposure models, and give a method for efficient computation. We contrast our approach with existing state-of-the-art methods for regression with functional predictors, and show that our method is more effective and efficient for data that include features occurring at varying locations. We apply our methodology to a large and complex dataset from the Sleep Heart Health Study, to quantify the association between sleep characteristics and health outcomes. Software and technical appendices are provided in online supplemental materials.
Control Variate Selection for Multiresponse Simulation.

DTIC Science & Technology

1987-05-01

M. H. Kutner, Applied Linear Regression Models, Richard D. Irwin, Inc., Homewood, Illinois, 1983. Neuts, Marcel F., Probability, Allyn and Bacon...1982. Neter, J., W. Wasserman, and M. H. Kutner, Applied Linear Regression Models, Richard D. Irwin, Inc., Homewood, Illinois, 1983. Neuts, Marcel F...Aspects of Multivariate Statistical Theory, John Wiley and Sons, New York, New York, 1982. Neter, J., W. Wasserman, and M. H. Kutner, Applied Linear Regression Models

Linear regression in astronomy. II

NASA Technical Reports Server (NTRS)

Feigelson, Eric D.; Babu, Gutti J.

1992-01-01

A wide variety of least-squares linear regression procedures used in observational astronomy, particularly investigations of the cosmic distance scale, are presented and discussed. The classes of linear models considered are (1) unweighted regression lines, with bootstrap and jackknife resampling; (2) regression solutions when measurement error, in one or both variables, dominates the scatter; (3) methods to apply a calibration line to new data; (4) truncated regression models, which apply to flux-limited data sets; and (5) censored regression models, which apply when nondetections are present. For the calibration problem we develop two new procedures: a formula for the intercept offset between two parallel data sets, which propagates slope errors from one regression to the other; and a generalization of the Working-Hotelling confidence bands to nonstandard least-squares lines. They can provide improved error analysis for Faber-Jackson, Tully-Fisher, and similar cosmic distance scale relations.
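For the first class of models listed in this abstract, unweighted regression lines with bootstrap and jackknife resampling, a minimal sketch of resampling-based standard errors for the slope might look like the following. The simulated data, the number of bootstrap replicates, and the function names are assumptions for illustration and are not the procedures or code from the paper.

```python
import math
import random
import statistics

def ols_slope(x, y):
    """Slope of the unweighted least-squares line of y on x."""
    mean_x, mean_y = statistics.fmean(x), statistics.fmean(y)
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    return sxy / sxx

def bootstrap_slope_se(x, y, n_boot=500, seed=0):
    """Standard error from refitting on data pairs resampled with replacement."""
    rng = random.Random(seed)
    n = len(x)
    slopes = []
    for _ in range(n_boot):
        picks = [rng.randrange(n) for _ in range(n)]
        slopes.append(ols_slope([x[i] for i in picks], [y[i] for i in picks]))
    return statistics.stdev(slopes)

def jackknife_slope_se(x, y):
    """Standard error from refitting with each data pair left out in turn."""
    n = len(x)
    leave_one_out = [
        ols_slope(x[:i] + x[i + 1:], y[:i] + y[i + 1:]) for i in range(n)
    ]
    center = statistics.fmean(leave_one_out)
    return math.sqrt((n - 1) / n * sum((s - center) ** 2 for s in leave_one_out))

# Simulated data (an assumption): true slope 1.5 with unit Gaussian scatter.
rng = random.Random(42)
x_values = [rng.uniform(0, 10) for _ in range(60)]
y_values = [2.0 + 1.5 * xv + rng.gauss(0, 1) for xv in x_values]

slope = ols_slope(x_values, y_values)
boot_se = bootstrap_slope_se(x_values, y_values)
jack_se = jackknife_slope_se(x_values, y_values)
print("slope:", round(slope, 3))
print("bootstrap SE:", round(boot_se, 4), "jackknife SE:", round(jack_se, 4))
```

Resampling data pairs rather than residuals keeps the sketch free of assumptions about the form of the scatter, at the cost of more computation than an analytic formula.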
Prediction of clinical depression scores and detection of changes in whole-brain using resting-state functional MRI data with partial least squares regression

PubMed Central

Shimizu, Yu; Yoshimoto, Junichiro; Takamura, Masahiro; Okada, Go; Okamoto, Yasumasa; Yamawaki, Shigeto; Doya, Kenji

2017-01-01

In diagnostic applications of statistical machine learning methods to brain imaging data, common problems include data high-dimensionality and co-linearity, which often cause over-fitting and instability. To overcome these problems, we applied partial least squares (PLS) regression to resting-state functional magnetic resonance imaging (rs-fMRI) data, creating a low-dimensional representation that relates symptoms to brain activity and that predicts clinical measures. Our experimental results, based upon data from clinically depressed patients and healthy controls, demonstrated that PLS and its kernel variants provided significantly better prediction of clinical measures than ordinary linear regression. Subsequent classification using predicted clinical scores distinguished depressed patients from healthy controls with 80% accuracy. Moreover, loading vectors for latent variables enabled us to identify brain regions relevant to depression, including the default mode network, the right superior frontal gyrus, and the superior motor area. PMID:28700672
A method for nonlinear exponential regression analysis

NASA Technical Reports Server (NTRS)

Junkin, B. G.

1971-01-01

A computer-oriented technique is presented for performing a nonlinear exponential regression analysis on decay-type experimental data. The technique involves the least squares procedure wherein the nonlinear problem is linearized by expansion in a Taylor series. A linear curve fitting procedure for determining the initial nominal estimates for the unknown exponential model parameters is included as an integral part of the technique. A correction matrix was derived and then applied to the nominal estimate to produce an improved set of model parameters. The solution cycle is repeated until some predetermined criterion is satisfied.
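The procedure outlined in this abstract (initial estimates from a linear fit, linearization by a first-order Taylor expansion, a least-squares correction, and repetition until a criterion is met) can be sketched as a Gauss-Newton iteration. The single-term model y = a * exp(b * t), the log-linear starting fit, the stopping rule, and the sample data are assumptions for illustration; the report's exact model and correction matrix are not given in the abstract.

```python
import math

def initial_estimates(t, y):
    """Nominal estimates from a straight-line fit of ln(y) against t (requires y > 0)."""
    logs = [math.log(v) for v in y]
    n = len(t)
    mean_t, mean_log = sum(t) / n, sum(logs) / n
    b = sum((ti - mean_t) * (li - mean_log) for ti, li in zip(t, logs)) / sum(
        (ti - mean_t) ** 2 for ti in t
    )
    a = math.exp(mean_log - b * mean_t)
    return a, b

def gauss_newton_exponential(t, y, tol=1e-10, max_iter=50):
    """Fit y = a * exp(b * t) by repeated linearized least-squares corrections."""
    a, b = initial_estimates(t, y)
    iterations = 0
    for iterations in range(1, max_iter + 1):
        saa = sab = sbb = ga = gb = 0.0
        for ti, yi in zip(t, y):
            e = math.exp(b * ti)
            ja, jb = e, a * ti * e      # partial derivatives w.r.t. a and b
            r = yi - a * e              # residual at the current estimate
            saa += ja * ja
            sab += ja * jb
            sbb += jb * jb
            ga += ja * r
            gb += jb * r
        # Solve the 2x2 normal equations for the correction (Cramer's rule).
        det = saa * sbb - sab * sab
        da = (ga * sbb - sab * gb) / det
        db = (saa * gb - sab * ga) / det
        a, b = a + da, b + db
        if abs(da) + abs(db) < tol:
            break
    return a, b, iterations

# Made-up decay data: a = 5, b = -0.7, with a 1% alternating perturbation.
times = [0.5 * i for i in range(10)]
readings = [5.0 * math.exp(-0.7 * ti) * (1.0 + 0.01 * (-1) ** i) for i, ti in enumerate(times)]

a_hat, b_hat, n_iter = gauss_newton_exponential(times, readings)
print("a:", round(a_hat, 4), "b:", round(b_hat, 4), "iterations:", n_iter)
```

The log-linear starting fit weights small readings more heavily than the direct fit does, which is one reason the iterative correction is still useful after it.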
Cardiovascular risk from water arsenic exposure in Vietnam: Application of systematic review and meta-regression analysis in chemical health risk assessment.

PubMed

Phung, Dung; Connell, Des; Rutherford, Shannon; Chu, Cordia

2017-06-01

A systematic review (SR) and meta-analysis cannot provide the endpoint answer for a chemical risk assessment (CRA). The objective of this study was to apply SR and meta-regression (MR) analysis to address this limitation using a case study of cardiovascular risk from arsenic exposure in Vietnam. Published studies were searched in PubMed using the keywords arsenic exposure and cardiovascular diseases (CVD). Random-effects meta-regression was applied to model the linear relationship between arsenic concentration in water and risk of CVD, and the no-observable-adverse-effect level (NOAEL) was then identified from the regression function. The probabilistic risk assessment (PRA) technique was applied to characterize risk of CVD due to arsenic exposure by estimating the overlapping coefficient between dose-response and exposure distribution curves. The risks were evaluated for groundwater, treated water, and drinking water. A total of 8 high-quality studies for dose-response and 12 studies for exposure data were included in the final analyses. The results of MR suggested a NOAEL of 50 μg/L and a guideline of 5 μg/L for arsenic in water, values that are half of the NOAEL and guidelines recommended by previous studies and authorities. The results of PRA indicated that the observed exposure level with exceeding CVD risk was 52% for groundwater, 24% for treated water, and 10% for drinking water in Vietnam. The study found that systematic review and meta-regression can be considered an ideal method for chemical risk assessment because they can answer the endpoint question of a CRA. Copyright © 2017 Elsevier Ltd. All rights reserved.
A menu-driven software package of Bayesian nonparametric (and parametric) mixed models for regression analysis and density estimation.

PubMed

Karabatsos, George

2017-02-01

Most of applied statistics involves regression analysis of data. In practice, it is important to specify a regression model that has minimal assumptions which are not violated by data, to ensure that statistical inferences from the model are informative and not misleading. This paper presents a stand-alone and menu-driven software package, Bayesian Regression: Nonparametric and Parametric Models, constructed from MATLAB Compiler. Currently, this package gives the user a choice from 83 Bayesian models for data analysis. They include 47 Bayesian nonparametric (BNP) infinite-mixture regression models; 5 BNP infinite-mixture models for density estimation; and 31 normal random effects models (HLMs), including normal linear models. Each of the 78 regression models handles either a continuous, binary, or ordinal dependent variable, and can handle multi-level (grouped) data. All 83 Bayesian models can handle the analysis of weighted observations (e.g., for meta-analysis), and the analysis of left-censored, right-censored, and/or interval-censored data. Each BNP infinite-mixture model has a mixture distribution assigned one of various BNP prior distributions, including priors defined by either the Dirichlet process, Pitman-Yor process (including the normalized stable process), beta (two-parameter) process, normalized inverse-Gaussian process, geometric weights prior, dependent Dirichlet process, or the dependent infinite-probits prior. The software user can mouse-click to select a Bayesian model and perform data analysis via Markov chain Monte Carlo (MCMC) sampling. After the sampling completes, the software automatically opens text output that reports MCMC-based estimates of the model's posterior distribution and model predictive fit to the data. Additional text and/or graphical output can be generated by mouse-clicking other menu options. 
This includes output of MCMC convergence analyses, and estimates of the model's posterior predictive distribution, for selected functionals and values of covariates. The software is illustrated through the BNP regression analysis of real data.
An Analysis of COLA (Cost of Living Adjustment) Allocation within the United States Coast Guard.

DTIC Science & Technology

1983-09-01

books Applied Linear Regression [Ref. 39], and Statistical Methods in Research and Production [Ref. 40], or any other book on regression. In the event...Indexes, Master’s Thesis, Air Force Institute of Technology, Wright-Patterson AFB, 1976. 39. Weisberg, Sanford, Applied Linear Regression, Wiley, 1980. 40
Mapping urban environmental noise: a land use regression method.

PubMed

Xie, Dan; Liu, Yi; Chen, Jining

2011-09-01

Forecasting and preventing urban noise pollution are major challenges in urban environmental management. Most existing efforts, including experiment-based models, statistical models, and noise mapping, however, have limited capacity to explain the association between urban growth and corresponding noise change. Therefore, these conventional methods can hardly forecast urban noise at a given outlook of development layout. This paper, for the first time, introduces a land use regression method, which has been applied for simulating urban air quality for a decade, to construct an urban noise model (LUNOS) in Dalian Municipality, Northeast China. The LUNOS model describes noise as a dependent variable of the areas of various surrounding land uses via a regression function. The results suggest that a linear model performs better in fitting monitoring data, and there is no significant difference in the LUNOS outputs when applied to different spatial scales. As the LUNOS facilitates a better understanding of the association between land use and urban environmental noise in comparison to conventional methods, it can be regarded as a promising tool for noise prediction for planning purposes and as an aid to smart decision-making.
Quantile regression via vector generalized additive models.

PubMed

Yee, Thomas W

2004-07-30

One of the most popular methods for quantile regression is the LMS method of Cole and Green. The method naturally falls within a penalized likelihood framework, and consequently allows for considerable flexibility because all three parameters may be modelled by cubic smoothing splines. The model is also very understandable: for a given value of the covariate, the LMS method applies a Box-Cox transformation to the response in order to transform it to standard normality; to obtain the quantiles, an inverse Box-Cox transformation is applied to the quantiles of the standard normal distribution. The purposes of this article are three-fold. Firstly, LMS quantile regression is presented within the framework of the class of vector generalized additive models. This confers a number of advantages such as a unifying theory and estimation process. Secondly, a new LMS method based on the Yeo-Johnson transformation is proposed, which has the advantage that the response is not restricted to be positive. Lastly, this paper describes a software implementation of three LMS quantile regression methods in the S language. This includes the LMS-Yeo-Johnson method, which is estimated efficiently by a new numerical integration scheme. The LMS-Yeo-Johnson method is illustrated by way of a large cross-sectional data set from a New Zealand working population. Copyright 2004 John Wiley & Sons, Ltd.
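The Box-Cox step described in this abstract can be sketched for a single covariate value. The sketch assumes the three LMS parameters (here called lam, mu and sigma) have already been estimated at that covariate value; the function names and the numeric values are hypothetical, and the smoothing-spline or VGAM fitting of the parameters is not shown.

```python
import math
from statistics import NormalDist

def lms_z_score(y, lam, mu, sigma):
    """Box-Cox transform of a positive response y to an approximately standard normal score."""
    if lam == 0.0:
        return math.log(y / mu) / sigma
    return ((y / mu) ** lam - 1.0) / (lam * sigma)

def lms_quantile(p, lam, mu, sigma):
    """Inverse Box-Cox transform applied to the p-th standard normal quantile."""
    z = NormalDist().inv_cdf(p)
    if lam == 0.0:
        return mu * math.exp(sigma * z)
    return mu * (1.0 + lam * sigma * z) ** (1.0 / lam)

if __name__ == "__main__":
    # Hypothetical parameter values at one covariate value (not taken from the paper)
    lam, mu, sigma = -1.2, 16.0, 0.08
    for p in (0.05, 0.50, 0.95):
        print(p, lms_quantile(p, lam, mu, sigma))
```

Because the standard normal median is zero, the 50th percentile always equals mu, which is why the M curve is read as the median.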
Binding affinity toward human prion protein of some anti-prion compounds - Assessment based on QSAR modeling, molecular docking and non-parametric ranking.

PubMed

Kovačević, Strahinja; Karadžić, Milica; Podunavac-Kuzmanović, Sanja; Jevrić, Lidija

2018-01-01

The present study is based on the quantitative structure-activity relationship (QSAR) analysis of binding affinity toward human prion protein (huPrP^C) of quinacrine, pyridine dicarbonitrile, diphenylthiazole and diphenyloxazole analogs applying different linear and non-linear chemometric regression techniques, including univariate linear regression, multiple linear regression, partial least squares regression and artificial neural networks. The QSAR analysis distinguished molecular lipophilicity as an important factor that contributes to the binding affinity. Principal component analysis was used in order to reveal similarities or dissimilarities among the studied compounds. The analysis of in silico absorption, distribution, metabolism, excretion and toxicity (ADMET) parameters was conducted. The ranking of the studied analogs on the basis of their ADMET parameters was done applying the sum of ranking differences, as a relatively new chemometric method. The main aim of the study was to reveal the most important molecular features whose changes lead to the changes in the binding affinities of the studied compounds. Another point of view on the binding affinity of the most promising analogs was established by application of molecular docking analysis. The results of the molecular docking were proven to be in agreement with the experimental outcome. Copyright © 2017 Elsevier B.V. All rights reserved.
Independent variable complexity for regional regression of the flow duration curve in ungauged basins

NASA Astrophysics Data System (ADS)

Fouad, Geoffrey; Skupin, André; Hope, Allen

2016-04-01

The flow duration curve (FDC) is one of the most widely used tools to quantify streamflow. Its percentile flows are often required for water resource applications, but these values must be predicted for ungauged basins with insufficient or no streamflow data. Regional regression is a commonly used approach for predicting percentile flows that involves identifying hydrologic regions and calibrating regression models to each region. The independent variables used to describe the physiographic and climatic setting of the basins are a critical component of regional regression, yet few studies have investigated their effect on resulting predictions. In this study, the complexity of the independent variables needed for regional regression is investigated. Different levels of variable complexity are applied for a regional regression consisting of 918 basins in the US. Both the hydrologic regions and regression models are determined according to the different sets of variables, and the accuracy of resulting predictions is assessed. The different sets of variables include (1) a simple set of three variables strongly tied to the FDC (mean annual precipitation, potential evapotranspiration, and baseflow index), (2) a traditional set of variables describing the average physiographic and climatic conditions of the basins, and (3) a more complex set of variables extending the traditional variables to include statistics describing the distribution of physiographic data and temporal components of climatic data. The latter set of variables is not typically used in regional regression, and is evaluated for its potential to predict percentile flows. The simplest set of only three variables performed similarly to the other more complex sets of variables. Traditional variables used to describe climate, topography, and soil offered little more to the predictions, and the experimental set of variables describing the distribution of basin data in more detail did not improve predictions. 
These results are largely reflective of cross-correlation existing in hydrologic datasets, and highlight the limited predictive power of many traditionally used variables for regional regression. A parsimonious approach including fewer variables chosen based on their connection to streamflow may be more efficient than a data mining approach including many different variables. Future regional regression studies may benefit from having a hydrologic rationale for including different variables and attempting to create new variables related to streamflow.
Multilayer Perceptron for Robust Nonlinear Interval Regression Analysis Using Genetic Algorithms

PubMed Central

2014-01-01

On the basis of fuzzy regression, computational models in intelligence such as neural networks have the capability to be applied to nonlinear interval regression analysis for dealing with uncertain and imprecise data. When training data are not contaminated by outliers, computational models perform well by including almost all given training data in the data interval. Nevertheless, since training data are often corrupted by outliers, robust learning algorithms employed to resist outliers for interval regression analysis have been an interesting area of research. Several approaches involving computational intelligence are effective for resisting outliers, but the required parameters for these approaches are related to whether the collected data contain outliers or not. Since it seems difficult to prespecify the degree of contamination beforehand, this paper uses multilayer perceptron to construct the robust nonlinear interval regression model using the genetic algorithm. Outliers beyond or beneath the data interval will impose slight effect on the determination of data interval. Simulation results demonstrate that the proposed method performs well for contaminated datasets. PMID:25110755
Multilayer perceptron for robust nonlinear interval regression analysis using genetic algorithms.

PubMed

Hu, Yi-Chung

2014-01-01

On the basis of fuzzy regression, computational models in intelligence such as neural networks have the capability to be applied to nonlinear interval regression analysis for dealing with uncertain and imprecise data. When training data are not contaminated by outliers, computational models perform well by including almost all given training data in the data interval. Nevertheless, since training data are often corrupted by outliers, robust learning algorithms employed to resist outliers for interval regression analysis have been an interesting area of research. Several approaches involving computational intelligence are effective for resisting outliers, but the required parameters for these approaches are related to whether the collected data contain outliers or not. Since it seems difficult to prespecify the degree of contamination beforehand, this paper uses multilayer perceptron to construct the robust nonlinear interval regression model using the genetic algorithm. Outliers beyond or beneath the data interval will impose slight effect on the determination of data interval. Simulation results demonstrate that the proposed method performs well for contaminated datasets.
Genetic Programming Transforms in Linear Regression Situations

NASA Astrophysics Data System (ADS)

Castillo, Flor; Kordon, Arthur; Villa, Carlos

The chapter summarizes the use of Genetic Programming (GP) in Multiple Linear Regression (MLR) to address multicollinearity and Lack of Fit (LOF). The basis of the proposed method is applying appropriate input transforms (model respecification) that deal with these issues while preserving the information content of the original variables. The transforms are selected from symbolic regression models with optimal trade-off between accuracy of prediction and expressional complexity, generated by multiobjective Pareto-front GP. The chapter includes a comparative study of the GP-generated transforms with Ridge Regression, a variant of ordinary Multiple Linear Regression, which has been a useful and commonly employed approach for reducing multicollinearity. The advantages of GP-generated model respecification are clearly defined and demonstrated. Some recommendations for transforms selection are given as well. The application benefits of the proposed approach are illustrated with a real industrial application in one of the broadest empirical modeling areas in manufacturing - robust inferential sensors. The chapter contributes to increasing the awareness of the potential of GP in statistical model building by MLR.
Associations between dietary and lifestyle risk factors and colorectal cancer in the Scottish population.

PubMed

Theodoratou, Evropi; Farrington, Susan M; Tenesa, Albert; McNeill, Geraldine; Cetnarskyj, Roseanne; Korakakis, Emmanouil; Din, Farhat V N; Porteous, Mary E; Dunlop, Malcolm G; Campbell, Harry

2014-01-01

Colorectal cancer (CRC) accounts for 9.7% of all cancer cases and for 8% of all cancer-related deaths. Established risk factors include personal or family history of CRC as well as lifestyle and dietary factors. We investigated the relationship between CRC and demographic, lifestyle, food and nutrient risk factors through a case-control study that included 2062 patients and 2776 controls from Scotland. Forward and backward stepwise regression was applied and the stability of the models was assessed in 1000 bootstrap samples. The variables that were automatically selected to be included by the forward or backward stepwise regression and whose selection was verified by bootstrap sampling in the current study were family history, dietary energy, 'high-energy snack foods', eggs, juice, sugar-sweetened beverages and white fish (associated with an increased CRC risk) and NSAIDs, coffee and magnesium (associated with a decreased CRC risk). Application of forward and backward stepwise regression in this CRC study identified some already established as well as some novel potential risk factors. Bootstrap findings suggest that examination of the stability of regression models by bootstrap sampling is useful in the interpretation of study findings. 'High-energy snack foods' and high-energy drinks (including sugar-sweetened beverages and fruit juices) as risk factors for CRC have not been reported previously and merit further investigation as such snacks and beverages are important contributors in European and North American diets.
Separation of the long-term thermal effects from the strain measurements in the Geodynamics Laboratory of Lanzarote

NASA Astrophysics Data System (ADS)

Venedikov, A. P.; Arnoso, J.; Cai, W.; Vieira, R.; Tan, S.; Velez, E. J.

2006-01-01

A 12-year series (1992-2004) of strain measurements recorded in the Geodynamics Laboratory of Lanzarote is investigated. Through a tidal analysis the non-tidal component of the data is separated in order to use it for studying signals, useful for monitoring of the volcanic activity on the island. This component contains various perturbations of meteorological and oceanic origin, which should be eliminated in order to make the useful signals discernible. The paper is devoted to the estimation and elimination of the effect of the air temperature inside the station, which strongly dominates the strainmeter data. For solving this task, a regression model is applied, which includes a linear relation with the temperature and time-dependent polynomials. The regression includes a set of parameters that enter nonlinearly and are estimated by a properly applied Bayesian approach. The results obtained are: the regression coefficient of the strain data on temperature, equal to (-367.4 ± 0.8) × 10^-9 °C^-1, the curve of the non-tidal component reduced by the effect of the temperature, and a polynomial approximation of the reduced curve. The technique used here can be helpful to investigators in the domain of earthquake and volcano monitoring. However, the fundamental and extremely difficult problem of what kind of signals in the reduced curves might be useful in this field is not considered here.
Treatment for stage 4A retinopathy of prematurity: laser and/or ranibizumab.

PubMed

Sukgen, Emine Alyamaç; Koçluk, Yusuf

2017-02-01

Stage 4A retinopathy of prematurity (ROP) is a critical phase where retinal detachment develops, but the fovea is preserved. The present study aims to evaluate the effect of the first treatment choice (laser photocoagulation (LPC) or intravitreal ranibizumab (IVR)) applied in this critical phase on the prognosis of the disease. Records of patients diagnosed with stage 4A ROP and whose first treatment was applied in our clinic were evaluated retrospectively. All patients were referred to our clinic for the treatment of advanced ROP. Group 1 was composed of the patients who were administered LPC as first treatment, while group 2 included patients where IVR was applied as first treatment. The patients in both groups were referred to surgical treatment in the presence of progression. The present study included a total of 31 eyes in 16 patients with stage 4A ROP. Eighteen eyes of nine patients in group 1 first received LPC, and 13 eyes of seven patients in group 2 first received intravitreal ranibizumab. While anatomic outcomes of ten eyes in both groups were favorable, eight eyes in group 1 and three eyes in group 2 displayed progression and were referred to vitreoretinal surgery. Laser and/or IVR treatment may be effective as a non-surgical treatment for stage 4A ROP. In particular, stage 4A ROP involving up to 6 clock hours can regress without surgical treatment. However, in stage 4A with involvement wider than 6 clock hours, non-surgical regression is difficult. Prospective controlled large series studies are necessary.
Creep-Rupture Data Analysis - Engineering Application of Regression Techniques. Ph.D. Thesis - North Carolina State Univ.

NASA Technical Reports Server (NTRS)

Rummler, D. R.

1976-01-01

The results are presented of investigations to apply regression techniques to the development of methodology for creep-rupture data analysis. Regression analysis techniques are applied to the explicit description of the creep behavior of materials for space shuttle thermal protection systems. A regression analysis technique is compared with five parametric methods for analyzing three simulated and twenty real data sets, and a computer program for the evaluation of creep-rupture data is presented.
A general framework for the use of logistic regression models in meta-analysis.

PubMed

Simmonds, Mark C; Higgins, Julian Pt

2016-12-01

Where individual participant data are available for every randomised trial in a meta-analysis of dichotomous event outcomes, "one-stage" random-effects logistic regression models have been proposed as a way to analyse these data. Such models can also be used even when individual participant data are not available and we have only summary contingency table data. One benefit of this one-stage regression model over conventional meta-analysis methods is that it maximises the correct binomial likelihood for the data and so does not require the common assumption that effect estimates are normally distributed. A second benefit of using this model is that it may be applied, with only minor modification, in a range of meta-analytic scenarios, including meta-regression, network meta-analyses and meta-analyses of diagnostic test accuracy. This single model can potentially replace the variety of often complex methods used in these areas. This paper considers, with a range of meta-analysis examples, how random-effects logistic regression models may be used in a number of different types of meta-analyses. This one-stage approach is compared with widely used meta-analysis methods including Bayesian network meta-analysis and the bivariate and hierarchical summary receiver operating characteristic (ROC) models for meta-analyses of diagnostic test accuracy. © The Author(s) 2014.
Precision Interval Estimation of the Response Surface by Means of an Integrated Algorithm of Neural Network and Linear Regression

NASA Technical Reports Server (NTRS)

Lo, Ching F.

1999-01-01

The integration of Radial Basis Function Networks and Back Propagation Neural Networks with Multiple Linear Regression has been accomplished to map nonlinear response surfaces over a wide range of independent variables in the process of the Modern Design of Experiments. The integrated method is capable of estimating the precision intervals, including confidence and prediction intervals. The power of the innovative method has been demonstrated by applying it to a set of wind tunnel test data in the construction of a response surface and the estimation of precision intervals.
Differential item functioning analysis with ordinal logistic regression techniques. DIFdetect and difwithpar.

PubMed

Crane, Paul K; Gibbons, Laura E; Jolley, Lance; van Belle, Gerald

2006-11-01

We present an ordinal logistic regression model for identification of items with differential item functioning (DIF) and apply this model to a Mini-Mental State Examination (MMSE) dataset. We employ item response theory ability estimation in our models. Three nested ordinal logistic regression models are applied to each item. Model testing begins with examination of the statistical significance of the interaction term between ability and the group indicator, consistent with nonuniform DIF. Then we turn our attention to the coefficient of the ability term in models with and without the group term. If including the group term has a marked effect on that coefficient, we declare that it has uniform DIF. We examined DIF related to language of test administration in addition to self-reported race, Hispanic ethnicity, age, years of education, and sex. We used PARSCALE for IRT analyses and STATA for ordinal logistic regression approaches. We used an iterative technique for adjusting IRT ability estimates on the basis of DIF findings. Five items were found to have DIF related to language. These same items also had DIF related to other covariates. The ordinal logistic regression approach to DIF detection, when combined with IRT ability estimates, provides a reasonable alternative for DIF detection. There appear to be several items with significant DIF related to language of test administration in the MMSE. More attention needs to be paid to the specific criteria used to determine whether an item has DIF, not just the technique used to identify DIF.

Comparison of methods for the analysis of relatively simple mediation models.

PubMed

Rijnhart, Judith J M; Twisk, Jos W R; Chinapaw, Mai J M; de Boer, Michiel R; Heymans, Martijn W

2017-09-01

Statistical mediation analysis is an often used method in trials, to unravel the pathways underlying the effect of an intervention on a particular outcome variable. Throughout the years, several methods have been proposed, such as ordinary least squares (OLS) regression, structural equation modeling (SEM), and the potential outcomes framework. Most applied researchers do not know that these methods are mathematically equivalent when applied to mediation models with a continuous mediator and outcome variable. Therefore, the aim of this paper was to demonstrate the similarities between OLS regression, SEM, and the potential outcomes framework in three mediation models: 1) a crude model, 2) a confounder-adjusted model, and 3) a model with an interaction term for exposure-mediator interaction. We performed a secondary data analysis of a randomized controlled trial that included 546 schoolchildren. In our data example, the mediator and outcome variable were both continuous. We compared the estimates of the total, direct and indirect effects, proportion mediated, and 95% confidence intervals (CIs) for the indirect effect across OLS regression, SEM, and the potential outcomes framework. OLS regression, SEM, and the potential outcomes framework yielded the same effect estimates in the crude mediation model, the confounder-adjusted mediation model, and the mediation model with an interaction term for exposure-mediator interaction. Since OLS regression, SEM, and the potential outcomes framework yield the same results in three mediation models with a continuous mediator and outcome variable, researchers can continue using the method that is most convenient to them.
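For the crude model with a continuous mediator and outcome, the OLS side of this comparison can be sketched with closed-form least squares. The data below are made up for illustration, the function names are hypothetical, and only the OLS product-of-coefficients decomposition is shown (not SEM or the potential outcomes estimators, and no confidence intervals).

```python
def centered_cross_sum(u, v):
    """Sum of products of deviations from the means of u and v."""
    n = len(u)
    u_mean = sum(u) / n
    v_mean = sum(v) / n
    return sum((ui - u_mean) * (vi - v_mean) for ui, vi in zip(u, v))

def mediation_effects(x, m, y):
    """OLS estimates for the crude single-mediator model.

    a: slope of M on X; total: slope of Y on X;
    direct and b: slopes of Y on X and M in the joint regression.
    """
    sxx = centered_cross_sum(x, x)
    smm = centered_cross_sum(m, m)
    sxm = centered_cross_sum(x, m)
    sxy = centered_cross_sum(x, y)
    smy = centered_cross_sum(m, y)
    a = sxm / sxx
    total = sxy / sxx
    det = sxx * smm - sxm ** 2
    direct = (sxy * smm - smy * sxm) / det
    b = (smy * sxx - sxy * sxm) / det
    indirect = a * b
    return {"a": a, "b": b, "total": total, "direct": direct,
            "indirect": indirect, "proportion_mediated": indirect / total}

if __name__ == "__main__":
    # Made-up data: x is a 0/1 intervention indicator
    x = [0, 0, 0, 0, 1, 1, 1, 1]
    m = [2.1, 1.9, 2.4, 2.0, 3.0, 2.8, 3.3, 3.1]
    y = [5.0, 4.7, 5.6, 5.1, 6.4, 6.0, 6.9, 6.8]
    print(mediation_effects(x, m, y))
```

In this linear setting the total effect equals the direct effect plus a*b exactly, which is one way to see why the different frameworks report the same point estimates here.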
Melanin and blood concentration in human skin studied by multiple regression analysis: experiments

NASA Astrophysics Data System (ADS)

Shimada, M.; Yamada, Y.; Itoh, M.; Yatagai, T.

2001-09-01

Knowledge of the mechanism of human skin colour and measurement of melanin and blood concentration in human skin are needed in the medical and cosmetic fields. The absorbance spectrum from reflectance at the visible wavelength of human skin increases under several conditions such as a sunburn or scalding. The change of the absorbance spectrum from reflectance including the scattering effect does not correspond to the molar absorption spectrum of melanin and blood. The modified Beer-Lambert law is applied to the change in the absorbance spectrum from reflectance of human skin as the change in melanin and blood is assumed to be small. The concentration of melanin and blood was estimated from the absorbance spectrum reflectance of human skin using multiple regression analysis. Estimated concentrations were compared with the measured one in a phantom experiment and this method was applied to in vivo skin.
Further Insight and Additional Inference Methods for Polynomial Regression Applied to the Analysis of Congruence

ERIC Educational Resources Information Center

Cohen, Ayala; Nahum-Shani, Inbal; Doveh, Etti

2010-01-01

In their seminal paper, Edwards and Parry (1993) presented the polynomial regression as a better alternative to applying difference score in the study of congruence. Although this method is increasingly applied in congruence research, its complexity relative to other methods for assessing congruence (e.g., difference score methods) was one of the…
Trans-dimensional joint inversion of seabed scattering and reflection data.

PubMed

Steininger, Gavin; Dettmer, Jan; Dosso, Stan E; Holland, Charles W

2013-03-01

This paper examines joint inversion of acoustic scattering and reflection data to resolve seabed interface roughness parameters (spectral strength, exponent, and cutoff) and geoacoustic profiles. Trans-dimensional (trans-D) Bayesian sampling is applied with both the number of sediment layers and the order (zeroth or first) of auto-regressive parameters in the error model treated as unknowns. A prior distribution that allows fluid sediment layers over an elastic basement in a trans-D inversion is derived and implemented. Three cases are considered: Scattering-only inversion, joint scattering and reflection inversion, and joint inversion with the trans-D auto-regressive error model. Including reflection data improves the resolution of scattering and geoacoustic parameters. The trans-D auto-regressive model further improves scattering resolution and correctly differentiates between strongly and weakly correlated residual errors.
Regression model development and computational procedures to support estimation of real-time concentrations and loads of selected constituents in two tributaries to Lake Houston near Houston, Texas, 2005-9

USGS Publications Warehouse

Lee, Michael T.; Asquith, William H.; Oden, Timothy D.

2012-01-01

In December 2005, the U.S. Geological Survey (USGS), in cooperation with the City of Houston, Texas, began collecting discrete water-quality samples for nutrients, total organic carbon, bacteria (Escherichia coli and total coliform), atrazine, and suspended sediment at two USGS streamflow-gaging stations that represent watersheds contributing to Lake Houston (08068500 Spring Creek near Spring, Tex., and 08070200 East Fork San Jacinto River near New Caney, Tex.). Data from the discrete water-quality samples collected during 2005–9, in conjunction with continuously monitored real-time data that included streamflow and other physical water-quality properties (specific conductance, pH, water temperature, turbidity, and dissolved oxygen), were used to develop regression models for the estimation of concentrations of water-quality constituents of substantial source watersheds to Lake Houston. The potential explanatory variables included discharge (streamflow), specific conductance, pH, water temperature, turbidity, dissolved oxygen, and time (to account for seasonal variations inherent in some water-quality data). The response variables (the selected constituents) at each site were nitrite plus nitrate nitrogen, total phosphorus, total organic carbon, E. coli, atrazine, and suspended sediment. The explanatory variables provide easily measured quantities to serve as potential surrogate variables to estimate concentrations of the selected constituents through statistical regression. Statistical regression also facilitates accompanying estimates of uncertainty in the form of prediction intervals. Each regression model potentially can be used to estimate concentrations of a given constituent in real time. Among other regression diagnostics, the diagnostics used as indicators of general model reliability and reported herein include the adjusted R-squared, the residual standard error, residual plots, and p-values. 
Adjusted R-squared values for the Spring Creek models ranged from 0.582 to 0.922 (dimensionless). The residual standard errors ranged from 0.073 to 0.447 (base-10 logarithm). Adjusted R-squared values for the East Fork San Jacinto River models ranged from 0.253 to 0.853 (dimensionless). The residual standard errors ranged from 0.076 to 0.388 (base-10 logarithm). In conjunction with estimated concentrations, constituent loads can be estimated by multiplying the estimated concentration by the corresponding streamflow and by applying the appropriate conversion factor. The regression models presented in this report are site specific, that is, they are specific to the Spring Creek and East Fork San Jacinto River streamflow-gaging stations; however, the general methods that were developed and documented could be applied to most perennial streams for the purpose of estimating real-time water quality data.
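The load calculation mentioned at the end of this abstract (concentration times streamflow times a conversion factor) can be sketched as below. The units are an assumption for illustration (concentration in mg/L, streamflow in cubic feet per second, load in kg/day); the report's own units and any retransformation from base-10 logarithms are not reproduced here.

```python
LITERS_PER_CUBIC_FOOT = 28.316846592
SECONDS_PER_DAY = 86400
KILOGRAMS_PER_MILLIGRAM = 1e-6

# (mg/L) * (ft^3/s) -> kg/day; works out to roughly 2.447
CONVERSION_FACTOR = LITERS_PER_CUBIC_FOOT * SECONDS_PER_DAY * KILOGRAMS_PER_MILLIGRAM

def load_kg_per_day(concentration_mg_per_l, streamflow_ft3_per_s):
    """Constituent load from an estimated concentration and the matching streamflow."""
    return concentration_mg_per_l * streamflow_ft3_per_s * CONVERSION_FACTOR

if __name__ == "__main__":
    # Hypothetical values: 2 mg/L at 100 ft^3/s
    print(load_kg_per_day(2.0, 100.0))
```

When the concentration comes from a model fitted in log space, back-transforming the prediction can introduce bias, so a bias correction may be needed before computing the load.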
PSHREG: A SAS macro for proportional and nonproportional subdistribution hazards regression

PubMed Central

Kohl, Maria; Plischke, Max; Leffondré, Karen; Heinze, Georg

2015-01-01

We present a new SAS macro %pshreg that can be used to fit a proportional subdistribution hazards model for survival data subject to competing risks. Our macro first modifies the input data set appropriately and then applies SAS's standard Cox regression procedure, PROC PHREG, using weights and counting-process style of specifying survival times to the modified data set. The modified data set can also be used to estimate cumulative incidence curves for the event of interest. The application of PROC PHREG has several advantages, e.g., it directly enables the user to apply the Firth correction, which has been proposed as a solution to the problem of undefined (infinite) maximum likelihood estimates in Cox regression, frequently encountered in small sample analyses. Deviation from proportional subdistribution hazards can be detected by both inspecting Schoenfeld-type residuals and testing correlation of these residuals with time, or by including interactions of covariates with functions of time. We illustrate application of these extended methods for competing risk regression using our macro, which is freely available at: http://cemsiis.meduniwien.ac.at/en/kb/science-research/software/statistical-software/pshreg, by means of analysis of a real chronic kidney disease study. We discuss differences in features and capabilities of %pshreg and the recent (January 2014) SAS PROC PHREG implementation of proportional subdistribution hazards modelling. PMID:25572709
Incorporating Measurement Error from Modeled Air Pollution Exposures into Epidemiological Analyses.

PubMed

Samoli, Evangelia; Butland, Barbara K

2017-12-01

Outdoor air pollution exposures used in epidemiological studies are commonly predicted from spatiotemporal models incorporating limited measurements, temporal factors, geographic information system variables, and/or satellite data. Measurement error in these exposure estimates leads to imprecise estimation of health effects and their standard errors. We reviewed methods for measurement error correction that have been applied in epidemiological studies that use model-derived air pollution data. We identified seven cohort studies and one panel study that have employed measurement error correction methods. These methods included regression calibration, risk set regression calibration, regression calibration with instrumental variables, the simulation extrapolation approach (SIMEX), and methods under the non-parametric or parameter bootstrap. Corrections resulted in small increases in the absolute magnitude of the health effect estimate and its standard error under most scenarios. Limited application of measurement error correction methods in air pollution studies may be attributed to the absence of exposure validation data and the methodological complexity of the proposed methods. Future epidemiological studies should consider in their design phase the requirements for the measurement error correction method to be later applied, while methodological advances are needed under the multi-pollutants setting.
Fast Detection of Copper Content in Rice by Laser-Induced Breakdown Spectroscopy with Uni- and Multivariate Analysis.

PubMed

Liu, Fei; Ye, Lanhan; Peng, Jiyu; Song, Kunlin; Shen, Tingting; Zhang, Chu; He, Yong

2018-02-27

Fast detection of heavy metals is very important for ensuring the quality and safety of crops. Laser-induced breakdown spectroscopy (LIBS), coupled with uni- and multivariate analysis, was applied for quantitative analysis of copper in three kinds of rice (Jiangsu rice, regular rice, and Simiao rice). For univariate analysis, three pre-processing methods were applied to reduce fluctuations, including background normalization, the internal standard method, and the standard normal variate (SNV). Linear regression models showed a strong correlation between spectral intensity and Cu content, with an R² of more than 0.97. The limit of detection (LOD) was around 5 ppm, lower than the tolerance limit of copper in foods. For multivariate analysis, partial least squares regression (PLSR) showed its advantage in extracting effective information for prediction, and its sensitivity reached 1.95 ppm, while support vector machine regression (SVMR) performed better in both calibration and prediction sets, where Rc² and Rp² reached 0.9979 and 0.9879, respectively. This study showed that LIBS could be considered as a constructive tool for the quantification of copper contamination in rice.
Fast Detection of Copper Content in Rice by Laser-Induced Breakdown Spectroscopy with Uni- and Multivariate Analysis

PubMed Central

Ye, Lanhan; Song, Kunlin; Shen, Tingting

2018-01-01

Fast detection of heavy metals is very important for ensuring the quality and safety of crops. Laser-induced breakdown spectroscopy (LIBS), coupled with uni- and multivariate analysis, was applied for quantitative analysis of copper in three kinds of rice (Jiangsu rice, regular rice, and Simiao rice). For univariate analysis, three pre-processing methods were applied to reduce fluctuations, including background normalization, the internal standard method, and the standard normal variate (SNV). Linear regression models showed a strong correlation between spectral intensity and Cu content, with an R² of more than 0.97. The limit of detection (LOD) was around 5 ppm, lower than the tolerance limit of copper in foods. For multivariate analysis, partial least squares regression (PLSR) showed its advantage in extracting effective information for prediction, and its sensitivity reached 1.95 ppm, while support vector machine regression (SVMR) performed better in both calibration and prediction sets, where Rc² and Rp² reached 0.9979 and 0.9879, respectively. This study showed that LIBS could be considered as a constructive tool for the quantification of copper contamination in rice. PMID:29495445
An empirical model for estimating annual consumption by freshwater fish populations

USGS Publications Warehouse

Liao, H.; Pierce, C.L.; Larscheid, J.G.

2005-01-01

Population consumption is an important process linking predator populations to their prey resources. Simple tools are needed to enable fisheries managers to estimate population consumption. We assembled 74 individual estimates of annual consumption by freshwater fish populations and their mean annual population size, 41 of which also included estimates of mean annual biomass. The data set included 14 freshwater fish species from 10 different bodies of water. From this data set we developed two simple linear regression models predicting annual population consumption. Log-transformed population size explained 94% of the variation in log-transformed annual population consumption. Log-transformed biomass explained 98% of the variation in log-transformed annual population consumption. We quantified the accuracy of our regressions and three alternative consumption models as the mean percent difference from observed (bioenergetics-derived) estimates in a test data set. Predictions from our population-size regression matched observed consumption estimates poorly (mean percent difference = 222%). Predictions from our biomass regression matched observed consumption reasonably well (mean percent difference = 24%). The biomass regression was superior to an alternative model, similar in complexity, and comparable to two alternative models that were more complex and difficult to apply. Our biomass regression model, log10(consumption) = 0.5442 + 0.9962 × log10(biomass), will be a useful tool for fishery managers, enabling them to make reasonably accurate annual population consumption predictions from mean annual biomass estimates. © Copyright by the American Fisheries Society 2005.
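The biomass regression quoted in this abstract can be applied with a few lines of code. The sketch below only back-transforms the fitted equation; the function name is hypothetical, and the abstract does not state the units of biomass or consumption, so inputs and outputs are assumed to be in whatever units the original study used.

```python
import math

# Coefficients quoted in the abstract.
INTERCEPT = 0.5442
SLOPE = 0.9962


def predict_annual_consumption(mean_annual_biomass):
    """Back-transform log10(consumption) = 0.5442 + 0.9962 * log10(biomass).

    Hypothetical helper; units follow the original study (not stated here).
    """
    if mean_annual_biomass <= 0:
        raise ValueError("biomass must be positive to take its logarithm")
    log_consumption = INTERCEPT + SLOPE * math.log10(mean_annual_biomass)
    return 10 ** log_consumption
```

Since the slope is close to 1, predicted consumption under this equation is nearly proportional to biomass.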
Cox regression analysis with missing covariates via nonparametric multiple imputation.

PubMed

Hsu, Chiu-Hsieh; Yu, Mandi

2018-01-01

We consider the situation of estimating Cox regression in which some covariates are subject to missingness, and there exists additional information (including observed event time, censoring indicator and fully observed covariates) which may be predictive of the missing covariates. We propose to use two working regression models: one for predicting the missing covariates and the other for predicting the missing probabilities. For each missing covariate observation, these two working models are used to define a nearest neighbor imputing set. This set is then used to non-parametrically impute covariate values for the missing observation. Upon the completion of imputation, Cox regression is performed on the multiply imputed datasets to estimate the regression coefficients. In a simulation study, we compare the nonparametric multiple imputation approach with the augmented inverse probability weighted (AIPW) method, which directly incorporates the two working models into estimation of Cox regression, and the predictive mean matching imputation (PMM) method. We show that all approaches can reduce bias due to a non-ignorable missingness mechanism. The proposed nonparametric imputation method is robust to misspecification of either one of the two working models and robust to misspecification of the link function of the two working models. In contrast, the PMM method is sensitive to misspecification of the covariates included in imputation. The AIPW method is sensitive to the selection probability. We apply the approaches to a breast cancer dataset from the Surveillance, Epidemiology and End Results (SEER) Program.
Moderation analysis using a two-level regression model.

PubMed

Yuan, Ke-Hai; Cheng, Ying; Maxwell, Scott

2014-10-01

Moderation analysis is widely used in social and behavioral research. The most commonly used model for moderation analysis is moderated multiple regression (MMR) in which the explanatory variables of the regression model include product terms, and the model is typically estimated by least squares (LS). This paper argues for a two-level regression model in which the regression coefficients of a criterion variable on predictors are further regressed on moderator variables. An algorithm for estimating the parameters of the two-level model by normal-distribution-based maximum likelihood (NML) is developed. Formulas for the standard errors (SEs) of the parameter estimates are provided and studied. Results indicate that, when heteroscedasticity exists, NML with the two-level model gives more efficient and more accurate parameter estimates than the LS analysis of the MMR model. When error variances are homoscedastic, NML with the two-level model leads to essentially the same results as LS with the MMR model. Most importantly, the two-level regression model permits estimating the percentage of variance of each regression coefficient that is due to moderator variables. When applied to data from General Social Surveys 1991, NML with the two-level model identified a significant moderation effect of race on the regression of job prestige on years of education while LS with the MMR model did not. An R package is also developed and documented to facilitate the application of the two-level model.
Screening for ketosis using multiple logistic regression based on milk yield and composition.

PubMed

Kayano, Mitsunori; Kataoka, Tomoko

2015-11-01

Multiple logistic regression was applied to milk yield and composition data for 632 records of healthy cows and 61 records of ketotic cows in Hokkaido, Japan. The purpose was to diagnose ketosis based on milk yield and composition, simultaneously. The cows were divided into two groups: (1) multiparous, including 314 healthy cows and 45 ketotic cows and (2) primiparous, including 318 healthy cows and 16 ketotic cows, since nutritional status, milk yield and composition are affected by parity. Multiple logistic regression was applied to these groups separately. For multiparous cows, milk yield (kg/day/cow) and protein-to-fat (P/F) ratio in milk were significant factors (P<0.05) for the diagnosis of ketosis. For primiparous cows, lactose content (%), solid not fat (SNF) content (%) and milk urea nitrogen (MUN) content (mg/dl) were significantly associated with ketosis (P<0.01). A diagnostic rule was constructed for each group of cows: (1) 9.978 × P/F ratio + 0.085 × milk yield <10 and (2) 2.327 × SNF - 2.703 × lactose + 0.225 × MUN <10. The sensitivity, specificity and the area under the curve (AUC) of the diagnostic rules were (1) 0.800, 0.729 and 0.811; (2) 0.813, 0.730 and 0.787, respectively. The P/F ratio, which is a widely used measure of ketosis, provided the sensitivity, specificity and AUC values of (1) 0.711, 0.726 and 0.781; and (2) 0.678, 0.767 and 0.738, respectively.
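The two diagnostic rules quoted in this abstract can be written as simple predicates. This is only a sketch: the function and parameter names are hypothetical, and the abstract does not say explicitly which side of the inequality indicates ketosis, so the functions merely report whether the quoted inequality holds.

```python
def multiparous_rule(pf_ratio, milk_yield_kg_per_day):
    """Rule (1): True when 9.978 * P/F ratio + 0.085 * milk yield < 10."""
    return 9.978 * pf_ratio + 0.085 * milk_yield_kg_per_day < 10


def primiparous_rule(snf_pct, lactose_pct, mun_mg_per_dl):
    """Rule (2): True when 2.327 * SNF - 2.703 * lactose + 0.225 * MUN < 10."""
    score = 2.327 * snf_pct - 2.703 * lactose_pct + 0.225 * mun_mg_per_dl
    return score < 10
```

For example, with made-up inputs, `multiparous_rule(0.7, 30)` evaluates the score 9.978 × 0.7 + 0.085 × 30 against the cutoff of 10.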
Inverse models: A necessary next step in ground-water modeling

USGS Publications Warehouse

Poeter, E.P.; Hill, M.C.

1997-01-01

Inverse models using, for example, nonlinear least-squares regression, provide capabilities that help modelers take full advantage of the insight available from ground-water models. However, lack of information about the requirements and benefits of inverse models is an obstacle to their widespread use. This paper presents a simple ground-water flow problem to illustrate the requirements and benefits of the nonlinear least-squares regression method of inverse modeling and discusses how these attributes apply to field problems. The benefits of inverse modeling include: (1) expedited determination of best fit parameter values; (2) quantification of the (a) quality of calibration, (b) data shortcomings and needs, and (c) confidence limits on parameter estimates and predictions; and (3) identification of issues that are easily overlooked during nonautomated calibration.
Meta-regression approximations to reduce publication selection bias.

PubMed

Stanley, T D; Doucouliagos, Hristos

2014-03-01

Publication selection bias is a serious challenge to the integrity of all empirical sciences. We derive meta-regression approximations to reduce this bias. Our approach employs Taylor polynomial approximations to the conditional mean of a truncated distribution. A quadratic approximation without a linear term, precision-effect estimate with standard error (PEESE), is shown to have the smallest bias and mean squared error in most cases and to outperform conventional meta-analysis estimators, often by a great deal. Monte Carlo simulations also demonstrate how a new hybrid estimator that conditionally combines PEESE and the Egger regression intercept can provide a practical solution to publication selection bias. PEESE is easily expanded to accommodate systematic heterogeneity along with complex and differential publication selection bias that is related to moderator variables. By providing an intuitive reason for these approximations, we can also explain why the Egger regression works so well and when it does not. These meta-regression methods are applied to several policy-relevant areas of research including antidepressant effectiveness, the value of a statistical life, the minimum wage, and nicotine replacement therapy. Copyright © 2013 John Wiley & Sons, Ltd.
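The PEESE idea described in this abstract (a quadratic term in the standard error, with no linear term) can be sketched as a weighted least-squares fit. The specific form used below, effect_i = b0 + b1 · SE_i² with weights 1/SE_i², is how PEESE is commonly written and is an assumption here, since the abstract does not spell out the weighting; the function name is hypothetical.

```python
def peese_estimate(effects, standard_errors):
    """Weighted least squares for effect_i = b0 + b1 * SE_i**2.

    Weights are 1 / SE_i**2 (assumed). Returns (b0, b1); the intercept b0
    is read as the selection-corrected effect estimate.
    """
    x = [se ** 2 for se in standard_errors]
    w = [1.0 / xi for xi in x]
    sw = sum(w)
    x_bar = sum(wi * xi for wi, xi in zip(w, x)) / sw
    y_bar = sum(wi * yi for wi, yi in zip(w, effects)) / sw
    sxx = sum(wi * (xi - x_bar) ** 2 for wi, xi in zip(w, x))
    sxy = sum(
        wi * (xi - x_bar) * (yi - y_bar)
        for wi, xi, yi in zip(w, x, effects)
    )
    b1 = sxy / sxx
    b0 = y_bar - b1 * x_bar
    return b0, b1
```

The inputs would be the reported effect sizes and their standard errors from the studies in a meta-analysis; at least two distinct standard errors are needed for the slope to be defined.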
Measuring missing heritability: Inferring the contribution of common variants

PubMed Central

Golan, David; Lander, Eric S.; Rosset, Saharon

2014-01-01

Genome-wide association studies (GWASs), also called common variant association studies (CVASs), have uncovered thousands of genetic variants associated with hundreds of diseases. However, the variants that reach statistical significance typically explain only a small fraction of the heritability. One explanation for the “missing heritability” is that there are many additional disease-associated common variants whose effects are too small to detect with current sample sizes. It therefore is useful to have methods to quantify the heritability due to common variation, without having to identify all causal variants. Recent studies applied restricted maximum likelihood (REML) estimation to case–control studies for diseases. Here, we show that REML considerably underestimates the fraction of heritability due to common variation in this setting. The degree of underestimation increases with the rarity of disease, the heritability of the disease, and the size of the sample. Instead, we develop a general framework for heritability estimation, called phenotype correlation–genotype correlation (PCGC) regression, which generalizes the well-known Haseman–Elston regression method. We show that PCGC regression yields unbiased estimates. Applying PCGC regression to six diseases, we estimate the proportion of the phenotypic variance due to common variants to range from 25% to 56% and the proportion of heritability due to common variants from 41% to 68% (mean 60%). These results suggest that common variants may explain at least half the heritability for many diseases. PCGC regression also is readily applicable to other settings, including analyzing extreme-phenotype studies and adjusting for covariates such as sex, age, and population structure. PMID:25422463
A Poisson regression approach to model monthly hail occurrence in Northern Switzerland using large-scale environmental variables

NASA Astrophysics Data System (ADS)

Madonna, Erica; Ginsbourger, David; Martius, Olivia

2018-05-01

In Switzerland, hail regularly causes substantial damage to agriculture, cars and infrastructure; however, little is known about its long-term variability. To study the variability, the monthly number of days with hail in northern Switzerland is modeled in a regression framework using large-scale predictors derived from ERA-Interim reanalysis. The model is developed and verified using radar-based hail observations for the extended summer season (April-September) in the period 2002-2014. The seasonality of hail is explicitly modeled with a categorical predictor (month) and monthly anomalies of several large-scale predictors are used to capture the year-to-year variability. Several regression models are applied and their performance tested with respect to standard scores and cross-validation. The chosen model includes four predictors: the monthly anomaly of the two-meter temperature, the monthly anomaly of the logarithm of the convective available potential energy (CAPE), the monthly anomaly of the wind shear and the month. This model captures the intra-annual variability well and slightly underestimates its inter-annual variability. The regression model is applied to the reanalysis data back in time to 1980. The resulting hail day time series shows an increase in the number of hail days per month, which is (in the model) related to an increase in temperature and CAPE. The trend corresponds to approximately 0.5 days per month per decade. The results of the regression model have been compared to two independent data sets. All data sets agree on the sign of the trend, but the trend is weaker in the other data sets.
Estimated Perennial Streams of Idaho and Related Geospatial Datasets

USGS Publications Warehouse

Rea, Alan; Skinner, Kenneth D.

2009-01-01

The perennial or intermittent status of a stream has bearing on many regulatory requirements. Because of changing technologies over time, cartographic representation of perennial/intermittent status of streams on U.S. Geological Survey (USGS) topographic maps is not always accurate and (or) consistent from one map sheet to another. Idaho Administrative Code defines an intermittent stream as one having a 7-day, 2-year low flow (7Q2) less than 0.1 cubic feet per second. To establish consistency with the Idaho Administrative Code, the USGS developed regional regression equations for Idaho streams for several low-flow statistics, including 7Q2. Using these regression equations, the 7Q2 streamflow may be estimated for naturally flowing streams anywhere in Idaho to help determine perennial/intermittent status of streams. Using these equations in conjunction with a Geographic Information System (GIS) technique known as weighted flow accumulation allows for an automated and continuous estimation of 7Q2 streamflow at all points along a stream, which in turn can be used to determine if a stream is intermittent or perennial according to the Idaho Administrative Code operational definition. The selected regression equations were applied to create continuous grids of 7Q2 estimates for the eight low-flow regression regions of Idaho. By applying the 0.1 ft³/s criterion, the perennial streams have been estimated in each low-flow region. Uncertainty in the estimates is shown by identifying a 'transitional' zone, corresponding to flow estimates of 0.1 ft³/s plus and minus one standard error. Considerable additional uncertainty exists in the model of perennial streams presented in this report. The regression models provide overall estimates based on general trends within each regression region. These models do not include local factors such as a large spring or a losing reach that may greatly affect flows at any given point. Site-specific flow data, assuming a sufficient period of record, generally would be considered to represent flow conditions better at a given site than flow estimates based on regionalized regression models. The geospatial datasets of modeled perennial streams are considered a first-cut estimate, and should not be construed to override site-specific flow data.
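The classification logic described in this abstract can be illustrated with a short sketch. It assumes the standard error is supplied in the same flow units as the 7Q2 estimate (cubic feet per second); regional regression errors are often reported in other forms, so that conversion is left to the caller. The function name and return labels are hypothetical.

```python
THRESHOLD_CFS = 0.1  # 7Q2 criterion from the Idaho Administrative Code


def classify_stream(q7q2_cfs, standard_error_cfs):
    """Label a point on a stream from its estimated 7Q2 streamflow.

    Estimates within one standard error of the threshold are flagged as
    'transitional'; otherwise the 0.1 ft^3/s criterion decides.
    """
    lower = THRESHOLD_CFS - standard_error_cfs
    upper = THRESHOLD_CFS + standard_error_cfs
    if lower <= q7q2_cfs <= upper:
        return "transitional"
    return "intermittent" if q7q2_cfs < THRESHOLD_CFS else "perennial"
```

In a GIS workflow this test would be applied cell by cell to the continuous grid of 7Q2 estimates; here it is shown for a single value.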
Logistic LASSO regression for the diagnosis of breast cancer using clinical demographic data and the BI-RADS lexicon for ultrasonography.

PubMed

Kim, Sun Mi; Kim, Yongdai; Jeong, Kuhwan; Jeong, Heeyeong; Kim, Jiyoung

2018-01-01

The aim of this study was to compare the performance of image analysis for predicting breast cancer using two distinct regression models and to evaluate the usefulness of incorporating clinical and demographic data (CDD) into the image analysis in order to improve the diagnosis of breast cancer. This study included 139 solid masses from 139 patients who underwent an ultrasonography-guided core biopsy and had available CDD between June 2009 and April 2010. Three breast radiologists retrospectively reviewed 139 breast masses and described each lesion using the Breast Imaging Reporting and Data System (BI-RADS) lexicon. We applied and compared two regression methods, stepwise logistic (SL) regression and logistic least absolute shrinkage and selection operator (LASSO) regression, in which the BI-RADS descriptors and CDD were used as covariates. We investigated the performances of these regression methods and the agreement of radiologists in terms of test misclassification error and the area under the curve (AUC) of the tests. Logistic LASSO regression was superior (P<0.05) to SL regression, regardless of whether CDD was included in the covariates, in terms of test misclassification errors (0.234 vs. 0.253, without CDD; 0.196 vs. 0.258, with CDD) and AUC (0.785 vs. 0.759, without CDD; 0.873 vs. 0.735, with CDD). However, it was inferior (P<0.05) to the agreement of three radiologists in terms of test misclassification errors (0.234 vs. 0.168, without CDD; 0.196 vs. 0.088, with CDD) and the AUC without CDD (0.785 vs. 0.844, P<0.001), but was comparable to the AUC with CDD (0.873 vs. 0.880, P=0.141). Logistic LASSO regression based on BI-RADS descriptors and CDD showed better performance than SL in predicting the presence of breast cancer. The use of CDD as a supplement to the BI-RADS descriptors significantly improved the prediction of breast cancer using logistic LASSO regression.
Mixed-effects Gaussian process functional regression models with application to dose-response curve prediction.

PubMed

Shi, J Q; Wang, B; Will, E J; West, R M

2012-11-20

We propose a new semiparametric model for functional regression analysis, combining a parametric mixed-effects model with a nonparametric Gaussian process regression model, namely a mixed-effects Gaussian process functional regression model. The parametric component can provide explanatory information between the response and the covariates, whereas the nonparametric component can add nonlinearity. We can model the mean and covariance structures simultaneously, combining the information borrowed from other subjects with the information collected from each individual subject. We apply the model to dose-response curves that describe changes in the responses of subjects for differing levels of the dose of a drug or agent and have a wide application in many areas. We illustrate the method for the management of renal anaemia. An individual dose-response curve is improved when more information is included by this mechanism from the subject/patient over time, enabling a patient-specific treatment regime. Copyright © 2012 John Wiley & Sons, Ltd.

Evaluation of Penalized and Nonpenalized Methods for Disease Prediction with Large-Scale Genetic Data.

PubMed

Won, Sungho; Choi, Hosik; Park, Suyeon; Lee, Juyoung; Park, Changyi; Kwon, Sunghoon

2015-01-01

Owing to recent improvements in genotyping technology, large-scale genetic data can be utilized to identify disease susceptibility loci, and these successful findings have substantially improved our understanding of complex diseases. However, in spite of these successes, most of the genetic effects for many complex diseases were found to be very small, which has been a big hurdle to building disease prediction models. Recently, many statistical methods based on penalized regressions have been proposed to tackle the so-called "large P and small N" problem. Penalized regressions including least absolute selection and shrinkage operator (LASSO) and ridge regression limit the space of parameters, and this constraint enables the estimation of effects for a very large number of SNPs. Various extensions have been suggested, and, in this report, we compare their accuracy by applying them to several complex diseases. Our results show that penalized regressions are usually robust and provide better accuracy than the existing methods, at least for the diseases under consideration.
Regression Discontinuity in Prospective Evaluations: The Case of the FFVP Evaluation

ERIC Educational Resources Information Center

Klerman, Jacob Alex; Olsho, Lauren E. W.; Bartlett, Susan

2015-01-01

While regression discontinuity has usually been applied retrospectively to secondary data, it is even more attractive when applied prospectively. In a prospective design, data collection can be focused on cases near the discontinuity, thereby improving internal validity and substantially increasing precision. Furthermore, such prospective…
ELM: an Algorithm to Estimate the Alpha Abundance from Low-resolution Spectra

NASA Astrophysics Data System (ADS)

Bu, Yude; Zhao, Gang; Pan, Jingchang; Bharat Kumar, Yerra

2016-01-01

We have investigated a novel methodology using the extreme learning machine (ELM) algorithm to determine the α abundance of stars. Applying two methods based on the ELM algorithm—ELM+spectra and ELM+Lick indices—to the stellar spectra from the ELODIE database, we measured the α abundance with a precision better than 0.065 dex. By applying these two methods to the spectra with different signal-to-noise ratios (S/Ns) and different resolutions, we found that ELM+spectra is more robust against degraded resolution and ELM+Lick indices is more robust against variation in S/N. To further validate the performance of ELM, we applied ELM+spectra and ELM+Lick indices to SDSS spectra and estimated α abundances with a precision around 0.10 dex, which is comparable to the results given by the SEGUE Stellar Parameter Pipeline. We further applied ELM to the spectra of stars in Galactic globular clusters (M15, M13, M71) and open clusters (NGC 2420, M67, NGC 6791), and results show good agreement with previous studies (within 1σ). A comparison of the ELM with other widely used methods including support vector machine, Gaussian process regression, artificial neural networks, and linear least-squares regression shows that ELM is efficient with computational resources and more accurate than other methods.
Multiplication factor versus regression analysis in stature estimation from hand and foot dimensions.

PubMed

Krishan, Kewal; Kanchan, Tanuj; Sharma, Abhilasha

2012-05-01

Estimation of stature is an important parameter in identification of human remains in forensic examinations. The present study aims to compare the reliability and accuracy of stature estimation and to demonstrate the variability in estimated stature and actual stature using multiplication factor and regression analysis methods. The study is based on a sample of 246 subjects (123 males and 123 females) from North India aged between 17 and 20 years. Four anthropometric measurements (hand length, hand breadth, foot length and foot breadth), taken on the left side in each subject, were included in the study. Stature was measured using standard anthropometric techniques. Multiplication factors were calculated and linear regression models were derived for estimation of stature from hand and foot dimensions. Derived multiplication factors and regression formulae were applied to the hand and foot measurements in the study sample. The estimated stature from the multiplication factors and regression analysis was compared with the actual stature to find the error in estimated stature. The results indicate that the range of error in estimation of stature from the regression analysis method is less than that of the multiplication factor method, thus confirming that the regression analysis method is better than multiplication factor analysis in stature estimation. Copyright © 2012 Elsevier Ltd and Faculty of Forensic and Legal Medicine. All rights reserved.
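The two approaches compared in this abstract can be contrasted in a small sketch. It assumes the multiplication factor is the mean ratio of stature to a body dimension and that the regression is an ordinary least-squares line; the numbers are synthetic values chosen for illustration only and are not data from the study. All function names are hypothetical.

```python
from statistics import mean


def multiplication_factor(statures, dimensions):
    """Mean ratio of stature to the body dimension (assumed definition)."""
    return mean(s / d for s, d in zip(statures, dimensions))


def fit_simple_regression(statures, dimensions):
    """Ordinary least squares for stature = a + b * dimension."""
    d_bar, s_bar = mean(dimensions), mean(statures)
    sxx = sum((d - d_bar) ** 2 for d in dimensions)
    sxy = sum((d - d_bar) * (s - s_bar) for d, s in zip(dimensions, statures))
    b = sxy / sxx
    return s_bar - b * d_bar, b


def mean_abs_error(estimates, statures):
    """Average absolute difference between estimated and actual stature."""
    return mean(abs(e - s) for e, s in zip(estimates, statures))


# Synthetic values for illustration only (cm); not data from the study.
foot_length = [22.0, 23.5, 24.0, 25.5, 26.0]
stature = [155.0, 161.0, 165.0, 170.0, 174.0]

mf = multiplication_factor(stature, foot_length)
a, b = fit_simple_regression(stature, foot_length)
mf_error = mean_abs_error([mf * d for d in foot_length], stature)
reg_error = mean_abs_error([a + b * d for d in foot_length], stature)
```

The multiplication factor forces the fitted line through the origin, whereas the regression line has a free intercept, which is one plausible reason the regression method can show a smaller error range.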
Post-processing through linear regression

NASA Astrophysics Data System (ADS)

van Schaeybroeck, B.; Vannitsem, S.

2011-03-01

Various post-processing techniques are compared for both deterministic and ensemble forecasts, all based on linear regression between forecast data and observations. In order to evaluate the quality of the regression methods, three criteria are proposed, related to the effective correction of forecast error, the optimal variability of the corrected forecast and multicollinearity. The regression schemes under consideration include the ordinary least-squares (OLS) method, a new time-dependent Tikhonov regularization (TDTR) method, the total least-squares method, a new geometric-mean regression (GM), a recently introduced error-in-variables (EVMOS) method and, finally, a "best member" OLS method. The advantages and drawbacks of each method are clarified. These techniques are applied in the context of the Lorenz 63 system, whose model version is affected by both initial condition and model errors. For short forecast lead times, the number and choice of predictors play an important role. In contrast to the other techniques, GM degrades when the number of predictors increases. At intermediate lead times, linear regression is unable to provide corrections to the forecast and can sometimes degrade the performance (GM and the best member OLS with noise). At long lead times, the regression schemes (EVMOS, TDTR) which yield the correct variability and the largest correlation between ensemble error and spread should be preferred.
Statistical relations among earthquake magnitude, surface rupture length, and surface fault displacement

USGS Publications Warehouse

Bonilla, M.G.; Mark, R.K.; Lienkaemper, J.J.

1984-01-01

In order to refine correlations of surface-wave magnitude, fault rupture length at the ground surface, and fault displacement at the surface by including the uncertainties in these variables, the existing data were critically reviewed and a new data base was compiled. Earthquake magnitudes were redetermined as necessary to make them as consistent as possible with the Gutenberg methods and results, which necessarily make up much of the data base. Measurement errors were estimated for the three variables for 58 moderate to large shallow-focus earthquakes. Regression analyses were then made utilizing the estimated measurement errors. The regression analysis demonstrates that the relations among the variables magnitude, length, and displacement are stochastic in nature. The stochastic variance, introduced in part by incomplete surface expression of seismogenic faulting, variation in shear modulus, and regional factors, dominates the estimated measurement errors. Thus, it is appropriate to use ordinary least squares for the regression models, rather than regression models based upon an underlying deterministic relation with the variance resulting from measurement errors. Significant differences exist in correlations of certain combinations of length, displacement, and magnitude when events are grouped by fault type or by region, including attenuation regions delineated by Evernden and others. Subdivision of the data results in too few data for some fault types and regions, and for these only regressions using all of the data as a group are reported. Estimates of the magnitude and the standard deviation of the magnitude of a prehistoric or future earthquake associated with a fault can be made by correlating M with the logarithms of rupture length, fault displacement, or the product of length and displacement. Fault rupture area could be reliably estimated for about 20 of the events in the data set. Regression of Ms on rupture area did not result in a marked improvement over regressions that did not involve rupture area. Because no subduction-zone earthquakes are included in this study, the reported results do not apply to such zones.
Morse Code, Scrabble, and the Alphabet

ERIC Educational Resources Information Center

Richardson, Mary; Gabrosek, John; Reischman, Diann; Curtiss, Phyliss

2004-01-01

In this paper we describe an interactive activity that illustrates simple linear regression. Students collect data and analyze it using simple linear regression techniques taught in an introductory applied statistics course. The activity is extended to illustrate checks for regression assumptions and regression diagnostics taught in an…
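As a rough illustration of the technique this activity teaches, the sketch below fits a simple linear regression by ordinary least squares in plain Python and returns the residuals used in diagnostic checks. The function names `fit_line` and `residuals` and the sample values are hypothetical; they are not taken from the paper.

```python
def fit_line(xs, ys):
    """Return (intercept, slope) of the least-squares line y = a + b*x."""
    n = len(xs)
    if n < 2 or n != len(ys):
        raise ValueError("need at least two paired observations")
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    # Sum of squared deviations of x, and cross-deviations of x and y.
    sxx = sum((x - x_bar) ** 2 for x in xs)
    if sxx == 0:
        raise ValueError("x values must not all be equal")
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = y_bar - slope * x_bar
    return intercept, slope


def residuals(xs, ys, intercept, slope):
    """Observed minus fitted values, the raw material for diagnostic plots."""
    return [y - (intercept + slope * x) for x, y in zip(xs, ys)]


# Hypothetical data lying exactly on y = 1 + 2x.
a, b = fit_line([0, 1, 2, 3], [1, 3, 5, 7])
print(a, b)
```

Plotting the residuals against the fitted values is one common way to check the constant-variance and linearity assumptions that such an activity typically covers.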
Specialization Agreements in the Council for Mutual Economic Assistance

DTIC Science & Technology

1988-02-01

proportions to stabilize variance (S. Weisberg, Applied Linear Regression, 2nd ed., John Wiley & Sons, New York, 1985, p. 134). If the dependent...27, 1986, p. 3. Weisberg, S., Applied Linear Regression, 2nd ed., John Wiley & Sons, New York, 1985, p. 134. Wiles, P. J., Communist International
Radio Propagation Prediction Software for Complex Mixed Path Physical Channels

DTIC Science & Technology

2006-08-14

63 4.4.6. Applied Linear Regression Analysis in the Frequency Range 1-50 MHz 69 4.4.7. Projected Scaling to...4.4.6. Applied Linear Regression Analysis in the Frequency Range 1-50 MHz In order to construct a comprehensive numerical algorithm capable of
Estimation and Selection via Absolute Penalized Convex Minimization And Its Multistage Adaptive Applications

PubMed Central

Huang, Jian; Zhang, Cun-Hui

2013-01-01

The ℓ1-penalized method, or the Lasso, has emerged as an important tool for the analysis of large data sets. Many important results have been obtained for the Lasso in linear regression which have led to a deeper understanding of high-dimensional statistical problems. In this article, we consider a class of weighted ℓ1-penalized estimators for convex loss functions of a general form, including the generalized linear models. We study the estimation, prediction, selection and sparsity properties of the weighted ℓ1-penalized estimator in sparse, high-dimensional settings where the number of predictors p can be much larger than the sample size n. Adaptive Lasso is considered as a special case. A multistage method is developed to approximate concave regularized estimation by applying an adaptive Lasso recursively. We provide prediction and estimation oracle inequalities for single- and multi-stage estimators, a general selection consistency theorem, and an upper bound for the dimension of the Lasso estimator. Important models including the linear regression, logistic regression and log-linear models are used throughout to illustrate the applications of the general results. PMID:24348100
A land use regression model for ambient ultrafine particles in Montreal, Canada: A comparison of linear regression and a machine learning approach.

PubMed

Weichenthal, Scott; Ryswyk, Keith Van; Goldstein, Alon; Bagg, Scott; Shekkarizfard, Maryam; Hatzopoulou, Marianne

2016-04-01

Existing evidence suggests that ambient ultrafine particles (UFPs) (<0.1µm) may contribute to acute cardiorespiratory morbidity. However, few studies have examined the long-term health effects of these pollutants owing in part to a need for exposure surfaces that can be applied in large population-based studies. To address this need, we developed a land use regression model for UFPs in Montreal, Canada using mobile monitoring data collected from 414 road segments during the summer and winter months between 2011 and 2012. Two different approaches were examined for model development including standard multivariable linear regression and a machine learning approach (kernel-based regularized least squares (KRLS)) that learns the functional form of covariate impacts on ambient UFP concentrations from the data. The final models included parameters for population density, ambient temperature and wind speed, land use parameters (park space and open space), length of local roads and rail, and estimated annual average NOx emissions from traffic. The final multivariable linear regression model explained 62% of the spatial variation in ambient UFP concentrations whereas the KRLS model explained 79% of the variance. The KRLS model performed slightly better than the linear regression model when evaluated using an external dataset (R(2)=0.58 vs. 0.55) or a cross-validation procedure (R(2)=0.67 vs. 0.60). In general, our findings suggest that the KRLS approach may offer modest improvements in predictive performance compared to standard multivariable linear regression models used to estimate spatial variations in ambient UFPs. However, differences in predictive performance were not statistically significant when evaluated using the cross-validation procedure. Crown Copyright © 2015. Published by Elsevier Inc. All rights reserved.
Taxi-Out Time Prediction for Departures at Charlotte Airport Using Machine Learning Techniques

NASA Technical Reports Server (NTRS)

Lee, Hanbong; Malik, Waqar; Jung, Yoon C.

2016-01-01

Predicting the taxi-out times of departures accurately is important for improving airport efficiency and takeoff time predictability. In this paper, we attempt to apply machine learning techniques to actual traffic data at Charlotte Douglas International Airport for taxi-out time prediction. To find the key factors affecting aircraft taxi times, surface surveillance data is first analyzed. From this data analysis, several variables, including terminal concourse, spot, runway, departure fix and weight class, are selected for taxi time prediction. Then, various machine learning methods such as linear regression, support vector machines, k-nearest neighbors, random forest, and neural networks model are applied to actual flight data. Different traffic flow and weather conditions at Charlotte airport are also taken into account for more accurate prediction. The taxi-out time prediction results show that linear regression and random forest techniques can provide the most accurate prediction in terms of root-mean-square errors. We also discuss the operational complexity and uncertainties that make it difficult to predict the taxi times accurately.
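The comparison metric named in this abstract, root-mean-square error, can be sketched in a few lines of Python. The function name `rmse` and the sample taxi-out times are hypothetical and only illustrate how two sets of predictions might be ranked; they are not data from the report.

```python
import math


def rmse(actual, predicted):
    """Root-mean-square error between paired actual and predicted values."""
    if not actual or len(actual) != len(predicted):
        raise ValueError("need two non-empty sequences of equal length")
    squared = [(a - p) ** 2 for a, p in zip(actual, predicted)]
    return math.sqrt(sum(squared) / len(squared))


# Hypothetical taxi-out times in minutes and two hypothetical sets of predictions.
observed = [12.0, 15.0, 9.0, 20.0]
model_a = [11.0, 16.0, 10.0, 18.0]
model_b = [14.0, 12.0, 9.0, 25.0]
print(rmse(observed, model_a), rmse(observed, model_b))
```

The model with the smaller value is the more accurate one under this metric.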
Applying different independent component analysis algorithms and support vector regression for IT chain store sales forecasting.

PubMed

Dai, Wensheng; Wu, Jui-Yu; Lu, Chi-Jie

2014-01-01

Sales forecasting is one of the most important issues in managing information technology (IT) chain store sales since an IT chain store has many branches. Integrating feature extraction method and prediction tool, such as support vector regression (SVR), is a useful method for constructing an effective sales forecasting scheme. Independent component analysis (ICA) is a novel feature extraction technique and has been widely applied to deal with various forecasting problems. But, up to now, only the basic ICA method (i.e., temporal ICA model) was applied to sale forecasting problem. In this paper, we utilize three different ICA methods including spatial ICA (sICA), temporal ICA (tICA), and spatiotemporal ICA (stICA) to extract features from the sales data and compare their performance in sales forecasting of IT chain store. Experimental results from a real sales data show that the sales forecasting scheme by integrating stICA and SVR outperforms the comparison models in terms of forecasting error. The stICA is a promising tool for extracting effective features from branch sales data and the extracted features can improve the prediction performance of SVR for sales forecasting.
Applying Different Independent Component Analysis Algorithms and Support Vector Regression for IT Chain Store Sales Forecasting

PubMed Central

Dai, Wensheng

2014-01-01

Sales forecasting is one of the most important issues in managing information technology (IT) chain store sales since an IT chain store has many branches. Integrating feature extraction method and prediction tool, such as support vector regression (SVR), is a useful method for constructing an effective sales forecasting scheme. Independent component analysis (ICA) is a novel feature extraction technique and has been widely applied to deal with various forecasting problems. But, up to now, only the basic ICA method (i.e., temporal ICA model) was applied to sale forecasting problem. In this paper, we utilize three different ICA methods including spatial ICA (sICA), temporal ICA (tICA), and spatiotemporal ICA (stICA) to extract features from the sales data and compare their performance in sales forecasting of IT chain store. Experimental results from a real sales data show that the sales forecasting scheme by integrating stICA and SVR outperforms the comparison models in terms of forecasting error. The stICA is a promising tool for extracting effective features from branch sales data and the extracted features can improve the prediction performance of SVR for sales forecasting. PMID:25165740
Using Parametric Cost Models to Estimate Engineering and Installation Costs of Selected Electronic Communications Systems

DTIC Science & Technology

1994-09-01

Institute of Technology, Wright-Patterson AFB OH, January 1994. 4. Neter, John and others. Applied Linear Regression Models. Boston: Irwin, 1989. 5...Technology, Wright-Patterson AFB OH 5 April 1994. 29. Neter, John and others. Applied Linear Regression Models. Boston: Irwin, 1989. 30. Office of
An Evaluation of the Automated Cost Estimating Integrated Tools (ACEIT) System

DTIC Science & Technology

1989-09-01

residual and it is described as the residual divided by its standard deviation (13:App A,17). Neter, Wasserman, and Kutner, in Applied Linear Regression Models...others. Applied Linear Regression Models. Homewood IL: Irwin, 1983. 19. Raduchel, William J. "A Professional’s Perspective on User-Friendliness," Byte
Gender Gap in Mathematics and Physics in Chinese Middle Schools: A Case Study of A Beijing's District

ERIC Educational Resources Information Center

Li, Manli; Zhang, Yu; Wang, Yihan

2017-01-01

This study examines the gender gaps in mathematics and physics in Chinese middle schools. The data is from the Education Bureau management database which includes all middle school students who took high school entrance exam in a district of Beijing from 2006-2013. The ordinary least square model and quantile regression model are applied. This…
The Association between Environmental Factors and Scarlet Fever Incidence in Beijing Region: Using GIS and Spatial Regression Models

PubMed Central

Mahara, Gehendra; Wang, Chao; Yang, Kun; Chen, Sipeng; Guo, Jin; Gao, Qi; Wang, Wei; Wang, Quanyi; Guo, Xiuhua

2016-01-01

(1) Background: Evidence regarding scarlet fever and its relationship with meteorological, including air pollution factors, is not very available. This study aimed to examine the relationship between ambient air pollutants and meteorological factors with scarlet fever occurrence in Beijing, China. (2) Methods: A retrospective ecological study was carried out to distinguish the epidemic characteristics of scarlet fever incidence in Beijing districts from 2013 to 2014. Daily incidence and corresponding air pollutant and meteorological data were used to develop the model. Global Moran’s I statistic and Anselin’s local Moran’s I (LISA) were applied to detect the spatial autocorrelation (spatial dependency) and clusters of scarlet fever incidence. The spatial lag model (SLM) and spatial error model (SEM) including ordinary least squares (OLS) models were then applied to probe the association between scarlet fever incidence and meteorological including air pollution factors. (3) Results: Among the 5491 cases, more than half (62%) were male, and more than one-third (37.8%) were female, with the annual average incidence rate 14.64 per 100,000 population. Spatial autocorrelation analysis exhibited the existence of spatial dependence; therefore, we applied spatial regression models. After comparing the values of R-square, log-likelihood and the Akaike information criterion (AIC) among the three models, the OLS model (R2 = 0.0741, log likelihood = −1819.69, AIC = 3665.38), SLM (R2 = 0.0786, log likelihood = −1819.04, AIC = 3665.08) and SEM (R2 = 0.0743, log likelihood = −1819.67, AIC = 3665.36), identified that the spatial lag model (SLM) was best for model fit for the regression model. There was a positive significant association between nitrogen oxide (p = 0.027), rainfall (p = 0.036) and sunshine hour (p = 0.048), while the relative humidity (p = 0.034) had an adverse association with scarlet fever incidence in SLM. 
(4) Conclusions: Our findings indicated that meteorological, as well as air pollutant factors may increase the incidence of scarlet fever; these findings may help to guide scarlet fever control programs and targeting the intervention. PMID:27827946
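The model comparison described in this abstract can be sketched as a simple lookup over the reported fit statistics: the preferred model has the lowest AIC (and here also the highest log-likelihood and R-square). The numbers below are the ones quoted in the abstract; the dictionary layout and variable names are assumptions made for illustration.

```python
# Fit statistics as reported in the abstract for the three candidate models.
reported = {
    "OLS": {"r2": 0.0741, "loglik": -1819.69, "aic": 3665.38},
    "SLM": {"r2": 0.0786, "loglik": -1819.04, "aic": 3665.08},
    "SEM": {"r2": 0.0743, "loglik": -1819.67, "aic": 3665.36},
}

# Lower AIC is better; higher log-likelihood and R-square are better.
best_by_aic = min(reported, key=lambda name: reported[name]["aic"])
best_by_loglik = max(reported, key=lambda name: reported[name]["loglik"])
best_by_r2 = max(reported, key=lambda name: reported[name]["r2"])

print(best_by_aic, best_by_loglik, best_by_r2)
```

All three criteria point to the spatial lag model, in line with the abstract, although the differences between the models are small.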
The Association between Environmental Factors and Scarlet Fever Incidence in Beijing Region: Using GIS and Spatial Regression Models.

PubMed

Mahara, Gehendra; Wang, Chao; Yang, Kun; Chen, Sipeng; Guo, Jin; Gao, Qi; Wang, Wei; Wang, Quanyi; Guo, Xiuhua

2016-11-04

(1) Background: Evidence regarding scarlet fever and its relationship with meteorological, including air pollution factors, is not very available. This study aimed to examine the relationship between ambient air pollutants and meteorological factors with scarlet fever occurrence in Beijing, China. (2) Methods: A retrospective ecological study was carried out to distinguish the epidemic characteristics of scarlet fever incidence in Beijing districts from 2013 to 2014. Daily incidence and corresponding air pollutant and meteorological data were used to develop the model. Global Moran's I statistic and Anselin's local Moran's I (LISA) were applied to detect the spatial autocorrelation (spatial dependency) and clusters of scarlet fever incidence. The spatial lag model (SLM) and spatial error model (SEM) including ordinary least squares (OLS) models were then applied to probe the association between scarlet fever incidence and meteorological including air pollution factors. (3) Results: Among the 5491 cases, more than half (62%) were male, and more than one-third (37.8%) were female, with the annual average incidence rate 14.64 per 100,000 population. Spatial autocorrelation analysis exhibited the existence of spatial dependence; therefore, we applied spatial regression models. After comparing the values of R-square, log-likelihood and the Akaike information criterion (AIC) among the three models, the OLS model (R² = 0.0741, log likelihood = -1819.69, AIC = 3665.38), SLM (R² = 0.0786, log likelihood = -1819.04, AIC = 3665.08) and SEM (R² = 0.0743, log likelihood = -1819.67, AIC = 3665.36), identified that the spatial lag model (SLM) was best for model fit for the regression model. There was a positive significant association between nitrogen oxide ( p = 0.027), rainfall ( p = 0.036) and sunshine hour ( p = 0.048), while the relative humidity ( p = 0.034) had an adverse association with scarlet fever incidence in SLM. 
(4) Conclusions: Our findings indicated that meteorological, as well as air pollutant factors may increase the incidence of scarlet fever; these findings may help to guide scarlet fever control programs and targeting the intervention.
Applied Multiple Linear Regression: A General Research Strategy

ERIC Educational Resources Information Center

Smith, Brandon B.

1969-01-01

Illustrates some of the basic concepts and procedures for using regression analysis in experimental design, analysis of variance, analysis of covariance, and curvilinear regression. Applications to evaluation of instruction and vocational education programs are illustrated. (GR)

Linear regression crash prediction models : issues and proposed solutions.

DOT National Transportation Integrated Search

2010-05-01

The paper develops a linear regression model approach that can be applied to crash data to predict vehicle crashes. The proposed approach involves novice data aggregation to satisfy linear regression assumptions; namely error structure normality ...
Real estate value prediction using multivariate regression models

NASA Astrophysics Data System (ADS)

Manjula, R.; Jain, Shubham; Srivastava, Sharad; Rajiv Kher, Pranav

2017-11-01

The real estate market is one of the most competitive in terms of pricing, and prices tend to vary significantly based on many factors; hence it is one of the prime fields in which to apply machine learning concepts to optimize and predict prices with high accuracy. Therefore, in this paper, we present various important features to use while predicting housing prices with good accuracy. We have described regression models, using various features to achieve a lower residual sum of squares error. While using features in a regression model, some feature engineering is required for better prediction. Often a set of features (multiple regression) or polynomial regression (applying various powers of the features) is used to obtain a better model fit. Because these models are expected to be susceptible to overfitting, ridge regression is used to reduce it. This paper thus points to the best application of regression models, in addition to other techniques, to optimize the result.
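To illustrate how ridge regression damps overfitting, the following minimal sketch uses the closed form for a single feature with an unpenalized intercept, where the slope is Sxy / (Sxx + lambda). This is a deliberate simplification of the multi-feature and polynomial models the abstract mentions; the function name `ridge_fit`, the penalty values, and the data are hypothetical.

```python
def ridge_fit(xs, ys, lam):
    """One-feature ridge regression; returns (intercept, slope).

    Minimizes sum((y - a - b*x)**2) + lam * b**2, leaving the intercept
    unpenalized. With lam = 0 this reduces to ordinary least squares.
    """
    if lam < 0:
        raise ValueError("penalty must be non-negative")
    n = len(xs)
    if n < 2 or n != len(ys):
        raise ValueError("need at least two paired observations")
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    sxx = sum((x - x_bar) ** 2 for x in xs)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    if sxx + lam == 0:
        raise ValueError("x values must vary when lam is zero")
    slope = sxy / (sxx + lam)  # the penalty shrinks the slope toward zero
    intercept = y_bar - slope * x_bar
    return intercept, slope


# Hypothetical data; a larger penalty yields a smaller slope.
xs, ys = [1, 2, 3, 4], [2, 4, 6, 8]
for lam in (0.0, 5.0, 50.0):
    print(lam, ridge_fit(xs, ys, lam))
```

In practice the penalty is usually chosen by cross-validation rather than fixed in advance.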
Impact of External Price Referencing on Medicine Prices – A Price Comparison Among 14 European Countries

PubMed Central

Leopold, Christine; Mantel-Teeuwisse, Aukje Katja; Seyfang, Leonhard; Vogler, Sabine; de Joncheere, Kees; Laing, Richard Ogilvie; Leufkens, Hubert

2012-01-01

Objectives: This study aims to examine the impact of external price referencing (EPR) on on-patent medicine prices, adjusting for other factors that may affect price levels such as sales volume, exchange rates, gross domestic product (GDP) per capita, total pharmaceutical expenditure (TPE), and size of the pharmaceutical industry. Methods: Price data of 14 on-patent products, in 14 European countries in 2007 and 2008 were obtained from the Pharmaceutical Price Information Service of the Austrian Health Institute. Based on the unit ex-factory prices in EURO, scaled ranks per country and per product were calculated. For the regression analysis the scaled ranks per country and product were weighted; each country had the same sum of weights but within a country the weights were proportional to its sales volume in the year (data obtained from IMS Health). Taking the scaled ranks, several statistical analyses were performed by using the program “R”, including a multiple regression analysis (including variables such as GDP per capita and national industry size). Results: This study showed that on average EPR as a pricing policy leads to lower prices. However, the large variation in price levels among countries using EPR confirmed that the price level is not only driven by EPR. The unadjusted linear regression model confirms that applying EPR in a country is associated with a lower scaled weighted rank (p=0.002). This interaction persisted after inclusion of total pharmaceutical expenditure per capita and GDP per capita in the final model. Conclusions: The study showed that for patented products, prices are in general lower in case the country applied EPR. Nevertheless substantial price differences among countries that apply EPR could be identified. Possible explanations could be found through a correlation between pharmaceutical industry and the scaled price ranks. In conclusion, we found that implementing external reference pricing could lead to lower prices. 
PMID:23532710
Analyzing the Impact of Ambient Temperature Indicators on Transformer Life in Different Regions of Chinese Mainland

PubMed Central

Bai, Cui-fen; Gao, Wen-Sheng; Liu, Tong

2013-01-01

Regression analysis is applied to quantitatively analyze the impact of different ambient temperature characteristics on the transformer life at different locations of Chinese mainland. 200 typical locations in Chinese mainland are selected for the study. They are specially divided into six regions so that the subsequent analysis can be done in a regional context. For each region, the local historical ambient temperature and load data are provided as inputs variables of the life consumption model in IEEE Std. C57.91-1995 to estimate the transformer life at every location. Five ambient temperature indicators related to the transformer life are involved into the partial least squares regression to describe their impact on the transformer life. According to a contribution measurement criterion of partial least squares regression, three indicators are conclusively found to be the most important factors influencing the transformer life, and an explicit expression is provided to describe the relationship between the indicators and the transformer life for every region. The analysis result is applicable to the area where the temperature characteristics are similar to Chinese mainland, and the expressions obtained can be applied to the other locations that are not included in this paper if these three indicators are known. PMID:23843729
Analyzing the impact of ambient temperature indicators on transformer life in different regions of Chinese mainland.

PubMed

Bai, Cui-fen; Gao, Wen-Sheng; Liu, Tong

2013-01-01

Regression analysis is applied to quantitatively analyze the impact of different ambient temperature characteristics on the transformer life at different locations of Chinese mainland. 200 typical locations in Chinese mainland are selected for the study. They are specially divided into six regions so that the subsequent analysis can be done in a regional context. For each region, the local historical ambient temperature and load data are provided as inputs variables of the life consumption model in IEEE Std. C57.91-1995 to estimate the transformer life at every location. Five ambient temperature indicators related to the transformer life are involved into the partial least squares regression to describe their impact on the transformer life. According to a contribution measurement criterion of partial least squares regression, three indicators are conclusively found to be the most important factors influencing the transformer life, and an explicit expression is provided to describe the relationship between the indicators and the transformer life for every region. The analysis result is applicable to the area where the temperature characteristics are similar to Chinese mainland, and the expressions obtained can be applied to the other locations that are not included in this paper if these three indicators are known.
Fish consumption in a sample of people in Bandar Abbas, Iran: application of the theory of planned behavior.

PubMed

Aghamolaei, Teamur; Sadat Tavafian, Sedigheh; Madani, Abdoulhossain

2012-09-01

This study aimed to apply the conceptual framework of the theory of planned behavior (TPB) to explain fish consumption in a sample of people who lived in Bandar Abbas, Iran. We investigated the role of three traditional constructs of TPB that included attitude, social norms, and perceived behavioral control in an effort to characterize the intention to consume fish as well as the behavioral trends that characterize fish consumption. Data were derived from a cross-sectional sample of 321 subjects. Alpha coefficient correlation and linear regression analysis were applied to test the relationships between constructs. The predictors of fish consumption frequency were also evaluated. Multiple regression analysis revealed that attitude, subjective norms, and perceived behavioral control significantly predicted intention to eat fish (R2 = 0.54, F = 128.4, P < 0.001). Multiple regression analysis for the intention to eat fish and perceived behavioral control revealed that both factors significantly predicted fish consumption frequency (R2 = 0.58, F = 223.1, P < 0.001). The results indicated that the models fit well with the data. Attitude, subjective norms, and perceived behavioral control all had significant positive impacts on behavioral intention. Moreover, both intention and perceived behavioral control could be used to predict the frequency of fish consumption.
Screening for ketosis using multiple logistic regression based on milk yield and composition

PubMed Central

KAYANO, Mitsunori; KATAOKA, Tomoko

2015-01-01

Multiple logistic regression was applied to milk yield and composition data for 632 records of healthy cows and 61 records of ketotic cows in Hokkaido, Japan. The purpose was to diagnose ketosis based on milk yield and composition, simultaneously. The cows were divided into two groups: (1) multiparous, including 314 healthy cows and 45 ketotic cows and (2) primiparous, including 318 healthy cows and 16 ketotic cows, since nutritional status, milk yield and composition are affected by parity. Multiple logistic regression was applied to these groups separately. For multiparous cows, milk yield (kg/day/cow) and protein-to-fat (P/F) ratio in milk were significant factors (P<0.05) for the diagnosis of ketosis. For primiparous cows, lactose content (%), solid not fat (SNF) content (%) and milk urea nitrogen (MUN) content (mg/dl) were significantly associated with ketosis (P<0.01). A diagnostic rule was constructed for each group of cows: (1) 9.978 × P/F ratio + 0.085 × milk yield <10 and (2) 2.327 × SNF − 2.703 × lactose + 0.225 × MUN <10. The sensitivity, specificity and the area under the curve (AUC) of the diagnostic rules were (1) 0.800, 0.729 and 0.811; (2) 0.813, 0.730 and 0.787, respectively. The P/F ratio, which is a widely used measure of ketosis, provided the sensitivity, specificity and AUC values of (1) 0.711, 0.726 and 0.781; and (2) 0.678, 0.767 and 0.738, respectively. PMID:26074408
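The two diagnostic rules quoted in this abstract are linear scores compared against a cutoff of 10, which can be sketched as below. The coefficients and the cutoff come from the abstract; the function names and the sample cow records are hypothetical. The abstract does not state which side of the inequality indicates ketosis, so this sketch only reports whether the inequality holds.

```python
THRESHOLD = 10.0  # cutoff used in both rules quoted in the abstract


def multiparous_score(pf_ratio, milk_yield_kg_per_day):
    """Rule (1): 9.978 x P/F ratio + 0.085 x milk yield."""
    return 9.978 * pf_ratio + 0.085 * milk_yield_kg_per_day


def primiparous_score(snf_pct, lactose_pct, mun_mg_per_dl):
    """Rule (2): 2.327 x SNF - 2.703 x lactose + 0.225 x MUN."""
    return 2.327 * snf_pct - 2.703 * lactose_pct + 0.225 * mun_mg_per_dl


def rule_holds(score):
    """True when the score satisfies the quoted inequality (score < 10)."""
    return score < THRESHOLD


# Hypothetical records, for illustration only.
s1 = multiparous_score(pf_ratio=0.7, milk_yield_kg_per_day=30.0)
s2 = primiparous_score(snf_pct=8.7, lactose_pct=4.6, mun_mg_per_dl=12.0)
print(s1, rule_holds(s1))
print(s2, rule_holds(s2))
```

Sensitivity and specificity, as reported in the abstract, would then be obtained by applying such a rule to records with known health status and counting the correct classifications in each group.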
Application of logistic regression for landslide susceptibility zoning of Cekmece Area, Istanbul, Turkey

NASA Astrophysics Data System (ADS)

Duman, T. Y.; Can, T.; Gokceoglu, C.; Nefeslioglu, H. A.; Sonmez, H.

2006-11-01

As a result of industrialization, throughout the world, cities have been growing rapidly for the last century. One typical example of these growing cities is Istanbul, the population of which is over 10 million. Due to rapid urbanization, new areas suitable for settlement and engineering structures are necessary. The Cekmece area located west of the Istanbul metropolitan area is studied, because the landslide activity is extensive in this area. The purpose of this study is to develop a model that can be used to characterize landslide susceptibility in map form using logistic regression analysis of an extensive landslide database. A database of landslide activity was constructed using both aerial-photography and field studies. About 19.2% of the selected study area is covered by deep-seated landslides. The landslides that occur in the area are primarily located in sandstones with interbedded permeable and impermeable layers such as claystone, siltstone and mudstone. About 31.95% of the total landslide area is located in this unit. To apply logistic regression analyses, a data matrix including 37 variables was constructed. The variables used in the forward stepwise analyses are different measures of slope, aspect, elevation, stream power index (SPI), plan curvature, profile curvature, geology, geomorphology and relative permeability of lithological units. A total of 25 variables were identified as exerting strong influence on landslide occurrence, and included in the logistic regression equation. Wald statistics values indicate that lithology, SPI and slope are more important than the other parameters in the equation. Beta coefficients of the 25 variables included in the logistic regression equation provide a model for landslide susceptibility in the Cekmece area. This model is used to generate a landslide susceptibility map that correctly classified 83.8% of the landslide-prone areas.
Application of nonlinear least-squares regression to ground-water flow modeling, west-central Florida

USGS Publications Warehouse

Yobbi, D.K.

2000-01-01

A nonlinear least-squares regression technique for estimation of ground-water flow model parameters was applied to an existing model of the regional aquifer system underlying west-central Florida. The regression technique minimizes the differences between measured and simulated water levels. Regression statistics, including parameter sensitivities and correlations, were calculated for reported parameter values in the existing model. Optimal parameter values for selected hydrologic variables of interest are estimated by nonlinear regression. Optimal estimates of parameter values are about 140 times greater than and about 0.01 times less than reported values. Independently estimating all parameters by nonlinear regression was impossible, given the existing zonation structure and number of observations, because of parameter insensitivity and correlation. Although the model yields parameter values similar to those estimated by other methods and reproduces the measured water levels reasonably accurately, a simpler parameter structure should be considered. Some possible ways of improving model calibration are to: (1) modify the defined parameter-zonation structure by omitting and/or combining parameters to be estimated; (2) carefully eliminate observation data based on evidence that they are likely to be biased; (3) collect additional water-level data; (4) assign values to insensitive parameters, and (5) estimate the most sensitive parameters first, then, using the optimized values for these parameters, estimate the entire data set.
Applying Kaplan-Meier to Item Response Data

ERIC Educational Resources Information Center

McNeish, Daniel

2018-01-01

Some IRT models can be equivalently modeled in alternative frameworks such as logistic regression. Logistic regression can also model time-to-event data, which concerns the probability of an event occurring over time. Using the relation between time-to-event models and logistic regression and the relation between logistic regression and IRT, this…
Time series modeling by a regression approach based on a latent process.

PubMed

Chamroukhi, Faicel; Samé, Allou; Govaert, Gérard; Aknin, Patrice

2009-01-01

Time series are used in many domains including finance, engineering, economics and bioinformatics generally to represent the change of a measurement over time. Modeling techniques may then be used to give a synthetic representation of such data. A new approach for time series modeling is proposed in this paper. It consists of a regression model incorporating a discrete hidden logistic process allowing for activating smoothly or abruptly different polynomial regression models. The model parameters are estimated by the maximum likelihood method performed by a dedicated Expectation Maximization (EM) algorithm. The M step of the EM algorithm uses a multi-class Iterative Reweighted Least-Squares (IRLS) algorithm to estimate the hidden process parameters. To evaluate the proposed approach, an experimental study on simulated data and real world data was performed using two alternative approaches: a heteroskedastic piecewise regression model using a global optimization algorithm based on dynamic programming, and a Hidden Markov Regression Model whose parameters are estimated by the Baum-Welch algorithm. Finally, in the context of the remote monitoring of components of the French railway infrastructure, and more particularly the switch mechanism, the proposed approach has been applied to modeling and classifying time series representing the condition measurements acquired during switch operations.
Monitoring heavy metal Cr in soil based on hyperspectral data using regression analysis

NASA Astrophysics Data System (ADS)

Zhang, Ningyu; Xu, Fuyun; Zhuang, Shidong; He, Changwei

2016-10-01

Heavy metal pollution in soils is one of the most critical problems for global ecology and environmental safety. Hyperspectral remote sensing is fast, low-cost, low-risk and causes little damage, and provides a good method for detecting heavy metals in soil. This paper proposes applying stepwise multiple regression between spectral data and the measured amount of the heavy metal Cr at soil sample points for environmental protection. In the measurement, a FieldSpec HandHeld spectroradiometer is used to collect reflectance spectra of sample points over the wavelength range of 325-1075 nm. The spectral data measured by the spectroradiometer are then preprocessed to reduce the influence of external factors; the preprocessing methods include first-order differential, second-order differential and continuum removal. Stepwise multiple regression equations are established accordingly, and the accuracy of each equation is tested. The results showed that the first-order differential preprocessing works best, which makes it feasible to predict the content of heavy metal Cr by using stepwise multiple regression.
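Note: a minimal sketch of the workflow in this record, assuming the first-order differential preprocessing means a first derivative of reflectance with respect to wavelength, and using a plain greedy forward selection as a stand-in for the stepwise procedure (the abstract does not give entry or removal criteria). The spectra and Cr values are synthetic, and the band indices 10 and 40 are arbitrary choices for the simulation.

```python
import numpy as np

def first_derivative(spectra, wavelengths):
    """Finite-difference derivative of each spectrum (one per row) along wavelength."""
    return np.gradient(spectra, wavelengths, axis=1)

def forward_stepwise(X, y, max_terms=3):
    """Greedy forward selection: add the column that most reduces the residual sum of squares."""
    n = X.shape[0]
    selected = []
    for _ in range(max_terms):
        best_rss, best_j = np.inf, None
        for j in range(X.shape[1]):
            if j in selected:
                continue
            A = np.column_stack([np.ones(n), X[:, selected + [j]]])
            coef, *_ = np.linalg.lstsq(A, y, rcond=None)
            rss = float(np.sum((y - A @ coef) ** 2))
            if rss < best_rss:
                best_rss, best_j = rss, j
        selected.append(best_j)
    return selected

# Synthetic stand-in data (not from the paper).
rng = np.random.default_rng(0)
wavelengths = np.linspace(325.0, 1075.0, 76)          # nm, range quoted in the abstract
spectra = rng.normal(size=(40, 76)).cumsum(axis=1)    # random-walk "reflectance" curves
deriv = first_derivative(spectra, wavelengths)
# Simulated Cr content driven by the derivative at two arbitrary bands.
cr = 2.0 + 50.0 * deriv[:, 10] - 30.0 * deriv[:, 40] + rng.normal(scale=0.01, size=40)
bands = forward_stepwise(deriv, cr, max_terms=2)
```

On this simulated data the selection recovers the two bands used to generate the response; with real spectra, neighbouring bands are strongly correlated and the selected set is much less stable.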
Mixed oxidizer hybrid propulsion system optimization under uncertainty using applied response surface methodology and Monte Carlo simulation

NASA Astrophysics Data System (ADS)

Whitehead, James Joshua

The analysis documented herein provides an integrated approach for the conduct of optimization under uncertainty (OUU) using Monte Carlo Simulation (MCS) techniques coupled with response surface-based methods for characterization of mixture-dependent variables. This novel methodology provides an innovative means of conducting optimization studies under uncertainty in propulsion system design. Analytic inputs are based upon empirical regression rate information obtained from design of experiments (DOE) mixture studies utilizing a mixed oxidizer hybrid rocket concept. Hybrid fuel regression rate was selected as the target response variable for optimization under uncertainty, with maximization of regression rate chosen as the driving objective. Characteristic operational conditions and propellant mixture compositions from experimental efforts conducted during previous foundational work were combined with elemental uncertainty estimates as input variables. Response surfaces for mixture-dependent variables and their associated uncertainty levels were developed using quadratic response equations incorporating single and two-factor interactions. These analysis inputs, response surface equations and associated uncertainty contributions were applied to a probabilistic MCS to develop dispersed regression rates as a function of operational and mixture input conditions within design space. Illustrative case scenarios were developed and assessed using this analytic approach including fully and partially constrained operational condition sets over all of design mixture space. In addition, optimization sets were performed across an operationally representative region in operational space and across all investigated mixture combinations. These scenarios were selected as representative examples relevant to propulsion system optimization, particularly for hybrid and solid rocket platforms. 
Ternary diagrams, including contour and surface plots, were developed and utilized to aid in visualization. The concept of Expanded-Durov diagrams was also adopted and adapted to this study to aid in visualization of uncertainty bounds. Regions of maximum regression rate and associated uncertainties were determined for each set of case scenarios. Application of response surface methodology coupled with probabilistic-based MCS allowed for flexible and comprehensive interrogation of mixture and operating design space during optimization cases. Analyses were also conducted to assess sensitivity of uncertainty to variations in key elemental uncertainty estimates. The methodology developed during this research provides an innovative optimization tool for future propulsion design efforts.
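Note: the coupling of a quadratic response surface with Monte Carlo sampling can be sketched as follows. This is an illustration under stated assumptions only: the two inputs, their normal uncertainty distributions, and every coefficient are made up, and the real study works over a mixture design space rather than two free variables.

```python
import numpy as np

def regression_rate(x1, x2, coef):
    """Hypothetical quadratic response surface with one two-factor interaction."""
    b0, b1, b2, b12, b11, b22 = coef
    return b0 + b1 * x1 + b2 * x2 + b12 * x1 * x2 + b11 * x1**2 + b22 * x2**2

rng = np.random.default_rng(1)
coef = (1.0, 0.8, 0.5, -0.3, -0.4, -0.2)   # made-up surface coefficients
n_draws = 20_000

# Assumed uncertainty: a mixture fraction x1 and an operating condition x2,
# each normally distributed about a nominal value.
x1 = rng.normal(0.6, 0.02, n_draws)
x2 = rng.normal(0.4, 0.05, n_draws)

rates = regression_rate(x1, x2, coef)       # dispersed regression rates
mean_rate = float(rates.mean())
lower, upper = np.percentile(rates, [2.5, 97.5])
```

Repeating this at each candidate design point gives both a predicted regression rate and an uncertainty band, so the point with the highest rate can be judged alongside its dispersion.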
Unit Cohesion and the Surface Navy: Does Cohesion Affect Performance

DTIC Science & Technology

1989-12-01

v. 68, 1968. Neter, J., Wasserman, W., and Kutner, M. H., Applied Linear Regression Models, 2d ed., Boston, MA: Irwin, 1989. Rand Corporation R-2607...Neter, J., Wasserman, W., and Kutner, M. H., Applied Linear Regression Models, 2d ed., Boston, MA: Irwin, 1989. SAS User’s Guide: Basics, Version 5 ed
Comparison of Selection Procedures and Validation of Criterion Used in Selection of Significant Control Variates of a Simulation Model

DTIC Science & Technology

1990-03-01

and M.H. Kutner. Applied Linear Regression Models. Homewood IL: Richard D. Irwin Inc., 1983. Pritsker, A. Alan B. Introduction to Simulation and SLAM...Control Variates in Simulation," European Journal of Operational Research, 42: (1989). Neter, J., W. Wasserman, and M.H. Kutner. Applied Linear Regression Models
Some Applied Research Concerns Using Multiple Linear Regression Analysis.

ERIC Educational Resources Information Center

Newman, Isadore; Fraas, John W.

The intention of this paper is to provide an overall reference on how a researcher can apply multiple linear regression in order to utilize the advantages that it has to offer. The advantages and some concerns expressed about the technique are examined. A number of practical ways by which researchers can deal with such concerns as…
ELM: AN ALGORITHM TO ESTIMATE THE ALPHA ABUNDANCE FROM LOW-RESOLUTION SPECTRA

DOE Office of Scientific and Technical Information (OSTI.GOV)

Bu, Yude; Zhao, Gang; Kumar, Yerra Bharat

We have investigated a novel methodology using the extreme learning machine (ELM) algorithm to determine the α abundance of stars. Applying two methods based on the ELM algorithm—ELM+spectra and ELM+Lick indices—to the stellar spectra from the ELODIE database, we measured the α abundance with a precision better than 0.065 dex. By applying these two methods to the spectra with different signal-to-noise ratios (S/Ns) and different resolutions, we found that ELM+spectra is more robust against degraded resolution and ELM+Lick indices is more robust against variation in S/N. To further validate the performance of ELM, we applied ELM+spectra and ELM+Lick indices to SDSS spectra and estimated α abundances with a precision around 0.10 dex, which is comparable to the results given by the SEGUE Stellar Parameter Pipeline. We further applied ELM to the spectra of stars in Galactic globular clusters (M15, M13, M71) and open clusters (NGC 2420, M67, NGC 6791), and results show good agreement with previous studies (within 1σ). A comparison of the ELM with other widely used methods including support vector machine, Gaussian process regression, artificial neural networks, and linear least-squares regression shows that ELM is efficient with computational resources and more accurate than other methods.
Genetic analyses of protein yield in dairy cows applying random regression models with time-dependent and temperature x humidity-dependent covariates.

PubMed

Brügemann, K; Gernand, E; von Borstel, U U; König, S

2011-08-01

Data used in the present study included 1,095,980 first-lactation test-day records for protein yield of 154,880 Holstein cows housed on 196 large-scale dairy farms in Germany. Data were recorded between 2002 and 2009 and merged with meteorological data from public weather stations. The maximum distance between each farm and its corresponding weather station was 50 km. Hourly temperature-humidity indexes (THI) were calculated using the mean of hourly measurements of dry bulb temperature and relative humidity. On the phenotypic scale, an increase in THI was generally associated with a decrease in daily protein yield. For genetic analyses, a random regression model was applied using time-dependent (d in milk, DIM) and THI-dependent covariates. Additive genetic and permanent environmental effects were fitted with this random regression model and Legendre polynomials of order 3 for DIM and THI. In addition, the fixed curve was modeled with Legendre polynomials of order 3. Heterogeneous residuals were fitted by dividing DIM into 5 classes, and by dividing THI into 4 classes, resulting in 20 different classes. Additive genetic variances for daily protein yield decreased with increasing degrees of heat stress and were lowest at the beginning of lactation and at extreme THI. Due to higher additive genetic variances, slightly higher permanent environment variances, and similar residual variances, heritabilities were highest for low THI in combination with DIM at the end of lactation. Genetic correlations among individual values for THI were generally >0.90. These trends from the complex random regression model were verified by applying relatively simple bivariate animal models for protein yield measured in 2 THI environments; that is, defining a THI value of 60 as a threshold. These high correlations indicate the absence of any substantial genotype × environment interaction for protein yield. 
However, heritabilities and additive genetic variances from the random regression model tended to be slightly higher in the THI range corresponding to cows' comfort zone. Selecting such superior environments for progeny testing can contribute to an accurate genetic differentiation among selection candidates. Copyright © 2011 American Dairy Science Association. Published by Elsevier Inc. All rights reserved.
Statistical relations among earthquake magnitude, surface rupture length, and surface fault displacement

USGS Publications Warehouse

Bonilla, Manuel G.; Mark, Robert K.; Lienkaemper, James J.

1984-01-01

In order to refine correlations of surface-wave magnitude, fault rupture length at the ground surface, and fault displacement at the surface by including the uncertainties in these variables, the existing data were critically reviewed and a new data base was compiled. Earthquake magnitudes were redetermined as necessary to make them as consistent as possible with the Gutenberg methods and results, which make up much of the data base. Measurement errors were estimated for the three variables for 58 moderate to large shallow-focus earthquakes. Regression analyses were then made utilizing the estimated measurement errors. The regression analysis demonstrates that the relations among the variables magnitude, length, and displacement are stochastic in nature. The stochastic variance, introduced in part by incomplete surface expression of seismogenic faulting, variation in shear modulus, and regional factors, dominates the estimated measurement errors. Thus, it is appropriate to use ordinary least squares for the regression models, rather than regression models based upon an underlying deterministic relation in which the variance results primarily from measurement errors. Significant differences exist in correlations of certain combinations of length, displacement, and magnitude when events are grouped by fault type or by region, including attenuation regions delineated by Evernden and others. Estimates of the magnitude and the standard deviation of the magnitude of a prehistoric or future earthquake associated with a fault can be made by correlating Ms with the logarithms of rupture length, fault displacement, or the product of length and displacement. Fault rupture area could be reliably estimated for about 20 of the events in the data set. Regression of Ms on rupture area did not result in a marked improvement over regressions that did not involve rupture area.
Because no subduction-zone earthquakes are included in this study, the reported results do not apply to such zones.
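Note: the kind of ordinary least squares fit described above (Ms against the logarithm of rupture length, with a standard deviation for the predicted magnitude) can be sketched as below. The data are simulated; the intercept, slope and scatter are invented and are not the coefficients reported in the study. Only the event count of 58 is taken from the abstract.

```python
import numpy as np

rng = np.random.default_rng(2)
n_events = 58                                    # event count from the abstract; data are synthetic
length_km = 10 ** rng.uniform(0.5, 2.5, n_events)
true_a, true_b, sigma = 6.0, 0.7, 0.3            # made-up values used only to simulate data
ms = true_a + true_b * np.log10(length_km) + rng.normal(0.0, sigma, n_events)

# Ordinary least squares of Ms on log10(rupture length).
A = np.column_stack([np.ones(n_events), np.log10(length_km)])
coef, *_ = np.linalg.lstsq(A, ms, rcond=None)
resid = ms - A @ coef
s = float(np.sqrt(resid @ resid / (n_events - 2)))   # residual standard deviation

def predict_ms(length):
    """Point estimate of Ms for a given rupture length in km."""
    return coef[0] + coef[1] * np.log10(length)
```

The residual standard deviation `s` is the quantity that would accompany a magnitude estimate for a prehistoric or future event on a fault of known length.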
Statistical Tutorial | Center for Cancer Research

Cancer.gov

Recent advances in cancer biology have resulted in the need for increased statistical analysis of research data. ST is designed as a follow-up to Statistical Analysis of Research Data (SARD), held in April 2018. The tutorial will apply the general principles of statistical analysis of research data, including descriptive statistics, z- and t-tests of means and mean differences, simple and multiple linear regression, ANOVA tests, and the Chi-Squared distribution.

Regression Models and Fuzzy Logic Prediction of TBM Penetration Rate

NASA Astrophysics Data System (ADS)

Minh, Vu Trieu; Katushin, Dmitri; Antonov, Maksim; Veinthal, Renno

2017-03-01

This paper presents statistical analyses of rock engineering properties and the measured penetration rate of a tunnel boring machine (TBM) based on the data of an actual project. The aim of this study is to analyze the influence of rock engineering properties including uniaxial compressive strength (UCS), Brazilian tensile strength (BTS), rock brittleness index (BI), the distance between planes of weakness (DPW), and the alpha angle (Alpha) between the tunnel axis and the planes of weakness on the TBM rate of penetration (ROP). Four statistical regression models (two linear and two nonlinear) are built to predict the ROP of the TBM. Finally, a fuzzy logic model is developed as an alternative method and compared to the four statistical regression models. Results show that the fuzzy logic model provides better estimations and can be applied to predict the TBM performance. The fuzzy logic model achieves the highest R-squared value (R2), 0.714, ahead of the next-best value of 0.667 from the multiple-variable nonlinear regression model.
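Note: the R-squared values used to rank the models above follow the usual coefficient-of-determination definition, sketched here. The function name is an assumption, and the abstract does not say whether the reported values were computed on fitting or held-out data.

```python
import numpy as np

def r_squared(y_obs, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    y_obs = np.asarray(y_obs, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    ss_res = np.sum((y_obs - y_pred) ** 2)
    ss_tot = np.sum((y_obs - y_obs.mean()) ** 2)
    return float(1.0 - ss_res / ss_tot)
```

A model that reproduces the observations exactly scores 1, and one that always predicts the observed mean scores 0, so 0.714 against 0.667 is a modest difference in explained variance.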
Multilevel covariance regression with correlated random effects in the mean and variance structure.

PubMed

Quintero, Adrian; Lesaffre, Emmanuel

2017-09-01

Multivariate regression methods generally assume a constant covariance matrix for the observations. When a heteroscedastic model is needed, the parametric and nonparametric covariance regression approaches in the literature can be restrictive. We propose a multilevel regression model for the mean and covariance structure, including random intercepts in both components and allowing for correlation between them. The implied conditional covariance function can be different across clusters as a result of the random effect in the variance structure. In addition, allowing for correlation between the random intercepts in the mean and covariance makes the model convenient for skewedly distributed responses. Furthermore, it permits us to analyse directly the relation between the mean response level and the variability in each cluster. Parameter estimation is carried out via Gibbs sampling. We compare the performance of our model to other covariance modelling approaches in a simulation study. Finally, the proposed model is applied to the RN4CAST dataset to identify the variables that impact burnout of nurses in Belgium. © 2017 WILEY-VCH Verlag GmbH & Co. KGaA, Weinheim.
Accounting for spatial effects in land use regression for urban air pollution modeling.

PubMed

Bertazzon, Stefania; Johnson, Markey; Eccles, Kristin; Kaplan, Gilaad G

2015-01-01

In order to accurately assess air pollution risks, health studies require spatially resolved pollution concentrations. Land-use regression (LUR) models estimate ambient concentrations at a fine spatial scale. However, spatial effects such as spatial non-stationarity and spatial autocorrelation can reduce the accuracy of LUR estimates by increasing regression errors and uncertainty; and statistical methods for resolving these effects--e.g., spatially autoregressive (SAR) and geographically weighted regression (GWR) models--may be difficult to apply simultaneously. We used an alternate approach to address spatial non-stationarity and spatial autocorrelation in LUR models for nitrogen dioxide. Traditional models were re-specified to include a variable capturing wind speed and direction, and re-fit as GWR models. Mean R² values for the resulting GWR-wind models (summer: 0.86, winter: 0.73) showed a 10-20% improvement over traditional LUR models. GWR-wind models effectively addressed both spatial effects and produced meaningful predictive models. These results suggest a useful method for improving spatially explicit models. Copyright © 2015 The Authors. Published by Elsevier Ltd. All rights reserved.
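Note: geographically weighted regression fits a separate weighted least squares model at each location, with weights that decay with distance. The sketch below assumes a Gaussian kernel with a fixed bandwidth; the study's kernel, bandwidth and wind variable are not given in the abstract, and the monitoring sites, predictor and response here are simulated.

```python
import numpy as np

def gwr_coefficients(target, coords, X, y, bandwidth):
    """Local weighted least squares at one target location (Gaussian distance kernel)."""
    dist = np.linalg.norm(coords - target, axis=1)
    weights = np.exp(-0.5 * (dist / bandwidth) ** 2)
    A = np.column_stack([np.ones(len(y)), X])
    root_w = np.sqrt(weights)
    # Scaling rows by sqrt(weight) turns weighted least squares into ordinary lstsq.
    coef, *_ = np.linalg.lstsq(A * root_w[:, None], y * root_w, rcond=None)
    return coef   # [local intercept, local slope(s)]

# Synthetic sites where the effect of a predictor grows from west to east.
rng = np.random.default_rng(3)
n_sites = 400
coords = rng.uniform(0.0, 10.0, size=(n_sites, 2))
traffic = rng.normal(size=n_sites)                 # hypothetical land-use predictor
local_slope = 1.0 + 0.2 * coords[:, 0]
no2 = 5.0 + local_slope * traffic + rng.normal(0.0, 0.1, n_sites)

west = gwr_coefficients(np.array([1.0, 5.0]), coords, traffic, no2, bandwidth=2.0)
east = gwr_coefficients(np.array([9.0, 5.0]), coords, traffic, no2, bandwidth=2.0)
```

A single global regression would report one averaged slope; the local fits recover the west-to-east change, which is the non-stationarity GWR is meant to capture.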
Water quality parameter measurement using spectral signatures

NASA Technical Reports Server (NTRS)

White, P. E.

1973-01-01

Regression analysis is applied to the problem of measuring water quality parameters from remote sensing spectral signature data. The equations necessary to perform regression analysis are presented and methods of testing the strength and reliability of a regression are described. An efficient algorithm for selecting an optimal subset of the independent variables available for a regression is also presented.
The Application of the Cumulative Logistic Regression Model to Automated Essay Scoring

ERIC Educational Resources Information Center

Haberman, Shelby J.; Sinharay, Sandip

2010-01-01

Most automated essay scoring programs use a linear regression model to predict an essay score from several essay features. This article applied a cumulative logit model instead of the linear regression model to automated essay scoring. Comparison of the performances of the linear regression model and the cumulative logit model was performed on a…
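Note: a cumulative logit model turns a linear combination of essay features into a probability for each ordered score level, rather than a single predicted number. The sketch below assumes the common parameterization P(score <= k) = logistic(cutpoint_k - eta); the feature values, weights and cutpoints are invented, and the article's own specification may differ.

```python
import numpy as np

def cumulative_logit_probs(eta, cutpoints):
    """Category probabilities from P(Y <= k) = logistic(cutpoint_k - eta).

    cutpoints must be increasing; K cutpoints give K + 1 ordered categories.
    """
    cutpoints = np.asarray(cutpoints, dtype=float)
    cumulative = 1.0 / (1.0 + np.exp(-(cutpoints - eta)))
    cumulative = np.concatenate(([0.0], cumulative, [1.0]))
    return np.diff(cumulative)

features = np.array([0.8, -0.2, 1.5])     # hypothetical standardized essay features
weights = np.array([1.2, 0.5, 0.7])       # hypothetical regression weights
cutpoints = [-1.0, 0.5, 2.0, 3.5]         # four cutpoints, so five score levels
eta = float(features @ weights)
probs = cumulative_logit_probs(eta, cutpoints)
expected_score = float(np.sum(np.arange(1, 6) * probs))
```

Unlike a linear regression prediction, the output respects the discrete, bounded score scale, and an expected score can still be derived from the category probabilities.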
The effect of machine learning regression algorithms and sample size on individualized behavioral prediction with functional connectivity features.

PubMed

Cui, Zaixu; Gong, Gaolang

2018-06-02

Individualized behavioral/cognitive prediction using machine learning (ML) regression approaches is becoming increasingly applied. The specific ML regression algorithm and sample size are two key factors that non-trivially influence prediction accuracies. However, the effects of the ML regression algorithm and sample size on individualized behavioral/cognitive prediction performance have not been comprehensively assessed. To address this issue, the present study included six commonly used ML regression algorithms: ordinary least squares (OLS) regression, least absolute shrinkage and selection operator (LASSO) regression, ridge regression, elastic-net regression, linear support vector regression (LSVR), and relevance vector regression (RVR), to perform specific behavioral/cognitive predictions based on different sample sizes. Specifically, the publicly available resting-state functional MRI (rs-fMRI) dataset from the Human Connectome Project (HCP) was used, and whole-brain resting-state functional connectivity (rsFC) or rsFC strength (rsFCS) were extracted as prediction features. Twenty-five sample sizes (ranged from 20 to 700) were studied by sub-sampling from the entire HCP cohort. The analyses showed that rsFC-based LASSO regression performed remarkably worse than the other algorithms, and rsFCS-based OLS regression performed markedly worse than the other algorithms. Regardless of the algorithm and feature type, both the prediction accuracy and its stability exponentially increased with increasing sample size. The specific patterns of the observed algorithm and sample size effects were well replicated in the prediction using re-testing fMRI data, data processed by different imaging preprocessing schemes, and different behavioral/cognitive scores, thus indicating excellent robustness/generalization of the effects. 
The current findings provide critical insight into how the selected ML regression algorithm and sample size influence individualized predictions of behavior/cognition and offer important guidance for choosing the ML regression algorithm or sample size in relevant investigations. Copyright © 2018 Elsevier Inc. All rights reserved.
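Note: the design of this kind of study (several regression algorithms, each trained at several sample sizes and scored on held-out data) can be sketched with two of the six algorithms. Everything below is simulated: the 50 features, the effect sizes, the noise level and the ridge penalty are assumptions, and accuracy is taken to be the correlation between predicted and observed scores.

```python
import numpy as np

def fit_linear(X, y, alpha=0.0):
    """alpha = 0 gives OLS (minimum-norm via lstsq); alpha > 0 gives closed-form ridge.

    No intercept is fitted; the simulated data below have zero mean.
    """
    if alpha == 0.0:
        coef, *_ = np.linalg.lstsq(X, y, rcond=None)
        return coef
    p = X.shape[1]
    return np.linalg.solve(X.T @ X + alpha * np.eye(p), X.T @ y)

rng = np.random.default_rng(4)
n_features = 50
beta = rng.normal(0.0, 0.3, n_features)       # made-up true effects

def simulate(n):
    X = rng.normal(size=(n, n_features))
    return X, X @ beta + rng.normal(0.0, 2.0, n)

X_test, y_test = simulate(2000)
accuracy = {}
for n in (20, 100, 700):                       # a few sample sizes in the 20-700 range
    X_train, y_train = simulate(n)
    for name, alpha in (("ols", 0.0), ("ridge", 10.0)):
        pred = X_test @ fit_linear(X_train, y_train, alpha)
        accuracy[(name, n)] = float(np.corrcoef(pred, y_test)[0, 1])
```

In the actual study each sample size was drawn repeatedly by sub-sampling, which also yields the stability of the accuracy; this sketch draws each training set once.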
Examining the Association between Patient-Reported Symptoms of Attention and Memory Dysfunction with Objective Cognitive Performance: A Latent Regression Rasch Model Approach.

PubMed

Li, Yuelin; Root, James C; Atkinson, Thomas M; Ahles, Tim A

2016-06-01

Patient-reported cognition generally exhibits poor concordance with objectively assessed cognitive performance. In this article, we introduce latent regression Rasch modeling and provide a step-by-step tutorial for applying Rasch methods as an alternative to traditional correlation to better clarify the relationship of self-report and objective cognitive performance. An example analysis using these methods is also included. An introduction to latent regression Rasch modeling is provided together with a tutorial on implementing it using the JAGS programming language for the Bayesian posterior parameter estimates. In an example analysis, data from a longitudinal neurocognitive outcomes study of 132 breast cancer patients and 45 non-cancer matched controls that included self-report and objective performance measures pre- and post-treatment were analyzed using both conventional and latent regression Rasch model approaches. Consistent with previous research, conventional analysis and correlations between neurocognitive decline and self-reported problems were generally near zero. In contrast, application of latent regression Rasch modeling found statistically reliable associations between objective attention and processing speed measures with self-reported Attention and Memory scores. Latent regression Rasch modeling, together with correlation of specific self-reported cognitive domains with neurocognitive measures, helps to clarify the relationship of self-report with objective performance. While the majority of patients attribute their cognitive difficulties to memory decline, the Rasch modeling suggests the importance of processing speed and initial learning. To encourage the use of this method, a step-by-step guide and programming language for implementation is provided. Implications of this method in cognitive outcomes research are discussed. © The Author 2016. Published by Oxford University Press. All rights reserved.
Visual abilities distinguish pitchers from hitters in professional baseball.

PubMed

Klemish, David; Ramger, Benjamin; Vittetoe, Kelly; Reiter, Jerome P; Tokdar, Surya T; Appelbaum, Lawrence Gregory

2018-01-01

This study aimed to evaluate the possibility that differences in sensorimotor abilities exist between hitters and pitchers in a large cohort of baseball players of varying levels of experience. Secondary data analysis was performed on 9 sensorimotor tasks comprising the Nike Sensory Station assessment battery. Bayesian hierarchical regression modelling was applied to test for differences between pitchers and hitters in data from 566 baseball players (112 high school, 85 college, 369 professional) collected at 20 testing centres. Explanatory variables including height, handedness, eye dominance, concussion history, and player position were modelled along with age curves using basis regression splines. Regression analyses revealed better performance for hitters relative to pitchers at the professional level in the visual clarity and depth perception tasks, but these differences did not exist at the high school or college levels. No significant differences were observed in the other 7 measures of sensorimotor capabilities included in the test battery, and no systematic biases were found between the testing centres. These findings, indicating that professional-level hitters have better visual acuity and depth perception than professional-level pitchers, affirm the notion that highly experienced athletes have differing perceptual skills. Findings are discussed in relation to deliberate practice theory.
A Permutation Approach for Selecting the Penalty Parameter in Penalized Model Selection

PubMed Central

Sabourin, Jeremy A; Valdar, William; Nobel, Andrew B

2015-01-01

Summary: We describe a simple, computationally efficient, permutation-based procedure for selecting the penalty parameter in LASSO penalized regression. The procedure, permutation selection, is intended for applications where variable selection is the primary focus, and can be applied in a variety of structural settings, including that of generalized linear models. We briefly discuss connections between permutation selection and existing theory for the LASSO. In addition, we present a simulation study and an analysis of real biomedical data sets in which permutation selection is compared with selection based on the following: cross-validation (CV), the Bayesian information criterion (BIC), Scaled Sparse Linear Regression, and a selection method based on recently developed testing procedures for the LASSO. PMID:26243050
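Note: the abstract does not spell out the procedure, so the following is only a sketch of one way a permutation-based penalty rule could work, and it may differ from the paper's. The assumed rule: permute the response to break any association with the predictors, record the smallest penalty at which the LASSO would select nothing, and take the median over permutations. It relies on the standard result that, for the objective (1/(2n))||y - Xb||² + λ||b||₁ with centred data, that smallest penalty is max_j |x_j'y| / n.

```python
import numpy as np

def lambda_max(X, y):
    """Smallest penalty giving an all-zero LASSO solution (centred X columns assumed)."""
    n = len(y)
    return float(np.max(np.abs(X.T @ (y - y.mean()))) / n)

def permutation_lambda(X, y, n_perm=200, seed=0):
    """Median over permutations of the null 'select nothing' penalty (assumed rule)."""
    rng = np.random.default_rng(seed)
    lams = [lambda_max(X, rng.permutation(y)) for _ in range(n_perm)]
    return float(np.median(lams))

# Synthetic example: one true predictor among 20.
rng = np.random.default_rng(5)
n, p = 100, 20
X = rng.normal(size=(n, p))
X = (X - X.mean(axis=0)) / X.std(axis=0)      # centre and scale columns
y = 3.0 * X[:, 0] + rng.normal(size=n)
lam = permutation_lambda(X, y)
```

The chosen penalty sits above what pure noise would typically need to enter the model but below the penalty that would exclude the real signal, which suits settings where variable selection is the goal.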
Nonlinear-regression groundwater flow modeling of a deep regional aquifer system

USGS Publications Warehouse

Cooley, Richard L.; Konikow, Leonard F.; Naff, Richard L.

1986-01-01

A nonlinear regression groundwater flow model, based on a Galerkin finite-element discretization, was used to analyze steady state two-dimensional groundwater flow in the areally extensive Madison aquifer in a 75,000 mi2 area of the Northern Great Plains. Regression parameters estimated include intrinsic permeabilities of the main aquifer and separate lineament zones, discharges from eight major springs surrounding the Black Hills, and specified heads on the model boundaries. Aquifer thickness and temperature variations were included as specified functions. The regression model was applied using sequential F testing so that the fewest number and simplest zonation of intrinsic permeabilities, combined with the simplest overall model, were evaluated initially; additional complexities (such as subdivisions of zones and variations in temperature and thickness) were added in stages to evaluate the subsequent degree of improvement in the model results. It was found that only the eight major springs, a single main aquifer intrinsic permeability, two separate lineament intrinsic permeabilities of much smaller values, and temperature variations are warranted by the observed data (hydraulic heads and prior information on some parameters) for inclusion in a model that attempts to explain significant controls on groundwater flow. Addition of thickness variations did not significantly improve model results; however, thickness variations were included in the final model because they are fairly well defined. Effects on the observed head distribution from other features, such as vertical leakage and regional variations in intrinsic permeability, apparently were overshadowed by measurement errors in the observed heads. Estimates of the parameters correspond well to estimates obtained from other independent sources.
Women, Physical Activity, and Quality of Life: Self-concept as a Mediator.

PubMed

Gonzalo Silvestre, Tamara; Ubillos Landa, Silvia

2016-02-22

The objectives of this research are to: (a) analyze the incremental validity of physical activity's (PA) influence on perceived quality of life (PQL); (b) determine whether PA's predictive power is mediated by self-concept; and (c) study whether results vary according to a unidimensional or multidimensional approach to self-concept measurement. The sample comprised 160 women from Burgos, Spain, aged 18 to 45 years. Non-probability sampling was used. Two three-step hierarchical regression analyses were applied to predict PQL. The hedonic quality-of-life indicators, self-concept, self-esteem, and PA were included as independent variables. The first regression analysis included global self-concept as a predictor variable, while the second included its five dimensions. Two mediation analyses were conducted to see if PA's ability to predict PQL was mediated by global and physical self-concept. Results from the first regression show that self-concept, satisfaction with life, and PA were significant predictors. PA slightly but significantly increased explained variance in PQL (2.1%). In the second regression, substituting global self-concept with its five constituent factors, only the physical dimension and satisfaction with life predicted PQL, while PA ceased to be a significant predictor. Mediation analysis revealed that only physical self-concept mediates the relationship between PA and PQL (z = 1.97, p < .050), and not global self-concept. Physical self-concept was the strongest predictor and approximately 32.45 % of PA's effect on PQL was mediated by it. This study's findings support a multidimensional view of self-concept, and represent a more accurate image of the relationship between PQL, PA, and self-concept.
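Note: the abstract reports a z statistic and a percentage mediated but does not name the test. The sketch below assumes a Sobel-type test, where a is the effect of PA on the mediator, b the effect of the mediator on PQL controlling for PA, and the proportion mediated is a·b divided by the total effect. The function names are illustrative and no path coefficients from the study are used.

```python
import math

def sobel_z(a, se_a, b, se_b):
    """Sobel test statistic for the indirect effect a * b."""
    return (a * b) / math.sqrt(b**2 * se_a**2 + a**2 * se_b**2)

def proportion_mediated(a, b, total_effect):
    """Share of the total effect carried by the indirect path."""
    return (a * b) / total_effect

def two_sided_p(z):
    """Two-sided p-value under a standard normal reference distribution."""
    return math.erfc(abs(z) / math.sqrt(2.0))
```

Under these assumptions, `two_sided_p(1.97)` is just under 0.05, consistent with the reported p < .050.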
Nonlinear-Regression Groundwater Flow Modeling of a Deep Regional Aquifer System

NASA Astrophysics Data System (ADS)

Cooley, Richard L.; Konikow, Leonard F.; Naff, Richard L.

1986-12-01

A nonlinear regression groundwater flow model, based on a Galerkin finite-element discretization, was used to analyze steady state two-dimensional groundwater flow in the areally extensive Madison aquifer in a 75,000 mi2 area of the Northern Great Plains. Regression parameters estimated include intrinsic permeabilities of the main aquifer and separate lineament zones, discharges from eight major springs surrounding the Black Hills, and specified heads on the model boundaries. Aquifer thickness and temperature variations were included as specified functions. The regression model was applied using sequential F testing so that the fewest number and simplest zonation of intrinsic permeabilities, combined with the simplest overall model, were evaluated initially; additional complexities (such as subdivisions of zones and variations in temperature and thickness) were added in stages to evaluate the subsequent degree of improvement in the model results. It was found that only the eight major springs, a single main aquifer intrinsic permeability, two separate lineament intrinsic permeabilities of much smaller values, and temperature variations are warranted by the observed data (hydraulic heads and prior information on some parameters) for inclusion in a model that attempts to explain significant controls on groundwater flow. Addition of thickness variations did not significantly improve model results; however, thickness variations were included in the final model because they are fairly well defined. Effects on the observed head distribution from other features, such as vertical leakage and regional variations in intrinsic permeability, apparently were overshadowed by measurement errors in the observed heads. Estimates of the parameters correspond well to estimates obtained from other independent sources.
X-31 aerodynamic characteristics determined from flight data

NASA Technical Reports Server (NTRS)

Kokolios, Alex

1993-01-01

The lateral aerodynamic characteristics of the X-31 were determined at angles of attack ranging from 20 to 45 deg. Estimates of the lateral stability and control parameters were obtained by applying two parameter estimation techniques, linear regression and the extended Kalman filter, to flight test data. An attempt to apply maximum likelihood to extract parameters from the flight data was also made but failed for the reasons presented. An overview of the System Identification process is given. The overview includes a listing of the more important properties of all three estimation techniques that were applied to the data. A comparison is given of results obtained from flight test data and wind tunnel data for four important lateral parameters. Finally, future research to be conducted in this area is discussed.
Genetic analysis of body weights of individually fed beef bulls in South Africa using random regression models.

PubMed

Selapa, N W; Nephawe, K A; Maiwashe, A; Norris, D

2012-02-08

The aim of this study was to estimate genetic parameters for body weights of individually fed beef bulls measured at centralized testing stations in South Africa using random regression models. Weekly body weights of Bonsmara bulls (N = 2919) tested between 1999 and 2003 were available for the analyses. The model included a fixed regression of the body weights on fourth-order orthogonal Legendre polynomials of the actual days on test (7, 14, 21, 28, 35, 42, 49, 56, 63, 70, 77, and 84) for starting age and contemporary group effects. Random regressions on fourth-order orthogonal Legendre polynomials of the actual days on test were included for additive genetic effects and additional uncorrelated random effects of the weaning-herd-year and the permanent environment of the animal. Residual effects were assumed to be independently distributed with heterogeneous variance for each test day. Variance ratios for additive genetic, permanent environment and weaning-herd-year for weekly body weights at different test days ranged from 0.26 to 0.29, 0.37 to 0.44 and 0.26 to 0.34, respectively. The weaning-herd-year was found to have a significant effect on the variation of body weights of bulls despite a 28-day adjustment period. Genetic correlations amongst body weights at different test days were high, ranging from 0.89 to 1.00. Heritability estimates were comparable to literature estimates obtained using multivariate models. Therefore, random regression models could be applied in the genetic evaluation of body weight of individually fed beef bulls in South Africa.
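Note: the Legendre polynomial covariates used in random regression models can be built as below. Days on test are first rescaled to the interval [-1, 1], where the polynomials are orthogonal. The sketch assumes that "fourth-order" means polynomials up to degree 4 (some authors use it for four coefficients, i.e. degree 3), and the optional normalization factor sqrt((2k + 1) / 2) is a common convention that the abstract does not confirm.

```python
import numpy as np
from numpy.polynomial import legendre

days_on_test = np.arange(7, 85, 7)            # 7, 14, ..., 84 as listed in the abstract

# Rescale days to [-1, 1].
t = 2.0 * (days_on_test - days_on_test.min()) / (days_on_test.max() - days_on_test.min()) - 1.0

degree = 4                                     # assumed reading of "fourth-order"
Phi = legendre.legvander(t, degree)            # columns are P0(t), P1(t), ..., P4(t)

# Optional normalization so each polynomial has unit norm on [-1, 1].
norm = np.sqrt((2 * np.arange(degree + 1) + 1) / 2.0)
Phi_normalized = Phi * norm
```

Each row of `Phi` holds the covariates for one test day; an animal's random regression coefficients multiplied by a row give its deviation at that day, which is how covariances between any pair of test days are obtained.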
The association between short interpregnancy interval and preterm birth in Louisiana: a comparison of methods.

PubMed

Howard, Elizabeth J; Harville, Emily; Kissinger, Patricia; Xiong, Xu

2013-07-01

There is growing interest in the application of propensity scores (PS) in epidemiologic studies, especially within the field of reproductive epidemiology. This retrospective cohort study assesses the impact of a short interpregnancy interval (IPI) on preterm birth and compares the results of the conventional logistic regression analysis with analyses utilizing a PS. The study included 96,378 singleton infants from Louisiana birth certificate data (1995-2007). Five regression models designed for methods comparison are presented. Ten percent (10.17 %) of all births were preterm; 26.83 % of births were from a short IPI. The PS-adjusted model produced a more conservative estimate of the exposure variable compared to the conventional logistic regression method (β-coefficient: 0.21 vs. 0.43), as well as a smaller standard error (0.024 vs. 0.028), odds ratio and 95 % confidence intervals [1.15 (1.09, 1.20) vs. 1.23 (1.17, 1.30)]. The inclusion of more covariate and interaction terms in the PS did not change the estimates of the exposure variable. This analysis indicates that PS-adjusted regression may be appropriate for validation of conventional methods in a large dataset with a fairly common outcome. PS's may be beneficial in producing more precise estimates, especially for models with many confounders and effect modifiers and where conventional adjustment with logistic regression is unsatisfactory. Short intervals between pregnancies are associated with preterm birth in this population, according to either technique. Birth spacing is an issue that women have some control over. Educational interventions, including birth control, should be applied during prenatal visits and following delivery.
Regression calibration for models with two predictor variables measured with error and their interaction, using instrumental variables and longitudinal data.

PubMed

Strand, Matthew; Sillau, Stefan; Grunwald, Gary K; Rabinovitch, Nathan

2014-02-10

Regression calibration provides a way to obtain unbiased estimators of fixed effects in regression models when one or more predictors are measured with error. Recent development of measurement error methods has focused on models that include interaction terms between measured-with-error predictors, and separately, methods for estimation in models that account for correlated data. In this work, we derive explicit and novel forms of regression calibration estimators and associated asymptotic variances for longitudinal models that include interaction terms, when data from instrumental and unbiased surrogate variables are available but not the actual predictors of interest. The longitudinal data are fit using linear mixed models that contain random intercepts and account for serial correlation and unequally spaced observations. The motivating application involves a longitudinal study of exposure to two pollutants (predictors) - outdoor fine particulate matter and cigarette smoke - and their association in interactive form with levels of a biomarker of inflammation, leukotriene E4 (LTE4, outcome) in asthmatic children. Because the exposure concentrations could not be directly observed, we used measurements from a fixed outdoor monitor and urinary cotinine concentrations as instrumental variables, and we used concentrations of fine ambient particulate matter and cigarette smoke measured with error by personal monitors as unbiased surrogate variables. We applied the derived regression calibration methods to estimate coefficients of the unobserved predictors and their interaction, allowing for direct comparison of toxicity of the different pollutants. We used simulations to verify accuracy of inferential methods based on asymptotic theory. Copyright © 2013 John Wiley & Sons, Ltd.
Estimating the concrete compressive strength using hard clustering and fuzzy clustering based regression techniques.

PubMed

Nagwani, Naresh Kumar; Deo, Shirish V

2014-01-01

Understanding of the compressive strength of concrete is important for activities like construction arrangement, prestressing operations, and proportioning new mixtures and for the quality assurance. Regression techniques are most widely used for prediction tasks where relationship between the independent variables and dependent (prediction) variable is identified. The accuracy of the regression techniques for prediction can be improved if clustering can be used along with regression. Clustering along with regression will ensure the more accurate curve fitting between the dependent and independent variables. In this work cluster regression technique is applied for estimating the compressive strength of the concrete and a novel state of the art is proposed for predicting the concrete compressive strength. The objective of this work is to demonstrate that clustering along with regression ensures less prediction errors for estimating the concrete compressive strength. The proposed technique consists of two major stages: in the first stage, clustering is used to group the similar characteristics concrete data and then in the second stage regression techniques are applied over these clusters (groups) to predict the compressive strength from individual clusters. It is found from experiments that clustering along with regression techniques gives minimum errors for predicting compressive strength of concrete; also fuzzy clustering algorithm C-means performs better than K-means algorithm.
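The two-stage idea described above (cluster first, then fit a separate regression inside each cluster) can be sketched as follows. This is only an illustration under assumptions: it uses hard clustering (plain k-means) on a single made-up feature and a simple least-squares line per cluster, whereas the abstract does not state the features, cluster counts, or regression techniques actually used. All names and numbers are hypothetical.

```python
def fit_line(xs, ys):
    """Ordinary least squares for y = a + b * x; returns (a, b)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return mean_y - slope * mean_x, slope


def nearest(x, centers):
    """Index of the center closest to x."""
    return min(range(len(centers)), key=lambda k: abs(x - centers[k]))


def kmeans_1d(xs, centers, iterations=20):
    """Plain Lloyd iterations on one feature; returns the final centers."""
    centers = list(centers)
    for _ in range(iterations):
        labels = [nearest(x, centers) for x in xs]
        for k in range(len(centers)):
            members = [x for x, lab in zip(xs, labels) if lab == k]
            if members:  # keep the old center if a cluster empties
                centers[k] = sum(members) / len(members)
    return centers


def fit_clusterwise(xs, ys, initial_centers):
    """Stage 1: cluster the inputs. Stage 2: one regression line per cluster."""
    centers = kmeans_1d(xs, initial_centers)
    labels = [nearest(x, centers) for x in xs]
    models = {}
    for k in range(len(centers)):
        cx = [x for x, lab in zip(xs, labels) if lab == k]
        cy = [y for y, lab in zip(ys, labels) if lab == k]
        models[k] = fit_line(cx, cy)  # assumes each cluster has >= 2 distinct x
    return centers, models


def predict(x, centers, models):
    """Route x to its nearest cluster and apply that cluster's line."""
    intercept, slope = models[nearest(x, centers)]
    return intercept + slope * x


# Toy data (invented): two regimes with different slopes
xs = [1.0, 2.0, 3.0, 4.0, 11.0, 12.0, 13.0, 14.0]
ys = [2.0, 4.0, 6.0, 8.0, 5.0, 4.0, 3.0, 2.0]
centers, models = fit_clusterwise(xs, ys, initial_centers=[0.0, 10.0])
```

A single global line would fit neither regime of this toy data well, which is the motivation the abstract gives for clustering before regression. A fuzzy variant would replace the hard labels with membership weights.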
Estimating the Concrete Compressive Strength Using Hard Clustering and Fuzzy Clustering Based Regression Techniques

PubMed Central

Nagwani, Naresh Kumar; Deo, Shirish V.

2014-01-01

Understanding of the compressive strength of concrete is important for activities like construction arrangement, prestressing operations, and proportioning new mixtures and for the quality assurance. Regression techniques are most widely used for prediction tasks where relationship between the independent variables and dependent (prediction) variable is identified. The accuracy of the regression techniques for prediction can be improved if clustering can be used along with regression. Clustering along with regression will ensure the more accurate curve fitting between the dependent and independent variables. In this work cluster regression technique is applied for estimating the compressive strength of the concrete and a novel state of the art is proposed for predicting the concrete compressive strength. The objective of this work is to demonstrate that clustering along with regression ensures less prediction errors for estimating the concrete compressive strength. The proposed technique consists of two major stages: in the first stage, clustering is used to group the similar characteristics concrete data and then in the second stage regression techniques are applied over these clusters (groups) to predict the compressive strength from individual clusters. It is found from experiments that clustering along with regression techniques gives minimum errors for predicting compressive strength of concrete; also fuzzy clustering algorithm C-means performs better than K-means algorithm. PMID:25374939
Alternative configurations of Quantile Regression for estimating predictive uncertainty in water level forecasts for the Upper Severn River: a comparison

NASA Astrophysics Data System (ADS)

Lopez, Patricia; Verkade, Jan; Weerts, Albrecht; Solomatine, Dimitri

2014-05-01

Hydrological forecasting is subject to many sources of uncertainty, including those originating in initial state, boundary conditions, model structure and model parameters. Although uncertainty can be reduced, it can never be fully eliminated. Statistical post-processing techniques constitute an often used approach to estimate the hydrological predictive uncertainty, where a model of forecast error is built using a historical record of past forecasts and observations. The present study focuses on the use of the Quantile Regression (QR) technique as a hydrological post-processor. It estimates the predictive distribution of water levels using deterministic water level forecasts as predictors. This work aims to thoroughly verify uncertainty estimates using the implementation of QR that was applied in an operational setting in the UK National Flood Forecasting System, and to inter-compare forecast quality and skill in various, differing configurations of QR. These configurations are (i) 'classical' QR, (ii) QR constrained by a requirement that quantiles do not cross, (iii) QR derived on time series that have been transformed into the Normal domain (Normal Quantile Transformation - NQT), and (iv) a piecewise linear derivation of QR models. The QR configurations are applied to fourteen hydrological stations on the Upper Severn River with different catchments characteristics. Results of each QR configuration are conditionally verified for progressively higher flood levels, in terms of commonly used verification metrics and skill scores. These include Brier's probability score (BS), the continuous ranked probability score (CRPS) and corresponding skill scores as well as the Relative Operating Characteristic score (ROCS). Reliability diagrams are also presented and analysed. The results indicate that none of the four Quantile Regression configurations clearly outperforms the others.
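Quantile regression estimates a chosen quantile by minimizing the asymmetric "pinball" loss instead of squared error. The sketch below shows that loss and checks, on a handful of invented forecast errors, that the constant minimizing it at level 0.9 is an upper order statistic of the sample. It is not the operational QR configuration from the study; the data and names are illustrative assumptions.

```python
def pinball_loss(y, prediction, tau):
    """Quantile (pinball) loss for one observation at quantile level tau."""
    error = y - prediction
    return tau * error if error >= 0 else (tau - 1.0) * error


def mean_pinball_loss(ys, prediction, tau):
    """Average pinball loss of a single constant prediction over a sample."""
    return sum(pinball_loss(y, prediction, tau) for y in ys) / len(ys)


# Invented forecast errors (observation minus forecast), 11 values
errors = [-0.4, -0.1, 0.0, 0.2, 0.3, 0.5, 0.7, 0.9, 1.4, 2.0, 3.1]

# Search the sample values for the constant that minimizes the 0.9 pinball loss
best_q90 = min(sorted(errors), key=lambda c: mean_pinball_loss(errors, c, 0.9))
```

In a full quantile regression the constant is replaced by a function of the predictors (here, the deterministic water level forecast), with one fit per quantile level. Fitting the levels independently is what allows quantiles to cross, which configuration (ii) in the abstract constrains.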
Analysis of Sting Balance Calibration Data Using Optimized Regression Models

NASA Technical Reports Server (NTRS)

Ulbrich, N.; Bader, Jon B.

2010-01-01

Calibration data of a wind tunnel sting balance was processed using a candidate math model search algorithm that recommends an optimized regression model for the data analysis. During the calibration the normal force and the moment at the balance moment center were selected as independent calibration variables. The sting balance itself had two moment gages. Therefore, after analyzing the connection between calibration loads and gage outputs, it was decided to choose the difference and the sum of the gage outputs as the two responses that best describe the behavior of the balance. The math model search algorithm was applied to these two responses. An optimized regression model was obtained for each response. Classical strain gage balance load transformations and the equations of the deflection of a cantilever beam under load are used to show that the search algorithm's two optimized regression models are supported by a theoretical analysis of the relationship between the applied calibration loads and the measured gage outputs. The analysis of the sting balance calibration data set is a rare example of a situation when terms of a regression model of a balance can directly be derived from first principles of physics. In addition, it is interesting to note that the search algorithm recommended the correct regression model term combinations using only a set of statistical quality metrics that were applied to the experimental data during the algorithm's term selection process.

Kepler AutoRegressive Planet Search (KARPS)

NASA Astrophysics Data System (ADS)

Caceres, Gabriel

2018-01-01

One of the main obstacles in detecting faint planetary transits is the intrinsic stellar variability of the host star. The Kepler AutoRegressive Planet Search (KARPS) project implements statistical methodology associated with autoregressive processes (in particular, ARIMA and ARFIMA) to model stellar lightcurves in order to improve exoplanet transit detection. We also develop a novel Transit Comb Filter (TCF) applied to the AR residuals which provides a periodogram analogous to the standard Box-fitting Least Squares (BLS) periodogram. We train a random forest classifier on known Kepler Objects of Interest (KOIs) using select features from different stages of this analysis, and then use ROC curves to define and calibrate the criteria to recover the KOI planet candidates with high fidelity. These statistical methods are detailed in a contributed poster (Feigelson et al., this meeting). These procedures are applied to the full DR25 dataset of NASA’s Kepler mission. Using the classification criteria, a vast majority of known KOIs are recovered and dozens of new KARPS Candidate Planets (KCPs) discovered, including ultra-short period exoplanets. The KCPs will be briefly presented and discussed.
Development of an Algorithm for Stroke Prediction: A National Health Insurance Database Study in Korea.

PubMed

Min, Seung Nam; Park, Se Jin; Kim, Dong Joon; Subramaniyam, Murali; Lee, Kyung-Sun

2018-01-01

Stroke is the second leading cause of death worldwide and remains an important health burden both for the individuals and for the national healthcare systems. Potentially modifiable risk factors for stroke include hypertension, cardiac disease, diabetes, and dysregulation of glucose metabolism, atrial fibrillation, and lifestyle factors. We aimed to derive a model equation for developing a stroke pre-diagnosis algorithm with the potentially modifiable risk factors. We used logistic regression for model derivation, together with data from the database of the Korea National Health Insurance Service (NHIS). We reviewed the NHIS records of 500,000 enrollees. For the regression analysis, data regarding 367 stroke patients were selected. The control group consisted of 500 patients followed up for 2 consecutive years and with no history of stroke. We developed a logistic regression model based on information regarding several well-known modifiable risk factors. The developed model could correctly discriminate between normal subjects and stroke patients in 65% of cases. The model developed in the present study can be applied in the clinical setting to estimate the probability of stroke in a year and thus improve the stroke prevention strategies in high-risk patients. The approach used to develop the stroke prevention algorithm can be applied for developing similar models for the pre-diagnosis of other diseases. © 2018 S. Karger AG, Basel.
Assessing the potential for improving S2S forecast skill through multimodel ensembling

NASA Astrophysics Data System (ADS)

Vigaud, N.; Robertson, A. W.; Tippett, M. K.; Wang, L.; Bell, M. J.

2016-12-01

Non-linear logistic regression is well suited to probability forecasting and has been successfully applied in the past to ensemble weather and climate predictions, providing access to the full probabilities distribution without any Gaussian assumption. However, little work has been done at sub-monthly lead times where relatively small re-forecast ensembles and lengths represent new challenges for which post-processing avenues have yet to be investigated. A promising approach consists in extending the definition of non-linear logistic regression by including the quantile of the forecast distribution as one of the predictors. So-called Extended Logistic Regression (ELR), which enables mutually consistent individual threshold probabilities, is here applied to ECMWF, CFSv2 and CMA re-forecasts from the S2S database in order to produce rainfall probabilities at weekly resolution. The ELR model is trained on seasonally-varying tercile categories computed for lead times of 1 to 4 weeks. It is then tested in a cross-validated manner, i.e. allowing real-time predictability applications, to produce rainfall tercile probabilities from individual weekly hindcasts that are finally combined by equal pooling. Results will be discussed over a broader North American region, where individual and MME forecasts generated out to 4 weeks lead are characterized by good probabilistic reliability but low sharpness, exhibiting systematically more skill in winter than summer.
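For orientation, extended logistic regression is commonly written with the threshold quantile entering the linear predictor alongside the ensemble forecast. The form below is a generic textbook-style sketch, not the exact specification fitted in this study: the symbols $x$ (forecast predictor), $q$ (threshold quantile), and the choice of a linear $f$ are assumptions made for illustration.

```latex
P(y \le q \mid x) \;=\; \frac{\exp\bigl(f(x) + g(q)\bigr)}{1 + \exp\bigl(f(x) + g(q)\bigr)},
\qquad f(x) = b_0 + b_1 x,
\qquad g(q) \text{ nondecreasing in } q .
```

Because the same coefficients are shared across thresholds and $g$ is nondecreasing, the probabilities for different thresholds cannot cross, which is the "mutually consistent" property mentioned in the abstract.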
The gynecologic oncology fellowship interview process: Challenges and potential areas for improvement.

PubMed

Gressel, Gregory M; Van Arsdale, Anne; Dioun, Shayan M; Goldberg, Gary L; Nevadunsky, Nicole S

2017-05-01

The application and interview process for gynecologic oncology fellowship is highly competitive, time-consuming and expensive for applicants. We conducted a survey of successfully matched gynecologic oncology fellowship applicants to assess problems associated with the interview process and identify areas for improvement. All Society of Gynecologic Oncology (SGO) list-serve members who have participated in the match program for gynecologic oncology fellowship were asked to complete an online survey regarding the interview process. Linear regression modeling was used to examine association between year of match, number of programs applied to, cost incurred, and overall satisfaction. Two hundred and sixty-nine eligible participants reported applying to a mean of 20 programs [range 1-45] and were offered a mean of 14 interviews [range 1-43]. They spent an average of $6000 [$0-25,000], using personal savings (54%), credit cards (50%), family support (12%) or personal loans (3%). Seventy percent of respondents identified the match as fair, and 93% were satisfied. Interviewees spent a mean of 15 [0-45] days away from work and 37% reported difficulty arranging coverage. Linear regression showed an increase in number of programs applied to and cost per applicant over time (p < 0.001) between 1993 and 2016. Applicants who applied to all available programs spent more (p < 0.001) than those who applied to programs based on their location or quality. The current fellowship match was identified as fair and satisfying by most respondents despite being time consuming and expensive. Suggested alternative options included clustering interviews geographically or conducting preliminary interviews at the SGO Annual Meeting.
Conditional Density Estimation with HMM Based Support Vector Machines

NASA Astrophysics Data System (ADS)

Hu, Fasheng; Liu, Zhenqiu; Jia, Chunxin; Chen, Dechang

Conditional density estimation is very important in financial engineering, risk management, and other engineering computing problems. However, most regression models have a latent assumption that the probability density is a Gaussian distribution, which is not necessarily true in many real-life applications. In this paper, we give a framework to estimate or predict the conditional density mixture dynamically. By combining the Input-Output HMM with SVM regression and building an SVM model in each state of the HMM, we can estimate a conditional density mixture instead of a single Gaussian. With an SVM in each node, this model can be applied not only to regression but to classification as well. We applied this model to denoise ECG data. The proposed method has the potential to be applied to other time series such as stock market return predictions.
RBF kernel based support vector regression to estimate the blood volume and heart rate responses during hemodialysis.

PubMed

Javed, Faizan; Chan, Gregory S H; Savkin, Andrey V; Middleton, Paul M; Malouf, Philip; Steel, Elizabeth; Mackie, James; Lovell, Nigel H

2009-01-01

This paper uses non-linear support vector regression (SVR) to model the blood volume and heart rate (HR) responses in 9 hemodynamically stable kidney failure patients during hemodialysis. Using radial basis function (RBF) kernels, non-parametric models of relative blood volume (RBV) change with time as well as percentage change in HR with respect to RBV were obtained. The ε-insensitivity based loss function was used for SVR modeling. Selection of the design parameters, which include capacity (C), insensitivity region (ε) and the RBF kernel parameter (sigma), was made based on a grid search approach, and the selected models were cross-validated using the average mean square error (AMSE) calculated from testing data based on a k-fold cross-validation technique. Linear regression was also applied to fit the curves and the AMSE was calculated for comparison with SVR. For the model based on RBV with time, SVR gave a lower AMSE for both training (AMSE=1.5) as well as testing data (AMSE=1.4) compared to linear regression (AMSE=1.8 and 1.5). SVR also provided a better fit for HR with RBV for both training as well as testing data (AMSE=15.8 and 16.4) compared to linear regression (AMSE=25.2 and 20.1).
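Two of the building blocks named above can be written down directly: the insensitive-zone loss, which ignores residuals smaller than a tolerance, and the RBF kernel, which measures similarity between inputs. The sketch below is illustrative only. The kernel is parameterized by a width `sigma` as exp(-||x - z||^2 / (2 sigma^2)), which is one common convention and an assumption here; the abstract does not give the exact parameterization used.

```python
import math


def epsilon_insensitive_loss(y_true, y_pred, epsilon):
    """Zero inside the tube |y_true - y_pred| <= epsilon, linear outside it."""
    return max(0.0, abs(y_true - y_pred) - epsilon)


def rbf_kernel(x, z, sigma):
    """Gaussian RBF kernel between two equal-length feature vectors."""
    squared_distance = sum((a - b) ** 2 for a, b in zip(x, z))
    return math.exp(-squared_distance / (2.0 * sigma ** 2))


# A residual of 0.05 falls inside a tube of half-width 0.1 and costs nothing
inside_tube = epsilon_insensitive_loss(1.0, 1.05, epsilon=0.1)

# A residual of 0.5 is penalized only for the part beyond the tube
outside_tube = epsilon_insensitive_loss(1.0, 1.5, epsilon=0.1)
```

The capacity parameter C then trades off the total loss outside the tube against the flatness of the fitted function, which is why C, the tube width, and sigma are tuned together in a grid search.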
Age adjustment in ecological studies: using a study on arsenic ingestion and bladder cancer as an example.

PubMed

Guo, How-Ran

2011-10-20

Despite its limitations, ecological study design is widely applied in epidemiology. In most cases, adjustment for age is necessary, but different methods may lead to different conclusions. To compare three methods of age adjustment, a study on the associations between arsenic in drinking water and incidence of bladder cancer in 243 townships in Taiwan was used as an example. A total of 3068 cases of bladder cancer, including 2276 men and 792 women, were identified during a ten-year study period in the study townships. Three methods were applied to analyze the same data set on the ten-year study period. The first (Direct Method) applied direct standardization to obtain standardized incidence rate and then used it as the dependent variable in the regression analysis. The second (Indirect Method) applied indirect standardization to obtain standardized incidence ratio and then used it as the dependent variable in the regression analysis instead. The third (Variable Method) used proportions of residents in different age groups as a part of the independent variables in the multiple regression models. All three methods showed a statistically significant positive association between arsenic exposure above 0.64 mg/L and incidence of bladder cancer in men and women, but different results were observed for the other exposure categories. In addition, the risk estimates obtained by different methods for the same exposure category were all different. Using an empirical example, the current study confirmed the argument made by other researchers previously that whereas the three different methods of age adjustment may lead to different conclusions, only the third approach can obtain unbiased estimates of the risks. The third method can also generate estimates of the risk associated with each age group, but the other two are unable to evaluate the effects of age directly.
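The first two adjustment methods differ in what they compute before the regression step, and a small worked example makes the difference concrete. The sketch below uses invented counts for one township with three age groups; the standard population and reference rates are likewise hypothetical and are not taken from the study.

```python
def directly_standardized_rate(stratum_cases, stratum_population, standard_population):
    """Weighted average of the township's own age-specific rates,
    with weights taken from a fixed standard population."""
    total_standard = sum(standard_population)
    return sum(
        (cases / population) * (standard / total_standard)
        for cases, population, standard in zip(
            stratum_cases, stratum_population, standard_population
        )
    )


def standardized_incidence_ratio(observed_cases, stratum_population, reference_rates):
    """Observed cases divided by the cases expected if reference
    age-specific rates applied to the township's population."""
    expected = sum(p * r for p, r in zip(stratum_population, reference_rates))
    return observed_cases / expected


# Invented township data for three age groups (young, middle, old)
cases = [2, 6, 12]
population = [10000, 6000, 4000]
standard = [50000, 30000, 20000]          # hypothetical standard population
reference_rates = [0.0001, 0.0005, 0.002]  # hypothetical reference rates

dsr = directly_standardized_rate(cases, population, standard)
sir = standardized_incidence_ratio(sum(cases), population, reference_rates)
```

With these numbers the directly standardized rate is about 0.001 (10 per 10,000) and the standardized incidence ratio is about 1.67. Either quantity would then serve as the dependent variable in the township-level regression, whereas the third method instead enters the age-group proportions as covariates.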
Fast function-on-scalar regression with penalized basis expansions.

PubMed

Reiss, Philip T; Huang, Lei; Mennes, Maarten

2010-01-01

Regression models for functional responses and scalar predictors are often fitted by means of basis functions, with quadratic roughness penalties applied to avoid overfitting. The fitting approach described by Ramsay and Silverman in the 1990s amounts to a penalized ordinary least squares (P-OLS) estimator of the coefficient functions. We recast this estimator as a generalized ridge regression estimator, and present a penalized generalized least squares (P-GLS) alternative. We describe algorithms by which both estimators can be implemented, with automatic selection of optimal smoothing parameters, in a more computationally efficient manner than has heretofore been available. We discuss pointwise confidence intervals for the coefficient functions, simultaneous inference by permutation tests, and model selection, including a novel notion of pointwise model selection. P-OLS and P-GLS are compared in a simulation study. Our methods are illustrated with an analysis of age effects in a functional magnetic resonance imaging data set, as well as a reanalysis of a now-classic Canadian weather data set. An R package implementing the methods is publicly available.
Estimating Causal Effects with Ancestral Graph Markov Models

PubMed Central

Malinsky, Daniel; Spirtes, Peter

2017-01-01

We present an algorithm for estimating bounds on causal effects from observational data which combines graphical model search with simple linear regression. We assume that the underlying system can be represented by a linear structural equation model with no feedback, and we allow for the possibility of latent variables. Under assumptions standard in the causal search literature, we use conditional independence constraints to search for an equivalence class of ancestral graphs. Then, for each model in the equivalence class, we perform the appropriate regression (using causal structure information to determine which covariates to include in the regression) to estimate a set of possible causal effects. Our approach is based on the “IDA” procedure of Maathuis et al. (2009), which assumes that all relevant variables have been measured (i.e., no unmeasured confounders). We generalize their work by relaxing this assumption, which is often violated in applied contexts. We validate the performance of our algorithm on simulated data and demonstrate improved precision over IDA when latent variables are present. PMID:28217244
Analysis of Learning Curve Fitting Techniques.

DTIC Science & Technology

1987-09-01

1986. 15. Neter, John and others. Applied Linear Regression Models. Homewood IL: Irwin, 19-33. 16. SAS User’s Guide: Basics, Version 5 Edition. SAS... Linear Regression Techniques (15:23-52). Random errors are assumed to be normally distributed when using ordinary least-squares, according to Johnston...lot estimated by the improvement curve formula. For a more detailed explanation of the ordinary least-squares technique, see Neter, et. al., Applied
Can We Use Regression Modeling to Quantify Mean Annual Streamflow at a Global-Scale?

NASA Astrophysics Data System (ADS)

Barbarossa, V.; Huijbregts, M. A. J.; Hendriks, J. A.; Beusen, A.; Clavreul, J.; King, H.; Schipper, A.

2016-12-01

Quantifying mean annual flow of rivers (MAF) at ungauged sites is essential for a number of applications, including assessments of global water supply, ecosystem integrity and water footprints. MAF can be quantified with spatially explicit process-based models, which might be overly time-consuming and data-intensive for this purpose, or with empirical regression models that predict MAF based on climate and catchment characteristics. Yet, regression models have mostly been developed at a regional scale and the extent to which they can be extrapolated to other regions is not known. In this study, we developed a global-scale regression model for MAF using observations of discharge and catchment characteristics from 1,885 catchments worldwide, ranging from 2 to 10^6 km2 in size. In addition, we compared the performance of the regression model with the predictive ability of the spatially explicit global hydrological model PCR-GLOBWB [van Beek et al., 2011] by comparing results from both models to independent measurements. We obtained a regression model explaining 89% of the variance in MAF based on catchment area, mean annual precipitation and air temperature, average slope and elevation. The regression model performed better than PCR-GLOBWB for the prediction of MAF, as root-mean-square error values were lower (0.29 - 0.38 compared to 0.49 - 0.57) and the modified index of agreement was higher (0.80 - 0.83 compared to 0.72 - 0.75). Our regression model can be applied globally at any point of the river network, provided that the input parameters are within the range of values employed in the calibration of the model. The performance is reduced for water scarce regions and further research should focus on improving such an aspect for regression-based global hydrological models.
Reaeration equations derived from U.S. geological survey database

USGS Publications Warehouse

Melching, C.S.; Flores, H.E.

1999-01-01

Accurate estimation of the reaeration-rate coefficient (K2) is extremely important for waste-load allocation. Currently, available K2 estimation equations generally yield poor estimates when applied to stream conditions different from those for which the equations were derived because they were derived from small databases composed of potentially highly inaccurate measurements. A large data set of K2 measurements made with tracer-gas methods was compiled from U.S. Geological Survey studies. This compilation included 493 reaches on 166 streams in 23 states. Careful screening to detect and eliminate erroneous measurements reduced the data set to 371 measurements. These measurements were divided into four subgroups on the basis of flow regime (channel control or pool and riffle) and stream scale (discharge greater than or less than 0.556 m3/s). Multiple linear regression in logarithms was applied to relate K2 to 12 stream hydraulic and water-quality characteristics. The resulting best-estimation equations had the form of semiempirical equations that included the rate of energy dissipation and discharge or depth and width as variables. For equation verification, a data set of K2 measurements made with tracer-gas procedures by other agencies was compiled from the literature. This compilation included 127 reaches on at least 24 streams in at least seven states. The standard error of estimate obtained when applying the developed equations to the U.S. Geological Survey data set ranged from 44 to 61%, whereas the standard error of estimate was 78% when applied to the verification data set.
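"Multiple linear regression in logarithms" means the model is linear after taking logs of the response and the predictors, which is equivalent to a multiplicative power-law equation in the original units. The generic form is sketched below; the symbols $X_i$, $a$, and $b_i$ are placeholders, since the abstract does not list the fitted predictors or coefficients.

```latex
\log K_2 \;=\; \log a + \sum_{i=1}^{m} b_i \log X_i + \varepsilon
\quad\Longleftrightarrow\quad
K_2 \;=\; a \prod_{i=1}^{m} X_i^{\,b_i}\, e^{\varepsilon}.
```

Because the error term is additive on the log scale, it is multiplicative in the original units, which is consistent with reporting the standard error of estimate as a percentage.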
Partial F-tests with multiply imputed data in the linear regression framework via coefficient of determination.

PubMed

Chaurasia, Ashok; Harel, Ofer

2015-02-10

Tests for regression coefficients such as global, local, and partial F-tests are common in applied research. In the framework of multiple imputation, there are several papers addressing tests for regression coefficients. However, for simultaneous hypothesis testing, the existing methods are computationally intensive because they involve calculation with vectors and (inversion of) matrices. In this paper, we propose a simple method based on the scalar entity, coefficient of determination, to perform (global, local, and partial) F-tests with multiply imputed data. The proposed method is evaluated using simulated data and applied to suicide prevention data. Copyright © 2014 John Wiley & Sons, Ltd.
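For complete (non-imputed) data, the partial F-statistic can be computed from the coefficients of determination of the full and reduced models alone, which is the scalar identity the proposed method builds on. The sketch below shows only that standard complete-data formula. It does not reproduce the paper's procedure for combining results across multiply imputed data sets, and the function and argument names are illustrative.

```python
def partial_f_statistic(r2_full, r2_reduced, n, p_full, q):
    """Partial F-statistic for testing q coefficients in a linear model.

    r2_full    : R^2 of the full model
    r2_reduced : R^2 of the model without the q tested predictors
    n          : number of observations
    p_full     : number of predictors in the full model (intercept excluded)
    q          : number of coefficients being tested

    Under the null hypothesis the statistic follows an F distribution
    with (q, n - p_full - 1) degrees of freedom.
    """
    numerator = (r2_full - r2_reduced) / q
    denominator = (1.0 - r2_full) / (n - p_full - 1)
    return numerator / denominator


# Illustrative numbers: dropping one of two predictors lowers R^2 from 0.5 to 0.4
f_value = partial_f_statistic(r2_full=0.5, r2_reduced=0.4, n=103, p_full=2, q=1)
```

The global F-test is the special case with `r2_reduced = 0` and `q = p_full`, so the same function covers the global and partial tests mentioned in the abstract.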
Automatic energy expenditure measurement for health science.

PubMed

Catal, Cagatay; Akbulut, Akhan

2018-04-01

It is crucial to predict the human energy expenditure in any sports activity and health science application accurately to investigate the impact of the activity. However, measurement of the real energy expenditure is not a trivial task and involves complex steps. The objective of this work is to improve the performance of existing estimation models of energy expenditure by using machine learning algorithms and data from several different sensors, and to provide this estimation service in a cloud-based platform. In this study, we used input data such as breathing rate and heart rate from three sensors. Inputs are received from a web form and sent to the web service which applies a regression model on Azure cloud platform. During the experiments, we assessed several machine learning models based on regression methods. Our experimental results showed that our novel model which applies Boosted Decision Tree Regression in conjunction with the median aggregation technique provides the best result among other five regression algorithms. This cloud-based energy expenditure system which uses a web service showed that cloud computing technology is a great opportunity to develop estimation systems and the new model which applies Boosted Decision Tree Regression with the median aggregation provides remarkable results. Copyright © 2018 Elsevier B.V. All rights reserved.
Determining factors influencing survival of breast cancer by fuzzy logistic regression model.

PubMed

Nikbakht, Roya; Bahrampour, Abbas

2017-01-01

The fuzzy logistic regression model can be used to determine influential factors of disease. This study explores the important factors predicting survival of patients with breast cancer. We used breast cancer data collected by the cancer registry of Kerman University of Medical Sciences during the period 2000-2007. Variables such as morphology, grade, age, and treatments (surgery, radiotherapy, and chemotherapy) were applied in the fuzzy logistic regression model. Model performance was determined in terms of the mean degree of membership (MDM). The study results showed that almost 41% of patients were in the neoplasm and malignant group and more than two-thirds of them were still alive after 5-year follow-up. Based on the fuzzy logistic model, the most important factors influencing survival were chemotherapy, morphology, and radiotherapy, respectively. Furthermore, the MDM criterion shows that the fuzzy logistic regression model fits the data well (MDM = 0.86). The fuzzy logistic regression model showed that chemotherapy is more important than radiotherapy in the survival of patients with breast cancer. In addition, the model can calculate possibilistic odds of survival in cancer patients. The results of this study can be applied in clinical research. Few studies have applied fuzzy logistic models, and we recommend using this model in various research areas.
Controlling Type I Error Rates in Assessing DIF for Logistic Regression Method Combined with SIBTEST Regression Correction Procedure and DIF-Free-Then-DIF Strategy

ERIC Educational Resources Information Center

Shih, Ching-Lin; Liu, Tien-Hsiang; Wang, Wen-Chung

2014-01-01

The simultaneous item bias test (SIBTEST) method regression procedure and the differential item functioning (DIF)-free-then-DIF strategy are applied to the logistic regression (LR) method simultaneously in this study. These procedures are used to adjust the effects of matching true score on observed score and to better control the Type I error…
Spatial Assessment of Model Errors from Four Regression Techniques

Treesearch

Lianjun Zhang; Jeffrey H. Gove

2005-01-01

Forest modelers have attempted to account for the spatial autocorrelations among trees in growth and yield models by applying alternative regression techniques such as linear mixed models (LMM), generalized additive models (GAM), and geographically weighted regression (GWR). However, the model errors are commonly assessed using average errors across the entire study...
Quantile Regression in the Study of Developmental Sciences

ERIC Educational Resources Information Center

Petscher, Yaacov; Logan, Jessica A. R.

2014-01-01

Linear regression analysis is one of the most common techniques applied in developmental research, but only allows for an estimate of the average relations between the predictor(s) and the outcome. This study describes quantile regression, which provides estimates of the relations between the predictor(s) and outcome, but across multiple points of…
qFeature

DOE Office of Scientific and Technical Information (OSTI.GOV)

2015-09-14

This package contains statistical routines for extracting features from multivariate time-series data which can then be used for subsequent multivariate statistical analysis to identify patterns and anomalous behavior. It calculates local linear or quadratic regression model fits to moving windows for each series and then summarizes the model coefficients across user-defined time intervals for each series. These methods are domain agnostic, but they have been successfully applied to a variety of domains, including commercial aviation and electric power grid data.
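The moving-window idea described above can be sketched in a few lines. This is an illustrative Python sketch under stated assumptions, not the qFeature package's own interface: the function names, the centered window, and the use of a simple mean as the summary are all hypothetical choices.

```python
import numpy as np


def moving_window_linear_fits(y, half_width):
    """Fit y ~ a + b*t by least squares in a centered window at each point.

    Time is centered on the window midpoint, so the intercept is the fitted
    value at that point. Positions without a full window are left as NaN.
    """
    y = np.asarray(y, dtype=float)
    t = np.arange(-half_width, half_width + 1, dtype=float)
    X = np.column_stack([np.ones_like(t), t])
    intercepts = np.full(y.size, np.nan)
    slopes = np.full(y.size, np.nan)
    for i in range(half_width, y.size - half_width):
        window = y[i - half_width:i + half_width + 1]
        coef = np.linalg.lstsq(X, window, rcond=None)[0]
        intercepts[i], slopes[i] = coef
    return intercepts, slopes


def summarize_by_interval(values, interval):
    """Mean of the finite coefficients in consecutive blocks of `interval` points."""
    values = np.asarray(values, dtype=float)
    means = []
    for start in range(0, values.size, interval):
        block = values[start:start + interval]
        finite = block[np.isfinite(block)]
        means.append(finite.mean() if finite.size else np.nan)
    return np.array(means)
```

A quadratic fit would only add a `t**2` column to `X`; the per-interval summaries of intercept and slope then serve as features for later multivariate analysis.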
An efficient probe of the cosmological CPT violation

NASA Astrophysics Data System (ADS)

Zhao, Gong-Bo; Wang, Yuting; Xia, Jun-Qing; Li, Mingzhe; Zhang, Xinmin

2015-07-01

We develop an efficient method based on the linear regression algorithm to probe the cosmological CPT violation using the CMB polarisation data. We validate this method using simulated CMB data and apply it to recent CMB observations. We find that a combined data sample of BICEP1 and BOOMERanG 2003 favours a nonzero isotropic rotation angle at 2.3σ confidence level, i.e., $\bar{\alpha} = -3.3^{\circ} \pm 1.4^{\circ}$ (68% CL) with systematics included.

Median nitrate concentrations in groundwater in the New Jersey Highlands Region estimated using regression models and land-surface characteristics

USGS Publications Warehouse

Baker, Ronald J.; Chepiga, Mary M.; Cauller, Stephen J.

2015-01-01

The Kaplan-Meier method of estimating summary statistics from left-censored data was applied in order to include nondetects (left-censored data) in median nitrate-concentration calculations. Median concentrations also were determined using three alternative methods of handling nondetects. Treatment of the 23 percent of samples that were nondetects had little effect on estimated median nitrate concentrations because method detection limits were mostly less than median values.
Effects of axial compression and rotation angle on torsional mechanical properties of bovine caudal discs.

PubMed

Bezci, Semih E; Klineberg, Eric O; O'Connell, Grace D

2018-01-01

The intervertebral disc is a complex joint that acts to support and transfer large multidirectional loads, including combinations of compression, tension, bending, and torsion. Direct comparison of disc torsion mechanics across studies has been difficult, due to differences in loading protocols. In particular, the lack of information on the combined effect of multiple parameters, including axial compressive preload and rotation angle, makes it difficult to discern whether disc torsion mechanics are sensitive to the variables used in the test protocol. Thus, the objective of this study was to evaluate compression-torsion mechanical behavior of healthy discs under a wide range of rotation angles. Bovine caudal discs were tested under a range of compressive preloads (150, 300, 600, and 900N) and rotation angles (± 1, 2, 3, 4, or 5°) applied at a rate of 0.5°/s. Torque-rotation data were used to characterize shape changes in the hysteresis loop and to calculate disc torsion mechanics. Torsional mechanical properties were described using multivariate regression models. The rate of change in torsional mechanical properties with compression depended on the maximum rotation angle applied, indicating a strong interaction between compressive stress and maximum rotation angle. The regression models reported here can be used to predict disc torsion mechanics under axial compression for a given disc geometry, compressive preload, and rotation angle. Copyright © 2017 Elsevier Ltd. All rights reserved.
Application of Boosting Regression Trees to Preliminary Cost Estimation in Building Construction Projects

PubMed Central

2015-01-01

Among the recent data mining techniques available, the boosting approach has attracted a great deal of attention because of its effective learning algorithm and strong boundaries in terms of its generalization performance. However, the boosting approach has yet to be used in regression problems within the construction domain, including cost estimations, but has been actively utilized in other domains. Therefore, a boosting regression tree (BRT) is applied to cost estimations at the early stage of a construction project to examine the applicability of the boosting approach to a regression problem within the construction domain. To evaluate the performance of the BRT model, its performance was compared with that of a neural network (NN) model, which has been proven to have a high performance in cost estimation domains. The BRT model has shown results similar to those of NN model using 234 actual cost datasets of a building construction project. In addition, the BRT model can provide additional information such as the importance plot and structure model, which can support estimators in comprehending the decision making process. Consequently, the boosting approach has potential applicability in preliminary cost estimations in a building construction project. PMID:26339227
Application of Boosting Regression Trees to Preliminary Cost Estimation in Building Construction Projects.

PubMed

Shin, Yoonseok

2015-01-01

Among the recent data mining techniques available, the boosting approach has attracted a great deal of attention because of its effective learning algorithm and strong boundaries in terms of its generalization performance. However, the boosting approach has yet to be used in regression problems within the construction domain, including cost estimations, but has been actively utilized in other domains. Therefore, a boosting regression tree (BRT) is applied to cost estimations at the early stage of a construction project to examine the applicability of the boosting approach to a regression problem within the construction domain. To evaluate the performance of the BRT model, its performance was compared with that of a neural network (NN) model, which has been proven to have a high performance in cost estimation domains. The BRT model has shown results similar to those of NN model using 234 actual cost datasets of a building construction project. In addition, the BRT model can provide additional information such as the importance plot and structure model, which can support estimators in comprehending the decision making process. Consequently, the boosting approach has potential applicability in preliminary cost estimations in a building construction project.
Advances in data processing for open-path Fourier transform infrared spectrometry of greenhouse gases.

PubMed

Shao, Limin; Griffiths, Peter R; Leytem, April B

2010-10-01

The automated quantification of three greenhouse gases, ammonia, methane, and nitrous oxide, in the vicinity of a large dairy farm by open-path Fourier transform infrared (OP/FT-IR) spectrometry at intervals of 5 min is demonstrated. Spectral pretreatment, including the automated detection and correction of the effect of interruption of the infrared beam by a moving object and the automated correction for the nonlinear detector response, is applied to the measured interferograms. Two ways of obtaining quantitative data from OP/FT-IR data are described. The first, which is installed in a recently acquired commercial OP/FT-IR spectrometer, is based on classical least-squares (CLS) regression, and the second is based on partial least-squares (PLS) regression. It is shown that CLS regression only gives accurate results if the absorption features of the analytes are located in very short spectral intervals where lines due to atmospheric water vapor are absent or very weak; of the three analytes examined, only ammonia fell into this category. On the other hand, PLS regression allowed what appeared to be accurate results to be obtained for all three analytes.
The Influential Effect of Blending, Bump, Changing Period, and Eclipsing Cepheids on the Leavitt Law

NASA Astrophysics Data System (ADS)

García-Varela, A.; Muñoz, J. R.; Sabogal, B. E.; Vargas Domínguez, S.; Martínez, J.

2016-06-01

The investigation of the nonlinearity of the Leavitt law (LL) is a topic that began more than seven decades ago, when some of the studies in this field found that the LL has a break at about 10 days. The goal of this work is to investigate a possible statistical cause of this nonlinearity. By applying linear regressions to OGLE-II and OGLE-IV data, we find that to obtain the LL by using linear regression, robust techniques to deal with influential points and/or outliers are needed instead of the ordinary least-squares regression traditionally used. In particular, by using M- and MM-regressions we establish firmly and without doubt the linearity of the LL in the Large Magellanic Cloud, without rejecting or excluding Cepheid data from the analysis. This implies that light curves of Cepheids suggesting blending, bumps, eclipses, or period changes do not affect the LL for this galaxy. For the Small Magellanic Cloud, when including Cepheids of this kind, it is not possible to find an adequate model, probably because of the geometry of the galaxy. In that case, a possible influence of these stars could exist.
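The contrast between ordinary least squares and a robust fit can be illustrated on synthetic data. The sketch below is an assumption-laden illustration, not the authors' analysis: it uses a Huber M-estimator fitted by iteratively reweighted least squares (the abstract's MM-regression is not implemented), made-up data rather than OGLE photometry, and hypothetical function names.

```python
import numpy as np


def ols_fit(X, y):
    """Ordinary least-squares coefficients for y ~ X."""
    return np.linalg.lstsq(X, y, rcond=None)[0]


def huber_fit(X, y, c=1.345, max_iter=100, tol=1e-8):
    """Huber M-estimate via iteratively reweighted least squares (IRLS)."""
    beta = ols_fit(X, y)
    for _ in range(max_iter):
        resid = y - X @ beta
        # Robust residual scale from the median absolute deviation.
        scale = np.median(np.abs(resid - np.median(resid))) / 0.6745
        if scale == 0.0:
            break
        u = np.abs(resid) / scale
        # Huber weights: 1 for small residuals, c/|u| for large ones.
        w = np.minimum(1.0, c / np.maximum(u, 1e-12))
        sw = np.sqrt(w)
        new_beta = ols_fit(X * sw[:, None], y * sw)
        converged = np.max(np.abs(new_beta - beta)) < tol
        beta = new_beta
        if converged:
            break
    return beta


def demo(seed=0):
    """Synthetic line y = 1 + 2x with three shifted points at the high-x end."""
    rng = np.random.default_rng(seed)
    x = np.linspace(0.0, 1.0, 30)
    y = 1.0 + 2.0 * x + rng.normal(0.0, 0.1, x.size)
    y[-3:] += 10.0  # influential outliers
    X = np.column_stack([np.ones_like(x), x])
    return ols_fit(X, y), huber_fit(X, y)
```

In this toy setting the outliers are kept in the data and simply downweighted, which mirrors the abstract's point that robust fitting avoids rejecting or excluding observations.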
Evaluation and application of regional turbidity-sediment regression models in Virginia

USGS Publications Warehouse

Hyer, Kenneth; Jastram, John D.; Moyer, Douglas; Webber, James S.; Chanat, Jeffrey G.

2015-01-01

Conventional thinking has long held that turbidity-sediment surrogate-regression equations are site specific and that regression equations developed at a single monitoring station should not be applied to another station; however, few studies have evaluated this issue in a rigorous manner. If robust regional turbidity-sediment models can be developed successfully, their applications could greatly expand the usage of these methods. Suspended sediment load estimation could occur as soon as flow and turbidity monitoring commence at a site, suspended sediment sampling frequencies for various projects potentially could be reduced, and special-project applications (sediment monitoring following dam removal, for example) could be significantly enhanced. The objective of this effort was to investigate the turbidity-suspended sediment concentration (SSC) relations at all available USGS monitoring sites within Virginia to determine whether meaningful turbidity-sediment regression models can be developed by combining the data from multiple monitoring stations into a single model, known as a “regional” model. Following the development of the regional model, additional objectives included a comparison of predicted SSCs between the regional model and commonly used site-specific models, as well as an evaluation of why specific monitoring stations did not fit the regional model.
Regression mixture models: Does modeling the covariance between independent variables and latent classes improve the results?

PubMed Central

Lamont, Andrea E.; Vermunt, Jeroen K.; Van Horn, M. Lee

2016-01-01

Regression mixture models are increasingly used as an exploratory approach to identify heterogeneity in the effects of a predictor on an outcome. In this simulation study, we test the effects of violating an implicit assumption often made in these models – i.e., independent variables in the model are not directly related to latent classes. Results indicated that the major risk of failing to model the relationship between predictor and latent class was an increase in the probability of selecting additional latent classes and biased class proportions. Additionally, this study tests whether regression mixture models can detect a piecewise relationship between a predictor and outcome. Results suggest that these models are able to detect piecewise relations, but only when the relationship between the latent class and the predictor is included in model estimation. We illustrate the implications of making this assumption through a re-analysis of applied data examining heterogeneity in the effects of family resources on academic achievement. We compare previous results (which assumed no relation between independent variables and latent class) to the model where this assumption is lifted. Implications and analytic suggestions for conducting regression mixture based on these findings are noted. PMID:26881956
Bayesian quantile regression-based partially linear mixed-effects joint models for longitudinal data with multiple features.

PubMed

Zhang, Hanze; Huang, Yangxin; Wang, Wei; Chen, Henian; Langland-Orban, Barbara

2017-01-01

In longitudinal AIDS studies, it is of interest to investigate the relationship between HIV viral load and CD4 cell counts, as well as the complicated time effect. Most common models for analyzing such complex longitudinal data are based on mean regression, which fails to provide efficient estimates in the presence of outliers and/or heavy tails. Quantile regression-based partially linear mixed-effects models, a special case of semiparametric models enjoying the benefits of both parametric and nonparametric models, have the flexibility to monitor the viral dynamics nonparametrically and detect the varying CD4 effects parametrically at different quantiles of viral load. Meanwhile, it is critical to consider various data features of repeated measurements, including left-censoring due to a limit of detection, covariate measurement error, and asymmetric distribution. In this research, we first establish Bayesian joint models that account for all these data features simultaneously in the framework of quantile regression-based partially linear mixed-effects models. The proposed models are applied to analyze the Multicenter AIDS Cohort Study (MACS) data. Simulation studies are also conducted to assess the performance of the proposed methods under different scenarios.
Applying Regression Analysis to Problems in Institutional Research.

ERIC Educational Resources Information Center

Bohannon, Tom R.

1988-01-01

Regression analysis is one of the most frequently used statistical techniques in institutional research. Principles of least squares, model building, residual analysis, influence statistics, and multi-collinearity are described and illustrated. (Author/MSE)
Local polynomial estimation of heteroscedasticity in a multivariate linear regression model and its applications in economics.

PubMed

Su, Liyun; Zhao, Yanyong; Yan, Tianshun; Li, Fenglan

2012-01-01

Multivariate local polynomial fitting is applied to the multivariate linear heteroscedastic regression model. First, local polynomial fitting is applied to estimate the heteroscedastic function; then the coefficients of the regression model are obtained by the generalized least squares method. One noteworthy feature of our approach is that we avoid testing for heteroscedasticity by improving the traditional two-stage method. Because local polynomial estimation is a nonparametric technique, it is unnecessary to know the form of the heteroscedastic function. Therefore, we can improve the estimation precision when the heteroscedastic function is unknown. Furthermore, we verify that the regression coefficients are asymptotically normal based on numerical simulations and normal Q-Q plots of residuals. Finally, the simulation results and the local polynomial estimation of real data indicate that our approach is effective in finite-sample situations.
Gridded sunshine duration climate data record for Germany based on combined satellite and in situ observations

NASA Astrophysics Data System (ADS)

Walawender, Jakub; Kothe, Steffen; Trentmann, Jörg; Pfeifroth, Uwe; Cremer, Roswitha

2017-04-01

The purpose of this study is to create a 1 km² gridded daily sunshine duration data record for Germany covering the period from 1983 to 2015 (33 years), based on satellite estimates of direct normalised surface solar radiation and in situ sunshine duration observations, using a geostatistical approach. The CM SAF SARAH direct normalized irradiance (DNI) satellite climate data record and in situ observations of sunshine duration from 121 weather stations operated by DWD are used as input datasets. The selected period of 33 years is determined by the availability of satellite data. The number of ground stations is limited to 121 because only time series with less than 10% missing observations over the selected period are included, to keep the long-term consistency of the output sunshine duration data record. In the first step, the DNI data record is used to derive sunshine hours by applying the WMO threshold of 120 W/m² (SDU = DNI ≥ 120 W/m²) and weighting of sunny slots to correct the sunshine length between two instantaneous images for cloud movement. In the second step, a linear regression between SDU and in situ sunshine duration is calculated to adjust the satellite product to the ground observations, and the output regression coefficients are applied to create a regression grid. In the last step, regression residuals are interpolated with ordinary kriging and added to the regression grid. A comprehensive accuracy assessment of the gridded sunshine duration data record is performed by calculating prediction errors (cross-validation routine). "R" is used for data processing. A short analysis of the spatial distribution and temporal variability of sunshine duration over Germany based on the created dataset will be presented. The gridded sunshine duration data are useful for applications in various climate-related studies, agriculture and solar energy potential calculations.
How is the weather? Forecasting inpatient glycemic control

PubMed Central

Saulnier, George E; Castro, Janna C; Cook, Curtiss B; Thompson, Bithika M

2017-01-01

Aim: Apply methods of damped trend analysis to forecast inpatient glycemic control. Method: Observed and calculated point-of-care blood glucose data trends were determined over 62 weeks. Mean absolute percent error was used to calculate differences between observed and forecasted values. Comparisons were drawn between model results and linear regression forecasting. Results: The forecasted mean glucose trends observed during the first 24 and 48 weeks of projections compared favorably to the results provided by linear regression forecasting. However, in some scenarios, the damped trend method changed inferences compared with linear regression. In all scenarios, mean absolute percent error values remained below the 10% accepted by demand industries. Conclusion: Results indicate that forecasting methods historically applied within demand industries can project future inpatient glycemic control. Additional study is needed to determine if forecasting is useful in the analyses of other glucometric parameters and, if so, how to apply the techniques to quality improvement. PMID:29134125
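The two quantities named in the abstract can be sketched briefly. This is a minimal illustration under assumptions: the mean absolute percent error follows its standard definition, while the forecast function uses the common additive damped-trend form (level plus a geometrically damped sum of the trend), which may differ from the exact model the authors fitted. The function names and all numbers are hypothetical.

```python
def mean_absolute_percent_error(observed, forecast):
    """MAPE in percent; observed values must be nonzero."""
    if len(observed) != len(forecast) or len(observed) == 0:
        raise ValueError("inputs must be non-empty and of equal length")
    total = sum(abs((o - f) / o) for o, f in zip(observed, forecast))
    return 100.0 * total / len(observed)


def damped_trend_forecast(level, trend, phi, horizon):
    """h-step-ahead forecast: level + (phi + phi**2 + ... + phi**h) * trend.

    phi in (0, 1) damps the trend so long-range forecasts flatten out;
    phi = 1 would reduce to an undamped linear trend.
    """
    damping = sum(phi ** j for j in range(1, horizon + 1))
    return level + damping * trend
```

For example, with a last level of 10, a trend of 2 per period, and phi = 0.5, the two-step-ahead forecast is 10 + (0.5 + 0.25) × 2 = 11.5, whereas an undamped linear trend would give 14.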
Comparison of regression coefficient and GIS-based methodologies for regional estimates of forest soil carbon stocks.

PubMed

Campbell, J Elliott; Moen, Jeremie C; Ney, Richard A; Schnoor, Jerald L

2008-03-01

Estimates of forest soil organic carbon (SOC) have applications in carbon science, soil quality studies, carbon sequestration technologies, and carbon trading. Forest SOC has been modeled using a regression coefficient methodology that applies mean SOC densities (mass/area) to broad forest regions. A higher resolution model is based on an approach that employs a geographic information system (GIS) with soil databases and satellite-derived landcover images. Despite this advancement, the regression approach remains the basis of current state and federal level greenhouse gas inventories. Both approaches are analyzed in detail for Wisconsin forest soils from 1983 to 2001, applying rigorous error-fixing algorithms to soil databases. Resulting SOC stock estimates are 20% larger when determined using the GIS method rather than the regression approach. Average annual rates of increase in SOC stocks are 3.6 and 1.0 million metric tons of carbon per year for the GIS and regression approaches respectively.
Quantile regression applied to spectral distance decay

USGS Publications Warehouse

Rocchini, D.; Cade, B.S.

2008-01-01

Remotely sensed imagery has long been recognized as a powerful support for characterizing and estimating biodiversity. Spectral distance among sites has proven to be a powerful approach for detecting species composition variability. Regression analysis of species similarity versus spectral distance allows us to quantitatively estimate the amount of turnover in species composition with respect to spectral and ecological variability. In classical regression analysis, the residual sum of squares is minimized for the mean of the dependent variable distribution. However, many ecological data sets are characterized by a high number of zeroes that add noise to the regression model. Quantile regressions can be used to evaluate trend in the upper quantiles rather than a mean trend across the whole distribution of the dependent variable. In this letter, we used ordinary least squares (OLS) and quantile regressions to estimate the decay of species similarity versus spectral distance. The achieved decay rates were statistically nonzero (p < 0.01), considering both OLS and quantile regressions. Nonetheless, the OLS regression estimate of the mean decay rate was only half the decay rate indicated by the upper quantiles. Moreover, the intercept value, representing the similarity reached when the spectral distance approaches zero, was very low compared with the intercepts of the upper quantiles, which detected high species similarity when habitats are more similar. In this letter, we demonstrated the power of using quantile regressions applied to spectral distance decay to reveal species diversity patterns otherwise lost or underestimated by OLS regression. © 2008 IEEE.
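The difference between fitting a mean and fitting a quantile comes down to the loss function. The sketch below is an illustration only, not the authors' analysis: it evaluates the standard check (pinball) loss for a constant fit and shows that its minimizer is a sample quantile rather than the mean. The function names and data are hypothetical.

```python
import numpy as np


def check_loss(residuals, tau):
    """Quantile-regression check loss: sum of r * (tau - 1[r < 0])."""
    r = np.asarray(residuals, dtype=float)
    return float(np.sum(r * (tau - (r < 0))))


def best_constant(y, tau):
    """Observed value that minimizes the check loss when used as a constant fit."""
    y = np.asarray(y, dtype=float)
    losses = [check_loss(y - c, tau) for c in y]
    return float(y[int(np.argmin(losses))])
```

With tau = 0.5 the loss is proportional to the absolute error, so a single extreme value or a large block of zeroes shifts the fit far less than it shifts a least-squares mean; larger tau values track the upper part of the distribution instead.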
Imaging-based biomarkers of cognitive performance in older adults constructed via high-dimensional pattern regression applied to MRI and PET.

PubMed

Wang, Ying; Goh, Joshua O; Resnick, Susan M; Davatzikos, Christos

2013-01-01

In this study, we used high-dimensional pattern regression methods based on structural (gray and white matter; GM and WM) and functional (positron emission tomography of regional cerebral blood flow; PET) brain data to identify cross-sectional imaging biomarkers of cognitive performance in cognitively normal older adults from the Baltimore Longitudinal Study of Aging (BLSA). We focused on specific components of executive and memory domains known to decline with aging, including manipulation, semantic retrieval, long-term memory (LTM), and short-term memory (STM). For each imaging modality, brain regions associated with each cognitive domain were generated by adaptive regional clustering. A relevance vector machine was adopted to model the nonlinear continuous relationship between brain regions and cognitive performance, with cross-validation to select the most informative brain regions (using recursive feature elimination) as imaging biomarkers and optimize model parameters. Predicted cognitive scores using our regression algorithm based on the resulting brain regions correlated well with actual performance. Also, regression models obtained using combined GM, WM, and PET imaging modalities outperformed models based on single modalities. Imaging biomarkers related to memory performance included the orbito-frontal and medial temporal cortical regions with LTM showing stronger correlation with the temporal lobe than STM. Brain regions predicting executive performance included orbito-frontal, and occipito-temporal areas. The PET modality had higher contribution to most cognitive domains except manipulation, which had higher WM contribution from the superior longitudinal fasciculus and the genu of the corpus callosum. These findings based on machine-learning methods demonstrate the importance of combining structural and functional imaging data in understanding complex cognitive mechanisms and also their potential usage as biomarkers that predict cognitive status.
Methods for estimating selected low-flow frequency statistics for unregulated streams in Kentucky

USGS Publications Warehouse

Martin, Gary R.; Arihood, Leslie D.

2010-01-01

This report provides estimates of, and presents methods for estimating, selected low-flow frequency statistics for unregulated streams in Kentucky including the 30-day mean low flows for recurrence intervals of 2 and 5 years (30Q2 and 30Q5) and the 7-day mean low flows for recurrence intervals of 2, 10, and 20 years (7Q2, 7Q10, and 7Q20). Estimates of these statistics are provided for 121 U.S. Geological Survey streamflow-gaging stations with data through the 2006 climate year, which is the 12-month period ending March 31 of each year. Data were screened to identify the periods of homogeneous, unregulated flows for use in the analyses. Logistic-regression equations are presented for estimating the annual probability of the selected low-flow frequency statistics being equal to zero. Weighted-least-squares regression equations were developed for estimating the magnitude of the nonzero 30Q2, 30Q5, 7Q2, 7Q10, and 7Q20 low flows. Three low-flow regions were defined for estimating the 7-day low-flow frequency statistics. The explicit explanatory variables in the regression equations include total drainage area and the mapped streamflow-variability index measured from a revised statewide coverage of this characteristic. The percentage of the station low-flow statistics correctly classified as zero or nonzero by use of the logistic-regression equations ranged from 87.5 to 93.8 percent. The average standard errors of prediction of the weighted-least-squares regression equations ranged from 108 to 226 percent. The 30Q2 regression equations have the smallest standard errors of prediction, and the 7Q20 regression equations have the largest standard errors of prediction. The regression equations are applicable only to stream sites with low flows unaffected by regulation from reservoirs and local diversions of flow and to drainage basins in specified ranges of basin characteristics. Caution is advised when applying the equations for basins with characteristics near the applicable limits and for basins with karst drainage features.
Adolescent judgments and reasoning about the failure to include peers with social disabilities.

PubMed

Bottema-Beutel, Kristen; Li, Zhushan

2015-06-01

Adolescents with autism spectrum disorder often do not have access to crucial peer social activities. This study examines how typically developing adolescents evaluate decisions not to include a peer based on disability status, and the justifications they apply to these decisions. A clinical interview methodology was used to elicit judgments and justifications across four contexts. We found adolescents are more likely to judge the failure to include as acceptable in personal as compared to public contexts. Using logistic regression, we found that adolescents are more likely to provide moral justifications as to why failure to include is acceptable in a classroom as compared to home, lab group, and soccer practice contexts. Implications for intervention are also discussed.
Utility-Based Instruments for People with Dementia: A Systematic Review and Meta-Regression Analysis.

PubMed

Li, Li; Nguyen, Kim-Huong; Comans, Tracy; Scuffham, Paul

2018-04-01

Several utility-based instruments have been applied in cost-utility analysis to assess health state values for people with dementia. Nevertheless, concerns and uncertainty regarding their performance for people with dementia have been raised. To assess the performance of available utility-based instruments for people with dementia by comparing their psychometric properties and to explore factors that cause variations in the reported health state values generated from those instruments by conducting meta-regression analyses. A literature search was conducted and psychometric properties were synthesized to demonstrate the overall performance of each instrument. When available, health state values and variables such as the type of instrument and cognitive impairment levels were extracted from each article. A meta-regression analysis was undertaken and available covariates were included in the models. A total of 64 studies providing preference-based values were identified and included. The EuroQol five-dimension questionnaire demonstrated the best combination of feasibility, reliability, and validity. Meta-regression analyses suggested that significant differences exist between instruments, type of respondents, and mode of administration and the variations in estimated utility values had influences on incremental quality-adjusted life-year calculation. This review finds that the EuroQol five-dimension questionnaire is the most valid utility-based instrument for people with dementia, but should be replaced by others under certain circumstances. Although no utility estimates were reported in the article, the meta-regression analyses that examined variations in utility estimates produced by different instruments impact on cost-utility analysis, potentially altering the decision-making process in some circumstances. Copyright © 2018 International Society for Pharmacoeconomics and Outcomes Research (ISPOR). Published by Elsevier Inc. All rights reserved.
In vitro chemo-sensitivity assay guided chemotherapy is associated with prolonged overall survival in cancer patients.

PubMed

Udelnow, Andrej; Schönfęlder, Manfred; Würl, Peter; Halloul, Zuhir; Meyer, Frank; Lippert, Hans; Mroczkowski, Paweł

2013-06-01

The overall survival (OS) of patients suffering from various tumour entities was correlated with the results of the in vitro chemosensitivity assay (CSA) of the in vivo applied drugs. Tumour specimens (n=611) were dissected in 514 patients and incubated for primary tumour cell culture. The histocytological regression assay was performed 5 days after adding chemotherapeutic substances to the cell cultures. A total of 329 patients undergoing chemotherapy were included in the in vitro/in vivo associations. OS was assessed and in vitro response groups compared using survival analysis. Furthermore, Cox regression analysis was performed on OS including CSA, age, TNM classification and treatment course. The growth rate of the primary was 73-96% depending on tumour entity. The in vitro response rate varied with histology and drugs (e.g. 8-18% for methotrexate and 33-83% for epirubicin). OS was significantly prolonged for patients treated with in vitro effective drugs compared to empiric therapy (log-rank test, p=0.0435). Cox regression revealed that application of in vitro effective drugs, residual tumour and postoperative radiotherapy determined the death risk independently. When patients were treated with drugs effective in our CSA, OS was significantly prolonged compared to empiric therapy. CSA-guided chemotherapy should be compared to empiric treatment in a prospective randomized trial.

Deriving the Regression Equation without Using Calculus

ERIC Educational Resources Information Center

Gordon, Sheldon P.; Gordon, Florence S.

2004-01-01

Probably the one "new" mathematical topic that is most responsible for modernizing courses in college algebra and precalculus over the last few years is the idea of fitting a function to a set of data in the sense of a least squares fit. Whether it be simple linear regression or nonlinear regression, this topic opens the door to applying the…
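
The least squares fit this abstract refers to can be illustrated with the standard closed-form solution for simple linear regression. This is a minimal sketch, not the derivation given in the article; the function name `least_squares_line` and the sample data are hypothetical.

```python
def least_squares_line(xs, ys):
    """Return (slope, intercept) of the least squares line through the points."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sum of cross-deviations and sum of squared x-deviations.
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


# Hypothetical data lying exactly on y = 2x + 1.
slope, intercept = least_squares_line([1, 2, 3, 4], [3, 5, 7, 9])
```

The fitted line always passes through the point of means, which is why the intercept follows directly from the slope.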
The Nuisance of Nuisance Regression: Spectral Misspecification in a Common Approach to Resting-State fMRI Preprocessing Reintroduces Noise and Obscures Functional Connectivity

PubMed Central

Hallquist, Michael N.; Hwang, Kai; Luna, Beatriz

2013-01-01

Recent resting-state functional connectivity fMRI (RS-fcMRI) research has demonstrated that head motion during fMRI acquisition systematically influences connectivity estimates despite bandpass filtering and nuisance regression, which are intended to reduce such nuisance variability. We provide evidence that the effects of head motion and other nuisance signals are poorly controlled when the fMRI time series are bandpass-filtered but the regressors are unfiltered, resulting in the inadvertent reintroduction of nuisance-related variation into frequencies previously suppressed by the bandpass filter, as well as suboptimal correction for noise signals in the frequencies of interest. This is important because many RS-fcMRI studies, including some focusing on motion-related artifacts, have applied this approach. In two cohorts of individuals (n = 117 and 22) who completed resting-state fMRI scans, we found that the bandpass-regress approach consistently overestimated functional connectivity across the brain, typically on the order of r = .10 – .35, relative to a simultaneous bandpass filtering and nuisance regression approach. Inflated correlations under the bandpass-regress approach were associated with head motion and cardiac artifacts. Furthermore, distance-related differences in the association of head motion and connectivity estimates were much weaker for the simultaneous filtering approach. We recommend that future RS-fcMRI studies ensure that the frequencies of nuisance regressors and fMRI data match prior to nuisance regression, and we advocate a simultaneous bandpass filtering and nuisance regression strategy that better controls nuisance-related variability. PMID:23747457
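
The frequency mismatch described in this abstract can be reproduced on a toy signal. The sketch below is an illustration under strong assumptions, not the authors' pipeline: it uses an ideal DFT-based bandpass filter, a single nuisance regressor, zero-mean synthetic sinusoids, and hypothetical function names. It compares regressing filtered data on an unfiltered regressor with regressing filtered data on an identically filtered regressor.

```python
import cmath
import math


def dft(x):
    n = len(x)
    return [sum(x[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n))
            for k in range(n)]


def bandpass(x, lo, hi):
    """Ideal bandpass: keep DFT bins whose frequency index is in [lo, hi]."""
    n = len(x)
    spectrum = dft(x)
    kept = [spectrum[k] if lo <= min(k, n - k) <= hi else 0 for k in range(n)]
    return [(sum(kept[k] * cmath.exp(2j * math.pi * k * t / n)
                 for k in range(n)) / n).real for t in range(n)]


def residualize(y, x):
    """Residuals of y after least squares regression on one zero-mean regressor."""
    beta = sum(a * b for a, b in zip(x, y)) / sum(a * a for a in x)
    return [b - beta * a for a, b in zip(x, y)]


def out_of_band_power(x, lo, hi):
    n = len(x)
    spectrum = dft(x)
    return sum(abs(spectrum[k]) ** 2 for k in range(n)
               if not lo <= min(k, n - k) <= hi)


n, lo, hi = 64, 3, 8
signal = [math.sin(2 * math.pi * 5 * t / n) for t in range(n)]
# Nuisance regressor with one in-band (6) and one out-of-band (20) component.
nuisance = [math.sin(2 * math.pi * 6 * t / n) + math.sin(2 * math.pi * 20 * t / n)
            for t in range(n)]
data = [s + 0.8 * v for s, v in zip(signal, nuisance)]

filtered = bandpass(data, lo, hi)
# Filtered data, unfiltered regressor: out-of-band variation is reintroduced.
mismatched = residualize(filtered, nuisance)
# Same filter applied to data and regressor: frequencies match.
matched = residualize(filtered, bandpass(nuisance, lo, hi))

power_mismatched = out_of_band_power(mismatched, lo, hi)
power_matched = out_of_band_power(matched, lo, hi)
```

In this toy case the mismatched residual carries power at the out-of-band frequency that the filter had already removed, while the matched residual does not.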
Numerical simulations on unsteady operation processes of N2O/HTPB hybrid rocket motor with/without diaphragm

NASA Astrophysics Data System (ADS)

Zhang, Shuai; Hu, Fan; Wang, Donghui; Okolo. N, Patrick; Zhang, Weihua

2017-07-01

Most previous numerical simulations of processes within a hybrid rocket motor focused on steady-state analysis. Solid fuel regression rate strongly depends on complicated physicochemical processes and internal fluid dynamic behavior within the rocket motor, which change with both space and time during its operation and are therefore unsteady in character. In this paper, numerical simulations of the unsteady operational processes of an N2O/HTPB hybrid rocket motor with and without a diaphragm are conducted. A numerical model is established based on two-dimensional axisymmetric unsteady Navier-Stokes equations with turbulence, combustion and coupled gas/solid phase formulations. A discrete phase model is used to simulate injection and vaporization of the liquid oxidizer. A dynamic mesh technique is applied to the non-uniform regression of the fuel grain, and results for the unsteady flow field, variation of regression rate distribution with time, regression process of the burning surface and internal ballistics are obtained. Due to the presence of eddy flow, the diaphragm increases the regression rate further downstream. Peak regression rates are observed close to flow reattachment regions; these peak values decrease gradually and the peak positions shift further downstream as time advances. Motor performance is analyzed accordingly, and the case with the diaphragm shows increases in combustion efficiency and specific impulse efficiency of roughly 10%, and a ground thrust increase of 17.8%.
Evaluation of regression-based 3-D shoulder rhythms.

PubMed

Xu, Xu; Dickerson, Clark R; Lin, Jia-Hua; McGorry, Raymond W

2016-08-01

The movements of the humerus, the clavicle, and the scapula are not completely independent. The coupled pattern of movement of these bones is called the shoulder rhythm. To date, multiple studies have focused on providing regression-based 3-D shoulder rhythms, in which the orientations of the clavicle and the scapula are estimated by the orientation of the humerus. In this study, six existing regression-based shoulder rhythms were evaluated by an independent dataset in terms of their predictability. The dataset includes the measured orientations of the humerus, the clavicle, and the scapula of 14 participants over 118 different upper arm postures. The predicted orientations of the clavicle and the scapula were derived from applying those regression-based shoulder rhythms to the humerus orientation. The results indicated that none of those regression-based shoulder rhythms provides consistently more accurate results than the others. For all the joint angles and all the shoulder rhythms, the RMSEs are all greater than 5°. Among those shoulder rhythms, the scapula lateral/medial rotation has the strongest correlation between the predicted and the measured angles, while the other thoracoclavicular and thoracoscapular bone orientation angles only showed a weak to moderate correlation. Since the regression-based shoulder rhythm has been adopted for shoulder biomechanical models to estimate shoulder muscle activities and structure loads, there needs to be further investigation on how the prediction error from the shoulder rhythm affects the output of the biomechanical model. Copyright © 2015 The Authors. Published by Elsevier Ltd. All rights reserved.
Nonparametric rank regression for analyzing water quality concentration data with multiple detection limits.

PubMed

Fu, Liya; Wang, You-Gan

2011-02-15

Environmental data usually include measurements, such as water quality data, which fall below detection limits because of limitations of the instruments or of certain analytical methods used. The fact that some responses are not detected needs to be properly taken into account in statistical analysis of such data. However, it is well known that it is challenging to analyze a data set with detection limits, and we often have to rely on the traditional parametric methods or simple imputation methods. Distributional assumptions can lead to biased inference, and justification of distributions is often not possible when the data are correlated and there is a large proportion of data below detection limits. The extent of bias is usually unknown. To draw valid conclusions and hence provide useful advice for environmental management authorities, it is essential to develop and apply an appropriate statistical methodology. This paper proposes rank-based procedures for analyzing non-normally distributed data collected at different sites over a period of time in the presence of multiple detection limits. To take account of temporal correlations within each site, we propose an optimal linear combination of estimating functions and apply the induced smoothing method to reduce the computational burden. Finally, we apply the proposed method to the water quality data collected in the Susquehanna River Basin in the United States of America, which clearly demonstrates the advantages of the rank regression models.
Optimizing methods for linking cinematic features to fMRI data.

PubMed

Kauttonen, Janne; Hlushchuk, Yevhen; Tikka, Pia

2015-04-15

One of the challenges of naturalistic neurosciences using movie-viewing experiments is how to interpret observed brain activations in relation to the multiplicity of time-locked stimulus features. As previous studies have shown less inter-subject synchronization across viewers of random video footage than story-driven films, new methods need to be developed for analysis of less story-driven contents. To optimize the linkage between our fMRI data collected during viewing of a deliberately non-narrative silent film 'At Land' by Maya Deren (1944) and its annotated content, we combined the method of elastic-net regularization with the model-driven linear regression and the well-established data-driven independent component analysis (ICA) and inter-subject correlation (ISC) methods. In the linear regression analysis, both IC and region-of-interest (ROI) time-series were fitted with time-series of a total of 36 binary-valued and one real-valued tactile annotation of film features. The elastic-net regularization and cross-validation were applied in the ordinary least-squares linear regression in order to avoid over-fitting due to the multicollinearity of regressors; the results were compared against both the partial least-squares (PLS) regression and the un-regularized full-model regression. A non-parametric permutation testing scheme was applied to evaluate the statistical significance of regression. We found statistically significant correlation between the annotation model and 9 ICs out of 40 ICs. Regression analysis was also repeated for a large set of cubic ROIs covering the grey matter. Both IC- and ROI-based regression analyses revealed activations in parietal and occipital regions, with additional smaller clusters in the frontal lobe. Furthermore, we found elastic-net based regression more sensitive than PLS and un-regularized regression since it detected a larger number of significant ICs and ROIs. Along with the ISC ranking methods, our regression analysis proved a feasible method for ordering the ICs based on their functional relevance to the annotated cinematic features. The novelty of our method, in comparison to the hypothesis-driven manual pre-selection and observation of some individual regressors biased by choice, lies in applying a data-driven approach to all content features simultaneously. We found the combination of regularized regression and ICA especially useful when analyzing fMRI data obtained using a non-narrative movie stimulus with a large set of complex and correlated features. Copyright © 2015. Published by Elsevier Inc.
Exploration of walking behavior in Vermont using spatial regression.

DOT National Transportation Integrated Search

2015-06-01

This report focuses on the relationship between walking and its contributing factors by applying spatial regression methods. Using the Vermont data from the New England Transportation Survey (NETS), walking variables as well as 170 independent va...
The extension of total gain (TG) statistic in survival models: properties and applications.

PubMed

Choodari-Oskooei, Babak; Royston, Patrick; Parmar, Mahesh K B

2015-07-01

The results of multivariable regression models are usually summarized in the form of parameter estimates for the covariates, goodness-of-fit statistics, and the relevant p-values. These statistics do not inform us about whether covariate information will lead to any substantial improvement in prediction. Predictive ability measures can be used for this purpose since they provide important information about the practical significance of prognostic factors. R²-type indices are the most familiar forms of such measures in survival models, but they all have limitations and none is widely used. In this paper, we extend the total gain (TG) measure, proposed for a logistic regression model, to survival models and explore its properties using simulations and real data. TG is based on the binary regression quantile plot, otherwise known as the predictiveness curve. Standardised TG ranges from 0 (no explanatory power) to 1 ('perfect' explanatory power). The results of our simulations show that unlike many of the other R²-type predictive ability measures, TG is independent of random censoring. It increases as the effect of a covariate increases and can be applied to different types of survival models, including models with time-dependent covariate effects. We also apply TG to quantify the predictive ability of multivariable prognostic models developed in several disease areas. Overall, TG performs well in our simulation studies and can be recommended as a measure to quantify the predictive ability in survival models.
Logistic regression for risk factor modelling in stuttering research.

PubMed

Reed, Phil; Wu, Yaqionq

2013-06-01

To outline the uses of logistic regression and other statistical methods for risk factor analysis in the context of research on stuttering. The principles underlying the application of a logistic regression are illustrated, and the types of questions to which such a technique has been applied in the stuttering field are outlined. The assumptions and limitations of the technique are discussed with respect to existing stuttering research, and with respect to formulating appropriate research strategies to accommodate these considerations. Finally, some alternatives to the approach are briefly discussed. The way the statistical procedures are employed is demonstrated with some hypothetical data. Research into several practical issues concerning stuttering could benefit if risk factor modelling were used. Important examples are early diagnosis, prognosis (whether a child will recover or persist) and assessment of treatment outcome. After reading this article you will be able to: (a) summarize the situations in which logistic regression can be applied to a range of issues about stuttering; (b) follow the steps in performing a logistic regression analysis; (c) describe the assumptions of the logistic regression technique and the precautions that need to be checked when it is employed; (d) summarize its advantages over other techniques like estimation of group differences and simple regression. Copyright © 2012 Elsevier Inc. All rights reserved.
Regression analysis for LED color detection of visual-MIMO system

NASA Astrophysics Data System (ADS)

Banik, Partha Pratim; Saha, Rappy; Kim, Ki-Doo

2018-04-01

Color detection from a light emitting diode (LED) array using a smartphone camera is very difficult in a visual multiple-input multiple-output (visual-MIMO) system. In this paper, we propose a method to determine the LED color using a smartphone camera by applying regression analysis. We employ a multivariate regression model to identify the LED color. After taking a picture of an LED array, we select the LED array region, and detect the LED using an image processing algorithm. We then apply the k-means clustering algorithm to determine the number of potential colors for feature extraction of each LED. Finally, we apply the multivariate regression model to predict the color of the transmitted LEDs. In this paper, we show our results for three types of environmental light condition: room environmental light, low environmental light (560 lux), and strong environmental light (2450 lux). We compare the results of our proposed algorithm in terms of training and test R-square (%) values and the percentage of closeness between transmitted and predicted colors, and we also report the number of distorted test data points from an analysis of the distortion bar graph in CIE1931 color space.
LASIK and PRK in hyperopic astigmatic eyes: is early retreatment advisable?

PubMed

Frings, Andreas; Richard, Gisbert; Steinberg, Johannes; Druchkiv, Vasyl; Linke, Stephan Johannes; Katz, Toam

2016-01-01

To analyze the refractive and keratometric stability in hyperopic astigmatic laser in situ keratomileusis (LASIK) or photorefractive keratectomy (PRK) during the first 6 months after surgery. This retrospective cross-sectional study included 97 hyperopic eyes; 55 were treated with LASIK and 42 with PRK. Excimer ablation for all eyes was performed using the ALLEGRETTO excimer laser platform, with mitomycin C for PRK and a mechanical microkeratome for LASIK. Keratometric and refractive data were analyzed during three consecutive follow-up intervals (6 weeks, 3 months, and 6 months). The corneal topography was obtained using Scheimpflug topography, and subjective refractions were acquired by expert optometrists according to a standardized protocol. After 3 months, mean keratometry and spherical equivalent were stable after LASIK, whereas PRK-treated eyes presented statistically significant (P<0.001) regression of hyperopia. In eleven cases, hyperopic regression of >1 D occurred. The optical zone diameter did not correlate with the development of regression. After corneal laser refractive surgery, keratometric changes are followed by refractive changes, and they occur up to 6 months after LASIK and for at least 6 months after PRK. Caution should therefore be applied when retreatment is planned during the 1st year after surgery, because hyperopic refractive regression can lead to suboptimal visual outcome. Keratometric and refractive stability is achieved earlier after LASIK, and therefore retreatment may be independent of late regression.
Multiple Imputation of a Randomly Censored Covariate Improves Logistic Regression Analysis.

PubMed

Atem, Folefac D; Qian, Jing; Maye, Jacqueline E; Johnson, Keith A; Betensky, Rebecca A

2016-01-01

Randomly censored covariates arise frequently in epidemiologic studies. The most commonly used methods, including complete case and single imputation or substitution, suffer from inefficiency and bias. They make strong parametric assumptions or they consider limit of detection censoring only. We employ multiple imputation, in conjunction with semi-parametric modeling of the censored covariate, to overcome these shortcomings and to facilitate robust estimation. We develop a multiple imputation approach for randomly censored covariates within the framework of a logistic regression model. We use the non-parametric estimate of the covariate distribution or the semiparametric Cox model estimate in the presence of additional covariates in the model. We evaluate this procedure in simulations, and compare its operating characteristics to those from the complete case analysis and a survival regression approach. We apply the procedures to an Alzheimer's study of the association between amyloid positivity and maternal age of onset of dementia. Multiple imputation achieves lower standard errors and higher power than the complete case approach under heavy and moderate censoring and is comparable under light censoring. The survival regression approach achieves the highest power among all procedures, but does not produce interpretable estimates of association. Multiple imputation offers a favorable alternative to complete case analysis and ad hoc substitution methods in the presence of randomly censored covariates within the framework of logistic regression.
The recovery of bladder epithelial hyperplasia caused by a melamine diet-induced bladder calculus in mice.

PubMed

Sun, Ying; Jiang, Yi-Na; Xu, Chang-Fu; Du, Yun-Xia; Zhang, Jiao-Jiao; Yan, Yang; Gao, Xiao-Li

2014-02-01

Using a model of bladder epithelial hyperplasia (BEH) caused by melamine-induced bladder calculus (BC), the recovery of BEH after melamine withdrawal was investigated. One experiment, comprising untreated, melamine and recovery groups, was conducted in Balb/c mice. Each group included 4 subgroups. Mice were fed a normal diet in the untreated group or a melamine diet in the other groups. The melamine diet was then replaced with a normal diet in the recovery group. Both BC and BEH were observed after 14 and 56 days of melamine diet. The BC was relatively uniform at the same melamine-diet durations. The BEH was diffuse with many mitotic figures, 4-7 rows of nuclei, and well-defined umbrella/intermediate cells. No marked differences in BEH degree were observed between the two melamine-diet durations. At 4-42 days after melamine withdrawal, BC was not found, and progressive regression, ending in complete regression of BEH, was observed, along with well-defined ageing/apoptotic cells in the superficial regions of regressing BEH tissue. In conclusion, the melamine-induced BEH is relatively uniform, may be self-limiting in rows of nuclei, and can return to normal. Melamine withdrawal duration is critical for the BEH regression. Tissue of the BEH and its regression is ideal for exploring the renewal as well as growth biology of mammalian urothelium. Crown Copyright © 2013. Published by Elsevier Ltd. All rights reserved.
A tutorial on the piecewise regression approach applied to bedload transport data

Treesearch

Sandra E. Ryan; Laurie S. Porth

2007-01-01

This tutorial demonstrates the application of piecewise regression to bedload data to define a shift in phase of transport so that the reader may perform similar analyses on available data. The use of piecewise regression analysis implicitly recognizes different functions fit to bedload data over varying ranges of flow. The transition from primarily low rates of sand...
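
As a rough illustration of fitting different functions over different ranges of flow, the sketch below searches for the split point that minimizes the combined squared error of two separately fitted straight lines. It is one simple variant of piecewise regression under stated assumptions, not the procedure used in the tutorial: the function names and the synthetic flow and transport values are hypothetical, and the two segments are not constrained to join at the breakpoint.

```python
def fit_line(xs, ys):
    """Ordinary least squares line; returns (slope, intercept, sse)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) / sxx
    intercept = mean_y - slope * mean_x
    sse = sum((y - (intercept + slope * x)) ** 2 for x, y in zip(xs, ys))
    return slope, intercept, sse


def piecewise_fit(xs, ys, min_points=3):
    """Try every split with at least min_points per side; keep the lowest total SSE."""
    pairs = sorted(zip(xs, ys))
    best = None
    for i in range(min_points, len(pairs) - min_points + 1):
        left = fit_line(*zip(*pairs[:i]))
        right = fit_line(*zip(*pairs[i:]))
        total = left[2] + right[2]
        if best is None or total < best["sse"]:
            best = {
                "sse": total,
                # Report the breakpoint midway between the two neighbouring x values.
                "breakpoint": (pairs[i - 1][0] + pairs[i][0]) / 2,
                "left_slope": left[0],
                "right_slope": right[0],
            }
    return best


# Hypothetical data: gentle slope up to x = 5, steep slope afterwards.
flow = list(range(12))
transport = [0.5 * x if x <= 5 else 2.5 + 3.0 * (x - 5) for x in flow]
result = piecewise_fit(flow, transport)
```

A continuous (hinge) formulation or a formal test for the breakpoint would be natural refinements, but they are beyond this sketch.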
Spatial Double Generalized Beta Regression Models: Extensions and Application to Study Quality of Education in Colombia

ERIC Educational Resources Information Center

Cepeda-Cuervo, Edilberto; Núñez-Antón, Vicente

2013-01-01

In this article, a proposed Bayesian extension of the generalized beta spatial regression models is applied to the analysis of the quality of education in Colombia. We briefly revise the beta distribution and describe the joint modeling approach for the mean and dispersion parameters in the spatial regression models' setting. Finally, we motivate…
Implementations of geographically weighted lasso in spatial data with multicollinearity (Case study: Poverty modeling of Java Island)

NASA Astrophysics Data System (ADS)

Setiyorini, Anis; Suprijadi, Jadi; Handoko, Budhi

2017-03-01

Geographically Weighted Regression (GWR) is a regression model that takes into account the spatial heterogeneity effect. In the application of GWR, inference on regression coefficients is often of interest, as is estimation and prediction of the response variable. Empirical studies have demonstrated that local correlation between explanatory variables can lead to estimated regression coefficients in GWR that are strongly correlated, a condition named multicollinearity. This in turn results in large standard errors on the estimated regression coefficients and is hence problematic for inference on relationships between variables. Geographically Weighted Lasso (GWL) is a method that is capable of dealing with spatial heterogeneity and local multicollinearity in spatial data sets. GWL is a further development of the GWR method, which adds a LASSO (Least Absolute Shrinkage and Selection Operator) constraint in parameter estimation. In this study, GWL is applied using a fixed exponential kernel weights matrix to establish a poverty model of Java Island, Indonesia. The results of applying GWL to poverty datasets show that this method stabilizes regression coefficients in the presence of multicollinearity and produces lower prediction and estimation error of the response variable than GWR does.
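
The combination of kernel weighting and a LASSO penalty described here can be sketched for a single target location. The code below is a simplified illustration, not the authors' implementation: it assumes the exponential kernel form w = exp(-d / bandwidth), omits the intercept, solves the weighted lasso by plain coordinate descent, and uses hypothetical names, coordinates and data.

```python
import math


def exponential_kernel_weights(coords, target, bandwidth):
    """Fixed exponential kernel (assumed form): weight decays with distance."""
    return [math.exp(-math.dist(c, target) / bandwidth) for c in coords]


def soft_threshold(value, penalty):
    if value > penalty:
        return value - penalty
    if value < -penalty:
        return value + penalty
    return 0.0


def weighted_lasso(X, y, weights, penalty, sweeps=500):
    """Minimize 0.5 * sum(w * (y - X b)^2) + penalty * sum(|b|), no intercept."""
    p = len(X[0])
    beta = [0.0] * p
    for _ in range(sweeps):
        for j in range(p):
            rho = 0.0
            z = 0.0
            for row, obs, w in zip(X, y, weights):
                # Residual with predictor j left out of the current fit.
                partial = obs - sum(row[k] * beta[k] for k in range(p) if k != j)
                rho += w * row[j] * partial
                z += w * row[j] ** 2
            beta[j] = soft_threshold(rho, penalty) / z
    return beta


# Hypothetical locations and predictors; the response depends only on predictor 0.
coords = [(0, 0), (1, 0), (0, 1), (2, 2), (3, 1)]
X = [[1, 0], [0, 1], [1, 1], [2, 1], [1, 2]]
y = [2 * row[0] for row in X]

w = exponential_kernel_weights(coords, target=(0, 0), bandwidth=1.5)
beta_unpenalized = weighted_lasso(X, y, w, penalty=0.0)
beta_heavy = weighted_lasso(X, y, w, penalty=100.0)
```

Repeating the fit with each observation location as the target gives one coefficient vector per location, which is the sense in which the model is geographically weighted.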
An Optimization of Inventory Demand Forecasting in University Healthcare Centre

NASA Astrophysics Data System (ADS)

Bon, A. T.; Ng, T. K.

2017-01-01

The healthcare industry has become an important field because it concerns people’s health. Forecasting demand for health services is therefore an important step in managerial decision making for all healthcare organizations. Hence, a case study was conducted in a University Health Centre to collect historical demand data for Panadol 650mg for 68 months from January 2009 until August 2014. The aim of the research is to optimize the overall inventory demand through forecasting techniques. A quantitative (time series) forecasting model was used in the case study to forecast future data as a function of past data. The data pattern needs to be identified before applying the forecasting techniques. The data pattern was identified as trend, and ten forecasting techniques were then applied using Risk Simulator Software. Lastly, the best forecasting technique is identified as the one with the least forecasting error. The ten forecasting techniques are single moving average, single exponential smoothing, double moving average, double exponential smoothing, regression, Holt-Winter’s additive, seasonal additive, Holt-Winter’s multiplicative, seasonal multiplicative and Autoregressive Integrated Moving Average (ARIMA). According to the forecasting accuracy measurement, the best forecasting technique is regression analysis.
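
The idea of comparing techniques by forecasting error can be sketched with two of the listed methods, single moving average and single exponential smoothing. This is a minimal illustration in plain Python rather than the Risk Simulator workflow used in the study; the demand figures, window length and smoothing constant are hypothetical.

```python
import math


def moving_average_forecast(series, window):
    """One-step-ahead forecasts for periods window .. len(series) - 1."""
    return [sum(series[t - window:t]) / window for t in range(window, len(series))]


def exp_smoothing_forecast(series, alpha):
    """One-step-ahead forecasts for periods 1 .. len(series) - 1."""
    forecasts = [series[0]]  # the first forecast is initialised with the first observation
    for t in range(1, len(series) - 1):
        forecasts.append(alpha * series[t] + (1 - alpha) * forecasts[-1])
    return forecasts


def rmse(forecasts, actuals):
    return math.sqrt(sum((f - a) ** 2 for f, a in zip(forecasts, actuals)) / len(actuals))


# Hypothetical monthly demand with an upward trend.
demand = [120, 132, 128, 141, 150, 147, 158, 163]
window = 3

ma = moving_average_forecast(demand, window)
# Drop the early smoothing forecasts so both methods are scored on the same periods.
ses = exp_smoothing_forecast(demand, alpha=0.5)[window - 1:]
actuals = demand[window:]

errors = {"moving average": rmse(ma, actuals), "exponential smoothing": rmse(ses, actuals)}
best_method = min(errors, key=errors.get)
```

Both methods lag behind trending data, which is consistent with the abstract's finding that a regression on time performed best for a trend pattern.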
Increased incidence of peptic ulcer disease in central serous chorioretinopathy patients: a population-based retrospective cohort study.

PubMed

Chen, San-Ni; Lian, Iebin; Chen, Yi-Chiao; Ho, Jau-Der

2015-02-01

To investigate peptic ulcer disease and other possible risk factors in patients with central serous chorioretinopathy (CSR) using a population-based database. In this population-based retrospective cohort study, longitudinal data from the Taiwan National Health Insurance Research Database were analyzed. The study cohort comprised 835 patients with CSR and the control cohort comprised 4175 patients without CSR from January 2000 to December 2009. Conditional logistic regression was applied to examine the association of peptic ulcer disease and other possible risk factors for CSR, and stratified Cox regression models were applied to examine whether patients with CSR have an increased chance of peptic ulcer disease and hypertension development. The identifiable risk factors for CSR included peptic ulcer disease (adjusted odds ratio: 1.39, P = 0.001) and higher monthly income (adjusted odds ratio: 1.30, P = 0.006). Patients with CSR also had a significantly higher chance of developing peptic ulcer disease after the diagnosis of CSR (adjusted odds ratio: 1.43, P = 0.009). Peptic ulcer disease and higher monthly income are independent risk factors for CSR. Conversely, patients with CSR also had an increased risk of peptic ulcer development.
Traffic flow forecasting using approximate nearest neighbor nonparametric regression

DOT National Transportation Integrated Search

2000-12-01

The purpose of this research is to enhance nonparametric regression (NPR) for use in real-time systems by first reducing execution time using advanced data structures and imprecise computations and then developing a methodology for applying NPR. Due ...
Introduction to the use of regression models in epidemiology.

PubMed

Bender, Ralf

2009-01-01

Regression modeling is one of the most important statistical techniques used in analytical epidemiology. By means of regression models the effect of one or several explanatory variables (e.g., exposures, subject characteristics, risk factors) on a response variable such as mortality or cancer can be investigated. From multiple regression models, adjusted effect estimates can be obtained that take the effect of potential confounders into account. Regression methods can be applied in all epidemiologic study designs so that they represent a universal tool for data analysis in epidemiology. Different kinds of regression models have been developed in dependence on the measurement scale of the response variable and the study design. The most important methods are linear regression for continuous outcomes, logistic regression for binary outcomes, Cox regression for time-to-event data, and Poisson regression for frequencies and rates. This chapter provides a nontechnical introduction to these regression models with illustrating examples from cancer research.
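
To make the logistic regression case concrete, the sketch below fits a model with one binary exposure by Newton-Raphson and converts the coefficient into an odds ratio. It is a minimal illustration with hypothetical counts and a hypothetical function name, not an example taken from the chapter; for a single binary exposure the fitted odds ratio equals the cross-product ratio of the 2x2 table.

```python
import math


def fit_logistic(xs, ys, iterations=25):
    """Newton-Raphson fit of logit P(y = 1) = b0 + b1 * x; returns (b0, b1)."""
    b0 = b1 = 0.0
    for _ in range(iterations):
        g0 = g1 = h00 = h01 = h11 = 0.0
        for x, y in zip(xs, ys):
            p = 1.0 / (1.0 + math.exp(-(b0 + b1 * x)))
            w = p * (1.0 - p)
            g0 += y - p          # gradient of the log-likelihood
            g1 += (y - p) * x
            h00 += w             # entries of the information matrix X'WX
            h01 += w * x
            h11 += w * x * x
        det = h00 * h11 - h01 * h01
        b0 += (h11 * g0 - h01 * g1) / det
        b1 += (h00 * g1 - h01 * g0) / det
    return b0, b1


# Hypothetical 2x2 table: exposed 30 cases / 10 non-cases, unexposed 10 cases / 30 non-cases.
exposure = [1] * 40 + [0] * 40
outcome = [1] * 30 + [0] * 10 + [1] * 10 + [0] * 30

b0, b1 = fit_logistic(exposure, outcome)
odds_ratio = math.exp(b1)
```

Adding further covariates to the model is what turns this crude odds ratio into the adjusted effect estimate the abstract mentions.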

QSRR modeling for diverse drugs using different feature selection methods coupled with linear and nonlinear regressions.

PubMed

Goodarzi, Mohammad; Jensen, Richard; Vander Heyden, Yvan

2012-12-01

A Quantitative Structure-Retention Relationship (QSRR) is proposed to estimate the chromatographic retention of 83 diverse drugs on a Unisphere poly butadiene (PBD) column, using isocratic elutions at pH 11.7. Previous work has generated QSRR models for them using Classification And Regression Trees (CART). In this work, Ant Colony Optimization is used as a feature selection method to find the best molecular descriptors from a large pool. In addition, several other selection methods have been applied, such as Genetic Algorithms, Stepwise Regression and the Relief method, not only to evaluate Ant Colony Optimization as a feature selection method but also to investigate its ability to find the important descriptors in QSRR. Multiple Linear Regression (MLR) and Support Vector Machines (SVMs) were applied as linear and nonlinear regression methods, respectively, giving excellent correlation between the experimental, i.e. extrapolated to a mobile phase consisting of pure water, and predicted logarithms of the retention factors of the drugs (logk(w)). The overall best model was the SVM one built using descriptors selected by ACO. Copyright © 2012 Elsevier B.V. All rights reserved.
Ecologic regression analysis and the study of the influence of air quality on mortality.

PubMed Central

Selvin, S; Merrill, D; Wong, L; Sacks, S T

1984-01-01

This presentation focuses entirely on the use and evaluation of regression analysis applied to ecologic data as a method to study the effects of ambient air pollution on mortality rates. Using extensive national data on mortality, air quality and socio-economic status, regression analyses are used to study the influence of air quality on mortality. The analytic methods and data are selected in such a way that direct comparisons can be made with other ecologic regression studies of mortality and air quality. Analyses are performed by use of two types of geographic areas, age-specific mortality of both males and females and three pollutants (total suspended particulates, sulfur dioxide and nitrogen dioxide). The overall results indicate that no persuasive evidence exists of a link between air quality and general mortality levels. Additionally, a lack of consistency between the present results and previously published work is noted. Overall, it is concluded that linear regression analysis applied to nationally collected ecologic data cannot be used to usefully infer a causal relationship between air quality and mortality, which is in direct contradiction to other major published studies. PMID:6734568
Goodness-Of-Fit Test for Nonparametric Regression Models: Smoothing Spline ANOVA Models as Example.

PubMed

Teran Hidalgo, Sebastian J; Wu, Michael C; Engel, Stephanie M; Kosorok, Michael R

2018-06-01

Nonparametric regression models do not require the specification of the functional form between the outcome and the covariates. Despite their popularity, the number of diagnostic statistics, in comparison to their parametric counterparts, is small. We propose a goodness-of-fit test for nonparametric regression models with linear smoother form. In particular, we apply this testing framework to smoothing spline ANOVA models. The test can consider two sources of lack-of-fit: whether covariates that are not currently in the model need to be included, and whether the current model fits the data well. The proposed method derives estimated residuals from the model. Then, statistical dependence is assessed between the estimated residuals and the covariates using the Hilbert-Schmidt independence criterion (HSIC). If dependence exists, the model does not capture all the variability in the outcome associated with the covariates; otherwise, the model fits the data well. The bootstrap is used to obtain p-values. Application of the method is demonstrated with a neonatal mental development data analysis. We demonstrate correct type I error as well as power performance through simulations.
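As a rough illustration of the residual-versus-covariate dependence check described in this abstract, the sketch below computes a biased empirical HSIC statistic with Gaussian kernels and derives a p-value by permutation. The permutation step is a simpler stand-in for the bootstrap used by the authors, and the function names, the median-based bandwidth and the toy data are all assumptions made for illustration.

```python
import numpy as np

def gaussian_gram(x, bandwidth=None):
    """Gaussian-kernel Gram matrix for a 1-D or 2-D sample."""
    x = np.asarray(x, dtype=float).reshape(len(x), -1)
    sq = np.sum((x[:, None, :] - x[None, :, :]) ** 2, axis=-1)
    if bandwidth is None:
        # Median-based heuristic (an assumption, not taken from the paper).
        bandwidth = np.sqrt(0.5 * np.median(sq[sq > 0]))
    return np.exp(-sq / (2.0 * bandwidth ** 2))

def hsic(x, y):
    """Biased empirical HSIC: trace(K H L H) / n^2."""
    n = len(x)
    K, L = gaussian_gram(x), gaussian_gram(y)
    H = np.eye(n) - np.ones((n, n)) / n  # centering matrix
    return np.trace(K @ H @ L @ H) / n ** 2

def hsic_permutation_pvalue(residuals, covariate, n_perm=200, seed=0):
    """Permutation p-value for dependence between residuals and a covariate."""
    rng = np.random.default_rng(seed)
    observed = hsic(residuals, covariate)
    null = [hsic(rng.permutation(residuals), covariate) for _ in range(n_perm)]
    return (1 + sum(s >= observed for s in null)) / (n_perm + 1)

# Toy data: a straight line is fitted to a quadratic relationship, so the
# residuals still depend on x and the fit should be flagged as inadequate.
rng = np.random.default_rng(1)
x = rng.uniform(-2.0, 2.0, size=80)
y = x ** 2 + rng.normal(scale=0.3, size=80)
slope, intercept = np.polyfit(x, y, 1)
residuals = y - (slope * x + intercept)
p_value = hsic_permutation_pvalue(residuals, x)
print(f"permutation p-value: {p_value:.3f}")
```

A small p-value here indicates that the residuals still carry information about the covariate, which is the lack-of-fit signal the abstract describes.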
Multi-parameters monitoring during traditional Chinese medicine concentration process with near infrared spectroscopy and chemometrics

NASA Astrophysics Data System (ADS)

Liu, Ronghua; Sun, Qiaofeng; Hu, Tian; Li, Lian; Nie, Lei; Wang, Jiayue; Zhou, Wanhui; Zang, Hengchang

2018-03-01

As a powerful process analytical technology (PAT) tool, near infrared (NIR) spectroscopy has been widely used in real-time monitoring. In this study, NIR spectroscopy was applied to monitor multiple parameters of the traditional Chinese medicine (TCM) Shenzhiling oral liquid during the concentration process to guarantee the quality of products. Five lab scale batches were employed to construct quantitative models to determine five chemical ingredients and a physical change (sample density) during the concentration process. Paeoniflorin, albiflorin, liquiritin and sample density were modeled by partial least squares regression (PLSR), while the contents of glycyrrhizic acid and cinnamic acid were modeled by support vector machine regression (SVMR). Standard normal variate (SNV) and/or Savitzky-Golay (SG) smoothing with derivative methods were adopted for spectra pretreatment. Variable selection methods including correlation coefficient (CC), competitive adaptive reweighted sampling (CARS) and interval partial least squares regression (iPLS) were performed for optimizing the models. The results indicated that NIR spectroscopy was an effective tool for successfully monitoring the concentration process of Shenzhiling oral liquid.
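The standard normal variate pretreatment mentioned in this abstract can be sketched in a few lines: each spectrum is centred and scaled by its own mean and standard deviation, which removes additive offsets and multiplicative scaling between spectra. The function name and the absorbance values below are invented for illustration, and the Savitzky-Golay step is not shown.

```python
from statistics import mean, stdev

def snv(spectrum):
    """Standard normal variate: centre and scale one spectrum by its own statistics."""
    m = mean(spectrum)
    s = stdev(spectrum)  # sample standard deviation (n - 1 in the denominator)
    return [(value - m) / s for value in spectrum]

# Made-up absorbance values; the second spectrum is the first one with a
# multiplicative scaling and a baseline offset applied.
raw = [0.10, 0.12, 0.18, 0.25, 0.21]
shifted = [2.0 * v + 0.3 for v in raw]

corrected_raw = snv(raw)
corrected_shifted = snv(shifted)
print(corrected_raw)
print(corrected_shifted)
```

After SNV the two spectra coincide, which is why the transform is commonly applied before building calibration models on NIR data.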
Soil Cd, Cr, Cu, Ni, Pb and Zn sorption and retention models using SVM: Variable selection and competitive model.

PubMed

González Costa, J J; Reigosa, M J; Matías, J M; Covelo, E F

2017-09-01

The aim of this study was to model the sorption and retention of Cd, Cu, Ni, Pb and Zn in soils. To that end, the sorption and retention of these metals were studied and the soil characterization was performed separately. Multiple stepwise regression was used to produce multivariate models with linear techniques and with support vector machines, all of which included 15 explanatory variables characterizing soils. When the R-squared values are represented, two different groups are noticed. Cr, Cu and Pb sorption and retention show a higher R-squared; the most explanatory variables being humified organic matter, Al oxides and, in some cases, cation-exchange capacity (CEC). The other group of metals (Cd, Ni and Zn) shows a lower R-squared, and clays are the most explanatory variables, including a percentage of vermiculite and slime. In some cases, quartz, plagioclase or hematite percentages also show some explanatory capacity. Support Vector Machine (SVM) regression shows that the different models are not as regular as in multiple regression in terms of number of variables, the regression for nickel adsorption being the one with the highest number of variables in its optimal model. On the other hand, there are cases where the most explanatory variables are the same for two metals, as happens with Cd and Cr adsorption. A similar adsorption mechanism is thus postulated. These patterns of the introduction of variables in the model allow us to create explainability sequences. Those which are the most similar to the selectivity sequences obtained by Covelo (2005) are Mn oxides in multiple regression and change capacity in SVM. Among all the variables, the only one that is explanatory for all the metals after applying the maximum parsimony principle is the percentage of sand in the retention process.
In the competitive model arising from the aforementioned sequences, the most intense competitiveness for the adsorption and retention of different metals appears between Cr and Cd, Cu and Zn in multiple regression; and between Cr and Cd in SVM regression. Copyright © 2017 Elsevier B.V. All rights reserved.
An epidemiologic study of index and family infectious mononucleosis and adult Hodgkin's disease (HD): evidence for a specific association with EBV+ve HD in young adults.

PubMed

Alexander, Freda E; Lawrence, Davia J; Freeland, June; Krajewski, Andrew S; Angus, Brian; Taylor, G Malcolm; Jarrett, Ruth F

2003-11-01

Infectious mononucleosis (IM) is an established risk factor for Hodgkin's disease (HD). A substantial minority (33%) of cases of HD have Epstein-Barr virus (EBV) DNA within the malignant cells (are EBV+ve). It is unclear whether risk after IM applies specifically to EBV+ve HD. We report the results of a population-based case-control study of HD in adults (n = 408 cases of classical HD, 513 controls) aged 16-74 years; the case series included 113 EBV+ve and 243 EBV-ve HD. Analyses compared total HD, EBV+ve HD and EBV-ve HD with the controls and EBV+ve HD with EBV-ve HD cases using, mainly, logistic regression. Regression analyses were adjusted for gender, age-group and socioeconomic status, and were performed for the whole age range and separately for young (< 35 years) and old adults (≥ 35 years); formal tests of effect modification by age were included. For the young adults, reported IM in index or relative was strongly and significantly associated with EBV+ve HD when compared to controls (odds ratio [OR] = 2.94, 95% confidence interval [CI]: 1.08-7.98 and OR = 5.22, 95% CI: 2.15-12.68, respectively). These results may be interpreted as indications that late first exposure to EBV increases risk of HD, especially in young adults; this applies primarily to EBV+ve HD. Copyright 2003 Wiley-Liss, Inc.
How to address data gaps in life cycle inventories: a case study on estimating CO2 emissions from coal-fired electricity plants on a global scale.

PubMed

Steinmann, Zoran J N; Venkatesh, Aranya; Hauck, Mara; Schipper, Aafke M; Karuppiah, Ramkumar; Laurenzi, Ian J; Huijbregts, Mark A J

2014-05-06

One of the major challenges in life cycle assessment (LCA) is the availability and quality of data used to develop models and to make appropriate recommendations. Approximations and assumptions are often made if appropriate data are not readily available. However, these proxies may introduce uncertainty into the results. A regression model framework may be employed to assess missing data in LCAs of products and processes. In this study, we develop such a regression-based framework to estimate CO2 emission factors associated with coal power plants in the absence of reported data. Our framework hypothesizes that emissions from coal power plants can be explained by plant-specific factors (predictors) that include steam pressure, total capacity, plant age, fuel type, and gross domestic product (GDP) per capita of the resident nations of those plants. Using reported emission data for 444 plants worldwide, plant level CO2 emission factors were fitted to the selected predictors by a multiple linear regression model and a local linear regression model. The validated models were then applied to 764 coal power plants worldwide, for which no reported data were available. Cumulatively, available reported data and our predictions together account for 74% of the total world's coal-fired power generation capacity.
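The gap-filling idea in this abstract (fit a regression on plants with reported emission factors, then predict for plants without) can be sketched with ordinary least squares. This is a minimal sketch only: the three predictor columns are a subset of those named in the abstract, and every number below is invented for illustration rather than taken from the study.

```python
import numpy as np

# Hypothetical predictors: steam pressure (MPa), capacity (MW), plant age (years).
X_reported = np.array([
    [16.0, 300.0, 30.0],
    [24.0, 600.0, 10.0],
    [17.0, 350.0, 25.0],
    [25.0, 660.0, 5.0],
    [18.0, 500.0, 20.0],
    [22.0, 450.0, 15.0],
])
# Made-up emission factors (g CO2 per kWh) for the plants with reported data.
y_reported = np.array([1050.0, 880.0, 1020.0, 850.0, 980.0, 930.0])

def add_intercept(X):
    """Prepend a column of ones so the model includes an intercept."""
    return np.column_stack([np.ones(len(X)), X])

# Multiple linear regression fitted by least squares.
design = add_intercept(X_reported)
coef, *_ = np.linalg.lstsq(design, y_reported, rcond=None)

# A plant with no reported emission data: predict its emission factor.
X_missing = np.array([[20.0, 400.0, 18.0]])
predicted = add_intercept(X_missing) @ coef
print(predicted)
```

In practice the fitted model would be validated (for example on held-out plants) before being applied to plants without data, as the abstract indicates.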
Ensemble of trees approaches to risk adjustment for evaluating a hospital's performance.

PubMed

Liu, Yang; Traskin, Mikhail; Lorch, Scott A; George, Edward I; Small, Dylan

2015-03-01

A commonly used method for evaluating a hospital's performance on an outcome is to compare the hospital's observed outcome rate to the hospital's expected outcome rate given its patient (case) mix and service. The process of calculating the hospital's expected outcome rate given its patient mix and service is called risk adjustment (Iezzoni 1997). Risk adjustment is critical for accurately evaluating and comparing hospitals' performances since we would not want to unfairly penalize a hospital just because it treats sicker patients. The key to risk adjustment is accurately estimating the probability of an outcome given patient characteristics. For cases with binary outcomes, the method that is commonly used in risk adjustment is logistic regression. In this paper, we consider ensemble of trees methods as alternatives for risk adjustment, including random forests and Bayesian additive regression trees (BART). Both random forests and BART are modern machine learning methods that have been shown recently to have excellent performance for prediction of outcomes in many settings. We apply these methods to carry out risk adjustment for the performance of neonatal intensive care units (NICU). We show that these ensemble of trees methods outperform logistic regression in predicting mortality among babies treated in NICU, and provide a superior method of risk adjustment compared to logistic regression.
Solid-phase cadmium speciation in soil using L3-edge XANES spectroscopy with partial least-squares regression.

PubMed

Siebers, Nina; Kruse, Jens; Eckhardt, Kai-Uwe; Hu, Yongfeng; Leinweber, Peter

2012-07-01

Cadmium (Cd) has a high toxicity and resolving its speciation in soil is challenging but essential for estimating the environmental risk. In this study partial least-square (PLS) regression was tested for its capability to deconvolute Cd L(3)-edge X-ray absorption near-edge structure (XANES) spectra of multi-compound mixtures. For this, a library of Cd reference compound spectra and a spectrum of a soil sample were acquired. A good coefficient of determination (R(2)) of Cd compounds in mixtures was obtained for the PLS model using binary and ternary mixtures of various Cd reference compounds proving the validity of this approach. In order to describe complex systems like soil, multi-compound mixtures of a variety of Cd compounds must be included in the PLS model. The obtained PLS regression model was then applied to a highly Cd-contaminated soil revealing Cd(3)(PO(4))(2) (36.1%), Cd(NO(3))(2)·4H(2)O (24.5%), Cd(OH)(2) (21.7%), CdCO(3) (17.1%) and CdCl(2) (0.4%). These preliminary results proved that PLS regression is a promising approach for a direct determination of Cd speciation in the solid phase of a soil sample.
Comparative analysis on the probability of being a good payer

NASA Astrophysics Data System (ADS)

Mihova, V.; Pavlov, V.

2017-10-01

Credit risk assessment is crucial for the bank industry. The current practice uses various approaches for the calculation of credit risk. The core of these approaches is the use of multiple regression models, applied in order to assess the risk associated with the approval of people applying for certain products (loans, credit cards, etc.). Based on data from the past, these models try to predict what will happen in the future. Different data require different types of models. This work studies the causal link between the conduct of an applicant upon payment of the loan and the data that he completed at the time of application. A database of 100 borrowers from a commercial bank is used for the purposes of the study. The available data includes information from the time of application and credit history while paying off the loan. Customers are divided into two groups, based on the credit history: Good and Bad payers. Linear and logistic regression are applied in parallel to the data in order to estimate the probability of being good for new borrowers. A variable, which contains value of 1 for Good borrowers and value of 0 for Bad candidates, is modeled as a dependent variable. To decide which of the variables listed in the database should be used in the modelling process (as independent variables), a correlation analysis is made. Based on its results, several combinations of independent variables are tested as initial models - both with linear and logistic regression. The best linear and logistic models are obtained after initial transformation of the data and following a set of standard and robust statistical criteria. A comparative analysis between the two final models is made and scorecards are obtained from both models to assess new customers at the time of application.
A cut-off level of points, below which applications are rejected and above which they are accepted, has been suggested for both models, applying the strategy of keeping the same Accept Rate as in the current data.
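The cut-off strategy described here (pick the score threshold that preserves the current accept rate) can be sketched as follows. The scorecard points and the 60% accept rate are invented, and the helper name is hypothetical; the abstract does not say how ties or rounding were handled.

```python
def cutoff_for_accept_rate(scores, accept_rate):
    """Return the lowest score still accepted when the top `accept_rate`
    share of applicants is accepted."""
    ranked = sorted(scores, reverse=True)
    n_accept = max(1, round(accept_rate * len(ranked)))
    return ranked[n_accept - 1]

# Made-up scorecard points for ten applicants.
scores = [512, 640, 480, 705, 590, 455, 620, 530, 675, 498]

cutoff = cutoff_for_accept_rate(scores, accept_rate=0.6)
accepted = [s for s in scores if s >= cutoff]
print(cutoff, len(accepted))
```

Applying the same function to the scores of the linear and the logistic scorecard gives one cut-off per model while holding the share of accepted applicants fixed.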
Identification of immune correlates of protection in Shigella infection by application of machine learning.

PubMed

Arevalillo, Jorge M; Sztein, Marcelo B; Kotloff, Karen L; Levine, Myron M; Simon, Jakub K

2017-10-01

Immunologic correlates of protection are important in vaccine development because they give insight into mechanisms of protection, assist in the identification of promising vaccine candidates, and serve as endpoints in bridging clinical vaccine studies. Our goal is the development of a methodology to identify immunologic correlates of protection using the Shigella challenge as a model. The proposed methodology utilizes the Random Forests (RF) machine learning algorithm as well as Classification and Regression Trees (CART) to detect immune markers that predict protection, identify interactions between variables, and define optimal cutoffs. Logistic regression modeling is applied to estimate the probability of protection and the confidence interval (CI) for such a probability is computed by bootstrapping the logistic regression models. The results demonstrate that the combination of Classification and Regression Trees and Random Forests complements the standard logistic regression and uncovers subtle immune interactions. Specific levels of immunoglobulin IgG antibody in blood on the day of challenge predicted protection in 75% (95% CI 67-86). Of those subjects that did not have blood IgG at or above a defined threshold, 100% were protected if they had IgA antibody secreting cells above a defined threshold. Comparison with the results obtained by applying only logistic regression modeling with standard Akaike Information Criterion for model selection shows the usefulness of the proposed method. Given the complexity of the immune system, the use of machine learning methods may enhance traditional statistical approaches. When applied together, they offer a novel way to quantify important immune correlates of protection that may help the development of vaccines. Copyright © 2017 Elsevier Inc. All rights reserved.
Correlates of motivation to change in pathological gamblers completing cognitive-behavioral group therapy.

PubMed

Gómez-Peña, Mónica; Penelo, Eva; Granero, Roser; Fernández-Aranda, Fernando; Alvarez-Moya, Eva; Santamaría, Juan José; Moragas, Laura; Neus Aymamí, Maria; Gunnard, Katarina; Menchón, José M; Jimenez-Murcia, Susana

2012-07-01

The present study analyzes the association between the motivation to change and the cognitive-behavioral group intervention, in terms of dropouts and relapses, in a sample of male pathological gamblers. The specific objectives were as follows: (a) to estimate the predictive value of baseline University of Rhode Island Change Assessment scale (URICA) scores (i.e., at the start of the study) as regards the risk of relapse and dropout during treatment and (b) to assess the incremental predictive ability of URICA scores, as regards the mean change produced in the clinical status of patients between the start and finish of treatment. The relationship between the URICA and the response to treatment was analyzed by means of a pre-post design applied to a sample of 191 patients who were consecutively receiving cognitive-behavioral group therapy. The statistical analysis included logistic regression models and hierarchical multiple linear regression models. The discriminative ability of the models including the four URICA scores regarding the likelihood of relapse and dropout was acceptable (area under the receiver operating characteristic curve: .73 and .71, respectively). No significant predictive ability was found as regards the differences between baseline and posttreatment scores (changes in R(2) below 5% in the multiple regression models). The availability of useful measures of motivation to change would enable treatment outcomes to be optimized through the application of specific therapeutic interventions. © 2012 Wiley Periodicals, Inc.
Robustness of meta-analyses in finding gene × environment interactions

PubMed Central

Shi, Gang; Nehorai, Arye

2017-01-01

Meta-analyses that synthesize statistical evidence across studies have become important analytical tools for genetic studies. Inspired by the success of genome-wide association studies of the genetic main effect, researchers are searching for gene × environment interactions. Confounders are routinely included in the genome-wide gene × environment interaction analysis as covariates; however, this does not control for any confounding effects on the results if covariate × environment interactions are present. We carried out simulation studies to evaluate the robustness to the covariate × environment confounder for meta-regression and joint meta-analysis, which are two commonly used meta-analysis methods for testing the gene × environment interaction or the genetic main effect and interaction jointly. Here we show that meta-regression is robust to the covariate × environment confounder while joint meta-analysis is subject to the confounding effect with inflated type I error rates. Given vast sample sizes employed in genome-wide gene × environment interaction studies, non-significant covariate × environment interactions at the study level could substantially elevate the type I error rate at the consortium level. When covariate × environment confounders are present, type I errors can be controlled in joint meta-analysis by including the covariate × environment terms in the analysis at the study level. Alternatively, meta-regression can be applied, which is robust to potential covariate × environment confounders. PMID:28362796
A simple measure of cognitive reserve is relevant for cognitive performance in MS patients.

PubMed

Della Corte, Marida; Santangelo, Gabriella; Bisecco, Alvino; Sacco, Rosaria; Siciliano, Mattia; d'Ambrosio, Alessandro; Docimo, Renato; Cuomo, Teresa; Lavorgna, Luigi; Bonavita, Simona; Tedeschi, Gioacchino; Gallo, Antonio

2018-05-04

Cognitive reserve (CR) contributes to preserve cognition despite brain damage. This theory has been applied to multiple sclerosis (MS) to explain the partial relationship between cognition and MRI markers of brain pathology. Our aim was to determine the relationship between two measures of CR and cognition in MS. One hundred and forty-seven MS patients were enrolled. Cognition was assessed using the Rao's Brief Repeatable Battery and the Stroop Test. CR was measured as the vocabulary subtest of the WAIS-R score (VOC) and the number of years of formal education (EDU). Regression analysis included raw score data on each neuropsychological (NP) test as dependent variables and demographic/clinical parameters, VOC, and EDU as independent predictors. A binary logistic regression analysis including clinical/CR parameters as covariates and absence/presence of cognitive deficits as dependent variables was performed too. VOC, but not EDU, was strongly correlated with performances at all ten NP tests. EDU was correlated with executive performances. The binary logistic regression showed that only the Expanded Disability Status Scale (EDSS) and VOC were independently correlated with the presence/absence of CD. The lower the VOC and/or the higher the EDSS, the higher the frequency of CD. In conclusion, our study supports the relevance of CR in subtending cognitive performances and the presence of CD in MS patients.
A comparison of three methods of assessing differential item functioning (DIF) in the Hospital Anxiety Depression Scale: ordinal logistic regression, Rasch analysis and the Mantel chi-square procedure.

PubMed

Cameron, Isobel M; Scott, Neil W; Adler, Mats; Reid, Ian C

2014-12-01

It is important for clinical practice and research that measurement scales of well-being and quality of life exhibit only minimal differential item functioning (DIF). DIF occurs where different groups of people endorse items in a scale to different extents after being matched by the intended scale attribute. We investigate the equivalence or otherwise of common methods of assessing DIF. Three methods of measuring age- and sex-related DIF (ordinal logistic regression, Rasch analysis and Mantel χ(2) procedure) were applied to Hospital Anxiety Depression Scale (HADS) data pertaining to a sample of 1,068 patients consulting primary care practitioners. Three items were flagged by all three approaches as having either age- or sex-related DIF with a consistent direction of effect; a further three items identified did not meet stricter criteria for important DIF using at least one method. When applying strict criteria for significant DIF, ordinal logistic regression was slightly less sensitive. Ordinal logistic regression, Rasch analysis and contingency table methods yielded consistent results when identifying DIF in the HADS depression and HADS anxiety scales. Regardless of methods applied, investigators should use a combination of statistical significance, magnitude of the DIF effect and investigator judgement when interpreting the results.
Post-processing method for wind speed ensemble forecast using wind speed and direction

NASA Astrophysics Data System (ADS)

Sofie Eide, Siri; Bjørnar Bremnes, John; Steinsland, Ingelin

2017-04-01

Statistical methods are widely applied to enhance the quality of both deterministic and ensemble NWP forecasts. In many situations, like wind speed forecasting, most of the predictive information is contained in one variable in the NWP models. However, in statistical calibration of deterministic forecasts it is often seen that including more variables can further improve forecast skill. For ensembles this is rarely taken advantage of, mainly because it is generally not straightforward to include multiple variables. In this study, it is demonstrated how multiple variables can be included in Bayesian model averaging (BMA) by using a flexible regression method for estimating the conditional means. The method is applied to wind speed forecasting at 204 Norwegian stations based on wind speed and direction forecasts from the ECMWF ensemble system. At about 85 % of the sites the ensemble forecasts were improved in terms of CRPS by adding wind direction as predictor compared to only using wind speed. On average the improvements were about 5 %, but mainly for moderate to strong wind situations. For weak wind speeds adding wind direction had a more or less neutral impact.
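The abstract reports skill in terms of the continuous ranked probability score (CRPS). As a minimal sketch, the function below uses the common sample-based estimator for a finite ensemble, which is an assumption here: the study scores BMA predictive distributions, and its exact computation is not given in the abstract. The wind speeds are invented.

```python
def ensemble_crps(members, observation):
    """Sample-based CRPS estimate for one forecast case:
    mean |x_i - y| minus half the mean |x_i - x_j| over all member pairs."""
    m = len(members)
    term_obs = sum(abs(x - observation) for x in members) / m
    term_spread = sum(abs(a - b) for a in members for b in members) / (2 * m * m)
    return term_obs - term_spread

# Made-up wind speed ensembles (m/s) for one station and one observation.
observation = 8.0
speed_only = [5.5, 6.0, 6.5, 7.0, 9.0]
speed_and_direction = [7.0, 7.5, 8.0, 8.5, 9.5]

crps_a = ensemble_crps(speed_only, observation)
crps_b = ensemble_crps(speed_and_direction, observation)
relative_improvement = (crps_a - crps_b) / crps_a
print(crps_a, crps_b, relative_improvement)
```

Lower CRPS is better; for a single-member ensemble the estimator reduces to the absolute error, which makes it directly comparable with deterministic forecasts.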
Robust regression on noisy data for fusion scaling laws

DOE Office of Scientific and Technical Information (OSTI.GOV)

Verdoolaege, Geert, E-mail: geert.verdoolaege@ugent.be; Laboratoire de Physique des Plasmas de l'ERM - Laboratorium voor Plasmafysica van de KMS

2014-11-15

We introduce the method of geodesic least squares (GLS) regression for estimating fusion scaling laws. Based on straightforward principles, the method is easily implemented, yet it clearly outperforms established regression techniques, particularly in cases of significant uncertainty on both the response and predictor variables. We apply GLS for estimating the scaling of the L-H power threshold, resulting in estimates for ITER that are somewhat higher than predicted earlier.
Regression Model Optimization for the Analysis of Experimental Data

NASA Technical Reports Server (NTRS)

Ulbrich, N.

2009-01-01

A candidate math model search algorithm was developed at Ames Research Center that determines a recommended math model for the multivariate regression analysis of experimental data. The search algorithm is applicable to classical regression analysis problems as well as wind tunnel strain gage balance calibration analysis applications. The algorithm compares the predictive capability of different regression models using the standard deviation of the PRESS residuals of the responses as a search metric. This search metric is minimized during the search. Singular value decomposition is used during the search to reject math models that lead to a singular solution of the regression analysis problem. Two threshold dependent constraints are also applied. The first constraint rejects math models with insignificant terms. The second constraint rejects math models with near-linear dependencies between terms. The math term hierarchy rule may also be applied as an optional constraint during or after the candidate math model search. The final term selection of the recommended math model depends on the regressor and response values of the data set, the user's function class combination choice, the user's constraint selections, and the result of the search metric minimization. A frequently used regression analysis example from the literature is used to illustrate the application of the search algorithm to experimental data.
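Two ingredients of the search described above can be sketched compactly: rejecting singular candidate models via singular value decomposition, and scoring the remaining candidates by the standard deviation of their PRESS (leave-one-out) residuals. This is a sketch under assumptions, not the Ames algorithm: the tolerance, the function name, the candidate set and the data are invented, and the two threshold constraints and the hierarchy rule are not implemented.

```python
import numpy as np

def press_std(X, y, tol=1e-10):
    """Standard deviation of the PRESS residuals for design matrix X,
    or None if X is numerically singular."""
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    singular_values = np.linalg.svd(X, compute_uv=False)
    if singular_values.min() < tol * singular_values.max():
        return None  # reject: the regression problem has no unique solution
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    residuals = y - X @ coef
    # Diagonal of the hat matrix H = X pinv(X), i.e. the leverages h_ii.
    leverage = np.sum(X * np.linalg.pinv(X).T, axis=1)
    press = residuals / (1.0 - leverage)  # leave-one-out prediction errors
    return press.std(ddof=1)

# Made-up data with a quadratic trend plus a small deterministic disturbance.
x = np.linspace(-1.0, 1.0, 15)
y = 1.0 + 2.0 * x + 3.0 * x ** 2 + 0.05 * np.sin(7.0 * x)
ones = np.ones_like(x)

candidates = {
    "linear": np.column_stack([ones, x]),
    "quadratic": np.column_stack([ones, x, x ** 2]),
    "duplicated term": np.column_stack([ones, x, x]),  # singular on purpose
}
scores = {name: press_std(X, y) for name, X in candidates.items()}
usable = {name: s for name, s in scores.items() if s is not None}
best = min(usable, key=usable.get)
print(scores, best)
```

The leverage shortcut avoids refitting the model once per data point, which matters when many candidate models are compared.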
An Alternative Method for Allocating Base Maintenance Supplies to Mission, Design, and Series Aircraft in the United States Air Force.

DTIC Science & Technology

1987-09-01

Edition, Fall 1986. 33. Neter, John et al. Applied Linear Regression Models. Homewood IL: Richard D. Irwin, Incorporated, 1983. 34. Novick, David... Linear Regression Models (33) then, for each sample observation (X fh, the method of least squares considers the deviation of Yubms from its expected value...for finding good estimators of b - b5 * In order to explain the procedure, the model Yubms = b0 + b1 Xfh will be discussed. According to Applied
Modeling energy expenditure in children and adolescents using quantile regression

USDA-ARS's Scientific Manuscript database

Advanced mathematical models have the potential to capture the complex metabolic and physiological processes that result in energy expenditure (EE). Study objective is to apply quantile regression (QR) to predict EE and determine quantile-dependent variation in covariate effects in nonobese and obes...

Assessing the impact of natural policy experiments on socioeconomic inequalities in health: how to apply commonly used quantitative analytical methods?

PubMed

Hu, Yannan; van Lenthe, Frank J; Hoffmann, Rasmus; van Hedel, Karen; Mackenbach, Johan P

2017-04-20

The scientific evidence-base for policies to tackle health inequalities is limited. Natural policy experiments (NPE) have drawn increasing attention as a means to evaluating the effects of policies on health. Several analytical methods can be used to evaluate the outcomes of NPEs in terms of average population health, but it is unclear whether they can also be used to assess the outcomes of NPEs in terms of health inequalities. The aim of this study therefore was to assess whether, and to demonstrate how, a number of commonly used analytical methods for the evaluation of NPEs can be applied to quantify the effect of policies on health inequalities. We identified seven quantitative analytical methods for the evaluation of NPEs: regression adjustment, propensity score matching, difference-in-differences analysis, fixed effects analysis, instrumental variable analysis, regression discontinuity and interrupted time-series. We assessed whether these methods can be used to quantify the effect of policies on the magnitude of health inequalities either by conducting a stratified analysis or by including an interaction term, and illustrated both approaches in a fictitious numerical example. All seven methods can be used to quantify the equity impact of policies on absolute and relative inequalities in health by conducting an analysis stratified by socioeconomic position, and all but one (propensity score matching) can be used to quantify equity impacts by inclusion of an interaction term between socioeconomic position and policy exposure. Methods commonly used in economics and econometrics for the evaluation of NPEs can also be applied to assess the equity impact of policies, and our illustrations provide guidance on how to do this appropriately. The low external validity of results from instrumental variable analysis and regression discontinuity makes these methods less desirable for assessing policy effects on population-level health inequalities. 
Increased use of the methods in social epidemiology will help to build an evidence base to support policy making in the area of health inequalities.
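The stratified approach described in this abstract can be illustrated for one of the seven methods, difference-in-differences: estimate the policy effect separately within each socioeconomic stratum and compare the estimates. The records below are fictitious (as is the abstract's own numerical example, which is not reproduced here), and the helper names are hypothetical.

```python
from statistics import mean

# (socioeconomic position, exposed to policy, period, health outcome) - fictitious values
records = [
    ("low", True, "pre", 60.0), ("low", True, "pre", 62.0),
    ("low", True, "post", 68.0), ("low", True, "post", 70.0),
    ("low", False, "pre", 61.0), ("low", False, "pre", 63.0),
    ("low", False, "post", 63.0), ("low", False, "post", 65.0),
    ("high", True, "pre", 75.0), ("high", True, "pre", 77.0),
    ("high", True, "post", 79.0), ("high", True, "post", 81.0),
    ("high", False, "pre", 76.0), ("high", False, "pre", 78.0),
    ("high", False, "post", 78.0), ("high", False, "post", 80.0),
]

def group_mean(sep, exposed, period):
    """Mean outcome in one stratum / exposure / period cell."""
    return mean(o for s, e, p, o in records if (s, e, p) == (sep, exposed, period))

def did(sep):
    """Difference-in-differences estimate within one socioeconomic stratum."""
    change_exposed = group_mean(sep, True, "post") - group_mean(sep, True, "pre")
    change_control = group_mean(sep, False, "post") - group_mean(sep, False, "pre")
    return change_exposed - change_control

effect_low = did("low")
effect_high = did("high")
# Change in the absolute gap between strata that is attributable to the policy.
equity_impact = effect_low - effect_high
print(effect_low, effect_high, equity_impact)
```

With these numbers the policy effect is larger in the low-position stratum, so the absolute gap narrows. In a fully saturated regression, the same quantity would appear as the coefficient on the three-way interaction between socioeconomic position, exposure and period.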
Validation of statistical predictive models meant to select melanoma patients for sentinel lymph node biopsy.

PubMed

Sabel, Michael S; Rice, John D; Griffith, Kent A; Lowe, Lori; Wong, Sandra L; Chang, Alfred E; Johnson, Timothy M; Taylor, Jeremy M G

2012-01-01

To identify melanoma patients at sufficiently low risk of nodal metastases who could avoid sentinel lymph node biopsy (SLNB), several statistical models have been proposed based upon patient/tumor characteristics, including logistic regression, classification trees, random forests, and support vector machines. We sought to validate recently published models meant to predict sentinel node status. We queried our comprehensive, prospectively collected melanoma database for consecutive melanoma patients undergoing SLNB. Prediction values were estimated based upon four published models, calculating the same reported metrics: negative predictive value (NPV), rate of negative predictions (RNP), and false-negative rate (FNR). Logistic regression performed comparably with our data when considering NPV (89.4 versus 93.6%); however, the model's specificity was not high enough to significantly reduce the rate of biopsies (SLN reduction rate of 2.9%). When applied to our data, the classification tree produced NPV and reduction in biopsy rates that were lower (87.7 versus 94.1 and 29.8 versus 14.3, respectively). Two published models could not be applied to our data due to model complexity and the use of proprietary software. Published models meant to reduce the SLNB rate among patients with melanoma either underperformed when applied to our larger dataset, or could not be validated. Differences in selection criteria and histopathologic interpretation likely resulted in underperformance. Statistical predictive models must be developed in a clinically applicable manner to allow for both validation and ultimately clinical utility.
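The three metrics named in this abstract can be computed from a confusion matrix of predicted versus actual sentinel node status. The sketch below uses the usual textbook definitions, which are assumed rather than confirmed against the paper, and the counts are invented.

```python
def sln_metrics(tp, fp, tn, fn):
    """Validation metrics from confusion-matrix counts, where 'positive'
    means a positive sentinel node (predicted or actual)."""
    total = tp + fp + tn + fn
    return {
        "npv": tn / (tn + fn),      # negative predictive value
        "fnr": fn / (fn + tp),      # false-negative rate
        "rnp": (tn + fn) / total,   # rate of negative predictions
    }

# Made-up counts for 1,000 patients.
metrics = sln_metrics(tp=150, fp=600, tn=220, fn=30)
print(metrics)
```

The rate of negative predictions is the share of patients a model would spare from biopsy, so a high NPV is of limited practical use when this rate is small.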
Validation of Statistical Predictive Models Meant to Select Melanoma Patients for Sentinel Lymph Node Biopsy

PubMed Central

Sabel, Michael S.; Rice, John D.; Griffith, Kent A.; Lowe, Lori; Wong, Sandra L.; Chang, Alfred E.; Johnson, Timothy M.; Taylor, Jeremy M.G.

2013-01-01

Introduction: To identify melanoma patients at sufficiently low risk of nodal metastases who could avoid SLN biopsy (SLNB), several statistical models have been proposed based upon patient/tumor characteristics, including logistic regression, classification trees, random forests and support vector machines. We sought to validate recently published models meant to predict sentinel node status. Methods: We queried our comprehensive, prospectively-collected melanoma database for consecutive melanoma patients undergoing SLNB. Prediction values were estimated based upon 4 published models, calculating the same reported metrics: negative predictive value (NPV), rate of negative predictions (RNP), and false negative rate (FNR). Results: Logistic regression performed comparably with our data when considering NPV (89.4% vs. 93.6%); however, the model’s specificity was not high enough to significantly reduce the rate of biopsies (SLN reduction rate of 2.9%). When applied to our data, the classification tree produced NPV and reduction in biopsy rates that were lower (87.7% vs. 94.1% and 29.8% vs. 14.3%, respectively). Two published models could not be applied to our data due to model complexity and the use of proprietary software. Conclusions: Published models meant to reduce the SLNB rate among patients with melanoma either underperformed when applied to our larger dataset, or could not be validated. Differences in selection criteria and histopathologic interpretation likely resulted in underperformance. Statistical predictive models must be developed in a clinically applicable manner to allow for both validation and ultimately clinical utility. PMID:21822550
Hybrid approach of selecting hyperparameters of support vector machine for regression.

PubMed

Jeng, Jin-Tsong

2006-06-01

To select the hyperparameters of the support vector machine for regression (SVR), a hybrid approach is proposed to determine the kernel parameter of the Gaussian kernel function and the epsilon value of Vapnik's epsilon-insensitive loss function. The proposed hybrid approach includes a competitive agglomeration (CA) clustering algorithm and a repeated SVR (RSVR) approach. Since the CA clustering algorithm is used to find the nearly "optimal" number of clusters and the centers of clusters in the clustering process, the CA clustering algorithm is applied to select the Gaussian kernel parameter. Additionally, an RSVR approach that relies on the standard deviation of a training error is proposed to obtain an epsilon in the loss function. Finally, two functions, one real data set (i.e., a time series of quarterly unemployment rate for West Germany) and an identification of nonlinear plant are used to verify the usefulness of the hybrid approach.
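The two hyperparameters discussed in this record enter SVR through two small functions. The sketch below shows them under common conventions: a Gaussian kernel parameterised by a width `sigma` (other texts use `gamma = 1 / (2 * sigma**2)`) and Vapnik's epsilon-insensitive loss. It does not reproduce the CA clustering or RSVR selection procedure, and the function names are illustrative.

```python
import math


def gaussian_kernel(x, z, sigma):
    """Gaussian (RBF) kernel between two equal-length vectors."""
    sq_dist = sum((a - b) ** 2 for a, b in zip(x, z))
    return math.exp(-sq_dist / (2.0 * sigma ** 2))


def epsilon_insensitive_loss(y_true, y_pred, epsilon):
    """Zero inside the epsilon tube, linear in the absolute error outside it."""
    return max(0.0, abs(y_true - y_pred) - epsilon)


# Errors smaller than epsilon cost nothing; larger errors are reduced by epsilon.
print(epsilon_insensitive_loss(1.0, 1.05, epsilon=0.1))
print(epsilon_insensitive_loss(1.0, 1.50, epsilon=0.1))
print(gaussian_kernel([0.0, 0.0], [1.0, 0.0], sigma=1.0))
```

A larger `epsilon` ignores more of the training error, which is why the abstract ties its choice to the standard deviation of that error.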
Sensitivity analysis, calibration, and testing of a distributed hydrological model using error‐based weighting and one objective function

USGS Publications Warehouse

Foglia, L.; Hill, Mary C.; Mehl, Steffen W.; Burlando, P.

2009-01-01

We evaluate the utility of three interrelated means of using data to calibrate the fully distributed rainfall‐runoff model TOPKAPI as applied to the Maggia Valley drainage area in Switzerland. The use of error‐based weighting of observation and prior information data, local sensitivity analysis, and single‐objective function nonlinear regression provides quantitative evaluation of sensitivity of the 35 model parameters to the data, identification of data types most important to the calibration, and identification of correlations among parameters that contribute to nonuniqueness. Sensitivity analysis required only 71 model runs, and regression required about 50 model runs. The approach presented appears to be ideal for evaluation of models with long run times or as a preliminary step to more computationally demanding methods. The statistics used include composite scaled sensitivities, parameter correlation coefficients, leverage, Cook's D, and DFBETAS. Tests suggest predictive ability of the calibrated model typical of hydrologic models.
Resection of complex pancreatic injuries: Benchmarking postoperative complications using the Accordion classification

PubMed Central

Krige, Jake E; Jonas, Eduard; Thomson, Sandie R; Kotze, Urda K; Setshedi, Mashiko; Navsaria, Pradeep H; Nicol, Andrew J

2017-01-01

AIM To benchmark severity of complications using the Accordion Severity Grading System (ASGS) in patients undergoing operation for severe pancreatic injuries. METHODS A prospective institutional database of 461 patients with pancreatic injuries treated from 1990 to 2015 was reviewed. One hundred and thirty patients with AAST grade 3, 4 or 5 pancreatic injuries underwent resection (pancreatoduodenectomy, n = 20, distal pancreatectomy, n = 110), including 30 who had an initial damage control laparotomy (DCL) and later definitive surgery. AAST injury grades, type of pancreatic resection, need for DCL and incidence and ASGS severity of complications were assessed. Uni- and multivariate logistic regression analysis was applied. RESULTS Overall 238 complications occurred in 95 (73%) patients of which 73% were ASGS grades 3-6. Nineteen patients (14.6%) died. Patients more likely to have complications after pancreatic resection were older, had a revised trauma score (RTS) < 7.8, were shocked on admission, had grade 5 injuries of the head and neck of the pancreas with associated vascular and duodenal injuries, required a DCL, received a larger blood transfusion, had a pancreatoduodenectomy (PD) and repeat laparotomies. Applying univariate logistic regression analysis, mechanism of injury, RTS < 7.8, shock on admission, DCL, increasing AAST grade and type of pancreatic resection were significant variables for complications. Multivariate logistic regression analysis however showed that only age and type of pancreatic resection (PD) were significant. CONCLUSION This ASGS-based study benchmarked postoperative morbidity after pancreatic resection for trauma. The detailed outcome analysis provided may serve as a reference for future institutional comparisons. PMID:28396721
Learning Inverse Rig Mappings by Nonlinear Regression.

PubMed

Holden, Daniel; Saito, Jun; Komura, Taku

2017-03-01

We present a framework to design inverse rig-functions, that is, functions that map low level representations of a character's pose such as joint positions or surface geometry to the representation used by animators called the animation rig. Animators design scenes using an animation rig, a framework widely adopted in animation production which allows animators to design character poses and geometry via intuitive parameters and interfaces. Yet most state-of-the-art computer animation techniques control characters through raw, low level representations such as joint angles, joint positions, or vertex coordinates. This difference often stops the adoption of state-of-the-art techniques in animation production. Our framework solves this issue by learning a mapping between the low level representations of the pose and the animation rig. We use nonlinear regression techniques, learning from example animation sequences designed by the animators. When new motions are provided in the skeleton space, the learned mapping is used to estimate the rig controls that reproduce such a motion. We introduce two nonlinear functions for producing such a mapping: Gaussian process regression and feedforward neural networks. The appropriate solution depends on the nature of the rig and the amount of data available for training. We show our framework applied to various examples including articulated biped characters, quadruped characters, facial animation rigs, and deformable characters. With our system, animators have the freedom to apply any motion synthesis algorithm to arbitrary rigging and animation pipelines for immediate editing. This greatly improves the productivity of 3D animation, while retaining the flexibility and creativity of artistic input.
Improving power and robustness for detecting genetic association with extreme-value sampling design.

PubMed

Chen, Hua Yun; Li, Mingyao

2011-12-01

Extreme-value sampling design that samples subjects with extremely large or small quantitative trait values is commonly used in genetic association studies. Samples in such designs are often treated as "cases" and "controls" and analyzed using logistic regression. Such a case-control analysis ignores the potential dose-response relationship between the quantitative trait and the underlying trait locus and thus may lead to loss of power in detecting genetic association. An alternative approach to analyzing such data is to model the dose-response relationship by a linear regression model. However, parameter estimation from this model can be biased, which may lead to inflated type I errors. We propose a robust and efficient approach that takes into consideration of both the biased sampling design and the potential dose-response relationship. Extensive simulations demonstrate that the proposed method is more powerful than the traditional logistic regression analysis and is more robust than the linear regression analysis. We applied our method to the analysis of a candidate gene association study on high-density lipoprotein cholesterol (HDL-C) which includes study subjects with extremely high or low HDL-C levels. Using our method, we identified several SNPs showing a stronger evidence of association with HDL-C than the traditional case-control logistic regression analysis. Our results suggest that it is important to appropriately model the quantitative traits and to adjust for the biased sampling when dose-response relationship exists in extreme-value sampling designs. © 2011 Wiley Periodicals, Inc.
Magnitude and frequency of floods in small drainage basins in Idaho

USGS Publications Warehouse

Thomas, C.A.; Harenberg, W.A.; Anderson, J.M.

1973-01-01

A method is presented in this report for determining magnitude and frequency of floods on streams with drainage areas between 0.5 and 200 square miles. The method relates basin characteristics, including drainage area, percentage of forest cover, percentage of water area, latitude, and longitude, with peak flow characteristics. Regression equations for each of eight regions are presented for determination of Q10, the peak discharge which, on the average, will be exceeded once in 10 years. Peak flows, Q25 and Q50, can then be estimated from Q25/Q10 and Q50/Q10 ratios developed for each region. Nomographs are included which solve the equations for basins between 1 and 50 square miles. The regional regression equations were developed using multiple regression techniques. Annual peaks for 303 sites were analyzed in the study. These included all records on unregulated streams with drainage areas less than about 500 square miles with 10 years or more of record or which could readily be extended to 10 years on the basis of nearby streams. The log-Pearson Type III method as modified and a digital computer were employed to estimate magnitude and frequency of floods for each of the 303 gaged sites. A large number of physical and climatic basin characteristics were determined for each of the gaged sites. The multiple regression method was then applied to determine the equations relating the floodflows and the most significant basin characteristics. For convenience of the users, several equations were simplified and some complex characteristics were deleted at the sacrifice of some increase in the standard error. Standard errors of estimate and many other statistical data were computed in the analysis process and are available in the Boise district office files.
The analysis showed that Q10 was the best defined and most practical index flood for determination of the Q25 and Q50 flood estimates. Regression equations are not developed because of poor definition for areas which total about 20,000 square miles, most of which are in southern Idaho. These areas are described in the report to prevent use of regression equations where they do not apply. They include urbanized areas, streams affected by regulation or diversion by works of man, unforested areas, streams with gaining or losing reaches, streams draining alluvial valleys and the Snake Plain, intense thunderstorm areas, and scattered areas where records indicate recurring floods which depart from the regional equations. Maximum flows of record and basin locations are summarized in tables and maps. The analysis indicates deficiencies in data exist. To improve knowledge regarding flood characteristics in poorly defined areas, the following data-collection programs are recommended. Gages should be operated on a few selected small streams for an extended period to define floods at long recurrence intervals. Crest-stage gages should be operated in representative basins in urbanized areas, newly developed irrigated areas and grasslands, and in unforested areas. Unusual floods should continue to be measured at miscellaneous sites on regulated streams and in intense thunderstorm-prone areas. The relationship between channel geometry and floodflow characteristics should be investigated as an alternative or supplement to operation of gaging stations. Documentation of historic flood data from newspapers and other sources would improve the basic flood-data base.
PM10 source apportionment in Milan (Italy) using time-resolved data.

PubMed

Bernardoni, Vera; Vecchi, Roberta; Valli, Gianluigi; Piazzalunga, Andrea; Fermo, Paola

2011-10-15

In this work Positive Matrix Factorization (PMF) was applied to 4-hour resolved PM10 data collected in Milan (Italy) during summer and winter 2006. PM10 characterisation included elements (Mg-Pb), main inorganic ions (NH₄⁺, NO₃⁻, SO₄²⁻), levoglucosan and its isomers (mannosan and galactosan), and organic and elemental carbon (OC and EC). PMF resolved seven factors that were assigned to construction works, re-suspended dust, secondary sulphate, traffic, industry, secondary nitrate, and wood burning. Multi Linear Regression was applied to obtain the PM10 source apportionment. The 4-hour temporal resolution allowed the estimation of the factor contributions during peculiar episodes, which would have not been detected with the traditional 24-hour sampling strategy. Copyright © 2011 Elsevier B.V. All rights reserved.
The Andrews’ Principles of Risk, Need, and Responsivity as Applied in Drug Abuse Treatment Programs: Meta-Analysis of Crime and Drug Use Outcomes

PubMed Central

Prendergast, Michael L.; Pearson, Frank S.; Podus, Deborah; Hamilton, Zachary K.; Greenwell, Lisa

2013-01-01

Objectives The purpose of the present meta-analysis was to answer the question: Can the Andrews principles of risk, needs, and responsivity, originally developed for programs that treat offenders, be extended to programs that treat drug abusers? Methods Drawing from a dataset that included 243 independent comparisons, we conducted random-effects meta-regression and ANOVA-analog meta-analyses to test the Andrews principles by averaging crime and drug use outcomes over a diverse set of programs for drug abuse problems. Results For crime outcomes, in the meta-regressions the point estimates for each of the principles were substantial, consistent with previous studies of the Andrews principles. There was also a substantial point estimate for programs exhibiting a greater number of the principles. However, almost all of the 95% confidence intervals included the zero point. For drug use outcomes, in the meta-regressions the point estimates for each of the principles were approximately zero; however, the point estimate for programs exhibiting a greater number of the principles was somewhat positive. All of the estimates for the drug use principles had confidence intervals that included the zero point. Conclusions This study supports previous findings from primary research studies targeting the Andrews principles that those principles are effective in reducing crime outcomes, here in meta-analytic research focused on drug treatment programs. By contrast, programs that follow the principles appear to have very little effect on drug use outcomes. Primary research studies that experimentally test the Andrews principles in drug treatment programs are recommended. PMID:24058325
Method of and apparatus for generating an interstitial point in a data stream having an even number of data points

NASA Technical Reports Server (NTRS)

Edwards, T. R. (Inventor)

1985-01-01

Apparatus for doubling the data density rate of an analog to digital converter or doubling the data density storage capacity of a memory device is discussed. An interstitial data point midway between adjacent data points in a data stream having an even number of equal interval data points is generated by applying a set of predetermined one-dimensional convolute integer coefficients which can include a set of multiplier coefficients and a normalizer coefficient. Interpolator means apply the coefficients to the data points by weighting equally on each side of the center of the even number of equal interval data points to obtain an interstitial point value at the center of the data points. A one-dimensional output data set, which is twice as dense as a one-dimensional equal interval input data set, can be generated where the output data set includes interstitial points interdigitated between adjacent data points in the input data set. The method for generating the set of interstitial points is a weighted, nearest-neighbor, non-recursive, moving, smoothing averaging technique, equivalent to applying a polynomial regression calculation to the data set.
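The idea of integer multipliers plus a normalizer can be sketched for the smallest interesting case. The record does not state which coefficients the apparatus uses, so this sketch assumes a window of four points. Fitting a quadratic (or, equivalently here, a cubic) by least squares to four equally spaced points and evaluating it at the center gives multipliers (-1, 9, 9, -1) with normalizer 16. Function names are illustrative.

```python
# Assumed 4-point window: least-squares quadratic/cubic evaluated at the center.
MULTIPLIERS = (-1, 9, 9, -1)
NORMALIZER = 16


def interstitial_points(samples):
    """Midpoint estimate for every run of four consecutive samples."""
    mids = []
    for i in range(len(samples) - 3):
        window = samples[i:i + 4]
        weighted = sum(c * v for c, v in zip(MULTIPLIERS, window))
        mids.append(weighted / NORMALIZER)
    return mids


def densify(samples):
    """Interleave samples with midpoints wherever a full window exists."""
    if len(samples) < 4:
        raise ValueError("need at least four equally spaced samples")
    mids = interstitial_points(samples)
    out = [samples[1]]
    for k, mid in enumerate(mids):
        out.append(mid)             # lies between samples[k + 1] and samples[k + 2]
        out.append(samples[k + 2])
    return out


# Samples of y = x**3 at x = 0..5: the midpoints are reproduced exactly,
# because a cubic lies inside the polynomial family being fitted.
print(densify([x ** 3 for x in range(6)]))
```

The first and last gaps are left unfilled in this sketch because no symmetric four-point window covers them.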
Application of Partial Least Square (PLS) Regression to Determine Landscape-Scale Aquatic Resources Vulnerability in the Ozark Mountains

EPA Science Inventory

Partial least squares (PLS) analysis offers a number of advantages over the more traditionally used regression analyses applied in landscape ecology, particularly for determining the associations among multiple constituents of surface water and landscape configuration. Common dat...
Experimental variability and data pre-processing as factors affecting the discrimination power of some chemometric approaches (PCA, CA and a new algorithm based on linear regression) applied to (+/-)ESI/MS and RPLC/UV data: Application on green tea extracts.

PubMed

Iorgulescu, E; Voicu, V A; Sârbu, C; Tache, F; Albu, F; Medvedovici, A

2016-08-01

The influence of the experimental variability (instrumental repeatability, instrumental intermediate precision and sample preparation variability) and data pre-processing (normalization, peak alignment, background subtraction) on the discrimination power of multivariate data analysis methods (Principal Component Analysis -PCA- and Cluster Analysis -CA-) as well as a new algorithm based on linear regression was studied. Data used in the study were obtained through positive or negative ion monitoring electrospray mass spectrometry (+/-ESI/MS) and reversed phase liquid chromatography/UV spectrometric detection (RPLC/UV) applied to green tea extracts. Extractions in ethanol and heated water infusion were used as sample preparation procedures. The multivariate methods were directly applied to mass spectra and chromatograms, involving strictly a holistic comparison of shapes, without assignment of any structural identity to compounds. An alternative data interpretation based on linear regression analysis mutually applied to data series is also discussed. Slopes, intercepts and correlation coefficients produced by the linear regression analysis applied on pairs of very large experimental data series successfully retain information resulting from high frequency instrumental acquisition rates, obviously better defining the profiles being compared. Consequently, each type of sample or comparison between samples produces in the Cartesian space an ellipsoidal volume defined by the normal variation intervals of the slope, intercept and correlation coefficient. Distances between volumes graphically illustrate (dis)similarities between compared data. The instrumental intermediate precision had the major effect on the discrimination power of the multivariate data analysis methods.
Mass spectra produced through ionization from liquid state in atmospheric pressure conditions of bulk complex mixtures resulting from extracted materials of natural origins provided an excellent data basis for multivariate analysis methods, equivalent to data resulting from chromatographic separations. The alternative evaluation of very large data series based on linear regression analysis produced information equivalent to results obtained through application of PCA and CA. Copyright © 2016 Elsevier B.V. All rights reserved.
Quantile Regression in the Study of Developmental Sciences

PubMed Central

Petscher, Yaacov; Logan, Jessica A. R.

2014-01-01

Linear regression analysis is one of the most common techniques applied in developmental research, but only allows for an estimate of the average relations between the predictor(s) and the outcome. This study describes quantile regression, which provides estimates of the relations between the predictor(s) and outcome, but across multiple points of the outcome’s distribution. Using data from the High School and Beyond and U.S. Sustained Effects Study databases, quantile regression is demonstrated and contrasted with linear regression when considering models with: (a) one continuous predictor, (b) one dichotomous predictor, (c) a continuous and a dichotomous predictor, and (d) a longitudinal application. Results from each example exhibited the differential inferences which may be drawn using linear or quantile regression. PMID:24329596
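The contrast drawn here between mean and quantile estimates rests on the loss being minimised. The sketch below uses the standard check ("pinball") loss and, as a simplifying assumption, fits only a constant, with no predictors and none of the study's data. It shows that the minimiser is a sample quantile and that it is not pulled by an outlier the way the mean is.

```python
def pinball_loss(residual, tau):
    """Check loss: weight tau on positive residuals, (1 - tau) on negative ones."""
    return residual * (tau - (1.0 if residual < 0 else 0.0))


def best_constant(values, tau):
    """Observed value that minimises total pinball loss (a sample quantile)."""
    def total_loss(c):
        return sum(pinball_loss(v - c, tau) for v in values)
    return min(values, key=total_loss)


data = [1, 2, 3, 4, 100]                      # hypothetical values with one outlier
print(sum(data) / len(data))                  # mean, dragged upward by the outlier
print(best_constant(data, tau=0.5))           # median
print(best_constant(data, tau=0.25))          # lower quartile
```

Quantile regression replaces the constant with a function of the predictors and minimises the same loss for each chosen `tau`.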
Identifying Interacting Genetic Variations by Fish-Swarm Logic Regression

PubMed Central

Yang, Aiyuan; Yan, Chunxia; Zhu, Feng; Zhao, Zhongmeng; Cao, Zhi

2013-01-01

Understanding associations between genotypes and complex traits is a fundamental problem in human genetics. A major open problem in mapping phenotypes is that of identifying a set of interacting genetic variants, which might contribute to complex traits. Logic regression (LR) is a powerful multivariant association tool. Several LR-based approaches have been successfully applied to different datasets. However, these approaches are not adequate with regard to accuracy and efficiency. In this paper, we propose a new LR-based approach, called fish-swarm logic regression (FSLR), which improves the logic regression process by incorporating swarm optimization. In our approach, a school of fish agents are conducted in parallel. Each fish agent holds a regression model, while the school searches for better models through various preset behaviors. A swarm algorithm improves the accuracy and the efficiency by speeding up the convergence and preventing it from dropping into local optimums. We apply our approach on a real screening dataset and a series of simulation scenarios. Compared to three existing LR-based approaches, our approach outperforms them by having lower type I and type II error rates, being able to identify more preset causal sites, and performing at faster speeds. PMID:23984382
Modeling the human development index and the percentage of poor people using quantile smoothing splines

NASA Astrophysics Data System (ADS)

Mulyani, Sri; Andriyana, Yudhie; Sudartianto

2017-03-01

Mean regression is a statistical method that explains the relationship between the response variable and the predictor variable based on the central tendency (mean) of the response variable. Parameter estimation in mean regression (with ordinary least squares, OLS) becomes problematic when it is applied to data that are asymmetric, fat-tailed, or contain outliers. Hence, an alternative method is needed for such data, for example quantile regression. Quantile regression is robust to outliers. It can explain the relationship between the response variable and the predictor variable not only at the central tendency of the data (median) but also at various quantiles, giving more complete information about that relationship. In this study, quantile regression is developed with a nonparametric approach, the smoothing spline. A nonparametric approach is used when the model is difficult to prespecify, that is, when the relation between two variables follows an unknown function. We apply the proposed method to poverty data, estimating the Percentage of Poor People as the response variable with the Human Development Index (HDI) as the predictor variable.
Two Paradoxes in Linear Regression Analysis.

PubMed

Feng, Ge; Peng, Jing; Tu, Dongke; Zheng, Julia Z; Feng, Changyong

2016-12-25

Regression is one of the favorite tools in applied statistics. However, misuse and misinterpretation of results from regression analysis are common in biomedical research. In this paper we use statistical theory and simulation studies to clarify some paradoxes around this popular statistical method. In particular, we show that a widely used model selection procedure employed in many publications in top medical journals is wrong. Formal procedures based on solid statistical theory should be used in model selection.
Kernel Partial Least Squares for Nonlinear Regression and Discrimination

NASA Technical Reports Server (NTRS)

Rosipal, Roman; Clancy, Daniel (Technical Monitor)

2002-01-01

This paper summarizes recent results on applying the method of partial least squares (PLS) in a reproducing kernel Hilbert space (RKHS). A previously proposed kernel PLS regression model was proven to be competitive with other regularized regression methods in RKHS. The family of nonlinear kernel-based PLS models is extended by considering the kernel PLS method for discrimination. Theoretical and experimental results on a two-class discrimination problem indicate usefulness of the method.
The contextual effects of social capital on health: a cross-national instrumental variable analysis.

PubMed

Kim, Daniel; Baum, Christopher F; Ganz, Michael L; Subramanian, S V; Kawachi, Ichiro

2011-12-01

Past research on the associations between area-level/contextual social capital and health has produced conflicting evidence. However, interpreting this rapidly growing literature is difficult because estimates using conventional regression are prone to major sources of bias including residual confounding and reverse causation. Instrumental variable (IV) analysis can reduce such bias. Using data on up to 167,344 adults in 64 nations in the European and World Values Surveys and applying IV and ordinary least squares (OLS) regression, we estimated the contextual effects of country-level social trust on individual self-rated health. We further explored whether these associations varied by gender and individual levels of trust. Using OLS regression, we found higher average country-level trust to be associated with better self-rated health in both women and men. Instrumental variable analysis yielded qualitatively similar results, although the estimates were more than double in size in both sexes when country population density and corruption were used as instruments. The estimated health effects of raising the percentage of a country's population that trusts others by 10 percentage points were at least as large as the estimated health effects of an individual developing trust in others. These findings were robust to alternative model specifications and instruments. Conventional regression and to a lesser extent IV analysis suggested that these associations are more salient in women and in women reporting social trust. In a large cross-national study, our findings, including those using instrumental variables, support the presence of beneficial effects of higher country-level trust on self-rated health. Previous findings for contextual social capital using traditional regression may have underestimated the true associations. 
Given the close linkages between self-rated health and all-cause mortality, the public health gains from raising social capital within and across countries may be large. Copyright © 2011 Elsevier Ltd. All rights reserved.

The contextual effects of social capital on health: a cross-national instrumental variable analysis

PubMed Central

Kim, Daniel; Baum, Christopher F; Ganz, Michael; Subramanian, S V; Kawachi, Ichiro

2011-01-01

Past observational studies of the associations of area-level/contextual social capital with health have revealed conflicting findings. However, interpreting this rapidly growing literature is difficult because estimates using conventional regression are prone to major sources of bias including residual confounding and reverse causation. Instrumental variable (IV) analysis can reduce such bias. Using data on up to 167 344 adults in 64 nations in the European and World Values Surveys and applying IV and ordinary least squares (OLS) regression, we estimated the contextual effects of country-level social trust on individual self-rated health. We further explored whether these associations varied by gender and individual levels of trust. Using OLS regression, we found higher average country-level trust to be associated with better self-rated health in both women and men. Instrumental variable analysis yielded qualitatively similar results, although the estimates were more than double in size in women and men using country population density and corruption as instruments. The estimated health effects of raising the percentage of a country's population that trusts others by 10 percentage points were at least as large as the estimated health effects of an individual developing trust in others. These findings were robust to alternative model specifications and instruments. Conventional regression and to a lesser extent IV analysis suggested that these associations are more salient in women and in women reporting social trust. In a large cross-national study, our findings, including those using instrumental variables, support the presence of beneficial effects of higher country-level trust on self-rated health. Past findings for contextual social capital using traditional regression may have underestimated the true associations. 
Given the close linkages between self-rated health and all-cause mortality, the public health gains from raising social capital within countries may be large. PMID:22078106
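The difference between conventional regression and IV estimation can be seen in a toy case with one instrument. The sketch below is an illustration under stated assumptions, not the study's analysis, which used two instruments and would need two-stage least squares. The data are constructed by hand so that the confounder `u` is uncorrelated with the instrument `z`, and the true effect of `x` on `y` is 2.

```python
def mean(values):
    return sum(values) / len(values)


def covariance(a, b):
    ma, mb = mean(a), mean(b)
    return sum((x - ma) * (y - mb) for x, y in zip(a, b)) / len(a)


def ols_slope(x, y):
    """Conventional regression slope of y on x."""
    return covariance(x, y) / covariance(x, x)


def iv_slope(z, x, y):
    """Single-instrument IV (ratio) estimator: cov(z, y) / cov(z, x)."""
    return covariance(z, y) / covariance(z, x)


# Hypothetical data: u confounds x and y, z shifts x but is unrelated to u.
z = [1, -1, 1, -1]
u = [1, 1, -1, -1]
x = [zi + ui for zi, ui in zip(z, u)]
y = [2 * xi + ui for xi, ui in zip(x, u)]

print(ols_slope(x, y))    # biased by the confounder
print(iv_slope(z, x, y))  # recovers the true slope in this construction
```

Here the confounder pushes the conventional estimate upward; with a different confounding structure it can be pushed downward, as the authors suggest for earlier estimates.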
WASP (Write a Scientific Paper) using Excel - 13: Correlation and Regression.

PubMed

Grech, Victor

2018-07-01

Correlation and regression measure the closeness of association between two continuous variables. This paper explains how to perform these tests in Microsoft Excel and their interpretation, as well as how to apply these tests dynamically using Excel's functions. Copyright © 2018 Elsevier B.V. All rights reserved.
Predictive factors of early moderate/severe ovarian hyperstimulation syndrome in non-polycystic ovarian syndrome patients: a statistical model.

PubMed

Ashrafi, Mahnaz; Bahmanabadi, Akram; Akhond, Mohammad Reza; Arabipoor, Arezoo

2015-11-01

To evaluate demographic, medical history and clinical cycle characteristics of infertile non-polycystic ovary syndrome (NPCOS) women with the purpose of investigating their associations with the prevalence of moderate-to-severe OHSS. In this retrospective study, among 7073 in vitro fertilization and/or intracytoplasmic sperm injection (IVF/ICSI) cycles, 86 cases of NPCOS patients who developed moderate-to-severe OHSS while being treated with IVF/ICSI cycles were analyzed during the period of January 2008 to December 2010 at Royan Institute. To review the OHSS risk factors, 172 NPCOS patients without developing OHSS, treated at the same period of time, were selected randomly by computer as control group. We used multiple logistic regression in a backward manner to build a prediction model. The regression analysis revealed that the variables, including age [odds ratio (OR) 0.9, confidence interval (CI) 0.81-0.99], antral follicles count (OR 4.3, CI 2.7-6.9), infertility cause (tubal factor, OR 11.5, CI 1.1-51.3), hypothyroidism (OR 3.8, CI 1.5-9.4) and positive history of ovarian surgery (OR 0.2, CI 0.05-0.9) were the most important predictors of OHSS. The regression model had an area under curve of 0.94, presenting an allowable discriminative performance that was equal with two strong predictive variables, including the number of follicles and serum estradiol level on human chorionic gonadotropin day. The predictive regression model based on primary characteristics of NPCOS patients had equal specificity in comparison with two mentioned strong predictive variables. Therefore, it may be beneficial to apply this model before the beginning of ovarian stimulation protocol.
Estimation of standard liver volume in Chinese adult living donors.

PubMed

Fu-Gui, L; Lu-Nan, Y; Bo, L; Yong, Z; Tian-Fu, W; Ming-Qing, X; Wen-Tao, W; Zhe-Yu, C

2009-12-01

To determine a formula predicting the standard liver volume based on body surface area (BSA) or body weight in Chinese adults. A total of 115 consecutive right-lobe living donors not including the middle hepatic vein underwent right hemi-hepatectomy. No organs were used from prisoners, and no subjects were prisoners. Donor anthropometric data including age, gender, body weight, and body height were recorded prospectively. The weights and volumes of the right lobe liver grafts were measured at the back table. Liver weights and volumes were calculated from the right lobe graft weight and volume obtained at the back table, divided by the proportion of the right lobe on computed tomography. By simple linear regression analysis and stepwise multiple linear regression analysis, we correlated calculated liver volume and body height, body weight, or body surface area. The subjects had a mean age of 35.97 +/- 9.6 years, and a female-to-male ratio of 60:55. The mean volume of the right lobe was 727.47 +/- 136.17 mL, occupying 55.59% +/- 6.70% of the whole liver by computed tomography. The volume of the right lobe was 581.73 +/- 96.137 mL, and the estimated liver volume was 1053.08 +/- 167.56 mL. Females of the same body weight showed a slightly lower liver weight. By simple linear regression analysis and stepwise multiple linear regression analysis, a formula was derived based on body weight. All formulae except the Hong Kong formula overestimated liver volume compared to this formula. The formula of standard liver volume, SLV (mL) = 11.508 x body weight (kg) + 334.024, may be applied to estimate liver volumes in Chinese adults.
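The formula reported at the end of this abstract is a one-line linear model. The sketch below evaluates it for a few body weights; the coefficients come from the abstract, while the function name and the example weights are illustrative.

```python
def standard_liver_volume_ml(body_weight_kg):
    """SLV (mL) = 11.508 x body weight (kg) + 334.024, as reported in the abstract."""
    if body_weight_kg <= 0:
        raise ValueError("body weight must be positive")
    return 11.508 * body_weight_kg + 334.024


# Example weights are hypothetical and only show how the estimate scales.
for weight in (50, 60, 70):
    print(weight, "kg ->", round(standard_liver_volume_ml(weight), 1), "mL")
```

The abstract derives the formula from Chinese adult living donors, so applying it to other populations would be an extrapolation.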
Analytical learning and term-rewriting systems

NASA Technical Reports Server (NTRS)

Laird, Philip; Gamble, Evan

1990-01-01

Analytical learning is a set of machine learning techniques for revising the representation of a theory based on a small set of examples of that theory. When the representation of the theory is correct and complete but perhaps inefficient, an important objective of such analysis is to improve the computational efficiency of the representation. Several algorithms with this purpose have been suggested, most of which are closely tied to a first order logical language and are variants of goal regression, such as the familiar explanation based generalization (EBG) procedure. But because predicate calculus is a poor representation for some domains, these learning algorithms are extended to apply to other computational models. It is shown that the goal regression technique applies to a large family of programming languages, all based on a kind of term rewriting system. Included in this family are three language families of importance to artificial intelligence: logic programming, such as Prolog; lambda calculus, such as LISP; and combinatorial based languages, such as FP. A new analytical learning algorithm, AL-2, is exhibited that learns from success but is otherwise quite different from EBG. These results suggest that term rewriting systems are a good framework for analytical learning research in general, and that further research should be directed toward developing new techniques.
Non-Intrusive Measurement Techniques Applied to the Hybrid Solid Fuel Degradation

NASA Astrophysics Data System (ADS)

Cauty, F.

2004-10-01

Knowledge of the solid fuel regression rate and of the time evolution of the grain geometry is required for hybrid motor design and control of its operating conditions. Two non-intrusive techniques (NDT) have been applied to hybrid propulsion: both are based on wave propagation, the X-rays and the ultrasounds, through the materials. X-ray techniques allow local thickness measurements (attenuated signal level) using small probes or 2D images (Real Time Radiography), with a link between the size of field of view and accuracy. Besides the safety hazards associated with the high-intensity X-ray systems, the image analysis requires the use of quite complex post-processing techniques. The ultrasound technique is more widely used in energetic material applications, including hybrid fuels. Depending upon the transducer size and the associated equipment, the application domain is large, from tiny samples to the quad-port wagon wheel grain of the 1.1 MN thrust HPDP motor. The effect of the physical quantities has to be taken into account in the wave propagation analysis. With respect to the various applications, there is no unique and perfect experimental method to measure the fuel regression rate. The best solution could be obtained by combining two techniques at the same time, each technique enhancing the quality of the global data.
Assessing Local Model Adequacy in Bayesian Hierarchical Models Using the Partitioned Deviance Information Criterion

PubMed Central

Wheeler, David C.; Hickson, DeMarc A.; Waller, Lance A.

2010-01-01

Many diagnostic tools and goodness-of-fit measures, such as the Akaike information criterion (AIC) and the Bayesian deviance information criterion (DIC), are available to evaluate the overall adequacy of linear regression models. In addition, visually assessing adequacy in models has become an essential part of any regression analysis. In this paper, we focus on a spatial consideration of the local DIC measure for model selection and goodness-of-fit evaluation. We use a partitioning of the DIC into the local DIC, leverage, and deviance residuals to assess local model fit and influence for both individual observations and groups of observations in a Bayesian framework. We use visualization of the local DIC and differences in local DIC between models to assist in model selection and to visualize the global and local impacts of adding covariates or model parameters. We demonstrate the utility of the local DIC in assessing model adequacy using HIV prevalence data from pregnant women in the Butare province of Rwanda during 1989-1993 using a range of linear model specifications, from global effects only to spatially varying coefficient models, and a set of covariates related to sexual behavior. Results of applying the diagnostic visualization approach include more refined model selection and greater understanding of the models as applied to the data. PMID:21243121
Has there been a change in the knowledge of GP registrars between 2011 and 2016 as measured by performance on common items in the Applied Knowledge Test?

PubMed

Neden, Catherine A; Parkin, Claire; Blow, Carol; Siriwardena, Aloysius Niroshan

2018-05-08

The aim of this study was to assess whether the absolute standard of candidates sitting the MRCGP Applied Knowledge Test (AKT) between 2011 and 2016 had changed. It is a descriptive study comparing the performance on marker questions of a reference group of UK graduates taking the AKT for the first time between 2011 and 2016. Using aggregated examination data, the performance of individual 'marker' questions was compared using Pearson's chi-squared tests and trend-line analysis. Binary logistic regression was used to analyse changes in performance over the study period. Changes in performance of individual marker questions using Pearson's chi-squared test showed statistically significant differences in 32 of the 49 questions included in the study. Trend line analysis showed a positive trend in 29 questions and a negative trend in the remaining 23. The magnitude of change was small. Logistic regression did not demonstrate any evidence for a change in the performance of the question set over the study period. However, candidates were more likely to get items on administration wrong compared with clinical medicine or research. There was no evidence of a change in performance of the question set as a whole.
Round Robin evaluation of soil moisture retrieval models for the MetOp-A ASCAT Instrument

NASA Astrophysics Data System (ADS)

Gruber, Alexander; Paloscia, Simonetta; Santi, Emanuele; Notarnicola, Claudia; Pasolli, Luca; Smolander, Tuomo; Pulliainen, Jouni; Mittelbach, Heidi; Dorigo, Wouter; Wagner, Wolfgang

2014-05-01

Global soil moisture observations are crucial to understand hydrologic processes, earth-atmosphere interactions and climate variability. ESA's Climate Change Initiative (CCI) project aims to create a global consistent long-term soil moisture data set based on the merging of the best available active and passive satellite-based microwave sensors and retrieval algorithms. Within the CCI, a Round Robin evaluation of existing retrieval algorithms for both active and passive instruments was carried out. In this study we present the comparison of five different retrieval algorithms covering three different modelling principles applied to active MetOp-A ASCAT L1 backscatter data. These models include statistical models (Bayesian Regression and Support Vector Regression, provided by the Institute for Applied Remote Sensing, Eurac Research Viale Druso, Italy, and an Artificial Neural Network, provided by the Institute of Applied Physics, CNR-IFAC, Italy), a semi-empirical model (provided by the Finnish Meteorological Institute), and a change detection model (provided by the Vienna University of Technology). The algorithms were applied on L1 backscatter data within the period of 2007-2011, resampled to a 12.5 km grid. The evaluation was performed over 75 globally distributed, quality controlled in situ stations drawn from the International Soil Moisture Network (ISMN) using surface soil moisture data from the Global Land Data Assimilation System (GLDAS) Noah land surface model as a second independent reference. The temporal correlation between the data sets was analyzed and random errors of the different algorithms were estimated using the triple collocation method. Absolute soil moisture values as well as soil moisture anomalies were considered, including both long-term anomalies from the mean seasonal cycle and short-term anomalies from a five-week moving average window. Results show a very high agreement between all five algorithms for most stations. A slight vegetation dependency of the errors and a spatial decorrelation of the performance patterns of the different algorithms were found. We conclude that future research should focus on understanding, combining and exploiting the advantages of all available modelling approaches rather than trying to optimize one approach to fit every possible condition.
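The abstract above names the triple collocation method but does not say which formulation was used. As a rough illustration, the sketch below uses the common covariance-based form, assuming three collocated series whose errors are zero-mean, mutually uncorrelated, and uncorrelated with the true signal. All function names and the synthetic data are hypothetical; nothing here reproduces the study's data or processing chain.

```python
import random


def covariance(a, b):
    """Sample covariance of two equal-length sequences."""
    n = len(a)
    mean_a = sum(a) / n
    mean_b = sum(b) / n
    return sum((u - mean_a) * (v - mean_b) for u, v in zip(a, b)) / (n - 1)


def triple_collocation_error_variances(x, y, z):
    """Estimate the random-error variance of each of three collocated series.

    Covariance-based form (an assumption, not taken from the abstract):
    var(e_x) = C_xx - C_xy * C_xz / C_yz, and cyclically for y and z.
    """
    c_xy = covariance(x, y)
    c_xz = covariance(x, z)
    c_yz = covariance(y, z)
    var_ex = covariance(x, x) - c_xy * c_xz / c_yz
    var_ey = covariance(y, y) - c_xy * c_yz / c_xz
    var_ez = covariance(z, z) - c_xz * c_yz / c_xy
    return var_ex, var_ey, var_ez


def simulate_three_datasets(n=20000, seed=0):
    """Synthetic truth plus independent Gaussian errors (std 0.2, 0.3, 0.4)."""
    rng = random.Random(seed)
    truth = [rng.gauss(0.0, 1.0) for _ in range(n)]
    x = [t + rng.gauss(0.0, 0.2) for t in truth]
    y = [t + rng.gauss(0.0, 0.3) for t in truth]
    z = [t + rng.gauss(0.0, 0.4) for t in truth]
    return x, y, z


if __name__ == "__main__":
    x, y, z = simulate_three_datasets()
    for name, value in zip("xyz", triple_collocation_error_variances(x, y, z)):
        print(f"estimated error variance of {name}: {value:.3f}")
```

With the synthetic inputs the estimates should land near the simulated error variances (0.04, 0.09 and 0.16); with real satellite, model and in situ series the independence assumptions need to be checked before the numbers are trusted.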
Geodesic least squares regression on information manifolds

DOE Office of Scientific and Technical Information (OSTI.GOV)

Verdoolaege, Geert, E-mail: geert.verdoolaege@ugent.be

We present a novel regression method targeted at situations with significant uncertainty on both the dependent and independent variables or with non-Gaussian distribution models. Unlike the classic regression model, the conditional distribution of the response variable suggested by the data need not be the same as the modeled distribution. Instead they are matched by minimizing the Rao geodesic distance between them. This yields a more flexible regression method that is less constrained by the assumptions imposed through the regression model. As an example, we demonstrate the improved resistance of our method against some flawed model assumptions and we apply this to scaling laws in magnetic confinement fusion.
INNOVATIVE INSTRUMENTATION AND ANALYSIS OF THE TEMPERATURE MEASUREMENT FOR HIGH TEMPERATURE GASIFICATION

DOE Office of Scientific and Technical Information (OSTI.GOV)

Seong W. Lee

During this reporting period, the literature survey including the gasifier temperature measurement literature, the ultrasonic application and its background study in cleaning application, and spray coating process are completed. The gasifier simulator (cold model) testing has been successfully conducted. Four factors (blower voltage, ultrasonic application, injection time intervals, particle weight) were considered as significant factors that affect the temperature measurement. The Analysis of Variance (ANOVA) was applied to analyze the test data. The analysis shows that all four factors are significant to the temperature measurements in the gasifier simulator (cold model). The regression analysis for the case with the normalized room temperature shows that the linear model fits the temperature data with 82% accuracy (18% error). The regression analysis for the case without the normalized room temperature shows 72.5% accuracy (27.5% error). The nonlinear regression analysis indicates a better fit than that of the linear regression. The nonlinear regression model's accuracy is 88.7% (11.3% error) for the normalized room temperature case, which is better than the linear regression analysis. The hot model thermocouple sleeve design and fabrication are completed. The gasifier simulator (hot model) design and the fabrication are completed. The system tests of the gasifier simulator (hot model) have been conducted and some modifications have been made. Based on the system tests and results analysis, the gasifier simulator (hot model) has met the proposed design requirement and is ready for system test. The ultrasonic cleaning method is under evaluation and will be further studied for the gasifier simulator (hot model) application. The progress of this project has been on schedule.
Classification and regression tree analysis of acute-on-chronic hepatitis B liver failure: Seeing the forest for the trees.

PubMed

Shi, K-Q; Zhou, Y-Y; Yan, H-D; Li, H; Wu, F-L; Xie, Y-Y; Braddock, M; Lin, X-Y; Zheng, M-H

2017-02-01

At present, there is no ideal model for predicting the short-term outcome of patients with acute-on-chronic hepatitis B liver failure (ACHBLF). This study aimed to establish and validate a prognostic model by using the classification and regression tree (CART) analysis. A total of 1047 patients from two separate medical centres with suspected ACHBLF were screened in the study, which were recognized as derivation cohort and validation cohort, respectively. CART analysis was applied to predict the 3-month mortality of patients with ACHBLF. The accuracy of the CART model was tested using the area under the receiver operating characteristic curve, which was compared with the model for end-stage liver disease (MELD) score and a new logistic regression model. CART analysis identified four variables as prognostic factors of ACHBLF: total bilirubin, age, serum sodium and INR, and three distinct risk groups: low risk (4.2%), intermediate risk (30.2%-53.2%) and high risk (81.4%-96.9%). The new logistic regression model was constructed with four independent factors, including age, total bilirubin, serum sodium and prothrombin activity by multivariate logistic regression analysis. The performances of the CART model (0.896), similar to the logistic regression model (0.914, P=.382), exceeded that of MELD score (0.667, P<.001). The results were confirmed in the validation cohort. We have developed and validated a novel CART model superior to MELD for predicting three-month mortality of patients with ACHBLF. Thus, the CART model could facilitate medical decision-making and provide clinicians with a validated practical bedside tool for ACHBLF risk stratification. © 2016 John Wiley & Sons Ltd.
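The models in the abstract above are compared by the area under the receiver operating characteristic curve. As a reminder of what that number measures, here is a minimal pure-Python sketch that computes it as the fraction of (positive, negative) pairs ranked correctly, with ties counted as half. The function name and the toy labels and scores are assumptions for illustration and have no connection to the study's data.

```python
def roc_auc(labels, scores):
    """Area under the ROC curve via pairwise comparison.

    labels: iterable of 0/1 outcomes (1 = event, e.g. death within 3 months)
    scores: predicted risk for each subject, higher meaning higher risk
    """
    positives = [s for l, s in zip(labels, scores) if l == 1]
    negatives = [s for l, s in zip(labels, scores) if l == 0]
    if not positives or not negatives:
        raise ValueError("need at least one positive and one negative case")

    correctly_ranked = 0.0
    for p in positives:
        for q in negatives:
            if p > q:
                correctly_ranked += 1.0
            elif p == q:
                correctly_ranked += 0.5  # ties count as half
    return correctly_ranked / (len(positives) * len(negatives))


if __name__ == "__main__":
    # Toy example: three of the four positive/negative pairs are ordered correctly.
    print(roc_auc([0, 0, 1, 1], [0.1, 0.4, 0.35, 0.8]))  # 0.75
```

The pairwise form is quadratic in the number of subjects, which is fine for a sketch; rank-based implementations are preferable for large cohorts.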
Section 3. The SPARROW Surface Water-Quality Model: Theory, Application and User Documentation

USGS Publications Warehouse

Schwarz, G.E.; Hoos, A.B.; Alexander, R.B.; Smith, R.A.

2006-01-01

SPARROW (SPAtially Referenced Regressions On Watershed attributes) is a watershed modeling technique for relating water-quality measurements made at a network of monitoring stations to attributes of the watersheds containing the stations. The core of the model consists of a nonlinear regression equation describing the non-conservative transport of contaminants from point and diffuse sources on land to rivers and through the stream and river network. The model predicts contaminant flux, concentration, and yield in streams and has been used to evaluate alternative hypotheses about the important contaminant sources and watershed properties that control transport over large spatial scales. This report provides documentation for the SPARROW modeling technique and computer software to guide users in constructing and applying basic SPARROW models. The documentation gives details of the SPARROW software, including the input data and installation requirements, and guidance in the specification, calibration, and application of basic SPARROW models, as well as descriptions of the model output and its interpretation. The documentation is intended for both researchers and water-resource managers with interest in using the results of existing models and developing and applying new SPARROW models. The documentation of the model is presented in two parts. Part 1 provides a theoretical and practical introduction to SPARROW modeling techniques, which includes a discussion of the objectives, conceptual attributes, and model infrastructure of SPARROW. Part 1 also includes background on the commonly used model specifications and the methods for estimating and evaluating parameters, evaluating model fit, and generating water-quality predictions and measures of uncertainty. Part 2 provides a user's guide to SPARROW, which includes a discussion of the software architecture and details of the model input requirements and output files, graphs, and maps. The text documentation and computer software are available on the Web at http://usgs.er.gov/sparrow/sparrow-mod/.
Two Paradoxes in Linear Regression Analysis

PubMed Central

FENG, Ge; PENG, Jing; TU, Dongke; ZHENG, Julia Z.; FENG, Changyong

2016-01-01

Summary: Regression is one of the favorite tools in applied statistics. However, misuse and misinterpretation of results from regression analysis are common in biomedical research. In this paper we use statistical theory and simulation studies to clarify some paradoxes around this popular statistical method. In particular, we show that a widely used model selection procedure employed in many publications in top medical journals is wrong. Formal procedures based on solid statistical theory should be used in model selection. PMID:28638214
EMD-regression for modelling multi-scale relationships, and application to weather-related cardiovascular mortality

NASA Astrophysics Data System (ADS)

Masselot, Pierre; Chebana, Fateh; Bélanger, Diane; St-Hilaire, André; Abdous, Belkacem; Gosselin, Pierre; Ouarda, Taha B. M. J.

2018-01-01

In a number of environmental studies, relationships between natural processes are often assessed through regression analyses, using time series data. Such data are often multi-scale and non-stationary, leading to poor accuracy of the resulting regression models and therefore to results with moderate reliability. To deal with this issue, the present paper introduces the EMD-regression methodology, which consists of applying the empirical mode decomposition (EMD) algorithm to data series and then using the resulting components in regression models. The proposed methodology presents a number of advantages. First, it accounts for the non-stationarity of the data series. Second, this approach acts as a scan for the relationship between a response variable and the predictors at different time scales, providing new insights about this relationship. To illustrate the proposed methodology, it is applied to study the relationship between weather and cardiovascular mortality in Montreal, Canada. The results provide new knowledge concerning the studied relationship. For instance, they show that humidity can cause excess mortality at the monthly time scale, which is a scale not visible in classical models. A comparison is also conducted with state-of-the-art methods, namely generalized additive models and distributed lag models, both widely used in weather-related health studies. The comparison shows that EMD-regression achieves better prediction performance and provides more details than classical models concerning the relationship.
Using soft computing techniques to predict corrected air permeability using Thomeer parameters, air porosity and grain density

NASA Astrophysics Data System (ADS)

Nooruddin, Hasan A.; Anifowose, Fatai; Abdulraheem, Abdulazeez

2014-03-01

Soft computing techniques have recently become very popular in the oil industry. A number of computational intelligence-based predictive methods have been widely applied in the industry with high prediction capabilities. Some of the popular methods include feed-forward neural networks, radial basis function network, generalized regression neural network, functional networks, support vector regression and adaptive network fuzzy inference system. A comparative study among the most popular soft computing techniques is presented using a large dataset published in the literature describing multimodal pore systems in the Arab D formation. The inputs to the models are air porosity, grain density, and Thomeer parameters obtained using mercury injection capillary pressure profiles. Corrected air permeability is the target variable. Applying the developed permeability models in recent reservoir characterization workflows ensures consistency between micro and macro scale information represented mainly by Thomeer parameters and absolute permeability. The dataset was divided into two parts with 80% of data used for training and 20% for testing. The target permeability variable was transformed to the logarithmic scale as a pre-processing step and to show better correlations with the input variables. Statistical and graphical analyses of the results, including permeability cross-plots and detailed error measures, were created. In general, the comparative study showed very close results among the developed models. The feed-forward neural network permeability model showed the lowest average relative error, average absolute relative error, standard deviation of error and root mean square error, making it the best model for such problems. Adaptive network fuzzy inference system also showed very good results.
Clinical responses to focused ultrasound applied to women with vulval intraepithelial neoplasia.

PubMed

Jia, Ying; Wu, Jin; Xu, Man; Tang, Liangdan; Li, Chengzhi; Luo, Ming; Lou, Meng

2014-11-01

Focused ultrasound waves penetrate superficial tissues and are aimed toward the target tissues at specific depths to exert their biological effects. Focused ultrasound has been applied for a number of clinical indications, including vulval dystrophies and low-grade vulval disease. This study aimed to assess the efficacy and safety of focused ultrasound treatment of high-grade vulval intraepithelial neoplasia (VIN). Eighteen women with high-grade VIN were recruited and treated with focused ultrasound. During each posttreatment follow-up, the safety of, side effects of, and clinical responses to focused ultrasound were evaluated by a standardized protocol, including symptoms, clinical appearance, and histologic findings. All patients completed the designed follow-ups. In most cases, superficial mild to moderate swelling and blisters were seen in the focused ultrasound-treated skin but not in adjacent normal skin. Of the 18 patients, 16 showed complete histologic regression and resolution of symptoms 6 months after treatment. Of the other 2 patients, 1 showed complete regression after a second focused ultrasound treatment. The other patient did not respond to the focused ultrasound treatment and underwent a partial vulvectomy 6 months after treatment. None of the patients developed invasive carcinoma of the vulva during the follow-up period. One patient had local pruritus that was not alleviated by anti-inflammatory medication and local care. The complete responses observed in women with high-grade VIN treated by focused ultrasound, together with the preservation of adjacent normal tissue, suggest that focused ultrasound may be considered for treatment of high-grade VIN. © 2014 by the American Institute of Ultrasound in Medicine.
Serum Folate Shows an Inverse Association with Blood Pressure in a Cohort of Chinese Women of Childbearing Age: A Cross-Sectional Study

PubMed Central

Shen, Minxue; Tan, Hongzhuan; Zhou, Shujin; Retnakaran, Ravi; Smith, Graeme N.; Davidge, Sandra T.; Trasler, Jacquetta; Walker, Mark C.; Wen, Shi Wu

2016-01-01

Background: It has been reported that higher folate intake from food and supplementation is associated with decreased blood pressure (BP). The association between serum folate concentration and BP has been examined in few studies. We aim to examine the association between serum folate and BP levels in a cohort of young Chinese women. Methods: We used the baseline data from a pre-conception cohort of women of childbearing age in Liuyang, China, for this study. Demographic data were collected by structured interview. Serum folate concentration was measured by immunoassay, and homocysteine, blood glucose, triglyceride and total cholesterol were measured through standardized clinical procedures. Multiple linear regression and principal component regression model were applied in the analysis. Results: A total of 1,532 healthy normotensive non-pregnant women were included in the final analysis. The mean concentration of serum folate was 7.5 ± 5.4 nmol/L and 55% of the women presented with folate deficiency (< 6.8 nmol/L). Multiple linear regression and principal component regression showed that serum folate levels were inversely associated with systolic and diastolic BP, after adjusting for demographic, anthropometric, and biochemical factors. Conclusions: Serum folate is inversely associated with BP in non-pregnant women of childbearing age with high prevalence of folate deficiency. PMID:27182603
Generating patient specific pseudo-CT of the head from MR using atlas-based regression

NASA Astrophysics Data System (ADS)

Sjölund, J.; Forsberg, D.; Andersson, M.; Knutsson, H.

2015-01-01

Radiotherapy planning and attenuation correction of PET images require simulation of radiation transport. The necessary physical properties are typically derived from computed tomography (CT) images, but in some cases, including stereotactic neurosurgery and combined PET/MR imaging, only magnetic resonance (MR) images are available. With these applications in mind, we describe how a realistic, patient-specific, pseudo-CT of the head can be derived from anatomical MR images. We refer to the method as atlas-based regression, because of its similarity to atlas-based segmentation. Given a target MR and an atlas database comprising MR and CT pairs, atlas-based regression works by registering each atlas MR to the target MR, applying the resulting displacement fields to the corresponding atlas CTs and, finally, fusing the deformed atlas CTs into a single pseudo-CT. We use a deformable registration algorithm known as the Morphon and augment it with a certainty mask that allows a tailoring of the influence certain regions are allowed to have on the registration. Moreover, we propose a novel method of fusion, wherein the collection of deformed CTs is iteratively registered to their joint mean and find that the resulting mean CT becomes more similar to the target CT. However, the voxelwise median provided even better results; at least as good as earlier work that required special MR imaging techniques. This makes atlas-based regression a good candidate for clinical use.
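The final fusion step described in the abstract above, combining the deformed atlas CTs voxel by voxel, can be sketched without any imaging library. The example below treats each deformed CT as a flat list of intensity values and contrasts the voxelwise mean with the voxelwise median; the function name and the toy values are assumptions, and the registration step (the Morphon) and the iterative mean-based fusion are deliberately left out.

```python
from statistics import mean, median


def fuse_voxelwise(deformed_cts, reducer=median):
    """Fuse several already-registered CT volumes into one pseudo-CT.

    deformed_cts: list of equal-length sequences, one per atlas, each holding
                  the voxel intensities of a deformed atlas CT (flattened).
    reducer:      function applied across atlases at each voxel, for example
                  statistics.median (default) or statistics.mean.
    """
    if not deformed_cts:
        raise ValueError("need at least one deformed CT")
    if len({len(ct) for ct in deformed_cts}) != 1:
        raise ValueError("all deformed CTs must have the same number of voxels")
    return [reducer(values) for values in zip(*deformed_cts)]


if __name__ == "__main__":
    # Hypothetical intensities for three atlases and three voxels; the last
    # voxel has one badly registered atlas (-1000, i.e. air instead of bone).
    atlases = [
        [0, 100, 1000],
        [10, 120, -1000],
        [20, 110, 900],
    ]
    print("median fusion:", fuse_voxelwise(atlases, median))  # [10, 110, 900]
    print("mean fusion:  ", fuse_voxelwise(atlases, mean))    # [10, 110, 300]
```

The toy third voxel shows one plausible reason a median can behave better than a mean: a single misregistered atlas pulls the mean far from the other values but leaves the median unchanged. Whether this is the mechanism behind the reported result is not stated in the abstract.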
Effect of folic acid on appetite in children: ordinal logistic and fuzzy logistic regressions.

PubMed

Namdari, Mahshid; Abadi, Alireza; Taheri, S Mahmoud; Rezaei, Mansour; Kalantari, Naser; Omidvar, Nasrin

2014-03-01

Reduced appetite and low food intake are often a concern in preschool children, since it can lead to malnutrition, a leading cause of impaired growth and mortality in childhood. It is occasionally considered that folic acid has a positive effect on appetite enhancement and consequently growth in children. The aim of this study was to assess the effect of folic acid on the appetite of preschool children 3 to 6 y old. The study sample included 127 children ages 3 to 6 who were randomly selected from 20 preschools in the city of Tehran in 2011. Since appetite was measured by linguistic terms, a fuzzy logistic regression was applied for modeling. The obtained results were compared with a statistical ordinal logistic model. After controlling for the potential confounders, in a statistical ordinal logistic model, serum folate showed a significantly positive effect on appetite. A small but positive effect of folate was detected by fuzzy logistic regression. Based on fuzzy regression, the risk for poor appetite in preschool children was related to the employment status of their mothers. In this study, a positive association was detected between the levels of serum folate and improved appetite. For further investigation, a randomized controlled, double-blind clinical trial could be helpful to address causality. Copyright © 2014 Elsevier Inc. All rights reserved.

Addressing data privacy in matched studies via virtual pooling.

PubMed

Saha-Chaudhuri, P; Weinberg, C R

2017-09-07

Data confidentiality and shared use of research data are two desirable but sometimes conflicting goals in research with multi-center studies and distributed data. While ideal for straightforward analysis, confidentiality restrictions forbid creation of a single dataset that includes covariate information of all participants. Current approaches such as aggregate data sharing, distributed regression, meta-analysis and score-based methods can have important limitations. We propose a novel application of an existing epidemiologic tool, specimen pooling, to enable confidentiality-preserving analysis of data arising from a matched case-control, multi-center design. Instead of pooling specimens prior to assay, we apply the methodology to virtually pool (aggregate) covariates within nodes. Such virtual pooling retains most of the information used in an analysis with individual data and since individual participant data is not shared externally, within-node virtual pooling preserves data confidentiality. We show that aggregated covariate levels can be used in a conditional logistic regression model to estimate individual-level odds ratios of interest. The parameter estimates from the standard conditional logistic regression are compared to the estimates based on a conditional logistic regression model with aggregated data. The parameter estimates are shown to be similar to those without pooling and to have comparable standard errors and confidence interval coverage. Virtual data pooling can be used to maintain confidentiality of data from multi-center study and can be particularly useful in research with large-scale distributed data.
Spectral Regression Based Fault Feature Extraction for Bearing Accelerometer Sensor Signals

PubMed Central

Xia, Zhanguo; Xia, Shixiong; Wan, Ling; Cai, Shiyu

2012-01-01

Bearings are not only the most important element but also a common source of failures in rotary machinery. Bearing fault prognosis technology has been receiving more and more attention recently, in particular because it plays an increasingly important role in avoiding the occurrence of accidents. Therein, fault feature extraction (FFE) of bearing accelerometer sensor signals is essential to highlight representative features of bearing conditions for machinery fault diagnosis and prognosis. This paper proposes a spectral regression (SR)-based approach for fault feature extraction from original features including time, frequency and time-frequency domain features of bearing accelerometer sensor signals. SR is a novel regression framework for efficient regularized subspace learning and feature extraction technology, and it uses the least squares method to obtain the best projection direction, rather than computing the density matrix of features, so it also has the advantage in dimensionality reduction. The effectiveness of the SR-based method is validated experimentally by applying the acquired vibration signals data to bearings. The experimental results indicate that SR can reduce the computation cost and preserve more structure information about different bearing faults and severities, and it is demonstrated that the proposed feature extraction scheme has an advantage over other similar approaches. PMID:23202017
Measurement error in epidemiologic studies of air pollution based on land-use regression models.

PubMed

Basagaña, Xavier; Aguilera, Inmaculada; Rivera, Marcela; Agis, David; Foraster, Maria; Marrugat, Jaume; Elosua, Roberto; Künzli, Nino

2013-10-15

Land-use regression (LUR) models are increasingly used to estimate air pollution exposure in epidemiologic studies. These models use air pollution measurements taken at a small set of locations and modeling based on geographical covariates for which data are available at all study participant locations. The process of LUR model development commonly includes a variable selection procedure. When LUR model predictions are used as explanatory variables in a model for a health outcome, measurement error can lead to bias of the regression coefficients and to inflation of their variance. In previous studies dealing with spatial predictions of air pollution, bias was shown to be small while most of the effect of measurement error was on the variance. In this study, we show that in realistic cases where LUR models are applied to health data, bias in health-effect estimates can be substantial. This bias depends on the number of air pollution measurement sites, the number of available predictors for model selection, and the amount of explainable variability in the true exposure. These results should be taken into account when interpreting health effects from studies that used LUR models.
Adjusted regression trend test for a multicenter clinical trial.

PubMed

Quan, H; Capizzi, T

1999-06-01

Studies using a series of increasing doses of a compound, including a zero dose control, are often conducted to study the effect of the compound on the response of interest. For a one-way design, Tukey et al. (1985, Biometrics 41, 295-301) suggested assessing trend by examining the slopes of regression lines under arithmetic, ordinal, and arithmetic-logarithmic dose scalings. They reported the smallest p-value for the three significance tests on the three slopes for safety assessments. Capizzi et al. (1992, Biometrical Journal 34, 275-289) suggested an adjusted trend test, which adjusts the p-value using a trivariate t-distribution, the joint distribution of the three slope estimators. In this paper, we propose an adjusted regression trend test suitable for two-way designs, particularly for multicenter clinical trials. In a step-down fashion, the proposed trend test can be applied to a multicenter clinical trial to compare each dose with the control. This sequential procedure is a closed testing procedure for a trend alternative. Therefore, it adjusts p-values and maintains experimentwise error rate. Simulation results show that the step-down trend test is overall more powerful than a step-down least significant difference test.
Quantification of endocrine disruptors and pesticides in water by gas chromatography-tandem mass spectrometry. Method validation using weighted linear regression schemes.

PubMed

Mansilha, C; Melo, A; Rebelo, H; Ferreira, I M P L V O; Pinho, O; Domingues, V; Pinho, C; Gameiro, P

2010-10-22

A multi-residue methodology based on solid-phase extraction followed by gas chromatography-tandem mass spectrometry was developed for trace analysis of 32 compounds in water matrices, including estrogens and several pesticides from different chemical families, some of them with endocrine-disrupting properties. Matrix standard calibration solutions were prepared by adding known amounts of the analytes to a residue-free sample to compensate for the matrix-induced chromatographic response enhancement observed for certain pesticides. Validation was done mainly according to the International Conference on Harmonisation recommendations, as well as some European and American validation guidelines with specifications for pesticide analysis and/or GC-MS methodology. Because the assumption of homoscedasticity was not met for the analytical data, a weighted least squares linear regression procedure was applied as a simple and effective way to counteract the greater influence of the higher concentrations on the fitted regression line, improving accuracy at the lower end of the calibration curve. The method was considered validated for 31 compounds after consistent evaluation of the key analytical parameters: specificity, linearity, limit of detection and quantification, range, precision, accuracy, extraction efficiency, stability and robustness.
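The weighting idea in the abstract above can be illustrated with a minimal sketch. The calibration points and the 1/x² weighting scheme are hypothetical (the abstract does not state which weights were selected), and the function name `weighted_linear_fit` is ours, not the authors'.

```python
def weighted_linear_fit(x, y, w):
    """Fit y = a + b*x by minimizing sum(w_i * (y_i - a - b*x_i)**2)."""
    sw = sum(w)
    # Weighted means of x and y
    xbar = sum(wi * xi for wi, xi in zip(w, x)) / sw
    ybar = sum(wi * yi for wi, yi in zip(w, y)) / sw
    # Weighted sums of squares and cross-products about the means
    sxx = sum(wi * (xi - xbar) ** 2 for wi, xi in zip(w, x))
    sxy = sum(wi * (xi - xbar) * (yi - ybar) for wi, xi, yi in zip(w, x, y))
    slope = sxy / sxx
    intercept = ybar - slope * xbar
    return intercept, slope


# Hypothetical calibration standards (concentration, instrument response)
conc = [1.0, 2.0, 5.0, 10.0, 50.0, 100.0]
resp = [2.1, 3.9, 10.4, 19.5, 103.0, 196.0]

# Equal weights reproduce ordinary least squares
ols = weighted_linear_fit(conc, resp, [1.0] * len(conc))
# 1/x^2 weights (an assumed scheme) down-weight the high standards
wls = weighted_linear_fit(conc, resp, [1.0 / c ** 2 for c in conc])
```

Comparing the back-calculated concentration of the lowest standard under `ols` and `wls` is one way to check whether a given weighting scheme improves accuracy at the low end of the curve.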
REASONS AND CONSEQUENCES OF APPLIED LEADERSHIP STYLES IN ETHICAL DILEMMAS WHEN NURSE MANAGERS MAKE DECISIONS.

PubMed

Zydziunaite, V; Suominen, T

2014-09-21

Background: Understanding the reasons and consequences of leadership styles in ethical dilemmas is fundamental to exploring nurse managers' abilities to influence outcomes for patients and nursing personnel. Purpose: To explain the associations between different leadership styles, the reasons for their application, and their consequences when nurse managers make decisions in ethical dilemmas. Methods: The data were collected between 15 October 2011 and 30 April 2012 using a statistically validated questionnaire. The respondents (n=278) were nurse managers. The data were analyzed using SPSS 20.0, calculating Spearman's correlations, stepwise regression and ANOVA. Results: The reasons for applying different leadership styles in ethical dilemmas include personal characteristics, years in work position, institutional factors, and the professional authority of nurse managers. The applied leadership styles in ethical dilemmas are associated with the consequences regarding the satisfaction of patients', relatives' and nurse managers' needs. Conclusions: Nurse managers exhibited leadership styles oriented to maintenance, focusing more on "doing the job" than on managing the decision-making in ethical dilemmas.
Does Bootstrap Procedure Provide Biased Estimates? An Empirical Examination for a Case of Multiple Regression.

ERIC Educational Resources Information Center

Fan, Xitao

This paper empirically and systematically assessed the performance of bootstrap resampling procedure as it was applied to a regression model. Parameter estimates from Monte Carlo experiments (repeated sampling from population) and bootstrap experiments (repeated resampling from one original bootstrap sample) were generated and compared. Sample…
Regression Discontinuity and Beyond: Options for Studying External Validity in an Internally Valid Design

ERIC Educational Resources Information Center

Wing, Coady; Bello-Gomez, Ricardo A.

2018-01-01

Treatment effect estimates from a "regression discontinuity design" (RDD) have high internal validity. However, the arguments that support the design apply to a subpopulation that is narrower and usually different from the population of substantive interest in evaluation research. The disconnect between RDD population and the…
Revisiting the Scale-Invariant, Two-Dimensional Linear Regression Method

ERIC Educational Resources Information Center

Patzer, A. Beate C.; Bauer, Hans; Chang, Christian; Bolte, Jan; Sülzle, Detlev

2018-01-01

The scale-invariant way to analyze two-dimensional experimental and theoretical data with statistical errors in both the independent and dependent variables is revisited by using what we call the triangular linear regression method. This is compared to the standard least-squares fit approach by applying it to typical simple sets of example data…
Predictors of Placement Stability at the State Level: The Use of Logistic Regression to Inform Practice

ERIC Educational Resources Information Center

Courtney, Jon R.; Prophet, Retta

2011-01-01

Placement instability is often associated with a number of negative outcomes for children. To gain state level contextual knowledge of factors associated with placement stability/instability, logistic regression was applied to selected variables from the New Mexico Adoption and Foster Care Administrative Reporting System dataset. Predictors…
Logarithmic Transformations in Regression: Do You Transform Back Correctly?

ERIC Educational Resources Information Center

Dambolena, Ismael G.; Eriksen, Steven E.; Kopcso, David P.

2009-01-01

The logarithmic transformation is often used in regression analysis for a variety of purposes such as the linearization of a nonlinear relationship between two or more variables. We have noticed that when this transformation is applied to the response variable, the computation of the point estimate of the conditional mean of the original response…
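The abstract above is truncated, so the following is not the authors' procedure. It is only a minimal sketch of the general issue named in the title, reduced to the intercept-only case with hypothetical data: exponentiating the mean of log(y) returns the geometric mean, which understates the arithmetic mean, and a common correction under an assumed normal error on the log scale multiplies by exp(s²/2).

```python
import math

# Hypothetical positive responses
y = [1.0, 2.0, 4.0, 8.0, 16.0]

logs = [math.log(v) for v in y]
n = len(logs)
mean_log = sum(logs) / n
# Sample variance of the log-scale values
var_log = sum((v - mean_log) ** 2 for v in logs) / (n - 1)

# Naive back-transformation: this is the geometric mean of y
naive = math.exp(mean_log)
# Back-transformation assuming log(y) is normally distributed
corrected = math.exp(mean_log + var_log / 2.0)
# Arithmetic mean on the original scale, for comparison
arithmetic = sum(y) / n
```

The corrected value depends on the normality assumption; when that assumption is doubtful, a nonparametric alternative would be needed instead.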
Complex regression Doppler optical coherence tomography

NASA Astrophysics Data System (ADS)

Elahi, Sahar; Gu, Shi; Thrane, Lars; Rollins, Andrew M.; Jenkins, Michael W.

2018-04-01

We introduce a new method to measure Doppler shifts more accurately and extend the dynamic range of Doppler optical coherence tomography (OCT). The two-point estimate of the conventional Doppler method is replaced with a regression that is applied to high-density B-scans in polar coordinates. We built a high-speed OCT system using a 1.68-MHz Fourier domain mode locked laser to acquire high-density B-scans (16,000 A-lines) at high enough frame rates (~100 fps) to accurately capture the dynamics of the beating embryonic heart. Flow phantom experiments confirm that the complex regression lowers the minimum detectable velocity from 12.25 mm/s to 374 μm/s, whereas the maximum velocity of 400 mm/s is measured without phase wrapping. Complex regression Doppler OCT also demonstrates higher accuracy and precision compared with the conventional method, particularly when signal-to-noise ratio is low. The extended dynamic range allows monitoring of blood flow over several stages of development in embryos without adjusting the imaging parameters. In addition, applying complex averaging recovers hidden features in structural images.
Review and Recommendations for Zero-inflated Count Regression Modeling of Dental Caries Indices in Epidemiological Studies

PubMed Central

Stamm, John W.; Long, D. Leann; Kincade, Megan E.

2012-01-01

Over the past five to ten years, zero-inflated count regression models have been increasingly applied to the analysis of dental caries indices (e.g., DMFT, dfms, etc). The main reason for that is linked to the broad decline in children’s caries experience, such that dmf and DMF indices more frequently generate low or even zero counts. This article specifically reviews the application of zero-inflated Poisson and zero-inflated negative binomial regression models to dental caries, with emphasis on the description of the models and the interpretation of fitted model results given the study goals. The review finds that interpretations provided in the published caries research are often imprecise or inadvertently misleading, particularly with respect to failing to discriminate between inference for the class of susceptible persons defined by such models and inference for the sampled population in terms of overall exposure effects. Recommendations are provided to enhance the use as well as the interpretation and reporting of results of count regression models when applied to epidemiological studies of dental caries. PMID:22710271
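The distinction the review draws between the susceptible class and the whole sampled population can be made concrete with a minimal sketch of the zero-inflated Poisson distribution. The parameter values are hypothetical and the function names are ours.

```python
import math


def zip_pmf(k, pi, lam):
    """P(Y = k) for a zero-inflated Poisson count.

    With probability pi the count is a structural zero (non-susceptible
    class); otherwise it follows a Poisson distribution with mean lam.
    """
    poisson = math.exp(-lam) * lam ** k / math.factorial(k)
    return pi * (k == 0) + (1.0 - pi) * poisson


def zip_mean(pi, lam):
    """Overall (population) mean count, averaging over both classes."""
    return (1.0 - pi) * lam


# Hypothetical values: 40% structural zeros, mean of 2 lesions among susceptibles
pi, lam = 0.4, 2.0
susceptible_mean = lam
population_mean = zip_mean(pi, lam)
```

Here `lam` describes only the susceptible class, while `population_mean` describes everyone sampled; a covariate effect on `lam` is therefore not the same quantity as its effect on the overall mean.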
Regional interpretation of water-quality monitoring data

USGS Publications Warehouse

Smith, Richard A.; Schwarz, Gregory E.; Alexander, Richard B.

1997-01-01

We describe a method for using spatially referenced regressions of contaminant transport on watershed attributes (SPARROW) in regional water-quality assessment. The method is designed to reduce the problems of data interpretation caused by sparse sampling, network bias, and basin heterogeneity. The regression equation relates measured transport rates in streams to spatially referenced descriptors of pollution sources and land-surface and stream-channel characteristics. Regression models of total phosphorus (TP) and total nitrogen (TN) transport are constructed for a region defined as the nontidal conterminous United States. Observed TN and TP transport rates are derived from water-quality records for 414 stations in the National Stream Quality Accounting Network. Nutrient sources identified in the equations include point sources, applied fertilizer, livestock waste, nonagricultural land, and atmospheric deposition (TN only). Surface characteristics found to be significant predictors of land-water delivery include soil permeability, stream density, and temperature (TN only). Estimated instream decay coefficients for the two contaminants decrease monotonically with increasing stream size. TP transport is found to be significantly reduced by reservoir retention. Spatial referencing of basin attributes in relation to the stream channel network greatly increases their statistical significance and model accuracy. The method is used to estimate the proportion of watersheds in the conterminous United States (i.e., hydrologic cataloging units) with outflow TP concentrations less than the criterion of 0.1 mg/L, and to classify cataloging units according to local TN yield (kg/km2/yr).
[CHARACTERIZATION OF VESTIBULAR DISORDERS IN THE INJURED PERSONS WITH THE BRAIN CONCUSSION IN ACUTE PERIOD].

PubMed

Skobska, O E; Kadzhaya, N V; Andreyev, O A; Potapov, E V

2015-04-01

Thirty-two patients with brain concussion (BC), mean age (34.1 ± 1.3) years, were examined. The SCAT-3 protocol (Standardized Concussion Assessment Tool, 3rd ed.), the DHI (Dizziness Handicap Inventory questionnaire) and computer stabilography (KS) were used to diagnose vestibular disorders. It was established that in the acute period of BC there is a dissociation between the regression of objective neurological symptoms and the persistence of the BC indices, which confirms a latent disorder of the balance function. Changes in the basic statokinesiography indices, including an increase in the vibration amplitude of the general centre of pressure in the sagittal plane and in the BC area (235.3 ± 13.7) mm² in a modified Romberg functional test with closed eyes, can be applied as objective criteria for the diagnosis of BC.
Comparative study between derivative spectrophotometry and multivariate calibration as analytical tools applied for the simultaneous quantitation of Amlodipine, Valsartan and Hydrochlorothiazide

NASA Astrophysics Data System (ADS)

Darwish, Hany W.; Hassan, Said A.; Salem, Maissa Y.; El-Zeany, Badr A.

2013-09-01

Four simple, accurate and specific methods were developed and validated for the simultaneous estimation of Amlodipine (AML), Valsartan (VAL) and Hydrochlorothiazide (HCT) in commercial tablets. The derivative spectrophotometric methods include Derivative Ratio Zero Crossing (DRZC) and Double Divisor Ratio Spectra-Derivative Spectrophotometry (DDRS-DS) methods, while the multivariate calibrations used are Principal Component Regression (PCR) and Partial Least Squares (PLSs). The proposed methods were applied successfully in the determination of the drugs in laboratory-prepared mixtures and in commercial pharmaceutical preparations. The validity of the proposed methods was assessed using the standard addition technique. The linearity of the proposed methods is investigated in the range of 2-32, 4-44 and 2-20 μg/mL for AML, VAL and HCT, respectively.
A novel simple QSAR model for the prediction of anti-HIV activity using multiple linear regression analysis.

PubMed

Afantitis, Antreas; Melagraki, Georgia; Sarimveis, Haralambos; Koutentis, Panayiotis A; Markopoulos, John; Igglessi-Markopoulou, Olga

2006-08-01

A quantitative structure-activity relationship was obtained by applying Multiple Linear Regression Analysis to a series of 80 1-[2-hydroxyethoxy-methyl]-6-(phenylthio) thymine (HEPT) derivatives with significant anti-HIV activity. For the selection of the best among 37 different descriptors, the Elimination Selection Stepwise Regression Method (ES-SWR) was utilized. The resulting QSAR model (R²(CV) = 0.8160; S(PRESS) = 0.5680) proved to be very accurate both in training and predictive stages.
Regression of Moral Reasoning during Medical Education: Combined Design Study to Evaluate the Effect of Clinical Study Years

PubMed Central

Hren, Darko; Marušić, Matko; Marušić, Ana

2011-01-01

Background: Moral reasoning is important for developing medical professionalism, but current evidence for the relationship between education and moral reasoning does not clearly apply to medical students. We used a combined study design to test the effect of clinical teaching on moral reasoning. Methods: We used the Defining Issues Test-2 as a measure of moral judgment, with 3 general moral schemas: Personal Interest, Maintaining Norms, and Postconventional Schema. The test was applied to 3 consecutive cohorts of second year students in 2002 (n = 207), 2003 (n = 192), and 2004 (n = 139), and to 707 students of all 6 study years in a 2004 cross-sectional study. We also tested 298 age-matched controls without university education. Results: In the cross-sectional study, there was a significant main effect of the study year for Postconventional (F(5,679) = 3.67, P = 0.003) and Personal Interest scores (F(5,679) = 3.38, P = 0.005). There was no effect of the study year for Maintaining Norms scores. Third year medical students scored higher on the Postconventional schema score than all other study years (p<0.001). There were no statistically significant differences among the 3 cohorts of 2nd year medical students, demonstrating the absence of cohort or point-of-measurement effects. The longitudinal study of 3 cohorts demonstrated that students regressed from Postconventional to Maintaining Norms schema-based reasoning after entering the clinical part of the curriculum. Interpretation: Our study demonstrated a direct causative relationship between the regression in moral reasoning development and clinical teaching during the medical curriculum. The reasons may include the hierarchical organization of clinical practice, the specific nature of moral dilemmas faced by medical students, and the hidden medical curriculum. PMID:21479204
Pareto fronts for multiobjective optimization design on materials data

NASA Astrophysics Data System (ADS)

Gopakumar, Abhijith; Balachandran, Prasanna; Gubernatis, James E.; Lookman, Turab

Optimizing multiple properties simultaneously is vital in materials design. Here we apply information driven, statistical optimization strategies blended with machine learning methods, to address multi-objective optimization tasks on materials data. These strategies aim to find the Pareto front consisting of non-dominated data points from a set of candidate compounds with known characteristics. The objective is to find the Pareto front in as few additional measurements or calculations as possible. We show how exploration of the data space to find the front is achieved by using uncertainties in predictions from regression models. We test our proposed design strategies on multiple, independent data sets including those from computations as well as experiments. These include data sets for MAX phases, piezoelectrics and multicomponent alloys.
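The notion of a Pareto front of non-dominated points can be shown with a minimal sketch. It assumes every objective is to be maximized; the candidate property values are hypothetical and the function name is ours. It does not reproduce the authors' uncertainty-driven search, only the final selection step.

```python
def pareto_front(points):
    """Return the non-dominated points, assuming all objectives are maximized."""

    def dominates(a, b):
        # a dominates b if it is at least as good everywhere and better somewhere
        return (all(x >= y for x, y in zip(a, b))
                and any(x > y for x, y in zip(a, b)))

    return [p for p in points if not any(dominates(q, p) for q in points)]


# Hypothetical candidates described by two properties
candidates = [(1.0, 5.0), (2.0, 4.0), (3.0, 1.0), (2.0, 2.0), (0.5, 4.5)]
front = pareto_front(candidates)
# (2.0, 2.0) is dominated by (2.0, 4.0); (0.5, 4.5) is dominated by (1.0, 5.0)
```

This pairwise check is quadratic in the number of candidates, which is adequate for small candidate sets.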
AucPR: an AUC-based approach using penalized regression for disease prediction with high-dimensional omics data.

PubMed

Yu, Wenbao; Park, Taesung

2014-01-01

It is common to seek an optimal combination of markers for disease classification and prediction when multiple markers are available. Many approaches based on the area under the receiver operating characteristic curve (AUC) have been proposed. Existing works based on AUC in a high-dimensional context depend mainly on a non-parametric, smooth approximation of AUC, with no work using a parametric AUC-based approach for high-dimensional data. We propose an AUC-based approach using penalized regression (AucPR), which is a parametric method used for obtaining a linear combination for maximizing the AUC. To obtain the AUC maximizer in a high-dimensional context, we transform a classical parametric AUC maximizer, which is used in a low-dimensional context, into a regression framework and thus apply the penalized regression approach directly. Two kinds of penalization, lasso and elastic net, are considered. The parametric approach can avoid some of the difficulties of a conventional non-parametric AUC-based approach, such as the lack of an appropriate concave objective function and the need for a prudent choice of the smoothing parameter. We apply the proposed AucPR for gene selection and classification using four real microarray data sets and synthetic data. Through numerical studies, AucPR is shown to perform better than the penalized logistic regression and the nonparametric AUC-based method, in the sense of AUC and sensitivity for a given specificity, particularly when there are many correlated genes. We propose a powerful parametric and easily implementable linear classifier, AucPR, for gene selection and disease prediction for high-dimensional data. AucPR is recommended for its good prediction performance. Besides gene expression microarray data, AucPR can be applied to other types of high-dimensional omics data, such as miRNA and protein data.

Evaluating the effect of a third-party implementation of resolution recovery on the quality of SPECT bone scan imaging using visual grading regression.

PubMed

Hay, Peter D; Smith, Julie; O'Connor, Richard A

2016-02-01

The aim of this study was to evaluate the benefits to SPECT bone scan image quality when applying resolution recovery (RR) during image reconstruction using software provided by a third-party supplier. Bone SPECT data from 90 clinical studies were reconstructed retrospectively using software supplied independent of the gamma camera manufacturer. The current clinical datasets contain 120×10 s projections and are reconstructed using an iterative method with a Butterworth postfilter. Five further reconstructions were created with the following characteristics: 10 s projections with a Butterworth postfilter (to assess intraobserver variation); 10 s projections with a Gaussian postfilter with and without RR; and 5 s projections with a Gaussian postfilter with and without RR. Two expert observers were asked to rate image quality on a five-point scale relative to our current clinical reconstruction. Datasets were anonymized and presented in random order. The benefits of RR on image scores were evaluated using ordinal logistic regression (visual grading regression). The application of RR during reconstruction increased the probability of both observers of scoring image quality as better than the current clinical reconstruction even where the dataset contained half the normal counts. Type of reconstruction and observer were both statistically significant variables in the ordinal logistic regression model. Visual grading regression was found to be a useful method for validating the local introduction of technological developments in nuclear medicine imaging. RR, as implemented by the independent software supplier, improved bone SPECT image quality when applied during image reconstruction. In the majority of clinical cases, acquisition times for bone SPECT intended for the purposes of localization can safely be halved (from 10 s projections to 5 s) when RR is applied.
Influential factors of red-light running at signalized intersection and prediction using a rare events logistic regression model.

PubMed

Ren, Yilong; Wang, Yunpeng; Wu, Xinkai; Yu, Guizhen; Ding, Chuan

2016-10-01

Red light running (RLR) has become a major safety concern at signalized intersections. To prevent RLR-related crashes, it is critical to identify the factors that significantly affect drivers' RLR behavior, and to predict potential RLR in real time. In this research, nine months of RLR events extracted from high-resolution traffic data collected by loop detectors at three signalized intersections were used to identify the factors that significantly affect RLR behavior. The data analysis indicated that occupancy time, time gap, used yellow time, time left to yellow start, whether the preceding vehicle runs through the intersection during yellow, and whether there is a vehicle passing through the intersection in the adjacent lane were significant factors for RLR behavior. Furthermore, due to the rare-events nature of RLR, a modified rare events logistic regression model was developed for RLR prediction. The rare events logistic regression method has been applied in many fields for rare events studies and shows impressive performance, but so far no previous research has applied this method to study RLR. The results showed that the rare events logistic regression model performed significantly better than the standard logistic regression model. More importantly, the proposed RLR prediction method is based purely on loop detector data collected from a single advance loop detector located 400 feet from the stop bar. This brings great potential for future field applications of the proposed method, since loops have been widely installed at many intersections and can collect data in real time. This research is expected to contribute significantly to the improvement of intersection safety.
Caries risk assessment in schoolchildren - a form based on Cariogram® software

PubMed Central

CABRAL, Renata Nunes; HILGERT, Leandro Augusto; FABER, Jorge; LEAL, Soraya Coelho

2014-01-01

Identifying caries risk factors is an important measure which contributes to a better understanding of the cariogenic profile of the patient. The Cariogram® software provides this analysis, and protocols simplifying the method have been suggested. Objectives: The aim of this study was to determine whether a newly developed Caries Risk Assessment (CRA) form based on the Cariogram® software could classify schoolchildren according to their caries risk and to evaluate relationships between caries risk and the variables in the form. Material and Methods: 150 schoolchildren aged 5 to 7 years old were included in this survey. Caries prevalence was obtained according to International Caries Detection and Assessment System (ICDAS) II. Information for filling in the form based on Cariogram® was collected clinically and from questionnaires sent to parents. Linear regression and a forward stepwise multiple regression model were applied to correlate the variables included in the form with the caries risk. Results: Caries prevalence in primary dentition was 98.6% when enamel and dentine carious lesions were included, and 77.3% when only dentine lesions were considered. Eighty-six percent of the children were classified as at moderate caries risk. The forward stepwise multiple regression model result was significant (R²=0.904; p<0.00001), showing that the most significant factors influencing caries risk were caries experience, oral hygiene, frequency of food consumption, sugar consumption and fluoride sources. Conclusion: The use of the form based on the Cariogram® software enabled classification of the schoolchildren at low, moderate and high caries risk. Caries experience, oral hygiene, frequency of food consumption, sugar consumption and fluoride sources are the variables that were shown to be highly correlated with caries risk. PMID:25466473
The mycotic ulcer treatment trial: a randomized trial comparing natamycin vs voriconazole.

PubMed

Prajna, N Venkatesh; Krishnan, Tiruvengada; Mascarenhas, Jeena; Rajaraman, Revathi; Prajna, Lalitha; Srinivasan, Muthiah; Raghavan, Anita; Oldenburg, Catherine E; Ray, Kathryn J; Zegans, Michael E; McLeod, Stephen D; Porco, Travis C; Acharya, Nisha R; Lietman, Thomas M

2013-04-01

To compare topical natamycin vs voriconazole in the treatment of filamentous fungal keratitis. This phase 3, double-masked, multicenter trial was designed to randomize 368 patients to voriconazole (1%) or natamycin (5%), applied topically every hour while awake until reepithelialization, then 4 times daily for at least 3 weeks. Eligibility included smear-positive filamentous fungal ulcer and visual acuity of 20/40 to 20/400. The primary outcome was best spectacle-corrected visual acuity at 3 months; secondary outcomes included corneal perforation and/or therapeutic penetrating keratoplasty. A total of 940 patients were screened and 323 were enrolled. Causative organisms included Fusarium (128 patients [40%]), Aspergillus (54 patients [17%]), and other filamentous fungi (141 patients [43%]). Natamycin-treated cases had significantly better 3-month best spectacle-corrected visual acuity than voriconazole-treated cases (regression coefficient=-0.18 logMAR; 95% CI, -0.30 to -0.05; P=.006). Natamycin-treated cases were less likely to have perforation or require therapeutic penetrating keratoplasty (odds ratio=0.42; 95% CI, 0.22 to 0.80; P=.009). Fusarium cases fared better with natamycin than with voriconazole (regression coefficient=-0.41 logMAR; 95% CI, -0.61 to -0.20; P<.001; odds ratio for perforation=0.06; 95% CI, 0.01 to 0.28; P<.001), while non-Fusarium cases fared similarly (regression coefficient=-0.02 logMAR; 95% CI, -0.17 to 0.13; P=.81; odds ratio for perforation=1.08; 95% CI, 0.48 to 2.43; P=.86). Natamycin treatment was associated with significantly better clinical and microbiological outcomes than voriconazole treatment for smear-positive filamentous fungal keratitis, with much of the difference attributable to improved results in Fusarium cases. Voriconazole should not be used as monotherapy in filamentous keratitis. clinicaltrials.gov Identifier: NCT00996736
Zinc supplementation for the prevention of acute lower respiratory infection in children in developing countries: meta-analysis and meta-regression of randomized trials.

PubMed

Roth, Daniel E; Richard, Stephanie A; Black, Robert E

2010-06-01

Routine zinc supplementation is a potential intervention for the prevention of acute lower respiratory infection (ALRI) in developing countries. However, discrepant findings from recent randomized trials remain unexplained. Randomized trials of zinc supplementation in young children in developing countries were identified by a systematic literature review. Trials included in the meta-analysis met specific criteria, including participants <5 years of age, daily/weekly zinc and control supplementation for greater than 3 months, active household surveillance for respiratory morbidity and use of a case definition that included at least one sign of lower respiratory tract illness. ALRI case definitions were classified on the basis of specificity/severity. Incidence rate ratios (IRRs) were pooled by random-effects models. Meta-regression and sub-group analysis were performed to assess potential sources of between-study heterogeneity. Ten trials were eligible for inclusion (n = 49 450 children randomized). Zinc reduced the incidence of ALRI defined by specific clinical criteria [IRR 0.65, 95% confidence interval (CI) 0.52-0.82], but had no effect on lower-specificity ALRI case definitions based on caregiver report (IRR 1.01, 95% CI 0.91-1.12) or World Health Organization 'non-severe pneumonia' (0.96, 95% CI 0.86-1.08). By meta-regression, the effect of zinc was associated with ALRI case definition, but not with mean baseline age, geographic location, nutritional status or zinc dose. Routine zinc supplementation reduced the incidence of childhood ALRI defined by relatively specific clinical criteria, but the effect was null if lower specificity case definitions were applied. The choice of ALRI case definition may substantially influence inferences from community trials regarding the efficacy of preventive interventions.
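Pooling incidence rate ratios by a random-effects model, as described in the abstract above, can be sketched as follows. The abstract does not name the estimator, so the DerSimonian-Laird method used here is an assumption, and the per-trial log IRRs and variances are hypothetical.

```python
import math


def dersimonian_laird(log_effects, variances):
    """Random-effects pooled estimate on the log scale (DerSimonian-Laird).

    Returns (pooled log effect, its standard error, between-study variance).
    """
    k = len(log_effects)
    if k < 2:
        raise ValueError("at least two studies are required")
    # Fixed-effect (inverse-variance) weights and pooled estimate
    w = [1.0 / v for v in variances]
    sw = sum(w)
    fixed = sum(wi * yi for wi, yi in zip(w, log_effects)) / sw
    # Cochran's Q and the moment estimate of between-study variance tau^2
    q = sum(wi * (yi - fixed) ** 2 for wi, yi in zip(w, log_effects))
    c = sw - sum(wi ** 2 for wi in w) / sw
    tau2 = max(0.0, (q - (k - 1)) / c)
    # Random-effects weights add tau^2 to each within-study variance
    w_star = [1.0 / (v + tau2) for v in variances]
    pooled = sum(wi * yi for wi, yi in zip(w_star, log_effects)) / sum(w_star)
    se = math.sqrt(1.0 / sum(w_star))
    return pooled, se, tau2


# Hypothetical trial results: log IRRs with their variances
log_irr = [math.log(0.60), math.log(0.75), math.log(1.00)]
var_log_irr = [0.02, 0.03, 0.01]
pooled, se, tau2 = dersimonian_laird(log_irr, var_log_irr)
# Back-transform to the IRR scale with a 95% confidence interval
irr = math.exp(pooled)
ci = (math.exp(pooled - 1.96 * se), math.exp(pooled + 1.96 * se))
```

A meta-regression extends the same weighting by regressing the log effects on study-level covariates such as the case definition.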
Determinant of securitization asset pricing in Malaysia

NASA Astrophysics Data System (ADS)

Bakri, M. H.; Ali, R.; Ismail, S.; Sufian, F.; Baharom, A. H.

2014-12-01

Malaysian firms have reportedly been involved in asset-backed securities since 1986, with Cagamas as the pioneer. This research examines the factors influencing the primary market spread. The least squares method and regression analysis are applied for the study period 2004-2012. The results show that one determinant in the internal regression model and three determinants in the external regression model influence the primary market spread and are statistically significant in developing securitization in Malaysia. It can be concluded that transaction size contributes significantly to the primary market spread in the internal regression model, while liquidity, transaction size and crisis are significant in both regression models. Of the five hypotheses, three support a relationship between the determinants and the primary market spread.
Performance of the modified Poisson regression approach for estimating relative risks from clustered prospective data.

PubMed

Yelland, Lisa N; Salter, Amy B; Ryan, Philip

2011-10-15

Modified Poisson regression, which combines a log Poisson regression model with robust variance estimation, is a useful alternative to log binomial regression for estimating relative risks. Previous studies have shown both analytically and by simulation that modified Poisson regression is appropriate for independent prospective data. This method is often applied to clustered prospective data, despite a lack of evidence to support its use in this setting. The purpose of this article is to evaluate the performance of the modified Poisson regression approach for estimating relative risks from clustered prospective data, by using generalized estimating equations to account for clustering. A simulation study is conducted to compare log binomial regression and modified Poisson regression for analyzing clustered data from intervention and observational studies. Both methods generally perform well in terms of bias, type I error, and coverage. Unlike log binomial regression, modified Poisson regression is not prone to convergence problems. The methods are contrasted by using example data sets from 2 large studies. The results presented in this article support the use of modified Poisson regression as an alternative to log binomial regression for analyzing clustered prospective data when clustering is taken into account by using generalized estimating equations.
[A study on city motor vehicle emission factors by tunnel test].

PubMed

Wang, B; Zhang, Y; Zhu, C; Yu, K; Chan, L; Chan, Z

2001-03-01

A tunnel test was run in a typical cross-river tunnel in Guangzhou city, yielding 48 h of online monitoring data that included pollutant concentrations, traffic activity, and meteorological data. The average motor vehicle emission factors of NOx, CO, SO2, PM10 and HC, calculated by mass balance, were 1.379, 15.404, 0.142, 0.637 and 1.857 g/(km·vehicle), respectively. On that basis, combined emission factors for 8 types of city vehicles were calculated using linear regression. The results broadly characterize the nature and level of motor vehicle emissions in a Chinese city.
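The abstract does not give the mass-balance formula, so the sketch below assumes the form commonly used in tunnel studies: the pollutant mass gained between tunnel entrance and exit, divided by the number of vehicles and the tunnel length. The function name, parameter names, units, and numbers are all illustrative assumptions.

```python
def tunnel_emission_factor(c_exit_mg_m3, c_entrance_mg_m3, area_m2,
                           wind_speed_m_s, duration_s, n_vehicles, length_km):
    """Fleet-average emission factor in g/(km*vehicle) from a tunnel mass balance.
    Assumed form: (concentration rise) * (air volume moved) / (vehicles * length)."""
    air_volume_m3 = area_m2 * wind_speed_m_s * duration_s
    mass_g = (c_exit_mg_m3 - c_entrance_mg_m3) * air_volume_m3 / 1000.0
    return mass_g / (n_vehicles * length_km)

# Hypothetical one-hour sampling interval.
ef = tunnel_emission_factor(2.0, 0.5, area_m2=50.0, wind_speed_m_s=4.0,
                            duration_s=3600, n_vehicles=2000, length_km=1.2)
```

Per-interval factors like this one can then be regressed on the share of each vehicle type in the traffic count to separate type-specific emission factors, which is presumably the role of the linear regression step the abstract mentions.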
Practical application of cure mixture model for long-term censored survivor data from a withdrawal clinical trial of patients with major depressive disorder.

PubMed

Arano, Ichiro; Sugimoto, Tomoyuki; Hamasaki, Toshimitsu; Ohno, Yuko

2010-04-23

Survival analysis methods such as the Kaplan-Meier method, log-rank test, and Cox proportional hazards regression (Cox regression) are commonly used to analyze data from randomized withdrawal studies in patients with major depressive disorder. However, unfortunately, such common methods may be inappropriate when a long-term censored relapse-free time appears in data as the methods assume that if complete follow-up were possible for all individuals, each would eventually experience the event of interest. In this paper, to analyse data including such a long-term censored relapse-free time, we discuss a semi-parametric cure regression (Cox cure regression), which combines a logistic formulation for the probability of occurrence of an event with a Cox proportional hazards specification for the time of occurrence of the event. In specifying the treatment's effect on disease-free survival, we consider the fraction of long-term survivors and the risks associated with a relapse of the disease. In addition, we develop a tree-based method for the time to event data to identify groups of patients with differing prognoses (cure survival CART). Although analysis methods typically adapt the log-rank statistic for recursive partitioning procedures, the method applied here used a likelihood ratio (LR) test statistic from a fitting of cure survival regression assuming exponential and Weibull distributions for the latency time of relapse. The method is illustrated using data from a sertraline randomized withdrawal study in patients with major depressive disorder. 
We concluded that Cox cure regression reveals who may be cured and how the treatment and other factors affect the cure incidence and the relapse time of uncured patients, and that cure survival CART output provides easily understandable and interpretable information, useful both in identifying groups of patients with differing prognoses and in utilizing Cox cure regression models leading to meaningful interpretations.
Penalized nonparametric scalar-on-function regression via principal coordinates

PubMed Central

Reiss, Philip T.; Miller, David L.; Wu, Pei-Shien; Hua, Wen-Yu

2016-01-01

A number of classical approaches to nonparametric regression have recently been extended to the case of functional predictors. This paper introduces a new method of this type, which extends intermediate-rank penalized smoothing to scalar-on-function regression. In the proposed method, which we call principal coordinate ridge regression, one regresses the response on leading principal coordinates defined by a relevant distance among the functional predictors, while applying a ridge penalty. Our publicly available implementation, based on generalized additive modeling software, allows for fast optimal tuning parameter selection and for extensions to multiple functional predictors, exponential family-valued responses, and mixed-effects models. In an application to signature verification data, principal coordinate ridge regression, with dynamic time warping distance used to define the principal coordinates, is shown to outperform a functional generalized linear model. PMID:29217963
Geodesic least squares regression for scaling studies in magnetic confinement fusion

DOE Office of Scientific and Technical Information (OSTI.GOV)

Verdoolaege, Geert

In regression analyses for deriving scaling laws that occur in various scientific disciplines, usually standard regression methods have been applied, of which ordinary least squares (OLS) is the most popular. However, concerns have been raised with respect to several assumptions underlying OLS in its application to scaling laws. We here discuss a new regression method that is robust in the presence of significant uncertainty on both the data and the regression model. The method, which we call geodesic least squares regression (GLS), is based on minimization of the Rao geodesic distance on a probabilistic manifold. We demonstrate the superiority of the method using synthetic data and we present an application to the scaling law for the power threshold for the transition to the high confinement regime in magnetic confinement fusion devices.
Criterion for evaluating the predictive ability of nonlinear regression models without cross-validation.

PubMed

Kaneko, Hiromasa; Funatsu, Kimito

2013-09-23

We propose predictive performance criteria for nonlinear regression models without cross-validation. The proposed criteria are the determination coefficient and the root-mean-square error for the midpoints between k-nearest-neighbor data points. These criteria can be used to evaluate predictive ability after the regression models are updated, whereas cross-validation cannot be performed in such a situation. The proposed method is effective and helpful in handling big data when cross-validation cannot be applied. By analyzing data from numerical simulations and quantitative structural relationships, we confirm that the proposed criteria enable the predictive ability of the nonlinear regression models to be appropriately quantified.
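One reading of the criteria described above can be sketched as follows: build a midpoint between each data point and each of its k nearest neighbours, take the average of the two responses as the reference value at that midpoint, and score the model's predictions there. Treating the averaged response as the reference is an assumption of this sketch, and the function name and arguments are illustrative rather than the authors' implementation.

```python
import math

def midpoint_criteria(X, y, predict, k=3):
    """Determination coefficient and RMSE of `predict` evaluated at midpoints
    between each point and its k nearest neighbours (no cross-validation)."""
    pairs = set()
    for i, xi in enumerate(X):
        neighbours = sorted((math.dist(xi, xj), j)
                            for j, xj in enumerate(X) if j != i)
        for _, j in neighbours[:k]:
            pairs.add((min(i, j), max(i, j)))      # count each pair once

    y_mid, y_hat = [], []
    for i, j in sorted(pairs):
        x_mid = [(a + b) / 2 for a, b in zip(X[i], X[j])]
        y_mid.append((y[i] + y[j]) / 2)            # assumed reference value
        y_hat.append(predict(x_mid))

    sse = sum((t - p) ** 2 for t, p in zip(y_mid, y_hat))
    mean = sum(y_mid) / len(y_mid)
    sst = sum((t - mean) ** 2 for t in y_mid)
    return 1 - sse / sst, math.sqrt(sse / len(y_mid))

# Toy check with an exactly linear relationship.
X = [[0.0], [1.0], [2.0], [3.0], [4.0]]
y = [1.0, 3.0, 5.0, 7.0, 9.0]
r2_mid, rmse_mid = midpoint_criteria(X, y, lambda x: 2 * x[0] + 1, k=2)
```

A model that interpolates the training points but oscillates between them would score well on the training data and poorly here, which is the kind of overfitting the midpoint criteria are meant to expose.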
Analysis of multi-layered films. [determining dye densities by applying a regression analysis to the spectral response of the composite transparency]

NASA Technical Reports Server (NTRS)

Scarpace, F. L.; Voss, A. W.

1973-01-01

Dye densities of multi-layered films are determined by applying a regression analysis to the spectral response of the composite transparency. The amount of dye in each layer is determined by fitting the sum of the individual dye layer densities to the measured dye densities. From this, dye content constants are calculated. Methods of calculating equivalent exposures are discussed. Equivalent exposures are a constant amount of energy over a limited band-width that will give the same dye content constants as the real incident energy. Methods of using these equivalent exposures for analysis of photographic data are presented.
Lifespan development of pro- and anti-saccades: multiple regression models for point estimates.

PubMed

Klein, Christoph; Foerster, Friedrich; Hartnegg, Klaus; Fischer, Burkhart

2005-12-07

The comparative study of anti- and pro-saccade task performance contributes to our functional understanding of the frontal lobes, their alterations in psychiatric or neurological populations, and their changes during the life span. In the present study, we apply regression analysis to model life span developmental effects on various pro- and anti-saccade task parameters, using data of a non-representative sample of 327 participants aged 9 to 88 years. Development up to the age of about 27 years was dominated by curvilinear rather than linear effects of age. Furthermore, the largest developmental differences were found for intra-subject variability measures and the anti-saccade task parameters. Ageing, by contrast, had the shape of a global linear decline of the investigated saccade functions, lacking the differential effects of age observed during development. While these results do support the assumption that frontal lobe functions can be distinguished from other functions by their strong and protracted development, they do not confirm the assumption of disproportionate deterioration of frontal lobe functions with ageing. We finally show that the regression models applied here to quantify life span developmental effects can also be used for individual predictions in applied research contexts or clinical practice.
Estimating the Extreme Behaviors of Students Performance Using Quantile Regression--Evidences from Taiwan

ERIC Educational Resources Information Center

Chen, Sheng-Tung; Kuo, Hsiao-I.; Chen, Chi-Chung

2012-01-01

The two-stage least squares approach together with quantile regression analysis is adopted here to estimate the educational production function. Such a methodology is able to capture the extreme behaviors of the two tails of students' performance and the estimation outcomes have important policy implications. Our empirical study is applied to the…
An Introduction to Recursive Partitioning: Rationale, Application, and Characteristics of Classification and Regression Trees, Bagging, and Random Forests

ERIC Educational Resources Information Center

Strobl, Carolin; Malley, James; Tutz, Gerhard

2009-01-01

Recursive partitioning methods have become popular and widely used tools for nonparametric regression and classification in many scientific fields. Especially random forests, which can deal with large numbers of predictor variables even in the presence of complex interactions, have been applied successfully in genetics, clinical medicine, and…
Using Recursive Regression to Explore Nonlinear Relationships and Interactions: A Tutorial Applied to a Multicultural Education Study

ERIC Educational Resources Information Center

Strang, Kenneth David

2009-01-01

This paper discusses how a seldom-used statistical procedure, recursive regression (RR), can numerically and graphically illustrate data-driven nonlinear relationships and interaction of variables. This routine falls into the family of exploratory techniques, yet a few interesting features make it a valuable complement to factor analysis and…
Accuracy of Bayes and Logistic Regression Subscale Probabilities for Educational and Certification Tests

ERIC Educational Resources Information Center

Rudner, Lawrence

2016-01-01

In the machine learning literature, it is commonly accepted as fact that as calibration sample sizes increase, Naïve Bayes classifiers initially outperform Logistic Regression classifiers in terms of classification accuracy. Applied to subtests from an on-line final examination and from a highly regarded certification examination, this study shows…
Subpixel urban land cover estimation: comparing cubist, random forests, and support vector regression

Treesearch

Jeffrey T. Walton

2008-01-01

Three machine learning subpixel estimation methods (Cubist, Random Forests, and support vector regression) were applied to estimate urban cover. Urban forest canopy cover and impervious surface cover were estimated from Landsat-7 ETM+ imagery using a higher resolution cover map resampled to 30 m as training and reference data. Three different band combinations (...
Emotional Issues and Peer Relations in Gifted Elementary Students: Regression Analysis of National Data

ERIC Educational Resources Information Center

Wiley, Kristofor R.

2013-01-01

Many of the social and emotional needs that have historically been associated with gifted students have been questioned on the basis of recent empirical evidence. Research on the topic, however, is often limited by sample size, selection bias, or definition. This study addressed these limitations by applying linear regression methodology to data…

Using partial least squares regression as a predictive tool in describing equine third metacarpal bone shape.

PubMed

Liley, Helen; Zhang, Ju; Firth, Elwyn; Fernandez, Justin; Besier, Thor

2017-11-01

Population variance in bone shape is an important consideration when applying the results of subject-specific computational models to a population. In this letter, we demonstrate the ability of partial least squares regression to provide an improved shape prediction of the equine third metacarpal epiphysis, using two easily obtained measurements.
New strategy for determination of anthocyanins, polyphenols and antioxidant capacity of Brassica oleracea liquid extract using infrared spectroscopies and multivariate regression

NASA Astrophysics Data System (ADS)

de Oliveira, Isadora R. N.; Roque, Jussara V.; Maia, Mariza P.; Stringheta, Paulo C.; Teófilo, Reinaldo F.

2018-04-01

A new method was developed to determine the antioxidant properties of red cabbage extract (Brassica oleracea) by mid (MID) and near (NIR) infrared spectroscopies and partial least squares (PLS) regression. A 70% (v/v) ethanolic extract of red cabbage was concentrated to 9° Brix and further diluted (12 to 100%) in water. The dilutions were used as external standards for the building of PLS models. For the first time, this strategy was applied for building multivariate regression models. Reference analyses and spectral data were obtained from diluted extracts. The properties determined were total and monomeric anthocyanins, total polyphenols and antioxidant capacity by ABTS (2,2-azino-bis(3-ethyl-benzothiazoline-6-sulfonate)) and DPPH (2,2-diphenyl-1-picrylhydrazyl) methods. Ordered predictors selection (OPS) and genetic algorithm (GA) were used for feature selection before PLS regression (PLS-1). In addition, a PLS-2 regression was applied to all properties simultaneously. PLS-1 yielded more predictive models than PLS-2 regression. PLS-OPS and PLS-GA models presented excellent prediction results with a correlation coefficient higher than 0.98. However, the best models were obtained using PLS and variable selection with the OPS algorithm, and the models based on NIR spectra were considered more predictive for all properties. Thus, these models provide a simple, rapid and accurate method for determining the antioxidant properties of red cabbage extract and are suitable for use in the food industry.
Local regression type methods applied to the study of geophysics and high frequency financial data

NASA Astrophysics Data System (ADS)

Mariani, M. C.; Basu, K.

2014-09-01

In this work we applied locally weighted scatterplot smoothing techniques (Lowess/Loess) to Geophysical and high frequency financial data. We first analyze and apply this technique to the California earthquake geological data. A spatial analysis was performed to show that the estimation of the earthquake magnitude at a fixed location is very accurate up to the relative error of 0.01%. We also applied the same method to a high frequency data set arising in the financial sector and obtained similar satisfactory results. The application of this approach to the two different data sets demonstrates that the overall method is accurate and efficient, and the Lowess approach is much more desirable than the Loess method. The previous works studied the time series analysis; in this paper our local regression models perform a spatial analysis for the geophysics data providing different information. For the high frequency data, our models estimate the curve of best fit where data are dependent on time.
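The core step shared by the local regression methods discussed above can be sketched as a weighted straight-line fit around each query point. The version below uses tricube weights over the nearest fraction of the data and omits the robustness iterations that some Lowess variants add; the function name, the `frac` parameter, and the toy data are illustrative assumptions rather than the authors' code.

```python
import math

def local_linear_fit(xs, ys, x0, frac=0.5):
    """Locally weighted linear estimate at x0 using tricube weights over the
    nearest `frac` share of the points (no robustness iterations)."""
    n = len(xs)
    k = max(2, math.ceil(frac * n))
    h = sorted(abs(x - x0) for x in xs)[k - 1]     # bandwidth: k-th nearest distance
    if h == 0:
        h = 1e-12
    w = [(1 - min(abs(x - x0) / h, 1.0) ** 3) ** 3 for x in xs]
    if sum(w) == 0:                                # all neighbours sit on the boundary
        w = [1.0 if abs(x - x0) <= h else 0.0 for x in xs]
    sw = sum(w)
    mx = sum(wi * x for wi, x in zip(w, xs)) / sw
    my = sum(wi * y for wi, y in zip(w, ys)) / sw
    sxx = sum(wi * (x - mx) ** 2 for wi, x in zip(w, xs))
    sxy = sum(wi * (x - mx) * (y - my) for wi, x, y in zip(w, xs, ys))
    slope = sxy / sxx if sxx > 0 else 0.0
    return my + slope * (x0 - mx)

# Toy data on a straight line; the local fit should reproduce it.
xs = [float(i) for i in range(10)]
ys = [3 * x + 2 for x in xs]
estimate = local_linear_fit(xs, ys, 4.5)
```

Evaluating the function over a grid of query points traces out the smoothed curve; a smaller `frac` follows local structure more closely at the cost of a noisier estimate.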
A Novel Degradation Identification Method for Wind Turbine Pitch System

NASA Astrophysics Data System (ADS)

Guo, Hui-Dong

2018-04-01

It is difficult for the traditional threshold-value method to accurately identify degradation of operating equipment. This paper proposes a novel degradation evaluation method suited to implementing condition-based maintenance strategies for wind turbines. Based on an analysis of the typical variable-speed pitch-to-feather control principle and the monitoring parameters of the pitch system, a multi-input multi-output (MIMO) regression model was applied to the pitch system, with wind speed and power generation as input parameters and wheel rotation speed, pitch angle, and motor driving current for the three blades as output parameters. The difference between the on-line measurement and the value calculated by the MIMO regression model, fitted with the least squares support vector machine (LSSVM) method, was then defined as the Observed Vector of the system. A Gaussian mixture model (GMM) was fitted to the distribution of the multi-dimensional Observed Vectors. Using the established model, the Degradation Index was calculated from the SCADA data of a wind turbine with a damaged pitch bearing retainer and rolling body, which illustrated the feasibility of the proposed method.
Automatic red eye correction and its quality metric

NASA Astrophysics Data System (ADS)

Safonov, Ilia V.; Rychagov, Michael N.; Kang, KiMin; Kim, Sang Ho

2008-01-01

Red eye artifacts are a troublesome defect of amateur photos. Correcting red eyes during printing without user intervention, and thereby making photos more pleasant for an observer, is an important task. A novel, efficient technique for automatic red eye correction aimed at photo printers is proposed. The algorithm is independent of face orientation and capable of detecting paired red eyes as well as single red eyes. The approach is based on 3D tables with typicalness levels for red eyes and human skin tones, and on directional edge detection filters for processing the redness image. Machine learning is applied for feature selection. For classification of red eye regions, a cascade of classifiers including a Gentle AdaBoost committee of Classification and Regression Trees (CART) is applied. The retouching stage includes desaturation, darkening and blending with the initial image. Several versions of the implementation are possible, trading off detection and correction quality, processing time, and memory volume. A numeric quality criterion for automatic red eye correction is proposed. This quality metric is constructed by applying the Analytic Hierarchy Process (AHP) to consumer opinions about correction outcomes. The proposed metric helped to choose algorithm parameters via an optimization procedure. Experimental results demonstrate high accuracy and efficiency of the proposed algorithm in comparison with existing solutions.
Constrained Response Surface Optimisation and Taguchi Methods for Precisely Atomising Spraying Process

NASA Astrophysics Data System (ADS)

Luangpaiboon, P.; Suwankham, Y.; Homrossukon, S.

2010-10-01

This research presents the development of a design-of-experiments technique for quality improvement in the automotive manufacturing industry. The quality characteristic of interest is the colour shade, one of the key features of a vehicle's exterior appearance. With a low percentage of first-time quality, the manufacturer has incurred high repair costs as well as longer production times. To resolve this problem permanently, the spraying conditions should be optimized. Therefore, this work applies the full factorial design, multiple regression, the constrained response surface optimization method (CRSOM), and Taguchi's method to investigate the significant factors and to determine the optimum factor levels in order to improve the quality of the paint shop. Firstly, a 2^k full factorial design was employed to study the effect of five factors: the paint flow rate at the robot setting, the paint levelling agent, the paint pigment, the additive slow solvent, and the non-volatile solid at spraying of the atomizing spraying machine. The response values of colour shade at 15 and 45 degrees were measured using a spectrophotometer. Then regression models of colour shade at both angles were developed from the significant factors affecting each response. Next, both regression models were placed into the form of a linear program to maximize the colour shade subject to 3 main factors: the pigment, the additive solvent and the flow rate. Finally, Taguchi's method was applied to determine the proper levels of the key variable factors to achieve the mean target value of colour shade. The non-volatile solid was found to be one additional factor at this stage. The proper levels of all factors from both experimental design methods were then used to set up a confirmation experiment.
It was found that the colour shades, both visually and at the 15 and 45 degree measurement angles of the spectrophotometer, were close to the target, and the defect rate at the quality gate was reduced from 0.35 WDPV to 0.10 WDPV. This shows that the objective of this research was met and that this procedure can be used as quality improvement guidance for automotive paint shops.
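The two-level full factorial step of a study like this one can be sketched in a few lines: enumerate every combination of low and high settings for the five factors, then estimate each main effect as the difference between the mean response at the high and low levels. The factor names below paraphrase the abstract, while the coded levels and the synthetic response are purely illustrative assumptions.

```python
from itertools import product

def full_factorial(factors):
    """All 2^k runs in coded units (-1 = low level, +1 = high level)."""
    return [dict(zip(factors, levels))
            for levels in product((-1, 1), repeat=len(factors))]

def main_effect(design, response, factor):
    """Mean response at the high level minus mean response at the low level."""
    high = [r for run, r in zip(design, response) if run[factor] == 1]
    low = [r for run, r in zip(design, response) if run[factor] == -1]
    return sum(high) / len(high) - sum(low) / len(low)

factors = ["flow_rate", "levelling_agent", "pigment",
           "slow_solvent", "non_volatile_solid"]
design = full_factorial(factors)                   # 2^5 = 32 runs

# Synthetic colour-shade response, used only to exercise the functions.
response = [10 + 2 * run["pigment"] - run["flow_rate"] for run in design]
pigment_effect = main_effect(design, response, "pigment")
```

Factors whose main effects stand out from the rest are the natural candidates for the regression models and the later optimization stage.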
Resting-state functional magnetic resonance imaging: the impact of regression analysis.

PubMed

Yeh, Chia-Jung; Tseng, Yu-Sheng; Lin, Yi-Ru; Tsai, Shang-Yueh; Huang, Teng-Yi

2015-01-01

To investigate the impact of regression methods on resting-state functional magnetic resonance imaging (rsfMRI). During rsfMRI preprocessing, regression analysis is considered effective for reducing the interference of physiological noise on the signal time course. However, it is unclear whether the regression method benefits rsfMRI analysis. Twenty volunteers (10 men and 10 women; aged 23.4 ± 1.5 years) participated in the experiments. We used node analysis and functional connectivity mapping to assess the brain default mode network by using five combinations of regression methods. The results show that regressing the global mean plays a major role in the preprocessing steps. When a global regression method is applied, the values of functional connectivity are significantly lower (P ≤ .01) than those calculated without a global regression. This step increases inter-subject variation and produces anticorrelated brain areas. rsfMRI data processed using regression should be interpreted carefully. The significance of the anticorrelated brain areas produced by global signal removal is unclear. Copyright © 2014 by the American Society of Neuroimaging.
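A small sketch helps show why global regression tends to produce anticorrelated areas. Here the global signal is taken as the across-voxel mean time course and is regressed, with an intercept, out of every voxel; the residuals then sum to zero across voxels at every time point, which pushes the average correlation between voxels toward negative values. The function name and toy data are illustrative and not taken from the study.

```python
def global_signal_regression(voxels):
    """Regress the across-voxel mean time course (plus an intercept) out of
    each voxel time series and return the residual time series."""
    n_t = len(voxels[0])
    g = [sum(v[t] for v in voxels) / len(voxels) for t in range(n_t)]
    g_mean = sum(g) / n_t
    gc = [x - g_mean for x in g]                   # centred global signal
    sgg = sum(x * x for x in gc)
    residuals = []
    for v in voxels:
        v_mean = sum(v) / n_t
        beta = sum(a * (b - v_mean) for a, b in zip(gc, v)) / sgg
        residuals.append([b - v_mean - beta * a for a, b in zip(gc, v)])
    return residuals

# Three toy "voxels" observed at five time points.
voxels = [[1.0, 2.0, 3.0, 4.0, 5.0],
          [2.0, 1.0, 4.0, 3.0, 6.0],
          [0.0, 1.0, 0.0, 2.0, 1.0]]
cleaned = global_signal_regression(voxels)
column_sums = [sum(r[t] for r in cleaned) for t in range(5)]  # all close to zero
```

Because the zero-sum constraint is a property of the arithmetic rather than of the brain, negative correlations that appear only after this step are hard to interpret, which is consistent with the caution urged in the abstract.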
The Global Signal in fMRI: Nuisance or Information?

PubMed Central

Nalci, Alican; Falahpour, Maryam

2017-01-01

The global signal is widely used as a regressor or normalization factor for removing the effects of global variations in the analysis of functional magnetic resonance imaging (fMRI) studies. However, there is considerable controversy over its use because of the potential bias that can be introduced when it is applied to the analysis of both task-related and resting-state fMRI studies. In this paper we take a closer look at the global signal, examining in detail the various sources that can contribute to the signal. For the most part, the global signal has been treated as a nuisance term, but there is growing evidence that it may also contain valuable information. We also examine the various ways that the global signal has been used in the analysis of fMRI data, including global signal regression, global signal subtraction, and global signal normalization. Furthermore, we describe new ways for understanding the effects of global signal regression and its relation to the other approaches. PMID:28213118
The incidence of health financing in South Africa: findings from a recent data set.

PubMed

Ataguba, John E; McIntyre, Di

2018-01-01

There is an international call for countries to ensure universal health coverage. This call has been embraced in South Africa (SA) in the form of a National Health Insurance (NHI). This is expected to be financed through general tax revenue with the possibility of additional earmarked taxes including a surcharge on personal income and/or a payroll tax for employers. Currently, health services are financed in SA through allocations from general tax revenue, direct out-of-pocket payments, and contributions to medical schemes. This paper uses the most recent data set to assess the progressivity of each health financing mechanism and of the overall financing system in SA. Applying standard and innovative methodologies for assessing progressivity, the study finds that general taxes and medical scheme contributions remain progressive, and direct out-of-pocket payments and indirect taxes are regressive. However, private health insurance contributions, across only the insured, are regressive. The policy implications of these findings are discussed in the context of the NHI.
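The abstract does not name the progressivity measures it uses. A common choice in health financing studies is the Kakwani index, the concentration index of payments minus the Gini coefficient of income, and the sketch below assumes that measure purely for illustration. The function names and household figures are hypothetical.

```python
def concentration_index(values, income):
    """Concentration index of `values` with households ranked by income,
    using fractional ranks (i - 0.5) / n."""
    order = sorted(range(len(income)), key=lambda i: income[i])
    n = len(order)
    mean = sum(values) / n
    weighted = sum(values[i] * (rank + 0.5) / n
                   for rank, i in enumerate(order))
    return 2.0 * weighted / (n * mean) - 1.0

def kakwani_index(payments, income):
    """Positive: progressive; zero: proportional; negative: regressive."""
    gini = concentration_index(income, income)
    return concentration_index(payments, income) - gini

income = [10.0, 20.0, 30.0, 40.0]                   # hypothetical households
flat_fee = kakwani_index([5.0, 5.0, 5.0, 5.0], income)      # negative
income_tax = kakwani_index([0.0, 1.0, 3.0, 8.0], income)    # positive
```

A flat payment takes a larger share of a poor household's income than of a rich one's, so it scores below zero, while a payment that rises faster than income scores above zero.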
Analysis of an experiment aimed at improving the reliability of transmission centre shafts.

PubMed

Davis, T P

1995-01-01

Smith (1991) presents a paper proposing the use of Weibull regression models to establish dependence of failure data (usually times) on covariates related to the design of the test specimens and test procedures. In his article Smith made the point that good experimental design was as important in reliability applications as elsewhere, and in view of the current interest in design inspired by Taguchi and others, we pay some attention in this article to that topic. A real case study from the Ford Motor Company is presented. Our main approach is to utilize suggestions in the literature for applying standard least squares techniques of experimental analysis even when there is likely to be nonnormal error, and censoring. This approach lacks theoretical justification, but its appeal is its simplicity and flexibility. For completeness we also include some analysis based on the proportional hazards model, and in an attempt to link back to Smith (1991), look at a Weibull regression model.
Regression models to estimate real-time concentrations of selected constituents in two tributaries to Lake Houston near Houston, Texas, 2005-07

USGS Publications Warehouse

Oden, Timothy D.; Asquith, William H.; Milburn, Matthew S.

2009-01-01

In December 2005, the U.S. Geological Survey in cooperation with the City of Houston, Texas, began collecting discrete water-quality samples for nutrients, total organic carbon, bacteria (total coliform and Escherichia coli), atrazine, and suspended sediment at two U.S. Geological Survey streamflow-gaging stations upstream from Lake Houston near Houston (08068500 Spring Creek near Spring, Texas, and 08070200 East Fork San Jacinto River near New Caney, Texas). The data from the discrete water-quality samples collected during 2005-07, in conjunction with monitored real-time data already being collected - physical properties (specific conductance, pH, water temperature, turbidity, and dissolved oxygen), streamflow, and rainfall - were used to develop regression models for predicting water-quality constituent concentrations for inflows to Lake Houston. Rainfall data were obtained from a rain gage monitored by Harris County Homeland Security and Emergency Management and colocated with the Spring Creek station. The leaps and bounds algorithm was used to find the best subsets of possible regression models (minimum residual sum of squares for a given number of variables). The potential explanatory or predictive variables included discharge (streamflow), specific conductance, pH, water temperature, turbidity, dissolved oxygen, rainfall, and time (to account for seasonal variations inherent in some water-quality data). The response variables at each site were nitrite plus nitrate nitrogen, total phosphorus, organic carbon, Escherichia coli, atrazine, and suspended sediment. The explanatory variables provide easily measured quantities as a means to estimate concentrations of the various constituents under investigation, with accompanying estimates of measurement uncertainty. Each regression equation can be used to estimate concentrations of a given constituent in real time. 
In conjunction with estimated concentrations, constituent loads were estimated by multiplying the estimated concentration by the corresponding streamflow and applying the appropriate conversion factor. By computing loads from estimated constituent concentrations, a continuous record of estimated loads can be available for comparison to total maximum daily loads. The regression equations presented in this report are site specific to the Spring Creek and East Fork San Jacinto River streamflow-gaging stations; however, the methods that were developed and documented could be applied to other tributaries to Lake Houston for estimating real-time water-quality data for streams entering Lake Houston.
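The load calculation described above is a product of three terms. As a sketch, assume the concentration is in milligrams per liter and the streamflow in cubic feet per second (the abstract does not state the units, and a constituent such as bacteria would need a different treatment); the conversion factor then follows from the volume of a cubic foot and the number of seconds in a day. The function and constant names are illustrative.

```python
LITERS_PER_CUBIC_FOOT = 28.316846592
SECONDS_PER_DAY = 86400

def load_kg_per_day(conc_mg_per_l, flow_ft3_per_s):
    """Constituent load in kg/day = concentration * streamflow * unit conversion.
    Assumes mg/L and ft^3/s inputs (an assumption of this sketch)."""
    mg_per_day = (conc_mg_per_l * flow_ft3_per_s
                  * LITERS_PER_CUBIC_FOOT * SECONDS_PER_DAY)
    return mg_per_day / 1e6                        # mg -> kg

# Hypothetical real-time estimate: 0.8 mg/L total phosphorus at 250 ft^3/s.
load = load_kg_per_day(0.8, 250.0)
```

Applying the function to each real-time concentration estimate and its matching streamflow value gives the continuous load record that can be compared with a total maximum daily load.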
An Entropy-Based Measure for Assessing Fuzziness in Logistic Regression

PubMed Central

Weiss, Brandi A.; Dardick, William

2015-01-01

This article introduces an entropy-based measure of data–model fit that can be used to assess the quality of logistic regression models. Entropy has previously been used in mixture-modeling to quantify how well individuals are classified into latent classes. The current study proposes the use of entropy for logistic regression models to quantify the quality of classification and separation of group membership. Entropy complements preexisting measures of data–model fit and provides unique information not contained in other measures. Hypothetical data scenarios, an applied example, and Monte Carlo simulation results are used to demonstrate the application of entropy in logistic regression. Entropy should be used in conjunction with other measures of data–model fit to assess how well logistic regression models classify cases into observed categories. PMID:29795897
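To make the idea concrete, the sketch below computes the relative entropy statistic used in mixture modeling, applied to the predicted probabilities of a two-category logistic regression. The article's exact definition may differ; this form is assumed because the abstract ties the measure to entropy in mixture modeling. Values near 1 indicate crisp classification and values near 0 indicate fuzzy classification. The function name and example probabilities are illustrative.

```python
import math

def relative_entropy(probabilities):
    """Normalized entropy of predicted P(y = 1) values for a binary model:
    1 means every case is classified with certainty, 0 means every
    prediction sits at 0.5."""
    def case_entropy(p):
        return -sum(q * math.log(q) for q in (p, 1.0 - p) if q > 0)
    total = sum(case_entropy(p) for p in probabilities)
    return 1.0 - total / (len(probabilities) * math.log(2))

crisp = relative_entropy([0.02, 0.97, 0.99, 0.05])   # well-separated groups
fuzzy = relative_entropy([0.45, 0.55, 0.52, 0.48])   # poorly separated groups
```

Two models with the same classification accuracy can differ sharply on this statistic, which is why it can add information beyond conventional fit measures.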
Logistic regression applied to natural hazards: rare event logistic regression with replications

NASA Astrophysics Data System (ADS)

Guns, M.; Vanacker, V.

2012-06-01

Statistical analysis of natural hazards needs particular attention, as most of these phenomena are rare events. This study shows that the ordinary rare event logistic regression, as it is now commonly used in geomorphologic studies, does not always lead to a robust detection of controlling factors, as the results can be strongly sample-dependent. In this paper, we introduce some concepts of Monte Carlo simulations in rare event logistic regression. This technique, so-called rare event logistic regression with replications, combines the strength of probabilistic and statistical methods, and allows overcoming some of the limitations of previous developments through robust variable selection. This technique was here developed for the analyses of landslide controlling factors, but the concept is widely applicable for statistical analyses of natural hazards.
Large unbalanced credit scoring using Lasso-logistic regression ensemble.

PubMed

Wang, Hong; Xu, Qingsong; Zhou, Lifeng

2015-01-01

Recently, various ensemble learning methods with different base classifiers have been proposed for credit scoring problems. However, for various reasons, there has been little research using logistic regression as the base classifier. In this paper, given large unbalanced data, we consider the plausibility of ensemble learning using regularized logistic regression as the base classifier to deal with credit scoring problems. In this research, the data is first balanced and diversified by clustering and bagging algorithms. Then we apply a Lasso-logistic regression learning ensemble to evaluate the credit risks. We show that the proposed algorithm outperforms popular credit scoring models such as decision tree, Lasso-logistic regression and random forests in terms of AUC and F-measure. We also provide two importance measures for the proposed model to identify important variables in the data.
An Entropy-Based Measure for Assessing Fuzziness in Logistic Regression.

PubMed

Weiss, Brandi A; Dardick, William

2016-12-01

This article introduces an entropy-based measure of data-model fit that can be used to assess the quality of logistic regression models. Entropy has previously been used in mixture-modeling to quantify how well individuals are classified into latent classes. The current study proposes the use of entropy for logistic regression models to quantify the quality of classification and separation of group membership. Entropy complements preexisting measures of data-model fit and provides unique information not contained in other measures. Hypothetical data scenarios, an applied example, and Monte Carlo simulation results are used to demonstrate the application of entropy in logistic regression. Entropy should be used in conjunction with other measures of data-model fit to assess how well logistic regression models classify cases into observed categories.
Using machine learning to identify air pollution exposure profiles associated with early cognitive skills among U.S. children.

PubMed

Stingone, Jeanette A; Pandey, Om P; Claudio, Luz; Pandey, Gaurav

2017-11-01

Data-driven machine learning methods present an opportunity to simultaneously assess the impact of multiple air pollutants on health outcomes. The goal of this study was to apply a two-stage, data-driven approach to identify associations between air pollutant exposure profiles and children's cognitive skills. Data from 6900 children enrolled in the Early Childhood Longitudinal Study, Birth Cohort, a national study of children born in 2001 and followed through kindergarten, were linked to estimated concentrations of 104 ambient air toxics in the 2002 National Air Toxics Assessment using ZIP code of residence at age 9 months. In the first-stage, 100 regression trees were learned to identify ambient air pollutant exposure profiles most closely associated with scores on a standardized mathematics test administered to children in kindergarten. In the second-stage, the exposure profiles frequently predicting lower math scores were included within linear regression models and adjusted for confounders in order to estimate the magnitude of their effect on math scores. This approach was applied to the full population, and then to the populations living in urban and highly-populated urban areas. Our first-stage results in the full population suggested children with low trichloroethylene exposure had significantly lower math scores. This association was not observed for children living in urban communities, suggesting that confounding related to urbanicity needs to be considered within the first-stage. When restricting our analysis to populations living in urban and highly-populated urban areas, high isophorone levels were found to predict lower math scores. Within adjusted regression models of children in highly-populated urban areas, the estimated effect of higher isophorone exposure on math scores was -1.19 points (95% CI -1.94, -0.44). Similar results were observed for the overall population of urban children. 
This data-driven, two-stage approach can be applied to other populations, exposures and outcomes to generate hypotheses within high-dimensional exposure data. Copyright © 2017 The Authors. Published by Elsevier Ltd. All rights reserved.
An automated ranking platform for machine learning regression models for meat spoilage prediction using multi-spectral imaging and metabolic profiling.

PubMed

Estelles-Lopez, Lucia; Ropodi, Athina; Pavlidis, Dimitris; Fotopoulou, Jenny; Gkousari, Christina; Peyrodie, Audrey; Panagou, Efstathios; Nychas, George-John; Mohareb, Fady

2017-09-01

Over the past decade, analytical approaches based on vibrational spectroscopy, hyperspectral/multispectral imaging and biomimetic sensors have gained popularity as rapid and efficient methods for assessing food quality, safety and authentication, and as a sensible alternative to the expensive and time-consuming conventional microbiological techniques. Due to the multi-dimensional nature of the data generated from such analyses, the output needs to be coupled with a suitable statistical approach or machine-learning algorithm before the results can be interpreted. Choosing the optimum pattern recognition or machine learning approach for a given analytical platform is often challenging and involves a comparative analysis between various algorithms in order to achieve the best possible prediction accuracy. In this work, "MeatReg", a web-based application, is presented. It automates the procedure of identifying the best machine learning method for comparing data from several analytical techniques, in order to predict the counts of microorganisms responsible for meat spoilage regardless of the packaging system applied. In particular, up to 7 regression methods were applied: ordinary least squares regression, stepwise linear regression, partial least squares regression, principal component regression, support vector regression, random forest and k-nearest neighbours. "MeatReg" was tested with minced beef samples stored under aerobic and modified atmosphere packaging and analysed with electronic nose, HPLC, FT-IR, GC-MS and multispectral imaging instruments. Populations of total viable count, lactic acid bacteria, pseudomonads, Enterobacteriaceae and B. thermosphacta were predicted. As a result, recommendations were obtained on which analytical platforms are suitable to predict each type of bacteria and which machine learning methods to use in each case. The developed system is accessible via the link: www.sorfml.com. Copyright © 2017 Elsevier Ltd. All rights reserved.
Comparative study of some robust statistical methods: weighted, parametric, and nonparametric linear regression of HPLC convoluted peak responses using internal standard method in drug bioavailability studies.

PubMed

Korany, Mohamed A; Maher, Hadir M; Galal, Shereen M; Ragab, Marwa A A

2013-05-01

This manuscript discusses the application and the comparison between three statistical regression methods for handling data: parametric, nonparametric, and weighted regression (WR). These data were obtained from different chemometric methods applied to the high-performance liquid chromatography response data using the internal standard method. This was performed on a model drug Acyclovir which was analyzed in human plasma with the use of ganciclovir as internal standard. In vivo study was also performed. Derivative treatment of chromatographic response ratio data was followed by convolution of the resulting derivative curves using 8-points sin x i polynomials (discrete Fourier functions). This work studies and also compares the application of WR method and Theil's method, a nonparametric regression (NPR) method with the least squares parametric regression (LSPR) method, which is considered the de facto standard method used for regression. When the assumption of homoscedasticity is not met for analytical data, a simple and effective way to counteract the great influence of the high concentrations on the fitted regression line is to use WR method. WR was found to be superior to the method of LSPR as the former assumes that the y-direction error in the calibration curve will increase as x increases. Theil's NPR method was also found to be superior to the method of LSPR as the former assumes that errors could occur in both x- and y-directions and that might not be normally distributed. Most of the results showed a significant improvement in the precision and accuracy on applying WR and NPR methods relative to LSPR.
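To make the nonparametric option in the abstract above concrete, here is a minimal sketch of a Theil-type slope estimator: the slope is the median of all pairwise slopes, and the intercept is the median of the residual offsets. This is the common "complete" Theil-Sen variant, assumed for illustration; the article may use a different variant of Theil's method, and the function name is hypothetical.

```python
from itertools import combinations
from statistics import median

def theil_sen(x, y):
    """Return (slope, intercept) from the median of pairwise slopes.

    Pairs with identical x values are skipped because their slope is
    undefined. This is an illustrative sketch, not the article's code.
    """
    slopes = [(y2 - y1) / (x2 - x1)
              for (x1, y1), (x2, y2) in combinations(zip(x, y), 2)
              if x2 != x1]
    slope = median(slopes)
    # Intercept: median offset of the points from a line with that slope.
    intercept = median(yi - slope * xi for xi, yi in zip(x, y))
    return slope, intercept
```

Because medians are used throughout, a single aberrant calibration point has little influence on the fitted line, which is the property that makes such estimators attractive when errors are not normally distributed.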
Enhanced fertility prediction of cryopreserved boar spermatozoa using novel sperm function assessment.

PubMed

Daigneault, B W; McNamara, K A; Purdy, P H; Krisher, R L; Knox, R V; Rodriguez-Zas, S L; Miller, D J

2015-05-01

Due to reduced fertility, cryopreserved semen is seldom used for commercial porcine artificial insemination (AI). Predicting the fertility of individual frozen ejaculates for selection of higher quality semen prior to AI would increase overall success. Our objective was to test novel and traditional laboratory analyses to identify characteristics of cryopreserved spermatozoa that are related to boar fertility. Traditional post-thaw analyses of motility, viability, and acrosome integrity were performed on each ejaculate. In vitro fertilization, cleavage, and blastocyst development were also determined. Finally, spermatozoa-oviduct binding and competitive zona-binding assays were applied to assess sperm adhesion to these two matrices. Fertility of the same ejaculates subjected to laboratory assays was determined for each boar by multi-sire AI and defined as (i) the mean percentage of the litter sired and (ii) the mean number of piglets sired in each litter. Means of each laboratory evaluation were calculated for each boar and those values were applied to multiple linear regression analyses to determine which sperm traits could collectively estimate fertility in the simplest model. The regression model to predict the percent of litter sired by each boar was highly effective (p < 0.001, r(2) = 0.87) and included five traits; acrosome-compromised spermatozoa, percent live spermatozoa (0 and 60 min post-thaw), percent total motility, and the number of zona-bound spermatozoa. A second model to predict the number of piglets sired by boar was also effective (p < 0.05, r(2) = 0.57). These models indicate that the fertility of cryopreserved boar spermatozoa can be predicted effectively by including traditional and novel laboratory assays that consider functions of spermatozoa. © 2015 American Society of Andrology and European Academy of Andrology.
Prevalence of vitamin D deficiency and associated factors in women and newborns in the immediate postpartum period

PubMed Central

do Prado, Mara Rúbia Maciel Cardoso; Oliveira, Fabiana de Cássia Carvalho; Assis, Karine Franklin; Ribeiro, Sarah Aparecida Vieira; do Prado, Pedro Paulo; Sant'Ana, Luciana Ferreira da Rocha; Priore, Silvia Eloiza; Franceschini, Sylvia do Carmo Castro

2015-01-01

Abstract Objective: To assess the prevalence of vitamin D deficiency and its associated factors in women and their newborns in the postpartum period. Methods: This cross-sectional study evaluated vitamin D deficiency/insufficiency in 226 women and their newborns in Viçosa (Minas Gerais, BR) between December 2011 and November 2012. Cord blood and venous maternal blood were collected to evaluate the following biochemical parameters: vitamin D, alkaline phosphatase, calcium, phosphorus and parathyroid hormone. Poisson regression analysis, with a confidence interval of 95%, was applied to assess vitamin D deficiency and its associated factors. Multiple linear regression analysis was performed to identify factors associated with 25(OH)D deficiency in the newborns and women from the study. The criteria for variable inclusion in the multiple linear regression model was the association with the dependent variable in the simple linear regression analysis, considering p<0.20. Significance level was α <5%. Results: From 226 women included, 200 (88.5%) were 20-44 years old; the median age was 28 years. Deficient/insufficient levels of vitamin D were found in 192 (85%) women and in 182 (80.5%) neonates. The maternal 25(OH)D and alkaline phosphatase levels were independently associated with vitamin D deficiency in infants. Conclusions: This study identified a high prevalence of vitamin D deficiency and insufficiency in women and newborns and the association between maternal nutritional status of vitamin D and their infants' vitamin D status. PMID:26100593

A novel model incorporating two variability sources for describing motor evoked potentials

PubMed Central

Goetz, Stefan M.; Luber, Bruce; Lisanby, Sarah H.; Peterchev, Angel V.

2014-01-01

Objective Motor evoked potentials (MEPs) play a pivotal role in transcranial magnetic stimulation (TMS), e.g., for determining the motor threshold and probing cortical excitability. Sampled across the range of stimulation strengths, MEPs outline an input–output (IO) curve, which is often used to characterize the corticospinal tract. More detailed understanding of the signal generation and variability of MEPs would provide insight into the underlying physiology and aid correct statistical treatment of MEP data. Methods A novel regression model is tested using measured IO data of twelve subjects. The model splits MEP variability into two independent contributions, acting on both sides of a strong sigmoidal nonlinearity that represents neural recruitment. Traditional sigmoidal regression with a single variability source after the nonlinearity is used for comparison. Results The distribution of MEP amplitudes varied across different stimulation strengths, violating statistical assumptions in traditional regression models. In contrast to the conventional regression model, the dual variability source model better described the IO characteristics including phenomena such as changing distribution spread and skewness along the IO curve. Conclusions MEP variability is best described by two sources that most likely separate variability in the initial excitation process from effects occurring later on. The new model enables more accurate and sensitive estimation of the IO curve characteristics, enhancing its power as a detection tool, and may apply to other brain stimulation modalities. Furthermore, it extracts new information from the IO data concerning the neural variability—information that has previously been treated as noise. PMID:24794287
Application of classification tree and logistic regression for the management and health intervention plans in a community-based study.

PubMed

Teng, Ju-Hsi; Lin, Kuan-Chia; Ho, Bin-Shenq

2007-10-01

A community-based aboriginal study was conducted and analysed to explore the application of classification tree and logistic regression. A total of 1066 aboriginal residents in Yilan County were screened during 2003-2004. The independent variables include demographic characteristics, physical examinations, geographic location, health behaviours, dietary habits and family hereditary diseases history. Risk factors of cardiovascular diseases were selected as the dependent variables in further analysis. The completion rate for the health interview is 88.9%. The classification tree results find that if body mass index is higher than 25.72 kg/m² and the age is above 51 years, the predicted probability for number of cardiovascular risk factors ≥3 is 73.6% and the population is 322. If body mass index is higher than 26.35 kg/m² and geographical latitude of the village is lower than 24°22.8', the predicted probability for number of cardiovascular risk factors ≥4 is 60.8% and the population is 74. The logistic regression results indicate that body mass index, drinking habit and menopause are the top three significant independent variables. The classification tree model specifically shows the discrimination paths and interactions between the risk groups. The logistic regression model presents and analyses the statistically independent factors of cardiovascular risks. Applying both models to specific situations will provide a different angle for the design and management of future health intervention plans after a community-based study.
Characterizing nonconstant instrumental variance in emerging miniaturized analytical techniques.

PubMed

Noblitt, Scott D; Berg, Kathleen E; Cate, David M; Henry, Charles S

2016-04-07

Measurement variance is a crucial aspect of quantitative chemical analysis. Variance directly affects important analytical figures of merit, including detection limit, quantitation limit, and confidence intervals. Most reported analyses for emerging analytical techniques implicitly assume constant variance (homoskedasticity) by using unweighted regression calibrations. Despite the assumption of constant variance, it is known that most instruments exhibit heteroskedasticity, where variance changes with signal intensity. Ignoring nonconstant variance results in suboptimal calibrations, invalid uncertainty estimates, and incorrect detection limits. Three techniques where homoskedasticity is often assumed were covered in this work to evaluate if heteroskedasticity had a significant quantitative impact: naked-eye, distance-based detection using paper-based analytical devices (PADs), cathodic stripping voltammetry (CSV) with disposable carbon-ink electrode devices, and microchip electrophoresis (MCE) with conductivity detection. Despite these techniques representing a wide range of chemistries and precision, heteroskedastic behavior was confirmed for each. The general variance forms were analyzed, and recommendations for accounting for nonconstant variance discussed. Monte Carlo simulations of instrument responses were performed to quantify the benefits of weighted regression, and the sensitivity to uncertainty in the variance function was tested. Results show that heteroskedasticity should be considered during development of new techniques; even moderate uncertainty (30%) in the variance function still results in weighted regression outperforming unweighted regressions. We recommend utilizing the power model of variance because it is easy to apply, requires little additional experimentation, and produces higher-precision results and more reliable uncertainty estimates than assuming homoskedasticity. Copyright © 2016 Elsevier B.V. All rights reserved.
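The recommendation above can be sketched as a weighted straight-line calibration in which the weights come from a power model of variance. The sketch assumes the variance is proportional to the concentration raised to a fixed exponent (`power`), used here as a simple stand-in for signal intensity; both the functional form and the parameter names are illustrative assumptions, not the article's implementation.

```python
def weighted_line_fit(x, y, power=2.0):
    """Fit y = intercept + slope * x by weighted least squares.

    Weights are 1 / x**power, i.e. the variance is assumed proportional
    to x**power (a hypothetical power model). All x values must be
    positive. power=0 reduces to ordinary (unweighted) least squares.
    """
    w = [1.0 / xi ** power for xi in x]
    sw = sum(w)
    swx = sum(wi * xi for wi, xi in zip(w, x))
    swy = sum(wi * yi for wi, yi in zip(w, y))
    swxx = sum(wi * xi * xi for wi, xi in zip(w, x))
    swxy = sum(wi * xi * yi for wi, xi, yi in zip(w, x, y))
    # Closed-form solution of the weighted normal equations.
    slope = (sw * swxy - swx * swy) / (sw * swxx - swx ** 2)
    intercept = (swy - slope * swx) / sw
    return slope, intercept
```

With `power=2.0`, a noisy high-concentration standard pulls the fitted line far less than it would in an unweighted fit, which is the practical benefit of weighting that the abstract describes.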
The 11-year solar cycle in current reanalyses: a (non)linear attribution study of the middle atmosphere

NASA Astrophysics Data System (ADS)

Kuchar, A.; Sacha, P.; Miksovsky, J.; Pisoft, P.

2015-06-01

This study focusses on the variability of temperature, ozone and circulation characteristics in the stratosphere and lower mesosphere with regard to the influence of the 11-year solar cycle. It is based on attribution analysis using multiple nonlinear techniques (support vector regression, neural networks) besides the multiple linear regression approach. The analysis was applied to several current reanalysis data sets for the 1979-2013 period, including MERRA, ERA-Interim and JRA-55, with the aim to compare how these types of data resolve especially the double-peaked solar response in temperature and ozone variables and the consequent changes induced by these anomalies. Equatorial temperature signals in the tropical stratosphere were found to be in qualitative agreement with previous attribution studies, although the agreement with observational results was incomplete, especially for JRA-55. The analysis also pointed to the solar signal in the ozone data sets (i.e. MERRA and ERA-Interim) not being consistent with the observed double-peaked ozone anomaly extracted from satellite measurements. The results obtained by linear regression were confirmed by the nonlinear approach through all data sets, suggesting that linear regression is a relevant tool to sufficiently resolve the solar signal in the middle atmosphere. The seasonal evolution of the solar response was also discussed in terms of dynamical causalities in the winter hemispheres. The hypothetical mechanism of a weaker Brewer-Dobson circulation at solar maxima was reviewed together with a discussion of polar vortex behaviour.
Not Quite Normal: Consequences of Violating the Assumption of Normality in Regression Mixture Models

ERIC Educational Resources Information Center

Van Horn, M. Lee; Smith, Jessalyn; Fagan, Abigail A.; Jaki, Thomas; Feaster, Daniel J.; Masyn, Katherine; Hawkins, J. David; Howe, George

2012-01-01

Regression mixture models, which have only recently begun to be used in applied research, are a new approach for finding differential effects. This approach comes at the cost of the assumption that error terms are normally distributed within classes. This study uses Monte Carlo simulations to explore the effects of relatively minor violations of…
Odds Ratio, Delta, ETS Classification, and Standardization Measures of DIF Magnitude for Binary Logistic Regression

ERIC Educational Resources Information Center

Monahan, Patrick O.; McHorney, Colleen A.; Stump, Timothy E.; Perkins, Anthony J.

2007-01-01

Previous methodological and applied studies that used binary logistic regression (LR) for detection of differential item functioning (DIF) in dichotomously scored items either did not report an effect size or did not employ several useful measures of DIF magnitude derived from the LR model. Equations are provided for these effect size indices.…
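As a hedged illustration of the kind of effect size the abstract above refers to, the sketch below converts a logistic regression group coefficient into the ETS delta scale and a coarse A/B/C label. It assumes the widely used convention delta = -2.35 × ln(odds ratio) and a simplified classification based on magnitude only (the usual ETS rules also involve significance tests, which are omitted here). The function names are hypothetical and the article's own equations may differ.

```python
import math

def dif_delta(group_coefficient):
    """ETS delta-scale effect size from a logistic regression group log-odds."""
    odds_ratio = math.exp(group_coefficient)
    return -2.35 * math.log(odds_ratio)

def ets_category(delta):
    """Simplified magnitude-only ETS label; significance tests are ignored."""
    size = abs(delta)
    if size < 1.0:
        return "A"  # negligible DIF
    if size >= 1.5:
        return "C"  # large DIF
    return "B"      # moderate DIF
```

For example, a group coefficient of -1.0 on the log-odds scale corresponds to a delta of about 2.35, which this simplified rule would label "C".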
Regressive Imagery in Creative Problem-Solving: Comparing Verbal Protocols of Expert and Novice Visual Artists and Computer Programmers

ERIC Educational Resources Information Center

Kozbelt, Aaron; Dexter, Scott; Dolese, Melissa; Meredith, Daniel; Ostrofsky, Justin

2015-01-01

We applied computer-based text analyses of regressive imagery to verbal protocols of individuals engaged in creative problem-solving in two domains: visual art (23 experts, 23 novices) and computer programming (14 experts, 14 novices). Percentages of words involving primary process and secondary process thought, plus emotion-related words, were…
Risk Factors of Falls in Community-Dwelling Older Adults: Logistic Regression Tree Analysis

ERIC Educational Resources Information Center

Yamashita, Takashi; Noe, Douglas A.; Bailer, A. John

2012-01-01

Purpose of the Study: A novel logistic regression tree-based method was applied to identify fall risk factors and possible interaction effects of those risk factors. Design and Methods: A nationally representative sample of American older adults aged 65 years and older (N = 9,592) in the Health and Retirement Study 2004 and 2006 modules was used.…
The Impact of Letter Grades on Student Effort, Course Selection, and Major Choice: A Regression-Discontinuity Analysis

ERIC Educational Resources Information Center

Main, Joyce B.; Ost, Ben

2014-01-01

The authors apply a regression-discontinuity design to identify the causal impact of letter grades on student effort within a course, subsequent credit hours taken, and the probability of majoring in economics. Their methodology addresses key issues in identifying the causal impact of letter grades: correlation with unobservable factors, such as…
Comparing The Effectiveness of a90/95 Calculations (Preprint)

DTIC Science & Technology

2006-09-01

Nachtsheim, John Neter, William Li, Applied Linear Statistical Models , 5th ed., McGraw-Hill/Irwin, 2005 5. Mood, Graybill and Boes, Introduction...curves is based on methods that are only valid for ordinary linear regression. Requirements for a valid Ordinary Least-Squares Regression Model There... linear . For example is a linear model ; is not. 2. Uniform variance (homoscedasticity
Design and Optimization of a Chemometric-Assisted Spectrophotometric Determination of Telmisartan and Hydrochlorothiazide in Pharmaceutical Dosage Form

PubMed Central

Lakshmi, KS; Lakshmi, S

2010-01-01

Two chemometric methods were developed for the simultaneous determination of telmisartan and hydrochlorothiazide. The chemometric methods applied were principal component regression (PCR) and partial least square (PLS-1). These approaches were successfully applied to quantify the two drugs in the mixture using the information included in the UV absorption spectra of appropriate solutions in the range of 200-350 nm with the intervals Δλ = 1 nm. The calibration of PCR and PLS-1 models was evaluated by internal validation (prediction of compounds in its own designed training set of calibration) and by external validation over laboratory prepared mixtures and pharmaceutical preparations. The PCR and PLS-1 methods require neither any separation step, nor any prior graphical treatment of the overlapping spectra of the two drugs in a mixture. The results of PCR and PLS-1 methods were compared with each other and a good agreement was found. PMID:21331198
System Identification Applied to Dynamic CFD Simulation and Wind Tunnel Data

NASA Technical Reports Server (NTRS)

Murphy, Patrick C.; Klein, Vladislav; Frink, Neal T.; Vicroy, Dan D.

2011-01-01

Demanding aerodynamic modeling requirements for military and civilian aircraft have provided impetus for researchers to improve computational and experimental techniques. Model validation is a key component for these research endeavors so this study is an initial effort to extend conventional time history comparisons by comparing model parameter estimates and their standard errors using system identification methods. An aerodynamic model of an aircraft performing one-degree-of-freedom roll oscillatory motion about its body axes is developed. The model includes linear aerodynamics and deficiency function parameters characterizing an unsteady effect. For estimation of unknown parameters two techniques, harmonic analysis and two-step linear regression, were applied to roll-oscillatory wind tunnel data and to computational fluid dynamics (CFD) simulated data. The model used for this study is a highly swept wing unmanned aerial combat vehicle. Differences in response prediction, parameter estimates, and standard errors are compared and discussed.
Early literacy and early numeracy: the value of including early literacy skills in the prediction of numeracy development.

PubMed

Purpura, David J; Hume, Laura E; Sims, Darcey M; Lonigan, Christopher J

2011-12-01

The purpose of this study was to examine whether early literacy skills uniquely predict early numeracy skills development. During the first year of the study, 69 3- to 5-year-old preschoolers were assessed on the Preschool Early Numeracy Skills (PENS) test and the Test of Preschool Early Literacy Skills (TOPEL). Participants were assessed again a year later on the PENS test and on the Applied Problems and Calculation subtests of the Woodcock-Johnson III Tests of Achievement. Three mixed effect regressions were conducted using Time 2 PENS, Applied Problems, and Calculation as the dependent variables. Print Knowledge and Vocabulary accounted for unique variance in the prediction of Time 2 numeracy scores. Phonological Awareness did not uniquely predict any of the mathematics domains. The findings of this study identify an important link between early literacy and early numeracy development. Copyright © 2011 Elsevier Inc. All rights reserved.
Evaluation of the psychometric properties of the main meal quality index when applied in the UK population.

PubMed

Gorgulho, B M; Pot, G K; Marchioni, D M

2017-05-01

The aim of this study was to evaluate the validity and reliability of the Main Meal Quality Index when applied to the UK population. The indicator was developed to assess meal quality in different populations, and is composed of 10 components: fruit, vegetables (excluding potatoes), ratio of animal protein to total protein, fiber, carbohydrate, total fat, saturated fat, processed meat, sugary beverages and desserts, and energy density, resulting in a score range of 0-100 points. The performance of the indicator was measured using strategies for assessing content validity, construct validity, discriminant validity and reliability, including principal component analysis, linear regression models and Cronbach's alpha. The indicator presented good reliability. The Main Meal Quality Index has been shown to be valid for use as an instrument to evaluate, monitor and compare the quality of meals consumed by adults in the United Kingdom.
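The reliability statistic named in the abstract above, Cronbach's alpha, can be computed directly from a table of component scores. The sketch below uses the standard formula alpha = k/(k-1) × (1 - sum of item variances / variance of total scores); the data layout (one row per respondent, one column per component) and the function name are assumptions made for illustration.

```python
from statistics import variance

def cronbach_alpha(rows):
    """Cronbach's alpha for a list of respondents' item-score rows.

    Each row holds one respondent's scores on the same k items (k >= 2),
    and at least two respondents are required.
    """
    k = len(rows[0])
    item_vars = [variance([row[j] for row in rows]) for j in range(k)]
    total_var = variance([sum(row) for row in rows])
    return k / (k - 1) * (1 - sum(item_vars) / total_var)
```

Values closer to 1 indicate that the components vary together across respondents, which is the sense in which an index is described as internally consistent.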
Comparative study between derivative spectrophotometry and multivariate calibration as analytical tools applied for the simultaneous quantitation of Amlodipine, Valsartan and Hydrochlorothiazide.

PubMed

Darwish, Hany W; Hassan, Said A; Salem, Maissa Y; El-Zeany, Badr A

2013-09-01

Four simple, accurate and specific methods were developed and validated for the simultaneous estimation of Amlodipine (AML), Valsartan (VAL) and Hydrochlorothiazide (HCT) in commercial tablets. The derivative spectrophotometric methods include Derivative Ratio Zero Crossing (DRZC) and Double Divisor Ratio Spectra-Derivative Spectrophotometry (DDRS-DS) methods, while the multivariate calibrations used are Principal Component Regression (PCR) and Partial Least Squares (PLSs). The proposed methods were applied successfully in the determination of the drugs in laboratory-prepared mixtures and in commercial pharmaceutical preparations. The validity of the proposed methods was assessed using the standard addition technique. The linearity of the proposed methods is investigated in the range of 2-32, 4-44 and 2-20 μg/mL for AML, VAL and HCT, respectively. Copyright © 2013 Elsevier B.V. All rights reserved.
Simultaneous determination of three herbicides by differential pulse voltammetry and chemometrics.

PubMed

Ni, Yongnian; Wang, Lin; Kokot, Serge

2011-01-01

A novel differential pulse voltammetry method (DPV) was researched and developed for the simultaneous determination of Pendimethalin, Dinoseb and sodium 5-nitroguaiacolate (5NG) with the aid of chemometrics. The voltammograms of these three compounds overlapped significantly, and to facilitate the simultaneous determination of the three analytes, chemometrics methods were applied. These included classical least squares (CLS), principal component regression (PCR), partial least squares (PLS) and radial basis function-artificial neural networks (RBF-ANN). A separately prepared verification data set was used to confirm the calibrations, which were built from the original and first derivative data matrices of the voltammograms. On the basis of relative prediction errors and recoveries of the analytes, the RBF-ANN and the DPLS (D - first derivative spectra) models performed best and are particularly recommended for application. The DPLS calibration model was applied satisfactorily for the prediction of the three analytes from market vegetables and lake water samples.
Assessment of higher order structure comparability in therapeutic proteins using nuclear magnetic resonance spectroscopy.

PubMed

Amezcua, Carlos A; Szabo, Christina M

2013-06-01

In this work, we applied nuclear magnetic resonance (NMR) spectroscopy to rapidly assess higher order structure (HOS) comparability in protein samples. Using a variation of the NMR fingerprinting approach described by Panjwani et al. [2010. J Pharm Sci 99(8):3334-3342], three nonglycosylated proteins spanning a molecular weight range of 6.5-67 kDa were analyzed. A simple statistical method termed easy comparability of HOS by NMR (ECHOS-NMR) was developed. In this method, HOS similarity between two samples is measured via the correlation coefficient derived from linear regression analysis of binned NMR spectra. Applications of this method include HOS comparability assessment during new product development, manufacturing process changes, supplier changes, next-generation products, and the development of biosimilars to name just a few. We foresee ECHOS-NMR becoming a routine technique applied to comparability exercises used to complement data from other analytical techniques. Copyright © 2013 Wiley Periodicals, Inc.
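The comparison described in the abstract above (a correlation coefficient from linear regression of binned spectra) can be sketched as follows. The fixed-width binning by summation, the default bin size, and all function names are illustrative assumptions; the published ECHOS-NMR procedure may bin and preprocess spectra differently.

```python
import math

def bin_spectrum(intensities, bin_size):
    """Sum consecutive intensity points into fixed-width bins."""
    return [sum(intensities[i:i + bin_size])
            for i in range(0, len(intensities), bin_size)]

def pearson_r(a, b):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(a)
    mean_a, mean_b = sum(a) / n, sum(b) / n
    cov = sum((ai - mean_a) * (bi - mean_b) for ai, bi in zip(a, b))
    var_a = sum((ai - mean_a) ** 2 for ai in a)
    var_b = sum((bi - mean_b) ** 2 for bi in b)
    return cov / math.sqrt(var_a * var_b)

def hos_similarity(reference, sample, bin_size=4):
    """Correlation between two spectra after identical binning."""
    return pearson_r(bin_spectrum(reference, bin_size),
                     bin_spectrum(sample, bin_size))
```

A sample spectrum that is simply a rescaled copy of the reference yields a coefficient of 1, while differences in peak positions or relative intensities lower it.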
Techniques for estimating flood-peak discharges of rural, unregulated streams in Ohio

USGS Publications Warehouse

Koltun, G.F.; Roberts, J.W.

1990-01-01

Multiple-regression equations are presented for estimating flood-peak discharges having recurrence intervals of 2, 5, 10, 25, 50, and 100 years at ungaged sites on rural, unregulated streams in Ohio. The average standard errors of prediction for the equations range from 33.4% to 41.4%. Peak discharge estimates determined by log-Pearson Type III analysis using data collected through the 1987 water year are reported for 275 streamflow-gaging stations. Ordinary least-squares multiple-regression techniques were used to divide the State into three regions and to identify a set of basin characteristics that help explain station-to-station variation in the log-Pearson estimates. Contributing drainage area, main-channel slope, and storage area were identified as suitable explanatory variables. Generalized least-squares procedures, which include historical flow data and account for differences in the variance of flows at different gaging stations, spatial correlation among gaging station records, and variable lengths of station record, were used to estimate the regression parameters. Weighted peak-discharge estimates computed as a function of the log-Pearson Type III and regression estimates are reported for each station. A method is provided to adjust regression estimates for ungaged sites by use of weighted and regression estimates for a gaged site located on the same stream. Limitations and shortcomings cited in an earlier report on the magnitude and frequency of floods in Ohio are addressed in this study. Geographic bias is no longer evident for the Maumee River basin of northwestern Ohio. No bias is found to be associated with the forested-area characteristic for the range used in the regression analysis (0.0 to 99.0%), nor is this characteristic significant in explaining peak discharges.
Surface-mined area likewise is not significant in explaining peak discharges, and the regression equations are not biased when applied to basins having approximately 30% or less surface-mined area. Analyses of residuals indicate that the equations tend to overestimate flood-peak discharges for basins having approximately 30% or more surface-mined area. (USGS)
Locally-Based Kernel PLS Smoothing to Non-Parametric Regression Curve Fitting

NASA Technical Reports Server (NTRS)

Rosipal, Roman; Trejo, Leonard J.; Wheeler, Kevin; Korsmeyer, David (Technical Monitor)

2002-01-01

We present a novel smoothing approach to non-parametric regression curve fitting. This is based on kernel partial least squares (PLS) regression in reproducing kernel Hilbert space. It is our concern to apply the methodology for smoothing experimental data where some level of knowledge about the approximate shape, local inhomogeneities or points where the desired function changes its curvature is known a priori or can be derived based on the observed noisy data. We propose locally-based kernel PLS regression that extends the previous kernel PLS methodology by incorporating this knowledge. We compare our approach with existing smoothing splines, hybrid adaptive splines and wavelet shrinkage techniques on two generated data sets.

Statistical methods for astronomical data with upper limits. II - Correlation and regression

NASA Technical Reports Server (NTRS)

Isobe, T.; Feigelson, E. D.; Nelson, P. I.

1986-01-01

Statistical methods for calculating correlations and regressions in bivariate censored data where the dependent variable can have upper or lower limits are presented. Cox's regression and the generalization of Kendall's rank correlation coefficient provide significant levels of correlations, and the EM algorithm, under the assumption of normally distributed errors, and its nonparametric analog using the Kaplan-Meier estimator, give estimates for the slope of a regression line. Monte Carlo simulations demonstrate that survival analysis is reliable in determining correlations between luminosities at different bands. Survival analysis is applied to CO emission in infrared galaxies, X-ray emission in radio galaxies, H-alpha emission in cooling cluster cores, and radio emission in Seyfert galaxies.
Estimating design-flood discharges for streams in Iowa using drainage-basin and channel-geometry characteristics

USGS Publications Warehouse

Eash, D.A.

1993-01-01

Procedures provided for applying the drainage-basin and channel-geometry regression equations depend on whether the design-flood discharge estimate is for a site on an ungaged stream, an ungaged site on a gaged stream, or a gaged site. When both a drainage-basin and a channel-geometry regression-equation estimate are available for a stream site, a procedure is presented for determining a weighted average of the two flood estimates. The drainage-basin regression equations are applicable to unregulated rural drainage areas less than 1,060 square miles, and the channel-geometry regression equations are applicable to unregulated rural streams in Iowa with stabilized channels.
A Comparison of Various MRA Methods Applied to Longitudinal Evaluation Studies in Vocational Education.

ERIC Educational Resources Information Center

Kapes, Jerome T.; And Others

Three models of multiple regression analysis (MRA): single equation, commonality analysis, and path analysis, were applied to longitudinal data from the Pennsylvania Vocational Development Study. Variables influencing weekly income of vocational education students one year after high school graduation were examined: grade point averages (grades…
Artificial Neural Networks: A New Approach to Predicting Application Behavior.

ERIC Educational Resources Information Center

Gonzalez, Julie M. Byers; DesJardins, Stephen L.

2002-01-01

Applied the technique of artificial neural networks to predict which students were likely to apply to one research university. Compared the results to the traditional analysis tool, logistic regression modeling. Found that the addition of artificial intelligence models was a useful new tool for predicting student application behavior. (EV)
A refined method for multivariate meta-analysis and meta-regression.

PubMed

Jackson, Daniel; Riley, Richard D

2014-02-20

Making inferences about the average treatment effect using the random effects model for meta-analysis is problematic in the common situation where there is a small number of studies. This is because estimates of the between-study variance are not precise enough to accurately apply the conventional methods for testing and deriving a confidence interval for the average effect. We have found that a refined method for univariate meta-analysis, which applies a scaling factor to the estimated effects' standard error, provides more accurate inference. We explain how to extend this method to the multivariate scenario and show that our proposal for refined multivariate meta-analysis and meta-regression can provide more accurate inferences than the more conventional approach. We explain how our proposed approach can be implemented using standard output from multivariate meta-analysis software packages and apply our methodology to two real examples. Copyright © 2013 John Wiley & Sons, Ltd.
Predictors of the number of under-five malnourished children in Bangladesh: application of the generalized poisson regression model

PubMed Central

2013-01-01

Background Malnutrition is one of the principal causes of child mortality in developing countries including Bangladesh. To our knowledge, most of the available studies that addressed the issue of malnutrition among under-five children considered categorical (dichotomous/polychotomous) outcome variables and applied logistic regression (binary/multinomial) to find their predictors. In this study, the malnutrition variable (i.e., the outcome) is defined as the number of under-five malnourished children in a family, which is a non-negative count variable. The purposes of the study are (i) to demonstrate the applicability of the generalized Poisson regression (GPR) model as an alternative to other statistical methods and (ii) to find some predictors of this outcome variable. Methods The data are extracted from the Bangladesh Demographic and Health Survey (BDHS) 2007. Briefly, this survey employs a nationally representative sample which is based on a two-stage stratified sample of households. A total of 4,460 under-five children are analysed using statistical techniques, namely the chi-square test and the GPR model. Results The GPR model (as compared to the standard Poisson regression and negative binomial regression) is found to be justified to study the above-mentioned outcome variable because of its under-dispersion (variance < mean) property. Our study also identifies several significant predictors of the outcome variable, namely mother’s education, father’s education, wealth index, sanitation status, source of drinking water, and total number of children ever born to a woman. Conclusions Consistencies of our findings in light of many other studies suggest that the GPR model is an ideal alternative to other statistical models to analyse the number of under-five malnourished children in a family. Strategies based on significant predictors may improve the nutritional status of children in Bangladesh. PMID:23297699
Can Predictive Modeling Identify Head and Neck Oncology Patients at Risk for Readmission?

PubMed

Manning, Amy M; Casper, Keith A; Peter, Kay St; Wilson, Keith M; Mark, Jonathan R; Collar, Ryan M

2018-05-01

Objective Unplanned readmission within 30 days is a contributor to health care costs in the United States. The use of predictive modeling during hospitalization to identify patients at risk for readmission offers a novel approach to quality improvement and cost reduction. Study Design Two-phase study including retrospective analysis of prospectively collected data followed by prospective longitudinal study. Setting Tertiary academic medical center. Subjects and Methods Prospectively collected data for patients undergoing surgical treatment for head and neck cancer from January 2013 to January 2015 were used to build predictive models for readmission within 30 days of discharge using logistic regression, classification and regression tree (CART) analysis, and random forests. One model (logistic regression) was then placed prospectively into the discharge workflow from March 2016 to May 2016 to determine the model's ability to predict which patients would be readmitted within 30 days. Results In total, 174 admissions had descriptive data. Thirty-two were excluded due to incomplete data. Logistic regression, CART, and random forest predictive models were constructed using the remaining 142 admissions. When applied to 106 consecutive prospective head and neck oncology patients at the time of discharge, the logistic regression model predicted readmissions with a specificity of 94%, a sensitivity of 47%, a negative predictive value of 90%, and a positive predictive value of 62% (odds ratio, 14.9; 95% confidence interval, 4.02-55.45). Conclusion Prospectively collected head and neck cancer databases can be used to develop predictive models that can accurately predict which patients will be readmitted. This offers valuable support for quality improvement initiatives and readmission-related cost reduction in head and neck cancer care.
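The abstract reports sensitivity, specificity, predictive values, and an odds ratio but not the underlying 2x2 table. The sketch below shows how these quantities follow from such a table. The function name `diagnostic_metrics` is hypothetical, and the counts used (8 true positives, 9 false negatives, 5 false positives, 84 true negatives) are not taken from the abstract: they are one set of counts summing to 106 that happens to be consistent with the reported values, chosen purely for illustration.

```python
def diagnostic_metrics(tp, fn, fp, tn):
    """Standard summary measures of a 2x2 prediction-versus-outcome table."""
    return {
        "sensitivity": tp / (tp + fn),        # readmitted patients correctly flagged
        "specificity": tn / (tn + fp),        # non-readmitted patients correctly cleared
        "ppv": tp / (tp + fp),                # flagged patients who were readmitted
        "npv": tn / (tn + fn),                # cleared patients who were not readmitted
        "odds_ratio": (tp * tn) / (fp * fn),  # cross-product ratio of the table
    }

# Illustrative counts only (assumed, not reported in the abstract).
metrics = diagnostic_metrics(tp=8, fn=9, fp=5, tn=84)
for name, value in metrics.items():
    print(f"{name}: {value:.2f}")
```

The combination of high specificity and modest sensitivity means that a model of this kind rarely raises false alarms but misses roughly half of the readmissions, which is worth keeping in mind when reading the conclusion.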
Marginal regression models for clustered count data based on zero-inflated Conway-Maxwell-Poisson distribution with applications.

PubMed

Choo-Wosoba, Hyoyoung; Levy, Steven M; Datta, Somnath

2016-06-01

Community water fluoridation is an important public health measure to prevent dental caries, but it continues to be somewhat controversial. The Iowa Fluoride Study (IFS) is a longitudinal study on a cohort of Iowa children that began in 1991. The main purposes of this study (http://www.dentistry.uiowa.edu/preventive-fluoride-study) were to quantify fluoride exposures from both dietary and nondietary sources and to associate longitudinal fluoride exposures with dental fluorosis (spots on teeth) and dental caries (cavities). We analyze a subset of the IFS data by a marginal regression model with a zero-inflated version of the Conway-Maxwell-Poisson distribution for count data exhibiting excessive zeros and a wide range of dispersion patterns. In general, we introduce two estimation methods for fitting a ZICMP marginal regression model. Finite sample behaviors of the estimators and the resulting confidence intervals are studied using extensive simulation studies. We apply our methodologies to the dental caries data. Our novel modeling incorporating zero inflation, clustering, and overdispersion sheds some new light on the effect of community water fluoridation and other factors. We also include a second application of our methodology to a genomic (next-generation sequencing) dataset that exhibits underdispersion. © 2015, The International Biometric Society.
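For readers unfamiliar with the distribution named in this abstract, the following sketch evaluates a zero-inflated Conway-Maxwell-Poisson probability mass function. It uses the standard CMP form, in which P(Y = y) is proportional to lam**y / (y!)**nu, and mixes it with a point mass at zero. It covers only the distribution, not the marginal regression model or the estimation methods of the paper; the function names, the parameter names, and the truncation of the normalizing sum at `max_terms` are assumptions made for the example.

```python
import math

def cmp_pmf(y, lam, nu, max_terms=200):
    """Conway-Maxwell-Poisson pmf; the normalizing sum is truncated at max_terms."""
    # Work in log space so that large factorial powers do not overflow.
    log_terms = [j * math.log(lam) - nu * math.lgamma(j + 1) for j in range(max_terms)]
    largest = max(log_terms)
    log_z = largest + math.log(sum(math.exp(t - largest) for t in log_terms))
    return math.exp(y * math.log(lam) - nu * math.lgamma(y + 1) - log_z)

def zicmp_pmf(y, lam, nu, pi_zero):
    """Mixture of a point mass at zero (weight pi_zero) and a CMP count."""
    base = (1.0 - pi_zero) * cmp_pmf(y, lam, nu)
    return pi_zero + base if y == 0 else base

# nu = 1 recovers the Poisson distribution; nu > 1 gives underdispersion
# and nu < 1 gives overdispersion relative to Poisson.
print(cmp_pmf(3, lam=2.0, nu=1.0))
print(zicmp_pmf(0, lam=2.0, nu=1.5, pi_zero=0.3))
```

The single dispersion parameter `nu` is what lets one family cover both the overdispersed dental caries counts and the underdispersed sequencing counts mentioned in the abstract.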
Moving beyond regression techniques in cardiovascular risk prediction: applying machine learning to address analytic challenges

PubMed Central

Goldstein, Benjamin A.; Navar, Ann Marie; Carter, Rickey E.

2017-01-01

Abstract Risk prediction plays an important role in clinical cardiology research. Traditionally, most risk models have been based on regression models. While useful and robust, these statistical methods are limited to using a small number of predictors which operate in the same way on everyone, and uniformly throughout their range. The purpose of this review is to illustrate the use of machine-learning methods for development of risk prediction models. Typically presented as black box approaches, most machine-learning methods are aimed at solving particular challenges that arise in data analysis that are not well addressed by typical regression approaches. To illustrate these challenges, as well as how different methods can address them, we consider the problem of predicting mortality after diagnosis of acute myocardial infarction. We use data derived from our institution's electronic health record and abstract data on 13 regularly measured laboratory markers. We walk through different challenges that arise in modelling these data and then introduce different machine-learning approaches. Finally, we discuss general issues in the application of machine-learning methods including tuning parameters, loss functions, variable importance, and missing data. Overall, this review serves as an introduction for those working on risk modelling to approach the diffuse field of machine learning. PMID:27436868
Providing the Fire Risk Map in Forest Area Using a Geographically Weighted Regression Model with Gaussian Kernel and MODIS Images, a Case Study: Golestan Province

NASA Astrophysics Data System (ADS)

Shah-Heydari pour, A.; Pahlavani, P.; Bigdeli, B.

2017-09-01

With the industrialization of cities and the apparent increase in pollutants and greenhouse gases, the importance of forests as the natural lungs of the earth in cleaning these pollutants is felt more than ever. Each year, a large part of the forests is destroyed because of a lack of timely action during fires. Knowledge of areas with a high risk of fire, together with equipping these areas by constructing access routes and allocating fire-fighting equipment, can help to eliminate the destruction of the forest. In this research, the fire risk of the region was forecast and a risk map was produced from MODIS images by applying a geographically weighted regression model with a Gaussian kernel, and ordinary least squares, to the parameters that affect forest fire, including distance from residential areas, distance from the river, distance from the road, height, slope, aspect, soil type, land use, average temperature, wind speed, and rainfall. The evaluation showed that the geographically weighted regression model with Gaussian kernel correctly forecast 93.4% of all fire points, whereas the ordinary least squares method correctly forecast only 66% of the fire points.
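The core idea of geographically weighted regression with a Gaussian kernel is to fit a separate weighted least-squares model at each target location, giving nearby observations more weight than distant ones. The sketch below illustrates that idea only. It is not the authors' workflow: the function name `gwr_predict`, the fixed bandwidth, and the synthetic data are assumptions made for the example.

```python
import numpy as np

def gwr_predict(coords, X, y, target_coord, target_x, bandwidth):
    """Fit a local weighted least-squares model at target_coord and predict there."""
    coords = np.asarray(coords, dtype=float)
    design = np.column_stack([np.ones(len(coords)), np.asarray(X, dtype=float)])
    # Squared distances from every observation to the target location.
    d2 = np.sum((coords - np.asarray(target_coord, dtype=float)) ** 2, axis=1)
    weights = np.exp(-0.5 * d2 / bandwidth ** 2)  # Gaussian kernel
    root_w = np.sqrt(weights)
    # Weighted least squares solved as ordinary least squares on scaled rows.
    beta, *_ = np.linalg.lstsq(
        design * root_w[:, None], np.asarray(y, dtype=float) * root_w, rcond=None
    )
    return float(np.concatenate([[1.0], np.atleast_1d(target_x)]) @ beta)

# Synthetic example: 50 sites in the unit square with one explanatory variable.
rng = np.random.default_rng(1)
site_coords = rng.random((50, 2))
site_x = rng.random((50, 1))
site_y = 2.0 + 3.0 * site_x[:, 0]  # exact linear relation, identical everywhere

print(gwr_predict(site_coords, site_x, site_y, [0.5, 0.5], [0.2], bandwidth=0.3))
```

When the true relationship is the same everywhere, as in this synthetic case, the local fit agrees with ordinary least squares; the two methods differ only when coefficients vary over space, which is the situation in which the abstract reports a gain.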
A crash-prediction model for multilane roads.

PubMed

Caliendo, Ciro; Guida, Maurizio; Parisi, Alessandra

2007-07-01

Considerable research has been carried out in recent years to establish relationships between crashes and traffic flow, geometric infrastructure characteristics and environmental factors for two-lane rural roads. Crash-prediction models focused on multilane rural roads, however, have rarely been investigated. In addition, most research has paid but little attention to the safety effects of variables such as stopping sight distance and pavement surface characteristics. Moreover, the statistical approaches have generally included Poisson and Negative Binomial regression models, whilst Negative Multinomial regression model has been used to a lesser extent. Finally, as far as the authors are aware, prediction models involving all the above-mentioned factors have still not been developed in Italy for multilane roads, such as motorways. Thus, in this paper crash-prediction models for a four-lane median-divided Italian motorway were set up on the basis of accident data observed during a 5-year monitoring period extending between 1999 and 2003. The Poisson, Negative Binomial and Negative Multinomial regression models, applied separately to tangents and curves, were used to model the frequency of accident occurrence. Model parameters were estimated by the Maximum Likelihood Method, and the Generalized Likelihood Ratio Test was applied to detect the significant variables to be included in the model equation. Goodness-of-fit was measured by means of both the explained fraction of total variation and the explained fraction of systematic variation. The Cumulative Residuals Method was also used to test the adequacy of a regression model throughout the range of each variable. The candidate set of explanatory variables was: length (L), curvature (1/R), annual average daily traffic (AADT), sight distance (SD), side friction coefficient (SFC), longitudinal slope (LS) and the presence of a junction (J). 
Separate prediction models for total crashes and for fatal and injury crashes only were considered. For curves it is shown that significant variables are L, 1/R and AADT, whereas for tangents they are L, AADT and junctions. The effect of rain precipitation was analysed on the basis of hourly rainfall data and assumptions about drying time. It is shown that a wet pavement significantly increases the number of crashes. The models developed in this paper for Italian motorways appear to be useful for many applications such as the detection of critical factors, the estimation of accident reduction due to infrastructure and pavement improvement, and the predictions of accidents counts when comparing different design options. Thus this research may represent a point of reference for engineers in adjusting or designing multilane roads.
Landscape controls on total and methyl Hg in the Upper Hudson River basin, New York, USA

USGS Publications Warehouse

Burns, Douglas A.; Riva-Murray, K.; Bradley, P.M.; Aiken, G.R.; Brigham, M.E.

2012-01-01

Approaches are needed to better predict spatial variation in riverine Hg concentrations across heterogeneous landscapes that include mountains, wetlands, and open waters. We applied multivariate linear regression to determine the landscape factors and chemical variables that best account for the spatial variation of total Hg (THg) and methyl Hg (MeHg) concentrations in 27 sub-basins across the 493 km² upper Hudson River basin in the Adirondack Mountains of New York. THg concentrations varied by sixfold, and those of MeHg by 40-fold in synoptic samples collected at low-to-moderate flow, during spring and summer of 2006 and 2008. Bivariate linear regression relations of THg and MeHg concentrations with either percent wetland area or DOC concentrations were significant but could account for only about 1/3 of the variation in these Hg forms in summer. In contrast, multivariate linear regression relations that included metrics of (1) hydrogeomorphology, (2) riparian/wetland area, and (3) open water, explained about 66% to >90% of spatial variation in each Hg form in spring and summer samples. These metrics reflect the influence of basin morphometry and riparian soils on Hg source and transport, and the role of open water as a Hg sink. Multivariate models based solely on these landscape metrics generally accounted for as much or more of the variation in Hg concentrations than models based on chemical and physical metrics, and show great promise for identifying waters with expected high Hg concentrations in the Adirondack region and similar glaciated riverine ecosystems.
Robust, Adaptive Functional Regression in Functional Mixed Model Framework.

PubMed

Zhu, Hongxiao; Brown, Philip J; Morris, Jeffrey S

2011-09-01

Functional data are increasingly encountered in scientific studies, and their high dimensionality and complexity lead to many analytical challenges. Various methods for functional data analysis have been developed, including functional response regression methods that involve regression of a functional response on univariate/multivariate predictors with nonparametrically represented functional coefficients. In existing methods, however, the functional regression can be sensitive to outlying curves and outlying regions of curves, so is not robust. In this paper, we introduce a new Bayesian method, robust functional mixed models (R-FMM), for performing robust functional regression within the general functional mixed model framework, which includes multiple continuous or categorical predictors and random effect functions accommodating potential between-function correlation induced by the experimental design. The underlying model involves a hierarchical scale mixture model for the fixed effects, random effect and residual error functions. These modeling assumptions across curves result in robust nonparametric estimators of the fixed and random effect functions which down-weight outlying curves and regions of curves, and produce statistics that can be used to flag global and local outliers. These assumptions also lead to distributions across wavelet coefficients that have outstanding sparsity and adaptive shrinkage properties, with great flexibility for the data to determine the sparsity and the heaviness of the tails. Together with the down-weighting of outliers, these within-curve properties lead to fixed and random effect function estimates that appear in our simulations to be remarkably adaptive in their ability to remove spurious features yet retain true features of the functions. We have developed general code to implement this fully Bayesian method that is automatic, requiring the user to only provide the functional data and design matrices. 
It is efficient enough to handle large data sets, and yields posterior samples of all model parameters that can be used to perform desired Bayesian estimation and inference. Although we present details for a specific implementation of the R-FMM using specific distributional choices in the hierarchical model, 1D functions, and wavelet transforms, the method can be applied more generally using other heavy-tailed distributions, higher dimensional functions (e.g. images), and using other invertible transformations as alternatives to wavelets.
Robust, Adaptive Functional Regression in Functional Mixed Model Framework

PubMed Central

Zhu, Hongxiao; Brown, Philip J.; Morris, Jeffrey S.

2012-01-01

Functional data are increasingly encountered in scientific studies, and their high dimensionality and complexity lead to many analytical challenges. Various methods for functional data analysis have been developed, including functional response regression methods that involve regression of a functional response on univariate/multivariate predictors with nonparametrically represented functional coefficients. In existing methods, however, the functional regression can be sensitive to outlying curves and outlying regions of curves, so is not robust. In this paper, we introduce a new Bayesian method, robust functional mixed models (R-FMM), for performing robust functional regression within the general functional mixed model framework, which includes multiple continuous or categorical predictors and random effect functions accommodating potential between-function correlation induced by the experimental design. The underlying model involves a hierarchical scale mixture model for the fixed effects, random effect and residual error functions. These modeling assumptions across curves result in robust nonparametric estimators of the fixed and random effect functions which down-weight outlying curves and regions of curves, and produce statistics that can be used to flag global and local outliers. These assumptions also lead to distributions across wavelet coefficients that have outstanding sparsity and adaptive shrinkage properties, with great flexibility for the data to determine the sparsity and the heaviness of the tails. Together with the down-weighting of outliers, these within-curve properties lead to fixed and random effect function estimates that appear in our simulations to be remarkably adaptive in their ability to remove spurious features yet retain true features of the functions. We have developed general code to implement this fully Bayesian method that is automatic, requiring the user to only provide the functional data and design matrices. 
It is efficient enough to handle large data sets, and yields posterior samples of all model parameters that can be used to perform desired Bayesian estimation and inference. Although we present details for a specific implementation of the R-FMM using specific distributional choices in the hierarchical model, 1D functions, and wavelet transforms, the method can be applied more generally using other heavy-tailed distributions, higher dimensional functions (e.g. images), and using other invertible transformations as alternatives to wavelets. PMID:22308015
Using automated texture features to determine the probability for masking of a tumor on mammography, but not ultrasound.

PubMed

Häberle, Lothar; Hack, Carolin C; Heusinger, Katharina; Wagner, Florian; Jud, Sebastian M; Uder, Michael; Beckmann, Matthias W; Schulz-Wendtland, Rüdiger; Wittenberg, Thomas; Fasching, Peter A

2017-08-30

Tumors in radiologically dense breasts were overlooked on mammograms more often than tumors in low-density breasts. A fast reproducible and automated method of assessing percentage mammographic density (PMD) would be desirable to support decisions whether ultrasonography should be provided for women in addition to mammography in diagnostic mammography units. PMD assessment has still not been included in clinical routine work, as there are issues of interobserver variability and the procedure is quite time consuming. This study investigated whether fully automatically generated texture features of mammograms can replace time-consuming semi-automatic PMD assessment to predict a patient's risk of having an invasive breast tumor that is visible on ultrasound but masked on mammography (mammography failure). This observational study included 1334 women with invasive breast cancer treated at a hospital-based diagnostic mammography unit. Ultrasound was available for the entire cohort as part of routine diagnosis. Computer-based threshold PMD assessments ("observed PMD") were carried out and 363 texture features were obtained from each mammogram. Several variable selection and regression techniques (univariate selection, lasso, boosting, random forest) were applied to predict PMD from the texture features. The predicted PMD values were each used as new predictor for masking in logistic regression models together with clinical predictors. These four logistic regression models with predicted PMD were compared among themselves and with a logistic regression model with observed PMD. The most accurate masking prediction was determined by cross-validation. About 120 of the 363 texture features were selected for predicting PMD. Density predictions with boosting were the best substitute for observed PMD to predict masking. 
Overall, the corresponding logistic regression model performed better (cross-validated AUC, 0.747) than one without mammographic density (0.734), but less well than the one with the observed PMD (0.753). However, in patients with an assigned mammography failure risk >10%, covering about half of all masked tumors, the boosting-based model performed at least as accurately as the original PMD model. Automatically generated texture features can replace semi-automatically determined PMD in a prediction model for mammography failure, such that more than 50% of masked tumors could be discovered.
Element enrichment factor calculation using grain-size distribution and functional data regression.

PubMed

Sierra, C; Ordóñez, C; Saavedra, A; Gallego, J R

2015-01-01

In environmental geochemistry studies it is common practice to normalize element concentrations in order to remove the effect of grain size. Linear regression with respect to a particular grain size or conservative element is a widely used method of normalization. In this paper, the utility of functional linear regression, in which the grain-size curve is the independent variable and the concentration of pollutant the dependent variable, is analyzed and applied to detrital sediment. After implementing functional linear regression and classical linear regression models to normalize and calculate enrichment factors, we concluded that the former regression technique has some advantages over the latter. First, functional linear regression directly considers the grain-size distribution of the samples as the explanatory variable. Second, as the regression coefficients are not constant values but functions depending on the grain size, it is easier to comprehend the relationship between grain size and pollutant concentration. Third, regularization can be introduced into the model in order to establish equilibrium between reliability of the data and smoothness of the solutions. Copyright © 2014 Elsevier Ltd. All rights reserved.
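The model described in this abstract is commonly written as a scalar-on-function linear regression. The formulation below is a generic textbook sketch, not taken from the paper: the symbols (y_i for the pollutant concentration of sample i, x_i(s) for its grain-size curve over the size range S, beta(s) for the coefficient function, lambda for the smoothing parameter) and the second-derivative roughness penalty are assumptions chosen to illustrate the three advantages the abstract lists.

```latex
% Functional linear model: the whole grain-size curve is the explanatory variable.
y_i = \alpha + \int_{S} x_i(s)\,\beta(s)\,ds + \varepsilon_i , \qquad i = 1,\dots,n

% Regularized estimate: the penalty trades fidelity to the data against
% smoothness of the coefficient function.
(\hat{\alpha}, \hat{\beta}) = \arg\min_{\alpha,\,\beta}
  \sum_{i=1}^{n} \Bigl( y_i - \alpha - \int_{S} x_i(s)\,\beta(s)\,ds \Bigr)^{2}
  + \lambda \int_{S} \bigl( \beta''(s) \bigr)^{2}\,ds
```

Setting lambda to zero fits the data as closely as possible, while larger values force a smoother beta(s); this is the balance between reliability of the data and smoothness of the solution that the abstract refers to.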
High-throughput screening and stability optimization of anti-streptavidin IgG1 and IgG2 formulations.

PubMed

Alekseychyk, Larysa; Su, Cheng; Becker, Gerald W; Treuheit, Michael J; Razinkov, Vladimir I

2014-10-01

Selection of a suitable formulation that provides adequate product stability is an important aspect of the development of biopharmaceutical products. Stability of proteins includes not only resistance to chemical modifications but also conformational and colloidal stabilities. While chemical degradation of antibodies is relatively easy to detect and control, propensity for conformational changes and/or aggregation during manufacturing or long-term storage is difficult to predict. In many cases, the formulation factors that increase one type of stability may significantly decrease another type under the same or different conditions. Often compromise is necessary to minimize the adverse effects of an antibody formulation by careful optimization of multiple factors responsible for overall stability. In this study, high-throughput stress and characterization techniques were applied to 96 formulations of anti-streptavidin antibodies (an IgG1 and an IgG2) to choose optimal formulations. Stress and analytical methods applied in this study were 96-well plate based using an automated liquid handling system to prepare the different formulations and sample plates. Aggregation and clipping propensity were evaluated by temperature and mechanical stresses. Multivariate regression analysis of high-throughput data was performed to find statistically significant formulation factors that alter measured parameters such as monomer percentage or unfolding temperature. The results of the regression models were used to maximize the stabilities of antibodies under different formulations and to find the optimal formulation space for each molecule. Comparison of the IgG1 and IgG2 data indicated an overall greater stability of the IgG1 molecule under the conditions studied. The described method can easily be applied to both initial preformulation screening and late-stage formulation development of biopharmaceutical products. © 2014 Society for Laboratory Automation and Screening.
Non-invasive glucose monitoring in patients with Type 1 diabetes: a Multisensor system combining sensors for dielectric and optical characterisation of skin.

PubMed

Caduff, Andreas; Talary, Mark S; Mueller, Martin; Dewarrat, Francois; Klisic, Jelena; Donath, Marc; Heinemann, Lutz; Stahel, Werner A

2009-05-15

In vivo variations of blood glucose (BG) affect the biophysical characteristics (e.g. dielectric and optical) of skin and underlying tissue (SAUT) at various frequencies. However, the skin impedance spectra, for instance, can also be affected by other factors that perturb the glucose-related information, such as temperature, skin moisture and sweat, blood perfusion, and body movements affecting the sensor-skin contact. In order to be able to correct for such perturbing factors, a Multisensor system was developed including sensors to measure the identified factors. To evaluate the quality of glucose monitoring, the Multisensor was applied in 10 patients with Type 1 diabetes. Glucose was administered orally to induce hyperglycaemic excursions at two different study visits. For analysis of the sensor signals, a global multiple linear regression model was derived. The respective coefficients of the variables were determined from the sensor signals of the first study visit (R² = 0.74; mean absolute relative difference, MARD = 18.0%). The identical set of modelling coefficients of the first study visit was re-applied to the test data of the second study visit to evaluate the predictive power of the model (R² = 0.68, MARD = 27.3%). It appears that the Multisensor, together with the global linear regression model applied, allows glucose changes to be tracked non-invasively in patients with diabetes without requiring new model coefficients for each visit. Confirmation of these findings in a larger study group and under less experimentally controlled conditions is required for understanding whether a global parameterisation routine is feasible.
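The mean absolute relative difference (MARD) quoted in this abstract is the average of the absolute differences between estimated and reference glucose values, each divided by the reference value and expressed as a percentage. A minimal sketch follows; the function name `mard` and the glucose values are made up for illustration and are not study data.

```python
def mard(estimated, reference):
    """Mean absolute relative difference between estimates and references, in percent."""
    if len(estimated) != len(reference) or not reference:
        raise ValueError("inputs must be non-empty and of equal length")
    relative_errors = [abs(e - r) / r for e, r in zip(estimated, reference)]
    return 100.0 * sum(relative_errors) / len(relative_errors)

# Made-up glucose readings in mmol/L (illustrative only).
reference_bg = [5.0, 8.0, 10.0, 12.5]
estimated_bg = [5.5, 7.2, 11.0, 12.5]
print(mard(estimated_bg, reference_bg))
```

Because each error is scaled by the reference value, the same absolute error counts for more at low glucose levels than at high ones.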
Security of statistical data bases: invasion of privacy through attribute correlational modeling

DOE Office of Scientific and Technical Information (OSTI.GOV)

Palley, M.A.

This study develops, defines, and applies a statistical technique for the compromise of confidential information in a statistical data base. Attribute Correlational Modeling (ACM) recognizes that the information contained in a statistical data base represents real world statistical phenomena. As such, ACM assumes correlational behavior among the database attributes. ACM proceeds to compromise confidential information through creation of a regression model, where the confidential attribute is treated as the dependent variable. The typical statistical data base may preclude the direct application of regression. In this scenario, the research introduces the notion of a synthetic data base, created through legitimate queries of the actual data base, and through proportional random variation of responses to these queries. The synthetic data base is constructed to resemble the actual data base as closely as possible in a statistical sense. ACM then applies regression analysis to the synthetic data base, and utilizes the derived model to estimate confidential information in the actual database.
Regularization Paths for Conditional Logistic Regression: The clogitL1 Package.

PubMed

Reid, Stephen; Tibshirani, Rob

2014-07-01

We apply the cyclic coordinate descent algorithm of Friedman, Hastie, and Tibshirani (2010) to the fitting of a conditional logistic regression model with lasso (ℓ1) and elastic net penalties. The sequential strong rules of Tibshirani, Bien, Hastie, Friedman, Taylor, Simon, and Tibshirani (2012) are also used in the algorithm and it is shown that these offer a considerable speed up over the standard coordinate descent algorithm with warm starts. Once implemented, the algorithm is used in simulation studies to compare the variable selection and prediction performance of the conditional logistic regression model against that of its unconditional (standard) counterpart. We find that the conditional model performs admirably on datasets drawn from a suitable conditional distribution, outperforming its unconditional counterpart at variable selection. The conditional model is also fit to a small real world dataset, demonstrating how we obtain regularization paths for the parameters of the model and how we apply cross validation for this method where natural unconditional prediction rules are hard to come by.
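To make the idea of cyclic coordinate descent with warm starts concrete, here is a minimal sketch for the simpler squared-error lasso, not the conditional logistic likelihood that clogitL1 handles and not the package's actual code. The function names, the objective scaling (1/2n), and the fixed number of sweeps are all assumptions made for illustration; the strong rules mentioned in the abstract are omitted.

```python
def soft_threshold(z, gamma):
    """Soft-thresholding operator: shrink z toward zero by gamma."""
    if z > gamma:
        return z - gamma
    if z < -gamma:
        return z + gamma
    return 0.0


def lasso_cd(X, y, lam, beta_init=None, n_sweeps=200):
    """Minimise (1/2n)*||y - X b||^2 + lam*||b||_1 by cyclic coordinate descent."""
    n, p = len(X), len(X[0])
    beta = list(beta_init) if beta_init is not None else [0.0] * p
    # Keep the residual y - X beta up to date instead of recomputing it.
    residual = [y[i] - sum(X[i][j] * beta[j] for j in range(p)) for i in range(n)]
    col_sq = [sum(X[i][j] ** 2 for i in range(n)) / n for j in range(p)]
    for _ in range(n_sweeps):
        for j in range(p):
            if col_sq[j] == 0.0:
                continue
            # Correlation of column j with the partial residual (coordinate j removed).
            rho = sum(X[i][j] * residual[i] for i in range(n)) / n + col_sq[j] * beta[j]
            new_beta = soft_threshold(rho, lam) / col_sq[j]
            delta = new_beta - beta[j]
            if delta != 0.0:
                for i in range(n):
                    residual[i] -= X[i][j] * delta
                beta[j] = new_beta
    return beta


def lasso_path(X, y, lambdas):
    """Regularization path from large to small lambda, warm-starting each fit."""
    path, beta = [], None
    for lam in sorted(lambdas, reverse=True):
        beta = lasso_cd(X, y, lam, beta_init=beta)
        path.append((lam, beta))
    return path


# Toy orthogonal design: the second predictor is weak and is zeroed out by the penalty.
X = [[1.0, 0.0], [0.0, 1.0], [-1.0, 0.0], [0.0, -1.0]]
y = [2.0, 0.1, -2.0, -0.1]
for lam, beta in lasso_path(X, y, [2.0, 0.1]):
    print(lam, beta)
```

Warm starts matter because neighbouring values of the penalty usually give similar coefficients, so each fit along the path begins close to its solution.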

Regularization Paths for Conditional Logistic Regression: The clogitL1 Package

PubMed Central

Reid, Stephen; Tibshirani, Rob

2014-01-01

We apply the cyclic coordinate descent algorithm of Friedman, Hastie, and Tibshirani (2010) to the fitting of a conditional logistic regression model with lasso (ℓ1) and elastic net penalties. The sequential strong rules of Tibshirani, Bien, Hastie, Friedman, Taylor, Simon, and Tibshirani (2012) are also used in the algorithm and it is shown that these offer a considerable speed up over the standard coordinate descent algorithm with warm starts. Once implemented, the algorithm is used in simulation studies to compare the variable selection and prediction performance of the conditional logistic regression model against that of its unconditional (standard) counterpart. We find that the conditional model performs admirably on datasets drawn from a suitable conditional distribution, outperforming its unconditional counterpart at variable selection. The conditional model is also fit to a small real world dataset, demonstrating how we obtain regularization paths for the parameters of the model and how we apply cross validation for this method where natural unconditional prediction rules are hard to come by. PMID:26257587
Linear and evolutionary polynomial regression models to forecast coastal dynamics: Comparison and reliability assessment

NASA Astrophysics Data System (ADS)

Bruno, Delia Evelina; Barca, Emanuele; Goncalves, Rodrigo Mikosz; de Araujo Queiroz, Heithor Alexandre; Berardi, Luigi; Passarella, Giuseppe

2018-01-01

In this paper, the Evolutionary Polynomial Regression data modelling strategy has been applied to study small scale, short-term coastal morphodynamics, given its capability for treating a wide database of known information, non-linearly. Simple linear and multilinear regression models were also applied to achieve a balance between the computational load and reliability of estimations of the three models. In fact, even though it is easy to imagine that the more complex the model, the more the prediction improves, sometimes a "slight" worsening of estimations can be accepted in exchange for the time saved in data organization and computational load. The models' outcomes were validated through a detailed statistical error analysis, which revealed a slightly better estimation of the polynomial model with respect to the multilinear model, as expected. On the other hand, even though the data organization was identical for the two models, the multilinear one required a simpler simulation setting and a faster run time. Finally, the most reliable evolutionary polynomial regression model was used in order to make some conjecture about the uncertainty increase with the extension of extrapolation time of the estimation. The overlapping rate between the confidence band of the mean of the known coast position and the prediction band of the estimated position can be a good index of the weakness in producing reliable estimations when the extrapolation time increases too much. The proposed models and tests have been applied to a coastal sector located near Torre Colimena in the Apulia region, southern Italy.
Large Unbalanced Credit Scoring Using Lasso-Logistic Regression Ensemble

PubMed Central

Wang, Hong; Xu, Qingsong; Zhou, Lifeng

2015-01-01

Recently, various ensemble learning methods with different base classifiers have been proposed for credit scoring problems. However, for various reasons, there has been little research using logistic regression as the base classifier. In this paper, given large unbalanced data, we consider the plausibility of ensemble learning using regularized logistic regression as the base classifier to deal with credit scoring problems. In this research, the data is first balanced and diversified by clustering and bagging algorithms. Then we apply a Lasso-logistic regression learning ensemble to evaluate the credit risks. We show that the proposed algorithm outperforms popular credit scoring models such as decision tree, Lasso-logistic regression and random forests in terms of AUC and F-measure. We also provide two importance measures for the proposed model to identify important variables in the data. PMID:25706988
Hybrid Support Vector Regression and Autoregressive Integrated Moving Average Models Improved by Particle Swarm Optimization for Property Crime Rates Forecasting with Economic Indicators

PubMed Central

Alwee, Razana; Hj Shamsuddin, Siti Mariyam; Sallehuddin, Roselina

2013-01-01

Crime forecasting is an important area in the field of criminology. Linear models, such as regression and econometric models, are commonly applied in crime forecasting. However, real crime data commonly consist of both linear and nonlinear components. A single model may not be sufficient to identify all the characteristics of the data. The purpose of this study is to introduce a hybrid model that combines support vector regression (SVR) and autoregressive integrated moving average (ARIMA) to be applied in crime rates forecasting. SVR is very robust with small training data and high-dimensional problems. Meanwhile, ARIMA has the ability to model several types of time series. However, the accuracy of the SVR model depends on the values of its parameters, while ARIMA is not robust when applied to small data sets. Therefore, to overcome this problem, particle swarm optimization is used to estimate the parameters of the SVR and ARIMA models. The proposed hybrid model is used to forecast the property crime rates of the United States based on economic indicators. The experimental results show that the proposed hybrid model is able to produce more accurate forecasting results as compared to the individual models. PMID:23766729
Evaluation of energy consumption during aerobic sewage sludge treatment in dairy wastewater treatment plant.

PubMed

Dąbrowski, Wojciech; Żyłka, Radosław; Malinowski, Paweł

2017-02-01

The subject of the research conducted in an operating dairy wastewater treatment plant (WWTP) was to examine electric energy consumption during sewage sludge treatment. The excess sewage sludge was aerobically stabilized and dewatered with a screw press. Organic matter varied from 48% to 56% in sludge after stabilization and dewatering. This indicates that the sludge was properly stabilized and could be applied as a fertilizer. Measurement factors for electric energy consumption for mechanically dewatered sewage sludge were determined, which ranged between 0.94 and 1.5 kWh m⁻³ with an average value of 1.17 kWh m⁻³. The shares of devices used for sludge dewatering and aerobic stabilization in the total energy consumption of the plant were also established, which were 3% and 25% respectively. A model of energy consumption during sewage sludge treatment was estimated according to experimental data. Two models were applied: linear regression for dewatering process and segmented linear regression for aerobic stabilization. The segmented linear regression model was also applied to total energy consumption during sewage sludge treatment in the examined dairy WWTP. The research constitutes an introduction for further studies on defining a mathematical model used to optimize electric energy consumption by dairy WWTPs. Copyright © 2016 Elsevier Inc. All rights reserved.
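One simple way to picture segmented linear regression is sketched below: fit an ordinary least-squares line on each side of a candidate split and keep the split with the smallest total squared error. This is an assumption-laden illustration, not the authors' procedure; the abstract does not say how breakpoints were chosen or whether segments were forced to join, and here the two segments are fitted independently. The data are synthetic.

```python
def fit_line(xs, ys):
    """Ordinary least-squares line; returns (intercept, slope, sum of squared errors)."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    if sxx == 0:
        raise ValueError("a segment needs at least two distinct x values")
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    sse = sum((y - (intercept + slope * x)) ** 2 for x, y in zip(xs, ys))
    return intercept, slope, sse


def fit_segmented(xs, ys, min_points=3):
    """Two-segment regression: try every admissible split, keep the lowest total SSE."""
    pairs = sorted(zip(xs, ys))
    if len(pairs) < 2 * min_points:
        raise ValueError("not enough points for two segments")
    sx = [p[0] for p in pairs]
    sy = [p[1] for p in pairs]
    best = None
    for k in range(min_points, len(pairs) - min_points + 1):
        left = fit_line(sx[:k], sy[:k])
        right = fit_line(sx[k:], sy[k:])
        total = left[2] + right[2]
        if best is None or total < best["sse"]:
            best = {
                "split_x": sx[k],      # first x value of the right-hand segment
                "left": left[:2],      # (intercept, slope) below the split
                "right": right[:2],    # (intercept, slope) from the split onward
                "sse": total,
            }
    return best


# Synthetic load-versus-energy style data with a change in slope at x = 5.
x_values = list(range(10))
y_values = [2 + x if x < 5 else 3 * x - 5 for x in x_values]
print(fit_segmented(x_values, y_values))
```

A continuity-constrained variant would instead fit a single model with a hinge term, at the cost of solving a three-parameter least-squares problem for each candidate breakpoint.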
Influenza detection and prediction algorithms: comparative accuracy trial in Östergötland county, Sweden, 2008-2012.

PubMed

Spreco, A; Eriksson, O; Dahlström, Ö; Timpka, T

2017-07-01

Methods for the detection of influenza epidemics and prediction of their progress have seldom been comparatively evaluated using prospective designs. This study aimed to perform a prospective comparative trial of algorithms for the detection and prediction of increased local influenza activity. Data on clinical influenza diagnoses recorded by physicians and syndromic data from a telenursing service were used. Five detection and three prediction algorithms previously evaluated in public health settings were calibrated and then evaluated over 3 years. When applied to diagnostic data, only detection using the Serfling regression method and prediction using the non-adaptive log-linear regression method showed acceptable performances during winter influenza seasons. For the syndromic data, none of the detection algorithms displayed a satisfactory performance, while non-adaptive log-linear regression was the best performing prediction method. We conclude that available algorithms for influenza detection and prediction display satisfactory performance when applied to local diagnostic data during winter influenza seasons. When applied to local syndromic data, the evaluated algorithms did not display consistent performance. Further evaluations and research on combination of methods of these types in public health information infrastructures for 'nowcasting' (integrated detection and prediction) of influenza activity are warranted.
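Serfling-type regression is usually described as a cyclic baseline (a linear trend plus annual sine and cosine terms) fitted to non-epidemic data, with an alert threshold set some multiple of the residual standard deviation above that baseline. The sketch below follows that general description under stated assumptions: weekly data with a 52-week period, a threshold multiplier of 1.64, and a caller who supplies only baseline (non-epidemic) weeks. It is not the calibration used in the study, and the series is synthetic.

```python
import math


def solve_linear_system(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [row[:] + [b[i]] for i, row in enumerate(A)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            factor = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= factor * M[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (M[r][n] - sum(M[r][c] * x[c] for c in range(r + 1, n))) / M[r][r]
    return x


def design_row(t, period):
    """Regressors for week t: intercept, trend, and one annual harmonic."""
    angle = 2 * math.pi * t / period
    return [1.0, float(t), math.sin(angle), math.cos(angle)]


def fit_serfling(counts, period=52):
    """Least-squares fit of the cyclic baseline; returns (coefficients, residual SD)."""
    rows = [design_row(t, period) for t in range(len(counts))]
    p = 4
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(p)] for i in range(p)]
    xty = [sum(r[i] * y for r, y in zip(rows, counts)) for i in range(p)]
    coef = solve_linear_system(xtx, xty)
    fitted = [sum(c * v for c, v in zip(coef, r)) for r in rows]
    sse = sum((y - f) ** 2 for y, f in zip(counts, fitted))
    return coef, math.sqrt(sse / (len(counts) - p))


def epidemic_threshold(coef, resid_sd, t, period=52, z=1.64):
    """Baseline at week t plus z residual standard deviations."""
    baseline = sum(c * v for c, v in zip(coef, design_row(t, period)))
    return baseline + z * resid_sd


# Two years of synthetic weekly baseline counts (no epidemic weeks).
baseline_counts = [
    10 + 0.05 * t + 3 * math.sin(2 * math.pi * t / 52) + 2 * math.cos(2 * math.pi * t / 52)
    for t in range(104)
]
coefficients, sd = fit_serfling(baseline_counts)
print([round(c, 3) for c in coefficients])
```

A week would be flagged when its observed count exceeds `epidemic_threshold(coefficients, sd, t)`; in practice the multiplier and the choice of baseline weeks strongly affect sensitivity.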
Hybrid support vector regression and autoregressive integrated moving average models improved by particle swarm optimization for property crime rates forecasting with economic indicators.

PubMed

Alwee, Razana; Shamsuddin, Siti Mariyam Hj; Sallehuddin, Roselina

2013-01-01

Crime forecasting is an important area in the field of criminology. Linear models, such as regression and econometric models, are commonly applied in crime forecasting. However, real crime data commonly consist of both linear and nonlinear components. A single model may not be sufficient to identify all the characteristics of the data. The purpose of this study is to introduce a hybrid model that combines support vector regression (SVR) and autoregressive integrated moving average (ARIMA) to be applied in crime rates forecasting. SVR is very robust with small training data and high-dimensional problems. Meanwhile, ARIMA has the ability to model several types of time series. However, the accuracy of the SVR model depends on the values of its parameters, while ARIMA is not robust when applied to small data sets. Therefore, to overcome this problem, particle swarm optimization is used to estimate the parameters of the SVR and ARIMA models. The proposed hybrid model is used to forecast the property crime rates of the United States based on economic indicators. The experimental results show that the proposed hybrid model is able to produce more accurate forecasting results as compared to the individual models.
Application of nonlinear-regression methods to a ground-water flow model of the Albuquerque Basin, New Mexico

USGS Publications Warehouse

Tiedeman, C.R.; Kernodle, J.M.; McAda, D.P.

1998-01-01

This report documents the application of nonlinear-regression methods to a numerical model of ground-water flow in the Albuquerque Basin, New Mexico. In the Albuquerque Basin, ground water is the primary source for most water uses. Ground-water withdrawal has steadily increased since the 1940s, resulting in large declines in water levels in the Albuquerque area. A ground-water flow model was developed in 1994 and revised and updated in 1995 for the purpose of managing basin ground-water resources. In the work presented here, nonlinear-regression methods were applied to a modified version of the previous flow model. Goals of this work were to use regression methods to calibrate the model with each of six different configurations of the basin subsurface and to assess and compare optimal parameter estimates, model fit, and model error among the resulting calibrations. The Albuquerque Basin is one in a series of north trending structural basins within the Rio Grande Rift, a region of Cenozoic crustal extension. Mountains, uplifts, and fault zones bound the basin, and rock units within the basin include pre-Santa Fe Group deposits, Tertiary Santa Fe Group basin fill, and post-Santa Fe Group volcanics and sediments. The Santa Fe Group is greater than 14,000 feet (ft) thick in the central part of the basin. During deposition of the Santa Fe Group, crustal extension resulted in development of north trending normal faults with vertical displacements of as much as 30,000 ft. Ground-water flow in the Albuquerque Basin occurs primarily in the Santa Fe Group and post-Santa Fe Group deposits. Water flows between the ground-water system and surface-water bodies in the inner valley of the basin, where the Rio Grande, a network of interconnected canals and drains, and Cochiti Reservoir are located.
Recharge to the ground-water flow system occurs as infiltration of precipitation along mountain fronts and infiltration of stream water along tributaries to the Rio Grande; subsurface flow from adjacent regions; irrigation and septic field seepage; and leakage through the Rio Grande, canal, and Cochiti Reservoir beds. Ground water is discharged from the basin by withdrawal; evapotranspiration; subsurface flow; and flow to the Rio Grande, canals, and drains. The transient, three-dimensional numerical model of ground-water flow to which nonlinear-regression methods were applied simulates flow in the Albuquerque Basin from 1900 to March 1995. Six different basin subsurface configurations are considered in the model. These configurations are designed to test the effects of (1) varying the simulated basin thickness, (2) including a hypothesized hydrogeologic unit with large hydraulic conductivity in the western part of the basin (the west basin high-K zone), and (3) substantially lowering the simulated hydraulic conductivity of a fault in the western part of the basin (the low-K fault zone). The model with each of the subsurface configurations was calibrated using a nonlinear least-squares regression technique. The calibration data set includes 802 hydraulic-head measurements that provide broad spatial and temporal coverage of basin conditions, and one measurement of net flow from the Rio Grande and drains to the ground-water system in the Albuquerque area. Data are weighted on the basis of estimates of the standard deviations of measurement errors. The 10 to 12 parameters to which the calibration data as a whole are generally most sensitive were estimated by nonlinear regression, whereas the remaining model parameter values were specified.
Results of model calibration indicate that the optimal parameter estimates as a whole are most reasonable in calibrations of the model with configurations 3 (which contains 1,600-ft-thick basin deposits and the west basin high-K zone), 4 (which contains 5,000-ft-thick basin de
Regression modeling and mapping of coniferous forest basal area and tree density from discrete-return lidar and multispectral data

Treesearch

Andrew T. Hudak; Nicholas L. Crookston; Jeffrey S. Evans; Michael K. Falkowski; Alistair M. S. Smith; Paul E. Gessler; Penelope Morgan

2006-01-01

We compared the utility of discrete-return light detection and ranging (lidar) data and multispectral satellite imagery, and their integration, for modeling and mapping basal area and tree density across two diverse coniferous forest landscapes in north-central Idaho. We applied multiple linear regression models subset from a suite of 26 predictor variables derived...
Plan View Pattern Control for Steel Plates through Constrained Locally Weighted Regression

NASA Astrophysics Data System (ADS)

Shigemori, Hiroyasu; Nambu, Koji; Nagao, Ryo; Araki, Tadashi; Mizushima, Narihito; Kano, Manabu; Hasebe, Shinji

A technique for performing parameter identification in a locally weighted regression model using foresight information on the physical properties of the object of interest as constraints was proposed. This method was applied to plan view pattern control of steel plates, and a reduction of shape nonconformity (crop) at the plate head end was confirmed by computer simulation based on real operation data.
Challenges Associated with Estimating Utility in Wet Age-Related Macular Degeneration: A Novel Regression Analysis to Capture the Bilateral Nature of the Disease.

PubMed

Hodgson, Robert; Reason, Timothy; Trueman, David; Wickstead, Rose; Kusel, Jeanette; Jasilek, Adam; Claxton, Lindsay; Taylor, Matthew; Pulikottil-Jacob, Ruth

2017-10-01

The estimation of utility values for the economic evaluation of therapies for wet age-related macular degeneration (AMD) is a particular challenge. Previous economic models in wet AMD have been criticized for failing to capture the bilateral nature of wet AMD by modelling visual acuity (VA) and utility values associated with the better-seeing eye only. Here we present a de novo regression analysis using generalized estimating equations (GEE) applied to a previous dataset of time trade-off (TTO)-derived utility values from a sample of the UK population that wore contact lenses to simulate visual deterioration in wet AMD. This analysis allows utility values to be estimated as a function of VA in both the better-seeing eye (BSE) and worse-seeing eye (WSE). VAs in both the BSE and WSE were found to be statistically significant (p < 0.05) when regressed separately. When included without an interaction term, only the coefficient for VA in the BSE was significant (p = 0.04), but when an interaction term between VA in the BSE and WSE was included, only the constant term (mean TTO utility value) was significant, potentially a result of the collinearity between the VA of the two eyes. The lack of both formal model fit statistics from the GEE approach and theoretical knowledge to support the superiority of one model over another make it difficult to select the best model. Limitations of this analysis arise from the potential influence of collinearity between the VA of both eyes, and the use of contact lenses to reflect VA states to obtain the original dataset. Whilst further research is required to elicit more accurate utility values for wet AMD, this novel regression analysis provides a possible source of utility values to allow future economic models to capture the quality of life impact of changes in VA in both eyes. Novartis Pharmaceuticals UK Limited.
Spatially Explicit Estimates of Suspended Sediment and Bedload Transport Rates for Western Oregon and Northwestern California

NASA Astrophysics Data System (ADS)

O'Connor, J. E.; Wise, D. R.; Mangano, J.; Jones, K.

2015-12-01

Empirical analyses of suspended sediment and bedload transport give estimates of sediment flux for western Oregon and northwestern California. The estimates of both bedload and suspended load are from regression models relating measured annual sediment yield to geologic, physiographic, and climatic properties of contributing basins. The best models include generalized geology and either slope or precipitation. The best-fit suspended-sediment model is based on basin geology, precipitation, and area of recent wildfire. It explains 65% of the variance for 68 suspended sediment measurement sites within the model area. Predicted suspended sediment yields range from no yield from the High Cascades geologic province to 200 tonnes/km²-yr in the northern Oregon Coast Range and 1000 tonnes/km²-yr in recently burned areas of the northern Klamath terrane. Bed-material yield is similarly estimated from a regression model based on 22 sites of measured bed-material transport, mostly from reservoir accumulation analyses but also from several bedload measurement programs. The resulting best-fit regression is based on basin slope and the presence/absence of the Klamath geologic terrane. For the Klamath terrane, bed-material yield is twice that of the other geologic provinces. This model explains more than 80% of the variance of the better-quality measurements. Predicted bed-material yields range up to 350 tonnes/km²-yr in steep areas of the Klamath terrane. Applying these regressions to small individual watersheds (mean size: 66 km² for bed-material; 3 km² for suspended sediment) and cumulating totals down the hydrologic network (but also decreasing the bed-material flux by experimentally determined attrition rates) gives spatially explicit estimates of both bed-material and suspended sediment flux. This enables assessment of several management issues, including the effects of dams on bedload transport, instream gravel mining, habitat formation processes, and water quality.
The combined fluxes can also be compared to long-term rock uplift and cosmogenically determined landscape erosion rates.
The extraction of simple relationships in growth factor-specific multiple-input and multiple-output systems in cell-fate decisions by backward elimination PLS regression.

PubMed

Akimoto, Yuki; Yugi, Katsuyuki; Uda, Shinsuke; Kudo, Takamasa; Komori, Yasunori; Kubota, Hiroyuki; Kuroda, Shinya

2013-01-01

Cells use common signaling molecules for the selective control of downstream gene expression and cell-fate decisions. The relationship between signaling molecules and downstream gene expression and cellular phenotypes is a multiple-input and multiple-output (MIMO) system and is difficult to understand due to its complexity. For example, it has been reported that, in PC12 cells, different types of growth factors activate MAP kinases (MAPKs) including ERK, JNK, and p38, and CREB, for selective protein expression of immediate early genes (IEGs) such as c-FOS, c-JUN, EGR1, JUNB, and FOSB, leading to cell differentiation, proliferation and cell death; however, how multiple-inputs such as MAPKs and CREB regulate multiple-outputs such as expression of the IEGs and cellular phenotypes remains unclear. To address this issue, we employed a statistical method called partial least squares (PLS) regression, which involves a reduction of the dimensionality of the inputs and outputs into latent variables and a linear regression between these latent variables. We measured 1,200 data points for MAPKs and CREB as the inputs and 1,900 data points for IEGs and cellular phenotypes as the outputs, and we constructed the PLS model from these data. The PLS model highlighted the complexity of the MIMO system and growth factor-specific input-output relationships of cell-fate decisions in PC12 cells. Furthermore, to reduce the complexity, we applied a backward elimination method to the PLS regression, in which 60 input variables were reduced to 5 variables, including the phosphorylation of ERK at 10 min, CREB at 5 min and 60 min, AKT at 5 min and JNK at 30 min. The simple PLS model with only 5 input variables demonstrated a predictive ability comparable to that of the full PLS model. The 5 input variables effectively extracted the growth factor-specific simple relationships within the MIMO system in cell-fate decisions in PC12 cells.
Hidden Connections between Regression Models of Strain-Gage Balance Calibration Data

NASA Technical Reports Server (NTRS)

Ulbrich, Norbert

2013-01-01

Hidden connections between regression models of wind tunnel strain-gage balance calibration data are investigated. These connections become visible whenever balance calibration data is supplied in its design format and both the Iterative and Non-Iterative Method are used to process the data. First, it is shown how the regression coefficients of the fitted balance loads of a force balance can be approximated by using the corresponding regression coefficients of the fitted strain-gage outputs. Then, data from the manual calibration of the Ames MK40 six-component force balance is chosen to illustrate how estimates of the regression coefficients of the fitted balance loads can be obtained from the regression coefficients of the fitted strain-gage outputs. The study illustrates that load predictions obtained by applying the Iterative or the Non-Iterative Method originate from two related regression solutions of the balance calibration data as long as balance loads are given in the design format of the balance, gage outputs behave highly linear, strict statistical quality metrics are used to assess regression models of the data, and regression model term combinations of the fitted loads and gage outputs can be obtained by a simple variable exchange.
Can Emotional and Behavioral Dysregulation in Youth Be Decoded from Functional Neuroimaging?

PubMed

Portugal, Liana C L; Rosa, Maria João; Rao, Anil; Bebko, Genna; Bertocci, Michele A; Hinze, Amanda K; Bonar, Lisa; Almeida, Jorge R C; Perlman, Susan B; Versace, Amelia; Schirda, Claudiu; Travis, Michael; Gill, Mary Kay; Demeter, Christine; Diwadkar, Vaibhav A; Ciuffetelli, Gary; Rodriguez, Eric; Forbes, Erika E; Sunshine, Jeffrey L; Holland, Scott K; Kowatch, Robert A; Birmaher, Boris; Axelson, David; Horwitz, Sarah M; Arnold, Eugene L; Fristad, Mary A; Youngstrom, Eric A; Findling, Robert L; Pereira, Mirtes; Oliveira, Leticia; Phillips, Mary L; Mourao-Miranda, Janaina

2016-01-01

High comorbidity among pediatric disorders characterized by behavioral and emotional dysregulation poses problems for diagnosis and treatment, and suggests that these disorders may be better conceptualized as dimensions of abnormal behaviors. Furthermore, identifying neuroimaging biomarkers related to dimensional measures of behavior may provide targets to guide individualized treatment. We aimed to use functional neuroimaging and pattern regression techniques to determine whether patterns of brain activity could accurately decode individual-level severity on a dimensional scale measuring behavioural and emotional dysregulation at two different time points. A sample of fifty-seven youth (mean age: 14.5 years; 32 males) was selected from a multi-site study of youth with parent-reported behavioral and emotional dysregulation. Participants performed a block-design reward paradigm during functional Magnetic Resonance Imaging (fMRI). Pattern regression analyses consisted of Relevance Vector Regression (RVR) and two cross-validation strategies implemented in the Pattern Recognition for Neuroimaging toolbox (PRoNTo). Medication was treated as a binary confounding variable. Decoded and actual clinical scores were compared using Pearson's correlation coefficient (r) and mean squared error (MSE) to evaluate the models. Permutation test was applied to estimate significance levels. Relevance Vector Regression identified patterns of neural activity associated with symptoms of behavioral and emotional dysregulation at the initial study screen and close to the fMRI scanning session. The correlation and the mean squared error between actual and decoded symptoms were significant at the initial study screen and close to the fMRI scanning session. However, after controlling for potential medication effects, results remained significant only for decoding symptoms at the initial study screen. 
Neural regions with the highest contribution to the pattern regression model included cerebellum, sensory-motor and fronto-limbic areas. The combination of pattern regression models and neuroimaging can help to determine the severity of behavioral and emotional dysregulation in youth at different time points.
A computational approach to compare regression modelling strategies in prediction research.

PubMed

Pajouheshnia, Romin; Pestman, Wiebe R; Teerenstra, Steven; Groenwold, Rolf H H

2016-08-25

It is often unclear which approach to fit, assess and adjust a model will yield the most accurate prediction model. We present an extension of an approach for comparing modelling strategies in linear regression to the setting of logistic regression and demonstrate its application in clinical prediction research. A framework for comparing logistic regression modelling strategies by their likelihoods was formulated using a wrapper approach. Five different strategies for modelling, including simple shrinkage methods, were compared in four empirical data sets to illustrate the concept of a priori strategy comparison. Simulations were performed in both randomly generated data and empirical data to investigate the influence of data characteristics on strategy performance. We applied the comparison framework in a case study setting. Optimal strategies were selected based on the results of a priori comparisons in a clinical data set and the performance of models built according to each strategy was assessed using the Brier score and calibration plots. The performance of modelling strategies was highly dependent on the characteristics of the development data in both linear and logistic regression settings. A priori comparisons in four empirical data sets found that no strategy consistently outperformed the others. The percentage of times that a model adjustment strategy outperformed a logistic model ranged from 3.9 to 94.9 %, depending on the strategy and data set. However, in our case study setting the a priori selection of optimal methods did not result in detectable improvement in model performance when assessed in an external data set. The performance of prediction modelling strategies is a data-dependent process and can be highly variable between data sets within the same clinical domain. A priori strategy comparison can be used to determine an optimal logistic regression modelling strategy for a given data set before selecting a final modelling approach.
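The abstract assesses models with the Brier score and calibration plots. As a minimal sketch of both ideas, the code below computes the Brier score as the mean squared difference between predicted probability and the 0/1 outcome, and tabulates the numbers a calibration plot would display. The equal-width binning and the example probabilities are assumptions for illustration only, not the authors' settings or data.

```python
def brier_score(probabilities, outcomes):
    """Mean squared difference between predicted probability and observed 0/1 outcome."""
    if len(probabilities) != len(outcomes) or not outcomes:
        raise ValueError("inputs must be non-empty and of equal length")
    return sum((p - o) ** 2 for p, o in zip(probabilities, outcomes)) / len(outcomes)


def calibration_table(probabilities, outcomes, n_bins=10):
    """Per-bin (mean predicted probability, observed event rate, count); empty bins skipped."""
    bins = [[] for _ in range(n_bins)]
    for p, o in zip(probabilities, outcomes):
        index = min(int(p * n_bins), n_bins - 1)  # p == 1.0 goes in the last bin
        bins[index].append((p, o))
    table = []
    for members in bins:
        if members:
            mean_pred = sum(p for p, _ in members) / len(members)
            observed = sum(o for _, o in members) / len(members)
            table.append((mean_pred, observed, len(members)))
    return table


# Hypothetical predicted risks and observed outcomes for four patients.
predicted_risk = [0.9, 0.2, 0.6, 0.1]
observed_event = [1, 0, 1, 0]
print(brier_score(predicted_risk, observed_event))
print(calibration_table(predicted_risk, observed_event, n_bins=2))
```

Lower Brier scores are better; a model that always predicts 0.5 scores 0.25 regardless of the outcomes, which gives a rough reference point when comparing strategies.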
Updated generalized biomass equations for North American tree species

Treesearch

David C. Chojnacky; Linda S. Heath; Jennifer C. Jenkins

2014-01-01

Historically, tree biomass at large scales has been estimated by applying dimensional analysis techniques and field measurements such as diameter at breast height (dbh) in allometric regression equations. Equations often have been developed using differing methods and applied only to certain species or isolated areas. We previously had compiled and combined (in meta-...
Regression techniques for oceanographic parameter retrieval using space-borne microwave radiometry

NASA Technical Reports Server (NTRS)

Hofer, R.; Njoku, E. G.

1981-01-01

Variations of conventional multiple regression techniques are applied to the problem of remote sensing of oceanographic parameters from space. The techniques are specifically adapted to the scanning multichannel microwave radiometer (SMMR) launched on the Seasat and Nimbus 7 satellites to determine ocean surface temperature, wind speed, and atmospheric water content. The retrievals are studied primarily from a theoretical viewpoint, to illustrate the retrieval error structure, the relative importance of different radiometer channels, and the tradeoffs between spatial resolution and retrieval accuracy. Comparisons between regressions using simulated and actual SMMR data are discussed; they show similar behavior.
Correlation and simple linear regression.

PubMed

Zou, Kelly H; Tuncali, Kemal; Silverman, Stuart G

2003-06-01

In this tutorial article, the concepts of correlation and regression are reviewed and demonstrated. The authors review and compare two correlation coefficients, the Pearson correlation coefficient and the Spearman rho, for measuring linear and nonlinear relationships between two continuous variables. In the case of measuring the linear relationship between a predictor and an outcome variable, simple linear regression analysis is conducted. These statistical concepts are illustrated by using a data set from published literature to assess a computed tomography-guided interventional technique. These statistical methods are important for exploring the relationships between variables and can be applied to many radiologic studies.
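To make the contrast between the two coefficients concrete, here is a minimal sketch in plain Python. The data are invented (a monotone but nonlinear relationship), the helper names are hypothetical, and the rank function assumes no tied values; none of this comes from the article's data set.

```python
from statistics import mean


def pearson(x, y):
    """Pearson correlation: strength of the linear relationship."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5


def ranks(values):
    """Ranks starting at 1 (assumes no ties, for simplicity)."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0] * len(values)
    for rank, index in enumerate(order, start=1):
        result[index] = rank
    return result


def spearman(x, y):
    """Spearman rho: Pearson correlation of the ranks."""
    return pearson(ranks(x), ranks(y))


def simple_linear_regression(x, y):
    """Least-squares intercept and slope for one predictor."""
    mx, my = mean(x), mean(y)
    slope = sum((a - mx) * (b - my) for a, b in zip(x, y)) / sum(
        (a - mx) ** 2 for a in x
    )
    return my - slope * mx, slope


# Hypothetical data: y = x**3 is perfectly monotone but not linear.
x = [1, 2, 3, 4, 5]
y = [1, 8, 27, 64, 125]

r = pearson(x, y)
rho = spearman(x, y)
intercept, slope = simple_linear_regression(x, y)
print(f"Pearson r = {r:.3f}, Spearman rho = {rho:.3f}")
print(f"fitted line: y = {intercept:.1f} + {slope:.1f} x")
```

Because the relationship is monotone, the rank-based coefficient reaches 1 while Pearson's r stays below 1, reflecting the departure from a straight line.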
Forecasting drug utilization and expenditure in a metropolitan health region.

PubMed

Wettermark, Björn; Persson, Marie E; Wilking, Nils; Kalin, Mats; Korkmaz, Seher; Hjemdahl, Paul; Godman, Brian; Petzold, Max; Gustafsson, Lars L

2010-05-17

New pharmacological therapies are challenging the healthcare systems, and there is an increasing need to assess their therapeutic value in relation to existing alternatives as well as their potential budget impact. Consequently, new models to introduce drugs in healthcare are urgently needed. In the metropolitan health region of Stockholm, Sweden, a model has been developed including early warning (horizon scanning), forecasting of drug utilization and expenditure, critical drug evaluation as well as structured programs for the introduction and follow-up of new drugs. The aim of this paper is to present the forecasting model and the predicted growth in all therapeutic areas in 2010 and 2011. Linear regression analysis was applied to aggregate sales data on hospital sales and dispensed drugs in ambulatory care, including both reimbursed expenditure and patient co-payment. The linear regression was applied on each pharmacological group based on four observations 2006-2009, and the crude predictions estimated for the coming two years 2010-2011. The crude predictions were then adjusted for factors likely to increase or decrease future utilization and expenditure, such as patent expiries, new drugs to be launched or new guidelines from national bodies or the regional Drug and Therapeutics Committee. The assessment included a close collaboration with clinical, clinical pharmacological and pharmaceutical experts from the regional Drug and Therapeutics Committee. The annual increase in total expenditure for prescription and hospital drugs was predicted to be 2.0% in 2010 and 4.0% in 2011. Expenditures will increase in most therapeutic areas, but most predominantly for antineoplastic and immune modulating agents as well as drugs for the nervous system, infectious diseases, and blood and blood-forming organs. 
The utilisation and expenditure of drugs is difficult to forecast due to uncertainties about the rate of adoption of new medicines and various ongoing healthcare reforms and activities to improve the quality and efficiency of prescribing. Nevertheless, we believe our model will be valuable as an early warning system to start developing guidance for new drugs including systems to monitor their effectiveness, safety and cost-effectiveness in clinical practice.
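The forecasting step described above (a linear trend fitted to four annual observations, extrapolated two years ahead, then adjusted) can be sketched as follows. The expenditure figures and the adjustment values are hypothetical placeholders, and the additive form of the adjustment is an assumption made for illustration; the abstract does not specify how the adjustments were applied.

```python
from statistics import mean


def fit_trend(years, values):
    """Least-squares slope plus the means needed to predict from centred years."""
    mean_year, mean_value = mean(years), mean(values)
    slope = sum(
        (t - mean_year) * (v - mean_value) for t, v in zip(years, values)
    ) / sum((t - mean_year) ** 2 for t in years)
    return mean_year, mean_value, slope


def predict(model, year):
    """Crude prediction on the fitted line."""
    mean_year, mean_value, slope = model
    return mean_value + slope * (year - mean_year)


# Hypothetical expenditure for one pharmacological group (arbitrary units).
years = [2006, 2007, 2008, 2009]
expenditure = [100.0, 104.0, 109.0, 113.0]

model = fit_trend(years, expenditure)
crude = {year: predict(model, year) for year in (2010, 2011)}

# Hypothetical expert adjustments, e.g. a patent expiry lowering 2010 spending.
adjustments = {2010: -3.0, 2011: -5.0}
adjusted = {year: crude[year] + adjustments[year] for year in crude}

for year in crude:
    print(year, round(crude[year], 1), round(adjusted[year], 1))
```

With only four observations per group the fitted slope is sensitive to any single year, which is one reason the expert adjustment step matters.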

Forecasting drug utilization and expenditure in a metropolitan health region

PubMed Central

2010-01-01

Background New pharmacological therapies are challenging the healthcare systems, and there is an increasing need to assess their therapeutic value in relation to existing alternatives as well as their potential budget impact. Consequently, new models to introduce drugs in healthcare are urgently needed. In the metropolitan health region of Stockholm, Sweden, a model has been developed including early warning (horizon scanning), forecasting of drug utilization and expenditure, critical drug evaluation as well as structured programs for the introduction and follow-up of new drugs. The aim of this paper is to present the forecasting model and the predicted growth in all therapeutic areas in 2010 and 2011. Methods Linear regression analysis was applied to aggregate sales data on hospital sales and dispensed drugs in ambulatory care, including both reimbursed expenditure and patient co-payment. The linear regression was applied on each pharmacological group based on four observations 2006-2009, and the crude predictions estimated for the coming two years 2010-2011. The crude predictions were then adjusted for factors likely to increase or decrease future utilization and expenditure, such as patent expiries, new drugs to be launched or new guidelines from national bodies or the regional Drug and Therapeutics Committee. The assessment included a close collaboration with clinical, clinical pharmacological and pharmaceutical experts from the regional Drug and Therapeutics Committee. Results The annual increase in total expenditure for prescription and hospital drugs was predicted to be 2.0% in 2010 and 4.0% in 2011. Expenditures will increase in most therapeutic areas, but most predominantly for antineoplastic and immune modulating agents as well as drugs for the nervous system, infectious diseases, and blood and blood-forming organs. 
Conclusions The utilisation and expenditure of drugs is difficult to forecast due to uncertainties about the rate of adoption of new medicines and various ongoing healthcare reforms and activities to improve the quality and efficiency of prescribing. Nevertheless, we believe our model will be valuable as an early warning system to start developing guidance for new drugs including systems to monitor their effectiveness, safety and cost-effectiveness in clinical practice. PMID:20478043
Subject-specific body segment parameter estimation using 3D photogrammetry with multiple cameras

PubMed Central

Peyer, Kathrin E.; Morris, Mark; Sellers, William I.

2015-01-01

Inertial properties of body segments, such as mass, centre of mass or moments of inertia, are important parameters when studying movements of the human body. However, these quantities are not directly measurable. Current approaches include using regression models which have limited accuracy, geometric models with lengthy measuring procedures or acquiring and post-processing MRI scans of participants. We propose a geometric methodology based on 3D photogrammetry using multiple cameras to provide subject-specific body segment parameters while minimizing the interaction time with the participants. A low-cost body scanner was built using multiple cameras and 3D point cloud data generated using structure from motion photogrammetric reconstruction algorithms. The point cloud was manually separated into body segments, and convex hulling applied to each segment to produce the required geometric outlines. The accuracy of the method can be adjusted by choosing the number of subdivisions of the body segments. The body segment parameters of six participants (four male and two female) are presented using the proposed method. The multi-camera photogrammetric approach is expected to be particularly suited for studies including populations for which regression models are not available in literature and where other geometric techniques or MRI scanning are not applicable due to time or ethical constraints. PMID:25780778
Exploring the factors affecting motorway accident severity in England using the generalised ordered logistic regression model.

PubMed

Michalaki, Paraskevi; Quddus, Mohammed A; Pitfield, David; Huetson, Andrew

2015-12-01

The severity of motorway accidents that occurred on the hard shoulder (HS) is higher than for the main carriageway (MC). This paper compares and contrasts the most important factors affecting the severity of HS and MC accidents on motorways in England. Using police reported accident data, the accidents that occurred on motorways in England are grouped into two categories (i.e., HS and MC) according to the location. A generalized ordered logistic regression model is then applied to identify the factors affecting the severity of HS and MC accidents on motorways. The factors examined include accident and vehicle characteristics, traffic and environment conditions, as well as other behavioral factors. Results suggest that the factors positively affecting the severity include: number of vehicles involved in the accident, peak-hour traffic time, and low visibility. Differences between HS and MC accidents are identified, with the most important being the involvement of heavy goods vehicles (HGVs) and driver fatigue, which are found to be more crucial in increasing the severity of HS accidents. Measures to increase awareness of HGV drivers regarding the risk of fatigue when driving on motorways, and especially the nearside lane, should be taken by the stakeholders. Copyright © 2015 The Authors. Published by Elsevier Ltd. All rights reserved.
Online Statistical Modeling (Regression Analysis) for Independent Responses

NASA Astrophysics Data System (ADS)

Made Tirta, I.; Anggraeni, Dian; Pandutama, Martinus

2017-06-01

Regression analysis (statistical modelling) is among the statistical methods most frequently needed in analyzing quantitative data, especially to model the relationship between response and explanatory variables. Statistical models have now been developed in various directions to model various types of data and complex relationships. A rich variety of advanced and recent statistical modelling methods is available in open source software (one example is R). However, these advanced methods are not very friendly to novice R users, since they are based on programming scripts or a command line interface. Our research aims to develop a web interface (based on R and shiny), so that the most recent and advanced statistical modelling methods are readily available, accessible and applicable on the web. We have previously made interfaces in the form of e-tutorials for several modern and advanced statistical modelling methods in R, especially for independent responses (including linear models/LM, generalized linear models/GLM, generalized additive models/GAM and generalized additive models for location, scale and shape/GAMLSS). In this research we unified them in the form of data analysis, including models using Computer Intensive Statistics (Bootstrap and Markov Chain Monte Carlo/MCMC). All are readily accessible on our online Virtual Statistics Laboratory. The web interface makes statistical modelling easier to apply and makes it easier to compare models in order to find the most appropriate model for the data.
Subject-specific body segment parameter estimation using 3D photogrammetry with multiple cameras.

PubMed

Peyer, Kathrin E; Morris, Mark; Sellers, William I

2015-01-01

Inertial properties of body segments, such as mass, centre of mass or moments of inertia, are important parameters when studying movements of the human body. However, these quantities are not directly measurable. Current approaches include using regression models which have limited accuracy, geometric models with lengthy measuring procedures or acquiring and post-processing MRI scans of participants. We propose a geometric methodology based on 3D photogrammetry using multiple cameras to provide subject-specific body segment parameters while minimizing the interaction time with the participants. A low-cost body scanner was built using multiple cameras and 3D point cloud data generated using structure from motion photogrammetric reconstruction algorithms. The point cloud was manually separated into body segments, and convex hulling applied to each segment to produce the required geometric outlines. The accuracy of the method can be adjusted by choosing the number of subdivisions of the body segments. The body segment parameters of six participants (four male and two female) are presented using the proposed method. The multi-camera photogrammetric approach is expected to be particularly suited for studies including populations for which regression models are not available in literature and where other geometric techniques or MRI scanning are not applicable due to time or ethical constraints.
An Enhanced Engineering Perspective of Global Climate Systems and Statistical Formulation of Terrestrial CO2 Exchanges

DOE Office of Scientific and Technical Information (OSTI.GOV)

Dai, Yuanshun; Baek, Seung H.; Garcia-Diza, Alberto

2012-01-01

This paper designs a comprehensive approach based on the engineering machine/system concept, to model, analyze, and assess the level of CO2 exchange between the atmosphere and terrestrial ecosystems, which is an important factor in understanding changes in global climate. The focus of this article is on spatial patterns and on the correlation between levels of CO2 fluxes and a variety of influencing factors in eco-environments. The engineering/machine concept used is a system protocol that includes the sequential activities of design, test, observe, and model. This concept is applied to explicitly include various influencing factors and interactions associated with CO2 fluxes. To formulate effective models of a large and complex climate system, this article introduces a modeling technique that will be referred to as Stochastic Filtering Analysis of Variance (SF-ANOVA). The CO2 flux data observed from some sites of AmeriFlux are used to illustrate and validate the analysis, prediction and globalization capabilities of the proposed engineering approach and the SF-ANOVA technology. The SF-ANOVA modeling approach was compared to stepwise regression, ridge regression, and neural networks. The comparison indicated that the proposed approach is a valid and effective tool with similar accuracy and less complexity than the other procedures.
PACSIN2 polymorphism is associated with thiopurine-induced hematological toxicity in children with acute lymphoblastic leukaemia undergoing maintenance therapy.

PubMed

Smid, Alenka; Karas-Kuzelicki, Natasa; Jazbec, Janez; Mlinaric-Rascan, Irena

2016-07-25

Adequate maintenance therapy for childhood acute lymphoblastic leukemia (ALL), with 6-mercaptopurine as an essential component, is necessary for retaining durable remission. Interruptions or discontinuations of the therapy due to drug-related toxicities, which can be life threatening, may result in an increased risk of relapse. In this retrospective study including 305 paediatric ALL patients undergoing maintenance therapy, we systematically investigated the individual and combined effects of genetic variants of folate pathway enzymes, as well as of polymorphisms in PACSIN2 and ITPA, on drug-induced toxicities by applying a multi-analytical approach including logistic regression (LR), classification and regression tree (CART) and generalized multifactor dimensionality reduction (GMDR). In addition to the TPMT genotype, confirmed to be a major determinant of drug related toxicities, we identified the PACSIN2 rs2413739TT genotype as being a significant risk factor for 6-MP-induced toxicity in wild-type TPMT patients. A gene-gene interaction between MTRR (rs1801394) and MTHFR (rs1801133) was detected by GMDR and proved to have an independent effect on the risk of stomatitis, as shown by LR analysis. To our knowledge, this is the first study showing PACSIN2 genotype association with hematological toxicity in ALL patients undergoing maintenance therapy.
A bioavailable strontium isoscape for Western Europe: A machine learning approach

PubMed Central

von Holstein, Isabella C. C.; Laffoon, Jason E.; Willmes, Malte; Liu, Xiao-Ming; Davies, Gareth R.

2018-01-01

Strontium isotope ratios (87Sr/86Sr) are gaining considerable interest as a geolocation tool and are now widely applied in archaeology, ecology, and forensic research. However, their application for provenance requires the development of baseline models predicting surficial 87Sr/86Sr variations (“isoscapes”). A variety of empirically-based and process-based models have been proposed to build terrestrial 87Sr/86Sr isoscapes but, in their current forms, those models are not mature enough to be integrated with continuous-probability surface models used in geographic assignment. In this study, we aim to overcome those limitations and to predict 87Sr/86Sr variations across Western Europe by combining process-based models and a series of remote-sensing geospatial products into a regression framework. We find that random forest regression significantly outperforms other commonly used regression and interpolation methods, and efficiently predicts the multi-scale patterning of 87Sr/86Sr variations by accounting for geological, geomorphological and atmospheric controls. Random forest regression also provides an easily interpretable and flexible framework to integrate different types of environmental auxiliary variables required to model the multi-scale patterning of 87Sr/86Sr variability. The method is transferable to different scales and resolutions and can be applied to the large collection of geospatial data available at local and global levels. The isoscape generated in this study provides the most accurate 87Sr/86Sr predictions in bioavailable strontium for Western Europe (R2 = 0.58 and RMSE = 0.0023) to date, as well as a conservative estimate of spatial uncertainty by applying quantile regression forest. We anticipate that the method presented in this study combined with the growing numbers of bioavailable 87Sr/86Sr data and satellite geospatial products will extend the applicability of the 87Sr/86Sr geo-profiling tool in provenance applications. PMID:29847595
A comparison of Cox and logistic regression for use in genome-wide association studies of cohort and case-cohort design.

PubMed

Staley, James R; Jones, Edmund; Kaptoge, Stephen; Butterworth, Adam S; Sweeting, Michael J; Wood, Angela M; Howson, Joanna M M

2017-06-01

Logistic regression is often used instead of Cox regression to analyse genome-wide association studies (GWAS) of single-nucleotide polymorphisms (SNPs) and disease outcomes with cohort and case-cohort designs, as it is less computationally expensive. Although Cox and logistic regression models have been compared previously in cohort studies, this work does not completely cover the GWAS setting nor extend to the case-cohort study design. Here, we evaluated Cox and logistic regression applied to cohort and case-cohort genetic association studies using simulated data and genetic data from the EPIC-CVD study. In the cohort setting, there was a modest improvement in power to detect SNP-disease associations using Cox regression compared with logistic regression, which increased as the disease incidence increased. In contrast, logistic regression had more power than (Prentice weighted) Cox regression in the case-cohort setting. Logistic regression yielded inflated effect estimates (assuming the hazard ratio is the underlying measure of association) for both study designs, especially for SNPs with greater effect on disease. Given logistic regression is substantially more computationally efficient than Cox regression in both settings, we propose a two-step approach to GWAS in cohort and case-cohort studies: first, analyse all SNPs with logistic regression to identify associated variants below a pre-defined P-value threshold; second, fit Cox regression (appropriately weighted in case-cohort studies) to those identified SNPs to ensure accurate estimation of association with disease.
Indirect Comparisons: A Review of Reporting and Methodological Quality

PubMed Central

Donegan, Sarah; Williamson, Paula; Gamble, Carrol; Tudur-Smith, Catrin

2010-01-01

Background The indirect comparison of two interventions can be valuable in many situations. However, the quality of an indirect comparison will depend on several factors including the chosen methodology and validity of underlying assumptions. Published indirect comparisons are increasingly common in the medical literature, but as yet, there are no published recommendations on how they should be reported. Our aim is to systematically review the quality of published indirect comparisons to add to existing empirical data suggesting that improvements can be made when reporting and applying indirect comparisons. Methodology/Findings Reviews applying statistical methods to indirectly compare the clinical effectiveness of two interventions using randomised controlled trials were eligible. We searched (1966–2008) Database of Abstracts and Reviews of Effects, The Cochrane library, and Medline. Full review publications were assessed for eligibility. Specific criteria to assess quality were developed and applied. Forty-three reviews were included. Adequate methodology was used to calculate the indirect comparison in 41 reviews. Nineteen reviews assessed the similarity assumption using sensitivity analysis, subgroup analysis, or meta-regression. Eleven reviews compared trial-level characteristics. Twenty-four reviews assessed statistical homogeneity. Twelve reviews investigated causes of heterogeneity. Seventeen reviews included direct and indirect evidence for the same comparison; six reviews assessed consistency. One review combined both evidence types. Twenty-five reviews urged caution in interpretation of results, and 24 reviews indicated when results were from indirect evidence by stating this term with the result. Conclusions This review shows that the underlying assumptions are not routinely explored or reported when undertaking indirect comparisons.
We recommend, therefore, that the quality of indirect comparisons should be improved, in particular, by assessing assumptions and reporting the assessment methods applied. We propose that the quality criteria applied in this article may provide a basis to help review authors carry out indirect comparisons and to aid appropriate interpretation. PMID:21085712
Magnitude, frequency, and trends of floods at gaged and ungaged sites in Washington, based on data through water year 2014

USGS Publications Warehouse

Mastin, Mark C.; Konrad, Christopher P.; Veilleux, Andrea G.; Tecca, Alison E.

2016-09-20

An investigation into the magnitude and frequency of floods in Washington State computed the annual exceedance probability (AEP) statistics for 648 U.S. Geological Survey unregulated streamgages in and near the borders of Washington using the recorded annual peak flows through water year 2014. This is an updated report from a previous report published in 1998 that used annual peak flows through the water year 1996. New in this report, a regional skew coefficient was developed for the Pacific Northwest region that includes areas in Oregon, Washington, Idaho and western Montana within the Columbia River drainage basin south of the United States-Canada border, the coastal areas of Oregon and western Washington, and watersheds draining into Puget Sound, Washington. The skew coefficient is an important term in the Log Pearson Type III equation used to define the distribution of the log-transformed annual peaks. The Expected Moments Algorithm was used to fit historical and censored peak-flow data to the log Pearson Type III distribution. A Multiple Grubbs-Beck test was employed to censor low outliers of annual peak flows to improve on the frequency distribution. This investigation also includes a section on observed trends in annual peak flows that showed significant trends (p-value < 0.05) in 21 of 83 long-term sites, but with small magnitude Kendall tau values suggesting a limited monotonic trend in the time series of annual peaks.
Most of the sites with a significant trend in western Washington were positive and all the sites with significant trends (three sites) in eastern Washington were negative. Multivariate regression analysis with measured basin characteristics and the AEP statistics at long-term, unregulated, and un-urbanized (defined as drainage basins with less than 5 percent impervious land cover for this investigation) streamgages within Washington and some in Idaho and Oregon that are near the Washington border was used to develop equations to estimate AEP statistics at ungaged basins. Washington was divided into four regions to improve the accuracy of the regression equations; a set of equations for eight selected AEPs and for each region were constructed. Selected AEP statistics included the annual peak flows that equaled or exceeded 50, 20, 10, 4, 2, 1, 0.5 and 0.2 percent of the time, equivalent to peak flows for peaks with 2-, 5-, 10-, 25-, 50-, 100-, 200-, and 500-year recurrence intervals, respectively. Annual precipitation and drainage area were the significant basin characteristics in the regression equations for all four regression regions in Washington and forest cover was significant for the two regression regions in eastern Washington. Average standard error of prediction for the regional regression equations ranged from 70.19 to 125.72 percent for Regression Regions 1 and 2 on the eastern side of the Cascade Mountains and from 43.22 to 58.04 percent for Regression Regions 3 and 4 on the western side of the Cascade Mountains. The pseudo coefficient of determination (where a value of 100 signifies a perfect regression model) ranged from 68.39 to 90.68 for Regression Regions 1 and 2, and 92.35 to 95.44 for Regions 3 and 4. The calculated AEP statistics for the streamgages and the regional regression equations are expected to be incorporated into StreamStats after the publication of this report. StreamStats is the interactive Web-based map tool created by the U.S. Geological Survey to allow the user to choose a streamgage and obtain published statistics or choose ungaged locations where the program automatically applies the regional regression equations and computes the estimates of the AEP statistics.
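The correspondence between the eight AEPs and their recurrence intervals quoted above follows from the reciprocal relation T = 1 / AEP. The short sketch below reproduces that arithmetic; the 30-year horizon in the second part is an arbitrary illustrative choice and the independence of years is an assumption, neither taken from the report.

```python
# Annual exceedance probabilities (percent) listed in the abstract.
aep_percent = [50, 20, 10, 4, 2, 1, 0.5, 0.2]

# Recurrence interval in years is the reciprocal of the annual probability.
recurrence_years = [round(100 / p) for p in aep_percent]
print(recurrence_years)


def chance_of_exceedance(aep, years):
    """Probability of at least one exceedance in `years`, assuming independent years."""
    return 1 - (1 - aep) ** years


# Illustrative only: chance of seeing the 1-percent AEP flood in a 30-year span.
risk_30_years = chance_of_exceedance(0.01, 30)
print(f"{risk_30_years:.1%}")
```

The second calculation is a reminder that a "100-year" peak is not rare over a multi-decade planning horizon.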
Application and interpretation of functional data analysis techniques to differential scanning calorimetry data from lupus patients.

PubMed

Kendrick, Sarah K; Zheng, Qi; Garbett, Nichola C; Brock, Guy N

2017-01-01

DSC is used to determine thermally-induced conformational changes of biomolecules within a blood plasma sample. Recent research has indicated that DSC curves (or thermograms) may have different characteristics based on disease status and, thus, may be useful as a monitoring and diagnostic tool for some diseases. Since thermograms are curves measured over a range of temperature values, they are considered functional data. In this paper we apply functional data analysis techniques to analyze differential scanning calorimetry (DSC) data from individuals from the Lupus Family Registry and Repository (LFRR). The aim was to assess the effect of lupus disease status as well as additional covariates on the thermogram profiles, and use FD analysis methods to create models for classifying lupus vs. control patients on the basis of the thermogram curves. Thermograms were collected for 300 lupus patients and 300 controls without lupus who were matched with diseased individuals based on sex, race, and age. First, functional regression with a functional response (DSC) and categorical predictor (disease status) was used to determine how thermogram curve structure varied according to disease status and other covariates including sex, race, and year of birth. Next, functional logistic regression with disease status as the response and functional principal component analysis (FPCA) scores as the predictors was used to model the effect of thermogram structure on disease status prediction. The prediction accuracy for patients with Osteoarthritis and Rheumatoid Arthritis but without Lupus was also calculated to determine the ability of the classifier to differentiate between Lupus and other diseases. Data were divided 1000 times into separate 2/3 training and 1/3 test data for evaluation of predictions. Finally, derivatives of thermogram curves were included in the models to determine whether they aided in prediction of disease status. 
Functional regression with thermogram as a functional response and disease status as predictor showed a clear separation in thermogram curve structure between cases and controls. The logistic regression model with FPCA scores as the predictors gave the most accurate results with a mean 79.22% correct classification rate with a mean sensitivity = 79.70%, and specificity = 81.48%. The model correctly classified OA and RA patients without Lupus as controls at a rate of 75.92% on average with a mean sensitivity = 79.70% and specificity = 77.6%. Regression models including FPCA scores for derivative curves did not perform as well, nor did regression models including covariates. Changes in thermograms observed in the disease state likely reflect covalent modifications of plasma proteins or changes in large protein-protein interacting networks resulting in the stabilization of plasma proteins towards thermal denaturation. By relating functional principal components from thermograms to disease status, our Functional Principal Component Analysis model provides results that are more easily interpretable compared to prior studies. Further, the model could also potentially be coupled with other biomarkers to improve diagnostic classification for lupus.
Lateral-Directional Parameter Estimation on the X-48B Aircraft Using an Abstracted, Multi-Objective Effector Model

NASA Technical Reports Server (NTRS)

Ratnayake, Nalin A.; Waggoner, Erin R.; Taylor, Brian R.

2011-01-01

The problem of parameter estimation on hybrid-wing-body aircraft is complicated by the fact that many design candidates for such aircraft involve a large number of aerodynamic control effectors that act in coplanar motion. This adds to the complexity already present in the parameter estimation problem for any aircraft with a closed-loop control system. Decorrelation of flight and simulation data must be performed in order to ascertain individual surface derivatives with any sort of mathematical confidence. Non-standard control surface configurations, such as clamshell surfaces and drag-rudder modes, further complicate the modeling task. In this paper, time-decorrelation techniques are applied to a model structure selected through stepwise regression for simulated and flight-generated lateral-directional parameter estimation data. A virtual effector model that uses mathematical abstractions to describe the multi-axis effects of clamshell surfaces is developed and applied. Comparisons are made between time history reconstructions and observed data in order to assess the accuracy of the regression model. The Cramér-Rao lower bounds of the estimated parameters are used to assess the uncertainty of the regression model relative to alternative models. Stepwise regression was found to be a useful technique for lateral-directional model design for hybrid-wing-body aircraft, as suggested by available flight data. Based on the results of this study, linear regression parameter estimation methods using abstracted effectors are expected to perform well for hybrid-wing-body aircraft properly equipped for the task.
HIGHLIGHTING DIFFERENCES BETWEEN CONDITIONAL AND UNCONDITIONAL QUANTILE REGRESSION APPROACHES THROUGH AN APPLICATION TO ASSESS MEDICATION ADHERENCE

PubMed Central

BORAH, BIJAN J.; BASU, ANIRBAN

2014-01-01

The quantile regression (QR) framework provides a pragmatic approach in understanding the differential impacts of covariates along the distribution of an outcome. However, the QR framework that has pervaded the applied economics literature is based on the conditional quantile regression method. It is used to assess the impact of a covariate on a quantile of the outcome conditional on specific values of other covariates. In most cases, conditional quantile regression may generate results that are often not generalizable or interpretable in a policy or population context. In contrast, the unconditional quantile regression method provides more interpretable results as it marginalizes the effect over the distributions of other covariates in the model. In this paper, the differences between these two regression frameworks are highlighted, both conceptually and econometrically. Additionally, using real-world claims data from a large US health insurer, alternative QR frameworks are implemented to assess the differential impacts of covariates along the distribution of medication adherence among elderly patients with Alzheimer’s disease. PMID:23616446
Linear regression analysis: part 14 of a series on evaluation of scientific publications.

PubMed

Schneider, Astrid; Hommel, Gerhard; Blettner, Maria

2010-11-01

Regression analysis is an important statistical method for the analysis of medical data. It enables the identification and characterization of relationships among multiple factors. It also enables the identification of prognostically relevant risk factors and the calculation of risk scores for individual prognostication. This article is based on selected textbooks of statistics, a selective review of the literature, and our own experience. After a brief introduction of the uni- and multivariable regression models, illustrative examples are given to explain what the important considerations are before a regression analysis is performed, and how the results should be interpreted. The reader should then be able to judge whether the method has been used correctly and interpret the results appropriately. The performance and interpretation of linear regression analysis are subject to a variety of pitfalls, which are discussed here in detail. The reader is made aware of common errors of interpretation through practical examples. Both the opportunities for applying linear regression analysis and its limitations are presented.
Quantile regression for the statistical analysis of immunological data with many non-detects.

PubMed

Eilers, Paul H C; Röder, Esther; Savelkoul, Huub F J; van Wijk, Roy Gerth

2012-07-07

Immunological parameters are hard to measure. A well-known problem is the occurrence of values below the detection limit, the non-detects. Non-detects are a nuisance, because classical statistical analyses, like ANOVA and regression, cannot be applied. The more advanced statistical techniques currently available for the analysis of datasets with non-detects can only be used if a small percentage of the data are non-detects. Quantile regression, a generalization of percentiles to regression models, models the median or higher percentiles and tolerates very high numbers of non-detects. We present a non-technical introduction and illustrate it with an implementation to real data from a clinical trial. We show that by using quantile regression, groups can be compared and that meaningful linear trends can be computed, even if more than half of the data consists of non-detects. Quantile regression is a valuable addition to the statistical methods that can be used for the analysis of immunological datasets with non-detects.
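The tolerance to non-detects described above can be illustrated with a minimal sketch. It assumes a hypothetical detection limit and made-up measurements (none are from the study) and relies on the fact that, for a single binary group indicator, a quantile regression fit reduces to the group-wise sample quantiles. As long as the chosen percentile lies above the non-detects, its value does not depend on what is substituted for them, whereas the mean does.

```python
import statistics

DETECTION_LIMIT = 1.0  # hypothetical assay detection limit

# Made-up measurements; None marks a non-detect (true value below the limit).
# Four of seven values per group are non-detects, i.e. more than half.
control = [None, None, None, None, 1.2, 1.5, 2.1]
treated = [None, None, None, None, 1.9, 2.6, 3.4]

def substitute(values, fill):
    """Replace non-detects by an arbitrary value below the detection limit."""
    return [fill if v is None else v for v in values]

def upper_quartile(values):
    """75th percentile from the standard library (inclusive method)."""
    return statistics.quantiles(values, n=4, method="inclusive")[2]

quartiles_control, quartiles_treated, means_control = [], [], []
for fill in (0.0, DETECTION_LIMIT / 2, 0.99):
    quartiles_control.append(upper_quartile(substitute(control, fill)))
    quartiles_treated.append(upper_quartile(substitute(treated, fill)))
    means_control.append(statistics.mean(substitute(control, fill)))

# The quartiles are identical for every substitution; the means are not.
print(quartiles_control, quartiles_treated, means_control)
```

The 75th percentile is used here because the median itself falls among the non-detects when more than half of the values are censored; this mirrors the abstract's reference to "the median or higher percentiles".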
Valuing avoided morbidity using meta-regression analysis: what can health status measures and QALYs tell us about WTP?

PubMed

Van Houtven, George; Powers, John; Jessup, Amber; Yang, Jui-Chen

2006-08-01

Many economists argue that willingness-to-pay (WTP) measures are most appropriate for assessing the welfare effects of health changes. Nevertheless, the health evaluation literature is still dominated by studies estimating nonmonetary health status measures (HSMs), which are often used to assess changes in quality-adjusted life years (QALYs). Using meta-regression analysis, this paper combines results from both WTP and HSM studies applied to acute morbidity, and it tests whether a systematic relationship exists between HSM and WTP estimates. We analyze over 230 WTP estimates from 17 different studies and find evidence that QALY-based estimates of illness severity--as measured by the Quality of Well-Being (QWB) Scale--are significant factors in explaining variation in WTP, as are changes in the duration of illness and the average income and age of the study populations. In addition, we test and reject the assumption of a constant WTP per QALY gain. We also demonstrate how the estimated meta-regression equations can serve as benefit transfer functions for policy analysis. By specifying the change in duration and severity of the acute illness and the characteristics of the affected population, we apply the regression functions to predict average WTP per case avoided. Copyright 2006 John Wiley & Sons, Ltd.
Conformal Regression for Quantitative Structure-Activity Relationship Modeling-Quantifying Prediction Uncertainty.

PubMed

Svensson, Fredrik; Aniceto, Natalia; Norinder, Ulf; Cortes-Ciriano, Isidro; Spjuth, Ola; Carlsson, Lars; Bender, Andreas

2018-05-29

Making predictions with an associated confidence is highly desirable as it facilitates decision making and resource prioritization. Conformal regression is a machine learning framework that allows the user to define the required confidence and delivers predictions that are guaranteed to be correct to the selected extent. In this study, we apply conformal regression to model molecular properties and bioactivity values and investigate different ways to scale the resultant prediction intervals to create as efficient (i.e., narrow) regressors as possible. Different algorithms to estimate the prediction uncertainty were used to normalize the prediction ranges, and the different approaches were evaluated on 29 publicly available data sets. Our results show that the most efficient conformal regressors are obtained when using the natural exponential of the ensemble standard deviation from the underlying random forest to scale the prediction intervals, but other approaches were almost as efficient. This approach afforded an average prediction range of 1.65 pIC50 units at the 80% confidence level when applied to bioactivity modeling. The choice of nonconformity function has a pronounced impact on the average prediction range with a difference of close to one log unit in bioactivity between the tightest and widest prediction range. Overall, conformal regression is a robust approach to generate bioactivity predictions with associated confidence.
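A minimal sketch of the general idea, not of the authors' method: the study scales intervals using the ensemble standard deviation of a random forest, whereas the code below uses plain split (inductive) conformal prediction with absolute residuals and a one-predictor least-squares line. The synthetic data, the sample sizes, and all function names are assumptions made for illustration.

```python
import math
import random

random.seed(0)

def make_data(n):
    """Synthetic data (illustration only): y = 2x + standard Gaussian noise."""
    xs = [random.uniform(0.0, 10.0) for _ in range(n)]
    ys = [2.0 * x + random.gauss(0.0, 1.0) for x in xs]
    return xs, ys

def fit_line(xs, ys):
    """Ordinary least squares for one predictor; returns (intercept, slope)."""
    mx = sum(xs) / len(xs)
    my = sum(ys) / len(ys)
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return my - slope * mx, slope

def conformal_half_width(abs_residuals, confidence):
    """k-th smallest calibration residual with k = ceil((n + 1) * confidence)."""
    n = len(abs_residuals)
    k = math.ceil((n + 1) * confidence)
    if k > n:
        return math.inf  # too few calibration points for this confidence
    return sorted(abs_residuals)[k - 1]

train_x, train_y = make_data(500)   # used to fit the model
calib_x, calib_y = make_data(500)   # held out to calibrate the interval
test_x, test_y = make_data(2000)    # used only to check coverage

intercept, slope = fit_line(train_x, train_y)

def predict(x):
    return intercept + slope * x

calib_res = [abs(y - predict(x)) for x, y in zip(calib_x, calib_y)]
half_width = conformal_half_width(calib_res, confidence=0.80)

# Prediction interval for any x: predict(x) +/- half_width.
covered = sum(abs(y - predict(x)) <= half_width for x, y in zip(test_x, test_y))
coverage = covered / len(test_x)
print(f"half-width = {half_width:.2f}, empirical coverage = {coverage:.3f}")
```

The coverage guarantee of this construction holds on average over exchangeable data, so the empirical coverage on a finite test set should be close to, rather than exactly, the requested 80%.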
Aggregating the response in time series regression models, applied to weather-related cardiovascular mortality

NASA Astrophysics Data System (ADS)

Masselot, Pierre; Chebana, Fateh; Bélanger, Diane; St-Hilaire, André; Abdous, Belkacem; Gosselin, Pierre; Ouarda, Taha B. M. J.

2018-07-01

In environmental epidemiology studies, health response data (e.g. hospitalization or mortality) are often noisy because of hospital organization and other social factors. The noise in the data can hide the true signal related to the exposure. The signal can be unveiled by performing a temporal aggregation on health data and then using it as the response in regression analysis. From aggregated series, a general methodology is introduced to account for the particularities of an aggregated response in a regression setting. This methodology can be used with usually applied regression models in weather-related health studies, such as generalized additive models (GAM) and distributed lag nonlinear models (DLNM). In particular, the residuals are modelled using an autoregressive-moving average (ARMA) model to account for the temporal dependence. The proposed methodology is illustrated by modelling the influence of temperature on cardiovascular mortality in Canada. A comparison with classical DLNMs is provided and several aggregation methods are compared. Results show that there is an increase in the fit quality when the response is aggregated, and that the estimated relationship focuses more on the outcome over several days than the classical DLNM. More precisely, among various investigated aggregation schemes, it was found that an aggregation with an asymmetric Epanechnikov kernel is more suited for studying the temperature-mortality relationship.
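The aggregation step can be sketched as follows. The abstract does not define the asymmetric Epanechnikov kernel, so this sketch assumes a one-sided kernel whose weights decay quadratically over the current day and the days that follow; the horizon, the direction of the window, and the daily counts are all hypothetical.

```python
def epanechnikov_weights(horizon):
    """One-sided Epanechnikov weights for lags 0..horizon-1, summing to 1."""
    raw = [1.0 - (lag / horizon) ** 2 for lag in range(horizon)]
    total = sum(raw)
    return [w / total for w in raw]

def aggregate_forward(series, horizon):
    """Weighted average of each day's value and the following horizon-1 days."""
    weights = epanechnikov_weights(horizon)
    last_start = len(series) - horizon
    return [
        sum(w * series[t + lag] for lag, w in enumerate(weights))
        for t in range(last_start + 1)
    ]

daily_deaths = [12, 15, 9, 20, 11, 14, 18, 10]  # made-up daily counts
weights = epanechnikov_weights(3)
aggregated = aggregate_forward(daily_deaths, horizon=3)
print(weights)
print(aggregated)
```

The aggregated series would then replace the raw daily series as the response in the regression model, with the residual dependence handled separately (in the study, by an ARMA model).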
Regression-Based Norms for a Bi-factor Model for Scoring the Brief Test of Adult Cognition by Telephone (BTACT).

PubMed

Gurnani, Ashita S; John, Samantha E; Gavett, Brandon E

2015-05-01

The current study developed regression-based normative adjustments for a bi-factor model of the Brief Test of Adult Cognition by Telephone (BTACT). Archival data from the Midlife Development in the United States-II Cognitive Project were used to develop eight separate linear regression models that predicted bi-factor BTACT scores, accounting for age, education, gender, and occupation, alone and in various combinations. All regression models provided statistically significant fit to the data. A three-predictor regression model fit best and accounted for 32.8% of the variance in the global bi-factor BTACT score. The fit of the regression models was not improved by gender. Eight different regression models are presented to allow the user flexibility in applying demographic corrections to the bi-factor BTACT scores. Occupation corrections, while not widely used, may provide useful demographic adjustments for adult populations or for those individuals who have attained an occupational status not commensurate with expected educational attainment. © The Author 2015. Published by Oxford University Press. All rights reserved.

Prediction of monthly rainfall in Victoria, Australia: Clusterwise linear regression approach

NASA Astrophysics Data System (ADS)

Bagirov, Adil M.; Mahmood, Arshad; Barton, Andrew

2017-05-01

This paper develops the Clusterwise Linear Regression (CLR) technique for prediction of monthly rainfall. The CLR is a combination of clustering and regression techniques. It is formulated as an optimization problem and an incremental algorithm is designed to solve it. The algorithm is applied to predict monthly rainfall in Victoria, Australia using rainfall data with five input meteorological variables over the period of 1889-2014 from eight geographically diverse weather stations. The prediction performance of the CLR method is evaluated by comparing observed and predicted rainfall values using four measures of forecast accuracy. The proposed method is also compared with the CLR using the maximum likelihood framework by the expectation-maximization algorithm, multiple linear regression, artificial neural networks and the support vector machines for regression models using computational results. The results demonstrate that the proposed algorithm outperforms other methods in most locations.
Unified Heat Kernel Regression for Diffusion, Kernel Smoothing and Wavelets on Manifolds and Its Application to Mandible Growth Modeling in CT Images

PubMed Central

Chung, Moo K.; Qiu, Anqi; Seo, Seongho; Vorperian, Houri K.

2014-01-01

We present a novel kernel regression framework for smoothing scalar surface data using the Laplace-Beltrami eigenfunctions. Starting with the heat kernel constructed from the eigenfunctions, we formulate a new bivariate kernel regression framework as a weighted eigenfunction expansion with the heat kernel as the weights. The new kernel regression is mathematically equivalent to isotropic heat diffusion, kernel smoothing and recently popular diffusion wavelets. Unlike many previous partial differential equation based approaches involving diffusion, our approach represents the solution of diffusion analytically, reducing numerical inaccuracy and slow convergence. The numerical implementation is validated on a unit sphere using spherical harmonics. As an illustration, we have applied the method in characterizing the localized growth pattern of mandible surfaces obtained in CT images from subjects between ages 0 and 20 years by regressing the length of displacement vectors with respect to the template surface. PMID:25791435
Systolic time interval v heart rate regression equations using atropine: reproducibility studies.

PubMed Central

Kelman, A W; Sumner, D J; Whiting, B

1981-01-01

1. Systolic time intervals (STI) were recorded in six normal male subjects over a period of 3 weeks. On one day per week, each subject received incremental doses of atropine intravenously to increase heart rate, allowing the determination of individual STI v HR regression equations. On the other days STI were recorded with the subjects resting, in the supine position. 2. There were highly significant regression relationships between heart rate and both LVET and QS2, but not between heart rate and PEP. 3. The regression relationships showed little intra-subject variability, but a large degree of inter-subject variability: they proved adequate to correct the STI for the daily fluctuations in heart rate. 4. Administration of small doses of atropine intravenously provides a satisfactory and convenient method of deriving individual STI v HR regression equations which can be applied over a period of weeks. PMID:7248136
Systolic time interval v heart rate regression equations using atropine: reproducibility studies.

PubMed

Kelman, A W; Sumner, D J; Whiting, B

1981-07-01

1. Systolic time intervals (STI) were recorded in six normal male subjects over a period of 3 weeks. On one day per week, each subject received incremental doses of atropine intravenously to increase heart rate, allowing the determination of individual STI v HR regression equations. On the other days STI were recorded with the subjects resting, in the supine position. 2. There were highly significant regression relationships between heart rate and both LVET and QS2, but not between heart rate and PEP. 3. The regression relationships showed little intra-subject variability, but a large degree of inter-subject variability: they proved adequate to correct the STI for the daily fluctuations in heart rate. 4. Administration of small doses of atropine intravenously provides a satisfactory and convenient method of deriving individual STI v HR regression equations which can be applied over a period of weeks.
Panel regressions to estimate low-flow response to rainfall variability in ungaged basins

USGS Publications Warehouse

Bassiouni, Maoya; Vogel, Richard M.; Archfield, Stacey A.

2016-01-01

Multicollinearity and omitted-variable bias are major limitations to developing multiple linear regression models to estimate streamflow characteristics in ungaged areas and varying rainfall conditions. Panel regression is used to overcome limitations of traditional regression methods, and obtain reliable model coefficients, in particular to understand the elasticity of streamflow to rainfall. Using annual rainfall and selected basin characteristics at 86 gaged streams in the Hawaiian Islands, regional regression models for three stream classes were developed to estimate the annual low-flow duration discharges. Three panel-regression structures (random effects, fixed effects, and pooled) were compared to traditional regression methods, in which space is substituted for time. Results indicated that panel regression generally was able to reproduce the temporal behavior of streamflow and reduce the standard errors of model coefficients compared to traditional regression, even for models in which the unobserved heterogeneity between streams is significant and the variance inflation factor for rainfall is much greater than 10. This is because both spatial and temporal variability were better characterized in panel regression. In a case study, regional rainfall elasticities estimated from panel regressions were applied to ungaged basins on Maui, using available rainfall projections to estimate plausible changes in surface-water availability and usable stream habitat for native species. The presented panel-regression framework is shown to offer benefits over existing traditional hydrologic regression methods for developing robust regional relations to investigate streamflow response in a changing climate.
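Why a fixed-effects panel structure can recover an elasticity that a pooled fit misses is easier to see in a toy example. The sketch below is illustrative only: the three streams, their intercepts, the rainfall values, and the elasticity of 1.5 are invented, and the data are noise-free so that the within-stream estimate can be checked exactly.

```python
import math

TRUE_ELASTICITY = 1.5  # assumed value, used only to generate the toy data

# Toy panel: stream -> (stream-specific intercept, annual rainfall in mm).
# Wetter basins are given lower intercepts, so unobserved heterogeneity
# between streams is correlated with rainfall.
panel = {
    "stream_a": (3.0, [800.0, 900.0, 1000.0]),
    "stream_b": (2.0, [1500.0, 1700.0, 1900.0]),
    "stream_c": (1.0, [2500.0, 2800.0, 3100.0]),
}

def ols_slope(xs, ys):
    """Least-squares slope of y on x."""
    mx = sum(xs) / len(xs)
    my = sum(ys) / len(ys)
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    return sxy / sxx

# Rows of (stream, log rainfall, log low flow) in a log-log model.
rows = []
for stream, (intercept, rainfall) in panel.items():
    for rain in rainfall:
        log_rain = math.log(rain)
        rows.append((stream, log_rain, intercept + TRUE_ELASTICITY * log_rain))

def demean_by_stream(rows, column):
    """Subtract each stream's own mean from the chosen column."""
    groups = {}
    for row in rows:
        groups.setdefault(row[0], []).append(row[column])
    means = {s: sum(v) / len(v) for s, v in groups.items()}
    return [row[column] - means[row[0]] for row in rows]

pooled = ols_slope([r[1] for r in rows], [r[2] for r in rows])
within = ols_slope(demean_by_stream(rows, 1), demean_by_stream(rows, 2))
print(f"pooled slope = {pooled:.3f}, fixed-effects slope = {within:.3f}")
```

Demeaning by stream removes every time-invariant basin characteristic, observed or not, which is why the within estimate is unaffected by the intercepts chosen above.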
Panel regressions to estimate low-flow response to rainfall variability in ungaged basins

NASA Astrophysics Data System (ADS)

Bassiouni, Maoya; Vogel, Richard M.; Archfield, Stacey A.

2016-12-01

Multicollinearity and omitted-variable bias are major limitations to developing multiple linear regression models to estimate streamflow characteristics in ungaged areas and varying rainfall conditions. Panel regression is used to overcome limitations of traditional regression methods, and obtain reliable model coefficients, in particular to understand the elasticity of streamflow to rainfall. Using annual rainfall and selected basin characteristics at 86 gaged streams in the Hawaiian Islands, regional regression models for three stream classes were developed to estimate the annual low-flow duration discharges. Three panel-regression structures (random effects, fixed effects, and pooled) were compared to traditional regression methods, in which space is substituted for time. Results indicated that panel regression generally was able to reproduce the temporal behavior of streamflow and reduce the standard errors of model coefficients compared to traditional regression, even for models in which the unobserved heterogeneity between streams is significant and the variance inflation factor for rainfall is much greater than 10. This is because both spatial and temporal variability were better characterized in panel regression. In a case study, regional rainfall elasticities estimated from panel regressions were applied to ungaged basins on Maui, using available rainfall projections to estimate plausible changes in surface-water availability and usable stream habitat for native species. The presented panel-regression framework is shown to offer benefits over existing traditional hydrologic regression methods for developing robust regional relations to investigate streamflow response in a changing climate.
Comparison of a Full Food-Frequency Questionnaire with the Three-Day Unweighted Food Records in Young Polish Adult Women: Implications for Dietary Assessment

PubMed Central

Kowalkowska, Joanna; Slowinska, Malgorzata A.; Slowinski, Dariusz; Dlugosz, Anna; Niedzwiedzka, Ewa; Wadolowska, Lidia

2013-01-01

The food frequency questionnaire (FFQ) and the food record (FR) are among the most common methods used in dietary research. It is important to know whether it is possible to use both methods simultaneously in dietary assessment and prepare a single, comprehensive interpretation. The aim of this study was to compare the energy and nutritional value of diets, determined by the FFQ and by the three-day food records of young women. The study involved 84 female students aged 21–26 years (mean of 22.2 ± 0.8 years). Completing the FFQ was preceded by obtaining unweighted food records covering three consecutive days. Energy and nutritional value of diets was assessed for both methods (FFQ-crude, FR-crude). Data obtained for FFQ-crude were adjusted with beta-coefficient equaling 0.5915 (FFQ-adjusted) and regression analysis (FFQ-regressive). The FFQ-adjusted was calculated as FR-crude/FFQ-crude ratio of mean daily energy intake. FFQ-regressive was calculated for energy and each nutrient separately using regression equation, including FFQ-crude and FR-crude as covariates. For FR-crude and FFQ-crude the energy value of diets was standardized to 2000 kcal (FR-standardized, FFQ-standardized). Methods of statistical comparison included a dependent samples t-test, a chi-square test, and the Bland-Altman method. The mean energy intake in FFQ-crude was significantly higher than FR-crude (2740.5 kcal vs. 1621.0 kcal, respectively). For FR-standardized and FFQ-standardized, significant differences were found in the mean intake of 18 out of 31 nutrients, for FR-crude and FFQ-adjusted in 13 out of 31 nutrients and FR-crude and FFQ-regressive in 11 out of 31 nutrients. The Bland-Altman method showed an overestimation of energy and nutrient intake by FFQ-crude in comparison to FR-crude, e.g., total protein was overestimated by 34.7 g/day (95% Confidence Interval, CI: −29.6, 99.0 g/day) and fat by 48.6 g/day (95% CI: −36.4, 133.6 g/day).
After regressive transformation of FFQ, the absolute difference between FFQ-regressive and FR-crude equaled 0.0 g/day and 95% CI were much better (e.g., for total protein 95% CI: −32.7, 32.7 g/day, for fat 95% CI: −49.6, 49.6 g/day). In conclusion, differences in nutritional value of diets resulted from overestimating energy intake by the FFQ in comparison to the three-day unweighted food records. Adjustment of energy and nutrient intake applied for the FFQ using various methods, particularly regression equations, significantly improved the agreement between results obtained by both methods and dietary assessment. To obtain the most accurate results in future studies using this FFQ, energy and nutrient intake should be adjusted by the regression equations presented in this paper. PMID:23877089
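The arithmetic behind two of the reported figures can be checked with a short sketch. The energy values and the protein interval are taken from the abstract; the paired intakes are made up, and treating the reported intervals as conventional Bland-Altman limits of agreement (bias ± 1.96 standard deviations of the paired differences) is an assumption, since the abstract labels them only as 95% CI.

```python
import statistics

# Energy adjustment factor: ratio of the mean daily energy intakes reported above.
fr_crude_kcal, ffq_crude_kcal = 1621.0, 2740.5
adjustment = fr_crude_kcal / ffq_crude_kcal
print(round(adjustment, 4))

def bland_altman(method_a, method_b):
    """Return (bias, lower limit, upper limit) for paired measurements."""
    diffs = [a - b for a, b in zip(method_a, method_b)]
    bias = statistics.mean(diffs)
    spread = 1.96 * statistics.stdev(diffs)
    return bias, bias - spread, bias + spread

# Made-up paired protein intakes in g/day (not study data).
ffq = [95.0, 120.0, 80.0, 140.0, 110.0]
fr = [70.0, 75.0, 72.0, 85.0, 78.0]
bias, lower, upper = bland_altman(ffq, fr)
print(bias, lower, upper)

# Reported protein interval: if it is bias +/- 1.96 SD, its midpoint is the
# bias and its half-width divided by 1.96 is the SD of the differences.
reported_low, reported_high = -29.6, 99.0
midpoint = (reported_low + reported_high) / 2
implied_sd = (reported_high - reported_low) / (2 * 1.96)
print(midpoint, round(implied_sd, 1))
```

The midpoint of the reported protein interval reproduces the reported overestimation of 34.7 g/day, which is consistent with the limits-of-agreement reading.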
Assessment of wastewater treatment facility compliance with decreasing ammonia discharge limits using a regression tree model.

PubMed

Suchetana, Bihu; Rajagopalan, Balaji; Silverstein, JoAnn

2017-11-15

A regression tree-based diagnostic approach is developed to evaluate factors affecting US wastewater treatment plant compliance with ammonia discharge permit limits using Discharge Monthly Report (DMR) data from a sample of 106 municipal treatment plants for the period of 2004-2008. Predictor variables used to fit the regression tree are selected using random forests, and consist of the previous month's effluent ammonia, influent flow rates and plant capacity utilization. The tree models are first used to evaluate compliance with existing ammonia discharge standards at each facility and then applied assuming more stringent discharge limits, under consideration in many states. The model predicts that the ability to meet both current and future limits depends primarily on the previous month's treatment performance. With more stringent discharge limits, the predicted ammonia concentration relative to the discharge limit increases. In-sample validation shows that the regression trees can provide a median classification accuracy of >70%. The regression tree model is validated using ammonia discharge data from an operating wastewater treatment plant and is able to accurately predict the observed ammonia discharge category approximately 80% of the time, indicating that the regression tree model can be applied to predict compliance for individual treatment plants, providing practical guidance for utilities and regulators with an interest in controlling ammonia discharges. The proposed methodology is also used to demonstrate how to delineate reliable sources of demand and supply in a point source-to-point source nutrient credit trading scheme, as well as how planners and decision makers can set reasonable discharge limits in future. Copyright © 2017 Elsevier B.V. All rights reserved.
Regression estimators for generic health-related quality of life and quality-adjusted life years.

PubMed

Basu, Anirban; Manca, Andrea

2012-01-01

To develop regression models for outcomes with truncated supports, such as health-related quality of life (HRQoL) data, and account for features typical of such data such as a skewed distribution, spikes at 1 or 0, and heteroskedasticity. Regression estimators based on features of the Beta distribution. First, both a single equation and a 2-part model are presented, along with estimation algorithms based on maximum-likelihood, quasi-likelihood, and Bayesian Markov-chain Monte Carlo methods. A novel Bayesian quasi-likelihood estimator is proposed. Second, a simulation exercise is presented to assess the performance of the proposed estimators against ordinary least squares (OLS) regression for a variety of HRQoL distributions that are encountered in practice. Finally, the performance of the proposed estimators is assessed by using them to quantify the treatment effect on QALYs in the EVALUATE hysterectomy trial. Overall model fit is studied using several goodness-of-fit tests such as Pearson's correlation test, link and reset tests, and a modified Hosmer-Lemeshow test. The simulation results indicate that the proposed methods are more robust in estimating covariate effects than OLS, especially when the effects are large or the HRQoL distribution has a large spike at 1. Quasi-likelihood techniques are more robust than maximum likelihood estimators. When applied to the EVALUATE trial, all but the maximum likelihood estimators produce unbiased estimates of the treatment effect. One and 2-part Beta regression models provide flexible approaches to regress the outcomes with truncated supports, such as HRQoL, on covariates, after accounting for many idiosyncratic features of the outcomes distribution. This work will provide applied researchers with a practical set of tools to model outcomes in cost-effectiveness analysis.
Job stress models, depressive disorders and work performance of engineers in microelectronics industry.

PubMed

Chen, Sung-Wei; Wang, Po-Chuan; Hsin, Ping-Lung; Oates, Anthony; Sun, I-Wen; Liu, Shen-Ing

2011-01-01

Microelectronic engineers are considered valuable human capital contributing significantly toward economic development, but they may encounter stressful work conditions in the context of a globalized industry. The study aims at identifying risk factors of depressive disorders primarily based on job stress models, the Demand-Control-Support and Effort-Reward Imbalance models, and at evaluating whether depressive disorders impair work performance in microelectronics engineers in Taiwan. The case-control study was conducted among 678 microelectronics engineers, 452 controls and 226 cases with depressive disorders, which were defined by a score of 17 or more on the Beck Depression Inventory and a psychiatrist's diagnosis. The self-administered questionnaires included the Job Content Questionnaire, Effort-Reward Imbalance Questionnaire, demography, psychosocial factors, health behaviors and work performance. Hierarchical logistic regression was applied to identify risk factors of depressive disorders. Multivariate linear regressions were used to determine factors affecting work performance. By hierarchical logistic regression, risk factors of depressive disorders are high demands, low work social support, high effort/reward ratio and low frequency of physical exercise. Combining the two job stress models may have better predictive power for depressive disorders than adopting either model alone. Three multivariate linear regressions provide similar results indicating that depressive disorders are associated with impaired work performance in terms of absence, role limitation and social functioning limitation. The results may provide insight into the applicability of job stress models in a globalized high-tech industry considerably focused in non-Western countries, and the design of workplace preventive strategies for depressive disorders in the Asian electronics engineering population.
Parametric correlation functions to model the structure of permanent environmental (co)variances in milk yield random regression models.

PubMed

Bignardi, A B; El Faro, L; Cardoso, V L; Machado, P F; Albuquerque, L G

2009-09-01

The objective of the present study was to estimate milk yield genetic parameters applying random regression models and parametric correlation functions combined with a variance function to model animal permanent environmental effects. A total of 152,145 test-day milk yields from 7,317 first lactations of Holstein cows belonging to herds located in the southeastern region of Brazil were analyzed. Test-day milk yields were divided into 44 weekly classes of days in milk. Contemporary groups were defined by herd-test-day comprising a total of 2,539 classes. The model included direct additive genetic, permanent environmental, and residual random effects. The following fixed effects were considered: contemporary group, age of cow at calving (linear and quadratic regressions), and the population average lactation curve modeled by fourth-order orthogonal Legendre polynomial. Additive genetic effects were modeled by random regression on orthogonal Legendre polynomials of days in milk, whereas permanent environmental effects were estimated using a stationary or nonstationary parametric correlation function combined with a variance function of different orders. The structure of residual variances was modeled using a step function containing 6 variance classes. The genetic parameter estimates obtained with the model using a stationary correlation function associated with a variance function to model permanent environmental effects were similar to those obtained with models employing orthogonal Legendre polynomials for the same effect. A model using a sixth-order polynomial for additive effects and a stationary parametric correlation function associated with a seventh-order variance function to model permanent environmental effects would be sufficient for data fitting.
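Random regression on Legendre polynomials requires a covariate row for each time point. A minimal sketch of how such covariates can be built for the 44 weekly classes follows; it assumes weeks are rescaled linearly to the interval [-1, 1], uses unnormalized polynomials up to degree 4, and does not attempt to reproduce the study's exact parameterization (in this literature, "order" may refer to the number of coefficients rather than the degree).

```python
def legendre_row(x, max_degree):
    """Values P_0(x) .. P_max_degree(x) from Bonnet's recurrence:
    (k + 1) P_{k+1}(x) = (2k + 1) x P_k(x) - k P_{k-1}(x)."""
    values = [1.0, x]
    for k in range(1, max_degree):
        values.append(((2 * k + 1) * x * values[k] - k * values[k - 1]) / (k + 1))
    return values[: max_degree + 1]

def standardise(week, first=1, last=44):
    """Map a weekly class of days in milk onto [-1, 1]."""
    return -1.0 + 2.0 * (week - first) / (last - first)

# One row of covariates per weekly class; columns are P_0 .. P_4.
design = [legendre_row(standardise(week), 4) for week in range(1, 45)]
print(design[0])
print(design[-1])
```

Each animal's additive genetic deviation at a given week is then modelled as the product of its random regression coefficients and the corresponding row of this matrix.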
The relationship between quality of work life and turnover intention of primary health care nurses in Saudi Arabia.

PubMed

Almalki, Mohammed J; FitzGerald, Gerry; Clark, Michele

2012-09-12

Quality of work life (QWL) has been found to influence the commitment of health professionals, including nurses. However, reliable information on QWL and turnover intention of primary health care (PHC) nurses is limited. The aim of this study was to examine the relationship between QWL and turnover intention of PHC nurses in Saudi Arabia. A cross-sectional survey was used in this study. Data were collected using Brooks' survey of Quality of Nursing Work Life, the Anticipated Turnover Scale and demographic data questions. A total of 508 PHC nurses in the Jazan Region, Saudi Arabia, completed the questionnaire (RR = 87%). Descriptive statistics, t-test, ANOVA, General Linear Model (GLM) univariate analysis, standard multiple regression, and hierarchical multiple regression were applied for analysis using SPSS v17 for Windows. Findings suggested that the respondents were dissatisfied with their work life, with almost 40% indicating a turnover intention from their current PHC centres. Turnover intention was significantly related to QWL. Using standard multiple regression, 26% of the variance in turnover intention was explained by QWL, p < 0.001, with R2 = .263. Further analysis using hierarchical multiple regression found that the total variance explained by the model as a whole (demographics and QWL) was 32.1%, p < 0.001. QWL explained an additional 19% of the variance in turnover intention, after controlling for demographic variables. Creating and maintaining a healthy work life for PHC nurses is very important to improve their work satisfaction, reduce turnover, enhance productivity and improve nursing care outcomes.
[Prevalence of vitamin D deficiency and associated factors in women and newborns in the immediate postpartum period].

PubMed

do Prado, Mara Rúbia Maciel Cardoso; Oliveira, Fabiana de Cássia Carvalho; Assis, Karine Franklin; Ribeiro, Sarah Aparecida Vieira; do Prado Junior, Pedro Paulo; Sant'Ana, Luciana Ferreira da Rocha; Priore, Silvia Eloiza; Franceschini, Sylvia do Carmo Castro

2015-01-01

To assess the prevalence of vitamin D deficiency and its associated factors in women and their newborns in the postpartum period. This cross-sectional study evaluated vitamin D deficiency/insufficiency in 226 women and their newborns in Viçosa (Minas Gerais, BR) between December 2011 and November 2012. Cord blood and venous maternal blood were collected to evaluate the following biochemical parameters: vitamin D, alkaline phosphatase, calcium, phosphorus and parathyroid hormone. Poisson regression analysis, with a 95% confidence interval, was applied to assess vitamin D deficiency and its associated factors. Multiple linear regression analysis was performed to identify factors associated with 25(OH)D deficiency in the newborns and women from the study. The criterion for variable inclusion in the multiple linear regression model was association with the dependent variable in the simple linear regression analysis at p<0.20. The significance level was set at 5%. Of the 226 women included, 200 (88.5%) were 20 to 44 years old; the median age was 28 years. Deficient/insufficient levels of vitamin D were found in 192 (85%) women and in 182 (80.5%) neonates. The maternal 25(OH)D and alkaline phosphatase levels were independently associated with vitamin D deficiency in infants. This study identified a high prevalence of vitamin D deficiency and insufficiency in women and newborns and the association between maternal nutritional status of vitamin D and their infants' vitamin D status. Copyright © 2015 Sociedade de Pediatria de São Paulo. Publicado por Elsevier Editora Ltda. All rights reserved.
The relationship between quality of work life and turnover intention of primary health care nurses in Saudi Arabia

PubMed Central

2012-01-01

Background: Quality of work life (QWL) has been found to influence the commitment of health professionals, including nurses. However, reliable information on QWL and turnover intention of primary health care (PHC) nurses is limited. The aim of this study was to examine the relationship between QWL and turnover intention of PHC nurses in Saudi Arabia. Methods: A cross-sectional survey was used in this study. Data were collected using Brooks’ survey of Quality of Nursing Work Life, the Anticipated Turnover Scale and demographic data questions. A total of 508 PHC nurses in the Jazan Region, Saudi Arabia, completed the questionnaire (RR = 87%). Descriptive statistics, t-test, ANOVA, General Linear Model (GLM) univariate analysis, standard multiple regression, and hierarchical multiple regression were applied for analysis using SPSS v17 for Windows. Results: Findings suggested that the respondents were dissatisfied with their work life, with almost 40% indicating a turnover intention from their current PHC centres. Turnover intention was significantly related to QWL. Using standard multiple regression, 26% of the variance in turnover intention was explained by QWL, p < 0.001, with R² = .263. Further analysis using hierarchical multiple regression found that the total variance explained by the model as a whole (demographics and QWL) was 32.1%, p < 0.001. QWL explained an additional 19% of the variance in turnover intention, after controlling for demographic variables. Conclusions: Creating and maintaining a healthy work life for PHC nurses is very important to improve their work satisfaction, reduce turnover, enhance productivity and improve nursing care outcomes. PMID:22970764
A regularization corrected score method for nonlinear regression models with covariate error.

PubMed

Zucker, David M; Gorfine, Malka; Li, Yi; Tadesse, Mahlet G; Spiegelman, Donna

2013-03-01

Many regression analyses involve explanatory variables that are measured with error, and failing to account for this error is well known to lead to biased point and interval estimates of the regression coefficients. We present here a new general method for adjusting for covariate error. Our method consists of an approximate version of the Stefanski-Nakamura corrected score approach, using the method of regularization to obtain an approximate solution of the relevant integral equation. We develop the theory in the setting of classical likelihood models; this setting covers, for example, linear regression, nonlinear regression, logistic regression, and Poisson regression. The method is extremely general in terms of the types of measurement error models covered, and is a functional method in the sense of not involving assumptions on the distribution of the true covariate. We discuss the theoretical properties of the method and present simulation results in the logistic regression setting (univariate and multivariate). For illustration, we apply the method to data from the Harvard Nurses' Health Study concerning the relationship between physical activity and breast cancer mortality in the period following a diagnosis of breast cancer. Copyright © 2013, The International Biometric Society.
Application of Machine-Learning Models to Predict Tacrolimus Stable Dose in Renal Transplant Recipients

NASA Astrophysics Data System (ADS)

Tang, Jie; Liu, Rong; Zhang, Yue-Li; Liu, Mou-Ze; Hu, Yong-Fang; Shao, Ming-Jie; Zhu, Li-Jun; Xin, Hua-Wen; Feng, Gui-Wen; Shang, Wen-Jun; Meng, Xiang-Guang; Zhang, Li-Rong; Ming, Ying-Zi; Zhang, Wei

2017-02-01

Tacrolimus has a narrow therapeutic window and considerable variability in clinical use. Our goal was to compare the performance of multiple linear regression (MLR) and eight machine learning techniques in pharmacogenetic algorithm-based prediction of tacrolimus stable dose (TSD) in a large Chinese cohort. A total of 1,045 renal transplant patients were recruited, 80% of whom were randomly selected as the “derivation cohort” to develop the dose-prediction algorithm, while the remaining 20% constituted the “validation cohort” to test the final selected algorithm. MLR, artificial neural network (ANN), regression tree (RT), multivariate adaptive regression splines (MARS), boosted regression tree (BRT), support vector regression (SVR), random forest regression (RFR), lasso regression (LAR) and Bayesian additive regression trees (BART) were applied and their performances were compared in this work. Among all the machine learning models, RT performed best in both derivation [0.71 (0.67-0.76)] and validation cohorts [0.73 (0.63-0.82)]. In addition, the ideal rate of RT was 4% higher than that of MLR. To our knowledge, this is the first study to use machine learning models to predict TSD, which will further facilitate personalized medicine in tacrolimus administration in the future.
Cider fermentation process monitoring by Vis-NIR sensor system and chemometrics.

PubMed

Villar, Alberto; Vadillo, Julen; Santos, Jose I; Gorritxategi, Eneko; Mabe, Jon; Arnaiz, Aitor; Fernández, Luis A

2017-04-15

Optimization of a multivariate calibration process has been undertaken for a Visible-Near Infrared (400-1100 nm) sensor system, applied in the monitoring of the fermentation process of the cider produced in the Basque Country (Spain). The main parameters that were monitored included alcoholic proof, l-lactic acid content, glucose+fructose and acetic acid content. The multivariate calibration was carried out using a combination of different variable selection techniques, and the most suitable pre-processing strategies were selected based on the spectra characteristics obtained by the sensor system. The variable selection techniques studied in this work include the Martens Uncertainty test, interval Partial Least Squares Regression (iPLS) and Genetic Algorithm (GA). This procedure arises from the need to improve the calibration models' prediction ability for cider monitoring. Copyright © 2016 Elsevier Ltd. All rights reserved.
Consistent Tolerance Bounds for Statistical Distributions

NASA Technical Reports Server (NTRS)

Mezzacappa, M. A.

1983-01-01

Assumption that sample comes from population with particular distribution is made with confidence C if data lie between certain bounds. These "confidence bounds" depend on C and assumption about distribution of sampling errors around regression line. Graphical test criteria using tolerance bounds are applied in industry where statistical analysis influences product development and use. Applied to evaluate equipment life.
Mining hidden data to predict patient prognosis: texture feature extraction and machine learning in mammography

NASA Astrophysics Data System (ADS)

Leighs, J. A.; Halling-Brown, M. D.; Patel, M. N.

2018-03-01

The UK currently has a national breast cancer-screening program and images are routinely collected from a number of screening sites, representing a wealth of invaluable data that is currently under-used. Radiologists evaluate screening images manually and recall suspicious cases for further analysis such as biopsy. Histological testing of biopsy samples confirms the malignancy of the tumour, along with other diagnostic and prognostic characteristics such as disease grade. Machine learning is becoming increasingly popular for clinical image classification problems, as it is capable of discovering patterns in data otherwise invisible. This is particularly true when applied to medical imaging features; however clinical datasets are often relatively small. A texture feature extraction toolkit has been developed to mine a wide range of features from medical images such as mammograms. This study analysed a dataset of 1,366 radiologist-marked, biopsy-proven malignant lesions obtained from the OPTIMAM Medical Image Database (OMI-DB). Exploratory data analysis methods were employed to better understand extracted features. Machine learning techniques including Classification and Regression Trees (CART), ensemble methods (e.g. random forests), and logistic regression were applied to the data to predict the disease grade of the analysed lesions. Prediction scores of up to 83% were achieved; sensitivity and specificity of the models trained have been discussed to put the results into a clinical context. The results show promise in the ability to predict prognostic indicators from the texture features extracted and thus enable prioritisation of care for patients at greatest risk.
Impact of maternal education, employment and family size on nutritional status of children.

PubMed

Iftikhar, Aisha; Bari, Attia; Bano, Iqbal; Masood, Qaisar

2017-01-01

To determine the impact of maternal education, employment, and family size on nutritional status of children. This was a case-control study conducted at the OPD of the Children's Hospital, Lahore, from September 2015 to April 2017. A total of 340 children (170 cases and 170 controls) aged six months to five years, along with their mothers, were included. Anthropometric measurements were plotted against WHO growth charts. The 170 wasted children (<-2 SD) were matched with 170 controls (≥ -2 SD). Maternal education, employment and family size were compared between the cases and controls. Confounding variables were noted and dichotomized. Univariate analysis was carried out for the factors under consideration, i.e., maternal education, employment and family size, to study the association of each factor. Logistic regression analysis was applied to study the independent association. Maternal education had a significant association with growth parameters, with an OR of 1.32 (CI = 1.1 to 1.623). Employment status of mothers had an OR of 1.132 with a non-significant confidence interval (CI = 0.725 to 1.768). Family size had an OR of 1 with a non-significant confidence interval (CI = 0.8 to 1.21). The associations remained the same after applying bivariate logistic regression analysis. Maternal education has a definite and significant effect on the nutritional status of children. It is the key factor to be addressed for the prevention or improvement of childhood malnutrition. It is therefore imperative to launch sustainable programs at national and regional levels to raise women's educational status and combat the ever-increasing burden of malnutrition.

Experimental Investigations of Non-Stationary Properties In Radiometer Receivers Using Measurements of Multiple Calibration References

NASA Technical Reports Server (NTRS)

Racette, Paul; Lang, Roger; Zhang, Zhao-Nan; Zacharias, David; Krebs, Carolyn A. (Technical Monitor)

2002-01-01

Radiometers must be periodically calibrated because the receiver response fluctuates. Many techniques exist to correct for the time-varying response of a radiometer receiver. An analytical technique has been developed that uses generalized least squares regression (LSR) to predict the performance of a wide variety of calibration algorithms. The total measurement uncertainty, including the uncertainty of the calibration, can be computed using LSR. The uncertainties of the calibration samples used in the regression are based upon treating the receiver fluctuations as non-stationary processes. Signals originating from the different sources of emission are treated as simultaneously existing random processes. Thus, the radiometer output is a series of samples obtained from these random processes. The samples are treated as random variables, but because the underlying processes are non-stationary, the statistics of the samples are treated as non-stationary. The statistics of the calibration samples depend upon the time for which the samples are to be applied. The statistics of the random variables are equated to the mean statistics of the non-stationary processes over the interval defined by the time of the calibration sample and the time at which it is applied. This analysis opens the opportunity for experimental investigation into the underlying properties of receiver non-stationarity through the use of multiple calibration references. In this presentation we will discuss the application of LSR to the analysis of various calibration algorithms, requirements for experimental verification of the theory, and preliminary results from analyzing experimental measurements.
Estimation of Additive, Dominance, and Imprinting Genetic Variance Using Genomic Data

PubMed Central

Lopes, Marcos S.; Bastiaansen, John W. M.; Janss, Luc; Knol, Egbert F.; Bovenhuis, Henk

2015-01-01

Traditionally, exploration of genetic variance in humans, plants, and livestock species has been limited mostly to the use of additive effects estimated using pedigree data. However, with the development of dense panels of single-nucleotide polymorphisms (SNPs), the exploration of genetic variation of complex traits is moving from quantifying the resemblance between family members to the dissection of genetic variation at individual loci. With SNPs, we were able to quantify the contribution of additive, dominance, and imprinting variance to the total genetic variance by using a SNP regression method. The method was validated in simulated data and applied to three traits (number of teats, backfat, and lifetime daily gain) in three purebred pig populations. In simulated data, the estimates of additive, dominance, and imprinting variance were very close to the simulated values. In real data, dominance effects account for a substantial proportion of the total genetic variance (up to 44%) for these traits in these populations. The contribution of imprinting to the total phenotypic variance of the evaluated traits was relatively small (1–3%). Our results indicate a strong relationship between additive variance explained per chromosome and chromosome length, which has been described previously for other traits in other species. We also show that a similar linear relationship exists for dominance and imprinting variance. These novel results improve our understanding of the genetic architecture of the evaluated traits and show promise for applying the SNP regression method to other traits and species, including human diseases. PMID:26438289
On-line prediction of yield grade, longissimus muscle area, preliminary yield grade, adjusted preliminary yield grade, and marbling score using the MARC beef carcass image analysis system.

PubMed

Shackelford, S D; Wheeler, T L; Koohmaraie, M

2003-01-01

The present experiment was conducted to evaluate the ability of the U.S. Meat Animal Research Center's beef carcass image analysis system to predict calculated yield grade, longissimus muscle area, preliminary yield grade, adjusted preliminary yield grade, and marbling score under commercial beef processing conditions. In two commercial beef-processing facilities, image analysis was conducted on 800 carcasses on the beef-grading chain immediately after the conventional USDA beef quality and yield grades were applied. Carcasses were blocked by plant and observed calculated yield grade. The carcasses were then separated, with 400 carcasses assigned to a calibration data set that was used to develop regression equations, and the remaining 400 carcasses assigned to a prediction data set used to validate the regression equations. Prediction equations, which included image analysis variables and hot carcass weight, accounted for 90, 88, 90, 88, and 76% of the variation in calculated yield grade, longissimus muscle area, preliminary yield grade, adjusted preliminary yield grade, and marbling score, respectively, in the prediction data set. In comparison, the official USDA yield grade as applied by online graders accounted for 73% of the variation in calculated yield grade. The technology described herein could be used by the beef industry to more accurately determine beef yield grades; however, this system does not provide an accurate enough prediction of marbling score to be used without USDA grader interaction for USDA quality grading.
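The calibration/prediction design described above can be sketched in a few lines. This is a minimal illustration only, assuming a single predictor and ordinary least squares; the study's equations used several image analysis variables plus hot carcass weight, and the helper names `fit_line` and `r_squared` and all numbers below are hypothetical.

```python
def fit_line(x, y):
    """Ordinary least-squares fit of y = intercept + slope * x."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return intercept, slope


def r_squared(observed, predicted):
    """Share of variation in `observed` accounted for by `predicted`."""
    mean_obs = sum(observed) / len(observed)
    ss_res = sum((o - p) ** 2 for o, p in zip(observed, predicted))
    ss_tot = sum((o - mean_obs) ** 2 for o in observed)
    return 1.0 - ss_res / ss_tot


# Made-up numbers: develop the equation on the calibration set only ...
calib_x = [1.0, 2.0, 3.0, 4.0]
calib_y = [2.1, 3.9, 6.2, 7.8]
intercept, slope = fit_line(calib_x, calib_y)

# ... then judge it on records that played no part in the fit.
pred_x = [1.5, 2.5, 3.5]
pred_y = [3.0, 5.1, 6.9]
predicted = [intercept + slope * xi for xi in pred_x]
print(round(r_squared(pred_y, predicted), 3))
```

Scoring the equation on the held-out prediction set, rather than on the calibration set, is what keeps the reported percentage of variation from being inflated by overfitting.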
Using Explanatory Item Response Models to Evaluate Complex Scientific Tasks Designed for the Next Generation Science Standards

NASA Astrophysics Data System (ADS)

Chiu, Tina

This dissertation includes three studies that analyze a new set of assessment tasks developed by the Learning Progressions in Middle School Science (LPS) Project. These assessment tasks were designed to measure science content knowledge on the structure of matter domain and scientific argumentation, while following the goals from the Next Generation Science Standards (NGSS). The three studies focus on the evidence available for the success of this design and its implementation, generally labelled as "validity" evidence. I use explanatory item response models (EIRMs) as the overarching framework to investigate these assessment tasks. These models can be useful when gathering validity evidence for assessments as they can help explain student learning and group differences. In the first study, I explore the dimensionality of the LPS assessment by comparing the fit of unidimensional, between-item multidimensional, and Rasch testlet models to see which is most appropriate for this data. By applying multidimensional item response models, multiple relationships can be investigated, and in turn, allow for a more substantive look into the assessment tasks. The second study focuses on person predictors through latent regression and differential item functioning (DIF) models. Latent regression models show the influence of certain person characteristics on item responses, while DIF models test whether one group is differentially affected by specific assessment items, after conditioning on latent ability. Finally, the last study applies the linear logistic test model (LLTM) to investigate whether item features can help explain differences in item difficulties.
Analysis of the Magnitude and Frequency of Peak Discharges for the Navajo Nation in Arizona, Utah, Colorado, and New Mexico

USGS Publications Warehouse

Waltemeyer, Scott D.

2006-01-01

Estimates of the magnitude and frequency of peak discharges are necessary for the reliable flood-hazard mapping in the Navajo Nation in Arizona, Utah, Colorado, and New Mexico. The Bureau of Indian Affairs, U.S. Army Corps of Engineers, and Navajo Nation requested that the U.S. Geological Survey update estimates of peak discharge magnitude for gaging stations in the region and update regional equations for estimation of peak discharge and frequency at ungaged sites. Equations were developed for estimating the magnitude of peak discharges for recurrence intervals of 2, 5, 10, 25, 50, 100, and 500 years at ungaged sites using data collected through 1999 at 146 gaging stations, an additional 13 years of peak-discharge data since a 1997 investigation, which used gaging-station data through 1986. The equations for estimation of peak discharges at ungaged sites were developed for flood regions 8, 11, high elevation, and 6 and are delineated on the basis of the hydrologic codes from the 1997 investigation. Peak discharges for selected recurrence intervals were determined at gaging stations by fitting observed data to a log-Pearson Type III distribution with adjustments for a low-discharge threshold and a zero skew coefficient. A low-discharge threshold was applied to frequency analysis of 82 of the 146 gaging stations. This application provides an improved fit of the log-Pearson Type III frequency distribution. Use of the low-discharge threshold generally eliminated the peak discharge having a recurrence interval of less than 1.4 years in the probability-density function. Within each region, logarithms of the peak discharges for selected recurrence intervals were related to logarithms of basin and climatic characteristics using stepwise ordinary least-squares regression techniques for exploratory data analysis. 
Generalized least-squares regression techniques, an improved regression procedure that accounts for time and spatial sampling errors, were then applied to the same data used in the ordinary least-squares regression analyses. The average standard error of prediction for a peak discharge having a recurrence interval of 100 years was 53 percent for region 8. The average standard error of prediction, which includes average sampling error and average standard error of regression, ranged from 45 to 83 percent for the 100-year flood. Estimated standard error of prediction for a hybrid method for region 11 was large in the 1997 investigation. No distinction of floods produced from a high-elevation region was presented in the 1997 investigation. Overall, the equations based on generalized least-squares regression techniques are considered to be more reliable than those in the 1997 report because of the increased length of record and improved GIS method. Flood-frequency relations can be transferred to an ungaged site by direct application of the regional regression equation or, for an ungaged site on a stream that has a gaging station upstream or downstream, by using the drainage-area ratio and the drainage-area exponent from the regional regression equation of the respective region.
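The drainage-area-ratio transfer mentioned at the end of this abstract can be illustrated with a short sketch. It assumes the commonly used form Q_u = Q_g × (A_u / A_g)^b, where b is the drainage-area exponent of the regional regression equation; the abstract does not state the formula, so the form, the function name `transfer_peak_discharge`, and all values below are assumptions for illustration.

```python
def transfer_peak_discharge(q_gaged, area_gaged, area_ungaged, exponent):
    """Scale a gaged-site peak discharge to an ungaged site on the same stream.

    Assumed form: Q_u = Q_g * (A_u / A_g) ** b, with b taken from the
    drainage-area term of the regional regression equation.
    """
    if q_gaged < 0 or area_gaged <= 0 or area_ungaged <= 0:
        raise ValueError("discharge must be non-negative and areas positive")
    return q_gaged * (area_ungaged / area_gaged) ** exponent


# Hypothetical values, not taken from the report.
q100_ungaged = transfer_peak_discharge(
    q_gaged=1000.0, area_gaged=100.0, area_ungaged=50.0, exponent=0.5
)
print(round(q100_ungaged, 1))
```

Any consistent pair of units works for the two areas, because only their ratio enters the calculation.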
Retrieval of total suspended matter concentrations from high resolution WorldView-2 imagery: a case study of inland rivers

NASA Astrophysics Data System (ADS)

Shi, Liangliang; Mao, Zhihua; Wang, Zheng

2018-02-01

Satellite imagery plays an important role in monitoring the water quality of lakes and coastal waters but has scarcely been applied to inland rivers. This paper assesses the feasibility of applying a regression model to quantify and map the concentration of total suspended matter (CTSM) in inland rivers that span a large spatial scale and have a high CTSM dynamic range, using high resolution WorldView-2 satellite remote sensing data. An empirical approach was developed to quantify CTSM through the integrated use of high resolution WorldView-2 multispectral data and 21 in situ CTSM measurements. Radiometric, geometric and atmospheric corrections were carried out during image processing to derive the surface reflectance, which was then related to CTSM using single-variable and multivariable regression techniques. The regression results show that the single near-infrared (NIR) band 8 of WorldView-2 has a relatively strong relationship (R² = 0.93) with CTSM. Different prediction models were developed from various combinations of WorldView-2 bands, and the Akaike Information Criterion was used to choose the best model. The model involving bands 1, 3, 5, and 8 of WorldView-2 performed best, with R² reaching 0.92 and an SEE of 53.30 g/m³. Spatial distribution maps were produced using the best multiple regression model. The results indicate that it is feasible to apply the empirical model with high resolution satellite imagery to retrieve the CTSM of inland rivers in routine water quality monitoring.
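Model choice by the Akaike Information Criterion can be sketched as follows. The abstract does not say which AIC variant was used; this sketch assumes the usual least-squares form AIC = n·ln(RSS/n) + 2k, and the candidate names, residual sums of squares and parameter counts are invented for illustration.

```python
import math


def aic_least_squares(rss, n_obs, n_params):
    """AIC for a least-squares fit with Gaussian errors, up to a constant."""
    return n_obs * math.log(rss / n_obs) + 2 * n_params


def select_by_aic(candidates, n_obs):
    """Return the name of the candidate with the lowest AIC.

    `candidates` maps a model name to (residual sum of squares, parameter count).
    """
    return min(
        candidates,
        key=lambda name: aic_least_squares(
            candidates[name][0], n_obs, candidates[name][1]
        ),
    )


# Invented fits to 21 observations: a lower RSS has to justify extra parameters.
candidates = {
    "one band": (10.0, 2),
    "four bands": (5.0, 5),
}
print(select_by_aic(candidates, n_obs=21))
```

With only 21 observations a small-sample correction (AICc) may be preferable; whether the study applied one is not stated.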
Implementing informative priors for heterogeneity in meta-analysis using meta-regression and pseudo data.

PubMed

Rhodes, Kirsty M; Turner, Rebecca M; White, Ian R; Jackson, Dan; Spiegelhalter, David J; Higgins, Julian P T

2016-12-20

Many meta-analyses combine results from only a small number of studies, a situation in which the between-study variance is imprecisely estimated when standard methods are applied. Bayesian meta-analysis allows incorporation of external evidence on heterogeneity, providing the potential for more robust inference on the effect size of interest. We present a method for performing Bayesian meta-analysis using data augmentation, in which we represent an informative conjugate prior for between-study variance by pseudo data and use meta-regression for estimation. To assist in this, we derive predictive inverse-gamma distributions for the between-study variance expected in future meta-analyses. These may serve as priors for heterogeneity in new meta-analyses. In a simulation study, we compare approximate Bayesian methods using meta-regression and pseudo data against fully Bayesian approaches based on importance sampling techniques and Markov chain Monte Carlo (MCMC). We compare the frequentist properties of these Bayesian methods with those of the commonly used frequentist DerSimonian and Laird procedure. The method is implemented in standard statistical software and provides a less complex alternative to standard MCMC approaches. An importance sampling approach produces almost identical results to standard MCMC approaches, and results obtained through meta-regression and pseudo data are very similar. On average, data augmentation provides closer results to MCMC, if implemented using restricted maximum likelihood estimation rather than DerSimonian and Laird or maximum likelihood estimation. The methods are applied to real datasets, and an extension to network meta-analysis is described. The proposed method facilitates Bayesian meta-analysis in a way that is accessible to applied researchers. © 2016 The Authors. Statistics in Medicine Published by John Wiley & Sons Ltd.
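For orientation, the frequentist DerSimonian and Laird comparator named in this abstract estimates the between-study variance by a method of moments. The sketch below covers only that comparator, not the authors' data-augmentation method; the function name and the example inputs are hypothetical.

```python
def dersimonian_laird_tau2(effects, variances):
    """Method-of-moments estimate of the between-study variance (tau squared).

    `effects` are study effect estimates and `variances` their within-study
    variances; the estimate is truncated at zero.
    """
    k = len(effects)
    if k < 2 or k != len(variances):
        raise ValueError("need at least two studies with matching variances")
    weights = [1.0 / v for v in variances]
    sum_w = sum(weights)
    # Fixed-effect (inverse-variance) pooled estimate.
    pooled = sum(w * y for w, y in zip(weights, effects)) / sum_w
    # Cochran's Q statistic.
    q = sum(w * (y - pooled) ** 2 for w, y in zip(weights, effects))
    scale = sum_w - sum(w * w for w in weights) / sum_w
    return max(0.0, (q - (k - 1)) / scale)


# Made-up example with two studies of equal precision.
print(dersimonian_laird_tau2([0.0, 1.0], [0.1, 0.1]))
```

With very few studies, Q rests on few degrees of freedom, which is the imprecision that motivates borrowing external evidence through an informative prior.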
Evaluation of methods for managing censored results when calculating the geometric mean.

PubMed

Mikkonen, Hannah G; Clarke, Bradley O; Dasika, Raghava; Wallis, Christian J; Reichman, Suzie M

2018-01-01

Currently, there are conflicting views on the best statistical methods for managing censored environmental data. The method commonly applied by environmental science researchers and professionals is to substitute half the limit of reporting for derivation of summary statistics. This approach has been criticised by some researchers, raising questions around the interpretation of historical scientific data. This study evaluated four complete soil datasets, at three levels of simulated censorship, to test the accuracy of a range of censored data management methods for calculation of the geometric mean. The methods assessed included removal of censored results, substitution of a fixed value (near zero, half the limit of reporting and the limit of reporting), substitution by nearest neighbour imputation, maximum likelihood estimation, regression on order substitution and Kaplan-Meier/survival analysis. This is the first time such a comprehensive range of censored data management methods has been applied to assess the accuracy of calculation of the geometric mean. The results of this study show that, for describing the geometric mean, the simple method of substitution of half the limit of reporting is comparable to or more accurate than alternative censored data management methods, including nearest neighbour imputation methods. Copyright © 2017 Elsevier Ltd. All rights reserved.
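The fixed-value substitution approach discussed above is simple enough to sketch directly. The example assumes censored results are recorded as `None` and share a single limit of reporting; the function name and sample values are hypothetical.

```python
import math


def geometric_mean_with_substitution(results, limit_of_reporting, fraction=0.5):
    """Geometric mean after replacing censored results (None) with
    fraction * limit_of_reporting."""
    substitute = fraction * limit_of_reporting
    if substitute <= 0:
        raise ValueError("substituted value must be positive")
    filled = [substitute if r is None else r for r in results]
    if not filled or any(v <= 0 for v in filled):
        raise ValueError("need at least one result, and all must be positive")
    # Geometric mean = exp(mean of the natural logarithms).
    return math.exp(sum(math.log(v) for v in filled) / len(filled))


# Made-up soil results with two values below a limit of reporting of 2.0.
sample = [1.0, 4.0, None, None]
print(geometric_mean_with_substitution(sample, limit_of_reporting=2.0))
```

Changing `fraction` to 1.0, or to a value near zero, gives the other fixed-value substitutions listed in the abstract, so their effect on the geometric mean can be compared on the same data.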
Regression Models for Identifying Noise Sources in Magnetic Resonance Images

PubMed Central

Zhu, Hongtu; Li, Yimei; Ibrahim, Joseph G.; Shi, Xiaoyan; An, Hongyu; Chen, Yashen; Gao, Wei; Lin, Weili; Rowe, Daniel B.; Peterson, Bradley S.

2009-01-01

Stochastic noise, susceptibility artifacts, magnetic field and radiofrequency inhomogeneities, and other noise components in magnetic resonance images (MRIs) can introduce serious bias into any measurements made with those images. We formally introduce three regression models including a Rician regression model and two associated normal models to characterize stochastic noise in various magnetic resonance imaging modalities, including diffusion-weighted imaging (DWI) and functional MRI (fMRI). Estimation algorithms are introduced to maximize the likelihood function of the three regression models. We also develop a diagnostic procedure for systematically exploring MR images to identify noise components other than simple stochastic noise, and to detect discrepancies between the fitted regression models and MRI data. The diagnostic procedure includes goodness-of-fit statistics, measures of influence, and tools for graphical display. The goodness-of-fit statistics can assess the key assumptions of the three regression models, whereas measures of influence can isolate outliers caused by certain noise components, including motion artifacts. The tools for graphical display permit graphical visualization of the values for the goodness-of-fit statistic and influence measures. Finally, we conduct simulation studies to evaluate performance of these methods, and we analyze a real dataset to illustrate how our diagnostic procedure localizes subtle image artifacts by detecting intravoxel variability that is not captured by the regression models. PMID:19890478
The Spatial Distribution of Hepatitis C Virus Infections and Associated Determinants--An Application of a Geographically Weighted Poisson Regression for Evidence-Based Screening Interventions in Hotspots.

PubMed

Kauhl, Boris; Heil, Jeanne; Hoebe, Christian J P A; Schweikart, Jürgen; Krafft, Thomas; Dukers-Muijrers, Nicole H T M

2015-01-01

Hepatitis C Virus (HCV) infections are a major cause of liver disease. A large proportion of these infections remain hidden to care due to their mostly asymptomatic nature. Population-based screening and screening targeted on behavioural risk groups have not proven to be effective in revealing these hidden infections. Therefore, more practically applicable approaches to target screenings are necessary. Geographic Information Systems (GIS) and spatial epidemiological methods may provide a more feasible basis for screening interventions through the identification of hotspots as well as demographic and socio-economic determinants. Analysed data included all HCV tests (n = 23,800) performed in the southern area of the Netherlands between 2002 and 2008. HCV positivity was defined as a positive immunoblot or polymerase chain reaction test. Population data were matched to the geocoded HCV test data. The spatial scan statistic was applied to detect areas with elevated HCV risk. We applied global regression models to determine associations between population-based determinants and HCV risk. Geographically weighted Poisson regression models were then constructed to determine local differences of the association between HCV risk and population-based determinants. HCV prevalence varied geographically and clustered in urban areas. The main populations at risk were middle-aged males, non-western immigrants and divorced persons. Socio-economic determinants consisted of one-person households, persons with low income and mean property value. However, the association between HCV risk and demographic as well as socio-economic determinants displayed strong regional and intra-urban differences. The detection of local hotspots in our study may serve as a basis for prioritization of areas for future targeted interventions.
Demographic and socio-economic determinants associated with HCV risk show regional differences underlining that a one-size-fits-all approach even within small geographic areas may not be appropriate. Future screening interventions need to consider the spatially varying association between HCV risk and associated demographic and socio-economic determinants.
Development of the Korean Adult Reading Test (KART) to estimate premorbid intelligence in dementia patients

PubMed Central

Seo, Eun Hyun; Han, Ji Young; Sohn, Bo Kyung; Byun, Min Soo; Lee, Jun Ho; Choe, Young Min; Ahn, Suzy; Woo, Jong Inn; Jun, Jongho; Lee, Dong Young

2017-01-01

We aimed to develop a word-reading test for Korean-speaking adults using irregularly pronounced words that would be useful for estimation of premorbid intelligence. A linguist who specialized in Korean phonology selected 94 words that have an irregular relationship between orthography and phonology. Sixty cognitively normal elderly (CN) and 31 patients with Alzheimer’s disease (AD) were asked to read the words aloud and were administered the Wechsler Adult Intelligence Scale, 4th edition, Korean version (K-WAIS-IV). Among the 94 words, 50 words that did not show a significant difference between the CN and the AD group were selected and constituted the KART. Using the 30 CN calculation group (CNc), a linear regression equation was obtained in which the observed full-scale IQ (FSIQ) was regressed on the reading errors of the KART, where education was included as an additional variable. When the regression equation computed from the CNc was applied to 30 CN individuals of the validation group (CNv), the predicted FSIQ adequately fit the observed FSIQ (R² = 0.63). In addition, an independent sample t-test showed that the KART-predicted IQs were not significantly different between the CNv and AD groups, whereas the performance of the AD group was significantly worse in the observed IQs. In addition, an extended validation of the KART was performed with a separate sample consisting of 84 CN, 56 elderly with mild cognitive impairment (MCI), and 43 AD patients who were administered comprehensive neuropsychological assessments in addition to the KART. When the equation obtained from the CNc was applied to the extended validation sample, the KART-predicted IQs of the AD, MCI and the CN groups did not significantly differ, whereas their current global cognition scores significantly differed between the groups. In conclusion, the results support the validity of KART-predicted IQ as an index of premorbid IQ in individuals with AD. PMID:28723964
The concept of psychological regression: metaphors, mapping, Queen Square, and Tavistock Square.

PubMed

Mercer, Jean

2011-05-01

The term "regression" refers to events in which an individual changes from his or her present level of maturity and regains mental and behavioral characteristics shown at an earlier point in development. This definition has remained constant for over a century, but the implications of the concept have changed systematically from a perspective in which regression was considered pathological, to a current view in which regression may be seen as a positive step in psychotherapy or as a part of normal development. The concept of regression, famously employed by Sigmund Freud and others in his circle, derived from ideas suggested by Herbert Spencer and by John Hughlings Jackson. By the 1940s and '50s, the regression concept was applied by Winnicott and others in treatment of disturbed children and in adult psychotherapy. In addition, behavioral regression came to be seen as a part of a normal developmental trajectory, with a focus on expectable variability. The present article examines historical changes in the regression concept in terms of mapping to biomedical or other metaphors, in terms of a movement from earlier nativism toward an increased environmentalism in psychology, and with respect to other historical factors such as wartime events. The role of dominant metaphors in shifting perspectives on regression is described.
Robust scaling laws for energy confinement time, including radiated fraction, in Tokamaks

NASA Astrophysics Data System (ADS)

Murari, A.; Peluso, E.; Gaudio, P.; Gelfusa, M.

2017-12-01

In recent years, the limitations of scalings in power-law form that are obtained from traditional log regression have become increasingly evident in many fields of research. Given the wide gap in operational space between present-day and next-generation devices, robustness of the obtained models in guaranteeing reasonable extrapolability is a major issue. In this paper, a new technique, called symbolic regression, is reviewed, refined, and applied to the ITPA database for extracting scaling laws of the energy-confinement time at different radiated fraction levels. The main advantage of this new methodology is its ability to determine the most appropriate mathematical form of the scaling laws to model the available databases without the restriction of their having to be power laws. In a completely new development, this technique is combined with the concept of geodesic distance on Gaussian manifolds so as to take into account the error bars in the measurements and provide more reliable models. Robust scaling laws, including radiated fractions as regressor, have been found; they are not in power-law form, and are significantly better than the traditional scalings. These scaling laws, including radiated fractions, extrapolate quite differently to ITER, and therefore they require serious consideration. On the other hand, given the limitations of the existing databases, dedicated experimental investigations will have to be carried out to fully understand the impact of radiated fractions on the confinement in metallic machines and in the next generation of devices.
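For readers unfamiliar with the "traditional log regression" that the abstract contrasts with symbolic regression, the following is a minimal sketch of that baseline only: a power law y = c * x^a is fitted by ordinary least squares after taking logarithms of both variables. The function name `fit_power_law` and the data are illustrative assumptions; the data are synthetic and noise-free, not taken from the ITPA database, and the sketch does not implement the symbolic regression or geodesic distance techniques of the paper.

```python
import math

def fit_power_law(x, y):
    """Fit y = c * x**a by ordinary least squares on (log x, log y).

    Hypothetical helper for illustration; requires positive x and y.
    """
    lx = [math.log(v) for v in x]
    ly = [math.log(v) for v in y]
    n = len(lx)
    mx = sum(lx) / n
    my = sum(ly) / n
    # Slope of the log-log line is the power-law exponent a.
    sxx = sum((u - mx) ** 2 for u in lx)
    sxy = sum((u - mx) * (v - my) for u, v in zip(lx, ly))
    a = sxy / sxx
    # Intercept of the log-log line is log(c).
    c = math.exp(my - a * mx)
    return c, a

# Synthetic, noise-free data generated from y = 2 * x**0.5 (illustrative only).
xs = [1.0, 2.0, 4.0, 8.0, 16.0]
ys = [2.0 * v ** 0.5 for v in xs]
c, a = fit_power_law(xs, ys)
print(f"c = {c:.3f}, a = {a:.3f}")
```

Because the fit is linear only in log space, it can represent nothing other than a power law, which is the restriction the abstract says symbolic regression removes.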
Healthcare Utilization and Expenditures for Persons with Diabetes Comorbid with Mental Illnesses.

PubMed

Su, Chen-Hsiang; Chiu, Herng-Chia; Hsieh, Hui-Min; Yen, Ju-Yu; Lee, Mei-Hsuan; Li, Chih-Yi; Chang, Kao-Ping; Huang, Chun-Jen

2016-09-01

The aim of this study was to investigate healthcare utilization and expenditure for patients with diabetes comorbid with and without mental illnesses in Taiwan. People with diabetes comorbid with and without mental illnesses in 2000 were identified and followed up to 2004 to explore the healthcare utilization and expenditure. Healthcare utilization included outpatient visits and use of hospital inpatient services, and expenditure included outpatient, inpatient and total medical expenditure. Generalized estimating equation models were used to explore the factors associated with outpatient visits and expenditure. To identify the factors associated with hospitalization, multiple logistic regressions were applied. From 2000 to 2004, the average number of annual outpatient visits ranged from 37.01 to 41.91 for the patients with mental illnesses, and from 28.83 to 31.79 for the patients without mental illnesses. The average annual total expenditure during this period ranged from NT$77,123 to NT$90,790 for patients with mental illnesses, and from NT$60,793 to NT$84,984 for those without mental illnesses. After controlling for covariates, the results indicated that gender, age, mental illness and time factor were associated with outpatient visits. Gender, age, and time factor were associated with total expenditure. Age and mental illness were associated with hospitalization in logistic regression. The healthcare utilization and expenditure for patients with mental illnesses was significantly higher than for patients without mental illnesses. The factors associated with healthcare utilization and expenditure included gender, age, mental illness and time trends.
Meta-analysis of haplotype-association studies: comparison of methods and empirical evaluation of the literature

PubMed Central

2011-01-01

Background: Meta-analysis is a popular methodology in several fields of medical research, including genetic association studies. However, the methods used for meta-analysis of association studies that report haplotypes have not been studied in detail. In this work, methods for performing meta-analysis of haplotype association studies are summarized, compared and presented in a unified framework along with an empirical evaluation of the literature. Results: We present multivariate methods that use summary-based data as well as methods that use binary and count data in a generalized linear mixed model framework (logistic regression, multinomial regression and Poisson regression). The methods presented here avoid the inflation of the type I error rate that could be the result of the traditional approach of comparing a haplotype against the remaining ones, and they can be fitted using standard software. Moreover, formal global tests are presented for assessing the statistical significance of the overall association. Although the methods presented here assume that the haplotypes are directly observed, they can be easily extended to allow for such uncertainty by weighting the haplotypes by their probability. Conclusions: An empirical evaluation of the published literature and a comparison against the meta-analyses that use single nucleotide polymorphisms suggest that the studies reporting meta-analysis of haplotypes contain approximately half of the included studies and produce significant results twice as often. We show that this excess of statistically significant results stems from the sub-optimal method of analysis used and, in approximately half of the cases, the statistical significance is refuted if the data are properly re-analyzed. Illustrative examples of code are given in Stata and it is anticipated that the methods developed in this work will be widely applied in the meta-analysis of haplotype association studies. PMID:21247440
Trainee-Associated Factors and Proficiency at Percutaneous Nephrolithotomy.

PubMed

Aghamir, Seyed Mohammad Kazem; Behtash, Negar; Hamidi, Morteza; Farahmand, Hasan; Salavati, Alborz; Mortaz Hejri, Sara

2017-07-01

Percutaneous nephrolithotomy (PNL) is a complicated procedure for urology trainees. This study was designed to investigate the effect of trainees' ages and previous experience, as well as the number of operated cases, on proficiency at PNL by using patient outcomes. A cross-sectional observational study was designed during a five-year period. Trainees in PNL fellowship programs were included. At the end of the program, the trainees' performance in PNL was assessed regarding five competencies and scored 1-5. If the overall score was 4 or above, the trainee was considered proficient. The trainees' age at the beginning of the program and the years passed since their residency graduation were asked and recorded. Also, the number of PNL cases operated by each trainee was obtained via their logbooks. The age, years passed since graduation, and number of operated cases were compared between the two groups of proficient and non-proficient trainees. Univariate and multivariate binary logistic regression analysis was applied to estimate the effect of the aforementioned variables on the occurrence of proficiency. Forty-two trainees were included in the study. The mean and standard deviation for the overall score were 3.40 (out of 5) and 0.67, respectively. Eleven trainees (26.2%) were recognized as proficient in performing PNL. Univariate regression analysis indicated that each of the three variables (age, years passed since graduation and number of operated cases) had a statistically significant effect on proficiency. However, the multivariate regression analysis revealed that only the number of cases had a significant effect on achieving proficiency. Although it might be assumed that trainees' age negatively correlates with their scores, in fact, it is their amount of practice that makes a difference. A trainee needs to operate on a certain number of cases in order to reach the desired competency in PNL.
[Stature estimation for Sichuan Han nationality female based on X-ray technology with measurement of lumbar vertebrae].

PubMed

Qing, Si-han; Chang, Yun-feng; Dong, Xiao-ai; Li, Yuan; Chen, Xiao-gang; Shu, Yong-kang; Deng, Zhen-hua

2013-10-01

To establish mathematical models of stature estimation for Sichuan Han females from X-ray measurements of the lumbar vertebrae, and to provide essential data for forensic anthropology research. The samples, 206 Sichuan Han females, were divided into three groups (A, B and C) according to age. Group A (206 samples) consisted of all ages, group B (116 samples) were 20-45 years old and 90 samples over 45 years old were group C. The lumbar vertebrae of all samples were examined through CR technology, recording the anterior border, posterior border and central heights of the five centrums (L1-L5) (x1-x15), the total central height of the lumbar spine (x16), and the real height of every sample. Linear regression analysis was performed using these parameters to establish the mathematical models of stature estimation. Sixty-two trained subjects were tested to verify the accuracy of the mathematical models. The established mathematical models were statistically significant by hypothesis test of the linear regression equations (P<0.05). The standard errors of the equations were 2.982-5.004 cm, while correlation coefficients were 0.370-0.779 and multiple correlation coefficients were 0.533-0.834. The return tests of the equations with the highest correlation coefficient and multiple correlation coefficient in each group showed that the highest accuracy was achieved by the multiple regression equation in group A, y = 100.33 + 1.489 x3 - 0.548 x6 + 0.772 x9 + 0.058 x12 + 0.645 x15, with 80.6% (±1 SE) and 100% (±2 SE). The mathematical models established in this study could be applied for stature estimation of Sichuan Han females.
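As a worked illustration, the group A multiple regression equation quoted in the abstract can be evaluated directly. This is a minimal sketch: the coefficients are copied from the abstract, but the function name `estimate_stature_cm` and the input values are hypothetical, and the abstract does not state which vertebral measurement each of x3, x6, x9, x12 and x15 denotes or the units of those measurements, so inputs must follow the conventions of the original study.

```python
def estimate_stature_cm(x3, x6, x9, x12, x15):
    """Evaluate the group A equation from the abstract:
    y = 100.33 + 1.489*x3 - 0.548*x6 + 0.772*x9 + 0.058*x12 + 0.645*x15

    The inputs are lumbar vertebra heights as defined in the original
    study; their units and exact definitions are not given in the abstract.
    """
    return (100.33
            + 1.489 * x3
            - 0.548 * x6
            + 0.772 * x9
            + 0.058 * x12
            + 0.645 * x15)

# Hypothetical input values chosen only to exercise the formula.
stature = estimate_stature_cm(25.0, 25.0, 25.0, 25.0, 25.0)
print(f"Estimated stature: {stature:.2f} cm")
```

The reported standard errors of 2.982-5.004 cm indicate the scale of uncertainty to attach to any such point estimate.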
[An investigation on job burnout of medical personnel in a top three hospital].

PubMed

Li, Y Y; Li, L P

2016-05-20

To investigate the job burnout status of medical personnel in a top three hospital, in order to provide basic data for intervention by the hospital management. A total of 549 doctors and nurses were assessed by the Maslach Burnout Inventory-Human Service Survey (MBI-HSS). The SPSS 19.0 software package was applied to data description and analysis, including univariate analysis and ordinal logistic regression analysis. The rates of high job burnout among doctors and nurses were 36.3% and 42.8%, respectively. Female subjects got higher scores (29.4±13.5) on emotional exhaustion than male subjects (26.2±12.8). Doctors got lower scores (28.2±15.9) on emotional exhaustion and higher scores (31.4±9.3) on personal accomplishment than nurses. Compared with subjects with higher professional titles, young subjects with primary professional titles got lower scores on personal accomplishment. Subjects with 11-20 years of working age got the highest scores on depersonalization. Among all the departments tested, medical personnel of the emergency department got the highest scores (31.9±12.6) on emotional exhaustion and the lowest scores (28.1±8.0) on personal accomplishment. According to the results of the ordinal logistic regression analysis, age, job type, professional qualifications and clinical department type entered the regression model. The physical and emotional resources of medical personnel are overdrawn, resulting in a fairly high degree of job burnout. Much more attention should be paid to the occupational mental health of nurses and of personnel who are young or hold junior professional titles. Positive measures should be provided, including management mechanisms, organizational culture, occupational protection and psychological intervention.
Whole-genome regression and prediction methods applied to plant and animal breeding.

PubMed

de Los Campos, Gustavo; Hickey, John M; Pong-Wong, Ricardo; Daetwyler, Hans D; Calus, Mario P L

2013-02-01

Genomic-enabled prediction is becoming increasingly important in animal and plant breeding and is also receiving attention in human genetics. Deriving accurate predictions of complex traits requires implementing whole-genome regression (WGR) models where phenotypes are regressed on thousands of markers concurrently. Methods exist that allow implementing these large-p with small-n regressions, and genome-enabled selection (GS) is being implemented in several plant and animal breeding programs. The list of available methods is long, and the relationships between them have not been fully addressed. In this article we provide an overview of available methods for implementing parametric WGR models, discuss selected topics that emerge in applications, and present a general discussion of lessons learned from simulation and empirical data analysis in the last decade.
Whole-Genome Regression and Prediction Methods Applied to Plant and Animal Breeding

PubMed Central

de los Campos, Gustavo; Hickey, John M.; Pong-Wong, Ricardo; Daetwyler, Hans D.; Calus, Mario P. L.

2013-01-01

Genomic-enabled prediction is becoming increasingly important in animal and plant breeding and is also receiving attention in human genetics. Deriving accurate predictions of complex traits requires implementing whole-genome regression (WGR) models where phenotypes are regressed on thousands of markers concurrently. Methods exist that allow implementing these large-p with small-n regressions, and genome-enabled selection (GS) is being implemented in several plant and animal breeding programs. The list of available methods is long, and the relationships between them have not been fully addressed. In this article we provide an overview of available methods for implementing parametric WGR models, discuss selected topics that emerge in applications, and present a general discussion of lessons learned from simulation and empirical data analysis in the last decade. PMID:22745228

Cuttability Assessment of Selected Rocks Through Different Brittleness Values

NASA Astrophysics Data System (ADS)

Dursun, Arif Emre; Gokay, M. Kemal

2016-04-01

Prediction of cuttability is a critical issue for successful execution of tunnel or mining excavation projects. Rock cuttability is also used to determine specific energy, which is defined as the work done by the cutting force to excavate a unit volume of yield. Specific energy is a meaningful inverse measure of cutting efficiency, since it simply states how much energy must be expended to excavate a unit volume of rock. Brittleness is a fundamental rock property and is applied in drilling and rock excavation. Brittleness is one of the most crucial rock features for rock excavation. For this reason, determination of relations between cuttability and brittleness will help rock engineers. This study aims to estimate the specific energy from different brittleness values of rocks by means of simple and multiple regression analyses. In this study, rock cutting, rock property, and brittleness index tests were carried out on 24 different rock samples with different strength values, including marble, travertine, and tuff, collected from sites around Konya Province, Turkey. Four previously used brittleness concepts were evaluated in this study, denoted as B1 (ratio of compressive to tensile strength), B2 (ratio of the difference between compressive and tensile strength to the sum of compressive and tensile strength), B3 (area under the stress-strain line in relation to compressive and tensile strength), and B9 = S20, the percentage of fines (<11.2 mm) formed in an impact test for the Norwegian University of Science and Technology (NTNU) model, as well as B9p (B9 as predicted from uniaxial compressive, Brazilian tensile, and point load strengths of rocks using multiple regression analysis). The results suggest that the proposed simple regression-based prediction models including B3, B9, and B9p outperform the other models including B1 and B2 and can be used for more accurate and reliable estimation of specific energy.
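The first two brittleness indices are defined in the abstract purely as ratios of compressive and tensile strength, so they can be sketched in a few lines. The function names and the strength values below are hypothetical and are not taken from the study's 24 rock samples; the remaining indices are omitted because the abstract does not give enough detail to compute them.

```python
def brittleness_b1(compressive, tensile):
    """B1: ratio of compressive to tensile strength."""
    return compressive / tensile

def brittleness_b2(compressive, tensile):
    """B2: (compressive - tensile) / (compressive + tensile)."""
    return (compressive - tensile) / (compressive + tensile)

# Hypothetical strengths in MPa; both indices are dimensionless.
ucs_mpa, bts_mpa = 100.0, 10.0
b1 = brittleness_b1(ucs_mpa, bts_mpa)
b2 = brittleness_b2(ucs_mpa, bts_mpa)
print(f"B1 = {b1:.2f}, B2 = {b2:.4f}")
```

Since B2 = (B1 - 1) / (B1 + 1), the two indices carry the same information about the strength ratio, with B2 confined to values below 1.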
The Mycotic Ulcer Treatment Trial

PubMed Central

Prajna, N. Venkatesh; Krishnan, Tiruvengada; Mascarenhas, Jeena; Rajaraman, Revathi; Prajna, Lalitha; Srinivasan, Muthiah; Raghavan, Anita; Oldenburg, Catherine E.; Ray, Kathryn J.; Zegans, Michael E.; McLeod, Stephen D.; Porco, Travis C.; Acharya, Nisha R.; Lietman, Thomas M.

2013-01-01

Objective: To compare topical natamycin vs voriconazole in the treatment of filamentous fungal keratitis. Methods: This phase 3, double-masked, multicenter trial was designed to randomize 368 patients to voriconazole (1%) or natamycin (5%), applied topically every hour while awake until reepithelialization, then 4 times daily for at least 3 weeks. Eligibility included smear-positive filamentous fungal ulcer and visual acuity of 20/40 to 20/400. Main Outcome Measures: The primary outcome was best spectacle-corrected visual acuity at 3 months; secondary outcomes included corneal perforation and/or therapeutic penetrating keratoplasty. Results: A total of 940 patients were screened and 323 were enrolled. Causative organisms included Fusarium (128 patients [40%]), Aspergillus (54 patients [17%]), and other filamentous fungi (141 patients [43%]). Natamycin-treated cases had significantly better 3-month best spectacle-corrected visual acuity than voriconazole-treated cases (regression coefficient=−0.18 logMAR; 95% CI, −0.30 to −0.05; P=.006). Natamycin-treated cases were less likely to have perforation or require therapeutic penetrating keratoplasty (odds ratio=0.42; 95% CI, 0.22 to 0.80; P=.009). Fusarium cases fared better with natamycin than with voriconazole (regression coefficient=−0.41 logMAR; 95% CI, −0.61 to −0.20; P<.001; odds ratio for perforation=0.06; 95% CI, 0.01 to 0.28; P<.001), while non-Fusarium cases fared similarly (regression coefficient=−0.02 logMAR; 95% CI, −0.17 to 0.13; P=.81; odds ratio for perforation=1.08; 95% CI, 0.48 to 2.43; P=.86). Conclusions: Natamycin treatment was associated with significantly better clinical and microbiological outcomes than voriconazole treatment for smear-positive filamentous fungal keratitis, with much of the difference attributable to improved results in Fusarium cases. Application to Clinical Practice: Voriconazole should not be used as monotherapy in filamentous keratitis.
Trial Registration: clinicaltrials.gov Identifier: NCT00996736. PMID:23710492
History of falls, gait, balance, and fall risks in older cancer survivors living in the community.

PubMed

Huang, Min H; Shilling, Tracy; Miller, Kara A; Smith, Kristin; LaVictoire, Kayle

2015-01-01

Older cancer survivors may be predisposed to falls because cancer-related sequelae affect virtually all body systems. The use of a history of falls, gait speed, and balance tests to assess fall risks remains to be investigated in this population. This study examined the relationship of previous falls, gait, and balance with falls in community-dwelling older cancer survivors. At baseline, demographics, health information, and the history of falls in the past year were obtained through interviewing. Participants performed tests including gait speed, Balance Evaluation Systems Test, and short-version of Activities-specific Balance Confidence scale. Falls were tracked by mailing of monthly reports for 6 months. A "faller" was a person with ≥1 fall during follow-up. Univariate analyses, including independent sample t-tests and Fisher's exact tests, compared baseline demographics, gait speed, and balance between fallers and non-fallers. For univariate analyses, Bonferroni correction was applied for multiple comparisons. Baseline variables with P<0.15 were included in a forward logistic regression model to identify factors predictive of falls with age as covariate. Sensitivity and specificity of each predictor of falls in the model were calculated. Significance level for the regression analysis was P<0.05. During follow-up, 59% of participants had one or more falls. Baseline demographics, health information, history of falls, gait speed, and balance tests did not differ significantly between fallers and non-fallers. Forward logistic regression revealed that a history of falls was a significant predictor of falls in the final model (odds ratio =6.81; 95% confidence interval =1.594-29.074) (P<0.05). Sensitivity and specificity for correctly identifying a faller using the positive history of falls were 74% and 69%, respectively.
Current findings suggested that for community-dwelling older cancer survivors with mixed diagnoses, asking about the history of falls may help detect individuals at risk of falling.
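The sensitivity, specificity, and odds ratio reported above are all functions of a 2x2 table that cross-classifies the predictor (history of falls) against the outcome (faller or non-faller). The sketch below shows those definitions only. The function name and the counts are hypothetical, since the abstract does not report the underlying table, and the odds ratio computed here is the unadjusted cross-product ratio, whereas the study's estimate came from a logistic regression model with age as a covariate.

```python
def screening_metrics(tp, fp, fn, tn):
    """Return (sensitivity, specificity, unadjusted odds ratio).

    tp: fallers with a positive fall history
    fn: fallers without a fall history
    fp: non-fallers with a positive fall history
    tn: non-fallers without a fall history
    """
    sensitivity = tp / (tp + fn)          # share of fallers flagged
    specificity = tn / (tn + fp)          # share of non-fallers cleared
    odds_ratio = (tp * tn) / (fp * fn)    # cross-product ratio
    return sensitivity, specificity, odds_ratio

# Hypothetical counts for illustration only (not the study's data).
sens, spec, odds = screening_metrics(tp=30, fp=8, fn=10, tn=32)
print(f"sensitivity = {sens:.2f}, specificity = {spec:.2f}, OR = {odds:.1f}")
```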
History of falls, gait, balance, and fall risks in older cancer survivors living in the community

PubMed Central

Huang, Min H; Shilling, Tracy; Miller, Kara A; Smith, Kristin; LaVictoire, Kayle

2015-01-01

Older cancer survivors may be predisposed to falls because cancer-related sequelae affect virtually all body systems. The use of a history of falls, gait speed, and balance tests to assess fall risks remains to be investigated in this population. This study examined the relationship of previous falls, gait, and balance with falls in community-dwelling older cancer survivors. At baseline, demographics, health information, and the history of falls in the past year were obtained through interviewing. Participants performed tests including gait speed, Balance Evaluation Systems Test, and short-version of Activities-specific Balance Confidence scale. Falls were tracked by mailing of monthly reports for 6 months. A “faller” was a person with ≥1 fall during follow-up. Univariate analyses, including independent sample t-tests and Fisher’s exact tests, compared baseline demographics, gait speed, and balance between fallers and non-fallers. For univariate analyses, Bonferroni correction was applied for multiple comparisons. Baseline variables with P<0.15 were included in a forward logistic regression model to identify factors predictive of falls with age as covariate. Sensitivity and specificity of each predictor of falls in the model were calculated. Significance level for the regression analysis was P<0.05. During follow-up, 59% of participants had one or more falls. Baseline demographics, health information, history of falls, gait speed, and balance tests did not differ significantly between fallers and non-fallers. Forward logistic regression revealed that a history of falls was a significant predictor of falls in the final model (odds ratio =6.81; 95% confidence interval =1.594–29.074) (P<0.05). Sensitivity and specificity for correctly identifying a faller using the positive history of falls were 74% and 69%, respectively.
Current findings suggested that for community-dwelling older cancer survivors with mixed diagnoses, asking about the history of falls may help detect individuals at risk of falling. PMID:26425079
A refined method for multivariate meta-analysis and meta-regression

PubMed Central

Jackson, Daniel; Riley, Richard D

2014-01-01

Making inferences about the average treatment effect using the random effects model for meta-analysis is problematic in the common situation where there is a small number of studies. This is because estimates of the between-study variance are not precise enough to accurately apply the conventional methods for testing and deriving a confidence interval for the average effect. We have found that a refined method for univariate meta-analysis, which applies a scaling factor to the estimated effects’ standard error, provides more accurate inference. We explain how to extend this method to the multivariate scenario and show that our proposal for refined multivariate meta-analysis and meta-regression can provide more accurate inferences than the more conventional approach. We explain how our proposed approach can be implemented using standard output from multivariate meta-analysis software packages and apply our methodology to two real examples. © 2013 The Authors. Statistics in Medicine published by John Wiley & Sons, Ltd. PMID:23996351
Direct Breakthrough Curve Prediction From Statistics of Heterogeneous Conductivity Fields

NASA Astrophysics Data System (ADS)

Hansen, Scott K.; Haslauer, Claus P.; Cirpka, Olaf A.; Vesselinov, Velimir V.

2018-01-01

This paper presents a methodology to predict the shape of solute breakthrough curves in heterogeneous aquifers at early times and/or under high degrees of heterogeneity, both cases in which the classical macrodispersion theory may not be applicable. The methodology relies on the observation that breakthrough curves in heterogeneous media are generally well described by lognormal distributions, and mean breakthrough times can be predicted analytically. The log-variance of solute arrival is thus sufficient to completely specify the breakthrough curves, and this is calibrated as a function of aquifer heterogeneity and dimensionless distance from a source plane by means of Monte Carlo analysis and statistical regression. Using the ensemble of simulated groundwater flow and solute transport realizations employed to calibrate the predictive regression, reliability estimates for the prediction are also developed. Additional theoretical contributions include heuristics for the time until an effective macrodispersion coefficient becomes applicable, and also an expression for its magnitude that applies in highly heterogeneous systems. It is seen that the results here represent a way to derive continuous time random walk transition distributions from physical considerations rather than from empirical field calibration.
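The statement that the log-variance of solute arrival completes the specification of a lognormal breakthrough curve, once the mean breakthrough time is known, can be sketched as follows. This is a minimal illustration under stated assumptions: "mean breakthrough time" is taken to be the arithmetic mean of arrival time and "log-variance" the variance of the natural logarithm of arrival time, so that the lognormal location parameter is mu = ln(mean) - variance/2. The function names and numerical values are hypothetical, and the paper's regression for the log-variance itself is not reproduced.

```python
import math

def lognormal_params(mean_time, log_variance):
    """Return (mu, sigma) of ln(t) from the mean arrival time and log-variance."""
    mu = math.log(mean_time) - 0.5 * log_variance
    return mu, math.sqrt(log_variance)

def breakthrough_pdf(t, mean_time, log_variance):
    """Lognormal arrival-time density at time t > 0."""
    mu, sigma = lognormal_params(mean_time, log_variance)
    z = (math.log(t) - mu) / sigma
    return math.exp(-0.5 * z * z) / (t * sigma * math.sqrt(2.0 * math.pi))

def breakthrough_cdf(t, mean_time, log_variance):
    """Fraction of solute mass that has arrived by time t > 0."""
    mu, sigma = lognormal_params(mean_time, log_variance)
    z = (math.log(t) - mu) / sigma
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

# Hypothetical values: mean arrival time 10 (arbitrary units), log-variance 0.25.
mean_time, log_var = 10.0, 0.25
mu, sigma = lognormal_params(mean_time, log_var)
median_time = math.exp(mu)
print(f"median arrival time = {median_time:.3f}")
print(f"CDF at mean time    = {breakthrough_cdf(mean_time, mean_time, log_var):.3f}")
```

Under these assumptions the median arrival time is always earlier than the mean, and the gap widens as the log-variance grows, which matches the long tails typical of breakthrough curves in heterogeneous media.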
Imaging genetics approach to predict progression of Parkinson's diseases.

PubMed

Mansu Kim; Seong-Jin Son; Hyunjin Park

2017-07-01

Imaging genetics is a tool to extract genetic variants associated with both clinical phenotypes and imaging information. The approach can extract additional genetic variants compared to conventional approaches to better investigate various diseased conditions. Here, we applied imaging genetics to study Parkinson's disease (PD). We aimed to extract significant features derived from imaging genetics and neuroimaging. We built a regression model based on extracted significant features combining genetics and neuroimaging to better predict clinical scores of PD progression (i.e. MDS-UPDRS). Our model yielded high correlation (r = 0.697, p < 0.001) and low root mean squared error (8.36) between predicted and actual MDS-UPDRS scores. Neuroimaging (from 123I-Ioflupane SPECT) predictors of the regression model were computed using an independent component analysis approach. Genetic features were computed using an imaging genetics approach based on identified neuroimaging features as intermediate phenotypes. Joint modeling of neuroimaging and genetics could provide complementary information and thus have the potential to provide further insight into the pathophysiology of PD. Our model included newly found neuroimaging features and genetic variants which need further investigation.
Online EEG artifact removal for BCI applications by adaptive spatial filtering.

PubMed

Guarnieri, Roberto; Marino, Marco; Barban, Federico; Ganzetti, Marco; Mantini, Dante

2018-06-28

The performance of brain computer interfaces (BCIs) based on electroencephalography (EEG) data strongly depends on the effective attenuation of artifacts that are mixed in the recordings. To address this problem, we have developed a novel online EEG artifact removal method for BCI applications, which combines blind source separation (BSS) and regression (REG) analysis. The BSS-REG method relies on the availability of a calibration dataset of limited duration for the initialization of a spatial filter using BSS. Online artifact removal is implemented by dynamically adjusting the spatial filter in the actual experiment, based on a linear regression technique. Our results showed that the BSS-REG method is capable of attenuating different kinds of artifacts, including ocular and muscular, while preserving true neural activity. Thanks to its low computational requirements, BSS-REG can be applied to low-density as well as high-density EEG data. We argue that BSS-REG may enable the development of novel BCI applications requiring high-density recordings, such as source-based neurofeedback and closed-loop neuromodulation. © 2018 IOP Publishing Ltd.
Is the Health of the Nation Outcome Scales appropriate for the assessment of symptom severity in patients with substance-related disorders?

PubMed

Andreas, Sylke; Harries-Hedder, Karin; Schwenk, Wolfgang; Hausberg, Maria; Koch, Uwe; Schulz, Holger

2010-07-01

The Health of the Nation Outcome Scales (HoNOS) is an internationally established clinician-rated instrument. The aim of the study was to assess the psychometric properties in inpatients with substance-related disorders. The HoNOS was applied in a multicenter, consecutive sample of 417 inpatients. Interrater reliability coefficients, confirmatory factor analysis, and regression tree analyses were calculated to assess the reliability and validity of the HoNOS. The factor validity of the HoNOS and its total score could not be confirmed. After training, all items of the HoNOS revealed sufficient values of interrater reliabilities. As the results of the regression tree analyses showed, the single items of the HoNOS were among the most important predictors of service utilization. The HoNOS can be recommended for obtaining detailed ratings of the problems of inpatients with substance-related disorders as a clinical application in routine mental health care at present. Further studies should include comparisons of HoNOS and Addiction Severity Index. Copyright 2010 Elsevier Inc. All rights reserved.
Double-time correlation functions of two quantum operations in open systems

NASA Astrophysics Data System (ADS)

Ban, Masashi

2017-10-01

A double-time correlation function of arbitrary two quantum operations is studied for a nonstationary open quantum system which is in contact with a thermal reservoir. It includes a usual correlation function, a linear response function, and a weak value of an observable. Time evolution of the correlation function can be derived by means of the time-convolution and time-convolutionless projection operator techniques. For this purpose, a quasidensity operator accompanied by a fictitious field is introduced, which makes it possible to derive explicit formulas for calculating a double-time correlation function in the second-order approximation with respect to a system-reservoir interaction. The derived formula explicitly shows that the quantum regression theorem for calculating the double-time correlation function cannot be used if a thermal reservoir has a finite correlation time. Furthermore, the formula is applied for a pure dephasing process and a linear dissipative process. The quantum regression theorem and the Leggett-Garg inequality are investigated for an open two-level system. The results are compared with those obtained by exact calculation to examine whether the formula is a good approximation.
Peak-flow characteristics of Wyoming streams

USGS Publications Warehouse

Miller, Kirk A.

2003-01-01

Peak-flow characteristics for unregulated streams in Wyoming are described in this report. Frequency relations for annual peak flows through water year 2000 at 364 streamflow-gaging stations in and near Wyoming were evaluated and revised or updated as needed. Analyses of historical floods, temporal trends, and generalized skew were included in the evaluation. Physical and climatic basin characteristics were determined for each gaging station using a geographic information system. Gaging stations with similar peak-flow and basin characteristics were grouped into six hydrologic regions. Regional statistical relations between peak-flow and basin characteristics were explored using multiple-regression techniques. Generalized least squares regression equations for estimating magnitudes of annual peak flows with selected recurrence intervals from 1.5 to 500 years were developed for each region. Average standard errors of estimate range from 34 to 131 percent. Average standard errors of prediction range from 35 to 135 percent. Several statistics for evaluating and comparing the errors in these estimates are described. Limitations of the equations are described. Methods for applying the regional equations for various circumstances are listed and examples are given.
[Domestic violence during pregnancy and its relationship with birth weight].

PubMed

Valdez-Santiago, R; Sanín-Aguirre, L H

1996-01-01

To determine the prevalence of domestic violence during pregnancy and its impact on birth weight and the immediate post-partum period. We conducted a survey of 110 pregnant women who delivered at the Hospital Civil in Cuernavaca, Morelos. The questionnaire was applied by specialized personnel. We used multiple linear regression to adjust for differences between birth weight means and multiple logistic regression for complications. In our study, women who suffered violence during pregnancy had three times more complications during delivery (CI 95% 1.3-7.9). The difference in birth weight of newborns of these women was 560 g less (p < 0.01 adjusted by age and parity) in comparison with women who did not undergo violence during pregnancy. Women who suffered violence during pregnancy had a four times greater risk for having low birth weight babies (CI 95% 1.3-12.3) than the non-battered women. We propose more research be done on this topic, including studies of other population groups. Also, health personnel should be educated that violence towards women could constitute a reproductive risk.
Technical note: Fu-Liou-Gu and Corti-Peter model performance evaluation for radiative retrievals from cirrus clouds

NASA Astrophysics Data System (ADS)

Lolli, Simone; Campbell, James R.; Lewis, Jasper R.; Gu, Yu; Welton, Ellsworth J.

2017-06-01

We compare, for the first time, the performance of a simplified atmospheric radiative transfer algorithm package, the Corti-Peter (CP) model, versus the more complex Fu-Liou-Gu (FLG) model, for resolving top-of-the-atmosphere radiative forcing characteristics from single-layer cirrus clouds obtained from the NASA Micro-Pulse Lidar Network database in 2010 and 2011 at Singapore and in Greenbelt, Maryland, USA, in 2012. Specifically, CP simplifies calculation of both clear-sky longwave and shortwave radiation through regression analysis applied to radiative calculations, which contributes significantly to differences between the two. The results of the intercomparison show that differences in annual net top-of-the-atmosphere (TOA) cloud radiative forcing can reach 65 %. This is particularly true when land surface temperatures are warmer than 288 K, where the CP regression analysis becomes less accurate. CP proves useful for first-order estimates of TOA cirrus cloud forcing, but may not be suitable for quantitative accuracy, including the absolute sign of cirrus cloud daytime TOA forcing that can readily oscillate around zero globally.
VARIABLE SELECTION FOR REGRESSION MODELS WITH MISSING DATA

PubMed Central

Garcia, Ramon I.; Ibrahim, Joseph G.; Zhu, Hongtu

2009-01-01

We consider the variable selection problem for a class of statistical models with missing data, including missing covariate and/or response data. We investigate the smoothly clipped absolute deviation penalty (SCAD) and adaptive LASSO and propose a unified model selection and estimation procedure for use in the presence of missing data. We develop a computationally attractive algorithm for simultaneously optimizing the penalized likelihood function and estimating the penalty parameters. Particularly, we propose to use a model selection criterion, called the ICQ statistic, for selecting the penalty parameters. We show that the variable selection procedure based on ICQ automatically and consistently selects the important covariates and leads to efficient estimates with oracle properties. The methodology is very general and can be applied to numerous situations involving missing data, from covariates missing at random in arbitrary regression models to nonignorably missing longitudinal responses and/or covariates. Simulations are given to demonstrate the methodology and examine the finite sample performance of the variable selection procedures. Melanoma data from a cancer clinical trial is presented to illustrate the proposed methodology. PMID:20336190
Cost-of-illness studies based on massive data: a prevalence-based, top-down regression approach.

PubMed

Stollenwerk, Björn; Welchowski, Thomas; Vogl, Matthias; Stock, Stephanie

2016-04-01

Despite the increasing availability of routine data, no analysis method has yet been presented for cost-of-illness (COI) studies based on massive data. We aim, first, to present such a method and, second, to assess the relevance of the associated gain in numerical efficiency. We propose a prevalence-based, top-down regression approach consisting of five steps: aggregating the data; fitting a generalized additive model (GAM); predicting costs via the fitted GAM; comparing predicted costs between prevalent and non-prevalent subjects; and quantifying the stochastic uncertainty via error propagation. To demonstrate the method, it was applied to aggregated data in the context of chronic lung disease to German sickness funds data (from 1999), covering over 7.3 million insured. To assess the gain in numerical efficiency, the computational time of the innovative approach has been compared with corresponding GAMs applied to simulated individual-level data. Furthermore, the probability of model failure was modeled via logistic regression. Applying the innovative method was reasonably fast (19 min). In contrast, regarding patient-level data, computational time increased disproportionately by sample size. Furthermore, using patient-level data was accompanied by a substantial risk of model failure (about 80 % for 6 million subjects). The gain in computational efficiency of the innovative COI method seems to be of practical relevance. Furthermore, it may yield more precise cost estimates.
Spectral regression and correlation coefficients of some benzaldimines and salicylaldimines in different solvents

NASA Astrophysics Data System (ADS)

Hammud, Hassan H.; Ghannoum, Amer; Masoud, Mamdouh S.

2006-02-01

Sixteen Schiff bases obtained from the condensation of benzaldehyde or salicylaldehyde with various amines (aniline, 4-carboxyaniline, phenylhydrazine, 2,4-dinitrophenylhydrazine, ethylenediamine, hydrazine, o-phenylenediamine and 2,6-pyridinediamine) are studied with UV-vis spectroscopy to observe the effect of solvents, substituents and other structural factors on the spectra. The bands involving different electronic transitions are interpreted. Computerized analysis and multiple regression techniques were applied to calculate the regression and correlation coefficients based on the equation that relates peak position λmax to the solvent parameters that depend on the H-bonding ability, refractive index and dielectric constant of solvents.
Application of XGBoost algorithm in hourly PM2.5 concentration prediction

NASA Astrophysics Data System (ADS)

Pan, Bingyue

2018-02-01

In view of prediction techniques of hourly PM2.5 concentration in China, this paper applied the XGBoost (Extreme Gradient Boosting) algorithm to predict hourly PM2.5 concentration. The monitoring data of air quality in Tianjin city was analyzed by using the XGBoost algorithm. The prediction performance of the XGBoost method is evaluated by comparing observed and predicted PM2.5 concentration using three measures of forecast accuracy. The XGBoost method is also compared with the random forest algorithm, multiple linear regression, decision tree regression and support vector machines for regression models using computational results. The results demonstrate that the XGBoost algorithm outperforms other data mining methods.
Quantile regression in the presence of monotone missingness with sensitivity analysis

PubMed Central

Liu, Minzhao; Daniels, Michael J.; Perri, Michael G.

2016-01-01

In this paper, we develop methods for longitudinal quantile regression when there is monotone missingness. In particular, we propose pattern mixture models with a constraint that provides a straightforward interpretation of the marginal quantile regression parameters. Our approach allows sensitivity analysis which is an essential component in inference for incomplete data. To facilitate computation of the likelihood, we propose a novel way to obtain analytic forms for the required integrals. We conduct simulations to examine the robustness of our approach to modeling assumptions and compare its performance to competing approaches. The model is applied to data from a recent clinical trial on weight management. PMID:26041008
Random Regression Models Using Legendre Polynomials to Estimate Genetic Parameters for Test-day Milk Protein Yields in Iranian Holstein Dairy Cattle.

PubMed

Naserkheil, Masoumeh; Miraie-Ashtiani, Seyed Reza; Nejati-Javaremi, Ardeshir; Son, Jihyun; Lee, Deukhwan

2016-12-01

The objective of this study was to estimate the genetic parameters of milk protein yields in Iranian Holstein dairy cattle. A total of 1,112,082 test-day milk protein yield records of 167,269 first lactation Holstein cows, calved from 1990 to 2010, were analyzed. Estimates of the variance components, heritability, and genetic correlations for milk protein yields were obtained using a random regression test-day model. Milking times, herd, age of recording, year, and month of recording were included as fixed effects in the model. Additive genetic and permanent environmental random effects for the lactation curve were taken into account by applying orthogonal Legendre polynomials of the fourth order in the model. The lowest and highest additive genetic variances were estimated at the beginning and end of lactation, respectively. Permanent environmental variance was higher at both extremes. Residual variance was lowest at the middle of the lactation and contrarily, heritability increased during this period. Maximum heritability was found during the 12th lactation stage (0.213±0.007). Genetic, permanent, and phenotypic correlations among test-days decreased as the interval between consecutive test-days increased. A relatively large data set was used in this study; therefore, the estimated (co)variance components for random regression coefficients could be used for national genetic evaluation of dairy cattle in Iran.
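As a rough illustration of the Legendre covariates used in random regression test-day models like the one above, the sketch below maps a day in milk onto [-1, 1] and evaluates the polynomials up to fourth order with the standard Bonnet recurrence. The function name and the 5 to 305 day range are assumptions for the example, and the polynomials are left unnormalized; the abstract does not state which range or normalization the authors used.

```python
def legendre_covariates(t, t_min, t_max, order=4):
    """Return [P_0(x), ..., P_order(x)] for day t rescaled to x in [-1, 1]."""
    # Standardize the time variable to the interval on which the
    # Legendre polynomials are orthogonal.
    x = 2.0 * (t - t_min) / (t_max - t_min) - 1.0
    values = [1.0, x]  # P_0 and P_1
    for n in range(1, order):
        # Bonnet recurrence: (n+1) P_{n+1}(x) = (2n+1) x P_n(x) - n P_{n-1}(x)
        values.append(((2 * n + 1) * x * values[n] - n * values[n - 1]) / (n + 1))
    return values[: order + 1]

# Hypothetical lactation range of 5 to 305 days in milk (an assumption).
row_mid = legendre_covariates(155, 5, 305)   # midpoint of the range, x = 0
row_end = legendre_covariates(305, 5, 305)   # end of the range, x = 1
print(row_mid)
print(row_end)
```

Each test-day record would contribute one such row of covariates, with the random regression coefficients of the animal multiplying the entries of the row.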
Random Regression Models Using Legendre Polynomials to Estimate Genetic Parameters for Test-day Milk Protein Yields in Iranian Holstein Dairy Cattle

PubMed Central

Naserkheil, Masoumeh; Miraie-Ashtiani, Seyed Reza; Nejati-Javaremi, Ardeshir; Son, Jihyun; Lee, Deukhwan

2016-01-01

The objective of this study was to estimate the genetic parameters of milk protein yields in Iranian Holstein dairy cattle. A total of 1,112,082 test-day milk protein yield records of 167,269 first lactation Holstein cows, calved from 1990 to 2010, were analyzed. Estimates of the variance components, heritability, and genetic correlations for milk protein yields were obtained using a random regression test-day model. Milking times, herd, age of recording, year, and month of recording were included as fixed effects in the model. Additive genetic and permanent environmental random effects for the lactation curve were taken into account by applying orthogonal Legendre polynomials of the fourth order in the model. The lowest and highest additive genetic variances were estimated at the beginning and end of lactation, respectively. Permanent environmental variance was higher at both extremes. Residual variance was lowest at the middle of the lactation and contrarily, heritability increased during this period. Maximum heritability was found during the 12th lactation stage (0.213±0.007). Genetic, permanent, and phenotypic correlations among test-days decreased as the interval between consecutive test-days increased. A relatively large data set was used in this study; therefore, the estimated (co)variance components for random regression coefficients could be used for national genetic evaluation of dairy cattle in Iran. PMID:26954192

Moving beyond regression techniques in cardiovascular risk prediction: applying machine learning to address analytic challenges.

PubMed

Goldstein, Benjamin A; Navar, Ann Marie; Carter, Rickey E

2017-06-14

Risk prediction plays an important role in clinical cardiology research. Traditionally, most risk models have been based on regression models. While useful and robust, these statistical methods are limited to using a small number of predictors which operate in the same way on everyone, and uniformly throughout their range. The purpose of this review is to illustrate the use of machine-learning methods for development of risk prediction models. Typically presented as black box approaches, most machine-learning methods are aimed at solving particular challenges that arise in data analysis that are not well addressed by typical regression approaches. To illustrate these challenges, as well as how different methods can address them, we consider trying to predict mortality after diagnosis of acute myocardial infarction. We use data derived from our institution's electronic health record and abstract data on 13 regularly measured laboratory markers. We walk through different challenges that arise in modelling these data and then introduce different machine-learning approaches. Finally, we discuss general issues in the application of machine-learning methods including tuning parameters, loss functions, variable importance, and missing data. Overall, this review serves as an introduction for those working on risk modelling to approach the diffuse field of machine learning. © The Author 2016. Published by Oxford University Press on behalf of the European Society of Cardiology.
Advances in simultaneous atmospheric profile and cloud parameter regression based retrieval from high-spectral resolution radiance measurements

NASA Astrophysics Data System (ADS)

Weisz, Elisabeth; Smith, William L.; Smith, Nadia

2013-06-01

The dual-regression (DR) method retrieves information about the Earth surface and vertical atmospheric conditions from measurements made by any high-spectral resolution infrared sounder in space. The retrieved information includes temperature and atmospheric gases (such as water vapor, ozone, and carbon species) as well as surface and cloud top parameters. The algorithm was designed to produce a high-quality product with low latency and has been demonstrated to yield accurate results in real-time environments. The speed of the retrieval is achieved through linear regression, while accuracy is achieved through a series of classification schemes and decision-making steps. These steps are necessary to account for the nonlinearity of hyperspectral retrievals. In this work, we detail the key steps that have been developed in the DR method to advance accuracy in the retrieval of nonlinear parameters, specifically cloud top pressure. The steps and their impact on retrieval results are discussed in-depth and illustrated through relevant case studies. In addition to discussing and demonstrating advances made in addressing nonlinearity in a linear geophysical retrieval method, advances toward multi-instrument geophysical analysis by applying the DR to three different operational sounders in polar orbit are also noted. For any area on the globe, the DR method achieves consistent accuracy and precision, making it potentially very valuable to both the meteorological and environmental user communities.
Effects of integration time on in-water radiometric profiles.

PubMed

D'Alimonte, Davide; Zibordi, Giuseppe; Kajiyama, Tamito

2018-03-05

This work investigates the effects of integration time on in-water downward irradiance E_d, upward irradiance E_u and upwelling radiance L_u profile data acquired with free-fall hyperspectral systems. Analyzed quantities are the subsurface value and the diffuse attenuation coefficient derived by applying linear and non-linear regression schemes. Case studies include oligotrophic waters (Case-1), as well as waters dominated by Colored Dissolved Organic Matter (CDOM) and Non-Algal Particles (NAP). Assuming a 24-bit digitization, measurements resulting from the accumulation of photons over integration times varying between 8 and 2048 ms are evaluated at depths corresponding to: 1) the beginning of each integration interval (Fst); 2) the end of each integration interval (Lst); 3) the averages of Fst and Lst values (Avg); and finally 4) the values weighted accounting for the diffuse attenuation coefficient of water (Wgt). Statistical figures show that the effects of integration time can bias results well above 5% as a function of the depth definition. Results indicate the validity of the Wgt depth definition and the fair applicability of the Avg one. Instead, both the Fst and Lst depths should not be adopted since they may introduce pronounced biases in E_u and L_u regression products for highly absorbing waters. Finally, the study reconfirms the relevance of combining multiple radiometric casts into a single profile to increase precision of regression products.
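A minimal sketch of how a linear regression scheme can yield the two regression products named above, assuming the usual exponential decay of the radiometric quantity with depth, E(z) = E(0-) exp(-K z), so that ln E(z) is linear in z. The function name and the synthetic, noise-free profile are assumptions for illustration; the abstract does not give the authors' exact schemes.

```python
import math

def fit_log_linear_profile(depths_m, values):
    """Fit ln E(z) = ln E(0-) - K * z by ordinary least squares.

    Returns (subsurface value E(0-), diffuse attenuation coefficient K).
    """
    y = [math.log(v) for v in values]
    n = len(depths_m)
    mz = sum(depths_m) / n
    my = sum(y) / n
    szz = sum((z - mz) ** 2 for z in depths_m)
    szy = sum((z - mz) * (v - my) for z, v in zip(depths_m, y))
    slope = szy / szz
    intercept = my - slope * mz
    # The intercept is ln E(0-); the slope is -K.
    return math.exp(intercept), -slope

# Synthetic profile with E(0-) = 100 and K = 0.2 per metre (made-up values).
depths = [1.0, 2.0, 3.0, 4.0, 5.0]
profile = [100.0 * math.exp(-0.2 * z) for z in depths]

e0, k = fit_log_linear_profile(depths, profile)
print(f"E(0-) = {e0:.3f}, K = {k:.3f} 1/m")
```

In this picture, the depth assigned to each measurement (Fst, Lst, Avg or Wgt) changes the z values entering the fit, which is how the depth definition can bias both the slope and the extrapolated subsurface value.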
Logistic Regression with Multiple Random Effects: A Simulation Study of Estimation Methods and Statistical Packages.

PubMed

Kim, Yoonsang; Choi, Young-Ku; Emery, Sherry

2013-08-01

Several statistical packages are capable of estimating generalized linear mixed models and these packages provide one or more of three estimation methods: penalized quasi-likelihood, Laplace, and Gauss-Hermite. Many studies have investigated these methods' performance for the mixed-effects logistic regression model. However, the authors focused on models with one or two random effects and assumed a simple covariance structure between them, which may not be realistic. When there are multiple correlated random effects in a model, the computation becomes intensive, and often an algorithm fails to converge. Moreover, in our analysis of smoking status and exposure to anti-tobacco advertisements, we have observed that when a model included multiple random effects, parameter estimates varied considerably from one statistical package to another even when using the same estimation method. This article presents a comprehensive review of the advantages and disadvantages of each estimation method. In addition, we compare the performances of the three methods across statistical packages via simulation, which involves two- and three-level logistic regression models with at least three correlated random effects. We apply our findings to a real dataset. Our results suggest that two packages-SAS GLIMMIX Laplace and SuperMix Gaussian quadrature-perform well in terms of accuracy, precision, convergence rates, and computing speed. We also discuss the strengths and weaknesses of the two packages in regard to sample sizes.
Logistic Regression with Multiple Random Effects: A Simulation Study of Estimation Methods and Statistical Packages

PubMed Central

Kim, Yoonsang; Emery, Sherry

2013-01-01

Several statistical packages are capable of estimating generalized linear mixed models and these packages provide one or more of three estimation methods: penalized quasi-likelihood, Laplace, and Gauss-Hermite. Many studies have investigated these methods’ performance for the mixed-effects logistic regression model. However, the authors focused on models with one or two random effects and assumed a simple covariance structure between them, which may not be realistic. When there are multiple correlated random effects in a model, the computation becomes intensive, and often an algorithm fails to converge. Moreover, in our analysis of smoking status and exposure to anti-tobacco advertisements, we have observed that when a model included multiple random effects, parameter estimates varied considerably from one statistical package to another even when using the same estimation method. This article presents a comprehensive review of the advantages and disadvantages of each estimation method. In addition, we compare the performances of the three methods across statistical packages via simulation, which involves two- and three-level logistic regression models with at least three correlated random effects. We apply our findings to a real dataset. Our results suggest that two packages—SAS GLIMMIX Laplace and SuperMix Gaussian quadrature—perform well in terms of accuracy, precision, convergence rates, and computing speed. We also discuss the strengths and weaknesses of the two packages in regard to sample sizes. PMID:24288415
Early warnings for suicide attempt among Chinese rural population.

PubMed

Lyu, Juncheng; Wang, Yingying; Shi, Hong; Zhang, Jie

2018-06-05

This study aimed to explore the main influencing factors of attempted suicide and establish an early warning model, so as to put forward prevention strategies for attempted suicide. Data came from a large-scale case-control epidemiological survey. A sample of 659 serious suicide attempters was randomly recruited from 13 rural counties in China. Each case was matched by a community control for gender, age, and residence location. Face to face interviews were conducted for all the cases and controls with the same structured questionnaire. Univariate logistic regression was applied to screen the factors and multivariate logistic regression was used to identify the predictors. There were no statistical differences between suicide attempters and the community controls in gender, age, and residence location. The Cronbach's coefficients for all the scales used were above 0.675. The multivariate logistic regressions revealed 12 statistically significant variables predicting attempted suicide, including less education, family history of suicide, poor health, mental problem, aspiration strain, hopelessness, impulsivity, depression, negative life events. On the other hand, social support, coping skills, and healthy community protected the rural residents from suicide attempt. The identified warning predictors have significant clinical meaning for clinical psychiatrists. Crisis intervention strategies in rural China should be informed by the findings from this research. Education, social support, healthy community, and strain reduction are all measures to decrease the likelihood of crises. Copyright © 2018. Published by Elsevier B.V.
Ceiling Fires Studied to Simulate Low-Gravity Fires

NASA Technical Reports Server (NTRS)

Olson, Sandra L.

2001-01-01

A unique new way to study low-gravity flames in normal gravity has been developed. To study flame structure and extinction characteristics in low-stretch environments, a normal gravity low-stretch diffusion flame was generated using a cylindrical PMMA sample of varying large radii, as shown in the photograph. These experiments have demonstrated that low-gravity flame characteristics can be generated in normal gravity through the proper use of scaling. On the basis of this work, it is feasible to apply this concept toward the development of an Earth-bound method of evaluating material flammability in various gravitational environments from normal gravity to microgravity, including the effects of partial gravity low-stretch rates such as those found on the Moon (1/6g) or Mars (1/3g). During these experiments, the surface regression rates for PMMA were measured for the first time over the full range of flammability in air, from blowoff at high stretch, to quenching at low stretch, as plotted in the graph. The solid line drawn through the central portion of the data (3
Comparison of exact, efron and breslow parameter approach method on hazard ratio and stratified cox regression model

NASA Astrophysics Data System (ADS)

Fatekurohman, Mohamat; Nurmala, Nita; Anggraeni, Dian

2018-04-01

The lungs are the most important organ of the respiratory system. Lung disorders are various, e.g. pneumonia, emphysema, tuberculosis and lung cancer. Of these, lung cancer is the most harmful. Accordingly, this research applies survival analysis to the factors affecting the survival of lung cancer patients, comparing the exact, Efron and Breslow parameter approach methods on the hazard ratio and the stratified Cox regression model. The data are based on the medical records of lung cancer patients at Jember Paru-paru hospital, East Java, Indonesia, in 2016. The factors affecting the survival of the lung cancer patients can be classified into several criteria, i.e. sex, age, hemoglobin, leukocytes, erythrocytes, sedimentation rate of blood, therapy status, general condition, body weight. The results show that the exact method of the stratified Cox regression model is better than the others. In addition, the survival of the patients is affected by their age and general condition.
[New method of mixed gas infrared spectrum analysis based on SVM].

PubMed

Bai, Peng; Xie, Wen-Jun; Liu, Jun-Hua

2007-07-01

A new method of infrared spectrum analysis based on support vector machine (SVM) for mixture gas was proposed. The kernel function in SVM was used to map the seriously overlapping absorption spectrum into high-dimensional space, and after transformation, the high-dimensional data could be processed in the original space, so the regression calibration model was established, then the regression calibration model with SVM was applied to analyze the concentration of component gas. Meanwhile it was proved that the regression calibration model with SVM also could be used for component recognition of mixture gas. The method was applied to the analysis of different data samples. Some factors such as scan interval, range of the wavelength, kernel function and penalty coefficient C that affect the model were discussed. Experimental results show that the component concentration maximal Mean AE is 0.132%, and the component recognition accuracy is higher than 94%. The problems of overlapping absorption spectrum, using the same method for qualitative and quantitative analysis, and limited number of training samples, were solved. The method could be used in other mixture gas infrared spectrum analyses, promising theoretic and application values.
The application of artificial neural networks and support vector regression for simultaneous spectrophotometric determination of commercial eye drop contents

NASA Astrophysics Data System (ADS)

Valizadeh, Maryam; Sohrabi, Mahmoud Reza

2018-03-01

In the present study, artificial neural networks (ANNs) and support vector regression (SVR), as intelligent methods, were coupled with UV spectroscopy for simultaneous quantitative determination of Dorzolamide (DOR) and Timolol (TIM) in eye drops. Several synthetic mixtures were analyzed to validate the proposed methods. At first, a neural network time series, which is one type of artificial neural network, was employed and its efficiency was evaluated. Afterwards, the radial basis network was applied as another neural network. Results showed that the performance of this method is suitable for prediction. Finally, support vector regression was proposed to construct the Zilomole prediction model. Also, root mean square error (RMSE) and mean recovery (%) were calculated for the SVR method. Moreover, the proposed methods were compared to high-performance liquid chromatography (HPLC) as a reference method. A one-way analysis of variance (ANOVA) test at the 95% confidence level applied to the results of the suggested and reference methods showed that there were no significant differences between them. Also, the effect of interferences was investigated in spike solutions.
Covariate Selection for Multilevel Models with Missing Data

PubMed Central

Marino, Miguel; Buxton, Orfeu M.; Li, Yi

2017-01-01

Missing covariate data hampers variable selection in multilevel regression settings. Current variable selection techniques for multiply-imputed data commonly address missingness in the predictors through list-wise deletion and stepwise-selection methods which are problematic. Moreover, most variable selection methods are developed for independent linear regression models and do not accommodate multilevel mixed effects regression models with incomplete covariate data. We develop a novel methodology that is able to perform covariate selection across multiply-imputed data for multilevel random effects models when missing data is present. Specifically, we propose to stack the multiply-imputed data sets from a multiple imputation procedure and to apply a group variable selection procedure through group lasso regularization to assess the overall impact of each predictor on the outcome across the imputed data sets. Simulations confirm the advantageous performance of the proposed method compared with the competing methods. We applied the method to reanalyze the Healthy Directions-Small Business cancer prevention study, which evaluated a behavioral intervention program targeting multiple risk-related behaviors in a working-class, multi-ethnic population. PMID:28239457
Simulated peak inflows for glacier dammed Russell Fiord, near Yakutat, Alaska

USGS Publications Warehouse

Neal, Edward G.

2004-01-01

In June 2002, Hubbard Glacier advanced across the entrance to 35-mile-long Russell Fiord creating a glacier-dammed lake. After closure of the ice and moraine dam, runoff from mountain streams and glacial melt caused the level in "Russell Lake" to rise until it eventually breached the dam on August 14, 2002. Daily mean inflows to the lake during the period of closure were estimated on the basis of lake stage data and the hypsometry of Russell Lake. Inflows were regressed against the daily mean streamflows of nearby Ophir Creek and Situk River to generate an equation for simulating Russell Lake inflow. The regression equation was used to produce 11 years of synthetic daily inflows to Russell Lake for the 1992-2002 water years. A flood-frequency analysis was applied to the peak daily mean inflows for these 11 years of record to generate a 100-year peak daily mean inflow of 235,000 cubic feet per second. Regional-regression equations also were applied to the Russell Lake basin, yielding a 100-year inflow of 157,000 cubic feet per second.
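Regressing lake inflow against the flows of two index streams, as described above, amounts to an ordinary least-squares fit with two predictors. The sketch below assumes a plain linear form with an intercept; the report's actual equation and any transformations are not given in the abstract, and all flow values here are synthetic.

```python
def solve(a, b):
    """Solve a small linear system a x = b by Gaussian elimination with pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        x[i] = (m[i][n] - sum(m[i][j] * x[j] for j in range(i + 1, n))) / m[i][i]
    return x


def fit_ols(rows, y):
    """Least-squares coefficients [intercept, b1, b2, ...] via the normal equations."""
    design = [[1.0] + [float(v) for v in r] for r in rows]
    k = len(design[0])
    xtx = [[sum(x[i] * x[j] for x in design) for j in range(k)] for i in range(k)]
    xty = [sum(x[i] * yi for x, yi in zip(design, y)) for i in range(k)]
    return solve(xtx, xty)


def predict_inflow(coef, flow_a, flow_b):
    """Simulated inflow from the two index-stream flows."""
    return coef[0] + coef[1] * flow_a + coef[2] * flow_b


# Synthetic daily mean flows (cubic feet per second) for two index streams;
# inflow is generated exactly from inflow = 500 + 20*a + 5*b.
streams = [(100, 400), (150, 420), (200, 610), (120, 900), (300, 650)]
inflow = [500 + 20 * a + 5 * b for a, b in streams]

coef = fit_ols(streams, inflow)
print([round(c, 3) for c in coef])
print(round(predict_inflow(coef, 250, 500), 1))
```

Once fitted over the closure period, such an equation can be applied to the long streamflow records of the index streams to produce a synthetic inflow series.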
Unsteady hovering wake parameters identified from dynamic model tests, part 1

NASA Technical Reports Server (NTRS)

Hohenemser, K. H.; Crews, S. T.

1977-01-01

The development of a 4-bladed model rotor is reported that can be excited with a simple eccentric mechanism in progressing and regressing modes with either harmonic or transient inputs. Parameter identification methods were applied to the problem of extracting parameters for linear perturbation models, including rotor dynamic inflow effects, from the measured blade flapping responses to transient pitch stirring excitations. These perturbation models were then used to predict blade flapping response to other pitch stirring transient inputs, and rotor wake and blade flapping responses to harmonic inputs. The viability and utility of using parameter identification methods for extracting the perturbation models from transients are demonstrated through these combined analytical and experimental studies.
Users manual for flight control design programs

NASA Technical Reports Server (NTRS)

Nalbandian, J. Y.

1975-01-01

Computer programs for the design of analog and digital flight control systems are documented. The program DIGADAPT uses linear-quadratic-gaussian synthesis algorithms in the design of command response controllers and state estimators, and it applies covariance propagation analysis to the selection of sampling intervals for digital systems. Program SCHED executes correlation and regression analyses for the development of gain and trim schedules to be used in open-loop explicit-adaptive control laws. A linear-time-varying simulation of aircraft motions is provided by the program TVHIS, which includes guidance and control logic, as well as models for control actuator dynamics. The programs are coded in FORTRAN and are compiled and executed on both IBM and CDC computers.
Spatial-temporal event detection in climate parameter imagery.

DOE Office of Scientific and Technical Information (OSTI.GOV)

McKenna, Sean Andrew; Gutierrez, Karen A.

Previously developed techniques that comprise statistical parametric mapping, with applications focused on human brain imaging, are examined and tested here for new applications in anomaly detection within remotely-sensed imagery. Two approaches to analysis are developed: online, regression-based anomaly detection and conditional differences. These approaches are applied to two example spatial-temporal data sets: data simulated with a Gaussian field deformation approach and weekly NDVI images derived from global satellite coverage. Results indicate that anomalies can be identified in spatial temporal data with the regression-based approach. Additionally, La Niña and El Niño climatic conditions are used as different stimuli applied to the earth and this comparison shows that El Niño conditions lead to significant decreases in NDVI in both the Amazon Basin and in Southern India.
A Pre-Screening Questionnaire to Predict Non-24-Hour Sleep-Wake Rhythm Disorder (N24HSWD) among the Blind

PubMed Central

Flynn-Evans, Erin E.; Lockley, Steven W.

2016-01-01

Study Objectives: There is currently no questionnaire-based pre-screening tool available to detect non-24-hour sleep-wake rhythm disorder (N24HSWD) among blind patients. Our goal was to develop such a tool, derived from gold standard, objective hormonal measures of circadian entrainment status, for the detection of N24HSWD among those with visual impairment. Methods: We evaluated the contribution of 40 variables in their ability to predict N24HSWD among 127 blind women, classified using urinary 6-sulfatoxymelatonin period, an objective marker of circadian entrainment status in this population. We subjected the 40 candidate predictors to 1,000 bootstrapped iterations of a logistic regression forward selection model to predict N24HSWD, with model inclusion set at the p < 0.05 level. We removed any predictors that were not selected at least 1% of the time in the 1,000 bootstrapped models and applied a second round of 1,000 bootstrapped logistic regression forward selection models to the remaining 23 candidate predictors. We included all questions that were selected at least 10% of the time in the final model. We subjected the selected predictors to a final logistic regression model to predict N24HSWD over 1,000 bootstrapped models to calculate the concordance statistic and adjusted optimism of the final model. We used this information to generate a predictive model and determined the sensitivity and specificity of the model. Finally, we applied the model to a cohort of 1,262 blind women who completed the survey, but did not collect urine samples. Results: The final model consisted of eight questions. The concordance statistic, adjusted for bootstrapping, was 0.85. The positive predictive value was 88%, the negative predictive value was 79%. Applying this model to our larger dataset of women, we found that 61% of those without light perception, and 27% with some degree of light perception, would be referred for further screening for N24HSWD.
Conclusions: Our model has predictive utility sufficient to serve as a pre-screening questionnaire for N24HSWD among the blind. Citation: Flynn-Evans EE, Lockley SW. A pre-screening questionnaire to predict non-24-hour sleep-wake rhythm disorder (N24HSWD) among the blind. J Clin Sleep Med 2016;12(5):703–710. PMID:26951421
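The screening measures reported in this abstract (sensitivity, specificity, positive and negative predictive value) all follow from a 2 × 2 table of questionnaire result against the reference classification. A minimal sketch, using hypothetical counts that are not the study's data:

```python
def screening_metrics(tp, fp, fn, tn):
    """Standard screening measures from a 2 x 2 confusion table.

    tp: screened positive, truly affected     fp: screened positive, unaffected
    fn: screened negative, truly affected     tn: screened negative, unaffected
    """
    return {
        "sensitivity": tp / (tp + fn),
        "specificity": tn / (tn + fp),
        "ppv": tp / (tp + fp),
        "npv": tn / (tn + fn),
    }


# Hypothetical counts for illustration only.
metrics = screening_metrics(tp=40, fp=10, fn=5, tn=45)
for name, value in metrics.items():
    print(f"{name}: {value:.3f}")
```

Unlike sensitivity and specificity, the two predictive values depend on how common the condition is in the screened group, so they would be expected to change when a model is applied to a cohort with a different prevalence.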
Bilateral Coats' Disease Combined with Retinopathy of Prematurity

PubMed Central

Gursoy, Huseyin; Erol, Nazmiye; Bilgec, Mustafa Deger; Basmak, Hikmet; Kutlay, Ozden; Aslan, Huseyin

2015-01-01

Purpose. To report a case of bilateral Coats' disease combined with retinopathy of prematurity (ROP). Case. Retinal vascularization was complete in the right eye, whereas zone III, stage 3 ROP and preplus disease were observed in the left eye at 43 weeks of postmenstrual age (PMA) in a 31-week premature, 1200-g neonate. Intraretinal exudates developed and retinal hemorrhages increased in the left eye at 51 weeks of PMA. Diode laser photocoagulation (LP) was applied to the left eye. Exudates involved the macula, and telangiectatic changes developed one month following LP. Additional LP was applied to the left eye combined with intravitreal bevacizumab (IVB) injection at 55 weeks of PMA. Disease regressed one month after the additional therapy. At the 14-month examination of the baby, telangiectatic changes and intraretinal exudates were observed in the right eye. Diode LP was applied to the right eye combined with IVB injection. Exudates did not resolve completely, and cryotherapy was applied one month following LP. Retinal findings regressed three months following the cryotherapy. Conclusion. This is the first report of presumed bilateral Coats' disease combined with ROP. If Coats' disease could be diagnosed at early stages, it would be a disease associated with better prognosis. PMID:26413362
Lung nodule malignancy prediction using multi-task convolutional neural network

NASA Astrophysics Data System (ADS)

Li, Xiuli; Kao, Yueying; Shen, Wei; Li, Xiang; Xie, Guotong

2017-03-01

In this paper, we investigated the problem of diagnostic lung nodule malignancy prediction using thoracic Computed Tomography (CT) screening. Unlike most existing studies, which classify nodules into two types (benign and malignant), we interpreted nodule malignancy prediction as a regression problem to predict a continuous malignancy level. We proposed a joint multi-task learning algorithm using a Convolutional Neural Network (CNN) to capture nodule heterogeneity by extracting discriminative features from alternatingly stacked layers. We trained a CNN regression model to predict the nodule malignancy, and designed a multi-task learning mechanism to simultaneously share knowledge among 9 different nodule characteristics (Subtlety, Calcification, Sphericity, Margin, Lobulation, Spiculation, Texture, Diameter and Malignancy), which improved the final prediction result. Each CNN would generate characteristic-specific feature representations, and then we applied multi-task learning on the features to predict the corresponding likelihood for that characteristic. We evaluated the proposed method on 2620 nodule CT scans from the LIDC-IDRI dataset with a 5-fold cross validation strategy. The multi-task CNN achieved a regression RMSE of 0.830 and a mapped classification ACC of 83.03%, while single-task regression gave an RMSE of 0.894 and a mapped classification ACC of 74.9%. Experiments show that the proposed method could predict the lung nodule malignancy likelihood effectively and outperforms the state-of-the-art methods. The learning framework could easily be applied to other anomaly likelihood prediction problems, such as skin cancer and breast cancer. This demonstrates the potential of our method to assist radiologists in nodule staging assessment and individual therapeutic planning.
A new approach for continuous estimation of baseflow using discrete water quality data: Method description and comparison with baseflow estimates from two existing approaches

USGS Publications Warehouse

Miller, Matthew P.; Johnson, Henry M.; Susong, David D.; Wolock, David M.

2015-01-01

Understanding how watershed characteristics and climate influence the baseflow component of stream discharge is a topic of interest to both the scientific and water management communities. Therefore, the development of baseflow estimation methods is a topic of active research. Previous studies have demonstrated that graphical hydrograph separation (GHS) and conductivity mass balance (CMB) methods can be applied to stream discharge data to estimate daily baseflow. While CMB is generally considered to be a more objective approach than GHS, its application across broad spatial scales is limited by a lack of high frequency specific conductance (SC) data. We propose a new method that uses discrete SC data, which are widely available, to estimate baseflow at a daily time step using the CMB method. The proposed approach involves the development of regression models that relate discrete SC concentrations to stream discharge and time. Regression-derived CMB baseflow estimates were more similar to baseflow estimates obtained using a CMB approach with measured high frequency SC data than were the GHS baseflow estimates at twelve snowmelt dominated streams and rivers. There was a near perfect fit between the regression-derived and measured CMB baseflow estimates at sites where the regression models were able to accurately predict daily SC concentrations. We propose that the regression-derived approach could be applied to estimate baseflow at large numbers of sites, thereby enabling future investigations of watershed and climatic characteristics that influence the baseflow component of stream discharge across large spatial scales.
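The conductivity mass balance step that this abstract builds on can be sketched as a two-component mixing calculation. The form below is the commonly used CMB expression and is an assumption here, since the abstract does not state the equation; the end-member values `sc_bf` (baseflow) and `sc_ro` (runoff) and all numbers are hypothetical. In the proposed approach the daily SC values would come from the regression on discharge and time rather than from a continuous sensor.

```python
def cmb_baseflow(q, sc, sc_bf, sc_ro):
    """Daily baseflow from a two-component conductivity mass balance.

    Q_bf = Q * (SC - SC_ro) / (SC_bf - SC_ro)

    q      total stream discharge for the day
    sc     specific conductance of the stream for the day
    sc_bf  baseflow end-member specific conductance (assumed constant here)
    sc_ro  runoff end-member specific conductance (assumed constant here)
    """
    fraction = (sc - sc_ro) / (sc_bf - sc_ro)
    fraction = min(1.0, max(0.0, fraction))  # baseflow cannot exceed total flow
    return q * fraction


# Hypothetical values: discharge in m^3/s, specific conductance in uS/cm.
daily_q = [10.0, 25.0, 12.0]
daily_sc = [200.0, 120.0, 400.0]
baseflow = [cmb_baseflow(q, sc, sc_bf=400.0, sc_ro=50.0)
            for q, sc in zip(daily_q, daily_sc)]
print([round(b, 2) for b in baseflow])
```

High-flow days with diluted conductance yield a small baseflow fraction, while days on which stream conductance approaches the baseflow end-member are attributed almost entirely to baseflow.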
Advances in Non-Destructive Early Assessment of Fruit Ripeness towards Defining Optimal Time of Harvest and Yield Prediction—A Review

PubMed Central

Lecourt, Julien; Bishop, Gerard

2018-01-01

Global food security for the increasing world population not only requires increased sustainable production of food but a significant reduction in pre- and post-harvest waste. The timing of when a fruit is harvested is critical for reducing waste along the supply chain and increasing fruit quality for consumers. The early in-field assessment of fruit ripeness and prediction of the harvest date and yield by non-destructive technologies have the potential to revolutionize farming practices and enable the consumer to eat the tastiest and freshest fruit possible. A variety of non-destructive techniques have been applied to estimate the ripeness or maturity but not all of them are applicable for in situ (field or glasshouse) assessment. This review focuses on the non-destructive methods which are promising for, or have already been applied to, the pre-harvest in-field measurements including colorimetry, visible imaging, spectroscopy and spectroscopic imaging. Machine learning and regression models used in assessing ripeness are also discussed. PMID:29320410

Co-Occurring Psychosocial Problems and HIV Risk Among Women Attending Drinking Venues in a South African Township: A Syndemic Approach

PubMed Central

Pitpitan, Eileen V.; Kalichman, Seth C.; Eaton, Lisa A.; Cain, Demetria; Sikkema, Kathleen J.; Watt, Melissa H.; Skinner, Donald; Pieterse, Desiree

2012-01-01

Background In South Africa, women comprise the majority of HIV infections. Syndemics, or co-occurring epidemics and risk factors, have been applied to understanding HIV risk among marginalized groups. Purpose To apply the syndemic framework to examine psychosocial problems that co-occur among women attending drinking venues in South Africa, and to test how the co-occurrence of these problems may exacerbate risk for HIV infection. Method 560 women from a Cape Town township provided data on multiple psychosocial problems, including food insufficiency, depression, abuse experiences, problem drinking, and sexual behaviors. Results Bivariate associations among the syndemic factors showed a high degree of co-occurrence and regression analyses showed an additive effect of psychosocial problems on HIV risk behaviors. Conclusions These results demonstrate the utility of a syndemic framework to understand co-occurring psychosocial problems among women in South Africa. HIV prevention interventions should consider the compounding effects of psychosocial problems among women. PMID:23054944
Effect of epidural anaesthesia on clinician-applied force during vaginal delivery.

PubMed

Poggi, Sarah H; Allen, Robert H; Patel, Chirag; Deering, Shad H; Pezzullo, John C; Shin, Young; Spong, Catherine Y

2004-09-01

Epidural anesthesia (EA) is used in 80% of vaginal deliveries and is linked to neonatal and maternal trauma. Our objectives were to determine (1) whether EA affected clinician-applied force on the fetus and (2) whether this force influenced perineal trauma. After informed consent, multiparas with term, cephalic, singletons were delivered by 1 physician wearing a sensor-equipped glove to record force exerted on the fetal head. Those with EA were compared with those without for delivery force parameters. Regression analysis was used to identify predictors of vaginal laceration. The force required for delivery was greater in patients with EA (n = 27) than without (n = 5) (P < .01). Clinical parameters, including birth weight (P = .31) were similar between the groups. Clinician force was similar in those with no versus first- versus second-degree laceration (P = .5). Only birth weight was predictive of laceration (P = .02). Epidural use resulted in greater clinician force required for vaginal delivery of the fetus in multiparas, but this force was not associated with perineal trauma.
Failure detection and fault management techniques for flush airdata sensing systems

NASA Technical Reports Server (NTRS)

Whitmore, Stephen A.; Moes, Timothy R.; Leondes, Cornelius T.

1992-01-01

Methods based on chi-squared analysis are presented for detecting system and individual-port failures in the high-angle-of-attack flush airdata sensing system on the NASA F-18 High Alpha Research Vehicle. The HI-FADS hardware is introduced, and the aerodynamic model describes measured pressure in terms of dynamic pressure, angle of attack, angle of sideslip, and static pressure. Chi-squared analysis is described in the presentation of the concept for failure detection and fault management, which includes nominal, iteration, and fault-management modes. A matrix of pressure orifices arranged in concentric circles on the nose of the aircraft indicates the parameters which are applied to the regression algorithms. The sensing techniques are applied to the F-18 flight data, and two examples are given of the computed angle-of-attack time histories. The failure-detection and fault-management techniques permit the matrix to be multiply redundant, and the chi-squared analysis is shown to be useful in the detection of failures.
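The general idea of a chi-squared failure check can be sketched as follows: the residuals between measured port pressures and the pressures predicted by the fitted aerodynamic model are normalized by the expected measurement noise, squared and summed, and an unusually large sum signals a fault. This is a generic illustration, not the report's algorithm; the pressures, noise level `sigma` and the `threshold` are hypothetical.

```python
def chi_squared(measured, modeled, sigma):
    """Sum of squared residuals, each normalized by the noise level sigma."""
    return sum(((p - m) / sigma) ** 2 for p, m in zip(measured, modeled))


def worst_port(measured, modeled):
    """Index of the port with the largest absolute residual."""
    return max(range(len(measured)), key=lambda i: abs(measured[i] - modeled[i]))


# Hypothetical pressures (kPa) at four ports and a hypothetical threshold.
modeled = [101.0, 99.5, 100.2, 98.8]
healthy = [101.1, 99.4, 100.3, 98.7]
faulty = [101.1, 99.4, 105.0, 98.7]   # port 2 reads far from the model
sigma = 0.1
threshold = 20.0

for name, reading in (("healthy", healthy), ("faulty", faulty)):
    stat = chi_squared(reading, modeled, sigma)
    status = "FAIL" if stat > threshold else "ok"
    print(name, round(stat, 1), status)
print("suspect port:", worst_port(faulty, modeled))
```

With redundant ports, a suspect port identified this way can be excluded and the regression repeated on the remaining measurements.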
Leadership styles of nurse managers in ethical dilemmas: Reasons and consequences.

PubMed

Zydziunaite, Vilma; Suominen, Tarja

2014-01-01

Abstract Background: Understanding the reasons and consequences of leadership styles in ethical dilemmas is fundamental to exploring nurse managers' abilities to influence outcomes for patients and nursing personnel. The aim was to explain the associations between different leadership styles, reasons for their application and its consequences when nurse managers make decisions in ethical dilemmas. The data were collected between 15 October 2011 and 30 April 2012 by statistically validated questionnaire. The respondents (N = 278) were nurse managers. The data were analysed using SPSS 20.0, calculating Spearman's correlations, the Stepwise Regression and ANOVA. The reasons for applying different leadership styles in ethical dilemmas include personal characteristics, years in work position, institutional factors, and the professional authority of nurse managers. The applied leadership styles in ethical dilemmas are associated with the consequences regarding the satisfaction of patients', relatives' and nurse managers' needs. Nurse managers exhibited leadership styles oriented to maintenance, focusing more on 'doing the job' than on managing the decision-making in ethical dilemmas.
Spatial regression analysis on 32 years of total column ozone data

NASA Astrophysics Data System (ADS)

Knibbe, J. S.; van der A, R. J.; de Laat, A. T. J.

2014-08-01

Multiple-regression analyses have been performed on 32 years of total ozone column data that was spatially gridded with a 1 × 1.5° resolution. The total ozone data consist of the MSR (Multi Sensor Reanalysis; 1979-2008) and 2 years of assimilated SCIAMACHY (SCanning Imaging Absorption spectroMeter for Atmospheric CHartographY) ozone data (2009-2010). The two-dimensionality in this data set allows us to perform the regressions locally and investigate spatial patterns of regression coefficients and their explanatory power. Seasonal dependencies of ozone on regressors are included in the analysis. A new physically oriented model is developed to parameterize stratospheric ozone. Ozone variations on nonseasonal timescales are parameterized by explanatory variables describing the solar cycle, stratospheric aerosols, the quasi-biennial oscillation (QBO), El Niño-Southern Oscillation (ENSO) and stratospheric alternative halogens which are parameterized by the effective equivalent stratospheric chlorine (EESC). For several explanatory variables, seasonally adjusted versions of these explanatory variables are constructed to account for the difference in their effect on ozone throughout the year. To account for seasonal variation in ozone, explanatory variables describing the polar vortex, geopotential height, potential vorticity and average day length are included. Results of this regression model are compared to that of a similar analysis based on a more commonly applied statistically oriented model. The physically oriented model provides spatial patterns in the regression results for each explanatory variable. 
The EESC has a significant depleting effect on ozone at mid- and high latitudes, the solar cycle affects ozone positively mostly in the Southern Hemisphere, stratospheric aerosols affect ozone negatively at high northern latitudes, the effect of QBO is positive and negative in the tropics and mid- to high latitudes, respectively, and ENSO affects ozone negatively between 30° N and 30° S, particularly over the Pacific. The contribution of explanatory variables describing seasonal ozone variation is generally large at mid- to high latitudes. We observe ozone increases with potential vorticity and day length and ozone decreases with geopotential height and variable ozone effects due to the polar vortex in regions to the north and south of the polar vortices. Recovery of ozone is identified globally. However, recovery rates and uncertainties strongly depend on choices that can be made in defining the explanatory variables. The application of several trend models, each with their own pros and cons, yields a large range of recovery rate estimates. Overall these results suggest that care has to be taken in determining ozone recovery rates, in particular for the Antarctic ozone hole.
Impacts of human-related practices on Ommatissus lybicus infestations of date palm in Oman.

PubMed

Al-Kindi, Khalifa M; Kwan, Paul; Andrew, Nigel R; Welch, Mitchell

2017-01-01

Date palm cultivation is economically important in the Sultanate of Oman, with significant financial investments coming from both the government and private individuals. However, a widespread Dubas bug (DB) (Ommatissus lybicus Bergevin) infestation has impacted regions including the Middle East, North Africa, Southeast Russia, and Spain, resulting in widespread damage to date palms. In this study, techniques in spatial statistics including ordinary least squares (OLS), geographically weighted regression (GWR), and exploratory regression (ER) were applied to (a) model the correlation between DB infestations and human-related practices that include irrigation methods, row spacing, palm tree density, and management of undercover and intercropped vegetation, and (b) predict the locations of future DB infestations in northern Oman. Firstly, we extracted row spacing and palm tree density information from remotely sensed satellite images. Secondly, we collected data on irrigation practices and management by using a simple questionnaire, augmented with spatial data. Thirdly, we conducted our statistical analyses using all possible combinations of values over a given set of candidate variables using the chosen predictive modelling and regression techniques. Lastly, we identified the combination of human-related practices that are most conducive to the survival and spread of DB. Our results show that there was a strong correlation between DB infestations and several human-related practices parameters (R² = 0.70). Variables including palm tree density, spacing between trees (less than 5 x 5 m), insecticide application, date palm and farm service (pruning, dethroning, remove weeds, and thinning), irrigation systems, offshoots removal, fertilisation and labour (non-educated) issues, were all found to significantly influence the degree of DB infestations.
This study is expected to help reduce the extent and cost of aerial and ground sprayings, while facilitating the allocation of date palm plantations. An integrated pest management (IPM) system monitoring DB infestations, driven by GIS and remote sensed data collections and spatial statistical models, will allow for an effective DB management program in Oman. This will in turn ensure the competitiveness of Oman in the global date fruits market and help preserve national yields.
Use of empirical likelihood to calibrate auxiliary information in partly linear monotone regression models.

PubMed

Chen, Baojiang; Qin, Jing

2014-05-10

In statistical analysis, a regression model is needed if one is interested in finding the relationship between a response variable and covariates. When the response depends on a covariate, it may also depend on a function of this covariate. If one has no knowledge of this functional form but expects it to be monotonically increasing or decreasing, then the isotonic regression model is preferable. Estimation of parameters for isotonic regression models is based on the pool-adjacent-violators algorithm (PAVA), where the monotonicity constraints are built in. With missing data, people often employ the augmented estimating method to improve estimation efficiency by incorporating auxiliary information through a working regression model. However, under the framework of the isotonic regression model, the PAVA does not work as the monotonicity constraints are violated. In this paper, we develop an empirical likelihood-based method for the isotonic regression model to incorporate the auxiliary information. Because the monotonicity constraints still hold, the PAVA can be used for parameter estimation. Simulation studies demonstrate that the proposed method can yield more efficient estimates, and in some situations, the efficiency improvement is substantial. We apply this method to a dementia study. Copyright © 2013 John Wiley & Sons, Ltd.
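The pool-adjacent-violators algorithm mentioned in this abstract can be sketched in a few lines for the basic case of a non-decreasing fit to responses already ordered by the covariate. This is the plain PAVA only, without the empirical likelihood weighting the paper develops; the optional weights `w` and the sample values are illustrative.

```python
def pava(y, w=None):
    """Non-decreasing weighted least-squares fit by pool-adjacent-violators.

    y is assumed to be ordered by the covariate. Adjacent values that
    violate monotonicity are pooled into blocks and replaced by their
    weighted mean.
    """
    if w is None:
        w = [1.0] * len(y)
    blocks = []  # each block is [weighted mean, total weight, number of points]
    for yi, wi in zip(y, w):
        blocks.append([yi, wi, 1])
        # Merge backwards while the last two blocks are out of order.
        while len(blocks) > 1 and blocks[-2][0] > blocks[-1][0]:
            m2, w2, n2 = blocks.pop()
            m1, w1, n1 = blocks.pop()
            total = w1 + w2
            blocks.append([(m1 * w1 + m2 * w2) / total, total, n1 + n2])
    fitted = []
    for mean, _, count in blocks:
        fitted.extend([mean] * count)
    return fitted


print(pava([1.0, 3.0, 2.0, 4.0, 3.5, 5.0]))
```

The fitted values form a step function: wherever the raw responses decrease, the offending neighbours are averaged, so the monotonicity constraint is satisfied by construction.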
Robust geographically weighted regression of modeling the Air Polluter Standard Index (APSI)

NASA Astrophysics Data System (ADS)

Warsito, Budi; Yasin, Hasbi; Ispriyanti, Dwi; Hoyyi, Abdul

2018-05-01

The Geographically Weighted Regression (GWR) model has been widely applied to many practical fields for exploring spatial heterogeneity of a regression model. However, this method is inherently not robust to outliers. Outliers commonly exist in data sets and may lead to a distorted estimate of the underlying regression model. One solution for handling outliers in the regression model is to use robust models; the resulting model is called Robust Geographically Weighted Regression (RGWR). This research aims to aid the government in the policy making process related to air pollution mitigation by developing a standard index model for air polluter (Air Polluter Standard Index - APSI) based on the RGWR approach. In this research, we also consider seven variables that are directly related to the air pollution level, which are the traffic velocity, the population density, the business center aspect, the air humidity, the wind velocity, the air temperature, and the area size of the urban forest. The best model is determined by the smallest AIC value. There are significant differences between regression and RGWR in this case, but basic GWR using the Gaussian kernel is the best model for modeling APSI because it has the smallest AIC.
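The core of basic GWR with a Gaussian kernel can be sketched as a weighted least-squares fit repeated at each target location, with weights that decay with distance. The sketch below uses a single predictor so the weighted fit has a closed form; the bandwidth, site coordinates and data are hypothetical, and the robust (RGWR) modification is not shown.

```python
import math


def gaussian_weight(distance, bandwidth):
    """Gaussian kernel weight: nearby sites count more than distant ones."""
    return math.exp(-0.5 * (distance / bandwidth) ** 2)


def local_fit(target, sites, x, y, bandwidth):
    """Weighted least-squares line y = a + b*x centred on one target location."""
    w = [gaussian_weight(math.dist(target, s), bandwidth) for s in sites]
    sw = sum(w)
    xm = sum(wi * xi for wi, xi in zip(w, x)) / sw
    ym = sum(wi * yi for wi, yi in zip(w, y)) / sw
    sxy = sum(wi * (xi - xm) * (yi - ym) for wi, xi, yi in zip(w, x, y))
    sxx = sum(wi * (xi - xm) ** 2 for wi, xi in zip(w, x))
    slope = sxy / sxx
    return ym - slope * xm, slope


# Hypothetical monitoring sites: the x-y relationship has slope 1 in the
# western cluster and slope 3 in the eastern cluster.
sites = [(0, 0), (1, 0), (2, 0), (10, 0), (11, 0), (12, 0)]
x = [1.0, 2.0, 3.0, 1.0, 2.0, 3.0]
y = [1.0, 2.0, 3.0, 3.0, 6.0, 9.0]

west = local_fit((1, 0), sites, x, y, bandwidth=1.0)
east = local_fit((11, 0), sites, x, y, bandwidth=1.0)
print([round(v, 3) for v in west], [round(v, 3) for v in east])
```

A single global regression would average the two regimes, whereas the local fits recover a different slope in each region, which is the spatial heterogeneity GWR is meant to expose.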
Introduction to methodology of dose-response meta-analysis for binary outcome: With application on software.

PubMed

Zhang, Chao; Jia, Pengli; Yu, Liu; Xu, Chang

2018-05-01

Dose-response meta-analysis (DRMA) is widely applied to investigate the dose-specific relationship between independent and dependent variables. Such methods have been in use for over 30 years and are increasingly employed in healthcare and clinical decision-making. In this article, we give an overview of the methodology used in DRMA. We summarize the commonly used regression models and pooling methods in DRMA. We also use an example to illustrate how to conduct a DRMA with these methods. Five regression models (linear regression, piecewise regression, natural polynomial regression, fractional polynomial regression, and restricted cubic spline regression) are illustrated in this article to fit the dose-response relationship. Two types of pooling approaches, the one-stage approach and the two-stage approach, are illustrated to pool the dose-response relationship across studies. The example showed similar results among these models. Several dose-response meta-analysis methods can be used for investigating the relationship between exposure level and the risk of an outcome. However, the methodology of DRMA still needs to be improved. © 2018 Chinese Cochrane Center, West China Hospital of Sichuan University and John Wiley & Sons Australia, Ltd.
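The second stage of a two-stage approach can be sketched as follows: each study first contributes its own dose-response coefficient (for a linear model, a slope with a standard error), and these are then combined across studies. The sketch assumes inverse-variance fixed-effect pooling, which is one common choice and may differ from the pooling used in the article; the slopes and standard errors are hypothetical.

```python
def pool_fixed_effect(slopes, std_errors):
    """Inverse-variance weighted average of study-specific slopes.

    Returns the pooled slope and its standard error.
    """
    weights = [1.0 / se ** 2 for se in std_errors]
    total = sum(weights)
    pooled = sum(w * b for w, b in zip(weights, slopes)) / total
    return pooled, (1.0 / total) ** 0.5


# Hypothetical first-stage results: log relative risk per unit of dose.
study_slopes = [0.10, 0.20, 0.12]
study_ses = [0.10, 0.10, 0.05]

pooled_slope, pooled_se = pool_fixed_effect(study_slopes, study_ses)
print(round(pooled_slope, 4), round(pooled_se, 4))
```

More precise studies receive larger weights, so the pooled slope sits closest to the estimate with the smallest standard error. A one-stage approach would instead fit a single model to the data from all studies at once.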
Weak Convergence of Bounded Influence Regression Estimates with Applications.

DTIC Science & Technology

1980-04-01

for bounded influence regression M-estimates and apply the results to sequential clinical trials, with special reference to repeated significance... Authors: Raymond J. Carroll and David Ruppert. Contract or grant number: AFOSR-80-0... 1. Introduction. Our primary concern is the comparison of two treatments in a clinical setting, although our results
Multiresponse semiparametric regression for modelling the effect of regional socio-economic variables on the use of information technology

NASA Astrophysics Data System (ADS)

Wibowo, Wahyu; Wene, Chatrien; Budiantara, I. Nyoman; Permatasari, Erma Oktania

2017-03-01

Multiresponse semiparametric regression is a simultaneous-equation regression model that fuses parametric and nonparametric models. The regression model comprises several models, each with two components, parametric and nonparametric. The model used here has a linear function as the parametric component and a truncated polynomial spline as the nonparametric component. The model can handle both linear and nonlinear relationships between the responses and the sets of predictor variables. The aim of this paper is to demonstrate the application of the regression model to modeling the effect of regional socio-economic variables on the use of information technology. More specifically, the response variables are the percentage of households with internet access and the percentage of households with a personal computer. The predictor variables are the percentage of literate people, the percentage of electrification and the percentage of economic growth. Based on identification of the relationship between the response and predictor variables, economic growth is treated as a nonparametric predictor and the others as parametric predictors. The results show that multiresponse semiparametric regression can be applied well, as indicated by the high coefficient of determination, 90 percent.
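The nonparametric component described above, a truncated polynomial spline, can be sketched as a set of basis columns: ordinary powers of the predictor plus one "hinge" term per knot that is zero below the knot. The knot locations, degree and predictor values below are hypothetical; the paper's chosen knots are not given in the abstract.

```python
def truncated_spline_basis(x, knots, degree=1):
    """Basis row [x, ..., x^p, (x - k1)_+^p, (x - k2)_+^p, ...]."""
    powers = [x ** d for d in range(1, degree + 1)]
    hinges = [max(0.0, x - k) ** degree for k in knots]
    return powers + hinges


def design_row(parametric, nonparametric, knots, degree=1):
    """One design-matrix row: intercept, linear terms, then spline terms."""
    return [1.0] + list(parametric) + truncated_spline_basis(nonparametric, knots, degree)


# Hypothetical region: literacy 95.0 %, electrification 98.0 %, growth 4.0 %.
row = design_row(parametric=[95.0, 98.0], nonparametric=4.0, knots=[3.0, 5.0])
print(row)
```

Rows built this way can be passed to any least-squares routine, one fit per response variable, so the spline terms let the effect of economic growth change slope at each knot while the other predictors enter linearly.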
Highlighting differences between conditional and unconditional quantile regression approaches through an application to assess medication adherence.

PubMed

Borah, Bijan J; Basu, Anirban

2013-09-01

The quantile regression (QR) framework provides a pragmatic approach in understanding the differential impacts of covariates along the distribution of an outcome. However, the QR framework that has pervaded the applied economics literature is based on the conditional quantile regression method. It is used to assess the impact of a covariate on a quantile of the outcome conditional on specific values of other covariates. In most cases, conditional quantile regression may generate results that are often not generalizable or interpretable in a policy or population context. In contrast, the unconditional quantile regression method provides more interpretable results as it marginalizes the effect over the distributions of other covariates in the model. In this paper, the differences between these two regression frameworks are highlighted, both conceptually and econometrically. Additionally, using real-world claims data from a large US health insurer, alternative QR frameworks are implemented to assess the differential impacts of covariates along the distribution of medication adherence among elderly patients with Alzheimer's disease. Copyright © 2013 John Wiley & Sons, Ltd.
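Conditional quantile regression, the framework this abstract contrasts with the unconditional method, estimates coefficients by minimizing the check (pinball) loss. The sketch below shows only that loss and verifies numerically that, without covariates, it is minimized at the sample quantile; it does not implement the unconditional method. The adherence values are hypothetical.

```python
def check_loss(residual, tau):
    """Check (pinball) loss: rho_tau(u) = u * (tau - 1{u < 0})."""
    return residual * (tau - (1.0 if residual < 0 else 0.0))


def total_loss(candidate, values, tau):
    """Total check loss when every observation is predicted by candidate."""
    return sum(check_loss(v - candidate, tau) for v in values)


# Hypothetical medication adherence proportions for five patients.
adherence = [0.2, 0.5, 0.6, 0.8, 0.95]

median_fit = min(adherence, key=lambda c: total_loss(c, adherence, 0.5))
upper_fit = min(adherence, key=lambda c: total_loss(c, adherence, 0.9))
print(median_fit, upper_fit)
```

With covariates, the candidate constant is replaced by a linear predictor, and the fitted coefficients describe a quantile of the outcome conditional on the other covariates, which is why their interpretation differs from a population-level (unconditional) effect.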
Logistic regression for dichotomized counts.

PubMed

Preisser, John S; Das, Kalyan; Benecha, Habtamu; Stamm, John W

2016-12-01

Sometimes there is interest in a dichotomized outcome indicating whether a count variable is positive or zero. Under this scenario, the application of ordinary logistic regression may result in efficiency loss, which is quantifiable under an assumed model for the counts. In such situations, a shared-parameter hurdle model is investigated for more efficient estimation of regression parameters relating to overall effects of covariates on the dichotomous outcome, while handling count data with many zeroes. One model part provides a logistic regression containing marginal log odds ratio effects of primary interest, while an ancillary model part describes the mean count of a Poisson or negative binomial process in terms of nuisance regression parameters. Asymptotic efficiency of the logistic model parameter estimators of the two-part models is evaluated with respect to ordinary logistic regression. Simulations are used to assess the properties of the models with respect to power and Type I error, the latter investigated under both misspecified and correctly specified models. The methods are applied to data from a randomized clinical trial of three toothpaste formulations to prevent incident dental caries in a large population of Scottish schoolchildren. © The Author(s) 2014.
Combustion performance and scale effect from N2O/HTPB hybrid rocket motor simulations

NASA Astrophysics Data System (ADS)

Shan, Fanli; Hou, Lingyun; Piao, Ying

2013-04-01

HRM code for the simulation of N2O/HTPB hybrid rocket motor operation and scale effect analysis has been developed. This code can be used to calculate motor thrust and distributions of physical properties inside the combustion chamber and nozzle during the operational phase by solving the unsteady Navier-Stokes equations using a corrected compressible difference scheme and a two-step, five-species combustion model. A dynamic fuel surface regression technique and a two-step calculation method together with the gas-solid coupling are applied in the calculation of fuel regression and the determination of combustion chamber wall profile as fuel regresses. Both the calculated motor thrust from start-up to shut-down mode and the combustion chamber wall profile after motor operation are in good agreement with experimental data. The fuel regression rate equation and the relation between fuel regression rate and axial distance have been derived. Analysis of results suggests improvements in combustion performance to the current hybrid rocket motor design and explains scale effects in the variation of fuel regression rate with combustion chamber diameter.
Aggregating the response in time series regression models, applied to weather-related cardiovascular mortality.

PubMed

Masselot, Pierre; Chebana, Fateh; Bélanger, Diane; St-Hilaire, André; Abdous, Belkacem; Gosselin, Pierre; Ouarda, Taha B M J

2018-07-01

In environmental epidemiology studies, health response data (e.g. hospitalization or mortality) are often noisy because of hospital organization and other social factors. The noise in the data can hide the true signal related to the exposure. The signal can be unveiled by performing a temporal aggregation on health data and then using it as the response in regression analysis. From aggregated series, a general methodology is introduced to account for the particularities of an aggregated response in a regression setting. This methodology can be used with usually applied regression models in weather-related health studies, such as generalized additive models (GAM) and distributed lag nonlinear models (DLNM). In particular, the residuals are modelled using an autoregressive-moving average (ARMA) model to account for the temporal dependence. The proposed methodology is illustrated by modelling the influence of temperature on cardiovascular mortality in Canada. A comparison with classical DLNMs is provided and several aggregation methods are compared. Results show that there is an increase in the fit quality when the response is aggregated, and that the estimated relationship focuses more on the outcome over several days than the classical DLNM. More precisely, among various investigated aggregation schemes, it was found that an aggregation with an asymmetric Epanechnikov kernel is more suited for studying the temperature-mortality relationship. Copyright © 2018. Published by Elsevier B.V.
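The temporal aggregation step can be sketched as a kernel-weighted sum of the daily response. The abstract does not give the exact form of the asymmetric Epanechnikov kernel, so the sketch assumes a one-sided version that weights the current day and the following `h` days. The function names and this one-sided choice are assumptions, not the authors' specification.

```python
def epanechnikov(u):
    """Standard Epanechnikov kernel: 0.75 * (1 - u^2) on [-1, 1], zero elsewhere."""
    return 0.75 * (1.0 - u * u) if abs(u) <= 1.0 else 0.0


def one_sided_weights(h):
    """Normalized weights for lags 0..h (assumed asymmetric, forward-looking form)."""
    raw = [epanechnikov(j / (h + 1)) for j in range(h + 1)]
    total = sum(raw)
    return [w / total for w in raw]


def aggregate(series, h):
    """Replace each day's value by a weighted sum over that day and the next h days."""
    weights = one_sided_weights(h)
    return [
        sum(w * series[t + j] for j, w in enumerate(weights))
        for t in range(len(series) - h)
    ]


# Hypothetical daily death counts, for illustration only.
daily_deaths = [12, 15, 9, 20, 14, 11, 18, 13, 16, 10]
smoothed = aggregate(daily_deaths, h=3)
```

The aggregated series is shorter than the input by `h` points, and it would then serve as the response in the regression model.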
Using heart rate to predict energy expenditure in large domestic dogs.

PubMed

Gerth, N; Ruoß, C; Dobenecker, B; Reese, S; Starck, J M

2016-06-01

The aim of this study was to establish heart rate as a measure of energy expenditure in large active kennel dogs (28 ± 3 kg bw). Therefore, the heart rate (HR)-oxygen consumption (V˙O2) relationship was analysed in Foxhound-Boxer-Ingelheim-Labrador cross-breds (FBI dogs) at rest and graded levels of exercise on a treadmill up to 60-65% of maximal aerobic capacity. To test for effects of training, HR and V˙O2 were measured in female dogs, before and after a training period, and after an adjacent training pause to test for reversibility of potential effects. Least squares regression was applied to describe the relationship between HR and V˙O2. The applied training had no statistically significant effect on the HR-V˙O2 regression. A general regression line from all data collected was prepared to establish a general predictive equation for energy expenditure from HR in FBI dogs. The regression equation established in this study enables fast estimation of energy requirement for running activity. The equation is valid for large dogs weighing around 30 kg that run at ground level up to 15 km/h with a heart rate maximum of 190 bpm irrespective of the training level. Journal of Animal Physiology and Animal Nutrition © 2015 Blackwell Verlag GmbH.
On self-propagating methodological flaws in performance normalization for strength and power sports.

PubMed

Arandjelović, Ognjen

2013-06-01

Performance in strength and power sports is greatly affected by a variety of anthropometric factors. The goal of performance normalization is to factor out the effects of confounding factors and compute a canonical (normalized) performance measure from the observed absolute performance. Performance normalization is applied in the ranking of elite athletes, as well as in the early stages of youth talent selection. Consequently, it is crucial that the process is principled and fair. The corpus of previous work on this topic, which is significant, is uniform in the methodology adopted. Performance normalization is universally reduced to a regression task: the collected performance data are used to fit a regression function that is then used to scale future performances. The present article demonstrates that this approach is fundamentally flawed. It inherently creates a bias that unfairly penalizes athletes with certain allometric characteristics, and, by virtue of its adoption in the ranking and selection of elite athletes, propagates and strengthens this bias over time. The main flaws are shown to originate in the criteria for selecting the data used for regression, as well as in the manner in which the regression model is applied in normalization. This analysis brings into light the aforesaid methodological flaws and motivates further work on the development of principled methods, the foundations of which are also laid out in this work.
Independent contrasts and PGLS regression estimators are equivalent.

PubMed

Blomberg, Simon P; Lefevre, James G; Wells, Jessie A; Waterhouse, Mary

2012-05-01

We prove that the slope parameter of the ordinary least squares regression of phylogenetically independent contrasts (PICs) conducted through the origin is identical to the slope parameter of the method of generalized least squares (GLSs) regression under a Brownian motion model of evolution. This equivalence has several implications: 1. Understanding the structure of the linear model for GLS regression provides insight into when and why phylogeny is important in comparative studies. 2. The limitations of the PIC regression analysis are the same as the limitations of the GLS model. In particular, phylogenetic covariance applies only to the response variable in the regression and the explanatory variable should be regarded as fixed. Calculation of PICs for explanatory variables should be treated as a mathematical idiosyncrasy of the PIC regression algorithm. 3. Since the GLS estimator is the best linear unbiased estimator (BLUE), the slope parameter estimated using PICs is also BLUE. 4. If the slope is estimated using different branch lengths for the explanatory and response variables in the PIC algorithm, the estimator is no longer the BLUE, so this is not recommended. Finally, we discuss whether or not and how to accommodate phylogenetic covariance in regression analyses, particularly in relation to the problem of phylogenetic uncertainty. This discussion is from both frequentist and Bayesian perspectives.
Penalized regression procedures for variable selection in the potential outcomes framework

PubMed Central

Ghosh, Debashis; Zhu, Yeying; Coffman, Donna L.

2015-01-01

A recent topic of much interest in causal inference is model selection. In this article, we describe a framework in which to consider penalized regression approaches to variable selection for causal effects. The framework leads to a simple ‘impute, then select’ class of procedures that is agnostic to the type of imputation algorithm as well as penalized regression used. It also clarifies how model selection involves a multivariate regression model for causal inference problems, and that these methods can be applied for identifying subgroups in which treatment effects are homogeneous. Analogies and links with the literature on machine learning methods, missing data and imputation are drawn. A difference LASSO algorithm is defined, along with its multiple imputation analogues. The procedures are illustrated using a well-known right heart catheterization dataset. PMID:25628185
Discrete post-processing of total cloud cover ensemble forecasts

NASA Astrophysics Data System (ADS)

Hemri, Stephan; Haiden, Thomas; Pappenberger, Florian

2017-04-01

This contribution presents an approach to post-process ensemble forecasts for the discrete and bounded weather variable of total cloud cover. Two methods for discrete statistical post-processing of ensemble predictions are tested. The first approach is based on multinomial logistic regression, the second involves a proportional odds logistic regression model. Applying them to total cloud cover raw ensemble forecasts from the European Centre for Medium-Range Weather Forecasts improves forecast skill significantly. Based on station-wise post-processing of raw ensemble total cloud cover forecasts for a global set of 3330 stations over the period from 2007 to early 2014, the more parsimonious proportional odds logistic regression model proved to slightly outperform the multinomial logistic regression model. Reference Hemri, S., Haiden, T., & Pappenberger, F. (2016). Discrete post-processing of total cloud cover ensemble forecasts. Monthly Weather Review 144, 2565-2577.

Computationally efficient algorithm for Gaussian Process regression in case of structured samples

NASA Astrophysics Data System (ADS)

Belyaev, M.; Burnaev, E.; Kapushev, Y.

2016-04-01

Surrogate modeling is widely used in many engineering problems. Data sets often have a Cartesian product structure (for instance, a factorial design of experiments with missing points). In such cases the size of the data set can be very large. Therefore, one of the most popular algorithms for approximation, Gaussian Process regression, can hardly be applied due to its computational complexity. In this paper a computationally efficient approach for constructing Gaussian Process regression for data sets with Cartesian product structure is presented. Efficiency is achieved by exploiting the special structure of the data set and operations with tensors. The proposed algorithm has low computational and memory complexity compared to existing algorithms. In this work we also introduce a regularization procedure that takes into account the anisotropy of the data set and avoids degeneracy of the regression model.
Efficient Regressions via Optimally Combining Quantile Information*

PubMed Central

Zhao, Zhibiao; Xiao, Zhijie

2014-01-01

We develop a generally applicable framework for constructing efficient estimators of regression models via quantile regressions. The proposed method is based on optimally combining information over multiple quantiles and can be applied to a broad range of parametric and nonparametric settings. When combining information over a fixed number of quantiles, we derive an upper bound on the distance between the efficiency of the proposed estimator and the Fisher information. As the number of quantiles increases, this upper bound decreases and the asymptotic variance of the proposed estimator approaches the Cramér-Rao lower bound under appropriate conditions. In the case of non-regular statistical estimation, the proposed estimator leads to super-efficient estimation. We illustrate the proposed method for several widely used regression models. Both asymptotic theory and Monte Carlo experiments show the superior performance over existing methods. PMID:25484481
Clustering performance comparison using K-means and expectation maximization algorithms.

PubMed

Jung, Yong Gyu; Kang, Min Soo; Heo, Jun

2014-11-14

Clustering is an important means of data mining based on separating data categories by similar features. Unlike the classification algorithm, clustering belongs to the unsupervised type of algorithms. Two representatives of the clustering algorithms are the K-means and the expectation maximization (EM) algorithm. Linear regression analysis was extended to the category-type dependent variable, while logistic regression was achieved using a linear combination of independent variables. To predict the possibility of occurrence of an event, a statistical approach is used. However, the classification of all data by means of logistic regression analysis cannot guarantee the accuracy of the results. In this paper, the logistic regression analysis is applied to EM clusters and the K-means clustering method for quality assessment of red wine, and a method is proposed for ensuring the accuracy of the classification results.
Unified heat kernel regression for diffusion, kernel smoothing and wavelets on manifolds and its application to mandible growth modeling in CT images.

PubMed

Chung, Moo K; Qiu, Anqi; Seo, Seongho; Vorperian, Houri K

2015-05-01

We present a novel kernel regression framework for smoothing scalar surface data using the Laplace-Beltrami eigenfunctions. Starting with the heat kernel constructed from the eigenfunctions, we formulate a new bivariate kernel regression framework as a weighted eigenfunction expansion with the heat kernel as the weights. The new kernel method is mathematically equivalent to isotropic heat diffusion, kernel smoothing and recently popular diffusion wavelets. The numerical implementation is validated on a unit sphere using spherical harmonics. As an illustration, the method is applied to characterize the localized growth pattern of mandible surfaces obtained in CT images between ages 0 and 20 by regressing the length of displacement vectors with respect to a surface template. Copyright © 2015 Elsevier B.V. All rights reserved.
Effective Surfactants Blend Concentration Determination for O/W Emulsion Stabilization by Two Nonionic Surfactants by Simple Linear Regression.

PubMed

Hassan, A K

2015-01-01

In this work, O/W emulsion sets were prepared using different concentrations of two nonionic surfactants. The two surfactants, Tween 80 (HLB = 15.0) and Span 80 (HLB = 4.3), were used in a fixed proportion of 0.55:0.45, respectively. The HLB value of the surfactant blend was fixed at 10.185. The surfactant blend concentration ranged from 3% to 19%. For each O/W emulsion set, the conductivity was measured at room temperature (25±2°), 40, 50, 60, 70 and 80°. Applying simple linear regression (least squares) to the temperature-conductivity data determines the effective surfactant blend concentration required for preparing the most stable O/W emulsion. These results were confirmed by physical stability centrifugation testing and phase inversion temperature range measurements. The results indicated that the most stable O/W emulsion has the strongest direct linear relationship between temperature and conductivity. This relationship is linear up to 80°. This work shows that the most stable O/W emulsion is identified by the maximum R² value obtained by applying simple linear regression (least squares) to the temperature-conductivity data up to 80°; in addition, the true maximum slope is given by the equation with the maximum R² value. Because the conditions would change in a more complex formulation, the method for determining the effective surfactant blend concentration was verified by applying it to more complex formulations of 2% O/W miconazole nitrate cream, and the results indicate its reproducibility.
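The selection rule described in this abstract can be sketched in a few lines: compute the weighted HLB of the blend, fit a straight line of conductivity against temperature for each blend concentration, and keep the concentration with the largest R². The conductivity readings below are hypothetical placeholders, not data from the study, and the helper names are illustrative.

```python
def blend_hlb(fractions, hlbs):
    """Weighted-average HLB of a surfactant blend."""
    return sum(f * h for f, h in zip(fractions, hlbs))


def r_squared(xs, ys):
    """R^2 of a simple least-squares line; equals the squared Pearson correlation."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    return (sxy * sxy) / (sxx * syy)


# Blend from the abstract: 0.55 parts at HLB 15.0 and 0.45 parts at HLB 4.3.
hlb = blend_hlb([0.55, 0.45], [15.0, 4.3])  # approximately 10.185

temperatures = [25, 40, 50, 60, 70, 80]
# Hypothetical conductivity readings keyed by blend concentration (%).
conductivity = {
    3: [5.0, 9.0, 10.0, 18.0, 19.0, 30.0],
    11: [22.5, 30.0, 35.0, 40.0, 45.0, 50.0],
    19: [20.0, 21.0, 30.0, 31.0, 45.0, 44.0],
}
best_concentration = max(
    conductivity, key=lambda c: r_squared(temperatures, conductivity[c])
)
```

For a one-predictor model with an intercept, R² is the square of the correlation coefficient, so no explicit slope or intercept is needed to rank the concentrations.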
Economic Insights into Providing Access to Improved Groundwater Sources in Remote, Low-Resource Areas

NASA Astrophysics Data System (ADS)

Abramson, A.; Lazarovitch, N.; Adar, E.

2013-12-01

Groundwater is often the most or only feasible drinking water source in remote, low-resource areas. Yet the economics of its development have not been systematically outlined. We applied CBARWI (Cost-Benefit Analysis for Remote Water Improvements), a recently developed Decision Support System, to investigate the economic, physical and management factors related to the costs and benefits of non-networked groundwater supply in remote areas. Synthetic profiles of community water services (n = 17,962), defined across 14 parameters' values and ranges relevant to remote areas, were imputed into the decision framework, and the parameter effects on economic outcomes were investigated through regression analysis (Table 1). Several approaches were included for financing the improvements, after Abramson et al., 2011: willingness-to-pay (WTP), -borrow (WTB) and -work (WTW) in community irrigation ('water-for-work'). We found that low-cost groundwater development approaches are almost 7 times more cost-effective than conventional boreholes fitted with handpumps. The costs of electric, submersible borehole pumps are comparable only when providing expanded water supplies, and off-grid communities pay significantly more for such expansions. In our model, new source construction is less cost-effective than improvement of existing wells, but necessary for expanding access to isolated households. The financing approach significantly impacts the feasibility of demand-driven cost recovery; in our investigation, benefit exceeds cost in 16, 32 and 48% of water service configurations financed by WTP, WTB and WTW, respectively. Regressions of total cost (R² = 0.723) and net benefit under WTW (R² = 0.829) along with analysis of output distributions indicate that parameters determining the profitability of irrigation are different from those determining costs and other measures of net benefit.
These findings suggest that the cost-benefit outcomes associated with groundwater-based water supply improvements vary considerably by many parameters. Thus, a wide variety of factors should be included to inform water development strategies. Abramson, A. et al. (2011), Willingness to pay, borrow and work for water service improvements in developing countries, Water Resour Res, 47. Table 1: Descriptions, investigated values and regression coefficients of parameters included in our analysis. Rank of standardized β indicates relative importance. Regression dependent variables are in [($ household⁻¹) y⁻¹]. * Parameters relevant to water-for-work program only. † p < .0001. ‡ p < .05.
Complex Environmental Data Modelling Using Adaptive General Regression Neural Networks

NASA Astrophysics Data System (ADS)

Kanevski, Mikhail

2015-04-01

The research deals with the adaptation and application of Adaptive General Regression Neural Networks (GRNN) to high dimensional environmental data. GRNN [1,2,3] are efficient modelling tools for both spatial and temporal data and are based on nonparametric kernel methods closely related to the classical Nadaraya-Watson estimator. Adaptive GRNN, using anisotropic kernels, can also be applied to feature selection tasks when working with high dimensional data [1,3]. In the present research Adaptive GRNN are used to study geospatial data predictability and relevant feature selection using both simulated and real data case studies. The original raw data were either three dimensional monthly precipitation data or monthly wind speeds embedded into a 13 dimensional space constructed from geographical coordinates and geo-features calculated from a digital elevation model. GRNN were applied in two different ways: 1) adaptive GRNN with the resulting list of features ordered according to their relevancy; and 2) adaptive GRNN applied to evaluate all possible models N [in the case of wind fields N=(2^13 -1)=8191] and rank them according to the cross-validation error. In both cases training was carried out using a leave-one-out procedure. An important result of the study is that the set of the most relevant features depends on the month (strong seasonal effect) and year. The predictabilities of precipitation and wind field patterns, estimated using the cross-validation and testing errors of raw and shuffled data, were studied in detail. The results of both approaches were qualitatively and quantitatively compared. In conclusion, Adaptive GRNN, with their ability to select features and efficiently model complex high dimensional data, can be widely used in automatic/on-line mapping and as an integrated part of environmental decision support systems. 1. Kanevski M., Pozdnoukhov A., Timonin V. Machine Learning for Spatial Environmental Data. Theory, applications and software. EPFL Press. 
With a CD: data, software, guides. (2009). 2. Kanevski M. Spatial Predictions of Soil Contamination Using General Regression Neural Networks. Systems Research and Information Systems, Volume 8, number 4, 1999. 3. Robert S., Foresti L., Kanevski M. Spatial prediction of monthly wind speeds in complex terrain with adaptive general regression neural networks. International Journal of Climatology, 33 pp. 1793-1804, 2013.
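The kernel estimator underlying GRNN, and the leave-one-out training mentioned in the abstract above, can be sketched for a single input dimension as follows. This is a simplified stand-in: it uses one isotropic Gaussian bandwidth and a small hypothetical data set, whereas the adaptive GRNN in the abstract uses anisotropic kernels with one bandwidth per feature. All names and values are illustrative.

```python
import math


def nw_predict(x0, xs, ys, sigma, skip=None):
    """Nadaraya-Watson estimate at x0 with a Gaussian kernel of width sigma."""
    num = 0.0
    den = 0.0
    for i, (x, y) in enumerate(zip(xs, ys)):
        if i == skip:  # leave this training point out
            continue
        w = math.exp(-((x0 - x) ** 2) / (2.0 * sigma ** 2))
        num += w * y
        den += w
    if den == 0.0:
        # All weights underflowed to zero: fall back to the mean of the kept points.
        kept = [y for i, y in enumerate(ys) if i != skip]
        return sum(kept) / len(kept)
    return num / den


def loo_mse(xs, ys, sigma):
    """Leave-one-out mean squared error for a given bandwidth."""
    errors = [
        (ys[i] - nw_predict(xs[i], xs, ys, sigma, skip=i)) ** 2
        for i in range(len(xs))
    ]
    return sum(errors) / len(errors)


# Hypothetical one-dimensional data, for illustration only.
xs = [0.0, 1.0, 2.0, 3.0, 4.0, 5.0]
ys = [0.1, 0.9, 2.2, 2.8, 4.1, 5.0]
grid = [0.25, 0.5, 1.0, 2.0]
best_sigma = min(grid, key=lambda s: loo_mse(xs, ys, s))
```

In the anisotropic setting, a large tuned bandwidth for a feature means that feature barely affects the weights, which is what makes the bandwidths usable as a relevance ranking.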
Can attachment and peer relation constructs predict anxiety in ethnic minority youths? A longitudinal exploratory study.

PubMed

Esbjørn, Barbara Hoff; Breinholst, Sonja; Kriss, Alexander; Hald, Helle Hindhede; Steele, Howard

2015-01-01

Anxiety is the most prevalent psychiatric disturbance in childhood, typically affecting 15-20% of all youth. It has been associated with attachment insecurity and reduced competence in peer relations. Prior work has been limited by including mainly White samples, relying on questionnaires, and applying a cross-sectional design. The present study addressed these limitations by considering how at-risk non-White youth (n = 34) responded to the Friends and Family Interview (FFI) in middle childhood and how this linked up with anxiety symptoms and an anxiety diagnosis three years later in early adolescence. Five dimensions of secure attachment, namely, (i) to mother, (ii) to father, (iii) coherence, (iv) developmental understanding, and (v) social competence and quality of contact with best friend in middle childhood, were found to correlate significantly (and negatively) with self-reported anxiety symptoms. Linear regression results showed independent influences of female gender, and (low) quality of best friend contact as the most efficient model predicting anxiety symptoms. Logistic regression results suggested a model that included female gender, low social competence, and immature developmental understanding as efficient predictors of an anxiety diagnosis, evident in only 18% of the sample. These results point to the usefulness of after-school programs for at-risk minority youth in promoting peer competence, developmental awareness, and minimizing anxiety difficulties.
Outcome and prognostic factors in critically ill patients with systemic lupus erythematosus: a retrospective study.

PubMed

Hsu, Chia-Lin; Chen, Kuan-Yu; Yeh, Pu-Sheng; Hsu, Yeong-Long; Chang, Hou-Tai; Shau, Wen-Yi; Yu, Chia-Li; Yang, Pan-Chyr

2005-06-01

Systemic lupus erythematosus (SLE) is an archetypal autoimmune disease, involving multiple organ systems with varying course and prognosis. However, there is a paucity of clinical data regarding prognostic factors in SLE patients admitted to the intensive care unit (ICU). From January 1992 to December 2000, all patients admitted to the ICU with a diagnosis of SLE were included. Patients were excluded if the diagnosis of SLE was established at or after ICU admission. A multivariate logistic regression model was applied using Acute Physiology and Chronic Health Evaluation II scores and variables that were at least moderately associated (P < 0.2) with survival in the univariate analysis. A total of 51 patients meeting the criteria were included. The mortality rate was 47%. The most common cause of admission was pneumonia with acute respiratory distress syndrome. Multivariate logistic regression analysis showed that intracranial haemorrhage occurring while the patient was in the ICU (relative risk = 18.68), complicating gastrointestinal bleeding (relative risk = 6.97) and concurrent septic shock (relative risk = 77.06) were associated with greater risk of dying, whereas causes of ICU admission and Acute Physiology and Chronic Health Evaluation II score were not significantly associated with death. The mortality rate in critically ill SLE patients was high. Gastrointestinal bleeding, intracranial haemorrhage and septic shock were significant prognostic factors in SLE patients admitted to the ICU.
Neural network modeling for surgical decisions on traumatic brain injury patients.

PubMed

Li, Y C; Liu, L; Chiu, W T; Jian, W S

2000-01-01

Computerized medical decision support systems have been a major research topic in recent years. Intelligent computer programs were implemented to aid physicians and other medical professionals in making difficult medical decisions. This report compares three different mathematical models for building a traumatic brain injury (TBI) medical decision support system (MDSS). These models were developed based on a large TBI patient database. This MDSS accepts a set of patient data, such as the types of skull fracture, Glasgow Coma Scale (GCS) and episode of convulsion, and returns the chance that a neurosurgeon would recommend an open-skull surgery for this patient. The three mathematical models described in this report are a logistic regression model, a multi-layer perceptron (MLP) neural network and a radial-basis-function (RBF) neural network. Of the 12,640 patients selected from the database, 9480 randomly drawn cases were used as the training group to develop/train our models. The other 3160 cases formed the validation group, which we used to evaluate the performance of these models. We used sensitivity, specificity, areas under the receiver-operating characteristic (ROC) curve and calibration curves as indicators of how accurate these models are in predicting a neurosurgeon's decision on open-skull surgery. The results showed that, assuming equal importance of sensitivity and specificity, the logistic regression model had a (sensitivity, specificity) of (73%, 68%), compared to (80%, 80%) from the RBF model and (88%, 80%) from the MLP model. The resultant areas under the ROC curve for logistic regression, RBF and MLP neural networks are 0.761, 0.880 and 0.897, respectively (P < 0.05). Among these models, the logistic regression has noticeably poorer calibration. This study demonstrated the feasibility of applying neural networks as the mechanism for TBI decision support systems based on clinical databases. 
The results also suggest that neural networks may be a better solution for complex, non-linear medical decision support systems than conventional statistical techniques such as logistic regression.
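The evaluation measures named in this abstract can be computed from a validation set as in the sketch below. It assumes binary labels (1 = surgery recommended) and, for the area under the ROC curve, uses the pairwise-ranking definition: the probability that a randomly chosen positive case receives a higher score than a randomly chosen negative case. The labels and scores are hypothetical.

```python
def sensitivity_specificity(y_true, y_pred):
    """Return (sensitivity, specificity) for binary labels coded as 0/1."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    return tp / (tp + fn), tn / (tn + fp)


def roc_auc(y_true, scores):
    """AUC as the fraction of positive-negative pairs ranked correctly (ties count 0.5)."""
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Hypothetical validation labels, thresholded predictions and model scores.
labels = [1, 1, 1, 1, 0, 0, 0, 0, 0]
predicted = [1, 1, 1, 0, 0, 0, 0, 1, 1]
sens, spec = sensitivity_specificity(labels, predicted)
auc = roc_auc([1, 1, 1, 0, 0], [0.9, 0.8, 0.4, 0.5, 0.3])
```

The pairwise loop is quadratic in the number of cases, which is fine for a sketch but would be replaced by a rank-based computation on a validation set of several thousand patients.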
Estimation of lung tumor position from multiple anatomical features on 4D-CT using multiple regression analysis.

PubMed

Ono, Tomohiro; Nakamura, Mitsuhiro; Hirose, Yoshinori; Kitsuda, Kenji; Ono, Yuka; Ishigaki, Takashi; Hiraoka, Masahiro

2017-09-01

To estimate the lung tumor position from multiple anatomical features on four-dimensional computed tomography (4D-CT) data sets using single regression analysis (SRA) and multiple regression analysis (MRA) approaches, and to evaluate the impact of these approaches on internal target volume (ITV) for stereotactic body radiotherapy (SBRT) of the lung. Eleven consecutive lung cancer patients (12 cases) underwent 4D-CT scanning. The three-dimensional (3D) lung tumor motion exceeded 5 mm. The 3D tumor position and anatomical features, including lung volume, diaphragm, abdominal wall, and chest wall positions, were measured on 4D-CT images. The tumor position was estimated by SRA using each anatomical feature and MRA using all anatomical features. The difference between the actual and estimated tumor positions was defined as the root-mean-square error (RMSE). A standard partial regression coefficient for the MRA was evaluated. The 3D lung tumor position showed a high correlation with the lung volume (R = 0.92 ± 0.10). Additionally, ITVs derived from SRA and MRA approaches were compared with ITV derived from contouring gross tumor volumes on all 10 phases of the 4D-CT (conventional ITV). The RMSE of the SRA was within 3.7 mm in all directions. Also, the RMSE of the MRA was within 1.6 mm in all directions. The standard partial regression coefficient for the lung volume was the largest and had the most influence on the estimated tumor position. Compared with conventional ITV, the average percentage decrease in ITV was 31.9% and 38.3% using SRA and MRA approaches, respectively. The estimation accuracy of lung tumor position was improved by the MRA approach, which provided smaller ITV than conventional ITV. © 2017 The Authors. Journal of Applied Clinical Medical Physics published by Wiley Periodicals, Inc. on behalf of American Association of Physicists in Medicine.
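The error measure used in this abstract, the root-mean-square error between actual and estimated positions, can be written as a short function. The positions below are hypothetical values along one direction, in millimetres, and are not taken from the study.

```python
import math


def rmse(actual, estimated):
    """Root-mean-square error between two equally long sequences."""
    squared = [(a - e) ** 2 for a, e in zip(actual, estimated)]
    return math.sqrt(sum(squared) / len(squared))


# Hypothetical tumor positions over four breathing phases (mm).
actual_mm = [0.0, 4.0, 9.0, 5.0]
estimated_mm = [1.0, 3.0, 10.0, 5.0]
error_mm = rmse(actual_mm, estimated_mm)
```

Computing the RMSE separately for each direction gives per-axis figures of the kind the abstract reports for the SRA and MRA estimates.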
A graphical method to evaluate spectral preprocessing in multivariate regression calibrations: example with Savitzky-Golay filters and partial least squares regression.

PubMed

Delwiche, Stephen R; Reeves, James B

2010-01-01

In multivariate regression analysis of spectroscopy data, spectral preprocessing is often performed to reduce unwanted background information (offsets, sloped baselines) or accentuate absorption features in intrinsically overlapping bands. These procedures, also known as pretreatments, are commonly smoothing operations or derivatives. While such operations are often useful in reducing the number of latent variables of the actual decomposition and lowering residual error, they also run the risk of misleading the practitioner into accepting calibration equations that are poorly adapted to samples outside of the calibration. The current study developed a graphical method to examine this effect on partial least squares (PLS) regression calibrations of near-infrared (NIR) reflection spectra of ground wheat meal with two analytes, protein content and sodium dodecyl sulfate sedimentation (SDS) volume (an indicator of the quantity of the gluten proteins that contribute to strong doughs). These two properties were chosen because of their differing abilities to be modeled by NIR spectroscopy: excellent for protein content, fair for SDS sedimentation volume. To further demonstrate the potential pitfalls of preprocessing, an artificial component, a randomly generated value, was included in PLS regression trials. Savitzky-Golay (digital filter) smoothing, first-derivative, and second-derivative preprocess functions (5 to 25 centrally symmetric convolution points, derived from quadratic polynomials) were applied to PLS calibrations of 1 to 15 factors. The results demonstrated the danger of an overreliance on preprocessing when (1) the number of samples used in a multivariate calibration is low (<50), (2) the spectral response of the analyte is weak, and (3) the goodness of the calibration is based on the coefficient of determination (R²) rather than a term based on residual error. 
The graphical method has application to the evaluation of other preprocess functions and various types of spectroscopy data.
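As a rough illustration of the Savitzky-Golay preprocessing named in this record, the sketch below applies the standard 5-point, quadratic-polynomial convolution weights for smoothing, first derivative, and second derivative. It is not the authors' code: the 5-point window is only the smallest size in the range the abstract cites, unit spacing between spectral points is assumed, edge points are simply dropped, and the function and variable names are hypothetical.

```python
# Standard Savitzky-Golay weights for a 5-point window and a quadratic fit
# (assumes unit spacing between adjacent spectral points).
SMOOTH_5 = [-3 / 35, 12 / 35, 17 / 35, 12 / 35, -3 / 35]
FIRST_DERIV_5 = [-2 / 10, -1 / 10, 0.0, 1 / 10, 2 / 10]
SECOND_DERIV_5 = [2 / 7, -1 / 7, -2 / 7, -1 / 7, 2 / 7]


def savgol_apply(values, weights):
    """Apply centrally symmetric weights at interior points only.

    The first and last len(weights) // 2 points are dropped rather than
    padded, so the output is shorter than the input.
    """
    half = len(weights) // 2
    out = []
    for i in range(half, len(values) - half):
        window = values[i - half : i + half + 1]
        out.append(sum(w * v for w, v in zip(weights, window)))
    return out


# Toy "spectrum": an exact quadratic, which a quadratic-fit filter reproduces.
spectrum = [float(x * x) for x in range(10)]
smoothed = savgol_apply(spectrum, SMOOTH_5)       # recovers x**2 at x = 2..7
d1 = savgol_apply(spectrum, FIRST_DERIV_5)        # recovers 2*x at x = 2..7
d2 = savgol_apply(spectrum, SECOND_DERIV_5)       # recovers 2 everywhere
```

On real spectra the filtered output would then be passed to the PLS calibration; that step is omitted here.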
Analysis of Binary Adherence Data in the Setting of Polypharmacy: A Comparison of Different Approaches

PubMed Central

Esserman, Denise A.; Moore, Charity G.; Roth, Mary T.

2009-01-01

Older community dwelling adults often take multiple medications for numerous chronic diseases. Non-adherence to these medications can have a large public health impact. Therefore, the measurement and modeling of medication adherence in the setting of polypharmacy is an important area of research. We apply a variety of different modeling techniques (standard linear regression; weighted linear regression; adjusted linear regression; naïve logistic regression; beta-binomial (BB) regression; generalized estimating equations (GEE)) to binary medication adherence data from a study in a North Carolina based population of older adults, where each medication an individual was taking was classified as adherent or non-adherent. In addition, through simulation we compare these different methods based on Type I error rates, bias, power, empirical 95% coverage, and goodness of fit. We find that estimation and inference using GEE is robust to a wide variety of scenarios and we recommend using this in the setting of polypharmacy when adherence is dichotomously measured for multiple medications per person. PMID:20414358
[Local Regression Algorithm Based on Net Analyte Signal and Its Application in Near Infrared Spectral Analysis].

PubMed

Zhang, Hong-guang; Lu, Jian-gang

2016-02-01

To overcome the problems of significant differences among samples and nonlinearity between the property and spectra of samples in spectral quantitative analysis, a local regression algorithm is proposed in this paper. In this algorithm, the net analyte signal (NAS) method was first used to obtain the net analyte signals of the calibration samples and unknown samples; then the Euclidean distance between the net analyte signal of the unknown sample and the net analyte signals of the calibration samples was calculated and used as a similarity index. According to the defined similarity index, a local calibration set was individually selected for each unknown sample. Finally, a local PLS regression model was built on the local calibration set of each unknown sample. The proposed method was applied to a set of near infrared spectra of meat samples. The results demonstrate that the prediction precision and model complexity of the proposed method are superior to those of the global PLS regression method and the conventional local regression algorithm based on spectral Euclidean distance.
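The selection step in this record can be sketched as follows. This is a minimal illustration only: it assumes the net analyte signal vectors have already been computed (the NAS calculation itself is not shown), the local set size `k` is a hypothetical parameter the abstract does not specify, and all names are invented for the example.

```python
import math


def euclidean(a, b):
    """Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def select_local_set(unknown_nas, calibration_nas, k):
    """Return indices of the k calibration samples most similar to the unknown.

    Similarity is the Euclidean distance between precomputed net analyte
    signal vectors; smaller distance means more similar.
    """
    ranked = sorted(
        range(len(calibration_nas)),
        key=lambda i: euclidean(unknown_nas, calibration_nas[i]),
    )
    return ranked[:k]


# Toy NAS vectors for four calibration samples (made-up numbers).
calibration_nas = [[0.0, 0.0], [1.0, 1.0], [5.0, 5.0], [0.5, 0.2]]
neighbours = select_local_set([0.4, 0.3], calibration_nas, k=2)
```

A local PLS model would then be fitted on the samples indexed by `neighbours`, once per unknown sample.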
Calibrated Multivariate Regression with Application to Neural Semantic Basis Discovery.

PubMed

Liu, Han; Wang, Lie; Zhao, Tuo

2015-08-01

We propose a calibrated multivariate regression method named CMR for fitting high dimensional multivariate regression models. Compared with existing methods, CMR calibrates regularization for each regression task with respect to its noise level so that it simultaneously attains improved finite-sample performance and tuning insensitiveness. Theoretically, we provide sufficient conditions under which CMR achieves the optimal rate of convergence in parameter estimation. Computationally, we propose an efficient smoothed proximal gradient algorithm with a worst-case numerical rate of convergence O(1/ϵ), where ϵ is a pre-specified accuracy of the objective function value. We conduct thorough numerical simulations to illustrate that CMR consistently outperforms other high dimensional multivariate regression methods. We also apply CMR to solve a brain activity prediction problem and find that it is as competitive as a handcrafted model created by human experts. The R package camel implementing the proposed method is available on the Comprehensive R Archive Network http://cran.r-project.org/web/packages/camel/.
Learning to Predict Combinatorial Structures

NASA Astrophysics Data System (ADS)

Vembu, Shankar

2009-12-01

The major challenge in designing a discriminative learning algorithm for predicting structured data is to address the computational issues arising from the exponential size of the output space. Existing algorithms make different assumptions to ensure efficient, polynomial time estimation of model parameters. For several combinatorial structures, including cycles, partially ordered sets, permutations and other graph classes, these assumptions do not hold. In this thesis, we address the problem of designing learning algorithms for predicting combinatorial structures by introducing two new assumptions: (i) The first assumption is that a particular counting problem can be solved efficiently. The consequence is a generalisation of the classical ridge regression for structured prediction. (ii) The second assumption is that a particular sampling problem can be solved efficiently. The consequence is a new technique for designing and analysing probabilistic structured prediction models. These results can be applied to solve several complex learning problems including but not limited to multi-label classification, multi-category hierarchical classification, and label ranking.
In Silico Prediction of Chemical Toxicity for Drug Design Using Machine Learning Methods and Structural Alerts

NASA Astrophysics Data System (ADS)

Yang, Hongbin; Sun, Lixia; Li, Weihua; Liu, Guixia; Tang, Yun

2018-02-01

For a drug, safety is always the most important issue, including a variety of toxicities and adverse drug effects, which should be evaluated in preclinical and clinical trial phases. This review article first briefly introduces the computational methods used in prediction of chemical toxicity for drug design, including machine learning methods and structural alerts. Machine learning methods have been widely applied in qualitative classification and quantitative regression studies, while structural alerts can be regarded as a complementary tool for lead optimization. The emphasis of this article is on recent progress in predictive models built for various toxicities. Available databases and web servers are also provided. Though the methods and models are very helpful for drug design, some challenges and limitations remain to be addressed for drug safety assessment in the future.
Discrimination of honeys using colorimetric sensor arrays, sensory analysis and gas chromatography techniques.

PubMed

Tahir, Haroon Elrasheid; Xiaobo, Zou; Xiaowei, Huang; Jiyong, Shi; Mariod, Abdalbasit Adam

2016-09-01

Aroma profiles of six honey varieties of different botanical origins were investigated using colorimetric sensor array, gas chromatography-mass spectrometry (GC-MS) and descriptive sensory analysis. Fifty-eight aroma compounds were identified, including 2 norisoprenoids, 5 hydrocarbons, 4 terpenes, 6 phenols, 7 ketones, 9 acids, 12 aldehydes and 13 alcohols. Twenty abundant or active compounds were chosen as key compounds to characterize honey aroma. Discrimination of the honeys was subsequently implemented using multivariate analysis, including hierarchical clustering analysis (HCA) and principal component analysis (PCA). Honeys of the same botanical origin were grouped together in the PCA score plot and HCA dendrogram. SPME-GC/MS and colorimetric sensor array were able to discriminate the honeys effectively with the advantages of being rapid, simple and low-cost. Moreover, partial least squares regression (PLSR) was applied to indicate the relationship between sensory descriptors and aroma compounds. Copyright © 2016 Elsevier Ltd. All rights reserved.
In Silico Prediction of Chemical Toxicity for Drug Design Using Machine Learning Methods and Structural Alerts

PubMed Central

Yang, Hongbin; Sun, Lixia; Li, Weihua; Liu, Guixia; Tang, Yun

2018-01-01

During drug development, safety is always the most important issue, including a variety of toxicities and adverse drug effects, which should be evaluated in preclinical and clinical trial phases. This review article first briefly introduces the computational methods used in prediction of chemical toxicity for drug design, including machine learning methods and structural alerts. Machine learning methods have been widely applied in qualitative classification and quantitative regression studies, while structural alerts can be regarded as a complementary tool for lead optimization. The emphasis of this article is on recent progress in predictive models built for various toxicities. Available databases and web servers are also provided. Though the methods and models are very helpful for drug design, some challenges and limitations remain to be addressed for drug safety assessment in the future. PMID:29515993
Londrina Activities of Daily Living Protocol: Reproducibility, Validity, and Reference Values in Physically Independent Adults Age 50 Years and Older.

PubMed

Paes, Thaís; Belo, Letícia Fernandes; da Silva, Diego Rodrigues; Morita, Andrea Akemi; Donária, Leila; Furlanetto, Karina Couto; Sant'Anna, Thaís; Pitta, Fabio; Hernandes, Nidia Aparecida

2017-03-01

It is important to assess activities of daily living (ADL) in older adults due to impairment of independence and quality of life. However, there is no objective and standardized protocol available to assess this outcome. Thus, the aim of this study was to verify the reproducibility and validity of a new protocol for ADL assessment applied in physically independent adults age ≥50 y, the Londrina ADL protocol, and to establish an equation to predict reference values of the Londrina ADL protocol. Ninety-three physically independent adults age ≥50 y had their performance in ADL evaluated by registering the time spent to conclude the protocol. The protocol was performed twice. The 6-min walk test, which assesses functional exercise capacity, was used as a validation criterion. A multiple linear regression model was applied, including anthropometric and demographic variables that correlated with the protocol, to establish an equation to predict the protocol's reference values. In general, the protocol was reproducible (intraclass correlation coefficient 0.91). The average difference between the first and second protocol was 5.3%. The new protocol was valid to assess ADL performance in the studied subjects, presenting a moderate correlation with the 6-min walk test (r = -0.53). The time spent to perform the protocol correlated significantly with age (r = 0.45) but neither with weight (r = -0.17) nor with height (r = -0.17). A model of stepwise multiple regression including sex and age showed that age was the only determinant factor to the Londrina ADL protocol, explaining 21% (P < .001) of its variability. The derived reference equation was: predicted Londrina ADL protocol time (s) = 135.618 + (3.102 × age [y]). The Londrina ADL protocol was reproducible and valid in physically independent adults age ≥50 y. A reference equation for the protocol was established including only age as an independent variable (r² = 0.21), allowing a better interpretation of the protocol's results in clinical practice. Copyright © 2017 by Daedalus Enterprises.
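The reference equation reported in this record can be evaluated directly. The sketch below simply encodes the two published coefficients; the function name and the example age are hypothetical, and the equation was derived only for physically independent adults age ≥50 y, so values outside that range are extrapolation.

```python
def londrina_adl_predicted_seconds(age_years):
    """Predicted time (s) to complete the Londrina ADL protocol.

    Coefficients are taken from the abstract:
    predicted time (s) = 135.618 + 3.102 * age (y).
    """
    return 135.618 + 3.102 * age_years


# Example: predicted completion time for a 60-year-old.
predicted_60 = londrina_adl_predicted_seconds(60)  # 135.618 + 186.12 seconds
```

Because age explains only 21% of the variability, an individual's measured time can differ substantially from this prediction.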
