A Two-Step Penalized Regression Method with Networked Predictors
Luo, Chong; Pan, Wei; Shen, Xiaotong
2011-01-01
Penalized regression incorporating prior dependency structure of predictors can be effective in high-dimensional data analysis (Li and Li 2008). Pan, Xie and Shen (2010) proposed a penalized regression method for better outcome prediction and variable selection by smoothing parameters over a given predictor network, which can be applied to analysis of microarray data with a given gene network. In this paper, we develop two modifications to their method for further performance enhancement. First, we employ convex programming and show its improved performance over an approximate optimization algorithm implemented in their original proposal. Second, we perform bias reduction after initial variable selection through a new penalty, leading to better parameter estimates and outcome prediction. Simulations have demonstrated substantial performance improvement of the proposed modifications over the original method. PMID:23795219
Liu, Jin; Wang, Kai; Ma, Shuangge; Huang, Jian
2013-01-01
Penalized regression methods are becoming increasingly popular in genome-wide association studies (GWAS) for identifying genetic markers associated with disease. However, standard penalized methods such as LASSO do not take into account the possible linkage disequilibrium between adjacent markers. We propose a novel penalized approach for GWAS using a dense set of single nucleotide polymorphisms (SNPs). The proposed method uses the minimax concave penalty (MCP) for marker selection and incorporates linkage disequilibrium (LD) information by penalizing the difference of the genetic effects at adjacent SNPs with high correlation. A coordinate descent algorithm is derived to implement the proposed method. This algorithm is efficient in dealing with a large number of SNPs. A multi-split method is used to calculate the p-values of the selected SNPs for assessing their significance. We refer to the proposed penalty function as the smoothed MCP and the proposed approach as the SMCP method. Performance of the proposed SMCP method and its comparison with LASSO and MCP approaches are evaluated through simulation studies, which demonstrate that the proposed method is more accurate in selecting associated SNPs. Its applicability to real data is illustrated using heterogeneous stock mice data and a rheumatoid arthritis.
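As a rough illustration of the penalty described in this abstract, the sketch below evaluates the standard minimax concave penalty and adds a smoothing term on adjacent effects. The MCP formula is the usual one; the quadratic form of the smoothing term, the function names, and the `weights` argument (standing in for an LD measure between adjacent SNPs) are assumptions made for illustration and are not taken from the paper.

```python
import numpy as np

def mcp_penalty(t, lam, gamma):
    """Standard MCP: lam*|t| - t^2/(2*gamma) up to |t| = gamma*lam, constant beyond."""
    a = np.abs(np.asarray(t, dtype=float))
    inner = lam * a - a ** 2 / (2.0 * gamma)
    outer = 0.5 * gamma * lam ** 2
    return np.where(a <= gamma * lam, inner, outer)

def smoothed_mcp(beta, lam, gamma, eta, weights):
    """Hypothetical smoothed MCP: MCP on each effect plus a weighted
    quadratic penalty on differences between adjacent effects.
    `weights` has length len(beta) - 1 (assumed LD-based weights)."""
    beta = np.asarray(beta, dtype=float)
    diffs = np.diff(beta)  # beta[j+1] - beta[j]
    smooth = 0.5 * eta * np.sum(np.asarray(weights) * diffs ** 2)
    return mcp_penalty(beta, lam, gamma).sum() + smooth
```

Unlike the lasso penalty, the MCP term flattens out for large effects, so strong signals are not shrunk further; the smoothing term only acts on pairs given a nonzero weight.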
Pineda, Silvia; Real, Francisco X; Kogevinas, Manolis; Carrato, Alfredo; Chanock, Stephen J; Malats, Núria; Van Steen, Kristel
2015-12-01
Omics data integration is becoming necessary to investigate the genomic mechanisms involved in complex diseases. During the integration process, many challenges arise such as data heterogeneity, the smaller number of individuals in comparison to the number of parameters, multicollinearity, and interpretation and validation of results due to their complexity and lack of knowledge about biological processes. To overcome some of these issues, innovative statistical approaches are being developed. In this work, we propose a permutation-based method to concomitantly assess significance and correct for multiple testing with the MaxT algorithm. This was applied with penalized regression methods (LASSO and ENET) when exploring relationships between common genetic variants, DNA methylation and gene expression measured in bladder tumor samples. The overall analysis flow consisted of three steps: (1) SNPs/CpGs were selected for each gene probe within a 1 Mb window upstream and downstream of the gene; (2) LASSO and ENET were applied to assess the association between each expression probe and the selected SNPs/CpGs in three multivariable models (SNP, CpG, and Global models, the latter integrating SNPs and CpGs); and (3) the significance of each model was assessed using the permutation-based MaxT method. We identified 48 genes whose expression levels were significantly associated with both SNPs and CpGs. Importantly, 36 (75%) of them were replicated in an independent data set (TCGA) and the performance of the proposed method was checked with a simulation study. We further support our results with a biological interpretation based on an enrichment analysis. The approach we propose allows reducing computational time and is flexible and easy to implement when analyzing several types of omics data. Our results highlight the importance of integrating omics data by applying appropriate statistical strategies to discover new insights into the complex genetic mechanisms involved in disease.
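The permutation-based MaxT idea mentioned in this abstract can be sketched generically as a single-step maxT adjustment. This is a minimal illustration, not the authors' pipeline: the test statistic here is an absolute Pearson correlation between each column and the outcome (the abstract applies the method to penalized regression models instead), and the function names and defaults are hypothetical.

```python
import numpy as np

def abs_corr(X, y):
    """Absolute Pearson correlation between each column of X and y."""
    Xc = X - X.mean(axis=0)
    yc = y - y.mean()
    return np.abs(Xc.T @ yc) / (np.linalg.norm(Xc, axis=0) * np.linalg.norm(yc))

def maxt_adjusted_pvalues(X, y, n_perm=999, seed=0):
    """Single-step maxT: compare each observed statistic with the
    permutation distribution of the maximum statistic over all tests."""
    rng = np.random.default_rng(seed)
    observed = abs_corr(X, y)
    # For each permutation of y, keep only the largest statistic.
    max_null = np.array(
        [abs_corr(X, rng.permutation(y)).max() for _ in range(n_perm)]
    )
    exceed = (max_null[:, None] >= observed[None, :]).sum(axis=0)
    # Add-one correction keeps the adjusted p-values strictly positive.
    return (1 + exceed) / (n_perm + 1)
```

Because every test is compared against the same null distribution of the maximum, significance assessment and multiplicity correction happen in one pass over the permutations.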
Yi, Hui; Breheny, Patrick; Imam, Netsanet; Liu, Yongmei; Hoeschele, Ina
2015-01-01
The data from genome-wide association studies (GWAS) in humans are still predominantly analyzed using single-marker association methods. As an alternative to single-marker analysis (SMA), all or subsets of markers can be tested simultaneously. This approach requires a form of penalized regression (PR) as the number of SNPs is much larger than the sample size. Here we review PR methods in the context of GWAS, extend them to perform penalty parameter and SNP selection by false discovery rate (FDR) control, and assess their performance in comparison with SMA. PR methods were compared with SMA, using realistically simulated GWAS data with a continuous phenotype and real data. Based on these comparisons our analytic FDR criterion may currently be the best approach to SNP selection using PR for GWAS. We found that PR with FDR control provides substantially more power than SMA with genome-wide type-I error control but somewhat less power than SMA with Benjamini–Hochberg FDR control (SMA-BH). PR with FDR-based penalty parameter selection controlled the FDR somewhat conservatively while SMA-BH may not achieve FDR control in all situations. Differences among PR methods seem quite small when the focus is on SNP selection with FDR control. Incorporating linkage disequilibrium into the penalization by adapting penalties developed for covariates measured on graphs can improve power but also generate more false positives or wider regions for follow-up. We recommend the elastic net with a mixing weight for the Lasso penalty near 0.5 as the best method. PMID:25354699
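For readers unfamiliar with the elastic net recommended above, the following is a minimal coordinate descent sketch with a mixing weight of 0.5 as the default. It assumes the common parameterization (1/(2n))·||y − Xb||² + lam·(alpha·||b||₁ + (1 − alpha)/2·||b||²), no intercept, and no all-zero columns; the function names are hypothetical, and the FDR-based penalty selection described in the abstract is not implemented here.

```python
import numpy as np

def soft_threshold(z, t):
    """Soft-thresholding operator used by lasso-type updates."""
    return np.sign(z) * np.maximum(np.abs(z) - t, 0.0)

def elastic_net_cd(X, y, lam, alpha=0.5, n_iter=200):
    """Cyclic coordinate descent for the elastic net (no intercept)."""
    n, p = X.shape
    beta = np.zeros(p)
    resid = y.astype(float).copy()          # residual y - X @ beta
    col_ss = (X ** 2).sum(axis=0) / n       # per-column mean squares
    for _ in range(n_iter):
        for j in range(p):
            # Correlation of column j with the partial residual.
            rho = X[:, j] @ resid / n + col_ss[j] * beta[j]
            new = soft_threshold(rho, lam * alpha) / (col_ss[j] + lam * (1 - alpha))
            if new != beta[j]:
                resid -= X[:, j] * (new - beta[j])
                beta[j] = new
    return beta
```

With alpha = 1 the update reduces to the lasso and with alpha = 0 to ridge regression; values near 0.5 combine sparsity with grouping of correlated predictors such as SNPs in linkage disequilibrium.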
Marginal longitudinal semiparametric regression via penalized splines
Kadiri, M. Al; Carroll, R.J.; Wand, M.P.
2010-01-01
We study the marginal longitudinal nonparametric regression problem and some of its semiparametric extensions. We point out that, while several elaborate proposals for efficient estimation have been proposed, a relatively simple and straightforward one, based on penalized splines, has not. After describing our approach, we then explain how Gibbs sampling and the BUGS software can be used to achieve quick and effective implementation. Illustrations are provided for nonparametric regression and additive models. PMID:21037941
Penalized variable selection in competing risks regression.
Fu, Zhixuan; Parikh, Chirag R; Zhou, Bingqing
2017-07-01
Penalized variable selection methods have been extensively studied for standard time-to-event data. Such methods cannot be directly applied when subjects are at risk of multiple mutually exclusive events, known as competing risks. The proportional subdistribution hazard (PSH) model proposed by Fine and Gray (J Am Stat Assoc 94:496-509, 1999) has become a popular semi-parametric model for time-to-event data with competing risks. It allows for direct assessment of covariate effects on the cumulative incidence function. In this paper, we propose a general penalized variable selection strategy that simultaneously handles variable selection and parameter estimation in the PSH model. We rigorously establish the asymptotic properties of the proposed penalized estimators and modify the coordinate descent algorithm for implementation. Simulation studies are conducted to demonstrate the good performance of the proposed method. Data from deceased donor kidney transplants from the United Network of Organ Sharing illustrate the utility of the proposed method.
Penalized Bregman divergence for large-dimensional regression and classification
Zhang, Chunming; Jiang, Yuan; Chai, Yi
2010-01-01
Summary Regularization methods are characterized by loss functions measuring data fits and penalty terms constraining model parameters. The commonly used quadratic loss is not suitable for classification with binary responses, whereas the loglikelihood function is not readily applicable to models where the exact distribution of observations is unknown or not fully specified. We introduce the penalized Bregman divergence by replacing the negative loglikelihood in the conventional penalized likelihood with Bregman divergence, which encompasses many commonly used loss functions in the regression analysis, classification procedures and machine learning literature. We investigate new statistical properties of the resulting class of estimators with the number pn of parameters either diverging with the sample size n or even nearly comparable with n, and develop statistical inference tools. It is shown that the resulting penalized estimator, combined with appropriate penalties, achieves the same oracle property as the penalized likelihood estimator, but asymptotically does not rely on the complete specification of the underlying distribution. Furthermore, the choice of loss function in the penalized classifiers has an asymptotically relatively negligible impact on classification performance. We illustrate the proposed method for quasilikelihood regression and binary classification with simulation evaluation and real-data application. PMID:22822248
Bootstrap Enhanced Penalized Regression for Variable Selection with Neuroimaging Data
Abram, Samantha V.; Helwig, Nathaniel E.; Moodie, Craig A.; DeYoung, Colin G.; MacDonald, Angus W.; Waller, Niels G.
2016-01-01
Recent advances in fMRI research highlight the use of multivariate methods for examining whole-brain connectivity. Complementary data-driven methods are needed for determining the subset of predictors related to individual differences. Although commonly used for this purpose, ordinary least squares (OLS) regression may not be ideal due to multi-collinearity and over-fitting issues. Penalized regression is a promising and underutilized alternative to OLS regression. In this paper, we propose a nonparametric bootstrap quantile (QNT) approach for variable selection with neuroimaging data. We use real and simulated data, as well as annotated R code, to demonstrate the benefits of our proposed method. Our results illustrate the practical potential of our proposed bootstrap QNT approach. Our real data example demonstrates how our method can be used to relate individual differences in neural network connectivity with an externalizing personality measure. Also, our simulation results reveal that the QNT method is effective under a variety of data conditions. Penalized regression yields more stable estimates and sparser models than OLS regression in situations with large numbers of highly correlated neural predictors. Our results demonstrate that penalized regression is a promising method for examining associations between neural predictors and clinically relevant traits or behaviors. These findings have important implications for the growing field of functional connectivity research, where multivariate methods produce numerous, highly correlated brain networks. PMID:27516732
Survival Analysis by Penalized Regression and Matrix Factorization
Lai, Yeuntyng; Hayashida, Morihiro; Akutsu, Tatsuya
2013-01-01
Because every disease has its unique survival pattern, it is necessary to find a suitable model to simulate followups. DNA microarray is a useful technique to detect thousands of gene expressions at one time and is usually employed to classify different types of cancer. We propose combination methods of penalized regression models and nonnegative matrix factorization (NMF) for predicting survival. We tried L1- (lasso), L2- (ridge), and L1-L2 combined (elastic net) penalized regression for diffuse large B-cell lymphoma (DLBCL) patients' microarray data and found that the L1-L2 combined method predicts survival best with the smallest logrank P value. Furthermore, 80% of selected genes have been reported to correlate with carcinogenesis or lymphoma. Through NMF we found that DLBCL patients can be divided clearly into 4 groups, which implies that DLBCL may have 4 subtypes with slightly different survival patterns. Next we excluded some patients who were indicated as hard to classify by NMF and ran the three penalized regression models again. We found that the performance of survival prediction improved, with lower logrank P values. Therefore, we conclude that after preselection of patients by NMF, penalized regression models can predict DLBCL patients' survival successfully. PMID:23737722
Reduced rank regression via adaptive nuclear norm penalization
Chen, Kun; Dong, Hongbo; Chan, Kung-Sik
2014-01-01
Summary We propose an adaptive nuclear norm penalization approach for low-rank matrix approximation, and use it to develop a new reduced rank estimation method for high-dimensional multivariate regression. The adaptive nuclear norm is defined as the weighted sum of the singular values of the matrix, and it is generally non-convex under the natural restriction that the weight decreases with the singular value. However, we show that the proposed non-convex penalized regression method has a global optimal solution obtained from an adaptively soft-thresholded singular value decomposition. The method is computationally efficient, and the resulting solution path is continuous. The rank consistency of and prediction/estimation performance bounds for the estimator are established for a high-dimensional asymptotic regime. Simulation studies and an application in genetics demonstrate its efficacy. PMID:25045172
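The adaptively soft-thresholded singular value decomposition mentioned in this abstract can be sketched for the plain matrix approximation case. The weight choice d_i^(-gamma), the small `eps` guard, and the function name are assumptions for illustration; the abstract only states that the weights decrease with the singular value, and the full reduced rank regression estimator is not reproduced here.

```python
import numpy as np

def adaptive_svt(Y, lam, gamma=2.0, eps=1e-12):
    """Shrink each singular value d_i by lam * w_i, with w_i = d_i**(-gamma).

    Larger singular values get smaller weights, so they are shrunk less,
    while small singular values are set exactly to zero.
    """
    U, d, Vt = np.linalg.svd(Y, full_matrices=False)
    weights = (d + eps) ** (-gamma)          # eps avoids division by zero
    d_shrunk = np.maximum(d - lam * weights, 0.0)
    return (U * d_shrunk) @ Vt, d_shrunk
```

The rank of the returned matrix equals the number of singular values that survive thresholding, which is how a single tuning parameter controls both shrinkage and rank selection in this sketch.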
Penalized spline estimation for functional coefficient regression models.
Cao, Yanrong; Lin, Haiqun; Wu, Tracy Z; Yu, Yan
2010-04-01
The functional coefficient regression models assume that the regression coefficients vary with some "threshold" variable, providing appreciable flexibility in capturing the underlying dynamics in data and avoiding the so-called "curse of dimensionality" in multivariate nonparametric estimation. We first investigate the estimation, inference, and forecasting for the functional coefficient regression models with dependent observations via penalized splines. The P-spline approach, as a direct ridge regression shrinkage type global smoothing method, is computationally efficient and stable. With established fixed-knot asymptotics, inference is readily available. Exact inference can be obtained for fixed smoothing parameter λ, which is most appealing for finite samples. Our penalized spline approach gives an explicit model expression, which also enables multi-step-ahead forecasting via simulations. Furthermore, we examine different methods of choosing the important smoothing parameter λ: modified multi-fold cross-validation (MCV), generalized cross-validation (GCV), and an extension of empirical bias bandwidth selection (EBBS) to P-splines. In addition, we implement smoothing parameter selection using mixed model framework through restricted maximum likelihood (REML) for P-spline functional coefficient regression models with independent observations. The P-spline approach also easily allows different smoothness for different functional coefficients, which is enabled by assigning different penalty λ accordingly. We demonstrate the proposed approach by both simulation examples and a real data application.
Rendall, Ricardo; Pereira, Ana Cristina; Reis, Marco S
2017-08-15
In this paper we test and compare advanced predictive approaches for estimating wine age in the context of the production of a high quality fortified wine - Madeira Wine. We consider four different data sets, namely, volatile, polyphenols, organic acids and the UV-vis spectra. Each of these data sets contains chemical information of a different nature and presents a distinct data structure, namely a different dimensionality, level of collinearity and degree of sparsity. These different aspects may imply the use of different modelling approaches in order to better explore the data set's information content, namely their predictive potential for wine age. This is because different regression methods have different prior assumptions regarding the predictors, response variable(s) and the data generating mechanism, which may or may not find good adherence to the case study under analysis. In order to cover a wide range of modelling domains, we have incorporated in this work methods belonging to four very distinct classes of approaches that cover most applications found in practice: linear regression with variable selection, penalized regression, latent variables regression and tree-based ensemble methods. We have also developed a rigorous comparison framework based on a double Monte Carlo cross-validation scheme, in order to perform the relative assessment of the performance of the various methods. Upon comparison, models built using the polyphenols and volatile composition data sets led to better wine age predictions, showing lower errors under testing conditions. Furthermore, the results obtained for the polyphenols data set suggest a more sparse structure that can be further explored in order to reduce the number of measured variables. In terms of regression methods, tree-based methods, and boosted regression trees in particular, presented the best results for the polyphenols, volatile and the organic acid data sets, suggesting a possible presence of a
Gene set selection via LASSO penalized regression (SLPR).
Frost, H Robert; Amos, Christopher I
2017-07-07
Gene set testing is an important bioinformatics technique that addresses the challenges of power, interpretation and replication. To better support the analysis of large and highly overlapping gene set collections, researchers have recently developed a number of multiset methods that jointly evaluate all gene sets in a collection to identify a parsimonious group of functionally independent sets. Unfortunately, current multiset methods all use binary indicators for gene and gene set activity and assume that a gene is active if any containing gene set is active. This simplistic model limits performance on many types of genomic data. To address this limitation, we developed gene set Selection via LASSO Penalized Regression (SLPR), a novel mapping of multiset gene set testing to penalized multiple linear regression. The SLPR method assumes a linear relationship between continuous measures of gene activity and the activity of all gene sets in the collection. As we demonstrate via simulation studies and the analysis of TCGA data using MSigDB gene sets, the SLPR method outperforms existing multiset methods when the true biological process is well approximated by continuous activity measures and a linear association between genes and gene sets. © The Author(s) 2017. Published by Oxford University Press on behalf of Nucleic Acids Research.
Compound Identification Using Penalized Linear Regression on Metabolomics.
Liu, Ruiqi; Wu, Dongfeng; Zhang, Xiang; Kim, Seongho
2016-05-01
Compound identification is often achieved by matching the experimental mass spectra to the mass spectra stored in a reference library based on mass spectral similarity. Because the number of compounds in the reference library is much larger than the range of mass-to-charge ratio (m/z) values, the data become high-dimensional and suffer from singularity. For this reason, penalized linear regressions such as ridge regression and the lasso are used instead of the ordinary least squares regression. Furthermore, two-step approaches using the dot product and Pearson's correlation along with the penalized linear regression are proposed in this study.
Compound Identification Using Penalized Linear Regression on Metabolomics
Liu, Ruiqi; Wu, Dongfeng; Zhang, Xiang; Kim, Seongho
2014-01-01
Compound identification is often achieved by matching the experimental mass spectra to the mass spectra stored in a reference library based on mass spectral similarity. Because the number of compounds in the reference library is much larger than the range of mass-to-charge ratio (m/z) values, the data become high-dimensional and suffer from singularity. For this reason, penalized linear regressions such as ridge regression and the lasso are used instead of the ordinary least squares regression. Furthermore, two-step approaches using the dot product and Pearson’s correlation along with the penalized linear regression are proposed in this study. PMID:27212894
Sparse brain network using penalized linear regression
NASA Astrophysics Data System (ADS)
Lee, Hyekyoung; Lee, Dong Soo; Kang, Hyejin; Kim, Boong-Nyun; Chung, Moo K.
2011-03-01
Sparse partial correlation is a useful connectivity measure for brain networks when it is difficult to compute the exact partial correlation in the small-n large-p setting. In this paper, we formulate the problem of estimating partial correlation as a sparse linear regression with an l1-norm penalty. The method is applied to a brain network consisting of parcellated regions of interest (ROIs), which are obtained from FDG-PET images of children with autism spectrum disorder (ASD) and pediatric control (PedCon) subjects. To validate the results, we check the reproducibility of the obtained brain networks by leave-one-out cross validation and compare the clustered structures derived from the brain networks of ASD and PedCon.
Incorporating scientific knowledge into phenotype development: penalized latent class regression.
Leoutsakos, Jeannie-Marie S; Bandeen-Roche, Karen; Garrett-Mayer, Elizabeth; Zandi, Peter P
2011-03-30
The field of psychiatric genetics is hampered by the lack of a clear taxonomy for disorders. Building on the work of Houseman and colleagues (Feature-specific penalized latent class analysis for genomic data. Harvard University Biostatistics Working Paper Series, Working Paper 22, 2005), we describe a penalized latent class regression aimed at allowing additional scientific information to influence the estimation of the measurement model, while retaining the standard assumption of non-differential measurement. In simulation studies, ridge and LASSO penalty functions improved the precision of estimates and, in some cases of differential measurement, also reduced bias. Class-specific penalization enhanced separation of latent classes with respect to covariates, but only in scenarios where there was a true separation. Penalization proved to be less computationally intensive than an analogous Bayesian analysis by a factor of 37. This methodology was then applied to data from normal elderly subjects from the Cache County Study on Memory and Aging. Addition of APO-E genotype and a number of baseline clinical covariates improved the dementia prediction utility of the latent classes; application of class-specific penalization improved precision while retaining that prediction utility. This methodology may be useful in scenarios with large numbers of collinear covariates or in certain cases where latent class model assumptions are violated. Investigation of novel penalty functions may prove fruitful in further refining psychiatric phenotypes. Copyright © 2010 John Wiley & Sons, Ltd.
Polygenic scores via penalized regression on summary statistics.
Mak, Timothy Shin Heng; Porsch, Robert Milan; Choi, Shing Wan; Zhou, Xueya; Sham, Pak Chung
2017-09-01
Polygenic scores (PGS) summarize the genetic contribution of a person's genotype to a disease or phenotype. They can be used to group participants into different risk categories for diseases, and are also used as covariates in epidemiological analyses. A number of possible ways of calculating PGS have been proposed, and recently there is much interest in methods that incorporate information available in published summary statistics. As there is no inherent information on linkage disequilibrium (LD) in summary statistics, a pertinent question is how we can use LD information available elsewhere to supplement such analyses. To answer this question, we propose a method for constructing PGS using summary statistics and a reference panel in a penalized regression framework, which we call lassosum. We also propose a general method for choosing the value of the tuning parameter in the absence of validation data. In our simulations, we showed that pseudovalidation often resulted in prediction accuracy that is comparable to using a dataset with validation phenotype and was clearly superior to the conservative option of setting the tuning parameter of lassosum to its lowest value. We also showed that lassosum achieved better prediction accuracy than simple clumping and P-value thresholding in almost all scenarios. It was also substantially faster and more accurate than the recently proposed LDpred. © 2017 WILEY PERIODICALS, INC.
Iterative Brinkman penalization for remeshed vortex methods
NASA Astrophysics Data System (ADS)
Hejlesen, Mads Mølholm; Koumoutsakos, Petros; Leonard, Anthony; Walther, Jens Honoré
2015-01-01
We introduce an iterative Brinkman penalization method for the enforcement of the no-slip boundary condition in remeshed vortex methods. In the proposed method, the Brinkman penalization is applied iteratively only in the neighborhood of the body. This allows for using significantly larger time steps than is customary in the Brinkman penalization, thus reducing its computational cost while maintaining the capability of the method to handle complex geometries. We demonstrate the accuracy of our method by considering challenging benchmark problems such as flow past an impulsively started cylinder and normal to an impulsively started and accelerated flat plate. We find that the present method significantly enhances the accuracy of the Brinkman penalization technique for the simulations of highly unsteady flows past complex geometries.
Penalized regression procedures for variable selection in the potential outcomes framework
Ghosh, Debashis; Zhu, Yeying; Coffman, Donna L.
2015-01-01
A recent topic of much interest in causal inference is model selection. In this article, we describe a framework in which to consider penalized regression approaches to variable selection for causal effects. The framework leads to a simple ‘impute, then select’ class of procedures that is agnostic to the type of imputation algorithm as well as penalized regression used. It also clarifies how model selection involves a multivariate regression model for causal inference problems, and that these methods can be applied for identifying subgroups in which treatment effects are homogeneous. Analogies and links with the literature on machine learning methods, missing data and imputation are drawn. A difference LASSO algorithm is defined, along with its multiple imputation analogues. The procedures are illustrated using a well-known right heart catheterization dataset. PMID:25628185
Guo, Pi; Zhang, Jianjun; Wang, Li; Yang, Shaoyi; Luo, Ganfeng; Deng, Changyu; Wen, Ye; Zhang, Qingying
2017-01-01
Seasonal influenza epidemics cause serious public health problems in China. Search queries-based surveillance was recently proposed to complement traditional monitoring approaches of influenza epidemics. However, developing robust techniques of search query selection and enhancing predictability for influenza epidemics remains a challenge. This study aimed to develop a novel ensemble framework to improve penalized regression models for detecting influenza epidemics by using Baidu search engine query data from China. The ensemble framework applied a combination of bootstrap aggregating (bagging) and rank aggregation method to optimize penalized regression models. Different algorithms including lasso, ridge, elastic net and the algorithms in the proposed ensemble framework were compared by using Baidu search engine queries. Most of the selected search terms captured the peaks and troughs of the time series curves of influenza cases. The predictability of the conventional penalized regression models was improved by the proposed ensemble framework. The elastic net regression model outperformed the compared models, with the minimum prediction errors. We established a Baidu search engine queries-based surveillance model for monitoring influenza epidemics, and the proposed model provides a useful tool to support the public health response to influenza and other infectious diseases. PMID:28422149
Greenland, Sander; Mansournia, Mohammad Ali
2015-10-15
Penalization is a very general method of stabilizing or regularizing estimates, which has both frequentist and Bayesian rationales. We consider some questions that arise when considering alternative penalties for logistic regression and related models. The most widely programmed penalty appears to be the Firth small-sample bias-reduction method (albeit with small differences among implementations and the results they provide), which corresponds to using the log density of the Jeffreys invariant prior distribution as a penalty function. The latter representation raises some serious contextual objections to the Firth reduction, which also apply to alternative penalties based on t-distributions (including Cauchy priors). Taking simplicity of implementation and interpretation as our chief criteria, we propose that the log-F(1,1) prior provides a better default penalty than other proposals. Penalization based on more general log-F priors is trivial to implement and facilitates mean-squared error reduction and sensitivity analyses of penalty strength by varying the number of prior degrees of freedom. We caution however against penalization of intercepts, which are unduly sensitive to covariate coding and design idiosyncrasies. Copyright © 2015 John Wiley & Sons, Ltd.
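As a worked form of the penalty discussed in this abstract, the log-F(m,m) prior density for a coefficient is proportional to e^{mβ/2}/(1+e^{β})^{m}, so adding its logarithm to the log-likelihood gives the penalized objective below. The notation (ℓ for the logistic log-likelihood, β_0 for the intercept, j ≥ 1 for the penalized coefficients) is assumed here for illustration; m = 1 corresponds to the log-F(1,1) default, and the intercept is left unpenalized in line with the caution in the abstract.

```latex
\ell^{*}(\beta) \;=\; \ell(\beta) \;+\; \sum_{j \ge 1}
  \left[ \frac{m}{2}\,\beta_j \;-\; m \ln\!\left(1 + e^{\beta_j}\right) \right]
```

Increasing m strengthens the shrinkage toward zero, which is how the penalty strength can be varied in a sensitivity analysis.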
An Iterative Brinkman penalization for particle vortex methods
NASA Astrophysics Data System (ADS)
Walther, J. H.; Hejlesen, M. M.; Leonard, A.; Koumoutsakos, P.
2013-11-01
We present an iterative Brinkman penalization method for the enforcement of the no-slip boundary condition in vortex particle methods. This is achieved by implementing a penalization of the velocity field using iteration of the penalized vorticity. We show that using the conventional Brinkman penalization method can result in an insufficient enforcement of solid boundaries. The specific problems of the conventional penalization method are discussed, and three examples are presented in which the method in its current form is shown to be insufficient to consistently enforce the no-slip boundary condition. These are: the impulsively started flow past a cylinder, the impulsively started flow normal to a flat plate, and the uniformly accelerated flow normal to a flat plate. The iterative penalization algorithm is shown to give significantly improved results compared to the conventional penalization method for each of the presented flow cases.
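For orientation, the conventional (single-pass) Brinkman penalization that this abstract takes as its baseline can be sketched as an implicit update of the velocity toward the solid velocity inside the body. This is a minimal sketch under stated assumptions, not the authors' iterative vorticity-based scheme: the function name `brinkman_step`, the pointwise list representation, and the implicit Euler discretization of du/dt = λχ(u_s − u) are all illustrative choices.

```python
def brinkman_step(u, chi, u_solid, lam, dt):
    """One implicit penalization update of a sampled velocity field.

    u       : fluid velocity samples before penalization
    chi     : mask, 1.0 inside the solid and 0.0 in the fluid
    u_solid : velocity of the solid at each sample
    lam     : penalization parameter (larger means a stiffer constraint)
    dt      : time step

    Solves u_new = u + lam * dt * chi * (u_solid - u_new) for u_new.
    """
    out = []
    for ui, ci, usi in zip(u, chi, u_solid):
        k = lam * dt * ci
        out.append((ui + k * usi) / (1.0 + k))
    return out
```

Outside the body (chi = 0) the velocity is untouched; inside, it relaxes toward the solid velocity at a rate set by `lam * dt`, which is why a single pass may leave a residual slip.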
Zhang, Guosheng; Huang, Kuan-Chieh; Xu, Zheng; Tzeng, Jung-Ying; Conneely, Karen N; Guan, Weihua; Kang, Jian; Li, Yun
2016-05-01
DNA methylation is a key epigenetic mark involved in both normal development and disease progression. Recent advances in high-throughput technologies have enabled genome-wide profiling of DNA methylation. However, DNA methylation profiling often employs different designs and platforms with varying resolution, which hinders joint analysis of methylation data from multiple platforms. In this study, we propose a penalized functional regression model to impute missing methylation data. By incorporating functional predictors, our model utilizes information from nonlocal probes to improve imputation quality. Here, we compared the performance of our functional model to linear regression and the best single probe surrogate in real data and via simulations. Specifically, we applied different imputation approaches to an acute myeloid leukemia dataset consisting of 194 samples and our method showed higher imputation accuracy, manifested, for example, by a 94% relative increase in information content and up to 86% more CpG sites passing post-imputation filtering. Our simulated association study further demonstrated that our method substantially improves the statistical power to identify trait-associated methylation loci. These findings indicate that the penalized functional regression model is a convenient and valuable imputation tool for methylation data, and it can boost statistical power in downstream epigenome-wide association study (EWAS).
Lee, Wonyul; Liu, Yufeng
2012-10-01
Multivariate regression is a common statistical tool for practical problems. Many multivariate regression techniques are designed for univariate response cases. For problems with multiple response variables available, one common approach is to apply the univariate response regression technique separately on each response variable. Although it is simple and popular, the univariate response approach ignores the joint information among response variables. In this paper, we propose three new methods for utilizing joint information among response variables. All methods are in a penalized likelihood framework with weighted L(1) regularization. The proposed methods provide sparse estimators of the conditional inverse covariance matrix of the response vector given explanatory variables as well as sparse estimators of regression parameters. Our first approach is to estimate the regression coefficients with plug-in estimated inverse covariance matrices, and our second approach is to estimate the inverse covariance matrix with plug-in estimated regression parameters. Our third approach is to estimate both simultaneously. Asymptotic properties of these methods are explored. Our numerical examples demonstrate that the proposed methods perform competitively in terms of prediction, variable selection, as well as inverse covariance matrix estimation.
Evaluating penalized logistic regression models to predict Heat-Related Electric grid stress days
Bramer, Lisa M.; Rounds, J.; Burleyson, C. D.; ...
2017-09-22
Understanding the conditions associated with stress on the electricity grid is important in the development of contingency plans for maintaining reliability during periods when the grid is stressed. In this paper, heat-related grid stress and the relationship with weather conditions were examined using data from the eastern United States. Penalized logistic regression models were developed and applied to predict stress on the electric grid using weather data. The inclusion of other weather variables, such as precipitation, in addition to temperature improved model performance. Several candidate models and combinations of predictive variables were examined. A penalized logistic regression model which was fit at the operation-zone level was found to provide predictive value and interpretability. Additionally, the importance of different weather variables observed at various time scales was examined. Maximum temperature and precipitation were identified as important across all zones while the importance of other weather variables was zone specific. In conclusion, the methods presented in this work are extensible to other regions and can be used to aid in planning and development of the electrical grid.
Göbl, Christian S.; Bozkurt, Latife; Tura, Andrea; Pacini, Giovanni; Kautzky-Willer, Alexandra; Mittlböck, Martina
2015-01-01
This paper aims to introduce penalized estimation techniques in clinical investigations of diabetes, as well as to assess their possible advantages and limitations. Data from a previous study was used to carry out the simulations to assess: a) which procedure results in the lowest prediction error of the final model in the setting of a large number of predictor variables with high multicollinearity (of importance if insulin sensitivity should be predicted) and b) which procedure achieves the most accurate estimate of regression coefficients in the setting of fewer predictors with small unidirectional effects and moderate correlation between explanatory variables (of importance if the specific relation between an independent variable and insulin sensitivity should be examined). Moreover a special focus is on the correct direction of estimated parameter effects, a non-negligible source of error and misinterpretation of study results. The simulations were performed for varying sample size to evaluate the performance of LASSO, Ridge as well as different algorithms for Elastic Net. These methods were also compared with automatic variable selection procedures (i.e. optimizing AIC or BIC). We were not able to identify one method achieving superior performance in all situations. However, the improved accuracy of estimated effects underlines the importance of using penalized regression techniques in our example (e.g. if a researcher aims to compare relations of several correlated parameters with insulin sensitivity). However, the decision which procedure should be used depends on the specific context of a study (accuracy versus complexity) and moreover should involve clinical prior knowledge. PMID:26544569
biospear: an R package for biomarker selection in penalized Cox regression.
Ternès, Nils; Rotolo, Federico; Michiels, Stefan
2017-09-12
The R package biospear allows selecting the biomarkers with the strongest impact on survival and on the treatment effect in high-dimensional Cox models, and estimating expected survival probabilities. Most of the implemented approaches are based on penalized regression techniques. The package is available on CRAN (https://CRAN.R-project.org/package=biospear). Contact: stefan.michiels@gustaveroussy.fr.
Gene and pathway identification with Lp penalized Bayesian logistic regression
Liu, Zhenqiu; Gartenhaus, Ronald B; Tan, Ming; Jiang, Feng; Jiao, Xiaoli
2008-01-01
Background: Identifying genes and pathways associated with diseases such as cancer has been a subject of considerable research in recent years in the area of bioinformatics and computational biology. It has been demonstrated that the magnitude of differential expression does not necessarily indicate biological significance. Even a very small change in the expression of a particular gene may have dramatic physiological consequences if the protein encoded by this gene plays a catalytic role in a specific cell function. Moreover, highly correlated genes may function together on the same pathway biologically. Finally, in sparse logistic regression with Lp (p < 1) penalty, the degree of the sparsity obtained is determined by the value of the regularization parameter. Usually this parameter must be carefully tuned through cross-validation, which is time consuming. Results: In this paper, we proposed a simple Bayesian approach to integrate the regularization parameter out analytically using a new prior. Therefore, there is no longer a need for parameter selection, as it is eliminated entirely from the model. The proposed algorithm (BLpLog) is typically two or three orders of magnitude faster than the original algorithm and free from bias in performance estimation. We also define a novel similarity measure and develop an integrated algorithm to hunt the regulatory genes with low expression changes but having high correlation with the selected genes. Pathways of those correlated genes were identified with DAVID. Conclusion: Experimental results with gene expression data demonstrate that the proposed methods can be utilized to identify important genes and pathways that are related to cancer and build a parsimonious model for future patient predictions. PMID:18834526
Incorporating Predictor Network in Penalized Regression with Application to Microarray Data
Pan, Wei; Xie, Benhuai; Shen, Xiaotong
2012-01-01
Summary: We consider penalized linear regression, especially for “large p, small n” problems, for which the relationships among predictors are described a priori by a network. A class of motivating examples includes modeling a phenotype through gene expression profiles while accounting for coordinated functioning of genes in the form of biological pathways or networks. To incorporate the prior knowledge of the similar effect sizes of neighboring predictors in a network, we propose a grouped penalty based on the Lγ-norm that smoothes the regression coefficients of the predictors over the network. The main feature of the proposed method is its ability to automatically realize grouped variable selection and exploit grouping effects. We also discuss effects of the choices of the γ and some weights inside the Lγ-norm. Simulation studies demonstrate the superior finite sample performance of the proposed method as compared to Lasso, elastic net and a recently proposed network-based method. The new method performs best in variable selection across all simulation set-ups considered. For illustration, the method is applied to a microarray dataset to predict survival times for some glioblastoma patients using a gene expression dataset and a gene network compiled from some KEGG pathways. PMID:19645699
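A grouped penalty of the kind described in this abstract can be sketched as a sum, over the edges of the predictor network, of a weighted Lγ-norm of the two coefficients joined by each edge. The exact form below is an assumption made for illustration (the abstract does not give the formula, and any normalizing constant is omitted); the function name `network_penalty` and the per-node `weights` are hypothetical.

```python
def network_penalty(beta, edges, weights, gamma=2.0, lam=1.0):
    """Sum of weighted L-gamma norms of coefficient pairs linked in a network.

    beta    : regression coefficients, one per predictor
    edges   : iterable of (i, j) index pairs, one per network edge
    weights : positive per-predictor weights (assumed, e.g. based on node degree)
    gamma   : norm exponent, gamma > 1
    lam     : overall tuning parameter
    """
    total = 0.0
    for i, j in edges:
        pair = abs(beta[i]) ** gamma / weights[i] + abs(beta[j]) ** gamma / weights[j]
        total += pair ** (1.0 / gamma)
    return lam * total
```

Because each edge term is a norm of a coefficient pair, it vanishes only when both linked coefficients are zero, which is what encourages neighboring predictors to be selected or dropped together.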
Janssen, Kristel J M; Siccama, Ivar; Vergouwe, Yvonne; Koffijberg, Hendrik; Debray, T P A; Keijzer, Maarten; Grobbee, Diederick E; Moons, Karel G M
2012-04-01
Many prediction models are developed by multivariable logistic regression. However, there are several alternative methods to develop prediction models. We compared the accuracy of a model that predicts the presence of deep venous thrombosis (DVT) when developed by four different methods. We used the data of 2,086 primary care patients suspected of DVT, which included 21 candidate predictors. The cohort was split into a derivation set (1,668 patients, 329 with DVT) and a validation set (418 patients, 86 with DVT). Also, 100 cross-validations were conducted in the full cohort. The models were developed by logistic regression, logistic regression with shrinkage by bootstrapping techniques, logistic regression with shrinkage by penalized maximum likelihood estimation, and genetic programming. The accuracy of the models was tested by assessing discrimination and calibration. There were only marginal differences in the discrimination and calibration of the models in the validation set and cross-validations. The accuracy measures of the models developed by the four different methods were only slightly different, and the 95% confidence intervals were mostly overlapping. We have shown that models with good predictive accuracy are most likely developed by sensible modeling strategies rather than by complex development methods. Copyright © 2012 Elsevier Inc. All rights reserved.
A new method for robust mixture regression
YU, Chun; YAO, Weixin; CHEN, Kun
2017-01-01
Finite mixture regression models have been widely used for modelling mixed regression relationships arising from a clustered and thus heterogenous population. The classical normal mixture model, despite its simplicity and wide applicability, may fail in the presence of severe outliers. Using a sparse, case-specific, and scale-dependent mean-shift mixture model parameterization, we propose a robust mixture regression approach for simultaneously conducting outlier detection and robust parameter estimation. A penalized likelihood approach is adopted to induce sparsity among the mean-shift parameters so that the outliers are distinguished from the remainder of the data, and a generalized Expectation-Maximization (EM) algorithm is developed to perform stable and efficient computation. The proposed approach is shown to have strong connections with other robust methods including the trimmed likelihood method and M-estimation approaches. In contrast to several existing methods, the proposed methods show outstanding performance in our simulation studies. PMID:28579672
CALIBRATING NON-CONVEX PENALIZED REGRESSION IN ULTRA-HIGH DIMENSION
Wang, Lan; Kim, Yongdai; Li, Runze
2014-01-01
We investigate high-dimensional non-convex penalized regression, where the number of covariates may grow at an exponential rate. Although recent asymptotic theory established that there exists a local minimum possessing the oracle property under general conditions, it is still largely an open problem how to identify the oracle estimator among potentially multiple local minima. There are two main obstacles: (1) due to the presence of multiple minima, the solution path is nonunique and is not guaranteed to contain the oracle estimator; (2) even if a solution path is known to contain the oracle estimator, the optimal tuning parameter depends on many unknown factors and is hard to estimate. To address these two challenging issues, we first prove that an easy-to-calculate calibrated CCCP algorithm produces a consistent solution path which contains the oracle estimator with probability approaching one. Furthermore, we propose a high-dimensional BIC criterion and show that it can be applied to the solution path to select the optimal tuning parameter which asymptotically identifies the oracle estimator. The theory for a general class of non-convex penalties in the ultra-high dimensional setup is established when the random errors follow the sub-Gaussian distribution. Monte Carlo studies confirm that the calibrated CCCP algorithm combined with the proposed high-dimensional BIC has desirable performance in identifying the underlying sparsity pattern for high-dimensional data analysis. PMID:24948843
Bergen, Silas; Sheppard, Lianne; Kaufman, Joel D; Szpiro, Adam A
2016-11-01
Air pollution epidemiology studies are trending towards a multi-pollutant approach. In these studies, exposures at subject locations are unobserved and must be predicted using observed exposures at misaligned monitoring locations. This induces measurement error, which can bias the estimated health effects and affect standard error estimates. We characterize this measurement error and develop an analytic bias correction when using penalized regression splines to predict exposure. Our simulations show bias from multi-pollutant measurement error can be severe, and in opposite directions or simultaneously positive or negative. Our analytic bias correction combined with a non-parametric bootstrap yields accurate coverage of 95% confidence intervals. We apply our methodology to analyze the association of systolic blood pressure with PM2.5 and NO2 in the NIEHS Sister Study. We find that NO2 confounds the association of systolic blood pressure with PM2.5 and vice versa. Elevated systolic blood pressure was significantly associated with increased PM2.5 and decreased NO2. Correcting for measurement error bias strengthened these associations and widened 95% confidence intervals.
Chaibub Neto, Elias; Bare, J. Christopher; Margolin, Adam A.
2014-01-01
New algorithms are continuously proposed in computational biology. Performance evaluation of novel methods is important in practice. Nonetheless, the field experiences a lack of rigorous methodology aimed to systematically and objectively evaluate competing approaches. Simulation studies are frequently used to show that a particular method outperforms another. Often times, however, simulation studies are not well designed, and it is hard to characterize the particular conditions under which different methods perform better. In this paper we propose the adoption of well established techniques in the design of computer and physical experiments for developing effective simulation studies. By following best practices in planning of experiments we are better able to understand the strengths and weaknesses of competing algorithms leading to more informed decisions about which method to use for a particular task. We illustrate the application of our proposed simulation framework with a detailed comparison of the ridge-regression, lasso and elastic-net algorithms in a large scale study investigating the effects on predictive performance of sample size, number of features, true model sparsity, signal-to-noise ratio, and feature correlation, in situations where the number of covariates is usually much larger than sample size. Analysis of data sets containing tens of thousands of features but only a few hundred samples is nowadays routine in computational biology, where “omics” features such as gene expression, copy number variation and sequence data are frequently used in the predictive modeling of complex phenotypes such as anticancer drug response. The penalized regression approaches investigated in this study are popular choices in this setting and our simulations corroborate well established results concerning the conditions under which each one of these methods is expected to perform best while providing several novel insights. PMID:25289666
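The three methods compared in this abstract differ only in the penalty, which a short coordinate descent sketch makes concrete. This is a minimal illustration, not the simulation code of the study: it assumes the common parameterization (1/2n)·||y − Xβ||² + λ[α·||β||₁ + (1 − α)/2·||β||²], with α = 1 giving the lasso and α = 0 giving ridge regression, and it assumes no intercept and no all-zero columns. The names `soft_threshold` and `elastic_net` are hypothetical.

```python
def soft_threshold(z, t):
    """Shrink z toward zero by t, returning exactly 0 when |z| <= t."""
    if z > t:
        return z - t
    if z < -t:
        return z + t
    return 0.0


def elastic_net(X, y, lam, alpha, n_iter=200):
    """Cyclic coordinate descent for the elastic net (no intercept).

    X is a list of rows, y a list of responses. alpha=1.0 is the lasso,
    alpha=0.0 is ridge regression, values in between mix the two penalties.
    """
    n, p = len(X), len(X[0])
    beta = [0.0] * p
    resid = list(y)  # residual y - X beta, with beta starting at zero
    for _ in range(n_iter):
        for j in range(p):
            col = [row[j] for row in X]
            # Correlation of feature j with the partial residual that
            # excludes feature j's own current contribution.
            rho = sum(c * (r + c * beta[j]) for c, r in zip(col, resid)) / n
            denom = sum(c * c for c in col) / n + lam * (1.0 - alpha)
            new = soft_threshold(rho, lam * alpha) / denom
            delta = new - beta[j]
            if delta != 0.0:
                resid = [r - c * delta for c, r in zip(col, resid)]
                beta[j] = new
    return beta
```

The L1 part of the penalty enters through the soft threshold and can set coefficients exactly to zero, while the L2 part only enlarges the denominator and shrinks without selecting.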
Jiang, Xiaoyu; Fuchs, Mathias
2017-01-01
As modern biotechnologies advance, it has become increasingly frequent that different modalities of high-dimensional molecular data (termed “omics” data in this paper), such as gene expression, methylation, and copy number, are collected from the same patient cohort to predict the clinical outcome. While prediction based on omics data has been widely studied in the last fifteen years, little has been done in the statistical literature on the integration of multiple omics modalities to select a subset of variables for prediction, which is a critical task in personalized medicine. In this paper, we propose a simple penalized regression method to address this problem by assigning different penalty factors to different data modalities for feature selection and prediction. The penalty factors can be chosen in a fully data-driven fashion by cross-validation or by taking practical considerations into account. In simulation studies, we compare the prediction performance of our approach, called IPF-LASSO (Integrative LASSO with Penalty Factors) and implemented in the R package ipflasso, with the standard LASSO and sparse group LASSO. The use of IPF-LASSO is also illustrated through applications to two real-life cancer datasets. All data and codes are available on the companion website to ensure reproducibility. PMID:28546826
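The idea of modality-specific penalty factors in this abstract can be written as a lasso objective with one penalty parameter per data modality. The notation below is assumed for illustration: M modalities, x_i^{(m)} the features of patient i in modality m, β^{(m)} the corresponding coefficients, and λ_m the penalty applied to modality m; a continuous outcome with squared-error loss is also an assumption.

```latex
\hat{\beta} \;=\; \arg\min_{\beta^{(1)},\dots,\beta^{(M)}}
  \sum_{i=1}^{n} \Bigl( y_i - \sum_{m=1}^{M} x_i^{(m)\top} \beta^{(m)} \Bigr)^{2}
  \;+\; \sum_{m=1}^{M} \lambda_m \,\bigl\lVert \beta^{(m)} \bigr\rVert_1
```

Setting all λ_m equal recovers the standard LASSO; a larger λ_m for one modality makes its variables less likely to be selected.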
NASA Astrophysics Data System (ADS)
Brown-Dymkoski, Eric; Kasimov, Nurlybek; Vasilyev, Oleg V.
2014-04-01
In order to introduce solid obstacles into flows, several different methods are used, including volume penalization methods which prescribe appropriate boundary conditions by applying local forcing to the constitutive equations. One well known method is Brinkman penalization, which models solid obstacles as porous media. While it has been adapted for compressible, incompressible, viscous and inviscid flows, it is limited in the types of boundary conditions that it imposes, as are most volume penalization methods. Typically, approaches are limited to Dirichlet boundary conditions. In this paper, Brinkman penalization is extended for generalized Neumann and Robin boundary conditions by introducing hyperbolic penalization terms with characteristics pointing inward on solid obstacles. This Characteristic-Based Volume Penalization (CBVP) method is a comprehensive approach to conditions on immersed boundaries, providing for homogeneous and inhomogeneous Dirichlet, Neumann, and Robin boundary conditions on hyperbolic and parabolic equations. This CBVP method can be used to impose boundary conditions for both integrated and non-integrated variables in a systematic manner that parallels the prescription of exact boundary conditions. Furthermore, the method does not depend upon a physical model, as with porous media approach for Brinkman penalization, and is therefore flexible for various physical regimes and general evolutionary equations. Here, the method is applied to scalar diffusion and to direct numerical simulation of compressible, viscous flows. With the Navier-Stokes equations, both homogeneous and inhomogeneous Neumann boundary conditions are demonstrated through external flow around an adiabatic and heated cylinder. Theoretical and numerical examination shows that the error from penalized Neumann and Robin boundary conditions can be rigorously controlled through an a priori penalization parameter η. The error on a transient boundary is found to converge as O
CHENG, GUANG; KOSOROK, MICHAEL R.
2010-01-01
The penalized profile sampler for semiparametric inference is an extension of the profile sampler method [9] obtained by profiling a penalized log-likelihood. The idea is to base inference on the posterior distribution obtained by multiplying a profiled penalized log-likelihood by a prior for the parametric component, where the profiling and penalization are applied to the nuisance parameter. Because the prior is not applied to the full likelihood, the method is not strictly Bayesian. A benefit of this approximately Bayesian method is that it circumvents the need to put a prior on the possibly infinite-dimensional nuisance components of the model. We investigate the first and second order frequentist performance of the penalized profile sampler, and demonstrate that the accuracy of the procedure can be adjusted by the size of the assigned smoothing parameter. The theoretical validity of the procedure is illustrated for two examples: a partly linear model with normal error for current status data and a semiparametric logistic regression model. Simulation studies are used to verify the theoretical results. PMID:20431712
Gui, Jiang; Li, Hongzhe
2005-07-01
An important application of microarray technology is to relate gene expression profiles to various clinical phenotypes of patients. Success has been demonstrated in molecular classification of cancer in which the gene expression data serve as predictors and different types of cancer serve as a categorical outcome variable. However, there has been less research in linking gene expression profiles to the censored survival data such as patients' overall survival time or time to cancer relapse. It would be desirable to have models with good prediction accuracy and parsimony property. We propose to use the L(1) penalized estimation for the Cox model to select genes that are relevant to patients' survival and to build a predictive model for future prediction. The computational difficulty associated with the estimation in the high-dimensional and low-sample size settings can be efficiently solved by using the recently developed least-angle regression (LARS) method. Our simulation studies and application to real datasets on predicting survival after chemotherapy for patients with diffuse large B-cell lymphoma demonstrate that the proposed procedure, which we call the LARS-Cox procedure, can be used for identifying important genes that are related to time to death due to cancer and for building a parsimonious model for predicting the survival of future patients. The LARS-Cox regression gives better predictive performance than the L(2) penalized regression and a few other dimension-reduction based methods. We conclude that the proposed LARS-Cox procedure can be very useful in identifying genes relevant to survival phenotypes and in building a parsimonious predictive model that can be used for classifying future patients into clinically relevant high- and low-risk groups based on the gene expression profile and survival times of previous patients.
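The L(1) penalized Cox estimation referred to in this abstract can be stated as a constrained maximization of the partial log-likelihood. The notation is assumed for illustration: δ_i is the event indicator, R(t_i) the set of patients still at risk at time t_i, x_i the gene expression vector of patient i, and s the bound on the L1 norm that acts as the tuning parameter.

```latex
\ell(\beta) \;=\; \sum_{i:\,\delta_i = 1}
  \Bigl[ x_i^{\top}\beta - \log \sum_{j \in R(t_i)} \exp\bigl(x_j^{\top}\beta\bigr) \Bigr],
\qquad
\hat{\beta} \;=\; \arg\max_{\beta}\ \ell(\beta)
  \quad \text{subject to} \quad \sum_{k=1}^{p} \lvert \beta_k \rvert \le s
```

Smaller values of s force more coefficients to exactly zero, which is what yields the parsimonious gene set.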
NASA Astrophysics Data System (ADS)
Vasilyev, Oleg V.; Gazzola, Mattia; Koumoutsakos, Petros
2009-11-01
In this talk we discuss preliminary results for the use of a hybrid wavelet collocation - Brinkman penalization approach for shape and topology optimization of fluid flows. The adaptive wavelet collocation method tackles the problem of efficiently resolving a fluid flow on a dynamically adaptive computational grid in complex geometries (where grid resolution varies in both space and time), while Brinkman volume penalization allows easy variation of flow geometry without using body-fitted meshes by simply changing the shape of the penalization region. The Brinkman volume penalization approach allows a seamless transition from shape to topology optimization by combining it with a level set approach and increasing the size of the optimization space. The approach is demonstrated for shape optimization of a variety of fluid flows by optimizing a single cost function (the time-averaged drag coefficient) using the covariance matrix adaptation (CMA) evolutionary algorithm.
Astronomical Methods for Nonparametric Regression
NASA Astrophysics Data System (ADS)
Steinhardt, Charles L.; Jermyn, Adam
2017-01-01
I will discuss commonly used techniques for nonparametric regression in astronomy. We find that several of them, particularly running averages and running medians, are generically biased, asymmetric between dependent and independent variables, and perform poorly in recovering the underlying function, even when errors are present only in one variable. We then examine less-commonly used techniques such as Multivariate Adaptive Regressive Splines and Boosted Trees and find them superior in bias, asymmetry, and variance both theoretically and in practice under a wide range of numerical benchmarks. In this context the chief advantage of the common techniques is runtime, which even for large datasets is now measured in microseconds compared with milliseconds for the more statistically robust techniques. This points to a tradeoff between bias, variance, and computational resources which in recent years has shifted heavily in favor of the more advanced methods, primarily driven by Moore's Law. Along these lines, we also propose a new algorithm which has better overall statistical properties than all techniques examined thus far, at the cost of significantly worse runtime, in addition to providing guidance on choosing the nonparametric regression technique most suitable to any specific problem. We then examine the more general problem of errors in both variables and provide a new algorithm which performs well in most cases and lacks the clear asymmetry of existing non-parametric methods, which fail to account for errors in both variables.
Zhang, Zhenyue; Zha, Hongyuan; Simon, Horst
2006-07-31
In this paper, we developed numerical algorithms for computing sparse low-rank approximations of matrices, and we also provided a detailed error analysis of the proposed algorithms together with some numerical experiments. The low-rank approximations are constructed in a certain factored form with the degree of sparsity of the factors controlled by some user-specified parameters. In this paper, we cast the sparse low-rank approximation problem in the framework of penalized optimization problems. We discuss various approximation schemes for the penalized optimization problem which are more amenable to numerical computations. We also include some analysis to show the relations between the original optimization problem and the reduced one. We then develop a globally convergent discrete Newton-like iterative method for solving the approximate penalized optimization problems. We also compare the reconstruction errors of the sparse low-rank approximations computed by our new methods with those obtained using the methods in the earlier paper and several other existing methods for computing sparse low-rank approximations. Numerical examples show that the penalized methods are more robust and produce approximations with factors which have fewer columns and are sparser.
Synthesizing Regression Results: A Factored Likelihood Method
ERIC Educational Resources Information Center
Wu, Meng-Jia; Becker, Betsy Jane
2013-01-01
Regression methods are widely used by researchers in many fields, yet methods for synthesizing regression results are scarce. This study proposes using a factored likelihood method, originally developed to handle missing data, to appropriately synthesize regression models involving different predictors. This method uses the correlations reported…
NASA Astrophysics Data System (ADS)
Gillis, T.; Winckelmans, G.; Chatelain, P.
2017-10-01
We formulate the penalization problem inside a vortex particle-mesh method as a linear system. This system has to be solved at every wall boundary condition enforcement within a time step. Furthermore, because the underlying problem is a Poisson problem, the solution of this linear system is computationally expensive. For its solution, we here use a recycling iterative solver, rBiCGStab, in order to reduce the number of iterations and therefore decrease the computational cost of the penalization step. For the recycled subspace, we use the orthonormalized previous solutions as only the right hand side changes from the solution at one time to the next. This method is validated against benchmark results: the impulsively started cylinder, with validation at low Reynolds number (Re = 550) and computational savings assessments at moderate Reynolds number (Re = 9500); then on a flat plate benchmark (Re = 1000). By improving the convergence behavior, the approach greatly reduces the computational cost of iterative penalization, at a moderate cost in memory overhead.
2009-01-01
Background There is a growing awareness that interaction between multiple genes plays an important role in the risk of common, complex multi-factorial diseases. Many common diseases are affected by certain genotype combinations (associated with some genes and their interactions). The identification and characterization of these susceptibility genes and gene-gene interactions have been limited by small sample sizes and the large number of potential interactions between genes. Several methods have been proposed to detect gene-gene interaction in a case-control study. The penalized logistic regression (PLR), a variant of logistic regression with L2 regularization, is a parametric approach to detect gene-gene interaction. On the other hand, the Multifactor Dimensionality Reduction (MDR) is a nonparametric and genetic model-free approach to detect genotype combinations associated with disease risk. Methods We compared the power of MDR and PLR for detecting two-way and three-way interactions in a case-control study through extensive simulations. We generated several interaction models with different magnitudes of interaction effect. For each model, we simulated 100 datasets, each with 200 cases and 200 controls and 20 SNPs. We considered a wide variety of models such as models with just main effects, models with only interaction effects or models with both main and interaction effects. We also compared the performance of MDR and PLR to detect gene-gene interaction associated with acute rejection (AR) in kidney transplant patients. Results In this paper, we have studied the power of MDR and PLR for detecting gene-gene interaction in a case-control study through extensive simulation. We have compared their performances for different two-way and three-way interaction models. We have studied the effect of different allele frequencies on these methods. We have also implemented their performance on a real dataset. As expected, none of these methods were consistently better for all
PRAMS: a systematic method for evaluating penal institutions under litigation.
Wills, Cheryl D
2007-01-01
Forensic psychiatrists serve as expert witnesses in litigation involving the impact of conditions of confinement, including mental health care delivery, on the emotional well-being of institutionalized persons. Experts review volumes of data before formulating opinions and preparing reports. The author has developed PRAMS, a method for systematically reviewing and presenting data during mental health litigation involving detention and corrections facilities. The PRAMS method divides the examination process into five stages: paper review, real-world view, aggravating circumstances, mitigating circumstances, and supplemental information. PRAMS provides the scaffolding on which a compelling picture of an institution's system of care may be constructed and disseminated in reports and during courtroom testimony. Also, PRAMS enhances the organization, analysis, publication, and presentation of salient findings, thereby coordinating the forensic psychiatrist's efforts to provide expert opinions regarding complex systems of mental health care.
Penalized estimation for proportional hazards models with current status data.
Lu, Minggen; Li, Chin-Shang
2017-09-05
We provide a simple and practical, yet flexible, penalized estimation method for a Cox proportional hazards model with current status data. We approximate the baseline cumulative hazard function by monotone B-splines and use a hybrid approach based on the Fisher-scoring algorithm and the isotonic regression to compute the penalized estimates. We show that the penalized estimator of the nonparametric component achieves the optimal rate of convergence under some smooth conditions and that the estimators of the regression parameters are asymptotically normal and efficient. Moreover, a simple variance estimation method is considered for inference on the regression parameters. We perform 2 extensive Monte Carlo studies to evaluate the finite-sample performance of the penalized approach and compare it with the 3 competing R packages: C1.coxph, intcox, and ICsurv. A goodness-of-fit test and model diagnostics are also discussed. The methodology is illustrated with 2 real applications. Copyright © 2017 John Wiley & Sons, Ltd.
Elastic SCAD as a novel penalization method for SVM classification tasks in high-dimensional data
2011-01-01
Background Classification and variable selection play an important role in knowledge discovery in high-dimensional data. Although Support Vector Machine (SVM) algorithms are among the most powerful classification and prediction methods with a wide range of scientific applications, the SVM does not include automatic feature selection and therefore a number of feature selection procedures have been developed. Regularisation approaches extend SVM to a feature selection method in a flexible way using penalty functions like LASSO, SCAD and Elastic Net. We propose a novel penalty function for SVM classification tasks, Elastic SCAD, a combination of SCAD and ridge penalties which overcomes the limitations of each penalty alone. Since SVM models are extremely sensitive to the choice of tuning parameters, we adopted an interval search algorithm, which in comparison to a fixed grid search finds rapidly and more precisely a global optimal solution. Results Feature selection methods with combined penalties (Elastic Net and Elastic SCAD SVMs) are more robust to a change of the model complexity than methods using single penalties. Our simulation study showed that Elastic SCAD SVM outperformed LASSO (L1) and SCAD SVMs. Moreover, Elastic SCAD SVM provided sparser classifiers in terms of median number of features selected than Elastic Net SVM and often better predicted than Elastic Net in terms of misclassification error. Finally, we applied the penalization methods described above on four publicly available breast cancer data sets. Elastic SCAD SVM was the only method providing robust classifiers in sparse and non-sparse situations. Conclusions The proposed Elastic SCAD SVM algorithm provides the advantages of the SCAD penalty and at the same time avoids sparsity limitations for non-sparse data. We were first to demonstrate that the integration of the interval search algorithm and penalized SVM classification techniques provides fast solutions on the optimization of tuning
Shang, Shang; Bai, Jing; Song, Xiaolei; Wang, Hongkai; Lau, Jaclyn
2007-01-01
The conjugate gradient method has been shown to be efficient for nonlinear optimization problems with high-dimensional data. In this paper, a penalized linear and nonlinear combined conjugate gradient method for the reconstruction of fluorescence molecular tomography (FMT) is presented. The algorithm combines the linear and nonlinear conjugate gradient methods through a restart strategy, in order to take advantage of both methods and compensate for their respective disadvantages. A quadratic penalty method is adopted to impose a nonnegativity constraint and reduce the ill-posedness of the problem. Simulation studies show that the presented algorithm is accurate, stable, and fast. It performs better than conventional conjugate gradient-based reconstruction algorithms. It offers an effective approach to reconstructing fluorochrome information for FMT.
Differentiating among penal states.
Lacey, Nicola
2010-12-01
This review article assesses Loïc Wacquant's contribution to debates on penality, focusing on his most recent book, Punishing the Poor: The Neoliberal Government of Social Insecurity (Wacquant 2009), while setting its argument in the context of his earlier Prisons of Poverty (1999). In particular, it draws on both historical and comparative methods to question whether Wacquant's conception of 'the penal state' is adequately differentiated for the purposes of building the explanatory account he proposes; about whether 'neo-liberalism' has, materially, the global influence which he ascribes to it; and about whether, therefore, the process of penal Americanization which he asserts in his recent writings is credible.
NASA Astrophysics Data System (ADS)
Kasimov, Nurlybek; Brown-Dymkoski, Eric; Vasilyev, Oleg V.
2015-11-01
A novel volume penalization method to enforce immersed boundary conditions in Navier-Stokes and Euler equations is presented. Previously, Brinkman penalization has been used to introduce solid obstacles modeled as porous media, although it is limited to Dirichlet-type conditions on velocity and temperature. This method builds upon Brinkman penalization by allowing Neumann conditions to be applied in a general fashion. Correct boundary conditions are achieved through characteristic propagation into the thin layer inside of the obstacle. Inward-pointing characteristics ensure that the nonphysical solution inside the obstacle does not propagate outside into the fluid. Dirichlet boundary conditions are enforced similarly to the Brinkman method. Penalization parameters act on a much faster timescale than the characteristic timescale of the flow. The main advantage of the method is that it provides a systematic means of error control. This talk focuses on the progress made towards extending the method to 3D flows around irregular shapes. This work was supported by ONR MURI on Soil Blast Modeling.
A New Robust Method for Nonlinear Regression.
Tabatabai, M A; Kengwoung-Keumo, J J; Eby, W M; Bae, S; Manne, U; Fouad, M; Singh, K P
When outliers are present, the least squares method of nonlinear regression performs poorly. The main purpose of this paper is to provide a robust alternative technique to the Ordinary Least Squares nonlinear regression method. This new robust nonlinear regression method can provide accurate parameter estimates when outliers and/or influential observations are present. Real and simulated data for drug concentration and tumor size-metastasis are used to assess the performance of this new estimator. Monte Carlo simulations are performed to evaluate the robustness of our new method in comparison with the Ordinary Least Squares method. In simulated data with outliers, this new estimator of regression parameters seems to outperform the Ordinary Least Squares with respect to bias, mean squared errors, and mean estimated parameters. Two algorithms have been proposed. Additionally and for the sake of computational ease and illustration, a Mathematica program has been provided in the Appendix. The accuracy of our robust technique is superior to that of the Ordinary Least Squares. The robustness and simplicity of computations make this new technique more appropriate and useful tool for the analysis of nonlinear regressions.
A regression method for modelling geometric rates.
Bottai, Matteo
2015-09-18
The occurrence of an event of interest over time is often summarized by the incidence rate, defined as the average number of events per person-time. This type of rate applies to events that may occur repeatedly over time on any given subject, such as infections, and Poisson regression represents a natural regression method for modelling the effect of covariates on it. However, for events that can occur only once, such as death, the geometric rate may be a better summary measure. The geometric rate has long been utilized in demography for studying the growth of populations and in finance to compute compound interest on capital. This type of rate, however, is virtually unknown to medical research. This may be partly a consequence of the lack of a regression method for it. This paper describes a regression method for modelling the effect of covariates on the geometric rate. The described method is based on applying quantile regression to a transform of the time-to-event variable. The proposed method is used to analyze mortality in a randomized clinical trial and in an observational epidemiological study.
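As a small worked illustration of the compound-interest analogy mentioned in the abstract above, the following Python sketch computes a geometric rate from a survival probability. It assumes the usual compound-interest form, in which the rate g over (0, t) satisfies S(t) = (1 - g)^t; the function name and the numbers are hypothetical and are not taken from the paper, and the quantile-regression step the paper describes is not shown.

```python
def geometric_rate(survival_prob, t):
    """Per-unit-time geometric rate g solving survival_prob = (1 - g) ** t.

    Assumed definition (compound-interest form); not copied from the paper.
    """
    if not (0.0 < survival_prob <= 1.0) or t <= 0:
        raise ValueError("need 0 < survival_prob <= 1 and t > 0")
    return 1.0 - survival_prob ** (1.0 / t)


# Hypothetical example: 59.049% of subjects survive 5 years.
g = geometric_rate(0.59049, 5.0)
print(f"geometric rate per year: {g:.4f}")
```

With these made-up inputs the yearly rate compounds back to the 5-year survival, in the same way an annual interest rate compounds to a 5-year return.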
Wu, Hulin; Xue, Hongqi; Kumar, Arun
2012-06-01
Differential equations are extensively used for modeling dynamics of physical processes in many scientific fields such as engineering, physics, and biomedical sciences. Parameter estimation of differential equation models is a challenging problem because of high computational cost and high-dimensional parameter space. In this article, we propose a novel class of methods for estimating parameters in ordinary differential equation (ODE) models, which is motivated by HIV dynamics modeling. The new methods exploit the form of numerical discretization algorithms for an ODE solver to formulate estimating equations. First, a penalized-spline approach is employed to estimate the state variables, and the estimated state variables are then plugged into a discretization formula of an ODE solver to obtain the ODE parameter estimates via a regression approach. We consider discretization methods of three different orders: Euler's method, the trapezoidal rule, and the Runge-Kutta method. A higher-order numerical algorithm reduces numerical error in the approximation of the derivative, which produces a more accurate estimate, but its computational cost is higher. To balance the computational cost and estimation accuracy, we demonstrate, via simulation studies, that the trapezoidal discretization-based estimate is the best and is recommended for practical use. The asymptotic properties for the proposed numerical discretization-based estimators are established. Comparisons between the proposed methods and existing methods show a clear benefit of the proposed methods with regard to the trade-off between computational cost and estimation accuracy. We apply the proposed methods to an HIV study to further illustrate the usefulness of the proposed approaches.
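The idea of turning an ODE discretization formula into a regression can be sketched on a toy problem. The Python example below is only an illustration, not the authors' implementation: the model dx/dt = -theta * x, the step size, and the function names are assumptions, and the penalized-spline smoothing step is skipped by using noise-free state values.

```python
import numpy as np


def euler_estimate(x, h):
    # Euler formula: (x[i+1] - x[i]) / h = -theta * x[i]
    response = (x[1:] - x[:-1]) / h
    regressor = -x[:-1]
    # Least-squares slope through the origin.
    return float(regressor @ response / (regressor @ regressor))


def trapezoidal_estimate(x, h):
    # Trapezoidal rule: (x[i+1] - x[i]) / h = -theta * (x[i] + x[i+1]) / 2
    response = (x[1:] - x[:-1]) / h
    regressor = -(x[1:] + x[:-1]) / 2.0
    return float(regressor @ response / (regressor @ regressor))


theta_true = 1.0   # hypothetical true parameter
h = 0.1            # hypothetical grid spacing
t = np.arange(0.0, 5.0 + h / 2, h)
x = np.exp(-theta_true * t)   # exact states stand in for the spline estimates

theta_euler = euler_estimate(x, h)
theta_trap = trapezoidal_estimate(x, h)
print(f"Euler: {theta_euler:.4f}, trapezoidal: {theta_trap:.4f}")
```

On this toy problem the trapezoidal estimate sits closer to the true value than the Euler estimate, which matches the abstract's point that a higher-order formula reduces the error in approximating the derivative.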
Wu, Hulin; Xue, Hongqi; Kumar, Arun
2012-01-01
Summary Differential equations are extensively used for modeling dynamics of physical processes in many scientific fields such as engineering, physics, and biomedical sciences. Parameter estimation of differential equation models is a challenging problem because of high computational cost and high-dimensional parameter space. In this paper, we propose a novel class of methods for estimating parameters in ordinary differential equation (ODE) models, which is motivated by HIV dynamics modeling. The new methods exploit the form of numerical discretization algorithms for an ODE solver to formulate estimating equations. First a penalized-spline approach is employed to estimate the state variables and the estimated state variables are then plugged in a discretization formula of an ODE solver to obtain the ODE parameter estimates via a regression approach. We consider three different order of discretization methods, Euler’s method, trapezoidal rule and Runge-Kutta method. A higher order numerical algorithm reduces numerical error in the approximation of the derivative, which produces a more accurate estimate, but its computational cost is higher. To balance the computational cost and estimation accuracy, we demonstrate, via simulation studies, that the trapezoidal discretization-based estimate is the best and is recommended for practical use. The asymptotic properties for the proposed numerical discretization-based estimators (DBE) are established. Comparisons between the proposed methods and existing methods show a clear benefit of the proposed methods in regards to the trade-off between computational cost and estimation accuracy. We apply the proposed methods to an HIV study to further illustrate the usefulness of the proposed approaches. PMID:22376200
NASA Astrophysics Data System (ADS)
Tauriello, Gerardo; Koumoutsakos, Petros
2015-02-01
We present a comparative study of penalization and phase field methods for the solution of the diffusion equation in complex geometries embedded using simple Cartesian meshes. The two methods have been widely employed to solve partial differential equations in complex and moving geometries for applications ranging from solid and fluid mechanics to biology and geophysics. Their popularity is largely due to their discretization on Cartesian meshes thus avoiding the need to create body-fitted grids. At the same time, there are questions regarding their accuracy and it appears that the use of each one is confined by disciplinary boundaries. Here, we compare penalization and phase field methods to handle problems with Neumann and Robin boundary conditions. We discuss extensions for Dirichlet boundary conditions and in turn compare with methods that have been explicitly designed to handle Dirichlet boundary conditions. The accuracy of all methods is analyzed using one and two dimensional benchmark problems such as the flow induced by an oscillating wall and by a cylinder performing rotary oscillations. This comparative study provides information to decide which methods to consider for a given application and their incorporation in broader computational frameworks. We demonstrate that phase field methods are more accurate than penalization methods on problems with Neumann boundary conditions and we present an error analysis explaining this result.
NASA Astrophysics Data System (ADS)
Vasilyev, Oleg V.; Gazzola, Mattia; Koumoutsakos, Petros
2010-11-01
In this talk we discuss preliminary results for the use of hybrid wavelet collocation - Brinkman penalization approach for shape optimization for drag reduction in flows past linked bodies. This optimization relies on Adaptive Wavelet Collocation Method along with the Brinkman penalization technique and the Covariance Matrix Adaptation Evolution Strategy (CMA-ES). Adaptive wavelet collocation method tackles the problem of efficiently resolving a fluid flow on a dynamically adaptive computational grid, while a level set approach is used to describe the body shape and the Brinkman volume penalization allows for an easy variation of flow geometry without requiring body-fitted meshes. We perform 2D simulations of linked bodies in order to investigate whether flat geometries are optimal for drag reduction. In order to accelerate the costly cost function evaluations we exploit the inherent parallelism of ES and we extend the CMA-ES implementation to a multi-host framework. This framework allows for an easy distribution of the cost function evaluations across several parallel architectures and it is not limited to only one computing facility. The resulting optimal shapes are geometrically consistent with the shapes that have been obtained in the pioneering wind tunnel experiments for drag reduction using Evolution Strategies by Ingo Rechenberg.
Morales, Jorge A.; Leroy, Matthieu; Bos, Wouter J.T.; Schneider, Kai
2014-10-01
A volume penalization approach to simulate magnetohydrodynamic (MHD) flows in confined domains is presented. Here the incompressible visco-resistive MHD equations are solved using parallel pseudo-spectral solvers in Cartesian geometries. The volume penalization technique is an immersed boundary method which is characterized by a high flexibility for the geometry of the considered flow. In the present case, it allows boundary conditions other than periodic ones to be used in a Fourier pseudo-spectral approach. The numerical method is validated and its convergence is assessed for two- and three-dimensional hydrodynamic (HD) and MHD flows, by comparing the numerical results with results from the literature and analytical solutions. The test cases considered are two-dimensional Taylor–Couette flow, the z-pinch configuration, three-dimensional Orszag–Tang flow, Ohmic decay in a periodic cylinder, three-dimensional Taylor–Couette flow with and without axial magnetic field and three-dimensional Hartmann instabilities in a cylinder with an imposed helical magnetic field. Finally, we present a magnetohydrodynamic flow simulation in toroidal geometry with a non-symmetric cross section and an imposed helical magnetic field to illustrate the potential of the method.
A method for nonlinear exponential regression analysis
NASA Technical Reports Server (NTRS)
Junkin, B. G.
1971-01-01
A computer-oriented technique is presented for performing a nonlinear exponential regression analysis on decay-type experimental data. The technique involves the least squares procedure wherein the nonlinear problem is linearized by expansion in a Taylor series. A linear curve fitting procedure for determining the initial nominal estimates for the unknown exponential model parameters is included as an integral part of the technique. A correction matrix was derived and then applied to the nominal estimate to produce an improved set of model parameters. The solution cycle is repeated until some predetermined criterion is satisfied.
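The solution cycle described above (linear fit for initial estimates, Taylor-series linearization, correction, repeat) can be sketched in Python as a Gauss-Newton iteration. This is a minimal sketch under stated assumptions, not a reconstruction of the report's program: the single-term model y = a * exp(-b * t), the stopping rule, the function name, and the synthetic data are all hypothetical.

```python
import numpy as np


def fit_exponential(t, y, tol=1e-10, max_iter=50):
    """Fit y = a * exp(-b * t) by iterated linearized least squares."""
    # Initial nominal estimates from a straight-line fit to log(y); needs y > 0.
    slope, intercept = np.polyfit(t, np.log(y), 1)
    a, b = np.exp(intercept), -slope
    for _ in range(max_iter):
        e = np.exp(-b * t)
        residual = y - a * e
        # First-order Taylor expansion: partial derivatives w.r.t. a and b.
        jacobian = np.column_stack([e, -a * t * e])
        correction, *_ = np.linalg.lstsq(jacobian, residual, rcond=None)
        a, b = a + correction[0], b + correction[1]
        # Stop once the correction is negligible (assumed criterion).
        if np.linalg.norm(correction) < tol:
            break
    return float(a), float(b)


# Synthetic decay data with a small deterministic perturbation.
t = np.linspace(0.0, 4.0, 20)
y = 3.0 * np.exp(-0.7 * t) + 0.01 * np.cos(5.0 * t)

a_hat, b_hat = fit_exponential(t, y)
print(f"a = {a_hat:.3f}, b = {b_hat:.3f}")
```

The log-linear start matters in practice: the iteration linearizes around the current estimate, so a poor starting point can make the corrections diverge.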
Kim, Yoonsang; Choi, Young-Ku; Emery, Sherry
2013-08-01
Several statistical packages are capable of estimating generalized linear mixed models and these packages provide one or more of three estimation methods: penalized quasi-likelihood, Laplace, and Gauss-Hermite. Many studies have investigated these methods' performance for the mixed-effects logistic regression model. However, the authors focused on models with one or two random effects and assumed a simple covariance structure between them, which may not be realistic. When there are multiple correlated random effects in a model, the computation becomes intensive, and often an algorithm fails to converge. Moreover, in our analysis of smoking status and exposure to anti-tobacco advertisements, we have observed that when a model included multiple random effects, parameter estimates varied considerably from one statistical package to another even when using the same estimation method. This article presents a comprehensive review of the advantages and disadvantages of each estimation method. In addition, we compare the performances of the three methods across statistical packages via simulation, which involves two- and three-level logistic regression models with at least three correlated random effects. We apply our findings to a real dataset. Our results suggest that two packages, SAS GLIMMIX Laplace and SuperMix Gaussian quadrature, perform well in terms of accuracy, precision, convergence rates, and computing speed. We also discuss the strengths and weaknesses of the two packages in regard to sample sizes.
Relationship between Multiple Regression and Selected Multivariable Methods.
ERIC Educational Resources Information Center
Schumacker, Randall E.
The relationship of multiple linear regression to various multivariate statistical techniques is discussed. The importance of the standardized partial regression coefficient (beta weight) in multiple linear regression as it is applied in path, factor, LISREL, and discriminant analyses is emphasized. The multivariate methods discussed in this paper…
Kim, Yoonsang; Emery, Sherry
2013-01-01
Several statistical packages are capable of estimating generalized linear mixed models and these packages provide one or more of three estimation methods: penalized quasi-likelihood, Laplace, and Gauss-Hermite. Many studies have investigated these methods’ performance for the mixed-effects logistic regression model. However, the authors focused on models with one or two random effects and assumed a simple covariance structure between them, which may not be realistic. When there are multiple correlated random effects in a model, the computation becomes intensive, and often an algorithm fails to converge. Moreover, in our analysis of smoking status and exposure to anti-tobacco advertisements, we have observed that when a model included multiple random effects, parameter estimates varied considerably from one statistical package to another even when using the same estimation method. This article presents a comprehensive review of the advantages and disadvantages of each estimation method. In addition, we compare the performances of the three methods across statistical packages via simulation, which involves two- and three-level logistic regression models with at least three correlated random effects. We apply our findings to a real dataset. Our results suggest that two packages—SAS GLIMMIX Laplace and SuperMix Gaussian quadrature—perform well in terms of accuracy, precision, convergence rates, and computing speed. We also discuss the strengths and weaknesses of the two packages in regard to sample sizes. PMID:24288415
The Precision Efficacy Analysis for Regression Sample Size Method.
ERIC Educational Resources Information Center
Brooks, Gordon P.; Barcikowski, Robert S.
The general purpose of this study was to examine the efficiency of the Precision Efficacy Analysis for Regression (PEAR) method for choosing appropriate sample sizes in regression studies used for precision. The PEAR method, which is based on the algebraic manipulation of an accepted cross-validity formula, essentially uses an effect size to…
Calculation of Solar Radiation by Using Regression Methods
NASA Astrophysics Data System (ADS)
Kızıltan, Ö.; Şahin, M.
2016-04-01
In this study, solar radiation was estimated at 53 locations over Turkey with varying climatic conditions using the linear, ridge, lasso, smoother, partial least squares, KNN, and Gaussian process regression methods. Data from 2002 and 2003 were used to obtain the regression coefficients of each method. The coefficients were obtained based on the input parameters: month, altitude, latitude, longitude, and land surface temperature (LST). The LST values were obtained from data of the National Oceanic and Atmospheric Administration Advanced Very High Resolution Radiometer (NOAA-AVHRR) satellite. Solar radiation for 2004 was then calculated using the fitted coefficients. The results were compared statistically. The most successful method was Gaussian process regression, and the least successful was lasso regression. For Gaussian process regression, the mean bias error (MBE) was 0.274 MJ/m2, the root mean square error (RMSE) was 2.260 MJ/m2, and the correlation coefficient was 0.941. The statistical results are consistent with the literature. The Gaussian process regression method is recommended for use in other studies.
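The comparison statistics named in this abstract are simple to compute. The Python sketch below shows one common way to do so; the sign convention for the bias (predicted minus observed) is an assumption, the function names are hypothetical, and the numbers are made up for illustration rather than taken from the study.

```python
import numpy as np


def mean_bias_error(predicted, observed):
    # Assumed convention: positive values mean over-prediction.
    diff = np.asarray(predicted, dtype=float) - np.asarray(observed, dtype=float)
    return float(np.mean(diff))


def root_mean_square_error(predicted, observed):
    diff = np.asarray(predicted, dtype=float) - np.asarray(observed, dtype=float)
    return float(np.sqrt(np.mean(diff ** 2)))


# Made-up solar radiation values in MJ/m2, for illustration only.
predicted = [10.0, 12.0, 14.0, 16.0]
observed = [9.0, 13.0, 14.0, 15.0]

mbe = mean_bias_error(predicted, observed)
rmse = root_mean_square_error(predicted, observed)
r = float(np.corrcoef(predicted, observed)[0, 1])
print(f"MBE = {mbe:.3f}, RMSE = {rmse:.3f}, r = {r:.3f}")
```

MBE can be near zero even when individual errors are large, because positive and negative errors cancel; RMSE does not have that property, which is why the two are usually reported together.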
Interquantile Shrinkage in Regression Models
Jiang, Liewen; Wang, Huixia Judy; Bondell, Howard D.
2012-01-01
Conventional analysis using quantile regression typically focuses on fitting the regression model at different quantiles separately. However, in situations where the quantile coefficients share some common feature, joint modeling of multiple quantiles to accommodate the commonality often leads to more efficient estimation. One example of common features is that a predictor may have a constant effect over one region of quantile levels but varying effects in other regions. To automatically perform estimation and detection of the interquantile commonality, we develop two penalization methods. When the quantile slope coefficients indeed do not change across quantile levels, the proposed methods will shrink the slopes towards constant and thus improve the estimation efficiency. We establish the oracle properties of the two proposed penalization methods. Through numerical investigations, we demonstrate that the proposed methods lead to estimations with competitive or higher efficiency than the standard quantile regression estimation in finite samples. Supplemental materials for the article are available online. PMID:24363546
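As background for the separate-quantile fitting mentioned above, here is a minimal sketch of the standard check (pinball) loss that quantile regression minimizes. It is not the interquantile shrinkage penalty proposed in the paper; the function names and data are hypothetical.

```python
def check_loss(u, tau):
    """Check loss rho_tau(u) = u * (tau - 1{u < 0}) for a residual u."""
    return u * (tau - (1.0 if u < 0 else 0.0))

def total_check_loss(values, q, tau):
    """Total check loss when every value is predicted by the constant q."""
    return sum(check_loss(v - q, tau) for v in values)

# With tau = 0.5 the minimizer over the sample points is the sample median,
# which is unaffected by the outlier 100.0.
values = [1.0, 2.0, 3.0, 4.0, 100.0]
tau = 0.5
best = min(values, key=lambda q: total_check_loss(values, q, tau))
```

Fitting each quantile level separately amounts to minimizing this loss once per value of tau; joint approaches such as the one described add a penalty linking the coefficients across levels.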
STRONG ORACLE OPTIMALITY OF FOLDED CONCAVE PENALIZED ESTIMATION
Fan, Jianqing; Xue, Lingzhou; Zou, Hui
2014-01-01
Folded concave penalization methods have been shown to enjoy the strong oracle property for high-dimensional sparse estimation. However, a folded concave penalization problem usually has multiple local solutions and the oracle property is established only for one of the unknown local solutions. A challenging fundamental issue still remains that it is not clear whether the local optimum computed by a given optimization algorithm possesses those nice theoretical properties. To close this important theoretical gap in over a decade, we provide a unified theory to show explicitly how to obtain the oracle solution via the local linear approximation algorithm. For a folded concave penalized estimation problem, we show that as long as the problem is localizable and the oracle estimator is well behaved, we can obtain the oracle estimator by using the one-step local linear approximation. In addition, once the oracle estimator is obtained, the local linear approximation algorithm converges, namely it produces the same estimator in the next iteration. The general theory is demonstrated by using four classical sparse estimation problems, i.e., sparse linear regression, sparse logistic regression, sparse precision matrix estimation and sparse quantile regression. PMID:25598560
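The one-step local linear approximation (LLA) idea can be sketched for sparse linear regression as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: it uses the SCAD penalty with a = 3.7, a plain Lasso fit as the initial estimate, and a simple coordinate descent solver (`weighted_lasso`, a hypothetical helper) that assumes no column of X is identically zero.

```python
import numpy as np

def scad_derivative(t, lam, a=3.7):
    """Derivative of the SCAD penalty evaluated at t >= 0."""
    t = np.asarray(t, dtype=float)
    return np.where(t <= lam, lam, np.maximum(a * lam - t, 0.0) / (a - 1.0))

def weighted_lasso(X, y, weights, n_sweeps=200):
    """Coordinate descent for (1/2n)||y - Xb||^2 + sum_j weights[j] * |b_j|."""
    n, p = X.shape
    beta = np.zeros(p)
    col_scale = (X ** 2).sum(axis=0) / n       # assumes no all-zero column
    residual = np.array(y, dtype=float)
    for _ in range(n_sweeps):
        for j in range(p):
            residual += X[:, j] * beta[j]      # remove coordinate j from the fit
            rho = X[:, j] @ residual / n
            # Soft-threshold at this coordinate's own weight.
            beta[j] = np.sign(rho) * max(abs(rho) - weights[j], 0.0) / col_scale[j]
            residual -= X[:, j] * beta[j]      # put the updated coordinate back
    return beta

def one_step_lla(X, y, lam):
    """Lasso initial estimate followed by one LLA step for the SCAD penalty."""
    p = X.shape[1]
    beta_init = weighted_lasso(X, y, np.full(p, lam))
    weights = scad_derivative(np.abs(beta_init), lam)
    return weighted_lasso(X, y, weights)

# Toy orthogonal design with true coefficients (1, -2, 0).
X = np.sqrt(3.0) * np.eye(3)
y = X @ np.array([1.0, -2.0, 0.0])
beta_lasso = weighted_lasso(X, y, np.full(3, 0.1))
beta_lla = one_step_lla(X, y, lam=0.1)
```

In this toy design the Lasso shrinks the two nonzero coefficients by lam, whereas the LLA step assigns them zero weight (the SCAD derivative vanishes for large coefficients) and keeps the full penalty on the zero coefficient, which removes the shrinkage bias.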
Yuki, Kenya; Asaoka, Ryo; Awano-Tanabe, Sachiko; Ono, Takeshi; Shiba, Daisuke; Murata, Hiroshi; Tsubota, Kazuo
2017-06-01
We predict the likelihood of a future motor vehicle collision (MVC) from visual function data, attitudes to driving, and past MVC history using the penalized support vector machine (pSVM) in subjects with primary open-angle glaucoma (POAG). Patients with POAG were screened prospectively for eligibility and 185 were analyzed in this study. Self-reported MVCs of all participants were recorded for 3 years from the baseline using a survey questionnaire every 12 months. A binocular integrated visual field (IVF) was calculated for each patient by merging a patient's monocular Humphrey Field Analyzer (HFA) visual fields (VFs). The IVF was divided into six regions, based on eccentricity and the right or left hemifield, and the average of the total deviation (TD) values in each of these six areas was calculated. Then, the future MVCs were predicted using various variables, including age, sex, 63 variables of 52 TD values, mean of the TD values, visual acuities (VAs), six sector average TDs with (predpenSVM_all) and without (predpenSVM_basic) the attitudes in driving, and also past MVC history, using the pSVM method, applying the leave-one-out cross validation. The relationship between predpenSVM_basic and the future MVC approached significance (odds ratio = 1.15, [0.99-1.29], P = 0.064, logistic regression). A significant relationship was observed between predpenSVM_all and the future MVC (odds ratio = 1.21, P = 0.0015). It was useful to predict future MVCs in patients with POAG using visual function metrics, patients' attitudes to driving, and past MVC history, using the pSVM. Careful consideration is needed when predicting future MVCs in POAG patients using visual function, and without driving attitude and MVC history.
Weighted scores method for regression models with dependent data.
Nikoloulopoulos, Aristidis K; Joe, Harry; Chaganty, N Rao
2011-10-01
There are copula-based statistical models in the literature for regression with dependent data such as clustered and longitudinal overdispersed counts, for which parameter estimation and inference are straightforward. For situations where the main interest is in the regression and other univariate parameters and not the dependence, we propose a "weighted scores method", which is based on weighting score functions of the univariate margins. The weight matrices are obtained by initially fitting a discretized multivariate normal distribution, which admits a wide range of dependence. The general methodology is applied to negative binomial regression models. Asymptotic and small-sample efficiency calculations show that our method is robust and nearly as efficient as maximum likelihood for fully specified copula models. An illustrative example is given to show the use of our weighted scores method to analyze utilization of health care based on family characteristics.
Analyzing large datasets with bootstrap penalization.
Fang, Kuangnan; Ma, Shuangge
2017-03-01
Data with a large p (number of covariates) and/or a large n (sample size) are now commonly encountered. For many problems, regularization, especially penalization, is adopted for estimation and variable selection. The straightforward application of penalization to large datasets demands a "big computer" with high computational power. To improve computational feasibility, we develop bootstrap penalization, which dissects a big penalized estimation into a set of small ones that can be executed in a highly parallel manner, each demanding only a "small computer". The proposed approach takes different strategies for data with different characteristics. For data with a large p but a small to moderate n, covariates are first clustered into relatively homogeneous blocks. The proposed approach consists of two sequential steps. In each step and for each bootstrap sample, we select blocks of covariates and run penalization. The results from multiple bootstrap samples are pooled to generate the final estimate. For data with a large n but a small to moderate p, we bootstrap a small number of subjects, apply penalized estimation, and then conduct a weighted average over multiple bootstrap samples. For data with a large p and a large n, the natural marriage of the previous two methods is applied. Numerical studies, including simulations and data analysis, show that the proposed approach has computational and numerical advantages over the straightforward application of penalization. An R package has been developed to implement the proposed methods. © 2016 WILEY-VCH Verlag GmbH & Co. KGaA, Weinheim.
An adaptive regression method for infrared blind-pixel compensation
NASA Astrophysics Data System (ADS)
Chen, Suting; Meng, Hao; Pei, Tao; Zhang, Yanyan
2017-09-01
Blind pixel compensation is an ill-posed inverse problem of infrared imaging systems and image restoration. The performance of a blind pixel compensation algorithm depends on the accuracy of estimation for the underlying true infrared images. We propose an adaptive regression method (ARM) for blind pixel compensation that integrates the multi-scale framework with a regression model. A blind pixel is restored by exploiting the intra-scale properties through nonparametric regressive estimation and the inter-scale characteristics via parametric regression for continuous learning. Combining the respective strengths of a parametric model and a nonparametric model, ARM establishes a set of multi-scale blind-pixel compensation methods to correct the non-uniformity based on key frame extraction. Therefore, it is essentially different from the traditional frameworks for blind pixel compensation, which are based on filtering and interpolation. Experimental results on some challenging cases of blind compensation show that the proposed algorithm outperforms existing methods by a significant margin in both isolated blind restoration and clustered blind restoration.
Birthweight Related Factors in Northwestern Iran: Using Quantile Regression Method
Fallah, Ramazan; Kazemnejad, Anoshirvan; Zayeri, Farid; Shoghli, Alireza
2016-01-01
Introduction: Birthweight is one of the most important predicting indicators of the health status in adulthood. Having a balanced birthweight is one of the priorities of the health system in most of the industrial and developed countries. This indicator is used to assess the growth and health status of the infants. The aim of this study was to assess the birthweight of the neonates by using quantile regression in Zanjan province. Methods: This analytical descriptive study was carried out using pre-registered (March 2010 - March 2012) data of neonates in urban/rural health centers of Zanjan province using multiple-stage cluster sampling. Data were analyzed using multiple linear regression and quantile regression methods and SAS 9.2 statistical software. Results: Of 8456 newborn babies, 4146 (49%) were female. The mean age of the mothers was 27.1±5.4 years. The mean birthweight of the neonates was 3104 ± 431 grams. Five hundred and seventy-three (6.8%) of the neonates weighed less than 2500 grams. In all quantiles, gestational age of neonates (p<0.05) and weight and educational level of the mothers (p<0.05) showed a significant linear relationship with the birthweight of the neonates. However, sex and birth rank of the neonates, mothers' age, place of residence (urban/rural) and career were not significant in all quantiles (p>0.05). Conclusion: This study revealed that the results of multiple linear regression and quantile regression were not identical. We strongly recommend the use of quantile regression when an asymmetric response variable or data with outliers is available. PMID:26925889
NASA Astrophysics Data System (ADS)
Chatelin, Robin; Poncet, Philippe
2014-07-01
Particle methods are very convenient for computing transport equations in fluid mechanics, as their computational cost is linear and they are not limited by convection stability conditions. To achieve large 3D computations the method must be coupled to efficient algorithms for velocity computations, including a good treatment of non-homogeneities and complex moving geometries. The Penalization method makes it possible to account for the interaction of moving bodies by adding a term to the conservation of momentum equation. This work introduces a new computational algorithm that solves the Penalization term and the Laplace operators implicitly in the same step, since explicit computations are limited by stability issues, especially at low Reynolds number. This computational algorithm is based on the Sherman-Morrison-Woodbury formula coupled to a GMRES iterative method to reduce the computations to a sequence of Poisson problems: this allows a penalized Poisson equation to be formulated as a large perturbation of a standard Poisson equation, by means of algebraic relations. A direct consequence is the possibility of using fast solvers based on Fast Fourier Transforms for this problem, with good efficiency from both the computational and the memory consumption points of view, since these solvers are recursive and do not perform any matrix assembly. The resulting fluid mechanics computations are very fast and consume a small amount of memory, compared to a reference solver or a linear system resolution. The present applications focus mainly on a coupling between the transport equation and the 3D Stokes equations, for studying the motion of biological organisms in highly viscous flows with variable viscosity.
Monte Carlo methods for nonparametric regression with heteroscedastic measurement error.
McIntyre, Julie; Johnson, Brent A; Rappaport, Stephen M
2017-09-15
Nonparametric regression is a fundamental problem in statistics but challenging when the independent variable is measured with error. Among the first approaches was an extension of deconvoluting kernel density estimators for homoscedastic measurement error. The main contribution of this article is to propose a new simulation-based nonparametric regression estimator for the heteroscedastic measurement error case. Similar to some earlier proposals, our estimator is built on principles underlying deconvoluting kernel density estimators. However, the proposed estimation procedure uses Monte Carlo methods for estimating nonlinear functions of a normal mean, which is different from any previous estimator. We show that the estimator has desirable operating characteristics in both large and small samples and apply the method to a study of benzene exposure in Chinese factory workers. © 2017, The International Biometric Society.
The extinction law from photometric data: linear regression methods
NASA Astrophysics Data System (ADS)
Ascenso, J.; Lombardi, M.; Lada, C. J.; Alves, J.
2012-04-01
Context. The properties of dust grains, in particular their size distribution, are expected to differ from the interstellar medium to the high-density regions within molecular clouds. Since the extinction at near-infrared wavelengths is caused by dust, the extinction law in cores should depart from that found in low-density environments if the dust grains have different properties. Aims: We explore methods to measure the near-infrared extinction law produced by dense material in molecular cloud cores from photometric data. Methods: Using controlled sets of synthetic and semi-synthetic data, we test several methods for linear regression applied to the specific problem of deriving the extinction law from photometric data. We cover the parameter space appropriate to this type of observations. Results: We find that many of the common linear-regression methods produce biased results when applied to the extinction law from photometric colors. We propose and validate a new method, LinES, as the most reliable for this effect. We explore the use of this method to detect whether or not the extinction law of a given reddened population has a break at some value of extinction. Based on observations collected at the European Organisation for Astronomical Research in the Southern Hemisphere, Chile (ESO programmes 069.C-0426 and 074.C-0728).
Cathodic protection design using the regression and correlation method
Niembro, A.M.; Ortiz, E.L.G.
1997-09-01
A computerized statistical method which calculates the current demand requirement based on potential measurements for cathodic protection systems is introduced. The method uses the regression and correlation analysis of statistical measurements of current and potentials of the piping network. This approach involves four steps: field potential measurements, statistical determination of the current required to achieve full protection, installation of more cathodic protection capacity with distributed anodes around the plant and examination of the protection potentials. The procedure is described and recommendations for the improvement of the existing and new cathodic protection systems are given.
Cheng, Lishui; Hobbs, Robert F; Sgouros, George; Frey, Eric C
2014-11-01
Three-dimensional (3D) dosimetry has the potential to provide better prediction of response of normal tissues and tumors and is based on 3D estimates of the activity distribution in the patient obtained from emission tomography. Dose-volume histograms (DVHs) are an important summary measure of 3D dosimetry and a widely used tool for treatment planning in radiation therapy. Accurate estimates of the radioactivity distribution in space and time are desirable for accurate 3D dosimetry. The purpose of this work was to develop and demonstrate the potential of penalized SPECT image reconstruction methods to improve DVH estimates obtained from 3D dosimetry methods. The authors developed penalized image reconstruction methods, using maximum a posteriori (MAP) formalism, which intrinsically incorporate regularization in order to control noise and, unlike linear filters, are designed to retain sharp edges. Two priors were studied: one is a 3D hyperbolic prior, termed single-time MAP (STMAP), and the second is a 4D hyperbolic prior, termed cross-time MAP (CTMAP), using both the spatial and temporal information to control noise. The CTMAP method assumed perfect registration between the estimated activity distributions and projection datasets from the different time points. Accelerated and convergent algorithms were derived and implemented. A modified NURBS-based cardiac-torso phantom with a multicompartment kidney model and organ activities and parameters derived from clinical studies were used in a Monte Carlo simulation study to evaluate the methods. Cumulative dose-rate volume histograms (CDRVHs) and cumulative DVHs (CDVHs) obtained from the phantom and from SPECT images reconstructed with both the penalized algorithms and OS-EM were calculated and compared both qualitatively and quantitatively. The STMAP method was applied to patient data and CDRVHs obtained with STMAP and OS-EM were compared qualitatively. The results showed that the penalized algorithms substantially
Analysis of regression methods for solar activity forecasting
NASA Technical Reports Server (NTRS)
Lundquist, C. A.; Vaughan, W. W.
1979-01-01
The paper deals with the potential use of the most recent solar data to project trends in the next few years. Assuming that a mode of solar influence on weather can be identified, advantageous use of that knowledge presumably depends on estimating future solar activity. A frequently used technique for solar cycle predictions is a linear regression procedure along the lines formulated by McNish and Lincoln (1949). The paper presents a sensitivity analysis of the behavior of such regression methods relative to the following aspects: cycle minimum, time into cycle, composition of historical data base, and unnormalized vs. normalized solar cycle data. Comparative solar cycle forecasts for several past cycles are presented as to these aspects of the input data. Implications for the current cycle, No. 21, are also given.
Regularized discriminative spectral regression method for heterogeneous face matching.
Huang, Xiangsheng; Lei, Zhen; Fan, Mingyu; Wang, Xiao; Li, Stan Z
2013-01-01
Face recognition is confronted with situations in which face images are captured in various modalities, such as the visual modality, the near infrared modality, and the sketch modality. This is known as heterogeneous face recognition. To solve this problem, we propose a new method called discriminative spectral regression (DSR). The DSR maps heterogeneous face images into a common discriminative subspace in which robust classification can be achieved. In the proposed method, the subspace learning problem is transformed into a least squares problem. Different mappings should map heterogeneous images from the same class close to each other, while images from different classes should be separated as far as possible. To realize this, we introduce two novel regularization terms, which reflect the category relationships among data, into the least squares approach. Experiments conducted on two heterogeneous face databases validate the superiority of the proposed method over the previous methods.
A locally adaptive kernel regression method for facies delineation
NASA Astrophysics Data System (ADS)
Fernàndez-Garcia, D.; Barahona-Palomo, M.; Henri, C. V.; Sanchez-Vila, X.
2015-12-01
Facies delineation is defined as the separation of geological units with distinct intrinsic characteristics (grain size, hydraulic conductivity, mineralogical composition). A major challenge in this area stems from the fact that only a few scattered pieces of hydrogeological information are available to delineate geological facies. Several methods to delineate facies are available in the literature, ranging from those based only on existing hard data, to those including secondary data or external knowledge about sedimentological patterns. This paper describes a methodology to use kernel regression methods as an effective tool for facies delineation. The method uses both the spatial and the actual sampled values to produce, for each individual hard data point, a locally adaptive steering kernel function, self-adjusting the principal directions of the local anisotropic kernels to the direction of highest local spatial correlation. The method is shown to outperform the nearest neighbor classification method in a number of synthetic aquifers whenever the available number of hard data is small and randomly distributed in space. In the case of exhaustive sampling, the steering kernel regression method converges to the true solution. Simulations run in a suite of synthetic examples are used to explore the selection of kernel parameters in typical field settings. It is shown that, in practice, a rule of thumb can be used to obtain suboptimal results. The performance of the method is demonstrated to significantly improve when external information regarding facies proportions is incorporated. Remarkably, the method allows for a reasonable reconstruction of the facies connectivity patterns, shown in terms of breakthrough curves performance.
Mapping urban environmental noise: a land use regression method.
Xie, Dan; Liu, Yi; Chen, Jining
2011-09-01
Forecasting and preventing urban noise pollution are major challenges in urban environmental management. Most existing efforts, including experiment-based models, statistical models, and noise mapping, however, have limited capacity to explain the association between urban growth and corresponding noise change. Therefore, these conventional methods can hardly forecast urban noise at a given outlook of development layout. This paper, for the first time, introduces a land use regression method, which has been applied for simulating urban air quality for a decade, to construct an urban noise model (LUNOS) in Dalian Municipality, Northeast China. The LUNOS model describes noise as a dependent variable of the surrounding land-use areas via a regressive function. The results suggest that a linear model performs better in fitting monitoring data, and there is no significant difference in the LUNOS outputs when applied to different spatial scales. As the LUNOS facilitates a better understanding of the association between land use and urban environmental noise in comparison to conventional methods, it can be regarded as a promising tool for noise prediction for planning purposes and as an aid to smart decision-making.
Liu, Xiang; Peng, Yingwei; Tu, Dongsheng; Liang, Hua
2012-10-30
Survival data with a sizable cure fraction are commonly encountered in cancer research. The semiparametric proportional hazards cure model has been recently used to analyze such data. As seen in the analysis of data from a breast cancer study, a variable selection approach is needed to identify important factors in predicting the cure status and risk of breast cancer recurrence. However, no specific variable selection method for the cure model is available. In this paper, we present a variable selection approach with penalized likelihood for the cure model. The estimation can be implemented easily by combining the computational methods for penalized logistic regression and the penalized Cox proportional hazards models with the expectation-maximization algorithm. We illustrate the proposed approach on data from a breast cancer study. We conducted Monte Carlo simulations to evaluate the performance of the proposed method. We used and compared different penalty functions in the simulation studies.
Identifying gene-environment and gene-gene interactions using a progressive penalization approach.
Zhu, Ruoqing; Zhao, Hongyu; Ma, Shuangge
2014-05-01
In genomic studies, identifying important gene-environment and gene-gene interactions is a challenging problem. In this study, we adopt the statistical modeling approach, where interactions are represented by product terms in regression models. For the identification of important interactions, we adopt penalization, which has been used in many genomic studies. Straightforward application of penalization does not respect the "main effect, interaction" hierarchical structure. A few recently proposed methods respect this structure by applying constrained penalization. However, they demand very complicated computational algorithms and can only accommodate a small number of genomic measurements. We propose a computationally fast penalization method that can identify important gene-environment and gene-gene interactions and respect a strong hierarchical structure. The method takes a stagewise approach and progressively expands its optimization domain to account for possible hierarchical interactions. It is applicable to multiple data types and models. A coordinate descent method is utilized to produce the entire regularized solution path. Simulation study demonstrates the superior performance of the proposed method. We analyze a lung cancer prognosis study with gene expression measurements and identify important gene-environment interactions.
Identifying gene-environment and gene-gene interactions using a progressive penalization approach
Zhu, Ruoqing; Zhao, Hongyu; Ma, Shuangge
2015-01-01
In genomic studies, identifying important gene-environment and gene-gene interactions is a challenging problem. In this study, we adopt the statistical modeling approach, where interactions are represented by product terms in regression models. For the identification of important interactions, we adopt penalization, which has been used in many genomic studies. Straightforward application of penalization does not respect the “main effect, interaction” hierarchical structure. A few recently proposed methods respect this structure by applying constrained penalization. However, they demand very complicated computational algorithms and can only accommodate a small number of genomic measurements. We propose a computationally fast penalization method that can identify important gene-environment and gene-gene interactions and respect a strong hierarchical structure. The method takes a stagewise approach and progressively expands its optimization domain to account for possible hierarchical interactions. It is applicable to multiple data types and models. A coordinate descent method is utilized to produce the entire regularized solution path. Simulation study demonstrates the superior performance of the proposed method. We analyze a lung cancer prognosis study with gene expression measurements and identify important gene-environment interactions. PMID:24723356
Assessment of School Merit with Multiple Regression: Methods and Critique.
ERIC Educational Resources Information Center
Tate, Richard L.
1986-01-01
Regression-based adjustment of student outcomes for the assessment of the merit of schools is considered. First, the basics of causal modeling and multiple regression are briefly reviewed. Then, two common regression-based adjustment procedures are described, pointing out that the validity of the final assessments depends on: (1) the degree to…
ERIC Educational Resources Information Center
Rule, David L.
Several regression methods were examined within the framework of weighted structural regression (WSR), comparing their regression weight stability and score estimation accuracy in the presence of outlier contamination. The methods compared are: (1) ordinary least squares; (2) WSR ridge regression; (3) minimum risk regression; (4) minimum risk 2;…
van Houwelingen, Hans C; Putter, Hein
2015-04-01
By far the most popular model to obtain survival predictions for individual patients is the Cox model. The Cox model does not make any assumptions on the underlying hazard, but it relies heavily on the proportional hazards assumption. The most common ways to circumvent this robustness problem are 1) to categorize patients based on their prognostic risk score and to base predictions on Kaplan-Meier curves for the risk categories, or 2) to include interactions with the covariates and suitable functions of time. Robust estimators of the t(0)-year survival probabilities can also be obtained from a "stopped Cox" regression model, in which all observations are administratively censored at t(0). Other recent approaches to solve this robustness problem, originally proposed in the context of competing risks, are pseudo-values and direct binomial regression, based on unbiased estimating equations. In this paper stopped Cox regression is compared with these direct approaches. This is done by means of a simulation study to assess the biases of the different approaches and an analysis of breast cancer data to get some feeling for the performance in practice. The tentative conclusion is that stopped Cox and direct models agree well if the follow-up is not too long. There are larger differences for long-term follow-up data. There stopped Cox might be more efficient, but less robust.
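The administrative censoring step behind "stopped Cox" regression can be sketched as a data transformation applied before fitting an ordinary Cox model. The function name, the tie rule (an event exactly at the horizon is kept as an event), and the sample data are assumptions for illustration.

```python
def administratively_censor(times, events, horizon):
    """Censor every observation at `horizon` (the t(0) of the abstract).

    times  : follow-up times
    events : 1 if the event was observed at that time, 0 if censored
    Returns the stopped times and the stopped event indicators.
    """
    stopped_times = [min(t, horizon) for t in times]
    # Events occurring after the horizon become censored observations.
    stopped_events = [e if t <= horizon else 0 for t, e in zip(times, events)]
    return stopped_times, stopped_events

# Hypothetical follow-up data in years, with a 5-year horizon.
times = [1.5, 4.0, 6.2, 9.0]
events = [1, 0, 1, 1]
stopped_times, stopped_events = administratively_censor(times, events, horizon=5.0)
```

A Cox model fitted to the stopped data only has to satisfy proportional hazards up to the horizon, which is the source of the added robustness the abstract describes.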
Wilcox, Rand R
2013-04-01
It is well known that the ordinary least squares (OLS) regression estimator is not robust. Many robust regression estimators have been proposed and inferential methods based on these estimators have been derived. However, for two independent groups, let θj (X) be some conditional measure of location for the jth group, given X, based on some robust regression estimator. An issue that has not been addressed is computing a 1 - α confidence interval for θ1(X) - θ2(X) in a manner that allows both within group and between group heteroscedasticity. The paper reports the finite sample properties of a simple method for accomplishing this goal. Simulations indicate that, in terms of controlling the probability of a Type I error, the method performs very well for a wide range of situations, even with a relatively small sample size. In principle, any robust regression estimator can be used. The simulations are focused primarily on the Theil-Sen estimator, but some results using Yohai's MM-estimator, as well as the Koenker and Bassett quantile regression estimator, are noted. Data from the Well Elderly II study, dealing with measures of meaningful activity using the cortisol awakening response as a covariate, are used to illustrate that the choice between an extant method based on a nonparametric regression estimator, and the method suggested here, can make a practical difference.
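For readers unfamiliar with the Theil-Sen estimator named above, a minimal single-predictor sketch follows. It shows only the point estimator (median of pairwise slopes), not the confidence-interval method studied in the paper; the intercept rule (median of y minus slope times x) is one common choice and is an assumption here.

```python
from itertools import combinations
from statistics import median

def theil_sen(x, y):
    """Theil-Sen line: slope is the median of all pairwise slopes."""
    slopes = [
        (y[j] - y[i]) / (x[j] - x[i])
        for i, j in combinations(range(len(x)), 2)
        if x[j] != x[i]                      # skip pairs with tied x values
    ]
    slope = median(slopes)
    intercept = median(yi - slope * xi for xi, yi in zip(x, y))
    return slope, intercept

# Points on y = 2x + 1, with the last response replaced by a gross outlier.
x = [0, 1, 2, 3, 4]
y = [1.0, 3.0, 5.0, 7.0, 100.0]
slope, intercept = theil_sen(x, y)
```

Six of the ten pairwise slopes are unaffected by the outlier, so the median still recovers the underlying line, whereas an OLS fit to the same data would be pulled strongly toward the outlying point.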
A novel generalized ridge regression method for quantitative genetics.
Shen, Xia; Alam, Moudud; Fikse, Freddy; Rönnegård, Lars
2013-04-01
As the molecular marker density grows, there is a strong need in both genome-wide association studies and genomic selection to fit models with a large number of parameters. Here we present a computationally efficient generalized ridge regression (RR) algorithm for situations in which the number of parameters greatly exceeds the number of observations. The computationally demanding parts of the method depend mainly on the number of observations and not the number of parameters. The algorithm was implemented in the R package bigRR based on the previously developed package hglm. Using such an approach, a heteroscedastic effects model (HEM) was also developed, implemented, and tested. The efficiency for different data sizes was evaluated via simulation. The method was tested for a bacteria-hypersensitive trait in a publicly available Arabidopsis data set including 84 inbred lines and 216,130 SNPs. The computation of all the SNP effects required <10 sec using a single 2.7-GHz core. The advantage in run time makes permutation tests feasible for such a whole-genome model, so that a genome-wide significance threshold can be obtained. HEM was found to be more robust than ordinary RR (a.k.a. SNP-best linear unbiased prediction) in terms of QTL mapping, because SNP-specific shrinkage was applied instead of a common shrinkage. The proposed algorithm was also assessed for genomic evaluation and was shown to give better predictions than ordinary RR.
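One standard way in which the cost of ridge regression can be made to depend on the number of observations n rather than the number of parameters p is the identity (XᵀX + λI_p)⁻¹Xᵀy = Xᵀ(XXᵀ + λI_n)⁻¹y, which replaces a p×p solve with an n×n solve. The sketch below checks this identity on a tiny example with a common shrinkage parameter. It is an illustration under that assumption only: it is not the bigRR implementation, and all function names are hypothetical.

```python
def solve(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]   # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for i in range(n - 1, -1, -1):                      # back substitution
        s = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - s) / m[i][i]
    return x


def ridge_primal(X, y, lam):
    """Ridge estimate via the p x p system (X'X + lam I) beta = X'y."""
    n, p = len(X), len(X[0])
    A = [[sum(X[i][j] * X[i][k] for i in range(n)) + (lam if j == k else 0.0)
          for k in range(p)] for j in range(p)]
    b = [sum(X[i][j] * y[i] for i in range(n)) for j in range(p)]
    return solve(A, b)


def ridge_dual(X, y, lam):
    """Same estimate via the n x n system (XX' + lam I) alpha = y, beta = X'alpha."""
    n, p = len(X), len(X[0])
    K = [[sum(X[i][k] * X[j][k] for k in range(p)) + (lam if i == j else 0.0)
          for j in range(n)] for i in range(n)]
    alpha = solve(K, y)
    return [sum(X[i][j] * alpha[i] for i in range(n)) for j in range(p)]


if __name__ == "__main__":
    X = [[1.0, 0.5, -1.0, 2.0], [0.0, 1.5, 1.0, -0.5]]  # n = 2, p = 4
    y = [1.0, 2.0]
    print(ridge_primal(X, y, 0.5))
    print(ridge_dual(X, y, 0.5))
```

With 84 lines and 216,130 SNPs, as in the Arabidopsis example, the dual form would involve an 84×84 system instead of a 216,130×216,130 one.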
Stochastic Approximation Methods for Latent Regression Item Response Models
ERIC Educational Resources Information Center
von Davier, Matthias; Sinharay, Sandip
2010-01-01
This article presents an application of a stochastic approximation expectation maximization (EM) algorithm using a Metropolis-Hastings (MH) sampler to estimate the parameters of an item response latent regression model. Latent regression item response models are extensions of item response theory (IRT) to a latent variable model with covariates…
The Variance Normalization Method of Ridge Regression Analysis.
ERIC Educational Resources Information Center
Bulcock, J. W.; And Others
The testing of contemporary sociological theory often calls for the application of structural-equation models to data which are inherently collinear. It is shown that simple ridge regression, which is commonly used for controlling the instability of ordinary least squares regression estimates in ill-conditioned data sets, is not a legitimate…
Henrard, S; Speybroeck, N; Hermans, C
2015-11-01
Haemophilia is a rare genetic haemorrhagic disease characterized by partial or complete deficiency of coagulation factor VIII, for haemophilia A, or IX, for haemophilia B. As in any other medical research domain, the field of haemophilia research is increasingly concerned with finding factors associated with binary or continuous outcomes through multivariable models. Traditional models include multiple logistic regressions, for binary outcomes, and multiple linear regressions for continuous outcomes. Yet these regression models are at times difficult to implement, especially for non-statisticians, and can be difficult to interpret. The present paper sought to didactically explain how, why, and when to use classification and regression tree (CART) analysis for haemophilia research. The CART method is non-parametric and non-linear, based on the repeated partitioning of a sample into subgroups based on a certain criterion. Breiman developed this method in 1984. Classification trees (CTs) are used to analyse categorical outcomes and regression trees (RTs) to analyse continuous ones. The CART methodology has become increasingly popular in the medical field, yet only a few examples of studies using this methodology specifically in haemophilia have to date been published. Two previously published examples of CART analysis in this field are didactically explained in detail. There is increasing interest in using CART analysis in the health domain, primarily due to its ease of implementation, use, and interpretation, thus facilitating medical decision-making. This method should be promoted for analysing continuous or categorical outcomes in haemophilia, when applicable. © 2015 John Wiley & Sons Ltd.
Kwak, Il-Youp; Moore, Candace R; Spalding, Edgar P; Broman, Karl W
2014-08-01
Most statistical methods for quantitative trait loci (QTL) mapping focus on a single phenotype. However, multiple phenotypes are commonly measured, and recent technological advances have greatly simplified the automated acquisition of numerous phenotypes, including function-valued phenotypes, such as growth measured over time. While methods exist for QTL mapping with function-valued phenotypes, they are generally computationally intensive and focus on single-QTL models. We propose two simple, fast methods that maintain high power and precision and are amenable to extensions with multiple-QTL models using a penalized likelihood approach. After identifying multiple QTL by these approaches, we can view the function-valued QTL effects to provide a deeper understanding of the underlying processes. Our methods have been implemented as a package for R, funqtl. Copyright © 2014 by the Genetics Society of America.
NASA Astrophysics Data System (ADS)
Haddad, Khaled; Rahman, Ataur
2012-04-01
In this article, an approach using Bayesian Generalised Least Squares (BGLS) regression in a region-of-influence (ROI) framework is proposed for regional flood frequency analysis (RFFA) for ungauged catchments. Using the data from 399 catchments in eastern Australia, the BGLS-ROI is constructed to regionalise the flood quantiles (Quantile Regression Technique (QRT)) and the first three moments of the log-Pearson type 3 (LP3) distribution (Parameter Regression Technique (PRT)). This scheme first develops a fixed region model to select the best set of predictor variables for use in the subsequent regression analyses using an approach that minimises the model error variance while also satisfying a number of statistical selection criteria. The identified optimal regression equation is then used in the ROI experiment where the ROI is chosen for a site in question as the region that minimises the predictive uncertainty. To evaluate the overall performance of the quantiles estimated by the QRT and PRT, a one-at-a-time cross-validation procedure is applied. Results of the proposed method indicate that both the QRT and PRT in a BGLS-ROI framework lead to more accurate and reliable estimates of flood quantiles and moments of the LP3 distribution when compared to a fixed region approach. Also the BGLS-ROI can deal reasonably well with the heterogeneity in Australian catchments as evidenced by the regression diagnostics. Based on the evaluation statistics it was found that both BGLS-QRT and PRT-ROI perform similarly well, which suggests that the PRT is a viable alternative to QRT in RFFA. The RFFA methods developed in this paper are based on the database available in eastern Australia. It is expected that availability of a more comprehensive database (in terms of both quality and quantity) will further improve the predictive performance of both the fixed and ROI based RFFA methods presented in this study, which however needs to be investigated in future when such a
L1-Penalized N-way PLS for subset of electrodes selection in BCI experiments
NASA Astrophysics Data System (ADS)
Eliseyev, Andrey; Moro, Cecile; Faber, Jean; Wyss, Alexander; Torres, Napoleon; Mestais, Corinne; Benabid, Alim Louis; Aksenova, Tetiana
2012-08-01
Recently, the N-way partial least squares (NPLS) approach was reported as an effective tool for neuronal signal decoding and brain-computer interface (BCI) system calibration. This method simultaneously analyzes data in several domains. It combines the projection of a data tensor to a low dimensional space with linear regression. In this paper the L1-Penalized NPLS is proposed for sparse BCI system calibration, combining the projection technique with effective selection of a subset of features. The L1-Penalized NPLS was applied to the calibration of a binary self-paced BCI system, providing selection of a subset of electrodes. Our BCI system is designed for animal research, in particular for research in non-human primates.
Linear Regression in High Dimension and/or for Correlated Inputs
NASA Astrophysics Data System (ADS)
Jacques, J.; Fraix-Burnet, D.
2014-12-01
Ordinary least squares is the common way to estimate linear regression models. When inputs are correlated or when they are too numerous, regression methods using derived input directions or shrinkage methods can be efficient alternatives. Methods using derived input directions build new uncorrelated variables as linear combinations of the initial inputs, whereas shrinkage methods introduce regularization and variable selection by penalizing the usual least squares criterion. Both kinds of methods are presented and illustrated using the R software on an astronomical dataset.
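The shrinkage idea can be made concrete with an L1 penalty (the lasso), which both shrinks coefficients and sets some of them exactly to zero. The sketch below is a minimal coordinate-descent solver for the criterion (1/2n)·‖y − Xβ‖² + λ‖β‖₁, written in Python rather than R purely for illustration; the function names, the toy data, and the assumption that no column of X is identically zero are all choices made here, not taken from the paper.

```python
def soft_threshold(z, lam):
    """Shrink z toward zero by lam; values within [-lam, lam] become 0."""
    if z > lam:
        return z - lam
    if z < -lam:
        return z + lam
    return 0.0


def lasso_cd(X, y, lam, n_iter=100):
    """Coordinate descent for (1/2n)||y - X b||^2 + lam * ||b||_1."""
    n, p = len(X), len(X[0])
    beta = [0.0] * p
    for _ in range(n_iter):
        for j in range(p):
            # correlation of column j with the residual that excludes column j
            rho = 0.0
            for i in range(n):
                partial = sum(X[i][k] * beta[k] for k in range(p) if k != j)
                rho += X[i][j] * (y[i] - partial)
            rho /= n
            z = sum(X[i][j] ** 2 for i in range(n)) / n   # assumes z > 0
            beta[j] = soft_threshold(rho, lam) / z
    return beta


if __name__ == "__main__":
    # y = 3*x1 + 0.1*x2 on an orthogonal toy design
    X = [[1.0, 1.0], [-1.0, 1.0], [1.0, -1.0], [-1.0, -1.0]]
    y = [3.1, -2.9, 2.9, -3.1]
    print(lasso_cd(X, y, lam=0.5))
```

On this toy design the strong coefficient is shrunk from 3 toward 2.5 and the weak one is dropped entirely, which is the "regularization and variable selection" behaviour the abstract refers to.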
Hypothesis Testing Using Factor Score Regression: A Comparison of Four Methods
ERIC Educational Resources Information Center
Devlieger, Ines; Mayer, Axel; Rosseel, Yves
2016-01-01
In this article, an overview is given of four methods to perform factor score regression (FSR), namely regression FSR, Bartlett FSR, the bias avoiding method of Skrondal and Laake, and the bias correcting method of Croon. The bias correcting method is extended to include a reliable standard error. The four methods are compared with each other and…
Mikhal, Julia; Geurts, Bernard J
2013-12-01
A volume-penalizing immersed boundary method is presented for the simulation of laminar incompressible flow inside geometrically complex blood vessels in the human brain. We concentrate on cerebral aneurysms and compute flow in curved brain vessels with and without spherical aneurysm cavities attached. We approximate blood as an incompressible Newtonian fluid and simulate the flow with the use of a skew-symmetric finite-volume discretization and explicit time-stepping. A key element of the immersed boundary method is the so-called masking function. This is a binary function with which we identify at any location in the domain whether it is 'solid' or 'fluid', which allows objects immersed in a Cartesian grid to be represented. We compare three definitions of the masking function for geometries that are non-aligned with the grid. In each case a 'staircase' representation is used in which a grid cell is either 'solid' or 'fluid'. Reliable findings are obtained with our immersed boundary method, even at fairly coarse meshes with about 16 grid cells across a velocity profile. The validation of the immersed boundary method is provided on the basis of classical Poiseuille flow in a cylindrical pipe. We obtain first-order convergence for the velocity and the shear stress, reflecting the fact that in our approach the solid-fluid interface is localized with an accuracy on the order of a grid cell. Simulations for curved vessels and aneurysms are done for different flow regimes, characterized by different values of the Reynolds number (Re). The validation is performed for laminar flow at Re = 250, while the flow in more complex geometries is studied at Re = 100 and Re = 250, as suggested by physiological conditions pertaining to flow of blood in the circle of Willis.
Widely Linear Complex-Valued Kernel Methods for Regression
NASA Astrophysics Data System (ADS)
Boloix-Tortosa, Rafael; Murillo-Fuentes, Juan Jose; Santos, Irene; Perez-Cruz, Fernando
2017-10-01
Usually, complex-valued RKHS are presented as a straightforward application of the real-valued case. In this paper we prove that this procedure yields a limited solution for regression. We show that another kernel, here denoted as pseudo-kernel, is needed to learn any function in complex-valued fields. Accordingly, we derive a novel RKHS to include it, the widely RKHS (WRKHS). When the pseudo-kernel cancels, WRKHS reduces to complex-valued RKHS of previous approaches. We address the kernel and pseudo-kernel design, paying attention to the kernel and the pseudo-kernel being complex-valued. In the experiments included we report remarkable improvements in simple scenarios where real and imaginary parts have different similitude relations for given inputs or cases where real and imaginary parts are correlated. In the context of these novel results we revisit the problem of non-linear channel equalization, to show that the WRKHS helps to design more efficient solutions.
Permissible performance limits of regression analyses in method comparisons.
Haeckel, Rainer; Wosniok, Werner; Al Shareef, Nadera
2011-11-01
Method comparisons are indispensable tools for the extensive validation of analytic procedures. Laboratories often only want to know whether an established procedure (x-method) can be replaced by another one (y-method) without interfering with diagnostic purposes. Then split patients' samples are analyzed more or less simultaneously with both procedures designed to measure the same quantity. The measured values are usually presented graphically as scatter or difference plots. The two methods are considered to be equivalent (comparable) if the data pairs scatter around the line of equality (x=y line) within permissible equivalence lines. It is proposed to derive these limits from permissible imprecision limits, which are based on false-positive error rates. If all data pairs are within the limits, both methods lead to comparable false error rates. If one or more data pairs are outside the permissible equivalence limits, the x-method cannot simply be replaced by the y-method and further studies are required. The discordance may be caused either by aberrant values (outliers), non-linearity, bias or a higher variation of e.g., the y-values. The spread around the line of best fit can detect possible interferences if more than 1% of the data pairs are outside permissible spread lines in a scatter plot. Because bias between methods and imprecision can be inter-related, both require specific examinations for their identification.
Risk prediction with machine learning and regression methods.
Steyerberg, Ewout W; van der Ploeg, Tjeerd; Van Calster, Ben
2014-07-01
This is a discussion of issues in risk prediction based on the following papers: "Probability estimation with machine learning methods for dichotomous and multicategory outcome: Theory" by Jochen Kruppa, Yufeng Liu, Gérard Biau, Michael Kohler, Inke R. König, James D. Malley, and Andreas Ziegler; and "Probability estimation with machine learning methods for dichotomous and multicategory outcome: Applications" by Jochen Kruppa, Yufeng Liu, Hans-Christian Diener, Theresa Holste, Christian Weimar, Inke R. König, and Andreas Ziegler.
The Robustness of Regression and Substitution by Mean Methods in Handling Missing Values.
ERIC Educational Resources Information Center
Kaiser, Javaid
There are times in survey research when missing values need to be estimated. The robustness of four variations of regression and substitution by mean methods was examined using a 3x3x4 factorial design. The regression variations included in the study were: (1) regression using a single best predictor; (2) two best predictors; (3) all available…
Gaussian Process Regression Plus Method for Localization Reliability Improvement
Liu, Kehan; Meng, Zhaopeng; Own, Chung-Ming
2016-01-01
Location data are among the most widely used context data in context-aware and ubiquitous computing applications. Many systems with distinct deployment costs and positioning accuracies have been developed over the past decade for indoor positioning. The most useful method is focused on the received signal strength and provides a set of signal transmission access points. However, compiling a manual measuring Received Signal Strength (RSS) fingerprint database involves high costs and thus is impractical in an online prediction environment. The system used in this study relied on the Gaussian process method, which is a nonparametric model that can be characterized completely by using the mean function and the covariance matrix. In addition, the Naive Bayes method was used to verify and simplify the computation of precise predictions. The authors conducted several experiments on simulated and real environments at Tianjin University. The experiments examined distinct data size, different kernels, and accuracy. The results showed that the proposed method not only can retain positioning accuracy but also can save computation time in location predictions. PMID:27483276
Gaussian Process Regression Plus Method for Localization Reliability Improvement.
Liu, Kehan; Meng, Zhaopeng; Own, Chung-Ming
2016-07-29
Location data are among the most widely used context data in context-aware and ubiquitous computing applications. Many systems with distinct deployment costs and positioning accuracies have been developed over the past decade for indoor positioning. The most useful method is focused on the received signal strength and provides a set of signal transmission access points. However, compiling a manual measuring Received Signal Strength (RSS) fingerprint database involves high costs and thus is impractical in an online prediction environment. The system used in this study relied on the Gaussian process method, which is a nonparametric model that can be characterized completely by using the mean function and the covariance matrix. In addition, the Naive Bayes method was used to verify and simplify the computation of precise predictions. The authors conducted several experiments on simulated and real environments at Tianjin University. The experiments examined distinct data size, different kernels, and accuracy. The results showed that the proposed method not only can retain positioning accuracy but also can save computation time in location predictions.
Volume penalization to model falling leaves
NASA Astrophysics Data System (ADS)
Kolomenskiy, Dmitry; Schneider, Kai
2007-11-01
Numerical modeling of solid bodies moving through viscous incompressible fluid is considered. The 2D Navier-Stokes equations, written in the vorticity-streamfunction formulation, are discretized using a Fourier pseudo-spectral scheme with adaptive time-stepping. Solid obstacles of arbitrary shape are taken into account using the volume penalization method. Time-dependent penalization is implemented, making the method capable of solving problems where the obstacle follows an arbitrary motion. Numerical simulations of falling leaves are performed, using the above model supplemented by the discretized ODEs describing the motion of a solid body subjected to external forces and moments. Various regimes of the free fall are explored, depending on the physical parameters and initial conditions. The influence of the Reynolds number on the transition between fluttering and tumbling is investigated, showing the stabilizing effect of viscosity.
Uh, Hae-Won; Hartgers, Franca C; Yazdanbakhsh, Maria; Houwing-Duistermaat, Jeanine J
2008-10-17
The statistical analysis of immunological data may be complicated because precise quantitative levels cannot always be determined. Values below a given detection limit may not be observed (nondetects), and data with nondetects are called left-censored. Since nondetects cannot be considered as missing at random, a statistician faced with data containing these nondetects must decide how to combine nondetects with detects. Until now, the common practice has been to impute each nondetect with a single value such as half of the detection limit, and to conduct ordinary regression analysis. The first aim of this paper is to give an overview of methods to analyze, and to provide new methods handling censored data other than an (ordinary) linear regression. The second aim is to compare these methods by simulation studies based on real data. We compared six new and existing methods: deletion of nondetects, single substitution, extrapolation by regression on order statistics, multiple imputation using maximum likelihood estimation, tobit regression, and logistic regression. The deletion and extrapolation by regression on order statistics methods gave biased parameter estimates. The single substitution method underestimated variances, and logistic regression suffered loss of power. Based on simulation studies, we found that tobit regression performed well when the proportion of nondetects was less than 30%, and that taken together the multiple imputation method performed best. Based on simulation studies, the newly developed multiple imputation method performed consistently well under different scenarios of various proportions of nondetects, sample sizes and even in the presence of heteroscedastic errors.
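For orientation, the tobit approach treats each nondetect as a left-censored observation rather than substituting a value for it. Under the usual textbook assumptions (normal errors with constant variance and a single known detection limit, neither of which is stated in the abstract), the likelihood takes the form below. The symbols d (detection limit), x_i (covariate vector), β, and σ are notation introduced here for illustration.

```latex
% Tobit likelihood for left-censored data with detection limit d:
% detects contribute a normal density, nondetects contribute the
% probability of falling below d.
L(\beta, \sigma)
  = \prod_{i:\, y_i \ge d}
      \frac{1}{\sigma}\,
      \varphi\!\left( \frac{y_i - x_i^{\top}\beta}{\sigma} \right)
    \;\times\;
    \prod_{i:\, y_i < d}
      \Phi\!\left( \frac{d - x_i^{\top}\beta}{\sigma} \right)
```

Here φ and Φ denote the standard normal density and distribution function. Single substitution, by contrast, replaces every nondetect with one fixed value such as d/2 and then fits an ordinary regression, which is consistent with the underestimated variances reported above.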
NASA Astrophysics Data System (ADS)
Afifah, Rawyanil; Andriyana, Yudhie; Jaya, I. G. N. Mindra
2017-03-01
Geographically Weighted Regression (GWR) is a development of an Ordinary Least Squares (OLS) regression which is quite effective in estimating spatial non-stationary data. On the GWR models, regression parameters are generated locally, each observation has a unique regression coefficient. Parameter estimation process in GWR uses Weighted Least Squares (WLS). But when there are outliers in the data, the parameter estimation process with WLS produces estimators which are not efficient. Hence, this study uses a robust method called Least Absolute Deviation (LAD), to estimate the parameters of GWR model in the case of poverty in Java Island. This study concludes that GWR model with LAD method has a better performance.
Chen, Huaihou; Paik, Myunghee Cho; Choi, H. Alex
2014-01-01
Multilevel functional data is collected in many biomedical studies. For example, in a study of the effect of Nimodipine on patients with subarachnoid hemorrhage (SAH), patients underwent multiple 4-hour treatment cycles. Within each treatment cycle, subjects’ vital signs were reported every 10 minutes. This data has a natural multilevel structure with treatment cycles nested within subjects and measurements nested within cycles. Most literature on nonparametric analysis of such multilevel functional data focus on conditional approaches using functional mixed effects models. However, parameters obtained from the conditional models do not have direct interpretations as population average effects. When population effects are of interest, we may employ marginal regression models. In this work, we propose marginal approaches to fit multilevel functional data through penalized spline generalized estimating equation (penalized spline GEE). The procedure is effective for modeling multilevel correlated generalized outcomes as well as continuous outcomes without suffering from numerical difficulties. We provide a variance estimator robust to misspecification of correlation structure. We investigate the large sample properties of the penalized spline GEE estimator with multilevel continuous data and show that the asymptotics falls into two categories. In the small knots scenario, the estimated mean function is asymptotically efficient when the true correlation function is used and the asymptotic bias does not depend on the working correlation matrix. In the large knots scenario, both the asymptotic bias and variance depend on the working correlation. We propose a new method to select the smoothing parameter for penalized spline GEE based on an estimate of the asymptotic mean squared error (MSE). We conduct extensive simulation studies to examine property of the proposed estimator under different correlation structures and sensitivity of the variance estimation to the choice
Chen, Huaihou; Wang, Yuanjia; Paik, Myunghee Cho; Choi, H Alex
2013-10-01
Multilevel functional data is collected in many biomedical studies. For example, in a study of the effect of Nimodipine on patients with subarachnoid hemorrhage (SAH), patients underwent multiple 4-hour treatment cycles. Within each treatment cycle, subjects' vital signs were reported every 10 minutes. This data has a natural multilevel structure with treatment cycles nested within subjects and measurements nested within cycles. Most literature on nonparametric analysis of such multilevel functional data focus on conditional approaches using functional mixed effects models. However, parameters obtained from the conditional models do not have direct interpretations as population average effects. When population effects are of interest, we may employ marginal regression models. In this work, we propose marginal approaches to fit multilevel functional data through penalized spline generalized estimating equation (penalized spline GEE). The procedure is effective for modeling multilevel correlated generalized outcomes as well as continuous outcomes without suffering from numerical difficulties. We provide a variance estimator robust to misspecification of correlation structure. We investigate the large sample properties of the penalized spline GEE estimator with multilevel continuous data and show that the asymptotics falls into two categories. In the small knots scenario, the estimated mean function is asymptotically efficient when the true correlation function is used and the asymptotic bias does not depend on the working correlation matrix. In the large knots scenario, both the asymptotic bias and variance depend on the working correlation. We propose a new method to select the smoothing parameter for penalized spline GEE based on an estimate of the asymptotic mean squared error (MSE). We conduct extensive simulation studies to examine property of the proposed estimator under different correlation structures and sensitivity of the variance estimation to the choice
Interaction Models for Functional Regression
USSET, JOSEPH; STAICU, ANA-MARIA; MAITY, ARNAB
2015-01-01
A functional regression model with a scalar response and multiple functional predictors is proposed that accommodates two-way interactions in addition to their main effects. The proposed estimation procedure models the main effects using penalized regression splines, and the interaction effect by a tensor product basis. Extensions to generalized linear models and data observed on sparse grids or with measurement error are presented. A hypothesis testing procedure for the functional interaction effect is described. The proposed method can be easily implemented through existing software. Numerical studies show that fitting an additive model in the presence of interaction leads to both poor estimation performance and lost prediction power, while fitting an interaction model where there is in fact no interaction leads to negligible losses. The methodology is illustrated on the AneuRisk65 study data. PMID:26744549
A regularization corrected score method for nonlinear regression models with covariate error.
Zucker, David M; Gorfine, Malka; Li, Yi; Tadesse, Mahlet G; Spiegelman, Donna
2013-03-01
Many regression analyses involve explanatory variables that are measured with error, and failing to account for this error is well known to lead to biased point and interval estimates of the regression coefficients. We present here a new general method for adjusting for covariate error. Our method consists of an approximate version of the Stefanski-Nakamura corrected score approach, using the method of regularization to obtain an approximate solution of the relevant integral equation. We develop the theory in the setting of classical likelihood models; this setting covers, for example, linear regression, nonlinear regression, logistic regression, and Poisson regression. The method is extremely general in terms of the types of measurement error models covered, and is a functional method in the sense of not involving assumptions on the distribution of the true covariate. We discuss the theoretical properties of the method and present simulation results in the logistic regression setting (univariate and multivariate). For illustration, we apply the method to data from the Harvard Nurses' Health Study concerning the relationship between physical activity and breast cancer mortality in the period following a diagnosis of breast cancer.
Equivalencing MAT and GRE Scores Using Simple Linear Transformation and Regression Methods.
ERIC Educational Resources Information Center
Kagan, Dona M.; Stock, William A.
1980-01-01
Graduate Record Examination and Miller Analogies Test scores were equated using linear transformation and regression methods. Standard deviations of regression equivalence scores were consistently smaller than those actually obtained in the sample, whereas standard deviations of linear equivalence scores were the same as those in the sample.…
Broadband mode in proton-precession magnetometers with signal processing regression methods
NASA Astrophysics Data System (ADS)
Denisov, Alexey Y.; Sapunov, Vladimir A.; Rubinstein, Boris
2014-05-01
The choice of the signal processing method may improve characteristics of the measuring device. We consider the measurement error of signal processing regression methods for a quasi-harmonic signal generated in a frequency selective device. The results are applied to analyze the difference between the simple period meter processing and regression algorithms using measurement cycle signal data in proton-precession magnetometers. Dependences of the measurement error on the sensor quality factor and frequency of nuclear precession are obtained. It is shown that regression methods considerably widen the registration bandwidth and relax the requirements on the magnetometer hardware, and thus affect the optimization criteria of the registration system.
How to use linear regression and correlation in quantitative method comparison studies.
Twomey, P J; Kroll, M H
2008-04-01
Linear regression methods try to determine the best linear relationship between data points while correlation coefficients assess the association (as opposed to agreement) between the two methods. Linear regression and correlation play an important part in the interpretation of quantitative method comparison studies. Their major strength is that they are widely known and as a result both are employed in the vast majority of method comparison studies. While previously performed by hand, regression analysis is now usually performed by statistical software packages, including MS Excel with or without the add-in program Analyse-it. Such techniques need to be employed in a way that compares the agreement between the two methods examined and more importantly, because we are dealing with individual patients, whether the degree of agreement is clinically acceptable. Despite their use for many years, there is a lot of ignorance about the validity as well as the pros and cons of linear regression and correlation techniques. This review article describes the types of linear regression and correlation (parametric and non-parametric methods) and the necessary general and specific requirements. The selection of the type of regression depends on where one has been trained, the tradition of the laboratory and the availability of adequate software.
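The distinction between association and agreement can be seen in a small numerical sketch: two hypothetical methods whose results are perfectly linearly related (Pearson r = 1) can still disagree substantially on every sample. The data and function names below are invented for illustration and do not come from the review.

```python
from math import sqrt


def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length lists."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


def mean_difference(x, y):
    """Average of y - x: a simple summary of bias between two methods."""
    return sum(b - a for a, b in zip(x, y)) / len(x)


if __name__ == "__main__":
    x_method = [1.0, 2.0, 3.0, 4.0, 5.0]
    # y-method has both a proportional and a constant bias relative to x
    y_method = [1.5 * v + 2.0 for v in x_method]
    print(pearson_r(x_method, y_method))        # perfect association
    print(mean_difference(x_method, y_method))  # yet a large average disagreement
```

A correlation coefficient alone therefore says nothing about whether one method can replace the other; the slope, intercept, and differences between paired results have to be examined as well.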
ERIC Educational Resources Information Center
Shih, Ching-Lin; Liu, Tien-Hsiang; Wang, Wen-Chung
2014-01-01
The simultaneous item bias test (SIBTEST) method regression procedure and the differential item functioning (DIF)-free-then-DIF strategy are applied to the logistic regression (LR) method simultaneously in this study. These procedures are used to adjust the effects of matching true score on observed score and to better control the Type I error…
Gan, Wei; Liu, Xuemin; Sun, Jing
2015-02-01
This paper presents a regression evaluation index intelligent filter method (REIFM) for quick optimization of chromatographic separation conditions. The hierarchical chromatography response function was used as the chromatography-optimization index. The regression model was established by orthogonal regression design. The chromatography-optimization index was filtered by the intelligent filter program, and the optimization of the separation conditions was obtained. The experimental results showed that the average relative deviation between the experimental values and the predicted values was 0.18% at the optimum and the optimization results were satisfactory.
Multistage regression, a novel method for making better predictions from your efficacy data.
Cleophas, Eugene P; Cleophas, Ton J
2014-01-01
Multistage regression is rarely used in therapeutic research, despite the multistage pattern of many medical conditions. Using an example of an efficacy study of a new laxative, path analysis and the 2-stage least squares method were compared with standard linear regression. Standard linear regression showed a significant effect of the predictor "noncompliance" on drug efficacy at P=0.005. However, after adjustment for the covariate "counseling," the magnitude of the regression coefficient fell from 0.70 to 0.29, and the P value rose to 0.10. Path analysis was valid, given the significant correlation between the two predictors (P=0.024), and produced an increase of the regression coefficient between "noncompliance" and "drug efficacy" by 60.0%. The 2-stage least squares method, using counseling as instrumental variable, similarly produced an increase of the overall correlation by 66.7%. A bivariate path analysis with "quality of life" as the second outcome variable increased the magnitude of the path statistic further by 47.1%, and thus made still better use of the predicting variables. We conclude that (1) multistage regression methods, as used in the present article, produced much better predictions about the drug efficacy than did standard linear regression; (2) the inclusion of additional outcome variables makes it possible to make still better use of the predicting variables; (3) multistage regression must always be preceded by usual linear regression to exclude weak predictors. We recommend that researchers analyzing efficacy data of new treatments more often apply multistage regression.
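The 2-stage least squares idea mentioned in the abstract above can be sketched as follows. This is a minimal illustration on synthetic data, not the authors' analysis: the variable names `z` (instrument), `x` (predictor), `u` (unobserved confounder) and `y` (outcome) are assumptions chosen for the example, and the true effect of `x` on `y` is set to 0.5.

```python
def simple_ols(x, y):
    """Return (slope, intercept) of the least-squares line y = intercept + slope * x."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    slope = sxy / sxx
    return slope, my - slope * mx

def two_stage_least_squares(z, x, y):
    """Stage 1: regress x on the instrument z. Stage 2: regress y on fitted x."""
    b1, a1 = simple_ols(z, x)
    x_hat = [a1 + b1 * v for v in z]
    return simple_ols(x_hat, y)

# Synthetic data (hypothetical): u affects both x and y but is uncorrelated with z.
z = [0.0, 1.0, 2.0, 3.0, 4.0, 5.0]
u = [1.0, -2.0, 1.0, 1.0, -2.0, 1.0]
x = [zi + ui for zi, ui in zip(z, u)]
y = [0.5 * xi + 2.0 * ui for xi, ui in zip(x, u)]

naive_slope, _ = simple_ols(x, y)
iv_slope, _ = two_stage_least_squares(z, x, y)
print(f"naive OLS slope = {naive_slope:.3f}, 2SLS slope = {iv_slope:.3f}")
```

In this constructed example the naive regression is distorted by the confounder, while the second stage recovers the effect that was built into the data, because the instrument carries only the part of `x` that is unrelated to `u`.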
López Fontán, J L; Costa, J; Ruso, J M; Prieto, G; Sarmiento, F
2004-02-01
The application of a statistical method, the local polynomial regression method (LPRM), based on a nonparametric estimation of the regression function, to determine the critical micelle concentration (cmc) is presented. The method is extremely flexible because it does not impose any parametric model on the underlying structure of the data but rather allows the data to speak for themselves. Good concordance of cmc values with those obtained by other methods was found for systems in which the variation of a measured physical property with concentration showed an abrupt change. When this variation was slow, discrepancies between the values obtained by LPRM and other methods were found.
Interquantile Shrinkage and Variable Selection in Quantile Regression
Jiang, Liewen; Bondell, Howard D.; Wang, Huixia Judy
2014-01-01
Examination of multiple conditional quantile functions provides a comprehensive view of the relationship between the response and covariates. In situations where quantile slope coefficients share some common features, estimation efficiency and model interpretability can be improved by utilizing such commonality across quantiles. Furthermore, elimination of irrelevant predictors will also aid in estimation and interpretation. These motivations lead to the development of two penalization methods, which can identify the interquantile commonality and nonzero quantile coefficients simultaneously. The developed methods are based on a fused penalty that encourages sparsity of both quantile coefficients and interquantile slope differences. The oracle properties of the proposed penalization methods are established. Through numerical investigations, it is demonstrated that the proposed methods lead to simpler model structure and higher estimation efficiency than the traditional quantile regression estimation. PMID:24653545
Motulsky, Harvey J; Brown, Ronald E
2006-01-01
Background: Nonlinear regression, like linear regression, assumes that the scatter of data around the ideal curve follows a Gaussian or normal distribution. This assumption leads to the familiar goal of regression: to minimize the sum of the squares of the vertical or Y-value distances between the points and the curve. Outliers can dominate the sum-of-the-squares calculation, and lead to misleading results. However, we know of no practical method for routinely identifying outliers when fitting curves with nonlinear regression. Results: We describe a new method for identifying outliers when fitting data with nonlinear regression. We first fit the data using a robust form of nonlinear regression, based on the assumption that scatter follows a Lorentzian distribution. We devised a new adaptive method that gradually becomes more robust as the method proceeds. To define outliers, we adapted the false discovery rate approach to handling multiple comparisons. We then remove the outliers, and analyze the data using ordinary least-squares regression. Because the method combines robust regression and outlier removal, we call it the ROUT method. When analyzing simulated data, where all scatter is Gaussian, our method detects (falsely) one or more outliers in only about 1–3% of experiments. When analyzing data contaminated with one or several outliers, the ROUT method performs well at outlier identification, with an average False Discovery Rate less than 1%. Conclusion: Our method, which combines a new method of robust nonlinear regression with a new method of outlier identification, identifies outliers from nonlinear curve fits with reasonable power and few false positives. PMID:16526949
Assessment of CSO loads--based on UV/VIS-spectroscopy by means of different regression methods.
Hochedlinger, M; Kainz, H; Rauch, W
2006-01-01
The use of UV/VIS-spectroscopy for water quality measurements is based on the solution of the correlation between the surrogate parameter absorbance and the resulting equivalence parameters. The coherence of absorbance and equivalence parameters (CODtot, CODsol, TSS) is solved in this paper with different regression methods. The correlation of absorbance and concentrations is analysed based on linear regression methods, model tree regressions, multivariate regression methods and support vector machines using the sequential minimal optimisation algorithm. For this purpose the regression methods are calibrated on three 24-hour measurement campaigns of a combined sewer measurement station situated in the combined sewer overflow chamber in Graz (Austria). The online measurement station has been conveying data for more than 2 1/2 years up to now. Finally, the load calculation based on the different regression methods and its comparison demonstrate that an apparently complex model does not inevitably lead to accurate concentration values due to possible model overfitting. Hence, the paper points out the possibilities and the drawbacks of spectroscopy measuring in sewers and the arising concentration values.
Non-Concave Penalized Likelihood with NP-Dimensionality
Fan, Jianqing; Lv, Jinchi
2011-01-01
Penalized likelihood methods are fundamental to ultra-high dimensional variable selection. How high dimensionality such methods can handle remains largely unknown. In this paper, we show that in the context of generalized linear models, such methods possess model selection consistency with oracle properties even for dimensionality of Non-Polynomial (NP) order of sample size, for a class of penalized likelihood approaches using folded-concave penalty functions, which were introduced to ameliorate the bias problems of convex penalty functions. This fills a long-standing gap in the literature where the dimensionality is allowed to grow slowly with the sample size. Our results are also applicable to penalized likelihood with the L1-penalty, which is a convex function at the boundary of the class of folded-concave penalty functions under consideration. The coordinate optimization is implemented for finding the solution paths, whose performance is evaluated by a few simulation examples and the real data analysis. PMID:22287795
An NCME Instructional Module on Data Mining Methods for Classification and Regression
ERIC Educational Resources Information Center
Sinharay, Sandip
2016-01-01
Data mining methods for classification and regression are becoming increasingly popular in various scientific fields. However, these methods have not been explored much in educational measurement. This module first provides a review, which should be accessible to a wide audience in education measurement, of some of these methods. The module then…
ERIC Educational Resources Information Center
Cohen, Ayala; Nahum-Shani, Inbal; Doveh, Etti
2010-01-01
In their seminal paper, Edwards and Parry (1993) presented the polynomial regression as a better alternative to applying difference score in the study of congruence. Although this method is increasingly applied in congruence research, its complexity relative to other methods for assessing congruence (e.g., difference score methods) was one of the…
Sparling, D.W.; Barzen, J.A.; Lovvorn, J.R.; Serie, J.R.
1992-01-01
Regression equations that use mensural data to estimate body condition have been developed for several water birds. These equations often have been based on data that represent different sexes, age classes, or seasons, without being adequately tested for intergroup differences. We used proximate carcass analysis of 538 adult and juvenile canvasbacks (Aythya valisineria) collected during fall migration, winter, and spring migrations in 1975-76 and 1982-85 to test regression methods for estimating body condition.
The Bland-Altman Method Should Not Be Used in Regression Cross-Validation Studies
ERIC Educational Resources Information Center
O'Connor, Daniel P.; Mahar, Matthew T.; Laughlin, Mitzi S.; Jackson, Andrew S.
2011-01-01
The purpose of this study was to demonstrate the bias in the Bland-Altman (BA) limits of agreement method when it is used to validate regression models. Data from 1,158 men were used to develop three regression equations to estimate maximum oxygen uptake (R[superscript 2] = 0.40, 0.61, and 0.82, respectively). The equations were evaluated in a…
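For readers unfamiliar with the limits of agreement discussed in the abstract above, the usual calculation is the mean difference plus or minus 1.96 standard deviations of the differences. The sketch below uses made-up measured and predicted values (not the study's data) and only shows how the limits are computed; it does not reproduce the bias analysis of the study.

```python
from statistics import mean, stdev

def bland_altman_limits(measured, predicted):
    """Return (bias, lower limit, upper limit) of the 95% limits of agreement."""
    diffs = [m - p for m, p in zip(measured, predicted)]
    bias = mean(diffs)
    spread = 1.96 * stdev(diffs)  # sample standard deviation of the differences
    return bias, bias - spread, bias + spread

# Hypothetical measured values and regression predictions.
measured = [40.0, 45.0, 50.0, 55.0, 60.0]
predicted = [42.0, 46.0, 50.0, 54.0, 58.0]

bias, lower, upper = bland_altman_limits(measured, predicted)
print(f"bias = {bias:.2f}, limits of agreement = [{lower:.2f}, {upper:.2f}]")
```

In this constructed example the predictions are pulled toward the mean, so the differences grow with the measured value even though the average bias is zero.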
A comparison of several methods of solving nonlinear regression groundwater flow problems.
Cooley, R.L.
1985-01-01
Computational efficiency and computer memory requirements for four methods of minimizing functions were compared for four test nonlinear-regression steady state groundwater flow problems. The fastest methods were the Marquardt and quasi-linearization methods, which required almost identical computer times and numbers of iterations; the next fastest was the quasi-Newton method, and last was the Fletcher-Reeves method, which did not converge in 100 iterations for two of the problems.-from Author
[Criminalistic and penal problems with "dyadic deaths"].
Kaliszczak, Paweł; Kunz, Jerzy; Bolechała, Filip
2002-01-01
This paper is a supplement to the article "Medico legal problems of dyadic death" by the same authors and recalls the cases presented there. It is also an attempt to present the basic criminalistic, penal and definitional problems of dyadic death, also called postaggressional suicide. Criminalistic problems of dyadic death are presented in view of the widely known "rule of seven golden questions": what?, where?, when?, how?, why?, what method? and who? Criminalistic analysis of the cases leads to some differences in conclusions, but it seemed interesting to match both the criminalistic and the forensic points of view to the presented material.
ERIC Educational Resources Information Center
Young, Forrest W.; And Others
1976-01-01
A method is discussed which extends canonical regression analysis to the situation where the variables may be measured as nominal, ordinal, or interval, and where they may be either continuous or discrete. The method, which is purely descriptive, uses an alternating least squares algorithm and is robust. Examples are provided. (Author/JKS)
A Maximum Likelihood Method for Latent Class Regression Involving a Censored Dependent Variable.
ERIC Educational Resources Information Center
Jedidi, Kamel; And Others
1993-01-01
A method is proposed to simultaneously estimate regression functions and subject membership in "k" latent classes or groups given a censored dependent variable for a cross-section of subjects. Maximum likelihood estimates are obtained using an EM algorithm. The method is illustrated through a consumer psychology application. (SLD)
Whole-Genome Regression and Prediction Methods Applied to Plant and Animal Breeding
de los Campos, Gustavo; Hickey, John M.; Pong-Wong, Ricardo; Daetwyler, Hans D.; Calus, Mario P. L.
2013-01-01
Genomic-enabled prediction is becoming increasingly important in animal and plant breeding and is also receiving attention in human genetics. Deriving accurate predictions of complex traits requires implementing whole-genome regression (WGR) models where phenotypes are regressed on thousands of markers concurrently. Methods exist that allow implementing these large-p with small-n regressions, and genome-enabled selection (GS) is being implemented in several plant and animal breeding programs. The list of available methods is long, and the relationships between them have not been fully addressed. In this article we provide an overview of available methods for implementing parametric WGR models, discuss selected topics that emerge in applications, and present a general discussion of lessons learned from simulation and empirical data analysis in the last decade. PMID:22745228
Whole-genome regression and prediction methods applied to plant and animal breeding.
de Los Campos, Gustavo; Hickey, John M; Pong-Wong, Ricardo; Daetwyler, Hans D; Calus, Mario P L
2013-02-01
Genomic-enabled prediction is becoming increasingly important in animal and plant breeding and is also receiving attention in human genetics. Deriving accurate predictions of complex traits requires implementing whole-genome regression (WGR) models where phenotypes are regressed on thousands of markers concurrently. Methods exist that allow implementing these large-p with small-n regressions, and genome-enabled selection (GS) is being implemented in several plant and animal breeding programs. The list of available methods is long, and the relationships between them have not been fully addressed. In this article we provide an overview of available methods for implementing parametric WGR models, discuss selected topics that emerge in applications, and present a general discussion of lessons learned from simulation and empirical data analysis in the last decade.
Multiple regression methods show great potential for rare variant association tests.
Xu, ChangJiang; Ladouceur, Martin; Dastani, Zari; Richards, J Brent; Ciampi, Antonio; Greenwood, Celia M T
2012-01-01
The investigation of associations between rare genetic variants and diseases or phenotypes has two goals. Firstly, the identification of which genes or genomic regions are associated, and secondly, discrimination of associated variants from background noise within each region. Over the last few years, many new methods have been developed which associate genomic regions with phenotypes. However, classical methods for high-dimensional data have received little attention. Here we investigate whether several classical statistical methods for high-dimensional data: ridge regression (RR), principal components regression (PCR), partial least squares regression (PLS), a sparse version of PLS (SPLS), and the LASSO are able to detect associations with rare genetic variants. These approaches have been extensively used in statistics to identify the true associations in data sets containing many predictor variables. Using genetic variants identified in three genes that were Sanger sequenced in 1998 individuals, we simulated continuous phenotypes under several different models, and we show that these feature selection and feature extraction methods can substantially outperform several popular methods for rare variant analysis. Furthermore, these approaches can identify which variants are contributing most to the model fit, and therefore both goals of rare variant analysis can be achieved simultaneously with the use of regression regularization methods. These methods are briefly illustrated with an analysis of adiponectin levels and variants in the ADIPOQ gene.
Westreich, Daniel; Lessler, Justin; Funk, Michele Jonsson
2010-01-01
Summary. Objective: Propensity scores for the analysis of observational data are typically estimated using logistic regression. Our objective in this Review was to assess machine learning alternatives to logistic regression which may accomplish the same goals but with fewer assumptions or greater accuracy. Study Design and Setting: We identified alternative methods for propensity score estimation and/or classification from the public health, biostatistics, discrete mathematics, and computer science literature, and evaluated these algorithms for applicability to the problem of propensity score estimation, potential advantages over logistic regression, and ease of use. Results: We identified four techniques as alternatives to logistic regression: neural networks, support vector machines, decision trees (CART), and meta-classifiers (in particular, boosting). Conclusion: While the assumptions of logistic regression are well understood, those assumptions are frequently ignored. All four alternatives have advantages and disadvantages compared with logistic regression. Boosting (meta-classifiers) and to a lesser extent decision trees (particularly CART) appear to be most promising for use in the context of propensity score analysis, but extensive simulation studies are needed to establish their utility in practice. PMID:20630332
An improved partial least-squares regression method for Raman spectroscopy
NASA Astrophysics Data System (ADS)
Momenpour Tehran Monfared, Ali; Anis, Hanan
2017-10-01
It is known that the performance of partial least-squares (PLS) regression analysis can be improved using the backward variable selection method (BVSPLS). In this paper, we further improve the BVSPLS based on a novel selection mechanism. The proposed method is based on sorting the weighted regression coefficients, and then the importance of each variable of the sorted list is evaluated using the root mean square error of prediction (RMSEP) criterion in each iteration step. Our Improved BVSPLS (IBVSPLS) method has been applied to leukemia and heparin data sets and led to an improvement in the limit of detection of Raman biosensing ranging from 10% to 43% compared to PLS. Our IBVSPLS was also compared to the jack-knifing (simpler) and Genetic Algorithm (more complex) methods. Our method was consistently better than the jack-knifing method and showed either a similar or a better performance compared to the genetic algorithm.
NASA Astrophysics Data System (ADS)
Zanariah Satari, Siti; Di, Nur Faraidah Muhammad; Zakaria, Roslinazairimah
2017-09-01
Two agglomerative hierarchical clustering algorithms for identifying multiple outliers in a circular regression model have been developed in this study. The agglomerative hierarchical clustering algorithm starts with each data point in its own cluster and repeatedly merges the closest pair of clusters according to some similarity criterion until all the data are grouped in one cluster. The single-linkage method is one of the simplest agglomerative hierarchical methods and is commonly used to detect outliers. In this study, we compared the performance of the single-linkage method with another agglomerative hierarchical method, namely average linkage, for detecting outliers in a circular regression model. The performances of both methods were examined via simulation studies by measuring their “success” probability, masking effect, and swamping effect with different sample sizes and levels of contamination. The results show that the single-linkage method performs very well in detecting multiple outliers, with lower masking and swamping effects.
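The single-linkage merging described in the abstract above can be sketched in a few lines. This is a simplified illustration, not the authors' algorithm: the distance between two angles (shortest arc on the circle), the hypothetical residual angles, and the choice to stop at a fixed number of clusters rather than build the full tree are all assumptions made for the example.

```python
from math import pi

def circular_distance(a, b):
    """Shortest arc length between two angles given in radians."""
    d = abs(a - b) % (2 * pi)
    return min(d, 2 * pi - d)

def single_linkage(points, n_clusters):
    """Merge the two closest clusters until only n_clusters remain."""
    clusters = [[p] for p in points]  # start with one cluster per point
    while len(clusters) > n_clusters:
        best = None
        for i in range(len(clusters)):
            for j in range(i + 1, len(clusters)):
                # Single linkage: distance between the closest pair of members.
                d = min(circular_distance(a, b)
                        for a in clusters[i] for b in clusters[j])
                if best is None or d < best[0]:
                    best = (d, i, j)
        _, i, j = best
        clusters[i].extend(clusters[j])
        del clusters[j]
    return clusters

# Hypothetical circular residuals in radians; 3.10 lies far from the rest.
residual_angles = [0.05, 0.10, 6.25, 0.20, 3.10]
clusters = single_linkage(residual_angles, 2)
print(clusters)
```

Note that 6.25 rad joins the cluster near 0 rad because the distance wraps around the circle; the isolated singleton cluster is the candidate outlier.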
NASA Astrophysics Data System (ADS)
Erener, Arzu; Sivas, A. Abdullah; Selcuk-Kestel, A. Sevtap; Düzgün, H. Sebnem
2017-07-01
All of the quantitative landslide susceptibility mapping (QLSM) methods require two basic data types, namely, landslide inventory and factors that influence landslide occurrence (landslide influencing factors, LIF). Depending on the type of landslides, the nature of triggers and the LIF, the accuracy of the QLSM methods differs. Moreover, how to balance the number of 0's (nonoccurrence) and 1's (occurrence) in the training set obtained from the landslide inventory, and how to select which of the 1's and 0's are included in QLSM models, play a critical role in the accuracy of the QLSM. Although the performance of various QLSM methods is largely investigated in the literature, the challenge of training set construction is not adequately investigated for the QLSM methods. In order to tackle this challenge, in this study three different training set selection strategies along with the original data set are used for testing the performance of three different regression methods, namely Logistic Regression (LR), Bayesian Logistic Regression (BLR) and Fuzzy Logistic Regression (FLR). The first sampling strategy is proportional random sampling (PRS), which takes into account a weighted selection of landslide occurrences in the sample set. The second method, namely non-selective nearby sampling (NNS), includes randomly selected sites and their surrounding neighboring points at certain preselected distances to include the impact of clustering. Selective nearby sampling (SNS) is the third method, which concentrates on the group of 1's and their surrounding neighborhood. A randomly selected group of landslide sites and their neighborhood are considered in the analyses, similar to NNS parameters. It is found that the LR-PRS, FLR-PRS and BLR-Whole Data set-ups, in that order, yield the best fits among the alternatives. The results indicate that in QLSM based on regression models, avoidance of spatial correlation in the data set is critical for the model's performance.
An Empirical Likelihood Method for Semiparametric Linear Regression with Right Censored Data
Fang, Kai-Tai; Li, Gang; Lu, Xuyang; Qin, Hong
2013-01-01
This paper develops a new empirical likelihood method for semiparametric linear regression with a completely unknown error distribution and right censored survival data. The method is based on the Buckley-James (1979) estimating equation. It inherits some appealing properties of the complete data empirical likelihood method. For example, it does not require variance estimation which is problematic for the Buckley-James estimator. We also extend our method to incorporate auxiliary information. We compare our method with the synthetic data empirical likelihood of Li and Wang (2003) using simulations. We also illustrate our method using Stanford heart transplantation data. PMID:23573169
Improved random-starting method for the EM algorithm for finite mixtures of regressions.
Schepers, Jan
2015-03-01
Two methods for generating random starting values for the expectation maximization (EM) algorithm are compared in terms of yielding maximum likelihood parameter estimates in finite mixtures of regressions. One of these methods is ubiquitous in applications of finite mixture regression, whereas the other method is an alternative that appears not to have been used so far. The two methods are compared in two simulation studies and on an illustrative data set. The results show that the alternative method yields solutions with likelihood values at least as high as, and often higher than, those returned by the standard method. Moreover, analyses of the illustrative data set show that the results obtained by the two methods may differ considerably with regard to some of the substantive conclusions. The results reported in this article indicate that in applications of finite mixture regression, consideration should be given to the type of mechanism chosen to generate random starting values for the EM algorithm. In order to facilitate the use of the proposed alternative method, an R function implementing the approach is provided in the Appendix of the article.
Yu, Hwa-Lung; Wang, Chih-Hsih; Liu, Ming-Che; Kuo, Yi-Ming
2011-06-01
Fine airborne particulate matter (PM2.5) has adverse effects on human health. Assessing the long-term effects of PM2.5 exposure on human health and ecology is often limited by a lack of reliable PM2.5 measurements. In Taipei, PM2.5 levels were not systematically measured until August 2005. Due to the popularity of geographic information systems (GIS), the landuse regression method has been widely used in the spatial estimation of PM concentrations. This method accounts for the potential contributing factors of the local environment, such as traffic volume. Geostatistical methods, on the other hand, account for the spatiotemporal dependence among the observations of ambient pollutants. This study assesses the performance of the landuse regression model for the spatiotemporal estimation of PM2.5 in the Taipei area. Specifically, this study integrates the landuse regression model with the geostatistical approach within the framework of the Bayesian maximum entropy (BME) method. The resulting epistemic framework can assimilate knowledge bases including: (a) empirical-based spatial trends of PM concentration based on landuse regression, (b) the spatiotemporal dependence among PM observation information, and (c) site-specific PM observations. The proposed approach performs the spatiotemporal estimation of PM2.5 levels in the Taipei area (Taiwan) from 2005-2007.
Double Cross-Validation in Multiple Regression: A Method of Estimating the Stability of Results.
ERIC Educational Resources Information Center
Rowell, R. Kevin
In multiple regression analysis, where resulting predictive equation effectiveness is subject to shrinkage, it is especially important to evaluate result replicability. Double cross-validation is an empirical method by which an estimate of invariance or stability can be obtained from research data. A procedure for double cross-validation is…
ERIC Educational Resources Information Center
Baker, Bruce D.; Richards, Craig E.
1999-01-01
Applies neural network methods for forecasting 1991-95 per-pupil expenditures in U.S. public elementary and secondary schools. Forecasting models included the National Center for Education Statistics' multivariate regression model and three neural architectures. Regarding prediction accuracy, neural network results were comparable or superior to…
A Simple and Convenient Method of Multiple Linear Regression to Calculate Iodine Molecular Constants
ERIC Educational Resources Information Center
Cooper, Paul D.
2010-01-01
A new procedure using a student-friendly least-squares multiple linear-regression technique utilizing a function within Microsoft Excel is described that enables students to calculate molecular constants from the vibronic spectrum of iodine. This method is advantageous pedagogically as it calculates molecular constants for ground and excited…
Simulation of Experimental Parameters of RC Beams by Employing the Polynomial Regression Method
NASA Astrophysics Data System (ADS)
Sayin, B.; Sevgen, S.; Samli, R.
2016-07-01
A numerical model based on the polynomial regression method is developed to simulate the mechanical behavior of reinforced concrete beams strengthened with a carbon-fiber-reinforced polymer and subjected to four-point bending. The results obtained are in good agreement with data from laboratory tests.
Three Reasons Why Stepwise Regression Methods Should Not Be Used by Researchers.
ERIC Educational Resources Information Center
Welge, Patricia
L. A. Marascuilo and R. C. Serlin (1988) note that stepwise regression is a method used frequently in social science research. C. Huberty (1989) characterizes such applications as being "common". In support of this latter statement, a review of dissertations by B. Thompson (1988) demonstrated that dissertation students frequently use…
Factor Regression Analysis: A New Method for Weighting Predictors. Final Report.
ERIC Educational Resources Information Center
Curtis, Ervin W.
The optimum weighting of variables to predict a dependent-criterion variable is an important problem in nearly all of the social and natural sciences. Although the predominant method, multiple regression analysis (MR), yields optimum weights for the sample at hand, these weights are not generally optimum in the population from which the sample was…
Comparing regression methods for the two-stage clonal expansion model of carcinogenesis.
Kaiser, J C; Heidenreich, W F
2004-11-15
In the statistical analysis of cohort data with risk estimation models, both Poisson and individual likelihood regressions are widely used methods of parameter estimation. In this paper, their performance has been tested with the biologically motivated two-stage clonal expansion (TSCE) model of carcinogenesis. To exclude inevitable uncertainties of existing data, cohorts with simple individual exposure history have been created by Monte Carlo simulation. To generate some similar properties of atomic bomb survivors and radon-exposed mine workers, both acute and protracted exposure patterns have been generated. Then the capacity of the two regression methods has been compared to retrieve a priori known model parameters from the simulated cohort data. For simple models with smooth hazard functions, the parameter estimates from both methods come close to their true values. However, for models with strongly discontinuous functions which are generated by the cell mutation process of transformation, the Poisson regression method fails to produce reliable estimates. This behaviour is explained by the construction of class averages during data stratification. Thereby, some indispensable information on the individual exposure history was destroyed. It could not be repaired by countermeasures such as the refinement of Poisson classes or a more adequate choice of Poisson groups. Although this choice might still exist we were unable to discover it. In contrast to this, the individual likelihood regression technique was found to work reliably for all considered versions of the TSCE model. 2004 John Wiley & Sons, Ltd.
NASA Technical Reports Server (NTRS)
Sidik, S. M.
1975-01-01
Ridge, Marquardt's generalized inverse, shrunken, and principal components estimators are discussed in terms of the objectives of point estimation of parameters, estimation of the predictive regression function, and hypothesis testing. It is found that as the normal equations approach singularity, more consideration must be given to estimable functions of the parameters as opposed to estimation of the full parameter vector; that biased estimators all introduce constraints on the parameter space; that adoption of mean squared error as a criterion of goodness should be independent of the degree of singularity; and that ordinary least-squares subset regression is the best overall method.
Statistical methods for astronomical data with upper limits. II - Correlation and regression
NASA Technical Reports Server (NTRS)
Isobe, T.; Feigelson, E. D.; Nelson, P. I.
1986-01-01
Statistical methods for calculating correlations and regressions in bivariate censored data where the dependent variable can have upper or lower limits are presented. Cox's regression and the generalization of Kendall's rank correlation coefficient provide significant levels of correlations, and the EM algorithm, under the assumption of normally distributed errors, and its nonparametric analog using the Kaplan-Meier estimator, give estimates for the slope of a regression line. Monte Carlo simulations demonstrate that survival analysis is reliable in determining correlations between luminosities at different bands. Survival analysis is applied to CO emission in infrared galaxies, X-ray emission in radio galaxies, H-alpha emission in cooling cluster cores, and radio emission in Seyfert galaxies.
Estimation Methods for Non-Homogeneous Regression - Minimum CRPS vs Maximum Likelihood
NASA Astrophysics Data System (ADS)
Gebetsberger, Manuel; Messner, Jakob W.; Mayr, Georg J.; Zeileis, Achim
2017-04-01
Non-homogeneous regression models are widely used to statistically post-process numerical weather prediction models. Such regression models correct for errors in mean and variance and are capable of forecasting a full probability distribution. To estimate the corresponding regression coefficients, CRPS minimization has been performed in many meteorological post-processing studies over the last decade. In contrast to maximum likelihood estimation, CRPS minimization is claimed to yield more calibrated forecasts. Theoretically, both scoring rules used as an optimization score should be able to locate a similar and unknown optimum. Discrepancies might result from a wrong distributional assumption of the observed quantity. To address this theoretical concept, this study compares maximum likelihood and minimum CRPS estimation for different distributional assumptions. First, a synthetic case study shows that, for an appropriate distributional assumption, both estimation methods yield similar regression coefficients. The log-likelihood estimator is slightly more efficient. A real world case study for surface temperature forecasts at different sites in Europe confirms these results but shows that surface temperature does not always follow the classical assumption of a Gaussian distribution. KEYWORDS: ensemble post-processing, maximum likelihood estimation, CRPS minimization, probabilistic temperature forecasting, distributional regression models
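As a rough illustration of the two scores compared in this abstract, the sketch below evaluates the standard closed-form CRPS of a Gaussian forecast next to its negative log-likelihood. The function names are hypothetical and are not taken from the study; only the textbook Gaussian formulas are assumed, and no optimizer is included.

```python
import math


def gaussian_crps(mu, sigma, y):
    """Closed-form CRPS of a N(mu, sigma^2) forecast for observation y."""
    z = (y - mu) / sigma
    pdf = math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)
    cdf = 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))
    return sigma * (z * (2.0 * cdf - 1.0) + 2.0 * pdf - 1.0 / math.sqrt(math.pi))


def gaussian_neg_log_lik(mu, sigma, y):
    """Negative log-likelihood of observation y under N(mu, sigma^2)."""
    z = (y - mu) / sigma
    return 0.5 * math.log(2.0 * math.pi) + math.log(sigma) + 0.5 * z * z


def mean_score(score, forecasts, observations):
    """Average a score over (mu, sigma) forecasts and matching observations."""
    total = sum(score(mu, sigma, y) for (mu, sigma), y in zip(forecasts, observations))
    return total / len(observations)
```

Either mean score could serve as the objective when fitting the regression coefficients that produce `mu` and `sigma`; both are lower for better forecasts.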
Neural Network and Regression Methods Demonstrated in the Design Optimization of a Subsonic Aircraft
NASA Technical Reports Server (NTRS)
Hopkins, Dale A.; Lavelle, Thomas M.; Patnaik, Surya
2003-01-01
The neural network and regression methods of NASA Glenn Research Center's COMETBOARDS design optimization testbed were used to generate approximate analysis and design models for a subsonic aircraft operating at Mach 0.85 cruise speed. The analytical model is defined by nine design variables: wing aspect ratio, engine thrust, wing area, sweep angle, chord-thickness ratio, turbine temperature, pressure ratio, bypass ratio, fan pressure; and eight response parameters: weight, landing velocity, takeoff and landing field lengths, approach thrust, overall efficiency, and compressor pressure and temperature. The variables were adjusted to optimally balance the engines to the airframe. The solution strategy included a sensitivity model and the soft analysis model. Researchers generated the sensitivity model by training the approximators to predict an optimum design. The trained neural network predicted all response variables, within 5-percent error. This was reduced to 1 percent by the regression method. The soft analysis model was developed to replace aircraft analysis as the reanalyzer in design optimization. Soft models have been generated for a neural network method, a regression method, and a hybrid method obtained by combining the approximators. The performance of the models is graphed for aircraft weight versus thrust as well as for wing area and turbine temperature. The regression method followed the analytical solution with little error. The neural network exhibited 5-percent maximum error over all parameters. Performance of the hybrid method was intermediate in comparison to the individual approximators. Error in the response variable is smaller than that shown in the figure because of a distortion scale factor. The overall performance of the approximators was considered to be satisfactory because aircraft analysis with NASA Langley Research Center's FLOPS (Flight Optimization System) code is a synthesis of diverse disciplines: weight estimation, aerodynamic
Phillips, Kirk T.; Street, W. Nick
2005-01-01
The purpose of this study is to determine the best prediction of heart failure outcomes, resulting from two methods -- standard epidemiologic analysis with logistic regression and knowledge discovery with supervised learning/data mining. Heart failure was chosen for this study as it exhibits higher prevalence and cost of treatment than most other hospitalized diseases. The prevalence of heart failure has exceeded 4 million cases in the U.S. Findings of this study should be useful for the design of quality improvement initiatives, as particular aspects of patient comorbidity and treatment are found to be associated with mortality. This is also a proof of concept study, considering the feasibility of emerging health informatics methods of data mining in conjunction with or in lieu of traditional logistic regression methods of prediction. Findings may also support the design of decision support systems and quality improvement programming for other diseases. PMID:16779367
Dhanya, S; Kumari Roshni, V S
2016-01-01
Textures play an important role in image classification. This paper proposes a high performance texture classification method using a combination of multiresolution analysis tool and linear regression modelling by channel elimination. The correlation between different frequency regions has been validated as a sort of effective texture characteristic. This method is motivated by the observation that there exists a distinctive correlation between the image samples belonging to the same kind of texture, at different frequency regions obtained by a wavelet transform. Experimentally, it is observed that this correlation differs across textures. The linear regression modelling is employed to analyze this correlation and extract texture features that characterize the samples. Our method considers not only the frequency regions but also the correlation between these regions. This paper primarily focuses on applying the Dual Tree Complex Wavelet Packet Transform and the Linear Regression model for classification of the obtained texture features. Additionally the paper also presents a comparative assessment of the classification results obtained from the above method with two more types of wavelet transform methods namely the Discrete Wavelet Transform and the Discrete Wavelet Packet Transform.
Penalized Likelihood for General Semi-Parametric Regression Models.
1985-05-01
should be stressed that q, while it may be somewhat less than n, will still be 'large', and parametric estimation of £ will not be appropriate ... Partial spline models for the semi-parametric estimation of functions of several variables, in Statistical Analysis of Time Series, Tokyo: Institute of
Sajobi, Tolulope T.; Zhang, Yukun; Menon, Bijoy K.; Goyal, Mayank; Demchuk, Andrew M; Broderick, Joseph P.; Hill, Michael D.
2015-01-01
Background and Purpose: Ordinal outcomes, such as modified Rankin scale (mRS), are the standard primary endpoints in acute stroke trials. Regression models for assessing treatment efficacy after adjusting for baseline covariates have been developed for continuous, binary, or ordinal endpoints. There has been no consensus on the best choice of method for analyzing these data. Methods: We compared several regression models for assessing treatment efficacy in acute stroke trials using existing datasets from IMS III and PROACT 2 trials. Patients with baseline non-contrast CT ASPECTS score > 5, baseline CT angiography (CTA) or conventional angiogram showing an intracranial internal carotid artery (ICA) or middle cerebral artery trunk (M-1) occlusion, adequate collateral circulation shown on CTA, and treatment times of non-contrast CT to groin puncture of 90 minutes or less, were included. Monte Carlo techniques were used to compare the statistical power of these regression models under a variety of simulated data analytic scenarios. Results: Binary logistic regression showed greater power when the treatment is predicted to show evidence of benefit on one end of the mRS with no other gains across other levels of the scale. Proportional odds regression showed greater power when the treatment is predicted to show evidence of improvement on both ends of the mRS. Conclusions: The mRS distribution for both treatment and control groups influences the power of the investigated statistical models to assess treatment efficacy. A careful evaluation of the expected outcome distribution across the mRS scale is required to determine the best choice of primary analysis. PMID:26022639
NASA Astrophysics Data System (ADS)
Zheng, Jun; Shao, Xinyu; Gao, Liang; Jiang, Ping; Qiu, Haobo
2015-06-01
Engineering design, especially for complex engineering systems, is usually a time-consuming process involving computation-intensive computer-based simulation and analysis methods. A difference mapping method using least square support vector regression is developed in this work, as a special metamodelling methodology that includes variable-fidelity data, to replace the computationally expensive computer codes. A general difference mapping framework is proposed where a surrogate base is first created, then the approximation is obtained by mapping the difference between the base and the real high-fidelity response surface. The least square support vector regression is adopted to accomplish the mapping. Two different sampling strategies, nested and non-nested design of experiments, are conducted to explore their respective effects on modelling accuracy. Different sample sizes and three approximation performance measures of accuracy are considered.
An Efficient Elastic Net with Regression Coefficients Method for Variable Selection of Spectrum Data
Liu, Wenya; Li, Qi
2017-01-01
Using spectrum data for quality prediction always suffers from noise and collinearity, so variable selection plays an important role in dealing with spectrum data. An efficient elastic net with regression coefficients method (Enet-BETA) is proposed in this paper to select the significant variables of the spectrum data. The proposed Enet-BETA method can not only select important variables to make the quality easy to interpret, but also improve the stability and feasibility of the built model. The Enet-BETA method is not prone to overfitting because of the reduction of redundant variables realized by the elastic net method. Hypothesis testing is used to further simplify the model and provide better insight into the nature of the process. The experimental results show that the proposed Enet-BETA method outperforms the other methods in terms of prediction performance and model interpretation. PMID:28152003
NASA Astrophysics Data System (ADS)
Xu, Xiaohong; Chen, Yu; Jia, Haiwei
2009-07-01
This paper studies the relation between the interest rate and the inflation rate. We use the stepwise regression method to build a mathematical model of this relation. The model passes the significance test, and we use it to discuss the influence on the social economy of adjusting the deposit rate, so as to provide a theoretical basis for government policy making.
[ITT in the penal sense].
Oustric, Stéphane; Grill, Stéphane; Telmon, Norbert
2009-10-20
This study aimed to evaluate the knowledge and practice of general practitioners in the French region of Midi-Pyrénées regarding the determination of total incapacity for work (ITT). A practice survey was conducted by anonymous self-administered questionnaire among 500 randomly selected general practitioners in the Midi-Pyrénées region. The questionnaire comprised four parts (profile of the doctor, self-assessment of practice regarding the drafting of certificates of aggravated assault and the determination of the ITT, questions of theoretical knowledge concerning the ITT, and evaluation of knowledge through the analysis of clinical situations). Of the 500 questionnaires, 266 (53.2%) were returned and usable between March and July 2006. The sample of doctors is representative of the medical population of the Midi-Pyrénées region; 91.8% of the doctors write certificates of aggravated assault and 80% determine the ITT; 20% know the meaning of the initials ITT in the penal sense of the term, 94% know the legal consequences of the duration of the penal ITT, and 87% know the "8 days" rule in cases of intentional violence. In 24% of the errors, at least one of the two victims presented in the practical case was given an ITT duration longer than 8 days. This survey confirms the major role of the general practitioner in drafting the initial medical certificate of aggravated assault and in determining the ITT in the penal sense of the term. However, there is considerable room for improvement in professional practice, between the determination of the duration of total incapacity for work and the duration of the medical certificate concerning the victim. Maintaining continuing medical education in medico-legal practice appears important for all primary-care practitioners in ambulatory practice.
Wilcox, Rand R
2010-05-01
This paper considers the problem of estimating the overall strength of an association, including situations where there is curvature. The general strategy is to fit a robust regression line, or some type of smoother that allows curvature, and then use a robust analogue of explanatory power, say η². When the regression surface is a plane, an estimate of η² via the Theil-Sen estimator is found to perform well, relative to some other robust regression estimators, in terms of mean squared error and bias. When there is curvature, a generalization of a kernel estimator derived by Fan performs relatively well, but two alternative smoothers have certain practical advantages. When η² is approximately equal to zero, estimation using smoothers has relatively high bias. A variation of η² is suggested for dealing with this problem. Methods for testing H₀: η² = 0 are examined that are based in part on smoothers. Two methods are found that control Type I error probabilities reasonably well in simulations. Software for applying the more successful methods is provided.
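For readers unfamiliar with the Theil-Sen estimator mentioned above, here is a minimal sketch of the single-predictor case: the slope is the median of all pairwise slopes. The explanatory-power function is a simplification that uses the ordinary variance, whereas the abstract refers to a robust analogue; all function names are hypothetical and not from the paper's software.

```python
from itertools import combinations
from statistics import median, pvariance


def theil_sen(x, y):
    """Theil-Sen line: median pairwise slope, then median residual intercept."""
    slopes = [
        (y[j] - y[i]) / (x[j] - x[i])
        for i, j in combinations(range(len(x)), 2)
        if x[j] != x[i]  # skip pairs with tied x values
    ]
    slope = median(slopes)
    intercept = median([yi - slope * xi for xi, yi in zip(x, y)])
    return slope, intercept


def explanatory_power(x, y):
    """Variance of fitted values over variance of y (non-robust stand-in)."""
    slope, intercept = theil_sen(x, y)
    fitted = [intercept + slope * xi for xi in x]
    return pvariance(fitted) / pvariance(y)
```

With x = [0, 1, 2, 3, 4] and y = [1, 3, 5, 7, 100], the single outlier leaves the fitted line at slope 2 and intercept 1, which is the robustness property the abstract relies on.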
Unification of regression-based methods for the analysis of natural selection.
Morrissey, Michael B; Sakrejda, Krzysztof
2013-07-01
Regression analyses are central to characterization of the form and strength of natural selection in nature. Two common analyses that are currently used to characterize selection are (1) least squares-based approximation of the individual relative fitness surface for the purpose of obtaining quantitatively useful selection gradients, and (2) spline-based estimation of (absolute) fitness functions to obtain flexible inference of the shape of functions by which fitness and phenotype are related. These two sets of methodologies are often implemented in parallel to provide complementary inferences of the form of natural selection. We unify these two analyses, providing a method whereby selection gradients can be obtained for a given observed distribution of phenotype and characterization of a function relating phenotype to fitness. The method allows quantitatively useful selection gradients to be obtained from analyses of selection that adequately model nonnormal distributions of fitness, and provides unification of the two previously separate regression-based fitness analyses. We demonstrate the method by calculating directional and quadratic selection gradients associated with a smooth regression-based generalized additive model of the relationship between neonatal survival and the phenotypic traits of gestation length and birth mass in humans.
Churpek, Matthew M; Yuen, Trevor C; Winslow, Christopher; Meltzer, David O; Kattan, Michael W; Edelson, Dana P
2016-01-01
OBJECTIVE: Machine learning methods are flexible prediction algorithms that may be more accurate than conventional regression. We compared the accuracy of different techniques for detecting clinical deterioration on the wards in a large, multicenter database. DESIGN: Observational cohort study. SETTING: Five hospitals, from November 2008 until January 2013. PATIENTS: Hospitalized ward patients. INTERVENTIONS: None. MEASUREMENTS AND MAIN RESULTS: Demographic variables, laboratory values, and vital signs were utilized in a discrete-time survival analysis framework to predict the combined outcome of cardiac arrest, intensive care unit transfer, or death. Two logistic regression models (one using linear predictor terms and a second utilizing restricted cubic splines) were compared to several different machine learning methods. The models were derived in the first 60% of the data by date and then validated in the next 40%. For model derivation, each event time window was matched to a non-event window. All models were compared to each other and to the Modified Early Warning score (MEWS), a commonly cited early warning score, using the area under the receiver operating characteristic curve (AUC). A total of 269,999 patients were admitted, and 424 cardiac arrests, 13,188 intensive care unit transfers, and 2,840 deaths occurred in the study. In the validation dataset, the random forest model was the most accurate model (AUC 0.80 [95% CI 0.80–0.80]). The logistic regression model with spline predictors was more accurate than the model utilizing linear predictors (AUC 0.77 vs 0.74; p<0.01), and all models were more accurate than the MEWS (AUC 0.70 [95% CI 0.70–0.70]). CONCLUSIONS: In this multicenter study, we found that several machine learning methods more accurately predicted clinical deterioration than logistic regression. Use of detection algorithms derived from these techniques may result in improved identification of critically ill patients on the wards. PMID:26771782
Impact of regression methods on improved effects of soil structure on soil water retention estimates
NASA Astrophysics Data System (ADS)
Nguyen, Phuong Minh; De Pue, Jan; Le, Khoa Van; Cornelis, Wim
2015-06-01
Increasing the accuracy of pedotransfer functions (PTFs), an indirect method for predicting non-readily available soil features such as soil water retention characteristics (SWRC), is of crucial importance for large scale agro-hydrological modeling. Adding significant predictors (i.e., soil structure), and implementing more flexible regression algorithms are among the main strategies of PTFs improvement. The aim of this study was to investigate whether the improved effect of categorical soil structure information on estimating soil-water content at various matric potentials, which has been reported in literature, could be enduringly captured by regression techniques other than the usually applied linear regression. Two data mining techniques, i.e., Support Vector Machines (SVM), and k-Nearest Neighbors (kNN), which have been recently introduced as promising tools for PTF development, were utilized to test if the incorporation of soil structure will improve PTF's accuracy under a context of rather limited training data. The results show that incorporating descriptive soil structure information, i.e., massive, structured and structureless, as grouping criterion can improve the accuracy of PTFs derived by SVM approach in the range of matric potential of -6 to -33 kPa (average RMSE decreased up to 0.005 m3 m-3 after grouping, depending on matric potentials). The improvement was primarily attributed to the outperformance of SVM-PTFs calibrated on structureless soils. No improvement was obtained with kNN technique, at least not in our study in which the data set became limited in size after grouping. Since there is an impact of regression techniques on the improved effect of incorporating qualitative soil structure information, selecting a proper technique will help to maximize the combined influence of flexible regression algorithms and soil structure information on PTF accuracy.
Comparison of regression methods for modeling intensive care length of stay.
Verburg, Ilona W M; de Keizer, Nicolette F; de Jonge, Evert; Peek, Niels
2014-01-01
Intensive care units (ICUs) are increasingly interested in assessing and improving their performance. ICU Length of Stay (LoS) could be seen as an indicator for efficiency of care. However, little consensus exists on which prognostic method should be used to adjust ICU LoS for case-mix factors. This study compared the performance of different regression models when predicting ICU LoS. We included data from 32,667 unplanned ICU admissions to ICUs participating in the Dutch National Intensive Care Evaluation (NICE) in the year 2011. We predicted ICU LoS using eight regression models: ordinary least squares regression on untransformed ICU LoS, LoS truncated at 30 days and log-transformed LoS; a generalized linear model with a Gaussian distribution and a logarithmic link function; Poisson regression; negative binomial regression; Gamma regression with a logarithmic link function; and the original and recalibrated APACHE IV model, for all patients together and for survivors and non-survivors separately. We assessed the predictive performance of the models using bootstrapping and the squared Pearson correlation coefficient (R2), root mean squared prediction error (RMSPE), mean absolute prediction error (MAPE) and bias. The distribution of ICU LoS was skewed to the right with a median of 1.7 days (interquartile range 0.8 to 4.0) and a mean of 4.2 days (standard deviation 7.9). The predictive performance of the models was between 0.09 and 0.20 for R2, between 7.28 and 8.74 days for RMSPE, between 3.00 and 4.42 days for MAPE and between -2.99 and 1.64 days for bias. The predictive performance was slightly better for survivors than for non-survivors. We were disappointed in the predictive performance of the regression models and conclude that it is difficult to predict LoS of unplanned ICU admissions using patient characteristics at admission time only.
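The four performance measures named in this abstract can be written out in a few lines. The sketch below is an illustration only: the function name is hypothetical, the bootstrapping step is omitted, and the sign convention for bias (predicted minus observed) is an assumption, since the abstract does not state it.

```python
import math


def prediction_performance(observed, predicted):
    """Return R2 (squared Pearson correlation), RMSPE, MAPE and bias."""
    n = len(observed)
    mean_obs = sum(observed) / n
    mean_pred = sum(predicted) / n
    cov = sum((o - mean_obs) * (p - mean_pred) for o, p in zip(observed, predicted))
    var_obs = sum((o - mean_obs) ** 2 for o in observed)
    var_pred = sum((p - mean_pred) ** 2 for p in predicted)
    # Assumed sign convention: error = predicted - observed.
    errors = [p - o for o, p in zip(observed, predicted)]
    return {
        "R2": cov * cov / (var_obs * var_pred),
        "RMSPE": math.sqrt(sum(e * e for e in errors) / n),
        "MAPE": sum(abs(e) for e in errors) / n,
        "bias": sum(errors) / n,
    }
```

A model that overpredicts every stay by exactly one day would score R2 = 1 but RMSPE, MAPE and bias of one day each, which is why the abstract reports all four measures rather than R2 alone.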
A subagging regression method for estimating the qualitative and quantitative state of groundwater
NASA Astrophysics Data System (ADS)
Jeong, Jina; Park, Eungyu; Han, Weon Shik; Kim, Kue-Young
2017-08-01
A subsample aggregating (subagging) regression (SBR) method for the analysis of groundwater data pertaining to trend-estimation-associated uncertainty is proposed. The SBR method is validated against synthetic data competitively with other conventional robust and non-robust methods. From the results, it is verified that the estimation accuracies of the SBR method are consistent and superior to those of other methods, and the uncertainties are reasonably estimated; the others have no uncertainty analysis option. To validate further, actual groundwater data are employed and analyzed comparatively with Gaussian process regression (GPR). For all cases, the trend and the associated uncertainties are reasonably estimated by both SBR and GPR regardless of Gaussian or non-Gaussian skewed data. However, it is expected that GPR has a limitation in applications to severely corrupted data by outliers owing to its non-robustness. From the implementations, it is determined that the SBR method has the potential to be further developed as an effective tool of anomaly detection or outlier identification in groundwater state data such as the groundwater level and contaminant concentration.
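The general idea of subsample aggregating can be sketched as follows: fit a simple trend on many random subsamples drawn without replacement, then aggregate the fits and read an uncertainty band off their spread. This is not the authors' SBR implementation; the least-squares base learner, the median aggregation, the 95% percentile band, and all names and default values are assumptions made for illustration.

```python
import random
from statistics import median


def fit_line(points):
    """Ordinary least-squares line through (x, y) pairs with distinct x."""
    n = len(points)
    mean_x = sum(x for x, _ in points) / n
    mean_y = sum(y for _, y in points) / n
    sxx = sum((x - mean_x) ** 2 for x, _ in points)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in points)
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def subagging_trend(points, n_models=200, fraction=0.5, seed=0):
    """Median subsample slope plus a 95% percentile band of the slopes."""
    rng = random.Random(seed)
    size = max(2, int(fraction * len(points)))
    slopes = []
    for _ in range(n_models):
        sample = rng.sample(points, size)  # subsample without replacement
        slopes.append(fit_line(sample)[0])
    slopes.sort()
    lower = slopes[int(0.025 * (n_models - 1))]
    upper = slopes[int(0.975 * (n_models - 1))]
    return median(slopes), (lower, upper)
```

Because each fit sees only part of the data, a handful of outliers affects only some of the subsample slopes, and the median over the ensemble is less sensitive to them than a single fit on all points.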
Karim, Md Nazmul; Reid, Christopher M; Tran, Lavinia; Cochrane, Andrew; Billah, Baki
2017-05-01
To compare the impact of different variable selection methods in multiple regression to develop a parsimonious model for predicting postoperative outcomes of patients undergoing cardiac surgery. Data from 84,135 patients in the Australian and New Zealand Society of Cardiac and Thoracic Surgeons registry between 2001 and 2014 were analyzed. Primary outcome was 30-day mortality. Mixed-effect logistic regressions were used to build the model. Missing values were imputed by the use of multiple imputations. The following 5 variable selection methods were compared: bootstrap receiver operating characteristic (ROC), bootstrap Akaike information criterion, bootstrap Bayesian information criterion, and stepwise forward and stepwise backward methods. The final model's prediction performance was evaluated by the use of Frank Harrell's calibration curve and using a multifold cross-validation approach. Stepwise forward and backward methods selected the same set of 21 variables into the model, with an area under the ROC curve (AUC) of 0.8490. The bootstrap ROC method selected 13 variables with an AUC of 0.8450. Bootstrap Bayesian information criterion and Akaike information criterion selected 16 (AUC: 0.8470) and 23 (AUC: 0.8491) variables, respectively. The bootstrap ROC model was selected as the final model, which showed very good discrimination and calibration power. Clinical suitability in terms of parsimony and prediction performance can be achieved substantially by using the bootstrap ROC method for the development of risk prediction models. Copyright © 2016 The American Association for Thoracic Surgery. Published by Elsevier Inc. All rights reserved.
An Efficient Simulation Budget Allocation Method Incorporating Regression for Partitioned Domains
Brantley, Mark W.; Lee, Loo Hay; Chen, Chun-Hung; Xu, Jie
2014-01-01
Simulation can be a very powerful tool to help decision making in many applications, but exploring multiple courses of action can be time consuming. Numerous ranking & selection (R&S) procedures have been developed to enhance the simulation efficiency of finding the best design. To further improve efficiency, one approach is to incorporate information from across the domain into a regression equation. However, the use of a regression metamodel also inherits some typical assumptions from most regression approaches, such as an underlying quadratic function and simulation noise that is homogeneous across the domain of interest. To relax these limitations while retaining the efficiency benefit, we propose to partition the domain of interest such that in each partition the mean of the underlying function is approximately quadratic. Our new method provides approximately optimal rules for between and within partitions that determine the number of samples allocated to each design location. The goal is to maximize the probability of correctly selecting the best design. Numerical experiments demonstrate that our new approach can dramatically enhance efficiency over existing efficient R&S methods. PMID:24936099
NASA Astrophysics Data System (ADS)
Khazaei, Ardeshir; Sarmasti, Negin; Seyf, Jaber Yousefi
2016-03-01
Quantitative structure-activity relationship (QSAR) models were used to study a series of curcumin-related compounds with inhibitory effects on prostate cancer PC-3 cells, pancreas cancer Panc-1 cells, and colon cancer HT-29 cells. The sphere exclusion method was used to split the data set into training and test sets. Multiple linear regression, principal component regression and partial least squares were used as the regression methods. In addition, to investigate the effect of feature selection methods, stepwise selection, a genetic algorithm, and simulated annealing were used. In two cases (PC-3 cells and Panc-1 cells), the best models were generated by a combination of multiple linear regression and stepwise selection (PC-3 cells: r2 = 0.86, q2 = 0.82, pred_r2 = 0.93, and r2m (test) = 0.43; Panc-1 cells: r2 = 0.85, q2 = 0.80, pred_r2 = 0.71, and r2m (test) = 0.68). For the HT-29 cells, principal component regression with stepwise selection (r2 = 0.69, q2 = 0.62, pred_r2 = 0.54, and r2m (test) = 0.41) is the best method. The QSAR study reveals descriptors that play a crucial role in the inhibitory property of curcumin-like compounds. 6ChainCount, T_C_C_1, and T_O_O_7 are the most important descriptors and have the greatest effect. For the design and optimization of novel, efficient curcumin-related compounds, it is useful to introduce heteroatoms such as nitrogen, oxygen, and sulfur atoms into the chemical structure (reducing the contribution of the T_C_C_1 descriptor) and to increase the contribution of the 6ChainCount and T_O_O_7 descriptors. The models can be useful in the better design of novel curcumin-related compounds that can be used in the treatment of prostate, pancreas, and colon cancers.
Assessment of Weighted Quantile Sum Regression for Modeling Chemical Mixtures and Cancer Risk
Czarnota, Jenna; Gennings, Chris; Wheeler, David C
2015-01-01
In evaluation of cancer risk related to environmental chemical exposures, the effect of many chemicals on disease is ultimately of interest. However, because of potentially strong correlations among chemicals that occur together, traditional regression methods suffer from collinearity effects, including regression coefficient sign reversal and variance inflation. In addition, penalized regression methods designed to remediate collinearity may have limitations in selecting the truly bad actors among many correlated components. The recently proposed method of weighted quantile sum (WQS) regression attempts to overcome these problems by estimating a body burden index, which identifies important chemicals in a mixture of correlated environmental chemicals. Our focus was on assessing through simulation studies the accuracy of WQS regression in detecting subsets of chemicals associated with health outcomes (binary and continuous) in site-specific analyses and in non-site-specific analyses. We also evaluated the performance of the penalized regression methods of lasso, adaptive lasso, and elastic net in correctly classifying chemicals as bad actors or unrelated to the outcome. We based the simulation study on data from the National Cancer Institute Surveillance Epidemiology and End Results Program (NCI-SEER) case–control study of non-Hodgkin lymphoma (NHL) to achieve realistic exposure situations. Our results showed that WQS regression had good sensitivity and specificity across a variety of conditions considered in this study. The shrinkage methods had a tendency to incorrectly identify a large number of components, especially in the case of strong association with the outcome. PMID:26005323
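For orientation, the WQS index is usually written in the following form. The abstract itself gives no formula, so the notation here is an assumption taken from the commonly cited formulation: $q_i$ is the quantile score of chemical $i$, $w_i$ is its estimated weight, $c$ is the number of chemicals, and $g$ is the link function.

```latex
g(\mu) = \beta_0 + \beta_1 \sum_{i=1}^{c} w_i \, q_i ,
\qquad \sum_{i=1}^{c} w_i = 1 , \quad 0 \le w_i \le 1 .
```

Chemicals whose estimated weights are large are the ones flagged as contributing to the mixture effect, which is how the index "identifies important chemicals" in the sense used above.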
Lin, Wan-Yu; Schaid, Daniel J
2009-04-01
Recently, a genomic distance-based regression for multilocus associations was proposed (Wessel and Schork [2006] Am. J. Hum. Genet. 79:792-806) in which either locus or haplotype scoring can be used to measure genetic distance. Although it allows various measures of genomic similarity and simultaneous analyses of multiple phenotypes, its power relative to other methods for case-control analyses is not well known. We compare the power of traditional methods with this new distance-based approach, for both locus-scoring and haplotype-scoring strategies. We discuss the relative power of these association methods with respect to five properties: (1) the marker informativity; (2) the number of markers; (3) the causal allele frequency; (4) the preponderance of the most common high-risk haplotype; (5) the correlation between the causal single-nucleotide polymorphism (SNP) and its flanking markers. We found that locus-based logistic regression and the global score test for haplotypes suffered from power loss when many markers were included in the analyses, due to many degrees of freedom. In contrast, the distance-based approach was not as vulnerable to more markers or more haplotypes. A genotype counting measure was more sensitive to the marker informativity and the correlation between the causal SNP and its flanking markers. After examining the impact of the five properties on power, we found that on average, the genomic distance-based regression that uses a matching measure for diplotypes was the most powerful and robust method among the seven methods we compared.
A robust and efficient stepwise regression method for building sparse polynomial chaos expansions
NASA Astrophysics Data System (ADS)
Abraham, Simon; Raisee, Mehrdad; Ghorbaniasl, Ghader; Contino, Francesco; Lacor, Chris
2017-03-01
Polynomial Chaos (PC) expansions are widely used in various engineering fields for quantifying uncertainties arising from uncertain parameters. The computational cost of classical PC solution schemes is unaffordable, as the number of deterministic simulations to be calculated grows dramatically with the number of stochastic dimensions. This considerably restricts the practical use of PC at the industrial level. A common approach to address such problems is to make use of sparse PC expansions. This paper presents a non-intrusive regression-based method for building sparse PC expansions. The most important PC contributions are detected sequentially through an automatic search procedure. The variable selection criterion is based on efficient tools relevant to probabilistic methods. Two benchmark analytical functions are used to validate the proposed algorithm. The computational efficiency of the method is then illustrated by a more realistic CFD application, consisting of the non-deterministic flow around a transonic airfoil subject to geometrical uncertainties. To assess the performance of the developed methodology, a detailed comparison is made with the well-established LAR-based selection technique. The results show that the developed sparse regression technique is able to identify the most significant PC contributions describing the problem. Moreover, the most important stochastic features are captured at a reduced computational cost compared to the LAR method. The results also demonstrate the superior robustness of the method by repeating the analyses using random experimental designs.
Penalized feature selection and classification in bioinformatics
Huang, Jian
2008-01-01
In bioinformatics studies, supervised classification with high-dimensional input variables is frequently encountered. Examples routinely arise in genomic, epigenetic and proteomic studies. Feature selection can be employed along with classifier construction to avoid over-fitting, to generate more reliable classifier and to provide more insights into the underlying causal relationships. In this article, we provide a review of several recently developed penalized feature selection and classification techniques—which belong to the family of embedded feature selection methods—for bioinformatics studies with high-dimensional input. Classification objective functions, penalty functions and computational algorithms are discussed. Our goal is to make interested researchers aware of these feature selection and classification methods that are applicable to high-dimensional bioinformatics data. PMID:18562478
Semiparametric regression during 2003–2007
Ruppert, David; Wand, M.P.; Carroll, Raymond J.
2010-01-01
Semiparametric regression is a fusion between parametric regression and nonparametric regression that integrates low-rank penalized splines, mixed model and hierarchical Bayesian methodology – thus allowing more streamlined handling of longitudinal and spatial correlation. We review progress in the field over the five-year period between 2003 and 2007. We find semiparametric regression to be a vibrant field with substantial involvement and activity, continual enhancement and widespread application. PMID:20305800
da Silva, Claudia Pereira; Emídio, Elissandro Soares; de Marchi, Mary Rosa Rodrigues
2015-01-01
This paper describes the validation of a method consisting of solid-phase extraction followed by gas chromatography-tandem mass spectrometry for the analysis of the ultraviolet (UV) filters benzophenone-3, ethylhexyl salicylate, ethylhexyl methoxycinnamate and octocrylene. The method validation criteria included evaluation of selectivity, analytical curve, trueness, precision, limits of detection and limits of quantification. The non-weighted linear regression model has traditionally been used for calibration, but it is not necessarily the optimal model in all cases. Because the assumption of homoscedasticity was not met for the analytical data in this work, a weighted least squares linear regression was used for the calibration method. The evaluated analytical parameters were satisfactory for the analytes and showed recoveries at four fortification levels between 62% and 107%, with relative standard deviations less than 14%. The detection limits ranged from 7.6 to 24.1 ng L(-1). The proposed method was used to determine the amount of UV filters in water samples from water treatment plants in Araraquara and Jau in São Paulo, Brazil.
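The weighted least squares calibration mentioned above can be sketched as follows. The abstract does not state which weighting scheme was used, so the 1/x² weights, the concentration values, and the function name are assumptions for illustration only.

```python
def weighted_linear_fit(x, y, w):
    """Fit y = intercept + slope * x by minimising sum(w_i * residual_i**2)."""
    sw = sum(w)
    x_bar = sum(wi * xi for wi, xi in zip(w, x)) / sw  # weighted mean of x
    y_bar = sum(wi * yi for wi, yi in zip(w, y)) / sw  # weighted mean of y
    sxx = sum(wi * (xi - x_bar) ** 2 for wi, xi in zip(w, x))
    sxy = sum(wi * (xi - x_bar) * (yi - y_bar) for wi, xi, yi in zip(w, x, y))
    slope = sxy / sxx
    intercept = y_bar - slope * x_bar
    return intercept, slope


# Hypothetical calibration levels and a noise-free response, for illustration only.
conc = [10.0, 50.0, 100.0, 500.0, 1000.0]
signal = [2.0 * c + 5.0 for c in conc]
weights = [1.0 / c ** 2 for c in conc]  # assumed 1/x^2 weighting
intercept, slope = weighted_linear_fit(conc, signal, weights)
```

Down-weighting the high-concentration points keeps their larger absolute variance from dominating the fit, which is the usual motivation when homoscedasticity does not hold.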
Detection of different outlier scenarios in circular regression model using single-linkage method
NASA Astrophysics Data System (ADS)
Di, N. F. M.; Satari, S. Z.; Zakaria, R.
2017-09-01
Outliers are data points that deviate significantly from, or are dissimilar to, the rest of the data set. In the circular regression model, the existence of outliers is well known to have a large effect on the parameter estimates and inferences. In this study, we propose a clustering-based method using single linkage to detect multiple outliers. Single linkage is one of several clustering methods, in which the distance between two clusters is determined by the single pair of elements that are closest to each other. We examined two outlier scenarios with a certain degree of contamination. The performance of the proposed method in the different outlier scenarios is compared, and the best method for each outlier scenario is chosen.
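The single-linkage rule described above can be sketched on circular data as follows. This is only an illustration, not the authors' procedure: clustering raw angles, the cut height, the size-one flagging rule, and all names below are assumptions.

```python
import math


def circular_distance(a, b):
    """Shortest arc length between two angles given in radians."""
    d = abs(a - b) % (2.0 * math.pi)
    return min(d, 2.0 * math.pi - d)


def single_linkage_clusters(angles, cut_height):
    """Agglomerative single-linkage clustering, stopped at cut_height.

    The distance between two clusters is the smallest pairwise distance
    between their members. Returns a list of clusters of point indices.
    """
    clusters = [[i] for i in range(len(angles))]
    while len(clusters) > 1:
        best = None
        for p in range(len(clusters)):
            for q in range(p + 1, len(clusters)):
                d = min(circular_distance(angles[i], angles[j])
                        for i in clusters[p] for j in clusters[q])
                if best is None or d < best[0]:
                    best = (d, p, q)
        if best[0] > cut_height:
            break  # the closest remaining clusters are too far apart to merge
        _, p, q = best
        clusters[p] = clusters[p] + clusters[q]
        del clusters[q]
    return clusters


# Five angles near 0 (one of them just below 2*pi) and one isolated angle.
angles = [0.10, 0.15, 0.20, 0.25, 6.25, 3.0]
clusters = single_linkage_clusters(angles, cut_height=0.3)
outliers = sorted(i for c in clusters if len(c) == 1 for i in c)
```

Note that the point at 6.25 rad joins the main cluster because distance is measured around the circle, while the isolated point at 3.0 rad is left as a singleton and flagged.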
Landslide susceptibility mapping on a global scale using the method of logistic regression
NASA Astrophysics Data System (ADS)
Lin, Le; Lin, Qigen; Wang, Ying
2017-08-01
This paper proposes a statistical model for mapping global landslide susceptibility based on logistic regression. After investigating explanatory factors for landslides in the existing literature, five factors were selected to model landslide susceptibility: relative relief, extreme precipitation, lithology, ground motion and soil moisture. When building the model, 70 % of landslide and nonlandslide points were randomly selected for logistic regression, and the others were used for model validation. To evaluate the accuracy of predictive models, this paper adopts several criteria including a receiver operating characteristic (ROC) curve method. Logistic regression experiments found all five factors to be significant in explaining landslide occurrence on a global scale. During the modeling process, the percentage correct in the confusion matrix of landslide classification was approximately 80 % and the area under the curve (AUC) was nearly 0.87. During the validation process, the above statistics were about 81 % and 0.88, respectively. Such a result indicates that the model has strong robustness and stable performance. This model found that at a global scale, soil moisture can be dominant in the occurrence of landslides and the topographic factor may be secondary.
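The validation protocol described above (a random 70 % split and an AUC score) can be sketched as follows. The function names, the seed, and the toy labels and scores are hypothetical; the logistic regression fit itself is omitted, and the scores simply stand in for predicted susceptibilities.

```python
import random


def train_validation_split(n, train_fraction=0.7, seed=0):
    """Randomly split the indices 0..n-1 into training and validation sets."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    cut = int(round(train_fraction * n))
    return idx[:cut], idx[cut:]


def auc(labels, scores):
    """Area under the ROC curve via pairwise comparison.

    Equals the probability that a randomly chosen positive case receives a
    higher score than a randomly chosen negative case (ties count as 0.5).
    """
    pos = [s for label, s in zip(labels, scores) if label == 1]
    neg = [s for label, s in zip(labels, scores) if label == 0]
    wins = 0.0
    for p in pos:
        for q in neg:
            if p > q:
                wins += 1.0
            elif p == q:
                wins += 0.5
    return wins / (len(pos) * len(neg))


train_idx, valid_idx = train_validation_split(10)

# Toy example: 1 = landslide point, 0 = nonlandslide point.
labels = [1, 1, 1, 0, 0, 0]
scores = [0.9, 0.8, 0.4, 0.5, 0.3, 0.2]
toy_auc = auc(labels, scores)
```

In the toy example, eight of the nine positive-negative pairs are ranked correctly, so the AUC is 8/9.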
Comparing the index-flood and multiple-regression methods using L-moments
NASA Astrophysics Data System (ADS)
Malekinezhad, H.; Nachtnebel, H. P.; Klik, A.
In arid and semi-arid regions, the length of records is usually too short to ensure reliable quantile estimates. Comparing index-flood and multiple-regression analyses based on L-moments was the main objective of this study. Factor analysis was applied to determine the main variables influencing flood magnitude. Ward’s cluster and L-moments approaches were applied to several sites in the Namak-Lake basin in central Iran to delineate homogeneous regions based on site characteristics. A homogeneity test was performed using L-moments-based measures. Several distributions were fitted to the regional flood data, and the index-flood and multiple-regression methods were compared as two regional flood frequency methods. The results of factor analysis showed that length of main waterway, compactness coefficient, mean annual precipitation, and mean annual temperature were the main variables affecting flood magnitude. The study area was divided into three regions based on Ward’s method of clustering. The homogeneity test based on L-moments showed that all three regions were acceptably homogeneous. Five distributions were fitted to the annual peak flood data of the three homogeneous regions. Using the L-moment ratios and the Z-statistic criteria, the generalised extreme value (GEV) distribution was identified as the most robust of the five candidate distributions for all the proposed sub-regions of the study area, and it was concluded that the GEV distribution was the best-fit distribution for all three regions. The relative root mean square error (RRMSE) measure was applied to evaluate the performance of the index-flood and multiple-regression methods in comparison with the curve fitting (plotting position) method. In general, the index-flood method gives more reliable estimates for flood magnitudes of different recurrence intervals. Therefore, this method should be adopted as the regional flood frequency method for the study area and the Namak-Lake basin.
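The abstract does not define the RRMSE measure. One common definition, given here as an assumption rather than as the authors' formula, compares the estimated quantile $\hat{Q}_i$ with the reference value $Q_i$ at each of $N$ sites:

```latex
\mathrm{RRMSE} = \sqrt{ \frac{1}{N} \sum_{i=1}^{N}
  \left( \frac{\hat{Q}_i - Q_i}{Q_i} \right)^{2} }
```

Under this definition a smaller RRMSE indicates closer agreement with the reference estimates, independent of the absolute size of the floods.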
Least squares regression methods for clustered ROC data with discrete covariates.
Tang, Liansheng Larry; Zhang, Wei; Li, Qizhai; Ye, Xuan; Chan, Leighton
2016-07-01
The receiver operating characteristic (ROC) curve is a popular tool to evaluate and compare the accuracy of diagnostic tests to distinguish the diseased group from the nondiseased group when test results from tests are continuous or ordinal. A complicated data setting occurs when multiple tests are measured on abnormal and normal locations from the same subject and the measurements are clustered within the subject. Although least squares regression methods can be used for the estimation of ROC curve from correlated data, how to develop the least squares methods to estimate the ROC curve from the clustered data has not been studied. Also, the statistical properties of the least squares methods under the clustering setting are unknown. In this article, we develop the least squares ROC methods to allow the baseline and link functions to differ, and more importantly, to accommodate clustered data with discrete covariates. The methods can generate smooth ROC curves that satisfy the inherent continuous property of the true underlying curve. The least squares methods are shown to be more efficient than the existing nonparametric ROC methods under appropriate model assumptions in simulation studies. We apply the methods to a real example in the detection of glaucomatous deterioration. We also derive the asymptotic properties of the proposed methods.
A linear regression method for the study of the Coomassie brilliant blue protein assay.
Wei, Y J; Li, K A; Tong, S Y
1997-05-01
The interactions of Coomassie brilliant blue G-250 (CBB) with bovine serum albumin (BSA) and gamma-globulin at low pH are investigated by a spectrophotometric method. The binding of CBB to protein is attributed to weak interactions (ionic, van der Waals, hydrogen bonding, and hydrophobic). The solution equilibria involving the binding of three dye species (blue, green, and red) to protein are treated in the same way as the Ringbom model used in the treatment of complexation in analytical chemistry. Based on this treatment, the formation of an isosbestic point in the absorption spectra of CBB-BSA mixtures is discussed, and two mathematical models for the description of the CBB protein assay are developed. The first model is a nonlinear equation that is rigorous in theory but unreliable in use because of its optimization procedure. The second model, based on an approximation, is a linear equation; it allows the apparent binding constant, maximum binding number, and molar absorptivity of bound dye to be estimated from assay data by a linear regression method. The results of the linear regression operations are reasonable and in agreement with experimental findings. Factors that influence the sensitivity of the CBB protein assay are studied using this method. Ionic strength and acidity are found to have a significant effect on the binding of CBB to protein.
Structural break detection method based on the Adaptive Regression Splines technique
NASA Astrophysics Data System (ADS)
Kucharczyk, Daniel; Wyłomańska, Agnieszka; Zimroz, Radosław
2017-04-01
For many real data sets, long-term observation consists of different processes that coexist or occur one after the other. Those processes very often exhibit different statistical properties, and thus the observed data should be segmented before further analysis. This problem arises in different applications, and therefore new segmentation techniques have appeared in the literature in recent years. In this paper we propose a new method of time series segmentation, i.e. the extraction of homogeneous parts with similar behaviour from the analysed vector of observations. This method is based on the absolute deviation about the median of the signal and is an extension of previously proposed techniques that are also based on simple statistics. We introduce a method of structural break point detection based on the Adaptive Regression Splines technique, one form of regression analysis. Moreover, we propose a statistical test that allows testing hypotheses about behaviour related to different regimes. First, we apply the methodology to simulated signals with different distributions in order to show the effectiveness of the new technique. Next, in the application part, we analyse a real data set that represents the vibration signal from a heavy-duty crusher used in a mineral processing plant.
Flexible regression models over river networks
O’Donnell, David; Rushworth, Alastair; Bowman, Adrian W; Marian Scott, E; Hallard, Mark
2014-01-01
Many statistical models are available for spatial data but the vast majority of these assume that spatial separation can be measured by Euclidean distance. Data which are collected over river networks constitute a notable and commonly occurring exception, where distance must be measured along complex paths and, in addition, account must be taken of the relative flows of water into and out of confluences. Suitable models for this type of data have been constructed based on covariance functions. The aim of the paper is to place the focus on underlying spatial trends by adopting a regression formulation and using methods which allow smooth but flexible patterns. Specifically, kernel methods and penalized splines are investigated, with the latter proving more suitable from both computational and modelling perspectives. In addition to their use in a purely spatial setting, penalized splines also offer a convenient route to the construction of spatiotemporal models, where data are available over time as well as over space. Models which include main effects and spatiotemporal interactions, as well as seasonal terms and interactions, are constructed for data on nitrate pollution in the River Tweed. The results give valuable insight into the changes in water quality in both space and time. PMID:25653460
A semiparametric likelihood-based method for regression analysis of mixed panel-count data.
Zhu, Liang; Zhang, Ying; Li, Yimei; Sun, Jianguo; Robison, Leslie L
2017-09-15
Panel-count data arise when each study subject is observed only at discrete time points in a recurrent event study, and only the numbers of the event of interest between observation time points are recorded (Sun and Zhao, 2013). However, sometimes the exact number of events between some observation times is unknown and what we know is only whether the event of interest has occurred. In this article, we refer to this type of data as mixed panel-count data and propose a likelihood-based semiparametric regression method for their analysis by using the nonhomogeneous Poisson process assumption. However, we establish the asymptotic properties of the resulting estimator by employing empirical process theory and without using the Poisson assumption. Also, we conduct an extensive simulation study, which suggests that the proposed method works well in practice. Finally, the method is applied to a Childhood Cancer Survivor Study that motivated this study. © 2017, The International Biometric Society.
Kang, Kookjin; Roh, Yongrae
2003-09-01
The performance of an acoustic transducer is determined by the effects of many structural variables, and in most cases the influences of these variables are not linearly independent of each other. To achieve optimal performance of an acoustic transducer, we must consider the cross-coupled effects of its structural variables. In this study, with the finite-element method, the variation of the operation frequency and sound pressure of a flextensional transducer in relation to its structural variables is analyzed. Through statistical multiple regression analysis of the results, functional forms of the operation frequency and sound pressure of the transducer in terms of the structural variables were derived, with which the optimal structure of the transducer was determined by means of a constrained optimization technique, the sequential quadratic programming method of Phenichny and Danilin. The proposed method can reflect all the cross-coupled effects of multiple structural variables, and can be extended to the design of general acoustic transducers.
A refined method for multivariate meta-analysis and meta-regression.
Jackson, Daniel; Riley, Richard D
2014-02-20
Making inferences about the average treatment effect using the random effects model for meta-analysis is problematic in the common situation where there is a small number of studies. This is because estimates of the between-study variance are not precise enough to accurately apply the conventional methods for testing and deriving a confidence interval for the average effect. We have found that a refined method for univariate meta-analysis, which applies a scaling factor to the estimated effects' standard error, provides more accurate inference. We explain how to extend this method to the multivariate scenario and show that our proposal for refined multivariate meta-analysis and meta-regression can provide more accurate inferences than the more conventional approach. We explain how our proposed approach can be implemented using standard output from multivariate meta-analysis software packages and apply our methodology to two real examples.
NASA Astrophysics Data System (ADS)
Melo, Raquel; Vieira, Gonçalo; Caselli, Alberto; Ramos, Miguel
2010-05-01
Field surveying during the austral summer of 2007/08 and the analysis of a QuickBird satellite image resulted in the production of a detailed geomorphological map of the Irizar and Crater Lake area in Deception Island (South Shetlands, Maritime Antarctic - 1:10 000) and allowed the analysis and spatial modelling of the geomorphological phenomena. The present study focuses on the analysis of the spatial distribution and characteristics of hummocky terrains, lag surfaces and nivation hollows, complemented by GIS spatial modelling intended to identify relevant controlling geographical factors. Models of the susceptibility of occurrence of these phenomena were created using two statistical methods: logistic regression, as a multivariate method; and the informative value, as a bivariate method. Success and prediction rate curves were used for model validation. The Area Under the Curve (AUC) was used to quantify the level of performance and prediction of the models and to allow the comparison between the two methods. Regarding the logistic regression method, the AUC showed a success rate of 71% for the lag surfaces, 81% for the hummocky terrains and 78% for the nivation hollows. The prediction rate was 72%, 68% and 71%, respectively. Concerning the informative value method, the success rate was 69% for the lag surfaces, 84% for the hummocky terrains and 78% for the nivation hollows, with corresponding prediction rates of 71%, 66% and 69%. The results were of very good quality and demonstrate the potential of the models to predict the influence of independent variables on the occurrence of the geomorphological phenomena, as well as the reliability of the data. Key-words: present-day geomorphological dynamics, detailed geomorphological mapping, GIS, spatial modelling, Deception Island, Antarctic.
27 CFR 24.148 - Penal sums of bonds.
Code of Federal Regulations, 2010 CFR
2010-04-01
... 27 Alcohol, Tobacco Products and Firearms 1 2010-04-01 2010-04-01 false Penal sums of bonds. 24.148 Section 24.148 Alcohol, Tobacco Products and Firearms ALCOHOL AND TOBACCO TAX AND TRADE BUREAU... Penal sums of bonds. The penal sums of bonds prescribed in this part are as follows: Bond Basis Penal...
ℓ(1)-penalized linear mixed-effects models for high dimensional data with application to BCI.
Fazli, Siamac; Danóczy, Márton; Schelldorfer, Jürg; Müller, Klaus-Robert
2011-06-15
Recently, a novel statistical model has been proposed to estimate population effects and individual variability between subgroups simultaneously, by extending Lasso methods. We will for the first time apply this so-called ℓ(1)-penalized linear regression mixed-effects model for a large scale real world problem: we study a large set of brain computer interface data and through the novel estimator are able to obtain a subject-independent classifier that compares favorably with prior zero-training algorithms. This unifying model inherently compensates shifts in the input space attributed to the individuality of a subject. In particular we are now for the first time able to differentiate within-subject and between-subject variability. Thus a deeper understanding both of the underlying statistical and physiological structures of the data is gained.
Shi, Yinghuan; Gao, Yaozong; Liao, Shu; Zhang, Daoqiang
2015-01-01
In recent years, there has been great interest in prostate segmentation, which is an important and challenging task for CT image guided radiotherapy. In this paper, a learning-based segmentation method via joint transductive feature selection and transductive regression is presented, which incorporates the physician’s simple manual specification (taking only a few seconds) to aid accurate segmentation, especially for cases with large irregular prostate motion. More specifically, for the current treatment image, an experienced physician is first allowed to manually assign the labels for a small subset of prostate and non-prostate voxels, especially in the first and last slices of the prostate regions. Then, the proposed method follows two steps: in the prostate-likelihood estimation step, two novel algorithms, tLasso and wLapRLS, are sequentially employed for transductive feature selection and transductive regression, respectively, aiming to generate the prostate-likelihood map. In the multi-atlas based label fusion step, the final segmentation result is obtained according to the corresponding prostate-likelihood map and the previous images of the same patient. The proposed method has been substantially evaluated on a real prostate CT dataset including 24 patients with 330 CT images, and compared with several state-of-the-art methods. Experimental results show that the proposed method outperforms the state-of-the-art methods in terms of higher Dice ratio, higher true positive fraction, and lower centroid distances. Also, the results demonstrate that simple manual specification can help improve the segmentation performance, which is clinically feasible in real practice. PMID:26752809
NASA Astrophysics Data System (ADS)
Yun, Yuqi; Zevin, Michael; Sampson, Laura; Kalogera, Vassiliki
2017-01-01
With more observations from LIGO in the upcoming years, we will be able to construct an observed mass distribution of black holes to compare with binary evolution simulations. This will allow us to investigate the physics of binary evolution, such as the effects of common envelope efficiency and wind strength, or the properties of the population, such as the initial mass function. However, binary evolution codes become computationally expensive when running large populations of binaries over a multi-dimensional grid of input parameters, and may simulate accurately only for a limited combination of input parameter values. Therefore we developed a fast machine-learning method that utilizes a Gaussian Mixture Model (GMM) and Gaussian Process (GP) regression, which together can predict distributions over the entire parameter space based on a limited number of simulated models. Furthermore, Gaussian Process regression naturally provides interpolation errors in addition to interpolation means, which could provide a means of targeting the most uncertain regions of parameter space for running further simulations. We also present a case study on applying this new method to predicting chirp mass distributions for binary black hole systems (BBHs) in Milky Way-like galaxies of different metallicities.
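The point that GP regression returns an interpolation error alongside the interpolation mean can be illustrated with a minimal one-dimensional sketch. This is not the authors' GMM+GP pipeline: the squared-exponential kernel, the length scale, the noise level, and the training values below are all assumptions chosen for illustration, and the small linear solver stands in for a proper Cholesky routine.

```python
import math

def rbf(a, b, length=1.0):
    """Squared-exponential kernel with unit prior variance (an assumed choice)."""
    return math.exp(-0.5 * ((a - b) / length) ** 2)

def solve(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [row[:] + [b[i]] for i, row in enumerate(A)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            f = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= f * M[col][c]
    x = [0.0] * n
    for i in range(n - 1, -1, -1):
        s = sum(M[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (M[i][n] - s) / M[i][i]
    return x

def gp_predict(x_train, y_train, x_new, noise=1e-6, length=1.0):
    """Return the GP posterior mean and variance at a single point x_new."""
    K = [[rbf(a, b, length) + (noise if i == j else 0.0)
          for j, b in enumerate(x_train)] for i, a in enumerate(x_train)]
    k_star = [rbf(x_new, a, length) for a in x_train]
    alpha = solve(K, y_train)   # K^{-1} y
    v = solve(K, k_star)        # K^{-1} k_*
    mean = sum(k * a for k, a in zip(k_star, alpha))
    var = rbf(x_new, x_new, length) - sum(k * w for k, w in zip(k_star, v))
    return mean, max(var, 0.0)

# Hypothetical "simulated models": one input parameter, one summary output.
x_train = [0.0, 1.0, 2.0, 3.0]
y_train = [0.0, 0.8, 0.9, 0.1]

mean_near, var_near = gp_predict(x_train, y_train, 1.0)   # at a training point
mean_far, var_far = gp_predict(x_train, y_train, 10.0)    # far from all data
print(mean_near, var_near, mean_far, var_far)
```

The variance is small at the training point and returns to the prior variance far from the data, which is the property that would let one target uncertain regions of parameter space for further simulations.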
Kew, William; Mitchell, John B O
2015-09-01
The application of Machine Learning to cheminformatics is a large and active field of research, but there exist few papers which discuss whether ensembles of different Machine Learning methods can improve upon the performance of their component methodologies. Here we investigated a variety of methods, including kernel-based, tree, linear, neural networks, and both greedy and linear ensemble methods. These were all tested against a standardised methodology for regression with data relevant to the pharmaceutical development process. This investigation focused on QSPR problems within drug-like chemical space. We aimed to investigate which methods perform best, and how the 'wisdom of crowds' principle can be applied to ensemble predictors. It was found that no single method performs best for all problems, but that a dynamic, well-structured ensemble predictor would perform very well across the board, usually providing an improvement in performance over the best single method. Its use of weighting factors allows the greedy ensemble to acquire a bigger contribution from the better performing models, and this helps the greedy ensemble generally to outperform the simpler linear ensemble. Choice of data preprocessing methodology was found to be crucial to performance of each method too. © 2015 WILEY-VCH Verlag GmbH & Co. KGaA, Weinheim.
Li, Min; Zhou, Tong; Song, Yanan
2016-07-01
A grain size characterization method based on energy attenuation coefficient spectrum and support vector regression (SVR) is proposed. First, the spectra of the first and second back-wall echoes are cut into several frequency bands to calculate the energy attenuation coefficient spectrum. Second, the frequency band that is sensitive to grain size variation is determined. Finally, a statistical model between the energy attenuation coefficient in the sensitive frequency band and average grain size is established through SVR. Experimental verification is conducted on austenitic stainless steel. The average relative error of the predicted grain size is 5.65%, which is better than that of conventional methods. Copyright © 2016 Elsevier B.V. All rights reserved.
Feng, Zeny Z; Yang, Xiaojian; Subedi, Sanjeena; McNicholas, Paul D
2012-01-01
Recent work concerning quantitative traits of interest has focused on selecting a small subset of single nucleotide polymorphisms (SNPs) from amongst the SNPs responsible for the phenotypic variation of the trait. When considered as covariates, the large number of variables (SNPs) and their association with those in close proximity pose challenges for variable selection. The features of sparsity and shrinkage of regression coefficients of the least absolute shrinkage and selection operator (LASSO) method appear attractive for SNP selection. Sparse partial least squares (SPLS) is also appealing as it combines the features of sparsity in subset selection and dimension reduction to handle correlations amongst SNPs. In this paper we investigate application of the LASSO and SPLS methods for selecting SNPs that predict quantitative traits. We evaluate the performance of both methods with different criteria and under different scenarios using simulation studies. Results indicate that these methods can be effective in selecting SNPs that predict quantitative traits but are limited by some conditions. Both methods perform similarly overall but each exhibit advantages over the other in given situations. Both methods are applied to Canadian Holstein cattle data to compare their performance.
NASA Astrophysics Data System (ADS)
Wu, Peilin; Zhang, Qunying; Fei, Chunjiao; Fang, Guangyou
2017-04-01
Aeromagnetic gradients are typically measured by optically pumped magnetometers mounted on an aircraft. Any aircraft, particularly helicopters, produces significant levels of magnetic interference. Therefore, aeromagnetic compensation is essential, and least squares (LS) is the conventional method used for reducing interference levels. However, the LS approach to solving the aeromagnetic interference model has a few difficulties, one of which is in handling multicollinearity. Therefore, we propose an aeromagnetic gradient compensation method, specifically targeted for helicopter use but applicable on any airborne platform, which is based on the ɛ-support vector regression algorithm. The structural risk minimization criterion intrinsic to the method avoids multicollinearity altogether. Local aeromagnetic anomalies can be retained, and platform-generated fields are suppressed simultaneously by constructing an appropriate loss function and kernel function. The method was tested using an unmanned helicopter and obtained improvement ratios of 12.7 and 3.5 in the vertical and horizontal gradient data, respectively. Both of these values are probably better than those that would have been obtained from the conventional method applied to the same data, had it been possible to do so in a suitable comparative context. The validity of the proposed method is demonstrated by the experimental result.
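For readers unfamiliar with ɛ-support vector regression, the loss it is usually built on is the ɛ-insensitive loss, which ignores residuals smaller than ɛ. The sketch below shows that standard loss only; the abstract does not specify the authors' loss or kernel, and the function name and sample values are hypothetical.

```python
def epsilon_insensitive_loss(y_true, y_pred, epsilon=0.1):
    """Mean epsilon-insensitive loss: residuals within +/- epsilon cost nothing."""
    losses = [max(0.0, abs(t - p) - epsilon) for t, p in zip(y_true, y_pred)]
    return sum(losses) / len(losses)

# Hypothetical measured and predicted values (illustrative only).
y_true = [1.0, 2.0, 3.0]
y_pred = [1.05, 2.5, 2.0]

# Per-point losses: 0 (inside the tube), 0.4, and 0.9.
loss = epsilon_insensitive_loss(y_true, y_pred, epsilon=0.1)
print(loss)
```

Because small residuals are not penalized, the fitted function need not chase every fluctuation in the data.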
Adaptive wavelet simulation of global ocean dynamics using a new Brinkman volume penalization
NASA Astrophysics Data System (ADS)
Kevlahan, N. K.-R.; Dubos, T.; Aechtner, M.
2015-12-01
In order to easily enforce solid-wall boundary conditions in the presence of complex coastlines, we propose a new mass and energy conserving Brinkman penalization for the rotating shallow water equations. This penalization does not lead to higher wave speeds in the solid region. The error estimates for the penalization are derived analytically and verified numerically for linearized one-dimensional equations. The penalization is implemented in a conservative dynamically adaptive wavelet method for the rotating shallow water equations on the sphere with bathymetry and coastline data from NOAA's ETOPO1 database. This code could form the dynamical core for a future global ocean model. The potential of the dynamically adaptive ocean model is illustrated by using it to simulate the 2004 Indonesian tsunami and wind-driven gyres.
NASA Astrophysics Data System (ADS)
Zhao, Na; Yue, Tianxiang; Zhou, Xun; Zhao, Mingwei; Liu, Yu; Du, Zhengping; Zhang, Lili
2017-07-01
Downscaling precipitation is required in local-scale climate impact studies. In this paper, a statistical downscaling scheme is presented that combines a geographically weighted regression (GWR) model with a recently developed method, the high accuracy surface modeling method (HASM). The proposed method was compared with another downscaling method using the Coupled Model Intercomparison Project Phase 5 (CMIP5) database and ground-based data from 732 stations across China for the period 1976-2005. The residual produced by GWR was modified by comparing different interpolators, including HASM, Kriging, the inverse distance weighted method (IDW), and Spline. The spatial downscaling from 1° to 1-km grids for the period 1976-2005 and for future scenarios was achieved using the proposed downscaling method. The prediction accuracy was assessed at two separate validation sites throughout China and Jiangxi Province on both annual and seasonal scales, using the root mean square error (RMSE), mean relative error (MRE), and mean absolute error (MAE). The results indicate that the model developed in this study outperforms the method that builds a transfer function using the gauge values. There is a large improvement in the results when using a residual correction with meteorological station observations. In comparison with the other three classical interpolators, HASM shows better performance in modifying the residual produced by the local regression method. The success of the developed technique lies in the effective use of the datasets and the modification of the residual using HASM. The results from the future climate scenarios show that precipitation exhibits an overall increasing trend from T1 (2011-2040) to T2 (2041-2070) and from T2 to T3 (2071-2100) in the RCP2.6, RCP4.5, and RCP8.5 emission scenarios. The most significant increase occurs in RCP8.5 from T2 to T3, while the lowest increase is found in RCP2.6 from T2 to T3, with increases of 47.11 and 2.12 mm, respectively.
NASA Astrophysics Data System (ADS)
Mandal, Nilrudra; Doloi, Biswanath; Mondal, Biswanath
2016-01-01
In the present study, an attempt has been made to apply the Taguchi parameter design method and regression analysis for optimizing the cutting conditions on surface finish while machining AISI 4340 steel with the help of the newly developed yttria based Zirconia Toughened Alumina (ZTA) inserts. These inserts are prepared through wet chemical co-precipitation route followed by powder metallurgy process. Experiments have been carried out based on an orthogonal array L9 with three parameters (cutting speed, depth of cut and feed rate) at three levels (low, medium and high). Based on the mean response and signal to noise ratio (SNR), the best optimal cutting condition has been arrived at A3B1C1 i.e. cutting speed is 420 m/min, depth of cut is 0.5 mm and feed rate is 0.12 m/min considering the condition smaller is the better approach. Analysis of Variance (ANOVA) is applied to find out the significance and percentage contribution of each parameter. The mathematical model of surface roughness has been developed using regression analysis as a function of the above mentioned independent variables. The predicted values from the developed model and experimental values are found to be very close to each other justifying the significance of the model. A confirmation run has been carried out with 95 % confidence level to verify the optimized result and the values obtained are within the prescribed limit.
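The "smaller is the better" criterion mentioned above is conventionally scored in Taguchi analysis with the signal-to-noise ratio S/N = -10 log10(mean of y squared). A minimal sketch of that standard formula follows; the roughness values are hypothetical and are not taken from the study.

```python
import math

def sn_smaller_is_better(values):
    """Taguchi smaller-the-better S/N ratio in dB: -10 * log10(mean(y^2))."""
    mean_square = sum(v * v for v in values) / len(values)
    return -10.0 * math.log10(mean_square)

# Hypothetical repeated surface-roughness readings for two parameter settings.
roughness_a = [1.0, 1.0, 1.0]
roughness_b = [2.0, 2.0, 2.0]

sn_a = sn_smaller_is_better(roughness_a)
sn_b = sn_smaller_is_better(roughness_b)
print(sn_a, sn_b)
```

The setting with the higher S/N ratio (here the first one) is preferred, since a lower response gives a larger ratio under this criterion.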
NASA Technical Reports Server (NTRS)
Tomberlin, T. J.
1985-01-01
Research studies of residents' responses to noise consist of interviews with samples of individuals who are drawn from a number of different compact study areas. The statistical techniques developed provide a basis for those sample design decisions. These techniques are suitable for a wide range of sample survey applications. A sample may consist of a random sample of residents selected from a sample of compact study areas, or in a more complex design, of a sample of residents selected from a sample of larger areas (e.g., cities). The techniques may be applied to estimates of the effects on annoyance of noise level, numbers of noise events, the time-of-day of the events, ambient noise levels, or other factors. Methods are provided for determining, in advance, how accurately these effects can be estimated for different sample sizes and study designs. Using a simple cost function, they also provide for optimum allocation of the sample across the stages of the design for estimating these effects. These techniques are developed via a regression model in which the regression coefficients are assumed to be random, with components of variance associated with the various stages of a multi-stage sample design.
Passaro, Antony D; Vettel, Jean M; McDaniel, Jonathan; Lawhern, Vernon; Franaszczuk, Piotr J; Gordon, Stephen M
2017-03-01
During an experimental session, behavioral performance fluctuates, yet most neuroimaging analyses of functional connectivity derive a single connectivity pattern. These conventional connectivity approaches assume that since the underlying behavior of the task remains constant, the connectivity pattern is also constant. We introduce a novel method, behavior-regressed connectivity (BRC), to directly examine behavioral fluctuations within an experimental session and capture their relationship to changes in functional connectivity. This method employs the weighted phase lag index (WPLI) applied to a window of trials with a weighting function. Using two datasets, the BRC results are compared to conventional connectivity results during two time windows: the one second before stimulus onset to identify predictive relationships, and the one second after onset to capture task-dependent relationships. In both tasks, we replicate the expected results for the conventional connectivity analysis, and extend our understanding of the brain-behavior relationship using the BRC analysis, demonstrating subject-specific BRC maps that correspond to both positive and negative relationships with behavior. Comparison with Existing Method(s): Conventional connectivity analyses assume a consistent relationship between behaviors and functional connectivity, but the BRC method examines performance variability within an experimental session to understand dynamic connectivity and transient behavior. The BRC approach examines connectivity as it covaries with behavior to complement the knowledge of underlying neural activity derived from conventional connectivity analyses. Within this framework, BRC may be implemented for the purpose of understanding performance variability both within and between participants. Published by Elsevier B.V.
Alternating iterative regression method for dead time estimation from experimental designs.
Pous-Torres, S; Torres-Lapasió, J R; Baeza-Baeza, J J; García-Alvarez-Coque, M C
2009-05-01
An indirect method for dead time (t0) estimation in reversed-phase liquid chromatography, based on a relationship between retention time and organic solvent content, is proposed. The method processes the retention data obtained in experimental designs. In order to get more general validity and enhance the accuracy, the information from several compounds is used altogether in an alternating regression fashion. The method was applied to nitrosamines, alkylbenzenes, phenols, benzene derivatives, polycyclic aromatic hydrocarbons and beta-blockers, among other compounds, chromatographed on a cyano column and several C18 columns. A comprehensive validation was carried out by comparing the results with those provided by the injection of markers, the observation of the solvent front and the homologous series method. It was also found that different groups of compounds yielded the same t0 value with the same column, which was verified in different solvent composition windows. The method allows improved models useful for optimisation or for other purposes, since t0 can be estimated with the retention data of the target solutes.
Race Making in a Penal Institution.
Walker, Michael L
2016-01-01
This article provides a ground-level investigation into the lives of penal inmates, linking the literature on race making and penal management to provide an understanding of racial formation processes in a modern penal institution. Drawing on 135 days of ethnographic data collected as an inmate in a Southern California county jail system, the author argues that inmates are subjected to two mutually constitutive racial projects--one institutional and the other microinteractional. Operating in symbiosis within a narrative of risk management, these racial projects increase (rather than decrease) incidents of intraracial violence and the potential for interracial violence. These findings have implications for understanding the process of racialization and evaluating the effectiveness of penal management strategies.
The crux of the method: assumptions in ordinary least squares and logistic regression.
Long, Rebecca G
2008-10-01
Logistic regression has increasingly become the tool of choice when analyzing data with a binary dependent variable. While resources relating to the technique are widely available, clear discussions of why logistic regression should be used in place of ordinary least squares regression are difficult to find. The current paper compares and contrasts the assumptions of ordinary least squares with those of logistic regression and explains why logistic regression's looser assumptions make it adept at handling violations of the more important assumptions in ordinary least squares.
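One consequence of the differing assumptions can be seen in a small sketch: a straight line fitted by ordinary least squares to a binary outcome can predict values outside the interval from 0 to 1, whereas a logistic model cannot. The data, the gradient-ascent fitting routine, and the step size are assumptions made for illustration and are not taken from the paper.

```python
import math

# Hypothetical binary data: the outcome tends to 1 as x grows.
xs = [0, 1, 2, 3, 4, 5, 6, 7]
ys = [0, 0, 0, 1, 0, 1, 1, 1]

def fit_ols(xs, ys):
    """Ordinary least squares fit of y = intercept + slope * x."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    sxx = sum((x - x_bar) ** 2 for x in xs)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return y_bar - slope * x_bar, slope

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def fit_logistic(xs, ys, lr=0.01, steps=20000):
    """Maximize the Bernoulli log-likelihood by plain gradient ascent."""
    b0, b1 = 0.0, 0.0
    for _ in range(steps):
        g0 = sum(y - sigmoid(b0 + b1 * x) for x, y in zip(xs, ys))
        g1 = sum((y - sigmoid(b0 + b1 * x)) * x for x, y in zip(xs, ys))
        b0 += lr * g0
        b1 += lr * g1
    return b0, b1

a_ols, b_ols = fit_ols(xs, ys)
a_log, b_log = fit_logistic(xs, ys)

x_new = 10  # a point beyond the observed range
ols_pred = a_ols + b_ols * x_new               # exceeds 1 for this data
logit_pred = sigmoid(a_log + b_log * x_new)    # always strictly between 0 and 1
print(ols_pred, logit_pred)
```

The least squares prediction at x = 10 is not a valid probability, while the logistic prediction stays inside the unit interval by construction.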
Fienen, Michael N.; Selbig, William R.
2012-01-01
A new sample collection system was developed to improve the representation of sediment entrained in urban storm water by integrating water quality samples from the entire water column. The depth-integrated sampler arm (DISA) was able to mitigate sediment stratification bias in storm water, thereby improving the characterization of suspended-sediment concentration and particle size distribution at three independent study locations. Use of the DISA decreased variability, which improved statistical regression to predict particle size distribution using surrogate environmental parameters, such as precipitation depth and intensity. The performance of this statistical modeling technique was compared to results using traditional fixed-point sampling methods and was found to perform better. When environmental parameters can be used to predict particle size distributions, environmental managers have more options when characterizing concentrations, loads, and particle size distributions in urban runoff.
A New Global Regression Analysis Method for the Prediction of Wind Tunnel Model Weight Corrections
NASA Technical Reports Server (NTRS)
Ulbrich, Norbert Manfred; Bridge, Thomas M.; Amaya, Max A.
2014-01-01
A new global regression analysis method is discussed that predicts wind tunnel model weight corrections for strain-gage balance loads during a wind tunnel test. The method determines corrections by combining "wind-on" model attitude measurements with least squares estimates of the model weight and center of gravity coordinates that are obtained from "wind-off" data points. The method treats the least squares fit of the model weight separately from the fit of the center of gravity coordinates. Therefore, it performs two fits of "wind-off" data points and uses the least squares estimator of the model weight as an input for the fit of the center of gravity coordinates. Explicit equations for the least squares estimators of the weight and center of gravity coordinates are derived that simplify the implementation of the method in the data system software of a wind tunnel. In addition, recommendations for sets of "wind-off" data points are made that take typical model support system constraints into account. Explicit equations of the confidence intervals on the model weight and center of gravity coordinates and two different error analyses of the model weight prediction are also discussed in the appendices of the paper.
An Optimization-Based Method for Feature Ranking in Nonlinear Regression Problems.
Bravi, Luca; Piccialli, Veronica; Sciandrone, Marco
2016-02-03
In this paper, we consider the feature ranking problem, where, given a set of training instances, the task is to associate a score with the features in order to assess their relevance. Feature ranking is a very important tool for decision support systems, and may be used as an auxiliary step of feature selection to reduce the high dimensionality of real-world data. We focus on regression problems by assuming that the process underlying the generated data can be approximated by a continuous function (for instance, a feedforward neural network). We formally state the notion of relevance of a feature by introducing a minimum zero-norm inversion problem of a neural network, which is a nonsmooth, constrained optimization problem. We employ a concave approximation of the zero-norm function, and we define a smooth, global optimization problem to be solved in order to assess the relevance of the features. We present the new feature ranking method based on the solution of instances of the global optimization problem depending on the available training data. Computational experiments on both artificial and real data sets are performed, and point out that the proposed feature ranking method is a valid alternative to existing methods in terms of effectiveness. The obtained results also show that the method is costly in terms of CPU time, and this may be a limitation in the solution of large-dimensional problems.
Likelihood methods for regression models with expensive variables missing by design.
Zhao, Yang; Lawless, Jerald F; McLeish, Donald L
2009-02-01
In some applications involving regression the values of certain variables are missing by design for some individuals. For example, in two-stage studies (Zhao and Lipsitz, 1992), data on "cheaper" variables are collected on a random sample of individuals in stage I, and then "expensive" variables are measured for a subsample of these in stage II. So the "expensive" variables are missing by design at stage I. Both estimating function and likelihood methods have been proposed for cases where either covariates or responses are missing. We extend the semiparametric maximum likelihood (SPML) method for missing covariate problems (e.g. Chen, 2004; Ibrahim et al., 2005; Zhang and Rockette, 2005, 2007) to deal with more general cases where covariates and/or responses are missing by design, and show that profile likelihood ratio tests and interval estimation are easily implemented. Simulation studies are provided to examine the performance of the likelihood methods and to compare their efficiencies with estimating function methods for problems involving (a) a missing covariate and (b) a missing response variable. We illustrate the ease of implementation of SPML and demonstrate its high efficiency.
Performance of robust regression methods in real-time polymerase chain reaction calibration.
Orenti, Annalisa; Marubini, Ettore
2014-12-09
The ordinary least squares (OLS) method is routinely used to estimate the unknown concentration of nucleic acids in a given solution by means of calibration. However, when outliers are present it could appear sensible to resort to robust regression methods. We analyzed data from an External Quality Control program concerning quantitative real-time PCR and we found that 24 laboratories out of 40 presented outliers, which occurred most frequently at the lowest concentrations. In this article we investigated and compared the performance of the OLS method, the least absolute deviation (LAD) method, and the biweight MM-estimator in real-time PCR calibration via a Monte Carlo simulation. Outliers were introduced by replacement contamination. When contamination was absent the coverages of OLS and MM-estimator intervals were acceptable and their widths small, whereas LAD intervals had acceptable coverages at the expense of higher widths. In the presence of contamination we observed a trade-off between width and coverage: the OLS performance got worse, the MM-estimator intervals widths remained short (but this was associated with a reduction in coverages), while LAD intervals widths were constantly larger with acceptable coverages at the nominal level.
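The contrast between OLS and LAD under replacement contamination can be sketched on a toy calibration line. This is not the paper's Monte Carlo design: the data are hypothetical, the MM-estimator is omitted, and LAD is approximated here by iteratively reweighted least squares, which is one of several ways to compute it.

```python
def weighted_line_fit(xs, ys, ws):
    """Weighted least squares fit of y = intercept + slope * x."""
    sw = sum(ws)
    x_bar = sum(w * x for w, x in zip(ws, xs)) / sw
    y_bar = sum(w * y for w, y in zip(ws, ys)) / sw
    sxx = sum(w * (x - x_bar) ** 2 for w, x in zip(ws, xs))
    sxy = sum(w * (x - x_bar) * (y - y_bar) for w, x, y in zip(ws, xs, ys))
    slope = sxy / sxx
    return y_bar - slope * x_bar, slope

def lad_line_fit(xs, ys, iterations=100, delta=1e-8):
    """Approximate least absolute deviation fit via reweighting by 1/|residual|."""
    intercept, slope = weighted_line_fit(xs, ys, [1.0] * len(xs))
    for _ in range(iterations):
        ws = [1.0 / max(abs(y - intercept - slope * x), delta)
              for x, y in zip(xs, ys)]
        intercept, slope = weighted_line_fit(xs, ys, ws)
    return intercept, slope

# Hypothetical calibration data on the line y = 30 - 3x, with the value at the
# lowest concentration replaced by an outlier.
xs = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
ys = [30.0 - 3.0 * x for x in xs]
ys[0] = 35.0

ols_intercept, ols_slope = weighted_line_fit(xs, ys, [1.0] * len(xs))
lad_intercept, lad_slope = lad_line_fit(xs, ys)
print(ols_slope, lad_slope)
```

In this example the OLS slope is pulled well away from -3 by the single contaminated point, while the LAD slope stays close to it. LAD is not immune to outliers at extreme x values in general, so the example should not be read as a guarantee.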
Wang, Molin; Kuchiba, Aya; Ogino, Shuji
2015-01-01
In interdisciplinary biomedical, epidemiologic, and population research, it is increasingly necessary to consider pathogenesis and inherent heterogeneity of any given health condition and outcome. As the unique disease principle implies, no single biomarker can perfectly define disease subtypes. The complex nature of molecular pathology and biology necessitates biostatistical methodologies to simultaneously analyze multiple biomarkers and subtypes. To analyze and test for heterogeneity hypotheses across subtypes defined by multiple categorical and/or ordinal markers, we developed a meta-regression method that can utilize existing statistical software for mixed-model analysis. This method can be used to assess whether the exposure-subtype associations are different across subtypes defined by 1 marker while controlling for other markers and to evaluate whether the difference in exposure-subtype association across subtypes defined by 1 marker depends on any other markers. To illustrate this method in molecular pathological epidemiology research, we examined the associations between smoking status and colorectal cancer subtypes defined by 3 correlated tumor molecular characteristics (CpG island methylator phenotype, microsatellite instability, and the B-Raf protooncogene, serine/threonine kinase (BRAF), mutation) in the Nurses' Health Study (1980–2010) and the Health Professionals Follow-up Study (1986–2010). This method can be widely useful as molecular diagnostics and genomic technologies become routine in clinical medicine and public health. PMID:26116215
NASA Astrophysics Data System (ADS)
Widyaningsih, Purnami; Retno Sari Saputro, Dewi; Nugrahani Putri, Aulia
2017-06-01
The GWOLR model combines the geographically weighted regression (GWR) and ordinal logistic regression (OLR) models. Its parameters are estimated by maximum likelihood. This estimation, however, yields a system of nonlinear equations that is difficult to solve, so a numerical approximation approach is required. The iterative approximation generally uses the Newton-Raphson (NR) method. The NR method has a disadvantage: its Hessian matrix consists of the second derivatives evaluated at each iteration, so it does not always produce converging results. For this reason, the NR method is modified by replacing its Hessian matrix with the Fisher information matrix, which is termed Fisher scoring (FS). The present research seeks to determine the GWOLR model parameter estimates using the Fisher scoring method and to apply the estimation to data on the level of vulnerability to Dengue Hemorrhagic Fever (DHF) in Semarang. The research concludes that health facilities give the greatest contribution to the probability of the number of DHF sufferers in both villages. Based on the number of sufferers, the IR category of DHF in both villages can be determined.
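In generic notation (the symbols below are assumptions, not taken from the paper), with log-likelihood \ell(\theta), score vector g and Hessian H, the two update rules differ only in the matrix that is inverted:

```latex
% Newton-Raphson: uses the observed Hessian of the log-likelihood
\theta^{(k+1)} = \theta^{(k)} - \left[ H\!\left(\theta^{(k)}\right) \right]^{-1} g\!\left(\theta^{(k)}\right),
\qquad
g(\theta) = \frac{\partial \ell}{\partial \theta},
\quad
H(\theta) = \frac{\partial^{2} \ell}{\partial \theta \, \partial \theta^{\top}}

% Fisher scoring: replaces -H by its expectation, the Fisher information
\theta^{(k+1)} = \theta^{(k)} + \left[ I\!\left(\theta^{(k)}\right) \right]^{-1} g\!\left(\theta^{(k)}\right),
\qquad
I(\theta) = -\,\mathbb{E}\!\left[ H(\theta) \right]
```

Because the Fisher information is a covariance matrix of the score, it is positive semidefinite, which tends to make the FS step better behaved than an NR step taken where the observed Hessian is not negative definite.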
Dinç, Erdal; Ustündağ, Ozgür; Baleanu, Dumitru
2010-08-01
The sole use of isoniazid during treatment of tuberculosis gives rise to pyridoxine deficiency. Therefore, a combination of pyridoxine hydrochloride and isoniazid is used in pharmaceutical dosage form in tuberculosis treatment to reduce this side effect. In this study, two chemometric methods, partial least squares (PLS) and principal component regression (PCR), were applied to the simultaneous determination of pyridoxine (PYR) and isoniazid (ISO) in their tablets. A concentration training set of 20 different binary mixtures of PYR and ISO was randomly prepared in 0.1 M HCl. Both multivariate calibration models were constructed using the relationships between the concentration data matrix and the absorbance data matrix in the spectral region 200-330 nm. The accuracy and the precision of the proposed chemometric methods were validated by analyzing synthetic mixtures containing the investigated drugs. The recovery results obtained by applying PCR and PLS calibrations to the artificial mixtures were between 100.0 and 100.7%. Satisfactory results were obtained by applying the PLS and PCR methods to both artificial and commercial samples. These results strongly encourage the use of both methods for the quality control and routine analysis of marketed tablets containing PYR and ISO. Copyright © 2010 John Wiley & Sons, Ltd.
A refined method for multivariate meta-analysis and meta-regression
Jackson, Daniel; Riley, Richard D
2014-01-01
Making inferences about the average treatment effect using the random effects model for meta-analysis is problematic in the common situation where there is a small number of studies. This is because estimates of the between-study variance are not precise enough to accurately apply the conventional methods for testing and deriving a confidence interval for the average effect. We have found that a refined method for univariate meta-analysis, which applies a scaling factor to the estimated effects’ standard error, provides more accurate inference. We explain how to extend this method to the multivariate scenario and show that our proposal for refined multivariate meta-analysis and meta-regression can provide more accurate inferences than the more conventional approach. We explain how our proposed approach can be implemented using standard output from multivariate meta-analysis software packages and apply our methodology to two real examples. © 2013 The Authors. Statistics in Medicine published by John Wiley & Sons, Ltd. PMID:23996351
Methods for Adjusting U.S. Geological Survey Rural Regression Peak Discharges in an Urban Setting
Moglen, Glenn E.; Shivers, Dorianne E.
2006-01-01
A study was conducted of 78 U.S. Geological Survey gaged streams that have been subjected to varying degrees of urbanization over the last three decades. Flood-frequency analysis coupled with nonlinear regression techniques were used to generate a set of equations for converting peak discharge estimates determined from rural regression equations to a set of peak discharge estimates that represent known urbanization. Specifically, urban regression equations for the 2-, 5-, 10-, 25-, 50-, 100-, and 500-year return periods were calibrated as a function of the corresponding rural peak discharge and the percentage of impervious area in a watershed. The results of this study indicate that two sets of equations, one set based on imperviousness and one set based on population density, performed well. Both sets of equations are dependent on rural peak discharges, a measure of development (average percentage of imperviousness or average population density), and a measure of homogeneity of development within a watershed. Average imperviousness was readily determined by using geographic information system methods and commonly available land-cover data. Similarly, average population density was easily determined from census data. Thus, a key advantage to the equations developed in this study is that they do not require field measurements of watershed characteristics as did the U.S. Geological Survey urban equations developed in an earlier investigation. During this study, the U.S. Geological Survey PeakFQ program was used as an integral tool in the calibration of all equations. The scarcity of historical land-use data, however, made exclusive use of flow records necessary for the 30-year period from 1970 to 2000. Such relatively short-duration streamflow time series required a nonstandard treatment of the historical data function of the PeakFQ program in comparison to published guidelines. Thus, the approach used during this investigation does not fully comply with the
Isa, Zakiah Mohd; Tawfiq, Omar Farouq; Noor, Norliza Mohd; Shamsudheen, Mohd Iqbal; Rijal, Omar Mohd
2010-03-01
In rehabilitating edentulous patients, selecting appropriately sized teeth in the absence of preextraction records is problematic. The purpose of this study was to investigate the relationships between some facial dimensions and widths of the maxillary anterior teeth to potentially provide a guide for tooth selection. Sixty fully dentate Malaysian adults (18-36 years) representing 2 ethnic groups (Malay and Chinese), with well aligned maxillary anterior teeth and minimal attrition, participated in this study. Standardized digital images of the face, viewed frontally, were recorded. Using image analyzing software, the images were used to determine the interpupillary distance (IPD), inner canthal distance (ICD), and interalar width (IA). Widths of the 6 maxillary anterior teeth were measured directly from casts of the subjects using digital calipers. Regression analyses were conducted to measure the strength of the associations between the variables (alpha=.10). The means (standard deviations) of IPD, IA, and ICD of the subjects were 62.28 (2.47), 39.36 (3.12), and 34.36 (2.15) mm, respectively. The mesiodistal diameters of the maxillary central incisors, lateral incisors, and canines were 8.54 (0.50), 7.09 (0.48), and 7.94 (0.40) mm, respectively. The width of the central incisors was highly correlated to the IPD (r=0.99), while the widths of the lateral incisors and canines were highly correlated to a combination of IPD and IA (r=0.99 and 0.94, respectively). Using regression methods, the widths of the anterior teeth within the population tested may be predicted by a combination of the facial dimensions studied. (c) 2010 The Editorial Council of the Journal of Prosthetic Dentistry. Published by Mosby, Inc. All rights reserved.
Eliseyev, Andrey; Aksenova, Tetiana
2016-01-01
In the current paper the decoding algorithms for motor-related BCI systems for continuous upper limb trajectory prediction are considered. Two methods for the smooth prediction, namely Sobolev and Polynomial Penalized Multi-Way Partial Least Squares (PLS) regressions, are proposed. The methods are compared to the Multi-Way Partial Least Squares and Kalman Filter approaches. The comparison demonstrated that the proposed methods combined the prediction accuracy of the algorithms of the PLS family and trajectory smoothness of the Kalman Filter. In addition, the prediction delay is significantly lower for the proposed algorithms than for the Kalman Filter approach. The proposed methods could be applied in a wide range of applications beyond neuroscience. PMID:27196417
Eng, K.; Milly, P.C.D.; Tasker, Gary D.
2007-01-01
To facilitate estimation of streamflow characteristics at an ungauged site, hydrologists often define a region of influence containing gauged sites hydrologically similar to the estimation site. This region can be defined either in geographic space or in the space of the variables that are used to predict streamflow (predictor variables). These approaches are complementary, and a combination of the two may be superior to either. Here we propose a hybrid region-of-influence (HRoI) regression method that combines the two approaches. The new method was applied with streamflow records from 1,091 gauges in the southeastern United States to estimate the 50-year peak flow (Q50). The HRoI approach yielded lower root-mean-square estimation errors and produced fewer extreme errors than either the predictor-variable or geographic region-of-influence approaches. It is concluded, for Q50 in the study region, that similarity with respect to the basin characteristics considered (area, slope, and annual precipitation) is important, but incomplete, and that the consideration of geographic proximity of stations provides a useful surrogate for characteristics that are not included in the analysis. © 2007 ASCE.
A faster optimization method based on support vector regression for aerodynamic problems
NASA Astrophysics Data System (ADS)
Yang, Xixiang; Zhang, Weihua
2013-09-01
In this paper, a new strategy for the optimal design of complex aerodynamic configurations with a reasonably low computational effort is proposed. In order to solve the formulated aerodynamic optimization problem, which has heavy computational complexity, two steps are taken: (1) a sequential approximation method based on support vector regression (SVR) and a hybrid cross validation strategy is proposed to predict aerodynamic coefficients, and thus to approximate the objective function and constraint conditions of the originally formulated optimization problem from a limited number of sample points; (2) a sequential optimization algorithm is proposed to ensure that the optimal solution obtained by solving the approximate optimization problem in step (1) is very close to the optimal solution of the originally formulated optimization problem. Finally, we adopt a complex aerodynamic design problem, namely the optimal aerodynamic design of a flight vehicle with grid fins, to demonstrate our proposed optimization methods, and numerical results show that better results can be obtained with a significantly lower computational effort than using classical optimization techniques.
Kim, Soeun; Sugar, Catherine A.; Belin, Thomas R.
2015-01-01
Imputation strategies are widely used in settings that involve inference with incomplete data. However, implementation of a particular approach always rests on assumptions, and subtle distinctions between methods can have an impact on subsequent analyses. In this paper we are concerned with regression models in which the true underlying relationship includes interaction terms. We focus in particular on a linear model with one fully observed continuous predictor, a second partially observed continuous predictor, and their interaction. We derive the conditional distribution of the missing covariate and interaction term given the observed covariate and the outcome variable, and examine the performance of a multiple imputation procedure based on this distribution. We also investigate several alternative procedures that can be implemented by adapting multivariate normal multiple imputation software in ways that might be expected to perform well despite incompatibilities between model assumptions and true underlying relationships among the variables. The methods are compared in terms of bias, coverage and confidence interval width. As expected, the procedure based on the correct conditional distribution (CCD) performs well across all scenarios. Just as importantly for general practitioners, several of the approaches based on multivariate normality perform comparably to the CCD in a number of circumstances, although, interestingly, procedures that seek to preserve the multiplicative relationship between the interaction term and the main-effects are found to be substantially less reliable. For illustration, the various procedures are applied to an analysis of post-traumatic-stress-disorder symptoms in a study of childhood trauma. PMID:25630757
Standard regression-based methods for measuring recovery after sport-related concussion.
McCrea, Michael; Barr, William B; Guskiewicz, Kevin; Randolph, Christopher; Marshall, Stephen W; Cantu, Robert; Onate, James A; Kelly, James P
2005-01-01
Clinical decision making about an athlete's return to competition after concussion is hampered by a lack of systematic methods to measure recovery. We applied standard regression-based methods to statistically measure individual rates of impairment at several time points after concussion in college football players. Postconcussive symptoms, cognitive functioning, and balance were assessed in 94 players with concussion (based on American Academy of Neurology Criteria) and 56 noninjured controls during preseason baseline testing, and immediately, 3 hr, and 1, 2, 3, 5, and 7 days postinjury. Ninety-five percent of injured players exhibited acute concussion symptoms and impairment on cognitive or balance testing immediately after injury, which diminished to 4% who reported elevated symptoms on postinjury day 7. In addition, a small but clinically significant percentage of players who reported being symptom free by day 2 continued to be classified as impaired on the basis of objective balance and cognitive testing. These data suggest that neuropsychological testing may be of incremental utility to subjective symptom checklists in identifying the residual effects of sport-related concussion. The implementation of neuropsychological testing to detect subtle cognitive impairment is most useful once postconcussive symptoms have resolved. This management model is also supported by practical and other methodological considerations.
NASA Astrophysics Data System (ADS)
Kügler, S. D.; Polsterer, K.; Hoecker, M.
2015-04-01
Context. In astronomy, new approaches to process and analyze the exponentially increasing amount of data are inevitable. For spectra, such as in the Sloan Digital Sky Survey spectral database, usually templates of well-known classes are used for classification. In case the fitting of a template fails, wrong spectral properties (e.g. redshift) are derived. Validation of the derived properties is the key to understand the caveats of the template-based method. Aims: In this paper we present a method for statistically computing the redshift z based on a similarity approach. This allows us to determine redshifts in spectra for emission and absorption features without using any predefined model. Additionally, we show how to determine the redshift based on single features. As a consequence we are, for example, able to filter objects that show multiple redshift components. Methods: The redshift calculation is performed by comparing predefined regions in the spectra and individually applying a nearest neighbor regression model to each predefined emission and absorption region. Results: The choice of the model parameters controls the quality and the completeness of the redshifts. For ≈90% of the analyzed 16 000 spectra of our reference and test sample, a certain redshift can be computed that is comparable to the completeness of SDSS (96%). The redshift calculation yields a precision for every individually tested feature that is comparable to the overall precision of the redshifts of SDSS. Using the new method to compute redshifts, we could also identify 14 spectra with a significant shift between emission and absorption or between emission and emission lines. The results already show the immense power of this simple machine-learning approach for investigating huge databases such as the SDSS.
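The nearest neighbor regression step described above can be sketched for a single spectral region. This is a minimal illustration under stated assumptions: Euclidean distance on a small feature vector, an unweighted mean over the k nearest reference spectra, and a hypothetical function name `knn_redshift`. The abstract does not give the distance measure, the value of k, or how the per-region estimates are combined.

```python
import math

def knn_redshift(query, reference_features, reference_redshifts, k=3):
    """Predict a redshift as the mean redshift of the k most similar references.

    Assumption: similarity is plain Euclidean distance between feature vectors.
    """
    scored = []
    for features, z in zip(reference_features, reference_redshifts):
        dist = math.sqrt(sum((q - f) ** 2 for q, f in zip(query, features)))
        scored.append((dist, z))
    scored.sort(key=lambda pair: pair[0])  # closest references first
    nearest = scored[:k]
    return sum(z for _, z in nearest) / len(nearest)

# Made-up two-dimensional features for three reference spectra.
refs = [[0.0, 0.0], [1.0, 1.0], [10.0, 10.0]]
zs = [0.1, 0.3, 2.0]
z_hat = knn_redshift([0.4, 0.4], refs, zs, k=2)
print(z_hat)
```

Applying such a predictor separately to each predefined emission or absorption region would give one redshift estimate per feature, which is what makes disagreement between features detectable.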
NASA Astrophysics Data System (ADS)
Spietz, Henrik Juul; Hejlesen, Mads Mølholm; Walther, Jens Honoré
2017-05-01
We present a Brinkman penalization method for three-dimensional (3D) flows using particle vortex methods, improving the existing technique by means of an iterative process. We perform simulations to study the impulsively started flow past a sphere at Re = 1000 and normal to a circular disc at Re = 500. The simulation results obtained for the flow past a sphere are in good qualitative agreement with previously published results obtained using, respectively, a 3D vortex penalization method and a 3D vortex method combined with an accurate boundary element method. The results obtained for the flow normal to a circular disc show that the iterative method enables the use of a time step that is one order of magnitude larger than that required by the standard non-iterative Brinkman penalization method.
ERIC Educational Resources Information Center
Wong, Vivian C.; Steiner, Peter M.; Cook, Thomas D.
2013-01-01
In a traditional regression-discontinuity design (RDD), units are assigned to treatment on the basis of a cutoff score and a continuous assignment variable. The treatment effect is measured at a single cutoff location along the assignment variable. This article introduces the multivariate regression-discontinuity design (MRDD), where multiple…
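The single-cutoff estimate in a traditional RDD can be sketched as the gap between two fitted lines at the cutoff. This is a minimal illustration, assuming a sharp design in which units at or above the cutoff are treated, a fixed bandwidth, and separate ordinary least squares lines on each side. The names `fit_line` and `rdd_effect` are hypothetical, and the multivariate design introduced in the article is not covered here.

```python
def fit_line(xs, ys):
    """Ordinary least squares for y = intercept + slope * x."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return mean_y - slope * mean_x, slope

def rdd_effect(assignment, outcome, cutoff, bandwidth):
    """Jump in the fitted outcome at the cutoff (treated side minus untreated side)."""
    # Centre the assignment variable so each intercept is the fitted value at the cutoff.
    left = [(a - cutoff, y) for a, y in zip(assignment, outcome)
            if cutoff - bandwidth <= a < cutoff]
    right = [(a - cutoff, y) for a, y in zip(assignment, outcome)
             if cutoff <= a <= cutoff + bandwidth]
    left_intercept, _ = fit_line(*zip(*left))
    right_intercept, _ = fit_line(*zip(*right))
    return right_intercept - left_intercept

# Made-up data with a true jump of 2 at the cutoff 0.
assign = [-2.0, -1.0, -0.5, 0.5, 1.0, 2.0]
outcome = [1 + 0.5 * a if a < 0 else 3 + 0.5 * a for a in assign]
effect = rdd_effect(assign, outcome, cutoff=0.0, bandwidth=2.0)
print(effect)
```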
ERIC Educational Resources Information Center
von Davier, Matthias; Sinharay, Sandip
2009-01-01
This paper presents an application of a stochastic approximation EM-algorithm using a Metropolis-Hastings sampler to estimate the parameters of an item response latent regression model. Latent regression models are extensions of item response theory (IRT) to a 2-level latent variable model in which covariates serve as predictors of the…
Investigating the Accuracy of Three Estimation Methods for Regression Discontinuity Design
ERIC Educational Resources Information Center
Sun, Shuyan; Pan, Wei
2013-01-01
Regression discontinuity design is an alternative to randomized experiments to make causal inference when random assignment is not possible. This article first presents the formal identification and estimation of regression discontinuity treatment effects in the framework of Rubin's causal model, followed by a thorough literature review of…
Selecting minimum dataset soil variables using PLSR as a regressive multivariate method
NASA Astrophysics Data System (ADS)
Stellacci, Anna Maria; Armenise, Elena; Castellini, Mirko; Rossi, Roberta; Vitti, Carolina; Leogrande, Rita; De Benedetto, Daniela; Ferrara, Rossana M.; Vivaldi, Gaetano A.
2017-04-01
Long-term field experiments and science-based tools that characterize soil status (namely the soil quality indices, SQIs) assume a strategic role in assessing the effect of agronomic techniques and thus in improving soil management, especially in marginal environments. Selecting key soil variables able to best represent soil status is a critical step for the calculation of SQIs. Current studies show the effectiveness of statistical methods for variable selection to extract relevant information from multivariate datasets. Principal component analysis (PCA) has been mainly used; however, supervised multivariate methods and regressive techniques are progressively being evaluated (Armenise et al., 2013; de Paul Obade et al., 2016; Pulido Moncada et al., 2014). The present study explores the effectiveness of partial least squares regression (PLSR) in selecting critical soil variables, using a dataset comparing conventional tillage and sod-seeding on durum wheat. The results were compared to those obtained using PCA and stepwise discriminant analysis (SDA). The soil data were derived from a long-term field experiment in Southern Italy. On samples collected in April 2015, the following set of variables was quantified: (i) chemical: total organic carbon and nitrogen (TOC and TN), alkali-extractable C (TEC and humic substances - HA-FA), water extractable N and organic C (WEN and WEOC), Olsen extractable P, exchangeable cations, pH and EC; (ii) physical: texture, dry bulk density (BD), macroporosity (Pmac), air capacity (AC), and relative field capacity (RFC); (iii) biological: carbon of the microbial biomass quantified with the fumigation-extraction method. PCA and SDA were previously applied to the multivariate dataset (Stellacci et al., 2016). PLSR was carried out on mean centered and variance scaled data of predictors (soil variables) and response (wheat yield) variables using the PLS procedure of SAS/STAT. In addition, variable importance for projection (VIP
Model building in nonproportional hazard regression.
Rodríguez-Girondo, Mar; Kneib, Thomas; Cadarso-Suárez, Carmen; Abu-Assi, Emad
2013-12-30
Recent developments of statistical methods allow for a very flexible modeling of covariates affecting survival times via the hazard rate, including also the inspection of possible time-dependent associations. Despite their immediate appeal in terms of flexibility, these models typically introduce additional difficulties when a subset of covariates and the corresponding modeling alternatives have to be chosen, that is, for building the most suitable model for given data. This is particularly true when potentially time-varying associations are given. We propose to use a piecewise exponential representation of the original survival data to link hazard regression with estimation schemes based on the Poisson likelihood, making recent advances for model building in exponential family regression accessible also in the nonproportional hazard regression context. A two-stage stepwise selection approach, an approach based on doubly penalized likelihood, and a componentwise functional gradient descent approach are adapted to the piecewise exponential regression problem. These three techniques were compared via an intensive simulation study. An application to prognosis after discharge for patients who suffered a myocardial infarction supplements the simulation to demonstrate the pros and cons of the approaches in real data analyses.
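The piecewise exponential representation mentioned above can be sketched as a data expansion. This is a minimal illustration, assuming user-chosen interval start points and no covariates; the names `expand_piecewise` and `interval_hazards` are hypothetical. Each subject contributes one pseudo-observation per interval at risk, and treating the event indicator as Poisson with the exposure time as offset gives the same likelihood, up to a constant, as a constant-hazard-per-interval survival model.

```python
import math

def expand_piecewise(time, event, starts):
    """Split one subject's follow-up into per-interval pseudo-observations.

    starts: increasing interval start points beginning at 0; the last
    interval is treated as open-ended (an assumption of this sketch).
    Returns a list of (interval_index, exposure, event_indicator).
    """
    ends = list(starts[1:]) + [math.inf]
    rows = []
    for j, (a, b) in enumerate(zip(starts, ends)):
        if time <= a:
            break  # follow-up ended before this interval began
        exposure = min(time, b) - a
        d = event if time <= b else 0  # the event belongs to the interval where follow-up ends
        rows.append((j, exposure, d))
    return rows

def interval_hazards(data, starts):
    """Poisson maximum-likelihood hazard per interval: events / total exposure."""
    events = [0.0] * len(starts)
    exposure = [0.0] * len(starts)
    for time, event in data:
        for j, e, d in expand_piecewise(time, event, starts):
            events[j] += d
            exposure[j] += e
    return [d / e if e > 0 else float("nan") for d, e in zip(events, exposure)]

# One subject with an event at t = 2.5 and one censored at t = 0.5 (made-up data).
rows = expand_piecewise(2.5, 1, [0, 1, 2])
hazards = interval_hazards([(2.5, 1), (0.5, 0)], [0, 1, 2])
print(rows, hazards)
```

With covariates, the expanded rows would be passed to any Poisson regression routine with log exposure as the offset, which is what makes exponential-family model-building tools applicable.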
NASA Astrophysics Data System (ADS)
Reddy, K. S.; Somasundharam, S.
2016-09-01
In this work, an inverse heat conduction problem (IHCP) involving the simultaneous estimation of the principal thermal conductivities (kxx, kyy, kzz) and the specific heat capacity of orthotropic materials is solved by using a surrogate forward model. Uniformly distributed random samples for each unknown parameter are generated from the prior knowledge about these parameters, and the Finite Volume Method (FVM) is employed to solve the forward problem for the temperature distribution in space and time. A supervised machine learning technique, Gaussian Process Regression (GPR), is used to construct the surrogate forward model from the available temperature solution and the randomly generated unknown parameter data. The statistical and machine learning toolbox available in MATLAB R2015b is used for this purpose. The robustness of the surrogate model constructed using GPR is examined by carrying out the parameter estimation for 100 new randomly generated test samples at a measurement error of ±0.3K. The temperature measurement is obtained by adding random noise with zero mean and known standard deviation (σ = 0.1) to the FVM solution of the forward problem. The test results show that the Mean Percentage Deviation (MPD) of all test samples for all parameters is < 10%.
Dazard, Jean-Eudes; Choe, Michael; LeBlanc, Michael; Rao, J Sunil
2015-08-01
PRIMsrc is a novel implementation of a non-parametric bump hunting procedure, based on the Patient Rule Induction Method (PRIM), offering a unified treatment of outcome variables, including censored time-to-event (Survival), continuous (Regression) and discrete (Classification) responses. To fit the model, it uses a recursive peeling procedure with specific peeling criteria and stopping rules depending on the response. To validate the model, it provides an objective function based on prediction-error or other specific statistic, as well as two alternative cross-validation techniques, adapted to the task of decision-rule making and estimation in the three types of settings. PRIMsrc comes as an open source R package, including at this point: (i) a main function for fitting a Survival Bump Hunting model with various options allowing cross-validated model selection to control model size (#covariates) and model complexity (#peeling steps) and generation of cross-validated end-point estimates; (ii) parallel computing; (iii) various S3-generic and specific plotting functions for data visualization, diagnostic, prediction, summary and display of results. It is available on CRAN and GitHub.
Dazard, Jean-Eudes; Choe, Michael; LeBlanc, Michael; Rao, J. Sunil
2015-01-01
PRIMsrc is a novel implementation of a non-parametric bump hunting procedure, based on the Patient Rule Induction Method (PRIM), offering a unified treatment of outcome variables, including censored time-to-event (Survival), continuous (Regression) and discrete (Classification) responses. To fit the model, it uses a recursive peeling procedure with specific peeling criteria and stopping rules depending on the response. To validate the model, it provides an objective function based on prediction-error or other specific statistic, as well as two alternative cross-validation techniques, adapted to the task of decision-rule making and estimation in the three types of settings. PRIMsrc comes as an open source R package, including at this point: (i) a main function for fitting a Survival Bump Hunting model with various options allowing cross-validated model selection to control model size (#covariates) and model complexity (#peeling steps) and generation of cross-validated end-point estimates; (ii) parallel computing; (iii) various S3-generic and specific plotting functions for data visualization, diagnostic, prediction, summary and display of results. It is available on CRAN and GitHub. PMID:26798326
NASA Astrophysics Data System (ADS)
Grégoire, G.
2014-12-01
Logistic regression was originally intended to explain the relationship between the probability of an event and a set of covariates. The model's coefficients can be interpreted via odds and odds ratios, which are presented in the introduction of the chapter. When the observations are obtained individually, we speak of binary logistic regression; when they are grouped, the logistic regression is said to be binomial. In our presentation we mainly focus on the binary case. For statistical inference the main tool is the maximum likelihood methodology: we present the Wald, Rao and likelihood ratio results and their use to compare nested models. The problems we intend to deal with are essentially the same as in multiple linear regression: testing global effects, testing individual effects, selecting variables to build a model, measuring the fit of the model, and predicting new values, among others. The methods are demonstrated on data sets using R. Finally we briefly consider the binomial case and the situation where we are interested in several events, that is, polytomous (multinomial) logistic regression and the particular case of ordinal logistic regression.
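The interpretation of a coefficient through odds and odds ratios can be checked numerically. The chapter itself works in R; this minimal sketch uses Python, made-up coefficient values, and a single covariate, and shows that a one-unit increase in the covariate multiplies the odds by the exponential of the slope.

```python
import math

def probability(intercept, slope, x):
    """Event probability under a logistic model with one covariate."""
    return 1.0 / (1.0 + math.exp(-(intercept + slope * x)))

def odds(p):
    """Odds of an event that has probability p."""
    return p / (1.0 - p)

# Illustrative (made-up) coefficients.
b0, b1 = -1.0, 0.7

# Odds ratio for a one-unit increase in x, from x = 0 to x = 1.
odds_ratio = odds(probability(b0, b1, 1.0)) / odds(probability(b0, b1, 0.0))
print(odds_ratio, math.exp(b1))
```

The same ratio is obtained for any starting value of x, which is why exp(slope) is reported as the odds ratio of the covariate.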
The cross-validated AUC for MCP-logistic regression with high-dimensional data.
Jiang, Dingfeng; Huang, Jian; Zhang, Ying
2013-10-01
We propose a cross-validated area under the receiver operating characteristic (ROC) curve (CV-AUC) criterion for tuning parameter selection for penalized methods in sparse, high-dimensional logistic regression models. We use this criterion in combination with the minimax concave penalty (MCP) method for variable selection. The CV-AUC criterion is specifically designed for optimizing the classification performance for binary outcome data. To implement the proposed approach, we derive an efficient coordinate descent algorithm to compute the MCP-logistic regression solution surface. Simulation studies are conducted to evaluate the finite sample performance of the proposed method and to compare it with existing methods, including the Akaike information criterion (AIC), Bayesian information criterion (BIC) and Extended BIC (EBIC). The model selected based on the CV-AUC criterion tends to have a larger predictive AUC and smaller classification error than those with tuning parameters selected using the AIC, BIC or EBIC. We illustrate the application of the MCP-logistic regression with the CV-AUC criterion on three microarray datasets from studies that attempt to identify genes related to cancers. Our simulation studies and data examples demonstrate that the CV-AUC is an attractive method for tuning parameter selection for penalized methods in high-dimensional logistic regression models.
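The two ingredients named above can be sketched in a few lines. This is a minimal illustration, not the authors' implementation: `mcp_penalty` uses the standard form of the minimax concave penalty for one coefficient, `auc` counts correctly ranked positive-negative pairs, and `cv_auc` assumes the criterion is a plain average of held-out AUCs across folds, which the abstract does not confirm. All function names are hypothetical.

```python
def mcp_penalty(beta, lam, gamma):
    """Minimax concave penalty for a single coefficient (lam >= 0, gamma > 1)."""
    t = abs(beta)
    if t <= gamma * lam:
        return lam * t - t * t / (2.0 * gamma)
    return 0.5 * gamma * lam * lam  # flat beyond gamma * lam: no extra shrinkage

def auc(scores, labels):
    """Fraction of (positive, negative) pairs ranked correctly; ties count one half."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    wins = sum(1.0 if p > n else (0.5 if p == n else 0.0) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

def cv_auc(fold_scores, fold_labels):
    """Assumed criterion: mean of the AUCs computed on each held-out fold."""
    values = [auc(s, y) for s, y in zip(fold_scores, fold_labels)]
    return sum(values) / len(values)

# Made-up held-out scores and labels for two folds.
a = auc([0.1, 0.4, 0.35, 0.8], [0, 0, 1, 1])
c = cv_auc([[0.1, 0.4, 0.35, 0.8], [0.2, 0.9]], [[0, 0, 1, 1], [0, 1]])
print(a, c)
```

Tuning would then evaluate the assumed criterion over a grid of penalty parameters and keep the combination with the largest value.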
Law, G.S.; Tasker, Gary D.; Bizier, P.; DeBarry, P.
2003-01-01
The region-of-influence method and regional-regression equations are used to predict flood frequency of unregulated and ungaged rivers and streams of Tennessee. The prediction methods have been developed using stream-gage records from unregulated streams draining basins having 1-30% total impervious area. A computer application automates the calculation of the flood frequencies of the unregulated streams. Average deleted-residual prediction errors for the region-of-influence method are found to be slightly smaller than those for the regional regression methods.
Dasgupta, Abhijit; Sun, Yan V.; König, Inke R.; Bailey-Wilson, Joan E.; Malley, James D.
2012-01-01
Genetics Analysis Workshop 17 provided common and rare genetic variants from exome sequencing data and simulated binary and quantitative traits in 200 replicates. We provide a brief review of the machine learning and regression-based methods used in the analyses of these data. Several regression and machine learning methods were used to address different problems inherent in the analyses of these data, which are high-dimension, low-sample-size data typical of many genetic association studies. Unsupervised methods, such as cluster analysis, were used for data segmentation and subset selection. Supervised learning methods, which include regression-based methods (e.g., generalized linear models, logic regression, and regularized regression) and tree-based methods (e.g., decision trees and random forests), were used for variable selection (selecting genetic and clinical features most associated or predictive of outcome) and prediction (developing models using common and rare genetic variants to accurately predict outcome), with the outcome being case-control status or quantitative trait value. We include a discussion of cross-validation for model selection and assessment and a description of available software resources for these methods. PMID:22128059
Domain selection for the varying coefficient model via local polynomial regression
Kong, Dehan; Bondell, Howard; Wu, Yichao
2014-01-01
In this article, we consider the varying coefficient model, which allows the relationship between the predictors and response to vary across the domain of interest, such as time. In applications, it is possible that certain predictors only affect the response in particular regions and not everywhere. This corresponds to identifying the domain where the varying coefficient is nonzero. Towards this goal, local polynomial smoothing and penalized regression are incorporated into one framework. Asymptotic properties of our penalized estimators are provided. Specifically, the estimators enjoy the oracle properties in the sense that they have the same bias and asymptotic variance as the local polynomial estimators would have if the sparsity were known a priori. The choice of appropriate bandwidth and computational algorithms are discussed. The proposed method is examined via simulations and a real data example. PMID:25506112
Domain selection for the varying coefficient model via local polynomial regression.
Kong, Dehan; Bondell, Howard; Wu, Yichao
2015-03-01
In this article, we consider the varying coefficient model, which allows the relationship between the predictors and response to vary across the domain of interest, such as time. In applications, it is possible that certain predictors only affect the response in particular regions and not everywhere. This corresponds to identifying the domain where the varying coefficient is nonzero. Towards this goal, local polynomial smoothing and penalized regression are incorporated into one framework. Asymptotic properties of our penalized estimators are provided. Specifically, the estimators enjoy the oracle properties in the sense that they have the same bias and asymptotic variance as the local polynomial estimators would have if the sparsity were known a priori. The choice of appropriate bandwidth and computational algorithms are discussed. The proposed method is examined via simulations and a real data example.
A primer on regression methods for decoding cis-regulatory logic
Das, Debopriya; Pellegrini, Matteo; Gray, Joe W.
2009-03-03
The rapidly emerging field of systems biology is helping us to understand the molecular determinants of phenotype on a genomic scale [1]. Cis-regulatory elements are major sequence-based determinants of biological processes in cells and tissues [2]. For instance, during transcriptional regulation, transcription factors (TFs) bind to very specific regions on the promoter DNA [2,3] and recruit the basal transcriptional machinery, which ultimately initiates mRNA transcription (Figure 1A). Learning cis-Regulatory Elements from Omics Data A vast amount of work over the past decade has shown that omics data can be used to learn cis-regulatory logic on a genome-wide scale [4-6]--in particular, by integrating sequence data with mRNA expression profiles. The most popular approach has been to identify over-represented motifs in promoters of genes that are coexpressed [4,7,8]. Though widely used, such an approach can be limiting for a variety of reasons. First, the combinatorial nature of gene regulation is difficult to explicitly model in this framework. Moreover, in many applications of this approach, expression data from multiple conditions are necessary to obtain reliable predictions. This can potentially limit the use of this method to only large data sets [9]. Although these methods can be adapted to analyze mRNA expression data from a pair of biological conditions, such comparisons are often confounded by the fact that primary and secondary response genes are clustered together--whereas only the primary response genes are expected to contain the functional motifs [10]. A set of approaches based on regression has been developed to overcome the above limitations [11-32]. These approaches have their foundations in certain biophysical aspects of gene regulation [26,33-35]. That is, the models are motivated by the expected transcriptional response of genes due to the binding of TFs to their promoters. While such methods have gathered popularity in the computational domain
Revisiting the Distance Duality Relation using a non-parametric regression method
NASA Astrophysics Data System (ADS)
Rana, Akshay; Jain, Deepak; Mahajan, Shobhit; Mukherjee, Amitabha
2016-07-01
The interdependence of the luminosity distance, D_L, and the angular diameter distance, D_A, given by the distance duality relation (DDR), is very significant in observational cosmology. It is closely tied to the temperature-redshift relation of Cosmic Microwave Background (CMB) radiation. Any deviation from η(z) ≡ D_L / [D_A (1+z)^2] = 1 indicates a possible emergence of new physics. Our aim in this work is to check the consistency of these relations using a non-parametric regression method, namely LOESS with SIMEX. This technique avoids dependency on the cosmological model and works with a minimal set of assumptions. Further, to analyze the efficiency of the methodology, we simulate a dataset of 020 points of η(z) data based on a phenomenological model η(z) = (1+z)^ε. The error on the simulated data points is obtained by using the temperature of CMB radiation at various redshifts. For testing the distance duality relation, we use the JLA SNe Ia data for luminosity distances, while the angular diameter distances are obtained from radio galaxy datasets. Since the DDR is linked with the CMB temperature-redshift relation, we also use the CMB temperature data to reconstruct η(z). It is important to note that with CMB data, we are able to study the evolution of the DDR up to a very high redshift, z = 2.418. In this analysis, we find no evidence of deviation from η = 1 within a 1σ region over the entire redshift range used (0 < z ≤ 2.418).
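As a minimal illustration of the quantity being tested, the sketch below evaluates η(z) from a pair of distances and evaluates the phenomenological model η(z) = (1+z)^ε. The function names and the sample numbers are illustrative assumptions, not values from the paper, and the LOESS with SIMEX reconstruction itself is not reproduced here.

```python
def eta_observed(d_l, d_a, z):
    """Distance duality ratio eta(z) = D_L / (D_A * (1 + z)**2)."""
    return d_l / (d_a * (1.0 + z) ** 2)


def eta_model(z, epsilon):
    """Phenomenological model eta(z) = (1 + z)**epsilon; epsilon = 0 gives eta = 1."""
    return (1.0 + z) ** epsilon


if __name__ == "__main__":
    # Hypothetical distances (same units), chosen so the relation holds exactly.
    print(eta_observed(d_l=4.0, d_a=1.0, z=1.0))
    # With epsilon = 0 the model predicts no deviation at any redshift.
    print(eta_model(z=2.418, epsilon=0.0))
```

Under this parametrization, a fitted ε consistent with zero corresponds to no detectable violation of the DDR.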
Functional regression method for whole genome eQTL epistasis analysis with sequencing data.
Xu, Kelin; Jin, Li; Xiong, Momiao
2017-05-18
Epistasis plays an essential role in understanding the regulation mechanisms and is an essential component of the genetic architecture of the gene expressions. However, interaction analysis of gene expressions remains fundamentally unexplored due to great computational challenges and data availability. Due to variation in splicing, transcription start sites, polyadenylation sites, post-transcriptional RNA editing across the entire gene, and transcription rates of the cells, RNA-seq measurements generate large expression variability and collectively create the observed position-level read count curves. A single number for measuring gene expression, which is widely used for microarray-measured gene expression analysis, is highly unlikely to sufficiently account for large expression variation across the gene. Simultaneously analyzing epistatic architecture using RNA-seq and whole genome sequencing (WGS) data poses enormous challenges. We develop a nonlinear functional regression model (FRGM) with functional responses, where the position-level read counts within a gene are taken as a function of genomic position, and functional predictors, where genotype profiles are viewed as a function of genomic position, for epistasis analysis with RNA-seq data. Instead of testing the interaction of all possible pairs of SNPs, the FRGM takes a gene as the basic unit for epistasis analysis: it tests for the interaction of all possible pairs of genes and uses all the information that can be accessed to collectively test interaction between all possible pairs of SNPs within two genome regions. By large-scale simulations, we demonstrate that the proposed FRGM for epistasis analysis can achieve the correct type 1 error and has higher power to detect the interactions between genes than the existing methods. The proposed methods are applied to the RNA-seq and WGS data from the 1000 Genomes Project. The numbers of pairs of significantly interacting genes after Bonferroni correction
Mercedes Berterretche; Andrew T. Hudak; Warren B. Cohen; Thomas K. Maiersperger; Stith T. Gower; Jennifer Dungan
2005-01-01
This study compared aspatial and spatial methods of using remote sensing and field data to predict maximum growing season leaf area index (LAI) maps in a boreal forest in Manitoba, Canada. The methods tested were orthogonal regression analysis (reduced major axis, RMA) and two geostatistical techniques: kriging with an external drift (KED) and sequential Gaussian...
Correcting Measurement Error in Latent Regression Covariates via the MC-SIMEX Method
ERIC Educational Resources Information Center
Rutkowski, Leslie; Zhou, Yan
2015-01-01
Given the importance of large-scale assessments to educational policy conversations, it is critical that subpopulation achievement is estimated reliably and with sufficient precision. Despite this importance, biased subpopulation estimates have been found to occur when variables in the conditioning model side of a latent regression model contain…
Regression methods for spatially correlated data: an example using beetle attacks in a seed orchard
Preisler Haiganoush; Nancy G. Rappaport; David L. Wood
1997-01-01
We present a statistical procedure for studying the simultaneous effects of observed covariates and unmeasured spatial variables on responses of interest. The procedure uses regression type analyses that can be used with existing statistical software packages. An example using the rate of twig beetle attacks on Douglas-fir trees in a seed orchard illustrates the...
Regression Methods for Categorical Dependent Variables: Effects on a Model of Student College Choice
ERIC Educational Resources Information Center
Rapp, Kelly E.
2012-01-01
The use of categorical dependent variables with the classical linear regression model (CLRM) violates many of the model's assumptions and may result in biased estimates (Long, 1997; O'Connell, Goldstein, Rogers, & Peng, 2008). Many dependent variables of interest to educational researchers (e.g., professorial rank, educational attainment) are…
ERIC Educational Resources Information Center
Wong, Vivian C.; Steiner, Peter M.; Cook, Thomas D.
2012-01-01
In a traditional regression-discontinuity design (RDD), units are assigned to treatment and comparison conditions solely on the basis of a single cutoff score on a continuous assignment variable. The discontinuity in the functional form of the outcome at the cutoff represents the treatment effect, or the average treatment effect at the cutoff.…
Sample Size Determination for Regression Models Using Monte Carlo Methods in R
ERIC Educational Resources Information Center
Beaujean, A. Alexander
2014-01-01
A common question asked by researchers using regression models is, What sample size is needed for my study? While there are formulae to estimate sample sizes, their assumptions are often not met in the collected data. A more realistic approach to sample size determination requires more information such as the model of interest, strength of the…
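A rough sketch of the Monte Carlo idea, written in Python rather than the R used in the article: simulate many data sets from an assumed model, fit the regression to each, and record how often the slope is declared significant. The data-generating model (standard normal predictor and errors), the function names, and the large-sample critical value of 1.96 are all assumptions made for illustration and do not come from the article.

```python
import math
import random


def slope_t_statistic(x, y):
    """t-statistic for the slope in a simple least-squares regression of y on x."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    s_xx = sum((xi - x_bar) ** 2 for xi in x)
    s_xy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    slope = s_xy / s_xx
    intercept = y_bar - slope * x_bar
    rss = sum((yi - intercept - slope * xi) ** 2 for xi, yi in zip(x, y))
    std_err = math.sqrt(rss / (n - 2) / s_xx)
    return slope / std_err


def estimated_power(n, slope, n_sims=500, crit=1.96, seed=0):
    """Share of simulated data sets in which |t| exceeds crit.

    crit = 1.96 is a normal approximation; a t quantile would be more
    accurate for small n.
    """
    rng = random.Random(seed)
    rejections = 0
    for _ in range(n_sims):
        x = [rng.gauss(0.0, 1.0) for _ in range(n)]
        y = [slope * xi + rng.gauss(0.0, 1.0) for xi in x]
        if abs(slope_t_statistic(x, y)) > crit:
            rejections += 1
    return rejections / n_sims


if __name__ == "__main__":
    # Scan candidate sample sizes and pick the smallest with adequate power.
    for n in (10, 20, 50, 100):
        print(n, estimated_power(n, slope=0.3))
```

Scanning a grid of sample sizes in this way gives the smallest n whose estimated power reaches a chosen target, for example 0.80, under the assumed model.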
ERIC Educational Resources Information Center
Ferrer, Alvaro J. Arce; Wang, Lin
This study compared the classification performance among parametric discriminant analysis, nonparametric discriminant analysis, and logistic regression in a two-group classification application. Field data from an organizational survey were analyzed and bootstrapped for additional exploration. The data were observed to depart from multivariate…
Using regression methods to estimate stream phosphorus loads at the Illinois River, Arkansas
Haggard, B.E.; Soerens, T.S.; Green, W.R.; Richards, R.P.
2003-01-01
The development of total maximum daily loads (TMDLs) requires evaluating existing constituent loads in streams. Accurate estimates of constituent loads are needed to calibrate watershed and reservoir models for TMDL development. The best approach to estimate constituent loads is high frequency sampling, particularly during storm events, and mass integration of constituents passing a point in a stream. Most often, resources are limited and discrete water quality samples are collected on fixed intervals and sometimes supplemented with directed sampling during storm events. When resources are limited, mass integration is not an accurate means to determine constituent loads and other load estimation techniques such as regression models are used. The objective of this work was to determine a minimum number of water-quality samples needed to provide constituent concentration data adequate to estimate constituent loads at a large stream. Twenty sets of water quality samples with and without supplemental storm samples were randomly selected at various fixed intervals from a database at the Illinois River, northwest Arkansas. The random sets were used to estimate total phosphorus (TP) loads using regression models. The regression-based annual TP loads were compared to the integrated annual TP load estimated using all the data. At a minimum, monthly sampling plus supplemental storm samples (six samples per year) was needed to produce a root mean square error of less than 15%. Water quality samples should be collected at least semi-monthly (every 15 days) in studies less than two years if seasonal time factors are to be used in the regression models. Annual TP loads estimated from independently collected discrete water quality samples further demonstrated the utility of using regression models to estimate annual TP loads in this stream system.
Wallstrom, Garrick L; Kass, Robert E; Miller, Anita; Cohn, Jeffrey F; Fox, Nathan A
2004-07-01
A variety of procedures have been proposed to correct ocular artifacts in the electroencephalogram (EEG), including methods based on regression, principal components analysis (PCA) and independent component analysis (ICA). The current study compared these three methods, and it evaluated a modified regression approach using Bayesian adaptive regression splines to filter the electrooculogram (EOG) before computing correction factors. We applied each artifact correction procedure to real and simulated EEG data of varying epoch lengths and then quantified the impact of correction on spectral parameters of the EEG. We found that the adaptive filter improved regression-based artifact correction. An automated PCA method effectively reduced ocular artifacts and resulted in minimal spectral distortion, whereas ICA correction appeared to distort power between 5 and 20 Hz. In general, reducing the epoch length improved the accuracy of estimating spectral power in the alpha (7.5-12.5 Hz) and beta (12.5-19.5 Hz) bands, but it worsened the accuracy for power in the theta (3.5-7.5 Hz) band and distorted time domain features. Results supported the use of regression-based and PCA-based ocular artifact correction and suggested a need for further studies examining possible spectral distortion from ICA-based correction procedures.
NCAA Penalizes Fewer Teams than Expected
ERIC Educational Resources Information Center
Sander, Libby
2008-01-01
This article reports that the National Collegiate Athletic Association (NCAA) has penalized fewer teams than it expected this year over athletes' poor academic performance. For years, officials with the NCAA have predicted that strikingly high numbers of college sports teams could be at risk of losing scholarships this year because of their…
Environmental Conditions in Kentucky's Penal Institutions
ERIC Educational Resources Information Center
Bell, Irving
1974-01-01
A state task force was organized to identify health or environmental deficiencies existing in Kentucky penal institutions. Based on information gained through direct observation and inmate questionnaires, the task force concluded that many hazardous and unsanitary conditions existed, and recommended that immediate action be given to these…
NASA Astrophysics Data System (ADS)
Ji, Yanju; Huang, Wanyu; Yu, Mingmei; Guan, Shanshan; Wang, Yuan; Zhu, Yu
2017-01-01
This article studies a full-waveform associated identification method for 3-D anomalies in the airborne time-domain electromagnetic method (ATEM), based on multiple linear regression analysis. Using a convolution algorithm, full-waveform theoretical responses are computed to derive a sample library that includes switch-off-time period responses and off-time period responses. Full-waveform attributes are extracted from the theoretical responses to derive linear regression equations, which are used to identify the geological parameters. To further improve the precision, we optimize the identification method by separating the sample library into different groups and identifying the parameters for each group separately. The performance of the method on field data from wire-loop test experiments with an ATEM system in Daedao of Changchun shows that the full-waveform associated identification method is practically feasible.
Benedetti, Andrea; Platt, Robert; Atherton, Juli
2014-01-01
Background: Over time, adaptive Gaussian Hermite quadrature (QUAD) has become the preferred method for estimating generalized linear mixed models with binary outcomes. However, penalized quasi-likelihood (PQL) is still used frequently. In this work, we used simulation to systematically evaluate whether matching results from PQL and QUAD indicate less bias in estimated regression coefficients and variance parameters. Methods: We performed a simulation study in which we varied the size of the data set, probability of the outcome, variance of the random effect, number of clusters and number of subjects per cluster, etc. We estimated bias in the regression coefficients, odds ratios and variance parameters as estimated via PQL and QUAD. We ascertained whether similarity of estimated regression coefficients, odds ratios and variance parameters predicted less bias. Results: Overall, we found that the absolute percent bias of the odds ratio estimated via PQL or QUAD increased as the PQL- and QUAD-estimated odds ratios became more discrepant, though results varied markedly depending on the characteristics of the data set. Conclusions: Given how markedly results varied depending on data set characteristics, it proved impossible to specify a discrepancy threshold above which results should be considered biased. This work suggests that comparing results from generalized linear mixed models estimated via PQL and QUAD is a worthwhile exercise for regression coefficients and variance components obtained via QUAD, in situations where PQL is known to give reasonable results. PMID:24416249
Least Square Regression Method for Estimating Gas Concentration in an Electronic Nose System
Khalaf, Walaa; Pace, Calogero; Gaudioso, Manlio
2009-01-01
We describe an Electronic Nose (ENose) system which is able to identify the type of analyte and to estimate its concentration. The system consists of seven sensors, five of them being gas sensors (supplied with different heater voltage values), the remainder being a temperature and a humidity sensor, respectively. To identify a new analyte sample and then to estimate its concentration, we use both some machine learning techniques and the least square regression principle. In fact, we apply two different training models; the first one is based on the Support Vector Machine (SVM) approach and is aimed at teaching the system how to discriminate among different gases, while the second one uses the least squares regression approach to predict the concentration of each type of analyte. PMID:22573980
Comparison of regression and time-series methods for synthesizing missing streamflow records
Beauchamp, J.J.; Downing, D.J.; Railsback, S.F.
1989-10-01
Regression and time-series techniques have been used to synthesize and predict the stream flow at the Foresta Bridge gage from information at the upstream Pohono Bridge gage on the Merced River near Yosemite National Park. Using the available data from two time periods (calendar year 1979 and water year 1986), the authors evaluated the two techniques in their ability to model the variation in the observed flows and in their ability to predict stream flow at the Foresta Bridge gage for the 1979 time period with data from the 1986 time period. Both techniques produced reasonably good estimates and forecasts of the flow at the downstream gage. However, the regression model was found to have a significant amount of autocorrelation in the residuals, which the time-series model was able to eliminate. The time-series technique presented can be of great assistance in arriving at reasonable estimates of flow in data sets that have large missing portions of data.
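Residual autocorrelation of the kind reported for the regression model can be screened with a simple statistic. The sketch below computes the Durbin-Watson statistic and the lag-1 autocorrelation of a residual series; the abstract does not say which diagnostic the authors used, so this choice, the function names, and the toy residuals are assumptions for illustration only.

```python
def durbin_watson(residuals):
    """Durbin-Watson statistic: near 2 suggests little lag-1 autocorrelation,
    values toward 0 suggest positive and toward 4 negative autocorrelation."""
    num = sum((residuals[t] - residuals[t - 1]) ** 2
              for t in range(1, len(residuals)))
    den = sum(e ** 2 for e in residuals)
    return num / den


def lag1_autocorrelation(residuals):
    """Sample lag-1 autocorrelation of the residual series."""
    n = len(residuals)
    mean = sum(residuals) / n
    num = sum((residuals[t] - mean) * (residuals[t - 1] - mean)
              for t in range(1, n))
    den = sum((e - mean) ** 2 for e in residuals)
    return num / den


if __name__ == "__main__":
    # Hypothetical residuals that drift slowly, as flow residuals often do.
    drifting = [0.9, 1.1, 1.0, 0.8, -0.7, -1.0, -1.2, -0.9]
    print(durbin_watson(drifting), lag1_autocorrelation(drifting))
```

A statistic well below 2 on regression residuals is the usual cue to move to a model with an explicit error structure, which is the role the time-series model plays in the study.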
Penalized Spline: a General Robust Trajectory Model for ZIYUAN-3 Satellite
NASA Astrophysics Data System (ADS)
Pan, H.; Zou, Z.
2016-06-01
Owing to the dynamic imaging system, the trajectory model plays a very important role in the geometric processing of high-resolution satellite imagery. However, establishing a trajectory model is difficult when only discrete and noisy data are available. In this manuscript, we propose a general robust trajectory model, the penalized spline model, which can fit trajectory data well and smooth noise. The penalty parameter λ, which controls the balance between smoothness and fitting accuracy, can be estimated by generalized cross-validation. Five other trajectory models, including third-order polynomials, Chebyshev polynomials, linear interpolation, Lagrange interpolation and cubic spline, are compared with the penalized spline model. Both the sophisticated ephemeris and the on-board ephemeris are used to compare the orbit models. The penalized spline model can smooth part of the noise, and its accuracy decreases as the orbit length increases. The band-to-band misregistration of ZiYuan-3 Dengfeng and Faizabad multispectral images is used to evaluate the proposed method. With the Dengfeng dataset, the third-order polynomials and Chebyshev approximation cannot model the oscillation, and introduce misregistration of 0.57 pixels in the across-track direction and 0.33 pixels in the along-track direction. With the Faizabad dataset, the linear interpolation, Lagrange interpolation and cubic spline models suffer from noise, introducing larger misregistration than the approximation models. Experimental results suggest that the penalized spline model can model the oscillation and smooth noise.
Model Averaging Methods for Weight Trimming in Generalized Linear Regression Models.
Elliott, Michael R
2009-03-01
In sample surveys where units have unequal probabilities of inclusion, associations between the inclusion probability and the statistic of interest can induce bias in unweighted estimates. This is true even in regression models, where the estimates of the population slope may be biased if the underlying mean model is misspecified or the sampling is nonignorable. Weights equal to the inverse of the probability of inclusion are often used to counteract this bias. Highly disproportional sample designs have highly variable weights; weight trimming reduces large weights to a maximum value, reducing variability but introducing bias. Most standard approaches are ad hoc in that they do not use the data to optimize bias-variance trade-offs. This article uses Bayesian model averaging to create "data driven" weight trimming estimators. We extend previous results for linear regression models (Elliott 2008) to generalized linear regression models, developing robust models that approximate fully-weighted estimators when bias correction is of greatest importance, and approximate unweighted estimators when variance reduction is critical.
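The basic trimming operation described above, capping large weights at a maximum value, can be sketched as follows. This shows only the ad hoc fixed-cap approach that the article contrasts with its data-driven estimators; the Bayesian model averaging is not reproduced, and the function names, cap, and toy data are assumptions for illustration.

```python
def trim_weights(weights, cap):
    """Reduce every weight larger than cap to cap; smaller weights are unchanged."""
    return [min(w, cap) for w in weights]


def weighted_mean(values, weights):
    """Weighted mean of values with the given (possibly trimmed) weights."""
    return sum(w * v for w, v in zip(weights, values)) / sum(weights)


if __name__ == "__main__":
    # Hypothetical sample: one unit has a very large inverse-probability weight.
    values = [10, 12, 11, 30]
    weights = [1, 1, 1, 20]
    trimmed = trim_weights(weights, cap=3)
    # The fully weighted estimate is dominated by the last unit; trimming
    # pulls the estimate toward the unweighted mean.
    print(weighted_mean(values, weights), weighted_mean(values, trimmed))
```

Lowering the cap moves the estimate toward the unweighted mean, which is the bias-variance trade-off the article aims to optimize from the data.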
USDA-ARS's Scientific Manuscript database
In multivariate regression analysis of spectroscopy data, spectral preprocessing is often performed to reduce unwanted background information (offsets, sloped baselines) or accentuate absorption features in intrinsically overlapping bands. These procedures, also known as pretreatments, are commonly ...
Ragab, Marwa A A; Youssef, Rasha M
2013-11-01
A new hybrid chemometric method has been applied to emission response data. It deals with convolution of the emission data using 8-point sin x_i polynomials (discrete Fourier functions) after derivative treatment of these data. This new application was used for the simultaneous determination of Fexofenadine and Montelukast in bulk and pharmaceutical preparation. It was found beneficial in resolving the partially overlapping emission spectra of this mixture and in eliminating different types of interference common in spectrofluorimetry, such as overlapping emission spectra and self-quenching. Not only was this chemometric approach applied to the emission data, but the resulting data were also subjected to non-parametric linear regression analysis (Theil's method). The presented work compares the application of Theil's method in handling the response data with the least-squares parametric regression method, which is considered the de facto standard method used for regression. This work thus combines the advantages of derivative and convolution treatment using discrete Fourier functions with the reliability and efficacy of non-parametric analysis of data. Theil's method was found to be superior to the method of least squares as it could effectively circumvent any outlier data points.
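The robustness of Theil's method to outliers can be seen in a small sketch that fits the same data with the median of pairwise slopes and with ordinary least squares. The calibration points are hypothetical, and the intercept rule used here (median of y_i minus slope times x_i) is one common choice that the abstract does not specify.

```python
from itertools import combinations
from statistics import median


def theil_fit(x, y):
    """Theil's estimator: slope is the median of all pairwise slopes."""
    slopes = [(y2 - y1) / (x2 - x1)
              for (x1, y1), (x2, y2) in combinations(list(zip(x, y)), 2)
              if x2 != x1]
    slope = median(slopes)
    # Assumed intercept rule: median of the per-point offsets.
    intercept = median([yi - slope * xi for xi, yi in zip(x, y)])
    return slope, intercept


def least_squares_fit(x, y):
    """Ordinary least-squares line for comparison."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    s_xx = sum((xi - x_bar) ** 2 for xi in x)
    s_xy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    slope = s_xy / s_xx
    return slope, y_bar - slope * x_bar


if __name__ == "__main__":
    # Hypothetical calibration data on the line y = 2x with one outlier at x = 5.
    x = [1, 2, 3, 4, 5]
    y = [2, 4, 6, 8, 30]
    print("Theil:", theil_fit(x, y))
    print("Least squares:", least_squares_fit(x, y))
```

On these points the median-based slope stays at the slope of the four clean points, while the least-squares slope is pulled strongly toward the outlier.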
Sotgia, Salvatore; Mangoni, Arduino A; Pintus, Gianfranco; Carru, Ciriaco; Zinellu, Angelo
2017-08-01
To improve the effectiveness of a previous regression-based approach for the assessment of the agreement between different analytical methods, two modifications/integrations to the original scheme by means of log10 transformation of data and implementation of inherent combined imprecision are presented in this study.
NASA Astrophysics Data System (ADS)
Yang, Jianhong; Yi, Cancan; Xu, Jinwu; Ma, Xianghong
2015-05-01
A new LIBS quantitative analysis method based on adaptive selection of analytical lines and a Relevance Vector Machine (RVM) regression model is proposed. First, a scheme for adaptively selecting analytical lines is put forward in order to overcome the drawback of high dependency on a priori knowledge. The candidate analytical lines are automatically selected based on the built-in characteristics of spectral lines, such as spectral intensity, wavelength and width at half height. The analytical lines that will be used as input variables of the regression model are determined adaptively according to the samples for both training and testing. Second, an LIBS quantitative analysis method based on RVM is presented. The intensities of analytical lines and the elemental concentrations of certified standard samples are used to train the RVM regression model. The predicted elemental concentrations are given in the form of a confidence interval of a probabilistic distribution, which is helpful for evaluating the uncertainty contained in the measured spectra. Chromium concentration analysis experiments on 23 certified standard high-alloy steel samples have been carried out. The multiple correlation coefficient of the prediction was up to 98.85%, and the average relative error of the prediction was 4.01%. The experimental results showed that the proposed LIBS quantitative analysis method achieved better prediction accuracy and better modeling robustness compared with the methods based on partial least squares regression, artificial neural network and standard support vector machine.
NASA Astrophysics Data System (ADS)
Boucher, Thomas F.; Ozanne, Marie V.; Carmosino, Marco L.; Dyar, M. Darby; Mahadevan, Sridhar; Breves, Elly A.; Lepore, Kate H.; Clegg, Samuel M.
2015-05-01
The ChemCam instrument on the Mars Curiosity rover is generating thousands of LIBS spectra and bringing interest in this technique to public attention. The key to interpreting Mars or any other types of LIBS data are calibrations that relate laboratory standards to unknowns examined in other settings and enable predictions of chemical composition. Here, LIBS spectral data are analyzed using linear regression methods including partial least squares (PLS-1 and PLS-2), principal component regression (PCR), least absolute shrinkage and selection operator (lasso), elastic net, and linear support vector regression (SVR-Lin). These were compared against results from nonlinear regression methods including kernel principal component regression (K-PCR), polynomial kernel support vector regression (SVR-Py) and k-nearest neighbor (kNN) regression to discern the most effective models for interpreting chemical abundances from LIBS spectra of geological samples. The results were evaluated for 100 samples analyzed with 50 laser pulses at each of five locations averaged together. Wilcoxon signed-rank tests were employed to evaluate the statistical significance of differences among the nine models using their predicted residual sum of squares (PRESS) to make comparisons. For MgO, SiO2, Fe2O3, CaO, and MnO, the sparse models outperform all the others except for linear SVR, while for Na2O, K2O, TiO2, and P2O5, the sparse methods produce inferior results, likely because their emission lines in this energy range have lower transition probabilities. The strong performance of the sparse methods in this study suggests that use of dimensionality-reduction techniques as a preprocessing step may improve the performance of the linear models. Nonlinear methods tend to overfit the data and predict less accurately, while the linear methods proved to be more generalizable with better predictive performance. These results are attributed to the high dimensionality of the data (6144 channels
Goodenough, Anne E.; Hart, Adam G.; Stafford, Richard
2012-01-01
Despite recent papers on problems associated with full-model and stepwise regression, their use is still common throughout ecological and environmental disciplines. Alternative approaches, including generating multiple models and comparing them post-hoc using techniques such as Akaike's Information Criterion (AIC), are becoming more popular. However, these are problematic when there are numerous independent variables and interpretation is often difficult when competing models contain many different variables and combinations of variables. Here, we detail a new approach, REVS (Regression with Empirical Variable Selection), which uses all-subsets regression to quantify empirical support for every independent variable. A series of models is created; the first containing the variable with most empirical support, the second containing the first variable and the next most-supported, and so on. The comparatively small number of resultant models (n = the number of predictor variables) means that post-hoc comparison is comparatively quick and easy. When tested on a real dataset – habitat and offspring quality in the great tit (Parus major) – the optimal REVS model explained more variance (higher R2), was more parsimonious (lower AIC), and had greater significance (lower P values), than full, stepwise or all-subsets models; it also had higher predictive accuracy based on split-sample validation. Testing REVS on ten further datasets suggested that this is typical, with R2 values being higher than full or stepwise models (mean improvement = 31% and 7%, respectively). Results are ecologically intuitive as even when there are several competing models, they share a set of “core” variables and differ only in presence/absence of one or two additional variables. We conclude that REVS is useful for analysing complex datasets, including those in ecology and environmental disciplines. PMID:22479605
Patching rainfall data using regression methods. 3. Grouping, patching and outlier detection
NASA Astrophysics Data System (ADS)
Pegram, Geoffrey
1997-11-01
Rainfall data are used, amongst other things, for augmenting or repairing streamflow records in a water resources analysis environment. Gaps in rainfall records cause problems in the construction of water-balance models using monthly time-steps, when it becomes necessary to estimate missing values. Modest extensions are sometimes also desirable. It is also important to identify outliers as possible erroneous data and to group data which are hydrologically similar in order to accomplish good patching. Algorithms are described which accomplish these tasks using the covariance biplot, multiple linear regression, singular value decomposition and the pseudo-Expectation-Maximization algorithm.
Korany, Mohamed A; Maher, Hadir M; Galal, Shereen M; Ragab, Marwa A A
2013-05-01
This manuscript discusses the application and comparison of three statistical regression methods for handling data: parametric, nonparametric, and weighted regression (WR). The data were obtained from different chemometric methods applied to high-performance liquid chromatography response data using the internal standard method. This was performed on a model drug, acyclovir, which was analyzed in human plasma with ganciclovir as the internal standard. An in vivo study was also performed. Derivative treatment of chromatographic response ratio data was followed by convolution of the resulting derivative curves using 8-point sin x_i polynomials (discrete Fourier functions). This work compares the WR method and Theil's method, a nonparametric regression (NPR) method, with the least squares parametric regression (LSPR) method, which is considered the de facto standard method for regression. When the assumption of homoscedasticity is not met for analytical data, a simple and effective way to counteract the great influence of the high concentrations on the fitted regression line is to use the WR method. WR was found to be superior to LSPR, as the former assumes that the y-direction error in the calibration curve will increase as x increases. Theil's NPR method was also found to be superior to LSPR, as the former assumes that errors could occur in both the x- and y-directions and might not be normally distributed. Most of the results showed a significant improvement in precision and accuracy on applying the WR and NPR methods relative to LSPR.
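The two alternatives to ordinary least squares named in this abstract can be illustrated with a short sketch. It assumes the common "complete" form of Theil's estimator (median of all pairwise slopes) and leaves the weights for WR to the caller; the abstract does not state which variant or weighting scheme the authors used, and the function names are hypothetical.

```python
import numpy as np


def theil_slope_intercept(x, y):
    """Theil's nonparametric line: median of all pairwise slopes,
    with the intercept taken as the median of y - slope * x."""
    x, y = np.asarray(x, float), np.asarray(y, float)
    i, j = np.triu_indices(len(x), k=1)  # all pairs with i < j
    dx = x[j] - x[i]
    keep = dx != 0  # skip pairs with identical x
    slope = np.median((y[j] - y[i])[keep] / dx[keep])
    intercept = np.median(y - slope * x)
    return slope, intercept


def weighted_line(x, y, w):
    """Weighted least-squares line; w holds the per-point weights
    (for example 1/x**2 when the error grows with concentration)."""
    x, y, w = (np.asarray(a, float) for a in (x, y, w))
    A = np.column_stack([np.ones_like(x), x])
    sw = np.sqrt(w)
    # Scaling rows by sqrt(w) turns weighted LS into an ordinary LS problem.
    beta, *_ = np.linalg.lstsq(A * sw[:, None], y * sw, rcond=None)
    return beta[1], beta[0]  # slope, intercept
```

With calibration points lying on y = 2x + 1 plus a single gross outlier at the highest concentration, the median-based slope stays at 2, whereas an unweighted least-squares fit would be pulled toward the outlier.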
Penalized multivariate linear mixed model for longitudinal genome-wide association studies.
Liu, Jin; Huang, Jian; Ma, Shuangge
2014-01-01
We consider analysis of Genetic Analysis Workshop 18 data, which involves multiple longitudinal traits and dense genome-wide single-nucleotide polymorphism (SNP) markers. We use a multivariate linear mixed model to account for the covariance of random effects and multivariate residuals. We divide the SNPs into groups according to the genes they belong to and score them using weighted sum statistics. We propose a penalized approach for genetic variant selection at the gene level. The overall modeling and penalized selection method is referred to as the penalized multivariate linear mixed model. Cross-validation is used for tuning parameter selection. A resampling approach is adopted to evaluate the relative stability of the identified genes. Application to the Genetic Analysis Workshop 18 data shows that the proposed approach can effectively select markers associated with phenotypes at gene level.
Kolasa-Wiecek, Alicja
2015-04-01
The energy sector in Poland is the source of 81% of greenhouse gas (GHG) emissions. Poland, among other European Union countries, occupies a leading position with regard to coal consumption. Polish energy sector actively participates in efforts to reduce GHG emissions to the atmosphere, through a gradual decrease of the share of coal in the fuel mix and development of renewable energy sources. All evidence which completes the knowledge about issues related to GHG emissions is a valuable source of information. The article presents the results of modeling of GHG emissions which are generated by the energy sector in Poland. For a better understanding of the quantitative relationship between total consumption of primary energy and greenhouse gas emission, multiple stepwise regression model was applied. The modeling results of CO2 emissions demonstrate a high relationship (0.97) with the hard coal consumption variable. Adjustment coefficient of the model to actual data is high and equal to 95%. The backward step regression model, in the case of CH4 emission, indicated the presence of hard coal (0.66), peat and fuel wood (0.34), solid waste fuels, as well as other sources (-0.64) as the most important variables. The adjusted coefficient is suitable and equals R2=0.90. For N2O emission modeling the obtained coefficient of determination is low and equal to 43%. A significant variable influencing the amount of N2O emission is the peat and wood fuel consumption.
A method to determine the necessity for global signal regression in resting-state fMRI studies.
Chen, Gang; Chen, Guangyu; Xie, Chunming; Ward, B Douglas; Li, Wenjun; Antuono, Piero; Li, Shi-Jiang
2012-12-01
In resting-state functional MRI studies, the global signal (operationally defined as the global average of resting-state functional MRI time courses) is often considered a nuisance effect and commonly removed in preprocessing. This global signal regression method can introduce artifacts, such as false anticorrelated resting-state networks in functional connectivity analyses. Therefore, the efficacy of this technique as a correction tool remains questionable. In this article, we establish that the accuracy of the estimated global signal is determined by the level of global noise (i.e., non-neural noise that has a global effect on the resting-state functional MRI signal). When the global noise level is low, the global signal resembles the resting-state functional MRI time courses of the largest cluster, but not those of the global noise. Using real data, we demonstrate that the global signal is strongly correlated with the default mode network components and has biological significance. These results call into question whether global signal regression should be applied. We introduce a method to quantify global noise levels. We show that a criterion for global signal regression can be found based on the method. By using this criterion, one can determine whether to include or exclude global signal regression in order to minimize errors in functional connectivity measures.
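The preprocessing step under discussion, global signal regression, can be sketched as follows. This is a generic illustration of the operation as defined in the abstract (global average regressed out of every time course), not the authors' noise-quantification method; the function name and array layout are assumptions.

```python
import numpy as np


def regress_out_global_signal(ts):
    """ts: array of shape (n_timepoints, n_voxels).
    Returns the residual time courses and the global signal."""
    ts = np.asarray(ts, float)
    g = ts.mean(axis=1)  # global average across voxels at each time point
    A = np.column_stack([np.ones_like(g), g])  # intercept + global signal
    beta, *_ = np.linalg.lstsq(A, ts, rcond=None)  # one fit per voxel
    return ts - A @ beta, g
```

After this step the residuals average to zero across voxels at every time point, which gives some intuition for why negative correlations between regions can appear even when none were present in the original data.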
Quirós, Elia; Felicísimo, Ángel M.; Cuartero, Aurora
2009-01-01
This work proposes a new method to classify multi-spectral satellite images based on multivariate adaptive regression splines (MARS) and compares this classification system with the more common parallelepiped and maximum likelihood (ML) methods. We apply the classification methods to the land cover classification of a test zone located in southwestern Spain. The basis of the MARS method and its associated procedures are explained in detail, and the area under the ROC curve (AUC) is compared for the three methods. The results show that the MARS method provides better results than the parallelepiped method in all cases, and it provides better results than the maximum likelihood method in 13 cases out of 17. These results demonstrate that the MARS method can be used in isolation or in combination with other methods to improve the accuracy of soil cover classification. The improvement is statistically significant according to the Wilcoxon signed rank test. PMID:22291550
NASA Astrophysics Data System (ADS)
Lei, Beilei; Ma, Yimeng; Li, Jiazhong; Liu, Huanxiang; Yao, Xiaojun; Gramatica, Paola
2010-08-01
Accurate quantitative structure-property relationship (QSPR) models based on a large data set containing a total of 3483 organic compounds were developed to predict chemicals' adsorption capability onto activated carbon in the gas phase. Both a global multiple linear regression (MLR) method and a local lazy regression (LLR) method were used to develop QSPR models. The results showed that LLR has prediction accuracy 10% higher than that of the MLR model. By applying the LLR method we can predict the test set (787 compounds) with Q2ext of 0.900 and root mean square error (RMSE) of 0.129. The accurate model based on this large data set could be useful for predicting the adsorption property of new compounds, since such a model covers a highly diverse structural space.
Morphological weighted penalized least squares for background correction.
Li, Zhong; Zhan, De-Jian; Wang, Jia-Jun; Huang, Jing; Xu, Qing-Song; Zhang, Zhi-Min; Zheng, Yi-Bao; Liang, Yi-Zeng; Wang, Hong
2013-08-21
Backgrounds existing in the analytical signal always impair the effectiveness of signals and compromise selectivity and sensitivity of analytical methods. In order to perform further qualitative or quantitative analysis, the background should be corrected with a reasonable method. For this purpose, a new automatic method for background correction, which is based on morphological operations and weighted penalized least squares (MPLS), has been developed in this paper. It requires neither prior knowledge about the background nor an iteration procedure or manual selection of a suitable local minimum value. The method has been successfully applied to simulated datasets as well as experimental datasets from different instruments. The results show that the method is quite flexible and could handle different kinds of backgrounds. The proposed MPLS method is implemented and available as an open source package at http://code.google.com/p/mpls.
Laszlo, Anna M; Hulman, Adam; Csicsman, Jozsef; Bari, Ferenc; Nyari, Tibor A
2015-02-01
Suicide rates in Hungary have been analyzed from different aspects in recent decades. However, only descriptive rates have been reported. The aim of our epidemiological study was to characterize the pattern of annual rates of suicide in Hungary during the period 1963-2011 by applying advanced statistical methods. Annual suicide rates per 100,000 population (>6 years) for gender, age group and suicide method were determined from published frequency tables and reference population data obtained from the Hungarian Central Statistical Office. Trends and relative risks of suicide were investigated using negative binomial regression models overall and in stratified analyses (by gender, age group and suicide method). Joinpoint regression analyses were additionally applied to characterize trends and to find turning points during the period 1963-2011. Overall, 178,323 suicides (50,265 females and 128,058 males) were committed in Hungary during the investigated period. The risk of suicide was higher among males than females overall, in all age groups and for most suicide methods. The annual suicide rate exhibited a significant peak in 1982 and remained basically constant after 2006. Different segmented patterns were observed for the suicide rates in the various age groups. Suicide rates revealed segmented linear pattern. This is the first detailed trend analysis with risk estimates obtained via joinpoint and negative binomial regression methods simultaneously for age-specific suicide frequencies in Hungary.
Analyzing Association Mapping in Pedigree-Based GWAS Using a Penalized Multitrait Mixed Model.
Liu, Jin; Yang, Can; Shi, Xingjie; Li, Cong; Huang, Jian; Zhao, Hongyu; Ma, Shuangge
2016-07-01
Genome-wide association studies (GWAS) have led to the identification of many genetic variants associated with complex diseases in the past 10 years. Penalization methods, with significant numerical and statistical advantages, have been extensively adopted in analyzing GWAS. This study has been partly motivated by the analysis of Genetic Analysis Workshop (GAW) 18 data, which have two notable characteristics. First, the subjects are from a small number of pedigrees and hence related. Second, for each subject, multiple correlated traits have been measured. Most of the existing penalization methods assume independence between subjects and traits and can be suboptimal. There are a few methods in the literature based on mixed modeling that can accommodate correlations. However, they cannot fully accommodate the two types of correlations while conducting effective marker selection. In this study, we develop a penalized multitrait mixed modeling approach. It accommodates the two different types of correlations and includes several existing methods as special cases. Effective penalization is adopted for marker selection. Simulation demonstrates its satisfactory performance. The GAW 18 data are analyzed using the proposed method. © 2016 WILEY PERIODICALS, INC.
NASA Astrophysics Data System (ADS)
Katpatal, Y. B.; Paranjpe, S. V.; Kadu, M.
2014-12-01
Effective watershed management requires authentic data on surface runoff potential, for which several methods and models are in use. Generally, non-availability of field data calls for techniques based on remote observations. The Soil Conservation Service Curve Number (SCS CN) method is an important method which utilizes information generated from remote sensing for estimation of runoff. Several attempts have been made to validate the runoff values generated from the SCS CN method by comparing them with results obtained from other methods. In the present study, runoff estimation through the SCS CN method has been performed using IRS LISS IV data for the Venna Basin situated in Central India. Field data were available for the Venna Basin. The land use/land cover and soil layers have been generated for the entire watershed using the satellite data and Geographic Information System (GIS). The Venna Basin has been divided into intercepted catchment and free catchment. Runoff values have been estimated using field data through regression analysis. The runoff values estimated using the SCS CN method have been compared with yield values generated using data collected from the tank gauge stations and data from the discharge stations. The correlation helps in validation of the results obtained from the SCS CN method and its applicability in Indian conditions. Key Words: SCS CN Method, Regression Analysis, Land Use / Land cover, Runoff, Remote Sensing, GIS.
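For reference, the runoff relation usually associated with the SCS CN method can be sketched as below. The abstract does not state the formula or the initial-abstraction ratio used in the study; the metric form and the conventional ratio of 0.2 are assumptions here, as are the function and parameter names.

```python
def scs_cn_runoff_mm(rain_mm, cn, ia_ratio=0.2):
    """Direct runoff depth (mm) from a rainfall depth (mm) and a curve number.

    S  = 25400 / CN - 254   potential maximum retention (mm)
    Ia = ia_ratio * S       initial abstraction (0.2 is the conventional value)
    Q  = (P - Ia)^2 / (P - Ia + S)  if P > Ia, else 0
    """
    if not 0 < cn <= 100:
        raise ValueError("curve number must be in (0, 100]")
    s = 25400.0 / cn - 254.0
    ia = ia_ratio * s
    if rain_mm <= ia:
        return 0.0  # all rainfall is absorbed before runoff starts
    return (rain_mm - ia) ** 2 / (rain_mm - ia + s)
```

In a workflow like the one described, the curve number for each land parcel would come from the land use/land cover and soil layers, and the resulting runoff depths would then be compared against gauge-based estimates.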
Şentürk, Damla; Dalrymple, Lorien S.; Mu, Yi; Nguyen, Danh V.
2014-01-01
We propose a new weighted hurdle regression method for modeling count data, with particular interest in modeling cardiovascular events in patients on dialysis. Cardiovascular disease remains one of the leading causes of hospitalization and death in this population. Our aim is to jointly model the relationship/association between covariates and (a) the probability of cardiovascular events, a binary process, and (b) the rate of events once the realization is positive (when the ‘hurdle’ is crossed), using a zero-truncated Poisson distribution. When the observation period or follow-up time, from the start of dialysis, varies among individuals, the estimated probability of positive cardiovascular events during the study period will be biased. Furthermore, when the model contains covariates, the estimated relationship between the covariates and the probability of cardiovascular events will also be biased. These challenges are addressed with the proposed weighted hurdle regression method. Estimation for the weighted hurdle regression model is a weighted likelihood approach, where standard maximum likelihood estimation can be utilized. The method is illustrated with data from the United States Renal Data System. Simulation studies show the ability of the proposed method to successfully adjust for differential follow-up times and incorporate the effects of covariates in the weighting. PMID:24930810
Sentürk, Damla; Dalrymple, Lorien S; Mu, Yi; Nguyen, Danh V
2014-11-10
We propose a new weighted hurdle regression method for modeling count data, with particular interest in modeling cardiovascular events in patients on dialysis. Cardiovascular disease remains one of the leading causes of hospitalization and death in this population. Our aim is to jointly model the relationship/association between covariates and (i) the probability of cardiovascular events, a binary process, and (ii) the rate of events once the realization is positive (when the 'hurdle' is crossed), using a zero-truncated Poisson distribution. When the observation period or follow-up time, from the start of dialysis, varies among individuals, the estimated probability of positive cardiovascular events during the study period will be biased. Furthermore, when the model contains covariates, the estimated relationship between the covariates and the probability of cardiovascular events will also be biased. These challenges are addressed with the proposed weighted hurdle regression method. Estimation for the weighted hurdle regression model is a weighted likelihood approach, where standard maximum likelihood estimation can be utilized. The method is illustrated with data from the United States Renal Data System. Simulation studies show the ability of the proposed method to successfully adjust for differential follow-up times and incorporate the effects of covariates in the weighting.
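The two-part structure described here (a binary "hurdle" followed by a zero-truncated Poisson count) can be written down compactly. The sketch below shows the standard hurdle probability mass function and a generic weighted log-likelihood. The abstract does not specify how the follow-up weights are constructed, so `weights` is a hypothetical input, and covariates are omitted for brevity.

```python
import math


def hurdle_logpmf(y, p_pos, lam):
    """Log-probability of count y under a Poisson hurdle model.

    p_pos: probability that at least one event occurs (the hurdle is crossed)
    lam:   rate of the zero-truncated Poisson for the positive counts
    """
    if y == 0:
        return math.log1p(-p_pos)
    log_pois = -lam + y * math.log(lam) - math.lgamma(y + 1)
    log_trunc = math.log1p(-math.exp(-lam))  # log P(Poisson > 0)
    return math.log(p_pos) + log_pois - log_trunc


def weighted_loglik(ys, weights, p_pos, lam):
    """Weighted log-likelihood: each subject's contribution is scaled by a
    weight (hypothetical here; the source's weighting scheme is not given)."""
    return sum(w * hurdle_logpmf(y, p_pos, lam) for y, w in zip(ys, weights))
```

Because the weights only rescale each subject's contribution, an objective of this form can be maximized with ordinary maximum likelihood routines, which is consistent with the abstract's remark that standard estimation can be used.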
NASA Technical Reports Server (NTRS)
Hopkins, Dale A.
1998-01-01
A key challenge in designing the new High Speed Civil Transport (HSCT) aircraft is determining a good match between the airframe and engine. Multidisciplinary design optimization can be used to solve the problem by adjusting parameters of both the engine and the airframe. Earlier, an example problem was presented of an HSCT aircraft with four mixed-flow turbofan engines and a baseline mission to carry 305 passengers 5000 nautical miles at a cruise speed of Mach 2.4. The problem was solved by coupling NASA Lewis Research Center's design optimization testbed (COMETBOARDS) with NASA Langley Research Center's Flight Optimization System (FLOPS). The computing time expended in solving the problem was substantial, and the instability of the FLOPS analyzer at certain design points caused difficulties. In an attempt to alleviate both of these limitations, we explored the use of two approximation concepts in the design optimization process. The two concepts, which are based on neural network and linear regression approximation, provide the reanalysis capability and design sensitivity analysis information required for the optimization process. The HSCT aircraft optimization problem was solved by using three alternate approaches; that is, the original FLOPS analyzer and two approximate (derived) analyzers. The approximate analyzers were calibrated and used in three different ranges of the design variables; narrow (interpolated), standard, and wide (extrapolated).
NASA Astrophysics Data System (ADS)
Tan, F.; Lim, H. S.; Abdullah, K.; Yoon, T. L.; Zubir Matjafri, M.; Holben, B.
2014-02-01
Aerosol optical depth (AOD) from AERONET data has a very fine resolution, but air pollution index (API), visibility and relative humidity from the ground truth measurements are coarse. To obtain the local AOD in the atmosphere, the relationship between these three parameters was determined using multiple regression analysis. Data from the southwest monsoon period (August to September, 2012) taken in Penang, Malaysia, were used to establish a quantitative relationship in which the AOD is modeled as a function of API, relative humidity, and visibility. The most highly correlated model was used to predict AOD values during the southwest monsoon period. When aerosol is not uniformly distributed in the atmosphere, the predicted AOD can deviate strongly from the measured values. These deviating data can be removed by comparing the predicted AOD values with the actual AERONET data, which helps to investigate whether the non-uniform source of the aerosol is the ground surface or a higher altitude level. This model can accurately predict AOD only if the aerosol is uniformly distributed in the atmosphere. However, further study is needed to determine whether this model is suitable for predicting AOD not only in Penang, but also in other states in Malaysia or even globally.
Doran, Kara S.; Howd, Peter A.; Sallenger, Asbury H.
2016-01-04
Recent studies, and most of their predecessors, use tide gage data to quantify SL acceleration, ASL(t). In the current study, three techniques were used to calculate acceleration from tide gage data, and of those examined, it was determined that the two techniques based on sliding a regression window through the time series are more robust compared to the technique that fits a single quadratic form to the entire time series, particularly if there is temporal variation in the magnitude of the acceleration. The single-fit quadratic regression method has been the most commonly used technique in determining acceleration in tide gage data. The inability of the single-fit method to account for time-varying acceleration may explain some of the inconsistent findings between investigators. Properly quantifying ASL(t) from field measurements is of particular importance in evaluating numerical models of past, present, and future SLR resulting from anticipated climate change.
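The contrast drawn in this abstract between a single quadratic fit and a sliding regression window can be illustrated as follows. The abstract does not describe the two windowed techniques in detail, so the sketch assumes a local quadratic fit within each window, with acceleration taken as twice the quadratic coefficient. Function names and the window parameter are hypothetical.

```python
import numpy as np


def quadratic_acceleration(t, h):
    """Acceleration from one quadratic fit h ~ c2*t^2 + c1*t + c0 (equals 2*c2)."""
    c2 = np.polyfit(np.asarray(t, float), np.asarray(h, float), 2)[0]
    return 2.0 * c2


def sliding_acceleration(t, h, window):
    """Acceleration estimated in each window of `window` consecutive samples.
    Returns the window-centre times and the acceleration per window."""
    t, h = np.asarray(t, float), np.asarray(h, float)
    centers, acc = [], []
    for start in range(0, len(t) - window + 1):
        sl = slice(start, start + window)
        tc = t[sl].mean()
        centers.append(tc)
        # Centring time inside the window improves numerical conditioning
        # and does not change the quadratic coefficient.
        acc.append(quadratic_acceleration(t[sl] - tc, h[sl]))
    return np.array(centers), np.array(acc)
```

When the true acceleration is constant the two approaches agree. When it varies in time, the single fit returns one averaged value, while the windowed estimates trace the variation, which is the behaviour the abstract attributes to the more robust techniques.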
NASA Astrophysics Data System (ADS)
Cheng, Anyu; Jiang, Xiao; Li, Yongfu; Zhang, Chao; Zhu, Hao
2017-01-01
This study proposes a multiple sources and multiple measures based traffic flow prediction algorithm using the chaos theory and support vector regression method. In particular, first, the chaotic characteristics of traffic flow associated with the speed, occupancy, and flow are identified using the maximum Lyapunov exponent. Then, the phase space of multiple measures chaotic time series are reconstructed based on the phase space reconstruction theory and fused into a same multi-dimensional phase space using the Bayesian estimation theory. In addition, the support vector regression (SVR) model is designed to predict the traffic flow. Numerical experiments are performed using the data from multiple sources. The results show that, compared with the single measure, the proposed method has better performance for the short-term traffic flow prediction in terms of the accuracy and timeliness.
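The phase space reconstruction step mentioned in this abstract is commonly done by time-delay embedding, which can be sketched as below. The embedding dimension and delay are inputs that the abstract does not specify, and the Bayesian fusion and SVR stages are not reproduced here. The function name is hypothetical.

```python
import numpy as np


def delay_embed(x, dim, tau):
    """Time-delay embedding of a scalar series.

    Row i of the result is [x[i], x[i + tau], ..., x[i + (dim - 1) * tau]].
    """
    x = np.asarray(x, float)
    n = len(x) - (dim - 1) * tau  # number of complete delay vectors
    if n <= 0:
        raise ValueError("series too short for this dim and tau")
    return np.column_stack([x[k * tau : k * tau + n] for k in range(dim)])
```

Each row of the embedded matrix can then serve as an input vector for a regressor such as SVR, with the next observed flow value as the target.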
Anderson, Weston; Guikema, Seth; Zaitchik, Ben; Pan, William
2014-01-01
Obtaining accurate small area estimates of population is essential for policy and health planning but is often difficult in countries with limited data. In lieu of available population data, small area estimate models draw information from previous time periods or from similar areas. This study focuses on model-based methods for estimating population when no direct samples are available in the area of interest. To explore the efficacy of tree-based models for estimating population density, we compare six different model structures including Random Forest and Bayesian Additive Regression Trees. Results demonstrate that without information from prior time periods, non-parametric tree-based models produced more accurate predictions than did conventional regression methods. Improving estimates of population density in non-sampled areas is important for regions with incomplete census data and has implications for economic, health and development policies. PMID:24992657
A least angle regression method for fMRI activation detection in phase-encoded experimental designs.
Li, Xingfeng; Coyle, Damien; Maguire, Liam; McGinnity, Thomas M; Watson, David R; Benali, Habib
2010-10-01
This paper presents a new regression method for functional magnetic resonance imaging (fMRI) activation detection. Unlike general linear models (GLM), this method is based on selecting models for activation detection adaptively, which overcomes the limitation of requiring a predefined design matrix in GLM. This limitation arises because GLM designs assume that the response of the neuron populations will be the same for the same stimuli, which is often not the case. In this work, the fMRI hemodynamic response model is selected from a series of models constructed online by the least angle regression (LARS) method. The slow drift terms in the design matrix for the activation detection are determined adaptively according to the fMRI response in order to achieve the best fit for each fMRI response. The LARS method is then applied along with the Moore-Penrose pseudoinverse (PINV) and fast orthogonal search (FOS) algorithm for implementation of the selected model to include the drift effects in the design matrix. Comparisons with GLM were made using 11 normal subjects to test method superiority. This paper found that GLM with a fixed design matrix was inferior to the described LARS method for fMRI activation detection in a phase-encoded experimental design. In addition, the proposed method has the advantage of increasing the degrees of freedom in the regression analysis. We conclude that the method described provides a novel approach to the detection of fMRI activation which is better than GLM-based analyses.
ERIC Educational Resources Information Center
Gilstrap, Donald L.
2013-01-01
In addition to qualitative methods presented in chaos and complexity theories in educational research, this article addresses quantitative methods that may show potential for future research studies. Although much in the social and behavioral sciences literature has focused on computer simulations, this article explores current chaos and…
Paul C. Van Deusen; Linda S. Heath
2010-01-01
Weighted estimation methods for analysis of mapped plot forest inventory data are discussed. The appropriate weighting scheme can vary depending on the type of analysis and graphical display. Both statistical issues and user expectations need to be considered in these methods. A weighting scheme is proposed that balances statistical considerations and the logical...
Rival-model penalized self-organizing map.
Cheung, Yiu-ming; Law, Lap-tak
2007-01-01
As a typical data visualization technique, self-organizing map (SOM) has been extensively applied to data clustering, image analysis, dimension reduction, and so forth. In a conventional adaptive SOM, it needs to choose an appropriate learning rate whose value is monotonically reduced over time to ensure the convergence of the map, meanwhile being kept large enough so that the map is able to gradually learn the data topology. Otherwise, the SOM's performance may seriously deteriorate. In general, it is nontrivial to choose an appropriate monotonically decreasing function for such a learning rate. In this letter, we therefore propose a novel rival-model penalized self-organizing map (RPSOM) learning algorithm that, for each input, adaptively chooses several rivals of the best-matching unit (BMU) and penalizes their associated models, i.e., those parametric real vectors with the same dimension as the input vectors, a little far away from the input. Compared to the existing methods, this RPSOM utilizes a constant learning rate to circumvent the awkward selection of a monotonically decreased function for the learning rate, but still reaches a robust result. The numerical experiments have shown the efficacy of our algorithm.
Penalized maximum-likelihood image reconstruction for lesion detection
NASA Astrophysics Data System (ADS)
Qi, Jinyi; Huesman, Ronald H.
2006-08-01
Detecting cancerous lesions is one major application in emission tomography. In this paper, we study penalized maximum-likelihood image reconstruction for this important clinical task. Compared to analytical reconstruction methods, statistical approaches can improve the image quality by accurately modelling the photon detection process and measurement noise in imaging systems. To explore the full potential of penalized maximum-likelihood image reconstruction for lesion detection, we derived simplified theoretical expressions that allow fast evaluation of the detectability of a random lesion. The theoretical results are used to design the regularization parameters to improve lesion detectability. We conducted computer-based Monte Carlo simulations to compare the proposed penalty function, conventional penalty function, and a penalty function for isotropic point spread function. The lesion detectability is measured by a channelized Hotelling observer. The results show that the proposed penalty function outperforms the other penalty functions for lesion detection. The relative improvement is dependent on the size of the lesion. However, we found that the penalty function optimized for a 5 mm lesion still outperforms the other two penalty functions for detecting a 14 mm lesion. Therefore, it is feasible to use the penalty function designed for small lesions in image reconstruction, because detection of large lesions is relatively easy.
Rezaei, B; Khayamian, T; Mokhtari, A
2009-02-20
A flow injection chemiluminescent (FI-CL) method has been developed for the simultaneous determination of codeine and noscapine using N-PLS regression. The method is based on the fact that the kinetic characteristics of codeine and noscapine are different in the Ru(phen)₃²⁺-Ce(IV) CL system. In flow injection mode, codeine gives a broad peak with the highest CL intensity at 4.4 s, whereas the maximum CL intensity of noscapine appears at about 2.6 s. Moreover, increasing the H₂SO₄ concentration had a different effect on the CL intensity of the two compounds. An experimental design, central composite design (CCD), was used to optimize variables such as the Ru(II) and Ce(IV) concentrations for both compounds. Under the optimized conditions, a three-way data structure (samples, H₂SO₄ concentration, time) was constructed, followed by N-PLS regression. The number of factors for the N-PLS regression was selected based on the minimum value of the root mean squared error of cross validation (RMSECV). The proposed method is applied to the simultaneous quantification of codeine and noscapine in pharmaceutical preparations.
A Probabilistic Spatial Dengue Fever Risk Assessment by a Threshold-Based-Quantile Regression Method
Chiu, Chuan-Hung; Wen, Tzai-Hung; Chien, Lung-Chang; Yu, Hwa-Lung
2014-01-01
Understanding the spatial characteristics of dengue fever (DF) incidences is crucial for governmental agencies to implement effective disease control strategies. We investigated the associations between environmental and socioeconomic factors and DF geographic distribution, and proposed a probabilistic risk assessment approach that uses threshold-based quantile regression to identify the significant risk factors for DF transmission and estimate the spatial distribution of DF risk regarding full probability distributions. To interpret risk, return period was also included to characterize the frequency pattern of DF geographic occurrences. The study area included old Kaohsiung City and Fongshan District, two areas in Taiwan that have been affected by severe DF infections in recent decades. Results indicated that water-related facilities, including canals and ditches, and various types of residential area, as well as the interactions between them, were significant factors that elevated DF risk. By contrast, the increase of per capita income and its associated interactions with residential areas mitigated the DF risk in the study area. Nonlinear associations between these factors and DF risk were present in various quantiles, implying that water-related factors characterized the underlying spatial patterns of DF, and high-density residential areas indicated the potential for high DF incidence (e.g., clustered infections). The spatial distributions of DF risks were assessed in terms of three distinct map presentations: expected incidence rates, incidence rates in various return periods, and return periods at distinct incidence rates. These probability-based spatial risk maps exhibited distinct DF risks associated with environmental factors, expressed as various DF magnitudes and occurrence probabilities across Kaohsiung, and can serve as a reference for local governmental agencies. PMID:25302582
Radial basis function regression methods for predicting quantitative traits using SNP markers.
Long, Nanye; Gianola, Daniel; Rosa, Guilherme J M; Weigel, Kent A; Kranis, Andreas; González-Recio, Oscar
2010-06-01
A challenge when predicting total genetic values for complex quantitative traits is that an unknown number of quantitative trait loci may affect phenotypes via cryptic interactions. If markers are available, assuming that their effects on phenotypes are additive may lead to poor predictive ability. Non-parametric radial basis function (RBF) regression, which does not assume a particular form of the genotype-phenotype relationship, was investigated here by simulation and analysis of body weight and food conversion rate data in broilers. The simulation included a toy example in which an arbitrary non-linear genotype-phenotype relationship was assumed, and five different scenarios representing different broad sense heritability levels (0.1, 0.25, 0.5, 0.75 and 0.9) were created. In addition, a whole genome simulation was carried out, in which three different gene action modes (pure additive, additive+dominance and pure epistasis) were considered. In all analyses, a training set was used to fit the model and a testing set was used to evaluate predictive performance. The latter was measured by correlation and predictive mean-squared error (PMSE) on the testing data. For comparison, a linear additive model known as Bayes A was used as benchmark. Two RBF models with single nucleotide polymorphism (SNP)-specific (RBF I) and common (RBF II) weights were examined. Results indicated that, in the presence of complex genotype-phenotype relationships (i.e. non-linearity and non-additivity), RBF outperformed Bayes A in predicting total genetic values using SNP markers. Extension of Bayes A to include all additive, dominance and epistatic effects could improve its prediction accuracy. RBF I was generally better than RBF II, and was able to identify relevant SNPs in the toy example.
Xiao, Yongling; Abrahamowicz, Michal
2010-03-30
We propose two bootstrap-based methods to correct the standard errors (SEs) from Cox's model for within-cluster correlation of right-censored event times. The cluster-bootstrap method resamples, with replacement, only the clusters, whereas the two-step bootstrap method resamples (i) the clusters, and (ii) individuals within each selected cluster, with replacement. In simulations, we evaluate both methods and compare them with the existing robust variance estimator and the shared gamma frailty model, which are available in statistical software packages. We simulate clustered event time data, with latent cluster-level random effects, which are ignored in the conventional Cox's model. For cluster-level covariates, both proposed bootstrap methods yield accurate SEs, and type I error rates, and acceptable coverage rates, regardless of the true random effects distribution, and avoid serious variance under-estimation by conventional Cox-based standard errors. However, the two-step bootstrap method over-estimates the variance for individual-level covariates. We also apply the proposed bootstrap methods to obtain confidence bands around flexible estimates of time-dependent effects in a real-life analysis of cluster event times.
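The two resampling schemes described in this abstract can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the function names `group_rows`, `cluster_bootstrap` and `two_step_bootstrap` are hypothetical, and the sketch only returns resampled row indices. Refitting the Cox model on each resample and taking the standard deviation of the resulting estimates is left out.

```python
import random
from collections import defaultdict

def group_rows(cluster_ids):
    """Map each cluster label to the list of row indices it contains."""
    groups = defaultdict(list)
    for row, cid in enumerate(cluster_ids):
        groups[cid].append(row)
    return groups

def cluster_bootstrap(cluster_ids, rng):
    """Resample whole clusters with replacement; members are kept intact."""
    groups = group_rows(cluster_ids)
    labels = sorted(groups)
    drawn = rng.choices(labels, k=len(labels))
    return [row for cid in drawn for row in groups[cid]]

def two_step_bootstrap(cluster_ids, rng):
    """Resample clusters, then resample individuals within each drawn cluster."""
    groups = group_rows(cluster_ids)
    labels = sorted(groups)
    drawn = rng.choices(labels, k=len(labels))
    rows = []
    for cid in drawn:
        members = groups[cid]
        rows.extend(rng.choices(members, k=len(members)))
    return rows

# Hypothetical example: nine subjects in three clusters.
rng = random.Random(0)
cluster_ids = [0, 0, 0, 1, 1, 2, 2, 2, 2]
print(cluster_bootstrap(cluster_ids, rng))
print(two_step_bootstrap(cluster_ids, rng))
```

In the first scheme every member of a drawn cluster appears together, so within-cluster dependence is preserved. The second scheme adds individual-level resampling on top, which is the extra source of variability the abstract links to over-estimated variance for individual-level covariates.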
27 CFR 25.93 - Penal sum of bond.
Code of Federal Regulations, 2010 CFR
2010-04-01
... 27 Alcohol, Tobacco Products and Firearms 1 2010-04-01 2010-04-01 false Penal sum of bond. 25.93... OF THE TREASURY LIQUORS BEER Bonds and Consents of Surety § 25.93 Penal sum of bond. (a)(1) Brewers....164(c)(2), the penal sum of the brewers bond must be equal to 10 percent of the maximum amount of tax...
Huang, Lei
2015-09-30
To solve the problem in which the conventional ARMA modeling methods for gyro random noise require a large number of samples and converge slowly, an ARMA modeling method using a robust Kalman filtering is developed. The ARMA model parameters are employed as state arguments. Unknown time-varying estimators of observation noise are used to achieve the estimated mean and variance of the observation noise. Using the robust Kalman filtering, the ARMA model parameters are estimated accurately. The developed ARMA modeling method has the advantages of a rapid convergence and high accuracy. Thus, the required sample size is reduced. It can be applied to modeling applications for gyro random noise in which a fast and accurate ARMA modeling method is required.
NASA Astrophysics Data System (ADS)
Liou, Jyun-you; Smith, Elliot H.; Bateman, Lisa M.; McKhann, Guy M., II; Goodman, Robert R.; Greger, Bradley; Davis, Tyler S.; Kellis, Spencer S.; House, Paul A.; Schevon, Catherine A.
2017-08-01
Objective. Epileptiform discharges, an electrophysiological hallmark of seizures, can propagate across cortical tissue in a manner similar to traveling waves. Recent work has focused attention on the origination and propagation patterns of these discharges, yielding important clues to their source location and mechanism of travel. However, systematic studies of methods for measuring propagation are lacking. Approach. We analyzed epileptiform discharges in microelectrode array recordings of human seizures. The array records multiunit activity and local field potentials at 400 micron spatial resolution, from a small cortical site free of obstructions. We evaluated several computationally efficient statistical methods for calculating traveling wave velocity, benchmarking them to analyses of associated neuronal burst firing. Main results. Over 90% of discharges met statistical criteria for propagation across the sampled cortical territory. Detection rate, direction and speed estimates derived from a multiunit estimator were compared to four field potential-based estimators: negative peak, maximum descent, high gamma power, and cross-correlation. Interestingly, the methods that were computationally simplest and most efficient (negative peak and maximal descent) offer non-inferior results in predicting neuronal traveling wave velocities compared to the other two, more complex methods. Moreover, the negative peak and maximal descent methods proved to be more robust against reduced spatial sampling challenges. Using least absolute deviation in place of least squares error minimized the impact of outliers, and reduced the discrepancies between local field potential-based and multiunit estimators. Significance. Our findings suggest that ictal epileptiform discharges typically take the form of exceptionally strong, rapidly traveling waves, with propagation detectable across millimeter distances. The sequential activation of neurons in space can be inferred from clinically
Li, Yanming; Zhu, Ji
2015-01-01
Summary We propose a multivariate sparse group lasso variable selection and estimation method for data with high-dimensional predictors as well as high-dimensional response variables. The method is carried out through a penalized multivariate multiple linear regression model with an arbitrary group structure for the regression coefficient matrix. It suits many biology studies well in detecting associations between multiple traits and multiple predictors, with each trait and each predictor embedded in some biological functioning groups such as genes, pathways or brain regions. The method is able to effectively remove unimportant groups as well as unimportant individual coefficients within important groups, particularly for large p small n problems, and is flexible in handling various complex group structures such as overlapping or nested or multilevel hierarchical structures. The method is evaluated through extensive simulations with comparisons to the conventional lasso and group lasso methods, and is applied to an eQTL association study. PMID:25732839
NASA Astrophysics Data System (ADS)
Saeidi, Omid; Torabi, Seyed Rahman; Ataei, Mohammad
2014-03-01
Rock mass classification systems are one of the most common ways of determining rock mass excavatability and related equipment assessment. However, the strength and weak points of such rating-based classifications have always been questionable. Such classification systems assign quantifiable values to predefined classified geotechnical parameters of rock mass. This causes particular ambiguities, leading to the misuse of such classifications in practical applications. Recently, intelligence system approaches such as artificial neural networks (ANNs) and neuro-fuzzy methods, along with multiple regression models, have been used successfully to overcome such uncertainties. The purpose of the present study is the construction of several models by using an adaptive neuro-fuzzy inference system (ANFIS) method with two data clustering approaches, including fuzzy c-means (FCM) clustering and subtractive clustering, an ANN and non-linear multiple regression to estimate the basic rock mass diggability index. A set of data from several case studies was used to obtain the real rock mass diggability index and compared to the predicted values by the constructed models. In conclusion, it was observed that ANFIS based on the FCM model shows higher accuracy and correlation with actual data compared to that of the ANN and multiple regression. As a result, one can use the assimilation of ANNs with fuzzy clustering-based models to construct such rigorous predictor tools.
Using a Linear Regression Method to Detect Outliers in IRT Common Item Equating
ERIC Educational Resources Information Center
He, Yong; Cui, Zhongmin; Fang, Yu; Chen, Hanwei
2013-01-01
Common test items play an important role in equating alternate test forms under the common item nonequivalent groups design. When the item response theory (IRT) method is applied in equating, inconsistent item parameter estimates among common items can lead to large bias in equated scores. It is prudent to evaluate inconsistency in parameter…
USDA-ARS's Scientific Manuscript database
The beard testing method for measuring cotton fiber length is based on the fibrogram theory. However, in the instrumental implementations, the engineering complexity alters the original fiber length distribution observed by the instrument. This causes challenges in obtaining the entire original le...
2014-01-01
Background Chronic kidney disease (CKD) is a progressive and usually irreversible disease. Different types of outcomes are of interest in the course of CKD such as time-to-dialysis, transplantation or decline of the glomerular filtration rate (GFR). Statistical analyses aiming at investigating the association between these outcomes and risk factors raise a number of methodological issues. The objective of this study was to give an overview of these issues and to highlight some statistical methods that can address these topics. Methods A literature review of statistical methods published between 2002 and 2012 to investigate risk factors of CKD outcomes was conducted within the Scopus database. The results of the review were used to identify important methodological issues as well as to discuss solutions for each type of CKD outcome. Results Three hundred and four papers were selected. Time-to-event outcomes were more often investigated than quantitative outcome variables measuring kidney function over time. The most frequently investigated events in survival analyses were all-cause death, initiation of kidney replacement therapy, and progression to a specific value of GFR. While competing risks were commonly accounted for, interval censoring was rarely acknowledged when appropriate despite existing methods. When the outcome of interest was the quantitative decline of kidney function over time, standard linear models focussing on the slope of GFR over time were almost as often used as linear mixed models which allow various numbers of repeated measurements of kidney function per patient. Informative dropout was accounted for in some of these longitudinal analyses. Conclusions This study provides a broad overview of the statistical methods used in the last ten years for investigating risk factors of CKD progression, as well as a discussion of their limitations. Some existing potential alternatives that have been proposed in the context of CKD or in other contexts are
Du, Hongying; Hu, Zhide; Bazzoli, Andrea; Zhang, Yang
2011-01-01
The epidermal growth factor receptor (EGFR) protein tyrosine kinase (PTK) is an important protein target for anti-tumor drug discovery. To identify potential EGFR inhibitors, we conducted a quantitative structure–activity relationship (QSAR) study on the inhibitory activity of a series of quinazoline derivatives against EGFR tyrosine kinase. Two 2D-QSAR models were developed based on the best multi-linear regression (BMLR) and grid-search assisted projection pursuit regression (GS-PPR) methods. The results demonstrate that the inhibitory activity of quinazoline derivatives is strongly correlated with their polarizability, activation energy, mass distribution, connectivity, and branching information. Although the present investigation focused on EGFR, the approach provides a general avenue in the structure-based drug development of different protein receptor inhibitors. PMID:21811593
NASA Astrophysics Data System (ADS)
Storm, Emma; Weniger, Christoph; Calore, Francesca
2017-08-01
We present SkyFACT (Sky Factorization with Adaptive Constrained Templates), a new approach for studying, modeling and decomposing diffuse gamma-ray emission. Like most previous analyses, the approach relies on predictions from cosmic-ray propagation codes like GALPROP and DRAGON. However, in contrast to previous approaches, we account for the fact that models are not perfect and allow for a very large number (≳10⁵) of nuisance parameters to parameterize these imperfections. We combine methods of image reconstruction and adaptive spatio-spectral template regression in one coherent hybrid approach. To this end, we use penalized Poisson likelihood regression, with regularization functions that are motivated by the maximum entropy method. We introduce methods to efficiently handle the high dimensionality of the convex optimization problem as well as the associated semi-sparse covariance matrix, using the L-BFGS-B algorithm and Cholesky factorization. We test the method both on synthetic data as well as on gamma-ray emission from the inner Galaxy, |l|<90° and |b|<20°, as observed by the Fermi Large Area Telescope. We finally define a simple reference model that removes most of the residual emission from the inner Galaxy, based on conventional diffuse emission components as well as components for the Fermi bubbles, the Fermi Galactic center excess, and extended sources along the Galactic disk. Variants of this reference model can serve as basis for future studies of diffuse emission in and outside the Galactic disk.
Deng, Zhaohong; Choi, Kup-Sze; Jiang, Yizhang; Wang, Shitong
2014-12-01
Inductive transfer learning has attracted increasing attention for the training of effective models in the target domain by leveraging the information in the source domain. However, most transfer learning methods are developed for a specific model, such as the commonly used support vector machine, which makes the methods applicable only to the adopted models. In this regard, the generalized hidden-mapping ridge regression (GHRR) method is introduced in order to train various types of classical intelligence models, including neural networks, fuzzy logical systems and kernel methods. Furthermore, the knowledge-leverage based transfer learning mechanism is integrated with GHRR to realize the inductive transfer learning method called transfer GHRR (TGHRR). Since the information from the induced knowledge is much clearer and more concise than that from the data in the source domain, it is more convenient to control and balance the similarity and difference of data distributions between the source and target domains. The proposed GHRR and TGHRR algorithms have been evaluated experimentally by performing regression and classification on synthetic and real world datasets. The results demonstrate that the performance of TGHRR is competitive with or even superior to existing state-of-the-art inductive transfer learning algorithms.
Outlier detection method in linear regression based on sum of arithmetic progression.
Adikaram, K K L B; Hussein, M A; Effenberger, M; Becker, T
2014-01-01
We introduce a new nonparametric outlier detection method for linear series, which requires no missing or removed data imputation. For an arithmetic progression (a series without outliers) with n elements, the ratio (R) of the sum of the minimum and the maximum elements and the sum of all elements is always 2/n ∈ (0,1]. R ≠ 2/n always implies the existence of outliers. Usually, R < 2/n implies that the minimum is an outlier, and R > 2/n implies that the maximum is an outlier. Based upon this, we derived a new method for identifying significant and nonsignificant outliers, separately. Two different techniques were used to manage missing data and removed outliers: (1) recalculate the terms after (or before) the removed or missing element while maintaining the initial angle in relation to a certain point or (2) transform data into a constant value, which is not affected by missing or removed elements. With a reference element, which was not an outlier, the method detected all outliers from data sets with 6 to 1000 elements containing 50% outliers which deviated by a factor of ±1.0e-2 to ±1.0e+2 from the correct value.
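The ratio property stated in this abstract follows from the sum of an arithmetic progression, n(min + max)/2, and can be checked numerically. The sketch below illustrates only that identity and the direction of the deviation. The helper name `ap_ratio` and the example series are assumptions made for illustration, and the paper's full procedure (significance thresholds, handling of removed elements) is not reproduced.

```python
def ap_ratio(values):
    """Return R = (min + max) / sum for a sequence of positive numbers."""
    return (min(values) + max(values)) / sum(values)

# Clean arithmetic progression with n = 10 elements: 3, 5, ..., 21.
series = [3 + 2 * k for k in range(10)]
n = len(series)
r_clean = ap_ratio(series)

# Inflate the maximum by a factor of 10: R rises above 2/n.
high = list(series)
high[-1] *= 10
r_high = ap_ratio(high)

# Shrink the minimum by a factor of 100: R falls below 2/n.
low = list(series)
low[0] *= 0.01
r_low = ap_ratio(low)

print(2 / n, r_clean, r_high, r_low)
```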
Stojić, Andreja; Maletić, Dimitrije; Stanišić Stojić, Svetlana; Mijić, Zoran; Šoštarić, Andrej
2015-07-15
In this study, advanced multivariate methods were applied for VOC source apportionment and subsequent short-term forecast of industrial- and vehicle exhaust-related contributions in Belgrade urban area (Serbia). The VOC concentrations were measured using PTR-MS, together with inorganic gaseous pollutants (NOx, NO, NO2, SO2, and CO), PM10, and meteorological parameters. US EPA Positive Matrix Factorization and Unmix receptor models were applied to the obtained dataset both resolving six source profiles. For the purpose of forecasting industrial- and vehicle exhaust-related source contributions, different multivariate methods were employed in two separate cases, relying on meteorological data, and on meteorological data and concentrations of inorganic gaseous pollutants, respectively. The results indicate that Boosted Decision Trees and Multi-Layer Perceptrons were the best performing methods. According to the results, forecasting accuracy was high (lowest relative error of only 6%), in particular when the forecast was based on both meteorological parameters and concentrations of inorganic gaseous pollutants. Copyright © 2015. Published by Elsevier B.V.
A computer program for uncertainty analysis integrating regression and Bayesian methods
Lu, Dan; Ye, Ming; Hill, Mary C.; Poeter, Eileen P.; Curtis, Gary
2014-01-01
This work develops a new functionality in UCODE_2014 to evaluate Bayesian credible intervals using the Markov Chain Monte Carlo (MCMC) method. The MCMC capability in UCODE_2014 is based on the FORTRAN version of the differential evolution adaptive Metropolis (DREAM) algorithm of Vrugt et al. (2009), which estimates the posterior probability density function of model parameters in high-dimensional and multimodal sampling problems. The UCODE MCMC capability provides eleven prior probability distributions and three ways to initialize the sampling process. It evaluates parametric and predictive uncertainties and it has parallel computing capability based on multiple chains to accelerate the sampling process. This paper tests and demonstrates the MCMC capability using a 10-dimensional multimodal mathematical function, a 100-dimensional Gaussian function, and a groundwater reactive transport model. The use of the MCMC capability is made straightforward and flexible by adopting the JUPITER API protocol. With the new MCMC capability, UCODE_2014 can be used to calculate three types of uncertainty intervals, which all can account for prior information: (1) linear confidence intervals which require linearity and Gaussian error assumptions and typically 10s–100s of highly parallelizable model runs after optimization, (2) nonlinear confidence intervals which require a smooth objective function surface and Gaussian observation error assumptions and typically 100s–1,000s of partially parallelizable model runs after optimization, and (3) MCMC Bayesian credible intervals which require few assumptions and commonly 10,000s–100,000s or more partially parallelizable model runs. Ready access allows users to select methods best suited to their work, and to compare methods in many circumstances.
Joo, Jong Wha J; Kang, Eun Yong; Org, Elin; Furlotte, Nick; Parks, Brian; Hormozdiari, Farhad; Lusis, Aldons J; Eskin, Eleazar
2016-12-01
A typical genome-wide association study tests correlation between a single phenotype and each genotype one at a time. However, single-phenotype analysis might miss unmeasured aspects of complex biological networks. Analyzing many phenotypes simultaneously may increase the power to capture these unmeasured aspects and detect more variants. Several multivariate approaches aim to detect variants related to more than one phenotype, but these current approaches do not consider the effects of population structure. As a result, these approaches may result in a significant amount of false positive identifications. Here, we introduce a new methodology, referred to as GAMMA for generalized analysis of molecular variance for mixed-model analysis, which is capable of simultaneously analyzing many phenotypes and correcting for population structure. In a simulated study using data implanted with true genetic effects, GAMMA accurately identifies these true effects without producing false positives induced by population structure. In simulations with this data, GAMMA is an improvement over other methods which either fail to detect true effects or produce many false positive identifications. We further apply our method to genetic studies of yeast and gut microbiome from mice and show that GAMMA identifies several variants that are likely to have true biological mechanisms. Copyright © 2016 by the Genetics Society of America.
Dai, Huanping; Micheyl, Christophe
2012-11-01
Psychophysical "reverse-correlation" methods allow researchers to gain insight into the perceptual representations and decision weighting strategies of individual subjects in perceptual tasks. Although these methods have gained momentum, until recently their development was limited to experiments involving only two response categories. Recently, two approaches for estimating decision weights in m-alternative experiments have been put forward. One approach extends the two-category correlation method to m > 2 alternatives; the second uses multinomial logistic regression (MLR). In this article, the relative merits of the two methods are discussed, and the issues of convergence and statistical efficiency of the methods are evaluated quantitatively using Monte Carlo simulations. The results indicate that, for a range of values of the number of trials, the estimated weighting patterns are closer to their asymptotic values for the correlation method than for the MLR method. Moreover, for the MLR method, weight estimates for different stimulus components can exhibit strong correlations, making the analysis and interpretation of measured weighting patterns less straightforward than for the correlation method. These and other advantages of the correlation method, which include computational simplicity and a close relationship to other well-established psychophysical reverse-correlation methods, make it an attractive tool to uncover decision strategies in m-alternative experiments.
Dai, Huanping; Micheyl, Christophe
2012-01-01
Psychophysical “reverse-correlation” methods allow researchers to gain insight into the perceptual representations and decision weighting strategies of individual subjects in perceptual tasks. Although these methods have gained momentum, until recently their development was limited to experiments involving only two response categories. Recently, two approaches for estimating decision weights in m-alternative experiments have been put forward. One approach extends the two-category correlation method to m > 2 alternatives; the second uses multinomial logistic regression (MLR). In this article, the relative merits of the two methods are discussed, and the issues of convergence and statistical efficiency of the methods are evaluated quantitatively using Monte Carlo simulations. The results indicate that, for a range of values of the number of trials, the estimated weighting patterns are closer to their asymptotic values for the correlation method than for the MLR method. Moreover, for the MLR method, weight estimates for different stimulus components can exhibit strong correlations, making the analysis and interpretation of measured weighting patterns less straightforward than for the correlation method. These and other advantages of the correlation method, which include computational simplicity and a close relationship to other well-established psychophysical reverse-correlation methods, make it an attractive tool to uncover decision strategies in m-alternative experiments. PMID:23145622
ERIC Educational Resources Information Center
Coskuntuncel, Orkun
2013-01-01
The purpose of this study is two-fold; the first aim being to show the effect of outliers on the widely used least squares regression estimator in social sciences. The second aim is to compare the classical method of least squares with the robust M-estimator using the "determination of coefficient" (R²). For this purpose,…
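The contrast between least squares and a robust M-estimator can be illustrated with a small sketch. The abstract does not say which M-estimator or data were used, so everything here is an assumption: a Huber-type estimator fitted by iteratively reweighted least squares, a MAD-based scale, the conventional tuning constant 1.345, and a made-up data set with a single outlier. The function names are hypothetical.

```python
from statistics import median

def weighted_line_fit(x, y, w):
    """Weighted least squares fit of y = a + b*x; returns (a, b)."""
    sw = sum(w)
    mx = sum(wi * xi for wi, xi in zip(w, x)) / sw
    my = sum(wi * yi for wi, yi in zip(w, y)) / sw
    sxx = sum(wi * (xi - mx) ** 2 for wi, xi in zip(w, x))
    sxy = sum(wi * (xi - mx) * (yi - my) for wi, xi, yi in zip(w, x, y))
    b = sxy / sxx
    return my - b * mx, b

def huber_line_fit(x, y, k=1.345, iters=50):
    """Huber-type M-estimate of a line via iteratively reweighted least squares."""
    w = [1.0] * len(x)
    for _ in range(iters):
        a, b = weighted_line_fit(x, y, w)
        res = [yi - (a + b * xi) for xi, yi in zip(x, y)]
        # Robust scale from the median absolute residual (guarded against zero).
        scale = max(median(abs(r) for r in res) / 0.6745, 1e-12)
        # Huber weights: 1 inside the threshold, shrinking outside it.
        w = [1.0 if abs(r) <= k * scale else k * scale / abs(r) for r in res]
    return a, b

# Made-up data on the line y = 1 + 2x, with the last response replaced by an outlier.
x = [float(i) for i in range(10)]
y = [1.0 + 2.0 * xi for xi in x]
y[-1] = 100.0

ols = weighted_line_fit(x, y, [1.0] * len(x))
robust = huber_line_fit(x, y)
print("least squares (a, b):", ols)
print("Huber M-estimate (a, b):", robust)
```

With unit weights the fit is ordinary least squares, so the same helper serves as the baseline. The single outlier pulls the least squares slope away from 2, while the reweighting step shrinks its influence.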
Stride, P
2011-09-01
Robert Garrett emigrated from Scotland to Van Diemen's Land (now Tasmania) in 1822. Within a few months of arrival he was posted to the barbaric penal colony in Macquarie Harbour, known as Sarah Island. His descent into alcoholism, medical misadventure and premature death were related to his largely unsupported professional environment and were, in many respects, typical of those subjected to this experience.
Quantile Regression for Analyzing Heterogeneity in Ultra-high Dimension
Wang, Lan; Wu, Yichao
2012-01-01
Ultra-high dimensional data often display heterogeneity due to either heteroscedastic variance or other forms of non-location-scale covariate effects. To accommodate heterogeneity, we advocate a more general interpretation of sparsity which assumes that only a small number of covariates influence the conditional distribution of the response variable given all candidate covariates; however, the sets of relevant covariates may differ when we consider different segments of the conditional distribution. In this framework, we investigate the methodology and theory of nonconvex penalized quantile regression in ultra-high dimension. The proposed approach has two distinctive features: (1) it enables us to explore the entire conditional distribution of the response variable given the ultra-high dimensional covariates and provides a more realistic picture of the sparsity pattern; (2) it requires substantially weaker conditions compared with alternative methods in the literature; thus, it greatly alleviates the difficulty of model checking in the ultra-high dimension. In theoretic development, it is challenging to deal with both the nonsmooth loss function and the nonconvex penalty function in ultra-high dimensional parameter space. We introduce a novel sufficient optimality condition which relies on a convex differencing representation of the penalized loss function and the subdifferential calculus. Exploring this optimality condition enables us to establish the oracle property for sparse quantile regression in the ultra-high dimension under relaxed conditions. The proposed method greatly enhances existing tools for ultra-high dimensional data analysis. Monte Carlo simulations demonstrate the usefulness of the proposed procedure. The real data example we analyzed demonstrates that the new approach reveals substantially more information compared with alternative methods. PMID:23082036
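The nonsmooth loss mentioned in this abstract is the quantile check loss. The sketch below shows that loss alone, with no covariates and no penalty: minimizing it over a constant recovers a sample quantile. The nonconvex penalty and the ultra-high dimensional setting of the paper are not reproduced, and the function names and data are illustrative assumptions.

```python
def check_loss(u, tau):
    """Quantile check loss rho_tau(u) = u * (tau - 1{u < 0})."""
    return u * (tau - (1.0 if u < 0 else 0.0))

def total_loss(y, q, tau):
    """Sum of check losses of the residuals y_i - q."""
    return sum(check_loss(yi - q, tau) for yi in y)

# Made-up sample with one extreme value.
y = [1.0, 2.0, 3.0, 4.0, 100.0]

# Search over the observed values, where a minimizer of this
# piecewise-linear objective can always be found.
median_fit = min(y, key=lambda q: total_loss(y, q, 0.5))
lower_fit = min(y, key=lambda q: total_loss(y, q, 0.25))
print(median_fit, lower_fit)
```

Because the loss grows linearly rather than quadratically in the residual, the extreme value 100 barely moves either fit, and changing tau shifts which part of the distribution is targeted.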
2010-01-01
Background Various perinatal factors influencing neuromotor development are known from cross sectional studies. Factors influencing the age at which distinct abilities are acquired are uncertain. We hypothesized that the Cox regression model might identify these factors. Methods Neonates treated at Aachen University Hospital in 2000/2001 were identified retrospectively (n = 796). Outcome data, based on a structured interview, were available from 466 children, as were perinatal data. Factors possibly related to outcome were identified by bootstrap selection and then included into a multivariate Cox regression model. To evaluate if the parental assessment might change with the time elapsed since birth we studied five age cohorts of 163 normally developed children. Results Birth weight, gestational age, congenital cardiac disease and periventricular leukomalacia were related to outcome in the multivariate analysis (p < 0.05). Analysis of the control cohorts revealed that the parents' assessment of the ability of bladder control is modified by the time elapsed since birth. Conclusions Combined application of the bootstrap resampling procedure and multivariate Cox regression analysis effectively identifies perinatal factors influencing the age at which distinct abilities are acquired. These factors were similar to those known from previous cross sectional studies. Retrospective data acquisition may lead to a bias because the parental memories change with time. This recommends applying this statistical approach in larger prospective trials. PMID:20205739
NASA Astrophysics Data System (ADS)
Dogulu, N.; López López, P.; Solomatine, D. P.; Weerts, A. H.; Shrestha, D. L.
2015-07-01
In operational hydrology, estimation of the predictive uncertainty of hydrological models used for flood modelling is essential for risk-based decision making for flood warning and emergency management. In the literature, there exists a variety of methods analysing and predicting uncertainty. However, studies devoted to comparing the performance of the methods in predicting uncertainty are limited. This paper focuses on the methods predicting model residual uncertainty that differ in methodological complexity: quantile regression (QR) and UNcertainty Estimation based on local Errors and Clustering (UNEEC). The comparison of the methods is aimed at investigating how well a simpler method using fewer input data performs over a more complex method with more predictors. We test these two methods on several catchments from the UK that vary in hydrological characteristics and the models used. Special attention is given to the methods' performance under different hydrological conditions. Furthermore, normality of model residuals in data clusters (identified by UNEEC) is analysed. It is found that basin lag time and forecast lead time have a large impact on the quantification of uncertainty and the presence of normality in model residuals' distribution. In general, it can be said that both methods give similar results. At the same time, it is also shown that the UNEEC method provides better performance than QR for small catchments with the changing hydrological dynamics, i.e. rapid response catchments. It is recommended that more case studies of catchments of distinct hydrologic behaviour, with diverse climatic conditions, and having various hydrological features, be considered.
NASA Astrophysics Data System (ADS)
Dogulu, N.; López López, P.; Solomatine, D. P.; Weerts, A. H.; Shrestha, D. L.
2014-09-01
In operational hydrology, estimation of predictive uncertainty of hydrological models used for flood modelling is essential for risk-based decision making for flood warning and emergency management. In the literature, there exists a variety of methods analyzing and predicting uncertainty. However, case studies comparing performance of these methods, most particularly predictive uncertainty methods, are limited. This paper focuses on two predictive uncertainty methods that differ in their methodological complexity: quantile regression (QR) and UNcertainty Estimation based on local Errors and Clustering (UNEEC), aiming at identifying possible advantages and disadvantages of these methods (both estimating residual uncertainty) based on their comparative performance. We test these two methods on several catchments (from the UK) that vary in their hydrological characteristics and models. Special attention is given to the errors for high flow/water level conditions. Furthermore, normality of model residuals is discussed in view of the clustering approach employed within the framework of the UNEEC method. It is found that basin lag time and forecast lead time have a great impact on quantification of uncertainty (in the form of two quantiles) and achievement of normality in model residuals' distribution. In general, uncertainty analysis results from different case studies indicate that both methods give similar results. However, it is also shown that the UNEEC method provides better performance than QR for small catchments with changing hydrological dynamics, i.e. rapid response catchments. We recommend that more case studies of catchments from regions of distinct hydrologic behaviour, with diverse climatic conditions, and having various hydrological features be tested.
NASA Astrophysics Data System (ADS)
Salonen, J. Sakari; Luoto, Miska; Alenius, Teija; Heikkilä, Maija; Seppä, Heikki; Telford, Richard J.; Birks, H. John B.
2014-03-01
We test and analyse a new calibration method, boosted regression trees (BRTs) in palaeoclimatic reconstructions based on fossil pollen assemblages. We apply BRTs to multiple Holocene and Lateglacial pollen sequences from northern Europe, and compare their performance with two commonly-used calibration methods: weighted averaging regression (WA) and the modern-analogue technique (MAT). Using these calibration methods and fossil pollen data, we present synthetic reconstructions of Holocene summer temperature, winter temperature, and water balance changes in northern Europe. Highly consistent trends are found for summer temperature, with a distinct Holocene thermal maximum at ca 8000-4000 cal. a BP, with a mean Tjja anomaly of ca +0.7 °C at 6 ka compared to 0.5 ka. We were unable to reconstruct reliably winter temperature or water balance, due to the confounding effects of summer temperature and the great between-reconstruction variability. We find BRTs to be a promising tool for quantitative reconstructions from palaeoenvironmental proxy data. BRTs show good performance in cross-validations compared with WA and MAT, can model a variety of taxon response types, find relevant predictors and incorporate interactions between predictors, and show some robustness with non-analogue fossil assemblages.
[Analysis of selected changes in project the penal code].
Berent, Jarosław; Jurczyk, Agnieszka P; Szram, Stefan
2002-01-01
In this paper the authors analyse selected proposed changes in the draft amendments to the penal code. Special attention is paid to the problem of the legality of the "comma" in art. 156 of the penal code. A review of court decisions on this matter is also presented.
Sparse Regression by Projection and Sparse Discriminant Analysis
Qi, Xin; Luo, Ruiyan; Carroll, Raymond J.; Zhao, Hongyu
2014-01-01
Recent years have seen active developments of various penalized regression methods, such as LASSO and elastic net, to analyze high dimensional data. In these approaches, the direction and length of the regression coefficients are determined simultaneously. Due to the introduction of penalties, the length of the estimates can be far from being optimal for accurate predictions. We introduce a new framework, regression by projection, and its sparse version to analyze high dimensional data. The unique nature of this framework is that the directions of the regression coefficients are inferred first, and the lengths and the tuning parameters are determined by a cross validation procedure to achieve the largest prediction accuracy. We provide a theoretical result for simultaneous model selection consistency and parameter estimation consistency of our method in high dimension. This new framework is then generalized such that it can be applied to principal components analysis, partial least squares and canonical correlation analysis. We also adapt this framework for discriminant analysis. Compared to the existing methods, where there is relatively little control of the dependency among the sparse components, our method can control the relationships among the components. We present efficient algorithms and related theory for solving the sparse regression by projection problem. Based on extensive simulations and real data analysis, we demonstrate that our method achieves good predictive performance and variable selection in the regression setting, and the ability to control relationships between the sparse components leads to more accurate classification. In supplemental materials available online, the details of the algorithms and theoretical proofs, and R codes for all simulation studies are provided. PMID:26345204
Tiedeman, C.R.; Kernodle, J.M.; McAda, D.P.
1998-01-01
This report documents the application of nonlinear-regression methods to a numerical model of ground-water flow in the Albuquerque Basin, New Mexico. In the Albuquerque Basin, ground water is the primary source for most water uses. Ground-water withdrawal has steadily increased since the 1940's, resulting in large declines in water levels in the Albuquerque area. A ground-water flow model was developed in 1994 and revised and updated in 1995 for the purpose of managing basin ground-water resources. In the work presented here, nonlinear-regression methods were applied to a modified version of the previous flow model. Goals of this work were to use regression methods to calibrate the model with each of six different configurations of the basin subsurface and to assess and compare optimal parameter estimates, model fit, and model error among the resulting calibrations. The Albuquerque Basin is one in a series of north trending structural basins within the Rio Grande Rift, a region of Cenozoic crustal extension. Mountains, uplifts, and fault zones bound the basin, and rock units within the basin include pre-Santa Fe Group deposits, Tertiary Santa Fe Group basin fill, and post-Santa Fe Group volcanics and sediments. The Santa Fe Group is greater than 14,000 feet (ft) thick in the central part of the basin. During deposition of the Santa Fe Group, crustal extension resulted in development of north trending normal faults with vertical displacements of as much as 30,000 ft. Ground-water flow in the Albuquerque Basin occurs primarily in the Santa Fe Group and post-Santa Fe Group deposits. Water flows between the ground-water system and surface-water bodies in the inner valley of the basin, where the Rio Grande, a network of interconnected canals and drains, and Cochiti Reservoir are located. Recharge to the ground-water flow system occurs as infiltration of precipitation along mountain fronts and infiltration of stream water along tributaries to the Rio Grande; subsurface
Eekhout, Iris; van de Wiel, Mark A; Heymans, Martijn W
2017-08-22
Multiple imputation is a recommended method to handle missing data. For significance testing after multiple imputation, Rubin's Rules (RR) are easily applied to pool parameter estimates. In a logistic regression model, to consider whether a categorical covariate with more than two levels significantly contributes to the model, different methods are available. Examples include pooling chi-square tests with multiple degrees of freedom, pooling likelihood ratio test statistics, and pooling based on the covariance matrix of the regression model. These methods are more complex than RR and are not available in all mainstream statistical software packages. In addition, they do not always obtain optimal power levels. We argue that the median of the p-values from the overall significance tests from the analyses on the imputed datasets can be used as an alternative pooling rule for categorical variables. The aim of the current study is to compare different methods to test a categorical variable for significance after multiple imputation on applicability and power. In a large simulation study, we demonstrated the control of the type I error and power levels of different pooling methods for categorical variables. This simulation study showed that for non-significant categorical covariates the type I error is controlled and the statistical power of the median pooling rule was at least equal to current multiple parameter tests. An empirical data example showed similar results. It can therefore be concluded that using the median of the p-values from the imputed data analyses is an attractive and easy to use alternative method for significance testing of categorical variables.
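The median pooling rule described in this abstract is simple enough to sketch directly. The snippet below assumes the overall significance test for the categorical covariate has already been run on each imputed dataset; the function name and the default significance level of 0.05 are illustrative choices, not taken from the study.

```python
from statistics import median


def median_p_rule(p_values, alpha=0.05):
    """Pool per-imputation p-values by taking their median.

    `p_values` holds one p-value per imputed dataset, each from the
    overall test of the categorical covariate in that dataset.
    Returns the pooled p-value and whether it falls below `alpha`.
    """
    pooled = median(p_values)
    return pooled, pooled < alpha


# Hypothetical p-values from five imputed datasets.
pooled_p, significant = median_p_rule([0.01, 0.04, 0.03, 0.20, 0.02])
```

With an even number of imputations, `statistics.median` averages the two middle p-values.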
The cross politics of Ecuador's penal state.
Garces, Chris
2010-01-01
This essay examines inmate "crucifixion protests" in Ecuador's largest prison during 2003-04. It shows how the preventively incarcerated, of whom there are thousands, managed to effectively denounce their extralegal confinement by embodying the violence of the Christian crucifixion story. This form of protest, I argue, simultaneously clarified and obscured the multiple layers of sovereign power that pressed down on urban crime suspects, who found themselves persecuted and forsaken both outside and within the space of the prison. Police enacting zero-tolerance policies in urban neighborhoods are thus a key part of the penal state, as are the politically threatened family members of the indicted, the sensationalized local media, distrustful neighbors, prison guards, and incarcerated mafia. The essay shows how the politico-theological performance of self-crucifixion responded to these internested forms of sovereign violence, and was briefly effective. The inmates' cross intervention hence provides a window into the way sovereignty works in the Ecuadorean penal state, drawing out how incarceration trends and new urban security measures interlink, and produce an array of victims.
Strong, Mark; Oakley, Jeremy E; Brennan, Alan; Breeze, Penny
2015-07-01
Health economic decision-analytic models are used to estimate the expected net benefits of competing decision options. The true values of the input parameters of such models are rarely known with certainty, and it is often useful to quantify the value to the decision maker of reducing uncertainty through collecting new data. In the context of a particular decision problem, the value of a proposed research design can be quantified by its expected value of sample information (EVSI). EVSI is commonly estimated via a 2-level Monte Carlo procedure in which plausible data sets are generated in an outer loop, and then, conditional on these, the parameters of the decision model are updated via Bayes rule and sampled in an inner loop. At each iteration of the inner loop, the decision model is evaluated. This is computationally demanding and may be difficult if the posterior distribution of the model parameters conditional on sampled data is hard to sample from. We describe a fast nonparametric regression-based method for estimating per-patient EVSI that requires only the probabilistic sensitivity analysis sample (i.e., the set of samples drawn from the joint distribution of the parameters and the corresponding net benefits). The method avoids the need to sample from the posterior distributions of the parameters and avoids the need to rerun the model. The only requirement is that sample data sets can be generated. The method is applicable with a model of any complexity and with any specification of model parameter distribution. We demonstrate in a case study the superior efficiency of the regression method over the 2-level Monte Carlo method.
[Legal probation of juvenile offenders after release from penal reformative training].
Urbaniok, Frank; Rossegger, Astrid; Fegert, Jörg; Rubertus, Michael; Endrass, Jérôme
2007-01-01
Over recent years, there has been an increase in adolescent delinquency in Germany and Switzerland. In this context, the episodic character of the majority of adolescent delinquency is usually pointed out; however, numerous studies show high re-offending rates for released adolescents. The goal of this study is to examine the legal probation of juvenile delinquents after release from penal reformative training. In this study, the legal probation of adolescents committed to the AEA Uitikon, in the Canton of Zurich, between 1974 and 1986 was scrutinized by examining extracts from their criminal record as of 2003. The period of catamnesis was thus between 17 and 29 years. Overall, 71% of offenders reoffended, 29% with a violent or sexual offence. Bivariate logistic regression showed that the kind of offence committed had no influence on the probability of recidivism. If commitment to the AEA was due to a single offence (as opposed to serial offences), the risk of recidivism was reduced by 71% (OR=0.29). The results of the study show that young delinquents sentenced and committed to penal reformative training have a high recidivism risk. Furthermore, the results point out the importance of the evaluation of the offense-preventive efficacy of penal measures.
NASA Astrophysics Data System (ADS)
Baraldi, Piero; Di Maio, Francesco; Turati, Pietro; Zio, Enrico
2015-08-01
In this work, we propose a modification of the traditional Auto Associative Kernel Regression (AAKR) method which enhances the signal reconstruction robustness, i.e., the capability of reconstructing abnormal signals to the values expected in normal conditions. The modification is based on the definition of a new procedure for the computation of the similarity between the present measurements and the historical patterns used to perform the signal reconstructions. The underlying conjecture for this is that malfunctions causing variations of a small number of signals are more frequent than those causing variations of a large number of signals. The proposed method has been applied to real normal condition data collected in an industrial plant for energy production. Its performance has been verified considering synthetic and real malfunctions. The obtained results show an improvement in the early detection of abnormal conditions and the correct identification of the signals responsible for triggering the detection.
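For orientation, the traditional AAKR reconstruction that this abstract takes as its starting point can be sketched as a kernel-weighted average of historical normal-condition patterns. This is a minimal sketch assuming a Gaussian kernel on the Euclidean distance with a hypothetical `bandwidth` parameter. It does not implement the modified similarity measure proposed by the authors, which the abstract does not specify.

```python
import math


def aakr_reconstruct(observation, history, bandwidth=1.0):
    """Reconstruct a signal vector from historical normal-condition patterns.

    Each historical pattern is weighted by a Gaussian kernel of its
    Euclidean distance to the current observation, and the
    reconstruction is the weighted average of the patterns.
    """
    weights = []
    for pattern in history:
        squared_distance = sum((o - p) ** 2 for o, p in zip(observation, pattern))
        weights.append(math.exp(-squared_distance / (2.0 * bandwidth ** 2)))
    total = sum(weights)
    if total == 0.0:
        raise ValueError("all kernel weights underflowed; increase the bandwidth")
    return [
        sum(w * pattern[j] for w, pattern in zip(weights, history)) / total
        for j in range(len(observation))
    ]
```

The residual between the observation and its reconstruction is what a monitoring scheme would then inspect, signal by signal, to detect abnormal conditions.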
Huang, Hai-Hui; Liu, Xiao-Ying; Liang, Yong
2016-01-01
Cancer classification and feature (gene) selection play an important role in knowledge discovery in genomic data. Although logistic regression is one of the most popular classification methods, it does not induce feature selection. In this paper, we presented a new hybrid L1/2+2 regularization (HLR) function, a linear combination of L1/2 and L2 penalties, to select the relevant genes in the logistic regression. The HLR approach inherits some fascinating characteristics from L1/2 (sparsity) and L2 (grouping effect where highly correlated variables are in or out of a model together) penalties. We also proposed a novel univariate HLR thresholding approach to update the estimated coefficients and developed the coordinate descent algorithm for the HLR penalized logistic regression model. The empirical results and simulations indicate that the proposed method is highly competitive amongst several state-of-the-art methods. PMID:27136190
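Since the HLR function is described as a linear combination of L1/2 and L2 penalties, its value for a given coefficient vector can be sketched as below. The parameterization by an overall strength `lam` and a mixing weight `alpha` is an assumption made for illustration; the abstract does not state how the two terms are weighted.

```python
def hlr_penalty(beta, lam, alpha):
    """Hybrid penalty: a weighted sum of L1/2 and L2 terms.

    `lam` scales the whole penalty and `alpha` in [0, 1] mixes the
    two parts (both names are hypothetical):
        lam * (alpha * sum(|b|^(1/2)) + (1 - alpha) * sum(b^2))
    """
    l_half_term = sum(abs(b) ** 0.5 for b in beta)
    l2_term = sum(b * b for b in beta)
    return lam * (alpha * l_half_term + (1.0 - alpha) * l2_term)
```

Under this form, `alpha = 1` leaves only the sparsity-inducing L1/2 term and `alpha = 0` leaves only the L2 term associated with the grouping effect.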
[Group Lasso Penalized Classifier for Diagnosis of Diseases with Categorical Data].
Wang, Jinjia; Xue, Fang
2015-10-01
The six kinds of erythemato-squamous diseases are common skin diseases, but their diagnosis has always been a problem. Quantitative data processing methods are not suitable for erythemato-squamous data because they are categorical qualitative data. This paper proposed a new method based on group lasso penalized classification for feature selection and classification of erythemato-squamous data with categorical qualitative data. First, the 33 categorical dimensions were dummy coded; the 34th dimension, age, was then discretized and dummy coded as well. Then the encoded data were grouped according to class group and variable group. Lastly, Group Lasso penalized classification was executed. The classification accuracy under 10-fold cross validation was 98.88% ± 0.0023%. Compared with other methods in the literature, this new method is simpler, more effective and efficient, and has stronger interpretability and stability.
Li, Y.; Graubard, B. I.; Huang, P.; Gastwirth, J. L.
2015-01-01
Determining the extent of a disparity, if any, between groups of people, for example, race or gender, is of interest in many fields, including public health for medical treatment and prevention of disease. An observed difference in the mean outcome between an advantaged group (AG) and disadvantaged group (DG) can be due to differences in the distribution of relevant covariates. The Peters–Belson (PB) method fits a regression model with covariates to the AG to predict, for each DG member, their outcome measure as if they had been from the AG. The difference between the mean predicted and the mean observed outcomes of DG members is the (unexplained) disparity of interest. We focus on applying the PB method to estimate the disparity based on binary/multinomial/proportional odds logistic regression models using data collected from complex surveys with more than one DG. Estimators of the unexplained disparity, an analytic variance–covariance estimator that is based on the Taylor linearization variance–covariance estimation method, as well as a Wald test for testing a joint null hypothesis of zero for unexplained disparities between two or more minority groups and a majority group, are provided. Simulation studies with data selected from simple random sampling and cluster sampling, as well as the analyses of disparity in body mass index in the National Health and Nutrition Examination Survey 1999–2004, are conducted. Empirical results indicate that the Taylor linearization variance–covariance estimation is accurate and that the proposed Wald test maintains the nominal level. PMID:25382235
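The core Peters–Belson calculation can be illustrated with a deliberately simplified sketch. It assumes a single covariate, an ordinary least-squares linear model, and unweighted data, whereas the paper works with logistic-type models and complex survey designs; all function names are hypothetical.

```python
def fit_simple_ols(x, y):
    """Fit y = intercept + slope * x by ordinary least squares."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return mean_y - slope * mean_x, slope


def peters_belson_disparity(x_ag, y_ag, x_dg, y_dg):
    """Unexplained disparity: mean predicted minus mean observed DG outcome.

    The model is fitted on the advantaged group (AG) only and then
    used to predict each disadvantaged-group (DG) member's outcome
    as if that member had belonged to the AG.
    """
    intercept, slope = fit_simple_ols(x_ag, y_ag)
    predicted = [intercept + slope * xi for xi in x_dg]
    return sum(predicted) / len(predicted) - sum(y_dg) / len(y_dg)
```

A disparity of zero under this sketch would mean that the covariate distribution alone accounts for the observed difference in mean outcomes.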
27 CFR 19.245 - Bonds and penal sums of bonds.
Code of Federal Regulations, 2010 CFR
2010-04-01
... 27 Alcohol, Tobacco Products and Firearms 1 2010-04-01 2010-04-01 false Bonds and penal sums of... Bonds and penal sums of bonds. The bonds, and the penal sums thereof, required by this subpart, are as follows: Penal Sum Type of bond Basis Minimum Maximum (a) Operations bond: (1) One plant bond— (i...
Asadpour-Zeynali, Karim; Soheili-Azad, Payam
2012-01-01
A differential pulse polarography (DPP) method for the simultaneous determination of 2-nitrophenol and 4-nitrophenol was proposed. It was found that under optimum experimental conditions (pH = 5, scan rate = 5 mV/s, pulse amplitude = -50 mV), 2-nitrophenol and 4-nitrophenol had well-defined polarographic reduction waves with peak potentials at -317 and -406 mV, respectively. In the mixture of the two compounds, overlapping polarographic peaks were observed. In this study, support vector regression (SVR) was applied to resolve the overlapped polarograms. Furthermore, a comparison was made between the performance of SVR and partial least squares (PLS) on the data set. The results demonstrated that SVR is a better-performing alternative for the analysis and modeling of DPP data than the commonly applied PLS technique. The proposed method was used for the determination of 2-nitrophenol and 4-nitrophenol in industrial waste water.
Mortality study of nickel-cadmium battery workers by the method of regression models in life tables.
Sorahan, T; Waterhouse, J A
1983-01-01
The mortality experienced by a cohort of 3025 nickel-cadmium battery workers during the period 1946-81 has been investigated. Occupational histories were described in terms of some 75 jobs: eight with "high", 14 with "moderate" or slight, and 53 with minimal exposure to cadmium oxide (hydroxide). The method of regression models in life tables (RMLT) was used to compare the estimated cadmium exposures (durations of exposed employment) of those dying from causes of interest with those of matching survivors in the same year of follow up, while controlling for sex, for year and age of starting employment, and for duration of employment. No new evidence of any association between occupational exposure to cadmium oxide (hydroxide) and cancer of the prostate was found. PMID:6871118
NASA Astrophysics Data System (ADS)
Yadav, Manish; Singh, Nitin Kumar
2017-08-01
Linear and non-linear regression methods were compared for selecting the optimum isotherm among the three most commonly used adsorption isotherms (Langmuir, Freundlich, and Redlich-Peterson), using experimental data on fluoride (F) sorption onto Bio-F at a solution temperature of 30 ± 1 °C. The coefficient of determination (r²) was used to select the best theoretical isotherm among the investigated ones. Four Langmuir linear equations were discussed, of which the most popular linear forms, Langmuir-1 and Langmuir-2, showed higher coefficients of determination (0.976 and 0.989) than the other Langmuir linear equations. Freundlich and Redlich-Peterson isotherms showed a better fit to the experimental data in the linear least-squares method, while in the non-linear method the Redlich-Peterson isotherm equation showed the best fit to the tested data set. The present study showed that the non-linear method could be a better way to obtain the isotherm parameters and represent the most suitable isotherm. The Redlich-Peterson isotherm was found to be the best representative (r² = 0.999) for this sorption system. It is also observed that the values of β are not close to unity, which means the isotherms are approaching the Freundlich but not the Langmuir isotherm.
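To make the linear route concrete, the sketch below fits the Langmuir isotherm q_e = q_m K C_e / (1 + K C_e) through one common linearization, C_e/q_e = C_e/q_m + 1/(K q_m), using ordinary least squares. That this corresponds to the form the abstract calls Langmuir-1 is an assumption, since labelling conventions vary, and the function name and data are hypothetical.

```python
def fit_langmuir_linearized(c_e, q_e):
    """Estimate (q_m, K) by regressing C_e/q_e on C_e.

    In the linearized form the slope equals 1/q_m and the intercept
    equals 1/(K * q_m), so q_m = 1/slope and K = slope/intercept.
    """
    y = [c / q for c, q in zip(c_e, q_e)]
    n = len(c_e)
    mean_x = sum(c_e) / n
    mean_y = sum(y) / n
    sxx = sum((x - mean_x) ** 2 for x in c_e)
    sxy = sum((x - mean_x) * (yi - mean_y) for x, yi in zip(c_e, y))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return 1.0 / slope, slope / intercept


# Noise-free illustrative data generated with q_m = 5 and K = 0.2.
concentrations = [1.0, 2.0, 5.0, 10.0]
uptakes = [5.0 * 0.2 * c / (1.0 + 0.2 * c) for c in concentrations]
q_m, k = fit_langmuir_linearized(concentrations, uptakes)
```

A non-linear fit would instead minimize the squared error in q_e directly. With noisy data the two routes generally give different parameter estimates because the linearizing transformation changes the error structure.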
SNP Selection in Genome-Wide Association Studies via Penalized Support Vector Machine with MAX Test
Kim, Jinseog; Kim, Dennis (Dong Hwan); Jung, Sin-Ho
2013-01-01
One of the main objectives of a genome-wide association study (GWAS) is to develop a prediction model for a binary clinical outcome using single-nucleotide polymorphisms (SNPs) which can be used for diagnostic and prognostic purposes and for better understanding of the relationship between the disease and SNPs. Penalized support vector machine (SVM) methods have been widely used toward this end. However, since investigators often ignore the genetic models of SNPs, the final model suffers a loss of efficiency in prediction of the clinical outcome. In order to overcome this problem, we propose a two-stage method in which the genetic models of each SNP are identified using the MAX test and then a prediction model is fitted using a penalized SVM method. We apply the proposed method to various penalized SVMs and compare the performance of SVMs using various penalty functions. The results from simulations and real GWAS data analysis show that the proposed method performs better than the prediction methods ignoring the genetic models in terms of prediction power and selectivity. PMID:24174989
Risser, Dennis W.; Thompson, Ronald E.; Stuckey, Marla H.
2008-01-01
A method was developed for making estimates of long-term, mean annual ground-water recharge from streamflow data at 80 streamflow-gaging stations in Pennsylvania. The method relates mean annual base-flow yield derived from the streamflow data (as a proxy for recharge) to the climatic, geologic, hydrologic, and physiographic characteristics of the basins (basin characteristics) by use of a regression equation. Base-flow yield is the base flow of a stream divided by the drainage area of the basin, expressed in inches of water basinwide. Mean annual base-flow yield was computed for the period of available streamflow record at continuous streamflow-gaging stations by use of the computer program PART, which separates base flow from direct runoff on the streamflow hydrograph. Base flow provides a reasonable estimate of recharge for basins where streamflow is mostly unaffected by upstream regulation, diversion, or mining. Twenty-eight basin characteristics were included in the exploratory regression analysis as possible predictors of base-flow yield. Basin characteristics found to be statistically significant predictors of mean annual base-flow yield during 1971-2000 at the 95-percent confidence level were (1) mean annual precipitation, (2) average maximum daily temperature, (3) percentage of sand in the soil, (4) percentage of carbonate bedrock in the basin, and (5) stream channel slope. The equation for predicting recharge was developed using ordinary least-squares regression. The standard error of prediction for the equation on log-transformed data was 9.7 percent, and the coefficient of determination was 0.80. The equation can be used to predict long-term, mean annual recharge rates for ungaged basins, providing that the explanatory basin characteristics can be determined and that the underlying assumption is accepted that base-flow yield derived from PART is a reasonable estimate of ground-water recharge rates. For example, application of the equation for 370
Dielectric function parameterization by penalized splines
NASA Astrophysics Data System (ADS)
Likhachev, Dmitriy V.
2017-06-01
In this article, we investigate the penalized spline (P-spline) approach to restrict flexibility of dielectric function parameterization by B-splines and prevent overfitting of the ellipsometric data. The penalty degree is easily controlled by a certain smoothing parameter. The P-spline approach offers a number of advantages over well-established B-spline parameterization. First of all, it typically uses an equidistant knot arrangement which simplifies the construction of the roughness penalties and makes it computationally efficient. Since P-splines possess the "power of the penalty" property, a selection of the number of knots is no longer crucial, as long as there is a minimum knot number to capture all significant spatial variability of the data curves. We demonstrate the proposed approach with a real-data application using ellipsometric spectra from an aluminum-coated sample.
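With equidistant knots, the roughness penalty in a P-spline fit is usually built from finite differences of adjacent B-spline coefficients. The sketch below computes such a penalty. The default difference order of 2 and the function names are assumptions for illustration, since the abstract does not state which order was used.

```python
def finite_differences(coefficients, order):
    """Apply the difference operator `order` times to a coefficient list."""
    values = list(coefficients)
    for _ in range(order):
        values = [b - a for a, b in zip(values, values[1:])]
    return values


def pspline_penalty(coefficients, smoothing, order=2):
    """Roughness penalty: smoothing * sum of squared order-d differences."""
    diffs = finite_differences(coefficients, order)
    return smoothing * sum(d * d for d in diffs)
```

This term is added to the data-misfit term when fitting. A larger `smoothing` value pulls the coefficient sequence toward a polynomial of degree `order - 1`, for which the penalty is exactly zero.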
Recovering Velocity Distributions Via Penalized Likelihood
NASA Astrophysics Data System (ADS)
Merritt, David
1997-07-01
Line-of-sight velocity distributions are crucial for unravelling the dynamics of hot stellar systems. We present a new formalism based on penalized likelihood for deriving such distributions from kinematical data, and evaluate the performance of two algorithms that extract N(V) from absorption-line spectra and from sets of individual velocities. Both algorithms are superior to existing ones in that the solutions are nearly unbiased even when the data are so poor that a great deal of smoothing is required. In addition, the discrete-velocity algorithm is able to remove a known distribution of measurement errors from the estimate of N(V). The formalism is used to recover the velocity distribution of stars in five fields near the center of the globular cluster omega Centauri.
Reconstruction of freehand 3D ultrasound based on kernel regression.
Chen, Xiankang; Wen, Tiexiang; Li, Xingmin; Qin, Wenjian; Lan, Donglai; Pan, Weizhou; Gu, Jia
2014-08-28
Freehand three-dimensional (3D) ultrasound offers flexibility, allowing clinicians to manipulate the ultrasound probe over the examined body surface with less constraint than other scanning protocols. Thus it is widely used in clinical diagnosis and image-guided surgery. However, as freehand scanning is subjective, the collected B-scan images are usually irregular and highly sparse. One of the key procedures in a freehand ultrasound imaging system is the volume reconstruction, which plays an important role in improving the reconstructed image quality. A novel freehand 3D ultrasound volume reconstruction method based on a kernel regression model is proposed in this paper. Our method consists of two steps: bin-filling and regression. Firstly, the bin-filling step is used to map each pixel in the sampled B-scan images to its corresponding voxel in the reconstructed volume data. Secondly, the regression step is used to make the nonparametric estimation for the whole volume data from the previously sampled sparse data. The kernel penalizes distance away from the current approximation center within a local neighborhood. To evaluate the quality and performance of our proposed kernel regression algorithm for freehand 3D ultrasound reconstruction, a phantom and the in-vivo liver of a human subject are scanned with our freehand 3D ultrasound imaging system. Root mean square error (RMSE) is used for the quantitative evaluation. Both the qualitative and quantitative experimental results demonstrate that our method can reconstruct images with fewer artifacts and higher quality. The proposed kernel regression based reconstruction method is capable of constructing volume data with improved accuracy from irregularly sampled sparse data for freehand 3D ultrasound imaging systems.
Wang, Huifang; Xiao, Bo; Wang, Mingyu; Shao, Ming'an
2013-01-01
Soil water retention parameters are critical to quantify flow and solute transport in vadose zone, while the presence of rock fragments remarkably increases their variability. Therefore a novel method for determining water retention parameters of soil-gravel mixtures is required. The procedure to generate such a model is based firstly on the determination of the quantitative relationship between the content of rock fragments and the effective saturation of soil-gravel mixtures, and then on the integration of this relationship with former analytical equations of water retention curves (WRCs). In order to find such relationships, laboratory experiments were conducted to determine WRCs of soil-gravel mixtures obtained with a clay loam soil mixed with shale clasts or pebbles in three size groups with various gravel contents. Data showed that the effective saturation of the soil-gravel mixtures with the same kind of gravels within one size group had a linear relation with gravel contents, and had a power relation with the bulk density of samples at any pressure head. Revised formulas for water retention properties of the soil-gravel mixtures are proposed to establish the water retention curved surface models of the power-linear functions and power functions. The analysis of the parameters obtained by regression and validation of the empirical models showed that they were acceptable by using either the measured data of separate gravel size group or those of all the three gravel size groups having a large size range. Furthermore, the regression parameters of the curved surfaces for the soil-gravel mixtures with a large range of gravel content could be determined from the water retention data of the soil-gravel mixtures with two representative gravel contents or bulk densities. Such revised water retention models are potentially applicable in regional or large scale field investigations of significantly heterogeneous media, where various gravel sizes and different gravel
Hirakawa, Akihiro; Hamada, Chikuma; Yoshimura, Isao
2012-01-01
When identifying the differentially expressed genes (DEGs) in microarray data, we often observe heteroscedasticity between groups and dependence among genes. Incorporating these factors is necessary for sample size calculation in microarray experiments. A penalized t-statistic is widely used to improve the identifiability of DEGs. We develop a formula to calculate sample size with dependence adjustment for the penalized t-statistic. Sample size is determined on the basis of overall power under certain conditions to maintain a certain false discovery rate. The usefulness of the proposed method is demonstrated by numerical studies using both simulated data and real data.
Xu, A; Zhang, Y; Ran, T; Liu, H; Lu, S; Xu, J; Xiong, X; Jiang, Y; Lu, T; Chen, Y
2015-01-01
Bruton's tyrosine kinase (BTK) plays a crucial role in B-cell activation and development, and has emerged as a new molecular target for the treatment of autoimmune diseases and B-cell malignancies. In this study, two- and three-dimensional quantitative structure-activity relationship (2D and 3D-QSAR) analyses were performed on a series of pyridine and pyrimidine-based BTK inhibitors by means of genetic algorithm optimized multivariate adaptive regression spline (GA-MARS) and comparative molecular similarity index analysis (CoMSIA) methods. Here, we propose a modified MARS algorithm to develop 2D-QSAR models. The top ranked models showed satisfactory statistical results (2D-QSAR: Q(2) = 0.884, r(2) = 0.929, r(2)pred = 0.878; 3D-QSAR: q(2) = 0.616, r(2) = 0.987, r(2)pred = 0.905). Key descriptors selected by 2D-QSAR were in good agreement with the conclusions of 3D-QSAR, and the 3D-CoMSIA contour maps facilitated interpretation of the structure-activity relationship. A new molecular database was generated by molecular fragment replacement (MFR) and further evaluated with GA-MARS and CoMSIA prediction. Twenty-five pyridine and pyrimidine derivatives as novel potential BTK inhibitors were finally selected for further study. These results also demonstrated that our method can be a very efficient tool for the discovery of novel potent BTK inhibitors.
Wang, Ji-Hua; Huang, Wen-Jiang; Lao, Cai-Lian; Zhang, Lu-Da; Luo, Chang-Bing; Wang, Tao; Liu, Liang-Yun; Song, Xiao-Yu; Ma, Zhi-Hong
2007-07-01
With the widespread application of remote sensing (RS) in agriculture, monitoring and prediction of crop nutrition status has attracted the attention of many scientists. Foliar nitrogen (N) is one of the most important nutrients for plant growth, and the vertical leaf N gradient is an important indicator of crop nutrition status. Investigations have been made on N vertical distribution to describe the growth status of winter wheat. Results indicate that N decreases markedly from the canopy top to the ground surface. The objective of this study was to examine the inversion of N vertical distribution from the canopy reflectance spectrum using the partial least squares regression (PLS) method. PLS was selected for the inversion of N in the upper, middle and lower layers. To improve the accuracy of prediction, N in the upper layer as well as in the middle and bottom layers should be taken into consideration when crop nutrition status is appraised from RS data. The models established from data observed in 2001-2002 were validated with data from 2003-2004. The inversion precision and error were acceptable. The study provides a theoretical basis for wide-area, non-destructive variable-rate nitrogen application in winter wheat based on the canopy reflectance spectrum.
A Guide to Assistance In Penal and Correctional Institutions
ERIC Educational Resources Information Center
Walker, Bailus; Gordon, Theodore
1973-01-01
Lists the more significant federal assistance programs relating to penal and correctional reform which may serve as a guide for environmental health specialists who are beginning to assume much broader responsibilities in institutional environmental quality. (JR)
NASA Astrophysics Data System (ADS)
Clement, Dominic; Gruber, Nicolas
2017-04-01
Major progress has been made by the international community (e.g., GO-SHIP, IOCCP, IMBER/SOLAS carbon working groups) in recent years by collecting and providing homogenized datasets for carbon and other biogeochemical variables in the surface ocean (SOCAT) and interior ocean (GLODAPv2). Together with previous efforts, this has enabled the community to develop methods to assess changes in the ocean carbon cycle through time. Of particular interest is the determination of the decadal change in the anthropogenic CO2 inventory solely based on in-situ measurements from at least two time periods in the interior ocean. However, all such methods face the difficulty of a dataset that is sparse in both space and time, making the use of appropriate interpolation techniques in time and space a crucial element of any method. Here we present a new method based on the parameter C*, whose variations reflect the total change in dissolved inorganic carbon (DIC) driven by the exchange of CO2 across the air-sea interface. We apply the extended Multiple Linear Regression method (Friis et al., 2005) on C* in order (1) to calculate the change in anthropogenic CO2 from the original DIC/C* measurements, and (2) to interpolate the result onto a spatial grid using other biogeochemical variables (T, S, AOU, etc.). These calculations are made on isopycnal slabs across whole ocean basins. In combination with the transient steady state assumption (Tanhua et al., 2007) providing a temporal correction factor, we address the spatial and temporal interpolation challenges. Using synthetic data from a hindcast simulation with a global ocean biogeochemistry model (NCAR-CCSM with BEC), we tested the method for robustness and accuracy in determining ΔCant. We will present data-based results for all ocean basins, with the most recent estimate of a global uptake of 32±6 Pg C between 1994 and 2007, indicating an uptake rate of 2.5±0.5 Pg C yr-1 for this time period. These results are compared with regional and
Ismail, B; Anil, Manjula
2014-01-01
With modernization, rapid urbanization and industrialization, the price that society is paying is a tremendous load of "Non-Communicable" diseases, referred to as "Lifestyle Diseases". Coronary artery disease (CAD), one of the lifestyle diseases that manifests at a younger age, can have devastating consequences for an individual, the family and society. These diseases can be prevented by studying the risk factors and analyzing and interpreting them using various statistical methods. The aim was to determine, using logistic regression, the relative contribution of independent variables according to the intensity of their influence (proven by statistical significance) upon the occurrence of values of the dependent cardiovascular risk scores. Additionally, we wanted to assess whether nonparametric smoothing of the cardiovascular risk scores can be used as a better statistical method than the existing methods. The study included 498 students in the age group of 18-29 years. Overweight (BMI 23-25 kg/m(2)) and obesity (BMI > 25 kg/m(2)) were found among individuals of 22 years and above. Non-smokers had decreased odds (OR = 0.041, CI = 0.015-0.107), while increases in LDL cholesterol (OR = 1.05, CI = 1.021-1.055) and BMI (OR = 1.42, CI = 1.244-1.631) contributed significantly to the risk of CVD. Local-resident students had decreased odds of developing CVD in the next 10 years (OR = 0.27, CI = 0.092-0.799) compared with students residing in hostels or as paying guests. Copyright © 2014 Cardiological Society of India. Published by Elsevier B.V. All rights reserved.
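As a reading aid for the odds ratios quoted above, the following sketch shows how an odds ratio and its confidence interval relate to a logistic regression coefficient. It assumes a Wald-type 95% interval (z = 1.96) that is symmetric on the log-odds scale; the abstract does not say how its intervals were computed, and the helper names are hypothetical.

```python
import math

Z_95 = 1.96  # assumed normal quantile for a 95% Wald interval

def odds_ratio_ci(beta, se, z=Z_95):
    """Convert a log-odds coefficient and its standard error to (OR, lower, upper)."""
    return math.exp(beta), math.exp(beta - z * se), math.exp(beta + z * se)

def se_from_ci(lower, upper, z=Z_95):
    """Back out the standard error of the coefficient from a reported OR interval."""
    return (math.log(upper) - math.log(lower)) / (2 * z)

# Reported BMI effect: OR = 1.42, CI = 1.244-1.631.
beta_bmi = math.log(1.42)
se_bmi = se_from_ci(1.244, 1.631)
or_bmi, lo_bmi, hi_bmi = odds_ratio_ci(beta_bmi, se_bmi)
```

Under these assumptions the recomputed interval agrees with the reported one only up to rounding, since the published OR and limits are themselves rounded.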
De Steur, Hans; Wesana, Joshua; Blancquaert, Dieter; Van Der Straeten, Dominique; Gellynck, Xavier
2017-02-01
Following the growing evidence on biofortification as a cost-effective micronutrient strategy, various researchers have elicited consumers' willingness to pay (WTP) for biofortified crops in an effort to justify and determine their adoption. This review presents a meta-analysis of WTP studies on biofortified foods, either developed through conventional breeding or using genetic modification technology. On the basis of 122 estimates from 23 studies (9507 respondents), consumers are generally willing to pay 21.3% more for biofortified crops. Because WTP estimates are often determined through different valuation methods and procedures, a meta-regression was carried out to examine the role of potential determinants. Aside from contextual factors, such as type of food crop, target nutrient, and region (but not breeding technique), various methodological factors significantly influence premiums, including the type of respondent, nature of the study, study environment, participation fee, and provided information. The findings allow researchers to better anticipate potential methodological biases when examining WTP for (biofortified) foods, while it gives policy makers a broad understanding of the potential demand for different biofortified crops in various settings. © 2016 New York Academy of Sciences.
Akita, Yasuyuki; Baldasano, Jose M; Beelen, Rob; Cirach, Marta; de Hoogh, Kees; Hoek, Gerard; Nieuwenhuijsen, Mark; Serre, Marc L; de Nazelle, Audrey
2014-04-15
In recognition that intraurban exposure gradients may be as large as between-city variations, recent air pollution epidemiologic studies have become increasingly interested in capturing within-city exposure gradients. In addition, because of the rapidly accumulating health data, recent studies also need to handle large study populations distributed over large geographic domains. Even though several modeling approaches have been introduced, a consistent modeling framework capturing within-city exposure variability and applicable to large geographic domains is still missing. To address these needs, we proposed a modeling framework based on the Bayesian Maximum Entropy method that integrates monitoring data and outputs from existing air quality models based on Land Use Regression (LUR) and Chemical Transport Models (CTM). The framework was applied to estimate the yearly average NO2 concentrations over the region of Catalunya in Spain. By jointly accounting for the global scale variability in the concentration from the output of CTM and the intraurban scale variability through LUR model output, the proposed framework outperformed more conventional approaches.
Cao, M H; Adeola, O
2016-02-01
The energy values of poultry byproduct meal (PBM) and animal-vegetable oil blend (A-V blend) were determined in 2 experiments with 288 broiler chickens from d 19 to 25 post hatching. The birds were fed a starter diet from d 0 to 19 post hatching. In each experiment, 144 birds were grouped by weight into 8 replicates of cages with 6 birds per cage. There were 3 diets in each experiment consisting of one reference diet (RD) and 2 test diets (TD). The TD contained 2 levels of PBM (Exp. 1) or A-V blend (Exp. 2) that replaced the energy sources in the RD at 50 or 100 g/kg (Exp. 1) or 40 or 80 g/kg (Exp. 2) in such a way that the same ratio was maintained for energy ingredients across experimental diets. The ileal digestible energy (IDE), ME, and MEn of PBM and A-V blend were determined by the regression method. Dry matter of PBM and A-V blend was 984 and 999 g/kg; the gross energies were 5,284 and 9,604 kcal/kg of DM, respectively. Addition of PBM to the RD in Exp. 1 linearly decreased (P < 0.05) DM, ileal and total tract of DM, energy and nitrogen digestibilities and utilization. In Exp. 2, addition of A-V blend to the RD linearly increased (P < 0.001) ileal digestibilities and total tract utilization of DM, energy and nitrogen as well as IDE, ME, and MEn. Regressions of PBM-associated IDE, ME, or MEn intake in kcal against PBM intake were: IDE = 3,537x + 4.953, r(2) = 0.97; ME = 3,805x + 1.279, r(2) = 0.97; MEn = 3,278x + 0.164, r(2) = 0.90; and for A-V blend: IDE = 10,616x + 7.350, r(2) = 0.96; ME = 10,121x + 0.447, r(2) = 0.99; MEn = 10,124x + 2.425, r(2) = 0.99. These data indicate that the respective IDE, ME, and MEn values (kcal/kg of DM) were 3,537, 3,805, and 3,278 for PBM, and 10,616, 10,121, and 10,124 for A-V blend.
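The regression method referred to above reads the energy value of a test ingredient off the slope of ingredient-associated energy intake (kcal) regressed on ingredient intake (kg). The sketch below fits that line by ordinary least squares. The intake values are hypothetical and are generated from the reported IDE equation for PBM purely to show the slope being recovered; they are not the study's data.

```python
def ols_fit(x, y):
    """Ordinary least squares for y = slope * x + intercept."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept

# Hypothetical PBM intakes in kg of DM per bird (illustrative values only).
pbm_intake_kg = [0.00, 0.02, 0.04, 0.06]
# Synthetic PBM-associated IDE intake in kcal, built from IDE = 3,537x + 4.953.
ide_intake_kcal = [3537 * x + 4.953 for x in pbm_intake_kg]

slope, intercept = ols_fit(pbm_intake_kg, ide_intake_kcal)
# The slope is interpreted as the IDE of PBM in kcal/kg of DM.
```

With real cage-level data the points scatter around the line, and the r(2) values quoted in the abstract describe how tightly they do so.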
Lopiano, Kenneth K; Young, Linda J; Gotway, Carol A
2014-09-01
Spatially referenced datasets arising from multiple sources are routinely combined to assess relationships among various outcomes and covariates. The geographical units associated with the data, such as the geographical coordinates or areal-level administrative units, are often spatially misaligned, that is, observed at different locations or aggregated over different geographical units. As a result, the covariate is often predicted at the locations where the response is observed. The method used to align disparate datasets must be accounted for when subsequently modeling the aligned data. Here we consider the case where kriging is used to align datasets in point-to-point and point-to-areal misalignment problems when the response variable is non-normally distributed. If the relationship is modeled using generalized linear models, the additional uncertainty induced from using the kriging mean as a covariate introduces a Berkson error structure. In this article, we develop a pseudo-penalized quasi-likelihood algorithm to account for the additional uncertainty when estimating regression parameters and associated measures of uncertainty. The method is applied to a point-to-point example assessing the relationship between low-birth weights and PM2.5 levels after the onset of the largest wildfire in Florida history, the Bugaboo scrub fire. A point-to-areal misalignment problem is presented where the relationship between asthma events in Florida's counties and PM2.5 levels after the onset of the fire is assessed. Finally, the method is evaluated using a simulation study. Our results indicate the method performs well in terms of coverage for 95% confidence intervals and naive methods that ignore the additional uncertainty tend to underestimate the variability associated with parameter estimates. The underestimation is most profound in Poisson regression models.
van Teunenbroek, A; Stijnen, T; Otten, B; de Muinck Keizer-Schrama, S; Naeraa, R W; Rongen-Westerlaken, C; Drop, S
1996-04-01
A total of 235 measurement points of 57 Dutch women with Turner's syndrome (TS), including women with spontaneous menarche and oestrogen treatment, served to develop a new Turner-specific final height (FH) prediction method (PTS). Analogous to the Tanner and Whitehouse mark 2 method (TW) for normal children, smoothed regression coefficients are tabulated for PTS for height (H), chronological age (CA) and bone age (BA), both TW RUS and Greulich and Pyle (GP). Comparison between all methods on 40 measurement points of 21 Danish TS women showed small mean prediction errors (predicted minus observed FH) and corresponding standard deviation (ESD) of both PTSRUS and PTSGP, in particular at the "younger" ages. Comparison between existing methods on the Dutch data indicated a tendency to overpredict FH. Before the CA of 9 years the mean prediction errors of the Bayley and Pinneau and TW methods were markedly higher compared with the other methods. Overall, the simplest methods--projected height (PAH) and its modification (mPAH)--were remarkably good at most ages. Although the validity of PTSRUS and PTSGP remains to be tested below the age of 6 years, both gave small mean prediction errors and a high accuracy. FH prediction in TS is important in the consideration of growth-promoting therapy or in the evaluation of its effects.
Numerical modeling of flexible insect wings using volume penalization
NASA Astrophysics Data System (ADS)
Engels, Thomas; Kolomenskiy, Dmitry; Schneider, Kai; Sesterhenn, Joern
2012-11-01
We consider the effects of chordwise flexibility on the aerodynamic performance of insect flapping wings. We developed a numerical method for modeling viscous fluid flows past moving deformable foils. It extends on the previously reported model for flows past moving rigid wings (J Comput Phys 228, 2009). The two-dimensional Navier-Stokes equations are solved using a Fourier pseudo-spectral method with the no-slip boundary conditions imposed by the volume penalization method. The deformable wing section is modeled using a non-linear beam equation. We performed numerical simulations of heaving flexible plates. The results showed that the optimal stroke frequency, which maximizes the mean thrust, is lower than the resonant frequency, in agreement with the experiments by Ramananarivo et al. (PNAS 108(15), 2011). The oscillatory part of the force only increases in amplitude when the frequency increases, and at the optimal frequency it is about 3 times larger than the mean force. We also study aerodynamic interactions between two heaving flexible foils. This flow configuration corresponds to the wings of dragonflies. We explore the effects of the phase difference and spacing between the fore- and hind-wing.
NASA Astrophysics Data System (ADS)
Vozinaki, Anthi Eirini K.; Karatzas, George P.; Sibetheros, Ioannis A.; Varouchakis, Emmanouil A.
2014-05-01
Damage curves are the most significant component of the flood loss estimation models. Their development is quite complex. Two types of damage curves exist, historical and synthetic curves. Historical curves are developed from historical loss data from actual flood events. However, due to the scarcity of historical data, synthetic damage curves can be alternatively developed. Synthetic curves rely on the analysis of expected damage under certain hypothetical flooding conditions. A synthetic approach was developed and presented in this work for the development of damage curves, which are subsequently used as the basic input to a flood loss estimation model. A questionnaire-based survey took place among practicing and research agronomists, in order to generate rural loss data based on the responders' loss estimates, for several flood condition scenarios. In addition, a similar questionnaire-based survey took place among building experts, i.e. civil engineers and architects, in order to generate loss data for the urban sector. By answering the questionnaire, the experts were in essence expressing their opinion on how damage to various crop types or building types is related to a range of values of flood inundation parameters, such as floodwater depth and velocity. However, the loss data compiled from the completed questionnaires were not sufficient for the construction of workable damage curves; to overcome this problem, a Weighted Monte Carlo method was implemented, in order to generate extra synthetic datasets with statistical properties identical to those of the questionnaire-based data. The data generated by the Weighted Monte Carlo method were processed via Logistic Regression techniques in order to develop accurate logistic damage curves for the rural and the urban sectors. A Python-based code was developed, which combines the Weighted Monte Carlo method and the Logistic Regression analysis into a single code (WMCLR Python code). Each WMCLR code execution
Nannings, Barry; Abu-Hanna, Ameen; de Jonge, Evert
2008-04-01
To apply the Patient Rule Induction Method (PRIM) to identify very elderly Intensive Care (IC) patients at high risk of mortality, and compare the results with those of a conventional logistic regression model. A database containing all 12,993 consecutive admissions of patients aged at least 80 between January 1997 and October 2005 from intensive care units (n=33) of mixed type taking part in the National Intensive Care Evaluation (NICE) registry. Demographic, diagnostic, physiologic, laboratory, discharge and prognostic score data were collected. After application of the SAPS II inclusion criteria 6617 patients remained. In these data we searched PRIM subgroups requiring at least 85% mortality and coverage of at least 3% of the patients. Equally sized subgroups were derived from a recalibrated (second level customization) Simplified Acute Physiology Score II model, where new coefficients were fitted. Subgroups were compared on an independent validation set using the positive predictive value (PPV), here equaling the subgroup mortality. We identified four subgroups with a positive predictive value (PPV) of 92%, 90%, 87% and 87%, covering, respectively, 3%, 3.5%, 7% and 10% of the patients in the validation set. Urine production, lowest pH, lowest systolic blood pressure, mechanical ventilation, all measured within 24 h after admission, and admission type and Glasgow Coma Score were used to define these subgroups. SAPS and PRIM subgroups had equal PPVs. PRIM successfully identified high-risk subgroups. The subgroups compare in performance to SAPS II, but require less data to collect, result in more homogenous groups and are likely to be more useful for decision makers.
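The two quantities used above to qualify a subgroup, its positive predictive value (here the subgroup mortality) and its coverage, can be computed as in the following sketch. This only evaluates a given box-shaped rule; it is not the PRIM peeling procedure itself. The patient records, field names, and thresholds are hypothetical.

```python
def evaluate_subgroup(patients, rule):
    """Return (ppv, coverage) for the subgroup selected by `rule`.

    ppv: fraction of subgroup members who died (subgroup mortality).
    coverage: fraction of all patients that fall inside the subgroup.
    """
    members = [p for p in patients if rule(p)]
    coverage = len(members) / len(patients)
    ppv = sum(p["died"] for p in members) / len(members) if members else float("nan")
    return ppv, coverage

# Hypothetical records; values are illustrative, not registry data.
patients = [
    {"lowest_ph": 7.05, "ventilated": True, "died": 1},
    {"lowest_ph": 7.08, "ventilated": True, "died": 1},
    {"lowest_ph": 7.02, "ventilated": True, "died": 0},
    {"lowest_ph": 7.30, "ventilated": True, "died": 0},
    {"lowest_ph": 7.35, "ventilated": False, "died": 0},
    {"lowest_ph": 7.40, "ventilated": False, "died": 1},
    {"lowest_ph": 7.09, "ventilated": False, "died": 0},
    {"lowest_ph": 7.38, "ventilated": True, "died": 0},
]

# A box rule of the kind PRIM produces: a conjunction of simple conditions.
def low_ph_and_ventilated(p):
    return p["lowest_ph"] < 7.10 and p["ventilated"]

ppv, coverage = evaluate_subgroup(patients, low_ph_and_ventilated)
```

In the study, a candidate subgroup would be retained only if its mortality reached at least 85% and its coverage at least 3%.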
Jen, Min-Hua; Bottle, Alex; Kirkwood, Graham; Johnston, Ron; Aylin, Paul
2011-09-01
We have previously described a system for monitoring a number of healthcare outcomes using case-mix adjustment models. It is desirable to automate the model fitting process in such a system if monitoring covers a large number of outcome measures or subgroup analyses. Our aim was to compare the performance of three different variable selection strategies: "manual", "automated" backward elimination and re-categorisation, and including all variables at once, irrespective of their apparent importance, with automated re-categorisation. Logistic regression models for predicting in-hospital mortality and emergency readmission within 28 days were fitted to an administrative database for 78 diagnosis groups and 126 procedures from 1996 to 2006 for National Health Services hospital trusts in England. The performance of models was assessed with Receiver Operating Characteristic (ROC) c statistics, (measuring discrimination) and Brier score (assessing the average of the predictive accuracy). Overall, discrimination was similar for diagnoses and procedures and consistently better for mortality than for emergency readmission. Brier scores were generally low overall (showing higher accuracy) and were lower for procedures than diagnoses, with a few exceptions for emergency readmission within 28 days. Among the three variable selection strategies, the automated procedure had similar performance to the manual method in almost all cases except low-risk groups with few outcome events. For the rapid generation of multiple case-mix models we suggest applying automated modelling to reduce the time required, in particular when examining different outcomes of large numbers of procedures and diseases in routinely collected administrative health data.
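The two performance measures named above can be sketched in a few lines. The c statistic is computed here as the fraction of (event, non-event) pairs in which the event received the higher predicted risk, with ties counted as one half; the Brier score is the mean squared difference between predicted risk and observed outcome. The predictions and outcomes are hypothetical.

```python
def brier_score(probs, outcomes):
    """Mean squared error between predicted probabilities and 0/1 outcomes."""
    return sum((p - y) ** 2 for p, y in zip(probs, outcomes)) / len(probs)

def c_statistic(probs, outcomes):
    """Area under the ROC curve via pairwise concordance (ties count 0.5)."""
    events = [p for p, y in zip(probs, outcomes) if y == 1]
    non_events = [p for p, y in zip(probs, outcomes) if y == 0]
    concordant = 0.0
    for pe in events:
        for pn in non_events:
            if pe > pn:
                concordant += 1.0
            elif pe == pn:
                concordant += 0.5
    return concordant / (len(events) * len(non_events))

# Hypothetical predicted risks and observed in-hospital deaths.
predicted = [0.9, 0.8, 0.3, 0.2]
observed = [1, 0, 1, 0]

brier = brier_score(predicted, observed)
c_stat = c_statistic(predicted, observed)
```

A lower Brier score indicates more accurate predictions, while a c statistic of 0.5 corresponds to no discrimination and 1.0 to perfect discrimination. The pairwise loop is quadratic in the number of records, so a rank-based formula would be preferable on a large administrative database.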
Ferragina, A.; de los Campos, G.; Vazquez, A. I.; Cecchinato, A.; Bittante, G.
2017-01-01
The aim of this study was to assess the performance of Bayesian models commonly used for genomic selection to predict “difficult-to-predict” dairy traits, such as milk fatty acid (FA) expressed as percentage of total fatty acids, and technological properties, such as fresh cheese yield and protein recovery, using Fourier-transform infrared (FTIR) spectral data. Our main hypothesis was that Bayesian models that can estimate shrinkage and perform variable selection may improve our ability to predict FA traits and technological traits above and beyond what can be achieved using the current calibration models (e.g., partial least squares, PLS). To this end, we assessed a series of Bayesian methods and compared their prediction performance with that of PLS. The comparison between models was done using the same sets of data (i.e., same samples, same variability, same spectral treatment) for each trait. Data consisted of 1,264 individual milk samples collected from Brown Swiss cows for which gas chromatographic FA composition, milk coagulation properties, and cheese-yield traits were available. For each sample, 2 spectra in the infrared region from 5,011 to 925 cm−1 were available and averaged before data analysis. Three Bayesian models: Bayesian ridge regression (Bayes RR), Bayes A, and Bayes B, and 2 reference models: PLS and modified PLS (MPLS) procedures, were used to calibrate equations for each of the traits. The Bayesian models used were implemented in the R package BGLR (http://cran.r-project.org/web/packages/BGLR/index.html), whereas the PLS and MPLS were those implemented in the WinISI II software (Infrasoft International LLC, State College, PA). Prediction accuracy was estimated for each trait and model using 25 replicates of a training-testing validation procedure. Compared with PLS, which is currently the most widely used calibration method, MPLS and the 3 Bayesian methods showed significantly greater prediction accuracy. Accuracy increased in moving
Ferragina, A; de los Campos, G; Vazquez, A I; Cecchinato, A; Bittante, G
2015-11-01
The aim of this study was to assess the performance of Bayesian models commonly used for genomic selection to predict "difficult-to-predict" dairy traits, such as milk fatty acid (FA) expressed as percentage of total fatty acids, and technological properties, such as fresh cheese yield and protein recovery, using Fourier-transform infrared (FTIR) spectral data. Our main hypothesis was that Bayesian models that can estimate shrinkage and perform variable selection may improve our ability to predict FA traits and technological traits above and beyond what can be achieved using the current calibration models (e.g., partial least squares, PLS). To this end, we assessed a series of Bayesian methods and compared their prediction performance with that of PLS. The comparison between models was done using the same sets of data (i.e., same samples, same variability, same spectral treatment) for each trait. Data consisted of 1,264 individual milk samples collected from Brown Swiss cows for which gas chromatographic FA composition, milk coagulation properties, and cheese-yield traits were available. For each sample, 2 spectra in the infrared region from 5,011 to 925 cm(-1) were available and averaged before data analysis. Three Bayesian models: Bayesian ridge regression (Bayes RR), Bayes A, and Bayes B, and 2 reference models: PLS and modified PLS (MPLS) procedures, were used to calibrate equations for each of the traits. The Bayesian models used were implemented in the R package BGLR (http://cran.r-project.org/web/packages/BGLR/index.html), whereas the PLS and MPLS were those implemented in the WinISI II software (Infrasoft International LLC, State College, PA). Prediction accuracy was estimated for each trait and model using 25 replicates of a training-testing validation procedure. Compared with PLS, which is currently the most widely used calibration method, MPLS and the 3 Bayesian methods showed significantly greater prediction accuracy. Accuracy increased in moving from
NASA Astrophysics Data System (ADS)
Choi, Giehae; Bell, Michelle L.; Lee, Jong-Tae
2017-04-01
The land-use regression (LUR) approach to estimate the levels of ambient air pollutants is becoming popular due to its high validity in predicting small-area variations. However, only a few studies have been conducted in Asian countries, and much less research has been conducted on comparing the performances and applied estimates of different exposure assessments including LUR. The main objectives of the current study were to conduct nitrogen dioxide (NO2) exposure assessment with four methods including LUR in the Republic of Korea, to compare the model performances, and to estimate the empirical NO2 exposures of a cohort. The study population was defined as the year 2010 participants of a government-supported cohort established for bio-monitoring in Ulsan, Republic of Korea. The annual ambient NO2 exposures of the 969 study participants were estimated with LUR, nearest station, inverse distance weighting, and ordinary kriging. Modeling was based on the annual NO2 average, traffic-related data, land-use data, and altitude of the 13 regularly monitored stations. The final LUR model indicated that area of transportation, distance to residential area, and area of wetland were important predictors of NO2. The LUR model explained 85.8% of the variation observed in the 13 monitoring stations of the year 2009. The LUR model outperformed the others based on leave-one out cross-validation comparing the correlations and root-mean square error. All NO2 estimates ranged from 11.3-18.0 ppb, with that of LUR having the widest range. The NO2 exposure levels of the residents differed by demographics. However, the average was below the national annual guidelines of the Republic of Korea (30 ppb). The LUR models showed high performances in an industrial city in the Republic of Korea, despite the small sample size and limited data. Our findings suggest that the LUR method may be useful in similar settings in Asian countries where the target region is small and availability of data is
Wan, Jian; Chen, Yi-Chieh; Morris, A Julian; Thennadil, Suresh N
2017-07-01
Near-infrared (NIR) spectroscopy is being widely used in various fields ranging from pharmaceutics to the food industry for analyzing chemical and physical properties of the substances concerned. Its advantages over other analytical techniques include available physical interpretation of spectral data, nondestructive nature and high speed of measurements, and little or no need for sample preparation. The successful application of NIR spectroscopy relies on three main aspects: pre-processing of spectral data to eliminate nonlinear variations due to temperature, light scattering effects and many others, selection of those wavelengths that contribute useful information, and identification of suitable calibration models using linear/nonlinear regression. Several methods have been developed for each of these three aspects and many comparative studies of different methods exist for an individual aspect or some combinations. However, there is still a lack of comparative studies for the interactions among these three aspects, which can shed light on what role each aspect plays in the calibration and how to combine various methods of each aspect together to obtain the best calibration model. This paper aims to provide such a comparative study based on four benchmark data sets using three typical pre-processing methods, namely, orthogonal signal correction (OSC), extended multiplicative signal correction (EMSC) and optical path-length estimation and correction (OPLEC); two existing wavelength selection methods, namely, stepwise forward selection (SFS) and genetic algorithm optimization combined with partial least squares regression for spectral data (GAPLSSP); four popular regression methods, namely, partial least squares (PLS), least absolute shrinkage and selection operator (LASSO), least squares support vector machine (LS-SVM), and Gaussian process regression (GPR). The comparative study indicates that, in general, pre-processing of spectral data can play a significant
Li, Ji; Gray, B.R.; Bates, D.M.
2008-01-01
Partitioning the variance of a response by design levels is challenging for binomial and other discrete outcomes. Goldstein (2003) proposed four definitions for variance partitioning coefficients (VPC) under a two-level logistic regression model. In this study, we explicitly derived formulae for multi-level logistic regression models and subsequently studied the distributional properties of the calculated VPCs. Using simulations and a vegetation dataset, we demonstrated associations between different VPC definitions, the importance of methods for estimating VPCs (by comparing VPCs obtained using Laplace and penalized quasilikelihood methods), and bivariate dependence between VPCs calculated at different levels. Such an empirical study lends immediate support to wider applications of VPC in scientific data analysis.
NASA Astrophysics Data System (ADS)
Grégoire, G.
2014-12-01
This chapter deals with multiple linear regression; that is, we investigate the situation where the mean of a variable depends linearly on a set of covariables. The noise is assumed to be Gaussian. We develop the least squares method to obtain the parameter estimators and estimates of their precision. This leads to the design of confidence intervals, prediction intervals, global tests, individual tests and, more generally, tests of submodels defined by linear constraints. Methods for model choice and variable selection, measures of the quality of the fit, residual analysis, and diagnostic methods are presented. Finally, the identification of departures from the model's assumptions and ways to deal with these problems are addressed. A real data set is used to illustrate the methodology with the software R. Note that this chapter is intended to serve as a guide for other regression methods, such as logistic regression, AFT models, and Cox regression.
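As a hedged illustration of the least squares step described in this abstract, the following minimal sketch estimates the coefficients of a multiple linear regression by solving the normal equations (X'X)b = X'y. The chapter itself works in R; this Python version, the helper names `solve_linear_system` and `fit_ols`, and the toy data are assumptions made purely for illustration.

```python
def solve_linear_system(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting.

    Illustrative helper: assumes A is square and non-singular.
    """
    n = len(A)
    # Build the augmented matrix [A | b] on copies of the rows.
    M = [list(row) + [bi] for row, bi in zip(A, b)]
    for col in range(n):
        # Move the row with the largest pivot into position.
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            factor = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= factor * M[col][c]
    # Back substitution.
    x = [0.0] * n
    for i in reversed(range(n)):
        rest = sum(M[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (M[i][n] - rest) / M[i][i]
    return x


def fit_ols(X, y):
    """Least squares coefficients for design matrix X (rows = observations).

    Include a leading column of ones in X to estimate an intercept.
    """
    p = len(X[0])
    xtx = [[sum(row[i] * row[j] for row in X) for j in range(p)] for i in range(p)]
    xty = [sum(row[i] * yi for row, yi in zip(X, y)) for i in range(p)]
    return solve_linear_system(xtx, xty)


# Toy data generated without noise from y = 1 + 2*x1 - 3*x2.
X = [[1, 0, 0], [1, 1, 0], [1, 0, 1], [1, 1, 1], [1, 2, 1]]
y = [1, 3, -2, 0, 2]
print(fit_ols(X, y))
```

Solving the normal equations directly is the simplest route for a small, well-conditioned design; statistical software usually prefers a QR decomposition for numerical stability.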
Calculating a Stepwise Ridge Regression.
ERIC Educational Resources Information Center
Morris, John D.
1986-01-01
Although methods for using ordinary least squares regression computer programs to calculate a ridge regression are available, the calculation of a stepwise ridge regression requires a special purpose algorithm and computer program. The correct stepwise ridge regression procedure is given, and a parallel FORTRAN computer program is described.…
Orthogonal Regression: A Teaching Perspective
ERIC Educational Resources Information Center
Carr, James R.
2012-01-01
A well-known approach to linear least squares regression is that which involves minimizing the sum of squared orthogonal projections of data points onto the best fit line. This form of regression is known as orthogonal regression, and the linear model that it yields is known as the major axis. A similar method, reduced major axis regression, is…
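To make the two fits named in this abstract concrete, here is a small sketch that computes the major axis (orthogonal regression) line and the reduced major axis line for two-dimensional data from the centered sums of squares. The function names and sample points are hypothetical; the slope expressions are the standard closed forms for these two methods rather than anything taken from the article.

```python
import math


def _centered_moments(x, y):
    """Return the means and the centered sums of squares and cross-products."""
    n = len(x)
    mx = sum(x) / n
    my = sum(y) / n
    sxx = sum((xi - mx) ** 2 for xi in x)
    syy = sum((yi - my) ** 2 for yi in y)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    return mx, my, sxx, syy, sxy


def major_axis(x, y):
    """Line minimizing the sum of squared perpendicular distances."""
    mx, my, sxx, syy, sxy = _centered_moments(x, y)
    if sxy == 0:
        raise ValueError("slope is undefined when x and y are uncorrelated")
    slope = (syy - sxx + math.sqrt((syy - sxx) ** 2 + 4 * sxy ** 2)) / (2 * sxy)
    return slope, my - slope * mx


def reduced_major_axis(x, y):
    """Line whose slope is the ratio of standard deviations, signed by sxy."""
    mx, my, sxx, syy, sxy = _centered_moments(x, y)
    slope = math.copysign(math.sqrt(syy / sxx), sxy)
    return slope, my - slope * mx


# Points lying exactly on y = 2x + 1, so both methods recover the same line.
x = [0, 1, 2, 3]
y = [1, 3, 5, 7]
print(major_axis(x, y))
print(reduced_major_axis(x, y))
```

On noisy data the two slopes generally differ from each other and from the ordinary least squares slope, because each method treats the error in x differently.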
pPXF: Penalized Pixel-Fitting stellar kinematics extraction
NASA Astrophysics Data System (ADS)
Cappellari, Michele
2012-10-01
pPXF is an IDL (and free GDL or FL) program which extracts the stellar kinematics or stellar population from absorption-line spectra of galaxies using the Penalized Pixel-Fitting method (pPXF) developed by Cappellari & Emsellem (2004, PASP, 116, 138). Additional features implemented in the pPXF routine include: Optimal template: Fitted together with the kinematics to minimize template-mismatch errors. Also useful to extract gas kinematics or derive emission-corrected line-strength indexes. One can use synthetic templates to study the stellar population of galaxies via "Full Spectral Fitting" instead of using traditional line-strengths. Regularization of template weights: To reduce the noise in the recovery of the stellar population parameters and attach a physical meaning to the output weights assigned to the templates in terms of the star formation history (SFH) or metallicity distribution of an individual galaxy. Iterative sigma clipping: To clean the spectra of residual bad pixels or cosmic rays. Additive/multiplicative polynomials: To correct low-frequency continuum variations. Also useful for calibration purposes.
ERIC Educational Resources Information Center
Kromrey, Jeffrey D.; Hines, Constance V.
1996-01-01
The accuracy of three analytical formulas for shrinkage estimation and four empirical techniques were investigated in a Monte Carlo study of the coefficient of cross-validity in multiple regression. Substantial statistical bias was evident for all techniques except the formula of M. W. Brown (1975) and multicross-validation. (SLD)
ERIC Educational Resources Information Center
Perry, Thomas
2017-01-01
Value-added (VA) measures are currently the predominant approach used to compare the effectiveness of schools. Recent educational effectiveness research, however, has developed alternative approaches including the regression discontinuity (RD) design, which also allows estimation of absolute school effects. Initial research suggests RD is a viable…
ERIC Educational Resources Information Center
Guler, Nese; Penfield, Randall D.
2009-01-01
In this study, we investigate the logistic regression (LR), Mantel-Haenszel (MH), and Breslow-Day (BD) procedures for the simultaneous detection of both uniform and nonuniform differential item functioning (DIF). A simulation study was used to assess and compare the Type I error rate and power of a combined decision rule (CDR), which assesses DIF…
ERIC Educational Resources Information Center
Hick, Thomas L.; Irvine, David J.
To eliminate maturation as a factor in the pretest-posttest design, pretest scores can be converted to anticipate posttest scores using grade equivalent scores from standardized tests. This conversion, known as historical regression, assumes that without specific intervention, growth will continue at the rate (grade equivalents per year of…
NASA Astrophysics Data System (ADS)
Trigila, Alessandro; Iadanza, Carla; Esposito, Carlo; Scarascia-Mugnozza, Gabriele
2015-04-01
The first phase of the work aimed to identify the spatial relationships between the landslide locations and the 13 related factors by using the Frequency Ratio bivariate statistical method. The analysis was then carried out by adopting a multivariate statistical approach, using the Logistic Regression and Random Forests techniques, which gave the best results in terms of AUC. The models were run and evaluated with different sample sizes, also taking into account the temporal variation of input variables such as areas burned by wildfire. The most significant outcomes of this work are the relevant influence of the sample size on the model results and the strong importance of some environmental factors (e.g. land use and wildfires) for the identification of the depletion zones of extremely rapid shallow landslides.
Linear regression models for solvent accessibility prediction in proteins.
Wagner, Michael; Adamczak, Rafał; Porollo, Aleksey; Meller, Jarosław
2005-04-01
The relative solvent accessibility (RSA) of an amino acid residue in a protein structure is a real number that represents the solvent exposed surface area of this residue in relative terms. The problem of predicting the RSA from the primary amino acid sequence can therefore be cast as a regression problem. Nevertheless, RSA prediction has so far typically been cast as a classification problem. Consequently, various machine learning techniques have been used within the classification framework to predict whether a given amino acid exceeds some (arbitrary) RSA threshold and would thus be predicted to be "exposed," as opposed to "buried." We have recently developed novel methods for RSA prediction using nonlinear regression techniques which provide accurate estimates of the real-valued RSA and outperform classification-based approaches with respect to commonly used two-class projections. However, while their performance seems to provide a significant improvement over previously published approaches, these Neural Network (NN) based methods are computationally expensive to train and involve several thousand parameters. In this work, we develop alternative regression models for RSA prediction which are computationally much less expensive, involve orders-of-magnitude fewer parameters, and are still competitive in terms of prediction quality. In particular, we investigate several regression models for RSA prediction using linear L1-support vector regression (SVR) approaches as well as standard linear least squares (LS) regression. Using rigorously derived validation sets of protein structures and extensive cross-validation analysis, we compare the performance of the SVR with that of LS regression and NN-based methods. In particular, we show that the flexibility of the SVR (as encoded by metaparameters such as the error insensitivity and the error penalization terms) can be very beneficial to optimize the prediction accuracy for buried residues. We conclude that the simple
Misyura, Maksym; Sukhai, Mahadeo A; Kulasignam, Vathany; Zhang, Tong; Kamel-Reid, Suzanne; Stockley, Tracy L
2017-07-26
A standard approach in test evaluation is to compare results of the assay in validation to results from previously validated methods. For quantitative molecular diagnostic assays, comparison of test values is often performed using simple linear regression and the coefficient of determination (R²), using R² as the primary metric of assay agreement. However, the use of R² alone does not adequately quantify constant or proportional errors required for optimal test evaluation. More extensive statistical approaches, such as Bland-Altman and expanded interpretation of linear regression methods, can be used to more thoroughly compare data from quantitative molecular assays. We present the application of Bland-Altman and linear regression statistical methods to evaluate quantitative outputs from next-generation sequencing assays (NGS). NGS-derived data sets from assay validation experiments were used to demonstrate the utility of the statistical methods. Both Bland-Altman and linear regression were able to detect the presence and magnitude of constant and proportional error in quantitative values of NGS data. Deming linear regression was used in the context of assay comparison studies, while simple linear regression was used to analyse serial dilution data. The Bland-Altman statistical approach was also adapted to quantify assay accuracy, including constant and proportional errors, and precision where theoretical and empirical values were known. The complementary application of the statistical methods described in this manuscript enables more extensive evaluation of performance characteristics of quantitative molecular assays, prior to implementation in the clinical molecular laboratory.
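The Bland-Altman comparison mentioned in this abstract can be sketched in a few lines: compute the paired differences between two assays, take their mean as the estimated constant error (bias), and place limits of agreement at the bias plus or minus 1.96 sample standard deviations. The function name `bland_altman` and the paired values below are hypothetical and are not data from the study.

```python
import statistics


def bland_altman(method_a, method_b, z=1.96):
    """Return (bias, lower limit, upper limit) for paired measurements.

    bias is the mean of the paired differences a - b; the limits of
    agreement are bias -/+ z times the sample standard deviation of
    those differences.
    """
    diffs = [a - b for a, b in zip(method_a, method_b)]
    bias = statistics.mean(diffs)
    spread = statistics.stdev(diffs)
    return bias, bias - z * spread, bias + z * spread


# Hypothetical paired readings from a new assay and a reference assay.
new_assay = [10, 20, 30, 40]
reference = [11, 19, 32, 38]
bias, lower, upper = bland_altman(new_assay, reference)
print(bias, lower, upper)
```

A bias near zero with wide limits indicates little constant error but poor precision. Checking for proportional error would additionally require examining how the differences change with the magnitude of the measurement, which this sketch does not do.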
Understanding Poisson regression.
Hayat, Matthew J; Higgins, Melinda
2014-04-01
Nurse investigators often collect study data in the form of counts. Traditional methods of data analysis have historically approached analysis of count data either as if the count data were continuous and normally distributed or with dichotomization of the counts into the categories of occurred or did not occur. These outdated methods for analyzing count data have been replaced with more appropriate statistical methods that make use of the Poisson probability distribution, which is useful for analyzing count data. The purpose of this article is to provide an overview of the Poisson distribution and its use in Poisson regression. Assumption violations for the standard Poisson regression model are addressed with alternative approaches, including addition of an overdispersion parameter or negative binomial regression. An illustrative example is presented with an application from the ENSPIRE study, and regression modeling of comorbidity data is included for illustrative purposes.
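For readers who want to see what fitting a Poisson regression involves, the sketch below estimates a log-linear model with one predictor, E[y] = exp(a + b*x), by Newton-Raphson iterations on the Poisson log-likelihood. It is a minimal illustration under stated assumptions: the function name `fit_poisson`, the starting values, and the toy counts are all hypothetical, and it handles neither overdispersion nor the negative binomial alternative discussed in the article.

```python
import math


def fit_poisson(x, y, iterations=50, tol=1e-10):
    """Estimate (a, b) in E[y] = exp(a + b*x) by Newton-Raphson.

    Illustrative only: a single predictor, no step-size control, and no
    standard errors.
    """
    a = math.log(sum(y) / len(y))  # start from the intercept-only fit
    b = 0.0
    for _ in range(iterations):
        mu = [math.exp(a + b * xi) for xi in x]
        # Gradient of the log-likelihood with respect to (a, b).
        g0 = sum(yi - mi for yi, mi in zip(y, mu))
        g1 = sum(xi * (yi - mi) for xi, yi, mi in zip(x, y, mu))
        # Entries of the 2x2 Fisher information matrix.
        h00 = sum(mu)
        h01 = sum(xi * mi for xi, mi in zip(x, mu))
        h11 = sum(xi * xi * mi for xi, mi in zip(x, mu))
        det = h00 * h11 - h01 * h01
        # Newton step: solve the 2x2 system in closed form.
        da = (h11 * g0 - h01 * g1) / det
        db = (h00 * g1 - h01 * g0) / det
        a += da
        b += db
        if abs(da) < tol and abs(db) < tol:
            break
    return a, b


# Toy counts that double with each unit of x, so exp(b) should be close to 2.
x = [0, 1, 2, 3]
y = [1, 2, 4, 8]
a, b = fit_poisson(x, y)
print(a, b, math.exp(b))
```

The exponentiated slope exp(b) is interpreted as a rate ratio: the multiplicative change in the expected count for a one-unit increase in x.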
NASA Astrophysics Data System (ADS)
Amiroh; Priaminiarti, M.; Syahraini, S. I.
2017-08-01
Age estimation of individuals, both dead and living, is important for victim identification and legal certainty. The Demirjian method uses the third molar for age estimation of individuals above 15 years old. The aim is to compare age estimation between 15-25 years using two Demirjian methods. The development stages of third molars in panoramic radiographs of 50 male and female samples were assessed by two observers using Demirjian’s ten stages and the two teeth regression formula. Reliability was calculated using Cohen’s kappa coefficient and the significance of the observations was obtained from Wilcoxon tests. Deviations of age estimation were calculated using various methods. The deviation of age estimation with the two teeth regression formula was ±1.090 years; with ten stages, it was ±1.191 years. The deviation of age estimation using the two teeth regression formula was less than with the ten stages method. The age estimations using the two teeth regression formula or the ten stages method are significantly different until the age of 25, but they can be applied up to the age of 22.
NASA Astrophysics Data System (ADS)
Barzin, Razieh; Shirvani, Amin; Lotfi, Hossein
2017-01-01
Downward shortwave radiation is a key quantity in the land-atmosphere interaction. Since the moderate resolution imaging spectroradiometer data has a coarse temporal resolution, which is not suitable for estimating daily average radiation, many efforts have been undertaken to estimate instantaneous solar radiation using moderate resolution imaging spectroradiometer data. In this study, the principal components analysis technique was applied to capture the information of moderate resolution imaging spectroradiometer bands, extraterrestrial radiation, aerosol optical depth, and atmospheric water vapour. A regression model based on the principal components was used to estimate daily average shortwave radiation for ten synoptic stations in the Fars province, Iran, for the period 2009-2012. The Durbin-Watson statistic and autocorrelation function of the residuals of the fitted principal components regression model indicated that the residuals were serially independent. The results indicated that the fitted principal components regression models accounted for about 86-96% of total variance of the observed shortwave radiation values and the root mean square error was about 0.9-2.04 MJ m-2 d-1. Also, the results indicated that the model accuracy decreased as the aerosol optical depth increased and extraterrestrial radiation was the most important predictor variable among all.
Maximum penalized likelihood estimation in semiparametric mark-recapture-recovery models.
Michelot, Théo; Langrock, Roland; Kneib, Thomas; King, Ruth
2016-01-01
We discuss the semiparametric modeling of mark-recapture-recovery data where the temporal and/or individual variation of model parameters is explained via covariates. Typically, in such analyses a fixed (or mixed) effects parametric model is specified for the relationship between the model parameters and the covariates of interest. In this paper, we discuss the modeling of the relationship via the use of penalized splines, to allow for considerably more flexible functional forms. Corresponding models can be fitted via numerical maximum penalized likelihood estimation, employing cross-validation to choose the smoothing parameters in a data-driven way. Our contribution builds on and extends the existing literature, providing a unified inferential framework for semiparametric mark-recapture-recovery models for open populations, where the interest typically lies in the estimation of survival probabilities. The approach is applied to two real datasets, corresponding to gray herons (Ardea cinerea), where we model the survival probability as a function of environmental condition (a time-varying global covariate), and Soay sheep (Ovis aries), where we model the survival probability as a function of individual weight (a time-varying individual-specific covariate). The proposed semiparametric approach is compared to a standard parametric (logistic) regression and new interesting underlying dynamics are observed in both cases.
Petersen, Laura A.; Woodard, LeChauncy D.; Henderson, Louise M.; Urech, Tracy H.; Pietz, Kenneth
2009-01-01
Background There is concern that performance measures, patient ratings of their care, and pay-for-performance programs may penalize health care providers of patients with multiple chronic co-existing conditions. We examined the impact of co-existing conditions on the quality of care for hypertension and patient perception of overall quality of their health care. Methods and Results 141,609 veterans with hypertension were classified into 4 condition groups: those with hypertension-concordant (diabetes, ischemic heart disease, dyslipidemia) and/or discordant (arthritis, depression, chronic obstructive pulmonary disease) conditions, or neither. We measured blood pressure control at the index visit, overall good quality of care for hypertension including a follow-up interval, and patient ratings of satisfaction with their care. Association between condition type and number of co-existing conditions on receipt of overall good quality of care were assessed using logistic regression. Relationship between patient assessment and objective measures of quality was assessed. Of the cohort, 49.5% had concordant-only comorbidities, 8.7% had discordant-only comorbidities, 25.9% had both, and 16.0% had none. Odds of receiving overall good quality after adjusting for age were higher for those with concordant comorbidities (odds ratio [OR], 1.78; 95% confidence interval [CI], 1.70–1.87), discordant comorbidities (OR, 1.32; 95% CI 1.23–1.41) or both (OR, 2.25; 95% CI, 2.13–2.38), compared with neither. Findings did not change after adjusting for illness severity and/or number of primary care and specialty care visits. Patient assessment of quality did not vary by the presence of coexisting conditions and was not related to objective ratings of quality of care. Conclusions Contrary to expectation, patients with greater complexity had higher odds of receiving high quality care for hypertension. Subjective ratings of care did not vary with the presence or absence of the comorbid
Wahid, Abdul; Khan, Dost Muhammad; Hussain, Ijaz
2017-01-01
High dimensional data are commonly encountered in various scientific fields and pose great challenges to modern statistical analysis. To address this issue, different penalized regression procedures have been introduced in the literature, but these methods cannot cope with the problem of outliers and leverage points in heavy-tailed high dimensional data. For this purpose, a new Robust Adaptive Lasso (RAL) method is proposed which is based on a Pearson residuals weighting scheme. The weight function determines the compatibility of each observation and downweights it if it is inconsistent with the assumed model. It is observed that the RAL estimator can correctly select the covariates with non-zero coefficients and can estimate parameters simultaneously, not only in the presence of influential observations, but also in the presence of high multicollinearity. We also discuss the model selection oracle property and the asymptotic normality of the RAL. Simulation findings and real data examples also demonstrate the better performance of the proposed penalized regression approach.
NASA Astrophysics Data System (ADS)
Espinoza-Ojeda, O. M.; Santoyo, E.
2016-08-01
A new practical method based on logarithmic transformation regressions was developed for the determination of static formation temperatures (SFTs) in geothermal, petroleum and permafrost bottomhole temperature (BHT) data sets. The new method involves the application of multiple linear and polynomial (from quadratic to eighth-order) regression models to BHT and log-transformation (Tln) shut-in times. Selection of the best regression models was carried out by using four statistical criteria: (i) the coefficient of determination as a fitting quality parameter; (ii) the sum of the normalized squared residuals; (iii) the absolute extrapolation, as a dimensionless statistical parameter that enables the accuracy of each regression model to be evaluated through the extrapolation of the last temperature measured of the data set; and (iv) the deviation percentage between the measured and predicted BHT data. The best regression model was used for reproducing the thermal recovery process of the boreholes, and for the determination of the SFT. The original thermal recovery data (BHT and shut-in time) were used to demonstrate the new method's prediction efficiency. The prediction capability of the new method was additionally evaluated by using synthetic data sets where the true formation temperature (TFT) was known with accuracy. For these purposes, a comprehensive statistical analysis was carried out through the application of the well-known F-test and Student's t-test and the error percentage or statistical differences computed between the SFT estimates and the reported TFT data. After applying the new log-transformation regression method to a wide variety of geothermal, petroleum, and permafrost boreholes, it was found that the polynomial models were generally the best regression models that describe their thermal recovery processes. These fitting results suggested the use of this new method for the reliable estimation of SFT. Finally, the practical use of the new method was
Antropov, K M; Varaksin, A N
2013-01-01
This paper describes Land Use Regression (LUR) modeling and the results of its application in a study of nitrogen dioxide air pollution in Ekaterinburg. The paper describes the difficulties of modeling air pollution caused by motor vehicle exhaust, and ways to address these challenges. To create an LUR model of NO2 air pollution in Ekaterinburg, concentrations of NO2 were measured, data on factors affecting air pollution were collected, and a statistical analysis of the data was carried out. A statistical model of NO2 air pollution (coefficient of determination R² = 0.70) and a map of pollution were created.
LINKING LUNG AIRWAY STRUCTURE TO PULMONARY FUNCTION VIA COMPOSITE BRIDGE REGRESSION
Chen, Kun; Hoffman, Eric A.; Seetharaman, Indu; Jiao, Feiran; Lin, Ching-Long; Chan, Kung-Sik
2017-01-01
The human lung airway is a complex inverted tree-like structure. Detailed airway measurements can be extracted from MDCT-scanned lung images, such as segmental wall thickness, airway diameter, parent-child branch angles, etc. The wealth of lung airway data provides a unique opportunity for advancing our understanding of the fundamental structure-function relationships within the lung. An important problem is to construct and identify important lung airway features in normal subjects and connect these to standardized pulmonary function test results such as FEV1%. Among other things, the problem is complicated by the fact that a particular airway feature may be an important (relevant) predictor only when it pertains to segments of certain generations. Thus, the key is an efficient, consistent method for simultaneously conducting group selection (lung airway feature types) and within-group variable selection (airway generations), i.e., bi-level selection. Here we streamline a comprehensive procedure to process the lung airway data via imputation, normalization, transformation and groupwise principal component analysis, and then adopt a new composite penalized regression approach for conducting bi-level feature selection. As a prototype of composite penalization, the proposed composite bridge regression method is shown to admit an efficient algorithm, enjoy bi-level oracle properties, and outperform several existing methods. We analyze the MDCT lung image data from a cohort of 132 subjects with normal lung function. Our results show that, lung function in terms of FEV1% is promoted by having a less dense and more homogeneous lung comprising an airway whose segments enjoy more heterogeneity in wall thicknesses, larger mean diameters, lumen areas and branch angles. These data hold the potential of defining more accurately the “normal” subject population with borderline atypical lung functions that are clearly influenced by many genetic and environmental factors. PMID
NASA Astrophysics Data System (ADS)
Kerdiles, Herve; Dong, Qinghan; Spyratos, Sphyridon; Gallego, Javier
2013-01-01
Image classifications including sub-pixel analysis are often used to estimate the crop acreage directly, while ground data collected during field surveys play a secondary role. This pixel counting approach often leads to a biased estimation due to non-representative selection of ground data and the subjective a-priori knowledge of analysts. Instead, a regression estimator approach combining remote sensing information with rigorous ground sampling can result in an accurate assessment of crop acreage. In this study, to estimate the maize area, the point frame sampling approach is adapted to the strip-like cropping pattern on the North China Plain. Remote sensing information is used to perform a cost-efficient stratification in which non-agricultural areas are excluded from the ground survey. This information is also included at a later stage as an auxiliary estimator in regression analysis. The results showed that the integration of remote sensing information as an auxiliary estimator can improve the confidence of estimation by reducing the variance of the estimates.
Clegg, Samuel M; Barefield, James E; Wiens, Roger C; Dyar, Melinda D; Schafer, Martha W; Tucker, Jonathan M
2008-01-01
The ChemCam instrument on the Mars Science Laboratory (MSL) will include a laser-induced breakdown spectrometer (LIBS) to quantify major and minor elemental compositions. The traditional analytical chemistry approach to calibration curves for these data regresses a single diagnostic peak area against concentration for each element. This approach contrasts with a new multivariate method in which elemental concentrations are predicted by step-wise multiple regression analysis based on areas of a specific set of diagnostic peaks for each element. The method is tested on LIBS data from igneous and metamorphosed rocks. Between 4 and 13 partial regression coefficients are needed to describe each elemental abundance accurately (i.e., with a regression line of R² > 0.9995 for the relationship between predicted and measured elemental concentration) for all major and minor elements studied. Validation plots suggest that the method is limited at present by the small data set, and will work best for prediction of concentration when a wide variety of compositions and rock types has been analyzed.
Schmid, Matthias; Wickler, Florian; Maloney, Kelly O.; Mitchell, Richard; Fenske, Nora; Mayr, Andreas
2013-01-01
Regression analysis with a bounded outcome is a common problem in applied statistics. Typical examples include regression models for percentage outcomes and the analysis of ratings that are measured on a bounded scale. In this paper, we consider beta regression, which is a generalization of logit models to situations where the response is continuous on the interval (0,1). Consequently, beta regression is a convenient tool for analyzing percentage responses. The classical approach to fit a beta regression model is to use maximum likelihood estimation with subsequent AIC-based variable selection. As an alternative to this established - yet unstable - approach, we propose a new estimation technique called boosted beta regression. With boosted beta regression, estimation and variable selection can be carried out simultaneously in a highly efficient way. Additionally, both the mean and the variance of a percentage response can be modeled using flexible nonlinear covariate effects. As a consequence, the new method accounts for common problems such as overdispersion and non-binomial variance structures. PMID:23626706
27 CFR 25.93 - Penal sum of bond.
Code of Federal Regulations, 2011 CFR
2011-04-01
... 27 Alcohol, Tobacco Products and Firearms 1 2011-04-01 2011-04-01 false Penal sum of bond. 25.93 Section 25.93 Alcohol, Tobacco Products and Firearms ALCOHOL AND TOBACCO TAX AND TRADE BUREAU, DEPARTMENT...), effective Feb. 22, 2011 to Feb. 24, 2014. ...
Education--Penal Institutions: U. S. and Europe.
ERIC Educational Resources Information Center
Kerle, Ken
Penal systems of European countries vary in educational programs and humanizing efforts. A high percentage of Soviet prisoners, many incarcerated for ideological/religious beliefs, are confined to labor colonies. All inmates are obligated to learn a trade, one of the qualifications for release being evidence of some trade skill. Swedish…
Indian NGO challenges penal code prohibition of "unnatural offences".
Csete, Joanne
2002-07-01
On 7 December 2001, the Naz Foundation (India) Trust (NFIT), a non-governmental organization based in New Delhi, filed a petition in the Delhi High Court to repeal the "unnatural offences" section of the Indian Penal Code that criminalizes men who have sex with men.
[Arterial hypertension in females engaged into penal system work].
Tagirova, M M; El'garov, A A; Shogenova, A B; Murtazov, A M
2010-01-01
The authors proved significant prevalence of arterial hypertension and atherosclerosis risk factors in women engaged into penal system work--so these values form cardiovascular risk caused by environmental parameters. Teveten and Nebilet were proved effective in the examinees with arterial hypertension.
Crime and Punishment: Are Copyright Violators Ever Penalized?
ERIC Educational Resources Information Center
Russell, Carrie
2004-01-01
Is there a Web site that keeps track of copyright infringers and fines? Some colleagues don't believe that copyright violators are ever penalized. This question was asked by a reader in a question and answer column of "School Library Journal". Carrie Russell is the American Library Association's copyright specialist, and she will answer selected…
27 CFR 25.93 - Penal sum of bond.
Code of Federal Regulations, 2012 CFR
2012-04-01
... OF THE TREASURY LIQUORS BEER Bonds and Consents of Surety § 25.93 Penal sum of bond. (a)(1) Brewers... calculated at the rates prescribed by law which the brewer will become liable to pay during a calendar year during the period of the bond on beer: (i) Removed for transfer to the brewery from other breweries...
27 CFR 25.93 - Penal sum of bond.
Code of Federal Regulations, 2014 CFR
2014-04-01
... tax at the rates prescribed by law, on the maximum quantity of beer used in the production of... OF THE TREASURY ALCOHOL BEER Bonds and Consents of Surety § 25.93 Penal sum of bond. (a)(1) Brewers... calculated at the rates prescribed by law which the brewer will become liable to pay during a calendar...
27 CFR 25.93 - Penal sum of bond.
Code of Federal Regulations, 2013 CFR
2013-04-01
... tax at the rates prescribed by law, on the maximum quantity of beer used in the production of... OF THE TREASURY ALCOHOL BEER Bonds and Consents of Surety § 25.93 Penal sum of bond. (a)(1) Brewers... calculated at the rates prescribed by law which the brewer will become liable to pay during a calendar...
Lee, Mi Hee; Lee, Soo Bong; Eo, Yang Dam; Kim, Sun Woong; Woo, Jung-Hun; Han, Soo Hee
2017-07-01
Landsat optical images have enough spatial and spectral resolution to analyze vegetation growth characteristics. However, clouds and water vapor often degrade image quality, which limits the availability of usable images for time series measurement of vegetation vitality. To overcome this shortcoming, simulated images are used as an alternative. In this study, the weighted average method, the spatial and temporal adaptive reflectance fusion model (STARFM) method, and the multilinear regression analysis method were tested to produce simulated Landsat normalized difference vegetation index (NDVI) images of the Korean Peninsula. The test results showed that the weighted average method produced the images most similar to the actual images, provided that images were available within 1 month before and after the target date. The STARFM method gives good results when the input image date is close to the target date. Careful regional and seasonal consideration is required in selecting input images. During the summer season, clouds make it very difficult to obtain images close enough to the target date. Multilinear regression analysis gives meaningful results even when the input image date is not so close to the target date. Average R² values for the weighted average method, STARFM, and multilinear regression analysis were 0.741, 0.70, and 0.61, respectively.
NASA Astrophysics Data System (ADS)
Hasan, Haliza; Ahmad, Sanizah; Osman, Balkish Mohd; Sapri, Shamsiah; Othman, Nadirah
2017-08-01
In regression analysis, missing covariate data has been a common problem. Many researchers use ad hoc methods to overcome this problem due to the ease of implementation. However, these methods require assumptions about the data that rarely hold in practice. Model-based methods such as Maximum Likelihood (ML) using the expectation maximization (EM) algorithm and Multiple Imputation (MI) are more promising when dealing with difficulties caused by missing data. Then again, inappropriate methods of missing value imputation can lead to serious bias that severely affects the parameter estimates. The main objective of this study is to provide a better understanding of missing data concepts that can assist researchers in selecting appropriate missing data imputation methods. A simulation study was performed to assess the effects of different missing data techniques on the performance of a regression model. The covariate data were generated using an underlying multivariate normal distribution and the dependent variable was generated as a combination of explanatory variables. Missing values in the covariates were simulated using a missing at random (MAR) mechanism. Four levels of missingness (10%, 20%, 30% and 40%) were imposed. ML and MI techniques available within SAS software were investigated. A linear regression model was fitted and the model performance measures, MSE and R-squared, were obtained. Results of the analysis showed that MI is superior in handling missing data, with the highest R-squared and lowest MSE, when the percentage of missingness is less than 30%. Neither method is able to handle levels of missingness above 30%.
Raevsky, O A; Polianczyk, D E; Mukhametov, A; Grigorev, V Y
2016-08-01
Assessment of "CNS drugs/CNS candidates" classification abilities of the multi-parametric optimization (CNS MPO) approach was performed by logistic regression. It was found that five of the six separately used physical-chemical properties (topological polar surface area, number of hydrogen-bonded donor atoms, basicity, lipophilicity of compound in neutral form and at pH = 7.4) provided accuracy of recognition below 60%. Only the descriptor of molecular weight (MW) could correctly classify two-thirds of the studied compounds. Aggregation of all six properties in the MPOscore did not improve the classification, which was worse than the classification using only MW. The results of our study demonstrate the imperfection of the CNS MPO approach; in its current form it is not very useful for computer design of new, effective CNS drugs.
NASA Technical Reports Server (NTRS)
Jacobsen, R. T.; Stewart, R. B.; Crain, R. W., Jr.; Rose, G. L.; Myers, A. F.
1976-01-01
A method was developed for establishing a rational choice of the terms to be included in an equation of state with a large number of adjustable coefficients. The methods presented were developed for use in the determination of an equation of state for oxygen and nitrogen. However, a general application of the methods is possible in studies involving the determination of an optimum polynomial equation for fitting a large number of data points. The data considered in the least squares problem are experimental thermodynamic pressure-density-temperature data. Attention is given to a description of stepwise multiple regression and the use of stepwise regression in the determination of an equation of state for oxygen and nitrogen.
Robust Nuclear Norm-based Matrix Regression with Applications to Robust Face Recognition.
Xie, Jianchun; Yang, Jian; Qian, Jianjun; Tai, Ying; Zhang, Hengmin
2017-02-01
Face recognition (FR) via regression analysis based classification has been widely studied in the past several years. Most existing regression analysis methods characterize the pixelwise representation error via the l1-norm or l2-norm, which overlooks the two-dimensional structure of the error image. Recently, the nuclear norm based matrix regression (NMR) model was proposed to characterize the low-rank structure of the error image. However, the nuclear norm cannot accurately describe low-rank structural noise when the incoherence assumptions on the singular values do not hold, since it over-penalizes several much larger singular values. To address this problem, this paper presents the robust nuclear norm to characterize the structural error image and then extends it to deal with mixed noise. The majorization-minimization (MM) method is applied to derive an iterative scheme for minimizing the robust nuclear norm optimization problem. Then, an efficient alternating direction method of multipliers (ADMM) is used to solve the proposed models. We use the weighted nuclear norm as the classification criterion to obtain the final recognition results. Experiments on several public face databases demonstrate the effectiveness of our models in handling variations of structural noise (occlusion, illumination, etc.) and mixed noise.
Lange, Kenneth; Papp, Jeanette C.; Sinsheimer, Janet S.; Sobel, Eric M.
2014-01-01
Statistical genetics is undergoing the same transition to big data that all branches of applied statistics are experiencing. With the advent of inexpensive DNA sequencing, the transition is only accelerating. This brief review highlights some modern techniques with recent successes in statistical genetics. These include: (a) lasso penalized regression and association mapping, (b) ethnic admixture estimation, (c) matrix completion for genotype and sequence data, (d) the fused lasso and copy number variation, (e) haplotyping, (f) estimation of relatedness, (g) variance components models, and (h) rare variant testing. For more than a century, genetics has been both a driver and beneficiary of statistical theory and practice. This symbiotic relationship will persist for the foreseeable future. PMID:24955378
Regularized Quantile Regression and Robust Feature Screening for Single Index Models
Zhong, Wei; Zhu, Liping; Li, Runze; Cui, Hengjian
2015-01-01
We propose both a penalized quantile regression and an independence screening procedure to identify important covariates and to exclude unimportant ones for a general class of ultrahigh dimensional single-index models, in which the conditional distribution of the response depends on the covariates via a single-index structure. We observe that the linear quantile regression yields a consistent estimator of the direction of the index parameter in the single-index model. Such an observation dramatically reduces computational complexity in selecting important covariates in the single-index model. We establish an oracle property for the penalized quantile regression estimator when the covariate dimension increases at an exponential rate of the sample size. From a practical perspective, however, when the covariate dimension is extremely large, the penalized quantile regression may suffer from at least two drawbacks: computational expediency and algorithmic stability. To address these issues, we propose an independence screening procedure which is robust to model misspecification, and has reliable performance when the distribution of the response variable is heavily tailed or response realizations contain extreme values. The new independence screening procedure offers a useful complement to the penalized quantile regression since it helps to reduce the covariate dimension from ultrahigh dimensionality to a moderate scale. Based on the reduced model, the penalized linear quantile regression further refines selection of important covariates at different quantile levels. We examine the finite sample performance of the newly proposed procedure by Monte Carlo simulations and demonstrate the proposed methodology by an empirical analysis of a real data set. PMID:26941542
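The quantile regression in the abstract above is built on the check (pinball) loss. The sketch below shows only that loss and the fact that minimizing it over a constant recovers a sample quantile; it is a generic illustration with hypothetical function names, and it does not reproduce the paper's penalized estimator or its screening procedure.

```python
def check_loss(u, tau):
    """Check loss rho_tau(u) = u * (tau - 1[u < 0]) for a residual u."""
    return u * (tau - (1.0 if u < 0 else 0.0))

def sample_quantile_by_check_loss(values, tau):
    """Return the observed value q that minimizes sum_i rho_tau(values_i - q).

    The search is restricted to the observed values, which is enough
    because a minimizer of this piecewise-linear criterion is always
    attained at a data point.
    """
    return min(values,
               key=lambda q: sum(check_loss(v - q, tau) for v in values))

data = [1.0, 2.0, 3.0, 4.0, 100.0]
# With tau = 0.5 the minimizer is the median, which the extreme value 100 does not move.
print(sample_quantile_by_check_loss(data, 0.5))
```

Because the loss grows only linearly in the residual, extreme responses have bounded influence on the fit, which is the robustness property the abstract relies on for heavy-tailed responses.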
Jansat, J M; Lastra, C F; Mariño, E L
1998-06-01
The influence of different weighting methods in non-linear regression analysis was evaluated in the pharmacokinetics of carebastine after a single intravenous dose of 10 mg in 8 healthy volunteers. Plasma concentrations were measured by HPLC using an on-line solid-phase extraction method and automated injection. The analytical method was fully validated and the function of the analytical error subsequently determined. The parametric approach was performed using different weighting methods, including the homoscedastic method (W = 1) and heteroscedastic methods using weights of 1/C, 1/C², and the inverse of the concentration variance calculated through the analytical error function (1/V), and the results were statistically evaluated according to the normal distribution. Statistically significant differences were observed in the representative parameters of the disposition kinetics of carebastine. The use of a multiple comparison test for statistical analysis of all differences among group means indicated that differences were generated between the homoscedastic method (W = 1) and the heteroscedastic methods (1/C, 1/C², and 1/V). The results obtained in the present study confirmed the utility of the analytical error function as a weighting method in non-linear regression analysis and reinforced the importance of the correct choice of weights to avoid the estimation of imprecise or erroneous pharmacokinetic parameters.
Held, Elizabeth; Cape, Joshua; Tintle, Nathan
2016-01-01
Machine learning methods continue to show promise in the analysis of data from genetic association studies because of the high number of variables relative to the number of observations. However, few best practices exist for the application of these methods. We extend a recently proposed supervised machine learning approach for predicting disease risk by genotypes to be able to incorporate gene expression data and rare variants. We then apply 2 different versions of the approach (radial and linear support vector machines) to simulated data from Genetic Analysis Workshop 19 and compare performance to logistic regression. Method performance was not radically different across the 3 methods, although the linear support vector machine tended to show small gains in predictive ability relative to a radial support vector machine and logistic regression. Importantly, as the number of genes in the models was increased, even when those genes contained causal rare variants, model predictive ability showed a statistically significant decrease in performance for both the radial support vector machine and logistic regression. The linear support vector machine showed more robust performance to the inclusion of additional genes. Further work is needed to evaluate machine learning approaches on larger samples and to evaluate the relative improvement in model prediction from the incorporation of gene expression data.
Lin, Zhaozhou; Zhang, Qiao; Liu, Ruixin; Gao, Xiaojie; Zhang, Lu; Kang, Bingya; Shi, Junhan; Wu, Zidan; Gui, Xinjing; Li, Xuelin
2016-01-25
To accurately, safely, and efficiently evaluate the bitterness of Traditional Chinese Medicines (TCMs), a robust predictor was developed using the robust partial least squares (RPLS) regression method based on data obtained from an electronic tongue (e-tongue) system. The data quality was verified by Grubbs' test. Moreover, potential outliers were detected based on both the standardized residual and the score distance calculated for each sample. The performance of RPLS on the dataset before and after outlier detection was compared to other state-of-the-art methods including multivariate linear regression, least squares support vector machine, and plain partial least squares regression. Both R² and root-mean-square error (RMSE) of cross-validation (CV) were recorded for each model. With four latent variables, a robust RMSECV value of 0.3916, with bitterness values ranging from 0.63 to 4.78, was obtained for the RPLS model that was constructed based on the dataset including outliers. Meanwhile, the RMSECV calculated using the models constructed by the other methods was larger than that of the RPLS model. After six outliers were excluded, the performance of all benchmark methods markedly improved, but the difference between the RPLS models constructed before and after outlier exclusion was negligible. In conclusion, the bitterness of TCM decoctions can be accurately evaluated with the RPLS model constructed using e-tongue data.
Penalized Composite Quasi-Likelihood for Ultrahigh-Dimensional Variable Selection
Bradic, Jelena; Fan, Jianqing; Wang, Weiwei
2011-01-01
Summary In high-dimensional model selection problems, penalized least-square approaches have been extensively used. This paper addresses the question of both robustness and efficiency of penalized model selection methods, and proposes a data-driven weighted linear combination of convex loss functions, together with weighted L1-penalty. It is completely data-adaptive and does not require prior knowledge of the error distribution. The weighted L1-penalty is used both to ensure the convexity of the penalty term and to ameliorate the bias caused by the L1-penalty. In the setting with dimensionality much larger than the sample size, we establish a strong oracle property of the proposed method that possesses both the model selection consistency and estimation efficiency for the true non-zero coefficients. As specific examples, we introduce a robust method of composite L1-L2, and optimal composite quantile method and evaluate their performance in both simulated and real data examples. PMID:21589849
NASA Astrophysics Data System (ADS)
Ren, Xue; Lee, Soo-Jin
2016-03-01
Patch-based regularization methods, which have proven useful not only for image denoising, but also for tomographic reconstruction, penalize image roughness based on the intensity differences between two nearby patches. However, when two patches are not considered to be similar in the general sense of similarity but still have similar features in a scaled domain after normalizing the two patches, the difference between the two patches in the scaled domain is smaller than the intensity difference measured in the standard method. Standard patch-based methods tend to ignore such similarities due to the large intensity differences between the two patches. In this work, for patch-based penalized likelihood tomographic reconstruction, we propose a new approach to the similarity measure using the normalized patch differences as well as the intensity-based patch differences. A normalized patch difference is obtained by normalizing and scaling the intensity-based patch difference. To selectively take advantage of the standard patch (SP) and normalized patch (NP), we use switching schemes that can select either SP or NP based on the gradient of a reconstructed image. In this case the SP is selected for restoring large-scaled piecewise-smooth regions, while the NP is selected for preserving the contrast of fine details. The numerical experiments using software phantom demonstrate that our proposed methods not only improve overall reconstruction accuracy in terms of the percentage error, but also reveal better recovery of fine details in terms of the contrast recovery coefficient.
Zhang, Yan-Feng; Zhang, Li; Gao, Zhi-Xian; Dai, Shu-Gui
2012-01-01
Polycyclic aromatic hydrocarbons (PAHs) are ubiquitous contaminants found in the environment. Immunoassays represent useful analytical methods to complement traditional analytical procedures for PAHs. Cross-reactivity (CR) is a very useful characteristic for evaluating the extent of cross-reaction of a cross-reactant in immunoreactions and immunoassays. The quantitative relationships between the molecular properties and the CR of PAHs were established by stepwise multiple linear regression, principal component regression and partial least squares regression, using the data of two commercial enzyme-linked immunosorbent assay (ELISA) kits. The objective is to find the most important molecular properties that affect the CR, and to predict the CR by multiple regression methods. The results show that the physicochemical, electronic and topological properties of the PAH molecules have an integrated effect on the CR properties for the two ELISAs, among which molar solubility (S_m) and the valence molecular connectivity index (³χᵛ) are the most important factors. The obtained regression equations for the Ris(C) kit are all statistically significant (p < 0.005) and show satisfactory ability for predicting CR values, while the equations for the RaPID kit are all not significant (p > 0.05) and not suitable for prediction. This is probably because the Ris(C) immunoassay employs a monoclonal antibody, while the RaPID kit is based on a polyclonal antibody. Considering the important effect of solubility on the CR values, the cross-reaction potential (CRP) is calculated and used as a complement to CR for the evaluation of cross-reactions in immunoassays. Only compounds with both high CR and high CRP can cause intense cross-reactions in immunoassays.
Boy-Roura, M; Cameron, K C; Di, H J
2016-02-01
This study presents a meta-analysis of 12 experiments that quantify nitrate-N leaching losses from grazed pasture systems in alluvial sedimentary soils in Canterbury (New Zealand). Mean measured nitrate-N leached (kg N/ha × 100 mm drainage) losses were 2.7 when no urine was applied, 8.4 at the urine rate of 300 kg N/ha, 9.8 at 500 kg N/ha, 24.5 at 700 kg N/ha and 51.4 at 1000 kg N/ha. Lismore soils presented significantly higher nitrate-N losses compared to Templeton soils. Moreover, a multiple linear regression (MLR) model was developed to determine the key factors that influence nitrate-N leaching and to predict nitrate-N leaching losses. The MLR analysis was calibrated and validated using 82 average values of nitrate-N leached and 48 explanatory variables representative of nitrogen inputs and outputs, transport, attenuation of nitrogen and farm management practices. The MLR model (R² = 0.81) showed that nitrate-N leaching losses were greater at higher urine application rates and when there was more drainage from rainfall and irrigation. On the other hand, nitrate leaching decreased when nitrification inhibitors (e.g. dicyandiamide (DCD)) were applied. Predicted nitrate-N leaching losses at the paddock scale were calculated using the MLR equation, and they varied largely depending on the urine application rate and urine patch coverage.
NASA Astrophysics Data System (ADS)
Setiawan, Suhartono, Ahmad, Imam Safawi; Rahmawati, Noorgam Ika
2015-12-01
Bank Indonesia (BI), as the central bank of the Republic of Indonesia, has a single overarching objective: to establish and maintain rupiah stability. This objective can be achieved by monitoring the inflow and outflow of currency. Inflow and outflow are related to the stock and distribution of currency across Indonesian territory, which in turn affects economic activity. Economic activity in Indonesia, as a Muslim-majority country, is closely tied to the Islamic (lunar) calendar, which differs from the Gregorian calendar. This research aims to forecast the inflow and outflow of currency at the Representative Office (RO) of BI Semarang, Central Java region. The results of the analysis show that the inflow and outflow of currency are influenced by calendar variation effects, namely the day of Eid al-Fitr (a Muslim holiday), as well as by seasonal patterns. In addition, the particular week in which Eid al-Fitr falls also affects the increase in inflow and outflow. The best model for the inflow data, based on the smallest Root Mean Square Error (RMSE), is an ARIMA model, while the best model for predicting the outflow data at RO BI Semarang is an ARIMAX model or, equivalently, a Time Series Regression, because the two yield the same model. The forecasts for 2015 show an increase in inflow in August and an increase in outflow in July.
NASA Astrophysics Data System (ADS)
Huang, Cong; Liu, Dan-Dan; Wang, Jing-Song
2009-06-01
The 10.7 cm solar radio flux (F10.7), the value of the solar radio emission flux density at a wavelength of 10.7 cm, is a useful index of solar activity as a proxy for solar extreme ultraviolet radiation. It is meaningful and important to predict F10.7 values accurately for both long-term (months-years) and short-term (days) forecasting, which are often used as inputs in space weather models. This study applies a novel neural network technique, support vector regression (SVR), to forecasting daily values of F10.7. The aim of this study is to examine the feasibility of SVR in short-term F10.7 forecasting. The approach, based on SVR, reduces the dimension of feature space in the training process by using a kernel-based learning algorithm. Thus, the complexity of the calculation becomes lower and a small amount of training data will be sufficient. The time series of F10.7 from 2002 to 2006 are employed as the data sets. The performance of the approach is estimated by calculating the norm mean square error and mean absolute percentage error. It is shown that our approach can perform well by using fewer training data points than the traditional neural network.
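The two error measures named in the abstract above can be sketched as follows. The abstract does not give its exact formulas, so the definitions here are assumptions: the mean absolute percentage error is taken in its usual form, and the normalized mean square error is taken as the mean squared error divided by the population variance of the observations. The flux values are made-up illustrative numbers, not data from the study.

```python
def mape(observed, predicted):
    """Mean absolute percentage error, in percent (observations must be non-zero)."""
    n = len(observed)
    return 100.0 / n * sum(abs(o - p) / abs(o) for o, p in zip(observed, predicted))

def nmse(observed, predicted):
    """Mean squared error divided by the variance of the observations.

    This normalization is an assumption; other conventions exist.
    """
    n = len(observed)
    mean_obs = sum(observed) / n
    mse = sum((o - p) ** 2 for o, p in zip(observed, predicted)) / n
    var = sum((o - mean_obs) ** 2 for o in observed) / n
    return mse / var

# Hypothetical daily flux values and forecasts, for illustration only.
obs = [100.0, 200.0]
pred = [110.0, 180.0]
print(mape(obs, pred), nmse(obs, pred))
```

Under this normalization, a value of 1 corresponds to a forecast no better than always predicting the mean of the observations.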
Gustavsson, Sara; Fagerberg, Björn; Sallsten, Gerd; Andersson, Eva M.
2014-01-01
We compared six methods for regression on log-normal heteroscedastic data with respect to the estimated associations with explanatory factors (bias and standard error) and the estimated expected outcome (bias and confidence interval). Method comparisons were based on results from a simulation study, and also on the estimation of the association between abdominal adiposity and two biomarkers: C-Reactive Protein (CRP) (an inflammation marker) and Insulin Resistance (HOMA-IR) (a marker of insulin resistance). Five of the methods provide unbiased estimates of the associations and the expected outcome; two of them provide confidence intervals with correct coverage. PMID:24681553
Liu, Song; Su, Bo-min; Li, Qing-hui; Gan, Fu-xi
2015-01-01
The authors sought a method for quantitative analysis using pXRF without solid bulk stone/jade reference samples. Of the 24 nephrite samples selected, 17 served as calibration samples and the other 7 as test samples. All the nephrite samples were analyzed quantitatively by proton-induced X-ray emission spectroscopy (PIXE). Based on the PIXE results of the calibration samples, calibration curves were created for the components/elements of interest and used to analyze the test samples quantitatively; then, the qualitative spectra of all nephrite samples were obtained by pXRF. Using the PIXE results and the qualitative spectra of the calibration samples, the partial least squares (PLS) method was applied for quantitative analysis of the test samples. Finally, the results for the test samples obtained by the calibration curve method, the PLS method and PIXE were compared with each other, and the accuracy of the calibration curve method and the PLS method was estimated. The results indicate that the PLS method is an alternative method for quantitative analysis of stone/jade samples.
Prediction accuracy and variable selection for penalized cause-specific hazards models.
Saadati, Maral; Beyersmann, Jan; Kopp-Schneider, Annette; Benner, Axel
2017-08-01
We consider modeling competing risks data in high dimensions using a penalized cause-specific hazards (CSHs) approach. CSHs have conceptual advantages that are useful for analyzing molecular data. First, working on hazards level can further understanding of the underlying biological mechanisms that drive transition hazards. Second, CSH models can be used to extend the multistate framework for high-dimensional data. The CSH approach is implemented by fitting separate proportional hazards models for each event type (iCS). In the high-dimensional setting, this might seem too complex and possibly prone to overfitting. Therefore, we consider an extension, namely "linking" the separate models by choosing penalty tuning parameters that in combination yield best prediction of the incidence of the event of interest (penCR). We investigate whether this extension is useful with respect to prediction accuracy and variable selection. The two approaches are compared to the subdistribution hazards (SDH) model, which is an established method that naturally achieves "linking" by working on incidence level, but loses interpretability of the covariate effects. Our simulation studies indicate that in many aspects, iCS is competitive to penCR and the SDH approach. There are some instances that speak in favor of linking the CSH models, for example, in the presence of opposing effects on the CSHs. We conclude that penalized CSH models are a viable solution for competing risks models in high dimensions. Linking the CSHs can be useful in some particular cases; however, simple models using separately penalized CSH are often justified. © 2017 WILEY-VCH Verlag GmbH & Co. KGaA, Weinheim.
Haghighi, Mona; Johnson, Suzanne Bennett; Qian, Xiaoning; Lynch, Kristian F; Vehik, Kendra; Huang, Shuai
2016-08-26
Regression models are extensively used in many epidemiological studies to understand the linkage between specific outcomes of interest and their risk factors. However, regression models in general examine the average effects of the risk factors and ignore subgroups with different risk profiles. As a result, interventions are often geared towards the average member of the population, without consideration of the special health needs of different subgroups within the population. This paper demonstrates the value of using rule-based analysis methods that can identify subgroups with heterogeneous risk profiles in a population without imposing assumptions on the subgroups or method. The rules define the risk pattern of subsets of individuals by not only considering the interactions between the risk factors but also their ranges. We compared the rule-based analysis results with the results from a logistic regression model in The Environmental Determinants of Diabetes in the Young (TEDDY) study. Both methods detected a similar suite of risk factors, but the rule-based analysis was superior at detecting multiple interactions between the risk factors that characterize the subgroups. A further investigation of the particular characteristics of each subgroup may detect the special health needs of the subgroup and lead to tailored interventions.
Jackson, Dan; White, Ian R; Riley, Richard D
2013-03-01
Multivariate meta-analysis is becoming more commonly used. Methods for fitting the multivariate random effects model include maximum likelihood, restricted maximum likelihood, Bayesian estimation and multivariate generalisations of the standard univariate method of moments. Here, we provide a new multivariate method of moments for estimating the between-study covariance matrix with the properties that (1) it allows for either complete or incomplete outcomes and (2) it allows for covariates through meta-regression. Further, for complete data, it is invariant to linear transformations. Our method reduces to the usual univariate method of moments, proposed by DerSimonian and Laird, in a single dimension. We illustrate our method and compare it with some of the alternatives using a simulation study and a real example. © 2013 WILEY-VCH Verlag GmbH & Co. KGaA, Weinheim.
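For orientation, the univariate DerSimonian and Laird method of moments that the abstract above says the new method reduces to in one dimension can be sketched as below. This shows only the standard univariate estimator, not the multivariate method proposed in the paper; the function name and the example numbers are hypothetical.

```python
def dersimonian_laird(effects, variances):
    """Univariate DerSimonian-Laird random-effects meta-analysis.

    effects:   study effect estimates y_i
    variances: their within-study variances s_i^2 (treated as known)
    Returns (tau2, pooled): the between-study variance estimate,
    truncated at zero, and the random-effects pooled estimate.
    """
    k = len(effects)
    w = [1.0 / v for v in variances]            # fixed-effect weights
    sw = sum(w)
    y_fixed = sum(wi * yi for wi, yi in zip(w, effects)) / sw
    # Cochran's Q statistic
    q = sum(wi * (yi - y_fixed) ** 2 for wi, yi in zip(w, effects))
    c = sw - sum(wi ** 2 for wi in w) / sw
    tau2 = max(0.0, (q - (k - 1)) / c)          # method-of-moments estimate
    w_star = [1.0 / (v + tau2) for v in variances]
    pooled = sum(wi * yi for wi, yi in zip(w_star, effects)) / sum(w_star)
    return tau2, pooled

# Two hypothetical studies with unit variance and effects 0 and 2:
# Q = 2, k - 1 = 1, c = 1, so tau2 = 1 and the pooled estimate is 1.
print(dersimonian_laird([0.0, 2.0], [1.0, 1.0]))
```

The truncation at zero is needed because the moment equation can yield a negative value when the observed heterogeneity Q falls below its expectation under homogeneity.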
Jackson, Dan; White, Ian R; Riley, Richard D
2013-01-01
Multivariate meta-analysis is becoming more commonly used. Methods for fitting the multivariate random effects model include maximum likelihood, restricted maximum likelihood, Bayesian estimation and multivariate generalisations of the standard univariate method of moments. Here, we provide a new multivariate method of moments for estimating the between-study covariance matrix with the properties that (1) it allows for either complete or incomplete outcomes and (2) it allows for covariates through meta-regression. Further, for complete data, it is invariant to linear transformations. Our method reduces to the usual univariate method of moments, proposed by DerSimonian and Laird, in a single dimension. We illustrate our method and compare it with some of the alternatives using a simulation study and a real example. PMID:23401213
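The univariate special case mentioned above, the DerSimonian and Laird method of moments, can be sketched in a few lines. This is a minimal illustration of that standard univariate estimator only, not of the multivariate method proposed in the paper; the function name and the input layout (one effect estimate and one within-study variance per study) are assumptions made for the example.

```python
def dersimonian_laird(effects, variances):
    """Univariate DerSimonian-Laird estimate of the between-study variance.

    effects   : list of study effect estimates y_i
    variances : list of within-study variances s_i^2 (assumed known)
    Returns (tau2, q, pooled_random_effects).
    """
    k = len(effects)
    weights = [1.0 / v for v in variances]          # fixed-effect weights
    sum_w = sum(weights)
    pooled_fixed = sum(w * y for w, y in zip(weights, effects)) / sum_w

    # Cochran's Q statistic around the fixed-effect pooled estimate
    q = sum(w * (y - pooled_fixed) ** 2 for w, y in zip(weights, effects))

    # Method-of-moments estimate, truncated at zero
    denom = sum_w - sum(w * w for w in weights) / sum_w
    tau2 = max(0.0, (q - (k - 1)) / denom)

    # Random-effects pooled estimate using the estimated tau2
    re_weights = [1.0 / (v + tau2) for v in variances]
    pooled_random = sum(w * y for w, y in zip(re_weights, effects)) / sum(re_weights)
    return tau2, q, pooled_random
```

The truncation at zero reflects the usual convention that a negative moment estimate is reported as no between-study heterogeneity.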
Linear regression in astronomy. I
NASA Technical Reports Server (NTRS)
Isobe, Takashi; Feigelson, Eric D.; Akritas, Michael G.; Babu, Gutti Jogesh
1990-01-01
Five methods for obtaining linear regression fits to bivariate data with unknown or insignificant measurement errors are discussed: ordinary least-squares (OLS) regression of Y on X, OLS regression of X on Y, the bisector of the two OLS lines, orthogonal regression, and 'reduced major-axis' regression. These methods have been used by various researchers in observational astronomy, most importantly in cosmic distance scale applications. Formulas for calculating the slope and intercept coefficients and their uncertainties are given for all the methods, including a new general form of the OLS variance estimates. The accuracy of the formulas was confirmed using numerical simulations. The applicability of the procedures is discussed with respect to their mathematical properties, the nature of the astronomical data under consideration, and the scientific purpose of the regression. It is found that, for problems needing symmetrical treatment of the variables, the OLS bisector performs significantly better than orthogonal or reduced major-axis regression.
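The five slope estimators listed above can be written compactly in terms of the sample sums of squares and cross-products. The sketch below uses the commonly quoted closed forms for these estimators; the function name is hypothetical, the variance formulas from the paper are not reproduced, and the code assumes a nonzero covariance between X and Y.

```python
import math


def five_slopes(x, y):
    """Slopes of five bivariate line fits (all expressed as dy/dx)."""
    n = len(x)
    mx = sum(x) / n
    my = sum(y) / n
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sign = math.copysign(1.0, sxy)

    b1 = sxy / sxx                      # OLS regression of Y on X
    b2 = syy / sxy                      # OLS regression of X on Y, inverted
    bisector = (b1 * b2 - 1.0
                + math.sqrt((1.0 + b1 ** 2) * (1.0 + b2 ** 2))) / (b1 + b2)
    d = b2 - 1.0 / b1
    orthogonal = 0.5 * (d + sign * math.sqrt(4.0 + d * d))
    rma = sign * math.sqrt(b1 * b2)     # reduced major-axis

    slopes = {
        "ols_y_on_x": b1,
        "ols_x_on_y": b2,
        "bisector": bisector,
        "orthogonal": orthogonal,
        "reduced_major_axis": rma,
    }
    # Every line passes through the centroid, so intercept = my - slope * mx
    return {name: (s, my - s * mx) for name, s in slopes.items()}
```

For perfectly collinear data all five fits coincide; with scatter, the two OLS slopes bracket the symmetric estimators.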
NASA Astrophysics Data System (ADS)
Al-Harrasi, Ahmed; Rehman, Najeeb Ur; Mabood, Fazal; Albroumi, Muhammaed; Ali, Liaqat; Hussain, Javid; Hussain, Hidayat; Csuk, René; Khan, Abdul Latif; Alam, Tanveer; Alameri, Saif
2017-09-01
In the present study, NIR spectroscopy coupled with PLS regression was developed, for the first time, as a rapid alternative method to quantify the amount of Keto-β-Boswellic Acid (KBA) in different plant parts of Boswellia sacra and in the resin exudates of the trunk. NIR spectroscopy was used to measure KBA standards and B. sacra samples in absorption mode over the wavelength range 700-2500 nm. A PLS regression model was built from the obtained spectral data using 70% of the KBA standards (training set) in the range from 0.1 ppm to 100 ppm. The resulting PLS regression model had an R-square value of 98% with a correlation of 0.99, and it predicted well, with an RMSEP of 3.2 and a correlation of 0.99. It was then used to quantify the amount of KBA in the samples of B. sacra. The results indicated that the MeOH extract of the resin has the highest concentration of KBA (0.6%), followed by the essential oil (0.1%). However, no KBA was found in the aqueous extract. The MeOH extract of the resin was subjected to column chromatography to obtain various sub-fractions at different polarities of organic solvents. The sub-fraction at 4% MeOH/CHCl3 was found to contain the highest percentage of KBA (4.1%), followed by another sub-fraction at 2% MeOH/CHCl3 (2.2% of KBA). The present results also indicated that KBA is present only in the gum-resin of the trunk and not in all parts of the plant. These results were further confirmed through HPLC analysis, and it is therefore concluded that NIRS coupled with PLS regression is a rapid alternative method for quantification of KBA in Boswellia sacra. It is non-destructive, rapid, sensitive and uses simple methods of sample preparation.
Farhadian, Maryam; Aliabadi, Mohsen; Darvishi, Ebrahim
2015-01-01
Background: Prediction models are used in a variety of medical domains, and they are frequently built from experience which constitutes data acquired from actual cases. This study aimed to analyze the potential of artificial neural networks and logistic regression techniques for estimation of hearing impairment among industrial workers. Materials and Methods: A total of 210 workers employed in a steel factory (in the west of Iran) were selected, and their occupational exposure histories were analyzed. The hearing loss thresholds of the studied workers were determined using a calibrated audiometer. The personal noise exposures were also measured using a noise dosimeter in the workstations. Data obtained from five variables that can influence hearing loss were used as input features, and the hearing loss thresholds were considered as the target feature of the prediction methods. Multilayer feedforward neural networks and logistic regression were developed using MATLAB R2011a software. Results: Based on the World Health Organization classification for the grades of hearing loss, 74.2% of the studied workers have normal hearing thresholds, 23.4% have slight hearing loss, and 2.4% have moderate hearing loss. The accuracy and kappa coefficient of the best developed neural networks for prediction of the grades of hearing loss were 88.6 and 66.30, respectively. The accuracy and kappa coefficient of the logistic regression were 84.28 and 51.30, respectively. Conclusion: Neural networks could provide more accurate predictions of hearing loss than logistic regression. The prediction method can provide reliable and comprehensible information for occupational health and medicine experts. PMID:26500410
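The two performance measures reported above, accuracy and the kappa coefficient, are both simple functions of a confusion matrix. The sketch below shows how they could be computed; the function name and the example matrix used in the usage note are hypothetical and are not data from the study.

```python
def accuracy_and_kappa(confusion):
    """Overall accuracy and Cohen's kappa from a square confusion matrix.

    confusion[i][j] = number of cases with true class i predicted as class j.
    """
    k = len(confusion)
    n = sum(sum(row) for row in confusion)

    # Observed agreement: proportion of cases on the diagonal
    observed = sum(confusion[i][i] for i in range(k)) / n

    # Agreement expected by chance from the row and column totals
    row_totals = [sum(row) for row in confusion]
    col_totals = [sum(confusion[i][j] for i in range(k)) for j in range(k)]
    expected = sum(r * c for r, c in zip(row_totals, col_totals)) / (n * n)

    kappa = (observed - expected) / (1.0 - expected)
    return observed, kappa
```

As a usage example, a made-up two-class matrix `[[40, 10], [10, 40]]` gives an accuracy of 0.8 and a kappa of 0.6, which illustrates why kappa is typically lower than raw accuracy: it discounts the agreement expected by chance.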
Pereira, L F P; Adeola, O
2016-09-01
The energy and phosphorus values of sunflower meal (SFM) and rice bran (RB) were determined in 2 experiments with Ross 708 broiler chickens from 15 to 22 d of age. In Exp. 1, the diets consisted of a corn-soybean meal reference diet (RD) and 4 test diets (TD). The TD consisted of SFM and RB that partly replaced the energy sources in the RD at 100 or 200 g/kg and 75 or 150 g/kg, respectively, such that equal ratios were maintained for all energy-containing ingredients across all experimental diets. In Exp. 2, a cornstarch-soybean meal diet was the RD, and the TD consisted of SFM and RB that partly replaced cornstarch in the RD at 100 or 200 g/kg and 60 or 120 g/kg, respectively. Addition of SFM and RB to the RD in Exp. 1 linearly decreased (P < 0.01) the digestibility coefficients of DM, energy, ileal digestible energy (IDE), metabolizability coefficients of DM, nitrogen (N), energy, N-corrected energy, metabolizable energy (ME), and nitrogen-corrected ME. Except for RB, the increased levels of the test ingredients in RD did affect the metabolizability coefficients of N. The IDE values (kcal/kg DM) were 1,953 for SFM and 2,498 for RB; ME values (kcal/kg DM) were 1,893 for SFM and 2,683 for RB; and MEn values (kcal/kg DM) were 1,614 for SFM and 2,476 for RB. In Exp. 2, there was a linear relationship between phosphorus (P) intake and ileal P output for diets with increased levels of SFM and RB. In addition, there was a linear relationship between P intake and P digestibility and retention for diets with increased levels of SFM. There was a quadratic effect (P < 0.01) and a tendency toward a quadratic effect (P = 0.07) for digestible P and total tract P retained, respectively, in the RB diets. The P digestibility and total tract P retention from regression analyses for SFM were 46% and 38%, respectively. © 2016 Poultry Science Association Inc.
Briggs, D J; de Hoogh, C; Gulliver, J; Wills, J; Elliott, P; Kingham, S; Smallbone, K
2000-05-15
Accurate, high-resolution maps of traffic-related air pollution are needed both as a basis for assessing exposures as part of epidemiological studies, and to inform urban air-quality policy and traffic management. This paper assesses the use of a GIS-based, regression mapping technique to model spatial patterns of traffic-related air pollution. The model--developed using data from 80 passive sampler sites in Huddersfield, as part of the SAVIAH (Small Area Variations in Air Quality and Health) project--uses data on traffic flows and land cover in the 300-m buffer zone around each site, and altitude of the site, as predictors of NO2 concentrations. It was tested here by application in four urban areas in the UK: Huddersfield (for the year following that used for initial model development), Sheffield, Northampton, and part of London. In each case, a GIS was built in ArcInfo, integrating relevant data on road traffic, urban land use and topography. Monitoring of NO2 was undertaken using replicate passive samplers (in London, data were obtained from surveys carried out as part of the London network). In Huddersfield, Sheffield and Northampton, the model was first calibrated by comparing modelled results with monitored NO2 concentrations at 10 randomly selected sites; the calibrated model was then validated against data from a further 10-28 sites. In London, where data for only 11 sites were available, validation was not undertaken. Results showed that the model performed well in all cases. After local calibration, the model gave estimates of mean annual NO2 concentrations within a factor of 1.5 of the actual mean approximately 70-90% of the time and within a factor of 2 between 70 and 100% of the time. r2 values between modelled and observed concentrations are in the range of 0.58-0.76. These results are comparable to those achieved by more sophisticated dispersion models. The model also has several advantages over dispersion modelling. It is able, for example, to provide
NASA Astrophysics Data System (ADS)
Zhu, Ying; Tan, Tuck Lee
2016-04-01
An effective and simple analytical method using Fourier transform infrared (FTIR) spectroscopy to distinguish wild-grown high-quality Ganoderma lucidum (G. lucidum) from the cultivated one is of essential importance for its quality assurance and medicinal value estimation. Commonly used chemical and analytical methods using the full spectrum are not very effective for detection and interpretation because of the complex system of the herbal medicine. In this study, two penalized discriminant analysis models, penalized linear discriminant analysis (PLDA) and elastic net (Elnet), using FTIR spectroscopy have been explored for the purpose of discrimination and interpretation. The classification performances of the two penalized models have been compared with two widely used multivariate methods, principal component discriminant analysis (PCDA) and partial least squares discriminant analysis (PLSDA). The Elnet model involving a combination of L1 and L2 norm penalties enabled an automatic selection of a small number of informative spectral absorption bands and gave an excellent classification accuracy of 99% for discrimination between spectra of wild-grown and cultivated G. lucidum. Its classification performance was superior to that of the PLDA model in a pure L1 setting and outperformed the PCDA and PLSDA models using the full wavelength range. The effective selection of informative spectral features leads to a substantial reduction in model complexity and an improvement in classification accuracy, and it is particularly helpful for the quantitative interpretation of the major chemical constituents of G. lucidum regarding its anti-cancer effects.
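The combination of L1 and L2 norm penalties described above is what lets an elastic net set many coefficients exactly to zero while still shrinking the rest. The sketch below illustrates the penalty and the resulting soft-thresholding update for a single standardized predictor; the parametrization with `lam` (overall strength) and `alpha` (L1/L2 mixing weight) is one common convention assumed here for illustration, and it is not taken from the paper.

```python
def elastic_net_penalty(beta, lam, alpha):
    """lam * (alpha * ||beta||_1 + (1 - alpha) / 2 * ||beta||_2^2)."""
    l1 = sum(abs(b) for b in beta)
    l2_squared = sum(b * b for b in beta)
    return lam * (alpha * l1 + 0.5 * (1.0 - alpha) * l2_squared)


def soft_threshold(z, gamma):
    """Shrink z toward zero by gamma; values within gamma of zero become 0."""
    if z > gamma:
        return z - gamma
    if z < -gamma:
        return z + gamma
    return 0.0


def elastic_net_update(z, lam, alpha):
    """Coordinate-wise update for one standardized predictor.

    z is the (hypothetical) unpenalized least-squares coefficient for that
    predictor given the current partial residuals.
    """
    return soft_threshold(z, lam * alpha) / (1.0 + lam * (1.0 - alpha))
```

With `alpha = 1` the update reduces to the pure L1 (lasso) case, and with `alpha = 0` it becomes ridge-type shrinkage with no exact zeros, which is why only the L1 part performs band selection.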
Huang, Qi-ting; Zhou, Lian-qing; Shi, Zhou; Li, Zhen-yu; Gu, Qun
2009-05-01
In the present study, soil samples were scanned by a NITON XLt920 field portable X-ray fluorescence (FPXRF) analyzer, and the relationship between the X-ray fluorescence spectra and the concentration of Pb in soil was studied. For predicting the Pb concentration in soil, a partial least squares (PLS) regression model was established with 6 optimal factors and two closely relevant energy ranges: 10.40-10.70 keV and 12.41-12.80 keV. After cross-calibration, the correlation coefficient of the values predicted by the PLS model against those measured by ICP was 0.9666, and the root mean square error of prediction (RMSEP) was 0.8732. Meanwhile, univariate linear regression and multivariate linear regression models were also built, with correlation coefficients of 0.6805 and 0.7302, respectively. The PLS method was therefore better than the other two methods for prediction. Compared with the conventional approach of atomic absorption spectroscopy (AAS), FPXRF has the advantages of speed, non-destructiveness and relatively low cost with acceptable accuracy. It would be a powerful tool for deciding which samples need further analysis.
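The two figures of merit quoted above, the correlation coefficient between predicted and reference values and the RMSEP, can be computed as follows. This is a generic sketch with hypothetical function names; it does not reproduce the PLS model itself and uses no data from the study.

```python
import math


def rmsep(predicted, measured):
    """Root mean square error of prediction over a validation set."""
    n = len(predicted)
    return math.sqrt(sum((p - m) ** 2 for p, m in zip(predicted, measured)) / n)


def pearson_r(predicted, measured):
    """Pearson correlation coefficient between predicted and measured values."""
    n = len(predicted)
    mp = sum(predicted) / n
    mm = sum(measured) / n
    cov = sum((p - mp) * (m - mm) for p, m in zip(predicted, measured))
    var_p = sum((p - mp) ** 2 for p in predicted)
    var_m = sum((m - mm) ** 2 for m in measured)
    return cov / math.sqrt(var_p * var_m)
```

The two measures are complementary: a model can correlate almost perfectly with the reference values and still have a large RMSEP if its predictions are systematically offset or scaled.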
USDA-ARS's Scientific Manuscript database
In fiber length measurement by the rapid method of testing fiber beards instead of testing individual fibers, only the fiber portion projected from the fiber clamp can be measured. The length distribution of the projecting portion is very different from that of the original sample. The Part 1 pape...
Metamorphic Geodesic Regression
Hong, Yi; Joshi, Sarang; Sanchez, Mar; Styner, Martin; Niethammer, Marc
2013-01-01
We propose a metamorphic geodesic regression approach approximating spatial transformations for image time-series while simultaneously accounting for intensity changes. Such changes occur for example in magnetic resonance imaging (MRI) studies of the developing brain due to myelination. To simplify computations we propose an approximate metamorphic geodesic regression formulation that only requires pairwise computations of image metamorphoses. The approximated solution is an appropriately weighted average of initial momenta. To obtain initial momenta reliably, we develop a shooting method for image metamorphosis. PMID:23286131
Penal managerialism from within: implications for theory and research.
Cheliotis, Leonidas K
2006-01-01
Unlike the bulk of penological scholarship dealing with managerialist reforms, this article calls for greater theoretical and research attention to the often pernicious impact of managerialism on criminal justice professionals. Much in an ideal-typical fashion, light is shed on: the reasons why contemporary penal bureaucracies endeavor systematically to strip criminal justice work of its inherently affective nature; the structural forces that ensure control over officials; the processes by which those forces come into effect; and the human consequences of submission to totalitarian bureaucratic milieus. It is suggested that the heavy preoccupation of present-day penality with the predictability and calculability of outcomes entails the atomization of professionals and the dehumanization of their work. This is achieved through a kaleidoscope of direct and indirect mechanisms that naturalize and/or legitimate acquiescence.
De la Cruz, Rolando; Fuentes, Claudio; Meza, Cristian; Lee, Dae-Jin; Arribas-Gil, Ana
2017-02-19
We propose a semiparametric nonlinear mixed-effects model (SNMM) using penalized splines to classify longitudinal data and improve the prediction of a binary outcome. The work is motivated by a study in which different hormone levels were measured during the early stages of pregnancy, and the challenge is using this information to predict normal versus abnormal pregnancy outcomes. The aim of this paper is to compare models and estimation strategies on the basis of alternative formulations of SNMMs depending on the characteristics of the data set under consideration. For our motivating example, we address the classification problem using a particular case of the SNMM in which the parameter space has a finite dimensional component (fixed effects and variance components) and an infinite dimensional component (unknown function) that need to be estimated. The nonparametric component of the model is estimated using penalized splines. For the parametric component, we compare the advantages of using random effects versus direct modeling of the correlation structure of the errors. Numerical studies show that our approach improves over other existing methods for the analysis of this type of data. Furthermore, the results obtained using our method support the idea that explicit modeling of the serial correlation of the error term improves the prediction accuracy with respect to a model with random effects, but independent errors. Copyright © 2017 John Wiley & Sons, Ltd.
Farhadian, Maryam; Aliabadi, Mohsen; Darvishi, Ebrahim
2015-01-01
Prediction models are used in a variety of medical domains, and they are frequently built from experience which constitutes data acquired from actual cases. This study aimed to analyze the potential of artificial neural networks and logistic regression techniques for estimation of hearing impairment among industrial workers. A total of 210 workers employed in a steel factory (in the west of Iran) were selected, and their occupational exposure histories were analyzed. The hearing loss thresholds of the studied workers were determined using a calibrated audiometer. The personal noise exposures were also measured using a noise dosimeter in the workstations. Data obtained from five variables that can influence hearing loss were used as input features, and the hearing loss thresholds were considered as the target feature of the prediction methods. Multilayer feedforward neural networks and logistic regression were developed using MATLAB R2011a software. Based on the World Health Organization classification for the grades of hearing loss, 74.2% of the studied workers have normal hearing thresholds, 23.4% have slight hearing loss, and 2.4% have moderate hearing loss. The accuracy and kappa coefficient of the best developed neural networks for prediction of the grades of hearing loss were 88.6 and 66.30, respectively. The accuracy and kappa coefficient of the logistic regression were 84.28 and 51.30, respectively. Neural networks could provide more accurate predictions of hearing loss than logistic regression. The prediction method can provide reliable and comprehensible information for occupational health and medicine experts.
Chen, Qingxia; Ibrahim, Joseph G
2014-07-01
Multiple Imputation, Maximum Likelihood and Fully Bayesian methods are the three most commonly used model-based approaches in missing data problems. Although it is easy to show that when the responses are missing at random (MAR), the complete case analysis is unbiased and efficient, the aforementioned methods are still commonly used in practice for this setting. To examine the performance of and relationships between these three methods in this setting, we derive and investigate small sample and asymptotic expressions of the estimates and standard errors, and fully examine how these estimates are related for the three approaches in the linear regression model when the responses are MAR. We show that when the responses are MAR in the linear model, the estimates of the regression coefficients using these three methods are asymptotically equivalent to the complete case estimates under general conditions. One simulation and a real data set from a liver cancer clinical trial are given to compare the properties of these methods when the responses are MAR.
Lakshmi, Karunanidhi Santhana; Lakshmi, Sivasubramanian
2011-03-01
Simultaneous determination of valsartan and hydrochlorothiazide by the H-point standard additions method (HPSAM) and partial least squares (PLS) calibration is described. Absorbances at a pair of wavelengths, 216 and 228 nm, were monitored with the addition of standard solutions of valsartan. Results of applying HPSAM showed that valsartan and hydrochlorothiazide can be determined simultaneously at concentration ratios varying from 20:1 to 1:15 in a mixed sample. The proposed PLS method does not require chemical separation and spectral graphical procedures for quantitative resolution of mixtures containing the titled compounds. The calibration model was based on absorption spectra in the 200-350 nm range for 25 different mixtures of valsartan and hydrochlorothiazide. Calibration matrices contained 0.5-3 μg mL-1 of both valsartan and hydrochlorothiazide. The standard error of prediction (SEP) for valsartan and hydrochlorothiazide was 0.020 and 0.038 μg mL-1, respectively. Both proposed methods were successfully applied to the determination of valsartan and hydrochlorothiazide in several synthetic and real matrix samples.
Structured fusion lasso penalized multi-state models.
Sennhenn-Reulen, Holger; Kneib, Thomas
2016-11-10
Multi-state models generalize survival or duration time analysis to the estimation of transition-specific hazard rate functions for multiple transitions. When each of the transition-specific risk functions is parametrized with several distinct covariate effect coefficients, this leads to a model of potentially high dimension. To decrease the parameter space dimensionality and to work out a clear image of the underlying multi-state model structure, one can aim either at setting some coefficients to zero or at making coefficients for the same covariate but two different transitions equal. The first issue can be approached by penalizing the absolute values of the covariate coefficients as in lasso regularization. If, instead, absolute differences between coefficients of the same covariate on different transitions are penalized, this leads to sparse competing risk relations within a multi-state model, that is, equality of covariate effect coefficients. In this paper, a new estimation approach providing sparse multi-state modelling by the aforementioned principles is established, based on the estimation of multi-state models and a simultaneous penalization of the L1-norm of covariate coefficients and their differences in a structured way. The new multi-state modelling approach is illustrated on peritoneal dialysis study data and implemented in the R package penMSM. Copyright © 2016 John Wiley & Sons, Ltd.
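The two penalty components described above, one on the absolute coefficients and one on absolute differences between coefficients of the same covariate across transitions, can be written as a single function. The sketch below is only an illustration of that idea: the function name, the dictionary layout, the two tuning parameters and the list of fused transition pairs are all assumptions, and it does not reproduce the interface of the penMSM package.

```python
def structured_fusion_lasso_penalty(beta, lam_lasso, lam_fusion, fused_pairs):
    """Combined L1 penalty on coefficients and on cross-transition differences.

    beta        : dict mapping a transition label to its list of covariate
                  coefficients (same covariate order for every transition)
    lam_lasso   : weight of the lasso part (hypothetical tuning parameter)
    lam_fusion  : weight of the fusion part (hypothetical tuning parameter)
    fused_pairs : list of (transition_a, transition_b) pairs whose
                  coefficients are encouraged to be equal
    """
    # Lasso part: pushes individual coefficients to exactly zero
    lasso_part = sum(abs(b) for coefs in beta.values() for b in coefs)

    # Fusion part: pushes coefficients of the same covariate on two
    # different transitions to be exactly equal
    fusion_part = 0.0
    for a, b in fused_pairs:
        fusion_part += sum(abs(x - y) for x, y in zip(beta[a], beta[b]))

    return lam_lasso * lasso_part + lam_fusion * fusion_part
```

Restricting the fusion part to a chosen list of transition pairs is what makes the penalty "structured": only pairs that are plausible candidates for sharing an effect are compared.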
[Guideline 'Medicinal care for drug addicts in penal institutions'].
Westra, Michel; de Haan, Hein A; Arends, Marleen T; van Everdingen, Jannes J E; Klazinga, Niek S
2009-01-01
In the Netherlands, the policy on care for prisoners who are addicted to opiates is still heterogeneous. The recent guidelines entitled 'Medicinal care for drug addicts in penal institutions' should contribute towards unambiguous and more evidence-based treatment for this group. In addition, they should improve the care pathways within judicial institutions and mainstream healthcare and bring them more into line with one another. Each rational course of medicinal treatment will initially be continued in the penal institution. In penal institutions the help on offer is mainly focused on abstinence from illegal drugs while at the same time limiting the damage caused to the health of the individual user. Methadone is regarded as the first choice for maintenance therapy. For patient safety, this is best given in liquid form in sealed cups of 5 mg/ml once daily in the morning. Recently a combination preparation containing buprenorphine and naloxone - a complete opiate antagonist - has become available. On discontinuation of opiate maintenance treatment, intensive follow-up care is necessary. During this period there is considerable risk of a potentially lethal overdose. Detoxification should be coupled with psychosocial or medicinal intervention aimed at preventing relapse. Naltrexone is currently the only available opiate antagonist for preventing relapse. In those addicted to opiates who also take benzodiazepines without any indication, it is strongly recommended that these be reduced and discontinued. This can be achieved by converting the regular dosage into the equivalent in diazepam and then reducing this dosage by a maximum of 25% a week.
Korany, Mohamed A; Maher, Hadir M; Galal, Shereen M; Fahmy, Ossama T; Ragab, Marwa A A
2010-11-15
This manuscript discusses the application of chemometrics to the handling of HPLC response data using the internal standard method (ISM). This was performed on a model mixture containing terbutaline sulphate, guaiphenesin, bromhexine HCl, sodium benzoate and propylparaben as an internal standard. Derivative treatment of chromatographic response data of analyte and internal standard was followed by convolution of the resulting derivative curves using 8-points sin x(i) polynomials (discrete Fourier functions). The response of each analyte signal, its corresponding derivative and convoluted derivative data were divided by that of the internal standard to obtain the corresponding ratio data. This was found beneficial in eliminating different types of interferences. It was successfully applied to handle some of the most common chromatographic problems and non-ideal conditions, namely: overlapping chromatographic peaks and very low analyte concentrations. For example, a significant change in the correlation coefficient of sodium benzoate, in case of overlapping peaks, went from 0.9975 to 0.9998 on applying normal conventional peak area and first derivative under Fourier functions methods, respectively. Also a significant improvement in the precision and accuracy for the determination of synthetic mixtures and dosage forms in non-ideal cases was achieved. For example, in the case of overlapping peaks guaiphenesin mean recovery% and RSD% went from 91.57, 9.83 to 100.04, 0.78 on applying normal conventional peak area and first derivative under Fourier functions methods, respectively. This work also compares the application of Theil's method, a non-parametric regression method, in handling the response ratio data, with the least squares parametric regression method, which is considered the de facto standard method used for regression. Theil's method was found to be superior to the method of least squares as it assumes that errors could occur in both x- and y-directions and
Sanagi, M Marsin; Ling, Susie L; Nasir, Zalilah; Hermawan, Dadan; Ibrahim, Wan Aini Wan; Abu Naim, Ahmedy
2009-01-01
LOD and LOQ are two important performance characteristics in method validation. This work compares three methods based on the International Conference on Harmonization and EURACHEM guidelines, namely, signal-to-noise, blank determination, and linear regression, to estimate the LOD and LOQ for volatile organic compounds (VOCs) by experimental methodology using GC. Five VOCs, toluene, ethylbenzene, isopropylbenzene, n-propylbenzene, and styrene, were chosen for the experimental study. The results indicated that the estimated LODs and LOQs were not equivalent and could vary by a factor of 5 to 6 for the different methods. It is, therefore, essential to have a clearly described procedure for estimating the LOD and LOQ during method validation to allow interlaboratory comparisons.
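Of the three approaches compared above, the linear regression approach is the easiest to express in code. The sketch below assumes the widely used ICH-style formulas LOD = 3.3 σ/S and LOQ = 10 σ/S, with S the calibration slope and σ taken as the residual standard deviation of the calibration line; the function name is hypothetical, and whether the study used this exact choice of σ is not stated in the abstract.

```python
import math


def lod_loq_from_calibration(concentrations, responses):
    """Estimate LOD and LOQ from a straight-line calibration curve.

    Returns (slope, intercept, lod, loq), with LOD and LOQ in the same
    units as the concentrations.
    """
    n = len(concentrations)
    mx = sum(concentrations) / n
    my = sum(responses) / n
    sxx = sum((x - mx) ** 2 for x in concentrations)
    sxy = sum((x - mx) * (y - my) for x, y in zip(concentrations, responses))

    slope = sxy / sxx
    intercept = my - slope * mx

    # Residual standard deviation with n - 2 degrees of freedom
    ss_res = sum((y - (intercept + slope * x)) ** 2
                 for x, y in zip(concentrations, responses))
    sigma = math.sqrt(ss_res / (n - 2))

    lod = 3.3 * sigma / slope
    loq = 10.0 * sigma / slope
    return slope, intercept, lod, loq
```

Because the signal-to-noise and blank determination approaches estimate the noise term differently, it is unsurprising that the three methods give non-equivalent limits, as the abstract reports.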
Eriksson, Lennart; Jaworska, Joanna; Worth, Andrew P; Cronin, Mark T D; McDowell, Robert M; Gramatica, Paola
2003-08-01
This article provides an overview of methods for reliability assessment of quantitative structure-activity relationship (QSAR) models in the context of regulatory acceptance of human health and environmental QSARs. Useful diagnostic tools and data analytical approaches are highlighted and exemplified. Particular emphasis is given to the question of how to define the applicability borders of a QSAR and how to estimate parameter and prediction uncertainty. The article ends with a discussion regarding QSAR acceptability criteria. This discussion contains a list of recommended acceptability criteria, and we give reference values for important QSAR performance statistics. Finally, we emphasize that rigorous and independent validation of QSARs is an essential step toward their regulatory acceptance and implementation.
Eriksson, Lennart; Jaworska, Joanna; Worth, Andrew P; Cronin, Mark T D; McDowell, Robert M; Gramatica, Paola
2003-01-01
This article provides an overview of methods for reliability assessment of quantitative structure-activity relationship (QSAR) models in the context of regulatory acceptance of human health and environmental QSARs. Useful diagnostic tools and data analytical approaches are highlighted and exemplified. Particular emphasis is given to the question of how to define the applicability borders of a QSAR and how to estimate parameter and prediction uncertainty. The article ends with a discussion regarding QSAR acceptability criteria. This discussion contains a list of recommended acceptability criteria, and we give reference values for important QSAR performance statistics. Finally, we emphasize that rigorous and independent validation of QSARs is an essential step toward their regulatory acceptance and implementation. PMID:12896860
Barnwell-Ménard, Jean-Louis; Li, Qing; Cohen, Alan A
2015-03-15
The loss of signal associated with categorizing a continuous variable is well known, and previous studies have demonstrated that this can lead to an inflation of Type-I error when the categorized variable is a confounder in a regression analysis estimating the effect of an exposure on an outcome. However, it is not known how the Type-I error may vary under different circumstances, including logistic versus linear regression, different distributions of the confounder, and different categorization methods. Here, we analytically quantified the effect of categorization and then performed a series of 9600 Monte Carlo simulations to estimate the Type-I error inflation associated with categorization of a confounder under different regression scenarios. We show that Type-I error is unacceptably high (>10% in most scenarios and often 100%). The only exception was when the variable categorized was a continuous mixture proxy for a genuinely dichotomous latent variable, where both the continuous proxy and the categorized variable are error-ridden proxies for the dichotomous latent variable. As expected, error inflation was also higher with larger sample size, fewer categories, and stronger associations between the confounder and the exposure or outcome. We provide online tools that can help researchers estimate the potential error inflation and understand how serious a problem this is. Copyright © 2014 John Wiley & Sons, Ltd.
Donnelly, Aoife; Misstear, Bruce; Broderick, Brian
2011-02-15
Background concentrations of nitrogen dioxide (NO(2)) are not constant but vary temporally and spatially. The current paper presents a powerful tool for the quantification of the effects of wind direction and wind speed on background NO(2) concentrations, particularly in cases where monitoring data are limited. In contrast to previous studies which applied similar methods to sites directly affected by local pollution sources, the current study focuses on background sites with the aim of improving methods for predicting background concentrations adopted in air quality modelling studies. The relationship between measured NO(2) concentration in air at three such sites in Ireland and locally measured wind direction has been quantified using nonparametric regression methods. The major aim was to analyse a method for quantifying the effects of local wind direction on background levels of NO(2) in Ireland. The method was expanded to include wind speed as an added predictor variable. A Gaussian kernel function is used in the analysis and circular statistics employed for the wind direction variable. Wind direction and wind speed were both found to have a statistically significant effect on background levels of NO(2) at all three sites. Frequently environmental impact assessments are based on short term baseline monitoring producing a limited dataset. The presented non-parametric regression methods, in contrast to the frequently used methods such as binning of the data, allow concentrations for missing data pairs to be estimated and distinction between spurious and true peaks in concentrations to be made. The methods were found to provide a realistic estimation of long term concentration variation with wind direction and speed, even for cases where the data set is limited. Accurate identification of the actual variation at each location and causative factors could be made, thus supporting the improved definition of background concentrations for use in air quality modelling
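A nonparametric regression of concentration on wind direction with a Gaussian kernel can be sketched as below. This is a hedged illustration rather than the authors' implementation: it uses a Nadaraya-Watson weighted average, handles the circular nature of wind direction by taking the shortest angular distance, and the names `angular_diff` and `kernel_smooth`, the bandwidth, and the sample values are assumptions.

```python
import math


def angular_diff(a, b):
    """Shortest distance in degrees between two directions on a circle."""
    d = (a - b) % 360.0
    return min(d, 360.0 - d)


def kernel_smooth(directions, conc, theta, bandwidth=20.0):
    """Nadaraya-Watson estimate of concentration at wind direction theta.

    Each observation is weighted by a Gaussian kernel of its angular
    distance to theta, so 350 degrees and 10 degrees count as close.
    """
    weights = [math.exp(-0.5 * (angular_diff(d, theta) / bandwidth) ** 2)
               for d in directions]
    return sum(w * c for w, c in zip(weights, conc)) / sum(weights)


if __name__ == "__main__":
    # Hypothetical observations: wind direction (degrees) and NO2 concentration.
    directions = [350.0, 10.0, 180.0]
    conc = [10.0, 10.0, 30.0]
    print(kernel_smooth(directions, conc, theta=0.0))
```

Because the estimate is a smooth weighted average, it can be evaluated at any direction, including ones with no observations, which is the property the abstract contrasts with binning.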
High dimensional linear regression models under long memory dependence and measurement error
NASA Astrophysics Data System (ADS)
Kaul, Abhishek
This dissertation consists of three chapters. The first chapter introduces the models under consideration and motivates problems of interest. A brief literature review is also provided in this chapter. The second chapter investigates the properties of Lasso under long range dependent model errors. Lasso is a computationally efficient approach to model selection and estimation, and its properties are well studied when the regression errors are independent and identically distributed. We study the case where the regression errors form a long memory moving average process. We establish a finite sample oracle inequality for the Lasso solution. We then show the asymptotic sign consistency in this setup. These results are established in the high dimensional setup (p > n) where p can be increasing exponentially with n. Finally, we show the consistency, $n^{1/2-d}$-consistency of Lasso, along with the oracle property of adaptive Lasso, in the case where p is fixed. Here d is the memory parameter of the stationary error sequence. The performance of Lasso is also analysed in the present setup with a simulation study. The third chapter proposes and investigates the properties of a penalized quantile based estimator for measurement error models. Standard formulations of prediction problems in high dimension regression models assume the availability of fully observed covariates and sub-Gaussian and homogeneous model errors. This makes these methods inapplicable to measurement error models where covariates are unobservable and observations are possibly non-sub-Gaussian and heterogeneous. We propose weighted penalized corrected quantile estimators for the regression parameter vector in linear regression models with additive measurement errors, where unobservable covariates are nonrandom. The proposed estimators forgo the need for the above mentioned model assumptions. We study these estimators in both the fixed dimension and high dimensional sparse setups, in the latter setup, the
Penalized likelihood PET image reconstruction using patch-based edge-preserving regularization.
Wang, Guobao; Qi, Jinyi
2012-12-01
Iterative image reconstruction for positron emission tomography (PET) can improve image quality by using spatial regularization that penalizes image intensity difference between neighboring pixels. The most commonly used quadratic penalty often oversmooths edges and fine features in reconstructed images. Nonquadratic penalties can preserve edges but often introduce piece-wise constant blocky artifacts and the results are also sensitive to the hyper-parameter that controls the shape of the penalty function. This paper presents a patch-based regularization for iterative image reconstruction that uses neighborhood patches instead of individual pixels in computing the nonquadratic penalty. The new regularization is more robust than the conventional pixel-based regularization in differentiating sharp edges from random fluctuations due to noise. An optimization transfer algorithm is developed for the penalized maximum likelihood estimation. Each iteration of the algorithm can be implemented in three simple steps: an EM-like image update, an image smoothing and a pixel-by-pixel image fusion. Computer simulations show that the proposed patch-based regularization can achieve higher contrast recovery for small objects without increasing background variation compared with the quadratic regularization. The reconstruction is also more robust to the hyper-parameter than conventional pixel-based nonquadratic regularizations. The proposed regularization method has been applied to real 3-D PET data.
ERIC Educational Resources Information Center
Pedrini, D. T.; Pedrini, Bonnie C.
Regression, another mechanism studied by Sigmund Freud, has had much research, e.g., hypnotic regression, frustration regression, schizophrenic regression, and infra-human-animal regression (often directly related to fixation). Many investigators worked with hypnotic age regression, which has a long history, going back to Russian reflexologists.…
Dang, H.; Wang, A. S.; Sussman, Marc S.; Siewerdsen, J. H.; Stayman, J. W.
2014-01-01
Sequential imaging studies are conducted in many clinical scenarios. Prior images from previous studies contain a great deal of patient-specific anatomical information and can be used in conjunction with subsequent imaging acquisitions to maintain image quality while enabling radiation dose reduction (e.g., through sparse angular sampling, reduction in fluence, etc.). However, patient motion between images in such sequences results in misregistration between the prior image and current anatomy. Existing prior-image-based approaches often include only a simple rigid registration step that can be insufficient for capturing complex anatomical motion, introducing detrimental effects in subsequent image reconstruction. In this work, we propose a joint framework that estimates the 3D deformation between an unregistered prior image and the current anatomy (based on a subsequent data acquisition) and reconstructs the current anatomical image using a model-based reconstruction approach that includes regularization based on the deformed prior image. This framework is referred to as deformable prior image registration, penalized-likelihood estimation (dPIRPLE). Central to this framework is the inclusion of a 3D B-spline-based free-form-deformation model into the joint registration-reconstruction objective function. The proposed framework is solved using a maximization strategy whereby alternating updates to the registration parameters and image estimates are applied allowing for improvements in both the registration and reconstruction throughout the optimization process. Cadaver experiments were conducted on a cone-beam CT testbench emulating a lung nodule surveillance scenario. Superior reconstruction accuracy and image quality were demonstrated using the dPIRPLE algorithm as compared to more traditional reconstruction methods including filtered backprojection, penalized-likelihood estimation (PLE), prior image penalized-likelihood estimation (PIPLE) without registration
NASA Astrophysics Data System (ADS)
Dang, H.; Wang, A. S.; Sussman, Marc S.; Siewerdsen, J. H.; Stayman, J. W.
2014-09-01
Sequential imaging studies are conducted in many clinical scenarios. Prior images from previous studies contain a great deal of patient-specific anatomical information and can be used in conjunction with subsequent imaging acquisitions to maintain image quality while enabling radiation dose reduction (e.g., through sparse angular sampling, reduction in fluence, etc). However, patient motion between images in such sequences results in misregistration between the prior image and current anatomy. Existing prior-image-based approaches often include only a simple rigid registration step that can be insufficient for capturing complex anatomical motion, introducing detrimental effects in subsequent image reconstruction. In this work, we propose a joint framework that estimates the 3D deformation between an unregistered prior image and the current anatomy (based on a subsequent data acquisition) and reconstructs the current anatomical image using a model-based reconstruction approach that includes regularization based on the deformed prior image. This framework is referred to as deformable prior image registration, penalized-likelihood estimation (dPIRPLE). Central to this framework is the inclusion of a 3D B-spline-based free-form-deformation model into the joint registration-reconstruction objective function. The proposed framework is solved using a maximization strategy whereby alternating updates to the registration parameters and image estimates are applied allowing for improvements in both the registration and reconstruction throughout the optimization process. Cadaver experiments were conducted on a cone-beam CT testbench emulating a lung nodule surveillance scenario. Superior reconstruction accuracy and image quality were demonstrated using the dPIRPLE algorithm as compared to more traditional reconstruction methods including filtered backprojection, penalized-likelihood estimation (PLE), prior image penalized-likelihood estimation (PIPLE) without registration, and
Hegazy, Maha A; Lotfy, Hayam M; Rezk, Mamdouh R; Omran, Yasmin Rostom
2015-04-05
Smart and novel spectrophotometric and chemometric methods have been developed and validated for the simultaneous determination of a binary mixture of chloramphenicol (CPL) and dexamethasone sodium phosphate (DSP) in presence of interfering substances without prior separation. The first method depends upon derivative subtraction coupled with constant multiplication. The second one is ratio difference method at optimum wavelengths which were selected after applying derivative transformation method via multiplying by a decoding spectrum in order to cancel the contribution of non labeled interfering substances. The third method relies on partial least squares with regression model updating. They are so simple that they do not require any preliminary separation steps. Accuracy, precision and linearity ranges of these methods were determined. Moreover, specificity was assessed by analyzing synthetic mixtures of both drugs. The proposed methods were successfully applied for analysis of both drugs in their pharmaceutical formulation. The obtained results have been statistically compared to that of an official spectrophotometric method to give a conclusion that there is no significant difference between the proposed methods and the official ones with respect to accuracy and precision.
NASA Astrophysics Data System (ADS)
Hegazy, Maha A.; Lotfy, Hayam M.; Rezk, Mamdouh R.; Omran, Yasmin Rostom
2015-04-01
Smart and novel spectrophotometric and chemometric methods have been developed and validated for the simultaneous determination of a binary mixture of chloramphenicol (CPL) and dexamethasone sodium phosphate (DSP) in presence of interfering substances without prior separation. The first method depends upon derivative subtraction coupled with constant multiplication. The second one is ratio difference method at optimum wavelengths which were selected after applying derivative transformation method via multiplying by a decoding spectrum in order to cancel the contribution of non labeled interfering substances. The third method relies on partial least squares with regression model updating. They are so simple that they do not require any preliminary separation steps. Accuracy, precision and linearity ranges of these methods were determined. Moreover, specificity was assessed by analyzing synthetic mixtures of both drugs. The proposed methods were successfully applied for analysis of both drugs in their pharmaceutical formulation. The obtained results have been statistically compared to that of an official spectrophotometric method to give a conclusion that there is no significant difference between the proposed methods and the official ones with respect to accuracy and precision.
[Qualification of persons taking part in psychiatric opinion-giving in a penal trial].
Zgryzek, K
1998-01-01
The introduction of a new Penal Code by the Parliament makes a detailed analysis of particular legal solutions in the code necessary. The authors analyse selected issues in the Penal Code concerning evidence from the opinions of psychiatric experts, particularly the professional qualifications of persons appointed by the court in a penal trial to assess the mental health of specific persons (a witness, a victim, the perpetrator). It was accepted that the only persons authorized to conduct a psychiatric examination in a penal trial are those with at least a first-degree specialization in psychiatry.
NASA Astrophysics Data System (ADS)
Setiawan, Dedy K.; Anggraeni, Rina
2017-03-01
Optimizing electricity production, especially in a thermal power plant, requires analysis of the input-output characteristics and optimal operation of the plant. Monitoring the input-output characteristic curve shows whether the plant needs maintenance. The input-output characteristics can be calculated by the quadratic least squares regression method. Operating the load properly lets electricity production match the maximum desired load at the lowest cost. Load calculations are performed with a dynamic genetic algorithm. Applied to data from PT. Pembangkit Jawa Bali (PJB) Unit Pembangkit Gresik for July 2015, the method saves 3.162,9147 kNm3 (kilo normal cubic meters) of fuel consumption and 22,773 in fuel costs compared with PJB. Applied to data for December 2012, it saves 16.532,2189 liters of fuel consumption and 84,654.79 in fuel costs compared with PT. PJB.
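The quadratic least squares step can be sketched as follows, assuming the usual form of a thermal unit's input-output curve, fuel = a + b·P + c·P², where P is the generated load. The functional form, the function names `solve` and `fit_quadratic`, and the sample numbers are assumptions for illustration; they are not taken from the plant data in the abstract.

```python
def solve(a, b):
    """Solve a small linear system a x = b by Gaussian elimination with pivoting."""
    n = len(b)
    m = [row[:] + [bi] for row, bi in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            m[r] = [vr - f * vc for vr, vc in zip(m[r], m[col])]
    x = [0.0] * n
    for i in reversed(range(n)):
        x[i] = (m[i][n] - sum(m[i][j] * x[j] for j in range(i + 1, n))) / m[i][i]
    return x


def fit_quadratic(load, fuel):
    """Least squares fit of fuel = a + b*load + c*load**2; returns [a, b, c]."""
    powers = [[p ** k for k in range(3)] for p in load]
    # Normal equations (X'X) beta = X'y for the design matrix [1, P, P^2].
    xtx = [[sum(r[i] * r[j] for r in powers) for j in range(3)] for i in range(3)]
    xty = [sum(r[i] * f for r, f in zip(powers, fuel)) for i in range(3)]
    return solve(xtx, xty)


if __name__ == "__main__":
    # Hypothetical measurements generated from fuel = 2 + 3P + 0.5P^2.
    load = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
    fuel = [2.0 + 3.0 * p + 0.5 * p * p for p in load]
    print(fit_quadratic(load, fuel))
```

Refitting the curve on new operating data and comparing the coefficients with an earlier fit is one simple way to see whether a unit's characteristic has drifted.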
Al-Harrasi, Ahmed; Rehman, Najeeb Ur; Mabood, Fazal; Albroumi, Muhammaed; Ali, Liaqat; Hussain, Javid; Hussain, Hidayat; Csuk, René; Khan, Abdul Latif; Alam, Tanveer; Alameri, Saif
2017-09-05
In the present study, for the first time, NIR spectroscopy coupled with PLS regression as a rapid and alternative method was developed to quantify the amount of Keto-β-Boswellic Acid (KBA) in different plant parts of Boswellia sacra and the resin exudates of the trunk. NIR spectroscopy was used for the measurement of KBA standards and B. sacra samples in absorption mode in the wavelength range from 700-2500 nm. The PLS regression model was built from the obtained spectral data using 70% of KBA standards (training set) in the range from 0.1 ppm to 100 ppm. The resulting PLS regression model had an R-square value of 98% with a correlation value of 0.99 and showed good prediction, with an RMSEP value of 3.2 and a correlation of 0.99. It was then used to quantify the amount of KBA in the samples of B. sacra. The results indicated that the MeOH extract of resin has the highest concentration of KBA (0.6%) followed by essential oil (0.1%). However, no KBA was found in the aqueous extract. The MeOH extract of the resin was subjected to column chromatography to get various sub-fractions at different polarity of organic solvents. The sub-fraction at 4% MeOH/CHCl3 (4.1% of KBA) was found to contain the highest percentage of KBA followed by another sub-fraction at 2% MeOH/CHCl3 (2.2% of KBA). The present results also indicated that KBA is only present in the gum-resin of the trunk and not in all parts of the plant. These results were further confirmed through HPLC analysis and therefore it is concluded that NIRS coupled with PLS regression is a rapid and alternate method for quantification of KBA in Boswellia sacra. It is non-destructive, rapid, sensitive and uses simple methods of sample preparation. Copyright © 2017 Elsevier B.V. All rights reserved.
Wang, Dong-Qin; Gao, Ying-Lian; Liu, Jin-Xing; Zheng, Chun-Hou; Kong, Xiang-Zhen
2017-07-18
The traditional methods of drug discovery follow the "one drug-one target" approach, which ignores the cellular and physiological environment of the action mechanism of drugs. However, pathway-based drug discovery methods can overcome this limitation. This kind of method, such as the Integrative Penalized Matrix Decomposition (iPaD) method, identifies the drug-pathway associations by taking the lasso-type penalty on the regularization term. Moreover, instead of imposing the L1-norm regularization, the L2,1-Integrative Penalized Matrix Decomposition (L2,1-iPaD) method imposes the L2,1-norm penalty on the regularization term. In this paper, based on the iPaD and L2,1-iPaD methods, we propose a novel method named L1L2,1-iPaD (L1L2,1-Integrative Penalized Matrix Decomposition), which takes the sum of the L1-norm and L2,1-norm penalties on the regularization term. Besides, we perform permutation test to assess the significance of the identified drug-pathway association pairs and compute the P-values. Compared with the existing methods, our method can identify more drug-pathway association pairs which have been validated in the CancerResource database. In order to identify drug-pathway associations which are not validated in the CancerResource database, we retrieve published papers to prove these associations. The results on two real datasets prove that our method can achieve better enrichment for identified association pairs than the iPaD and L2,1-iPaD methods.
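The two norms combined in the L1L2,1 penalty can be illustrated with a short sketch. The abstract only states that the penalties are summed; the weights `lam1` and `lam2`, the row-wise convention for the L2,1-norm, and the function names below are assumptions for this example and do not reproduce the authors' objective function.

```python
import math


def l1_norm(matrix):
    """Sum of absolute values of all entries (promotes entry-wise sparsity)."""
    return sum(abs(v) for row in matrix for v in row)


def l21_norm(matrix):
    """Sum of the Euclidean norms of the rows (promotes row-wise sparsity)."""
    return sum(math.sqrt(sum(v * v for v in row)) for row in matrix)


def combined_penalty(matrix, lam1, lam2):
    """Weighted sum of the L1-norm and L2,1-norm penalties."""
    return lam1 * l1_norm(matrix) + lam2 * l21_norm(matrix)


if __name__ == "__main__":
    b = [[3.0, 4.0], [0.0, 0.0]]   # hypothetical coefficient matrix
    print(l1_norm(b), l21_norm(b), combined_penalty(b, 1.0, 1.0))
```

The L1 term charges each nonzero entry separately, while the L2,1 term charges each nonzero row once through its length, so the sum favours solutions that are sparse both within and across rows.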
Wang, Dong-Qin; Gao, Ying-Lian; Liu, Jin-Xing; Zheng, Chun-Hou; Kong, Xiang-Zhen
2017-01-01
The traditional methods of drug discovery follow the “one drug-one target” approach, which ignores the cellular and physiological environment of the action mechanism of drugs. However, pathway-based drug discovery methods can overcome this limitation. This kind of method, such as the Integrative Penalized Matrix Decomposition (iPaD) method, identifies the drug-pathway associations by taking the lasso-type penalty on the regularization term. Moreover, instead of imposing the L1-norm regularization, the L2,1-Integrative Penalized Matrix Decomposition (L2,1-iPaD) method imposes the L2,1-norm penalty on the regularization term. In this paper, based on the iPaD and L2,1-iPaD methods, we propose a novel method named L1L2,1-iPaD (L1L2,1-Integrative Penalized Matrix Decomposition), which takes the sum of the L1-norm and L2,1-norm penalties on the regularization term. Besides, we perform permutation test to assess the significance of the identified drug-pathway association pairs and compute the P-values. Compared with the existing methods, our method can identify more drug-pathway association pairs which have been validated in the CancerResource database. In order to identify drug-pathway associations which are not validated in the CancerResource database, we retrieve published papers to prove these associations. The results on two real datasets prove that our method can achieve better enrichment for identified association pairs than the iPaD and L2,1-iPaD methods. PMID:28624800
Bias-reduced and separation-proof conditional logistic regression with small or sparse data sets.
Heinze, Georg; Puhr, Rainer
2010-03-30
Conditional logistic regression is used for the analysis of binary outcomes when subjects are stratified into several subsets, e.g. matched pairs or blocks. Log odds ratio estimates are usually found by maximizing the conditional likelihood. This approach eliminates all strata-specific parameters by conditioning on the number of events within each stratum. However, in the analyses of both an animal experiment and a lung cancer case-control study, conditional maximum likelihood (CML) resulted in infinite odds ratio estimates and monotone likelihood. Estimation can be improved by using Cytel Inc.'s well-known LogXact software, which provides a median unbiased estimate and exact or mid-p confidence intervals. Here, we suggest and outline point and interval estimation based on maximization of a penalized conditional likelihood in the spirit of Firth's (Biometrika 1993; 80:27-38) bias correction method (CFL). We present comparative analyses of both studies, demonstrating some advantages of CFL over competitors. We report on a small-sample simulation study where CFL log odds ratio estimates were almost unbiased, whereas LogXact estimates showed some bias and CML estimates exhibited serious bias. Confidence intervals and tests based on the penalized conditional likelihood had close-to-nominal coverage rates and yielded highest power among all methods compared, respectively. Therefore, we propose CFL as an attractive solution to the stratified analysis of binary data, irrespective of the occurrence of monotone likelihood. A SAS program implementing CFL is available at: http://www.muw.ac.at/msi/biometrie/programs.
Nonlinear Regression Methods for Estimation
2005-09-01
accuracy when the geometric dilution of precision (GDOP) causes collinearity, which in turn brings about poor position estimates. The main goal is...measurements are needed to wash out the measurement noise. Furthermore, the measurement arrangement's geometry (GDOP) strongly impacts the achievable...
Splines for diffeomorphic image regression.
Singh, Nikhil; Niethammer, Marc
2014-01-01
This paper develops a method for splines on diffeomorphisms for image regression. In contrast to previously proposed methods to capture image changes over time, such as geodesic regression, the method can capture more complex spatio-temporal deformations. In particular, it is a first step towards capturing periodic motions for example of the heart or the lung. Starting from a variational formulation of splines the proposed approach allows for the use of temporal control points to control spline behavior. This necessitates the development of a shooting formulation for splines. Experimental results are shown for synthetic and real data. The performance of the method is compared to geodesic regression.
Evaluating Differential Effects Using Regression Interactions and Regression Mixture Models
ERIC Educational Resources Information Center
Van Horn, M. Lee; Jaki, Thomas; Masyn, Katherine; Howe, George; Feaster, Daniel J.; Lamont, Andrea E.; George, Melissa R. W.; Kim, Minjung
2015-01-01
Research increasingly emphasizes understanding differential effects. This article focuses on understanding regression mixture models, which are relatively new statistical methods for assessing differential effects by comparing results to using an interactive term in linear regression. The research questions which each model answers, their…
Evaluating differential effects using regression interactions and regression mixture models
Van Horn, M. Lee; Jaki, Thomas; Masyn, Katherine; Howe, George; Feaster, Daniel J.; Lamont, Andrea E.; George, Melissa R. W.; Kim, Minjung
2015-01-01
Research increasingly emphasizes understanding differential effects. This paper focuses on understanding regression mixture models, a relatively new statistical method for assessing differential effects, by comparing results to using an interactive term in linear regression. The research questions which each model answers, their formulation, and their assumptions are compared using Monte Carlo simulations and real data analysis. The capabilities of regression mixture models are described and specific issues to be addressed when conducting regression mixtures are proposed. The paper aims to clarify the role that regression mixtures can take in the estimation of differential effects and increase awareness of the benefits and potential pitfalls of this approach. Regression mixture models are shown to be a potentially effective exploratory method for finding differential effects when these effects can be defined by a small number of classes of respondents who share a typical relationship between a predictor and an outcome. It is also shown that the comparison between regression mixture models and interactions becomes substantially more complex as the number of classes increases. It is argued that regression interactions are well suited for direct tests of specific hypotheses about differential effects and regression mixtures provide a useful approach for exploring effect heterogeneity given adequate samples and study design. PMID:26556903
The geology of the Penal/Barrackpore field, onshore Trinidad
Dyer, B.L.
1991-03-01
The Penal/Barrackpore field was discovered in 1938 and is located in the southern subbasin of onshore Trinidad. It is one of a series of northeast-southwest trending en echelon middle Miocene anticlinal structures that was later accentuated by late Pliocene transpressional folding. The middle Miocene Herrera and Karamat turbiditic sandstones are the primary reservoir rock in the subsurface anticline of the Penal/Barrackpore field. These turbidites were sourced from the north and deposited within the marls and clays of the Cipero Formation. The Karamat sandstones are followed in vertical stratigraphic succession by the shales and boulder beds of the Lengua Formation, the turbidites and deltaics of the lower and middle Cruse, and the deltaics of the upper Cruse, the Forest, and the Morne L'Enfer formations. Relative movement of the South American and Caribbean plates climaxed in the middle Miocene compressive tectonic event and produced an imbricate pattern of southward-facing basement-involved thrusts. The Pliocene deltaics were sourced by erosion of Miocene highs to the north and the South American landmass to the south. These deltaics exhibit onlap onto the preexisting Miocene highs. The late Pliocene transpression also coincides with the onset of oil migration along faults, diapirs, and unconformities from the Cretaceous Naparima Hill source. The Lengua Formation and the upper Forest clays are considered effective seals. Hydrocarbon trapping is structurally and stratigraphically controlled, with structure being the dominant trapping mechanism. Ultimate recoverable reserves for the Penal/Barrackpore field are estimated at 127.9 MMBO and 628.8 bcf. The field is presently owned and operated by the Trinidad and Tobago Oil Company Limited (TRINTOC).
$L^1$ penalization of volumetric dose objectives in optimal control of PDEs
Barnard, Richard C.; Clason, Christian
2017-02-11
This work is concerned with a class of PDE-constrained optimization problems that are motivated by an application in radiotherapy treatment planning. Here the primary design objective is to minimize the volume where a functional of the state violates a prescribed level, but prescribing these levels in the form of pointwise state constraints leads to infeasible problems. We therefore propose an alternative approach based on L1 penalization of the violation that is also applicable when state constraints are infeasible. We establish well-posedness of the corresponding optimal control problem, derive first-order optimality conditions, discuss convergence of minimizers as the penalty parameter tends to infinity, and present a semismooth Newton method for their efficient numerical solution. Finally, the performance of this method for a model problem is illustrated and contrasted with an alternative approach based on (regularized) state constraints.
Steganalysis using logistic regression
NASA Astrophysics Data System (ADS)
Lubenko, Ivans; Ker, Andrew D.
2011-02-01
We advocate Logistic Regression (LR) as an alternative to the Support Vector Machine (SVM) classifiers commonly used in steganalysis. LR offers more information than traditional SVM methods - it estimates class probabilities as well as providing a simple classification - and can be adapted more easily and efficiently for multiclass problems. Like SVM, LR can be kernelised for nonlinear classification, and it shows comparable classification accuracy to SVM methods. This work is a case study, comparing accuracy and speed of SVM and LR classifiers in detection of LSB Matching and other related spatial-domain image steganography, through the state-of-the-art 686-dimensional SPAM feature set, in three image sets.
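The point that LR yields class probabilities in addition to a hard decision can be shown with a minimal sketch. It fits a one-feature logistic model by plain gradient descent; the toy data, the learning rate, and the names `fit_logistic` and `predict_proba` are assumptions for illustration and have nothing to do with the 686-dimensional SPAM features or the authors' training procedure.

```python
import math


def sigmoid(z):
    """Logistic function mapping a real score to a probability."""
    return 1.0 / (1.0 + math.exp(-z))


def fit_logistic(xs, ys, lr=0.1, epochs=2000):
    """Fit weight w and bias b by gradient descent on the mean log-loss."""
    w, b = 0.0, 0.0
    n = len(xs)
    for _ in range(epochs):
        gw = sum((sigmoid(w * x + b) - y) * x for x, y in zip(xs, ys)) / n
        gb = sum((sigmoid(w * x + b) - y) for x, y in zip(xs, ys)) / n
        w -= lr * gw
        b -= lr * gb
    return w, b


def predict_proba(w, b, x):
    """Estimated probability that x belongs to class 1 (e.g. 'stego')."""
    return sigmoid(w * x + b)


# Hypothetical one-dimensional feature values with overlapping classes.
XS = [-2.0, -1.0, -0.5, 0.5, 1.0, 2.0]
YS = [0, 0, 1, 0, 1, 1]

if __name__ == "__main__":
    w, b = fit_logistic(XS, YS)
    for x in (-2.0, 0.0, 2.0):
        p = predict_proba(w, b, x)
        print(x, round(p, 3), "class 1" if p > 0.5 else "class 0")
```

Thresholding the probability at 0.5 recovers a simple classification, while the probability itself indicates how confident the classifier is about each sample.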
The neoliberal state and the penalization of misery.
Jinkings, Isabella
2011-01-01
The strategy adopted by the neoliberal state to maintain social order and safeguard private property in a context of economic deregulation and social precariousness has destroyed the welfare state and aggravated poverty, depriving the masses of any form of social protection while subjecting them to repression. The reinforcement of the repressive state apparatus is associated with the social instability provoked by the lack of social policies, the degradation of living conditions for the great majority of the population, and the amplification of income and property inequalities both in the so-called capitalist periphery and in the richest industrialized countries. The penalization of misery is revealed as a new expression of class domination.
Haplotype Estimation from Fuzzy Genotypes Using Penalized Likelihood
Uh, Hae-Won; Eilers, Paul H. C.
2011-01-01
The Composite Link Model is a generalization of the generalized linear model in which expected values of observed counts are constructed as a sum of generalized linear components. When combined with penalized likelihood, it provides a powerful and elegant way to estimate haplotype probabilities from observed genotypes. Uncertain (“fuzzy”) genotypes, like those resulting from AFLP scores, can be handled by adding an extra layer to the model. We describe the model and the estimation algorithm. We apply it to a data set of accurate human single nucleotide polymorphism (SNP) and to a data set of fuzzy tomato AFLP scores. PMID:21931662
Jones, Andrew M; Lomas, James; Moore, Peter T; Rice, Nigel
2016-10-01
We conduct a quasi-Monte-Carlo comparison of the recent developments in parametric and semiparametric regression methods for healthcare costs, both against each other and against standard practice. The population of English National Health Service hospital in-patient episodes for the financial year 2007-2008 (summed for each patient) is randomly divided into two equally sized subpopulations to form an estimation set and a validation set. Evaluating out-of-sample using the validation set, a conditional density approximation estimator shows considerable promise in forecasting conditional means, performing best for accuracy of forecasting and among the best four for bias and goodness of fit. The best performing model for bias is linear regression with square-root-transformed dependent variables, whereas a generalized linear model with square-root link function and Poisson distribution performs best in terms of goodness of fit. Commonly used models utilizing a log-link are shown to perform badly relative to other models considered in our comparison.
ERIC Educational Resources Information Center
Walton, Joseph M.; And Others
1978-01-01
Ridge regression is an approach to the problem of large standard errors of regression estimates of intercorrelated regressors. The effect of ridge regression on the estimated squared multiple correlation coefficient is discussed and illustrated. (JKS)
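How ridge regression trades a lower in-sample squared multiple correlation for more stable coefficients can be sketched as follows. This is a minimal illustration under stated assumptions, not the analysis from the record above: the data are synthetic, the penalty value 1.0 is arbitrary, and `ridge_fit` and `r_squared` are hypothetical helper names.

```python
import numpy as np

def ridge_fit(X, y, lam):
    """Closed-form ridge estimate (X'X + lam*I)^{-1} X'y for centered data."""
    p = X.shape[1]
    return np.linalg.solve(X.T @ X + lam * np.eye(p), X.T @ y)

def r_squared(X, y, beta):
    """In-sample R^2 for centered y: 1 - RSS / TSS."""
    resid = y - X @ beta
    return 1.0 - (resid @ resid) / (y @ y)

# Two highly intercorrelated regressors (synthetic, assumed).
rng = np.random.default_rng(0)
n = 50
z = rng.normal(size=n)
X = np.column_stack([z + 0.05 * rng.normal(size=n),
                     z + 0.05 * rng.normal(size=n)])
y = X[:, 0] + X[:, 1] + rng.normal(size=n)

# Center so that no intercept term is needed.
X = X - X.mean(axis=0)
y = y - y.mean()

beta_ols = ridge_fit(X, y, 0.0)    # lam = 0 reduces to ordinary least squares
beta_ridge = ridge_fit(X, y, 1.0)  # arbitrary illustrative penalty

r2_ols = r_squared(X, y, beta_ols)
r2_ridge = r_squared(X, y, beta_ridge)
```

Because least squares minimizes the residual sum of squares, `r2_ridge` can never exceed `r2_ols` in-sample; what the penalty buys is a coefficient vector with a smaller norm, which is less sensitive to the near-collinearity of the two regressors.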
Variable Selection in Semiparametric Regression Modeling.
Li, Runze; Liang, Hua
2008-01-01
In this paper, we are concerned with how to select significant variables in semiparametric modeling. Variable selection for semiparametric regression models consists of two components: model selection for the nonparametric components and selection of significant variables for the parametric portion. Thus, it is much more challenging than variable selection for parametric models such as linear models and generalized linear models, because traditional variable selection procedures, including stepwise regression and best subset selection, require model selection for the nonparametric components of each submodel. This leads to a very heavy computational burden. In this paper, we propose a class of variable selection procedures for semiparametric regression models using nonconcave penalized likelihood. The newly proposed procedures are distinguished from the traditional ones in that they delete insignificant variables and estimate the coefficients of significant variables simultaneously. This allows us to establish the sampling properties of the resulting estimate. We first establish the rate of convergence of the resulting estimate. With proper choices of penalty functions and regularization parameters, we then establish the asymptotic normality of the resulting estimate, and further demonstrate that the proposed procedures perform as well as an oracle procedure. A semiparametric generalized likelihood ratio test is proposed to select significant variables in the nonparametric component. We investigate the asymptotic behavior of the proposed test and demonstrate that its limiting null distribution follows a chi-squared distribution, which is independent of the nuisance parameters. Extensive Monte Carlo simulation studies are conducted to examine the finite sample performance of the proposed variable selection procedures.
Fungible weights in logistic regression.
Jones, Jeff A; Waller, Niels G
2016-06-01
In this article we develop methods for assessing parameter sensitivity in logistic regression models. To set the stage for this work, we first review Waller's (2008) equations for computing fungible weights in linear regression. Next, we describe 2 methods for computing fungible weights in logistic regression. To demonstrate the utility of these methods, we compute fungible logistic regression weights using data from the Centers for Disease Control and Prevention's (2010) Youth Risk Behavior Surveillance Survey, and we illustrate how these alternate weights can be used to evaluate parameter sensitivity. To make our work accessible to the research community, we provide R code (R Core Team, 2015) that will generate both kinds of fungible logistic regression weights. (PsycINFO Database Record (c) 2016 APA, all rights reserved).
43 CFR 4170.2-1 - Penal provisions under the Taylor Grazing Act.
Code of Federal Regulations, 2011 CFR
2011-10-01
... 43 Public Lands: Interior 2 2011-10-01 2011-10-01 false Penal provisions under the Taylor Grazing Act. 4170.2-1 Section 4170.2-1 Public Lands: Interior Regulations Relating to Public Lands (Continued...-EXCLUSIVE OF ALASKA Penalties § 4170.2-1 Penal provisions under the Taylor Grazing Act. Under section 2 of...
49 CFR 26.47 - Can recipients be penalized for failing to meet overall goals?
Code of Federal Regulations, 2013 CFR
2013-10-01
... 49 Transportation 1 2013-10-01 2013-10-01 false Can recipients be penalized for failing to meet... Goals, Good Faith Efforts, and Counting § 26.47 Can recipients be penalized for failing to meet overall... rule, because your DBE participation falls short of your overall goal, unless you have failed to...
27 CFR 19.957 - Instructions to compute bond penal sum.
Code of Federal Regulations, 2010 CFR
2010-04-01
... 27 Alcohol, Tobacco Products and Firearms 1 2010-04-01 2010-04-01 false Instructions to compute bond penal sum. 19.957 Section 19.957 Alcohol, Tobacco Products and Firearms ALCOHOL AND TOBACCO TAX... Fuel Use Bonds § 19.957 Instructions to compute bond penal sum. (a) Medium plants. To find the...
Precision Efficacy Analysis for Regression.
ERIC Educational Resources Information Center
Brooks, Gordon P.
When multiple linear regression is used to develop a prediction model, sample size must be large enough to ensure stable coefficients. If the derivation sample size is inadequate, the model may not predict well for future subjects. The precision efficacy analysis for regression (PEAR) method uses a cross-validity approach to select sample sizes…
Quantile regression for climate data
NASA Astrophysics Data System (ADS)
Marasinghe, Dilhani Shalika
Quantile regression is a developing statistical tool used to explain the relationship between response and predictor variables. This thesis describes two examples from climatology using quantile regression. Our main goal is to estimate derivatives of a conditional mean and/or conditional quantile function. We introduce a method to handle autocorrelation in the framework of quantile regression and use it with the temperature data. We also explain some properties of the tornado data, which are non-normally distributed. Even though quantile regression provides a more comprehensive view, when the residuals satisfy the normality and constant variance assumptions, we would prefer least squares regression for our temperature analysis. When dealing with non-normality and non-constant variance, quantile regression is a better candidate for the estimation of the derivative.
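The idea behind quantile regression can be sketched in its simplest, intercept-only form: the constant that minimizes the asymmetric "pinball" (check) loss at level tau is a sample tau-quantile. The sketch below is an illustration under stated assumptions only; the skewed synthetic data, the level tau = 0.9, and the hypothetical helper `pinball_loss` are not taken from the thesis, and the autocorrelation handling and derivative estimation it describes are not reproduced.

```python
import numpy as np

def pinball_loss(residuals, tau):
    """Mean check loss: tau * r for r >= 0, (tau - 1) * r for r < 0."""
    return np.mean(np.where(residuals >= 0,
                            tau * residuals,
                            (tau - 1.0) * residuals))

# Skewed, non-normal synthetic data (assumed for illustration).
rng = np.random.default_rng(1)
y = rng.exponential(scale=2.0, size=2001)
tau = 0.9

# Intercept-only quantile regression: search over the observed values,
# since a minimizer of the check loss is always attained at a data point.
candidates = np.sort(y)
losses = np.array([pinball_loss(y - c, tau) for c in candidates])
best = candidates[np.argmin(losses)]

# Reference value: the usual empirical quantile.
empirical = np.quantile(y, tau)
```

Replacing the constant `c` with a linear function of predictors and minimizing the same loss gives linear quantile regression; the brute-force search used here is only practical for the one-parameter case.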
Linear regression in astronomy. II
NASA Technical Reports Server (NTRS)
Feigelson, Eric D.; Babu, Gutti J.
1992-01-01
A wide variety of least-squares linear regression procedures used in observational astronomy, particularly investigations of the cosmic distance scale, are presented and discussed. The classes of linear models considered are (1) unweighted regression lines, with bootstrap and jackknife resampling; (2) regression solutions when measurement error, in one or both variables, dominates the scatter; (3) methods to apply a calibration line to new data; (4) truncated regression models, which apply to flux-limited data sets; and (5) censored regression models, which apply when nondetections are present. For the calibration problem we develop two new procedures: a formula for the intercept offset between two parallel data sets, which propagates slope errors from one regression to the other; and a generalization of the Working-Hotelling confidence bands to nonstandard least-squares lines. They can provide improved error analysis for Faber-Jackson, Tully-Fisher, and similar cosmic distance scale relations.
McDonough, Molly; Rabe, Brian; Saha, Margaret
2015-01-01
Global gene expression analysis using microarrays and, more recently, RNA-seq, has allowed investigators to understand biological processes at a system level. However, the identification of differentially expressed genes in experiments with small sample size, high dimensionality, and high variance remains challenging, limiting the usability of these tens of thousands of publicly available, and possibly many more unpublished, gene expression datasets. We propose a novel variable selection algorithm for ultra-low-n microarray studies using generalized linear model-based variable selection with a penalized binomial regression algorithm called penalized Euclidean distance (PED). Our method uses PED to build a classifier on the experimental data to rank genes by importance. In place of cross-validation, which is required by most similar methods but not reliable for experiments with small sample size, we use a simulation-based approach to additively build a list of differentially expressed genes from the rank-ordered list. Our simulation-based approach maintains a low false discovery rate while maximizing the number of differentially expressed genes identified, a feature critical for downstream pathway analysis. We apply our method to microarray data from an experiment perturbing the Notch signaling pathway in Xenopus laevis embryos. This dataset was chosen because it showed very little differential expression according to limma, a powerful and widely-used method for microarray analysis. Our method was able to detect a significant number of differentially expressed genes in this dataset and suggest future directions for investigation. Our method is easily adaptable for analysis of data from RNA-seq and other global expression experiments with low sample size and high dimensionality. PMID:25738861