Liu, Jin; Wang, Kai; Ma, Shuangge; Huang, Jian
2013-01-01
Penalized regression methods are becoming increasingly popular in genome-wide association studies (GWAS) for identifying genetic markers associated with disease. However, standard penalized methods such as LASSO do not take into account the possible linkage disequilibrium between adjacent markers. We propose a novel penalized approach for GWAS using a dense set of single nucleotide polymorphisms (SNPs). The proposed method uses the minimax concave penalty (MCP) for marker selection and incorporates linkage disequilibrium (LD) information by penalizing the difference of the genetic effects at adjacent SNPs with high correlation. A coordinate descent algorithm is derived to implement the proposed method. This algorithm is efficient in dealing with a large number of SNPs. A multi-split method is used to calculate the p-values of the selected SNPs for assessing their significance. We refer to the proposed penalty function as the smoothed MCP and the proposed approach as the SMCP method. Performance of the proposed SMCP method and its comparison with LASSO and MCP approaches are evaluated through simulation studies, which demonstrate that the proposed method is more accurate in selecting associated SNPs. Its applicability to real data is illustrated using heterogeneous stock mice data and a rheumatoid arthritis. PMID:25258655
Gentry, Amanda Elswick; Jackson-Cook, Colleen K; Lyon, Debra E; Archer, Kellie J
2015-01-01
The pathological description of the stage of a tumor is an important clinical designation and is considered, like many other forms of biomedical data, an ordinal outcome. Currently, statistical methods for predicting an ordinal outcome using clinical, demographic, and high-dimensional correlated features are lacking. In this paper, we propose a method that fits an ordinal response model to predict an ordinal outcome for high-dimensional covariate spaces. Our method penalizes some covariates (high-throughput genomic features) without penalizing others (such as demographic and/or clinical covariates). We demonstrate the application of our method to predict the stage of breast cancer. In our model, breast cancer subtype is a nonpenalized predictor, and CpG site methylation values from the Illumina Human Methylation 450K assay are penalized predictors. The method has been made available in the ordinalgmifs package in the R programming environment. PMID:26052223
Pineda, Silvia; Real, Francisco X.; Kogevinas, Manolis; Carrato, Alfredo; Chanock, Stephen J.
2015-01-01
Omics data integration is becoming necessary to investigate the genomic mechanisms involved in complex diseases. During the integration process, many challenges arise such as data heterogeneity, the smaller number of individuals in comparison to the number of parameters, multicollinearity, and interpretation and validation of results due to their complexity and lack of knowledge about biological processes. To overcome some of these issues, innovative statistical approaches are being developed. In this work, we propose a permutation-based method to concomitantly assess significance and correct by multiple testing with the MaxT algorithm. This was applied with penalized regression methods (LASSO and ENET) when exploring relationships between common genetic variants, DNA methylation and gene expression measured in bladder tumor samples. The overall analysis flow consisted of three steps: (1) SNPs/CpGs were selected per each gene probe within 1Mb window upstream and downstream the gene; (2) LASSO and ENET were applied to assess the association between each expression probe and the selected SNPs/CpGs in three multivariable models (SNP, CPG, and Global models, the latter integrating SNPs and CPGs); and (3) the significance of each model was assessed using the permutation-based MaxT method. We identified 48 genes whose expression levels were significantly associated with both SNPs and CPGs. Importantly, 36 (75%) of them were replicated in an independent data set (TCGA) and the performance of the proposed method was checked with a simulation study. We further support our results with a biological interpretation based on an enrichment analysis. The approach we propose allows reducing computational time and is flexible and easy to implement when analyzing several types of omics data. Our results highlight the importance of integrating omics data by applying appropriate statistical strategies to discover new insights into the complex genetic mechanisms involved in disease
Penalized solutions to functional regression problems
Harezlak, Jaroslaw; Coull, Brent A.; Laird, Nan M.; Magari, Shannon R.; Christiani, David C.
2007-01-01
SUMMARY Recent technological advances in continuous biological monitoring and personal exposure assessment have led to the collection of subject-specific functional data. A primary goal in such studies is to assess the relationship between the functional predictors and the functional responses. The historical functional linear model (HFLM) can be used to model such dependencies of the response on the history of the predictor values. An estimation procedure for the regression coefficients that uses a variety of regularization techniques is proposed. An approximation of the regression surface relating the predictor to the outcome by a finite-dimensional basis expansion is used, followed by penalization of the coefficients of the neighboring basis functions by restricting the size of the coefficient differences to be small. Penalties based on the absolute values of the basis function coefficient differences (corresponding to the LASSO) and the squares of these differences (corresponding to the penalized spline methodology) are studied. The fits are compared using an extension of the Akaike Information Criterion that combines the error variance estimate, degrees of freedom of the fit and the norm of the bases function coefficients. The performance of the proposed methods is evaluated via simulations. The LASSO penalty applied to the linearly transformed coefficients yields sparser representations of the estimated regression surface, while the quadratic penalty provides solutions with the smallest L2-norm of the basis functions coefficients. Finally, the new estimation procedure is applied to the analysis of the effects of occupational particulate matter (PM) exposure on the heart rate variability (HRV) in a cohort of boilermaker workers. Results suggest that the strongest association between PM exposure and HRV in these workers occurs as a result of point exposures to the increased levels of particulate matter corresponding to smoking breaks. PMID:18552972
Penalized solutions to functional regression problems.
Harezlak, Jaroslaw; Coull, Brent A; Laird, Nan M; Magari, Shannon R; Christiani, David C
2007-06-15
Recent technological advances in continuous biological monitoring and personal exposure assessment have led to the collection of subject-specific functional data. A primary goal in such studies is to assess the relationship between the functional predictors and the functional responses. The historical functional linear model (HFLM) can be used to model such dependencies of the response on the history of the predictor values. An estimation procedure for the regression coefficients that uses a variety of regularization techniques is proposed. An approximation of the regression surface relating the predictor to the outcome by a finite-dimensional basis expansion is used, followed by penalization of the coefficients of the neighboring basis functions by restricting the size of the coefficient differences to be small. Penalties based on the absolute values of the basis function coefficient differences (corresponding to the LASSO) and the squares of these differences (corresponding to the penalized spline methodology) are studied. The fits are compared using an extension of the Akaike Information Criterion that combines the error variance estimate, degrees of freedom of the fit and the norm of the bases function coefficients. The performance of the proposed methods is evaluated via simulations. The LASSO penalty applied to the linearly transformed coefficients yields sparser representations of the estimated regression surface, while the quadratic penalty provides solutions with the smallest L(2)-norm of the basis functions coefficients. Finally, the new estimation procedure is applied to the analysis of the effects of occupational particulate matter (PM) exposure on the heart rate variability (HRV) in a cohort of boilermaker workers. Results suggest that the strongest association between PM exposure and HRV in these workers occurs as a result of point exposures to the increased levels of particulate matter corresponding to smoking breaks. PMID:18552972
Multi-locus Association Testing with Penalized Regression
Basu, Saonli; Pan, Wei; Shen, Xiaotong; Oetting, William S.
2012-01-01
In multi-locus association analysis, since some markers may not be associated with a trait, it seems attractive to use penalized regression with the capability of automatic variable selection. On the other hand, in spite of a rapidly growing body of literature on penalized regression, most focus on variable selection and outcome prediction, for which penalized methods are generally more effective than their non-penalized counterparts. However, for statistical inference, i.e. hypothesis testing and interval estimation, it is less clear how penalized methods would perform, or even how to best apply them, largely due to lack of studies on this topic. In our motivating data for a cohort of kidney transplant recipients, it is of primary interest to assess whether a group of genetic variants are associated with a binary clinical outcome, acute rejection at 6 months. In this paper, we study some technical issues and alternative implementations of hypothesis testing in Lasso penalized logistic regression, and compare their performance with each other and with several existing global tests, some of which are specifically designed as variance component tests for high-dimensional data. The most interesting, and perhaps surprising, conclusion of this study is that, for low to moderately high-dimensional data, statistical tests based on Lasso penalized regression are not necessarily more powerful than some existing global tests. In addition, in penalized regression, rather than building a test based on a single selected “best” model, combining multiple tests, each of which is built on a candidate model, might be more promising. PMID:21922539
Bootstrap Enhanced Penalized Regression for Variable Selection with Neuroimaging Data.
Abram, Samantha V; Helwig, Nathaniel E; Moodie, Craig A; DeYoung, Colin G; MacDonald, Angus W; Waller, Niels G
2016-01-01
Recent advances in fMRI research highlight the use of multivariate methods for examining whole-brain connectivity. Complementary data-driven methods are needed for determining the subset of predictors related to individual differences. Although commonly used for this purpose, ordinary least squares (OLS) regression may not be ideal due to multi-collinearity and over-fitting issues. Penalized regression is a promising and underutilized alternative to OLS regression. In this paper, we propose a nonparametric bootstrap quantile (QNT) approach for variable selection with neuroimaging data. We use real and simulated data, as well as annotated R code, to demonstrate the benefits of our proposed method. Our results illustrate the practical potential of our proposed bootstrap QNT approach. Our real data example demonstrates how our method can be used to relate individual differences in neural network connectivity with an externalizing personality measure. Also, our simulation results reveal that the QNT method is effective under a variety of data conditions. Penalized regression yields more stable estimates and sparser models than OLS regression in situations with large numbers of highly correlated neural predictors. Our results demonstrate that penalized regression is a promising method for examining associations between neural predictors and clinically relevant traits or behaviors. These findings have important implications for the growing field of functional connectivity research, where multivariate methods produce numerous, highly correlated brain networks. PMID:27516732
Bootstrap Enhanced Penalized Regression for Variable Selection with Neuroimaging Data
Abram, Samantha V.; Helwig, Nathaniel E.; Moodie, Craig A.; DeYoung, Colin G.; MacDonald, Angus W.; Waller, Niels G.
2016-01-01
Recent advances in fMRI research highlight the use of multivariate methods for examining whole-brain connectivity. Complementary data-driven methods are needed for determining the subset of predictors related to individual differences. Although commonly used for this purpose, ordinary least squares (OLS) regression may not be ideal due to multi-collinearity and over-fitting issues. Penalized regression is a promising and underutilized alternative to OLS regression. In this paper, we propose a nonparametric bootstrap quantile (QNT) approach for variable selection with neuroimaging data. We use real and simulated data, as well as annotated R code, to demonstrate the benefits of our proposed method. Our results illustrate the practical potential of our proposed bootstrap QNT approach. Our real data example demonstrates how our method can be used to relate individual differences in neural network connectivity with an externalizing personality measure. Also, our simulation results reveal that the QNT method is effective under a variety of data conditions. Penalized regression yields more stable estimates and sparser models than OLS regression in situations with large numbers of highly correlated neural predictors. Our results demonstrate that penalized regression is a promising method for examining associations between neural predictors and clinically relevant traits or behaviors. These findings have important implications for the growing field of functional connectivity research, where multivariate methods produce numerous, highly correlated brain networks. PMID:27516732
Compound Identification Using Penalized Linear Regression on Metabolomics
Liu, Ruiqi; Wu, Dongfeng; Zhang, Xiang; Kim, Seongho
2014-01-01
Compound identification is often achieved by matching the experimental mass spectra to the mass spectra stored in a reference library based on mass spectral similarity. Because the number of compounds in the reference library is much larger than the range of mass-to-charge ratio (m/z) values so that the data become high dimensional data suffering from singularity. For this reason, penalized linear regressions such as ridge regression and the lasso are used instead of the ordinary least squares regression. Furthermore, two-step approaches using the dot product and Pearson’s correlation along with the penalized linear regression are proposed in this study. PMID:27212894
Sparse brain network using penalized linear regression
NASA Astrophysics Data System (ADS)
Lee, Hyekyoung; Lee, Dong Soo; Kang, Hyejin; Kim, Boong-Nyun; Chung, Moo K.
2011-03-01
Sparse partial correlation is a useful connectivity measure for brain networks when it is difficult to compute the exact partial correlation in the small-n large-p setting. In this paper, we formulate the problem of estimating partial correlation as a sparse linear regression with a l1-norm penalty. The method is applied to brain network consisting of parcellated regions of interest (ROIs), which are obtained from FDG-PET images of the autism spectrum disorder (ASD) children and the pediatric control (PedCon) subjects. To validate the results, we check their reproducibilities of the obtained brain networks by the leave-one-out cross validation and compare the clustered structures derived from the brain networks of ASD and PedCon.
Penalized count data regression with application to hospital stay after pediatric cardiac surgery
Wang, Zhu; Ma, Shuangge; Zappitelli, Michael; Parikh, Chirag; Wang, Ching-Yun; Devarajan, Prasad
2014-01-01
Pediatric cardiac surgery may lead to poor outcomes such as acute kidney injury (AKI) and prolonged hospital length of stay (LOS). Plasma and urine biomarkers may help with early identification and prediction of these adverse clinical outcomes. In a recent multi-center study, 311 children undergoing cardiac surgery were enrolled to evaluate multiple biomarkers for diagnosis and prognosis of AKI and other clinical outcomes. LOS is often analyzed as count data, thus Poisson regression and negative binomial (NB) regression are common choices for developing predictive models. With many correlated prognostic factors and biomarkers, variable selection is an important step. The present paper proposes new variable selection methods for Poisson and NB regression. We evaluated regularized regression through penalized likelihood function. We first extend the elastic net (Enet) Poisson to two penalized Poisson regression: Mnet, a combination of minimax concave and ridge penalties; and Snet, a combination of smoothly clipped absolute deviation (SCAD) and ridge penalties. Furthermore, we extend the above methods to the penalized NB regression. For the Enet, Mnet, and Snet penalties (EMSnet), we develop a unified algorithm to estimate the parameters and conduct variable selection simultaneously. Simulation studies show that the proposed methods have advantages with highly correlated predictors, against some of the competing methods. Applying the proposed methods to the aforementioned data, it is discovered that early postoperative urine biomarkers including NGAL, IL18, and KIM-1 independently predict LOS, after adjusting for risk and biomarker variables. PMID:24742430
Penalized regression procedures for variable selection in the potential outcomes framework
Ghosh, Debashis; Zhu, Yeying; Coffman, Donna L.
2015-01-01
A recent topic of much interest in causal inference is model selection. In this article, we describe a framework in which to consider penalized regression approaches to variable selection for causal effects. The framework leads to a simple ‘impute, then select’ class of procedures that is agnostic to the type of imputation algorithm as well as penalized regression used. It also clarifies how model selection involves a multivariate regression model for causal inference problems, and that these methods can be applied for identifying subgroups in which treatment effects are homogeneous. Analogies and links with the literature on machine learning methods, missing data and imputation are drawn. A difference LASSO algorithm is defined, along with its multiple imputation analogues. The procedures are illustrated using a well-known right heart catheterization dataset. PMID:25628185
Zhang, Guosheng; Huang, Kuan-Chieh; Xu, Zheng; Tzeng, Jung-Ying; Conneely, Karen N.; Guan, Weihua; Kang, Jian; Li, Yun
2016-01-01
DNA methylation is a key epigenetic mark involved in both normal development and disease progression. Recent advances in high-throughput technologies have enabled genome-wide profiling of DNA methylation. However, DNA methylation profiling often employs different designs and platforms with varying resolution, which hinders joint analysis of methylation data from multiple platforms. In this study, we propose a penalized functional regression model to impute missing methylation data. By incorporating functional predictors, our model utilizes information from nonlocal probes to improve imputation quality. Here, we compared the performance of our functional model to linear regression and the best single probe surrogate in real data and via simulations. Specifically, we applied different imputation approaches to an acute myeloid leukemia dataset consisting of 194 samples and our method showed higher imputation accuracy, manifested, for example, by a 94% relative increase in information content and up to 86% more CpG sites passing post-imputation filtering. Our simulated association study further demonstrated that our method substantially improves the statistical power to identify trait-associated methylation loci. These findings indicate that the penalized functional regression model is a convenient and valuable imputation tool for methylation data, and it can boost statistical power in downstream epigenome-wide association study (EWAS). PMID:27061717
Zhang, Guosheng; Huang, Kuan-Chieh; Xu, Zheng; Tzeng, Jung-Ying; Conneely, Karen N; Guan, Weihua; Kang, Jian; Li, Yun
2016-05-01
DNA methylation is a key epigenetic mark involved in both normal development and disease progression. Recent advances in high-throughput technologies have enabled genome-wide profiling of DNA methylation. However, DNA methylation profiling often employs different designs and platforms with varying resolution, which hinders joint analysis of methylation data from multiple platforms. In this study, we propose a penalized functional regression model to impute missing methylation data. By incorporating functional predictors, our model utilizes information from nonlocal probes to improve imputation quality. Here, we compared the performance of our functional model to linear regression and the best single probe surrogate in real data and via simulations. Specifically, we applied different imputation approaches to an acute myeloid leukemia dataset consisting of 194 samples and our method showed higher imputation accuracy, manifested, for example, by a 94% relative increase in information content and up to 86% more CpG sites passing post-imputation filtering. Our simulated association study further demonstrated that our method substantially improves the statistical power to identify trait-associated methylation loci. These findings indicate that the penalized functional regression model is a convenient and valuable imputation tool for methylation data, and it can boost statistical power in downstream epigenome-wide association study (EWAS). PMID:27061717
PUMA: A Unified Framework for Penalized Multiple Regression Analysis of GWAS Data
Hoffman, Gabriel E.; Logsdon, Benjamin A.; Mezey, Jason G.
2013-01-01
Penalized Multiple Regression (PMR) can be used to discover novel disease associations in GWAS datasets. In practice, proposed PMR methods have not been able to identify well-supported associations in GWAS that are undetectable by standard association tests and thus these methods are not widely applied. Here, we present a combined algorithmic and heuristic framework for PUMA (Penalized Unified Multiple-locus Association) analysis that solves the problems of previously proposed methods including computational speed, poor performance on genome-scale simulated data, and identification of too many associations for real data to be biologically plausible. The framework includes a new minorize-maximization (MM) algorithm for generalized linear models (GLM) combined with heuristic model selection and testing methods for identification of robust associations. The PUMA framework implements the penalized maximum likelihood penalties previously proposed for GWAS analysis (i.e. Lasso, Adaptive Lasso, NEG, MCP), as well as a penalty that has not been previously applied to GWAS (i.e. LOG). Using simulations that closely mirror real GWAS data, we show that our framework has high performance and reliably increases power to detect weak associations, while existing PMR methods can perform worse than single marker testing in overall performance. To demonstrate the empirical value of PUMA, we analyzed GWAS data for type 1 diabetes, Crohns's disease, and rheumatoid arthritis, three autoimmune diseases from the original Wellcome Trust Case Control Consortium. Our analysis replicates known associations for these diseases and we discover novel etiologically relevant susceptibility loci that are invisible to standard single marker tests, including six novel associations implicating genes involved in pancreatic function, insulin pathways and immune-cell function in type 1 diabetes; three novel associations implicating genes in pro- and anti-inflammatory pathways in Crohn's disease; and one
PUMA: a unified framework for penalized multiple regression analysis of GWAS data.
Hoffman, Gabriel E; Logsdon, Benjamin A; Mezey, Jason G
2013-01-01
Penalized Multiple Regression (PMR) can be used to discover novel disease associations in GWAS datasets. In practice, proposed PMR methods have not been able to identify well-supported associations in GWAS that are undetectable by standard association tests and thus these methods are not widely applied. Here, we present a combined algorithmic and heuristic framework for PUMA (Penalized Unified Multiple-locus Association) analysis that solves the problems of previously proposed methods including computational speed, poor performance on genome-scale simulated data, and identification of too many associations for real data to be biologically plausible. The framework includes a new minorize-maximization (MM) algorithm for generalized linear models (GLM) combined with heuristic model selection and testing methods for identification of robust associations. The PUMA framework implements the penalized maximum likelihood penalties previously proposed for GWAS analysis (i.e. Lasso, Adaptive Lasso, NEG, MCP), as well as a penalty that has not been previously applied to GWAS (i.e. LOG). Using simulations that closely mirror real GWAS data, we show that our framework has high performance and reliably increases power to detect weak associations, while existing PMR methods can perform worse than single marker testing in overall performance. To demonstrate the empirical value of PUMA, we analyzed GWAS data for type 1 diabetes, Crohns's disease, and rheumatoid arthritis, three autoimmune diseases from the original Wellcome Trust Case Control Consortium. Our analysis replicates known associations for these diseases and we discover novel etiologically relevant susceptibility loci that are invisible to standard single marker tests, including six novel associations implicating genes involved in pancreatic function, insulin pathways and immune-cell function in type 1 diabetes; three novel associations implicating genes in pro- and anti-inflammatory pathways in Crohn's disease; and one
2014-01-01
Motivation It is common to get an optimal combination of markers for disease classification and prediction when multiple markers are available. Many approaches based on the area under the receiver operating characteristic curve (AUC) have been proposed. Existing works based on AUC in a high-dimensional context depend mainly on a non-parametric, smooth approximation of AUC, with no work using a parametric AUC-based approach, for high-dimensional data. Results We propose an AUC-based approach using penalized regression (AucPR), which is a parametric method used for obtaining a linear combination for maximizing the AUC. To obtain the AUC maximizer in a high-dimensional context, we transform a classical parametric AUC maximizer, which is used in a low-dimensional context, into a regression framework and thus, apply the penalization regression approach directly. Two kinds of penalization, lasso and elastic net, are considered. The parametric approach can avoid some of the difficulties of a conventional non-parametric AUC-based approach, such as the lack of an appropriate concave objective function and a prudent choice of the smoothing parameter. We apply the proposed AucPR for gene selection and classification using four real microarray and synthetic data. Through numerical studies, AucPR is shown to perform better than the penalized logistic regression and the nonparametric AUC-based method, in the sense of AUC and sensitivity for a given specificity, particularly when there are many correlated genes. Conclusion We propose a powerful parametric and easily-implementable linear classifier AucPR, for gene selection and disease prediction for high-dimensional data. AucPR is recommended for its good prediction performance. Beside gene expression microarray data, AucPR can be applied to other types of high-dimensional omics data, such as miRNA and protein data. PMID:25559769
On penalized likelihood estimation for a non-proportional hazards regression model.
Devarajan, Karthik; Ebrahimi, Nader
2013-07-01
In this paper, a semi-parametric generalization of the Cox model that permits crossing hazard curves is described. A theoretical framework for estimation in this model is developed based on penalized likelihood methods. It is shown that the optimal solution to the baseline hazard, baseline cumulative hazard and their ratio are hyperbolic splines with knots at the distinct failure times. PMID:24791034
Won, Sungho; Choi, Hosik; Park, Suyeon; Lee, Juyoung; Park, Changyi; Kwon, Sunghoon
2015-01-01
Owing to recent improvement of genotyping technology, large-scale genetic data can be utilized to identify disease susceptibility loci and this successful finding has substantially improved our understanding of complex diseases. However, in spite of these successes, most of the genetic effects for many complex diseases were found to be very small, which have been a big hurdle to build disease prediction model. Recently, many statistical methods based on penalized regressions have been proposed to tackle the so-called “large P and small N” problem. Penalized regressions including least absolute selection and shrinkage operator (LASSO) and ridge regression limit the space of parameters, and this constraint enables the estimation of effects for very large number of SNPs. Various extensions have been suggested, and, in this report, we compare their accuracy by applying them to several complex diseases. Our results show that penalized regressions are usually robust and provide better accuracy than the existing methods for at least diseases under consideration. PMID:26346893
Genomewide Multiple-Loci Mapping in Experimental Crosses by Iterative Adaptive Penalized Regression
Sun, Wei; Ibrahim, Joseph G.; Zou, Fei
2010-01-01
Genomewide multiple-loci mapping can be viewed as a challenging variable selection problem where the major objective is to select genetic markers related to a trait of interest. It is challenging because the number of genetic markers is large (often much larger than the sample size) and there is often strong linkage or linkage disequilibrium between markers. In this article, we developed two methods for genomewide multiple loci mapping: the Bayesian adaptive Lasso and the iterative adaptive Lasso. Compared with eight existing methods, the proposed methods have improved variable selection performance in both simulation and real data studies. The advantages of our methods come from the assignment of adaptive weights to different genetic makers and the iterative updating of these adaptive weights. The iterative adaptive Lasso is also computationally much more efficient than the commonly used marginal regression and stepwise regression methods. Although our methods are motivated by multiple-loci mapping, they are general enough to be applied to other variable selection problems. PMID:20157003
NASA Astrophysics Data System (ADS)
Vasilyev, Oleg V.; Gazzola, Mattia; Koumoutsakos, Petros
2009-11-01
In this talk we discuss preliminary results for the use of hybrid wavelet collocation - Brinkman penalization approach for shape and topology optimization of fluid flows. Adaptive wavelet collocation method tackles the problem of efficiently resolving a fluid flow on a dynamically adaptive computational grid in complex geometries (where grid resolution varies both in space and time time), while Brinkman volume penalization allows easy variation of flow geometry without using body-fitted meshes by simply changing the shape of the penalization region. The use of Brinkman volume penalization approach allow seamless transition from shape to topology optimization by combining it with level set approach and increasing the size of the optimization space. The approach is demonstrated for shape optimization of a variety of fluid flows by optimizing single cost function (time averaged Drag coefficient) using covariance matrix adaptation (CMA) evolutionary algorithm.
Regression methods for spatial data
NASA Technical Reports Server (NTRS)
Yakowitz, S. J.; Szidarovszky, F.
1982-01-01
The kriging approach, a parametric regression method used by hydrologists and mining engineers, among others also provides an error estimate the integral of the regression function. The kriging method is explored and some of its statistical characteristics are described. The Watson method and theory are extended so that the kriging features are displayed. Theoretical and computational comparisons of the kriging and Watson approaches are offered.
Shang, Shang; Bai, Jing; Song, Xiaolei; Wang, Hongkai; Lau, Jaclyn
2007-01-01
Conjugate gradient method is verified to be efficient for nonlinear optimization problems of large-dimension data. In this paper, a penalized linear and nonlinear combined conjugate gradient method for the reconstruction of fluorescence molecular tomography (FMT) is presented. The algorithm combines the linear conjugate gradient method and the nonlinear conjugate gradient method together based on a restart strategy, in order to take advantage of the two kinds of conjugate gradient methods and compensate for the disadvantages. A quadratic penalty method is adopted to gain a nonnegative constraint and reduce the illposedness of the problem. Simulation studies show that the presented algorithm is accurate, stable, and fast. It has a better performance than the conventional conjugate gradient-based reconstruction algorithms. It offers an effective approach to reconstruct fluorochrome information for FMT. PMID:18354740
Differentiating among penal states.
Lacey, Nicola
2010-12-01
This review article assesses Loïc Wacquant's contribution to debates on penality, focusing on his most recent book, Punishing the Poor: The Neoliberal Government of Social Insecurity (Wacquant 2009), while setting its argument in the context of his earlier Prisons of Poverty (1999). In particular, it draws on both historical and comparative methods to question whether Wacquant's conception of 'the penal state' is adequately differentiated for the purposes of building the explanatory account he proposes; about whether 'neo-liberalism' has, materially, the global influence which he ascribes to it; and about whether, therefore, the process of penal Americanization which he asserts in his recent writings is credible. PMID:21138432
NASA Astrophysics Data System (ADS)
Kasimov, Nurlybek; Brown-Dymkoski, Eric; Vasilyev, Oleg V.
2015-11-01
A novel volume penalization method to enforce immersed boundary conditions in Navier-Stokes and Euler equations is presented. Previously, Brinkman penalization has been used to introduce solid obstacles modeled as porous media, although it is limited to Dirichlet-type conditions on velocity and temperature. This method builds upon Brinkman penalization by allowing Neumann conditions to be applied in a general fashion. Correct boundary conditions are achieved through characteristic propagation into the thin layer inside of the obstacle. Inward pointing characteristics ensure nonphysical solution inside the obstacle does not propagate outside to the fluid. Dirichlet boundary conditions are enforced similarly to Brinkman method. Penalization parameters act on a much faster timescale than the characteristic timescale of the flow. Main advantage of the method is systematic means of the error control. This talk is focused on the progress that was made towards the extension of the method to the 3D flows around irregular shapes. This work was supported by ONR MURI on Soil Blast Modeling.
Wu, Hulin; Xue, Hongqi; Kumar, Arun
2012-06-01
Differential equations are extensively used for modeling dynamics of physical processes in many scientific fields such as engineering, physics, and biomedical sciences. Parameter estimation of differential equation models is a challenging problem because of high computational cost and high-dimensional parameter space. In this article, we propose a novel class of methods for estimating parameters in ordinary differential equation (ODE) models, which is motivated by HIV dynamics modeling. The new methods exploit the form of numerical discretization algorithms for an ODE solver to formulate estimating equations. First, a penalized-spline approach is employed to estimate the state variables and the estimated state variables are then plugged in a discretization formula of an ODE solver to obtain the ODE parameter estimates via a regression approach. We consider three different order of discretization methods, Euler's method, trapezoidal rule, and Runge-Kutta method. A higher-order numerical algorithm reduces numerical error in the approximation of the derivative, which produces a more accurate estimate, but its computational cost is higher. To balance the computational cost and estimation accuracy, we demonstrate, via simulation studies, that the trapezoidal discretization-based estimate is the best and is recommended for practical use. The asymptotic properties for the proposed numerical discretization-based estimators are established. Comparisons between the proposed methods and existing methods show a clear benefit of the proposed methods in regards to the trade-off between computational cost and estimation accuracy. We apply the proposed methods t an HIV study to further illustrate the usefulness of the proposed approaches. PMID:22376200
Numerical simulation of fluid-structure interaction with the volume penalization method
NASA Astrophysics Data System (ADS)
Engels, Thomas; Kolomenskiy, Dmitry; Schneider, Kai; Sesterhenn, Jörn
2015-01-01
We present a novel scheme for the numerical simulation of fluid-structure interaction problems. It extends the volume penalization method, a member of the family of immersed boundary methods, to take into account flexible obstacles. We show how the introduction of a smoothing layer, physically interpreted as surface roughness, allows for arbitrary motion of the deformable obstacle. The approach is carefully validated and good agreement with various results in the literature is found. A simple one-dimensional solid model is derived, capable of modeling arbitrarily large deformations and imposed motion at the leading edge, as it is required for the simulation of simplified models for insect flight. The model error is shown to be small, while the one-dimensional character of the model features a reasonably easy implementation. The coupled fluid-solid interaction solver is shown not to introduce artificial energy in the numerical coupling, and validated using a widely used benchmark. We conclude with the application of our method to models for insect flight and study the propulsive efficiency of one and two wing sections.
NASA Astrophysics Data System (ADS)
Tauriello, Gerardo; Koumoutsakos, Petros
2015-02-01
We present a comparative study of penalization and phase field methods for the solution of the diffusion equation in complex geometries embedded using simple Cartesian meshes. The two methods have been widely employed to solve partial differential equations in complex and moving geometries for applications ranging from solid and fluid mechanics to biology and geophysics. Their popularity is largely due to their discretization on Cartesian meshes thus avoiding the need to create body-fitted grids. At the same time, there are questions regarding their accuracy and it appears that the use of each one is confined by disciplinary boundaries. Here, we compare penalization and phase field methods to handle problems with Neumann and Robin boundary conditions. We discuss extensions for Dirichlet boundary conditions and in turn compare with methods that have been explicitly designed to handle Dirichlet boundary conditions. The accuracy of all methods is analyzed using one and two dimensional benchmark problems such as the flow induced by an oscillating wall and by a cylinder performing rotary oscillations. This comparative study provides information to decide which methods to consider for a given application and their incorporation in broader computational frameworks. We demonstrate that phase field methods are more accurate than penalization methods on problems with Neumann boundary conditions and we present an error analysis explaining this result.
REGRESSION METHODS FOR DATA WITH INCOMPLETE COVARIATES
Modern statistical methods in chronic disease epidemiology allow simultaneous regression of disease status on several covariates. hese methods permit examination of the effects of one covariate while controlling for those of others that may be causally related to the disease. owe...
NASA Astrophysics Data System (ADS)
Vasilyev, Oleg V.; Gazzola, Mattia; Koumoutsakos, Petros
2010-11-01
In this talk we discuss preliminary results for the use of hybrid wavelet collocation - Brinkman penalization approach for shape optimization for drag reduction in flows past linked bodies. This optimization relies on Adaptive Wavelet Collocation Method along with the Brinkman penalization technique and the Covariance Matrix Adaptation Evolution Strategy (CMA-ES). Adaptive wavelet collocation method tackles the problem of efficiently resolving a fluid flow on a dynamically adaptive computational grid, while a level set approach is used to describe the body shape and the Brinkman volume penalization allows for an easy variation of flow geometry without requiring body-fitted meshes. We perform 2D simulations of linked bodies in order to investigate whether flat geometries are optimal for drag reduction. In order to accelerate the costly cost function evaluations we exploit the inherent parallelism of ES and we extend the CMA-ES implementation to a multi-host framework. This framework allows for an easy distribution of the cost function evaluations across several parallel architectures and it is not limited to only one computing facility. The resulting optimal shapes are geometrically consistent with the shapes that have been obtained in the pioneering wind tunnel experiments for drag reduction using Evolution Strategies by Ingo Rechenberg.
NASA Astrophysics Data System (ADS)
Morales, Jorge A.; Leroy, Matthieu; Bos, Wouter J. T.; Schneider, Kai
2014-10-01
A volume penalization approach to simulate magnetohydrodynamic (MHD) flows in confined domains is presented. Here the incompressible visco-resistive MHD equations are solved using parallel pseudo-spectral solvers in Cartesian geometries. The volume penalization technique is an immersed boundary method which is characterized by a high flexibility for the geometry of the considered flow. In the present case, it allows to use other than periodic boundary conditions in a Fourier pseudo-spectral approach. The numerical method is validated and its convergence is assessed for two- and three-dimensional hydrodynamic (HD) and MHD flows, by comparing the numerical results with results from literature and analytical solutions. The test cases considered are two-dimensional Taylo-Couette flow, the z-pinch configuration, three dimensional Orszag-Tang flow, Ohmic-decay in a periodic cylinder, three-dimensional Taylo-Couette flow with and without axial magnetic field and three-dimensional Hartmann-instabilities in a cylinder with an imposed helical magnetic field. Finally, we present a magnetohydrodynamic flow simulation in toroidal geometry with non-symmetric cross section and imposing a helical magnetic field to illustrate the potential of the method.
Morales, Jorge A.; Leroy, Matthieu; Bos, Wouter J.T.; Schneider, Kai
2014-10-01
A volume penalization approach to simulate magnetohydrodynamic (MHD) flows in confined domains is presented. Here the incompressible visco-resistive MHD equations are solved using parallel pseudo-spectral solvers in Cartesian geometries. The volume penalization technique is an immersed boundary method which is characterized by a high flexibility for the geometry of the considered flow. In the present case, it allows to use other than periodic boundary conditions in a Fourier pseudo-spectral approach. The numerical method is validated and its convergence is assessed for two- and three-dimensional hydrodynamic (HD) and MHD flows, by comparing the numerical results with results from literature and analytical solutions. The test cases considered are two-dimensional Taylor–Couette flow, the z-pinch configuration, three dimensional Orszag–Tang flow, Ohmic-decay in a periodic cylinder, three-dimensional Taylor–Couette flow with and without axial magnetic field and three-dimensional Hartmann-instabilities in a cylinder with an imposed helical magnetic field. Finally, we present a magnetohydrodynamic flow simulation in toroidal geometry with non-symmetric cross section and imposing a helical magnetic field to illustrate the potential of the method.
A method for nonlinear exponential regression analysis
NASA Technical Reports Server (NTRS)
Junkin, B. G.
1971-01-01
A computer-oriented technique is presented for performing a nonlinear exponential regression analysis on decay-type experimental data. The technique involves the least squares procedure wherein the nonlinear problem is linearized by expansion in a Taylor series. A linear curve fitting procedure for determining the initial nominal estimates for the unknown exponential model parameters is included as an integral part of the technique. A correction matrix was derived and then applied to the nominal estimate to produce an improved set of model parameters. The solution cycle is repeated until some predetermined criterion is satisfied.
Interquantile Shrinkage in Regression Models
Jiang, Liewen; Wang, Huixia Judy; Bondell, Howard D.
2012-01-01
Conventional analysis using quantile regression typically focuses on fitting the regression model at different quantiles separately. However, in situations where the quantile coefficients share some common feature, joint modeling of multiple quantiles to accommodate the commonality often leads to more efficient estimation. One example of common features is that a predictor may have a constant effect over one region of quantile levels but varying effects in other regions. To automatically perform estimation and detection of the interquantile commonality, we develop two penalization methods. When the quantile slope coefficients indeed do not change across quantile levels, the proposed methods will shrink the slopes towards constant and thus improve the estimation efficiency. We establish the oracle properties of the two proposed penalization methods. Through numerical investigations, we demonstrate that the proposed methods lead to estimations with competitive or higher efficiency than the standard quantile regression estimation in finite samples. Supplemental materials for the article are available online. PMID:24363546
TEMPERATURE SCENARIO DEVELOPMENT USING REGRESSION METHODS
A method of developing scenarios of future temperature conditions resulting from climatic change is presented. he method is straightforward and can be used to provide information about daily temperature variations and diurnal ranges, monthly average high, and low temperatures, an...
The Precision Efficacy Analysis for Regression Sample Size Method.
ERIC Educational Resources Information Center
Brooks, Gordon P.; Barcikowski, Robert S.
The general purpose of this study was to examine the efficiency of the Precision Efficacy Analysis for Regression (PEAR) method for choosing appropriate sample sizes in regression studies used for precision. The PEAR method, which is based on the algebraic manipulation of an accepted cross-validity formula, essentially uses an effect size to…
Shrinkage regression-based methods for microarray missing value imputation
2013-01-01
Background Missing values commonly occur in the microarray data, which usually contain more than 5% missing values with up to 90% of genes affected. Inaccurate missing value estimation results in reducing the power of downstream microarray data analyses. Many types of methods have been developed to estimate missing values. Among them, the regression-based methods are very popular and have been shown to perform better than the other types of methods in many testing microarray datasets. Results To further improve the performances of the regression-based methods, we propose shrinkage regression-based methods. Our methods take the advantage of the correlation structure in the microarray data and select similar genes for the target gene by Pearson correlation coefficients. Besides, our methods incorporate the least squares principle, utilize a shrinkage estimation approach to adjust the coefficients of the regression model, and then use the new coefficients to estimate missing values. Simulation results show that the proposed methods provide more accurate missing value estimation in six testing microarray datasets than the existing regression-based methods do. Conclusions Imputation of missing values is a very important aspect of microarray data analyses because most of the downstream analyses require a complete dataset. Therefore, exploring accurate and efficient methods for estimating missing values has become an essential issue. Since our proposed shrinkage regression-based methods can provide accurate missing value estimation, they are competitive alternatives to the existing regression-based methods. PMID:24565159
Calculation of Solar Radiation by Using Regression Methods
NASA Astrophysics Data System (ADS)
Kızıltan, Ö.; Şahin, M.
2016-04-01
In this study, solar radiation was estimated at 53 location over Turkey with varying climatic conditions using the Linear, Ridge, Lasso, Smoother, Partial least, KNN and Gaussian process regression methods. The data of 2002 and 2003 years were used to obtain regression coefficients of relevant methods. The coefficients were obtained based on the input parameters. Input parameters were month, altitude, latitude, longitude and landsurface temperature (LST).The values for LST were obtained from the data of the National Oceanic and Atmospheric Administration Advanced Very High Resolution Radiometer (NOAA-AVHRR) satellite. Solar radiation was calculated using obtained coefficients in regression methods for 2004 year. The results were compared statistically. The most successful method was Gaussian process regression method. The most unsuccessful method was lasso regression method. While means bias error (MBE) value of Gaussian process regression method was 0,274 MJ/m2, root mean square error (RMSE) value of method was calculated as 2,260 MJ/m2. The correlation coefficient of related method was calculated as 0,941. Statistical results are consistent with the literature. Used the Gaussian process regression method is recommended for other studies.
Numerical study of impeller-driven von Kármán flows via a volume penalization method
NASA Astrophysics Data System (ADS)
Kreuzahler, S.; Schulz, D.; Homann, H.; Ponty, Y.; Grauer, R.
2014-10-01
Studying strongly turbulent flows is still a major challenge in fluid dynamics. It is highly desirable to have comparable experiments to obtain a better understanding of the mechanisms generating turbulence. The von Kármán flow apparatus is one of those experiments that has been used in various turbulence studies by different experimental groups over the last two decades. The von Kármán flow apparatus produces a highly turbulent flow inside a cylinder vessel driven by two counter-rotating impellers. The studies cover a broad range of physical systems including incompressible flows, especially water and air, magnetohydrodynamic systems using liquid metal for understanding the important topic of the dynamo instability, particle tracking to study Lagrangian type turbulence and recently quantum turbulence in super-fluid helium. Therefore, accompanying numerical studies of the von Kármán flow that compare quantitatively data with those from experiments are of high importance for understanding the mechanism producing the characteristic flow patterns. We present a direct numerical simulation (DNS) version the von Kármán flow, forced by two rotating impellers. The cylinder geometry and the rotating objects are modelled via a penalization method and implemented in a massive parallel pseudo-spectral Navier-Stokes solver. From the wide range of different impellers used in von Kármán water and sodium experiments we choose a special configuration (TM28), in order to compare our simulations with the according set of well documented water experiments. Though this configuration is different from the one in the final VKS experiment (TM73), using our method it is quite easy to change the impeller shape to the one actually used in VKS. The decomposition into poloidal and toroidal components and the mean velocity field from our simulations are in good agreement with experimental results. In addition, we analysed the flow structure close to the impeller blades, a region
NASA Astrophysics Data System (ADS)
Chatelin, Robin; Poncet, Philippe
2014-07-01
Particle methods are very convenient to compute transport equations in fluid mechanics as their computational cost is linear and they are not limited by convection stability conditions. To achieve large 3D computations the method must be coupled to efficient algorithms for velocity computations, including a good treatment of non-homogeneities and complex moving geometries. The Penalization method enables to consider moving bodies interaction by adding a term in the conservation of momentum equation. This work introduces a new computational algorithm to solve implicitly in the same step the Penalization term and the Laplace operators, since explicit computations are limited by stability issues, especially at low Reynolds number. This computational algorithm is based on the Sherman-Morrison-Woodbury formula coupled to a GMRES iterative method to reduce the computations to a sequence of Poisson problems: this allows to formulate a penalized Poisson equation as a large perturbation of a standard Poisson, by means of algebraic relations. A direct consequence is the possibility to use fast solvers based on Fast Fourier Transforms for this problem with good efficiency from both the computational and the memory consumption point of views, since these solvers are recursive and they do not perform any matrix assembling. The resulting fluid mechanics computations are very fast and they consume a small amount of memory, compared to a reference solver or a linear system resolution. The present applications focus mainly on a coupling between transport equation and 3D Stokes equations, for studying biological organisms motion in a highly viscous flows with variable viscosity.
Cheng, Lishui; Hobbs, Robert F.; Sgouros, George; Frey, Eric C.
2014-01-01
Purpose: Three-dimensional (3D) dosimetry has the potential to provide better prediction of response of normal tissues and tumors and is based on 3D estimates of the activity distribution in the patient obtained from emission tomography. Dose–volume histograms (DVHs) are an important summary measure of 3D dosimetry and a widely used tool for treatment planning in radiation therapy. Accurate estimates of the radioactivity distribution in space and time are desirable for accurate 3D dosimetry. The purpose of this work was to develop and demonstrate the potential of penalized SPECT image reconstruction methods to improve DVHs estimates obtained from 3D dosimetry methods. Methods: The authors developed penalized image reconstruction methods, using maximum a posteriori (MAP) formalism, which intrinsically incorporate regularization in order to control noise and, unlike linear filters, are designed to retain sharp edges. Two priors were studied: one is a 3D hyperbolic prior, termed single-time MAP (STMAP), and the second is a 4D hyperbolic prior, termed cross-time MAP (CTMAP), using both the spatial and temporal information to control noise. The CTMAP method assumed perfect registration between the estimated activity distributions and projection datasets from the different time points. Accelerated and convergent algorithms were derived and implemented. A modified NURBS-based cardiac-torso phantom with a multicompartment kidney model and organ activities and parameters derived from clinical studies were used in a Monte Carlo simulation study to evaluate the methods. Cumulative dose-rate volume histograms (CDRVHs) and cumulative DVHs (CDVHs) obtained from the phantom and from SPECT images reconstructed with both the penalized algorithms and OS-EM were calculated and compared both qualitatively and quantitatively. The STMAP method was applied to patient data and CDRVHs obtained with STMAP and OS-EM were compared qualitatively. Results: The results showed that the
Birthweight Related Factors in Northwestern Iran: Using Quantile Regression Method
Fallah, Ramazan; Kazemnejad, Anoshirvan; Zayeri, Farid; Shoghli, Alireza
2016-01-01
Introduction: Birthweight is one of the most important predicting indicators of the health status in adulthood. Having a balanced birthweight is one of the priorities of the health system in most of the industrial and developed countries. This indicator is used to assess the growth and health status of the infants. The aim of this study was to assess the birthweight of the neonates by using quantile regression in Zanjan province. Methods: This analytical descriptive study was carried out using pre-registered (March 2010 - March 2012) data of neonates in urban/rural health centers of Zanjan province using multiple-stage cluster sampling. Data were analyzed using multiple linear regressions andquantile regression method and SAS 9.2 statistical software. Results: From 8456 newborn baby, 4146 (49%) were female. The mean age of the mothers was 27.1±5.4 years. The mean birthweight of the neonates was 3104 ± 431 grams. Five hundred and seventy-three patients (6.8%) of the neonates were less than 2500 grams. In all quantiles, gestational age of neonates (p<0.05), weight and educational level of the mothers (p<0.05) showed a linear significant relationship with the i of the neonates. However, sex and birth rank of the neonates, mothers age, place of residence (urban/rural) and career were not significant in all quantiles (p>0.05). Conclusion: This study revealed the results of multiple linear regression and quantile regression were not identical. We strictly recommend the use of quantile regression when an asymmetric response variable or data with outliers is available. PMID:26925889
Conventional occlusion versus pharmacologic penalization for amblyopia
Li, Tianjing; Shotton, Kate
2013-01-01
Background Amblyopia is defined as defective visual acuity in one or both eyes without demonstrable abnormality of the visual pathway, and is not immediately resolved by wearing glasses. Objectives To assess the effectiveness and safety of conventional occlusion versus atropine penalization for amblyopia. Search methods We searched CENTRAL, MEDLINE, EMBASE, LILACS, the WHO International Clinical Trials Registry Platform, preference lists, science citation index and ongoing trials up to June 2009. Selection criteria We included randomized/quasi-randomized controlled trials comparing conventional occlusion to atropine penalization for amblyopia. Data collection and analysis Two authors independently screened abstracts and full text articles, abstracted data, and assessed the risk of bias. Main results Three trials with a total of 525 amblyopic eyes were included. One trial was assessed as having a low risk of bias among these three trials, and one was assessed as having a high risk of bias. Evidence from three trials suggests atropine penalization is as effective as conventional occlusion. One trial found similar improvement in vision at six and 24 months. At six months, visual acuity in the amblyopic eye improved from baseline 3.16 lines in the occlusion and 2.84 lines in the atropine group (mean difference 0.034 logMAR; 95% confidence interval (CI) 0.005 to 0.064 logMAR). At 24 months, additional improvement was seen in both groups; but there continued to be no meaningful difference (mean difference 0.01 logMAR; 95% CI −0.02 to 0.04 logMAR). The second trial reported atropine to be more effective than occlusion. At six months, visual acuity improved 1.8 lines in the patching group and 3.4 lines in the atropine penalization group, and was in favor of atropine (mean difference −0.16 logMAR; 95% CI −0.23 to −0.09 logMAR). Different occlusion modalities were used in these two trials. The third trial had inherent methodological flaws and limited inference could
Sparse Multivariate Regression With Covariance Estimation
Rothman, Adam J.; Levina, Elizaveta; Zhu, Ji
2014-01-01
We propose a procedure for constructing a sparse estimator of a multivariate regression coefficient matrix that accounts for correlation of the response variables. This method, which we call multivariate regression with covariance estimation (MRCE), involves penalized likelihood with simultaneous estimation of the regression coefficients and the covariance structure. An efficient optimization algorithm and a fast approximation are developed for computing MRCE. Using simulation studies, we show that the proposed method outperforms relevant competitors when the responses are highly correlated. We also apply the new method to a finance example on predicting asset returns. An R-package containing this dataset and code for computing MRCE and its approximation are available online. PMID:24963268
Multiobjective Optimization for Model Selection in Kernel Methods in Regression
You, Di; Benitez-Quiroz, C. Fabian; Martinez, Aleix M.
2016-01-01
Regression plays a major role in many scientific and engineering problems. The goal of regression is to learn the unknown underlying function from a set of sample vectors with known outcomes. In recent years, kernel methods in regression have facilitated the estimation of nonlinear functions. However, two major (interconnected) problems remain open. The first problem is given by the bias-vs-variance trade-off. If the model used to estimate the underlying function is too flexible (i.e., high model complexity), the variance will be very large. If the model is fixed (i.e., low complexity), the bias will be large. The second problem is to define an approach for selecting the appropriate parameters of the kernel function. To address these two problems, this paper derives a new smoothing kernel criterion, which measures the roughness of the estimated function as a measure of model complexity. Then, we use multiobjective optimization to derive a criterion for selecting the parameters of that kernel. The goal of this criterion is to find a trade-off between the bias and the variance of the learned function. That is, the goal is to increase the model fit while keeping the model complexity in check. We provide extensive experimental evaluations using a variety of problems in machine learning, pattern recognition and computer vision. The results demonstrate that the proposed approach yields smaller estimation errors as compared to methods in the state of the art. PMID:25291740
Multiobjective optimization for model selection in kernel methods in regression.
You, Di; Benitez-Quiroz, Carlos Fabian; Martinez, Aleix M
2014-10-01
Regression plays a major role in many scientific and engineering problems. The goal of regression is to learn the unknown underlying function from a set of sample vectors with known outcomes. In recent years, kernel methods in regression have facilitated the estimation of nonlinear functions. However, two major (interconnected) problems remain open. The first problem is given by the bias-versus-variance tradeoff. If the model used to estimate the underlying function is too flexible (i.e., high model complexity), the variance will be very large. If the model is fixed (i.e., low complexity), the bias will be large. The second problem is to define an approach for selecting the appropriate parameters of the kernel function. To address these two problems, this paper derives a new smoothing kernel criterion, which measures the roughness of the estimated function as a measure of model complexity. Then, we use multiobjective optimization to derive a criterion for selecting the parameters of that kernel. The goal of this criterion is to find a tradeoff between the bias and the variance of the learned function. That is, the goal is to increase the model fit while keeping the model complexity in check. We provide extensive experimental evaluations using a variety of problems in machine learning, pattern recognition, and computer vision. The results demonstrate that the proposed approach yields smaller estimation errors as compared with methods in the state of the art. PMID:25291740
The extinction law from photometric data: linear regression methods
NASA Astrophysics Data System (ADS)
Ascenso, J.; Lombardi, M.; Lada, C. J.; Alves, J.
2012-04-01
Context. The properties of dust grains, in particular their size distribution, are expected to differ from the interstellar medium to the high-density regions within molecular clouds. Since the extinction at near-infrared wavelengths is caused by dust, the extinction law in cores should depart from that found in low-density environments if the dust grains have different properties. Aims: We explore methods to measure the near-infrared extinction law produced by dense material in molecular cloud cores from photometric data. Methods: Using controlled sets of synthetic and semi-synthetic data, we test several methods for linear regression applied to the specific problem of deriving the extinction law from photometric data. We cover the parameter space appropriate to this type of observations. Results: We find that many of the common linear-regression methods produce biased results when applied to the extinction law from photometric colors. We propose and validate a new method, LinES, as the most reliable for this effect. We explore the use of this method to detect whether or not the extinction law of a given reddened population has a break at some value of extinction. Based on observations collected at the European Organisation for Astronomical Research in the Southern Hemisphere, Chile (ESO programmes 069.C-0426 and 074.C-0728).
Cathodic protection design using the regression and correlation method
Niembro, A.M.; Ortiz, E.L.G.
1997-09-01
A computerized statistical method which calculates the current demand requirement based on potential measurements for cathodic protection systems is introduced. The method uses the regression and correlation analysis of statistical measurements of current and potentials of the piping network. This approach involves four steps: field potential measurements, statistical determination of the current required to achieve full protection, installation of more cathodic protection capacity with distributed anodes around the plant and examination of the protection potentials. The procedure is described and recommendations for the improvement of the existing and new cathodic protection systems are given.
ERIC Educational Resources Information Center
Rule, David L.
Several regression methods were examined within the framework of weighted structural regression (WSR), comparing their regression weight stability and score estimation accuracy in the presence of outlier contamination. The methods compared are: (1) ordinary least squares; (2) WSR ridge regression; (3) minimum risk regression; (4) minimum risk 2;…
A locally adaptive kernel regression method for facies delineation
NASA Astrophysics Data System (ADS)
Fernàndez-Garcia, D.; Barahona-Palomo, M.; Henri, C. V.; Sanchez-Vila, X.
2015-12-01
Facies delineation is defined as the separation of geological units with distinct intrinsic characteristics (grain size, hydraulic conductivity, mineralogical composition). A major challenge in this area stems from the fact that only a few scattered pieces of hydrogeological information are available to delineate geological facies. Several methods to delineate facies are available in the literature, ranging from those based only on existing hard data, to those including secondary data or external knowledge about sedimentological patterns. This paper describes a methodology to use kernel regression methods as an effective tool for facies delineation. The method uses both the spatial and the actual sampled values to produce, for each individual hard data point, a locally adaptive steering kernel function, self-adjusting the principal directions of the local anisotropic kernels to the direction of highest local spatial correlation. The method is shown to outperform the nearest neighbor classification method in a number of synthetic aquifers whenever the available number of hard data is small and randomly distributed in space. In the case of exhaustive sampling, the steering kernel regression method converges to the true solution. Simulations ran in a suite of synthetic examples are used to explore the selection of kernel parameters in typical field settings. It is shown that, in practice, a rule of thumb can be used to obtain suboptimal results. The performance of the method is demonstrated to significantly improve when external information regarding facies proportions is incorporated. Remarkably, the method allows for a reasonable reconstruction of the facies connectivity patterns, shown in terms of breakthrough curves performance.
Robust Logistic and Probit Methods for Binary and Multinomial Regression
Tabatabai, MA; Li, H; Eby, WM; Kengwoung-Keumo, JJ; Manne, U; Bae, S; Fouad, M; Singh, KP
2015-01-01
In this paper we introduce new robust estimators for the logistic and probit regressions for binary, multinomial, nominal and ordinal data and apply these models to estimate the parameters when outliers or inluential observations are present. Maximum likelihood estimates don't behave well when outliers or inluential observations are present. One remedy is to remove inluential observations from the data and then apply the maximum likelihood technique on the deleted data. Another approach is to employ a robust technique that can handle outliers and inluential observations without removing any observations from the data sets. The robustness of the method is tested using real and simulated data sets. PMID:26078914
Analyzing big data with the hybrid interval regression methods.
Huang, Chia-Hui; Yang, Keng-Chieh; Kao, Han-Ying
2014-01-01
Big data is a new trend at present, forcing the significant impacts on information technologies. In big data applications, one of the most concerned issues is dealing with large-scale data sets that often require computation resources provided by public cloud services. How to analyze big data efficiently becomes a big challenge. In this paper, we collaborate interval regression with the smooth support vector machine (SSVM) to analyze big data. Recently, the smooth support vector machine (SSVM) was proposed as an alternative of the standard SVM that has been proved more efficient than the traditional SVM in processing large-scale data. In addition the soft margin method is proposed to modify the excursion of separation margin and to be effective in the gray zone that the distribution of data becomes hard to be described and the separation margin between classes. PMID:25143968
Analyzing Big Data with the Hybrid Interval Regression Methods
Kao, Han-Ying
2014-01-01
Big data is a new trend at present, forcing the significant impacts on information technologies. In big data applications, one of the most concerned issues is dealing with large-scale data sets that often require computation resources provided by public cloud services. How to analyze big data efficiently becomes a big challenge. In this paper, we collaborate interval regression with the smooth support vector machine (SSVM) to analyze big data. Recently, the smooth support vector machine (SSVM) was proposed as an alternative of the standard SVM that has been proved more efficient than the traditional SVM in processing large-scale data. In addition the soft margin method is proposed to modify the excursion of separation margin and to be effective in the gray zone that the distribution of data becomes hard to be described and the separation margin between classes. PMID:25143968
L1-Penalized N-way PLS for subset of electrodes selection in BCI experiments
NASA Astrophysics Data System (ADS)
Eliseyev, Andrey; Moro, Cecile; Faber, Jean; Wyss, Alexander; Torres, Napoleon; Mestais, Corinne; Benabid, Alim Louis; Aksenova, Tetiana
2012-08-01
Recently, the N-way partial least squares (NPLS) approach was reported as an effective tool for neuronal signal decoding and brain-computer interface (BCI) system calibration. This method simultaneously analyzes data in several domains. It combines the projection of a data tensor to a low dimensional space with linear regression. In this paper the L1-Penalized NPLS is proposed for sparse BCI system calibration, allowing uniting the projection technique with an effective selection of subset of features. The L1-Penalized NPLS was applied for the binary self-paced BCI system calibration, providing selection of electrodes subset. Our BCI system is designed for animal research, in particular for research in non-human primates.
The Variance Normalization Method of Ridge Regression Analysis.
ERIC Educational Resources Information Center
Bulcock, J. W.; And Others
The testing of contemporary sociological theory often calls for the application of structural-equation models to data which are inherently collinear. It is shown that simple ridge regression, which is commonly used for controlling the instability of ordinary least squares regression estimates in ill-conditioned data sets, is not a legitimate…
Stochastic Approximation Methods for Latent Regression Item Response Models
ERIC Educational Resources Information Center
von Davier, Matthias; Sinharay, Sandip
2010-01-01
This article presents an application of a stochastic approximation expectation maximization (EM) algorithm using a Metropolis-Hastings (MH) sampler to estimate the parameters of an item response latent regression model. Latent regression item response models are extensions of item response theory (IRT) to a latent variable model with covariates…
Kwak, Il-Youp; Moore, Candace R; Spalding, Edgar P; Broman, Karl W
2014-08-01
Most statistical methods for quantitative trait loci (QTL) mapping focus on a single phenotype. However, multiple phenotypes are commonly measured, and recent technological advances have greatly simplified the automated acquisition of numerous phenotypes, including function-valued phenotypes, such as growth measured over time. While methods exist for QTL mapping with function-valued phenotypes, they are generally computationally intensive and focus on single-QTL models. We propose two simple, fast methods that maintain high power and precision and are amenable to extensions with multiple-QTL models using a penalized likelihood approach. After identifying multiple QTL by these approaches, we can view the function-valued QTL effects to provide a deeper understanding of the underlying processes. Our methods have been implemented as a package for R, funqtl. PMID:24931408
Bayes and empirical Bayes methods for reduced rank regression models in matched case-control studies
Zhou, Qin; Lan, Qing; Rothman, Nathaniel; Langseth, Hilde; Engel, Lawrence S.
2015-01-01
Summary Matched case-control studies are popular designs used in epidemiology for assessing the effects of exposures on binary traits. Modern studies increasingly enjoy the ability to examine a large number of exposures in a comprehensive manner. However, several risk factors often tend to be related in a non-trivial way, undermining efforts to identify the risk factors using standard analytic methods due to inflated type I errors and possible masking of effects. Epidemiologists often use data reduction techniques by grouping the prognostic factors using a thematic approach, with themes deriving from biological considerations. We propose shrinkage type estimators based on Bayesian penalization methods to estimate the effects of the risk factors using these themes. The properties of the estimators are examined using extensive simulations. The methodology is illustrated using data from a matched case-control study of polychlorinflated biphenyls in relation to the etiology of non-Hodgkin’s lymphoma. PMID:26575519
Satagopan, Jaya M; Sen, Ananda; Zhou, Qin; Lan, Qing; Rothman, Nathaniel; Langseth, Hilde; Engel, Lawrence S
2016-06-01
Matched case-control studies are popular designs used in epidemiology for assessing the effects of exposures on binary traits. Modern studies increasingly enjoy the ability to examine a large number of exposures in a comprehensive manner. However, several risk factors often tend to be related in a nontrivial way, undermining efforts to identify the risk factors using standard analytic methods due to inflated type-I errors and possible masking of effects. Epidemiologists often use data reduction techniques by grouping the prognostic factors using a thematic approach, with themes deriving from biological considerations. We propose shrinkage-type estimators based on Bayesian penalization methods to estimate the effects of the risk factors using these themes. The properties of the estimators are examined using extensive simulations. The methodology is illustrated using data from a matched case-control study of polychlorinated biphenyls in relation to the etiology of non-Hodgkin's lymphoma. PMID:26575519
An Investigation of the Median-Median Method of Linear Regression
ERIC Educational Resources Information Center
Walters, Elizabeth J.; Morrell, Christopher H.; Auer, Richard E.
2006-01-01
Least squares regression is the most common method of fitting a straight line to a set of bivariate data. Another less known method that is available on Texas Instruments graphing calculators is median-median regression. This method is proposed as a simple method that may be used with middle and high school students to motivate the idea of fitting…
Weighted Structural Regression: A Broad Class of Adaptive Methods for Improving Linear Prediction.
ERIC Educational Resources Information Center
Pruzek, Robert M.; Lepak, Greg M.
1992-01-01
Adaptive forms of weighted structural regression are developed and discussed. Bootstrapping studies indicate that the new methods have potential to recover known population regression weights and predict criterion score values routinely better than do ordinary least squares methods. The new methods are scale free and simple to compute. (SLD)
NASA Astrophysics Data System (ADS)
Sykas, Dimitris; Karathanassi, Vassilia
2015-06-01
This paper presents a new method for automatically determining the optimum regression model, which enable the estimation of a parameter. The concept lies on the combination of k spectral pre-processing algorithms (SPPAs) that enhance spectral features correlated to the desired parameter. Initially a pre-processing algorithm uses as input a single spectral signature and transforms it according to the SPPA function. A k-step combination of SPPAs uses k preprocessing algorithms serially. The result of each SPPA is used as input to the next SPPA, and so on until the k desired pre-processed signatures are reached. These signatures are then used as input to three different regression methods: the Normalized band Difference Regression (NDR), the Multiple Linear Regression (MLR) and the Partial Least Squares Regression (PLSR). Three Simple Genetic Algorithms (SGAs) are used, one for each regression method, for the selection of the optimum combination of k SPPAs. The performance of the SGAs is evaluated based on the RMS error of the regression models. The evaluation not only indicates the selection of the optimum SPPA combination but also the regression method that produces the optimum prediction model. The proposed method was applied on soil spectral measurements in order to predict Soil Organic Matter (SOM). In this study, the maximum value assigned to k was 3. PLSR yielded the highest accuracy while NDR's accuracy was satisfactory compared to its complexity. MLR method showed severe drawbacks due to the presence of noise in terms of collinearity at the spectral bands. Most of the regression methods required a 3-step combination of SPPAs for achieving the highest performance. The selected preprocessing algorithms were different for each regression method since each regression method handles with a different way the explanatory variables.
Evaluation of preservative systems in a sunscreen formula by linear regression method.
Bou-Chacra, Nádia A; Pinto, Terezinha de Jesus A; Ohara, Mitsuko Taba
2003-01-01
A sunscreen formula with eight different preservative systems was evaluated by linear regression, pharmacopeial, and the CTFA (Cosmetic, Toiletry and Fragrance Association) methods. The preparations were tested against Staphylococcus aureus, Burkholderia cepacia, Shewanella putrefaciens, Escherichia coli, and Bacillus sp. The linear regression method proved to be useful in the selection of the most effective preservative system used in cosmetic formulation. PMID:12688287
A new method for dealing with measurement error in explanatory variables of regression models.
Freedman, Laurence S; Fainberg, Vitaly; Kipnis, Victor; Midthune, Douglas; Carroll, Raymond J
2004-03-01
We introduce a new method, moment reconstruction, of correcting for measurement error in covariates in regression models. The central idea is similar to regression calibration in that the values of the covariates that are measured with error are replaced by "adjusted" values. In regression calibration the adjusted value is the expectation of the true value conditional on the measured value. In moment reconstruction the adjusted value is the variance-preserving empirical Bayes estimate of the true value conditional on the outcome variable. The adjusted values thereby have the same first two moments and the same covariance with the outcome variable as the unobserved "true" covariate values. We show that moment reconstruction is equivalent to regression calibration in the case of linear regression, but leads to different results for logistic regression. For case-control studies with logistic regression and covariates that are normally distributed within cases and controls, we show that the resulting estimates of the regression coefficients are consistent. In simulations we demonstrate that for logistic regression, moment reconstruction carries less bias than regression calibration, and for case-control studies is superior in mean-square error to the standard regression calibration approach. Finally, we give an example of the use of moment reconstruction in linear discriminant analysis and a nonstandard problem where we wish to adjust a classification tree for measurement error in the explanatory variables. PMID:15032787
Risk prediction with machine learning and regression methods.
Steyerberg, Ewout W; van der Ploeg, Tjeerd; Van Calster, Ben
2014-07-01
This is a discussion of issues in risk prediction based on the following papers: "Probability estimation with machine learning methods for dichotomous and multicategory outcome: Theory" by Jochen Kruppa, Yufeng Liu, Gérard Biau, Michael Kohler, Inke R. König, James D. Malley, and Andreas Ziegler; and "Probability estimation with machine learning methods for dichotomous and multicategory outcome: Applications" by Jochen Kruppa, Yufeng Liu, Hans-Christian Diener, Theresa Holste, Christian Weimar, Inke R. König, and Andreas Ziegler. PMID:24615859
Gaussian Process Regression Plus Method for Localization Reliability Improvement.
Liu, Kehan; Meng, Zhaopeng; Own, Chung-Ming
2016-01-01
Location data are among the most widely used context data in context-aware and ubiquitous computing applications. Many systems with distinct deployment costs and positioning accuracies have been developed over the past decade for indoor positioning. The most useful method is focused on the received signal strength and provides a set of signal transmission access points. However, compiling a manual measuring Received Signal Strength (RSS) fingerprint database involves high costs and thus is impractical in an online prediction environment. The system used in this study relied on the Gaussian process method, which is a nonparametric model that can be characterized completely by using the mean function and the covariance matrix. In addition, the Naive Bayes method was used to verify and simplify the computation of precise predictions. The authors conducted several experiments on simulated and real environments at Tianjin University. The experiments examined distinct data size, different kernels, and accuracy. The results showed that the proposed method not only can retain positioning accuracy but also can save computation time in location predictions. PMID:27483276
A regularization corrected score method for nonlinear regression models with covariate error.
Zucker, David M; Gorfine, Malka; Li, Yi; Tadesse, Mahlet G; Spiegelman, Donna
2013-03-01
Many regression analyses involve explanatory variables that are measured with error, and failing to account for this error is well known to lead to biased point and interval estimates of the regression coefficients. We present here a new general method for adjusting for covariate error. Our method consists of an approximate version of the Stefanski-Nakamura corrected score approach, using the method of regularization to obtain an approximate solution of the relevant integral equation. We develop the theory in the setting of classical likelihood models; this setting covers, for example, linear regression, nonlinear regression, logistic regression, and Poisson regression. The method is extremely general in terms of the types of measurement error models covered, and is a functional method in the sense of not involving assumptions on the distribution of the true covariate. We discuss the theoretical properties of the method and present simulation results in the logistic regression setting (univariate and multivariate). For illustration, we apply the method to data from the Harvard Nurses' Health Study concerning the relationship between physical activity and breast cancer mortality in the period following a diagnosis of breast cancer. PMID:23379851
[Criminalistic and penal problems with "dyadic deaths"].
Kaliszczak, Paweł; Kunz, Jerzy; Bolechała, Filip
2002-01-01
This paper is a supplement to the article "Medico legal problems of dyadic death" elaborated by the same authors. Recalling the cases presented there. It is also an attempt to present the basic criminalistic, penal and definitional problems of dyadic death called also postagressional suicide. Criminalistic problems of dyadic death were presented in view of widely known "rule of seven golden questions"--what?, where?, when?, how?, why?, what method? and who? Criminalistic analysis of cases makes some differences in conclusions but it seemed interesting to match both--criminalistc and forensic points of views to the presented material. PMID:14669688
Interquantile Shrinkage and Variable Selection in Quantile Regression
Jiang, Liewen; Bondell, Howard D.; Wang, Huixia Judy
2014-01-01
Examination of multiple conditional quantile functions provides a comprehensive view of the relationship between the response and covariates. In situations where quantile slope coefficients share some common features, estimation efficiency and model interpretability can be improved by utilizing such commonality across quantiles. Furthermore, elimination of irrelevant predictors will also aid in estimation and interpretation. These motivations lead to the development of two penalization methods, which can identify the interquantile commonality and nonzero quantile coefficients simultaneously. The developed methods are based on a fused penalty that encourages sparsity of both quantile coefficients and interquantile slope differences. The oracle properties of the proposed penalization methods are established. Through numerical investigations, it is demonstrated that the proposed methods lead to simpler model structure and higher estimation efficiency than the traditional quantile regression estimation. PMID:24653545
ERIC Educational Resources Information Center
Shih, Ching-Lin; Liu, Tien-Hsiang; Wang, Wen-Chung
2014-01-01
The simultaneous item bias test (SIBTEST) method regression procedure and the differential item functioning (DIF)-free-then-DIF strategy are applied to the logistic regression (LR) method simultaneously in this study. These procedures are used to adjust the effects of matching true score on observed score and to better control the Type I error…
Regression calibration method for correcting measurement-error bias in nutritional epidemiology.
Spiegelman, D; McDermott, A; Rosner, B
1997-04-01
Regression calibration is a statistical method for adjusting point and interval estimates of effect obtained from regression models commonly used in epidemiology for bias due to measurement error in assessing nutrients or other variables. Previous work developed regression calibration for use in estimating odds ratios from logistic regression. We extend this here to estimating incidence rate ratios from Cox proportional hazards models and regression slopes from linear-regression models. Regression calibration is appropriate when a gold standard is available in a validation study and a linear measurement error with constant variance applies or when replicate measurements are available in a reliability study and linear random within-person error can be assumed. In this paper, the method is illustrated by correction of rate ratios describing the relations between the incidence of breast cancer and dietary intakes of vitamin A, alcohol, and total energy in the Nurses' Health Study. An example using linear regression is based on estimation of the relation between ultradistal radius bone density and dietary intakes of caffeine, calcium, and total energy in the Massachusetts Women's Health Study. Software implementing these methods uses SAS macros. PMID:9094918
Cox Regression Models with Functional Covariates for Survival Data
Gellar, Jonathan E.; Colantuoni, Elizabeth; Needham, Dale M.; Crainiceanu, Ciprian M.
2015-01-01
We extend the Cox proportional hazards model to cases when the exposure is a densely sampled functional process, measured at baseline. The fundamental idea is to combine penalized signal regression with methods developed for mixed effects proportional hazards models. The model is fit by maximizing the penalized partial likelihood, with smoothing parameters estimated by a likelihood-based criterion such as AIC or EPIC. The model may be extended to allow for multiple functional predictors, time varying coefficients, and missing or unequally-spaced data. Methods were inspired by and applied to a study of the association between time to death after hospital discharge and daily measures of disease severity collected in the intensive care unit, among survivors of acute respiratory distress syndrome. PMID:26441487
ERIC Educational Resources Information Center
Cohen, Ayala; Nahum-Shani, Inbal; Doveh, Etti
2010-01-01
In their seminal paper, Edwards and Parry (1993) presented the polynomial regression as a better alternative to applying difference score in the study of congruence. Although this method is increasingly applied in congruence research, its complexity relative to other methods for assessing congruence (e.g., difference score methods) was one of the…
ERIC Educational Resources Information Center
Blair, R. Clifford; Higgins, J.J.
1978-01-01
The controversy surrounding regression methods for unbalanced factorial designs is addressed. The statistical hypotheses being tested under the various methods, as well as salient issues in the use of these methods, are discussed. The use of statistical computer packages is also discussed. (Author/JKS)
Sparling, D.W.; Barzen, J.A.; Lovvorn, J.R.; Serie, J.R.
1992-01-01
Regression equations that use mensural data to estimate body condition have been developed for several water birds. These equations often have been based on data that represent different sexes, age classes, or seasons, without being adequately tested for intergroup differences. We used proximate carcass analysis of 538 adult and juvenile canvasbacks (Aythya valisineria ) collected during fall migration, winter, and spring migrations in 1975-76 and 1982-85 to test regression methods for estimating body condition.
The Bland-Altman Method Should Not Be Used in Regression Cross-Validation Studies
ERIC Educational Resources Information Center
O'Connor, Daniel P.; Mahar, Matthew T.; Laughlin, Mitzi S.; Jackson, Andrew S.
2011-01-01
The purpose of this study was to demonstrate the bias in the Bland-Altman (BA) limits of agreement method when it is used to validate regression models. Data from 1,158 men were used to develop three regression equations to estimate maximum oxygen uptake (R[superscript 2] = 0.40, 0.61, and 0.82, respectively). The equations were evaluated in a…
A comparison of several methods of solving nonlinear regression groundwater flow problems.
Cooley, R.L.
1985-01-01
Computational efficiency and computer memory requirements for four methods of minimizing functions were compared for four test nonlinear-regression steady state groundwater flow problems. The fastest methods were the Marquardt and quasi-linearization methods, which required almost identical computer times and numbers of iterations; the next fastest was the quasi-Newton method, and last was the Fletcher-Reeves method, which did not converge in 100 iterations for two of the problems.-from Author
Linear regression techniques for use in the EC tracer method of secondary organic aerosol estimation
NASA Astrophysics Data System (ADS)
Saylor, Rick D.; Edgerton, Eric S.; Hartsell, Benjamin E.
A variety of linear regression techniques and simple slope estimators are evaluated for use in the elemental carbon (EC) tracer method of secondary organic carbon (OC) estimation. Linear regression techniques based on ordinary least squares are not suitable for situations where measurement uncertainties exist in both regressed variables. In the past, regression based on the method of Deming [1943. Statistical Adjustment of Data. Wiley, London] has been the preferred choice for EC tracer method parameter estimation. In agreement with Chu [2005. Stable estimate of primary OC/EC ratios in the EC tracer method. Atmospheric Environment 39, 1383-1392], we find that in the limited case where primary non-combustion OC (OC non-comb) is assumed to be zero, the ratio of averages (ROA) approach provides a stable and reliable estimate of the primary OC-EC ratio, (OC/EC) pri. In contrast with Chu [2005. Stable estimate of primary OC/EC ratios in the EC tracer method. Atmospheric Environment 39, 1383-1392], however, we find that the optimal use of Deming regression (and the more general York et al. [2004. Unified equations for the slope, intercept, and standard errors of the best straight line. American Journal of Physics 72, 367-375] regression) provides excellent results as well. For the more typical case where OC non-comb is allowed to obtain a non-zero value, we find that regression based on the method of York is the preferred choice for EC tracer method parameter estimation. In the York regression technique, detailed information on uncertainties in the measurement of OC and EC is used to improve the linear best fit to the given data. If only limited information is available on the relative uncertainties of OC and EC, then Deming regression should be used. On the other hand, use of ROA in the estimation of secondary OC, and thus the assumption of a zero OC non-comb value, generally leads to an overestimation of the contribution of secondary OC to total measured OC.
Ksantini, Riadh; Ziou, Djemel; Colin, Bernard; Dubeau, François
2008-02-01
In this paper, we investigate the effectiveness of a Bayesian logistic regression model to compute the weights of a pseudo-metric, in order to improve its discriminatory capacity and thereby increase image retrieval accuracy. In the proposed Bayesian model, the prior knowledge of the observations is incorporated and the posterior distribution is approximated by a tractable Gaussian form using variational transformation and Jensen's inequality, which allow a fast and straightforward computation of the weights. The pseudo-metric makes use of the compressed and quantized versions of wavelet decomposed feature vectors, and in our previous work, the weights were adjusted by classical logistic regression model. A comparative evaluation of the Bayesian and classical logistic regression models is performed for content-based image retrieval as well as for other classification tasks, in a decontextualized evaluation framework. In this same framework, we compare the Bayesian logistic regression model to some relevant state-of-the-art classification algorithms. Experimental results show that the Bayesian logistic regression model outperforms these linear classification algorithms, and is a significantly better tool than the classical logistic regression model to compute the pseudo-metric weights and improve retrieval and classification performance. Finally, we perform a comparison with results obtained by other retrieval methods. PMID:18084057
GLOBALLY ADAPTIVE QUANTILE REGRESSION WITH ULTRA-HIGH DIMENSIONAL DATA
Zheng, Qi; Peng, Limin; He, Xuming
2015-01-01
Quantile regression has become a valuable tool to analyze heterogeneous covaraite-response associations that are often encountered in practice. The development of quantile regression methodology for high dimensional covariates primarily focuses on examination of model sparsity at a single or multiple quantile levels, which are typically prespecified ad hoc by the users. The resulting models may be sensitive to the specific choices of the quantile levels, leading to difficulties in interpretation and erosion of confidence in the results. In this article, we propose a new penalization framework for quantile regression in the high dimensional setting. We employ adaptive L1 penalties, and more importantly, propose a uniform selector of the tuning parameter for a set of quantile levels to avoid some of the potential problems with model selection at individual quantile levels. Our proposed approach achieves consistent shrinkage of regression quantile estimates across a continuous range of quantiles levels, enhancing the flexibility and robustness of the existing penalized quantile regression methods. Our theoretical results include the oracle rate of uniform convergence and weak convergence of the parameter estimators. We also use numerical studies to confirm our theoretical findings and illustrate the practical utility of our proposal. PMID:26604424
Assessment of Weighted Quantile Sum Regression for Modeling Chemical Mixtures and Cancer Risk
Czarnota, Jenna; Gennings, Chris; Wheeler, David C
2015-01-01
In evaluation of cancer risk related to environmental chemical exposures, the effect of many chemicals on disease is ultimately of interest. However, because of potentially strong correlations among chemicals that occur together, traditional regression methods suffer from collinearity effects, including regression coefficient sign reversal and variance inflation. In addition, penalized regression methods designed to remediate collinearity may have limitations in selecting the truly bad actors among many correlated components. The recently proposed method of weighted quantile sum (WQS) regression attempts to overcome these problems by estimating a body burden index, which identifies important chemicals in a mixture of correlated environmental chemicals. Our focus was on assessing through simulation studies the accuracy of WQS regression in detecting subsets of chemicals associated with health outcomes (binary and continuous) in site-specific analyses and in non-site-specific analyses. We also evaluated the performance of the penalized regression methods of lasso, adaptive lasso, and elastic net in correctly classifying chemicals as bad actors or unrelated to the outcome. We based the simulation study on data from the National Cancer Institute Surveillance Epidemiology and End Results Program (NCI-SEER) case–control study of non-Hodgkin lymphoma (NHL) to achieve realistic exposure situations. Our results showed that WQS regression had good sensitivity and specificity across a variety of conditions considered in this study. The shrinkage methods had a tendency to incorrectly identify a large number of components, especially in the case of strong association with the outcome. PMID:26005323
Comparing regression methods for the two-stage clonal expansion model of carcinogenesis.
Kaiser, J C; Heidenreich, W F
2004-11-15
In the statistical analysis of cohort data with risk estimation models, both Poisson and individual likelihood regressions are widely used methods of parameter estimation. In this paper, their performance has been tested with the biologically motivated two-stage clonal expansion (TSCE) model of carcinogenesis. To exclude inevitable uncertainties of existing data, cohorts with simple individual exposure history have been created by Monte Carlo simulation. To generate some similar properties of atomic bomb survivors and radon-exposed mine workers, both acute and protracted exposure patterns have been generated. Then the capacity of the two regression methods has been compared to retrieve a priori known model parameters from the simulated cohort data. For simple models with smooth hazard functions, the parameter estimates from both methods come close to their true values. However, for models with strongly discontinuous functions which are generated by the cell mutation process of transformation, the Poisson regression method fails to produce reliable estimates. This behaviour is explained by the construction of class averages during data stratification. Thereby, some indispensable information on the individual exposure history was destroyed. It could not be repaired by countermeasures such as the refinement of Poisson classes or a more adequate choice of Poisson groups. Although this choice might still exist we were unable to discover it. In contrast to this, the individual likelihood regression technique was found to work reliably for all considered versions of the TSCE model. PMID:15490436
Double Cross-Validation in Multiple Regression: A Method of Estimating the Stability of Results.
ERIC Educational Resources Information Center
Rowell, R. Kevin
In multiple regression analysis, where resulting predictive equation effectiveness is subject to shrinkage, it is especially important to evaluate result replicability. Double cross-validation is an empirical method by which an estimate of invariance or stability can be obtained from research data. A procedure for double cross-validation is…
A Simple and Convenient Method of Multiple Linear Regression to Calculate Iodine Molecular Constants
ERIC Educational Resources Information Center
Cooper, Paul D.
2010-01-01
A new procedure using a student-friendly least-squares multiple linear-regression technique utilizing a function within Microsoft Excel is described that enables students to calculate molecular constants from the vibronic spectrum of iodine. This method is advantageous pedagogically as it calculates molecular constants for ground and excited…
Factor Regression Analysis: A New Method for Weighting Predictors. Final Report.
ERIC Educational Resources Information Center
Curtis, Ervin W.
The optimum weighting of variables to predict a dependent-criterion variable is an important problem in nearly all of the social and natural sciences. Although the predominant method, multiple regression analysis (MR), yields optimum weights for the sample at hand, these weights are not generally optimum in the population from which the sample was…
ERIC Educational Resources Information Center
Baker, Bruce D.; Richards, Craig E.
1999-01-01
Applies neural network methods for forecasting 1991-95 per-pupil expenditures in U.S. public elementary and secondary schools. Forecasting models included the National Center for Education Statistics' multivariate regression model and three neural architectures. Regarding prediction accuracy, neural network results were comparable or superior to…
NASA Technical Reports Server (NTRS)
Sidik, S. M.
1975-01-01
Ridge, Marquardt's generalized inverse, shrunken, and principal components estimators are discussed in terms of the objectives of point estimation of parameters, estimation of the predictive regression function, and hypothesis testing. It is found that as the normal equations approach singularity, more consideration must be given to estimable functions of the parameters as opposed to estimation of the full parameter vector; that biased estimators all introduce constraints on the parameter space; that adoption of mean squared error as a criterion of goodness should be independent of the degree of singularity; and that ordinary least-squares subset regression is the best overall method.
Semiparametric regression during 2003–2007*
Ruppert, David; Wand, M.P.; Carroll, Raymond J.
2010-01-01
Semiparametric regression is a fusion between parametric regression and nonparametric regression that integrates low-rank penalized splines, mixed model and hierarchical Bayesian methodology – thus allowing more streamlined handling of longitudinal and spatial correlation. We review progress in the field over the five-year period between 2003 and 2007. We find semiparametric regression to be a vibrant field with substantial involvement and activity, continual enhancement and widespread application. PMID:20305800
Statistical methods for astronomical data with upper limits. II - Correlation and regression
NASA Technical Reports Server (NTRS)
Isobe, T.; Feigelson, E. D.; Nelson, P. I.
1986-01-01
Statistical methods for calculating correlations and regressions in bivariate censored data where the dependent variable can have upper or lower limits are presented. Cox's regression and the generalization of Kendall's rank correlation coefficient provide significant levels of correlations, and the EM algorithm, under the assumption of normally distributed errors, and its nonparametric analog using the Kaplan-Meier estimator, give estimates for the slope of a regression line. Monte Carlo simulations demonstrate that survival analysis is reliable in determining correlations between luminosities at different bands. Survival analysis is applied to CO emission in infrared galaxies, X-ray emission in radio galaxies, H-alpha emission in cooling cluster cores, and radio emission in Seyfert galaxies.
Estimation of anthropogenic heat emission over South Korea using a statistical regression method
NASA Astrophysics Data System (ADS)
Lee, Sang-Hyun; Kim, Soon-Tae
2015-05-01
Anthropogenic heating by human activity is one of the key contributing factors in forming urban heat islands, thus inclusion of the heat source plays an important role in urban meteorological and environmental modeling. In this study, gridded anthropogenic heat flux (AHF) with high spatial (1-km) and temporal (1-hr) resolution is estimated for the whole South Korea region in year 2010 using a statistical regression method which derives based on similarity of anthropogenic air pollutant emissions and AHF in their emission inventories. The bottom-up anthropogenic pollutant emissions required for the regression method are produced using the intensive Korean air pollutants emission inventories. The calculated regression-based AHF compares well with the inventory-based AHF estimation for the Gyeong-In region, demonstrating that the statistical regression method can reasonably represent spatio-temporal variation of the AHF within the region. The estimated AHF shows that for major Korean cities (Seoul, Busan, Daegu, Gwangju, Daejeon, and Ulsan) the annual mean AHF range 10-50 Wm-2 on a grid scale and 20-30W m-2 on a city-scale. The winter AHF are larger by about 22% than that in summer, while the weekday AHF are larger by 4-5% than the weekend AHF in the major Korean cities. The gridded AHF data estimated in this study can be used in mesoscale meteorological and environmental modeling for the South Korea region.
Neural Network and Regression Methods Demonstrated in the Design Optimization of a Subsonic Aircraft
NASA Technical Reports Server (NTRS)
Hopkins, Dale A.; Lavelle, Thomas M.; Patnaik, Surya
2003-01-01
The neural network and regression methods of NASA Glenn Research Center s COMETBOARDS design optimization testbed were used to generate approximate analysis and design models for a subsonic aircraft operating at Mach 0.85 cruise speed. The analytical model is defined by nine design variables: wing aspect ratio, engine thrust, wing area, sweep angle, chord-thickness ratio, turbine temperature, pressure ratio, bypass ratio, fan pressure; and eight response parameters: weight, landing velocity, takeoff and landing field lengths, approach thrust, overall efficiency, and compressor pressure and temperature. The variables were adjusted to optimally balance the engines to the airframe. The solution strategy included a sensitivity model and the soft analysis model. Researchers generated the sensitivity model by training the approximators to predict an optimum design. The trained neural network predicted all response variables, within 5-percent error. This was reduced to 1 percent by the regression method. The soft analysis model was developed to replace aircraft analysis as the reanalyzer in design optimization. Soft models have been generated for a neural network method, a regression method, and a hybrid method obtained by combining the approximators. The performance of the models is graphed for aircraft weight versus thrust as well as for wing area and turbine temperature. The regression method followed the analytical solution with little error. The neural network exhibited 5-percent maximum error over all parameters. Performance of the hybrid method was intermediate in comparison to the individual approximators. Error in the response variable is smaller than that shown in the figure because of a distortion scale factor. The overall performance of the approximators was considered to be satisfactory because aircraft analysis with NASA Langley Research Center s FLOPS (Flight Optimization System) code is a synthesis of diverse disciplines: weight estimation, aerodynamic
NASA Astrophysics Data System (ADS)
Asavaskulkiet, Krissada
2014-01-01
This paper proposes a novel face super-resolution reconstruction (hallucination) technique for YCbCr color space. The underlying idea is to learn with an error regression model and multi-linear principal component analysis (MPCA). From hallucination framework, many color face images are explained in YCbCr space. To reduce the time complexity of color face hallucination, we can be naturally described the color face imaged as tensors or multi-linear arrays. In addition, the error regression analysis is used to find the error estimation which can be obtained from the existing LR in tensor space. In learning process is from the mistakes in reconstruct face images of the training dataset by MPCA, then finding the relationship between input and error by regression analysis. In hallucinating process uses normal method by backprojection of MPCA, after that the result is corrected with the error estimation. In this contribution we show that our hallucination technique can be suitable for color face images both in RGB and YCbCr space. By using the MPCA subspace with error regression model, we can generate photorealistic color face images. Our approach is demonstrated by extensive experiments with high-quality hallucinated color faces. Comparison with existing algorithms shows the effectiveness of the proposed method.
Phillips, Kirk T.; Street, W. Nick
2005-01-01
The purpose of this study is to determine the best prediction of heart failure outcomes, resulting from two methods -- standard epidemiologic analysis with logistic regression and knowledge discovery with supervised learning/data mining. Heart failure was chosen for this study as it exhibits higher prevalence and cost of treatment than most other hospitalized diseases. The prevalence of heart failure has exceeded 4 million cases in the U.S.. Findings of this study should be useful for the design of quality improvement initiatives, as particular aspects of patient comorbidity and treatment are found to be associated with mortality. This is also a proof of concept study, considering the feasibility of emerging health informatics methods of data mining in conjunction with or in lieu of traditional logistic regression methods of prediction. Findings may also support the design of decision support systems and quality improvement programming for other diseases. PMID:16779367
Penalized Q-Learning for Dynamic Treatment Regimens
Song, R.; Wang, W.; Zeng, D.; Kosorok, M. R.
2014-01-01
A dynamic treatment regimen incorporates both accrued information and long-term effects of treatment from specially designed clinical trials. As these trials become more and more popular in conjunction with longitudinal data from clinical studies, the development of statistical inference for optimal dynamic treatment regimens is a high priority. In this paper, we propose a new machine learning framework called penalized Q-learning, under which valid statistical inference is established. We also propose a new statistical procedure: individual selection and corresponding methods for incorporating individual selection within penalized Q-learning. Extensive numerical studies are presented which compare the proposed methods with existing methods, under a variety of scenarios, and demonstrate that the proposed approach is both inferentially and computationally superior. It is illustrated with a depression clinical trial study. PMID:26257504
Dhanya, S; Kumari Roshni, V S
2016-01-01
Textures play an important role in image classification. This paper proposes a high performance texture classification method using a combination of multiresolution analysis tool and linear regression modelling by channel elimination. The correlation between different frequency regions has been validated as a sort of effective texture characteristic. This method is motivated by the observation that there exists a distinctive correlation between the image samples belonging to the same kind of texture, at different frequency regions obtained by a wavelet transform. Experimentally, it is observed that this correlation differs across textures. The linear regression modelling is employed to analyze this correlation and extract texture features that characterize the samples. Our method considers not only the frequency regions but also the correlation between these regions. This paper primarily focuses on applying the Dual Tree Complex Wavelet Packet Transform and the Linear Regression model for classification of the obtained texture features. Additionally the paper also presents a comparative assessment of the classification results obtained from the above method with two more types of wavelet transform methods namely the Discrete Wavelet Transform and the Discrete Wavelet Packet Transform. PMID:26835234
NASA Astrophysics Data System (ADS)
Zhu, Dazhou; Ji, Baoping; Meng, Chaoying; Shi, Bolin; Tu, Zhenhua; Qing, Zhaoshen
Hybrid linear analysis (HLA), partial least-squares (PLS) regression, and the linear least square support vector machine (LSSVM) were used to determinate the soluble solids content (SSC) of apple by Fourier transform near-infrared (FT-NIR) spectroscopy. The performance of these three linear regression methods was compared. Results showed that HLA could be used for the analysis of complex solid samples such as apple. The predictive ability of SSC model constructed by HLA was comparable to that of PLS. HLA was sensitive to outliers, thus the outliers should be eliminated before HLA calibration. Linear LSSVM performed better than PLS and HLA. Direct orthogonal signal correction (DOSC) pretreatment was effective for PLS and linear LSSVM, but not suitable for HLA. The combination of DOSC and linear LSSVM had good generalization ability and was not sensitive to outliers, so it is a promising method for linear multivariate calibration.
Race Making in a Penal Institution.
Walker, Michael L
2016-01-01
This article provides a ground-level investigation into the lives of penal inmates, linking the literature on race making and penal management to provide an understanding of racial formation processes in a modern penal institution. Drawing on 135 days of ethnographic data collected as an inmate in a Southern California county jail system, the author argues that inmates are subjected to two mutually constitutive racial projects--one institutional and the other microinteractional. Operating in symbiosis within a narrative of risk management, these racial projects increase (rather than decrease) incidents of intraracial violence and the potential for interracial violence. These findings have implications for understanding the process of racialization and evaluating the effectiveness of penal management strategies. PMID:27017706
Zhang, Li-qing; Wu, Xiao-hua; Tang, Xi; Zhu, Xian-liang; Su, Wen-ting
2002-06-01
Principal component regression (PCR) method is used to analyse five components: acetaminophen, p-aminophenol, caffeine, chlorphenamine maleate and guaifenesin. The basic principle and the analytical step of the approach are described in detail. The computer program of LHG is based on VB language. The experimental result shows that the PCR method has no systematical error as compared to classical method. The experimental result shows that the average recovery of each component is all in the range from 96.43% to 107.14%. Each component obtains satisfactory result without any pre-separation. The approach is simple, rapid and suitable for the computer-aid analysis. PMID:12938324
Makra, László; Matyasovszky, István; Thibaudon, Michel; Bonini, Maira
2011-05-01
Nonparametric time-varying regression methods were developed to forecast daily ragweed pollen concentration, and the probability of the exceedance of a given concentration threshold 1 day ahead. Five-day and 10-day predictions of the start and end of the pollen season were also addressed with a nonparametric regression technique combining regression analysis with the method of temperature sum. Our methods were applied to three of the most polluted regions in Europe, namely Lyon (Rhône Valley, France), Legnano (Po River Plain, Italy) and Szeged (Great Plain, Hungary). For a 1-day prediction of both the daily pollen concentration and daily threshold exceedance, the order of these cities from the smallest to largest prediction errors was Legnano, Lyon, Szeged and Legnano, Szeged, Lyon, respectively. The most important predictor for each location was the pollen concentration of previous days. The second main predictor was precipitation for Lyon, and temperature for Legnano and Szeged. Wind speed should be considered for daily concentration at Legnano, and for daily pollen threshold exceedances at Lyon and Szeged. Prediction capabilities compared to the annual cycles for the start and end of the pollen season decreased from west to east. The order of the cities from the lowest to largest errors for the end of the pollen season was Lyon, Legnano, Szeged for both the 5- and 10-day predictions, while for the start of the pollen season the order was Legnano, Lyon, Szeged for 5-day predictions, and Legnano, Szeged, Lyon for 10-day predictions. PMID:20625911
Adaptive wavelet simulation of global ocean dynamics using a new Brinkman volume penalization
NASA Astrophysics Data System (ADS)
Kevlahan, N. K.-R.; Dubos, T.; Aechtner, M.
2015-12-01
In order to easily enforce solid-wall boundary conditions in the presence of complex coastlines, we propose a new mass and energy conserving Brinkman penalization for the rotating shallow water equations. This penalization does not lead to higher wave speeds in the solid region. The error estimates for the penalization are derived analytically and verified numerically for linearized one-dimensional equations. The penalization is implemented in a conservative dynamically adaptive wavelet method for the rotating shallow water equations on the sphere with bathymetry and coastline data from NOAA's ETOPO1 database. This code could form the dynamical core for a future global ocean model. The potential of the dynamically adaptive ocean model is illustrated by using it to simulate the 2004 Indonesian tsunami and wind-driven gyres.
Adluru, Nagesh; Hanlon, Bret M; Lutz, Antoine; Lainhart, Janet E; Alexander, Andrew L; Davidson, Richard J
2013-04-01
Neuroimage phenotyping for psychiatric and neurological disorders is performed using voxelwise analyses also known as voxel based analyses or morphometry (VBM). A typical voxelwise analysis treats measurements at each voxel (e.g., fractional anisotropy, gray matter probability) as outcome measures to study the effects of possible explanatory variables (e.g., age, group) in a linear regression setting. Furthermore, each voxel is treated independently until the stage of correction for multiple comparisons. Recently, multi-voxel pattern analyses (MVPA), such as classification, have arisen as an alternative to VBM. The main advantage of MVPA over VBM is that the former employ multivariate methods which can account for interactions among voxels in identifying significant patterns. They also provide ways for computer-aided diagnosis and prognosis at individual subject level. However, compared to VBM, the results of MVPA are often more difficult to interpret and prone to arbitrary conclusions. In this paper, first we use penalized likelihood modeling to provide a unified framework for understanding both VBM and MVPA. We then utilize statistical learning theory to provide practical methods for interpreting the results of MVPA beyond commonly used performance metrics, such as leave-one-out-cross validation accuracy and area under the receiver operating characteristic (ROC) curve. Additionally, we demonstrate that there are challenges in MVPA when trying to obtain image phenotyping information in the form of statistical parametric maps (SPMs), which are commonly obtained from VBM, and provide a bootstrap strategy as a potential solution for generating SPMs using MVPA. This technique also allows us to maximize the use of available training data. We illustrate the empirical performance of the proposed framework using two different neuroimaging studies that pose different levels of challenge for classification using MVPA. PMID:23397550
Impact of regression methods on improved effects of soil structure on soil water retention estimates
NASA Astrophysics Data System (ADS)
Nguyen, Phuong Minh; De Pue, Jan; Le, Khoa Van; Cornelis, Wim
2015-06-01
Increasing the accuracy of pedotransfer functions (PTFs), an indirect method for predicting non-readily available soil features such as soil water retention characteristics (SWRC), is of crucial importance for large scale agro-hydrological modeling. Adding significant predictors (i.e., soil structure), and implementing more flexible regression algorithms are among the main strategies of PTFs improvement. The aim of this study was to investigate whether the improved effect of categorical soil structure information on estimating soil-water content at various matric potentials, which has been reported in literature, could be enduringly captured by regression techniques other than the usually applied linear regression. Two data mining techniques, i.e., Support Vector Machines (SVM), and k-Nearest Neighbors (kNN), which have been recently introduced as promising tools for PTF development, were utilized to test if the incorporation of soil structure will improve PTF's accuracy under a context of rather limited training data. The results show that incorporating descriptive soil structure information, i.e., massive, structured and structureless, as grouping criterion can improve the accuracy of PTFs derived by SVM approach in the range of matric potential of -6 to -33 kPa (average RMSE decreased up to 0.005 m3 m-3 after grouping, depending on matric potentials). The improvement was primarily attributed to the outperformance of SVM-PTFs calibrated on structureless soils. No improvement was obtained with kNN technique, at least not in our study in which the data set became limited in size after grouping. Since there is an impact of regression techniques on the improved effect of incorporating qualitative soil structure information, selecting a proper technique will help to maximize the combined influence of flexible regression algorithms and soil structure information on PTF accuracy.
Flexible regression models over river networks
O’Donnell, David; Rushworth, Alastair; Bowman, Adrian W; Marian Scott, E; Hallard, Mark
2014-01-01
Many statistical models are available for spatial data but the vast majority of these assume that spatial separation can be measured by Euclidean distance. Data which are collected over river networks constitute a notable and commonly occurring exception, where distance must be measured along complex paths and, in addition, account must be taken of the relative flows of water into and out of confluences. Suitable models for this type of data have been constructed based on covariance functions. The aim of the paper is to place the focus on underlying spatial trends by adopting a regression formulation and using methods which allow smooth but flexible patterns. Specifically, kernel methods and penalized splines are investigated, with the latter proving more suitable from both computational and modelling perspectives. In addition to their use in a purely spatial setting, penalized splines also offer a convenient route to the construction of spatiotemporal models, where data are available over time as well as over space. Models which include main effects and spatiotemporal interactions, as well as seasonal terms and interactions, are constructed for data on nitrate pollution in the River Tweed. The results give valuable insight into the changes in water quality in both space and time. PMID:25653460
An Efficient Simulation Budget Allocation Method Incorporating Regression for Partitioned Domains*
Brantley, Mark W.; Lee, Loo Hay; Chen, Chun-Hung; Xu, Jie
2014-01-01
Simulation can be a very powerful tool to help decision making in many applications but exploring multiple courses of actions can be time consuming. Numerous ranking & selection (R&S) procedures have been developed to enhance the simulation efficiency of finding the best design. To further improve efficiency, one approach is to incorporate information from across the domain into a regression equation. However, the use of a regression metamodel also inherits some typical assumptions from most regression approaches, such as the assumption of an underlying quadratic function and the simulation noise is homogeneous across the domain of interest. To extend the limitation while retaining the efficiency benefit, we propose to partition the domain of interest such that in each partition the mean of the underlying function is approximately quadratic. Our new method provides approximately optimal rules for between and within partitions that determine the number of samples allocated to each design location. The goal is to maximize the probability of correctly selecting the best design. Numerical experiments demonstrate that our new approach can dramatically enhance efficiency over existing efficient R&S methods. PMID:24936099
Li, Min; Zhou, Tong; Song, Yanan
2016-07-01
A grain size characterization method based on energy attenuation coefficient spectrum and support vector regression (SVR) is proposed. First, the spectra of the first and second back-wall echoes are cut into several frequency bands to calculate the energy attenuation coefficient spectrum. Second, the frequency band that is sensitive to grain size variation is determined. Finally, a statistical model between the energy attenuation coefficient in the sensitive frequency band and average grain size is established through SVR. Experimental verification is conducted on austenitic stainless steel. The average relative error of the predicted grain size is 5.65%, which is better than that of conventional methods. PMID:26995732
NASA Astrophysics Data System (ADS)
Khazaei, Ardeshir; Sarmasti, Negin; Seyf, Jaber Yousefi
2016-03-01
Quantitative structure activity relationship were used to study a series of curcumin-related compounds with inhibitory effect on prostate cancer PC-3 cells, pancreas cancer Panc-1 cells, and colon cancer HT-29 cells. Sphere exclusion method was used to split data set in two categories of train and test set. Multiple linear regression, principal component regression and partial least squares were used as the regression methods. In other hand, to investigate the effect of feature selection methods, stepwise, Genetic algorithm, and simulated annealing were used. In two cases (PC-3 cells and Panc-1 cells), the best models were generated by a combination of multiple linear regression and stepwise (PC-3 cells: r2 = 0.86, q2 = 0.82, pred_r2 = 0.93, and r2m (test) = 0.43, Panc-1 cells: r2 = 0.85, q2 = 0.80, pred_r2 = 0.71, and r2m (test) = 0.68). For the HT-29 cells, principal component regression with stepwise (r2 = 0.69, q2 = 0.62, pred_r2 = 0.54, and r2m (test) = 0.41) is the best method. The QSAR study reveals descriptors which have crucial role in the inhibitory property of curcumin-like compounds. 6ChainCount, T_C_C_1, and T_O_O_7 are the most important descriptors that have the greatest effect. With a specific end goal to design and optimization of novel efficient curcumin-related compounds it is useful to introduce heteroatoms such as nitrogen, oxygen, and sulfur atoms in the chemical structure (reduce the contribution of T_C_C_1 descriptor) and increase the contribution of 6ChainCount and T_O_O_7 descriptors. Models can be useful in the better design of some novel curcumin-related compounds that can be used in the treatment of prostate, pancreas, and colon cancers.
Eliseyev, Andrey; Aksenova, Tetiana
2016-01-01
In the current paper the decoding algorithms for motor-related BCI systems for continuous upper limb trajectory prediction are considered. Two methods for the smooth prediction, namely Sobolev and Polynomial Penalized Multi-Way Partial Least Squares (PLS) regressions, are proposed. The methods are compared to the Multi-Way Partial Least Squares and Kalman Filter approaches. The comparison demonstrated that the proposed methods combined the prediction accuracy of the algorithms of the PLS family and trajectory smoothness of the Kalman Filter. In addition, the prediction delay is significantly lower for the proposed algorithms than for the Kalman Filter approach. The proposed methods could be applied in a wide range of applications beyond neuroscience. PMID:27196417
Eliseyev, Andrey; Aksenova, Tetiana
2016-01-01
In the current paper the decoding algorithms for motor-related BCI systems for continuous upper limb trajectory prediction are considered. Two methods for the smooth prediction, namely Sobolev and Polynomial Penalized Multi-Way Partial Least Squares (PLS) regressions, are proposed. The methods are compared to the Multi-Way Partial Least Squares and Kalman Filter approaches. The comparison demonstrated that the proposed methods combined the prediction accuracy of the algorithms of the PLS family and trajectory smoothness of the Kalman Filter. In addition, the prediction delay is significantly lower for the proposed algorithms than for the Kalman Filter approach. The proposed methods could be applied in a wide range of applications beyond neuroscience. PMID:27196417
Mortality Prediction in ICUs Using A Novel Time-Slicing Cox Regression Method
Wang, Yuan; Chen, Wenlin; Heard, Kevin; Kollef, Marin H.; Bailey, Thomas C.; Cui, Zhicheng; He, Yujie; Lu, Chenyang; Chen, Yixin
2015-01-01
Over the last few decades, machine learning and data mining have been increasingly used for clinical prediction in ICUs. However, there is still a huge gap in making full use of the time-series data generated from ICUs. Aiming at filling this gap, we propose a novel approach entitled Time Slicing Cox regression (TS-Cox), which extends the classical Cox regression into a classification method on multi-dimensional time-series. Unlike traditional classifiers such as logistic regression and support vector machines, our model not only incorporates the discriminative features derived from the time-series, but also naturally exploits the temporal orders of these features based on a Cox-like function. Empirical evaluation on MIMIC-II database demonstrates the efficacy of the TS-Cox model. Our TS-Cox model outperforms all other baseline models by a good margin in terms of AUC_PR, sensitivity and PPV, which indicates that TS-Cox may be a promising tool for mortality prediction in ICUs. PMID:26958269
A deformation analysis method of stepwise regression for bridge deflection prediction
NASA Astrophysics Data System (ADS)
Shen, Yueqian; Zeng, Ying; Zhu, Lei; Huang, Teng
2015-12-01
Large-scale bridges are among the most important infrastructures whose safe conditions concern people's daily activities and life safety. Monitoring of large-scale bridges is crucial since deformation might have occurred. How to obtain the deformation information and then judge the safe conditions are the key and difficult problems in bridge deformation monitoring field. Deflection is the important index for evaluation of bridge safety. This paper proposes a forecasting modeling of stepwise regression analysis. Based on the deflection monitoring data of Yangtze River Bridge, the main factors influenced deflection deformation is chiefly studied. Authors use the monitoring data to forecast the deformation value of a bridge deflection at different time from the perspective of non-bridge structure, and compared to the forecasting of gray relational analysis based on linear regression. The result show that the accuracy and reliability of stepwise regression analysis is high, which provides the scientific basis to the bridge operation management. And above all, the ideas of this research provide and effective method for bridge deformation analysis.
NASA Astrophysics Data System (ADS)
Jiang, Wei-Zhong; Su, Ning; Ding, Li-Ping; Qiu, Si-Yu
This paper presents a new method to estimate the system harmonic impedance and the harmonic emission level based on the improved weighted support vector machine (WSVM) regression. According to the differences of harmonic measurement data at the point of common coupling, the WSVM can be obtained by correcting the error requirement of SVM by Euclidean distance as a weighted index and determining the weighted coefficient of penalty parameter by linear interpolation, then the system harmonic impedance and the harmonic emission level can be calculated. Based on analyzing the simulation of the circuit and the practical application of field data, it proves that the proposed method can effectively restrain the influence caused by the fluctuation of background harmonic on estimation results. Compared with other methods, the estimate result of the proposed method is more reasonable.
Statistical method for prediction of gait kinematics with Gaussian process regression.
Yun, Youngmok; Kim, Hyun-Chul; Shin, Sung Yul; Lee, Junwon; Deshpande, Ashish D; Kim, Changhwan
2014-01-01
We propose a novel methodology for predicting human gait pattern kinematics based on a statistical and stochastic approach using a method called Gaussian process regression (GPR). We selected 14 body parameters that significantly affect the gait pattern and 14 joint motions that represent gait kinematics. The body parameter and gait kinematics data were recorded from 113 subjects by anthropometric measurements and a motion capture system. We generated a regression model with GPR for gait pattern prediction and built a stochastic function mapping from body parameters to gait kinematics based on the database and GPR, and validated the model with a cross validation method. The function can not only produce trajectories for the joint motions associated with gait kinematics, but can also estimate the associated uncertainties. Our approach results in a novel, low-cost and subject-specific method for predicting gait kinematics with only the subject's body parameters as the necessary input, and also enables a comprehensive understanding of the correlation and uncertainty between body parameters and gait kinematics. PMID:24211221
Least squares regression methods for clustered ROC data with discrete covariates.
Tang, Liansheng Larry; Zhang, Wei; Li, Qizhai; Ye, Xuan; Chan, Leighton
2016-07-01
The receiver operating characteristic (ROC) curve is a popular tool to evaluate and compare the accuracy of diagnostic tests to distinguish the diseased group from the nondiseased group when test results from tests are continuous or ordinal. A complicated data setting occurs when multiple tests are measured on abnormal and normal locations from the same subject and the measurements are clustered within the subject. Although least squares regression methods can be used for the estimation of ROC curve from correlated data, how to develop the least squares methods to estimate the ROC curve from the clustered data has not been studied. Also, the statistical properties of the least squares methods under the clustering setting are unknown. In this article, we develop the least squares ROC methods to allow the baseline and link functions to differ, and more importantly, to accommodate clustered data with discrete covariates. The methods can generate smooth ROC curves that satisfy the inherent continuous property of the true underlying curve. The least squares methods are shown to be more efficient than the existing nonparametric ROC methods under appropriate model assumptions in simulation studies. We apply the methods to a real example in the detection of glaucomatous deterioration. We also derive the asymptotic properties of the proposed methods. PMID:26848938
A least trimmed square regression method for second level FMRI effective connectivity analysis.
Li, Xingfeng; Coyle, Damien; Maguire, Liam; McGinnity, Thomas Martin
2013-01-01
We present a least trimmed square (LTS) robust regression method to combine different runs/subjects for second/high level effective connectivity analysis. The basic idea of this method is to treat the extreme nonlinear model variability as outliers if they exceed a certain threshold. A bootstrap method for the LTS estimation is employed to detect model outliers. We compared the LTS robust method with a non-robust method using simulated and real datasets. The difference between LTS and the non-robust method for second level effective connectivity analysis is significant, suggesting the conventional non-robust method is easily affected by the model variability from the first level analysis. In addition, after these outliers are detected and excluded for the high level analysis, the model coefficients of the second level are combined within the framework of a mixed model. The variance of the mixed model is estimated using the Newton-Raphson (NR) type Levenberg-Marquardt algorithm. Three sets of real data are adopted to compare conventional methods which do not include random effects in the analysis with a mixed model for second level effective connectivity analysis. The results show that the conventional method is significantly different from the mixed model when greater model variability exists, suggesting there is a strong random effect, and the mixed model should be employed for the second level effective connectivity analysis. PMID:23093379
A fast nonlinear regression method for estimating permeability in CT perfusion imaging
Bennink, Edwin; Riordan, Alan J; Horsch, Alexander D; Dankbaar, Jan Willem; Velthuis, Birgitta K; de Jong, Hugo W
2013-01-01
Blood–brain barrier damage, which can be quantified by measuring vascular permeability, is a potential predictor for hemorrhagic transformation in acute ischemic stroke. Permeability is commonly estimated by applying Patlak analysis to computed tomography (CT) perfusion data, but this method lacks precision. Applying more elaborate kinetic models by means of nonlinear regression (NLR) may improve precision, but is more time consuming and therefore less appropriate in an acute stroke setting. We propose a simplified NLR method that may be faster and still precise enough for clinical use. The aim of this study is to evaluate the reliability of in total 12 variations of Patlak analysis and NLR methods, including the simplified NLR method. Confidence intervals for the permeability estimates were evaluated using simulated CT attenuation–time curves with realistic noise, and clinical data from 20 patients. Although fixating the blood volume improved Patlak analysis, the NLR methods yielded significantly more reliable estimates, but took up to 12 × longer to calculate. The simplified NLR method was ∼4 × faster than other NLR methods, while maintaining the same confidence intervals (CIs). In conclusion, the simplified NLR method is a new, reliable way to estimate permeability in stroke, fast enough for clinical application in an acute stroke setting. PMID:23881247
Shi, Yinghuan; Gao, Yaozong; Liao, Shu; Zhang, Daoqiang
2015-01-01
In1 recent years, there has been a great interest in prostate segmentation, which is a important and challenging task for CT image guided radiotherapy. In this paper, a learning-based segmentation method via joint transductive feature selection and transductive regression is presented, which incorporates the physician’s simple manual specification (only taking a few seconds), to aid accurate segmentation, especially for the case with large irregular prostate motion. More specifically, for the current treatment image, experienced physician is first allowed to manually assign the labels for a small subset of prostate and non-prostate voxels, especially in the first and last slices of the prostate regions. Then, the proposed method follows the two step: in prostate-likelihood estimation step, two novel algorithms: tLasso and wLapRLS, will be sequentially employed for transductive feature selection and transductive regression, respectively, aiming to generate the prostate-likelihood map. In multi-atlases based label fusion step, the final segmentation result will be obtained according to the corresponding prostate-likelihood map and the previous images of the same patient. The proposed method has been substantially evaluated on a real prostate CT dataset including 24 patients with 330 CT images, and compared with several state-of-the-art methods. Experimental results show that the proposed method outperforms the state-of-the-arts in terms of higher Dice ratio, higher true positive fraction, and lower centroid distances. Also, the results demonstrate that simple manual specification can help improve the segmentation performance, which is clinically feasible in real practice. PMID:26752809
Multi temporal regression method for mid infrared [3-5microm] emissivity outdoor.
Nerry, Françoise; Malaplate, Alain; Stoll, Marc
2004-12-27
An experimental study to address issues encountered in the determination of surface bi-directional reflectivity and emissivity of materials [3-5microm] region has been conducted in outdoors conditions. The measurement protocol included radiometric infrared camera acquisitions in both [3-5microm] (band-2) and [8-14microm] (band-3). The band-2 bi-directional reflectivity is obtained from a sequence of sunlit and shade measurements. Best results are found with measurements relative to a diffuse aluminum reflector. Direct inversion of band-2 radiometric signal is unstable. A multitemporal method is introduced and the slope of the linear regression is the searched emisssivity. A detailed analysis is conducted to assess the impact of different sources of systematic errors. The proposed method is found to have a good potential with an estimated measurement error in the range of 2%. PMID:19488308
A non-linear regression method for CT brain perfusion analysis
NASA Astrophysics Data System (ADS)
Bennink, E.; Oosterbroek, J.; Viergever, M. A.; Velthuis, B. K.; de Jong, H. W. A. M.
2015-03-01
CT perfusion (CTP) imaging allows for rapid diagnosis of ischemic stroke. Generation of perfusion maps from CTP data usually involves deconvolution algorithms providing estimates for the impulse response function in the tissue. We propose the use of a fast non-linear regression (NLR) method that we postulate has similar performance to the current academic state-of-art method (bSVD), but that has some important advantages, including the estimation of vascular permeability, improved robustness to tracer-delay, and very few tuning parameters, that are all important in stroke assessment. The aim of this study is to evaluate the fast NLR method against bSVD and a commercial clinical state-of-art method. The three methods were tested against a published digital perfusion phantom earlier used to illustrate the superiority of bSVD. In addition, the NLR and clinical methods were also tested against bSVD on 20 clinical scans. Pearson correlation coefficients were calculated for each of the tested methods. All three methods showed high correlation coefficients (>0.9) with the ground truth in the phantom. With respect to the clinical scans, the NLR perfusion maps showed higher correlation with bSVD than the perfusion maps from the clinical method. Furthermore, the perfusion maps showed that the fast NLR estimates are robust to tracer-delay. In conclusion, the proposed fast NLR method provides a simple and flexible way of estimating perfusion parameters from CT perfusion scans, with high correlation coefficients. This suggests that it could be a better alternative to the current clinical and academic state-of-art methods.
NASA Technical Reports Server (NTRS)
Tomberlin, T. J.
1985-01-01
Research studies of residents' responses to noise consist of interviews with samples of individuals who are drawn from a number of different compact study areas. The statistical techniques developed provide a basis for those sample design decisions. These techniques are suitable for a wide range of sample survey applications. A sample may consist of a random sample of residents selected from a sample of compact study areas, or in a more complex design, of a sample of residents selected from a sample of larger areas (e.g., cities). The techniques may be applied to estimates of the effects on annoyance of noise level, numbers of noise events, the time-of-day of the events, ambient noise levels, or other factors. Methods are provided for determining, in advance, how accurately these effects can be estimated for different sample sizes and study designs. Using a simple cost function, they also provide for optimum allocation of the sample across the stages of the design for estimating these effects. These techniques are developed via a regression model in which the regression coefficients are assumed to be random, with components of variance associated with the various stages of a multi-stage sample design.
Gupta, N
2008-04-22
3013 containers are designed in accordance with the DOE-STD-3013-2004. These containers are qualified to store plutonium (Pu) bearing materials such as PuO2 for 50 years. DOT shipping packages such as the 9975 are used to store the 3013 containers in the K-Area Material Storage (KAMS) facility at Savannah River Site (SRS). DOE-STD-3013-2004 requires that a comprehensive surveillance program be set up to ensure that the 3013 container design parameters are not violated during the long term storage. To ensure structural integrity of the 3013 containers, thermal analyses using finite element models were performed to predict the contents and component temperatures for different but well defined parameters such as storage ambient temperature, PuO{sub 2} density, fill heights, weights, and thermal loading. Interpolation is normally used to calculate temperatures if the actual parameter values are different from the analyzed values. A statistical analysis technique using regression methods is proposed to develop simple polynomial relations to predict temperatures for the actual parameter values found in the containers. The analysis shows that regression analysis is a powerful tool to develop simple relations to assess component temperatures.
NASA Astrophysics Data System (ADS)
Mandal, Nilrudra; Doloi, Biswanath; Mondal, Biswanath
2016-01-01
In the present study, an attempt has been made to apply the Taguchi parameter design method and regression analysis for optimizing the cutting conditions on surface finish while machining AISI 4340 steel with the help of the newly developed yttria based Zirconia Toughened Alumina (ZTA) inserts. These inserts are prepared through wet chemical co-precipitation route followed by powder metallurgy process. Experiments have been carried out based on an orthogonal array L9 with three parameters (cutting speed, depth of cut and feed rate) at three levels (low, medium and high). Based on the mean response and signal to noise ratio (SNR), the best optimal cutting condition has been arrived at A3B1C1 i.e. cutting speed is 420 m/min, depth of cut is 0.5 mm and feed rate is 0.12 m/min considering the condition smaller is the better approach. Analysis of Variance (ANOVA) is applied to find out the significance and percentage contribution of each parameter. The mathematical model of surface roughness has been developed using regression analysis as a function of the above mentioned independent variables. The predicted values from the developed model and experimental values are found to be very close to each other justifying the significance of the model. A confirmation run has been carried out with 95 % confidence level to verify the optimized result and the values obtained are within the prescribed limit.
NASA Astrophysics Data System (ADS)
Zhao, Na; Yue, Tianxiang; Zhou, Xun; Zhao, Mingwei; Liu, Yu; Du, Zhengping; Zhang, Lili
2016-03-01
Downscaling precipitation is required in local scale climate impact studies. In this paper, a statistical downscaling scheme was presented with a combination of geographically weighted regression (GWR) model and a recently developed method, high accuracy surface modeling method (HASM). This proposed method was compared with another downscaling method using the Coupled Model Intercomparison Project Phase 5 (CMIP5) database and ground-based data from 732 stations across China for the period 1976-2005. The residual which was produced by GWR was modified by comparing different interpolators including HASM, Kriging, inverse distance weighted method (IDW), and Spline. The spatial downscaling from 1° to 1-km grids for period 1976-2005 and future scenarios was achieved by using the proposed downscaling method. The prediction accuracy was assessed at two separate validation sites throughout China and Jiangxi Province on both annual and seasonal scales, with the root mean square error (RMSE), mean relative error (MRE), and mean absolute error (MAE). The results indicate that the developed model in this study outperforms the method that builds transfer function using the gauge values. There is a large improvement in the results when using a residual correction with meteorological station observations. In comparison with other three classical interpolators, HASM shows better performance in modifying the residual produced by local regression method. The success of the developed technique lies in the effective use of the datasets and the modification process of the residual by using HASM. The results from the future climate scenarios show that precipitation exhibits overall increasing trend from T1 (2011-2040) to T2 (2041-2070) and T2 to T3 (2071-2100) in RCP2.6, RCP4.5, and RCP8.5 emission scenarios. The most significant increase occurs in RCP8.5 from T2 to T3, while the lowest increase is found in RCP2.6 from T2 to T3, increased by 47.11 and 2.12 mm, respectively.
Benedetti, Andrea; Platt, Robert; Atherton, Juli
2014-01-01
Background Over time, adaptive Gaussian Hermite quadrature (QUAD) has become the preferred method for estimating generalized linear mixed models with binary outcomes. However, penalized quasi-likelihood (PQL) is still used frequently. In this work, we systematically evaluated whether matching results from PQL and QUAD indicate less bias in estimated regression coefficients and variance parameters via simulation. Methods We performed a simulation study in which we varied the size of the data set, probability of the outcome, variance of the random effect, number of clusters and number of subjects per cluster, etc. We estimated bias in the regression coefficients, odds ratios and variance parameters as estimated via PQL and QUAD. We ascertained if similarity of estimated regression coefficients, odds ratios and variance parameters predicted less bias. Results Overall, we found that the absolute percent bias of the odds ratio estimated via PQL or QUAD increased as the PQL- and QUAD-estimated odds ratios became more discrepant, though results varied markedly depending on the characteristics of the dataset Conclusions Given how markedly results varied depending on data set characteristics, specifying a rule above which indicated biased results proved impossible. This work suggests that comparing results from generalized linear mixed models estimated via PQL and QUAD is a worthwhile exercise for regression coefficients and variance components obtained via QUAD, in situations where PQL is known to give reasonable results. PMID:24416249
Domain selection for the varying coefficient model via local polynomial regression
Kong, Dehan; Bondell, Howard; Wu, Yichao
2014-01-01
In this article, we consider the varying coefficient model, which allows the relationship between the predictors and response to vary across the domain of interest, such as time. In applications, it is possible that certain predictors only affect the response in particular regions and not everywhere. This corresponds to identifying the domain where the varying coefficient is nonzero. Towards this goal, local polynomial smoothing and penalized regression are incorporated into one framework. Asymptotic properties of our penalized estimators are provided. Specifically, the estimators enjoy the oracle properties in the sense that they have the same bias and asymptotic variance as the local polynomial estimators as if the sparsity is known as a priori. The choice of appropriate bandwidth and computational algorithms are discussed. The proposed method is examined via simulations and a real data example. PMID:25506112
Fienen, Michael N.; Selbig, William R.
2012-01-01
A new sample collection system was developed to improve the representation of sediment entrained in urban storm water by integrating water quality samples from the entire water column. The depth-integrated sampler arm (DISA) was able to mitigate sediment stratification bias in storm water, thereby improving the characterization of suspended-sediment concentration and particle size distribution at three independent study locations. Use of the DISA decreased variability, which improved statistical regression to predict particle size distribution using surrogate environmental parameters, such as precipitation depth and intensity. The performance of this statistical modeling technique was compared to results using traditional fixed-point sampling methods and was found to perform better. When environmental parameters can be used to predict particle size distributions, environmental managers have more options when characterizing concentrations, loads, and particle size distributions in urban runoff.
NASA Astrophysics Data System (ADS)
Huang, Fengzhen; Li, Jingzhen; Cao, Jun
2015-02-01
Temporally and Spatially Modulated Fourier Transform Imaging Spectrometer (TSMFTIS) is a new imaging spectrometer without moving mirrors and slits. As applied in remote sensing, TSMFTIS needs to rely on push-broom of the flying platform to obtain the interferogram of the target detected, and if the moving state of the flying platform changed during the imaging process, the target interferogram picked up from the remote sensing image sequence will deviate from the ideal interferogram, then the target spectrum recovered shall not reflect the real characteristic of the ground target object. Therefore, in order to achieve a high precision spectrum recovery of the target detected, the geometry position of the target point on the TSMFTIS image surface can be calculated in accordance with the sub-pixel image registration method, and the real point interferogram of the target can be obtained with image interpolation method. The core idea of the interpolation methods (nearest, bilinear and cubic etc) are to obtain the grey value of the point to be interpolated by weighting the grey value of the pixel around and with the kernel function constructed by the distance between the pixel around and the point to be interpolated. This paper adopts the gauss-based kernel regression mode, present a kernel function that consists of the grey information making use of the relative deviation and the distance information, then the kernel function is controlled by the deviation degree between the grey value of the pixel around and the means value so as to adjust weights self adaptively. The simulation adopts the partial spectrum data obtained by the pushbroom hyperspectral imager (PHI) as the spectrum of the target, obtains the successively push broomed motion error image in combination with the related parameter of the actual aviation platform; then obtains the interferogram of the target point with the above interpolation method; finally, recovers spectrogram with the nonuniform fast
Yap, C W; Li, H; Ji, Z L; Chen, Y Z
2007-11-01
Quantitative structure-activity relationship (QSAR) and quantitative structure-property relationship (QSPR) models have been extensively used for predicting compounds of specific pharmacodynamic, pharmacokinetic, or toxicological property from structure-derived physicochemical and structural features. These models can be developed by using various regression methods including conventional approaches (multiple linear regression and partial least squares) and more recently explored genetic (genetic function approximation) and machine learning (k-nearest neighbour, neural networks, and support vector regression) approaches. This article describes the algorithms of these methods, evaluates their advantages and disadvantages, and discusses the application potential of the recently explored methods. Freely available online and commercial software for these regression methods and the areas of their applications are also presented. PMID:18045213
NASA Astrophysics Data System (ADS)
Grégoire, G.
2014-12-01
The logistic regression originally is intended to explain the relationship between the probability of an event and a set of covariables. The model's coefficients can be interpreted via the odds and odds ratio, which are presented in introduction of the chapter. The observations are possibly got individually, then we speak of binary logistic regression. When they are grouped, the logistic regression is said binomial. In our presentation we mainly focus on the binary case. For statistical inference the main tool is the maximum likelihood methodology: we present the Wald, Rao and likelihoods ratio results and their use to compare nested models. The problems we intend to deal with are essentially the same as in multiple linear regression: testing global effect, individual effect, selection of variables to build a model, measure of the fitness of the model, prediction of new values… . The methods are demonstrated on data sets using R. Finally we briefly consider the binomial case and the situation where we are interested in several events, that is the polytomous (multinomial) logistic regression and the particular case of ordinal logistic regression.
Toplak, Marko; Močnik, Rok; Polajnar, Matija; Bosnić, Zoran; Carlsson, Lars; Hasselgren, Catrin; Demšar, Janez; Boyer, Scott; Zupan, Blaž; Stålring, Jonna
2014-02-24
The vastness of chemical space and the relatively small coverage by experimental data recording molecular properties require us to identify subspaces, or domains, for which we can confidently apply QSAR models. The prediction of QSAR models in these domains is reliable, and potential subsequent investigations of such compounds would find that the predictions closely match the experimental values. Standard approaches in QSAR assume that predictions are more reliable for compounds that are "similar" to those in subspaces with denser experimental data. Here, we report on a study of an alternative set of techniques recently proposed in the machine learning community. These methods quantify prediction confidence through estimation of the prediction error at the point of interest. Our study includes 20 public QSAR data sets with continuous response and assesses the quality of 10 reliability scoring methods by observing their correlation with prediction error. We show that these new alternative approaches can outperform standard reliability scores that rely only on similarity to compounds in the training set. The results also indicate that the quality of reliability scoring methods is sensitive to data set characteristics and to the regression method used in QSAR. We demonstrate that at the cost of increased computational complexity these dependencies can be leveraged by integration of scores from various reliability estimation approaches. The reliability estimation techniques described in this paper have been implemented in an open source add-on package ( https://bitbucket.org/biolab/orange-reliability ) to the Orange data mining suite. PMID:24490838
A New Global Regression Analysis Method for the Prediction of Wind Tunnel Model Weight Corrections
NASA Technical Reports Server (NTRS)
Ulbrich, Norbert Manfred; Bridge, Thomas M.; Amaya, Max A.
2014-01-01
A new global regression analysis method is discussed that predicts wind tunnel model weight corrections for strain-gage balance loads during a wind tunnel test. The method determines corrections by combining "wind-on" model attitude measurements with least squares estimates of the model weight and center of gravity coordinates that are obtained from "wind-off" data points. The method treats the least squares fit of the model weight separate from the fit of the center of gravity coordinates. Therefore, it performs two fits of "wind- off" data points and uses the least squares estimator of the model weight as an input for the fit of the center of gravity coordinates. Explicit equations for the least squares estimators of the weight and center of gravity coordinates are derived that simplify the implementation of the method in the data system software of a wind tunnel. In addition, recommendations for sets of "wind-off" data points are made that take typical model support system constraints into account. Explicit equations of the confidence intervals on the model weight and center of gravity coordinates and two different error analyses of the model weight prediction are also discussed in the appendices of the paper.
Accurate motion parameter estimation for colonoscopy tracking using a regression method
NASA Astrophysics Data System (ADS)
Liu, Jianfei; Subramanian, Kalpathi R.; Yoo, Terry S.
2010-03-01
Co-located optical and virtual colonoscopy images have the potential to provide important clinical information during routine colonoscopy procedures. In our earlier work, we presented an optical flow based algorithm to compute egomotion from live colonoscopy video, permitting navigation and visualization of the corresponding patient anatomy. In the original algorithm, motion parameters were estimated using the traditional Least Sum of squares(LS) procedure which can be unstable in the context of optical flow vectors with large errors. In the improved algorithm, we use the Least Median of Squares (LMS) method, a robust regression method for motion parameter estimation. Using the LMS method, we iteratively analyze and converge toward the main distribution of the flow vectors, while disregarding outliers. We show through three experiments the improvement in tracking results obtained using the LMS method, in comparison to the LS estimator. The first experiment demonstrates better spatial accuracy in positioning the virtual camera in the sigmoid colon. The second and third experiments demonstrate the robustness of this estimator, resulting in longer tracked sequences: from 300 to 1310 in the ascending colon, and 410 to 1316 in the transverse colon.
Wang, Molin; Kuchiba, Aya; Ogino, Shuji
2015-01-01
In interdisciplinary biomedical, epidemiologic, and population research, it is increasingly necessary to consider pathogenesis and inherent heterogeneity of any given health condition and outcome. As the unique disease principle implies, no single biomarker can perfectly define disease subtypes. The complex nature of molecular pathology and biology necessitates biostatistical methodologies to simultaneously analyze multiple biomarkers and subtypes. To analyze and test for heterogeneity hypotheses across subtypes defined by multiple categorical and/or ordinal markers, we developed a meta-regression method that can utilize existing statistical software for mixed-model analysis. This method can be used to assess whether the exposure-subtype associations are different across subtypes defined by 1 marker while controlling for other markers and to evaluate whether the difference in exposure-subtype association across subtypes defined by 1 marker depends on any other markers. To illustrate this method in molecular pathological epidemiology research, we examined the associations between smoking status and colorectal cancer subtypes defined by 3 correlated tumor molecular characteristics (CpG island methylator phenotype, microsatellite instability, and the B-Raf protooncogene, serine/threonine kinase (BRAF), mutation) in the Nurses' Health Study (1980–2010) and the Health Professionals Follow-up Study (1986–2010). This method can be widely useful as molecular diagnostics and genomic technologies become routine in clinical medicine and public health. PMID:26116215
Environmental Conditions in Kentucky's Penal Institutions
ERIC Educational Resources Information Center
Bell, Irving
1974-01-01
A state task force was organized to identify health or environmental deficiencies existing in Kentucky penal institutions. Based on information gained through direct observation and inmate questionnaires, the task force concluded that many hazardous and unsanitary conditions existed, and recommended that immediate action be given to these…
NCAA Penalizes Fewer Teams than Expected
ERIC Educational Resources Information Center
Sander, Libby
2008-01-01
This article reports that the National Collegiate Athletic Association (NCAA) has penalized fewer teams than it expected this year over athletes' poor academic performance. For years, officials with the NCAA have predicted that strikingly high numbers of college sports teams could be at risk of losing scholarships this year because of their…
Zheng, R; Zhang, W; Li, Y; Huang, J; Yang, D
1998-02-01
The EDXRF extrapolate-regression method described in this paper combines regression method with the fundamental formula of fluorescence intensity. The contents of Ni and Pd in white karat gold jewellery were calculated theoretically according to the spectrum of the sample. The content of gold was deternined without standards. The precision was 0.1% and the deviation was 0.3% compared with AA. PMID:15810348
Methods for Adjusting U.S. Geological Survey Rural Regression Peak Discharges in an Urban Setting
Moglen, Glenn E.; Shivers, Dorianne E.
2006-01-01
A study was conducted of 78 U.S. Geological Survey gaged streams that have been subjected to varying degrees of urbanization over the last three decades. Flood-frequency analysis coupled with nonlinear regression techniques were used to generate a set of equations for converting peak discharge estimates determined from rural regression equations to a set of peak discharge estimates that represent known urbanization. Specifically, urban regression equations for the 2-, 5-, 10-, 25-, 50-, 100-, and 500-year return periods were calibrated as a function of the corresponding rural peak discharge and the percentage of impervious area in a watershed. The results of this study indicate that two sets of equations, one set based on imperviousness and one set based on population density, performed well. Both sets of equations are dependent on rural peak discharges, a measure of development (average percentage of imperviousness or average population density), and a measure of homogeneity of development within a watershed. Average imperviousness was readily determined by using geographic information system methods and commonly available land-cover data. Similarly, average population density was easily determined from census data. Thus, a key advantage to the equations developed in this study is that they do not require field measurements of watershed characteristics as did the U.S. Geological Survey urban equations developed in an earlier investigation. During this study, the U.S. Geological Survey PeakFQ program was used as an integral tool in the calibration of all equations. The scarcity of historical land-use data, however, made exclusive use of flow records necessary for the 30-year period from 1970 to 2000. Such relatively short-duration streamflow time series required a nonstandard treatment of the historical data function of the PeakFQ program in comparison to published guidelines. Thus, the approach used during this investigation does not fully comply with the
Dinç, Erdal; Ustündağ, Ozgür; Baleanu, Dumitru
2010-08-01
The sole use of pyridoxine hydrochloride during treatment of tuberculosis gives rise to pyridoxine deficiency. Therefore, a combination of pyridoxine hydrochloride and isoniazid is used in pharmaceutical dosage form in tuberculosis treatment to reduce this side effect. In this study, two chemometric methods, partial least squares (PLS) and principal component regression (PCR), were applied to the simultaneous determination of pyridoxine (PYR) and isoniazid (ISO) in their tablets. A concentration training set comprising binary mixtures of PYR and ISO consisting of 20 different combinations were randomly prepared in 0.1 M HCl. Both multivariate calibration models were constructed using the relationships between the concentration data set (concentration data matrix) and absorbance data matrix in the spectral region 200-330 nm. The accuracy and the precision of the proposed chemometric methods were validated by analyzing synthetic mixtures containing the investigated drugs. The recovery results obtained by applying PCR and PLS calibrations to the artificial mixtures were found between 100.0 and 100.7%. Satisfactory results obtained by applying the PLS and PCR methods to both artificial and commercial samples were obtained. The results obtained in this manuscript strongly encourage us to use them for the quality control and the routine analysis of the marketing tablets containing PYR and ISO drugs. PMID:20645279
ERIC Educational Resources Information Center
von Davier, Matthias; Sinharay, Sandip
2009-01-01
This paper presents an application of a stochastic approximation EM-algorithm using a Metropolis-Hastings sampler to estimate the parameters of an item response latent regression model. Latent regression models are extensions of item response theory (IRT) to a 2-level latent variable model in which covariates serve as predictors of the…
Investigating the Accuracy of Three Estimation Methods for Regression Discontinuity Design
ERIC Educational Resources Information Center
Sun, Shuyan; Pan, Wei
2013-01-01
Regression discontinuity design is an alternative to randomized experiments to make causal inference when random assignment is not possible. This article first presents the formal identification and estimation of regression discontinuity treatment effects in the framework of Rubin's causal model, followed by a thorough literature review of…
ERIC Educational Resources Information Center
Wong, Vivian C.; Steiner, Peter M.; Cook, Thomas D.
2013-01-01
In a traditional regression-discontinuity design (RDD), units are assigned to treatment on the basis of a cutoff score and a continuous assignment variable. The treatment effect is measured at a single cutoff location along the assignment variable. This article introduces the multivariate regression-discontinuity design (MRDD), where multiple…
Penalized Spline: a General Robust Trajectory Model for ZIYUAN-3 Satellite
NASA Astrophysics Data System (ADS)
Pan, H.; Zou, Z.
2016-06-01
Owing to the dynamic imaging system, the trajectory model plays a very important role in the geometric processing of high resolution satellite imagery. However, establishing a trajectory model is difficult when only discrete and noisy data are available. In this manuscript, we proposed a general robust trajectory model, the penalized spline model, which could fit trajectory data well and smooth noise. The penalized parameter λ controlling the smooth and fitting accuracy could be estimated by generalized cross-validation. Five other trajectory models, including third-order polynomials, Chebyshev polynomials, linear interpolation, Lagrange interpolation and cubic spline, are compared with the penalized spline model. Both the sophisticated ephemeris and on-board ephemeris are used to compare the orbit models. The penalized spline model could smooth part of noise, and accuracy would decrease as the orbit length increases. The band-to-band misregistration of ZiYuan-3 Dengfeng and Faizabad multispectral images is used to evaluate the proposed method. With the Dengfeng dataset, the third-order polynomials and Chebyshev approximation could not model the oscillation, and introduce misregistration of 0.57 pixels misregistration in across-track direction and 0.33 pixels in along-track direction. With the Faizabad dataset, the linear interpolation, Lagrange interpolation and cubic spline model suffer from noise, introducing larger misregistration than the approximation models. Experimental results suggest the penalized spline model could model the oscillation and smooth noise.
Eng, K.; Milly, P.C.D.; Tasker, Gary D.
2007-01-01
To facilitate estimation of streamflow characteristics at an ungauged site, hydrologists often define a region of influence containing gauged sites hydrologically similar to the estimation site. This region can be defined either in geographic space or in the space of the variables that are used to predict streamflow (predictor variables). These approaches are complementary, and a combination of the two may be superior to either. Here we propose a hybrid region-of-influence (HRoI) regression method that combines the two approaches. The new method was applied with streamflow records from 1,091 gauges in the southeastern United States to estimate the 50-year peak flow (Q50). The HRoI approach yielded lower root-mean-square estimation errors and produced fewer extreme errors than either the predictor-variable or geographic region-of-influence approaches. It is concluded, for Q50 in the study region, that similarity with respect to the basin characteristics considered (area, slope, and annual precipitation) is important, but incomplete, and that the consideration of geographic proximity of stations provides a useful surrogate for characteristics that are not included in the analysis. ?? 2007 ASCE.
NASA Astrophysics Data System (ADS)
Kügler, S. D.; Polsterer, K.; Hoecker, M.
2015-04-01
Context. In astronomy, new approaches to process and analyze the exponentially increasing amount of data are inevitable. For spectra, such as in the Sloan Digital Sky Survey spectral database, usually templates of well-known classes are used for classification. In case the fitting of a template fails, wrong spectral properties (e.g. redshift) are derived. Validation of the derived properties is the key to understand the caveats of the template-based method. Aims: In this paper we present a method for statistically computing the redshift z based on a similarity approach. This allows us to determine redshifts in spectra for emission and absorption features without using any predefined model. Additionally, we show how to determine the redshift based on single features. As a consequence we are, for example, able to filter objects that show multiple redshift components. Methods: The redshift calculation is performed by comparing predefined regions in the spectra and individually applying a nearest neighbor regression model to each predefined emission and absorption region. Results: The choice of the model parameters controls the quality and the completeness of the redshifts. For ≈90% of the analyzed 16 000 spectra of our reference and test sample, a certain redshift can be computed that is comparable to the completeness of SDSS (96%). The redshift calculation yields a precision for every individually tested feature that is comparable to the overall precision of the redshifts of SDSS. Using the new method to compute redshifts, we could also identify 14 spectra with a significant shift between emission and absorption or between emission and emission lines. The results already show the immense power of this simple machine-learning approach for investigating huge databases such as the SDSS.
Comparing implementations of penalized weighted least-squares sinogram restoration
Forthmann, Peter; Koehler, Thomas; Defrise, Michel; La Riviere, Patrick
2010-11-15
Purpose: A CT scanner measures the energy that is deposited in each channel of a detector array by x rays that have been partially absorbed on their way through the object. The measurement process is complex and quantitative measurements are always and inevitably associated with errors, so CT data must be preprocessed prior to reconstruction. In recent years, the authors have formulated CT sinogram preprocessing as a statistical restoration problem in which the goal is to obtain the best estimate of the line integrals needed for reconstruction from the set of noisy, degraded measurements. The authors have explored both penalized Poisson likelihood (PL) and penalized weighted least-squares (PWLS) objective functions. At low doses, the authors found that the PL approach outperforms PWLS in terms of resolution-noise tradeoffs, but at standard doses they perform similarly. The PWLS objective function, being quadratic, is more amenable to computational acceleration than the PL objective. In this work, the authors develop and compare two different methods for implementing PWLS sinogram restoration with the hope of improving computational performance relative to PL in the standard-dose regime. Sinogram restoration is still significant in the standard-dose regime since it can still outperform standard approaches and it allows for correction of effects that are not usually modeled in standard CT preprocessing. Methods: The authors have explored and compared two implementation strategies for PWLS sinogram restoration: (1) A direct matrix-inversion strategy based on the closed-form solution to the PWLS optimization problem and (2) an iterative approach based on the conjugate-gradient algorithm. Obtaining optimal performance from each strategy required modifying the naive off-the-shelf implementations of the algorithms to exploit the particular symmetry and sparseness of the sinogram-restoration problem. For the closed-form approach, the authors subdivided the large matrix
Comparing implementations of penalized weighted least-squares sinogram restoration
Forthmann, Peter; Koehler, Thomas; Defrise, Michel; La Riviere, Patrick
2010-01-01
Purpose: A CT scanner measures the energy that is deposited in each channel of a detector array by x rays that have been partially absorbed on their way through the object. The measurement process is complex and quantitative measurements are always and inevitably associated with errors, so CT data must be preprocessed prior to reconstruction. In recent years, the authors have formulated CT sinogram preprocessing as a statistical restoration problem in which the goal is to obtain the best estimate of the line integrals needed for reconstruction from the set of noisy, degraded measurements. The authors have explored both penalized Poisson likelihood (PL) and penalized weighted least-squares (PWLS) objective functions. At low doses, the authors found that the PL approach outperforms PWLS in terms of resolution-noise tradeoffs, but at standard doses they perform similarly. The PWLS objective function, being quadratic, is more amenable to computational acceleration than the PL objective. In this work, the authors develop and compare two different methods for implementing PWLS sinogram restoration with the hope of improving computational performance relative to PL in the standard-dose regime. Sinogram restoration is still significant in the standard-dose regime since it can still outperform standard approaches and it allows for correction of effects that are not usually modeled in standard CT preprocessing. Methods: The authors have explored and compared two implementation strategies for PWLS sinogram restoration: (1) A direct matrix-inversion strategy based on the closed-form solution to the PWLS optimization problem and (2) an iterative approach based on the conjugate-gradient algorithm. Obtaining optimal performance from each strategy required modifying the naive off-the-shelf implementations of the algorithms to exploit the particular symmetry and sparseness of the sinogram-restoration problem. For the closed-form approach, the authors subdivided the large matrix
Law, G.S.; Tasker, Gary D.
2003-01-01
The region-of-influence method and regional-regression equations are used to predict flood frequency of unregulated and ungaged rivers and streams of Tennessee. The prediction methods have been developed using strem-gage records from unregulated streams draining basins having 1-30% total impervious area. A computer application automates the calculation of the flood frequencies of the unregulated streams. Average deleted-residual prediction errors for the region-of-influence method are found to be slightly smaller than those for the regional regression methods.
1989-01-01
Austria's Federal act amending the Penal Code and the Code of Penal Procedure (Penal Code Amendments 1989), April 27, 1989, rewrites sections of the Penal Code relating to sexual crimes. Among other things, it makes these sections sex-neutral and criminalizes rape within marriage and cohabitation. Section 201 states that 1) whoever, by means of serious force or threat of actual serious danger to life or limb, compels a person to engage in sexual intercourse or an equivalent sexual act will be punished with imprisonment from 1 to 10 years. Rendering a person unconscious will be considered using serious force; 2) apart from the above subsection 1, whoever, by means of force or deprivation or personal freedom, or threat of actual danger to life or limb, compels a person to engage in sexual intercourse or an equivalent sexual act will be punished with imprisonment from 6 months to 5 years; and 3) specified circumstances will result in enhanced punishments. Section 202 states that 1) apart from the above Section 201, whoever by means of force or serious threat, compels a sexual act shall be punished with imprisonment for up to 3 years and 2) there will be enhanced punishments for special circumstances. Section 203 deals with perpetration of the crime in marriage or cohabitation, and states: 1) whoever perpetrates one of the acts described in Section 201 and Section 202 against a spouse or cohabiting partner will be prosecuted only upon the complaint of the injured party in so far as none of the results described in sections 201 or 202 occurs, and the criminal act contains none of the circumstances specified in those sections. Special commutation provisions are available when the injured party declares their wish to continue to live with the perpetrator. PMID:12344063
Dazard, Jean-Eudes; Choe, Michael; LeBlanc, Michael; Rao, J. Sunil
2015-01-01
PRIMsrc is a novel implementation of a non-parametric bump hunting procedure, based on the Patient Rule Induction Method (PRIM), offering a unified treatment of outcome variables, including censored time-to-event (Survival), continuous (Regression) and discrete (Classification) responses. To fit the model, it uses a recursive peeling procedure with specific peeling criteria and stopping rules depending on the response. To validate the model, it provides an objective function based on prediction-error or other specific statistic, as well as two alternative cross-validation techniques, adapted to the task of decision-rule making and estimation in the three types of settings. PRIMsrc comes as an open source R package, including at this point: (i) a main function for fitting a Survival Bump Hunting model with various options allowing cross-validated model selection to control model size (#covariates) and model complexity (#peeling steps) and generation of cross-validated end-point estimates; (ii) parallel computing; (iii) various S3-generic and specific plotting functions for data visualization, diagnostic, prediction, summary and display of results. It is available on CRAN and GitHub. PMID:26798326
Stefanello, C; Vieira, S L; Xue, P; Ajuwon, K M; Adeola, O
2016-07-01
A study was conducted to determine the ileal digestible energy (IDE), ME, and MEn contents of bakery meal using the regression method and to evaluate whether the energy values are age-dependent in broiler chickens from zero to 21 d post hatching. Seven hundred and eighty male Ross 708 chicks were fed 3 experimental diets in which bakery meal was incorporated into a corn-soybean meal-based reference diet at zero, 100, or 200 g/kg by replacing the energy-yielding ingredients. A 3 × 3 factorial arrangement of 3 ages (1, 2, or 3 wk) and 3 dietary bakery meal levels were used. Birds were fed the same experimental diets in these 3 evaluated ages. Birds were grouped by weight into 10 replicates per treatment in a randomized complete block design. Apparent ileal digestibility and total tract retention of DM, N, and energy were calculated. Expression of mucin (MUC2), sodium-dependent phosphate transporter (NaPi-IIb), solute carrier family 7 (cationic amino acid transporter, Y(+) system, SLC7A2), glucose (GLUT2), and sodium-glucose linked transporter (SGLT1) genes were measured at each age in the jejunum by real-time PCR. Addition of bakery meal to the reference diet resulted in a linear decrease in retention of DM, N, and energy, and a quadratic reduction (P < 0.05) in N retention and ME. There was a linear increase in DM, N, and energy as birds' ages increased from 1 to 3 wk. Dietary bakery meal did not affect jejunal gene expression. Expression of genes encoding MUC2, NaPi-IIb, and SLC7A2 linearly increased (P < 0.05) with age. Regression-derived MEn of bakery meal linearly increased (P < 0.05) as the age of birds increased, with values of 2,710, 2,820, and 2,923 kcal/kg DM for 1, 2, and 3 wk, respectively. Based on these results, utilization of energy and nitrogen in the basal diet decreased when bakery meal was included and increased with age of broiler chickens. PMID:26944962
Robust Gaussian Graphical Modeling via l1 Penalization
Sun, Hokeun; Li, Hongzhe
2012-01-01
Summary Gaussian graphical models have been widely used as an effective method for studying the conditional independency structure among genes and for constructing genetic networks. However, gene expression data typically have heavier tails or more outlying observations than the standard Gaussian distribution. Such outliers in gene expression data can lead to wrong inference on the dependency structure among the genes. We propose a l1 penalized estimation procedure for the sparse Gaussian graphical models that is robustified against possible outliers. The likelihood function is weighted according to how the observation is deviated, where the deviation of the observation is measured based on its own likelihood. An efficient computational algorithm based on the coordinate gradient descent method is developed to obtain the minimizer of the negative penalized robustified-likelihood, where nonzero elements of the concentration matrix represents the graphical links among the genes. After the graphical structure is obtained, we re-estimate the positive definite concentration matrix using an iterative proportional fitting algorithm. Through simulations, we demonstrate that the proposed robust method performs much better than the graphical Lasso for the Gaussian graphical models in terms of both graph structure selection and estimation when outliers are present. We apply the robust estimation procedure to an analysis of yeast gene expression data and show that the resulting graph has better biological interpretation than that obtained from the graphical Lasso. PMID:23020775
Huang, Dong; Cabral, Ricardo; De la Torre, Fernando
2016-02-01
Discriminative methods (e.g., kernel regression, SVM) have been extensively used to solve problems such as object recognition, image alignment and pose estimation from images. These methods typically map image features ( X) to continuous (e.g., pose) or discrete (e.g., object category) values. A major drawback of existing discriminative methods is that samples are directly projected onto a subspace and hence fail to account for outliers common in realistic training sets due to occlusion, specular reflections or noise. It is important to notice that existing discriminative approaches assume the input variables X to be noise free. Thus, discriminative methods experience significant performance degradation when gross outliers are present. Despite its obvious importance, the problem of robust discriminative learning has been relatively unexplored in computer vision. This paper develops the theory of robust regression (RR) and presents an effective convex approach that uses recent advances on rank minimization. The framework applies to a variety of problems in computer vision including robust linear discriminant analysis, regression with missing data, and multi-label classification. Several synthetic and real examples with applications to head pose estimation from images, image and video classification and facial attribute classification with missing data are used to illustrate the benefits of RR. PMID:26761740
Bayesian Method for Support Union Recovery in Multivariate Multi-Response Linear Regression
NASA Astrophysics Data System (ADS)
Chen, Wan-Ping
Sparse modeling has become a particularly important and quickly developing topic in many applications of statistics, machine learning, and signal processing. The main objective of sparse modeling is discovering a small number of predictive patterns that would improve our understanding of the data. This paper extends the idea of sparse modeling to the variable selection problem in high dimensional linear regression, where there are multiple response vectors, and they share the same or similar subsets of predictor variables to be selected from a large set of candidate variables. In the literature, this problem is called multi-task learning, support union recovery or simultaneous sparse coding in different contexts. We present a Bayesian method for solving this problem by introducing two nested sets of binary indicator variables. In the first set of indicator variables, each indicator is associated with a predictor variable or a regressor, indicating whether this variable is active for any of the response vectors. In the second set of indicator variables, each indicator is associated with both a predicator variable and a response vector, indicating whether this variable is active for the particular response vector. The problem of variable selection is solved by sampling from the posterior distributions of the two sets of indicator variables. We develop a Gibbs sampling algorithm for posterior sampling and use the generated samples to identify active support both in shared and individual level. Theoretical and simulation justification are performed in the paper. The proposed algorithm is also demonstrated on the real image data sets. To learn the patterns of the object in images, we treat images as the different tasks. Through combining images with the object in the same category, we cannot only learn the shared patterns efficiently but also get individual sketch of each image.
Revisiting the Distance Duality Relation using a non-parametric regression method
NASA Astrophysics Data System (ADS)
Rana, Akshay; Jain, Deepak; Mahajan, Shobhit; Mukherjee, Amitabha
2016-07-01
The interdependence of luminosity distance, DL and angular diameter distance, DA given by the distance duality relation (DDR) is very significant in observational cosmology. It is very closely tied with the temperature-redshift relation of Cosmic Microwave Background (CMB) radiation. Any deviation from η(z)≡ DL/DA (1+z)2 =1 indicates a possible emergence of new physics. Our aim in this work is to check the consistency of these relations using a non-parametric regression method namely, LOESS with SIMEX. This technique avoids dependency on the cosmological model and works with a minimal set of assumptions. Further, to analyze the efficiency of the methodology, we simulate a dataset of 020 points of η (z) data based on a phenomenological model η(z)= (1+z)epsilon. The error on the simulated data points is obtained by using the temperature of CMB radiation at various redshifts. For testing the distance duality relation, we use the JLA SNe Ia data for luminosity distances, while the angular diameter distances are obtained from radio galaxies datasets. Since the DDR is linked with CMB temperature-redshift relation, therefore we also use the CMB temperature data to reconstruct η (z). It is important to note that with CMB data, we are able to study the evolution of DDR upto a very high redshift z = 2.418. In this analysis, we find no evidence of deviation from η=1 within a 1σ region in the entire redshift range used in this analysis (0 < z <= 2.418).
A primer on regression methods for decoding cis-regulatory logic
Das, Debopriya; Pellegrini, Matteo; Gray, Joe W.
2009-03-03
The rapidly emerging field of systems biology is helping us to understand the molecular determinants of phenotype on a genomic scale [1]. Cis-regulatory elements are major sequence-based determinants of biological processes in cells and tissues [2]. For instance, during transcriptional regulation, transcription factors (TFs) bind to very specific regions on the promoter DNA [2,3] and recruit the basal transcriptional machinery, which ultimately initiates mRNA transcription (Figure 1A). Learning cis-Regulatory Elements from Omics Data A vast amount of work over the past decade has shown that omics data can be used to learn cis-regulatory logic on a genome-wide scale [4-6]--in particular, by integrating sequence data with mRNA expression profiles. The most popular approach has been to identify over-represented motifs in promoters of genes that are coexpressed [4,7,8]. Though widely used, such an approach can be limiting for a variety of reasons. First, the combinatorial nature of gene regulation is difficult to explicitly model in this framework. Moreover, in many applications of this approach, expression data from multiple conditions are necessary to obtain reliable predictions. This can potentially limit the use of this method to only large data sets [9]. Although these methods can be adapted to analyze mRNA expression data from a pair of biological conditions, such comparisons are often confounded by the fact that primary and secondary response genes are clustered together--whereas only the primary response genes are expected to contain the functional motifs [10]. A set of approaches based on regression has been developed to overcome the above limitations [11-32]. These approaches have their foundations in certain biophysical aspects of gene regulation [26,33-35]. That is, the models are motivated by the expected transcriptional response of genes due to the binding of TFs to their promoters. While such methods have gathered popularity in the computational domain
Multiple Response Regression for Gaussian Mixture Models with Known Labels.
Lee, Wonyul; Du, Ying; Sun, Wei; Hayes, D Neil; Liu, Yufeng
2012-12-01
Multiple response regression is a useful regression technique to model multiple response variables using the same set of predictor variables. Most existing methods for multiple response regression are designed for modeling homogeneous data. In many applications, however, one may have heterogeneous data where the samples are divided into multiple groups. Our motivating example is a cancer dataset where the samples belong to multiple cancer subtypes. In this paper, we consider modeling the data coming from a mixture of several Gaussian distributions with known group labels. A naive approach is to split the data into several groups according to the labels and model each group separately. Although it is simple, this approach ignores potential common structures across different groups. We propose new penalized methods to model all groups jointly in which the common and unique structures can be identified. The proposed methods estimate the regression coefficient matrix, as well as the conditional inverse covariance matrix of response variables. Asymptotic properties of the proposed methods are explored. Through numerical examples, we demonstrate that both estimation and prediction can be improved by modeling all groups jointly using the proposed methods. An application to a glioblastoma cancer dataset reveals some interesting common and unique gene relationships across different cancer subtypes. PMID:24416092
Analyzing Association Mapping in Pedigree-Based GWAS Using a Penalized Multitrait Mixed Model.
Liu, Jin; Yang, Can; Shi, Xingjie; Li, Cong; Huang, Jian; Zhao, Hongyu; Ma, Shuangge
2016-07-01
Genome-wide association studies (GWAS) have led to the identification of many genetic variants associated with complex diseases in the past 10 years. Penalization methods, with significant numerical and statistical advantages, have been extensively adopted in analyzing GWAS. This study has been partly motivated by the analysis of Genetic Analysis Workshop (GAW) 18 data, which have two notable characteristics. First, the subjects are from a small number of pedigrees and hence related. Second, for each subject, multiple correlated traits have been measured. Most of the existing penalization methods assume independence between subjects and traits and can be suboptimal. There are a few methods in the literature based on mixed modeling that can accommodate correlations. However, they cannot fully accommodate the two types of correlations while conducting effective marker selection. In this study, we develop a penalized multitrait mixed modeling approach. It accommodates the two different types of correlations and includes several existing methods as special cases. Effective penalization is adopted for marker selection. Simulation demonstrates its satisfactory performance. The GAW 18 data are analyzed using the proposed method. PMID:27247027
Nonlinear regression-based method for pseudoenhancement correction in CT colonography.
Tsagaan, Baigalmaa; Näppi, Janne; Yoshida, Hiroyuki
2009-08-01
In CT colonography (CTC), orally administered positive-contrast tagging agents are often used for differentiating residual bowel contents from native colonic structures. However, tagged materials can sometimes hyperattenuate observed CT numbers of their adjacent untagged materials. Such pseudoenhancement complicates the differentiation of colonic soft-tissue structures from tagged materials, because pseudoenhanced colonic structures may have CT numbers that are similar to those of tagged materials. The authors developed a nonlinear regression-based (NLRB) method for performing a local image-based pseudoenhancement correction of CTC data. To calibrate the correction parameters, the CT data of an anthropomorphic reference phantom were correlated with those of partially tagged phantoms. The CTC data were registered spatially by use of an adaptive multiresolution method, and untagged and tagged partial-volume soft-tissue surfaces were correlated by use of a virtual tagging scheme. The NLRB method was then optimized to minimize the difference in the CT numbers of soft-tissue regions between the untagged and tagged phantom CTC data by use of the Nelder-Mead downhill simplex method. To validate the method, the CT numbers of untagged regions were compared with those of registered pseudoenhanced phantom regions before and after the correction. The CT numbers were significantly different before performing the correction (p<0.01), whereas, after the correction, the difference between the CT numbers was not significant. The effect of the correction was also tested on the size measurement of polyps that were covered by tagging in phantoms and in clinical cases. In phantom cases, before the correction, the diameters of 12 simulated polyps submerged in tagged fluids that were measured in a soft-tissue CT display were significantly different from those measured in an untagged phantom (p<0.01), whereas after the correction the difference was not significant. In clinical cases
A penalized robust semiparametric approach for gene-environment interactions.
Wu, Cen; Shi, Xingjie; Cui, Yuehua; Ma, Shuangge
2015-12-30
In genetic and genomic studies, gene-environment (G×E) interactions have important implications. Some of the existing G×E interaction methods are limited by analyzing a small number of G factors at a time, by assuming linear effects of E factors, by assuming no data contamination, and by adopting ineffective selection techniques. In this study, we propose a new approach for identifying important G×E interactions. It jointly models the effects of all E and G factors and their interactions. A partially linear varying coefficient model is adopted to accommodate possible nonlinear effects of E factors. A rank-based loss function is used to accommodate possible data contamination. Penalization, which has been extensively used with high-dimensional data, is adopted for selection. The proposed penalized estimation approach can automatically determine if a G factor has an interaction with an E factor, main effect but not interaction, or no effect at all. The proposed approach can be effectively realized using a coordinate descent algorithm. Simulation shows that it has satisfactory performance and outperforms several competing alternatives. The proposed approach is used to analyze a lung cancer study with gene expression measurements and clinical variables. Copyright © 2015 John Wiley & Sons, Ltd. PMID:26239060
Estimating R-squared Shrinkage in Multiple Regression: A Comparison of Different Analytical Methods.
ERIC Educational Resources Information Center
Yin, Ping; Fan, Xitao
2001-01-01
Studied the effectiveness of various analytical formulas for estimating "R" squared shrinkage in multiple regression analysis, focusing on estimators of the squared population multiple correlation coefficient and the squared population cross validity coefficient. Simulation results suggest that the most widely used Wherry (R. Wherry, 1931) formula…
Regression Methods for Categorical Dependent Variables: Effects on a Model of Student College Choice
ERIC Educational Resources Information Center
Rapp, Kelly E.
2012-01-01
The use of categorical dependent variables with the classical linear regression model (CLRM) violates many of the model's assumptions and may result in biased estimates (Long, 1997; O'Connell, Goldstein, Rogers, & Peng, 2008). Many dependent variables of interest to educational researchers (e.g., professorial rank, educational…
Correcting Measurement Error in Latent Regression Covariates via the MC-SIMEX Method
ERIC Educational Resources Information Center
Rutkowski, Leslie; Zhou, Yan
2015-01-01
Given the importance of large-scale assessments to educational policy conversations, it is critical that subpopulation achievement is estimated reliably and with sufficient precision. Despite this importance, biased subpopulation estimates have been found to occur when variables in the conditioning model side of a latent regression model contain…
ERIC Educational Resources Information Center
Wong, Vivian C.; Steiner, Peter M.; Cook, Thomas D.
2012-01-01
In a traditional regression-discontinuity design (RDD), units are assigned to treatment and comparison conditions solely on the basis of a single cutoff score on a continuous assignment variable. The discontinuity in the functional form of the outcome at the cutoff represents the treatment effect, or the average treatment effect at the cutoff.…
Sample Size Determination for Regression Models Using Monte Carlo Methods in R
ERIC Educational Resources Information Center
Beaujean, A. Alexander
2014-01-01
A common question asked by researchers using regression models is, What sample size is needed for my study? While there are formulae to estimate sample sizes, their assumptions are often not met in the collected data. A more realistic approach to sample size determination requires more information such as the model of interest, strength of the…
Using regression methods to estimate stream phosphorus loads at the Illinois River, Arkansas
Haggard, B.E.; Soerens, T.S.; Green, W.R.; Richards, R.P.
2003-01-01
The development of total maximum daily loads (TMDLs) requires evaluating existing constituent loads in streams. Accurate estimates of constituent loads are needed to calibrate watershed and reservoir models for TMDL development. The best approach to estimate constituent loads is high frequency sampling, particularly during storm events, and mass integration of constituents passing a point in a stream. Most often, resources are limited and discrete water quality samples are collected on fixed intervals and sometimes supplemented with directed sampling during storm events. When resources are limited, mass integration is not an accurate means to determine constituent loads and other load estimation techniques such as regression models are used. The objective of this work was to determine a minimum number of water-quality samples needed to provide constituent concentration data adequate to estimate constituent loads at a large stream. Twenty sets of water quality samples with and without supplemental storm samples were randomly selected at various fixed intervals from a database at the Illinois River, northwest Arkansas. The random sets were used to estimate total phosphorus (TP) loads using regression models. The regression-based annual TP loads were compared to the integrated annual TP load estimated using all the data. At a minimum, monthly sampling plus supplemental storm samples (six samples per year) was needed to produce a root mean square error of less than 15%. Water quality samples should be collected at least semi-monthly (every 15 days) in studies less than two years if seasonal time factors are to be used in the regression models. Annual TP loads estimated from independently collected discrete water quality samples further demonstrated the utility of using regression models to estimate annual TP loads in this stream system.
NASA Astrophysics Data System (ADS)
Yang, Jianhong; Yi, Cancan; Xu, Jinwu; Ma, Xianghong
2015-05-01
A new LIBS quantitative analysis method based on analytical line adaptive selection and Relevance Vector Machine (RVM) regression model is proposed. First, a scheme of adaptively selecting analytical line is put forward in order to overcome the drawback of high dependency on a priori knowledge. The candidate analytical lines are automatically selected based on the built-in characteristics of spectral lines, such as spectral intensity, wavelength and width at half height. The analytical lines which will be used as input variables of regression model are determined adaptively according to the samples for both training and testing. Second, an LIBS quantitative analysis method based on RVM is presented. The intensities of analytical lines and the elemental concentrations of certified standard samples are used to train the RVM regression model. The predicted elemental concentration analysis results will be given with a form of confidence interval of probabilistic distribution, which is helpful for evaluating the uncertainness contained in the measured spectra. Chromium concentration analysis experiments of 23 certified standard high-alloy steel samples have been carried out. The multiple correlation coefficient of the prediction was up to 98.85%, and the average relative error of the prediction was 4.01%. The experiment results showed that the proposed LIBS quantitative analysis method achieved better prediction accuracy and better modeling robustness compared with the methods based on partial least squares regression, artificial neural network and standard support vector machine.
NASA Astrophysics Data System (ADS)
Boucher, Thomas F.; Ozanne, Marie V.; Carmosino, Marco L.; Dyar, M. Darby; Mahadevan, Sridhar; Breves, Elly A.; Lepore, Kate H.; Clegg, Samuel M.
2015-05-01
The ChemCam instrument on the Mars Curiosity rover is generating thousands of LIBS spectra and bringing interest in this technique to public attention. The key to interpreting Mars or any other types of LIBS data are calibrations that relate laboratory standards to unknowns examined in other settings and enable predictions of chemical composition. Here, LIBS spectral data are analyzed using linear regression methods including partial least squares (PLS-1 and PLS-2), principal component regression (PCR), least absolute shrinkage and selection operator (lasso), elastic net, and linear support vector regression (SVR-Lin). These were compared against results from nonlinear regression methods including kernel principal component regression (K-PCR), polynomial kernel support vector regression (SVR-Py) and k-nearest neighbor (kNN) regression to discern the most effective models for interpreting chemical abundances from LIBS spectra of geological samples. The results were evaluated for 100 samples analyzed with 50 laser pulses at each of five locations averaged together. Wilcoxon signed-rank tests were employed to evaluate the statistical significance of differences among the nine models using their predicted residual sum of squares (PRESS) to make comparisons. For MgO, SiO2, Fe2O3, CaO, and MnO, the sparse models outperform all the others except for linear SVR, while for Na2O, K2O, TiO2, and P2O5, the sparse methods produce inferior results, likely because their emission lines in this energy range have lower transition probabilities. The strong performance of the sparse methods in this study suggests that use of dimensionality-reduction techniques as a preprocessing step may improve the performance of the linear models. Nonlinear methods tend to overfit the data and predict less accurately, while the linear methods proved to be more generalizable with better predictive performance. These results are attributed to the high dimensionality of the data (6144 channels
Comparison of regression and time-series methods for synthesizing missing streamflow records
Beauchamp, J.J.; Downing, D.J.; Railsback, S.F. )
1989-10-01
Regression and time-series techniques have been used to synthesize and predict the stream flow at the Foresta Bridge gage from information at the upstream Pohono Bridge gage on the Merced River near Yosemite National Park. Using the available data from two time periods (calendar year 1979 and water year 1986), the authors evaluated the two techniques in their ability to model the variation in the observed flows and in their ability to predict stream flow at the Foresta Bridge gage for the 1979 time period with data from the 1986 time period. Both techniques produced reasonably good estimates and forecasts of the flow at the downstream gage. However, the regression model was found to have a significant amount of autocorrelation in the residuals, which the time-series model was able to eliminate. The time-series technique presented can be of great assistance in arriving at reasonable estimates of flow in data sets that have large missing portions of data.
Lee, Soo Min; Lee, Jae-Won
2014-11-01
In this study, the optimal conditions for biomass torrefaction were determined by comparing the gain of energy content to the weight loss of biomass from the final products. Torrefaction experiments were performed at temperatures ranging from 220 to 280°C using 20-80min reaction times. Polynomial regression models ranging from the 1st to the 3rd order were used to determine a relationship between the severity factor (SF) and calorific value or weight loss. The intersection of two regression models for calorific value and weight loss was determined and assumed to be the optimized SF. The optimized SFs on each biomass ranged from 6.056 to 6.372. Optimized torrefaction conditions were determined at various reaction times of 15, 30, and 60min. The average optimized temperature was 248.55°C in the studied biomass when torrefaction was performed for 60min. PMID:25266685
Solution of the linear regression problem using matrix correction methods in the l 1 metric
NASA Astrophysics Data System (ADS)
Gorelik, V. A.; Trembacheva (Barkalova), O. S.
2016-02-01
The linear regression problem is considered as an improper interpolation problem. The metric l 1 is used to correct (approximate) all the initial data. A probabilistic justification of this metric in the case of the exponential noise distribution is given. The original improper interpolation problem is reduced to a set of a finite number of linear programming problems. The corresponding computational algorithms are implemented in MATLAB.
Model Averaging Methods for Weight Trimming in Generalized Linear Regression Models
Elliott, Michael R.
2012-01-01
In sample surveys where units have unequal probabilities of inclusion, associations between the inclusion probability and the statistic of interest can induce bias in unweighted estimates. This is true even in regression models, where the estimates of the population slope may be biased if the underlying mean model is misspecified or the sampling is nonignorable. Weights equal to the inverse of the probability of inclusion are often used to counteract this bias. Highly disproportional sample designs have highly variable weights; weight trimming reduces large weights to a maximum value, reducing variability but introducing bias. Most standard approaches are ad hoc in that they do not use the data to optimize bias-variance trade-offs. This article uses Bayesian model averaging to create “data driven” weight trimming estimators. We extend previous results for linear regression models (Elliott 2008) to generalized linear regression models, developing robust models that approximate fully-weighted estimators when bias correction is of greatest importance, and approximate unweighted estimators when variance reduction is critical. PMID:23275683
Bolarinwa, O A; Adeola, O
2012-12-01
Digestible and metabolizable energy contents of feed ingredients for pigs can be determined by direct or indirect methods. There are situations when only the indirect approach is suitable and the regression method is a robust indirect approach. This study was conducted to compare the direct and regression methods for determining the energy value of wheat for pigs. Twenty-four barrows with an average initial BW of 31 kg were assigned to 4 diets in a randomized complete block design. The 4 diets consisted of 969 g wheat/kg plus minerals and vitamins (sole wheat) for the direct method, corn (Zea mays)-soybean (Glycine max) meal reference diet (RD), RD + 300 g wheat/kg, and RD + 600 g wheat/kg. The 3 corn-soybean meal diets were used for the regression method and wheat replaced the energy-yielding ingredients, corn and soybean meal, so that the same ratio of corn and soybean meal across the experimental diets was maintained. The wheat used was analyzed to contain 883 g DM, 15.2 g N, and 3.94 Mcal GE/kg. Each diet was fed to 6 barrows in individual metabolism crates for a 5-d acclimation followed by a 5-d total but separate collection of feces and urine. The DE and ME for the sole wheat diet were 3.83 and 3.77 Mcal/kg DM, respectively. Because the sole wheat diet contained 969 g wheat/kg, these translate to 3.95 Mcal DE/kg DM and 3.89 Mcal ME/kg DM. The RD used for the regression approach yielded 4.00 Mcal DE and 3.91 Mcal ME/kg DM diet. Increasing levels of wheat in the RD linearly reduced (P < 0.05) DE and ME to 3.88 and 3.79 Mcal/kg DM diet, respectively. The regressions of wheat contribution to DE and ME in megacalories against the quantity of wheat DM intake in kilograms generated 3.96 Mcal DE and 3.88 Mcal ME/kg DM. In conclusion, values obtained for the DE and ME of wheat using the direct method (3.95 and 3.89 Mcal/kg DM) did not differ (0.78 < P < 0.89) from those obtained using the regression method (3.96 and 3.88 Mcal/kg DM). PMID:23365389
Regression to fuzziness method for estimation of remaining useful life in power plant components
NASA Astrophysics Data System (ADS)
Alamaniotis, Miltiadis; Grelle, Austin; Tsoukalas, Lefteri H.
2014-10-01
Mitigation of severe accidents in power plants requires the reliable operation of all systems and the on-time replacement of mechanical components. Therefore, the continuous surveillance of power systems is a crucial concern for the overall safety, cost control, and on-time maintenance of a power plant. In this paper a methodology called regression to fuzziness is presented that estimates the remaining useful life (RUL) of power plant components. The RUL is defined as the difference between the time that a measurement was taken and the estimated failure time of that component. The methodology aims to compensate for a potential lack of historical data by modeling an expert's operational experience and expertise applied to the system. It initially identifies critical degradation parameters and their associated value range. Once completed, the operator's experience is modeled through fuzzy sets which span the entire parameter range. This model is then synergistically used with linear regression and a component's failure point to estimate the RUL. The proposed methodology is tested on estimating the RUL of a turbine (the basic electrical generating component of a power plant) in three different cases. Results demonstrate the benefits of the methodology for components for which operational data is not readily available and emphasize the significance of the selection of fuzzy sets and the effect of knowledge representation on the predicted output. To verify the effectiveness of the methodology, it was benchmarked against the data-based simple linear regression model used for predictions which was shown to perform equal or worse than the presented methodology. Furthermore, methodology comparison highlighted the improvement in estimation offered by the adoption of appropriate of fuzzy sets for parameter representation.
The cross politics of Ecuador's penal state.
Garces, Chris
2010-01-01
This essay examines inmate "crucifixion protests" in Ecuador's largest prison during 2003-04. It shows how the preventively incarcerated-of whom there are thousands-managed to effectively denounce their extralegal confinement by embodying the violence of the Christian crucifixion story. This form of protest, I argue, simultaneously clarified and obscured the multiple layers of sovereign power that pressed down on urban crime suspects, who found themselves persecuted and forsaken both outside and within the space of the prison. Police enacting zero-tolerance policies in urban neighborhoods are thus a key part of the penal state, as are the politically threatened family members of the indicted, the sensationalized local media, distrustful neighbors, prison guards, and incarcerated mafia. The essay shows how the politico-theological performance of self-crucifixion responded to these internested forms of sovereign violence, and were briefly effective. The inmates' cross intervention hence provides a window into the way sovereignty works in the Ecuadorean penal state, drawing out how incarceration trends and new urban security measures interlink, and produce an array of victims. PMID:20662147
[Legal probation of juvenile offenders after release from penal reformative training].
Urbaniok, Frank; Rossegger, Astrid; Fegert, Jörg; Rubertus, Michael; Endrass, Jérôme
2007-01-01
Over recent years, there has been an increase in adolescent delinquency in Germany and Switzerland. In this context, the episodic character of the majority of adolescent delinquency is usually pointed out; however, numerous studies show high re-offending rates for released adolescents. The goal of this study is to examine the legal probation of juvenile delinquents after release from penal reformative training. In this study, the legal probation of adolescents committed to the AEA Uitikon, in the Canton of Zurich, between 1974 and 1986 was scrutinized by examining extracts from their criminal record as of 2003. The period of catamnesis was thus between 17 and 29 years. Overall, 71% of offenders reoffended, 29% with a violent or sexual offence. Bivariate logistic regression showed that the kind of offence committed had no influence on the probability of recidivism. If commitment to the AEA was due to a single offence (as opposed to serial offences), the risk of recidivism was reduced by 71% (OR=0.29). The results of the study show that young delinquents sentenced and committed to penal reformative training have a high recidivism risk. Furthermore, the results point out the importance of the evaluation of the offense-preventive efficacy of penal measures. PMID:17410929
Şentürk, Damla; Dalrymple, Lorien S.; Mu, Yi; Nguyen, Danh V.
2014-01-01
SUMMARY We propose a new weighted hurdle regression method for modeling count data, with particular interest in modeling cardiovascular events in patients on dialysis. Cardiovascular disease remains one of the leading causes of hospitalization and death in this population. Our aim is to jointly model the relationship/association between covariates and (a) the probability of cardiovascular events, a binary process and (b) the rate of events once the realization is positive - when the ‘hurdle’ is crossed - using a zero-truncated Poisson distribution. When the observation period or follow-up time, from the start of dialysis, varies among individuals the estimated probability of positive cardiovascular events during the study period will be biased. Furthermore, when the model contains covariates, then the estimated relationship between the covariates and the probability of cardiovascular events will also be biased. These challenges are addressed with the proposed weighted hurdle regression method. Estimation for the weighted hurdle regression model is a weighted likelihood approach, where standard maximum likelihood estimation can be utilized. The method is illustrated with data from the United States Renal Data System. Simulation studies show the ability of proposed method to successfully adjust for differential follow-up times and incorporate the effects of covariates in the weighting. PMID:24930810
Sentürk, Damla; Dalrymple, Lorien S; Mu, Yi; Nguyen, Danh V
2014-11-10
We propose a new weighted hurdle regression method for modeling count data, with particular interest in modeling cardiovascular events in patients on dialysis. Cardiovascular disease remains one of the leading causes of hospitalization and death in this population. Our aim is to jointly model the relationship/association between covariates and (i) the probability of cardiovascular events, a binary process, and (ii) the rate of events once the realization is positive-when the 'hurdle' is crossed-using a zero-truncated Poisson distribution. When the observation period or follow-up time, from the start of dialysis, varies among individuals, the estimated probability of positive cardiovascular events during the study period will be biased. Furthermore, when the model contains covariates, then the estimated relationship between the covariates and the probability of cardiovascular events will also be biased. These challenges are addressed with the proposed weighted hurdle regression method. Estimation for the weighted hurdle regression model is a weighted likelihood approach, where standard maximum likelihood estimation can be utilized. The method is illustrated with data from the United States Renal Data System. Simulation studies show the ability of proposed method to successfully adjust for differential follow-up times and incorporate the effects of covariates in the weighting. PMID:24930810
Asghari, Mehdi Poursheikhali; Hayatshahi, Sayyed Hamed Sadat; Abdolmaleki, Parviz
2012-01-01
From both the structural and functional points of view, β-turns play important biological roles in proteins. In the present study, a novel two-stage hybrid procedure has been developed to identify β-turns in proteins. Binary logistic regression was initially used for the first time to select significant sequence parameters in identification of β-turns due to a re-substitution test procedure. Sequence parameters were consisted of 80 amino acid positional occurrences and 20 amino acid percentages in sequence. Among these parameters, the most significant ones which were selected by binary logistic regression model, were percentages of Gly, Ser and the occurrence of Asn in position i+2, respectively, in sequence. These significant parameters have the highest effect on the constitution of a β-turn sequence. A neural network model was then constructed and fed by the parameters selected by binary logistic regression to build a hybrid predictor. The networks have been trained and tested on a non-homologous dataset of 565 protein chains. With applying a nine fold cross-validation test on the dataset, the network reached an overall accuracy (Qtotal) of 74, which is comparable with results of the other β-turn prediction methods. In conclusion, this study proves that the parameter selection ability of binary logistic regression together with the prediction capability of neural networks lead to the development of more precise models for identifying β-turns in proteins.
Doran, Kara S.; Howd, Peter A.; Sallenger,, Asbury H., Jr.
2015-01-01
Recent studies, and most of their predecessors, use tide gage data to quantify SL acceleration, ASL(t). In the current study, three techniques were used to calculate acceleration from tide gage data, and of those examined, it was determined that the two techniques based on sliding a regression window through the time series are more robust compared to the technique that fits a single quadratic form to the entire time series, particularly if there is temporal variation in the magnitude of the acceleration. The single-fit quadratic regression method has been the most commonly used technique in determining acceleration in tide gage data. The inability of the single-fit method to account for time-varying acceleration may explain some of the inconsistent findings between investigators. Properly quantifying ASL(t) from field measurements is of particular importance in evaluating numerical models of past, present, and future SLR resulting from anticipated climate change.
Heinze, Georg; Ploner, Meinhard; Beyea, Jan
2013-12-20
In the logistic regression analysis of a small-sized, case-control study on Alzheimer's disease, some of the risk factors exhibited missing values, motivating the use of multiple imputation. Usually, Rubin's rules (RR) for combining point estimates and variances would then be used to estimate (symmetric) confidence intervals (CIs), on the assumption that the regression coefficients were distributed normally. Yet, rarely is this assumption tested, with or without transformation. In analyses of small, sparse, or nearly separated data sets, such symmetric CI may not be reliable. Thus, RR alternatives have been considered, for example, Bayesian sampling methods, but not yet those that combine profile likelihoods, particularly penalized profile likelihoods, which can remove first order biases and guarantee convergence of parameter estimation. To fill the gap, we consider the combination of penalized likelihood profiles (CLIP) by expressing them as posterior cumulative distribution functions (CDFs) obtained via a chi-squared approximation to the penalized likelihood ratio statistic. CDFs from multiple imputations can then easily be averaged into a combined CDF c , allowing confidence limits for a parameter β at level 1 - α to be identified as those β* and β** that satisfy CDF c (β*) = α ∕ 2 and CDF c (β**) = 1 - α ∕ 2. We demonstrate that the CLIP method outperforms RR in analyzing both simulated data and data from our motivating example. CLIP can also be useful as a confirmatory tool, should it show that the simpler RR are adequate for extended analysis. We also compare the performance of CLIP to Bayesian sampling methods using Markov chain Monte Carlo. CLIP is available in the R package logistf. PMID:23873477
Using LASSO Regression to Predict Rheumatoid Arthritis Treatment Efficacy
Odgers, David J.; Tellis, Natalie; Hall, Heather; Dumontier, Michel
2016-01-01
Rheumatoid arthritis (RA) accounts for one-fifth of the deaths due to arthritis, the leading cause of disability in the United States. Finding effective treatments for managing arthritis symptoms are a major challenge, since the mechanisms of autoimmune disorders are not fully understood and disease presentation differs for each patient. The American College of Rheumatology clinical guidelines for treatment consider the severity of the disease when deciding treatment, but do not include any prediction of drug efficacy. Using Electronic Health Records and Biomedical Linked Open Data (LOD), we demonstrate a method to classify patient outcomes using LASSO penalized regression. We show how Linked Data improves prediction and provides insight into how drug treatment regimes have different treatment outcome. Applying classifiers like this to decision support in clinical applications could decrease time to successful disease management, lessening a physical and financial burden on patients individually and the healthcare system as a whole. PMID:27570666
Huang, Hai-Hui; Liu, Xiao-Ying; Liang, Yong
2016-01-01
Cancer classification and feature (gene) selection plays an important role in knowledge discovery in genomic data. Although logistic regression is one of the most popular classification methods, it does not induce feature selection. In this paper, we presented a new hybrid L1/2 +2 regularization (HLR) function, a linear combination of L1/2 and L2 penalties, to select the relevant gene in the logistic regression. The HLR approach inherits some fascinating characteristics from L1/2 (sparsity) and L2 (grouping effect where highly correlated variables are in or out a model together) penalties. We also proposed a novel univariate HLR thresholding approach to update the estimated coefficients and developed the coordinate descent algorithm for the HLR penalized logistic regression model. The empirical results and simulations indicate that the proposed method is highly competitive amongst several state-of-the-art methods. PMID:27136190
Pavlou, Menelaos; Ambler, Gareth; Seaman, Shaun; De Iorio, Maria; Omar, Rumana Z
2016-03-30
Risk prediction models are used to predict a clinical outcome for patients using a set of predictors. We focus on predicting low-dimensional binary outcomes typically arising in epidemiology, health services and public health research where logistic regression is commonly used. When the number of events is small compared with the number of regression coefficients, model overfitting can be a serious problem. An overfitted model tends to demonstrate poor predictive accuracy when applied to new data. We review frequentist and Bayesian shrinkage methods that may alleviate overfitting by shrinking the regression coefficients towards zero (some methods can also provide more parsimonious models by omitting some predictors). We evaluated their predictive performance in comparison with maximum likelihood estimation using real and simulated data. The simulation study showed that maximum likelihood estimation tends to produce overfitted models with poor predictive performance in scenarios with few events, and penalised methods can offer improvement. Ridge regression performed well, except in scenarios with many noise predictors. Lasso performed better than ridge in scenarios with many noise predictors and worse in the presence of correlated predictors. Elastic net, a hybrid of the two, performed well in all scenarios. Adaptive lasso and smoothly clipped absolute deviation performed best in scenarios with many noise predictors; in other scenarios, their performance was inferior to that of ridge and lasso. Bayesian approaches performed well when the hyperparameters for the priors were chosen carefully. Their use may aid variable selection, and they can be easily extended to clustered-data settings and to incorporate external information. © 2015 The Authors. Statistics in Medicine Published by JohnWiley & Sons Ltd. PMID:26514699
NASA Astrophysics Data System (ADS)
Saeidi, Omid; Torabi, Seyed Rahman; Ataei, Mohammad
2014-03-01
Rock mass classification systems are one of the most common ways of determining rock mass excavatability and related equipment assessment. However, the strength and weak points of such rating-based classifications have always been questionable. Such classification systems assign quantifiable values to predefined classified geotechnical parameters of rock mass. This causes particular ambiguities, leading to the misuse of such classifications in practical applications. Recently, intelligence system approaches such as artificial neural networks (ANNs) and neuro-fuzzy methods, along with multiple regression models, have been used successfully to overcome such uncertainties. The purpose of the present study is the construction of several models by using an adaptive neuro-fuzzy inference system (ANFIS) method with two data clustering approaches, including fuzzy c-means (FCM) clustering and subtractive clustering, an ANN and non-linear multiple regression to estimate the basic rock mass diggability index. A set of data from several case studies was used to obtain the real rock mass diggability index and compared to the predicted values by the constructed models. In conclusion, it was observed that ANFIS based on the FCM model shows higher accuracy and correlation with actual data compared to that of the ANN and multiple regression. As a result, one can use the assimilation of ANNs with fuzzy clustering-based models to construct such rigorous predictor tools.
NASA Technical Reports Server (NTRS)
Hopkins, Dale A.
1998-01-01
A key challenge in designing the new High Speed Civil Transport (HSCT) aircraft is determining a good match between the airframe and engine. Multidisciplinary design optimization can be used to solve the problem by adjusting parameters of both the engine and the airframe. Earlier, an example problem was presented of an HSCT aircraft with four mixed-flow turbofan engines and a baseline mission to carry 305 passengers 5000 nautical miles at a cruise speed of Mach 2.4. The problem was solved by coupling NASA Lewis Research Center's design optimization testbed (COMETBOARDS) with NASA Langley Research Center's Flight Optimization System (FLOPS). The computing time expended in solving the problem was substantial, and the instability of the FLOPS analyzer at certain design points caused difficulties. In an attempt to alleviate both of these limitations, we explored the use of two approximation concepts in the design optimization process. The two concepts, which are based on neural network and linear regression approximation, provide the reanalysis capability and design sensitivity analysis information required for the optimization process. The HSCT aircraft optimization problem was solved by using three alternate approaches; that is, the original FLOPS analyzer and two approximate (derived) analyzers. The approximate analyzers were calibrated and used in three different ranges of the design variables; narrow (interpolated), standard, and wide (extrapolated).
NASA Astrophysics Data System (ADS)
Arioli, M.; Gratton, S.
2012-11-01
Minimum-variance unbiased estimates for linear regression models can be obtained by solving least-squares problems. The conjugate gradient method can be successfully used in solving the symmetric and positive definite normal equations obtained from these least-squares problems. Taking into account the results of Golub and Meurant (1997, 2009) [10,11], Hestenes and Stiefel (1952) [17], and Strakoš and Tichý (2002) [16], which make it possible to approximate the energy norm of the error during the conjugate gradient iterative process, we adapt the stopping criterion introduced by Arioli (2005) [18] to the normal equations taking into account the statistical properties of the underpinning linear regression problem. Moreover, we show how the energy norm of the error is linked to the χ2-distribution and to the Fisher-Snedecor distribution. Finally, we present the results of several numerical tests that experimentally validate the effectiveness of our stopping criteria.
Du, Hongying; Hu, Zhide; Bazzoli, Andrea; Zhang, Yang
2011-01-01
The epidermal growth factor receptor (EGFR) protein tyrosine kinase (PTK) is an important protein target for anti-tumor drug discovery. To identify potential EGFR inhibitors, we conducted a quantitative structure–activity relationship (QSAR) study on the inhibitory activity of a series of quinazoline derivatives against EGFR tyrosine kinase. Two 2D-QSAR models were developed based on the best multi-linear regression (BMLR) and grid-search assisted projection pursuit regression (GS-PPR) methods. The results demonstrate that the inhibitory activity of quinazoline derivatives is strongly correlated with their polarizability, activation energy, mass distribution, connectivity, and branching information. Although the present investigation focused on EGFR, the approach provides a general avenue in the structure-based drug development of different protein receptor inhibitors. PMID:21811593
Chiu, Chuan-Hung; Wen, Tzai-Hung; Chien, Lung-Chang; Yu, Hwa-Lung
2014-01-01
Understanding the spatial characteristics of dengue fever (DF) incidences is crucial for governmental agencies to implement effective disease control strategies. We investigated the associations between environmental and socioeconomic factors and DF geographic distribution, are proposed a probabilistic risk assessment approach that uses threshold-based quantile regression to identify the significant risk factors for DF transmission and estimate the spatial distribution of DF risk regarding full probability distributions. To interpret risk, return period was also included to characterize the frequency pattern of DF geographic occurrences. The study area included old Kaohsiung City and Fongshan District, two areas in Taiwan that have been affected by severe DF infections in recent decades. Results indicated that water-related facilities, including canals and ditches, and various types of residential area, as well as the interactions between them, were significant factors that elevated DF risk. By contrast, the increase of per capita income and its associated interactions with residential areas mitigated the DF risk in the study area. Nonlinear associations between these factors and DF risk were present in various quantiles, implying that water-related factors characterized the underlying spatial patterns of DF, and high-density residential areas indicated the potential for high DF incidence (e.g., clustered infections). The spatial distributions of DF risks were assessed in terms of three distinct map presentations: expected incidence rates, incidence rates in various return periods, and return periods at distinct incidence rates. These probability-based spatial risk maps exhibited distinct DF risks associated with environmental factors, expressed as various DF magnitudes and occurrence probabilities across Kaohsiung, and can serve as a reference for local governmental agencies. PMID:25302582
A Probabilistic Spatial Dengue Fever Risk Assessment by a Threshold-Based-Quantile Regression Method
Chiu, Chuan-Hung; Wen, Tzai-Hung; Chien, Lung-Chang; Yu, Hwa-Lung
2014-01-01
Understanding the spatial characteristics of dengue fever (DF) incidences is crucial for governmental agencies to implement effective disease control strategies. We investigated the associations between environmental and socioeconomic factors and DF geographic distribution, are proposed a probabilistic risk assessment approach that uses threshold-based quantile regression to identify the significant risk factors for DF transmission and estimate the spatial distribution of DF risk regarding full probability distributions. To interpret risk, return period was also included to characterize the frequency pattern of DF geographic occurrences. The study area included old Kaohsiung City and Fongshan District, two areas in Taiwan that have been affected by severe DF infections in recent decades. Results indicated that water-related facilities, including canals and ditches, and various types of residential area, as well as the interactions between them, were significant factors that elevated DF risk. By contrast, the increase of per capita income and its associated interactions with residential areas mitigated the DF risk in the study area. Nonlinear associations between these factors and DF risk were present in various quantiles, implying that water-related factors characterized the underlying spatial patterns of DF, and high-density residential areas indicated the potential for high DF incidence (e.g., clustered infections). The spatial distributions of DF risks were assessed in terms of three distinct map presentations: expected incidence rates, incidence rates in various return periods, and return periods at distinct incidence rates. These probability-based spatial risk maps exhibited distinct DF risks associated with environmental factors, expressed as various DF magnitudes and occurrence probabilities across Kaohsiung, and can serve as a reference for local governmental agencies. PMID:25302582
NASA Astrophysics Data System (ADS)
Mathon, Bree R.; Ozbek, Metin M.; Pinder, George F.
2008-05-01
SummaryTraditionally the Cooper-Jacob equation is used to determine the transmissivity and the storage coefficient for an aquifer using pump test results. This model, however, is a simplified version of the actual subsurface and does not allow for analysis of the uncertainty that comes from a lack of knowledge about the heterogeneity of the environment under investigation. In this paper, a modified fuzzy least-squares regression (MFLSR) method is developed that uses imprecise pump test data to obtain fuzzy intercept and slope values which are then used in the Cooper-Jacob method. Fuzzy membership functions for the transmissivity and the storage coefficient are then calculated using the extension principle. The supports of the fuzzy membership functions incorporate the transmissivity and storage coefficient values that would be obtained using ordinary least-squares regression and the Cooper-Jacob method. The MFLSR coupled with the Cooper-Jacob method allows the analyst to ascertain the uncertainty that is inherent in the estimated parameters obtained using the simplified Cooper-Jacob method and data that are uncertain due to lack of knowledge regarding the heterogeneity of the aquifer.
Deng, Zhaohong; Choi, Kup-Sze; Jiang, Yizhang; Wang, Shitong
2014-12-01
Inductive transfer learning has attracted increasing attention for the training of effective model in the target domain by leveraging the information in the source domain. However, most transfer learning methods are developed for a specific model, such as the commonly used support vector machine, which makes the methods applicable only to the adopted models. In this regard, the generalized hidden-mapping ridge regression (GHRR) method is introduced in order to train various types of classical intelligence models, including neural networks, fuzzy logical systems and kernel methods. Furthermore, the knowledge-leverage based transfer learning mechanism is integrated with GHRR to realize the inductive transfer learning method called transfer GHRR (TGHRR). Since the information from the induced knowledge is much clearer and more concise than that from the data in the source domain, it is more convenient to control and balance the similarity and difference of data distributions between the source and target domains. The proposed GHRR and TGHRR algorithms have been evaluated experimentally by performing regression and classification on synthetic and real world datasets. The results demonstrate that the performance of TGHRR is competitive with or even superior to existing state-of-the-art inductive transfer learning algorithms. PMID:24710838
Xie, Benhuai; Pan, Wei; Shen, Xiaotong
2010-01-01
Motivation: Model-based clustering has been widely used, e.g. in microarray data analysis. Since for high-dimensional data variable selection is necessary, several penalized model-based clustering methods have been proposed tørealize simultaneous variable selection and clustering. However, the existing methods all assume that the variables are independent with the use of diagonal covariance matrices. Results: To model non-independence of variables (e.g. correlated gene expressions) while alleviating the problem with the large number of unknown parameters associated with a general non-diagonal covariance matrix, we generalize the mixture of factor analyzers to that with penalization, which, among others, can effectively realize variable selection. We use simulated data and real microarray data to illustrate the utility and advantages of the proposed method over several existing ones. Contact: weip@biostat.umn.edu Supplementary information: Supplementary data are available at Bioinformatics online. PMID:20031967
ERIC Educational Resources Information Center
Coskuntuncel, Orkun
2013-01-01
The purpose of this study is two-fold; the first aim being to show the effect of outliers on the widely used least squares regression estimator in social sciences. The second aim is to compare the classical method of least squares with the robust M-estimator using the "determination of coefficient" (R[superscript 2]). For this purpose,…
ERIC Educational Resources Information Center
Gilstrap, Donald L.
2013-01-01
In addition to qualitative methods presented in chaos and complexity theories in educational research, this article addresses quantitative methods that may show potential for future research studies. Although much in the social and behavioral sciences literature has focused on computer simulations, this article explores current chaos and…
SNP Selection in Genome-Wide Association Studies via Penalized Support Vector Machine with MAX Test
Kim, Jinseog; Kim, Dennis (Dong Hwan); Jung, Sin-Ho
2013-01-01
One of main objectives of a genome-wide association study (GWAS) is to develop a prediction model for a binary clinical outcome using single-nucleotide polymorphisms (SNPs) which can be used for diagnostic and prognostic purposes and for better understanding of the relationship between the disease and SNPs. Penalized support vector machine (SVM) methods have been widely used toward this end. However, since investigators often ignore the genetic models of SNPs, a final model results in a loss of efficiency in prediction of the clinical outcome. In order to overcome this problem, we propose a two-stage method such that the the genetic models of each SNP are identified using the MAX test and then a prediction model is fitted using a penalized SVM method. We apply the proposed method to various penalized SVMs and compare the performance of SVMs using various penalty functions. The results from simulations and real GWAS data analysis show that the proposed method performs better than the prediction methods ignoring the genetic models in terms of prediction power and selectivity. PMID:24174989
Huang, Lei
2015-01-01
To solve the problem in which the conventional ARMA modeling methods for gyro random noise require a large number of samples and converge slowly, an ARMA modeling method using a robust Kalman filtering is developed. The ARMA model parameters are employed as state arguments. Unknown time-varying estimators of observation noise are used to achieve the estimated mean and variance of the observation noise. Using the robust Kalman filtering, the ARMA model parameters are estimated accurately. The developed ARMA modeling method has the advantages of a rapid convergence and high accuracy. Thus, the required sample size is reduced. It can be applied to modeling applications for gyro random noise in which a fast and accurate ARMA modeling method is required. PMID:26437409
Huang, Lei
2015-01-01
To solve the problem in which the conventional ARMA modeling methods for gyro random noise require a large number of samples and converge slowly, an ARMA modeling method using a robust Kalman filtering is developed. The ARMA model parameters are employed as state arguments. Unknown time-varying estimators of observation noise are used to achieve the estimated mean and variance of the observation noise. Using the robust Kalman filtering, the ARMA model parameters are estimated accurately. The developed ARMA modeling method has the advantages of a rapid convergence and high accuracy. Thus, the required sample size is reduced. It can be applied to modeling applications for gyro random noise in which a fast and accurate ARMA modeling method is required. PMID:26437409
NASA Astrophysics Data System (ADS)
Borodachev, S. M.
2016-06-01
The simple derivation of recursive least squares (RLS) method equations is given as special case of Kalman filter estimation of a constant system state under changing observation conditions. A numerical example illustrates application of RLS to multicollinearity problem.
Boundary integral equation method calculations of surface regression effects in flame spreading
NASA Technical Reports Server (NTRS)
Altenkirch, R. A.; Rezayat, M.; Eichhorn, R.; Rizzo, F. J.
1982-01-01
A solid-phase conduction problem that is a modified version of one that has been treated previously in the literature and is applicable to flame spreading over a pyrolyzing fuel is solved using a boundary integral equation (BIE) method. Results are compared to surface temperature measurements that can be found in the literature. In addition, the heat conducted through the solid forward of the flame, the heat transfer responsible for sustaining the flame, is also computed in terms of the Peclet number based on a heated layer depth using the BIE method and approximate methods based on asymptotic expansions. Agreement between computed and experimental results is quite good as is agreement between the BIE and the approximate results.
Technology Transfer Automated Retrieval System (TEKTRAN)
The beard testing method for measuring cotton fiber length is based on the fibrogram theory. However, in the instrumental implementations, the engineering complexity alters the original fiber length distribution observed by the instrument. This causes challenges in obtaining the entire original le...
Using a Linear Regression Method to Detect Outliers in IRT Common Item Equating
ERIC Educational Resources Information Center
He, Yong; Cui, Zhongmin; Fang, Yu; Chen, Hanwei
2013-01-01
Common test items play an important role in equating alternate test forms under the common item nonequivalent groups design. When the item response theory (IRT) method is applied in equating, inconsistent item parameter estimates among common items can lead to large bias in equated scores. It is prudent to evaluate inconsistency in parameter…
NASA Astrophysics Data System (ADS)
Salonen, J. Sakari; Luoto, Miska; Alenius, Teija; Heikkilä, Maija; Seppä, Heikki; Telford, Richard J.; Birks, H. John B.
2014-03-01
We test and analyse a new calibration method, boosted regression trees (BRTs) in palaeoclimatic reconstructions based on fossil pollen assemblages. We apply BRTs to multiple Holocene and Lateglacial pollen sequences from northern Europe, and compare their performance with two commonly-used calibration methods: weighted averaging regression (WA) and the modern-analogue technique (MAT). Using these calibration methods and fossil pollen data, we present synthetic reconstructions of Holocene summer temperature, winter temperature, and water balance changes in northern Europe. Highly consistent trends are found for summer temperature, with a distinct Holocene thermal maximum at ca 8000-4000 cal. a BP, with a mean Tjja anomaly of ca +0.7 °C at 6 ka compared to 0.5 ka. We were unable to reconstruct reliably winter temperature or water balance, due to the confounding effects of summer temperature and the great between-reconstruction variability. We find BRTs to be a promising tool for quantitative reconstructions from palaeoenvironmental proxy data. BRTs show good performance in cross-validations compared with WA and MAT, can model a variety of taxon response types, find relevant predictors and incorporate interactions between predictors, and show some robustness with non-analogue fossil assemblages.
NASA Astrophysics Data System (ADS)
Dogulu, N.; López López, P.; Solomatine, D. P.; Weerts, A. H.; Shrestha, D. L.
2015-07-01
In operational hydrology, estimation of the predictive uncertainty of hydrological models used for flood modelling is essential for risk-based decision making for flood warning and emergency management. In the literature, there exists a variety of methods analysing and predicting uncertainty. However, studies devoted to comparing the performance of the methods in predicting uncertainty are limited. This paper focuses on the methods predicting model residual uncertainty that differ in methodological complexity: quantile regression (QR) and UNcertainty Estimation based on local Errors and Clustering (UNEEC). The comparison of the methods is aimed at investigating how well a simpler method using fewer input data performs over a more complex method with more predictors. We test these two methods on several catchments from the UK that vary in hydrological characteristics and the models used. Special attention is given to the methods' performance under different hydrological conditions. Furthermore, normality of model residuals in data clusters (identified by UNEEC) is analysed. It is found that basin lag time and forecast lead time have a large impact on the quantification of uncertainty and the presence of normality in model residuals' distribution. In general, it can be said that both methods give similar results. At the same time, it is also shown that the UNEEC method provides better performance than QR for small catchments with the changing hydrological dynamics, i.e. rapid response catchments. It is recommended that more case studies of catchments of distinct hydrologic behaviour, with diverse climatic conditions, and having various hydrological features, be considered.
Tiedeman, C.R.; Kernodle, J.M.; McAda, D.P.
1998-01-01
This report documents the application of nonlinear-regression methods to a numerical model of ground-water flow in the Albuquerque Basin, New Mexico. In the Albuquerque Basin, ground water is the primary source for most water uses. Ground-water withdrawal has steadily increased since the 1940's, resulting in large declines in water levels in the Albuquerque area. A ground-water flow model was developed in 1994 and revised and updated in 1995 for the purpose of managing basin ground- water resources. In the work presented here, nonlinear-regression methods were applied to a modified version of the previous flow model. Goals of this work were to use regression methods to calibrate the model with each of six different configurations of the basin subsurface and to assess and compare optimal parameter estimates, model fit, and model error among the resulting calibrations. The Albuquerque Basin is one in a series of north trending structural basins within the Rio Grande Rift, a region of Cenozoic crustal extension. Mountains, uplifts, and fault zones bound the basin, and rock units within the basin include pre-Santa Fe Group deposits, Tertiary Santa Fe Group basin fill, and post-Santa Fe Group volcanics and sediments. The Santa Fe Group is greater than 14,000 feet (ft) thick in the central part of the basin. During deposition of the Santa Fe Group, crustal extension resulted in development of north trending normal faults with vertical displacements of as much as 30,000 ft. Ground-water flow in the Albuquerque Basin occurs primarily in the Santa Fe Group and post-Santa Fe Group deposits. Water flows between the ground-water system and surface-water bodies in the inner valley of the basin, where the Rio Grande, a network of interconnected canals and drains, and Cochiti Reservoir are located. Recharge to the ground-water flow system occurs as infiltration of precipitation along mountain fronts and infiltration of stream water along tributaries to the Rio Grande; subsurface
Stojić, Andreja; Maletić, Dimitrije; Stanišić Stojić, Svetlana; Mijić, Zoran; Šoštarić, Andrej
2015-07-15
In this study, advanced multivariate methods were applied for VOC source apportionment and subsequent short-term forecast of industrial- and vehicle exhaust-related contributions in Belgrade urban area (Serbia). The VOC concentrations were measured using PTR-MS, together with inorganic gaseous pollutants (NOx, NO, NO2, SO2, and CO), PM10, and meteorological parameters. US EPA Positive Matrix Factorization and Unmix receptor models were applied to the obtained dataset both resolving six source profiles. For the purpose of forecasting industrial- and vehicle exhaust-related source contributions, different multivariate methods were employed in two separate cases, relying on meteorological data, and on meteorological data and concentrations of inorganic gaseous pollutants, respectively. The results indicate that Boosted Decision Trees and Multi-Layer Perceptrons were the best performing methods. According to the results, forecasting accuracy was high (lowest relative error of only 6%), in particular when the forecast was based on both meteorological parameters and concentrations of inorganic gaseous pollutants. PMID:25828408
NASA Astrophysics Data System (ADS)
Qu, Yonghua; Jiao, Siong; Lin, Xudong
2008-10-01
Hetao Irrigation District located in Inner Mongolia, is one of the three largest irrigated area in China. In the irrigational agriculture region, for the reasons that many efforts have been put on irrigation rather than on drainage, as a result much sedimentary salt that usually is solved in water has been deposited in surface soil. So there has arisen a problem in such irrigation district that soil salinity has become a chief fact which causes land degrading. Remote sensing technology is an efficiency way to map the salinity in regional scale. In the principle of remote sensing, soil spectrum is one of the most important indications which can be used to reflect the status of soil salinity. In the past decades, many efforts have been made to reveal the spectrum characteristics of the salinized soil, such as the traditional statistic regression method. But it also has been found that when the hyper-spectral reflectance data are considered, the traditional regression method can't be treat the large dimension data, because the hyper-spectral data usually have too higher spectral band number. In this paper, a partial least squares regression (PLSR) model was established based on the statistical analysis on the soil salinity and the reflectance of hyper-spectral. Dataset were collect through the field soil samples were collected in the region of Hetao irrigation from the end of July to the beginning of August. The independent validation using data which are not included in the calibration model reveals that the proposed model can predicate the main soil components such as the content of total ions(S%), PH with higher determination coefficients(R2) of 0.728 and 0.715 respectively. And the rate of prediction to deviation(RPD) of the above predicted value are larger than 1.6, which indicates that the calibrated PLSR model can be used as a tool to retrieve soil salinity with accurate results. When the PLSR model's regression coefficients were aggregated according to the
Yang, Jianli; Bai, Yang; Li, Guojun; Liu, Ming; Liu, Xiuling
2015-01-01
Premature ventricular contraction (PVC) is one of the most serious arrhythmias. Without early diagnosis and proper treatment, PVC can result in significant complications. In this paper, a novel feature extraction method based on a sparse auto-encoder (SAE) and softmax regression (SR) classifier was used to differentiate PVCs from other common Non-PVC rhythms, including normal sinus (N), left bundle branch block (LBBB), right bundle branch block (RBBB), atrial premature contraction (APC), and paced beat (PB) rhythms. The proposed method was analyzed using 40 ECG records obtained from the MIT-BIH Arrhythmia Database. The proposed method exhibited an overall accuracy of 99.4%, with a PVC recognition sensitivity and positive predictability of 97.9% and 91.8%, respectively. PMID:26405919
NASA Astrophysics Data System (ADS)
Dogulu, N.; López López, P.; Solomatine, D. P.; Weerts, A. H.; Shrestha, D. L.
2014-09-01
In operational hydrology, estimation of predictive uncertainty of hydrological models used for flood modelling is essential for risk based decision making for flood warning and emergency management. In the literature, there exists a variety of methods analyzing and predicting uncertainty. However, case studies comparing performance of these methods, most particularly predictive uncertainty methods, are limited. This paper focuses on two predictive uncertainty methods that differ in their methodological complexity: quantile regression (QR) and UNcertainty Estimation based on local Errors and Clustering (UNEEC), aiming at identifying possible advantages and disadvantages of these methods (both estimating residual uncertainty) based on their comparative performance. We test these two methods on several catchments (from UK) that vary in its hydrological characteristics and models. Special attention is given to the errors for high flow/water level conditions. Furthermore, normality of model residuals is discussed in view of clustering approach employed within the framework of UNEEC method. It is found that basin lag time and forecast lead time have great impact on quantification of uncertainty (in the form of two quantiles) and achievement of normality in model residuals' distribution. In general, uncertainty analysis results from different case studies indicate that both methods give similar results. However, it is also shown that UNEEC method provides better performance than QR for small catchments with changing hydrological dynamics, i.e. rapid response catchments. We recommend that more case studies of catchments from regions of distinct hydrologic behaviour, with diverse climatic conditions, and having various hydrological features be tested.
Lin, Wei; Feng, Rui; Li, Hongzhe
2014-01-01
In genetical genomics studies, it is important to jointly analyze gene expression data and genetic variants in exploring their associations with complex traits, where the dimensionality of gene expressions and genetic variants can both be much larger than the sample size. Motivated by such modern applications, we consider the problem of variable selection and estimation in high-dimensional sparse instrumental variables models. To overcome the difficulty of high dimensionality and unknown optimal instruments, we propose a two-stage regularization framework for identifying and estimating important covariate effects while selecting and estimating optimal instruments. The methodology extends the classical two-stage least squares estimator to high dimensions by exploiting sparsity using sparsity-inducing penalty functions in both stages. The resulting procedure is efficiently implemented by coordinate descent optimization. For the representative L1 regularization and a class of concave regularization methods, we establish estimation, prediction, and model selection properties of the two-stage regularized estimators in the high-dimensional setting where the dimensionality of co-variates and instruments are both allowed to grow exponentially with the sample size. The practical performance of the proposed method is evaluated by simulation studies and its usefulness is illustrated by an analysis of mouse obesity data. Supplementary materials for this article are available online. PMID:26392642
NASA Astrophysics Data System (ADS)
Liu, Jiaqi; Han, Jing; Zhang, Yi; Bai, Lianfa
2015-10-01
Locally adaptive regression kernels model can describe the edge shape of images accurately and graphic trend of images integrally, but it did not consider images' color information while the color is an important element of an image. Therefore, we present a novel method of target recognition based on 3-D-color-space locally adaptive regression kernels model. Different from the general additional color information, this method directly calculate the local similarity features of 3-D data from the color image. The proposed method uses a few examples of an object as a query to detect generic objects with incompact, complex and changeable shapes. Our method involves three phases: First, calculating the novel color-space descriptors from the RGB color space of query image which measure the likeness of a voxel to its surroundings. Salient features which include spatial- dimensional and color -dimensional information are extracted from said descriptors, and simplifying them to construct a non-similar local structure feature set of the object class by principal components analysis (PCA). Second, we compare the salient features with analogous features from the target image. This comparison is done using a matrix generalization of the cosine similarity measure. Then the similar structures in the target image are obtained using local similarity structure statistical matching. Finally, we use the method of non-maxima suppression in the similarity image to extract the object position and mark the object in the test image. Experimental results demonstrate that our approach is effective and accurate in improving the ability to identify targets.
Strong, Mark; Oakley, Jeremy E; Brennan, Alan; Breeze, Penny
2015-07-01
Health economic decision-analytic models are used to estimate the expected net benefits of competing decision options. The true values of the input parameters of such models are rarely known with certainty, and it is often useful to quantify the value to the decision maker of reducing uncertainty through collecting new data. In the context of a particular decision problem, the value of a proposed research design can be quantified by its expected value of sample information (EVSI). EVSI is commonly estimated via a 2-level Monte Carlo procedure in which plausible data sets are generated in an outer loop, and then, conditional on these, the parameters of the decision model are updated via Bayes rule and sampled in an inner loop. At each iteration of the inner loop, the decision model is evaluated. This is computationally demanding and may be difficult if the posterior distribution of the model parameters conditional on sampled data is hard to sample from. We describe a fast nonparametric regression-based method for estimating per-patient EVSI that requires only the probabilistic sensitivity analysis sample (i.e., the set of samples drawn from the joint distribution of the parameters and the corresponding net benefits). The method avoids the need to sample from the posterior distributions of the parameters and avoids the need to rerun the model. The only requirement is that sample data sets can be generated. The method is applicable with a model of any complexity and with any specification of model parameter distribution. We demonstrate in a case study the superior efficiency of the regression method over the 2-level Monte Carlo method. PMID:25810269
NASA Astrophysics Data System (ADS)
Bell, A. L.; Moore, J. N.; Greenwood, M. C.
2007-12-01
The Flathead River in Northwestern Montana drains the relatively pristine, high-mountain watersheds of Glacier- Waterton national parks and large wilderness areas making it an excellent test-bed for hydrologic response to climate change. Flows in the North Fork and Middle Fork of the Flathead River are relatively unmodified by humans, whereas the South Fork has a large hydroelectric reservoir (Hungry Horse) in the lower end of the basin. USGS stream gage data for the North, Middle and South forks from 1940 to 2006 were analyzed for significant trends in the timing of quantiles of flow to examine climate forcing vs. direct modification of flow from the dam. The trends in timing were analyzed for climate change influences using the PRISM model output for 1940 to 2006 for the respective basin. The analysis of trends in timing employed two linear regression methods, typical least squares estimation and robust estimation using weighted least squares. Least squares estimation is the standard method employed when performing regression analysis. The power of this method is sensitive to the violation of the assumptions of normally distributed errors with constant variance (homoscedasticity). Considering that violations of these assumptions are common in hydrologic data, robust estimation was used to preserve the desired statistical power because it is not significantly affected by non-normality or heteroscedasticity. Least squares estimated trends that were found to be significant, using a 10% significance level, were typically not significant using a robust estimation method. This could have implications for interpreting the meaning of significant trends found using the least squares estimator. Utilizing robust estimation methods for analyzing hydrologic data may allow investigators to more accurately summarize any trends.
Shrinkage Estimation of Varying Covariate Effects Based On Quantile Regression
Peng, Limin; Xu, Jinfeng; Kutner, Nancy
2013-01-01
Varying covariate effects often manifest meaningful heterogeneity in covariate-response associations. In this paper, we adopt a quantile regression model that assumes linearity at a continuous range of quantile levels as a tool to explore such data dynamics. The consideration of potential non-constancy of covariate effects necessitates a new perspective for variable selection, which, under the assumed quantile regression model, is to retain variables that have effects on all quantiles of interest as well as those that influence only part of quantiles considered. Current work on l1-penalized quantile regression either does not concern varying covariate effects or may not produce consistent variable selection in the presence of covariates with partial effects, a practical scenario of interest. In this work, we propose a shrinkage approach by adopting a novel uniform adaptive LASSO penalty. The new approach enjoys easy implementation without requiring smoothing. Moreover, it can consistently identify the true model (uniformly across quantiles) and achieve the oracle estimation efficiency. We further extend the proposed shrinkage method to the case where responses are subject to random right censoring. Numerical studies confirm the theoretical results and support the utility of our proposals. PMID:25332515
NASA Astrophysics Data System (ADS)
Zobitz, J. M.; Burns, S. P.; OgéE, J.; Reichstein, M.; Bowling, D. R.
2007-09-01
Separation of the net ecosystem exchange of CO2 (F) into its component fluxes of net photosynthesis (FA) and nonfoliar respiration (FR) is important in understanding the physical and environmental controls on these fluxes, and how these fluxes may respond to environmental change. In this paper, we evaluate a partitioning method based on a combination of stable isotopes of CO2 and Bayesian optimization in the context of partitioning methods based on regressions with environmental variables. We combined high-resolution measurements of stable carbon isotopes of CO2, ecosystem fluxes, and meteorological variables with a Bayesian parameter optimization approach to estimate FA and FR in a subalpine forest in Colorado, United States, over the course of 104 days during summer 2003. Results were generally in agreement with the independent environmental regression methods of Reichstein et al. (2005a) and Yi et al. (2004). Half-hourly posterior parameter estimates of FA and FR derived from the Bayesian/isotopic method showed a strong diurnal pattern in both, consistent with established gross photosynthesis (GEE) and total ecosystem respiration (TER) relationships. Isotope-derived FA was functionally dependent on light, but FR exhibited the expected temperature dependence only when the prior estimates for FR were temperature-based. Examination of the posterior correlation matrix revealed that the available data were insufficient to independently resolve all the Bayesian-estimated parameters in our model. This could be due to a small isotopic disequilibrium (?) between FA and FR, poor characterization of whole-canopy photosynthetic discrimination or the isotopic flux (isoflux, analogous to net ecosystem exchange of 13CO2). The positive sign of ? indicates that FA was more enriched in 13C than FR. Possible reasons for this are discussed in the context of recent literature.
Numerical modeling of flexible insect wings using volume penalization
NASA Astrophysics Data System (ADS)
Engels, Thomas; Kolomenskiy, Dmitry; Schneider, Kai; Sesterhenn, Joern
2012-11-01
We consider the effects of chordwise flexibility on the aerodynamic performance of insect flapping wings. We developed a numerical method for modeling viscous fluid flows past moving deformable foils. It extends on the previously reported model for flows past moving rigid wings (J Comput Phys 228, 2009). The two-dimensional Navier-Stokes equations are solved using a Fourier pseudo-spectral method with the no-slip boundary conditions imposed by the volume penalization method. The deformable wing section is modeled using a non-linear beam equation. We performed numerical simulations of heaving flexible plates. The results showed that the optimal stroke frequency, which maximizes the mean thrust, is lower than the resonant frequency, in agreement with the experiments by Ramananarivo et al. (PNAS 108(15), 2011). The oscillatory part of the force only increases in amplitude when the frequency increases, and at the optimal frequency it is about 3 times larger than the mean force. We also study aerodynamic interactions between two heaving flexible foils. This flow configuration corresponds to the wings of dragonflies. We explore the effects of the phase difference and spacing between the fore- and hind-wing.
Zhang, L; Liu, X J
2016-01-01
With the rapid development of next-generation high-throughput sequencing technology, RNA-seq has become a standard and important technique for transcriptome analysis. For multi-sample RNA-seq data, the existing expression estimation methods usually deal with each single-RNA-seq sample, and ignore that the read distributions are consistent across multiple samples. In the current study, we propose a structured sparse regression method, SSRSeq, to estimate isoform expression using multi-sample RNA-seq data. SSRSeq uses a non-parameter model to capture the general tendency of non-uniformity read distribution for all genes across multiple samples. Additionally, our method adds a structured sparse regularization, which not only incorporates the sparse specificity between a gene and its corresponding isoform expression levels, but also reduces the effects of noisy reads, especially for lowly expressed genes and isoforms. Four real datasets were used to evaluate our method on isoform expression estimation. Compared with other popular methods, SSRSeq reduced the variance between multiple samples, and produced more accurate isoform expression estimations, and thus more meaningful biological interpretations. PMID:27323111
Swartz, Michael D.; Yu, Robert K.; Shete, Sanjay
2011-01-01
When modeling the risk of a disease, the very act of selecting the factors to include can heavily impact the results. This study compares the performance of several variable selection techniques applied to logistic regression. We performed realistic simulation studies to compare five methods of variable selection: (1) a confidence interval approach for significant coefficients (CI), (2) backward selection, (3) forward selection, (4) stepwise selection, and (5) Bayesian stochastic search variable selection (SSVS) using both informed and uniformed priors. We defined our simulated diseases mimicking odds ratios for cancer risk found in the literature for environmental factors, such as smoking; dietary risk factors, such as fiber; genetic risk factors such as XPD; and interactions. We modeled the distribution of our covariates, including correlation, after the reported empirical distributions of these risk factors. We also used a null data set to calibrate the priors of the Bayesian method and evaluate its sensitivity. Of the standard methods (95% CI, backward, forward and stepwise selection) the CI approach resulted in the highest average percent of correct associations and lowest average percent of incorrect associations. SSVS with an informed prior had higher average percent of correct associations and lower average percent of incorrect associations than did the CI approach. This study shows that Bayesian methods offer a way to use prior information to both increase power and decrease false-positive results when selecting factors to model complex disease risk. PMID:18937224
Orthogonal Regression: A Teaching Perspective
ERIC Educational Resources Information Center
Carr, James R.
2012-01-01
A well-known approach to linear least squares regression is that which involves minimizing the sum of squared orthogonal projections of data points onto the best fit line. This form of regression is known as orthogonal regression, and the linear model that it yields is known as the major axis. A similar method, reduced major axis regression, is…
Li, J.; Gray, B.R.; Bates, D.M.
2008-01-01
Partitioning the variance of a response by design levels is challenging for binomial and other discrete outcomes. Goldstein (2003) proposed four definitions for variance partitioning coefficients (VPC) under a two-level logistic regression model. In this study, we explicitly derived formulae for multi-level logistic regression model and subsequently studied the distributional properties of the calculated VPCs. Using simulations and a vegetation dataset, we demonstrated associations between different VPC definitions, the importance of methods for estimating VPCs (by comparing VPC obtained using Laplace and penalized quasilikehood methods), and bivariate dependence between VPCs calculated at different levels. Such an empirical study lends an immediate support to wider applications of VPC in scientific data analysis.
Maximum penalized likelihood estimation in semiparametric mark-recapture-recovery models.
Michelot, Théo; Langrock, Roland; Kneib, Thomas; King, Ruth
2016-01-01
We discuss the semiparametric modeling of mark-recapture-recovery data where the temporal and/or individual variation of model parameters is explained via covariates. Typically, in such analyses a fixed (or mixed) effects parametric model is specified for the relationship between the model parameters and the covariates of interest. In this paper, we discuss the modeling of the relationship via the use of penalized splines, to allow for considerably more flexible functional forms. Corresponding models can be fitted via numerical maximum penalized likelihood estimation, employing cross-validation to choose the smoothing parameters in a data-driven way. Our contribution builds on and extends the existing literature, providing a unified inferential framework for semiparametric mark-recapture-recovery models for open populations, where the interest typically lies in the estimation of survival probabilities. The approach is applied to two real datasets, corresponding to gray herons (Ardea cinerea), where we model the survival probability as a function of environmental condition (a time-varying global covariate), and Soay sheep (Ovis aries), where we model the survival probability as a function of individual weight (a time-varying individual-specific covariate). The proposed semiparametric approach is compared to a standard parametric (logistic) regression and new interesting underlying dynamics are observed in both cases. PMID:26289495
NASA Astrophysics Data System (ADS)
Thomas, G. E.; Bardeen, C.; Benze, S.
2014-12-01
Simulations of Polar Mesospheric Cloud (PMC) brightness and ice water content (IWC) are used to develop a simple robust method for IWC retrieval from UV satellite observations. We compare model simulations of IWC with retrievals from the UV Cloud Imaging and Particle Size (CIPS) experiment on board the satellite mission Aeronomy for Ice in the Mesosphere (AIM). This instrument remotely senses scattered brightness related to the vertically-integrated ice content. Simulations from the Whole Atmosphere Community Climate Model (WACCM), a chemistry climate model, is combined with a sectional microphysics model based on the Community Aerosol and Radiation Model for Atmospheres (CARMA). The model calculates high-resolution three-dimensional size distributions of ice particles. The internal variability is due to geographic and temporal variation of temperature and dynamics, water vapor, and meteoric dust. We examine all simulations from a single model day (we chose northern summer solstice) which contains several thousand model clouds. Accurate vertical integrations of the albedo and IWC are obtained. The ice size distributions are thus based on physical principles, rather than artificial analytic distributions that are often used in retrieval algorithms from observations. Treating the model clouds as noise-free data, we apply the CIPS algorithm to retrieve cloud particle size and IWC. The inherent "errors" in the retrievals are thus estimated. The linear dependence of IWC on albedo makes possible a method to derive IWC, called the Albedo-Ice regression method, or AIR. This method potentially unifies the variety of data from various UV experiments, with the advantages of (1) removing scattering-angle bias from cloud brightness measurements,(2) providing a physically-useful parameter (IWC),(3) deriving IWC even for faint clouds of small average particle sizes, and (4) estimating the statistical uncertainty as a random error, which bypasses the need to derive particle size.
Rousselot, J M; Peslin, R; Duvivier, C
1992-07-01
A potentially useful method to monitor respiratory mechanics in artificially ventilated patients consists of analyzing the relationship between tracheal pressure (P), lung volume (V), and gas flow (V) by multiple linear regression (MLR) using a suitable model. Contrary to other methods, it does not require any particular flow waveform and, therefore, may be used with any ventilator. This approach was evaluated in three neonates and seven young children admitted into an intensive care unit for respiratory disorders of various etiologies. P and V were measured and digitized at a sampling rate of 40 Hz for periods of 20-48 s. After correction of P for the non-linear resistance of the endotracheal tube, the data were first analyzed with the usual linear monoalveolar model: P = PO + E.V + R.V where E and R are total respiratory elastance and resistance, and PO is the static recoil pressure at end-expiration. A good fit of the model to the data was seen in five of ten children. PO, E, and R were reproducible within cycles, and consistent with the patient's age and condition; the data obtained with two ventilatory modes were highly correlated. In the five instances in which the simple model did not fit the data well, they were reanalyzed with more sophisticated models allowing for mechanical non-homogeneity or for non-linearity of R or E. While several models substantially improved the fit, physiologically meaningful results were only obtained when R was allowed to change with lung volume. We conclude that the MLR method is adequate to monitor respiratory mechanics, even when the usual model is inadequate. PMID:1437330
Akita, Yasuyuki; Baldasano, Jose M; Beelen, Rob; Cirach, Marta; de Hoogh, Kees; Hoek, Gerard; Nieuwenhuijsen, Mark; Serre, Marc L; de Nazelle, Audrey
2014-04-15
In recognition that intraurban exposure gradients may be as large as between-city variations, recent air pollution epidemiologic studies have become increasingly interested in capturing within-city exposure gradients. In addition, because of the rapidly accumulating health data, recent studies also need to handle large study populations distributed over large geographic domains. Even though several modeling approaches have been introduced, a consistent modeling framework capturing within-city exposure variability and applicable to large geographic domains is still missing. To address these needs, we proposed a modeling framework based on the Bayesian Maximum Entropy method that integrates monitoring data and outputs from existing air quality models based on Land Use Regression (LUR) and Chemical Transport Models (CTM). The framework was applied to estimate the yearly average NO2 concentrations over the region of Catalunya in Spain. By jointly accounting for the global scale variability in the concentration from the output of CTM and the intraurban scale variability through LUR model output, the proposed framework outperformed more conventional approaches. PMID:24621302
Cao, M H; Adeola, O
2016-02-01
The energy values of poultry byproduct meal (PBM) and animal-vegetable oil blend (A-V blend) were determined in 2 experiments with 288 broiler chickens from d 19 to 25 post hatching. The birds were fed a starter diet from d 0 to 19 post hatching. In each experiment, 144 birds were grouped by weight into 8 replicates of cages with 6 birds per cage. There were 3 diets in each experiment consisting of one reference diet (RD) and 2 test diets (TD). The TD contained 2 levels of PBM (Exp. 1) or A-V blend (Exp. 2) that replaced the energy sources in the RD at 50 or 100 g/kg (Exp. 1) or 40 or 80 g/kg (Exp. 2) in such a way that the same ratio were maintained for energy ingredients across experimental diets. The ileal digestible energy (IDE), ME, and MEn of PBM and A-V blend were determined by the regression method. Dry matter of PBM and A-V blend were 984 and 999 g/kg; the gross energies were 5,284 and 9,604 kcal/kg of DM, respectively. Addition of PBM to the RD in Exp. 1 linearly decreased (P < 0.05) DM, ileal and total tract of DM, energy and nitrogen digestibilities and utilization. In Exp. 2, addition of A-V blend to the RD linearly increased (P < 0.001) ileal digestibilities and total tract utilization of DM, energy and nitrogen as well as IDE, ME, and MEn. Regressions of PBM-associated IDE, ME, or MEn intake in kcal against PBM intake were: IDE = 3,537x + 4.953, r(2) = 0.97; ME = 3,805x + 1.279, r(2) = 0.97; MEn = 3,278x + 0.164, r(2) = 0.90; and A-V blend as follows: IDE = 10,616x + 7.350, r(2) = 0.96; ME = 10,121x + 0.447, r(2) = 0.99; MEn = 10,124x + 2.425, r(2) = 0.99. These data indicate the respective IDE, ME, MEn values (kcal/kg of DM) of PBM evaluated to be 3,537, 3,805, and 3,278, and A-V blend evaluated to be 10,616, 10,121, and 10,124. PMID:26628339
Education--Penal Institutions: U. S. and Europe.
ERIC Educational Resources Information Center
Kerle, Ken
Penal systems of European countries vary in educational programs and humanizing efforts. A high percentage of Soviet prisoners, many incarcerated for ideological/religious beliefs, are confined to labor colonies. All inmates are obligated to learn a trade, one of the qualifications for release being evidence of some trade skill. Swedish…
Crime and Punishment: Are Copyright Violators Ever Penalized?
ERIC Educational Resources Information Center
Russell, Carrie
2004-01-01
Is there a Web site that keeps track of copyright Infringers and fines? Some colleagues don't believe that copyright violators are ever penalized. This question was asked by a reader in a question and answer column of "School Library Journal". Carrie Russell is the American Library Association's copyright specialist, and she will answer selected…
Multivariate Regression with Calibration*
Liu, Han; Wang, Lie; Zhao, Tuo
2014-01-01
We propose a new method named calibrated multivariate regression (CMR) for fitting high dimensional multivariate regression models. Compared to existing methods, CMR calibrates the regularization for each regression task with respect to its noise level so that it is simultaneously tuning insensitive and achieves an improved finite-sample performance. Computationally, we develop an efficient smoothed proximal gradient algorithm which has a worst-case iteration complexity O(1/ε), where ε is a pre-specified numerical accuracy. Theoretically, we prove that CMR achieves the optimal rate of convergence in parameter estimation. We illustrate the usefulness of CMR by thorough numerical simulations and show that CMR consistently outperforms other high dimensional multivariate regression methods. We also apply CMR on a brain activity prediction problem and find that CMR is as competitive as the handcrafted model created by human experts. PMID:25620861
Lange, Kenneth; Papp, Jeanette C.; Sinsheimer, Janet S.; Sobel, Eric M.
2014-01-01
Statistical genetics is undergoing the same transition to big data that all branches of applied statistics are experiencing. With the advent of inexpensive DNA sequencing, the transition is only accelerating. This brief review highlights some modern techniques with recent successes in statistical genetics. These include: (a) lasso penalized regression and association mapping, (b) ethnic admixture estimation, (c) matrix completion for genotype and sequence data, (d) the fused lasso and copy number variation, (e) haplotyping, (f) estimation of relatedness, (g) variance components models, and (h) rare variant testing. For more than a century, genetics has been both a driver and beneficiary of statistical theory and practice. This symbiotic relationship will persist for the foreseeable future. PMID:24955378
NASA Astrophysics Data System (ADS)
Vozinaki, Anthi Eirini K.; Karatzas, George P.; Sibetheros, Ioannis A.; Varouchakis, Emmanouil A.
2014-05-01
Damage curves are the most significant component of the flood loss estimation models. Their development is quite complex. Two types of damage curves exist, historical and synthetic curves. Historical curves are developed from historical loss data from actual flood events. However, due to the scarcity of historical data, synthetic damage curves can be alternatively developed. Synthetic curves rely on the analysis of expected damage under certain hypothetical flooding conditions. A synthetic approach was developed and presented in this work for the development of damage curves, which are subsequently used as the basic input to a flood loss estimation model. A questionnaire-based survey took place among practicing and research agronomists, in order to generate rural loss data based on the responders' loss estimates, for several flood condition scenarios. In addition, a similar questionnaire-based survey took place among building experts, i.e. civil engineers and architects, in order to generate loss data for the urban sector. By answering the questionnaire, the experts were in essence expressing their opinion on how damage to various crop types or building types is related to a range of values of flood inundation parameters, such as floodwater depth and velocity. However, the loss data compiled from the completed questionnaires were not sufficient for the construction of workable damage curves; to overcome this problem, a Weighted Monte Carlo method was implemented, in order to generate extra synthetic datasets with statistical properties identical to those of the questionnaire-based data. The data generated by the Weighted Monte Carlo method were processed via Logistic Regression techniques in order to develop accurate logistic damage curves for the rural and the urban sectors. A Python-based code was developed, which combines the Weighted Monte Carlo method and the Logistic Regression analysis into a single code (WMCLR Python code). Each WMCLR code execution
Jen, Min-Hua; Bottle, Alex; Kirkwood, Graham; Johnston, Ron; Aylin, Paul
2011-09-01
We have previously described a system for monitoring a number of healthcare outcomes using case-mix adjustment models. It is desirable to automate the model fitting process in such a system if monitoring covers a large number of outcome measures or subgroup analyses. Our aim was to compare the performance of three different variable selection strategies: "manual", "automated" backward elimination and re-categorisation, and including all variables at once, irrespective of their apparent importance, with automated re-categorisation. Logistic regression models for predicting in-hospital mortality and emergency readmission within 28 days were fitted to an administrative database for 78 diagnosis groups and 126 procedures from 1996 to 2006 for National Health Services hospital trusts in England. The performance of models was assessed with Receiver Operating Characteristic (ROC) c statistics, (measuring discrimination) and Brier score (assessing the average of the predictive accuracy). Overall, discrimination was similar for diagnoses and procedures and consistently better for mortality than for emergency readmission. Brier scores were generally low overall (showing higher accuracy) and were lower for procedures than diagnoses, with a few exceptions for emergency readmission within 28 days. Among the three variable selection strategies, the automated procedure had similar performance to the manual method in almost all cases except low-risk groups with few outcome events. For the rapid generation of multiple case-mix models we suggest applying automated modelling to reduce the time required, in particular when examining different outcomes of large numbers of procedures and diseases in routinely collected administrative health data. PMID:21556848
Ferragina, A; de los Campos, G; Vazquez, A I; Cecchinato, A; Bittante, G
2015-11-01
The aim of this study was to assess the performance of Bayesian models commonly used for genomic selection to predict "difficult-to-predict" dairy traits, such as milk fatty acid (FA) expressed as percentage of total fatty acids, and technological properties, such as fresh cheese yield and protein recovery, using Fourier-transform infrared (FTIR) spectral data. Our main hypothesis was that Bayesian models that can estimate shrinkage and perform variable selection may improve our ability to predict FA traits and technological traits above and beyond what can be achieved using the current calibration models (e.g., partial least squares, PLS). To this end, we assessed a series of Bayesian methods and compared their prediction performance with that of PLS. The comparison between models was done using the same sets of data (i.e., same samples, same variability, same spectral treatment) for each trait. Data consisted of 1,264 individual milk samples collected from Brown Swiss cows for which gas chromatographic FA composition, milk coagulation properties, and cheese-yield traits were available. For each sample, 2 spectra in the infrared region from 5,011 to 925 cm(-1) were available and averaged before data analysis. Three Bayesian models: Bayesian ridge regression (Bayes RR), Bayes A, and Bayes B, and 2 reference models: PLS and modified PLS (MPLS) procedures, were used to calibrate equations for each of the traits. The Bayesian models used were implemented in the R package BGLR (http://cran.r-project.org/web/packages/BGLR/index.html), whereas the PLS and MPLS were those implemented in the WinISI II software (Infrasoft International LLC, State College, PA). Prediction accuracy was estimated for each trait and model using 25 replicates of a training-testing validation procedure. Compared with PLS, which is currently the most widely used calibration method, MPLS and the 3 Bayesian methods showed significantly greater prediction accuracy. Accuracy increased in moving from
Bolarinwa, O A; Adeola, O
2016-02-01
Direct or indirect methods can be used to determine the DE and ME of feed ingredients for pigs. In situations when only the indirect approach is suitable, the regression method presents a robust indirect approach. Three experiments were conducted to compare the direct and regression methods for determining the DE and ME values of barley, sorghum, and wheat for pigs. In each experiment, 24 barrows with an average initial BW of 31, 32, and 33 kg were assigned to 4 diets in a randomized complete block design. The 4 diets consisted of 969 g barley, sorghum, or wheat/kg plus minerals and vitamins for the direct method; a corn-soybean meal reference diet (RD); the RD + 300 g barley, sorghum, or wheat/kg; and the RD + 600 g barley, sorghum, or wheat/kg. The 3 corn-soybean meal diets were used for the regression method. Each diet was fed to 6 barrows in individual metabolism crates for a 5-d acclimation followed by a 5-d period of total but separate collection of feces and urine in each experiment. Graded substitution of barley or wheat, but not sorghum, into the RD linearly reduced ( < 0.05) dietary DE and ME. The direct method-derived DE and ME for barley were 3,669 and 3,593 kcal/kg DM, respectively. The regressions of barley contribution to DE and ME in kilocalories against the quantity of barley DMI in kilograms generated 3,746 kcal DE/kg DM and 3,647 kcal ME/kg DM. The DE and ME for sorghum by the direct method were 4,097 and 4,042 kcal/kg DM, respectively; the corresponding regression-derived estimates were 4,145 and 4,066 kcal/kg DM. Using the direct method, energy values for wheat were 3,953 kcal DE/kg DM and 3,889 kcal ME/kg DM. The regressions of wheat contribution to DE and ME in kilocalories against the quantity of wheat DMI in kilograms generated 3,960 kcal DE/kg DM and 3,874 kcal ME/kg DM. The DE and ME of barley using the direct method were not different (0.3 < < 0.4) from those obtained using the regression method (3,669 vs. 3,746 and 3,593 vs. 3,647 kcal
A Small-Sample Choice of the Tuning Parameter in Ridge Regression
Boonstra, Philip S.; Mukherjee, Bhramar; Taylor, Jeremy M. G.
2015-01-01
We propose new approaches for choosing the shrinkage parameter in ridge regression, a penalized likelihood method for regularizing linear regression coefficients, when the number of observations is small relative to the number of parameters. Existing methods may lead to extreme choices of this parameter, which will either not shrink the coefficients enough or shrink them by too much. Within this “small-n, large-p” context, we suggest a correction to the common generalized cross-validation (GCV) method that preserves the asymptotic optimality of the original GCV. We also introduce the notion of a “hyperpenalty”, which shrinks the shrinkage parameter itself, and make a specific recommendation regarding the choice of hyperpenalty that empirically works well in a broad range of scenarios. A simple algorithm jointly estimates the shrinkage parameter and regression coefficients in the hyperpenalized likelihood. In a comprehensive simulation study of small-sample scenarios, our proposed approaches offer superior prediction over nine other existing methods. PMID:26985140
Schmid, Matthias; Wickler, Florian; Maloney, Kelly O.; Mitchell, Richard; Fenske, Nora; Mayr, Andreas
2013-01-01
Regression analysis with a bounded outcome is a common problem in applied statistics. Typical examples include regression models for percentage outcomes and the analysis of ratings that are measured on a bounded scale. In this paper, we consider beta regression, which is a generalization of logit models to situations where the response is continuous on the interval (0,1). Consequently, beta regression is a convenient tool for analyzing percentage responses. The classical approach to fit a beta regression model is to use maximum likelihood estimation with subsequent AIC-based variable selection. As an alternative to this established - yet unstable - approach, we propose a new estimation technique called boosted beta regression. With boosted beta regression estimation and variable selection can be carried out simultaneously in a highly efficient way. Additionally, both the mean and the variance of a percentage response can be modeled using flexible nonlinear covariate effects. As a consequence, the new method accounts for common problems such as overdispersion and non-binomial variance structures. PMID:23626706
ERIC Educational Resources Information Center
Guler, Nese; Penfield, Randall D.
2009-01-01
In this study, we investigate the logistic regression (LR), Mantel-Haenszel (MH), and Breslow-Day (BD) procedures for the simultaneous detection of both uniform and nonuniform differential item functioning (DIF). A simulation study was used to assess and compare the Type I error rate and power of a combined decision rule (CDR), which assesses DIF…
ERIC Educational Resources Information Center
Kromrey, Jeffrey D.; Hines, Constance V.
1996-01-01
The accuracy of three analytical formulas for shrinkage estimation and four empirical techniques were investigated in a Monte Carlo study of the coefficient of cross-validity in multiple regression. Substantial statistical bias was evident for all techniques except the formula of M. W. Brown (1975) and multicross-validation. (SLD)
ERIC Educational Resources Information Center
Hick, Thomas L.; Irvine, David J.
To eliminate maturation as a factor in the pretest-posttest design, pretest scores can be converted to anticipate posttest scores using grade equivalent scores from standardized tests. This conversion, known as historical regression, assumes that without specific intervention, growth will continue at the rate (grade equivalents per year of…
Wang, Yuanjia
2011-01-01
Longitudinal data are routinely collected in biomedical research studies. A natural model describing longitudinal data decomposes an individual’s outcome as the sum of a population mean function and random subject-specific deviations. When parametric assumptions are too restrictive, methods modeling the population mean function and the random subject-specific functions nonparametrically are in demand. In some applications, it is desirable to estimate a covariance function of random subject-specific deviations. In this work, flexible yet computationally efficient methods are developed for a general class of semiparametric mixed effects models, where the functional forms of the population mean and the subject-specific curves are unspecified. We estimate nonparametric components of the model by penalized spline (P-spline, [1]), and reparametrize the random curve covariance function by a modified Cholesky decomposition [2] which allows for unconstrained estimation of a positive semidefinite matrix. To provide smooth estimates, we penalize roughness of fitted curves and derive closed form solutions in the maximization step of an EM algorithm. In addition, we present models and methods for longitudinal family data where subjects in a family are correlated and we decompose the covariance function into a subject-level source and observation-level source. We apply these methods to the multi-level Framingham Heart Study data to estimate age-specific heritability of systolic blood pressure (SBP) nonparametrically. PMID:21491474
Penalized Composite Quasi-Likelihood for Ultrahigh-Dimensional Variable Selection
Bradic, Jelena; Fan, Jianqing; Wang, Weiwei
2011-01-01
Summary In high-dimensional model selection problems, penalized least-square approaches have been extensively used. This paper addresses the question of both robustness and efficiency of penalized model selection methods, and proposes a data-driven weighted linear combination of convex loss functions, together with weighted L1-penalty. It is completely data-adaptive and does not require prior knowledge of the error distribution. The weighted L1-penalty is used both to ensure the convexity of the penalty term and to ameliorate the bias caused by the L1-penalty. In the setting with dimensionality much larger than the sample size, we establish a strong oracle property of the proposed method that possesses both the model selection consistency and estimation efficiency for the true non-zero coefficients. As specific examples, we introduce a robust method of composite L1-L2, and optimal composite quantile method and evaluate their performance in both simulated and real data examples. PMID:21589849
NASA Astrophysics Data System (ADS)
Trigila, Alessandro; Iadanza, Carla; Esposito, Carlo; Scarascia-Mugnozza, Gabriele
2015-04-01
first phase of the work addressed to identify the spatial relationships between the landslides location and the 13 related factors by using the Frequency Ratio bivariate statistical method. The analysis was then carried out by adopting a multivariate statistical approach, according to the Logistic Regression technique and Random Forests technique that gave best results in terms of AUC. The models were performed and evaluated with different sample sizes and also taking into account the temporal variation of input variables such as burned areas by wildfire. The most significant outcome of this work are: the relevant influence of the sample size on the model results and the strong importance of some environmental factors (e.g. land use and wildfires) for the identification of the depletion zones of extremely rapid shallow landslides.
NASA Astrophysics Data System (ADS)
Demir, Begüm; Bruzzone, Lorenzo
2012-11-01
This paper presents a novel active learning (AL) technique in the context of ɛ-insensitive support vector regression (SVR) to estimate biophysical parameters from remotely sensed images. The proposed AL method aims at selecting the most informative and representative unlabeled samples which have maximum uncertainty, diversity and density assessed according to the SVR estimation rule. This is achieved on the basis of two consecutive steps that rely on the kernel kmeans clustering. In the first step the most uncertain unlabeled samples are selected by removing the most certain ones from a pool of unlabeled samples. In SVR problems, the most uncertain samples are located outside or on the boundary of the ɛ-tube of SVR, as their target values have the lowest confidence to be correctly estimated. In order to select these samples, the kernel k-means clustering is applied to all unlabeled samples together with the training samples that are not SVs, i.e., those that are inside the ɛ-tube, (non-SVs). Then, clusters with non-SVs inside are rejected, whereas the unlabeled samples contained in the remained clusters are selected as the most uncertain samples. In the second step the samples located in the high density regions in the kernel space and as much diverse as possible to each other are chosen among the uncertain samples. The density and diversity of the unlabeled samples are evaluated on the basis of their clusters' information. To this end, initially the density of each cluster is measured by the ratio of the number of samples in the cluster to the distance of its two furthest samples. Then, the highest density clusters are chosen and the medoid samples closest to the centers of the selected clusters are chosen as the most informative ones. The diversity of samples is accomplished by selecting only one sample from each selected cluster. Experiments applied to the estimation of single-tree parameters, i.e., tree stem volume and tree stem diameter, show the
NASA Astrophysics Data System (ADS)
Espinoza-Ojeda, O. M.; Santoyo, E.
2016-08-01
A new practical method based on logarithmic transformation regressions was developed for the determination of static formation temperatures (SFTs) in geothermal, petroleum and permafrost bottomhole temperature (BHT) data sets. The new method involves the application of multiple linear and polynomial (from quadratic to eight-order) regression models to BHT and log-transformation (Tln) shut-in times. Selection of the best regression models was carried out by using four statistical criteria: (i) the coefficient of determination as a fitting quality parameter; (ii) the sum of the normalized squared residuals; (iii) the absolute extrapolation, as a dimensionless statistical parameter that enables the accuracy of each regression model to be evaluated through the extrapolation of the last temperature measured of the data set; and (iv) the deviation percentage between the measured and predicted BHT data. The best regression model was used for reproducing the thermal recovery process of the boreholes, and for the determination of the SFT. The original thermal recovery data (BHT and shut-in time) were used to demonstrate the new method's prediction efficiency. The prediction capability of the new method was additionally evaluated by using synthetic data sets where the true formation temperature (TFT) was known with accuracy. With these purposes, a comprehensive statistical analysis was carried out through the application of the well-known F-test and Student's t-test and the error percentage or statistical differences computed between the SFT estimates and the reported TFT data. After applying the new log-transformation regression method to a wide variety of geothermal, petroleum, and permafrost boreholes, it was found that the polynomial models were generally the best regression models that describe their thermal recovery processes. These fitting results suggested the use of this new method for the reliable estimation of SFT. Finally, the practical use of the new method was
Survival associated pathway identification with group Lp penalized global AUC maximization
2010-01-01
It has been demonstrated that genes in a cell do not act independently. They interact with one another to complete certain biological processes or to implement certain molecular functions. How to incorporate biological pathways or functional groups into the model and identify survival associated gene pathways is still a challenging problem. In this paper, we propose a novel iterative gradient based method for survival analysis with group Lp penalized global AUC summary maximization. Unlike LASSO, Lp (p < 1) (with its special implementation entitled adaptive LASSO) is asymptotic unbiased and has oracle properties [1]. We first extend Lp for individual gene identification to group Lp penalty for pathway selection, and then develop a novel iterative gradient algorithm for penalized global AUC summary maximization (IGGAUCS). This method incorporates the genetic pathways into global AUC summary maximization and identifies survival associated pathways instead of individual genes. The tuning parameters are determined using 10-fold cross validation with training data only. The prediction performance is evaluated using test data. We apply the proposed method to survival outcome analysis with gene expression profile and identify multiple pathways simultaneously. Experimental results with simulation and gene expression data demonstrate that the proposed procedures can be used for identifying important biological pathways that are related to survival phenotype and for building a parsimonious model for predicting the survival times. PMID:20712896
NASA Astrophysics Data System (ADS)
Espinoza-Ojeda, O. M.; Santoyo, E.
2016-08-01
A new practical method based on logarithmic transformation regressions was developed for the determination of static formation temperatures (SFTs) in geothermal, petroleum and permafrost bottomhole temperature (BHT) data sets. The new method involves the application of multiple linear and polynomial (from quadratic to eight-order) regression models to BHT and log-transformation (Tln) shut-in times. Selection of the best regression models was carried out by using four statistical criteria: (i) the coefficient of determination as a fitting quality parameter; (ii) the sum of the normalized squared residuals; (iii) the absolute extrapolation, as a dimensionless statistical parameter that enables the accuracy of each regression model to be evaluated through the extrapolation of the last temperature measured of the data set; and (iv) the deviation percentage between the measured and predicted BHT data. The best regression model was used for reproducing the thermal recovery process of the boreholes, and for the determination of the SFT. The original thermal recovery data (BHT and shut-in time) were used to demonstrate the new method’s prediction efficiency. The prediction capability of the new method was additionally evaluated by using synthetic data sets where the true formation temperature (TFT) was known with accuracy. With these purposes, a comprehensive statistical analysis was carried out through the application of the well-known F-test and Student’s t-test and the error percentage or statistical differences computed between the SFT estimates and the reported TFT data. After applying the new log-transformation regression method to a wide variety of geothermal, petroleum, and permafrost boreholes, it was found that the polynomial models were generally the best regression models that describe their thermal recovery processes. These fitting results suggested the use of this new method for the reliable estimation of SFT. Finally, the practical use of the new method
Clegg, Samuel M; Barefield, James E; Wiens, Roger C; Dyar, Melinda D; Schafer, Martha W; Tucker, Jonathan M
2008-01-01
The ChemCam instrument on the Mars Science Laboratory (MSL) will include a laser-induced breakdown spectrometer (LIBS) to quantify major and minor elemental compositions. The traditional analytical chemistry approach to calibration curves for these data regresses a single diagnostic peak area against concentration for each element. This approach contrasts with a new multivariate method in which elemental concentrations are predicted by step-wise multiple regression analysis based on areas of a specific set of diagnostic peaks for each element. The method is tested on LIBS data from igneous and metamorphosed rocks. Between 4 and 13 partial regression coefficients are needed to describe each elemental abundance accurately (i.e., with a regression line of R{sup 2} > 0.9995 for the relationship between predicted and measured elemental concentration) for all major and minor elements studied. Validation plots suggest that the method is limited at present by the small data set, and will work best for prediction of concentration when a wide variety of compositions and rock types has been analyzed.
Zhu, Ying; Tan, Tuck Lee
2016-04-15
An effective and simple analytical method using Fourier transform infrared (FTIR) spectroscopy to distinguish wild-grown high-quality Ganoderma lucidum (G. lucidum) from cultivated one is of essential importance for its quality assurance and medicinal value estimation. Commonly used chemical and analytical methods using full spectrum are not so effective for the detection and interpretation due to the complex system of the herbal medicine. In this study, two penalized discriminant analysis models, penalized linear discriminant analysis (PLDA) and elastic net (Elnet),using FTIR spectroscopy have been explored for the purpose of discrimination and interpretation. The classification performances of the two penalized models have been compared with two widely used multivariate methods, principal component discriminant analysis (PCDA) and partial least squares discriminant analysis (PLSDA). The Elnet model involving a combination of L1 and L2 norm penalties enabled an automatic selection of a small number of informative spectral absorption bands and gave an excellent classification accuracy of 99% for discrimination between spectra of wild-grown and cultivated G. lucidum. Its classification performance was superior to that of the PLDA model in a pure L1 setting and outperformed the PCDA and PLSDA models using full wavelength. The well-performed selection of informative spectral features leads to substantial reduction in model complexity and improvement of classification accuracy, and it is particularly helpful for the quantitative interpretations of the major chemical constituents of G. lucidum regarding its anti-cancer effects. PMID:26827180
NASA Astrophysics Data System (ADS)
Zhu, Ying; Tan, Tuck Lee
2016-04-01
An effective and simple analytical method using Fourier transform infrared (FTIR) spectroscopy to distinguish wild-grown high-quality Ganoderma lucidum (G. lucidum) from cultivated one is of essential importance for its quality assurance and medicinal value estimation. Commonly used chemical and analytical methods using full spectrum are not so effective for the detection and interpretation due to the complex system of the herbal medicine. In this study, two penalized discriminant analysis models, penalized linear discriminant analysis (PLDA) and elastic net (Elnet),using FTIR spectroscopy have been explored for the purpose of discrimination and interpretation. The classification performances of the two penalized models have been compared with two widely used multivariate methods, principal component discriminant analysis (PCDA) and partial least squares discriminant analysis (PLSDA). The Elnet model involving a combination of L1 and L2 norm penalties enabled an automatic selection of a small number of informative spectral absorption bands and gave an excellent classification accuracy of 99% for discrimination between spectra of wild-grown and cultivated G. lucidum. Its classification performance was superior to that of the PLDA model in a pure L1 setting and outperformed the PCDA and PLSDA models using full wavelength. The well-performed selection of informative spectral features leads to substantial reduction in model complexity and improvement of classification accuracy, and it is particularly helpful for the quantitative interpretations of the major chemical constituents of G. lucidum regarding its anti-cancer effects.
[Guideline 'Medicinal care for drug addicts in penal institutions'].
Westra, Michel; de Haan, Hein A; Arends, Marleen T; van Everdingen, Jannes J E; Klazinga, Niek S
2009-01-01
In the Netherlands, the policy on care for prisoners who are addicted to opiates is still heterogeneous. The recent guidelines entitled 'Medicinal care for drug addicts in penal institutions' should contribute towards unambiguous and more evidence-based treatment for this group. In addition, it should improve and bring the care pathways within judicial institutions and mainstream healthcare more into line with one another. Each rational course of medicinal treatment will initially be continued in the penal institution. In penal institutions the help on offer is mainly focused on abstinence from illegal drugs while at the same time limiting the damage caused to the health of the individual user. Methadone is regarded at the first choice for maintenance therapy. For patient safety, this is best given in liquid form in sealed cups of 5 mg/ml once daily in the morning. Recently a combination preparation containing buprenorphine and naloxone - a complete opiate antagonist - has become available. On discontinuation of opiate maintenance treatment intensive follow-up care is necessary. During this period there is considerable risk of a potentially lethal overdose. Detoxification should be coupled with psychosocial or medicinal intervention aimed at preventing relapse. Naltrexone is currently the only available opiate antagonist for preventing relapse. In those addicted to opiates, who also take benzodiazepines without any indication, it is strongly recommended that these be reduced and discontinued. This can be achieved by converting the regular dosage into the equivalent in diazepam and then reducing this dosage by a maximum of 25% a week. PMID:20051159
Linear regression in astronomy. I
NASA Technical Reports Server (NTRS)
Isobe, Takashi; Feigelson, Eric D.; Akritas, Michael G.; Babu, Gutti Jogesh
1990-01-01
Five methods for obtaining linear regression fits to bivariate data with unknown or insignificant measurement errors are discussed: ordinary least-squares (OLS) regression of Y on X, OLS regression of X on Y, the bisector of the two OLS lines, orthogonal regression, and 'reduced major-axis' regression. These methods have been used by various researchers in observational astronomy, most importantly in cosmic distance scale applications. Formulas for calculating the slope and intercept coefficients and their uncertainties are given for all the methods, including a new general form of the OLS variance estimates. The accuracy of the formulas was confirmed using numerical simulations. The applicability of the procedures is discussed with respect to their mathematical properties, the nature of the astronomical data under consideration, and the scientific purpose of the regression. It is found that, for problems needing symmetrical treatment of the variables, the OLS bisector performs significantly better than orthogonal or reduced major-axis regression.
NASA Technical Reports Server (NTRS)
Jacobsen, R. T.; Stewart, R. B.; Crain, R. W., Jr.; Rose, G. L.; Myers, A. F.
1976-01-01
A method was developed for establishing a rational choice of the terms to be included in an equation of state with a large number of adjustable coefficients. The methods presented were developed for use in the determination of an equation of state for oxygen and nitrogen. However, a general application of the methods is possible in studies involving the determination of an optimum polynomial equation for fitting a large number of data points. The data considered in the least squares problem are experimental thermodynamic pressure-density-temperature data. Attention is given to a description of stepwise multiple regression and the use of stepwise regression in the determination of an equation of state for oxygen and nitrogen.
A penalization technique to model plasma facing components in a tokamak with temperature variations
Paredes, A.; Bufferand, H.; Ciraolo, G.; Schwander, F.; Serre, E.; Ghendrih, P.; Tamain, P.
2014-10-01
To properly address turbulent transport in the edge plasma region of a tokamak, it is mandatory to describe the particle and heat outflow on wall components, using an accurate representation of the wall geometry. This is challenging for many plasma transport codes, which use a structured mesh with one coordinate aligned with magnetic surfaces. We propose here a penalization technique that allows modeling of particle and heat transport using such structured mesh, while also accounting for geometrically complex plasma-facing components. Solid obstacles are considered as particle and momentum sinks whereas ionic and electronic temperature gradients are imposed on both sides of the obstacles along the magnetic field direction using delta functions (Dirac). Solutions exhibit plasma velocities (M=1) and temperatures fluxes at the plasma–wall boundaries that match with boundary conditions usually implemented in fluid codes. Grid convergence and error estimates are found to be in agreement with theoretical results obtained for neutral fluid conservation equations. The capability of the penalization technique is illustrated by introducing the non-collisional plasma region expected by the kinetic theory in the immediate vicinity of the interface, that is impossible when considering fluid boundary conditions. Axisymmetric numerical simulations show the efficiency of the method to investigate the large-scale transport at the plasma edge including the separatrix and in realistic complex geometries while keeping a simple structured grid.