Liu, Jin; Wang, Kai; Ma, Shuangge; Huang, Jian
2013-01-01
Penalized regression methods are becoming increasingly popular in genome-wide association studies (GWAS) for identifying genetic markers associated with disease. However, standard penalized methods such as the LASSO do not take into account the possible linkage disequilibrium between adjacent markers. We propose a novel penalized approach for GWAS using a dense set of single nucleotide polymorphisms (SNPs). The proposed method uses the minimax concave penalty (MCP) for marker selection and incorporates linkage disequilibrium (LD) information by penalizing the difference of the genetic effects at adjacent SNPs with high correlation. A coordinate descent algorithm is derived to implement the proposed method. This algorithm is efficient in dealing with a large number of SNPs. A multi-split method is used to calculate the p-values of the selected SNPs for assessing their significance. We refer to the proposed penalty function as the smoothed MCP and the proposed approach as the SMCP method. The performance of the SMCP method is evaluated and compared with the LASSO and MCP approaches through simulation studies, which demonstrate that the proposed method is more accurate in selecting associated SNPs. Its applicability to real data is illustrated using heterogeneous stock mice data and a rheumatoid arthritis dataset. PMID:25258655
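As a hedged illustration of the coordinate descent ingredient, here is a minimal sketch of MCP-penalized least squares using the standard univariate MCP update; the LD-smoothing term that makes the penalty the SMCP is specific to the paper and omitted, and predictor standardization is assumed.

```python
import numpy as np

def mcp_threshold(z, lam, gamma):
    """Univariate MCP solution for a standardized predictor (requires gamma > 1)."""
    if abs(z) <= gamma * lam:
        # soft-threshold, then undo the MCP taper
        return np.sign(z) * max(abs(z) - lam, 0.0) / (1.0 - 1.0 / gamma)
    return z  # beyond gamma*lam the MCP penalty is flat: no shrinkage

def mcp_coordinate_descent(X, y, lam, gamma=3.0, n_iter=100):
    """MCP-penalized least squares; columns of X scaled so X[:, j]'X[:, j]/n = 1."""
    n, p = X.shape
    beta = np.zeros(p)
    r = y - X @ beta
    for _ in range(n_iter):
        for j in range(p):
            zj = X[:, j] @ r / n + beta[j]   # partial-residual correlation
            bj = mcp_threshold(zj, lam, gamma)
            r -= X[:, j] * (bj - beta[j])    # keep the residual in sync
            beta[j] = bj
    return beta
```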
Gentry, Amanda Elswick; Jackson-Cook, Colleen K; Lyon, Debra E; Archer, Kellie J
2015-01-01
The pathological description of the stage of a tumor is an important clinical designation and is considered, like many other forms of biomedical data, an ordinal outcome. Currently, statistical methods for predicting an ordinal outcome using clinical, demographic, and high-dimensional correlated features are lacking. In this paper, we propose a method that fits an ordinal response model to predict an ordinal outcome for high-dimensional covariate spaces. Our method penalizes some covariates (high-throughput genomic features) without penalizing others (such as demographic and/or clinical covariates). We demonstrate the application of our method to predict the stage of breast cancer. In our model, breast cancer subtype is a nonpenalized predictor, and CpG site methylation values from the Illumina Human Methylation 450K assay are penalized predictors. The method has been made available in the ordinalgmifs package in the R programming environment. PMID:26052223
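The method itself ships as the R package ordinalgmifs; as a language-agnostic sketch of the key idea of penalizing some covariates while leaving others unpenalized, here is a proximal-gradient toy for a quadratic loss (the ordinal likelihood is swapped for least squares purely for brevity).

```python
import numpy as np

def ista_selective_lasso(X, y, lam, penalized, n_iter=1000):
    """ISTA for least squares with an L1 penalty applied only to flagged columns.

    penalized: boolean length-p array; False entries (e.g., clinical or
    demographic covariates) are never shrunk, mimicking penalty factors.
    """
    n, p = X.shape
    beta = np.zeros(p)
    step = n / np.linalg.norm(X, 2) ** 2       # 1 / Lipschitz constant of the gradient
    for _ in range(n_iter):
        grad = -X.T @ (y - X @ beta) / n
        z = beta - step * grad
        shrunk = np.sign(z) * np.maximum(np.abs(z) - step * lam, 0.0)
        beta = np.where(penalized, shrunk, z)  # soft-threshold penalized terms only
    return beta
```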
Pineda, Silvia; Real, Francisco X.; Kogevinas, Manolis; Carrato, Alfredo; Chanock, Stephen J.
2015-01-01
Omics data integration is becoming necessary to investigate the genomic mechanisms involved in complex diseases. During the integration process, many challenges arise, such as data heterogeneity, the smaller number of individuals in comparison to the number of parameters, multicollinearity, and interpretation and validation of results due to their complexity and lack of knowledge about biological processes. To overcome some of these issues, innovative statistical approaches are being developed. In this work, we propose a permutation-based method to concomitantly assess significance and correct for multiple testing with the MaxT algorithm. This was applied with penalized regression methods (LASSO and ENET) when exploring relationships between common genetic variants, DNA methylation and gene expression measured in bladder tumor samples. The overall analysis flow consisted of three steps: (1) SNPs/CpGs were selected per gene probe within a 1 Mb window upstream and downstream of the gene; (2) LASSO and ENET were applied to assess the association between each expression probe and the selected SNPs/CpGs in three multivariable models (SNP, CPG, and Global models, the latter integrating SNPs and CPGs); and (3) the significance of each model was assessed using the permutation-based MaxT method. We identified 48 genes whose expression levels were significantly associated with both SNPs and CPGs. Importantly, 36 (75%) of them were replicated in an independent data set (TCGA), and the performance of the proposed method was checked with a simulation study. We further support our results with a biological interpretation based on an enrichment analysis. The approach we propose reduces computational time and is flexible and easy to implement when analyzing several types of omics data. Our results highlight the importance of integrating omics data by applying appropriate statistical strategies to discover new insights into the complex genetic mechanisms involved in disease.
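A generic sketch of the permutation-based maxT step with a lasso fit statistic; the statistic, tuning value, and per-probe designs are illustrative assumptions rather than the authors' exact pipeline.

```python
import numpy as np
from sklearn.linear_model import Lasso

def lasso_r2(X, y):
    """Model statistic for one probe: in-sample R^2 of a lasso fit."""
    return Lasso(alpha=0.1, max_iter=10000).fit(X, y).score(X, y)

def maxt_permutation_pvalues(X_list, y_list, n_perm=1000, seed=0):
    """Westfall-Young maxT: permute each outcome, refit every model, and keep
    the maximum statistic per permutation as the shared null reference."""
    rng = np.random.default_rng(seed)
    stats_obs = np.array([lasso_r2(X, y) for X, y in zip(X_list, y_list)])
    max_null = np.empty(n_perm)
    for b in range(n_perm):
        max_null[b] = max(lasso_r2(X, rng.permutation(y))
                          for X, y in zip(X_list, y_list))
    # adjusted p-value: fraction of permutation maxima reaching the observed stat
    return np.array([(max_null >= s).mean() for s in stats_obs])
```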
Penalized solutions to functional regression problems
Harezlak, Jaroslaw; Coull, Brent A.; Laird, Nan M.; Magari, Shannon R.; Christiani, David C.
2007-01-01
Recent technological advances in continuous biological monitoring and personal exposure assessment have led to the collection of subject-specific functional data. A primary goal in such studies is to assess the relationship between the functional predictors and the functional responses. The historical functional linear model (HFLM) can be used to model such dependencies of the response on the history of the predictor values. An estimation procedure for the regression coefficients that uses a variety of regularization techniques is proposed. An approximation of the regression surface relating the predictor to the outcome by a finite-dimensional basis expansion is used, followed by penalization of the coefficients of the neighboring basis functions by restricting the size of the coefficient differences to be small. Penalties based on the absolute values of the basis function coefficient differences (corresponding to the LASSO) and the squares of these differences (corresponding to the penalized spline methodology) are studied. The fits are compared using an extension of the Akaike Information Criterion that combines the error variance estimate, degrees of freedom of the fit and the norm of the basis function coefficients. The performance of the proposed methods is evaluated via simulations. The LASSO penalty applied to the linearly transformed coefficients yields sparser representations of the estimated regression surface, while the quadratic penalty provides solutions with the smallest L2-norm of the basis function coefficients. Finally, the new estimation procedure is applied to the analysis of the effects of occupational particulate matter (PM) exposure on heart rate variability (HRV) in a cohort of boilermaker workers. Results suggest that the strongest association between PM exposure and HRV in these workers occurs as a result of point exposures to the increased levels of particulate matter corresponding to smoking breaks. PMID:18552972
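The quadratic (penalized-spline) variant has a closed-form ridge-type solution, sketched below with a finite-difference penalty matrix; the LASSO-on-differences variant needs an L1 solver instead.

```python
import numpy as np

def difference_penalized_ls(X, y, lam, order=2):
    """min ||y - X b||^2 + lam * ||D b||^2, with D penalizing differences of
    neighboring basis-function coefficients (P-spline flavor)."""
    p = X.shape[1]
    D = np.diff(np.eye(p), n=order, axis=0)   # finite-difference matrix
    return np.linalg.solve(X.T @ X + lam * D.T @ D, X.T @ y)
```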
Multi-locus Association Testing with Penalized Regression
Basu, Saonli; Pan, Wei; Shen, Xiaotong; Oetting, William S.
2012-01-01
In multi-locus association analysis, since some markers may not be associated with a trait, it seems attractive to use penalized regression with the capability of automatic variable selection. On the other hand, in spite of a rapidly growing body of literature on penalized regression, most studies focus on variable selection and outcome prediction, for which penalized methods are generally more effective than their non-penalized counterparts. However, for statistical inference, i.e. hypothesis testing and interval estimation, it is less clear how penalized methods would perform, or even how to best apply them, largely due to a lack of studies on this topic. In our motivating data for a cohort of kidney transplant recipients, it is of primary interest to assess whether a group of genetic variants are associated with a binary clinical outcome, acute rejection at 6 months. In this paper, we study some technical issues and alternative implementations of hypothesis testing in Lasso penalized logistic regression, and compare their performance with each other and with several existing global tests, some of which are specifically designed as variance component tests for high-dimensional data. The most interesting, and perhaps surprising, conclusion of this study is that, for low to moderately high-dimensional data, statistical tests based on Lasso penalized regression are not necessarily more powerful than some existing global tests. In addition, in penalized regression, rather than building a test based on a single selected “best” model, combining multiple tests, each of which is built on a candidate model, might be more promising. PMID:21922539
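For the multiple-test direction, a hedged multi-split sketch in the style of Meinshausen and colleagues: select on one half with an L1-penalized logistic fit, test on the other half unpenalized, and aggregate across splits. The tuning value and median-based aggregation rule here are illustrative choices, not the paper's procedure.

```python
import numpy as np
import statsmodels.api as sm
from sklearn.linear_model import LogisticRegression

def multisplit_pvalues(X, y, n_splits=50, seed=0):
    """Multi-split p-values for a binary outcome y in {0, 1}."""
    rng = np.random.default_rng(seed)
    n, p = X.shape
    pvals = np.ones((n_splits, p))
    for s in range(n_splits):
        idx = rng.permutation(n)
        a, b = idx[: n // 2], idx[n // 2:]
        sel = np.flatnonzero(LogisticRegression(penalty="l1", solver="liblinear",
                                                C=0.5).fit(X[a], y[a]).coef_[0])
        if sel.size == 0:
            continue
        fit = sm.Logit(y[b], sm.add_constant(X[b][:, sel])).fit(disp=0)
        # Bonferroni within the split over the selected set
        pvals[s, sel] = np.minimum(fit.pvalues[1:] * sel.size, 1.0)
    # quantile aggregation at gamma = 0.5: twice the per-variable median
    return np.minimum(2.0 * np.median(pvals, axis=0), 1.0)
```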
Bootstrap Enhanced Penalized Regression for Variable Selection with Neuroimaging Data.
Abram, Samantha V; Helwig, Nathaniel E; Moodie, Craig A; DeYoung, Colin G; MacDonald, Angus W; Waller, Niels G
2016-01-01
Recent advances in fMRI research highlight the use of multivariate methods for examining whole-brain connectivity. Complementary data-driven methods are needed for determining the subset of predictors related to individual differences. Although commonly used for this purpose, ordinary least squares (OLS) regression may not be ideal due to multi-collinearity and over-fitting issues. Penalized regression is a promising and underutilized alternative to OLS regression. In this paper, we propose a nonparametric bootstrap quantile (QNT) approach for variable selection with neuroimaging data. We use real and simulated data, as well as annotated R code, to demonstrate the benefits of our proposed method. Our results illustrate the practical potential of our proposed bootstrap QNT approach. Our real data example demonstrates how our method can be used to relate individual differences in neural network connectivity with an externalizing personality measure. Also, our simulation results reveal that the QNT method is effective under a variety of data conditions. Penalized regression yields more stable estimates and sparser models than OLS regression in situations with large numbers of highly correlated neural predictors. Our results demonstrate that penalized regression is a promising method for examining associations between neural predictors and clinically relevant traits or behaviors. These findings have important implications for the growing field of functional connectivity research, where multivariate methods produce numerous, highly correlated brain networks. PMID:27516732
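A hedged sketch of a bootstrap quantile selection rule in the spirit the abstract describes: refit a penalized model across bootstrap resamples and keep predictors whose coefficient interval excludes zero. The penalty settings are placeholders, not the authors' values, and the paper's annotated code is in R; this sketch is Python for consistency with the other examples here.

```python
import numpy as np
from sklearn.linear_model import ElasticNet

def bootstrap_quantile_selection(X, y, alpha=0.5, l1_ratio=0.5,
                                 n_boot=500, level=0.95, seed=0):
    """Select predictors whose bootstrap coefficient quantiles exclude zero."""
    rng = np.random.default_rng(seed)
    n, p = X.shape
    coefs = np.empty((n_boot, p))
    for b in range(n_boot):
        idx = rng.integers(0, n, n)          # resample rows with replacement
        model = ElasticNet(alpha=alpha, l1_ratio=l1_ratio, max_iter=10000)
        coefs[b] = model.fit(X[idx], y[idx]).coef_
    lo, hi = np.quantile(coefs, [(1 - level) / 2, (1 + level) / 2], axis=0)
    return (lo > 0) | (hi < 0)               # True = selected predictor
```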
Compound Identification Using Penalized Linear Regression on Metabolomics
Liu, Ruiqi; Wu, Dongfeng; Zhang, Xiang; Kim, Seongho
2014-01-01
Compound identification is often achieved by matching the experimental mass spectra to the mass spectra stored in a reference library based on mass spectral similarity. Because the number of compounds in the reference library is much larger than the range of mass-to-charge ratio (m/z) values, the data become high dimensional and suffer from singularity. For this reason, penalized linear regressions such as ridge regression and the lasso are used instead of the ordinary least squares regression. Furthermore, two-step approaches using the dot product and Pearson’s correlation along with the penalized linear regression are proposed in this study. PMID:27212894
Sparse brain network using penalized linear regression
NASA Astrophysics Data System (ADS)
Lee, Hyekyoung; Lee, Dong Soo; Kang, Hyejin; Kim, Boong-Nyun; Chung, Moo K.
2011-03-01
Sparse partial correlation is a useful connectivity measure for brain networks when it is difficult to compute the exact partial correlation in the small-n large-p setting. In this paper, we formulate the problem of estimating partial correlation as a sparse linear regression with an l1-norm penalty. The method is applied to brain networks consisting of parcellated regions of interest (ROIs), which are obtained from FDG-PET images of autism spectrum disorder (ASD) children and pediatric control (PedCon) subjects. To validate the results, we check the reproducibility of the obtained brain networks by leave-one-out cross-validation and compare the clustered structures derived from the brain networks of ASD and PedCon.
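A sketch of the neighborhood-selection reading of this formulation: regress each region on all the others with an l1 penalty and let nonzero coefficients define edges. The cross-validated penalty choice and OR-symmetrization below are assumptions.

```python
import numpy as np
from sklearn.linear_model import LassoCV

def sparse_partial_corr_graph(R):
    """R: (n_subjects, n_rois) matrix with standardized columns. Returns a
    boolean adjacency matrix from per-ROI l1-penalized regressions."""
    n, p = R.shape
    adj = np.zeros((p, p), dtype=bool)
    for j in range(p):
        others = np.delete(np.arange(p), j)
        fit = LassoCV(cv=5).fit(R[:, others], R[:, j])
        adj[j, others] = fit.coef_ != 0
    return adj | adj.T                        # symmetrize with the OR rule
```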
Penalized count data regression with application to hospital stay after pediatric cardiac surgery
Wang, Zhu; Ma, Shuangge; Zappitelli, Michael; Parikh, Chirag; Wang, Ching-Yun; Devarajan, Prasad
2014-01-01
Pediatric cardiac surgery may lead to poor outcomes such as acute kidney injury (AKI) and prolonged hospital length of stay (LOS). Plasma and urine biomarkers may help with early identification and prediction of these adverse clinical outcomes. In a recent multi-center study, 311 children undergoing cardiac surgery were enrolled to evaluate multiple biomarkers for diagnosis and prognosis of AKI and other clinical outcomes. LOS is often analyzed as count data, thus Poisson regression and negative binomial (NB) regression are common choices for developing predictive models. With many correlated prognostic factors and biomarkers, variable selection is an important step. The present paper proposes new variable selection methods for Poisson and NB regression. We evaluated regularized regression through a penalized likelihood function. We first extend the elastic net (Enet) Poisson to two penalized Poisson regressions: Mnet, a combination of minimax concave and ridge penalties; and Snet, a combination of smoothly clipped absolute deviation (SCAD) and ridge penalties. Furthermore, we extend the above methods to penalized NB regression. For the Enet, Mnet, and Snet penalties (EMSnet), we develop a unified algorithm to estimate the parameters and conduct variable selection simultaneously. Simulation studies show that the proposed methods have advantages over some competing methods when predictors are highly correlated. Applying the proposed methods to the aforementioned data, we find that early postoperative urine biomarkers including NGAL, IL18, and KIM-1 independently predict LOS, after adjusting for risk and biomarker variables. PMID:24742430
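The Mnet and Snet penalties are paper-specific, but the Enet Poisson baseline they extend is available in statsmodels; the data and tuning values below are synthetic placeholders.

```python
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(0)
X = rng.normal(size=(311, 20))               # stand-in for biomarkers + risk factors
y = rng.poisson(np.exp(1.0 + 0.3 * X[:, 0] - 0.2 * X[:, 1]))

# Elastic-net penalized Poisson regression: alpha scales the penalty,
# L1_wt mixes lasso (1.0) and ridge (0.0) components.
fit = sm.GLM(y, sm.add_constant(X), family=sm.families.Poisson()).fit_regularized(
    method="elastic_net", alpha=0.05, L1_wt=0.5)
print(np.flatnonzero(fit.params[1:]))        # indices of retained predictors
```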
Penalized regression procedures for variable selection in the potential outcomes framework
Ghosh, Debashis; Zhu, Yeying; Coffman, Donna L.
2015-01-01
A recent topic of much interest in causal inference is model selection. In this article, we describe a framework in which to consider penalized regression approaches to variable selection for causal effects. The framework leads to a simple ‘impute, then select’ class of procedures that is agnostic to the type of imputation algorithm as well as penalized regression used. It also clarifies how model selection involves a multivariate regression model for causal inference problems, and that these methods can be applied for identifying subgroups in which treatment effects are homogeneous. Analogies and links with the literature on machine learning methods, missing data and imputation are drawn. A difference LASSO algorithm is defined, along with its multiple imputation analogues. The procedures are illustrated using a well-known right heart catheterization dataset. PMID:25628185
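A hedged sketch of the 'impute, then select' recipe: impute both potential outcomes with any learner, then run a lasso of the imputed differences on the covariates. The random-forest imputer below is an arbitrary stand-in, consistent with the procedure being agnostic to the imputation algorithm.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.linear_model import LassoCV

def difference_lasso(X, y, t):
    """t: binary treatment indicator. Nonzero lasso coefficients flag
    covariates along which the treatment effect varies."""
    m1 = RandomForestRegressor(random_state=0).fit(X[t == 1], y[t == 1])
    m0 = RandomForestRegressor(random_state=0).fit(X[t == 0], y[t == 0])
    delta = m1.predict(X) - m0.predict(X)    # imputed individual effects
    return LassoCV(cv=5).fit(X, delta).coef_
```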
Zhang, Guosheng; Huang, Kuan-Chieh; Xu, Zheng; Tzeng, Jung-Ying; Conneely, Karen N.; Guan, Weihua; Kang, Jian; Li, Yun
2016-01-01
DNA methylation is a key epigenetic mark involved in both normal development and disease progression. Recent advances in high-throughput technologies have enabled genome-wide profiling of DNA methylation. However, DNA methylation profiling often employs different designs and platforms with varying resolution, which hinders joint analysis of methylation data from multiple platforms. In this study, we propose a penalized functional regression model to impute missing methylation data. By incorporating functional predictors, our model utilizes information from nonlocal probes to improve imputation quality. Here, we compared the performance of our functional model to linear regression and the best single probe surrogate in real data and via simulations. Specifically, we applied different imputation approaches to an acute myeloid leukemia dataset consisting of 194 samples and our method showed higher imputation accuracy, manifested, for example, by a 94% relative increase in information content and up to 86% more CpG sites passing post-imputation filtering. Our simulated association study further demonstrated that our method substantially improves the statistical power to identify trait-associated methylation loci. These findings indicate that the penalized functional regression model is a convenient and valuable imputation tool for methylation data, and it can boost statistical power in downstream epigenome-wide association study (EWAS). PMID:27061717
PUMA: A Unified Framework for Penalized Multiple Regression Analysis of GWAS Data
Hoffman, Gabriel E.; Logsdon, Benjamin A.; Mezey, Jason G.
2013-01-01
Penalized Multiple Regression (PMR) can be used to discover novel disease associations in GWAS datasets. In practice, proposed PMR methods have not been able to identify well-supported associations in GWAS that are undetectable by standard association tests and thus these methods are not widely applied. Here, we present a combined algorithmic and heuristic framework for PUMA (Penalized Unified Multiple-locus Association) analysis that solves the problems of previously proposed methods, including computational speed, poor performance on genome-scale simulated data, and identification of too many associations for real data to be biologically plausible. The framework includes a new minorize-maximization (MM) algorithm for generalized linear models (GLM) combined with heuristic model selection and testing methods for identification of robust associations. The PUMA framework implements the penalized maximum likelihood penalties previously proposed for GWAS analysis (i.e. Lasso, Adaptive Lasso, NEG, MCP), as well as a penalty that has not been previously applied to GWAS (i.e. LOG). Using simulations that closely mirror real GWAS data, we show that our framework has high performance and reliably increases power to detect weak associations, while existing PMR methods can perform worse than single marker testing in overall performance. To demonstrate the empirical value of PUMA, we analyzed GWAS data for type 1 diabetes, Crohn's disease, and rheumatoid arthritis, three autoimmune diseases from the original Wellcome Trust Case Control Consortium. Our analysis replicates known associations for these diseases and we discover novel etiologically relevant susceptibility loci that are invisible to standard single marker tests, including six novel associations implicating genes involved in pancreatic function, insulin pathways and immune-cell function in type 1 diabetes; three novel associations implicating genes in pro- and anti-inflammatory pathways in Crohn's disease; and one
2014-01-01
Motivation: It is common to seek an optimal combination of markers for disease classification and prediction when multiple markers are available. Many approaches based on the area under the receiver operating characteristic curve (AUC) have been proposed. Existing works based on AUC in a high-dimensional context depend mainly on a non-parametric, smooth approximation of AUC, with no work using a parametric AUC-based approach for high-dimensional data. Results: We propose an AUC-based approach using penalized regression (AucPR), which is a parametric method used for obtaining a linear combination for maximizing the AUC. To obtain the AUC maximizer in a high-dimensional context, we transform a classical parametric AUC maximizer, which is used in a low-dimensional context, into a regression framework and thus apply the penalized regression approach directly. Two kinds of penalization, lasso and elastic net, are considered. The parametric approach can avoid some of the difficulties of a conventional non-parametric AUC-based approach, such as the lack of an appropriate concave objective function and a prudent choice of the smoothing parameter. We apply the proposed AucPR for gene selection and classification using four real microarray datasets and synthetic data. Through numerical studies, AucPR is shown to perform better than the penalized logistic regression and the nonparametric AUC-based method, in the sense of AUC and sensitivity for a given specificity, particularly when there are many correlated genes. Conclusion: We propose a powerful parametric and easily implementable linear classifier, AucPR, for gene selection and disease prediction for high-dimensional data. AucPR is recommended for its good prediction performance. Besides gene expression microarray data, AucPR can be applied to other types of high-dimensional omics data, such as miRNA and protein data. PMID:25559769
On penalized likelihood estimation for a non-proportional hazards regression model.
Devarajan, Karthik; Ebrahimi, Nader
2013-07-01
In this paper, a semi-parametric generalization of the Cox model that permits crossing hazard curves is described. A theoretical framework for estimation in this model is developed based on penalized likelihood methods. It is shown that the optimal solution to the baseline hazard, baseline cumulative hazard and their ratio are hyperbolic splines with knots at the distinct failure times. PMID:24791034
Won, Sungho; Choi, Hosik; Park, Suyeon; Lee, Juyoung; Park, Changyi; Kwon, Sunghoon
2015-01-01
Owing to recent improvements in genotyping technology, large-scale genetic data can be utilized to identify disease susceptibility loci, and such successful findings have substantially improved our understanding of complex diseases. However, in spite of these successes, most of the genetic effects for many complex diseases were found to be very small, which has been a big hurdle to building disease prediction models. Recently, many statistical methods based on penalized regressions have been proposed to tackle the so-called “large P and small N” problem. Penalized regressions, including the least absolute selection and shrinkage operator (LASSO) and ridge regression, limit the space of parameters, and this constraint enables the estimation of effects for a very large number of SNPs. Various extensions have been suggested, and, in this report, we compare their accuracy by applying them to several complex diseases. Our results show that penalized regressions are usually robust and provide better accuracy than the existing methods, at least for the diseases under consideration. PMID:26346893
Genomewide Multiple-Loci Mapping in Experimental Crosses by Iterative Adaptive Penalized Regression
Sun, Wei; Ibrahim, Joseph G.; Zou, Fei
2010-01-01
Genomewide multiple-loci mapping can be viewed as a challenging variable selection problem where the major objective is to select genetic markers related to a trait of interest. It is challenging because the number of genetic markers is large (often much larger than the sample size) and there is often strong linkage or linkage disequilibrium between markers. In this article, we developed two methods for genomewide multiple loci mapping: the Bayesian adaptive Lasso and the iterative adaptive Lasso. Compared with eight existing methods, the proposed methods have improved variable selection performance in both simulation and real data studies. The advantages of our methods come from the assignment of adaptive weights to different genetic markers and the iterative updating of these adaptive weights. The iterative adaptive Lasso is also computationally much more efficient than the commonly used marginal regression and stepwise regression methods. Although our methods are motivated by multiple-loci mapping, they are general enough to be applied to other variable selection problems. PMID:20157003
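The iterative adaptive Lasso re-estimates its weights across iterations; a one-step adaptive lasso sketch conveys the core weighting idea, with ridge pilot estimates as an assumed starting point.

```python
import numpy as np
from sklearn.linear_model import LassoCV, RidgeCV

def adaptive_lasso(X, y, gamma=1.0):
    """Weights w_j = 1/|b_pilot_j|^gamma, implemented by rescaling columns:
    a lasso on X_j / w_j with coefficients mapped back as beta_j = b_j / w_j."""
    pilot = RidgeCV().fit(X, y).coef_
    w = 1.0 / np.maximum(np.abs(pilot), 1e-8) ** gamma
    fit = LassoCV(cv=5).fit(X / w, y)
    return fit.coef_ / w
```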
NASA Astrophysics Data System (ADS)
Vasilyev, Oleg V.; Gazzola, Mattia; Koumoutsakos, Petros
2009-11-01
In this talk we discuss preliminary results for the use of a hybrid wavelet collocation - Brinkman penalization approach for shape and topology optimization of fluid flows. The adaptive wavelet collocation method tackles the problem of efficiently resolving a fluid flow on a dynamically adaptive computational grid in complex geometries (where grid resolution varies both in space and time), while Brinkman volume penalization allows easy variation of flow geometry without using body-fitted meshes by simply changing the shape of the penalization region. The use of the Brinkman volume penalization approach allows a seamless transition from shape to topology optimization by combining it with a level set approach and increasing the size of the optimization space. The approach is demonstrated for shape optimization of a variety of fluid flows by optimizing a single cost function (the time-averaged drag coefficient) using the covariance matrix adaptation (CMA) evolutionary algorithm.
Regression methods for spatial data
NASA Technical Reports Server (NTRS)
Yakowitz, S. J.; Szidarovszky, F.
1982-01-01
The kriging approach, a parametric regression method used by hydrologists and mining engineers, among others, also provides an error estimate for the integral of the regression function. The kriging method is explored and some of its statistical characteristics are described. The Watson method and theory are extended so that the kriging features are displayed. Theoretical and computational comparisons of the kriging and Watson approaches are offered.
Shang, Shang; Bai, Jing; Song, Xiaolei; Wang, Hongkai; Lau, Jaclyn
2007-01-01
The conjugate gradient method is verified to be efficient for nonlinear optimization problems of large-dimension data. In this paper, a penalized linear and nonlinear combined conjugate gradient method for the reconstruction of fluorescence molecular tomography (FMT) is presented. The algorithm combines the linear conjugate gradient method and the nonlinear conjugate gradient method together based on a restart strategy, in order to take advantage of the two kinds of conjugate gradient methods and compensate for their disadvantages. A quadratic penalty method is adopted to gain a nonnegative constraint and reduce the ill-posedness of the problem. Simulation studies show that the presented algorithm is accurate, stable, and fast. It has a better performance than the conventional conjugate gradient-based reconstruction algorithms. It offers an effective approach to reconstruct fluorochrome information for FMT. PMID:18354740
Differentiating among penal states.
Lacey, Nicola
2010-12-01
This review article assesses Loïc Wacquant's contribution to debates on penality, focusing on his most recent book, Punishing the Poor: The Neoliberal Government of Social Insecurity (Wacquant 2009), while setting its argument in the context of his earlier Prisons of Poverty (1999). In particular, it draws on both historical and comparative methods to question whether Wacquant's conception of 'the penal state' is adequately differentiated for the purposes of building the explanatory account he proposes; about whether 'neo-liberalism' has, materially, the global influence which he ascribes to it; and about whether, therefore, the process of penal Americanization which he asserts in his recent writings is credible. PMID:21138432
NASA Astrophysics Data System (ADS)
Kasimov, Nurlybek; Brown-Dymkoski, Eric; Vasilyev, Oleg V.
2015-11-01
A novel volume penalization method to enforce immersed boundary conditions in the Navier-Stokes and Euler equations is presented. Previously, Brinkman penalization has been used to introduce solid obstacles modeled as porous media, although it is limited to Dirichlet-type conditions on velocity and temperature. This method builds upon Brinkman penalization by allowing Neumann conditions to be applied in a general fashion. Correct boundary conditions are achieved through characteristic propagation into the thin layer inside of the obstacle. Inward pointing characteristics ensure that the nonphysical solution inside the obstacle does not propagate outside to the fluid. Dirichlet boundary conditions are enforced similarly to the Brinkman method. Penalization parameters act on a much faster timescale than the characteristic timescale of the flow. The main advantage of the method is a systematic means of error control. This talk is focused on the progress that was made towards the extension of the method to 3D flows around irregular shapes. This work was supported by ONR MURI on Soil Blast Modeling.
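A toy illustration of the Brinkman-penalization mechanism on the 1D heat equation: a mask term -(chi/eta)(u - u_wall) drives the solution to the wall value inside the obstacle as eta shrinks, enforcing a Dirichlet condition on a plain Cartesian grid. The characteristic-based Neumann treatment the abstract describes is not reproduced here.

```python
import numpy as np

nx, nu, eta, u_wall = 200, 1e-3, 1e-4, 0.0
x = np.linspace(0.0, 1.0, nx, endpoint=False)   # periodic domain
dx = x[1] - x[0]
chi = ((x > 0.4) & (x < 0.6)).astype(float)     # obstacle mask
u = np.sin(2 * np.pi * x)                       # initial condition
dt = 0.2 * min(dx**2 / nu, eta)                 # respect diffusion and penalty scales
for _ in range(5000):
    lap = (np.roll(u, -1) - 2 * u + np.roll(u, 1)) / dx**2
    u += dt * (nu * lap - chi / eta * (u - u_wall))  # penalized heat equation
```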
Wu, Hulin; Xue, Hongqi; Kumar, Arun
2012-06-01
Differential equations are extensively used for modeling dynamics of physical processes in many scientific fields such as engineering, physics, and biomedical sciences. Parameter estimation of differential equation models is a challenging problem because of high computational cost and high-dimensional parameter space. In this article, we propose a novel class of methods for estimating parameters in ordinary differential equation (ODE) models, which is motivated by HIV dynamics modeling. The new methods exploit the form of numerical discretization algorithms for an ODE solver to formulate estimating equations. First, a penalized-spline approach is employed to estimate the state variables, and the estimated state variables are then plugged into a discretization formula of an ODE solver to obtain the ODE parameter estimates via a regression approach. We consider three discretization methods of different order: Euler's method, the trapezoidal rule, and the Runge-Kutta method. A higher-order numerical algorithm reduces numerical error in the approximation of the derivative, which produces a more accurate estimate, but its computational cost is higher. To balance the computational cost and estimation accuracy, we demonstrate, via simulation studies, that the trapezoidal discretization-based estimate is the best and is recommended for practical use. The asymptotic properties for the proposed numerical discretization-based estimators are established. Comparisons between the proposed methods and existing methods show a clear benefit of the proposed methods in regards to the trade-off between computational cost and estimation accuracy. We apply the proposed methods to an HIV study to further illustrate the usefulness of the proposed approaches. PMID:22376200
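A small sketch of the recommended trapezoidal variant for a hypothetical one-parameter ODE u' = -a*u: smooth the noisy states with a penalized spline, plug the smoothed states into the trapezoidal rule, and recover the parameter by least squares.

```python
import numpy as np
from scipy.interpolate import UnivariateSpline

rng = np.random.default_rng(0)
t = np.linspace(0.0, 5.0, 101)
u_obs = np.exp(-0.7 * t) + rng.normal(0.0, 0.02, t.size)  # truth: a = 0.7

u_hat = UnivariateSpline(t, u_obs, s=0.05)(t)    # step 1: smoothed states
h = t[1] - t[0]
resp = u_hat[1:] - u_hat[:-1]                    # u_{i+1} - u_i
pred = -0.5 * h * (u_hat[1:] + u_hat[:-1])       # trapezoid of f(u) = -a*u, per unit a
a_hat = (pred @ resp) / (pred @ pred)            # regression slope through the origin
print(a_hat)                                     # close to 0.7
```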
Numerical simulation of fluid-structure interaction with the volume penalization method
NASA Astrophysics Data System (ADS)
Engels, Thomas; Kolomenskiy, Dmitry; Schneider, Kai; Sesterhenn, Jörn
2015-01-01
We present a novel scheme for the numerical simulation of fluid-structure interaction problems. It extends the volume penalization method, a member of the family of immersed boundary methods, to take into account flexible obstacles. We show how the introduction of a smoothing layer, physically interpreted as surface roughness, allows for arbitrary motion of the deformable obstacle. The approach is carefully validated and good agreement with various results in the literature is found. A simple one-dimensional solid model is derived, capable of modeling arbitrarily large deformations and imposed motion at the leading edge, as it is required for the simulation of simplified models for insect flight. The model error is shown to be small, while the one-dimensional character of the model features a reasonably easy implementation. The coupled fluid-solid interaction solver is shown not to introduce artificial energy in the numerical coupling, and validated using a widely used benchmark. We conclude with the application of our method to models for insect flight and study the propulsive efficiency of one and two wing sections.
NASA Astrophysics Data System (ADS)
Tauriello, Gerardo; Koumoutsakos, Petros
2015-02-01
We present a comparative study of penalization and phase field methods for the solution of the diffusion equation in complex geometries embedded using simple Cartesian meshes. The two methods have been widely employed to solve partial differential equations in complex and moving geometries for applications ranging from solid and fluid mechanics to biology and geophysics. Their popularity is largely due to their discretization on Cartesian meshes thus avoiding the need to create body-fitted grids. At the same time, there are questions regarding their accuracy and it appears that the use of each one is confined by disciplinary boundaries. Here, we compare penalization and phase field methods to handle problems with Neumann and Robin boundary conditions. We discuss extensions for Dirichlet boundary conditions and in turn compare with methods that have been explicitly designed to handle Dirichlet boundary conditions. The accuracy of all methods is analyzed using one and two dimensional benchmark problems such as the flow induced by an oscillating wall and by a cylinder performing rotary oscillations. This comparative study provides information to decide which methods to consider for a given application and their incorporation in broader computational frameworks. We demonstrate that phase field methods are more accurate than penalization methods on problems with Neumann boundary conditions and we present an error analysis explaining this result.
REGRESSION METHODS FOR DATA WITH INCOMPLETE COVARIATES
Modern statistical methods in chronic disease epidemiology allow simultaneous regression of disease status on several covariates. These methods permit examination of the effects of one covariate while controlling for those of others that may be causally related to the disease. owe...
NASA Astrophysics Data System (ADS)
Vasilyev, Oleg V.; Gazzola, Mattia; Koumoutsakos, Petros
2010-11-01
In this talk we discuss preliminary results for the use of a hybrid wavelet collocation - Brinkman penalization approach for shape optimization for drag reduction in flows past linked bodies. This optimization relies on the Adaptive Wavelet Collocation Method along with the Brinkman penalization technique and the Covariance Matrix Adaptation Evolution Strategy (CMA-ES). The adaptive wavelet collocation method tackles the problem of efficiently resolving a fluid flow on a dynamically adaptive computational grid, while a level set approach is used to describe the body shape and the Brinkman volume penalization allows for an easy variation of flow geometry without requiring body-fitted meshes. We perform 2D simulations of linked bodies in order to investigate whether flat geometries are optimal for drag reduction. In order to accelerate the costly cost-function evaluations we exploit the inherent parallelism of ES and we extend the CMA-ES implementation to a multi-host framework. This framework allows for an easy distribution of the cost-function evaluations across several parallel architectures and it is not limited to only one computing facility. The resulting optimal shapes are geometrically consistent with the shapes that have been obtained in the pioneering wind tunnel experiments for drag reduction using Evolution Strategies by Ingo Rechenberg.
Morales, Jorge A.; Leroy, Matthieu; Bos, Wouter J.T.; Schneider, Kai
2014-10-01
A volume penalization approach to simulate magnetohydrodynamic (MHD) flows in confined domains is presented. Here the incompressible visco-resistive MHD equations are solved using parallel pseudo-spectral solvers in Cartesian geometries. The volume penalization technique is an immersed boundary method which is characterized by a high flexibility for the geometry of the considered flow. In the present case, it allows to use other than periodic boundary conditions in a Fourier pseudo-spectral approach. The numerical method is validated and its convergence is assessed for two- and three-dimensional hydrodynamic (HD) and MHD flows, by comparing the numerical results with results from literature and analytical solutions. The test cases considered are two-dimensional Taylor–Couette flow, the z-pinch configuration, three dimensional Orszag–Tang flow, Ohmic-decay in a periodic cylinder, three-dimensional Taylor–Couette flow with and without axial magnetic field and three-dimensional Hartmann-instabilities in a cylinder with an imposed helical magnetic field. Finally, we present a magnetohydrodynamic flow simulation in toroidal geometry with non-symmetric cross section and imposing a helical magnetic field to illustrate the potential of the method.
A method for nonlinear exponential regression analysis
NASA Technical Reports Server (NTRS)
Junkin, B. G.
1971-01-01
A computer-oriented technique is presented for performing a nonlinear exponential regression analysis on decay-type experimental data. The technique involves the least squares procedure wherein the nonlinear problem is linearized by expansion in a Taylor series. A linear curve fitting procedure for determining the initial nominal estimates for the unknown exponential model parameters is included as an integral part of the technique. A correction matrix was derived and then applied to the nominal estimate to produce an improved set of model parameters. The solution cycle is repeated until some predetermined criterion is satisfied.
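A sketch of the described procedure for a decay model y = A*exp(-b*t): a log-linear fit supplies the nominal starting estimates, and the Taylor-linearized (Gauss-Newton) correction is iterated.

```python
import numpy as np

def fit_exponential(t, y, n_iter=20):
    """Least-squares fit of y = A*exp(-b*t) via iterated linearization."""
    slope, intercept = np.polyfit(t, np.log(np.maximum(y, 1e-12)), 1)
    A, b = np.exp(intercept), -slope             # nominal initial estimates
    for _ in range(n_iter):
        f = A * np.exp(-b * t)
        J = np.column_stack([f / A, -t * f])     # df/dA and df/db
        delta, *_ = np.linalg.lstsq(J, y - f, rcond=None)  # correction step
        A, b = A + delta[0], b + delta[1]
    return A, b
```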
Interquantile Shrinkage in Regression Models
Jiang, Liewen; Wang, Huixia Judy; Bondell, Howard D.
2012-01-01
Conventional analysis using quantile regression typically focuses on fitting the regression model at different quantiles separately. However, in situations where the quantile coefficients share some common feature, joint modeling of multiple quantiles to accommodate the commonality often leads to more efficient estimation. One example of common features is that a predictor may have a constant effect over one region of quantile levels but varying effects in other regions. To automatically perform estimation and detection of the interquantile commonality, we develop two penalization methods. When the quantile slope coefficients indeed do not change across quantile levels, the proposed methods will shrink the slopes towards constant and thus improve the estimation efficiency. We establish the oracle properties of the two proposed penalization methods. Through numerical investigations, we demonstrate that the proposed methods lead to estimations with competitive or higher efficiency than the standard quantile regression estimation in finite samples. Supplemental materials for the article are available online. PMID:24363546
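Joint interquantile shrinkage is not available in standard libraries; the conventional separate-quantile baseline that the paper improves upon looks like the following (synthetic data and tuning are illustrative).

```python
import numpy as np
from sklearn.linear_model import QuantileRegressor

rng = np.random.default_rng(0)
X = rng.normal(size=(500, 3))
# the slope of X[:, 0] varies across quantiles via heteroscedastic noise
y = 1.0 + X @ np.array([0.5, 0.0, -0.3]) \
    + (1 + 0.5 * np.abs(X[:, 0])) * rng.normal(size=500)

for tau in (0.25, 0.5, 0.75):
    fit = QuantileRegressor(quantile=tau, alpha=1e-4).fit(X, y)  # tiny L1 penalty
    print(tau, np.round(fit.coef_, 2))           # slopes estimated per quantile
```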
TEMPERATURE SCENARIO DEVELOPMENT USING REGRESSION METHODS
A method of developing scenarios of future temperature conditions resulting from climatic change is presented. The method is straightforward and can be used to provide information about daily temperature variations and diurnal ranges, monthly average high and low temperatures, an...
The Precision Efficacy Analysis for Regression Sample Size Method.
ERIC Educational Resources Information Center
Brooks, Gordon P.; Barcikowski, Robert S.
The general purpose of this study was to examine the efficiency of the Precision Efficacy Analysis for Regression (PEAR) method for choosing appropriate sample sizes in regression studies used for precision. The PEAR method, which is based on the algebraic manipulation of an accepted cross-validity formula, essentially uses an effect size to…
Shrinkage regression-based methods for microarray missing value imputation
2013-01-01
Background: Missing values commonly occur in microarray data, which usually contain more than 5% missing values, with up to 90% of genes affected. Inaccurate missing value estimation reduces the power of downstream microarray data analyses. Many types of methods have been developed to estimate missing values. Among them, the regression-based methods are very popular and have been shown to perform better than the other types of methods in many testing microarray datasets. Results: To further improve the performance of the regression-based methods, we propose shrinkage regression-based methods. Our methods take advantage of the correlation structure in the microarray data and select similar genes for the target gene by Pearson correlation coefficients. Besides, our methods incorporate the least squares principle, utilize a shrinkage estimation approach to adjust the coefficients of the regression model, and then use the new coefficients to estimate missing values. Simulation results show that the proposed methods provide more accurate missing value estimation in six testing microarray datasets than the existing regression-based methods do. Conclusions: Imputation of missing values is a very important aspect of microarray data analyses because most of the downstream analyses require a complete dataset. Therefore, exploring accurate and efficient methods for estimating missing values has become an essential issue. Since our proposed shrinkage regression-based methods can provide accurate missing value estimation, they are competitive alternatives to the existing regression-based methods. PMID:24565159
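A hedged sketch of the two ingredients: neighbor genes chosen by Pearson correlation and a shrunken least-squares fit, with plain ridge regression standing in for the paper's shrinkage-adjusted coefficients.

```python
import numpy as np
from sklearn.linear_model import Ridge

def impute_entry(D, i, j, k=10, alpha=1.0):
    """Impute D[i, j] in a genes-x-samples matrix whose other entries are
    observed: pick the k genes most correlated with gene i on the remaining
    samples, ridge-regress gene i on them, and predict sample j."""
    obs = np.arange(D.shape[1]) != j
    others = np.array([g for g in range(D.shape[0]) if g != i])
    corr = np.array([abs(np.corrcoef(D[g, obs], D[i, obs])[0, 1]) for g in others])
    nbrs = others[np.argsort(corr)[-k:]]         # k most similar genes
    model = Ridge(alpha=alpha).fit(D[nbrs][:, obs].T, D[i, obs])
    return model.predict(D[nbrs][:, [j]].T)[0]
```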
Calculation of Solar Radiation by Using Regression Methods
NASA Astrophysics Data System (ADS)
Kızıltan, Ö.; Şahin, M.
2016-04-01
In this study, solar radiation was estimated at 53 locations over Turkey with varying climatic conditions using the Linear, Ridge, Lasso, Smoother, Partial least squares, KNN and Gaussian process regression methods. Data from the years 2002 and 2003 were used to obtain the regression coefficients of the relevant methods. The coefficients were obtained based on the input parameters. Input parameters were month, altitude, latitude, longitude and land surface temperature (LST). The values for LST were obtained from the data of the National Oceanic and Atmospheric Administration Advanced Very High Resolution Radiometer (NOAA-AVHRR) satellite. Solar radiation for the year 2004 was calculated using the coefficients obtained with the regression methods. The results were compared statistically. The most successful method was the Gaussian process regression method, and the least successful was the lasso regression method. The mean bias error (MBE) of the Gaussian process regression method was 0.274 MJ/m2, its root mean square error (RMSE) was 2.260 MJ/m2, and its correlation coefficient was 0.941. These statistical results are consistent with the literature. The Gaussian process regression method is recommended for other studies.
Numerical study of impeller-driven von Kármán flows via a volume penalization method
NASA Astrophysics Data System (ADS)
Kreuzahler, S.; Schulz, D.; Homann, H.; Ponty, Y.; Grauer, R.
2014-10-01
Studying strongly turbulent flows is still a major challenge in fluid dynamics. It is highly desirable to have comparable experiments to obtain a better understanding of the mechanisms generating turbulence. The von Kármán flow apparatus is one of those experiments that has been used in various turbulence studies by different experimental groups over the last two decades. The von Kármán flow apparatus produces a highly turbulent flow inside a cylindrical vessel driven by two counter-rotating impellers. The studies cover a broad range of physical systems including incompressible flows, especially water and air, magnetohydrodynamic systems using liquid metal for understanding the important topic of the dynamo instability, particle tracking to study Lagrangian-type turbulence and recently quantum turbulence in super-fluid helium. Therefore, accompanying numerical studies of the von Kármán flow that quantitatively compare data with those from experiments are of high importance for understanding the mechanism producing the characteristic flow patterns. We present a direct numerical simulation (DNS) version of the von Kármán flow, forced by two rotating impellers. The cylinder geometry and the rotating objects are modelled via a penalization method and implemented in a massively parallel pseudo-spectral Navier-Stokes solver. From the wide range of different impellers used in von Kármán water and sodium experiments we choose a special configuration (TM28), in order to compare our simulations with the corresponding set of well documented water experiments. Though this configuration is different from the one in the final VKS experiment (TM73), using our method it is quite easy to change the impeller shape to the one actually used in VKS. The decomposition into poloidal and toroidal components and the mean velocity field from our simulations are in good agreement with experimental results. In addition, we analysed the flow structure close to the impeller blades, a region
NASA Astrophysics Data System (ADS)
Chatelin, Robin; Poncet, Philippe
2014-07-01
Particle methods are very convenient to compute transport equations in fluid mechanics as their computational cost is linear and they are not limited by convection stability conditions. To achieve large 3D computations the method must be coupled to efficient algorithms for velocity computations, including a good treatment of non-homogeneities and complex moving geometries. The penalization method enables the treatment of moving-body interactions by adding a term in the conservation of momentum equation. This work introduces a new computational algorithm to solve the penalization term and the Laplace operators implicitly in the same step, since explicit computations are limited by stability issues, especially at low Reynolds number. This computational algorithm is based on the Sherman-Morrison-Woodbury formula coupled to a GMRES iterative method to reduce the computations to a sequence of Poisson problems: this makes it possible to formulate a penalized Poisson equation as a large perturbation of a standard Poisson problem, by means of algebraic relations. A direct consequence is the possibility to use fast solvers based on Fast Fourier Transforms for this problem with good efficiency from both the computational and the memory consumption points of view, since these solvers are recursive and they do not perform any matrix assembling. The resulting fluid mechanics computations are very fast and they consume a small amount of memory, compared to a reference solver or a linear system resolution. The present applications focus mainly on a coupling between transport equations and 3D Stokes equations, for studying the motion of biological organisms in highly viscous flows with variable viscosity.
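For reference, the Sherman-Morrison-Woodbury identity underlying this reduction, which expresses the inverse of a base operator A perturbed by a low-rank term UCV through solves with A alone:

$$(A + UCV)^{-1} = A^{-1} - A^{-1}U\left(C^{-1} + VA^{-1}U\right)^{-1}VA^{-1}$$

In the setting above, A plays the role of the standard Poisson operator (inverted cheaply by FFT-based solvers) and the penalization term supplies the perturbation.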
Cheng, Lishui; Hobbs, Robert F.; Sgouros, George; Frey, Eric C.
2014-01-01
Purpose: Three-dimensional (3D) dosimetry has the potential to provide better prediction of response of normal tissues and tumors and is based on 3D estimates of the activity distribution in the patient obtained from emission tomography. Dose–volume histograms (DVHs) are an important summary measure of 3D dosimetry and a widely used tool for treatment planning in radiation therapy. Accurate estimates of the radioactivity distribution in space and time are desirable for accurate 3D dosimetry. The purpose of this work was to develop and demonstrate the potential of penalized SPECT image reconstruction methods to improve DVHs estimates obtained from 3D dosimetry methods. Methods: The authors developed penalized image reconstruction methods, using maximum a posteriori (MAP) formalism, which intrinsically incorporate regularization in order to control noise and, unlike linear filters, are designed to retain sharp edges. Two priors were studied: one is a 3D hyperbolic prior, termed single-time MAP (STMAP), and the second is a 4D hyperbolic prior, termed cross-time MAP (CTMAP), using both the spatial and temporal information to control noise. The CTMAP method assumed perfect registration between the estimated activity distributions and projection datasets from the different time points. Accelerated and convergent algorithms were derived and implemented. A modified NURBS-based cardiac-torso phantom with a multicompartment kidney model and organ activities and parameters derived from clinical studies were used in a Monte Carlo simulation study to evaluate the methods. Cumulative dose-rate volume histograms (CDRVHs) and cumulative DVHs (CDVHs) obtained from the phantom and from SPECT images reconstructed with both the penalized algorithms and OS-EM were calculated and compared both qualitatively and quantitatively. The STMAP method was applied to patient data and CDRVHs obtained with STMAP and OS-EM were compared qualitatively. Results: The results showed that the
Birthweight Related Factors in Northwestern Iran: Using Quantile Regression Method
Fallah, Ramazan; Kazemnejad, Anoshirvan; Zayeri, Farid; Shoghli, Alireza
2016-01-01
Introduction: Birthweight is one of the most important predictive indicators of health status in adulthood. Having a balanced birthweight is one of the priorities of the health system in most industrial and developed countries. This indicator is used to assess the growth and health status of infants. The aim of this study was to assess the birthweight of neonates using quantile regression in Zanjan province. Methods: This descriptive analytical study was carried out using pre-registered (March 2010 - March 2012) data on neonates in urban/rural health centers of Zanjan province, using multiple-stage cluster sampling. Data were analyzed using multiple linear regression and the quantile regression method with SAS 9.2 statistical software. Results: Of 8456 newborns, 4146 (49%) were female. The mean age of the mothers was 27.1±5.4 years. The mean birthweight of the neonates was 3104 ± 431 grams. Five hundred and seventy-three (6.8%) of the neonates weighed less than 2500 grams. In all quantiles, gestational age of the neonates (p<0.05) and weight and educational level of the mothers (p<0.05) showed a significant linear relationship with the birthweight of the neonates. However, sex and birth rank of the neonates, mothers' age, place of residence (urban/rural) and occupation were not significant in all quantiles (p>0.05). Conclusion: This study revealed that the results of multiple linear regression and quantile regression were not identical. We strongly recommend the use of quantile regression when the response variable is asymmetric or the data contain outliers. PMID:26925889
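For readers unfamiliar with the contrast drawn in this conclusion, the difference between a mean (OLS) fit and quantile fits is easy to reproduce with statsmodels' QuantReg. The predictors, coefficients and skewed noise below are invented for illustration and are not the study's data.

import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(0)
n = 500
gest_age = rng.normal(39, 1.5, n)        # hypothetical gestational age, weeks
mat_weight = rng.normal(65, 10, n)       # hypothetical maternal weight, kg
# Right-skewed noise, so mean and quantile fits differ
bw = 3100 + 120 * (gest_age - 39) + 8 * (mat_weight - 65) + rng.gumbel(0, 180, n)

X = sm.add_constant(np.column_stack([gest_age, mat_weight]))
for q in (0.1, 0.5, 0.9):
    fit = sm.QuantReg(bw, X).fit(q=q)
    print(f"q={q}: {np.round(fit.params, 1)}")
print("OLS :", np.round(sm.OLS(bw, X).fit().params, 1))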
Conventional occlusion versus pharmacologic penalization for amblyopia
Li, Tianjing; Shotton, Kate
2013-01-01
Background Amblyopia is defined as defective visual acuity in one or both eyes without demonstrable abnormality of the visual pathway, and is not immediately resolved by wearing glasses. Objectives To assess the effectiveness and safety of conventional occlusion versus atropine penalization for amblyopia. Search methods We searched CENTRAL, MEDLINE, EMBASE, LILACS, the WHO International Clinical Trials Registry Platform, reference lists, the Science Citation Index and ongoing trials up to June 2009. Selection criteria We included randomized/quasi-randomized controlled trials comparing conventional occlusion to atropine penalization for amblyopia. Data collection and analysis Two authors independently screened abstracts and full-text articles, abstracted data, and assessed the risk of bias. Main results Three trials with a total of 525 amblyopic eyes were included. Of these three trials, one was assessed as having a low risk of bias and one as having a high risk of bias. Evidence from the three trials suggests atropine penalization is as effective as conventional occlusion. One trial found similar improvement in vision at six and 24 months. At six months, visual acuity in the amblyopic eye improved from baseline by 3.16 lines in the occlusion group and 2.84 lines in the atropine group (mean difference 0.034 logMAR; 95% confidence interval (CI) 0.005 to 0.064 logMAR). At 24 months, additional improvement was seen in both groups, but there continued to be no meaningful difference (mean difference 0.01 logMAR; 95% CI −0.02 to 0.04 logMAR). The second trial reported atropine to be more effective than occlusion. At six months, visual acuity improved 1.8 lines in the patching group and 3.4 lines in the atropine penalization group, in favor of atropine (mean difference −0.16 logMAR; 95% CI −0.23 to −0.09 logMAR). Different occlusion modalities were used in these two trials. The third trial had inherent methodological flaws and limited inference could
Sparse Multivariate Regression With Covariance Estimation
Rothman, Adam J.; Levina, Elizaveta; Zhu, Ji
2014-01-01
We propose a procedure for constructing a sparse estimator of a multivariate regression coefficient matrix that accounts for correlation of the response variables. This method, which we call multivariate regression with covariance estimation (MRCE), involves penalized likelihood with simultaneous estimation of the regression coefficients and the covariance structure. An efficient optimization algorithm and a fast approximation are developed for computing MRCE. Using simulation studies, we show that the proposed method outperforms relevant competitors when the responses are highly correlated. We also apply the new method to a finance example on predicting asset returns. An R package containing this dataset and code for computing MRCE and its approximation is available online. PMID:24963268
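A simplified sketch of the alternating scheme behind MRCE is given below, assuming the usual MRCE objective (an L1-penalized Gaussian negative log-likelihood in both the coefficient matrix B and the error precision matrix Omega). This is not the authors' optimized algorithm or their R package: the B-step is written as one lasso on a Kronecker-stacked system, the Omega-step as a graphical lasso on the residuals, and the penalty scalings follow scikit-learn's conventions.

import numpy as np
from sklearn.linear_model import Lasso
from sklearn.covariance import GraphicalLasso

def mrce_sketch(X, Y, lam_b=0.1, lam_omega=0.05, n_iter=5):
    n, p = X.shape
    q = Y.shape[1]
    B, Omega = np.zeros((p, q)), np.eye(q)
    for _ in range(n_iter):
        # B-step: with Omega = L L^T, tr[(Y-XB) Omega (Y-XB)^T] = ||YL - XBL||_F^2,
        # so the penalized update of B is a single lasso on the stacked system,
        # since vec(XBL) = kron(L^T, X) vec(B).
        L = np.linalg.cholesky(Omega)
        y_stack = (Y @ L).flatten(order="F")
        X_stack = np.kron(L.T, X)
        fit = Lasso(alpha=lam_b, fit_intercept=False, max_iter=5000).fit(X_stack, y_stack)
        B = fit.coef_.reshape((p, q), order="F")
        # Omega-step: sparse precision matrix of the residuals via the graphical lasso
        Omega = GraphicalLasso(alpha=lam_omega).fit(Y - X @ B).precision_
    return B, Omega

rng = np.random.default_rng(1)
X = rng.normal(size=(100, 6))
B_true = np.zeros((6, 3)); B_true[0] = 1.0           # one active predictor
Y = X @ B_true + rng.multivariate_normal(np.zeros(3), 0.5 + 0.5 * np.eye(3), 100)
B_hat, Omega_hat = mrce_sketch(X, Y)
print(np.round(B_hat, 2))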
Multiobjective optimization for model selection in kernel methods in regression.
You, Di; Benitez-Quiroz, Carlos Fabian; Martinez, Aleix M
2014-10-01
Regression plays a major role in many scientific and engineering problems. The goal of regression is to learn the unknown underlying function from a set of sample vectors with known outcomes. In recent years, kernel methods in regression have facilitated the estimation of nonlinear functions. However, two major (interconnected) problems remain open. The first problem is given by the bias-versus-variance tradeoff. If the model used to estimate the underlying function is too flexible (i.e., high model complexity), the variance will be very large. If the model is fixed (i.e., low complexity), the bias will be large. The second problem is to define an approach for selecting the appropriate parameters of the kernel function. To address these two problems, this paper derives a new smoothing kernel criterion, which measures the roughness of the estimated function as a measure of model complexity. Then, we use multiobjective optimization to derive a criterion for selecting the parameters of that kernel. The goal of this criterion is to find a tradeoff between the bias and the variance of the learned function. That is, the goal is to increase the model fit while keeping the model complexity in check. We provide extensive experimental evaluations using a variety of problems in machine learning, pattern recognition, and computer vision. The results demonstrate that the proposed approach yields smaller estimation errors as compared with methods in the state of the art. PMID:25291740
The extinction law from photometric data: linear regression methods
NASA Astrophysics Data System (ADS)
Ascenso, J.; Lombardi, M.; Lada, C. J.; Alves, J.
2012-04-01
Context. The properties of dust grains, in particular their size distribution, are expected to differ from the interstellar medium to the high-density regions within molecular clouds. Since the extinction at near-infrared wavelengths is caused by dust, the extinction law in cores should depart from that found in low-density environments if the dust grains have different properties. Aims: We explore methods to measure the near-infrared extinction law produced by dense material in molecular cloud cores from photometric data. Methods: Using controlled sets of synthetic and semi-synthetic data, we test several methods for linear regression applied to the specific problem of deriving the extinction law from photometric data. We cover the parameter space appropriate to this type of observations. Results: We find that many of the common linear-regression methods produce biased results when used to derive the extinction law from photometric colors. We propose and validate a new method, LinES, as the most reliable for this purpose. We explore the use of this method to detect whether or not the extinction law of a given reddened population has a break at some value of extinction. Based on observations collected at the European Organisation for Astronomical Research in the Southern Hemisphere, Chile (ESO programmes 069.C-0426 and 074.C-0728).
Cathodic protection design using the regression and correlation method
Niembro, A.M.; Ortiz, E.L.G.
1997-09-01
A computerized statistical method which calculates the current demand requirement based on potential measurements for cathodic protection systems is introduced. The method uses the regression and correlation analysis of statistical measurements of current and potentials of the piping network. This approach involves four steps: field potential measurements, statistical determination of the current required to achieve full protection, installation of more cathodic protection capacity with distributed anodes around the plant and examination of the protection potentials. The procedure is described and recommendations for the improvement of the existing and new cathodic protection systems are given.
ERIC Educational Resources Information Center
Rule, David L.
Several regression methods were examined within the framework of weighted structural regression (WSR), comparing their regression weight stability and score estimation accuracy in the presence of outlier contamination. The methods compared are: (1) ordinary least squares; (2) WSR ridge regression; (3) minimum risk regression; (4) minimum risk 2;…
A locally adaptive kernel regression method for facies delineation
NASA Astrophysics Data System (ADS)
Fernàndez-Garcia, D.; Barahona-Palomo, M.; Henri, C. V.; Sanchez-Vila, X.
2015-12-01
Facies delineation is defined as the separation of geological units with distinct intrinsic characteristics (grain size, hydraulic conductivity, mineralogical composition). A major challenge in this area stems from the fact that only a few scattered pieces of hydrogeological information are available to delineate geological facies. Several methods to delineate facies are available in the literature, ranging from those based only on existing hard data, to those including secondary data or external knowledge about sedimentological patterns. This paper describes a methodology that uses kernel regression methods as an effective tool for facies delineation. The method uses both the spatial locations and the actual sampled values to produce, for each individual hard data point, a locally adaptive steering kernel function, self-adjusting the principal directions of the local anisotropic kernels to the direction of highest local spatial correlation. The method is shown to outperform the nearest neighbor classification method in a number of synthetic aquifers whenever the available number of hard data is small and randomly distributed in space. In the case of exhaustive sampling, the steering kernel regression method converges to the true solution. Simulations run on a suite of synthetic examples are used to explore the selection of kernel parameters in typical field settings. It is shown that, in practice, a rule of thumb can be used to obtain suboptimal results. The performance of the method is demonstrated to improve significantly when external information regarding facies proportions is incorporated. Remarkably, the method allows for a reasonable reconstruction of the facies connectivity patterns, shown in terms of breakthrough-curve performance.
Robust Logistic and Probit Methods for Binary and Multinomial Regression
Tabatabai, MA; Li, H; Eby, WM; Kengwoung-Keumo, JJ; Manne, U; Bae, S; Fouad, M; Singh, KP
2015-01-01
In this paper we introduce new robust estimators for logistic and probit regression for binary, multinomial, nominal and ordinal data, and apply these models to estimate the parameters when outliers or influential observations are present. Maximum likelihood estimates do not behave well when outliers or influential observations are present. One remedy is to remove influential observations from the data and then apply the maximum likelihood technique to the remaining data. Another approach is to employ a robust technique that can handle outliers and influential observations without removing any observations from the data sets. The robustness of the method is tested using real and simulated data sets. PMID:26078914
Analyzing big data with the hybrid interval regression methods.
Huang, Chia-Hui; Yang, Keng-Chieh; Kao, Han-Ying
2014-01-01
Big data is a new trend at present, exerting significant impacts on information technologies. In big data applications, one of the most pressing issues is dealing with large-scale data sets that often require computation resources provided by public cloud services. How to analyze big data efficiently becomes a big challenge. In this paper, we combine interval regression with the smooth support vector machine (SSVM) to analyze big data. Recently, the SSVM was proposed as an alternative to the standard SVM and has been proved more efficient than the traditional SVM in processing large-scale data. In addition, the soft margin method is proposed to modify the excursion of the separation margin and to be effective in the gray zone, where the distribution of the data is hard to describe and the separation margin between classes is unclear. PMID:25143968
L1-Penalized N-way PLS for subset of electrodes selection in BCI experiments
NASA Astrophysics Data System (ADS)
Eliseyev, Andrey; Moro, Cecile; Faber, Jean; Wyss, Alexander; Torres, Napoleon; Mestais, Corinne; Benabid, Alim Louis; Aksenova, Tetiana
2012-08-01
Recently, the N-way partial least squares (NPLS) approach was reported as an effective tool for neuronal signal decoding and brain-computer interface (BCI) system calibration. This method simultaneously analyzes data in several domains. It combines the projection of a data tensor to a low-dimensional space with linear regression. In this paper, L1-penalized NPLS is proposed for sparse BCI system calibration, uniting the projection technique with effective selection of a subset of features. The L1-penalized NPLS was applied to binary self-paced BCI system calibration, providing selection of an electrode subset. Our BCI system is designed for animal research, in particular for research in non-human primates.
The Variance Normalization Method of Ridge Regression Analysis.
ERIC Educational Resources Information Center
Bulcock, J. W.; And Others
The testing of contemporary sociological theory often calls for the application of structural-equation models to data which are inherently collinear. It is shown that simple ridge regression, which is commonly used for controlling the instability of ordinary least squares regression estimates in ill-conditioned data sets, is not a legitimate…
Stochastic Approximation Methods for Latent Regression Item Response Models
ERIC Educational Resources Information Center
von Davier, Matthias; Sinharay, Sandip
2010-01-01
This article presents an application of a stochastic approximation expectation maximization (EM) algorithm using a Metropolis-Hastings (MH) sampler to estimate the parameters of an item response latent regression model. Latent regression item response models are extensions of item response theory (IRT) to a latent variable model with covariates…
Kwak, Il-Youp; Moore, Candace R; Spalding, Edgar P; Broman, Karl W
2014-08-01
Most statistical methods for quantitative trait loci (QTL) mapping focus on a single phenotype. However, multiple phenotypes are commonly measured, and recent technological advances have greatly simplified the automated acquisition of numerous phenotypes, including function-valued phenotypes, such as growth measured over time. While methods exist for QTL mapping with function-valued phenotypes, they are generally computationally intensive and focus on single-QTL models. We propose two simple, fast methods that maintain high power and precision and are amenable to extensions with multiple-QTL models using a penalized likelihood approach. After identifying multiple QTL by these approaches, we can view the function-valued QTL effects to provide a deeper understanding of the underlying processes. Our methods have been implemented as a package for R, funqtl. PMID:24931408
Bayes and empirical Bayes methods for reduced rank regression models in matched case-control studies
Satagopan, Jaya M.; Sen, Ananda; Zhou, Qin; Lan, Qing; Rothman, Nathaniel; Langseth, Hilde; Engel, Lawrence S.
2015-01-01
Matched case-control studies are popular designs used in epidemiology for assessing the effects of exposures on binary traits. Modern studies increasingly enjoy the ability to examine a large number of exposures in a comprehensive manner. However, several risk factors often tend to be related in a non-trivial way, undermining efforts to identify the risk factors using standard analytic methods due to inflated type I errors and possible masking of effects. Epidemiologists often use data reduction techniques by grouping the prognostic factors using a thematic approach, with themes deriving from biological considerations. We propose shrinkage-type estimators based on Bayesian penalization methods to estimate the effects of the risk factors using these themes. The properties of the estimators are examined using extensive simulations. The methodology is illustrated using data from a matched case-control study of polychlorinated biphenyls in relation to the etiology of non-Hodgkin's lymphoma. PMID:26575519
An Investigation of the Median-Median Method of Linear Regression
ERIC Educational Resources Information Center
Walters, Elizabeth J.; Morrell, Christopher H.; Auer, Richard E.
2006-01-01
Least squares regression is the most common method of fitting a straight line to a set of bivariate data. Another less known method that is available on Texas Instruments graphing calculators is median-median regression. This method is proposed as a simple method that may be used with middle and high school students to motivate the idea of fitting…
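The median-median procedure is simple enough to state in full. The sketch below follows the common textbook/calculator convention (slope from the outer-thirds summary points, intercept averaged over all three); splitting rules when n is not divisible by 3 vary between implementations.

import numpy as np

def median_median_line(x, y):
    order = np.argsort(x)
    x, y = np.asarray(x, float)[order], np.asarray(y, float)[order]
    n = len(x)
    k = n // 3
    # Left / middle / right thirds by x (the middle group absorbs any remainder)
    thirds = [(x[:k], y[:k]), (x[k:n - k], y[k:n - k]), (x[n - k:], y[n - k:])]
    mx = [np.median(g[0]) for g in thirds]
    my = [np.median(g[1]) for g in thirds]
    slope = (my[2] - my[0]) / (mx[2] - mx[0])
    intercept = np.mean([my[i] - slope * mx[i] for i in range(3)])
    return slope, intercept

x = [1, 2, 3, 4, 5, 6, 7, 8, 60]                 # one x-outlier
y = [2.1, 4.2, 5.9, 8.1, 9.8, 12.2, 14.1, 15.8, 9.0]
print(median_median_line(x, y))                  # resistant to the outlier
print(np.polyfit(x, y, 1))                       # least squares is pulled toward it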
Weighted Structural Regression: A Broad Class of Adaptive Methods for Improving Linear Prediction.
ERIC Educational Resources Information Center
Pruzek, Robert M.; Lepak, Greg M.
1992-01-01
Adaptive forms of weighted structural regression are developed and discussed. Bootstrapping studies indicate that the new methods have potential to recover known population regression weights and predict criterion score values routinely better than do ordinary least squares methods. The new methods are scale free and simple to compute. (SLD)
NASA Astrophysics Data System (ADS)
Sykas, Dimitris; Karathanassi, Vassilia
2015-06-01
This paper presents a new method for automatically determining the optimum regression model for estimating a parameter. The concept relies on the combination of k spectral pre-processing algorithms (SPPAs) that enhance spectral features correlated with the desired parameter. Initially, a pre-processing algorithm takes a single spectral signature as input and transforms it according to the SPPA function. A k-step combination of SPPAs applies k pre-processing algorithms serially: the result of each SPPA is used as input to the next, and so on until the k desired pre-processed signatures are obtained. These signatures are then used as input to three different regression methods: Normalized band Difference Regression (NDR), Multiple Linear Regression (MLR) and Partial Least Squares Regression (PLSR). Three Simple Genetic Algorithms (SGAs) are used, one for each regression method, to select the optimum combination of k SPPAs. The performance of the SGAs is evaluated based on the RMS error of the regression models. The evaluation indicates not only the optimum SPPA combination but also the regression method that produces the optimum prediction model. The proposed method was applied to soil spectral measurements in order to predict Soil Organic Matter (SOM). In this study, the maximum value assigned to k was 3. PLSR yielded the highest accuracy, while NDR's accuracy was satisfactory given its low complexity. The MLR method showed severe drawbacks due to noise-induced collinearity among the spectral bands. Most of the regression methods required a 3-step combination of SPPAs to achieve their highest performance. The selected pre-processing algorithms were different for each regression method, since each regression method handles the explanatory variables differently.
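A rough stand-in for this pipeline can be written with scikit-learn's PLS implementation, replacing the simple genetic algorithm by exhaustive search over the 3-step chains (feasible for a small SPPA pool). The pre-processing functions and synthetic spectra below are invented placeholders, not the paper's SPPAs or its soil data.

import itertools
import numpy as np
from sklearn.cross_decomposition import PLSRegression
from sklearn.model_selection import cross_val_score

sppas = {                                        # hypothetical SPPAs, applied row-wise
    "none": lambda s: s,
    "log": lambda s: np.log1p(np.abs(s)),
    "deriv": lambda s: np.gradient(s, axis=1),
    "snv": lambda s: (s - s.mean(1, keepdims=True)) / s.std(1, keepdims=True),
}

rng = np.random.default_rng(2)
spectra = rng.normal(size=(60, 200)).cumsum(axis=1)    # synthetic "soil spectra"
som = spectra[:, 50] * 0.01 + rng.normal(0, 0.1, 60)   # synthetic SOM values

best = None
for chain in itertools.product(sppas, repeat=3):       # all k = 3 SPPA combinations
    s = spectra
    for name in chain:
        s = sppas[name](s)
    score = cross_val_score(PLSRegression(n_components=5), s, som,
                            scoring="neg_root_mean_squared_error", cv=5).mean()
    if best is None or score > best[0]:
        best = (score, chain)
print("best cross-validated RMSE %.3f with chain %s" % (-best[0], best[1]))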
Evaluation of preservative systems in a sunscreen formula by linear regression method.
Bou-Chacra, Nádia A; Pinto, Terezinha de Jesus A; Ohara, Mitsuko Taba
2003-01-01
A sunscreen formula with eight different preservative systems was evaluated by linear regression, pharmacopeial, and the CTFA (Cosmetic, Toiletry and Fragrance Association) methods. The preparations were tested against Staphylococcus aureus, Burkholderia cepacia, Shewanella putrefaciens, Escherichia coli, and Bacillus sp. The linear regression method proved to be useful in the selection of the most effective preservative system used in cosmetic formulation. PMID:12688287
A new method for dealing with measurement error in explanatory variables of regression models.
Freedman, Laurence S; Fainberg, Vitaly; Kipnis, Victor; Midthune, Douglas; Carroll, Raymond J
2004-03-01
We introduce a new method, moment reconstruction, of correcting for measurement error in covariates in regression models. The central idea is similar to regression calibration in that the values of the covariates that are measured with error are replaced by "adjusted" values. In regression calibration the adjusted value is the expectation of the true value conditional on the measured value. In moment reconstruction the adjusted value is the variance-preserving empirical Bayes estimate of the true value conditional on the outcome variable. The adjusted values thereby have the same first two moments and the same covariance with the outcome variable as the unobserved "true" covariate values. We show that moment reconstruction is equivalent to regression calibration in the case of linear regression, but leads to different results for logistic regression. For case-control studies with logistic regression and covariates that are normally distributed within cases and controls, we show that the resulting estimates of the regression coefficients are consistent. In simulations we demonstrate that for logistic regression, moment reconstruction carries less bias than regression calibration, and for case-control studies is superior in mean-square error to the standard regression calibration approach. Finally, we give an example of the use of moment reconstruction in linear discriminant analysis and a nonstandard problem where we wish to adjust a classification tree for measurement error in the explanatory variables. PMID:15032787
Risk prediction with machine learning and regression methods.
Steyerberg, Ewout W; van der Ploeg, Tjeerd; Van Calster, Ben
2014-07-01
This is a discussion of issues in risk prediction based on the following papers: "Probability estimation with machine learning methods for dichotomous and multicategory outcome: Theory" by Jochen Kruppa, Yufeng Liu, Gérard Biau, Michael Kohler, Inke R. König, James D. Malley, and Andreas Ziegler; and "Probability estimation with machine learning methods for dichotomous and multicategory outcome: Applications" by Jochen Kruppa, Yufeng Liu, Hans-Christian Diener, Theresa Holste, Christian Weimar, Inke R. König, and Andreas Ziegler. PMID:24615859
Gaussian Process Regression Plus Method for Localization Reliability Improvement.
Liu, Kehan; Meng, Zhaopeng; Own, Chung-Ming
2016-01-01
Location data are among the most widely used context data in context-aware and ubiquitous computing applications. Many systems with distinct deployment costs and positioning accuracies have been developed over the past decade for indoor positioning. The most practical methods are based on the received signal strength (RSS) from a set of signal-transmitting access points. However, manually compiling an RSS fingerprint database involves high costs and thus is impractical in an online prediction environment. The system used in this study relied on the Gaussian process method, a nonparametric model that is characterized completely by its mean function and covariance matrix. In addition, the Naive Bayes method was used to verify and simplify the computation of precise predictions. The authors conducted several experiments in simulated and real environments at Tianjin University, examining distinct data sizes, different kernels, and accuracy. The results showed that the proposed method not only retains positioning accuracy but also saves computation time in location predictions. PMID:27483276
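A minimal sketch of GP-based fingerprinting in this spirit regresses each coordinate on the RSS vector with scikit-learn's GaussianProcessRegressor. The floor layout, log-distance path-loss model and kernel choices are illustrative assumptions, not the authors' system.

import numpy as np
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import RBF, WhiteKernel

rng = np.random.default_rng(3)
# Hypothetical survey: 150 fingerprint positions on a 20 m x 20 m floor, 4 APs
pos = rng.uniform(0, 20, size=(150, 2))
aps = np.array([[0.0, 0.0], [0.0, 20.0], [20.0, 0.0], [20.0, 20.0]])
d = np.linalg.norm(pos[:, None, :] - aps[None, :, :], axis=2)
rss = -40 - 20 * np.log10(d + 1) + rng.normal(0, 2, d.shape)   # log-distance path loss

# One GP per coordinate, mapping the RSS vector to a location estimate
gps = [GaussianProcessRegressor(kernel=RBF(10.0) + WhiteKernel(1.0),
                                normalize_y=True).fit(rss, pos[:, j]) for j in range(2)]

test = np.array([[12.0, 7.0]])
d_t = np.linalg.norm(test[:, None, :] - aps[None, :, :], axis=2)
rss_t = -40 - 20 * np.log10(d_t + 1)
print([round(float(gp.predict(rss_t)[0]), 1) for gp in gps])   # should be near (12, 7)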
[Criminalistic and penal problems with "dyadic deaths"].
Kaliszczak, Paweł; Kunz, Jerzy; Bolechała, Filip
2002-01-01
This paper is a supplement to the article "Medico-legal problems of dyadic death" by the same authors, recalling the cases presented there. It is also an attempt to present the basic criminalistic, penal and definitional problems of dyadic death, also called post-aggressional suicide. The criminalistic problems of dyadic death were presented in view of the widely known "rule of seven golden questions": what?, where?, when?, how?, why?, by what method? and who? Criminalistic analysis of the cases yields somewhat different conclusions, but it seemed interesting to apply both the criminalistic and the forensic points of view to the presented material. PMID:14669688
A regularization corrected score method for nonlinear regression models with covariate error.
Zucker, David M; Gorfine, Malka; Li, Yi; Tadesse, Mahlet G; Spiegelman, Donna
2013-03-01
Many regression analyses involve explanatory variables that are measured with error, and failing to account for this error is well known to lead to biased point and interval estimates of the regression coefficients. We present here a new general method for adjusting for covariate error. Our method consists of an approximate version of the Stefanski-Nakamura corrected score approach, using the method of regularization to obtain an approximate solution of the relevant integral equation. We develop the theory in the setting of classical likelihood models; this setting covers, for example, linear regression, nonlinear regression, logistic regression, and Poisson regression. The method is extremely general in terms of the types of measurement error models covered, and is a functional method in the sense of not involving assumptions on the distribution of the true covariate. We discuss the theoretical properties of the method and present simulation results in the logistic regression setting (univariate and multivariate). For illustration, we apply the method to data from the Harvard Nurses' Health Study concerning the relationship between physical activity and breast cancer mortality in the period following a diagnosis of breast cancer. PMID:23379851
Interquantile Shrinkage and Variable Selection in Quantile Regression
Jiang, Liewen; Bondell, Howard D.; Wang, Huixia Judy
2014-01-01
Examination of multiple conditional quantile functions provides a comprehensive view of the relationship between the response and covariates. In situations where quantile slope coefficients share some common features, estimation efficiency and model interpretability can be improved by utilizing such commonality across quantiles. Furthermore, elimination of irrelevant predictors will also aid in estimation and interpretation. These motivations lead to the development of two penalization methods, which can identify the interquantile commonality and nonzero quantile coefficients simultaneously. The developed methods are based on a fused penalty that encourages sparsity of both quantile coefficients and interquantile slope differences. The oracle properties of the proposed penalization methods are established. Through numerical investigations, it is demonstrated that the proposed methods lead to simpler model structure and higher estimation efficiency than the traditional quantile regression estimation. PMID:24653545
ERIC Educational Resources Information Center
Shih, Ching-Lin; Liu, Tien-Hsiang; Wang, Wen-Chung
2014-01-01
The simultaneous item bias test (SIBTEST) method regression procedure and the differential item functioning (DIF)-free-then-DIF strategy are applied to the logistic regression (LR) method simultaneously in this study. These procedures are used to adjust the effects of matching true score on observed score and to better control the Type I error…
Regression calibration method for correcting measurement-error bias in nutritional epidemiology.
Spiegelman, D; McDermott, A; Rosner, B
1997-04-01
Regression calibration is a statistical method for adjusting point and interval estimates of effect obtained from regression models commonly used in epidemiology for bias due to measurement error in assessing nutrients or other variables. Previous work developed regression calibration for use in estimating odds ratios from logistic regression. We extend this here to estimating incidence rate ratios from Cox proportional hazards models and regression slopes from linear-regression models. Regression calibration is appropriate when a gold standard is available in a validation study and a linear measurement error with constant variance applies or when replicate measurements are available in a reliability study and linear random within-person error can be assumed. In this paper, the method is illustrated by correction of rate ratios describing the relations between the incidence of breast cancer and dietary intakes of vitamin A, alcohol, and total energy in the Nurses' Health Study. An example using linear regression is based on estimation of the relation between ultradistal radius bone density and dietary intakes of caffeine, calcium, and total energy in the Massachusetts Women's Health Study. Software implementing these methods uses SAS macros. PMID:9094918
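The core of regression calibration is a one-line substitution, illustrated below on simulated data with classical measurement error in a logistic model. In practice the attenuation factor is estimated from a validation or reliability substudy; here it is computed from the simulated truth purely for demonstration.

import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(4)
n = 20000
x = rng.normal(0, 1, n)                  # true long-term exposure (unobserved)
w = x + rng.normal(0, 1, n)              # measurement with classical error
y = rng.binomial(1, 1 / (1 + np.exp(-(-2 + 0.5 * x))))   # true log odds ratio = 0.5

naive = sm.Logit(y, sm.add_constant(w)).fit(disp=0)
# Regression calibration: replace w by E[x | w] = mean(w) + lam * (w - mean(w))
lam = np.var(x) / np.var(w)              # attenuation factor, known only in simulation
x_hat = w.mean() + lam * (w - w.mean())
rc = sm.Logit(y, sm.add_constant(x_hat)).fit(disp=0)
print("naive slope %.3f, calibrated slope %.3f (truth 0.5)"
      % (naive.params[1], rc.params[1]))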
Cox Regression Models with Functional Covariates for Survival Data
Gellar, Jonathan E.; Colantuoni, Elizabeth; Needham, Dale M.; Crainiceanu, Ciprian M.
2015-01-01
We extend the Cox proportional hazards model to cases when the exposure is a densely sampled functional process, measured at baseline. The fundamental idea is to combine penalized signal regression with methods developed for mixed effects proportional hazards models. The model is fit by maximizing the penalized partial likelihood, with smoothing parameters estimated by a likelihood-based criterion such as AIC or EPIC. The model may be extended to allow for multiple functional predictors, time varying coefficients, and missing or unequally-spaced data. Methods were inspired by and applied to a study of the association between time to death after hospital discharge and daily measures of disease severity collected in the intensive care unit, among survivors of acute respiratory distress syndrome. PMID:26441487
ERIC Educational Resources Information Center
Cohen, Ayala; Nahum-Shani, Inbal; Doveh, Etti
2010-01-01
In their seminal paper, Edwards and Parry (1993) presented the polynomial regression as a better alternative to applying difference score in the study of congruence. Although this method is increasingly applied in congruence research, its complexity relative to other methods for assessing congruence (e.g., difference score methods) was one of the…
ERIC Educational Resources Information Center
Blair, R. Clifford; Higgins, J.J.
1978-01-01
The controversy surrounding regression methods for unbalanced factorial designs is addressed. The statistical hypotheses being tested under the various methods, as well as salient issues in the use of these methods, are discussed. The use of statistical computer packages is also discussed. (Author/JKS)
Sparling, D.W.; Barzen, J.A.; Lovvorn, J.R.; Serie, J.R.
1992-01-01
Regression equations that use mensural data to estimate body condition have been developed for several water birds. These equations often have been based on data that represent different sexes, age classes, or seasons, without being adequately tested for intergroup differences. We used proximate carcass analysis of 538 adult and juvenile canvasbacks (Aythya valisineria ) collected during fall migration, winter, and spring migrations in 1975-76 and 1982-85 to test regression methods for estimating body condition.
The Bland-Altman Method Should Not Be Used in Regression Cross-Validation Studies
ERIC Educational Resources Information Center
O'Connor, Daniel P.; Mahar, Matthew T.; Laughlin, Mitzi S.; Jackson, Andrew S.
2011-01-01
The purpose of this study was to demonstrate the bias in the Bland-Altman (BA) limits of agreement method when it is used to validate regression models. Data from 1,158 men were used to develop three regression equations to estimate maximum oxygen uptake (R[superscript 2] = 0.40, 0.61, and 0.82, respectively). The equations were evaluated in a…
A comparison of several methods of solving nonlinear regression groundwater flow problems.
Cooley, R.L.
1985-01-01
Computational efficiency and computer memory requirements of four function-minimization methods were compared on four test nonlinear-regression steady-state groundwater flow problems. The fastest methods were the Marquardt and quasi-linearization methods, which required almost identical computer times and numbers of iterations; the next fastest was the quasi-Newton method, and the slowest was the Fletcher-Reeves method, which did not converge within 100 iterations for two of the problems.
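SciPy exposes close relatives of these minimizers, so the comparison is easy to replay on a toy nonlinear-regression problem. In the sketch below the exponential "drawdown" data are synthetic, and SciPy's CG method is a Polak-Ribiere variant rather than Fletcher-Reeves.

import numpy as np
from scipy.optimize import least_squares, minimize

rng = np.random.default_rng(5)
t = np.linspace(0, 10, 50)
obs = 3.0 * np.exp(-0.7 * t) + 1.5 + rng.normal(0, 0.05, t.size)

def resid(p):                            # nonlinear regression residuals
    a, k, c = p
    return a * np.exp(-k * t) + c - obs

def sse(p):                              # half sum of squared errors
    r = resid(p)
    return 0.5 * r @ r

p0 = np.array([1.0, 1.0, 0.0])
fits = [("Marquardt", least_squares(resid, p0, method="lm")),
        ("quasi-Newton (BFGS)", minimize(sse, p0, method="BFGS")),
        ("conjugate gradient", minimize(sse, p0, method="CG"))]
for name, res in fits:
    print(name, np.round(res.x, 3), "function evaluations:", res.nfev)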
Linear regression techniques for use in the EC tracer method of secondary organic aerosol estimation
NASA Astrophysics Data System (ADS)
Saylor, Rick D.; Edgerton, Eric S.; Hartsell, Benjamin E.
A variety of linear regression techniques and simple slope estimators are evaluated for use in the elemental carbon (EC) tracer method of secondary organic carbon (OC) estimation. Linear regression techniques based on ordinary least squares are not suitable for situations where measurement uncertainties exist in both regressed variables. In the past, regression based on the method of Deming [1943. Statistical Adjustment of Data. Wiley, London] has been the preferred choice for EC tracer method parameter estimation. In agreement with Chu [2005. Stable estimate of primary OC/EC ratios in the EC tracer method. Atmospheric Environment 39, 1383-1392], we find that in the limited case where primary non-combustion OC (OC non-comb) is assumed to be zero, the ratio of averages (ROA) approach provides a stable and reliable estimate of the primary OC-EC ratio, (OC/EC) pri. In contrast with Chu [2005. Stable estimate of primary OC/EC ratios in the EC tracer method. Atmospheric Environment 39, 1383-1392], however, we find that the optimal use of Deming regression (and the more general York et al. [2004. Unified equations for the slope, intercept, and standard errors of the best straight line. American Journal of Physics 72, 367-375] regression) provides excellent results as well. For the more typical case where OC non-comb is allowed to obtain a non-zero value, we find that regression based on the method of York is the preferred choice for EC tracer method parameter estimation. In the York regression technique, detailed information on uncertainties in the measurement of OC and EC is used to improve the linear best fit to the given data. If only limited information is available on the relative uncertainties of OC and EC, then Deming regression should be used. On the other hand, use of ROA in the estimation of secondary OC, and thus the assumption of a zero OC non-comb value, generally leads to an overestimation of the contribution of secondary OC to total measured OC.
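Deming regression has a closed-form slope, so a sketch of the parameter-estimation step is short. The error-variance ratio delta must come from measurement-uncertainty information on OC and EC; the data below are simulated with a known primary ratio and non-combustion intercept, and are not from the paper.

import numpy as np

def deming(x, y, delta=1.0):
    # Deming slope/intercept; delta = var(error in y) / var(error in x)
    x, y = np.asarray(x, float), np.asarray(y, float)
    sxx, syy = np.var(x, ddof=1), np.var(y, ddof=1)
    sxy = np.cov(x, y, ddof=1)[0, 1]
    slope = (syy - delta * sxx + np.sqrt((syy - delta * sxx) ** 2
             + 4 * delta * sxy ** 2)) / (2 * sxy)
    return slope, y.mean() - slope * x.mean()

rng = np.random.default_rng(6)
ec_true = rng.uniform(0.5, 5.0, 200)
ec = ec_true + rng.normal(0, 0.2, 200)               # measurement error in EC
oc = 2.2 * ec_true + 0.8 + rng.normal(0, 0.4, 200)   # (OC/EC)pri = 2.2, OCnon-comb = 0.8
print("Deming:", deming(ec, oc, delta=(0.4 / 0.2) ** 2))
print("OLS   :", np.polyfit(ec, oc, 1))              # slope attenuated by error in EC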
Ksantini, Riadh; Ziou, Djemel; Colin, Bernard; Dubeau, François
2008-02-01
In this paper, we investigate the effectiveness of a Bayesian logistic regression model to compute the weights of a pseudo-metric, in order to improve its discriminatory capacity and thereby increase image retrieval accuracy. In the proposed Bayesian model, the prior knowledge of the observations is incorporated and the posterior distribution is approximated by a tractable Gaussian form using variational transformation and Jensen's inequality, which allow a fast and straightforward computation of the weights. The pseudo-metric makes use of the compressed and quantized versions of wavelet decomposed feature vectors, and in our previous work, the weights were adjusted by classical logistic regression model. A comparative evaluation of the Bayesian and classical logistic regression models is performed for content-based image retrieval as well as for other classification tasks, in a decontextualized evaluation framework. In this same framework, we compare the Bayesian logistic regression model to some relevant state-of-the-art classification algorithms. Experimental results show that the Bayesian logistic regression model outperforms these linear classification algorithms, and is a significantly better tool than the classical logistic regression model to compute the pseudo-metric weights and improve retrieval and classification performance. Finally, we perform a comparison with results obtained by other retrieval methods. PMID:18084057
GLOBALLY ADAPTIVE QUANTILE REGRESSION WITH ULTRA-HIGH DIMENSIONAL DATA
Zheng, Qi; Peng, Limin; He, Xuming
2015-01-01
Quantile regression has become a valuable tool to analyze heterogeneous covariate-response associations that are often encountered in practice. The development of quantile regression methodology for high-dimensional covariates primarily focuses on examination of model sparsity at a single or multiple quantile levels, which are typically prespecified ad hoc by the users. The resulting models may be sensitive to the specific choices of the quantile levels, leading to difficulties in interpretation and erosion of confidence in the results. In this article, we propose a new penalization framework for quantile regression in the high-dimensional setting. We employ adaptive L1 penalties, and more importantly, propose a uniform selector of the tuning parameter for a set of quantile levels to avoid some of the potential problems with model selection at individual quantile levels. Our proposed approach achieves consistent shrinkage of regression quantile estimates across a continuous range of quantile levels, enhancing the flexibility and robustness of existing penalized quantile regression methods. Our theoretical results include the oracle rate of uniform convergence and weak convergence of the parameter estimators. We also use numerical studies to confirm our theoretical findings and illustrate the practical utility of our proposal. PMID:26604424
Assessment of Weighted Quantile Sum Regression for Modeling Chemical Mixtures and Cancer Risk
Czarnota, Jenna; Gennings, Chris; Wheeler, David C
2015-01-01
In evaluation of cancer risk related to environmental chemical exposures, the effect of many chemicals on disease is ultimately of interest. However, because of potentially strong correlations among chemicals that occur together, traditional regression methods suffer from collinearity effects, including regression coefficient sign reversal and variance inflation. In addition, penalized regression methods designed to remediate collinearity may have limitations in selecting the truly bad actors among many correlated components. The recently proposed method of weighted quantile sum (WQS) regression attempts to overcome these problems by estimating a body burden index, which identifies important chemicals in a mixture of correlated environmental chemicals. Our focus was on assessing through simulation studies the accuracy of WQS regression in detecting subsets of chemicals associated with health outcomes (binary and continuous) in site-specific analyses and in non-site-specific analyses. We also evaluated the performance of the penalized regression methods of lasso, adaptive lasso, and elastic net in correctly classifying chemicals as bad actors or unrelated to the outcome. We based the simulation study on data from the National Cancer Institute Surveillance Epidemiology and End Results Program (NCI-SEER) case–control study of non-Hodgkin lymphoma (NHL) to achieve realistic exposure situations. Our results showed that WQS regression had good sensitivity and specificity across a variety of conditions considered in this study. The shrinkage methods had a tendency to incorrectly identify a large number of components, especially in the case of strong association with the outcome. PMID:26005323
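A bare-bones version of the WQS index can be written as a constrained maximum-likelihood problem: chemicals are scored into quartiles and a simplex-constrained weight vector is estimated inside a logistic model. The sketch below omits WQS's usual training/validation split and bootstrap averaging of the weights.

import numpy as np
from scipy.optimize import minimize

rng = np.random.default_rng(7)
n, c = 500, 8
z = rng.multivariate_normal(np.zeros(c), 0.6 + 0.4 * np.eye(c), n)  # correlated exposures
q = np.argsort(np.argsort(z, axis=0), axis=0) * 4 // n              # quartile scores 0-3
w_true = np.array([0.5, 0.3, 0.2, 0, 0, 0, 0, 0])                   # three "bad actors"
y = rng.binomial(1, 1 / (1 + np.exp(-(-1 + 1.0 * q @ w_true))))

def negloglik(theta):
    b0, b1 = theta[:2]
    w = np.abs(theta[2:]); w = w / w.sum()     # weights constrained to the simplex
    eta = b0 + b1 * (q @ w)
    return np.sum(np.logaddexp(0, eta) - y * eta)

res = minimize(negloglik, x0=np.r_[0.0, 0.5, np.full(c, 1.0 / c)],
               method="Nelder-Mead", options={"maxiter": 20000, "fatol": 1e-8})
w = np.abs(res.x[2:])
print("estimated weights:", np.round(w / w.sum(), 2))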
Factor Regression Analysis: A New Method for Weighting Predictors. Final Report.
ERIC Educational Resources Information Center
Curtis, Ervin W.
The optimum weighting of variables to predict a dependent-criterion variable is an important problem in nearly all of the social and natural sciences. Although the predominant method, multiple regression analysis (MR), yields optimum weights for the sample at hand, these weights are not generally optimum in the population from which the sample was…
Comparing regression methods for the two-stage clonal expansion model of carcinogenesis.
Kaiser, J C; Heidenreich, W F
2004-11-15
In the statistical analysis of cohort data with risk estimation models, both Poisson and individual likelihood regressions are widely used methods of parameter estimation. In this paper, their performance has been tested with the biologically motivated two-stage clonal expansion (TSCE) model of carcinogenesis. To exclude the inevitable uncertainties of existing data, cohorts with simple individual exposure histories have been created by Monte Carlo simulation. To reproduce some properties of atomic bomb survivors and radon-exposed mine workers, both acute and protracted exposure patterns have been generated. The two regression methods were then compared on their capacity to retrieve a priori known model parameters from the simulated cohort data. For simple models with smooth hazard functions, the parameter estimates from both methods come close to their true values. However, for models with strongly discontinuous hazard functions, which are generated by the cell mutation process of transformation, the Poisson regression method fails to produce reliable estimates. This behaviour is explained by the construction of class averages during data stratification, whereby some indispensable information on the individual exposure history is destroyed. It could not be repaired by countermeasures such as the refinement of Poisson classes or a more adequate choice of Poisson groups; although such a choice might still exist, we were unable to discover it. In contrast, the individual likelihood regression technique was found to work reliably for all considered versions of the TSCE model. PMID:15490436
Double Cross-Validation in Multiple Regression: A Method of Estimating the Stability of Results.
ERIC Educational Resources Information Center
Rowell, R. Kevin
In multiple regression analysis, where resulting predictive equation effectiveness is subject to shrinkage, it is especially important to evaluate result replicability. Double cross-validation is an empirical method by which an estimate of invariance or stability can be obtained from research data. A procedure for double cross-validation is…
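Double cross-validation is mechanically simple: fit the regression equation in each random half of the sample, apply each equation to the opposite half, and compare the cross-validated correlations with the in-sample ones. A minimal sketch on synthetic data:

import numpy as np

rng = np.random.default_rng(8)
n, p = 200, 5
X = rng.normal(size=(n, p))
y = X @ np.array([0.6, 0.4, 0.3, 0.0, 0.0]) + rng.normal(0, 1, n)

perm = rng.permutation(n)
a, b = perm[: n // 2], perm[n // 2:]

def ols_weights(idx):
    Xd = np.column_stack([np.ones(idx.size), X[idx]])
    return np.linalg.lstsq(Xd, y[idx], rcond=None)[0]

def r(idx, w):   # correlation of observed and predicted criterion scores
    pred = np.column_stack([np.ones(idx.size), X[idx]]) @ w
    return np.corrcoef(y[idx], pred)[0, 1]

wa, wb = ols_weights(a), ols_weights(b)
print("fit on A: in-sample r = %.3f, cross-validated on B: r = %.3f" % (r(a, wa), r(b, wa)))
print("fit on B: in-sample r = %.3f, cross-validated on A: r = %.3f" % (r(b, wb), r(a, wb)))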
A Simple and Convenient Method of Multiple Linear Regression to Calculate Iodine Molecular Constants
ERIC Educational Resources Information Center
Cooper, Paul D.
2010-01-01
A new procedure using a student-friendly least-squares multiple linear-regression technique utilizing a function within Microsoft Excel is described that enables students to calculate molecular constants from the vibronic spectrum of iodine. This method is advantageous pedagogically as it calculates molecular constants for ground and excited…
ERIC Educational Resources Information Center
Baker, Bruce D.; Richards, Craig E.
1999-01-01
Applies neural network methods for forecasting 1991-95 per-pupil expenditures in U.S. public elementary and secondary schools. Forecasting models included the National Center for Education Statistics' multivariate regression model and three neural architectures. Regarding prediction accuracy, neural network results were comparable or superior to…
NASA Technical Reports Server (NTRS)
Sidik, S. M.
1975-01-01
Ridge, Marquardt's generalized inverse, shrunken, and principal components estimators are discussed in terms of the objectives of point estimation of parameters, estimation of the predictive regression function, and hypothesis testing. It is found that as the normal equations approach singularity, more consideration must be given to estimable functions of the parameters as opposed to estimation of the full parameter vector; that biased estimators all introduce constraints on the parameter space; that adoption of mean squared error as a criterion of goodness should be independent of the degree of singularity; and that ordinary least-squares subset regression is the best overall method.
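The trade-off among these biased estimators is easy to see on an ill-conditioned design. The sketch below cross-validates ordinary least squares, ridge, and principal-components regression on a synthetic near-singular problem; the tuning values (ridge alpha, number of components) are arbitrary illustrations.

import numpy as np
from sklearn.linear_model import LinearRegression, Ridge
from sklearn.decomposition import PCA
from sklearn.pipeline import make_pipeline
from sklearn.model_selection import cross_val_score

rng = np.random.default_rng(9)
n, p = 80, 10
base = rng.normal(size=(n, 3))                      # only 3 underlying factors
X = base @ rng.normal(size=(3, p)) + 0.05 * rng.normal(size=(n, p))
y = X[:, 0] - X[:, 1] + rng.normal(0, 0.5, n)

models = {"OLS": LinearRegression(),
          "ridge": Ridge(alpha=1.0),
          "PCR": make_pipeline(PCA(n_components=3), LinearRegression())}
for name, m in models.items():
    mse = -cross_val_score(m, X, y, scoring="neg_mean_squared_error", cv=5).mean()
    print(f"{name}: cross-validated MSE = {mse:.3f}")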
Semiparametric regression during 2003–2007
Ruppert, David; Wand, M.P.; Carroll, Raymond J.
2010-01-01
Semiparametric regression is a fusion between parametric regression and nonparametric regression that integrates low-rank penalized splines, mixed model and hierarchical Bayesian methodology – thus allowing more streamlined handling of longitudinal and spatial correlation. We review progress in the field over the five-year period between 2003 and 2007. We find semiparametric regression to be a vibrant field with substantial involvement and activity, continual enhancement and widespread application. PMID:20305800
Statistical methods for astronomical data with upper limits. II - Correlation and regression
NASA Technical Reports Server (NTRS)
Isobe, T.; Feigelson, E. D.; Nelson, P. I.
1986-01-01
Statistical methods for calculating correlations and regressions in bivariate censored data, where the dependent variable can have upper or lower limits, are presented. Cox's regression and the generalization of Kendall's rank correlation coefficient provide significance levels of correlations, while the EM algorithm, under the assumption of normally distributed errors, and its nonparametric analog using the Kaplan-Meier estimator, give estimates of the slope of a regression line. Monte Carlo simulations demonstrate that survival analysis is reliable in determining correlations between luminosities at different bands. Survival analysis is applied to CO emission in infrared galaxies, X-ray emission in radio galaxies, H-alpha emission in cooling cluster cores, and radio emission in Seyfert galaxies.
Estimation of anthropogenic heat emission over South Korea using a statistical regression method
NASA Astrophysics Data System (ADS)
Lee, Sang-Hyun; Kim, Soon-Tae
2015-05-01
Anthropogenic heating from human activity is one of the key contributing factors in forming urban heat islands, so inclusion of this heat source plays an important role in urban meteorological and environmental modeling. In this study, gridded anthropogenic heat flux (AHF) with high spatial (1-km) and temporal (1-hr) resolution is estimated for the whole South Korea region for the year 2010, using a statistical regression method derived from the similarity between anthropogenic air pollutant emissions and AHF in their emission inventories. The bottom-up anthropogenic pollutant emissions required for the regression method are produced using the intensive Korean air pollutant emission inventories. The calculated regression-based AHF compares well with the inventory-based AHF estimate for the Gyeong-In region, demonstrating that the statistical regression method can reasonably represent the spatio-temporal variation of the AHF within the region. The estimated AHF shows that, for the major Korean cities (Seoul, Busan, Daegu, Gwangju, Daejeon, and Ulsan), the annual mean AHF ranges over 10-50 W m-2 on the grid scale and 20-30 W m-2 on the city scale. The winter AHF is about 22% larger than the summer AHF, while the weekday AHF is 4-5% larger than the weekend AHF in the major Korean cities. The gridded AHF data estimated in this study can be used in mesoscale meteorological and environmental modeling for the South Korea region.
Neural Network and Regression Methods Demonstrated in the Design Optimization of a Subsonic Aircraft
NASA Technical Reports Server (NTRS)
Hopkins, Dale A.; Lavelle, Thomas M.; Patnaik, Surya
2003-01-01
The neural network and regression methods of NASA Glenn Research Center s COMETBOARDS design optimization testbed were used to generate approximate analysis and design models for a subsonic aircraft operating at Mach 0.85 cruise speed. The analytical model is defined by nine design variables: wing aspect ratio, engine thrust, wing area, sweep angle, chord-thickness ratio, turbine temperature, pressure ratio, bypass ratio, fan pressure; and eight response parameters: weight, landing velocity, takeoff and landing field lengths, approach thrust, overall efficiency, and compressor pressure and temperature. The variables were adjusted to optimally balance the engines to the airframe. The solution strategy included a sensitivity model and the soft analysis model. Researchers generated the sensitivity model by training the approximators to predict an optimum design. The trained neural network predicted all response variables, within 5-percent error. This was reduced to 1 percent by the regression method. The soft analysis model was developed to replace aircraft analysis as the reanalyzer in design optimization. Soft models have been generated for a neural network method, a regression method, and a hybrid method obtained by combining the approximators. The performance of the models is graphed for aircraft weight versus thrust as well as for wing area and turbine temperature. The regression method followed the analytical solution with little error. The neural network exhibited 5-percent maximum error over all parameters. Performance of the hybrid method was intermediate in comparison to the individual approximators. Error in the response variable is smaller than that shown in the figure because of a distortion scale factor. The overall performance of the approximators was considered to be satisfactory because aircraft analysis with NASA Langley Research Center s FLOPS (Flight Optimization System) code is a synthesis of diverse disciplines: weight estimation, aerodynamic
NASA Astrophysics Data System (ADS)
Asavaskulkiet, Krissada
2014-01-01
This paper proposes a novel face super-resolution reconstruction (hallucination) technique for the YCbCr color space. The underlying idea is to learn with an error regression model and multi-linear principal component analysis (MPCA). In the hallucination framework, color face images are represented in YCbCr space. To reduce the time complexity of color face hallucination, the color face images are naturally described as tensors or multi-linear arrays. In addition, error regression analysis is used to find the error estimate, which can be obtained from the existing low-resolution image in tensor space. The learning process starts from the errors made by MPCA in reconstructing the face images of the training dataset, and then finds the relationship between input and error by regression analysis. The hallucinating process uses the standard MPCA back-projection method, after which the result is corrected with the error estimate. In this contribution we show that our hallucination technique is suitable for color face images in both RGB and YCbCr space. By using the MPCA subspace with the error regression model, we can generate photorealistic color face images. Our approach is demonstrated by extensive experiments with high-quality hallucinated color faces. Comparison with existing algorithms shows the effectiveness of the proposed method.
Penalized Q-Learning for Dynamic Treatment Regimens
Song, R.; Wang, W.; Zeng, D.; Kosorok, M. R.
2014-01-01
A dynamic treatment regimen incorporates both accrued information and long-term effects of treatment from specially designed clinical trials. As these trials become more and more popular in conjunction with longitudinal data from clinical studies, the development of statistical inference for optimal dynamic treatment regimens is a high priority. In this paper, we propose a new machine learning framework called penalized Q-learning, under which valid statistical inference is established. We also propose a new statistical procedure: individual selection and corresponding methods for incorporating individual selection within penalized Q-learning. Extensive numerical studies are presented which compare the proposed methods with existing methods, under a variety of scenarios, and demonstrate that the proposed approach is both inferentially and computationally superior. It is illustrated with a depression clinical trial study. PMID:26257504
Phillips, Kirk T.; Street, W. Nick
2005-01-01
The purpose of this study is to determine the best prediction of heart failure outcomes resulting from two methods -- standard epidemiologic analysis with logistic regression and knowledge discovery with supervised learning/data mining. Heart failure was chosen for this study as it exhibits higher prevalence and cost of treatment than most other hospitalized diseases. The prevalence of heart failure has exceeded 4 million cases in the U.S. Findings of this study should be useful for the design of quality improvement initiatives, as particular aspects of patient comorbidity and treatment are found to be associated with mortality. This is also a proof-of-concept study, considering the feasibility of emerging health informatics methods of data mining in conjunction with, or in lieu of, traditional logistic regression methods of prediction. Findings may also support the design of decision support systems and quality improvement programming for other diseases. PMID:16779367
Dhanya, S; Kumari Roshni, V S
2016-01-01
Textures play an important role in image classification. This paper proposes a high-performance texture classification method using a combination of a multiresolution analysis tool and linear regression modelling by channel elimination. The correlation between different frequency regions has been validated as an effective texture characteristic. The method is motivated by the observation that there exists a distinctive correlation between image samples belonging to the same kind of texture at the different frequency regions obtained by a wavelet transform. Experimentally, it is observed that this correlation differs across textures. Linear regression modelling is employed to analyze this correlation and extract texture features that characterize the samples. Our method considers not only the frequency regions but also the correlation between these regions. This paper primarily focuses on applying the Dual Tree Complex Wavelet Packet Transform and the linear regression model for classification of the obtained texture features. Additionally, the paper presents a comparative assessment of the classification results obtained from the above method with two more types of wavelet transform methods, namely the Discrete Wavelet Transform and the Discrete Wavelet Packet Transform. PMID:26835234
NASA Astrophysics Data System (ADS)
Zhu, Dazhou; Ji, Baoping; Meng, Chaoying; Shi, Bolin; Tu, Zhenhua; Qing, Zhaoshen
Hybrid linear analysis (HLA), partial least-squares (PLS) regression, and the linear least squares support vector machine (LSSVM) were used to determine the soluble solids content (SSC) of apple by Fourier transform near-infrared (FT-NIR) spectroscopy. The performance of these three linear regression methods was compared. Results showed that HLA could be used for the analysis of complex solid samples such as apple. The predictive ability of the SSC model constructed by HLA was comparable to that of PLS. HLA was sensitive to outliers, so outliers should be eliminated before HLA calibration. The linear LSSVM performed better than PLS and HLA. Direct orthogonal signal correction (DOSC) pretreatment was effective for PLS and the linear LSSVM, but not suitable for HLA. The combination of DOSC and the linear LSSVM had good generalization ability and was not sensitive to outliers, so it is a promising method for linear multivariate calibration.
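A minimal sketch of the PLS calibration step on simulated FT-NIR-like spectra follows; the spectra, the number of latent variables, and the cross-validation scheme are assumptions for illustration (HLA and LSSVM are not reproduced).

```python
# Minimal sketch, assuming synthetic spectra: PLS calibration of the kind
# compared in the study.
import numpy as np
from sklearn.cross_decomposition import PLSRegression
from sklearn.model_selection import cross_val_predict

rng = np.random.default_rng(2)
ssc = rng.uniform(8, 16, size=120)                 # hypothetical soluble solids content (%)
wavelengths = np.linspace(0, 1, 200)
# Fake absorbance spectra: SSC-driven peak plus noise
X = (ssc[:, None] * np.exp(-((wavelengths - 0.4) ** 2) / 0.01)
     + rng.normal(0, 0.2, size=(120, 200)))

pls = PLSRegression(n_components=5)
pred = cross_val_predict(pls, X, ssc, cv=10).ravel()
rmsecv = np.sqrt(np.mean((pred - ssc) ** 2))
print(f"RMSECV: {rmsecv:.3f} % SSC")
```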
Race Making in a Penal Institution.
Walker, Michael L
2016-01-01
This article provides a ground-level investigation into the lives of penal inmates, linking the literature on race making and penal management to provide an understanding of racial formation processes in a modern penal institution. Drawing on 135 days of ethnographic data collected as an inmate in a Southern California county jail system, the author argues that inmates are subjected to two mutually constitutive racial projects--one institutional and the other microinteractional. Operating in symbiosis within a narrative of risk management, these racial projects increase (rather than decrease) incidents of intraracial violence and the potential for interracial violence. These findings have implications for understanding the process of racialization and evaluating the effectiveness of penal management strategies. PMID:27017706
Zhang, Li-qing; Wu, Xiao-hua; Tang, Xi; Zhu, Xian-liang; Su, Wen-ting
2002-06-01
The principal component regression (PCR) method is used to analyze five components: acetaminophen, p-aminophenol, caffeine, chlorphenamine maleate and guaifenesin. The basic principle and the analytical steps of the approach are described in detail. The computer program LHG is written in the VB language. The experimental results show that the PCR method has no systematic error compared with the classical method, and that the average recovery of each component is in the range from 96.43% to 107.14%. Satisfactory results are obtained for each component without any pre-separation. The approach is simple, rapid and suitable for computer-aided analysis. PMID:12938324
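The PCR idea itself is compact enough to sketch: project the mixture spectra onto a few principal components, then regress the concentrations on the scores. Everything below (spectra, noise level, component count) is synthetic; the paper's LHG program is not reproduced.

```python
# Minimal sketch of principal component regression (PCR) for multicomponent
# spectral analysis, on synthetic mixture absorbances (Beer's law).
import numpy as np
from sklearn.decomposition import PCA
from sklearn.linear_model import LinearRegression
from sklearn.pipeline import make_pipeline

rng = np.random.default_rng(3)
C = rng.uniform(0.1, 1.0, size=(50, 5))          # 5 component concentrations
S = rng.uniform(size=(5, 100))                   # pure-component spectra (assumed)
A = C @ S + rng.normal(0, 0.01, size=(50, 100))  # mixture absorbances

pcr = make_pipeline(PCA(n_components=5), LinearRegression()).fit(A, C)
recovery = pcr.predict(A) / C * 100
print("mean recovery per component (%):", recovery.mean(axis=0).round(1))
```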
Adaptive wavelet simulation of global ocean dynamics using a new Brinkman volume penalization
NASA Astrophysics Data System (ADS)
Kevlahan, N. K.-R.; Dubos, T.; Aechtner, M.
2015-12-01
In order to easily enforce solid-wall boundary conditions in the presence of complex coastlines, we propose a new mass and energy conserving Brinkman penalization for the rotating shallow water equations. This penalization does not lead to higher wave speeds in the solid region. The error estimates for the penalization are derived analytically and verified numerically for linearized one-dimensional equations. The penalization is implemented in a conservative dynamically adaptive wavelet method for the rotating shallow water equations on the sphere with bathymetry and coastline data from NOAA's ETOPO1 database. This code could form the dynamical core for a future global ocean model. The potential of the dynamically adaptive ocean model is illustrated by using it to simulate the 2004 Indonesian tsunami and wind-driven gyres.
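For orientation, the classical Brinkman volume penalization augments the momentum equation with a drag term that is active only inside the solid mask; the paper's contribution is a mass- and energy-conserving variant of this idea for the rotating shallow water equations, which differs in detail from the generic form sketched here (with χ the solid-region mask and η ≪ 1 the penalization parameter):

```latex
\partial_t \mathbf{u} + (\mathbf{u}\cdot\nabla)\mathbf{u}
  + f\,\hat{\mathbf{z}}\times\mathbf{u}
  = -g\,\nabla h \;-\; \frac{\chi(\mathbf{x})}{\eta}\,\mathbf{u},
\qquad
\chi(\mathbf{x}) = \begin{cases} 1 & \text{solid region,}\\ 0 & \text{fluid region.}\end{cases}
```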
Makra, László; Matyasovszky, István; Thibaudon, Michel; Bonini, Maira
2011-05-01
Nonparametric time-varying regression methods were developed to forecast daily ragweed pollen concentration, and the probability of the exceedance of a given concentration threshold 1 day ahead. Five-day and 10-day predictions of the start and end of the pollen season were also addressed with a nonparametric regression technique combining regression analysis with the method of temperature sum. Our methods were applied to three of the most polluted regions in Europe, namely Lyon (Rhône Valley, France), Legnano (Po River Plain, Italy) and Szeged (Great Plain, Hungary). For a 1-day prediction of both the daily pollen concentration and daily threshold exceedance, the order of these cities from the smallest to largest prediction errors was Legnano, Lyon, Szeged and Legnano, Szeged, Lyon, respectively. The most important predictor for each location was the pollen concentration of previous days. The second main predictor was precipitation for Lyon, and temperature for Legnano and Szeged. Wind speed should be considered for daily concentration at Legnano, and for daily pollen threshold exceedances at Lyon and Szeged. Prediction capabilities compared to the annual cycles for the start and end of the pollen season decreased from west to east. The order of the cities from the lowest to largest errors for the end of the pollen season was Lyon, Legnano, Szeged for both the 5- and 10-day predictions, while for the start of the pollen season the order was Legnano, Lyon, Szeged for 5-day predictions, and Legnano, Szeged, Lyon for 10-day predictions. PMID:20625911
Adluru, Nagesh; Hanlon, Bret M; Lutz, Antoine; Lainhart, Janet E; Alexander, Andrew L; Davidson, Richard J
2013-04-01
Neuroimage phenotyping for psychiatric and neurological disorders is performed using voxelwise analyses also known as voxel based analyses or morphometry (VBM). A typical voxelwise analysis treats measurements at each voxel (e.g., fractional anisotropy, gray matter probability) as outcome measures to study the effects of possible explanatory variables (e.g., age, group) in a linear regression setting. Furthermore, each voxel is treated independently until the stage of correction for multiple comparisons. Recently, multi-voxel pattern analyses (MVPA), such as classification, have arisen as an alternative to VBM. The main advantage of MVPA over VBM is that the former employ multivariate methods which can account for interactions among voxels in identifying significant patterns. They also provide ways for computer-aided diagnosis and prognosis at individual subject level. However, compared to VBM, the results of MVPA are often more difficult to interpret and prone to arbitrary conclusions. In this paper, first we use penalized likelihood modeling to provide a unified framework for understanding both VBM and MVPA. We then utilize statistical learning theory to provide practical methods for interpreting the results of MVPA beyond commonly used performance metrics, such as leave-one-out-cross validation accuracy and area under the receiver operating characteristic (ROC) curve. Additionally, we demonstrate that there are challenges in MVPA when trying to obtain image phenotyping information in the form of statistical parametric maps (SPMs), which are commonly obtained from VBM, and provide a bootstrap strategy as a potential solution for generating SPMs using MVPA. This technique also allows us to maximize the use of available training data. We illustrate the empirical performance of the proposed framework using two different neuroimaging studies that pose different levels of challenge for classification using MVPA. PMID:23397550
Impact of regression methods on improved effects of soil structure on soil water retention estimates
NASA Astrophysics Data System (ADS)
Nguyen, Phuong Minh; De Pue, Jan; Le, Khoa Van; Cornelis, Wim
2015-06-01
Increasing the accuracy of pedotransfer functions (PTFs), an indirect method for predicting non-readily available soil features such as soil water retention characteristics (SWRC), is of crucial importance for large scale agro-hydrological modeling. Adding significant predictors (i.e., soil structure) and implementing more flexible regression algorithms are among the main strategies of PTF improvement. The aim of this study was to investigate whether the improved effect of categorical soil structure information on estimating soil-water content at various matric potentials, which has been reported in the literature, could also be captured by regression techniques other than the usually applied linear regression. Two data mining techniques, i.e., Support Vector Machines (SVM) and k-Nearest Neighbors (kNN), which have recently been introduced as promising tools for PTF development, were utilized to test whether the incorporation of soil structure would improve PTF accuracy in a context of rather limited training data. The results show that incorporating descriptive soil structure information, i.e., massive, structured and structureless, as a grouping criterion can improve the accuracy of PTFs derived by the SVM approach in the range of matric potential of -6 to -33 kPa (average RMSE decreased up to 0.005 m3 m-3 after grouping, depending on matric potentials). The improvement was primarily attributed to the outperformance of SVM-PTFs calibrated on structureless soils. No improvement was obtained with the kNN technique, at least not in our study, in which the data set became limited in size after grouping. Since there is an impact of regression techniques on the improved effect of incorporating qualitative soil structure information, selecting a proper technique will help to maximize the combined influence of flexible regression algorithms and soil structure information on PTF accuracy.
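To make the two data mining techniques concrete, here is a minimal sketch comparing SVR and kNN regressors as pedotransfer functions on synthetic soil data; the predictors, sample size, and hyperparameters are assumptions, not the study's calibration.

```python
# Minimal sketch, with synthetic data, of the two regression techniques the
# study compares for PTFs: support vector regression (SVR) and k-nearest
# neighbors (kNN) predicting water content from basic soil properties.
import numpy as np
from sklearn.svm import SVR
from sklearn.neighbors import KNeighborsRegressor
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(4)
X = rng.uniform(size=(150, 4))          # e.g., sand, clay, bulk density, organic C
theta = 0.45 - 0.2 * X[:, 0] + 0.1 * X[:, 1] + rng.normal(0, 0.02, 150)  # water content

for name, reg in [("SVR", SVR(C=10.0, epsilon=0.01)),
                  ("kNN", KNeighborsRegressor(n_neighbors=5))]:
    model = make_pipeline(StandardScaler(), reg)
    rmse = np.sqrt(-cross_val_score(model, X, theta, cv=5,
                                    scoring="neg_mean_squared_error")).mean()
    print(f"{name}: RMSE = {rmse:.4f} m3 m-3")
```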
Flexible regression models over river networks
O’Donnell, David; Rushworth, Alastair; Bowman, Adrian W; Marian Scott, E; Hallard, Mark
2014-01-01
Many statistical models are available for spatial data but the vast majority of these assume that spatial separation can be measured by Euclidean distance. Data which are collected over river networks constitute a notable and commonly occurring exception, where distance must be measured along complex paths and, in addition, account must be taken of the relative flows of water into and out of confluences. Suitable models for this type of data have been constructed based on covariance functions. The aim of the paper is to place the focus on underlying spatial trends by adopting a regression formulation and using methods which allow smooth but flexible patterns. Specifically, kernel methods and penalized splines are investigated, with the latter proving more suitable from both computational and modelling perspectives. In addition to their use in a purely spatial setting, penalized splines also offer a convenient route to the construction of spatiotemporal models, where data are available over time as well as over space. Models which include main effects and spatiotemporal interactions, as well as seasonal terms and interactions, are constructed for data on nitrate pollution in the River Tweed. The results give valuable insight into the changes in water quality in both space and time. PMID:25653460
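As a toy illustration of the penalized-spline building block (before any river-network adaptation), the sketch below fits a one-dimensional penalized spline using a truncated-line basis with a ridge penalty on the knot coefficients; the basis, knot count, and smoothing parameter are assumptions, not the paper's formulation.

```python
# Minimal sketch of a 1-D penalized spline: truncated-line basis [1, x, (x-k)_+]
# with a ridge penalty applied only to the knot terms. Pure numpy; synthetic data.
import numpy as np

rng = np.random.default_rng(5)
x = np.sort(rng.uniform(0, 1, 200))
y = np.sin(2 * np.pi * x) + rng.normal(0, 0.3, 200)

knots = np.linspace(0, 1, 20)[1:-1]                  # interior knots
B = np.column_stack([np.ones_like(x), x,
                     np.clip(x[:, None] - knots, 0, None)])
lam = 1.0                                            # smoothing parameter (assumed)
D = np.diag([0.0, 0.0] + [1.0] * len(knots))         # penalize only knot coefficients
beta = np.linalg.solve(B.T @ B + lam * D, B.T @ y)   # penalized least squares
fit = B @ beta                                       # smooth fitted curve
print("residual SD:", np.std(y - fit).round(3))
```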
An Efficient Simulation Budget Allocation Method Incorporating Regression for Partitioned Domains*
Brantley, Mark W.; Lee, Loo Hay; Chen, Chun-Hung; Xu, Jie
2014-01-01
Simulation can be a very powerful tool to support decision making in many applications, but exploring multiple courses of action can be time consuming. Numerous ranking and selection (R&S) procedures have been developed to enhance the simulation efficiency of finding the best design. To further improve efficiency, one approach is to incorporate information from across the domain into a regression equation. However, the use of a regression metamodel also inherits some typical assumptions from most regression approaches, such as the assumptions that the underlying function is quadratic and that the simulation noise is homogeneous across the domain of interest. To relax these limitations while retaining the efficiency benefit, we propose to partition the domain of interest such that in each partition the mean of the underlying function is approximately quadratic. Our new method provides approximately optimal rules, between and within partitions, that determine the number of samples allocated to each design location. The goal is to maximize the probability of correctly selecting the best design. Numerical experiments demonstrate that our new approach can dramatically enhance efficiency over existing efficient R&S methods. PMID:24936099
Li, Min; Zhou, Tong; Song, Yanan
2016-07-01
A grain size characterization method based on energy attenuation coefficient spectrum and support vector regression (SVR) is proposed. First, the spectra of the first and second back-wall echoes are cut into several frequency bands to calculate the energy attenuation coefficient spectrum. Second, the frequency band that is sensitive to grain size variation is determined. Finally, a statistical model between the energy attenuation coefficient in the sensitive frequency band and average grain size is established through SVR. Experimental verification is conducted on austenitic stainless steel. The average relative error of the predicted grain size is 5.65%, which is better than that of conventional methods. PMID:26995732
NASA Astrophysics Data System (ADS)
Khazaei, Ardeshir; Sarmasti, Negin; Seyf, Jaber Yousefi
2016-03-01
Quantitative structure-activity relationship (QSAR) methods were used to study a series of curcumin-related compounds with inhibitory effects on prostate cancer PC-3 cells, pancreas cancer Panc-1 cells, and colon cancer HT-29 cells. The sphere exclusion method was used to split the data set into training and test sets. Multiple linear regression, principal component regression and partial least squares were used as the regression methods. In addition, to investigate the effect of feature selection methods, stepwise selection, genetic algorithm, and simulated annealing were used. In two cases (PC-3 cells and Panc-1 cells), the best models were generated by a combination of multiple linear regression and stepwise selection (PC-3 cells: r2 = 0.86, q2 = 0.82, pred_r2 = 0.93, and r2m (test) = 0.43; Panc-1 cells: r2 = 0.85, q2 = 0.80, pred_r2 = 0.71, and r2m (test) = 0.68). For the HT-29 cells, principal component regression with stepwise selection (r2 = 0.69, q2 = 0.62, pred_r2 = 0.54, and r2m (test) = 0.41) is the best method. The QSAR study reveals descriptors which play a crucial role in the inhibitory property of curcumin-like compounds. 6ChainCount, T_C_C_1, and T_O_O_7 are the most important descriptors with the greatest effect. To design and optimize novel, efficient curcumin-related compounds, it is useful to introduce heteroatoms such as nitrogen, oxygen, and sulfur into the chemical structure (reducing the contribution of the T_C_C_1 descriptor) and to increase the contributions of the 6ChainCount and T_O_O_7 descriptors. The models can be useful in the better design of novel curcumin-related compounds for use in the treatment of prostate, pancreas, and colon cancers.
Eliseyev, Andrey; Aksenova, Tetiana
2016-01-01
In the current paper the decoding algorithms for motor-related BCI systems for continuous upper limb trajectory prediction are considered. Two methods for the smooth prediction, namely Sobolev and Polynomial Penalized Multi-Way Partial Least Squares (PLS) regressions, are proposed. The methods are compared to the Multi-Way Partial Least Squares and Kalman Filter approaches. The comparison demonstrated that the proposed methods combined the prediction accuracy of the algorithms of the PLS family and trajectory smoothness of the Kalman Filter. In addition, the prediction delay is significantly lower for the proposed algorithms than for the Kalman Filter approach. The proposed methods could be applied in a wide range of applications beyond neuroscience. PMID:27196417
Mortality Prediction in ICUs Using A Novel Time-Slicing Cox Regression Method
Wang, Yuan; Chen, Wenlin; Heard, Kevin; Kollef, Marin H.; Bailey, Thomas C.; Cui, Zhicheng; He, Yujie; Lu, Chenyang; Chen, Yixin
2015-01-01
Over the last few decades, machine learning and data mining have been increasingly used for clinical prediction in ICUs. However, there is still a huge gap in making full use of the time-series data generated from ICUs. Aiming at filling this gap, we propose a novel approach entitled Time Slicing Cox regression (TS-Cox), which extends the classical Cox regression into a classification method on multi-dimensional time-series. Unlike traditional classifiers such as logistic regression and support vector machines, our model not only incorporates the discriminative features derived from the time-series, but also naturally exploits the temporal orders of these features based on a Cox-like function. Empirical evaluation on MIMIC-II database demonstrates the efficacy of the TS-Cox model. Our TS-Cox model outperforms all other baseline models by a good margin in terms of AUC_PR, sensitivity and PPV, which indicates that TS-Cox may be a promising tool for mortality prediction in ICUs. PMID:26958269
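As background, a classical Cox model, the starting point that TS-Cox extends to time-sliced features, can be fit in a few lines with the lifelines package; the features and survival-time mechanism below are synthetic assumptions, not MIMIC-II data.

```python
# Minimal sketch, not TS-Cox itself: a classical Cox proportional hazards fit
# with lifelines, on invented ICU-like features.
import numpy as np
import pandas as pd
from lifelines import CoxPHFitter

rng = np.random.default_rng(6)
n = 300
df = pd.DataFrame({
    "hr_mean": rng.normal(85, 10, n),        # hypothetical ICU features
    "lactate": rng.gamma(2.0, 1.0, n),
})
risk = 0.02 * df["hr_mean"] + 0.3 * df["lactate"]
df["T"] = rng.exponential(1 / np.exp(risk - risk.mean()))   # survival time
df["E"] = (rng.uniform(size=n) < 0.8).astype(int)           # event indicator

cph = CoxPHFitter().fit(df, duration_col="T", event_col="E")
cph.print_summary()   # hazard ratios for each feature
```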
A deformation analysis method of stepwise regression for bridge deflection prediction
NASA Astrophysics Data System (ADS)
Shen, Yueqian; Zeng, Ying; Zhu, Lei; Huang, Teng
2015-12-01
Large-scale bridges are among the most important infrastructure, and their safe condition affects people's daily activities and life safety. Monitoring of large-scale bridges is crucial since deformation may occur. How to obtain deformation information and then judge the safety condition are the key and difficult problems in the bridge deformation monitoring field. Deflection is an important index for evaluating bridge safety. This paper proposes a forecasting model based on stepwise regression analysis. Based on deflection monitoring data from the Yangtze River Bridge, the main factors influencing deflection deformation are studied. The monitoring data are used to forecast the deflection of the bridge at different times from a non-structural perspective, and the results are compared with forecasts from gray relational analysis based on linear regression. The results show that the accuracy and reliability of stepwise regression analysis are high, which provides a scientific basis for bridge operation management. Above all, the ideas in this research provide an effective method for bridge deformation analysis.
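A minimal sketch of forward stepwise selection by AIC, the generic procedure underlying stepwise regression analysis, is given below; the candidate predictors and data are invented stand-ins, not the Yangtze River Bridge monitoring data.

```python
# Minimal sketch of forward stepwise regression: greedily add the predictor
# that most improves AIC, stopping when no candidate helps.
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(7)
names = ["temperature", "traffic", "water_level", "wind"]   # hypothetical factors
X = rng.normal(size=(200, 4))
y = 2.0 * X[:, 0] + 0.8 * X[:, 2] + rng.normal(0, 1, 200)   # deflection stand-in

selected, remaining = [], list(range(4))
best_aic = sm.OLS(y, np.ones((200, 1))).fit().aic           # intercept-only model
while remaining:
    aics = {j: sm.OLS(y, sm.add_constant(X[:, selected + [j]])).fit().aic
            for j in remaining}
    j_best = min(aics, key=aics.get)
    if aics[j_best] >= best_aic:
        break                                   # no candidate improves AIC
    best_aic = aics[j_best]
    selected.append(j_best)
    remaining.remove(j_best)
print("selected predictors:", [names[j] for j in selected])
```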
NASA Astrophysics Data System (ADS)
Jiang, Wei-Zhong; Su, Ning; Ding, Li-Ping; Qiu, Si-Yu
This paper presents a new method to estimate the system harmonic impedance and the harmonic emission level based on improved weighted support vector machine (WSVM) regression. Based on the differences among harmonic measurement data at the point of common coupling, the WSVM is obtained by correcting the error requirement of the SVM with the Euclidean distance as a weighting index and by determining the weighting coefficient of the penalty parameter through linear interpolation; the system harmonic impedance and the harmonic emission level can then be calculated. Circuit simulations and practical application to field data show that the proposed method can effectively suppress the influence of background harmonic fluctuations on the estimation results. Compared with other methods, the estimates of the proposed method are more reasonable.
Statistical method for prediction of gait kinematics with Gaussian process regression.
Yun, Youngmok; Kim, Hyun-Chul; Shin, Sung Yul; Lee, Junwon; Deshpande, Ashish D; Kim, Changhwan
2014-01-01
We propose a novel methodology for predicting human gait pattern kinematics based on a statistical and stochastic approach using a method called Gaussian process regression (GPR). We selected 14 body parameters that significantly affect the gait pattern and 14 joint motions that represent gait kinematics. The body parameter and gait kinematics data were recorded from 113 subjects by anthropometric measurements and a motion capture system. We generated a regression model with GPR for gait pattern prediction and built a stochastic function mapping from body parameters to gait kinematics based on the database and GPR, and validated the model with a cross validation method. The function can not only produce trajectories for the joint motions associated with gait kinematics, but can also estimate the associated uncertainties. Our approach results in a novel, low-cost and subject-specific method for predicting gait kinematics with only the subject's body parameters as the necessary input, and also enables a comprehensive understanding of the correlation and uncertainty between body parameters and gait kinematics. PMID:24211221
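A minimal sketch of the GPR mapping from body parameters to a joint trajectory, with predictive uncertainty, follows; the body parameters, trajectory model, and kernel choice are assumptions, not the paper's measured database.

```python
# Minimal sketch: Gaussian process regression from body parameters to a gait
# trajectory, returning both a mean prediction and its uncertainty.
import numpy as np
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import RBF, WhiteKernel

rng = np.random.default_rng(8)
B = rng.normal(size=(113, 3))          # stand-in body parameters (e.g., height, leg length, mass)
t = np.linspace(0, 1, 50)              # normalized gait cycle
# Fake joint-angle trajectories influenced by the body parameters
Y = (30 * np.sin(2 * np.pi * t)[None, :] * (1 + 0.1 * B[:, :1])
     + rng.normal(0, 1.0, size=(113, 50)))

gpr = GaussianProcessRegressor(kernel=RBF() + WhiteKernel()).fit(B, Y)
mean, std = gpr.predict(B[:1], return_std=True)     # trajectory + uncertainty
print(mean.shape, std.shape)
```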
Least squares regression methods for clustered ROC data with discrete covariates.
Tang, Liansheng Larry; Zhang, Wei; Li, Qizhai; Ye, Xuan; Chan, Leighton
2016-07-01
The receiver operating characteristic (ROC) curve is a popular tool to evaluate and compare the accuracy of diagnostic tests in distinguishing the diseased group from the nondiseased group when test results are continuous or ordinal. A complicated data setting occurs when multiple tests are measured on abnormal and normal locations from the same subject and the measurements are clustered within the subject. Although least squares regression methods can be used to estimate the ROC curve from correlated data, how to develop least squares methods to estimate the ROC curve from clustered data has not been studied. Also, the statistical properties of the least squares methods under the clustering setting are unknown. In this article, we develop least squares ROC methods that allow the baseline and link functions to differ and, more importantly, accommodate clustered data with discrete covariates. The methods can generate smooth ROC curves that satisfy the inherent continuity of the true underlying curve. The least squares methods are shown to be more efficient than the existing nonparametric ROC methods under appropriate model assumptions in simulation studies. We apply the methods to a real example in the detection of glaucomatous deterioration. We also derive the asymptotic properties of the proposed methods. PMID:26848938
A least trimmed square regression method for second level FMRI effective connectivity analysis.
Li, Xingfeng; Coyle, Damien; Maguire, Liam; McGinnity, Thomas Martin
2013-01-01
We present a least trimmed square (LTS) robust regression method to combine different runs/subjects for second/high level effective connectivity analysis. The basic idea of this method is to treat the extreme nonlinear model variability as outliers if they exceed a certain threshold. A bootstrap method for the LTS estimation is employed to detect model outliers. We compared the LTS robust method with a non-robust method using simulated and real datasets. The difference between LTS and the non-robust method for second level effective connectivity analysis is significant, suggesting the conventional non-robust method is easily affected by the model variability from the first level analysis. In addition, after these outliers are detected and excluded for the high level analysis, the model coefficients of the second level are combined within the framework of a mixed model. The variance of the mixed model is estimated using the Newton-Raphson (NR) type Levenberg-Marquardt algorithm. Three sets of real data are adopted to compare conventional methods which do not include random effects in the analysis with a mixed model for second level effective connectivity analysis. The results show that the conventional method is significantly different from the mixed model when greater model variability exists, suggesting there is a strong random effect, and the mixed model should be employed for the second level effective connectivity analysis. PMID:23093379
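The LTS principle, fit on the h observations with the smallest squared residuals, can be sketched with random restarts and concentration steps as below; the data, h, and iteration counts are assumptions, and the paper's bootstrap outlier detection and mixed-model stage are not reproduced.

```python
# Minimal sketch of least trimmed squares (LTS) regression: repeatedly refit
# OLS on the h points with the smallest residuals ("concentration" steps),
# over several random starts. Synthetic data with gross outliers.
import numpy as np

rng = np.random.default_rng(9)
n, h = 100, 75                                   # keep the best 75% of points
X = np.column_stack([np.ones(n), rng.normal(size=n)])
y = X @ np.array([1.0, 2.0]) + rng.normal(0, 0.2, n)
y[:10] += 10                                     # gross outliers

best_obj, best_beta = np.inf, None
for _ in range(50):                              # random restarts
    idx = rng.choice(n, size=h, replace=False)
    for _ in range(10):                          # concentration steps
        beta, *_ = np.linalg.lstsq(X[idx], y[idx], rcond=None)
        r2 = (y - X @ beta) ** 2
        idx = np.argsort(r2)[:h]                 # h smallest squared residuals
    obj = np.sort(r2)[:h].sum()
    if obj < best_obj:
        best_obj, best_beta = obj, beta
print("LTS estimate:", best_beta.round(2))       # close to [1, 2] despite outliers
```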
A fast nonlinear regression method for estimating permeability in CT perfusion imaging
Bennink, Edwin; Riordan, Alan J; Horsch, Alexander D; Dankbaar, Jan Willem; Velthuis, Birgitta K; de Jong, Hugo W
2013-01-01
Blood–brain barrier damage, which can be quantified by measuring vascular permeability, is a potential predictor for hemorrhagic transformation in acute ischemic stroke. Permeability is commonly estimated by applying Patlak analysis to computed tomography (CT) perfusion data, but this method lacks precision. Applying more elaborate kinetic models by means of nonlinear regression (NLR) may improve precision, but is more time consuming and therefore less appropriate in an acute stroke setting. We propose a simplified NLR method that may be faster and still precise enough for clinical use. The aim of this study is to evaluate the reliability of in total 12 variations of Patlak analysis and NLR methods, including the simplified NLR method. Confidence intervals for the permeability estimates were evaluated using simulated CT attenuation–time curves with realistic noise, and clinical data from 20 patients. Although fixating the blood volume improved Patlak analysis, the NLR methods yielded significantly more reliable estimates, but took up to 12 × longer to calculate. The simplified NLR method was ∼4 × faster than other NLR methods, while maintaining the same confidence intervals (CIs). In conclusion, the simplified NLR method is a new, reliable way to estimate permeability in stroke, fast enough for clinical application in an acute stroke setting. PMID:23881247
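Generic nonlinear regression of the kind simplified in the paper can be sketched with scipy's curve_fit; the toy uptake model, noise level, and parameters below are assumptions, not the authors' kinetic model.

```python
# Minimal sketch of nonlinear regression (NLR) for a noisy attenuation-time
# curve, with parameter uncertainties from the fit covariance.
import numpy as np
from scipy.optimize import curve_fit

def tissue_curve(t, amp, rate, perm):
    # assumed toy model: first-pass uptake plus slow permeability-driven leakage
    return amp * (1 - np.exp(-rate * t)) + perm * t

t = np.linspace(0, 60, 120)                       # seconds
rng = np.random.default_rng(10)
y = tissue_curve(t, 40.0, 0.15, 0.2) + rng.normal(0, 2.0, t.size)

popt, pcov = curve_fit(tissue_curve, t, y, p0=[30, 0.1, 0.1])
perr = np.sqrt(np.diag(pcov))                     # 1-sigma parameter errors
print("permeability estimate: %.3f +/- %.3f" % (popt[2], perr[2]))
```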
Shi, Yinghuan; Gao, Yaozong; Liao, Shu; Zhang, Daoqiang
2015-01-01
In recent years, there has been great interest in prostate segmentation, which is an important and challenging task for CT image guided radiotherapy. In this paper, a learning-based segmentation method via joint transductive feature selection and transductive regression is presented, which incorporates the physician's simple manual specification (taking only a few seconds) to aid accurate segmentation, especially for cases with large irregular prostate motion. More specifically, for the current treatment image, the experienced physician is first allowed to manually assign the labels for a small subset of prostate and non-prostate voxels, especially in the first and last slices of the prostate region. Then, the proposed method follows two steps: in the prostate-likelihood estimation step, two novel algorithms, tLasso and wLapRLS, are sequentially employed for transductive feature selection and transductive regression, respectively, aiming to generate the prostate-likelihood map. In the multi-atlas-based label fusion step, the final segmentation result is obtained from the corresponding prostate-likelihood map and the previous images of the same patient. The proposed method has been extensively evaluated on a real prostate CT dataset including 24 patients with 330 CT images, and compared with several state-of-the-art methods. Experimental results show that the proposed method outperforms the state-of-the-art methods in terms of higher Dice ratio, higher true positive fraction, and lower centroid distance. The results also demonstrate that simple manual specification can help improve the segmentation performance, which is clinically feasible in real practice. PMID:26752809
Multi temporal regression method for mid infrared [3-5microm] emissivity outdoor.
Nerry, Françoise; Malaplate, Alain; Stoll, Marc
2004-12-27
An experimental study addressing issues encountered in the determination of the surface bi-directional reflectivity and emissivity of materials in the [3-5 microm] region has been conducted under outdoor conditions. The measurement protocol included radiometric infrared camera acquisitions in both [3-5 microm] (band-2) and [8-14 microm] (band-3). The band-2 bi-directional reflectivity is obtained from a sequence of sunlit and shade measurements. Best results are found with measurements relative to a diffuse aluminum reflector. Direct inversion of the band-2 radiometric signal is unstable. A multitemporal method is introduced in which the slope of the linear regression is the sought emissivity. A detailed analysis is conducted to assess the impact of different sources of systematic errors. The proposed method is found to have good potential, with an estimated measurement error in the range of 2%. PMID:19488308
A non-linear regression method for CT brain perfusion analysis
NASA Astrophysics Data System (ADS)
Bennink, E.; Oosterbroek, J.; Viergever, M. A.; Velthuis, B. K.; de Jong, H. W. A. M.
2015-03-01
CT perfusion (CTP) imaging allows for rapid diagnosis of ischemic stroke. Generation of perfusion maps from CTP data usually involves deconvolution algorithms providing estimates for the impulse response function in the tissue. We propose the use of a fast non-linear regression (NLR) method that we postulate has similar performance to the current academic state-of-art method (bSVD), but that has some important advantages, including the estimation of vascular permeability, improved robustness to tracer-delay, and very few tuning parameters, that are all important in stroke assessment. The aim of this study is to evaluate the fast NLR method against bSVD and a commercial clinical state-of-art method. The three methods were tested against a published digital perfusion phantom earlier used to illustrate the superiority of bSVD. In addition, the NLR and clinical methods were also tested against bSVD on 20 clinical scans. Pearson correlation coefficients were calculated for each of the tested methods. All three methods showed high correlation coefficients (>0.9) with the ground truth in the phantom. With respect to the clinical scans, the NLR perfusion maps showed higher correlation with bSVD than the perfusion maps from the clinical method. Furthermore, the perfusion maps showed that the fast NLR estimates are robust to tracer-delay. In conclusion, the proposed fast NLR method provides a simple and flexible way of estimating perfusion parameters from CT perfusion scans, with high correlation coefficients. This suggests that it could be a better alternative to the current clinical and academic state-of-art methods.
Gupta, N
2008-04-22
3013 containers are designed in accordance with the DOE-STD-3013-2004. These containers are qualified to store plutonium (Pu) bearing materials such as PuO2 for 50 years. DOT shipping packages such as the 9975 are used to store the 3013 containers in the K-Area Material Storage (KAMS) facility at Savannah River Site (SRS). DOE-STD-3013-2004 requires that a comprehensive surveillance program be set up to ensure that the 3013 container design parameters are not violated during the long term storage. To ensure structural integrity of the 3013 containers, thermal analyses using finite element models were performed to predict the contents and component temperatures for different but well defined parameters such as storage ambient temperature, PuO2 density, fill heights, weights, and thermal loading. Interpolation is normally used to calculate temperatures if the actual parameter values are different from the analyzed values. A statistical analysis technique using regression methods is proposed to develop simple polynomial relations to predict temperatures for the actual parameter values found in the containers. The analysis shows that regression analysis is a powerful tool to develop simple relations to assess component temperatures.
NASA Technical Reports Server (NTRS)
Tomberlin, T. J.
1985-01-01
Research studies of residents' responses to noise consist of interviews with samples of individuals who are drawn from a number of different compact study areas. The statistical techniques developed provide a basis for those sample design decisions. These techniques are suitable for a wide range of sample survey applications. A sample may consist of a random sample of residents selected from a sample of compact study areas, or in a more complex design, of a sample of residents selected from a sample of larger areas (e.g., cities). The techniques may be applied to estimates of the effects on annoyance of noise level, numbers of noise events, the time-of-day of the events, ambient noise levels, or other factors. Methods are provided for determining, in advance, how accurately these effects can be estimated for different sample sizes and study designs. Using a simple cost function, they also provide for optimum allocation of the sample across the stages of the design for estimating these effects. These techniques are developed via a regression model in which the regression coefficients are assumed to be random, with components of variance associated with the various stages of a multi-stage sample design.
NASA Astrophysics Data System (ADS)
Mandal, Nilrudra; Doloi, Biswanath; Mondal, Biswanath
2016-01-01
In the present study, an attempt has been made to apply the Taguchi parameter design method and regression analysis to optimize the cutting conditions for surface finish while machining AISI 4340 steel with newly developed yttria-based Zirconia Toughened Alumina (ZTA) inserts. These inserts are prepared through a wet chemical co-precipitation route followed by a powder metallurgy process. Experiments have been carried out based on an L9 orthogonal array with three parameters (cutting speed, depth of cut and feed rate) at three levels (low, medium and high). Based on the mean response and signal-to-noise ratio (SNR), the optimal cutting condition is found to be A3B1C1, i.e. a cutting speed of 420 m/min, a depth of cut of 0.5 mm and a feed rate of 0.12 m/min, using the smaller-is-better criterion. Analysis of Variance (ANOVA) is applied to find out the significance and percentage contribution of each parameter. A mathematical model of surface roughness has been developed using regression analysis as a function of the above-mentioned independent variables. The predicted values from the developed model and the experimental values are found to be very close to each other, justifying the significance of the model. A confirmation run has been carried out at the 95% confidence level to verify the optimized result, and the values obtained are within the prescribed limits.
Benedetti, Andrea; Platt, Robert; Atherton, Juli
2014-01-01
Background: Over time, adaptive Gaussian Hermite quadrature (QUAD) has become the preferred method for estimating generalized linear mixed models with binary outcomes. However, penalized quasi-likelihood (PQL) is still used frequently. In this work, we systematically evaluated via simulation whether matching results from PQL and QUAD indicate less bias in estimated regression coefficients and variance parameters. Methods: We performed a simulation study in which we varied the size of the data set, probability of the outcome, variance of the random effect, number of clusters and number of subjects per cluster, etc. We estimated bias in the regression coefficients, odds ratios and variance parameters as estimated via PQL and QUAD. We ascertained whether similarity of the estimated regression coefficients, odds ratios and variance parameters predicted less bias. Results: Overall, we found that the absolute percent bias of the odds ratio estimated via PQL or QUAD increased as the PQL- and QUAD-estimated odds ratios became more discrepant, though results varied markedly depending on the characteristics of the dataset. Conclusions: Given how markedly results varied depending on data set characteristics, specifying a rule above which results would be flagged as biased proved impossible. This work suggests that comparing results from generalized linear mixed models estimated via PQL and QUAD is a worthwhile exercise for regression coefficients and variance components obtained via QUAD, in situations where PQL is known to give reasonable results. PMID:24416249
NASA Astrophysics Data System (ADS)
Zhao, Na; Yue, Tianxiang; Zhou, Xun; Zhao, Mingwei; Liu, Yu; Du, Zhengping; Zhang, Lili
2016-03-01
Downscaling precipitation is required in local scale climate impact studies. In this paper, a statistical downscaling scheme was presented with a combination of geographically weighted regression (GWR) model and a recently developed method, high accuracy surface modeling method (HASM). This proposed method was compared with another downscaling method using the Coupled Model Intercomparison Project Phase 5 (CMIP5) database and ground-based data from 732 stations across China for the period 1976-2005. The residual which was produced by GWR was modified by comparing different interpolators including HASM, Kriging, inverse distance weighted method (IDW), and Spline. The spatial downscaling from 1° to 1-km grids for period 1976-2005 and future scenarios was achieved by using the proposed downscaling method. The prediction accuracy was assessed at two separate validation sites throughout China and Jiangxi Province on both annual and seasonal scales, with the root mean square error (RMSE), mean relative error (MRE), and mean absolute error (MAE). The results indicate that the developed model in this study outperforms the method that builds transfer function using the gauge values. There is a large improvement in the results when using a residual correction with meteorological station observations. In comparison with other three classical interpolators, HASM shows better performance in modifying the residual produced by local regression method. The success of the developed technique lies in the effective use of the datasets and the modification process of the residual by using HASM. The results from the future climate scenarios show that precipitation exhibits overall increasing trend from T1 (2011-2040) to T2 (2041-2070) and T2 to T3 (2071-2100) in RCP2.6, RCP4.5, and RCP8.5 emission scenarios. The most significant increase occurs in RCP8.5 from T2 to T3, while the lowest increase is found in RCP2.6 from T2 to T3, increased by 47.11 and 2.12 mm, respectively.
Domain selection for the varying coefficient model via local polynomial regression
Kong, Dehan; Bondell, Howard; Wu, Yichao
2014-01-01
In this article, we consider the varying coefficient model, which allows the relationship between the predictors and the response to vary across the domain of interest, such as time. In applications, it is possible that certain predictors only affect the response in particular regions and not everywhere. This corresponds to identifying the domain where the varying coefficient is nonzero. Towards this goal, local polynomial smoothing and penalized regression are incorporated into one framework. Asymptotic properties of our penalized estimators are provided. Specifically, the estimators enjoy the oracle property in the sense that they have the same bias and asymptotic variance as the local polynomial estimators would have if the sparsity were known a priori. The choice of appropriate bandwidth and computational algorithms are discussed. The proposed method is examined via simulations and a real data example. PMID:25506112
Fienen, Michael N.; Selbig, William R.
2012-01-01
A new sample collection system was developed to improve the representation of sediment entrained in urban storm water by integrating water quality samples from the entire water column. The depth-integrated sampler arm (DISA) was able to mitigate sediment stratification bias in storm water, thereby improving the characterization of suspended-sediment concentration and particle size distribution at three independent study locations. Use of the DISA decreased variability, which improved statistical regression to predict particle size distribution using surrogate environmental parameters, such as precipitation depth and intensity. The performance of this statistical modeling technique was compared to results using traditional fixed-point sampling methods and was found to perform better. When environmental parameters can be used to predict particle size distributions, environmental managers have more options when characterizing concentrations, loads, and particle size distributions in urban runoff.
NASA Astrophysics Data System (ADS)
Huang, Fengzhen; Li, Jingzhen; Cao, Jun
2015-02-01
Temporally and Spatially Modulated Fourier Transform Imaging Spectrometer (TSMFTIS) is a new imaging spectrometer without moving mirrors and slits. As applied in remote sensing, TSMFTIS relies on the push-broom motion of the flying platform to obtain the interferogram of the detected target. If the motion state of the platform changes during imaging, the target interferogram picked up from the remote sensing image sequence deviates from the ideal interferogram, and the recovered spectrum no longer reflects the real characteristics of the ground target. Therefore, to achieve high-precision spectrum recovery, the geometric position of the target point on the TSMFTIS image plane can be calculated with a sub-pixel image registration method, and the true point interferogram of the target can be obtained by image interpolation. The core idea of standard interpolation methods (nearest-neighbor, bilinear, cubic, etc.) is to obtain the gray value of the point to be interpolated by weighting the gray values of the surrounding pixels with a kernel function of the distance between those pixels and the interpolation point. This paper adopts a Gaussian-based kernel regression model and presents a kernel function that combines gray-level information, through the relative deviation, with distance information; the kernel is controlled by the deviation between the gray values of the surrounding pixels and their mean value, so that the weights adjust adaptively. The simulation uses partial spectrum data obtained by the push-broom hyperspectral imager (PHI) as the target spectrum, generates the successively push-broomed motion-error image sequence using the parameters of an actual aviation platform, obtains the interferogram of the target point with the above interpolation method, and finally recovers the spectrogram with the nonuniform fast
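A minimal sketch of the Gaussian kernel regression idea, weights that combine distance with gray-level deviation, is shown below; the kernel form, bandwidth, and deviation term are simplified assumptions, not the paper's exact formulation.

```python
# Minimal sketch: interpolate a gray value from surrounding pixels with a
# Gaussian distance kernel, down-weighting pixels whose gray values deviate
# strongly from the local mean (a loose imitation of the adaptive kernel).
import numpy as np

def kernel_interpolate(coords, values, p, h=1.0, alpha=0.5):
    d2 = np.sum((coords - p) ** 2, axis=1)          # squared distances to p
    dev = np.abs(values - values.mean())            # gray-level deviation term
    w = np.exp(-d2 / (2 * h ** 2)) * np.exp(-alpha * dev / (values.std() + 1e-9))
    return np.sum(w * values) / np.sum(w)

coords = np.array([[0, 0], [0, 1], [1, 0], [1, 1]], float)   # 4 neighbors
values = np.array([100.0, 102.0, 98.0, 180.0])               # one deviant pixel
print(kernel_interpolate(coords, values, np.array([0.5, 0.5])))
```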
NASA Astrophysics Data System (ADS)
Grégoire, G.
2014-12-01
Logistic regression is originally intended to explain the relationship between the probability of an event and a set of covariables. The model's coefficients can be interpreted via the odds and the odds ratio, which are presented in the introduction of the chapter. When the observations are obtained individually, we speak of binary logistic regression; when they are grouped, the logistic regression is said to be binomial. In our presentation we mainly focus on the binary case. For statistical inference the main tool is maximum likelihood: we present the Wald, Rao and likelihood ratio results and their use to compare nested models. The problems we deal with are essentially the same as in multiple linear regression: testing a global effect, testing individual effects, selecting variables to build a model, measuring the fit of the model, predicting new values, and so on. The methods are demonstrated on data sets using R. Finally, we briefly consider the binomial case and the situation where we are interested in several events, that is, polytomous (multinomial) logistic regression and the particular case of ordinal logistic regression.
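In the spirit of the chapter (though in Python rather than its R demonstrations), here is a minimal sketch of binary logistic regression with odds ratios and a likelihood-ratio test of the global effect, on synthetic data:

```python
# Minimal sketch: fit a binary logistic regression by maximum likelihood,
# read the coefficients as odds ratios, and compare nested models with a
# likelihood-ratio statistic.
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(12)
X = sm.add_constant(rng.normal(size=(500, 2)))
p = 1 / (1 + np.exp(-(X @ np.array([-0.5, 1.0, 0.0]))))   # assumed true model
y = (rng.uniform(size=500) < p).astype(float)

fit = sm.Logit(y, X).fit(disp=0)
null = sm.Logit(y, X[:, :1]).fit(disp=0)          # intercept-only model
lr_stat = 2 * (fit.llf - null.llf)                # likelihood-ratio statistic
print("odds ratios:", np.exp(fit.params).round(2))
print("LR statistic vs intercept-only model:", round(lr_stat, 2))
```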
Yap, C W; Li, H; Ji, Z L; Chen, Y Z
2007-11-01
Quantitative structure-activity relationship (QSAR) and quantitative structure-property relationship (QSPR) models have been extensively used for predicting compounds of specific pharmacodynamic, pharmacokinetic, or toxicological property from structure-derived physicochemical and structural features. These models can be developed by using various regression methods including conventional approaches (multiple linear regression and partial least squares) and more recently explored genetic (genetic function approximation) and machine learning (k-nearest neighbour, neural networks, and support vector regression) approaches. This article describes the algorithms of these methods, evaluates their advantages and disadvantages, and discusses the application potential of the recently explored methods. Freely available online and commercial software for these regression methods and the areas of their applications are also presented. PMID:18045213
Wang, Molin; Kuchiba, Aya; Ogino, Shuji
2015-01-01
In interdisciplinary biomedical, epidemiologic, and population research, it is increasingly necessary to consider pathogenesis and inherent heterogeneity of any given health condition and outcome. As the unique disease principle implies, no single biomarker can perfectly define disease subtypes. The complex nature of molecular pathology and biology necessitates biostatistical methodologies to simultaneously analyze multiple biomarkers and subtypes. To analyze and test for heterogeneity hypotheses across subtypes defined by multiple categorical and/or ordinal markers, we developed a meta-regression method that can utilize existing statistical software for mixed-model analysis. This method can be used to assess whether the exposure-subtype associations are different across subtypes defined by 1 marker while controlling for other markers and to evaluate whether the difference in exposure-subtype association across subtypes defined by 1 marker depends on any other markers. To illustrate this method in molecular pathological epidemiology research, we examined the associations between smoking status and colorectal cancer subtypes defined by 3 correlated tumor molecular characteristics (CpG island methylator phenotype, microsatellite instability, and the B-Raf protooncogene, serine/threonine kinase (BRAF), mutation) in the Nurses' Health Study (1980–2010) and the Health Professionals Follow-up Study (1986–2010). This method can be widely useful as molecular diagnostics and genomic technologies become routine in clinical medicine and public health. PMID:26116215
A New Global Regression Analysis Method for the Prediction of Wind Tunnel Model Weight Corrections
NASA Technical Reports Server (NTRS)
Ulbrich, Norbert Manfred; Bridge, Thomas M.; Amaya, Max A.
2014-01-01
A new global regression analysis method is discussed that predicts wind tunnel model weight corrections for strain-gage balance loads during a wind tunnel test. The method determines corrections by combining "wind-on" model attitude measurements with least squares estimates of the model weight and center of gravity coordinates that are obtained from "wind-off" data points. The method treats the least squares fit of the model weight separately from the fit of the center of gravity coordinates. Therefore, it performs two fits of "wind-off" data points and uses the least squares estimator of the model weight as an input for the fit of the center of gravity coordinates. Explicit equations for the least squares estimators of the weight and center of gravity coordinates are derived that simplify the implementation of the method in the data system software of a wind tunnel. In addition, recommendations for sets of "wind-off" data points are made that take typical model support system constraints into account. Explicit equations for the confidence intervals on the model weight and center of gravity coordinates, and two different error analyses of the model weight prediction, are also discussed in the appendices of the paper.
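The two-step structure, a least squares fit of the weight whose estimate then enters the fit of the center-of-gravity coordinates, can be sketched on a deliberately simplified balance model; the force and moment equations below are invented stand-ins, not the paper's explicit estimators.

```python
# Minimal sketch of the two-step "wind-off" least squares idea on toy physics:
# step 1 estimates the weight from normal forces at several pitch angles,
# step 2 feeds that estimate into the fit of a CG coordinate from moments.
import numpy as np

rng = np.random.default_rng(13)
theta = np.deg2rad(np.array([-10, -5, 0, 5, 10], float))     # wind-off attitudes
W_true, x_cg_true = 250.0, 0.12                              # assumed truth
N = W_true * np.cos(theta) + rng.normal(0, 0.5, theta.size)  # normal-force readings
M = W_true * x_cg_true * np.cos(theta) + rng.normal(0, 0.1, theta.size)  # moments

# Step 1: least squares estimate of the weight from the normal forces
W_hat = np.linalg.lstsq(np.cos(theta)[:, None], N, rcond=None)[0][0]
# Step 2: use W_hat as a known input when fitting the CG coordinate
x_cg_hat = np.linalg.lstsq((W_hat * np.cos(theta))[:, None], M, rcond=None)[0][0]
print(f"weight ~ {W_hat:.1f}, x_cg ~ {x_cg_hat:.4f}")
```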
Accurate motion parameter estimation for colonoscopy tracking using a regression method
NASA Astrophysics Data System (ADS)
Liu, Jianfei; Subramanian, Kalpathi R.; Yoo, Terry S.
2010-03-01
Co-located optical and virtual colonoscopy images have the potential to provide important clinical information during routine colonoscopy procedures. In our earlier work, we presented an optical flow based algorithm to compute egomotion from live colonoscopy video, permitting navigation and visualization of the corresponding patient anatomy. In the original algorithm, motion parameters were estimated using the traditional least sum of squares (LS) procedure, which can be unstable in the presence of optical flow vectors with large errors. In the improved algorithm, we use the least median of squares (LMS) method, a robust regression method, for motion parameter estimation. Using the LMS method, we iteratively analyze and converge toward the main distribution of the flow vectors while disregarding outliers. We show through three experiments the improvement in tracking results obtained using the LMS method in comparison to the LS estimator. The first experiment demonstrates better spatial accuracy in positioning the virtual camera in the sigmoid colon. The second and third experiments demonstrate the robustness of this estimator, resulting in longer tracked sequences: from 300 to 1310 in the ascending colon, and from 410 to 1316 in the transverse colon.
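A minimal sketch of the LMS estimator on a line-fitting toy problem follows; random minimal subsets are scored by their median squared residual so that outlying points (like bad flow vectors) are ignored. The data and iteration counts are assumptions, not the tracking pipeline.

```python
# Minimal sketch of Least Median of Squares (LMS): fit lines through random
# minimal subsets and keep the one minimizing the median squared residual.
import numpy as np

rng = np.random.default_rng(14)
x = rng.uniform(-1, 1, 80)
y = 3.0 * x + 1.0 + rng.normal(0, 0.05, 80)
y[:15] += rng.uniform(2, 5, 15)                   # gross outliers

best_med, best_fit = np.inf, None
for _ in range(500):
    i, j = rng.choice(80, size=2, replace=False)  # minimal subset for a line
    a = (y[j] - y[i]) / (x[j] - x[i])
    b = y[i] - a * x[i]
    med = np.median((y - (a * x + b)) ** 2)       # median squared residual
    if med < best_med:
        best_med, best_fit = med, (a, b)
print("LMS fit (slope, intercept):", np.round(best_fit, 2))
```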
Toplak, Marko; Močnik, Rok; Polajnar, Matija; Bosnić, Zoran; Carlsson, Lars; Hasselgren, Catrin; Demšar, Janez; Boyer, Scott; Zupan, Blaž; Stålring, Jonna
2014-02-24
The vastness of chemical space and the relatively small coverage by experimental data recording molecular properties require us to identify subspaces, or domains, for which we can confidently apply QSAR models. The prediction of QSAR models in these domains is reliable, and potential subsequent investigations of such compounds would find that the predictions closely match the experimental values. Standard approaches in QSAR assume that predictions are more reliable for compounds that are "similar" to those in subspaces with denser experimental data. Here, we report on a study of an alternative set of techniques recently proposed in the machine learning community. These methods quantify prediction confidence through estimation of the prediction error at the point of interest. Our study includes 20 public QSAR data sets with continuous response and assesses the quality of 10 reliability scoring methods by observing their correlation with prediction error. We show that these new alternative approaches can outperform standard reliability scores that rely only on similarity to compounds in the training set. The results also indicate that the quality of reliability scoring methods is sensitive to data set characteristics and to the regression method used in QSAR. We demonstrate that at the cost of increased computational complexity these dependencies can be leveraged by integration of scores from various reliability estimation approaches. The reliability estimation techniques described in this paper have been implemented in an open source add-on package ( https://bitbucket.org/biolab/orange-reliability ) to the Orange data mining suite. PMID:24490838
NCAA Penalizes Fewer Teams than Expected
ERIC Educational Resources Information Center
Sander, Libby
2008-01-01
This article reports that the National Collegiate Athletic Association (NCAA) has penalized fewer teams than it expected this year over athletes' poor academic performance. For years, officials with the NCAA have predicted that strikingly high numbers of college sports teams could be at risk of losing scholarships this year because of their…
Environmental Conditions in Kentucky's Penal Institutions
ERIC Educational Resources Information Center
Bell, Irving
1974-01-01
A state task force was organized to identify health or environmental deficiencies existing in Kentucky penal institutions. Based on information gained through direct observation and inmate questionnaires, the task force concluded that many hazardous and unsanitary conditions existed, and recommended that immediate action be given to these…
Zheng, R; Zhang, W; Li, Y; Huang, J; Yang, D
1998-02-01
The EDXRF extrapolate-regression method described in this paper combines a regression method with the fundamental formula of fluorescence intensity. The contents of Ni and Pd in white karat gold jewellery were calculated theoretically from the spectrum of the sample. The content of gold was determined without standards. The precision was 0.1% and the deviation was 0.3% compared with AA. PMID:15810348
Methods for Adjusting U.S. Geological Survey Rural Regression Peak Discharges in an Urban Setting
Moglen, Glenn E.; Shivers, Dorianne E.
2006-01-01
A study was conducted of 78 U.S. Geological Survey gaged streams that have been subjected to varying degrees of urbanization over the last three decades. Flood-frequency analysis coupled with nonlinear regression techniques was used to generate a set of equations for converting peak discharge estimates determined from rural regression equations to a set of peak discharge estimates that represent known urbanization. Specifically, urban regression equations for the 2-, 5-, 10-, 25-, 50-, 100-, and 500-year return periods were calibrated as a function of the corresponding rural peak discharge and the percentage of impervious area in a watershed. The results of this study indicate that two sets of equations, one set based on imperviousness and one set based on population density, performed well. Both sets of equations are dependent on rural peak discharges, a measure of development (average percentage of imperviousness or average population density), and a measure of homogeneity of development within a watershed. Average imperviousness was readily determined by using geographic information system methods and commonly available land-cover data. Similarly, average population density was easily determined from census data. Thus, a key advantage of the equations developed in this study is that they do not require field measurements of watershed characteristics, as did the U.S. Geological Survey urban equations developed in an earlier investigation. During this study, the U.S. Geological Survey PeakFQ program was used as an integral tool in the calibration of all equations. The scarcity of historical land-use data, however, made it necessary to rely exclusively on flow records for the 30-year period from 1970 to 2000. Such relatively short-duration streamflow time series required a nonstandard treatment of the historical data function of the PeakFQ program in comparison to published guidelines. Thus, the approach used during this investigation does not fully comply with the
Dinç, Erdal; Ustündağ, Ozgür; Baleanu, Dumitru
2010-08-01
The sole use of pyridoxine hydrochloride during treatment of tuberculosis gives rise to pyridoxine deficiency. Therefore, a combination of pyridoxine hydrochloride and isoniazid is used in pharmaceutical dosage form in tuberculosis treatment to reduce this side effect. In this study, two chemometric methods, partial least squares (PLS) and principal component regression (PCR), were applied to the simultaneous determination of pyridoxine (PYR) and isoniazid (ISO) in their tablets. A concentration training set comprising binary mixtures of PYR and ISO in 20 different combinations was randomly prepared in 0.1 M HCl. Both multivariate calibration models were constructed using the relationships between the concentration data matrix and the absorbance data matrix in the spectral region 200-330 nm. The accuracy and the precision of the proposed chemometric methods were validated by analyzing synthetic mixtures containing the investigated drugs. The recoveries obtained by applying the PCR and PLS calibrations to the artificial mixtures were between 100.0 and 100.7%. Satisfactory results were obtained by applying the PLS and PCR methods to both artificial and commercial samples. These results strongly encourage the use of the methods for the quality control and routine analysis of marketed tablets containing the PYR and ISO drugs. PMID:20645279
Investigating the Accuracy of Three Estimation Methods for Regression Discontinuity Design
ERIC Educational Resources Information Center
Sun, Shuyan; Pan, Wei
2013-01-01
Regression discontinuity design is an alternative to randomized experiments to make causal inference when random assignment is not possible. This article first presents the formal identification and estimation of regression discontinuity treatment effects in the framework of Rubin's causal model, followed by a thorough literature review of…
ERIC Educational Resources Information Center
Wong, Vivian C.; Steiner, Peter M.; Cook, Thomas D.
2013-01-01
In a traditional regression-discontinuity design (RDD), units are assigned to treatment on the basis of a cutoff score and a continuous assignment variable. The treatment effect is measured at a single cutoff location along the assignment variable. This article introduces the multivariate regression-discontinuity design (MRDD), where multiple…
ERIC Educational Resources Information Center
von Davier, Matthias; Sinharay, Sandip
2009-01-01
This paper presents an application of a stochastic approximation EM-algorithm using a Metropolis-Hastings sampler to estimate the parameters of an item response latent regression model. Latent regression models are extensions of item response theory (IRT) to a 2-level latent variable model in which covariates serve as predictors of the…
Penalized Spline: a General Robust Trajectory Model for ZIYUAN-3 Satellite
NASA Astrophysics Data System (ADS)
Pan, H.; Zou, Z.
2016-06-01
Owing to the dynamic imaging system, the trajectory model plays a very important role in the geometric processing of high resolution satellite imagery. However, establishing a trajectory model is difficult when only discrete and noisy data are available. In this manuscript, we propose a general robust trajectory model, the penalized spline model, which fits trajectory data well while smoothing noise. The penalty parameter λ, which controls the trade-off between smoothness and fitting accuracy, can be estimated by generalized cross-validation. Five other trajectory models, including third-order polynomials, Chebyshev polynomials, linear interpolation, Lagrange interpolation and cubic spline, are compared with the penalized spline model. Both the sophisticated ephemeris and the on-board ephemeris are used to compare the orbit models. The penalized spline model smooths part of the noise, and accuracy decreases as the orbit length increases. The band-to-band misregistration of ZiYuan-3 Dengfeng and Faizabad multispectral images is used to evaluate the proposed method. With the Dengfeng dataset, the third-order polynomials and Chebyshev approximation could not model the oscillation and introduced misregistration of 0.57 pixels in the across-track direction and 0.33 pixels in the along-track direction. With the Faizabad dataset, the linear interpolation, Lagrange interpolation and cubic spline models suffered from noise, introducing larger misregistration than the approximation models. Experimental results suggest that the penalized spline model can model the oscillation while smoothing noise.
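As a rough analogue (not the authors' implementation), base R's smooth.spline fits a penalized smoothing spline with λ chosen by generalized cross-validation; the trajectory samples below are synthetic:

```r
# Penalized-spline smoothing of noisy trajectory samples; smooth.spline uses
# generalized cross-validation by default (cv = FALSE) to pick the penalty.
set.seed(1)
t <- seq(0, 100, by = 1)                              # sample epochs (s)
x <- 7000 + 0.5 * t + 0.01 * sin(0.3 * t) +           # one position component (km)
     rnorm(length(t), 0, 0.002)                       # measurement noise

fit <- smooth.spline(t, x)                            # GCV-selected smoothing
fit$lambda                                            # the chosen penalty parameter
pos <- predict(fit, seq(0, 100, by = 0.1))            # smoothed, densely resampled orbit
```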
Eng, K.; Milly, P.C.D.; Tasker, Gary D.
2007-01-01
To facilitate estimation of streamflow characteristics at an ungauged site, hydrologists often define a region of influence containing gauged sites hydrologically similar to the estimation site. This region can be defined either in geographic space or in the space of the variables that are used to predict streamflow (predictor variables). These approaches are complementary, and a combination of the two may be superior to either. Here we propose a hybrid region-of-influence (HRoI) regression method that combines the two approaches. The new method was applied with streamflow records from 1,091 gauges in the southeastern United States to estimate the 50-year peak flow (Q50). The HRoI approach yielded lower root-mean-square estimation errors and produced fewer extreme errors than either the predictor-variable or geographic region-of-influence approaches. It is concluded, for Q50 in the study region, that similarity with respect to the basin characteristics considered (area, slope, and annual precipitation) is important, but incomplete, and that the consideration of geographic proximity of stations provides a useful surrogate for characteristics that are not included in the analysis. © 2007 ASCE.
Comparing implementations of penalized weighted least-squares sinogram restoration
Forthmann, Peter; Koehler, Thomas; Defrise, Michel; La Riviere, Patrick
2010-11-15
Purpose: A CT scanner measures the energy that is deposited in each channel of a detector array by x rays that have been partially absorbed on their way through the object. The measurement process is complex and quantitative measurements are always and inevitably associated with errors, so CT data must be preprocessed prior to reconstruction. In recent years, the authors have formulated CT sinogram preprocessing as a statistical restoration problem in which the goal is to obtain the best estimate of the line integrals needed for reconstruction from the set of noisy, degraded measurements. The authors have explored both penalized Poisson likelihood (PL) and penalized weighted least-squares (PWLS) objective functions. At low doses, the authors found that the PL approach outperforms PWLS in terms of resolution-noise tradeoffs, but at standard doses they perform similarly. The PWLS objective function, being quadratic, is more amenable to computational acceleration than the PL objective. In this work, the authors develop and compare two different methods for implementing PWLS sinogram restoration with the hope of improving computational performance relative to PL in the standard-dose regime. Sinogram restoration is still significant in the standard-dose regime since it can still outperform standard approaches and it allows for correction of effects that are not usually modeled in standard CT preprocessing. Methods: The authors have explored and compared two implementation strategies for PWLS sinogram restoration: (1) A direct matrix-inversion strategy based on the closed-form solution to the PWLS optimization problem and (2) an iterative approach based on the conjugate-gradient algorithm. Obtaining optimal performance from each strategy required modifying the naive off-the-shelf implementations of the algorithms to exploit the particular symmetry and sparseness of the sinogram-restoration problem. For the closed-form approach, the authors subdivided the large matrix
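A generic sketch of the iterative strategy described above: conjugate gradients applied to the PWLS normal equations (A'WA + βR)x = A'Wy, with toy dimensions and a simple first-difference roughness penalty R standing in for the actual sinogram-restoration operators:

```r
# Plain conjugate gradients for the quadratic PWLS objective
# ||y - Ax||^2_W + beta * x'Rx, i.e. solving (A'WA + beta*R) x = A'Wy.
set.seed(1)
n <- 200; p <- 50
A <- matrix(rnorm(n * p), n, p)
W <- diag(runif(n, 0.5, 2))                 # statistical weights
y <- A %*% rnorm(p) + rnorm(n)
D <- diff(diag(p)); R <- t(D) %*% D         # first-difference roughness penalty
beta <- 1
H <- t(A) %*% W %*% A + beta * R
b <- t(A) %*% W %*% y

x <- rep(0, p); r <- b - H %*% x; d <- r
for (k in 1:100) {                          # CG iterations
  alpha <- drop(crossprod(r) / (t(d) %*% H %*% d))
  x <- x + alpha * d
  r_new <- r - alpha * H %*% d
  if (sqrt(sum(r_new^2)) < 1e-8) break
  d <- r_new + drop(crossprod(r_new) / crossprod(r)) * d
  r <- r_new
}
```

Exploiting the symmetry and sparseness the authors mention amounts to replacing the dense H %*% d products with structured operators.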
NASA Astrophysics Data System (ADS)
Kügler, S. D.; Polsterer, K.; Hoecker, M.
2015-04-01
Context. In astronomy, new approaches to process and analyze the exponentially increasing amount of data are inevitable. For spectra, such as in the Sloan Digital Sky Survey spectral database, usually templates of well-known classes are used for classification. If the fit of a template fails, incorrect spectral properties (e.g. redshift) are derived. Validation of the derived properties is the key to understanding the caveats of the template-based method. Aims: In this paper we present a method for statistically computing the redshift z based on a similarity approach. This allows us to determine redshifts in spectra from emission and absorption features without using any predefined model. Additionally, we show how to determine the redshift based on single features. As a consequence we are, for example, able to filter objects that show multiple redshift components. Methods: The redshift calculation is performed by comparing predefined regions in the spectra and individually applying a nearest neighbor regression model to each predefined emission and absorption region. Results: The choice of the model parameters controls the quality and the completeness of the redshifts. For ≈90% of the analyzed 16 000 spectra of our reference and test sample, a reliable redshift can be computed, which is comparable to the completeness of SDSS (96%). The redshift calculation yields a precision for every individually tested feature that is comparable to the overall precision of the redshifts of SDSS. Using the new method to compute redshifts, we could also identify 14 spectra with a significant shift between emission and absorption or between emission and emission lines. The results already show the immense power of this simple machine-learning approach for investigating huge databases such as the SDSS.
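The workhorse here is nearest-neighbor regression; a self-contained toy version (synthetic features and redshifts, not SDSS data) might look like:

```r
# Minimal k-nearest-neighbor regression: average the redshifts of the k
# training spectra whose region features are closest to the query.
knn_reg <- function(X_train, y_train, x_new, k = 5) {
  d <- sqrt(rowSums((X_train - matrix(x_new, nrow(X_train), ncol(X_train),
                                      byrow = TRUE))^2))
  mean(y_train[order(d)[1:k]])     # mean redshift of the k closest spectra
}
set.seed(1)
X <- matrix(rnorm(1000 * 10), 1000, 10)   # 10 features from predefined line regions
z <- 0.5 + 0.05 * rowSums(X[, 1:3])       # synthetic redshifts
knn_reg(X, z, X[1, ], k = 7)
```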
Law, G.S.; Tasker, Gary D.
2003-01-01
The region-of-influence method and regional-regression equations are used to predict flood frequency of unregulated and ungaged rivers and streams of Tennessee. The prediction methods have been developed using stream-gage records from unregulated streams draining basins having 1-30% total impervious area. A computer application automates the calculation of the flood frequencies of the unregulated streams. Average deleted-residual prediction errors for the region-of-influence method are found to be slightly smaller than those for the regional regression methods.
1989-01-01
Austria's Federal act amending the Penal Code and the Code of Penal Procedure (Penal Code Amendments 1989), April 27, 1989, rewrites sections of the Penal Code relating to sexual crimes. Among other things, it makes these sections sex-neutral and criminalizes rape within marriage and cohabitation. Section 201 states that 1) whoever, by means of serious force or threat of actual serious danger to life or limb, compels a person to engage in sexual intercourse or an equivalent sexual act will be punished with imprisonment from 1 to 10 years. Rendering a person unconscious will be considered using serious force; 2) apart from the above subsection 1, whoever, by means of force or deprivation of personal freedom, or threat of actual danger to life or limb, compels a person to engage in sexual intercourse or an equivalent sexual act will be punished with imprisonment from 6 months to 5 years; and 3) specified circumstances will result in enhanced punishments. Section 202 states that 1) apart from the above Section 201, whoever by means of force or serious threat, compels a sexual act shall be punished with imprisonment for up to 3 years and 2) there will be enhanced punishments for special circumstances. Section 203 deals with perpetration of the crime in marriage or cohabitation, and states: 1) whoever perpetrates one of the acts described in Section 201 and Section 202 against a spouse or cohabiting partner will be prosecuted only upon the complaint of the injured party in so far as none of the results described in sections 201 or 202 occurs, and the criminal act contains none of the circumstances specified in those sections. Special commutation provisions are available when the injured party declares their wish to continue to live with the perpetrator. PMID:12344063
Dazard, Jean-Eudes; Choe, Michael; LeBlanc, Michael; Rao, J. Sunil
2015-01-01
PRIMsrc is a novel implementation of a non-parametric bump hunting procedure, based on the Patient Rule Induction Method (PRIM), offering a unified treatment of outcome variables, including censored time-to-event (Survival), continuous (Regression) and discrete (Classification) responses. To fit the model, it uses a recursive peeling procedure with specific peeling criteria and stopping rules depending on the response. To validate the model, it provides an objective function based on prediction-error or other specific statistic, as well as two alternative cross-validation techniques, adapted to the task of decision-rule making and estimation in the three types of settings. PRIMsrc comes as an open source R package, including at this point: (i) a main function for fitting a Survival Bump Hunting model with various options allowing cross-validated model selection to control model size (#covariates) and model complexity (#peeling steps) and generation of cross-validated end-point estimates; (ii) parallel computing; (iii) various S3-generic and specific plotting functions for data visualization, diagnostic, prediction, summary and display of results. It is available on CRAN and GitHub. PMID:26798326
Stefanello, C; Vieira, S L; Xue, P; Ajuwon, K M; Adeola, O
2016-07-01
A study was conducted to determine the ileal digestible energy (IDE), ME, and MEn contents of bakery meal using the regression method and to evaluate whether the energy values are age-dependent in broiler chickens from zero to 21 d post hatching. Seven hundred and eighty male Ross 708 chicks were fed 3 experimental diets in which bakery meal was incorporated into a corn-soybean meal-based reference diet at zero, 100, or 200 g/kg by replacing the energy-yielding ingredients. A 3 × 3 factorial arrangement of 3 ages (1, 2, or 3 wk) and 3 dietary bakery meal levels was used. Birds were fed the same experimental diets at each of the 3 ages. Birds were grouped by weight into 10 replicates per treatment in a randomized complete block design. Apparent ileal digestibility and total tract retention of DM, N, and energy were calculated. Expression of mucin (MUC2), sodium-dependent phosphate transporter (NaPi-IIb), solute carrier family 7 (cationic amino acid transporter, Y(+) system, SLC7A2), glucose (GLUT2), and sodium-glucose linked transporter (SGLT1) genes was measured at each age in the jejunum by real-time PCR. Addition of bakery meal to the reference diet resulted in a linear decrease in retention of DM, N, and energy, and a quadratic reduction (P < 0.05) in N retention and ME. There was a linear increase in the utilization of DM, N, and energy as bird age increased from 1 to 3 wk. Dietary bakery meal did not affect jejunal gene expression. Expression of genes encoding MUC2, NaPi-IIb, and SLC7A2 linearly increased (P < 0.05) with age. Regression-derived MEn of bakery meal linearly increased (P < 0.05) as the age of birds increased, with values of 2,710, 2,820, and 2,923 kcal/kg DM for 1, 2, and 3 wk, respectively. Based on these results, utilization of energy and nitrogen in the basal diet decreased when bakery meal was included and increased with age of broiler chickens. PMID:26944962
Robust Gaussian Graphical Modeling via l1 Penalization
Sun, Hokeun; Li, Hongzhe
2012-01-01
Gaussian graphical models have been widely used as an effective method for studying the conditional independence structure among genes and for constructing genetic networks. However, gene expression data typically have heavier tails or more outlying observations than the standard Gaussian distribution. Such outliers in gene expression data can lead to wrong inference on the dependency structure among the genes. We propose an l1-penalized estimation procedure for the sparse Gaussian graphical models that is robustified against possible outliers. The likelihood function is weighted according to how far each observation deviates, where the deviation of an observation is measured by its own likelihood. An efficient computational algorithm based on the coordinate gradient descent method is developed to obtain the minimizer of the negative penalized robustified likelihood, where nonzero elements of the concentration matrix represent the graphical links among the genes. After the graphical structure is obtained, we re-estimate the positive definite concentration matrix using an iterative proportional fitting algorithm. Through simulations, we demonstrate that the proposed robust method performs much better than the graphical Lasso for the Gaussian graphical models in terms of both graph structure selection and estimation when outliers are present. We apply the robust estimation procedure to an analysis of yeast gene expression data and show that the resulting graph has better biological interpretation than that obtained from the graphical Lasso. PMID:23020775
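For reference, the non-robust baseline here (the graphical lasso) is available in the R package glasso; a minimal sketch on synthetic data, not the authors' likelihood-weighted variant:

```r
# Graphical lasso on a sample covariance matrix: nonzero entries of the
# estimated concentration matrix define the gene network edges.
library(glasso)
set.seed(1)
X <- matrix(rnorm(100 * 20), 100, 20)   # 100 samples x 20 genes
S <- cov(X)
fit <- glasso(S, rho = 0.2)             # rho is the l1 penalty
graph <- abs(fit$wi) > 1e-6             # fit$wi = estimated concentration matrix
diag(graph) <- FALSE                    # drop self-links
```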
Huang, Dong; Cabral, Ricardo; De la Torre, Fernando
2016-02-01
Discriminative methods (e.g., kernel regression, SVM) have been extensively used to solve problems such as object recognition, image alignment and pose estimation from images. These methods typically map image features ( X) to continuous (e.g., pose) or discrete (e.g., object category) values. A major drawback of existing discriminative methods is that samples are directly projected onto a subspace and hence fail to account for outliers common in realistic training sets due to occlusion, specular reflections or noise. It is important to notice that existing discriminative approaches assume the input variables X to be noise free. Thus, discriminative methods experience significant performance degradation when gross outliers are present. Despite its obvious importance, the problem of robust discriminative learning has been relatively unexplored in computer vision. This paper develops the theory of robust regression (RR) and presents an effective convex approach that uses recent advances on rank minimization. The framework applies to a variety of problems in computer vision including robust linear discriminant analysis, regression with missing data, and multi-label classification. Several synthetic and real examples with applications to head pose estimation from images, image and video classification and facial attribute classification with missing data are used to illustrate the benefits of RR. PMID:26761740
Bayesian Method for Support Union Recovery in Multivariate Multi-Response Linear Regression
NASA Astrophysics Data System (ADS)
Chen, Wan-Ping
Sparse modeling has become a particularly important and quickly developing topic in many applications of statistics, machine learning, and signal processing. The main objective of sparse modeling is discovering a small number of predictive patterns that would improve our understanding of the data. This paper extends the idea of sparse modeling to the variable selection problem in high dimensional linear regression, where there are multiple response vectors, and they share the same or similar subsets of predictor variables to be selected from a large set of candidate variables. In the literature, this problem is called multi-task learning, support union recovery or simultaneous sparse coding in different contexts. We present a Bayesian method for solving this problem by introducing two nested sets of binary indicator variables. In the first set of indicator variables, each indicator is associated with a predictor variable or a regressor, indicating whether this variable is active for any of the response vectors. In the second set of indicator variables, each indicator is associated with both a predictor variable and a response vector, indicating whether this variable is active for the particular response vector. The problem of variable selection is solved by sampling from the posterior distributions of the two sets of indicator variables. We develop a Gibbs sampling algorithm for posterior sampling and use the generated samples to identify the active support at both the shared and the individual level. Theoretical and simulation justifications are provided in the paper. The proposed algorithm is also demonstrated on real image data sets. To learn the patterns of the object in images, we treat images as different tasks. By combining images of objects in the same category, we can not only learn the shared patterns efficiently but also obtain an individual sketch of each image.
Revisiting the Distance Duality Relation using a non-parametric regression method
NASA Astrophysics Data System (ADS)
Rana, Akshay; Jain, Deepak; Mahajan, Shobhit; Mukherjee, Amitabha
2016-07-01
The interdependence of the luminosity distance D_L and the angular diameter distance D_A given by the distance duality relation (DDR) is very significant in observational cosmology. It is very closely tied with the temperature-redshift relation of the Cosmic Microwave Background (CMB) radiation. Any deviation from η(z) ≡ D_L/[D_A(1+z)^2] = 1 indicates a possible emergence of new physics. Our aim in this work is to check the consistency of these relations using a non-parametric regression method, namely LOESS with SIMEX. This technique avoids dependency on the cosmological model and works with a minimal set of assumptions. Further, to analyze the efficiency of the methodology, we simulate a dataset of 020 points of η(z) data based on a phenomenological model η(z) = (1+z)^ε. The error on the simulated data points is obtained by using the temperature of the CMB radiation at various redshifts. For testing the distance duality relation, we use the JLA SNe Ia data for luminosity distances, while the angular diameter distances are obtained from radio galaxies datasets. Since the DDR is linked with the CMB temperature-redshift relation, we also use the CMB temperature data to reconstruct η(z). It is important to note that with the CMB data we are able to study the evolution of the DDR up to a very high redshift, z = 2.418. In this analysis, we find no evidence of deviation from η = 1 within the 1σ region over the entire redshift range used in this analysis (0 < z ≤ 2.418).
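A minimal sketch of the LOESS step (the SIMEX measurement-error correction is omitted), assuming synthetic η(z) data generated from the phenomenological model above:

```r
# LOESS reconstruction of eta(z) from noisy, model-independent samples.
set.seed(1)
z   <- sort(runif(120, 0.05, 2.4))
eta <- (1 + z)^0.02 + rnorm(length(z), 0, 0.05)   # eta(z) = (1+z)^eps + noise

fit <- loess(eta ~ z, span = 0.75, degree = 2)
zg  <- seq(min(z), max(z), length.out = 50)
pred <- predict(fit, newdata = data.frame(z = zg), se = TRUE)
consistent <- abs(pred$fit - 1) <= pred$se.fit    # eta = 1 within ~1 sigma?
```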
A primer on regression methods for decoding cis-regulatory logic
Das, Debopriya; Pellegrini, Matteo; Gray, Joe W.
2009-03-03
The rapidly emerging field of systems biology is helping us to understand the molecular determinants of phenotype on a genomic scale [1]. Cis-regulatory elements are major sequence-based determinants of biological processes in cells and tissues [2]. For instance, during transcriptional regulation, transcription factors (TFs) bind to very specific regions on the promoter DNA [2,3] and recruit the basal transcriptional machinery, which ultimately initiates mRNA transcription (Figure 1A). Learning cis-regulatory elements from omics data: a vast amount of work over the past decade has shown that omics data can be used to learn cis-regulatory logic on a genome-wide scale [4-6]--in particular, by integrating sequence data with mRNA expression profiles. The most popular approach has been to identify over-represented motifs in promoters of genes that are coexpressed [4,7,8]. Though widely used, such an approach can be limiting for a variety of reasons. First, the combinatorial nature of gene regulation is difficult to explicitly model in this framework. Moreover, in many applications of this approach, expression data from multiple conditions are necessary to obtain reliable predictions. This can potentially limit the use of this method to only large data sets [9]. Although these methods can be adapted to analyze mRNA expression data from a pair of biological conditions, such comparisons are often confounded by the fact that primary and secondary response genes are clustered together--whereas only the primary response genes are expected to contain the functional motifs [10]. A set of approaches based on regression has been developed to overcome the above limitations [11-32]. These approaches have their foundations in certain biophysical aspects of gene regulation [26,33-35]. That is, the models are motivated by the expected transcriptional response of genes due to the binding of TFs to their promoters. While such methods have gathered popularity in the computational domain
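The core regression idea can be sketched in a few lines: regress expression on per-promoter motif counts and read off significant coefficients (all data below are synthetic):

```r
# Model log-expression as a linear function of motif occurrence counts in
# each promoter; a significant coefficient suggests a functional motif.
set.seed(1)
genes  <- 500
motifs <- matrix(rpois(genes * 8, 2), genes, 8,
                 dimnames = list(NULL, paste0("motif", 1:8)))
expr <- 0.6 * motifs[, 1] - 0.4 * motifs[, 3] + rnorm(genes)

fit <- lm(expr ~ motifs)
summary(fit)   # motif1 and motif3 should surface with small p-values
```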
Analyzing Association Mapping in Pedigree-Based GWAS Using a Penalized Multitrait Mixed Model.
Liu, Jin; Yang, Can; Shi, Xingjie; Li, Cong; Huang, Jian; Zhao, Hongyu; Ma, Shuangge
2016-07-01
Genome-wide association studies (GWAS) have led to the identification of many genetic variants associated with complex diseases in the past 10 years. Penalization methods, with significant numerical and statistical advantages, have been extensively adopted in analyzing GWAS. This study has been partly motivated by the analysis of Genetic Analysis Workshop (GAW) 18 data, which have two notable characteristics. First, the subjects are from a small number of pedigrees and hence related. Second, for each subject, multiple correlated traits have been measured. Most of the existing penalization methods assume independence between subjects and traits and can be suboptimal. There are a few methods in the literature based on mixed modeling that can accommodate correlations. However, they cannot fully accommodate the two types of correlations while conducting effective marker selection. In this study, we develop a penalized multitrait mixed modeling approach. It accommodates the two different types of correlations and includes several existing methods as special cases. Effective penalization is adopted for marker selection. Simulation demonstrates its satisfactory performance. The GAW 18 data are analyzed using the proposed method. PMID:27247027
Multiple Response Regression for Gaussian Mixture Models with Known Labels.
Lee, Wonyul; Du, Ying; Sun, Wei; Hayes, D Neil; Liu, Yufeng
2012-12-01
Multiple response regression is a useful regression technique to model multiple response variables using the same set of predictor variables. Most existing methods for multiple response regression are designed for modeling homogeneous data. In many applications, however, one may have heterogeneous data where the samples are divided into multiple groups. Our motivating example is a cancer dataset where the samples belong to multiple cancer subtypes. In this paper, we consider modeling the data coming from a mixture of several Gaussian distributions with known group labels. A naive approach is to split the data into several groups according to the labels and model each group separately. Although it is simple, this approach ignores potential common structures across different groups. We propose new penalized methods to model all groups jointly in which the common and unique structures can be identified. The proposed methods estimate the regression coefficient matrix, as well as the conditional inverse covariance matrix of response variables. Asymptotic properties of the proposed methods are explored. Through numerical examples, we demonstrate that both estimation and prediction can be improved by modeling all groups jointly using the proposed methods. An application to a glioblastoma cancer dataset reveals some interesting common and unique gene relationships across different cancer subtypes. PMID:24416092
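As a point of reference, the naive baseline in R is multivariate least squares via lm with a matrix response, which fits each response on the same predictors without any shared-structure penalty:

```r
# Ordinary multi-response least squares: one coefficient column per response.
set.seed(1)
n <- 100
X <- matrix(rnorm(n * 5), n, 5)
Y <- X %*% matrix(rnorm(5 * 3), 5, 3) + matrix(rnorm(n * 3), n, 3)

fit <- lm(Y ~ X)   # Y is an n x 3 response matrix
coef(fit)          # 6 x 3 matrix: intercept + 5 slopes per response
```

The paper's penalized joint estimators improve on this by borrowing strength across groups and modeling the inverse covariance of the responses.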
Nonlinear regression-based method for pseudoenhancement correction in CT colonography.
Tsagaan, Baigalmaa; Näppi, Janne; Yoshida, Hiroyuki
2009-08-01
In CT colonography (CTC), orally administered positive-contrast tagging agents are often used for differentiating residual bowel contents from native colonic structures. However, tagged materials can sometimes hyperattenuate observed CT numbers of their adjacent untagged materials. Such pseudoenhancement complicates the differentiation of colonic soft-tissue structures from tagged materials, because pseudoenhanced colonic structures may have CT numbers that are similar to those of tagged materials. The authors developed a nonlinear regression-based (NLRB) method for performing a local image-based pseudoenhancement correction of CTC data. To calibrate the correction parameters, the CT data of an anthropomorphic reference phantom were correlated with those of partially tagged phantoms. The CTC data were registered spatially by use of an adaptive multiresolution method, and untagged and tagged partial-volume soft-tissue surfaces were correlated by use of a virtual tagging scheme. The NLRB method was then optimized to minimize the difference in the CT numbers of soft-tissue regions between the untagged and tagged phantom CTC data by use of the Nelder-Mead downhill simplex method. To validate the method, the CT numbers of untagged regions were compared with those of registered pseudoenhanced phantom regions before and after the correction. The CT numbers were significantly different before performing the correction (p<0.01), whereas, after the correction, the difference between the CT numbers was not significant. The effect of the correction was also tested on the size measurement of polyps that were covered by tagging in phantoms and in clinical cases. In phantom cases, before the correction, the diameters of 12 simulated polyps submerged in tagged fluids that were measured in a soft-tissue CT display were significantly different from those measured in an untagged phantom (p<0.01), whereas after the correction the difference was not significant. In clinical cases
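A hedged sketch of the calibration step: choose correction parameters that minimize squared CT-number differences between registered untagged and tagged soft-tissue samples, using optim's default Nelder-Mead search; the correction function below is a made-up placeholder, not the authors' NLRB model:

```r
# Calibrate a toy pseudoenhancement correction by Nelder-Mead minimization.
set.seed(1)
ct_untagged <- rnorm(500, 20, 15)                          # soft-tissue CT numbers (HU)
ct_tagged   <- 0.9 * ct_untagged + 35 + rnorm(500, 0, 3)   # pseudoenhanced counterparts

correct <- function(x, par) (x - par[2]) / par[1]          # placeholder inverse model
loss    <- function(par) sum((correct(ct_tagged, par) - ct_untagged)^2)

opt <- optim(c(1, 0), loss)   # optim defaults to the Nelder-Mead simplex method
opt$par                       # recovered scale and shift, ~ (0.9, 35)
```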
A penalized robust semiparametric approach for gene-environment interactions.
Wu, Cen; Shi, Xingjie; Cui, Yuehua; Ma, Shuangge
2015-12-30
In genetic and genomic studies, gene-environment (G×E) interactions have important implications. Some of the existing G×E interaction methods are limited by analyzing a small number of G factors at a time, by assuming linear effects of E factors, by assuming no data contamination, and by adopting ineffective selection techniques. In this study, we propose a new approach for identifying important G×E interactions. It jointly models the effects of all E and G factors and their interactions. A partially linear varying coefficient model is adopted to accommodate possible nonlinear effects of E factors. A rank-based loss function is used to accommodate possible data contamination. Penalization, which has been extensively used with high-dimensional data, is adopted for selection. The proposed penalized estimation approach can automatically determine if a G factor has an interaction with an E factor, main effect but not interaction, or no effect at all. The proposed approach can be effectively realized using a coordinate descent algorithm. Simulation shows that it has satisfactory performance and outperforms several competing alternatives. The proposed approach is used to analyze a lung cancer study with gene expression measurements and clinical variables. Copyright © 2015 John Wiley & Sons, Ltd. PMID:26239060
Estimating R-squared Shrinkage in Multiple Regression: A Comparison of Different Analytical Methods.
ERIC Educational Resources Information Center
Yin, Ping; Fan, Xitao
2001-01-01
Studied the effectiveness of various analytical formulas for estimating R-squared shrinkage in multiple regression analysis, focusing on estimators of the squared population multiple correlation coefficient and the squared population cross validity coefficient. Simulation results suggest that the most widely used Wherry (R. Wherry, 1931) formula…
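For context, one commonly quoted Wherry-type correction is the familiar adjusted R-squared; a one-line R version of the textbook formula (not necessarily the exact variant studied in the article):

```r
# Wherry-type shrinkage estimate of the population R-squared from the sample
# R2, sample size n, and number of predictors p.
adj_r2 <- function(R2, n, p) 1 - (1 - R2) * (n - 1) / (n - p - 1)
adj_r2(R2 = 0.50, n = 60, p = 10)   # ~0.398: sizable shrinkage in small samples
```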
Regression Methods for Categorical Dependent Variables: Effects on a Model of Student College Choice
ERIC Educational Resources Information Center
Rapp, Kelly E.
2012-01-01
The use of categorical dependent variables with the classical linear regression model (CLRM) violates many of the model's assumptions and may result in biased estimates (Long, 1997; O'Connell, Goldstein, Rogers, & Peng, 2008). Many dependent variables of interest to educational researchers (e.g., professorial rank, educational…
ERIC Educational Resources Information Center
Wong, Vivian C.; Steiner, Peter M.; Cook, Thomas D.
2012-01-01
In a traditional regression-discontinuity design (RDD), units are assigned to treatment and comparison conditions solely on the basis of a single cutoff score on a continuous assignment variable. The discontinuity in the functional form of the outcome at the cutoff represents the treatment effect, or the average treatment effect at the cutoff.…
Sample Size Determination for Regression Models Using Monte Carlo Methods in R
ERIC Educational Resources Information Center
Beaujean, A. Alexander
2014-01-01
A common question asked by researchers using regression models is, What sample size is needed for my study? While there are formulae to estimate sample sizes, their assumptions are often not met in the collected data. A more realistic approach to sample size determination requires more information such as the model of interest, strength of the…
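In the spirit of the article, a small Monte Carlo routine can estimate power for a regression slope across candidate sample sizes (the effect size and settings below are illustrative):

```r
# Monte Carlo power for a single regression slope at several sample sizes.
power_for_n <- function(n, beta = 0.3, sims = 1000, alpha = 0.05) {
  hits <- replicate(sims, {
    x <- rnorm(n)
    y <- beta * x + rnorm(n)
    summary(lm(y ~ x))$coefficients["x", "Pr(>|t|)"] < alpha
  })
  mean(hits)   # proportion of significant replications = estimated power
}
set.seed(1)
sapply(c(50, 100, 150), power_for_n)   # pick the smallest n reaching, say, 0.80
```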
Correcting Measurement Error in Latent Regression Covariates via the MC-SIMEX Method
ERIC Educational Resources Information Center
Rutkowski, Leslie; Zhou, Yan
2015-01-01
Given the importance of large-scale assessments to educational policy conversations, it is critical that subpopulation achievement is estimated reliably and with sufficient precision. Despite this importance, biased subpopulation estimates have been found to occur when variables in the conditioning model side of a latent regression model contain…
Using regression methods to estimate stream phosphorus loads at the Illinois River, Arkansas
Haggard, B.E.; Soerens, T.S.; Green, W.R.; Richards, R.P.
2003-01-01
The development of total maximum daily loads (TMDLs) requires evaluating existing constituent loads in streams. Accurate estimates of constituent loads are needed to calibrate watershed and reservoir models for TMDL development. The best approach to estimating constituent loads is high-frequency sampling, particularly during storm events, and mass integration of constituents passing a point in a stream. Most often, resources are limited and discrete water quality samples are collected on fixed intervals and sometimes supplemented with directed sampling during storm events. When resources are limited, mass integration is not an accurate means to determine constituent loads and other load estimation techniques such as regression models are used. The objective of this work was to determine the minimum number of water-quality samples needed to provide constituent concentration data adequate to estimate constituent loads at a large stream. Twenty sets of water quality samples with and without supplemental storm samples were randomly selected at various fixed intervals from a database at the Illinois River, northwest Arkansas. The random sets were used to estimate total phosphorus (TP) loads using regression models. The regression-based annual TP loads were compared to the integrated annual TP load estimated using all the data. At a minimum, monthly sampling plus supplemental storm samples (six samples per year) was needed to produce a root mean square error of less than 15%. Water quality samples should be collected at least semi-monthly (every 15 days) in studies lasting less than two years if seasonal time factors are to be used in the regression models. Annual TP loads estimated from independently collected discrete water quality samples further demonstrated the utility of using regression models to estimate annual TP loads in this stream system.
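A minimal sketch of a regression-based load model of the general sort used here, with log discharge and seasonal terms as predictors (synthetic data; operational tools also apply a back-transformation bias correction):

```r
# Rating-curve style load regression: log(load) on log(Q) plus seasonality.
set.seed(1)
dectime <- runif(72, 0, 2)                     # decimal time over two years
Q    <- exp(rnorm(72, 3, 1))                   # discharge
load <- exp(0.5 + 1.2 * log(Q) +
            0.3 * sin(2 * pi * dectime) + rnorm(72, 0, 0.3))  # TP load

fit <- lm(log(load) ~ log(Q) + sin(2 * pi * dectime) + cos(2 * pi * dectime))
summary(fit)$sigma   # residual SE drives the back-transformation bias correction
```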
NASA Astrophysics Data System (ADS)
Yang, Jianhong; Yi, Cancan; Xu, Jinwu; Ma, Xianghong
2015-05-01
A new LIBS quantitative analysis method based on adaptive selection of analytical lines and a Relevance Vector Machine (RVM) regression model is proposed. First, a scheme for adaptively selecting analytical lines is put forward to overcome the drawback of high dependency on a priori knowledge. The candidate analytical lines are automatically selected based on the built-in characteristics of spectral lines, such as spectral intensity, wavelength and width at half height. The analytical lines to be used as input variables of the regression model are determined adaptively according to the samples used for both training and testing. Second, an LIBS quantitative analysis method based on RVM is presented. The intensities of the analytical lines and the elemental concentrations of certified standard samples are used to train the RVM regression model. The predicted elemental concentrations are given with probabilistic confidence intervals, which is helpful for evaluating the uncertainty contained in the measured spectra. Chromium concentration analysis experiments on 23 certified standard high-alloy steel samples were carried out. The multiple correlation coefficient of the prediction was up to 98.85%, and the average relative error of the prediction was 4.01%. The experimental results showed that the proposed LIBS quantitative analysis method achieved better prediction accuracy and better modeling robustness than methods based on partial least squares regression, artificial neural networks and the standard support vector machine.
NASA Astrophysics Data System (ADS)
Boucher, Thomas F.; Ozanne, Marie V.; Carmosino, Marco L.; Dyar, M. Darby; Mahadevan, Sridhar; Breves, Elly A.; Lepore, Kate H.; Clegg, Samuel M.
2015-05-01
The ChemCam instrument on the Mars Curiosity rover is generating thousands of LIBS spectra and bringing interest in this technique to public attention. The key to interpreting Mars or any other types of LIBS data is calibrations that relate laboratory standards to unknowns examined in other settings and enable predictions of chemical composition. Here, LIBS spectral data are analyzed using linear regression methods including partial least squares (PLS-1 and PLS-2), principal component regression (PCR), least absolute shrinkage and selection operator (lasso), elastic net, and linear support vector regression (SVR-Lin). These were compared against results from nonlinear regression methods including kernel principal component regression (K-PCR), polynomial kernel support vector regression (SVR-Py) and k-nearest neighbor (kNN) regression to discern the most effective models for interpreting chemical abundances from LIBS spectra of geological samples. The results were evaluated for 100 samples analyzed with 50 laser pulses at each of five locations averaged together. Wilcoxon signed-rank tests were employed to evaluate the statistical significance of differences among the nine models using their predicted residual sum of squares (PRESS) to make comparisons. For MgO, SiO2, Fe2O3, CaO, and MnO, the sparse models outperform all the others except for linear SVR, while for Na2O, K2O, TiO2, and P2O5, the sparse methods produce inferior results, likely because their emission lines in this energy range have lower transition probabilities. The strong performance of the sparse methods in this study suggests that use of dimensionality-reduction techniques as a preprocessing step may improve the performance of the linear models. Nonlinear methods tend to overfit the data and predict less accurately, while the linear methods proved to be more generalizable with better predictive performance. These results are attributed to the high dimensionality of the data (6144 channels
Lee, Soo Min; Lee, Jae-Won
2014-11-01
In this study, the optimal conditions for biomass torrefaction were determined by comparing the gain in energy content to the weight loss of biomass in the final products. Torrefaction experiments were performed at temperatures ranging from 220 to 280°C using 20-80 min reaction times. Polynomial regression models ranging from the 1st to the 3rd order were used to determine a relationship between the severity factor (SF) and calorific value or weight loss. The intersection of the two regression models for calorific value and weight loss was determined and assumed to be the optimized SF. The optimized SFs for each biomass ranged from 6.056 to 6.372. Optimized torrefaction conditions were determined at various reaction times of 15, 30, and 60 min. The average optimized temperature was 248.55°C for the studied biomass when torrefaction was performed for 60 min. PMID:25266685
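A sketch of the intersection idea: fit low-order polynomials to the two responses as functions of SF, then solve numerically for where the fitted curves cross (the curves and scales below are synthetic):

```r
# Fit one polynomial to energy gain and one to weight loss, then find the SF
# where the two fitted curves intersect.
set.seed(1)
SF   <- seq(5.5, 7, length.out = 30)
gain <- 2 + 3 * (SF - 5.5) + rnorm(30, 0, 0.2)        # relative calorific gain (%)
loss <- 5 * (SF - 5.5)^2 + rnorm(30, 0, 0.5)          # weight loss (%)

f_gain <- lm(gain ~ poly(SF, 2, raw = TRUE))
f_loss <- lm(loss ~ poly(SF, 2, raw = TRUE))
diff_at <- function(s) {
  nd <- data.frame(SF = s)
  predict(f_gain, nd) - predict(f_loss, nd)
}
uniroot(diff_at, c(5.6, 7))$root   # the "optimized SF"
```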
Comparison of regression and time-series methods for synthesizing missing streamflow records
Beauchamp, J.J.; Downing, D.J.; Railsback, S.F.
1989-10-01
Regression and time-series techniques have been used to synthesize and predict the stream flow at the Foresta Bridge gage from information at the upstream Pohono Bridge gage on the Merced River near Yosemite National Park. Using the available data from two time periods (calendar year 1979 and water year 1986), the authors evaluated the two techniques in their ability to model the variation in the observed flows and in their ability to predict stream flow at the Foresta Bridge gage for the 1979 time period with data from the 1986 time period. Both techniques produced reasonably good estimates and forecasts of the flow at the downstream gage. However, the regression model was found to have a significant amount of autocorrelation in the residuals, which the time-series model was able to eliminate. The time-series technique presented can be of great assistance in arriving at reasonable estimates of flow in data sets that have large missing portions of data.
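The contrast between the two techniques can be sketched on synthetic paired-gage data: an ordinary regression whose residuals remain autocorrelated, versus a regression with AR(1) errors via arima():

```r
# OLS vs. regression with AR(1) errors for synthesizing downstream flows.
set.seed(1)
n  <- 365
up <- exp(rnorm(n, 4, 0.5))                          # upstream-gage flows
e  <- as.numeric(arima.sim(list(ar = 0.7), n, sd = 0.2))
down <- exp(0.3 + 0.95 * log(up) + e)                # downstream flows, AR(1) noise

ols <- lm(log(down) ~ log(up))
acf(resid(ols), plot = FALSE)$acf[2]                 # sizable lag-1 autocorrelation

ts_fit <- arima(log(down), order = c(1, 0, 0), xreg = log(up))
ts_fit                                               # the AR term absorbs it
```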
Solution of the linear regression problem using matrix correction methods in the l1 metric
NASA Astrophysics Data System (ADS)
Gorelik, V. A.; Trembacheva (Barkalova), O. S.
2016-02-01
The linear regression problem is considered as an improper interpolation problem. The l1 metric is used to correct (approximate) all the initial data. A probabilistic justification of this metric in the case of an exponential noise distribution is given. The original improper interpolation problem is reduced to a set of a finite number of linear programming problems. The corresponding computational algorithms are implemented in MATLAB.
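The l1 criterion reduces to a linear program; as a readily available analogue (not the authors' matrix-correction algorithm), median regression in the R package quantreg minimizes the same absolute-deviation loss:

```r
# l1 (least-absolute-deviations) regression via quantreg's median regression,
# on data with Laplace-like (double-exponential) noise.
library(quantreg)
set.seed(1)
x <- rnorm(100)
y <- 1 + 2 * x + rexp(100) - rexp(100)     # heavy-tailed exponential-type noise

fit_l1 <- rq(y ~ x, tau = 0.5)             # minimizes sum of |residuals|
fit_l2 <- lm(y ~ x)
rbind(l1 = coef(fit_l1), l2 = coef(fit_l2))
```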
Model Averaging Methods for Weight Trimming in Generalized Linear Regression Models
Elliott, Michael R.
2012-01-01
In sample surveys where units have unequal probabilities of inclusion, associations between the inclusion probability and the statistic of interest can induce bias in unweighted estimates. This is true even in regression models, where the estimates of the population slope may be biased if the underlying mean model is misspecified or the sampling is nonignorable. Weights equal to the inverse of the probability of inclusion are often used to counteract this bias. Highly disproportional sample designs have highly variable weights; weight trimming reduces large weights to a maximum value, reducing variability but introducing bias. Most standard approaches are ad hoc in that they do not use the data to optimize bias-variance trade-offs. This article uses Bayesian model averaging to create “data driven” weight trimming estimators. We extend previous results for linear regression models (Elliott 2008) to generalized linear regression models, developing robust models that approximate fully-weighted estimators when bias correction is of greatest importance, and approximate unweighted estimators when variance reduction is critical. PMID:23275683
Bolarinwa, O A; Adeola, O
2012-12-01
Digestible and metabolizable energy contents of feed ingredients for pigs can be determined by direct or indirect methods. There are situations when only the indirect approach is suitable and the regression method is a robust indirect approach. This study was conducted to compare the direct and regression methods for determining the energy value of wheat for pigs. Twenty-four barrows with an average initial BW of 31 kg were assigned to 4 diets in a randomized complete block design. The 4 diets consisted of 969 g wheat/kg plus minerals and vitamins (sole wheat) for the direct method, corn (Zea mays)-soybean (Glycine max) meal reference diet (RD), RD + 300 g wheat/kg, and RD + 600 g wheat/kg. The 3 corn-soybean meal diets were used for the regression method and wheat replaced the energy-yielding ingredients, corn and soybean meal, so that the same ratio of corn and soybean meal across the experimental diets was maintained. The wheat used was analyzed to contain 883 g DM, 15.2 g N, and 3.94 Mcal GE/kg. Each diet was fed to 6 barrows in individual metabolism crates for a 5-d acclimation followed by a 5-d total but separate collection of feces and urine. The DE and ME for the sole wheat diet were 3.83 and 3.77 Mcal/kg DM, respectively. Because the sole wheat diet contained 969 g wheat/kg, these translate to 3.95 Mcal DE/kg DM and 3.89 Mcal ME/kg DM. The RD used for the regression approach yielded 4.00 Mcal DE and 3.91 Mcal ME/kg DM diet. Increasing levels of wheat in the RD linearly reduced (P < 0.05) DE and ME to 3.88 and 3.79 Mcal/kg DM diet, respectively. The regressions of wheat contribution to DE and ME in megacalories against the quantity of wheat DM intake in kilograms generated 3.96 Mcal DE and 3.88 Mcal ME/kg DM. In conclusion, values obtained for the DE and ME of wheat using the direct method (3.95 and 3.89 Mcal/kg DM) did not differ (0.78 < P < 0.89) from those obtained using the regression method (3.96 and 3.88 Mcal/kg DM). PMID:23365389
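The regression method itself is compact enough to sketch: regress the test ingredient's contribution to digestible energy intake on its DM intake, and read the ingredient's DE off the slope (the numbers below are synthetic, chosen only to echo the reported magnitudes):

```r
# Regression method for ingredient energy value: slope = DE per kg of DM.
set.seed(1)
wheat_dmi  <- c(rep(0, 6), rep(0.35, 6), rep(0.70, 6)) * runif(18, 0.9, 1.1)  # kg DM
de_contrib <- 3.96 * wheat_dmi + rnorm(18, 0, 0.05)                           # Mcal

fit <- lm(de_contrib ~ wheat_dmi)
coef(fit)["wheat_dmi"]   # estimated DE of wheat, Mcal/kg DM
```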
Regression to fuzziness method for estimation of remaining useful life in power plant components
NASA Astrophysics Data System (ADS)
Alamaniotis, Miltiadis; Grelle, Austin; Tsoukalas, Lefteri H.
2014-10-01
Mitigation of severe accidents in power plants requires the reliable operation of all systems and the on-time replacement of mechanical components. Therefore, the continuous surveillance of power systems is a crucial concern for the overall safety, cost control, and on-time maintenance of a power plant. In this paper a methodology called regression to fuzziness is presented that estimates the remaining useful life (RUL) of power plant components. The RUL is defined as the difference between the time that a measurement was taken and the estimated failure time of that component. The methodology aims to compensate for a potential lack of historical data by modeling an expert's operational experience and expertise applied to the system. It initially identifies critical degradation parameters and their associated value range. Once completed, the operator's experience is modeled through fuzzy sets which span the entire parameter range. This model is then synergistically used with linear regression and a component's failure point to estimate the RUL. The proposed methodology is tested on estimating the RUL of a turbine (the basic electrical generating component of a power plant) in three different cases. Results demonstrate the benefits of the methodology for components for which operational data are not readily available and emphasize the significance of the selection of fuzzy sets and the effect of knowledge representation on the predicted output. To verify the effectiveness of the methodology, it was benchmarked against the data-based simple linear regression model used for predictions, which was shown to perform equally well or worse than the presented methodology. Furthermore, the methodology comparison highlighted the improvement in estimation offered by the adoption of appropriate fuzzy sets for parameter representation.
The cross politics of Ecuador's penal state.
Garces, Chris
2010-01-01
This essay examines inmate "crucifixion protests" in Ecuador's largest prison during 2003-04. It shows how the preventively incarcerated-of whom there are thousands-managed to effectively denounce their extralegal confinement by embodying the violence of the Christian crucifixion story. This form of protest, I argue, simultaneously clarified and obscured the multiple layers of sovereign power that pressed down on urban crime suspects, who found themselves persecuted and forsaken both outside and within the space of the prison. Police enacting zero-tolerance policies in urban neighborhoods are thus a key part of the penal state, as are the politically threatened family members of the indicted, the sensationalized local media, distrustful neighbors, prison guards, and incarcerated mafia. The essay shows how the politico-theological performance of self-crucifixion responded to these internested forms of sovereign violence, and was briefly effective. The inmates' cross intervention hence provides a window into the way sovereignty works in the Ecuadorean penal state, drawing out how incarceration trends and new urban security measures interlink and produce an array of victims. PMID:20662147
[Legal probation of juvenile offenders after release from penal reformative training].
Urbaniok, Frank; Rossegger, Astrid; Fegert, Jörg; Rubertus, Michael; Endrass, Jérôme
2007-01-01
Over recent years, there has been an increase in adolescent delinquency in Germany and Switzerland. In this context, the episodic character of the majority of adolescent delinquency is usually pointed out; however, numerous studies show high re-offending rates for released adolescents. The goal of this study is to examine the legal probation of juvenile delinquents after release from penal reformative training. In this study, the legal probation of adolescents committed to the AEA Uitikon, in the Canton of Zurich, between 1974 and 1986 was scrutinized by examining extracts from their criminal record as of 2003. The period of catamnesis was thus between 17 and 29 years. Overall, 71% of offenders reoffended, 29% with a violent or sexual offence. Bivariate logistic regression showed that the kind of offence committed had no influence on the probability of recidivism. If commitment to the AEA was due to a single offence (as opposed to serial offences), the risk of recidivism was reduced by 71% (OR=0.29). The results of the study show that young delinquents sentenced and committed to penal reformative training have a high recidivism risk. Furthermore, the results point out the importance of the evaluation of the offence-preventive efficacy of penal measures. PMID:17410929
Sentürk, Damla; Dalrymple, Lorien S; Mu, Yi; Nguyen, Danh V
2014-11-10
We propose a new weighted hurdle regression method for modeling count data, with particular interest in modeling cardiovascular events in patients on dialysis. Cardiovascular disease remains one of the leading causes of hospitalization and death in this population. Our aim is to jointly model the relationship/association between covariates and (i) the probability of cardiovascular events, a binary process, and (ii) the rate of events once the realization is positive (when the 'hurdle' is crossed), using a zero-truncated Poisson distribution. When the observation period or follow-up time, from the start of dialysis, varies among individuals, the estimated probability of positive cardiovascular events during the study period will be biased. Furthermore, when the model contains covariates, then the estimated relationship between the covariates and the probability of cardiovascular events will also be biased. These challenges are addressed with the proposed weighted hurdle regression method. Estimation for the weighted hurdle regression model is a weighted likelihood approach, where standard maximum likelihood estimation can be utilized. The method is illustrated with data from the United States Renal Data System. Simulation studies show the ability of the proposed method to successfully adjust for differential follow-up times and incorporate the effects of covariates in the weighting. PMID:24930810
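For orientation, a standard (unweighted) hurdle model can be fitted with pscl::hurdle; the follow-up-time weighting proposed in the paper is not part of that function, and the data below are simulated:

```r
# Standard hurdle fit: a binomial model for any-event plus a truncated-Poisson
# count model for the positive part.
library(pscl)
set.seed(1)
n   <- 500
age <- rnorm(n, 60, 10)
p1  <- plogis(-2 + 0.03 * age)                  # P(any event)
mu  <- exp(-1 + 0.02 * age)
y   <- ifelse(rbinom(n, 1, p1) == 1,
              rpois(n, mu) + 1, 0)              # crude stand-in for truncated counts
d   <- data.frame(y = y, age = age)

fit <- hurdle(y ~ age, data = d, dist = "poisson", zero.dist = "binomial")
summary(fit)   # separate coefficients for the hurdle and count parts
```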
Asghari, Mehdi Poursheikhali; Hayatshahi, Sayyed Hamed Sadat; Abdolmaleki, Parviz
2012-01-01
From both the structural and functional points of view, β-turns play important biological roles in proteins. In the present study, a novel two-stage hybrid procedure has been developed to identify β-turns in proteins. In the first stage, binary logistic regression was used, for the first time, to select significant sequence parameters for the identification of β-turns, based on a re-substitution test procedure. The sequence parameters consisted of 80 amino acid positional occurrences and 20 amino acid percentages in sequence. Among these parameters, the most significant ones selected by the binary logistic regression model were the percentages of Gly and Ser and the occurrence of Asn in position i+2 in sequence. These significant parameters have the greatest effect on the constitution of a β-turn sequence. A neural network model was then constructed and fed with the parameters selected by binary logistic regression to build a hybrid predictor. The networks were trained and tested on a non-homologous dataset of 565 protein chains. Applying a nine-fold cross-validation test on the dataset, the network reached an overall accuracy (Qtotal) of 74%, which is comparable with the results of other β-turn prediction methods. In conclusion, this study proves that the parameter selection ability of binary logistic regression together with the prediction capability of neural networks leads to the development of more precise models for identifying β-turns in proteins.
Heinze, Georg; Ploner, Meinhard; Beyea, Jan
2013-12-20
In the logistic regression analysis of a small-sized, case-control study on Alzheimer's disease, some of the risk factors exhibited missing values, motivating the use of multiple imputation. Usually, Rubin's rules (RR) for combining point estimates and variances would then be used to estimate (symmetric) confidence intervals (CIs), on the assumption that the regression coefficients were distributed normally. Yet, rarely is this assumption tested, with or without transformation. In analyses of small, sparse, or nearly separated data sets, such symmetric CIs may not be reliable. Thus, RR alternatives have been considered, for example, Bayesian sampling methods, but not yet those that combine profile likelihoods, particularly penalized profile likelihoods, which can remove first order biases and guarantee convergence of parameter estimation. To fill the gap, we consider the combination of penalized likelihood profiles (CLIP) by expressing them as posterior cumulative distribution functions (CDFs) obtained via a chi-squared approximation to the penalized likelihood ratio statistic. CDFs from multiple imputations can then easily be averaged into a combined CDF_c, allowing confidence limits for a parameter β at level 1 - α to be identified as those β* and β** that satisfy CDF_c(β*) = α/2 and CDF_c(β**) = 1 - α/2. We demonstrate that the CLIP method outperforms RR in analyzing both simulated data and data from our motivating example. CLIP can also be useful as a confirmatory tool, should it show that the simpler RR are adequate for extended analysis. We also compare the performance of CLIP to Bayesian sampling methods using Markov chain Monte Carlo. CLIP is available in the R package logistf. PMID:23873477
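The ingredient being combined is a Firth-type penalized logistic fit, which the same logistf package provides; a minimal single-data-set sketch (CLIP itself then averages profile-likelihood CDFs across imputations):

```r
# Firth-penalized logistic regression: usable estimates and profile CIs even
# in small, sparse, or nearly separated data.
library(logistf)
set.seed(1)
d <- data.frame(y  = rbinom(80, 1, 0.2),
                x1 = rnorm(80),
                x2 = rbinom(80, 1, 0.5))

fit <- logistf(y ~ x1 + x2, data = d)   # penalized likelihood, profile-based CIs
summary(fit)
```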
Doran, Kara S.; Howd, Peter A.; Sallenger, Asbury H., Jr.
2015-01-01
Recent studies, and most of their predecessors, use tide gage data to quantify sea-level (SL) acceleration, A_SL(t). In the current study, three techniques were used to calculate acceleration from tide gage data, and of those examined, it was determined that the two techniques based on sliding a regression window through the time series are more robust compared to the technique that fits a single quadratic form to the entire time series, particularly if there is temporal variation in the magnitude of the acceleration. The single-fit quadratic regression method has been the most commonly used technique for determining acceleration in tide gage data. The inability of the single-fit method to account for time-varying acceleration may explain some of the inconsistent findings between investigators. Properly quantifying A_SL(t) from field measurements is of particular importance in evaluating numerical models of past, present, and future sea-level rise (SLR) resulting from anticipated climate change.
Using LASSO Regression to Predict Rheumatoid Arthritis Treatment Efficacy
Odgers, David J.; Tellis, Natalie; Hall, Heather; Dumontier, Michel
2016-01-01
Rheumatoid arthritis (RA) accounts for one-fifth of the deaths due to arthritis, the leading cause of disability in the United States. Finding effective treatments for managing arthritis symptoms is a major challenge, since the mechanisms of autoimmune disorders are not fully understood and disease presentation differs for each patient. The American College of Rheumatology clinical guidelines consider the severity of the disease when deciding treatment, but do not include any prediction of drug efficacy. Using Electronic Health Records and Biomedical Linked Open Data (LOD), we demonstrate a method to classify patient outcomes using LASSO penalized regression. We show how Linked Data improve prediction and provide insight into how different drug treatment regimens lead to different outcomes. Applying classifiers like this to decision support in clinical applications could decrease the time to successful disease management, lessening the physical and financial burden on patients individually and on the healthcare system as a whole. PMID:27570666
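A minimal sketch of the core classification step, with synthetic stand-ins for the EHR/Linked-Data features (the feature construction from LOD is the paper's contribution and is not reproduced here):

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(1)
n, p = 300, 50
X = rng.normal(size=(n, p))                              # stand-in patient features
beta = np.zeros(p)
beta[:5] = [1.5, -1.2, 1.0, 0.8, -0.7]                   # only a few truly matter
y = (X @ beta + rng.logistic(size=n) > 0).astype(int)    # binary treatment outcome

Xs = StandardScaler().fit_transform(X)
clf = LogisticRegression(penalty="l1", solver="liblinear", C=0.2).fit(Xs, y)
print("features retained by the L1 penalty:", np.flatnonzero(clf.coef_[0]))
```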
Huang, Hai-Hui; Liu, Xiao-Ying; Liang, Yong
2016-01-01
Cancer classification and feature (gene) selection play an important role in knowledge discovery from genomic data. Although logistic regression is one of the most popular classification methods, it does not induce feature selection. In this paper, we present a new hybrid L1/2+L2 regularization (HLR) function, a linear combination of L1/2 and L2 penalties, to select relevant genes in logistic regression. The HLR approach inherits some fascinating characteristics from the L1/2 (sparsity) and L2 (grouping effect, where highly correlated variables are in or out of the model together) penalties. We also propose a novel univariate HLR thresholding approach to update the estimated coefficients, and develop a coordinate descent algorithm for the HLR penalized logistic regression model. Empirical results and simulations indicate that the proposed method is highly competitive amongst several state-of-the-art methods. PMID:27136190
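The sketch below shows the coordinate-descent skeleton for a penalty of this linear-combination form, on a least-squares objective for simplicity, and with the familiar L1 soft-thresholding update standing in for the paper's univariate L1/2 half-thresholding operator (whose closed form is more involved). The sweep structure is what matters here; everything else is an invented toy.

```python
import numpy as np

def soft_threshold(z, t):
    return np.sign(z) * np.maximum(np.abs(z) - t, 0.0)

def cd_l1_plus_l2(X, y, lam1, lam2, n_iter=200):
    """Coordinate descent for 0.5*||y - Xb||^2 + lam1*||b||_1 + 0.5*lam2*||b||^2.
    The HLR method would replace the soft-thresholding step with its
    univariate L1/2 half-thresholding update."""
    n, p = X.shape
    beta = np.zeros(p)
    col_sq = (X ** 2).sum(axis=0)
    r = y - X @ beta
    for _ in range(n_iter):
        for j in range(p):
            r += X[:, j] * beta[j]                   # partial residual excluding j
            beta[j] = soft_threshold(X[:, j] @ r, lam1) / (col_sq[j] + lam2)
            r -= X[:, j] * beta[j]
    return beta

rng = np.random.default_rng(2)
X = rng.normal(size=(100, 20))
X[:, 1] = X[:, 0] + 0.01 * rng.normal(size=100)      # a highly correlated pair
y = 2 * X[:, 0] + rng.normal(size=100)
print(cd_l1_plus_l2(X, y, lam1=5.0, lam2=5.0)[:3])   # grouping: the pair moves together
```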
Pavlou, Menelaos; Ambler, Gareth; Seaman, Shaun; De Iorio, Maria; Omar, Rumana Z
2016-03-30
Risk prediction models are used to predict a clinical outcome for patients using a set of predictors. We focus on predicting low-dimensional binary outcomes typically arising in epidemiology, health services and public health research where logistic regression is commonly used. When the number of events is small compared with the number of regression coefficients, model overfitting can be a serious problem. An overfitted model tends to demonstrate poor predictive accuracy when applied to new data. We review frequentist and Bayesian shrinkage methods that may alleviate overfitting by shrinking the regression coefficients towards zero (some methods can also provide more parsimonious models by omitting some predictors). We evaluated their predictive performance in comparison with maximum likelihood estimation using real and simulated data. The simulation study showed that maximum likelihood estimation tends to produce overfitted models with poor predictive performance in scenarios with few events, and penalised methods can offer improvement. Ridge regression performed well, except in scenarios with many noise predictors. Lasso performed better than ridge in scenarios with many noise predictors and worse in the presence of correlated predictors. Elastic net, a hybrid of the two, performed well in all scenarios. Adaptive lasso and smoothly clipped absolute deviation performed best in scenarios with many noise predictors; in other scenarios, their performance was inferior to that of ridge and lasso. Bayesian approaches performed well when the hyperparameters for the priors were chosen carefully. Their use may aid variable selection, and they can be easily extended to clustered-data settings and to incorporate external information. © 2015 The Authors. Statistics in Medicine Published by John Wiley & Sons Ltd. PMID:26514699
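A small sketch of the kind of comparison described, in the few-events regime; the data-generating process, sample sizes, and the choice of AUC as the metric are all invented stand-ins for the paper's fuller study design:

```python
import numpy as np
from sklearn.linear_model import LogisticRegression, LogisticRegressionCV
from sklearn.model_selection import cross_val_score

rng = np.random.default_rng(3)
n, p = 150, 30                                   # few events relative to coefficients
X = rng.normal(size=(n, p))
y = (X[:, 0] - X[:, 1] + rng.logistic(size=n) > 0).astype(int)

models = {
    "near-MLE (no shrinkage)": LogisticRegression(C=1e6, max_iter=2000),
    "ridge": LogisticRegressionCV(penalty="l2", cv=5, max_iter=2000),
    "lasso": LogisticRegressionCV(penalty="l1", solver="liblinear", cv=5),
    "elastic net": LogisticRegressionCV(penalty="elasticnet", solver="saga",
                                        l1_ratios=[0.5], cv=5, max_iter=5000),
}
for name, m in models.items():
    auc = cross_val_score(m, X, y, cv=5, scoring="roc_auc").mean()
    print(f"{name:24s} cross-validated AUC = {auc:.3f}")
```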
NASA Astrophysics Data System (ADS)
Saeidi, Omid; Torabi, Seyed Rahman; Ataei, Mohammad
2014-03-01
Rock mass classification systems are one of the most common ways of determining rock mass excavatability and related equipment assessment. However, the strength and weak points of such rating-based classifications have always been questionable. Such classification systems assign quantifiable values to predefined classified geotechnical parameters of rock mass. This causes particular ambiguities, leading to the misuse of such classifications in practical applications. Recently, intelligence system approaches such as artificial neural networks (ANNs) and neuro-fuzzy methods, along with multiple regression models, have been used successfully to overcome such uncertainties. The purpose of the present study is the construction of several models by using an adaptive neuro-fuzzy inference system (ANFIS) method with two data clustering approaches, including fuzzy c-means (FCM) clustering and subtractive clustering, an ANN and non-linear multiple regression to estimate the basic rock mass diggability index. A set of data from several case studies was used to obtain the real rock mass diggability index and compared to the predicted values by the constructed models. In conclusion, it was observed that ANFIS based on the FCM model shows higher accuracy and correlation with actual data compared to that of the ANN and multiple regression. As a result, one can use the assimilation of ANNs with fuzzy clustering-based models to construct such rigorous predictor tools.
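A full ANFIS pipeline is beyond a short sketch, but the fuzzy c-means clustering stage used to seed such models is compact. Below is plain FCM in numpy, on invented two-dimensional data; in the study's setting, the rows would be geotechnical parameter vectors:

```python
import numpy as np

def fuzzy_c_means(X, c=2, m=2.0, n_iter=100, seed=0):
    """Plain fuzzy c-means: alternate membership and centroid updates."""
    rng = np.random.default_rng(seed)
    U = rng.random((len(X), c))
    U /= U.sum(axis=1, keepdims=True)                  # random fuzzy memberships
    for _ in range(n_iter):
        W = U ** m
        centers = (W.T @ X) / W.sum(axis=0)[:, None]   # weighted centroids
        d = np.linalg.norm(X[:, None, :] - centers[None, :, :], axis=2) + 1e-12
        # u_ik = d_ik^(-2/(m-1)) / sum_j d_ij^(-2/(m-1))
        U = 1.0 / (d ** (2 / (m - 1)) *
                   np.sum(d ** (-2 / (m - 1)), axis=1, keepdims=True))
    return centers, U

rng = np.random.default_rng(4)
X = np.vstack([rng.normal(0, 1, (50, 2)), rng.normal(5, 1, (50, 2))])
centers, U = fuzzy_c_means(X, c=2)
print("cluster centers:\n", centers)
```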
NASA Technical Reports Server (NTRS)
Hopkins, Dale A.
1998-01-01
A key challenge in designing the new High Speed Civil Transport (HSCT) aircraft is determining a good match between the airframe and engine. Multidisciplinary design optimization can be used to solve the problem by adjusting parameters of both the engine and the airframe. Earlier, an example problem was presented of an HSCT aircraft with four mixed-flow turbofan engines and a baseline mission to carry 305 passengers 5000 nautical miles at a cruise speed of Mach 2.4. The problem was solved by coupling NASA Lewis Research Center's design optimization testbed (COMETBOARDS) with NASA Langley Research Center's Flight Optimization System (FLOPS). The computing time expended in solving the problem was substantial, and the instability of the FLOPS analyzer at certain design points caused difficulties. In an attempt to alleviate both of these limitations, we explored the use of two approximation concepts in the design optimization process. The two concepts, which are based on neural network and linear regression approximation, provide the reanalysis capability and design sensitivity analysis information required for the optimization process. The HSCT aircraft optimization problem was solved by using three alternate approaches; that is, the original FLOPS analyzer and two approximate (derived) analyzers. The approximate analyzers were calibrated and used in three different ranges of the design variables; narrow (interpolated), standard, and wide (extrapolated).
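As a rough illustration of the surrogate idea (not NASA's COMETBOARDS/FLOPS coupling), the sketch below trains a linear regression and a small neural network on samples of a stand-in "expensive" analysis, then compares their reanalysis errors at new design points; everything here is invented:

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.neural_network import MLPRegressor

def expensive_analyzer(x):
    """Stand-in for a costly mission analysis (e.g., one FLOPS run per call)."""
    return np.sin(x[:, 0]) + 0.5 * x[:, 1] ** 2 + 0.1 * x[:, 0] * x[:, 1]

rng = np.random.default_rng(5)
X_train = rng.uniform(-2, 2, (200, 2))          # sampled design points
y_train = expensive_analyzer(X_train)

lin = LinearRegression().fit(X_train, y_train)
net = MLPRegressor(hidden_layer_sizes=(20, 20), max_iter=5000,
                   random_state=0).fit(X_train, y_train)

X_test = rng.uniform(-2, 2, (50, 2))
y_true = expensive_analyzer(X_test)
for name, model in [("linear regression", lin), ("neural network", net)]:
    err = np.mean(np.abs(model.predict(X_test) - y_true))
    print(f"{name:18s} mean |error| on new designs: {err:.3f}")
```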
Du, Hongying; Hu, Zhide; Bazzoli, Andrea; Zhang, Yang
2011-01-01
The epidermal growth factor receptor (EGFR) protein tyrosine kinase (PTK) is an important protein target for anti-tumor drug discovery. To identify potential EGFR inhibitors, we conducted a quantitative structure–activity relationship (QSAR) study on the inhibitory activity of a series of quinazoline derivatives against EGFR tyrosine kinase. Two 2D-QSAR models were developed based on the best multi-linear regression (BMLR) and grid-search assisted projection pursuit regression (GS-PPR) methods. The results demonstrate that the inhibitory activity of quinazoline derivatives is strongly correlated with their polarizability, activation energy, mass distribution, connectivity, and branching information. Although the present investigation focused on EGFR, the approach provides a general avenue in the structure-based drug development of different protein receptor inhibitors. PMID:21811593
NASA Astrophysics Data System (ADS)
Arioli, M.; Gratton, S.
2012-11-01
Minimum-variance unbiased estimates for linear regression models can be obtained by solving least-squares problems. The conjugate gradient method can be successfully used in solving the symmetric and positive definite normal equations obtained from these least-squares problems. Taking into account the results of Golub and Meurant (1997, 2009) [10,11], Hestenes and Stiefel (1952) [17], and Strakoš and Tichý (2002) [16], which make it possible to approximate the energy norm of the error during the conjugate gradient iterative process, we adapt the stopping criterion introduced by Arioli (2005) [18] to the normal equations taking into account the statistical properties of the underpinning linear regression problem. Moreover, we show how the energy norm of the error is linked to the χ2-distribution and to the Fisher-Snedecor distribution. Finally, we present the results of several numerical tests that experimentally validate the effectiveness of our stopping criteria.
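A minimal numpy sketch of CG applied to the normal equations, with the delayed Gauss-quadrature (Hestenes-Stiefel) lower bound on the energy norm of the error used for stopping. Forming AᵀA explicitly is for brevity only (a careful implementation would use CGLS), and the paper's statistical calibration of the tolerance via the chi-squared and Fisher-Snedecor distributions is not reproduced:

```python
import numpy as np

def cg_normal_equations(A, b, d=5, tol=1e-8, max_iter=500):
    """CG on A^T A x = A^T b with the energy-norm error bound
       ||x - x_k||_N^2 >~ sum_{j=k}^{k+d-1} alpha_j ||r_j||^2,  N = A^T A."""
    N, c = A.T @ A, A.T @ b
    x = np.zeros(A.shape[1])
    r = c - N @ x
    p = r.copy()
    terms = []
    for _ in range(max_iter):
        Np = N @ p
        alpha = (r @ r) / (p @ Np)
        terms.append(alpha * (r @ r))          # alpha_j * ||r_j||^2
        x += alpha * p
        r_new = r - alpha * Np
        if len(terms) > d and sum(terms[-d:]) < tol:
            break                              # window estimates the error d steps back
        beta = (r_new @ r_new) / (r @ r)
        p = r_new + beta * p
        r = r_new
    return x

rng = np.random.default_rng(6)
A = rng.normal(size=(100, 10))
b = A @ rng.normal(size=10) + 0.1 * rng.normal(size=100)
x = cg_normal_equations(A, b)
print("max deviation from lstsq:",
      np.abs(x - np.linalg.lstsq(A, b, rcond=None)[0]).max())
```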
A Probabilistic Spatial Dengue Fever Risk Assessment by a Threshold-Based-Quantile Regression Method
Chiu, Chuan-Hung; Wen, Tzai-Hung; Chien, Lung-Chang; Yu, Hwa-Lung
2014-01-01
Understanding the spatial characteristics of dengue fever (DF) incidence is crucial for governmental agencies to implement effective disease control strategies. We investigated the associations between environmental and socioeconomic factors and the geographic distribution of DF, and proposed a probabilistic risk assessment approach that uses threshold-based quantile regression to identify the significant risk factors for DF transmission and estimate the spatial distribution of DF risk in terms of full probability distributions. To interpret risk, the return period was also included to characterize the frequency pattern of DF geographic occurrences. The study area included old Kaohsiung City and Fongshan District, two areas in Taiwan that have been affected by severe DF infections in recent decades. Results indicated that water-related facilities, including canals and ditches, and various types of residential area, as well as the interactions between them, were significant factors that elevated DF risk. By contrast, increases in per capita income and its associated interactions with residential areas mitigated DF risk in the study area. Nonlinear associations between these factors and DF risk were present at various quantiles, implying that water-related factors characterized the underlying spatial patterns of DF, and high-density residential areas indicated the potential for high DF incidence (e.g., clustered infections). The spatial distributions of DF risk were assessed in terms of three distinct map presentations: expected incidence rates, incidence rates at various return periods, and return periods at distinct incidence rates. These probability-based spatial risk maps exhibited distinct DF risks associated with environmental factors, expressed as various DF magnitudes and occurrence probabilities across Kaohsiung, and can serve as a reference for local governmental agencies. PMID:25302582
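Quantile regression at several quantile levels is the computational core of such an analysis. A toy sketch with invented covariates (hypothetical "canal density" and "income" stand-ins) using statsmodels; higher quantiles loosely correspond to rarer, larger outbreaks:

```python
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(7)
n = 500
canal_density = rng.uniform(0, 1, n)          # hypothetical water-related covariate
income = rng.uniform(0, 1, n)                 # hypothetical socioeconomic covariate
rate = (5 + 8 * canal_density - 4 * income
        + rng.gumbel(0, 1 + 2 * canal_density, n))   # skewed, heteroscedastic incidence

X = sm.add_constant(np.column_stack([canal_density, income]))
for q in (0.5, 0.9, 0.99):
    fit = sm.QuantReg(rate, X).fit(q=q)
    print(f"q = {q:.2f}: coefficients =", np.round(fit.params, 2))
```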
Xie, Benhuai; Pan, Wei; Shen, Xiaotong
2010-01-01
Motivation: Model-based clustering has been widely used, e.g. in microarray data analysis. Since for high-dimensional data variable selection is necessary, several penalized model-based clustering methods have been proposed to realize simultaneous variable selection and clustering. However, the existing methods all assume that the variables are independent with the use of diagonal covariance matrices. Results: To model non-independence of variables (e.g. correlated gene expressions) while alleviating the problem with the large number of unknown parameters associated with a general non-diagonal covariance matrix, we generalize the mixture of factor analyzers to that with penalization, which, among others, can effectively realize variable selection. We use simulated data and real microarray data to illustrate the utility and advantages of the proposed method over several existing ones. Contact: weip@biostat.umn.edu Supplementary information: Supplementary data are available at Bioinformatics online. PMID:20031967
Deng, Zhaohong; Choi, Kup-Sze; Jiang, Yizhang; Wang, Shitong
2014-12-01
Inductive transfer learning has attracted increasing attention for the training of effective model in the target domain by leveraging the information in the source domain. However, most transfer learning methods are developed for a specific model, such as the commonly used support vector machine, which makes the methods applicable only to the adopted models. In this regard, the generalized hidden-mapping ridge regression (GHRR) method is introduced in order to train various types of classical intelligence models, including neural networks, fuzzy logical systems and kernel methods. Furthermore, the knowledge-leverage based transfer learning mechanism is integrated with GHRR to realize the inductive transfer learning method called transfer GHRR (TGHRR). Since the information from the induced knowledge is much clearer and more concise than that from the data in the source domain, it is more convenient to control and balance the similarity and difference of data distributions between the source and target domains. The proposed GHRR and TGHRR algorithms have been evaluated experimentally by performing regression and classification on synthetic and real world datasets. The results demonstrate that the performance of TGHRR is competitive with or even superior to existing state-of-the-art inductive transfer learning algorithms. PMID:24710838
NASA Astrophysics Data System (ADS)
Mathon, Bree R.; Ozbek, Metin M.; Pinder, George F.
2008-05-01
Traditionally the Cooper-Jacob equation is used to determine the transmissivity and the storage coefficient for an aquifer using pump test results. This model, however, is a simplified version of the actual subsurface and does not allow for analysis of the uncertainty that comes from a lack of knowledge about the heterogeneity of the environment under investigation. In this paper, a modified fuzzy least-squares regression (MFLSR) method is developed that uses imprecise pump test data to obtain fuzzy intercept and slope values which are then used in the Cooper-Jacob method. Fuzzy membership functions for the transmissivity and the storage coefficient are then calculated using the extension principle. The supports of the fuzzy membership functions incorporate the transmissivity and storage coefficient values that would be obtained using ordinary least-squares regression and the Cooper-Jacob method. The MFLSR coupled with the Cooper-Jacob method allows the analyst to ascertain the uncertainty that is inherent in the estimated parameters obtained using the simplified Cooper-Jacob method and data that are uncertain due to lack of knowledge regarding the heterogeneity of the aquifer.
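For orientation, here is the crisp (non-fuzzy) Cooper-Jacob straight-line analysis that MFLSR generalizes; the pump-test numbers are invented. In the fuzzy version, the slope and intercept below would be fuzzy numbers obtained by fuzzy least squares, and T and S would follow via the extension principle:

```python
import numpy as np

# Hypothetical pump-test data: drawdown s (m) at times t (min), discharge Q (m^3/s),
# observation-well radius r (m).
Q, r = 0.01, 30.0
t = np.array([10, 20, 40, 80, 160, 320], float)
s = np.array([0.55, 0.72, 0.90, 1.07, 1.25, 1.42])

# Cooper-Jacob: s = (2.3 Q / (4 pi T)) * log10(2.25 T t / (r^2 S)).
# A straight line in log10(t): the slope gives T, the zero-drawdown time gives S.
slope, intercept = np.polyfit(np.log10(t * 60.0), s, 1)   # work in seconds
T = 2.3 * Q / (4 * np.pi * slope)
t0 = 10 ** (-intercept / slope)                # time (s) where the fitted line hits s = 0
S = 2.25 * T * t0 / r ** 2
print(f"T = {T:.2e} m^2/s, S = {S:.2e}")
```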
ERIC Educational Resources Information Center
Coskuntuncel, Orkun
2013-01-01
The purpose of this study is two-fold; the first aim being to show the effect of outliers on the widely used least squares regression estimator in social sciences. The second aim is to compare the classical method of least squares with the robust M-estimator using the coefficient of determination (R²). For this purpose,…
ERIC Educational Resources Information Center
Gilstrap, Donald L.
2013-01-01
In addition to qualitative methods presented in chaos and complexity theories in educational research, this article addresses quantitative methods that may show potential for future research studies. Although much in the social and behavioral sciences literature has focused on computer simulations, this article explores current chaos and…
SNP Selection in Genome-Wide Association Studies via Penalized Support Vector Machine with MAX Test
Kim, Jinseog; Kim, Dennis (Dong Hwan); Jung, Sin-Ho
2013-01-01
One of the main objectives of a genome-wide association study (GWAS) is to develop a prediction model for a binary clinical outcome using single-nucleotide polymorphisms (SNPs), which can be used for diagnostic and prognostic purposes and for better understanding of the relationship between the disease and SNPs. Penalized support vector machine (SVM) methods have been widely used toward this end. However, since investigators often ignore the genetic models of SNPs, a final model can lose efficiency in predicting the clinical outcome. In order to overcome this problem, we propose a two-stage method such that the genetic model of each SNP is identified using the MAX test and then a prediction model is fitted using a penalized SVM method. We apply the proposed method to various penalized SVMs and compare the performance of SVMs using various penalty functions. The results from simulations and real GWAS data analysis show that the proposed method performs better than prediction methods ignoring the genetic models in terms of prediction power and selectivity. PMID:24174989
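A sketch of the second stage only: an L1-penalized SVM over genotype-coded SNPs, with synthetic data. The first stage (recoding each SNP to its best-fitting genetic model via the MAX test) is omitted here:

```python
import numpy as np
from sklearn.svm import LinearSVC

rng = np.random.default_rng(8)
n, p = 400, 100
G = rng.integers(0, 3, size=(n, p)).astype(float)      # SNP genotypes coded 0/1/2
y = (G[:, 0] - G[:, 1] + rng.normal(size=n) > 0).astype(int)

# L1-penalized linear SVM (squared hinge loss) induces SNP selection.
svm = LinearSVC(penalty="l1", dual=False, C=0.1, max_iter=10000).fit(G, y)
print("SNPs selected by the L1 penalty:", np.flatnonzero(svm.coef_[0]))
```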
NASA Astrophysics Data System (ADS)
Borodachev, S. M.
2016-06-01
A simple derivation of the recursive least squares (RLS) method equations is given as a special case of Kalman filter estimation of a constant system state under changing observation conditions. A numerical example illustrates the application of RLS to the multicollinearity problem.
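A compact sketch of the RLS recursion viewed exactly this way: a Kalman filter for a constant state, here with an optional exponential forgetting factor. The collinear regressor mirrors the multicollinearity setting mentioned above; all numbers are synthetic:

```python
import numpy as np

def rls(X, y, lam=0.99, delta=100.0):
    """Recursive least squares = Kalman filter for a constant state theta
    observed through y_t = x_t' theta + noise (lam = forgetting factor)."""
    p = X.shape[1]
    theta = np.zeros(p)
    P = delta * np.eye(p)                      # state covariance
    for x, yt in zip(X, y):
        k = P @ x / (lam + x @ P @ x)          # Kalman gain
        theta += k * (yt - x @ theta)          # innovation update
        P = (P - np.outer(k, x) @ P) / lam     # covariance update
    return theta

rng = np.random.default_rng(9)
X = rng.normal(size=(1000, 3))
X[:, 2] = X[:, 1] + 0.05 * rng.normal(size=1000)   # nearly collinear columns
y = X @ np.array([1.0, 2.0, -1.0]) + 0.1 * rng.normal(size=1000)
print("RLS estimate:", np.round(rls(X, y), 3))
```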
Huang, Lei
2015-01-01
To solve the problem in which the conventional ARMA modeling methods for gyro random noise require a large number of samples and converge slowly, an ARMA modeling method using a robust Kalman filtering is developed. The ARMA model parameters are employed as state arguments. Unknown time-varying estimators of observation noise are used to achieve the estimated mean and variance of the observation noise. Using the robust Kalman filtering, the ARMA model parameters are estimated accurately. The developed ARMA modeling method has the advantages of a rapid convergence and high accuracy. Thus, the required sample size is reduced. It can be applied to modeling applications for gyro random noise in which a fast and accurate ARMA modeling method is required. PMID:26437409
Boundary integral equation method calculations of surface regression effects in flame spreading
NASA Technical Reports Server (NTRS)
Altenkirch, R. A.; Rezayat, M.; Eichhorn, R.; Rizzo, F. J.
1982-01-01
A solid-phase conduction problem that is a modified version of one that has been treated previously in the literature and is applicable to flame spreading over a pyrolyzing fuel is solved using a boundary integral equation (BIE) method. Results are compared to surface temperature measurements that can be found in the literature. In addition, the heat conducted through the solid forward of the flame, the heat transfer responsible for sustaining the flame, is also computed in terms of the Peclet number based on a heated layer depth using the BIE method and approximate methods based on asymptotic expansions. Agreement between computed and experimental results is quite good as is agreement between the BIE and the approximate results.
Technology Transfer Automated Retrieval System (TEKTRAN)
The beard testing method for measuring cotton fiber length is based on the fibrogram theory. However, in the instrumental implementations, the engineering complexity alters the original fiber length distribution observed by the instrument. This causes challenges in obtaining the entire original le...
Using a Linear Regression Method to Detect Outliers in IRT Common Item Equating
ERIC Educational Resources Information Center
He, Yong; Cui, Zhongmin; Fang, Yu; Chen, Hanwei
2013-01-01
Common test items play an important role in equating alternate test forms under the common item nonequivalent groups design. When the item response theory (IRT) method is applied in equating, inconsistent item parameter estimates among common items can lead to large bias in equated scores. It is prudent to evaluate inconsistency in parameter…
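A minimal sketch of the kind of regression check the title describes, with invented item-difficulty estimates: regress new-form common-item parameter estimates on old-form estimates and flag large standardized residuals. The flagging rule used here (|z| > 2) is an assumption for illustration, not necessarily the authors' criterion:

```python
import numpy as np

# Hypothetical common-item difficulty estimates from two test forms.
b_old = np.array([-1.2, -0.8, -0.3, 0.0, 0.4, 0.7, 1.1, 1.5])
b_new = np.array([-1.1, -0.7, -0.2, 0.1, 0.5, 1.6, 1.2, 1.6])   # item 5 is aberrant

slope, intercept = np.polyfit(b_old, b_new, 1)
resid = b_new - (slope * b_old + intercept)
z = resid / resid.std(ddof=2)                  # standardize (two fitted parameters)
print("flagged common items:", np.flatnonzero(np.abs(z) > 2.0))
```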
NASA Astrophysics Data System (ADS)
Salonen, J. Sakari; Luoto, Miska; Alenius, Teija; Heikkilä, Maija; Seppä, Heikki; Telford, Richard J.; Birks, H. John B.
2014-03-01
We test and analyse a new calibration method, boosted regression trees (BRTs), in palaeoclimatic reconstructions based on fossil pollen assemblages. We apply BRTs to multiple Holocene and Lateglacial pollen sequences from northern Europe, and compare their performance with two commonly-used calibration methods: weighted averaging regression (WA) and the modern-analogue technique (MAT). Using these calibration methods and fossil pollen data, we present synthetic reconstructions of Holocene summer temperature, winter temperature, and water balance changes in northern Europe. Highly consistent trends are found for summer temperature, with a distinct Holocene thermal maximum at ca 8000-4000 cal. a BP, with a mean summer (JJA) temperature anomaly of ca +0.7 °C at 6 ka compared to 0.5 ka. We were unable to reliably reconstruct winter temperature or water balance, due to the confounding effects of summer temperature and the large between-reconstruction variability. We find BRTs to be a promising tool for quantitative reconstructions from palaeoenvironmental proxy data. BRTs show good performance in cross-validations compared with WA and MAT, can model a variety of taxon response types, find relevant predictors and incorporate interactions between predictors, and show some robustness with non-analogue fossil assemblages.
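A schematic of a BRT transfer function in the calibration sense used here, with synthetic "pollen percentages" and a nonlinear temperature response with an interaction; sklearn's gradient boosting stands in for the authors' BRT implementation:

```python
import numpy as np
from sklearn.ensemble import GradientBoostingRegressor
from sklearn.model_selection import cross_val_score

rng = np.random.default_rng(10)
n, n_taxa = 400, 25
pollen = rng.dirichlet(np.ones(n_taxa), size=n)        # stand-in pollen proportions
temp = (14 + 40 * pollen[:, 0] - 25 * pollen[:, 1]
        + 30 * pollen[:, 0] * pollen[:, 2]             # interaction between taxa
        + rng.normal(0, 0.5, n))

brt = GradientBoostingRegressor(n_estimators=500, learning_rate=0.05,
                                max_depth=3, subsample=0.7, random_state=0)
r2 = cross_val_score(brt, pollen, temp, cv=5, scoring="r2").mean()
print(f"cross-validated R^2 of the BRT transfer function: {r2:.3f}")
```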
NASA Astrophysics Data System (ADS)
Dogulu, N.; López López, P.; Solomatine, D. P.; Weerts, A. H.; Shrestha, D. L.
2015-07-01
In operational hydrology, estimation of the predictive uncertainty of hydrological models used for flood modelling is essential for risk-based decision making for flood warning and emergency management. In the literature, there exists a variety of methods analysing and predicting uncertainty. However, studies devoted to comparing the performance of the methods in predicting uncertainty are limited. This paper focuses on the methods predicting model residual uncertainty that differ in methodological complexity: quantile regression (QR) and UNcertainty Estimation based on local Errors and Clustering (UNEEC). The comparison of the methods is aimed at investigating how well a simpler method using fewer input data performs over a more complex method with more predictors. We test these two methods on several catchments from the UK that vary in hydrological characteristics and the models used. Special attention is given to the methods' performance under different hydrological conditions. Furthermore, normality of model residuals in data clusters (identified by UNEEC) is analysed. It is found that basin lag time and forecast lead time have a large impact on the quantification of uncertainty and the presence of normality in model residuals' distribution. In general, it can be said that both methods give similar results. At the same time, it is also shown that the UNEEC method provides better performance than QR for small catchments with the changing hydrological dynamics, i.e. rapid response catchments. It is recommended that more case studies of catchments of distinct hydrologic behaviour, with diverse climatic conditions, and having various hydrological features, be considered.
Tiedeman, C.R.; Kernodle, J.M.; McAda, D.P.
1998-01-01
This report documents the application of nonlinear-regression methods to a numerical model of ground-water flow in the Albuquerque Basin, New Mexico. In the Albuquerque Basin, ground water is the primary source for most water uses. Ground-water withdrawal has steadily increased since the 1940's, resulting in large declines in water levels in the Albuquerque area. A ground-water flow model was developed in 1994 and revised and updated in 1995 for the purpose of managing basin ground- water resources. In the work presented here, nonlinear-regression methods were applied to a modified version of the previous flow model. Goals of this work were to use regression methods to calibrate the model with each of six different configurations of the basin subsurface and to assess and compare optimal parameter estimates, model fit, and model error among the resulting calibrations. The Albuquerque Basin is one in a series of north trending structural basins within the Rio Grande Rift, a region of Cenozoic crustal extension. Mountains, uplifts, and fault zones bound the basin, and rock units within the basin include pre-Santa Fe Group deposits, Tertiary Santa Fe Group basin fill, and post-Santa Fe Group volcanics and sediments. The Santa Fe Group is greater than 14,000 feet (ft) thick in the central part of the basin. During deposition of the Santa Fe Group, crustal extension resulted in development of north trending normal faults with vertical displacements of as much as 30,000 ft. Ground-water flow in the Albuquerque Basin occurs primarily in the Santa Fe Group and post-Santa Fe Group deposits. Water flows between the ground-water system and surface-water bodies in the inner valley of the basin, where the Rio Grande, a network of interconnected canals and drains, and Cochiti Reservoir are located. Recharge to the ground-water flow system occurs as infiltration of precipitation along mountain fronts and infiltration of stream water along tributaries to the Rio Grande; subsurface
Stojić, Andreja; Maletić, Dimitrije; Stanišić Stojić, Svetlana; Mijić, Zoran; Šoštarić, Andrej
2015-07-15
In this study, advanced multivariate methods were applied for VOC source apportionment and subsequent short-term forecast of industrial- and vehicle exhaust-related contributions in the Belgrade urban area (Serbia). The VOC concentrations were measured using PTR-MS, together with inorganic gaseous pollutants (NOx, NO, NO2, SO2, and CO), PM10, and meteorological parameters. US EPA Positive Matrix Factorization and Unmix receptor models were applied to the obtained dataset, both resolving six source profiles. For the purpose of forecasting industrial- and vehicle exhaust-related source contributions, different multivariate methods were employed in two separate cases, relying on meteorological data alone, and on meteorological data together with concentrations of inorganic gaseous pollutants, respectively. The results indicate that Boosted Decision Trees and Multi-Layer Perceptrons were the best performing methods. According to the results, forecasting accuracy was high (lowest relative error of only 6%), in particular when the forecast was based on both meteorological parameters and concentrations of inorganic gaseous pollutants. PMID:25828408
NASA Astrophysics Data System (ADS)
Qu, Yonghua; Jiao, Siong; Lin, Xudong
2008-10-01
Hetao Irrigation District, located in Inner Mongolia, is one of the three largest irrigated areas in China. In this irrigated agricultural region, much effort has been put into irrigation rather than drainage, and as a result much of the salt that is normally dissolved in water has been deposited in the surface soil. Soil salinity has therefore become a chief cause of land degradation in the district. Remote sensing technology is an efficient way to map salinity at the regional scale. In remote sensing, the soil spectrum is one of the most important indicators of the status of soil salinity. In past decades, many efforts have been made to reveal the spectral characteristics of salinized soil, for example with traditional statistical regression methods. However, when hyper-spectral reflectance data are considered, traditional regression methods cannot handle such data, because hyper-spectral data typically contain a very large number of spectral bands. In this paper, a partial least squares regression (PLSR) model was established based on a statistical analysis of soil salinity and hyper-spectral reflectance. Field soil samples were collected in the Hetao irrigation region from the end of July to the beginning of August. Independent validation using data not included in the calibration model reveals that the proposed model can predict the main soil components, such as the content of total ions (S%) and pH, with determination coefficients (R2) of 0.728 and 0.715, respectively. The ratios of prediction to deviation (RPD) of the above predicted values are larger than 1.6, which indicates that the calibrated PLSR model can be used as a tool to retrieve soil salinity with accurate results. When the PLSR model's regression coefficients were aggregated according to the
Yang, Jianli; Bai, Yang; Li, Guojun; Liu, Ming; Liu, Xiuling
2015-01-01
Premature ventricular contraction (PVC) is one of the most serious arrhythmias. Without early diagnosis and proper treatment, PVC can result in significant complications. In this paper, a novel feature extraction method based on a sparse auto-encoder (SAE) and a softmax regression (SR) classifier was used to differentiate PVCs from other common non-PVC rhythms, including normal sinus (N), left bundle branch block (LBBB), right bundle branch block (RBBB), atrial premature contraction (APC), and paced beat (PB) rhythms. The proposed method was analyzed using 40 ECG records obtained from the MIT-BIH Arrhythmia Database. The proposed method exhibited an overall accuracy of 99.4%, with a PVC recognition sensitivity and positive predictability of 97.9% and 91.8%, respectively. PMID:26405919
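A sketch of the final classification stage only: softmax (multinomial logistic) regression over a feature vector, with random synthetic features standing in for the sparse auto-encoder codes the paper actually learns from ECG beats:

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score

rng = np.random.default_rng(11)
n, k = 1200, 30                        # k = size of the learned feature vector
classes = ["N", "LBBB", "RBBB", "APC", "PB", "PVC"]
y = rng.integers(0, len(classes), size=n)
means = 2 * rng.normal(size=(len(classes), k))   # synthetic class-specific codes
X = means[y] + rng.normal(size=(n, k))

Xtr, Xte, ytr, yte = train_test_split(X, y, test_size=0.3, random_state=0)
sr = LogisticRegression(max_iter=2000).fit(Xtr, ytr)   # multinomial (softmax) fit
print("beat-type accuracy:", round(accuracy_score(yte, sr.predict(Xte)), 3))
```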
NASA Astrophysics Data System (ADS)
Dogulu, N.; López López, P.; Solomatine, D. P.; Weerts, A. H.; Shrestha, D. L.
2014-09-01
In operational hydrology, estimation of the predictive uncertainty of hydrological models used for flood modelling is essential for risk-based decision making for flood warning and emergency management. In the literature, there exists a variety of methods analyzing and predicting uncertainty. However, case studies comparing the performance of these methods, most particularly predictive uncertainty methods, are limited. This paper focuses on two predictive uncertainty methods that differ in their methodological complexity: quantile regression (QR) and UNcertainty Estimation based on local Errors and Clustering (UNEEC), aiming at identifying possible advantages and disadvantages of these methods (both estimating residual uncertainty) based on their comparative performance. We test these two methods on several catchments (from the UK) that vary in their hydrological characteristics and models. Special attention is given to the errors for high flow/water level conditions. Furthermore, normality of model residuals is discussed in view of the clustering approach employed within the framework of the UNEEC method. It is found that basin lag time and forecast lead time have a great impact on the quantification of uncertainty (in the form of two quantiles) and the achievement of normality in the model residuals' distribution. In general, uncertainty analysis results from different case studies indicate that both methods give similar results. However, it is also shown that the UNEEC method provides better performance than QR for small catchments with changing hydrological dynamics, i.e. rapid response catchments. We recommend that more case studies of catchments from regions of distinct hydrologic behaviour, with diverse climatic conditions, and having various hydrological features be tested.
Lin, Wei; Feng, Rui; Li, Hongzhe
2014-01-01
In genetical genomics studies, it is important to jointly analyze gene expression data and genetic variants in exploring their associations with complex traits, where the dimensionality of gene expressions and genetic variants can both be much larger than the sample size. Motivated by such modern applications, we consider the problem of variable selection and estimation in high-dimensional sparse instrumental variables models. To overcome the difficulty of high dimensionality and unknown optimal instruments, we propose a two-stage regularization framework for identifying and estimating important covariate effects while selecting and estimating optimal instruments. The methodology extends the classical two-stage least squares estimator to high dimensions by exploiting sparsity using sparsity-inducing penalty functions in both stages. The resulting procedure is efficiently implemented by coordinate descent optimization. For the representative L1 regularization and a class of concave regularization methods, we establish estimation, prediction, and model selection properties of the two-stage regularized estimators in the high-dimensional setting where the numbers of covariates and instruments are both allowed to grow exponentially with the sample size. The practical performance of the proposed method is evaluated by simulation studies and its usefulness is illustrated by an analysis of mouse obesity data. Supplementary materials for this article are available online. PMID:26392642
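A toy sketch of the two-stage idea with a sparse first stage; the data-generating process is invented, and the second stage here has a single covariate so plain least squares suffices (with many covariates, that stage would also carry a sparsity penalty, as in the paper):

```python
import numpy as np
from sklearn.linear_model import LassoCV

rng = np.random.default_rng(12)
n, p_z = 300, 80
Z = rng.normal(size=(n, p_z))                 # many candidate instruments (e.g. SNPs)
u = rng.normal(size=n)                        # unobserved confounder
x = Z[:, :3] @ np.array([1.0, 0.8, -0.6]) + u + rng.normal(size=n)   # endogenous covariate
y = 2.0 * x + 2.0 * u + rng.normal(size=n)    # naive OLS of y on x is biased upward

# Stage 1: sparse first-stage regression selects instruments and predicts x.
x_hat = LassoCV(cv=5).fit(Z, x).predict(Z)
# Stage 2: regress y on the instrumented x_hat; the confounder drops out
# because x_hat depends on Z only.
beta = np.polyfit(x_hat, y, 1)[0]
print(f"two-stage estimate of the causal effect (truth 2.0): {beta:.2f}")
```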
Shrinkage Estimation of Varying Covariate Effects Based On Quantile Regression
Peng, Limin; Xu, Jinfeng; Kutner, Nancy
2013-01-01
Varying covariate effects often manifest meaningful heterogeneity in covariate-response associations. In this paper, we adopt a quantile regression model that assumes linearity at a continuous range of quantile levels as a tool to explore such data dynamics. The consideration of potential non-constancy of covariate effects necessitates a new perspective for variable selection, which, under the assumed quantile regression model, is to retain variables that have effects on all quantiles of interest as well as those that influence only part of quantiles considered. Current work on l1-penalized quantile regression either does not concern varying covariate effects or may not produce consistent variable selection in the presence of covariates with partial effects, a practical scenario of interest. In this work, we propose a shrinkage approach by adopting a novel uniform adaptive LASSO penalty. The new approach enjoys easy implementation without requiring smoothing. Moreover, it can consistently identify the true model (uniformly across quantiles) and achieve the oracle estimation efficiency. We further extend the proposed shrinkage method to the case where responses are subject to random right censoring. Numerical studies confirm the theoretical results and support the utility of our proposals. PMID:25332515
Strong, Mark; Oakley, Jeremy E; Brennan, Alan; Breeze, Penny
2015-07-01
Health economic decision-analytic models are used to estimate the expected net benefits of competing decision options. The true values of the input parameters of such models are rarely known with certainty, and it is often useful to quantify the value to the decision maker of reducing uncertainty through collecting new data. In the context of a particular decision problem, the value of a proposed research design can be quantified by its expected value of sample information (EVSI). EVSI is commonly estimated via a 2-level Monte Carlo procedure in which plausible data sets are generated in an outer loop, and then, conditional on these, the parameters of the decision model are updated via Bayes rule and sampled in an inner loop. At each iteration of the inner loop, the decision model is evaluated. This is computationally demanding and may be difficult if the posterior distribution of the model parameters conditional on sampled data is hard to sample from. We describe a fast nonparametric regression-based method for estimating per-patient EVSI that requires only the probabilistic sensitivity analysis sample (i.e., the set of samples drawn from the joint distribution of the parameters and the corresponding net benefits). The method avoids the need to sample from the posterior distributions of the parameters and avoids the need to rerun the model. The only requirement is that sample data sets can be generated. The method is applicable with a model of any complexity and with any specification of model parameter distribution. We demonstrate in a case study the superior efficiency of the regression method over the 2-level Monte Carlo method. PMID:25810269
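A toy sketch of the regression-based idea, using only a probabilistic-sensitivity-analysis sample and simulated study summaries. The decision model, the monetary scaling, and the use of gradient boosting as the smoother are invented stand-ins (the paper's nonparametric regression need not be boosting); the structure is EVSI = E[max over options of the fitted conditional mean NB] minus max of the unconditional mean NBs:

```python
import numpy as np
from sklearn.ensemble import GradientBoostingRegressor

rng = np.random.default_rng(13)
S = 5000
theta = rng.normal(0.1, 0.2, S)                  # uncertain treatment effect (PSA draws)
nb = np.column_stack([np.zeros(S),               # NB of "no treatment"
                      10_000 * theta - 800])     # NB of "treat" (hypothetical scaling)

# Proposed study of n patients: the sample mean effect is the dataset summary.
n_study = 50
summary = rng.normal(theta, 1.0 / np.sqrt(n_study))

# Regress each option's NB on the summary; fitted values estimate the
# posterior mean NB conditional on the (future) data, with no model reruns.
g = np.column_stack([
    GradientBoostingRegressor(random_state=0)
        .fit(summary[:, None], nb[:, d]).predict(summary[:, None])
    for d in range(nb.shape[1])
])
evsi = g.max(axis=1).mean() - nb.mean(axis=0).max()
print(f"estimated EVSI: {evsi:.0f} (same monetary units as NB)")
```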
NASA Astrophysics Data System (ADS)
Liu, Jiaqi; Han, Jing; Zhang, Yi; Bai, Lianfa
2015-10-01
The locally adaptive regression kernels model can describe the edge shapes of images accurately and their overall graphic trends, but it does not consider color information, even though color is an important element of an image. Therefore, we present a novel method of target recognition based on a 3-D-color-space locally adaptive regression kernels model. Rather than treating color as supplementary information, this method calculates local similarity features directly from the 3-D data of the color image. The proposed method uses a few examples of an object as a query to detect generic objects with incompact, complex and changeable shapes. Our method involves three phases: First, novel color-space descriptors, which measure the likeness of a voxel to its surroundings, are calculated from the RGB color space of the query image. Salient features that include spatial-dimensional and color-dimensional information are extracted from these descriptors and simplified by principal components analysis (PCA) to construct a non-similar local structure feature set of the object class. Second, we compare the salient features with analogous features from the target image, using a matrix generalization of the cosine similarity measure. The similar structures in the target image are then obtained using local similarity structure statistical matching. Finally, we use non-maxima suppression in the similarity image to extract the object position and mark the object in the test image. Experimental results demonstrate that our approach is effective and accurate in improving the ability to identify targets.
NASA Astrophysics Data System (ADS)
Bell, A. L.; Moore, J. N.; Greenwood, M. C.
2007-12-01
The Flathead River in Northwestern Montana drains the relatively pristine, high-mountain watersheds of Glacier-Waterton national parks and large wilderness areas, making it an excellent test-bed for hydrologic response to climate change. Flows in the North Fork and Middle Fork of the Flathead River are relatively unmodified by humans, whereas the South Fork has a large hydroelectric reservoir (Hungry Horse) in the lower end of the basin. USGS stream gage data for the North, Middle and South forks from 1940 to 2006 were analyzed for significant trends in the timing of quantiles of flow to examine climate forcing vs. direct modification of flow from the dam. The trends in timing were analyzed for climate change influences using the PRISM model output for 1940 to 2006 for the respective basin. The analysis of trends in timing employed two linear regression methods, typical least squares estimation and robust estimation using weighted least squares. Least squares estimation is the standard method employed when performing regression analysis. The power of this method is sensitive to the violation of the assumptions of normally distributed errors with constant variance (homoscedasticity). Considering that violations of these assumptions are common in hydrologic data, robust estimation was used to preserve the desired statistical power because it is not significantly affected by non-normality or heteroscedasticity. Least squares estimated trends that were found to be significant, using a 10% significance level, were typically not significant using a robust estimation method. This could have implications for interpreting the meaning of significant trends found using the least squares estimator. Utilizing robust estimation methods for analyzing hydrologic data may allow investigators to more accurately summarize any trends.
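A small illustration of the comparison, with a synthetic timing-of-flow series containing heavy-tailed noise and a few outlier years; statsmodels' RLM (Huber M-estimation, an iteratively reweighted least squares scheme) stands in for the weighted least squares approach described:

```python
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(14)
years = np.arange(1940, 2007)
# Hypothetical day-of-year of a flow quantile: trend + heavy-tailed noise + outliers.
doy = 160 - 0.15 * (years - 1940) + 4 * rng.standard_t(df=3, size=years.size)
doy[[5, 30, 55]] += [25, -30, 28]

X = sm.add_constant(years - years.mean())
ols = sm.OLS(doy, X).fit()
rob = sm.RLM(doy, X, M=sm.robust.norms.HuberT()).fit()
print(f"OLS trend:    {ols.params[1]:+.3f} d/yr (p = {ols.pvalues[1]:.3f})")
print(f"robust trend: {rob.params[1]:+.3f} d/yr (z = {rob.tvalues[1]:.2f})")
```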
NASA Astrophysics Data System (ADS)
Zobitz, J. M.; Burns, S. P.; Ogée, J.; Reichstein, M.; Bowling, D. R.
2007-09-01
Separation of the net ecosystem exchange of CO2 (F) into its component fluxes of net photosynthesis (FA) and nonfoliar respiration (FR) is important in understanding the physical and environmental controls on these fluxes, and how these fluxes may respond to environmental change. In this paper, we evaluate a partitioning method based on a combination of stable isotopes of CO2 and Bayesian optimization in the context of partitioning methods based on regressions with environmental variables. We combined high-resolution measurements of stable carbon isotopes of CO2, ecosystem fluxes, and meteorological variables with a Bayesian parameter optimization approach to estimate FA and FR in a subalpine forest in Colorado, United States, over the course of 104 days during summer 2003. Results were generally in agreement with the independent environmental regression methods of Reichstein et al. (2005a) and Yi et al. (2004). Half-hourly posterior parameter estimates of FA and FR derived from the Bayesian/isotopic method showed a strong diurnal pattern in both, consistent with established gross photosynthesis (GEE) and total ecosystem respiration (TER) relationships. Isotope-derived FA was functionally dependent on light, but FR exhibited the expected temperature dependence only when the prior estimates for FR were temperature-based. Examination of the posterior correlation matrix revealed that the available data were insufficient to independently resolve all the Bayesian-estimated parameters in our model. This could be due to a small isotopic disequilibrium between FA and FR, poor characterization of whole-canopy photosynthetic discrimination, or the isotopic flux (isoflux, analogous to net ecosystem exchange of 13CO2). The positive sign of the disequilibrium indicates that FA was more enriched in 13C than FR. Possible reasons for this are discussed in the context of recent literature.
Numerical modeling of flexible insect wings using volume penalization
NASA Astrophysics Data System (ADS)
Engels, Thomas; Kolomenskiy, Dmitry; Schneider, Kai; Sesterhenn, Joern
2012-11-01
We consider the effects of chordwise flexibility on the aerodynamic performance of insect flapping wings. We developed a numerical method for modeling viscous fluid flows past moving deformable foils, extending a previously reported model for flows past moving rigid wings (J Comput Phys 228, 2009). The two-dimensional Navier-Stokes equations are solved using a Fourier pseudo-spectral method with the no-slip boundary conditions imposed by the volume penalization method. The deformable wing section is modeled using a non-linear beam equation. We performed numerical simulations of heaving flexible plates. The results showed that the optimal stroke frequency, which maximizes the mean thrust, is lower than the resonant frequency, in agreement with the experiments by Ramananarivo et al. (PNAS 108(15), 2011). The oscillatory part of the force only increases in amplitude when the frequency increases, and at the optimal frequency it is about 3 times larger than the mean force. We also study aerodynamic interactions between two heaving flexible foils, a flow configuration that corresponds to the wings of dragonflies. We explore the effects of the phase difference and spacing between the fore- and hind-wing.
Zhang, L; Liu, X J
2016-01-01
With the rapid development of next-generation high-throughput sequencing technology, RNA-seq has become a standard and important technique for transcriptome analysis. For multi-sample RNA-seq data, the existing expression estimation methods usually deal with each single-RNA-seq sample, and ignore that the read distributions are consistent across multiple samples. In the current study, we propose a structured sparse regression method, SSRSeq, to estimate isoform expression using multi-sample RNA-seq data. SSRSeq uses a non-parameter model to capture the general tendency of non-uniformity read distribution for all genes across multiple samples. Additionally, our method adds a structured sparse regularization, which not only incorporates the sparse specificity between a gene and its corresponding isoform expression levels, but also reduces the effects of noisy reads, especially for lowly expressed genes and isoforms. Four real datasets were used to evaluate our method on isoform expression estimation. Compared with other popular methods, SSRSeq reduced the variance between multiple samples, and produced more accurate isoform expression estimations, and thus more meaningful biological interpretations. PMID:27323111
Swartz, Michael D.; Yu, Robert K.; Shete, Sanjay
2011-01-01
When modeling the risk of a disease, the very act of selecting the factors to include can heavily impact the results. This study compares the performance of several variable selection techniques applied to logistic regression. We performed realistic simulation studies to compare five methods of variable selection: (1) a confidence interval approach for significant coefficients (CI), (2) backward selection, (3) forward selection, (4) stepwise selection, and (5) Bayesian stochastic search variable selection (SSVS) using both informed and uninformed priors. We defined our simulated diseases mimicking odds ratios for cancer risk found in the literature for environmental factors, such as smoking; dietary risk factors, such as fiber; genetic risk factors, such as XPD; and interactions. We modeled the distribution of our covariates, including correlation, after the reported empirical distributions of these risk factors. We also used a null data set to calibrate the priors of the Bayesian method and evaluate its sensitivity. Of the standard methods (95% CI, backward, forward and stepwise selection), the CI approach resulted in the highest average percent of correct associations and the lowest average percent of incorrect associations. SSVS with an informed prior had a higher average percent of correct associations and a lower average percent of incorrect associations than did the CI approach. This study shows that Bayesian methods offer a way to use prior information to both increase power and decrease false-positive results when selecting factors to model complex disease risk. PMID:18937224
Orthogonal Regression: A Teaching Perspective
ERIC Educational Resources Information Center
Carr, James R.
2012-01-01
A well-known approach to linear least squares regression is that which involves minimizing the sum of squared orthogonal projections of data points onto the best fit line. This form of regression is known as orthogonal regression, and the linear model that it yields is known as the major axis. A similar method, reduced major axis regression, is…
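A small numeric illustration of the distinction drawn in this entry, with synthetic data: the major axis is the first principal component (equivalently, the leading right singular vector) of the centered data, which minimizes the sum of squared orthogonal distances, while OLS minimizes vertical distances and attenuates the slope when x carries noise:

```python
import numpy as np

rng = np.random.default_rng(15)
x_true = rng.normal(size=200)
y = 2.0 * x_true + rng.normal(scale=0.5, size=200)   # noise in y...
x = x_true + rng.normal(scale=0.5, size=200)         # ...and in x

Xc = np.column_stack([x - x.mean(), y - y.mean()])
_, _, Vt = np.linalg.svd(Xc, full_matrices=False)
slope_major_axis = Vt[0, 1] / Vt[0, 0]               # direction of the first PC
slope_ols = (Xc[:, 0] @ Xc[:, 1]) / (Xc[:, 0] @ Xc[:, 0])
print(f"major-axis slope: {slope_major_axis:.2f}; "
      f"OLS slope (attenuated): {slope_ols:.2f}")
```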
Li, J.; Gray, B.R.; Bates, D.M.
2008-01-01
Partitioning the variance of a response by design levels is challenging for binomial and other discrete outcomes. Goldstein (2003) proposed four definitions for variance partitioning coefficients (VPC) under a two-level logistic regression model. In this study, we explicitly derived formulae for the multi-level logistic regression model and subsequently studied the distributional properties of the calculated VPCs. Using simulations and a vegetation dataset, we demonstrated associations between different VPC definitions, the importance of methods for estimating VPCs (by comparing VPCs obtained using Laplace and penalized quasi-likelihood methods), and bivariate dependence between VPCs calculated at different levels. Such an empirical study lends immediate support to wider applications of VPC in scientific data analysis.
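One of Goldstein's four VPC definitions, the latent-variable version, has a simple closed form: on the latent scale, the level-1 residual of a logistic model is standard logistic with variance π²/3, so the VPC is the level-2 variance over the total. A tiny sketch:

```python
import numpy as np

def vpc_latent(sigma2_u):
    """Latent-variable VPC for a two-level logistic model:
    level-2 variance over total latent variance (level-1 part is pi^2/3)."""
    return sigma2_u / (sigma2_u + np.pi ** 2 / 3)

for s2 in (0.2, 0.5, 1.0, 2.0):
    print(f"sigma_u^2 = {s2:.1f} -> VPC = {vpc_latent(s2):.3f}")
```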
Maximum penalized likelihood estimation in semiparametric mark-recapture-recovery models.
Michelot, Théo; Langrock, Roland; Kneib, Thomas; King, Ruth
2016-01-01
We discuss the semiparametric modeling of mark-recapture-recovery data where the temporal and/or individual variation of model parameters is explained via covariates. Typically, in such analyses a fixed (or mixed) effects parametric model is specified for the relationship between the model parameters and the covariates of interest. In this paper, we discuss the modeling of the relationship via the use of penalized splines, to allow for considerably more flexible functional forms. Corresponding models can be fitted via numerical maximum penalized likelihood estimation, employing cross-validation to choose the smoothing parameters in a data-driven way. Our contribution builds on and extends the existing literature, providing a unified inferential framework for semiparametric mark-recapture-recovery models for open populations, where the interest typically lies in the estimation of survival probabilities. The approach is applied to two real datasets, corresponding to gray herons (Ardea cinerea), where we model the survival probability as a function of environmental condition (a time-varying global covariate), and Soay sheep (Ovis aries), where we model the survival probability as a function of individual weight (a time-varying individual-specific covariate). The proposed semiparametric approach is compared to a standard parametric (logistic) regression and new interesting underlying dynamics are observed in both cases. PMID:26289495
NASA Astrophysics Data System (ADS)
Thomas, G. E.; Bardeen, C.; Benze, S.
2014-12-01
Simulations of Polar Mesospheric Cloud (PMC) brightness and ice water content (IWC) are used to develop a simple robust method for IWC retrieval from UV satellite observations. We compare model simulations of IWC with retrievals from the UV Cloud Imaging and Particle Size (CIPS) experiment on board the satellite mission Aeronomy for Ice in the Mesosphere (AIM). This instrument remotely senses scattered brightness related to the vertically-integrated ice content. Simulations from the Whole Atmosphere Community Climate Model (WACCM), a chemistry climate model, are combined with a sectional microphysics model based on the Community Aerosol and Radiation Model for Atmospheres (CARMA). The model calculates high-resolution three-dimensional size distributions of ice particles. The internal variability is due to geographic and temporal variation of temperature and dynamics, water vapor, and meteoric dust. We examine all simulations from a single model day (we chose northern summer solstice), which contains several thousand model clouds. Accurate vertical integrations of the albedo and IWC are obtained. The ice size distributions are thus based on physical principles, rather than artificial analytic distributions that are often used in retrieval algorithms from observations. Treating the model clouds as noise-free data, we apply the CIPS algorithm to retrieve cloud particle size and IWC. The inherent "errors" in the retrievals are thus estimated. The linear dependence of IWC on albedo makes possible a method to derive IWC, called the Albedo-Ice regression method, or AIR. This method potentially unifies the variety of data from various UV experiments, with the advantages of (1) removing scattering-angle bias from cloud brightness measurements, (2) providing a physically-useful parameter (IWC), (3) deriving IWC even for faint clouds of small average particle sizes, and (4) estimating the statistical uncertainty as a random error, which bypasses the need to derive particle size.
Rousselot, J M; Peslin, R; Duvivier, C
1992-07-01
A potentially useful method to monitor respiratory mechanics in artificially ventilated patients consists of analyzing the relationship between tracheal pressure (P), lung volume (V), and gas flow (V̇) by multiple linear regression (MLR) using a suitable model. Contrary to other methods, it does not require any particular flow waveform and, therefore, may be used with any ventilator. This approach was evaluated in three neonates and seven young children admitted to an intensive care unit for respiratory disorders of various etiologies. P and V̇ were measured and digitized at a sampling rate of 40 Hz for periods of 20-48 s. After correction of P for the non-linear resistance of the endotracheal tube, the data were first analyzed with the usual linear monoalveolar model: P = P0 + E·V + R·V̇, where E and R are total respiratory elastance and resistance, and P0 is the static recoil pressure at end-expiration. A good fit of the model to the data was seen in five of ten children. P0, E, and R were reproducible within cycles and consistent with the patient's age and condition; the data obtained with two ventilatory modes were highly correlated. In the five instances in which the simple model did not fit the data well, the data were reanalyzed with more sophisticated models allowing for mechanical non-homogeneity or for non-linearity of R or E. While several models substantially improved the fit, physiologically meaningful results were only obtained when R was allowed to change with lung volume. We conclude that the MLR method is adequate to monitor respiratory mechanics, even when the usual model is inadequate. PMID:1437330
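A sketch of the fitting step with simulated ventilation signals: generate P from known P0, E, and R plus noise, then recover them by regressing P on [1, V, V̇]. The waveform, units, and noise level are invented:

```python
import numpy as np

rng = np.random.default_rng(16)
fs, T = 40.0, 20.0                               # 40 Hz sampling, 20 s record
t = np.arange(0, T, 1 / fs)
Vdot = 0.3 * np.sin(2 * np.pi * 0.5 * t)         # flow (L/s), 30 breaths/min
V = np.cumsum(Vdot) / fs                         # volume by integrating flow (L)
P0, E, R = 3.0, 25.0, 8.0                        # cmH2O, cmH2O/L, cmH2O/(L/s)
P = P0 + E * V + R * Vdot + 0.2 * rng.normal(size=t.size)

# Multiple linear regression of P on [1, V, Vdot] recovers P0, E, R.
A = np.column_stack([np.ones_like(t), V, Vdot])
p0_hat, E_hat, R_hat = np.linalg.lstsq(A, P, rcond=None)[0]
print(f"P0 = {p0_hat:.2f} cmH2O, E = {E_hat:.1f} cmH2O/L, R = {R_hat:.1f} cmH2O/(L/s)")
```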
Akita, Yasuyuki; Baldasano, Jose M; Beelen, Rob; Cirach, Marta; de Hoogh, Kees; Hoek, Gerard; Nieuwenhuijsen, Mark; Serre, Marc L; de Nazelle, Audrey
2014-04-15
In recognition that intraurban exposure gradients may be as large as between-city variations, recent air pollution epidemiologic studies have become increasingly interested in capturing within-city exposure gradients. In addition, because of the rapidly accumulating health data, recent studies also need to handle large study populations distributed over large geographic domains. Even though several modeling approaches have been introduced, a consistent modeling framework capturing within-city exposure variability and applicable to large geographic domains is still missing. To address these needs, we proposed a modeling framework based on the Bayesian Maximum Entropy method that integrates monitoring data and outputs from existing air quality models based on Land Use Regression (LUR) and Chemical Transport Models (CTM). The framework was applied to estimate the yearly average NO2 concentrations over the region of Catalunya in Spain. By jointly accounting for the global scale variability in the concentration from the output of CTM and the intraurban scale variability through LUR model output, the proposed framework outperformed more conventional approaches. PMID:24621302
Cao, M H; Adeola, O
2016-02-01
The energy values of poultry byproduct meal (PBM) and an animal-vegetable oil blend (A-V blend) were determined in 2 experiments with 288 broiler chickens from d 19 to 25 post hatching. The birds were fed a starter diet from d 0 to 19 post hatching. In each experiment, 144 birds were grouped by weight into 8 replicates of cages with 6 birds per cage. There were 3 diets in each experiment, consisting of one reference diet (RD) and 2 test diets (TD). The TD contained 2 levels of PBM (Exp. 1) or A-V blend (Exp. 2) that replaced the energy sources in the RD at 50 or 100 g/kg (Exp. 1) or 40 or 80 g/kg (Exp. 2), in such a way that the same ratios were maintained among energy ingredients across experimental diets. The ileal digestible energy (IDE), ME, and MEn of PBM and A-V blend were determined by the regression method. Dry matter contents of PBM and A-V blend were 984 and 999 g/kg; the gross energies were 5,284 and 9,604 kcal/kg of DM, respectively. Addition of PBM to the RD in Exp. 1 linearly decreased (P < 0.05) the ileal and total tract digestibilities and utilization of DM, energy, and nitrogen. In Exp. 2, addition of A-V blend to the RD linearly increased (P < 0.001) the ileal digestibilities and total tract utilization of DM, energy, and nitrogen, as well as IDE, ME, and MEn. Regressions of PBM-associated IDE, ME, or MEn intake in kcal against PBM intake were: IDE = 3,537x + 4.953, r² = 0.97; ME = 3,805x + 1.279, r² = 0.97; MEn = 3,278x + 0.164, r² = 0.90; and for the A-V blend: IDE = 10,616x + 7.350, r² = 0.96; ME = 10,121x + 0.447, r² = 0.99; MEn = 10,124x + 2.425, r² = 0.99. These data indicate that the respective IDE, ME, and MEn values (kcal/kg of DM) were 3,537, 3,805, and 3,278 for PBM, and 10,616, 10,121, and 10,124 for the A-V blend. PMID:26628339
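A toy illustration of the regression method, in which the slope of ingredient-associated energy intake regressed against ingredient intake estimates the ingredient's energy value; the intakes below are invented numbers, not the study's data:

```python
import numpy as np

# Hypothetical data: PBM intake (kg DM) and PBM-associated ME intake (kcal)
# per cage, mimicking the regression method (values are invented).
pbm_intake = np.array([0.00, 0.05, 0.05, 0.10, 0.10, 0.15])
me_intake = np.array([2.1, 192.0, 189.5, 382.4, 379.8, 571.9])

# The slope of the regression is the ME value of the ingredient (kcal/kg DM).
slope, intercept = np.polyfit(pbm_intake, me_intake, 1)
r2 = np.corrcoef(pbm_intake, me_intake)[0, 1] ** 2
print(f"ME = {slope:,.0f}x + {intercept:.3f}, r^2 = {r2:.2f}")
```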
Crime and Punishment: Are Copyright Violators Ever Penalized?
ERIC Educational Resources Information Center
Russell, Carrie
2004-01-01
Is there a Web site that keeps track of copyright infringers and fines? Some colleagues don't believe that copyright violators are ever penalized. This question was asked by a reader in a question-and-answer column of "School Library Journal". Carrie Russell is the American Library Association's copyright specialist, and she will answer selected…
Education--Penal Institutions: U. S. and Europe.
ERIC Educational Resources Information Center
Kerle, Ken
Penal systems of European countries vary in educational programs and humanizing efforts. A high percentage of Soviet prisoners, many incarcerated for ideological/religious beliefs, are confined to labor colonies. All inmates are obligated to learn a trade, one of the qualifications for release being evidence of some trade skill. Swedish…
Lange, Kenneth; Papp, Jeanette C.; Sinsheimer, Janet S.; Sobel, Eric M.
2014-01-01
Statistical genetics is undergoing the same transition to big data that all branches of applied statistics are experiencing. With the advent of inexpensive DNA sequencing, the transition is only accelerating. This brief review highlights some modern techniques with recent successes in statistical genetics. These include: (a) lasso penalized regression and association mapping, (b) ethnic admixture estimation, (c) matrix completion for genotype and sequence data, (d) the fused lasso and copy number variation, (e) haplotyping, (f) estimation of relatedness, (g) variance components models, and (h) rare variant testing. For more than a century, genetics has been both a driver and beneficiary of statistical theory and practice. This symbiotic relationship will persist for the foreseeable future. PMID:24955378
Multivariate Regression with Calibration
Liu, Han; Wang, Lie; Zhao, Tuo
2014-01-01
We propose a new method named calibrated multivariate regression (CMR) for fitting high dimensional multivariate regression models. Compared to existing methods, CMR calibrates the regularization for each regression task with respect to its noise level so that it is simultaneously tuning insensitive and achieves an improved finite-sample performance. Computationally, we develop an efficient smoothed proximal gradient algorithm which has a worst-case iteration complexity O(1/ε), where ε is a pre-specified numerical accuracy. Theoretically, we prove that CMR achieves the optimal rate of convergence in parameter estimation. We illustrate the usefulness of CMR by thorough numerical simulations and show that CMR consistently outperforms other high dimensional multivariate regression methods. We also apply CMR on a brain activity prediction problem and find that CMR is as competitive as the handcrafted model created by human experts. PMID:25620861
NASA Astrophysics Data System (ADS)
Vozinaki, Anthi Eirini K.; Karatzas, George P.; Sibetheros, Ioannis A.; Varouchakis, Emmanouil A.
2014-05-01
Damage curves are the most significant component of the flood loss estimation models. Their development is quite complex. Two types of damage curves exist, historical and synthetic curves. Historical curves are developed from historical loss data from actual flood events. However, due to the scarcity of historical data, synthetic damage curves can be alternatively developed. Synthetic curves rely on the analysis of expected damage under certain hypothetical flooding conditions. A synthetic approach was developed and presented in this work for the development of damage curves, which are subsequently used as the basic input to a flood loss estimation model. A questionnaire-based survey took place among practicing and research agronomists, in order to generate rural loss data based on the responders' loss estimates, for several flood condition scenarios. In addition, a similar questionnaire-based survey took place among building experts, i.e. civil engineers and architects, in order to generate loss data for the urban sector. By answering the questionnaire, the experts were in essence expressing their opinion on how damage to various crop types or building types is related to a range of values of flood inundation parameters, such as floodwater depth and velocity. However, the loss data compiled from the completed questionnaires were not sufficient for the construction of workable damage curves; to overcome this problem, a Weighted Monte Carlo method was implemented, in order to generate extra synthetic datasets with statistical properties identical to those of the questionnaire-based data. The data generated by the Weighted Monte Carlo method were processed via Logistic Regression techniques in order to develop accurate logistic damage curves for the rural and the urban sectors. A Python-based code was developed, which combines the Weighted Monte Carlo method and the Logistic Regression analysis into a single code (WMCLR Python code). Each WMCLR code execution
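A hedged sketch of the weighted Monte Carlo plus logistic regression workflow described above; the depths, damage labels, and expert weights are invented, and this is only a stand-in for the authors' WMCLR code, which is not reproduced here:

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(42)

# Hypothetical questionnaire data: floodwater depth (m) and whether the
# expert judged the asset as damaged (binary); all values invented.
depth = np.array([0.1, 0.2, 0.3, 0.5, 0.8, 1.0, 1.5, 2.0])
damaged = np.array([0, 0, 0, 1, 0, 1, 1, 1])
weights = np.array([1, 2, 2, 3, 1, 3, 2, 1], dtype=float)  # expert weights

# Weighted Monte Carlo step: resample the sparse data in proportion to the
# weights to build a larger synthetic data set with similar statistics.
idx = rng.choice(depth.size, size=500, p=weights / weights.sum())
depth_mc = depth[idx] + rng.normal(0, 0.05, 500)   # jitter within uncertainty
damaged_mc = damaged[idx]

# Logistic regression yields the damage curve P(damage | depth).
model = LogisticRegression().fit(depth_mc.reshape(-1, 1), damaged_mc)
for d in (0.25, 0.75, 1.25):
    p = model.predict_proba([[d]])[0, 1]
    print(f"depth {d:.2f} m -> damage probability {p:.2f}")
```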
Jen, Min-Hua; Bottle, Alex; Kirkwood, Graham; Johnston, Ron; Aylin, Paul
2011-09-01
We have previously described a system for monitoring a number of healthcare outcomes using case-mix adjustment models. It is desirable to automate the model fitting process in such a system if monitoring covers a large number of outcome measures or subgroup analyses. Our aim was to compare the performance of three different variable selection strategies: "manual", "automated" backward elimination and re-categorisation, and including all variables at once, irrespective of their apparent importance, with automated re-categorisation. Logistic regression models for predicting in-hospital mortality and emergency readmission within 28 days were fitted to an administrative database for 78 diagnosis groups and 126 procedures from 1996 to 2006 for National Health Service hospital trusts in England. The performance of models was assessed with Receiver Operating Characteristic (ROC) c statistics (measuring discrimination) and the Brier score (assessing average predictive accuracy). Overall, discrimination was similar for diagnoses and procedures and consistently better for mortality than for emergency readmission. Brier scores were generally low overall (indicating higher accuracy) and were lower for procedures than diagnoses, with a few exceptions for emergency readmission within 28 days. Among the three variable selection strategies, the automated procedure had similar performance to the manual method in almost all cases except low-risk groups with few outcome events. For the rapid generation of multiple case-mix models we suggest applying automated modelling to reduce the time required, in particular when examining different outcomes of large numbers of procedures and diseases in routinely collected administrative health data. PMID:21556848
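The two performance measures are easy to reproduce; a minimal sketch on simulated case-mix data (the predictors and coefficients are invented):

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_auc_score, brier_score_loss

rng = np.random.default_rng(1)

# Hypothetical case-mix data: age and an acuity score predicting mortality.
n = 2000
X = np.column_stack([rng.normal(70, 10, n), rng.normal(0, 1, n)])
logit = -4.0 + 0.04 * X[:, 0] + 0.8 * X[:, 1]
died = rng.random(n) < 1 / (1 + np.exp(-logit))

model = LogisticRegression().fit(X, died)
risk = model.predict_proba(X)[:, 1]

# c statistic (ROC AUC) measures discrimination; the Brier score measures
# average predictive accuracy (lower is better).
print(f"c statistic = {roc_auc_score(died, risk):.3f}")
print(f"Brier score = {brier_score_loss(died, risk):.3f}")
```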
Ferragina, A; de los Campos, G; Vazquez, A I; Cecchinato, A; Bittante, G
2015-11-01
The aim of this study was to assess the performance of Bayesian models commonly used for genomic selection to predict "difficult-to-predict" dairy traits, such as milk fatty acid (FA) content expressed as a percentage of total fatty acids, and technological properties, such as fresh cheese yield and protein recovery, using Fourier-transform infrared (FTIR) spectral data. Our main hypothesis was that Bayesian models that can estimate shrinkage and perform variable selection may improve our ability to predict FA traits and technological traits above and beyond what can be achieved using the current calibration models (e.g., partial least squares, PLS). To this end, we assessed a series of Bayesian methods and compared their prediction performance with that of PLS. The comparison between models was done using the same sets of data (i.e., same samples, same variability, same spectral treatment) for each trait. Data consisted of 1,264 individual milk samples collected from Brown Swiss cows for which gas chromatographic FA composition, milk coagulation properties, and cheese-yield traits were available. For each sample, 2 spectra in the infrared region from 5,011 to 925 cm⁻¹ were available and averaged before data analysis. Three Bayesian models (Bayesian ridge regression, or Bayes RR; Bayes A; and Bayes B) and 2 reference models (PLS and modified PLS, MPLS) were used to calibrate equations for each of the traits. The Bayesian models were implemented in the R package BGLR (http://cran.r-project.org/web/packages/BGLR/index.html), whereas the PLS and MPLS were those implemented in the WinISI II software (Infrasoft International LLC, State College, PA). Prediction accuracy was estimated for each trait and model using 25 replicates of a training-testing validation procedure. Compared with PLS, which is currently the most widely used calibration method, MPLS and the 3 Bayesian methods showed significantly greater prediction accuracy. Accuracy increased in moving from
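BGLR's Bayes A and Bayes B have no direct scikit-learn equivalent, so the sketch below uses Bayesian ridge regression as a stand-in and compares it with PLS on simulated spectra; the data, band positions, and effect sizes are all invented:

```python
import numpy as np
from sklearn.cross_decomposition import PLSRegression
from sklearn.linear_model import BayesianRidge
from sklearn.model_selection import cross_val_score

rng = np.random.default_rng(7)

# Hypothetical "spectra": 200 samples x 500 wavenumbers, with the trait
# driven by a few absorption bands (simulated, not FTIR data).
X = rng.normal(size=(200, 500))
beta = np.zeros(500)
beta[[50, 120, 340]] = [1.0, -0.7, 0.5]
y = X @ beta + rng.normal(0, 0.5, 200)

# Cross-validated comparison of a PLS calibration and a shrinkage model.
for name, model in [("PLS (10 LV)", PLSRegression(n_components=10)),
                    ("Bayesian ridge", BayesianRidge())]:
    r2 = cross_val_score(model, X, y, cv=5, scoring="r2").mean()
    print(f"{name}: mean CV R^2 = {r2:.3f}")
```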
A Small-Sample Choice of the Tuning Parameter in Ridge Regression
Boonstra, Philip S.; Mukherjee, Bhramar; Taylor, Jeremy M. G.
2015-01-01
We propose new approaches for choosing the shrinkage parameter in ridge regression, a penalized likelihood method for regularizing linear regression coefficients, when the number of observations is small relative to the number of parameters. Existing methods may lead to extreme choices of this parameter, which will either not shrink the coefficients enough or shrink them by too much. Within this “small-n, large-p” context, we suggest a correction to the common generalized cross-validation (GCV) method that preserves the asymptotic optimality of the original GCV. We also introduce the notion of a “hyperpenalty”, which shrinks the shrinkage parameter itself, and make a specific recommendation regarding the choice of hyperpenalty that empirically works well in a broad range of scenarios. A simple algorithm jointly estimates the shrinkage parameter and regression coefficients in the hyperpenalized likelihood. In a comprehensive simulation study of small-sample scenarios, our proposed approaches offer superior prediction over nine other existing methods. PMID:26985140
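The baseline GCV criterion that the authors correct can be computed directly from the SVD of the design matrix; a minimal sketch in the small-n, large-p setting (simulated data; this is plain GCV, not the proposed corrected GCV or hyperpenalty):

```python
import numpy as np

rng = np.random.default_rng(3)

# Small-n, large-p setting: 30 observations, 100 predictors (simulated).
n, p = 30, 100
X = rng.normal(size=(n, p))
beta = np.concatenate([rng.normal(0, 1, 10), np.zeros(p - 10)])
y = X @ beta + rng.normal(0, 1.0, n)

# Ridge fits and the GCV criterion via the SVD of X:
#   GCV(lam) = (RSS/n) / (1 - tr(H_lam)/n)^2,  tr(H_lam) = sum s_i^2/(s_i^2+lam)
U, s, Vt = np.linalg.svd(X, full_matrices=False)
Uty = U.T @ y

def gcv(lam):
    shrink = s**2 / (s**2 + lam)
    y_hat = U @ (shrink * Uty)          # ridge fitted values
    rss = np.sum((y - y_hat) ** 2)
    df = np.sum(shrink)                 # effective degrees of freedom
    return (rss / n) / (1 - df / n) ** 2

lams = np.logspace(-2, 4, 200)
best = lams[np.argmin([gcv(l) for l in lams])]
print(f"GCV-selected ridge penalty: lambda = {best:.2f}")
```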
Bolarinwa, O A; Adeola, O
2016-02-01
Direct or indirect methods can be used to determine the DE and ME of feed ingredients for pigs. In situations when only the indirect approach is suitable, the regression method presents a robust indirect approach. Three experiments were conducted to compare the direct and regression methods for determining the DE and ME values of barley, sorghum, and wheat for pigs. In each experiment, 24 barrows with an average initial BW of 31, 32, and 33 kg were assigned to 4 diets in a randomized complete block design. The 4 diets consisted of 969 g barley, sorghum, or wheat/kg plus minerals and vitamins for the direct method; a corn-soybean meal reference diet (RD); the RD + 300 g barley, sorghum, or wheat/kg; and the RD + 600 g barley, sorghum, or wheat/kg. The 3 corn-soybean meal diets were used for the regression method. Each diet was fed to 6 barrows in individual metabolism crates for a 5-d acclimation followed by a 5-d period of total but separate collection of feces and urine in each experiment. Graded substitution of barley or wheat, but not sorghum, into the RD linearly reduced (P < 0.05) dietary DE and ME. The direct method-derived DE and ME for barley were 3,669 and 3,593 kcal/kg DM, respectively. The regressions of barley contribution to DE and ME in kilocalories against the quantity of barley DMI in kilograms generated 3,746 kcal DE/kg DM and 3,647 kcal ME/kg DM. The DE and ME for sorghum by the direct method were 4,097 and 4,042 kcal/kg DM, respectively; the corresponding regression-derived estimates were 4,145 and 4,066 kcal/kg DM. Using the direct method, energy values for wheat were 3,953 kcal DE/kg DM and 3,889 kcal ME/kg DM. The regressions of wheat contribution to DE and ME in kilocalories against the quantity of wheat DMI in kilograms generated 3,960 kcal DE/kg DM and 3,874 kcal ME/kg DM. The DE and ME of barley using the direct method were not different (0.3 < P < 0.4) from those obtained using the regression method (3,669 vs. 3,746 and 3,593 vs. 3,647 kcal
Schmid, Matthias; Wickler, Florian; Maloney, Kelly O.; Mitchell, Richard; Fenske, Nora; Mayr, Andreas
2013-01-01
Regression analysis with a bounded outcome is a common problem in applied statistics. Typical examples include regression models for percentage outcomes and the analysis of ratings that are measured on a bounded scale. In this paper, we consider beta regression, which is a generalization of logit models to situations where the response is continuous on the interval (0,1). Consequently, beta regression is a convenient tool for analyzing percentage responses. The classical approach to fit a beta regression model is to use maximum likelihood estimation with subsequent AIC-based variable selection. As an alternative to this established - yet unstable - approach, we propose a new estimation technique called boosted beta regression. With boosted beta regression, estimation and variable selection can be carried out simultaneously in a highly efficient way. Additionally, both the mean and the variance of a percentage response can be modeled using flexible nonlinear covariate effects. As a consequence, the new method accounts for common problems such as overdispersion and non-binomial variance structures. PMID:23626706
Wang, Yuanjia
2011-01-01
Longitudinal data are routinely collected in biomedical research studies. A natural model describing longitudinal data decomposes an individual’s outcome as the sum of a population mean function and random subject-specific deviations. When parametric assumptions are too restrictive, methods modeling the population mean function and the random subject-specific functions nonparametrically are in demand. In some applications, it is desirable to estimate a covariance function of random subject-specific deviations. In this work, flexible yet computationally efficient methods are developed for a general class of semiparametric mixed effects models, where the functional forms of the population mean and the subject-specific curves are unspecified. We estimate nonparametric components of the model by penalized spline (P-spline, [1]), and reparametrize the random curve covariance function by a modified Cholesky decomposition [2] which allows for unconstrained estimation of a positive semidefinite matrix. To provide smooth estimates, we penalize roughness of fitted curves and derive closed form solutions in the maximization step of an EM algorithm. In addition, we present models and methods for longitudinal family data where subjects in a family are correlated and we decompose the covariance function into a subject-level source and observation-level source. We apply these methods to the multi-level Framingham Heart Study data to estimate age-specific heritability of systolic blood pressure (SBP) nonparametrically. PMID:21491474
Penalized Composite Quasi-Likelihood for Ultrahigh-Dimensional Variable Selection
Bradic, Jelena; Fan, Jianqing; Wang, Weiwei
2011-01-01
In high-dimensional model selection problems, penalized least-squares approaches have been extensively used. This paper addresses the question of both robustness and efficiency of penalized model selection methods, and proposes a data-driven weighted linear combination of convex loss functions, together with a weighted L1-penalty. It is completely data-adaptive and does not require prior knowledge of the error distribution. The weighted L1-penalty is used both to ensure the convexity of the penalty term and to ameliorate the bias caused by the L1-penalty. In the setting with dimensionality much larger than the sample size, we establish a strong oracle property of the proposed method that possesses both the model selection consistency and estimation efficiency for the true non-zero coefficients. As specific examples, we introduce a robust composite L1-L2 method and an optimal composite quantile method, and evaluate their performance in both simulated and real data examples. PMID:21589849
ERIC Educational Resources Information Center
Hick, Thomas L.; Irvine, David J.
To eliminate maturation as a factor in the pretest-posttest design, pretest scores can be converted to anticipate posttest scores using grade equivalent scores from standardized tests. This conversion, known as historical regression, assumes that without specific intervention, growth will continue at the rate (grade equivalents per year of…
ERIC Educational Resources Information Center
Kromrey, Jeffrey D.; Hines, Constance V.
1996-01-01
The accuracy of three analytical formulas for shrinkage estimation and four empirical techniques were investigated in a Monte Carlo study of the coefficient of cross-validity in multiple regression. Substantial statistical bias was evident for all techniques except the formula of M. W. Brown (1975) and multicross-validation. (SLD)
ERIC Educational Resources Information Center
Guler, Nese; Penfield, Randall D.
2009-01-01
In this study, we investigate the logistic regression (LR), Mantel-Haenszel (MH), and Breslow-Day (BD) procedures for the simultaneous detection of both uniform and nonuniform differential item functioning (DIF). A simulation study was used to assess and compare the Type I error rate and power of a combined decision rule (CDR), which assesses DIF…
NASA Astrophysics Data System (ADS)
Trigila, Alessandro; Iadanza, Carla; Esposito, Carlo; Scarascia-Mugnozza, Gabriele
2015-04-01
The first phase of the work aimed to identify the spatial relationships between the landslide locations and the 13 related factors by using the Frequency Ratio bivariate statistical method. The analysis was then carried out by adopting a multivariate statistical approach, according to the Logistic Regression technique and the Random Forests technique, which gave the best results in terms of AUC. The models were performed and evaluated with different sample sizes, also taking into account the temporal variation of input variables such as areas burned by wildfire. The most significant outcomes of this work are the relevant influence of the sample size on the model results and the strong importance of some environmental factors (e.g. land use and wildfires) for the identification of the depletion zones of extremely rapid shallow landslides.
NASA Astrophysics Data System (ADS)
Demir, Begüm; Bruzzone, Lorenzo
2012-11-01
This paper presents a novel active learning (AL) technique in the context of ɛ-insensitive support vector regression (SVR) to estimate biophysical parameters from remotely sensed images. The proposed AL method aims at selecting the most informative and representative unlabeled samples, i.e., those with maximum uncertainty, diversity, and density assessed according to the SVR estimation rule. This is achieved on the basis of two consecutive steps that rely on kernel k-means clustering. In the first step, the most uncertain unlabeled samples are selected by removing the most certain ones from a pool of unlabeled samples. In SVR problems, the most uncertain samples are located outside or on the boundary of the ɛ-tube of the SVR, as their target values have the lowest confidence of being correctly estimated. In order to select these samples, kernel k-means clustering is applied to all unlabeled samples together with the training samples that are not SVs (i.e., those inside the ɛ-tube; non-SVs). Then, clusters containing non-SVs are rejected, whereas the unlabeled samples contained in the remaining clusters are selected as the most uncertain samples. In the second step, samples located in high-density regions of the kernel space, and as diverse as possible from each other, are chosen among the uncertain samples. The density and diversity of the unlabeled samples are evaluated on the basis of their clusters' information. To this end, the density of each cluster is initially measured by the ratio of the number of samples in the cluster to the distance between its two furthest samples. Then, the highest-density clusters are chosen, and the medoid samples closest to the centers of the selected clusters are chosen as the most informative ones. Diversity is ensured by selecting only one sample from each selected cluster. Experiments applied to the estimation of single-tree parameters, i.e., tree stem volume and tree stem diameter, show the
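A rough sketch of the two-step selection under stated assumptions: scikit-learn has no kernel k-means, so ordinary k-means in a Nystroem-approximated RBF feature space stands in for it, and all data are simulated:

```python
import numpy as np
from sklearn.svm import SVR
from sklearn.cluster import KMeans
from sklearn.kernel_approximation import Nystroem

rng = np.random.default_rng(5)

# Simulated pool: a 1D biophysical parameter estimation problem.
X_pool = rng.uniform(0, 10, (300, 1))
y_pool = np.sin(X_pool).ravel() + rng.normal(0, 0.1, 300)

labeled = rng.choice(300, 20, replace=False)
svr = SVR(kernel="rbf", epsilon=0.1).fit(X_pool[labeled], y_pool[labeled])

# Step 1 (uncertainty): training samples strictly inside the eps-tube
# (non-SVs) mark "certain" regions; cluster them with the unlabeled pool
# and reject every cluster that contains a non-SV.
sv_mask = np.zeros(labeled.size, dtype=bool)
sv_mask[svr.support_] = True
non_sv = X_pool[labeled][~sv_mask]

unlabeled = np.setdiff1d(np.arange(300), labeled)
feats = Nystroem(gamma=1.0, n_components=50, random_state=0)
Z = feats.fit_transform(np.vstack([X_pool[unlabeled], non_sv]))
km = KMeans(n_clusters=10, n_init=10, random_state=0).fit(Z)
z_unl, z_nsv = km.labels_[: unlabeled.size], km.labels_[unlabeled.size:]
uncertain = unlabeled[~np.isin(z_unl, np.unique(z_nsv))]
if uncertain.size == 0:          # fallback if every cluster was rejected
    uncertain = unlabeled

# Step 2 (density/diversity): one medoid-like pick per cluster of the
# uncertain samples (sample closest to each cluster center).
Zu = feats.transform(X_pool[uncertain])
km2 = KMeans(n_clusters=min(5, uncertain.size), n_init=10,
             random_state=0).fit(Zu)
picks = [uncertain[np.argmin(((Zu - c) ** 2).sum(1))]
         for c in km2.cluster_centers_]
print("queried sample indices:", sorted(set(picks)))
```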
NASA Astrophysics Data System (ADS)
Espinoza-Ojeda, O. M.; Santoyo, E.
2016-08-01
A new practical method based on logarithmic transformation regressions was developed for the determination of static formation temperatures (SFTs) in geothermal, petroleum and permafrost bottomhole temperature (BHT) data sets. The new method involves the application of multiple linear and polynomial (from quadratic to eighth-order) regression models to BHT and log-transformed (Tln) shut-in times. Selection of the best regression models was carried out by using four statistical criteria: (i) the coefficient of determination as a fitting quality parameter; (ii) the sum of the normalized squared residuals; (iii) the absolute extrapolation, a dimensionless statistical parameter that enables the accuracy of each regression model to be evaluated through the extrapolation of the last temperature measured in the data set; and (iv) the deviation percentage between the measured and predicted BHT data. The best regression model was used for reproducing the thermal recovery process of the boreholes, and for the determination of the SFT. The original thermal recovery data (BHT and shut-in time) were used to demonstrate the new method's prediction efficiency. The prediction capability of the new method was additionally evaluated by using synthetic data sets where the true formation temperature (TFT) was known with accuracy. For these purposes, a comprehensive statistical analysis was carried out through the application of the well-known F-test and Student's t-test and the error percentage or statistical differences computed between the SFT estimates and the reported TFT data. After applying the new log-transformation regression method to a wide variety of geothermal, petroleum, and permafrost boreholes, it was found that polynomial models were generally the best regression models describing their thermal recovery processes. These fitting results suggested the use of this new method for the reliable estimation of SFT. Finally, the practical use of the new method was
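A minimal sketch of regression on log-transformed shut-in times; the BHT series is synthetic, and the selection rule below (score each model by how well it extrapolates to the last measured point, in the spirit of criterion iii) is a simplified stand-in for the paper's four statistical criteria:

```python
import numpy as np

# Hypothetical BHT build-up series: shut-in time (h) and temperature (degC).
shut_in = np.array([6.0, 12.0, 18.0, 24.0, 30.0, 36.0])
bht = np.array([90.0, 105.0, 113.0, 118.0, 121.5, 124.0])
t_ln = np.log(shut_in)

# Fit each polynomial in ln(shut-in time) to all but the last point and
# score it by the absolute error of extrapolating to the last measured BHT.
best = None
for order in (1, 2, 3):
    coef = np.polyfit(t_ln[:-1], bht[:-1], order)
    err = abs(np.polyval(coef, t_ln[-1]) - bht[-1])
    if best is None or err < best[0]:
        best = (err, order)

order = best[1]
coef = np.polyfit(t_ln, bht, order)          # refit on the full series
sft = np.polyval(coef, np.log(7 * 24.0))     # extrapolate to one week shut-in
print(f"selected order = {order}, SFT estimate ~ {sft:.1f} degC")
```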
Survival associated pathway identification with group Lp penalized global AUC maximization
2010-01-01
It has been demonstrated that genes in a cell do not act independently. They interact with one another to complete certain biological processes or to implement certain molecular functions. How to incorporate biological pathways or functional groups into the model and identify survival-associated gene pathways is still a challenging problem. In this paper, we propose a novel iterative gradient-based method for survival analysis with group Lp penalized global AUC summary maximization. Unlike LASSO, Lp (p < 1) (with its special implementation termed adaptive LASSO) is asymptotically unbiased and has oracle properties [1]. We first extend the Lp penalty for individual gene identification to a group Lp penalty for pathway selection, and then develop a novel iterative gradient algorithm for penalized global AUC summary maximization (IGGAUCS). This method incorporates genetic pathways into global AUC summary maximization and identifies survival-associated pathways instead of individual genes. The tuning parameters are determined using 10-fold cross validation with training data only. The prediction performance is evaluated using test data. We apply the proposed method to survival outcome analysis with gene expression profiles and identify multiple pathways simultaneously. Experimental results with simulation and gene expression data demonstrate that the proposed procedures can be used for identifying important biological pathways that are related to survival phenotype and for building a parsimonious model for predicting the survival times. PMID:20712896
Clegg, Samuel M; Barefield, James E; Wiens, Roger C; Dyar, Melinda D; Schafer, Martha W; Tucker, Jonathan M
2008-01-01
The ChemCam instrument on the Mars Science Laboratory (MSL) will include a laser-induced breakdown spectrometer (LIBS) to quantify major and minor elemental compositions. The traditional analytical chemistry approach to calibration curves for these data regresses a single diagnostic peak area against concentration for each element. This approach contrasts with a new multivariate method in which elemental concentrations are predicted by step-wise multiple regression analysis based on areas of a specific set of diagnostic peaks for each element. The method is tested on LIBS data from igneous and metamorphosed rocks. Between 4 and 13 partial regression coefficients are needed to describe each elemental abundance accurately (i.e., with a regression line of R² > 0.9995 for the relationship between predicted and measured elemental concentration) for all major and minor elements studied. Validation plots suggest that the method is limited at present by the small data set, and will work best for prediction of concentration when a wide variety of compositions and rock types has been analyzed.
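A hedged sketch of step-wise peak selection followed by multiple regression, using scikit-learn's forward sequential selector as a stand-in for classical step-wise regression; the peak areas and informative peak indices are simulated, not LIBS data:

```python
import numpy as np
from sklearn.feature_selection import SequentialFeatureSelector
from sklearn.linear_model import LinearRegression

rng = np.random.default_rng(11)

# Simulated peak areas: 60 spectra x 40 candidate diagnostic peaks, with a
# handful of peaks actually carrying the elemental concentration.
peaks = rng.normal(size=(60, 40))
conc = 2.0 * peaks[:, 3] - 1.2 * peaks[:, 17] + 0.8 * peaks[:, 25]
conc += rng.normal(0, 0.1, 60)

# Forward step-wise selection of diagnostic peaks, then a multiple
# regression on the selected set.
sfs = SequentialFeatureSelector(LinearRegression(), n_features_to_select=5,
                                direction="forward", cv=5)
sfs.fit(peaks, conc)
selected = np.flatnonzero(sfs.get_support())
model = LinearRegression().fit(peaks[:, selected], conc)
print("selected peaks:", selected)
print(f"R^2 on training spectra = {model.score(peaks[:, selected], conc):.4f}")
```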
[Guideline 'Medicinal care for drug addicts in penal institutions'].
Westra, Michel; de Haan, Hein A; Arends, Marleen T; van Everdingen, Jannes J E; Klazinga, Niek S
2009-01-01
In the Netherlands, the policy on care for prisoners who are addicted to opiates is still heterogeneous. The recent guideline entitled 'Medicinal care for drug addicts in penal institutions' should contribute towards unambiguous and more evidence-based treatment for this group. In addition, it should improve the care pathways within judicial institutions and bring them more into line with mainstream healthcare. Each rational course of medicinal treatment will initially be continued in the penal institution. In penal institutions the help on offer is mainly focused on abstinence from illegal drugs while at the same time limiting the damage caused to the health of the individual user. Methadone is regarded as the first choice for maintenance therapy. For patient safety, it is best given in liquid form in sealed cups of 5 mg/ml once daily in the morning. Recently a combination preparation containing buprenorphine and naloxone - a full opiate antagonist - has become available. On discontinuation of opiate maintenance treatment, intensive follow-up care is necessary; during this period there is considerable risk of a potentially lethal overdose. Detoxification should be coupled with psychosocial or medicinal intervention aimed at preventing relapse. Naltrexone is currently the only available opiate antagonist for preventing relapse. In those addicted to opiates who also take benzodiazepines without any indication, it is strongly recommended that these be reduced and discontinued. This can be achieved by converting the regular dosage into the equivalent in diazepam and then reducing this dosage by a maximum of 25% a week. PMID:20051159
Zhu, Ying; Tan, Tuck Lee
2016-04-15
An effective and simple analytical method using Fourier transform infrared (FTIR) spectroscopy to distinguish wild-grown high-quality Ganoderma lucidum (G. lucidum) from the cultivated one is of essential importance for its quality assurance and medicinal value estimation. Commonly used chemical and analytical methods using the full spectrum are not so effective for detection and interpretation due to the complex nature of the herbal medicine. In this study, two penalized discriminant analysis models, penalized linear discriminant analysis (PLDA) and the elastic net (Elnet), using FTIR spectroscopy have been explored for the purpose of discrimination and interpretation. The classification performances of the two penalized models have been compared with two widely used multivariate methods, principal component discriminant analysis (PCDA) and partial least squares discriminant analysis (PLSDA). The Elnet model, involving a combination of L1 and L2 norm penalties, enabled an automatic selection of a small number of informative spectral absorption bands and gave an excellent classification accuracy of 99% for discrimination between spectra of wild-grown and cultivated G. lucidum. Its classification performance was superior to that of the PLDA model in a pure L1 setting, and it outperformed the PCDA and PLSDA models using the full wavelength range. The well-performed selection of informative spectral features leads to a substantial reduction in model complexity and an improvement in classification accuracy, and it is particularly helpful for the quantitative interpretation of the major chemical constituents of G. lucidum regarding its anti-cancer effects. PMID:26827180
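A minimal analogue of the Elnet discrimination using scikit-learn's elastic-net-penalized logistic regression on simulated spectra; the band positions, class effects, and penalty settings are invented for illustration:

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score

rng = np.random.default_rng(2)

# Simulated FTIR-like data: 80 spectra x 600 wavenumbers, two classes
# ("wild-grown" vs "cultivated") separated by a few absorption bands.
X = rng.normal(size=(80, 600))
y = rng.integers(0, 2, 80)
X[y == 1, 100:105] += 1.0        # informative bands (invented)
X[y == 1, 400:403] -= 0.8

# Elastic-net-penalized discrimination: L1 for band selection, L2 for
# stability, analogous in spirit to the Elnet model described above.
clf = LogisticRegression(penalty="elasticnet", solver="saga",
                         l1_ratio=0.5, C=0.5, max_iter=5000)
acc = cross_val_score(clf, X, y, cv=5).mean()
clf.fit(X, y)
n_bands = np.sum(clf.coef_ != 0)
print(f"CV accuracy = {acc:.2f}, informative bands retained = {n_bands}")
```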
Linear regression in astronomy. I
NASA Technical Reports Server (NTRS)
Isobe, Takashi; Feigelson, Eric D.; Akritas, Michael G.; Babu, Gutti Jogesh
1990-01-01
Five methods for obtaining linear regression fits to bivariate data with unknown or insignificant measurement errors are discussed: ordinary least-squares (OLS) regression of Y on X, OLS regression of X on Y, the bisector of the two OLS lines, orthogonal regression, and 'reduced major-axis' regression. These methods have been used by various researchers in observational astronomy, most importantly in cosmic distance scale applications. Formulas for calculating the slope and intercept coefficients and their uncertainties are given for all the methods, including a new general form of the OLS variance estimates. The accuracy of the formulas was confirmed using numerical simulations. The applicability of the procedures is discussed with respect to their mathematical properties, the nature of the astronomical data under consideration, and the scientific purpose of the regression. It is found that, for problems needing symmetrical treatment of the variables, the OLS bisector performs significantly better than orthogonal or reduced major-axis regression.
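The OLS bisector is straightforward to compute from the two OLS slopes; a minimal sketch on synthetic data, using the bisector slope formula given by Isobe et al. (1990):

```python
import numpy as np

rng = np.random.default_rng(9)

# Synthetic bivariate data with intrinsic scatter (no measurement errors).
x = rng.uniform(0, 10, 200)
y = 1.5 * x + 2.0 + rng.normal(0, 1.5, 200)

xm, ym = x.mean(), y.mean()
sxx = np.sum((x - xm) ** 2)
syy = np.sum((y - ym) ** 2)
sxy = np.sum((x - xm) * (y - ym))

b1 = sxy / sxx            # OLS(Y|X) slope
b2 = syy / sxy            # OLS(X|Y) slope
# OLS bisector: the line that bisects the angle between the two OLS lines.
b3 = (b1 * b2 - 1.0 + np.sqrt((1.0 + b1**2) * (1.0 + b2**2))) / (b1 + b2)

for name, b in [("OLS(Y|X)", b1), ("OLS(X|Y)", b2), ("bisector", b3)]:
    print(f"{name}: slope = {b:.3f}, intercept = {ym - b * xm:.3f}")
```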
NASA Technical Reports Server (NTRS)
Jacobsen, R. T.; Stewart, R. B.; Crain, R. W., Jr.; Rose, G. L.; Myers, A. F.
1976-01-01
A method was developed for establishing a rational choice of the terms to be included in an equation of state with a large number of adjustable coefficients. The methods presented were developed for use in the determination of an equation of state for oxygen and nitrogen. However, a general application of the methods is possible in studies involving the determination of an optimum polynomial equation for fitting a large number of data points. The data considered in the least squares problem are experimental thermodynamic pressure-density-temperature data. Attention is given to a description of stepwise multiple regression and the use of stepwise regression in the determination of an equation of state for oxygen and nitrogen.
A penalization technique to model plasma facing components in a tokamak with temperature variations
NASA Astrophysics Data System (ADS)
Paredes, A.; Bufferand, H.; Ciraolo, G.; Schwander, F.; Serre, E.; Ghendrih, P.; Tamain, P.
2014-10-01
To properly address turbulent transport in the edge plasma region of a tokamak, it is mandatory to describe the particle and heat outflow on wall components, using an accurate representation of the wall geometry. This is challenging for many plasma transport codes, which use a structured mesh with one coordinate aligned with magnetic surfaces. We propose here a penalization technique that allows modeling of particle and heat transport using such a structured mesh, while also accounting for geometrically complex plasma-facing components. Solid obstacles are considered as particle and momentum sinks, whereas ionic and electronic temperature gradients are imposed on both sides of the obstacles along the magnetic field direction using delta (Dirac) functions. Solutions exhibit plasma velocities (M=1) and temperature fluxes at the plasma-wall boundaries that match the boundary conditions usually implemented in fluid codes. Grid convergence and error estimates are found to be in agreement with theoretical results obtained for neutral fluid conservation equations. The capability of the penalization technique is illustrated by introducing the non-collisional plasma region expected by kinetic theory in the immediate vicinity of the interface, which is impossible when considering fluid boundary conditions. Axisymmetric numerical simulations show the efficiency of the method in investigating large-scale transport at the plasma edge, including the separatrix, in realistic complex geometries while keeping a simple structured grid.
[Qualification of persons taking part in psychiatric opinion-giving in a penal trial].
Zgryzek, K
1998-01-01
The introduction of the new Penal Code by the Parliament brings about the necessity of conducting a detailed analysis of particular legal solutions in the code. The authors present an analysis of selected issues included in the Penal Code referring to evidence from the opinions of psychiatric experts, particularly those regarding the professional qualifications of persons appointed by the court in a penal trial to assess the mental health state of particular persons (a witness, a victim, the perpetrator). It was accepted that the only persons authorized to conduct psychiatric examinations in a penal trial are those with at least first-degree specialization in psychiatry. PMID:9816897
Zhang, Yan-Feng; Zhang, Li; Gao, Zhi-Xian; Dai, Shu-Gui
2012-01-01
Polycyclic aromatic hydrocarbons (PAHs) are ubiquitous contaminants found in the environment. Immunoassays represent useful analytical methods to complement traditional analytical procedures for PAHs. Cross-reactivity (CR) is a very useful characteristic for evaluating the extent of cross-reaction of a cross-reactant in immunoreactions and immunoassays. The quantitative relationships between the molecular properties and the CR of PAHs were established by stepwise multiple linear regression, principal component regression and partial least squares regression, using the data of two commercial enzyme-linked immunosorbent assay (ELISA) kits. The objective is to find the most important molecular properties that affect the CR, and to predict the CR by multiple regression methods. The results show that the physicochemical, electronic and topological properties of the PAH molecules have an integrated effect on the CR properties for the two ELISAs, among which molar solubility (Sm) and valence molecular connectivity index (3χv) are the most important factors. The obtained regression equations for the RisC kit are all statistically significant (p < 0.005) and show satisfactory ability for predicting CR values, while the equations for the RaPID kit are all non-significant (p > 0.05) and not suitable for prediction. This is probably because the RisC immunoassay employs a monoclonal antibody, whereas the RaPID kit is based on a polyclonal antibody. Considering the important effect of solubility on the CR values, the cross-reaction potential (CRP) is calculated and used as a complement to CR for the evaluation of cross-reactions in immunoassays. Only compounds with both high CR and high CRP can cause intense cross-reactions in immunoassays. PMID:23012547
Lin, Zhaozhou; Zhang, Qiao; Liu, Ruixin; Gao, Xiaojie; Zhang, Lu; Kang, Bingya; Shi, Junhan; Wu, Zidan; Gui, Xinjing; Li, Xuelin
2016-01-01
To accurately, safely, and efficiently evaluate the bitterness of Traditional Chinese Medicines (TCMs), a robust predictor was developed using the robust partial least squares (RPLS) regression method based on data obtained from an electronic tongue (e-tongue) system. The data quality was verified by the Grubbs' test. Moreover, potential outliers were detected based on both the standardized residual and the score distance calculated for each sample. The performance of RPLS on the dataset before and after outlier detection was compared to other state-of-the-art methods including multivariate linear regression, least squares support vector machine, and plain partial least squares regression. Both R² and the root-mean-square error of cross-validation (RMSECV) were recorded for each model. With four latent variables, a robust RMSECV value of 0.3916, with bitterness values ranging from 0.63 to 4.78, was obtained for the RPLS model constructed on the dataset including outliers. Meanwhile, the RMSECV calculated for the models constructed by the other methods was larger than that of the RPLS model. After six outliers were excluded, the performance of all benchmark methods markedly improved, but the difference between the RPLS models constructed before and after outlier exclusion was negligible. In conclusion, the bitterness of TCM decoctions can be accurately evaluated with the RPLS model constructed using e-tongue data. PMID:26821026
Dang, H.; Wang, A. S.; Sussman, Marc S.; Siewerdsen, J. H.; Stayman, J. W.
2014-01-01
Sequential imaging studies are conducted in many clinical scenarios. Prior images from previous studies contain a great deal of patient-specific anatomical information and can be used in conjunction with subsequent imaging acquisitions to maintain image quality while enabling radiation dose reduction (e.g., through sparse angular sampling, reduction in fluence, etc.). However, patient motion between images in such sequences results in misregistration between the prior image and current anatomy. Existing prior-image-based approaches often include only a simple rigid registration step that can be insufficient for capturing complex anatomical motion, introducing detrimental effects in subsequent image reconstruction. In this work, we propose a joint framework that estimates the 3D deformation between an unregistered prior image and the current anatomy (based on a subsequent data acquisition) and reconstructs the current anatomical image using a model-based reconstruction approach that includes regularization based on the deformed prior image. This framework is referred to as deformable prior image registration, penalized-likelihood estimation (dPIRPLE). Central to this framework is the inclusion of a 3D B-spline-based free-form-deformation model into the joint registration-reconstruction objective function. The proposed framework is solved using a maximization strategy whereby alternating updates to the registration parameters and image estimates are applied allowing for improvements in both the registration and reconstruction throughout the optimization process. Cadaver experiments were conducted on a cone-beam CT testbench emulating a lung nodule surveillance scenario. Superior reconstruction accuracy and image quality were demonstrated using the dPIRPLE algorithm as compared to more traditional reconstruction methods including filtered backprojection, penalized-likelihood estimation (PLE), prior image penalized-likelihood estimation (PIPLE) without registration
NASA Astrophysics Data System (ADS)
Han, Hao; Zhang, Hao; Wei, Xinzhou; Moore, William; Liang, Zhengrong
2016-03-01
In this paper, we propose a low-dose computed tomography (LdCT) image reconstruction method aided by prior knowledge learned from previous high-quality or normal-dose CT (NdCT) scans. The well-established statistical penalized weighted least squares (PWLS) algorithm was adopted for image reconstruction, where the penalty term was formulated by a texture-based Gaussian Markov random field (gMRF) model. The NdCT scan was first segmented into different tissue types by a feature vector quantization (FVQ) approach. Then, for each tissue type, a set of tissue-specific coefficients for the gMRF penalty was statistically learned from the NdCT image via multiple linear regression analysis. We also propose a scheme to adaptively select the order of the gMRF model for coefficient prediction. The tissue-specific gMRF patterns learned from the NdCT image were finally used to form an adaptive MRF penalty for the PWLS reconstruction of the LdCT image. The proposed texture-adaptive PWLS image reconstruction algorithm was shown to be more effective at preserving image textures than the conventional PWLS image reconstruction algorithm, and we further demonstrated the gain of high-order MRF modeling for texture-preserving LdCT PWLS image reconstruction.
Haghighi, Mona; Johnson, Suzanne Bennett; Qian, Xiaoning; Lynch, Kristian F.; Vehik, Kendra; Huang, Shuai; Rewers, Marian; Barriga, Katherine; Baxter, Judith; Eisenbarth, George; Frank, Nicole; Gesualdo, Patricia; Hoffman, Michelle; Norris, Jill; Ide, Lisa; Robinson, Jessie; Waugh, Kathleen; She, Jin-Xiong; Schatz, Desmond; Hopkins, Diane; Steed, Leigh; Choate, Angela; Silvis, Katherine; Shankar, Meena; Huang, Yi-Hua; Yang, Ping; Wang, Hong-Jie; Leggett, Jessica; English, Kim; McIndoe, Richard; Dequesada, Angela; Haller, Michael; Anderson, Stephen W.; Ziegler, Anette G.; Boerschmann, Heike; Bonifacio, Ezio; Bunk, Melanie; Försch, Johannes; Henneberger, Lydia; Hummel, Michael; Hummel, Sandra; Joslowski, Gesa; Kersting, Mathilde; Knopff, Annette; Kocher, Nadja; Koletzko, Sibylle; Krause, Stephanie; Lauber, Claudia; Mollenhauer, Ulrike; Peplow, Claudia; Pflüger, Maren; Pöhlmann, Daniela; Ramminger, Claudia; Rash-Sur, Sargol; Roth, Roswith; Schenkel, Julia; Thümer, Leonore; Voit, Katja; Winkler, Christiane; Zwilling, Marina; Simell, Olli G.; Nanto-Salonen, Kirsti; Ilonen, Jorma; Knip, Mikael; Veijola, Riitta; Simell, Tuula; Hyöty, Heikki; Virtanen, Suvi M.; Kronberg-Kippilä, Carina; Torma, Maija; Simell, Barbara; Ruohonen, Eeva; Romo, Minna; Mantymaki, Elina; Schroderus, Heidi; Nyblom, Mia; Stenius, Aino; Lernmark, Åke; Agardh, Daniel; Almgren, Peter; Andersson, Eva; Andrén-Aronsson, Carin; Ask, Maria; Karlsson, Ulla-Marie; Cilio, Corrado; Bremer, Jenny; Ericson-Hallström, Emilie; Gard, Thomas; Gerardsson, Joanna; Gustavsson, Ulrika; Hansson, Gertie; Hansen, Monica; Hyberg, Susanne; Håkansson, Rasmus; Ivarsson, Sten; Johansen, Fredrik; Larsson, Helena; Lernmark, Barbro; Markan, Maria; Massadakis, Theodosia; Melin, Jessica; Månsson-Martinez, Maria; Nilsson, Anita; Nilsson, Emma; Rahmati, Kobra; Rang, Sara; Järvirova, Monica Sedig; Sibthorpe, Sara; Sjöberg, Birgitta; Törn, Carina; Wallin, Anne; Wimar, Åsa; Hagopian, William A.; Yan, Xiang; Killian, Michael; Crouch, Claire Cowen; Hay, Kristen M.; Ayres, Stephen; Adams, Carissa; Bratrude, Brandi; Fowler, Greer; Franco, Czarina; Hammar, Carla; Heaney, Diana; Marcus, Patrick; Meyer, Arlene; Mulenga, Denise; Scott, Elizabeth; Skidmore, Jennifer; Small, Erin; Stabbert, Joshua; Stepitova, Viktoria; Becker, Dorothy; Franciscus, Margaret; Dalmagro-Elias Smith, MaryEllen; Daftary, Ashi; Krischer, Jeffrey P.; Abbondondolo, Michael; Ballard, Lori; Brown, Rasheedah; Cuthbertson, David; Eberhard, Christopher; Gowda, Veena; Lee, Hye-Seung; Liu, Shu; Malloy, Jamie; McCarthy, Cristina; McLeod, Wendy; Smith, Laura; Smith, Stephen; Smith, Susan; Uusitalo, Ulla; Yang, Jimin; Akolkar, Beena; Briese, Thomas; Erlich, Henry; Oberste, Steve
2016-01-01
Regression models are extensively used in many epidemiological studies to understand the linkage between specific outcomes of interest and their risk factors. However, regression models in general examine the average effects of the risk factors and ignore subgroups with different risk profiles. As a result, interventions are often geared towards the average member of the population, without consideration of the special health needs of different subgroups within the population. This paper demonstrates the value of using rule-based analysis methods that can identify subgroups with heterogeneous risk profiles in a population without imposing assumptions on the subgroups or method. The rules define the risk pattern of subsets of individuals by not only considering the interactions between the risk factors but also their ranges. We compared the rule-based analysis results with the results from a logistic regression model in The Environmental Determinants of Diabetes in the Young (TEDDY) study. Both methods detected a similar suite of risk factors, but the rule-based analysis was superior at detecting multiple interactions between the risk factors that characterize the subgroups. A further investigation of the particular characteristics of each subgroup may detect the special health needs of the subgroup and lead to tailored interventions. PMID:27561809
ERIC Educational Resources Information Center
Pedrini, D. T.; Pedrini, Bonnie C.
Regression, another mechanism studied by Sigmund Freud, has had much research, e.g., hypnotic regression, frustration regression, schizophrenic regression, and infra-human-animal regression (often directly related to fixation). Many investigators worked with hypnotic age regression, which has a long history, going back to Russian reflexologists.…
Partial covariate adjusted regression
Şentürk, Damla; Nguyen, Danh V.
2008-01-01
Covariate adjusted regression (CAR) is a recently proposed adjustment method for regression analysis where both the response and predictors are not directly observed (Şentürk and Müller, 2005). The available data has been distorted by unknown functions of an observable confounding covariate. CAR provides consistent estimators for the coefficients of the regression between the variables of interest, adjusted for the confounder. We develop a broader class of partial covariate adjusted regression (PCAR) models to accommodate both distorted and undistorted (adjusted/unadjusted) predictors. The PCAR model allows for unadjusted predictors, such as age, gender and demographic variables, which are common in the analysis of biomedical and epidemiological data. The available estimation and inference procedures for CAR are shown to be invalid for the proposed PCAR model. We propose new estimators and develop new inference tools for the more general PCAR setting. In particular, we establish the asymptotic normality of the proposed estimators and propose consistent estimators of their asymptotic variances. Finite sample properties of the proposed estimators are investigated using simulation studies and the method is also illustrated with a Pima Indians diabetes data set. PMID:20126296
NASA Astrophysics Data System (ADS)
Huang, Cong; Liu, Dan-Dan; Wang, Jing-Song
2009-06-01
The 10.7 cm solar radio flux (F10.7), the value of the solar radio emission flux density at a wavelength of 10.7 cm, is a useful index of solar activity and a proxy for solar extreme ultraviolet radiation. It is meaningful and important to predict F10.7 values accurately for both long-term (months to years) and short-term (days) forecasting, as they are often used as inputs in space weather models. This study applies a kernel-based machine learning technique, support vector regression (SVR), to forecasting daily values of F10.7. The aim of this study is to examine the feasibility of SVR in short-term F10.7 forecasting. The approach, based on SVR, reduces the dimension of the feature space in the training process by using a kernel-based learning algorithm. Thus, the complexity of the calculation becomes lower and a small amount of training data is sufficient. The time series of F10.7 from 2002 to 2006 is employed as the data set. The performance of the approach is estimated by calculating the normalized mean square error and the mean absolute percentage error. It is shown that our approach can perform well using fewer training data points than a traditional neural network.
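A minimal sketch of SVR-based short-term forecasting on a synthetic F10.7-like series; the lag embedding, kernel settings, and series parameters are illustrative choices, not the paper's setup (real values would come from an archive such as NOAA/SWPC):

```python
import numpy as np
from sklearn.svm import SVR
from sklearn.preprocessing import StandardScaler
from sklearn.pipeline import make_pipeline

rng = np.random.default_rng(4)

# Synthetic daily F10.7-like series: solar-rotation modulation plus noise.
n = 1500
t = np.arange(n)
f107 = 120 + 40 * np.sin(2 * np.pi * t / 27.0) + rng.normal(0, 5, n)

# Lag embedding: predict day t from the previous `lag` days.
lag = 27
X = np.array([f107[i - lag:i] for i in range(lag, n)])
y = f107[lag:]

split = int(0.8 * len(y))
model = make_pipeline(StandardScaler(), SVR(kernel="rbf", C=10.0))
model.fit(X[:split], y[:split])

pred = model.predict(X[split:])
nmse = np.mean((pred - y[split:]) ** 2) / np.var(y[split:])
mape = np.mean(np.abs((pred - y[split:]) / y[split:])) * 100
print(f"NMSE = {nmse:.3f}, MAPE = {mape:.2f}%")
```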
Boy-Roura, M; Cameron, K C; Di, H J
2016-02-01
This study presents a meta-analysis of 12 experiments that quantify nitrate-N leaching losses from grazed pasture systems on alluvial sedimentary soils in Canterbury (New Zealand). Mean measured nitrate-N leaching losses (kg N/ha per 100 mm drainage) were 2.7 when no urine was applied, 8.4 at a urine rate of 300 kg N/ha, 9.8 at 500 kg N/ha, 24.5 at 700 kg N/ha and 51.4 at 1000 kg N/ha. Lismore soils presented significantly higher nitrate-N losses than Templeton soils. Moreover, a multiple linear regression (MLR) model was developed to determine the key factors that influence nitrate-N leaching and to predict nitrate-N leaching losses. The MLR analysis was calibrated and validated using 82 average values of nitrate-N leached and 48 explanatory variables representing nitrogen inputs and outputs, transport, attenuation of nitrogen, and farm management practices. The MLR model (R² = 0.81) showed that nitrate-N leaching losses were greater at higher urine application rates and when there was more drainage from rainfall and irrigation. On the other hand, nitrate leaching decreased when nitrification inhibitors (e.g., dicyandiamide (DCD)) were applied. Predicted nitrate-N leaching losses at the paddock scale were calculated using the MLR equation and varied widely depending on the urine application rate and urine patch coverage. PMID:26498804
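A minimal sketch of such an MLR on synthetic data (made-up coefficients; urine rate, drainage and a DCD indicator as predictors):

```python
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(2)
n = 82
urine = rng.choice([0, 300, 500, 700, 1000], n)   # kg N/ha
drainage = rng.uniform(200, 800, n)               # mm
dcd = rng.integers(0, 2, n)                       # 1 = inhibitor applied
leached = (2 + 0.04 * urine + 0.01 * drainage     # illustrative coefficients
           - 8 * dcd + rng.normal(0, 4, n))       # kg N/ha leached

X = sm.add_constant(np.column_stack([urine, drainage, dcd]).astype(float))
fit = sm.OLS(leached, X).fit()
print(fit.summary(xname=["const", "urine", "drainage", "dcd"]))
```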
NASA Astrophysics Data System (ADS)
Setiawan; Suhartono; Ahmad, Imam Safawi; Rahmawati, Noorgam Ika
2015-12-01
Bank Indonesia (BI), as the central bank of the Republic of Indonesia, has a single overarching objective: to establish and maintain rupiah stability. This objective can be pursued by monitoring the traffic of inflow and outflow of currency. Inflow and outflow are related to the stock and distribution of currency across Indonesian territory, and they affect economic activities. Economic activity in Indonesia, a predominantly Muslim country, is closely tied to the Islamic (lunar) calendar, which differs from the Gregorian calendar. This research aims to forecast the currency inflow and outflow of the Representative Office (RO) of BI for the Semarang, Central Java region. The analysis shows that the characteristics of currency inflow and outflow are influenced by calendar-variation effects, namely the day of Eid al-Fitr (a Muslim holiday), as well as by seasonal patterns. In addition, the particular week in which Eid al-Fitr falls also affects the increase in currency inflow and outflow. Based on the smallest Root Mean Square Error (RMSE), the best model for the inflow data is an ARIMA model, while the best model for predicting the outflow data at the RO of BI Semarang is the ARIMAX model or, equivalently, time series regression (both reduce to the same model). The forecasts for 2015 show an increase in currency inflow in August and an increase in currency outflow in July.
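The calendar-variation idea can be sketched as a time series regression with monthly dummies plus a moving-holiday indicator. The series below is synthetic, and the flagged Eid al-Fitr months are approximate; the holiday drifts through the Gregorian year because the Islamic year is about 11 days shorter:

```python
import numpy as np
import pandas as pd
from sklearn.linear_model import LinearRegression

idx = pd.date_range("2008-01", periods=84, freq="MS")   # 2008-2014, monthly
eid = pd.Series(0, index=idx)
eid.loc[pd.to_datetime(["2008-10-01", "2009-09-01", "2010-09-01", "2011-08-01",
                        "2012-08-01", "2013-08-01", "2014-07-01"])] = 1

rng = np.random.default_rng(3)
outflow = (10 + 2 * np.sin(2 * np.pi * idx.month / 12)  # seasonal pattern
           + 6 * eid.values + rng.normal(0, 1, len(idx)))

X = pd.get_dummies(idx.month, prefix="m", drop_first=True).astype(float)
X["eid"] = eid.values                                   # calendar-variation term
fit = LinearRegression().fit(X, outflow)
print("estimated Eid effect:", round(fit.coef_[-1], 2))  # close to the built-in 6
```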
Jović, Ozren; Smrečki, Neven; Popović, Zora
2016-04-01
A novel quantitative prediction and variable selection method called interval ridge regression (iRR) is studied in this work. The method is tested on six FTIR data sets, two UV-vis data sets and one DSC data set. The results show that models built with ridge regression on the optimal variables selected by iRR significantly outperform models built with ridge regression on all variables in both calibration (6 out of 9 cases) and validation (2 out of 9 cases). In this study, iRR is also compared with interval partial least squares regression (iPLS); iRR outperformed iPLS in validation (insignificantly in 6 out of 9 cases and significantly in 1 out of 9 cases at p < 0.05). iRR can also be a fast alternative to iPLS, especially when the degree of complexity of the analyzed system is unknown, i.e., when an upper limit on the number of latent variables is not easily estimated for iPLS. Adulteration of hempseed (H) oil, a well-known health-beneficial nutrient, is studied in this work by mixing it with cheap and widely used oils such as soybean (So) oil, rapeseed (R) oil and sunflower (Su) oil. Binary mixture sets of hempseed oil with these three oils (HSo, HR and HSu) and a ternary mixture set of H, R and Su oil (HRSu) were considered. The obtained accuracy indicates that, using iRR on FTIR and UV-vis data, each particular oil can be quantified very successfully (RMSEP < 1.2% in all 8 cases). This means that FTIR-ATR coupled with iRR can very rapidly and effectively determine the level of adulteration in adulterated hempseed oil (R² > 0.99). PMID:26838379
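A simplified sketch of interval-based variable selection with ridge regression follows; it scores fixed, equal-width intervals by cross-validation rather than implementing the published iRR search, and runs on synthetic "spectra":

```python
import numpy as np
from sklearn.linear_model import Ridge
from sklearn.model_selection import cross_val_score

rng = np.random.default_rng(4)
n, p = 60, 200                        # 60 mixtures, 200 wavelength channels
X = rng.normal(size=(n, p))
beta = np.zeros(p)
beta[80:120] = 0.5                    # informative band, e.g. adulterant-related
y = X @ beta + rng.normal(0, 0.5, n)

def cv_rmse(cols):
    scores = cross_val_score(Ridge(alpha=1.0), X[:, cols], y, cv=5,
                             scoring="neg_root_mean_squared_error")
    return -scores.mean()

intervals = [np.arange(s, s + 20) for s in range(0, p, 20)]   # 10 intervals
best = min(intervals, key=cv_rmse)
print(f"best interval {best[0]}-{best[-1]}, RMSE {cv_rmse(best):.3f}")
print(f"all-variable ridge RMSE {cv_rmse(np.arange(p)):.3f}")
```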
Balabin, Roman M; Lomakina, Ekaterina I
2011-04-21
In this study, we make a general comparison of the accuracy and robustness of five multivariate calibration models: partial least squares (PLS) regression or projection to latent structures, polynomial partial least squares (Poly-PLS) regression, artificial neural networks (ANNs), and two novel techniques based on support vector machines (SVMs) for multivariate data analysis: support vector regression (SVR) and least-squares support vector machines (LS-SVMs). The comparison is based on fourteen (14) different datasets: seven sets of gasoline data (density, benzene content, and fractional composition/boiling points), two sets of ethanol gasoline fuel data (density and ethanol content), one set of diesel fuel data (total sulfur content), three sets of petroleum (crude oil) macromolecules data (weight percentages of asphaltenes, resins, and paraffins), and one set of petroleum resins data (resins content). Vibrational (near-infrared, NIR) spectroscopic data are used to predict the properties and quality coefficients of gasoline, biofuel/biodiesel, diesel fuel, and other samples of interest. The four systems presented here range greatly in composition, properties, strength of intermolecular interactions (e.g., van der Waals forces, H-bonds), colloid structure, and phase behavior. Due to the high diversity of chemical systems studied, general conclusions about SVM regression methods can be made. We try to answer the following question: to what extent can SVM-based techniques replace ANN-based approaches in real-world (industrial/scientific) applications? The results show that both SVR and LS-SVM methods are comparable to ANNs in accuracy. Due to the much higher robustness of the former, the SVM-based approaches are recommended for practical (industrial) application. This has been shown to be especially true for complicated, highly nonlinear objects. PMID:21350755
43 CFR 4170.2-1 - Penal provisions under the Taylor Grazing Act.
Code of Federal Regulations, 2012 CFR
2012-10-01
... 43 Public Lands: Interior 2 2012-10-01 2012-10-01 false Penal provisions under the Taylor Grazing...-EXCLUSIVE OF ALASKA Penalties § 4170.2-1 Penal provisions under the Taylor Grazing Act. Under section 2 of the Act any person who willfully commits an act prohibited under § 4140.1(b), or who...
43 CFR 4170.2-1 - Penal provisions under the Taylor Grazing Act.
Code of Federal Regulations, 2013 CFR
2013-10-01
... 43 Public Lands: Interior 2 2013-10-01 2013-10-01 false Penal provisions under the Taylor Grazing...-EXCLUSIVE OF ALASKA Penalties § 4170.2-1 Penal provisions under the Taylor Grazing Act. Under section 2 of the Act any person who willfully commits an act prohibited under § 4140.1(b), or who...
43 CFR 4170.2-1 - Penal provisions under the Taylor Grazing Act.
Code of Federal Regulations, 2014 CFR
2014-10-01
... 43 Public Lands: Interior 2 2014-10-01 2014-10-01 false Penal provisions under the Taylor Grazing...-EXCLUSIVE OF ALASKA Penalties § 4170.2-1 Penal provisions under the Taylor Grazing Act. Under section 2 of the Act any person who willfully commits an act prohibited under § 4140.1(b), or who...
27 CFR 19.957 - Instructions to compute bond penal sum.
Code of Federal Regulations, 2010 CFR
2010-04-01
... 27 Alcohol, Tobacco Products and Firearms 1 2010-04-01 2010-04-01 false Instructions to compute bond penal sum. 19.957 Section 19.957 Alcohol, Tobacco Products and Firearms ALCOHOL AND TOBACCO TAX... Fuel Use Bonds § 19.957 Instructions to compute bond penal sum. (a) Medium plants. To find the...
43 CFR 4170.2-1 - Penal provisions under the Taylor Grazing Act.
Code of Federal Regulations, 2011 CFR
2011-10-01
... 43 Public Lands: Interior 2 2011-10-01 2011-10-01 false Penal provisions under the Taylor Grazing Act. 4170.2-1 Section 4170.2-1 Public Lands: Interior Regulations Relating to Public Lands (Continued...-EXCLUSIVE OF ALASKA Penalties § 4170.2-1 Penal provisions under the Taylor Grazing Act. Under section 2...
Bias-reduced and separation-proof conditional logistic regression with small or sparse data sets.
Heinze, Georg; Puhr, Rainer
2010-03-30
Conditional logistic regression is used for the analysis of binary outcomes when subjects are stratified into several subsets, e.g., matched pairs or blocks. Log odds ratio estimates are usually found by maximizing the conditional likelihood. This approach eliminates all stratum-specific parameters by conditioning on the number of events within each stratum. However, in the analyses of both an animal experiment and a lung cancer case-control study, conditional maximum likelihood (CML) resulted in infinite odds ratio estimates and monotone likelihood. Estimation can be improved by using Cytel Inc.'s well-known LogXact software, which provides a median unbiased estimate and exact or mid-p confidence intervals. Here, we suggest and outline point and interval estimation based on maximization of a penalized conditional likelihood in the spirit of Firth's (Biometrika 1993; 80:27-38) bias correction method (CFL). We present comparative analyses of both studies, demonstrating some advantages of CFL over competitors. We report on a small-sample simulation study in which CFL log odds ratio estimates were almost unbiased, whereas LogXact estimates showed some bias and CML estimates exhibited serious bias. Confidence intervals and tests based on the penalized conditional likelihood had close-to-nominal coverage rates and yielded the highest power among all methods compared. Therefore, we propose CFL as an attractive solution to the stratified analysis of binary data, irrespective of the occurrence of monotone likelihood. A SAS program implementing CFL is available at: http://www.muw.ac.at/msi/biometrie/programs. PMID:20213709
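The flavor of the Firth-type penalty is easiest to see in ordinary (unconditional) logistic regression, where the penalized log-likelihood adds half the log-determinant of the Fisher information. The sketch below illustrates the penalty idea only; it is not the conditional-likelihood method or the SAS program above:

```python
import numpy as np
from scipy.optimize import minimize

# Perfectly separated toy data: ordinary maximum likelihood diverges here,
# while the Firth-type penalty keeps the slope estimate finite.
X = np.array([[1, -2.0], [1, -1.0], [1, -0.5], [1, 0.5], [1, 1.0], [1, 2.0]])
y = np.array([0, 0, 0, 1, 1, 1])

def neg_penalized_loglik(beta):
    eta = X @ beta
    p = 1.0 / (1.0 + np.exp(-eta))
    loglik = np.sum(y * eta - np.log1p(np.exp(eta)))
    W = p * (1.0 - p)
    info = X.T @ (W[:, None] * X)               # Fisher information X'WX
    penalty = 0.5 * np.linalg.slogdet(info)[1]  # Jeffreys-prior term
    return -(loglik + penalty)

fit = minimize(neg_penalized_loglik, x0=np.zeros(2), method="BFGS")
print("Firth-type estimate:", np.round(fit.x, 3))   # finite despite separation
```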
Liu, Song; Su, Bo-min; Li, Qing-hui; Gan, Fu-xi
2015-01-01
The authors tried to find a method for quantitative analysis using pXRF without solid bulk stone/jade reference samples. Twenty-four nephrite samples were selected: 17 as calibration samples and 7 as test samples. All nephrite samples were analyzed quantitatively by proton-induced X-ray emission spectroscopy (PIXE). Based on the PIXE results for the calibration samples, calibration curves were created for the components/elements of interest and used to analyze the test samples quantitatively; the qualitative spectra of all nephrite samples were then obtained by pXRF. Using the PIXE results and the qualitative spectra of the calibration samples, partial least squares (PLS) regression was applied for quantitative analysis of the test samples. Finally, the results for the test samples obtained by the calibration-curve method, the PLS method, and PIXE were compared, and the accuracy of the calibration-curve and PLS methods was estimated. The results indicate that PLS is a viable alternative method for quantitative analysis of stone/jade samples. PMID:25993858
Middle Miocene sandstone reservoirs of the Penal/Barrackpore field
Dyer, B.L.
1991-03-01
The Penal/Barrackpore field was discovered in 1938 and is located in the southern subbasin of onshore Trinidad. The accumulation is one of a series of northeast-southwest trending en echelon middle Miocene anticlinal structures that was later accentuated by late Pliocene transpressional folding. Relative movement of the South American and Caribbean plates climaxed in the middle Miocene compressive tectonic event and produced an imbricate pattern of southward-facing basement-involved thrusts. Further compressive interaction between the plates in the late Pliocene produced a transpressive tectonic episode forming northwest-southeast oriented transcurrent faults, tear faults, basement thrust faults, listric normal faults, and detached simple folds with infrequent diapiric cores. The middle Miocene Herrera and Karamat turbiditic sandstones are the primary reservoir rocks in the subsurface anticline of the Penal/Barrackpore field. These turbidites were sourced from the north and deposited within the marls and clays of the Cipero Formation. Miocene and Pliocene deltaics and turbidites succeed the Cipero Formation vertically, lapping onto preexisting Miocene highs. The late Pliocene transpression also coincides with the onset of oil migration along faults, diapirs, and unconformities from the Cretaceous Naparima Hill source. The Lengua Formation and the upper Forest clays are considered effective seals. Hydrocarbon trapping is structurally and stratigraphically controlled, with structure being the dominant trapping mechanism. Ultimate recoverable reserves for the field are estimated at 127.9 MMBo and 628.8 bcf. The field is presently owned and operated by the Trinidad and Tobago Oil Company Limited (TRINTOC).
Evaluating differential effects using regression interactions and regression mixture models
Van Horn, M. Lee; Jaki, Thomas; Masyn, Katherine; Howe, George; Feaster, Daniel J.; Lamont, Andrea E.; George, Melissa R. W.; Kim, Minjung
2015-01-01
Research increasingly emphasizes understanding differential effects. This paper focuses on understanding regression mixture models, a relatively new statistical method for assessing differential effects, by comparing results with those obtained using an interaction term in linear regression. The research questions that each model answers, their formulation, and their assumptions are compared using Monte Carlo simulations and real data analysis. The capabilities of regression mixture models are described, and specific issues to be addressed when conducting regression mixtures are proposed. The paper aims to clarify the role that regression mixtures can take in the estimation of differential effects and to increase awareness of the benefits and potential pitfalls of this approach. Regression mixture models are shown to be a potentially effective exploratory method for finding differential effects when these effects can be defined by a small number of classes of respondents who share a typical relationship between a predictor and an outcome. It is also shown that the comparison between regression mixture models and interactions becomes substantially more complex as the number of classes increases. It is argued that regression interactions are well suited for direct tests of specific hypotheses about differential effects, and that regression mixtures provide a useful approach for exploring effect heterogeneity given adequate samples and study design. PMID:26556903
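For contrast, the direct-test counterpart discussed above, a regression interaction, is a one-line model formula. The toy data below build in a slope difference between two groups:

```python
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

rng = np.random.default_rng(5)
n = 400
g = rng.integers(0, 2, n)                             # two groups
x = rng.normal(size=n)
y = 0.5 + (1.0 + 1.0 * g) * x + rng.normal(0, 1, n)   # slopes 1.0 and 2.0

df = pd.DataFrame({"y": y, "x": x, "g": g})
fit = smf.ols("y ~ x * g", data=df).fit()             # expands to x + g + x:g
print(fit.params.round(2))                            # x:g estimates the slope gap
```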
Basis Selection for Wavelet Regression
NASA Technical Reports Server (NTRS)
Wheeler, Kevin R.; Lau, Sonie (Technical Monitor)
1998-01-01
A wavelet basis selection procedure is presented for wavelet regression. Both the basis and the threshold are selected using cross-validation. The method includes the capability of incorporating prior knowledge on the smoothness (or shape of the basis functions) into the basis selection procedure. The results of the method are demonstrated on sampled functions widely used in the wavelet regression literature. The results of the method are contrasted with other published methods.
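A stripped-down sketch of cross-validated basis and threshold selection is given below (even/odd splitting with soft thresholding via PyWavelets); the smoothness priors described above are omitted, and the candidate grid is arbitrary:

```python
import numpy as np
import pywt

rng = np.random.default_rng(15)
n = 512
t = np.linspace(0, 1, n)
signal = np.where(t < 0.5, np.sin(4 * np.pi * t), 0.5)   # smooth plus a jump
y = signal + rng.normal(0, 0.2, n)

def denoise(v, wavelet, thr):
    coeffs = pywt.wavedec(v, wavelet)
    coeffs = [coeffs[0]] + [pywt.threshold(c, thr, mode="soft")
                            for c in coeffs[1:]]
    return pywt.waverec(coeffs, wavelet)[: len(v)]

def cv_error(wavelet, thr):
    fit_even = denoise(y[0::2], wavelet, thr)            # fit on even samples
    pred_odd = 0.5 * (fit_even[:-1] + fit_even[1:])      # predict odd samples
    return np.mean((pred_odd - y[1::2][: len(pred_odd)]) ** 2)

grid = [(w, thr) for w in ("db2", "db4", "sym8") for thr in (0.1, 0.2, 0.4, 0.8)]
best = min(grid, key=lambda g: cv_error(*g))
print("selected basis and threshold:", best)
```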
Korany, Mohamed A; Maher, Hadir M; Galal, Shereen M; Fahmy, Ossama T; Ragab, Marwa A A
2010-11-15
This manuscript discusses the application of chemometrics to the handling of HPLC response data using the internal standard method (ISM). This was performed on a model mixture containing terbutaline sulphate, guaiphenesin, bromhexine HCl, sodium benzoate and propylparaben as an internal standard. Derivative treatment of the chromatographic response data of analyte and internal standard was followed by convolution of the resulting derivative curves using 8-point sin(x) polynomials (discrete Fourier functions). The response of each analyte signal, its corresponding derivative and convoluted derivative data were divided by those of the internal standard to obtain the corresponding ratio data. This was found beneficial in eliminating different types of interference. It was successfully applied to handle some of the most common chromatographic problems and non-ideal conditions, namely overlapping chromatographic peaks and very low analyte concentrations. For example, in the case of overlapping peaks, the correlation coefficient of sodium benzoate improved from 0.9975 to 0.9998 on moving from the conventional peak-area method to the first-derivative-under-Fourier-functions method. A significant improvement in precision and accuracy for the determination of synthetic mixtures and dosage forms in non-ideal cases was also achieved. For example, in the case of overlapping peaks, the mean recovery% and RSD% of guaiphenesin went from 91.57 and 9.83 to 100.04 and 0.78 on moving from the conventional peak-area method to the first-derivative-under-Fourier-functions method. This work also compares the application of Theil's method, a non-parametric regression method, in handling the response ratio data, with the least squares parametric regression method, which is considered the de facto standard method for regression. Theil's method was found to be superior to the method of least squares as it assumes that errors could occur in both x- and y-directions and
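Theil's estimator, the median of pairwise slopes, is available in SciPy; the toy below shows its robustness to a single gross error relative to least squares:

```python
import numpy as np
from scipy.stats import linregress, theilslopes

rng = np.random.default_rng(6)
x = np.linspace(1, 10, 20)                    # e.g. analyte concentrations
y = 0.8 * x + 0.2 + rng.normal(0, 0.05, x.size)
y[15] += 3.0                                  # one corrupted response value

ols = linregress(x, y)
slope, intercept, lo, hi = theilslopes(y, x)  # default 95% slope interval
print(f"OLS slope   {ols.slope:.3f}")         # pulled away from the true 0.8
print(f"Theil slope {slope:.3f}, 95% CI [{lo:.3f}, {hi:.3f}]")
```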
Briggs, D J; de Hoogh, C; Gulliver, J; Wills, J; Elliott, P; Kingham, S; Smallbone, K
2000-05-15
Accurate, high-resolution maps of traffic-related air pollution are needed both as a basis for assessing exposures in epidemiological studies and to inform urban air-quality policy and traffic management. This paper assesses the use of a GIS-based regression mapping technique to model spatial patterns of traffic-related air pollution. The model--developed using data from 80 passive sampler sites in Huddersfield, as part of the SAVIAH (Small Area Variations in Air Quality and Health) project--uses data on traffic flows and land cover in the 300-m buffer zone around each site, and altitude of the site, as predictors of NO2 concentrations. It was tested here by application in four urban areas in the UK: Huddersfield (for the year following that used for initial model development), Sheffield, Northampton, and part of London. In each case, a GIS was built in ArcInfo, integrating relevant data on road traffic, urban land use and topography. Monitoring of NO2 was undertaken using replicate passive samplers (in London, data were obtained from surveys carried out as part of the London network). In Huddersfield, Sheffield and Northampton, the model was first calibrated by comparing modelled results with monitored NO2 concentrations at 10 randomly selected sites; the calibrated model was then validated against data from a further 10-28 sites. In London, where data for only 11 sites were available, validation was not undertaken. Results showed that the model performed well in all cases. After local calibration, the model gave estimates of mean annual NO2 concentrations within a factor of 1.5 of the actual mean approximately 70-90% of the time, and within a factor of 2 between 70 and 100% of the time. r² values between modelled and observed concentrations are in the range 0.58-0.76. These results are comparable to those achieved by more sophisticated dispersion models. The model also has several advantages over dispersion modelling. It is able, for example, to provide
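The factor-of-n validation statistics quoted above are straightforward to reproduce. The sketch below computes them for hypothetical observed and modelled annual-mean NO2 values:

```python
import numpy as np

def validation_stats(observed, modelled):
    """Fraction of sites within a factor of 1.5 and 2, plus r-squared."""
    ratio = modelled / observed
    within_1_5 = np.mean((ratio > 1 / 1.5) & (ratio < 1.5)) * 100
    within_2 = np.mean((ratio > 0.5) & (ratio < 2.0)) * 100
    r2 = np.corrcoef(observed, modelled)[0, 1] ** 2
    return within_1_5, within_2, r2

# Hypothetical annual-mean NO2 (ug/m3) at 10 reserved monitoring sites.
obs = np.array([28.1, 35.4, 22.7, 41.0, 30.2, 25.5, 38.9, 33.3, 27.8, 44.6])
mod = np.array([25.0, 38.2, 20.1, 36.5, 33.0, 28.9, 35.1, 30.7, 31.2, 40.3])

f15, f2, r2 = validation_stats(obs, mod)
print(f"within factor 1.5: {f15:.0f}%, within factor 2: {f2:.0f}%, r2 = {r2:.2f}")
```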
Pereira, L F P; Adeola, O
2016-09-01
The energy and phosphorus values of sunflower meal (SFM) and rice bran (RB) were determined in two experiments with Ross 708 broiler chickens from 15 to 22 d of age. In Exp. 1, the diets consisted of a corn-soybean meal reference diet (RD) and 4 test diets (TD). The TD consisted of SFM and RB that partly replaced the energy sources in the RD at 100 or 200 g/kg and 75 or 150 g/kg, respectively, such that equal ratios were maintained among all energy-containing ingredients across all experimental diets. In Exp. 2, a cornstarch-soybean meal diet was the RD, and the TD consisted of SFM and RB that partly replaced cornstarch in the RD at 100 or 200 g/kg and 60 or 120 g/kg, respectively. Addition of SFM and RB to the RD in Exp. 1 linearly decreased (P < 0.01) the digestibility coefficients of DM and energy, ileal digestible energy (IDE), the metabolizability coefficients of DM, nitrogen (N) and energy, N-corrected energy, metabolizable energy (ME), and nitrogen-corrected ME. Except for RB, the increased levels of the test ingredients in the RD did affect the metabolizability coefficients of N. The IDE values (kcal/kg DM) were 1,953 for SFM and 2,498 for RB; ME values (kcal/kg DM) were 1,893 for SFM and 2,683 for RB; and MEn values (kcal/kg DM) were 1,614 for SFM and 2,476 for RB. In Exp. 2, there was a linear relationship between phosphorus (P) intake and ileal P output for diets with increased levels of SFM and RB. In addition, there was a linear relationship between P intake and P digestibility and retention for diets with increased levels of SFM. There was a quadratic effect (P < 0.01) for digestible P and a tendency toward a quadratic effect (P = 0.07) for total-tract retained P in the RB diets. The P digestibility and total-tract P retention from regression analyses for SFM were 46% and 38%, respectively. PMID:26976902
Fast Censored Linear Regression
HUANG, YIJIAN
2013-01-01
The weighted log-rank estimating function has become a standard estimation method for the censored linear regression model, also known as the accelerated failure time model. Although well established statistically, the estimator, defined as a consistent root, has rather poor computational properties because the estimating function is neither continuous nor, in general, monotone. We propose a computationally efficient estimator through an asymptotics-guided Newton algorithm, in which censored quantile regression methods are tailored to yield an initial consistent estimate and a consistent derivative estimate of the limiting estimating function. We also develop fast interval estimation with a new proposal for sandwich variance estimation. The proposed estimator is asymptotically equivalent to the consistent root estimator and barely distinguishable in samples of practical size. However, computation time is typically reduced by two to three orders of magnitude for point estimation alone. Illustrations with clinical applications are provided. PMID:24347802
Song, Dong; Wang, Haonan; Tu, Catherine Y.; Marmarelis, Vasilis Z.; Hampson, Robert E.; Deadwyler, Sam A.; Berger, Theodore W.
2013-01-01
One key problem in computational neuroscience and neural engineering is the identification and modeling of functional connectivity in the brain using spike train data. To reduce model complexity, alleviate overfitting, and thus facilitate model interpretation, sparse representation and estimation of functional connectivity is needed. Sparsities include global sparsity, which captures the sparse connectivities between neurons, and local sparsity, which reflects the active temporal ranges of the input-output dynamical interactions. In this paper, we formulate a generalized functional additive model (GFAM) and develop the associated penalized likelihood estimation methods for such a modeling problem. A GFAM consists of a set of basis functions convolving the input signals, and a link function generating the firing probability of the output neuron from the summation of the convolutions weighted by the sought model coefficients. Model sparsities are achieved by using various penalized likelihood estimations and basis functions. Specifically, we introduce two variations of the GFAM using a global basis (e.g., Laguerre basis) and group LASSO estimation, and a local basis (e.g., B-spline basis) and group bridge estimation, respectively. We further develop an optimization method based on quadratic approximation of the likelihood function for the estimation of these models. Simulation and experimental results show that both group-LASSO-Laguerre and group-bridge-B-spline can capture faithfully the global sparsities, while the latter can replicate accurately and simultaneously both global and local sparsities. The sparse models outperform the full models estimated with the standard maximum likelihood method in out-of-sample predictions. PMID:23674048
Splines for Diffeomorphic Image Regression
Singh, Nikhil; Niethammer, Marc
2016-01-01
This paper develops a method for splines on diffeomorphisms for image regression. In contrast to previously proposed methods to capture image changes over time, such as geodesic regression, the method can capture more complex spatio-temporal deformations. In particular, it is a first step towards capturing periodic motions for example of the heart or the lung. Starting from a variational formulation of splines the proposed approach allows for the use of temporal control points to control spline behavior. This necessitates the development of a shooting formulation for splines. Experimental results are shown for synthetic and real data. The performance of the method is compared to geodesic regression. PMID:25485370
Linear regression in astronomy. II
NASA Technical Reports Server (NTRS)
Feigelson, Eric D.; Babu, Gutti J.
1992-01-01
A wide variety of least-squares linear regression procedures used in observational astronomy, particularly investigations of the cosmic distance scale, are presented and discussed. The classes of linear models considered are (1) unweighted regression lines, with bootstrap and jackknife resampling; (2) regression solutions when measurement error, in one or both variables, dominates the scatter; (3) methods to apply a calibration line to new data; (4) truncated regression models, which apply to flux-limited data sets; and (5) censored regression models, which apply when nondetections are present. For the calibration problem we develop two new procedures: a formula for the intercept offset between two parallel data sets, which propagates slope errors from one regression to the other; and a generalization of the Working-Hotelling confidence bands to nonstandard least-squares lines. They can provide improved error analysis for Faber-Jackson, Tully-Fisher, and similar cosmic distance scale relations.
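Class (1) above, an unweighted line with bootstrap resampling of the (x, y) pairs, can be sketched in a few lines on synthetic data standing in for, say, a Tully-Fisher-like relation:

```python
import numpy as np

rng = np.random.default_rng(7)
n = 50
x = rng.uniform(1.8, 2.6, n)                  # e.g. log line width
y = -5.0 + 7.0 * x + rng.normal(0, 0.4, n)    # e.g. absolute magnitude

slope, intercept = np.polyfit(x, y, 1)        # unweighted least-squares line

boot = np.empty((2000, 2))
for b in range(boot.shape[0]):                # resample pairs with replacement
    i = rng.integers(0, n, n)
    boot[b] = np.polyfit(x[i], y[i], 1)

se_slope, se_intercept = boot.std(axis=0)
print(f"slope {slope:.2f} +/- {se_slope:.2f}, "
      f"intercept {intercept:.2f} +/- {se_intercept:.2f}")
```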
Quantile regression for climate data
NASA Astrophysics Data System (ADS)
Marasinghe, Dilhani Shalika
Quantile regression is a developing statistical tool used to explain the relationship between response and predictor variables. This thesis describes two applications of quantile regression in climatology. Our main goal is to estimate derivatives of a conditional mean and/or conditional quantile function. We introduce a method to handle autocorrelation in the framework of quantile regression and apply it to temperature data. We also examine some properties of tornado data, which are non-normally distributed. Even though quantile regression provides a more comprehensive view, when the residuals satisfy the normality and constant-variance assumptions we would prefer least squares regression, as in our temperature analysis. When the normality and constant-variance assumptions fail, quantile regression is the better candidate for estimating the derivative.
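A minimal example of conditional-quantile estimation with statsmodels on heteroscedastic synthetic data, where the lower, median and upper quantile slopes differ:

```python
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(8)
n = 500
x = rng.uniform(0, 10, n)                                 # e.g. a climate covariate
y = 15 + 0.5 * x + (0.2 + 0.15 * x) * rng.normal(size=n)  # spread grows with x

X = sm.add_constant(x)
for q in (0.1, 0.5, 0.9):
    res = sm.QuantReg(y, X).fit(q=q)
    print(f"q={q}: intercept {res.params[0]:.2f}, slope {res.params[1]:.2f}")
```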
Fungible weights in logistic regression.
Jones, Jeff A; Waller, Niels G
2016-06-01
In this article we develop methods for assessing parameter sensitivity in logistic regression models. To set the stage for this work, we first review Waller's (2008) equations for computing fungible weights in linear regression. Next, we describe 2 methods for computing fungible weights in logistic regression. To demonstrate the utility of these methods, we compute fungible logistic regression weights using data from the Centers for Disease Control and Prevention's (2010) Youth Risk Behavior Surveillance Survey, and we illustrate how these alternate weights can be used to evaluate parameter sensitivity. To make our work accessible to the research community, we provide R code (R Core Team, 2015) that will generate both kinds of fungible logistic regression weights. PMID:26651981
Liu, Jin-Xing; Liu, Jian; Gao, Ying-Lian; Mi, Jian-Xun; Ma, Chun-Xia; Wang, Dong
2014-01-01
In terms of making gene expression data more interpretable and comprehensible, sparse methods have a significant advantage. Many sparse methods, such as penalized matrix decomposition (PMD) and sparse principal component analysis (SPCA), have been applied to extract the core genes of plants. Supervised algorithms, especially the support vector machine-recursive feature elimination (SVM-RFE) method, also perform well in gene selection. In this paper, we incorporate class information via the total scatter matrix and put forward a class-information-based penalized matrix decomposition (CIPMD) method to improve the gene identification performance of the PMD-based method. Firstly, the total scatter matrix is obtained from the different samples of the gene expression data. Secondly, a new data matrix is constructed by decomposing the total scatter matrix. Thirdly, the new data matrix is decomposed by PMD to obtain the sparse eigensamples. Finally, the core genes are identified according to the nonzero entries in the eigensamples. Results on simulated data show that the CIPMD method reaches higher identification accuracy than conventional gene identification methods. Moreover, results on real gene expression data demonstrate that CIPMD identifies more core genes closely related to abiotic stresses than the other methods. PMID:25180509
Precision Efficacy Analysis for Regression.
ERIC Educational Resources Information Center
Brooks, Gordon P.
When multiple linear regression is used to develop a prediction model, sample size must be large enough to ensure stable coefficients. If the derivation sample size is inadequate, the model may not predict well for future subjects. The precision efficacy analysis for regression (PEAR) method uses a cross-validity approach to select sample sizes…
Technology Transfer Automated Retrieval System (TEKTRAN)
In fiber length measurement by the rapid method of testing fiber beards instead of testing individual fibers, only the fiber portion projected from the fiber clamp can be measured. The length distribution of the projecting portion is very different from that of the original sample. The Part 1 pape...
NASA Astrophysics Data System (ADS)
Ozdemir, Adnan
2011-07-01
Summary: The purpose of this study is to produce a groundwater spring potential map of the Sultan Mountains in central Turkey, based on a logistic regression method within a Geographic Information System (GIS) environment. Using field surveys, the locations of 440 springs were determined in the study area. Seventeen spring-related factors were used in the analysis: geology, relative permeability, land use/land cover, precipitation, elevation, slope, aspect, total curvature, plan curvature, profile curvature, wetness index, stream power index, sediment transport capacity index, distance to drainage, distance to fault, drainage density, and fault density. The coefficients of the predictor variables were estimated using binary logistic regression analysis and were used to calculate the groundwater spring potential for the entire study area. The accuracy of the final spring potential map was evaluated against the observed springs by calculating the relative operating characteristic (ROC) curve; the area under the curve was found to be 0.82. These results indicate that the model is a good estimator of spring potential in the study area. The spring potential map shows that the areas of very low, low, moderate and high groundwater spring potential classes are 105.586 km² (28.99%), 74.271 km² (19.906%), 101.203 km² (27.14%), and 90.05 km² (24.671%), respectively. Interpretation of the potential map showed that stream power index, relative permeability of lithologies, geology, elevation, aspect, wetness index, plan curvature, and drainage density play major roles in spring occurrence and distribution in the Sultan Mountains. The logistic regression approach had not previously been used to delineate groundwater potential zones; in this study, it was used to locate potential zones for groundwater springs in the Sultan Mountains. The evolved model
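The modelling step reduces to fitting a binary logistic regression on per-cell predictor values and scoring the resulting potential map by the area under the ROC curve. A toy sketch with a few hypothetical predictor layers:

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(9)
n = 2000                                    # grid cells
X = rng.normal(size=(n, 4))                 # e.g. slope, elevation, wetness, density
logit = -1.0 + 1.2 * X[:, 2] - 0.8 * X[:, 0] + 0.5 * X[:, 3]
spring = rng.binomial(1, 1 / (1 + np.exp(-logit)))   # 1 = spring present

Xtr, Xte, ytr, yte = train_test_split(X, spring, test_size=0.3, random_state=0)
clf = LogisticRegression().fit(Xtr, ytr)
potential = clf.predict_proba(Xte)[:, 1]    # spring potential score per cell
print("ROC area:", round(roc_auc_score(yte, potential), 2))
```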
NASA Astrophysics Data System (ADS)
Ozdemir, Adnan
2011-12-01
Summary: In this study, groundwater spring potential maps produced by three different methods, frequency ratio, weights of evidence, and logistic regression, were evaluated using validation data sets and compared to each other. Groundwater spring occurrence potential maps in the Sultan Mountains (Konya, Turkey) were constructed using the relationship between groundwater spring locations and their causative factors. Groundwater spring locations were identified in the study area from a topographic map. Different thematic maps of the study area, such as geology, topography, geomorphology, hydrology, and land use/cover, have been used to identify groundwater potential zones. Seventeen spring-related parameter layers of the entire study area were used to generate groundwater spring potential maps. These are geology (lithology), fault density, distance to fault, relative permeability of lithologies, elevation, slope aspect, slope steepness, curvature, plan curvature, profile curvature, topographic wetness index, stream power index, sediment transport capacity index, drainage density, distance to drainage, land use/cover, and precipitation. The predictive capability of each model was determined by the area under the relative operating characteristic curve. The areas under the curve for frequency ratio, weights of evidence and logistic regression methods were calculated as 0.903, 0.880, and 0.840, respectively. These results indicate that frequency ratio and weights of evidence models are relatively good estimators, whereas logistic regression is a relatively poor estimator of groundwater spring potential mapping in the study area. The frequency ratio model is simple; the process of input, calculation and output can be readily understood. The produced groundwater spring potential maps can serve planners and engineers in groundwater development plans and land-use planning.
NASA Astrophysics Data System (ADS)
Hegazy, Maha A.; Lotfy, Hayam M.; Rezk, Mamdouh R.; Omran, Yasmin Rostom
2015-04-01
Smart and novel spectrophotometric and chemometric methods have been developed and validated for the simultaneous determination of a binary mixture of chloramphenicol (CPL) and dexamethasone sodium phosphate (DSP) in the presence of interfering substances without prior separation. The first method depends upon derivative subtraction coupled with constant multiplication. The second is the ratio difference method at optimum wavelengths, which were selected after applying a derivative transformation method via multiplication by a decoding spectrum in order to cancel the contribution of non-labeled interfering substances. The third method relies on partial least squares with regression model updating. The methods are so simple that they do not require any preliminary separation steps. Accuracy, precision and linearity ranges of these methods were determined, and specificity was assessed by analyzing synthetic mixtures of both drugs. The proposed methods were successfully applied to the analysis of both drugs in their pharmaceutical formulation. The obtained results have been statistically compared to those of an official spectrophotometric method, leading to the conclusion that there is no significant difference between the proposed methods and the official one with respect to accuracy and precision.
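The core of the third method, PLS calibration of overlapping spectra, can be sketched with scikit-learn on synthetic two-component mixtures; band positions, concentrations and noise levels below are invented:

```python
import numpy as np
from sklearn.cross_decomposition import PLSRegression

rng = np.random.default_rng(10)
wl = np.linspace(220, 320, 101)                          # wavelengths, nm
band1 = np.exp(-0.5 * ((wl - 260) / 12) ** 2)            # component 1 band
band2 = np.exp(-0.5 * ((wl - 275) / 15) ** 2)            # overlapping band 2

C = rng.uniform(0.1, 1.0, size=(30, 2))                  # concentrations
A = C @ np.vstack([band1, band2]) + rng.normal(0, 0.005, (30, wl.size))

pls = PLSRegression(n_components=2).fit(A[:20], C[:20])  # calibration set
pred = pls.predict(A[20:])                               # validation set
rmsep = np.sqrt(np.mean((pred - C[20:]) ** 2))
print("RMSEP:", round(rmsep, 4))
```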
Steganalysis using logistic regression
NASA Astrophysics Data System (ADS)
Lubenko, Ivans; Ker, Andrew D.
2011-02-01
We advocate Logistic Regression (LR) as an alternative to the Support Vector Machine (SVM) classifiers commonly used in steganalysis. LR offers more information than traditional SVM methods: it estimates class probabilities as well as providing a simple classification, and it can be adapted more easily and efficiently for multiclass problems. Like SVM, LR can be kernelised for nonlinear classification, and it shows comparable classification accuracy to SVM methods. This work is a case study comparing the accuracy and speed of SVM and LR classifiers in the detection of LSB Matching and other related spatial-domain image steganography, through the state-of-the-art 686-dimensional SPAM feature set, on three image sets.
Eriksson, Lennart; Jaworska, Joanna; Worth, Andrew P; Cronin, Mark T D; McDowell, Robert M; Gramatica, Paola
2003-01-01
This article provides an overview of methods for reliability assessment of quantitative structure-activity relationship (QSAR) models in the context of regulatory acceptance of human health and environmental QSARs. Useful diagnostic tools and data analytical approaches are highlighted and exemplified. Particular emphasis is given to the question of how to define the applicability borders of a QSAR and how to estimate parameter and prediction uncertainty. The article ends with a discussion regarding QSAR acceptability criteria. This discussion contains a list of recommended acceptability criteria, and we give reference values for important QSAR performance statistics. Finally, we emphasize that rigorous and independent validation of QSARs is an essential step toward their regulatory acceptance and implementation. PMID:12896860
Lee, Myung Hee; Liu, Yufeng
2013-12-01
The continuum regression technique provides an appealing regression framework connecting ordinary least squares, partial least squares and principal component regression in one family. It offers some insight on the underlying regression model for a given application. Moreover, it helps to provide deep understanding of various regression techniques. Despite the useful framework, however, the current development on continuum regression is only for linear regression. In many applications, nonlinear regression is necessary. The extension of continuum regression from linear models to nonlinear models using kernel learning is considered. The proposed kernel continuum regression technique is quite general and can handle very flexible regression model estimation. An efficient algorithm is developed for fast implementation. Numerical examples have demonstrated the usefulness of the proposed technique. PMID:24058224
A penalized likelihood approach for robust estimation of isoform expression
2016-01-01
Ultra high-throughput sequencing of transcriptomes (RNA-Seq) has enabled the accurate estimation of gene expression at individual isoform level. However, systematic biases introduced during the sequencing and mapping processes as well as incompleteness of the transcript annotation databases may cause the estimates of isoform abundances to be unreliable, and in some cases, highly inaccurate. This paper introduces a penalized likelihood approach to detect and correct for such biases in a robust manner. Our model extends those previously proposed by introducing bias parameters for reads. An L1 penalty is used for the selection of non-zero bias parameters. We introduce an efficient algorithm for model fitting and analyze the statistical properties of the proposed model. Our experimental studies on both simulated and real datasets suggest that the model has the potential to improve isoform-specific gene expression estimates and identify incompletely annotated gene models.
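A small Gaussian caricature of the idea, a linear abundance model plus a sparse, L1-penalized bias term, is sketched below. The paper's model is likelihood-based with read-level bias parameters, which this sketch does not reproduce; it only mimics the L1 selection of non-zero biases, splitting the penalized term into positive and negative parts so a smooth bounded solver applies:

```python
import numpy as np
from scipy.optimize import minimize

rng = np.random.default_rng(11)
A = np.array([[1, 0], [1, 1], [0, 1], [1, 1], [1, 0]], float)  # exon x isoform
theta_true = np.array([3.0, 1.5])                              # abundances
bias_true = np.array([0, 0, 0, 2.0, 0])                        # one biased exon
y = A @ theta_true + bias_true + rng.normal(0, 0.05, 5)        # observed coverage

lam = 0.5
m, k = A.shape

def objective(z):                      # z = [theta, b_plus, b_minus], all >= 0
    theta, bp, bm = z[:k], z[k:k + m], z[k + m:]
    resid = y - A @ theta - (bp - bm)
    return resid @ resid + lam * (bp.sum() + bm.sum())   # L1 on the bias term

res = minimize(objective, x0=np.ones(k + 2 * m), method="L-BFGS-B",
               bounds=[(0, None)] * (k + 2 * m))
theta_hat = res.x[:k]
bias_hat = res.x[k:k + m] - res.x[k + m:]
print("abundances:", np.round(theta_hat, 2), "bias:", np.round(bias_hat, 2))
```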
The Penal Code (Amendment) Act 1989 (Act A727), 1989.
1989-01-01
In 1989, Malaysia amended its penal code to provide that inducing an abortion is not an offense if the procedure is performed by a registered medical practitioner who has determined that continuation of the pregnancy would risk the life of the woman or damage her mental or physical health. Additional amendments include a legal description of the conditions which constitute the act of rape. Among these conditions is intercourse with or without consent with a woman under the age of 16. Malaysia fails to recognize rape within a marriage unless the woman is protected from her husband by judicial decree or is living separately from her husband according to Muslim custom. Rape is punishable by imprisonment for a term of 5-20 years and by whipping. PMID:12344384
A Penalized Latent Class Model for Ordinal Data
Houseman, E. Andrés; Coull, Brent A.; Stemmer-Rachamimov, Anat; Betensky, Rebecca A.
2016-01-01
Latent class models provide a useful framework for clustering observations based on several features. Application of latent class methodology to correlated, high-dimensional ordinal data poses many challenges. Unconstrained analyses may not result in an estimable model. Thus, information contained in ordinal variables may not be fully exploited by researchers. We develop a penalized latent class model to facilitate analysis of high-dimensional ordinal data. By stabilizing maximum likelihood estimation, we are able to fit an ordinal latent class model that would otherwise not be identifiable without application of strict constraints. We illustrate our methodology in a study of schwannoma, a peripheral nerve sheath tumor, that included three clinical subtypes and 23 ordinal histological measures. PMID:17626225
NASA Astrophysics Data System (ADS)
Ozdemir, Adnan; Altural, Tolga
2013-03-01
This study evaluated and compared landslide susceptibility maps produced with three different methods, frequency ratio, weights of evidence, and logistic regression, by using validation datasets. The field surveys performed as part of this investigation mapped the locations of 90 landslides that had been identified in the Sultan Mountains of south-western Turkey. The landslide influence parameters used for this study are geology, relative permeability, land use/land cover, precipitation, elevation, slope, aspect, total curvature, plan curvature, profile curvature, wetness index, stream power index, sediment transportation capacity index, distance to drainage, distance to fault, drainage density, fault density, and spring density maps. The relationships between landslide distributions and these parameters were analysed using the three methods, and the results of these methods were then used to calculate the landslide susceptibility of the entire study area. The accuracy of the final landslide susceptibility maps was evaluated based on the landslides observed during the fieldwork, and the accuracy of the models was evaluated by calculating each model's relative operating characteristic curve. The predictive capability of each model was determined from the area under the relative operating characteristic curve and the areas under the curves obtained using the frequency ratio, logistic regression, and weights of evidence methods are 0.976, 0.952, and 0.937, respectively. These results indicate that the frequency ratio and weights of evidence models are relatively good estimators of landslide susceptibility in the study area. Specifically, the results of the correlation analysis show a high correlation between the frequency ratio and weights of evidence results, and the frequency ratio and logistic regression methods exhibit correlation coefficients of 0.771 and 0.727, respectively. The frequency ratio model is simple, and its input, calculation and output processes are
Pathway-gene identification for pancreatic cancer survival via doubly regularized Cox regression
2014-01-01
Background: Recent global genomic analyses identified 69 gene sets and 12 core signaling pathways genetically altered in pancreatic cancer, which is a highly malignant disease. A comprehensive understanding of the genetic signatures and signaling pathways that are directly correlated to pancreatic cancer survival will help cancer researchers to develop effective multi-gene targeted, personalized therapies for pancreatic cancer patients at different stages. A previous work that applied a LASSO penalized regression method, which only considered individual genetic effects, identified 12 genes associated with pancreatic cancer survival. Results: In this work, we integrate pathway information into pancreatic cancer survival analysis. We introduce and apply a doubly regularized Cox regression model to identify both genes and signaling pathways related to pancreatic cancer survival. Conclusions: Four signaling pathways, including ion transport, immune phagocytosis, TGFβ (spermatogenesis), and regulation of DNA-dependent transcription pathways, and 15 genes within the four pathways are identified and verified to be directly correlated to pancreatic cancer survival. Our findings can help cancer researchers design new strategies for the early detection and diagnosis of pancreatic cancer. PMID:24565114
Naguib, Ibrahim A; Abdelaleem, Eglal A; Zaazaa, Hala E; Hussein, Essraa A
2016-07-01
Two multivariate chemometric models, namely, partial least-squares regression (PLSR) and linear support vector regression (SVR), are presented for the analysis of amoxicillin trihydrate and dicloxacillin sodium in the presence of their common impurity (6-aminopenicillanic acid) in raw materials and in pharmaceutical dosage form via handling UV spectral data and making a modest comparison between the two models, highlighting the advantages and limitations of each. For optimum analysis, a three-factor, four-level experimental design was established, resulting in a training set of 16 mixtures containing different ratios of interfering species. To validate the prediction ability of the suggested models, an independent test set consisting of eight mixtures was used. The presented results show the ability of the two proposed models to determine the two drugs simultaneously in the presence of small levels of the common impurity with high accuracy and selectivity. The analysis results of the dosage form were statistically compared to a reported HPLC method, with no significant difference regarding accuracy and precision, indicating the ability of the suggested multivariate calibration models to be reliable and suitable for routine analysis of the drug product. Compared to the PLSR model, the SVR model gives more accurate results with a lower prediction error, as well as high generalization ability; however, the PLSR model is easy to handle and fast to optimize. PMID:27305461
Building Regression Models: The Importance of Graphics.
ERIC Educational Resources Information Center
Dunn, Richard
1989-01-01
Points out reasons for using graphical methods to teach simple and multiple regression analysis. Argues that a graphically oriented approach has considerable pedagogic advantages in the exposition of simple and multiple regression. Shows that graphical methods may play a central role in the process of building regression models. (Author/LS)
Li, Yusheng
2011-02-21
Iterative reconstruction algorithms have been widely used in PET and SPECT emission tomography. Accurate modeling of photon noise propagation is crucial for quantitative tomography applications. Iteration-based noise propagation methods have been developed for only a few algorithms that have explicit multiplicative update equations. And there are discrepancies between the iteration-based methods and Fessler's fixed-point method because of improper approximations. In this paper, we present a unified theoretical prediction of noise propagation for any penalized expectation maximization (EM) algorithm where the EM approach incorporates a penalty term. The proposed method does not require an explicit update equation. The update equation is assumed to be implicitly defined by a differential equation of a surrogate function. We derive the expressions using the implicit function theorem, Taylor series and the chain rule from vector calculus. We also derive the fixed-point expressions when iterative algorithms converge and show the consistency between the proposed method and the fixed-point method. These expressions are solely defined in terms of the partial derivatives of the surrogate function and the Fisher information matrices. We also apply the theoretical noise predictions for iterative reconstruction algorithms in emission tomography. Finally, we validate the theoretical predictions for MAP-EM and OSEM algorithms using Monte Carlo simulations with Jaszczak-like and XCAT phantoms, respectively. PMID:21263172
Seyedmahmoud, Rasoul; Mozetic, Pamela; Rainer, Alberto; Giannitelli, Sara Maria; Basoli, Francesco; Trombetta, Marcella; Traversa, Enrico; Licoccia, Silvia; Rinaldi, Antonio
2015-01-01
This two-article series presents an in-depth discussion of electrospun poly-L-lactide scaffolds for tissue engineering by means of statistical methodologies that can be used, in general, to gain quantitative and systematic insight about effects and interactions between a handful of key scaffold properties (Ys) and a set of process parameters (Xs) in electrospinning. While Part 1 dealt with the DOE methods used to unveil the interactions between Xs in determining the morphomechanical properties (Y₁₋₄), this Part 2 article continues and refocuses the discussion on the interdependence of scaffold properties, investigated by standard regression methods. The discussion first explores the connection between mechanical properties (Y₄) and morphological descriptors of the scaffolds (Y₁₋₃) in 32 types of scaffolds, finding that the mean fiber diameter (Y₁) plays a predominant role that is nonetheless crucially modulated by the molecular weight (MW) of PLLA. The second part examines the biological performance (Y₅) (i.e., the proliferation of seeded bone marrow-derived mesenchymal stromal cells) on a random subset of eight scaffolds vs. the mechanomorphological properties (Y₁₋₄). In this case, the featured regression analysis on such an incomplete set was not conclusive, indirectly suggesting in quantitative terms that cell proliferation cannot be fully explained as a function of the considered mechanomorphological properties (Y₁₋₄) except at the early seeding stage, and that a randomization effect occurs over time, such that differences in initial cell proliferation performance (at day 1) are smeared out over time. These findings may be the cornerstone of a novel route to accrue sufficient understanding and establish design rules for scaffold biofunctionality vs. architecture, mechanical properties, and process parameters. PMID:24668730
Network-guided sparse regression modeling for detection of gene-by-gene interactions
Lu, Chen; Latourelle, Jeanne; O’Connor, George T.; Dupuis, Josée; Kolaczyk, Eric D.
2013-01-01
Motivation: Genetic variants identified by genome-wide association studies to date explain only a small fraction of total heritability. Gene-by-gene interaction is one important potential source of unexplained total heritability. We propose a novel approach to detect such interactions that uses penalized regression and sparse estimation principles, and incorporates outside biological knowledge through a network-based penalty. Results: We tested our new method on simulated and real data. Simulation showed that with reasonable outside biological knowledge, our method performs noticeably better than stage-wise strategies (i.e. selecting main effects first, and interactions second, from those main effects selected) in finding true interactions, especially when the marginal strength of main effects is weak. We applied our method to Framingham Heart Study data on total plasma immunoglobulin E (IgE) concentrations and found a number of interactions among different classes of human leukocyte antigen genes that may interact to influence the risk of developing IgE dysregulation and allergy. Availability: The proposed method is implemented in R and available at http://math.bu.edu/people/kolaczyk/software.html. Contact: chenlu@bu.edu Supplementary information: Supplementary data are available at Bioinformatics online. PMID:23599501
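How outside network knowledge can enter a penalty is easiest to see in a ridge-style version, beta' L beta with L the Laplacian of an assumed gene network; the sketch below omits the sparsity component of the published method and uses an invented three-edge network:

```python
import numpy as np

rng = np.random.default_rng(12)
n, p = 100, 6
edges = [(0, 1), (1, 2), (3, 4)]             # assumed prior network
adj = np.zeros((p, p))
for i, j in edges:
    adj[i, j] = adj[j, i] = 1
L = np.diag(adj.sum(axis=1)) - adj           # graph Laplacian

X = rng.normal(size=(n, p))
beta_true = np.array([1.0, 0.9, 1.1, 0.0, 0.0, 0.0])   # smooth on {0, 1, 2}
y = X @ beta_true + rng.normal(0, 1, n)

lam = 5.0                                    # penalty weight
beta_hat = np.linalg.solve(X.T @ X + lam * L, X.T @ y)
print(np.round(beta_hat, 2))                 # linked coefficients shrink together
```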
Computational Techniques for Spatial Logistic Regression with Large Datasets
Paciorek, Christopher J.
2007-01-01
In epidemiological research, outcomes are frequently non-normal, sample sizes may be large, and effect sizes are often small. To relate health outcomes to geographic risk factors, fast and powerful methods for fitting spatial models, particularly for non-normal data, are required. I focus on binary outcomes, with the risk surface a smooth function of space, but the development herein is relevant for non-normal data in general. I compare penalized likelihood models, including the penalized quasi-likelihood (PQL) approach, and Bayesian models based on fit, speed, and ease of implementation. A Bayesian model using a spectral basis representation of the spatial surface via the Fourier basis provides the best tradeoff of sensitivity and specificity in simulations, detecting real spatial features while limiting overfitting and being reasonably computationally efficient. One of the contributions of this work is further development of this underused representation. The spectral basis model outperforms the penalized likelihood methods, which are prone to overfitting, but is slower to fit and not as easily implemented. A Bayesian Markov random field model performs less well statistically than the spectral basis model, but is very computationally efficient. We illustrate the methods on a real dataset of cancer cases in Taiwan. The success of the spectral basis with binary data and similar results with count data suggest that it may be generally useful in spatial models and more complicated hierarchical models. PMID:18443645
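The spectral-basis idea can be caricatured by expanding a smooth spatial risk surface in a few low-frequency Fourier terms and fitting a ridge-penalized logistic regression; the Bayesian machinery of the paper is omitted, and the basis here is additive in the two coordinates rather than a full tensor product:

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(13)
n = 3000
s = rng.uniform(0, 1, size=(n, 2))                       # spatial locations
risk = 0.6 * np.sin(2 * np.pi * s[:, 0]) + 0.4 * np.cos(2 * np.pi * s[:, 1])
y = rng.binomial(1, 1 / (1 + np.exp(-(risk - 0.5))))     # binary outcomes

def fourier_basis(s, K=3):
    cols = []
    for k in range(1, K + 1):                            # low frequencies only
        for d in (0, 1):
            cols += [np.sin(2 * np.pi * k * s[:, d]),
                     np.cos(2 * np.pi * k * s[:, d])]
    return np.column_stack(cols)

B = fourier_basis(s)
clf = LogisticRegression(C=1.0, max_iter=1000).fit(B, y)   # L2 = ridge penalty
print("leading surface coefficients:", np.round(clf.coef_[0][:4], 2))
```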
Kong, Changsu; Adeola, Olayiwola
2016-01-01
The present study was conducted to determine ileal digestible energy (IDE), metabolizable energy (ME), and nitrogen-corrected ME (MEn) contents of expeller- (EECM) and solvent-extracted canola meal (SECM) for broiler chickens using the regression method. Dietary treatments consisted of a corn-soybean meal reference diet and four assay diets prepared by supplementing the reference diet with each of the canola meals (EECM or SECM) at 100 or 200 g/kg, respectively, to partly replace the energy-yielding sources in the reference diet. Birds received a standard starter diet from day 0 to 14 and the assay diets from day 14 to 21. On day 14, a total of 240 birds were grouped into eight blocks by body weight and randomly allocated to five dietary treatments in each block with six birds per cage in a randomized complete block design. Excreta samples were collected from day 18 to 20 and ileal digesta were collected on day 21. The IDE, ME, and MEn (kcal/kg DM) of EECM or SECM were derived from the regression of EECM- or SECM-associated IDE, ME and MEn intake (Y, kcal) against the intake of EECM or SECM (X, kg DM), respectively. Regression equations of IDE, ME and MEn for the EECM-substituted diet were Y = -21.2 + 3035X (r² = 0.946), Y = -1.0 + 2807X (r² = 0.884) and Y = -2.0 + 2679X (r² = 0.902), respectively. The respective equations for the SECM diet were Y = 20.7 + 2881X (r² = 0.962), Y = 27.2 + 2077X (r² = 0.875) and Y = 24.7 + 2013X (r² = 0.901). The slope for IDE did not differ between the EECM and SECM, whereas the slopes for ME and MEn were greater (P < 0.05) for the EECM than for the SECM. These results indicate that the EECM might be a superior energy source for broiler chickens compared with the SECM when both canola meals are used to reduce the cost of feeding. PMID:27350926
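The regression method itself is a straight slope extraction, as the equations above show: plot test-ingredient-associated energy intake against test-ingredient intake, and the slope estimates the energy content. A minimal sketch with made-up numbers:

```python
import numpy as np

intake = np.array([0.00, 0.05, 0.05, 0.10, 0.10, 0.15])        # kg DM of test meal
me_intake = np.array([1.0, 142.0, 138.5, 281.0, 276.2, 420.9])  # kcal (invented)

slope, intercept = np.polyfit(intake, me_intake, 1)             # slope = ME content
r2 = np.corrcoef(intake, me_intake)[0, 1] ** 2
print(f"ME of ingredient ~ {slope:.0f} kcal/kg DM (r2 = {r2:.3f})")
```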
NASA Astrophysics Data System (ADS)
Yuan, Haibo; Liu, Xiaowei; Xiang, Maosheng; Huang, Yang; Zhang, Huihua; Chen, Bingqiu
2015-02-01
In this paper we propose a spectroscopy-based stellar color regression (SCR) method to perform accurate color calibration for modern imaging surveys, taking advantage of millions of stellar spectra now available. The method is straightforward, insensitive to systematic errors in the spectroscopically determined stellar atmospheric parameters, applicable to regions that are effectively covered by spectroscopic surveys, and capable of delivering an accuracy of a few millimagnitudes for color calibration. As an illustration, we have applied the method to the Sloan Digital Sky Survey (SDSS) Stripe 82 data. With a total number of 23,759 spectroscopically targeted stars, we have mapped out the small but strongly correlated color zero-point errors present in the photometric catalog of Stripe 82, and we improve the color calibration by a factor of two to three. Our study also reveals some small but significant magnitude dependence errors in the z band for some charge-coupled devices (CCDs). Such errors are likely to be present in all the SDSS photometric data. Our results are compared with those from a completely independent test based on the intrinsic colors of red galaxies presented by Ivezić et al. The comparison, as well as other tests, shows that the SCR method has achieved a color calibration internally consistent at a level of about 5 mmag in u - g, 3 mmag in g - r, and 2 mmag in r - i and i - z. Given the power of the SCR method, we discuss briefly the potential benefits by applying the method to existing, ongoing, and upcoming imaging surveys.
Wang, Zhu; Ma, Shuangge; Wang, Ching-Yun; Zappitelli, Michael; Devarajan, Prasad; Parikh, Chirag
2014-01-01
This paper proposes a new statistical approach for predicting postoperative morbidity such as intensive care unit length of stay and number of complications after cardiac surgery in children. In a recent multi-center study sponsored by the National Institutes of Health, 311 children undergoing cardiac surgery were enrolled. Morbidity data are count data in which the observations take only non-negative integer values. Often the number of zeros in the sample cannot be accommodated properly by a simple model, thus requiring a more complex model such as the zero-inflated Poisson (ZIP) regression model. We are interested in identifying important risk factors for postoperative morbidity among many candidate predictors. There is only limited methodological work on variable selection for zero-inflated regression models. In this paper, we consider regularized ZIP models through a penalized likelihood function and develop a new expectation-maximization (EM) algorithm for numerical optimization. Simulation studies show that the proposed method performs better than several competing methods. Using the proposed method, we analyzed the postoperative morbidity data, which improved the model fit and identified important clinical and biomarker risk factors. PMID:25256715
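A minimal sketch of the penalized-likelihood idea behind regularized ZIP regression, fitted here by direct numerical optimization rather than the authors' EM algorithm; the shared design matrix, smoothed-L1 penalty, and tuning value are illustrative assumptions:

```python
import numpy as np
from scipy.optimize import minimize
from scipy.special import gammaln, expit

def zip_penalized_nll(params, X, y, lam):
    """Negative log-likelihood of a zero-inflated Poisson with a shared
    design matrix for both components, plus a smoothed-L1 penalty on slopes."""
    p = X.shape[1]
    beta, gamma = params[:p], params[p:]          # Poisson part, inflation part
    mu = np.exp(X @ beta)                         # Poisson mean
    pi = expit(X @ gamma)                         # zero-inflation probability
    ll_zero = np.log(pi + (1 - pi) * np.exp(-mu))
    ll_pos = np.log1p(-pi) - mu + y * np.log(mu) - gammaln(y + 1)
    ll = np.where(y == 0, ll_zero, ll_pos).sum()
    # Smooth approximation of the L1 penalty (intercepts left unpenalized).
    pen = lam * (np.sqrt(beta[1:]**2 + 1e-8).sum() + np.sqrt(gamma[1:]**2 + 1e-8).sum())
    return -ll + pen

rng = np.random.default_rng(1)
n = 500
X = np.column_stack([np.ones(n), rng.normal(size=(n, 3))])
y = rng.poisson(np.exp(0.5 + 0.8 * X[:, 1]))
y[rng.random(n) < 0.3] = 0                        # inflate the zeros

fit = minimize(zip_penalized_nll, np.zeros(2 * X.shape[1]),
               args=(X, y, 1.0), method="L-BFGS-B")
print(fit.x[:4])                                  # penalized Poisson coefficients
```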
Eberly, Lynn E
2007-01-01
This chapter describes multiple linear regression, a statistical approach used to describe the simultaneous associations of several variables with one continuous outcome. Important steps in using this approach include estimation and inference, variable selection in model building, and assessing model fit. The special cases of regression with interactions among the variables, polynomial regression, regressions with categorical (grouping) variables, and separate slopes models are also covered. Examples in microbiology are used throughout. PMID:18450050
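A brief illustration of the machinery the chapter covers (estimation, inference, and an interaction with a grouping variable giving separate slopes), using statsmodels on simulated data; all variable names are hypothetical:

```python
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

rng = np.random.default_rng(2)
n = 120
df = pd.DataFrame({
    "dose": rng.uniform(0, 10, n),                      # continuous predictor
    "temp": rng.normal(30, 3, n),                       # continuous predictor
    "strain": rng.choice(["A", "B"], n),                # grouping variable
})
df["growth"] = (2.0 + 0.5 * df.dose + 0.3 * df.temp
                + 0.4 * df.dose * (df.strain == "B")    # separate slopes by group
                + rng.normal(0, 1, n))

# The dose-by-strain interaction gives each strain its own dose slope.
fit = smf.ols("growth ~ dose * strain + temp", data=df).fit()
print(fit.summary().tables[1])                          # estimates and inference
```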
Efficient Drug-Pathway Association Analysis via Integrative Penalized Matrix Decomposition.
Li, Cong; Yang, Can; Hather, Greg; Liu, Ray; Zhao, Hongyu
2016-01-01
Traditional drug discovery practice usually follows the "one drug - one target" approach, seeking to identify drug molecules that act on individual targets, which ignores the systemic nature of human diseases. Pathway-based drug discovery has recently emerged as an appealing approach to overcome this limitation. An important first step of such pathway-based drug discovery is to identify associations between drug molecules and biological pathways. This task has been made feasible by the accumulating data from high-throughput transcription and drug sensitivity profiling. In this paper, we developed "iPaD", an integrative Penalized Matrix Decomposition method to identify drug-pathway associations through joint modeling of such high-throughput transcription and drug sensitivity data. A scalable bi-convex optimization algorithm was implemented, giving iPaD a tremendous advantage in computational efficiency over the current state-of-the-art method and allowing it to handle the ever-growing large-scale data sets that the current method cannot. On two widely used real data sets, iPaD also significantly outperformed the current method in terms of the number of validated drug-pathway associations identified. The Matlab code of our algorithm is publicly available at http://licong-jason.github.io/iPaD/. PMID:27295636
Nonparametric spectral analysis of heart rate variability through penalized sum of squares.
Krafty, Robert T; Zhao, Mengyuan; Buysse, Daniel J; Thayer, Julian F; Hall, Martica
2014-04-15
Researchers in a variety of biomedical fields have utilized frequency domain properties of heart rate variability (HRV), or the elapsed time between consecutive heart beats. HRV is measured from the electrocardiograph signal through the interbeat interval series. Popular approaches for estimating power spectra from these interval data apply common spectral analysis methods that are designed for the analysis of evenly sampled time series. The application of these methods to the interbeat interval series, which is indexed over an uneven time grid, requires a bias-inducing transformation. The goal of this article is to explore the use of penalized sum of squares for nonparametric estimation of the spectrum of HRV directly from the interbeat intervals. A novel cross-validation procedure is introduced for smoothing parameter selection. Empirical properties of the proposed estimation procedure are explored and compared with popular methods in a simulation study. The proposed method is used in an analysis of data from an insomnia study, which seeks to illuminate the association between the power spectrum of HRV during different periods of sleep with response to behavioral therapy. PMID:24254401
Llacer, J; Solberg, T D; Promberger, C
2001-10-01
This paper presents a description of tests carried out to compare the behaviour of five algorithms in inverse radiation therapy planning: (1) the Dynamically Penalized Likelihood (DPL), an algorithm based on statistical estimation theory; (2) an accelerated version of the same algorithm; (3) a new fast adaptive simulated annealing (ASA) algorithm; (4) a conjugate gradient method; and (5) a Newton gradient method. A three-dimensional mathematical phantom and two clinical cases have been studied in detail. The phantom consisted of a U-shaped tumour with a partially enclosed 'spinal cord'. The clinical examples were a cavernous sinus meningioma and a prostate case. The algorithms have been tested in carefully selected and controlled conditions so as to ensure fairness in the assessment of results. It has been found that all five methods can yield relatively similar optimizations, except when a very demanding optimization is carried out. For the easier cases, the differences are principally in robustness, ease of use and optimization speed. In the more demanding case, there are significant differences in the resulting dose distributions. The accelerated DPL emerges as possibly the algorithm of choice for clinical practice. An appendix describes the differences in behaviour between the new ASA method and the one based on a patent by the Nomos Corporation. PMID:11686280
Energy Science and Technology Software Center (ESTSC)
2015-09-09
The NCCS Regression Test Harness is a software package that provides a framework to perform regression and acceptance testing on NCCS High Performance Computers. The package is written in Python, and its only dependency is a Subversion repository used to store the regression tests.
Orthogonal Regression and Equivariance.
ERIC Educational Resources Information Center
Blankmeyer, Eric
Ordinary least-squares regression treats the variables asymmetrically, designating a dependent variable and one or more independent variables. When it is not obvious how to make this distinction, a researcher may prefer to use orthogonal regression, which treats the variables symmetrically. However, the usual procedure for orthogonal regression is…
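For reference, the usual orthogonal regression (total least squares) line can be computed from the leading principal axis of the centered data; note that, unlike ordinary least squares, the fit treats x and y symmetrically but is not equivariant under rescaling of the variables. A minimal numpy sketch:

```python
import numpy as np

def orthogonal_regression(x, y):
    """Fit y ~ a + b*x by minimizing perpendicular distances (total least
    squares); the line direction is the leading principal axis of the data."""
    xm, ym = x.mean(), y.mean()
    # SVD of the centered data; the first right-singular vector spans the line.
    _, _, vt = np.linalg.svd(np.column_stack([x - xm, y - ym]))
    dx, dy = vt[0]
    slope = dy / dx
    return ym - slope * xm, slope                       # intercept, slope

rng = np.random.default_rng(3)
x = rng.normal(size=200)
y = 1.0 + 2.0 * x
x_obs = x + rng.normal(scale=0.3, size=200)             # noise in BOTH variables
y_obs = y + rng.normal(scale=0.3, size=200)
print(orthogonal_regression(x_obs, y_obs))              # close to (1.0, 2.0)
```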
Unitary Response Regression Models
ERIC Educational Resources Information Center
Lipovetsky, S.
2007-01-01
The dependent variable in a regular linear regression is a numerical variable, and in a logistic regression it is a binary or categorical variable. In these models the dependent variable has varying values. However, there are problems yielding an identity output of a constant value which can also be modelled in a linear or logistic regression with…
Penalizing Hospitals for Chronic Obstructive Pulmonary Disease Readmissions
Au, David H.
2014-01-01
In October 2014, the U.S. Centers for Medicare and Medicaid Services (CMS) will expand its Hospital Readmission Reduction Program (HRRP) to include chronic obstructive pulmonary disease (COPD). Under the new policy, hospitals with high risk-adjusted, 30-day all-cause unplanned readmission rates after an index hospitalization for a COPD exacerbation will be penalized with reduced reimbursement for the treatment of Medicare beneficiaries. In this perspective, we review the history of the HRRP, including the recent addition of COPD to the policy. We critically assess the use of 30-day all-cause COPD readmissions as an accountability measure, discussing potential benefits and then highlighting the substantial drawbacks and potential unintended consequences of the measure that could adversely affect providers, hospitals, and patients with COPD. We conclude by emphasizing the need to place the 30-day COPD readmission measure in the context of a reconceived model for postdischarge quality and review several frameworks that could help guide this process. PMID:24460431
Penal Code (Ordinance No. 12 of 1983), 1 July 1984.
1987-01-01
This document contains provisions of the 1984 Penal Code of Montserrat relating to sexual offenses, abortion, offenses relating to marriage, homicide and other offenses against the person, and neglect endangering life or health. Part 8 of the Code holds that a man found guilty of raping a woman is liable to life imprisonment. Rape is deemed to involve unlawful (extramarital) sexual intercourse with a woman without her consent (this is determined if the rape involved force, threats, administration of drugs, or false representation). The Code also defines offenses in cases of incest, child abuse, prostitution, abduction, controlling the actions and finances of a prostitute, and having unlawful sexual intercourse with a mentally defective woman. Part 9 of the Code outlaws abortion unless it is conducted in an approved establishment after two medical practitioners have determined that continuing the pregnancy would risk the life or physical/mental health of the pregnant woman or if a substantial risk exists that the child would have serious abnormalities. Part 10 outlaws bigamy, and part 12 holds that infanticide performed by a mother suffering postpartum imbalances can be prosecuted as manslaughter. This part also outlaws concealment of the body of a newborn, whether that child died before, at, or after birth, and aggravated assault on any child not more than 14 years old. Part 12 makes it an offense to subject any child to neglect endangering its life or health. PMID:12346690
Deng, Zhi-Yin; Zhang, Bing; Dong, Wei; Wang, Xiao-Ping
2013-11-01
For the purpose of rapid prediction of the saturated fatty acid, oleic acid, and linoleic acid content of edible vegetable oils, the Raman spectra of a batch of edible vegetable oils and their one-to-one mixtures at different ratios were measured in the range of 800-2000 cm(-1); 91 samples were measured in total. The obtained Raman spectra were preprocessed by a new method proposed in this paper, called the auto-set fulcrums baseline fitting method, based on a peak-seeking algorithm, and 8 characteristic peak values (872 cm(-1) [v(C-C)], 972 cm(-1) [delta(C=C) trans], 1082 cm(-1) [v(C-C)], 1267 cm(-1) [delta(=C-H) cis], 1303 cm(-1) [delta(CH2) twisting], 1442 cm(-1) [delta(CH2) scissoring], 1658 cm(-1) [v(C=C) cis], 1748 cm(-1) [v(C=O)]) were extracted as the eigenvalues for the whole spectra. Among the 8 peaks, three (972, 1267, and 1658 cm(-1)) play an important role in the establishment of the mathematical model, as they are closely related to the C=C band, which distinguishes the three fatty acid types. Using these eigenvalues as inputs, and the actual saturated fatty acid, oleic acid, and linoleic acid contents of the sample oils as outputs, a prediction model that simultaneously predicts the three fatty acid contents was established using multiple regression analysis: a multi-output least squares support vector regression machine (MLS-SVR) and partial least squares (PLS). Results show that MLS-SVR has the better performance. The predicted results were compared with the results of gas chromatography (GC); the obtained root mean square errors of prediction (RMSEP) for saturated fatty acid, oleic acid, and linoleic acid are 0.4967%, 0.8400% and 1.0199%, and the correlation coefficients (r) are 0.8133, 0.9992 and 0.9981, respectively. When this model is applied to the detection of new unknown oil samples, the prediction error does not exceed 5%. Results show that Raman spectral analysis based on MLS-SVR can be a convenient
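A rough sketch of the multi-output calibration step on simulated peak data; sklearn's PLS regression stands in here, since the MLS-SVR the authors use is not available in standard libraries:

```python
import numpy as np
from sklearn.cross_decomposition import PLSRegression

rng = np.random.default_rng(4)
n_samples, n_peaks = 91, 8
X = rng.uniform(0, 1, size=(n_samples, n_peaks))        # 8 characteristic peak values
W = rng.normal(size=(n_peaks, 3))
Y = X @ W + rng.normal(scale=0.05, size=(n_samples, 3)) # saturated, oleic, linoleic (%)

pls = PLSRegression(n_components=5).fit(X[:70], Y[:70])
Y_hat = pls.predict(X[70:])
rmsep = np.sqrt(((Y[70:] - Y_hat) ** 2).mean(axis=0))   # one RMSEP per fatty acid
print(rmsep)
```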
Tang Shaojie; Tang Xiangyang
2012-09-15
Purpose: The suppression of noise in x-ray computed tomography (CT) imaging is of clinical relevance for diagnostic image quality and the potential for radiation dose saving. Toward this purpose, statistical noise reduction methods in either the image or projection domain have been proposed, which employ a multiscale decomposition to enhance the performance of noise suppression while maintaining image sharpness. Recognizing the advantages of noise suppression in the projection domain, the authors propose a projection domain multiscale penalized weighted least squares (PWLS) method, in which the angular sampling rate is explicitly taken into consideration to account for the possible variation of inter-view sampling rate in advanced clinical or preclinical applications. Methods: The projection domain multiscale PWLS method is derived by converting an isotropic diffusion partial differential equation in the image domain into the projection domain, wherein a multiscale decomposition is carried out. With adoption of the Markov random field or soft thresholding objective function, the projection domain multiscale PWLS method deals with noise at each scale. To compensate for the degradation in image sharpness caused by the projection domain multiscale PWLS method, an edge enhancement is carried out following the noise reduction. The performance of the proposed method is experimentally evaluated and verified using projection data simulated by computer and acquired by a CT scanner. Results: The preliminary results show that the proposed projection domain multiscale PWLS method outperforms the projection domain single-scale PWLS method and the image domain multiscale anisotropic diffusion method in noise reduction. In addition, the proposed method can preserve image sharpness very well while avoiding 'salt-and-pepper' noise and mosaic artifacts. Conclusions: Since the inter-view sampling rate is taken into account in the projection domain
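A toy, single-scale analogue of projection-domain PWLS on one simulated projection row, assuming inverse-variance statistical weights and a quadratic first-difference penalty (the paper's multiscale decomposition and edge enhancement are omitted):

```python
import numpy as np
from scipy.sparse import diags
from scipy.sparse.linalg import spsolve

def pwls_1d(y, w, beta):
    """Single-scale PWLS smoothing of one projection row: minimize
    sum_i w_i (x_i - y_i)^2 + beta * sum_i (x_{i+1} - x_i)^2,
    which has the closed-form solution (W + beta*D'D) x = W y."""
    n = len(y)
    D = diags([-np.ones(n - 1), np.ones(n - 1)], [0, 1], shape=(n - 1, n))
    W = diags(w)
    return spsolve(W + beta * (D.T @ D), w * y)

rng = np.random.default_rng(5)
clean = np.sin(np.linspace(0, np.pi, 256)) * 100        # idealized projection row
counts = rng.poisson(clean + 10)                        # Poisson-noise measurement
weights = 1.0 / np.maximum(counts, 1)                   # statistical weights ~ 1/variance
smoothed = pwls_1d(counts.astype(float), weights, beta=5.0)
```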
Larson, Nicholas B; McDonnell, Shannon; Albright, Lisa Cannon; Teerlink, Craig; Stanford, Janet; Ostrander, Elaine A; Isaacs, William B; Xu, Jianfeng; Cooney, Kathleen A; Lange, Ethan; Schleutker, Johanna; Carpten, John D; Powell, Isaac; Bailey-Wilson, Joan; Cussenot, Olivier; Cancel-Tassin, Geraldine; Giles, Graham; MacInnis, Robert; Maier, Christiane; Whittemore, Alice S; Hsieh, Chih-Lin; Wiklund, Fredrik; Catolona, William J; Foulkes, William; Mandal, Diptasri; Eeles, Rosalind; Kote-Jarai, Zsofia; Ackerman, Michael J; Olson, Timothy M; Klein, Christopher J; Thibodeau, Stephen N; Schaid, Daniel J
2016-09-01
Rare variants (RVs) have been shown to be significant contributors to complex disease risk. By definition, these variants have very low minor allele frequencies and traditional single-marker methods for statistical analysis are underpowered for typical sequencing study sample sizes. Multimarker burden-type approaches attempt to identify aggregation of RVs across case-control status by analyzing relatively small partitions of the genome, such as genes. However, it is generally the case that the aggregative measure would be a mixture of causal and neutral variants, and these omnibus tests do not directly provide any indication of which RVs may be driving a given association. Recently, Bayesian variable selection approaches have been proposed to identify RV associations from a large set of RVs under consideration. Although these approaches have been shown to be powerful at detecting associations at the RV level, there are often computational limitations on the total quantity of RVs under consideration and compromises are necessary for large-scale application. Here, we propose a computationally efficient alternative formulation of this method using a probit regression approach specifically capable of simultaneously analyzing hundreds to thousands of RVs. We evaluate our approach to detect causal variation on simulated data and examine sensitivity and specificity in instances of high RV dimensionality as well as apply it to pathway-level RV analysis results from a prostate cancer (PC) risk case-control sequencing study. Finally, we discuss potential extensions and future directions of this work. PMID:27312771
Dubey, S. K.; Duddelly, S.; Jangala, H.; Saha, R. N.
2013-01-01
A reliable, rapid and sensitive isocratic reverse phase high-performance liquid chromatography method has been developed and validated for the assay of ketorolac tromethamine in tablets and ophthalmic dosage forms using diclofenac sodium as an internal standard. An isocratic separation of ketorolac tromethamine was achieved on an Oyster BDS (150×4.6 mm i.d., 5 μm particle size) column using a mobile phase of methanol:acetonitrile:sodium dihydrogen phosphate (20 mM; pH 5.5) (50:10:40, %v/v) at a flow rate of 1.0 ml/min. The eluents were monitored at 322 nm for ketorolac and at 282 nm for diclofenac sodium with a photodiode array detector. The retention times of ketorolac and diclofenac sodium were found to be 1.9 min and 4.6 min, respectively. Response was a linear function of drug concentration in the range of 0.01-15 μg/ml (R2=0.994; linear regression model using weighting factor 1/x2) with limits of detection and quantification of 0.002 μg/ml and 0.007 μg/ml, respectively. The % recovery and % relative standard deviation values indicated the method was accurate and precise. PMID:23901166
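The weighted calibration model (1/x2 weighting over a wide concentration range) is straightforward to reproduce; a sketch with hypothetical standards, not the validation data:

```python
import numpy as np
from sklearn.linear_model import LinearRegression

# Hypothetical calibration standards (ug/ml) and peak-area ratios.
conc = np.array([0.01, 0.05, 0.5, 1.0, 5.0, 10.0, 15.0])
ratio = np.array([0.012, 0.055, 0.51, 1.03, 4.95, 10.2, 14.8])

# 1/x^2 weighting keeps the low end of a wide calibration range from being
# swamped by the large-concentration residuals.
w = 1.0 / conc**2
fit = LinearRegression().fit(conc.reshape(-1, 1), ratio, sample_weight=w)
r2 = fit.score(conc.reshape(-1, 1), ratio, sample_weight=w)
print(f"slope={fit.coef_[0]:.4f}, intercept={fit.intercept_:.4f}, R2={r2:.3f}")
```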
Time-Warped Geodesic Regression
Hong, Yi; Singh, Nikhil; Kwitt, Roland; Niethammer, Marc
2016-01-01
We consider geodesic regression with parametric time-warps. This makes it possible, for example, to capture saturation effects as typically observed during brain development or degeneration. While highly flexible models to analyze time-varying image and shape data based on generalizations of splines and polynomials have been proposed recently, they come at the cost of substantially more complex inference. Our focus in this paper is therefore to keep the model and its inference as simple as possible while still capturing expected biological variation. We demonstrate that by augmenting geodesic regression with parametric time-warp functions, we can achieve flexibility comparable to more complex models while retaining model simplicity. In addition, the time-warp parameters provide useful information about underlying anatomical changes, as demonstrated for the analysis of corpora callosa and rat calvariae. We exemplify our strategy for shape regression on the Grassmann manifold, but note that the method is generally applicable for time-warped geodesic regression. PMID:25485368
Analysis of sparse data in logistic regression in medical research: A newer approach
Devika, S; Jeyaseelan, L; Sebastian, G
2016-01-01
Background and Objective: In the analysis of a dichotomous response variable, logistic regression is usually used. However, the performance of logistic regression in the presence of sparse data is questionable. In such a situation, a common problem is the presence of high odds ratios (ORs) with very wide 95% confidence intervals (CIs) (OR: >999.999, 95% CI: <0.001, >999.999). In this paper, we addressed this issue by using the penalized logistic regression (PLR) method. Materials and Methods: Data from a case-control study on hyponatremia and hiccups conducted in Christian Medical College, Vellore, Tamil Nadu, India were used. The outcome variable was the presence/absence of hiccups and the main exposure variable was the status of hyponatremia. A simulation dataset was created with different sample sizes and different numbers of covariates. Results: A total of 23 cases and 50 controls were used for the analysis of the ordinary and PLR methods. The main exposure variable hyponatremia was present in nine (39.13%) of the cases and in four (8.0%) of the controls. Of the 23 hiccup cases, all were males, and among the controls, 46 (92.0%) were males. The complete separation between gender and the disease group thus led to an infinite OR with 95% CI (OR: >999.999, 95% CI: <0.001, >999.999), whereas PLR gave a finite and consistent regression coefficient for gender (OR: 5.35; 95% CI: 0.42, 816.48). After adjusting for all the confounding variables, hyponatremia entailed a 7.9 (95% CI: 2.06, 38.86) times higher risk for the development of hiccups using PLR, whereas the conventional method overestimated the risk (OR: 10.76; 95% CI: 2.17, 53.41). The simulation experiment shows that the estimated coverage probability of this method is near the nominal level of 95% even for small sample sizes and for a large number of covariates. Conclusions: PLR is almost equal to ordinary logistic regression when the sample size is large and is superior in the presence of sparse data.
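A minimal sketch of Firth-type penalized logistic regression, the usual Jeffreys-prior remedy for separation of the kind described above; the toy data below are completely separated, so ordinary maximum likelihood would diverge while the penalized estimates stay finite:

```python
import numpy as np

def firth_logistic(X, y, n_iter=50, tol=1e-8):
    """Minimal sketch of Firth-type penalized logistic regression: Newton
    iterations with the Jeffreys-prior score correction, which keeps the
    estimates finite even under complete separation."""
    n, p = X.shape
    beta = np.zeros(p)
    for _ in range(n_iter):
        mu = 1.0 / (1.0 + np.exp(-(X @ beta)))
        W = mu * (1 - mu)
        XtWX_inv = np.linalg.inv(X.T @ (W[:, None] * X))
        A = X * np.sqrt(W)[:, None]
        h = np.einsum("ij,jk,ik->i", A, XtWX_inv, A)   # hat-matrix leverages
        score = X.T @ (y - mu + h * (0.5 - mu))        # Firth-adjusted score
        step = XtWX_inv @ score
        beta += step
        if np.max(np.abs(step)) < tol:
            break
    return beta

# Completely separated toy data: ordinary ML diverges; the Firth fit does not.
X = np.column_stack([np.ones(8), [0, 0, 0, 0, 1, 1, 1, 1]])
y = np.array([0, 0, 0, 0, 1, 1, 1, 1])
print(firth_logistic(X, y))
```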
Sampling and handling artifacts can bias filter-based measurements of particulate organic carbon (OC). Several measurement-based methods for OC artifact reduction and/or estimation are currently used in research-grade field studies. OC frequently is not artifact-corrected in larg...
ERIC Educational Resources Information Center
Wood, Timothy J.; Humphrey-Murto, Susan M.; Norman, Geoffrey R.
2006-01-01
When setting standards, administrators of small-scale OSCEs often face several challenges, including a lack of resources, a lack of available expertise in statistics, and difficulty in recruiting judges. The Modified Borderline-Group Method is a standard setting procedure that compensates for these challenges by using physician examiners and is…
Regression Analysis by Example. 5th Edition
ERIC Educational Resources Information Center
Chatterjee, Samprit; Hadi, Ali S.
2012-01-01
Regression analysis is a conceptually simple method for investigating relationships among variables. Carrying out a successful application of regression analysis, however, requires a balance of theoretical results, empirical rules, and subjective judgment. "Regression Analysis by Example, Fifth Edition" has been expanded and thoroughly…
Multinomial logistic regression ensembles.
Lee, Kyewon; Ahn, Hongshik; Moon, Hojin; Kodell, Ralph L; Chen, James J
2013-05-01
This article proposes a method for multiclass classification problems using ensembles of multinomial logistic regression models. A multinomial logit model is used as a base classifier in ensembles built from random partitions of the predictors. The multinomial logit model can be applied to each mutually exclusive subset of the feature space without variable selection. By combining multiple models, the proposed method can handle a huge database without the constraints usually needed for analyzing high-dimensional data, and the random partition can improve prediction accuracy by reducing the correlation among base classifiers. The proposed method is implemented using R, and its performance, including overall prediction accuracy, sensitivity, and specificity for each category, is evaluated on two real data sets and simulation data sets. To investigate the quality of prediction in terms of sensitivity and specificity, the area under the receiver operating characteristic (ROC) curve (AUC) is also examined. The performance of the proposed model is compared to that of a single multinomial logit model, and it shows a substantial improvement in overall prediction accuracy. The proposed method is also compared with other classification methods such as the random forest, support vector machines, and the random multinomial logit model. PMID:23611203
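A condensed sketch of the ensemble idea, using sklearn's multinomial logit as the base classifier and the iris data as a stand-in; the number of random partitions and the number of subsets per partition are illustrative choices:

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(6)
X, y = load_iris(return_X_y=True)

def fit_ensemble(X, y, n_partitions=25, n_groups=2):
    """Fit one multinomial logit per mutually exclusive feature subset,
    repeating over random partitions of the predictors."""
    members = []
    for _ in range(n_partitions):
        perm = rng.permutation(X.shape[1])
        for cols in np.array_split(perm, n_groups):
            clf = LogisticRegression(max_iter=500).fit(X[:, cols], y)
            members.append((cols, clf))
    return members

def predict_ensemble(members, X):
    # Average class probabilities across base classifiers, then vote.
    proba = np.mean([clf.predict_proba(X[:, cols]) for cols, clf in members], axis=0)
    return proba.argmax(axis=1)

members = fit_ensemble(X, y)
print((predict_ensemble(members, X) == y).mean())   # training accuracy
```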
NASA Astrophysics Data System (ADS)
Esposito, Carlo; Barra, Anna; Evans, Stephen G.; Scarascia Mugnozza, Gabriele; Delaney, Keith
2014-05-01
The study of landslide susceptibility by multivariate statistical methods is based on finding a quantitative relationship between controlling factors and landslide occurrence. Such studies have become popular in the last few decades thanks to the development of geographic information systems (GIS) software and the related improved data management. In this work we applied a statistical approach to an area of high landslide susceptibility, mainly due to its tropical climate and geological-geomorphological setting. The study area is located in the south-east region of Brazil, which has frequently been affected by flood and landslide hazards, especially during heavy rainfall events in the summer season. We studied a disastrous event that occurred on January 11th and 12th of 2011, which involved Região Serrana (the mountainous region of Rio de Janeiro State) and caused more than 5000 landslides and at least 904 deaths. In order to produce susceptibility maps, we focused our attention on an area of 93.6 km2 that includes Nova Friburgo city. We utilized two different multivariate statistical methods: Logistic Regression (LR), already widely used in applied geosciences, and Random Forest (RF), which has only recently been applied to landslide susceptibility analysis. With reference to each mapping unit, the first method (LR) results in a probability of landslide occurrence, while the second one (RF) gives a prediction in terms of the percentage of area susceptible to slope failure. With this aim in mind, a landslide inventory map (related to the studied event) was drawn up through analysis of high-resolution GeoEye satellite images in a GIS environment. Data layers of 11 causative factors were created and processed in order to be used as continuous numerical or discrete categorical variables in the statistical analysis. In particular, the logistic regression method has frequent difficulties in managing numerical continuous and discrete categorical variables
An, Yongkai; Lu, Wenxi; Cheng, Weiguo
2015-08-01
This paper introduces a surrogate model to identify an optimal exploitation scheme, with the western Jilin Province selected as the study area. A numerical simulation model of groundwater flow was established first, and four exploitation wells were set in Tongyu County and Qian Gorlos County so as to supply water to Daan County. Second, the Latin Hypercube Sampling (LHS) method was used to collect data in the feasible region for the input variables. A surrogate model of the numerical simulation model of groundwater flow was developed using the regression kriging method. An optimization model was established to search for an optimal groundwater exploitation scheme, using the minimum average drawdown of the groundwater table and the minimum cost of groundwater exploitation as multi-objective functions. Finally, the surrogate model was invoked by the optimization model in the process of solving the optimization problem. Results show that the relative error and root mean square error of the groundwater table drawdown between the simulation model and the surrogate model for 10 validation samples are both lower than 5%, which is a high approximation accuracy. A comparison between the surrogate-based simulation optimization model and the conventional simulation optimization model for solving the same optimization problem shows that the former needs only 5.5 hours, whereas the latter needs 25 days. The above results indicate that the surrogate model developed in this study can not only considerably reduce the computational burden of the simulation optimization process, but also maintain high computational accuracy. This provides an effective method for identifying an optimal groundwater exploitation scheme quickly and accurately. PMID:26264008
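The surrogate workflow (Latin Hypercube design, kriging fit, optimization over the surrogate) can be sketched as below, with a cheap analytic function standing in for the groundwater simulation and a single scalar objective standing in for the paper's multi-objective search:

```python
import numpy as np
from scipy.stats import qmc
from scipy.optimize import minimize
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import RBF, ConstantKernel

def simulator(q):
    """Stand-in for an expensive groundwater-flow run: maps two pumping
    rates to a drawdown-plus-cost style scalar objective."""
    return np.sin(q[0]) + 0.5 * q[1] ** 2 + 0.1 * q[0] * q[1]

# Step 1: Latin Hypercube design over the feasible region [0, 3]^2.
Q = 3.0 * qmc.LatinHypercube(d=2, seed=7).random(40)
z = np.array([simulator(q) for q in Q])

# Step 2: kriging (Gaussian-process regression) surrogate of the simulator.
gp = GaussianProcessRegressor(kernel=ConstantKernel() * RBF([1.0, 1.0]),
                              normalize_y=True).fit(Q, z)

# Step 3: optimize over the cheap surrogate instead of the simulator.
res = minimize(lambda q: gp.predict(q.reshape(1, -1))[0],
               x0=np.array([1.5, 1.5]), bounds=[(0.0, 3.0), (0.0, 3.0)])
print(res.x, res.fun)
```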
NASA Astrophysics Data System (ADS)
Ochoa Gutierrez, L. H.; Vargas Jimenez, C. A.; Niño Vasquez, L. F.
2011-12-01
The "Sabana de Bogota" (Bogota Savannah) is the most important social and economical center of Colombia. Almost the third of population is concentrated in this region and generates about the 40% of Colombia's Internal Brute Product (IBP). According to this, the zone presents an elevated vulnerability in case that a high destructive seismic event occurs. Historical evidences show that high magnitude events took place in the past with a huge damage caused to the city and indicate that is probable that such events can occur in the next years. This is the reason why we are working in an early warning generation system, using the first few seconds of a seismic signal registered by three components and wide band seismometers. Such system can be implemented using Computational Intelligence tools, designed and calibrated to the particular Geological, Structural and environmental conditions present in the region. The methods developed are expected to work on real time, thus suitable software and electronic tools need to be developed. We used Support Vector Machines Regression (SVMR) methods trained and tested with historic seismic events registered by "EL ROSAL" Station, located near Bogotá, calculating descriptors or attributes as the input of the model, from the first 6 seconds of signal. With this algorithm, we obtained less than 10% of mean absolute error and correlation coefficients greater than 85% in hypocentral distance and Magnitude estimation. With this results we consider that we can improve the method trying to have better accuracy with less signal time and that this can be a very useful model to be implemented directly in the seismological stations to generate a fast characterization of the event, broadcasting not only raw signal but pre-processed information that can be very useful for accurate Early Warning Generation.
Penalized maximum-likelihood sinogram restoration for dual focal spot computed tomography.
Forthmann, P; Köhler, T; Begemann, P G C; Defrise, M
2007-08-01
Due to various system non-idealities, the raw data generated by a computed tomography (CT) machine are not readily usable for reconstruction. Although the deterministic nature of corruption effects such as crosstalk and afterglow permits correction by deconvolution, there is a drawback because deconvolution usually amplifies noise. Methods that perform raw data correction combined with noise suppression are commonly termed sinogram restoration methods. The need for sinogram restoration arises, for example, when photon counts are low and non-statistical reconstruction algorithms such as filtered backprojection are used. Many modern CT machines offer a dual focal spot (DFS) mode, which serves the goal of increased radial sampling by alternating the focal spot between two positions on the anode plate during the scan. Although the focal spot mode does not play a role with respect to how the data are affected by the above-mentioned corruption effects, it needs to be taken into account if regularized sinogram restoration is to be applied to the data. This work points out the subtle difference in processing that sinogram restoration for DFS requires, how it is correctly employed within the penalized maximum-likelihood sinogram restoration algorithm and what impact it has on image quality. PMID:17634647
NASA Astrophysics Data System (ADS)
He, Yaqian; Bo, Yanchen; Chai, Leilei; Liu, Xiaolong; Li, Aihua
2016-08-01
Leaf Area Index (LAI) is an important parameter of vegetation structure. A number of moderate-resolution LAI products have been produced to meet the urgent need for large-scale vegetation monitoring. High-resolution LAI reference maps are necessary to validate these LAI products. This study used a geostatistical regression (GR) method to estimate LAI reference maps by linking in situ LAI with Landsat TM/ETM+ and SPOT-HRV data over two cropland and two grassland sites. To explore the discrepancies arising from employing different vegetation indices (VIs) in estimating LAI reference maps, this study established GR models for different VIs, including the difference vegetation index (DVI), normalized difference vegetation index (NDVI), and ratio vegetation index (RVI). To further assess the performance of the GR model, the results from the GR and Reduced Major Axis (RMA) models were compared. The results show that the performance of the GR model varies between the cropland and grassland sites. At the cropland sites, the GR model based on DVI provides the best estimation, while at the grassland sites, the GR model based on DVI performs poorly. Compared to the RMA model, the GR model improves the accuracy of the reference LAI maps in terms of root mean square error (RMSE) and bias.
Gang, Grace J.; Stayman, J. Webster; Zbijewski, Wojciech; Siewerdsen, Jeffrey H.
2014-08-15
Purpose: Nonstationarity is an important aspect of imaging performance in CT and cone-beam CT (CBCT), especially for systems employing iterative reconstruction. This work presents a theoretical framework for both filtered-backprojection (FBP) and penalized-likelihood (PL) reconstruction that includes explicit descriptions of nonstationary noise, spatial resolution, and task-based detectability index. Potential utility of the model was demonstrated in the optimal selection of regularization parameters in PL reconstruction. Methods: Analytical models for local modulation transfer function (MTF) and noise-power spectrum (NPS) were investigated for both FBP and PL reconstruction, including explicit dependence on the object and spatial location. For FBP, a cascaded systems analysis framework was adapted to account for nonstationarity by separately calculating fluence and system gains for each ray passing through any given voxel. For PL, the point-spread function and covariance were derived using the implicit function theorem and first-order Taylor expansion according to Fessler [“Mean and variance of implicitly defined biased estimators (such as penalized maximum likelihood): Applications to tomography,” IEEE Trans. Image Process. 5(3), 493–506 (1996)]. Detectability index was calculated for a variety of simple tasks. The model for PL was used in selecting the regularization strength parameter to optimize task-based performance, with both a constant and a spatially varying regularization map. Results: Theoretical models of FBP and PL were validated in 2D simulated fan-beam data and found to yield accurate predictions of local MTF and NPS as a function of the object and the spatial location. The NPS for both FBP and PL exhibit similar anisotropic nature depending on the pathlength (and therefore, the object and spatial location within the object) traversed by each ray, with the PL NPS experiencing greater smoothing along directions with higher noise. The MTF of FBP
Penalized differential pathway analysis of integrative oncogenomics studies.
van Wieringen, Wessel N; van de Wiel, Mark A
2014-04-01
Through integration of genomic data from multiple sources, we may obtain a more accurate and complete picture of the molecular mechanisms underlying tumorigenesis. We discuss the integration of DNA copy number and mRNA gene expression data from an observational integrative genomics study involving cancer patients. The two molecular levels involved are linked through the central dogma of molecular biology. DNA copy number aberrations abound in the cancer cell. Here we investigate how these aberrations affect gene expression levels within a pathway using observational integrative genomics data of cancer patients. In particular, we aim to identify differential edges between regulatory networks of two groups involving these molecular levels. Motivated by the rate equations, the regulatory mechanism between DNA copy number aberrations and gene expression levels within a pathway is modeled by a simultaneous-equations model, for the one- and two-group case. The latter facilitates the identification of differential interactions between the two groups. Model parameters are estimated by penalized least squares using the lasso (L1) penalty to obtain a sparse pathway topology. Simulations show that the inclusion of DNA copy number data benefits the discovery of gene-gene interactions. In addition, the simulations reveal that cis-effects tend to be over-estimated in a univariate (single gene) analysis. In the application to real data from integrative oncogenomic studies we show that inclusion of prior information on the regulatory network architecture benefits the reproducibility of all edges. Furthermore, analyses of the TP53 and TGFb signaling pathways between ER+ and ER- samples from an integrative genomics breast cancer study identify reproducible differential regulatory patterns that corroborate existing literature. PMID:24552967
[Extramural research funds and penal law--status of legislation].
Ulsenheimer, Klaus
2005-04-01
After decades of smooth functioning, the cooperation of physicians and hospitals with industry (much desired from the side of the government in the interest of clinical research) has fallen into legal discredit due to increasingly frequent criminal inquiries and proceedings for undue privileges, corruption, and embezzlement. The discredit is so severe that industry funding for clinical research is being diverted abroad to an increasing extent. The legal elements of embezzlement assume the intentional violation of the entrusted funds against the interest of the customer. Undue privileges occur when an official requests an advantage in exchange for a service (or is promised one or takes one) in his or somebody else's interest. The elements of corruption are given when the receiver of the undue privilege provides an illegal service or takes a discretionary decision under the influence of the gratuity. The tension between the prohibition of undue privileges (as regulated by penal law) and the granting of extramural funds (as regulated by administrative law in academic institutions) can be reduced through a high degree of transparency and the introduction of control mechanisms--public announcement and authorization by the officials--as well as through exact documentation and observance of the principles of separation of interests and moderation. With the anti-corruption law of 1997, it became possible to charge physicians employed in private institutions with corruption as well. In contrast, physicians in private practice are not covered by the above criminal provisions. They can only be charged with a misdemeanor, or called to answer before the professional board, on the basis of the law that regulates advertising for medicinal products (Heilmittelwerbegesetz). PMID:15957662
Xie, Benhuai; Shen, Xiaotong
2009-01-01
Clustering analysis is one of the most widely used statistical tools in many emerging areas such as microarray data analysis. For microarray and other high-dimensional data, the presence of many noise variables may mask underlying clustering structures. Hence removing noise variables via variable selection is necessary. For simultaneous variable selection and parameter estimation, existing penalized likelihood approaches in model-based clustering analysis all assume a common diagonal covariance matrix across clusters, which however may not hold in practice. To analyze high-dimensional data, particularly those with relatively low sample sizes, this article introduces a novel approach that shrinks the variances together with means, in a more general situation with cluster-specific (diagonal) covariance matrices. Furthermore, selection of grouped variables via inclusion or exclusion of a group of variables altogether is permitted by a specific form of penalty, which facilitates incorporating subject-matter knowledge, such as gene functions in clustering microarray samples for disease subtype discovery. For implementation, EM algorithms are derived for parameter estimation, in which the M-steps clearly demonstrate the effects of shrinkage and thresholding. Numerical examples, including an application to acute leukemia subtype discovery with microarray gene expression data, are provided to demonstrate the utility and advantage of the proposed method. PMID:19920875
A scalable projective scaling algorithm for l(p) loss with convex penalizations.
Zhou, Hongbo; Cheng, Qiang
2015-02-01
This paper presents an accurate, efficient, and scalable algorithm for minimizing a special family of convex functions that have an lp loss function as an additive component. For this problem, well-known learning algorithms often have well-established results on accuracy and efficiency, but there are rarely any reports on explicit linear scalability with respect to the problem size. The proposed approach starts with developing a second-order learning procedure with iterative descent for general convex penalization functions, and then builds efficient algorithms for a restricted family of functions that satisfy Karmarkar's projective scaling condition. Under this condition, a lightweight, scalable message passing algorithm (MPA) is further developed by constructing a series of simpler equivalent problems. The proposed MPA is intrinsically scalable because it only involves matrix-vector multiplication and avoids matrix inversion operations. The MPA is proven to be globally convergent for convex formulations; for nonconvex situations, it converges to a stationary point. The accuracy, efficiency, scalability, and applicability of the proposed method are verified through extensive experiments on sparse signal recovery, face image classification, and over-complete dictionary learning problems. PMID:25608289
Ellimoottil, Chandy; Ryan, Andrew M; Hou, Hechuan; Dupree, James; Hallstrom, Brian; Miller, David C
2016-09-01
In an effort to reduce episode payment variation for joint replacement at US hospitals, the Centers for Medicare and Medicaid Services (CMS) recently implemented the Comprehensive Care for Joint Replacement bundled payment program. Some stakeholders are concerned that the program may unintentionally penalize hospitals because it lacks a mechanism (such as risk adjustment) to sufficiently account for patients' medical complexity. Using Medicare claims for patients in Michigan who underwent lower extremity joint replacement in the period 2011-13, we applied payment methods analogous to those CMS intends to use in determining annual bonuses or penalties (reconciliation payments) to hospitals. We calculated the net difference in reconciliation payments with and without risk adjustment. We found that reconciliation payments were reduced by $827 per episode for each standard-deviation increase in a hospital's patient complexity. Moreover, we found that risk adjustment could increase reconciliation payments to some hospitals by as much as $114,184 annually. Our findings suggest that CMS should include risk adjustment in the Comprehensive Care for Joint Replacement program and in future bundled payment programs. PMID:27605647
An example of neutronic penalizations in reactivity transient analysis using 3D coupled chain HEMERA
Dubois, F.; Normand, B.; Sargeni, A.
2012-07-01
HEMERA (Highly Evolutionary Methods for Extensive Reactor Analyses) is a fully coupled 3D computational chain developed jointly by IRSN and CEA. It is composed of CRONOS2 (core neutronics, with cross-section libraries from APOLLO2), FLICA4 (core thermal-hydraulics) and the system code CATHARE. Multi-level and multi-dimensional models are developed to account for neutronics, core thermal-hydraulics, fuel thermal analysis and system thermal-hydraulics, dedicated to best-estimate and conservative simulations and to sensitivity analysis. At IRSN, the HEMERA chain is widely used to study several types of reactivity accidents and for sensitivity studies. As an example of the HEMERA possibilities, we present here two types of neutronic penalizations and their impact on a power transient due to a REA (Rod Ejection Accident): in the first, we studied a burn-up distribution modification and, in the second, a delayed-neutron fraction modification. Both modifications are applied either to the whole core or locally in a few assemblies. Results show that it is possible to use global or local changes, but 1) in the case of a burn-up modification, the total core power can increase while the assembly peak power decreases, so care has to be taken if the goal is to maximize a local power peak, and 2) for the delayed-neutron fraction, a local modification can have the same effect as one applied to the whole core, provided that it is large enough. (authors)
Prediction in Multiple Regression.
ERIC Educational Resources Information Center
Osborne, Jason W.
2000-01-01
Presents the concept of prediction via multiple regression (MR) and discusses the assumptions underlying multiple regression analyses. Also discusses shrinkage, cross-validation, and double cross-validation of prediction equations and describes how to calculate confidence intervals around individual predictions. (SLD)
Improved Regression Calibration
ERIC Educational Resources Information Center
Skrondal, Anders; Kuha, Jouni
2012-01-01
The likelihood for generalized linear models with covariate measurement error cannot in general be expressed in closed form, which makes maximum likelihood estimation taxing. A popular alternative is regression calibration which is computationally efficient at the cost of inconsistent estimation. We propose an improved regression calibration…
Gerber, Samuel; Rübel, Oliver; Bremer, Peer-Timo; Pascucci, Valerio; Whitaker, Ross T.
2012-01-01
This paper introduces a novel partition-based regression approach that incorporates topological information. Partition-based regression typically introduces a quality-of-fit-driven decomposition of the domain. The emphasis in this work is on a topologically meaningful segmentation. Thus, the proposed regression approach is based on a segmentation induced by a discrete approximation of the Morse-Smale complex. This yields a segmentation with partitions corresponding to regions of the function with a single minimum and maximum that are often well approximated by a linear model. This approach yields regression models that are amenable to interpretation and have good predictive capacity. Typically, regression estimates are quantified by their geometrical accuracy. For the proposed regression, an important aspect is the quality of the segmentation itself. Thus, this paper introduces a new criterion that measures the topological accuracy of the estimate. The topological accuracy provides a complementary measure to the classical geometrical error measures and is very sensitive to over-fitting. The Morse-Smale regression is compared to state-of-the-art approaches in terms of geometry and topology and yields comparable or improved fits in many cases. Finally, a detailed study on climate-simulation data demonstrates the application of the Morse-Smale regression. Supplementary materials are available online and contain an implementation of the proposed approach in the R package msr, an analysis and simulations on the stability of the Morse-Smale complex approximation and additional tables for the climate-simulation study. PMID:23687424
Regularization in finite mixture of regression models with diverging number of parameters.
Khalili, Abbas; Lin, Shili
2013-06-01
Feature (variable) selection has become a fundamentally important problem in the recent statistical literature. Sometimes, in applications, many variables are introduced to reduce possible modeling biases, but the number of variables a model can accommodate is often limited by the amount of data available. In other words, the number of variables considered depends on the sample size, which reflects the estimability of the parametric model. In this article, we consider the problem of feature selection in finite mixture of regression models when the number of parameters in the model can increase with the sample size. We propose a penalized likelihood approach for feature selection in these models. Under certain regularity conditions, our approach leads to consistent variable selection. We carry out extensive simulation studies to evaluate the performance of the proposed approach under controlled settings. We also applied the proposed method to two real data sets. The first is on telemonitoring of Parkinson's disease (PD), where the problem concerns whether dysphonic features extracted from the patients' speech signals recorded at home can be used as surrogates to study PD severity and progression. The second is on breast cancer prognosis, in which one is interested in assessing whether cell nuclear features may offer prognostic value on long-term survival of breast cancer patients. Our analysis in each application revealed a mixture structure in the study population and uncovered a unique relationship between the features and the response variable in each mixture component. PMID:23556535
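A toy EM for a penalized two-component mixture of regressions, with each M-step solved as a responsibility-weighted lasso; this is a simplification of the approach described above (fixed number of components, fixed penalty, no model-selection step):

```python
import numpy as np
from sklearn.linear_model import Lasso

def mixture_lasso_em(X, y, K=2, lam=0.05, n_iter=100):
    """Toy EM for a K-component mixture of linear regressions with an L1
    penalty in each M-step (responsibilities enter as sample weights)."""
    n = len(y)
    rng = np.random.default_rng(8)
    resp = rng.dirichlet(np.ones(K), size=n)             # soft assignments
    models, sigma, pi = [None] * K, np.ones(K), np.full(K, 1.0 / K)
    for _ in range(n_iter):
        # M-step: weighted lasso per component, plus variance/weight updates.
        for k in range(K):
            m = Lasso(alpha=lam).fit(X, y, sample_weight=resp[:, k])
            r = y - m.predict(X)
            sigma[k] = np.sqrt((resp[:, k] * r**2).sum() / resp[:, k].sum())
            pi[k] = resp[:, k].mean()
            models[k] = m
        # E-step: update responsibilities from the component densities.
        dens = np.column_stack([
            pi[k] / sigma[k] * np.exp(-0.5 * ((y - models[k].predict(X)) / sigma[k])**2)
            for k in range(K)])
        resp = dens / dens.sum(axis=1, keepdims=True)
    return models, pi

rng = np.random.default_rng(9)
X = rng.normal(size=(400, 5))
comp = rng.random(400) < 0.5
y = np.where(comp, 2 + 3 * X[:, 0], -1 - 2 * X[:, 1]) + rng.normal(0, 0.3, 400)
models, pi = mixture_lasso_em(X, y)
print(pi, [m.coef_.round(1) for m in models])
```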
Demosaicing Based on Directional Difference Regression and Efficient Regression Priors.
Wu, Jiqing; Timofte, Radu; Van Gool, Luc
2016-08-01
Color demosaicing is a key image processing step aiming to reconstruct the missing pixels from a recorded raw image. On the one hand, numerous interpolation methods focusing on spatial-spectral correlations have proved very efficient computationally, but they yield poorer image quality and strong visible artifacts. On the other hand, optimization strategies, such as learned simultaneous sparse coding and sparsity and adaptive principal component analysis-based algorithms, have been shown to greatly improve image quality compared with that delivered by interpolation methods, but unfortunately are computationally heavy. In this paper, we propose efficient regression priors as a novel, fast post-processing algorithm that learns the regression priors offline from training data. We also propose an independent efficient demosaicing algorithm based on directional difference regression, and introduce its enhanced version based on fused regression. We achieve an image quality comparable to that of the state-of-the-art methods for three benchmarks, while being order(s) of magnitude faster. PMID:27254866
Hierarchical Adaptive Regression Kernels for Regression with Functional Predictors
Woodard, Dawn B.; Crainiceanu, Ciprian; Ruppert, David
2013-01-01
We propose a new method for regression using a parsimonious and scientifically interpretable representation of functional predictors. Our approach is designed for data that exhibit features such as spikes, dips, and plateaus whose frequency, location, size, and shape varies stochastically across subjects. We propose Bayesian inference of the joint functional and exposure models, and give a method for efficient computation. We contrast our approach with existing state-of-the-art methods for regression with functional predictors, and show that our method is more effective and efficient for data that include features occurring at varying locations. We apply our methodology to a large and complex dataset from the Sleep Heart Health Study, to quantify the association between sleep characteristics and health outcomes. Software and technical appendices are provided in online supplemental materials. PMID:24293988
Geodesic least squares regression on information manifolds
Verdoolaege, Geert
2014-12-05
We present a novel regression method targeted at situations with significant uncertainty on both the dependent and independent variables or with non-Gaussian distribution models. Unlike the classic regression model, the conditional distribution of the response variable suggested by the data need not be the same as the modeled distribution. Instead they are matched by minimizing the Rao geodesic distance between them. This yields a more flexible regression method that is less constrained by the assumptions imposed through the regression model. As an example, we demonstrate the improved resistance of our method against some flawed model assumptions and we apply this to scaling laws in magnetic confinement fusion.
Bayesian Spatial Quantile Regression
Reich, Brian J.; Fuentes, Montserrat; Dunson, David B.
2013-01-01
Tropospheric ozone is one of the six criteria pollutants regulated by the United States Environmental Protection Agency under the Clean Air Act and has been linked with several adverse health effects, including mortality. Due to the strong dependence on weather conditions, ozone may be sensitive to climate change and there is great interest in studying the potential effect of climate change on ozone, and how this change may affect public health. In this paper we develop a Bayesian spatial model to predict ozone under different meteorological conditions, and use this model to study spatial and temporal trends and to forecast ozone concentrations under different climate scenarios. We develop a spatial quantile regression model that does not assume normality and allows the covariates to affect the entire conditional distribution, rather than just the mean. The conditional distribution is allowed to vary from site-to-site and is smoothed with a spatial prior. For extremely large datasets our model is computationally infeasible, and we develop an approximate method. We apply the approximate version of our model to summer ozone from 1997–2005 in the Eastern U.S., and use deterministic climate models to project ozone under future climate conditions. Our analysis suggests that holding all other factors fixed, an increase in daily average temperature will lead to the largest increase in ozone in the Industrial Midwest and Northeast. PMID:23459794
Dealing with Outliers: Robust, Resistant Regression
ERIC Educational Resources Information Center
Glasser, Leslie
2007-01-01
Least-squares linear regression is the best of statistics and it is the worst of statistics. The reasons for this paradoxical claim, arising from possible inapplicability of the method and the excessive influence of "outliers", are discussed and substitute regression methods based on median selection, which is both robust and resistant, are…
George: Gaussian Process regression
NASA Astrophysics Data System (ADS)
Foreman-Mackey, Daniel
2015-11-01
George is a fast and flexible library, implemented in C++ with Python bindings, for Gaussian Process regression. It is useful for accounting for correlated noise in astronomical datasets, including those for transiting exoplanet discovery and characterization and stellar population modeling.
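A typical call pattern, assuming george's current Python API (george.GP, george.kernels, GP.compute and GP.predict; consult the package documentation if the interface has changed):

    import numpy as np
    import george
    from george import kernels

    rng = np.random.default_rng(2)
    t = np.sort(10 * rng.random(50))                  # observation times
    yerr = 0.2 * np.ones_like(t)                      # per-point uncertainties
    y = np.sin(t) + yerr * rng.normal(size=t.size)    # noisy signal

    kernel = np.var(y) * kernels.ExpSquaredKernel(1.0)
    gp = george.GP(kernel)
    gp.compute(t, yerr)                               # factorize the covariance
    t_pred = np.linspace(0, 10, 200)
    mu, var = gp.predict(y, t_pred, return_var=True)  # predictive mean/variance
    print("log likelihood:", gp.log_likelihood(y))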
Technology Transfer Automated Retrieval System (TEKTRAN)
In precision agriculture, regression has been widely used to quantify the relationship between soil attributes and other environmental variables. However, spatial correlation existing in soil samples usually makes the regression model suboptimal. In this study, a regression-kriging method was attemp...
C-arm perfusion imaging with a fast penalized maximum-likelihood approach
NASA Astrophysics Data System (ADS)
Frysch, Robert; Pfeiffer, Tim; Bannasch, Sebastian; Serowy, Steffen; Gugel, Sebastian; Skalej, Martin; Rose, Georg
2014-03-01
Perfusion imaging is an essential method for stroke diagnostics. One of the most important factors for a successful therapy is to get the diagnosis as fast as possible. Our approach therefore aims at perfusion imaging (PI) with a cone beam C-arm system providing perfusion information directly in the interventional suite. For PI the imaging system has to provide excellent soft tissue contrast resolution in order to allow the detection of the small attenuation enhancement due to contrast agent in the capillary vessels. The limited dynamic range of flat panel detectors, as well as the sparse sampling of the slowly rotating C-arm, in combination with standard reconstruction methods results in limited soft tissue contrast. We choose a penalized maximum-likelihood reconstruction method to obtain suitable results. To minimize the computational load, the 4D reconstruction task is reduced to several static 3D reconstructions. We also include an ordered subset technique with a transition to a small number of subsets, which adds sharpness to the image in fewer iterations while also suppressing noise. Instead of the standard multiplicative EM correction, we apply a Newton-based optimization to further accelerate the reconstruction algorithm. The latter optimization reduces the computation time by up to 70%. Further acceleration is provided by a multi-GPU implementation of the forward and backward projection, which fulfills the demands of cone beam geometry. In this preliminary study we evaluate this procedure on clinical data. Perfusion maps are computed and compared with reference images from magnetic resonance scans. We found a high correlation between both images.
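For readers unfamiliar with penalized maximum-likelihood (PML) reconstruction, the sketch below shows the core idea on a toy 1-D problem using Green's one-step-late EM update with a quadratic roughness penalty. It is a deliberately minimal stand-in: the paper's Newton-based, ordered-subset, multi-GPU scheme is far more elaborate, and the random system matrix here is purely illustrative.

    import numpy as np

    rng = np.random.default_rng(3)
    n_meas, n_pix = 96, 64
    # Sparse nonnegative "system matrix" and a simple object
    A = rng.random((n_meas, n_pix)) * (rng.random((n_meas, n_pix)) < 0.1)
    x_true = np.zeros(n_pix)
    x_true[20:40] = 5.0
    y = rng.poisson(A @ x_true + 0.1)           # Poisson measurements

    beta = 0.05                                 # penalty strength
    x = np.ones(n_pix)                          # positive initial estimate
    sens = A.sum(axis=0)                        # sensitivity image, A^T 1
    for _ in range(200):
        # Gradient of the roughness penalty R(x) = 0.5 * sum_j (x_j - x_{j+1})^2
        grad_R = np.convolve(x, [-1.0, 2.0, -1.0], mode="same")
        ratio = y / np.maximum(A @ x, 1e-12)
        x = x * (A.T @ ratio) / np.maximum(sens + beta * grad_R, 1e-12)

    print("relative error:", np.linalg.norm(x - x_true) / np.linalg.norm(x_true))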
Das, Mini; Gifford, Howard C; O'Connor, J Michael; Glick, Stephen J
2011-04-01
We examined the application of an iterative penalized maximum likelihood (PML) reconstruction method for improved detectability of microcalcifications (MCs) in digital breast tomosynthesis (DBT). Localized receiver operating characteristic (LROC) psychophysical studies with human observers and 2-D image slices were conducted to evaluate the performance of this reconstruction method and to compare its performance against the commonly used Feldkamp FBP algorithm. DBT projections were generated using rigorous computer simulations that included accurate modeling of the noise and detector blur. Acquisition dose levels of 0.7, 1.0, and 1.5 mGy in a 5-cm-thick compressed breast were tested. The defined task was to localize and detect MC clusters consisting of seven MCs. The individual MC diameter was 150 μm. Compressed-breast phantoms derived from CT images of actual mastectomy specimens provided realistic background structures for the detection task. Four observers each read 98 test images for each combination of reconstruction method and acquisition dose. All observers performed better with the PML images than with the FBP images. With the acquisition dose of 0.7 mGy, the average areas under the LROC curve (A(L)) for the PML and FBP algorithms were 0.69 and 0.43, respectively. For the 1.0-mGy dose, the values of A(L) were 0.93 (PML) and 0.7 (FBP), while the 1.5-mGy dose resulted in areas of 1.0 and 0.9, respectively, for the PML and FBP algorithms. A 2-D analysis of variance applied to the individual observer areas showed statistically significant differences (at a significance level of 0.05) between the reconstruction strategies at all three dose levels. There were no significant differences in observer performance for any of the dose levels. PMID:21041158
NASA Astrophysics Data System (ADS)
Amin, Mohd Zaki M.; Islam, Tanvir; Ishak, Asnor M.
2014-10-01
The authors have applied an automated regression-based statistical method, namely, the automated statistical downscaling (ASD) model, to downscale and project the precipitation climatology in an equatorial climate region (Peninsular Malaysia). Five precipitation indices are downscaled and projected: mean monthly precipitation (Mean), standard deviation (STD), 90th percentile of rain day amount, percentage of wet days (Wet-day), and maximum number of consecutive dry days (CDD). The predictors, National Centers for Environmental Prediction (NCEP) products, are taken from the daily series reanalysis data, while the global climate model (GCM) outputs are from the Hadley Centre Coupled Model, version 3 (HadCM3) in A2/B2 emission scenarios and the Third-Generation Coupled Global Climate Model (CGCM3) in the A2 emission scenario. The predictand data are taken from arithmetically averaged rain gauge information and used as baseline data for the evaluation. The results from calibration and validation periods spanning 40 years (1961-2000) reveal that the ASD model is capable of downscaling the precipitation with reasonable accuracy. Overall, during the validation period, the model simulations with the NCEP predictors produce mean monthly precipitation of 6.18-6.20 mm/day (root mean squared error 0.78 and 0.82 mm/day), interpolated, respectively, on HadCM3 and CGCM3 grids, in contrast to 6.00 mm/day observed. Nevertheless, the model struggles to perform well for extreme precipitation and in summer, specifically in generating the CDD and STD indices. The future projections of precipitation (2011-2099) exhibit an increase in precipitation amount and frequency in most months. Taking the 1961-2000 timeline as the base period, overall, the annual mean precipitation would indicate a surplus projection of nearly 14-18% under both GCM output cases (HadCM3 A2/B2 scenarios and
Hybrid fuzzy regression with trapezoidal fuzzy data
NASA Astrophysics Data System (ADS)
Razzaghnia, T.; Danesh, S.; Maleki, A.
2011-12-01
This research deals with a method for hybrid fuzzy least-squares regression. The extension of symmetric triangular fuzzy coefficients to asymmetric trapezoidal fuzzy coefficients is considered as an effective measure for removing unnecessary fuzziness of the linear fuzzy model. First, trapezoidal fuzzy variables are applied to derive a bivariate regression model. Next, normal equations are formulated to solve the four parts of the hybrid regression coefficients. The model is also extended to multiple regression analysis. Finally, the method is compared with Y.-H.O. Chang's model.
Regression modelling of Dst index
NASA Astrophysics Data System (ADS)
Parnowski, Aleksei
We developed a new approach to the problem of real-time space weather indices forecasting using readily available data from ACE and a number of ground stations. It is based on the regression modelling method [1-3], which combines the benefits of empirical and statistical approaches. Mathematically, it is based upon partial regression analysis and Monte Carlo simulations to deduce the empirical relationships in the system. The typical elapsed time per forecast is a few seconds on an average PC. This technique can easily be extended to other indices like AE and Kp. The proposed system can also be useful for investigating physical phenomena related to interactions between the solar wind and the magnetosphere; it has already helped uncover two new geoeffective parameters. 1. Parnowski A.S. Regression modeling method of space weather prediction // Astrophysics and Space Science. 2009. V. 323, No. 2. P. 169-180. doi:10.1007/s10509-009-0060-4 [arXiv:0906.3271] 2. Parnovskiy A.S. Regression Modeling and its Application to the Problem of Prediction of Space Weather // Journal of Automation and Information Sciences. 2009. V. 41, No. 5. P. 61-69. doi:10.1615/JAutomatInfScien.v41.i5.70 3. Parnowski A.S. Statistically predicting Dst without satellite data // Earth, Planets and Space. 2009. V. 61, No. 5. P. 621-624.
The Role of the Environmental Health Specialist in the Penal and Correctional System
ERIC Educational Resources Information Center
Walker, Bailus, Jr.; Gordon, Theodore J.
1976-01-01
Implementing a health and hygiene program in penal systems necessitates coordinating the entire staff. Health specialists could participate in facility planning and management, policy formation, and evaluation of medical care, housekeeping, and food services. They could also serve as liaisons between correctional staff and governmental or…
49 CFR 26.47 - Can recipients be penalized for failing to meet overall goals?
Code of Federal Regulations, 2014 CFR
2014-10-01
... 49 Transportation 1 2014-10-01 2014-10-01 false Can recipients be penalized for failing to meet overall goals? 26.47 Section 26.47 Transportation Office of the Secretary of Transportation PARTICIPATION... Operational Evolution Partnership Plan airport or other airport designated by the FAA, you must submit,...
49 CFR 26.47 - Can recipients be penalized for failing to meet overall goals?
Code of Federal Regulations, 2013 CFR
2013-10-01
... 49 Transportation 1 2013-10-01 2013-10-01 false Can recipients be penalized for failing to meet overall goals? 26.47 Section 26.47 Transportation Office of the Secretary of Transportation PARTICIPATION... Operational Evolution Partnership Plan airport or other airport designated by the FAA, you must submit,...
49 CFR 26.47 - Can recipients be penalized for failing to meet overall goals?
Code of Federal Regulations, 2012 CFR
2012-10-01
... 49 Transportation 1 2012-10-01 2012-10-01 false Can recipients be penalized for failing to meet overall goals? 26.47 Section 26.47 Transportation Office of the Secretary of Transportation PARTICIPATION... Operational Evolution Partnership Plan airport or other airport designated by the FAA, you must submit,...
36 CFR 1200.16 - Will I be penalized for misusing the official seals and logos?
Code of Federal Regulations, 2012 CFR
2012-07-01
... misusing the official seals and logos? 1200.16 Section 1200.16 Parks, Forests, and Public Property NATIONAL ARCHIVES AND RECORDS ADMINISTRATION GENERAL RULES OFFICIAL SEALS Penalties for Misuse of NARA Seals and Logos § 1200.16 Will I be penalized for misusing the official seals and logos? (a) Seals. (1) If...
36 CFR 1200.16 - Will I be penalized for misusing the official seals and logos?
Code of Federal Regulations, 2010 CFR
2010-07-01
... misusing the official seals and logos? 1200.16 Section 1200.16 Parks, Forests, and Public Property NATIONAL ARCHIVES AND RECORDS ADMINISTRATION GENERAL RULES OFFICIAL SEALS Penalties for Misuse of NARA Seals and Logos § 1200.16 Will I be penalized for misusing the official seals and logos? (a) Seals. (1) If...
36 CFR 1200.16 - Will I be penalized for misusing the official seals and logos?
Code of Federal Regulations, 2011 CFR
2011-07-01
... misusing the official seals and logos? 1200.16 Section 1200.16 Parks, Forests, and Public Property NATIONAL ARCHIVES AND RECORDS ADMINISTRATION GENERAL RULES OFFICIAL SEALS Penalties for Misuse of NARA Seals and Logos § 1200.16 Will I be penalized for misusing the official seals and logos? (a) Seals. (1) If...
7 CFR 1484.73 - Are Cooperators penalized for failing to make required contributions?
Code of Federal Regulations, 2014 CFR
2014-01-01
... 7 Agriculture 10 2014-01-01 2014-01-01 false Are Cooperators penalized for failing to make required contributions? 1484.73 Section 1484.73 Agriculture Regulations of the Department of Agriculture (Continued) COMMODITY CREDIT CORPORATION, DEPARTMENT OF AGRICULTURE EXPORT PROGRAMS PROGRAMS TO HELP...
7 CFR 1484.73 - Are Cooperators penalized for failing to make required contributions?
Code of Federal Regulations, 2013 CFR
2013-01-01
... 7 Agriculture 10 2013-01-01 2013-01-01 false Are Cooperators penalized for failing to make required contributions? 1484.73 Section 1484.73 Agriculture Regulations of the Department of Agriculture (Continued) COMMODITY CREDIT CORPORATION, DEPARTMENT OF AGRICULTURE EXPORT PROGRAMS PROGRAMS TO HELP...
41 CFR 302-2.14 - Will I be penalized for violation of my service agreement?
Code of Federal Regulations, 2011 CFR
2011-07-01
... violation of my service agreement? 302-2.14 Section 302-2.14 Public Contracts and Property Management Federal Travel Regulation System RELOCATION ALLOWANCES INTRODUCTION 2-EMPLOYEES ELIGIBILITY REQUIREMENTS General Rules Service Agreements and Disclosure Statement § 302-2.14 Will I be penalized for violation...
41 CFR 302-2.14 - Will I be penalized for violation of my service agreement?
Code of Federal Regulations, 2010 CFR
2010-07-01
... violation of my service agreement? 302-2.14 Section 302-2.14 Public Contracts and Property Management Federal Travel Regulation System RELOCATION ALLOWANCES INTRODUCTION 2-EMPLOYEES ELIGIBILITY REQUIREMENTS General Rules Service Agreements § 302-2.14 Will I be penalized for violation of my service agreement?...
Regression versus No Regression in the Autistic Disorder: Developmental Trajectories
ERIC Educational Resources Information Center
Bernabei, P.; Cerquiglini, A.; Cortesi, F.; D' Ardia, C.
2007-01-01
Developmental regression is a complex phenomenon which occurs in 20-49% of the autistic population. The aim of the study was to assess possible differences in the development of regressed and non-regressed autistic preschoolers. We longitudinally studied 40 autistic children (18 regressed, 22 non-regressed) aged 2-6 years. The following developmental…
Regression analysis of networked data
Zhou, Yan; Song, Peter X.-K.
2016-01-01
This paper concerns regression methodology for assessing relationships between multi-dimensional response variables and covariates that are correlated within a network. To address analytical challenges associated with the integration of network topology into the regression analysis, we propose a hybrid quadratic inference method that uses both prior and data-driven correlations among network nodes. A Godambe information-based tuning strategy is developed to allocate weights between the prior and data-driven network structures, so the estimator is efficient. The proposed method is conceptually simple and computationally fast, and has appealing large-sample properties. It is evaluated by simulation, and its application is illustrated using neuroimaging data from an association study of the effects of iron deficiency on auditory recognition memory in infants. PMID:27279658
Practical Session: Logistic Regression
NASA Astrophysics Data System (ADS)
Clausel, M.; Grégoire, G.
2014-12-01
An exercise is proposed to illustrate logistic regression. It investigates the different risk factors in the occurrence of coronary heart disease. It has been proposed in Chapter 5 of the book by D.G. Kleinbaum and M. Klein, "Logistic Regression", Statistics for Biology and Health, Springer Science+Business Media, LLC (2010) and also by D. Chessel and A.B. Dufour in Lyon 1 (see Sect. 6 of http://pbil.univ-lyon1.fr/R/pdf/tdr341.pdf). This example is based on data given in the file evans.txt, available from http://www.sph.emory.edu/dkleinb/logreg3.htm#data.
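A minimal Python version of the exercise might look as follows; the column names (chd as the outcome; cat, age, chl, smk as risk factors) are hypothetical stand-ins for the Evans County variables, since the exact layout of evans.txt is not reproduced here.

    import numpy as np
    import pandas as pd
    import statsmodels.formula.api as smf

    # Assumed: evans.txt is whitespace-delimited with a header row.
    evans = pd.read_csv("evans.txt", sep=r"\s+")
    model = smf.logit("chd ~ cat + age + chl + smk", data=evans).fit()
    print(model.summary())           # coefficients are log odds ratios
    print(np.exp(model.params))      # exponentiate to obtain odds ratios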
Dose reduction in digital breast tomosynthesis using a penalized maximum likelihood reconstruction
NASA Astrophysics Data System (ADS)
Das, Mini; Gifford, Howard; O'Connor, Michael; Glick, Stephen J.
2009-02-01
Digital breast tomosynthesis (DBT) is a 3D imaging modality with limited angle projection data. The ability of tomosynthesis systems to accurately detect smaller microcalcifications is debatable. This is because of the higher noise in the projection data (lower average dose per projection), which is then propagated through the reconstructed image. Reconstruction methods that minimize the propagation of quantum noise have the potential to improve microcalcification detectability using DBT. In this paper we show that penalized maximum likelihood (PML) reconstruction in DBT yields images with an improved resolution/noise tradeoff as compared to conventional filtered backprojection (FBP). Signal to noise ratio (SNR) using PML was observed to be higher than that obtained using the standard FBP algorithm. Our results indicate that for microcalcifications, using the PML algorithm, reconstructions obtained with a mean glandular dose (MGD) of 1.5 mGy yielded better SNR than those obtained with FBP using a 4 mGy total dose. Thus total dose could perhaps be reduced to one-third or lower with the same microcalcification detectability, if PML reconstruction is used instead of FBP. Visibility of low contrast masses with various contrast levels was studied using a contrast-detail phantom in a breast-shaped structure with average breast density. Images generated at various dose levels indicate that visibility of low contrast masses in PML reconstructions is significantly better than in those generated using FBP. SNR measurements in the low-contrast study did not appear to correlate with the visual subjective analysis of the reconstructions, indicating that SNR is not a good figure of merit here.
Chen, Y
2015-06-15
Purpose: To improve the quality of kV X-ray cone beam CT (CBCT) for use in radiotherapy delivery assessment and re-planning by using penalized likelihood (PL) iterative reconstruction and auto-segmentation accuracy of the resulting CBCTs as an image quality metric. Methods: Present filtered backprojection (FBP) CBCT reconstructions can be improved upon by PL reconstruction with image formation models and appropriate regularization constraints. We use two constraints: 1) image smoothing via an edge preserving filter, and 2) a constraint minimizing the differences between the reconstruction and a registered prior image. Reconstructions of prostate therapy CBCTs were computed with constraint 1 alone and with both constraints. The prior images were planning CTs(pCT) deformable-registered to the FBP reconstructions. Anatomy segmentations were done using atlas-based auto-segmentation (Elekta ADMIRE). Results: We observed small but consistent improvements in the Dice similarity coefficients of PL reconstructions over the FBP results, and additional small improvements with the added prior image constraint. For a CBCT with anatomy very similar in appearance to the pCT, we observed these changes in the Dice metric: +2.9% (prostate), +8.6% (rectum), −1.9% (bladder). For a second CBCT with a very different rectum configuration, we observed +0.8% (prostate), +8.9% (rectum), −1.2% (bladder). For a third case with significant lateral truncation of the field of view, we observed: +0.8% (prostate), +8.9% (rectum), −1.2% (bladder). Adding the prior image constraint raised Dice measures by about 1%. Conclusion: Efficient and practical adaptive radiotherapy requires accurate deformable registration and accurate anatomy delineation. We show here small and consistent patterns of improved contour accuracy using PL iterative reconstruction compared with FBP reconstruction. However, the modest extent of these results and the pattern of differences across CBCT cases suggest that
Image segmentation via piecewise constant regression
NASA Astrophysics Data System (ADS)
Acton, Scott T.; Bovik, Alan C.
1994-09-01
We introduce a novel unsupervised image segmentation technique that is based on piecewise constant (PICO) regression. Given an input image, a PICO output image for a specified feature size (scale) is computed via nonlinear regression. The regression effectively provides the constant region segmentation of the input image that has a minimum deviation from the input image. PICO regression-based segmentation avoids the problems of region merging, poor localization, region boundary ambiguity, and region fragmentation. Additionally, our segmentation method is particularly well-suited for corrupted (noisy) input data. An application to segmentation and classification of remotely sensed imagery is provided.
Regression Analysis: Legal Applications in Institutional Research
ERIC Educational Resources Information Center
Frizell, Julie A.; Shippen, Benjamin S., Jr.; Luna, Andrew L.
2008-01-01
This article reviews multiple regression analysis, describes how its results should be interpreted, and instructs institutional researchers on how to conduct such analyses using an example focused on faculty pay equity between men and women. The use of multiple regression analysis will be presented as a method with which to compare salaries of…
Explorations in Statistics: Regression
ERIC Educational Resources Information Center
Curran-Everett, Douglas
2011-01-01
Learning about statistics is a lot like learning about science: the learning is more meaningful if you can actively explore. This seventh installment of "Explorations in Statistics" explores regression, a technique that estimates the nature of the relationship between two things for which we may only surmise a mechanistic or predictive connection.…
Modern Regression Discontinuity Analysis
ERIC Educational Resources Information Center
Bloom, Howard S.
2012-01-01
This article provides a detailed discussion of the theory and practice of modern regression discontinuity (RD) analysis for estimating the effects of interventions or treatments. Part 1 briefly chronicles the history of RD analysis and summarizes its past applications. Part 2 explains how in theory an RD analysis can identify an average effect of…
Webcast entitled Statistical Tools for Making Sense of Data, by the National Nutrient Criteria Support Center, N-STEPS (Nutrients-Scientific Technical Exchange Partnership. The section "Correlation and Regression" provides an overview of these two techniques in the context of nut...
Multiple linear regression analysis
NASA Technical Reports Server (NTRS)
Edwards, T. R.
1980-01-01
Program rapidly selects best-suited set of coefficients. User supplies only vectors of independent and dependent data and specifies confidence level required. Program uses stepwise statistical procedure for relating minimal set of variables to set of observations; final regression contains only most statistically significant coefficients. Program is written in FORTRAN IV for batch execution and has been implemented on NOVA 1200.
Mechanisms of neuroblastoma regression
Brodeur, Garrett M.; Bagatell, Rochelle
2014-01-01
Recent genomic and biological studies of neuroblastoma have shed light on the dramatic heterogeneity in the clinical behaviour of this disease, which spans from spontaneous regression or differentiation in some patients, to relentless disease progression in others, despite intensive multimodality therapy. This evidence also suggests several possible mechanisms to explain the phenomena of spontaneous regression in neuroblastomas, including neurotrophin deprivation, humoral or cellular immunity, loss of telomerase activity and alterations in epigenetic regulation. A better understanding of the mechanisms of spontaneous regression might help to identify optimal therapeutic approaches for patients with these tumours. Currently, the most druggable mechanism is the delayed activation of developmentally programmed cell death regulated by the tropomyosin receptor kinase A pathway. Indeed, targeted therapy aimed at inhibiting neurotrophin receptors might be used in lieu of conventional chemotherapy or radiation in infants with biologically favourable tumours that require treatment. Alternative approaches consist of breaking immune tolerance to tumour antigens or activating neurotrophin receptor pathways to induce neuronal differentiation. These approaches are likely to be most effective against biologically favourable tumours, but they might also provide insights into treatment of biologically unfavourable tumours. We describe the different mechanisms of spontaneous neuroblastoma regression and the consequent therapeutic approaches. PMID:25331179
Bayesian ARTMAP for regression.
Sasu, L M; Andonie, R
2013-10-01
Bayesian ARTMAP (BA) is a recently introduced neural architecture which uses a combination of Fuzzy ARTMAP competitive learning and Bayesian learning. Training is generally performed online, in a single epoch. During training, BA creates input data clusters as Gaussian categories, and also infers the conditional probabilities between input patterns and categories, and between categories and classes. During prediction, BA uses Bayesian posterior probability estimation. So far, BA has been used only for classification. The goal of this paper is to analyze the efficiency of BA for regression problems. Our contributions are: (i) we generalize the BA algorithm using the clustering functionality of both ART modules, and name it BA for Regression (BAR); (ii) we prove that BAR is a universal approximator with the best approximation property. In other words, BAR approximates arbitrarily well any continuous function (universal approximation) and, for every given continuous function, there is one in the set of BAR approximators situated at minimum distance (best approximation); (iii) we experimentally compare the online trained BAR with several neural models, on the following standard regression benchmarks: CPU Computer Hardware, Boston Housing, Wisconsin Breast Cancer, and Communities and Crime. Our results show that BAR is an appropriate tool for regression tasks, both for theoretical and practical reasons. PMID:23665468
The Regression Trunk Approach to Discover Treatment Covariate Interaction
ERIC Educational Resources Information Center
Dusseldorp, Elise; Meulman, Jacqueline J.
2004-01-01
The regression trunk approach (RTA) is an integration of regression trees and multiple linear regression analysis. In this paper RTA is used to discover treatment covariate interactions, in the regression of one continuous variable on a treatment variable with "multiple" covariates. The performance of RTA is compared to the classical method of…
Teipel, Stefan J.; Kurth, Jens; Krause, Bernd; Grothe, Michel J.
2015-01-01
Selecting a set of relevant markers to predict conversion from mild cognitive impairment (MCI) to Alzheimer's disease (AD) has become a challenging task given the wealth of regional pathologic information that can be extracted from multimodal imaging data. Here, we used regularized regression approaches with an elastic net penalty for best subset selection of multiregional information from AV45-PET, FDG-PET and volumetric MRI data to predict conversion from MCI to AD. The study sample consisted of 127 MCI subjects from ADNI-2 who had a clinical follow-up between 6 and 31 months. Additional analyses assessed the effect of partial volume correction on predictive performance of AV45- and FDG-PET data. Predictor variables were highly collinear within and across imaging modalities. Penalized Cox regression yielded more parsimonious prediction models compared to unpenalized Cox regression. Within single modalities, time to conversion was best predicted by increased AV45-PET signal in posterior medial and lateral cortical regions, decreased FDG-PET signal in medial temporal and temporobasal regions, and reduced gray matter volume in medial, basal, and lateral temporal regions. Logistic regression models reached up to 72% cross-validated accuracy for prediction of conversion status, which was comparable to cross-validated accuracy of non-linear support vector machine classification. Regularized regression outperformed unpenalized stepwise regression when number of parameters approached or exceeded the number of training cases. Partial volume correction had a negative effect on the predictive performance of AV45-PET, but slightly improved the predictive value of FDG-PET data. Penalized regression yielded more parsimonious models than unpenalized stepwise regression for the integration of multiregional and multimodal imaging information. The advantage of penalized regression was particularly strong with a high number of collinear predictors. PMID:26199870
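The study's penalized models can be approximated with standard tools. The sketch below shows cross-validated, elastic-net-penalized logistic regression for conversion status in scikit-learn; the feature matrix and labels are random placeholders for the multimodal imaging predictors, and the penalty settings are illustrative rather than the study's tuned values.

    import numpy as np
    from sklearn.linear_model import LogisticRegression
    from sklearn.model_selection import cross_val_score
    from sklearn.pipeline import make_pipeline
    from sklearn.preprocessing import StandardScaler

    rng = np.random.default_rng(4)
    X = rng.normal(size=(127, 300))     # 127 subjects, many collinear predictors
    y = rng.integers(0, 2, size=127)    # converter vs. non-converter (placeholder)

    clf = make_pipeline(
        StandardScaler(),
        LogisticRegression(penalty="elasticnet", solver="saga",
                           l1_ratio=0.5, C=0.1, max_iter=5000))
    acc = cross_val_score(clf, X, y, cv=10)
    print("cross-validated accuracy:", acc.mean())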
Ridge Regression: A Regression Procedure for Analyzing Correlated Independent Variables.
ERIC Educational Resources Information Center
Rakow, Ernest A.
Ridge regression is presented as an analytic technique to be used when predictor variables in a multiple linear regression situation are highly correlated, a situation which may result in unstable regression coefficients and difficulties in interpretation. Ridge regression avoids the problem of selection of variables that may occur in stepwise…
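The estimator behind this description has the closed form beta_hat = (X'X + kI)^(-1) X'y. A small sketch showing how increasing the ridge constant k stabilizes the coefficients of two nearly collinear predictors (toy data):

    import numpy as np

    rng = np.random.default_rng(5)
    n = 100
    x1 = rng.normal(size=n)
    x2 = x1 + 0.01 * rng.normal(size=n)      # nearly collinear predictor
    X = np.column_stack([x1, x2])
    y = x1 + x2 + rng.normal(size=n)

    for k in [0.0, 1.0, 10.0]:
        beta = np.linalg.solve(X.T @ X + k * np.eye(2), X.T @ y)
        print(f"k={k:5.1f}  coefficients: {np.round(beta, 2)}")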
NASA Astrophysics Data System (ADS)
He, Anhua; Singh, Ramesh P.; Sun, Zhaohua; Ye, Qing; Zhao, Gang
2016-05-01
The earth tide, atmospheric pressure, precipitation, and earthquakes all influence water well levels; earthquakes in particular have a strong impact, and anomalous co-seismic changes in ground water levels have been observed. In this paper, we have used four different models, simple linear regression (SLR), multiple linear regression (MLR), principal component analysis (PCA) and partial least squares (PLS), to compute the atmospheric pressure and earth tidal effects on water level. Furthermore, we have used the Akaike information criterion (AIC) to study the performance of the various models. Based on the lowest AIC and sum of squares for error values, the best estimate of the effects of atmospheric pressure and earth tide on water level is found using the MLR model. However, the MLR model does not account for multicollinearity between inputs; as a result, the atmospheric pressure and earth tidal response coefficients fail to reflect the mechanisms associated with the groundwater level fluctuations. Among the models that resolve the serious multicollinearity of the inputs, the PLS model shows the minimum AIC value, and the atmospheric pressure and earth tidal response coefficients agree closely with the observations. The atmospheric pressure and earth tidal response coefficients are found to be sensitive to the stress-strain state using the observed data for the period 1 April-8 June 2008 of Chuan 03# well. The transient enhancement of porosity of the rock mass around Chuan 03# well associated with the Wenchuan earthquake (Mw = 7.9, 12 May 2008), which returned to its original pre-seismic level after 13 days, indicates that the sharp co-seismic rise of the water well level could be induced by static stress change rather than the development of new fractures.
Ridge Regression Signal Processing
NASA Technical Reports Server (NTRS)
Kuhl, Mark R.
1990-01-01
The introduction of the Global Positioning System (GPS) into the National Airspace System (NAS) necessitates the development of Receiver Autonomous Integrity Monitoring (RAIM) techniques. In order to guarantee a certain level of integrity, a thorough understanding of modern estimation techniques applied to navigational problems is required. The extended Kalman filter (EKF) is derived and analyzed under poor geometry conditions. It was found that the performance of the EKF is difficult to predict, since the EKF is designed for a Gaussian environment. A novel approach is implemented which incorporates ridge regression to explain the behavior of an EKF in the presence of dynamics under poor geometry conditions. The basic principles of ridge regression theory are presented, followed by the derivation of a linearized recursive ridge estimator. Computer simulations are performed to confirm the underlying theory and to provide a comparative analysis of the EKF and the recursive ridge estimator.
Using Time-Series Regression to Predict Academic Library Circulations.
ERIC Educational Resources Information Center
Brooks, Terrence A.
1984-01-01
Four methods were used to forecast monthly circulation totals in 15 midwestern academic libraries: dummy time-series regression, lagged time-series regression, simple average (straight-line forecasting), monthly average (naive forecasting). In tests of forecasting accuracy, dummy regression method and monthly mean method exhibited smallest average…
Regression modeling of ground-water flow
Cooley, R.L.; Naff, R.L.
1985-01-01
Nonlinear multiple regression methods are developed to model and analyze groundwater flow systems. Complete descriptions of regression methodology as applied to groundwater flow models allow scientists and engineers engaged in flow modeling to apply the methods to a wide range of problems. Organization of the text proceeds from an introduction that discusses the general topic of groundwater flow modeling, to a review of basic statistics necessary to properly apply regression techniques, and then to the main topic: exposition and use of linear and nonlinear regression to model groundwater flow. Statistical procedures are given to analyze and use the regression models. A number of exercises and answers are included to exercise the student on nearly all the methods that are presented for modeling and statistical analysis. Three computer programs implement the more complex methods. These three are a general two-dimensional, steady-state regression model for flow in an anisotropic, heterogeneous porous medium, a program to calculate a measure of model nonlinearity with respect to the regression parameters, and a program to analyze model errors in computed dependent variables such as hydraulic head. (USGS)
Regression Verification Using Impact Summaries
NASA Technical Reports Server (NTRS)
Backes, John; Person, Suzette J.; Rungta, Neha; Thachuk, Oksana
2013-01-01
versions [19]. These techniques compare two programs with a large degree of syntactic similarity to prove that portions of one program version are equivalent to the other. Regression verification can be used for guaranteeing backward compatibility, and for showing behavioral equivalence in programs with syntactic differences, e.g., when a program is refactored to improve its performance, maintainability, or readability. Existing regression verification techniques leverage similarities between program versions by using abstraction and decomposition techniques to improve scalability of the analysis [10, 12, 19]. The abstractions and decomposition in these techniques, e.g., summaries of unchanged code [12] or semantically equivalent methods [19], compute an over-approximation of the program behaviors. The equivalence checking results of these techniques are sound but not complete: they may characterize programs as not functionally equivalent when, in fact, they are equivalent. In this work we describe a novel approach that leverages the impact of the differences between two programs for scaling regression verification. We partition the program behaviors of each version into (a) behaviors impacted by the changes and (b) behaviors not impacted (unimpacted) by the changes. Only the impacted program behaviors are used during equivalence checking. We then prove that checking equivalence of the impacted program behaviors is equivalent to checking equivalence of all program behaviors for a given depth bound. In this work we use symbolic execution to generate the program behaviors and leverage control- and data-dependence information to facilitate the partitioning of program behaviors. The impacted program behaviors are termed impact summaries. The dependence analyses that facilitate the generation of the impact summaries, we believe, could be used in conjunction with other abstraction and decomposition based approaches [10, 12] as a complementary reduction technique. An
Correlation and simple linear regression.
Eberly, Lynn E
2007-01-01
This chapter highlights important steps in using correlation and simple linear regression to address scientific questions about the association of two continuous variables with each other. These steps include estimation and inference, assessing model fit, the connection between regression and ANOVA, and study design. Examples in microbiology are used throughout. This chapter provides a framework that is helpful in understanding more complex statistical techniques, such as multiple linear regression, linear mixed effects models, logistic regression, and proportional hazards regression. PMID:18450049
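The chapter's two core analyses take only a few lines in practice; here is a small scipy sketch on toy data (the variable names and numbers are invented for illustration):

    import numpy as np
    from scipy import stats

    rng = np.random.default_rng(6)
    x = rng.uniform(0, 10, 30)                      # e.g., nutrient concentration
    y = 1.2 * x + rng.normal(scale=2.0, size=30)    # e.g., growth response

    r, p_corr = stats.pearsonr(x, y)
    fit = stats.linregress(x, y)
    print(f"correlation r={r:.2f} (p={p_corr:.3g})")
    print(f"slope={fit.slope:.2f}, intercept={fit.intercept:.2f}, "
          f"R^2={fit.rvalue ** 2:.2f}")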
Incremental hierarchical discriminant regression.
Weng, Juyang; Hwang, Wey-Shiuan
2007-03-01
This paper presents incremental hierarchical discriminant regression (IHDR) which incrementally builds a decision tree or regression tree for very high-dimensional regression or decision spaces by an online, real-time learning system. Biologically motivated, it is an approximate computational model for automatic development of associative cortex, with both bottom-up sensory inputs and top-down motor projections. At each internal node of the IHDR tree, information in the output space is used to automatically derive the local subspace spanned by the most discriminating features. Embedded in the tree is a hierarchical probability distribution model used to prune very unlikely cases during the search. The number of parameters in the coarse-to-fine approximation is dynamic and data-driven, enabling the IHDR tree to automatically fit data with unknown distribution shapes (thus, it is difficult to select the number of parameters up front). The IHDR tree dynamically assigns long-term memory to avoid the loss-of-memory problem typical with a global-fitting learning algorithm for neural networks. A major challenge for an incrementally built tree is that the number of samples varies arbitrarily during the construction process. An incrementally updated probability model, called sample-size-dependent negative-log-likelihood (SDNLL) metric is used to deal with large sample-size cases, small sample-size cases, and unbalanced sample-size cases, measured among different internal nodes of the IHDR tree. We report experimental results for four types of data: synthetic data to visualize the behavior of the algorithms, large face image data, continuous video stream from robot navigation, and publicly available data sets that use human defined features. PMID:17385628
Penalized likelihood for sparse contingency tables with an application to full-length cDNA libraries
Dahinden, Corinne; Parmigiani, Giovanni; Emerick, Mark C; Bühlmann, Peter
2007-01-01
Background The joint analysis of several categorical variables is a common task in many areas of biology, and is becoming central to systems biology investigations whose goal is to identify potentially complex interaction among variables belonging to a network. Interactions of arbitrary complexity are traditionally modeled in statistics by log-linear models. It is challenging to extend these to the high dimensional and potentially sparse data arising in computational biology. An important example, which provides the motivation for this article, is the analysis of so-called full-length cDNA libraries of alternatively spliced genes, where we investigate relationships among the presence of various exons in transcript species. Results We develop methods to perform model selection and parameter estimation in log-linear models for the analysis of sparse contingency tables, to study the interaction of two or more factors. Maximum Likelihood estimation of log-linear model coefficients might not be appropriate because of the presence of zeros in the table's cells, and new methods are required. We propose a computationally efficient ℓ1-penalization approach extending the Lasso algorithm to this context, and compare it to other procedures in a simulation study. We then illustrate these algorithms on contingency tables arising from full-length cDNA libraries. Conclusion We propose regularization methods that can be used successfully to detect complex interaction patterns among categorical variables in a broad range of biological problems involving categorical variables. PMID:18072965
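As a rough illustration of l1-penalized log-linear modeling (using statsmodels' generic elastic-net GLM, not the authors' extended Lasso algorithm), one can fit a lasso-penalized Poisson model to the cell counts of a sparse 2x2x2 table with all two-way interactions; the toy counts below are invented:

    from itertools import product

    import numpy as np
    import pandas as pd
    import statsmodels.api as sm

    rng = np.random.default_rng(7)
    # Counts for a 2x2x2 table (e.g., presence/absence of three exons), sparse.
    cells = pd.DataFrame(list(product([0, 1], repeat=3)), columns=["e1", "e2", "e3"])
    counts = rng.poisson([40, 0, 12, 0, 10, 3, 0, 25])    # several empty cells

    # Design: intercept, main effects, and all two-way interactions.
    X = sm.add_constant(np.column_stack([
        cells.e1, cells.e2, cells.e3,
        cells.e1 * cells.e2, cells.e1 * cells.e3, cells.e2 * cells.e3]))
    fit = sm.GLM(counts, X, family=sm.families.Poisson()).fit_regularized(
        alpha=0.1, L1_wt=1.0)                             # pure lasso penalty
    print(np.round(fit.params, 2))      # zeroed entries drop interaction terms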
Reconstruction of difference in sequential CT studies using penalized likelihood estimation.
Pourmorteza, A; Dang, H; Siewerdsen, J H; Stayman, J W
2016-03-01
Characterization of anatomical change and other differences is important in sequential computed tomography (CT) imaging, where a high-fidelity patient-specific prior image is typically present, but is not used, in the reconstruction of subsequent anatomical states. Here, we introduce a penalized likelihood (PL) method called reconstruction of difference (RoD) to directly reconstruct a difference image volume using both the current projection data and the (unregistered) prior image integrated into the forward model for the measurement data. The algorithm utilizes an alternating minimization to find both the registration and reconstruction estimates. This formulation allows direct control over the image properties of the difference image, permitting regularization strategies that inhibit noise and structural differences due to inconsistencies between the prior image and the current data. Additionally, if the change is known to be local, RoD allows local acquisition and reconstruction, as opposed to traditional model-based approaches that require a full support field of view (or other modifications). We compared the performance of RoD to a standard PL algorithm, in simulation studies and using test-bench cone-beam CT data. The performances of local and global RoD approaches were similar, with local RoD providing a significant computational speedup. In comparison across a range of data with differing fidelity, the local RoD approach consistently showed lower error (with respect to a truth image) than PL in both noisy data and sparsely sampled projection scenarios. In a study of the prior image registration performance of RoD, clinically reasonable capture ranges were demonstrated. Lastly, the registration algorithm had a broad capture range, and the error for reconstruction of CT data was 35% and 20% less than filtered back-projection for RoD and PL, respectively. RoD has potential for delivering high-quality difference images in a range of sequential clinical
Multiatlas segmentation as nonparametric regression.
Awate, Suyash P; Whitaker, Ross T
2014-09-01
This paper proposes a novel theoretical framework to model and analyze the statistical characteristics of a wide range of segmentation methods that incorporate a database of label maps or atlases; such methods are termed as label fusion or multiatlas segmentation. We model these multiatlas segmentation problems as nonparametric regression problems in the high-dimensional space of image patches. We analyze the nonparametric estimator's convergence behavior that characterizes expected segmentation error as a function of the size of the multiatlas database. We show that this error has an analytic form involving several parameters that are fundamental to the specific segmentation problem (determined by the chosen anatomical structure, imaging modality, registration algorithm, and label-fusion algorithm). We describe how to estimate these parameters and show that several human anatomical structures exhibit the trends modeled analytically. We use these parameter estimates to optimize the regression estimator. We show that the expected error for large database sizes is well predicted by models learned on small databases. Thus, a few expert segmentations can help predict the database sizes required to keep the expected error below a specified tolerance level. Such cost-benefit analysis is crucial for deploying clinical multiatlas segmentation systems. PMID:24802528
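The paper's view of label fusion as nonparametric regression can be pictured with a tiny Nadaraya-Watson sketch: each atlas patch votes for its label with a weight that decays with patch distance. Everything below (patch size, bandwidth, random data) is an invented toy, not the paper's estimator:

    import numpy as np

    rng = np.random.default_rng(8)
    n_atlas, d = 200, 25                       # 200 atlas patches of 5x5 voxels
    atlas_patches = rng.normal(size=(n_atlas, d))
    atlas_labels = (atlas_patches.mean(axis=1) > 0).astype(float)
    target_patch = rng.normal(size=d)

    h = 2.0                                    # kernel bandwidth
    dist2 = ((atlas_patches - target_patch) ** 2).sum(axis=1)
    w = np.exp(-dist2 / (2 * h ** 2))          # Gaussian kernel weights
    label_prob = (w * atlas_labels).sum() / w.sum()   # soft fused label in [0, 1]
    print("fused label probability:", round(float(label_prob), 3))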
NASA Technical Reports Server (NTRS)
Kuhl, Mark R.
1990-01-01
Current navigation requirements depend on a geometric dilution of precision (GDOP) criterion. As long as the GDOP stays below a specific value, navigation requirements are met. The GDOP will exceed the specified value when the measurement geometry becomes too collinear. A new signal processing technique, called Ridge Regression Processing, can reduce the effects of nearly collinear measurement geometry; thereby reducing the inflation of the measurement errors. It is shown that the Ridge signal processor gives a consistently better mean squared error (MSE) in position than the Ordinary Least Mean Squares (OLS) estimator. The applicability of this technique is currently being investigated to improve the following areas: receiver autonomous integrity monitoring (RAIM), coverage requirements, availability requirements, and precision approaches.
NASA Astrophysics Data System (ADS)
Polat, Esra; Gunay, Suleyman
2013-10-01
One of the problems encountered in Multiple Linear Regression (MLR) is multicollinearity, which causes the overestimation of the regression parameters and increases the variance of these parameters. Hence, when multicollinearity is present, biased estimation procedures such as classical Principal Component Regression (CPCR) and Partial Least Squares Regression (PLSR) are performed. The SIMPLS algorithm is the leading PLSR algorithm because of its speed and efficiency, and its results are easier to interpret. However, both CPCR and SIMPLS yield very unreliable results when the data set contains outlying observations. Therefore, Hubert and Vanden Branden (2003) presented a robust PCR (RPCR) method and a robust PLSR (RPLSR) method called RSIMPLS. In RPCR, a robust Principal Component Analysis (PCA) method for high-dimensional data is first applied to the independent variables; then the dependent variables are regressed on the scores using a robust regression method. RSIMPLS is constructed from a robust covariance matrix for high-dimensional data and robust linear regression. The purpose of this study is to show the usage of the RPCR and RSIMPLS methods on an econometric data set by comparing the two methods on an inflation model of Turkey. The considered methods are compared in terms of predictive ability and goodness of fit by using a robust Root Mean Squared Error of Cross-validation (R-RMSECV), a robust R2 value and the Robust Component Selection (RCS) statistic.
NASA Astrophysics Data System (ADS)
Lin, Yingzhi; Deng, Xiangzheng; Li, Xing; Ma, Enjun
2014-12-01
Spatially explicit simulation of land use change is the basis for estimating the effects of land use and cover change on energy fluxes, ecology and the environment. At the pixel level, logistic regression is one of the most common approaches used in spatially explicit land use allocation models to determine the relationship between land use and its causal factors in driving land use change, and thereby to evaluate land use suitability. However, these models have a drawback in that they do not determine/allocate land use based on the direct relationship between land use change and its driving factors. Consequently, a multinomial logistic regression method was introduced to address this flaw, and thereby, judge the suitability of a type of land use in any given pixel in a case study area of the Jiangxi Province, China. A comparison of the two regression methods indicated that the proportion of correctly allocated pixels using multinomial logistic regression was 92.98%, which was 8.47% higher than that obtained using logistic regression. Paired t-test results also showed that pixels were more clearly distinguished by multinomial logistic regression than by logistic regression. In conclusion, multinomial logistic regression is a more efficient and accurate method for the spatial allocation of land use changes. The application of this method in future land use change studies may improve the accuracy of predicting the effects of land use and cover change on energy fluxes, ecology, and environment.
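The pixel-level allocation idea maps directly onto standard tools: fit a multinomial logistic regression of observed land-use class on the driving factors, then allocate each pixel to its most probable class. The sketch below uses scikit-learn on synthetic drivers and classes (all names and sizes are invented):

    import numpy as np
    from sklearn.linear_model import LogisticRegression

    rng = np.random.default_rng(9)
    n_pixels = 5000
    drivers = rng.normal(size=(n_pixels, 4))   # e.g., slope, distance to road, ...
    true_w = rng.normal(size=(3, 4))           # three land-use classes (toy)
    land_use = np.argmax(drivers @ true_w.T + rng.gumbel(size=(n_pixels, 3)), axis=1)

    # With a multiclass target, LogisticRegression fits a multinomial model.
    model = LogisticRegression(max_iter=1000).fit(drivers, land_use)
    suitability = model.predict_proba(drivers)   # per-pixel class probabilities
    allocated = suitability.argmax(axis=1)       # allocate the most suitable class
    print("proportion correctly allocated:", (allocated == land_use).mean())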
A New Sample Size Formula for Regression.
ERIC Educational Resources Information Center
Brooks, Gordon P.; Barcikowski, Robert S.
The focus of this research was to determine the efficacy of a new method of selecting sample sizes for multiple linear regression. A Monte Carlo simulation was used to study both empirical predictive power rates and empirical statistical power rates of the new method and seven other methods: those of C. N. Park and A. L. Dudycha (1974); J. Cohen…
Southard, Rodney E.
2013-01-01
The weather and precipitation patterns in Missouri vary considerably from year to year. In 2008, the statewide average rainfall was 57.34 inches, and in 2012 it was 30.64 inches. This variability in precipitation and resulting streamflow underscores the necessity for water managers and users to have reliable streamflow statistics and a means to compute selected statistics at ungaged locations for a better understanding of water availability. Knowledge of surface-water availability is dependent on the streamflow data that have been collected and analyzed by the U.S. Geological Survey for more than 100 years at approximately 350 streamgages throughout Missouri. The U.S. Geological Survey, in cooperation with the Missouri Department of Natural Resources, computed streamflow statistics at streamgages through the 2010 water year, defined periods of drought, defined methods to estimate streamflow statistics at ungaged locations, and developed regional regression equations to compute selected streamflow statistics at ungaged locations. Streamflow statistics and flow durations were computed for 532 streamgages in Missouri and in neighboring States. For streamgages with more than 10 years of record, Kendall’s tau was computed to evaluate for trends in streamflow data. If trends were detected, the variable length method was used to define the period of no trend: water years were removed from the dataset from the beginning of the record for a streamgage until no trend was detected. Low-flow frequency statistics were then computed for the entire period of record and for the period of no trend if 10 or more years of record were available for each analysis. Three methods are presented for computing selected streamflow statistics at ungaged locations. The first method uses power curve equations developed for 28 selected streams in Missouri and neighboring States that have multiple streamgages on the same streams. Statistical
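The variable length method lends itself to a short loop. The sketch below uses a synthetic annual series and assumes a 0.05 significance cutoff for Kendall's tau, which the report does not state.

```r
# Variable length method: drop water years from the start of the record
# until Kendall's tau no longer detects a trend (0.05 cutoff assumed).
set.seed(4)
year <- 1950:2010
flow <- c(seq(40, 20, length.out = 30), rep(25, 31)) + rnorm(61, sd = 3)
while (length(flow) > 10 &&
       cor.test(year, flow, method = "kendall")$p.value < 0.05) {
  year <- year[-1]   # remove the earliest water year
  flow <- flow[-1]
}
range(year)          # period of no trend used for low-flow statistics
```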
Consumer Education. An Introductory Unit for Inmates in Penal Institutions.
ERIC Educational Resources Information Center
Schmoele, Henry H.; And Others
This introductory consumer education curriculum outline contains materials designed to help soon-to-be-released prisoners to develop an awareness of consumer concerns and to better manage their family lives. Each of the four units provided includes lesson objectives, suggested contents, suggested teaching methods, handouts, and tests. The unit on…
Recursive Algorithm For Linear Regression
NASA Technical Reports Server (NTRS)
Varanasi, S. V.
1988-01-01
Order of model determined easily. Linear-regression algorithm includes recursive equations for coefficients of model of increased order. Algorithm eliminates duplicative calculations and facilitates search for minimum order of linear-regression model that fits set of data satisfactorily.
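The recursion itself is not reproduced in the abstract; the sketch below only illustrates the goal of the search, fitting models of increasing order and reading off the minimum satisfactory order, with AIC as an assumed fit criterion.

```r
# Search for the minimum satisfactory model order (AIC as assumed criterion).
set.seed(5)
x <- seq(0, 1, length.out = 50)
y <- 1 + 2 * x - 3 * x^2 + rnorm(50, sd = 0.1)
fits <- lapply(1:6, function(p) lm(y ~ poly(x, p)))
sapply(fits, AIC)   # the order-2 model should sit near the minimum
```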
Heteroscedastic transformation cure regression models.
Chen, Chyong-Mei; Chen, Chen-Hsin
2016-06-30
Cure models have been applied to analyze clinical trials with cures and age-at-onset studies with nonsusceptibility. Lu and Ying (On semiparametric transformation cure model. Biometrika 2004; 91:331–343. DOI: 10.1093/biomet/91.2.331) developed a general class of semiparametric transformation cure models, which assumes that the failure times of uncured subjects, after an unknown monotone transformation, follow a regression model with homoscedastic residuals. However, it cannot deal with frequently encountered heteroscedasticity, which may result from dispersed ranges of failure time span among uncured subjects' strata. To tackle this phenomenon, this article presents semiparametric heteroscedastic transformation cure models. The cure status and the failure time of an uncured subject are fitted by a logistic regression model and a heteroscedastic transformation model, respectively. Unlike the approach of Lu and Ying, we derive score equations from the full likelihood for estimating the regression parameters in the proposed model. A martingale difference function similar to their proposal is used to estimate the infinite-dimensional transformation function. Our proposed estimating approach is intuitively applicable and can be conveniently extended to other complicated models when the maximization of the likelihood may be too tedious to be implemented. We conduct simulation studies to validate large-sample properties of the proposed estimators and to compare with the approach of Lu and Ying via the relative efficiency. The estimating method and the two relevant goodness-of-fit graphical procedures are illustrated by using breast cancer data and melanoma data. Copyright © 2016 John Wiley & Sons, Ltd. PMID:26887342
Bailey-Wilson, Joan E.; Brennan, Jennifer S.; Bull, Shelley B; Culverhouse, Robert; Kim, Yoonhee; Jiang, Yuan; Jung, Jeesun; Li, Qing; Lamina, Claudia; Liu, Ying; Mägi, Reedik; Niu, Yue S.; Simpson, Claire L.; Wang, Libo; Yilmaz, Yildiz E.; Zhang, Heping; Zhang, Zhaogong
2012-01-01
Group 14 of Genetic Analysis Workshop 17 examined several issues related to analysis of complex traits using DNA sequence data. These issues included novel methods for analyzing rare genetic variants in an aggregated manner (often termed collapsing rare variants), evaluation of various study designs to increase power to detect effects of rare variants, and the use of machine learning approaches to model highly complex heterogeneous traits. Various published and novel methods for analyzing traits with extreme locus and allelic heterogeneity were applied to the simulated quantitative and disease phenotypes. Overall, we conclude that power is (as expected) dependent on locus-specific heritability or contribution to disease risk, large samples will be required to detect rare causal variants with small effect sizes, extreme phenotype sampling designs may increase power for smaller laboratory costs, methods that allow joint analysis of multiple variants per gene or pathway are more powerful in general than analyses of individual rare variants, population-specific analyses can be optimal when different subpopulations harbor private causal mutations, and machine learning methods may be useful for selecting subsets of predictors for follow-up in the presence of extreme locus heterogeneity and large numbers of potential predictors. PMID:22128066
Gubernator, Jerzy; Lipka, Dominik; Korycińska, Mariola; Kempińska, Katarzyna; Milczarek, Magdalena; Wietrzyk, Joanna; Hrynyk, Rafał; Barnert, Sabine; Süss, Regine; Kozubek, Arkadiusz
2014-01-01
Liposomes act as efficient drug carriers. Recently, an epirubicin (EPI) formulation was developed using a novel EDTA ion gradient method for drug encapsulation. This formulation displayed very good stability and drug retention in vitro in a two-year long-term stability experiment. The cryo-TEM images show drug precipitate structures different from those formed with the ammonium sulfate method, which is usually used to encapsulate anthracyclines. Its pharmacokinetic properties and its efficacy in the human breast MDA-MB-231 cancer xenograft model were also determined. The liposomal EPI formulation is eliminated slowly, with an AUC of 7.6487, whereas the free drug has an AUC of only 0.0097. The formulation also had a much higher overall antitumor efficacy than the free drug. PMID:24621591
Luo, Chongliang; Liu, Jin; Dey, Dipak K; Chen, Kun
2016-07-01
In many fields, multi-view datasets, measuring multiple distinct but interrelated sets of characteristics on the same set of subjects, together with data on certain outcomes or phenotypes, are routinely collected. The objective in such a problem is often two-fold: both to explore the association structures of multiple sets of measurements and to develop a parsimonious model for predicting the future outcomes. We study a unified canonical variate regression framework to tackle the two problems simultaneously. The proposed criterion integrates multiple canonical correlation analysis with predictive modeling, balancing between the association strength of the canonical variates and their joint predictive power on the outcomes. Moreover, the proposed criterion seeks multiple sets of canonical variates simultaneously to enable the examination of their joint effects on the outcomes, and is able to handle multivariate and non-Gaussian outcomes. An efficient algorithm based on variable splitting and Lagrangian multipliers is proposed. Simulation studies show the superior performance of the proposed approach. We demonstrate the effectiveness of the proposed approach in an [Formula: see text] intercross mice study and an alcohol dependence study. PMID:26861909
Topics in route-regression analysis
Geissler, P.H.; Sauer, J.R.
1990-01-01
The route-regression method has been used in recent years to analyze data from roadside surveys. With this method, a population trend is estimated for each route in a region, then regional trends are estimated as a weighted mean of the individual route trends. This method can accurately incorporate data that is unbalanced by changes in years surveyed and observer differences. We suggest that route-regression methodology is most efficient in the estimation of long-term (>5 year) trends, and tends to provide conservative results for low-density species.
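A toy version of the two-stage estimator described: per-route log-linear trends combined by a weighted mean. The weights (route mean counts) are a placeholder; the published method's weighting is more elaborate.

```r
# Per-route trends combined into a regional trend (toy weighting).
set.seed(6)
routes <- lapply(1:30, function(r) {
  yr <- 1980:1989
  data.frame(yr, count = rpois(10, exp(2 + 0.03 * (yr - 1980) + rnorm(1, sd = 0.2))))
})
trend <- sapply(routes, function(d) coef(lm(log(d$count + 0.5) ~ d$yr))[2])
wts   <- sapply(routes, function(d) mean(d$count))
weighted.mean(trend, wts)   # estimated regional population trend
```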
Nonquadratic penalization improves near-infrared diffuse optical tomography.
Jagannath, Ravi Prasad K; Yalavarthy, Phaneendra K
2013-08-01
A new approach that can easily incorporate any generic penalty function into the diffuse optical tomographic image reconstruction is introduced to show the utility of nonquadratic penalty functions. The penalty functions that were used include quadratic (ℓ2), absolute (ℓ1), Cauchy, and Geman-McClure. The regularization parameter in each of these cases was obtained automatically by using the generalized cross-validation method. The reconstruction results were systematically compared with each other via quantitative metrics, such as relative error and Pearson correlation. The reconstruction results indicate that, while the quadratic penalty may be able to provide better separation between two closely spaced targets, its contrast recovery capability is limited, and the sparseness-promoting penalties, such as ℓ1, Cauchy, and Geman-McClure, have better utility in reconstructing high-contrast and complex-shaped targets, with the Geman-McClure penalty performing best. PMID:24323209
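The four penalties, written out as R functions of a residual/solution argument. The scale parameter s in the Cauchy and Geman-McClure forms is illustrative, and these are the standard textbook forms rather than the paper's exact parameterization.

```r
# Penalty functions compared in the study (standard forms; s is illustrative).
quad <- function(x) x^2                        # quadratic (l2)
absv <- function(x) abs(x)                     # absolute (l1)
cchy <- function(x, s = 1) log(1 + (x / s)^2)  # Cauchy
gmcc <- function(x, s = 1) x^2 / (x^2 + s^2)   # Geman-McClure (bounded)
curve(quad, -3, 3, ylab = "penalty")
curve(absv, -3, 3, add = TRUE, lty = 2)
curve(cchy, -3, 3, add = TRUE, lty = 3)
curve(gmcc, -3, 3, add = TRUE, lty = 4)
```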
A new bivariate negative binomial regression model
NASA Astrophysics Data System (ADS)
Faroughi, Pouya; Ismail, Noriszura
2014-12-01
This paper introduces a new form of bivariate negative binomial (BNB-1) regression which can be fitted to bivariate and correlated count data with covariates. The BNB regression discussed in this study can be fitted to bivariate and overdispersed count data with positive, zero or negative correlations. The joint p.m.f. of the BNB-1 distribution is derived from the product of two negative binomial marginals with a multiplicative factor parameter. Several testing methods were used to check overdispersion and goodness-of-fit of the model. Application of BNB-1 regression is illustrated on a Malaysian motor insurance dataset. The results indicated that BNB-1 regression has a better fit than the bivariate Poisson and BNB-2 models with regard to the Akaike information criterion.
Afanador, N L; Tran, T N; Buydens, L M C
2013-03-20
Bio-pharmaceutical manufacturing is a multifaceted and complex process wherein, during the manufacture of a single batch, hundreds of processing variables and raw materials are monitored. In these processes, identifying the candidate variables responsible for any changes in process performance can prove to be extremely challenging. Within this context, partial least squares (PLS) has proven to be an important tool in helping determine the root cause for changes in biological performance, such as cellular growth or viral propagation. In spite of the positive impact PLS has had in helping understand bio-pharmaceutical process data, the high variability in the measured response (Y) and predictor variables (X), and the weak relationship between X and Y, have at times made root cause determination for process changes difficult. Our goal is to demonstrate how the use of bootstrapping, in conjunction with permutation tests, can provide avenues for improving the selection of variables responsible for manufacturing process changes via the variable importance in the projection (PLS-VIP) statistic. Although applied uniquely to the PLS-VIP in this article, the generality of the aforementioned methods can be used to improve other variable selection methods, in addition to increasing confidence around other estimates obtained from a PLS model. PMID:23473249
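A sketch of bootstrapping the VIP statistic around a pls::plsr fit. The vip() helper below is the commonly circulated single-response implementation, and the data, component count, and quantile summary are all illustrative assumptions; the paper's permutation-test step is not reproduced.

```r
# Bootstrap distribution of PLS-VIP scores (single-response VIP formula).
library(pls)
vip <- function(m) {
  W  <- unclass(m$loading.weights)              # p x A weight matrix
  SS <- c(m$Yloadings)^2 * colSums(m$scores^2)  # Y-variance per component
  drop(sqrt(nrow(W) * (W^2 %*% (SS / colSums(W^2))) / sum(SS)))
}
set.seed(7)
X <- matrix(rnorm(60 * 10), 60, 10)
y <- X[, 1] - X[, 2] + rnorm(60, sd = 2)        # weak X-Y relationship
boot_vip <- replicate(200, {
  i <- sample(60, replace = TRUE)
  vip(plsr(y[i] ~ X[i, ], ncomp = 2))
})
apply(boot_vip, 1, quantile, c(0.05, 0.5, 0.95))  # stability of each variable
```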
Multiple-Instance Regression with Structured Data
NASA Technical Reports Server (NTRS)
Wagstaff, Kiri L.; Lane, Terran; Roper, Alex
2008-01-01
We present a multiple-instance regression algorithm that models internal bag structure to identify the items most relevant to the bag labels. Multiple-instance regression (MIR) operates on a set of bags with real-valued labels, each containing a set of unlabeled items, in which the relevance of each item to its bag label is unknown. The goal is to predict the labels of new bags from their contents. Unlike previous MIR methods, MI-ClusterRegress can operate on bags that are structured in that they contain items drawn from a number of distinct (but unknown) distributions. MI-ClusterRegress simultaneously learns a model of the bag's internal structure, the relevance of each item, and a regression model that accurately predicts labels for new bags. We evaluated this approach on the challenging MIR problem of crop yield prediction from remote sensing data. MI-ClusterRegress provided predictions that were more accurate than those obtained with non-multiple-instance approaches or MIR methods that do not model the bag structure.
Su, Yongbo; She, Yue; Huang, Qiang; Shi, Chuanxin; Li, Zhongchao; Huang, Chengfei; Piao, Xiangshu; Li, Defa
2015-01-01
This experiment was conducted to determine the effects of inclusion level of soybean oil (SO) and palm oil (PO) on their digestible and metabolizable energy (DE and ME) contents when fed to growing pigs, as measured by the difference and regression methods. Sixty-six crossbred growing barrows (Duroc×Landrace×Yorkshire, weighing 38.1±2.4 kg) were randomly allotted to a 2×5 factorial arrangement involving 2 lipid sources (SO and PO) and 5 levels of lipid (2%, 4%, 6%, 8%, and 10%), as well as a basal diet composed of corn and soybean meal. The barrows were housed in individual metabolism crates to facilitate separate collection of feces and urine, and were fed the assigned test diets at 4% of initial body weight per day. A 5-d total collection of feces and urine followed a 7-d diet adaptation period. The results showed that the DE and ME contents of SO and PO determined by the difference method were not affected by inclusion level. The DE and ME determined by the regression method for SO were greater than the corresponding values for PO (DE: 37.07 and ME: 36.79 MJ/kg for SO; DE: 34.11 and ME: 33.84 MJ/kg for PO). These values were close to the DE and ME values determined by the difference method at the 10% inclusion level (DE: 37.31 and ME: 36.83 MJ/kg for SO; DE: 34.62 and ME: 33.47 MJ/kg for PO). A similar response was observed for the apparent total tract digestibility of acid-hydrolyzed ether extract (AEE) in the lipids. The true total tract digestibility of AEE in SO was significantly (p<0.05) greater than that for PO (97.5% and 91.1%, respectively). In conclusion, the DE and ME contents of the lipids were not affected by inclusion level, and the difference method can substitute for the regression method to determine the DE and ME contents of lipids when the inclusion level is 10%. PMID:26580443
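A schematic of the regression method as described, under the simplifying assumption that the lipid's contribution to dietary digestible energy is linear in its inclusion level; all numbers are synthetic.

```r
# Regression method in outline: the slope of lipid-contributed DE on
# inclusion level estimates the lipid's DE content (synthetic data).
set.seed(26)
inclusion <- rep(c(0.02, 0.04, 0.06, 0.08, 0.10), each = 6)  # kg lipid/kg diet
de_lipid  <- 37 * inclusion + rnorm(30, sd = 0.03)           # MJ per kg diet
coef(lm(de_lipid ~ inclusion))["inclusion"]                  # ~ DE in MJ/kg lipid
```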
Independent motion detection with a rival penalized adaptive particle filter
NASA Astrophysics Data System (ADS)
Becker, Stefan; Hübner, Wolfgang; Arens, Michael
2014-10-01
Aggregation of pixel-based motion detection into regions of interest, which include views of single moving objects in a scene, is an essential pre-processing step in many vision systems. Motion events of this type provide significant information about the object type or build the basis for action recognition. Further, motion is an essential saliency measure, which is able to effectively support high-level image analysis. When applied to static cameras, background subtraction methods achieve good results. On the other hand, motion aggregation on freely moving cameras is still a widely unsolved problem. The image flow measured on a freely moving camera results from two major motion types: first, the ego-motion of the camera, and second, object motion that is independent of the camera motion. When capturing a scene with a camera, these two motion types are inseparably blended together. In this paper, we propose an approach to detect multiple moving objects from a mobile monocular camera system in an outdoor environment. The overall processing pipeline consists of a fast ego-motion compensation algorithm in the preprocessing stage. Real-time performance is achieved by using a sparse optical flow algorithm as an initial processing stage and a densely applied probabilistic filter in the post-processing stage. Thereby, we follow the idea proposed by Jung and Sukhatme. Normalized intensity differences originating from a sequence of ego-motion compensated difference images represent the probability of moving objects. Noise and registration artefacts are filtered out using a Bayesian formulation. The resulting a posteriori distribution is located on image regions showing strong amplitudes in the difference image which are in accordance with the motion prediction. In order to effectively estimate the a posteriori distribution, a particle filter is used. In addition to the fast ego-motion compensation, the main contribution of this paper is the design of the probabilistic
RECONSTRUCTING DNA COPY NUMBER BY PENALIZED ESTIMATION AND IMPUTATION
Zhang, Zhongyang; Lange, Kenneth; Ophoff, Roel; Sabatti, Chiara
2011-01-01
Recent advances in genomics have underscored the surprising ubiquity of DNA copy number variation (CNV). Fortunately, modern genotyping platforms also detect CNVs with fairly high reliability. Hidden Markov models and algorithms have played a dominant role in the interpretation of CNV data. Here we explore CNV reconstruction via estimation with a fused-lasso penalty as suggested by Tibshirani and Wang [Biostatistics 9 (2008) 18–29]. We mount a fresh attack on this difficult optimization problem by the following: (a) changing the penalty terms slightly by substituting a smooth approximation to the absolute value function, (b) designing and implementing a new MM (majorization-minimization) algorithm, and (c) applying a fast version of Newton's method to jointly update all model parameters. Together these changes enable us to minimize the fused-lasso criterion in a highly effective way. We also reframe the reconstruction problem in terms of imputation via discrete optimization. This approach is easier and more accurate than parameter estimation because it relies on the fact that only a handful of possible copy number states exist at each SNP. The dynamic programming framework has the added bonus of exploiting information that the current fused-lasso approach ignores. The accuracy of our imputations is comparable to that of hidden Markov models at a substantially lower computational cost. PMID:21572975
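Modification (a) admits a compact statement: replace each absolute value in the fused-lasso criterion with a smooth surrogate. The surrogate sqrt(x^2 + eps) below is a common choice and an assumption; the abstract does not spell out the paper's exact approximation.

```r
# Smoothed fused-lasso criterion for a copy-number profile beta given
# observed intensities y (lambda1, lambda2, eps are tuning constants).
sabs <- function(x, eps = 1e-4) sqrt(x^2 + eps)   # smooth |x| surrogate
fused_obj <- function(beta, y, lambda1, lambda2, eps = 1e-4) {
  sum((y - beta)^2) / 2 +
    lambda1 * sum(sabs(beta, eps)) +        # sparsity of copy-number shifts
    lambda2 * sum(sabs(diff(beta), eps))    # piecewise-constant segments
}
```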
Ternès, Nils; Rotolo, Federico; Michiels, Stefan
2016-07-10
Correct selection of prognostic biomarkers among multiple candidates is becoming increasingly challenging as the dimensionality of biological data becomes higher. Therefore, minimizing the false discovery rate (FDR) is of primary importance, while a low false negative rate (FNR) is a complementary measure. The lasso is a popular selection method in Cox regression, but its results depend heavily on the penalty parameter λ. Usually, λ is chosen using maximum cross-validated log-likelihood (max-cvl). However, this method often has a very high FDR. We review methods for a more conservative choice of λ. We propose an empirical extension of the cvl by adding a penalization term, which trades off between the goodness-of-fit and the parsimony of the model, leading to the selection of fewer biomarkers and, as we show, to the reduction of the FDR without a large increase in FNR. We conducted a simulation study considering null and moderately sparse alternative scenarios and compared our approach with the standard lasso and 10 other competitors: Akaike information criterion (AIC), corrected AIC, Bayesian information criterion (BIC), extended BIC, Hannan and Quinn information criterion (HQIC), risk information criterion (RIC), one-standard-error rule, adaptive lasso, stability selection, and percentile lasso. Our extension achieved the best compromise across all the scenarios between a reduction of the FDR and a limited raise of the FNR, followed by the AIC, the RIC, and the adaptive lasso, which performed well in some settings. We illustrate the methods using gene expression data of 523 breast cancer patients. In conclusion, we propose to apply our extension to the lasso whenever a stringent FDR with a limited FNR is targeted. Copyright © 2016 John Wiley & Sons, Ltd. PMID:26970107
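The max-cvl baseline that the authors improve on can be reproduced with glmnet; the sketch below is that baseline only (their penalized-cvl extension is not part of glmnet), on synthetic survival data.

```r
# Standard lasso Cox with lambda chosen by cross-validated likelihood.
library(glmnet)
set.seed(9)
X <- matrix(rnorm(200 * 50), 200, 50)
time   <- rexp(200, rate = exp(0.5 * X[, 1]))
status <- rbinom(200, 1, 0.7)
y  <- cbind(time = time, status = status)   # glmnet's Cox response format
cv <- cv.glmnet(X, y, family = "cox")
coef(cv, s = "lambda.min")   # max-cvl choice; tends to select many biomarkers
```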
Efficient Regressions via Optimally Combining Quantile Information
Zhao, Zhibiao; Xiao, Zhijie
2014-01-01
We develop a generally applicable framework for constructing efficient estimators of regression models via quantile regressions. The proposed method is based on optimally combining information over multiple quantiles and can be applied to a broad range of parametric and nonparametric settings. When combining information over a fixed number of quantiles, we derive an upper bound on the distance between the efficiency of the proposed estimator and the Fisher information. As the number of quantiles increases, this upper bound decreases and the asymptotic variance of the proposed estimator approaches the Cramér-Rao lower bound under appropriate conditions. In the case of non-regular statistical estimation, the proposed estimator leads to super-efficient estimation. We illustrate the proposed method for several widely used regression models. Both asymptotic theory and Monte Carlo experiments show the superior performance over existing methods. PMID:25484481
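An illustrative reduction of the idea: estimate the same slope at several quantile levels with quantreg::rq and combine the estimates. Equal weights are used for transparency; the paper's contribution is precisely the optimal weighting, which this sketch does not reproduce.

```r
# Combining slope estimates across quantile levels (equal weights for clarity).
library(quantreg)
set.seed(10)
x <- rnorm(300)
y <- 1 + 2 * x + rt(300, df = 3)       # heavy-tailed errors
taus   <- seq(0.1, 0.9, by = 0.1)
slopes <- coef(rq(y ~ x, tau = taus))["x", ]
mean(slopes)                           # combined estimate of the common slope
coef(lm(y ~ x))["x"]                   # least squares, for comparison
```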
Allodji, Rodrigue S; Thiébaut, Anne C M; Leuraud, Klervi; Rage, Estelle; Henry, Stéphane; Laurier, Dominique; Bénichou, Jacques
2012-12-30
A broad variety of methods for measurement error (ME) correction have been developed, but these methods have rarely been applied, possibly because their ability to correct ME is poorly understood. We carried out a simulation study to assess the performance of three error-correction methods: two variants of regression calibration (the substitution method and the estimation calibration method) and the simulation extrapolation (SIMEX) method. Features of the simulated cohorts were borrowed from the French Uranium Miners' Cohort in which exposure to radon had been documented from 1946 to 1999. In the absence of ME correction, we observed a severe attenuation of the true effect of radon exposure, with a negative relative bias of the order of 60% on the excess relative risk of lung cancer death. In the main scenario considered, that is, when ME characteristics previously determined as most plausible from the French Uranium Miners' Cohort were used both to generate exposure data and to correct for ME at the analysis stage, all three error-correction methods showed a noticeable but partial reduction of the attenuation bias, with a slight advantage for the SIMEX method. However, the performance of the three correction methods highly depended on the accurate determination of the characteristics of ME. In particular, we encountered severe overestimation in some scenarios with the SIMEX method, and we observed lack of correction with the three methods in some other scenarios. For illustration, we also applied and compared the proposed methods to the real data set from the French Uranium Miners' Cohort study. PMID:22996087
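SIMEX is simple enough to hand-code. The sketch below assumes a linear model, additive Gaussian error of known variance, and a quadratic extrapolant, none of which is specific to the paper's radon setting.

```r
# Minimal SIMEX: re-add noise at multiples lambda, fit, extrapolate to -1.
set.seed(11)
n <- 500; sigma_u <- 0.5
x_true <- rnorm(n)
y <- 1 + 0.8 * x_true + rnorm(n, sd = 0.5)
w <- x_true + rnorm(n, sd = sigma_u)        # error-prone exposure measurement
lambdas <- seq(0, 2, by = 0.5)
betas <- sapply(lambdas, function(l)
  mean(replicate(50,
    coef(lm(y ~ I(w + rnorm(n, sd = sqrt(l) * sigma_u))))[2])))
extrap <- lm(betas ~ lambdas + I(lambdas^2))  # quadratic extrapolant assumed
predict(extrap, data.frame(lambdas = -1))     # SIMEX-corrected slope
```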
ERIC Educational Resources Information Center
Torrence, John Thomas
Excluding military installations, training programs in state and federal penal institutions were surveyed, through a mailed checklist, to test the hypotheses that (1) training programs in penal institutions were not related to the unfilled job openings by major occupations in the United States, and (2) that training programs reported would have a…
Tong, Xuming; Chen, Jinghang; Miao, Hongyu; Li, Tingting; Zhang, Le
2015-01-01
Agent-based models (ABM) and differential equations (DE) are two commonly used methods for immune system simulation. However, it is difficult for ABM to estimate key parameters of the model by incorporating experimental data, whereas the differential equation model is incapable of describing the complicated immune system in detail. To overcome these problems, we developed an integrated ABM regression model (IABMR). It combines the advantages of ABM and DE by employing ABM to mimic the multi-scale immune system with various phenotypes and types of cells, and by using the input and output of the ABM to build a Loess regression for key parameter estimation. Next, we employed the greedy algorithm to estimate the key parameters of the ABM with respect to the same experimental data set and used the ABM to describe a 3D immune system similar to previous studies that employed the DE model. These results indicate that IABMR not only has the potential to simulate the immune system at various scales, phenotypes and cell types, but can also accurately infer the key parameters, as a DE model can. Therefore, this study innovatively developed a complex-system mechanism that can simulate the complicated immune system in detail, as ABM does, and validate the reliability and efficiency of the model by fitting the experimental data, as DE does. PMID:26535589
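One reading of the Loess step, as a sketch: treat the ABM as a black box, smooth its input-output pairs with loess, and pick the parameter whose smoothed output matches the experimental value. The ABM stand-in and the grid search below are invented for illustration.

```r
# Loess surrogate of an ABM's input-output map, used for parameter estimation.
set.seed(27)
abm <- function(rate) 100 / (1 + exp(-5 * (rate - 0.5))) + rnorm(1, sd = 2)
param <- runif(200)                 # sampled ABM input parameter
out   <- sapply(param, abm)         # corresponding ABM outputs
sm    <- loess(out ~ param)
grid   <- seq(0, 1, by = 0.001)
target <- 75                        # "experimental data" to match
grid[which.min(abs(predict(sm, data.frame(param = grid)) - target))]
```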
Retro-regression--another important multivariate regression improvement.
Randić, M
2001-01-01
We review the serious problem associated with instabilities of the coefficients of regression equations, referred to as the MRA (multivariate regression analysis) "nightmare of the first kind". This is manifested when, in a stepwise regression, a descriptor is included in or excluded from the regression. The consequence is an unpredictable change of the coefficients of the descriptors that remain in the regression equation. We follow with consideration of an even more serious problem, referred to as the MRA "nightmare of the second kind", arising when optimal descriptors are selected from a large pool of descriptors. At different steps of the stepwise regression, this process typically causes several previously used descriptors to be replaced by new ones. We describe a procedure that resolves these difficulties. The approach is illustrated on boiling points of nonanes which are considered (1) by using an ordered connectivity basis; (2) by using an ordering resulting from application of a greedy algorithm; and (3) by using an ordering derived from an exhaustive search for optimal descriptors. A novel variant of multiple regression analysis, called retro-regression (RR), is outlined, showing how it resolves the ambiguities associated with both "nightmares" of the first and the second kind of MRA. PMID:11410035
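The first-kind instability is easy to reproduce with two correlated descriptors; everything below is synthetic.

```r
# "Nightmare of the first kind": adding a correlated descriptor shifts the
# coefficient of a descriptor already in the model.
set.seed(25)
d1 <- rnorm(50)
d2 <- 0.9 * d1 + rnorm(50, sd = 0.3)   # correlated second descriptor
bp <- 120 + 2 * d1 + d2 + rnorm(50)    # e.g., a boiling-point response
coef(lm(bp ~ d1))                      # coefficient of d1 alone
coef(lm(bp ~ d1 + d2))                 # d1's coefficient changes once d2 enters
```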
Mental chronometry with simple linear regression.
Chen, J Y
1997-10-01
Typically, mental chronometry is performed by means of introducing an independent variable postulated to affect selectively some stage of a presumed multistage process. However, the effect could be a global one that spreads proportionally over all stages of the process. Currently, there is no method to test this possibility, although simple linear regression might serve the purpose. In the present study, the regression approach was tested with tasks (memory scanning and mental rotation) that, according to the dominant theories, involve a selective effect, and with a task (word superiority effect) that involves a global effect. The results indicate (1) that the manipulation of the size of a memory set or of angular disparity affects the intercept of the regression function that relates the times for memory scanning with different set sizes or for mental rotation with different angular disparities, and (2) that the manipulation of context affects the slope of the regression function that relates the times for detecting a target character under word and nonword conditions. These results ratify the regression approach as a useful method for doing mental chronometry. PMID:9347535
Modeling confounding by half-sibling regression
Schölkopf, Bernhard; Hogg, David W.; Wang, Dun; Foreman-Mackey, Daniel; Janzing, Dominik; Simon-Gabriel, Carl-Johann; Peters, Jonas
2016-01-01
We describe a method for removing the effect of confounders to reconstruct a latent quantity of interest. The method, referred to as “half-sibling regression,” is inspired by recent work in causal inference using additive noise models. We provide a theoretical justification, discussing both independent and identically distributed as well as time series data, respectively, and illustrate the potential of the method in a challenging astronomy application. PMID:27382154
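The core of the method fits in a few lines: regress the target signal on "half-sibling" signals that share only the systematic confounder, and keep the residual as the estimate of the latent quantity. The data below are synthetic stand-ins for the astronomy application.

```r
# Half-sibling regression: Q_hat = Y minus its best prediction from siblings X.
set.seed(12)
N <- sin((1:1000) / 50) + rnorm(1000, sd = 0.1)  # shared systematic confounder
Q <- rnorm(1000, sd = 0.3)                       # latent quantity of interest
Y <- Q + 2 * N                                   # observed target signal
X <- sapply(1:20, function(j) runif(1, 0.5, 2) * N + rnorm(1000, sd = 0.1))
Q_hat <- resid(lm(Y ~ X))                        # reconstructed latent signal
cor(Q, Q_hat)                                    # near 1 when X captures N
```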
Ecological Regression and Voting Rights.
ERIC Educational Resources Information Center
Freedman, David A.; And Others
1991-01-01
The use of ecological regression in voting rights cases is discussed in the context of a lawsuit against Los Angeles County (California) in 1990. Ecological regression assumes that systematic voting differences between precincts are explained by ethnic differences. An alternative neighborhood model is shown to lead to different conclusions. (SLD)
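The method's mechanics in miniature, with synthetic precincts: the fitted line's endpoints at 0% and 100% minority composition are read as group-level voting rates, which is exactly the assumption the alternative neighborhood model disputes.

```r
# Ecological regression across precincts (synthetic illustration).
set.seed(28)
pct_minority <- runif(100)
support <- 0.2 + 0.5 * pct_minority + rnorm(100, sd = 0.05)
fit <- lm(support ~ pct_minority)
predict(fit, data.frame(pct_minority = c(0, 1)))  # inferred group voting rates
```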
Logistic Regression: Concept and Application
ERIC Educational Resources Information Center
Cokluk, Omay
2010-01-01
The main focus of logistic regression analysis is classification of individuals in different groups. The aim of the present study is to explain basic concepts and processes of binary logistic regression analysis intended to determine the combination of independent variables which best explain the membership in certain groups called dichotomous…
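The model the article introduces, in its simplest runnable form: dichotomous group membership fit with glm and a logit link, on synthetic data.

```r
# Binary logistic regression for dichotomous group membership.
set.seed(24)
x <- rnorm(200)
grp <- rbinom(200, 1, plogis(-0.5 + 1.5 * x))
fit <- glm(grp ~ x, family = binomial)
predict(fit, data.frame(x = 1), type = "response")  # P(membership | x = 1)
```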
[Regression grading in gastrointestinal tumors].
Tischoff, I; Tannapfel, A
2012-02-01
Preoperative neoadjuvant chemoradiation therapy is a well-established and essential part of the interdisciplinary treatment of gastrointestinal tumors. Neoadjuvant treatment leads to regressive changes in tumors. To evaluate the histological tumor response, different scoring systems describing regressive changes are used, known as tumor regression grading. Tumor regression grading is usually based on the presence of residual vital tumor cells in proportion to the total tumor size. Currently, no nationally or internationally accepted grading systems exist. In general, common guidelines should be used in the pathohistological diagnostics of tumors after neoadjuvant therapy. In particular, the standard tumor grading will be replaced by tumor regression grading. Furthermore, tumors after neoadjuvant treatment are marked with the prefix "y" in the TNM classification. PMID:22293790
Prediction of dynamical systems by symbolic regression
NASA Astrophysics Data System (ADS)
Quade, Markus; Abel, Markus; Shafi, Kamran; Niven, Robert K.; Noack, Bernd R.
2016-07-01
We study the modeling and prediction of dynamical systems based on conventional models derived from measurements. Such algorithms are highly desirable in situations where the underlying dynamics are hard to model from physical principles or simplified models need to be found. We focus on symbolic regression methods as a part of machine learning. These algorithms are capable of learning an analytically tractable model from data, a highly valuable property. Symbolic regression methods can be considered as generalized regression methods. We investigate two particular algorithms, the so-called fast function extraction, which is a generalized linear regression algorithm, and genetic programming, which is a very general method. Both are able to combine functions in a certain way such that a good model for the prediction of the temporal evolution of a dynamical system can be identified. We illustrate the algorithms by finding a prediction for the evolution of a harmonic oscillator based on measurements, by detecting an arriving front in an excitable system, and, as a real-world application, by predicting solar power production based on energy production observations at a given site together with the weather forecast.
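The fast-function-extraction side of the comparison can be sketched as sparse regression over a library of candidate basis functions; the library below and the use of glmnet for sparsity are assumptions standing in for the actual FFX implementation.

```r
# Sparse regression over a candidate function library (FFX-style sketch).
library(glmnet)
set.seed(13)
x <- seq(0, 4 * pi, length.out = 400)
y <- 1.5 * sin(x) + 0.3 * x + rnorm(400, sd = 0.05)    # "measurements"
B <- cbind(x, x^2, sin(x), cos(x), exp(-x), log1p(x))  # candidate functions
colnames(B) <- c("x", "x^2", "sin(x)", "cos(x)", "exp(-x)", "log1p(x)")
cv <- cv.glmnet(B, y)
coef(cv, s = "lambda.1se")   # sparse, analytically tractable model
```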
Moving the Bar: Transformations in Linear Regression.
ERIC Educational Resources Information Center
Miranda, Janet
The assumption that is most important to the hypothesis testing procedure of multiple linear regression is the assumption that the residuals are normally distributed, but this assumption is not always tenable given the realities of some data sets. When normal distribution of the residuals is not met, an alternative method can be initiated. As an…
ERIC Educational Resources Information Center
Semmens, Bob, Ed.; Cook, Sandy, Ed.
This document contains 19 papers presented at an international forum on education in penal systems. The following papers are included: "Burning" (Craig W.J. Minogue); "The Acquisition of Cognitive Skills as a Means of Recidivism Reduction: A Former Prisoner's Perspective" (Trevor Darryl Doherty); "CEA (Correctional Education Association)…
ERIC Educational Resources Information Center
Kilty, Ted K.
This study investigated the characteristics of reading programs offered to inmates of federal, state, and city/county penal institutions. The total number of institutions that responded to a questionnaire sent by the investigator was: federal, 27 (100% response); state, 426 (68% response); and city/county, 675 (16% response). Findings are reported…
Heritability Estimation using Regression Models for Correlation
Lee, Hye-Seung; Paik, Myunghee Cho; Rundek, Tatjana; Sacco, Ralph L; Dong, Chuanhui; Krischer, Jeffrey P
2012-01-01
Heritability estimates a polygenic effect on a trait for a population. Reliable interpretation of heritability is critical in planning further genetic studies to locate a gene responsible for the trait. This study accommodates both single and multiple trait cases by employing regression models for the correlation parameter to infer the heritability. Sharing the properties of the regression approach, the proposed methods are flexible in incorporating non-genetic and/or non-additive genetic information in the analysis. The performance of the proposed models is compared with that of the likelihood approach through simulations and a carotid intima-media thickness analysis from the Northern Manhattan Family Study. PMID:22457844
Uncertainty quantification in DIC with Kriging regression
NASA Astrophysics Data System (ADS)
Wang, Dezhi; DiazDelaO, F. A.; Wang, Weizhuo; Lin, Xiaoshan; Patterson, Eann A.; Mottershead, John E.
2016-03-01
A Kriging regression model is developed as a post-processing technique for the treatment of measurement uncertainty in classical subset-based Digital Image Correlation (DIC). Regression is achieved by regularising the sample-point correlation matrix using a local, subset-based, assessment of the measurement error with assumed statistical normality and based on the Sum of Squared Differences (SSD) criterion. This leads to a Kriging-regression model in the form of a Gaussian process representing uncertainty on the Kriging estimate of the measured displacement field. The method is demonstrated using numerical and experimental examples. Kriging estimates of displacement fields are shown to be in excellent agreement with 'true' values for the numerical cases and in the experimental example uncertainty quantification is carried out using the Gaussian random process that forms part of the Kriging model. The root mean square error (RMSE) on the estimated displacements is produced and standard deviations on local strain estimates are determined.
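A minimal Kriging (Gaussian-process) regression sketch with a squared-exponential kernel; the added diagonal noise term plays the role of the subset-based measurement-error regularization, but the kernel, scales, and one-dimensional setting are all illustrative, not the authors' DIC formulation.

```r
# Kriging regression: posterior mean and pointwise variance of a GP.
set.seed(21)
k <- function(a, b, ell = 0.2) exp(-outer(a, b, "-")^2 / (2 * ell^2))
x  <- runif(40)
y  <- sin(2 * pi * x) + rnorm(40, sd = 0.1)      # noisy "measurements"
xs <- seq(0, 1, length.out = 200)                # prediction grid
Kxx <- k(x, x) + diag(0.1^2, 40)                 # noise as regularization
mu  <- k(xs, x) %*% solve(Kxx, y)                # Kriging estimate
v   <- 1 - diag(k(xs, x) %*% solve(Kxx, k(x, xs)))  # uncertainty on estimate
```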
Multiple weight stepwise regression
Atkins, J. |; Campbell, J.
1993-10-01
In many science and engineering applications, there is an interest in predicting the outputs of a process for given levels of inputs. In order to develop a model, one could run the process (or a simulation of the process) at a number of points (a point being one run at one set of possible input values) and observe the values of the outputs at those points. These observations can be used to predict the values of the outputs for other values of the inputs. Since the outputs are a function of the inputs, we can generate a surface in the space of possible inputs and outputs, called a response surface. In some cases, collecting the data needed to generate a response surface can be very expensive. Thus, in these cases, there is a powerful incentive to minimize the sample size while building better response surfaces. One such case is the semiconductor equipment manufacturing industry. Semiconductor manufacturing equipment is complex and expensive. Depending upon the type of equipment, the number of control parameters may range from 10 to 30, with perhaps 5 to 10 being important. Since a single run can cost hundreds or thousands of dollars, it is very important to have efficient methods for building response surfaces. A current approach to this problem is to do the experiment in two stages. First, a traditional design (such as a fractional factorial) is used to screen variables. After deciding which variables are significant, additional runs of the experiment are conducted. The original runs and the new runs are used to build a model with the significant variables. However, the original (screening) runs are not as helpful for building the model as some other points might have been. This paper presents a point selection scheme that is more efficient than traditional designs.
Practical Session: Simple Linear Regression
NASA Astrophysics Data System (ADS)
Clausel, M.; Grégoire, G.
2014-12-01
Two exercises are proposed to illustrate simple linear regression. The first is based on the famous Galton data set on heredity. We use the lm R command and obtain coefficient estimates, the standard error of the error, R2, and residuals. In the second example, devoted to data on the vapor tension of mercury, we fit a simple linear regression, predict values, and look ahead to multiple linear regression. This practical session is an excerpt from practical exercises proposed by A. Dalalyan at ENPC (see Exercises 1 and 2 of http://certis.enpc.fr/~dalalyan/Download/TP_ENPC_4.pdf).
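The first exercise in miniature; heights are simulated here in place of Galton's actual data set, with regression to the mean built in.

```r
# Simple linear regression with lm (simulated Galton-style heights).
set.seed(14)
parent <- rnorm(900, mean = 68, sd = 1.8)
child  <- 24 + 0.65 * parent + rnorm(900, sd = 2.2)
fit <- lm(child ~ parent)
summary(fit)      # coefficient estimates, residual standard error, R^2
head(resid(fit))  # residuals, as inspected in the session
```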
Penalized splines for smooth representation of high-dimensional Monte Carlo datasets
NASA Astrophysics Data System (ADS)
Whitehorn, Nathan; van Santen, Jakob; Lafebre, Sven
2013-09-01
Detector response to a high-energy physics process is often estimated by Monte Carlo simulation. For purposes of data analysis, the results of this simulation are typically stored in large multi-dimensional histograms, which can quickly become both too large to easily store and manipulate and numerically problematic due to unfilled bins or interpolation artifacts. We describe here an application of the penalized spline technique (Marx and Eilers, 1996) [1] to efficiently compute B-spline representations of such tables and discuss aspects of the resulting B-spline fits that simplify many common tasks in handling tabulated Monte Carlo data in high-energy physics analysis, in particular their use in maximum-likelihood fitting.
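For a one-dimensional analogue of the technique, penalized B-splines are available off the shelf in mgcv; the IceCube tool in the paper is a separate multi-dimensional library, so this is only an illustration of the P-spline idea on histogram-like counts.

```r
# Penalized (P-)spline smoothing of Monte Carlo-style counts via mgcv.
library(mgcv)
set.seed(15)
x <- runif(2000)
counts <- rpois(2000, lambda = exp(2 + sin(2 * pi * x)))   # binned-data stand-in
fit <- gam(counts ~ s(x, bs = "ps", k = 20), family = poisson)
plot(fit)   # smooth B-spline representation instead of a lookup table
```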
Harmonic regression and scale stability.
Lee, Yi-Hsuan; Haberman, Shelby J
2013-10-01
Monitoring a very frequently administered educational test with a relatively short history of stable operation imposes a number of challenges. Test scores usually vary by season, and the frequency of administration of such educational tests is also seasonal. Although it is important to react to unreasonable changes in the distributions of test scores in a timely fashion, it is not a simple matter to ascertain what sort of distribution is really unusual. Many commonly used approaches for seasonal adjustment are designed for time series with evenly spaced observations that span many years and, therefore, are inappropriate for data from such educational tests. Harmonic regression, a seasonal-adjustment method, can be useful in monitoring scale stability when the number of years available is limited and when the observations are unevenly spaced. Additional forms of adjustments can be included to account for variability in test scores due to different sources of population variations. To illustrate, real data are considered from an international language assessment. PMID:24092490
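Harmonic regression reduces to ordinary regression on sine/cosine terms, which is why it tolerates short and unevenly spaced records; the sketch assumes one annual harmonic and synthetic scores.

```r
# Harmonic regression: seasonal means from sine/cosine terms, uneven spacing.
set.seed(16)
t <- sort(runif(120, 0, 4))    # four "years" of unevenly spaced administrations
score <- 500 + 8 * sin(2 * pi * t) + 5 * cos(2 * pi * t) + rnorm(120, sd = 3)
fit <- lm(score ~ sin(2 * pi * t) + cos(2 * pi * t))
summary(fit)$coefficients      # the two harmonic terms capture the seasonality
```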
NASA Astrophysics Data System (ADS)
Gang, G. J.; Siewerdsen, J. H.; Stayman, J. W.
2016-03-01
Purpose: This work applies task-driven optimization to design CT tube current modulation and directional regularization in penalized-likelihood (PL) reconstruction. The relative performance of modulation schemes commonly adopted for filtered-backprojection (FBP) reconstruction were also evaluated for PL in comparison. Methods: We adopt a task-driven imaging framework that utilizes a patient-specific anatomical model and information of the imaging task to optimize imaging performance in terms of detectability index (d'). This framework leverages a theoretical model based on implicit function theorem and Fourier approximations to predict local spatial resolution and noise characteristics of PL reconstruction as a function of the imaging parameters to be optimized. Tube current modulation was parameterized as a linear combination of Gaussian basis functions, and regularization was based on the design of (directional) pairwise penalty weights for the 8 in-plane neighboring voxels. Detectability was optimized using a covariance matrix adaptation evolutionary strategy algorithm. Task-driven designs were compared to conventional tube current modulation strategies for a Gaussian detection task in an abdomen phantom. Results: The task-driven design yielded the best performance, improving d' by ~20% over an unmodulated acquisition. Contrary to FBP, PL reconstruction using automatic exposure control and modulation based on minimum variance (in FBP) performed worse than the unmodulated case, decreasing d' by 16% and 9%, respectively. Conclusions: This work shows that conventional tube current modulation schemes suitable for FBP can be suboptimal for PL reconstruction. Thus, the proposed task-driven optimization provides additional opportunities for improved imaging performance and dose reduction beyond that achievable with conventional acquisition and reconstruction.
Abstract Expression Grammar Symbolic Regression
NASA Astrophysics Data System (ADS)
Korns, Michael F.
This chapter examines the use of Abstract Expression Grammars to perform the entire Symbolic Regression process without the use of Genetic Programming per se. The techniques explored produce a symbolic regression engine which has absolutely no bloat, which allows total user control of the search space and output formulas, and which is faster and more accurate than the engines produced in our previous papers using Genetic Programming. The genome is an all-vector structure with four chromosomes plus additional epigenetic and constraint vectors, allowing total user control of the search space and the final output formulas. A combination of specialized compiler techniques, genetic algorithms, particle swarm, aged layered populations, plus discrete and continuous differential evolution are used to produce an improved symbolic regression system. Nine base test cases, from the literature, are used to test the improvement in speed and accuracy. The improved results indicate that these techniques move us a big step closer toward future industrial-strength symbolic regression systems.
Multiple Regression and Its Discontents
ERIC Educational Resources Information Center
Snell, Joel C.; Marsh, Mitchell
2012-01-01
Multiple regression is part of a larger statistical strategy originated by Gauss. The authors raise questions about the theory and suggest some changes that would make room for Mandelbrot and Serendipity.
Wrong Signs in Regression Coefficients
NASA Technical Reports Server (NTRS)
McGee, Holly
1999-01-01
When using parametric cost estimation, it is important to note the possibility of the regression coefficients having the wrong sign. A wrong sign is defined as a sign on the regression coefficient opposite to the researcher's intuition and experience. Some possible causes for the wrong sign discussed in this paper are a small range of x's, leverage points, missing variables, multicollinearity, and computational error. Additionally, techniques for determining the cause of the wrong sign are given.
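One of the causes named above, multicollinearity, produces the wrong sign readily; both true effects below are positive, yet one fitted slope frequently comes out negative.

```r
# Wrong-sign demonstration: collinearity flips a coefficient's sign.
set.seed(17)
n  <- 40
x1 <- rnorm(n)
x2 <- x1 + rnorm(n, sd = 0.05)     # nearly collinear cost driver
y  <- 3 * x1 + 3 * x2 + rnorm(n)   # both true effects positive
coef(lm(y ~ x1 + x2))              # one slope often comes out negative
coef(lm(y ~ x1))                   # dropping the collinear term restores sign
```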
Climate Change Projections Using Regional Regression Models
NASA Astrophysics Data System (ADS)
Griffis, V. W.; Gyawali, R.; Watkins, D. W.
2012-12-01
A typical approach to projecting climate change impacts on water resources systems is to downscale general circulation model (GCM) or regional climate model (RCM) outputs as forcing data for a watershed model. With downscaled climate model outputs becoming readily available, multi-model ensemble approaches incorporating multiple GCMs, multiple emissions scenarios and multiple initializations are increasingly being used. While these multi-model climate ensembles represent a range of plausible futures, different hydrologic models and methods may complicate impact assessment. In particular, the associated loss, flow routing, snowmelt and evapotranspiration computation methods can markedly increase hydrological modeling uncertainty. Other challenges include properly calibrating and verifying the watershed model and maintaining a consistent energy budget between climate and hydrologic models. An alternative approach, particularly appealing for ungauged basins or locations where record lengths are short, is to directly predict selected streamflow quantiles from regional regression equations that include physical basin characteristics as well as meteorological variables output by climate models (Fennessey 2011). Two sets of regional regression models are developed for the Great Lakes states using ordinary least squares and weighted least squares regression. The regional regression modeling approach is compared with physically based hydrologic modeling approaches for selected Great Lakes watersheds using downscaled outputs from the Coupled Model Intercomparison Project (CMIP3) as inputs to the Large Basin Runoff Model (LBRM) and the U.S. Army Corps of Engineers Hydrologic Modeling System (HEC-HMS).
A tutorial on Bayesian Normal linear regression
NASA Astrophysics Data System (ADS)
Klauenberg, Katy; Wübbeler, Gerd; Mickan, Bodo; Harris, Peter; Elster, Clemens
2015-12-01
Regression is a common task in metrology and often applied to calibrate instruments, evaluate inter-laboratory comparisons or determine fundamental constants, for example. Yet, a regression model cannot be uniquely formulated as a measurement function, and consequently the Guide to the Expression of Uncertainty in Measurement (GUM) and its supplements are not applicable directly. Bayesian inference, however, is well suited to regression tasks, and has the advantage of accounting for additional a priori information, which typically robustifies analyses. Furthermore, it is anticipated that future revisions of the GUM shall also embrace the Bayesian view. Guidance on Bayesian inference for regression tasks is largely lacking in metrology. For linear regression models with Gaussian measurement errors this tutorial gives explicit guidance. Divided into three steps, the tutorial first illustrates how a priori knowledge, which is available from previous experiments, can be translated into prior distributions from a specific class. These prior distributions have the advantage of yielding analytical, closed form results, thus avoiding the need to apply numerical methods such as Markov Chain Monte Carlo. Secondly, formulas for the posterior results are given, explained and illustrated, and software implementations are provided. In the third step, Bayesian tools are used to assess the assumptions behind the suggested approach. These three steps (prior elicitation, posterior calculation, and robustness to prior uncertainty and model adequacy) are critical to Bayesian inference. The general guidance given here for Normal linear regression tasks is accompanied by a simple, but real-world, metrological example. The calibration of a flow device serves as a running example and illustrates the three steps. It is shown that prior knowledge from previous calibrations of the same sonic nozzle enables robust predictions even for extrapolations.
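Step two of the tutorial admits a compact closed form. The sketch below simplifies to a known noise variance with a Gaussian prior (the tutorial's fully conjugate treatment also handles unknown variance), with a flow-calibration-flavored toy design.

```r
# Conjugate Bayesian linear regression with known noise variance.
set.seed(18)
X <- cbind(1, seq(0, 1, length.out = 30))      # design: intercept and flow rate
y <- X %*% c(2, 3) + rnorm(30, sd = 0.2)
sigma <- 0.2                                   # assumed known noise sd
m0 <- c(0, 0); V0 <- diag(10, 2)               # prior from previous calibrations
Vn <- solve(solve(V0) + t(X) %*% X / sigma^2)  # posterior covariance
mn <- Vn %*% (solve(V0, m0) + t(X) %*% y / sigma^2)  # posterior mean
cbind(mn, sqrt(diag(Vn)))                      # estimates with uncertainties
```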
Liu, Zhan-yu; Huang, Jing-feng; Shi, Jing-jing; Tao, Rong-xiang; Zhou, Wan; Zhang, Li-Li
2007-10-01
Detecting plant health conditions plays a key role in farm pest management and crop protection. In this study, measurement of hyperspectral leaf reflectance in rice crop (Oryza sativa L.) was conducted on groups of healthy leaves and leaves infected by the fungus Bipolaris oryzae (Helminthosporium oryzae Breda. de Hann) through the wavelength range from 350 to 2,500 nm. The percentage of leaf surface lesions was estimated and defined as the disease severity. Statistical methods like multiple stepwise regression, principal component analysis and partial least-squares regression were utilized to calculate and estimate the disease severity of rice brown spot at the leaf level. Our results revealed that multiple stepwise linear regression could efficiently estimate disease severity with three wavebands in seven steps. The root mean square errors (RMSEs) for the training (n=210) and testing (n=53) datasets were 6.5% and 5.8%, respectively. Principal component analysis showed that the first principal component could explain approximately 80% of the variance of the original hyperspectral reflectance. The regression model with the first two principal components predicted disease severity with RMSEs of 16.3% and 13.9% for the training and testing datasets, respectively. Partial least-squares regression with seven extracted factors could most effectively predict disease severity compared with the other statistical methods, with RMSEs of 4.1% and 2.0% for the training and testing datasets, respectively. Our research demonstrates that it is feasible to estimate the disease severity of rice brown spot using hyperspectral reflectance data at the leaf level. PMID:17910117
Kneale, G W; Mancuso, T F; Stewart, A M
1981-01-01
This paper reports on results from the study initiated by Mancuso into the health risks from low-level radiation in workers engaged in plutonium manufacture at the Hanford Works, Washington State, USA, and attempts to answer criticisms of previous reports by an in-depth study. Previous reports have aroused much controversy because the reported risk per unit radiation dose for cancers of radiosensitive tissues was much greater than the risk generally accepted on the basis of other studies and widely used in setting safety levels for exposure to low-level radiation. The method of regression models in life-tables isolates the effect of radiation after statistically controlling for a wide range of possible interfering factors. Like the risk of lung cancer for uranium miners, the dose-response relation showed a significant downward curve at about 10 rem. There may, therefore, be better agreement with other studies, conducted at higher doses, than is widely assumed. The findings on cancer latency (of about 25 years) and the effect of exposure age (increasing age increases the risk) are in general agreement with other studies. An unexplained finding is a significantly higher dose for all workers who developed cancers in tissues that are supposed to have low sensitivity to cancer induction by radiation. PMID:7236541
Transfer Learning Based on Logistic Regression
NASA Astrophysics Data System (ADS)
Paul, A.; Rottensteiner, F.; Heipke, C.
2015-08-01
In this paper we address the problem of classification of remote sensing images in the framework of transfer learning with a focus on domain adaptation. This research area deals with methods that solve problems in which labelled training data sets are assumed to be available only for a source domain, while classification is needed in a target domain with different, yet related characteristics. The main novel contribution is a method for transductive transfer learning in remote sensing on the basis of logistic regression, a discriminative probabilistic classifier of low computational complexity which can deal with multiclass problems. Classification takes place with a model of weight coefficients for hyperplanes that separate features in the transformed feature space. For logistic regression, our domain adaptation method adjusts the model parameters by iteratively labelling the target test data set. These labelled data features are iteratively added to the current training set, which at the beginning contains only source features, while simultaneously a number of source features are deleted from the current training set. Experimental results based on a test series with synthetic and real data constitute a first proof-of-concept of the proposed method.
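The iterative adaptation idea can be sketched compactly. This is a minimal self-training sketch, not the authors' method: the exchange rate of 20 samples per iteration, the synthetic domain shift, and the confidence criterion are all illustrative assumptions.

```python
# Transductive adaptation of a logistic regression classifier: repeatedly add
# confidently self-labelled target samples and drop source samples.
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)
# Synthetic two-class source domain and a shifted (related) target domain.
Xs = np.vstack([rng.normal(0, 1, (100, 2)), rng.normal(3, 1, (100, 2))])
ys = np.repeat([0, 1], 100)
Xt = Xs + 0.8                                  # target: same classes, shifted
yt_true = ys.copy()                            # held out, used only to evaluate

perm = rng.permutation(200)                    # mix classes before dropping
X_cur, y_cur = Xs[perm], ys[perm]
clf = LogisticRegression().fit(X_cur, y_cur)
for _ in range(10):
    proba = clf.predict_proba(Xt)
    take = np.argsort(proba.max(axis=1))[-20:]  # most confident target samples
    X_cur = np.vstack([X_cur[20:], Xt[take]])   # drop 20 old, add 20 target
    y_cur = np.concatenate([y_cur[20:], proba.argmax(axis=1)[take]])
    clf = LogisticRegression().fit(X_cur, y_cur)

print("target-domain accuracy:", clf.score(Xt, yt_true))
```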
2014-01-01
Background In biomedical research, response variables are often encountered which have bounded support on the open unit interval (0,1). Traditionally, researchers have attempted to estimate covariate effects on these types of response data using linear regression. Alternative modelling strategies may include: beta regression, variable-dispersion beta regression, and fractional logit regression models. This study employs a Monte Carlo simulation design to compare the statistical properties of the linear regression model to those of the more novel beta regression, variable-dispersion beta regression, and fractional logit regression models. Methods In the Monte Carlo experiment we assume a simple two-sample design. We assume observations are realizations of independent draws from their respective probability models. The randomly simulated draws from the various probability models are chosen to emulate average proportion/percentage/rate differences of pre-specified magnitudes. Following simulation of the experimental data we estimate average proportion/percentage/rate differences. We compare the estimators in terms of bias, variance, type-1 error and power. Estimates of Monte Carlo error associated with these quantities are provided. Results If response data are beta distributed with constant dispersion parameters across the two samples, then all models are unbiased and have reasonable type-1 error rates and power profiles. If the response data in the two samples have different dispersion parameters, then the simple beta regression model is biased. When the sample size is small (N0 = N1 = 25) linear regression has superior type-1 error rates compared to the other models. Small sample type-1 error rates can be improved in beta regression models using bias correction/reduction methods. In the power experiments, variable-dispersion beta regression and fractional logit regression models have slightly elevated power compared to linear regression models. Similar
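A single replicate of the comparison above can be sketched as follows. This assumes beta-distributed responses with illustrative shape parameters; the fractional logit model is fit as a GLM with a binomial family and logit link applied to continuous proportions, which statsmodels accepts.

```python
# One simulated two-sample comparison: linear regression vs fractional logit.
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(0)
n = 25                                            # small-sample case above
y = np.concatenate([rng.beta(2.0, 6.0, n),        # group 0, mean 0.25
                    rng.beta(3.0, 5.0, n)])       # group 1, mean 0.375
X = sm.add_constant(np.repeat([0.0, 1.0], n))

ols = sm.OLS(y, X).fit()                          # linear regression
frac = sm.GLM(y, X, family=sm.families.Binomial()).fit()   # fractional logit

print("OLS group difference:", ols.params[1])
p = frac.predict(np.array([[1.0, 0.0], [1.0, 1.0]]))
print("fractional logit difference in means:", p[1] - p[0])
```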
Bayesian nonparametric regression with varying residual density.
Pati, Debdeep; Dunson, David B
2014-02-01
We consider the problem of robust Bayesian inference on the mean regression function allowing the residual density to change flexibly with predictors. The proposed class of models is based on a Gaussian process prior for the mean regression function and mixtures of Gaussians for the collection of residual densities indexed by predictors. Initially considering the homoscedastic case, we propose priors for the residual density based on probit stick-breaking (PSB) scale mixtures and symmetrized PSB (sPSB) location-scale mixtures. Both priors restrict the residual density to be symmetric about zero, with the sPSB prior more flexible in allowing multimodal densities. We provide sufficient conditions to ensure strong posterior consistency in estimating the regression function under the sPSB prior, generalizing existing theory focused on parametric residual distributions. The PSB and sPSB priors are generalized to allow residual densities to change nonparametrically with predictors through incorporating Gaussian processes in the stick-breaking components. This leads to a robust Bayesian regression procedure that automatically down-weights outliers and influential observations in a locally-adaptive manner. Posterior computation relies on an efficient data augmentation exact block Gibbs sampler. The methods are illustrated using simulated and real data applications. PMID:24465053
Regression Segmentation for M³ Spinal Images.
Wang, Zhijie; Zhen, Xiantong; Tay, KengYeow; Osman, Said; Romano, Walter; Li, Shuo
2015-08-01
Clinical routine often requires analyzing spinal images of multiple anatomic structures in multiple anatomic planes from multiple imaging modalities (M³). Unfortunately, existing methods for segmenting spinal images are still limited to one specific structure, in one specific plane or from one specific modality (S³). In this paper, we propose a novel approach, Regression Segmentation, that is for the first time able to segment M³ spinal images in one single unified framework. This approach innovatively formulates the segmentation task as a boundary regression problem: modeling a highly nonlinear mapping function from substantially diverse M³ images directly to desired object boundaries. Leveraging the advancement of sparse kernel machines, regression segmentation is fulfilled by a multi-dimensional support vector regressor (MSVR) which operates in an implicit, high-dimensional feature space where M³ diversity and specificity can be systematically categorized, extracted, and handled. The proposed regression segmentation approach was thoroughly tested on images from 113 clinical subjects including both disc and vertebral structures, in both sagittal and axial planes, and from both MRI and CT modalities. The overall result reaches a high Dice similarity index (DSI) of 0.912 and a low boundary distance (BD) of 0.928 mm. With our unified and extensible framework, an efficient clinical tool for M³ spinal image segmentation can be easily achieved, and will substantially benefit the diagnosis and treatment of spinal diseases. PMID:25361503
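The MSVR itself is not in common libraries; as a rough stand-in, boundary regression can be sketched by fitting one kernel SVR per boundary coordinate. Everything here (the image descriptors, the boundary parameterization, the kernel settings) is an illustrative assumption, not the authors' implementation.

```python
# Regressing a vector of boundary coordinates on image features.
import numpy as np
from sklearn.multioutput import MultiOutputRegressor
from sklearn.svm import SVR

rng = np.random.default_rng(0)
n_images, n_features, n_boundary_pts = 113, 64, 20
X = rng.normal(size=(n_images, n_features))            # image descriptors
A = rng.normal(size=(n_features, 2 * n_boundary_pts))  # assumed latent map
Y = np.tanh(X @ A) + rng.normal(0, 0.05, (n_images, 2 * n_boundary_pts))

model = MultiOutputRegressor(SVR(kernel="rbf", C=10.0))
model.fit(X[:90], Y[:90])                              # train on 90 "subjects"
pred = model.predict(X[90:])
print("mean absolute boundary error:", np.abs(pred - Y[90:]).mean())
```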
Kernel Partial Least Squares for Nonlinear Regression and Discrimination
NASA Technical Reports Server (NTRS)
Rosipal, Roman; Clancy, Daniel (Technical Monitor)
2002-01-01
This paper summarizes recent results on applying the method of partial least squares (PLS) in a reproducing kernel Hilbert space (RKHS). A previously proposed kernel PLS regression model was proven to be competitive with other regularized regression methods in RKHS. The family of nonlinear kernel-based PLS models is extended by considering the kernel PLS method for discrimination. Theoretical and experimental results on a two-class discrimination problem indicate usefulness of the method.
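Exact kernel PLS is not available in scikit-learn; one honest approximation is to combine an explicit random Fourier feature map for the RBF kernel with ordinary linear PLS, so that PLS operates in an approximation of the RKHS. This sketch uses that substitution; the data, kernel width and component count are illustrative assumptions.

```python
# Approximate kernel PLS: random Fourier features + linear PLS.
import numpy as np
from sklearn.kernel_approximation import RBFSampler
from sklearn.cross_decomposition import PLSRegression
from sklearn.pipeline import make_pipeline

rng = np.random.default_rng(0)
X = rng.uniform(-3, 3, size=(200, 2))
y = np.sin(X[:, 0]) * np.cos(X[:, 1]) + rng.normal(0, 0.1, 200)  # nonlinear

kpls = make_pipeline(
    RBFSampler(gamma=0.5, n_components=300, random_state=0),  # approx. RKHS map
    PLSRegression(n_components=10))                            # PLS in that space
kpls.fit(X, y)
print("training R^2:", kpls.score(X, y))
```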
Interpretation of Standardized Regression Coefficients in Multiple Regression.
ERIC Educational Resources Information Center
Thayer, Jerome D.
The extent to which standardized regression coefficients (beta values) can be used to determine the importance of a variable in an equation was explored. The beta value and the part correlation coefficient--also called the semi-partial correlation coefficient and reported in squared form as the incremental "r squared"--were compared for variables…
Survival Data and Regression Models
NASA Astrophysics Data System (ADS)
Grégoire, G.
2014-12-01
We start this chapter by introducing some basic elements for the analysis of censored survival data. Then we focus on right-censored data and develop two types of regression models. The first concerns the so-called accelerated failure time (AFT) models, which are parametric models where a function of a parameter depends linearly on the covariates. The second is a semiparametric model, where the covariates enter in a multiplicative form in the expression of the hazard rate function. The main statistical tool for analysing these regression models is the maximum likelihood methodology and, although we recall some essential results of ML theory, we refer to the chapter "Logistic Regression" for a more detailed presentation.
Demonstration of a Fiber Optic Regression Probe
NASA Technical Reports Server (NTRS)
Korman, Valentin; Polzin, Kurt A.
2010-01-01
The capability to provide localized, real-time monitoring of material regression rates in various applications has the potential to provide a new stream of data for development testing of various components and systems, as well as serving as a monitoring tool in flight applications. These applications include, but are not limited to, the regression of a combusting solid fuel surface, the ablation of the throat in a chemical rocket or the heat shield of an aeroshell, and the monitoring of erosion in long-life plasma thrusters. The rate of regression in the first application is very fast, while in the second and third it is increasingly slower. A recent fundamental sensor development effort has led to a novel regression, erosion, and ablation sensor technology (REAST). The REAST sensor allows for measurement of real-time surface erosion rates at a discrete surface location. The sensor is optical, using two different, co-located fiber-optics to perform the regression measurement. The disparate optical transmission properties of the two fiber-optics make it possible to measure the regression rate by monitoring the relative light attenuation through the fibers. As the fibers regress along with the parent material in which they are embedded, the relative light intensities through the two fibers change, providing a measure of the regression rate. The optical nature of the system makes it relatively easy to use in a variety of harsh, high temperature environments, and it is also unaffected by the presence of electric and magnetic fields. In addition, the sensor could be used to perform optical spectroscopy on the light emitted by a process and collected by fibers, giving localized measurements of various properties. The capability to perform an in-situ measurement of material regression rates is useful in addressing a variety of physical issues in various applications. An in-situ measurement allows for real-time data regarding the erosion rates, providing a quick method for
Model selection for logistic regression models
NASA Astrophysics Data System (ADS)
Duller, Christine
2012-09-01
Model selection for logistic regression models decides which of the given potential regressors have an effect and hence should be included in the final model. The second interesting question is whether a certain factor is heterogeneous among some subsets, i.e. whether the model should include a random intercept or not. In this paper these questions are answered with classical as well as Bayesian methods. The applications show some results of recent research projects in medicine and business administration.
Differential correction schemes in nonlinear regression
NASA Technical Reports Server (NTRS)
Decell, H. P., Jr.; Speed, F. M.
1972-01-01
Classical iterative methods in nonlinear regression are reviewed and improved upon. This is accomplished through a discussion of the geometrical and theoretical motivation for introducing modifications based on generalized matrix inversion. Examples with inherent pitfalls are presented and compared in terms of results obtained using classical and modified techniques. The modification is shown to be useful alone or in conjunction with other modifications appearing in the literature.
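The core of such a modification can be sketched as a Gauss-Newton iteration in which the normal-equation solve is replaced by a generalized (Moore-Penrose) inverse of the Jacobian, which remains defined when the Jacobian is rank-deficient. The exponential model and starting values below are illustrative assumptions, not the paper's examples.

```python
# Gauss-Newton nonlinear regression using a pseudoinverse update step.
import numpy as np

def model(beta, t):
    return beta[0] * np.exp(beta[1] * t)            # y = a * exp(b * t)

def jacobian(beta, t):
    e = np.exp(beta[1] * t)
    return np.column_stack([e, beta[0] * t * e])    # d model / d (a, b)

rng = np.random.default_rng(0)
t = np.linspace(0, 2, 30)
y = model([2.0, -1.3], t) + rng.normal(0, 0.02, t.size)

beta = np.array([1.0, -0.5])                        # crude starting values
for _ in range(20):
    r = y - model(beta, t)                          # residuals
    beta = beta + np.linalg.pinv(jacobian(beta, t)) @ r   # generalized inverse
print("estimates:", beta)
```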
2011-01-01
Background Dementia and cognitive impairment associated with aging are a major medical and social concern. Neuropsychological testing is a key element in the diagnostic procedures of Mild Cognitive Impairment (MCI), but has presently a limited value in the prediction of progression to dementia. We advance the hypothesis that newer statistical classification methods derived from data mining and machine learning methods like Neural Networks, Support Vector Machines and Random Forests can improve accuracy, sensitivity and specificity of predictions obtained from neuropsychological testing. Seven non parametric classifiers derived from data mining methods (Multilayer Perceptrons Neural Networks, Radial Basis Function Neural Networks, Support Vector Machines, CART, CHAID and QUEST Classification Trees and Random Forests) were compared to three traditional classifiers (Linear Discriminant Analysis, Quadratic Discriminant Analysis and Logistic Regression) in terms of overall classification accuracy, specificity, sensitivity, Area under the ROC curve and Press'Q. Model predictors were 10 neuropsychological tests currently used in the diagnosis of dementia. Statistical distributions of classification parameters obtained from a 5-fold cross-validation were compared using the Friedman's nonparametric test. Results Press' Q test showed that all classifiers performed better than chance alone (p < 0.05). Support Vector Machines showed the larger overall classification accuracy (Median (Me) = 0.76) an area under the ROC (Me = 0.90). However this method showed high specificity (Me = 1.0) but low sensitivity (Me = 0.3). Random Forest ranked second in overall accuracy (Me = 0.73) with high area under the ROC (Me = 0.73) specificity (Me = 0.73) and sensitivity (Me = 0.64). Linear Discriminant Analysis also showed acceptable overall accuracy (Me = 0.66), with acceptable area under the ROC (Me = 0.72) specificity (Me = 0.66) and sensitivity (Me = 0.64). The remaining classifiers showed
Learning regulatory programs by threshold SVD regression
Ma, Xin; Xiao, Luo; Wong, Wing Hung
2014-01-01
We formulate a statistical model for the regulation of global gene expression by multiple regulatory programs and propose a thresholding singular value decomposition (T-SVD) regression method for learning such a model from data. Extensive simulations demonstrate that this method offers improved computational speed and higher sensitivity and specificity over competing approaches. The method is used to analyze microRNA (miRNA) and long noncoding RNA (lncRNA) data from The Cancer Genome Atlas (TCGA) consortium. The analysis yields previously unidentified insights into the combinatorial regulation of gene expression by noncoding RNAs, as well as findings that are supported by evidence from the literature. PMID:25331876
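One simplified reading of the thresholding-SVD idea can be sketched as follows: fit a multivariate least-squares coefficient matrix, then zero out small singular values so only a few rank-one "programs" remain. The hard-threshold rule, dimensions and low-rank data model here are assumptions for illustration, not the authors' code.

```python
# Reduced-rank regression by thresholding the SVD of the OLS coefficient matrix.
import numpy as np

rng = np.random.default_rng(0)
n, p, q, rank = 200, 30, 40, 3
B_true = rng.normal(size=(p, rank)) @ rng.normal(size=(rank, q))  # low rank
X = rng.normal(size=(n, p))
Y = X @ B_true + rng.normal(0, 1.0, (n, q))

B_ols, *_ = np.linalg.lstsq(X, Y, rcond=None)       # unconstrained fit
U, s, Vt = np.linalg.svd(B_ols, full_matrices=False)
s_thr = np.where(s > 0.1 * s.max(), s, 0.0)         # assumed threshold rule
B_tsvd = U @ np.diag(s_thr) @ Vt

print("estimated rank:", int((s_thr > 0).sum()))
print("relative error vs truth:",
      np.linalg.norm(B_tsvd - B_true) / np.linalg.norm(B_true))
```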
General Regression and Representation Model for Classification
Qian, Jianjun; Yang, Jian; Xu, Yong
2014-01-01
Recently, the regularized coding-based classification methods (e.g. SRC and CRC) have shown great potential for pattern classification. However, most existing coding methods assume that the representation residuals are uncorrelated. In real-world applications, this assumption does not hold. In this paper, we take account of the correlations of the representation residuals and develop a general regression and representation model (GRR) for classification. GRR not only has the advantages of CRC, but also makes full use of the prior information (e.g. the correlations between representation residuals and representation coefficients) and the specific information (the weight matrix of image pixels) to enhance classification performance. GRR uses generalized Tikhonov regularization and K Nearest Neighbors to learn the prior information from the training data. Meanwhile, the specific information is obtained by using an iterative algorithm to update the feature (or image pixel) weights of the test sample. With the proposed model as a platform, we design two classifiers: the basic general regression and representation classifier (B-GRR) and the robust general regression and representation classifier (R-GRR). The experimental results demonstrate the performance advantages of the proposed methods over state-of-the-art algorithms. PMID:25531882
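The generalized Tikhonov step at the core of such models has a simple closed form. Below is a small sketch; the diagonal Gamma is an illustrative stand-in for the prior information that the paper learns from training data.

```python
# Ridge-like solve with a non-identity regularization matrix Gamma.
import numpy as np

def generalized_tikhonov(X, y, Gamma, lam=1.0):
    """Minimize ||X w - y||^2 + lam * ||Gamma w||^2 in closed form."""
    return np.linalg.solve(X.T @ X + lam * Gamma.T @ Gamma, X.T @ y)

rng = np.random.default_rng(0)
X = rng.normal(size=(50, 10))
w_true = rng.normal(size=10)
y = X @ w_true + rng.normal(0, 0.1, 50)

Gamma = np.diag(1.0 + rng.random(10))   # assumed unequal coefficient weighting
w = generalized_tikhonov(X, y, Gamma, lam=0.5)
print("coefficient error:", np.linalg.norm(w - w_true))
```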
Brandner, Tobias
2013-06-01
The paper describes how distrust shapes the network of relationships between the different agents in the penal context, among inmates, between inmates and their family, between inmates and staff, between counselors and staff, and between inmates and counselors, and discusses how counseling strategies need to be adjusted to counter the effects of the institutional and biographical context of distrust. The paper is based on many years of participation and observation in the context of Hong Kong. PMID:24040740
Semisupervised Clustering by Iterative Partition and Regression with Neuroscience Applications
Qian, Guoqi; Wu, Yuehua; Ferrari, Davide; Qiao, Puxue; Hollande, Frédéric
2016-01-01
Regression clustering combines unsupervised and supervised statistical learning and data mining, and is found in a wide range of applications including artificial intelligence and neuroscience. It performs unsupervised learning when it clusters the data according to their respective unobserved regression hyperplanes. The method also performs supervised learning when it fits regression hyperplanes to the corresponding data clusters. Applying regression clustering in practice requires means of determining the underlying number of clusters in the data, finding the cluster label of each data point, and estimating the regression coefficients of the model. In this paper, we review the estimation and selection issues in regression clustering with regard to least squares and robust statistical methods. We also provide a model-selection-based technique to determine the number of regression clusters underlying the data. We further develop a computing procedure for regression clustering estimation and selection. Finally, simulation studies are presented for assessing the procedure, together with an analysis of a real data set on RGB cell marking in neuroscience to illustrate and interpret the method. PMID:27212939
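The iterative partition-and-regression loop can be sketched compactly: assign each point to the line that fits it best, refit each line on its cluster, and repeat. The two-line data set and the fixed number of clusters below are illustrative assumptions; the paper's selection machinery (choosing the number of clusters) is not shown.

```python
# Regression clustering by alternating partition and least-squares refits.
import numpy as np

rng = np.random.default_rng(0)
x = rng.uniform(0, 10, 300)
labels_true = rng.integers(0, 2, 300)
slopes, intercepts = np.array([1.5, -0.8]), np.array([0.0, 12.0])
y = slopes[labels_true] * x + intercepts[labels_true] + rng.normal(0, 0.5, 300)

X = np.column_stack([np.ones_like(x), x])
coefs = rng.normal(size=(2, 2))                    # two random starting lines
for _ in range(20):
    resid = np.abs(y[:, None] - X @ coefs.T)       # |residual| per cluster
    assign = resid.argmin(axis=1)                  # partition step
    for k in range(2):                             # regression step
        if (assign == k).any():
            coefs[k], *_ = np.linalg.lstsq(X[assign == k], y[assign == k],
                                           rcond=None)
print("fitted [intercept, slope] per cluster:\n", coefs)
```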
[Regression and revitalization in hypnosis. Doubts and certainties, therapeutic utility].
Granone, F
1981-05-12
The difference between age regression and revivification is pointed out, and the neurophysiological and psychological bases of hypermnesia of the past are discussed. Moreover, the mental, neurological, somatic and visceral symptomatology of revivification, the usual techniques to obtain it and its therapeutic usefulness are described. Possible artifacts of age regression and methods to avoid them are then presented. PMID:7231772
An Importance Sampling EM Algorithm for Latent Regression Models
ERIC Educational Resources Information Center
von Davier, Matthias; Sinharay, Sandip
2007-01-01
Reporting methods used in large-scale assessments such as the National Assessment of Educational Progress (NAEP) rely on latent regression models. To fit the latent regression model using the maximum likelihood estimation technique, multivariate integrals must be evaluated. In the computer program MGROUP used by the Educational Testing Service for…
Tutorial on Using Regression Models with Count Outcomes Using R
ERIC Educational Resources Information Center
Beaujean, A. Alexander; Morgan, Grant B.
2016-01-01
Education researchers often study count variables, such as times a student reached a goal, discipline referrals, and absences. Most researchers who study these variables use typical regression methods (i.e., ordinary least-squares) either with or without transforming the count variables. In either case, using typical regression for count data can…
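The tutorial above works in R; an analogous count-outcome fit in Python uses a Poisson GLM with a log link via statsmodels. The discipline-referral-style data below are simulated purely for illustration.

```python
# Poisson regression for a count outcome (log link).
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(0)
hours_support = rng.uniform(0, 10, 200)              # assumed predictor
rate = np.exp(1.0 - 0.15 * hours_support)            # true log-linear rate
referrals = rng.poisson(rate)                        # count outcome

X = sm.add_constant(hours_support)
fit = sm.GLM(referrals, X, family=sm.families.Poisson()).fit()
print(fit.summary())                                 # coefficients on log scale
```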
Cactus: An Introduction to Regression
ERIC Educational Resources Information Center
Hyde, Hartley
2008-01-01
When the author first used "VisiCalc," the author thought it a very useful tool when he had the formulas. But how could he design a spreadsheet if there was no known formula for the quantities he was trying to predict? A few months later, the author relates he learned to use multiple linear regression software and suddenly it all clicked into…
Fungible Weights in Multiple Regression
ERIC Educational Resources Information Center
Waller, Niels G.
2008-01-01
Every set of alternate weights (i.e., non-least-squares weights) in a multiple regression analysis with three or more predictors is associated with an infinite class of weights. All members of a given class can be deemed "fungible" because they yield identical "SSE" (sum of squared errors) and R^2 values. Equations for generating…
Spontaneous regression of breast cancer.
Lewison, E F
1976-11-01
The dramatic but rare regression of a verified case of breast cancer in the absence of adequate, accepted, or conventional treatment has been observed and documented by clinicians over the course of many years. In my practice limited to diseases of the breast, over the past 25 years I have observed 12 patients with a unique and unusual clinical course valid enough to be regarded as spontaneous regression of breast cancer. These 12 patients, with clinically confirmed breast cancer, had temporary arrest or partial remission of their disease in the absence of complete or adequate treatment. In most of these cases, spontaneous regression could not be equated ultimately with permanent cure. Three of these case histories are summarized, and patient characteristics of pertinent clinical interest in the remaining case histories are presented and discussed. Despite widespread doubt and skepticism, there is ample clinical evidence to confirm the fact that spontaneous regression of breast cancer is a rare phenomenon but is real and does occur. PMID:799758
Regression Models of Atlas Appearance
Rohlfing, Torsten; Sullivan, Edith V.; Pfefferbaum, Adolf
2010-01-01
Models of object appearance based on principal components analysis provide powerful and versatile tools in computer vision and medical image analysis. A major shortcoming is that they rely entirely on the training data to extract principal modes of appearance variation and ignore underlying variables (e.g., subject age, gender). This paper introduces an appearance modeling framework based instead on generalized multi-linear regression. The training of regression appearance models is controlled by independent variables. This makes it straightforward to create model instances for specific values of these variables, which is akin to model interpolation. We demonstrate the new framework by creating an appearance model of the human brain from MR images of 36 subjects. Instances of the model created for different ages are compared with average shape atlases created from age-matched sub-populations. Relative tissue volumes vs. age in models are also compared with tissue volumes vs. subject age in the original images. In both experiments, we found excellent agreement between the regression models and the comparison data. We conclude that regression appearance models are a promising new technique for image analysis, with one potential application being the representation of a continuum of mutually consistent, age-specific atlases of the human brain. PMID:19694260
Correlation Weights in Multiple Regression
ERIC Educational Resources Information Center
Waller, Niels G.; Jones, Jeff A.
2010-01-01
A general theory on the use of correlation weights in linear prediction has yet to be proposed. In this paper we take initial steps in developing such a theory by describing the conditions under which correlation weights perform well in population regression models. Using OLS weights as a comparison, we define cases in which the two weighting…
Quantile Regression with Censored Data
ERIC Educational Resources Information Center
Lin, Guixian
2009-01-01
The Cox proportional hazards model and the accelerated failure time model are frequently used in survival data analysis. They are powerful, yet have limitations due to their model assumptions. Quantile regression offers a semiparametric approach to model data with possible heterogeneity. It is particularly powerful for censored responses, where the…
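Basic (uncensored) quantile regression is available in statsmodels; a short sketch follows. Censoring-adjusted quantile regression, the subject of the dissertation, requires specialized estimators not shown here, and the heteroscedastic survival-style data are an illustrative assumption.

```python
# Quantile regression at several quantiles on data with heterogeneity.
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(0)
age = rng.uniform(40, 80, 300)
# Heterogeneity: the spread of log survival time grows with age.
log_time = 5.0 - 0.03 * age + rng.normal(0, 0.01 * age, 300)

X = sm.add_constant(age)
for q in (0.25, 0.5, 0.75):
    res = sm.QuantReg(log_time, X).fit(q=q)
    print(f"q={q}: slope = {res.params[1]:.4f}")
```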
Ridge Regression for Interactive Models.
ERIC Educational Resources Information Center
Tate, Richard L.
1988-01-01
An exploratory study of the value of ridge regression for interactive models is reported. Assuming that the linear terms in a simple interactive model are centered to eliminate non-essential multicollinearity, a variety of common models, representing both ordinal and disordinal interactions, are shown to have "orientations" that are favorable to…
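To make the setup concrete, here is a hedged sketch: center the linear terms of an interactive model (removing non-essential multicollinearity), form the interaction from the centered terms, and fit ridge regression. The data-generating coefficients and ridge penalty are illustrative assumptions.

```python
# Ridge regression for a centered interactive model.
import numpy as np
from sklearn.linear_model import Ridge

rng = np.random.default_rng(0)
x1 = rng.normal(size=200)
x2 = rng.normal(size=200)
y = 1.0 + 0.5 * x1 + 0.8 * x2 + 0.6 * x1 * x2 + rng.normal(0, 1.0, 200)

x1c, x2c = x1 - x1.mean(), x2 - x2.mean()    # center the linear terms
X = np.column_stack([x1c, x2c, x1c * x2c])   # interaction from centered terms

fit = Ridge(alpha=1.0).fit(X, y)
print("ridge coefficients [x1, x2, x1*x2]:", fit.coef_)
```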
The importance of scale for spatial-confounding bias and precision of spatial regression estimators
Paciorek, Christopher J
2010-01-01
Residuals in regression models are often spatially correlated. Prominent examples include studies in environmental epidemiology to understand the chronic health effects of pollutants. I consider the effects of residual spatial structure on the bias and precision of regression coefficients, developing a simple framework in which to understand the key issues and derive informative analytic results. When unmeasured confounding introduces spatial structure into the residuals, regression models with spatial random effects and closely-related models such as kriging and penalized splines are biased, even when the residual variance components are known. Analytic and simulation results show how the bias depends on the spatial scales of the covariate and the residual: one can reduce bias by fitting a spatial model only when there is variation in the covariate at a scale smaller than the scale of the unmeasured confounding. I also discuss how the scales of the residual and the covariate affect efficiency and uncertainty estimation when the residuals are independent of the covariate. In an application on the association between black carbon particulate matter air pollution and birth weight, controlling for large-scale spatial variation appears to reduce bias from unmeasured confounders, while increasing uncertainty in the estimated pollution effect. PMID:21528104
NASA Astrophysics Data System (ADS)
Wu, Chunhung
2016-04-01
Few studies have discussed the applicability of statistical landslide susceptibility (LS) models to extreme rainfall-induced landslide events. This research focuses on the comparison and applicability of LS models based on four methods, including landslide ratio-based logistic regression (LRBLR), frequency ratio (FR), weight of evidence (WOE), and instability index (II) methods, in an extreme rainfall-induced landslide case. The landslide inventory in the Chishan river watershed, Southwestern Taiwan, after 2009 Typhoon Morakot provides the main material for this research. The Chishan river watershed is a tributary of the Kaoping river watershed, a landslide- and erosion-prone watershed with an annual average suspended load of 3.6×10^7 MT/yr (ranking 11th in the world). Typhoon Morakot struck Southern Taiwan from Aug. 6-10, 2009 and dumped nearly 2,000 mm of rainfall on the Chishan river watershed. The 24-hour, 48-hour, and 72-hour accumulated rainfall in the Chishan river watershed exceeded the 200-year return period accumulated rainfall. 2,389 landslide polygons in the Chishan river watershed were extracted from SPOT 5 images after 2009 Typhoon Morakot. The total landslide area is around 33.5 km^2, equal to a landslide ratio of 4.1%. The main landslide types based on Varnes' (1978) classification are rotational and translational slides. The two characteristics of this extreme rainfall-induced landslide event are the dense landslide distribution and the large share of downslope landslide areas owing to headward erosion and bank erosion during flooding. The area of downslope landslides in the Chishan river watershed after 2009 Typhoon Morakot is 3.2 times that of upslope landslide areas. The prediction accuracy of LS models based on the LRBLR, FR, WOE, and II methods has been shown to exceed 70%. The model performance and applicability of four models in a landslide-prone watershed with dense distribution of rainfall
ERIC Educational Resources Information Center
Waller, Niels; Jones, Jeff
2011-01-01
We describe methods for assessing all possible criteria (i.e., dependent variables) and subsets of criteria for regression models with a fixed set of predictors, x (where x is an n x 1 vector of independent variables). Our methods build upon the geometry of regression coefficients (hereafter called regression weights) in n-dimensional space. For a…
Improving lesion detectability in PET imaging with a penalized likelihood reconstruction algorithm
NASA Astrophysics Data System (ADS)
Wangerin, Kristen A.; Ahn, Sangtae; Ross, Steven G.; Kinahan, Paul E.; Manjeshwar, Ravindra M.
2015-03-01
Ordered Subset Expectation Maximization (OSEM) is currently the most widely used image reconstruction algorithm for clinical PET. However, OSEM does not necessarily provide optimal image quality, and a number of alternative algorithms have been explored. We have recently shown that a penalized likelihood image reconstruction algorithm using the relative difference penalty, block sequential regularized expectation maximization (BSREM), achieves more accurate lesion quantitation than OSEM, and importantly, maintains acceptable visual image quality in clinical whole-body PET. The goal of this work was to evaluate lesion detectability with BSREM versus OSEM. We performed a two-alternative forced-choice study using 81 patient datasets with lesions of varying contrast inserted into the liver and lung. At matched imaging noise, BSREM and OSEM showed equivalent detectability in the lungs, and BSREM outperformed OSEM in the liver. These results suggest that BSREM provides not only improved quantitation and clinically acceptable visual image quality as previously shown but also improved lesion detectability compared to OSEM. We then modeled this detectability study, applying both non-prewhitening (NPW) and channelized Hotelling (CHO) model observers to the reconstructed images. The CHO model observer showed good agreement with the human observers, suggesting that we can apply this model to future studies with varying simulation and reconstruction parameters.
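For reference, the relative difference penalty that BSREM regularizes with has the form psi(f_j, f_k) = (f_j - f_k)^2 / (f_j + f_k + gamma * |f_j - f_k|) over neighboring voxel pairs, with gamma controlling edge preservation. The sketch below evaluates only this penalty on a toy image; the image, gamma value, and the omission of the likelihood term and optimizer are all illustrative simplifications.

```python
# Relative difference penalty summed over adjacent pixel pairs of a 2-D image.
import numpy as np

def relative_difference_penalty(img, gamma=2.0, eps=1e-12):
    """Sum of (f_j - f_k)^2 / (f_j + f_k + gamma*|f_j - f_k|) over
    horizontally and vertically adjacent pixel pairs."""
    total = 0.0
    for a, b in ((img[:, 1:], img[:, :-1]), (img[1:, :], img[:-1, :])):
        d = a - b
        total += np.sum(d * d / (a + b + gamma * np.abs(d) + eps))
    return total

img = np.zeros((8, 8))
img[2:6, 2:6] = 10.0        # a hot "lesion" on a cold background
print("RDP of piecewise-constant image:", relative_difference_penalty(img))
```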
Ahn, Sangtae; Ross, Steven G; Asma, Evren; Miao, Jun; Jin, Xiao; Cheng, Lishui; Wollenweber, Scott D; Manjeshwar, Ravindra M
2015-08-01
Ordered subset expectation maximization (OSEM) is the most widely used algorithm for clinical PET image reconstruction. OSEM is usually stopped early and post-filtered to control image noise and does not necessarily achieve optimal quantitation accuracy. As an alternative to OSEM, we have recently implemented a penalized likelihood (PL) image reconstruction algorithm for clinical PET using the relative difference penalty with the aim of improving quantitation accuracy without compromising visual image quality. Preliminary clinical studies have demonstrated visual image quality including lesion conspicuity in images reconstructed by the PL algorithm is better than or at least as good as that in OSEM images. In this paper we evaluate lesion quantitation accuracy of the PL algorithm with the relative difference penalty compared to OSEM by using various data sets including phantom data acquired with an anthropomorphic torso phantom, an extended oval phantom and the NEMA image quality phantom; clinical data; and hybrid clinical data generated by adding simulated lesion data to clinical data. We focus on mean standardized uptake values and compare them for PL and OSEM using both time-of-flight (TOF) and non-TOF data. The results demonstrate improvements of PL in lesion quantitation accuracy compared to OSEM with a particular improvement in cold background regions such as lungs. PMID:26158503
A dual formulation of a penalized maximum likelihood x-ray CT reconstruction problem
NASA Astrophysics Data System (ADS)
Xu, Jingyan; Taguchi, Katsuyuki; Gullberg, Grant T.; Tsui, Benjamin M. W.
2009-02-01
This work studies the dual formulation of a penalized maximum likelihood reconstruction problem in x-ray CT. The primal objective function is a Poisson log-likelihood combined with a weighted cross-entropy penalty term. The dual formulation of the primal optimization problem is then derived and the optimization procedure outlined. The dual formulation better exploits the structure of the problem, which translates to faster convergence of iterative reconstruction algorithms. A gradient descent algorithm is implemented for solving the dual problem and its performance is compared with the filtered back-projection algorithm, and with the primal formulation optimized using surrogate functions. The 3D XCAT phantom and an analytical x-ray CT simulator are used to generate noise-free and noisy CT projection data sets with monochromatic and polychromatic x-ray spectra. The reconstructed images from the dual formulation delineate the internal structures at early iterations better than those from the primal formulation using surrogate functions. However, the body contour is slower to converge in the dual than in the primal formulation. The dual formulation demonstrates a better noise-resolution tradeoff near the internal organs than the primal formulation. Since surrogate functions in general can provide a diagonal approximation of the Hessian matrix of the objective function, further convergence speed-up may be achieved by deriving the surrogate function of the dual objective function.
A cost-function approach to rival penalized competitive learning (RPCL).
Ma, Jinwen; Wang, Taijun
2006-08-01
Rival penalized competitive learning (RPCL) has been shown to be a useful tool for clustering a set of sample data in which the number of clusters is unknown. However, the RPCL algorithm was proposed heuristically and still lacks a mathematical theory describing its convergence behavior. In order to solve the convergence problem, we investigate it via a cost-function approach. By theoretical analysis, we prove that a general form of RPCL, called distance-sensitive RPCL (DSRPCL), is associated with the minimization of a cost function on the weight vectors of a competitive learning network. As a DSRPCL process decreases the cost to a local minimum, a number of weight vectors eventually fall into a hypersphere surrounding the sample data, while the other weight vectors diverge to infinity. Moreover, it is shown by theoretical analysis and simulation experiments that if the cost is reduced to the global minimum, a correct number of weight vectors is automatically selected and located around the centers of the actual clusters, respectively. Finally, we apply the DSRPCL algorithms to unsupervised color image segmentation and classification of the wine data. PMID:16903360
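The basic RPCL update can be sketched in a few lines: for each sample the winning prototype moves toward it while the rival (second winner) is pushed away with a much smaller de-learning rate, so surplus prototypes are driven off. This simplified sketch omits the winning-frequency weighting of the full algorithm; learning rates and data are illustrative assumptions.

```python
# Classic rival penalized competitive learning on 2-D data with 3 clusters.
import numpy as np

rng = np.random.default_rng(0)
data = np.vstack([rng.normal(c, 0.3, (100, 2)) for c in ((0, 0), (3, 3), (0, 3))])
W = rng.normal(1.5, 1.0, (5, 2))            # deliberately too many prototypes

a_win, a_rival = 0.05, 0.005                # learning vs de-learning rate
for _ in range(50):
    for x in rng.permutation(data):
        d = np.linalg.norm(W - x, axis=1)
        win, rival = np.argsort(d)[:2]
        W[win] += a_win * (x - W[win])          # attract the winner
        W[rival] -= a_rival * (x - W[rival])    # penalize the rival
print(W)    # extra prototypes drift away; ~3 settle on the cluster centers
```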
Self-Adaptive Induction of Regression Trees.
Fidalgo-Merino, Raúl; Núñez, Marlon
2011-08-01
A new algorithm for incremental construction of binary regression trees is presented. This algorithm, called SAIRT, adapts the induced model when facing data streams involving unknown dynamics, like gradual and abrupt function drift, changes in certain regions of the function, noise, and virtual drift. It also handles both symbolic and numeric attributes. The proposed algorithm can automatically adapt its internal parameters and model structure to obtain new patterns, depending on the current dynamics of the data stream. SAIRT can monitor the usefulness of nodes and can forget examples from selected regions, storing the remaining ones in local windows associated with the leaves of the tree. Under these conditions, current regression methods need careful configuration depending on the dynamics of the problem. Experimentation suggests that the proposed algorithm obtains better results than current algorithms when dealing with data streams that involve changes with different speeds, noise levels, sampling distributions of examples, and partial or complete changes of the underlying function. PMID:21263164
Hierarchical regression for epidemiologic analyses of multiple exposures.
Greenland, S
1994-01-01
Many epidemiologic investigations are designed to study the effects of multiple exposures. Most of these studies are analyzed either by fitting a risk-regression model with all exposures forced into the model, or by using a preliminary-testing algorithm, such as stepwise regression, to produce a smaller model. Research indicates that hierarchical modeling methods can outperform these conventional approaches. This paper reviews two such hierarchical methods, empirical-Bayes regression and a variant here called "semi-Bayes" regression, and compares them to full-model maximum likelihood and to model reduction by preliminary testing. The performance of the methods is compared in a problem of predicting neonatal-mortality rates. Based on the literature to date, it is suggested that hierarchical methods should become part of the standard approaches to multiple-exposure studies. PMID:7851328
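The semi-Bayes idea reduces to precision-weighted shrinkage: conventional ML estimates of many exposure effects are pulled toward a common prior mean using a fixed, user-specified prior variance (empirical Bayes would instead estimate that variance from the data). The estimates, variances and prior below are illustrative assumptions.

```python
# Semi-Bayes shrinkage of multiple exposure-effect estimates.
import numpy as np

beta_hat = np.array([0.9, 0.2, -0.4, 1.5, 0.1])      # ML log-rate-ratio estimates
var_hat = np.array([0.30, 0.10, 0.20, 0.50, 0.05])   # their estimated variances

prior_mean, prior_var = 0.0, 0.25                    # assumed semi-Bayes prior
shrunk = ((beta_hat / var_hat + prior_mean / prior_var)
          / (1.0 / var_hat + 1.0 / prior_var))       # normal-normal posterior mean
print("semi-Bayes estimates:", shrunk.round(3))
```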
Observational Studies: Matching or Regression?
Brazauskas, Ruta; Logan, Brent R
2016-03-01
In observational studies with an aim of assessing treatment effect or comparing groups of patients, several approaches could be used. Often, baseline characteristics of patients may be imbalanced between groups, and adjustments are needed to account for this. It can be accomplished either via appropriate regression modeling or, alternatively, by conducting a matched pairs study. The latter is often chosen because it makes groups appear to be comparable. In this article we considered these 2 options in terms of their ability to detect a treatment effect in time-to-event studies. Our investigation shows that a Cox regression model applied to the entire cohort is often a more powerful tool in detecting treatment effect as compared with a matched study. Real data from a hematopoietic cell transplantation study is used as an example. PMID:26712591
Nasrollahi, S M; Imani, M; Zebeli, Q
2015-12-01
A meta-analysis of the effect of forage particle size (FPS) on nutrient intake, digestibility, and milk production of dairy cattle was conducted using published data from the literature (1998-2014). Meta-regression was used to evaluate the effect of forage level, source, and preservation method on the heterogeneity of the results for FPS. A total of 46 papers and 28 to 91 trials (each trial consisting of 2 treatment means) that reported changes in FPS in the diet of dairy cattle were identified. Estimated effect sizes of FPS were calculated for nutrient intake, nutrient digestibility, and milk production and composition. Intakes of dry matter and neutral detergent fiber increased with decreasing FPS (0.527 and 0.166 kg/d, respectively), but neutral detergent fiber digestibility decreased (0.6%) with decreasing FPS. Heterogeneity (the amount of variation among studies) was significant for all intake and digestibility parameters, and the improvement in feed intake with decreasing FPS only occurred for diets containing a high level of forage (>50%). Also, the improvement in dry matter intake due to lowering FPS occurred for diets containing silage but not hay. Digestibility of dry matter increased with decreasing FPS when the forage source of the diet was not corn. Milk production consistently increased (0.541 kg/d; heterogeneity=19%) and milk protein production increased (0.02 kg/d) as FPS decreased, but fat-corrected milk (FCM) was not affected by FPS. Likewise, milk fat percentage decreased (0.058%) with decreasing FPS. The heterogeneity of milk parameters (including FCM, milk fat, and milk protein), other than milk production, was also significant. Decreasing FPS in high-forage diets (>50%) increased milk protein production by 0.027%. Decreasing FPS increased milk protein content in corn forage-based diets and milk fat and protein percentage in hay-based diets. In conclusion, FPS has the potential to affect feed intake and milk production of dairy cows, but its effects depend upon
Vounou, Maria; Janousova, Eva; Wolz, Robin; Stein, Jason L.; Thompson, Paul M.; Rueckert, Daniel; Montana, Giovanni
2012-01-01
Scanning the entire genome in search of variants related to imaging phenotypes holds great promise in elucidating the genetic etiology of neurodegenerative disorders. Here we discuss the application of a penalized multivariate model, sparse reduced-rank regression (sRRR), for the genome-wide detection of markers associated with voxel-wise longitudinal changes in the brain caused by Alzheimer’s disease (AD). Using a sample from the Alzheimer’s Disease Neuroimaging Initiative database, we performed three separate studies that each compared two groups of individuals to identify genes associated with disease development and progression. For each comparison we took a two-step approach: initially, using penalized linear discriminant analysis, we identified voxels that provide an imaging signature of the disease with high classification accuracy; then we used this multivariate biomarker as a phenotype in a genome-wide association study, carried out using sRRR. The genetic markers were ranked in order of importance of association to the phenotypes using a data re-sampling approach. Our findings confirmed the key role of the APOE and TOMM40 genes but also highlighted some novel potential associations with AD. PMID:22209813
A VBA-based Simulation for Teaching Simple Linear Regression
ERIC Educational Resources Information Center
Jones, Gregory Todd; Hagtvedt, Reidar; Jones, Kari
2004-01-01
In spite of the name, simple linear regression presents a number of conceptual difficulties, particularly for introductory students. This article describes a simulation tool that provides a hands-on method for illuminating the relationship between parameters and sample statistics.
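In the same hands-on spirit as the VBA tool, the relationship between parameters and sample statistics can be shown with a few lines of simulation: repeatedly draw samples from a known line-plus-noise model and watch how the fitted slope varies around the true value. Sample size and parameter values below are arbitrary teaching choices.

```python
# Sampling variability of the fitted slope in simple linear regression.
import numpy as np

rng = np.random.default_rng(0)
true_intercept, true_slope, n = 2.0, 1.5, 30

slopes = []
for _ in range(1000):                       # 1000 simulated data sets
    x = rng.uniform(0, 10, n)
    y = true_intercept + true_slope * x + rng.normal(0, 2.0, n)
    slope, intercept = np.polyfit(x, y, 1)  # least-squares line
    slopes.append(slope)

print("true slope:", true_slope,
      "| mean of estimates:", round(np.mean(slopes), 3),
      "| SD of estimates:", round(np.std(slopes), 3))
```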
NASA Astrophysics Data System (ADS)
Ahn, Kuk-Hyun; Palmer, Richard
2016-09-01
Despite wide use of regression-based regional flood frequency analysis (RFFA) methods, the majority are based on either ordinary least squares (OLS) or generalized least squares (GLS). This paper proposes 'spatial proximity'-based RFFA methods using the spatial lagged model (SLM) and spatial error model (SEM). The proposed methods are represented by two frameworks: the quantile regression technique (QRT) and the parameter regression technique (PRT). The QRT develops prediction equations for flood quantiles at average recurrence intervals (ARIs) of 2, 5, 10, 20, and 100 years, whereas the PRT predicts the three parameters of the selected distribution. The proposed methods are tested using data incorporating 30 basin characteristics from 237 basins in the Northeastern United States. Results show that the generalized extreme value (GEV) distribution properly represents flood frequencies at the study gages. Also, basin area, stream network, and precipitation seasonality are found to be the most effective explanatory variables in prediction modeling by the QRT and PRT. 'Spatial proximity'-based RFFA methods provide reliable flood quantile estimates compared to simpler methods. Compared to the QRT, the PRT may be recommended due to its accuracy and computational simplicity. The results presented in this paper may serve as one possible guidepost for hydrologists interested in flood analysis at ungaged sites.
Time series regression studies in environmental epidemiology
Bhaskaran, Krishnan; Gasparrini, Antonio; Hajat, Shakoor; Smeeth, Liam; Armstrong, Ben
2013-01-01
Time series regression studies have been widely used in environmental epidemiology, notably in investigating the short-term associations between exposures such as air pollution, weather variables or pollen, and health outcomes such as mortality, myocardial infarction or disease-specific hospital admissions. Typically, for both exposure and outcome, data are available at regular time intervals (e.g. daily pollution levels and daily mortality counts) and the aim is to explore short-term associations between them. In this article, we describe the general features of time series data, and we outline the analysis process, beginning with descriptive analysis, then focusing on issues in time series regression that differ from other regression methods: modelling short-term fluctuations in the presence of seasonal and long-term patterns, dealing with time varying confounding factors and modelling delayed (‘lagged’) associations between exposure and outcome. We finish with advice on model checking and sensitivity analysis, and some common extensions to the basic model. PMID:23760528
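The core model described above can be sketched as a Poisson regression of daily counts on a lagged exposure, adjusting for seasonality and long-term trend. Real analyses typically use flexible splines rather than a single harmonic; the simulated data, the one-day lag, and the sine/cosine adjustment below are illustrative simplifications.

```python
# Time series regression: daily deaths vs lagged pollution with seasonal control.
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(0)
days = np.arange(3 * 365)
season = 0.3 * np.sin(2 * np.pi * days / 365.25)
pollution = (10 + 3 * np.sin(2 * np.pi * days / 365.25 + 1)
             + rng.normal(0, 1, days.size))
log_rate = 3.0 + season + 0.0005 * days + 0.01 * pollution
deaths = rng.poisson(np.exp(log_rate - 0.01 * pollution.mean()))

lag1 = np.roll(pollution, 1)                 # previous day's exposure
X = sm.add_constant(np.column_stack([
    lag1, days,
    np.sin(2 * np.pi * days / 365.25),
    np.cos(2 * np.pi * days / 365.25)]))
fit = sm.GLM(deaths[1:], X[1:], family=sm.families.Poisson()).fit()
print("lag-1 exposure coefficient:", round(float(fit.params[1]), 4))
```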
Linear regression analysis of survival data with missing censoring indicators.
Wang, Qihua; Dinse, Gregg E
2011-04-01
Linear regression analysis has been studied extensively in a random censorship setting, but typically all of the censoring indicators are assumed to be observed. In this paper, we develop synthetic data methods for estimating regression parameters in a linear model when some censoring indicators are missing. We define estimators based on regression calibration, imputation, and inverse probability weighting techniques, and we prove all three estimators are asymptotically normal. The finite-sample performance of each estimator is evaluated via simulation. We illustrate our methods by assessing the effects of sex and age on the time to non-ambulatory progression for patients in a brain cancer clinical trial. PMID:20559722
Robust regression on noisy data for fusion scaling laws
NASA Astrophysics Data System (ADS)
Verdoolaege, Geert
2014-11-01
We introduce the method of geodesic least squares (GLS) regression for estimating fusion scaling laws. Based on straightforward principles, the method is easily implemented, yet it clearly outperforms established regression techniques, particularly in cases of significant uncertainty on both the response and predictor variables. We apply GLS for estimating the scaling of the L-H power threshold, resulting in estimates for ITER that are somewhat higher than predicted earlier.
NASA Astrophysics Data System (ADS)
Xu, Shiyu; Inscoe, Christy R.; Lu, Jianping; Zhou, Otto; Chen, Ying
2014-03-01
Stationary Digital Breast Tomosynthesis (sDBT) is a carbon nanotube based breast imaging device with fast data acquisition and decent projection resolution that provides three-dimensional (3-D) volume information. Tomosynthesis 3-D image reconstruction faces the challenges of cone-beam geometry and incomplete, nonsymmetric sampling due to sparse views and a limited view angle. Among all available reconstruction methods, statistical iterative methods are particularly promising since they rely on an accurate physical and statistical model with prior knowledge. In this paper, we present the application of an edge-preserving regularizer to our previously proposed precomputed-backprojection-based penalized-likelihood (PPL) reconstruction. By using the edge-preserving regularizer, our experiments show that, through tuning several parameters, resolution can be retained while noise is reduced significantly. Compared to other conventional noise reduction techniques in image reconstruction, less resolution is lost for a given amount of noise reduction, which may benefit research on low-dose tomosynthesis.
Robust Mediation Analysis Based on Median Regression
Yuan, Ying; MacKinnon, David P.
2014-01-01
Mediation analysis has many applications in psychology and the social sciences. The most prevalent methods typically assume that the error distribution is normal and homoscedastic. However, this assumption may rarely be met in practice, which can affect the validity of the mediation analysis. To address this problem, we propose robust mediation analysis based on median regression. Our approach is robust to various departures from the assumption of homoscedasticity and normality, including heavy-tailed, skewed, contaminated, and heteroscedastic distributions. Simulation studies show that under these circumstances, the proposed method is more efficient and powerful than standard mediation analysis. We further extend the proposed robust method to multilevel mediation analysis, and demonstrate through simulation studies that the new approach outperforms the standard multilevel mediation analysis. We illustrate the proposed method using data from a program designed to increase reemployment and enhance mental health of job seekers. PMID:24079925
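A condensed sketch of the approach: estimate the treatment-to-mediator path and the mediator-to-outcome path (adjusting for treatment) with median (q=0.5) regressions, and take the product of coefficients as the indirect effect. The heavy-tailed simulated data, in the spirit of the reemployment application, are illustrative assumptions; inference (e.g. bootstrap intervals) is omitted.

```python
# Mediation analysis with median regression via statsmodels QuantReg.
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(0)
n = 300
x = rng.integers(0, 2, n).astype(float)        # treatment indicator
m = 0.5 * x + rng.standard_t(3, n)             # mediator, heavy-tailed errors
y = 0.7 * m + 0.2 * x + rng.standard_t(3, n)   # outcome

a = sm.QuantReg(m, sm.add_constant(x)).fit(q=0.5).params[1]          # x -> m
b = sm.QuantReg(y, sm.add_constant(np.column_stack([x, m]))
                ).fit(q=0.5).params[2]                               # m -> y | x
print("indirect effect (a*b):", round(a * b, 3))
```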
Regression of posterior uveal melanomas following cobalt-60 plaque radiotherapy
Cruess, A.F.; Augsburger, J.J.; Shields, J.A.; Brady, L.W.; Markoe, A.M.; Day, J.L.
1984-12-01
A method has been devised for evaluating the rate and extent of tumor regression in the first 100 consecutive patients with a posterior uveal melanoma managed by Cobalt-60 plaque radiotherapy at Wills Eye Hospital. It was found that the average posterior uveal melanoma in the series did not regress rapidly to a flat, depigmented scar but shrank slowly and persisted as a residual mass approximately 50% of the thickness of the original tumor at 54 months following Cobalt-60 plaque radiotherapy. The authors also found that the rate and extent of regression of the tumors in patients who subsequently developed metastatic melanoma were not appreciably different from those in patients who remained well systemically. These observations indicate that the rate and extent of regression of posterior uveal melanomas following Cobalt-60 plaque radiotherapy are poor indicators of the affected patients' prognosis for subsequent development of clinical metastatic disease.
Regression analysis of cytopathological data
Whittemore, A.S.; McLarty, J.W.; Fortson, N.; Anderson, K.
1982-12-01
Epithelial cells from the human body are frequently labelled according to one of several ordered levels of abnormality, ranging from normal to malignant. The label of the most abnormal cell in a specimen determines the score for the specimen. This paper presents a model for the regression of specimen scores against continuous and discrete variables, such as host exposure to carcinogens. Application to data and tests for adequacy of model fit are illustrated using sputum specimens obtained from a cohort of former asbestos workers.