Pineda, Silvia; Real, Francisco X; Kogevinas, Manolis; Carrato, Alfredo; Chanock, Stephen J; Malats, Núria; Van Steen, Kristel
2015-12-01
Omics data integration is becoming necessary to investigate the genomic mechanisms involved in complex diseases. During the integration process, many challenges arise, such as data heterogeneity, the small number of individuals relative to the number of parameters, multicollinearity, and the interpretation and validation of results, which are complicated by their complexity and by gaps in knowledge about biological processes. To overcome some of these issues, innovative statistical approaches are being developed. In this work, we propose a permutation-based method to concomitantly assess significance and correct for multiple testing with the MaxT algorithm. It was applied with penalized regression methods (LASSO and ENET) to explore relationships between common genetic variants, DNA methylation and gene expression measured in bladder tumor samples. The overall analysis flow consisted of three steps: (1) SNPs/CpGs were selected for each gene probe within a 1-Mb window upstream and downstream of the gene; (2) LASSO and ENET were applied to assess the association between each expression probe and the selected SNPs/CpGs in three multivariable models (SNP, CpG, and Global models, the latter integrating SNPs and CpGs); and (3) the significance of each model was assessed using the permutation-based MaxT method. We identified 48 genes whose expression levels were significantly associated with both SNPs and CpGs. Importantly, 36 (75%) of them were replicated in an independent data set (TCGA), and the performance of the proposed method was checked with a simulation study. We further support our results with a biological interpretation based on an enrichment analysis. The proposed approach reduces computational time and is flexible and easy to implement when analyzing several types of omics data. Our results highlight the importance of integrating omics data with appropriate statistical strategies to discover new insights into the complex genetic mechanisms involved in disease.
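The single-step maxT idea sketched in this abstract — permute, record the maximum statistic across all tests per permutation, and compare each observed statistic to that null distribution of maxima — can be outlined in a few lines. This is an illustrative sketch with toy numbers, not the authors' code; some implementations also add 1 to the numerator and denominator of the p-value.

```python
def maxt_adjusted_pvalues(stats, permuted_stats):
    """Westfall-Young single-step maxT adjustment: for each observed
    statistic, the adjusted p-value is the fraction of permutations whose
    maximum statistic (over all tests) meets or exceeds it."""
    perm_maxima = [max(row) for row in permuted_stats]
    B = len(perm_maxima)
    return [sum(m >= s for m in perm_maxima) / B for s in stats]

# Toy example: 3 tests, 4 permutations of the outcome.
observed = [5.0, 1.0, 0.5]
permuted = [[0.4, 1.2, 0.3],
            [2.0, 0.1, 0.9],
            [0.7, 0.6, 1.1],
            [1.5, 0.8, 0.2]]
adj = maxt_adjusted_pvalues(observed, permuted)
```

Because the maximum is taken over all tests in each permutation, the adjustment controls the family-wise error rate without assuming independence among tests.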
Yi, Hui; Breheny, Patrick; Imam, Netsanet; Liu, Yongmei; Hoeschele, Ina
2015-01-01
The data from genome-wide association studies (GWAS) in humans are still predominantly analyzed using single-marker association methods. As an alternative to single-marker analysis (SMA), all or subsets of markers can be tested simultaneously. This approach requires a form of penalized regression (PR) as the number of SNPs is much larger than the sample size. Here we review PR methods in the context of GWAS, extend them to perform penalty parameter and SNP selection by false discovery rate (FDR) control, and assess their performance in comparison with SMA. PR methods were compared with SMA, using realistically simulated GWAS data with a continuous phenotype and real data. Based on these comparisons our analytic FDR criterion may currently be the best approach to SNP selection using PR for GWAS. We found that PR with FDR control provides substantially more power than SMA with genome-wide type-I error control but somewhat less power than SMA with Benjamini–Hochberg FDR control (SMA-BH). PR with FDR-based penalty parameter selection controlled the FDR somewhat conservatively while SMA-BH may not achieve FDR control in all situations. Differences among PR methods seem quite small when the focus is on SNP selection with FDR control. Incorporating linkage disequilibrium into the penalization by adapting penalties developed for covariates measured on graphs can improve power but also generate more false positives or wider regions for follow-up. We recommend the elastic net with a mixing weight for the Lasso penalty near 0.5 as the best method. PMID:25354699
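The Benjamini-Hochberg step-up procedure (the SMA-BH comparator above) is simple to state; a minimal sketch:

```python
def benjamini_hochberg(pvals, q=0.05):
    """Return indices of hypotheses rejected at FDR level q: find the
    largest rank k with p_(k) <= q*k/m, then reject the k smallest."""
    m = len(pvals)
    order = sorted(range(m), key=lambda i: pvals[i])
    k = 0  # largest rank whose p-value clears the BH line
    for rank, i in enumerate(order, start=1):
        if pvals[i] <= q * rank / m:
            k = rank
    return sorted(order[:k])

pvals = [0.001, 0.008, 0.039, 0.041, 0.20, 0.74]
rejected = benjamini_hochberg(pvals, q=0.05)
```

Note the step-up rule: a p-value above its own line can still be rejected if a later one clears its line, which is why the loop tracks the largest qualifying rank.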
Marginal longitudinal semiparametric regression via penalized splines
Kadiri, M. Al; Carroll, R.J.; Wand, M.P.
2010-01-01
We study the marginal longitudinal nonparametric regression problem and some of its semiparametric extensions. We point out that, while several elaborate approaches to efficient estimation have been proposed, a relatively simple and straightforward one, based on penalized splines, has not. After describing our approach, we explain how Gibbs sampling and the BUGS software can be used to achieve quick and effective implementation. Illustrations are provided for nonparametric regression and additive models. PMID:21037941
Bootstrap Enhanced Penalized Regression for Variable Selection with Neuroimaging Data
Abram, Samantha V.; Helwig, Nathaniel E.; Moodie, Craig A.; DeYoung, Colin G.; MacDonald, Angus W.; Waller, Niels G.
2016-01-01
Recent advances in fMRI research highlight the use of multivariate methods for examining whole-brain connectivity. Complementary data-driven methods are needed for determining the subset of predictors related to individual differences. Although commonly used for this purpose, ordinary least squares (OLS) regression may not be ideal due to multi-collinearity and over-fitting issues. Penalized regression is a promising and underutilized alternative to OLS regression. In this paper, we propose a nonparametric bootstrap quantile (QNT) approach for variable selection with neuroimaging data. We use real and simulated data, as well as annotated R code, to demonstrate the benefits of our proposed method. Our results illustrate the practical potential of our proposed bootstrap QNT approach. Our real data example demonstrates how our method can be used to relate individual differences in neural network connectivity with an externalizing personality measure. Also, our simulation results reveal that the QNT method is effective under a variety of data conditions. Penalized regression yields more stable estimates and sparser models than OLS regression in situations with large numbers of highly correlated neural predictors. Our results demonstrate that penalized regression is a promising method for examining associations between neural predictors and clinically relevant traits or behaviors. These findings have important implications for the growing field of functional connectivity research, where multivariate methods produce numerous, highly correlated brain networks. PMID:27516732
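The bootstrap quantile (QNT) selection rule can be outlined as follows. This is a hypothetical sketch: the paper bootstraps penalized regression coefficients over many correlated neural predictors, whereas here a plain OLS slope on a single predictor stands in for the estimator.

```python
import random

def ols_slope(xs, ys):
    """Ordinary least-squares slope, used here as a stand-in for the
    penalized-regression coefficient the method would bootstrap."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    return sxy / sxx

def bootstrap_quantile_select(xs, ys, B=500, alpha=0.05, seed=1):
    """Keep a predictor only if the bootstrap percentile interval of its
    coefficient excludes zero (the quantile selection rule, in outline)."""
    rng = random.Random(seed)
    n = len(xs)
    boots = []
    for _ in range(B):
        idx = [rng.randrange(n) for _ in range(n)]
        boots.append(ols_slope([xs[i] for i in idx], [ys[i] for i in idx]))
    boots.sort()
    lo = boots[int(B * alpha / 2)]
    hi = boots[int(B * (1 - alpha / 2)) - 1]
    return not (lo <= 0.0 <= hi), (lo, hi)

noise = [0.3, -0.2, 0.1, 0.4, -0.5, 0.2, -0.1, 0.3, 0.0, -0.3,
         0.5, -0.4, 0.2, 0.1, -0.2, 0.4, -0.1, 0.0, 0.3, -0.2]
xs = [float(i) for i in range(20)]
ys = [2.0 * x + e for x, e in zip(xs, noise)]
selected, interval = bootstrap_quantile_select(xs, ys)
```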
Reduced rank regression via adaptive nuclear norm penalization
Chen, Kun; Dong, Hongbo; Chan, Kung-Sik
2014-01-01
We propose an adaptive nuclear norm penalization approach for low-rank matrix approximation, and use it to develop a new reduced rank estimation method for high-dimensional multivariate regression. The adaptive nuclear norm is defined as the weighted sum of the singular values of the matrix, and it is generally non-convex under the natural restriction that the weight decreases with the singular value. However, we show that the proposed non-convex penalized regression method has a global optimal solution obtained from an adaptively soft-thresholded singular value decomposition. The method is computationally efficient, and the resulting solution path is continuous. Rank consistency of, and prediction/estimation performance bounds for, the estimator are established in a high-dimensional asymptotic regime. Simulation studies and an application in genetics demonstrate its efficacy. PMID:25045172
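The adaptively soft-thresholded SVD at the heart of the method reduces, for the singular values themselves, to a weighted soft-thresholding step. A minimal sketch, assuming the common adaptive weight choice w_i = 1/s_i^gamma so larger singular values are shrunk less (the weights and constants here are illustrative assumptions, not taken from the paper):

```python
def adaptive_soft_threshold(singular_values, lam, gamma=2.0, eps=1e-8):
    """Shrink each singular value s_i by lam * w_i with adaptive weights
    w_i = 1 / (s_i + eps)**gamma; negative results are set to zero,
    so small singular values are removed and large ones barely move."""
    out = []
    for s in singular_values:
        w = 1.0 / (s + eps) ** gamma
        out.append(max(s - lam * w, 0.0))
    return out

svals = [5.0, 1.0, 0.2]          # singular values of some fitted matrix
shrunk = adaptive_soft_threshold(svals, lam=0.1)
```

The full estimator would rebuild the matrix from the surviving singular values and the original singular vectors; the weight-decreasing-in-s restriction is what makes the one-pass thresholding globally optimal despite the non-convex penalty.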
Penalized spline estimation for functional coefficient regression models.
Cao, Yanrong; Lin, Haiqun; Wu, Tracy Z; Yu, Yan
2010-04-01
The functional coefficient regression models assume that the regression coefficients vary with some "threshold" variable, providing appreciable flexibility in capturing the underlying dynamics in data and avoiding the so-called "curse of dimensionality" in multivariate nonparametric estimation. We first investigate the estimation, inference, and forecasting for the functional coefficient regression models with dependent observations via penalized splines. The P-spline approach, as a direct ridge regression shrinkage type global smoothing method, is computationally efficient and stable. With established fixed-knot asymptotics, inference is readily available. Exact inference can be obtained for fixed smoothing parameter λ, which is most appealing for finite samples. Our penalized spline approach gives an explicit model expression, which also enables multi-step-ahead forecasting via simulations. Furthermore, we examine different methods of choosing the important smoothing parameter λ: modified multi-fold cross-validation (MCV), generalized cross-validation (GCV), and an extension of empirical bias bandwidth selection (EBBS) to P-splines. In addition, we implement smoothing parameter selection using mixed model framework through restricted maximum likelihood (REML) for P-spline functional coefficient regression models with independent observations. The P-spline approach also easily allows different smoothness for different functional coefficients, which is enabled by assigning different penalty λ accordingly. We demonstrate the proposed approach by both simulation examples and a real data application.
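The ridge-type P-spline penalty mentioned above penalizes differences of adjacent spline coefficients; a second-order difference penalty, the usual default, can be computed directly (toy coefficient vectors, for illustration):

```python
def second_diff_penalty(beta, lam):
    """Roughness penalty lam * ||D2 beta||^2, where D2 takes second
    differences of the spline coefficient vector beta."""
    d2 = [beta[i + 2] - 2 * beta[i + 1] + beta[i]
          for i in range(len(beta) - 2)]
    return lam * sum(d * d for d in d2)

linear = [1.0, 2.0, 3.0, 4.0, 5.0]   # second differences all zero
wiggly = [1.0, 3.0, 2.0, 4.0, 1.0]
p1 = second_diff_penalty(linear, lam=10.0)
p2 = second_diff_penalty(wiggly, lam=10.0)
```

A linear coefficient sequence incurs zero penalty, which is why the second-order P-spline shrinks fits toward a straight line as lambda grows; assigning a different lambda per coefficient function gives each its own smoothness, as the abstract notes.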
Compound Identification Using Penalized Linear Regression on Metabolomics
Liu, Ruiqi; Wu, Dongfeng; Zhang, Xiang; Kim, Seongho
2014-01-01
Compound identification is often achieved by matching the experimental mass spectra to the mass spectra stored in a reference library based on mass spectral similarity. Because the number of compounds in the reference library is much larger than the range of mass-to-charge ratio (m/z) values, the data are high dimensional and suffer from singularity. For this reason, penalized linear regressions such as ridge regression and the lasso are used instead of ordinary least squares regression. Furthermore, two-step approaches using the dot product and Pearson's correlation along with the penalized linear regression are proposed in this study. PMID:27212894
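The two-step matching idea — shortlist candidates by dot-product similarity, then re-rank by Pearson's correlation — can be sketched with toy spectra (the shortlist size and the spectra below are illustrative assumptions, not data from the study):

```python
import math

def cosine_similarity(a, b):
    """Normalized dot product between two intensity vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb)

def pearson(a, b):
    """Pearson correlation between two intensity vectors."""
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    cov = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    va = math.sqrt(sum((x - ma) ** 2 for x in a))
    vb = math.sqrt(sum((y - mb) ** 2 for y in b))
    return cov / (va * vb)

def two_step_match(query, library, k=2):
    """Step 1: rank library spectra by dot-product similarity and keep
    the top k. Step 2: re-rank the shortlist by Pearson correlation."""
    ranked = sorted(range(len(library)),
                    key=lambda i: cosine_similarity(query, library[i]),
                    reverse=True)
    shortlist = ranked[:k]
    return max(shortlist, key=lambda i: pearson(query, library[i]))

query = [10.0, 0.0, 5.0, 1.0]
library = [[9.0, 1.0, 5.0, 1.0],    # close match
           [0.0, 10.0, 0.0, 5.0],   # very different spectrum
           [5.0, 4.0, 5.0, 4.0]]    # flat, partially similar spectrum
best = two_step_match(query, library)
```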
Sparse brain network using penalized linear regression
NASA Astrophysics Data System (ADS)
Lee, Hyekyoung; Lee, Dong Soo; Kang, Hyejin; Kim, Boong-Nyun; Chung, Moo K.
2011-03-01
Sparse partial correlation is a useful connectivity measure for brain networks when it is difficult to compute the exact partial correlation in the small-n large-p setting. In this paper, we formulate the problem of estimating partial correlation as a sparse linear regression with an l1-norm penalty. The method is applied to a brain network consisting of parcellated regions of interest (ROIs), obtained from FDG-PET images of children with autism spectrum disorder (ASD) and pediatric control (PedCon) subjects. To validate the results, we check the reproducibility of the obtained brain networks by leave-one-out cross-validation and compare the clustered structures derived from the brain networks of ASD and PedCon.
Iterative Brinkman penalization for remeshed vortex methods
NASA Astrophysics Data System (ADS)
Hejlesen, Mads Mølholm; Koumoutsakos, Petros; Leonard, Anthony; Walther, Jens Honoré
2015-01-01
We introduce an iterative Brinkman penalization method for the enforcement of the no-slip boundary condition in remeshed vortex methods. In the proposed method, the Brinkman penalization is applied iteratively only in the neighborhood of the body. This allows for significantly larger time steps than is customary in Brinkman penalization, thus reducing its computational cost while maintaining the capability of the method to handle complex geometries. We demonstrate the accuracy of our method on challenging benchmark problems such as the flow past an impulsively started cylinder and the flow normal to an impulsively started and accelerated flat plate. We find that the present method significantly enhances the accuracy of the Brinkman penalization technique for simulations of highly unsteady flows past complex geometries.
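A single Brinkman penalization step is commonly discretized implicitly, driving the velocity inside the body toward the solid-body velocity while leaving the fluid untouched. A minimal 1-D sketch of that standard update (the iterative, neighborhood-restricted variant of the paper is not reproduced here):

```python
def penalized_velocity(u, chi, u_solid, lam, dt):
    """One implicitly discretized Brinkman penalization step:
    u_new = (u + lam*dt*chi*u_solid) / (1 + lam*dt*chi).
    Inside the body (chi = 1) the velocity relaxes toward u_solid;
    in the fluid (chi = 0) it is unchanged."""
    return [(ui + lam * dt * ci * u_solid) / (1.0 + lam * dt * ci)
            for ui, ci in zip(u, chi)]

u = [1.0, 1.0, 1.0, 1.0]
chi = [0.0, 0.0, 1.0, 1.0]       # last two cells lie inside the body
u_new = penalized_velocity(u, chi, u_solid=0.0, lam=1e4, dt=0.01)
```

The implicit form stays stable for large penalization parameters lam, but the residual slip scales like 1/(1 + lam*dt), which is the enforcement gap the iterative variant targets.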
Greenland, Sander; Mansournia, Mohammad Ali
2015-10-15
Penalization is a very general method of stabilizing or regularizing estimates, which has both frequentist and Bayesian rationales. We consider some questions that arise when considering alternative penalties for logistic regression and related models. The most widely programmed penalty appears to be the Firth small-sample bias-reduction method (albeit with small differences among implementations and the results they provide), which corresponds to using the log density of the Jeffreys invariant prior distribution as a penalty function. The latter representation raises some serious contextual objections to the Firth reduction, which also apply to alternative penalties based on t-distributions (including Cauchy priors). Taking simplicity of implementation and interpretation as our chief criteria, we propose that the log-F(1,1) prior provides a better default penalty than other proposals. Penalization based on more general log-F priors is trivial to implement and facilitates mean-squared error reduction and sensitivity analyses of penalty strength by varying the number of prior degrees of freedom. We caution, however, against penalization of intercepts, which are unduly sensitive to covariate coding and design idiosyncrasies.
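For a single proportion, log-F(m,m) penalized logistic regression has a one-dimensional penalized likelihood that Newton's method solves directly. The sketch below is a toy model, not the authors' software; it uses the log-F(m,m) log-prior (m/2)*b - m*log(1 + exp(b)) on the log-odds b, and shows that the log-F(1,1) default keeps the estimate finite even under complete separation:

```python
import math

def logf_penalized_mle(y, m=1.0, iters=100):
    """Newton iterations for a single logistic proportion with a
    log-F(m, m) penalty on the log-odds b. With m = 1 this is the
    log-F(1,1) default penalty."""
    n = len(y)
    s = sum(y)
    b = 0.0
    for _ in range(iters):
        p = 1.0 / (1.0 + math.exp(-b))
        grad = s - n * p + m / 2.0 - m * p      # penalized score
        hess = -(n + m) * p * (1.0 - p)         # penalized Hessian
        b -= grad / hess
    return b

# Separated data: 5 successes out of 5. The unpenalized MLE diverges to
# +infinity, but the penalized estimate is finite: logit((5+0.5)/(5+1)).
beta = logf_penalized_mle([1, 1, 1, 1, 1])
```

The stationarity condition s + m/2 = (n + m)*p shows the penalty acts like adding m pseudo-observations split evenly between success and failure, which is why varying m gives the simple sensitivity analysis the abstract describes.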
Zhang, Guosheng; Huang, Kuan-Chieh; Xu, Zheng; Tzeng, Jung-Ying; Conneely, Karen N; Guan, Weihua; Kang, Jian; Li, Yun
2016-05-01
DNA methylation is a key epigenetic mark involved in both normal development and disease progression. Recent advances in high-throughput technologies have enabled genome-wide profiling of DNA methylation. However, DNA methylation profiling often employs different designs and platforms with varying resolution, which hinders joint analysis of methylation data from multiple platforms. In this study, we propose a penalized functional regression model to impute missing methylation data. By incorporating functional predictors, our model utilizes information from nonlocal probes to improve imputation quality. Here, we compared the performance of our functional model to linear regression and the best single probe surrogate in real data and via simulations. Specifically, we applied different imputation approaches to an acute myeloid leukemia dataset consisting of 194 samples and our method showed higher imputation accuracy, manifested, for example, by a 94% relative increase in information content and up to 86% more CpG sites passing post-imputation filtering. Our simulated association study further demonstrated that our method substantially improves the statistical power to identify trait-associated methylation loci. These findings indicate that the penalized functional regression model is a convenient and valuable imputation tool for methylation data, and it can boost statistical power in downstream epigenome-wide association study (EWAS).
Lee, Wonyul; Liu, Yufeng
2012-10-01
Multivariate regression is a common statistical tool for practical problems. Many multivariate regression techniques are designed for univariate response cases. For problems with multiple response variables available, one common approach is to apply the univariate response regression technique separately to each response variable. Although it is simple and popular, the univariate response approach ignores the joint information among response variables. In this paper, we propose three new methods for utilizing joint information among response variables. All methods are in a penalized likelihood framework with weighted L1 regularization. The proposed methods provide sparse estimators of the conditional inverse covariance matrix of the response vector given the explanatory variables, as well as sparse estimators of the regression parameters. Our first approach is to estimate the regression coefficients with plug-in estimated inverse covariance matrices, and our second approach is to estimate the inverse covariance matrix with plug-in estimated regression parameters. Our third approach is to estimate both simultaneously. Asymptotic properties of these methods are explored. Our numerical examples demonstrate that the proposed methods perform competitively in terms of prediction and variable selection, as well as inverse covariance matrix estimation.
An Iterative Brinkman penalization for particle vortex methods
NASA Astrophysics Data System (ADS)
Walther, J. H.; Hejlesen, M. M.; Leonard, A.; Koumoutsakos, P.
2013-11-01
We present an iterative Brinkman penalization method for the enforcement of the no-slip boundary condition in vortex particle methods. This is achieved by implementing a penalization of the velocity field using iteration of the penalized vorticity. We show that the conventional Brinkman penalization method can result in an insufficient enforcement of solid boundaries. The specific problems of the conventional penalization method are discussed, and three examples are presented for which the method in its current form has been shown insufficient to consistently enforce the no-slip boundary condition: the impulsively started flow past a cylinder, the impulsively started flow normal to a flat plate, and the uniformly accelerated flow normal to a flat plate. The iterative penalization algorithm is shown to give significantly improved results compared to the conventional penalization method for each of the presented flow cases.
Göbl, Christian S.; Bozkurt, Latife; Tura, Andrea; Pacini, Giovanni; Kautzky-Willer, Alexandra; Mittlböck, Martina
2015-01-01
This paper aims to introduce penalized estimation techniques in clinical investigations of diabetes, as well as to assess their possible advantages and limitations. Data from a previous study were used to carry out simulations to assess: a) which procedure results in the lowest prediction error of the final model in the setting of a large number of predictor variables with high multicollinearity (of importance if insulin sensitivity should be predicted), and b) which procedure achieves the most accurate estimate of regression coefficients in the setting of fewer predictors with small unidirectional effects and moderate correlation between explanatory variables (of importance if the specific relation between an independent variable and insulin sensitivity should be examined). Moreover, a special focus is placed on the correct direction of estimated parameter effects, a non-negligible source of error and misinterpretation of study results. The simulations were performed for varying sample sizes to evaluate the performance of LASSO and Ridge, as well as different algorithms for the Elastic Net. These methods were also compared with automatic variable selection procedures (i.e., optimizing AIC or BIC). We were not able to identify one method achieving superior performance in all situations. However, the improved accuracy of estimated effects underlines the importance of using penalized regression techniques in our example (e.g., if a researcher aims to compare the relations of several correlated parameters with insulin sensitivity). Ultimately, the decision of which procedure should be used depends on the specific context of a study (accuracy versus complexity) and should moreover involve clinical prior knowledge. PMID:26544569
Bergen, Silas; Sheppard, Lianne; Kaufman, Joel D; Szpiro, Adam A
2016-11-01
Air pollution epidemiology studies are trending towards a multi-pollutant approach. In these studies, exposures at subject locations are unobserved and must be predicted using observed exposures at misaligned monitoring locations. This induces measurement error, which can bias the estimated health effects and affect standard error estimates. We characterize this measurement error and develop an analytic bias correction when using penalized regression splines to predict exposure. Our simulations show bias from multi-pollutant measurement error can be severe, and in opposite directions or simultaneously positive or negative. Our analytic bias correction combined with a non-parametric bootstrap yields accurate coverage of 95% confidence intervals. We apply our methodology to analyze the association of systolic blood pressure with PM2.5 and NO2 in the NIEHS Sister Study. We find that NO2 confounds the association of systolic blood pressure with PM2.5 and vice versa. Elevated systolic blood pressure was significantly associated with increased PM2.5 and decreased NO2. Correcting for measurement error bias strengthened these associations and widened 95% confidence intervals.
CALIBRATING NON-CONVEX PENALIZED REGRESSION IN ULTRA-HIGH DIMENSION
Wang, Lan; Kim, Yongdai; Li, Runze
2014-01-01
We investigate high-dimensional non-convex penalized regression, where the number of covariates may grow at an exponential rate. Although recent asymptotic theory established that there exists a local minimum possessing the oracle property under general conditions, it is still largely an open problem how to identify the oracle estimator among potentially multiple local minima. There are two main obstacles: (1) due to the presence of multiple minima, the solution path is nonunique and is not guaranteed to contain the oracle estimator; (2) even if a solution path is known to contain the oracle estimator, the optimal tuning parameter depends on many unknown factors and is hard to estimate. To address these two challenging issues, we first prove that an easy-to-calculate calibrated CCCP algorithm produces a consistent solution path which contains the oracle estimator with probability approaching one. Furthermore, we propose a high-dimensional BIC criterion and show that it can be applied to the solution path to select the optimal tuning parameter which asymptotically identifies the oracle estimator. The theory for a general class of non-convex penalties in the ultra-high dimensional setup is established when the random errors follow the sub-Gaussian distribution. Monte Carlo studies confirm that the calibrated CCCP algorithm combined with the proposed high-dimensional BIC has desirable performance in identifying the underlying sparsity pattern for high-dimensional data analysis. PMID:24948843
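One common form of a high-dimensional BIC multiplies the usual model-size penalty by log(log n) * log(p), so the charge per selected variable grows with the ambient dimension p. The exact constants vary by paper, so the version below is an assumed variant for illustration, applied to pick the tuning parameter along a toy solution path:

```python
import math

def hbic(rss, n, p, model_size):
    """High-dimensional BIC (one assumed variant): in-sample fit plus a
    per-variable penalty log(log n) * log(p) / n that scales with p."""
    return (math.log(rss / n)
            + model_size * math.log(math.log(n)) * math.log(p) / n)

# Pick the point on a solution path that minimizes HBIC.
n, p = 100, 1000
path = [(50.0, 2), (20.0, 5), (19.5, 30)]   # (rss, model size) per lambda
best = min(range(len(path)),
           key=lambda i: hbic(path[i][0], n, p, path[i][1]))
```

Here the 30-variable model barely improves the fit over the 5-variable one, so the dimension-aware penalty rejects it; an ordinary BIC penalty of log(n)/n per variable would be far weaker when p is in the thousands.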
Chaibub Neto, Elias; Bare, J Christopher; Margolin, Adam A
2014-01-01
New algorithms are continuously proposed in computational biology. Performance evaluation of novel methods is important in practice. Nonetheless, the field experiences a lack of rigorous methodology aimed to systematically and objectively evaluate competing approaches. Simulation studies are frequently used to show that a particular method outperforms another. Often, however, simulation studies are not well designed, and it is hard to characterize the particular conditions under which different methods perform better. In this paper we propose the adoption of well established techniques in the design of computer and physical experiments for developing effective simulation studies. By following best practices in the planning of experiments, we are better able to understand the strengths and weaknesses of competing algorithms, leading to more informed decisions about which method to use for a particular task. We illustrate the application of our proposed simulation framework with a detailed comparison of the ridge-regression, lasso and elastic-net algorithms in a large scale study investigating the effects on predictive performance of sample size, number of features, true model sparsity, signal-to-noise ratio, and feature correlation, in situations where the number of covariates is usually much larger than sample size. Analysis of data sets containing tens of thousands of features but only a few hundred samples is nowadays routine in computational biology, where "omics" features such as gene expression, copy number variation and sequence data are frequently used in the predictive modeling of complex phenotypes such as anticancer drug response. The penalized regression approaches investigated in this study are popular choices in this setting, and our simulations corroborate well established results concerning the conditions under which each one of these methods is expected to perform best, while providing several novel insights.
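The factorial layout such a simulation study requires — every combination of sample size, dimension, sparsity, signal-to-noise ratio, and feature correlation — is easy to enumerate. The factor levels below are illustrative placeholders, not the paper's:

```python
import itertools

# Full factorial design over the simulation factors named in the abstract.
factors = {
    "n":        [100, 300],       # sample size
    "p":        [1000, 10000],    # number of features
    "sparsity": [0.01, 0.1],      # fraction of truly nonzero coefficients
    "snr":      [0.5, 2.0],       # signal-to-noise ratio
    "corr":     [0.0, 0.8],       # feature correlation
}
names = list(factors)
runs = [dict(zip(names, combo))
        for combo in itertools.product(*(factors[k] for k in names))]
# 2^5 = 32 scenarios; each would be replicated with freshly simulated data
# and each competing method (ridge, lasso, elastic net) fit on every run.
```

A full factorial design lets main effects and interactions among design factors be separated when analyzing the methods' performance, which is the point of borrowing design-of-experiments practice.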
NASA Astrophysics Data System (ADS)
Vasilyev, Oleg V.; Gazzola, Mattia; Koumoutsakos, Petros
2009-11-01
In this talk we discuss preliminary results on the use of a hybrid wavelet collocation - Brinkman penalization approach for shape and topology optimization of fluid flows. The adaptive wavelet collocation method tackles the problem of efficiently resolving a fluid flow on a dynamically adaptive computational grid in complex geometries (where grid resolution varies both in space and time), while Brinkman volume penalization allows easy variation of the flow geometry without using body-fitted meshes, by simply changing the shape of the penalization region. The Brinkman volume penalization approach also allows a seamless transition from shape to topology optimization when combined with a level set approach and an enlarged optimization space. The approach is demonstrated for shape optimization of a variety of fluid flows by optimizing a single cost function (the time-averaged drag coefficient) using the covariance matrix adaptation (CMA) evolutionary algorithm.
Astronomical Methods for Nonparametric Regression
NASA Astrophysics Data System (ADS)
Steinhardt, Charles L.; Jermyn, Adam
2017-01-01
I will discuss commonly used techniques for nonparametric regression in astronomy. We find that several of them, particularly running averages and running medians, are generically biased, asymmetric between dependent and independent variables, and perform poorly in recovering the underlying function, even when errors are present in only one variable. We then examine less commonly used techniques such as Multivariate Adaptive Regression Splines and Boosted Trees and find them superior in bias, asymmetry, and variance, both theoretically and in practice, under a wide range of numerical benchmarks. In this context the chief advantage of the common techniques is runtime, which even for large datasets is now measured in microseconds, compared with milliseconds for the more statistically robust techniques. This points to a tradeoff between bias, variance, and computational resources which in recent years has shifted heavily in favor of the more advanced methods, primarily driven by Moore's Law. Along these lines, we also propose a new algorithm which has better overall statistical properties than all techniques examined thus far, at the cost of significantly worse runtime, in addition to providing guidance on choosing the nonparametric regression technique most suitable to any specific problem. We then examine the more general problem of errors in both variables and provide a new algorithm which performs well in most cases and lacks the clear asymmetry of existing nonparametric methods, which fail to account for errors in both variables.
Zhang, Zhenyue; Zha, Hongyuan; Simon, Horst
2006-07-31
In this paper, we developed numerical algorithms for computing sparse low-rank approximations of matrices, and we also provided a detailed error analysis of the proposed algorithms together with some numerical experiments. The low-rank approximations are constructed in a certain factored form with the degree of sparsity of the factors controlled by some user-specified parameters. In this paper, we cast the sparse low-rank approximation problem in the framework of penalized optimization problems. We discuss various approximation schemes for the penalized optimization problem which are more amenable to numerical computations. We also include some analysis to show the relations between the original optimization problem and the reduced one. We then develop a globally convergent discrete Newton-like iterative method for solving the approximate penalized optimization problems. We also compare the reconstruction errors of the sparse low-rank approximations computed by our new methods with those obtained using the methods in the earlier paper and several other existing methods for computing sparse low-rank approximations. Numerical examples show that the penalized methods are more robust and produce approximations with factors which have fewer columns and are sparser.
Synthesizing Regression Results: A Factored Likelihood Method
ERIC Educational Resources Information Center
Wu, Meng-Jia; Becker, Betsy Jane
2013-01-01
Regression methods are widely used by researchers in many fields, yet methods for synthesizing regression results are scarce. This study proposes using a factored likelihood method, originally developed to handle missing data, to appropriately synthesize regression models involving different predictors. This method uses the correlations reported…
2009-01-01
Background There is a growing awareness that interactions between multiple genes play an important role in the risk of common, complex multi-factorial diseases. Many common diseases are affected by certain genotype combinations (associated with some genes and their interactions). The identification and characterization of these susceptibility genes and gene-gene interactions have been limited by small sample sizes and the large number of potential interactions between genes. Several methods have been proposed to detect gene-gene interaction in a case-control study. The penalized logistic regression (PLR), a variant of logistic regression with L2 regularization, is a parametric approach to detect gene-gene interaction. On the other hand, the Multifactor Dimensionality Reduction (MDR) is a nonparametric and genetic model-free approach to detect genotype combinations associated with disease risk. Methods We compared the power of MDR and PLR for detecting two-way and three-way interactions in a case-control study through extensive simulations. We generated several interaction models with different magnitudes of interaction effect. For each model, we simulated 100 datasets, each with 200 cases and 200 controls and 20 SNPs. We considered a wide variety of models, such as models with just main effects, models with only interaction effects, and models with both main and interaction effects. We also compared the performance of MDR and PLR to detect gene-gene interaction associated with acute rejection (AR) in kidney transplant patients. Results In this paper, we have studied the power of MDR and PLR for detecting gene-gene interaction in a case-control study through extensive simulation. We have compared their performances for different two-way and three-way interaction models. We have studied the effect of different allele frequencies on these methods. We have also implemented their performance on a real dataset. As expected, none of these methods were consistently better for all
PRAMS: a systematic method for evaluating penal institutions under litigation.
Wills, Cheryl D
2007-01-01
Forensic psychiatrists serve as expert witnesses in litigation involving the impact of conditions of confinement, including mental health care delivery, on the emotional well-being of institutionalized persons. Experts review volumes of data before formulating opinions and preparing reports. The author has developed PRAMS, a method for systematically reviewing and presenting data during mental health litigation involving detention and corrections facilities. The PRAMS method divides the examination process into five stages: paper review, real-world view, aggravating circumstances, mitigating circumstances, and supplemental information. PRAMS provides the scaffolding on which a compelling picture of an institution's system of care may be constructed and disseminated in reports and during courtroom testimony. Also, PRAMS enhances the organization, analysis, publication, and presentation of salient findings, thereby coordinating the forensic psychiatrist's efforts to provide expert opinions regarding complex systems of mental health care.
Elastic SCAD as a novel penalization method for SVM classification tasks in high-dimensional data
2011-01-01
Background Classification and variable selection play an important role in knowledge discovery in high-dimensional data. Although Support Vector Machine (SVM) algorithms are among the most powerful classification and prediction methods with a wide range of scientific applications, the SVM does not include automatic feature selection, and therefore a number of feature selection procedures have been developed. Regularisation approaches extend the SVM to a feature selection method in a flexible way using penalty functions like LASSO, SCAD and Elastic Net. We propose a novel penalty function for SVM classification tasks, Elastic SCAD, a combination of SCAD and ridge penalties which overcomes the limitations of each penalty alone. Since SVM models are extremely sensitive to the choice of tuning parameters, we adopted an interval search algorithm, which in comparison to a fixed grid search finds a global optimal solution more rapidly and more precisely. Results Feature selection methods with combined penalties (Elastic Net and Elastic SCAD SVMs) are more robust to a change of the model complexity than methods using single penalties. Our simulation study showed that Elastic SCAD SVM outperformed LASSO (L1) and SCAD SVMs. Moreover, Elastic SCAD SVM provided sparser classifiers, in terms of the median number of features selected, than Elastic Net SVM, and often predicted better than Elastic Net in terms of misclassification error. Finally, we applied the penalization methods described above to four publicly available breast cancer data sets. Elastic SCAD SVM was the only method providing robust classifiers in both sparse and non-sparse situations. Conclusions The proposed Elastic SCAD SVM algorithm provides the advantages of the SCAD penalty and at the same time avoids sparsity limitations for non-sparse data. We were the first to demonstrate that the integration of the interval search algorithm and penalized SVM classification techniques provides fast solutions on the optimization of tuning
Shang, Shang; Bai, Jing; Song, Xiaolei; Wang, Hongkai; Lau, Jaclyn
2007-01-01
The conjugate gradient method has been verified to be efficient for nonlinear optimization problems on large-dimensional data. In this paper, a penalized linear and nonlinear combined conjugate gradient method for the reconstruction of fluorescence molecular tomography (FMT) is presented. The algorithm combines the linear conjugate gradient method and the nonlinear conjugate gradient method based on a restart strategy, in order to take advantage of both kinds of conjugate gradient methods while compensating for their disadvantages. A quadratic penalty method is adopted to enforce a nonnegativity constraint and reduce the ill-posedness of the problem. Simulation studies show that the presented algorithm is accurate, stable, and fast, and performs better than conventional conjugate gradient-based reconstruction algorithms. It offers an effective approach to reconstructing fluorochrome information for FMT.
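The quadratic-penalty idea can be illustrated independently of the FMT reconstruction itself: penalize the negative part of the unknown with a quadratic term and minimize the penalized least-squares objective with a nonlinear conjugate gradient routine. The forward matrix, dimensions, and penalty weight below are made-up stand-ins, not the paper's setup.

```python
import numpy as np
from scipy.optimize import minimize

rng = np.random.default_rng(1)
A = rng.normal(size=(30, 10))                       # hypothetical forward (sensitivity) matrix
x_true = np.clip(rng.normal(size=10), 0, None)      # nonnegative fluorochrome distribution
b = A @ x_true + 0.01 * rng.normal(size=30)

mu = 100.0                                          # quadratic-penalty weight for the x >= 0 constraint

def f(x):
    neg = np.minimum(x, 0.0)                        # only negative components are penalized
    return 0.5 * np.sum((A @ x - b) ** 2) + 0.5 * mu * np.sum(neg ** 2)

def grad(x):
    neg = np.minimum(x, 0.0)
    return A.T @ (A @ x - b) + mu * neg

# nonlinear conjugate gradient on the penalized objective
res = minimize(f, np.zeros(10), jac=grad, method="CG")
print(res.x.min())                                  # the penalty pushes negative entries toward zero
```

In practice the penalty weight is often increased over a sequence of solves, since any finite weight only approximately enforces the constraint.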
Differentiating among penal states.
Lacey, Nicola
2010-12-01
This review article assesses Loïc Wacquant's contribution to debates on penality, focusing on his most recent book, Punishing the Poor: The Neoliberal Government of Social Insecurity (Wacquant 2009), while setting its argument in the context of his earlier Prisons of Poverty (1999). In particular, it draws on both historical and comparative methods to question whether Wacquant's conception of 'the penal state' is adequately differentiated for the purposes of building the explanatory account he proposes; whether 'neo-liberalism' has, materially, the global influence which he ascribes to it; and whether, therefore, the process of penal Americanization which he asserts in his recent writings is credible.
NASA Astrophysics Data System (ADS)
Kasimov, Nurlybek; Brown-Dymkoski, Eric; Vasilyev, Oleg V.
2015-11-01
A novel volume penalization method to enforce immersed boundary conditions in the Navier-Stokes and Euler equations is presented. Previously, Brinkman penalization has been used to introduce solid obstacles modeled as porous media, although it is limited to Dirichlet-type conditions on velocity and temperature. This method builds upon Brinkman penalization by allowing Neumann conditions to be applied in a general fashion. Correct boundary conditions are achieved through characteristic propagation into the thin layer inside of the obstacle. Inward-pointing characteristics ensure that the nonphysical solution inside the obstacle does not propagate out to the fluid. Dirichlet boundary conditions are enforced similarly to the Brinkman method. The penalization parameters act on a much faster timescale than the characteristic timescale of the flow. The main advantage of the method is a systematic means of error control. This talk focuses on the progress made towards the extension of the method to 3D flows around irregular shapes. This work was supported by ONR MURI on Soil Blast Modeling.
Wu, Hulin; Xue, Hongqi; Kumar, Arun
2012-06-01
Differential equations are extensively used for modeling the dynamics of physical processes in many scientific fields such as engineering, physics, and the biomedical sciences. Parameter estimation for differential equation models is a challenging problem because of the high computational cost and high-dimensional parameter space. In this article, we propose a novel class of methods for estimating parameters in ordinary differential equation (ODE) models, motivated by HIV dynamics modeling. The new methods exploit the form of numerical discretization algorithms for an ODE solver to formulate estimating equations. First, a penalized-spline approach is employed to estimate the state variables, and the estimated state variables are then plugged into a discretization formula of an ODE solver to obtain the ODE parameter estimates via a regression approach. We consider three discretization methods of different order: Euler's method, the trapezoidal rule, and the Runge-Kutta method. A higher-order numerical algorithm reduces the numerical error in the approximation of the derivative, which produces a more accurate estimate, but its computational cost is higher. To balance computational cost and estimation accuracy, we demonstrate, via simulation studies, that the trapezoidal discretization-based estimate is the best and is recommended for practical use. The asymptotic properties of the proposed numerical discretization-based estimators are established. Comparisons between the proposed methods and existing methods show a clear benefit of the proposed methods with regard to the trade-off between computational cost and estimation accuracy. We apply the proposed methods to an HIV study to further illustrate the usefulness of the proposed approaches.
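A toy version of the recommended trapezoidal discretization-based estimator, under simplifying assumptions of my own (a one-dimensional linear decay ODE and a smoothing spline standing in for the penalized spline), might look like:

```python
import numpy as np
from scipy.interpolate import UnivariateSpline

rng = np.random.default_rng(2)
theta_true = 0.5                         # decay rate in x'(t) = -theta * x(t)
t = np.linspace(0.0, 5.0, 50)
x = np.exp(-theta_true * t)              # exact solution with x(0) = 1
y = x + 0.01 * rng.normal(size=t.size)   # noisy observations of the state

# step 1: smoothing step to estimate the state trajectory from the noisy data
xhat = UnivariateSpline(t, y, s=t.size * 0.01**2)(t)

# step 2: trapezoidal discretization of the ODE,
#   x_{i+1} - x_i = (h/2) * [f(x_i) + f(x_{i+1})] = -theta * (h/2) * (x_i + x_{i+1}),
# turns parameter estimation into regression through the origin
h = t[1] - t[0]
dx = np.diff(xhat)
z = -(h / 2.0) * (xhat[:-1] + xhat[1:])
theta_hat = float(np.dot(z, dx) / np.dot(z, z))   # least-squares slope
print(theta_hat)                                  # should be near theta_true = 0.5
```

Swapping the right-hand side of the regression relation for the Euler or Runge-Kutta formula gives the other estimators in the class, with the accuracy/cost trade-off the abstract describes.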
NASA Astrophysics Data System (ADS)
Tauriello, Gerardo; Koumoutsakos, Petros
2015-02-01
We present a comparative study of penalization and phase field methods for the solution of the diffusion equation in complex geometries embedded using simple Cartesian meshes. The two methods have been widely employed to solve partial differential equations in complex and moving geometries for applications ranging from solid and fluid mechanics to biology and geophysics. Their popularity is largely due to their discretization on Cartesian meshes thus avoiding the need to create body-fitted grids. At the same time, there are questions regarding their accuracy and it appears that the use of each one is confined by disciplinary boundaries. Here, we compare penalization and phase field methods to handle problems with Neumann and Robin boundary conditions. We discuss extensions for Dirichlet boundary conditions and in turn compare with methods that have been explicitly designed to handle Dirichlet boundary conditions. The accuracy of all methods is analyzed using one and two dimensional benchmark problems such as the flow induced by an oscillating wall and by a cylinder performing rotary oscillations. This comparative study provides information to decide which methods to consider for a given application and their incorporation in broader computational frameworks. We demonstrate that phase field methods are more accurate than penalization methods on problems with Neumann boundary conditions and we present an error analysis explaining this result.
NASA Astrophysics Data System (ADS)
Vasilyev, Oleg V.; Gazzola, Mattia; Koumoutsakos, Petros
2010-11-01
In this talk we discuss preliminary results for the use of a hybrid wavelet collocation - Brinkman penalization approach to shape optimization for drag reduction in flows past linked bodies. This optimization relies on the Adaptive Wavelet Collocation Method along with the Brinkman penalization technique and the Covariance Matrix Adaptation Evolution Strategy (CMA-ES). The adaptive wavelet collocation method tackles the problem of efficiently resolving a fluid flow on a dynamically adaptive computational grid, while a level set approach is used to describe the body shape, and the Brinkman volume penalization allows for easy variation of the flow geometry without requiring body-fitted meshes. We perform 2D simulations of linked bodies in order to investigate whether flat geometries are optimal for drag reduction. In order to accelerate the expensive cost-function evaluations, we exploit the inherent parallelism of evolution strategies and extend the CMA-ES implementation to a multi-host framework. This framework allows for easy distribution of the cost-function evaluations across several parallel architectures and is not limited to a single computing facility. The resulting optimal shapes are geometrically consistent with the shapes obtained in the pioneering wind tunnel experiments for drag reduction using Evolution Strategies by Ingo Rechenberg.
A method for nonlinear exponential regression analysis
NASA Technical Reports Server (NTRS)
Junkin, B. G.
1971-01-01
A computer-oriented technique is presented for performing a nonlinear exponential regression analysis on decay-type experimental data. The technique involves the least squares procedure wherein the nonlinear problem is linearized by expansion in a Taylor series. A linear curve fitting procedure for determining the initial nominal estimates for the unknown exponential model parameters is included as an integral part of the technique. A correction matrix was derived and then applied to the nominal estimate to produce an improved set of model parameters. The solution cycle is repeated until some predetermined criterion is satisfied.
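The procedure described (a nominal estimate from a linearized fit, followed by repeated Taylor-series linearization and correction until a convergence criterion is met) is essentially a Gauss-Newton iteration. A sketch for the decay model y = a·exp(b·t), with invented data:

```python
import numpy as np

rng = np.random.default_rng(3)
t = np.linspace(0.0, 4.0, 40)
a_true, b_true = 2.0, -0.7
y = a_true * np.exp(b_true * t) + 0.01 * rng.normal(size=t.size)   # decay-type data

# nominal estimates from a linear curve fit: log y ~ log a + b t (valid while y > 0)
mask = y > 0
coef = np.polyfit(t[mask], np.log(y[mask]), 1)
a, b = np.exp(coef[1]), coef[0]

# Gauss-Newton: expand a*exp(b*t) in a Taylor series about (a, b) and solve for a correction
for _ in range(20):
    f = a * np.exp(b * t)
    J = np.column_stack([np.exp(b * t), a * t * np.exp(b * t)])    # d f / d(a, b)
    delta, *_ = np.linalg.lstsq(J, y - f, rcond=None)              # correction step
    a, b = a + delta[0], b + delta[1]
    if np.linalg.norm(delta) < 1e-10:                              # predetermined criterion
        break
print(a, b)
```

The lstsq step plays the role of the correction matrix applied to the nominal estimate; each pass through the loop is one solution cycle.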
Relationship between Multiple Regression and Selected Multivariable Methods.
ERIC Educational Resources Information Center
Schumacker, Randall E.
The relationship of multiple linear regression to various multivariate statistical techniques is discussed. The importance of the standardized partial regression coefficient (beta weight) in multiple linear regression as it is applied in path, factor, LISREL, and discriminant analyses is emphasized. The multivariate methods discussed in this paper…
Morales, Jorge A.; Leroy, Matthieu; Bos, Wouter J.T.; Schneider, Kai
2014-10-01
A volume penalization approach to simulate magnetohydrodynamic (MHD) flows in confined domains is presented. Here the incompressible visco-resistive MHD equations are solved using parallel pseudo-spectral solvers in Cartesian geometries. The volume penalization technique is an immersed boundary method characterized by high flexibility with respect to the geometry of the considered flow. In the present case, it allows the use of boundary conditions other than periodic in a Fourier pseudo-spectral approach. The numerical method is validated and its convergence is assessed for two- and three-dimensional hydrodynamic (HD) and MHD flows, by comparing the numerical results with results from the literature and analytical solutions. The test cases considered are two-dimensional Taylor–Couette flow, the z-pinch configuration, three-dimensional Orszag–Tang flow, Ohmic decay in a periodic cylinder, three-dimensional Taylor–Couette flow with and without an axial magnetic field, and three-dimensional Hartmann instabilities in a cylinder with an imposed helical magnetic field. Finally, we present a magnetohydrodynamic flow simulation in a toroidal geometry with non-symmetric cross section and an imposed helical magnetic field to illustrate the potential of the method.
The Precision Efficacy Analysis for Regression Sample Size Method.
ERIC Educational Resources Information Center
Brooks, Gordon P.; Barcikowski, Robert S.
The general purpose of this study was to examine the efficiency of the Precision Efficacy Analysis for Regression (PEAR) method for choosing appropriate sample sizes in regression studies used for precision. The PEAR method, which is based on the algebraic manipulation of an accepted cross-validity formula, essentially uses an effect size to…
Birthweight Related Factors in Northwestern Iran: Using Quantile Regression Method
Fallah, Ramazan; Kazemnejad, Anoshirvan; Zayeri, Farid; Shoghli, Alireza
2016-01-01
Introduction: Birthweight is one of the most important predictors of health status in adulthood. Ensuring a balanced birthweight is one of the priorities of the health system in most industrialized and developed countries. This indicator is used to assess the growth and health status of infants. The aim of this study was to assess the birthweight of neonates using quantile regression in Zanjan province. Methods: This descriptive, analytical study was carried out using pre-registered (March 2010 - March 2012) data on neonates in urban/rural health centers of Zanjan province, obtained with multiple-stage cluster sampling. Data were analyzed using multiple linear regression and the quantile regression method in SAS 9.2 statistical software. Results: Of the 8456 newborns, 4146 (49%) were female. The mean age of the mothers was 27.1±5.4 years. The mean birthweight of the neonates was 3104 ± 431 grams. Five hundred seventy-three (6.8%) of the neonates weighed less than 2500 grams. In all quantiles, gestational age of the neonates (p<0.05) and weight and educational level of the mothers (p<0.05) showed a significant linear relationship with the birthweight of the neonates. However, sex and birth rank of the neonates, mothers' age, place of residence (urban/rural), and career were not significant in all quantiles (p>0.05). Conclusion: This study revealed that the results of multiple linear regression and quantile regression were not identical. We strongly recommend the use of quantile regression when the response variable is asymmetric or the data contain outliers. PMID:26925889
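Quantile regression fits a line to a chosen conditional quantile by minimizing the asymmetric "pinball" loss instead of squared error, which is why its slopes can differ across quantiles when the response is heteroscedastic. The study used SAS; the sketch below is an independent Python illustration on simulated, hypothetical birthweight-like data.

```python
import numpy as np
from scipy.optimize import minimize

rng = np.random.default_rng(4)
n = 2000
gest_age = rng.uniform(34.0, 42.0, n)               # gestational age in weeks (hypothetical)
# heteroscedastic response: the spread of birthweight grows with gestational age
weight = 3100 + 120 * (gest_age - 38) + rng.normal(0, 30 * (gest_age - 30), n)

def pinball(res, q):
    # asymmetric check loss: overshoots weighted q, undershoots weighted (1 - q)
    return np.where(res >= 0, q * res, (q - 1) * res).sum()

def quantile_slope(x, y, q):
    obj = lambda beta: pinball(y - beta[0] - beta[1] * x, q)
    start = np.polyfit(x, y, 1)[::-1]               # OLS start: [intercept, slope]
    fit = minimize(obj, start, method="Nelder-Mead",
                   options={"xatol": 1e-8, "fatol": 1e-8, "maxiter": 10000})
    return fit.x[1]

slopes = {q: quantile_slope(gest_age, weight, q) for q in (0.1, 0.5, 0.9)}
print(slopes)   # the slope increases with the quantile under this heteroscedasticity
```

An OLS fit would report only one slope (roughly the median one here), hiding exactly the quantile-to-quantile differences the study's conclusion points to.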
Analysis of regression methods for solar activity forecasting
NASA Technical Reports Server (NTRS)
Lundquist, C. A.; Vaughan, W. W.
1979-01-01
The paper deals with the potential use of the most recent solar data to project trends in the next few years. Assuming that a mode of solar influence on weather can be identified, advantageous use of that knowledge presumably depends on estimating future solar activity. A frequently used technique for solar cycle predictions is a linear regression procedure along the lines formulated by McNish and Lincoln (1949). The paper presents a sensitivity analysis of the behavior of such regression methods relative to the following aspects: cycle minimum, time into cycle, composition of historical data base, and unnormalized vs. normalized solar cycle data. Comparative solar cycle forecasts for several past cycles are presented as to these aspects of the input data. Implications for the current cycle, No. 21, are also given.
Cathodic protection design using the regression and correlation method
Niembro, A.M.; Ortiz, E.L.G.
1997-09-01
A computerized statistical method which calculates the current demand requirement based on potential measurements for cathodic protection systems is introduced. The method uses the regression and correlation analysis of statistical measurements of current and potentials of the piping network. This approach involves four steps: field potential measurements, statistical determination of the current required to achieve full protection, installation of more cathodic protection capacity with distributed anodes around the plant and examination of the protection potentials. The procedure is described and recommendations for the improvement of the existing and new cathodic protection systems are given.
Regularized discriminative spectral regression method for heterogeneous face matching.
Huang, Xiangsheng; Lei, Zhen; Fan, Mingyu; Wang, Xiao; Li, Stan Z
2013-01-01
Face recognition is confronted with situations in which face images are captured in various modalities, such as the visual modality, the near infrared modality, and the sketch modality. This is known as heterogeneous face recognition. To solve this problem, we propose a new method called discriminative spectral regression (DSR). The DSR maps heterogeneous face images into a common discriminative subspace in which robust classification can be achieved. In the proposed method, the subspace learning problem is transformed into a least squares problem. Different mappings should map heterogeneous images from the same class close to each other, while images from different classes should be separated as far as possible. To realize this, we introduce two novel regularization terms, which reflect the category relationships among data, into the least squares approach. Experiments conducted on two heterogeneous face databases validate the superiority of the proposed method over the previous methods.
A locally adaptive kernel regression method for facies delineation
NASA Astrophysics Data System (ADS)
Fernàndez-Garcia, D.; Barahona-Palomo, M.; Henri, C. V.; Sanchez-Vila, X.
2015-12-01
Facies delineation is defined as the separation of geological units with distinct intrinsic characteristics (grain size, hydraulic conductivity, mineralogical composition). A major challenge in this area stems from the fact that only a few scattered pieces of hydrogeological information are available to delineate geological facies. Several methods to delineate facies are available in the literature, ranging from those based only on existing hard data, to those including secondary data or external knowledge about sedimentological patterns. This paper describes a methodology to use kernel regression methods as an effective tool for facies delineation. The method uses both the spatial locations and the actual sampled values to produce, for each individual hard data point, a locally adaptive steering kernel function, self-adjusting the principal directions of the local anisotropic kernels to the direction of highest local spatial correlation. The method is shown to outperform the nearest neighbor classification method in a number of synthetic aquifers whenever the available number of hard data is small and randomly distributed in space. In the case of exhaustive sampling, the steering kernel regression method converges to the true solution. Simulations run on a suite of synthetic examples are used to explore the selection of kernel parameters in typical field settings. It is shown that, in practice, a rule of thumb can be used to obtain suboptimal results. The performance of the method is demonstrated to improve significantly when external information regarding facies proportions is incorporated. Remarkably, the method allows for a reasonable reconstruction of the facies connectivity patterns, shown in terms of breakthrough curves performance.
Mapping urban environmental noise: a land use regression method.
Xie, Dan; Liu, Yi; Chen, Jining
2011-09-01
Forecasting and preventing urban noise pollution are major challenges in urban environmental management. Most existing efforts, including experiment-based models, statistical models, and noise mapping, however, have limited capacity to explain the association between urban growth and corresponding noise change. Therefore, these conventional methods can hardly forecast urban noise at a given outlook of development layout. This paper, for the first time, introduces a land use regression method, which has been applied to simulating urban air quality for a decade, to construct an urban noise model (LUNOS) in Dalian Municipality, Northwest China. The LUNOS model describes noise as a dependent variable of various surrounding land areas via a regressive function. The results suggest that a linear model performs better in fitting monitoring data, and there is no significant difference in the LUNOS model's outputs when applied at different spatial scales. As the LUNOS model facilitates a better understanding of the association between land use and urban environmental noise in comparison to conventional methods, it can be regarded as a promising tool for noise prediction for planning purposes and an aid to smart decision-making.
NASA Astrophysics Data System (ADS)
Chatelin, Robin; Poncet, Philippe
2014-07-01
Particle methods are very convenient for computing transport equations in fluid mechanics, as their computational cost is linear and they are not limited by convection stability conditions. To achieve large 3D computations, the method must be coupled to efficient algorithms for velocity computations, including a good treatment of non-homogeneities and complex moving geometries. The penalization method enables moving-body interactions to be considered by adding a term to the conservation of momentum equation. This work introduces a new computational algorithm to solve the penalization term and the Laplace operators implicitly in the same step, since explicit computations are limited by stability issues, especially at low Reynolds number. The algorithm is based on the Sherman-Morrison-Woodbury formula coupled to a GMRES iterative method, reducing the computations to a sequence of Poisson problems: this allows a penalized Poisson equation to be formulated as a large perturbation of a standard Poisson problem, by means of algebraic relations. A direct consequence is the possibility of using fast solvers based on Fast Fourier Transforms for this problem, with good efficiency from both the computational and the memory-consumption points of view, since these solvers are recursive and do not perform any matrix assembly. The resulting fluid mechanics computations are very fast and consume a small amount of memory compared to a reference solver or a linear system resolution. The present applications focus mainly on coupling the transport equation with the 3D Stokes equations, for studying the motion of biological organisms in highly viscous flows with variable viscosity.
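The Sherman-Morrison-Woodbury identity at the heart of this algorithm lets one solve a low-rank perturbation of an operator using only solves against the unperturbed operator. A stand-alone numerical sketch, with a diagonal matrix standing in for the FFT-invertible operator and all sizes invented:

```python
import numpy as np

rng = np.random.default_rng(7)
N, k = 200, 5                        # N unknowns, rank-k perturbation (penalized region)
d = rng.uniform(1.0, 2.0, N)         # diagonal stand-in for a fast (e.g. FFT-based) solver A
U = rng.normal(size=(N, k))
C = np.eye(k)
V = U.T
b = rng.normal(size=N)

# Sherman-Morrison-Woodbury:
#   (A + U C V)^{-1} b = A^{-1} b - A^{-1} U (C^{-1} + V A^{-1} U)^{-1} V A^{-1} b
# so only solves against A and one small k x k system are needed.
AinvU = U / d[:, None]               # k solves against A
y0 = b / d                           # one solve against A
S = np.linalg.inv(C) + V @ AinvU     # k x k "capacitance" matrix
x = y0 - AinvU @ np.linalg.solve(S, V @ y0)

# verify against the dense perturbed system
assert np.allclose((np.diag(d) + U @ C @ V) @ x, b)
```

The paper's algorithm wraps this idea in GMRES so that each outer iteration costs only fast Poisson solves; the sketch shows just the algebraic identity that makes that possible.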
Wilcox, Rand R
2013-04-01
It is well known that the ordinary least squares (OLS) regression estimator is not robust. Many robust regression estimators have been proposed and inferential methods based on these estimators have been derived. However, for two independent groups, let θj(X) be some conditional measure of location for the jth group, given X, based on some robust regression estimator. An issue that has not been addressed is computing a 1 - α confidence interval for θ1(X) - θ2(X) in a manner that allows both within-group and between-group heteroscedasticity. The paper reports the finite sample properties of a simple method for accomplishing this goal. Simulations indicate that, in terms of controlling the probability of a Type I error, the method performs very well for a wide range of situations, even with a relatively small sample size. In principle, any robust regression estimator can be used. The simulations are focused primarily on the Theil-Sen estimator, but some results using Yohai's MM-estimator, as well as the Koenker and Bassett quantile regression estimator, are noted. Data from the Well Elderly II study, dealing with measures of meaningful activity using the cortisol awakening response as a covariate, are used to illustrate that the choice between an extant method based on a nonparametric regression estimator, and the method suggested here, can make a practical difference.
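One way to implement an interval of this kind, sketched here with SciPy's Theil-Sen estimator and a within-group percentile bootstrap: resampling separately inside each group accommodates between-group heteroscedasticity. The data, sample sizes, and evaluation point x0 are invented; this is an illustration of the idea, not the paper's exact procedure.

```python
import numpy as np
from scipy.stats import theilslopes

rng = np.random.default_rng(5)
n = 80
x1, x2 = rng.uniform(0, 10, n), rng.uniform(0, 10, n)
y1 = 1.0 + 0.5 * x1 + rng.normal(0, 1.0, n)      # group 1
y2 = 2.0 + 0.5 * x2 + rng.normal(0, 2.0, n)      # group 2: same slope, more spread

def theta(x, y, x0):
    s, b, *_ = theilslopes(y, x)                 # robust Theil-Sen fit
    return b + s * x0                            # conditional location at X = x0

x0 = 5.0
boot = []
for _ in range(500):                             # percentile bootstrap, within each group
    i, j = rng.integers(0, n, n), rng.integers(0, n, n)
    boot.append(theta(x1[i], y1[i], x0) - theta(x2[j], y2[j], x0))
lo, hi = np.percentile(boot, [2.5, 97.5])
print(lo, hi)    # interval for theta1(x0) - theta2(x0); the true difference here is -1
```

Any robust regression estimator could replace `theilslopes` inside `theta` without changing the bootstrap scaffolding.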
van Houwelingen, Hans C; Putter, Hein
2015-04-01
By far the most popular model to obtain survival predictions for individual patients is the Cox model. The Cox model does not make any assumptions on the underlying hazard, but it relies heavily on the proportional hazards assumption. The most common ways to circumvent this robustness problem are 1) to categorize patients based on their prognostic risk score and to base predictions on Kaplan-Meier curves for the risk categories, or 2) to include interactions with the covariates and suitable functions of time. Robust estimators of the t(0)-year survival probabilities can also be obtained from a "stopped Cox" regression model, in which all observations are administratively censored at t(0). Other recent approaches to solve this robustness problem, originally proposed in the context of competing risks, are pseudo-values and direct binomial regression, based on unbiased estimating equations. In this paper stopped Cox regression is compared with these direct approaches. This is done by means of a simulation study to assess the biases of the different approaches and an analysis of breast cancer data to get some feeling for the performance in practice. The tentative conclusion is that stopped Cox and direct models agree well if the follow-up is not too long. There are larger differences for long-term follow-up data, where stopped Cox might be more efficient but less robust.
Liu, Xiang; Peng, Yingwei; Tu, Dongsheng; Liang, Hua
2012-10-30
Survival data with a sizable cure fraction are commonly encountered in cancer research. The semiparametric proportional hazards cure model has been recently used to analyze such data. As seen in the analysis of data from a breast cancer study, a variable selection approach is needed to identify important factors in predicting the cure status and risk of breast cancer recurrence. However, no specific variable selection method for the cure model is available. In this paper, we present a variable selection approach with penalized likelihood for the cure model. The estimation can be implemented easily by combining the computational methods for penalized logistic regression and the penalized Cox proportional hazards models with the expectation-maximization algorithm. We illustrate the proposed approach on data from a breast cancer study. We conducted Monte Carlo simulations to evaluate the performance of the proposed method. We used and compared different penalty functions in the simulation studies.
Identifying gene-environment and gene-gene interactions using a progressive penalization approach.
Zhu, Ruoqing; Zhao, Hongyu; Ma, Shuangge
2014-05-01
In genomic studies, identifying important gene-environment and gene-gene interactions is a challenging problem. In this study, we adopt the statistical modeling approach, where interactions are represented by product terms in regression models. For the identification of important interactions, we adopt penalization, which has been used in many genomic studies. Straightforward application of penalization does not respect the "main effect, interaction" hierarchical structure. A few recently proposed methods respect this structure by applying constrained penalization. However, they demand very complicated computational algorithms and can only accommodate a small number of genomic measurements. We propose a computationally fast penalization method that can identify important gene-environment and gene-gene interactions and respect a strong hierarchical structure. The method takes a stagewise approach and progressively expands its optimization domain to account for possible hierarchical interactions. It is applicable to multiple data types and models. A coordinate descent method is utilized to produce the entire regularized solution path. Simulation study demonstrates the superior performance of the proposed method. We analyze a lung cancer prognosis study with gene expression measurements and identify important gene-environment interactions.
Stochastic Approximation Methods for Latent Regression Item Response Models
ERIC Educational Resources Information Center
von Davier, Matthias; Sinharay, Sandip
2010-01-01
This article presents an application of a stochastic approximation expectation maximization (EM) algorithm using a Metropolis-Hastings (MH) sampler to estimate the parameters of an item response latent regression model. Latent regression item response models are extensions of item response theory (IRT) to a latent variable model with covariates…
A novel generalized ridge regression method for quantitative genetics.
Shen, Xia; Alam, Moudud; Fikse, Freddy; Rönnegård, Lars
2013-04-01
As the molecular marker density grows, there is a strong need in both genome-wide association studies and genomic selection to fit models with a large number of parameters. Here we present a computationally efficient generalized ridge regression (RR) algorithm for situations in which the number of parameters largely exceeds the number of observations. The computationally demanding parts of the method depend mainly on the number of observations and not the number of parameters. The algorithm was implemented in the R package bigRR based on the previously developed package hglm. Using such an approach, a heteroscedastic effects model (HEM) was also developed, implemented, and tested. The efficiency for different data sizes was evaluated via simulation. The method was tested for a bacteria-hypersensitive trait in a publicly available Arabidopsis data set including 84 inbred lines and 216,130 SNPs. The computation of all the SNP effects required <10 sec using a single 2.7-GHz core. The advantage in run time makes permutation testing feasible for such a whole-genome model, so that a genome-wide significance threshold can be obtained. HEM was found to be more robust than ordinary RR (a.k.a. SNP-best linear unbiased prediction) in terms of QTL mapping, because SNP-specific shrinkage was applied instead of a common shrinkage. The proposed algorithm was also assessed for genomic evaluation and was shown to give better predictions than ordinary RR.
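The claim that the demanding computations depend on the number of observations rather than the number of parameters follows from the dual (matrix-identity) form of ridge regression. A sketch independent of the bigRR implementation, with simulated genotypes:

```python
import numpy as np

rng = np.random.default_rng(6)
n, p, lam = 100, 20000, 50.0          # many more markers than observations
X = rng.choice([0.0, 1.0, 2.0], size=(n, p))   # genotype matrix, coded 0/1/2
beta = np.zeros(p)
beta[:5] = 0.3                        # a few true SNP effects
y = X @ beta + rng.normal(0.0, 1.0, n)

# primal ridge would solve a p x p system (X'X + lam I) beta = X'y.
# The dual identity
#   beta_hat = X' (X X' + lam I)^{-1} y
# only ever factorizes an n x n matrix, so the cost scales with n, not p.
K = X @ X.T                           # n x n, cheap relative to p x p
alpha = np.linalg.solve(K + lam * np.eye(n), y)
beta_hat = X.T @ alpha                # all 20,000 SNP effects at once
print(beta_hat[:5])                   # shrunken estimates of the nonzero effects
```

Because each whole-genome fit is this cheap, refitting under thousands of phenotype permutations (to get a genome-wide significance threshold, as the abstract notes) becomes practical.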
NASA Astrophysics Data System (ADS)
Haddad, Khaled; Rahman, Ataur
2012-04-01
In this article, an approach using Bayesian Generalised Least Squares (BGLS) regression in a region-of-influence (ROI) framework is proposed for regional flood frequency analysis (RFFA) for ungauged catchments. Using the data from 399 catchments in eastern Australia, the BGLS-ROI is constructed to regionalise the flood quantiles (Quantile Regression Technique (QRT)) and the first three moments of the log-Pearson type 3 (LP3) distribution (Parameter Regression Technique (PRT)). This scheme firstly develops a fixed region model to select the best set of predictor variables for use in the subsequent regression analyses using an approach that minimises the model error variance while also satisfying a number of statistical selection criteria. The identified optimal regression equation is then used in the ROI experiment where the ROI is chosen for a site in question as the region that minimises the predictive uncertainty. To evaluate the overall performances of the quantiles estimated by the QRT and PRT, a one-at-a-time cross-validation procedure is applied. Results of the proposed method indicate that both the QRT and PRT in a BGLS-ROI framework lead to more accurate and reliable estimates of flood quantiles and moments of the LP3 distribution when compared to a fixed region approach. Also the BGLS-ROI can deal reasonably well with the heterogeneity in Australian catchments as evidenced by the regression diagnostics. Based on the evaluation statistics it was found that both BGLS-QRT and PRT-ROI perform similarly well, which suggests that the PRT is a viable alternative to QRT in RFFA. The RFFA methods developed in this paper are based on the database available in eastern Australia. It is expected that availability of a more comprehensive database (in terms of both quality and quantity) will further improve the predictive performance of both the fixed and ROI based RFFA methods presented in this study, which however needs to be investigated in future when such a…
Kwak, Il-Youp; Moore, Candace R; Spalding, Edgar P; Broman, Karl W
2014-08-01
Most statistical methods for quantitative trait loci (QTL) mapping focus on a single phenotype. However, multiple phenotypes are commonly measured, and recent technological advances have greatly simplified the automated acquisition of numerous phenotypes, including function-valued phenotypes, such as growth measured over time. While methods exist for QTL mapping with function-valued phenotypes, they are generally computationally intensive and focus on single-QTL models. We propose two simple, fast methods that maintain high power and precision and are amenable to extensions with multiple-QTL models using a penalized likelihood approach. After identifying multiple QTL by these approaches, we can view the function-valued QTL effects to provide a deeper understanding of the underlying processes. Our methods have been implemented as a package for R, funqtl.
Linear Regression in High Dimension and/or for Correlated Inputs
NASA Astrophysics Data System (ADS)
Jacques, J.; Fraix-Burnet, D.
2014-12-01
Ordinary least squares is the common way to estimate linear regression models. When inputs are correlated or when they are too numerous, regression methods using derived input directions or shrinkage methods can be efficient alternatives. Methods using derived input directions build new uncorrelated variables as linear combinations of the initial inputs, whereas shrinkage methods introduce regularization and variable selection by penalizing the usual least squares criterion. Both kinds of methods are presented and illustrated using the R software on an astronomical dataset.
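The two families the abstract contrasts can be sketched side by side. This is an illustrative example with simulated correlated inputs (not the astronomical dataset from the paper): principal components regression as the derived-inputs route, and the lasso as the penalized, variable-selecting route:

```python
import numpy as np
from sklearn.decomposition import PCA
from sklearn.linear_model import Lasso, LinearRegression
from sklearn.pipeline import make_pipeline

rng = np.random.default_rng(1)
n, p = 100, 40
# strongly correlated inputs: a few latent factors plus small noise
Z = rng.standard_normal((n, 5))
X = Z @ rng.standard_normal((5, p)) + 0.1 * rng.standard_normal((n, p))
y = Z[:, 0] + 0.1 * rng.standard_normal(n)

# derived-inputs route: regress on a few uncorrelated principal components
pcr = make_pipeline(PCA(n_components=5), LinearRegression()).fit(X, y)

# shrinkage route: penalize the least-squares criterion (L1 penalty here)
lasso = Lasso(alpha=0.05).fit(X, y)

print(round(pcr.score(X, y), 2), round(lasso.score(X, y), 2))
```

Both handle the collinearity that would destabilize ordinary least squares, but only the lasso additionally zeroes out coefficients, performing variable selection.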
Hypothesis Testing Using Factor Score Regression: A Comparison of Four Methods
ERIC Educational Resources Information Center
Devlieger, Ines; Mayer, Axel; Rosseel, Yves
2016-01-01
In this article, an overview is given of four methods to perform factor score regression (FSR), namely regression FSR, Bartlett FSR, the bias avoiding method of Skrondal and Laake, and the bias correcting method of Croon. The bias correcting method is extended to include a reliable standard error. The four methods are compared with each other and…
The Robustness of Regression and Substitution by Mean Methods in Handling Missing Values.
ERIC Educational Resources Information Center
Kaiser, Javaid
There are times in survey research when missing values need to be estimated. The robustness of four variations of regression and substitution by mean methods was examined using a 3x3x4 factorial design. The regression variations included in the study were: (1) regression using a single best predictor; (2) two best predictors; (3) all available…
L1-Penalized N-way PLS for subset of electrodes selection in BCI experiments
NASA Astrophysics Data System (ADS)
Eliseyev, Andrey; Moro, Cecile; Faber, Jean; Wyss, Alexander; Torres, Napoleon; Mestais, Corinne; Benabid, Alim Louis; Aksenova, Tetiana
2012-08-01
Recently, the N-way partial least squares (NPLS) approach was reported as an effective tool for neuronal signal decoding and brain-computer interface (BCI) system calibration. This method simultaneously analyzes data in several domains. It combines the projection of a data tensor to a low dimensional space with linear regression. In this paper the L1-Penalized NPLS is proposed for sparse BCI system calibration, allowing the projection technique to be united with effective selection of a subset of features. The L1-Penalized NPLS was applied to binary self-paced BCI system calibration, providing selection of a subset of electrodes. Our BCI system is designed for animal research, in particular for research in non-human primates.
Risk prediction with machine learning and regression methods.
Steyerberg, Ewout W; van der Ploeg, Tjeerd; Van Calster, Ben
2014-07-01
This is a discussion of issues in risk prediction based on the following papers: "Probability estimation with machine learning methods for dichotomous and multicategory outcome: Theory" by Jochen Kruppa, Yufeng Liu, Gérard Biau, Michael Kohler, Inke R. König, James D. Malley, and Andreas Ziegler; and "Probability estimation with machine learning methods for dichotomous and multicategory outcome: Applications" by Jochen Kruppa, Yufeng Liu, Hans-Christian Diener, Theresa Holste, Christian Weimar, Inke R. König, and Andreas Ziegler.
Gaussian Process Regression Plus Method for Localization Reliability Improvement.
Liu, Kehan; Meng, Zhaopeng; Own, Chung-Ming
2016-07-29
Location data are among the most widely used context data in context-aware and ubiquitous computing applications. Many systems with distinct deployment costs and positioning accuracies have been developed over the past decade for indoor positioning. The most useful method is focused on the received signal strength and provides a set of signal transmission access points. However, manually compiling a Received Signal Strength (RSS) fingerprint database involves high costs and thus is impractical in an online prediction environment. The system used in this study relied on the Gaussian process method, which is a nonparametric model that can be characterized completely by using the mean function and the covariance matrix. In addition, the Naive Bayes method was used to verify and simplify the computation of precise predictions. The authors conducted several experiments on simulated and real environments at Tianjin University. The experiments examined distinct data size, different kernels, and accuracy. The results showed that the proposed method not only can retain positioning accuracy but also can save computation time in location predictions.
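A Gaussian process, as the abstract notes, is specified entirely by a mean function and a covariance (kernel) function. The sketch below is a minimal, hypothetical RSS example (a 1-D corridor and a log-distance signal model, both invented for illustration), not the authors' system:

```python
import numpy as np
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import RBF, WhiteKernel

rng = np.random.default_rng(2)
# hypothetical 1-D corridor: positions (m) and RSS (dBm) from one access point,
# generated from a standard log-distance path-loss model plus noise
pos = rng.uniform(0, 30, size=(40, 1))
rss = -40 - 20 * np.log10(pos[:, 0] + 1) + rng.normal(0, 1, 40)

# GP defined by a zero mean (after normalization) and an RBF + noise covariance
gp = GaussianProcessRegressor(kernel=RBF(10.0) + WhiteKernel(1.0), normalize_y=True)
gp.fit(pos, rss)

grid = np.linspace(0, 30, 100).reshape(-1, 1)
mean, sd = gp.predict(grid, return_std=True)   # prediction with uncertainty band
print(mean.shape, sd.shape)
```

The predictive standard deviation is what makes GP regression attractive for localization reliability: the model reports where its own RSS predictions are trustworthy.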
Interaction Models for Functional Regression
USSET, JOSEPH; STAICU, ANA-MARIA; MAITY, ARNAB
2015-01-01
A functional regression model with a scalar response and multiple functional predictors is proposed that accommodates two-way interactions in addition to their main effects. The proposed estimation procedure models the main effects using penalized regression splines, and the interaction effect by a tensor product basis. Extensions to generalized linear models and data observed on sparse grids or with measurement error are presented. A hypothesis testing procedure for the functional interaction effect is described. The proposed method can be easily implemented through existing software. Numerical studies show that fitting an additive model in the presence of interaction leads to both poor estimation performance and lost prediction power, while fitting an interaction model where there is in fact no interaction leads to negligible losses. The methodology is illustrated on the AneuRisk65 study data. PMID:26744549
Mikhal, Julia; Geurts, Bernard J
2013-12-01
A volume-penalizing immersed boundary method is presented for the simulation of laminar incompressible flow inside geometrically complex blood vessels in the human brain. We concentrate on cerebral aneurysms and compute flow in curved brain vessels with and without spherical aneurysm cavities attached. We approximate blood as an incompressible Newtonian fluid and simulate the flow with the use of a skew-symmetric finite-volume discretization and explicit time-stepping. A key element of the immersed boundary method is the so-called masking function. This is a binary function with which we identify at any location in the domain whether it is 'solid' or 'fluid', making it possible to represent objects immersed in a Cartesian grid. We compare three definitions of the masking function for geometries that are non-aligned with the grid. In each case a 'staircase' representation is used in which a grid cell is either 'solid' or 'fluid'. Reliable findings are obtained with our immersed boundary method, even at fairly coarse meshes with about 16 grid cells across a velocity profile. The validation of the immersed boundary method is provided on the basis of classical Poiseuille flow in a cylindrical pipe. We obtain first order convergence for the velocity and the shear stress, reflecting the fact that in our approach the solid-fluid interface is localized with an accuracy on the order of a grid cell. Simulations for curved vessels and aneurysms are done for different flow regimes, characterized by different values of the Reynolds number (Re). The validation is performed for laminar flow at Re = 250, while the flow in more complex geometries is studied at Re = 100 and Re = 250, as suggested by physiological conditions pertaining to flow of blood in the circle of Willis.
A regularization corrected score method for nonlinear regression models with covariate error.
Zucker, David M; Gorfine, Malka; Li, Yi; Tadesse, Mahlet G; Spiegelman, Donna
2013-03-01
Many regression analyses involve explanatory variables that are measured with error, and failing to account for this error is well known to lead to biased point and interval estimates of the regression coefficients. We present here a new general method for adjusting for covariate error. Our method consists of an approximate version of the Stefanski-Nakamura corrected score approach, using the method of regularization to obtain an approximate solution of the relevant integral equation. We develop the theory in the setting of classical likelihood models; this setting covers, for example, linear regression, nonlinear regression, logistic regression, and Poisson regression. The method is extremely general in terms of the types of measurement error models covered, and is a functional method in the sense of not involving assumptions on the distribution of the true covariate. We discuss the theoretical properties of the method and present simulation results in the logistic regression setting (univariate and multivariate). For illustration, we apply the method to data from the Harvard Nurses' Health Study concerning the relationship between physical activity and breast cancer mortality in the period following a diagnosis of breast cancer.
How to use linear regression and correlation in quantitative method comparison studies.
Twomey, P J; Kroll, M H
2008-04-01
Linear regression methods try to determine the best linear relationship between data points, while correlation coefficients assess the association (as opposed to agreement) between the two methods. Linear regression and correlation play an important part in the interpretation of quantitative method comparison studies. Their major strength is that they are widely known and as a result both are employed in the vast majority of method comparison studies. While previously performed by hand, the availability of statistical packages means that regression analysis is usually performed by software such as MS Excel, with or without the software program Analyze-it, as well as by other packages. Such techniques need to be employed in a way that compares the agreement between the two methods examined and, more importantly, because we are dealing with individual patients, whether the degree of agreement is clinically acceptable. Despite their use for many years, there is a lot of ignorance about the validity as well as the pros and cons of linear regression and correlation techniques. This review article describes the types of linear regression and correlation (parametric and non-parametric methods) and the necessary general and specific requirements. The selection of the type of regression depends on where one has been trained, the tradition of the laboratory and the availability of adequate software.
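The association-versus-agreement distinction can be made concrete with a small simulation (invented concentrations, not data from the review): a method with a proportional and a constant bias still correlates almost perfectly with the reference, but the regression slope and intercept expose the disagreement:

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(3)
true = rng.uniform(50, 150, 60)                       # hypothetical analyte levels
method_a = true + rng.normal(0, 2, 60)                # reference method
method_b = 1.05 * true - 3 + rng.normal(0, 2, 60)     # 5% proportional + constant bias

# correlation measures association only: r is near 1 despite the bias
r, _ = stats.pearsonr(method_a, method_b)

# regression measures agreement: slope != 1 and intercept != 0 reveal the bias
res = stats.linregress(method_a, method_b)
print(round(r, 3), round(res.slope, 2), round(res.intercept, 1))
```

A high r alone would wrongly suggest the methods are interchangeable; only the regression parameters (and their clinical acceptability) answer the agreement question.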
Broadband mode in proton-precession magnetometers with signal processing regression methods
NASA Astrophysics Data System (ADS)
Denisov, Alexey Y.; Sapunov, Vladimir A.; Rubinstein, Boris
2014-05-01
The choice of the signal processing method may improve characteristics of the measuring device. We consider the measurement error of signal processing regression methods for a quasi-harmonic signal generated in a frequency selective device. The results are applied to analyze the difference between the simple period meter processing and regression algorithms using measurement cycle signal data in proton-precession magnetometers. Dependences of the measurement error on the sensor quality factor and frequency of nuclear precession are obtained. It is shown that regression methods considerably widen the registration bandwidth and relax the requirements on the magnetometer hardware, and thus affect the optimization criteria of the registration system.
Volume penalization to model falling leaves
NASA Astrophysics Data System (ADS)
Kolomenskiy, Dmitry; Schneider, Kai
2007-11-01
Numerical modeling of solid bodies moving through viscous incompressible fluid is considered. The 2D Navier-Stokes equations, written in the vorticity-streamfunction formulation, are discretized using a Fourier pseudo-spectral scheme with adaptive time-stepping. Solid obstacles of arbitrary shape are taken into account using the volume penalization method. Time-dependent penalization is implemented, making the method capable of solving problems where the obstacle follows an arbitrary motion. Numerical simulations of falling leaves are performed, using the above model supplemented by the discretized ODEs describing the motion of a solid body subjected to external forces and moments. Various regimes of the free fall are explored, depending on the physical parameters and initial conditions. The influence of the Reynolds number on the transition between fluttering and tumbling is investigated, showing the stabilizing effect of viscosity.
Chen, Huaihou; Wang, Yuanjia; Paik, Myunghee Cho; Choi, H Alex
2013-10-01
Multilevel functional data is collected in many biomedical studies. For example, in a study of the effect of Nimodipine on patients with subarachnoid hemorrhage (SAH), patients underwent multiple 4-hour treatment cycles. Within each treatment cycle, subjects' vital signs were reported every 10 minutes. This data has a natural multilevel structure with treatment cycles nested within subjects and measurements nested within cycles. Most literature on nonparametric analysis of such multilevel functional data focuses on conditional approaches using functional mixed effects models. However, parameters obtained from the conditional models do not have direct interpretations as population average effects. When population effects are of interest, we may employ marginal regression models. In this work, we propose marginal approaches to fit multilevel functional data through penalized spline generalized estimating equation (penalized spline GEE). The procedure is effective for modeling multilevel correlated generalized outcomes as well as continuous outcomes without suffering from numerical difficulties. We provide a variance estimator robust to misspecification of correlation structure. We investigate the large sample properties of the penalized spline GEE estimator with multilevel continuous data and show that the asymptotics falls into two categories. In the small knots scenario, the estimated mean function is asymptotically efficient when the true correlation function is used and the asymptotic bias does not depend on the working correlation matrix. In the large knots scenario, both the asymptotic bias and variance depend on the working correlation. We propose a new method to select the smoothing parameter for penalized spline GEE based on an estimate of the asymptotic mean squared error (MSE). We conduct extensive simulation studies to examine property of the proposed estimator under different correlation structures and sensitivity of the variance estimation to the choice…
ERIC Educational Resources Information Center
Cohen, Ayala; Nahum-Shani, Inbal; Doveh, Etti
2010-01-01
In their seminal paper, Edwards and Parry (1993) presented the polynomial regression as a better alternative to applying difference score in the study of congruence. Although this method is increasingly applied in congruence research, its complexity relative to other methods for assessing congruence (e.g., difference score methods) was one of the…
An NCME Instructional Module on Data Mining Methods for Classification and Regression
ERIC Educational Resources Information Center
Sinharay, Sandip
2016-01-01
Data mining methods for classification and regression are becoming increasingly popular in various scientific fields. However, these methods have not been explored much in educational measurement. This module first provides a review, which should be accessible to a wide audience in education measurement, of some of these methods. The module then…
The Bland-Altman Method Should Not Be Used in Regression Cross-Validation Studies
ERIC Educational Resources Information Center
O'Connor, Daniel P.; Mahar, Matthew T.; Laughlin, Mitzi S.; Jackson, Andrew S.
2011-01-01
The purpose of this study was to demonstrate the bias in the Bland-Altman (BA) limits of agreement method when it is used to validate regression models. Data from 1,158 men were used to develop three regression equations to estimate maximum oxygen uptake (R² = 0.40, 0.61, and 0.82, respectively). The equations were evaluated in a…
A comparison of several methods of solving nonlinear regression groundwater flow problems.
Cooley, R.L.
1985-01-01
Computational efficiency and computer memory requirements for four methods of minimizing functions were compared for four test nonlinear-regression steady state groundwater flow problems. The fastest methods were the Marquardt and quasi-linearization methods, which required almost identical computer times and numbers of iterations; the next fastest was the quasi-Newton method, and last was the Fletcher-Reeves method, which did not converge in 100 iterations for two of the problems.-from Author
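The same comparison can be reproduced in miniature with scipy on a toy nonlinear regression problem (an invented exponential-decay model, not the groundwater test problems): a Levenberg-Marquardt solver, a quasi-Newton (BFGS) minimizer, and a nonlinear conjugate-gradient minimizer of the Fletcher-Reeves family:

```python
import numpy as np
from scipy.optimize import least_squares, minimize

rng = np.random.default_rng(4)
x = np.linspace(0, 4, 30)
y = 2.0 * np.exp(-0.7 * x) + rng.normal(0, 0.01, 30)   # synthetic decay data

def resid(p):
    """Nonlinear regression residuals for the model a * exp(b * x)."""
    return p[0] * np.exp(p[1] * x) - y

def sse(p):
    """Sum-of-squares objective for the general-purpose minimizers."""
    return np.sum(resid(p) ** 2)

p0 = [1.0, -0.1]
lm = least_squares(resid, p0, method='lm')    # Marquardt-type (MINPACK)
qn = minimize(sse, p0, method='BFGS')         # quasi-Newton
cg = minimize(sse, p0, method='CG')           # nonlinear conjugate gradient

for fit in (lm.x, qn.x, cg.x):
    print(np.round(fit, 2))
```

On a smooth, well-posed problem all three recover the parameters; the abstract's point is that they differ sharply in iteration counts and memory, which is what matters for large groundwater models.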
A Maximum Likelihood Method for Latent Class Regression Involving a Censored Dependent Variable.
ERIC Educational Resources Information Center
Jedidi, Kamel; And Others
1993-01-01
A method is proposed to simultaneously estimate regression functions and subject membership in "k" latent classes or groups given a censored dependent variable for a cross-section of subjects. Maximum likelihood estimates are obtained using an EM algorithm. The method is illustrated through a consumer psychology application. (SLD)
ERIC Educational Resources Information Center
Young, Forrest W.; And Others
1976-01-01
A method is discussed which extends canonical regression analysis to the situation where the variables may be measured as nominal, ordinal, or interval, and where they may be either continuous or discrete. The method, which is purely descriptive, uses an alternating least squares algorithm and is robust. Examples are provided. (Author/JKS)
Whole-genome regression and prediction methods applied to plant and animal breeding.
de Los Campos, Gustavo; Hickey, John M; Pong-Wong, Ricardo; Daetwyler, Hans D; Calus, Mario P L
2013-02-01
Genomic-enabled prediction is becoming increasingly important in animal and plant breeding and is also receiving attention in human genetics. Deriving accurate predictions of complex traits requires implementing whole-genome regression (WGR) models where phenotypes are regressed on thousands of markers concurrently. Methods exist that allow implementing these large-p with small-n regressions, and genome-enabled selection (GS) is being implemented in several plant and animal breeding programs. The list of available methods is long, and the relationships between them have not been fully addressed. In this article we provide an overview of available methods for implementing parametric WGR models, discuss selected topics that emerge in applications, and present a general discussion of lessons learned from simulation and empirical data analysis in the last decade.
Multiple regression methods show great potential for rare variant association tests.
Xu, ChangJiang; Ladouceur, Martin; Dastani, Zari; Richards, J Brent; Ciampi, Antonio; Greenwood, Celia M T
2012-01-01
The investigation of associations between rare genetic variants and diseases or phenotypes has two goals. Firstly, the identification of which genes or genomic regions are associated, and secondly, discrimination of associated variants from background noise within each region. Over the last few years, many new methods have been developed which associate genomic regions with phenotypes. However, classical methods for high-dimensional data have received little attention. Here we investigate whether several classical statistical methods for high-dimensional data: ridge regression (RR), principal components regression (PCR), partial least squares regression (PLS), a sparse version of PLS (SPLS), and the LASSO are able to detect associations with rare genetic variants. These approaches have been extensively used in statistics to identify the true associations in data sets containing many predictor variables. Using genetic variants identified in three genes that were Sanger sequenced in 1998 individuals, we simulated continuous phenotypes under several different models, and we show that these feature selection and feature extraction methods can substantially outperform several popular methods for rare variant analysis. Furthermore, these approaches can identify which variants are contributing most to the model fit, and therefore both goals of rare variant analysis can be achieved simultaneously with the use of regression regularization methods. These methods are briefly illustrated with an analysis of adiponectin levels and variants in the ADIPOQ gene.
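The dual goal the abstract describes, identifying the associated region and then the contributing variants within it, is what regularized regression delivers in one step. The sketch below is an illustrative simulation (variant frequencies, effect sizes, and the penalty are all invented, and only the lasso is shown, not the paper's full RR/PCR/PLS/SPLS comparison):

```python
import numpy as np
from sklearn.linear_model import Lasso

rng = np.random.default_rng(5)
n, p = 500, 60
maf = rng.uniform(0.02, 0.05, p)                 # low-frequency variant MAFs
G = rng.binomial(2, maf, size=(n, p))            # genotype matrix (0/1/2 copies)
causal = [3, 17, 42]                             # hypothetical causal variants
y = G[:, causal].sum(axis=1) * 2.0 + rng.normal(0, 1, n)

fit = Lasso(alpha=0.03).fit(G, y)
selected = np.flatnonzero(fit.coef_ != 0)
print(sorted(set(causal) & set(selected.tolist())))   # causal variants recovered
```

A nonzero coefficient both flags the region as associated and points at which variants drive the fit, achieving the two goals simultaneously, as the abstract argues.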
Factor Regression Analysis: A New Method for Weighting Predictors. Final Report.
ERIC Educational Resources Information Center
Curtis, Ervin W.
The optimum weighting of variables to predict a dependent-criterion variable is an important problem in nearly all of the social and natural sciences. Although the predominant method, multiple regression analysis (MR), yields optimum weights for the sample at hand, these weights are not generally optimum in the population from which the sample was…
Simulation of Experimental Parameters of RC Beams by Employing the Polynomial Regression Method
NASA Astrophysics Data System (ADS)
Sayin, B.; Sevgen, S.; Samli, R.
2016-07-01
A numerical model based on the polynomial regression method is developed to simulate the mechanical behavior of reinforced concrete beams strengthened with a carbon-fiber-reinforced polymer and subjected to four-point bending. The results obtained are in good agreement with data of laboratory tests.
Three Reasons Why Stepwise Regression Methods Should Not Be Used by Researchers.
ERIC Educational Resources Information Center
Welge, Patricia
L. A. Marascuilo and R. C. Serlin (1988) note that stepwise regression is a method used frequently in social science research. C. Huberty (1989) characterizes such applications as being "common". In support of this latter statement, a review of dissertations by B. Thompson (1988) demonstrated that dissertation students frequently use…
A Simple and Convenient Method of Multiple Linear Regression to Calculate Iodine Molecular Constants
ERIC Educational Resources Information Center
Cooper, Paul D.
2010-01-01
A new procedure using a student-friendly least-squares multiple linear-regression technique utilizing a function within Microsoft Excel is described that enables students to calculate molecular constants from the vibronic spectrum of iodine. This method is advantageous pedagogically as it calculates molecular constants for ground and excited…
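The underlying calculation, regressing vibronic transition energies on (v + 1/2) and (v + 1/2)^2 to extract the harmonic frequency and anharmonicity, can be written in a few lines. This is a generic sketch with made-up constants, not the spreadsheet procedure from the article (which uses Excel's built-in least-squares function):

```python
import numpy as np

# Vibronic model: E(v) = we*(v + 1/2) - wexe*(v + 1/2)^2   (energies in cm^-1)
we_true, wexe_true = 125.0, 0.75          # illustrative excited-state constants
v = np.arange(10, 40)
x = v + 0.5
E = we_true * x - wexe_true * x**2 + np.random.default_rng(6).normal(0, 0.2, v.size)

# Design matrix for multiple linear regression (what LINEST does in Excel)
A = np.column_stack([x, -x**2])
(we_hat, wexe_hat), *_ = np.linalg.lstsq(A, E, rcond=None)
print(round(we_hat, 2), round(wexe_hat, 3))
```

Fitting both terms simultaneously, rather than via a two-step Birge-Sponer plot, is exactly the pedagogical advantage of the multiple-regression approach.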
Double Cross-Validation in Multiple Regression: A Method of Estimating the Stability of Results.
ERIC Educational Resources Information Center
Rowell, R. Kevin
In multiple regression analysis, where resulting predictive equation effectiveness is subject to shrinkage, it is especially important to evaluate result replicability. Double cross-validation is an empirical method by which an estimate of invariance or stability can be obtained from research data. A procedure for double cross-validation is…
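The double cross-validation procedure can be sketched directly: split the sample in half, fit on each half, and score each equation on the opposite half. This is a generic illustration with simulated data, not the specific procedure from the report:

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.metrics import r2_score

rng = np.random.default_rng(7)
n, p = 200, 5
X = rng.standard_normal((n, p))
y = X @ np.array([1.0, 0.5, 0.0, 0.0, 0.3]) + rng.normal(0, 1, n)

# split the sample into two halves
half = n // 2
A = (X[:half], y[:half])
B = (X[half:], y[half:])

# double cross-validation: fit on each half, predict the *other* half
r2_ab = r2_score(B[1], LinearRegression().fit(*A).predict(B[0]))
r2_ba = r2_score(A[1], LinearRegression().fit(*B).predict(A[0]))

# fit-sample R^2 for comparison; cross-validated R^2 estimates the shrinkage
r2_fit = LinearRegression().fit(*A).score(*A)
print(round(r2_fit, 2), round(r2_ab, 2), round(r2_ba, 2))
```

The gap between the fit-sample R^2 and the two cross-half values is the empirical shrinkage estimate that motivates the procedure.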
Comparing regression methods for the two-stage clonal expansion model of carcinogenesis.
Kaiser, J C; Heidenreich, W F
2004-11-15
In the statistical analysis of cohort data with risk estimation models, both Poisson and individual likelihood regressions are widely used methods of parameter estimation. In this paper, their performance has been tested with the biologically motivated two-stage clonal expansion (TSCE) model of carcinogenesis. To exclude inevitable uncertainties of existing data, cohorts with simple individual exposure history have been created by Monte Carlo simulation. To generate some similar properties of atomic bomb survivors and radon-exposed mine workers, both acute and protracted exposure patterns have been generated. Then the capacity of the two regression methods has been compared to retrieve a priori known model parameters from the simulated cohort data. For simple models with smooth hazard functions, the parameter estimates from both methods come close to their true values. However, for models with strongly discontinuous functions which are generated by the cell mutation process of transformation, the Poisson regression method fails to produce reliable estimates. This behaviour is explained by the construction of class averages during data stratification. Thereby, some indispensable information on the individual exposure history was destroyed. It could not be repaired by countermeasures such as the refinement of Poisson classes or a more adequate choice of Poisson groups. Although this choice might still exist we were unable to discover it. In contrast to this, the individual likelihood regression technique was found to work reliably for all considered versions of the TSCE model.
Yu, Hwa-Lung; Wang, Chih-Hsih; Liu, Ming-Che; Kuo, Yi-Ming
2011-06-01
Fine airborne particulate matter (PM2.5) has adverse effects on human health. Assessing the long-term effects of PM2.5 exposure on human health and ecology is often limited by a lack of reliable PM2.5 measurements. In Taipei, PM2.5 levels were not systematically measured until August 2005. Due to the popularity of geographic information systems (GIS), the landuse regression method has been widely used in the spatial estimation of PM concentrations. This method accounts for the potential contributing factors of the local environment, such as traffic volume. Geostatistical methods, on the other hand, account for the spatiotemporal dependence among the observations of ambient pollutants. This study assesses the performance of the landuse regression model for the spatiotemporal estimation of PM2.5 in the Taipei area. Specifically, this study integrates the landuse regression model with the geostatistical approach within the framework of the Bayesian maximum entropy (BME) method. The resulting epistemic framework can assimilate knowledge bases including: (a) empirical-based spatial trends of PM concentration based on landuse regression, (b) the spatio-temporal dependence among PM observation information, and (c) site-specific PM observations. The proposed approach performs the spatiotemporal estimation of PM2.5 levels in the Taipei area (Taiwan) from 2005 to 2007.
An Empirical Likelihood Method for Semiparametric Linear Regression with Right Censored Data
Fang, Kai-Tai; Li, Gang; Lu, Xuyang; Qin, Hong
2013-01-01
This paper develops a new empirical likelihood method for semiparametric linear regression with a completely unknown error distribution and right censored survival data. The method is based on the Buckley-James (1979) estimating equation. It inherits some appealing properties of the complete data empirical likelihood method. For example, it does not require variance estimation which is problematic for the Buckley-James estimator. We also extend our method to incorporate auxiliary information. We compare our method with the synthetic data empirical likelihood of Li and Wang (2003) using simulations. We also illustrate our method using Stanford heart transplantation data. PMID:23573169
Non-Concave Penalized Likelihood with NP-Dimensionality
Fan, Jianqing; Lv, Jinchi
2011-01-01
Penalized likelihood methods are fundamental to ultra-high-dimensional variable selection. How high a dimensionality such methods can handle has remained largely unknown. In this paper, we show that in the context of generalized linear models, such methods possess model selection consistency with oracle properties even for dimensionality of non-polynomial (NP) order in the sample size, for a class of penalized likelihood approaches using folded-concave penalty functions, which were introduced to ameliorate the bias problems of convex penalty functions. This fills a long-standing gap in the literature, in which dimensionality had been allowed to grow only slowly with the sample size. Our results also apply to penalized likelihood with the L1 penalty, a convex function at the boundary of the class of folded-concave penalty functions under consideration. Coordinate optimization is implemented to find the solution paths, whose performance is evaluated by simulation examples and a real data analysis. PMID:22287795
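The folded-concave penalties described above admit a closed-form thresholding rule in the orthonormal-design case, which is the building block of the coordinate optimization the abstract mentions. Below is a minimal sketch of the SCAD thresholding function (Fan and Li's penalty, with its usual a = 3.7); this is an illustration of the technique, not the paper's implementation:

```python
import numpy as np

def scad_threshold(z, lam, a=3.7):
    """SCAD thresholding rule (orthonormal-design solution) for the
    folded-concave SCAD penalty; a = 3.7 is the usual default."""
    z = np.asarray(z, float)
    return np.where(
        np.abs(z) <= 2 * lam,
        np.sign(z) * np.maximum(np.abs(z) - lam, 0.0),       # soft-threshold zone
        np.where(
            np.abs(z) <= a * lam,
            ((a - 1) * z - np.sign(z) * a * lam) / (a - 2),  # transition zone
            z,                                               # large signals left unshrunk
        ),
    )
```

Unlike the L1 (lasso) rule, which shrinks every nonzero coefficient by lam, SCAD returns large inputs unchanged, which is exactly the bias reduction the folded-concave class was introduced for.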
NASA Technical Reports Server (NTRS)
Sidik, S. M.
1975-01-01
Ridge, Marquardt's generalized inverse, shrunken, and principal components estimators are discussed in terms of the objectives of point estimation of parameters, estimation of the predictive regression function, and hypothesis testing. It is found that as the normal equations approach singularity, more consideration must be given to estimable functions of the parameters as opposed to estimation of the full parameter vector; that biased estimators all introduce constraints on the parameter space; that adoption of mean squared error as a criterion of goodness should be independent of the degree of singularity; and that ordinary least-squares subset regression is the best overall method.
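The ridge estimator discussed above has a simple closed form; the sketch below (generic, not from the report) shows how it shrinks the solution as the ridge constant k grows:

```python
import numpy as np

def ridge(X, y, k):
    """Ridge estimator b(k) = (X'X + k I)^-1 X'y; k = 0 reduces to
    ordinary least squares when X'X is nonsingular."""
    p = X.shape[1]
    return np.linalg.solve(X.T @ X + k * np.eye(p), X.T @ y)
```

As the normal equations approach singularity (collinear columns of X), the OLS solution becomes unstable while the ridge solution stays bounded, which is the constraint on the parameter space that the report attributes to all biased estimators.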
Statistical methods for astronomical data with upper limits. II - Correlation and regression
NASA Technical Reports Server (NTRS)
Isobe, T.; Feigelson, E. D.; Nelson, P. I.
1986-01-01
Statistical methods for calculating correlations and regressions in bivariate censored data, where the dependent variable can have upper or lower limits, are presented. Cox's regression and the generalization of Kendall's rank correlation coefficient provide significance levels for correlations, while the EM algorithm, under the assumption of normally distributed errors, and its nonparametric analog using the Kaplan-Meier estimator give estimates of the slope of a regression line. Monte Carlo simulations demonstrate that survival analysis is reliable in determining correlations between luminosities at different bands. Survival analysis is applied to CO emission in infrared galaxies, X-ray emission in radio galaxies, H-alpha emission in cooling cluster cores, and radio emission in Seyfert galaxies.
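The Kaplan-Meier estimator underlying the nonparametric analog can be sketched in a few lines (a generic textbook version, not the authors' code):

```python
import numpy as np

def kaplan_meier(times, observed):
    """Kaplan-Meier survival estimate S(t) at each distinct event time.
    observed[i] is True for an event, False for a censored value."""
    times = np.asarray(times, float)
    observed = np.asarray(observed, bool)
    event_times = np.unique(times[observed])
    s, surv = 1.0, []
    for t in event_times:
        at_risk = np.sum(times >= t)                 # still under observation at t
        deaths = np.sum((times == t) & observed)     # events exactly at t
        s *= 1.0 - deaths / at_risk
        surv.append((t, s))
    return surv
```

For example, with times [1, 2, 3, 4, 5] and events at 1, 2 and 4 (the rest censored), the estimate steps down to 0.8, 0.6 and 0.3 at those times.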
Neural Network and Regression Methods Demonstrated in the Design Optimization of a Subsonic Aircraft
NASA Technical Reports Server (NTRS)
Hopkins, Dale A.; Lavelle, Thomas M.; Patnaik, Surya
2003-01-01
The neural network and regression methods of NASA Glenn Research Center's COMETBOARDS design optimization testbed were used to generate approximate analysis and design models for a subsonic aircraft operating at Mach 0.85 cruise speed. The analytical model is defined by nine design variables: wing aspect ratio, engine thrust, wing area, sweep angle, chord-thickness ratio, turbine temperature, pressure ratio, bypass ratio, and fan pressure; and eight response parameters: weight, landing velocity, takeoff and landing field lengths, approach thrust, overall efficiency, and compressor pressure and temperature. The variables were adjusted to optimally balance the engines to the airframe. The solution strategy included a sensitivity model and the soft analysis model. Researchers generated the sensitivity model by training the approximators to predict an optimum design. The trained neural network predicted all response variables within 5-percent error; this was reduced to 1 percent by the regression method. The soft analysis model was developed to replace aircraft analysis as the reanalyzer in design optimization. Soft models have been generated for a neural network method, a regression method, and a hybrid method obtained by combining the approximators. The performance of the models is graphed for aircraft weight versus thrust as well as for wing area and turbine temperature. The regression method followed the analytical solution with little error. The neural network exhibited 5-percent maximum error over all parameters. Performance of the hybrid method was intermediate between the individual approximators. Error in the response variable is smaller than that shown in the figure because of a distortion scale factor. The overall performance of the approximators was considered satisfactory because aircraft analysis with NASA Langley Research Center's FLOPS (Flight Optimization System) code is a synthesis of diverse disciplines: weight estimation, aerodynamic
[Criminalistic and penal problems with "dyadic deaths"].
Kaliszczak, Paweł; Kunz, Jerzy; Bolechała, Filip
2002-01-01
This paper is a supplement to the article "Medico-legal problems of dyadic death" by the same authors, recalling the cases presented there. It also attempts to present the basic criminalistic, penal and definitional problems of dyadic death, also called post-aggressional suicide. The criminalistic problems of dyadic death were presented in view of the widely known "rule of seven golden questions"--what?, where?, when?, how?, why?, by what method? and who? Criminalistic analysis of the cases leads to somewhat different conclusions, but it seemed interesting to match both the criminalistic and forensic points of view in the presented material.
Phillips, Kirk T.; Street, W. Nick
2005-01-01
The purpose of this study is to determine the best prediction of heart failure outcomes resulting from two methods -- standard epidemiologic analysis with logistic regression and knowledge discovery with supervised learning/data mining. Heart failure was chosen for this study as it exhibits higher prevalence and cost of treatment than most other hospitalized diseases; its prevalence has exceeded 4 million cases in the U.S. Findings of this study should be useful for the design of quality improvement initiatives, as particular aspects of patient comorbidity and treatment are found to be associated with mortality. This is also a proof-of-concept study, considering the feasibility of emerging health informatics methods of data mining in conjunction with, or in lieu of, traditional logistic regression methods of prediction. Findings may also support the design of decision support systems and quality improvement programming for other diseases. PMID:16779367
NASA Astrophysics Data System (ADS)
Zheng, Jun; Shao, Xinyu; Gao, Liang; Jiang, Ping; Qiu, Haobo
2015-06-01
Engineering design, especially for complex engineering systems, is usually a time-consuming process involving computation-intensive computer-based simulation and analysis methods. A difference mapping method using least square support vector regression is developed in this work, as a special metamodelling methodology that includes variable-fidelity data, to replace computationally expensive computer codes. A general difference mapping framework is proposed in which a surrogate base is first created, and the approximation is then gained by mapping the difference between the base and the real high-fidelity response surface. Least square support vector regression is adopted to accomplish the mapping. Two different sampling strategies, nested and non-nested design of experiments, are compared to explore their respective effects on modelling accuracy. Different sample sizes and three approximation performance measures of accuracy are considered.
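The difference-mapping idea can be sketched independently of the specific regressor: fit a cheap model to the discrepancy between scarce high-fidelity data and a low-fidelity base, then predict as base plus learned difference. In the sketch below a polynomial fit stands in for the paper's least square support vector regression; the function names and setup are illustrative assumptions:

```python
import numpy as np

def fit_difference_model(x_hi, y_hi, lofi, degree=3):
    """Variable-fidelity 'difference mapping' sketch: learn the discrepancy
    between high-fidelity samples and a low-fidelity base model, then
    predict as base + learned difference."""
    diff = y_hi - lofi(x_hi)                 # discrepancy at high-fidelity points
    coef = np.polyfit(x_hi, diff, degree)    # cheap surrogate for the discrepancy
    return lambda x: lofi(x) + np.polyval(coef, x)
```

Because the discrepancy is usually smoother than the high-fidelity response itself, far fewer expensive samples are needed than for fitting the high-fidelity surface directly.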
NASA Astrophysics Data System (ADS)
Zhu, Dazhou; Ji, Baoping; Meng, Chaoying; Shi, Bolin; Tu, Zhenhua; Qing, Zhaoshen
Hybrid linear analysis (HLA), partial least-squares (PLS) regression, and the linear least square support vector machine (LSSVM) were used to determine the soluble solids content (SSC) of apple by Fourier transform near-infrared (FT-NIR) spectroscopy. The performance of these three linear regression methods was compared. Results showed that HLA could be used for the analysis of complex solid samples such as apple. The predictive ability of the SSC model constructed by HLA was comparable to that of PLS. HLA was sensitive to outliers, so outliers should be eliminated before HLA calibration. The linear LSSVM performed better than PLS and HLA. Direct orthogonal signal correction (DOSC) pretreatment was effective for PLS and the linear LSSVM, but not suitable for HLA. The combination of DOSC and the linear LSSVM had good generalization ability and was not sensitive to outliers, so it is a promising method for linear multivariate calibration.
Assessment of Weighted Quantile Sum Regression for Modeling Chemical Mixtures and Cancer Risk
Czarnota, Jenna; Gennings, Chris; Wheeler, David C
2015-01-01
In evaluation of cancer risk related to environmental chemical exposures, the effect of many chemicals on disease is ultimately of interest. However, because of potentially strong correlations among chemicals that occur together, traditional regression methods suffer from collinearity effects, including regression coefficient sign reversal and variance inflation. In addition, penalized regression methods designed to remediate collinearity may have limitations in selecting the truly bad actors among many correlated components. The recently proposed method of weighted quantile sum (WQS) regression attempts to overcome these problems by estimating a body burden index, which identifies important chemicals in a mixture of correlated environmental chemicals. Our focus was on assessing through simulation studies the accuracy of WQS regression in detecting subsets of chemicals associated with health outcomes (binary and continuous) in site-specific analyses and in non-site-specific analyses. We also evaluated the performance of the penalized regression methods of lasso, adaptive lasso, and elastic net in correctly classifying chemicals as bad actors or unrelated to the outcome. We based the simulation study on data from the National Cancer Institute Surveillance Epidemiology and End Results Program (NCI-SEER) case–control study of non-Hodgkin lymphoma (NHL) to achieve realistic exposure situations. Our results showed that WQS regression had good sensitivity and specificity across a variety of conditions considered in this study. The shrinkage methods had a tendency to incorrectly identify a large number of components, especially in the case of strong association with the outcome. PMID:26005323
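The core of WQS regression is scoring each exposure into quantile categories and forming a weighted index. Estimating the weights requires constrained optimization over bootstrap samples, so the sketch below (illustrative only, not the authors' procedure) shows just the quantile scoring and the index construction for fixed weights:

```python
import numpy as np

def quantile_score(X, q=4):
    """Score each chemical (column of X) into quantile categories 0..q-1."""
    scores = np.empty_like(X)
    for j in range(X.shape[1]):
        cuts = np.quantile(X[:, j], np.linspace(0, 1, q + 1)[1:-1])
        scores[:, j] = np.searchsorted(cuts, X[:, j])
    return scores

def wqs_index(X, w, q=4):
    """Weighted quantile sum index; w must be nonnegative and sum to 1.
    The index then enters a regression model as a single predictor."""
    return quantile_score(X, q) @ w
```

Because the correlated chemicals are collapsed into one body burden index, the downstream regression avoids the collinearity effects (sign reversal, variance inflation) described above, and the estimated weights indicate which components drive the association.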
Semiparametric regression during 2003–2007
Ruppert, David; Wand, M.P.; Carroll, Raymond J.
2010-01-01
Semiparametric regression is a fusion between parametric regression and nonparametric regression that integrates low-rank penalized splines, mixed model and hierarchical Bayesian methodology – thus allowing more streamlined handling of longitudinal and spatial correlation. We review progress in the field over the five-year period between 2003 and 2007. We find semiparametric regression to be a vibrant field with substantial involvement and activity, continual enhancement and widespread application. PMID:20305800
Wilcox, Rand R
2010-05-01
This paper considers the problem of estimating the overall strength of an association, including situations where there is curvature. The general strategy is to fit a robust regression line, or some type of smoother that allows curvature, and then use a robust analogue of explanatory power, say η². When the regression surface is a plane, an estimate of η² via the Theil-Sen estimator is found to perform well, relative to some other robust regression estimators, in terms of mean squared error and bias. When there is curvature, a generalization of a kernel estimator derived by Fan performs relatively well, but two alternative smoothers have certain practical advantages. When η² is approximately equal to zero, estimation using smoothers has relatively high bias. A variation of η² is suggested for dealing with this problem. Methods for testing H0: η² = 0 are examined that are based in part on smoothers. Two methods are found that control Type I error probabilities reasonably well in simulations. Software for applying the more successful methods is provided.
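The Theil-Sen estimator used in the planar case is short enough to sketch directly: the slope is the median of all pairwise slopes, which makes it robust to outliers in y (a generic implementation, not the paper's software):

```python
import numpy as np
from itertools import combinations

def theil_sen(x, y):
    """Theil-Sen line: slope is the median of all pairwise slopes,
    intercept the median of the residuals y - b1*x."""
    x = np.asarray(x, float)
    y = np.asarray(y, float)
    slopes = [(y[j] - y[i]) / (x[j] - x[i])
              for i, j in combinations(range(len(x)), 2)
              if x[j] != x[i]]
    b1 = np.median(slopes)
    b0 = np.median(y - b1 * x)
    return b0, b1
```

A single wild point barely moves the fit: on ten points from y = 2x + 1 with one response replaced by 100, the estimator still recovers slope 2 and intercept 1 exactly.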
Unification of regression-based methods for the analysis of natural selection.
Morrissey, Michael B; Sakrejda, Krzysztof
2013-07-01
Regression analyses are central to characterization of the form and strength of natural selection in nature. Two common analyses that are currently used to characterize selection are (1) least squares-based approximation of the individual relative fitness surface for the purpose of obtaining quantitatively useful selection gradients, and (2) spline-based estimation of (absolute) fitness functions to obtain flexible inference of the shape of functions by which fitness and phenotype are related. These two sets of methodologies are often implemented in parallel to provide complementary inferences of the form of natural selection. We unify these two analyses, providing a method whereby selection gradients can be obtained for a given observed distribution of phenotype and characterization of a function relating phenotype to fitness. The method allows quantitatively useful selection gradients to be obtained from analyses of selection that adequately model nonnormal distributions of fitness, and provides unification of the two previously separate regression-based fitness analyses. We demonstrate the method by calculating directional and quadratic selection gradients associated with a smooth regression-based generalized additive model of the relationship between neonatal survival and the phenotypic traits of gestation length and birth mass in humans.
Impact of regression methods on improved effects of soil structure on soil water retention estimates
NASA Astrophysics Data System (ADS)
Nguyen, Phuong Minh; De Pue, Jan; Le, Khoa Van; Cornelis, Wim
2015-06-01
Increasing the accuracy of pedotransfer functions (PTFs), an indirect method for predicting non-readily available soil features such as soil water retention characteristics (SWRC), is of crucial importance for large-scale agro-hydrological modeling. Adding significant predictors (e.g., soil structure) and implementing more flexible regression algorithms are among the main strategies for PTF improvement. The aim of this study was to investigate whether the improved effect of categorical soil structure information on estimating soil-water content at various matric potentials, which has been reported in the literature, could be consistently captured by regression techniques other than the usually applied linear regression. Two data mining techniques, Support Vector Machines (SVM) and k-Nearest Neighbors (kNN), which have recently been introduced as promising tools for PTF development, were utilized to test whether the incorporation of soil structure would improve PTF accuracy in a context of rather limited training data. The results show that incorporating descriptive soil structure information (i.e., massive, structured and structureless) as a grouping criterion can improve the accuracy of PTFs derived by the SVM approach in the range of matric potential of -6 to -33 kPa (average RMSE decreased by up to 0.005 m3 m-3 after grouping, depending on matric potential). The improvement was primarily attributed to the better performance of SVM-PTFs calibrated on structureless soils. No improvement was obtained with the kNN technique, at least not in our study, in which the data set became limited in size after grouping. Since the regression technique affects the improvement gained from incorporating qualitative soil structure information, selecting a proper technique will help to maximize the combined influence of flexible regression algorithms and soil structure information on PTF accuracy.
An Efficient Elastic Net with Regression Coefficients Method for Variable Selection of Spectrum Data
Liu, Wenya; Li, Qi
2017-01-01
Using spectrum data for quality prediction always suffers from noise and collinearity, so variable selection methods play an important role in dealing with spectrum data. An efficient elastic net with regression coefficients method (Enet-BETA) is proposed in this paper to select the significant variables of the spectrum data. The proposed Enet-BETA method not only selects important variables to make the quality prediction easy to interpret, but also improves the stability and feasibility of the built model. Enet-BETA is not prone to overfitting because of the reduction of redundant variables realized by the elastic net method. Hypothesis testing is used to further simplify the model and provide better insight into the nature of the process. The experimental results show that the proposed Enet-BETA method outperforms the other methods in terms of prediction performance and model interpretation. PMID:28152003
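The elastic net underlying Enet-BETA can be sketched with the standard coordinate-descent update, in which the L1 part of the penalty is soft-thresholded and the L2 part enters the denominator. This is a generic textbook sketch of the elastic net itself, not the authors' Enet-BETA procedure; columns of X are assumed standardized:

```python
import numpy as np

def enet_cd(X, y, lam, alpha=0.5, n_iter=200):
    """Coordinate descent for (1/2n)||y - Xb||^2 +
    lam * (alpha*||b||_1 + (1-alpha)/2*||b||_2^2); alpha=1 is the lasso."""
    n, p = X.shape
    b = np.zeros(p)
    col_sq = (X ** 2).sum(axis=0) / n
    for _ in range(n_iter):
        for j in range(p):
            r_j = y - X @ b + X[:, j] * b[j]          # partial residual
            rho = X[:, j] @ r_j / n
            z = np.sign(rho) * max(abs(rho) - lam * alpha, 0.0)
            b[j] = z / (col_sq[j] + lam * (1 - alpha))
    return b
```

The soft-threshold sets coefficients of redundant variables exactly to zero, which is the variable reduction the abstract credits for avoiding overfitting; the L2 term stabilizes the solution under collinearity.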
Comparison of regression methods for modeling intensive care length of stay.
Verburg, Ilona W M; de Keizer, Nicolette F; de Jonge, Evert; Peek, Niels
2014-01-01
Intensive care units (ICUs) are increasingly interested in assessing and improving their performance. ICU Length of Stay (LoS) could be seen as an indicator for efficiency of care. However, little consensus exists on which prognostic method should be used to adjust ICU LoS for case-mix factors. This study compared the performance of different regression models when predicting ICU LoS. We included data from 32,667 unplanned ICU admissions to ICUs participating in the Dutch National Intensive Care Evaluation (NICE) in the year 2011. We predicted ICU LoS using eight regression models: ordinary least squares regression on untransformed ICU LoS, LoS truncated at 30 days, and log-transformed LoS; a generalized linear model with a Gaussian distribution and a logarithmic link function; Poisson regression; negative binomial regression; Gamma regression with a logarithmic link function; and the original and recalibrated APACHE IV model, for all patients together and for survivors and non-survivors separately. We assessed the predictive performance of the models using bootstrapping and the squared Pearson correlation coefficient (R2), root mean squared prediction error (RMSPE), mean absolute prediction error (MAPE) and bias. The distribution of ICU LoS was skewed to the right with a median of 1.7 days (interquartile range 0.8 to 4.0) and a mean of 4.2 days (standard deviation 7.9). The predictive performance of the models was between 0.09 and 0.20 for R2, between 7.28 and 8.74 days for RMSPE, between 3.00 and 4.42 days for MAPE and between -2.99 and 1.64 days for bias. The predictive performance was slightly better for survivors than for non-survivors. We were disappointed in the predictive performance of the regression models and conclude that it is difficult to predict LoS of unplanned ICU admissions using patient characteristics at admission time only.
Churpek, Matthew M; Yuen, Trevor C; Winslow, Christopher; Meltzer, David O; Kattan, Michael W; Edelson, Dana P
2016-01-01
OBJECTIVE: Machine learning methods are flexible prediction algorithms that may be more accurate than conventional regression. We compared the accuracy of different techniques for detecting clinical deterioration on the wards in a large, multicenter database. DESIGN: Observational cohort study. SETTING: Five hospitals, from November 2008 until January 2013. PATIENTS: Hospitalized ward patients. INTERVENTIONS: None. MEASUREMENTS AND MAIN RESULTS: Demographic variables, laboratory values, and vital signs were utilized in a discrete-time survival analysis framework to predict the combined outcome of cardiac arrest, intensive care unit transfer, or death. Two logistic regression models (one using linear predictor terms and a second utilizing restricted cubic splines) were compared to several different machine learning methods. The models were derived in the first 60% of the data by date and then validated in the next 40%. For model derivation, each event time window was matched to a non-event window. All models were compared to each other and to the Modified Early Warning Score (MEWS), a commonly cited early warning score, using the area under the receiver operating characteristic curve (AUC). A total of 269,999 patients were admitted, and 424 cardiac arrests, 13,188 intensive care unit transfers, and 2,840 deaths occurred in the study. In the validation dataset, the random forest model was the most accurate model (AUC 0.80 [95% CI 0.80–0.80]). The logistic regression model with spline predictors was more accurate than the model utilizing linear predictors (AUC 0.77 vs 0.74; p<0.01), and all models were more accurate than the MEWS (AUC 0.70 [95% CI 0.70–0.70]). CONCLUSIONS: In this multicenter study, we found that several machine learning methods more accurately predicted clinical deterioration than logistic regression. Use of detection algorithms derived from these techniques may result in improved identification of critically ill patients on the wards. PMID:26771782
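The AUC used to compare the models is simply the probability that a randomly chosen event is scored above a randomly chosen non-event; a minimal sketch of that rank-sum identity (not the study's code):

```python
import numpy as np

def auc(labels, scores):
    """Area under the ROC curve via the Mann-Whitney identity:
    the probability that a random positive outscores a random negative,
    with ties counted as one half."""
    labels = np.asarray(labels, bool)
    scores = np.asarray(scores, float)
    pos, neg = scores[labels], scores[~labels]
    wins = (pos[:, None] > neg[None, :]).sum()   # positive scored higher
    ties = (pos[:, None] == neg[None, :]).sum()  # equal scores count 1/2
    return (wins + 0.5 * ties) / (len(pos) * len(neg))
```

An AUC of 0.5 corresponds to random scoring and 1.0 to perfect separation, which puts the reported gap between the random forest (0.80) and the MEWS (0.70) in context.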
Li, Min; Zhou, Tong; Song, Yanan
2016-07-01
A grain size characterization method based on energy attenuation coefficient spectrum and support vector regression (SVR) is proposed. First, the spectra of the first and second back-wall echoes are cut into several frequency bands to calculate the energy attenuation coefficient spectrum. Second, the frequency band that is sensitive to grain size variation is determined. Finally, a statistical model between the energy attenuation coefficient in the sensitive frequency band and average grain size is established through SVR. Experimental verification is conducted on austenitic stainless steel. The average relative error of the predicted grain size is 5.65%, which is better than that of conventional methods.
NASA Astrophysics Data System (ADS)
Khazaei, Ardeshir; Sarmasti, Negin; Seyf, Jaber Yousefi
2016-03-01
Quantitative structure-activity relationships were used to study a series of curcumin-related compounds with inhibitory effects on prostate cancer PC-3 cells, pancreatic cancer Panc-1 cells, and colon cancer HT-29 cells. The sphere exclusion method was used to split the data set into training and test sets. Multiple linear regression, principal component regression and partial least squares were used as the regression methods; to investigate the effect of feature selection, stepwise selection, a genetic algorithm, and simulated annealing were used. In two cases (PC-3 cells and Panc-1 cells), the best models were generated by a combination of multiple linear regression and stepwise selection (PC-3 cells: r2 = 0.86, q2 = 0.82, pred_r2 = 0.93, and r2m (test) = 0.43; Panc-1 cells: r2 = 0.85, q2 = 0.80, pred_r2 = 0.71, and r2m (test) = 0.68). For the HT-29 cells, principal component regression with stepwise selection (r2 = 0.69, q2 = 0.62, pred_r2 = 0.54, and r2m (test) = 0.41) was the best method. The QSAR study reveals the descriptors that play a crucial role in the inhibitory property of curcumin-like compounds: 6ChainCount, T_C_C_1, and T_O_O_7 are the most important descriptors with the greatest effect. To design and optimize novel efficient curcumin-related compounds, it is useful to introduce heteroatoms such as nitrogen, oxygen, and sulfur atoms into the chemical structure (reducing the contribution of the T_C_C_1 descriptor) and to increase the contribution of the 6ChainCount and T_O_O_7 descriptors. The models can be useful in the better design of novel curcumin-related compounds for the treatment of prostate, pancreatic, and colon cancers.
Lin, Wan-Yu; Schaid, Daniel J
2009-04-01
Recently, a genomic distance-based regression for multilocus associations was proposed (Wessel and Schork [2006] Am. J. Hum. Genet. 79:792-806) in which either locus or haplotype scoring can be used to measure genetic distance. Although it allows various measures of genomic similarity and simultaneous analyses of multiple phenotypes, its power relative to other methods for case-control analyses is not well known. We compare the power of traditional methods with this new distance-based approach, for both locus-scoring and haplotype-scoring strategies. We discuss the relative power of these association methods with respect to five properties: (1) the marker informativity; (2) the number of markers; (3) the causal allele frequency; (4) the preponderance of the most common high-risk haplotype; (5) the correlation between the causal single-nucleotide polymorphism (SNP) and its flanking markers. We found that locus-based logistic regression and the global score test for haplotypes suffered from power loss when many markers were included in the analyses, due to many degrees of freedom. In contrast, the distance-based approach was not as vulnerable to more markers or more haplotypes. A genotype counting measure was more sensitive to the marker informativity and the correlation between the causal SNP and its flanking markers. After examining the impact of the five properties on power, we found that on average, the genomic distance-based regression that uses a matching measure for diplotypes was the most powerful and robust method among the seven methods we compared.
A robust and efficient stepwise regression method for building sparse polynomial chaos expansions
NASA Astrophysics Data System (ADS)
Abraham, Simon; Raisee, Mehrdad; Ghorbaniasl, Ghader; Contino, Francesco; Lacor, Chris
2017-03-01
Polynomial Chaos (PC) expansions are widely used in various engineering fields for quantifying uncertainties arising from uncertain parameters. The computational cost of classical PC solution schemes becomes unaffordable as the number of deterministic simulations to be calculated grows dramatically with the number of stochastic dimensions. This considerably restricts the practical use of PC at the industrial level. A common approach to address such problems is to make use of sparse PC expansions. This paper presents a non-intrusive regression-based method for building sparse PC expansions. The most important PC contributions are detected sequentially through an automatic search procedure, with a variable selection criterion based on efficient tools from probabilistic methods. Two benchmark analytical functions are used to validate the proposed algorithm. The computational efficiency of the method is then illustrated by a more realistic CFD application, consisting of the non-deterministic flow around a transonic airfoil subject to geometrical uncertainties. To assess the performance of the developed methodology, a detailed comparison is made with the well-established LAR-based selection technique. The results show that the developed sparse regression technique is able to identify the most significant PC contributions describing the problem. Moreover, the most important stochastic features are captured at a reduced computational cost compared to the LAR method. The results also demonstrate the superior robustness of the method by repeating the analyses using random experimental designs.
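Sequential detection of the most important contributions is, at its core, forward stepwise regression. The sketch below uses a plain residual-sum-of-squares criterion over raw columns; the paper's actual selection criterion is probabilistic and its regressors would be PC basis terms, so treat this as a generic illustration:

```python
import numpy as np

def forward_stepwise(X, y, max_terms):
    """Greedy forward selection: at each step add the column whose
    inclusion most reduces the residual sum of squares."""
    n, p = X.shape
    selected = []
    for _ in range(max_terms):
        best_j, best_rss = None, np.inf
        for j in range(p):
            if j in selected:
                continue
            cols = X[:, selected + [j]]
            beta, *_ = np.linalg.lstsq(cols, y, rcond=None)
            rss = np.sum((y - cols @ beta) ** 2)
            if rss < best_rss:
                best_j, best_rss = j, rss
        selected.append(best_j)
    return selected
```

Stopping the search early is exactly what makes the expansion sparse: only the handful of basis terms that explain the response survive, so far fewer deterministic simulations are needed.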
Penalized Likelihood for General Semi-Parametric Regression Models.
1985-05-01
should be stressed that q, while it may be somewhat less than n, will still be ’large’, and parametric estimation of £ will not be appropriate...Partial spline models for the semiparametric estimation of functions of several variables, in Statistical Analysis of Time Series, Tokyo: Institute of
da Silva, Claudia Pereira; Emídio, Elissandro Soares; de Marchi, Mary Rosa Rodrigues
2015-01-01
This paper describes the validation of a method consisting of solid-phase extraction followed by gas chromatography-tandem mass spectrometry for the analysis of the ultraviolet (UV) filters benzophenone-3, ethylhexyl salicylate, ethylhexyl methoxycinnamate and octocrylene. The method validation criteria included evaluation of selectivity, analytical curve, trueness, precision, limits of detection and limits of quantification. The non-weighted linear regression model has traditionally been used for calibration, but it is not necessarily the optimal model in all cases. Because the assumption of homoscedasticity was not met for the analytical data in this work, a weighted least squares linear regression was used for the calibration method. The evaluated analytical parameters were satisfactory for the analytes and showed recoveries at four fortification levels between 62% and 107%, with relative standard deviations less than 14%. The detection limits ranged from 7.6 to 24.1 ng L(-1). The proposed method was used to determine the amount of UV filters in water samples from water treatment plants in Araraquara and Jau in São Paulo, Brazil.
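A weighted least squares calibration line solves the weighted normal equations; a minimal generic sketch (not the authors' procedure), with weights typically taken as inverse variances so that the noisier high-concentration points count less under heteroscedasticity:

```python
import numpy as np

def wls_line(x, y, w):
    """Weighted least squares fit of y = b0 + b1*x with weights w
    (commonly w_i = 1/s_i^2, the inverse replicate variances)."""
    W = np.diag(w)
    X = np.column_stack([np.ones_like(x), x])
    # solve (X'WX) b = X'Wy
    return np.linalg.solve(X.T @ W @ X, X.T @ W @ y)
```

With equal weights this reduces to ordinary least squares, which is why the unweighted model is the common default; the weighted fit matters when, as in this work, the assumption of homoscedasticity fails.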
Structural break detection method based on the Adaptive Regression Splines technique
NASA Astrophysics Data System (ADS)
Kucharczyk, Daniel; Wyłomańska, Agnieszka; Zimroz, Radosław
2017-04-01
For many real data sets, long-term observation consists of different processes that coexist or occur one after the other. These processes often exhibit different statistical properties, and thus the observed data should be segmented before further analysis. This problem arises in many applications, and new segmentation techniques have therefore appeared in the literature in recent years. In this paper we propose a new method of time series segmentation, i.e. extraction from the analysed vector of observations of homogeneous parts with similar behaviour. The method is based on the absolute deviation about the median of the signal and extends previously proposed techniques also based on simple statistics. We introduce a structural break point detection method based on the Adaptive Regression Splines technique, a form of regression analysis. Moreover, we propose a statistical test for the hypothesis that the behaviour is related to different regimes. We first apply the methodology to simulated signals with different distributions in order to show the effectiveness of the new technique. Next, in the application part, we analyse a real data set representing the vibration signal from a heavy-duty crusher used in a mineral processing plant.
A refined method for multivariate meta-analysis and meta-regression.
Jackson, Daniel; Riley, Richard D
2014-02-20
Making inferences about the average treatment effect using the random effects model for meta-analysis is problematic in the common situation where there is a small number of studies. This is because estimates of the between-study variance are not precise enough to accurately apply the conventional methods for testing and deriving a confidence interval for the average effect. We have found that a refined method for univariate meta-analysis, which applies a scaling factor to the estimated effects' standard error, provides more accurate inference. We explain how to extend this method to the multivariate scenario and show that our proposal for refined multivariate meta-analysis and meta-regression can provide more accurate inferences than the more conventional approach. We explain how our proposed approach can be implemented using standard output from multivariate meta-analysis software packages and apply our methodology to two real examples.
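The univariate refinement described, a scaling factor applied to the pooled effect's standard error, resembles the Hartung-Knapp-Sidik-Jonkman adjustment; the sketch below assumes that form, uses invented effect sizes, and takes the between-study variance as given rather than estimating it (a real analysis would estimate it, e.g. by DerSimonian-Laird).

```python
def refined_pooled_estimate(y, v, tau2):
    """Random-effects pooled mean with an HKSJ-style scaled standard error."""
    w = [1.0 / (vi + tau2) for vi in v]
    sw = sum(w)
    mu = sum(wi * yi for wi, yi in zip(w, y)) / sw
    se_conv = (1.0 / sw) ** 0.5                      # conventional SE
    # Scaling factor: weighted dispersion of the effects about the pooled mean.
    q = sum(wi * (yi - mu) ** 2 for wi, yi in zip(w, y)) / (len(y) - 1)
    return mu, se_conv, (q ** 0.5) * se_conv          # refined SE

# Invented study effects with equal within-study variances.
effects = [0.1, 0.3, 0.5, 0.7, 0.9]
variances = [0.01] * 5
mu, se_conv, se_refined = refined_pooled_estimate(effects, variances, 0.0)

# A 95% CI uses a t quantile with k-1 df; t_{4, 0.975} ≈ 2.776.
ci = (mu - 2.776 * se_refined, mu + 2.776 * se_refined)
```

When the observed dispersion exceeds what the within-study variances predict, the refined interval is wider than the conventional one, which is exactly the small-k problem the abstract targets.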
Least squares regression methods for clustered ROC data with discrete covariates.
Tang, Liansheng Larry; Zhang, Wei; Li, Qizhai; Ye, Xuan; Chan, Leighton
2016-07-01
The receiver operating characteristic (ROC) curve is a popular tool to evaluate and compare the accuracy of diagnostic tests in distinguishing the diseased group from the nondiseased group when test results are continuous or ordinal. A complicated data setting occurs when multiple tests are measured on abnormal and normal locations from the same subject and the measurements are clustered within the subject. Although least squares regression methods can be used to estimate the ROC curve from correlated data, how to develop least squares methods to estimate the ROC curve from clustered data has not been studied, and the statistical properties of the least squares methods under the clustering setting are unknown. In this article, we develop least squares ROC methods that allow the baseline and link functions to differ and, more importantly, accommodate clustered data with discrete covariates. The methods can generate smooth ROC curves that satisfy the inherent continuous property of the true underlying curve. The least squares methods are shown to be more efficient than the existing nonparametric ROC methods under appropriate model assumptions in simulation studies. We apply the methods to a real example in the detection of glaucomatous deterioration. We also derive the asymptotic properties of the proposed methods.
NASA Astrophysics Data System (ADS)
Yun, Yuqi; Zevin, Michael; Sampson, Laura; Kalogera, Vassiliki
2017-01-01
With more observations from LIGO in the upcoming years, we will be able to construct an observed mass distribution of black holes to compare with binary evolution simulations. This will allow us to investigate the physics of binary evolution, such as the effects of common envelope efficiency and wind strength, or the properties of the population, such as the initial mass function. However, binary evolution codes become computationally expensive when running large populations of binaries over a multi-dimensional grid of input parameters, and may simulate accurately only for a limited combination of input parameter values. Therefore we developed a fast machine-learning method that utilizes a Gaussian Mixture Model (GMM) and Gaussian Process (GP) regression, which together can predict distributions over the entire parameter space based on a limited number of simulated models. Furthermore, Gaussian Process regression naturally provides interpolation errors in addition to interpolation means, which could provide a means of targeting the most uncertain regions of parameter space for running further simulations. We also present a case study on applying this new method to predicting chirp mass distributions for binary black hole systems (BBHs) in Milky Way-like galaxies of different metallicities.
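A Gaussian Process regression of the kind described, a predictive mean plus an interpolation error at each query point, can be sketched with a plain squared-exponential kernel. The hyperparameters and training data below are fixed by hand for illustration; a real emulator would fit them to the simulated models.

```python
import math

def rbf(a, b, ell=1.0, sf=1.0):
    """Squared-exponential kernel."""
    return sf * math.exp(-0.5 * ((a - b) / ell) ** 2)

def solve(A, b):
    """Gaussian elimination with partial pivoting (small dense systems)."""
    n = len(A)
    M = [row[:] + [bi] for row, bi in zip(A, b)]
    for col in range(n):
        p = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[p] = M[p], M[col]
        for r in range(col + 1, n):
            f = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= f * M[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (M[r][n] - sum(M[r][c] * x[c] for c in range(r + 1, n))) / M[r][r]
    return x

def gp_predict(xs, ys, xq, noise=1e-6):
    """Posterior mean and variance at query point xq."""
    n = len(xs)
    K = [[rbf(xs[i], xs[j]) + (noise if i == j else 0.0) for j in range(n)]
         for i in range(n)]
    alpha = solve(K, ys)
    k_star = [rbf(xq, xi) for xi in xs]
    mean = sum(ks * a for ks, a in zip(k_star, alpha))
    v = solve(K, k_star)
    var = rbf(xq, xq) - sum(ks * vi for ks, vi in zip(k_star, v))
    return mean, max(var, 0.0)

# Toy "simulated models": interpolate sin(x) from five runs.
xs = [0.0, 1.0, 2.0, 3.0, 4.0]
ys = [math.sin(x) for x in xs]
m_train, var_train = gp_predict(xs, ys, 2.0)   # at a training input
m_new, var_new = gp_predict(xs, ys, 2.5)       # between training inputs
```

The posterior variance is what the abstract suggests using to decide where further simulations are most needed: it collapses at training inputs and grows between them.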
Shi, Yinghuan; Gao, Yaozong; Liao, Shu; Zhang, Daoqiang
2015-01-01
In recent years, there has been great interest in prostate segmentation, which is an important and challenging task for CT image guided radiotherapy. In this paper, a learning-based segmentation method via joint transductive feature selection and transductive regression is presented, which incorporates the physician's simple manual specification (taking only a few seconds) to aid accurate segmentation, especially for cases with large irregular prostate motion. More specifically, for the current treatment image, an experienced physician is first asked to manually assign labels for a small subset of prostate and non-prostate voxels, especially in the first and last slices of the prostate region. Then, the proposed method proceeds in two steps: in the prostate-likelihood estimation step, two novel algorithms, tLasso and wLapRLS, are sequentially employed for transductive feature selection and transductive regression, respectively, aiming to generate the prostate-likelihood map. In the multi-atlas based label fusion step, the final segmentation result is obtained from the corresponding prostate-likelihood map and the previous images of the same patient. The proposed method has been substantially evaluated on a real prostate CT dataset including 24 patients with 330 CT images, and compared with several state-of-the-art methods. Experimental results show that the proposed method outperforms the state-of-the-art methods in terms of higher Dice ratio, higher true positive fraction, and lower centroid distance. The results also demonstrate that simple manual specification can help improve segmentation performance, which is clinically feasible in real practice. PMID:26752809
NASA Astrophysics Data System (ADS)
Mandal, Nilrudra; Doloi, Biswanath; Mondal, Biswanath
2016-01-01
In the present study, an attempt has been made to apply the Taguchi parameter design method and regression analysis to optimize the cutting conditions for surface finish while machining AISI 4340 steel with newly developed yttria-based Zirconia Toughened Alumina (ZTA) inserts. These inserts are prepared through a wet chemical co-precipitation route followed by a powder metallurgy process. Experiments have been carried out based on an L9 orthogonal array with three parameters (cutting speed, depth of cut and feed rate) at three levels (low, medium and high). Based on the mean response and signal-to-noise ratio (SNR), the optimal cutting condition was found to be A3B1C1, i.e. a cutting speed of 420 m/min, a depth of cut of 0.5 mm and a feed rate of 0.12 m/min, using the smaller-the-better criterion. Analysis of Variance (ANOVA) is applied to find the significance and percentage contribution of each parameter. A mathematical model of surface roughness has been developed using regression analysis as a function of the above-mentioned independent variables. The predicted values from the developed model and the experimental values are found to be very close to each other, justifying the significance of the model. A confirmation run has been carried out at the 95 % confidence level to verify the optimized result, and the values obtained are within the prescribed limits.
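The smaller-the-better signal-to-noise ratio used in this kind of Taguchi analysis is SNR = -10·log10(mean(y²)). A toy calculation (the replicate roughness values and run labels are invented for illustration, not the paper's measurements):

```python
import math

def snr_smaller_is_better(values):
    """Taguchi smaller-the-better S/N ratio in dB."""
    return -10.0 * math.log10(sum(v * v for v in values) / len(values))

# Hypothetical surface-roughness replicates (µm) for three of the L9 runs.
runs = {
    "A1B1C1": [1.20, 1.25, 1.18],
    "A2B2C2": [0.90, 0.95, 0.88],
    "A3B1C1": [0.42, 0.45, 0.40],
}
snrs = {name: snr_smaller_is_better(vals) for name, vals in runs.items()}
best = max(snrs, key=snrs.get)   # highest SNR = lowest, most consistent roughness
```

The setting with the highest SNR is selected as optimal, which mirrors how the abstract arrives at A3B1C1.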
The crux of the method: assumptions in ordinary least squares and logistic regression.
Long, Rebecca G
2008-10-01
Logistic regression has increasingly become the tool of choice when analyzing data with a binary dependent variable. While resources relating to the technique are widely available, clear discussions of why logistic regression should be used in place of ordinary least squares regression are difficult to find. The current paper compares and contrasts the assumptions of ordinary least squares with those of logistic regression and explains why logistic regression's looser assumptions make it adept at handling violations of the more important assumptions in ordinary least squares.
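The contrast the paper draws can be made concrete: OLS fitted values on a binary outcome can leave [0, 1], while logistic regression keeps predictions as probabilities. A self-contained sketch on toy data (one feature, gradient-ascent fit; the data are invented):

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def fit_logistic(xs, ys, lr=0.1, steps=5000):
    """One-feature logistic regression via batch gradient ascent."""
    w = b = 0.0
    n = len(xs)
    for _ in range(steps):
        gw = gb = 0.0
        for x, y in zip(xs, ys):
            p = sigmoid(w * x + b)
            gw += (y - p) * x
            gb += (y - p)
        w += lr * gw / n
        b += lr * gb / n
    return w, b

def ols_line(x, y):
    n = len(x)
    xb, yb = sum(x) / n, sum(y) / n
    a = sum((xi - xb) * (yi - yb) for xi, yi in zip(x, y)) / \
        sum((xi - xb) ** 2 for xi in x)
    return a, yb - a * xb

xs = [0.5, 1.0, 1.5, 2.0, 3.0, 3.5, 4.0, 4.5]
ys = [0, 0, 0, 0, 1, 1, 1, 1]

w, b = fit_logistic(xs, ys)
logit_preds = [1 if sigmoid(w * x + b) >= 0.5 else 0 for x in xs]

a, c = ols_line(xs, ys)
ols_fit_at_max = a * 4.5 + c   # the linear fit's "probability" at x = 4.5
```

The OLS fitted value at the largest x exceeds 1, illustrating one of the assumption violations the paper discusses, while the logistic fit classifies every point correctly with predictions bounded in (0, 1).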
NASA Astrophysics Data System (ADS)
Zhao, Na; Yue, Tianxiang; Zhou, Xun; Zhao, Mingwei; Liu, Yu; Du, Zhengping; Zhang, Lili
2016-03-01
Downscaling precipitation is required in local scale climate impact studies. In this paper, a statistical downscaling scheme was presented with a combination of a geographically weighted regression (GWR) model and a recently developed method, the high accuracy surface modeling method (HASM). This proposed method was compared with another downscaling method using the Coupled Model Intercomparison Project Phase 5 (CMIP5) database and ground-based data from 732 stations across China for the period 1976-2005. The residual produced by GWR was modified by comparing different interpolators including HASM, Kriging, the inverse distance weighted method (IDW), and Spline. The spatial downscaling from 1° to 1-km grids for the period 1976-2005 and future scenarios was achieved by using the proposed downscaling method. The prediction accuracy was assessed in two separate validation areas, China as a whole and Jiangxi Province, on both annual and seasonal scales, with the root mean square error (RMSE), mean relative error (MRE), and mean absolute error (MAE). The results indicate that the developed model in this study outperforms the method that builds a transfer function using the gauge values. There is a large improvement in the results when using a residual correction with meteorological station observations. In comparison with the other three classical interpolators, HASM shows better performance in modifying the residual produced by the local regression method. The success of the developed technique lies in the effective use of the datasets and the modification process of the residual by using HASM. The results from the future climate scenarios show that precipitation exhibits an overall increasing trend from T1 (2011-2040) to T2 (2041-2070) and from T2 to T3 (2071-2100) in the RCP2.6, RCP4.5, and RCP8.5 emission scenarios. The most significant increase occurs in RCP8.5 from T2 to T3, while the lowest increase is found in RCP2.6 from T2 to T3, increased by 47.11 and 2.12 mm, respectively.
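Of the interpolators compared, IDW is the simplest to sketch; the residual-correction step amounts to interpolating station residuals onto the target grid. The coordinates and residual values below are toy numbers, not the study's data.

```python
def idw(points, values, query, power=2.0, eps=1e-12):
    """Inverse distance weighted interpolation at a single query point."""
    num = den = 0.0
    for (px, py), v in zip(points, values):
        d2 = (px - query[0]) ** 2 + (py - query[1]) ** 2
        if d2 < eps:          # query coincides with a station: exact value
            return v
        w = 1.0 / d2 ** (power / 2.0)
        num += w * v
        den += w
    return num / den

# Hypothetical station locations and GWR residuals (mm).
stations = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0), (1.0, 1.0)]
residuals = [2.0, -1.0, 0.5, 1.5]
r_center = idw(stations, residuals, (0.5, 0.5))
```

At the cell center, equidistant from all four stations, IDW reduces to the plain average of the residuals; nearer a station it is pulled toward that station's value, which is the behavior being compared against HASM and Kriging.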
ℓ1-penalized linear mixed-effects models for high dimensional data with application to BCI.
Fazli, Siamac; Danóczy, Márton; Schelldorfer, Jürg; Müller, Klaus-Robert
2011-06-15
Recently, a novel statistical model has been proposed to estimate population effects and individual variability between subgroups simultaneously, by extending Lasso methods. We will for the first time apply this so-called ℓ1-penalized linear mixed-effects model to a large scale real world problem: we study a large set of brain computer interface data and through the novel estimator are able to obtain a subject-independent classifier that compares favorably with prior zero-training algorithms. This unifying model inherently compensates shifts in the input space attributed to the individuality of a subject. In particular we are now for the first time able to differentiate within-subject and between-subject variability. Thus a deeper understanding of both the underlying statistical and physiological structures of the data is gained.
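The ℓ1 penalty underlying this family of estimators can be illustrated with plain coordinate-descent Lasso. Note this is ordinary Lasso, not the paper's mixed-effects extension; the point is the soft-thresholding step that zeroes out irrelevant coefficients.

```python
def soft_threshold(rho, lam):
    """S(rho, lam): the shrinkage operator induced by the ℓ1 penalty."""
    if rho > lam:
        return rho - lam
    if rho < -lam:
        return rho + lam
    return 0.0

def lasso_cd(X, y, lam, iters=100):
    """Coordinate descent for 0.5*||y - X b||^2 + lam*||b||_1."""
    n, p = len(X), len(X[0])
    beta = [0.0] * p
    for _ in range(iters):
        for j in range(p):
            rho = zj = 0.0
            for i in range(n):
                # residual excluding feature j's current contribution
                r = y[i] - sum(X[i][k] * beta[k] for k in range(p) if k != j)
                rho += X[i][j] * r
                zj += X[i][j] ** 2
            beta[j] = soft_threshold(rho, lam) / zj
    return beta

# Toy design: feature 0 drives the response; feature 1 is noise.
x1 = [-3, -2, -1, 0, 1, 2, 3, 4]
x2 = [1, -1, 1, -1, 1, -1, 1, -1]
y = [2 * v for v in x1]
X = [[a, b] for a, b in zip(x1, x2)]
beta = lasso_cd(X, y, lam=5.0)
```

The penalty sets the noise coefficient exactly to zero while only slightly shrinking the informative one, which is the sparsity property the mixed-effects extension inherits.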
Alternating iterative regression method for dead time estimation from experimental designs.
Pous-Torres, S; Torres-Lapasió, J R; Baeza-Baeza, J J; García-Alvarez-Coque, M C
2009-05-01
An indirect method for dead time (t0) estimation in reversed-phase liquid chromatography, based on a relationship between retention time and organic solvent content, is proposed. The method processes the retention data obtained in experimental designs. In order to get more general validity and enhance the accuracy, the information from several compounds is used altogether in an alternating regression fashion. The method was applied to nitrosamines, alkylbenzenes, phenols, benzene derivatives, polycyclic aromatic hydrocarbons and beta-blockers, among other compounds, chromatographed in a cyano and several C18 columns. A comprehensive validation was carried out by comparing the results with those provided by the injection of markers, the observation of the solvent front and the homologous series method. It was also found that different groups of compounds yielded the same t0 value with the same column, which was verified in different solvent composition windows. The method allows improved models useful for optimisation or for other purposes, since t0 can be estimated with the retention data of the target solutes.
Penalized feature selection and classification in bioinformatics
Huang, Jian
2008-01-01
In bioinformatics studies, supervised classification with high-dimensional input variables is frequently encountered. Examples routinely arise in genomic, epigenetic and proteomic studies. Feature selection can be employed along with classifier construction to avoid over-fitting, to generate more reliable classifier and to provide more insights into the underlying causal relationships. In this article, we provide a review of several recently developed penalized feature selection and classification techniques—which belong to the family of embedded feature selection methods—for bioinformatics studies with high-dimensional input. Classification objective functions, penalty functions and computational algorithms are discussed. Our goal is to make interested researchers aware of these feature selection and classification methods that are applicable to high-dimensional bioinformatics data. PMID:18562478
Fienen, Michael N.; Selbig, William R.
2012-01-01
A new sample collection system was developed to improve the representation of sediment entrained in urban storm water by integrating water quality samples from the entire water column. The depth-integrated sampler arm (DISA) was able to mitigate sediment stratification bias in storm water, thereby improving the characterization of suspended-sediment concentration and particle size distribution at three independent study locations. Use of the DISA decreased variability, which improved statistical regression to predict particle size distribution using surrogate environmental parameters, such as precipitation depth and intensity. The performance of this statistical modeling technique was compared to results using traditional fixed-point sampling methods and was found to perform better. When environmental parameters can be used to predict particle size distributions, environmental managers have more options when characterizing concentrations, loads, and particle size distributions in urban runoff.
Wang, Molin; Kuchiba, Aya; Ogino, Shuji
2015-01-01
In interdisciplinary biomedical, epidemiologic, and population research, it is increasingly necessary to consider pathogenesis and inherent heterogeneity of any given health condition and outcome. As the unique disease principle implies, no single biomarker can perfectly define disease subtypes. The complex nature of molecular pathology and biology necessitates biostatistical methodologies to simultaneously analyze multiple biomarkers and subtypes. To analyze and test for heterogeneity hypotheses across subtypes defined by multiple categorical and/or ordinal markers, we developed a meta-regression method that can utilize existing statistical software for mixed-model analysis. This method can be used to assess whether the exposure-subtype associations are different across subtypes defined by 1 marker while controlling for other markers and to evaluate whether the difference in exposure-subtype association across subtypes defined by 1 marker depends on any other markers. To illustrate this method in molecular pathological epidemiology research, we examined the associations between smoking status and colorectal cancer subtypes defined by 3 correlated tumor molecular characteristics (CpG island methylator phenotype, microsatellite instability, and the B-Raf protooncogene, serine/threonine kinase (BRAF), mutation) in the Nurses' Health Study (1980–2010) and the Health Professionals Follow-up Study (1986–2010). This method can be widely useful as molecular diagnostics and genomic technologies become routine in clinical medicine and public health. PMID:26116215
A New Global Regression Analysis Method for the Prediction of Wind Tunnel Model Weight Corrections
NASA Technical Reports Server (NTRS)
Ulbrich, Norbert Manfred; Bridge, Thomas M.; Amaya, Max A.
2014-01-01
A new global regression analysis method is discussed that predicts wind tunnel model weight corrections for strain-gage balance loads during a wind tunnel test. The method determines corrections by combining "wind-on" model attitude measurements with least squares estimates of the model weight and center of gravity coordinates that are obtained from "wind-off" data points. The method treats the least squares fit of the model weight separately from the fit of the center of gravity coordinates. Therefore, it performs two fits of "wind-off" data points and uses the least squares estimator of the model weight as an input for the fit of the center of gravity coordinates. Explicit equations for the least squares estimators of the weight and center of gravity coordinates are derived that simplify the implementation of the method in the data system software of a wind tunnel. In addition, recommendations for sets of "wind-off" data points are made that take typical model support system constraints into account. Explicit equations of the confidence intervals on the model weight and center of gravity coordinates and two different error analyses of the model weight prediction are also discussed in the appendices of the paper.
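The first of the two fits, the least squares estimate of the model weight from "wind-off" points, can be sketched if one assumes the simple sign convention N = W·cos(θ), A = W·sin(θ) for the normal and axial balance components at pitch angle θ. Real balance axis systems and sign conventions differ by facility, so treat this purely as an illustration; the forces below are synthetic and noise-free.

```python
import math

def estimate_weight(pitch_deg, normal_force, axial_force):
    """Scalar least squares estimate of W from wind-off points, assuming
    N_i = W*cos(t_i) and A_i = W*sin(t_i)."""
    num = den = 0.0
    for t, N, A in zip(pitch_deg, normal_force, axial_force):
        c, s = math.cos(math.radians(t)), math.sin(math.radians(t))
        num += c * N + s * A
        den += c * c + s * s    # equals 1 per point; kept for clarity
    return num / den

# Synthetic wind-off sweep at several pitch angles for a 250 lbf model.
pitch = [-10, -5, 0, 5, 10]
W_true = 250.0
N = [W_true * math.cos(math.radians(t)) for t in pitch]
A = [W_true * math.sin(math.radians(t)) for t in pitch]
W_hat = estimate_weight(pitch, N, A)
```

With measurement noise the same formula averages it out over the sweep, and the resulting W_hat would then feed the second fit (center of gravity coordinates), as the abstract describes.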
Performance of robust regression methods in real-time polymerase chain reaction calibration.
Orenti, Annalisa; Marubini, Ettore
2014-12-09
The ordinary least squares (OLS) method is routinely used to estimate the unknown concentration of nucleic acids in a given solution by means of calibration. However, when outliers are present it could appear sensible to resort to robust regression methods. We analyzed data from an External Quality Control program concerning quantitative real-time PCR and we found that 24 laboratories out of 40 presented outliers, which occurred most frequently at the lowest concentrations. In this article we investigated and compared the performance of the OLS method, the least absolute deviation (LAD) method, and the biweight MM-estimator in real-time PCR calibration via a Monte Carlo simulation. Outliers were introduced by replacement contamination. When contamination was absent the coverages of OLS and MM-estimator intervals were acceptable and their widths small, whereas LAD intervals had acceptable coverages at the expense of higher widths. In the presence of contamination we observed a trade-off between width and coverage: the OLS performance got worse, the MM-estimator intervals widths remained short (but this was associated with a reduction in coverages), while LAD intervals widths were constantly larger with acceptable coverages at the nominal level.
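The effect the simulation studies quantify, outlier resistance in calibration, is easy to demonstrate. The sketch below uses the Theil-Sen estimator as a stand-in robust fit on an invented calibration line with one replacement-contaminated point; the paper itself compares LAD and the biweight MM-estimator, which need more machinery.

```python
from statistics import median

def theil_sen(x, y):
    """Median-of-pairwise-slopes line fit (robust to isolated outliers)."""
    slopes = [(y[j] - y[i]) / (x[j] - x[i])
              for i in range(len(x)) for j in range(i + 1, len(x))]
    slope = median(slopes)
    intercept = median(yi - slope * xi for xi, yi in zip(x, y))
    return slope, intercept

def ols_slope(x, y):
    n = len(x)
    xb, yb = sum(x) / n, sum(y) / n
    return sum((xi - xb) * (yi - yb) for xi, yi in zip(x, y)) / \
           sum((xi - xb) ** 2 for xi in x)

# Calibration line y = 3x + 1 with one gross outlier at the top level.
x = list(range(1, 11))
y = [3.0 * xi + 1.0 for xi in x]
y[9] = 5.0                      # replacement-contaminated point

ts_slope, ts_intercept = theil_sen(x, y)
contaminated_ols = ols_slope(x, y)
```

A single contaminated point drags the OLS slope far from 3 while the robust fit recovers the clean line exactly, mirroring the trade-offs the Monte Carlo study examines.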
Likelihood methods for regression models with expensive variables missing by design.
Zhao, Yang; Lawless, Jerald F; McLeish, Donald L
2009-02-01
In some applications involving regression the values of certain variables are missing by design for some individuals. For example, in two-stage studies (Zhao and Lipsitz, 1992), data on "cheaper" variables are collected on a random sample of individuals in stage I, and then "expensive" variables are measured for a subsample of these in stage II. So the "expensive" variables are missing by design at stage I. Both estimating function and likelihood methods have been proposed for cases where either covariates or responses are missing. We extend the semiparametric maximum likelihood (SPML) method for missing covariate problems (e.g. Chen, 2004; Ibrahim et al., 2005; Zhang and Rockette, 2005, 2007) to deal with more general cases where covariates and/or responses are missing by design, and show that profile likelihood ratio tests and interval estimation are easily implemented. Simulation studies are provided to examine the performance of the likelihood methods and to compare their efficiencies with estimating function methods for problems involving (a) a missing covariate and (b) a missing response variable. We illustrate the ease of implementation of SPML and demonstrate its high efficiency.
An Optimization-Based Method for Feature Ranking in Nonlinear Regression Problems.
Bravi, Luca; Piccialli, Veronica; Sciandrone, Marco
2016-02-03
In this paper, we consider the feature ranking problem, where, given a set of training instances, the task is to associate a score with the features in order to assess their relevance. Feature ranking is a very important tool for decision support systems, and may be used as an auxiliary step of feature selection to reduce the high dimensionality of real-world data. We focus on regression problems by assuming that the process underlying the generated data can be approximated by a continuous function (for instance, a feedforward neural network). We formally state the notion of relevance of a feature by introducing a minimum zero-norm inversion problem of a neural network, which is a nonsmooth, constrained optimization problem. We employ a concave approximation of the zero-norm function, and we define a smooth, global optimization problem to be solved in order to assess the relevance of the features. We present the new feature ranking method based on the solution of instances of the global optimization problem depending on the available training data. Computational experiments on both artificial and real data sets are performed, and point out that the proposed feature ranking method is a valid alternative to existing methods in terms of effectiveness. The obtained results also show that the method is costly in terms of CPU time, and this may be a limitation in the solution of large-dimensional problems.
NASA Astrophysics Data System (ADS)
Grégoire, G.
2014-12-01
Logistic regression is originally intended to explain the relationship between the probability of an event and a set of covariables. The model's coefficients can be interpreted via the odds and odds ratio, which are presented in the introduction of the chapter. When observations are obtained individually, we speak of binary logistic regression; when they are grouped, the regression is called binomial. In our presentation we mainly focus on the binary case. For statistical inference the main tool is the maximum likelihood methodology: we present the Wald, Rao and likelihood ratio results and their use to compare nested models. The problems we intend to deal with are essentially the same as in multiple linear regression: testing a global effect, testing individual effects, selecting variables to build a model, measuring the goodness of fit of the model, and predicting new values. The methods are demonstrated on data sets using R. Finally, we briefly consider the binomial case and the situation where we are interested in several events, that is, polytomous (multinomial) logistic regression and the particular case of ordinal logistic regression.
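The odds and odds-ratio interpretation mentioned for the introduction can be stated in a few lines of code (the probabilities and counts are invented for illustration):

```python
import math

def odds(p):
    """Convert a probability to odds."""
    return p / (1.0 - p)

def odds_ratio_2x2(a, b, c, d):
    """OR from a 2x2 table: (exposed events * unexposed non-events) /
    (exposed non-events * unexposed events)."""
    return (a * d) / (b * c)

# Invented event probabilities in two groups.
p_exposed, p_unexposed = 0.4, 0.1
or_probs = odds(p_exposed) / odds(p_unexposed)

# The same OR from counts consistent with those probabilities.
or_table = odds_ratio_2x2(40, 60, 10, 90)

# In binary logistic regression with a single 0/1 covariate, the fitted
# coefficient of that covariate is log(OR).
beta = math.log(or_probs)
```

This is the link the chapter exploits: exponentiating a logistic coefficient returns an odds ratio, which is how the model's coefficients are interpreted.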
Dinç, Erdal; Ustündağ, Ozgür; Baleanu, Dumitru
2010-08-01
The sole use of pyridoxine hydrochloride during treatment of tuberculosis gives rise to pyridoxine deficiency. Therefore, a combination of pyridoxine hydrochloride and isoniazid is used in pharmaceutical dosage forms for tuberculosis treatment to reduce this side effect. In this study, two chemometric methods, partial least squares (PLS) and principal component regression (PCR), were applied to the simultaneous determination of pyridoxine (PYR) and isoniazid (ISO) in their tablets. A concentration training set comprising 20 different randomly prepared binary mixtures of PYR and ISO in 0.1 M HCl was used. Both multivariate calibration models were constructed using the relationships between the concentration data set (concentration data matrix) and the absorbance data matrix in the spectral region 200-330 nm. The accuracy and precision of the proposed chemometric methods were validated by analyzing synthetic mixtures containing the investigated drugs. The recoveries obtained by applying the PCR and PLS calibrations to the artificial mixtures were between 100.0 and 100.7%. Satisfactory results were obtained by applying the PLS and PCR methods to both artificial and commercial samples. These results strongly encourage the use of these methods for quality control and routine analysis of marketed tablets containing the PYR and ISO drugs.
Model building in nonproportional hazard regression.
Rodríguez-Girondo, Mar; Kneib, Thomas; Cadarso-Suárez, Carmen; Abu-Assi, Emad
2013-12-30
Recent developments of statistical methods allow for very flexible modeling of covariates affecting survival times via the hazard rate, including also the inspection of possible time-dependent associations. Despite their immediate appeal in terms of flexibility, these models typically introduce additional difficulties when a subset of covariates and the corresponding modeling alternatives have to be chosen, that is, when building the most suitable model for the given data. This is particularly true when potentially time-varying associations are allowed. We propose to conduct a piecewise exponential representation of the original survival data to link hazard regression with estimation schemes based on the Poisson likelihood, making recent advances for model building in exponential family regression accessible also in the nonproportional hazard regression context. A two-stage stepwise selection approach, an approach based on doubly penalized likelihood, and a componentwise functional gradient descent approach are adapted to the piecewise exponential regression problem. These three techniques were compared via an intensive simulation study. An application to prognosis after discharge for patients who suffered a myocardial infarction supplements the simulation to demonstrate the pros and cons of the approaches in real data analyses.
Investigating the Accuracy of Three Estimation Methods for Regression Discontinuity Design
ERIC Educational Resources Information Center
Sun, Shuyan; Pan, Wei
2013-01-01
Regression discontinuity design is an alternative to randomized experiments to make causal inference when random assignment is not possible. This article first presents the formal identification and estimation of regression discontinuity treatment effects in the framework of Rubin's causal model, followed by a thorough literature review of…
ERIC Educational Resources Information Center
von Davier, Matthias; Sinharay, Sandip
2009-01-01
This paper presents an application of a stochastic approximation EM-algorithm using a Metropolis-Hastings sampler to estimate the parameters of an item response latent regression model. Latent regression models are extensions of item response theory (IRT) to a 2-level latent variable model in which covariates serve as predictors of the…
ERIC Educational Resources Information Center
Wong, Vivian C.; Steiner, Peter M.; Cook, Thomas D.
2013-01-01
In a traditional regression-discontinuity design (RDD), units are assigned to treatment on the basis of a cutoff score and a continuous assignment variable. The treatment effect is measured at a single cutoff location along the assignment variable. This article introduces the multivariate regression-discontinuity design (MRDD), where multiple…
Eng, K.; Milly, P.C.D.; Tasker, Gary D.
2007-01-01
To facilitate estimation of streamflow characteristics at an ungauged site, hydrologists often define a region of influence containing gauged sites hydrologically similar to the estimation site. This region can be defined either in geographic space or in the space of the variables that are used to predict streamflow (predictor variables). These approaches are complementary, and a combination of the two may be superior to either. Here we propose a hybrid region-of-influence (HRoI) regression method that combines the two approaches. The new method was applied with streamflow records from 1,091 gauges in the southeastern United States to estimate the 50-year peak flow (Q50). The HRoI approach yielded lower root-mean-square estimation errors and produced fewer extreme errors than either the predictor-variable or geographic region-of-influence approaches. It is concluded, for Q50 in the study region, that similarity with respect to the basin characteristics considered (area, slope, and annual precipitation) is important, but incomplete, and that the consideration of geographic proximity of stations provides a useful surrogate for characteristics that are not included in the analysis. © 2007 ASCE.
A faster optimization method based on support vector regression for aerodynamic problems
NASA Astrophysics Data System (ADS)
Yang, Xixiang; Zhang, Weihua
2013-09-01
In this paper, a new strategy for the optimal design of complex aerodynamic configurations with a reasonably low computational effort is proposed. In order to solve the formulated aerodynamic optimization problem with its heavy computational complexity, two steps are taken: (1) a sequential approximation method based on support vector regression (SVR) and a hybrid cross-validation strategy is proposed to predict aerodynamic coefficients, and thus approximate the objective function and constraint conditions of the originally formulated optimization problem with the given limited sample points; (2) a sequential optimization algorithm is proposed to ensure that the optimal solution obtained by solving the approximate optimization problem in step (1) is very close to the optimal solution of the originally formulated optimization problem. In the end, we adopt a complex aerodynamic design problem, namely the optimal aerodynamic design of a flight vehicle with grid fins, to demonstrate the proposed optimization methods; numerical results show that better results can be obtained with a significantly lower computational effort than using classical optimization techniques.
NASA Astrophysics Data System (ADS)
Kügler, S. D.; Polsterer, K.; Hoecker, M.
2015-04-01
Context. In astronomy, new approaches to process and analyze the exponentially increasing amount of data are inevitable. For spectra, such as in the Sloan Digital Sky Survey spectral database, usually templates of well-known classes are used for classification. In case the fitting of a template fails, wrong spectral properties (e.g. redshift) are derived. Validation of the derived properties is the key to understand the caveats of the template-based method. Aims: In this paper we present a method for statistically computing the redshift z based on a similarity approach. This allows us to determine redshifts in spectra for emission and absorption features without using any predefined model. Additionally, we show how to determine the redshift based on single features. As a consequence we are, for example, able to filter objects that show multiple redshift components. Methods: The redshift calculation is performed by comparing predefined regions in the spectra and individually applying a nearest neighbor regression model to each predefined emission and absorption region. Results: The choice of the model parameters controls the quality and the completeness of the redshifts. For ≈90% of the analyzed 16 000 spectra of our reference and test sample, a certain redshift can be computed that is comparable to the completeness of SDSS (96%). The redshift calculation yields a precision for every individually tested feature that is comparable to the overall precision of the redshifts of SDSS. Using the new method to compute redshifts, we could also identify 14 spectra with a significant shift between emission and absorption or between emission and emission lines. The results already show the immense power of this simple machine-learning approach for investigating huge databases such as the SDSS.
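The feature-wise nearest neighbor regression step can be sketched as follows. The feature vectors and shift values below are toy numbers, not SDSS measurements; the point is only how a prediction is formed from the k most similar reference spectra.

```python
def knn_regress(train_X, train_y, query, k=3):
    """Predict a continuous value as the mean over the k nearest neighbors
    (squared Euclidean distance in feature space)."""
    ranked = sorted(
        (sum((a - b) ** 2 for a, b in zip(x, query)), y)
        for x, y in zip(train_X, train_y)
    )
    return sum(y for _, y in ranked[:k]) / k

# Toy feature vectors (e.g. flux ratios around one spectral feature) with
# known shifts for reference objects.
train_X = [[0.0, 1.0], [0.1, 0.9], [1.0, 0.0],
           [0.9, 0.1], [0.5, 0.5], [0.45, 0.55]]
train_y = [0.010, 0.012, 0.300, 0.290, 0.150, 0.152]
z_hat = knn_regress(train_X, train_y, [0.05, 0.95], k=2)
```

Applying such a regressor independently per predefined emission or absorption region, as the paper does, is what makes it possible to flag objects whose regions disagree, e.g. spectra with multiple redshift components.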
Adaptive wavelet simulation of global ocean dynamics using a new Brinkman volume penalization
NASA Astrophysics Data System (ADS)
Kevlahan, N. K.-R.; Dubos, T.; Aechtner, M.
2015-12-01
In order to easily enforce solid-wall boundary conditions in the presence of complex coastlines, we propose a new mass and energy conserving Brinkman penalization for the rotating shallow water equations. This penalization does not lead to higher wave speeds in the solid region. The error estimates for the penalization are derived analytically and verified numerically for linearized one-dimensional equations. The penalization is implemented in a conservative dynamically adaptive wavelet method for the rotating shallow water equations on the sphere with bathymetry and coastline data from NOAA's ETOPO1 database. This code could form the dynamical core for a future global ocean model. The potential of the dynamically adaptive ocean model is illustrated by using it to simulate the 2004 Indonesian tsunami and wind-driven gyres.
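The core penalization idea can be illustrated in one dimension. This sketch only shows velocity relaxation inside a masked "solid" region for the linearized equations on a uniform grid; the paper's actual scheme is conservative, wavelet-adaptive, on the sphere, and also modifies the mass equation so that wave speeds are not increased in the solid, none of which is reproduced here.

```python
import numpy as np

# 1-D linearized shallow water with a Brinkman-type volume penalization:
# inside the solid mask the velocity is relaxed to zero on a fast time
# scale eta, mimicking a coastline without a boundary-fitted grid.
g, H = 9.81, 1.0                   # gravity, mean depth
nx, L = 400, 1.0
dx = L / nx
x = (np.arange(nx) + 0.5) * dx
solid = (x > 0.8).astype(float)    # "land" occupies x > 0.8
eta = 1e-3                         # penalization time scale
c = np.sqrt(g * H)
dt = 0.4 * dx / c

h = np.exp(-((x - 0.4) / 0.05) ** 2)   # Gaussian free-surface bump
u = np.zeros(nx)

for _ in range(500):
    # symplectic Euler: update u from old h (with an implicit penalization
    # step), then update h from the new u
    u = (u - dt * g * np.gradient(h, dx)) / (1.0 + dt * solid / eta)
    h = h - dt * H * np.gradient(u, dx)

max_solid_u = np.abs(u[solid == 1]).max()   # velocity inside the "land"
```

The implicit treatment of the penalization term keeps the step stable even though dt is much larger than eta; the residual velocity in the solid scales with eta, which is the sense in which the penalized solution approximates a solid wall.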
Dazard, Jean-Eudes; Choe, Michael; LeBlanc, Michael; Rao, J Sunil
2015-08-01
PRIMsrc is a novel implementation of a non-parametric bump hunting procedure, based on the Patient Rule Induction Method (PRIM), offering a unified treatment of outcome variables, including censored time-to-event (Survival), continuous (Regression) and discrete (Classification) responses. To fit the model, it uses a recursive peeling procedure with specific peeling criteria and stopping rules depending on the response. To validate the model, it provides an objective function based on prediction error or another response-specific statistic, as well as two alternative cross-validation techniques adapted to the task of decision-rule making and estimation in the three types of settings. PRIMsrc comes as an open-source R package, including at this point: (i) a main function for fitting a Survival Bump Hunting model, with various options allowing cross-validated model selection to control model size (#covariates) and model complexity (#peeling steps) and generation of cross-validated end-point estimates; (ii) parallel computing; and (iii) various S3-generic and specific plotting functions for data visualization, diagnostics, prediction, summary and display of results. It is available on CRAN and GitHub.
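The recursive peeling at the heart of PRIM can be sketched as follows. This is a bare-bones greedy peel for a continuous response, written from the general PRIM description rather than from the PRIMsrc code, so the peeling criterion, stopping rule, and data are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(2)

# Toy data: the response is elevated inside the "bump" x0 > 0.7 and x1 > 0.7.
X = rng.uniform(0, 1, size=(500, 2))
y = ((X[:, 0] > 0.7) & (X[:, 1] > 0.7)).astype(float)
y = y + rng.normal(0, 0.1, 500)

def prim_peel(X, y, alpha=0.05, min_support=0.05):
    """Greedy PRIM-style peeling: repeatedly remove the alpha-fraction slice
    (from either face of either variable) that maximizes the mean response
    of the points remaining in the box."""
    box = np.array([[0.0, 1.0], [0.0, 1.0]])   # [low, high] per variable
    inside = np.ones(len(y), dtype=bool)
    while inside.mean() > min_support:
        best = None
        for j in range(X.shape[1]):
            vals = X[inside, j]
            for side, keep in (("low", X[:, j] >= np.quantile(vals, alpha)),
                               ("high", X[:, j] <= np.quantile(vals, 1 - alpha))):
                trial = inside & keep
                if trial.any():
                    m = y[trial].mean()
                    if best is None or m > best[0]:
                        best = (m, j, side, trial)
        _, j, side, inside = best
        if side == "low":
            box[j, 0] = X[inside, j].min()
        else:
            box[j, 1] = X[inside, j].max()
    return box, inside

box, inside = prim_peel(X, y)
```

In PRIMsrc the analogous loop uses response-specific peeling criteria (e.g. a log-rank statistic for survival data), and the number of peeling steps is chosen by cross-validation rather than a fixed support threshold.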
NASA Astrophysics Data System (ADS)
Reddy, K. S.; Somasundharam, S.
2016-09-01
In this work, an inverse heat conduction problem (IHCP) involving the simultaneous estimation of the principal thermal conductivities (kxx, kyy, kzz) and specific heat capacity of orthotropic materials is solved by using a surrogate forward model. Uniformly distributed random samples for each unknown parameter are generated from prior knowledge about these parameters, and the Finite Volume Method (FVM) is employed to solve the forward problem for the temperature distribution in space and time. A supervised machine learning technique, Gaussian Process Regression (GPR), is used to construct the surrogate forward model from the available temperature solutions and the randomly generated parameter data. The Statistics and Machine Learning Toolbox in MATLAB R2015b is used for this purpose. The robustness of the surrogate model constructed using GPR is examined by carrying out the parameter estimation for 100 new randomly generated test samples at a measurement error of ±0.3 K. The temperature measurements are obtained by adding random noise, with zero mean and known standard deviation (σ = 0.1), to the FVM solution of the forward problem. The test results show that the Mean Percentage Deviation (MPD) of all test samples for all parameters is < 10%.
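The surrogate workflow (prior samples, forward solves, GPR fit, then inversion against a noisy measurement) can be sketched as below. Everything here is a stand-in: a closed-form toy replaces the FVM solver, only two parameters and three "sensor" outputs are used instead of the paper's four parameters, and the inversion is a cheap random search rather than a proper estimator.

```python
import numpy as np
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import RBF

rng = np.random.default_rng(3)

# Hypothetical forward model standing in for the FVM temperature solution
# as a function of the unknown parameters.
def forward_model(theta):
    kxx, cp = theta
    return np.array([np.exp(-kxx) * cp, kxx + 0.1 * cp, np.sqrt(kxx * cp)])

# Uniform prior samples of the unknown parameters -> GPR training data.
lo, hi = np.array([0.5, 1.0]), np.array([2.0, 3.0])
thetas = rng.uniform(lo, hi, size=(200, 2))
temps = np.array([forward_model(t) for t in thetas])
gpr = GaussianProcessRegressor(kernel=RBF(length_scale=1.0),
                               normalize_y=True).fit(thetas, temps)

# Estimate parameters by matching surrogate predictions to a noisy
# measurement (sigma = 0.1, as in the abstract), via random search.
theta_true = np.array([1.2, 2.0])
measured = forward_model(theta_true) + rng.normal(0, 0.1, 3)
cand = rng.uniform(lo, hi, size=(5000, 2))
pred = gpr.predict(cand)
theta_hat = cand[np.argmin(((pred - measured) ** 2).sum(axis=1))]
mpd = 100 * np.abs((theta_hat - theta_true) / theta_true).mean()
```

The payoff is the same as in the paper: after training, each candidate parameter set is evaluated through `gpr.predict` rather than a full forward solve, so the inverse problem becomes cheap to explore.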
Race Making in a Penal Institution.
Walker, Michael L
2016-01-01
This article provides a ground-level investigation into the lives of penal inmates, linking the literature on race making and penal management to provide an understanding of racial formation processes in a modern penal institution. Drawing on 135 days of ethnographic data collected as an inmate in a Southern California county jail system, the author argues that inmates are subjected to two mutually constitutive racial projects--one institutional and the other microinteractional. Operating in symbiosis within a narrative of risk management, these racial projects increase (rather than decrease) incidents of intraracial violence and the potential for interracial violence. These findings have implications for understanding the process of racialization and evaluating the effectiveness of penal management strategies.
Eliseyev, Andrey; Aksenova, Tetiana
2016-01-01
In the current paper, decoding algorithms for motor-related BCI systems for continuous upper-limb trajectory prediction are considered. Two methods for smooth prediction, namely Sobolev and Polynomial Penalized Multi-Way Partial Least Squares (PLS) regressions, are proposed. The methods are compared to the Multi-Way Partial Least Squares and Kalman Filter approaches. The comparison demonstrated that the proposed methods combine the prediction accuracy of the algorithms of the PLS family with the trajectory smoothness of the Kalman Filter. In addition, the prediction delay is significantly lower for the proposed algorithms than for the Kalman Filter approach. The proposed methods could be applied in a wide range of applications beyond neuroscience. PMID:27196417
Domain selection for the varying coefficient model via local polynomial regression
Kong, Dehan; Bondell, Howard; Wu, Yichao
2014-01-01
In this article, we consider the varying coefficient model, which allows the relationship between the predictors and the response to vary across the domain of interest, such as time. In applications, it is possible that certain predictors only affect the response in particular regions and not everywhere. This corresponds to identifying the domain where the varying coefficient is nonzero. Towards this goal, local polynomial smoothing and penalized regression are incorporated into one framework. Asymptotic properties of our penalized estimators are provided. Specifically, the estimators enjoy the oracle property in the sense that they have the same bias and asymptotic variance as the local polynomial estimators obtained as if the sparsity were known a priori. The choice of an appropriate bandwidth and computational algorithms are discussed. The proposed method is examined via simulations and a real data example. PMID:25506112
Revisiting the Distance Duality Relation using a non-parametric regression method
NASA Astrophysics Data System (ADS)
Rana, Akshay; Jain, Deepak; Mahajan, Shobhit; Mukherjee, Amitabha
2016-07-01
The interdependence of the luminosity distance DL and the angular diameter distance DA, given by the distance duality relation (DDR), is very significant in observational cosmology. It is closely tied to the temperature-redshift relation of the Cosmic Microwave Background (CMB) radiation. Any deviation from η(z) ≡ DL/[DA(1+z)²] = 1 indicates a possible emergence of new physics. Our aim in this work is to check the consistency of these relations using a non-parametric regression method, namely LOESS with SIMEX. This technique avoids dependence on a cosmological model and works with a minimal set of assumptions. Further, to analyze the efficiency of the methodology, we simulate a dataset of 020 points of η(z) data based on the phenomenological model η(z) = (1+z)^ε. The error on the simulated data points is obtained by using the temperature of the CMB radiation at various redshifts. For testing the distance duality relation, we use the JLA SNe Ia data for luminosity distances, while the angular diameter distances are obtained from radio galaxy datasets. Since the DDR is linked with the CMB temperature-redshift relation, we also use the CMB temperature data to reconstruct η(z). It is important to note that with the CMB data we are able to study the evolution of the DDR up to a very high redshift, z = 2.418. In this analysis, we find no evidence of deviation from η = 1 within the 1σ region over the entire redshift range used (0 < z ≤ 2.418).
A primer on regression methods for decoding cis-regulatory logic
Das, Debopriya; Pellegrini, Matteo; Gray, Joe W.
2009-03-03
The rapidly emerging field of systems biology is helping us to understand the molecular determinants of phenotype on a genomic scale [1]. Cis-regulatory elements are major sequence-based determinants of biological processes in cells and tissues [2]. For instance, during transcriptional regulation, transcription factors (TFs) bind to very specific regions on the promoter DNA [2,3] and recruit the basal transcriptional machinery, which ultimately initiates mRNA transcription (Figure 1A). Learning cis-Regulatory Elements from Omics Data A vast amount of work over the past decade has shown that omics data can be used to learn cis-regulatory logic on a genome-wide scale [4-6]--in particular, by integrating sequence data with mRNA expression profiles. The most popular approach has been to identify over-represented motifs in promoters of genes that are coexpressed [4,7,8]. Though widely used, such an approach can be limiting for a variety of reasons. First, the combinatorial nature of gene regulation is difficult to explicitly model in this framework. Moreover, in many applications of this approach, expression data from multiple conditions are necessary to obtain reliable predictions. This can potentially limit the use of this method to only large data sets [9]. Although these methods can be adapted to analyze mRNA expression data from a pair of biological conditions, such comparisons are often confounded by the fact that primary and secondary response genes are clustered together--whereas only the primary response genes are expected to contain the functional motifs [10]. A set of approaches based on regression has been developed to overcome the above limitations [11-32]. These approaches have their foundations in certain biophysical aspects of gene regulation [26,33-35]. That is, the models are motivated by the expected transcriptional response of genes due to the binding of TFs to their promoters. While such methods have gathered popularity in the computational domain
ERIC Educational Resources Information Center
Wong, Vivian C.; Steiner, Peter M.; Cook, Thomas D.
2012-01-01
In a traditional regression-discontinuity design (RDD), units are assigned to treatment and comparison conditions solely on the basis of a single cutoff score on a continuous assignment variable. The discontinuity in the functional form of the outcome at the cutoff represents the treatment effect, or the average treatment effect at the cutoff.…
ERIC Educational Resources Information Center
Ferrer, Alvaro J. Arce; Wang, Lin
This study compared the classification performance among parametric discriminant analysis, nonparametric discriminant analysis, and logistic regression in a two-group classification application. Field data from an organizational survey were analyzed and bootstrapped for additional exploration. The data were observed to depart from multivariate…
Regression Methods for Categorical Dependent Variables: Effects on a Model of Student College Choice
ERIC Educational Resources Information Center
Rapp, Kelly E.
2012-01-01
The use of categorical dependent variables with the classical linear regression model (CLRM) violates many of the model's assumptions and may result in biased estimates (Long, 1997; O'Connell, Goldstein, Rogers, & Peng, 2008). Many dependent variables of interest to educational researchers (e.g., professorial rank, educational attainment) are…
Correcting Measurement Error in Latent Regression Covariates via the MC-SIMEX Method
ERIC Educational Resources Information Center
Rutkowski, Leslie; Zhou, Yan
2015-01-01
Given the importance of large-scale assessments to educational policy conversations, it is critical that subpopulation achievement is estimated reliably and with sufficient precision. Despite this importance, biased subpopulation estimates have been found to occur when variables in the conditioning model side of a latent regression model contain…
Sample Size Determination for Regression Models Using Monte Carlo Methods in R
ERIC Educational Resources Information Center
Beaujean, A. Alexander
2014-01-01
A common question asked by researchers using regression models is, What sample size is needed for my study? While there are formulae to estimate sample sizes, their assumptions are often not met in the collected data. A more realistic approach to sample size determination requires more information such as the model of interest, strength of the…
Using regression methods to estimate stream phosphorus loads at the Illinois River, Arkansas
Haggard, B.E.; Soerens, T.S.; Green, W.R.; Richards, R.P.
2003-01-01
The development of total maximum daily loads (TMDLs) requires evaluating existing constituent loads in streams. Accurate estimates of constituent loads are needed to calibrate watershed and reservoir models for TMDL development. The best approach to estimating constituent loads is high-frequency sampling, particularly during storm events, and mass integration of constituents passing a point in a stream. Most often, resources are limited, and discrete water quality samples are collected at fixed intervals, sometimes supplemented with directed sampling during storm events. When resources are limited, mass integration is not an accurate means of determining constituent loads, and other load estimation techniques such as regression models are used. The objective of this work was to determine the minimum number of water-quality samples needed to provide constituent concentration data adequate to estimate constituent loads in a large stream. Twenty sets of water quality samples, with and without supplemental storm samples, were randomly selected at various fixed intervals from a database for the Illinois River, northwest Arkansas. The random sets were used to estimate total phosphorus (TP) loads using regression models. The regression-based annual TP loads were compared to the integrated annual TP load estimated using all the data. At a minimum, monthly sampling plus supplemental storm samples (six samples per year) was needed to produce a root mean square error of less than 15%. Water quality samples should be collected at least semi-monthly (every 15 days) in studies shorter than two years if seasonal time factors are to be used in the regression models. Annual TP loads estimated from independently collected discrete water quality samples further demonstrated the utility of using regression models to estimate annual TP loads in this stream system.
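The regression-model approach, fit a load-discharge relation to the sparse samples, then predict loads for every day of record, can be sketched as below. The synthetic record, the power-law concentration-discharge relation, and the use of a smearing-type retransformation correction are all assumptions of this sketch, not details from the study.

```python
import numpy as np

rng = np.random.default_rng(7)

# Synthetic two-year record: daily discharge Q with an assumed power-law
# TP load-discharge relation; 24 sampled days (~semi-monthly) play the
# role of the discrete water-quality samples.
n_days = 730
Q = np.exp(rng.normal(2.0, 0.8, n_days))       # daily discharge
load_true = 0.05 * Q ** 1.4                    # daily TP load
sampled = rng.choice(n_days, size=24, replace=False)

# Log-log regression of load on discharge, fitted to the samples only.
ln_load = np.log(load_true[sampled]) + rng.normal(0, 0.1, 24)  # obs. error
ln_Q = np.log(Q[sampled])
b1, b0 = np.polyfit(ln_Q, ln_load, 1)
resid = ln_load - (b0 + b1 * ln_Q)

# Predict daily loads for the full record, with a smearing-type bias
# correction for the retransformation out of log space.
load_hat = np.exp(resid).mean() * np.exp(b0 + b1 * np.log(Q))
rel_err = abs(load_hat.sum() - load_true.sum()) / load_true.sum()
```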
NASA Astrophysics Data System (ADS)
Ji, Yanju; Huang, Wanyu; Yu, Mingmei; Guan, Shanshan; Wang, Yuan; Zhu, Yu
2017-01-01
This article studies a full-waveform associated identification method for airborne time-domain electromagnetic (ATEM) 3-D anomalies based on multiple linear regression analysis. Using a convolution algorithm, full-waveform theoretical responses are computed to build a sample library comprising both switch-off-time and off-time responses. Full-waveform attributes are extracted from the theoretical responses to derive linear regression equations, which are used to identify the geological parameters. To further improve precision, we optimize the identification method by separating the sample library into groups and identifying each parameter separately. Tests of the method on field data from wire-loop experiments with an ATEM system at Daedao, Changchun, show that the full-waveform associated identification method is practically feasible.
Technology Transfer Automated Retrieval System (TEKTRAN)
In multivariate regression analysis of spectroscopy data, spectral preprocessing is often performed to reduce unwanted background information (offsets, sloped baselines) or accentuate absorption features in intrinsically overlapping bands. These procedures, also known as pretreatments, are commonly ...
NASA Astrophysics Data System (ADS)
Boucher, Thomas F.; Ozanne, Marie V.; Carmosino, Marco L.; Dyar, M. Darby; Mahadevan, Sridhar; Breves, Elly A.; Lepore, Kate H.; Clegg, Samuel M.
2015-05-01
The ChemCam instrument on the Mars Curiosity rover is generating thousands of LIBS spectra and bringing interest in this technique to public attention. The key to interpreting Mars or any other type of LIBS data is calibrations that relate laboratory standards to unknowns examined in other settings and enable predictions of chemical composition. Here, LIBS spectral data are analyzed using linear regression methods including partial least squares (PLS-1 and PLS-2), principal component regression (PCR), least absolute shrinkage and selection operator (lasso), elastic net, and linear support vector regression (SVR-Lin). These were compared against results from nonlinear regression methods including kernel principal component regression (K-PCR), polynomial kernel support vector regression (SVR-Py) and k-nearest neighbor (kNN) regression to discern the most effective models for interpreting chemical abundances from LIBS spectra of geological samples. The results were evaluated for 100 samples analyzed with 50 laser pulses at each of five locations averaged together. Wilcoxon signed-rank tests were employed to evaluate the statistical significance of differences among the nine models, using their predicted residual sum of squares (PRESS) to make comparisons. For MgO, SiO2, Fe2O3, CaO, and MnO, the sparse models outperform all the others except for linear SVR, while for Na2O, K2O, TiO2, and P2O5, the sparse methods produce inferior results, likely because their emission lines in this energy range have lower transition probabilities. The strong performance of the sparse methods in this study suggests that use of dimensionality-reduction techniques as a preprocessing step may improve the performance of the linear models. Nonlinear methods tend to overfit the data and predict less accurately, while the linear methods proved to be more generalizable, with better predictive performance. These results are attributed to the high dimensionality of the data (6144 channels).
Model Averaging Methods for Weight Trimming in Generalized Linear Regression Models.
Elliott, Michael R
2009-03-01
In sample surveys where units have unequal probabilities of inclusion, associations between the inclusion probability and the statistic of interest can induce bias in unweighted estimates. This is true even in regression models, where the estimates of the population slope may be biased if the underlying mean model is misspecified or the sampling is nonignorable. Weights equal to the inverse of the probability of inclusion are often used to counteract this bias. Highly disproportional sample designs have highly variable weights; weight trimming reduces large weights to a maximum value, reducing variability but introducing bias. Most standard approaches are ad hoc in that they do not use the data to optimize bias-variance trade-offs. This article uses Bayesian model averaging to create "data driven" weight trimming estimators. We extend previous results for linear regression models (Elliott 2008) to generalized linear regression models, developing robust models that approximate fully-weighted estimators when bias correction is of greatest importance, and approximate unweighted estimators when variance reduction is critical.
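The basic trade-off the article formalizes, capping extreme weights reduces variance at the cost of some bias, can be sketched with the simplest ad hoc trimming estimator. The survey design and the 95th-percentile cap below are invented for illustration; the article's contribution is to let Bayesian model averaging choose this trade-off from the data rather than fixing a cap in advance.

```python
import numpy as np

rng = np.random.default_rng(8)

# Toy disproportional survey sample: inclusion probabilities vary 20-fold,
# so the inverse-probability weights are highly variable.
n = 1000
p_incl = rng.uniform(0.01, 0.2, n)
w = 1.0 / p_incl                 # design weights
y = rng.normal(5.0, 1.0, n)     # outcome, unrelated to inclusion here

def trimmed_mean(y, w, w_max):
    """Ad hoc weight trimming: cap weights at w_max before estimating."""
    return np.average(y, weights=np.minimum(w, w_max))

full = np.average(y, weights=w)                       # fully weighted
trimmed = trimmed_mean(y, w, w_max=np.quantile(w, 0.95))
```

With the outcome unrelated to inclusion (as here), trimming costs nothing and cuts variance; when the mean model is misspecified or sampling is nonignorable, the cap introduces bias, which is exactly the regime where a data-driven choice of `w_max` matters.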
Korany, Mohamed A; Maher, Hadir M; Galal, Shereen M; Ragab, Marwa A A
2013-05-01
This manuscript discusses the application of, and comparison between, three statistical regression methods for handling data: parametric, nonparametric, and weighted regression (WR). These data were obtained from different chemometric methods applied to high-performance liquid chromatography response data using the internal standard method. This was performed on a model drug, Acyclovir, which was analyzed in human plasma with ganciclovir as internal standard. An in vivo study was also performed. Derivative treatment of the chromatographic response ratio data was followed by convolution of the resulting derivative curves using 8-point sin x_i polynomials (discrete Fourier functions). This work studies and compares the application of the WR method and Theil's method, a nonparametric regression (NPR) method, with the least squares parametric regression (LSPR) method, which is considered the de facto standard method for regression. When the assumption of homoscedasticity is not met for analytical data, a simple and effective way to counteract the great influence of the high concentrations on the fitted regression line is to use the WR method. WR was found to be superior to LSPR, as the former assumes that the y-direction error in the calibration curve increases as x increases. Theil's NPR method was also found to be superior to LSPR, as the former allows for errors in both the x- and y-directions and does not require them to be normally distributed. Most of the results showed a significant improvement in precision and accuracy on applying the WR and NPR methods relative to LSPR.
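The three regression variants being compared can be sketched side by side on a heteroscedastic calibration curve; the concentration range, error model, and weight choice below are illustrative assumptions.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(9)

# Heteroscedastic calibration data: the error grows with concentration x,
# violating the homoscedasticity assumed by LSPR.
x = np.linspace(1, 100, 20)
y = 2.0 * x + 1.0 + rng.normal(0, 0.05 * x)    # sd proportional to x

# LSPR: ordinary least squares.
b1_ols, b0_ols = np.polyfit(x, y, 1)

# WR: weights w_i = 1/sigma_i (here sigma_i is proportional to x) so the
# noisy high concentrations no longer dominate the fit.
b1_wr, b0_wr = np.polyfit(x, y, 1, w=1.0 / x)

# NPR: Theil's method, the median of all pairwise slopes.
b1_theil, b0_theil, lo_slope, hi_slope = stats.theilslopes(y, x)
```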
A method to determine the necessity for global signal regression in resting-state fMRI studies.
Chen, Gang; Chen, Guangyu; Xie, Chunming; Ward, B Douglas; Li, Wenjun; Antuono, Piero; Li, Shi-Jiang
2012-12-01
In resting-state functional MRI studies, the global signal (operationally defined as the global average of resting-state functional MRI time courses) is often considered a nuisance effect and commonly removed in preprocessing. This global signal regression method can introduce artifacts, such as false anticorrelated resting-state networks in functional connectivity analyses. Therefore, the efficacy of this technique as a correction tool remains questionable. In this article, we establish that the accuracy of the estimated global signal is determined by the level of global noise (i.e., non-neural noise that has a global effect on the resting-state functional MRI signal). When the global noise level is low, the global signal resembles the resting-state functional MRI time courses of the largest cluster, but not those of the global noise. Using real data, we demonstrate that the global signal is strongly correlated with the default mode network components and has biological significance. These results call into question whether or not global signal regression should be applied. We introduce a method to quantify global noise levels. We show that a criterion for global signal regression can be found based on this method. By using the criterion, one can determine whether to include or exclude global signal regression so as to minimize errors in functional connectivity measures.
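The operation under debate, global signal regression itself, is a simple projection and can be sketched as follows; the toy data dimensions and noise mixture are invented for illustration.

```python
import numpy as np

rng = np.random.default_rng(10)

# Toy resting-state data: 50 voxel time courses of length 200 sharing a
# common global fluctuation ("global noise") plus voxel-specific noise.
T, V = 200, 50
global_noise = rng.normal(size=T)
data = 0.8 * global_noise[:, None] + rng.normal(size=(T, V))

# Operational global signal: the spatial average of all time courses.
gs = data.mean(axis=1)

# Global signal regression: project every voxel time course onto the
# orthogonal complement of the (demeaned) global signal.
g = gs - gs.mean()
beta = data.T @ g / (g @ g)          # per-voxel regression coefficient
cleaned = data - np.outer(g, beta)

# After GSR every voxel is numerically uncorrelated with the global signal;
# forcing this zero-sum structure is what can create spurious
# anticorrelations between networks downstream.
corrs = np.array([np.corrcoef(cleaned[:, v], gs)[0, 1] for v in range(V)])
```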
Analysis of Genome-Wide Association Studies with Multiple Outcomes Using Penalization
Liu, Jin; Huang, Jian; Ma, Shuangge
2012-01-01
Genome-wide association studies have been extensively conducted, searching for markers for biologically meaningful outcomes and phenotypes. Penalization methods have been adopted in the analysis of the joint effects of a large number of SNPs (single nucleotide polymorphisms) and marker identification. This study is partly motivated by the analysis of heterogeneous stock mice dataset, in which multiple correlated phenotypes and a large number of SNPs are available. Existing penalization methods designed to analyze a single response variable cannot accommodate the correlation among multiple response variables. With multiple response variables sharing the same set of markers, joint modeling is first employed to accommodate the correlation. The group Lasso approach is adopted to select markers associated with all the outcome variables. An efficient computational algorithm is developed. Simulation study and analysis of the heterogeneous stock mice dataset show that the proposed method can outperform existing penalization methods. PMID:23272092
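The key idea, a group penalty that selects a SNP for all correlated outcomes jointly, can be sketched with scikit-learn's `MultiTaskLasso`, whose L2,1 penalty plays the role of the group Lasso across responses; the genotype simulation, effect sizes, and penalty level below are invented for illustration and are not the authors' algorithm.

```python
import numpy as np
from sklearn.linear_model import MultiTaskLasso

rng = np.random.default_rng(11)

# Toy GWAS-like data: 200 individuals x 100 SNPs (coded 0/1/2) and 3
# correlated phenotypes driven by the same 5 causal SNPs.
n, p, q = 200, 100, 3
X = rng.integers(0, 3, size=(n, p)).astype(float)
B = np.zeros((p, q))
B[:5, :] = rng.normal(1.0, 0.2, size=(5, q))    # shared causal effects
Y = X @ B + rng.normal(0, 1.0, size=(n, q))

# L2,1 penalty: each SNP's coefficient vector across all outcomes is kept
# or zeroed out as a group, mirroring joint marker selection.
model = MultiTaskLasso(alpha=0.5).fit(X, Y)
selected = np.where(np.linalg.norm(model.coef_, axis=0) > 1e-8)[0]
```

A per-outcome lasso run three times could select different SNPs for each phenotype; the group structure is what enforces a single marker list shared by all outcomes.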
Goodenough, Anne E.; Hart, Adam G.; Stafford, Richard
2012-01-01
Despite recent papers on problems associated with full-model and stepwise regression, their use is still common throughout ecological and environmental disciplines. Alternative approaches, including generating multiple models and comparing them post-hoc using techniques such as Akaike's Information Criterion (AIC), are becoming more popular. However, these are problematic when there are numerous independent variables and interpretation is often difficult when competing models contain many different variables and combinations of variables. Here, we detail a new approach, REVS (Regression with Empirical Variable Selection), which uses all-subsets regression to quantify empirical support for every independent variable. A series of models is created; the first containing the variable with most empirical support, the second containing the first variable and the next most-supported, and so on. The comparatively small number of resultant models (n = the number of predictor variables) means that post-hoc comparison is comparatively quick and easy. When tested on a real dataset – habitat and offspring quality in the great tit (Parus major) – the optimal REVS model explained more variance (higher R2), was more parsimonious (lower AIC), and had greater significance (lower P values), than full, stepwise or all-subsets models; it also had higher predictive accuracy based on split-sample validation. Testing REVS on ten further datasets suggested that this is typical, with R2 values being higher than full or stepwise models (mean improvement = 31% and 7%, respectively). Results are ecologically intuitive as even when there are several competing models, they share a set of “core” variables and differ only in presence/absence of one or two additional variables. We conclude that REVS is useful for analysing complex datasets, including those in ecology and environmental disciplines. PMID:22479605
Patching rainfall data using regression methods. 3. Grouping, patching and outlier detection
NASA Astrophysics Data System (ADS)
Pegram, Geoffrey
1997-11-01
Rainfall data are used, amongst other things, for augmenting or repairing streamflow records in a water resources analysis environment. Gaps in rainfall records cause problems in the construction of water-balance models using monthly time-steps, when it becomes necessary to estimate missing values. Modest extensions are sometimes also desirable. It is also important to identify outliers as possible erroneous data and to group data which are hydrologically similar in order to accomplish good patching. Algorithms are described which accomplish these tasks using the covariance biplot, multiple linear regression, singular value decomposition and the pseudo-Expectation-Maximization algorithm.
Kolasa-Wiecek, Alicja
2015-04-01
The energy sector in Poland is the source of 81% of greenhouse gas (GHG) emissions. Poland, among other European Union countries, occupies a leading position with regard to coal consumption. The Polish energy sector actively participates in efforts to reduce GHG emissions to the atmosphere through a gradual decrease of the share of coal in the fuel mix and the development of renewable energy sources. All evidence which completes the knowledge about issues related to GHG emissions is a valuable source of information. The article presents the results of modeling the GHG emissions generated by the energy sector in Poland. For a better understanding of the quantitative relationship between total consumption of primary energy and greenhouse gas emissions, a multiple stepwise regression model was applied. The modeling results for CO2 emissions demonstrate a strong relationship (0.97) with the hard coal consumption variable. The model's fit to the actual data is high, with a coefficient of determination of 95%. The backward stepwise regression model for CH4 emissions indicated hard coal (0.66), peat and fuel wood (0.34), and solid waste fuels, as well as other sources (-0.64), as the most important variables. The adjusted coefficient of determination is adequate, at R² = 0.90. For N2O emission modeling, the obtained coefficient of determination is low, at 43%. A significant variable influencing the amount of N2O emissions is peat and wood fuel consumption.
Quirós, Elia; Felicísimo, Ángel M.; Cuartero, Aurora
2009-01-01
This work proposes a new method to classify multi-spectral satellite images based on multivariate adaptive regression splines (MARS) and compares this classification system with the more common parallelepiped and maximum likelihood (ML) methods. We apply the classification methods to the land cover classification of a test zone located in southwestern Spain. The basis of the MARS method and its associated procedures are explained in detail, and the area under the ROC curve (AUC) is compared for the three methods. The results show that the MARS method provides better results than the parallelepiped method in all cases, and it provides better results than the maximum likelihood method in 13 cases out of 17. These results demonstrate that the MARS method can be used in isolation or in combination with other methods to improve the accuracy of soil cover classification. The improvement is statistically significant according to the Wilcoxon signed rank test. PMID:22291550
Sentürk, Damla; Dalrymple, Lorien S; Mu, Yi; Nguyen, Danh V
2014-11-10
We propose a new weighted hurdle regression method for modeling count data, with particular interest in modeling cardiovascular events in patients on dialysis. Cardiovascular disease remains one of the leading causes of hospitalization and death in this population. Our aim is to jointly model the association between covariates and (i) the probability of cardiovascular events, a binary process, and (ii) the rate of events once the realization is positive (when the 'hurdle' is crossed), using a zero-truncated Poisson distribution. When the observation period or follow-up time from the start of dialysis varies among individuals, the estimated probability of positive cardiovascular events during the study period will be biased. Furthermore, when the model contains covariates, the estimated relationship between the covariates and the probability of cardiovascular events will also be biased. These challenges are addressed with the proposed weighted hurdle regression method. Estimation for the weighted hurdle regression model is a weighted likelihood approach, where standard maximum likelihood estimation can be utilized. The method is illustrated with data from the United States Renal Data System. Simulation studies show the ability of the proposed method to successfully adjust for differential follow-up times and incorporate the effects of covariates in the weighting.
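The two model components above have simple closed forms: the zero-truncated Poisson pmf for the positive counts, and a hurdle-crossing probability that grows with follow-up time (the dependence that motivates the weighting). An illustrative sketch, not the authors' weighted-likelihood code:

```python
import math

def ztp_pmf(k, lam):
    """P(Y = k | Y > 0) for a zero-truncated Poisson with rate lam (k >= 1)."""
    if k < 1:
        return 0.0
    return math.exp(-lam) * lam ** k / (math.factorial(k) * (1.0 - math.exp(-lam)))

def p_any_event(rate, followup):
    """Probability of crossing the 'hurdle' (>= 1 event) for an individual
    observed for `followup` time units at a constant event rate -- longer
    follow-up mechanically raises this probability, hence the bias."""
    return 1.0 - math.exp(-rate * followup)
```

The sketch makes the bias concrete: two patients with the same true rate but different follow-up times have different hurdle-crossing probabilities.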
NASA Astrophysics Data System (ADS)
Katpatal, Y. B.; Paranjpe, S. V.; Kadu, M.
2014-12-01
Effective watershed management requires authentic data on surface runoff potential, for which several methods and models are in use. Generally, the non-availability of field data calls for techniques based on remote observations. The Soil Conservation Service Curve Number (SCS CN) method is an important method that utilizes information generated from remote sensing for estimation of runoff. Several attempts have been made to validate the runoff values generated from the SCS CN method by comparing them with results obtained from other methods. In the present study, runoff estimation through the SCS CN method has been performed using IRS LISS IV data for the Venna Basin, situated in Central India, for which field data were available. The land use/land cover and soil layers have been generated for the entire watershed using the satellite data and a Geographic Information System (GIS). The Venna Basin has been divided into an intercepted catchment and a free catchment. Runoff values have been estimated using field data through regression analysis. The runoff values estimated using the SCS CN method have been compared with yield values generated using data collected from the tank gauge stations and data from the discharge stations. The correlation helps in validation of the results obtained from the SCS CN method and its applicability in Indian conditions. Key Words: SCS CN Method, Regression Analysis, Land Use/Land Cover, Runoff, Remote Sensing, GIS.
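The SCS CN runoff relation itself is standard and compact; a sketch in SI units (mm), independent of the basin data used in the study:

```python
def scs_runoff(p_mm, cn):
    """SCS Curve Number direct runoff Q for storm rainfall P (both in mm).

    Potential maximum retention S = 25400/CN - 254 (mm);
    initial abstraction Ia = 0.2*S;
    Q = (P - Ia)^2 / (P + 0.8*S) when P > Ia, else 0.
    """
    s = 25400.0 / cn - 254.0
    ia = 0.2 * s
    if p_mm <= ia:
        return 0.0
    return (p_mm - ia) ** 2 / (p_mm + 0.8 * s)
```

A higher curve number (more impervious land cover, as mapped from the satellite data) yields more runoff for the same storm, which is why the land use/land cover layer drives the estimate.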
NASA Astrophysics Data System (ADS)
Cheng, Anyu; Jiang, Xiao; Li, Yongfu; Zhang, Chao; Zhu, Hao
2017-01-01
This study proposes a multiple-source, multiple-measure traffic flow prediction algorithm using chaos theory and the support vector regression method. In particular, first, the chaotic characteristics of traffic flow associated with speed, occupancy, and flow are identified using the maximum Lyapunov exponent. Then, the phase spaces of the multiple-measure chaotic time series are reconstructed based on phase space reconstruction theory and fused into the same multi-dimensional phase space using Bayesian estimation theory. In addition, a support vector regression (SVR) model is designed to predict the traffic flow. Numerical experiments are performed using data from multiple sources. The results show that, compared with a single measure, the proposed method has better performance for short-term traffic flow prediction in terms of accuracy and timeliness.
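The phase space reconstruction step relies on standard time-delay embedding; a minimal sketch (in practice the delay tau and dimension m are chosen from the data, e.g. via mutual information and false nearest neighbours, not fixed as here):

```python
def delay_embed(series, m, tau):
    """Reconstruct an m-dimensional phase space from a scalar series using
    delay coordinates: x_i = (s_i, s_{i+tau}, ..., s_{i+(m-1)tau})."""
    n = len(series) - (m - 1) * tau
    return [tuple(series[i + j * tau] for j in range(m)) for i in range(n)]
```

Each measure (speed, occupancy, flow) would be embedded this way before the reconstructed spaces are fused and passed to the SVR predictor.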
Anderson, Weston; Guikema, Seth; Zaitchik, Ben; Pan, William
2014-01-01
Obtaining accurate small area estimates of population is essential for policy and health planning but is often difficult in countries with limited data. In lieu of available population data, small area estimate models draw information from previous time periods or from similar areas. This study focuses on model-based methods for estimating population when no direct samples are available in the area of interest. To explore the efficacy of tree-based models for estimating population density, we compare six different model structures including Random Forest and Bayesian Additive Regression Trees. Results demonstrate that without information from prior time periods, non-parametric tree-based models produced more accurate predictions than did conventional regression methods. Improving estimates of population density in non-sampled areas is important for regions with incomplete census data and has implications for economic, health and development policies.
Doran, Kara S.; Howd, Peter A.; Sallenger, Asbury H.
2016-01-04
Recent studies, and most of their predecessors, use tide gage data to quantify sea-level (SL) acceleration, ASL(t). In the current study, three techniques were used to calculate acceleration from tide gage data; of those examined, it was determined that the two techniques based on sliding a regression window through the time series are more robust than the technique that fits a single quadratic form to the entire time series, particularly if there is temporal variation in the magnitude of the acceleration. The single-fit quadratic regression method has been the most commonly used technique for determining acceleration in tide gage data. The inability of the single-fit method to account for time-varying acceleration may explain some of the inconsistent findings between investigators. Properly quantifying ASL(t) from field measurements is of particular importance in evaluating numerical models of past, present, and future sea-level rise (SLR) resulting from anticipated climate change.
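The sliding-window idea can be sketched directly: fit a quadratic h(t) = a0 + a1*t + a2*t^2 by least squares inside each window and read off the acceleration 2*a2. An illustrative pure-Python version, not the authors' code:

```python
def _solve3(A, b):
    # Gaussian elimination with partial pivoting for a 3x3 system
    M = [row[:] + [b[i]] for i, row in enumerate(A)]
    for col in range(3):
        piv = max(range(col, 3), key=lambda r: abs(M[r][col]))
        M[col], M[piv] = M[piv], M[col]
        for r in range(col + 1, 3):
            f = M[r][col] / M[col][col]
            for c in range(col, 4):
                M[r][c] -= f * M[col][c]
    x = [0.0] * 3
    for r in range(2, -1, -1):
        x[r] = (M[r][3] - sum(M[r][c] * x[c] for c in range(r + 1, 3))) / M[r][r]
    return x

def quadratic_accel(t, y):
    """Least-squares fit y = a0 + a1*s + a2*s^2 with s = t - mean(t)
    (centering improves conditioning; acceleration 2*a2 is shift-invariant)."""
    mt = sum(t) / len(t)
    s = [ti - mt for ti in t]
    S = lambda p: sum(si ** p for si in s)
    Sy = lambda p: sum((si ** p) * yi for si, yi in zip(s, y))
    A = [[S(0), S(1), S(2)], [S(1), S(2), S(3)], [S(2), S(3), S(4)]]
    _, _, a2 = _solve3(A, [Sy(0), Sy(1), Sy(2)])
    return 2.0 * a2

def sliding_accel(t, y, window):
    """Acceleration estimates from a regression window slid through the series."""
    return [quadratic_accel(t[i:i + window], y[i:i + window])
            for i in range(len(t) - window + 1)]
```

For a series with constant curvature the windowed and single-fit estimates agree; they diverge exactly when the acceleration varies in time, which is the robustness advantage reported above.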
Environmental Conditions in Kentucky's Penal Institutions
ERIC Educational Resources Information Center
Bell, Irving
1974-01-01
A state task force was organized to identify health or environmental deficiencies existing in Kentucky penal institutions. Based on information gained through direct observation and inmate questionnaires, the task force concluded that many hazardous and unsanitary conditions existed, and recommended that immediate action be given to these…
NCAA Penalizes Fewer Teams than Expected
ERIC Educational Resources Information Center
Sander, Libby
2008-01-01
This article reports that the National Collegiate Athletic Association (NCAA) has penalized fewer teams than it expected this year over athletes' poor academic performance. For years, officials with the NCAA have predicted that strikingly high numbers of college sports teams could be at risk of losing scholarships this year because of their…
Rezaei, B; Khayamian, T; Mokhtari, A
2009-02-20
A flow injection chemiluminescent (FI-CL) method has been developed for the simultaneous determination of codeine and noscapine using N-PLS regression. The method is based on the fact that the kinetic characteristics of codeine and noscapine are different in the Ru(phen)(3)(2+)-Ce(IV) CL system. In flow injection mode, codeine gives a broad peak with the highest CL intensity at 4.4 s, whereas the maximum CL intensity of noscapine appears at about 2.6 s. Moreover, the effect of increasing H(2)SO(4) concentration on the CL intensity differed between the compounds. An experimental design, central composite design (CCD), was used to determine the optimized variables, such as the Ru(II) and Ce(IV) concentrations, for both compounds. At the optimized conditions, a three-way data structure (samples, H(2)SO(4) concentration, time) was constructed and followed by N-PLS regression. The number of factors for the N-PLS regression was selected based on the minimum value of the root mean squared error of cross validation (RMSECV). The proposed method was applied to the simultaneous quantification of codeine and noscapine in pharmaceutical preparations.
NASA Astrophysics Data System (ADS)
Tan, F.; Lim, H. S.; Abdullah, K.; Yoon, T. L.; Zubir Matjafri, M.; Holben, B.
2014-02-01
Aerosol optical depth (AOD) from AERONET data has a very fine resolution, but the air pollution index (API), visibility, and relative humidity from ground truth measurements are coarse. To obtain the local AOD in the atmosphere, the relationship between these parameters was determined using multiple regression analysis. Data from the southwest monsoon period (August to September 2012) taken in Penang, Malaysia, were used to establish a quantitative relationship in which AOD is modeled as a function of API, relative humidity, and visibility. The most highly correlated model was used to predict AOD values during the southwest monsoon period. When aerosol is not uniformly distributed in the atmosphere, the predicted AOD can deviate substantially from the measured values. These deviating data can therefore be removed by comparing the predicted AOD values with the actual AERONET data, which helps to investigate whether the non-uniform source of the aerosol is the ground surface or a higher altitude level. This model can accurately predict AOD only if the aerosol is uniformly distributed in the atmosphere. However, further study is needed to determine whether this model is suitable for predicting AOD not only in Penang but also in other states in Malaysia, or even globally.
NASA Technical Reports Server (NTRS)
Hopkins, Dale A.
1998-01-01
A key challenge in designing the new High Speed Civil Transport (HSCT) aircraft is determining a good match between the airframe and engine. Multidisciplinary design optimization can be used to solve the problem by adjusting parameters of both the engine and the airframe. Earlier, an example problem was presented of an HSCT aircraft with four mixed-flow turbofan engines and a baseline mission to carry 305 passengers 5000 nautical miles at a cruise speed of Mach 2.4. The problem was solved by coupling NASA Lewis Research Center's design optimization testbed (COMETBOARDS) with NASA Langley Research Center's Flight Optimization System (FLOPS). The computing time expended in solving the problem was substantial, and the instability of the FLOPS analyzer at certain design points caused difficulties. In an attempt to alleviate both of these limitations, we explored the use of two approximation concepts in the design optimization process. The two concepts, which are based on neural network and linear regression approximation, provide the reanalysis capability and design sensitivity analysis information required for the optimization process. The HSCT aircraft optimization problem was solved by using three alternate approaches; that is, the original FLOPS analyzer and two approximate (derived) analyzers. The approximate analyzers were calibrated and used in three different ranges of the design variables; narrow (interpolated), standard, and wide (extrapolated).
Li, Yanming; Zhu, Ji
2015-01-01
We propose a multivariate sparse group lasso variable selection and estimation method for data with high-dimensional predictors as well as high-dimensional response variables. The method is carried out through a penalized multivariate multiple linear regression model with an arbitrary group structure for the regression coefficient matrix. It suits many biology studies well in detecting associations between multiple traits and multiple predictors, with each trait and each predictor embedded in some biological functioning groups such as genes, pathways or brain regions. The method is able to effectively remove unimportant groups as well as unimportant individual coefficients within important groups, particularly for large p small n problems, and is flexible in handling various complex group structures such as overlapping or nested or multilevel hierarchical structures. The method is evaluated through extensive simulations with comparisons to the conventional lasso and group lasso methods, and is applied to an eQTL association study. PMID:25732839
NASA Astrophysics Data System (ADS)
Saeidi, Omid; Torabi, Seyed Rahman; Ataei, Mohammad
2014-03-01
Rock mass classification systems are one of the most common ways of determining rock mass excavatability and related equipment assessment. However, the strength and weak points of such rating-based classifications have always been questionable. Such classification systems assign quantifiable values to predefined classified geotechnical parameters of rock mass. This causes particular ambiguities, leading to the misuse of such classifications in practical applications. Recently, intelligence system approaches such as artificial neural networks (ANNs) and neuro-fuzzy methods, along with multiple regression models, have been used successfully to overcome such uncertainties. The purpose of the present study is the construction of several models by using an adaptive neuro-fuzzy inference system (ANFIS) method with two data clustering approaches, including fuzzy c-means (FCM) clustering and subtractive clustering, an ANN and non-linear multiple regression to estimate the basic rock mass diggability index. A set of data from several case studies was used to obtain the real rock mass diggability index and compared to the predicted values by the constructed models. In conclusion, it was observed that ANFIS based on the FCM model shows higher accuracy and correlation with actual data compared to that of the ANN and multiple regression. As a result, one can use the assimilation of ANNs with fuzzy clustering-based models to construct such rigorous predictor tools.
Du, Hongying; Hu, Zhide; Bazzoli, Andrea; Zhang, Yang
2011-01-01
The epidermal growth factor receptor (EGFR) protein tyrosine kinase (PTK) is an important protein target for anti-tumor drug discovery. To identify potential EGFR inhibitors, we conducted a quantitative structure–activity relationship (QSAR) study on the inhibitory activity of a series of quinazoline derivatives against EGFR tyrosine kinase. Two 2D-QSAR models were developed based on the best multi-linear regression (BMLR) and grid-search assisted projection pursuit regression (GS-PPR) methods. The results demonstrate that the inhibitory activity of quinazoline derivatives is strongly correlated with their polarizability, activation energy, mass distribution, connectivity, and branching information. Although the present investigation focused on EGFR, the approach provides a general avenue in the structure-based drug development of different protein receptor inhibitors. PMID:21811593
Chiu, Chuan-Hung; Wen, Tzai-Hung; Chien, Lung-Chang; Yu, Hwa-Lung
2014-01-01
Understanding the spatial characteristics of dengue fever (DF) incidence is crucial for governmental agencies to implement effective disease control strategies. We investigated the associations between environmental and socioeconomic factors and the geographic distribution of DF, and propose a probabilistic risk assessment approach that uses threshold-based quantile regression to identify the significant risk factors for DF transmission and to estimate the spatial distribution of DF risk in terms of full probability distributions. To interpret risk, the return period was also included to characterize the frequency pattern of DF geographic occurrences. The study area included old Kaohsiung City and Fongshan District, two areas in Taiwan that have been affected by severe DF infections in recent decades. Results indicated that water-related facilities, including canals and ditches, and various types of residential area, as well as the interactions between them, were significant factors that elevated DF risk. By contrast, the increase in per capita income and its associated interactions with residential areas mitigated DF risk in the study area. Nonlinear associations between these factors and DF risk were present at various quantiles, implying that water-related factors characterized the underlying spatial patterns of DF, and high-density residential areas indicated the potential for high DF incidence (e.g., clustered infections). The spatial distributions of DF risk were assessed in terms of three distinct map presentations: expected incidence rates, incidence rates at various return periods, and return periods at distinct incidence rates. These probability-based spatial risk maps exhibited distinct DF risks associated with environmental factors, expressed as various DF magnitudes and occurrence probabilities across Kaohsiung, and can serve as a reference for local governmental agencies.
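The return-period presentation used above maps an occurrence probability p per period to T = 1/p periods; a small sketch of the conversion, plus the chance of at least one occurrence within a planning horizon (illustrative arithmetic only, not the study's quantile-regression risk model):

```python
def return_period(p_per_period):
    """Return period (in periods, e.g. years) of an event with
    per-period occurrence probability p."""
    return 1.0 / p_per_period

def prob_within(t_return, periods):
    """Probability of at least one occurrence in `periods` independent
    periods, given return period t_return."""
    return 1.0 - (1.0 - 1.0 / t_return) ** periods
```

This is why the same risk surface can be displayed either as incidence rates at fixed return periods or as return periods at fixed incidence rates: the two are inverses of each other.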
ERIC Educational Resources Information Center
Gilstrap, Donald L.
2013-01-01
In addition to qualitative methods presented in chaos and complexity theories in educational research, this article addresses quantitative methods that may show potential for future research studies. Although much in the social and behavioral sciences literature has focused on computer simulations, this article explores current chaos and…
Deng, Zhaohong; Choi, Kup-Sze; Jiang, Yizhang; Wang, Shitong
2014-12-01
Inductive transfer learning has attracted increasing attention for the training of effective model in the target domain by leveraging the information in the source domain. However, most transfer learning methods are developed for a specific model, such as the commonly used support vector machine, which makes the methods applicable only to the adopted models. In this regard, the generalized hidden-mapping ridge regression (GHRR) method is introduced in order to train various types of classical intelligence models, including neural networks, fuzzy logical systems and kernel methods. Furthermore, the knowledge-leverage based transfer learning mechanism is integrated with GHRR to realize the inductive transfer learning method called transfer GHRR (TGHRR). Since the information from the induced knowledge is much clearer and more concise than that from the data in the source domain, it is more convenient to control and balance the similarity and difference of data distributions between the source and target domains. The proposed GHRR and TGHRR algorithms have been evaluated experimentally by performing regression and classification on synthetic and real world datasets. The results demonstrate that the performance of TGHRR is competitive with or even superior to existing state-of-the-art inductive transfer learning algorithms.
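GHRR builds on ridge regression, whose plain estimator w = (X^T X + lambda*I)^{-1} X^T y has a closed form; a two-feature sketch of the penalty's shrinkage effect (an assumption-laden illustration of ordinary ridge, not of GHRR or TGHRR themselves):

```python
def ridge_2feature(X, y, lam):
    """Closed-form ridge for two features: solve (X^T X + lam*I) w = X^T y."""
    a = sum(r[0] * r[0] for r in X) + lam
    b = sum(r[0] * r[1] for r in X)
    d = sum(r[1] * r[1] for r in X) + lam
    g0 = sum(r[0] * yi for r, yi in zip(X, y))
    g1 = sum(r[1] * yi for r, yi in zip(X, y))
    det = a * d - b * b  # 2x2 Cramer's rule
    return [(d * g0 - b * g1) / det, (a * g1 - b * g0) / det]
```

With lam = 0 this reduces to ordinary least squares; increasing lam shrinks the coefficients toward zero, which is the regularization that the hidden-mapping and transfer extensions inherit.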
Huang, Lei
2015-09-30
To solve the problem in which the conventional ARMA modeling methods for gyro random noise require a large number of samples and converge slowly, an ARMA modeling method using a robust Kalman filtering is developed. The ARMA model parameters are employed as state arguments. Unknown time-varying estimators of observation noise are used to achieve the estimated mean and variance of the observation noise. Using the robust Kalman filtering, the ARMA model parameters are estimated accurately. The developed ARMA modeling method has the advantages of a rapid convergence and high accuracy. Thus, the required sample size is reduced. It can be applied to modeling applications for gyro random noise in which a fast and accurate ARMA modeling method is required.
Sparse Regression by Projection and Sparse Discriminant Analysis.
Qi, Xin; Luo, Ruiyan; Carroll, Raymond J; Zhao, Hongyu
2015-04-01
Recent years have seen active developments of various penalized regression methods, such as LASSO and elastic net, to analyze high dimensional data. In these approaches, the direction and length of the regression coefficients are determined simultaneously. Due to the introduction of penalties, the length of the estimates can be far from being optimal for accurate predictions. We introduce a new framework, regression by projection, and its sparse version to analyze high dimensional data. The unique nature of this framework is that the directions of the regression coefficients are inferred first, and the lengths and the tuning parameters are determined by a cross validation procedure to achieve the largest prediction accuracy. We provide a theoretical result for simultaneous model selection consistency and parameter estimation consistency of our method in high dimension. This new framework is then generalized such that it can be applied to principal components analysis, partial least squares and canonical correlation analysis. We also adapt this framework for discriminant analysis. Compared to the existing methods, where there is relatively little control of the dependency among the sparse components, our method can control the relationships among the components. We present efficient algorithms and related theory for solving the sparse regression by projection problem. Based on extensive simulations and real data analysis, we demonstrate that our method achieves good predictive performance and variable selection in the regression setting, and the ability to control relationships between the sparse components leads to more accurate classification. In supplemental materials available online, the details of the algorithms and theoretical proofs, and R codes for all simulation studies are provided.
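For the penalized regressions mentioned above (LASSO, elastic net), the workhorse solver is coordinate descent with soft-thresholding; a minimal LASSO sketch on raw features, purely illustrative and deliberately naive:

```python
def soft_threshold(z, g):
    """Shrink z toward zero by g; the L1 proximal operator."""
    if z > g:
        return z - g
    if z < -g:
        return z + g
    return 0.0

def lasso_cd(X, y, lam, iters=200):
    """LASSO by cyclic coordinate descent (naive O(n*p) residual recompute
    per coordinate; real solvers update residuals incrementally)."""
    n, p = len(X), len(X[0])
    beta = [0.0] * p
    for _ in range(iters):
        for j in range(p):
            # partial residual excluding feature j
            r = [y[i] - sum(X[i][k] * beta[k] for k in range(p) if k != j)
                 for i in range(n)]
            rho = sum(X[i][j] * r[i] for i in range(n)) / n
            z = sum(X[i][j] ** 2 for i in range(n)) / n
            beta[j] = soft_threshold(rho, lam) / z
    return beta
```

The soft-threshold step is where the penalty shortens the coefficient vector, which is exactly the length distortion the regression-by-projection framework above seeks to undo by refitting lengths after the directions are chosen.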
ERIC Educational Resources Information Center
Coskuntuncel, Orkun
2013-01-01
The purpose of this study is two-fold; the first aim being to show the effect of outliers on the widely used least squares regression estimator in social sciences. The second aim is to compare the classical method of least squares with the robust M-estimator using the coefficient of determination (R2). For this purpose,…
Stojić, Andreja; Maletić, Dimitrije; Stanišić Stojić, Svetlana; Mijić, Zoran; Šoštarić, Andrej
2015-07-15
In this study, advanced multivariate methods were applied for VOC source apportionment and subsequent short-term forecast of industrial- and vehicle exhaust-related contributions in Belgrade urban area (Serbia). The VOC concentrations were measured using PTR-MS, together with inorganic gaseous pollutants (NOx, NO, NO2, SO2, and CO), PM10, and meteorological parameters. US EPA Positive Matrix Factorization and Unmix receptor models were applied to the obtained dataset both resolving six source profiles. For the purpose of forecasting industrial- and vehicle exhaust-related source contributions, different multivariate methods were employed in two separate cases, relying on meteorological data, and on meteorological data and concentrations of inorganic gaseous pollutants, respectively. The results indicate that Boosted Decision Trees and Multi-Layer Perceptrons were the best performing methods. According to the results, forecasting accuracy was high (lowest relative error of only 6%), in particular when the forecast was based on both meteorological parameters and concentrations of inorganic gaseous pollutants.
Using a Linear Regression Method to Detect Outliers in IRT Common Item Equating
ERIC Educational Resources Information Center
He, Yong; Cui, Zhongmin; Fang, Yu; Chen, Hanwei
2013-01-01
Common test items play an important role in equating alternate test forms under the common item nonequivalent groups design. When the item response theory (IRT) method is applied in equating, inconsistent item parameter estimates among common items can lead to large bias in equated scores. It is prudent to evaluate inconsistency in parameter…
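A common way to implement such a screen is to regress the new-form common-item parameter estimates on the old-form estimates and flag items with large standardized residuals; a hedged sketch (the 2-SD cutoff is one conventional choice, not necessarily the article's):

```python
def flag_outlier_items(old_b, new_b, z_cut=2.0):
    """Regress new-form item parameter estimates on old-form estimates and
    return indices of items whose standardized residual exceeds z_cut."""
    n = len(old_b)
    mx, my = sum(old_b) / n, sum(new_b) / n
    sxx = sum((x - mx) ** 2 for x in old_b)
    slope = sum((x - mx) * (y - my) for x, y in zip(old_b, new_b)) / sxx
    intercept = my - slope * mx
    resid = [y - (intercept + slope * x) for x, y in zip(old_b, new_b)]
    sd = (sum(e * e for e in resid) / (n - 2)) ** 0.5  # residual SD
    return [i for i, e in enumerate(resid) if abs(e) > z_cut * sd]
```

Items flagged this way would be candidates for removal from the anchor set before re-running the IRT equating.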
Technology Transfer Automated Retrieval System (TEKTRAN)
The beard testing method for measuring cotton fiber length is based on the fibrogram theory. However, in the instrumental implementations, the engineering complexity alters the original fiber length distribution observed by the instrument. This causes challenges in obtaining the entire original le...
2014-01-01
Background Chronic kidney disease (CKD) is a progressive and usually irreversible disease. Different types of outcomes are of interest in the course of CKD such as time-to-dialysis, transplantation or decline of the glomerular filtration rate (GFR). Statistical analyses aiming at investigating the association between these outcomes and risk factors raise a number of methodological issues. The objective of this study was to give an overview of these issues and to highlight some statistical methods that can address these topics. Methods A literature review of statistical methods published between 2002 and 2012 to investigate risk factors of CKD outcomes was conducted within the Scopus database. The results of the review were used to identify important methodological issues as well as to discuss solutions for each type of CKD outcome. Results Three hundred and four papers were selected. Time-to-event outcomes were more often investigated than quantitative outcome variables measuring kidney function over time. The most frequently investigated events in survival analyses were all-cause death, initiation of kidney replacement therapy, and progression to a specific value of GFR. While competing risks were commonly accounted for, interval censoring was rarely acknowledged when appropriate despite existing methods. When the outcome of interest was the quantitative decline of kidney function over time, standard linear models focussing on the slope of GFR over time were almost as often used as linear mixed models which allow various numbers of repeated measurements of kidney function per patient. Informative dropout was accounted for in some of these longitudinal analyses. Conclusions This study provides a broad overview of the statistical methods used in the last ten years for investigating risk factors of CKD progression, as well as a discussion of their limitations. Some existing potential alternatives that have been proposed in the context of CKD or in other contexts are
Dai, Huanping; Micheyl, Christophe
2012-11-01
Psychophysical "reverse-correlation" methods allow researchers to gain insight into the perceptual representations and decision weighting strategies of individual subjects in perceptual tasks. Although these methods have gained momentum, until recently their development was limited to experiments involving only two response categories. Recently, two approaches for estimating decision weights in m-alternative experiments have been put forward. One approach extends the two-category correlation method to m > 2 alternatives; the second uses multinomial logistic regression (MLR). In this article, the relative merits of the two methods are discussed, and the issues of convergence and statistical efficiency of the methods are evaluated quantitatively using Monte Carlo simulations. The results indicate that, for a range of values of the number of trials, the estimated weighting patterns are closer to their asymptotic values for the correlation method than for the MLR method. Moreover, for the MLR method, weight estimates for different stimulus components can exhibit strong correlations, making the analysis and interpretation of measured weighting patterns less straightforward than for the correlation method. These and other advantages of the correlation method, which include computational simplicity and a close relationship to other well-established psychophysical reverse-correlation methods, make it an attractive tool to uncover decision strategies in m-alternative experiments.
Outlier detection method in linear regression based on sum of arithmetic progression.
Adikaram, K K L B; Hussein, M A; Effenberger, M; Becker, T
2014-01-01
We introduce a new nonparametric outlier detection method for linear series, which requires no imputation of missing or removed data. For an arithmetic progression (a series without outliers) with n elements, the ratio (R) of the sum of the minimum and maximum elements to the sum of all elements is always 2/n, with R in (0,1]. R ≠ 2/n always implies the existence of outliers. Usually, R < 2/n implies that the minimum is an outlier, and R > 2/n implies that the maximum is an outlier. Based upon this, we derived a new method for identifying significant and nonsignificant outliers separately. Two different techniques were used to manage missing data and removed outliers: (1) recalculate the terms after (or before) the removed or missing element while maintaining the initial angle in relation to a certain point, or (2) transform the data into a constant value, which is not affected by missing or removed elements. With a reference element that was not an outlier, the method detected all outliers in data sets with 6 to 1000 elements containing 50% outliers which deviated by a factor of ±1.0e-2 to ±1.0e+2 from the correct value.
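The ratio test at the heart of the method fits in a few lines; this sketch assumes a positive-valued series (the sign rules quoted above are cleanest when the sum is positive) and reports only which extreme looks suspect:

```python
def ap_outlier(values):
    """Arithmetic-progression ratio test: for an outlier-free AP of n
    positive elements, (min + max) / sum == 2/n exactly.
    Returns None, 'min', or 'max'. Assumes sum(values) > 0."""
    n = len(values)
    r = (min(values) + max(values)) / sum(values)
    expected = 2.0 / n
    if abs(r - expected) < 1e-12:
        return None
    return 'max' if r > expected else 'min'
```

For [1, 2, 3, 4, 5] the ratio is (1+5)/15 = 2/5 exactly, so nothing is flagged; perturbing either extreme moves the ratio off 2/n in the corresponding direction.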
Tiedeman, C.R.; Kernodle, J.M.; McAda, D.P.
1998-01-01
This report documents the application of nonlinear-regression methods to a numerical model of ground-water flow in the Albuquerque Basin, New Mexico. In the Albuquerque Basin, ground water is the primary source for most water uses. Ground-water withdrawal has steadily increased since the 1940's, resulting in large declines in water levels in the Albuquerque area. A ground-water flow model was developed in 1994 and revised and updated in 1995 for the purpose of managing basin ground- water resources. In the work presented here, nonlinear-regression methods were applied to a modified version of the previous flow model. Goals of this work were to use regression methods to calibrate the model with each of six different configurations of the basin subsurface and to assess and compare optimal parameter estimates, model fit, and model error among the resulting calibrations. The Albuquerque Basin is one in a series of north trending structural basins within the Rio Grande Rift, a region of Cenozoic crustal extension. Mountains, uplifts, and fault zones bound the basin, and rock units within the basin include pre-Santa Fe Group deposits, Tertiary Santa Fe Group basin fill, and post-Santa Fe Group volcanics and sediments. The Santa Fe Group is greater than 14,000 feet (ft) thick in the central part of the basin. During deposition of the Santa Fe Group, crustal extension resulted in development of north trending normal faults with vertical displacements of as much as 30,000 ft. Ground-water flow in the Albuquerque Basin occurs primarily in the Santa Fe Group and post-Santa Fe Group deposits. Water flows between the ground-water system and surface-water bodies in the inner valley of the basin, where the Rio Grande, a network of interconnected canals and drains, and Cochiti Reservoir are located. Recharge to the ground-water flow system occurs as infiltration of precipitation along mountain fronts and infiltration of stream water along tributaries to the Rio Grande; subsurface
Huang, Hai-Hui; Liu, Xiao-Ying; Liang, Yong
2016-01-01
Cancer classification and feature (gene) selection play an important role in knowledge discovery in genomic data. Although logistic regression is one of the most popular classification methods, it does not induce feature selection. In this paper, we present a new hybrid L1/2+L2 regularization (HLR) function, a linear combination of L1/2 and L2 penalties, to select the relevant genes in the logistic regression. The HLR approach inherits some fascinating characteristics from the L1/2 (sparsity) and L2 (grouping effect, where highly correlated variables are in or out of a model together) penalties. We also propose a novel univariate HLR thresholding approach to update the estimated coefficients and develop the coordinate descent algorithm for the HLR penalized logistic regression model. The empirical results and simulations indicate that the proposed method is highly competitive amongst several state-of-the-art methods. PMID:27136190
Lin, Wei; Feng, Rui; Li, Hongzhe
2014-01-01
In genetical genomics studies, it is important to jointly analyze gene expression data and genetic variants in exploring their associations with complex traits, where the dimensionality of gene expressions and genetic variants can both be much larger than the sample size. Motivated by such modern applications, we consider the problem of variable selection and estimation in high-dimensional sparse instrumental variables models. To overcome the difficulty of high dimensionality and unknown optimal instruments, we propose a two-stage regularization framework for identifying and estimating important covariate effects while selecting and estimating optimal instruments. The methodology extends the classical two-stage least squares estimator to high dimensions by exploiting sparsity using sparsity-inducing penalty functions in both stages. The resulting procedure is efficiently implemented by coordinate descent optimization. For the representative L1 regularization and a class of concave regularization methods, we establish estimation, prediction, and model selection properties of the two-stage regularized estimators in the high-dimensional setting where the dimensionalities of covariates and instruments are both allowed to grow exponentially with the sample size. The practical performance of the proposed method is evaluated by simulation studies and its usefulness is illustrated by an analysis of mouse obesity data. Supplementary materials for this article are available online. PMID:26392642
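The two-stage idea can be sketched with off-the-shelf sparse solvers: regularized first-stage regressions of the endogenous covariates on the instruments, then a regularized second-stage regression of the outcome on the fitted covariates. This sketch uses synthetic data and sklearn's Lasso as a stand-in for the paper's penalized estimators; all names and constants are illustrative:

```python
import numpy as np
from sklearn.linear_model import Lasso

rng = np.random.default_rng(1)
n, p, q = 200, 10, 15               # samples, covariates, instruments
Z = rng.normal(size=(n, q))
G = rng.normal(size=(q, p)) * (rng.uniform(size=(q, p)) < 0.3)  # sparse stage-1 effects
u = rng.normal(size=n)              # unobserved confounder
X = Z @ G + 0.5 * np.outer(u, np.ones(p)) + rng.normal(size=(n, p))
beta_true = np.zeros(p)
beta_true[:2] = [1.5, -1.0]
y = X @ beta_true + 2.0 * u + rng.normal(size=n)

# Stage 1: sparse regression of each endogenous covariate on the instruments
x_hat = Lasso(alpha=0.1).fit(Z, X).predict(Z)
# Stage 2: sparse regression of the outcome on the fitted covariates
beta_hat = Lasso(alpha=0.1).fit(x_hat, y).coef_
```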
Joo, Jong Wha J; Kang, Eun Yong; Org, Elin; Furlotte, Nick; Parks, Brian; Hormozdiari, Farhad; Lusis, Aldons J; Eskin, Eleazar
2016-12-01
A typical genome-wide association study tests correlation between a single phenotype and each genotype one at a time. However, single-phenotype analysis might miss unmeasured aspects of complex biological networks. Analyzing many phenotypes simultaneously may increase the power to capture these unmeasured aspects and detect more variants. Several multivariate approaches aim to detect variants related to more than one phenotype, but these current approaches do not consider the effects of population structure. As a result, these approaches may result in a significant amount of false positive identifications. Here, we introduce a new methodology, referred to as GAMMA for generalized analysis of molecular variance for mixed-model analysis, which is capable of simultaneously analyzing many phenotypes and correcting for population structure. In a simulated study using data implanted with true genetic effects, GAMMA accurately identifies these true effects without producing false positives induced by population structure. In simulations with this data, GAMMA is an improvement over other methods which either fail to detect true effects or produce many false positive identifications. We further apply our method to genetic studies of yeast and gut microbiome from mice and show that GAMMA identifies several variants that are likely to have true biological mechanisms.
Rival-model penalized self-organizing map.
Cheung, Yiu-ming; Law, Lap-tak
2007-01-01
As a typical data visualization technique, the self-organizing map (SOM) has been extensively applied to data clustering, image analysis, dimension reduction, and so forth. A conventional adaptive SOM requires an appropriate learning rate whose value decreases monotonically over time to ensure the convergence of the map, while remaining large enough for the map to gradually learn the data topology; otherwise, the SOM's performance may seriously deteriorate. In general, it is nontrivial to choose an appropriate monotonically decreasing function for such a learning rate. In this letter, we therefore propose a novel rival-model penalized self-organizing map (RPSOM) learning algorithm that, for each input, adaptively chooses several rivals of the best-matching unit (BMU) and penalizes their associated models, i.e., the parametric real vectors with the same dimension as the input vectors, by moving them slightly away from the input. Compared to the existing methods, the RPSOM utilizes a constant learning rate, circumventing the awkward selection of a monotonically decreasing function for the learning rate, yet still reaches a robust result. Numerical experiments have shown the efficacy of our algorithm.
Penalized maximum-likelihood image reconstruction for lesion detection
NASA Astrophysics Data System (ADS)
Qi, Jinyi; Huesman, Ronald H.
2006-08-01
Detecting cancerous lesions is one major application in emission tomography. In this paper, we study penalized maximum-likelihood image reconstruction for this important clinical task. Compared to analytical reconstruction methods, statistical approaches can improve the image quality by accurately modelling the photon detection process and measurement noise in imaging systems. To explore the full potential of penalized maximum-likelihood image reconstruction for lesion detection, we derived simplified theoretical expressions that allow fast evaluation of the detectability of a random lesion. The theoretical results are used to design the regularization parameters to improve lesion detectability. We conducted computer-based Monte Carlo simulations to compare the proposed penalty function, conventional penalty function, and a penalty function for isotropic point spread function. The lesion detectability is measured by a channelized Hotelling observer. The results show that the proposed penalty function outperforms the other penalty functions for lesion detection. The relative improvement is dependent on the size of the lesion. However, we found that the penalty function optimized for a 5 mm lesion still outperforms the other two penalty functions for detecting a 14 mm lesion. Therefore, it is feasible to use the penalty function designed for small lesions in image reconstruction, because detection of large lesions is relatively easy.
Strong, Mark; Oakley, Jeremy E; Brennan, Alan; Breeze, Penny
2015-07-01
Health economic decision-analytic models are used to estimate the expected net benefits of competing decision options. The true values of the input parameters of such models are rarely known with certainty, and it is often useful to quantify the value to the decision maker of reducing uncertainty through collecting new data. In the context of a particular decision problem, the value of a proposed research design can be quantified by its expected value of sample information (EVSI). EVSI is commonly estimated via a 2-level Monte Carlo procedure in which plausible data sets are generated in an outer loop, and then, conditional on these, the parameters of the decision model are updated via Bayes rule and sampled in an inner loop. At each iteration of the inner loop, the decision model is evaluated. This is computationally demanding and may be difficult if the posterior distribution of the model parameters conditional on sampled data is hard to sample from. We describe a fast nonparametric regression-based method for estimating per-patient EVSI that requires only the probabilistic sensitivity analysis sample (i.e., the set of samples drawn from the joint distribution of the parameters and the corresponding net benefits). The method avoids the need to sample from the posterior distributions of the parameters and avoids the need to rerun the model. The only requirement is that sample data sets can be generated. The method is applicable with a model of any complexity and with any specification of model parameter distribution. We demonstrate in a case study the superior efficiency of the regression method over the 2-level Monte Carlo method.
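The regression-based EVSI estimator can be illustrated on a toy decision model. The decision model, priors, and study design below are invented for illustration, and a cubic polynomial fit stands in for the nonparametric smoothers used in the paper; the key point is that only the PSA sample and simulated data summaries are needed, with no inner-loop Bayesian updating:

```python
import numpy as np

rng = np.random.default_rng(2)
M = 5000                                  # probabilistic sensitivity analysis draws
theta = rng.normal(0.1, 0.2, size=M)      # uncertain incremental effect (assumed)
nb_new = 20000 * theta - 1000             # net benefit of new treatment vs. standard (0)

# Proposed study: n patients; the summary statistic is the observed mean effect
n = 50
S = rng.normal(theta, 1.0 / np.sqrt(n))

# Regress net benefit on the summary statistic (cubic fit as a simple smoother)
fitted = np.polyval(np.polyfit(S, nb_new, 3), S)
evsi = np.mean(np.maximum(fitted, 0.0)) - max(0.0, nb_new.mean())
evpi = np.mean(np.maximum(nb_new, 0.0)) - max(0.0, nb_new.mean())
```

By construction the per-patient EVSI estimate lies between zero and the expected value of perfect information.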
NASA Astrophysics Data System (ADS)
Baraldi, Piero; Di Maio, Francesco; Turati, Pietro; Zio, Enrico
2015-08-01
In this work, we propose a modification of the traditional Auto Associative Kernel Regression (AAKR) method which enhances the signal reconstruction robustness, i.e., the capability of reconstructing abnormal signals to the values expected in normal conditions. The modification is based on the definition of a new procedure for computing the similarity between the present measurements and the historical patterns used to perform the signal reconstructions. The underlying conjecture is that malfunctions causing variations of a small number of signals are more frequent than those causing variations of a large number of signals. The proposed method has been applied to real normal-condition data collected in an industrial plant for energy production. Its performance has been verified on synthetic and real malfunctions. The obtained results show an improvement in the early detection of abnormal conditions and the correct identification of the signals responsible for triggering the detection.
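Standard AAKR, the baseline the paper modifies, fits in a few lines: a test vector is reconstructed as a kernel-weighted average of historical normal-condition patterns, and a large residual flags the drifted signal. Data, bandwidth, and dimensions below are synthetic assumptions:

```python
import numpy as np

def aakr_reconstruct(x, memory, h):
    """Kernel-weighted average of historical normal-condition patterns."""
    d2 = np.sum((memory - x) ** 2, axis=1)     # squared distances to history
    w = np.exp(-d2 / (2 * h ** 2))             # Gaussian kernel weights
    return (w / w.sum()) @ memory

rng = np.random.default_rng(3)
memory = rng.normal(0.0, 1.0, size=(200, 3))   # normal-condition history
faulty = np.array([4.0, 0.0, 0.0])             # first signal has drifted
rec = aakr_reconstruct(faulty, memory, h=2.0)
residual = faulty - rec                        # large residual flags the fault
```

The paper's modification replaces the Euclidean similarity above with one that down-weights the few deviating signals, so the reconstruction is pulled less toward the anomaly.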
Li, Y.; Graubard, B. I.; Huang, P.; Gastwirth, J. L.
2015-01-01
Determining the extent of a disparity, if any, between groups of people, for example, race or gender, is of interest in many fields, including public health for medical treatment and prevention of disease. An observed difference in the mean outcome between an advantaged group (AG) and disadvantaged group (DG) can be due to differences in the distribution of relevant covariates. The Peters–Belson (PB) method fits a regression model with covariates to the AG to predict, for each DG member, their outcome measure as if they had been from the AG. The difference between the mean predicted and the mean observed outcomes of DG members is the (unexplained) disparity of interest. We focus on applying the PB method to estimate the disparity based on binary/multinomial/proportional odds logistic regression models using data collected from complex surveys with more than one DG. Estimators of the unexplained disparity, an analytic variance–covariance estimator that is based on the Taylor linearization variance–covariance estimation method, as well as a Wald test for testing a joint null hypothesis of zero for unexplained disparities between two or more minority groups and a majority group, are provided. Simulation studies with data selected from simple random sampling and cluster sampling, as well as the analyses of disparity in body mass index in the National Health and Nutrition Examination Survey 1999–2004, are conducted. Empirical results indicate that the Taylor linearization variance–covariance estimation is accurate and that the proposed Wald test maintains the nominal level. PMID:25382235
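The core Peters–Belson step, fit on the advantaged group, predict each disadvantaged-group member's outcome as if from the AG, and compare predicted with observed, can be sketched briefly. The data are synthetic, and sklearn's unweighted logistic regression stands in for the survey-weighted fits and Taylor linearization variances of the paper:

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(4)

def prob(x, bump):
    return 1.0 / (1.0 + np.exp(-(0.8 * x + bump)))

x_ag = rng.normal(0.5, 1.0, 500)             # advantaged group covariate
x_dg = rng.normal(-0.5, 1.0, 300)            # disadvantaged group covariate
y_ag = rng.binomial(1, prob(x_ag, 0.0))
y_dg = rng.binomial(1, prob(x_dg, -0.7))     # deficit not explained by x

# Fit on AG only; predict DG outcomes "as if they had been from the AG"
model = LogisticRegression().fit(x_ag.reshape(-1, 1), y_ag)
pred_dg = model.predict_proba(x_dg.reshape(-1, 1))[:, 1]
unexplained_disparity = pred_dg.mean() - y_dg.mean()
```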
Stride, P
2011-09-01
Robert Garrett emigrated from Scotland to Van Diemen's Land (now Tasmania) in 1822. Within a few months of arrival he was posted to the barbaric penal colony in Macquarie Harbour, known as Sarah Island. His descent into alcoholism, medical misadventure and premature death were related to his largely unsupported professional environment and were, in many respects, typical of those subjected to this experience.
Mortality study of nickel-cadmium battery workers by the method of regression models in life tables.
Sorahan, T; Waterhouse, J A
1983-01-01
The mortality experienced by a cohort of 3025 nickel-cadmium battery workers during the period 1946-81 has been investigated. Occupational histories were described in terms of some 75 jobs: eight with "high", 14 with "moderate" or slight, and 53 with minimal exposure to cadmium oxide (hydroxide). The method of regression models in life tables (RMLT) was used to compare the estimated cadmium exposures (durations of exposed employment) of those dying from causes of interest with those of matching survivors in the same year of follow up, while controlling for sex, for year and age of starting employment, and for duration of employment. No new evidence of any association between occupational exposure to cadmium oxide (hydroxide) and cancer of the prostate was found. PMID:6871118
Asadpour-Zeynali, Karim; Soheili-Azad, Payam
2012-01-01
A differential pulse polarography (DPP) method for the simultaneous determination of 2-nitrophenol and 4-nitrophenol was proposed. It was found that under optimum experimental conditions (pH = 5, scan rate = 5 mV/s, pulse amplitude = -50 mV), 2-nitrophenol and 4-nitrophenol had well-defined polarographic reduction waves with peak potentials at -317 and -406 mV, respectively. In the mixture of the two compounds, overlapping polarographic peaks were observed. In this study, support vector regression (SVR) was applied to resolve the overlapped polarograms. Furthermore, a comparison was made between the performance of SVR and partial least squares (PLS) on the data set. The results demonstrated that SVR is a better-performing alternative for the analysis and modeling of DPP data than the commonly applied PLS technique. The proposed method was used for the determination of 2-nitrophenol and 4-nitrophenol in industrial waste water.
Risser, Dennis W.; Thompson, Ronald E.; Stuckey, Marla H.
2008-01-01
A method was developed for making estimates of long-term, mean annual ground-water recharge from streamflow data at 80 streamflow-gaging stations in Pennsylvania. The method relates mean annual base-flow yield derived from the streamflow data (as a proxy for recharge) to the climatic, geologic, hydrologic, and physiographic characteristics of the basins (basin characteristics) by use of a regression equation. Base-flow yield is the base flow of a stream divided by the drainage area of the basin, expressed in inches of water basinwide. Mean annual base-flow yield was computed for the period of available streamflow record at continuous streamflow-gaging stations by use of the computer program PART, which separates base flow from direct runoff on the streamflow hydrograph. Base flow provides a reasonable estimate of recharge for basins where streamflow is mostly unaffected by upstream regulation, diversion, or mining. Twenty-eight basin characteristics were included in the exploratory regression analysis as possible predictors of base-flow yield. Basin characteristics found to be statistically significant predictors of mean annual base-flow yield during 1971-2000 at the 95-percent confidence level were (1) mean annual precipitation, (2) average maximum daily temperature, (3) percentage of sand in the soil, (4) percentage of carbonate bedrock in the basin, and (5) stream channel slope. The equation for predicting recharge was developed using ordinary least-squares regression. The standard error of prediction for the equation on log-transformed data was 9.7 percent, and the coefficient of determination was 0.80. The equation can be used to predict long-term, mean annual recharge rates for ungaged basins, providing that the explanatory basin characteristics can be determined and that the underlying assumption is accepted that base-flow yield derived from PART is a reasonable estimate of ground-water recharge rates. For example, application of the equation for 370
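The regression step described above is ordinary least squares of base-flow yield on basin characteristics. A minimal sketch follows, using synthetic basins, invented coefficients, and only three of the five reported predictors; it is meant to show the fitting and R² computation, not to reproduce the published equation:

```python
import numpy as np

rng = np.random.default_rng(5)
n = 80                                   # gaged basins
precip = rng.uniform(35, 50, n)          # mean annual precipitation (in.)
temp = rng.uniform(50, 65, n)            # average max daily temperature (deg F)
sand = rng.uniform(10, 60, n)            # percent sand in soil
# hypothetical true relation for log-transformed base-flow yield
log_yield = 0.04 * precip - 0.02 * temp + 0.01 * sand + rng.normal(0, 0.05, n)

A = np.column_stack([np.ones(n), precip, temp, sand])
coef, *_ = np.linalg.lstsq(A, log_yield, rcond=None)
pred = A @ coef
r2 = 1 - np.sum((log_yield - pred) ** 2) / np.sum((log_yield - log_yield.mean()) ** 2)
```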
[Legal probation of juvenile offenders after release from penal reformative training].
Urbaniok, Frank; Rossegger, Astrid; Fegert, Jörg; Rubertus, Michael; Endrass, Jérôme
2007-01-01
Over recent years, there has been an increase in adolescent delinquency in Germany and Switzerland. In this context, the episodic character of the majority of adolescent delinquency is usually pointed out; however, numerous studies show high re-offending rates for released adolescents. The goal of this study is to examine the legal probation of juvenile delinquents after release from penal reformative training. In this study, the legal probation of adolescents committed to the AEA Uitikon, in the Canton of Zurich, between 1974 and 1986 was scrutinized by examining extracts from their criminal record as of 2003. The period of catamnesis was thus between 17 and 29 years. Overall, 71% of offenders reoffended, 29% with a violent or sexual offence. Bivariate logistic regression showed that the kind of offence committed had no influence on the probability of recidivism. If commitment to the AEA was due to a single offence (as opposed to serial offences), the risk of recidivism was reduced by 71% (OR=0.29). The results of the study show that young delinquents sentenced and committed to penal reformative training have a high recidivism risk. Furthermore, the results point out the importance of the evaluation of the offense-preventive efficacy of penal measures.
[Analysis of selected changes in project the penal code].
Berent, Jarosław; Jurczyk, Agnieszka P; Szram, Stefan
2002-01-01
In this paper the authors have analysed selected proposals of changes in the project of amendments to the penal code. Special attention has been given to the problem of the legality of the "comma" in art. 156 of the penal code. A review of court decisions on this matter has also been made.
Wang, Huifang; Xiao, Bo; Wang, Mingyu; Shao, Ming'an
2013-01-01
Soil water retention parameters are critical to quantify flow and solute transport in vadose zone, while the presence of rock fragments remarkably increases their variability. Therefore a novel method for determining water retention parameters of soil-gravel mixtures is required. The procedure to generate such a model is based firstly on the determination of the quantitative relationship between the content of rock fragments and the effective saturation of soil-gravel mixtures, and then on the integration of this relationship with former analytical equations of water retention curves (WRCs). In order to find such relationships, laboratory experiments were conducted to determine WRCs of soil-gravel mixtures obtained with a clay loam soil mixed with shale clasts or pebbles in three size groups with various gravel contents. Data showed that the effective saturation of the soil-gravel mixtures with the same kind of gravels within one size group had a linear relation with gravel contents, and had a power relation with the bulk density of samples at any pressure head. Revised formulas for water retention properties of the soil-gravel mixtures are proposed to establish the water retention curved surface models of the power-linear functions and power functions. The analysis of the parameters obtained by regression and validation of the empirical models showed that they were acceptable by using either the measured data of separate gravel size group or those of all the three gravel size groups having a large size range. Furthermore, the regression parameters of the curved surfaces for the soil-gravel mixtures with a large range of gravel content could be determined from the water retention data of the soil-gravel mixtures with two representative gravel contents or bulk densities. Such revised water retention models are potentially applicable in regional or large scale field investigations of significantly heterogeneous media, where various gravel sizes and different gravel
The cross politics of Ecuador's penal state.
Garces, Chris
2010-01-01
This essay examines inmate "crucifixion protests" in Ecuador's largest prison during 2003-04. It shows how the preventively incarcerated-of whom there are thousands-managed to effectively denounce their extralegal confinement by embodying the violence of the Christian crucifixion story. This form of protest, I argue, simultaneously clarified and obscured the multiple layers of sovereign power that pressed down on urban crime suspects, who found themselves persecuted and forsaken both outside and within the space of the prison. Police enacting zero-tolerance policies in urban neighborhoods are thus a key part of the penal state, as are the politically threatened family members of the indicted, the sensationalized local media, distrustful neighbors, prison guards, and incarcerated mafia. The essay shows how the politico-theological performance of self-crucifixion responded to these internested forms of sovereign violence, and were briefly effective. The inmates' cross intervention hence provides a window into the way sovereignty works in the Ecuadorean penal state, drawing out how incarceration trends and new urban security measures interlink, and produce an array of victims.
Xu, A; Zhang, Y; Ran, T; Liu, H; Lu, S; Xu, J; Xiong, X; Jiang, Y; Lu, T; Chen, Y
2015-01-01
Bruton's tyrosine kinase (BTK) plays a crucial role in B-cell activation and development, and has emerged as a new molecular target for the treatment of autoimmune diseases and B-cell malignancies. In this study, two- and three-dimensional quantitative structure-activity relationship (2D and 3D-QSAR) analyses were performed on a series of pyridine and pyrimidine-based BTK inhibitors by means of genetic algorithm optimized multivariate adaptive regression spline (GA-MARS) and comparative molecular similarity index analysis (CoMSIA) methods. Here, we propose a modified MARS algorithm to develop 2D-QSAR models. The top ranked models showed satisfactory statistical results (2D-QSAR: Q(2) = 0.884, r(2) = 0.929, r(2)pred = 0.878; 3D-QSAR: q(2) = 0.616, r(2) = 0.987, r(2)pred = 0.905). Key descriptors selected by 2D-QSAR were in good agreement with the conclusions of 3D-QSAR, and the 3D-CoMSIA contour maps facilitated interpretation of the structure-activity relationship. A new molecular database was generated by molecular fragment replacement (MFR) and further evaluated with GA-MARS and CoMSIA prediction. Twenty-five pyridine and pyrimidine derivatives as novel potential BTK inhibitors were finally selected for further study. These results also demonstrated that our method can be a very efficient tool for the discovery of novel potent BTK inhibitors.
Wang, Ji-Hua; Huang, Wen-Jiang; Lao, Cai-Lian; Zhang, Lu-Da; Luo, Chang-Bing; Wang, Tao; Liu, Liang-Yun; Song, Xiao-Yu; Ma, Zhi-Hong
2007-07-01
With the widespread application of remote sensing (RS) in agriculture, monitoring and prediction of crop nutrition condition attracts the attention of many scientists. Foliar nitrogen content (N) is one of the most important nutrients for plant growth, and the vertical leaf N gradient is an important indicator of crop nutrition. Investigations were made of the vertical N distribution to describe the growth status of winter wheat. Results indicate that from the canopy top to the ground surface, N shows an obvious decreasing gradient. The objective of this study was to discuss the inversion of the vertical N distribution from canopy reflectance spectra by the partial least squares regression (PLS) method. PLS was selected for the inversion of the upper, middle and lower layers of N. To improve the accuracy of prediction, the N in the upper layer as well as in the middle and bottom layers should be taken into consideration when crop nutrition condition is appraised from RS data. The models established from the observed data of 2001-2002 were validated with the data of 2003-2004. The inversion precision and error were acceptable. This provides a theoretical basis for large-area, non-destructive, variable-rate nitrogen application in winter wheat based on canopy reflectance spectra.
Calculating a Stepwise Ridge Regression.
ERIC Educational Resources Information Center
Morris, John D.
1986-01-01
Although methods for using ordinary least squares regression computer programs to calculate a ridge regression are available, the calculation of a stepwise ridge regression requires a special purpose algorithm and computer program. The correct stepwise ridge regression procedure is given, and a parallel FORTRAN computer program is described.
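Plain (non-stepwise) ridge regression, on which the stepwise procedure builds, has a simple closed form. The sketch below, on synthetic nearly collinear predictors, shows the shrinkage that motivates the method; the ridge constant k = 1.0 is an arbitrary illustrative choice:

```python
import numpy as np

def ridge(X, y, k):
    """Ridge estimate (X'X + kI)^(-1) X'y; k = 0 gives ordinary least squares."""
    return np.linalg.solve(X.T @ X + k * np.eye(X.shape[1]), X.T @ y)

rng = np.random.default_rng(10)
n = 50
x1 = rng.normal(size=n)
x2 = x1 + rng.normal(0, 0.05, n)     # nearly collinear predictor
X = np.column_stack([x1, x2])
y = x1 + rng.normal(0, 0.1, n)
b_ols = ridge(X, y, 0.0)             # unstable under collinearity
b_ridge = ridge(X, y, 1.0)           # ridge shrinks the coefficient vector
```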
Orthogonal Regression: A Teaching Perspective
ERIC Educational Resources Information Center
Carr, James R.
2012-01-01
A well-known approach to linear least squares regression is that which involves minimizing the sum of squared orthogonal projections of data points onto the best fit line. This form of regression is known as orthogonal regression, and the linear model that it yields is known as the major axis. A similar method, reduced major axis regression, is…
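The major axis described above is the first principal direction of the centered data, so orthogonal regression reduces to an SVD. A minimal sketch on synthetic data with measurement error in both variables (the `major_axis` helper and all constants are illustrative):

```python
import numpy as np

def major_axis(x, y):
    """Line minimizing the sum of squared orthogonal distances (major axis)."""
    centered = np.column_stack([x - x.mean(), y - y.mean()])
    _, _, vt = np.linalg.svd(centered, full_matrices=False)
    vx, vy = vt[0]                    # direction of greatest variance
    slope = vy / vx
    return slope, y.mean() - slope * x.mean()

rng = np.random.default_rng(7)
t = rng.uniform(0, 10, 100)
x = t + rng.normal(0, 0.3, 100)       # error in x as well as y
y = 2 * t + 1 + rng.normal(0, 0.3, 100)
slope, intercept = major_axis(x, y)
```

Unlike ordinary least squares, which attenuates the slope when x is noisy, the major axis treats both variables symmetrically.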
Akita, Yasuyuki; Baldasano, Jose M; Beelen, Rob; Cirach, Marta; de Hoogh, Kees; Hoek, Gerard; Nieuwenhuijsen, Mark; Serre, Marc L; de Nazelle, Audrey
2014-04-15
In recognition that intraurban exposure gradients may be as large as between-city variations, recent air pollution epidemiologic studies have become increasingly interested in capturing within-city exposure gradients. In addition, because of the rapidly accumulating health data, recent studies also need to handle large study populations distributed over large geographic domains. Even though several modeling approaches have been introduced, a consistent modeling framework capturing within-city exposure variability and applicable to large geographic domains is still missing. To address these needs, we proposed a modeling framework based on the Bayesian Maximum Entropy method that integrates monitoring data and outputs from existing air quality models based on Land Use Regression (LUR) and Chemical Transport Models (CTM). The framework was applied to estimate the yearly average NO2 concentrations over the region of Catalunya in Spain. By jointly accounting for the global scale variability in the concentration from the output of CTM and the intraurban scale variability through LUR model output, the proposed framework outperformed more conventional approaches.
27 CFR 19.245 - Bonds and penal sums of bonds.
Code of Federal Regulations, 2010 CFR
2010-04-01
... 27 Alcohol, Tobacco Products and Firearms 1 2010-04-01 2010-04-01 false Bonds and penal sums of bonds. 19.245 Bonds and penal sums of bonds. The bonds, and the penal sums thereof, required by this subpart, are as follows (table columns: Type of bond; Basis; Penal sum, minimum; Penal sum, maximum): (a) Operations bond: (1) One plant bond—...
Cao, M H; Adeola, O
2016-02-01
The energy values of poultry byproduct meal (PBM) and animal-vegetable oil blend (A-V blend) were determined in 2 experiments with 288 broiler chickens from d 19 to 25 post hatching. The birds were fed a starter diet from d 0 to 19 post hatching. In each experiment, 144 birds were grouped by weight into 8 replicates of cages with 6 birds per cage. There were 3 diets in each experiment consisting of one reference diet (RD) and 2 test diets (TD). The TD contained 2 levels of PBM (Exp. 1) or A-V blend (Exp. 2) that replaced the energy sources in the RD at 50 or 100 g/kg (Exp. 1) or 40 or 80 g/kg (Exp. 2) in such a way that the same ratios were maintained for energy ingredients across experimental diets. The ileal digestible energy (IDE), ME, and MEn of PBM and A-V blend were determined by the regression method. Dry matter of PBM and A-V blend was 984 and 999 g/kg; the gross energies were 5,284 and 9,604 kcal/kg of DM, respectively. Addition of PBM to the RD in Exp. 1 linearly decreased (P < 0.05) the ileal and total tract digestibilities and utilization of DM, energy, and nitrogen. In Exp. 2, addition of A-V blend to the RD linearly increased (P < 0.001) the ileal digestibilities and total tract utilization of DM, energy, and nitrogen, as well as IDE, ME, and MEn. Regressions of PBM-associated IDE, ME, or MEn intake in kcal against PBM intake were: IDE = 3,537x + 4.953, r(2) = 0.97; ME = 3,805x + 1.279, r(2) = 0.97; MEn = 3,278x + 0.164, r(2) = 0.90; and of A-V blend as follows: IDE = 10,616x + 7.350, r(2) = 0.96; ME = 10,121x + 0.447, r(2) = 0.99; MEn = 10,124x + 2.425, r(2) = 0.99. These data indicate the respective IDE, ME, and MEn values (kcal/kg of DM) of PBM to be 3,537, 3,805, and 3,278, and of A-V blend to be 10,616, 10,121, and 10,124.
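In the regression method, the slope of ingredient-associated energy intake on ingredient intake estimates the ingredient's energy value. The sketch below recovers the slope and intercept of the PBM IDE line reported above from noiseless points; the intake values themselves are hypothetical:

```python
import numpy as np

pbm_intake = np.array([0.00, 0.05, 0.10])      # kg of DM (illustrative levels)
ide_intake = 3537 * pbm_intake + 4.953         # kcal, from the reported fit
slope, intercept = np.polyfit(pbm_intake, ide_intake, 1)
# slope estimates IDE in kcal/kg of DM; intercept is the basal contribution
```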
SNP Selection in Genome-Wide Association Studies via Penalized Support Vector Machine with MAX Test
Kim, Jinseog; Kim, Dennis (Dong Hwan); Jung, Sin-Ho
2013-01-01
One of main objectives of a genome-wide association study (GWAS) is to develop a prediction model for a binary clinical outcome using single-nucleotide polymorphisms (SNPs) which can be used for diagnostic and prognostic purposes and for better understanding of the relationship between the disease and SNPs. Penalized support vector machine (SVM) methods have been widely used toward this end. However, since investigators often ignore the genetic models of SNPs, a final model results in a loss of efficiency in prediction of the clinical outcome. In order to overcome this problem, we propose a two-stage method such that the the genetic models of each SNP are identified using the MAX test and then a prediction model is fitted using a penalized SVM method. We apply the proposed method to various penalized SVMs and compare the performance of SVMs using various penalty functions. The results from simulations and real GWAS data analysis show that the proposed method performs better than the prediction methods ignoring the genetic models in terms of prediction power and selectivity. PMID:24174989
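The penalized-SVM stage can be sketched with an L1-penalized linear SVM, which zeroes out most SNP coefficients. This sketch skips the MAX-test genotype-coding stage and simply assumes additive coding; the genotype data and effect sizes are synthetic:

```python
import numpy as np
from sklearn.svm import LinearSVC

rng = np.random.default_rng(8)
n, p = 300, 50
snps = rng.integers(0, 3, size=(n, p)).astype(float)  # additive genotype codes 0/1/2
beta = np.zeros(p)
beta[:3] = [1.0, -1.0, 0.8]                           # three causal SNPs (assumed)
y = (snps @ beta + rng.normal(0, 1, n) > 0.8).astype(int)

# L1 penalty drives most SNP coefficients exactly to zero (SNP selection)
svm = LinearSVC(penalty="l1", dual=False, C=0.1, max_iter=5000).fit(snps, y)
selected = np.flatnonzero(svm.coef_[0])
```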
van Teunenbroek, A; Stijnen, T; Otten, B; de Muinck Keizer-Schrama, S; Naeraa, R W; Rongen-Westerlaken, C; Drop, S
1996-04-01
A total of 235 measurement points of 57 Dutch women with Turner's syndrome (TS), including women with spontaneous menarche and oestrogen treatment, served to develop a new Turner-specific final height (FH) prediction method (PTS). Analogous to the Tanner and Whitehouse mark 2 method (TW) for normal children, smoothed regression coefficients are tabulated for PTS for height (H), chronological age (CA) and bone age (BA), both TW RUS and Greulich and Pyle (GP). Comparison between all methods on 40 measurement points of 21 Danish TS women showed small mean prediction errors (predicted minus observed FH) and corresponding standard deviation (ESD) of both PTSRUS and PTSGP, in particular at the "younger" ages. Comparison between existing methods on the Dutch data indicated a tendency to overpredict FH. Before the CA of 9 years the mean prediction errors of the Bayley and Pinneau and TW methods were markedly higher compared with the other methods. Overall, the simplest methods--projected height (PAH) and its modification (mPAH)--were remarkably good at most ages. Although the validity of PTSRUS and PTSGP remains to be tested below the age of 6 years, both gave small mean prediction errors and a high accuracy. FH prediction in TS is important in the consideration of growth-promoting therapy or in the evaluation of its effects.
Recovering Velocity Distributions Via Penalized Likelihood
NASA Astrophysics Data System (ADS)
Merritt, David
1997-07-01
Line-of-sight velocity distributions are crucial for unravelling the dynamics of hot stellar systems. We present a new formalism based on penalized likelihood for deriving such distributions from kinematical data, and evaluate the performance of two algorithms that extract N(V) from absorption-line spectra and from sets of individual velocities. Both algorithms are superior to existing ones in that the solutions are nearly unbiased even when the data are so poor that a great deal of smoothing is required. In addition, the discrete-velocity algorithm is able to remove a known distribution of measurement errors from the estimate of N(V). The formalism is used to recover the velocity distribution of stars in five fields near the center of the globular cluster omega Centauri.
NASA Astrophysics Data System (ADS)
Vozinaki, Anthi Eirini K.; Karatzas, George P.; Sibetheros, Ioannis A.; Varouchakis, Emmanouil A.
2014-05-01
Damage curves are the most significant component of the flood loss estimation models. Their development is quite complex. Two types of damage curves exist, historical and synthetic curves. Historical curves are developed from historical loss data from actual flood events. However, due to the scarcity of historical data, synthetic damage curves can be alternatively developed. Synthetic curves rely on the analysis of expected damage under certain hypothetical flooding conditions. A synthetic approach was developed and presented in this work for the development of damage curves, which are subsequently used as the basic input to a flood loss estimation model. A questionnaire-based survey took place among practicing and research agronomists, in order to generate rural loss data based on the responders' loss estimates, for several flood condition scenarios. In addition, a similar questionnaire-based survey took place among building experts, i.e. civil engineers and architects, in order to generate loss data for the urban sector. By answering the questionnaire, the experts were in essence expressing their opinion on how damage to various crop types or building types is related to a range of values of flood inundation parameters, such as floodwater depth and velocity. However, the loss data compiled from the completed questionnaires were not sufficient for the construction of workable damage curves; to overcome this problem, a Weighted Monte Carlo method was implemented, in order to generate extra synthetic datasets with statistical properties identical to those of the questionnaire-based data. The data generated by the Weighted Monte Carlo method were processed via Logistic Regression techniques in order to develop accurate logistic damage curves for the rural and the urban sectors. A Python-based code was developed, which combines the Weighted Monte Carlo method and the Logistic Regression analysis into a single code (WMCLR Python code). Each WMCLR code execution
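The pairing of weighted Monte Carlo resampling with logistic regression described above can be sketched as follows. The expert loss data, confidence weights, and the plain Newton-Raphson fitting routine are illustrative assumptions, not the WMCLR Python code itself.

```python
import numpy as np

def fit_logistic(X, y, iters=25):
    """Logistic regression via Newton-Raphson (X includes an intercept column)."""
    beta = np.zeros(X.shape[1])
    for _ in range(iters):
        p = 1 / (1 + np.exp(-X @ beta))
        W = p * (1 - p)
        H = X.T @ (X * W[:, None])                 # observed information
        beta += np.linalg.solve(H, X.T @ (y - p))
    return beta

rng = np.random.default_rng(1)
# hypothetical expert responses: flood depth (m), damaged (1/0), confidence weight
depth = np.array([0.2, 0.4, 0.6, 0.9, 1.2, 1.5, 2.0, 2.5])
dmg   = np.array([0,   0,   0,   1,   0,   1,   1,   1  ])
w     = np.array([3,   2,   2,   1,   2,   3,   2,   1  ])

# weighted Monte Carlo: resample records proportionally to their weights
idx = rng.choice(len(depth), size=400, p=w / w.sum())
d, g = depth[idx], dmg[idx]
d = d + rng.normal(0, 0.05, size=d.size)           # small jitter for synthetic variety
X = np.column_stack([np.ones_like(d), d])
beta = fit_logistic(X, g.astype(float))            # logistic damage curve coefficients
```

The fitted curve `1 / (1 + exp(-(beta[0] + beta[1] * depth)))` then plays the role of a synthetic damage curve for the loss model.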
Understanding Poisson regression.
Hayat, Matthew J; Higgins, Melinda
2014-04-01
Nurse investigators often collect study data in the form of counts. Traditional methods of data analysis have historically approached analysis of count data either as if the count data were continuous and normally distributed or with dichotomization of the counts into the categories of occurred or did not occur. These outdated methods for analyzing count data have been replaced with more appropriate statistical methods that make use of the Poisson probability distribution, which is useful for analyzing count data. The purpose of this article is to provide an overview of the Poisson distribution and its use in Poisson regression. Assumption violations for the standard Poisson regression model are addressed with alternative approaches, including addition of an overdispersion parameter or negative binomial regression. An illustrative example is presented with an application from the ENSPIRE study, and regression modeling of comorbidity data is included for illustrative purposes.
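A minimal sketch of Poisson regression fitted by iteratively reweighted least squares, together with a crude Pearson check for the overdispersion mentioned above. The data are synthetic with known coefficients, not the ENSPIRE example; a Pearson dispersion well above 1 would suggest moving to an overdispersion parameter or negative binomial regression.

```python
import numpy as np

def fit_poisson(X, y, iters=50):
    """Poisson regression (log link) via iteratively reweighted least squares."""
    beta = np.zeros(X.shape[1])
    for _ in range(iters):
        mu = np.exp(X @ beta)
        # Newton step: (X' W X)^{-1} X'(y - mu), with W = diag(mu)
        beta += np.linalg.solve(X.T @ (X * mu[:, None]), X.T @ (y - mu))
    return beta

rng = np.random.default_rng(2)
n = 1000
x = rng.normal(size=n)
mu_true = np.exp(0.5 + 0.3 * x)            # true intercept 0.5, slope 0.3
y = rng.poisson(mu_true)
X = np.column_stack([np.ones(n), x])
beta = fit_poisson(X, y.astype(float))

# Pearson chi-square / df should be near 1 for genuinely Poisson counts
mu_hat = np.exp(X @ beta)
dispersion = np.sum((y - mu_hat) ** 2 / mu_hat) / (n - X.shape[1])
```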
Lopiano, Kenneth K; Young, Linda J; Gotway, Carol A
2014-09-01
Spatially referenced datasets arising from multiple sources are routinely combined to assess relationships among various outcomes and covariates. The geographical units associated with the data, such as the geographical coordinates or areal-level administrative units, are often spatially misaligned, that is, observed at different locations or aggregated over different geographical units. As a result, the covariate is often predicted at the locations where the response is observed. The method used to align disparate datasets must be accounted for when subsequently modeling the aligned data. Here we consider the case where kriging is used to align datasets in point-to-point and point-to-areal misalignment problems when the response variable is non-normally distributed. If the relationship is modeled using generalized linear models, the additional uncertainty induced from using the kriging mean as a covariate introduces a Berkson error structure. In this article, we develop a pseudo-penalized quasi-likelihood algorithm to account for this additional uncertainty when estimating regression parameters and associated measures of uncertainty. The method is applied to a point-to-point example assessing the relationship between low birth weights and PM2.5 levels after the onset of the largest wildfire in Florida history, the Bugaboo scrub fire. A point-to-areal misalignment problem is presented in which the relationship between asthma events in Florida's counties and PM2.5 levels after the onset of the fire is assessed. Finally, the method is evaluated using a simulation study. Our results indicate that the method performs well in terms of coverage of 95% confidence intervals, and that naive methods ignoring the additional uncertainty tend to underestimate the variability associated with parameter estimates. The underestimation is most profound in Poisson regression models.
Jen, Min-Hua; Bottle, Alex; Kirkwood, Graham; Johnston, Ron; Aylin, Paul
2011-09-01
We have previously described a system for monitoring a number of healthcare outcomes using case-mix adjustment models. It is desirable to automate the model fitting process in such a system if monitoring covers a large number of outcome measures or subgroup analyses. Our aim was to compare the performance of three different variable selection strategies: "manual", "automated" backward elimination and re-categorisation, and including all variables at once, irrespective of their apparent importance, with automated re-categorisation. Logistic regression models for predicting in-hospital mortality and emergency readmission within 28 days were fitted to an administrative database for 78 diagnosis groups and 126 procedures from 1996 to 2006 for National Health Service hospital trusts in England. The performance of the models was assessed with Receiver Operating Characteristic (ROC) c statistics (measuring discrimination) and the Brier score (assessing average predictive accuracy). Overall, discrimination was similar for diagnoses and procedures and consistently better for mortality than for emergency readmission. Brier scores were generally low overall (showing higher accuracy) and were lower for procedures than diagnoses, with a few exceptions for emergency readmission within 28 days. Among the three variable selection strategies, the automated procedure performed similarly to the manual method in almost all cases, except in low-risk groups with few outcome events. For the rapid generation of multiple case-mix models we suggest applying automated modelling to reduce the time required, in particular when examining different outcomes of large numbers of procedures and diseases in routinely collected administrative health data.
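The two performance measures named above can be computed directly. This small self-contained sketch uses toy values (not the study's data): the c statistic is the proportion of event/non-event pairs the model orders correctly, and the Brier score is the mean squared gap between predicted risk and outcome.

```python
import numpy as np

def c_statistic(y, p):
    """ROC c statistic: probability a random event outranks a random non-event."""
    pos, neg = p[y == 1], p[y == 0]
    diff = pos[:, None] - neg[None, :]
    return (np.sum(diff > 0) + 0.5 * np.sum(diff == 0)) / (len(pos) * len(neg))

def brier_score(y, p):
    """Mean squared difference between predicted risk and observed outcome."""
    return np.mean((p - y) ** 2)

y = np.array([0, 0, 1, 1])
p = np.array([0.1, 0.4, 0.35, 0.8])
auc = c_statistic(y, p)    # 0.75: three of the four event/non-event pairs ordered correctly
bs = brier_score(y, p)     # lower is better; 0 would be perfect calibration and discrimination
```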
Li, J.; Gray, B.R.; Bates, D.M.
2008-01-01
Partitioning the variance of a response by design levels is challenging for binomial and other discrete outcomes. Goldstein (2003) proposed four definitions for variance partitioning coefficients (VPC) under a two-level logistic regression model. In this study, we explicitly derived formulae for the multilevel logistic regression model and subsequently studied the distributional properties of the calculated VPCs. Using simulations and a vegetation dataset, we demonstrated associations between different VPC definitions, the importance of the methods used for estimating VPCs (by comparing VPCs obtained using Laplace and penalized quasi-likelihood methods), and bivariate dependence between VPCs calculated at different levels. Such an empirical study lends immediate support to wider applications of VPC in scientific data analysis.
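One of Goldstein's VPC definitions, the latent-response formulation, has a simple closed form; the sketch below assumes that particular formulation, in which the level-1 variance of a logistic model is fixed at π²/3.

```python
import math

def vpc_latent(sigma2_u):
    """Variance partitioning coefficient under the latent-response formulation:
    level-2 variance over total variance, with logistic level-1 variance pi^2/3."""
    return sigma2_u / (sigma2_u + math.pi ** 2 / 3)

vpc = vpc_latent(1.0)   # ~0.233: about 23% of latent variance lies between groups
```

The other definitions (e.g. simulation-based ones on the probability scale) generally give different numbers, which is the kind of discrepancy the study above quantifies.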
Hirakawa, Akihiro; Hamada, Chikuma; Yoshimura, Isao
2012-01-01
When identifying the differentially expressed genes (DEGs) in microarray data, we often observe heteroscedasticity between groups and dependence among genes. Incorporating these factors is necessary for sample size calculation in microarray experiments. A penalized t-statistic is widely used to improve the identifiability of DEGs. We develop a formula to calculate sample size with dependence adjustment for the penalized t-statistic. Sample size is determined on the basis of overall power under certain conditions to maintain a certain false discovery rate. The usefulness of the proposed method is demonstrated by numerical studies using both simulated data and real data.
A Guide to Assistance In Penal and Correctional Institutions
ERIC Educational Resources Information Center
Walker, Bailus; Gordon, Theodore
1973-01-01
Lists the more significant federal assistance programs relating to penal and correctional reform which may serve as a guide for environmental health specialists who are beginning to assume much broader responsibilities in institutional environmental quality. (JR)
Ferragina, A; de los Campos, G; Vazquez, A I; Cecchinato, A; Bittante, G
2015-11-01
The aim of this study was to assess the performance of Bayesian models commonly used for genomic selection to predict "difficult-to-predict" dairy traits, such as milk fatty acid (FA) expressed as percentage of total fatty acids, and technological properties, such as fresh cheese yield and protein recovery, using Fourier-transform infrared (FTIR) spectral data. Our main hypothesis was that Bayesian models that can estimate shrinkage and perform variable selection may improve our ability to predict FA traits and technological traits above and beyond what can be achieved using the current calibration models (e.g., partial least squares, PLS). To this end, we assessed a series of Bayesian methods and compared their prediction performance with that of PLS. The comparison between models was done using the same sets of data (i.e., same samples, same variability, same spectral treatment) for each trait. Data consisted of 1,264 individual milk samples collected from Brown Swiss cows for which gas chromatographic FA composition, milk coagulation properties, and cheese-yield traits were available. For each sample, 2 spectra in the infrared region from 5,011 to 925 cm(-1) were available and averaged before data analysis. Three Bayesian models: Bayesian ridge regression (Bayes RR), Bayes A, and Bayes B, and 2 reference models: PLS and modified PLS (MPLS) procedures, were used to calibrate equations for each of the traits. The Bayesian models used were implemented in the R package BGLR (http://cran.r-project.org/web/packages/BGLR/index.html), whereas the PLS and MPLS were those implemented in the WinISI II software (Infrasoft International LLC, State College, PA). Prediction accuracy was estimated for each trait and model using 25 replicates of a training-testing validation procedure. Compared with PLS, which is currently the most widely used calibration method, MPLS and the 3 Bayesian methods showed significantly greater prediction accuracy. Accuracy increased in moving from
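With the variance hyperparameters held fixed, the Bayes RR posterior mean has a closed ridge-regression form. The numpy sketch below illustrates that connection on synthetic "spectral" data; it is a simplified stand-in for, not a reimplementation of, the BGLR machinery, and the hyperparameter values are arbitrary assumptions.

```python
import numpy as np

def ridge_posterior_mean(X, y, sigma2_e, sigma2_b):
    """Posterior mean of b under y ~ N(Xb, sigma2_e I), b ~ N(0, sigma2_b I).
    Equivalent to ridge regression with lambda = sigma2_e / sigma2_b."""
    lam = sigma2_e / sigma2_b
    p = X.shape[1]
    return np.linalg.solve(X.T @ X + lam * np.eye(p), X.T @ y)

rng = np.random.default_rng(3)
n, p = 200, 50                                   # many predictors, as with FTIR wavenumbers
X = rng.normal(size=(n, p))
b_true = rng.normal(0, 0.3, size=p)
y = X @ b_true + rng.normal(0, 0.5, size=n)
b_hat = ridge_posterior_mean(X, y, sigma2_e=0.25, sigma2_b=0.09)
```

Bayes A and Bayes B replace the common Gaussian prior with heavier-tailed or spike-and-slab priors, which is what allows differential shrinkage and variable selection.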
ERIC Educational Resources Information Center
Guler, Nese; Penfield, Randall D.
2009-01-01
In this study, we investigate the logistic regression (LR), Mantel-Haenszel (MH), and Breslow-Day (BD) procedures for the simultaneous detection of both uniform and nonuniform differential item functioning (DIF). A simulation study was used to assess and compare the Type I error rate and power of a combined decision rule (CDR), which assesses DIF…
ERIC Educational Resources Information Center
Hick, Thomas L.; Irvine, David J.
To eliminate maturation as a factor in the pretest-posttest design, pretest scores can be converted to anticipate posttest scores using grade equivalent scores from standardized tests. This conversion, known as historical regression, assumes that without specific intervention, growth will continue at the rate (grade equivalents per year of…
ERIC Educational Resources Information Center
Perry, Thomas
2017-01-01
Value-added (VA) measures are currently the predominant approach used to compare the effectiveness of schools. Recent educational effectiveness research, however, has developed alternative approaches including the regression discontinuity (RD) design, which also allows estimation of absolute school effects. Initial research suggests RD is a viable…
NASA Astrophysics Data System (ADS)
Trigila, Alessandro; Iadanza, Carla; Esposito, Carlo; Scarascia-Mugnozza, Gabriele
2015-04-01
The first phase of the work aimed to identify the spatial relationships between landslide locations and the 13 related factors using the Frequency Ratio bivariate statistical method. The analysis was then carried out by adopting a multivariate statistical approach, according to the Logistic Regression technique and the Random Forests technique, which gave the best results in terms of AUC. The models were performed and evaluated with different sample sizes, also taking into account the temporal variation of input variables such as areas burned by wildfire. The most significant outcomes of this work are: the relevant influence of the sample size on the model results, and the strong importance of some environmental factors (e.g., land use and wildfires) for the identification of the depletion zones of extremely rapid shallow landslides.
Schmid, Matthias; Wickler, Florian; Maloney, Kelly O.; Mitchell, Richard; Fenske, Nora; Mayr, Andreas
2013-01-01
Regression analysis with a bounded outcome is a common problem in applied statistics. Typical examples include regression models for percentage outcomes and the analysis of ratings that are measured on a bounded scale. In this paper, we consider beta regression, which is a generalization of logit models to situations where the response is continuous on the interval (0,1). Consequently, beta regression is a convenient tool for analyzing percentage responses. The classical approach to fit a beta regression model is to use maximum likelihood estimation with subsequent AIC-based variable selection. As an alternative to this established - yet unstable - approach, we propose a new estimation technique called boosted beta regression. With boosted beta regression estimation and variable selection can be carried out simultaneously in a highly efficient way. Additionally, both the mean and the variance of a percentage response can be modeled using flexible nonlinear covariate effects. As a consequence, the new method accounts for common problems such as overdispersion and non-binomial variance structures. PMID:23626706
NASA Astrophysics Data System (ADS)
Espinoza-Ojeda, O. M.; Santoyo, E.
2016-08-01
A new practical method based on logarithmic transformation regressions was developed for the determination of static formation temperatures (SFTs) in geothermal, petroleum and permafrost bottomhole temperature (BHT) data sets. The new method involves the application of multiple linear and polynomial (from quadratic to eighth-order) regression models to BHT and log-transformed (Tln) shut-in times. Selection of the best regression models was carried out by using four statistical criteria: (i) the coefficient of determination as a fitting quality parameter; (ii) the sum of the normalized squared residuals; (iii) the absolute extrapolation, as a dimensionless statistical parameter that enables the accuracy of each regression model to be evaluated through the extrapolation of the last measured temperature of the data set; and (iv) the deviation percentage between the measured and predicted BHT data. The best regression model was used for reproducing the thermal recovery process of the boreholes, and for the determination of the SFT. The original thermal recovery data (BHT and shut-in time) were used to demonstrate the new method's prediction efficiency. The prediction capability of the new method was additionally evaluated by using synthetic data sets where the true formation temperature (TFT) was known with accuracy. For these purposes, a comprehensive statistical analysis was carried out through the application of the well-known F-test and Student's t-test and the error percentage or statistical differences computed between the SFT estimates and the reported TFT data. After applying the new log-transformation regression method to a wide variety of geothermal, petroleum, and permafrost boreholes, it was found that polynomial models were generally the best regression models describing their thermal recovery processes. These fitting results suggested the use of this new method for the reliable estimation of SFT. Finally, the practical use of the new method was
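The core of the log-transformation regression idea can be sketched as fitting competing polynomial orders in ln(shut-in time) and extrapolating the best one. The synthetic BHT series below is generated from an exact quadratic in ln(t) so the extrapolated value can be checked against the generating formula; the times, coefficients, and model-selection criterion (residual sum of squares only) are illustrative assumptions, not real borehole data or the paper's full four-criterion selection.

```python
import numpy as np

# synthetic bottomhole temperatures following an exact quadratic in ln(shut-in time)
t = np.array([6.0, 12, 18, 24, 36, 48])          # shut-in times, hours
x = np.log(t)
a, b, c = 70.0, 8.0, 1.5                          # hypothetical coefficients
bht = a + b * x + c * x ** 2                      # "measured" BHT series

# fit linear through cubic models in ln(t); keep the smallest residual sum of squares
fits = {deg: np.polyfit(x, bht, deg) for deg in (1, 2, 3)}
ss = {deg: np.sum((bht - np.polyval(co, x)) ** 2) for deg, co in fits.items()}
best_deg = min(ss, key=ss.get)

# static formation temperature estimate: extrapolate the best model to a long shut-in time
sft = np.polyval(fits[best_deg], np.log(200.0))
```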
Numerical modeling of flexible insect wings using volume penalization
NASA Astrophysics Data System (ADS)
Engels, Thomas; Kolomenskiy, Dmitry; Schneider, Kai; Sesterhenn, Joern
2012-11-01
We consider the effects of chordwise flexibility on the aerodynamic performance of insect flapping wings. We developed a numerical method for modeling viscous fluid flows past moving deformable foils. It extends the previously reported model for flows past moving rigid wings (J Comput Phys 228, 2009). The two-dimensional Navier-Stokes equations are solved using a Fourier pseudo-spectral method with the no-slip boundary conditions imposed by the volume penalization method. The deformable wing section is modeled using a non-linear beam equation. We performed numerical simulations of heaving flexible plates. The results showed that the optimal stroke frequency, which maximizes the mean thrust, is lower than the resonant frequency, in agreement with the experiments by Ramananarivo et al. (PNAS 108(15), 2011). The oscillatory part of the force only increases in amplitude when the frequency increases, and at the optimal frequency it is about 3 times larger than the mean force. We also study aerodynamic interactions between two heaving flexible foils. This flow configuration corresponds to the wings of dragonflies. We explore the effects of the phase difference and spacing between the fore- and hind-wing.
NASA Astrophysics Data System (ADS)
Barzin, Razieh; Shirvani, Amin; Lotfi, Hossein
2017-01-01
Downward shortwave radiation is a key quantity in the land-atmosphere interaction. Since the moderate resolution imaging spectroradiometer data has a coarse temporal resolution, which is not suitable for estimating daily average radiation, many efforts have been undertaken to estimate instantaneous solar radiation using moderate resolution imaging spectroradiometer data. In this study, the principal components analysis technique was applied to capture the information of moderate resolution imaging spectroradiometer bands, extraterrestrial radiation, aerosol optical depth, and atmospheric water vapour. A regression model based on the principal components was used to estimate daily average shortwave radiation for ten synoptic stations in the Fars province, Iran, for the period 2009-2012. The Durbin-Watson statistic and autocorrelation function of the residuals of the fitted principal components regression model indicated that the residuals were serially independent. The results indicated that the fitted principal components regression models accounted for about 86-96% of total variance of the observed shortwave radiation values and the root mean square error was about 0.9-2.04 MJ m⁻² d⁻¹. Also, the results indicated that the model accuracy decreased as the aerosol optical depth increased and extraterrestrial radiation was the most important predictor variable among all.
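Principal components regression as described above can be sketched with an SVD of the standardized predictors followed by ordinary least squares on the component scores. The latent-factor data below is a hypothetical stand-in for correlated satellite bands driving a radiation response; none of it comes from the study itself.

```python
import numpy as np

def pcr_fit(X, y, k):
    """Principal components regression: project standardized predictors onto the
    first k principal components, then ordinary least squares on the scores."""
    Xc = (X - X.mean(axis=0)) / X.std(axis=0)
    _, _, Vt = np.linalg.svd(Xc, full_matrices=False)
    Z = Xc @ Vt[:k].T                                  # component scores
    A = np.column_stack([np.ones(len(y)), Z])
    coef, *_ = np.linalg.lstsq(A, y, rcond=None)
    return coef, Vt[:k]

rng = np.random.default_rng(4)
n = 300
f = rng.normal(size=n)                                 # one latent driver
X = np.column_stack([f + rng.normal(0, 0.3, n) for _ in range(6)])  # 6 correlated "bands"
y = 2.0 * f + rng.normal(0, 0.5, n)                    # response depends on the latent factor
coef, components = pcr_fit(X, y, k=1)
Z = ((X - X.mean(0)) / X.std(0)) @ components.T
pred = np.column_stack([np.ones(n), Z]) @ coef
r2 = 1 - np.sum((y - pred) ** 2) / np.sum((y - y.mean()) ** 2)
```

Collapsing the correlated bands onto one component both removes the multicollinearity and averages out band-level noise, which is why a single-component model already explains most of the variance here.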
Clegg, Samuel M; Barefield, James E; Wiens, Roger C; Dyar, Melinda D; Schafer, Martha W; Tucker, Jonathan M
2008-01-01
The ChemCam instrument on the Mars Science Laboratory (MSL) will include a laser-induced breakdown spectrometer (LIBS) to quantify major and minor elemental compositions. The traditional analytical chemistry approach to calibration curves for these data regresses a single diagnostic peak area against concentration for each element. This approach contrasts with a new multivariate method in which elemental concentrations are predicted by step-wise multiple regression analysis based on areas of a specific set of diagnostic peaks for each element. The method is tested on LIBS data from igneous and metamorphosed rocks. Between 4 and 13 partial regression coefficients are needed to describe each elemental abundance accurately (i.e., with a regression line of R² > 0.9995 for the relationship between predicted and measured elemental concentration) for all major and minor elements studied. Validation plots suggest that the method is limited at present by the small data set, and will work best for prediction of concentration when a wide variety of compositions and rock types has been analyzed.
LINKING LUNG AIRWAY STRUCTURE TO PULMONARY FUNCTION VIA COMPOSITE BRIDGE REGRESSION
Chen, Kun; Hoffman, Eric A.; Seetharaman, Indu; Jiao, Feiran; Lin, Ching-Long; Chan, Kung-Sik
2017-01-01
The human lung airway is a complex inverted tree-like structure. Detailed airway measurements can be extracted from MDCT-scanned lung images, such as segmental wall thickness, airway diameter, parent-child branch angles, etc. The wealth of lung airway data provides a unique opportunity for advancing our understanding of the fundamental structure-function relationships within the lung. An important problem is to construct and identify important lung airway features in normal subjects and connect these to standardized pulmonary function test results such as FEV1%. Among other things, the problem is complicated by the fact that a particular airway feature may be an important (relevant) predictor only when it pertains to segments of certain generations. Thus, the key is an efficient, consistent method for simultaneously conducting group selection (lung airway feature types) and within-group variable selection (airway generations), i.e., bi-level selection. Here we streamline a comprehensive procedure to process the lung airway data via imputation, normalization, transformation and groupwise principal component analysis, and then adopt a new composite penalized regression approach for conducting bi-level feature selection. As a prototype of composite penalization, the proposed composite bridge regression method is shown to admit an efficient algorithm, enjoy bi-level oracle properties, and outperform several existing methods. We analyze the MDCT lung image data from a cohort of 132 subjects with normal lung function. Our results show that, lung function in terms of FEV1% is promoted by having a less dense and more homogeneous lung comprising an airway whose segments enjoy more heterogeneity in wall thicknesses, larger mean diameters, lumen areas and branch angles. These data hold the potential of defining more accurately the “normal” subject population with borderline atypical lung functions that are clearly influenced by many genetic and environmental factors. PMID
Antropov, K M; Varaksin, A N
2013-01-01
This paper provides a description of Land Use Regression (LUR) modeling and the results of its application in a study of nitrogen dioxide air pollution in Ekaterinburg. The paper describes the difficulties of modeling air pollution caused by motor vehicle exhaust, and the ways to address these challenges. To create a LUR model of NO2 air pollution in Ekaterinburg, concentrations of NO2 were measured, data on factors affecting air pollution were collected, and a statistical analysis of the data was performed. A statistical model of NO2 air pollution (coefficient of determination R² = 0.70) and a map of pollution were created.
Robust Nuclear Norm-based Matrix Regression with Applications to Robust Face Recognition.
Xie, Jianchun; Yang, Jian; Qian, Jianjun; Tai, Ying; Zhang, Hengmin
2017-02-01
Face recognition (FR) via regression-analysis-based classification has been widely studied in the past several years. Most existing regression analysis methods characterize the pixelwise representation error via the l1-norm or l2-norm, which overlooks the two-dimensional structure of the error image. Recently, the nuclear norm based matrix regression (NMR) model was proposed to characterize the low-rank structure of the error image. However, the nuclear norm cannot accurately describe low-rank structural noise when the incoherence assumptions on the singular values do not hold, since it over-penalizes several much larger singular values. To address this problem, this paper presents the robust nuclear norm to characterize the structural error image and then extends it to deal with mixed noise. The majorization-minimization (MM) method is applied to derive an iterative scheme for minimization of the robust nuclear norm optimization problem. Then, an efficient alternating direction method of multipliers (ADMM) is used to solve the proposed models. We use the weighted nuclear norm as the classification criterion to obtain the final recognition results. Experiments on several public face databases demonstrate the effectiveness of our models in handling variations of structural noise (occlusion, illumination, etc.) and mixed noise.
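Nuclear-norm penalties enter MM and ADMM schemes like those above through their proximal operator, singular value soft-thresholding. The sketch below shows the plain operator and a weighted variant that can shrink large singular values less, which is the intuition behind a robust or weighted nuclear norm; it is a generic illustration on a synthetic low-rank "error image", not the authors' model.

```python
import numpy as np

def svt(E, lam):
    """Proximal operator of the nuclear norm: soft-threshold the singular values."""
    U, s, Vt = np.linalg.svd(E, full_matrices=False)
    return U @ np.diag(np.maximum(s - lam, 0.0)) @ Vt

def weighted_svt(E, lam, w):
    """Weighted variant: per-singular-value thresholds; small weights on the
    leading singular values avoid over-penalizing them."""
    U, s, Vt = np.linalg.svd(E, full_matrices=False)
    return U @ np.diag(np.maximum(s - lam * w, 0.0)) @ Vt

rng = np.random.default_rng(5)
L = rng.normal(size=(20, 2)) @ rng.normal(size=(2, 20))   # rank-2 structure
E = L + rng.normal(0, 0.1, size=(20, 20))                 # plus dense noise
D = svt(E, lam=2.0)                                       # denoised, low-rank estimate
```

The thresholded matrix keeps only the singular directions whose strength exceeds the penalty, so the small noise-driven singular values are zeroed out.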
NASA Technical Reports Server (NTRS)
Jacobsen, R. T.; Stewart, R. B.; Crain, R. W., Jr.; Rose, G. L.; Myers, A. F.
1976-01-01
A method was developed for establishing a rational choice of the terms to be included in an equation of state with a large number of adjustable coefficients. The methods presented were developed for use in the determination of an equation of state for oxygen and nitrogen. However, a general application of the methods is possible in studies involving the determination of an optimum polynomial equation for fitting a large number of data points. The data considered in the least squares problem are experimental thermodynamic pressure-density-temperature data. Attention is given to a description of stepwise multiple regression and the use of stepwise regression in the determination of an equation of state for oxygen and nitrogen.
pPXF: Penalized Pixel-Fitting stellar kinematics extraction
NASA Astrophysics Data System (ADS)
Cappellari, Michele
2012-10-01
pPXF is an IDL (and free GDL or FL) program which extracts the stellar kinematics or stellar population from absorption-line spectra of galaxies using the Penalized Pixel-Fitting method (pPXF) developed by Cappellari & Emsellem (2004, PASP, 116, 138). Additional features implemented in the pPXF routine include: Optimal template: fitted together with the kinematics to minimize template-mismatch errors; also useful to extract gas kinematics or derive emission-corrected line-strength indices. One can use synthetic templates to study the stellar population of galaxies via "Full Spectral Fitting" instead of traditional line strengths. Regularization of template weights: to reduce the noise in the recovery of the stellar population parameters and attach a physical meaning to the output weights assigned to the templates in terms of the star formation history (SFH) or metallicity distribution of an individual galaxy. Iterative sigma clipping: to clean the spectra of residual bad pixels or cosmic rays. Additive/multiplicative polynomials: to correct low-frequency continuum variations; also useful for calibration purposes.
Maximum penalized likelihood estimation in semiparametric mark-recapture-recovery models.
Michelot, Théo; Langrock, Roland; Kneib, Thomas; King, Ruth
2016-01-01
We discuss the semiparametric modeling of mark-recapture-recovery data where the temporal and/or individual variation of model parameters is explained via covariates. Typically, in such analyses a fixed (or mixed) effects parametric model is specified for the relationship between the model parameters and the covariates of interest. In this paper, we discuss the modeling of the relationship via the use of penalized splines, to allow for considerably more flexible functional forms. Corresponding models can be fitted via numerical maximum penalized likelihood estimation, employing cross-validation to choose the smoothing parameters in a data-driven way. Our contribution builds on and extends the existing literature, providing a unified inferential framework for semiparametric mark-recapture-recovery models for open populations, where the interest typically lies in the estimation of survival probabilities. The approach is applied to two real datasets, corresponding to gray herons (Ardea cinerea), where we model the survival probability as a function of environmental condition (a time-varying global covariate), and Soay sheep (Ovis aries), where we model the survival probability as a function of individual weight (a time-varying individual-specific covariate). The proposed semiparametric approach is compared to a standard parametric (logistic) regression and new interesting underlying dynamics are observed in both cases.
Zhang, Yan-Feng; Zhang, Li; Gao, Zhi-Xian; Dai, Shu-Gui
2012-01-01
Polycyclic aromatic hydrocarbons (PAHs) are ubiquitous contaminants found in the environment. Immunoassays represent useful analytical methods to complement traditional analytical procedures for PAHs. Cross-reactivity (CR) is a very useful character to evaluate the extent of cross-reaction of a cross-reactant in immunoreactions and immunoassays. The quantitative relationships between the molecular properties and the CR of PAHs were established by stepwise multiple linear regression, principal component regression and partial least square regression, using the data of two commercial enzyme-linked immunosorbent assay (ELISA) kits. The objective is to find the most important molecular properties that affect the CR, and predict the CR by multiple regression methods. The results show that the physicochemical, electronic and topological properties of the PAH molecules have an integrated effect on the CR properties for the two ELISAs, among which molar solubility (S(m)) and valence molecular connectivity index ((3)χ(v)) are the most important factors. The obtained regression equations for the Ris(C) kit are all statistically significant (p < 0.005) and show satisfactory ability for predicting CR values, while the equations for the RaPID kit are all non-significant (p > 0.05) and not suitable for prediction. This is probably because the Ris(C) immunoassay employs a monoclonal antibody, while the RaPID kit is based on a polyclonal antibody. Considering the important effect of solubility on the CR values, cross-reaction potential (CRP) is calculated and used as a complement of CR for evaluation of cross-reactions in immunoassays. Only the compounds with both high CR and high CRP can cause intense cross-reactions in immunoassays.
Raevsky, O A; Polianczyk, D E; Mukhametov, A; Grigorev, V Y
2016-08-01
Assessment of the "CNS drugs/CNS candidates" classification ability of the multi-parametric optimization (CNS MPO) approach was performed by logistic regression. It was found that five of the six individually used physicochemical properties (topological polar surface area, number of hydrogen-bond donor atoms, basicity, and lipophilicity of the compound in neutral form and at pH = 7.4) provided recognition accuracy below 60%. Only the molecular weight (MW) descriptor could correctly classify two-thirds of the studied compounds. Aggregating all six properties in the MPO score did not improve the classification, which was worse than the classification using MW alone. The results of our study demonstrate the imperfection of the CNS MPO approach; in its current form it is not very useful for the computer design of new, effective CNS drugs.
Held, Elizabeth; Cape, Joshua; Tintle, Nathan
2016-01-01
Machine learning methods continue to show promise in the analysis of data from genetic association studies because of the high number of variables relative to the number of observations. However, few best practices exist for the application of these methods. We extend a recently proposed supervised machine learning approach for predicting disease risk by genotypes to be able to incorporate gene expression data and rare variants. We then apply 2 different versions of the approach (radial and linear support vector machines) to simulated data from Genetic Analysis Workshop 19 and compare performance to logistic regression. Method performance was not radically different across the 3 methods, although the linear support vector machine tended to show small gains in predictive ability relative to a radial support vector machine and logistic regression. Importantly, as the number of genes in the models was increased, even when those genes contained causal rare variants, model predictive ability showed a statistically significant decrease in performance for both the radial support vector machine and logistic regression. The linear support vector machine showed more robust performance to the inclusion of additional genes. Further work is needed to evaluate machine learning approaches on larger samples and to evaluate the relative improvement in model prediction from the incorporation of gene expression data.
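The degradation-with-added-genes finding can be illustrated with a toy experiment (synthetic data; a least-squares classifier on 0/1 labels stands in for the SVM and logistic models, which is an assumption of this sketch):

```python
import numpy as np

# Sketch (synthetic data): held-out accuracy of a simple linear classifier
# as uninformative features are added, mirroring the abstract's finding
# that extra noise "genes" degrade predictive ability.
rng = np.random.default_rng(2)

def mean_test_accuracy(n_noise, reps=30, n=60):
    accs = []
    for _ in range(reps):
        y = np.repeat([0.0, 1.0], n)
        signal = rng.normal(0, 1, 2 * n) + np.where(y == 1, 0.8, -0.8)
        X = np.column_stack([np.ones(2 * n), signal,
                             rng.normal(0, 1, (2 * n, n_noise))])
        train = np.arange(0, 2 * n, 2)        # even rows train, odd rows test
        test = np.arange(1, 2 * n, 2)
        beta, *_ = np.linalg.lstsq(X[train], y[train], rcond=None)
        accs.append(np.mean((X[test] @ beta > 0.5) == (y[test] == 1)))
    return float(np.mean(accs))

acc_small = mean_test_accuracy(n_noise=2)     # one informative + 2 noise features
acc_large = mean_test_accuracy(n_noise=80)    # 80 extra noise features
print(f"few noise features: {acc_small:.2f}, many: {acc_large:.2f}")
```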
Lin, Zhaozhou; Zhang, Qiao; Liu, Ruixin; Gao, Xiaojie; Zhang, Lu; Kang, Bingya; Shi, Junhan; Wu, Zidan; Gui, Xinjing; Li, Xuelin
2016-01-25
To accurately, safely, and efficiently evaluate the bitterness of Traditional Chinese Medicines (TCMs), a robust predictor was developed using the robust partial least squares (RPLS) regression method based on data obtained from an electronic tongue (e-tongue) system. The data quality was verified by Grubbs' test. Moreover, potential outliers were detected based on both the standardized residual and the score distance calculated for each sample. The performance of RPLS on the dataset before and after outlier detection was compared to other state-of-the-art methods including multivariate linear regression, least squares support vector machine, and plain partial least squares regression. Both R² and the root-mean-square error (RMSE) of cross-validation (CV) were recorded for each model. With four latent variables, a robust RMSECV value of 0.3916, with bitterness values ranging from 0.63 to 4.78, was obtained for the RPLS model constructed on the dataset including outliers. Meanwhile, the RMSECV calculated for the models constructed by the other methods was larger than that of the RPLS model. After six outliers were excluded, the performance of all benchmark methods markedly improved, but the difference between the RPLS models constructed before and after outlier exclusion was negligible. In conclusion, the bitterness of TCM decoctions can be accurately evaluated with an RPLS model constructed using e-tongue data.
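A sketch of the RMSECV-with-outliers mechanism, on synthetic calibration data: leave-one-out RMSECV for a simple linear calibration before and after dropping the sample with the largest standardized residual. This is a crude stand-in for the Grubbs-style screening and robust PLS described above, not the paper's method.

```python
import numpy as np

# Sketch (synthetic data): leave-one-out RMSECV for a univariate linear
# calibration, before/after removing one injected gross outlier.
rng = np.random.default_rng(3)
x = np.linspace(0.63, 4.78, 30)               # reference bitterness values
y = 0.2 + 0.9 * x + rng.normal(0, 0.1, 30)    # simulated e-tongue response
y[10] += 3.0                                  # inject one gross outlier

def rmsecv(x, y):
    errs = []
    for i in range(len(x)):                   # leave-one-out loop
        keep = np.arange(len(x)) != i
        slope, intercept = np.polyfit(x[keep], y[keep], 1)
        errs.append(y[i] - (slope * x[i] + intercept))
    return float(np.sqrt(np.mean(np.square(errs))))

before = rmsecv(x, y)
resid = y - np.polyval(np.polyfit(x, y, 1), x)
keep = np.abs(resid - resid.mean()) / resid.std() < 3.0   # flag outliers
after = rmsecv(x[keep], y[keep])
print(f"RMSECV before: {before:.3f}, after outlier removal: {after:.3f}")
```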
Farhadian, Maryam; Aliabadi, Mohsen; Darvishi, Ebrahim
2015-01-01
Background: Prediction models are used in a variety of medical domains, and they are frequently built from data acquired from actual cases. This study aimed to analyze the potential of artificial neural networks and logistic regression techniques for estimation of hearing impairment among industrial workers. Materials and Methods: A total of 210 workers employed in a steel factory (in western Iran) were selected, and their occupational exposure histories were analyzed. The hearing loss thresholds of the studied workers were determined using a calibrated audiometer. The personal noise exposures were also measured using a noise dosimeter in the workstations. Data obtained from five variables that can influence hearing loss were used as input features, and the hearing loss thresholds were considered as the target feature of the prediction methods. Multilayer feedforward neural networks and logistic regression were developed using MATLAB R2011a software. Results: Based on the World Health Organization classification for the grades of hearing loss, 74.2% of the studied workers had normal hearing thresholds, 23.4% had slight hearing loss, and 2.4% had moderate hearing loss. The accuracy and kappa coefficient of the best developed neural network for prediction of the grades of hearing loss were 88.6 and 66.30, respectively. The accuracy and kappa coefficient of the logistic regression were 84.28 and 51.30, respectively. Conclusion: Neural networks could provide more accurate predictions of hearing loss than logistic regression. The prediction method can provide reliable and comprehensible information for occupational health and medicine experts. PMID:26500410
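The kappa coefficient reported alongside accuracy above is Cohen's chance-corrected agreement statistic; a small sketch computes it from a confusion matrix (the matrix below is illustrative, not the study's data):

```python
import numpy as np

# Sketch: Cohen's kappa from a 3x3 confusion matrix over hearing-loss
# grades (rows: true grade, cols: predicted grade). Counts are invented.
cm = np.array([[140, 10, 0],
               [ 12, 35, 2],
               [  0,  3, 8]], dtype=float)

total = cm.sum()
observed = np.trace(cm) / total                          # raw accuracy
expected = (cm.sum(axis=0) @ cm.sum(axis=1)) / total**2  # chance agreement
kappa = (observed - expected) / (1.0 - expected)
print(f"accuracy={observed:.3f}, kappa={kappa:.3f}")
```

Kappa discounts the agreement expected by chance from the marginal class frequencies, which is why it is lower than raw accuracy.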
NASA Astrophysics Data System (ADS)
Huang, Cong; Liu, Dan-Dan; Wang, Jing-Song
2009-06-01
The 10.7 cm solar radio flux (F10.7), the value of the solar radio emission flux density at a wavelength of 10.7 cm, is a useful index of solar activity as a proxy for solar extreme ultraviolet radiation. It is meaningful and important to predict F10.7 values accurately for both long-term (months-years) and short-term (days) forecasting, which are often used as inputs in space weather models. This study applies a kernel-based machine learning technique, support vector regression (SVR), to forecasting daily values of F10.7. The aim of this study is to examine the feasibility of SVR in short-term F10.7 forecasting. The approach, based on SVR, reduces the dimension of the feature space in the training process by using a kernel-based learning algorithm. Thus, the complexity of the calculation becomes lower and a small amount of training data will be sufficient. The time series of F10.7 from 2002 to 2006 are employed as the data sets. The performance of the approach is estimated by calculating the normalized mean square error and mean absolute percentage error. It is shown that our approach can perform well by using fewer training data points than the traditional neural network.
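A sketch of the one-day-ahead forecasting setup and the two error measures named above, on a synthetic 27-day-cycle series. A linear autoregression stands in for the kernel SVR (an assumption of this sketch; the real study fits SVR to observed F10.7).

```python
import numpy as np

# Sketch (synthetic series): AR(3) one-step-ahead forecasts of a smooth
# solar-like index, scored by normalized mean square error (NMSE) and
# mean absolute percentage error (MAPE).
rng = np.random.default_rng(4)
t = np.arange(400)
f107 = 120 + 40 * np.sin(2 * np.pi * t / 27) + rng.normal(0, 3, 400)

lags = 3
X = np.column_stack([f107[i:len(f107) - lags + i] for i in range(lags)])
X = np.column_stack([np.ones(len(X)), X])
y = f107[lags:]
beta, *_ = np.linalg.lstsq(X[:300], y[:300], rcond=None)  # train on early part
pred = X[300:] @ beta                                     # forecast the rest
truth = y[300:]

nmse = np.mean((pred - truth) ** 2) / np.var(truth)
mape = np.mean(np.abs((pred - truth) / truth)) * 100
print(f"NMSE={nmse:.3f}, MAPE={mape:.2f}%")
```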
Boy-Roura, M; Cameron, K C; Di, H J
2016-02-01
This study presents a meta-analysis of 12 experiments that quantify nitrate-N leaching losses from grazed pasture systems in alluvial sedimentary soils in Canterbury (New Zealand). Mean measured nitrate-N leaching losses (kg N/ha per 100 mm drainage) were 2.7 when no urine was applied, 8.4 at a urine rate of 300 kg N/ha, 9.8 at 500 kg N/ha, 24.5 at 700 kg N/ha and 51.4 at 1000 kg N/ha. Lismore soils presented significantly higher nitrate-N losses compared to Templeton soils. Moreover, a multiple linear regression (MLR) model was developed to determine the key factors that influence nitrate-N leaching and to predict nitrate-N leaching losses. The MLR analysis was calibrated and validated using 82 average values of nitrate-N leached and 48 explanatory variables representative of nitrogen inputs and outputs, transport, attenuation of nitrogen and farm management practices. The MLR model (R² = 0.81) showed that nitrate-N leaching losses were greater at higher urine application rates and when there was more drainage from rainfall and irrigation. On the other hand, nitrate leaching decreased when nitrification inhibitors (e.g. dicyandiamide (DCD)) were applied. Predicted nitrate-N leaching losses at the paddock scale were calculated using the MLR equation, and they varied widely depending on the urine application rate and urine patch coverage.
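The form of the MLR model above (continuous drivers plus a 0/1 inhibitor indicator) can be sketched on synthetic data; all coefficients below are invented for illustration, not the paper's fitted values.

```python
import numpy as np

# Sketch (synthetic data): MLR of nitrate-N leached on urine-N rate,
# drainage, and a dicyandiamide (DCD) indicator. The negative DCD
# coefficient mimics the reported inhibitor effect.
rng = np.random.default_rng(5)
n = 82                                     # matches the calibration size
urine = rng.choice([0, 300, 500, 700, 1000], n).astype(float)
drainage = rng.uniform(100, 600, n)        # mm of drainage
dcd = rng.integers(0, 2, n).astype(float)  # 1 = inhibitor applied
leached = 0.03 * urine + 0.02 * drainage - 6.0 * dcd + rng.normal(0, 3, n)

X = np.column_stack([np.ones(n), urine, drainage, dcd])
beta, *_ = np.linalg.lstsq(X, leached, rcond=None)
r2 = 1 - np.sum((leached - X @ beta) ** 2) / np.sum((leached - leached.mean()) ** 2)
print(f"DCD coefficient: {beta[3]:.2f}, R^2 = {r2:.2f}")
```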
NASA Astrophysics Data System (ADS)
Setiawan; Suhartono; Ahmad, Imam Safawi; Rahmawati, Noorgam Ika
2015-12-01
Bank Indonesia (BI), as the central bank of the Republic of Indonesia, has a single overarching objective: to establish and maintain rupiah stability. This objective can be pursued by monitoring the traffic of currency inflow and outflow. Inflow and outflow are related to the stock and distribution of currency across Indonesian territory, and they affect economic activities. The economic activities of Indonesia, a predominantly Muslim country, are closely tied to the Islamic (lunar) calendar, which differs from the Gregorian calendar. This research aims to forecast the currency inflow and outflow of the Representative Office (RO) of BI for the Semarang, Central Java region. The analysis shows that the characteristics of currency inflow and outflow are influenced by calendar-variation effects, namely the day of Eid al-Fitr (a Muslim holiday), as well as by seasonal patterns. In addition, the particular week in which Eid al-Fitr falls also affects the increase in currency inflow and outflow. Based on the smallest Root Mean Square Error (RMSE), the best model for the inflow data is an ARIMA model, while the best model for predicting the outflow data at the RO of BI Semarang is an ARIMAX model or Time Series Regression, the two being equivalent here. The forecasts for 2015 show an increase in currency inflow in August and an increase in currency outflow in July.
ERIC Educational Resources Information Center
Pedrini, D. T.; Pedrini, Bonnie C.
Regression, another mechanism studied by Sigmund Freud, has been the subject of much research, e.g., hypnotic regression, frustration regression, schizophrenic regression, and infra-human-animal regression (often directly related to fixation). Many investigators have worked with hypnotic age regression, which has a long history going back to the Russian reflexologists.…
Lange, Kenneth; Papp, Jeanette C.; Sinsheimer, Janet S.; Sobel, Eric M.
2014-01-01
Statistical genetics is undergoing the same transition to big data that all branches of applied statistics are experiencing. With the advent of inexpensive DNA sequencing, the transition is only accelerating. This brief review highlights some modern techniques with recent successes in statistical genetics. These include: (a) lasso penalized regression and association mapping, (b) ethnic admixture estimation, (c) matrix completion for genotype and sequence data, (d) the fused lasso and copy number variation, (e) haplotyping, (f) estimation of relatedness, (g) variance components models, and (h) rare variant testing. For more than a century, genetics has been both a driver and beneficiary of statistical theory and practice. This symbiotic relationship will persist for the foreseeable future. PMID:24955378
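Item (a) above, lasso penalized regression for association mapping, can be sketched with the standard cyclic coordinate-descent algorithm on synthetic genotype-like data (the data and the penalty level are illustrative assumptions):

```python
import numpy as np

# Sketch (synthetic data): lasso by cyclic coordinate descent with
# soft-thresholding; only the first 2 of 50 standardized predictors
# carry signal, and the L1 penalty should select (roughly) just those.
rng = np.random.default_rng(6)
n, p = 100, 50
X = rng.normal(0, 1, (n, p))
X = (X - X.mean(0)) / X.std(0)               # standardize columns
y = 2.0 * X[:, 0] - 1.5 * X[:, 1] + rng.normal(0, 0.5, n)

lam = 0.2                                    # L1 penalty level
beta = np.zeros(p)
r = y - X @ beta
for _ in range(100):                         # cyclic coordinate descent sweeps
    for j in range(p):
        r += X[:, j] * beta[j]               # remove j's current contribution
        rho = X[:, j] @ r / n
        beta[j] = np.sign(rho) * max(abs(rho) - lam, 0.0)  # soft-threshold
        r -= X[:, j] * beta[j]

support = np.flatnonzero(beta)
print(f"selected predictors: {support}")
```

Note the lasso's characteristic shrinkage: the nonzero estimates are pulled toward zero by roughly the penalty level.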
Crime and Punishment: Are Copyright Violators Ever Penalized?
ERIC Educational Resources Information Center
Russell, Carrie
2004-01-01
Is there a Web site that keeps track of copyright infringers and fines? Some colleagues don't believe that copyright violators are ever penalized. This question was asked by a reader in a question-and-answer column of "School Library Journal". Carrie Russell is the American Library Association's copyright specialist, and she will answer selected…
27 CFR 25.93 - Penal sum of bond.
Code of Federal Regulations, 2012 CFR
2012-04-01
... OF THE TREASURY LIQUORS BEER Bonds and Consents of Surety § 25.93 Penal sum of bond. (a)(1) Brewers... calculated at the rates prescribed by law which the brewer will become liable to pay during a calendar year during the period of the bond on beer: (i) Removed for transfer to the brewery from other breweries...
Education--Penal Institutions: U. S. and Europe.
ERIC Educational Resources Information Center
Kerle, Ken
Penal systems of European countries vary in educational programs and humanizing efforts. A high percentage of Soviet prisoners, many incarcerated for ideological/religious beliefs, are confined to labor colonies. All inmates are obligated to learn a trade, one of the qualifications for release being evidence of some trade skill. Swedish…
Indian NGO challenges penal code prohibition of "unnatural offences".
Csete, Joanne
2002-07-01
On 7 December 2001, the Naz Foundation (India) Trust (NFIT), a non-governmental organization based in New Delhi, filed a petition in the Delhi High Court to repeal the "unnatural offences" section of the Indian Penal Code that criminalizes men who have sex with men.
[Arterial hypertension in females engaged into penal system work].
Tagirova, M M; El'garov, A A; Shogenova, A B; Murtazov, A M
2010-01-01
The authors demonstrated a significant prevalence of arterial hypertension and atherosclerosis risk factors in women employed in penal system work; together these factors constitute a cardiovascular risk shaped by occupational conditions. Teveten and Nebilet proved effective in the examinees with arterial hypertension.
Briggs, D J; de Hoogh, C; Gulliver, J; Wills, J; Elliott, P; Kingham, S; Smallbone, K
2000-05-15
Accurate, high-resolution maps of traffic-related air pollution are needed both as a basis for assessing exposures as part of epidemiological studies, and to inform urban air-quality policy and traffic management. This paper assesses the use of a GIS-based, regression mapping technique to model spatial patterns of traffic-related air pollution. The model--developed using data from 80 passive sampler sites in Huddersfield, as part of the SAVIAH (Small Area Variations in Air Quality and Health) project--uses data on traffic flows and land cover in the 300-m buffer zone around each site, and altitude of the site, as predictors of NO2 concentrations. It was tested here by application in four urban areas in the UK: Huddersfield (for the year following that used for initial model development), Sheffield, Northampton, and part of London. In each case, a GIS was built in ArcInfo, integrating relevant data on road traffic, urban land use and topography. Monitoring of NO2 was undertaken using replicate passive samplers (in London, data were obtained from surveys carried out as part of the London network). In Huddersfield, Sheffield and Northampton, the model was first calibrated by comparing modelled results with monitored NO2 concentrations at 10 randomly selected sites; the calibrated model was then validated against data from a further 10-28 sites. In London, where data for only 11 sites were available, validation was not undertaken. Results showed that the model performed well in all cases. After local calibration, the model gave estimates of mean annual NO2 concentrations within a factor of 1.5 of the actual mean approximately 70-90% of the time, and within a factor of 2 between 70 and 100% of the time. r2 values between modelled and observed concentrations are in the range of 0.58-0.76. These results are comparable to those achieved by more sophisticated dispersion models. The model also has several advantages over dispersion modelling. It is able, for example, to provide
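The validation summaries used above (share of sites within a factor of 1.5 or 2 of the observed value, plus r² between modelled and observed) are easy to compute; a sketch on synthetic site values (not the SAVIAH data):

```python
import numpy as np

# Sketch (synthetic values): factor-of-N agreement and r^2 between
# modelled and observed NO2 at a set of validation sites.
rng = np.random.default_rng(7)
observed = rng.uniform(15, 60, 28)                    # ug/m3 at 28 sites
modelled = observed * np.exp(rng.normal(0, 0.2, 28))  # multiplicative error

ratio = modelled / observed
within_1p5 = np.mean((ratio > 1 / 1.5) & (ratio < 1.5))
within_2 = np.mean((ratio > 0.5) & (ratio < 2.0))
r2 = np.corrcoef(observed, modelled)[0, 1] ** 2
print(f"within x1.5: {within_1p5:.2f}, within x2: {within_2:.2f}, r^2={r2:.2f}")
```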
Chen, Qingxia; Ibrahim, Joseph G
2014-07-01
Multiple Imputation, Maximum Likelihood and Fully Bayesian methods are the three most commonly used model-based approaches in missing data problems. Although it is easy to show that when the responses are missing at random (MAR), the complete case analysis is unbiased and efficient, the aforementioned methods are still commonly used in practice for this setting. To examine the performance of and relationships between these three methods in this setting, we derive and investigate small sample and asymptotic expressions of the estimates and standard errors, and fully examine how these estimates are related for the three approaches in the linear regression model when the responses are MAR. We show that when the responses are MAR in the linear model, the estimates of the regression coefficients using these three methods are asymptotically equivalent to the complete case estimates under general conditions. One simulation and a real data set from a liver cancer clinical trial are given to compare the properties of these methods when the responses are MAR.
High dimensional linear regression models under long memory dependence and measurement error
NASA Astrophysics Data System (ADS)
Kaul, Abhishek
This dissertation consists of three chapters. The first chapter introduces the models under consideration and motivates problems of interest. A brief literature review is also provided in this chapter. The second chapter investigates the properties of Lasso under long range dependent model errors. Lasso is a computationally efficient approach to model selection and estimation, and its properties are well studied when the regression errors are independent and identically distributed. We study the case where the regression errors form a long memory moving average process. We establish a finite sample oracle inequality for the Lasso solution. We then show the asymptotic sign consistency in this setup. These results are established in the high dimensional setup (p > n), where p can increase exponentially with n. Finally, we show the consistency, specifically the n^(1/2-d)-consistency, of Lasso, along with the oracle property of adaptive Lasso, in the case where p is fixed. Here d is the memory parameter of the stationary error sequence. The performance of Lasso is also analysed in the present setup with a simulation study. The third chapter proposes and investigates the properties of a penalized quantile based estimator for measurement error models. Standard formulations of prediction problems in high dimension regression models assume the availability of fully observed covariates and sub-Gaussian and homogeneous model errors. This makes these methods inapplicable to measurement error models, where covariates are unobservable and observations are possibly non sub-Gaussian and heterogeneous. We propose weighted penalized corrected quantile estimators for the regression parameter vector in linear regression models with additive measurement errors, where unobservable covariates are nonrandom. The proposed estimators forgo the need for the above mentioned model assumptions. We study these estimators in both the fixed dimension and high dimensional sparse setups, in the latter setup, the
Barnwell-Ménard, Jean-Louis; Li, Qing; Cohen, Alan A
2015-03-15
The loss of signal associated with categorizing a continuous variable is well known, and previous studies have demonstrated that this can lead to an inflation of Type-I error when the categorized variable is a confounder in a regression analysis estimating the effect of an exposure on an outcome. However, it is not known how the Type-I error may vary under different circumstances, including logistic versus linear regression, different distributions of the confounder, and different categorization methods. Here, we analytically quantified the effect of categorization and then performed a series of 9600 Monte Carlo simulations to estimate the Type-I error inflation associated with categorization of a confounder under different regression scenarios. We show that Type-I error is unacceptably high (>10% in most scenarios and often 100%). The only exception was when the variable categorized was a continuous mixture proxy for a genuinely dichotomous latent variable, where both the continuous proxy and the categorized variable are error-ridden proxies for the dichotomous latent variable. As expected, error inflation was also higher with larger sample size, fewer categories, and stronger associations between the confounder and the exposure or outcome. We provide online tools that can help researchers estimate the potential error inflation and understand how serious a problem this is.
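A simplified Monte Carlo version of the simulation design above: a normal confounder drives both exposure and outcome, there is no direct exposure effect, and the confounder is either adjusted for as-is or dichotomized at the median first. Effect sizes and sample sizes are illustrative assumptions, not the paper's 9600-run grid.

```python
import numpy as np

# Sketch: Type-I error for the exposure coefficient in a linear model,
# with the confounder adjusted continuously vs. dichotomized.
rng = np.random.default_rng(8)

def reject_rate(categorize, sims=1000, n=200):
    hits = 0
    for _ in range(sims):
        c = rng.normal(0, 1, n)                 # confounder
        x = 0.8 * c + rng.normal(0, 1, n)       # exposure driven by c
        y = 0.8 * c + rng.normal(0, 1, n)       # outcome driven by c only
        adj = (c > np.median(c)).astype(float) if categorize else c
        X = np.column_stack([np.ones(n), x, adj])
        beta, *_ = np.linalg.lstsq(X, y, rcond=None)
        resid = y - X @ beta
        sigma2 = resid @ resid / (n - 3)
        cov = np.linalg.inv(X.T @ X) * sigma2
        t = beta[1] / np.sqrt(cov[1, 1])        # Wald test for exposure
        hits += abs(t) > 1.96
    return hits / sims

inflated = reject_rate(categorize=True)
nominal = reject_rate(categorize=False)
print(f"Type-I error: categorized {inflated:.3f} vs continuous {nominal:.3f}")
```

The residual confounding left by the dichotomized variable is absorbed by the exposure coefficient, which is what inflates the rejection rate.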
Technology Transfer Automated Retrieval System (TEKTRAN)
In fiber length measurement by the rapid method of testing fiber beards instead of testing individual fibers, only the fiber portion projected from the fiber clamp can be measured. The length distribution of the projecting portion is very different from that of the original sample. The Part 1 pape...
Evaluating differential effects using regression interactions and regression mixture models
Van Horn, M. Lee; Jaki, Thomas; Masyn, Katherine; Howe, George; Feaster, Daniel J.; Lamont, Andrea E.; George, Melissa R. W.; Kim, Minjung
2015-01-01
Research increasingly emphasizes understanding differential effects. This paper focuses on understanding regression mixture models, relatively new statistical methods for assessing differential effects, by comparing results to using an interaction term in linear regression. The research questions which each model answers, their formulation, and their assumptions are compared using Monte Carlo simulations and real data analysis. The capabilities of regression mixture models are described and specific issues to be addressed when conducting regression mixtures are proposed. The paper aims to clarify the role that regression mixtures can take in the estimation of differential effects and to increase awareness of the benefits and potential pitfalls of this approach. Regression mixture models are shown to be a potentially effective exploratory method for finding differential effects when these effects can be defined by a small number of classes of respondents who share a typical relationship between a predictor and an outcome. It is also shown that the comparison between regression mixture models and interactions becomes substantially more complex as the number of classes increases. It is argued that regression interactions are well suited for direct tests of specific hypotheses about differential effects and regression mixtures provide a useful approach for exploring effect heterogeneity given adequate samples and study design. PMID:26556903
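The regression-interaction side of the comparison can be sketched on synthetic data: an observed grouping variable moderates the predictor's slope, and the interaction coefficient captures the differential effect (a regression mixture would instead let latent classes carry it). Data and coefficients are invented.

```python
import numpy as np

# Sketch (synthetic data): y = b0 + b1*x + b2*g + b3*x*g, where the
# interaction b3 is the differential effect of x across groups g.
rng = np.random.default_rng(9)
n = 300
g = rng.integers(0, 2, n).astype(float)     # observed moderator
x = rng.normal(0, 1, n)
y = 1.0 + 0.2 * x + 0.6 * x * g + rng.normal(0, 1, n)

X = np.column_stack([np.ones(n), x, g, x * g])
beta, *_ = np.linalg.lstsq(X, y, rcond=None)
slope_g0 = beta[1]                          # slope of x in group 0
slope_g1 = beta[1] + beta[3]                # slope of x in group 1
print(f"slope g=0: {slope_g0:.2f}, slope g=1: {slope_g1:.2f}")
```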
NASA Astrophysics Data System (ADS)
Ren, Xue; Lee, Soo-Jin
2016-03-01
Patch-based regularization methods, which have proven useful not only for image denoising, but also for tomographic reconstruction, penalize image roughness based on the intensity differences between two nearby patches. However, when two patches are not considered to be similar in the general sense of similarity but still have similar features in a scaled domain after normalizing the two patches, the difference between the two patches in the scaled domain is smaller than the intensity difference measured in the standard method. Standard patch-based methods tend to ignore such similarities due to the large intensity differences between the two patches. In this work, for patch-based penalized likelihood tomographic reconstruction, we propose a new approach to the similarity measure using the normalized patch differences as well as the intensity-based patch differences. A normalized patch difference is obtained by normalizing and scaling the intensity-based patch difference. To selectively take advantage of the standard patch (SP) and normalized patch (NP), we use switching schemes that can select either SP or NP based on the gradient of a reconstructed image. In this case the SP is selected for restoring large-scaled piecewise-smooth regions, while the NP is selected for preserving the contrast of fine details. Numerical experiments using a software phantom demonstrate that our proposed methods not only improve overall reconstruction accuracy in terms of the percentage error, but also reveal better recovery of fine details in terms of the contrast recovery coefficient.
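The core idea, that two patches can be dissimilar in raw intensity yet identical after normalization, can be shown in a few lines (the patches below are invented 1-D examples, and mean/std normalization is one plausible reading of the scaled domain):

```python
import numpy as np

# Sketch: intensity-based vs normalized patch difference. The second
# patch is the same edge feature at a different scale and offset, so
# its normalized difference from the first is (numerically) zero.
edge = np.array([0.0, 0.0, 1.0, 1.0, 1.0])    # a fine detail
scaled = 0.3 * edge + 2.0                      # same feature, low contrast

def normalize(p):
    return (p - p.mean()) / p.std()

intensity_diff = np.sum((edge - scaled) ** 2)
normalized_diff = np.sum((normalize(edge) - normalize(scaled)) ** 2)
print(f"intensity: {intensity_diff:.3f}, normalized: {normalized_diff:.3e}")
```

A penalty built only on `intensity_diff` would smooth across these two patches; one that can switch to `normalized_diff` treats them as similar and preserves the low-contrast detail.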
Precision Efficacy Analysis for Regression.
ERIC Educational Resources Information Center
Brooks, Gordon P.
When multiple linear regression is used to develop a prediction model, sample size must be large enough to ensure stable coefficients. If the derivation sample size is inadequate, the model may not predict well for future subjects. The precision efficacy analysis for regression (PEAR) method uses a cross-validity approach to select sample sizes…
ERIC Educational Resources Information Center
Walton, Joseph M.; And Others
1978-01-01
Ridge regression is an approach to the problem of large standard errors of regression estimates of intercorrelated regressors. The effect of ridge regression on the estimated squared multiple correlation coefficient is discussed and illustrated. (JKS)
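The variance-reduction effect described above can be demonstrated numerically: with two near-collinear regressors, the ridge solution (X'X + kI)⁻¹X'y has a much smaller coefficient standard deviation than OLS. Data and the penalty value are illustrative.

```python
import numpy as np

# Sketch (synthetic data): sampling standard deviation of the first
# coefficient under OLS (k=0) vs ridge (k=5) with correlated regressors,
# estimated by repeatedly redrawing the noise.
rng = np.random.default_rng(10)
n = 60
z = rng.normal(0, 1, n)
x1 = z + rng.normal(0, 0.1, n)              # near-collinear pair
x2 = z + rng.normal(0, 0.1, n)
X = np.column_stack([x1, x2])

def coef_sd(k, reps=500):
    betas = []
    for _ in range(reps):
        y = 1.0 * x1 + 1.0 * x2 + rng.normal(0, 1, n)
        betas.append(np.linalg.solve(X.T @ X + k * np.eye(2), X.T @ y))
    return float(np.std(betas, axis=0)[0])

sd_ols = coef_sd(k=0.0)
sd_ridge = coef_sd(k=5.0)
print(f"coef sd: OLS {sd_ols:.3f} vs ridge {sd_ridge:.3f}")
```

The price is bias: the ridge coefficients are shrunk, which is the trade-off the note's discussion of the squared multiple correlation concerns.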
Eriksson, Lennart; Jaworska, Joanna; Worth, Andrew P; Cronin, Mark T D; McDowell, Robert M; Gramatica, Paola
2003-08-01
This article provides an overview of methods for reliability assessment of quantitative structure-activity relationship (QSAR) models in the context of regulatory acceptance of human health and environmental QSARs. Useful diagnostic tools and data analytical approaches are highlighted and exemplified. Particular emphasis is given to the question of how to define the applicability borders of a QSAR and how to estimate parameter and prediction uncertainty. The article ends with a discussion regarding QSAR acceptability criteria. This discussion contains a list of recommended acceptability criteria, and we give reference values for important QSAR performance statistics. Finally, we emphasize that rigorous and independent validation of QSARs is an essential step toward their regulatory acceptance and implementation.
Hegazy, Maha A; Lotfy, Hayam M; Rezk, Mamdouh R; Omran, Yasmin Rostom
2015-04-05
Smart and novel spectrophotometric and chemometric methods have been developed and validated for the simultaneous determination of a binary mixture of chloramphenicol (CPL) and dexamethasone sodium phosphate (DSP) in the presence of interfering substances without prior separation. The first method depends upon derivative subtraction coupled with constant multiplication. The second is the ratio difference method at optimum wavelengths, which were selected after applying a derivative transformation method via multiplication by a decoding spectrum in order to cancel the contribution of non-labeled interfering substances. The third method relies on partial least squares with regression model updating. The methods are so simple that they do not require any preliminary separation steps. Accuracy, precision and linearity ranges of these methods were determined. Moreover, specificity was assessed by analyzing synthetic mixtures of both drugs. The proposed methods were successfully applied for the analysis of both drugs in their pharmaceutical formulation. The obtained results were statistically compared with those of an official spectrophotometric method, leading to the conclusion that there is no significant difference between the proposed methods and the official one with respect to accuracy and precision.
Steganalysis using logistic regression
NASA Astrophysics Data System (ADS)
Lubenko, Ivans; Ker, Andrew D.
2011-02-01
We advocate Logistic Regression (LR) as an alternative to the Support Vector Machine (SVM) classifiers commonly used in steganalysis. LR offers more information than traditional SVM methods - it estimates class probabilities as well as providing a simple classification - and can be adapted more easily and efficiently for multiclass problems. Like SVM, LR can be kernelised for nonlinear classification, and it shows comparable classification accuracy to SVM methods. This work is a case study, comparing the accuracy and speed of SVM and LR classifiers in the detection of LSB Matching and other related spatial-domain image steganography, through the state-of-the-art 686-dimensional SPAM feature set, in three image sets.
Bias-reduced and separation-proof conditional logistic regression with small or sparse data sets.
Heinze, Georg; Puhr, Rainer
2010-03-30
Conditional logistic regression is used for the analysis of binary outcomes when subjects are stratified into several subsets, e.g. matched pairs or blocks. Log odds ratio estimates are usually found by maximizing the conditional likelihood. This approach eliminates all strata-specific parameters by conditioning on the number of events within each stratum. However, in the analyses of both an animal experiment and a lung cancer case-control study, conditional maximum likelihood (CML) resulted in infinite odds ratio estimates and monotone likelihood. Estimation can be improved by using Cytel Inc.'s well-known LogXact software, which provides a median unbiased estimate and exact or mid-p confidence intervals. Here, we suggest and outline point and interval estimation based on maximization of a penalized conditional likelihood in the spirit of Firth's (Biometrika 1993; 80:27-38) bias correction method (CFL). We present comparative analyses of both studies, demonstrating some advantages of CFL over competitors. We report on a small-sample simulation study where CFL log odds ratio estimates were almost unbiased, whereas LogXact estimates showed some bias and CML estimates exhibited serious bias. Confidence intervals and tests based on the penalized conditional likelihood had close-to-nominal coverage rates and yielded highest power among all methods compared, respectively. Therefore, we propose CFL as an attractive solution to the stratified analysis of binary data, irrespective of the occurrence of monotone likelihood. A SAS program implementing CFL is available at: http://www.muw.ac.at/msi/biometrie/programs.
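The separation problem motivating the paper, that the unpenalized logistic MLE diverges when the classes are perfectly separated, can be shown in a tiny example. Here an L2 (ridge) penalty is used as a simple stand-in for Firth's Jeffreys-prior correction, just to illustrate the mechanism; it is not the CFL method itself.

```python
import numpy as np

# Sketch: perfectly separated 1-D data. Without a penalty the gradient
# ascent estimate keeps growing (no finite MLE); with a penalty it
# converges to a finite value.
x = np.array([-2.0, -1.0, -0.5, 0.5, 1.0, 2.0])
y = np.array([0.0, 0.0, 0.0, 1.0, 1.0, 1.0])   # perfectly separated labels

def fit(l2, steps=20000, lr=0.5):
    w = 0.0
    for _ in range(steps):
        p = 1.0 / (1.0 + np.exp(-w * x))
        w += lr * (np.mean((y - p) * x) - l2 * w)   # penalized score
    return w

w_mle = fit(l2=0.0)      # grows without bound as steps increase
w_pen = fit(l2=0.1)      # finite penalized estimate
print(f"unpenalized: {w_mle:.1f}, penalized: {w_pen:.2f}")
```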
Zhu, Ying; Tan, Tuck Lee
2016-04-15
An effective and simple analytical method using Fourier transform infrared (FTIR) spectroscopy to distinguish wild-grown high-quality Ganoderma lucidum (G. lucidum) from the cultivated one is of essential importance for its quality assurance and medicinal value estimation. Commonly used chemical and analytical methods using the full spectrum are not so effective for detection and interpretation due to the complex system of the herbal medicine. In this study, two penalized discriminant analysis models, penalized linear discriminant analysis (PLDA) and elastic net (Elnet), using FTIR spectroscopy have been explored for the purpose of discrimination and interpretation. The classification performances of the two penalized models have been compared with two widely used multivariate methods, principal component discriminant analysis (PCDA) and partial least squares discriminant analysis (PLSDA). The Elnet model involving a combination of L1 and L2 norm penalties enabled an automatic selection of a small number of informative spectral absorption bands and gave an excellent classification accuracy of 99% for discrimination between spectra of wild-grown and cultivated G. lucidum. Its classification performance was superior to that of the PLDA model in a pure L1 setting and outperformed the PCDA and PLSDA models using the full wavelength range. The well-performed selection of informative spectral features leads to substantial reduction in model complexity and improvement of classification accuracy, and it is particularly helpful for the quantitative interpretations of the major chemical constituents of G. lucidum regarding its anti-cancer effects.
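A minimal sketch of elastic-net-based discrimination in the spirit of the Elnet model above, using scikit-learn on synthetic "spectra"; the informative band indices, sample sizes, and penalty settings are all invented for illustration:

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)
n, p = 120, 300                        # 120 spectra, 300 wavenumber bins
X = rng.normal(size=(n, p))
informative = [40, 41, 42, 150, 151]   # a few "absorption bands" matter
w = np.zeros(p)
w[informative] = 2.0
y = (X @ w + rng.normal(size=n) > 0).astype(int)

# L1/L2 mixture: l1_ratio plays the role of the Elnet mixing parameter.
elnet = LogisticRegression(penalty="elasticnet", solver="saga",
                           l1_ratio=0.5, C=0.5, max_iter=5000).fit(X, y)
selected = np.flatnonzero(elnet.coef_[0])   # sparse set of selected bands
```

The combined penalty zeroes out most coefficients, leaving a small set of "bands", which is exactly the interpretability gain claimed for Elnet over full-spectrum methods.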
Variable Selection in Semiparametric Regression Modeling.
Li, Runze; Liang, Hua
2008-01-01
In this paper, we are concerned with how to select significant variables in semiparametric modeling. Variable selection for semiparametric regression models consists of two components: model selection for the nonparametric component and selection of significant variables for the parametric portion. Thus, it is much more challenging than for parametric models such as linear models and generalized linear models, because traditional variable selection procedures, including stepwise regression and best subset selection, require model selection for the nonparametric component for each submodel. This leads to a very heavy computational burden. In this paper, we propose a class of variable selection procedures for semiparametric regression models using nonconcave penalized likelihood. The newly proposed procedures are distinguished from the traditional ones in that they delete insignificant variables and estimate the coefficients of significant variables simultaneously. This allows us to establish the sampling properties of the resulting estimate. We first establish the rate of convergence of the resulting estimate. With proper choices of penalty functions and regularization parameters, we then establish its asymptotic normality, and further demonstrate that the proposed procedures perform as well as an oracle procedure. A semiparametric generalized likelihood ratio test is proposed to select significant variables in the nonparametric component. We investigate the asymptotic behavior of the proposed test and demonstrate that its limiting null distribution follows a chi-squared distribution, which is independent of the nuisance parameters. Extensive Monte Carlo simulation studies are conducted to examine the finite sample performance of the proposed variable selection procedures.
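For the most common nonconcave penalty in this line of work, the SCAD penalty of Fan and Li, the univariate penalized least-squares problem has a closed-form thresholding rule. The sketch below implements that standard rule (with the conventional a = 3.7); it illustrates the "delete and estimate simultaneously" behavior, but it is not the paper's full semiparametric procedure:

```python
import numpy as np

def scad_threshold(z, lam, a=3.7):
    """Closed-form minimizer of 0.5*(z - b)^2 + SCAD_lam(|b|).
    Soft-thresholds small z (deletion), relaxes moderate z, and leaves
    large z untouched (near-unbiasedness)."""
    az = abs(z)
    if az <= 2.0 * lam:                      # soft-thresholding zone
        return np.sign(z) * max(az - lam, 0.0)
    if az <= a * lam:                        # interpolating zone
        return ((a - 1.0) * z - np.sign(z) * a * lam) / (a - 2.0)
    return z                                 # no shrinkage for large signals

small = scad_threshold(0.5, 1.0)   # deleted (set to exactly zero)
mid = scad_threshold(2.5, 1.0)     # partially shrunk
big = scad_threshold(5.0, 1.0)     # left unshrunk
```

The three zones show why SCAD simultaneously deletes noise variables and leaves large coefficients nearly unbiased, the oracle-like behavior the abstract refers to.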
De la Cruz, Rolando; Fuentes, Claudio; Meza, Cristian; Lee, Dae-Jin; Arribas-Gil, Ana
2017-02-19
We propose a semiparametric nonlinear mixed-effects model (SNMM) using penalized splines to classify longitudinal data and improve the prediction of a binary outcome. The work is motivated by a study in which different hormone levels were measured during the early stages of pregnancy, and the challenge is using this information to predict normal versus abnormal pregnancy outcomes. The aim of this paper is to compare models and estimation strategies on the basis of alternative formulations of SNMMs depending on the characteristics of the data set under consideration. For our motivating example, we address the classification problem using a particular case of the SNMM in which the parameter space has a finite dimensional component (fixed effects and variance components) and an infinite dimensional component (unknown function) that need to be estimated. The nonparametric component of the model is estimated using penalized splines. For the parametric component, we compare the advantages of using random effects versus direct modeling of the correlation structure of the errors. Numerical studies show that our approach improves over other existing methods for the analysis of this type of data. Furthermore, the results obtained using our method support the idea that explicit modeling of the serial correlation of the error term improves the prediction accuracy with respect to a model with random effects, but independent errors. Copyright © 2017 John Wiley & Sons, Ltd.
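A compact illustration of penalized-spline smoothing of the kind used for the nonparametric component above: an Eilers-Marx style P-spline (B-spline basis plus a second-order difference penalty) on synthetic data. The knot count, penalty order, and λ are arbitrary choices, and this is a plain smoother, not the paper's SNMM:

```python
import numpy as np
from scipy.interpolate import BSpline

def pspline_fit(x, y, n_knots=20, degree=3, lam=1.0):
    """Penalized-spline smoother: B-spline design matrix with a
    second-order difference penalty on the basis coefficients."""
    knots = np.linspace(x.min() - 1e-6, x.max() + 1e-6, n_knots)
    t = np.concatenate([[knots[0]] * degree, knots, [knots[-1]] * degree])
    n_basis = len(t) - degree - 1
    B = BSpline.design_matrix(x, t, degree).toarray()
    D = np.diff(np.eye(n_basis), n=2, axis=0)       # 2nd-difference penalty
    coef = np.linalg.solve(B.T @ B + lam * D.T @ D, B.T @ y)
    return B @ coef

rng = np.random.default_rng(1)
x = np.sort(rng.uniform(0.0, 1.0, 200))
y = np.sin(2 * np.pi * x) + rng.normal(scale=0.3, size=200)
fit = pspline_fit(x, y)
```

Larger λ gives a smoother fit; in the mixed-model formulation the paper uses, λ is effectively estimated from the data rather than hand-picked.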
Penal managerialism from within: implications for theory and research.
Cheliotis, Leonidas K
2006-01-01
Unlike the bulk of penological scholarship dealing with managerialist reforms, this article calls for greater theoretical and research attention to the often pernicious impact of managerialism on criminal justice professionals. Much in an ideal-typical fashion, light is shed on: the reasons why contemporary penal bureaucracies endeavor systematically to strip criminal justice work of its inherently affective nature; the structural forces that ensure control over officials; the processes by which those forces come into effect; and the human consequences of submission to totalitarian bureaucratic milieus. It is suggested that the heavy preoccupation of present-day penality with the predictability and calculability of outcomes entails the atomization of professionals and the dehumanization of their work. This is achieved through a kaleidoscope of direct and indirect mechanisms that naturalize and/or legitimate acquiescence.
Jones, Andrew M; Lomas, James; Moore, Peter T; Rice, Nigel
2016-10-01
We conduct a quasi-Monte-Carlo comparison of the recent developments in parametric and semiparametric regression methods for healthcare costs, both against each other and against standard practice. The population of English National Health Service hospital in-patient episodes for the financial year 2007-2008 (summed for each patient) is randomly divided into two equally sized subpopulations to form an estimation set and a validation set. Evaluating out-of-sample using the validation set, a conditional density approximation estimator shows considerable promise in forecasting conditional means, performing best for accuracy of forecasting and among the best four for bias and goodness of fit. The best performing model for bias is linear regression with square-root-transformed dependent variables, whereas a generalized linear model with square-root link function and Poisson distribution performs best in terms of goodness of fit. Commonly used models utilizing a log-link are shown to perform badly relative to other models considered in our comparison.
[Guideline 'Medicinal care for drug addicts in penal institutions'].
Westra, Michel; de Haan, Hein A; Arends, Marleen T; van Everdingen, Jannes J E; Klazinga, Niek S
2009-01-01
In the Netherlands, the policy on care for prisoners who are addicted to opiates is still heterogeneous. The recent guidelines entitled 'Medicinal care for drug addicts in penal institutions' should contribute towards unambiguous and more evidence-based treatment for this group. In addition, it should improve and bring the care pathways within judicial institutions and mainstream healthcare more into line with one another. Each rational course of medicinal treatment will initially be continued in the penal institution. In penal institutions the help on offer is mainly focused on abstinence from illegal drugs while at the same time limiting the damage caused to the health of the individual user. Methadone is regarded as the first choice for maintenance therapy. For patient safety, this is best given in liquid form in sealed cups of 5 mg/ml once daily in the morning. Recently a combination preparation containing buprenorphine and naloxone - a complete opiate antagonist - has become available. On discontinuation of opiate maintenance treatment, intensive follow-up care is necessary. During this period there is considerable risk of a potentially lethal overdose. Detoxification should be coupled with psychosocial or medicinal intervention aimed at preventing relapse. Naltrexone is currently the only available opiate antagonist for preventing relapse. In those addicted to opiates who also take benzodiazepines without any indication, it is strongly recommended that these be reduced and discontinued. This can be achieved by converting the regular dosage into the equivalent in diazepam and then reducing this dosage by a maximum of 25% a week.
Nonlinear Regression Methods for Estimation
2005-09-01
[Excerpt] Estimation accuracy degrades when the geometric dilution of precision (GDOP) causes collinearity, which in turn brings about poor position estimates; the measurement arrangement's geometry (GDOP) strongly impacts the achievable accuracy, and many measurements are needed to wash out the measurement noise. [Index terms recovered from the record: Newton algorithm; GDOP; initial parameter estimate; iterative least squares (ILS); Kalman filtering]
Penalized likelihood PET image reconstruction using patch-based edge-preserving regularization.
Wang, Guobao; Qi, Jinyi
2012-12-01
Iterative image reconstruction for positron emission tomography (PET) can improve image quality by using spatial regularization that penalizes image intensity difference between neighboring pixels. The most commonly used quadratic penalty often oversmoothes edges and fine features in reconstructed images. Nonquadratic penalties can preserve edges but often introduce piece-wise constant blocky artifacts and the results are also sensitive to the hyper-parameter that controls the shape of the penalty function. This paper presents a patch-based regularization for iterative image reconstruction that uses neighborhood patches instead of individual pixels in computing the nonquadratic penalty. The new regularization is more robust than the conventional pixel-based regularization in differentiating sharp edges from random fluctuations due to noise. An optimization transfer algorithm is developed for the penalized maximum likelihood estimation. Each iteration of the algorithm can be implemented in three simple steps: an EM-like image update, an image smoothing and a pixel-by-pixel image fusion. Computer simulations show that the proposed patch-based regularization can achieve higher contrast recovery for small objects without increasing background variation compared with the quadratic regularization. The reconstruction is also more robust to the hyper-parameter than conventional pixel-based nonquadratic regularizations. The proposed regularization method has been applied to real 3-D PET data.
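The idea of penalizing patch differences rather than single-pixel differences can be sketched directly. The toy code below evaluates a patch-based Huber-type penalty on a small image; the 3x3 patch size, the Huber potential, and the periodic boundary handling via np.roll are simplifications for illustration, not a reproduction of the paper's regularizer:

```python
import numpy as np

def patch_penalty(img, patch=3, delta=0.1):
    """Patch-based nonquadratic penalty: compare whole neighborhood
    patches (not single pixels) against right and down neighbors,
    with periodic boundaries for brevity."""
    r = patch // 2
    H, W = img.shape
    pad = np.pad(img, r, mode="edge")
    # stack all patches: shape (H, W, patch*patch)
    patches = np.stack([pad[i:i + H, j:j + W]
                        for i in range(patch) for j in range(patch)], axis=-1)

    def pot(t):  # Huber potential: quadratic near 0, linear in the tails
        a = np.abs(t)
        return np.where(a <= delta, 0.5 * t ** 2, delta * (a - 0.5 * delta))

    total = 0.0
    for dy, dx in [(0, 1), (1, 0)]:            # right and down neighbors
        d = patches - np.roll(patches, shift=(-dy, -dx), axis=(0, 1))
        total += pot(np.sqrt((d ** 2).sum(axis=-1))).sum()
    return float(total)

img = np.zeros((16, 16))
img[:, 8:] = 1.0                               # one sharp edge
noisy = img + 0.05 * np.random.default_rng(2).normal(size=img.shape)
p_edge = patch_penalty(img)
p_noisy = patch_penalty(noisy)
```

Because whole patches are compared, isolated noise fluctuations raise the penalty everywhere, while a coherent edge contributes only along one contour, which is the robustness property the abstract describes.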
Kramer, S.
1996-12-31
In many real-world domains the task of machine learning algorithms is to learn a theory for predicting numerical values. In particular several standard test domains used in Inductive Logic Programming (ILP) are concerned with predicting numerical values from examples and relational and mostly non-determinate background knowledge. However, so far only one ILP algorithm, a covering algorithm called FORS, can predict numbers and cope with non-determinate background knowledge. In this paper we present Structural Regression Trees (SRT), a new algorithm which can be applied to the above class of problems. SRT integrates the statistical method of regression trees into ILP. It constructs a tree containing a literal (an atomic formula or its negation) or a conjunction of literals in each node, and assigns a numerical value to each leaf. SRT provides more comprehensible results than purely statistical methods, and can be applied to a class of problems most other ILP systems cannot handle. Experiments in several real-world domains demonstrate that the approach is competitive with existing methods, indicating that the advantages are not at the expense of predictive accuracy.
Huybrechts, Inge; Lioret, Sandrine; Mouratidou, Theodora; Gunter, Marc J; Manios, Yannis; Kersting, Mathilde; Gottrand, Frederic; Kafatos, Anthony; De Henauw, Stefaan; Cuenca-García, Magdalena; Widhalm, Kurt; Gonzales-Gross, Marcela; Molnar, Denes; Moreno, Luis A; McNaughton, Sarah A
2017-01-01
This study aims to examine repeatability of reduced rank regression (RRR) methods in calculating dietary patterns (DP) and cross-sectional associations with overweight (OW)/obesity across European and Australian samples of adolescents. Data from two cross-sectional surveys in Europe (2006/2007 Healthy Lifestyle in Europe by Nutrition in Adolescence study, including 1954 adolescents, 12-17 years) and Australia (2007 National Children's Nutrition and Physical Activity Survey, including 1498 adolescents, 12-16 years) were used. Dietary intake was measured using two non-consecutive, 24-h recalls. RRR was used to identify DP using dietary energy density, fibre density and percentage of energy intake from fat as the intermediate variables. Associations between DP scores and body mass/fat were examined using multivariable linear and logistic regression as appropriate, stratified by sex. The first DP extracted (labelled 'energy dense, high fat, low fibre') explained 47 and 31 % of the response variation in Australian and European adolescents, respectively. It was similar for European and Australian adolescents and characterised by higher consumption of biscuits/cakes, chocolate/confectionery, crisps/savoury snacks, sugar-sweetened beverages, and lower consumption of yogurt, high-fibre bread, vegetables and fresh fruit. DP scores were inversely associated with BMI z-scores in Australian adolescent boys and borderline inversely in European adolescent boys (as was also observed for percentage body fat, %BF). Similarly, a lower likelihood for OW in boys was observed with higher DP scores in both surveys. No such relationships were observed in adolescent girls. In conclusion, the DP identified in this cross-country study was comparable for European and Australian adolescents, demonstrating robustness of the RRR method in calculating DP among populations. However, longitudinal designs are more relevant when studying diet-obesity associations, to prevent reverse causality.
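Reduced rank regression itself has a compact classical solution for identity error weighting: an OLS fit followed by an SVD projection of the fitted values onto the leading response directions. The sketch below recovers a rank-1 coefficient matrix from synthetic data; it shows only the linear-algebra core, not the dietary-pattern pipeline above:

```python
import numpy as np

def reduced_rank_regression(X, Y, rank):
    """Classical RRR (identity weighting): OLS fit, then project onto the
    leading right singular vectors of the fitted values."""
    B_ols = np.linalg.lstsq(X, Y, rcond=None)[0]
    _, _, Vt = np.linalg.svd(X @ B_ols, full_matrices=False)
    P = Vt[:rank].T @ Vt[:rank]           # projection in response space
    return B_ols @ P

rng = np.random.default_rng(6)
X = rng.normal(size=(200, 10))                                # predictors
B_true = np.outer(rng.normal(size=10), rng.normal(size=3))    # rank-1 signal
Y = X @ B_true + 0.1 * rng.normal(size=(200, 3))              # 3 responses
B1 = reduced_rank_regression(X, Y, rank=1)
```

The rank-1 constraint is what turns a multi-response regression into a single "pattern" score, analogous to the first DP extracted in the study.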
Hwang, Jae Joon; Kim, Kee-Deog; Park, Hyok; Park, Chang Seo; Jeong, Ho-Gul
2014-01-01
Superimposition has been used as a method to evaluate changes after orthodontic or orthopedic treatment in the dental field. With the introduction of cone beam CT (CBCT), evaluating 3-dimensional changes after treatment by superimposition became possible. 4-point plane orientation is one of the simplest ways to achieve superimposition of 3-dimensional images. To identify factors influencing the superimposition error of cephalometric landmarks under the 4-point plane orientation method, and to evaluate the reproducibility of cephalometric landmarks for analyzing this error, 20 patients with normal skeletal and occlusal relationships who underwent CBCT for the diagnosis of temporomandibular disorder were analyzed. The nasion, sella turcica, basion and midpoint between the left and the right most posterior point of the lesser wing of the sphenoidal bone were used to define a three-dimensional (3D) anatomical reference co-ordinate system. Another 15 reference cephalometric points were also determined three times in the same image. The reorientation error of each landmark could be explained substantially (23%) by a linear regression model consisting of 3 factors describing the position of each landmark towards the reference axes and the locating error. The 4-point plane orientation system produces a reorientation error that varies according to the perpendicular distance between the landmark and the x-axis; the reorientation error also increases as the locating error and the shift of the reference axes viewed from each landmark increase. Therefore, in order to reduce the reorientation error, accuracy of all landmarks including the reference points is important. Construction of the regression model using reference points of greater precision is required for the clinical application of this model.
Seyedmahmoud, Rasoul; Mozetic, Pamela; Rainer, Alberto; Giannitelli, Sara Maria; Basoli, Francesco; Trombetta, Marcella; Traversa, Enrico; Licoccia, Silvia; Rinaldi, Antonio
2015-01-01
This two-article series presents an in-depth discussion of electrospun poly-L-lactide scaffolds for tissue engineering by means of statistical methodologies that can be used, in general, to gain a quantitative and systematic insight about effects and interactions between a handful of key scaffold properties (Ys) and a set of process parameters (Xs) in electrospinning. While Part-1 dealt with the DOE methods to unveil the interactions between Xs in determining the morphomechanical properties (ref. Y₁₋₄), this Part-2 article continues and refocuses the discussion on the interdependence of scaffold properties investigated by standard regression methods. The discussion first explores the connection between mechanical properties (Y₄) and morphological descriptors of the scaffolds (Y₁₋₃) in 32 types of scaffolds, finding that the mean fiber diameter (Y₁) plays a predominant role which is nonetheless and crucially modulated by the molecular weight (MW) of PLLA. The second part examines the biological performance (Y₅) (i.e. the cell proliferation of seeded bone marrow-derived mesenchymal stromal cells) on a random subset of eight scaffolds vs. the mechanomorphological properties (Y₁₋₄). In this case, the featured regression analysis on such an incomplete set was not conclusive, though, indirectly suggesting in quantitative terms that cell proliferation could not fully be explained as a function of the considered mechanomorphological properties (Y₁₋₄) beyond the early seeding stage, and that a randomization effect occurs over time such that the differences in initial cell proliferation performance (at day 1) are smeared over time. The findings may be the cornerstone of a novel route to accrue sufficient understanding and establish design rules for scaffold biofunctionality vs. architecture, mechanical properties, and process parameters.
NASA Astrophysics Data System (ADS)
Dang, H.; Wang, A. S.; Sussman, Marc S.; Siewerdsen, J. H.; Stayman, J. W.
2014-09-01
Sequential imaging studies are conducted in many clinical scenarios. Prior images from previous studies contain a great deal of patient-specific anatomical information and can be used in conjunction with subsequent imaging acquisitions to maintain image quality while enabling radiation dose reduction (e.g., through sparse angular sampling, reduction in fluence, etc). However, patient motion between images in such sequences results in misregistration between the prior image and current anatomy. Existing prior-image-based approaches often include only a simple rigid registration step that can be insufficient for capturing complex anatomical motion, introducing detrimental effects in subsequent image reconstruction. In this work, we propose a joint framework that estimates the 3D deformation between an unregistered prior image and the current anatomy (based on a subsequent data acquisition) and reconstructs the current anatomical image using a model-based reconstruction approach that includes regularization based on the deformed prior image. This framework is referred to as deformable prior image registration, penalized-likelihood estimation (dPIRPLE). Central to this framework is the inclusion of a 3D B-spline-based free-form-deformation model into the joint registration-reconstruction objective function. The proposed framework is solved using a maximization strategy whereby alternating updates to the registration parameters and image estimates are applied allowing for improvements in both the registration and reconstruction throughout the optimization process. Cadaver experiments were conducted on a cone-beam CT testbench emulating a lung nodule surveillance scenario. Superior reconstruction accuracy and image quality were demonstrated using the dPIRPLE algorithm as compared to more traditional reconstruction methods including filtered backprojection, penalized-likelihood estimation (PLE), and prior image penalized-likelihood estimation (PIPLE) without registration.
McDonough, Molly; Rabe, Brian; Saha, Margaret
2015-01-01
Global gene expression analysis using microarrays and, more recently, RNA-seq, has allowed investigators to understand biological processes at a system level. However, the identification of differentially expressed genes in experiments with small sample size, high dimensionality, and high variance remains challenging, limiting the usability of these tens of thousands of publicly available, and possibly many more unpublished, gene expression datasets. We propose a novel variable selection algorithm for ultra-low-n microarray studies using generalized linear model-based variable selection with a penalized binomial regression algorithm called penalized Euclidean distance (PED). Our method uses PED to build a classifier on the experimental data to rank genes by importance. In place of cross-validation, which is required by most similar methods but not reliable for experiments with small sample size, we use a simulation-based approach to additively build a list of differentially expressed genes from the rank-ordered list. Our simulation-based approach maintains a low false discovery rate while maximizing the number of differentially expressed genes identified, a feature critical for downstream pathway analysis. We apply our method to microarray data from an experiment perturbing the Notch signaling pathway in Xenopus laevis embryos. This dataset was chosen because it showed very little differential expression according to limma, a powerful and widely-used method for microarray analysis. Our method was able to detect a significant number of differentially expressed genes in this dataset and suggest future directions for investigation. Our method is easily adaptable for analysis of data from RNA-seq and other global expression experiments with low sample size and high dimensionality.
[Qualification of persons taking part in psychiatric opinion-giving in a penal trial].
Zgryzek, K
1998-01-01
The introduction of the new Penal Code by the Parliament makes it necessary to conduct a detailed analysis of its particular legal solutions. The author presents an analysis of selected issues included in the Penal Code, referring to evidence from the opinions of psychiatric experts, particularly issues regarding the professional qualifications of persons appointed by the court in a penal trial to assess the mental state of specific persons (a witness, a victim, the perpetrator). It was accepted that the only persons authorized to conduct psychiatric examinations in a penal trial are those with at least first-degree specialization in psychiatry.
Efficient Regularized Regression with L0 Penalty for Variable Selection and Network Construction
2016-01-01
Variable selection for regression with high-dimensional big data has found many applications in bioinformatics and computational biology. One appealing approach is the L0 regularized regression, which penalizes the number of nonzero features in the model directly. However, it is well known that L0 optimization is NP-hard and computationally challenging. In this paper, we propose efficient EM (L0EM) and dual L0EM (DL0EM) algorithms that directly approximate the L0 optimization problem. While L0EM is efficient with large sample size, DL0EM is efficient with high-dimensional (n ≪ m) data. They also provide a natural solution to all Lp (p ∈ [0,2]) problems, including lasso with p = 1 and elastic net with p ∈ [1,2]. The regularization parameter λ can be determined through cross validation or AIC and BIC. We demonstrate our methods through simulation and high-dimensional genomic data. The results indicate that L0 has better performance than lasso, SCAD, and MC+, and L0 with AIC or BIC has similar performance as computationally intensive cross validation. The proposed algorithms are efficient in identifying the nonzero variables with less bias and constructing biologically important networks with high-dimensional big data.
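An EM-style approximation to L0 regression can be sketched as a sequence of reweighted ridge solves in which the weight on coefficient j is 1/β_j², so small coefficients are driven to exact zero while large ones are barely shrunk. This is a generic sketch of that idea on invented data and may differ in detail from the published L0EM/DL0EM algorithms:

```python
import numpy as np

def l0em(X, y, lam=1.0, n_iter=100, eps=1e-8):
    """EM-style approximation of L0-penalized least squares: each iteration
    solves a ridge problem with per-coefficient weights 1/beta_j^2 (eps
    guards the division), driving negligible coefficients to zero."""
    beta = np.linalg.lstsq(X, y, rcond=None)[0]
    for _ in range(n_iter):
        D = np.diag(1.0 / np.maximum(beta ** 2, eps))
        beta = np.linalg.solve(X.T @ X + lam * D, X.T @ y)
    beta[np.abs(beta) < 1e-4] = 0.0      # final hard zeroing of dust
    return beta

rng = np.random.default_rng(3)
X = rng.normal(size=(100, 20))
true = np.zeros(20)
true[[0, 5]] = [3.0, -2.0]               # two nonzero coefficients
y = X @ true + 0.1 * rng.normal(size=100)
beta = l0em(X, y, lam=1.0)
```

On this toy problem the procedure recovers exactly the two nonzero variables, illustrating the direct sparsity (rather than L1-style shrinkage) that L0 penalization targets.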
Kong, Changsu; Adeola, Olayiwola
2016-01-01
The present study was conducted to determine ileal digestible energy (IDE), metabolizable energy (ME), and nitrogen-corrected ME (MEn) contents of expeller- (EECM) and solvent-extracted canola meal (SECM) for broiler chickens using the regression method. Dietary treatments consisted of a corn-soybean meal reference diet and four assay diets prepared by supplementing the reference diet with each canola meal (EECM or SECM) at 100 or 200 g/kg to partly replace the energy-yielding sources in the reference diet. Birds received a standard starter diet from day 0 to 14 and the assay diets from day 14 to 21. On day 14, a total of 240 birds were grouped into eight blocks by body weight and randomly allocated to five dietary treatments in each block with six birds per cage in a randomized complete block design. Excreta samples were collected from day 18 to 20 and ileal digesta were collected on day 21. The IDE, ME, and MEn (kcal/kg DM) of EECM or SECM were derived from the regression of EECM- or SECM-associated IDE, ME, and MEn intake (Y, kcal) against the intake of EECM or SECM (X, kg DM), respectively. Regression equations of IDE, ME, and MEn for the EECM-substituted diet were Y = -21.2 + 3035X (r² = 0.946), Y = -1.0 + 2807X (r² = 0.884) and Y = -2.0 + 2679X (r² = 0.902), respectively. The respective equations for the SECM diet were Y = 20.7 + 2881X (r² = 0.962), Y = 27.2 + 2077X (r² = 0.875) and Y = 24.7 + 2013X (r² = 0.901). The slope for IDE did not differ between the EECM and SECM, whereas the slopes for ME and MEn were greater (P < 0.05) for the EECM than for the SECM. These results indicate that the EECM might be a superior energy source for broiler chickens compared with the SECM when both canola meals are used to reduce the cost of feeding.
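The regression method used here reduces to a simple idea: regress the ingredient-associated energy intake (Y, kcal) on the ingredient intake (X, kg DM); the fitted slope estimates the ingredient's energy content, as in the Y = a + bX equations above. A toy numerical sketch (all intakes and the "true" ME value are invented, not the study's data):

```python
import numpy as np

# Hypothetical ingredient intakes (kg DM) across birds/cages at several
# inclusion levels, and the associated ingredient-related ME intake (kcal).
x = np.array([0.00, 0.05, 0.05, 0.10, 0.10, 0.15])
true_me = 2800.0                       # assumed ME content, kcal/kg DM
rng = np.random.default_rng(4)
y = 5.0 + true_me * x + rng.normal(scale=5.0, size=x.size)

# Slope of the simple linear regression = estimated ME of the ingredient.
slope, intercept = np.polyfit(x, y, 1)
```

The slope recovers the assumed energy content; in the study, separate regressions of IDE, ME, and MEn intake give the three energy values per canola meal.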
Yuan, Haibo; Liu, Xiaowei; Xiang, Maosheng; Huang, Yang; Zhang, Huihua; Chen, Bingqiu E-mail: x.liu@pku.edu.cn
2015-02-01
In this paper we propose a spectroscopy-based stellar color regression (SCR) method to perform accurate color calibration for modern imaging surveys, taking advantage of millions of stellar spectra now available. The method is straightforward, insensitive to systematic errors in the spectroscopically determined stellar atmospheric parameters, applicable to regions that are effectively covered by spectroscopic surveys, and capable of delivering an accuracy of a few millimagnitudes for color calibration. As an illustration, we have applied the method to the Sloan Digital Sky Survey (SDSS) Stripe 82 data. With a total number of 23,759 spectroscopically targeted stars, we have mapped out the small but strongly correlated color zero-point errors present in the photometric catalog of Stripe 82, and we improve the color calibration by a factor of two to three. Our study also reveals some small but significant magnitude dependence errors in the z band for some charge-coupled devices (CCDs). Such errors are likely to be present in all the SDSS photometric data. Our results are compared with those from a completely independent test based on the intrinsic colors of red galaxies presented by Ivezić et al. The comparison, as well as other tests, shows that the SCR method has achieved a color calibration internally consistent at a level of about 5 mmag in u – g, 3 mmag in g – r, and 2 mmag in r – i and i – z. Given the power of the SCR method, we discuss briefly the potential benefits by applying the method to existing, ongoing, and upcoming imaging surveys.
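A toy numpy version of the SCR idea can be sketched as follows: predict each star's intrinsic color from a spectroscopic parameter by regression, then read off per-region zero-point errors as the median of observed-minus-predicted colors. All numbers (the color-Teff relation, the injected offsets) are invented; the offsets are recovered only up to a global constant absorbed by the fit.

```python
import numpy as np

rng = np.random.default_rng(1)

n = 3000
teff = rng.uniform(4000, 7000, n)
intrinsic = 2.5 - 3e-4 * teff                       # assumed linear color-Teff relation
region = rng.integers(0, 4, n)                      # 4 CCD/field regions
true_zp = np.array([0.010, -0.006, 0.003, -0.012])  # injected zero-point errors (mag)
observed = intrinsic + true_zp[region] + rng.normal(0, 0.02, n)

# Regression step: fit color vs. Teff on all stars
coef = np.polyfit(teff, observed, 1)
predicted = np.polyval(coef, teff)

# Calibration step: per-region median residual estimates the zero-point error
est_zp = np.array([np.median((observed - predicted)[region == k]) for k in range(4)])
print(np.round(est_zp, 4))  # mmag-level recovery, up to a global constant
```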
Use of Multiple Correlation Analysis and Multiple Regression Analysis.
ERIC Educational Resources Information Center
Huberty, Carl J.; Petoskey, Martha D.
1999-01-01
Distinguishes between multiple correlation and multiple regression analysis. Illustrates suggested information reporting methods and reviews the use of regression methods when dealing with problems of missing data. (SK)
Regressive systemic sclerosis.
Black, C; Dieppe, P; Huskisson, T; Hart, F D
1986-01-01
Systemic sclerosis is a disease which usually progresses or reaches a plateau with persistence of symptoms and signs. Regression is extremely unusual. Four cases of established scleroderma are described in which regression is well documented. The significance of this observation and possible mechanisms of disease regression are discussed. PMID:3718012
Tharrington, Arnold N.
2015-09-09
The NCCS Regression Test Harness is a software package that provides a framework to perform regression and acceptance testing on NCCS High Performance Computers. The package is written in Python and has only the dependency of a Subversion repository to store the regression tests.
Unitary Response Regression Models
ERIC Educational Resources Information Center
Lipovetsky, S.
2007-01-01
The dependent variable in a regular linear regression is a numerical variable, and in a logistic regression it is a binary or categorical variable. In these models the dependent variable has varying values. However, there are problems yielding an identity output of a constant value which can also be modelled in a linear or logistic regression with…
Sampling and handling artifacts can bias filter-based measurements of particulate organic carbon (OC). Several measurement-based methods for OC artifact reduction and/or estimation are currently used in research-grade field studies. OC frequently is not artifact-corrected in larg...
Rank regression: an alternative regression approach for data with outliers.
Chen, Tian; Tang, Wan; Lu, Ying; Tu, Xin
2014-10-01
Linear regression models are widely used in mental health and related health services research. However, the classic linear regression analysis assumes that the data are normally distributed, an assumption that is not met by the data obtained in many studies. One method of dealing with this problem is to use semi-parametric models, which do not require that the data be normally distributed. But semi-parametric models are quite sensitive to outlying observations, so the generated estimates are unreliable when study data includes outliers. In this situation, some researchers trim the extreme values prior to conducting the analysis, but the ad-hoc rules used for data trimming are based on subjective criteria so different methods of adjustment can yield different results. Rank regression provides a more objective approach to dealing with non-normal data that includes outliers. This paper uses simulated and real data to illustrate this useful regression approach for dealing with outliers and compares it to the results generated using classical regression models and semi-parametric regression models.
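One standard formulation of rank (R) regression estimates the slope by minimizing Jaeckel's rank-based dispersion of the residuals, D(b) = Σ a(R(e_i)) e_i with Wilcoxon scores a(i) = √12 (i/(n+1) − 1/2). The sketch below (illustrative, not from the paper) contrasts it with ordinary least squares on data with gross outliers; the dispersion is location-invariant, so only the slope is estimated here.

```python
import numpy as np
from scipy.optimize import minimize
from scipy.stats import rankdata

def rank_regression_slope(x, y):
    """Slope estimate minimizing Jaeckel's rank dispersion with Wilcoxon scores."""
    n = len(y)
    def dispersion(b):
        e = y - b[0] * x
        r = rankdata(e)
        a = np.sqrt(12) * (r / (n + 1) - 0.5)
        return np.sum(a * e)  # convex, piecewise linear in b
    b0 = np.array([np.polyfit(x, y, 1)[0]])
    return minimize(dispersion, b0, method="Nelder-Mead").x[0]

rng = np.random.default_rng(2)
x = rng.uniform(0, 10, 60)
y = 2.0 * x + 1.0 + rng.normal(0, 0.5, 60)
y[:5] += 30.0                      # gross outliers in the response
ols_slope = np.polyfit(x, y, 1)[0]
rank_slope = rank_regression_slope(x, y)
print(round(ols_slope, 2), round(rank_slope, 2))  # rank estimate stays near 2
```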
The geology of the Penal/Barrackpore field, onshore Trinidad
Dyer, B.L.
1991-03-01
The Penal/Barrackpore field was discovered in 1938 and is located in the southern subbasin of onshore Trinidad. It is one of a series of northeast-southwest trending en echelon middle Miocene anticlinal structures that was later accentuated by late Pliocene transpressional folding. The middle Miocene Herrera and Karamat turbiditic sandstones are the primary reservoir rock in the subsurface anticline of the Penal/Barrackpore field. These turbidites were sourced from the north and deposited within the marls and clays of the Cipero Formation. The Karamat sandstones are followed in vertical stratigraphic succession by the shales and boulder beds of the Lengua formation, the turbidites and deltaics of the lower and middle Cruse, and the deltaics of the upper Cruse, the Forest, and the Morne L'Enfer formations. Relative movement of the South American and Caribbean plates climaxed in the middle Miocene compressive tectonic event and produced an imbricate pattern of southward-facing basement-involved thrusts. The Pliocene deltaics were sourced by erosion of Miocene highs to the north and the South American landmass to the south. These deltaics exhibit onlap onto the preexisting Miocene highs. The late Pliocene transpression also coincides with the onset of oil migration along faults, diapirs, and unconformities from the Cretaceous Naparima Hill source. The Lengua Formation and the upper Forest clays are considered effective seals. Hydrocarbon trapping is structurally and stratigraphically controlled, with structure being the dominant trapping mechanism. Ultimate recoverable reserves for the Penal/Barrackpore field are estimated at 127.9 MMBO and 628.8 bcf. The field is presently owned and operated by the Trinidad and Tobago Oil Company Limited (TRINTOC).
L1 penalization of volumetric dose objectives in optimal control of PDEs
Barnard, Richard C.; Clason, Christian
2017-02-11
This work is concerned with a class of PDE-constrained optimization problems that are motivated by an application in radiotherapy treatment planning. Here the primary design objective is to minimize the volume where a functional of the state violates a prescribed level, but prescribing these levels in the form of pointwise state constraints leads to infeasible problems. We therefore propose an alternative approach based on L1 penalization of the violation that is also applicable when state constraints are infeasible. We establish well-posedness of the corresponding optimal control problem, derive first-order optimality conditions, discuss convergence of minimizers as the penalty parameter tends to infinity, and present a semismooth Newton method for their efficient numerical solution. Finally, the performance of this method for a model problem is illustrated and contrasted with an alternative approach based on (regularized) state constraints.
Ehrsam, Eric; Kallini, Joseph R.; Lebas, Damien; Modiano, Philippe; Cotten, Hervé
2016-01-01
Fully regressive melanoma is a phenomenon in which the primary cutaneous melanoma becomes completely replaced by fibrotic components as a result of host immune response. Although 10 to 35 percent of cases of cutaneous melanomas may partially regress, fully regressive melanoma is very rare; only 47 cases have been reported in the literature to date. All of the cases of fully regressive melanoma reported in the literature were diagnosed in conjunction with metastasis in a patient. The authors describe a case of fully regressive melanoma without any metastases at the time of its diagnosis. Characteristic findings on dermoscopy, as well as the absence of melanoma on final biopsy, confirmed the diagnosis. PMID:27672418
43 CFR 4170.2-1 - Penal provisions under the Taylor Grazing Act.
Code of Federal Regulations, 2011 CFR
2011-10-01
... 43 Public Lands: Interior 2 2011-10-01 2011-10-01 false Penal provisions under the Taylor Grazing...) BUREAU OF LAND MANAGEMENT, DEPARTMENT OF THE INTERIOR RANGE MANAGEMENT (4000) GRAZING ADMINISTRATION-EXCLUSIVE OF ALASKA Penalties § 4170.2-1 Penal provisions under the Taylor Grazing Act. Under section 2...
27 CFR 19.957 - Instructions to compute bond penal sum.
Code of Federal Regulations, 2010 CFR
2010-04-01
... 27 Alcohol, Tobacco Products and Firearms 1 2010-04-01 2010-04-01 false Instructions to compute bond penal sum. 19.957 Section 19.957 Alcohol, Tobacco Products and Firearms ALCOHOL AND TOBACCO TAX... Fuel Use Bonds § 19.957 Instructions to compute bond penal sum. (a) Medium plants. To find the...
An, Yongkai; Lu, Wenxi; Cheng, Weiguo
2015-07-30
This paper introduces a surrogate model to identify an optimal exploitation scheme, with the western Jilin province selected as the study area. A numerical simulation model of groundwater flow was established first, and four exploitation wells were set in Tongyu county and Qian Gorlos county so as to supply water to Daan county. Second, the Latin Hypercube Sampling (LHS) method was used to collect data in the feasible region for the input variables. A surrogate model of the numerical simulation model of groundwater flow was developed using the regression kriging method. An optimization model was established to search for an optimal groundwater exploitation scheme, using the minimum average drawdown of the groundwater table and the minimum cost of groundwater exploitation as multi-objective functions. Finally, the surrogate model was invoked by the optimization model in the process of solving the optimization problem. Results show that the relative error and root mean square error of the groundwater table drawdown between the simulation model and the surrogate model for 10 validation samples are both lower than 5%, indicating a high approximation accuracy. A contrast between the surrogate-based simulation optimization model and the conventional simulation optimization model for solving the same optimization problem shows that the former needs only 5.5 hours whereas the latter needs 25 days. The above results indicate that the surrogate model developed in this study can not only considerably reduce the computational burden of the simulation optimization process but also maintain high computational accuracy. This provides an effective method for identifying an optimal groundwater exploitation scheme quickly and accurately.
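The LHS-plus-kriging workflow above can be sketched in a few lines: sample the feasible region with a Latin hypercube, run the "expensive" simulator at those points, fit a kriging (Gaussian-process) surrogate, and check its accuracy on held-out samples. The simulator stand-in and all parameters below are invented; the paper's regression kriging additionally models a regression trend, which this simple-kriging sketch omits.

```python
import numpy as np
from scipy.stats import qmc

def expensive_model(x):
    # Stand-in for the groundwater flow simulator: "drawdown" as a smooth
    # nonlinear function of two normalized pumping rates (invented).
    return np.sin(3 * x[:, 0]) + 0.5 * x[:, 1] ** 2

def fit_kriging(X, y, length=0.3, nugget=1e-6):
    """Simple-kriging surrogate with a Gaussian kernel; returns a predictor."""
    def k(A, B):
        d2 = ((A[:, None, :] - B[None, :, :]) ** 2).sum(-1)
        return np.exp(-d2 / (2 * length ** 2))
    K = k(X, X) + nugget * np.eye(len(X))
    alpha = np.linalg.solve(K, y - y.mean())
    return lambda Z: y.mean() + k(Z, X) @ alpha

# Latin Hypercube Sampling of the 2-D feasible region, as in the abstract
X_train = qmc.LatinHypercube(d=2, seed=3).random(n=60)
y_train = expensive_model(X_train)
predict = fit_kriging(X_train, y_train)

# Validate on held-out LHS samples, mirroring the paper's accuracy check
X_test = qmc.LatinHypercube(d=2, seed=4).random(n=10)
abs_err = np.abs(predict(X_test) - expensive_model(X_test))
print(np.max(abs_err))  # surrogate should track the simulator closely
```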
NASA Astrophysics Data System (ADS)
Ochoa Gutierrez, L. H.; Vargas Jimenez, C. A.; Niño Vasquez, L. F.
2011-12-01
The "Sabana de Bogota" (Bogota Savannah) is the most important social and economic center of Colombia. Almost a third of the population is concentrated in this region, which generates about 40% of Colombia's Gross Domestic Product (GDP). Accordingly, the zone is highly vulnerable should a highly destructive seismic event occur. Historical evidence shows that high-magnitude events that caused huge damage to the city took place in the past, and indicates that such events will probably recur in the coming years. This is why we are working on an early warning system that uses the first few seconds of a seismic signal registered by three-component, broadband seismometers. Such a system can be implemented using Computational Intelligence tools designed and calibrated for the particular geological, structural, and environmental conditions of the region. The methods developed are expected to work in real time, so suitable software and electronic tools need to be developed. We used Support Vector Machine Regression (SVMR) models trained and tested on historic seismic events registered by the "EL ROSAL" station, located near Bogotá, calculating descriptors or attributes from the first 6 seconds of signal as the input of the model. With this algorithm we obtained less than 10% mean absolute error and correlation coefficients greater than 85% in hypocentral distance and magnitude estimation. With these results we consider that the method can be improved toward better accuracy with less signal time, and that it can be a very useful model to implement directly in seismological stations to generate a fast characterization of an event, broadcasting not only the raw signal but pre-processed information that can be very useful for accurate early warning generation.
Regression Analysis by Example. 5th Edition
ERIC Educational Resources Information Center
Chatterjee, Samprit; Hadi, Ali S.
2012-01-01
Regression analysis is a conceptually simple method for investigating relationships among variables. Carrying out a successful application of regression analysis, however, requires a balance of theoretical results, empirical rules, and subjective judgment. "Regression Analysis by Example, Fifth Edition" has been expanded and thoroughly…
Fujita, A; Takabatake, H; Tagaki, S; Sohda, T; Sekine, K
1996-03-01
To evaluate the effect of chemotherapy on QOL, the survival period was categorized into 3 intervals: in the hospital for chemotherapy (TOX), on an outpatient basis (TWiST, Time without Symptom and Toxicity), and in the hospital for conservative therapy (REL). Coefficients expressing the QOL level were denoted ut, uw, and ur. If uw was set to 1 and ut and ur were set at less than 1, ut·TOX + uw·TWiST + ur·REL could be used as a quality-adjusted value relative to TWiST (Q-TWiST). One hundred five patients with stage IV non-small cell lung cancer were included. Sixty-five were given chemotherapy, and the other 40 were not. The observation period was 2 years. Q-TWiST values for age, sex, PS, histology, and chemotherapy were calculated, and their quantification was performed employing a regression-tree-type method. Chemotherapy contributed to Q-TWiST when ut approached 1 (i.e., when no side effects were assumed). When ut was less than 0.5, PS and sex had an appreciable role.
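The Q-TWiST combination above is simple arithmetic; a worked example with invented durations and utility coefficients:

```python
# Worked Q-TWiST example with hypothetical durations (months) and utilities.
# Q-TWiST = ut*TOX + uw*TWiST + ur*REL, with uw fixed at 1 and ut, ur <= 1.
TOX, TWiST, REL = 2.0, 8.0, 3.0   # months spent in each health state
ut, uw, ur = 0.5, 1.0, 0.5        # utility coefficients

q_twist = ut * TOX + uw * TWiST + ur * REL
print(q_twist)  # 0.5*2 + 1*8 + 0.5*3 = 10.5 quality-adjusted months
```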
NASA Astrophysics Data System (ADS)
He, Yaqian; Bo, Yanchen; Chai, Leilei; Liu, Xiaolong; Li, Aihua
2016-08-01
Leaf Area Index (LAI) is an important parameter of vegetation structure. A number of moderate resolution LAI products have been produced in urgent need of large scale vegetation monitoring. High resolution LAI reference maps are necessary to validate these LAI products. This study used a geostatistical regression (GR) method to estimate LAI reference maps by linking in situ LAI and Landsat TM/ETM+ and SPOT-HRV data over two cropland and two grassland sites. To explore the discrepancies of employing different vegetation indices (VIs) on estimating LAI reference maps, this study established the GR models for different VIs, including difference vegetation index (DVI), normalized difference vegetation index (NDVI), and ratio vegetation index (RVI). To further assess the performance of the GR model, the results from the GR and Reduced Major Axis (RMA) models were compared. The results show that the performance of the GR model varies between the cropland and grassland sites. At the cropland sites, the GR model based on DVI provides the best estimation, while at the grassland sites, the GR model based on DVI performs poorly. Compared to the RMA model, the GR model improves the accuracy of reference LAI maps in terms of root mean square errors (RMSE) and bias.
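The three vegetation indices named above, and the regression of in situ LAI on one of them, can be sketched with synthetic reflectances; the GR model of the paper additionally kriges the regression residuals, which this plain-regression sketch omits. All numbers (reflectance ranges, the LAI-DVI relation) are invented.

```python
import numpy as np

rng = np.random.default_rng(5)

red = rng.uniform(0.03, 0.10, 100)   # red-band reflectance
nir = rng.uniform(0.25, 0.50, 100)   # near-infrared reflectance

dvi = nir - red                      # difference vegetation index
ndvi = (nir - red) / (nir + red)     # normalized difference vegetation index
rvi = nir / red                      # ratio vegetation index

lai = 8.0 * dvi + rng.normal(0, 0.1, 100)  # synthetic "in situ" LAI

coef = np.polyfit(dvi, lai, 1)       # regression step of the GR idea
pred = np.polyval(coef, dvi)
rmse = np.sqrt(np.mean((pred - lai) ** 2))
print(round(coef[0], 2), round(rmse, 3))
```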
Middle Miocene sandstone reservoirs of the Penal/Barrackpore field
Dyer, B.L.
1991-03-01
The Penal/Barrackpore field was discovered in 1938 and is located in the southern subbasin of onshore Trinidad. The accumulation is one of a series of northeast-southwest trending en echelon middle Miocene anticlinal structures that was later accentuated by late Pliocene transpressional folding. Relative movement of the South American and Caribbean plates climaxed in the middle Miocene compressive tectonic event and produced an imbricate pattern of southward-facing basement-involved thrusts. Further compressive interaction between the plates in the late Pliocene produced a transpressive tectonic episode forming northwest-southeast oriented transcurrent faults, tear faults, basement thrust faults, lystric normal faults, and detached simple folds with infrequent diapiric cores. The middle Miocene Herrera and Karamat turbiditic sandstones are the primary reservoir rock in the subsurface anticline of the Penal/Barrackpore field. These turbidites were sourced from the north and deposited within the marls and clays of the Cipero Formation. Miocene and Pliocene deltaics and turbidites succeed the Cipero Formation vertically, lapping into preexisting Miocene highs. The late Pliocene transpression also coincides with the onset of oil migration along faults, diapirs, and unconformities from the Cretaceous Naparima Hill source. The Lengua Formation and the upper Forest clays are considered effective seals. Hydrocarbon trapping is structurally and stratigraphically controlled, with structure being the dominant trapping mechanism. Ultimate recoverable reserves for the field are estimated at 127.9 MMBo and 628.8 bcf. The field is presently owned and operated by the Trinidad and Tobago Oil Company Limited (TRINTOC).
Song, Rui; Kosorok, Michael; Zeng, Donglin; Zhao, Yingqi; Laber, Eric; Yuan, Ming
2015-01-01
As a new strategy for treatment which takes individual heterogeneity into consideration, personalized medicine is of growing interest. Discovering individualized treatment rules (ITRs) for patients who have heterogeneous responses to treatment is one of the important areas in developing personalized medicine. As more and more information per individual is being collected in clinical studies and not all of the information is relevant for treatment discovery, variable selection becomes increasingly important in discovering individualized treatment rules. In this article, we develop a variable selection method based on penalized outcome weighted learning through which an optimal treatment rule is considered as a classification problem where each subject is weighted proportional to his or her clinical outcome. We show that the resulting estimator of the treatment rule is consistent and establish variable selection consistency and the asymptotic distribution of the estimators. The performance of the proposed approach is demonstrated via simulation studies and an analysis of chronic depression data. PMID:25883393
Phase retrieval from noisy data based on minimization of penalized I-divergence.
Choi, Kerkil; Lanterman, Aaron D
2007-01-01
We study noise artifacts in phase retrieval based on minimization of an information-theoretic discrepancy measure called Csiszár's I-divergence. We specifically focus on adding Poisson noise to either the autocorrelation of the true image (as in astronomical imaging through turbulence) or the squared Fourier magnitudes of the true image (as in x-ray crystallography). Noise effects are quantified via various error metrics as signal-to-noise ratios vary. We propose penalized minimum I-divergence methods to suppress the observed noise artifacts. To avoid computational difficulties arising from the introduction of a penalty, we adapt Green's one-step-late approach for use in our minimum I-divergence framework.
Gerber, Samuel; Rubel, Oliver; Bremer, Peer -Timo; Pascucci, Valerio; Whitaker, Ross T.
2012-01-19
This paper introduces a novel partition-based regression approach that incorporates topological information. Partition-based regression typically introduces a quality-of-fit-driven decomposition of the domain. The emphasis in this work is on a topologically meaningful segmentation. Thus, the proposed regression approach is based on a segmentation induced by a discrete approximation of the Morse–Smale complex. This yields a segmentation with partitions corresponding to regions of the function with a single minimum and maximum that are often well approximated by a linear model. This approach yields regression models that are amenable to interpretation and have good predictive capacity. Typically, regression estimates are quantified by their geometrical accuracy. For the proposed regression, an important aspect is the quality of the segmentation itself. Thus, this article introduces a new criterion that measures the topological accuracy of the estimate. The topological accuracy provides a complementary measure to the classical geometrical error measures and is very sensitive to overfitting. The Morse–Smale regression is compared to state-of-the-art approaches in terms of geometry and topology and yields comparable or improved fits in many cases. Finally, a detailed study on climate-simulation data demonstrates the application of the Morse–Smale regression. Supplementary Materials are available online and contain an implementation of the proposed approach in the R package msr, an analysis and simulations on the stability of the Morse–Smale complex approximation, and additional tables for the climate-simulation study.
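A one-dimensional caricature of the partition-based idea: split the domain at interior critical points of the (sorted) response so that each segment has a single minimum and maximum, then fit a linear model per segment. This is illustrative only; the actual method uses a discrete approximation of the Morse-Smale complex in higher dimensions.

```python
import numpy as np

def monotone_partition_fit(x, y):
    """Split the 1-D domain at sign changes of the response's slope and fit a
    linear model on each monotone segment (toy Morse-Smale-style regression)."""
    order = np.argsort(x)
    xs, ys = x[order], y[order]
    d = np.diff(ys)
    # indices where the slope changes sign = interior critical points
    breaks = np.where(np.sign(d[:-1]) * np.sign(d[1:]) < 0)[0] + 1
    edges = np.concatenate(([0], breaks, [len(xs) - 1]))
    models = []
    for a, b in zip(edges[:-1], edges[1:]):
        seg = slice(a, b + 1)
        models.append((xs[a], xs[b], np.polyfit(xs[seg], ys[seg], 1)))
    return models  # (segment start, segment end, [slope, intercept]) per partition

x = np.linspace(0, 2, 41)
y = np.abs(x - 1.0)          # V shape: one descending, one ascending segment
models = monotone_partition_fit(x, y)
print(len(models), round(models[0][2][0], 2), round(models[1][2][0], 2))
```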
Improved Regression Calibration
ERIC Educational Resources Information Center
Skrondal, Anders; Kuha, Jouni
2012-01-01
The likelihood for generalized linear models with covariate measurement error cannot in general be expressed in closed form, which makes maximum likelihood estimation taxing. A popular alternative is regression calibration which is computationally efficient at the cost of inconsistent estimation. We propose an improved regression calibration…
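For reference, classic regression calibration replaces the error-prone measurement W by an estimate of E[X | W] and then runs the usual regression on that estimate. In the linear/Gaussian toy below this fully removes the attenuation; the paper's point is that for general GLMs the approach is only approximate. The reliability ratio is computed from the simulated truth here purely for illustration; in practice it would come from replicate measurements.

```python
import numpy as np

rng = np.random.default_rng(6)

n = 5000
x = rng.normal(0, 1, n)           # true covariate (unobserved in practice)
w = x + rng.normal(0, 0.8, n)     # error-prone measurement
y = 2.0 * x + rng.normal(0, 0.5, n)

naive = np.polyfit(w, y, 1)[0]    # attenuated toward zero

# E[X|W] = reliability * W in this Gaussian setting
reliability = np.var(x) / np.var(w)   # would come from replicates in practice
calibrated = np.polyfit(reliability * w, y, 1)[0]
print(round(naive, 2), round(calibrated, 2))  # attenuation removed
```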
Sazykina, I G
2008-01-01
The effect of the special contingents of the penal system on tuberculosis epidemiological indicators in a particular region is examined. The findings testify that, under the conditions existing in the Orenburgskaya oblast, the negative effect of the contingents of the sentence execution service of the Russian Federation on the tuberculosis epidemiological indicators is rather strong and, moreover, is significantly higher than the average values typical for the Russian Federation. This kind of epidemiological analysis would be useful for the organizational and methodical services of other regional tuberculosis dispensaries in developing well-grounded additional managerial decisions.
Hierarchical Adaptive Regression Kernels for Regression with Functional Predictors
Woodard, Dawn B.; Crainiceanu, Ciprian; Ruppert, David
2013-01-01
We propose a new method for regression using a parsimonious and scientifically interpretable representation of functional predictors. Our approach is designed for data that exhibit features such as spikes, dips, and plateaus whose frequency, location, size, and shape varies stochastically across subjects. We propose Bayesian inference of the joint functional and exposure models, and give a method for efficient computation. We contrast our approach with existing state-of-the-art methods for regression with functional predictors, and show that our method is more effective and efficient for data that include features occurring at varying locations. We apply our methodology to a large and complex dataset from the Sleep Heart Health Study, to quantify the association between sleep characteristics and health outcomes. Software and technical appendices are provided in online supplemental materials. PMID:24293988
Global solutions to folded concave penalized nonconvex learning
Liu, Hongcheng; Yao, Tao; Li, Runze
2015-01-01
This paper is concerned with solving nonconvex learning problems with folded concave penalties. Although their global solutions entail desirable statistical properties, optimization techniques that guarantee global optimality in a general setting have been lacking. In this paper, we show that a class of nonconvex learning problems are equivalent to general quadratic programs. This equivalence facilitates the development of mixed integer linear programming reformulations, which admit finite algorithms that find a provably global optimal solution. We refer to this reformulation-based technique as mixed integer programming-based global optimization (MIPGO). To our knowledge, this is the first global optimization scheme with a theoretical guarantee for folded concave penalized nonconvex learning with the SCAD penalty (Fan and Li, 2001) and the MCP penalty (Zhang, 2010). Numerical results indicate that MIPGO significantly outperforms the state-of-the-art solution scheme, local linear approximation, and other alternative solution techniques in the literature in terms of solution quality. PMID:27141126
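The SCAD penalty referenced above has a standard piecewise closed form: linear (lasso-like) near zero, a quadratic blend in the middle, and constant for large coefficients so that big effects are not shrunk. A direct implementation:

```python
import numpy as np

def scad_penalty(beta, lam, a=3.7):
    """SCAD penalty of Fan and Li (2001):
       lam*|b|                             for |b| <= lam
       (2*a*lam*|b| - b^2 - lam^2)
         / (2*(a-1))                       for lam < |b| <= a*lam
       lam^2*(a+1)/2                       for |b| > a*lam"""
    b = np.abs(beta)
    small = b <= lam
    mid = (b > lam) & (b <= a * lam)
    return np.where(small, lam * b,
           np.where(mid, (2 * a * lam * b - b ** 2 - lam ** 2) / (2 * (a - 1)),
                    lam ** 2 * (a + 1) / 2))

pen = scad_penalty(np.array([0.5, 2.0, 10.0]), 1.0)
print(pen)  # linear region, concave blend, then flat
```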
Statutory disclosure in article 280 of the Turkish Penal Code.
Büken, Erhan; Sahinoğlu, Serap; Büken, Nüket Ornek
2006-11-01
A new Turkish Penal Code came into effect on 1 June 2005. Article 280 concerns health care workers' failure to report a crime. This article removes the responsibility from health care workers to maintain confidentiality, but also removes patients' right to confidentiality. It provides for up to one year of imprisonment for a health care worker who, while on duty, finds an indication that a crime might have been committed by a patient and who does not inform the responsible authorities about it. This forces the health care worker to divulge the patient's confidential information. A patient who thinks he or she may be accused of a crime may therefore not seek medical help, which is the universal right of every person. The article is therefore contrary to medical ethics, oaths taken by physicians and nurses, and the understanding of patient confidentiality.
The Penal Code (Amendment) Act 1989 (Act A727), 1989.
1989-01-01
In 1989, Malaysia amended its penal code to provide that inducing an abortion is not an offense if the procedure is performed by a registered medical practitioner who has determined that continuation of the pregnancy would risk the life of the woman or damage her mental or physical health. Additional amendments include a legal description of the conditions which constitute the act of rape. Among these conditions is intercourse with or without consent with a woman under the age of 16. Malaysia fails to recognize rape within a marriage unless the woman is protected from her husband by judicial decree or is living separately from her husband according to Muslim custom. Rape is punishable by imprisonment for a term of 5-20 years and by whipping.
[Some consequences of the application of the new Swiss penal code on legal psychiatry].
Gasser, Jacques; Gravier, Bruno
2007-09-19
The new text of the Swiss penal code, which entered into effect at the beginning of 2007, has many implications for the practice of psychiatrists carrying out expert assessments in the penal field or engaged in the application of legal measures imposing treatment. The most notable consequences of this text are, on the one hand, a new definition of the concept of penal irresponsibility, which is no longer necessarily related to a psychiatric diagnosis, and, on the other hand, a new definition of the legal constraints that justice can impose to prevent new punishable acts, which appreciably modifies the place of psychiatrists in questions linking psychiatric care and social control.
Penal harm medicine: state tort remedies for delaying and denying health care to prisoners.
Vaughn, M S
1999-01-01
In prison and jail subcultures, custodial personnel are committed to the penal harm movement, which seeks to inflict pain on prisoners. Conversely, correctional medical personnel are sworn to the Hippocratic Oath and are committed to alleviating prisoners' suffering. The Hippocratic Oath is violated when correctional medical workers adopt penal harm mandates and inflict pain on prisoners. By analyzing lawsuits filed by prisoners under state tort law, this article shows how the penal harm movement co-opts some correctional medical employees into abandoning their treatment and healing mission, thus causing denial or delay of medical treatment to prisoners.
[Between law and psychiatry: homosexuality in the project of the Swiss penal code (1918)].
Delessert, Thierry
2005-01-01
In 1942 the Swiss penal code depenalised homosexual acts between consenting adults under some conditions. The genesis of the penal article shows that it was drafted before the First World War and bears the marks of the forensic theories of the turn of the century. Both through direct contacts and through the authority of its eminent figures, Swiss psychiatry exerted an unquestionable influence on the depenalisation. The conceptualisation of homosexuality was also strongly influenced by German psychiatric theories and was discussed with reference to Germanic law. Through the penal article, Swiss lawyers and psychiatrists linked the homosexual question to the determination of the irresponsibility of criminal mental patients and to degeneracy.
XRA image segmentation using regression
NASA Astrophysics Data System (ADS)
Jin, Jesse S.
1996-04-01
Segmentation is an important step in image analysis, and thresholding is one of the most important approaches. There are several difficulties in segmentation, such as automatically selecting a threshold, dealing with intensity distortion, and removing noise. We have developed an adaptive segmentation scheme by applying the Central Limit Theorem in regression. A Gaussian regression is used to separate the distribution of background from foreground in a single-peak histogram, and the separation helps to automatically determine the threshold. A small 3 by 3 window is applied, and the mode of the local histogram is used to overcome noise. Thresholding is based on local weighting, where regression is used again for parameter estimation. A connectivity test is applied to the final results to remove impulse noise. We have applied the algorithm to x-ray angiogram images to extract brain arteries. The algorithm works well for single-peak distributions where there is no valley in the histogram. The regression provides a method to apply knowledge in clustering. Extending regression to multiple-level segmentation needs further investigation.
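One simple way to realize "Gaussian regression on a single-peak histogram" is to regress the log of the histogram counts on a quadratic in intensity: log N(v) = a + b·v + c·v² implies mu = -b/(2c) and sigma² = -1/(2c), from which a threshold such as mu + 3·sigma follows. The sketch below uses synthetic pixel values and is an interpretation of the abstract's idea, not the authors' exact scheme.

```python
import numpy as np

rng = np.random.default_rng(7)

background = rng.normal(100, 10, 20000)   # single-peak background intensities
foreground = rng.normal(170, 8, 2000)     # bright vessels, beyond the peak
values = np.concatenate([background, foreground])

# Histogram around the background peak only; regress log-counts on a parabola
counts, edges = np.histogram(values, bins=60, range=(60, 140))
centers = (edges[:-1] + edges[1:]) / 2
keep = counts > 5                         # avoid log(0) in the sparse tails
c2, c1, c0 = np.polyfit(centers[keep], np.log(counts[keep]), 2)

mu = -c1 / (2 * c2)                       # Gaussian mean from the parabola
sigma = np.sqrt(-1 / (2 * c2))            # Gaussian width from the curvature
threshold = mu + 3 * sigma                # pixels above are called foreground
print(round(mu), round(sigma), round(threshold))
```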
Geodesic least squares regression on information manifolds
Verdoolaege, Geert
2014-12-05
We present a novel regression method targeted at situations with significant uncertainty on both the dependent and independent variables or with non-Gaussian distribution models. Unlike the classic regression model, the conditional distribution of the response variable suggested by the data need not be the same as the modeled distribution; instead, the two are matched by minimizing the Rao geodesic distance between them. This yields a more flexible regression method that is less constrained by the assumptions imposed through the regression model. As an example, we demonstrate the improved resistance of our method against flawed model assumptions and apply it to scaling laws in magnetic confinement fusion.
Zhang, F; Adeola, O
2017-02-01
The energy values of canola meal (CM), cottonseed meal (CSM), bakery meal (BM), and peanut flour meal (PFM) for broiler chickens were determined in 2 experiments with Ross 708 broiler chickens from d 21 to 28 posthatch. The birds were fed a standard broiler starter diet from d 0 to 21 posthatch. In each experiment, 320 birds were grouped by weight into 8 blocks of 5 cages with 8 birds per cage and assigned to 5 diets. Each experiment used a corn-soybean meal reference diet and 4 test diets in which test ingredients partly replaced the energy sources in the reference diet. The test diets in Exp. 1 consisted of 125 g CM, 250 g CM, 100 g CSM, or 200 g CSM/kg. In Exp. 2, the test diets consisted of 200 g BM, 400 g BM, 100 g PFM, or 200 g PFM/kg. The ileal digestible energy (IDE), metabolizable energy (ME), and nitrogen-corrected metabolizable energy (MEn) of all the test ingredients were determined by the regression method. The DM contents of CM, CSM, BM, and PFM were 883, 878, 878, and 964 g/kg, respectively, and the respective gross energies (GE) were 4,143, 4,237, 4,060, and 5,783 kcal/kg DM. In Exp. 1, the IDE values were 2,132 and 2,197 kcal/kg DM for CM and CSM, respectively. The ME values were 2,286 and 2,568 kcal/kg DM for CM and CSM, respectively. The MEn values were 1,931 kcal/kg DM for CM and 2,078 kcal/kg DM for CSM. In Exp. 2, IDE values were 3,412 kcal/kg DM for BM and 4,801 kcal/kg DM for PFM; ME values were 3,176 and 4,601 kcal/kg DM for BM and PFM, respectively; and the MEn values were 3,093 kcal/kg DM for BM and 4,112 kcal/kg DM for PFM. In conclusion, the current study showed that chickens can utilize a considerable amount of energy from these 4 ingredients, and it also provides the energy values of CM, CSM, BM, and PFM for broiler chickens.
NASA Astrophysics Data System (ADS)
Amin, Mohd Zaki M.; Islam, Tanvir; Ishak, Asnor M.
2014-10-01
The authors have applied an automated regression-based statistical method, namely the automated statistical downscaling (ASD) model, to downscale and project the precipitation climatology in an equatorial climate region (Peninsular Malaysia). Five precipitation indices are downscaled and projected: mean monthly precipitation (Mean), standard deviation (STD), 90th percentile of rain-day amount, percentage of wet days (Wet-day), and maximum number of consecutive dry days (CDD). The predictors, National Centers for Environmental Prediction (NCEP) products, are taken from the daily-series reanalysis data, while the global climate model (GCM) outputs are from the Hadley Centre Coupled Model, version 3 (HadCM3) under the A2/B2 emission scenarios and the Third-Generation Coupled Global Climate Model (CGCM3) under the A2 emission scenario. Meanwhile, the predictand data are taken from arithmetically averaged rain gauge information and used as baseline data for the evaluation. The results from the calibration and validation periods, spanning 40 years (1961-2000), reveal that the ASD model is capable of downscaling the precipitation with reasonable accuracy. Overall, during the validation period, the model simulations with the NCEP predictors produce mean monthly precipitation of 6.18-6.20 mm/day (root mean squared errors of 0.78 and 0.82 mm/day), interpolated on the HadCM3 and CGCM3 grids, respectively, compared with the observed 6.00 mm/day. Nevertheless, the model struggles to perform well for extreme precipitation and during summer, specifically in generating the CDD and STD indices. The future projections of precipitation (2011-2099) indicate an increase in precipitation amount and frequency in most months. Taking the 1961-2000 timeline as the base period, the annual mean precipitation would show a surplus of nearly 14-18 % under both GCM output cases (HadCM3 A2/B2 scenarios and
George: Gaussian Process regression
NASA Astrophysics Data System (ADS)
Foreman-Mackey, Daniel
2015-11-01
George is a fast and flexible library, implemented in C++ with Python bindings, for Gaussian Process regression useful for accounting for correlated noise in astronomical datasets, including those for transiting exoplanet discovery and characterization and stellar population modeling.
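As a rough illustration of what Gaussian Process regression computes on noisy data, here is a minimal numpy sketch of GP posterior-mean prediction with a squared-exponential kernel. This is a generic sketch, not george's C++/Python API; the function names, kernel choice, and parameter values are assumptions for illustration.

```python
import numpy as np

def rbf_kernel(a, b, amp=1.0, scale=1.0):
    """Squared-exponential covariance between point sets a and b."""
    d = a[:, None] - b[None, :]
    return amp * np.exp(-0.5 * (d / scale) ** 2)

def gp_predict(x, y, yerr, x_star, amp=1.0, scale=1.0):
    """Posterior mean of a zero-mean GP conditioned on noisy data:
    K(x*, x) @ [K(x, x) + diag(yerr^2)]^{-1} y."""
    K = rbf_kernel(x, x, amp, scale) + np.diag(yerr ** 2)
    K_star = rbf_kernel(x_star, x, amp, scale)
    return K_star @ np.linalg.solve(K, y)

# Noisy samples of a smooth function; the GP mean recovers it.
x = np.linspace(0.0, 2 * np.pi, 20)
y = np.sin(x)
yerr = np.full_like(x, 0.05)
mu = gp_predict(x, y, yerr, x, scale=1.0)
```

Libraries such as george add fast solvers and flexible kernels on top of exactly this conditioning step, which is what makes GP modeling of correlated noise practical on large astronomical datasets.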
Wang, Shuang; Zhang, Yuchen; Dai, Wenrui; Lauter, Kristin; Kim, Miran; Tang, Yuzhe; Xiong, Hongkai; Jiang, Xiaoqian
2016-01-01
Motivation: Genome-wide association studies (GWAS) have been widely used to discover associations between genotypes and phenotypes. Human genome data contain valuable but highly sensitive information, and unprotected disclosure of such information might put individuals' privacy at risk; it is therefore important to protect human genome data. Exact logistic regression is a bias-reduction method based on a penalized likelihood to discover rare variants that are associated with disease susceptibility. We propose the HEALER framework to facilitate secure rare-variant analysis with a small sample size. Results: We focus on an algorithm design that reduces the computational and storage costs of learning a homomorphic exact logistic regression model (i.e. evaluating P-values of coefficients), where the circuit depth is proportional to the logarithmic scale of the data size. We evaluate the algorithm's performance using rare Kawasaki Disease datasets. Availability and implementation: Download HEALER at http://research.ucsd-dbmi.org/HEALER/ Contact: shw070@ucsd.edu Supplementary information: Supplementary data are available at Bioinformatics online. PMID:26446135
Goldsmith, Jeff; Kitago, Tomoko
2016-02-01
This work is concerned with understanding common population-level effects of stroke on motor control while accounting for possible subject-level idiosyncratic effects. Upper extremity motor control for each subject is assessed through repeated planar reaching motions from a central point to eight pre-specified targets arranged on a circle. We observe the kinematic data for hand position as a bivariate function of time for each reach. Our goal is to estimate the bivariate function-on-scalar regression with subject-level random functional effects while accounting for potential correlation in residual curves; covariates of interest are severity of motor impairment and target number. We express fixed effects and random effects using penalized splines, and allow for residual correlation using a Wishart prior distribution. Parameters are jointly estimated in a Bayesian framework, and we implement a computationally efficient approximation algorithm using variational Bayes. Simulations indicate that the proposed method yields accurate estimation and inference, and application results suggest that the effect of stroke on motor control has a systematic component observed across subjects.
Penalized weighted least-squares image reconstruction for dual energy X-ray transmission tomography.
Sukovic, P; Clinthorne, N H
2000-11-01
We present a dual-energy (DE) transmission computed tomography (CT) reconstruction method. It is statistically motivated and features nonnegativity constraints in the density domain. A penalized weighted least squares (PWLS) objective function has been chosen to handle the non-Poisson noise added by amorphous silicon (aSi:H) detectors. A Gauss-Seidel algorithm has been used to minimize the objective function. The behavior of the method in terms of bias/standard deviation tradeoff has been compared to that of a DE method that is based on filtered back projection (FBP). The advantages of the DE PWLS method are largest for high noise and/or low flux cases. Qualitative results suggest this as well. Also, the reconstructed images of an object with opaque regions are presented. Possible applications of the method are: attenuation correction for positron emission tomography (PET) images, various quantitative computed tomography (QCT) methods such as bone mineral densitometry (BMD), and the removal of metal streak artifacts.
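A minimal Python/numpy sketch of the PWLS idea with a Gauss-Seidel (cyclic coordinate) minimizer and the nonnegativity constraint mentioned above. The first-difference roughness penalty, the weights, and the toy identity system are illustrative assumptions, not the paper's dual-energy CT formulation.

```python
import numpy as np

def pwls_gauss_seidel(A, y, w, beta, n_iter=200):
    """Minimize (y - Ax)' W (y - Ax) + beta * ||Dx||^2 subject to x >= 0
    by cyclic coordinate (Gauss-Seidel) updates.

    W = diag(w) holds the statistical weights; D is a first-difference
    roughness penalty standing in for a generic smoothness prior.
    """
    n = A.shape[1]
    D = np.diff(np.eye(n), axis=0)        # first-difference operator
    H = A.T * w @ A + beta * D.T @ D      # Hessian of the quadratic objective
    b = A.T @ (w * y)
    x = np.zeros(n)
    for _ in range(n_iter):
        for j in range(n):
            g = H[j] @ x - b[j]           # partial derivative w.r.t. x_j
            x[j] = max(0.0, x[j] - g / H[j, j])
    return x

# Toy example: identity system, so the output is a mildly smoothed,
# nonnegative version of the data.
A = np.eye(5)
y = np.array([0.0, 1.0, 2.0, 1.0, 0.0])
x_hat = pwls_gauss_seidel(A, y, np.ones(5), beta=0.01)
```

Because the objective is quadratic with a positive-definite Hessian, each coordinate update has the closed form shown, which is why Gauss-Seidel is a natural minimizer for PWLS objectives.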
Recent changes in Criminal Procedure Code and Indian Penal Code relevant to medical profession.
Agarwal, Swapnil S; Kumar, Lavlesh; Mestri, S C
2010-02-01
Some sections of the Criminal Procedure Code and the Indian Penal Code have a direct bearing on medical practitioners. With changing times, a few of them have been revised, and these changes are presented in this article.
Peres, Maria Fernanda Tourinho; Nery Filho, Antônio
2002-01-01
Psychiatric information and practice are closely related to the field of criminal law, calling into question classical premises of penal law such as responsibility and free will. We have analyzed the articles related to mental health in Brazilian penal laws since the Código Criminal do Império do Brazil (Brazilian Empire criminal code) of 1830. Our objective is to describe the structuring of a legal status for the mentally ill in Brazil, as well as the model of penal intervention in the lives of those considered 'dangerous' and 'irresponsible'. To do so, we have analyzed not only specific articles of penal law but also texts by specialized analysts. In addition, we discuss the concepts that keep mentally ill criminals in a rather ambiguous situation, i.e. legal irresponsibility, potential aggressiveness, and safety policies.
Technology Transfer Automated Retrieval System (TEKTRAN)
In precision agriculture, regression has been used widely to quantify the relationship between soil attributes and other environmental variables. However, spatial correlation existing in soil samples usually makes the regression model suboptimal. In this study, a regression-kriging method was attemp...
[Understanding logistic regression].
El Sanharawi, M; Naudet, F
2013-10-01
Logistic regression is one of the most common multivariate analysis models used in epidemiology. It allows the measurement of the association between the occurrence of an event (the qualitative dependent variable) and factors susceptible to influence it (the explanatory variables). The choice of explanatory variables to include in the logistic regression model is based on prior knowledge of the disease's physiopathology and on the statistical association between the variable and the event, as measured by the odds ratio. The main steps of the procedure, the conditions of application, and the essential tools for its interpretation are discussed concisely. We also discuss the importance of the choice of variables that must be included and retained in the regression model in order to avoid the omission of important confounding factors. Finally, by way of illustration, we provide an example from the literature, which should help the reader test his or her knowledge.
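To make the odds-ratio machinery concrete, here is a self-contained Python sketch that fits a one-predictor logistic regression by Newton-Raphson and reads off the odds ratio as exp(slope). The exposure/event data are invented for illustration.

```python
import math

def logistic_fit(x, y, n_iter=25):
    """Fit P(y=1|x) = 1/(1+exp(-(b0 + b1*x))) by Newton-Raphson.

    Returns (b0, b1); exp(b1) is the odds ratio per unit of x,
    the measure of association discussed in the abstract.
    """
    b0 = b1 = 0.0
    for _ in range(n_iter):
        g0 = g1 = h00 = h01 = h11 = 0.0
        for xi, yi in zip(x, y):
            p = 1.0 / (1.0 + math.exp(-(b0 + b1 * xi)))
            w = p * (1.0 - p)
            g0 += yi - p                 # gradient of the log-likelihood
            g1 += (yi - p) * xi
            h00 += w                     # observed information matrix
            h01 += w * xi
            h11 += w * xi * xi
        det = h00 * h11 - h01 * h01
        b0 += (h11 * g0 - h01 * g1) / det
        b1 += (h00 * g1 - h01 * g0) / det
    return b0, b1

# Invented 2x2 data: exposed group has odds 30/20, unexposed 10/40.
x = [0] * 50 + [1] * 50
y = [0] * 40 + [1] * 10 + [0] * 20 + [1] * 30
b0, b1 = logistic_fit(x, y)
odds_ratio = math.exp(b1)
```

For a single binary exposure the fitted odds ratio reproduces the classical 2x2-table value, (30/20)/(10/40) = 6, which is a useful sanity check when learning the model.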
Practical Session: Logistic Regression
NASA Astrophysics Data System (ADS)
Clausel, M.; Grégoire, G.
2014-12-01
An exercise is proposed to illustrate logistic regression, investigating the different risk factors for the occurrence of coronary heart disease. It was proposed in Chapter 5 of the book by D.G. Kleinbaum and M. Klein, "Logistic Regression", Statistics for Biology and Health, Springer Science Business Media, LLC (2010), and also by D. Chessel and A.B. Dufour in Lyon 1 (see Sect. 6 of http://pbil.univ-lyon1.fr/R/pdf/tdr341.pdf). This example is based on data given in the file evans.txt, available from http://www.sph.emory.edu/dkleinb/logreg3.htm#data.
Efficient Drug-Pathway Association Analysis via Integrative Penalized Matrix Decomposition.
Li, Cong; Yang, Can; Hather, Greg; Liu, Ray; Zhao, Hongyu
2016-01-01
Traditional drug discovery practice usually follows the "one drug - one target" approach, seeking to identify drug molecules that act on individual targets, which ignores the systemic nature of human diseases. Pathway-based drug discovery recently emerged as an appealing approach to overcome this limitation. An important first step of such pathway-based drug discovery is to identify associations between drug molecules and biological pathways. This task has been made feasible by the accumulating data from high-throughput transcription and drug-sensitivity profiling. In this paper, we developed "iPaD", an integrative Penalized Matrix Decomposition method to identify drug-pathway associations by jointly modeling such high-throughput transcription and drug-sensitivity data. A scalable bi-convex optimization algorithm was implemented and gives iPaD a tremendous advantage in computational efficiency over the current state-of-the-art method, allowing it to handle the ever-growing large-scale data sets that the latter cannot. On two widely used real data sets, iPaD also significantly outperformed the current method in terms of the number of validated drug-pathway associations identified. The Matlab code of our algorithm is publicly available at http://licong-jason.github.io/iPaD/.
Ridge Regression: A Regression Procedure for Analyzing correlated Independent Variables
ERIC Educational Resources Information Center
Rakow, Ernest A.
1978-01-01
Ridge regression is a technique used to ameliorate the problem of highly correlated independent variables in multiple regression analysis. This paper explains the fundamentals of ridge regression and illustrates its use. (JKS)
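The stabilizing effect described above can be sketched in a few lines of numpy: with two nearly collinear predictors, ordinary least squares (lambda = 0) yields poorly determined individual coefficients, while a small ridge penalty pulls them toward the stable, nearly equal solution. The simulated data and penalty value are illustrative assumptions.

```python
import numpy as np

def ridge(X, y, lam):
    """Closed-form ridge estimate: (X'X + lam*I)^{-1} X'y."""
    n_features = X.shape[1]
    return np.linalg.solve(X.T @ X + lam * np.eye(n_features), X.T @ y)

rng = np.random.default_rng(0)
x1 = rng.normal(size=200)
x2 = x1 + rng.normal(scale=0.01, size=200)   # nearly collinear with x1
X = np.column_stack([x1, x2])
y = x1 + x2 + rng.normal(scale=0.1, size=200)

b_ols = ridge(X, y, 0.0)    # individual coefficients are ill-determined
b_ridge = ridge(X, y, 1.0)  # shrunken toward the stable solution (1, 1)
```

The sum of the two coefficients is well determined either way; what the penalty fixes is the wild trade-off between them, which is exactly the multicollinearity problem the paper addresses.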
Tang, Shaojie; Tang, Xiangyang
2012-09-15
Purposes: The suppression of noise in x-ray computed tomography (CT) imaging is of clinical relevance for diagnostic image quality and the potential for radiation dose saving. Toward this purpose, statistical noise reduction methods in either the image or projection domain have been proposed, which employ a multiscale decomposition to enhance the performance of noise suppression while maintaining image sharpness. Recognizing the advantages of noise suppression in the projection domain, the authors propose a projection domain multiscale penalized weighted least squares (PWLS) method, in which the angular sampling rate is explicitly taken into consideration to account for the possible variation of interview sampling rate in advanced clinical or preclinical applications. Methods: The projection domain multiscale PWLS method is derived by converting an isotropic diffusion partial differential equation in the image domain into the projection domain, wherein a multiscale decomposition is carried out. With adoption of the Markov random field or soft thresholding objective function, the projection domain multiscale PWLS method deals with noise at each scale. To compensate for the degradation in image sharpness caused by the projection domain multiscale PWLS method, an edge enhancement is carried out following the noise reduction. The performance of the proposed method is experimentally evaluated and verified using the projection data simulated by computer and acquired by a CT scanner. Results: The preliminary results show that the proposed projection domain multiscale PWLS method outperforms the projection domain single-scale PWLS method and the image domain multiscale anisotropic diffusion method in noise reduction. In addition, the proposed method can preserve image sharpness very well while the occurrence of 'salt-and-pepper' noise and mosaic artifacts can be avoided. Conclusions: Since the interview sampling rate is taken into account in the projection domain
Modern Regression Discontinuity Analysis
ERIC Educational Resources Information Center
Bloom, Howard S.
2012-01-01
This article provides a detailed discussion of the theory and practice of modern regression discontinuity (RD) analysis for estimating the effects of interventions or treatments. Part 1 briefly chronicles the history of RD analysis and summarizes its past applications. Part 2 explains how in theory an RD analysis can identify an average effect of…
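A sharp RD estimate can be sketched as the gap between two regression lines extrapolated to the cutoff from either side. The pure-Python example below uses invented noiseless data with a built-in jump of 5; a real RD analysis would add bandwidth selection and inference on top of this core idea.

```python
def linear_fit(xs, ys):
    """Ordinary least squares for y = a + b*x (single predictor)."""
    n = len(xs)
    mx = sum(xs) / n
    my = sum(ys) / n
    b = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / \
        sum((x - mx) ** 2 for x in xs)
    return my - b * mx, b

def rd_effect(x, y, cutoff):
    """Sharp RD estimate: difference of the two fitted lines
    evaluated at the cutoff."""
    left = [(xi, yi) for xi, yi in zip(x, y) if xi < cutoff]
    right = [(xi, yi) for xi, yi in zip(x, y) if xi >= cutoff]
    a_l, b_l = linear_fit(*zip(*left))
    a_r, b_r = linear_fit(*zip(*right))
    return (a_r + b_r * cutoff) - (a_l + b_l * cutoff)

# Invented data: outcome rises with the running variable and jumps
# by 5 at the treatment threshold x = 10.
x = [i / 2 for i in range(40)]
y = [2 * xi + (5 if xi >= 10 else 0) for xi in x]
effect = rd_effect(x, y, 10)
```

The discontinuity at the cutoff identifies the treatment effect because units just below and just above the threshold are assumed comparable in every other respect.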
Multiple linear regression analysis
NASA Technical Reports Server (NTRS)
Edwards, T. R.
1980-01-01
Program rapidly selects best-suited set of coefficients. User supplies only vectors of independent and dependent data and specifies confidence level required. Program uses stepwise statistical procedure for relating minimal set of variables to set of observations; final regression contains only most statistically significant coefficients. Program is written in FORTRAN IV for batch execution and has been implemented on NOVA 1200.
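The stepwise procedure described above can be sketched in Python/numpy as greedy forward selection. Note this is a simplified analogue: the original FORTRAN program uses F-tests at a user-specified confidence level, whereas this sketch uses a plain R-squared gain threshold as a stand-in.

```python
import numpy as np

def forward_stepwise(X, y, min_r2_gain=0.01):
    """Greedy forward selection: repeatedly add the predictor that
    most increases R^2, stopping when the gain drops below a threshold."""
    n, p = X.shape
    selected, best_r2 = [], 0.0
    tss = float(np.sum((y - y.mean()) ** 2))
    while True:
        gains = []
        for j in range(p):
            if j in selected:
                continue
            cols = np.column_stack(
                [np.ones(n)] + [X[:, k] for k in selected + [j]])
            beta = np.linalg.lstsq(cols, y, rcond=None)[0]
            resid = y - cols @ beta
            gains.append((1.0 - float(resid @ resid) / tss, j))
        if not gains:
            break
        r2, j = max(gains)
        if r2 - best_r2 < min_r2_gain:
            break
        selected.append(j)
        best_r2 = r2
    return selected, best_r2

# Simulated data: only predictors 0 and 3 truly drive the response.
rng = np.random.default_rng(1)
X = rng.normal(size=(100, 5))
y = 3 * X[:, 0] - 2 * X[:, 3] + rng.normal(scale=0.5, size=100)
selected, r2 = forward_stepwise(X, y)
```

On this simulated data the procedure keeps only the two truly informative predictors, mirroring the program's goal of a final regression containing only significant coefficients.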
Explorations in Statistics: Regression
ERIC Educational Resources Information Center
Curran-Everett, Douglas
2011-01-01
Learning about statistics is a lot like learning about science: the learning is more meaningful if you can actively explore. This seventh installment of "Explorations in Statistics" explores regression, a technique that estimates the nature of the relationship between two things for which we may only surmise a mechanistic or predictive…
Penalizing Hospitals for Chronic Obstructive Pulmonary Disease Readmissions
Au, David H.
2014-01-01
In October 2014, the U.S. Centers for Medicare and Medicaid Services (CMS) will expand its Hospital Readmission Reduction Program (HRRP) to include chronic obstructive pulmonary disease (COPD). Under the new policy, hospitals with high risk-adjusted, 30-day all-cause unplanned readmission rates after an index hospitalization for a COPD exacerbation will be penalized with reduced reimbursement for the treatment of Medicare beneficiaries. In this perspective, we review the history of the HRRP, including the recent addition of COPD to the policy. We critically assess the use of 30-day all-cause COPD readmissions as an accountability measure, discussing potential benefits and then highlighting the substantial drawbacks and potential unintended consequences of the measure that could adversely affect providers, hospitals, and patients with COPD. We conclude by emphasizing the need to place the 30-day COPD readmission measure in the context of a reconceived model for postdischarge quality and review several frameworks that could help guide this process. PMID:24460431
Regression Analysis: Legal Applications in Institutional Research
ERIC Educational Resources Information Center
Frizell, Julie A.; Shippen, Benjamin S., Jr.; Luna, Andrew L.
2008-01-01
This article reviews multiple regression analysis, describes how its results should be interpreted, and instructs institutional researchers on how to conduct such analyses using an example focused on faculty pay equity between men and women. The use of multiple regression analysis will be presented as a method with which to compare salaries of…
A label field fusion bayesian model and its penalized maximum rand estimator for image segmentation.
Mignotte, Max
2010-06-01
This paper presents a novel segmentation approach based on a Markov random field (MRF) fusion model which aims at combining several segmentation results associated with simpler clustering models in order to achieve a more reliable and accurate segmentation result. The proposed fusion model is derived from the recently introduced probabilistic Rand measure for comparing one segmentation result to one or more manual segmentations of the same image. This non-parametric measure allows us to easily derive an appealing fusion model of label fields, easily expressed as a Gibbs distribution, or as a nonstationary MRF model defined on a complete graph. Concretely, this Gibbs energy model encodes the set of binary constraints, in terms of pairs of pixel labels, provided by each segmentation result to be fused. Combined with a prior distribution, this energy-based Gibbs model also allows for the definition of a penalized maximum probabilistic Rand estimator, with which the fusion of simple, quickly estimated segmentation results appears as an interesting alternative to the complex segmentation models existing in the literature. This fusion framework has been successfully applied to the Berkeley image database. The experiments reported in this paper demonstrate that the proposed method is efficient in terms of visual evaluation and quantitative performance measures and performs well compared to the best existing state-of-the-art segmentation methods recently proposed in the literature.
Gang, Grace J.; Stayman, J. Webster; Zbijewski, Wojciech; Siewerdsen, Jeffrey H.
2014-08-15
Purpose: Nonstationarity is an important aspect of imaging performance in CT and cone-beam CT (CBCT), especially for systems employing iterative reconstruction. This work presents a theoretical framework for both filtered-backprojection (FBP) and penalized-likelihood (PL) reconstruction that includes explicit descriptions of nonstationary noise, spatial resolution, and task-based detectability index. Potential utility of the model was demonstrated in the optimal selection of regularization parameters in PL reconstruction. Methods: Analytical models for local modulation transfer function (MTF) and noise-power spectrum (NPS) were investigated for both FBP and PL reconstruction, including explicit dependence on the object and spatial location. For FBP, a cascaded systems analysis framework was adapted to account for nonstationarity by separately calculating fluence and system gains for each ray passing through any given voxel. For PL, the point-spread function and covariance were derived using the implicit function theorem and first-order Taylor expansion according to Fessler [“Mean and variance of implicitly defined biased estimators (such as penalized maximum likelihood): Applications to tomography,” IEEE Trans. Image Process. 5(3), 493–506 (1996)]. Detectability index was calculated for a variety of simple tasks. The model for PL was used in selecting the regularization strength parameter to optimize task-based performance, with both a constant and a spatially varying regularization map. Results: Theoretical models of FBP and PL were validated in 2D simulated fan-beam data and found to yield accurate predictions of local MTF and NPS as a function of the object and the spatial location. The NPS for both FBP and PL exhibit similar anisotropic nature depending on the pathlength (and therefore, the object and spatial location within the object) traversed by each ray, with the PL NPS experiencing greater smoothing along directions with higher noise. The MTF of FBP
Relative risk regression analysis of epidemiologic data.
Prentice, R L
1985-11-01
Relative risk regression methods are described. These methods provide a unified approach to a range of data analysis problems in environmental risk assessment and in the study of disease risk factors more generally. Relative risk regression methods are most readily viewed as an outgrowth of Cox's regression and life model. They can also be viewed as a regression generalization of more classical epidemiologic procedures, such as that due to Mantel and Haenszel. In the context of an epidemiologic cohort study, relative risk regression methods extend conventional survival data methods and binary response (e.g., logistic) regression models by taking explicit account of the time to disease occurrence while allowing arbitrary baseline disease rates, general censorship, and time-varying risk factors. This latter feature is particularly relevant to many environmental risk assessment problems wherein one wishes to relate disease rates at a particular point in time to aspects of a preceding risk factor history. Relative risk regression methods also adapt readily to time-matched case-control studies and to certain less standard designs. The uses of relative risk regression methods are illustrated and the state of development of these procedures is discussed. It is argued that asymptotic partial likelihood estimation techniques are now well developed in the important special case in which the disease rates of interest have interpretations as counting process intensity functions. Estimation of relative risk processes corresponding to disease rates falling outside this class has, however, received limited attention. The general area of relative risk regression model criticism has, as yet, not been thoroughly studied, though a number of statistical groups are studying such features as tests of fit, residuals, diagnostics, and graphical procedures. Most such studies have been restricted to exponential-form relative risks, as have simulation studies of relative risk estimation.
Liu, J. B.; Adeola, O.
2016-01-01
Forty-eight barrows with an average initial body weight of 25.5±0.3 kg were assigned to 6 dietary treatments arranged in a 3×2 factorial of 3 graded levels of P at 1.42, 2.07, or 2.72 g/kg and 2 levels of casein at 0 or 50 g/kg to compare the estimates of true total tract digestibility (TTTD) of P in soybean meal (SBM) for pigs fed diets with or without casein supplementation. SBM was the only source of P in the diets without casein; in the diets with added casein, 1.0 to 2.4 g/kg of total dietary P was supplied by SBM as the dietary level of SBM increased. The experiment consisted of a 5-d adjustment period and a 5-d total collection period, with ferric oxide as a marker to indicate the initiation and termination of fecal collection. There were interactive effects of casein supplementation and total dietary P level on the apparent total tract digestibility (ATTD) and retention of P (p<0.05). Dietary P intake, fecal P output, digested P, and retained P increased linearly with graded increasing levels of SBM in the diets regardless of casein addition (p<0.01). Compared with diets without casein, there was a reduction in fecal P in the casein-supplemented diets, which led to increases in digested P, retained P, ATTD, and retention of P (p<0.01). Digested N, ATTD of N, retained N, and N retention were affected by the interaction of casein supplementation and dietary P level (p<0.05). Fecal N output, urinary N output, digested N, and retained N increased linearly with graded increasing levels of SBM for each type of diet (p<0.01). The estimates of TTTD of P in SBM, derived from the regression of daily digested P against daily P intake, for pigs fed diets without casein and with casein were calculated to be 37.3% and 38.6%, respectively. Regressing daily digested N against daily N intake, the TTTD of N in SBM was determined to be 94.3% and 94.4% for diets without casein and with added casein, respectively. There was no difference in determined values of TTTD of P or N in
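The regression method used here for TTTD can be illustrated with a short pure-Python sketch: regressing daily digested P on daily P intake, the slope estimates TTTD and the negative intercept estimates endogenous losses. The intake/digested values below are invented for illustration, chosen only so the slope lands near the reported ~37-39% range; they are not the study's data.

```python
def regression_tttd(intake, digested):
    """Regress daily digested nutrient (g/d) on daily intake (g/d).

    Slope*100 estimates true total tract digestibility (TTTD, %);
    the intercept, typically negative, reflects endogenous losses.
    """
    n = len(intake)
    mx = sum(intake) / n
    my = sum(digested) / n
    slope = sum((x - mx) * (y - my) for x, y in zip(intake, digested)) / \
            sum((x - mx) ** 2 for x in intake)
    intercept = my - slope * mx
    return slope * 100.0, intercept

# Hypothetical g/day values from graded SBM inclusion levels.
intake = [1.0, 1.5, 2.0, 2.4]
digested = [0.17, 0.36, 0.55, 0.70]
tttd_pct, intercept = regression_tttd(intake, digested)
```

Because the slope is taken across graded inclusion levels, it separates the ingredient's true digestibility from the constant endogenous loss, which an apparent-digestibility calculation at a single level cannot do.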
Schwantes-An, Tae-Hwi; Sung, Heejong; Sabourin, Jeremy A; Justice, Cristina M; Sorant, Alexa J M; Wilson, Alexander F
2016-01-01
In this study, the effects of (a) the minor allele frequency of the single nucleotide variant (SNV), (b) the degree of departure from normality of the trait, and (c) the position of the SNVs on type I error rates were investigated in the Genetic Analysis Workshop (GAW) 19 whole exome sequence data. To test the distribution of the type I error rate, 5 simulated traits were considered: standard normal and gamma distributed traits; 2 transformed versions of the gamma trait (log10 and rank-based inverse normal transformations); and trait Q1 provided by GAW 19. Each trait was tested with 313,340 SNVs. Tests of association were performed with simple linear regression and average type I error rates were determined for minor allele frequency classes. Rare SNVs (minor allele frequency < 0.05) showed inflated type I error rates for non-normally distributed traits that increased as the minor allele frequency decreased. The inflation of average type I error rates increased as the significance threshold decreased. Normally distributed traits did not show inflated type I error rates with respect to the minor allele frequency for rare SNVs. There was no consistent effect of transformation on the uniformity of the distribution of the location of SNVs with a type I error.
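The type I error assessment can be mimicked with a small Monte Carlo sketch in Python/numpy: simulate a rare variant independent of the trait (so the null holds by construction), test with simple linear regression, and count rejections. The sample size, number of tests, MAF, and gamma shape are illustrative assumptions, and the p-value cutoff uses a normal approximation to the t distribution.

```python
import numpy as np

def type1_rate(trait_sampler, maf, n=500, n_tests=2000, seed=2):
    """Empirical type I error of simple linear regression when the
    trait is simulated independently of the variant genotype."""
    rng = np.random.default_rng(seed)
    hits = 0
    for _ in range(n_tests):
        g = rng.binomial(2, maf, size=n).astype(float)
        if g.std() == 0:              # monomorphic draw: no test possible
            continue
        y = trait_sampler(rng, n)
        r = np.corrcoef(g, y)[0, 1]
        t = r * np.sqrt((n - 2) / (1 - r * r))
        hits += abs(t) > 1.96         # ~ alpha = 0.05, normal approximation
    return hits / n_tests

# Normally distributed trait vs. a heavily skewed (gamma) trait,
# both tested against a rare variant (MAF = 0.01).
rate_normal = type1_rate(lambda rng, n: rng.normal(size=n), maf=0.01)
rate_gamma = type1_rate(lambda rng, n: rng.gamma(0.5, size=n), maf=0.01)
```

Comparing the two empirical rates against the nominal 0.05 level reproduces the study's question in miniature; the inflation for skewed traits becomes more pronounced at stricter significance thresholds and rarer variants.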
Penalized differential pathway analysis of integrative oncogenomics studies.
van Wieringen, Wessel N; van de Wiel, Mark A
2014-04-01
Through integration of genomic data from multiple sources, we may obtain a more accurate and complete picture of the molecular mechanisms underlying tumorigenesis. We discuss the integration of DNA copy number and mRNA gene expression data from an observational integrative genomics study involving cancer patients; the two molecular levels involved are linked through the central dogma of molecular biology. DNA copy number aberrations abound in the cancer cell. Here we investigate how these aberrations affect gene expression levels within a pathway using observational integrative genomics data of cancer patients. In particular, we aim to identify differential edges between the regulatory networks of two groups involving these molecular levels. Motivated by the rate equations, the regulatory mechanism between DNA copy number aberrations and gene expression levels within a pathway is modeled by a simultaneous-equations model, for the one- and two-group cases. The latter facilitates the identification of differential interactions between the two groups. Model parameters are estimated by penalized least squares using the lasso (L1) penalty to obtain a sparse pathway topology. Simulations show that the inclusion of DNA copy number data benefits the discovery of gene-gene interactions. In addition, the simulations reveal that cis-effects tend to be over-estimated in a univariate (single-gene) analysis. In the application to real data from integrative oncogenomic studies we show that inclusion of prior information on the regulatory network architecture benefits the reproducibility of all edges. Furthermore, analyses of the TP53 and TGFb signaling pathways between ER+ and ER- samples from an integrative genomics breast cancer study identify reproducible differential regulatory patterns that corroborate existing literature.
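The lasso estimation step that yields the sparse topology can be sketched with cyclic coordinate descent and soft-thresholding in numpy. The design matrix here is generic simulated data, not the paper's simultaneous-equations model; the penalty value and dimensions are illustrative assumptions.

```python
import numpy as np

def lasso_cd(X, y, lam, n_iter=100):
    """Lasso via cyclic coordinate descent with soft-thresholding.
    Minimizes 0.5*||y - Xb||^2 + lam*||b||_1; small coefficients are
    driven exactly to zero, giving a sparse model."""
    n, p = X.shape
    b = np.zeros(p)
    col_ss = (X ** 2).sum(axis=0)
    for _ in range(n_iter):
        for j in range(p):
            r = y - X @ b + X[:, j] * b[j]   # partial residual excluding j
            rho = X[:, j] @ r
            b[j] = np.sign(rho) * max(abs(rho) - lam, 0.0) / col_ss[j]
    return b

# Simulated design: only features 0 and 4 carry signal; the lasso
# zeroes out the other six exactly.
rng = np.random.default_rng(3)
X = rng.normal(size=(100, 8))
y = 2 * X[:, 0] - 1.5 * X[:, 4] + rng.normal(scale=0.3, size=100)
b = lasso_cd(X, y, lam=20.0)
```

The exact zeros are what make the L1 penalty attractive for network inference: absent edges drop out of the estimated pathway topology rather than lingering as small nonzero weights.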
[Extramural research funds and penal law--status of legislation].
Ulsenheimer, Klaus
2005-04-01
After decades of smooth functioning, the cooperation of physicians and hospitals with industry (much desired by the government in the interest of clinical research) has fallen into legal discredit due to increasingly frequent criminal inquiries and proceedings for undue privileges, corruption, and embezzlement. The discredit is so severe that industry funding for clinical research is being diverted abroad to an increasing extent. The legal elements of embezzlement presuppose the intentional use of entrusted funds against the interest of the client. Undue privileges occur when an official requests an advantage in exchange for a service (or is promised one or accepts one) in his own or somebody else's interest. The elements of corruption are given when the receiver of the undue privilege provides an illegal service or takes a discretionary decision under the influence of the gratuity. The tension between the prohibition of undue privileges (as regulated by penal law) and the granting of extramural funds (as regulated by administrative law in academic institutions) can be reduced through a high degree of transparency and the establishment of control mechanisms (public announcement and authorization by the officials) as well as through exact documentation and observance of the principles of separation of interests and moderation. Under the anti-corruption law of 1997, physicians employed in private institutions can also be charged with corruption. In contrast, physicians in private practice are not covered by the above criminal provisions. They can only be charged with a misdemeanor, or called to account by the professional board, on the basis of the law that regulates advertising for medicinal products (Heilmittelwerbegesetz).
Regression Verification Using Impact Summaries
NASA Technical Reports Server (NTRS)
Backes, John; Person, Suzette J.; Rungta, Neha; Thachuk, Oksana
2013-01-01
versions [19]. These techniques compare two programs with a large degree of syntactic similarity to prove that portions of one program version are equivalent to the other. Regression verification can be used for guaranteeing backward compatibility, and for showing behavioral equivalence in programs with syntactic differences, e.g., when a program is refactored to improve its performance, maintainability, or readability. Existing regression verification techniques leverage similarities between program versions by using abstraction and decomposition techniques to improve scalability of the analysis [10, 12, 19]. The abstractions and decomposition in these techniques, e.g., summaries of unchanged code [12] or semantically equivalent methods [19], compute an over-approximation of the program behaviors. The equivalence checking results of these techniques are sound but not complete: they may characterize programs as not functionally equivalent when, in fact, they are equivalent. In this work we describe a novel approach that leverages the impact of the differences between two programs for scaling regression verification. We partition program behaviors of each version into (a) behaviors impacted by the changes and (b) behaviors not impacted (unimpacted) by the changes. Only the impacted program behaviors are used during equivalence checking. We then prove that checking equivalence of the impacted program behaviors is equivalent to checking equivalence of all program behaviors for a given depth bound. In this work we use symbolic execution to generate the program behaviors and leverage control- and data-dependence information to facilitate the partitioning of program behaviors. The impacted program behaviors are termed as impact summaries. The dependence analyses that facilitate the generation of the impact summaries, we believe, could be used in conjunction with other abstraction and decomposition based approaches, [10, 12], as a complementary reduction technique. An
Using Time-Series Regression to Predict Academic Library Circulations.
ERIC Educational Resources Information Center
Brooks, Terrence A.
1984-01-01
Four methods were used to forecast monthly circulation totals in 15 midwestern academic libraries: dummy time-series regression, lagged time-series regression, simple average (straight-line forecasting), and monthly average (naive forecasting). In tests of forecasting accuracy, the dummy regression method and the monthly mean method exhibited the smallest average…
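Dummy time-series regression, the first of the four methods, fits an intercept, a linear trend, and eleven monthly indicator variables. A minimal sketch on synthetic monthly circulation counts (all numbers invented, not the ERIC study's data):

```python
import numpy as np

rng = np.random.default_rng(2)
months = np.arange(60)  # five years of monthly observations
circulation = (100 + 30 * np.sin(2 * np.pi * (months % 12) / 12)
               + 0.5 * months + rng.normal(0.0, 5.0, 60))  # season + trend + noise

# design matrix: intercept, linear trend, 11 monthly dummies (month 0 = baseline)
D = np.zeros((60, 13))
D[:, 0] = 1.0
D[:, 1] = months
for i, m in enumerate(months % 12):
    if m > 0:
        D[i, 1 + m] = 1.0
coef, *_ = np.linalg.lstsq(D, circulation, rcond=None)

def forecast(t):
    """Predict circulation for (possibly future) month index t."""
    x = np.zeros(13)
    x[0], x[1] = 1.0, float(t)
    if t % 12 > 0:
        x[1 + t % 12] = 1.0
    return float(x @ coef)

pred = forecast(61)  # one month past the observed series
```

The dummies absorb the seasonal pattern while the trend column carries growth, which is why this method competes well with the naive monthly-mean forecast.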
Introduction to the use of regression models in epidemiology.
Bender, Ralf
2009-01-01
Regression modeling is one of the most important statistical techniques used in analytical epidemiology. By means of regression models the effect of one or several explanatory variables (e.g., exposures, subject characteristics, risk factors) on a response variable such as mortality or cancer can be investigated. From multiple regression models, adjusted effect estimates can be obtained that take the effect of potential confounders into account. Regression methods can be applied in all epidemiologic study designs so that they represent a universal tool for data analysis in epidemiology. Different kinds of regression models have been developed in dependence on the measurement scale of the response variable and the study design. The most important methods are linear regression for continuous outcomes, logistic regression for binary outcomes, Cox regression for time-to-event data, and Poisson regression for frequencies and rates. This chapter provides a nontechnical introduction to these regression models with illustrating examples from cancer research.
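The difference between a crude and a confounder-adjusted effect estimate, the key point of multiple regression in epidemiology, can be seen in a few lines of simulation. The data and coefficients below are synthetic and purely illustrative.

```python
import numpy as np

rng = np.random.default_rng(3)
n = 2000
confounder = rng.standard_normal(n)                    # e.g. age (standardized)
exposure = 0.8 * confounder + rng.standard_normal(n)   # exposure depends on age
# true exposure effect is 1.0; the confounder independently adds 2.0
outcome = 1.0 * exposure + 2.0 * confounder + rng.normal(0.0, 1.0, n)

# crude model: outcome ~ exposure (ignores the confounder)
Xc = np.column_stack([np.ones(n), exposure])
crude = float(np.linalg.lstsq(Xc, outcome, rcond=None)[0][1])

# adjusted model: outcome ~ exposure + confounder
Xa = np.column_stack([np.ones(n), exposure, confounder])
adjusted = float(np.linalg.lstsq(Xa, outcome, rcond=None)[0][1])
```

The crude coefficient is badly inflated because exposure and outcome share the confounder; adding the confounder as a covariate recovers the true effect, which is exactly the "adjusted effect estimate" the abstract describes.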
Bayesian penalized log-likelihood ratio approach for dose response clinical trial studies.
Tang, Yuanyuan; Cai, Chunyan; Sun, Liangrui; He, Jianghua
2017-02-13
In the literature, there are a few unified approaches to testing proof of concept and estimating a target dose, including the multiple comparison procedure using modeling approach and the permutation approach proposed by Klingenberg. We discuss and compare the operating characteristics of these unified approaches and further develop an alternative approach in a Bayesian framework based on the posterior distribution of a penalized log-likelihood ratio test statistic. Our Bayesian approach is much more flexible in handling linear or nonlinear dose-response relationships and is more efficient than the permutation approach. The operating characteristics of our Bayesian approach are comparable to, and sometimes better than, both approaches across a wide range of dose-response relationships. It yields credible intervals as well as a predictive distribution for the response rate at a specific dose level for the target dose estimation. Our Bayesian approach can be easily extended to continuous, categorical, and time-to-event responses. We illustrate the performance of our proposed method with extensive simulations and Phase II clinical trial data examples.
NASA Astrophysics Data System (ADS)
Yang, Li; Ferrero, Andrea; Hagge, Rosalie J.; Badawi, Ramsey D.; Qi, Jinyi
2014-03-01
Detecting cancerous lesions is a major clinical application in emission tomography. In previous work, we have studied penalized maximum-likelihood (PML) image reconstruction for the detection task, where we used a multiview channelized Hotelling observer (mvCHO) to assess the lesion detectability in 3D images. It mimics the condition where a human observer examines three orthogonal views of a 3D image for lesion detection. We proposed a method to design a shift-variant quadratic penalty function to improve the detectability of lesions at unknown locations, and validated it using computer simulations. In this study we evaluated the benefit of the proposed penalty function for lesion detection using real data. A high-count real patient data set with no identifiable tumor inside the field of view was used as the background data. A Na-22 point source was scanned in air at variable locations and the point source data were superimposed onto the patient data as artificial lesions after being attenuated by the patient body. Independent Poisson noise was added to the high-count sinograms to generate 200 pairs of lesion-present and lesion-absent data sets, each mimicking a 5-minute scan. Lesion detectability was assessed using a multiview CHO and a human observer two alternative forced choice (2AFC) experiment. The results showed that the optimized penalty can improve lesion detection over the conventional quadratic penalty function.
Regression modeling of ground-water flow
Cooley, R.L.; Naff, R.L.
1985-01-01
Nonlinear multiple regression methods are developed to model and analyze groundwater flow systems. Complete descriptions of regression methodology as applied to groundwater flow models allow scientists and engineers engaged in flow modeling to apply the methods to a wide range of problems. Organization of the text proceeds from an introduction that discusses the general topic of groundwater flow modeling, to a review of basic statistics necessary to properly apply regression techniques, and then to the main topic: exposition and use of linear and nonlinear regression to model groundwater flow. Statistical procedures are given to analyze and use the regression models. A number of exercises and answers are included to exercise the student on nearly all the methods that are presented for modeling and statistical analysis. Three computer programs implement the more complex methods. These three are a general two-dimensional, steady-state regression model for flow in an anisotropic, heterogeneous porous medium, a program to calculate a measure of model nonlinearity with respect to the regression parameters, and a program to analyze model errors in computed dependent variables such as hydraulic head. (USGS)
Mei, Suyu; Zhang, Kun
2016-01-01
Protein-protein interaction (PPI) networks are naturally viewed as infrastructure to infer signalling pathways. The descriptors of signal events between two interacting proteins such as upstream/downstream signal flow, activation/inhibition relationship and protein modification are indispensable for inferring signalling pathways from PPI networks. However, such descriptors are not available in most cases as most PPI networks are seldom semantically annotated. In this work, we extend ℓ2-regularized logistic regression to the scenario of multi-label learning for predicting the activation/inhibition relationships in human PPI networks. The phenomenon that both activation and inhibition relationships exist between two interacting proteins is computationally modelled by multi-label learning framework. The problem of GO (gene ontology) sparsity is tackled by introducing the homolog knowledge as independent homolog instances. ℓ2-regularized logistic regression is accordingly adopted here to penalize the homolog noise and to reduce the computational complexity of the double-sized training data. Computational results show that the proposed method achieves satisfactory multi-label learning performance and outperforms the existing phenotype correlation method on the experimental data of Drosophila melanogaster. Several predictions have been validated against recent literature. The predicted activation/inhibition relationships in human PPI networks are provided in the supplementary file for further biomedical research. PMID:27819359
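The core idea, one independent ℓ2-regularized logistic classifier per relationship type, can be sketched as follows. This is a simplified stand-in: synthetic Gaussian features replace the GO/homolog features, gradient descent replaces whatever solver the authors used, and all dimensions and labels are invented.

```python
import numpy as np

def fit_l2_logreg(X, y, lam=1.0, n_iter=300, lr=0.5):
    """l2-regularized logistic regression by gradient descent (bias unpenalized)."""
    n, p = X.shape
    w, b = np.zeros(p), 0.0
    for _ in range(n_iter):
        prob = 1.0 / (1.0 + np.exp(-(X @ w + b)))
        w -= lr * (X.T @ (prob - y) / n + lam * w / n)  # loss + ridge gradient
        b -= lr * float((prob - y).mean())
    return w, b

rng = np.random.default_rng(4)
n, p = 400, 10
X = rng.standard_normal((n, p))            # stand-in for GO/homolog features
w_true = np.array([2.0, -2.0] + [0.0] * 8)
y_act = (X @ w_true + rng.logistic(size=n) > 0).astype(float)     # "activation"
y_inh = (-(X @ w_true) + rng.logistic(size=n) > 0).astype(float)  # "inhibition"

# multi-label learning as one independent binary classifier per relation type;
# a protein pair may carry both labels at once
models = {name: fit_l2_logreg(X, y)
          for name, y in (("activation", y_act), ("inhibition", y_inh))}
w_act, b_act = models["activation"]
p_act = 1.0 / (1.0 + np.exp(-(X @ w_act + b_act)))
acc_act = float(((p_act > 0.5) == (y_act > 0.5)).mean())
```

Because the two labels are fitted separately, nothing prevents both from being predicted for the same pair, which is how the framework models coexisting activation and inhibition.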
Dikaios, Nikolaos; Atkinson, David; Tudisca, Chiara; Purpura, Pierpaolo; Forster, Martin; Ahmed, Hashim; Beale, Timothy; Emberton, Mark; Punwani, Shonit
2017-03-01
The aim of this work is to compare Bayesian inference for nonlinear models with commonly used traditional nonlinear regression (NR) algorithms for estimating tracer kinetics in Dynamic Contrast Enhanced Magnetic Resonance Imaging (DCE-MRI). The algorithms are compared in terms of accuracy and reproducibility under different initialization settings. Further, it is investigated how a more robust estimation of tracer kinetics affects cancer diagnosis. The derived tracer kinetics from the Bayesian algorithm were validated against traditional NR algorithms (i.e. Levenberg-Marquardt, simplex) in terms of accuracy on a digital DCE phantom and in terms of goodness-of-fit (Kolmogorov-Smirnov test) on ROI-based concentration time courses from two different patient cohorts. The first cohort consisted of 76 men, 20 of whom had significant peripheral zone prostate cancer (any cancer-core-length (CCL) with Gleason>3+3 or any-grade with CCL>=4mm) following transperineal template prostate mapping biopsy. The second cohort consisted of 9 healthy volunteers and 24 patients with head and neck squamous cell carcinoma. The diagnostic ability of the derived tracer kinetics was assessed with receiver operating characteristic area under curve (ROC AUC) analysis. The Bayesian algorithm accurately recovered the ground-truth tracer kinetics for the digital DCE phantom, consistently improving the Structural Similarity Index (SSIM) across the 50 different initializations compared to NR. For optimized initialization, the Bayesian algorithm did not significantly improve the fitting accuracy on either patient cohort, and it only significantly improved the ve ROC AUC on the head and neck population, from ROC AUC=0.56 for the simplex to ROC AUC=0.76. For both cohorts, the values and the diagnostic ability of tracer kinetic parameters estimated with the Bayesian algorithm were not affected by their initialization. To conclude, the Bayesian algorithm led to a more accurate and reproducible quantification of tracer kinetics.
NASA Astrophysics Data System (ADS)
Thulin, Susanne; Hill, Michael J.; Held, Alex; Jones, Simon; Woodgate, Peter
2012-10-01
Development of predictive relationships between hyperspectral reflectance and the chemical constituents of grassland vegetation could support routine remote sensing assessment of feed quality in standing pastures. In this study, partial least squares regression (PLSR) and spectral transforms are used to derive predictive models for estimation of crude protein and digestibility (quality), and lignin and cellulose (non-digestible fractions) from field-based spectral libraries and chemical assays acquired from diverse pasture sites in Victoria, Australia between 2000 and 2002. The best predictive models for feed quality were obtained with continuum removal with spectral bands normalised to the depth of absorption features for digestibility (adjusted R2 = 0.82, root mean square error of prediction (RMSEP) = 3.94), and continuum removal with spectral bands normalised to the area of the absorption features for crude protein (adjusted R2 = 0.62, RMSEP = 3.18) and cellulose (adjusted R2 = 0.73, RMSEP = 2.37). The results for lignin were poorer with the best performing model based on the first derivative of log transformed reflectance (adjusted R2 = 0.44, RMSEP = 1.87). The best models were dominated by first derivative transforms, and by limiting the models to significant variables with "Jack-knifing". X-loading results identified wavelengths near or outside major absorption features as important predictors. This study showed that digestibility, as a broad measure of feed quality, could be effectively predicted from PLSR derived models of spectral reflectance derived from field spectroscopy. The models for cellulose and crude protein showed potential for qualitative assessment; however the results for lignin were poor. Implementation of spectral prediction models such as these, with hyperspectral sensors having a high signal to noise ratio, could deliver feed quality information to complement spatial biomass and growth data, and improve feed management for extensive
From moral theory to penal attitudes and back: a theoretically integrated modeling approach.
de Keijser, Jan W; van der Leeden, Rien; Jackson, Janet L
2002-01-01
From a moral standpoint, we would expect the practice of punishment to reflect a solid and commonly shared legitimizing framework. Several moral legal theories explicitly aim to provide such frameworks. Based on the theories of Retributivism, Utilitarianism, and Restorative Justice, this article first sets out to develop a theoretically integrated model of penal attitudes and then explores the extent to which Dutch judges' attitudes to punishment fit the model. Results indicate that penal attitudes can be measured in a meaningful way that is consistent with an integrated approach to moral theory. The general structure of penal attitudes among Dutch judges suggests a streamlined and pragmatic approach to legal punishment that is identifiably founded on the separate concepts central to moral theories of punishment. While Restorative Justice is frequently presented as an alternative paradigm, results show it to be smoothly incorporated within the streamlined approach.
Discriminative Elastic-Net Regularized Linear Regression.
Zhang, Zheng; Lai, Zhihui; Xu, Yong; Shao, Ling; Wu, Jian; Xie, Guo-Sen
2017-03-01
In this paper, we aim at learning compact and discriminative linear regression models. Linear regression has been widely used in different problems. However, most of the existing linear regression methods exploit the conventional zero-one matrix as the regression targets, which greatly narrows the flexibility of the regression model. Another major limitation of these methods is that the learned projection matrix fails to precisely project the image features to the target space due to their weak discriminative capability. To this end, we present an elastic-net regularized linear regression (ENLR) framework, and develop two robust linear regression models which possess the following special characteristics. First, our methods exploit two particular strategies to enlarge the margins of different classes by relaxing the strict binary targets into a more feasible variable matrix. Second, a robust elastic-net regularization of singular values is introduced to enhance the compactness and effectiveness of the learned projection matrix. Third, the resulting optimization problem of ENLR has a closed-form solution in each iteration, which can be solved efficiently. Finally, rather than directly exploiting the projection matrix for recognition, our methods employ the transformed features as the new discriminative representations to make final image classification. Compared with the traditional linear regression model and some of its variants, our method is much more accurate in image classification. Extensive experiments conducted on publicly available data sets well demonstrate that the proposed framework can outperform the state-of-the-art methods. The MATLAB codes of our methods are available at http://www.yongxu.org/lunwen.html.
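The grouping behavior that distinguishes an elastic-net penalty from a plain lasso can be demonstrated with a naive coordinate-descent solver. This is a generic sketch, not the paper's ENLR algorithm; the data and penalty levels are synthetic.

```python
import numpy as np

def enet_cd(X, y, lam1, lam2, n_iter=300):
    """Naive elastic net via coordinate descent: lam1*||b||_1 + (lam2/2)*||b||_2^2."""
    n, p = X.shape
    beta = np.zeros(p)
    col_sq = (X ** 2).sum(axis=0)
    for _ in range(n_iter):
        for j in range(p):
            r = y - X @ beta + X[:, j] * beta[j]  # partial residual excluding j
            rho = X[:, j] @ r
            beta[j] = np.sign(rho) * max(abs(rho) - lam1, 0.0) / (col_sq[j] + lam2)
    return beta

rng = np.random.default_rng(5)
n = 100
z = rng.standard_normal(n)
# three nearly identical predictors plus seven pure-noise predictors
X = np.column_stack([z + 0.05 * rng.standard_normal(n) for _ in range(3)]
                    + [rng.standard_normal(n) for _ in range(7)])
y = X[:, 0] + X[:, 1] + X[:, 2] + 0.1 * rng.standard_normal(n)

b_lasso = enet_cd(X, y, lam1=5.0, lam2=0.0)   # pure lasso
b_enet = enet_cd(X, y, lam1=5.0, lam2=20.0)   # elastic net
n_lasso = int((np.abs(b_lasso) > 1e-3).sum())
n_enet = int((np.abs(b_enet) > 1e-3).sum())
```

The ridge term in the denominator spreads weight across the whole correlated group instead of letting one member absorb it, which is the compactness-with-stability trade-off elastic-net regularization is chosen for.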
Southard, Rodney E.
2013-01-01
The weather and precipitation patterns in Missouri vary considerably from year to year. In 2008, the statewide average rainfall was 57.34 inches and in 2012, the statewide average rainfall was 30.64 inches. This variability in precipitation and resulting streamflow in Missouri underlies the necessity for water managers and users to have reliable streamflow statistics and a means to compute select statistics at ungaged locations for a better understanding of water availability. Knowledge of surface-water availability is dependent on the streamflow data that have been collected and analyzed by the U.S. Geological Survey for more than 100 years at approximately 350 streamgages throughout Missouri. The U.S. Geological Survey, in cooperation with the Missouri Department of Natural Resources, computed streamflow statistics at streamgages through the 2010 water year, defined periods of drought and defined methods to estimate streamflow statistics at ungaged locations, and developed regional regression equations to compute selected streamflow statistics at ungaged locations. Streamflow statistics and flow durations were computed for 532 streamgages in Missouri and in neighboring States. For streamgages with more than 10 years of record, Kendall's tau was computed to evaluate for trends in streamflow data. If trends were detected, the variable length method was used to define the period of no trend. Water years were removed from the dataset from the beginning of the record for a streamgage until no trend was detected. Low-flow frequency statistics were then computed for the entire period of record and for the period of no trend if 10 or more years of record were available for each analysis. Three methods are presented for computing selected streamflow statistics at ungaged locations. The first method uses power curve equations developed for 28 selected streams in Missouri and neighboring States that have multiple streamgages on the same streams. Statistical
2014-01-01
Background: Genomic prediction is now widely recognized as an efficient, cost-effective and theoretically well-founded method for estimating breeding values using molecular markers spread over the whole genome. The prediction problem entails estimating the effects of all genes or chromosomal segments simultaneously and aggregating them to yield the predicted total genomic breeding value. Many potential methods for genomic prediction exist but have widely different relative computational costs, complexity and ease of implementation, with significant repercussions for predictive accuracy. We empirically evaluate the predictive performance of several contending regularization methods, designed to accommodate grouping of markers, using three synthetic traits of known accuracy. Methods: Each of the competitor methods was used to estimate predictive accuracy for each of the three quantitative traits. The traits and an associated genome comprising five chromosomes with 10000 biallelic Single Nucleotide Polymorphic (SNP)-marker loci were simulated for the QTL-MAS 2012 workshop. The models were trained on 3000 phenotyped and genotyped individuals and used to predict genomic breeding values for 1020 unphenotyped individuals. Accuracy was expressed as the Pearson correlation between the simulated true and the estimated breeding values. Results: All the methods produced accurate estimates of genomic breeding values. Grouping of markers did not clearly improve accuracy, contrary to expectation. Selecting the penalty parameter with replicated 10-fold cross validation often gave better accuracy than using information theoretic criteria. Conclusions: All the regularization methods considered produced satisfactory predictive accuracies for most practical purposes and thus deserve serious consideration in genomic prediction research and practice. Grouping markers did not enhance predictive accuracy for the synthetic data set considered. But other more sophisticated grouping schemes could
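Selecting the penalty parameter by 10-fold cross-validation, as the abstract recommends, looks like this for ridge regression. The sketch uses generic synthetic data, not the QTL-MAS 2012 set, and a small invented penalty grid.

```python
import numpy as np

rng = np.random.default_rng(6)
n, p = 200, 50
X = rng.standard_normal((n, p))       # stand-in for marker genotypes
w_true = np.zeros(p)
w_true[:5] = 1.0                      # five markers with real effects
y = X @ w_true + rng.normal(0.0, 1.0, n)

def ridge_fit(Xt, yt, lam):
    """Closed-form ridge solution (X'X + lam*I)^-1 X'y."""
    return np.linalg.solve(Xt.T @ Xt + lam * np.eye(Xt.shape[1]), Xt.T @ yt)

def cv_mse(lam, k=10):
    """10-fold cross-validated prediction MSE for one penalty value."""
    idx = np.random.default_rng(0).permutation(n)  # fixed fold assignment
    errs = []
    for fold in np.array_split(idx, k):
        mask = np.ones(n, dtype=bool)
        mask[fold] = False
        b = ridge_fit(X[mask], y[mask], lam)
        errs.append(float(((y[fold] - X[fold] @ b) ** 2).mean()))
    return float(np.mean(errs))

lams = [0.01, 0.1, 1.0, 10.0, 100.0]
best_lam = min(lams, key=cv_mse)  # penalty with the lowest held-out MSE
```

Held-out MSE penalizes both under- and over-shrinkage directly on prediction error, which is why it often beats information criteria when prediction is the goal.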
[New penal law from the point of forensic psychiatry and psychology].
Gierowski, J K; Szymusik, A
1998-01-01
The article discusses selected psychiatric and psychological problems in the new Penal Code in Poland. The authors focussed on the issues like definition of inaccountability and highly limited accountability, the principles of determining and realizing preventive measures, problems regarding competence of experts issuing opinions on mental health state of a perpetrator as well as the necessity of detention. The presented attempt at evaluation of the new penal regulations is a selection performed in a subjective way in accord with the authors' preferences, convictions and views.
[Art. 192 Polish Penal Code--lawyers comments and medical practice].
Swiatek, Barbara
2005-01-01
Art. 192 was enacted in the Polish Penal Code in 1997. It penalizes the performance of a "medical intervention" without the patient's consent. The binding norm is not generally welcomed in medical circles, but lawyers seem very interested in it. To physicians the regulation seems clear in essence, but legal commentators interpret it differently. The inconsistent interpretations concern the object of legal protection, the scope of the disposition of the legal norm, and other determinant factors. The literature does not permit an unambiguous and effective interpretation of this regulation. An urgent amendment of this law is necessary.
Penalized maximum likelihood reconstruction for x-ray differential phase-contrast tomography
Brendel, Bernhard; Teuffenbach, Maximilian von; Noël, Peter B.; Pfeiffer, Franz; Koehler, Thomas
2016-01-15
Purpose: The purpose of this work is to propose a cost function with regularization to iteratively reconstruct attenuation, phase, and scatter images simultaneously from differential phase contrast (DPC) acquisitions, without the need of phase retrieval, and examine its properties. Furthermore this reconstruction method is applied to an acquisition pattern that is suitable for a DPC tomographic system with continuously rotating gantry (sliding window acquisition), overcoming the severe smearing in noniterative reconstruction. Methods: We derive a penalized maximum likelihood reconstruction algorithm to directly reconstruct attenuation, phase, and scatter image from the measured detector values of a DPC acquisition. The proposed penalty comprises, for each of the three images, an independent smoothing prior. Image quality of the proposed reconstruction is compared to images generated with FBP and iterative reconstruction after phase retrieval. Furthermore, the influence between the priors is analyzed. Finally, the proposed reconstruction algorithm is applied to experimental sliding window data acquired at a synchrotron and results are compared to reconstructions based on phase retrieval. Results: The results show that the proposed algorithm significantly increases image quality in comparison to reconstructions based on phase retrieval. No significant mutual influence between the proposed independent priors could be observed. Further it could be illustrated that the iterative reconstruction of a sliding window acquisition results in images with substantially reduced smearing artifacts. Conclusions: Although the proposed cost function is inherently nonconvex, it can be used to reconstruct images with less aliasing artifacts and less streak artifacts than reconstruction methods based on phase retrieval. Furthermore, the proposed method can be used to reconstruct images of sliding window acquisitions with negligible smearing artifacts.
NASA Technical Reports Server (NTRS)
Kuhl, Mark R.
1990-01-01
Current navigation requirements depend on a geometric dilution of precision (GDOP) criterion. As long as the GDOP stays below a specific value, navigation requirements are met. The GDOP will exceed the specified value when the measurement geometry becomes too collinear. A new signal processing technique, called Ridge Regression Processing, can reduce the effects of nearly collinear measurement geometry; thereby reducing the inflation of the measurement errors. It is shown that the Ridge signal processor gives a consistently better mean squared error (MSE) in position than the Ordinary Least Mean Squares (OLS) estimator. The applicability of this technique is currently being investigated to improve the following areas: receiver autonomous integrity monitoring (RAIM), coverage requirements, availability requirements, and precision approaches.
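The claim that ridge processing outperforms ordinary least squares under nearly collinear measurement geometry is easy to reproduce in simulation. The 3x2 geometry below is a made-up stand-in for a poor satellite constellation, and the penalty value is an arbitrary illustrative choice.

```python
import numpy as np

rng = np.random.default_rng(7)
true = np.array([1.0, 1.0])  # quantity to be estimated
n_rep, lam = 500, 5.0
sse_ols = sse_ridge = 0.0
for _ in range(n_rep):
    # nearly collinear geometry: the two columns are almost parallel
    A = (np.array([[1.0, 0.99], [1.0, 1.01], [1.0, 1.00]])
         + 0.001 * rng.standard_normal((3, 2)))
    b = A @ true + 0.1 * rng.standard_normal(3)  # noisy measurements
    x_ols = np.linalg.lstsq(A, b, rcond=None)[0]
    x_ridge = np.linalg.solve(A.T @ A + lam * np.eye(2), A.T @ b)
    sse_ols += float(((x_ols - true) ** 2).sum())
    sse_ridge += float(((x_ridge - true) ** 2).sum())
mse_ols, mse_ridge = sse_ols / n_rep, sse_ridge / n_rep
```

OLS is unbiased but its variance explodes along the near-null direction of the geometry; ridge accepts a shrinkage bias in exchange for a far smaller variance, so its total mean squared error is consistently lower here.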
Allodji, Rodrigue S; Thiébaut, Anne C M; Leuraud, Klervi; Rage, Estelle; Henry, Stéphane; Laurier, Dominique; Bénichou, Jacques
2012-12-30
A broad variety of methods for measurement error (ME) correction have been developed, but these methods have rarely been applied possibly because their ability to correct ME is poorly understood. We carried out a simulation study to assess the performance of three error-correction methods: two variants of regression calibration (the substitution method and the estimation calibration method) and the simulation extrapolation (SIMEX) method. Features of the simulated cohorts were borrowed from the French Uranium Miners' Cohort in which exposure to radon had been documented from 1946 to 1999. In the absence of ME correction, we observed a severe attenuation of the true effect of radon exposure, with a negative relative bias of the order of 60% on the excess relative risk of lung cancer death. In the main scenario considered, that is, when ME characteristics previously determined as most plausible from the French Uranium Miners' Cohort were used both to generate exposure data and to correct for ME at the analysis stage, all three error-correction methods showed a noticeable but partial reduction of the attenuation bias, with a slight advantage for the SIMEX method. However, the performance of the three correction methods highly depended on the accurate determination of the characteristics of ME. In particular, we encountered severe overestimation in some scenarios with the SIMEX method, and we observed lack of correction with the three methods in some other scenarios. For illustration, we also applied and compared the proposed methods on the real data set from the French Uranium Miners' Cohort study.
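The SIMEX idea, deliberately re-adding measurement error at increasing multiples and extrapolating the resulting estimates back to a no-error world, can be sketched for a linear model with classical additive error. All parameters, the lambda grid, and the quadratic extrapolant are illustrative choices on synthetic data, not the French Uranium Miners' Cohort analysis.

```python
import numpy as np

rng = np.random.default_rng(8)
n, beta_true, sigma_u = 5000, 1.0, 0.8
x = rng.standard_normal(n)                  # true exposure (unobserved)
w = x + sigma_u * rng.standard_normal(n)    # mismeasured exposure
y = beta_true * x + rng.normal(0.0, 0.5, n)

def slope(u, v):
    """Simple-regression slope of v on u."""
    uc, vc = u - u.mean(), v - v.mean()
    return float(uc @ vc / (uc @ uc))

naive = slope(w, y)  # attenuated: about beta_true / (1 + sigma_u**2)

# SIMEX: add extra error with variance lam * sigma_u**2, average over replicates,
# then extrapolate the estimate-vs-lam curve back to lam = -1 (no error)
lams = np.array([0.0, 0.5, 1.0, 1.5, 2.0])
mean_est = []
for lam in lams:
    sims = [slope(w + np.sqrt(lam) * sigma_u * rng.standard_normal(n), y)
            for _ in range(50)]
    mean_est.append(float(np.mean(sims)))
quad = np.polyfit(lams, mean_est, 2)   # quadratic extrapolant
simex = float(np.polyval(quad, -1.0))  # partially bias-corrected slope
```

With these settings the naive slope sits near beta/(1 + sigma_u^2) ≈ 0.61, and the extrapolated estimate recovers part, but not all, of the attenuation, mirroring the "noticeable but partial" correction the abstract reports for this class of methods.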
Multiatlas segmentation as nonparametric regression.
Awate, Suyash P; Whitaker, Ross T
2014-09-01
This paper proposes a novel theoretical framework to model and analyze the statistical characteristics of a wide range of segmentation methods that incorporate a database of label maps or atlases; such methods are termed as label fusion or multiatlas segmentation. We model these multiatlas segmentation problems as nonparametric regression problems in the high-dimensional space of image patches. We analyze the nonparametric estimator's convergence behavior that characterizes expected segmentation error as a function of the size of the multiatlas database. We show that this error has an analytic form involving several parameters that are fundamental to the specific segmentation problem (determined by the chosen anatomical structure, imaging modality, registration algorithm, and label-fusion algorithm). We describe how to estimate these parameters and show that several human anatomical structures exhibit the trends modeled analytically. We use these parameter estimates to optimize the regression estimator. We show that the expected error for large database sizes is well predicted by models learned on small databases. Thus, a few expert segmentations can help predict the database sizes required to keep the expected error below a specified tolerance level. Such cost-benefit analysis is crucial for deploying clinical multiatlas segmentation systems.
Multiatlas Segmentation as Nonparametric Regression
Awate, Suyash P.; Whitaker, Ross T.
2015-01-01
This paper proposes a novel theoretical framework to model and analyze the statistical characteristics of a wide range of segmentation methods that incorporate a database of label maps or atlases; such methods are termed as label fusion or multiatlas segmentation. We model these multiatlas segmentation problems as nonparametric regression problems in the high-dimensional space of image patches. We analyze the nonparametric estimator’s convergence behavior that characterizes expected segmentation error as a function of the size of the multiatlas database. We show that this error has an analytic form involving several parameters that are fundamental to the specific segmentation problem (determined by the chosen anatomical structure, imaging modality, registration algorithm, and label-fusion algorithm). We describe how to estimate these parameters and show that several human anatomical structures exhibit the trends modeled analytically. We use these parameter estimates to optimize the regression estimator. We show that the expected error for large database sizes is well predicted by models learned on small databases. Thus, a few expert segmentations can help predict the database sizes required to keep the expected error below a specified tolerance level. Such cost-benefit analysis is crucial for deploying clinical multiatlas segmentation systems. PMID:24802528
C-arm perfusion imaging with a fast penalized maximum-likelihood approach
NASA Astrophysics Data System (ADS)
Frysch, Robert; Pfeiffer, Tim; Bannasch, Sebastian; Serowy, Steffen; Gugel, Sebastian; Skalej, Martin; Rose, Georg
2014-03-01
Perfusion imaging is an essential method for stroke diagnostics. One of the most important factors for a successful therapy is to get the diagnosis as fast as possible. Therefore, our approach aims at perfusion imaging (PI) with a cone beam C-arm system providing perfusion information directly in the interventional suite. For PI the imaging system has to provide excellent soft tissue contrast resolution in order to allow the detection of the small attenuation enhancement due to contrast agent in the capillary vessels. The limited dynamic range of flat panel detectors, as well as the sparse sampling of the slow rotating C-arm in combination with standard reconstruction methods, results in limited soft tissue contrast. We choose a penalized maximum-likelihood reconstruction method to get suitable results. To minimize the computational load, the 4D reconstruction task is reduced to several static 3D reconstructions. We also include an ordered subset technique with transitioning to a small number of subsets, which adds sharpness to the image with fewer iterations while also suppressing the noise. Instead of the standard multiplicative EM correction, we apply a Newton-based optimization to further accelerate the reconstruction algorithm. The latter optimization reduces the computation time by up to 70%. Further acceleration is provided by a multi-GPU implementation of the forward and backward projection, which fulfills the demands of cone beam geometry. In this preliminary study we evaluate this procedure on clinical data. Perfusion maps are computed and compared with reference images from magnetic resonance scans. We found a high correlation between both images.
Almost efficient estimation of relative risk regression
Fitzmaurice, Garrett M.; Lipsitz, Stuart R.; Arriaga, Alex; Sinha, Debajyoti; Greenberg, Caprice; Gawande, Atul A.
2014-01-01
Relative risks (RRs) are often considered the preferred measures of association in prospective studies, especially when the binary outcome of interest is common. In particular, many researchers regard RRs to be more intuitively interpretable than odds ratios. Although RR regression is a special case of generalized linear models, specifically with a log link function for the binomial (or Bernoulli) outcome, the resulting log-binomial regression does not respect the natural parameter constraints. Because log-binomial regression does not ensure that predicted probabilities are mapped to the [0,1] range, maximum likelihood (ML) estimation is often subject to numerical instability that leads to convergence problems. To circumvent these problems, a number of alternative approaches for estimating RR regression parameters have been proposed. One approach that has been widely studied is the use of Poisson regression estimating equations. The estimating equations for Poisson regression yield consistent, albeit inefficient, estimators of the RR regression parameters. We consider the relative efficiency of the Poisson regression estimator and develop an alternative, almost efficient estimator for the RR regression parameters. The proposed method uses near-optimal weights based on a Maclaurin series (Taylor series expanded around zero) approximation to the true Bernoulli or binomial weight function. This yields an almost efficient estimator while avoiding convergence problems. We examine the asymptotic relative efficiency of the proposed estimator for an increase in the number of terms in the series. Using simulations, we demonstrate the potential for convergence problems with standard ML estimation of the log-binomial regression model and illustrate how this is overcome using the proposed estimator. We apply the proposed estimator to a study of predictors of pre-operative use of beta blockers among patients undergoing colorectal surgery after diagnosis of colon cancer. PMID
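The Poisson estimating-equation approach mentioned above (consistent but inefficient RR estimation with a log link) can be sketched directly. The simulation below is illustrative only: a hypothetical binary outcome with a known relative risk, fitted by solving the Poisson score equations with Newton-Raphson; it does not implement the paper's almost-efficient Maclaurin-weighted estimator.

```python
import numpy as np

rng = np.random.default_rng(1)

# Simulate a common binary outcome with a true relative risk of exp(0.4)
# per unit of x: P(Y=1 | x) = exp(-1.2 + 0.4 * x), always below 1 here.
n = 5000
x = rng.uniform(0, 1, n)
y = rng.binomial(1, np.exp(-1.2 + 0.4 * x))

# Poisson estimating equations for the RR parameters:
# solve sum_i (y_i - exp(X_i b)) X_i = 0 by Newton-Raphson.
X = np.column_stack([np.ones(n), x])
b = np.zeros(2)
for _ in range(50):
    mu = np.exp(X @ b)
    score = X.T @ (y - mu)
    info = X.T @ (X * mu[:, None])
    b = b + np.linalg.solve(info, score)

rr_per_unit_x = np.exp(b[1])  # consistent for the RR, though not efficient
```

Unlike log-binomial ML, these equations have no [0,1] constraint to violate, which is why they avoid the convergence problems described above.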
Gubernator, Jerzy; Lipka, Dominik; Korycińska, Mariola; Kempińska, Katarzyna; Milczarek, Magdalena; Wietrzyk, Joanna; Hrynyk, Rafał; Barnert, Sabine; Süss, Regine; Kozubek, Arkadiusz
2014-01-01
Liposomes act as efficient drug carriers. Recently, an epirubicin (EPI) formulation was developed using a novel EDTA ion gradient method for drug encapsulation. This formulation displayed very good stability and drug retention in vitro in a two-year long-term stability experiment. The cryo-TEM images show drug precipitate structures different from those formed with the ammonium sulfate method usually used to encapsulate anthracyclines. Its pharmacokinetic properties and its efficacy in the human breast MDA-MB-231 cancer xenograft model were also determined. The liposomal EPI formulation is eliminated slowly, with an AUC of 7.6487, while the free drug has an AUC of only 0.0097. The formulation also had a much higher overall antitumor efficacy than the free drug. PMID:24621591
NASA Astrophysics Data System (ADS)
Polat, Esra; Gunay, Suleyman
2013-10-01
One of the problems encountered in Multiple Linear Regression (MLR) is multicollinearity, which causes the overestimation of the regression parameters and increases their variance. Hence, when multicollinearity is present, biased estimation procedures such as classical Principal Component Regression (CPCR) and Partial Least Squares Regression (PLSR) are performed. The SIMPLS algorithm is the leading PLSR algorithm because of its speed and efficiency, and because its results are easier to interpret. However, both CPCR and SIMPLS yield very unreliable results when the data set contains outlying observations. Therefore, Hubert and Vanden Branden (2003) presented a robust PCR (RPCR) method and a robust PLSR method called RSIMPLS. In RPCR, a robust Principal Component Analysis (PCA) method for high-dimensional data is first applied to the independent variables; the dependent variables are then regressed on the scores using a robust regression method. RSIMPLS is constructed from a robust covariance matrix for high-dimensional data and robust linear regression. The purpose of this study is to show the use of the RPCR and RSIMPLS methods on an econometric data set by comparing the two methods on an inflation model of Turkey. The considered methods are compared in terms of predictive ability and goodness of fit by using a robust Root Mean Squared Error of Cross-Validation (R-RMSECV), a robust R2 value and the Robust Component Selection (RCS) statistic.
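The classical PCR step that RPCR robustifies can be sketched in a few lines: project the centered predictors onto their leading principal components, then regress the response on the scores, which sidesteps the ill-conditioning caused by collinear columns. This numpy sketch uses toy collinear data and is the non-robust CPCR baseline, not the robust variants discussed above.

```python
import numpy as np

rng = np.random.default_rng(2)

# Collinear predictors: x2 is almost an exact copy of x1.
n = 300
x1 = rng.normal(size=n)
x2 = x1 + 0.01 * rng.normal(size=n)
x3 = rng.normal(size=n)
X = np.column_stack([x1, x2, x3])
y = x1 + x2 + 2 * x3 + 0.1 * rng.normal(size=n)

# Classical PCR: SVD of the centered design, keep the top-k components,
# regress y on the scores, then map coefficients back to X space.
Xc = X - X.mean(axis=0)
yc = y - y.mean()
U, s, Vt = np.linalg.svd(Xc, full_matrices=False)
k = 2                              # drops the tiny collinear direction
scores = Xc @ Vt[:k].T
gamma, *_ = np.linalg.lstsq(scores, yc, rcond=None)
beta_pcr = Vt[:k].T @ gamma        # coefficients in the original X space

y_hat = scores @ gamma + y.mean()
r2 = 1 - np.sum((y - y_hat) ** 2) / np.sum((y - y.mean()) ** 2)
```

RPCR replaces the SVD with a robust PCA and the score regression with a robust fit, so outliers cannot dominate either stage.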
36 CFR 1200.16 - Will I be penalized for misusing the official seals and logos?
Code of Federal Regulations, 2010 CFR
2010-07-01
... misusing the official seals and logos? 1200.16 Section 1200.16 Parks, Forests, and Public Property NATIONAL... Logos § 1200.16 Will I be penalized for misusing the official seals and logos? (a) Seals. (1) If you... and to other provisions of law as applicable. (b) Logos. If you use the official logos, replicas...
36 CFR 1200.16 - Will I be penalized for misusing the official seals and logos?
Code of Federal Regulations, 2011 CFR
2011-07-01
... misusing the official seals and logos? 1200.16 Section 1200.16 Parks, Forests, and Public Property NATIONAL... Logos § 1200.16 Will I be penalized for misusing the official seals and logos? (a) Seals. (1) If you... and to other provisions of law as applicable. (b) Logos. If you use the official logos, replicas...
7 CFR 1484.73 - Are Cooperators penalized for failing to make required contributions?
Code of Federal Regulations, 2013 CFR
2013-01-01
... 7 Agriculture 10 2013-01-01 2013-01-01 false Are Cooperators penalized for failing to make required contributions? 1484.73 Section 1484.73 Agriculture Regulations of the Department of Agriculture (Continued) COMMODITY CREDIT CORPORATION, DEPARTMENT OF AGRICULTURE EXPORT PROGRAMS PROGRAMS TO HELP...
36 CFR 1200.16 - Will I be penalized for misusing the official seals and logos?
Code of Federal Regulations, 2012 CFR
2012-07-01
... misusing the official seals and logos? 1200.16 Section 1200.16 Parks, Forests, and Public Property NATIONAL ARCHIVES AND RECORDS ADMINISTRATION GENERAL RULES OFFICIAL SEALS Penalties for Misuse of NARA Seals and Logos § 1200.16 Will I be penalized for misusing the official seals and logos? (a) Seals. (1) If...
The Role of the Environmental Health Specialist in the Penal and Correctional System
ERIC Educational Resources Information Center
Walker, Bailus, Jr.; Gordon, Theodore J.
1976-01-01
Implementing a health and hygiene program in penal systems necessitates coordinating the entire staff. Health specialists could participate in facility planning and management, policy formation, and evaluation of medical care, housekeeping, and food services. They could also serve as liaisons between correctional staff and governmental or…
A sexually transmitted infection screening algorithm based on semiparametric regression models
Li, Zhuokai; Liu, Hai; Tu, Wanzhu
2015-01-01
Sexually transmitted infections (STIs) with Chlamydia trachomatis, Neisseria gonorrhoeae, and Trichomonas vaginalis are among the most common infectious diseases in the United States, disproportionately affecting young women. Because a significant portion of the infections present no symptoms, infection control relies primarily on disease screening. However, universal STI screening in a large population can be expensive. In this paper, we propose a semiparametric model-based screening algorithm. The model quantifies organism-specific infection risks in individual subjects, and accounts for the within-subject interdependence of the infection outcomes of different organisms and the serial correlations among the repeated assessments of the same organism. Bivariate thin-plate regression spline surfaces are incorporated to depict the concurrent influences of age and sexual partners on infection acquisition. Model parameters are estimated by using a penalized likelihood method. For inference, we develop a likelihood-based resampling procedure to compare the bivariate effect surfaces across outcomes. Simulation studies are conducted to evaluate the model fitting performance. A screening algorithm is developed using data collected from an epidemiological study of young women at increased risk of STIs. We present evidence that the three organisms have distinct age and partner effect patterns; for C. trachomatis, the partner effect is more pronounced in younger adolescents. Predictive performance of the proposed screening algorithm is assessed through a receiver operating characteristic (ROC) analysis. We show that the model-based screening algorithm has excellent accuracy in identifying individuals at increased risk, and thus can be used to assist STI screening in clinical practice. PMID:25900920
Spontaneous skin regression and predictors of skin regression in Thai scleroderma patients.
Foocharoen, Chingching; Mahakkanukrauh, Ajanee; Suwannaroj, Siraphop; Nanagara, Ratanavadee
2011-09-01
Skin tightness is a major clinical manifestation of systemic sclerosis (SSc). Importantly for both clinicians and patients, spontaneous regression of the fibrosis process has been documented. The purpose of this study is to identify the incidence and related clinical characteristics of spontaneous regression among Thai SSc patients. A historical cohort with 4 years of follow-up was performed among SSc patients over 15 years of age diagnosed with SSc between January 1, 2005 and December 31, 2006 in Khon Kaen, Thailand. The start date was the date of the first symptom and the end date was the date of the skin score ≤2. To estimate the respective probability of regression and to assess the associated factors, the Kaplan-Meier method and Cox regression analysis were used. One hundred seventeen cases of SSc were included with a female to male ratio of 1.5:1. Thirteen patients (11.1%) experienced regression. The incidence rate of spontaneous skin regression was 0.31 per 100 person-months and the average duration of SSc at the time of regression was 35.9±15.6 months (range, 15.7-60 months). The factors that negatively correlated with regression were (a) diffuse cutaneous type, (b) Raynaud's phenomenon, (c) esophageal dysmotility, and (d) colchicine treatment at onset, with respective hazard ratios (HRs) of 0.19, 0.19, 0.26, and 0.20. By contrast, the factor that positively correlated with regression was active alveolitis with cyclophosphamide therapy at onset, with an HR of 4.23 (95% CI, 1.23-14.10). After regression analysis, only Raynaud's phenomenon at onset and diffuse cutaneous type had a significantly negative correlation to regression. Spontaneous regression of the skin fibrosis process was not uncommon among Thai SSc patients. Raynaud's phenomenon and the diffuse cutaneous type predicted a poor cutaneous outcome, while early cyclophosphamide therapy might be related to a better skin outcome.
A new bivariate negative binomial regression model
NASA Astrophysics Data System (ADS)
Faroughi, Pouya; Ismail, Noriszura
2014-12-01
This paper introduces a new form of bivariate negative binomial (BNB-1) regression which can be fitted to bivariate and correlated count data with covariates. The BNB regression discussed in this study can be fitted to bivariate and overdispersed count data with positive, zero or negative correlations. The joint p.m.f. of the BNB-1 distribution is derived from the product of two negative binomial marginals with a multiplicative factor parameter. Several testing methods were used to check the overdispersion and goodness of fit of the model. The application of BNB-1 regression is illustrated on a Malaysian motor insurance dataset. The results indicated that BNB-1 regression provides a better fit than the bivariate Poisson and BNB-2 models with regard to the Akaike information criterion.
Multiple-Instance Regression with Structured Data
NASA Technical Reports Server (NTRS)
Wagstaff, Kiri L.; Lane, Terran; Roper, Alex
2008-01-01
We present a multiple-instance regression algorithm that models internal bag structure to identify the items most relevant to the bag labels. Multiple-instance regression (MIR) operates on a set of bags with real-valued labels, each containing a set of unlabeled items, in which the relevance of each item to its bag label is unknown. The goal is to predict the labels of new bags from their contents. Unlike previous MIR methods, MI-ClusterRegress can operate on bags that are structured in that they contain items drawn from a number of distinct (but unknown) distributions. MI-ClusterRegress simultaneously learns a model of the bag's internal structure, the relevance of each item, and a regression model that accurately predicts labels for new bags. We evaluated this approach on the challenging MIR problem of crop yield prediction from remote sensing data. MI-ClusterRegress provided predictions that were more accurate than those obtained with non-multiple-instance approaches or MIR methods that do not model the bag structure.
Incorporation of noise and prior images in penalized-likelihood reconstruction of sparse data
NASA Astrophysics Data System (ADS)
Ding, Yifu; Siewerdsen, Jeffrey H.; Stayman, J. Webster
2012-03-01
Many imaging scenarios involve a sequence of tomographic data acquisitions to monitor change over time - e.g., longitudinal studies of disease progression (tumor surveillance) and intraoperative imaging of tissue changes during intervention. The radiation dose imparted by these repeat acquisitions presents a concern. Because such image sequences share a great deal of information between acquisitions, using prior image information from baseline scans in the reconstruction of subsequent scans can relax data fidelity requirements of follow-up acquisitions. For example, sparse data acquisitions, including angular undersampling and limited-angle tomography, limit exposure by reducing the number of acquired projections. Various approaches such as prior-image constrained compressed sensing (PICCS) have successfully incorporated prior images in the reconstruction of such sparse data. Another technique to limit radiation dose is to reduce the x-ray fluence per projection. However, many methods for reconstruction of sparse data do not include a noise model accounting for stochastic fluctuations in such low-dose measurements and cannot balance the differing information content of various measurements. In this paper, we present a prior-image, penalized-likelihood estimator (PI-PLE) that utilizes prior image information, compressed-sensing penalties, and a Poisson noise model for measurements. The approach is applied to a lung nodule surveillance scenario with sparse data acquired at low exposures to illustrate performance under cases of extremely limited data fidelity. The results show that PI-PLE is able to greatly reduce streak artifacts that otherwise arise from photon starvation, and maintain high-resolution anatomical features, whereas traditional approaches are subject to streak artifacts or lower-resolution images.
Chen, Y
2015-06-15
Purpose: To improve the quality of kV X-ray cone beam CT (CBCT) for use in radiotherapy delivery assessment and re-planning by using penalized likelihood (PL) iterative reconstruction, with the auto-segmentation accuracy of the resulting CBCTs as an image quality metric. Methods: Present filtered backprojection (FBP) CBCT reconstructions can be improved upon by PL reconstruction with image formation models and appropriate regularization constraints. We use two constraints: 1) image smoothing via an edge preserving filter, and 2) a constraint minimizing the differences between the reconstruction and a registered prior image. Reconstructions of prostate therapy CBCTs were computed with constraint 1 alone and with both constraints. The prior images were planning CTs (pCT) deformable-registered to the FBP reconstructions. Anatomy segmentations were done using atlas-based auto-segmentation (Elekta ADMIRE). Results: We observed small but consistent improvements in the Dice similarity coefficients of PL reconstructions over the FBP results, and additional small improvements with the added prior image constraint. For a CBCT with anatomy very similar in appearance to the pCT, we observed these changes in the Dice metric: +2.9% (prostate), +8.6% (rectum), −1.9% (bladder). For a second CBCT with a very different rectum configuration, we observed +0.8% (prostate), +8.9% (rectum), −1.2% (bladder). For a third case with significant lateral truncation of the field of view, we observed: +0.8% (prostate), +8.9% (rectum), −1.2% (bladder). Adding the prior image constraint raised Dice measures by about 1%. Conclusion: Efficient and practical adaptive radiotherapy requires accurate deformable registration and accurate anatomy delineation. We show here small and consistent patterns of improved contour accuracy using PL iterative reconstruction compared with FBP reconstruction. However, the modest extent of these results and the pattern of differences across CBCT cases suggest that
A Spline-Based Lack-Of-Fit Test for Independent Variable Effect in Poisson Regression.
Li, Chin-Shang; Tu, Wanzhu
2007-05-01
In regression analysis of count data, independent variables are often modeled by their linear effects under the assumption of log-linearity. In reality, the validity of such an assumption is rarely tested, and its use is at times unjustifiable. A lack-of-fit test is proposed for the adequacy of a postulated functional form of an independent variable within the framework of semiparametric Poisson regression models based on penalized splines. It offers added flexibility in accommodating the potentially non-log-linear effect of the independent variable. A likelihood ratio test is constructed for the adequacy of the postulated parametric form, for example log-linearity, of the independent variable effect. Simulations indicate that the proposed model performs well, and that a misspecified parametric model has much reduced power. An example is given.
Zheng, Qi; Peng, Limin
2016-01-01
Quantile regression provides a flexible platform for evaluating covariate effects on different segments of the conditional distribution of response. As the effects of covariates may change with quantile level, contemporaneously examining a spectrum of quantiles is expected to have a better capacity to identify variables with either partial or full effects on the response distribution, as compared to focusing on a single quantile. Under this motivation, we study a general adaptively weighted LASSO penalization strategy in the quantile regression setting, where a continuum of quantile index is considered and coefficients are allowed to vary with quantile index. We establish the oracle properties of the resulting estimator of coefficient function. Furthermore, we formally investigate a BIC-type uniform tuning parameter selector and show that it can ensure consistent model selection. Our numerical studies confirm the theoretical findings and illustrate an application of the new variable selection procedure. PMID:28008212
Parametric modeling of quantile regression coefficient functions.
Frumento, Paolo; Bottai, Matteo
2016-03-01
Estimating the conditional quantiles of outcome variables of interest is frequent in many research areas, and quantile regression is foremost among the utilized methods. The coefficients of a quantile regression model depend on the order of the quantile being estimated. For example, the coefficients for the median are generally different from those of the 10th centile. In this article, we describe an approach to modeling the regression coefficients as parametric functions of the order of the quantile. This approach may have advantages in terms of parsimony, efficiency, and may expand the potential of statistical modeling. Goodness-of-fit measures and testing procedures are discussed, and the results of a simulation study are presented. We apply the method to analyze the data that motivated this work. The described method is implemented in the qrcm R package.
Tong, Xuming; Chen, Jinghang; Miao, Hongyu; Li, Tingting; Zhang, Le
2015-01-01
Agent-based models (ABM) and differential equations (DE) are two commonly used methods for immune system simulation. However, it is difficult for ABM to estimate key parameters of the model by incorporating experimental data, whereas the differential equation model is incapable of describing the complicated immune system in detail. To overcome these problems, we developed an integrated ABM regression model (IABMR). It can combine the advantages of ABM and DE by employing ABM to mimic the multi-scale immune system with various phenotypes and types of cells, as well as using the input and output of the ABM to build up a Loess regression for key parameter estimation. Next, we employed the greedy algorithm to estimate the key parameters of the ABM with respect to the same experimental data set, and used the ABM to describe a 3D immune system similar to previous studies that employed the DE model. These results indicate that IABMR not only has the potential to simulate the immune system at various scales, phenotypes and cell types, but can also infer the key parameters as accurately as the DE model. Therefore, this study innovatively developed a mechanism that can simulate the complicated immune system in detail, as ABM does, while validating the model's reliability and efficiency by fitting experimental data, as DE does. PMID:26535589
Software Regression Verification
2013-12-11
We limit CBMC to unwind the recursion to k and run it. Now CBMC will explore all k level base cases. In Fig. 17 we can see the functions from Fig. 16...similar functions where one is some sort of an unwinding optimization of the other. Our previous method of k-induction will fail to prove the
Error bounds in cascading regressions
Karlinger, M.R.; Troutman, B.M.
1985-01-01
Cascading regressions is a technique for predicting a value of a dependent variable when no paired measurements exist to perform a standard regression analysis. Biases in coefficients of a cascaded-regression line as well as error variance of points about the line are functions of the correlation coefficient between dependent and independent variables. Although this correlation cannot be computed because of the lack of paired data, bounds can be placed on errors through the required properties of the correlation coefficient. The potential mean-squared error of a cascaded-regression prediction can be large, as illustrated through an example using geomorphologic data. © 1985 Plenum Publishing Corporation.
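The cascading idea above (compose two regressions fitted on non-overlapping samples when y and z are never observed together) can be sketched with toy data. The chain z → x → y, the sample sizes, and the noise levels below are all illustrative assumptions, not taken from the paper.

```python
import numpy as np

rng = np.random.default_rng(3)

# True chain: z -> x -> y, but y and z are never measured on the same units.
n = 400
z_a = rng.normal(size=n)                   # site A observes (z, x)
x_a = 2.0 * z_a + 0.3 * rng.normal(size=n)
x_b = 2.0 * rng.normal(size=n)             # site B observes (x, y)
y_b = 0.5 * x_b + 0.3 * rng.normal(size=n)

def slope_intercept(u, v):
    """OLS fit of v = a + b*u."""
    b = np.cov(u, v)[0, 1] / np.var(u)
    return v.mean() - b * u.mean(), b

# Cascade: predict x from z (site A fit), then y from that x (site B fit).
a1, b1 = slope_intercept(z_a, x_a)
a2, b2 = slope_intercept(x_b, y_b)
z_new = 1.0
y_pred = a2 + b2 * (a1 + b1 * z_new)       # composed prediction
```

The paper's point is that the error of `y_pred` depends on the unobservable y-z correlation, so only bounds on it can be derived.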
Lumbar herniated disc: spontaneous regression
Yüksel, Kasım Zafer
2017-01-01
Background Low back pain is a frequent condition that results in substantial disability and causes admission of patients to neurosurgery clinics. To evaluate and present the therapeutic outcomes in lumbar disc hernia (LDH) patients treated by means of a conservative approach, consisting of bed rest and medical therapy. Methods This retrospective cohort was carried out in the neurosurgery departments of hospitals in Kahramanmaraş city and 23 patients diagnosed with LDH at the levels of L3−L4, L4−L5 or L5−S1 were enrolled. Results The average age was 38.4 ± 8.0 and the chief complaint was low back pain and sciatica radiating to one or both lower extremities. Conservative treatment was administered. Neurological examination findings, durations of treatment and intervals until symptomatic recovery were recorded. Laségue tests and neurosensory examination revealed that mild neurological deficits existed in 16 of our patients. Previously, 5 patients had received physiotherapy and 7 patients had been on medical treatment. The number of patients with LDH at the level of L3−L4, L4−L5, and L5−S1 were 1, 13, and 9, respectively. All patients reported that they had benefit from medical treatment and bed rest, and radiologic improvement was observed simultaneously on MRI scans. The average duration until symptomatic recovery and/or regression of LDH symptoms was 13.6 ± 5.4 months (range: 5−22). Conclusions It should be kept in mind that lumbar disc hernias could regress with medical treatment and rest without surgery, and there should be an awareness that these patients could recover radiologically. This condition must be taken into account during decision making for surgical intervention in LDH patients devoid of indications for emergent surgery. PMID:28119770
"They're Here to Follow Orders, Not to Think:" Some Notes on Academic Freedom in Penal Institutions.
ERIC Educational Resources Information Center
Arcard, Thomas E.; Watts-LaFontaine, Phyllis
1983-01-01
Addresses basic issues that arise when an instructor who teaches college-level courses to inmates in a penal institution reflects upon that teaching experience. Presents questions that must be examined by those teaching prison inmates. (JOW)
Regression-kriging for characterizing soils with remote-sensing data
NASA Astrophysics Data System (ADS)
Ge, Yufeng; Thomasson, J. Alex; Sui, Ruixiu; Wooten, James
2011-09-01
In precision agriculture, regression has been used widely to quantify the relationship between soil attributes and other environmental variables. However, the spatial correlation existing in soil samples usually violates a basic assumption of regression: sample independence. In this study, a regression-kriging method was attempted in relating soil properties to the remote-sensing image of a cotton field near Vance, Mississippi, USA. The regression-kriging model was developed and tested by using 273 soil samples collected from the field. The result showed that by properly incorporating the spatial correlation information of the regression residuals, the regression-kriging model generally achieved higher prediction accuracy than the stepwise multiple linear regression model. Most strikingly, a 50% increase in prediction accuracy was shown in soil sodium concentration. Potential usages of regression-kriging in future precision agriculture applications include real-time soil sensor development and digital soil mapping.
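The two-stage structure of regression-kriging (fit a trend on the covariate, then spatially interpolate the residuals) can be sketched as follows. This toy numpy version substitutes inverse-distance weighting for the kriging step and invents the field, covariate, and sampling scheme; it is a simplified stand-in for the paper's method, not a reimplementation.

```python
import numpy as np

rng = np.random.default_rng(4)

# Toy field: soil attribute = linear trend in a remote-sensing band
# plus a smooth spatially correlated residual.
n = 200
coords = rng.uniform(0, 10, size=(n, 2))
band = rng.uniform(size=n)
spatial = np.sin(coords[:, 0]) + np.cos(coords[:, 1])
soil = 3.0 * band + spatial + 0.05 * rng.normal(size=n)

# Step 1: ordinary regression of the soil attribute on the covariate.
A = np.column_stack([np.ones(n), band])
coef, *_ = np.linalg.lstsq(A, soil, rcond=None)
resid = soil - A @ coef

# Step 2: interpolate the residuals spatially (inverse-distance weighting
# stands in here for the kriging of residuals).
def idw(pt, pts, vals, power=2.0, eps=1e-9):
    d = np.sqrt(np.sum((pts - pt) ** 2, axis=1)) + eps
    w = 1.0 / d ** power
    return np.sum(w * vals) / np.sum(w)

# Prediction at a new location with band value 0.5: trend + residual surface.
pt = np.array([5.0, 5.0])
pred = coef[0] + coef[1] * 0.5 + idw(pt, coords, resid)
```

Adding the interpolated residual pulls the prediction toward the local spatial pattern that the trend alone misses, which is exactly the accuracy gain the study reports.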
On regression adjustment for the propensity score.
Vansteelandt, S; Daniel, R M
2014-10-15
Propensity scores are widely adopted in observational research because they enable adjustment for high-dimensional confounders without requiring models for their association with the outcome of interest. The results of statistical analyses based on stratification, matching or inverse weighting by the propensity score are therefore less susceptible to model extrapolation than those based solely on outcome regression models. This is attractive because extrapolation in outcome regression models may be alarming, yet difficult to diagnose, when the exposed and unexposed individuals have very different covariate distributions. Standard regression adjustment for the propensity score forms an alternative to the aforementioned propensity score methods, but the benefits of this are less clear because it still involves modelling the outcome in addition to the propensity score. In this article, we develop novel insights into the properties of this adjustment method. We demonstrate that standard tests of the null hypothesis of no exposure effect (based on robust variance estimators), as well as particular standardised effects obtained from such adjusted regression models, are robust against misspecification of the outcome model when a propensity score model is correctly specified; they are thus not vulnerable to the aforementioned problem of extrapolation. We moreover propose efficient estimators for these standardised effects, which retain a useful causal interpretation even when the propensity score model is misspecified, provided the outcome regression model is correctly specified.
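Regression adjustment for the propensity score, as discussed above, can be sketched on simulated confounded data: fit a propensity model for treatment, then regress the outcome on treatment plus a function of the estimated score. Everything below is a toy illustration under assumed linear/logistic data-generating models (here the adjustment uses the score's linear predictor, which happens to be exact for this simulation), not the estimators analyzed in the paper.

```python
import numpy as np

rng = np.random.default_rng(5)

# Confounded treatment: x raises both treatment probability and outcome.
n = 4000
x = rng.normal(size=n)
a = rng.binomial(1, 1.0 / (1.0 + np.exp(-1.5 * x)))
y = 2.0 * a + 1.0 * x + rng.normal(size=n)   # true treatment effect is 2.0

def fit_logistic(X, t, iters=50):
    """Logistic regression by Newton-Raphson (the propensity model)."""
    b = np.zeros(X.shape[1])
    for _ in range(iters):
        p = 1.0 / (1.0 + np.exp(-X @ b))
        W = p * (1 - p)
        b = b + np.linalg.solve(X.T @ (X * W[:, None]), X.T @ (t - p))
    return b

# Step 1: estimate the propensity score model P(A=1 | x).
Xd = np.column_stack([np.ones(n), x])
lin = Xd @ fit_logistic(Xd, a)               # linear predictor (logit of score)

# Step 2: regression adjustment -- outcome on treatment plus the score term.
Z = np.column_stack([np.ones(n), a, lin])
beta, *_ = np.linalg.lstsq(Z, y, rcond=None)
adjusted_effect = beta[1]

naive = y[a == 1].mean() - y[a == 0].mean()  # confounded contrast
```

The naive contrast is badly biased upward by the confounder, while the score-adjusted coefficient recovers the effect; the paper's contribution is characterizing when such adjusted estimates stay valid under outcome-model misspecification.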
Logistic Regression: Concept and Application
ERIC Educational Resources Information Center
Cokluk, Omay
2010-01-01
The main focus of logistic regression analysis is classification of individuals in different groups. The aim of the present study is to explain basic concepts and processes of binary logistic regression analysis intended to determine the combination of independent variables which best explain the membership in certain groups called dichotomous…
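The group-classification use of logistic regression described above can be shown concretely: model the probability of membership in the "1" group as a logistic function of the predictor and classify at the 0.5 threshold. The toy data and the plain gradient-ascent fit below are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(6)

# Binary group membership whose probability rises with the predictor x.
n = 500
x = rng.normal(size=n)
y = rng.binomial(1, 1.0 / (1.0 + np.exp(-(0.5 + 2.0 * x))))

# Fit P(group = 1 | x) = 1 / (1 + exp(-(b0 + b1 x))) by gradient ascent
# on the average log-likelihood.
X = np.column_stack([np.ones(n), x])
b = np.zeros(2)
for _ in range(2000):
    p = 1.0 / (1.0 + np.exp(-X @ b))
    b += 1.0 * X.T @ (y - p) / n

# Classify each individual into the group with predicted probability > 0.5.
pred = (1.0 / (1.0 + np.exp(-X @ b)) > 0.5).astype(int)
accuracy = (pred == y).mean()
```

The fitted slope `b[1]` is interpreted on the log-odds scale: each unit of x multiplies the odds of membership in group 1 by exp(b[1]).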
Modeling confounding by half-sibling regression
Schölkopf, Bernhard; Hogg, David W.; Wang, Dun; Foreman-Mackey, Daniel; Janzing, Dominik; Simon-Gabriel, Carl-Johann; Peters, Jonas
2016-01-01
We describe a method for removing the effect of confounders to reconstruct a latent quantity of interest. The method, referred to as “half-sibling regression,” is inspired by recent work in causal inference using additive noise models. We provide a theoretical justification, discussing both independent and identically distributed as well as time series data, respectively, and illustrate the potential of the method in a challenging astronomy application. PMID:27382154
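The core of half-sibling regression can be sketched directly from the description above: regress the contaminated target on "half-siblings" that share only the confounder (not the signal), and keep the residual as the estimate of the latent quantity. This is a toy simulation, not the astronomy application; every quantity is invented.

```python
import numpy as np

rng = np.random.default_rng(2)
n = 2000
confounder = rng.normal(size=n)               # shared systematic effect
q = np.sin(np.linspace(0, 10, n))             # latent signal of interest
y = q + 2 * confounder                        # observed target
# "Half-siblings": other observations driven by the confounder but not by q
X = np.column_stack([2.0 * confounder + 0.1 * rng.normal(size=n)
                     for _ in range(5)])

# Regress y on the siblings and keep the residual as the signal estimate
coef, *_ = np.linalg.lstsq(X, y, rcond=None)
q_hat = y - X @ coef                          # estimate of the latent quantity
corr = float(np.corrcoef(q_hat, q)[0, 1])
print(round(corr, 2))
```

Because the siblings carry the confounder but are independent of the signal, subtracting their best linear prediction removes the systematic component while leaving the signal largely intact.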
[Changes in legal status in the field of civil and penal doctor's liability].
Dukiet-Nagórska, T
1999-01-01
The author discusses the effects of the organizational changes in health care that took place in Poland on the penal and civil liability of doctors. Her opinion also refers to the amendment of the penal code. The author emphasizes the wider-than-before possibility of a doctor's liability for harm caused to a person or for infringement of patients' rights. She also draws attention to the description of offences pertaining to doctors contained in the new penal code. The author proposes working out standards of procedure with the patient (based on the model of treatment standards), according to the specific nature of particular branches of medicine or clinics. She expresses her conviction about the need to initiate changes in the legal status through the joint actions of doctors and lawyers.
[Development of a questionnaire to assess user satisfaction of a penal mediation program (CSM-P)].
Manzano Blanquez, Juan; Soria Verde, Miguel Angel; Armadans Tremolosa, Inmaculada
2008-08-01
The aim of the present study is to develop an instrument (CSM-P), valid for victims and aggressors, to assess the satisfaction of individuals participating in a penal mediation program (VOM). The instrument was administered to a sample of 213 subjects, randomly chosen from the pool of participants in a VOM program of the Catalonian Justice Department. Data analysis of the questionnaire shows an internal consistency of .88 (Cronbach's alpha). The dimensionality of the questionnaire is structured in a single factor that accounts for 61.45% of the variance. The instrument has proven its utility for assessing the satisfaction of participants in a penal mediation program. Validation of the instrument in similar populations should be performed, and it should be adapted to other contexts where assessing user satisfaction with a mediation program is necessary.
[Civil procedure or penal procedure in the event of medical fault].
du Jardin, J
2004-01-01
The author specifies the rules and the vocabulary of the civil and penal procedures. He points out the characteristics of the medical act and the failures that a medical practitioner can be blamed for. He defines the notions of the duty of best efforts and the duty to achieve a specific result. The role of the expert is touched upon. The article is supplemented by significant case-law decisions and a list of recent textbooks.
Categorical Variables in Multiple Regression: Some Cautions.
ERIC Educational Resources Information Center
O'Grady, Kevin E.; Medoff, Deborah R.
1988-01-01
Limitations of dummy coding and nonsense coding as methods of coding categorical variables for use as predictors in multiple regression analysis are discussed. The combination of these approaches often yields estimates and tests of significance that are not intended by researchers for inclusion in their models. (SLD)
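One of the cautions alluded to can be shown in a few lines: with an intercept, including all k indicator columns of a k-level categorical predictor makes the design matrix singular, which is why standard dummy coding drops a reference level. A minimal sketch with invented data:

```python
import numpy as np

levels = np.array([0, 1, 2, 0, 1, 2, 0, 1])             # a 3-level categorical predictor
full = (levels[:, None] == np.arange(3)).astype(float)  # one indicator column per level

# With an intercept, all k columns are collinear (they sum to the 1-vector)
X_bad = np.column_stack([np.ones(8), full])
print(np.linalg.matrix_rank(X_bad))                     # rank-deficient design

# Drop one reference level: the usual k-1 dummy coding
X_ok = np.column_stack([np.ones(8), full[:, 1:]])
print(np.linalg.matrix_rank(X_ok))                      # full column rank
```

The first design has 4 columns but rank 3, so its coefficients are not uniquely estimable; the second has full column rank.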
Prediction of dynamical systems by symbolic regression
NASA Astrophysics Data System (ADS)
Quade, Markus; Abel, Markus; Shafi, Kamran; Niven, Robert K.; Noack, Bernd R.
2016-07-01
We study the modeling and prediction of dynamical systems based on conventional models derived from measurements. Such algorithms are highly desirable in situations where the underlying dynamics are hard to model from physical principles or simplified models need to be found. We focus on symbolic regression methods as a part of machine learning. These algorithms are capable of learning an analytically tractable model from data, a highly valuable property. Symbolic regression methods can be considered as generalized regression methods. We investigate two particular algorithms, the so-called fast function extraction which is a generalized linear regression algorithm, and genetic programming which is a very general method. Both are able to combine functions in a certain way such that a good model for the prediction of the temporal evolution of a dynamical system can be identified. We illustrate the algorithms by finding a prediction for the evolution of a harmonic oscillator based on measurements, by detecting an arriving front in an excitable system, and as a real-world application, the prediction of solar power production based on energy production observations at a given site together with the weather forecast.
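In the spirit of the fast function extraction variant described above (a generalized linear regression over a library of candidate basis functions), a drastically simplified sketch can recover a harmonic-oscillator term from noisy measurements. Real FFX uses regularised pathwise fits over a much larger automatically generated library; this least-squares stand-in with a hand-picked library is illustration only.

```python
import numpy as np

rng = np.random.default_rng(3)
t = np.linspace(0, 10, 400)
y = 1.5 * np.sin(2 * t) + 0.05 * rng.normal(size=t.size)  # "measured" oscillator

# Candidate library: a generalized linear model over nonlinear basis functions of t
library = {"1": np.ones_like(t), "t": t, "t^2": t ** 2,
           "sin(2t)": np.sin(2 * t), "cos(2t)": np.cos(2 * t)}
names = list(library)
Phi = np.column_stack([library[k] for k in names])

coef, *_ = np.linalg.lstsq(Phi, y, rcond=None)
best = names[int(np.argmax(np.abs(coef)))]                # dominant term
print(best, round(coef[names.index("sin(2t)")], 2))
```

The fit assigns essentially all weight to the sin(2t) basis function, i.e. it identifies an analytically tractable model of the dynamics from data.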
Moving the Bar: Transformations in Linear Regression.
ERIC Educational Resources Information Center
Miranda, Janet
The assumption most important to the hypothesis-testing procedure of multiple linear regression is that the residuals are normally distributed, but this assumption is not always tenable given the realities of some data sets. When normality of the residuals is not met, an alternative method can be initiated. As an…
Wang, Jing; Lu, Hongbing; Wen, Junhai; Liang, Zhengrong
2007-01-01
In this paper, we propose a novel multiscale penalized weighted least-squares (PWLS) method for the restoration of low-dose computed tomography (CT) sinograms. The method utilizes the wavelet transform for multiscale or multi-resolution analysis of the sinogram. Specifically, the Mallat-Zhong wavelet transform is applied to decompose the sinogram to different resolution levels. At each decomposed resolution level, a PWLS criterion is applied to restore the noise-contaminated wavelet coefficients, where the penalty is adaptive to each resolution scale and the weight is updated by an exponential relationship between the data variance and mean at each scale and location. The proposed PWLS method is based on the observations that (1) noise in the CT sinogram after logarithm transform and calibration can be modeled as signal-dependent, with the sample variance depending on the sample mean by an exponential relationship; and (2) noise reduction can be more effective when it is adaptive to different resolution levels. The effectiveness of the proposed multiscale PWLS method is validated by both computer simulations and experimental studies. The gain of the multiscale approach over its single-scale counterpart is quantified by noise-resolution tradeoff measures. PMID:17946172
Kauhl, Boris; Heil, Jeanne; Hoebe, Christian J. P. A.; Schweikart, Jürgen; Krafft, Thomas; Dukers-Muijrers, Nicole H. T. M.
2017-01-01
Background: Despite high vaccination coverage, pertussis incidence in the Netherlands is amongst the highest in Europe, with a shifting tendency towards adults and the elderly. Early detection of outbreaks and preventive actions are necessary to prevent severe complications in infants. Efficient pertussis control requires additional background knowledge about the determinants of testing and possible determinants of the current pertussis incidence. Therefore, the aim of our study is to examine the possibility of locating possible pertussis outbreaks using space-time cluster detection and to examine the determinants of pertussis testing and incidence using geographically weighted regression models. Methods: We analysed laboratory registry data including all geocoded pertussis tests in the southern area of the Netherlands between 2007 and 2013. Socio-demographic and infrastructure-related population data were matched to the geocoded laboratory data. The spatial scan statistic was applied to detect spatial and space-time clusters of testing, incidence and test-positivity. Geographically weighted Poisson regression (GWPR) models were then constructed to model the associations between the age-specific rates of testing and incidence and possible population-based determinants. Results: Space-time clusters for pertussis incidence overlapped with space-time clusters for testing, reflecting a strong relationship between testing and incidence, irrespective of the examined age group. Testing for pertussis itself was overall associated with lower socio-economic status, multi-person households, proximity to primary schools and availability of healthcare. The current incidence, by contrast, is mainly determined by testing and is not associated with a lower socio-economic status. Discussion: Testing for pertussis follows to an extent the general healthcare-seeking behaviour for common respiratory infections, whereas the current pertussis incidence is largely the result of testing. More
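The geographically weighted idea (re-fitting a regression at each location, downweighting distant observations with a kernel) can be sketched in one dimension. This toy uses a Gaussian (linear) model rather than the Poisson variant used in the study, and all data, the bandwidth and the kernel are invented for the example.

```python
import numpy as np

rng = np.random.default_rng(9)
n = 300
loc = rng.uniform(0, 1, n)                    # 1-D stand-in for geography
true_slope = 1 + loc                          # association strength varies in space
x = rng.normal(size=n)
y = true_slope * x + 0.2 * rng.normal(size=n)

def gwr_slope(s, bw=0.1):
    """Weighted least squares at location s with Gaussian kernel weights."""
    w = np.exp(-0.5 * ((loc - s) / bw) ** 2)
    X = np.column_stack([np.ones(n), x])
    XtW = X.T * w                             # X^T W
    return float(np.linalg.solve(XtW @ X, XtW @ y)[1])

west, east = gwr_slope(0.1), gwr_slope(0.9)
print(round(west, 2), round(east, 2))
```

The locally fitted slope increases across the study area, which is exactly the kind of spatially varying association a global regression would average away.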
Practical Session: Simple Linear Regression
NASA Astrophysics Data System (ADS)
Clausel, M.; Grégoire, G.
2014-12-01
Two exercises are proposed to illustrate simple linear regression. The first one is based on Galton's famous data set on heredity. We use the lm R command and get coefficient estimates, the residual standard error, R2, residuals… In the second example, devoted to data related to the vapor tension of mercury, we fit a simple linear regression, predict values, and anticipate multiple linear regression. This practical session is an excerpt from practical exercises proposed by A. Dalalyan at ENPC (see Exercises 1 and 2 of http://certis.enpc.fr/~dalalyan/Download/TP_ENPC_4.pdf).
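A Python analogue of the lm exercise, with synthetic heights standing in for Galton's data (the actual data set is not reproduced here, so the coefficients and noise level below are invented):

```python
import numpy as np

rng = np.random.default_rng(4)
parent = rng.normal(68, 1.8, 300)                       # mid-parent height (inches)
child = 0.65 * parent + 24 + rng.normal(0, 1.5, 300)    # regression toward the mean

X = np.column_stack([np.ones_like(parent), parent])
(b0, b1), *_ = np.linalg.lstsq(X, child, rcond=None)

resid = child - (b0 + b1 * parent)
r2 = 1 - resid.var() / child.var()                      # coefficient of determination
sigma = float(np.sqrt(resid @ resid / (len(child) - 2)))  # residual standard error
print(round(b1, 2), round(r2, 2), round(sigma, 2))
```

This reproduces the quantities the exercise extracts from lm: the slope, R2 and the residual standard error.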
Uncertainty quantification in DIC with Kriging regression
NASA Astrophysics Data System (ADS)
Wang, Dezhi; DiazDelaO, F. A.; Wang, Weizhuo; Lin, Xiaoshan; Patterson, Eann A.; Mottershead, John E.
2016-03-01
A Kriging regression model is developed as a post-processing technique for the treatment of measurement uncertainty in classical subset-based Digital Image Correlation (DIC). Regression is achieved by regularising the sample-point correlation matrix using a local, subset-based, assessment of the measurement error with assumed statistical normality and based on the Sum of Squared Differences (SSD) criterion. This leads to a Kriging-regression model in the form of a Gaussian process representing uncertainty on the Kriging estimate of the measured displacement field. The method is demonstrated using numerical and experimental examples. Kriging estimates of displacement fields are shown to be in excellent agreement with 'true' values for the numerical cases and in the experimental example uncertainty quantification is carried out using the Gaussian random process that forms part of the Kriging model. The root mean square error (RMSE) on the estimated displacements is produced and standard deviations on local strain estimates are determined.
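A minimal Kriging (Gaussian-process) regression, with a nugget term playing the role of the local measurement-error assessment described above. The DIC-specific machinery is omitted, and the kernel length-scale, noise level and test function are invented for the sketch.

```python
import numpy as np

rng = np.random.default_rng(5)
x = np.linspace(0, 1, 30)
y = np.sin(2 * np.pi * x) + 0.05 * rng.normal(size=x.size)  # noisy "measured field"

def rbf(a, b, ell=0.15):
    """Squared-exponential correlation between point sets a and b."""
    return np.exp(-0.5 * (a[:, None] - b[None, :]) ** 2 / ell ** 2)

noise = 0.05 ** 2                                # assumed measurement-error variance
K = rbf(x, x) + noise * np.eye(x.size)           # regularised correlation matrix
xs = np.linspace(0, 1, 101)
Ks = rbf(xs, x)

mean = Ks @ np.linalg.solve(K, y)                # Kriging estimate of the field
cov = rbf(xs, xs) - Ks @ np.linalg.solve(K, Ks.T)
sd = np.sqrt(np.clip(np.diag(cov), 0, None))     # pointwise uncertainty
err = float(np.abs(mean - np.sin(2 * np.pi * xs)).max())
print(round(err, 2))
```

The Gaussian process returns both a smoothed estimate of the field and a standard deviation at every point, which is the uncertainty-quantification ingredient the abstract exploits.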
Survival analysis and Cox regression.
Benítez-Parejo, N; Rodríguez del Águila, M M; Pérez-Vicente, S
2011-01-01
The data provided by clinical trials are often expressed in terms of survival. The analysis of survival comprises a series of statistical analytical techniques in which the measurements analysed represent the time elapsed between a given exposure and the outcome of a certain event. Despite the name of these techniques, the outcome in question does not necessarily have to be either survival or death, and may be healing versus no healing, relief versus pain, complication versus no complication, relapse versus no relapse, etc. The present article describes the analysis of survival from both a descriptive perspective, based on the Kaplan-Meier estimation method, and in terms of bivariate comparisons using the log-rank statistic. Likewise, a description is provided of the Cox regression models for the study of risk factors or covariates associated with the probability of survival. These models are defined in both simple and multiple forms, and a description is provided of how they are calculated and how the postulates for application are checked, accompanied by illustrative examples using the free software application R.
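The Kaplan-Meier estimator described above reduces to a short product over event times. A hand-rolled sketch on invented censored data (the article's examples use R; this is a Python illustration):

```python
import numpy as np

# (time, event) pairs: event=1 means the event was observed, 0 means censored
time  = np.array([2, 3, 3, 5, 6, 7, 9, 11, 12, 14])
event = np.array([1, 1, 0, 1, 0, 1, 1, 0,  1,  1])

surv = 1.0
km = {}
for t in np.unique(time[event == 1]):        # step only at observed event times
    at_risk = np.sum(time >= t)              # still under observation just before t
    d = np.sum((time == t) & (event == 1))   # events at t
    surv *= 1 - d / at_risk                  # product-limit update
    km[t] = surv
print({int(k): round(v, 3) for k, v in km.items()})
```

Censored subjects leave the risk set without contributing an event, which is exactly how the estimator uses incomplete follow-up.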
Parametric regression model for survival data: Weibull regression model as an example
2016-01-01
The Weibull regression model is one of the most popular forms of parametric regression model, in that it provides an estimate of the baseline hazard function as well as coefficients for covariates. Because of technical difficulties, the Weibull regression model is seldom used in the medical literature compared to the semi-parametric proportional hazards model. To make clinical investigators familiar with the Weibull regression model, this article introduces some basic knowledge about the model and then illustrates how to fit it with the R software. The SurvRegCensCov package is useful in converting estimated coefficients to clinically relevant statistics such as the hazard ratio (HR) and event time ratio (ETR). Model adequacy can be assessed by inspecting Kaplan-Meier curves stratified by categorical variables. The eha package provides an alternative method for fitting the Weibull regression model. The check.dist() function helps to assess the goodness-of-fit of the model. Variable selection is based on the importance of a covariate, which can be tested using the anova() function. Alternatively, backward elimination starting from a full model is an efficient way of model development. Visualization of the Weibull regression model after model development is useful in that it provides another way to report findings. PMID:28149846
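Since the R packages named in the abstract are not reproduced here, a bare-bones numerical illustration of the parametric idea: fitting a (censoring-free, covariate-free) Weibull model by maximum likelihood. The shape and scale values are invented, and real survival data would also require censoring terms in the likelihood.

```python
import numpy as np
from scipy.optimize import minimize

rng = np.random.default_rng(11)
t = rng.weibull(2.0, 500) * 3.0              # simulated times: shape 2, scale 3

def nll(theta):
    """Negative Weibull log-likelihood; parameters on the log scale for positivity."""
    k, lam = np.exp(theta)
    return -np.sum(np.log(k / lam) + (k - 1) * np.log(t / lam) - (t / lam) ** k)

res = minimize(nll, np.log([1.0, 1.0]), method="Nelder-Mead")
k_hat, lam_hat = np.exp(res.x)
print(round(k_hat, 2), round(lam_hat, 2))
```

The estimated shape parameter determines whether the baseline hazard rises or falls with time, which is the extra information a parametric model provides over the semi-parametric Cox model.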
[Alternative imprisonment measures in the penal legislation of the Socialist Republic of Poland].
de Sanctis, S; Sclafani, F
1976-01-01
On April 19th, 1969, the Parliament of People's Poland adopted new penal legislation consisting of a Penal Code, a Code of Criminal Procedure and an executory Penal Code, which all came into force on January 1st, 1970. The distinctive peculiarity of the above-mentioned codes, as the authors point out, is that they conform to the recent political, social and economic democratization of Poland and to the structure and dynamics of delinquency in the country. In examining the measures concerning educational rehabilitation provided for by the penal law in question, the authors discuss questions such as restriction of personal freedom, conditional suspension of criminal proceedings, supervision of the convict who has been given the benefit of conditional suspension of the execution of a sentence or of conditional release, and supervision of the recidivist, as well as points related to warranty and post-jail care. Though these measures show some similarity to those provided for by bourgeois penal legislations, they result from a quite antithetic ideology according to which punishment has to perform the double function of defending working-class achievements and of effectively rehabilitating the convict. A detailed analysis of the theoretical premises and of the actual carrying out of these measures shows, the authors say, that though they represent only a further stage in the development of socio-juridical progress, they are in conformity with the criminal policy trends and the politico-ideological content of a socialist government. They therefore testify to an actual effort to make social rehabilitation of the convict possible, in that they prevent the negative psycho-pathogenic influences of the jail environment on the convict, especially in the case of slight crimes. The ultimate importance of these measures is that they can be considered an articulate attempt to promote actual social rehabilitation of the convict and therefore they are
Multiple Regression and Its Discontents
ERIC Educational Resources Information Center
Snell, Joel C.; Marsh, Mitchell
2012-01-01
Multiple regression is part of a larger statistical strategy originated by Gauss. The authors raise questions about the theory and suggest some changes that would make room for Mandelbrot and Serendipity.
Wrong Signs in Regression Coefficients
NASA Technical Reports Server (NTRS)
McGee, Holly
1999-01-01
When using parametric cost estimation, it is important to note the possibility of the regression coefficients having the wrong sign. A wrong sign is defined as a sign on the regression coefficient opposite to the researcher's intuition and experience. Some possible causes for the wrong sign discussed in this paper are a small range of x's, leverage points, missing variables, multicollinearity, and computational error. Additionally, techniques for determining the cause of the wrong sign are given.
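A quick simulation of the multicollinearity cause mentioned above: with two nearly identical predictors, the individual coefficients are so unstable that one may take a sign opposite to the researcher's intuition, while their sum stays well determined. All numbers are invented for the example.

```python
import numpy as np

rng = np.random.default_rng(6)
n = 50
x1 = rng.normal(size=n)
x2 = x1 + 0.01 * rng.normal(size=n)          # nearly collinear with x1
y = x1 + x2 + 0.5 * rng.normal(size=n)       # both true coefficients are +1

X = np.column_stack([np.ones(n), x1, x2])
beta, *_ = np.linalg.lstsq(X, y, rcond=None)
print(beta[1:].round(1))                     # individual estimates are unstable
print(round(float(beta[1] + beta[2]), 2))    # their sum is well determined
```

This is why a wrong sign under collinearity need not indicate a data error: the design simply cannot separate the two predictors' contributions, only their combined effect.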
Reconstruction of difference in sequential CT studies using penalized likelihood estimation
Pourmorteza, A; Dang, H; Siewerdsen, J H; Stayman, J W
2016-01-01
Characterization of anatomical change and other differences is important in sequential computed tomography (CT) imaging, where a high-fidelity patient-specific prior image is typically present, but is not used, in the reconstruction of subsequent anatomical states. Here, we introduce a penalized likelihood (PL) method called reconstruction of difference (RoD) to directly reconstruct a difference image volume using both the current projection data and the (unregistered) prior image integrated into the forward model for the measurement data. The algorithm utilizes an alternating minimization to find both the registration and reconstruction estimates. This formulation allows direct control over the image properties of the difference image, permitting regularization strategies that inhibit noise and structural differences due to inconsistencies between the prior image and the current data. Additionally, if the change is known to be local, RoD allows local acquisition and reconstruction, as opposed to traditional model-based approaches that require a full support field of view (or other modifications). We compared the performance of RoD to a standard PL algorithm, in simulation studies and using test-bench cone-beam CT data. The performances of local and global RoD approaches were similar, with local RoD providing a significant computational speedup. In comparison across a range of data with differing fidelity, the local RoD approach consistently showed lower error (with respect to a truth image) than PL in both noisy data and sparsely sampled projection scenarios. In a study of the prior image registration performance of RoD, a clinically reasonable capture range was demonstrated. Lastly, the registration algorithm had a broad capture range and the error for reconstruction of CT data was 35% and 20% less than filtered back-projection for RoD and PL, respectively. RoD has potential for delivering high-quality difference images in a range of sequential clinical
A tutorial on Bayesian Normal linear regression
NASA Astrophysics Data System (ADS)
Klauenberg, Katy; Wübbeler, Gerd; Mickan, Bodo; Harris, Peter; Elster, Clemens
2015-12-01
Regression is a common task in metrology and often applied to calibrate instruments, evaluate inter-laboratory comparisons or determine fundamental constants, for example. Yet, a regression model cannot be uniquely formulated as a measurement function, and consequently the Guide to the Expression of Uncertainty in Measurement (GUM) and its supplements are not applicable directly. Bayesian inference, however, is well suited to regression tasks, and has the advantage of accounting for additional a priori information, which typically robustifies analyses. Furthermore, it is anticipated that future revisions of the GUM shall also embrace the Bayesian view. Guidance on Bayesian inference for regression tasks is largely lacking in metrology. For linear regression models with Gaussian measurement errors this tutorial gives explicit guidance. Divided into three steps, the tutorial first illustrates how a priori knowledge, which is available from previous experiments, can be translated into prior distributions from a specific class. These prior distributions have the advantage of yielding analytical, closed form results, thus avoiding the need to apply numerical methods such as Markov Chain Monte Carlo. Secondly, formulas for the posterior results are given, explained and illustrated, and software implementations are provided. In the third step, Bayesian tools are used to assess the assumptions behind the suggested approach. These three steps (prior elicitation, posterior calculation, and robustness to prior uncertainty and model adequacy) are critical to Bayesian inference. The general guidance given here for Normal linear regression tasks is accompanied by a simple, but real-world, metrological example. The calibration of a flow device serves as a running example and illustrates the three steps. It is shown that prior knowledge from previous calibrations of the same sonic nozzle enables robust predictions even for extrapolations.
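For the Normal linear case with known noise variance, the first two steps of the tutorial (prior elicitation and closed-form posterior calculation) look like this in code. The prior values and calibration data are invented, not the flow-device example from the paper.

```python
import numpy as np

rng = np.random.default_rng(7)
x = np.linspace(0, 1, 20)
X = np.column_stack([np.ones_like(x), x])
sigma2 = 0.1 ** 2                             # known measurement-error variance
y = 1.0 + 2.0 * x + rng.normal(0, 0.1, x.size)

# Prior from "previous experiments": theta ~ N(m0, V0)
m0 = np.array([0.8, 1.5])
V0 = np.diag([0.5 ** 2, 0.5 ** 2])

# Conjugate posterior: Normal likelihood with known sigma2 and Normal prior
Vn = np.linalg.inv(np.linalg.inv(V0) + X.T @ X / sigma2)
mn = Vn @ (np.linalg.inv(V0) @ m0 + X.T @ y / sigma2)
print(mn.round(2), np.sqrt(np.diag(Vn)).round(3))
```

No Markov Chain Monte Carlo is needed: with this conjugate prior class the posterior mean and covariance are available in closed form, which is the analytical tractability the tutorial emphasises.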
Crosby, Danielle A; Dowsett, Chantelle J; Gennetian, Lisa A; Huston, Aletha C
2010-09-01
We apply instrumental variables (IV) techniques to a pooled data set of employment-focused experiments to examine the relation between type of preschool childcare and subsequent externalizing problem behavior for a large sample of low-income children. To assess the potential usefulness of this approach for addressing biases that can confound causal inferences in child care research, we compare instrumental variables results with those obtained using ordinary least squares (OLS) regression. We find that our OLS estimates concur with prior studies showing small positive associations between center-based care and later externalizing behavior. By contrast, our IV estimates indicate that preschool-aged children with center care experience are rated by mothers and teachers as having fewer externalizing problems on entering elementary school than their peers who were not in child care as preschoolers. Findings are discussed in relation to the literature on associations between different types of community-based child care and children's social behavior, particularly within low-income populations. Moreover, we use this study to highlight the relative strengths and weaknesses of each analytic method for addressing causal questions in developmental research.
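The contrast the authors draw between OLS and IV estimates can be reproduced in miniature with two-stage least squares, using random assignment as the instrument. This is a stylised simulation: the effect size, selection mechanism and sample size are all invented, not taken from the pooled experiments.

```python
import numpy as np

rng = np.random.default_rng(8)
n = 20000
u = rng.normal(size=n)                        # unobserved selection factor
z = rng.binomial(1, 0.5, n)                   # randomized offer: the instrument
care = (0.5 * z + 0.5 * u + rng.normal(size=n) > 0.5).astype(float)
y = -0.5 * care + u + rng.normal(size=n)      # true causal effect of care: -0.5

# OLS is confounded because u drives both care uptake and the outcome
X = np.column_stack([np.ones(n), care])
b_ols, *_ = np.linalg.lstsq(X, y, rcond=None)

# Two-stage least squares: replace care with its first-stage projection on z
Z = np.column_stack([np.ones(n), z])
care_hat = Z @ np.linalg.lstsq(Z, care, rcond=None)[0]
X2 = np.column_stack([np.ones(n), care_hat])
b_iv, *_ = np.linalg.lstsq(X2, y, rcond=None)
print(round(float(b_ols[1]), 2), round(float(b_iv[1]), 2))
```

As in the abstract, the naive regression and the instrumented estimate can even disagree in sign, because only the latter isolates the variation in care induced by random assignment.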
Maximum Likelihood, Profile Likelihood, and Penalized Likelihood: A Primer
Cole, Stephen R.; Chu, Haitao; Greenland, Sander
2014-01-01
The method of maximum likelihood is widely used in epidemiology, yet many epidemiologists receive little or no education in the conceptual underpinnings of the approach. Here we provide a primer on maximum likelihood and some important extensions which have proven useful in epidemiologic research, and which reveal connections between maximum likelihood and Bayesian methods. For a given data set and probability model, maximum likelihood finds values of the model parameters that give the observed data the highest probability. As with all inferential statistical methods, maximum likelihood is based on an assumed model and cannot account for bias sources that are not controlled by the model or the study design. Maximum likelihood is nonetheless popular, because it is computationally straightforward and intuitive and because maximum likelihood estimators have desirable large-sample properties in the (largely fictitious) case in which the model has been correctly specified. Here, we work through an example to illustrate the mechanics of maximum likelihood estimation and indicate how improvements can be made easily with commercial software. We then describe recent extensions and generalizations which are better suited to observational health research and which should arguably replace standard maximum likelihood as the default method. PMID:24173548
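The mechanics the primer walks through can be shown on the simplest possible example: maximising a binomial log-likelihood numerically and checking the result against the closed-form answer. The data (7 events in 10 trials) are invented for the illustration.

```python
import numpy as np
from scipy.optimize import minimize_scalar

events, trials = 7, 10          # closed-form MLE is events/trials = 0.7

def neg_loglik(p):
    """Negative binomial log-likelihood (constant term omitted)."""
    return -(events * np.log(p) + (trials - events) * np.log(1 - p))

fit = minimize_scalar(neg_loglik, bounds=(1e-6, 1 - 1e-6), method="bounded")
print(round(fit.x, 3))
```

Maximum likelihood finds the parameter value giving the observed data the highest probability; here the numerical optimiser recovers the textbook proportion, and the same recipe extends to models without closed-form solutions.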
Consumer Education. An Introductory Unit for Inmates in Penal Institutions.
ERIC Educational Resources Information Center
Schmoele, Henry H.; And Others
This introductory consumer education curriculum outline contains materials designed to help soon-to-be-released prisoners to develop an awareness of consumer concerns and to better manage their family lives. Each of the four units provided includes lesson objectives, suggested contents, suggested teaching methods, handouts, and tests. The unit on…
Generalized Additive Models, Cubic Splines and Penalized Likelihood.
1987-05-22
…in case-control studies). All models in the table include dummy variables to account for the matching. The first 3 lines of the table indicate that… Breslow, N. and Day, N. (1980). Statistical Methods in Cancer Research, Volume 1: The Analysis of Case-Control Studies. International Agency
Survival Data and Regression Models
NASA Astrophysics Data System (ADS)
Grégoire, G.
2014-12-01
We start this chapter by introducing some basic elements for the analysis of censored survival data. Then we focus on right-censored data and develop two types of regression models. The first one concerns the so-called accelerated failure time (AFT) models, which are parametric models where a function of a parameter depends linearly on the covariates. The second one is a semiparametric model, where the covariates enter in a multiplicative form in the expression of the hazard rate function. The main statistical tool for analysing these regression models is the maximum likelihood methodology and, although we recall some essential results of ML theory, we refer to the chapter "Logistic Regression" for a more detailed presentation.
Regressive evolution in Astyanax cavefish.
Jeffery, William R
2009-01-01
A diverse group of animals, including members of most major phyla, have adapted to life in the perpetual darkness of caves. These animals are united by the convergence of two regressive phenotypes, loss of eyes and pigmentation. The mechanisms of regressive evolution are poorly understood. The teleost Astyanax mexicanus is of special significance in studies of regressive evolution in cave animals. This species includes an ancestral surface-dwelling form and many conspecific cave-dwelling forms, some of which have evolved their regressive phenotypes independently. Recent advances in Astyanax development and genetics have provided new information about how eyes and pigment are lost during cavefish evolution; namely, they have revealed some of the molecular and cellular mechanisms involved in trait modification, the number and identity of the underlying genes and mutations, the molecular basis of parallel evolution, and the evolutionary forces driving adaptation to the cave environment.
Transfer Learning Based on Logistic Regression
NASA Astrophysics Data System (ADS)
Paul, A.; Rottensteiner, F.; Heipke, C.
2015-08-01
In this paper we address the problem of classification of remote sensing images in the framework of transfer learning with a focus on domain adaptation. The main novel contribution is a method for transductive transfer learning in remote sensing on the basis of logistic regression. Logistic regression is a discriminative probabilistic classifier of low computational complexity, which can deal with multiclass problems. This research area deals with methods that solve problems in which labelled training data sets are assumed to be available only for a source domain, while classification is needed in the target domain with different, yet related characteristics. Classification takes place with a model of weight coefficients for hyperplanes which separate features in the transformed feature space. In terms of logistic regression, our domain adaptation method adjusts the model parameters by iterative labelling of the target test data set. These labelled data features are iteratively added to the current training set which, at the beginning, only contains source features and, simultaneously, a number of source features are deleted from the current training set. Experimental results based on a test series with synthetic and real data constitute a first proof-of-concept of the proposed method.
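A stripped-down sketch of the iterative self-labelling loop described above, using synthetic 1-D data instead of remote-sensing features and omitting the source-feature deletion step. Everything here is an invented stand-in for the authors' pipeline.

```python
import numpy as np

def fit_logistic(X, y, iters=25):
    """Newton-Raphson for binary logistic regression."""
    beta = np.zeros(X.shape[1])
    for _ in range(iters):
        p = 1 / (1 + np.exp(-X @ beta))
        W = p * (1 - p) + 1e-6               # small floor for numerical stability
        beta += np.linalg.solve(X.T @ (W[:, None] * X), X.T @ (y - p))
    return beta

rng = np.random.default_rng(10)

def make_domain(shift, n=200):
    x = np.concatenate([rng.normal(-1 + shift, 1, n), rng.normal(1 + shift, 1, n)])
    y = np.concatenate([np.zeros(n), np.ones(n)])
    return np.column_stack([np.ones(2 * n), x]), y

Xs, ys = make_domain(0.0)     # labelled source domain
Xt, yt = make_domain(0.8)     # shifted target domain (labels only used for scoring)

beta = fit_logistic(Xs, ys)
for _ in range(5):            # iterative self-labelling of the target data
    pseudo = (Xt @ beta > 0).astype(float)
    beta = fit_logistic(np.vstack([Xs, Xt]), np.concatenate([ys, pseudo]))

acc = float(((Xt @ beta > 0) == (yt == 1)).mean())
print(round(acc, 2))
```

The classifier trained on the source domain is gradually adapted by adding its own confident target labels to the training set, the same transductive idea as in the paper, minus the feature-deletion schedule.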
2011-01-01
Background: Dementia and cognitive impairment associated with aging are a major medical and social concern. Neuropsychological testing is a key element in the diagnostic procedures of Mild Cognitive Impairment (MCI), but has presently a limited value in the prediction of progression to dementia. We advance the hypothesis that newer statistical classification methods derived from data mining and machine learning methods like Neural Networks, Support Vector Machines and Random Forests can improve the accuracy, sensitivity and specificity of predictions obtained from neuropsychological testing. Seven nonparametric classifiers derived from data mining methods (Multilayer Perceptron Neural Networks, Radial Basis Function Neural Networks, Support Vector Machines, CART, CHAID and QUEST Classification Trees and Random Forests) were compared to three traditional classifiers (Linear Discriminant Analysis, Quadratic Discriminant Analysis and Logistic Regression) in terms of overall classification accuracy, specificity, sensitivity, area under the ROC curve and Press' Q. Model predictors were 10 neuropsychological tests currently used in the diagnosis of dementia. Statistical distributions of classification parameters obtained from a 5-fold cross-validation were compared using Friedman's nonparametric test. Results: The Press' Q test showed that all classifiers performed better than chance alone (p < 0.05). Support Vector Machines showed the largest overall classification accuracy (Median (Me) = 0.76) and an area under the ROC curve of Me = 0.90. However, this method showed high specificity (Me = 1.0) but low sensitivity (Me = 0.3). Random Forest ranked second in overall accuracy (Me = 0.73), with a high area under the ROC curve (Me = 0.73), specificity (Me = 0.73) and sensitivity (Me = 0.64). Linear Discriminant Analysis also showed acceptable overall accuracy (Me = 0.66), with an acceptable area under the ROC curve (Me = 0.72), specificity (Me = 0.66) and sensitivity (Me = 0.64). The remaining classifiers showed
Space-Time Event Sparse Penalization for Magneto-/Electroencephalography
Bolstad, Andrew; Van Veen, Barry; Nowak, Robert
2009-01-01
This article presents a new spatio-temporal method for M/EEG source reconstruction based on the assumption that only a small number of events, localized in space and/or time, are responsible for the measured signal. Each space-time event is represented using a basis function expansion which reflects the most relevant (or measurable) features of the signal. This model of neural activity leads naturally to a Bayesian likelihood function which balances the model fit to the data with the complexity of the model, where the complexity is related to the number of included events. A novel Expectation-Maximization algorithm which maximizes the likelihood function is presented. The new method is shown to be effective on several MEG simulations of neurological activity as well as data from a self-paced finger tapping experiment. PMID:19457366
Regression Segmentation for M³ Spinal Images.
Wang, Zhijie; Zhen, Xiantong; Tay, KengYeow; Osman, Said; Romano, Walter; Li, Shuo
2015-08-01
Clinical routine often requires analyzing spinal images of multiple anatomic structures in multiple anatomic planes from multiple imaging modalities (M³). Unfortunately, existing methods for segmenting spinal images are still limited to one specific structure, in one specific plane, or from one specific modality (S³). In this paper, we propose a novel approach, Regression Segmentation, that is for the first time able to segment M³ spinal images in one single unified framework. This approach innovatively formulates the segmentation task as a boundary regression problem: modeling a highly nonlinear mapping function from substantially diverse M³ images directly to the desired object boundaries. Leveraging advances in sparse kernel machines, regression segmentation is fulfilled by a multi-dimensional support vector regressor (MSVR) which operates in an implicit, high-dimensional feature space where M³ diversity and specificity can be systematically categorized, extracted, and handled. The proposed regression segmentation approach was thoroughly tested on images from 113 clinical subjects including both disc and vertebral structures, in both sagittal and axial planes, and from both MRI and CT modalities. The overall result reaches a high Dice similarity index (DSI) of 0.912 and a low boundary distance (BD) of 0.928 mm. With our unified and expandable framework, an efficient clinical tool for M³ spinal image segmentation can easily be achieved, and will substantially benefit the diagnosis and treatment of spinal diseases.
Kernel Partial Least Squares for Nonlinear Regression and Discrimination
NASA Technical Reports Server (NTRS)
Rosipal, Roman; Clancy, Daniel (Technical Monitor)
2002-01-01
This paper summarizes recent results on applying the method of partial least squares (PLS) in a reproducing kernel Hilbert space (RKHS). A previously proposed kernel PLS regression model was proven to be competitive with other regularized regression methods in RKHS. The family of nonlinear kernel-based PLS models is extended by considering the kernel PLS method for discrimination. Theoretical and experimental results on a two-class discrimination problem indicate usefulness of the method.
Model selection for logistic regression models
NASA Astrophysics Data System (ADS)
Duller, Christine
2012-09-01
Model selection for logistic regression models decides which of a set of potential regressors have an effect and hence should be included in the final model. A second interesting question is whether a certain factor is heterogeneous among some subsets, i.e. whether the model should include a random intercept or not. In this paper these questions are answered with classical as well as with Bayesian methods. The application shows some results of recent research projects in medicine and business administration.
Demonstration of a Fiber Optic Regression Probe
NASA Technical Reports Server (NTRS)
Korman, Valentin; Polzin, Kurt A.
2010-01-01
The capability to provide localized, real-time monitoring of material regression rates in various applications has the potential to provide a new stream of data for development testing of various components and systems, as well as serving as a monitoring tool in flight applications. These applications include, but are not limited to, the regression of a combusting solid fuel surface, the ablation of the throat in a chemical rocket or the heat shield of an aeroshell, and the monitoring of erosion in long-life plasma thrusters. The rate of regression in the first application is very fast, while the second and third are increasingly slower. A recent fundamental sensor development effort has led to a novel regression, erosion, and ablation sensor technology (REAST). The REAST sensor allows for measurement of real-time surface erosion rates at a discrete surface location. The sensor is optical, using two different, co-located fiber-optics to perform the regression measurement. The disparate optical transmission properties of the two fiber-optics makes it possible to measure the regression rate by monitoring the relative light attenuation through the fibers. As the fibers regress along with the parent material in which they are embedded, the relative light intensities through the two fibers changes, providing a measure of the regression rate. The optical nature of the system makes it relatively easy to use in a variety of harsh, high temperature environments, and it is also unaffected by the presence of electric and magnetic fields. In addition, the sensor could be used to perform optical spectroscopy on the light emitted by a process and collected by fibers, giving localized measurements of various properties. The capability to perform an in-situ measurement of material regression rates is useful in addressing a variety of physical issues in various applications. An in-situ measurement allows for real-time data regarding the erosion rates, providing a quick method for
Learning regulatory programs by threshold SVD regression.
Ma, Xin; Xiao, Luo; Wong, Wing Hung
2014-11-04
We formulate a statistical model for the regulation of global gene expression by multiple regulatory programs and propose a thresholding singular value decomposition (T-SVD) regression method for learning such a model from data. Extensive simulations demonstrate that this method offers improved computational speed and higher sensitivity and specificity over competing approaches. The method is used to analyze microRNA (miRNA) and long noncoding RNA (lncRNA) data from The Cancer Genome Atlas (TCGA) consortium. The analysis yields previously unidentified insights into the combinatorial regulation of gene expression by noncoding RNAs, as well as findings that are supported by evidence from the literature.
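The core of a thresholding-SVD regression can be illustrated on a multivariate linear model with a low-rank coefficient matrix: estimate the coefficients without penalty, then hard-threshold the small singular values. This is a simplified sketch under assumed dimensions and an arbitrary threshold tau, not the T-SVD estimator of the paper:

```python
import numpy as np

rng = np.random.default_rng(1)
n, p, q, r = 200, 10, 8, 2               # samples, predictors, responses, true rank
U0, _ = np.linalg.qr(rng.standard_normal((p, r)))
V0, _ = np.linalg.qr(rng.standard_normal((q, r)))
B_true = U0 @ np.diag([5.0, 3.0]) @ V0.T # rank-2 coefficient matrix
X = rng.standard_normal((n, p))
Y = X @ B_true + 0.1 * rng.standard_normal((n, q))

B_ols = np.linalg.lstsq(X, Y, rcond=None)[0]      # unpenalized estimate
U, s, Vt = np.linalg.svd(B_ols, full_matrices=False)
tau = 0.5                                         # hypothetical threshold
s_thr = np.where(s > tau, s, 0.0)                 # kill noise-level singular values
B_tsvd = U @ np.diag(s_thr) @ Vt                  # low-rank, denoised coefficients
rank = int((s_thr > 0).sum())                     # estimated number of "programs"
```

The surviving rank plays the role of the number of shared regulatory programs; in the paper the thresholding is embedded in the fitting itself rather than applied post hoc.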
[The concept of insanity in Danish penal and mental health codes].
Brandt-Christensen, Mette; Bertelsen, Aksel
2010-04-26
The medical terms insanity and psychosis are used synonymously to describe a condition with substantial changes in the "total" personality and loss of realism. In the 10th revision of the diagnostic classification system ICD-10 (1994) the intention was to replace the term "psychosis" with "psychotic" to indicate the presence of hallucinations and delusions. However, in Danish legislation - most importantly in the penal code and the Mental Health Act - the term "insanity" is still in use. This difference has led to diagnostic uncertainty, especially in clinical and forensic psychiatric practice.
1982-09-01
EMW-C-0682, Work Unit 2314H. Approved for public release. Ryland Research, Inc., Santa Barbara, California. Penal System Alternatives in Crisis Relocation Planning: Final Report, 21 September 1982, by John Erland Steen and Harvey Ryland. ... Expansion ratios from 103 to 11 have been observed. The visual blanking effect is extremely disorienting and will make any task requiring vision extremely
Godoy, Roberto L M
2009-01-01
This essay opposes the bipartite thesis of the capacity for penal culpability ("being able to understand the criminality of the act or being able to direct one's actions") with a unitary thesis, in which it seems biopsychologically impossible either to direct behaviour toward an object that has not previously been understood, or to divorce action completely from understanding (as follows from a maximal integration of the intellective, volitive and affective spheres of a dynamic psyche).
NASA Astrophysics Data System (ADS)
Wu, Chunhung
2016-04-01
Few studies have discussed the applicability of statistical landslide susceptibility (LS) models to extreme rainfall-induced landslide events. This research focuses on the comparison and applicability of LS models based on four methods, including landslide ratio-based logistic regression (LRBLR), frequency ratio (FR), weight of evidence (WOE), and instability index (II) methods, in an extreme rainfall-induced landslide case. The landslide inventory in the Chishan river watershed, Southwestern Taiwan, after 2009 Typhoon Morakot provides the main material for this research. The Chishan river watershed is a tributary of the Kaoping river watershed, a landslide- and erosion-prone watershed with an annual average suspended load of 3.6×10⁷ MT/yr (ranking 11th in the world). Typhoon Morakot struck Southern Taiwan from Aug. 6-10, 2009, and dumped nearly 2,000 mm of rainfall on the Chishan river watershed. The 24-hour, 48-hour, and 72-hour accumulated rainfalls in the Chishan river watershed exceeded the 200-year return period accumulated rainfall. 2,389 landslide polygons in the Chishan river watershed were extracted from SPOT 5 images after 2009 Typhoon Morakot. The total landslide area is around 33.5 km², equal to a landslide ratio of 4.1%. The main landslide types based on Varnes' (1978) classification are rotational and translational slides. The two characteristics of this extreme rainfall-induced landslide event are dense landslide distribution and a large share of downslope landslide areas owing to headward erosion and bank erosion during the flooding processes. The area of downslope landslides in the Chishan river watershed after 2009 Typhoon Morakot is 3.2 times larger than that of upslope landslides. The prediction accuracy of the LS models based on the LRBLR, FR, WOE, and II methods has been proven to exceed 70%. The model performance and applicability of the four models in a landslide-prone watershed with dense distribution of rainfall
Cactus: An Introduction to Regression
ERIC Educational Resources Information Center
Hyde, Hartley
2008-01-01
When the author first used "VisiCalc," the author thought it a very useful tool when he had the formulas. But how could he design a spreadsheet if there was no known formula for the quantities he was trying to predict? A few months later, the author relates he learned to use multiple linear regression software and suddenly it all clicked into…
Multiple Regression: A Leisurely Primer.
ERIC Educational Resources Information Center
Daniel, Larry G.; Onwuegbuzie, Anthony J.
Multiple regression is a useful statistical technique when the researcher is considering situations in which variables of interest are theorized to be multiply caused. It may also be useful in those situations in which the researcher is interested in studies of the predictability of phenomena of interest. This paper provides an introduction to…
Weighting Regressions by Propensity Scores
ERIC Educational Resources Information Center
Freedman, David A.; Berk, Richard A.
2008-01-01
Regressions can be weighted by propensity scores in order to reduce bias. However, weighting is likely to increase random error in the estimates, and to bias the estimated standard errors downward, even when selection mechanisms are well understood. Moreover, in some cases, weighting will increase the bias in estimated causal parameters. If…
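A standard propensity-weighting workflow, which also exhibits the bias reduction discussed above, looks roughly like this. The simulated confounder, treatment effect, and sample size below are illustrative assumptions, not values from the article:

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(2)
n = 5000
x = rng.standard_normal(n)                       # confounder
t = (rng.uniform(size=n) < 1 / (1 + np.exp(-x))).astype(float)  # selection on x
y = 2.0 * t + 3.0 * x + rng.standard_normal(n)   # true treatment effect = 2

naive = y[t == 1].mean() - y[t == 0].mean()      # confounded difference in means

ps = LogisticRegression().fit(x[:, None], t).predict_proba(x[:, None])[:, 1]
w = t / ps + (1 - t) / (1 - ps)                  # inverse-probability weights

# Weighted least squares of y on [1, t]: the coefficient on t is the weighted effect
Xd = np.column_stack([np.ones(n), t])
sw = np.sqrt(w)
beta = np.linalg.lstsq(Xd * sw[:, None], y * sw, rcond=None)[0]
```

The naive contrast is badly biased while the weighted estimate lands near 2; the article's caution is that the weights also inflate the variance of `beta`, which a comparison of standard errors across replications would reveal.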
Quantile Regression with Censored Data
ERIC Educational Resources Information Center
Lin, Guixian
2009-01-01
The Cox proportional hazards model and the accelerated failure time model are frequently used in survival data analysis. They are powerful, yet have limitation due to their model assumptions. Quantile regression offers a semiparametric approach to model data with possible heterogeneity. It is particularly powerful for censored responses, where the…
Montgomery, Katherine L; Vaughn, Michael G; Thompson, Sanna J; Howard, Matthew O
2013-11-01
Research on juvenile offenders has largely treated this population as a homogeneous group. However, recent findings suggest that this at-risk population may be considerably more heterogeneous than previously believed. This study compared mixture regression analyses with standard regression techniques in an effort to explain how known factors such as distress, trauma, and personality are associated with drug abuse among juvenile offenders. Researchers recruited 728 juvenile offenders from Missouri juvenile correctional facilities for participation in this study. Researchers investigated past-year substance use in relation to the following variables: demographic characteristics (gender, ethnicity, age, familial use of public assistance), antisocial behavior, and mental illness symptoms (psychopathic traits, psychiatric distress, and prior trauma). Results indicated that standard and mixed regression approaches identified significant variables related to past-year substance use among this population; however, the mixture regression methods provided greater specificity in results. Mixture regression analytic methods may help policy makers and practitioners better understand and intervene with the substance-related subgroups of juvenile offenders.
ERIC Educational Resources Information Center
California State Office of the Attorney General, Sacramento.
This handbook was prepared to ensure that, as required by section 626.1 of the California Penal Code in 1984, "students, parents, and all school officials and employees have access to a concise, easily understandable summary of California penal and civil law pertaining to crimes committed against persons or property on school grounds."…
ERIC Educational Resources Information Center
Torrence, John Thomas
Excluding military installations, training programs in state and federal penal institutions were surveyed, through a mailed checklist, to test the hypotheses that (1) training programs in penal institutions were not related to the unfilled job openings by major occupations in the United States, and (2) that training programs reported would have a…
Code of Federal Regulations, 2011 CFR
2011-10-01
... 45 Public Welfare 2 2011-10-01 2011-10-01 false Can a family be penalized if a parent refuses to... Provisions Addressing Individual Responsibility? § 261.15 Can a family be penalized if a parent refuses to... based on an individual's refusal to engage in required work if the individual is a single...
Code of Federal Regulations, 2010 CFR
2010-10-01
... 45 Public Welfare 2 2010-10-01 2010-10-01 false Can a family be penalized if a parent refuses to... Provisions Addressing Individual Responsibility? § 261.15 Can a family be penalized if a parent refuses to... based on an individual's refusal to engage in required work if the individual is a single...
Independent motion detection with a rival penalized adaptive particle filter
NASA Astrophysics Data System (ADS)
Becker, Stefan; Hübner, Wolfgang; Arens, Michael
2014-10-01
Aggregation of pixel-based motion detection into regions of interest, which include views of single moving objects in a scene, is an essential pre-processing step in many vision systems. Motion events of this type provide significant information about the object type or build the basis for action recognition. Further, motion is an essential saliency measure, which is able to effectively support high-level image analysis. When applied to static cameras, background subtraction methods achieve good results. On the other hand, motion aggregation on freely moving cameras is still a widely unsolved problem. The image flow measured on a freely moving camera results from two major motion types: first, the ego-motion of the camera, and second, object motion that is independent of the camera motion. When capturing a scene with a camera, these two motion types are adversely blended together. In this paper, we propose an approach to detect multiple moving objects from a mobile monocular camera system in an outdoor environment. The overall processing pipeline consists of a fast ego-motion compensation algorithm in the preprocessing stage. Real-time performance is achieved by using a sparse optical flow algorithm as an initial processing stage and a densely applied probabilistic filter in the post-processing stage. Thereby, we follow the idea proposed by Jung and Sukhatme. Normalized intensity differences originating from a sequence of ego-motion compensated difference images represent the probability of moving objects. Noise and registration artefacts are filtered out using a Bayesian formulation. The resulting a posteriori distribution is located on image regions showing strong amplitudes in the difference image which are in accordance with the motion prediction. In order to effectively estimate the a posteriori distribution, a particle filter is used. In addition to the fast ego-motion compensation, the main contribution of this paper is the design of the probabilistic
RECONSTRUCTING DNA COPY NUMBER BY PENALIZED ESTIMATION AND IMPUTATION.
Zhang, Zhongyang; Lange, Kenneth; Ophoff, Roel; Sabatti, Chiara
2010-12-01
Recent advances in genomics have underscored the surprising ubiquity of DNA copy number variation (CNV). Fortunately, modern genotyping platforms also detect CNVs with fairly high reliability. Hidden Markov models and algorithms have played a dominant role in the interpretation of CNV data. Here we explore CNV reconstruction via estimation with a fused-lasso penalty as suggested by Tibshirani and Wang [Biostatistics 9 (2008) 18-29]. We mount a fresh attack on this difficult optimization problem by the following: (a) changing the penalty terms slightly by substituting a smooth approximation to the absolute value function, (b) designing and implementing a new MM (majorization-minimization) algorithm, and (c) applying a fast version of Newton's method to jointly update all model parameters. Together these changes enable us to minimize the fused-lasso criterion in a highly effective way. We also reframe the reconstruction problem in terms of imputation via discrete optimization. This approach is easier and more accurate than parameter estimation because it relies on the fact that only a handful of possible copy number states exist at each SNP. The dynamic programming framework has the added bonus of exploiting information that the current fused-lasso approach ignores. The accuracy of our imputations is comparable to that of hidden Markov models at a substantially lower computational cost.
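Step (a) above, substituting a smooth approximation to the absolute value, already makes the fused-lasso criterion amenable to plain gradient descent. The sketch below uses sqrt(u² + ε) for |u| on a toy piecewise-constant signal; the penalty weights, smoothing ε, and step size are illustrative, and the authors' MM/Newton machinery is not reproduced:

```python
import numpy as np

def dsabs(u, eps=1e-2):
    return u / np.sqrt(u * u + eps)  # derivative of sqrt(u^2 + eps), smooth |u|

rng = np.random.default_rng(3)
truth = np.concatenate([np.zeros(40), np.full(20, 1.5), np.zeros(40)])
y = truth + 0.2 * rng.standard_normal(truth.size)   # noisy copy-number track

lam1, lam2, step = 0.05, 1.0, 0.01   # illustrative penalty weights / step size
b = y.copy()
for _ in range(3000):                # gradient descent on the smoothed criterion
    d = np.diff(b)
    grad = (b - y) + lam1 * dsabs(b)
    grad[:-1] -= lam2 * dsabs(d)     # d/db_i of sabs(b_{i+1} - b_i)
    grad[1:] += lam2 * dsabs(d)      # d/db_i of sabs(b_i - b_{i-1})
    b -= step * grad
```

The fused term rewards neighboring coordinates for agreeing, so the estimate comes out nearly piecewise constant: the hallmark of a copy-number segmentation.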
Tutorial on Using Regression Models with Count Outcomes Using R
ERIC Educational Resources Information Center
Beaujean, A. Alexander; Morgan, Grant B.
2016-01-01
Education researchers often study count variables, such as times a student reached a goal, discipline referrals, and absences. Most researchers that study these variables use typical regression methods (i.e., ordinary least-squares) either with or without transforming the count variables. In either case, using typical regression for count data can…
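A common alternative to ordinary least-squares for counts is Poisson regression; although the tutorial uses R, the same model can be fit anywhere with Newton-Raphson (equivalently IRLS). A minimal Python sketch on simulated data, not the article's examples:

```python
import numpy as np

rng = np.random.default_rng(4)
n = 2000
x = rng.uniform(0, 1, n)                 # e.g. a standardized predictor of absences
beta_true = np.array([0.5, 1.2])
y = rng.poisson(np.exp(beta_true[0] + beta_true[1] * x))   # count outcome

X = np.column_stack([np.ones(n), x])
beta = np.array([np.log(y.mean()), 0.0])  # safe intercept-only starting point
for _ in range(25):                       # Newton-Raphson on the log-likelihood
    mu = np.exp(X @ beta)                 # model mean, log link
    grad = X.T @ (y - mu)                 # score of the Poisson log-likelihood
    H = X.T @ (X * mu[:, None])           # Fisher information (mean = variance)
    beta += np.linalg.solve(H, grad)
```

Because the link is log, exp(beta[1]) is the multiplicative change in the expected count per unit of x, which is the interpretation the tutorial recommends over transformed-OLS slopes.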
An Importance Sampling EM Algorithm for Latent Regression Models
ERIC Educational Resources Information Center
von Davier, Matthias; Sinharay, Sandip
2007-01-01
Reporting methods used in large-scale assessments such as the National Assessment of Educational Progress (NAEP) rely on latent regression models. To fit the latent regression model using the maximum likelihood estimation technique, multivariate integrals must be evaluated. In the computer program MGROUP used by the Educational Testing Service for…
Semisupervised Clustering by Iterative Partition and Regression with Neuroscience Applications.
Qian, Guoqi; Wu, Yuehua; Ferrari, Davide; Qiao, Puxue; Hollande, Frédéric
2016-01-01
Regression clustering is a mixture of unsupervised and supervised statistical learning and data mining method which is found in a wide range of applications including artificial intelligence and neuroscience. It performs unsupervised learning when it clusters the data according to their respective unobserved regression hyperplanes. The method also performs supervised learning when it fits regression hyperplanes to the corresponding data clusters. Applying regression clustering in practice requires means of determining the underlying number of clusters in the data, finding the cluster label of each data point, and estimating the regression coefficients of the model. In this paper, we review the estimation and selection issues in regression clustering with regard to the least squares and robust statistical methods. We also provide a model selection based technique to determine the number of regression clusters underlying the data. We further develop a computing procedure for regression clustering estimation and selection. Finally, simulation studies are presented for assessing the procedure, together with analyzing a real data set on RGB cell marking in neuroscience to illustrate and interpret the method.
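The iterate-partition-then-regress loop described above can be sketched in a few lines for two clusters of lines: assign each point to its nearest regression line, refit each line on its cluster, and repeat, with random restarts to dodge local optima. All data and settings below are illustrative, not the paper's estimators:

```python
import numpy as np

rng = np.random.default_rng(5)
n, K = 300, 2
x = rng.uniform(-1, 1, n)
z = rng.integers(0, K, n)                       # hidden cluster memberships
slopes, icepts = np.array([3.0, -2.0]), np.array([0.0, 1.0])
y = icepts[z] + slopes[z] * x + 0.1 * rng.standard_normal(n)

X = np.column_stack([np.ones(n), x])

def fit_once(seed, iters=50):
    r = np.random.default_rng(seed)
    coef = r.standard_normal((K, 2))            # random [intercept, slope] per cluster
    for _ in range(iters):
        resid = y[:, None] - X @ coef.T
        labels = np.abs(resid).argmin(axis=1)   # partition: nearest regression line
        for k in range(K):                      # regression: refit each line
            m = labels == k
            if m.sum() >= 2:
                coef[k] = np.linalg.lstsq(X[m], y[m], rcond=None)[0]
    resid = np.abs(y[:, None] - X @ coef.T)
    return (resid.min(axis=1) ** 2).sum(), coef

sse, coef = min((fit_once(s) for s in range(10)), key=lambda t: t[0])
```

Selecting K itself is the model-selection problem the paper addresses; here K is simply assumed known.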
Contrasting OLS and Quantile Regression Approaches to Student "Growth" Percentiles
ERIC Educational Resources Information Center
Castellano, Katherine Elizabeth; Ho, Andrew Dean
2013-01-01
Regression methods can locate student test scores in a conditional distribution, given past scores. This article contrasts and clarifies two approaches to describing these locations in terms of readily interpretable percentile ranks or "conditional status percentile ranks." The first is Betebenner's quantile regression approach that results in…
ERIC Educational Resources Information Center
Waller, Niels; Jones, Jeff
2011-01-01
We describe methods for assessing all possible criteria (i.e., dependent variables) and subsets of criteria for regression models with a fixed set of predictors, x (where x is an n x 1 vector of independent variables). Our methods build upon the geometry of regression coefficients (hereafter called regression weights) in n-dimensional space. For a…
Nasrollahi, S M; Imani, M; Zebeli, Q
2015-12-01
A meta-analysis of the effect of forage particle size (FPS) on nutrient intake, digestibility, and milk production of dairy cattle was conducted using published data from the literature (1998-2014). Meta-regression was used to evaluate the effect of forage level, source, and preservation method on heterogeneity of the results for FPS. A total of 46 papers and 28 to 91 trials (each trial consisting of 2 treatment means) that reported changes in FPS in the diet of dairy cattle were identified. Estimated effect sizes of FPS were calculated on nutrient intake, nutrient digestibility, and milk production and composition. Intakes of dry matter and neutral detergent fiber increased with decreasing FPS (0.527 and 0.166 kg/d, respectively) but neutral detergent fiber digestibility decreased (0.6%) with decreasing FPS. Heterogeneity (amount of variation among studies) was significant for all intake and digestibility parameters, and the improvement in feed intake only occurred with decreasing FPS for diets containing a high level of forage (>50%). Also, the improvement in dry matter intake due to lowering FPS occurred for diets containing silage but not hay. Digestibility of dry matter increased with decreasing FPS when the forage source of the diet was not corn. Milk production consistently increased (0.541 kg/d; heterogeneity=19%) and milk protein production increased (0.02 kg/d) as FPS decreased, but FCM was not affected by FPS. Likewise, milk fat percentage decreased (0.058%) with decreasing FPS. The heterogeneity of milk parameters (including fat-corrected milk, milk fat, and milk protein), other than milk production, was also significant. Decreasing FPS in high-forage diets (>50%) increased milk protein production by 0.027%. Decreasing FPS increased milk protein content in corn forage-based diets and milk fat and protein percentage in hay-based diets. In conclusion, FPS has the potential to affect feed intake and milk production of dairy cows, but its effects depend upon
Sex offender punishment and the persistence of penal harm in the U.S.
Leon, Chrysanthi S
2011-01-01
The U.S. has dramatically revised its approach to punishment in the last several decades. In particular, people convicted of sex crimes have experienced a remarkable expansion in social control through a wide range of post-conviction interventions. While this expansion may be largely explained by general punishment trends, there appear to be unique factors that have prevented other penal reforms from similarly modulating sex offender punishment. In part, this continuation of a "penal harm" approach to sex offenders relates to the past under-valuing of sexual victimization. In the "bad old days," the law and its agents sent mixed messages about sexual violence and sexual offending. Some sexual offending was mere nuisance, some was treatable, and a fraction "deserved" punishment equivalent to other serious criminal offending. In contrast, today's sex offender punishment schemes rarely distinguish formally among gradations of harm or dangerousness. After examining incarceration trends, this article explores the historical context of the current broad-brush approach and reviews the unintended consequences. Altogether, this article reinforces the need to return to differentiation among sex offenders, but differentiation based on science and on the experience-based, guided discretion of experts in law enforcement, corrections, and treatment.
Hierarchical regression for epidemiologic analyses of multiple exposures
Greenland, S.
1994-11-01
Many epidemiologic investigations are designed to study the effects of multiple exposures. Most of these studies are analyzed either by fitting a risk-regression model with all exposures forced into the model, or by using a preliminary-testing algorithm, such as stepwise regression, to produce a smaller model. Research indicates that hierarchical modeling methods can outperform these conventional approaches. Two hierarchical methods, empirical-Bayes regression and a variant here called "semi-Bayes" regression, are reviewed and compared to full-model maximum likelihood and to model reduction by preliminary testing. The performance of the methods in a problem of predicting neonatal-mortality rates is compared. Based on the literature to date, it is suggested that hierarchical methods should become part of the standard approaches to multiple-exposure studies. 35 refs., 1 fig., 1 tab.
Robust regression with asymmetric heavy-tail noise distributions.
Takeuchi, Ichiro; Bengio, Yoshua; Kanamori, Takafumi
2002-10-01
In the presence of a heavy-tail noise distribution, regression becomes much more difficult. Traditional robust regression methods assume that the noise distribution is symmetric, and they downweight the influence of so-called outliers. When the noise distribution is asymmetric, these methods yield biased regression estimators. Motivated by data-mining problems for the insurance industry, we propose a new approach to robust regression tailored to deal with asymmetric noise distribution. The main idea is to learn most of the parameters of the model using conditional quantile estimators (which are biased but robust estimators of the regression) and to learn a few remaining parameters to combine and correct these estimators, to minimize the average squared error in an unbiased way. Theoretical analysis and experiments show the clear advantages of the approach. Results are on artificial data as well as insurance data, using both linear and neural network predictors.
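The two-stage idea, robust quantile-based estimation of most parameters followed by a squared-error correction of the rest, can be sketched as follows: fit the slope by median (L1) regression, then re-estimate the intercept by least squares with the slope fixed. The exponential noise and the IRLS solver are illustrative choices, not the authors' setup:

```python
import numpy as np

def median_regression(X, y, iters=50, delta=1e-6):
    """L1 (median) regression via iteratively reweighted least squares."""
    beta = np.linalg.lstsq(X, y, rcond=None)[0]      # OLS start
    for _ in range(iters):
        r = y - X @ beta
        w = 1.0 / np.maximum(np.abs(r), delta)       # IRLS weights for the L1 loss
        sw = np.sqrt(w)
        beta = np.linalg.lstsq(X * sw[:, None], y * sw, rcond=None)[0]
    return beta

rng = np.random.default_rng(6)
n = 1000
x = rng.uniform(0, 2, n)
y = 1.0 + 2.0 * x + rng.exponential(1.0, n)   # asymmetric noise with mean 1

X = np.column_stack([np.ones(n), x])
b_med = median_regression(X, y)               # robust slope, median-biased intercept
slope = b_med[1]
icept = (y - slope * x).mean()                # squared-error correction step
```

The median fit targets the conditional median (intercept near 1 + ln 2), so the closing least-squares step is what restores an unbiased estimate of the conditional mean (intercept near 2), which is the paper's combine-and-correct idea in miniature.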
Inferring gene regression networks with model trees
2010-01-01
Background Novel strategies are required in order to handle the huge amount of data produced by microarray technologies. To infer gene regulatory networks, the first step is to find direct regulatory relationships between genes by building so-called gene co-expression networks. These are typically generated using correlation statistics as pairwise similarity measures. Correlation-based methods are very useful for determining whether two genes have a strong global similarity, but they do not detect local similarities. Results We propose model trees as a method to identify gene interaction networks. While correlation-based methods analyze each pair of genes, in our approach we generate a single regression tree for each gene from the remaining genes. Finally, a graph of all the relationships among output and input genes is built, taking into account whether each pair of genes is statistically significant. For this reason we apply a statistical procedure to control the false discovery rate. The performance of our approach, named REGNET, is experimentally tested on two well-known data sets: the Saccharomyces cerevisiae and E. coli data sets. First, the biological coherence of the results is tested. Second, the E. coli transcriptional network (in the RegulonDB database) is used as a control to compare the results to those of a correlation-based method. This experiment shows that REGNET performs more accurately at detecting true gene associations than the Pearson and Spearman zeroth- and first-order correlation-based methods. Conclusions REGNET generates gene association networks from gene expression data, and differs from correlation-based methods in that the relationship between one gene and others is calculated simultaneously. Model trees are very useful techniques for estimating the numerical values of the target genes by linear regression functions. They are often more precise than linear regression models because they can adjust different linear regressions to separate
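The per-gene tree idea can be sketched with ordinary regression trees standing in for model trees (scikit-learn ships no model-tree learner): fit one tree per gene on the remaining genes and keep high-importance inputs as candidate edges. The importance cutoff below is a crude stand-in for REGNET's FDR-controlled significance test, and the 5-gene expression matrix is simulated:

```python
import numpy as np
from sklearn.tree import DecisionTreeRegressor

rng = np.random.default_rng(7)
n, g = 400, 5
E = rng.standard_normal((n, g))                 # expression matrix (samples x genes)
E[:, 4] = 2.0 * E[:, 1] - 1.5 * E[:, 2] + 0.1 * rng.standard_normal(n)

edges = set()
for target in range(g):
    inputs = [j for j in range(g) if j != target]
    tree = DecisionTreeRegressor(max_depth=4, random_state=0)
    tree.fit(E[:, inputs], E[:, target])        # one tree per target gene
    for j, imp in zip(inputs, tree.feature_importances_):
        if imp > 0.1:                           # crude cutoff; REGNET uses an FDR test
            edges.add(tuple(sorted((j, target))))
```

Trees fit to pure-noise genes will still report nonzero importances, which is exactly why the paper replaces a fixed cutoff with a statistical significance procedure.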
NASA Astrophysics Data System (ADS)
Ahn, Kuk-Hyun; Palmer, Richard
2016-09-01
Despite wide use of regression-based regional flood frequency analysis (RFFA) methods, the majority are based on either ordinary least squares (OLS) or generalized least squares (GLS). This paper proposes 'spatial proximity' based RFFA methods using the spatial lag model (SLM) and spatial error model (SEM). The proposed methods are represented by two frameworks: the quantile regression technique (QRT) and the parameter regression technique (PRT). The QRT develops prediction equations for flood quantiles at average recurrence intervals (ARIs) of 2, 5, 10, 20, and 100 years, whereas the PRT provides predictions of three parameters of the selected distribution. The proposed methods are tested using data incorporating 30 basin characteristics from 237 basins in the Northeastern United States. Results show that the generalized extreme value (GEV) distribution properly represents flood frequencies at the study gages. Also, basin area, stream network, and precipitation seasonality are found to be the most effective explanatory variables in prediction modeling by the QRT and PRT. 'Spatial proximity' based RFFA methods provide reliable flood quantile estimates compared to simpler methods. Compared to the QRT, the PRT may be recommended due to its accuracy and computational simplicity. The results presented in this paper may serve as one possible guidepost for hydrologists interested in flood analysis at ungaged sites.
2012-01-01
Background Genomic selection (GS) is emerging as an efficient and cost-effective method for estimating breeding values using molecular markers distributed over the entire genome. In essence, it involves estimating the simultaneous effects of all genes or chromosomal segments and combining the estimates to predict the total genomic breeding value (GEBV). Accurate prediction of GEBVs is a central and recurring challenge in plant and animal breeding. The existence of a bewildering array of approaches for predicting breeding values using markers underscores the importance of identifying approaches able to efficiently and accurately predict breeding values. Here, we comparatively evaluate the predictive performance of six regularized linear regression methods-- ridge regression, ridge regression BLUP, lasso, adaptive lasso, elastic net and adaptive elastic net-- for predicting GEBV using dense SNP markers. Methods We predicted GEBVs for a quantitative trait using a dataset on 3000 progenies of 20 sires and 200 dams and an accompanying genome consisting of five chromosomes with 9990 biallelic SNP-marker loci simulated for the QTL-MAS 2011 workshop. We applied all the six methods that use penalty-based (regularization) shrinkage to handle datasets with far more predictors than observations. The lasso, elastic net and their adaptive extensions further possess the desirable property that they simultaneously select relevant predictive markers and optimally estimate their effects. The regression models were trained with a subset of 2000 phenotyped and genotyped individuals and used to predict GEBVs for the remaining 1000 progenies without phenotypes. Predictive accuracy was assessed using the root mean squared error, the Pearson correlation between predicted GEBVs and (1) the true genomic value (TGV), (2) the true breeding value (TBV) and (3) the simulated phenotypic values based on fivefold cross-validation (CV). Results The elastic net, lasso, adaptive lasso and the
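A minimal illustration of the penalty-based shrinkage these six methods share: the lasso fit by cyclic coordinate descent with soft-thresholding. The simulated marker data and penalty value are illustrative, not the QTL-MAS 2011 dataset:

```python
import numpy as np

def soft_threshold(z, t):
    return np.sign(z) * np.maximum(np.abs(z) - t, 0.0)

def lasso_cd(X, y, lam, n_iter=200):
    """Lasso via cyclic coordinate descent, minimizing
    (1/2n)*||y - Xb||^2 + lam*||b||_1 (columns need not be standardized;
    each update divides by the column's mean square)."""
    n, p = X.shape
    b = np.zeros(p)
    col_ss = (X ** 2).sum(axis=0) / n
    r = y - X @ b
    for _ in range(n_iter):
        for j in range(p):
            r += X[:, j] * b[j]            # partial residual excluding marker j
            rho = X[:, j] @ r / n
            b[j] = soft_threshold(rho, lam) / col_ss[j]
            r -= X[:, j] * b[j]
    return b

# Sparse ground truth: only markers 0 and 3 carry signal.
rng = np.random.default_rng(1)
X = rng.normal(size=(200, 10))
y = 2.0 * X[:, 0] - 1.5 * X[:, 3] + rng.normal(0, 0.1, 200)
b = lasso_cd(X, y, lam=0.05)
```

The fit keeps the two informative markers and zeroes out the rest, which is the simultaneous selection-and-estimation property the abstract attributes to the lasso and elastic net.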
ERIC Educational Resources Information Center
Semmens, Bob, Ed.; Cook, Sandy, Ed.
This document contains 19 papers presented at an international forum on education in penal systems. The following papers are included: "Burning" (Craig W.J. Minogue); "The Acquisition of Cognitive Skills as a Means of Recidivism Reduction: A Former Prisoner's Perspective" (Trevor Darryl Doherty); "CEA (Correctional…
ERIC Educational Resources Information Center
Klofas, John; Duffee, David E.
1981-01-01
Reexamines the assumptions of the change grid regarding the channeling of masses of clients into change strategies programs. Penal organizations specifically select and place clients so that programs remain stable, rather than sequence programs to meet the needs of clients. (Author)
40 CFR 33.410 - Can a recipient be penalized for failing to meet its fair share objectives?
Code of Federal Regulations, 2013 CFR
2013-07-01
... 40 Protection of Environment 1 2013-07-01 2013-07-01 false Can a recipient be penalized for... UNITED STATES ENVIRONMENTAL PROTECTION AGENCY PROGRAMS Fair Share Objectives § 33.410 Can a recipient be... faith efforts requirements described in subpart C of this part....
Incidence of AIDS cases in Spanish penal facilities through the capture-recapture method, 2000.
Acin, E; Gómez, P; Hernando, P; Corella, I
2003-09-01
Three available sources of information on AIDS surveillance in Spanish prisons were used to carry out a capture-recapture study. The results showed that the register of AIDS cases (RCS) considerably underestimates the incidence of this disease in prisons, as it covers only 50% of cases. This study highlights the need to use sources in addition to the RCS to evaluate the true incidence of AIDS in prisons in Spain.
Learning a Nonnegative Sparse Graph for Linear Regression.
Fang, Xiaozhao; Xu, Yong; Li, Xuelong; Lai, Zhihui; Wong, Wai Keung
2015-09-01
Previous graph-based semisupervised learning (G-SSL) methods have the following drawbacks: 1) they usually predefine the graph structure and then use it to perform label prediction, which cannot guarantee an overall optimum and 2) they only focus on the label prediction or the graph structure construction but are not competent in handling new samples. To this end, a novel nonnegative sparse graph (NNSG) learning method was first proposed. Then, both the label prediction and projection learning were integrated into linear regression. Finally, the linear regression and graph structure learning were unified within the same framework to overcome these two drawbacks. Therefore, a novel method, named learning a NNSG for linear regression was presented, in which the linear regression and graph learning were simultaneously performed to guarantee an overall optimum. In the learning process, the label information can be accurately propagated via the graph structure so that the linear regression can learn a discriminative projection to better fit sample labels and accurately classify new samples. An effective algorithm was designed to solve the corresponding optimization problem with fast convergence. Furthermore, NNSG provides a unified perceptiveness for a number of graph-based learning methods and linear regression methods. The experimental results showed that NNSG can obtain very high classification accuracy and greatly outperforms conventional G-SSL methods, especially some conventional graph construction methods.
Linear regression analysis of survival data with missing censoring indicators
Wang, Qihua
2010-01-01
Linear regression analysis has been studied extensively in a random censorship setting, but typically all of the censoring indicators are assumed to be observed. In this paper, we develop synthetic data methods for estimating regression parameters in a linear model when some censoring indicators are missing. We define estimators based on regression calibration, imputation, and inverse probability weighting techniques, and we prove all three estimators are asymptotically normal. The finite-sample performance of each estimator is evaluated via simulation. We illustrate our methods by assessing the effects of sex and age on the time to non-ambulatory progression for patients in a brain cancer clinical trial. PMID:20559722
Preserving Institutional Privacy in Distributed binary Logistic Regression.
Wu, Yuan; Jiang, Xiaoqian; Ohno-Machado, Lucila
2012-01-01
Privacy is becoming a major concern when sharing biomedical data across institutions. Although methods for protecting the privacy of individual patients have been proposed, it is not clear how to protect institutional privacy, which is often a critical concern of data custodians. Built upon our previous work, Grid Binary LOgistic REgression (GLORE), we developed an Institutional Privacy-preserving Distributed binary Logistic Regression model (IPDLR) that considers both individual and institutional privacy for building a logistic regression model in a distributed manner. We tested our method using both simulated and clinical data, showing how it is possible to protect the privacy of individuals and of institutions using a distributed strategy.
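A sketch of the distributed principle behind GLORE-style fitting: each institution ships only its gradient and Hessian contributions for one Newton step, and the server sums them, so no row-level records leave a site. This is a generic illustration of the idea, not the IPDLR method itself, and the two "institutions" below are simulated:

```python
import numpy as np

def site_summaries(X, y, beta):
    """Per-site sufficient statistics for one Newton step of logistic
    regression: gradient X'(y - p) and Hessian X'WX (aggregates only)."""
    p = 1.0 / (1.0 + np.exp(-X @ beta))
    grad = X.T @ (y - p)
    hess = X.T @ (X * (p * (1.0 - p))[:, None])
    return grad, hess

def newton_step_from_sites(sites, beta):
    """Server-side update: sum the site summaries, then take one
    Newton-Raphson step -- algebraically identical to the pooled-data step."""
    grads, hessians = zip(*(site_summaries(X, y, beta) for X, y in sites))
    return beta + np.linalg.solve(sum(hessians), sum(grads))

# Two hypothetical institutions holding disjoint patient sets.
rng = np.random.default_rng(0)
X1, X2 = rng.normal(size=(60, 3)), rng.normal(size=(40, 3))
b_true = np.array([0.5, -1.0, 0.25])
y1 = (rng.random(60) < 1 / (1 + np.exp(-X1 @ b_true))).astype(float)
y2 = (rng.random(40) < 1 / (1 + np.exp(-X2 @ b_true))).astype(float)

beta0 = np.zeros(3)
distributed = newton_step_from_sites([(X1, y1), (X2, y2)], beta0)
pooled = newton_step_from_sites(
    [(np.vstack([X1, X2]), np.concatenate([y1, y2]))], beta0)
```

Because the log-likelihood decomposes over rows, the summed site gradients and Hessians equal their pooled counterparts, so the distributed iterate matches the centralized one exactly.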
Regression analysis of cytopathological data
Whittemore, A.S.; McLarty, J.W.; Fortson, N.; Anderson, K.
1982-12-01
Epithelial cells from the human body are frequently labelled according to one of several ordered levels of abnormality, ranging from normal to malignant. The label of the most abnormal cell in a specimen determines the score for the specimen. This paper presents a model for the regression of specimen scores against continuous and discrete variables, as in host exposure to carcinogens. Application to data and tests for adequacy of model fit are illustrated using sputum specimens obtained from a cohort of former asbestos workers.
Robust regression on noisy data for fusion scaling laws
Verdoolaege, Geert
2014-11-15
We introduce the method of geodesic least squares (GLS) regression for estimating fusion scaling laws. Based on straightforward principles, the method is easily implemented, yet it clearly outperforms established regression techniques, particularly in cases of significant uncertainty on both the response and predictor variables. We apply GLS for estimating the scaling of the L-H power threshold, resulting in estimates for ITER that are somewhat higher than predicted earlier.
Semiparametric regression in capture-recapture modeling.
Gimenez, O; Crainiceanu, C; Barbraud, C; Jenouvrier, S; Morgan, B J T
2006-09-01
Capture-recapture models were developed to estimate survival using data arising from marking and monitoring wild animals over time. Variation in survival may be explained by incorporating relevant covariates. We propose nonparametric and semiparametric regression methods for estimating survival in capture-recapture models. A fully Bayesian approach using Markov chain Monte Carlo simulations was employed to estimate the model parameters. The work is illustrated by a study of Snow petrels, in which survival probabilities are expressed as nonlinear functions of a climate covariate, using data from a 40-year study on marked individuals, nesting at Petrels Island, Terre Adélie.
An operational GLS model for hydrologic regression
Tasker, Gary D.; Stedinger, J.R.
1989-01-01
Recent Monte Carlo studies have documented the value of generalized least squares (GLS) procedures to estimate empirical relationships between streamflow statistics and physiographic basin characteristics. This paper presents a number of extensions of the GLS method that deal with realities and complexities of regional hydrologic data sets that were not addressed in the simulation studies. These extensions include: (1) a more realistic model of the underlying model errors; (2) smoothed estimates of cross correlation of flows; (3) procedures for including historical flow data; (4) diagnostic statistics describing leverage and influence for GLS regression; and (5) the formulation of a mathematical program for evaluating future gaging activities.
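The core GLS estimator that these extensions build on can be written compactly. The record-length-based error variances below are an illustrative stand-in for the paper's model-error-plus-sampling-error covariance, not its actual structure:

```python
import numpy as np

def gls(X, y, Sigma):
    """Generalized least squares: beta = (X' S^-1 X)^-1 X' S^-1 y, where
    Sigma is the error covariance matrix across sites."""
    Si = np.linalg.inv(Sigma)
    return np.linalg.solve(X.T @ Si @ X, X.T @ Si @ y)

# Hypothetical regional data: a log streamflow statistic vs. log basin area,
# with shorter-record sites getting larger sampling variances.
rng = np.random.default_rng(9)
n = 40
log_area = rng.uniform(1, 5, n)
record_years = rng.integers(10, 80, n)
sigma2 = 1.0 / record_years                  # shorter records -> noisier sites
y = 0.3 + 0.8 * log_area + rng.normal(0, np.sqrt(sigma2))
X = np.column_stack([np.ones(n), log_area])

beta_gls = gls(X, y, np.diag(sigma2))
beta_ols = gls(X, y, np.eye(n))              # identity Sigma recovers OLS
```

With an identity covariance the estimator collapses to OLS; the GLS gain comes precisely from down-weighting the short-record, high-variance sites.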
Identification of high leverage points in binary logistic regression
NASA Astrophysics Data System (ADS)
Fitrianto, Anwar; Wendy, Tham
2016-10-01
Leverage points are observations that are unusual in the x-space of regression diagnostics. Detecting high leverage points is vital because they can mask outliers. In regression, observations made at extremes of the space of explanatory variables, far from the average of the data, are called leverage points. In this project, a method for identifying high leverage points in logistic regression is illustrated with a numerical example. We investigate the effects of a high leverage point on the logistic regression model and compare the results for the model with and without the high leverage point. Some graphical analyses based on the results are presented. We found that the presence of a high leverage point affects the leverage values h_ii, the estimated probabilities, the estimated coefficients, the p-values of the variables, the odds ratios, and the regression equation.
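A sketch of the leverage diagnostic discussed above: in logistic regression the hat values h_ii are the diagonal of the weighted hat matrix W^(1/2) X (X'WX)^(-1) X' W^(1/2), evaluated at the fitted probabilities. The data here are simulated for illustration:

```python
import numpy as np

def logistic_leverages(X, y, n_iter=10):
    """Fit a logistic model by Newton-Raphson (intercept included in X),
    then return the hat-matrix diagonal h_ii at the fitted probabilities."""
    beta = np.zeros(X.shape[1])
    for _ in range(n_iter):
        p = 1.0 / (1.0 + np.exp(-X @ beta))
        W = p * (1.0 - p)
        beta += np.linalg.solve(X.T @ (X * W[:, None]), X.T @ (y - p))
    p = 1.0 / (1.0 + np.exp(-X @ beta))
    Xw = X * np.sqrt(p * (1.0 - p))[:, None]         # W^{1/2} X
    M = np.linalg.inv(Xw.T @ Xw)                     # (X'WX)^{-1}
    return np.einsum('ij,jk,ik->i', Xw, M, Xw)       # diag of the hat matrix

rng = np.random.default_rng(5)
n = 200
X = np.column_stack([np.ones(n), rng.normal(size=n), rng.normal(size=n)])
eta = X @ np.array([0.0, 0.5, -0.5])
y = (rng.random(n) < 1 / (1 + np.exp(-eta))).astype(float)
h = logistic_leverages(X, y)
```

Since the weighted hat matrix is an orthogonal projection, each h_ii lies in [0, 1] and the h_ii sum to the number of model parameters, which gives the usual rule-of-thumb cutoffs for flagging high leverage points.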
Multivariate semiparametric spatial methods for imaging data.
Chen, Huaihou; Cao, Guanqun; Cohen, Ronald A
2017-04-01
Univariate semiparametric methods are often used in modeling nonlinear age trajectories for imaging data, which may result in efficiency loss and lower power for identifying important age-related effects that exist in the data. As observed in multiple neuroimaging studies, age trajectories show similar nonlinear patterns for the left and right corresponding regions and for the different parts of a big organ such as the corpus callosum. To incorporate the spatial similarity information without assuming spatial smoothness, we propose a multivariate semiparametric regression model with a spatial similarity penalty, which constrains the variation of the age trajectories among similar regions. The proposed method is applicable to both cross-sectional and longitudinal region-level imaging data. We show the asymptotic rates for the bias and covariance functions of the proposed estimator and its asymptotic normality. Our simulation studies demonstrate that by borrowing information from similar regions, the proposed spatial similarity method improves the efficiency remarkably. We apply the proposed method to two neuroimaging data examples. The results reveal that accounting for the spatial similarity leads to more accurate estimators and better functional clustering results for visualizing brain atrophy pattern. Keywords: Functional clustering; Longitudinal magnetic resonance imaging (MRI); Penalized B-splines; Region of interest (ROI); Spatial penalty.
Quantile Regression Models for Current Status Data.
Ou, Fang-Shu; Zeng, Donglin; Cai, Jianwen
2016-11-01
Current status data arise frequently in demography, epidemiology, and econometrics where the exact failure time cannot be determined but is only known to have occurred before or after a known observation time. We propose a quantile regression model to analyze current status data, because it does not require distributional assumptions and the coefficients can be interpreted as direct regression effects on the distribution of failure time in the original time scale. Our model assumes that the conditional quantile of failure time is a linear function of covariates. We assume conditional independence between the failure time and observation time. An M-estimator is developed for parameter estimation which is computed using the concave-convex procedure and its confidence intervals are constructed using a subsampling method. Asymptotic properties for the estimator are derived and proven using modern empirical process theory. The small sample performance of the proposed method is demonstrated via simulation studies. Finally, we apply the proposed method to analyze data from the Mayo Clinic Study of Aging.
Collaborative Regression-based Anatomical Landmark Detection
Gao, Yaozong; Shen, Dinggang
2015-01-01
Anatomical landmark detection plays an important role in medical image analysis, e.g., for registration, segmentation and quantitative analysis. Among various existing methods for landmark detection, regression-based methods recently have drawn much attention due to robustness and efficiency. In such methods, landmarks are localized through voting from all image voxels, which is completely different from classification-based methods that use voxel-wise classification to detect landmarks. Despite robustness, the accuracy of regression-based landmark detection methods is often limited due to 1) inclusion of uninformative image voxels in the voting procedure, and 2) lack of effective ways to incorporate inter-landmark spatial dependency into the detection step. In this paper, we propose a collaborative landmark detection framework to address these limitations. The concept of collaboration is reflected in two aspects. 1) Multi-resolution collaboration. A multi-resolution strategy is proposed to hierarchically localize landmarks by gradually excluding uninformative votes from faraway voxels. Moreover, for the informative voxels near the landmark, a spherical sampling strategy is also designed in the training stage to improve their prediction accuracy. 2) Inter-landmark collaboration. A confidence-based landmark detection strategy is proposed to improve the detection accuracy of “difficult-to-detect” landmarks by using spatial guidance from “easy-to-detect” landmarks. To evaluate our method, we conducted experiments extensively on three datasets for detecting prostate landmarks and head & neck landmarks in computed tomography (CT) images, and also dental landmarks in cone beam computed tomography (CBCT) images. The results show the effectiveness of our collaborative landmark detection framework in improving landmark detection accuracy, compared to other state-of-the-art methods. PMID:26579736
Collaborative regression-based anatomical landmark detection
NASA Astrophysics Data System (ADS)
Gao, Yaozong; Shen, Dinggang
2015-12-01
Anatomical landmark detection plays an important role in medical image analysis, e.g. for registration, segmentation and quantitative analysis. Among the various existing methods for landmark detection, regression-based methods have recently attracted much attention due to their robustness and efficiency. In these methods, landmarks are localised through voting from all image voxels, which is completely different from the classification-based methods that use voxel-wise classification to detect landmarks. Despite their robustness, the accuracy of regression-based landmark detection methods is often limited due to (1) the inclusion of uninformative image voxels in the voting procedure, and (2) the lack of effective ways to incorporate inter-landmark spatial dependency into the detection step. In this paper, we propose a collaborative landmark detection framework to address these limitations. The concept of collaboration is reflected in two aspects. (1) Multi-resolution collaboration. A multi-resolution strategy is proposed to hierarchically localise landmarks by gradually excluding uninformative votes from faraway voxels. Moreover, for informative voxels near the landmark, a spherical sampling strategy is also designed at the training stage to improve their prediction accuracy. (2) Inter-landmark collaboration. A confidence-based landmark detection strategy is proposed to improve the detection accuracy of ‘difficult-to-detect’ landmarks by using spatial guidance from ‘easy-to-detect’ landmarks. To evaluate our method, we conducted experiments extensively on three datasets for detecting prostate landmarks and head & neck landmarks in computed tomography images, and also dental landmarks in cone beam computed tomography images. The results show the effectiveness of our collaborative landmark detection framework in improving landmark detection accuracy, compared to other state-of-the-art methods.
NASA Astrophysics Data System (ADS)
Gang, G. J.; Siewerdsen, J. H.; Stayman, J. W.
2016-03-01
Purpose: This work applies task-driven optimization to design CT tube current modulation and directional regularization in penalized-likelihood (PL) reconstruction. The relative performance of modulation schemes commonly adopted for filtered-backprojection (FBP) reconstruction were also evaluated for PL in comparison. Methods: We adopt a task-driven imaging framework that utilizes a patient-specific anatomical model and information of the imaging task to optimize imaging performance in terms of detectability index (d'). This framework leverages a theoretical model based on implicit function theorem and Fourier approximations to predict local spatial resolution and noise characteristics of PL reconstruction as a function of the imaging parameters to be optimized. Tube current modulation was parameterized as a linear combination of Gaussian basis functions, and regularization was based on the design of (directional) pairwise penalty weights for the 8 in-plane neighboring voxels. Detectability was optimized using a covariance matrix adaptation evolutionary strategy algorithm. Task-driven designs were compared to conventional tube current modulation strategies for a Gaussian detection task in an abdomen phantom. Results: The task-driven design yielded the best performance, improving d' by ~20% over an unmodulated acquisition. Contrary to FBP, PL reconstruction using automatic exposure control and modulation based on minimum variance (in FBP) performed worse than the unmodulated case, decreasing d' by 16% and 9%, respectively. Conclusions: This work shows that conventional tube current modulation schemes suitable for FBP can be suboptimal for PL reconstruction. Thus, the proposed task-driven optimization provides additional opportunities for improved imaging performance and dose reduction beyond that achievable with conventional acquisition and reconstruction.
Gang, G. J.; Siewerdsen, J. H.; Stayman, J. W.
2016-01-01
Purpose This work applies task-driven optimization to design CT tube current modulation and directional regularization in penalized-likelihood (PL) reconstruction. The relative performance of modulation schemes commonly adopted for filtered-backprojection (FBP) reconstruction were also evaluated for PL in comparison. Methods We adopt a task-driven imaging framework that utilizes a patient-specific anatomical model and information of the imaging task to optimize imaging performance in terms of detectability index (d’). This framework leverages a theoretical model based on implicit function theorem and Fourier approximations to predict local spatial resolution and noise characteristics of PL reconstruction as a function of the imaging parameters to be optimized. Tube current modulation was parameterized as a linear combination of Gaussian basis functions, and regularization was based on the design of (directional) pairwise penalty weights for the 8 in-plane neighboring voxels. Detectability was optimized using a covariance matrix adaptation evolutionary strategy algorithm. Task-driven designs were compared to conventional tube current modulation strategies for a Gaussian detection task in an abdomen phantom. Results The task-driven design yielded the best performance, improving d’ by ~20% over an unmodulated acquisition. Contrary to FBP, PL reconstruction using automatic exposure control and modulation based on minimum variance (in FBP) performed worse than the unmodulated case, decreasing d’ by 16% and 9%, respectively. Conclusions This work shows that conventional tube current modulation schemes suitable for FBP can be suboptimal for PL reconstruction. Thus, the proposed task-driven optimization provides additional opportunities for improved imaging performance and dose reduction beyond that achievable with conventional acquisition and reconstruction. PMID:27110053
1984-10-01
First and second Fréchet derivatives of the penalized log-likelihood are given by (Tapia, 1971). Keywords: MPLE; survival estimation; random censorship; nonparametric density estimation; reliability. Based on an arbitrarily right-censored sample (x_i, d_i), i = 1, 2, ..., n, the φ-penalized likelihood of v in H(n) is defined for a functional φ: H(n) → R.
Penalized splines for smooth representation of high-dimensional Monte Carlo datasets
NASA Astrophysics Data System (ADS)
Whitehorn, Nathan; van Santen, Jakob; Lafebre, Sven
2013-09-01
Detector response to a high-energy physics process is often estimated by Monte Carlo simulation. For purposes of data analysis, the results of this simulation are typically stored in large multi-dimensional histograms, which can quickly become both too large to easily store and manipulate and numerically problematic due to unfilled bins or interpolation artifacts. We describe here an application of the penalized spline technique (Marx and Eilers, 1996) [1] to efficiently compute B-spline representations of such tables and discuss aspects of the resulting B-spline fits that simplify many common tasks in handling tabulated Monte Carlo data in high-energy physics analysis, in particular their use in maximum-likelihood fitting.
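The difference-penalty machinery behind P-splines can be shown in its simplest form: a Whittaker-style smoother that penalizes second differences of the fitted values directly. The full Eilers-Marx method penalizes B-spline coefficients instead, but the penalty algebra is the same; the noisy signal below is invented for illustration:

```python
import numpy as np

def whittaker_smooth(y, lam):
    """Penalized smoother: minimize ||y - z||^2 + lam * ||D2 z||^2, where
    D2 is the second-difference operator (the degree-zero special case of
    the P-spline penalty)."""
    n = len(y)
    D = np.diff(np.eye(n), n=2, axis=0)       # (n-2) x n second differences
    return np.linalg.solve(np.eye(n) + lam * D.T @ D, y)

rng = np.random.default_rng(2)
x = np.linspace(0, 2 * np.pi, 100)
truth = np.sin(x)
noisy = truth + rng.normal(0, 0.3, 100)
smooth = whittaker_smooth(noisy, lam=10.0)
```

At lam = 0 the smoother returns the data unchanged; as lam grows the fit is pulled toward a straight line, since straight lines have zero second differences. The same continuum is what lets penalized splines replace jagged Monte Carlo histograms with smooth, compact representations.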
The effect of health and penal harm on aging female prisoners' views of dying in prison.
Deaton, Dayron; Aday, Ronald H; Wahidin, Azrini
With tougher sentencing laws, an increasing number of individuals are finding themselves spending their final years of life in prison. Drawing on a sample of 327 women over the age of 50 incarcerated in five Southern states, the present study investigates the relationship between numerous health variables and the Templer Death Anxiety Scale (TDAS). Qualitatively, the article also provides personal accounts from inmates that serve to reinforce death fears when engaging the prison health care system. Participants reported a mean of 6.40 on the TDAS indicating a substantial degree of death anxiety when compared to community samples. Both mental and physical health measures were important indicators of death anxiety. Qualitative information discovered that respondents' concerns about dying in prison were often influenced by the perceived lack of adequate health care and the indifference of prison staff and other instances of penal harm.
Recognition of caudal regression syndrome.
Boulas, Mari M
2009-04-01
Caudal regression syndrome, also referred to as caudal dysplasia and sacral agenesis syndrome, is a rare congenital malformation characterized by varying degrees of developmental failure early in gestation. It involves the lower extremities, the lumbar and coccygeal vertebrae, and corresponding segments of the spinal cord. This is a rare disorder, and true pathogenesis is unclear. The etiology is thought to be related to maternal diabetes, genetic predisposition, and vascular hypoperfusion, but no true causative factor has been determined. Fetal diagnostic tools allow for early recognition of the syndrome, and careful examination of the newborn is essential to determine the extent of the disorder. Associated organ system dysfunction depends on the severity of the disease. Related defects are structural, and systematic problems including respiratory, cardiac, gastrointestinal, urinary, orthopedic, and neurologic can be present in varying degrees of severity and in different combinations. A multidisciplinary approach to management is crucial. Because the primary pathology is irreversible, treatment is only supportive.
Practical Session: Multiple Linear Regression
NASA Astrophysics Data System (ADS)
Clausel, M.; Grégoire, G.
2014-12-01
Three exercises are proposed to illustrate simple linear regression. The first investigates the influence of several factors on atmospheric pollution. It has been proposed by D. Chessel and A.B. Dufour in Lyon 1 (see Sect. 6 of http://pbil.univ-lyon1.fr/R/pdf/tdr33.pdf) and is based on data from 20 U.S. cities. Exercise 2 is an introduction to model selection, whereas Exercise 3 provides a first example of analysis of variance. Exercises 2 and 3 have been proposed by A. Dalalyan at ENPC (see Exercises 2 and 3 of http://certis.enpc.fr/~dalalyan/Download/TP_ENPC_5.pdf).
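A minimal version of such an exercise, with synthetic stand-ins for the pollution data (the variable names, coefficients, and sample values are invented for illustration, not the 20-city dataset):

```python
import numpy as np

# Hypothetical city-level data: SO2 level explained by mean temperature
# and number of factories, with known generating coefficients.
rng = np.random.default_rng(42)
n = 20
temp = rng.uniform(40, 80, n)
factories = rng.uniform(50, 3000, n)
so2 = 60 - 0.5 * temp + 0.01 * factories + rng.normal(0, 2, n)

X = np.column_stack([np.ones(n), temp, factories])   # design with intercept
beta, *_ = np.linalg.lstsq(X, so2, rcond=None)       # least-squares fit
resid = so2 - X @ beta
r2 = 1 - resid @ resid / ((so2 - so2.mean()) @ (so2 - so2.mean()))
```

Comparing the fitted beta to the generating coefficients (and inspecting the residuals and R^2) is the basic workflow the three exercises walk through in R.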
ERIC Educational Resources Information Center
Story, Roger E.
1996-01-01
Discussion of the use of Latent Semantic Indexing to determine relevancy in information retrieval focuses on statistical regression and Bayesian methods. Topics include keyword searching; a multiple regression model; how the regression model can aid search methods; and limitations of this approach, including complexity, linearity, and…
Scientific Progress or Regress in Sports Physiology?
Böning, Dieter
2016-11-01
In modern societies there is strong belief in scientific progress, but, unfortunately, a parallel partial regress occurs because of often avoidable mistakes. Mistakes are mainly forgetting, erroneous theories, errors in experiments and manuscripts, prejudice, selected publication of "positive" results, and fraud. An example of forgetting is that methods introduced decades ago are used without knowing the underlying theories: Basic articles are no longer read or cited. This omission may cause incorrect interpretation of results. For instance, false use of actual base excess instead of standard base excess for calculation of the number of hydrogen ions leaving the muscles raised the idea that an unknown fixed acid is produced in addition to lactic acid during exercise. An erroneous theory led to the conclusion that lactate is not the anion of a strong acid but a buffer. Mistakes occur after incorrect application of a method, after exclusion of unwelcome values, during evaluation of measurements by false calculations, or during preparation of manuscripts. Co-authors, as well as reviewers, do not always carefully read papers before publication. Peer reviewers might be biased against a hypothesis or an author. A general problem is selected publication of positive results. An example of fraud in sports medicine is the presence of doped subjects in groups of investigated athletes. To reduce regress, it is important that investigators search both original and recent articles on a topic and conscientiously examine the data. All co-authors and reviewers should read the text thoroughly and inspect all tables and figures in a manuscript.
Semiparametric modeling and analysis of longitudinal method comparison data.
Rathnayake, Lasitha N; Choudhary, Pankaj K
2017-02-19
Studies comparing two or more methods of measuring a continuous variable are routinely conducted in biomedical disciplines with the primary goal of measuring agreement between the methods. Often, the data are collected by following a cohort of subjects over a period of time. This gives rise to longitudinal method comparison data where there is one observation trajectory for each method on every subject. It is not required that observations from all methods be available at each observation time. The multiple trajectories on the same subjects are dependent. We propose modeling the trajectories nonparametrically through penalized regression splines within the framework of mixed-effects models. The model also uses random effects of subjects and their interactions to capture dependence in observations from the same subjects. It additionally allows the within-subject errors of each method to be correlated. It is fit using the method of maximum likelihood. Agreement between the methods is evaluated by performing inference on measures of agreement, such as concordance correlation coefficient and total deviation index, which are functions of parameters of the assumed model. Simulations indicate that the proposed methodology performs reasonably well for 30 or more subjects. Its application is illustrated by analyzing a dataset of percentage body fat measurements. Copyright © 2017 John Wiley & Sons, Ltd.
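One of the agreement measures mentioned, Lin's concordance correlation coefficient, has a simple closed form. The toy vectors below are illustrative, not the percentage-body-fat data:

```python
import numpy as np

def concordance_cc(x, y):
    """Lin's concordance correlation coefficient between two measurement
    methods: 2*cov(x, y) / (var(x) + var(y) + (mean(x) - mean(y))^2)."""
    mx, my = x.mean(), y.mean()
    sxy = ((x - mx) * (y - my)).mean()
    return 2 * sxy / (x.var() + y.var() + (mx - my) ** 2)

x = np.arange(10.0)
ccc_same = concordance_cc(x, x)          # perfect agreement -> 1
ccc_shift = concordance_cc(x, x + 2.0)   # perfectly correlated but biased
```

Unlike the Pearson correlation, the CCC penalizes a constant bias between methods through the (mean difference)^2 term, which is why it is preferred for agreement studies.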
Reducing geometric dilution of precision using ridge regression signal processing
NASA Astrophysics Data System (ADS)
Kelly, R. J.
The authors propose a method for reducing the effects of GDOP (geometric dilution of precision) in position-fix navigation systems. The idea is to incorporate ridge regression into the aircraft navigation signal processor. MSE (mean square error) performance of an ordinary LMS (least mean square) signal processor was compared with one using ridge regression. Computer simulations confirmed the theory that variance inflation caused by GDOP can be measurably reduced by the ridge regression algorithm. The technique is applicable not only to DME/DME (distance measuring equipment) and GPS but applies also to any position-fix navigation aid, e.g. Loran-C, Omega, and JTIDS relative navigation.

[Unconditioned logistic regression and sample size: a bibliographic review].
Ortega Calvo, Manuel; Cayuela Domínguez, Aurelio
2002-01-01
Unconditioned logistic regression is a highly useful risk prediction method in epidemiology. This article reviews the different solutions provided by various authors concerning the interface between the calculation of the sample size and the use of logistic regression. Based on the information initially provided, a review is made of the customized regression and predictive constriction phenomenon, the design of an ordinal exposure with a binary outcome, the events-per-variable concept, indicator variables, the classic Freeman equation, etc. Some skeptical ideas regarding this subject are also included.
Comparison of regression modeling techniques for resource estimation
NASA Technical Reports Server (NTRS)
Card, D. N.
1983-01-01
The development and validation of resource utilization models is an active area of software engineering research. Regression analysis is the principal tool employed in these studies; however, little attention has been given to determining which of the various regression methods available is the most appropriate. The objective of this study is to compare three alternative regression procedures by examining the results of their application to one commonly accepted equation for resource estimation. The data studied are summarized, the resource estimation equation is described, the regression procedures are explained, and the results obtained from the procedures are compared.
Quantile Regression Approach to Identifying the Determinants of Breastfeeding Duration
NASA Astrophysics Data System (ADS)
Mahdiyah; Norsiah Mohamed, Wan; Ibrahim, Kamarulzaman
In this study, the quantile regression approach is applied to data from the Malaysian Family Life Survey (MFLS) to identify factors significantly related to different conditional quantiles of breastfeeding duration. Whereas classical linear regression methods are based on minimizing the residual sum of squares, quantile regression estimates the conditional median function and the full range of other conditional quantile functions. Overall, it is found that the period of breastfeeding is significantly related to place of living, religion and the total number of children in the family.
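Quantile regression replaces the squared-error criterion with the check (pinball) loss, whose minimizer at level tau is the conditional tau-quantile. A minimal sketch on simulated data (a generic numerical minimizer rather than the linear-programming formulation typically used in practice):

```python
import numpy as np
from scipy.optimize import minimize

def pinball_loss(beta, X, y, tau):
    """Check (pinball) loss minimized by quantile regression at level tau."""
    r = y - X @ beta
    return np.sum(np.where(r >= 0, tau * r, (tau - 1) * r))

def quantile_regression(X, y, tau):
    beta0 = np.linalg.lstsq(X, y, rcond=None)[0]   # OLS starting point
    res = minimize(pinball_loss, beta0, args=(X, y, tau), method="Nelder-Mead",
                   options={"xatol": 1e-8, "fatol": 1e-8, "maxiter": 20000})
    return res.x

rng = np.random.default_rng(2)
n = 500
x = rng.uniform(0, 10, n)
y = 2 + 0.5 * x + rng.normal(0, 1, n)   # simulated duration-like outcome
X = np.column_stack([np.ones(n), x])
b_med = quantile_regression(X, y, 0.5)   # conditional median fit
b_q90 = quantile_regression(X, y, 0.9)   # upper conditional quantile fit
```

Fitting several tau values, as the study does, shows how a covariate's effect can differ between short and long breastfeeding durations.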
Using ridge regression in systematic pointing error corrections
NASA Technical Reports Server (NTRS)
Guiar, C. N.
1988-01-01
A pointing error model is used in the antenna calibration process. Data from spacecraft or radio star observations are used to determine the parameters in the model. However, the regression variables are not truly independent, displaying a condition known as multicollinearity. Ridge regression, a biased estimation technique, is used to combat the multicollinearity problem. Two data sets pertaining to Voyager 1 spacecraft tracking (days 105 and 106 of 1987) were analyzed using both linear least squares and ridge regression methods. The advantages and limitations of employing the technique are presented. The problem is not yet fully resolved.
Genetics Home Reference: caudal regression syndrome
Caudal regression syndrome is estimated to occur in 1 to … The condition affects parts of the skeleton, gastrointestinal system, and genitourinary … caudal regression syndrome results from the presence of an abnormal …
Mapping geogenic radon potential by regression kriging.
Pásztor, László; Szabó, Katalin Zsuzsanna; Szatmári, Gábor; Laborczi, Annamária; Horváth, Ákos
2016-02-15
Radon ((222)Rn) gas is produced in the radioactive decay chain of uranium ((238)U), an element that is naturally present in soils. Radon is transported mainly by diffusion and convection through the soil, depending chiefly on the physical and meteorological parameters of the soil, and can enter and accumulate in buildings. Health risks originating from indoor radon concentration can be attributed to natural factors and are characterized by the geogenic radon potential (GRP). Identification of areas with high health risk requires spatial modeling, that is, mapping of radon risk. In addition to geology and meteorology, physical soil properties play a significant role in the determination of GRP. In order to compile a reliable GRP map for a model area in Central Hungary, spatial auxiliary information representing GRP-forming environmental factors was taken into account to support the spatial inference of the locally measured GRP values. Since the number of measured sites was limited, efficient spatial prediction methodologies were sought to construct a reliable map for a larger area. Regression kriging (RK) was applied for the interpolation, using spatially exhaustive auxiliary data on soil, geology, topography, land use and climate. RK divides the spatial inference into two parts. First, the deterministic component of the target variable is determined by a regression model. The residuals of the multiple linear regression analysis represent the spatially varying but dependent stochastic component, which is interpolated by kriging. The final map is the sum of the two component predictions. Overall accuracy of the map was tested by leave-one-out cross-validation. Furthermore, the spatial reliability of the resultant map is estimated by calculating the 90% prediction interval of the local prediction values. The applicability of the applied method, as well as that of the map, is discussed briefly.
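The two-part RK scheme described above (regression trend plus kriged residuals) can be sketched in one dimension. The example below is entirely synthetic: the covariate, the exponential covariance model, and all parameter values are assumptions for illustration, not the soil/geology predictors or variogram of the actual study.

```python
import numpy as np

def exp_cov(h, sill=1.0, corr_range=2.0):
    """Assumed exponential covariance model for the residual process."""
    return sill * np.exp(-np.abs(h) / corr_range)

rng = np.random.default_rng(3)
x = np.linspace(0, 10, 15)                   # measurement locations
C = exp_cov(x[:, None] - x[None, :])         # residual covariance matrix
resid = rng.multivariate_normal(np.zeros(x.size), C)
covariate = x                                # hypothetical auxiliary predictor
z = 1.0 + 0.3 * covariate + resid            # observed GRP-like values

# Step 1: deterministic component via linear regression on the auxiliary data.
A = np.column_stack([np.ones(x.size), covariate])
beta, *_ = np.linalg.lstsq(A, z, rcond=None)
res = z - A @ beta

# Step 2: simple kriging of the regression residuals (stochastic component).
def krige_residual(x0):
    w = np.linalg.solve(C, exp_cov(x - x0))
    return w @ res

# Final prediction = regression trend + kriged residual.
def predict(x0, cov0):
    return beta[0] + beta[1] * cov0 + krige_residual(x0)
```

With no nugget effect, the kriged residual reproduces the data exactly at measured locations, so the RK prediction honors the observations while the trend carries the auxiliary information into unsampled areas.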
Monthly streamflow forecasting using Gaussian Process Regression
NASA Astrophysics Data System (ADS)
Sun, Alexander Y.; Wang, Dingbao; Xu, Xianli
2014-04-01
Streamflow forecasting plays a critical role in nearly all aspects of water resources planning and management. In this work, Gaussian Process Regression (GPR), an effective kernel-based machine learning algorithm, is applied to probabilistic streamflow forecasting. GPR is built on Gaussian process, which is a stochastic process that generalizes multivariate Gaussian distribution to infinite-dimensional space such that distributions over function values can be defined. The GPR algorithm provides a tractable and flexible hierarchical Bayesian framework for inferring the posterior distribution of streamflows. The prediction skill of the algorithm is tested for one-month-ahead prediction using the MOPEX database, which includes long-term hydrometeorological time series collected from 438 basins across the U.S. from 1948 to 2003. Comparisons with linear regression and artificial neural network models indicate that GPR outperforms both regression methods in most cases. The GPR prediction of MOPEX basins is further examined using the Budyko framework, which helps to reveal the close relationships among water-energy partitions, hydrologic similarity, and predictability. Flow regime modification and the resulting loss of predictability have been a major concern in recent years because of climate change and anthropogenic activities. The persistence of streamflow predictability is thus examined by extending the original MOPEX data records to 2012. Results indicate relatively strong persistence of streamflow predictability in the extended period, although the low-predictability basins tend to show more variations. Because many low-predictability basins are located in regions experiencing fast growth of human activities, the significance of sustainable development and water resources management can be even greater for those regions.
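The GP posterior used for such probabilistic forecasts can be written down in a few lines of linear algebra. The sketch below is a from-scratch illustration on a synthetic sine signal with fixed RBF kernel hyperparameters, not the hierarchical Bayesian treatment or MOPEX data of the study:

```python
import numpy as np

def rbf(a, b, ell=1.0, sf2=1.0):
    """Squared-exponential (RBF) kernel with assumed hyperparameters."""
    d2 = (a[:, None] - b[None, :]) ** 2
    return sf2 * np.exp(-0.5 * d2 / ell ** 2)

def gp_predict(x_tr, y_tr, x_te, noise=0.1):
    """Posterior mean and pointwise variance of a zero-mean GP regression."""
    K = rbf(x_tr, x_tr) + noise * np.eye(x_tr.size)
    Ks = rbf(x_te, x_tr)
    alpha = np.linalg.solve(K, y_tr)
    mean = Ks @ alpha
    cov = rbf(x_te, x_te) - Ks @ np.linalg.solve(K, Ks.T)
    return mean, np.diag(cov)

rng = np.random.default_rng(4)
x_tr = np.sort(rng.uniform(0, 6, 30))
y_tr = np.sin(x_tr) + rng.normal(0, 0.1, 30)   # noisy "streamflow-like" signal
x_te = np.linspace(0, 6, 50)
mean, var = gp_predict(x_tr, y_tr, x_te)
```

The returned variance is what makes the forecast probabilistic: wide posterior variance flags low predictability, the quantity the Budyko-based analysis in the abstract investigates basin by basin.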
Multiple linear regression for isotopic measurements
NASA Astrophysics Data System (ADS)
Garcia Alonso, J. I.
2012-04-01
There are two typical applications of isotopic measurements: the detection of natural variations in isotopic systems and the detection of man-made variations using enriched isotopes as indicators. Both types of measurement require accurate and precise isotope ratio determinations. For the so-called non-traditional stable isotopes, multicollector ICP-MS instruments are usually applied. In many cases, chemical separation procedures are required before accurate isotope measurements can be performed. The off-line separation of Rb and Sr, or Nd and Sm, is the classical procedure employed to eliminate isobaric interferences before multicollector ICP-MS measurement of Sr and Nd isotope ratios. This procedure also provides matrix separation, so that precise and accurate Sr and Nd isotope ratios can be obtained. In our laboratory we have evaluated the separation of Rb-Sr and Nd-Sm isobars by liquid chromatography with on-line multicollector ICP-MS detection. The combination of this chromatographic procedure with multiple linear regression of the raw chromatographic data yielded Sr and Nd isotope ratios with precisions and accuracies typical of off-line sample preparation procedures. On the other hand, methods for labelling individual organisms (such as a given plant, fish or animal) are required for population studies. We have developed a dual isotope labelling procedure which can be unique to a given individual, can be inherited by living organisms, and is stable. Detection of the isotopic signature is also based on multiple linear regression. The labelling of fish and its detection in otoliths by laser ablation ICP-MS will be discussed using trout and salmon as examples. In conclusion, isotope measurement procedures based on multiple linear regression can be a viable alternative in multicollector ICP-MS measurements.
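The regression step behind such deconvolution treats the measured intensities as a linear mixture of known component patterns and solves for the contribution of each by least squares. The sketch below uses made-up abundance patterns (not real Rb/Sr natural abundances) purely to show the mechanics:

```python
import numpy as np

# Hypothetical isotope abundance patterns (columns) for two overlapping
# components, e.g. an analyte and an isobaric interference; the values
# are illustrative, not real natural abundances.
pattern_a = np.array([0.10, 0.60, 0.25, 0.05])
pattern_b = np.array([0.70, 0.05, 0.20, 0.05])
S = np.column_stack([pattern_a, pattern_b])

rng = np.random.default_rng(5)
true_amounts = np.array([3.0, 1.5])
signal = S @ true_amounts + rng.normal(0, 0.01, 4)   # measured intensities

# Multiple linear regression: estimate each component's contribution.
amounts, *_ = np.linalg.lstsq(S, signal, rcond=None)
```

Because the patterns are linearly independent, the regression separates the isobarically overlapped contributions without a physical chemical separation, which is the core idea the abstract exploits for on-line chromatographic data.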
Reliable prediction intervals with regression neural networks.
Papadopoulos, Harris; Haralambous, Haris
2011-10-01
This paper proposes an extension to conventional regression neural networks (NNs) for replacing the point predictions they produce with prediction intervals that satisfy a required level of confidence. Our approach follows a novel machine learning framework, called Conformal Prediction (CP), for assigning reliable confidence measures to predictions without assuming anything more than that the data are independent and identically distributed (i.i.d.). We evaluate the proposed method on four benchmark datasets and on the problem of predicting Total Electron Content (TEC), which is an important parameter in trans-ionospheric links; for the latter we use a dataset of more than 60000 TEC measurements collected over a period of 11 years. Our experimental results show that the prediction intervals produced by our method are both well calibrated and tight enough to be useful in practice.
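The split (inductive) variant of conformal prediction is easy to state: fit any point predictor on a training split, compute absolute residuals on a held-out calibration split, and widen each new prediction by the appropriate residual quantile. The sketch below uses a linear model on synthetic data rather than the paper's neural networks, but the calibration mechanism is the same under the i.i.d. assumption:

```python
import numpy as np

rng = np.random.default_rng(6)
n = 2000
x = rng.uniform(-3, 3, n)
y = 1 + 2 * x + rng.normal(0, 1, n)

# Proper training, calibration, and test splits.
tr, cal, te = slice(0, 1000), slice(1000, 1500), slice(1500, 2000)
X = np.column_stack([np.ones(n), x])
beta = np.linalg.lstsq(X[tr], y[tr], rcond=None)[0]

# Nonconformity scores: absolute residuals on the calibration set.
scores = np.abs(y[cal] - X[cal] @ beta)
alpha = 0.1
q = np.quantile(scores, np.ceil((1 - alpha) * (scores.size + 1)) / scores.size)

# 90% prediction intervals for new points: point prediction +/- q.
pred = X[te] @ beta
lower, upper = pred - q, pred + q
coverage = np.mean((y[te] >= lower) & (y[te] <= upper))
```

Validity here requires only exchangeable data, which is why the paper can attach calibrated intervals to an arbitrary underlying regressor.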
Tolerance bounds for log gamma regression models
NASA Technical Reports Server (NTRS)
Jones, R. A.; Scholz, F. W.; Ossiander, M.; Shorack, G. R.
1985-01-01
The present procedure for finding lower confidence bounds for the quantiles of Weibull populations, on the basis of the solution of a quadratic equation, is more accurate than current Monte Carlo tables and extends to any location-scale family. It is shown that this method is accurate for all members of the log gamma(K) family, where K = 1/2 to infinity, and works well for censored data, while also extending to regression data. An even more accurate procedure involving an approximation to the Lawless (1982) conditional procedure, with numerical integrations whose tables are independent of the data, is also presented. These methods are applied to the case of failure strengths of ceramic specimens from each of three billets of Si3N4, which have undergone flexural strength testing.
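A crude stand-in for such lower confidence bounds on Weibull quantiles is a parametric bootstrap around the maximum-likelihood fit. This sketch (simulated strength data, made-up parameters, bootstrap percentile bound rather than the quadratic-equation or conditional procedures of the paper) shows the shape of the computation:

```python
import numpy as np
from scipy.stats import weibull_min

rng = np.random.default_rng(7)
shape_true, scale_true = 8.0, 500.0        # hypothetical strength distribution
data = weibull_min.rvs(shape_true, scale=scale_true, size=100, random_state=rng)

def q10(sample):
    """MLE of the 10th-percentile strength (two-parameter Weibull, loc = 0)."""
    c, loc, scale = weibull_min.fit(sample, floc=0)
    return weibull_min.ppf(0.10, c, scale=scale)

point = q10(data)

# Parametric bootstrap: refit on resamples drawn from the fitted model and
# take the 5th percentile of the bootstrap estimates as a lower bound.
c_hat, _, scale_hat = weibull_min.fit(data, floc=0)
boot = [q10(weibull_min.rvs(c_hat, scale=scale_hat, size=data.size,
                            random_state=rng))
        for _ in range(200)]
lower95 = np.quantile(boot, 0.05)
```

In reliability work such a lower bound, not the point estimate, is what certifies a design strength for specimens like the Si3N4 billets mentioned above.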
Multiple regression analyses in clinical child and adolescent psychology.
Jaccard, James; Guilamo-Ramos, Vincent; Johansson, Margaret; Bouris, Alida
2006-09-01
A major form of data analysis in clinical child and adolescent psychology is multiple regression. This article reviews issues in the application of such methods in light of the research designs typical of this field. Issues addressed include controlling covariates, evaluation of predictor relevance, comparing predictors, analysis of moderation, analysis of mediation, assumption violations, outliers, limited dependent variables, and directed regression and its relation to structural equation modeling. Analytic guidelines are provided within each domain.
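Of the topics listed, moderation analysis is the most mechanical to demonstrate: mean-center the predictor and moderator, add their product term, and read off simple slopes at chosen moderator values. A minimal sketch on simulated data (variable names and effect sizes are invented for illustration):

```python
import numpy as np

rng = np.random.default_rng(8)
n = 1000
x = rng.normal(size=n)                      # predictor
m = rng.normal(size=n)                      # moderator
y = 1 + 0.5 * x + 0.3 * m + 0.4 * x * m + rng.normal(0, 1, n)

# Mean-center before forming the product term so the lower-order
# coefficients are interpretable at the moderator's mean.
xc, mc = x - x.mean(), m - m.mean()
X = np.column_stack([np.ones(n), xc, mc, xc * mc])
beta, *_ = np.linalg.lstsq(X, y, rcond=None)

# Simple slopes of y on x at the moderator's mean and one SD above it.
slope_at_mean = beta[1]
slope_at_plus1sd = beta[1] + beta[3] * mc.std()
```

A nonzero product-term coefficient (`beta[3]`) indicates that the predictor's effect depends on the moderator, the moderation question the article's guidelines address.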