DOE Office of Scientific and Technical Information (OSTI.GOV)
Li, Weixuan; Lin, Guang; Li, Bing
2016-09-01
A well-known challenge in uncertainty quantification (UQ) is the "curse of dimensionality". However, many high-dimensional UQ problems are essentially low-dimensional, because the randomness of the quantity of interest (QoI) is caused only by uncertain parameters varying within a low-dimensional subspace, known as the sufficient dimension reduction (SDR) subspace. Motivated by this observation, we propose and demonstrate in this paper an inverse regression-based UQ approach (IRUQ) for high-dimensional problems. Specifically, we use an inverse regression procedure to estimate the SDR subspace and then convert the original problem to a low-dimensional one, which can be efficiently solved by building a response surface model such as a polynomial chaos expansion. The novelty and advantages of the proposed approach lie in its computational efficiency and practicality. Compared with Monte Carlo, the traditionally preferred approach for high-dimensional UQ, IRUQ at a comparable cost generally gives much more accurate solutions even for high-dimensional problems, and even when the dimension reduction is not exactly sufficient. Theoretically, IRUQ is proved to converge twice as fast as the method it uses to estimate the SDR subspace. For example, while a sliced inverse regression method converges to the SDR subspace at the rate of $$O(n^{-1/2})$$, the corresponding IRUQ converges at $$O(n^{-1})$$. IRUQ also provides several desired conveniences in practice. It is non-intrusive, requiring only a simulator to generate realizations of the QoI, and it does not require computing the high-dimensional gradient of the QoI. Finally, error bars can be derived for the estimation results reported by IRUQ.
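The two-stage recipe described in this abstract (estimate the SDR subspace by inverse regression, then build a cheap response surface on the reduced coordinates) can be sketched in a few lines. The slicing scheme, the subspace dimension k, the cubic polynomial surrogate, and the toy model below are illustrative assumptions, not the authors' implementation.

```python
import numpy as np
from sklearn.preprocessing import PolynomialFeatures
from sklearn.linear_model import LinearRegression

def sir_directions(X, y, n_slices=10, k=2):
    """Sliced inverse regression: estimate k directions spanning the SDR subspace."""
    n, p = X.shape
    mu = X.mean(axis=0)
    evals, evecs = np.linalg.eigh(np.cov(X, rowvar=False))
    whiten = evecs @ np.diag(evals ** -0.5) @ evecs.T          # Sigma^{-1/2}
    Z = (X - mu) @ whiten
    # Slice on the response and average the whitened inputs within each slice
    M = np.zeros((p, p))
    for idx in np.array_split(np.argsort(y), n_slices):
        m = Z[idx].mean(axis=0)
        M += len(idx) / n * np.outer(m, m)
    # Leading eigenvectors of the slice-mean covariance, mapped back to the X scale
    _, vecs = np.linalg.eigh(M)
    return whiten @ vecs[:, -k:]

# Toy high-dimensional QoI whose randomness lives in a 2-dimensional subspace
rng = np.random.default_rng(0)
n, p = 2000, 200
X = rng.standard_normal((n, p))
y = np.tanh(X[:, 0]) + 0.5 * X[:, 1] + 0.05 * rng.standard_normal(n)

B = sir_directions(X, y, k=2)          # estimated SDR basis (p x 2)
U = X @ B                              # reduced coordinates
surrogate = LinearRegression().fit(PolynomialFeatures(degree=3).fit_transform(U), y)
```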
Yu, Wenbao; Park, Taesung
2014-01-01
It is common to seek an optimal combination of markers for disease classification and prediction when multiple markers are available. Many approaches based on the area under the receiver operating characteristic curve (AUC) have been proposed. Existing works based on AUC in a high-dimensional context depend mainly on a non-parametric, smooth approximation of the AUC, with no work using a parametric AUC-based approach for high-dimensional data. We propose an AUC-based approach using penalized regression (AucPR), a parametric method for obtaining a linear combination that maximizes the AUC. To obtain the AUC maximizer in a high-dimensional context, we transform a classical parametric AUC maximizer, which is used in a low-dimensional context, into a regression framework and thus apply the penalized regression approach directly. Two kinds of penalization, lasso and elastic net, are considered. The parametric approach can avoid some of the difficulties of a conventional non-parametric AUC-based approach, such as the lack of an appropriate concave objective function and the need for a prudent choice of the smoothing parameter. We apply the proposed AucPR to gene selection and classification using four real microarray data sets and synthetic data. Through numerical studies, AucPR is shown to perform better than penalized logistic regression and the non-parametric AUC-based method, in terms of AUC and sensitivity at a given specificity, particularly when there are many correlated genes. We propose AucPR as a powerful, parametric, and easily implementable linear classifier for gene selection and disease prediction with high-dimensional data. AucPR is recommended for its good prediction performance. Besides gene expression microarray data, AucPR can be applied to other types of high-dimensional omics data, such as miRNA and protein data.
Bayesian Analysis of High Dimensional Classification
NASA Astrophysics Data System (ADS)
Mukhopadhyay, Subhadeep; Liang, Faming
2009-12-01
Modern data mining and bioinformatics have presented an important playground for statistical learning techniques, where the number of input variables may be much larger than the sample size of the training data. In supervised learning, logistic regression or probit regression can be used to model a binary output and to form perceptron classification rules based on Bayesian inference. In these settings there is considerable interest in searching for sparse models in the high-dimensional regression (or classification) setup. We first discuss two common challenges in analyzing high-dimensional data. The first is the curse of dimensionality: the complexity of many existing algorithms scales exponentially with the dimensionality of the space, so these algorithms quickly become computationally intractable and therefore inapplicable in many real applications. The second is multicollinearity among the predictors, which severely slows down the algorithms. In order to make Bayesian analysis operational in high dimensions, we propose a novel hierarchical stochastic approximation Monte Carlo (HSAMC) algorithm, which overcomes the curse of dimensionality and the multicollinearity of predictors in high dimensions, and which possesses a self-adjusting mechanism to avoid local minima separated by high energy barriers. Models and methods are illustrated by simulations inspired by the field of genomics. Numerical results indicate that HSAMC can work as a general model selection sampler in high-dimensional complex model spaces.
Incremental online learning in high dimensions.
Vijayakumar, Sethu; D'Souza, Aaron; Schaal, Stefan
2005-12-01
Locally weighted projection regression (LWPR) is a new algorithm for incremental nonlinear function approximation in high-dimensional spaces with redundant and irrelevant input dimensions. At its core, it employs nonparametric regression with locally linear models. In order to stay computationally efficient and numerically robust, each local model performs the regression analysis with a small number of univariate regressions in selected directions in input space, in the spirit of partial least squares regression. We discuss when and how local learning techniques can successfully work in high-dimensional spaces and review the various techniques for local dimensionality reduction before finally deriving the LWPR algorithm. The properties of LWPR are that it (1) learns rapidly with second-order learning methods based on incremental training, (2) uses statistically sound stochastic leave-one-out cross validation for learning without the need to memorize training data, (3) adjusts its weighting kernels based only on local information in order to minimize the danger of negative interference of incremental learning, (4) has a computational complexity that is linear in the number of inputs, and (5) can deal with a large number of possibly redundant inputs, as shown in various empirical evaluations with up to 90-dimensional data sets. For a probabilistic interpretation, predictive variance and confidence intervals are derived. To our knowledge, LWPR is the first truly incremental spatially localized learning method that can successfully and efficiently operate in very high-dimensional spaces.
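A minimal batch sketch of the locally linear models at the core of LWPR follows; it omits the incremental updates, the partial-least-squares projections, and the kernel adaptation that make LWPR practical, and the Gaussian bandwidth and toy data are assumptions.

```python
import numpy as np

def lwr_predict(Xq, X, y, bandwidth=0.5):
    """Batch locally weighted linear regression: one weighted linear fit per query."""
    Xa = np.hstack([X, np.ones((len(X), 1))])                 # add intercept column
    preds = []
    for xq in Xq:
        w = np.exp(-0.5 * np.sum((X - xq) ** 2, axis=1) / bandwidth ** 2)
        sw = np.sqrt(w)[:, None]
        # Weighted least squares via row rescaling
        beta, *_ = np.linalg.lstsq(sw * Xa, sw.ravel() * y, rcond=None)
        preds.append(np.append(xq, 1.0) @ beta)
    return np.array(preds)

rng = np.random.default_rng(1)
X = rng.uniform(-3, 3, size=(500, 2))
y = np.sin(X[:, 0]) * np.cos(X[:, 1]) + 0.05 * rng.standard_normal(500)
print(lwr_predict(X[:5], X, y))
```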
High-Dimensional Intrinsic Interpolation Using Gaussian Process Regression and Diffusion Maps
Thimmisetty, Charanraj A.; Ghanem, Roger G.; White, Joshua A.; ...
2017-10-10
This article considers the challenging task of estimating geologic properties of interest using a suite of proxy measurements. The current work recasts this task as a manifold learning problem. In this process, this article introduces a novel regression procedure for intrinsic variables constrained onto a manifold embedded in an ambient space. The procedure is meant to sharpen high-dimensional interpolation by inferring non-linear correlations from the data being interpolated. The proposed approach augments manifold learning procedures with a Gaussian process regression. It first identifies, using diffusion maps, a low-dimensional manifold embedded in an ambient high-dimensional space associated with the data. It relies on the diffusion distance associated with this construction to define a distance function with which the data model is equipped. This distance function is then used to compute the correlation structure of a Gaussian process that describes the statistical dependence of quantities of interest in the high-dimensional ambient space. The proposed method is applicable to arbitrarily high-dimensional data sets. Here, it is applied to subsurface characterization using a suite of well log measurements. The predictions obtained in original, principal component, and diffusion space are compared using both qualitative and quantitative metrics. Considerable improvement in the prediction of the geological structural properties is observed with the proposed method.
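A compact sketch of the pipeline outlined above (a diffusion-map embedding followed by Gaussian process regression on the diffusion coordinates) is given below. The affinity bandwidth, the number of diffusion coordinates, and the synthetic stand-in for well-log data are illustrative assumptions, and the paper's diffusion-distance-based correlation structure is simplified here to a plain GP on the embedded coordinates.

```python
import numpy as np
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import RBF, WhiteKernel

def diffusion_coords(X, eps, n_coords=2, t=1):
    """Diffusion maps: spectral embedding of a Gaussian-affinity random walk."""
    d2 = np.sum((X[:, None, :] - X[None, :, :]) ** 2, axis=-1)
    P = np.exp(-d2 / eps)
    P /= P.sum(axis=1, keepdims=True)                 # Markov transition matrix
    vals, vecs = np.linalg.eig(P)
    order = np.argsort(-vals.real)
    vals, vecs = vals.real[order], vecs.real[:, order]
    # Drop the trivial constant eigenvector; scale by eigenvalues^t (diffusion time)
    return vecs[:, 1:n_coords + 1] * vals[1:n_coords + 1] ** t

# Synthetic stand-in for well-log data: 50-dimensional points near a 1-D manifold
rng = np.random.default_rng(2)
theta = rng.uniform(0, 4 * np.pi, 300)
X = np.c_[np.cos(theta), np.sin(theta), theta / 5.0] @ rng.standard_normal((3, 50))
prop = np.sin(theta)                                  # geologic property of interest

eps = np.median(np.sum((X - X[0]) ** 2, axis=1))      # crude bandwidth choice
U = diffusion_coords(X, eps, n_coords=2)
gp = GaussianProcessRegressor(kernel=RBF() + WhiteKernel()).fit(U, prop)
```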
NASA Astrophysics Data System (ADS)
Li, Weixuan; Lin, Guang; Li, Bing
2016-09-01
Many uncertainty quantification (UQ) approaches suffer from the curse of dimensionality, that is, their computational costs become intractable for problems involving a large number of uncertain parameters. In these situations, classic Monte Carlo often remains the preferred method of choice because its convergence rate $$O(n^{-1/2})$$, where n is the required number of model simulations, does not depend on the dimension of the problem. However, many high-dimensional UQ problems are intrinsically low-dimensional, because the variation of the quantity of interest (QoI) is often caused by only a few latent parameters varying within a low-dimensional subspace, known as the sufficient dimension reduction (SDR) subspace in the statistics literature. Motivated by this observation, we propose two inverse regression-based UQ algorithms (IRUQ) for high-dimensional problems. Both algorithms use inverse regression to convert the original high-dimensional problem to a low-dimensional one, which is then efficiently solved by building a response surface for the reduced model, for example via a polynomial chaos expansion. The first algorithm, which is for situations where an exact SDR subspace exists, is proved to converge at rate $$O(n^{-1})$$, hence much faster than MC. The second algorithm, which does not require an exact SDR, employs the reduced model as a control variate to reduce the error of the MC estimate. The accuracy gain could still be significant, depending on how well the reduced model approximates the original high-dimensional one. IRUQ also provides several additional practical advantages: it is non-intrusive; it does not require computing the high-dimensional gradient of the QoI; and it reports an error bar so the user knows how reliable the result is.
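The control-variate use of the reduced model described for the second algorithm can be illustrated with a short sketch; the toy full/reduced model pair and the analytically computed reduced-model mean below are assumptions for illustration only.

```python
import numpy as np

def control_variate_estimate(qoi_full, qoi_reduced, reduced_mean):
    """MC estimate of E[QoI] using a reduced model as a control variate.

    qoi_full, qoi_reduced : paired evaluations of the full and reduced models
                            on the same Monte Carlo samples
    reduced_mean          : mean of the reduced model, computed cheaply
                            (e.g., analytically from a polynomial chaos expansion)"""
    cov = np.cov(qoi_full, qoi_reduced)
    beta = cov[0, 1] / cov[1, 1]                     # optimal control-variate weight
    return np.mean(qoi_full - beta * (qoi_reduced - reduced_mean))

# Toy example: the reduced model captures most, but not all, of the variation
rng = np.random.default_rng(3)
x = rng.standard_normal(500)
full = np.exp(0.3 * x) + 0.05 * rng.standard_normal(500)      # "expensive" model
reduced = 1.0 + 0.3 * x + 0.045 * x ** 2                      # cheap surrogate
reduced_mean = 1.0 + 0.045            # E[1 + 0.3 x + 0.045 x^2] for x ~ N(0, 1)
print(control_variate_estimate(full, reduced, reduced_mean))
```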
GLOBALLY ADAPTIVE QUANTILE REGRESSION WITH ULTRA-HIGH DIMENSIONAL DATA
Zheng, Qi; Peng, Limin; He, Xuming
2015-01-01
Quantile regression has become a valuable tool for analyzing heterogeneous covariate-response associations that are often encountered in practice. The development of quantile regression methodology for high-dimensional covariates has primarily focused on examining model sparsity at a single quantile level or at multiple quantile levels, which are typically prespecified ad hoc by the users. The resulting models may be sensitive to the specific choices of the quantile levels, leading to difficulties in interpretation and erosion of confidence in the results. In this article, we propose a new penalization framework for quantile regression in the high-dimensional setting. We employ adaptive L1 penalties and, more importantly, propose a uniform selector of the tuning parameter for a set of quantile levels to avoid some of the potential problems with model selection at individual quantile levels. Our proposed approach achieves consistent shrinkage of regression quantile estimates across a continuous range of quantile levels, enhancing the flexibility and robustness of existing penalized quantile regression methods. Our theoretical results include the oracle rate of uniform convergence and weak convergence of the parameter estimators. We also use numerical studies to confirm our theoretical findings and illustrate the practical utility of our proposal. PMID:26604424
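For illustration, penalized quantile regression can be fitted at several quantile levels with a single shared tuning parameter using scikit-learn's QuantileRegressor. This sketch uses a plain (non-adaptive) L1 penalty and an arbitrary alpha, so it is only a stand-in for the adaptive-penalty, uniformly tuned estimator proposed above.

```python
import numpy as np
from sklearn.linear_model import QuantileRegressor

rng = np.random.default_rng(4)
n, p = 200, 500
X = rng.standard_normal((n, p))
# Heteroscedastic response: the relevant covariates differ across quantile levels
y = X[:, 0] + 0.5 * X[:, 1] + (1 + 0.5 * np.abs(X[:, 2])) * rng.standard_normal(n)

taus = [0.25, 0.5, 0.75]
alpha_shared = 0.05                   # one tuning parameter shared across quantiles
fits = {tau: QuantileRegressor(quantile=tau, alpha=alpha_shared).fit(X, y)
        for tau in taus}
for tau, model in fits.items():
    print(tau, np.flatnonzero(np.abs(model.coef_) > 1e-8)[:10])
```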
SEMIPARAMETRIC QUANTILE REGRESSION WITH HIGH-DIMENSIONAL COVARIATES
Zhu, Liping; Huang, Mian; Li, Runze
2012-01-01
This paper is concerned with quantile regression for a semiparametric regression model, in which both the conditional mean and conditional variance function of the response given the covariates admit a single-index structure. This semiparametric regression model enables us to reduce the dimension of the covariates and simultaneously retains the flexibility of nonparametric regression. Under mild conditions, we show that the simple linear quantile regression offers a consistent estimate of the index parameter vector. This is a surprising and interesting result because the single-index model is possibly misspecified under the linear quantile regression. With a root-n consistent estimate of the index vector, one may employ a local polynomial regression technique to estimate the conditional quantile function. This procedure is computationally efficient, which is very appealing in high-dimensional data analysis. We show that the resulting estimator of the quantile function performs asymptotically as efficiently as if the true value of the index vector were known. The methodologies are demonstrated through comprehensive simulation studies and an application to a real dataset. PMID:24501536
Non-Asymptotic Oracle Inequalities for the High-Dimensional Cox Regression via Lasso.
Kong, Shengchun; Nan, Bin
2014-01-01
We consider finite sample properties of the regularized high-dimensional Cox regression via lasso. Existing literature focuses on linear models or generalized linear models with Lipschitz loss functions, where the empirical risk functions are summations of independent and identically distributed (iid) losses. The summands in the negative log partial likelihood function for censored survival data, however, are neither iid nor Lipschitz. We first approximate the negative log partial likelihood function by a sum of iid non-Lipschitz terms, then derive the non-asymptotic oracle inequalities for the lasso-penalized Cox regression, using pointwise arguments to tackle the difficulties caused by the lack of iid Lipschitz losses.
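A bare-bones numerical illustration of the estimator analyzed above, lasso-penalized Cox regression, is sketched below via proximal gradient descent on the Breslow-type negative log partial likelihood. The step size, penalty level, and crude toy censoring mechanism are assumptions, and established implementations (e.g., glmnet's Cox family) should be preferred in practice.

```python
import numpy as np

def cox_lasso(X, time, event, lam=0.1, lr=0.01, n_iter=1000):
    """Lasso-penalized Cox regression via proximal gradient descent on the
    Breslow-type negative log partial likelihood (assumes no tied event times)."""
    n, p = X.shape
    order = np.argsort(-time)                 # descending time: risk sets are prefixes
    X, event = X[order], event[order]
    beta = np.zeros(p)
    for _ in range(n_iter):
        w = np.exp(X @ beta)
        cum_w = np.cumsum(w)                  # sum of exp(eta) over each risk set
        cum_wx = np.cumsum(w[:, None] * X, axis=0)
        # Gradient of the negative log partial likelihood (event times only)
        grad = -np.sum(event[:, None] * (X - cum_wx / cum_w[:, None]), axis=0) / n
        beta -= lr * grad
        beta = np.sign(beta) * np.maximum(np.abs(beta) - lr * lam, 0.0)  # soft-threshold
    return beta

rng = np.random.default_rng(5)
n, p = 200, 500
X = rng.standard_normal((n, p))
time = rng.exponential(np.exp(-(X[:, 0] - X[:, 1])))       # risk driven by 2 covariates
event = (rng.uniform(size=n) < 0.8).astype(float)          # crude toy censoring indicator
print(np.flatnonzero(cox_lasso(X, time, event, lam=0.1)))  # indices of nonzero coefficients
```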
Non-Asymptotic Oracle Inequalities for the High-Dimensional Cox Regression via Lasso
Kong, Shengchun; Nan, Bin
2013-01-01
We consider finite sample properties of the regularized high-dimensional Cox regression via lasso. Existing literature focuses on linear models or generalized linear models with Lipschitz loss functions, where the empirical risk functions are summations of independent and identically distributed (iid) losses. The summands in the negative log partial likelihood function for censored survival data, however, are neither iid nor Lipschitz. We first approximate the negative log partial likelihood function by a sum of iid non-Lipschitz terms, then derive the non-asymptotic oracle inequalities for the lasso-penalized Cox regression, using pointwise arguments to tackle the difficulties caused by the lack of iid Lipschitz losses. PMID:24516328
High dimensional linear regression models under long memory dependence and measurement error
NASA Astrophysics Data System (ADS)
Kaul, Abhishek
This dissertation consists of three chapters. The first chapter introduces the models under consideration and motivates the problems of interest; a brief literature review is also provided. The second chapter investigates the properties of the Lasso under long range dependent model errors. The Lasso is a computationally efficient approach to model selection and estimation, and its properties are well studied when the regression errors are independent and identically distributed. We study the case where the regression errors form a long memory moving average process. We establish a finite sample oracle inequality for the Lasso solution and then show asymptotic sign consistency in this setup. These results are established in the high-dimensional setup (p > n), where p can increase exponentially with n. Finally, we show the consistency and $$n^{1/2-d}$$-consistency of the Lasso, along with the oracle property of the adaptive Lasso, in the case where p is fixed; here d is the memory parameter of the stationary error sequence. The performance of the Lasso in the present setup is also analysed with a simulation study. The third chapter proposes and investigates the properties of a penalized quantile-based estimator for measurement error models. Standard formulations of prediction problems in high-dimensional regression models assume the availability of fully observed covariates and sub-Gaussian, homogeneous model errors. This makes these methods inapplicable to measurement error models, where covariates are unobservable and observations are possibly non sub-Gaussian and heterogeneous. We propose weighted penalized corrected quantile estimators for the regression parameter vector in linear regression models with additive measurement errors, where the unobservable covariates are nonrandom. The proposed estimators forgo the need for the above-mentioned model assumptions. We study these estimators in both the fixed-dimensional and high-dimensional sparse setups; in the latter, the dimensionality can grow exponentially with the sample size. In the fixed-dimensional setting we provide the oracle properties associated with the proposed estimators. In the high-dimensional setting, we provide bounds for the statistical error associated with the estimation that hold with asymptotic probability 1, thereby providing the ℓ1-consistency of the proposed estimator. We also establish model selection consistency in terms of the correctly estimated zero components of the parameter vector. A simulation study that investigates the finite sample accuracy of the proposed estimator is also included in this chapter.
Bayesian Estimation of Multivariate Latent Regression Models: Gauss versus Laplace
ERIC Educational Resources Information Center
Culpepper, Steven Andrew; Park, Trevor
2017-01-01
A latent multivariate regression model is developed that employs a generalized asymmetric Laplace (GAL) prior distribution for regression coefficients. The model is designed for high-dimensional applications where an approximate sparsity condition is satisfied, such that many regression coefficients are near zero after accounting for all the model…
The cross-validated AUC for MCP-logistic regression with high-dimensional data.
Jiang, Dingfeng; Huang, Jian; Zhang, Ying
2013-10-01
We propose a cross-validated area under the receiver operating characteristic (ROC) curve (CV-AUC) criterion for tuning parameter selection for penalized methods in sparse, high-dimensional logistic regression models. We use this criterion in combination with the minimax concave penalty (MCP) method for variable selection. The CV-AUC criterion is specifically designed for optimizing the classification performance for binary outcome data. To implement the proposed approach, we derive an efficient coordinate descent algorithm to compute the MCP-logistic regression solution surface. Simulation studies are conducted to evaluate the finite sample performance of the proposed method and to compare it with existing methods, including the Akaike information criterion (AIC), the Bayesian information criterion (BIC), and the extended BIC (EBIC). The model selected based on the CV-AUC criterion tends to have a larger predictive AUC and a smaller classification error than those with tuning parameters selected using the AIC, BIC, or EBIC. We illustrate the application of the MCP-logistic regression with the CV-AUC criterion on three microarray datasets from studies that attempt to identify genes related to cancers. Our simulation studies and data examples demonstrate that the CV-AUC is an attractive method for tuning parameter selection for penalized methods in high-dimensional logistic regression models.
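Since the MCP penalty is not available in scikit-learn, the sketch below uses an L1-penalized logistic regression as a stand-in while illustrating the CV-AUC idea itself: the tuning parameter is chosen by cross-validated AUC rather than by AIC/BIC-type criteria. The synthetic data and the grid of penalty strengths are assumptions.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import GridSearchCV

# Sparse high-dimensional logistic regression; the penalty strength is tuned by
# cross-validated AUC instead of AIC/BIC-type criteria.
X, y = make_classification(n_samples=200, n_features=2000, n_informative=10,
                           n_redundant=0, random_state=0)

lasso_logit = LogisticRegression(penalty="l1", solver="liblinear", max_iter=5000)
search = GridSearchCV(lasso_logit, param_grid={"C": np.logspace(-2, 1, 20)},
                      scoring="roc_auc", cv=5)
search.fit(X, y)
print(search.best_params_, search.best_score_)
print("selected features:", np.flatnonzero(search.best_estimator_.coef_.ravel()))
```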
Heath, Anna; Manolopoulou, Ioanna; Baio, Gianluca
2016-10-15
The Expected Value of Perfect Partial Information (EVPPI) is a decision-theoretic measure of the 'cost' of parametric uncertainty in decision making, used principally in health economic decision making. Despite this decision-theoretic grounding, the uptake of EVPPI calculations in practice has been slow. This is in part due to the prohibitive computational time required to estimate the EVPPI via Monte Carlo simulations. However, recent developments have demonstrated that the EVPPI can be estimated by non-parametric regression methods, which have significantly decreased the computation time required to approximate the EVPPI. Under certain circumstances, high-dimensional Gaussian process (GP) regression is suggested, but this can still be prohibitively expensive. Applying fast computation methods developed in spatial statistics using Integrated Nested Laplace Approximations (INLA) and projecting from a high-dimensional into a low-dimensional input space allows us to decrease the computation time for fitting these high-dimensional GPs, often substantially. We demonstrate that the EVPPI calculated using our method for GP regression is in line with the standard GP regression method and that, despite the apparent methodological complexity of this new method, R functions are available in the package BCEA to implement it simply and efficiently. © 2016 The Authors. Statistics in Medicine published by John Wiley & Sons Ltd.
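The generic regression-based EVPPI estimator that the above work accelerates can be sketched directly: fit the conditional expectation of net benefit given the parameter subset of interest and compare expected maxima. The GP kernel, the toy two-option decision problem, and the use of plain GP regression (rather than the INLA/projection machinery described above) are assumptions for illustration.

```python
import numpy as np
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import RBF, WhiteKernel

def evppi_regression(theta_sub, nb):
    """Regression-based EVPPI: fit E[net benefit | parameter subset] per option
    with a GP, then compare expected maxima.

    theta_sub : (n, k) PSA samples of the parameter subset of interest
    nb        : (n, D) simulated net benefit for each of D decision options"""
    fitted = np.column_stack([
        GaussianProcessRegressor(kernel=RBF() + WhiteKernel(), normalize_y=True)
        .fit(theta_sub, nb[:, d]).predict(theta_sub)
        for d in range(nb.shape[1])])
    return np.mean(fitted.max(axis=1)) - fitted.mean(axis=0).max()

# Toy two-option decision problem with a single influential parameter
rng = np.random.default_rng(6)
theta = rng.normal(size=(400, 1))
nb = np.column_stack([1000 * theta[:, 0] + rng.normal(0, 500, 400),
                      np.full(400, 100.0) + rng.normal(0, 500, 400)])
print(evppi_regression(theta, nb))
```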
Binder, Harald; Porzelius, Christine; Schumacher, Martin
2011-03-01
Analysis of molecular data promises identification of biomarkers for improving prognostic models, thus potentially enabling better patient management. For identifying such biomarkers, risk prediction models can be employed that link high-dimensional molecular covariate data to a clinical endpoint. In low-dimensional settings, a multitude of statistical techniques already exists for building such models, e.g. allowing for variable selection or for quantifying the added value of a new biomarker. We provide an overview of techniques for regularized estimation that transfer this toward high-dimensional settings, with a focus on models for time-to-event endpoints. Techniques for incorporating specific covariate structure are discussed, as well as techniques for dealing with more complex endpoints. Employing gene expression data from patients with diffuse large B-cell lymphoma, some typical modeling issues from low-dimensional settings are illustrated in a high-dimensional application. First, the performance of classical stepwise regression is compared to stage-wise regression, as implemented by a component-wise likelihood-based boosting approach. A second issue arises when artificially transforming the response into a binary variable. The effects of the resulting loss of efficiency and potential bias in a high-dimensional setting are illustrated, and a link to competing risks models is provided. Finally, we discuss conditions for adequately quantifying the added value of high-dimensional gene expression measurements, both at the stage of model fitting and when performing evaluation. Copyright © 2011 WILEY-VCH Verlag GmbH & Co. KGaA, Weinheim.
Spertus, Jacob V; Normand, Sharon-Lise T
2018-04-23
High-dimensional data provide many potential confounders that may bolster the plausibility of the ignorability assumption in causal inference problems. Propensity score methods are powerful causal inference tools, which are popular in health care research and are particularly useful for high-dimensional data. Recent interest has surrounded a Bayesian treatment of propensity scores in order to flexibly model the treatment assignment mechanism and summarize posterior quantities while incorporating variance from the treatment model. We discuss methods for Bayesian propensity score analysis of binary treatments, focusing on modern methods for high-dimensional Bayesian regression and the propagation of uncertainty. We introduce a novel and simple estimator for the average treatment effect that capitalizes on conjugacy of the beta and binomial distributions. Through simulations, we show the utility of horseshoe priors and Bayesian additive regression trees paired with our new estimator, while demonstrating the importance of including variance from the treatment regression model. An application to cardiac stent data with almost 500 confounders and 9000 patients illustrates approaches and facilitates comparison with existing alternatives. As measured by a falsifiability endpoint, we improved confounder adjustment compared with past observational research of the same problem. © 2018 WILEY-VCH Verlag GmbH & Co. KGaA, Weinheim.
Multivariate Boosting for Integrative Analysis of High-Dimensional Cancer Genomic Data
Xiong, Lie; Kuan, Pei-Fen; Tian, Jianan; Keles, Sunduz; Wang, Sijian
2015-01-01
In this paper, we propose a novel multivariate component-wise boosting method for fitting multivariate response regression models in the high-dimension, low-sample-size setting. Our method is motivated by modeling the association among different biological molecules based on multiple types of high-dimensional genomic data. In particular, we are interested in two applications: studying the influence of DNA copy number alterations on RNA transcript levels and investigating the association between DNA methylation and gene expression. For this purpose, we model the dependence of RNA expression levels on DNA copy number alterations and the dependence of gene expression on DNA methylation through multivariate regression models, and utilize a boosting-type method to handle the high dimensionality as well as to model the possible nonlinear associations. The performance of the proposed method is demonstrated through simulation studies. Finally, our multivariate boosting method is applied to two breast cancer studies. PMID:26609213
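A stripped-down sketch of multivariate component-wise boosting conveys the core update: at each step, one (predictor, response) coefficient is updated by a shrunken univariate least-squares step. The step size, number of steps, linear base learners, and toy data are assumptions; the published method is richer (e.g., it can capture nonlinear associations).

```python
import numpy as np

def multivariate_componentwise_boost(X, Y, n_steps=200, nu=0.1):
    """Component-wise L2-boosting for a multivariate response: at each step the
    single (predictor, response) pair that most reduces the residual sum of
    squares gets a shrunken univariate least-squares update."""
    n, p = X.shape
    B = np.zeros((p, Y.shape[1]))
    Xc = X - X.mean(axis=0)
    R = Y - Y.mean(axis=0)                            # residuals, responses centered
    col_ss = (Xc ** 2).sum(axis=0)
    for _ in range(n_steps):
        slopes = Xc.T @ R / col_ss[:, None]           # univariate OLS slopes (p x q)
        gain = slopes ** 2 * col_ss[:, None]          # per-pair reduction in RSS
        j, k = np.unravel_index(np.argmax(gain), gain.shape)
        B[j, k] += nu * slopes[j, k]
        R[:, k] -= nu * slopes[j, k] * Xc[:, j]
    return B

rng = np.random.default_rng(7)
n, p, q = 100, 500, 3                                 # e.g., copy numbers -> 3 transcripts
X = rng.standard_normal((n, p))
Y = np.column_stack([2 * X[:, 0], -X[:, 1] + X[:, 2], 0.5 * X[:, 3]]) \
    + 0.3 * rng.standard_normal((n, q))
print(np.argwhere(np.abs(multivariate_componentwise_boost(X, Y)) > 0.2))
```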
Quantile Regression for Analyzing Heterogeneity in Ultra-high Dimension
Wang, Lan; Wu, Yichao
2012-01-01
Ultra-high dimensional data often display heterogeneity due to either heteroscedastic variance or other forms of non-location-scale covariate effects. To accommodate heterogeneity, we advocate a more general interpretation of sparsity, which assumes that only a small number of covariates influence the conditional distribution of the response variable given all candidate covariates; however, the sets of relevant covariates may differ when we consider different segments of the conditional distribution. In this framework, we investigate the methodology and theory of nonconvex penalized quantile regression in ultra-high dimension. The proposed approach has two distinctive features: (1) it enables us to explore the entire conditional distribution of the response variable given the ultra-high dimensional covariates and provides a more realistic picture of the sparsity pattern; (2) it requires substantially weaker conditions compared with alternative methods in the literature; thus, it greatly alleviates the difficulty of model checking in the ultra-high dimension. In the theoretical development, it is challenging to deal with both the nonsmooth loss function and the nonconvex penalty function in an ultra-high dimensional parameter space. We introduce a novel sufficient optimality condition that relies on a convex differencing representation of the penalized loss function and the subdifferential calculus. Exploiting this optimality condition enables us to establish the oracle property for sparse quantile regression in the ultra-high dimension under relaxed conditions. The proposed method greatly enhances existing tools for ultra-high dimensional data analysis. Monte Carlo simulations demonstrate the usefulness of the proposed procedure. The real data example we analyzed demonstrates that the new approach reveals substantially more information compared with alternative methods. PMID:23082036
High-Dimensional Heteroscedastic Regression with an Application to eQTL Data Analysis
Daye, Z. John; Chen, Jinbo; Li, Hongzhe
2011-01-01
We consider the problem of high-dimensional regression under non-constant error variances. Despite being a common phenomenon in biological applications, heteroscedasticity has so far been largely ignored in high-dimensional analysis of genomic data sets. We propose a new methodology that allows non-constant error variances for high-dimensional estimation and model selection. Our method incorporates heteroscedasticity by simultaneously modeling both the mean and variance components via a novel doubly regularized approach. Extensive Monte Carlo simulations indicate that our proposed procedure can result in better estimation and variable selection than existing methods when heteroscedasticity arises from the presence of predictors explaining error variances and from outliers. Further, we demonstrate the presence of heteroscedasticity in, and apply our method to, an expression quantitative trait locus (eQTL) study of 112 yeast segregants. The new procedure can automatically account for heteroscedasticity in identifying the eQTLs that are associated with gene expression variations, and it leads to smaller prediction errors. These results demonstrate the importance of considering heteroscedasticity in eQTL data analysis. PMID:22547833
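One simple way to mimic the joint mean/variance modeling idea is to alternate sparse fits for the mean and for the log squared residuals, reweighting observations by the estimated variances. This is only a rough stand-in for the doubly regularized estimator described above; the penalty levels, iteration count, and use of scikit-learn's Lasso with sample weights are assumptions.

```python
import numpy as np
from sklearn.linear_model import Lasso

def heteroscedastic_sparse_fit(X, y, alpha_mean=0.05, alpha_var=0.05, n_iter=5):
    """Alternate sparse fits for the mean and for the log squared residuals,
    reweighting observations by the estimated inverse variances."""
    weights = np.ones(len(y))
    for _ in range(n_iter):
        mean_fit = Lasso(alpha=alpha_mean).fit(X, y, sample_weight=weights)
        resid = y - mean_fit.predict(X)
        var_fit = Lasso(alpha=alpha_var).fit(X, np.log(resid ** 2 + 1e-8))
        weights = np.exp(-var_fit.predict(X))         # inverse estimated variances
    return mean_fit, var_fit

rng = np.random.default_rng(8)
n, p = 200, 500
X = rng.standard_normal((n, p))
y = X[:, 0] + np.exp(0.5 * X[:, 1]) * rng.standard_normal(n)  # X[:, 1] drives the variance
mean_fit, var_fit = heteroscedastic_sparse_fit(X, y)
print("mean predictors:", np.flatnonzero(mean_fit.coef_)[:10])
print("variance predictors:", np.flatnonzero(var_fit.coef_)[:10])
```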
Calibrated Multivariate Regression with Application to Neural Semantic Basis Discovery.
Liu, Han; Wang, Lie; Zhao, Tuo
2015-08-01
We propose a calibrated multivariate regression method named CMR for fitting high-dimensional multivariate regression models. Compared with existing methods, CMR calibrates the regularization for each regression task with respect to its noise level, so that it simultaneously attains improved finite-sample performance and tuning insensitiveness. Theoretically, we provide sufficient conditions under which CMR achieves the optimal rate of convergence in parameter estimation. Computationally, we propose an efficient smoothed proximal gradient algorithm with a worst-case numerical rate of convergence O(1/ϵ), where ϵ is a pre-specified accuracy of the objective function value. We conduct thorough numerical simulations to illustrate that CMR consistently outperforms other high-dimensional multivariate regression methods. We also apply CMR to solve a brain activity prediction problem and find that it is competitive with a handcrafted model created by human experts. The R package camel implementing the proposed method is available on the Comprehensive R Archive Network http://cran.r-project.org/web/packages/camel/.
Ke, Tracy; Fan, Jianqing; Wu, Yichao
2014-01-01
This paper explores the homogeneity of coefficients in high-dimensional regression, which extends the sparsity concept and is more general and suitable for many applications. Homogeneity arises when regression coefficients corresponding to neighboring geographical regions or to a similar cluster of covariates are expected to be approximately the same. Sparsity corresponds to a special case of homogeneity, with a large cluster at the known atom zero. In this article, we propose a new method called clustering algorithm in regression via data-driven segmentation (CARDS) to explore homogeneity. New mathematical results are provided on the gain that can be achieved by exploring homogeneity. Statistical properties of two versions of CARDS are analyzed. In particular, the asymptotic normality of our proposed CARDS estimator is established, which reveals better estimation accuracy for homogeneous parameters than that obtained without homogeneity exploration. When our methods are combined with sparsity exploration, further efficiency can be achieved beyond the exploration of sparsity alone. This provides additional insights into the power of exploring low-dimensional structures in high-dimensional regression: homogeneity and sparsity. Our results also shed light on the properties of the fused Lasso. The newly developed method is further illustrated by simulation studies and applications to real data. Supplementary materials for this article are available online. PMID:26085701
Reduced rank regression via adaptive nuclear norm penalization
Chen, Kun; Dong, Hongbo; Chan, Kung-Sik
2014-01-01
We propose an adaptive nuclear norm penalization approach for low-rank matrix approximation, and use it to develop a new reduced rank estimation method for high-dimensional multivariate regression. The adaptive nuclear norm is defined as the weighted sum of the singular values of the matrix, and it is generally non-convex under the natural restriction that the weight decreases with the singular value. However, we show that the proposed non-convex penalized regression method has a global optimal solution obtained from an adaptively soft-thresholded singular value decomposition. The method is computationally efficient, and the resulting solution path is continuous. The rank consistency of, and prediction/estimation performance bounds for, the estimator are established in a high-dimensional asymptotic regime. Simulation studies and an application in genetics demonstrate its efficacy. PMID:25045172
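The closed-form character of the estimator (an adaptively soft-thresholded SVD of the least-squares fitted values) can be sketched as follows; the weight function w_i = d_i^(-gamma), the values of lam and gamma, and the toy rank-3 data are illustrative assumptions rather than the paper's tuning scheme.

```python
import numpy as np

def adaptive_svt_reduced_rank(X, Y, lam=1000.0, gamma=2.0):
    """Reduced-rank fit via an adaptively soft-thresholded SVD of the OLS fitted values.

    Singular values are shrunk with weights that decrease in the singular value
    (here w_i = d_i^(-gamma)); lam and gamma are illustrative choices."""
    B_ols, *_ = np.linalg.lstsq(X, Y, rcond=None)
    U, d, Vt = np.linalg.svd(X @ B_ols, full_matrices=False)
    d_shrunk = np.maximum(d - lam * d ** (-gamma), 0.0)
    fitted = U @ np.diag(d_shrunk) @ Vt
    B_rr, *_ = np.linalg.lstsq(X, fitted, rcond=None)      # map back to coefficients
    return B_rr, int(np.sum(d_shrunk > 0))

rng = np.random.default_rng(9)
n, p, q, r = 100, 20, 15, 3
B_true = rng.standard_normal((p, r)) @ rng.standard_normal((r, q))   # rank-3 signal
X = rng.standard_normal((n, p))
Y = X @ B_true + rng.standard_normal((n, q))
B_hat, est_rank = adaptive_svt_reduced_rank(X, Y)
print("estimated rank:", est_rank)
```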
CALIBRATING NON-CONVEX PENALIZED REGRESSION IN ULTRA-HIGH DIMENSION.
Wang, Lan; Kim, Yongdai; Li, Runze
2013-10-01
We investigate high-dimensional non-convex penalized regression, where the number of covariates may grow at an exponential rate. Although recent asymptotic theory established that there exists a local minimum possessing the oracle property under general conditions, it is still largely an open problem how to identify the oracle estimator among potentially multiple local minima. There are two main obstacles: (1) due to the presence of multiple minima, the solution path is nonunique and is not guaranteed to contain the oracle estimator; (2) even if a solution path is known to contain the oracle estimator, the optimal tuning parameter depends on many unknown factors and is hard to estimate. To address these two challenging issues, we first prove that an easy-to-calculate calibrated CCCP algorithm produces a consistent solution path which contains the oracle estimator with probability approaching one. Furthermore, we propose a high-dimensional BIC criterion and show that it can be applied to the solution path to select the optimal tuning parameter which asymptotically identifies the oracle estimator. The theory for a general class of non-convex penalties in the ultra-high dimensional setup is established when the random errors follow the sub-Gaussian distribution. Monte Carlo studies confirm that the calibrated CCCP algorithm combined with the proposed high-dimensional BIC has desirable performance in identifying the underlying sparsity pattern for high-dimensional data analysis.
CALIBRATING NON-CONVEX PENALIZED REGRESSION IN ULTRA-HIGH DIMENSION
Wang, Lan; Kim, Yongdai; Li, Runze
2014-01-01
We investigate high-dimensional non-convex penalized regression, where the number of covariates may grow at an exponential rate. Although recent asymptotic theory established that there exists a local minimum possessing the oracle property under general conditions, it is still largely an open problem how to identify the oracle estimator among potentially multiple local minima. There are two main obstacles: (1) due to the presence of multiple minima, the solution path is nonunique and is not guaranteed to contain the oracle estimator; (2) even if a solution path is known to contain the oracle estimator, the optimal tuning parameter depends on many unknown factors and is hard to estimate. To address these two challenging issues, we first prove that an easy-to-calculate calibrated CCCP algorithm produces a consistent solution path which contains the oracle estimator with probability approaching one. Furthermore, we propose a high-dimensional BIC criterion and show that it can be applied to the solution path to select the optimal tuning parameter which asymptotically identifies the oracle estimator. The theory for a general class of non-convex penalties in the ultra-high dimensional setup is established when the random errors follow the sub-Gaussian distribution. Monte Carlo studies confirm that the calibrated CCCP algorithm combined with the proposed high-dimensional BIC has desirable performance in identifying the underlying sparsity pattern for high-dimensional data analysis. PMID:24948843
Testing a single regression coefficient in high dimensional linear models
Zhong, Ping-Shou; Li, Runze; Wang, Hansheng; Tsai, Chih-Ling
2017-01-01
In linear regression models with high dimensional data, the classical z-test (or t-test) for testing the significance of each single regression coefficient is no longer applicable. This is mainly because the number of covariates exceeds the sample size. In this paper, we propose a simple and novel alternative by introducing the Correlated Predictors Screening (CPS) method to control for predictors that are highly correlated with the target covariate. Accordingly, the classical ordinary least squares approach can be employed to estimate the regression coefficient associated with the target covariate. In addition, we demonstrate that the resulting estimator is consistent and asymptotically normal even if the random errors are heteroscedastic. This enables us to apply the z-test to assess the significance of each covariate. Based on the p-value obtained from testing the significance of each covariate, we further conduct multiple hypothesis testing by controlling the false discovery rate at the nominal level. Then, we show that the multiple hypothesis testing achieves consistent model selection. Simulation studies and empirical examples are presented to illustrate the finite sample performance and the usefulness of the proposed method, respectively. PMID:28663668
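A rough sketch of the CPS-plus-OLS workflow described above: screen for the covariates most correlated with the target covariate, include them as controls, and test the target's coefficient with a heteroscedasticity-robust z-test. The number of screened controls and the robust covariance choice (HC0) are illustrative assumptions, not the paper's screening rule.

```python
import numpy as np
import statsmodels.api as sm

def cps_ztest(X, y, target=0, n_controls=20):
    """Correlated Predictors Screening, then OLS with a robust z-test: screen for
    covariates most correlated with the target, include them as controls, and test
    the target coefficient with a heteroscedasticity-robust standard error."""
    corr = np.abs(np.corrcoef(X, rowvar=False)[target])
    corr[target] = -np.inf
    controls = np.argsort(-corr)[:n_controls]
    design = sm.add_constant(X[:, np.r_[target, controls]])
    fit = sm.OLS(y, design).fit(cov_type="HC0")        # robust to heteroscedastic errors
    return fit.params[1], fit.bse[1], fit.pvalues[1]   # target coefficient, SE, p-value

rng = np.random.default_rng(10)
n, p = 200, 1000
X = rng.standard_normal((n, p))
y = 0.5 * X[:, 0] + X[:, 1] + (1 + 0.5 * np.abs(X[:, 2])) * rng.standard_normal(n)
print(cps_ztest(X, y, target=0))
```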
Testing a single regression coefficient in high dimensional linear models.
Lan, Wei; Zhong, Ping-Shou; Li, Runze; Wang, Hansheng; Tsai, Chih-Ling
2016-11-01
In linear regression models with high dimensional data, the classical z-test (or t-test) for testing the significance of each single regression coefficient is no longer applicable. This is mainly because the number of covariates exceeds the sample size. In this paper, we propose a simple and novel alternative by introducing the Correlated Predictors Screening (CPS) method to control for predictors that are highly correlated with the target covariate. Accordingly, the classical ordinary least squares approach can be employed to estimate the regression coefficient associated with the target covariate. In addition, we demonstrate that the resulting estimator is consistent and asymptotically normal even if the random errors are heteroscedastic. This enables us to apply the z-test to assess the significance of each covariate. Based on the p-value obtained from testing the significance of each covariate, we further conduct multiple hypothesis testing by controlling the false discovery rate at the nominal level. Then, we show that the multiple hypothesis testing achieves consistent model selection. Simulation studies and empirical examples are presented to illustrate the finite sample performance and the usefulness of the proposed method, respectively.
Hyper-Spectral Image Analysis With Partially Latent Regression and Spatial Markov Dependencies
NASA Astrophysics Data System (ADS)
Deleforge, Antoine; Forbes, Florence; Ba, Sileye; Horaud, Radu
2015-09-01
Hyper-spectral data can be analyzed to recover physical properties at large planetary scales. This involves resolving inverse problems which can be addressed within machine learning, with the advantage that, once a relationship between physical parameters and spectra has been established in a data-driven fashion, the learned relationship can be used to estimate physical parameters for new hyper-spectral observations. Within this framework, we propose a spatially-constrained and partially-latent regression method which maps high-dimensional inputs (hyper-spectral images) onto low-dimensional responses (physical parameters such as the local chemical composition of the soil). The proposed regression model comprises two key features. Firstly, it combines a Gaussian mixture of locally-linear mappings (GLLiM) with a partially-latent response model. While the former makes high-dimensional regression tractable, the latter makes it possible to deal with physical parameters that cannot be observed or, more generally, with data contaminated by experimental artifacts that cannot be explained with noise models. Secondly, spatial constraints are introduced in the model through a Markov random field (MRF) prior which provides a spatial structure to the Gaussian-mixture hidden variables. Experiments conducted on a database composed of remotely sensed observations collected from Mars by the Mars Express orbiter demonstrate the effectiveness of the proposed model.
Prognostic value of three-dimensional ultrasound for fetal hydronephrosis
WANG, JUNMEI; YING, WEIWEN; TANG, DAXING; YANG, LIMING; LIU, DONGSHENG; LIU, YUANHUI; PAN, JIAOE; XIE, XING
2015-01-01
The present study evaluated the prognostic value of three-dimensional ultrasound for fetal hydronephrosis. Pregnant females with fetal hydronephrosis were enrolled and a novel three-dimensional ultrasound indicator, renal parenchymal volume/kidney volume, was introduced to predict the postnatal prognosis of fetal hydronephrosis in comparison with commonly used ultrasound indicators. All ultrasound indicators of fetal hydronephrosis could predict whether postnatal surgery was required for fetal hydronephrosis; however, the predictive performance of renal parenchymal volume/kidney volume measurements as an individual indicator was the highest. In conclusion, ultrasound is important in predicting whether postnatal surgery is required for fetal hydronephrosis, and the three-dimensional ultrasound indicator renal parenchymal volume/kidney volume has a high predictive performance. Furthermore, the majority of cases of fetal hydronephrosis spontaneously regress subsequent to birth, and the regression time is closely associated with ultrasound indicators. PMID:25667626
NASA Astrophysics Data System (ADS)
Polat, Esra; Gunay, Suleyman
2013-10-01
One of the problems encountered in Multiple Linear Regression (MLR) is multicollinearity, which causes the overestimation of the regression parameters and an increase in the variance of these parameters. Hence, when multicollinearity is present, biased estimation procedures such as classical Principal Component Regression (CPCR) and Partial Least Squares Regression (PLSR) are performed. The SIMPLS algorithm is the leading PLSR algorithm because of its speed, its efficiency, and the ease of interpreting its results. However, both CPCR and SIMPLS yield very unreliable results when the data set contains outlying observations. Therefore, Hubert and Vanden Branden (2003) presented a robust PCR (RPCR) method and a robust PLSR method called RSIMPLS. In RPCR, a robust Principal Component Analysis (PCA) method for high-dimensional data is first applied to the independent variables; then the dependent variables are regressed on the scores using a robust regression method. RSIMPLS is constructed from a robust covariance matrix for high-dimensional data and robust linear regression. The purpose of this study is to show the use of the RPCR and RSIMPLS methods on an econometric data set, comparing the two methods on an inflation model for Turkey. The considered methods are compared in terms of predictive ability and goodness of fit by using a robust Root Mean Squared Error of Cross-validation (R-RMSECV), a robust R2 value, and the Robust Component Selection (RCS) statistic.
SPReM: Sparse Projection Regression Model For High-dimensional Linear Regression *
Sun, Qiang; Zhu, Hongtu; Liu, Yufeng; Ibrahim, Joseph G.
2014-01-01
The aim of this paper is to develop a sparse projection regression modeling (SPReM) framework to perform multivariate regression modeling with a large number of responses and a multivariate covariate of interest. We propose two novel heritability ratios to simultaneously perform dimension reduction, response selection, estimation, and testing, while explicitly accounting for correlations among multivariate responses. Our SPReM is devised to specifically address the low statistical power issue of many standard statistical approaches, such as the Hotelling's T² test statistic or a mass univariate analysis, for high-dimensional data. We formulate the estimation problem of SPReM as a novel sparse unit rank projection (SURP) problem and propose a fast optimization algorithm for SURP. Furthermore, we extend SURP to the sparse multi-rank projection (SMURP) by adopting a sequential SURP approximation. Theoretically, we have systematically investigated the convergence properties of SURP and the convergence rate of SURP estimates. Our simulation results and real data analysis have shown that SPReM outperforms other state-of-the-art methods. PMID:26527844
Tao, Chenyang; Nichols, Thomas E.; Hua, Xue; Ching, Christopher R.K.; Rolls, Edmund T.; Thompson, Paul M.; Feng, Jianfeng
2017-01-01
We propose a generalized reduced rank latent factor regression model (GRRLF) for the analysis of tensor field responses and high-dimensional covariates. The model is motivated by the need in imaging-genetic studies to identify genetic variants that are associated with brain imaging phenotypes, often in the form of high-dimensional tensor fields. GRRLF identifies the effective dimensionality of the data from its structure, and then jointly performs dimension reduction of the covariates, dynamic identification of latent factors, and nonparametric estimation of both covariate and latent response fields. After accounting for the latent and covariate effects, GRRLF performs a nonparametric test on the remaining factor of interest. GRRLF provides a better factorization of the signals compared with common solutions, and is less susceptible to overfitting because it exploits the effective dimensionality. The generality and flexibility of GRRLF also allow various statistical models to be handled in a unified framework, and solutions can be computed efficiently. Within the field of neuroimaging, it improves the sensitivity for weak signals and is a promising alternative to existing approaches. The operation of the framework is demonstrated with both synthetic datasets and a real-world neuroimaging example in which the effects of a set of genes on the structure of the brain at the voxel level were measured, and the results compared favorably with those from existing approaches. PMID:27666385
Big Data Toolsets to Pharmacometrics: Application of Machine Learning for Time-to-Event Analysis.
Gong, Xiajing; Hu, Meng; Zhao, Liang
2018-05-01
Additional value can be potentially created by applying big data tools to address pharmacometric problems. The performances of machine learning (ML) methods and the Cox regression model were evaluated based on simulated time-to-event data synthesized under various preset scenarios, i.e., with linear vs. nonlinear and dependent vs. independent predictors in the proportional hazard function, or with high-dimensional data featured by a large number of predictor variables. Our results showed that ML-based methods outperformed the Cox model in prediction performance as assessed by concordance index and in identifying the preset influential variables for high-dimensional data. The prediction performances of ML-based methods are also less sensitive to data size and censoring rates than the Cox regression model. In conclusion, ML-based methods provide a powerful tool for time-to-event analysis, with a built-in capacity for high-dimensional data and better performance when the predictor variables assume nonlinear relationships in the hazard function. © 2018 The Authors. Clinical and Translational Science published by Wiley Periodicals, Inc. on behalf of American Society for Clinical Pharmacology and Therapeutics.
Complex Environmental Data Modelling Using Adaptive General Regression Neural Networks
NASA Astrophysics Data System (ADS)
Kanevski, Mikhail
2015-04-01
The research deals with an adaptation and application of Adaptive General Regression Neural Networks (GRNN) to high-dimensional environmental data. GRNN [1,2,3] are efficient modelling tools for both spatial and temporal data and are based on nonparametric kernel methods closely related to the classical Nadaraya-Watson estimator. Adaptive GRNN, using anisotropic kernels, can also be applied to feature selection tasks when working with high-dimensional data [1,3]. In the present research, Adaptive GRNN are used to study geospatial data predictability and relevant feature selection using both simulated and real data case studies. The original raw data were either three-dimensional monthly precipitation data or monthly wind speeds embedded into a 13-dimensional space constructed from geographical coordinates and geo-features calculated from a digital elevation model. GRNN were applied in two different ways: 1) adaptive GRNN with the resulting list of features ordered according to their relevancy; and 2) adaptive GRNN applied to evaluate all N possible models [in the case of wind fields, N = 2^13 - 1 = 8191] and rank them according to the cross-validation error. In both cases, training was carried out using a leave-one-out procedure. An important result of the study is that the set of the most relevant features depends on the month (strong seasonal effect) and year. The predictability of precipitation and wind field patterns, estimated using the cross-validation and testing errors of raw and shuffled data, was studied in detail. The results of both approaches were qualitatively and quantitatively compared. In conclusion, Adaptive GRNN, with their ability to select features and to model complex high-dimensional data efficiently, can be widely used in automatic/on-line mapping and as an integrated part of environmental decision support systems. 1. Kanevski M., Pozdnoukhov A., Timonin V. Machine Learning for Spatial Environmental Data. Theory, applications and software. EPFL Press. With a CD: data, software, guides. (2009). 2. Kanevski M. Spatial Predictions of Soil Contamination Using General Regression Neural Networks. Systems Research and Information Systems, Volume 8, number 4, 1999. 3. Robert S., Foresti L., Kanevski M. Spatial prediction of monthly wind speeds in complex terrain with adaptive general regression neural networks. International Journal of Climatology, 33, pp. 1793-1804, 2013.
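GRNN is essentially Nadaraya-Watson kernel regression with per-feature bandwidths, so the feature-relevance and predictability assessment via leave-one-out error can be sketched compactly. The isotropic bandwidth grid and the toy data below are assumptions; adaptive GRNN would instead optimize an anisotropic bandwidth per feature.

```python
import numpy as np

def grnn_predict(Xq, X, y, bandwidths):
    """GRNN = Nadaraya-Watson kernel regression with one bandwidth per feature."""
    d2 = np.sum(((Xq[:, None, :] - X[None, :, :]) / bandwidths) ** 2, axis=-1)
    K = np.exp(-0.5 * d2)
    return K @ y / K.sum(axis=1)

def loo_error(X, y, bandwidths):
    """Leave-one-out cross-validation error, used to tune bandwidths and to judge
    feature relevance and overall predictability."""
    d2 = np.sum(((X[:, None, :] - X[None, :, :]) / bandwidths) ** 2, axis=-1)
    K = np.exp(-0.5 * d2)
    np.fill_diagonal(K, 0.0)               # exclude each point from its own prediction
    return np.mean((y - K @ y / K.sum(axis=1)) ** 2)

# Toy "wind speed" setting: 13 geo-features, only two of them actually relevant
rng = np.random.default_rng(11)
X = rng.standard_normal((400, 13))
y = np.sin(X[:, 0]) + 0.5 * X[:, 3] + 0.1 * rng.standard_normal(400)
for h in [0.3, 0.5, 1.0, 2.0]:
    print(h, loo_error(X, y, bandwidths=np.full(13, h)))
```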
Fuzzy Regression Prediction and Application Based on Multi-Dimensional Factors of Freight Volume
NASA Astrophysics Data System (ADS)
Xiao, Mengting; Li, Cheng
2018-01-01
Based on the actual development of air cargo, a multi-dimensional fuzzy regression method is used to determine the influencing factors, and the three most important factors are identified: GDP, total fixed-asset investment, and scheduled flight route mileage. Combining a systems viewpoint with analogy methods, fuzzy numbers and multiple regression are used to predict civil aviation cargo volume. A comparison with the 13th Five-Year Plan for China's Civil Aviation Development (2016-2020) shows that this method can effectively improve forecasting accuracy and reduce forecasting risk, and that the model is feasible for predicting civil aviation freight volume, with high practical significance and operability.
Starck, Tuomo; Nikkinen, Juha; Rahko, Jukka; Remes, Jukka; Hurtig, Tuula; Haapsamo, Helena; Jussila, Katja; Kuusikko-Gauffin, Sanna; Mattila, Marja-Leena; Jansson-Verkasalo, Eira; Pauls, David L; Ebeling, Hanna; Moilanen, Irma; Tervonen, Osmo; Kiviniemi, Vesa J
2013-01-01
In resting state functional magnetic resonance imaging (fMRI) studies of autism spectrum disorders (ASDs), decreased frontal-posterior functional connectivity is a persistent finding. However, the picture of default mode network (DMN) hypoconnectivity remains incomplete. In addition, functional connectivity analyses have been shown to be susceptible to even subtle motion. DMN hypoconnectivity in ASD has specifically been singled out for re-evaluation with stringent motion correction, which we aimed to conduct by so-called scrubbing. A rich set of default mode subnetworks can be obtained with high dimensional group independent component analysis (ICA), which can potentially provide a more detailed view of the connectivity alterations. We compared the DMN connectivity of high-functioning adolescents with ASDs to that of typically developing controls using ICA dual-regression with decompositions from typical to high dimensionality. Dual-regression analysis within DMN subnetworks did not reveal alterations, but connectivity between anterior and posterior DMN subnetworks was decreased in ASD. The results were very similar with and without motion scrubbing, thus indicating the efficacy of the conventional motion correction methods combined with ICA dual-regression. A specific dissociation between DMN subnetworks was revealed at high ICA dimensionality, where networks centered at the medial prefrontal cortex and the retrosplenial cortex showed weakened coupling in adolescents with ASDs compared to typically developing control participants. Generally, the results speak for a disruption in the anterior-posterior DMN interplay at the network level, whereas local functional connectivity within the DMN seems relatively unaltered.
Multivariate time series analysis of neuroscience data: some challenges and opportunities.
Pourahmadi, Mohsen; Noorbaloochi, Siamak
2016-04-01
Neuroimaging data may be viewed as high-dimensional multivariate time series, and analyzed using techniques from regression analysis, time series analysis and spatiotemporal analysis. We discuss issues related to data quality, model specification, estimation, interpretation, dimensionality and causality. Some recent research areas addressing aspects of some recurring challenges are introduced. Copyright © 2015 Elsevier Ltd. All rights reserved.
Compound Identification Using Penalized Linear Regression on Metabolomics
Liu, Ruiqi; Wu, Dongfeng; Zhang, Xiang; Kim, Seongho
2014-01-01
Compound identification is often achieved by matching the experimental mass spectra to the mass spectra stored in a reference library based on mass spectral similarity. Because the number of compounds in the reference library is much larger than the range of mass-to-charge ratio (m/z) values, the data become high-dimensional and suffer from singularity. For this reason, penalized linear regressions such as ridge regression and the lasso are used instead of ordinary least squares regression. Furthermore, two-step approaches using the dot product and Pearson's correlation along with the penalized linear regression are proposed in this study. PMID:27212894
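A sketch of the two-step idea (dot-product screening followed by penalized regression of the query spectrum on candidate library spectra) is given below; the ridge penalty, the screening size, and the synthetic library are assumptions, and the study also considers lasso and Pearson-correlation variants.

```python
import numpy as np
from sklearn.linear_model import Ridge

def identify_compound(library, query, top_k=50, alpha=1.0):
    """Two-step sketch: screen library spectra by dot-product similarity, then
    regress the query spectrum on the screened spectra with a ridge penalty and
    rank candidates by their coefficients."""
    sims = library @ query / (np.linalg.norm(library, axis=1) * np.linalg.norm(query))
    candidates = np.argsort(-sims)[:top_k]
    model = Ridge(alpha=alpha)                 # the penalty handles the singularity
    model.fit(library[candidates].T, query)    # columns = candidate spectra
    return candidates[np.argsort(-model.coef_)]

rng = np.random.default_rng(12)
n_compounds, n_mz = 5000, 800                  # library size >> m/z range
library = rng.random((n_compounds, n_mz))
query = library[123] + 0.05 * rng.standard_normal(n_mz)   # noisy spectrum of compound 123
print(identify_compound(library, query)[:5])
```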
Xu, Chao; Fang, Jian; Shen, Hui; Wang, Yu-Ping; Deng, Hong-Wen
2018-01-25
Extreme phenotype sampling (EPS) is a broadly used design to identify candidate genetic factors contributing to the variation of quantitative traits. By enriching the signals in extreme phenotypic samples, EPS can boost the association power compared to random sampling. Most existing statistical methods for EPS examine the genetic factors individually, even though many quantitative traits have multiple genetic factors underlying their variation. It is desirable to model the joint effects of genetic factors, which may increase power and identify novel quantitative trait loci under EPS. The joint analysis of genetic data in high-dimensional situations requires specialized techniques, e.g., the least absolute shrinkage and selection operator (LASSO). Although there is extensive research on and application of LASSO, statistical inference and testing for the sparse model under EPS remain unknown. We propose a novel sparse model (EPS-LASSO) with a hypothesis test for high-dimensional regression under EPS based on a decorrelated score function. Comprehensive simulations show that EPS-LASSO outperforms existing methods, with stable type I error and FDR control. EPS-LASSO provides consistent power in both low- and high-dimensional situations compared with other methods designed for high-dimensional settings. The power of EPS-LASSO is close to that of other low-dimensional methods when the causal effect sizes are small and is superior when the effects are large. Applying EPS-LASSO to a transcriptome-wide gene expression study for obesity reveals 10 significant body mass index associated genes. Our results indicate that EPS-LASSO is an effective method for EPS data analysis, which can account for correlated predictors. The source code is available at https://github.com/xu1912/EPSLASSO. hdeng2@tulane.edu. Supplementary data are available at Bioinformatics online. © The Author (2018). Published by Oxford University Press. All rights reserved. For Permissions, please email: journals.permissions@oup.com
Shimizu, Yu; Yoshimoto, Junichiro; Takamura, Masahiro; Okada, Go; Okamoto, Yasumasa; Yamawaki, Shigeto; Doya, Kenji
2017-01-01
In diagnostic applications of statistical machine learning methods to brain imaging data, common problems include data high-dimensionality and co-linearity, which often cause over-fitting and instability. To overcome these problems, we applied partial least squares (PLS) regression to resting-state functional magnetic resonance imaging (rs-fMRI) data, creating a low-dimensional representation that relates symptoms to brain activity and that predicts clinical measures. Our experimental results, based upon data from clinically depressed patients and healthy controls, demonstrated that PLS and its kernel variants provided significantly better prediction of clinical measures than ordinary linear regression. Subsequent classification using predicted clinical scores distinguished depressed patients from healthy controls with 80% accuracy. Moreover, loading vectors for latent variables enabled us to identify brain regions relevant to depression, including the default mode network, the right superior frontal gyrus, and the superior motor area. PMID:28700672
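As a rough illustration of the core modelling step, the sketch below fits a PLS regression of a clinical score on high-dimensional features with the number of latent components chosen by cross-validation; the rs-fMRI preprocessing, kernel PLS variants, and subsequent classification step are not reproduced, and the data are synthetic.

```python
# Minimal sketch: PLS regression of a clinical score on high-dimensional features,
# with the number of latent components chosen by cross-validation.
import numpy as np
from sklearn.cross_decomposition import PLSRegression
from sklearn.model_selection import GridSearchCV

rng = np.random.default_rng(1)
n, p = 80, 5000                       # few subjects, many voxel-wise features
X = rng.normal(size=(n, p))
w = np.zeros(p); w[:50] = 0.2         # a low-dimensional signal
y = X @ w + rng.normal(size=n)        # synthetic clinical score

search = GridSearchCV(PLSRegression(), {"n_components": [1, 2, 3, 5, 8]}, cv=5,
                      scoring="neg_mean_squared_error")
search.fit(X, y)
print("chosen components:", search.best_params_, "CV MSE:", -search.best_score_)
```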
Huang, Jian; Zhang, Cun-Hui
2013-01-01
The ℓ1-penalized method, or the Lasso, has emerged as an important tool for the analysis of large data sets. Many important results have been obtained for the Lasso in linear regression which have led to a deeper understanding of high-dimensional statistical problems. In this article, we consider a class of weighted ℓ1-penalized estimators for convex loss functions of a general form, including the generalized linear models. We study the estimation, prediction, selection and sparsity properties of the weighted ℓ1-penalized estimator in sparse, high-dimensional settings where the number of predictors p can be much larger than the sample size n. Adaptive Lasso is considered as a special case. A multistage method is developed to approximate concave regularized estimation by applying an adaptive Lasso recursively. We provide prediction and estimation oracle inequalities for single- and multi-stage estimators, a general selection consistency theorem, and an upper bound for the dimension of the Lasso estimator. Important models including the linear regression, logistic regression and log-linear models are used throughout to illustrate the applications of the general results. PMID:24348100
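A weighted ℓ1 penalty of this kind is often implemented by rescaling the design columns and running an ordinary lasso. The sketch below illustrates that trick with ridge-based weights (an adaptive-lasso-style choice); it is not the multistage estimator analyzed in the article, and the weight formula and penalty level are illustrative assumptions.

```python
# Minimal sketch of a weighted l1 (adaptive lasso) fit: columns are rescaled by
# data-driven weights from an initial ridge fit, then a plain lasso is run and
# the coefficients are mapped back to the original scale.
import numpy as np
from sklearn.linear_model import Ridge, LassoCV

rng = np.random.default_rng(2)
n, p = 100, 500
X = rng.normal(size=(n, p))
beta = np.zeros(p); beta[:5] = [3, -2, 1.5, 1, -1]
y = X @ beta + rng.normal(size=n)

init = Ridge(alpha=1.0).fit(X, y).coef_
w = 1.0 / (np.abs(init) + 1e-4)          # heavier penalty where the initial estimate is small
X_scaled = X / w                          # lasso on X/w is a weighted l1 penalty on the original X
fit = LassoCV(cv=5).fit(X_scaled, y)
beta_hat = fit.coef_ / w                  # back to the original parameterization
print("selected variables:", np.flatnonzero(beta_hat))
```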
[New method of mixed gas infrared spectrum analysis based on SVM].
Bai, Peng; Xie, Wen-Jun; Liu, Jun-Hua
2007-07-01
A new method of infrared spectrum analysis based on the support vector machine (SVM) for mixture gases was proposed. The kernel function in SVM was used to map the seriously overlapping absorption spectra into a high-dimensional space; after this transformation, the high-dimensional data could be processed in the original space, so a regression calibration model was established, and this calibration model was then applied to analyze the concentration of each component gas. It was also shown that the regression calibration model with SVM could be used for component recognition of the mixture gas. The method was applied to the analysis of different data samples. Factors that affect the model, such as the scan interval, wavelength range, kernel function, and penalty coefficient C, are discussed. Experimental results show that the maximum mean absolute error of component concentration is 0.132%, and the component recognition accuracy is higher than 94%. The problems of overlapping absorption spectra, using the same method for qualitative and quantitative analysis, and a limited number of training samples were addressed. The method could be used in other mixture gas infrared spectrum analyses and has both theoretical and application value.
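A hedged sketch of the calibration idea follows, using scikit-learn's RBF-kernel support vector regression on synthetic overlapping spectra; the spectral simulation, the grid of C values, and the other settings are illustrative assumptions rather than the paper's experimental setup.

```python
# Minimal sketch: RBF-kernel support vector regression used as a calibration model
# mapping an absorption spectrum to one component's concentration.
import numpy as np
from sklearn.svm import SVR
from sklearn.model_selection import GridSearchCV

rng = np.random.default_rng(3)
n_samples, n_wavelengths = 60, 300
conc = rng.uniform(0, 1, n_samples)                     # concentration of one gas component
bands = np.exp(-0.5 * ((np.arange(n_wavelengths) - 150) / 20.0) ** 2)
spectra = np.outer(conc, bands) + rng.normal(0, 0.02, (n_samples, n_wavelengths))

grid = GridSearchCV(SVR(kernel="rbf"),
                    {"C": [1, 10, 100], "gamma": ["scale", 0.01], "epsilon": [0.01, 0.1]},
                    cv=5)
grid.fit(spectra, conc)
print("best parameters:", grid.best_params_)
```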
HYPOTHESIS TESTING FOR HIGH-DIMENSIONAL SPARSE BINARY REGRESSION
Mukherjee, Rajarshi; Pillai, Natesh S.; Lin, Xihong
2015-01-01
In this paper, we study the detection boundary for minimax hypothesis testing in the context of high-dimensional, sparse binary regression models. Motivated by genetic sequencing association studies for rare variant effects, we investigate the complexity of the hypothesis testing problem when the design matrix is sparse. We observe a new phenomenon in the behavior of detection boundary which does not occur in the case of Gaussian linear regression. We derive the detection boundary as a function of two components: a design matrix sparsity index and signal strength, each of which is a function of the sparsity of the alternative. For any alternative, if the design matrix sparsity index is too high, any test is asymptotically powerless irrespective of the magnitude of signal strength. For binary design matrices with the sparsity index that is not too high, our results are parallel to those in the Gaussian case. In this context, we derive detection boundaries for both dense and sparse regimes. For the dense regime, we show that the generalized likelihood ratio is rate optimal; for the sparse regime, we propose an extended Higher Criticism Test and show it is rate optimal and sharp. We illustrate the finite sample properties of the theoretical results using simulation studies. PMID:26246645
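For orientation, the sketch below computes the classical (Donoho-Jin) Higher Criticism statistic from a vector of p-values; the extended Higher Criticism test proposed in the paper for sparse binary designs is not reproduced, and the truncation constants are the usual illustrative defaults.

```python
# Minimal sketch of the classical (Donoho-Jin) Higher Criticism statistic computed
# from a vector of p-values; the extended HC test proposed in the paper is not
# reproduced here.
import numpy as np

def higher_criticism(pvals, alpha0=0.5):
    """HC* = max over the smallest alpha0*n order statistics of
    sqrt(n) * (i/n - p_(i)) / sqrt(p_(i) * (1 - p_(i)))."""
    p = np.sort(np.asarray(pvals))
    n = p.size
    i = np.arange(1, n + 1)
    hc = np.sqrt(n) * (i / n - p) / np.sqrt(p * (1 - p) + 1e-12)
    keep = (i <= alpha0 * n) & (p >= 1.0 / n)   # usual restrictions for numerical stability
    return hc[keep].max()

rng = np.random.default_rng(4)
null_p = rng.uniform(size=1000)                 # p-values under the global null
print("HC on null p-values:", round(higher_criticism(null_p), 2))
```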
Lin, Wei; Feng, Rui; Li, Hongzhe
2014-01-01
In genetical genomics studies, it is important to jointly analyze gene expression data and genetic variants in exploring their associations with complex traits, where the dimensionality of gene expressions and genetic variants can both be much larger than the sample size. Motivated by such modern applications, we consider the problem of variable selection and estimation in high-dimensional sparse instrumental variables models. To overcome the difficulty of high dimensionality and unknown optimal instruments, we propose a two-stage regularization framework for identifying and estimating important covariate effects while selecting and estimating optimal instruments. The methodology extends the classical two-stage least squares estimator to high dimensions by exploiting sparsity using sparsity-inducing penalty functions in both stages. The resulting procedure is efficiently implemented by coordinate descent optimization. For the representative L1 regularization and a class of concave regularization methods, we establish estimation, prediction, and model selection properties of the two-stage regularized estimators in the high-dimensional setting where the dimensionality of co-variates and instruments are both allowed to grow exponentially with the sample size. The practical performance of the proposed method is evaluated by simulation studies and its usefulness is illustrated by an analysis of mouse obesity data. Supplementary materials for this article are available online. PMID:26392642
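The two-stage idea can be caricatured with off-the-shelf lasso fits: stage one regresses the endogenous covariate on many instruments, stage two regresses the outcome on the first-stage fitted values plus the covariates. The sketch below is this simplified illustration on synthetic data, not the regularized two-stage estimator or the concave-penalty variants studied in the paper.

```python
# Minimal sketch of a two-stage, sparsity-exploiting instrumental-variables fit:
# stage 1 selects instruments via lasso, stage 2 regresses the outcome on the
# first-stage fitted values plus many exogenous covariates, again via lasso.
import numpy as np
from sklearn.linear_model import LassoCV

rng = np.random.default_rng(5)
n, n_inst, n_cov = 200, 300, 400
Z = rng.normal(size=(n, n_inst))                     # instruments (e.g., genetic variants)
W = rng.normal(size=(n, n_cov))                      # exogenous covariates
u = rng.normal(size=n)                               # unobserved confounder
x = Z[:, :3].sum(axis=1) + u + rng.normal(size=n)    # endogenous regressor
y = 2.0 * x + W[:, 0] - u + rng.normal(size=n)       # outcome

x_hat = LassoCV(cv=5).fit(Z, x).predict(Z)           # stage 1: fitted "optimal instrument"
design = np.column_stack([x_hat, W])
stage2 = LassoCV(cv=5).fit(design, y)                # stage 2: sparse outcome model
print("estimated effect of x:", round(stage2.coef_[0], 2))
```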
Sufficient Forecasting Using Factor Models
Fan, Jianqing; Xue, Lingzhou; Yao, Jiawei
2017-01-01
We consider forecasting a single time series when there is a large number of predictors and a possible nonlinear effect. The dimensionality is first reduced via a high-dimensional (approximate) factor model implemented by principal component analysis. Using the extracted factors, we develop a novel forecasting method, called sufficient forecasting, which provides a set of sufficient predictive indices, inferred from the high-dimensional predictors, to deliver additional predictive power. Projected principal component analysis is employed to enhance the accuracy of the inferred factors when a semi-parametric (approximate) factor model is assumed. Our method is also applicable to cross-sectional sufficient regression using extracted factors. The connection between sufficient forecasting and the deep learning architecture is explicitly stated. The sufficient forecasting correctly estimates projection indices of the underlying factors even in the presence of a nonparametric forecasting function. The proposed method extends sufficient dimension reduction to high-dimensional regimes by condensing the cross-sectional information through factor models. We derive asymptotic properties for the estimate of the central subspace spanned by these projection directions as well as the estimates of the sufficient predictive indices. We further show that the natural method of running multiple regression of the target on estimated factors yields a linear estimate that actually falls into this central subspace. Our method and theory allow the number of predictors to be larger than the number of observations. We finally demonstrate that the sufficient forecasting improves upon linear forecasting in both simulation studies and an empirical study of forecasting macroeconomic variables. PMID:29731537
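A rough sketch of the "factors first, then sufficient directions" recipe: principal components serve as factor estimates, and a plain sliced inverse regression on those factors estimates predictive directions. The projected PCA refinement and the asymptotic theory of the paper are not reproduced; the data-generating model and slice count are illustrative assumptions.

```python
# Minimal sketch: principal components are taken as factor estimates, and sliced
# inverse regression (SIR) on those factors estimates directions predictive of y.
import numpy as np
from sklearn.decomposition import PCA

rng = np.random.default_rng(6)
n, p, k = 400, 600, 5
F = rng.normal(size=(n, k))                      # latent factors
X = F @ rng.normal(size=(k, p)) + 0.5 * rng.normal(size=(n, p))
y = np.tanh(F[:, 0]) + 0.5 * F[:, 1] + 0.1 * rng.normal(size=n)   # nonlinear link

factors = PCA(n_components=k).fit_transform(X)   # estimated factors
factors = (factors - factors.mean(0)) / factors.std(0)

# SIR: slice y, average the standardized factors within slices, eigen-decompose.
n_slices = 10
order = np.argsort(y)
slices = np.array_split(order, n_slices)
M = np.zeros((k, k))
for idx in slices:
    m = factors[idx].mean(axis=0)
    M += (len(idx) / n) * np.outer(m, m)
eigvals, eigvecs = np.linalg.eigh(M)
directions = eigvecs[:, ::-1][:, :2]             # leading predictive directions in factor space
print("top SIR eigenvalues:", np.round(eigvals[::-1][:3], 3))
```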
Gentry, Amanda Elswick; Jackson-Cook, Colleen K; Lyon, Debra E; Archer, Kellie J
2015-01-01
The pathological description of the stage of a tumor is an important clinical designation and is considered, like many other forms of biomedical data, an ordinal outcome. Currently, statistical methods for predicting an ordinal outcome using clinical, demographic, and high-dimensional correlated features are lacking. In this paper, we propose a method that fits an ordinal response model to predict an ordinal outcome for high-dimensional covariate spaces. Our method penalizes some covariates (high-throughput genomic features) without penalizing others (such as demographic and/or clinical covariates). We demonstrate the application of our method to predict the stage of breast cancer. In our model, breast cancer subtype is a nonpenalized predictor, and CpG site methylation values from the Illumina Human Methylation 450K assay are penalized predictors. The method has been made available in the ordinalgmifs package in the R programming environment.
D'Archivio, Angelo Antonio; Incani, Angela; Ruggieri, Fabrizio
2011-01-01
In this paper, we use a quantitative structure-retention relationship (QSRR) method to predict the retention times of polychlorinated biphenyls (PCBs) in comprehensive two-dimensional gas chromatography (GC×GC). We analyse the GC×GC retention data taken from the literature by comparing the predictive capability of different regression methods. The various models are generated using 70 out of 209 PCB congeners in the calibration stage, while their predictive performance is evaluated on the remaining 139 compounds. The two-dimensional chromatogram is initially estimated by separately modelling the retention times of PCBs in the first and in the second column (¹tR and ²tR, respectively). In particular, multilinear regression (MLR) combined with genetic algorithm (GA) variable selection is performed to extract two small subsets of predictors for ¹tR and ²tR from a large set of theoretical molecular descriptors provided by the popular software Dragon, which after removal of highly correlated or almost constant variables consists of 237 structure-related quantities. Based on GA-MLR analysis, a four-dimensional and a five-dimensional relationship modelling ¹tR and ²tR, respectively, are identified. Single-response partial least squares (PLS-1) regression is alternatively applied to independently model ¹tR and ²tR without the need for preliminary GA variable selection. Further, we explore the possibility of predicting the two-dimensional chromatogram of PCBs in a single calibration procedure by using a two-response PLS (PLS-2) model or a feed-forward artificial neural network (ANN) with two output neurons. In the first case, regression is carried out on the full set of 237 descriptors, while the variables previously selected by GA-MLR are initially considered as ANN inputs and subjected to a sensitivity analysis to remove the redundant ones. Results show that PLS-1 regression exhibits a noticeably better descriptive and predictive performance than the other investigated approaches. The observed values of the determination coefficients for ¹tR and ²tR in calibration (0.9999 and 0.9993, respectively) and prediction (0.9987 and 0.9793, respectively) provided by PLS-1 demonstrate that the GC×GC behaviour of PCBs is properly modelled. In particular, the predicted two-dimensional GC×GC chromatogram of the 139 PCBs not involved in the calibration stage closely resembles the experimental one. Based on the above lines of evidence, the proposed approach ensures accurate simulation of the whole GC×GC chromatogram of PCBs using experimental determination of only 1/3 of the retention data of representative congeners.
Surányi, A; Kozinszky, Z; Molnár, A; Nyári, T; Bitó, T; Pál, A
2013-10-01
The aim of our study was to evaluate placental three-dimensional power Doppler indices in diabetic pregnancies in the second and third trimesters and to compare them with those of the normal controls. Placental vascularization of pregnant women was determined by three-dimensional power Doppler ultrasound technique. The calculated indices included vascularization index (VI), flow index (FI), and vascularization flow index (VFI). Uncomplicated pregnancies (n = 113) were compared with pregnancies complicated by gestational diabetes mellitus (n = 56) and diabetes mellitus (n = 43). The three-dimensional power Doppler indices were not significantly different between the two diabetic subgroups. All the indices in diabetic patients were significantly reduced compared with those in non-diabetic individuals (p < 0.001). Placental three-dimensional power Doppler indices are slightly diminished throughout diabetic pregnancy [regression coefficients: -0.23 (FI), -0.06 (VI), and -0.04 (VFI)] and normal pregnancy [regression coefficients: -0.13 (FI), -0.20 (VI), and -0.11 (VFI)]. The uteroplacental circulation (umbilical and uterine artery) was not correlated significantly to the three-dimensional power Doppler indices. If all placental indices are low during late pregnancy, then the odds of the diabetes are significantly high (adjusted odds ratio: 1.10). A decreased placental vascularization could be an adjunct sonographic marker in the diagnosis of diabetic pregnancy in mid-gestation and late gestation. © 2013 John Wiley & Sons, Ltd.
NASA Astrophysics Data System (ADS)
Validi, AbdoulAhad
2014-03-01
This study introduces a non-intrusive approach, in the context of low-rank separated representation, to construct a surrogate of high-dimensional stochastic functions, e.g., PDEs/ODEs, in order to decrease the computational cost of Markov Chain Monte Carlo simulations in Bayesian inference. The surrogate model is constructed via a regularized alternating least-squares regression with Tikhonov regularization, using a roughening matrix that computes the gradient of the solution, in conjunction with a perturbation-based error indicator to detect optimal model complexities. The model approximates a vector of a continuous solution at discrete values of a physical variable. The required number of random realizations to achieve a successful approximation depends linearly on the function dimensionality. The computational cost of the model construction is quadratic in the number of random inputs, which potentially tackles the curse of dimensionality in high-dimensional stochastic functions. Furthermore, this vector-valued separated representation-based model, in comparison to the available scalar-valued case, leads to a significant reduction in the cost of approximation by an order of magnitude equal to the vector size. The performance of the method is studied through its application to three numerical examples, including a 41-dimensional elliptic PDE and a 21-dimensional cavity flow.
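To make the alternating least-squares idea concrete, the sketch below fits a rank-r separated approximation of solution snapshots by alternating ridge (Tikhonov-regularized) solves; the roughening-matrix regularizer, the perturbation-based error indicator, and the Bayesian-inference use case of the study are not reproduced, and all sizes are illustrative assumptions.

```python
# Minimal sketch of Tikhonov-regularized alternating least squares producing a
# rank-r separated approximation Y ~ A @ B.T of solution snapshots.
import numpy as np

rng = np.random.default_rng(7)
n_samples, n_grid, r = 120, 200, 4
U_true = rng.normal(size=(n_samples, r))
V_true = rng.normal(size=(n_grid, r))
Y = U_true @ V_true.T + 0.05 * rng.normal(size=(n_samples, n_grid))  # noisy snapshots

lam = 1e-2                                   # ridge (Tikhonov) strength, not tuned
A = rng.normal(size=(n_samples, r))
B = rng.normal(size=(n_grid, r))
for _ in range(50):
    # Solve for B with A fixed (a ridge problem per grid point), then vice versa.
    B = np.linalg.solve(A.T @ A + lam * np.eye(r), A.T @ Y).T
    A = np.linalg.solve(B.T @ B + lam * np.eye(r), B.T @ Y.T).T
rel_err = np.linalg.norm(Y - A @ B.T) / np.linalg.norm(Y)
print("relative approximation error:", round(rel_err, 4))
```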
Fisher, Charles K; Mehta, Pankaj
2015-06-01
Feature selection, identifying a subset of variables that are relevant for predicting a response, is an important and challenging component of many methods in statistics and machine learning. Feature selection is especially difficult and computationally intensive when the number of variables approaches or exceeds the number of samples, as is often the case for many genomic datasets. Here, we introduce a new approach, the Bayesian Ising Approximation (BIA), to rapidly calculate posterior probabilities for feature relevance in L2 penalized linear regression. In the regime where the regression problem is strongly regularized by the prior, we show that computing the marginal posterior probabilities for features is equivalent to computing the magnetizations of an Ising model with weak couplings. Using a mean field approximation, we show it is possible to rapidly compute the feature selection path described by the posterior probabilities as a function of the L2 penalty. We present simulations and analytical results illustrating the accuracy of the BIA on some simple regression problems. Finally, we demonstrate the applicability of the BIA to high-dimensional regression by analyzing a gene expression dataset with nearly 30 000 features. These results also highlight the impact of correlations between features on Bayesian feature selection. An implementation of the BIA in C++, along with data for reproducing our gene expression analyses, is freely available at http://physics.bu.edu/∼pankajm/BIACode. © The Author 2015. Published by Oxford University Press. All rights reserved. For Permissions, please email: journals.permissions@oup.com.
Ensemble learning with trees and rules: supervised, semi-supervised, unsupervised
USDA-ARS?s Scientific Manuscript database
In this article, we propose several new approaches for post-processing a large ensemble of conjunctive rules for supervised and semi-supervised learning problems. We show with various examples that for high-dimensional regression problems the models constructed by post-processing the rules with ...
Van Belle, Vanya; Pelckmans, Kristiaan; Van Huffel, Sabine; Suykens, Johan A K
2011-10-01
To compare and evaluate ranking, regression and combined machine learning approaches for the analysis of survival data. The literature describes two approaches based on support vector machines to deal with censored observations. In the first approach the key idea is to rephrase the task as a ranking problem via the concordance index, a problem which can be solved efficiently in a context of structural risk minimization and convex optimization techniques. In the second approach, one uses a regression approach, dealing with censoring by means of inequality constraints. The goal of this paper is then twofold: (i) introducing a new model combining the ranking and regression strategy, which retains the link with existing survival models such as the proportional hazards model via transformation models; and (ii) comparison of the three techniques on 6 clinical and 3 high-dimensional datasets, discussing the relevance of these techniques over classical approaches for survival data. We compare SVM-based survival models based on ranking constraints, based on regression constraints, and models based on both ranking and regression constraints. The performance of the models is compared by means of three different measures: (i) the concordance index, measuring the model's discriminating ability; (ii) the logrank test statistic, indicating whether patients with a prognostic index lower than the median prognostic index have a significantly different survival than patients with a prognostic index higher than the median; and (iii) the hazard ratio after normalization to restrict the prognostic index between 0 and 1. Our results indicate a significantly better performance for models including regression constraints over models based only on ranking constraints. This work gives empirical evidence that SVM-based models using regression constraints perform significantly better than SVM-based models based on ranking constraints. Our experiments show a comparable performance for methods including only regression or both regression and ranking constraints on clinical data. On high-dimensional data, the former model performs better. However, this approach does not have a theoretical link with standard statistical models for survival data. This link can be made by means of transformation models when ranking constraints are included. Copyright © 2011 Elsevier B.V. All rights reserved.
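Since the concordance index is the common thread of these comparisons, the sketch below shows a plain implementation for right-censored data (comparable pairs are anchored at observed events, and ties in risk count one half); it is a generic c-index, not tied to any of the SVM formulations in the paper, and the toy data are invented.

```python
# Minimal sketch of the concordance index (c-index) for right-censored survival
# data: among comparable pairs, the fraction where the subject with the shorter
# survival time also has the higher predicted risk. Ties in risk count 1/2.
import numpy as np

def concordance_index(time, event, risk):
    time, event, risk = map(np.asarray, (time, event, risk))
    num, den = 0.0, 0.0
    for i in range(len(time)):
        if not event[i]:
            continue                              # pairs are anchored at an observed event
        comparable = time > time[i]               # subjects known to survive past time[i]
        den += comparable.sum()
        num += (risk[i] > risk[comparable]).sum() + 0.5 * (risk[i] == risk[comparable]).sum()
    return num / den

t = [5, 8, 12, 3, 9]; e = [1, 0, 1, 1, 0]; score = [2.0, 1.0, 0.5, 3.0, 0.8]
print("c-index:", round(concordance_index(t, e, score), 3))
```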
Effect of Contact Damage on the Strength of Ceramic Materials.
1982-10-01
variables that are important to erosion, and a multivariate linear regression analysis is used to fit the data to the dimensional analysis. The exponents of Equations 7 and 8 were determined by a multivariable regression analysis of the room-temperature data.
Exploration of graphene oxide as an intelligent platform for cancer vaccines
NASA Astrophysics Data System (ADS)
Yue, Hua; Wei, Wei; Gu, Zonglin; Ni, Dezhi; Luo, Nana; Yang, Zaixing; Zhao, Lin; Garate, Jose Antonio; Zhou, Ruhong; Su, Zhiguo; Ma, Guanghui
2015-11-01
We explored an intelligent vaccine system via facile approaches using both experimental and theoretical techniques based on the two-dimensional graphene oxide (GO). Without extra addition of bio/chemical stimulators, the microsized GO imparted various immune activation tactics to improve the antigen immunogenicity. A high antigen adsorption was acquired, and the mechanism was revealed to be a combination of electrostatic, hydrophobic, and π-π stacking interactions. The "folding GO" acted as a cytokine self-producer and antigen reservoir and showed a particular autophagy, which efficiently promoted the activation of antigen presenting cells (APCs) and subsequent antigen cross-presentation. Such a "One but All" modality thus induced a high level of anti-tumor responses in a programmable way and resulted in efficient tumor regression in vivo. This work may shed light on the potential use of a new dimensional nano-platform in the development of high-performance cancer vaccines. Electronic supplementary information (ESI) available. See DOI: 10.1039/c5nr04986e
NASA Astrophysics Data System (ADS)
Takayama, T.; Iwasaki, A.
2016-06-01
Above-ground biomass prediction of tropical rain forest using remote sensing data is of paramount importance to continuous large-area forest monitoring. Hyperspectral data can provide rich spectral information for biomass prediction; however, prediction accuracy is affected by the small-sample-size problem, which commonly appears as overfitting when the number of training samples is smaller than the dimensionality of the data, owing to the time, cost, and human resources required for field surveys. A common approach to addressing this problem is reducing the dimensionality of the dataset. In addition, acquired hyperspectral data usually have a low signal-to-noise ratio due to the narrow bandwidth, and exhibit local or global shifts of peaks due to instrumental instability or small differences in practical measurement conditions. In this work, we propose a methodology based on fused lasso regression that selects optimal bands for the biomass prediction model by encouraging sparsity and grouping: sparsity provides the dimensionality reduction that addresses the small-sample-size problem, and grouping mitigates the noise and peak-shift problems. The prediction model achieved higher accuracy, with a root-mean-square error (RMSE) of 66.16 t/ha in cross-validation, than the other methods considered: multiple linear analysis, partial least squares regression, and lasso regression. Furthermore, fusing spectral information with spatial information derived from a texture index increased the prediction accuracy to an RMSE of 62.62 t/ha. This analysis demonstrates the efficiency of the fused lasso and image texture in biomass estimation of tropical forests.
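A minimal sketch of the band-selection idea follows: a fused-lasso fit whose ℓ1 terms on the coefficients and on their successive differences encourage, respectively, sparsity and grouping of adjacent bands. The data are synthetic, the penalty weights are untuned assumptions, and the texture-fusion step of the paper is not shown.

```python
# Minimal sketch of fused-lasso regression over spectral bands: an l1 penalty on
# the coefficients encourages sparsity (few bands), and an l1 penalty on their
# differences encourages grouping of adjacent, correlated bands.
import numpy as np
import cvxpy as cp

rng = np.random.default_rng(8)
n_plots, n_bands = 50, 150                      # few field plots, many bands
X = rng.normal(size=(n_plots, n_bands))
beta_true = np.zeros(n_bands); beta_true[60:70] = 1.0   # one contiguous block of useful bands
y = X @ beta_true + 0.5 * rng.normal(size=n_plots)      # synthetic above-ground biomass

beta = cp.Variable(n_bands)
lam1, lam2 = 0.5, 2.0                            # sparsity and fusion strengths (not tuned)
objective = cp.Minimize(cp.sum_squares(y - X @ beta)
                        + lam1 * cp.norm1(beta)
                        + lam2 * cp.norm1(cp.diff(beta)))
cp.Problem(objective).solve()
print("bands with non-negligible weight:", np.flatnonzero(np.abs(beta.value) > 0.1))
```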
Ronot, Maxime; Lambert, Simon A.; Wagner, Mathilde; Garteiser, Philippe; Doblas, Sabrina; Albuquerque, Miguel; Paradis, Valérie; Vilgrain, Valérie; Sinkus, Ralph; Van Beers, Bernard E.
2014-01-01
Objective: To assess, in a high-resolution model of thin liver rat slices, which viscoelastic parameter at three-dimensional multifrequency MR elastography has the best diagnostic performance for quantifying liver fibrosis. Materials and Methods: The study was approved by the ethics committee for animal care of our institution. Eight normal rats and 42 rats with carbon tetrachloride induced liver fibrosis were used in the study. The rats were sacrificed, their livers were resected, and three-dimensional MR elastography of 5±2 mm liver slices was performed at 7T with mechanical frequencies of 500, 600 and 700 Hz. The complex shear, storage and loss moduli, and the coefficient of the frequency power law were calculated. At histopathology, fibrosis and inflammation were assessed with the METAVIR score, and fibrosis was further quantified with morphometry. The diagnostic value of the viscoelastic parameters for assessing fibrosis severity was evaluated with simple and multiple linear regressions, receiver operating characteristic analysis and Obuchowski measures. Results: At simple regression, the shear, storage and loss moduli were associated with the severity of fibrosis. At multiple regression, the storage modulus at 600 Hz was the only parameter associated with fibrosis severity (r = 0.86, p<0.0001). This parameter had an Obuchowski measure of 0.89±0.03. This measure was significantly larger than that of the loss modulus (0.78±0.04, p = 0.028), but not than that of the complex shear modulus (0.88±0.03, p = 0.84). Conclusion: Our high-resolution, three-dimensional multifrequency MR elastography study of thin liver slices shows that the storage modulus is the viscoelastic parameter that has the best association with the severity of liver fibrosis. However, its diagnostic performance does not differ significantly from that of the complex shear modulus. PMID:24722733
Revisiting the Scale-Invariant, Two-Dimensional Linear Regression Method
ERIC Educational Resources Information Center
Patzer, A. Beate C.; Bauer, Hans; Chang, Christian; Bolte, Jan; Sülzle, Detlev
2018-01-01
The scale-invariant way to analyze two-dimensional experimental and theoretical data with statistical errors in both the independent and dependent variables is revisited by using what we call the triangular linear regression method. This is compared to the standard least-squares fit approach by applying it to typical simple sets of example data…
Robust learning for optimal treatment decision with NP-dimensionality
Shi, Chengchun; Song, Rui; Lu, Wenbin
2016-01-01
In order to identify important variables involved in making optimal treatment decisions, Lu, Zhang and Zeng (2013) proposed a penalized least squares regression framework for a fixed number of predictors, which is robust against misspecification of the conditional mean model. Two problems arise: (i) in a world of explosively big data, effective methods are needed to handle ultra-high dimensional data sets, for example, when the dimension of the predictors is of non-polynomial (NP) order in the sample size; (ii) both the propensity score and the conditional mean model need to be estimated from data under NP dimensionality. In this paper, we propose a robust procedure for estimating the optimal treatment regime under NP dimensionality. In both steps, penalized regressions are employed with a non-concave penalty function, where the conditional mean model of the response given the predictors may be misspecified. The asymptotic properties, such as weak oracle properties, selection consistency and oracle distributions, of the proposed estimators are investigated. In addition, we study the limiting distribution of the estimated value function for the obtained optimal treatment regime. The empirical performance of the proposed estimation method is evaluated by simulations and an application to a depression dataset from the STAR*D study. PMID:28781717
Azevedo, C F; Nascimento, M; Silva, F F; Resende, M D V; Lopes, P S; Guimarães, S E F; Glória, L S
2015-10-09
A significant contribution of molecular genetics is the direct use of DNA information to identify genetically superior individuals. With this approach, genome-wide selection (GWS) can be used for this purpose. GWS consists of analyzing a large number of single nucleotide polymorphism markers widely distributed in the genome; however, because the number of markers is much larger than the number of genotyped individuals, and such markers are highly correlated, specialized statistical methods are widely required. Among these methods, independent component regression, principal component regression, partial least squares, and partial principal components stand out. Thus, the aim of this study was to apply these dimensionality reduction methods to GWS of carcass traits in an F2 (Piau x commercial line) pig population. The results showed similarities between the principal component and independent component methods, which provided the most accurate genomic breeding estimates for most carcass traits in pigs.
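As an illustration of one of the listed methods, the sketch below runs principal component regression on synthetic SNP data, choosing the number of components by cross-validation; the marker counts, trait model, and component grid are assumptions, and the independent-component and partial least squares variants are not shown.

```python
# Minimal sketch of principal component regression: project the marker matrix onto
# a few principal components and regress the trait on them, with the number of
# components chosen by cross-validation.
import numpy as np
from sklearn.pipeline import Pipeline
from sklearn.decomposition import PCA
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import GridSearchCV

rng = np.random.default_rng(9)
n_animals, n_markers = 300, 2000
X = rng.binomial(2, 0.3, size=(n_animals, n_markers)).astype(float)   # SNP genotypes coded 0/1/2
effects = np.zeros(n_markers)
effects[rng.choice(n_markers, 40, replace=False)] = rng.normal(size=40)
y = X @ effects + rng.normal(size=n_animals)                          # synthetic carcass trait

pcr = Pipeline([("pca", PCA()), ("ols", LinearRegression())])
search = GridSearchCV(pcr, {"pca__n_components": [5, 10, 20, 50]}, cv=5,
                      scoring="neg_mean_squared_error").fit(X, y)
print("components chosen:", search.best_params_)
```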
Wolters, Mark A; Dean, C B
2017-01-01
Remote sensing images from Earth-orbiting satellites are a potentially rich data source for monitoring and cataloguing atmospheric health hazards that cover large geographic regions. A method is proposed for classifying such images into hazard and nonhazard regions using the autologistic regression model, which may be viewed as a spatial extension of logistic regression. The method includes a novel and simple approach to parameter estimation that makes it well suited to handling the large and high-dimensional datasets arising from satellite-borne instruments. The methodology is demonstrated on both simulated images and a real application to the identification of forest fire smoke.
Lien, Tonje G; Borgan, Ørnulf; Reppe, Sjur; Gautvik, Kaare; Glad, Ingrid Kristine
2018-03-07
Using high-dimensional penalized regression we studied genome-wide DNA-methylation in bone biopsies of 80 postmenopausal women in relation to their bone mineral density (BMD). The women showed BMD varying from severely osteoporotic to normal. Global gene expression data from the same individuals was available, and since DNA-methylation often affects gene expression, the overall aim of this paper was to include both of these omics data sets into an integrated analysis. The classical penalized regression uses one penalty, but we incorporated individual penalties for each of the DNA-methylation sites. These individual penalties were guided by the strength of association between DNA-methylations and gene transcript levels. DNA-methylations that were highly associated to one or more transcripts got lower penalties and were therefore favored compared to DNA-methylations showing less association to expression. Because of the complex pathways and interactions among genes, we investigated both the association between DNA-methylations and their corresponding cis gene, as well as the association between DNA-methylations and trans-located genes. Two integrating penalized methods were used: first, an adaptive group-regularized ridge regression, and secondly, variable selection was performed through a modified version of the weighted lasso. When information from gene expressions was integrated, predictive performance was considerably improved, in terms of predictive mean square error, compared to classical penalized regression without data integration. We found a 14.7% improvement in the ridge regression case and a 17% improvement for the lasso case. Our version of the weighted lasso with data integration found a list of 22 interesting methylation sites. Several corresponded to genes that are known to be important in bone formation. Using BMD as response and these 22 methylation sites as covariates, least square regression analyses resulted in R² = 0.726, comparable to an average R² = 0.438 for 10000 randomly selected groups of DNA-methylations with group size 22. Two recent types of penalized regression methods were adapted to integrate DNA-methylation and their association to gene expression in the analysis of bone mineral density. In both cases predictions clearly benefit from including the additional information on gene expressions.
Biosignature Discovery for Substance Use Disorders Using Statistical Learning.
Baurley, James W; McMahan, Christopher S; Ervin, Carolyn M; Pardamean, Bens; Bergen, Andrew W
2018-02-01
There are limited biomarkers for substance use disorders (SUDs). Traditional statistical approaches are identifying simple biomarkers in large samples, but clinical use cases are still being established. High-throughput clinical, imaging, and 'omic' technologies are generating data from SUD studies and may lead to more sophisticated and clinically useful models. However, analytic strategies suited for high-dimensional data are not regularly used. We review strategies for identifying biomarkers and biosignatures from high-dimensional data types. Focusing on penalized regression and Bayesian approaches, we address how to leverage evidence from existing studies and knowledge bases, using nicotine metabolism as an example. We posit that big data and machine learning approaches will considerably advance SUD biomarker discovery. However, translation to clinical practice, will require integrated scientific efforts. Copyright © 2017 Elsevier Ltd. All rights reserved.
Spectral Regression Discriminant Analysis for Hyperspectral Image Classification
NASA Astrophysics Data System (ADS)
Pan, Y.; Wu, J.; Huang, H.; Liu, J.
2012-08-01
Dimensionality reduction algorithms, which aim to select a small set of efficient and discriminant features, have attracted great attention for hyperspectral image classification. Manifold learning methods are popular for dimensionality reduction, such as Locally Linear Embedding, Isomap, and Laplacian Eigenmap. However, a disadvantage of many manifold learning methods is that their computations usually involve eigen-decomposition of dense matrices, which is expensive in both time and memory. In this paper, we introduce a new dimensionality reduction method, called Spectral Regression Discriminant Analysis (SRDA). SRDA casts the problem of learning an embedding function into a regression framework, which avoids eigen-decomposition of dense matrices. Also, with the regression-based framework, different kinds of regularizers can be naturally incorporated into our algorithm, which makes it more flexible. It can make efficient use of data points to discover the intrinsic discriminant structure in the data. Experimental results on the Washington DC Mall and AVIRIS Indian Pines hyperspectral data sets demonstrate the effectiveness of the proposed method.
CyTOF workflow: differential discovery in high-throughput high-dimensional cytometry datasets
Nowicka, Malgorzata; Krieg, Carsten; Weber, Lukas M.; Hartmann, Felix J.; Guglietta, Silvia; Becher, Burkhard; Levesque, Mitchell P.; Robinson, Mark D.
2017-01-01
High-dimensional mass and flow cytometry (HDCyto) experiments have become a method of choice for high-throughput interrogation and characterization of cell populations. Here, we present an R-based pipeline for differential analyses of HDCyto data, largely based on Bioconductor packages. We computationally define cell populations using FlowSOM clustering, and facilitate an optional but reproducible strategy for manual merging of algorithm-generated clusters. Our workflow offers different analysis paths, including association of cell type abundance with a phenotype or changes in signaling markers within specific subpopulations, or differential analyses of aggregated signals. Importantly, the differential analyses we show are based on regression frameworks where the HDCyto data are the response; thus, we are able to model arbitrary experimental designs, such as those with batch effects, paired designs and so on. In particular, we apply generalized linear mixed models to analyses of cell population abundance or cell-population-specific analyses of signaling markers, allowing overdispersion in cell counts or aggregated signals across samples to be appropriately modeled. To support the formal statistical analyses, we encourage exploratory data analysis at every step, including quality control (e.g. multi-dimensional scaling plots), reporting of clustering results (dimensionality reduction, heatmaps with dendrograms) and differential analyses (e.g. plots of aggregated signals). PMID:28663787
Jang, Dae -Heung; Anderson-Cook, Christine Michaela
2016-11-22
With many predictors in regression, fitting the full model can induce multicollinearity problems. The Least Absolute Shrinkage and Selection Operator (LASSO) is useful when the effects of many explanatory variables are sparse in a high-dimensional dataset. Influential points can have a disproportionate impact on the estimated values of model parameters. Here, this paper describes a new influence plot that can be used to increase understanding of the contributions of individual observations and the robustness of results. This can serve as a complement to other regression diagnostics techniques in the LASSO regression setting. Using this influence plot, we can find influential points and their impact on shrinkage of model parameters and model selection. Lastly, we provide two examples to illustrate the methods.
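A simple case-deletion diagnostic in the spirit of the abstract (though not the authors' influence plot) is sketched below: the lasso is refit with each observation left out and the movement of the coefficient vector is recorded; the planted outlier and penalty level are illustrative assumptions.

```python
# Minimal sketch of case-deletion influence for the lasso: refit with each
# observation left out and record how much the coefficient vector moves.
import numpy as np
from sklearn.linear_model import Lasso

rng = np.random.default_rng(10)
n, p = 60, 200
X = rng.normal(size=(n, p))
beta = np.zeros(p); beta[:4] = [2, -1.5, 1, 0.5]
y = X @ beta + rng.normal(size=n)
y[0] += 15.0                                   # plant one influential observation

full = Lasso(alpha=0.1, max_iter=10000).fit(X, y).coef_
influence = np.empty(n)
for i in range(n):
    keep = np.arange(n) != i
    coef_i = Lasso(alpha=0.1, max_iter=10000).fit(X[keep], y[keep]).coef_
    influence[i] = np.linalg.norm(coef_i - full)   # shift in the fitted coefficients
print("most influential observations:", np.argsort(-influence)[:3])
```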
Zhou, Hua; Li, Lexin
2014-01-01
Summary Modern technologies are producing a wealth of data with complex structures. For instance, in two-dimensional digital imaging, flow cytometry and electroencephalography, matrix-type covariates frequently arise when measurements are obtained for each combination of two underlying variables. To address scientific questions arising from those data, new regression methods that take matrices as covariates are needed, and sparsity or other forms of regularization are crucial owing to the ultrahigh dimensionality and complex structure of the matrix data. The popular lasso and related regularization methods hinge on the sparsity of the true signal in terms of the number of its non-zero coefficients. However, for the matrix data, the true signal is often of, or can be well approximated by, a low rank structure. As such, the sparsity is frequently in the form of low rank of the matrix parameters, which may seriously violate the assumption of the classical lasso. We propose a class of regularized matrix regression methods based on spectral regularization. A highly efficient and scalable estimation algorithm is developed, and a degrees-of-freedom formula is derived to facilitate model selection along the regularization path. Superior performance of the method proposed is demonstrated on both synthetic and real examples. PMID:24648830
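A small convex-programming illustration of spectral regularization is sketched below: a nuclear-norm penalty on the matrix coefficient in a regression with matrix covariates. It uses a generic solver via CVXPY rather than the scalable algorithm and degrees-of-freedom machinery developed in the paper, and all dimensions and the penalty weight are assumptions.

```python
# Minimal sketch of spectrally regularized matrix regression: the scalar response
# is an inner product of a matrix covariate with a coefficient matrix B, and a
# nuclear-norm penalty encourages B to be (approximately) low rank.
import numpy as np
import cvxpy as cp

rng = np.random.default_rng(11)
n, p, q, rank = 150, 12, 12, 2
B_true = rng.normal(size=(p, rank)) @ rng.normal(size=(rank, q))      # low-rank signal
Xs = rng.normal(size=(n, p, q))                                       # matrix covariates
y = np.einsum("ipq,pq->i", Xs, B_true) + 0.1 * rng.normal(size=n)

B = cp.Variable((p, q))
preds = cp.hstack([cp.sum(cp.multiply(Xs[i], B)) for i in range(n)])  # <X_i, B> for each i
problem = cp.Problem(cp.Minimize(cp.sum_squares(y - preds) + 5.0 * cp.normNuc(B)))
problem.solve()
print("numerical rank of the estimate:", np.linalg.matrix_rank(B.value, tol=1e-2))
```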
Logistic regression trees for initial selection of interesting loci in case-control studies
Nickolov, Radoslav Z; Milanov, Valentin B
2007-01-01
Modern genetic epidemiology faces the challenge of dealing with hundreds of thousands of genetic markers. The selection of a small initial subset of interesting markers for further investigation can greatly facilitate genetic studies. In this contribution we suggest the use of a logistic regression tree algorithm known as logistic tree with unbiased selection. Using the simulated data provided for Genetic Analysis Workshop 15, we show how this algorithm, with incorporation of multifactor dimensionality reduction method, can reduce an initial large pool of markers to a small set that includes the interesting markers with high probability. PMID:18466557
Zhao, Lue Ping; Bolouri, Hamid
2016-01-01
Maturing omics technologies enable researchers to generate high dimension omics data (HDOD) routinely in translational clinical studies. In the field of oncology, The Cancer Genome Atlas (TCGA) provided funding support to researchers to generate different types of omics data on a common set of biospecimens with accompanying clinical data and to make the data available for the research community to mine. One important application, and the focus of this manuscript, is to build predictive models for prognostic outcomes based on HDOD. To complement prevailing regression-based approaches, we propose to use an object-oriented regression (OOR) methodology to identify exemplars specified by HDOD patterns and to assess their associations with prognostic outcome. Through computing patient’s similarities to these exemplars, the OOR-based predictive model produces a risk estimate using a patient’s HDOD. The primary advantages of OOR are twofold: reducing the penalty of high dimensionality and retaining the interpretability to clinical practitioners. To illustrate its utility, we apply OOR to gene expression data from non-small cell lung cancer patients in TCGA and build a predictive model for prognostic survivorship among stage I patients, i.e., we stratify these patients by their prognostic survival risks beyond histological classifications. Identification of these high-risk patients helps oncologists to develop effective treatment protocols and post-treatment disease management plans. Using the TCGA data, the total sample is divided into training and validation data sets. After building up a predictive model in the training set, we compute risk scores from the predictive model, and validate associations of risk scores with prognostic outcome in the validation data (p=0.015). PMID:26972839
Efficient Smoothed Concomitant Lasso Estimation for High Dimensional Regression
NASA Astrophysics Data System (ADS)
Ndiaye, Eugene; Fercoq, Olivier; Gramfort, Alexandre; Leclère, Vincent; Salmon, Joseph
2017-10-01
In high-dimensional settings, sparse structures are crucial for efficiency, in terms of memory, computation and performance. It is customary to use an ℓ1 penalty to enforce sparsity in such scenarios. Sparsity-enforcing methods, the Lasso being a canonical example, are popular candidates to address high dimension. For efficiency, they rely on tuning a parameter that trades off data fitting versus sparsity. For the Lasso theory to hold, this tuning parameter should be proportional to the noise level, yet the latter is often unknown in practice. A possible remedy is to jointly optimize over the regression parameter as well as over the noise level. This has been considered under several names in the literature: Scaled-Lasso, Square-root Lasso, and Concomitant Lasso estimation, for instance, and could be of interest for uncertainty quantification. In this work, after illustrating numerical difficulties for the Concomitant Lasso formulation, we propose a modification we coined the Smoothed Concomitant Lasso, aimed at increasing numerical stability. We propose an efficient and accurate solver leading to a computational cost no more expensive than the one for the Lasso. We leverage standard ingredients behind the success of fast Lasso solvers: a coordinate descent algorithm, combined with safe screening rules, to achieve speed efficiency by eliminating early irrelevant features.
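For reference, the concomitant lasso objective can be written down directly as a joint convex program in the coefficients and the noise level. The sketch below does so with CVXPY and adds a lower bound on the noise level, which, as we understand the abstract, is the kind of modification the smoothed variant introduces; it is not the fast coordinate-descent solver with safe screening proposed by the authors, and the penalty and bound values are illustrative assumptions.

```python
# Minimal sketch of the concomitant (scaled) lasso as a joint convex program:
# minimize ||y - X b||^2 / (2 n sigma) + sigma / 2 + lam * ||b||_1, with an
# optional lower bound sigma >= sigma0 for numerical stability.
import numpy as np
import cvxpy as cp

rng = np.random.default_rng(12)
n, p = 100, 300
X = rng.normal(size=(n, p))
beta_true = np.zeros(p); beta_true[:5] = 2.0
y = X @ beta_true + rng.normal(size=n)

beta = cp.Variable(p)
sigma = cp.Variable(nonneg=True)
lam, sigma0 = 0.2, 1e-3
objective = cp.Minimize(cp.quad_over_lin(y - X @ beta, 2 * n * sigma)   # ||r||^2 / (2 n sigma)
                        + sigma / 2 + lam * cp.norm1(beta))
cp.Problem(objective, [sigma >= sigma0]).solve()
print("estimated noise level:", round(float(sigma.value), 3))
```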
Xuan Chi; Barry Goodwin
2012-01-01
Spatial and temporal relationships among agricultural prices have been an important topic of applied research for many years. Such research is used to investigate the performance of markets and to examine linkages up and down the marketing chain. This research has empirically evaluated price linkages by using correlation and regression models and, later, linear and...
Burning rate for steel-cased, pressed binderless HMX
NASA Technical Reports Server (NTRS)
Fifer, R. A.; Cole, J. E.
1980-01-01
The burning behavior of pressed binderless HMX laterally confined in 6.4 mm i.d. steel cases was measured over the pressure range 1.45 to 338 MPa in a constant pressure strand burner. The measured regression rates are compared to those reported previously for unconfined samples. It is shown that lateral confinement results in a several-fold decrease in the regression rate for the coarse particle size HMX above the transition to super fast regression. For class E samples, confinement shifts the transition to super fast regression from low pressure to high pressure. These results are interpreted in terms of the previously proposed progressive deconsolidation mechanism. Preliminary holographic photography and closed bomb tests are also described. Theoretical one dimensional modeling calculations were carried out to predict the expected flame height (particle burn out distance) as a function of particle size and pressure for binderless HMX burning by a progressive deconsolidation mechanism.
Majorization Minimization by Coordinate Descent for Concave Penalized Generalized Linear Models
Jiang, Dingfeng; Huang, Jian
2013-01-01
Recent studies have demonstrated theoretical attractiveness of a class of concave penalties in variable selection, including the smoothly clipped absolute deviation and minimax concave penalties. The computation of the concave penalized solutions in high-dimensional models, however, is a difficult task. We propose a majorization minimization by coordinate descent (MMCD) algorithm for computing the concave penalized solutions in generalized linear models. In contrast to the existing algorithms that use local quadratic or local linear approximation to the penalty function, the MMCD seeks to majorize the negative log-likelihood by a quadratic loss, but does not use any approximation to the penalty. This strategy makes it possible to avoid the computation of a scaling factor in each update of the solutions, which improves the efficiency of coordinate descent. Under certain regularity conditions, we establish theoretical convergence property of the MMCD. We implement this algorithm for a penalized logistic regression model using the SCAD and MCP penalties. Simulation studies and a data example demonstrate that the MMCD works sufficiently fast for the penalized logistic regression in high-dimensional settings where the number of covariates is much larger than the sample size. PMID:25309048
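To give a feel for the penalty, the sketch below implements coordinate descent with the univariate MCP update (in the standardized-column form of Breheny and Huang) for a plain linear model; the MMCD algorithm of the article targets penalized logistic regression and is not reproduced here, and the penalty parameters are illustrative assumptions.

```python
# Minimal sketch of coordinate descent with the MCP penalty for a linear model
# with standardized columns; the returned coefficients are on the standardized scale.
import numpy as np

def soft(z, lam):
    return np.sign(z) * max(abs(z) - lam, 0.0)

def mcp_update(z, lam, gamma):
    # Closed-form minimizer of 0.5*(b - z)^2 + MCP(b; lam, gamma), valid for gamma > 1.
    return soft(z, lam) / (1.0 - 1.0 / gamma) if abs(z) <= gamma * lam else z

def mcp_coordinate_descent(X, y, lam=0.1, gamma=3.0, n_iter=200):
    n, p = X.shape
    X = (X - X.mean(0)) / X.std(0)          # standardize so each column has unit scale
    y = y - y.mean()
    b = np.zeros(p)
    r = y.copy()                            # current residual
    for _ in range(n_iter):
        for j in range(p):
            z = X[:, j] @ r / n + b[j]      # partial-residual correlation for coordinate j
            b_new = mcp_update(z, lam, gamma)
            r -= X[:, j] * (b_new - b[j])
            b[j] = b_new
    return b

rng = np.random.default_rng(13)
n, p = 100, 400
X = rng.normal(size=(n, p))
beta = np.zeros(p); beta[:4] = [2, -1.5, 1, 0.5]
y = X @ beta + rng.normal(size=n)
print("nonzero estimates at:", np.flatnonzero(mcp_coordinate_descent(X, y) != 0)[:10])
```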
Wang, Ying; Goh, Joshua O; Resnick, Susan M; Davatzikos, Christos
2013-01-01
In this study, we used high-dimensional pattern regression methods based on structural (gray and white matter; GM and WM) and functional (positron emission tomography of regional cerebral blood flow; PET) brain data to identify cross-sectional imaging biomarkers of cognitive performance in cognitively normal older adults from the Baltimore Longitudinal Study of Aging (BLSA). We focused on specific components of executive and memory domains known to decline with aging, including manipulation, semantic retrieval, long-term memory (LTM), and short-term memory (STM). For each imaging modality, brain regions associated with each cognitive domain were generated by adaptive regional clustering. A relevance vector machine was adopted to model the nonlinear continuous relationship between brain regions and cognitive performance, with cross-validation to select the most informative brain regions (using recursive feature elimination) as imaging biomarkers and optimize model parameters. Predicted cognitive scores using our regression algorithm based on the resulting brain regions correlated well with actual performance. Also, regression models obtained using combined GM, WM, and PET imaging modalities outperformed models based on single modalities. Imaging biomarkers related to memory performance included the orbito-frontal and medial temporal cortical regions with LTM showing stronger correlation with the temporal lobe than STM. Brain regions predicting executive performance included orbito-frontal, and occipito-temporal areas. The PET modality had higher contribution to most cognitive domains except manipulation, which had higher WM contribution from the superior longitudinal fasciculus and the genu of the corpus callosum. These findings based on machine-learning methods demonstrate the importance of combining structural and functional imaging data in understanding complex cognitive mechanisms and also their potential usage as biomarkers that predict cognitive status.
Onset patterns in autism: Variation across informants, methods, and timing.
Ozonoff, Sally; Gangi, Devon; Hanzel, Elise P; Hill, Alesha; Hill, Monique M; Miller, Meghan; Schwichtenberg, A J; Steinfeld, Mary Beth; Parikh, Chandni; Iosif, Ana-Maria
2018-05-01
While previous studies suggested that regressive forms of onset were not common in autism spectrum disorder (ASD), more recent investigations suggest that the rates are quite high and may be under-reported using certain methods. The current study undertook a systematic investigation of how rates of regression differed by measurement method. Infants with (n = 147) and without a family history of ASD (n = 83) were seen prospectively for up to 7 visits in the first three years of life. Reports of symptom onset were collected using four measures that systematically varied the informant (examiner vs. parent), the decision type (categorical [regression absent or present] vs. dimensional [frequency of social behaviors]), and the timing of the assessment (retrospective vs. prospective). Latent class growth models were used to classify individual trajectories to see whether regressive onset patterns were infrequent or widespread within the ASD group. A majority of the sample was classified as having a regressive onset using either examiner (88%) or parent (69%) prospective dimensional ratings. Rates of regression were much lower using retrospective or categorical measures (from 29 to 47%). Agreement among different measurement methods was low. Declining trajectories of development, consistent with a regressive onset pattern, are common in children with ASD and may be more the rule than the exception. The accuracy of widely used methods of measuring onset is questionable and the present findings argue against their widespread use. Autism Res 2018, 11: 788-797. © 2018 International Society for Autism Research, Wiley Periodicals, Inc. This study examines different ways of measuring the onset of symptoms in autism spectrum disorder (ASD). The present findings suggest that declining developmental skills, consistent with a regressive onset pattern, are common in children with ASD and may be more the rule than the exception. The results question the accuracy of widely used methods of measuring symptom onset and argue against their widespread use. © 2018 International Society for Autism Research, Wiley Periodicals, Inc.
Enhancement of partial robust M-regression (PRM) performance using Bisquare weight function
NASA Astrophysics Data System (ADS)
Mohamad, Mazni; Ramli, Norazan Mohamed; Ghani@Mamat, Nor Azura Md; Ahmad, Sanizah
2014-09-01
Partial Least Squares (PLS) regression is a popular regression technique for handling multicollinearity in low- and high-dimensional data, fitting a linear relationship between sets of explanatory and response variables. Several robust PLS methods have been proposed to accommodate the fact that classical PLS algorithms are easily affected by the presence of outliers. The most recent of these is called partial robust M-regression (PRM). Unfortunately, the monotone weighting function used in the PRM algorithm fails to assign appropriate weights to large outliers according to their severity. Thus, in this paper, a modified partial robust M-regression is introduced to enhance the performance of the original PRM. A re-descending weight function, known as the Bisquare weight function, is recommended to replace the fair function in the PRM. A simulation study is conducted to assess the performance of the modified PRM, and its efficiency is also tested on both contaminated and uncontaminated simulated data under various percentages of outliers, sample sizes and numbers of predictors.
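The bisquare (Tukey) weight function itself is easy to state; the sketch below uses it inside a plain iteratively reweighted least-squares loop for ordinary robust regression, which is not the robust PLS (PRM) algorithm discussed above; the tuning constant 4.685 is the usual default and the data are synthetic.

```python
# Minimal sketch of the Tukey bisquare (re-descending) weight function inside a
# plain iteratively reweighted least-squares loop for robust linear regression.
import numpy as np

def bisquare_weights(resid, c=4.685):
    scale = np.median(np.abs(resid - np.median(resid))) / 0.6745 + 1e-12  # robust scale (MAD)
    u = resid / (c * scale)
    w = (1 - u ** 2) ** 2
    w[np.abs(u) >= 1] = 0.0              # re-descending: gross outliers get zero weight
    return w

rng = np.random.default_rng(14)
n = 80
x = rng.uniform(0, 10, n)
y = 1.5 * x + rng.normal(0, 1, n)
y[:5] += 25                              # a few large outliers
X = np.column_stack([np.ones(n), x])

beta = np.linalg.lstsq(X, y, rcond=None)[0]
for _ in range(20):
    sw = np.sqrt(bisquare_weights(y - X @ beta))
    beta = np.linalg.lstsq(sw[:, None] * X, sw * y, rcond=None)[0]   # weighted least squares
print("robust slope estimate:", round(beta[1], 2))
```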
Sparse partial least squares regression for simultaneous dimension reduction and variable selection
Chun, Hyonho; Keleş, Sündüz
2010-01-01
Partial least squares regression has been an alternative to ordinary least squares for handling multicollinearity in several areas of scientific research since the 1960s. It has recently gained much attention in the analysis of high dimensional genomic data. We show that known asymptotic consistency of the partial least squares estimator for a univariate response does not hold with the very large p and small n paradigm. We derive a similar result for a multivariate response regression with partial least squares. We then propose a sparse partial least squares formulation which aims simultaneously to achieve good predictive performance and variable selection by producing sparse linear combinations of the original predictors. We provide an efficient implementation of sparse partial least squares regression and compare it with well-known variable selection and dimension reduction approaches via simulation experiments. We illustrate the practical utility of sparse partial least squares regression in a joint analysis of gene expression and genomewide binding data. PMID:20107611
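The following is a toy numpy sketch of one way sparsity can be injected into PLS for a univariate response: soft-thresholding the covariance-based direction vector inside a NIPALS-style loop. It is an illustration of the general idea only; the published sparse PLS formulation differs in its penalty, tuning, and deflation details, and the data here are simulated.

```python
import numpy as np

def soft_threshold(v, lam):
    """Element-wise soft-thresholding, the source of sparsity."""
    return np.sign(v) * np.maximum(np.abs(v) - lam, 0.0)

def sparse_pls(X, y, n_components=2, lam=0.1):
    """Toy sparse PLS for a univariate response (NIPALS-style deflation)."""
    X = X - X.mean(axis=0)
    y = y - y.mean()
    Xk, yk = X.copy(), y.copy()
    W = []
    for _ in range(n_components):
        w = Xk.T @ yk                      # direction from covariance with y
        w = soft_threshold(w, lam * np.max(np.abs(w)))
        if np.allclose(w, 0):
            break
        w /= np.linalg.norm(w)
        t = Xk @ w                         # latent score
        p = Xk.T @ t / (t @ t)             # loading
        Xk = Xk - np.outer(t, p)           # deflate X
        yk = yk - t * (t @ yk) / (t @ t)   # deflate y
        W.append(w)
    return np.array(W).T                   # p x k matrix of sparse directions

# Example: 20 samples, 200 predictors, only the first 5 carry signal.
rng = np.random.default_rng(0)
X = rng.standard_normal((20, 200))
y = X[:, :5] @ np.ones(5) + 0.1 * rng.standard_normal(20)
W = sparse_pls(X, y, n_components=2, lam=0.2)
print("nonzero loadings per component:", (np.abs(W) > 1e-12).sum(axis=0))
```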
Chapman, Benjamin P.; Weiss, Alexander; Duberstein, Paul
2016-01-01
Statistical learning theory (SLT) is the statistical formulation of machine learning theory, a body of analytic methods common in “big data” problems. Regression-based SLT algorithms seek to maximize predictive accuracy for some outcome, given a large pool of potential predictors, without overfitting the sample. Research goals in psychology may sometimes call for high dimensional regression. One example is criterion-keyed scale construction, where a scale with maximal predictive validity must be built from a large item pool. Using this as a working example, we first introduce a core principle of SLT methods: minimization of expected prediction error (EPE). Minimizing EPE is fundamentally different than maximizing the within-sample likelihood, and hinges on building a predictive model of sufficient complexity to predict the outcome well, without undue complexity leading to overfitting. We describe how such models are built and refined via cross-validation. We then illustrate how three common SLT algorithms–Supervised Principal Components, Regularization, and Boosting—can be used to construct a criterion-keyed scale predicting all-cause mortality, using a large personality item pool within a population cohort. Each algorithm illustrates a different approach to minimizing EPE. Finally, we consider broader applications of SLT predictive algorithms, both as supportive analytic tools for conventional methods, and as primary analytic tools in discovery phase research. We conclude that despite their differences from the classic null-hypothesis testing approach—or perhaps because of them–SLT methods may hold value as a statistically rigorous approach to exploratory regression. PMID:27454257
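As a concrete illustration of choosing model complexity by minimizing out-of-sample error rather than within-sample likelihood, the sketch below fits a ridge-penalized logistic model and lets cross-validation pick the penalty strength. The simulated "item pool" and binary outcome are stand-ins, not the personality or mortality data analyzed in the paper.

```python
import numpy as np
from sklearn.linear_model import LogisticRegressionCV

# Simulated stand-in for a large item pool predicting a binary outcome.
rng = np.random.default_rng(1)
X = rng.standard_normal((300, 100))          # 300 respondents, 100 items
beta = np.zeros(100); beta[:10] = 0.5        # only 10 items are truly predictive
p = 1.0 / (1.0 + np.exp(-(X @ beta)))
y = rng.binomial(1, p)

# Cross-validation chooses the penalty strength that minimizes out-of-sample
# loss, the practical surrogate for expected prediction error (EPE).
model = LogisticRegressionCV(Cs=10, cv=5, penalty="l2", scoring="neg_log_loss",
                             max_iter=5000).fit(X, y)
print("selected C (inverse penalty strength):", model.C_[0])
```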
A Selective Review of Group Selection in High-Dimensional Models
Huang, Jian; Breheny, Patrick; Ma, Shuangge
2013-01-01
Grouping structures arise naturally in many statistical modeling problems. Several methods have been proposed for variable selection that respect grouping structure in variables. Examples include the group LASSO and several concave group selection methods. In this article, we give a selective review of group selection concerning methodological developments, theoretical properties and computational algorithms. We pay particular attention to group selection methods involving concave penalties. We address both group selection and bi-level selection methods. We describe several applications of these methods in nonparametric additive models, semiparametric regression, seemingly unrelated regressions, genomic data analysis and genome wide association studies. We also highlight some issues that require further study. PMID:24174707
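To make the group selection idea concrete, here is a minimal proximal-gradient (ISTA) sketch of the group lasso, whose block soft-thresholding step either keeps or zeroes out an entire group of coefficients. This is a bare-bones illustration on simulated data, not one of the concave group penalties reviewed in the article.

```python
import numpy as np

def group_lasso_ista(X, y, groups, lam, step=None, n_iter=500):
    """Minimize 0.5*||y - X b||^2 + lam * sum_g ||b_g||_2 by proximal gradient."""
    n, p = X.shape
    if step is None:
        step = 1.0 / np.linalg.norm(X, 2) ** 2   # 1 / Lipschitz constant
    b = np.zeros(p)
    for _ in range(n_iter):
        grad = X.T @ (X @ b - y)
        z = b - step * grad
        for g in groups:                          # block soft-thresholding
            norm_g = np.linalg.norm(z[g])
            z[g] = 0.0 if norm_g == 0 else max(0.0, 1 - step * lam / norm_g) * z[g]
        b = z
    return b

# Three groups of five predictors; only the first group is active.
rng = np.random.default_rng(2)
X = rng.standard_normal((100, 15))
b_true = np.concatenate([np.ones(5), np.zeros(10)])
y = X @ b_true + 0.1 * rng.standard_normal(100)
groups = [np.arange(0, 5), np.arange(5, 10), np.arange(10, 15)]
b_hat = group_lasso_ista(X, y, groups, lam=5.0)
print(np.round(b_hat, 2))   # the inactive groups are driven to exactly zero
```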
NASA Astrophysics Data System (ADS)
Pham, Tien-Lam; Nguyen, Nguyen-Duong; Nguyen, Van-Doan; Kino, Hiori; Miyake, Takashi; Dam, Hieu-Chi
2018-05-01
We have developed a descriptor named Orbital Field Matrix (OFM) for representing material structures in datasets of multi-element materials. The descriptor is based on information about atomic valence shell electrons and their coordination. In this work, we develop an extension of OFM called OFM1. We show that these descriptors are highly applicable in predicting the physical properties of materials and in providing insights into the materials space by mapping it into a low-dimensional embedded space. Our experiments with transition metal/lanthanide metal alloys show that the local magnetic moments and formation energies can be accurately reproduced using simple nearest-neighbor regression, thus confirming the relevance of our descriptors. Using kernel ridge regression, we could accurately reproduce local magnetic moments and formation energies calculated from first principles, with mean absolute errors of 0.03 μB and 0.10 eV/atom, respectively. We show that meaningful low-dimensional representations can be extracted from the original descriptor using descriptive learning algorithms. Intuitive comprehension of the materials space, qualitative evaluation of the similarities in local structures or crystalline materials, and inference in the design of new materials by element substitution can be performed effectively based on these low-dimensional representations.
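The sketch below shows the generic kernel ridge regression step described in the abstract, with an RBF kernel and hyperparameters tuned by cross-validation. The descriptor matrix and target property are synthetic stand-ins for the OFM descriptors and DFT-computed properties; the dimensions and grid values are assumptions for illustration.

```python
import numpy as np
from sklearn.kernel_ridge import KernelRidge
from sklearn.model_selection import GridSearchCV

# Synthetic stand-in for OFM-style descriptors and a target property.
rng = np.random.default_rng(3)
X = rng.standard_normal((200, 32))               # 200 structures, 32 descriptor entries
y = np.sin(X[:, 0]) + 0.5 * X[:, 1] ** 2 + 0.05 * rng.standard_normal(200)

# Kernel ridge with an RBF (Gaussian) kernel; hyperparameters chosen by CV.
krr = GridSearchCV(
    KernelRidge(kernel="rbf"),
    {"alpha": [1e-3, 1e-2, 1e-1], "gamma": [1e-2, 1e-1, 1.0]},
    cv=5,
)
krr.fit(X, y)
pred = krr.predict(X)
print("mean absolute error:", np.mean(np.abs(pred - y)))
```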
Sun, Hokeun; Wang, Shuang
2013-05-30
Matched case-control designs are commonly used to control for potential confounding factors in genetic epidemiology studies, especially epigenetic studies of DNA methylation. Compared with unmatched case-control studies with high-dimensional genomic or epigenetic data, there have been few variable selection methods for matched sets. In an earlier paper, we proposed a penalized logistic regression model for the analysis of unmatched DNA methylation data using a network-based penalty. However, for the widely applied matched designs in epigenetic studies that compare DNA methylation between tumor and adjacent non-tumor tissues or between pre-treatment and post-treatment conditions, applying ordinary logistic regression while ignoring matching is known to introduce serious estimation bias. In this paper, we developed a penalized conditional logistic model using the network-based penalty that encourages a grouping effect of (1) linked Cytosine-phosphate-Guanine (CpG) sites within a gene or (2) linked genes within a genetic pathway for the analysis of matched DNA methylation data. In our simulation studies, we demonstrated the superiority of the conditional logistic model over the unconditional logistic model in high-dimensional variable selection problems for matched case-control data. We further investigated the benefits of utilizing biological group or graph information for matched case-control data. We applied the proposed method to a genome-wide DNA methylation study of hepatocellular carcinoma (HCC) in which we investigated the DNA methylation levels of tumor and adjacent non-tumor tissues from HCC patients using the Illumina Infinium HumanMethylation27 BeadChip. Several new CpG sites and genes known to be related to HCC were identified but had been missed by the standard method in the original paper. Copyright © 2012 John Wiley & Sons, Ltd.
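For 1:1 matched pairs, the conditional likelihood reduces to an ordinary intercept-free logistic regression on within-pair covariate differences, which is the standard computational trick sketched below. The lasso penalty here is a simplified stand-in for the network-based penalty of the paper, and all data are simulated; the pair-duplication with flipped signs only rescales the log-likelihood so that an off-the-shelf two-class solver can be used.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

# Simulated methylation-like data for 100 matched pairs and 500 CpG sites.
rng = np.random.default_rng(4)
n_pairs, p = 100, 500
beta = np.zeros(p); beta[:5] = 1.0                     # 5 informative sites
x_case = rng.standard_normal((n_pairs, p)) + 0.5 * beta
x_ctrl = rng.standard_normal((n_pairs, p))

# For 1:1 matching, the conditional likelihood equals an intercept-free logistic
# regression of a constant "1" outcome on the case-minus-control differences.
# Including each pair twice with flipped signs gives the solver both classes
# without changing the maximizer (up to penalty scaling).
d = np.vstack([x_case - x_ctrl, x_ctrl - x_case])
z = np.concatenate([np.ones(n_pairs), np.zeros(n_pairs)])

clf = LogisticRegression(penalty="l1", solver="liblinear", C=0.05,
                         fit_intercept=False).fit(d, z)
print("selected CpG sites:", np.flatnonzero(clf.coef_[0])[:10])
```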
Dickerson, Jane A.; Dovichi, Norman J.
2011-01-01
We perform two-dimensional capillary electrophoresis on fluorescently labeled proteins and peptides. Capillary sieving electrophoresis (CSE) was performed in the first dimension and micellar electrokinetic capillary chromatography (MEKC) in the second. A cellular homogenate was labeled with the fluorogenic reagent FQ and separated using the system. This homogenate generated a pair of ridges; the first had essentially constant migration time in the CSE dimension, while the second had essentially constant migration time in the MEKC dimension. In addition, a few spots were scattered throughout the electropherogram. The same homogenate was digested using trypsin, then labeled and subjected to the two-dimensional separation. In this case, the two ridges observed in the original two-dimensional separation disappeared and were replaced by a set of spots that fell along the diagonal. Those spots were identified using a local-maximum algorithm, and each was fit with a two-dimensional Gaussian surface by an unsupervised nonlinear least squares regression algorithm. The migration times of the tryptic digest components were highly correlated (r = 0.862). When the slowest migrating components were eliminated from the analysis, the correlation coefficient improved to r = 0.956. PMID:20564272
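The core fitting step, nonlinear least squares of a two-dimensional Gaussian surface to a detected spot, can be sketched with scipy as below. The spot data, grid, and starting values are synthetic; in practice the initial guess would come from the local-maximum detector, and the exact parameterization used by the authors may differ.

```python
import numpy as np
from scipy.optimize import curve_fit

def gauss2d(coords, amp, x0, y0, sx, sy, offset):
    """Two-dimensional Gaussian surface, flattened for curve_fit."""
    x, y = coords
    g = amp * np.exp(-((x - x0) ** 2 / (2 * sx ** 2) + (y - y0) ** 2 / (2 * sy ** 2)))
    return (g + offset).ravel()

# Synthetic spot: a Gaussian peak on a 2D migration-time grid plus noise.
x, y = np.meshgrid(np.linspace(0, 10, 50), np.linspace(0, 10, 50))
rng = np.random.default_rng(5)
spot = gauss2d((x, y), 100, 4.0, 6.0, 0.8, 1.2, 5).reshape(50, 50)
spot += rng.normal(0, 1, spot.shape)

# Nonlinear least squares fit; p0 would come from the local-maximum detector.
p0 = [80, 4.5, 5.5, 1.0, 1.0, 0.0]
popt, pcov = curve_fit(gauss2d, (x, y), spot.ravel(), p0=p0)
print("fitted centre:", popt[1], popt[2])
```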
Senn, Stephen; Graf, Erika; Caputo, Angelika
2007-12-30
Stratifying and matching by the propensity score are increasingly popular approaches to deal with confounding in medical studies investigating effects of a treatment or exposure. A more traditional alternative technique is the direct adjustment for confounding in regression models. This paper discusses fundamental differences between the two approaches, with a focus on linear regression and propensity score stratification, and identifies points to be considered for an adequate comparison. The treatment estimators are examined for unbiasedness and efficiency. This is illustrated in an application to real data and supplemented by an investigation on properties of the estimators for a range of underlying linear models. We demonstrate that in specific circumstances the propensity score estimator is identical to the effect estimated from a full linear model, even if it is built on coarser covariate strata than the linear model. As a consequence, the coarsening property of the propensity score (adjustment for a one-dimensional confounder instead of a high-dimensional covariate) may be viewed as a way to implement a pre-specified, richly parametrized linear model. We conclude that the propensity score estimator inherits the potential for overfitting and that care should be taken to restrict covariates to those relevant for outcome. Copyright (c) 2007 John Wiley & Sons, Ltd.
ERIC Educational Resources Information Center
Waller, Niels; Jones, Jeff
2011-01-01
We describe methods for assessing all possible criteria (i.e., dependent variables) and subsets of criteria for regression models with a fixed set of predictors, x (where x is an n x 1 vector of independent variables). Our methods build upon the geometry of regression coefficients (hereafter called regression weights) in n-dimensional space. For a…
2014-01-01
Background: Support vector regression (SVR) and Gaussian process regression (GPR) were used for the analysis of electroanalytical experimental data to estimate diffusion coefficients. Results: For simulated cyclic voltammograms based on the EC, Eqr, and EqrC mechanisms, these regression algorithms in combination with nonlinear kernel/covariance functions yielded diffusion coefficients with higher accuracy as compared to the standard approach of calculating diffusion coefficients relying on the Nicholson-Shain equation. The level of accuracy achieved by SVR and GPR is virtually independent of the rate constants governing the respective reaction steps. Further, the reduction of high-dimensional voltammetric signals by manual selection of typical voltammetric peak features decreased the performance of both regression algorithms compared to a reduction by downsampling or principal component analysis. After training on simulated data sets, diffusion coefficients were estimated by the regression algorithms for experimental data comprising voltammetric signals for three organometallic complexes. Conclusions: Estimated diffusion coefficients closely matched the values determined by the parameter fitting method, but reduced the required computational time considerably for one of the reaction mechanisms. The automated processing of voltammograms according to the regression algorithms yields better results than the conventional analysis of peak-related data. PMID:24987463
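A minimal sketch of the workflow described here, signal dimensionality reduction followed by kernel SVR, is shown below. The "voltammograms" are synthetic curves whose shape depends on a scalar playing the role of a diffusion coefficient; the signal model, rescaling of the target, and hyperparameters are assumptions for illustration, not the simulation conditions of the paper.

```python
import numpy as np
from sklearn.decomposition import PCA
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVR

# Synthetic stand-in: 500 "voltammograms" (1000-point signals) whose shape is
# controlled by a scalar parameter playing the role of a diffusion coefficient.
rng = np.random.default_rng(6)
D = rng.uniform(1e-6, 1e-5, 500)
t = np.linspace(0, 1, 1000)
signals = np.sqrt(D)[:, None] * np.sin(2 * np.pi * t)[None, :] \
          + 1e-4 * rng.standard_normal((500, 1000))

# Reduce the high-dimensional signal by PCA, then regress with an RBF-kernel SVR.
y = D * 1e6   # rescale the target so the default epsilon-tube is sensible
model = make_pipeline(PCA(n_components=10), StandardScaler(), SVR(kernel="rbf", C=10.0))
model.fit(signals[:400], y[:400])
pred = model.predict(signals[400:]) / 1e6
print("mean relative error:", np.mean(np.abs(pred - D[400:]) / D[400:]))
```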
MicroCT angiography detects vascular formation and regression in skin wound healing
Urao, Norifumi; Okonkwo, Uzoagu A.; Fang, Milie M.; Zhuang, Zhen W.; Koh, Timothy J.; DiPietro, Luisa A.
2016-01-01
Properly regulated angiogenesis and arteriogenesis are essential for effective wound healing. Tissue injury induces robust new vessel formation and subsequent vessel maturation, which involves vessel regression and remodeling. Although formation of functional vasculature is essential for healing, alterations in vascular structure over the time course of skin wound healing are not well understood. Here, using high-resolution ex vivo X-ray micro-computed tomography (microCT), we describe the vascular network during healing of skin excisional wounds with highly detailed three-dimensional (3D) reconstructed images and associated quantitative analysis. We found that relative vessel volume, surface area and branching number are significantly decreased in wounds from day 7 to day 14 and 21. Segmentation and skeletonization analysis of selected branches from high-resolution images as small as 2.5 μm voxel size show that branching orders are decreased in the wound vessels during healing. In histological analysis, we found that the contrast agent fills mainly arterioles, but not small capillaries nor large veins. In summary, high-resolution microCT revealed dynamic alterations of vessel structures during wound healing. This technique may be useful as a key tool in the study of the formation and regression of wound vessels. PMID:27009591
Prediction-Oriented Marker Selection (PROMISE): With Application to High-Dimensional Regression.
Kim, Soyeon; Baladandayuthapani, Veerabhadran; Lee, J Jack
2017-06-01
In personalized medicine, biomarkers are used to select therapies with the highest likelihood of success based on an individual patient's biomarker/genomic profile. Two goals are to choose important biomarkers that accurately predict treatment outcomes and to cull unimportant biomarkers to reduce the cost of biological and clinical verifications. These goals are challenging due to the high dimensionality of genomic data. Variable selection methods based on penalized regression (e.g., the lasso and elastic net) have yielded promising results. However, selecting the right amount of penalization is critical to simultaneously achieving these two goals. Standard approaches based on cross-validation (CV) typically provide high prediction accuracy with high true positive rates but at the cost of too many false positives. Alternatively, stability selection (SS) controls the number of false positives, but at the cost of yielding too few true positives. To circumvent these issues, we propose prediction-oriented marker selection (PROMISE), which combines SS with CV to conflate the advantages of both methods. Our application of PROMISE with the lasso and elastic net in data analysis shows that, compared to CV, PROMISE produces sparse solutions, few false positives, and small type I + type II error, and maintains good prediction accuracy, with a marginal decrease in the true positive rates. Compared to SS, PROMISE offers better prediction accuracy and true positive rates. In summary, PROMISE can be applied in many fields to select regularization parameters when the goals are to minimize false positives and maximize prediction accuracy.
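The sketch below shows the two ingredients PROMISE combines, on simulated data: a cross-validated lasso fit that targets prediction accuracy, and subsampled selection frequencies in the spirit of stability selection that damp false positives. It only illustrates the ingredients; the way PROMISE actually couples CV and SS, and its tuning rules, follow the paper rather than this code.

```python
import numpy as np
from sklearn.linear_model import Lasso, LassoCV

rng = np.random.default_rng(7)
n, p = 100, 300
X = rng.standard_normal((n, p))
beta = np.zeros(p); beta[:8] = 1.0
y = X @ beta + rng.standard_normal(n)

# Ingredient 1: cross-validation picks a penalty that favours prediction accuracy.
alpha_cv = LassoCV(cv=5).fit(X, y).alpha_

# Ingredient 2: stability selection counts how often each marker is selected
# across random half-samples at a fixed penalty.
n_subsamples = 100
counts = np.zeros(p)
for _ in range(n_subsamples):
    idx = rng.choice(n, size=n // 2, replace=False)
    coef = Lasso(alpha=alpha_cv).fit(X[idx], y[idx]).coef_
    counts += coef != 0
freq = counts / n_subsamples
print("markers selected in >=80% of subsamples:", np.flatnonzero(freq >= 0.8))
```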
Chapman, Benjamin P; Weiss, Alexander; Duberstein, Paul R
2016-12-01
Statistical learning theory (SLT) is the statistical formulation of machine learning theory, a body of analytic methods common in "big data" problems. Regression-based SLT algorithms seek to maximize predictive accuracy for some outcome, given a large pool of potential predictors, without overfitting the sample. Research goals in psychology may sometimes call for high dimensional regression. One example is criterion-keyed scale construction, where a scale with maximal predictive validity must be built from a large item pool. Using this as a working example, we first introduce a core principle of SLT methods: minimization of expected prediction error (EPE). Minimizing EPE is fundamentally different than maximizing the within-sample likelihood, and hinges on building a predictive model of sufficient complexity to predict the outcome well, without undue complexity leading to overfitting. We describe how such models are built and refined via cross-validation. We then illustrate how 3 common SLT algorithms-supervised principal components, regularization, and boosting-can be used to construct a criterion-keyed scale predicting all-cause mortality, using a large personality item pool within a population cohort. Each algorithm illustrates a different approach to minimizing EPE. Finally, we consider broader applications of SLT predictive algorithms, both as supportive analytic tools for conventional methods, and as primary analytic tools in discovery phase research. We conclude that despite their differences from the classic null-hypothesis testing approach-or perhaps because of them-SLT methods may hold value as a statistically rigorous approach to exploratory regression. (PsycINFO Database Record (c) 2016 APA, all rights reserved).
Floating Data and the Problem with Illustrating Multiple Regression.
ERIC Educational Resources Information Center
Sachau, Daniel A.
2000-01-01
Discusses how to introduce basic concepts of multiple regression by creating a large-scale, three-dimensional regression model using the classroom walls and floor. Addresses teaching points that should be covered and reveals student reaction to the model. Finds that the greatest benefit of the model is the low fear, walk-through, nonmathematical…
A novel multi-target regression framework for time-series prediction of drug efficacy.
Li, Haiqing; Zhang, Wei; Chen, Ying; Guo, Yumeng; Li, Guo-Zheng; Zhu, Xiaoxin
2017-01-18
Excavating from small samples is a challenging pharmacokinetic problem, where statistical methods can be applied. Pharmacokinetic data is special due to the small samples of high dimensionality, which makes it difficult to adopt conventional methods to predict the efficacy of traditional Chinese medicine (TCM) prescription. The main purpose of our study is to obtain some knowledge of the correlation in TCM prescription. Here, a novel method named Multi-target Regression Framework to deal with the problem of efficacy prediction is proposed. We employ the correlation between the values of different time sequences and add predictive targets of previous time as features to predict the value of current time. Several experiments are conducted to test the validity of our method and the results of leave-one-out cross-validation clearly manifest the competitiveness of our framework. Compared with linear regression, artificial neural networks, and partial least squares, support vector regression combined with our framework demonstrates the best performance, and appears to be more suitable for this task.
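The central idea, training one regressor per time point and feeding earlier targets in as additional features, can be sketched as a chained SVR as below. The data are simulated, and the chaining scheme (observed earlier targets at training time, predicted ones at prediction time) is an assumption about one reasonable instantiation of the framework, not the authors' exact pipeline.

```python
import numpy as np
from sklearn.svm import SVR

rng = np.random.default_rng(8)
n, p, T = 30, 50, 4                    # small sample, many features, 4 time points
X = rng.standard_normal((n, p))
Y = np.cumsum(rng.standard_normal((n, T)) + X[:, :1], axis=1)  # correlated targets

# Train one SVR per time point; targets of earlier time points become features.
models = []
for t in range(T):
    feats = np.hstack([X, Y[:, :t]])   # append observed earlier targets
    models.append(SVR(kernel="rbf").fit(feats, Y[:, t]))

def predict_chain(x_new):
    """Predict the whole efficacy time series, feeding predictions forward."""
    preds = []
    for t, m in enumerate(models):
        feats = np.concatenate([x_new, preds])
        preds.append(m.predict(feats.reshape(1, -1))[0])
    return np.array(preds)

print(predict_chain(X[0]))
```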
A novel multi-target regression framework for time-series prediction of drug efficacy
Li, Haiqing; Zhang, Wei; Chen, Ying; Guo, Yumeng; Li, Guo-Zheng; Zhu, Xiaoxin
2017-01-01
Excavating from small samples is a challenging pharmacokinetic problem, where statistical methods can be applied. Pharmacokinetic data is special due to the small samples of high dimensionality, which makes it difficult to adopt conventional methods to predict the efficacy of traditional Chinese medicine (TCM) prescription. The main purpose of our study is to obtain some knowledge of the correlation in TCM prescription. Here, a novel method named Multi-target Regression Framework to deal with the problem of efficacy prediction is proposed. We employ the correlation between the values of different time sequences and add predictive targets of previous time as features to predict the value of current time. Several experiments are conducted to test the validity of our method and the results of leave-one-out cross-validation clearly manifest the competitiveness of our framework. Compared with linear regression, artificial neural networks, and partial least squares, support vector regression combined with our framework demonstrates the best performance, and appears to be more suitable for this task. PMID:28098186
Robust Head-Pose Estimation Based on Partially-Latent Mixture of Linear Regressions.
Drouard, Vincent; Horaud, Radu; Deleforge, Antoine; Ba, Sileye; Evangelidis, Georgios
2017-03-01
Head-pose estimation has many applications, such as social event analysis, human-robot and human-computer interaction, driving assistance, and so forth. Head-pose estimation is challenging, because it must cope with changing illumination conditions, variabilities in face orientation and in appearance, partial occlusions of facial landmarks, as well as bounding-box-to-face alignment errors. We propose to use a mixture of linear regressions with partially-latent output. This regression method learns to map high-dimensional feature vectors (extracted from bounding boxes of faces) onto the joint space of head-pose angles and bounding-box shifts, such that they are robustly predicted in the presence of unobservable phenomena. We describe in detail the mapping method that combines the merits of unsupervised manifold learning techniques and of mixtures of regressions. We validate our method with three publicly available data sets and we thoroughly benchmark four variants of the proposed algorithm with several state-of-the-art head-pose estimation methods.
Nonparametric regression applied to quantitative structure-activity relationships
Constans; Hirst
2000-03-01
Several nonparametric regressors have been applied to modeling quantitative structure-activity relationship (QSAR) data. The simplest regressor, the Nadaraya-Watson, was assessed in a genuine multivariate setting. Other regressors, the local linear and the shifted Nadaraya-Watson, were implemented within additive models, a computationally more expedient approach that is better suited for low-density designs. Performances were benchmarked against the nonlinear method of smoothing splines. A linear reference point was provided by multilinear regression (MLR). Variable selection was explored using systematic combinations of different variables and combinations of principal components. For the data set examined, 47 inhibitors of dopamine beta-hydroxylase, the additive nonparametric regressors have greater predictive accuracy (as measured by the mean absolute error of the predictions or the Pearson correlation in cross-validation trials) than MLR. The use of principal components did not improve the performance of the nonparametric regressors over use of the original descriptors, since the original descriptors are not strongly correlated. It remains to be seen if the nonparametric regressors can be successfully coupled with better variable selection and dimensionality reduction in the context of high-dimensional QSARs.
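For reference, the Nadaraya-Watson regressor named above is just a kernel-weighted local average; a minimal numpy version is sketched below. The toy descriptors and activities are synthetic, and the Gaussian kernel and bandwidth are assumptions for illustration.

```python
import numpy as np

def nadaraya_watson(X_train, y_train, X_query, bandwidth=1.0):
    """Nadaraya-Watson estimator: a locally weighted average with a Gaussian kernel."""
    d2 = ((X_query[:, None, :] - X_train[None, :, :]) ** 2).sum(axis=2)
    K = np.exp(-0.5 * d2 / bandwidth ** 2)
    return (K @ y_train) / K.sum(axis=1)

# Toy QSAR-like setting: activity is a smooth nonlinear function of 3 descriptors.
rng = np.random.default_rng(9)
X = rng.uniform(-2, 2, (47, 3))                 # 47 compounds, 3 descriptors
y = np.sin(X[:, 0]) + X[:, 1] ** 2 + 0.1 * rng.standard_normal(47)

# Predictions at a few query points.
X_q = rng.uniform(-2, 2, (5, 3))
print(nadaraya_watson(X, y, X_q, bandwidth=0.7))
```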
Sinclair, Jonathan; Fewtrell, David; Taylor, Paul John; Bottoms, Lindsay; Atkins, Stephen; Hobbs, Sarah Jane
2014-01-01
Achieving a high ball velocity is important during soccer shooting, as it gives the goalkeeper less time to react, thus improving a player's chance of scoring. This study aimed to identify important technical aspects of kicking linked to the generation of ball velocity using regression analyses. Maximal instep kicks were obtained from 22 academy-level soccer players using a 10-camera motion capture system sampling at 500 Hz. Three-dimensional kinematics of the lower extremity segments were obtained. Regression analysis was used to identify the kinematic parameters associated with the development of ball velocity. A single biomechanical parameter, knee extension velocity of the kicking limb at ball contact (adjusted R² = 0.39, p ≤ 0.01), was obtained as a significant predictor of ball velocity. This study suggests that sagittal plane knee extension velocity is the strongest contributor to ball velocity and potentially overall kicking performance. It is conceivable therefore that players may benefit from exposure to coaching and strength techniques geared towards the improvement of knee extension angular velocity as highlighted in this study.
DOE Office of Scientific and Technical Information (OSTI.GOV)
Zhang, Jiangjiang; Li, Weixuan; Lin, Guang
In decision-making for groundwater management and contamination remediation, it is important to accurately evaluate the probability of the occurrence of a failure event. For small failure probability analysis, a large number of model evaluations are needed in the Monte Carlo (MC) simulation, which is impractical for CPU-demanding models. One approach to alleviate the computational cost caused by the model evaluations is to construct a computationally inexpensive surrogate model instead. However, using a surrogate approximation can cause an extra error in the failure probability analysis. Moreover, constructing accurate surrogates is challenging for high-dimensional models, i.e., models containing many uncertain input parameters.more » To address these issues, we propose an efficient two-stage MC approach for small failure probability analysis in high-dimensional groundwater contaminant transport modeling. In the first stage, a low-dimensional representation of the original high-dimensional model is sought with Karhunen–Loève expansion and sliced inverse regression jointly, which allows for the easy construction of a surrogate with polynomial chaos expansion. Then a surrogate-based MC simulation is implemented. In the second stage, the small number of samples that are close to the failure boundary are re-evaluated with the original model, which corrects the bias introduced by the surrogate approximation. The proposed approach is tested with a numerical case study and is shown to be 100 times faster than the traditional MC approach in achieving the same level of estimation accuracy.« less
Feature Augmentation via Nonparametrics and Selection (FANS) in High-Dimensional Classification.
Fan, Jianqing; Feng, Yang; Jiang, Jiancheng; Tong, Xin
We propose a high dimensional classification method that involves nonparametric feature augmentation. Knowing that marginal density ratios are the most powerful univariate classifiers, we use the ratio estimates to transform the original feature measurements. Subsequently, penalized logistic regression is invoked, taking as input the newly transformed or augmented features. This procedure trains models equipped with local complexity and global simplicity, thereby avoiding the curse of dimensionality while creating a flexible nonlinear decision boundary. The resulting method is called Feature Augmentation via Nonparametrics and Selection (FANS). We motivate FANS by generalizing the Naive Bayes model, writing the log ratio of joint densities as a linear combination of those of marginal densities. It is related to generalized additive models, but has better interpretability and computability. Risk bounds are developed for FANS. In numerical analysis, FANS is compared with competing methods, so as to provide a guideline on its best application domain. Real data analysis demonstrates that FANS performs very competitively on benchmark email spam and gene expression data sets. Moreover, FANS is implemented by an extremely fast algorithm through parallel computing.
Feature Augmentation via Nonparametrics and Selection (FANS) in High-Dimensional Classification
Feng, Yang; Jiang, Jiancheng; Tong, Xin
2015-01-01
We propose a high dimensional classification method that involves nonparametric feature augmentation. Knowing that marginal density ratios are the most powerful univariate classifiers, we use the ratio estimates to transform the original feature measurements. Subsequently, penalized logistic regression is invoked, taking as input the newly transformed or augmented features. This procedure trains models equipped with local complexity and global simplicity, thereby avoiding the curse of dimensionality while creating a flexible nonlinear decision boundary. The resulting method is called Feature Augmentation via Nonparametrics and Selection (FANS). We motivate FANS by generalizing the Naive Bayes model, writing the log ratio of joint densities as a linear combination of those of marginal densities. It is related to generalized additive models, but has better interpretability and computability. Risk bounds are developed for FANS. In numerical analysis, FANS is compared with competing methods, so as to provide a guideline on its best application domain. Real data analysis demonstrates that FANS performs very competitively on benchmark email spam and gene expression data sets. Moreover, FANS is implemented by an extremely fast algorithm through parallel computing. PMID:27185970
Can Emotional and Behavioral Dysregulation in Youth Be Decoded from Functional Neuroimaging?
Portugal, Liana C L; Rosa, Maria João; Rao, Anil; Bebko, Genna; Bertocci, Michele A; Hinze, Amanda K; Bonar, Lisa; Almeida, Jorge R C; Perlman, Susan B; Versace, Amelia; Schirda, Claudiu; Travis, Michael; Gill, Mary Kay; Demeter, Christine; Diwadkar, Vaibhav A; Ciuffetelli, Gary; Rodriguez, Eric; Forbes, Erika E; Sunshine, Jeffrey L; Holland, Scott K; Kowatch, Robert A; Birmaher, Boris; Axelson, David; Horwitz, Sarah M; Arnold, Eugene L; Fristad, Mary A; Youngstrom, Eric A; Findling, Robert L; Pereira, Mirtes; Oliveira, Leticia; Phillips, Mary L; Mourao-Miranda, Janaina
2016-01-01
High comorbidity among pediatric disorders characterized by behavioral and emotional dysregulation poses problems for diagnosis and treatment, and suggests that these disorders may be better conceptualized as dimensions of abnormal behaviors. Furthermore, identifying neuroimaging biomarkers related to dimensional measures of behavior may provide targets to guide individualized treatment. We aimed to use functional neuroimaging and pattern regression techniques to determine whether patterns of brain activity could accurately decode individual-level severity on a dimensional scale measuring behavioral and emotional dysregulation at two different time points. A sample of fifty-seven youth (mean age: 14.5 years; 32 males) was selected from a multi-site study of youth with parent-reported behavioral and emotional dysregulation. Participants performed a block-design reward paradigm during functional Magnetic Resonance Imaging (fMRI). Pattern regression analyses consisted of Relevance Vector Regression (RVR) and two cross-validation strategies implemented in the Pattern Recognition for Neuroimaging toolbox (PRoNTo). Medication was treated as a binary confounding variable. Decoded and actual clinical scores were compared using Pearson's correlation coefficient (r) and mean squared error (MSE) to evaluate the models. A permutation test was applied to estimate significance levels. Relevance Vector Regression identified patterns of neural activity associated with symptoms of behavioral and emotional dysregulation at the initial study screen and close to the fMRI scanning session. The correlation and the mean squared error between actual and decoded symptoms were significant at the initial study screen and close to the fMRI scanning session. However, after controlling for potential medication effects, results remained significant only for decoding symptoms at the initial study screen. Neural regions with the highest contribution to the pattern regression model included the cerebellum, sensory-motor, and fronto-limbic areas. The combination of pattern regression models and neuroimaging can help to determine the severity of behavioral and emotional dysregulation in youth at different time points.
Adaptive kernel regression for freehand 3D ultrasound reconstruction
NASA Astrophysics Data System (ADS)
Alshalalfah, Abdel-Latif; Daoud, Mohammad I.; Al-Najar, Mahasen
2017-03-01
Freehand three-dimensional (3D) ultrasound imaging enables low-cost and flexible 3D scanning of arbitrary-shaped organs, where the operator can freely move a two-dimensional (2D) ultrasound probe to acquire a sequence of tracked cross-sectional images of the anatomy. Often, the acquired 2D ultrasound images are irregularly and sparsely distributed in the 3D space. Several 3D reconstruction algorithms have been proposed to synthesize 3D ultrasound volumes based on the acquired 2D images. A challenging task during the reconstruction process is to preserve the texture patterns in the synthesized volume and ensure that all gaps in the volume are correctly filled. This paper presents an adaptive kernel regression algorithm that can effectively reconstruct high-quality freehand 3D ultrasound volumes. The algorithm employs a kernel regression model that enables nonparametric interpolation of the voxel gray-level values. The kernel size of the regression model is adaptively adjusted based on the characteristics of the voxel that is being interpolated. In particular, when the algorithm is employed to interpolate a voxel located in a region with dense ultrasound data samples, the size of the kernel is reduced to preserve the texture patterns. On the other hand, the size of the kernel is increased in areas that include large gaps to enable effective gap filling. The performance of the proposed algorithm was compared with seven previous interpolation approaches by synthesizing freehand 3D ultrasound volumes of a benign breast tumor. The experimental results show that the proposed algorithm outperforms the other interpolation approaches.
Mathew, Boby; Léon, Jens; Sannemann, Wiebke; Sillanpää, Mikko J.
2018-01-01
Gene-by-gene interactions, also known as epistasis, regulate many complex traits in different species. With the availability of low-cost genotyping it is now possible to study epistasis on a genome-wide scale. However, identifying genome-wide epistasis is a high-dimensional multiple regression problem and needs the application of dimensionality reduction techniques. Flowering Time (FT) in crops is a complex trait that is known to be influenced by many interacting genes and pathways in various crops. In this study, we successfully apply Sure Independence Screening (SIS) for dimensionality reduction to identify two-way and three-way epistasis for the FT trait in a Multiparent Advanced Generation Inter-Cross (MAGIC) barley population using the Bayesian multilocus model. The MAGIC barley population was generated from intercrossing among eight parental lines and thus, offered greater genetic diversity to detect higher-order epistatic interactions. Our results suggest that SIS is an efficient dimensionality reduction approach to detect high-order interactions in a Bayesian multilocus model. We also observe that many of our findings (genomic regions with main or higher-order epistatic effects) overlap with known candidate genes that have been already reported in barley and closely related species for the FT trait. PMID:29254994
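Sure Independence Screening, the dimensionality reduction step named here, simply ranks features by their absolute marginal correlation with the response and keeps the top few. A minimal sketch on simulated data is below; the feature counts, response model, and cutoff d are assumptions for illustration, and in the paper SIS is followed by a Bayesian multilocus model rather than this simple recovery check.

```python
import numpy as np

def sis_screen(X, y, d):
    """Sure Independence Screening: keep the d features with the largest
    absolute marginal correlation with the response."""
    Xc = (X - X.mean(axis=0)) / X.std(axis=0)
    yc = (y - y.mean()) / y.std()
    score = np.abs(Xc.T @ yc) / len(y)          # marginal correlations
    return np.argsort(score)[::-1][:d]

# Toy example: 5000 candidate (interaction) terms, 10 truly associated with FT.
rng = np.random.default_rng(10)
X = rng.standard_normal((400, 5000))
beta = np.zeros(5000); beta[:10] = 1.0
y = X @ beta + rng.standard_normal(400)
kept = sis_screen(X, y, d=50)
print("true signals recovered among the kept set:",
      np.intersect1d(kept, np.arange(10)).size)
```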
Magagna, Federico; Guglielmetti, Alessandro; Liberto, Erica; Reichenbach, Stephen E; Allegrucci, Elena; Gobino, Guido; Bicchi, Carlo; Cordero, Chiara
2017-08-02
This study investigates chemical information of volatile fractions of high-quality cocoa (Theobroma cacao L. Malvaceae) from different origins (Mexico, Ecuador, Venezuela, Colombia, Java, Trinidad, and São Tomé) produced for fine chocolate. This study explores the evolution of the entire pattern of volatiles in relation to cocoa processing (raw, roasted, steamed, and ground beans). Advanced chemical fingerprinting (e.g., combined untargeted and targeted fingerprinting) with comprehensive two-dimensional gas chromatography coupled with mass spectrometry enables detailed pattern recognition for classification, discrimination, and sensory-quality characterization. The entire data set is analyzed for 595 reliable two-dimensional peak regions, including 130 known analytes and 13 potent odorants. Multivariate analysis with unsupervised exploration (principal component analysis) and simple supervised discrimination methods (Fisher ratios and linear regression trees) reveals informative patterns of similarities and differences and identifies characteristic compounds related to sample origin and manufacturing step.
Zhao, Dan; Liu, Wei; Cai, Ailu; Li, Jingyu; Chen, Lizhu; Wang, Bing
2013-02-01
The purpose of this study was to investigate the effectiveness of three-dimensional (3D) ultrasound for quantitative evaluation of the cerebellar vermis and to establish a nomogram for Chinese fetal vermis measurements during gestation. Sonographic examinations were performed in normal fetuses and in cases with suspected vermian rotation. 3D median planes were obtained with both OMNIVIEW and tomographic ultrasound imaging. Measurements of the cerebellar vermis were highly correlated between two-dimensional and 3D median planes. The diameter of the cerebellar vermis follows a growth pattern approximately described by a quadratic regression equation. The normal vermis was almost parallel to the brain stem, with an average angle of <2° in normal fetuses. The average angle in the 9 cases of vermian rotation was >5°. Three-dimensional median planes are obtained more easily than two-dimensional ones, and allow accurate measurements of the cerebellar vermis. The 3D approach may enable rapid assessment of fetal cerebral anatomy in standard examinations. Measurements of the cerebellar vermis may provide a quantitative index for prenatal diagnosis of posterior fossa malformations. © 2012 John Wiley & Sons, Ltd.
MicroCT angiography detects vascular formation and regression in skin wound healing.
Urao, Norifumi; Okonkwo, Uzoagu A; Fang, Milie M; Zhuang, Zhen W; Koh, Timothy J; DiPietro, Luisa A
2016-07-01
Properly regulated angiogenesis and arteriogenesis are essential for effective wound healing. Tissue injury induces robust new vessel formation and subsequent vessel maturation, which involves vessel regression and remodeling. Although formation of functional vasculature is essential for healing, alterations in vascular structure over the time course of skin wound healing are not well understood. Here, using high-resolution ex vivo X-ray micro-computed tomography (microCT), we describe the vascular network during healing of skin excisional wounds with highly detailed three-dimensional (3D) reconstructed images and associated quantitative analysis. We found that relative vessel volume, surface area and branching number are significantly decreased in wounds from day 7 to days 14 and 21. Segmentation and skeletonization analysis of selected branches from high-resolution images as small as 2.5μm voxel size show that branching orders are decreased in the wound vessels during healing. In histological analysis, we found that the contrast agent fills mainly arterioles, but not small capillaries nor large veins. In summary, high-resolution microCT revealed dynamic alterations of vessel structures during wound healing. This technique may be useful as a key tool in the study of the formation and regression of wound vessels. Copyright © 2016 Elsevier Inc. All rights reserved.
DOE Office of Scientific and Technical Information (OSTI.GOV)
Dean, Jamie A., E-mail: jamie.dean@icr.ac.uk; Wong, Kee H.; Gay, Hiram
Purpose: Current normal tissue complication probability modeling using logistic regression suffers from bias and high uncertainty in the presence of highly correlated radiation therapy (RT) dose data. This hinders robust estimates of dose-response associations and, hence, optimal normal tissue–sparing strategies from being elucidated. Using functional data analysis (FDA) to reduce the dimensionality of the dose data could overcome this limitation. Methods and Materials: FDA was applied to modeling of severe acute mucositis and dysphagia resulting from head and neck RT. Functional partial least squares regression (FPLS) and functional principal component analysis were used for dimensionality reduction of the dose-volume histogrammore » data. The reduced dose data were input into functional logistic regression models (functional partial least squares–logistic regression [FPLS-LR] and functional principal component–logistic regression [FPC-LR]) along with clinical data. This approach was compared with penalized logistic regression (PLR) in terms of predictive performance and the significance of treatment covariate–response associations, assessed using bootstrapping. Results: The area under the receiver operating characteristic curve for the PLR, FPC-LR, and FPLS-LR models was 0.65, 0.69, and 0.67, respectively, for mucositis (internal validation) and 0.81, 0.83, and 0.83, respectively, for dysphagia (external validation). The calibration slopes/intercepts for the PLR, FPC-LR, and FPLS-LR models were 1.6/−0.67, 0.45/0.47, and 0.40/0.49, respectively, for mucositis (internal validation) and 2.5/−0.96, 0.79/−0.04, and 0.79/0.00, respectively, for dysphagia (external validation). The bootstrapped odds ratios indicated significant associations between RT dose and severe toxicity in the mucositis and dysphagia FDA models. Cisplatin was significantly associated with severe dysphagia in the FDA models. None of the covariates was significantly associated with severe toxicity in the PLR models. Dose levels greater than approximately 1.0 Gy/fraction were most strongly associated with severe acute mucositis and dysphagia in the FDA models. Conclusions: FPLS and functional principal component analysis marginally improved predictive performance compared with PLR and provided robust dose-response associations. FDA is recommended for use in normal tissue complication probability modeling.« less
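The FPC-LR idea described above can be sketched in a few lines: compress each patient's dose curve with PCA (a discrete stand-in for functional principal component analysis) and feed the resulting scores into logistic regression. Everything below is synthetic and simplified; the grid, the toxicity model, and the number of components are assumptions for illustration and do not reproduce the study's data or validation scheme.

```python
import numpy as np
from sklearn.decomposition import PCA
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

# Synthetic "dose curves": 200 patients, dose-volume values on a 60-point grid.
rng = np.random.default_rng(11)
grid = np.linspace(0, 3, 60)                      # Gy/fraction axis (illustrative)
shape = rng.uniform(0.5, 2.0, (200, 1))
curves = np.exp(-grid[None, :] * shape)           # correlated, smooth curves
risk = 1 / (1 + np.exp(-(5 * curves[:, 20] - 2))) # toxicity driven by mid-dose region
y = rng.binomial(1, risk)

# Functional-PCA-style compression followed by logistic regression on the scores.
fpc_lr = make_pipeline(PCA(n_components=3), LogisticRegression(max_iter=1000))
fpc_lr.fit(curves, y)
print("in-sample accuracy:", fpc_lr.score(curves, y))
```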
Dean, Jamie A; Wong, Kee H; Gay, Hiram; Welsh, Liam C; Jones, Ann-Britt; Schick, Ulrike; Oh, Jung Hun; Apte, Aditya; Newbold, Kate L; Bhide, Shreerang A; Harrington, Kevin J; Deasy, Joseph O; Nutting, Christopher M; Gulliford, Sarah L
2016-11-15
Current normal tissue complication probability modeling using logistic regression suffers from bias and high uncertainty in the presence of highly correlated radiation therapy (RT) dose data. This hinders robust estimates of dose-response associations and, hence, optimal normal tissue-sparing strategies from being elucidated. Using functional data analysis (FDA) to reduce the dimensionality of the dose data could overcome this limitation. FDA was applied to modeling of severe acute mucositis and dysphagia resulting from head and neck RT. Functional partial least squares regression (FPLS) and functional principal component analysis were used for dimensionality reduction of the dose-volume histogram data. The reduced dose data were input into functional logistic regression models (functional partial least squares-logistic regression [FPLS-LR] and functional principal component-logistic regression [FPC-LR]) along with clinical data. This approach was compared with penalized logistic regression (PLR) in terms of predictive performance and the significance of treatment covariate-response associations, assessed using bootstrapping. The area under the receiver operating characteristic curve for the PLR, FPC-LR, and FPLS-LR models was 0.65, 0.69, and 0.67, respectively, for mucositis (internal validation) and 0.81, 0.83, and 0.83, respectively, for dysphagia (external validation). The calibration slopes/intercepts for the PLR, FPC-LR, and FPLS-LR models were 1.6/-0.67, 0.45/0.47, and 0.40/0.49, respectively, for mucositis (internal validation) and 2.5/-0.96, 0.79/-0.04, and 0.79/0.00, respectively, for dysphagia (external validation). The bootstrapped odds ratios indicated significant associations between RT dose and severe toxicity in the mucositis and dysphagia FDA models. Cisplatin was significantly associated with severe dysphagia in the FDA models. None of the covariates was significantly associated with severe toxicity in the PLR models. Dose levels greater than approximately 1.0 Gy/fraction were most strongly associated with severe acute mucositis and dysphagia in the FDA models. FPLS and functional principal component analysis marginally improved predictive performance compared with PLR and provided robust dose-response associations. FDA is recommended for use in normal tissue complication probability modeling. Copyright © 2016 The Author(s). Published by Elsevier Inc. All rights reserved.
Dynamic Dimensionality Selection for Bayesian Classifier Ensembles
2015-03-19
Discriminative learning of weights in an otherwise generatively learned naive Bayes classifier (WANBIA-C) is considered. WANBIA-C is very competitive with Logistic Regression but much more... Keywords: classifier, generative learning, discriminative learning, naïve Bayes, feature selection, logistic regression, higher-order attribute independence.
Trans-dimensional joint inversion of seabed scattering and reflection data.
Steininger, Gavin; Dettmer, Jan; Dosso, Stan E; Holland, Charles W
2013-03-01
This paper examines joint inversion of acoustic scattering and reflection data to resolve seabed interface roughness parameters (spectral strength, exponent, and cutoff) and geoacoustic profiles. Trans-dimensional (trans-D) Bayesian sampling is applied with both the number of sediment layers and the order (zeroth or first) of auto-regressive parameters in the error model treated as unknowns. A prior distribution that allows fluid sediment layers over an elastic basement in a trans-D inversion is derived and implemented. Three cases are considered: Scattering-only inversion, joint scattering and reflection inversion, and joint inversion with the trans-D auto-regressive error model. Including reflection data improves the resolution of scattering and geoacoustic parameters. The trans-D auto-regressive model further improves scattering resolution and correctly differentiates between strongly and weakly correlated residual errors.
Concentration data and dimensionality in groundwater models: evaluation using inverse modelling
Barlebo, H.C.; Hill, M.C.; Rosbjerg, D.; Jensen, K.H.
1998-01-01
A three-dimensional inverse groundwater flow and transport model that fits hydraulic-head and concentration data simultaneously using nonlinear regression is presented and applied to a layered sand and silt groundwater system beneath the Grindsted Landfill in Denmark. The aquifer is composed of rather homogeneous hydrogeologic layers. Two issues common to groundwater flow and transport modelling are investigated: 1) The accuracy of simulated concentrations in the case of calibration with head data alone; and 2) The advantages and disadvantages of using a two-dimensional cross-sectional model instead of a three-dimensional model to simulate contaminant transport when the source is at the land surface. Results show that using only hydraulic heads in the nonlinear regression produces a simulated plume that is profoundly different from what is obtained in a calibration using both hydraulic-head and concentration data. The present study provides a well-documented example of the differences that can occur. Representing the system as a two-dimensional cross-section obviously omits some of the system dynamics. It was, however, possible to obtain a simulated plume cross-section that matched the actual plume cross-section well. The two-dimensional model execution times were about a seventh of those for the three-dimensional model, but some difficulties were encountered in representing the spatially variable source concentrations and less precise simulated concentrations were calculated by the two-dimensional model compared to the three-dimensional model. In summary, the present study indicates that three-dimensional modelling using both hydraulic heads and concentrations in the calibration should be preferred in the considered type of transport studies.
Decomposition and model selection for large contingency tables.
Dahinden, Corinne; Kalisch, Markus; Bühlmann, Peter
2010-04-01
Large contingency tables summarizing categorical variables arise in many areas. One example is in biology, where large numbers of biomarkers are cross-tabulated according to their discrete expression level. Interactions of the variables are of great interest and are generally studied with log-linear models. The structure of a log-linear model can be visually represented by a graph from which the conditional independence structure can then be easily read off. However, since the number of parameters in a saturated model grows exponentially in the number of variables, this generally comes with a heavy computational burden. Even if we restrict ourselves to models of lower-order interactions or other sparse structures, we are faced with the problem of a large number of cells which play the role of sample size. This is in sharp contrast to high-dimensional regression or classification procedures because, in addition to a high-dimensional parameter, we also have to deal with the analogue of a huge sample size. Furthermore, high-dimensional tables naturally feature a large number of sampling zeros which often leads to the nonexistence of the maximum likelihood estimate. We therefore present a decomposition approach, where we first divide the problem into several lower-dimensional problems and then combine these to form a global solution. Our methodology is computationally feasible for log-linear interaction models with many categorical variables each or some of them having many levels. We demonstrate the proposed method on simulated data and apply it to a bio-medical problem in cancer research.
Liu, Xingguo; Niu, Jianwei; Ran, Linghua; Liu, Taijie
2017-08-01
This study aimed to develop estimation formulae for the total human body volume (BV) of adult males using anthropometric measurements based on a three-dimensional (3D) scanning technique. Noninvasive and reliable methods to predict the total BV from anthropometric measurements based on a 3D scan technique were addressed in detail. A regression analysis of BV based on four key measurements was conducted for approximately 160 adult male subjects. Eight models of total human BV show that the predicted results fitted by the regression models were highly correlated with the actual BV (p < 0.001). Two metrics, the mean value of the absolute difference between the actual and predicted BV (V_error) and the mean value of the ratio between V_error and the actual BV (RV_error), were calculated. The linear model based on human weight was recommended as optimal due to its simplicity and high efficiency. The proposed estimation formulae are valuable for estimating total body volume in circumstances in which traditional underwater weighing or air displacement plethysmography is not applicable or accessible. This journal requires that authors assign a level of evidence to each article. For a full description of these Evidence-Based Medicine ratings, please refer to the Table of Contents or the online Instructions to Authors www.springer.com/00266.
Wang, Bing; Shen, Hao; Fang, Aiqin; Huang, De-Shuang; Jiang, Changjun; Zhang, Jun; Chen, Peng
2016-06-17
The comprehensive two-dimensional gas chromatography time-of-flight mass spectrometry (GC×GC/TOF-MS) system has become a key analytical technology in high-throughput analysis. The retention index has been shown to be helpful for compound identification in one-dimensional gas chromatography, which is also true for two-dimensional gas chromatography. In this work, a novel regression model was proposed for calculating the second dimension retention index of target components, with n-alkanes used as reference compounds. This model was developed to depict the relationship among adjusted second dimension retention time, temperature of the second dimension column, and carbon number of n-alkanes by an exponential nonlinear function with only five parameters. Three different criteria were introduced to find the optimal values of the parameters. The performance of this model was evaluated using experimental data of n-alkanes (C7-C31) at 24 temperatures, which cover the entire 0-6 s adjusted retention time range. The experimental results show that the mean relative error between the predicted adjusted retention time and the experimental data of n-alkanes was only 2%. Furthermore, our proposed model demonstrates a good extrapolation capability for predicting the adjusted retention time of target compounds that lie outside the range of the reference compounds in the second dimension adjusted retention time space. Our work shows that the deviation was less than 9 retention index units (iu) when the number of added alkanes was as large as 5. The performance of our proposed model has also been demonstrated by analyzing a mixture of compounds in temperature-programmed experiments. Copyright © 2016 Elsevier B.V. All rights reserved.
1974-01-01
Digital Image Restoration under a Regression Model: The Unconstrained, Linear Equality and Inequality Constrained Approaches. Nelson Delfino d'Avila Mascarenhas, Report 520, January 1974. A two-dimensional form adequately describes the linear model. A discretization is performed by using quadrature methods. By trans...
Prediction of Incident Diabetes in the Jackson Heart Study Using High-Dimensional Machine Learning
Casanova, Ramon; Saldana, Santiago; Simpson, Sean L.; Lacy, Mary E.; Subauste, Angela R.; Blackshear, Chad; Wagenknecht, Lynne; Bertoni, Alain G.
2016-01-01
Statistical models to predict incident diabetes are often based on limited variables. Here we pursued two main goals: 1) investigate the relative performance of a machine learning method such as Random Forests (RF) for detecting incident diabetes in a high-dimensional setting defined by a large set of observational data, and 2) uncover potential predictors of diabetes. The Jackson Heart Study collected data at baseline and in two follow-up visits from 5,301 African Americans. We excluded those with baseline diabetes and no follow-up, leaving 3,633 individuals for analyses. Over a mean 8-year follow-up, 584 participants developed diabetes. The full RF model evaluated 93 variables including demographic, anthropometric, blood biomarker, medical history, and echocardiogram data. We also used RF metrics of variable importance to rank variables according to their contribution to diabetes prediction. We implemented other models based on logistic regression and RF where features were preselected. The RF full model performance was similar (AUC = 0.82) to those more parsimonious models. The top-ranked variables according to RF included hemoglobin A1C, fasting plasma glucose, waist circumference, adiponectin, c-reactive protein, triglycerides, leptin, left ventricular mass, high-density lipoprotein cholesterol, and aldosterone. This work shows the potential of RF for incident diabetes prediction while dealing with high-dimensional data. PMID:27727289
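A minimal sketch of the modeling step described here, a Random Forest classifier whose impurity-based importances are used to rank candidate predictors, is shown below. The data are simulated stand-ins for the cohort variables, and the forest size is an assumption; the study's actual evaluation used follow-up-defined incident diabetes and AUC comparisons not reproduced here.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier

# Synthetic stand-in for a wide observational dataset with a binary outcome.
rng = np.random.default_rng(12)
n, p = 3000, 93
X = rng.standard_normal((n, p))
logit = 1.5 * X[:, 0] + 1.0 * X[:, 1] + 0.5 * X[:, 2]
y = rng.binomial(1, 1 / (1 + np.exp(-logit)))

rf = RandomForestClassifier(n_estimators=500, random_state=0, n_jobs=-1)
rf.fit(X, y)

# Rank variables by impurity-based importance, as done to rank diabetes predictors.
top = np.argsort(rf.feature_importances_)[::-1][:10]
print("top-ranked variable indices:", top)
```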
Hyperspectral face recognition with spatiospectral information fusion and PLS regression.
Uzair, Muhammad; Mahmood, Arif; Mian, Ajmal
2015-03-01
Hyperspectral imaging offers new opportunities for face recognition via improved discrimination along the spectral dimension. However, it poses new challenges, including low signal-to-noise ratio, interband misalignment, and high data dimensionality. Due to these challenges, the literature on hyperspectral face recognition is not only sparse but is limited to ad hoc dimensionality reduction techniques and lacks comprehensive evaluation. We propose a hyperspectral face recognition algorithm using a spatiospectral covariance for band fusion and partial least squares regression for classification. Moreover, we extend 13 existing face recognition techniques, for the first time, to perform hyperspectral face recognition. We formulate hyperspectral face recognition as an image-set classification problem and evaluate the performance of seven state-of-the-art image-set classification techniques. We also test six state-of-the-art grayscale and RGB (color) face recognition algorithms after applying fusion techniques on hyperspectral images. Comparison with the 13 extended and five existing hyperspectral face recognition techniques on three standard data sets shows that the proposed algorithm outperforms all by a significant margin. Finally, we perform band selection experiments to find the most discriminative bands in the visible and near-infrared response spectrum.
Simultaneous grouping pursuit and feature selection over an undirected graph*
Zhu, Yunzhang; Shen, Xiaotong; Pan, Wei
2013-01-01
In high-dimensional regression, grouping pursuit and feature selection have their own merits while complementing each other in battling the curse of dimensionality. To seek a parsimonious model, we perform simultaneous grouping pursuit and feature selection over an arbitrary undirected graph with each node corresponding to one predictor. When the corresponding nodes are reachable from each other over the graph, regression coefficients whose absolute values are the same or close can be grouped. This is motivated by gene network analysis, where genes tend to work in groups according to their biological functionalities. Through a nonconvex penalty, we develop a computational strategy and analyze the proposed method. Theoretical analysis indicates that the proposed method reconstructs the oracle estimator, that is, the unbiased least squares estimator given the true grouping, leading to consistent reconstruction of grouping structures and informative features, as well as to optimal parameter estimation. Simulation studies suggest that the method combines the benefit of grouping pursuit with that of feature selection, and compares favorably against its competitors in selection accuracy and predictive performance. An application to eQTL data is used to illustrate the methodology, where a network is incorporated into the analysis through an undirected graph. PMID:24098061
NASA Astrophysics Data System (ADS)
Tang, Kunkun; Congedo, Pietro M.; Abgrall, Rémi
2016-06-01
The Polynomial Dimensional Decomposition (PDD) is employed in this work for the global sensitivity analysis and uncertainty quantification (UQ) of stochastic systems subject to a moderate to large number of input random variables. Due to the intimate connection between the PDD and the Analysis of Variance (ANOVA) approaches, PDD is able to provide a simpler and more direct evaluation of the Sobol' sensitivity indices, when compared to the Polynomial Chaos expansion (PC). Unfortunately, the number of PDD terms grows exponentially with respect to the size of the input random vector, which makes the computational cost of standard methods unaffordable for real engineering applications. In order to address the curse of dimensionality, this work proposes essentially variance-based adaptive strategies aiming to build a cheap meta-model (i.e., surrogate model) by employing the sparse PDD approach with its coefficients computed by regression. Three levels of adaptivity are carried out in this paper: 1) the truncated dimensionality for ANOVA component functions, 2) the active dimension technique, especially for second- and higher-order parameter interactions, and 3) the stepwise regression approach designed to retain only the most influential polynomials in the PDD expansion. During this adaptive procedure featuring stepwise regressions, the surrogate model representation contains only a few terms, so that the cost of repeatedly solving the linear systems of the least-squares regression problem is negligible. The size of the final sparse PDD representation is much smaller than that of the full expansion, since only significant terms are eventually retained. Consequently, a much smaller number of calls to the deterministic model is required to compute the final PDD coefficients.
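The third adaptivity level, a stepwise regression that retains only the most influential polynomials, can be illustrated with a greedy forward-selection loop over a one-dimensional Legendre basis. This is a minimal sketch of the idea under simplifying assumptions (a single input, a toy model, correlation-with-residual as the selection score), not the paper's multivariate PDD construction.

```python
# Sketch of the stepwise-regression idea: greedily add the basis polynomial that best
# explains the current residual, keeping the surrogate sparse. Toy 1-D model, not the PDD.
import numpy as np
from numpy.polynomial import legendre

rng = np.random.default_rng(1)
x = rng.uniform(-1, 1, size=200)
y = 1.0 + 0.8 * x - 0.5 * legendre.legval(x, [0, 0, 1]) + 0.01 * rng.normal(size=200)

max_degree = 10
basis = np.column_stack([legendre.legval(x, [0] * d + [1]) for d in range(max_degree + 1)])

selected, residual = [], y.copy()
for _ in range(4):                                   # retain at most 4 terms
    scores = [np.abs(basis[:, j] @ residual) for j in range(basis.shape[1])]
    j_best = int(np.argmax(scores))
    if j_best in selected:
        break
    selected.append(j_best)
    coef, *_ = np.linalg.lstsq(basis[:, selected], y, rcond=None)
    residual = y - basis[:, selected] @ coef

print("retained polynomial degrees (in selection order):", selected)
print("their least-squares coefficients:", np.round(coef, 3))
```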
Locally Linear Embedding of Local Orthogonal Least Squares Images for Face Recognition
NASA Astrophysics Data System (ADS)
Hafizhelmi Kamaru Zaman, Fadhlan
2018-03-01
Dimensionality reduction is very important in face recognition, since it ensures that high-dimensional data can be mapped to a lower-dimensional space without losing salient and integral facial information. Locally Linear Embedding (LLE) has previously been used to serve this purpose; however, the process of acquiring LLE features requires high computation and resources. To overcome this limitation, we propose a locally applied Local Orthogonal Least Squares (LOLS) model that can be used as initial feature extraction before the application of LLE. By constructing least squares regression under orthogonality constraints, we can preserve more discriminant information in the local subspace of facial features while reducing the overall features into a more compact form that we call LOLS images. LLE can then be applied to the LOLS images to map their representation into a global coordinate system of much lower dimensionality. Several experiments carried out using publicly available face datasets such as AR, ORL, YaleB, and FERET under the Single Sample Per Person (SSPP) constraint demonstrate that our proposed method reduces the time required to compute LLE features while delivering better accuracy than when either LLE or OLS alone is used. Comparison against several other feature extraction methods and more recent feature-learning methods such as state-of-the-art Convolutional Neural Networks (CNNs) also reveals the superiority of the proposed method under the SSPP constraint.
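For reference, the LLE step by itself is available in scikit-learn; the sketch below maps vectorized images to a low-dimensional global coordinate system. The LOLS pre-extraction proposed in the paper is not reproduced here, and random data stands in for face images.

```python
# Sketch: Locally Linear Embedding on vectorized images. The LOLS pre-extraction step
# from the paper is not reproduced; random data stands in for face images.
import numpy as np
from sklearn.manifold import LocallyLinearEmbedding

rng = np.random.default_rng(0)
faces = rng.normal(size=(100, 32 * 32))        # 100 "images", 1024 pixels each

lle = LocallyLinearEmbedding(n_neighbors=10, n_components=20)
embedded = lle.fit_transform(faces)            # map to a 20-D global coordinate system
print(embedded.shape, "reconstruction error:", lle.reconstruction_error_)
```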
One-dimensional statistical parametric mapping in Python.
Pataky, Todd C
2012-01-01
Statistical parametric mapping (SPM) is a topological methodology for detecting field changes in smooth n-dimensional continua. Many classes of biomechanical data are smooth and contained within discrete bounds and as such are well suited to SPM analyses. The current paper accompanies release of 'SPM1D', a free and open-source Python package for conducting SPM analyses on a set of registered 1D curves. Three example applications are presented: (i) kinematics, (ii) ground reaction forces and (iii) contact pressure distribution in probabilistic finite element modelling. In addition to offering a high-level interface to a variety of common statistical tests like t tests, regression and ANOVA, SPM1D also emphasises fundamental concepts of SPM theory through stand-alone example scripts. Source code and documentation are available at: www.tpataky.net/spm1d/.
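The core idea, computing a test statistic at every node of a set of registered 1D curves, can be illustrated with plain NumPy/SciPy as below. This is not the SPM1D API; the package additionally performs random field theory inference on the resulting statistic curve.

```python
# Sketch of the idea behind 1D SPM: compute a two-sample t statistic at every node of a
# set of registered 1D curves. Plain NumPy/SciPy only; not the SPM1D interface.
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
t_axis = np.linspace(0, 100, 101)                    # e.g. percentage of stance phase
group_a = rng.normal(0.0, 1.0, size=(12, 101))       # 12 registered curves per group
group_b = rng.normal(0.3, 1.0, size=(10, 101))

t_curve, p_curve = stats.ttest_ind(group_a, group_b, axis=0)
print("max |t| along the curve:", np.abs(t_curve).max())
```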
A Review on Dimension Reduction
Ma, Yanyuan; Zhu, Liping
2013-01-01
Summarizing the effect of many covariates through a few linear combinations is an effective way of reducing covariate dimension and is the backbone of (sufficient) dimension reduction. Because the replacement of high-dimensional covariates by low-dimensional linear combinations is performed with minimal assumptions on the specific regression form, it enjoys attractive advantages as well as encounters unique challenges in comparison with the variable selection approach. We review the current literature of dimension reduction with an emphasis on the two most popular models, where the dimension reduction affects the conditional distribution and the conditional mean, respectively. We discuss various estimation and inference procedures at different levels of detail, with the intention of focusing on their underlying ideas instead of technicalities. We also discuss some unsolved problems in this area for potential future research. PMID:23794782
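As a concrete example of the conditional-distribution branch of this literature, the sketch below implements a basic sliced inverse regression estimator of a one-dimensional reduction subspace on toy data: standardize the covariates, slice on the response, average within slices, and take the leading eigenvector. It is an illustrative sketch, not code from the review.

```python
# Sketch of sliced inverse regression (SIR), a standard estimator of the sufficient
# dimension reduction subspace; toy data with one true direction.
import numpy as np

rng = np.random.default_rng(0)
n, p = 2000, 10
X = rng.normal(size=(n, p))
beta = np.zeros(p)
beta[0] = 1.0                                            # true 1-D reduction direction
y = np.sin(X @ beta) + 0.1 * rng.normal(size=n)

# Standardize (whiten) X, slice the response, and average the whitened X within slices.
L = np.linalg.cholesky(np.cov(X.T))
Z = (X - X.mean(0)) @ np.linalg.inv(L.T)
order = np.argsort(y)
slices = np.array_split(order, 10)
M = sum(len(s) / n * np.outer(Z[s].mean(0), Z[s].mean(0)) for s in slices)

# Leading eigenvector of M, mapped back to the original scale, spans the estimated subspace.
_, eigvec = np.linalg.eigh(M)
direction = np.linalg.solve(L.T, eigvec[:, -1])
print("estimated direction (normalized):", np.round(direction / np.linalg.norm(direction), 2))
```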
Ono, Tomohiro; Nakamura, Mitsuhiro; Hirose, Yoshinori; Kitsuda, Kenji; Ono, Yuka; Ishigaki, Takashi; Hiraoka, Masahiro
2017-09-01
The aim was to estimate the lung tumor position from multiple anatomical features on four-dimensional computed tomography (4D-CT) data sets using single regression analysis (SRA) and multiple regression analysis (MRA) approaches, and to evaluate the impact of each approach on the internal target volume (ITV) for stereotactic body radiotherapy (SBRT) of the lung. Eleven consecutive lung cancer patients (12 cases) whose three-dimensional (3D) lung tumor motion exceeded 5 mm underwent 4D-CT scanning. The 3D tumor position and anatomical features, including lung volume and the diaphragm, abdominal wall, and chest wall positions, were measured on 4D-CT images. The tumor position was estimated by SRA using each anatomical feature and by MRA using all anatomical features. The difference between the actual and estimated tumor positions was defined as the root-mean-square error (RMSE), and standard partial regression coefficients for the MRA were evaluated. ITVs derived from the SRA and MRA approaches were also compared with the ITV derived from contouring gross tumor volumes on all 10 phases of the 4D-CT (conventional ITV). The 3D lung tumor position showed a high correlation with lung volume (R = 0.92 ± 0.10). The RMSE of the SRA was within 3.7 mm in all directions, and the RMSE of the MRA was within 1.6 mm in all directions. The standard partial regression coefficient for lung volume was the largest and had the most influence on the estimated tumor position. Compared with the conventional ITV, the average percentage decreases in ITV were 31.9% and 38.3% using the SRA and MRA approaches, respectively. The estimation accuracy of the lung tumor position was improved by the MRA approach, which provided a smaller ITV than the conventional ITV. © 2017 The Authors. Journal of Applied Clinical Medical Physics published by Wiley Periodicals, Inc. on behalf of American Association of Physicists in Medicine.
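The MRA step amounts to an ordinary multiple linear regression of tumor displacement on several anatomical surrogates, with accuracy summarized by the RMSE and influence by standardized coefficients. The sketch below illustrates this on synthetic numbers; the feature names and values are invented, not the study's measurements.

```python
# Sketch of the multiple-regression idea: predict tumor displacement from several anatomical
# surrogates, report RMSE and rough standardized coefficients. Synthetic data only.
import numpy as np
from sklearn.linear_model import LinearRegression

rng = np.random.default_rng(0)
n_phases = 10                                    # e.g. 10 phases of a 4D-CT
lung_volume = rng.normal(size=n_phases)
diaphragm = 0.8 * lung_volume + 0.2 * rng.normal(size=n_phases)
chest_wall = 0.3 * lung_volume + 0.5 * rng.normal(size=n_phases)
tumor_si = 6.0 * lung_volume + 0.5 * rng.normal(size=n_phases)   # superior-inferior motion, mm

X = np.column_stack([lung_volume, diaphragm, chest_wall])
model = LinearRegression().fit(X, tumor_si)
rmse = np.sqrt(np.mean((model.predict(X) - tumor_si) ** 2))
print("standardized coefficients ~", np.round(model.coef_ * X.std(0) / tumor_si.std(), 2))
print("RMSE (mm):", round(rmse, 2))
```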
NASA Astrophysics Data System (ADS)
Mahrooghy, Majid; Ashraf, Ahmed B.; Daye, Dania; Mies, Carolyn; Rosen, Mark; Feldman, Michael; Kontos, Despina
2014-03-01
We evaluate the prognostic value of sparse representation-based features by applying the K-SVD algorithm to multiparametric kinetic, textural, and morphologic features in breast dynamic contrast-enhanced magnetic resonance imaging (DCE-MRI). K-SVD is an iterative dimensionality reduction method that optimally reduces the initial feature space by updating the dictionary columns jointly with the sparse representation coefficients. Therefore, by using K-SVD, we not only provide a sparse representation of the features and condense the information into a few coefficients but also reduce the dimensionality. The extracted K-SVD features are evaluated by a machine learning algorithm including a logistic regression classifier for the task of classifying high versus low breast cancer recurrence risk as determined by a validated gene expression assay. The features are evaluated using ROC curve analysis and leave-one-out cross-validation for different sparse representation and dimensionality reduction numbers. Optimal sparse representation is obtained when the number of dictionary elements is 4 (K=4) and the maximum number of non-zero coefficients is 2 (L=2). We compare K-SVD with ANOVA-based feature selection for the same prognostic features. The ROC results show that the AUCs of the K-SVD-based (K=4, L=2), the ANOVA-based, and the original features (i.e., no dimensionality reduction) are 0.78, 0.71, and 0.68, respectively. From the results, it can be inferred that by using a sparse representation of the originally extracted multi-parametric, high-dimensional data, we can condense the information into a few coefficients with the highest predictive value. In addition, the dimensionality reduction introduced by K-SVD can prevent models from over-fitting.
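The pipeline (learn a small dictionary, code each case with at most L non-zero coefficients, classify the codes with logistic regression, and score by leave-one-out AUC) can be approximated as below. scikit-learn's DictionaryLearning with OMP coding is used as a stand-in for K-SVD, which it resembles but does not reproduce exactly; the data and labels are synthetic.

```python
# Sketch of the pipeline: learn a 4-atom dictionary, code each case with at most 2 non-zero
# coefficients, classify with logistic regression, and score by leave-one-out AUC.
# DictionaryLearning stands in for K-SVD here (related, not identical).
import numpy as np
from sklearn.decomposition import DictionaryLearning
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import cross_val_predict, LeaveOneOut

rng = np.random.default_rng(0)
features = rng.normal(size=(50, 40))            # 50 cases x 40 kinetic/texture/shape features
labels = rng.integers(0, 2, size=50)            # high vs. low recurrence risk (toy)

coder = DictionaryLearning(n_components=4, transform_algorithm="omp",
                           transform_n_nonzero_coefs=2, random_state=0)
codes = coder.fit_transform(features)           # K = 4 atoms, L = 2 non-zeros per case

probs = cross_val_predict(LogisticRegression(max_iter=1000), codes, labels,
                          cv=LeaveOneOut(), method="predict_proba")[:, 1]
print("leave-one-out AUC:", round(roc_auc_score(labels, probs), 3))
```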
National scale biomass estimators for United States tree species
Jennifer C. Jenkins; David C. Chojnacky; Linda S. Heath; Richard A. Birdsey
2003-01-01
Estimates of national-scale forest carbon (C) stocks and fluxes are typically based on allometric regression equations developed using dimensional analysis techniques. However, the literature is inconsistent and incomplete with respect to large-scale forest C estimation. We compiled all available diameter-based allometric regression equations for estimating total...
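Diameter-based allometric equations in this literature are commonly fit on the log-log scale, i.e. ln(biomass) = b0 + b1 ln(dbh). The sketch below fits that assumed form to synthetic data; the coefficients shown are not the national-scale estimators compiled by the authors.

```python
# Sketch of a diameter-based allometric regression of the common log-log form
# ln(biomass) = b0 + b1 * ln(dbh). Synthetic trees, not the compiled national-scale equations.
import numpy as np

rng = np.random.default_rng(0)
dbh = rng.uniform(5, 60, size=200)                       # diameter at breast height, cm
biomass = np.exp(-2.0 + 2.4 * np.log(dbh)) * np.exp(0.1 * rng.normal(size=200))  # kg (toy)

b1, b0 = np.polyfit(np.log(dbh), np.log(biomass), 1)    # slope, then intercept
print(f"ln(biomass) = {b0:.2f} + {b1:.2f} ln(dbh)")
print("predicted biomass at dbh = 30 cm:",
      round(float(np.exp(b0 + b1 * np.log(30.0))), 1), "kg")
```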
Regression-based adaptive sparse polynomial dimensional decomposition for sensitivity analysis
NASA Astrophysics Data System (ADS)
Tang, Kunkun; Congedo, Pietro; Abgrall, Remi
2014-11-01
Polynomial dimensional decomposition (PDD) is employed in this work for global sensitivity analysis and uncertainty quantification of stochastic systems subject to a large number of random input variables. Due to the intimate connection between PDD and Analysis-of-Variance, PDD is able to provide a simpler and more direct evaluation of the Sobol' sensitivity indices, when compared to polynomial chaos (PC). Unfortunately, the number of PDD terms grows exponentially with respect to the size of the input random vector, which makes the computational cost of the standard method unaffordable for real engineering applications. In order to address this curse of dimensionality, this work proposes a variance-based adaptive strategy aiming to build a cheap meta-model by sparse PDD with its coefficients computed by regression. During this adaptive procedure, the PDD model representation contains only a few terms, so that the cost of repeatedly solving the linear system of the least-squares regression problem is negligible. The size of the final sparse PDD representation is much smaller than that of the full PDD, since only significant terms are eventually retained. Consequently, far fewer calls to the deterministic model are required to compute the final PDD coefficients.
Method for enhanced accuracy in predicting peptides using liquid separations or chromatography
Kangas, Lars J.; Auberry, Kenneth J.; Anderson, Gordon A.; Smith, Richard D.
2006-11-14
A method for predicting the elution time of a peptide in chromatographic and electrophoretic separations by first providing a data set of known elution times of known peptides, then creating a plurality of vectors, each vector having a plurality of dimensions, and each dimension representing the elution time of the amino acids present in each of these known peptides from the data set. The elution time of any protein can then be predicted by first creating a vector that assigns dimensional values for the elution time of the amino acids of at least one hypothetical peptide, and then calculating a predicted elution time for the vector by performing a multivariate regression of the dimensional values of the hypothetical peptide using the dimensional values of the known peptides. Preferably, the multivariate regression is accomplished by the use of an artificial neural network, and the elution times are first normalized using a transfer function.
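A stripped-down version of the idea, representing each peptide by a composition vector and predicting elution time by multivariate linear regression, is sketched below. The method as described preferably uses an artificial neural network and normalized elution times; plain least squares and toy peptides are used here only for illustration.

```python
# Sketch: represent each peptide by its amino-acid composition and fit a multivariate
# regression from composition to elution time. The preferred implementation described
# above uses a neural network; plain least squares and toy data are shown here.
import numpy as np

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"

def composition(peptide: str) -> np.ndarray:
    """Count of each amino acid in the peptide (one vector dimension per residue type)."""
    return np.array([peptide.count(a) for a in AMINO_ACIDS], dtype=float)

known_peptides = ["ACDEFGHIK", "LLMNPQRST", "VVWYACDEG", "KKLMNPQRS", "AAACDEFGH", "TTVWYKLMN"]
known_times = np.array([12.1, 18.5, 22.3, 15.0, 10.7, 20.2])          # toy elution times, min

X = np.vstack([composition(p) for p in known_peptides])
coef, *_ = np.linalg.lstsq(X, known_times, rcond=None)

new_peptide = "ACDKLMNWY"
print("predicted elution time:", round(float(composition(new_peptide) @ coef), 2), "min")
```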
Spectral-Spatial Shared Linear Regression for Hyperspectral Image Classification.
Haoliang Yuan; Yuan Yan Tang
2017-04-01
Classification of the pixels in a hyperspectral image (HSI) is an important task and has been widely applied in many practical applications. Its major challenge is the high-dimension, small-sample-size problem. To deal with this problem, many subspace learning (SL) methods have been developed to reduce the dimension of the pixels while preserving important discriminant information. Motivated by the ridge linear regression (RLR) framework for SL, we propose a spectral-spatial shared linear regression method (SSSLR) for extracting the feature representation. Compared with RLR, our proposed SSSLR has the following two advantages. First, we utilize a convex set to explore the spatial structure for computing the linear projection matrix. Second, we utilize a shared structure learning model, which is formed by the original data space and a hidden feature space, to learn a more discriminant linear projection matrix for classification. To optimize our proposed method, an efficient iterative algorithm is proposed. Experimental results on two popular HSI data sets, i.e., Indian Pines and Salinas, demonstrate that our proposed methods outperform many SL methods.
NASA Astrophysics Data System (ADS)
Deris, A. M.; Zain, A. M.; Sallehuddin, R.; Sharif, S.
2017-09-01
Electrical discharge machining (EDM) is one of the widely used nonconventional machining processes for hard and difficult-to-machine materials. Due to the large number of machining parameters in EDM and its complicated structure, the selection of optimal machining parameters that minimize the machining response remains a challenging task for researchers. This paper presents an experimental investigation and optimization of machining parameters for the EDM process on a stainless steel 316L workpiece using the Harmony Search (HS) algorithm. A mathematical model was developed based on a regression approach with four input parameters, namely pulse on time, peak current, servo voltage, and servo speed, and with dimensional accuracy (DA) as the output response. The optimal result of the HS approach was compared with that of the regression analysis, and HS was found to give the better result by yielding the minimum DA value.
NASA Astrophysics Data System (ADS)
Storm, Emma; Weniger, Christoph; Calore, Francesca
2017-08-01
We present SkyFACT (Sky Factorization with Adaptive Constrained Templates), a new approach for studying, modeling and decomposing diffuse gamma-ray emission. Like most previous analyses, the approach relies on predictions from cosmic-ray propagation codes like GALPROP and DRAGON. However, in contrast to previous approaches, we account for the fact that models are not perfect and allow for a very large number ($$\gtrsim 10^{5}$$) of nuisance parameters to parameterize these imperfections. We combine methods of image reconstruction and adaptive spatio-spectral template regression in one coherent hybrid approach. To this end, we use penalized Poisson likelihood regression, with regularization functions that are motivated by the maximum entropy method. We introduce methods to efficiently handle the high dimensionality of the convex optimization problem as well as the associated semi-sparse covariance matrix, using the L-BFGS-B algorithm and Cholesky factorization. We test the method both on synthetic data as well as on gamma-ray emission from the inner Galaxy, |l| < 90° and |b| < 20°, as observed by the Fermi Large Area Telescope. We finally define a simple reference model that removes most of the residual emission from the inner Galaxy, based on conventional diffuse emission components as well as components for the Fermi bubbles, the Fermi Galactic center excess, and extended sources along the Galactic disk. Variants of this reference model can serve as basis for future studies of diffuse emission in and outside the Galactic disk.
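A minimal sketch of penalized Poisson likelihood regression solved with L-BFGS-B, the optimizer named above, is given below. Two fixed spatial templates and a simple quadratic penalty stand in for SkyFACT's adaptive templates and entropy-motivated regularizers; none of this is the SkyFACT code.

```python
# Minimal sketch of penalized Poisson likelihood regression with L-BFGS-B. Two fixed
# templates and a quadratic penalty stand in for adaptive templates and entropy regularizers.
import numpy as np
from scipy.optimize import minimize

rng = np.random.default_rng(0)
n_pix = 500
templates = np.abs(rng.normal(size=(2, n_pix)))              # e.g. gas and inverse-Compton maps
true_norms = np.array([1.5, 0.5])
counts = rng.poisson(true_norms @ templates)                  # observed photon counts

lam = 1e-2                                                    # penalty strength (illustrative)

def neg_log_like(theta):
    mu = theta @ templates + 1e-12                            # expected counts per pixel
    return np.sum(mu - counts * np.log(mu)) + lam * np.sum(theta ** 2)

res = minimize(neg_log_like, x0=np.ones(2), method="L-BFGS-B",
               bounds=[(0.0, None)] * 2)
print("fitted template normalizations:", np.round(res.x, 3))
```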
Tighe, Patrick J.; Harle, Christopher A.; Hurley, Robert W.; Aytug, Haldun; Boezaart, Andre P.; Fillingim, Roger B.
2015-01-01
Background: Given their ability to process high-dimensional datasets with hundreds of variables, machine learning algorithms may offer one solution to the vexing challenge of predicting postoperative pain. Methods: Here, we report on the application of machine learning algorithms to predict postoperative pain outcomes in a retrospective cohort of 8,071 surgical patients using 796 clinical variables. Five algorithms were compared in terms of their ability to forecast moderate to severe postoperative pain: Least Absolute Shrinkage and Selection Operator (LASSO), gradient-boosted decision tree, support vector machine, neural network, and k-nearest neighbor, with logistic regression included for baseline comparison. Results: In forecasting moderate to severe postoperative pain for postoperative day (POD) 1, the LASSO algorithm, using all 796 variables, had the highest accuracy, with an area under the receiver operating characteristic curve (AUC) of 0.704. Next, the gradient-boosted decision tree had an AUC of 0.665 and the k-nearest neighbor algorithm had an AUC of 0.643. For POD 3, the LASSO algorithm, using all variables, again had the highest accuracy, with an AUC of 0.727. Logistic regression had a lower AUC of 0.5 for predicting pain outcomes on POD 1 and 3. Conclusions: Machine learning algorithms, when combined with complex and heterogeneous data from electronic medical record systems, can forecast acute postoperative pain outcomes with accuracies similar to methods that rely only on variables specifically collected for pain outcome prediction. PMID:26031220
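The best-performing model above is a LASSO-type classifier scored by AUC; one common way to realize this is L1-penalized logistic regression, sketched below on synthetic data of the same nominal size. The regularization strength and train/test split are arbitrary, and the data are not the clinical cohort.

```python
# Sketch: an L1-penalized (LASSO-style) logistic classifier for a binary pain outcome,
# scored by area under the ROC curve. Synthetic stand-in data, not the clinical cohort.
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import train_test_split

X, y = make_classification(n_samples=8071, n_features=796, n_informative=25, random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.25, stratify=y, random_state=0)

lasso_logit = LogisticRegression(penalty="l1", solver="liblinear", C=0.1)
lasso_logit.fit(X_tr, y_tr)

auc = roc_auc_score(y_te, lasso_logit.predict_proba(X_te)[:, 1])
print("ROC AUC:", round(auc, 3),
      "| non-zero coefficients:", int((lasso_logit.coef_ != 0).sum()))
```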
Numerical investigation on the regression rate of hybrid rocket motor with star swirl fuel grain
NASA Astrophysics Data System (ADS)
Zhang, Shuai; Hu, Fan; Zhang, Weihua
2016-10-01
Although the hybrid rocket motor is expected to have distinct advantages over liquid and solid rocket motors, a low regression rate and insufficient efficiency are two major disadvantages that have prevented it from being commercially viable. In recent years, complex fuel grain configurations have become attractive for overcoming these disadvantages with the help of rapid prototyping technology. In this work, an attempt has been made to numerically investigate the flow field characteristics and local regression rate distribution inside a hybrid rocket motor with a complex star swirl grain. A propellant combination of GOX and HTPB has been chosen. The numerical model is established based on the three-dimensional Navier-Stokes equations with turbulence, combustion, and coupled gas/solid phase formulations. The calculated fuel regression rate is compared with experimental data to validate the accuracy of the numerical model. The results indicate that, comparing the star swirl grain with the tube grain under the conditions of the same port area and the same grain length, the burning surface area rises by about 200%, the spatially averaged regression rate rises by as much as about 60%, and the oxidizer can combust more completely owing to the large vortex around the axis in the aft-mixing chamber. The combustion efficiency of the star swirl grain is better and more stable than that of the tube grain.
NASA Astrophysics Data System (ADS)
Qianxiang, Zhou
2012-07-01
It is very important to clarify the geometric characteristics of human body segments and to construct analysis models for ergonomic design and the application of ergonomic virtual humans. Typical anthropometric data of 1122 Chinese men aged 20-35 years were collected using a three-dimensional laser scanner for the human body. According to the correlations between different parameters, curve fitting was performed between seven trunk parameters and ten body parameters with the SPSS 16.0 software. It can be concluded that hip circumference and shoulder breadth are the most important parameters in the models, and these two parameters are highly correlated with the other parameters of the human body. By comparison with conventional regression curves, the present regression equations with the seven trunk parameters forecast the geometric dimensions of the head, neck, height, and the four limbs more accurately. Therefore, this is of great value for ergonomic design and analysis of man-machine systems. The result will also be very useful for astronaut body model analysis and application.
Penalized gaussian process regression and classification for high-dimensional nonlinear data.
Yi, G; Shi, J Q; Choi, T
2011-12-01
The model based on a Gaussian process (GP) prior and a kernel covariance function can be used to fit nonlinear data with multidimensional covariates. It has been used as a flexible nonparametric approach for curve fitting, classification, clustering, and other statistical problems, and has been widely applied to deal with complex nonlinear systems in many different areas, particularly in machine learning. However, it is challenging to use the model for large-scale and high-dimensional data, for example, the meat data discussed in this article, which have 100 highly correlated covariates. For such data, the model suffers from large variance of parameter estimation and high predictive errors, and numerically it suffers from unstable computation. In this article, a penalized likelihood framework is applied to the GP-based model. Different penalties are investigated, and their suitability for the characteristics of GP models is discussed. Asymptotic properties are also discussed, with the relevant proofs. Several applications to real biomechanical and bioinformatics data sets are reported. © 2011, The International Biometric Society. No claim to original US government works.
Muradian, Kh K; Utko, N O; Mozzhukhina, T H; Pishel', I M; Litoshenko, O Ia; Bezrukov, V V; Fraĭfel'd, V E
2002-01-01
Correlation and regression relationships between gaseous exchange, thermoregulation, and mitochondrial protein content were analyzed by two- and three-dimensional statistics in mice. It was shown that pairwise linear methods of analysis did not reveal any significant correlation between the parameters under examination. However, relationships became evident with three-dimensional and non-linear plotting, for which the coefficients of multivariable correlation reached and even exceeded 0.7-0.8. The calculations based on partial differentiation of the multivariable regression equations allow us to conclude that, at certain values of VO2, VCO2, and body temperature, negative relations between the systems of gaseous exchange and thermoregulation become dominant.
ORACLE INEQUALITIES FOR THE LASSO IN THE COX MODEL
Huang, Jian; Sun, Tingni; Ying, Zhiliang; Yu, Yi; Zhang, Cun-Hui
2013-01-01
We study the absolute penalized maximum partial likelihood estimator in sparse, high-dimensional Cox proportional hazards regression models where the number of time-dependent covariates can be larger than the sample size. We establish oracle inequalities based on natural extensions of the compatibility and cone invertibility factors of the Hessian matrix at the true regression coefficients. Similar results based on an extension of the restricted eigenvalue can be also proved by our method. However, the presented oracle inequalities are sharper since the compatibility and cone invertibility factors are always greater than the corresponding restricted eigenvalue. In the Cox regression model, the Hessian matrix is based on time-dependent covariates in censored risk sets, so that the compatibility and cone invertibility factors, and the restricted eigenvalue as well, are random variables even when they are evaluated for the Hessian at the true regression coefficients. Under mild conditions, we prove that these quantities are bounded from below by positive constants for time-dependent covariates, including cases where the number of covariates is of greater order than the sample size. Consequently, the compatibility and cone invertibility factors can be treated as positive constants in our oracle inequalities. PMID:24086091
Variable importance in nonlinear kernels (VINK): classification of digitized histopathology.
Ginsburg, Shoshana; Ali, Sahirzeeshan; Lee, George; Basavanhally, Ajay; Madabhushi, Anant
2013-01-01
Quantitative histomorphometry is the process of modeling appearance of disease morphology on digitized histopathology images via image-based features (e.g., texture, graphs). Due to the curse of dimensionality, building classifiers with large numbers of features requires feature selection (which may require a large training set) or dimensionality reduction (DR). DR methods map the original high-dimensional features in terms of eigenvectors and eigenvalues, which limits the potential for feature transparency or interpretability. Although methods exist for variable selection and ranking on embeddings obtained via linear DR schemes (e.g., principal components analysis (PCA)), similar methods do not yet exist for nonlinear DR (NLDR) methods. In this work we present a simple yet elegant method for approximating the mapping between the data in the original feature space and the transformed data in the kernel PCA (KPCA) embedding space; this mapping provides the basis for quantification of variable importance in nonlinear kernels (VINK). We show how VINK can be implemented in conjunction with the popular Isomap and Laplacian eigenmap algorithms. VINK is evaluated in the contexts of three different problems in digital pathology: (1) predicting five year PSA failure following radical prostatectomy, (2) predicting Oncotype DX recurrence risk scores for ER+ breast cancers, and (3) distinguishing good and poor outcome p16+ oropharyngeal tumors. We demonstrate that subsets of features identified by VINK provide similar or better classification or regression performance compared to the original high dimensional feature sets.
Active Learning to Understand Infectious Disease Models and Improve Policy Making
Willem, Lander; Stijven, Sean; Vladislavleva, Ekaterina; Broeckhove, Jan; Beutels, Philippe; Hens, Niel
2014-01-01
Modeling plays a major role in policy making, especially for infectious disease interventions but such models can be complex and computationally intensive. A more systematic exploration is needed to gain a thorough systems understanding. We present an active learning approach based on machine learning techniques as iterative surrogate modeling and model-guided experimentation to systematically analyze both common and edge manifestations of complex model runs. Symbolic regression is used for nonlinear response surface modeling with automatic feature selection. First, we illustrate our approach using an individual-based model for influenza vaccination. After optimizing the parameter space, we observe an inverse relationship between vaccination coverage and cumulative attack rate reinforced by herd immunity. Second, we demonstrate the use of surrogate modeling techniques on input-response data from a deterministic dynamic model, which was designed to explore the cost-effectiveness of varicella-zoster virus vaccination. We use symbolic regression to handle high dimensionality and correlated inputs and to identify the most influential variables. Provided insight is used to focus research, reduce dimensionality and decrease decision uncertainty. We conclude that active learning is needed to fully understand complex systems behavior. Surrogate models can be readily explored at no computational expense, and can also be used as emulator to improve rapid policy making in various settings. PMID:24743387
Deep ensemble learning of sparse regression models for brain disease diagnosis.
Suk, Heung-Il; Lee, Seong-Whan; Shen, Dinggang
2017-04-01
Recent studies on brain imaging analysis have witnessed the core role of machine learning techniques in computer-assisted intervention for brain disease diagnosis. Among various machine-learning techniques, sparse regression models have proved their effectiveness in handling high-dimensional data with a small number of training samples, especially in medical problems. In the meantime, deep learning methods have achieved great success by outperforming state-of-the-art performance in various applications. In this paper, we propose a novel framework that combines the two conceptually different methods of sparse regression and deep learning for Alzheimer's disease/mild cognitive impairment diagnosis and prognosis. Specifically, we first train multiple sparse regression models, each of which is trained with a different value of a regularization control parameter. Thus, our multiple sparse regression models potentially select different feature subsets from the original feature set and thereby have different power to predict the response values, i.e., the clinical label and clinical scores in our work. By regarding the response values from our sparse regression models as target-level representations, we then build a deep convolutional neural network for clinical decision making, which we therefore call a 'Deep Ensemble Sparse Regression Network.' To the best of our knowledge, this is the first work that combines sparse regression models with a deep neural network. In our experiments with the ADNI cohort, we validated the effectiveness of the proposed method by achieving the highest diagnostic accuracies in three classification tasks. We also rigorously analyzed our results and compared them with previous studies on the ADNI cohort in the literature. Copyright © 2017 Elsevier B.V. All rights reserved.
Liu, Bing-Chun; Binaykia, Arihant; Chang, Pei-Chann; Tiwari, Manoj Kumar; Tsao, Cheng-Chin
2017-01-01
Today, China is facing a very serious air pollution problem because of its harmful impact on human health as well as the environment. The urban areas of China are the most affected due to rapid industrial and economic growth. Therefore, it is extremely important to develop new, better, and more reliable forecasting models to accurately predict air quality. This paper selected Beijing, Tianjin, and Shijiazhuang, three cities in the Jingjinji region, for a study to develop a new collaborative forecasting model using Support Vector Regression (SVR) for urban Air Quality Index (AQI) prediction in China. The present study aims to improve forecasting results by minimizing the prediction error of present machine learning algorithms, taking multi-city, multi-dimensional air quality information and weather conditions as input. The results show that MAPE decreases in the case of multi-city, multi-dimensional regression when there is a strong interaction and correlation of the air quality characteristic attributes with AQI. Also, geographical location is found to play a significant role in Beijing, Tianjin, and Shijiazhuang AQI prediction. PMID:28708836
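The core regression step, SVR from multi-dimensional air-quality and weather inputs to AQI evaluated by MAPE, can be sketched as follows. The data, kernel, and hyperparameters are placeholders, not the Jingjinji records or the authors' tuned settings.

```python
# Sketch of the core step: support vector regression from multi-dimensional air-quality and
# weather inputs to AQI, evaluated by MAPE. Synthetic data, not the Jingjinji records.
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVR

rng = np.random.default_rng(0)
X = rng.normal(size=(1000, 12))                 # e.g. PM2.5, SO2, NO2, wind, ... for 3 cities
aqi = 100 + 20 * X[:, 0] - 10 * X[:, 3] + 5 * rng.normal(size=1000)

X_tr, X_te, y_tr, y_te = train_test_split(X, aqi, test_size=0.2, random_state=0)
model = make_pipeline(StandardScaler(), SVR(kernel="rbf", C=10.0, epsilon=1.0)).fit(X_tr, y_tr)

mape = np.mean(np.abs((model.predict(X_te) - y_te) / y_te)) * 100
print(f"MAPE: {mape:.1f}%")
```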
Complete regression of myocardial involvement associated with lymphoma following chemotherapy.
Vinicki, Juan Pablo; Cianciulli, Tomás F; Farace, Gustavo A; Saccheri, María C; Lax, Jorge A; Kazelian, Lucía R; Wachs, Adolfo
2013-09-26
Cardiac involvement as an initial presentation of malignant lymphoma is a rare occurrence. We describe the case of a 26-year-old man who had initially been diagnosed with myocardial infiltration on an echocardiogram, presenting with a testicular mass and unilateral peripheral facial paralysis. On admission, electrocardiograms (ECG) revealed negative T-waves in all leads and ST-segment elevation in the inferior leads. On two-dimensional echocardiography, there was infiltration of the pericardium with mild effusion, infiltrative thickening of the aortic walls, both atria, and the interatrial septum, and mildly depressed systolic function of both ventricles. An axillary biopsy was performed and reported as a T-cell lymphoblastic lymphoma (T-LBL). Following diagnosis and staging, chemotherapy was started. Twenty-two days after finishing the first cycle of chemotherapy, the ECG showed regression of the T-wave changes in all leads and normalization of the ST-segment elevation in the inferior leads. A follow-up two-dimensional echocardiogram confirmed regression of the myocardial infiltration. This case report illustrates a lymphoma presenting with a testicular mass, unilateral peripheral facial paralysis, and myocardial involvement, and demonstrates that regression of infiltration can be achieved with intensive chemotherapy. To our knowledge, there are no previously reported cases of T-LBL presenting as a testicular mass and unilateral peripheral facial paralysis with complete regression of myocardial involvement.
REGULARIZATION FOR COX’S PROPORTIONAL HAZARDS MODEL WITH NP-DIMENSIONALITY*
Bradic, Jelena; Fan, Jianqing; Jiang, Jiancheng
2011-01-01
High-throughput genetic sequencing arrays with thousands of measurements per sample and a great amount of related censored clinical data have increased the demand for better measurement-specific model selection. In this paper we establish strong oracle properties of non-concave penalized methods for non-polynomial (NP) dimensional data with censoring in the framework of Cox's proportional hazards model. A class of folded-concave penalties is employed, and both LASSO and SCAD are discussed specifically. We address the question of the dimensionality and correlation restrictions under which an oracle estimator can be constructed and grasped. It is demonstrated that non-concave penalties lead to a significant reduction of the "irrepresentable condition" needed for LASSO model selection consistency. A large deviation result for martingales, of interest in its own right, is developed for characterizing the strong oracle property. Moreover, the non-concave regularized estimator is shown to achieve asymptotically the information bound of the oracle estimator. A coordinate-wise algorithm is developed for finding the grid of solution paths for penalized hazard regression problems, and its performance is evaluated on simulated and gene association study examples. PMID:23066171
Big Data Toolsets to Pharmacometrics: Application of Machine Learning for Time‐to‐Event Analysis
Gong, Xiajing; Hu, Meng
2018-01-01
Additional value can potentially be created by applying big data tools to address pharmacometric problems. The performances of machine learning (ML) methods and the Cox regression model were evaluated based on simulated time‐to‐event data synthesized under various preset scenarios, i.e., with linear vs. nonlinear and dependent vs. independent predictors in the proportional hazard function, or with high‐dimensional data featured by a large number of predictor variables. Our results showed that ML‐based methods outperformed the Cox model in prediction performance as assessed by the concordance index and in identifying the preset influential variables for high‐dimensional data. The prediction performances of ML‐based methods are also less sensitive to data size and censoring rates than the Cox regression model. In conclusion, ML‐based methods provide a powerful tool for time‐to‐event analysis, with a built‐in capacity for high‐dimensional data and better performance when the predictor variables assume nonlinear relationships in the hazard function. PMID:29536640
Feature Grouping and Selection Over an Undirected Graph.
Yang, Sen; Yuan, Lei; Lai, Ying-Cheng; Shen, Xiaotong; Wonka, Peter; Ye, Jieping
2012-01-01
High-dimensional regression/classification continues to be an important and challenging problem, especially when features are highly correlated. Feature selection, combined with additional structural information on the features, has been considered promising in promoting regression/classification performance. Graph-guided fused lasso (GFlasso) has recently been proposed to facilitate feature selection and graph structure exploitation when features exhibit certain graph structures. However, the formulation in GFlasso relies on pairwise sample correlations to perform feature grouping, which could introduce additional estimation bias. In this paper, we propose three new feature grouping and selection methods to resolve this issue. The first method employs a convex function to penalize the pairwise l∞ norm of connected regression/classification coefficients, achieving simultaneous feature grouping and selection. The second method improves on the first by utilizing a non-convex function to reduce the estimation bias. The third is an extension of the second method that uses a truncated l1 regularization to further reduce the estimation bias. The proposed methods combine feature grouping and feature selection to enhance estimation accuracy. We employ the alternating direction method of multipliers (ADMM) and difference of convex functions (DC) programming to solve the proposed formulations. Our experimental results on synthetic data and two real datasets demonstrate the effectiveness of the proposed methods.
GPU-accelerated Kernel Regression Reconstruction for Freehand 3D Ultrasound Imaging.
Wen, Tiexiang; Li, Ling; Zhu, Qingsong; Qin, Wenjian; Gu, Jia; Yang, Feng; Xie, Yaoqin
2017-07-01
The volume reconstruction method plays an important role in improving reconstructed volumetric image quality for freehand three-dimensional (3D) ultrasound imaging. By utilizing the capability of a programmable graphics processing unit (GPU), we can achieve real-time incremental volume reconstruction at a speed of 25-50 frames per second (fps). After incremental reconstruction and visualization, hole-filling is performed on the GPU to fill the remaining empty voxels. However, traditional pixel nearest-neighbor-based hole-filling fails to reconstruct volumes with high image quality. In contrast, kernel regression provides an accurate volume reconstruction method for 3D ultrasound imaging, but at the cost of heavy computational complexity. In this paper, a GPU-based fast kernel regression method is proposed for high-quality volume reconstruction after the incremental reconstruction of freehand ultrasound. The experimental results show that improved image quality for speckle reduction and detail preservation can be obtained with a parameter setting of kernel window size of [Formula: see text] and kernel bandwidth of 1.0. The computational performance of the proposed GPU-based method can be over 200 times faster than that on a central processing unit (CPU), and a volume with 50 million voxels in our experiment can be reconstructed within 10 seconds.
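The estimator at the heart of this approach is Gaussian-kernel (Nadaraya-Watson) regression of voxel intensity on scattered sample positions inside a local window. The CPU NumPy sketch below shows that estimator for a single empty voxel; the paper's contribution, the fast GPU implementation, is not reproduced here.

```python
# Sketch of Gaussian-kernel (Nadaraya-Watson) regression for filling an empty voxel from
# nearby scattered ultrasound samples. Plain NumPy on the CPU; the paper's contribution is
# a fast GPU implementation of this kind of estimator.
import numpy as np

rng = np.random.default_rng(0)
sample_xyz = rng.uniform(0, 10, size=(5000, 3))      # positions of scan-converted pixels (mm)
sample_val = np.sin(sample_xyz[:, 0]) + 0.05 * rng.normal(size=5000)   # echo intensities (toy)

def kernel_regress(query_xyz, bandwidth=1.0, window=3.0):
    """Weighted average of samples inside a cubic window around the query voxel."""
    diff = sample_xyz - query_xyz
    near = np.all(np.abs(diff) <= window, axis=1)    # restrict to the local kernel window
    d2 = np.sum(diff[near] ** 2, axis=1)
    w = np.exp(-d2 / (2.0 * bandwidth ** 2))
    return np.sum(w * sample_val[near]) / np.sum(w)

voxel = np.array([5.0, 5.0, 5.0])
print("interpolated voxel value:", round(float(kernel_regress(voxel)), 3))
```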
Fabric pilling measurement using three-dimensional image
NASA Astrophysics Data System (ADS)
Ouyang, Wenbin; Wang, Rongwu; Xu, Bugao
2013-10-01
We introduce a stereovision system and the three-dimensional (3-D) image analysis algorithms for fabric pilling measurement. Based on the depth information available in the 3-D image, the pilling detection process starts from the seed searching at local depth maxima to the region growing around the selected seeds using both depth and distance criteria. After the pilling detection, the density, height, and area of individual pills in the image can be extracted to describe the pilling appearance. According to the multivariate regression analysis on the 3-D images of 30 cotton fabrics treated by the random-tumble and home-laundering machines, the pilling grade is highly correlated with the pilling density (R=0.923) but does not consistently change with the pilling height and area. The pilling densities measured from the 3-D images also correlate well with those counted manually from the samples (R=0.985).
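The detection idea, seeds at local depth maxima followed by region growing under depth and distance criteria, is sketched below on a 2-D toy depth map using SciPy's ndimage tools. The thresholds and window sizes are arbitrary, and the example is deliberately simpler than the paper's 3-D stereovision pipeline.

```python
# Sketch of the detection idea on a 2-D depth map: seeds at local depth maxima, then region
# growing around each seed using depth and distance criteria. Thresholds are arbitrary.
import numpy as np
from scipy import ndimage

rng = np.random.default_rng(0)
depth = ndimage.gaussian_filter(rng.normal(size=(128, 128)), sigma=4)   # toy fabric surface

# Seeds: pixels that are the maximum of their neighborhood and protrude above the surface.
local_max = depth == ndimage.maximum_filter(depth, size=9)
seeds = local_max & (depth > depth.mean() + 1.5 * depth.std())

# Region growing: keep pixels close to a seed (distance criterion) and high enough (depth
# criterion), then label connected components as individual pills.
near_seed = ndimage.distance_transform_edt(~seeds) < 6
pill_mask = near_seed & (depth > depth.mean() + depth.std())
labels, n_pills = ndimage.label(pill_mask)

areas = ndimage.sum(np.ones_like(depth), labels, index=range(1, n_pills + 1))
print("pill count:", n_pills,
      "| mean pill area (px):", round(float(np.mean(areas)), 1) if n_pills else 0)
```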
Robust, Adaptive Functional Regression in Functional Mixed Model Framework.
Zhu, Hongxiao; Brown, Philip J; Morris, Jeffrey S
2011-09-01
Functional data are increasingly encountered in scientific studies, and their high dimensionality and complexity lead to many analytical challenges. Various methods for functional data analysis have been developed, including functional response regression methods that involve regression of a functional response on univariate/multivariate predictors with nonparametrically represented functional coefficients. In existing methods, however, the functional regression can be sensitive to outlying curves and outlying regions of curves, so is not robust. In this paper, we introduce a new Bayesian method, robust functional mixed models (R-FMM), for performing robust functional regression within the general functional mixed model framework, which includes multiple continuous or categorical predictors and random effect functions accommodating potential between-function correlation induced by the experimental design. The underlying model involves a hierarchical scale mixture model for the fixed effects, random effect and residual error functions. These modeling assumptions across curves result in robust nonparametric estimators of the fixed and random effect functions which down-weight outlying curves and regions of curves, and produce statistics that can be used to flag global and local outliers. These assumptions also lead to distributions across wavelet coefficients that have outstanding sparsity and adaptive shrinkage properties, with great flexibility for the data to determine the sparsity and the heaviness of the tails. Together with the down-weighting of outliers, these within-curve properties lead to fixed and random effect function estimates that appear in our simulations to be remarkably adaptive in their ability to remove spurious features yet retain true features of the functions. We have developed general code to implement this fully Bayesian method that is automatic, requiring the user to only provide the functional data and design matrices. It is efficient enough to handle large data sets, and yields posterior samples of all model parameters that can be used to perform desired Bayesian estimation and inference. Although we present details for a specific implementation of the R-FMM using specific distributional choices in the hierarchical model, 1D functions, and wavelet transforms, the method can be applied more generally using other heavy-tailed distributions, higher dimensional functions (e.g. images), and using other invertible transformations as alternatives to wavelets.
STRONG ORACLE OPTIMALITY OF FOLDED CONCAVE PENALIZED ESTIMATION.
Fan, Jianqing; Xue, Lingzhou; Zou, Hui
2014-06-01
Folded concave penalization methods have been shown to enjoy the strong oracle property for high-dimensional sparse estimation. However, a folded concave penalization problem usually has multiple local solutions and the oracle property is established only for one of the unknown local solutions. A challenging fundamental issue still remains that it is not clear whether the local optimum computed by a given optimization algorithm possesses those nice theoretical properties. To close this important theoretical gap in over a decade, we provide a unified theory to show explicitly how to obtain the oracle solution via the local linear approximation algorithm. For a folded concave penalized estimation problem, we show that as long as the problem is localizable and the oracle estimator is well behaved, we can obtain the oracle estimator by using the one-step local linear approximation. In addition, once the oracle estimator is obtained, the local linear approximation algorithm converges, namely it produces the same estimator in the next iteration. The general theory is demonstrated by using four classical sparse estimation problems, i.e., sparse linear regression, sparse logistic regression, sparse precision matrix estimation and sparse quantile regression.
Clifford support vector machines for classification, regression, and recurrence.
Bayro-Corrochano, Eduardo Jose; Arana-Daniel, Nancy
2010-11-01
This paper introduces the Clifford support vector machines (CSVM) as a generalization of the real and complex-valued support vector machines using the Clifford geometric algebra. In this framework, we handle the design of kernels involving the Clifford or geometric product. In this approach, one redefines the optimization variables as multivectors. This allows us to have a multivector as output. Therefore, we can represent multiple classes according to the dimension of the geometric algebra in which we work. We show that one can apply CSVM for classification and regression and also to build a recurrent CSVM. The CSVM is an attractive approach for the multiple input multiple output processing of high-dimensional geometric entities. We carried out comparisons between CSVM and the current approaches to solve multiclass classification and regression. We also study the performance of the recurrent CSVM with experiments involving time series. The authors believe that this paper can be of great use for researchers and practitioners interested in multiclass hypercomplex computing, particularly for applications in complex and quaternion signal and image processing, satellite control, neurocomputation, pattern recognition, computer vision, augmented virtual reality, robotics, and humanoids.
STRONG ORACLE OPTIMALITY OF FOLDED CONCAVE PENALIZED ESTIMATION
Fan, Jianqing; Xue, Lingzhou; Zou, Hui
2014-01-01
Folded concave penalization methods have been shown to enjoy the strong oracle property for high-dimensional sparse estimation. However, a folded concave penalization problem usually has multiple local solutions, and the oracle property is established only for one of these unknown local solutions. A fundamental issue therefore remains open: it is not clear whether the local optimum computed by a given optimization algorithm possesses those nice theoretical properties. To close this theoretical gap, which has stood for over a decade, we provide a unified theory that shows explicitly how to obtain the oracle solution via the local linear approximation algorithm. For a folded concave penalized estimation problem, we show that as long as the problem is localizable and the oracle estimator is well behaved, we can obtain the oracle estimator by using the one-step local linear approximation. In addition, once the oracle estimator is obtained, the local linear approximation algorithm converges; that is, it produces the same estimator in the next iteration. The general theory is demonstrated by using four classical sparse estimation problems, i.e., sparse linear regression, sparse logistic regression, sparse precision matrix estimation, and sparse quantile regression. PMID:25598560
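The one-step local linear approximation lends itself to a compact illustration. The sketch below applies it to SCAD-penalized linear regression, with a lasso initializer, the usual SCAD constant a = 3.7, and a weighted-lasso-by-column-rescaling trick; the synthetic data and the small weight floor are assumptions for illustration, not the authors' implementation.

```python
# One-step local linear approximation (LLA) for SCAD-penalized linear regression:
# initialize with a lasso fit, build per-coefficient weights from the SCAD
# derivative, then solve a single weighted lasso.
import numpy as np
from sklearn.linear_model import Lasso

def scad_derivative(beta, lam, a=3.7):
    """Derivative p'_lambda(|beta_j|) of the SCAD penalty."""
    b = np.abs(beta)
    return np.where(b <= lam, lam, np.maximum(a * lam - b, 0.0) / (a - 1.0))

def one_step_lla(X, y, lam, a=3.7):
    beta0 = Lasso(alpha=lam, fit_intercept=False).fit(X, y).coef_   # initial estimator
    # LLA weights; the small floor keeps the column rescaling numerically stable
    # (a near-zero weight means the coefficient is essentially unpenalized).
    w = np.maximum(scad_derivative(beta0, lam, a), 1e-2 * lam)
    # Weighted lasso via column rescaling: the penalty becomes sum_j w_j |beta_j|.
    theta = Lasso(alpha=1.0, fit_intercept=False).fit(X / w, y).coef_
    return theta / w

rng = np.random.default_rng(0)
X = rng.standard_normal((200, 50))
beta_true = np.zeros(50)
beta_true[:3] = [2.0, -1.5, 1.0]
y = X @ beta_true + 0.5 * rng.standard_normal(200)
print(np.nonzero(one_step_lla(X, y, lam=0.1))[0])   # recovered support
```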
Extrinsic local regression on manifold-valued data
Lin, Lizhen; St Thomas, Brian; Zhu, Hongtu; Dunson, David B.
2017-01-01
We propose an extrinsic regression framework for modeling data with manifold-valued responses and Euclidean predictors. Regression with manifold responses has wide applications in shape analysis, neuroscience, medical imaging and many other areas. Our approach embeds the manifold where the responses lie onto a higher-dimensional Euclidean space, obtains a local regression estimate in that space, and then projects this estimate back onto the image of the manifold. Outside the regression setting, both intrinsic and extrinsic approaches have been proposed for modeling i.i.d. manifold-valued data. However, to our knowledge our work is the first to take an extrinsic approach to the regression problem. The proposed extrinsic regression framework is general, computationally efficient and theoretically appealing. Asymptotic distributions and convergence rates of the extrinsic regression estimates are derived, and a large class of examples is considered, indicating the wide applicability of our approach. PMID:29225385
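A minimal sketch of the embed-estimate-project idea described above, with responses on the unit sphere, Nadaraya-Watson local averaging in the ambient Euclidean space, and projection back by normalization; the toy data, kernel, and bandwidth are assumptions, not the authors' estimator.

```python
# Extrinsic local regression for sphere-valued responses: average in R^3,
# then project the estimate back onto S^2 by normalization.
import numpy as np

def extrinsic_local_regression(x_train, y_train, x_new, bandwidth=0.3):
    w = np.exp(-0.5 * ((x_train - x_new) / bandwidth) ** 2)   # Gaussian kernel weights
    y_bar = (w[:, None] * y_train).sum(axis=0) / w.sum()      # local estimate in R^3
    return y_bar / np.linalg.norm(y_bar)                      # project back onto the sphere

rng = np.random.default_rng(1)
x = rng.uniform(0, np.pi, 300)                                # scalar Euclidean predictor
# responses: noisy points near a curve on the unit sphere
y = np.column_stack([np.cos(x), np.sin(x), 0.3 * np.sin(2 * x)])
y += 0.05 * rng.standard_normal(y.shape)
y /= np.linalg.norm(y, axis=1, keepdims=True)
print(extrinsic_local_regression(x, y, x_new=1.0))
```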
NASA Astrophysics Data System (ADS)
Tripathy, Rohit; Bilionis, Ilias; Gonzalez, Marcial
2016-09-01
Uncertainty quantification (UQ) tasks, such as model calibration, uncertainty propagation, and optimization under uncertainty, typically require several thousand evaluations of the underlying computer codes. To cope with the cost of simulations, one replaces the real response surface with a cheap surrogate based, e.g., on polynomial chaos expansions, neural networks, support vector machines, or Gaussian processes (GP). However, the number of simulations required to learn a generic multivariate response grows exponentially as the input dimension increases. This curse of dimensionality can only be addressed if the response exhibits some special structure that can be discovered and exploited. A wide range of physical responses exhibit a special structure known as an active subspace (AS). An AS is a linear manifold of the stochastic space characterized by maximal response variation. The idea is that one should first identify this low-dimensional manifold, project the high-dimensional input onto it, and then link the projection to the output. If the dimensionality of the AS is low enough, then learning the link function is a much easier problem than the original problem of learning a high-dimensional function. The classic approach to discovering the AS requires gradient information, a fact that severely limits its applicability. Furthermore, and partly because of its reliance on gradients, it is not able to handle noisy observations. The latter is an essential trait if one wants to be able to propagate uncertainty through stochastic simulators, e.g., through molecular dynamics codes. In this work, we develop a probabilistic version of AS which is gradient-free and robust to observational noise. Our approach relies on a novel Gaussian process regression with built-in dimensionality reduction. In particular, the AS is represented as an orthogonal projection matrix that serves as yet another covariance function hyper-parameter to be estimated from the data. To train the model, we design a two-step maximum likelihood optimization procedure that ensures the orthogonality of the projection matrix by exploiting recent results on the Stiefel manifold, i.e., the manifold of matrices with orthogonal columns. The additional benefit of our probabilistic formulation is that it allows us to select the dimensionality of the AS via the Bayesian information criterion. We validate our approach by showing that it can discover the right AS in synthetic examples without gradient information using both noiseless and noisy observations. We demonstrate that our method is able to discover the same AS as the classical approach in a challenging one-hundred-dimensional problem involving an elliptic stochastic partial differential equation with random conductivity. Finally, we use our approach to study the effect of geometric and material uncertainties in the propagation of solitary waves in a one-dimensional granular system.
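For context, the sketch below illustrates the classical gradient-based active-subspace recovery that the probabilistic, gradient-free method above is contrasted with: eigendecompose the empirical covariance of gradients and keep the leading eigenvectors. The test function and sample sizes are assumed purely for illustration.

```python
# Classical active subspace discovery from gradients:
# C = E[grad f grad f^T]; the leading eigenvectors of C span the AS.
import numpy as np

def grad_f(x, w):                         # gradient of the toy response f(x) = sin(x @ w)
    return np.cos(x @ w)[:, None] * w[None, :]

rng = np.random.default_rng(2)
dim, n = 20, 2000
w_true = rng.standard_normal(dim)
w_true /= np.linalg.norm(w_true)          # the true (one-dimensional) active direction

X = rng.standard_normal((n, dim))
G = grad_f(X, w_true)                     # n x dim matrix of sampled gradients
C = G.T @ G / n                           # empirical gradient covariance
eigvals, eigvecs = np.linalg.eigh(C)
w_hat = eigvecs[:, -1]                    # leading eigenvector spans the estimated AS

print(abs(w_hat @ w_true))                # alignment with the truth (close to 1 up to sign)
```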
DOE Office of Scientific and Technical Information (OSTI.GOV)
Storm, Emma; Weniger, Christoph; Calore, Francesca, E-mail: e.m.storm@uva.nl, E-mail: c.weniger@uva.nl, E-mail: francesca.calore@lapth.cnrs.fr
We present SkyFACT (Sky Factorization with Adaptive Constrained Templates), a new approach for studying, modeling and decomposing diffuse gamma-ray emission. Like most previous analyses, the approach relies on predictions from cosmic-ray propagation codes like GALPROP and DRAGON. However, in contrast to previous approaches, we account for the fact that models are not perfect and allow for a very large number (≳ 10^5) of nuisance parameters to parameterize these imperfections. We combine methods of image reconstruction and adaptive spatio-spectral template regression in one coherent hybrid approach. To this end, we use penalized Poisson likelihood regression, with regularization functions that are motivated by the maximum entropy method. We introduce methods to efficiently handle the high dimensionality of the convex optimization problem as well as the associated semi-sparse covariance matrix, using the L-BFGS-B algorithm and Cholesky factorization. We test the method both on synthetic data as well as on gamma-ray emission from the inner Galaxy, |ℓ| < 90° and |b| < 20°, as observed by the Fermi Large Area Telescope. We finally define a simple reference model that removes most of the residual emission from the inner Galaxy, based on conventional diffuse emission components as well as components for the Fermi bubbles, the Fermi Galactic center excess, and extended sources along the Galactic disk. Variants of this reference model can serve as basis for future studies of diffuse emission in and outside the Galactic disk.
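A minimal sketch of the core fitting step described above, penalized Poisson likelihood regression solved with L-BFGS-B under non-negativity bounds; the toy templates and the simple quadratic penalty on the nuisance parameters are assumptions, not the SkyFACT regularization.

```python
# Penalized Poisson likelihood regression with box constraints, minimized with
# scipy's L-BFGS-B. Counts y are modeled as Poisson with mean T @ theta, and a
# quadratic penalty pulls the template renormalizations toward 1.
import numpy as np
from scipy.optimize import minimize

rng = np.random.default_rng(3)
n_pix, n_templates = 500, 3
T = rng.uniform(0.5, 2.0, size=(n_pix, n_templates))      # spatial templates (toy)
theta_true = np.array([1.2, 0.8, 1.0])
y = rng.poisson(T @ theta_true)                            # observed counts

lam = 10.0                                                 # regularization strength

def neg_penalized_loglik(theta):
    mu = T @ theta
    nll = np.sum(mu - y * np.log(mu + 1e-12))              # Poisson NLL (constants dropped)
    penalty = 0.5 * lam * np.sum((theta - 1.0) ** 2)
    return nll + penalty

res = minimize(neg_penalized_loglik, x0=np.ones(n_templates),
               method="L-BFGS-B", bounds=[(1e-6, None)] * n_templates)
print(res.x)                                               # fitted template normalizations
```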
Folded concave penalized learning in identifying multimodal MRI marker for Parkinson’s disease
Liu, Hongcheng; Du, Guangwei; Zhang, Lijun; Lewis, Mechelle M.; Wang, Xue; Yao, Tao; Li, Runze; Huang, Xuemei
2016-01-01
Background: Brain MRI holds promise to gauge different aspects of Parkinson’s disease (PD)-related pathological changes. Its analysis, however, is hindered by the high-dimensional nature of the data. New method: This study introduces folded concave penalized (FCP) sparse logistic regression to identify biomarkers for PD from a large number of potential factors. The proposed statistical procedures target the challenges of high dimensionality with the limited number of data samples acquired. The maximization problem associated with the sparse logistic regression model is solved by local linear approximation. The proposed procedures are then applied to the empirical analysis of multimodal MRI data. Results: From 45 features, the proposed approach identified 15 MRI markers and the UPSIT, which are known to be clinically relevant to PD. By combining the MRI and clinical markers, we can enhance substantially the specificity and sensitivity of the model, as indicated by the ROC curves. Comparison to existing methods: We compare the folded concave penalized learning scheme with both the Lasso penalized scheme and the principal component analysis-based feature selection (PCA) in the Parkinson’s biomarker identification problem that takes into account both the clinical features and MRI markers. The folded concave penalty method demonstrates a substantially better clinical potential than both the Lasso and PCA in terms of specificity and sensitivity. Conclusions: For the first time, we applied the FCP learning method to MRI biomarker discovery in PD. The proposed approach successfully identified MRI markers that are clinically relevant. Combining these biomarkers with clinical features can substantially enhance performance. PMID:27102045
Hu, Chen; Steingrimsson, Jon Arni
2018-01-01
A crucial component of making individualized treatment decisions is to accurately predict each patient's disease risk. In clinical oncology, disease risks are often measured through time-to-event data, such as overall survival and progression/recurrence-free survival, and are often subject to censoring. Risk prediction models based on recursive partitioning methods are becoming increasingly popular largely due to their ability to handle nonlinear relationships, higher-order interactions, and/or high-dimensional covariates. The most popular recursive partitioning methods are versions of the Classification and Regression Tree (CART) algorithm, which builds a simple interpretable tree structured model. With the aim of increasing prediction accuracy, the random forest algorithm averages multiple CART trees, creating a flexible risk prediction model. Risk prediction models used in clinical oncology commonly use both traditional demographic and tumor pathological factors as well as high-dimensional genetic markers and treatment parameters from multimodality treatments. In this article, we describe the most commonly used extensions of the CART and random forest algorithms to right-censored outcomes. We focus on how they differ from the methods for noncensored outcomes, and how the different splitting rules and methods for cost-complexity pruning impact these algorithms. We demonstrate these algorithms by analyzing a randomized Phase III clinical trial of breast cancer. We also conduct Monte Carlo simulations to compare the prediction accuracy of survival forests with more commonly used regression models under various scenarios. These simulation studies aim to evaluate how sensitive the prediction accuracy is to the underlying model specifications, the choice of tuning parameters, and the degrees of missing covariates.
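A brief sketch of fitting a random survival forest to right-censored data, assuming the scikit-survival package provides RandomSurvivalForest and the Surv helper; the synthetic data and settings are illustrative only and not drawn from the trial analyzed in the article.

```python
# Random survival forest on synthetic right-censored data (sketch, assuming
# the scikit-survival package). The outcome is a structured array of
# (event indicator, observed time) pairs.
import numpy as np
from sksurv.ensemble import RandomSurvivalForest
from sksurv.util import Surv

rng = np.random.default_rng(4)
n, p = 300, 10
X = rng.standard_normal((n, p))
true_time = np.exp(0.5 * X[:, 0] - 0.3 * X[:, 1] + 0.2 * rng.standard_normal(n))
cens_time = rng.exponential(2.0, n)
event = true_time <= cens_time                     # True = event observed before censoring
time = np.minimum(true_time, cens_time)
y = Surv.from_arrays(event=event, time=time)       # structured survival outcome

rsf = RandomSurvivalForest(n_estimators=200, min_samples_leaf=10, random_state=0)
rsf.fit(X, y)
print(rsf.score(X, y))                             # concordance index on the training data
```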
Pacharawongsakda, Eakasit; Theeramunkong, Thanaruk
2013-12-01
Predicting protein subcellular location is one of the major challenges in bioinformatics, since such knowledge helps us understand protein functions and enables us to select the targeted proteins during the drug discovery process. While many computational techniques have been proposed to improve predictive performance for protein subcellular location, they have several shortcomings. In this work, we propose a method to solve three main issues in such techniques: i) manipulation of multiplex proteins which may exist or move between multiple cellular compartments, ii) handling of high dimensionality in input and output spaces and iii) requirement of sufficient labeled data for model training. To address these issues, this work presents a new computational method for predicting proteins which have either single or multiple locations. The proposed technique, namely iFLAST-CORE, incorporates dimensionality reduction in the feature and label spaces with a co-training paradigm for semi-supervised multi-label classification. For this purpose, the Singular Value Decomposition (SVD) is applied to transform the high-dimensional feature space and label space into lower-dimensional spaces. After that, owing to the limited labeled data, the co-training regression makes use of unlabeled data by predicting the target values of unlabeled data in the lower-dimensional spaces. In the last step, the components of the SVD are used to project labels in the lower-dimensional space back to those in the original space, and an adaptive threshold is used to map a numeric value to a binary value for label determination. A set of experiments on viral proteins and gram-negative bacterial proteins evidences that our proposed method improves the classification performance in terms of various evaluation metrics such as Aiming (or Precision), Coverage (or Recall) and macro F-measure, compared to the traditional method that uses only labeled data.
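The SVD-based reduction of the label space and the back-projection with a threshold can be sketched as follows; the ridge regressor, the fixed 0.5 threshold, and the synthetic multi-label data are stand-ins for illustration (the co-training step with unlabeled data is omitted).

```python
# Project a binary label matrix to a low-dimensional space via SVD, regress the
# projected targets on features, then map predictions back and threshold.
import numpy as np
from sklearn.linear_model import Ridge

rng = np.random.default_rng(5)
n, p, n_labels, k = 400, 100, 12, 4
X = rng.standard_normal((n, p))
W = rng.standard_normal((p, n_labels))
Y = (X @ W + rng.standard_normal((n, n_labels)) > 1.0).astype(float)  # multi-label targets

U, s, Vt = np.linalg.svd(Y, full_matrices=False)
Z = Y @ Vt[:k].T                               # labels projected onto k SVD components

reg = Ridge(alpha=1.0).fit(X, Z)               # regression in the reduced label space
Z_hat = reg.predict(X)
Y_hat = (Z_hat @ Vt[:k] > 0.5).astype(int)     # back-projection plus threshold (fixed here)
print((Y_hat == Y).mean())                     # label-wise agreement
```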
NASA Astrophysics Data System (ADS)
Chang, Fu-Yu; Chen, Ming-Kun; Wang, Min-Haw; Jang, Ling-Sheng
2016-02-01
Cell impedance analysis is widely used for monitoring biological and medical reactions. In this study, a highly sensitive three-dimensional (3D) interdigitated microelectrode (IME) with a high aspect ratio on a polyimide (PI) flexible substrate was fabricated for microparticle detection (e.g. cell quantity detection) using electroforming and lithography technology. 3D finite element simulations were performed to compare the performance of the 3D IME (in terms of sensitivity and signal-to-noise ratio) to that of a planar IME for particles in the sensing area. Various quantities of particles were captured in Dulbecco’s modified Eagle medium and their impedances were measured. With the 3D IME, the particles were arranged in the gap, not on the electrode, avoiding the noise due to particle position. For the maximum particle quantities, the results show that the 3D IME has at least 5-fold higher sensitivity than that of the planar IME. The trends of impedance magnitude and phase due to particle quantity were verified using the equivalent circuit model. The impedance (1269 Ω) of 69 particles was used to estimate the particle quantity (68 particles) with 98.6% accuracy using a parabolic regression curve at 500 kHz.
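A small sketch of estimating particle quantity from measured impedance with a parabolic regression curve, as described above; the impedance-count pairs below are synthetic stand-ins, not the measured values from the study.

```python
# Parabolic (second-order polynomial) regression of particle count on impedance,
# then prediction of the count from a new impedance reading.
import numpy as np

counts = np.array([0, 10, 20, 30, 40, 50, 60, 69])                        # captured particles
impedance = np.array([900, 960, 1015, 1070, 1120, 1175, 1230, 1269.0])    # Ohms (synthetic)

coeffs = np.polyfit(impedance, counts, deg=2)      # count modeled as quadratic in |Z|
estimate = np.polyval(coeffs, 1269.0)
print(round(estimate))                             # estimated particle quantity
```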
Local connectome phenotypes predict social, health, and cognitive factors
Powell, Michael A.; Garcia, Javier O.; Yeh, Fang-Cheng; Vettel, Jean M.
2018-01-01
The unique architecture of the human connectome is defined initially by genetics and subsequently sculpted over time with experience. Thus, similarities in predisposition and experience that lead to similarities in social, biological, and cognitive attributes should also be reflected in the local architecture of white matter fascicles. Here we employ a method known as local connectome fingerprinting that uses diffusion MRI to measure the fiber-wise characteristics of macroscopic white matter pathways throughout the brain. This fingerprinting approach was applied to a large sample (N = 841) of subjects from the Human Connectome Project, revealing a reliable degree of between-subject correlation in the local connectome fingerprints, with a relatively complex, low-dimensional substructure. Using a cross-validated, high-dimensional regression analysis approach, we derived local connectome phenotype (LCP) maps that could reliably predict a subset of subject attributes measured, including demographic, health, and cognitive measures. These LCP maps were highly specific to the attribute being predicted but also sensitive to correlations between attributes. Collectively, these results indicate that the local architecture of white matter fascicles reflects a meaningful portion of the variability shared between subjects along several dimensions. PMID:29911679
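A minimal sketch of a cross-validated, high-dimensional regression of a subject attribute on fingerprint features, using ridge regression with out-of-sample predictions; the synthetic data and the choice of ridge are assumptions, not the authors' LCP pipeline.

```python
# Cross-validated high-dimensional regression: predict a scalar attribute from
# many features and assess the out-of-sample association.
import numpy as np
from sklearn.linear_model import RidgeCV
from sklearn.model_selection import cross_val_predict, KFold

rng = np.random.default_rng(6)
n_subjects, n_features = 200, 5000
X = rng.standard_normal((n_subjects, n_features))            # fingerprint-like features
beta = np.zeros(n_features)
beta[:50] = rng.standard_normal(50)
y = X @ beta + 5.0 * rng.standard_normal(n_subjects)          # e.g. a cognitive score

model = RidgeCV(alphas=np.logspace(0, 4, 20))
y_pred = cross_val_predict(model, X, y, cv=KFold(5, shuffle=True, random_state=0))
print(np.corrcoef(y, y_pred)[0, 1])                           # predicted vs. observed
```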
Linear Regression Links Transcriptomic Data and Cellular Raman Spectra.
Kobayashi-Kirschvink, Koseki J; Nakaoka, Hidenori; Oda, Arisa; Kamei, Ken-Ichiro F; Nosho, Kazuki; Fukushima, Hiroko; Kanesaki, Yu; Yajima, Shunsuke; Masaki, Haruhiko; Ohta, Kunihiro; Wakamoto, Yuichi
2018-06-08
Raman microscopy is an imaging technique that has been applied to assess molecular compositions of living cells to characterize cell types and states. However, owing to the diverse molecular species in cells and challenges of assigning peaks to specific molecules, it has not been clear how to interpret cellular Raman spectra. Here, we provide firm evidence that cellular Raman spectra and transcriptomic profiles of Schizosaccharomyces pombe and Escherichia coli can be computationally connected and thus interpreted. We find that the dimensions of high-dimensional Raman spectra and transcriptomes measured by RNA sequencing can be reduced and connected linearly through a shared low-dimensional subspace. Accordingly, we were able to predict global gene expression profiles by applying the calculated transformation matrix to Raman spectra, and vice versa. Highly expressed non-coding RNAs contributed to the Raman-transcriptome linear correspondence more significantly than mRNAs in S. pombe. This demonstration of correspondence between cellular Raman spectra and transcriptomes is a promising step toward establishing spectroscopic live-cell omics studies. Copyright © 2018 Elsevier Inc. All rights reserved.
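The shared low-dimensional subspace idea can be sketched by reducing both data blocks with PCA, learning a linear map between the scores, and predicting one block from the other; dimensions and synthetic data below are assumptions, not the published transformation matrices.

```python
# Link two high-dimensional measurements through a shared low-dimensional
# subspace: PCA on each block, linear regression between the scores, then
# reconstruct one block from the other.
import numpy as np
from sklearn.decomposition import PCA
from sklearn.linear_model import LinearRegression

rng = np.random.default_rng(7)
n, d_raman, d_genes, k = 60, 800, 3000, 5
latent = rng.standard_normal((n, k))                          # shared cell states
raman = latent @ rng.standard_normal((k, d_raman)) + 0.1 * rng.standard_normal((n, d_raman))
expr = latent @ rng.standard_normal((k, d_genes)) + 0.1 * rng.standard_normal((n, d_genes))

pca_r, pca_g = PCA(n_components=k), PCA(n_components=k)
scores_r = pca_r.fit_transform(raman)
scores_g = pca_g.fit_transform(expr)

B = LinearRegression().fit(scores_r, scores_g)                # Raman scores -> expression scores
expr_hat = pca_g.inverse_transform(B.predict(scores_r))       # predicted expression profiles
print(np.corrcoef(expr.ravel(), expr_hat.ravel())[0, 1])
```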
On the degrees of freedom of reduced-rank estimators in multivariate regression
Mukherjee, A.; Chen, K.; Wang, N.; Zhu, J.
2015-01-01
Summary We study the effective degrees of freedom of a general class of reduced-rank estimators for multivariate regression in the framework of Stein's unbiased risk estimation. A finite-sample exact unbiased estimator is derived that admits a closed-form expression in terms of the thresholded singular values of the least-squares solution and hence is readily computable. The results continue to hold in the high-dimensional setting where both the predictor and the response dimensions may be larger than the sample size. The derived analytical form facilitates the investigation of theoretical properties and provides new insights into the empirical behaviour of the degrees of freedom. In particular, we examine the differences and connections between the proposed estimator and a commonly-used naive estimator. The use of the proposed estimator leads to efficient and accurate prediction risk estimation and model selection, as demonstrated by simulation studies and a data example. PMID:26702155
Functional CAR models for large spatially correlated functional datasets.
Zhang, Lin; Baladandayuthapani, Veerabhadran; Zhu, Hongxiao; Baggerly, Keith A; Majewski, Tadeusz; Czerniak, Bogdan A; Morris, Jeffrey S
2016-01-01
We develop a functional conditional autoregressive (CAR) model for spatially correlated data for which functions are collected on areal units of a lattice. Our model performs functional response regression while accounting for spatial correlations with potentially nonseparable and nonstationary covariance structure, in both the space and functional domains. We show theoretically that our construction leads to a CAR model at each functional location, with spatial covariance parameters varying and borrowing strength across the functional domain. Using basis transformation strategies, the nonseparable spatial-functional model is computationally scalable to enormous functional datasets, generalizable to different basis functions, and can be used on functions defined on higher dimensional domains such as images. Through simulation studies, we demonstrate that accounting for the spatial correlation in our modeling leads to improved functional regression performance. Applied to a high-throughput spatially correlated copy number dataset, the model identifies genetic markers not identified by comparable methods that ignore spatial correlations.
Bayesian function-on-function regression for multilevel functional data.
Meyer, Mark J; Coull, Brent A; Versace, Francesco; Cinciripini, Paul; Morris, Jeffrey S
2015-09-01
Medical and public health research increasingly involves the collection of complex and high dimensional data. In particular, functional data-where the unit of observation is a curve or set of curves that are finely sampled over a grid-is frequently obtained. Moreover, researchers often sample multiple curves per person resulting in repeated functional measures. A common question is how to analyze the relationship between two functional variables. We propose a general function-on-function regression model for repeatedly sampled functional data on a fine grid, presenting a simple model as well as a more extensive mixed model framework, and introducing various functional Bayesian inferential procedures that account for multiple testing. We examine these models via simulation and a data analysis with data from a study that used event-related potentials to examine how the brain processes various types of images. © 2015, The International Biometric Society.
Complete regression of myocardial involvement associated with lymphoma following chemotherapy
Vinicki, Juan Pablo; Cianciulli, Tomás F; Farace, Gustavo A; Saccheri, María C; Lax, Jorge A; Kazelian, Lucía R; Wachs, Adolfo
2013-01-01
Cardiac involvement as an initial presentation of malignant lymphoma is a rare occurrence. We describe the case of a 26 year old man who had initially been diagnosed with myocardial infiltration on an echocardiogram, presenting with a testicular mass and unilateral peripheral facial paralysis. On admission, electrocardiograms (ECG) revealed negative T-waves in all leads and ST-segment elevation in the inferior leads. On two-dimensional echocardiography, there was infiltration of the pericardium with mild effusion, infiltrative thickening of the aortic walls, both atria and the interatrial septum and a mildly depressed systolic function of both ventricles. An axillary biopsy was performed and reported as a T-cell lymphoblastic lymphoma (T-LBL). Following the diagnosis and staging, chemotherapy was started. Twenty-two days after finishing the first cycle of chemotherapy, the ECG showed regression of T-wave changes in all leads and normalization of the ST-segment elevation in the inferior leads. A follow-up Two-dimensional echocardiography confirmed regression of the myocardial infiltration. This case report illustrates a lymphoma presenting with testicular mass, unilateral peripheral facial paralysis and myocardial involvement, and demonstrates that regression of infiltration can be achieved by intensive chemotherapy treatment. To our knowledge, there are no reported cases of T-LBL presenting as a testicular mass and unilateral peripheral facial paralysis, with complete regression of myocardial involvement. PMID:24109501
Karakostis, Fotios Alexandros; Hotz, Gerhard; Scherf, Heike; Wahl, Joachim; Harvati, Katerina
2018-05-01
The purpose of this study was to put forth a precise landmark-based technique for reconstructing the three-dimensional shape of human entheseal surfaces and to investigate whether the shape of human entheses is related to their size. The effects of age-at-death and bone length on entheseal shapes were also assessed. The sample comprised high-definition three-dimensional models of three right hand entheseal surfaces, which correspond to 45 male adult individuals of known age. For each enthesis, a particular landmark configuration was introduced, whose precision was tested both within and between observers. The effect of three-dimensional size, age-at-death, and bone length on shape was investigated through shape regression. The method presented high intra-observer and inter-observer repeatability. All entheses showed significant allometry, with the area of opponens pollicis demonstrating the most substantial relationship. This was particularly due to variation related to its proximal elongated ridge. The effect of age-at-death and bone length on entheses was limited. The introduced methodology can set a reliable basis for further research on the factors affecting entheseal shape. Using both size and shape variables can provide further information on entheseal variation and its biomechanical implications. The low entheseal variation by age verifies that specimens under 50 years of age are not substantially affected by age-related changes. The lack of correlation between entheseal shape and bone length or age implies that other factors may regulate entheseal surfaces. Future research should focus on multivariate shape patterns among entheses and their association with occupation. © 2018 Wiley Periodicals, Inc.
Dietrich, Stefan; Floegel, Anna; Troll, Martina; Kühn, Tilman; Rathmann, Wolfgang; Peters, Anette; Sookthai, Disorn; von Bergen, Martin; Kaaks, Rudolf; Adamski, Jerzy; Prehn, Cornelia; Boeing, Heiner; Schulze, Matthias B; Illig, Thomas; Pischon, Tobias; Knüppel, Sven; Wang-Sattler, Rui; Drogan, Dagmar
2016-10-01
The application of metabolomics in prospective cohort studies is statistically challenging. Given the importance of appropriate statistical methods for selection of disease-associated metabolites in highly correlated complex data, we combined random survival forest (RSF) with an automated backward elimination procedure that addresses such issues. Our RSF approach was illustrated with data from the European Prospective Investigation into Cancer and Nutrition (EPIC)-Potsdam study, with concentrations of 127 serum metabolites as exposure variables and time to development of type 2 diabetes mellitus (T2D) as outcome variable. For this data set, results from Cox regression with a stepwise selection method were recently published. The methodological comparison (RSF and Cox regression) was replicated in two independent cohorts. Finally, the R code for implementing the metabolite selection procedure within the RSF syntax is provided. The application of the RSF approach in EPIC-Potsdam resulted in the identification of 16 incident T2D-associated metabolites which slightly improved prediction of T2D when used in addition to traditional T2D risk factors and also when used together with classical biomarkers. The identified metabolites partly agreed with previous findings using Cox regression, though RSF selected a higher number of highly correlated metabolites. The RSF method appeared to be a promising approach for identification of disease-associated variables in complex data with time to event as outcome. The demonstrated RSF approach provides findings comparable to those of the generally used Cox regression, but also addresses the problem of multicollinearity and is suitable for high-dimensional data. © The Author 2016; all rights reserved. Published by Oxford University Press on behalf of the International Epidemiological Association.
NASA Astrophysics Data System (ADS)
Cao, Faxian; Yang, Zhijing; Ren, Jinchang; Ling, Wing-Kuen; Zhao, Huimin; Marshall, Stephen
2017-12-01
Although the sparse multinomial logistic regression (SMLR) has provided a useful tool for sparse classification, it suffers from inefficacy in dealing with high dimensional features and manually set initial regressor values. This has significantly constrained its applications for hyperspectral image (HSI) classification. In order to tackle these two drawbacks, an extreme sparse multinomial logistic regression (ESMLR) is proposed for effective classification of HSI. First, the HSI dataset is projected to a new feature space with randomly generated weight and bias. Second, an optimization model is established by the Lagrange multiplier method and the dual principle to automatically determine a good initial regressor for SMLR via minimizing the training error and the regressor value. Furthermore, the extended multi-attribute profiles (EMAPs) are utilized for extracting both the spectral and spatial features. A combinational linear multiple features learning (MFL) method is proposed to further enhance the features extracted by ESMLR and EMAPs. Finally, the logistic regression via the variable splitting and the augmented Lagrangian (LORSAL) is adopted in the proposed framework for reducing the computational time. Experiments are conducted on two well-known HSI datasets, namely the Indian Pines dataset and the Pavia University dataset, which have shown the fast and robust performance of the proposed ESMLR framework.
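A simplified sketch of the random-projection idea behind ESMLR: map the spectral features through a randomly weighted layer and fit a sparse multinomial logistic regression on the projected features. This stand-in omits EMAPs, MFL, and the LORSAL solver; the synthetic spectra and labels are assumptions.

```python
# Random feature projection followed by L1-penalized multinomial logistic
# regression, as a simplified analogue of the ESMLR pipeline.
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(8)
n, n_bands, n_hidden = 600, 200, 500
X = rng.standard_normal((n, n_bands))                        # pixel spectra (synthetic)
y = np.digitize(X[:, 0] + 0.5 * X[:, 1], bins=[-1.0, -0.3, 0.3, 1.0])   # 5 classes

W = rng.standard_normal((n_bands, n_hidden))                 # randomly generated weights
b = rng.standard_normal(n_hidden)                            # randomly generated bias
H = np.tanh(X @ W + b)                                       # projected feature space

clf = LogisticRegression(penalty="l1", solver="saga", C=0.5, max_iter=2000)
clf.fit(H, y)
print(clf.score(H, y))                                       # training accuracy of the sketch
```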
Multiview alignment hashing for efficient image search.
Liu, Li; Yu, Mengyang; Shao, Ling
2015-03-01
Hashing is a popular and efficient method for nearest neighbor search in large-scale data spaces by embedding high-dimensional feature descriptors into a similarity preserving Hamming space with a low dimension. For most hashing methods, the performance of retrieval heavily depends on the choice of the high-dimensional feature descriptor. Furthermore, a single type of feature cannot be descriptive enough for different images when it is used for hashing. Thus, how to combine multiple representations for learning effective hashing functions is an imminent task. In this paper, we present a novel unsupervised multiview alignment hashing approach based on regularized kernel nonnegative matrix factorization, which can find a compact representation uncovering the hidden semantics and simultaneously respecting the joint probability distribution of data. In particular, we aim to seek a matrix factorization to effectively fuse the multiple information sources meanwhile discarding the feature redundancy. Since the raised problem is regarded as nonconvex and discrete, our objective function is then optimized via an alternate way with relaxation and converges to a locally optimal solution. After finding the low-dimensional representation, the hashing functions are finally obtained through multivariable logistic regression. The proposed method is systematically evaluated on three data sets: 1) Caltech-256; 2) CIFAR-10; and 3) CIFAR-20, and the results show that our method significantly outperforms the state-of-the-art multiview hashing techniques.
NASA Astrophysics Data System (ADS)
Yeom, Hong Gi; Sic Kim, June; Chung, Chun Kee
2013-04-01
Objective. Studies on the non-invasive brain-machine interface that controls prosthetic devices via movement intentions are at their very early stages. Here, we aimed to estimate three-dimensional arm movements using magnetoencephalography (MEG) signals with high accuracy. Approach. Whole-head MEG signals were acquired during three-dimensional reaching movements (center-out paradigm). For movement decoding, we selected 68 MEG channels in motor-related areas, which were band-pass filtered using four subfrequency bands (0.5-8, 9-22, 25-40 and 57-97 Hz). After the filtering, the signals were resampled, and 11 data points preceding the current data point were used as features for estimating velocity. Multiple linear regressions were used to estimate movement velocities. Movement trajectories were calculated by integrating estimated velocities. We evaluated our results by calculating correlation coefficients (r) between real and estimated velocities. Main results. Movement velocities could be estimated from the low-frequency MEG signals (0.5-8 Hz) with significant and considerably high accuracy (p <0.001, mean r > 0.7). We also showed that preceding (60-140 ms) MEG signals are important to estimate current movement velocities and the intervals of brain signals of 200-300 ms are sufficient for movement estimation. Significance. These results imply that disabled people will be able to control prosthetic devices without surgery in the near future.
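A minimal sketch of the decoding pipeline described above: band-pass filter the signals, build features from the preceding data points, fit multiple linear regression for velocity, and integrate to obtain a trajectory. The synthetic signals, sampling rate, and lag count follow the abstract only loosely and are otherwise assumptions.

```python
# Decode 3-D movement velocity from multichannel signals with lagged features
# and multiple linear regression, then integrate velocity into a trajectory.
import numpy as np
from scipy.signal import butter, filtfilt
from sklearn.linear_model import LinearRegression

rng = np.random.default_rng(9)
fs, n_t, n_ch = 100, 2000, 68                       # resampled rate, samples, channels
vel = np.cumsum(rng.standard_normal((n_t, 3)), axis=0) * 0.01   # true 3-D velocity (toy)
meg = vel @ rng.standard_normal((3, n_ch)) + 0.5 * rng.standard_normal((n_t, n_ch))

b, a = butter(4, [0.5, 8.0], btype="bandpass", fs=fs)
meg_f = filtfilt(b, a, meg, axis=0)                 # low-frequency band (0.5-8 Hz)

n_lags = 11                                         # the 11 preceding data points as features
feats = np.hstack([meg_f[n_lags - k:n_t - k] for k in range(1, n_lags + 1)])
target = vel[n_lags:]

reg = LinearRegression().fit(feats, target)
vel_hat = reg.predict(feats)
traj = np.cumsum(vel_hat, axis=0) / fs              # integrate velocity to get the trajectory
print([np.corrcoef(target[:, i], vel_hat[:, i])[0, 1] for i in range(3)])
```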
Al-Alawi, Mohammed; Al-Sinawi, Hamed; Al-Qubtan, Ali; Al-Lawati, Jaber; Al-Habsi, Assad; Al-Shuraiqi, Mohammed; Al-Adawi, Samir; Panchatcharam, Sathiya Murthi
2017-11-08
This study investigated the prevalence and determinants of Burnout Syndrome and Depressive Symptoms among medical students in Oman. Then, it explored whether the three-dimensional aspects of Burnout Syndrome (High Emotional Exhaustion, High Cynicism and Low Academic Efficacy) would predict the presence of Depressive Symptoms in a logistic regression model. A cross-sectional study was conducted among a random sample of medical students of Sultan Qaboos University. 662 students participated in the study with a response rate of 98%. The prevalence of Burnout Syndrome and Depressive Symptoms were 7.4% and 24.5%, respectively. Preclinical students reported high levels of both Burnout Syndrome (Odds Ratio-OR 2.83, 95% Confidence Interval CI 1.45-5.54) and Depressive Symptoms (OR 2.72, 95% CI 1.07-6.89). The three-dimensional aspects of Burnout Syndrome (High Emotional Exhaustion, High Cynicism, Low Professional Efficacy) were statistically significant predictors of the presence of Depressive Symptoms: OR 3.52 (95% CI: 2.21-5.60), OR 3.33 (95% CI: 2.10-5.28) and OR 2.07 (95% CI: 1.32-3.24), respectively. This study indicates that Burnout Syndrome and Depressive Symptoms are common among medical students, particularly among preclinical students. Furthermore, the presence of high occupational burnout elevates the risk of depression.
Network selection, Information filtering and Scalable computation
NASA Astrophysics Data System (ADS)
Ye, Changqing
This dissertation explores two application scenarios of the sparsity pursuit method on large-scale data sets. The first scenario is classification and regression in analyzing high-dimensional structured data, where predictors correspond to nodes of a given directed graph. This arises in, for instance, identification of disease genes for Parkinson's disease from a network of candidate genes. In such a situation, the directed graph describes dependencies among the genes, where the directions of edges represent certain causal effects. Key to high-dimensional structured classification and regression is how to utilize dependencies among predictors as specified by directions of the graph. In this dissertation, we develop a novel method that fully takes into account such dependencies formulated through certain nonlinear constraints. We apply the proposed method to two applications, feature selection in large-margin binary classification and in linear regression. We implement the proposed method through difference convex programming for the cost function and constraints. Finally, theoretical and numerical analyses suggest that the proposed method achieves the desired objectives. An application to disease gene identification is presented. The second application scenario is personalized information filtering, which extracts the information specifically relevant to a user, predicting his/her preference over a large number of items based on the opinions of users who think alike or on the items' content. This problem is cast into the framework of regression and classification, where we introduce novel partial latent models to integrate additional user-specific and content-specific predictors for higher predictive accuracy. In particular, we factorize a user-over-item preference matrix into a product of two matrices, one representing user preferences and the other item preferences by users. Then we propose a likelihood method to seek a sparsest latent factorization, from a class of over-complete factorizations, possibly with a high percentage of missing values. This promotes additional sparsity beyond rank reduction. Computationally, we design methods based on a "decomposition and combination" strategy, to break large-scale optimization into many small subproblems to solve in a recursive and parallel manner. On this basis, we implement the proposed methods through multi-platform shared-memory parallel programming, and through Mahout, a library for scalable machine learning and data mining, for MapReduce computation. For example, our methods are scalable to a dataset consisting of three billion observations on a single machine with sufficient memory, with good timings. Both theoretical and numerical investigations show that the proposed methods exhibit significant improvement in accuracy over state-of-the-art scalable methods.
An Optimization-Based Method for Feature Ranking in Nonlinear Regression Problems.
Bravi, Luca; Piccialli, Veronica; Sciandrone, Marco
2017-04-01
In this paper, we consider the feature ranking problem, where, given a set of training instances, the task is to associate a score with the features in order to assess their relevance. Feature ranking is a very important tool for decision support systems, and may be used as an auxiliary step of feature selection to reduce the high dimensionality of real-world data. We focus on regression problems by assuming that the process underlying the generated data can be approximated by a continuous function (for instance, a feedforward neural network). We formally state the notion of relevance of a feature by introducing a minimum zero-norm inversion problem of a neural network, which is a nonsmooth, constrained optimization problem. We employ a concave approximation of the zero-norm function, and we define a smooth, global optimization problem to be solved in order to assess the relevance of the features. We present the new feature ranking method based on the solution of instances of the global optimization problem depending on the available training data. Computational experiments on both artificial and real data sets are performed, and point out that the proposed feature ranking method is a valid alternative to existing methods in terms of effectiveness. The obtained results also show that the method is costly in terms of CPU time, and this may be a limitation in the solution of large-dimensional problems.
Bradshaw, Elizabeth J; Keogh, Justin W L; Hume, Patria A; Maulder, Peter S; Nortje, Jacques; Marnewick, Michel
2009-06-01
The purpose of this study was to examine the role of neuromotor noise in golf swing performance in high- and low-handicap players. Selected two-dimensional kinematic measures of 20 male golfers (n=10 per high- or low-handicap group) performing 10 golf swings with a 5-iron club were obtained through video analysis. Neuromotor noise was calculated by subtracting the standard error of measurement from the coefficient of variation obtained from intra-individual analysis. Statistical methods included linear regression analysis and one-way analysis of variance using SPSS. Absolute invariance in the key technical positions (e.g., at the top of the backswing) of the golf swing appears to be a more favorable technique for skilled performance.
Executive function in middle childhood and the relationship with theory of mind.
Wilson, Jennifer; Andrews, Glenda; Hogan, Christy; Wang, Si; Shum, David H K
2018-01-01
A group of 126 typically developing children (aged 5-12 years) completed three cool executive function tasks (spatial working memory, stop signal, intra-extra dimensional shift), two hot executive function tasks (gambling, delay of gratification), one advanced theory of mind task (strange stories with high versus low affective tone), and a vocabulary test. Older children performed better than younger children, consistent with the protracted development of hot and cool executive functions and theory of mind. Multiple regression analyses showed that hot and cool executive functions were correlated but they predicted theory of mind in different ways.
NASA Astrophysics Data System (ADS)
Boucher, Thomas F.; Ozanne, Marie V.; Carmosino, Marco L.; Dyar, M. Darby; Mahadevan, Sridhar; Breves, Elly A.; Lepore, Kate H.; Clegg, Samuel M.
2015-05-01
The ChemCam instrument on the Mars Curiosity rover is generating thousands of LIBS spectra and bringing interest in this technique to public attention. The key to interpreting Mars data, or any other type of LIBS data, is calibrations that relate laboratory standards to unknowns examined in other settings and enable predictions of chemical composition. Here, LIBS spectral data are analyzed using linear regression methods including partial least squares (PLS-1 and PLS-2), principal component regression (PCR), least absolute shrinkage and selection operator (lasso), elastic net, and linear support vector regression (SVR-Lin). These were compared against results from nonlinear regression methods including kernel principal component regression (K-PCR), polynomial kernel support vector regression (SVR-Py) and k-nearest neighbor (kNN) regression to discern the most effective models for interpreting chemical abundances from LIBS spectra of geological samples. The results were evaluated for 100 samples analyzed with 50 laser pulses at each of five locations averaged together. Wilcoxon signed-rank tests were employed to evaluate the statistical significance of differences among the nine models using their predicted residual sum of squares (PRESS) to make comparisons. For MgO, SiO2, Fe2O3, CaO, and MnO, the sparse models outperform all the others except for linear SVR, while for Na2O, K2O, TiO2, and P2O5, the sparse methods produce inferior results, likely because their emission lines in this energy range have lower transition probabilities. The strong performance of the sparse methods in this study suggests that use of dimensionality-reduction techniques as a preprocessing step may improve the performance of the linear models. Nonlinear methods tend to overfit the data and predict less accurately, while the linear methods proved to be more generalizable with better predictive performance. These results are attributed to the high dimensionality of the data (6144 channels) relative to the small number of samples studied. The best-performing models were SVR-Lin for SiO2, MgO, Fe2O3, and Na2O, lasso for Al2O3, elastic net for MnO, and PLS-1 for CaO, TiO2, and K2O. Although these differences in model performance between methods were identified, most of the models produce comparable results when p ≤ 0.05 and all techniques except kNN produced statistically indistinguishable results. It is likely that a combination of models could be used together to yield a lower total error of prediction, depending on the requirements of the user.
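The model-comparison machinery described above can be sketched as follows: compute cross-validated squared prediction errors (the PRESS contributions) for two regression models and compare them with a Wilcoxon signed-rank test. The lasso and PLS below stand in for two of the nine models; the data are synthetic, not LIBS spectra.

```python
# Cross-validated PRESS for two regression models and a paired Wilcoxon test
# on their per-sample squared errors.
import numpy as np
from sklearn.linear_model import LassoCV
from sklearn.cross_decomposition import PLSRegression
from sklearn.model_selection import KFold, cross_val_predict
from scipy.stats import wilcoxon

rng = np.random.default_rng(10)
n, p = 100, 500                                  # 100 samples, many spectral channels
X = rng.standard_normal((n, p))
y = X[:, :10] @ rng.standard_normal(10) + 0.5 * rng.standard_normal(n)   # e.g. wt% of an oxide

cv = KFold(5, shuffle=True, random_state=0)
pred_lasso = cross_val_predict(LassoCV(cv=3), X, y, cv=cv)
pred_pls = cross_val_predict(PLSRegression(n_components=5), X, y, cv=cv).ravel()

err_lasso = (y - pred_lasso) ** 2                # per-sample PRESS contributions
err_pls = (y - pred_pls) ** 2
print(err_lasso.sum(), err_pls.sum())            # PRESS for each model
print(wilcoxon(err_lasso, err_pls))              # paired significance test
```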
NASA Astrophysics Data System (ADS)
Musa, Omer; Weixuan, Li; Xiong, Chen; Lunkun, Gong; Wenhe, Liao
2018-07-01
A solid-fuel ramjet converts the thermal energy of combustion products into forward thrust without using any moving parts. Normally, it uses an air intake system to compress the incoming air without a swirler. A new design of swirler has been proposed and used in the current work. In this paper, a series of firing tests has been carried out to investigate the impact of using swirl flow on regression rate, combustion characteristics, and performance of solid-fuel ramjet engines. The influences of swirl intensity, solid fuel port diameter, and combustor length were studied and varied independently. A new technique for determining the time- and space-averaged regression rate of the high-density polyethylene solid fuel surface after the experiments has been proposed, based on laser scanning. A code has been developed to reconstruct the data from the scanner and then used to obtain the three-dimensional distribution of the regression rate. It is shown that increasing the swirl number increases the regression rate, thrust, and characteristic velocity, and decreases the air-fuel ratio, corner recirculation zone length, and specific impulse. Using swirl flow enhances flame stability but negatively affects the ignition process and specific impulse. A significant reduction of combustion chamber length can, however, be achieved when swirl flow is used. A power-law correlation for the average regression rate was developed, taking into account the influence of the swirl number. Furthermore, varying the port diameter and combustor length was found to influence the regression rate, combustion characteristics, and performance of the solid-fuel ramjet.
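The abstract does not give the functional form of the power-law correlation; the sketch below assumes a generic form r = a·G^n·(1 + S)^m (G oxidizer mass flux, S swirl number) fitted by log-linear least squares on synthetic data, purely to illustrate the fitting procedure.

```python
# Power-law regression-rate correlation fitted by log-linear least squares.
# The functional form and the data are assumptions for illustration.
import numpy as np

rng = np.random.default_rng(11)
G = rng.uniform(50, 300, 40)                   # oxidizer mass flux, kg/(m^2 s)  (synthetic)
S = rng.uniform(0.0, 1.5, 40)                  # swirl number (synthetic)
r = 0.02 * G**0.6 * (1 + S)**0.4 * np.exp(0.05 * rng.standard_normal(40))   # mm/s

A = np.column_stack([np.ones_like(G), np.log(G), np.log(1 + S)])
coef, *_ = np.linalg.lstsq(A, np.log(r), rcond=None)
a, n, m = np.exp(coef[0]), coef[1], coef[2]
print(a, n, m)                                 # fitted correlation parameters
```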
Evaluation of Deep Learning Representations of Spatial Storm Data
NASA Astrophysics Data System (ADS)
Gagne, D. J., II; Haupt, S. E.; Nychka, D. W.
2017-12-01
The spatial structure of a severe thunderstorm and its surrounding environment provide useful information about the potential for severe weather hazards, including tornadoes, hail, and high winds. Statistics computed over the area of a storm or from the pre-storm environment can provide descriptive information but fail to capture structural information. Because the storm environment is a complex, high-dimensional space, identifying methods to encode important spatial storm information in a low-dimensional form should aid analysis and prediction of storms by statistical and machine learning models. Principal component analysis (PCA), a more traditional approach, transforms high-dimensional data into a set of linearly uncorrelated, orthogonal components ordered by the amount of variance explained by each component. The burgeoning field of deep learning offers two potential approaches to this problem. Convolutional Neural Networks are a supervised learning method for transforming spatial data into a hierarchical set of feature maps that correspond with relevant combinations of spatial structures in the data. Generative Adversarial Networks (GANs) are an unsupervised deep learning model that uses two neural networks trained against each other to produce encoded representations of spatial data. These different spatial encoding methods were evaluated on the prediction of severe hail for a large set of storm patches extracted from the NCAR convection-allowing ensemble. Each storm patch contains information about storm structure and the near-storm environment. Logistic regression and random forest models were trained using the PCA and GAN encodings of the storm data and were compared against the predictions from a convolutional neural network. All methods showed skill over climatology at predicting the probability of severe hail. However, the verification scores among the methods were very similar and the predictions were highly correlated. Further evaluations are being performed to determine how the choice of input variables affects the results.
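One of the encoding-plus-classifier pipelines described above can be sketched by flattening storm patches, reducing them with PCA, and fitting logistic regression for the probability of severe hail; the patch construction, component count, and labels below are assumptions, not the NCAR ensemble data.

```python
# PCA encoding of spatial patches followed by logistic regression for a binary
# severe-hail label, evaluated with the area under the ROC curve.
import numpy as np
from sklearn.decomposition import PCA
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.metrics import roc_auc_score

rng = np.random.default_rng(12)
n_storms, ny, nx = 1000, 32, 32
yy, xx = np.mgrid[0:ny, 0:nx]
core = np.exp(-((yy - 16) ** 2 + (xx - 16) ** 2) / 40.0)        # storm core template
amp = rng.uniform(0.0, 3.0, n_storms)                           # core intensity per storm
patches = amp[:, None, None] * core + 0.5 * rng.standard_normal((n_storms, ny, nx))
labels = (amp + 0.3 * rng.standard_normal(n_storms) > 2.0).astype(int)   # severe hail yes/no

X = patches.reshape(n_storms, -1)
X_tr, X_te, y_tr, y_te = train_test_split(X, labels, random_state=0)

pca = PCA(n_components=20).fit(X_tr)
clf = LogisticRegression(max_iter=1000).fit(pca.transform(X_tr), y_tr)
probs = clf.predict_proba(pca.transform(X_te))[:, 1]
print(roc_auc_score(y_te, probs))                               # skill of the encoded model
```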
Pulmonary tumor measurements from x-ray computed tomography in one, two, and three dimensions.
Villemaire, Lauren; Owrangi, Amir M; Etemad-Rezai, Roya; Wilson, Laura; O'Riordan, Elaine; Keller, Harry; Driscoll, Brandon; Bauman, Glenn; Fenster, Aaron; Parraga, Grace
2011-11-01
We evaluated the accuracy and reproducibility of three-dimensional (3D) measurements of lung phantoms and patient tumors from x-ray computed tomography (CT) and compared these to one-dimensional (1D) and two-dimensional (2D) measurements. CT images of three spherical and three irregularly shaped tumor phantoms were evaluated by three observers who performed five repeated measurements. Additionally, three observers manually segmented 29 patient lung tumors five times each. Follow-up imaging was performed for 23 tumors and response criteria were compared. For a single subject, imaging was performed on nine occasions over 2 years to evaluate multidimensional tumor response. To evaluate measurement accuracy, we compared imaging measurements to ground truth using analysis of variance. For estimates of precision, intraobserver and interobserver coefficients of variation and intraclass correlations (ICC) were used. Linear regression and Pearson correlations were used to evaluate agreement and tumor response was descriptively compared. For spherical shaped phantoms, all measurements were highly accurate, but for irregularly shaped phantoms, only 3D measurements were in high agreement with ground truth measurements. All phantom and patient measurements showed high intra- and interobserver reproducibility (ICC >0.900). Over a 2-year period for a single patient, there was disagreement between tumor response classifications based on 3D measurements and those generated using 1D and 2D measurements. Tumor volume measurements were highly reproducible and accurate for irregular, spherical phantoms and patient tumors with nonuniform dimensions. Response classifications obtained from multidimensional measurements suggest that 3D measurements provide higher sensitivity to tumor response. Copyright © 2011 AUR. Published by Elsevier Inc. All rights reserved.
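A small sketch of one-, two-, and three-dimensional measurements taken from a binary tumor segmentation: longest in-plane diameter (1D), largest cross-sectional area (2D), and voxel-counted volume (3D). The ellipsoidal phantom and 1 mm isotropic voxels are assumptions for illustration.

```python
# 1D, 2D, and 3D measurements from a binary segmentation mask on a 1 mm grid.
import numpy as np
from scipy.spatial.distance import pdist

# synthetic ellipsoidal "tumor" mask
z, y, x = np.mgrid[-30:31, -30:31, -30:31].astype(float)
mask = (x / 15) ** 2 + (y / 10) ** 2 + (z / 8) ** 2 <= 1.0

volume_mm3 = mask.sum() * 1.0                      # 3D: voxel count times voxel volume

areas = mask.sum(axis=(1, 2))                      # in-plane area for each axial slice
area_mm2 = areas.max()                             # 2D: largest cross-sectional area

k = areas.argmax()                                 # 1D: longest diameter within that slice
ys, xs = np.nonzero(mask[k])
pts = np.column_stack([ys, xs]).astype(float)
diam_mm = pdist(pts).max()
print(diam_mm, area_mm2, volume_mm3)
```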
High-resolution proxies for wood density variations in Terminalia superba
De Ridder, Maaike; Van den Bulcke, Jan; Vansteenkiste, Dries; Van Loo, Denis; Dierick, Manuel; Masschaele, Bert; De Witte, Yoni; Mannes, David; Lehmann, Eberhard; Beeckman, Hans; Van Hoorebeke, Luc; Van Acker, Joris
2011-01-01
Background and Aims Density is a crucial variable in forest and wood science and is evaluated by a multitude of methods. Direct gravimetric methods are mostly destructive and time-consuming. Therefore, faster and semi- to non-destructive indirect methods have been developed. Methods Profiles of wood density variations with a resolution of approx. 50 µm were derived from one-dimensional resistance drillings, two-dimensional neutron scans, and three-dimensional neutron and X-ray scans. All methods were applied on Terminalia superba Engl. & Diels, an African pioneer species which sometimes exhibits a brown heart (limba noir). Key Results The use of X-ray tomography combined with a reference material permitted direct estimates of wood density. These X-ray-derived densities overestimated gravimetrically determined densities non-significantly and showed high correlation (linear regression, R2 = 0·995). When comparing X-ray densities with the attenuation coefficients of neutron scans and the amplitude of drilling resistance, a significant linear relation was found with the neutron attenuation coefficient (R2 = 0·986) yet a weak relation with drilling resistance (R2 = 0·243). When density patterns are compared, all three methods are capable of revealing the same trends. Differences are mainly due to the orientation of tree rings and the different characteristics of the indirect methods. Conclusions High-resolution X-ray computed tomography is a promising technique for research on wood cores and will be explored further on other temperate and tropical species. Further study on limba noir is necessary to reveal the causes of density variations and to determine how resistance drillings can be further refined. PMID:21131386
Ternès, Nils; Rotolo, Federico; Michiels, Stefan
2016-07-10
Correct selection of prognostic biomarkers among multiple candidates is becoming increasingly challenging as the dimensionality of biological data becomes higher. Therefore, minimizing the false discovery rate (FDR) is of primary importance, while a low false negative rate (FNR) is a complementary measure. The lasso is a popular selection method in Cox regression, but its results depend heavily on the penalty parameter λ. Usually, λ is chosen using maximum cross-validated log-likelihood (max-cvl). However, this method often has a very high FDR. We review methods for a more conservative choice of λ. We propose an empirical extension of the cvl by adding a penalization term, which trades off between the goodness-of-fit and the parsimony of the model, leading to the selection of fewer biomarkers and, as we show, to the reduction of the FDR without large increase in FNR. We conducted a simulation study considering null and moderately sparse alternative scenarios and compared our approach with the standard lasso and 10 other competitors: Akaike information criterion (AIC), corrected AIC, Bayesian information criterion (BIC), extended BIC, Hannan and Quinn information criterion (HQIC), risk information criterion (RIC), one-standard-error rule, adaptive lasso, stability selection, and percentile lasso. Our extension achieved the best compromise across all the scenarios between a reduction of the FDR and a limited rise in the FNR, followed by the AIC, the RIC, and the adaptive lasso, which performed well in some settings. We illustrate the methods using gene expression data of 523 breast cancer patients. In conclusion, we propose to apply our extension to the lasso whenever a stringent FDR with a limited FNR is targeted. Copyright © 2016 John Wiley & Sons, Ltd.
Gorban, A N; Mirkes, E M; Zinovyev, A
2016-12-01
Most machine learning approaches stem from applying the principle of minimizing the mean squared distance, which relies on computationally efficient quadratic optimization methods. However, when faced with high-dimensional and noisy data, quadratic error functionals demonstrate many weaknesses, including high sensitivity to contaminating factors and the curse of dimensionality. Therefore, many recent applications in machine learning have exploited properties of non-quadratic error functionals based on the L1 norm or even sub-linear potentials corresponding to quasinorms Lp (0
Seo, Sam; Lee, Chong Eun; Jeong, Jae Hoon; Park, Ki Ho; Kim, Dong Myung; Jeoung, Jin Wook
2017-03-11
To determine the influences of myopia and optic disc size on ganglion cell-inner plexiform layer (GCIPL) and peripapillary retinal nerve fiber layer (RNFL) thickness profiles obtained by spectral domain optical coherence tomography (OCT). One hundred and sixty-eight eyes of 168 young myopic subjects were recruited and assigned to one of three groups according to their spherical equivalent (SE) values and optic disc area. All underwent Cirrus HD-OCT imaging. The influences of myopia and optic disc size on the GCIPL and RNFL thickness profiles were evaluated by multiple comparisons and linear regression analysis. Three-dimensional surface plots of GCIPL and RNFL thickness corresponding to different combinations of myopia and optic disc size were constructed. Each of the quadrant RNFL thicknesses and their overall average were significantly thinner in high myopia compared to low myopia, except for the temporal quadrant (all P ≤ 0.003). The average and all-sectors GCIPL were significantly thinner in high myopia than in moderate- and/or low-myopia (all P ≤ 0.002). The average OCT RNFL thickness was correlated significantly with SE (0.81 μm/diopter, P < 0.001), axial length (-1.44 μm/mm, P < 0.001), and optic disc area (5.35 μm/mm², P < 0.001) by linear regression analysis. As for the OCT GCIPL parameters, average GCIPL thickness showed a significant correlation with SE (0.84 μm/diopter, P < 0.001) and axial length (-1.65 μm/mm, P < 0.001). There was no significant correlation of average GCIPL thickness with optic disc area. Three-dimensional curves showed that larger optic discs were associated with increased average RNFL thickness and that more-myopic eyes were associated with decreased average GCIPL and RNFL thickness. Myopia can significantly affect GCIPL and RNFL thickness profiles, and optic disc size has a significant influence on RNFL thickness. The current OCT maps employed in the evaluation of glaucoma should be analyzed in consideration of refractive status and optic disc size.
Bayesian kernel machine regression for estimating the health effects of multi-pollutant mixtures.
Bobb, Jennifer F; Valeri, Linda; Claus Henn, Birgit; Christiani, David C; Wright, Robert O; Mazumdar, Maitreyi; Godleski, John J; Coull, Brent A
2015-07-01
Because humans are invariably exposed to complex chemical mixtures, estimating the health effects of multi-pollutant exposures is of critical concern in environmental epidemiology, and to regulatory agencies such as the U.S. Environmental Protection Agency. However, most health effects studies focus on single agents or consider simple two-way interaction models, in part because we lack the statistical methodology to more realistically capture the complexity of mixed exposures. We introduce Bayesian kernel machine regression (BKMR) as a new approach to study mixtures, in which the health outcome is regressed on a flexible function of the mixture (e.g. air pollution or toxic waste) components that is specified using a kernel function. In high-dimensional settings, a novel hierarchical variable selection approach is incorporated to identify important mixture components and account for the correlated structure of the mixture. Simulation studies demonstrate the success of BKMR in estimating the exposure-response function and in identifying the individual components of the mixture responsible for health effects. We demonstrate the features of the method through epidemiology and toxicology applications. © The Author 2014. Published by Oxford University Press. All rights reserved. For permissions, please e-mail: journals.permissions@oup.com.
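As an illustration of the core idea behind kernel machine regression of a health outcome on a mixture, the sketch below fits a flexible (Gaussian-process) function of several exposure variables and then traces an exposure-response curve for one component while the others are held at their medians. This is only a scikit-learn stand-in on synthetic data; it omits the hierarchical variable selection and the full Bayesian inference that BKMR provides.

```python
# Sketch of the kernel-machine idea: outcome regressed on a flexible kernel
# function of mixture components. Synthetic data; not the BKMR implementation.
import numpy as np
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import RBF, WhiteKernel

rng = np.random.default_rng(0)
n = 200
Z = rng.standard_normal((n, 4))              # four pollutant exposures
# nonlinear, interacting exposure-response surface (simulated)
h = np.sin(Z[:, 0]) + 0.5 * Z[:, 1] ** 2 + 0.3 * Z[:, 0] * Z[:, 2]
y = h + 0.2 * rng.standard_normal(n)

kernel = 1.0 * RBF(length_scale=np.ones(4)) + WhiteKernel(noise_level=0.1)
gp = GaussianProcessRegressor(kernel=kernel, normalize_y=True).fit(Z, y)

# exposure-response curve for pollutant 0, others held at their medians
grid = np.linspace(-2, 2, 50)
Znew = np.tile(np.median(Z, axis=0), (50, 1))
Znew[:, 0] = grid
mean, sd = gp.predict(Znew, return_std=True)
print(mean[:5], sd[:5])
```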
A PDE approach for quantifying and visualizing tumor progression and regression
NASA Astrophysics Data System (ADS)
Sintay, Benjamin J.; Bourland, J. Daniel
2009-02-01
Quantification of changes in tumor shape and size gives physicians the ability to determine the effectiveness of various treatment options, adapt treatment, predict outcome, and map potential problem sites. Conventional methods are often based on metrics such as volume, diameter, or maximum cross-sectional area. This work seeks to improve the visualization and analysis of tumor changes by simultaneously analyzing changes in the entire tumor volume. This method utilizes an elliptic partial differential equation (PDE) to provide a roadmap of boundary displacement that does not suffer from the discontinuities associated with other measures such as Euclidean distance. Streamline pathways defined by Laplace's equation (a commonly used PDE) are used to track tumor progression and regression at the tumor boundary. Laplace's equation is particularly useful because it provides a smooth, continuous solution that can be evaluated with sub-pixel precision on variable grid sizes. Several metrics are demonstrated, including maximum, average, and total regression and progression. This method provides many advantages over conventional means of quantifying change in tumor shape because it is observer independent, stable for highly unusual geometries, and provides an analysis of the entire three-dimensional tumor volume.
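A minimal two-dimensional sketch of the underlying idea follows: Laplace's equation is solved between two nested contours (potential 0 on the baseline boundary, 1 on the follow-up boundary), and the gradient of the solution defines smooth streamlines along which boundary displacement can be measured. The circular masks, grid size, and Jacobi relaxation are illustrative simplifications of the paper's three-dimensional analysis.

```python
# Sketch: finite-difference Laplace solve between two nested "tumor" boundaries.
import numpy as np

N = 200
yy, xx = np.mgrid[0:N, 0:N]
r = np.hypot(xx - N / 2, yy - N / 2)
inner = r <= 30          # baseline tumor boundary region (potential 0)
outer = r >= 60          # follow-up tumor boundary and beyond (potential 1)
u = np.zeros((N, N))
u[outer] = 1.0

for _ in range(5000):                       # Jacobi relaxation
    u_new = 0.25 * (np.roll(u, 1, 0) + np.roll(u, -1, 0) +
                    np.roll(u, 1, 1) + np.roll(u, -1, 1))
    u_new[inner] = 0.0                      # re-impose boundary conditions
    u_new[outer] = 1.0
    u = u_new

gy, gx = np.gradient(u)                     # streamline directions ~ grad(u)
gap = ~inner & ~outer
print("potential range in the gap:", u[gap].min(), u[gap].max())
```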
NASA Astrophysics Data System (ADS)
Tang, Kunkun; Massa, Luca; Wang, Jonathan; Freund, Jonathan B.
2018-05-01
We introduce an efficient non-intrusive surrogate-based methodology for global sensitivity analysis and uncertainty quantification. Modified covariance-based sensitivity indices (mCov-SI) are defined for outputs that reflect correlated effects. The overall approach is applied to simulations of a complex plasma-coupled combustion system with disparate uncertain parameters in sub-models for chemical kinetics and a laser-induced breakdown ignition seed. The surrogate is based on an Analysis of Variance (ANOVA) expansion, as is widely used in statistics, with orthogonal polynomials representing the ANOVA subspaces and a polynomial dimensional decomposition (PDD) representing its multi-dimensional components. The coefficients of the PDD expansion are obtained using a least-squares regression, which both avoids the direct computation of high-dimensional integrals and affords an attractive flexibility in choosing sampling points. This facilitates importance sampling using a Bayesian calibrated posterior distribution, which is fast and thus particularly advantageous in common practical cases, such as our large-scale demonstration, for which the asymptotic convergence properties of polynomial expansions cannot be realized due to computational expense. Effort, instead, is focused on efficient finite-resolution sampling. Standard covariance-based sensitivity indices (Cov-SI) are employed to account for correlation of the uncertain parameters. The magnitude of Cov-SI is unfortunately unbounded, which can produce extremely large indices that limit their utility. The mCov-SI are therefore proposed in order to bound this magnitude to [0, 1]. The polynomial expansion is coupled with an adaptive ANOVA strategy to provide an accurate surrogate as the union of several low-dimensional spaces, avoiding the typical computational cost of a high-dimensional expansion. It is also adaptively simplified according to the relative contribution of the different polynomials to the total variance. The approach is demonstrated for a laser-induced turbulent combustion simulation model, which includes parameters with correlated effects.
Three-Dimensional City Determinants of the Urban Heat Island: A Statistical Approach
NASA Astrophysics Data System (ADS)
Chun, Bum Seok
There is no doubt that the Urban Heat Island (UHI) is a mounting problem in built-up environments, due to the energy retention by the surface materials of dense buildings, leading to increased temperatures, air pollution, and energy consumption. Much of the earlier research on the UHI has used two-dimensional (2-D) information, such as land uses and the distribution of vegetation. In the case of homogeneous land uses, it is possible to predict surface temperatures with reasonable accuracy with 2-D information. However, three-dimensional (3-D) information is necessary to analyze more complex sites, including dense building clusters. Recent research on the UHI has started to consider multi-dimensional models. The purpose of this research is to explore the urban determinants of the UHI, using 2-D/3-D urban information with statistical modeling. The research includes the following stages: (a) estimating urban temperature, using satellite images, (b) developing a 3-D city model by LiDAR data, (c) generating geometric parameters with regard to 2-/3-D geospatial information, and (d) conducting different statistical analyses: OLS and spatial regressions. The research area is part of the City of Columbus, Ohio. To effectively and systematically analyze the UHI, hierarchical grid scales (480m, 240m, 120m, 60m, and 30m) are proposed, together with linear and the log-linear regression models. The non-linear OLS models with Log(AST) as dependent variable have the highest R2 among all the OLS-estimated models. However, both SAR and GSM models are estimated for the 480m, 240m, 120m, and 60m grids to reduce their spatial dependency. Most GSM models have R2s higher than 0.9, except for the 240m grid. Overall, the urban characteristics having high impacts in all grids are embodied in solar radiation, 3-D open space, greenery, and water streams. These results demonstrate that it is possible to mitigate the UHI, providing guidelines for policies aiming to reduce the UHI.
Nishii, Takashi; Genkawa, Takuma; Watari, Masahiro; Ozaki, Yukihiro
2012-01-01
A new selection procedure of an informative near-infrared (NIR) region for regression model building is proposed that uses an online NIR/mid-infrared (mid-IR) dual-region spectrometer in conjunction with two-dimensional (2D) NIR/mid-IR heterospectral correlation spectroscopy. In this procedure, both NIR and mid-IR spectra of a liquid sample are acquired sequentially during a reaction process using the NIR/mid-IR dual-region spectrometer; the 2D NIR/mid-IR heterospectral correlation spectrum is subsequently calculated from the obtained spectral data set. From the calculated 2D spectrum, a NIR region is selected that includes bands of high positive correlation intensity with mid-IR bands assigned to the analyte, and used for the construction of a regression model. To evaluate the performance of this procedure, a partial least-squares (PLS) regression model of the ethanol concentration in a fermentation process was constructed. During fermentation, NIR/mid-IR spectra in the 10000 - 1200 cm(-1) region were acquired every 3 min, and a 2D NIR/mid-IR heterospectral correlation spectrum was calculated to investigate the correlation intensity between the NIR and mid-IR bands. NIR regions that include bands at 4343, 4416, 5778, 5904, and 5955 cm(-1), which result from the combinations and overtones of the C-H group of ethanol, were selected for use in the PLS regression models, by taking the correlation intensity of a mid-IR band at 2985 cm(-1) arising from the CH(3) asymmetric stretching vibration mode of ethanol as a reference. The predicted results indicate that the ethanol concentrations calculated from the PLS regression models fit well to those obtained by high-performance liquid chromatography. Thus, it can be concluded that the selection procedure using the NIR/mid-IR dual-region spectrometer combined with 2D NIR/mid-IR heterospectral correlation spectroscopy is a powerful method for the construction of a reliable regression model.
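The sketch below illustrates the band-selection step described above: a synchronous 2D NIR/mid-IR heterospectral correlation matrix is computed from spectra recorded along a reaction, the NIR variables most positively correlated with a chosen mid-IR analyte band are selected, and a PLS regression is built on that NIR region only. All spectra, wavenumber grids, and the reference band index are synthetic placeholders rather than the fermentation data of the study.

```python
# Sketch: synchronous 2D NIR/mid-IR correlation map + PLSR on the selected NIR region.
import numpy as np
from sklearn.cross_decomposition import PLSRegression

rng = np.random.default_rng(0)
m, n_nir, n_ir = 40, 300, 200                # observations x spectral variables
conc = np.linspace(0.0, 1.0, m)              # analyte concentration profile
nir = np.outer(conc, rng.random(n_nir)) + 0.01 * rng.standard_normal((m, n_nir))
ir = np.outer(conc, rng.random(n_ir)) + 0.01 * rng.standard_normal((m, n_ir))

nir_c = nir - nir.mean(axis=0)               # dynamic (mean-centred) spectra
ir_c = ir - ir.mean(axis=0)
sync = nir_c.T @ ir_c / (m - 1)              # synchronous heterospectral correlation

ref_band = 120                               # hypothetical analyte mid-IR band index
selected = np.argsort(sync[:, ref_band])[-20:]   # 20 most correlated NIR variables

pls = PLSRegression(n_components=3).fit(nir[:, selected], conc)
print("R^2 on the selected NIR region:", pls.score(nir[:, selected], conc))
```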
Hyperspectral Super-Resolution of Locally Low Rank Images From Complementary Multisource Data.
Veganzones, Miguel A; Simoes, Miguel; Licciardi, Giorgio; Yokoya, Naoto; Bioucas-Dias, Jose M; Chanussot, Jocelyn
2016-01-01
Remote sensing hyperspectral images (HSIs) are quite often low rank, in the sense that the data belong to a low dimensional subspace/manifold. This has been recently exploited for the fusion of low spatial resolution HSI with high spatial resolution multispectral images in order to obtain super-resolution HSI. Most approaches adopt an unmixing or a matrix factorization perspective. The derived methods have led to state-of-the-art results when the spectral information lies in a low-dimensional subspace/manifold. However, if the subspace/manifold dimensionality spanned by the complete data set is large, i.e., larger than the number of multispectral bands, the performance of these methods decreases, mainly because the underlying sparse regression problem is severely ill-posed. In this paper, we propose a local approach to cope with this difficulty. Fundamentally, we exploit the fact that real-world HSIs are locally low rank, that is, pixels acquired from a given spatial neighborhood span a very low-dimensional subspace/manifold, i.e., lower than or equal to the number of multispectral bands. Thus, we propose to partition the image into patches and solve the data fusion problem independently for each patch. This way, in each patch the subspace/manifold dimensionality is low enough, such that the problem is not ill-posed anymore. We propose two alternative approaches to define the hyperspectral super-resolution through local dictionary learning using endmember induction algorithms. We also explore two alternatives to define the local regions, using sliding windows and binary partition trees. The effectiveness of the proposed approaches is illustrated with synthetic and semi-real data.
Parsimony and goodness-of-fit in multi-dimensional NMR inversion
NASA Astrophysics Data System (ADS)
Babak, Petro; Kryuchkov, Sergey; Kantzas, Apostolos
2017-01-01
Multi-dimensional nuclear magnetic resonance (NMR) experiments are often used for study of molecular structure and dynamics of matter in core analysis and reservoir evaluation. Industrial applications of multi-dimensional NMR involve a high-dimensional measurement dataset with complicated correlation structure and require rapid and stable inversion algorithms from the time domain to the relaxation rate and/or diffusion domains. In practice, applying existing inverse algorithms with a large number of parameter values leads to an infinite number of solutions with a reasonable fit to the NMR data. The interpretation of such variability of multiple solutions and selection of the most appropriate solution could be a very complex problem. In most cases the characteristics of materials have sparse signatures, and investigators would like to distinguish the most significant relaxation and diffusion values of the materials. To produce an easy to interpret and unique NMR distribution with the finite number of the principal parameter values, we introduce a new method for NMR inversion. The method is constructed based on the trade-off between the conventional goodness-of-fit approach to multivariate data and the principle of parsimony guaranteeing inversion with the least number of parameter values. We suggest performing the inversion of NMR data using the forward stepwise regression selection algorithm. To account for the trade-off between goodness-of-fit and parsimony, the objective function is selected based on Akaike Information Criterion (AIC). The performance of the developed multi-dimensional NMR inversion method and its comparison with conventional methods are illustrated using real data for samples with bitumen, water and clay.
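The following is a small sketch of the parsimony idea described above in one dimension: the decay signal is represented as a non-negative combination of a few exponential components chosen by forward stepwise selection, with AIC = 2k + n ln(RSS/n) trading off goodness-of-fit against the number of relaxation values. The acquisition times, T2 grid, and two-component "sample" are synthetic placeholders, and the sketch is not the authors' multi-dimensional implementation.

```python
# Sketch: forward stepwise NNLS inversion with AIC as the parsimony criterion.
import numpy as np
from scipy.optimize import nnls

rng = np.random.default_rng(0)
t = np.linspace(0.001, 1.0, 400)                       # acquisition times (s)
T2_grid = np.logspace(-3, 0, 60)                       # candidate relaxation times
D = np.exp(-np.outer(t, 1.0 / T2_grid))                # dictionary of decays
y = 0.7 * np.exp(-t / 0.05) + 0.3 * np.exp(-t / 0.4)   # two-component "sample"
y = y + 0.005 * rng.standard_normal(t.size)

def aic(active):
    amp, _ = nnls(D[:, active], y)
    rss = np.sum((y - D[:, active] @ amp) ** 2)
    return 2 * len(active) + t.size * np.log(rss / t.size)

active, best = [], np.inf
while True:
    candidates = [j for j in range(T2_grid.size) if j not in active]
    score, j = min((aic(active + [j]), j) for j in candidates)
    if score >= best:                                   # no AIC improvement: stop
        break
    best, active = score, active + [j]

print("selected T2 values (s):", np.sort(T2_grid[active]))
```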
Barton, Alan J; Valdés, Julio J; Orchard, Robert
2009-01-01
Classical neural networks are composed of neurons whose nature is determined by a certain function (the neuron model), usually pre-specified. In this paper, a type of neural network (NN-GP) is presented in which: (i) each neuron may have its own neuron model in the form of a general function, (ii) any layout (i.e., network interconnection) is possible, and (iii) no bias nodes or weights are associated with the connections, neurons, or layers. The general functions associated with a neuron are learned by searching a function space. They are not provided a priori, but are rather built as part of an Evolutionary Computation process based on Genetic Programming. The resulting network solutions are evaluated based on a fitness measure, which may, for example, be based on classification or regression errors. Two real-world examples are presented to illustrate the promising behaviour on classification problems via construction of a low-dimensional representation of a high-dimensional parameter space associated with the set of all network solutions.
2013-09-01
[Figure captions only: two-dimensional domains cropped out of three-dimensional numerically generated realizations (3D PCE-NAPL realizations generated by UTCHEM); absolute vs. relative error scatter plots of pM and gM from the SGS and TP/MC data sets using multi-task manifold regression.]
Kinoshita, Hidefumi; Nakagawa, Ken; Usui, Yukio; Iwamura, Masatsugu; Ito, Akihiro; Miyajima, Akira; Hoshi, Akio; Arai, Yoichi; Baba, Shiro; Matsuda, Tadashi
2015-08-01
Three-dimensional (3D) imaging systems have been introduced worldwide for surgical instrumentation. A difficulty of laparoscopic surgery is the need to convert two-dimensional (2D) images into 3D perception and to rearrange depth perception. 3D imaging may remove the need for depth perception rearrangement and therefore have clinical benefits. We conducted a multicenter, open-label, randomized trial to compare the surgical outcome of 3D-high-definition (HD) resolution and 2D-HD imaging in laparoscopic radical prostatectomy (LRP), in order to determine whether an LRP under HD resolution 3D imaging is superior to that under HD resolution 2D imaging in perioperative outcome, feasibility, and fatigue. One hundred twenty-two patients were randomly assigned to a 2D or 3D group. The primary outcome was the time to perform vesicourethral anastomosis (VUA), which is technically demanding and encompasses many of the technical difficulties encountered in laparoscopic surgery. VUA time was not significantly shorter in the 3D group (26.7 min, mean) compared with the 2D group (30.1 min, mean) (p = 0.11, Student's t test). However, experienced surgeons and 3D-HD imaging were independent predictors of shorter VUA times (p < 0.001 and p = 0.014, multivariate logistic regression analysis). Total pneumoperitoneum time was not different. No conversion from 3D to 2D imaging or from LRP to open RP was observed. Fatigue was evaluated by a simulation sickness questionnaire and critical flicker frequency. Results were not different between the two groups. Subjective feasibility and satisfaction scores were significantly higher in the 3D group. Using a 3D imaging system in LRP may have only limited advantages in decreasing operation times over 2D imaging systems. However, the 3D system increased surgical feasibility and decreased surgeons' effort levels without inducing significant fatigue.
Tian, Xinyu; Wang, Xuefeng; Chen, Jun
2014-01-01
The classic multinomial logit model, commonly used in multiclass regression problems, is restricted to a few predictors and does not take into account the relationship among variables. It has limited use for genomic data, where the number of genomic features far exceeds the sample size. Genomic features such as gene expressions are usually related by an underlying biological network. Efficient use of the network information is important to improve classification performance as well as the biological interpretability. We proposed a multinomial logit model that is capable of addressing both the high dimensionality of predictors and the underlying network information. Group lasso was used to induce model sparsity, and a network constraint was imposed to induce the smoothness of the coefficients with respect to the underlying network structure. To deal with the non-smoothness of the objective function in optimization, we developed a proximal gradient algorithm for efficient computation. The proposed model was compared to models with no prior structure information in both simulations and a problem of cancer subtype prediction with real TCGA (the cancer genome atlas) gene expression data. The network-constrained model outperformed the traditional ones in both cases.
NASA Astrophysics Data System (ADS)
Hut, Rolf; Amisigo, Barnabas A.; Steele-Dunne, Susan; van de Giesen, Nick
2015-12-01
Reduction of Used Memory Ensemble Kalman Filtering (RumEnKF) is introduced as a variant on the Ensemble Kalman Filter (EnKF). RumEnKF differs from EnKF in that it does not store the entire ensemble, but rather only saves the first two moments of the ensemble distribution. In this way, the number of ensemble members that can be calculated is less dependent on available memory, and mainly on available computing power (CPU). RumEnKF is developed to make optimal use of current generation super computer architecture, where the number of available floating point operations (flops) increases more rapidly than the available memory and where inter-node communication can quickly become a bottleneck. RumEnKF reduces the used memory compared to the EnKF when the number of ensemble members is greater than half the number of state variables. In this paper, three simple models are used (auto-regressive, low dimensional Lorenz and high dimensional Lorenz) to show that RumEnKF performs similarly to the EnKF. Furthermore, it is also shown that increasing the ensemble size has a similar impact on the estimation error from the three algorithms.
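The sketch below illustrates only the memory-saving bookkeeping that motivates RumEnKF: ensemble members are generated one at a time and only the running mean and covariance of the state are retained, so the full ensemble never has to be stored. The stochastic model, state size, and ensemble size are hypothetical, and the complete filter update step is omitted.

```python
# Sketch: streaming (one-pass) accumulation of ensemble mean and covariance,
# so only the first two moments are kept in memory, not the whole ensemble.
import numpy as np

def model_step(x, rng):
    """Hypothetical stochastic model propagating one ensemble member."""
    return 0.95 * x + 0.1 * rng.standard_normal(x.size)

n_state, n_members = 1000, 50
rng = np.random.default_rng(0)
x0 = rng.standard_normal(n_state)

mean = np.zeros(n_state)
cov = np.zeros((n_state, n_state))
for k in range(1, n_members + 1):
    member = model_step(x0 + 0.01 * rng.standard_normal(n_state), rng)
    delta = member - mean                    # Welford-style streaming update
    mean += delta / k
    cov += np.outer(delta, member - mean)    # accumulates (n-1) * covariance

cov /= (n_members - 1)
print("ensemble mean norm:", np.linalg.norm(mean),
      "trace of covariance:", np.trace(cov))
```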
PRIM versus CART in subgroup discovery: when patience is harmful.
Abu-Hanna, Ameen; Nannings, Barry; Dongelmans, Dave; Hasman, Arie
2010-10-01
We systematically compare the established algorithms CART (Classification and Regression Trees) and PRIM (Patient Rule Induction Method) in a subgroup discovery task on a large real-world high-dimensional clinical database. Contrary to current conjectures, PRIM's performance was generally inferior to CART's. PRIM often considered "peeling off" a large chunk of data at a value of a relevant discrete ordinal variable unattractive, ultimately missing an important subgroup. This finding has considerable significance in clinical medicine where ordinal scores are ubiquitous. PRIM's utility in clinical databases would increase when global information about (ordinal) variables is better put to use and when the search algorithm keeps track of alternative solutions.
NASA Astrophysics Data System (ADS)
Lespinats, S.; Meyer-Bäse, Anke; He, Huan; Marshall, Alan G.; Conrad, Charles A.; Emmett, Mark R.
2009-05-01
Partial Least Square Regression (PLSR) and Data-Driven High Dimensional Scaling (DD-HDS) are employed for the prediction and the visualization of changes in polar lipid expression induced by different combinations of wild-type (wt) p53 gene therapy and SN38 chemotherapy of U87 MG glioblastoma cells. A very detailed analysis of the gangliosides reveals that certain gangliosides of GM3 or GD1-type have unique properties not shared by the others. In summary, this preliminary work shows that data mining techniques are able to determine the modulation of gangliosides by different treatment combinations.
Predicting arsenic in drinking water wells of the Central Valley, California
Ayotte, Joseph; Nolan, Bernard T.; Gronberg, JoAnn M.
2016-01-01
Probabilities of arsenic in groundwater at depths used for domestic and public supply in the Central Valley of California are predicted using weak-learner ensemble models (boosted regression trees, BRT) and more traditional linear models (logistic regression, LR). Both methods captured major processes that affect arsenic concentrations, such as the chemical evolution of groundwater, redox differences, and the influence of aquifer geochemistry. Inferred flow-path length was the most important variable but near-surface-aquifer geochemical data also were significant. A unique feature of this study was that previously predicted nitrate concentrations in three dimensions were themselves predictive of arsenic and indicated an important redox effect at >10 μg/L, indicating low arsenic where nitrate was high. Additionally, a variable representing three-dimensional aquifer texture from the Central Valley Hydrologic Model was an important predictor, indicating high arsenic associated with fine-grained aquifer sediment. BRT outperformed LR at the 5 μg/L threshold in all five predictive performance measures and at 10 μg/L in four out of five measures. BRT yielded higher prediction sensitivity (39%) than LR (18%) at the 10 μg/L threshold–a useful outcome because a major objective of the modeling was to improve our ability to predict high arsenic areas.
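As an illustration of the model comparison described above, the sketch below fits boosted trees and logistic regression to predict exceedance of a concentration threshold and reports ROC AUC and sensitivity for each. The predictors and labels are synthetic placeholders, not the Central Valley dataset.

```python
# Sketch: boosted regression trees (BRT) vs. logistic regression (LR)
# for threshold-exceedance prediction, compared by AUC and sensitivity.
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.metrics import roc_auc_score

X, y = make_classification(n_samples=2000, n_features=15, n_informative=6,
                           weights=[0.85, 0.15], random_state=0)
Xtr, Xte, ytr, yte = train_test_split(X, y, test_size=0.3, random_state=0,
                                      stratify=y)

for name, model in [("BRT", GradientBoostingClassifier(random_state=0)),
                    ("LR", LogisticRegression(max_iter=1000))]:
    prob = model.fit(Xtr, ytr).predict_proba(Xte)[:, 1]
    pred = (prob >= 0.5).astype(int)
    sens = np.sum((pred == 1) & (yte == 1)) / np.sum(yte == 1)
    print(name, "AUC:", round(roc_auc_score(yte, prob), 3),
          "sensitivity:", round(sens, 3))
```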
Generative Topographic Mapping of Conformational Space.
Horvath, Dragos; Baskin, Igor; Marcou, Gilles; Varnek, Alexandre
2017-10-01
Herein, Generative Topographic Mapping (GTM) was challenged to produce planar projections of the high-dimensional conformational space of complex molecules (the 1LE1 peptide). GTM is a probability-based mapping strategy, and its capacity to support property prediction models serves to objectively assess map quality (in terms of regression statistics). The properties to predict were total, non-bonded and contact energies, surface area and fingerprint darkness. Map building and selection were controlled by a previously introduced evolutionary strategy allowed to choose the best-suited conformational descriptors, with options including classical terms and novel atom-centric autocorrelograms. The latter condense interatomic distance patterns into descriptors of rather low dimensionality, yet precise enough to differentiate between close favorable contacts and atom clashes. A subset of 20 K conformers of the 1LE1 peptide, randomly selected from a pool of 2 M geometries (generated by the S4MPLE tool), was employed for map building and cross-validation of property regression models. The GTM build-up challenge reached robust three-fold cross-validated determination coefficients of Q2 = 0.7-0.8 for all modeled properties. Mapping of the full 2 M conformer set produced intuitive and information-rich property landscapes. Functional and folding subspaces appear as well-separated zones, even though RMSD with respect to the PDB structure was never used as a selection criterion for the maps. © 2017 Wiley-VCH Verlag GmbH & Co. KGaA, Weinheim.
How to predict the sugariness and hardness of melons: A near-infrared hyperspectral imaging method.
Sun, Meijun; Zhang, Dong; Liu, Li; Wang, Zheng
2017-03-01
Hyperspectral imaging (HSI) in the near-infrared (NIR) region (900-1700nm) was used for non-intrusive quality measurements (of sweetness and texture) in melons. First, HSI data from melon samples were acquired to extract the spectral signatures. The corresponding sample sweetness and hardness values were recorded using traditional intrusive methods. Partial least squares regression (PLSR), principal component analysis (PCA), support vector machine (SVM), and artificial neural network (ANN) models were created to predict melon sweetness and hardness values from the hyperspectral data. Experimental results for the three types of melons show that PLSR produces the most accurate results. To reduce the high dimensionality of the hyperspectral data, the weighted regression coefficients of the resulting PLSR models were used to identify the most important wavelengths. On the basis of these wavelengths, each image pixel was used to visualize the sweetness and hardness in all the portions of each sample. Copyright © 2016 Elsevier Ltd. All rights reserved.
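A small sketch of the wavelength-selection and mapping workflow described above follows: a PLS regression of sweetness on full spectra is fitted, wavelengths are ranked by the magnitude of the regression coefficients, the model is refitted on the top wavelengths, and the reduced model is applied pixel-wise to a hyperspectral cube. The spectra, sweetness values, and image cube here are synthetic placeholders, not the melon data.

```python
# Sketch: PLSR wavelength selection via regression coefficients + pixel-wise mapping.
import numpy as np
from sklearn.cross_decomposition import PLSRegression

rng = np.random.default_rng(0)
n_samples, n_bands = 80, 200
spectra = rng.random((n_samples, n_bands))
sweetness = (spectra[:, 50] * 3 + spectra[:, 120] * 2
             + 0.1 * rng.standard_normal(n_samples))

pls_full = PLSRegression(n_components=5).fit(spectra, sweetness)
coefs = np.ravel(pls_full.coef_)                    # one coefficient per band
top_bands = np.argsort(np.abs(coefs))[-10:]         # most informative wavelengths

pls_small = PLSRegression(n_components=3).fit(spectra[:, top_bands], sweetness)

cube = rng.random((64, 64, n_bands))                # hypothetical hyperspectral image
pixels = cube.reshape(-1, n_bands)[:, top_bands]
sweetness_map = pls_small.predict(pixels).reshape(64, 64)
print("predicted sweetness range:", sweetness_map.min(), sweetness_map.max())
```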
Kim, Sang-Rok; Lee, Kyung-Min; Cho, Jin-Hyoung; Hwang, Hyeon-Shik
2016-04-01
An anatomical relationship between the hard and soft tissues of the face is mandatory for facial reconstruction. The purpose of this study was to investigate the positions of the eyeball and canthi three-dimensionally from the relationships between the facial hard and soft tissues using cone-beam computed tomography (CBCT). CBCT scan data of 100 living subjects were used to obtain the measurements of facial hard and soft tissues. Stepwise multiple regression analyses were carried out using the hard tissue measurements in the orbit, nasal bone, nasal cavity and maxillary canine to predict the most probable positions of the eyeball and canthi within the orbit. Orbital width, orbital height, and orbital depth were strong predictors of the eyeball and canthi position. Intercanine width was also a predictor of the mediolateral position of the eyeball. Statistically significant regression models for the positions of the eyeball and canthi could be derived from the measurements of orbit and maxillary canine. These results suggest that CBCT data can be useful in predicting the positions of the eyeball and canthi three-dimensionally. Copyright © 2016 Elsevier Ireland Ltd. All rights reserved.
NASA Astrophysics Data System (ADS)
Guan, Yafu; Yang, Shuo; Zhang, Dong H.
2018-04-01
Gaussian process regression (GPR) is an efficient non-parametric method for constructing multi-dimensional potential energy surfaces (PESs) for polyatomic molecules. Since not only the posterior mean but also the posterior variance can be easily calculated, GPR provides a well-established model for active learning, through which PESs can be constructed more efficiently and accurately. We propose a strategy of active data selection for the construction of PESs with emphasis on low-energy regions. The validity of this strategy is verified through the three-dimensional (3D) example of H3. The PESs for two prototypical reactive systems, namely the H + H2O ↔ H2 + OH reaction and the H + CH4 ↔ H2 + CH3 reaction, are reconstructed. Only 920 and 4000 points are assembled to reconstruct these two PESs, respectively. The accuracy of the GP PESs is not only tested by energy errors but also validated by quantum scattering calculations.
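The sketch below shows the active-learning loop in its simplest form: fit a Gaussian process to an initial set of geometries and energies, then repeatedly add the candidate point with the largest posterior standard deviation, restricted to the lower-energy half of the predictions to mimic the emphasis on low-energy regions. The 1D "potential" is a toy stand-in for a molecular PES, and scikit-learn is used for illustration rather than the authors' implementation.

```python
# Sketch: GPR active data selection using the posterior standard deviation.
import numpy as np
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import RBF

def energy(x):                                    # toy 1D potential energy curve
    return (x ** 2 - 1.0) ** 2 + 0.1 * x

rng = np.random.default_rng(0)
pool = np.linspace(-2, 2, 400)[:, None]           # candidate geometries
X = rng.uniform(-2, 2, size=(5, 1))               # small initial training set
y = energy(X).ravel()

gp = GaussianProcessRegressor(kernel=1.0 * RBF(length_scale=0.5),
                              normalize_y=True)
for _ in range(20):                               # active data selection
    gp.fit(X, y)
    mean, std = gp.predict(pool, return_std=True)
    low_energy = mean <= np.quantile(mean, 0.5)   # emphasize low-energy regions
    std_masked = np.where(low_energy, std, -np.inf)
    xnew = pool[int(np.argmax(std_masked))]
    X = np.vstack([X, xnew[None, :]])
    y = np.append(y, energy(xnew[0]))

print("final training-set size:", X.shape[0])
```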
Analytical three-point Dixon method: With applications for spiral water-fat imaging.
Wang, Dinghui; Zwart, Nicholas R; Li, Zhiqiang; Schär, Michael; Pipe, James G
2016-02-01
The goal of this work is to present a new three-point analytical approach with flexible even or uneven echo increments for water-fat separation and to evaluate its feasibility with spiral imaging. Two sets of possible solutions of water and fat are first found analytically. Then, two field maps of the B0 inhomogeneity are obtained by linear regression. The initial identification of the true solution is facilitated by the root-mean-square error of the linear regression and the incorporation of a fat spectrum model. The resolved field map after a region-growing algorithm is refined iteratively for spiral imaging. The final water and fat images are recalculated using a joint water-fat separation and deblurring algorithm. Successful implementations were demonstrated with three-dimensional gradient-echo head imaging and single breathhold abdominal imaging. Spiral, high-resolution T1 -weighted brain images were shown with comparable sharpness to the reference Cartesian images. With appropriate choices of uneven echo increments, it is feasible to resolve the aliasing of the field map voxel-wise. High-quality water-fat spiral imaging can be achieved with the proposed approach. © 2015 Wiley Periodicals, Inc.
Regression modeling of ground-water flow
Cooley, R.L.; Naff, R.L.
1985-01-01
Nonlinear multiple regression methods are developed to model and analyze groundwater flow systems. Complete descriptions of regression methodology as applied to groundwater flow models allow scientists and engineers engaged in flow modeling to apply the methods to a wide range of problems. Organization of the text proceeds from an introduction that discusses the general topic of groundwater flow modeling, to a review of basic statistics necessary to properly apply regression techniques, and then to the main topic: exposition and use of linear and nonlinear regression to model groundwater flow. Statistical procedures are given to analyze and use the regression models. A number of exercises and answers are included to exercise the student on nearly all the methods that are presented for modeling and statistical analysis. Three computer programs implement the more complex methods. These three are a general two-dimensional, steady-state regression model for flow in an anisotropic, heterogeneous porous medium, a program to calculate a measure of model nonlinearity with respect to the regression parameters, and a program to analyze model errors in computed dependent variables such as hydraulic head. (USGS)
NASA Astrophysics Data System (ADS)
Kiss, I.; Cioată, V. G.; Alexa, V.; Raţiu, S. A.
2017-05-01
The braking system is one of the most important and complex subsystems of railway vehicles, especially when it comes to safety. Therefore, installing efficient, safe brakes on modern railway vehicles is essential. Nowadays, attention is devoted to solving problems connected with the use of high-performance brake materials and their impact on the thermal and mechanical loading of railway wheels. The main factor that influences the selection of a friction material for railway applications is the performance criterion, because the interaction between the brake block and the wheel produces complex thermo-mechanical phenomena. In this work, the investigated subjects are the cast-iron brake shoes, which are still widely used on freight wagons. Therefore, the cast-iron brake shoes - with lamellar graphite and with a high content of phosphorus (0.8-1.1%) - need special investigation. In order to establish the optimal conditions for the cast-iron brake shoes, we propose a mathematical modelling study using statistical analysis and multiple regression equations. Multivariate research is important in cast-iron brake shoe manufacturing, because many variables interact with each other simultaneously. Multivariate visualization comes to the fore when researchers have difficulties in comprehending many dimensions at one time. Technological data (hardness and chemical composition) obtained from cast-iron brake shoes were used for this purpose. In order to establish the multiple correlation between the hardness of the cast-iron brake shoes and the chemical composition elements, several types of regression equation models have been proposed. Because a three-dimensional surface with variables on three axes is a common way to illustrate multivariate data, in which the maximum and minimum values are easily highlighted, we plotted graphical representations of the regression equations in order to explain the interaction of the variables and locate the optimal level of each variable for maximal response. For the calculation of the regression coefficients, dispersion and correlation coefficients, the software Matlab was used.
Soil salt content estimation in the Yellow River delta with satellite hyperspectral data
Weng, Yongling; Gong, Peng; Zhu, Zhi-Liang
2008-01-01
Soil salinization is one of the most common land degradation processes and is a severe environmental hazard. The primary objective of this study is to investigate the potential of predicting salt content in soils with hyperspectral data acquired with EO-1 Hyperion. Both partial least-squares regression (PLSR) and conventional multiple linear regression (MLR), such as stepwise regression (SWR), were tested as the prediction model. PLSR is commonly used to overcome the problem caused by high-dimensional and correlated predictors. Chemical analysis of 95 samples collected from the top layer of soils in the Yellow River delta area shows that salt content was high on average, and the dominant chemicals in the saline soil were NaCl and MgCl2. Multivariate models were established between soil contents and hyperspectral data. Our results indicate that the PLSR technique with laboratory spectral data has a strong prediction capacity. Spectral bands at 1487-1527, 1971-1991, 2032-2092, and 2163-2355 nm possessed large absolute values of regression coefficients, with the largest coefficient at 2203 nm. We obtained a root mean squared error (RMSE) for calibration (with 61 samples) of RMSEC = 0.753 (R2 = 0.893) and a root mean squared error for validation (with 30 samples) of RMSEV = 0.574. The prediction model was applied on a pixel-by-pixel basis to a Hyperion reflectance image to yield a quantitative surface distribution map of soil salt content. The result was validated successfully from 38 sampling points. We obtained an RMSE estimate of 1.037 (R2 = 0.784) for the soil salt content map derived by the PLSR model. The salinity map derived from the SWR model shows that the predicted value is higher than the true value. These results demonstrate that the PLSR method is a more suitable technique than stepwise regression for quantitative estimation of soil salt content in a large area. © 2008 CASI.
Foundational Principles for Large-Scale Inference: Illustrations Through Correlation Mining.
Hero, Alfred O; Rajaratnam, Bala
2016-01-01
When can reliable inference be drawn in the "Big Data" context? This paper presents a framework for answering this fundamental question in the context of correlation mining, with implications for general large-scale inference. In large-scale data applications like genomics, connectomics, and eco-informatics the dataset is often variable-rich but sample-starved: a regime where the number n of acquired samples (statistical replicates) is far fewer than the number p of observed variables (genes, neurons, voxels, or chemical constituents). Much of recent work has focused on understanding the computational complexity of proposed methods for "Big Data". Sample complexity however has received relatively less attention, especially in the setting when the sample size n is fixed, and the dimension p grows without bound. To address this gap, we develop a unified statistical framework that explicitly quantifies the sample complexity of various inferential tasks. Sampling regimes can be divided into several categories: 1) the classical asymptotic regime where the variable dimension is fixed and the sample size goes to infinity; 2) the mixed asymptotic regime where both variable dimension and sample size go to infinity at comparable rates; 3) the purely high dimensional asymptotic regime where the variable dimension goes to infinity and the sample size is fixed. Each regime has its niche but only the latter regime applies to exascale data dimension. We illustrate this high dimensional framework for the problem of correlation mining, where it is the matrix of pairwise and partial correlations among the variables that are of interest. Correlation mining arises in numerous applications and subsumes the regression context as a special case. We demonstrate various regimes of correlation mining based on the unifying perspective of high dimensional learning rates and sample complexity for different structured covariance models and different inference tasks.
Prediction of siRNA potency using sparse logistic regression.
Hu, Wei; Hu, John
2014-06-01
RNA interference (RNAi) can modulate gene expression at post-transcriptional as well as transcriptional levels. Short interfering RNA (siRNA) serves as a trigger for the RNAi gene inhibition mechanism, and therefore is a crucial intermediate step in RNAi. There have been extensive studies to identify the sequence characteristics of potent siRNAs. One such study built a linear model using LASSO (Least Absolute Shrinkage and Selection Operator) to measure the contribution of each siRNA sequence feature. This model is simple and interpretable, but it requires a large number of nonzero weights. We have introduced a novel technique, sparse logistic regression, to build a linear model using single-position-specific nucleotide compositions, which has the same prediction accuracy as the linear model based on LASSO. The weights in our new model share the same general trend as those in the previous model, but have only 25 nonzero weights out of a total of 84 weights, a 54% reduction compared to the previous model. Contrary to the linear model based on LASSO, our model suggests that only a few positions, namely the 5' and 3' ends and the seed region of siRNA sequences, are influential on the efficacy of the siRNA. We also employed sparse logistic regression to build a linear model using dual-position-specific nucleotide compositions, a task LASSO is not able to accomplish well due to the high dimensionality of the problem. Our results demonstrate the superiority of sparse logistic regression as a technique for both feature selection and regression over LASSO in the context of siRNA design.
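A minimal sketch of the sparse (L1-penalized) logistic regression setting follows: 21-nucleotide sequences are one-hot encoded into 84 single-position features, and the L1 penalty drives most position weights to exactly zero. The sequences, the toy potency rule, and the regularization strength are placeholders, and scikit-learn's generic L1 solver stands in for the authors' sparse logistic regression.

```python
# Sketch: L1-regularized logistic regression on single-position nucleotide features.
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)
n, length = 500, 21
seqs = rng.integers(0, 4, size=(n, length))        # 0..3 encode A, C, G, U
# toy potency rule: favourable nucleotides near the 5' end / seed region
label = ((seqs[:, 0] == 3).astype(int) + (seqs[:, 2] == 0).astype(int)
         + rng.random(n) > 1.2).astype(int)

# one-hot encode: 21 positions x 4 nucleotides = 84 binary features
X = np.zeros((n, length * 4))
X[np.arange(n)[:, None], np.arange(length) * 4 + seqs] = 1.0

clf = LogisticRegression(penalty="l1", solver="liblinear", C=0.1).fit(X, label)
print("nonzero weights:", np.count_nonzero(clf.coef_), "of", clf.coef_.size)
```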
Experimental witness of genuine high-dimensional entanglement
NASA Astrophysics Data System (ADS)
Guo, Yu; Hu, Xiao-Min; Liu, Bi-Heng; Huang, Yun-Feng; Li, Chuan-Feng; Guo, Guang-Can
2018-06-01
Growing interest has been devoted to exploring high-dimensional quantum systems, because of their promise for certain quantum tasks. How to characterize a high-dimensional entanglement structure is one of the basic questions to be answered in order to take full advantage of it. However, it is not easy to capture the key features of high-dimensional entanglement, because the correlations derived from high-dimensional entangled states can possibly be simulated with copies of lower-dimensional systems. Here, we follow the work of Kraft et al. [Phys. Rev. Lett. 120, 060502 (2018), 10.1103/PhysRevLett.120.060502] and present the experimental realization of the creation and detection, by a normalized witness operation, of genuine high-dimensional entanglement, which cannot be decomposed into lower-dimensional Hilbert spaces and thus forms entanglement structures that exist only in high-dimensional systems. Our experiment leads to further exploration of high-dimensional quantum systems.
Spatial regression analysis on 32 years of total column ozone data
NASA Astrophysics Data System (ADS)
Knibbe, J. S.; van der A, R. J.; de Laat, A. T. J.
2014-08-01
Multiple-regression analyses have been performed on 32 years of total ozone column data that was spatially gridded with a 1 × 1.5° resolution. The total ozone data consist of the MSR (Multi Sensor Reanalysis; 1979-2008) and 2 years of assimilated SCIAMACHY (SCanning Imaging Absorption spectroMeter for Atmospheric CHartographY) ozone data (2009-2010). The two-dimensionality in this data set allows us to perform the regressions locally and investigate spatial patterns of regression coefficients and their explanatory power. Seasonal dependencies of ozone on regressors are included in the analysis. A new physically oriented model is developed to parameterize stratospheric ozone. Ozone variations on nonseasonal timescales are parameterized by explanatory variables describing the solar cycle, stratospheric aerosols, the quasi-biennial oscillation (QBO), El Niño-Southern Oscillation (ENSO) and stratospheric alternative halogens which are parameterized by the effective equivalent stratospheric chlorine (EESC). For several explanatory variables, seasonally adjusted versions of these explanatory variables are constructed to account for the difference in their effect on ozone throughout the year. To account for seasonal variation in ozone, explanatory variables describing the polar vortex, geopotential height, potential vorticity and average day length are included. Results of this regression model are compared to that of a similar analysis based on a more commonly applied statistically oriented model. The physically oriented model provides spatial patterns in the regression results for each explanatory variable. The EESC has a significant depleting effect on ozone at mid- and high latitudes, the solar cycle affects ozone positively mostly in the Southern Hemisphere, stratospheric aerosols affect ozone negatively at high northern latitudes, the effect of QBO is positive and negative in the tropics and mid- to high latitudes, respectively, and ENSO affects ozone negatively between 30° N and 30° S, particularly over the Pacific. The contribution of explanatory variables describing seasonal ozone variation is generally large at mid- to high latitudes. We observe ozone increases with potential vorticity and day length and ozone decreases with geopotential height and variable ozone effects due to the polar vortex in regions to the north and south of the polar vortices. Recovery of ozone is identified globally. However, recovery rates and uncertainties strongly depend on choices that can be made in defining the explanatory variables. The application of several trend models, each with their own pros and cons, yields a large range of recovery rate estimates. Overall these results suggest that care has to be taken in determining ozone recovery rates, in particular for the Antarctic ozone hole.
Alwee, Razana; Hj Shamsuddin, Siti Mariyam; Sallehuddin, Roselina
2013-01-01
Crime forecasting is an important area in the field of criminology. Linear models, such as regression and econometric models, are commonly applied in crime forecasting. However, in real crime data, it is common that the data consist of both linear and nonlinear components. A single model may not be sufficient to identify all the characteristics of the data. The purpose of this study is to introduce a hybrid model that combines support vector regression (SVR) and autoregressive integrated moving average (ARIMA) to be applied in crime rate forecasting. SVR is very robust with small training data and high-dimensional problems. Meanwhile, ARIMA has the ability to model several types of time series. However, the accuracy of the SVR model depends on the values of its parameters, while ARIMA is not robust when applied to small data sets. Therefore, to overcome this problem, particle swarm optimization is used to estimate the parameters of the SVR and ARIMA models. The proposed hybrid model is used to forecast the property crime rates of the United States based on economic indicators. The experimental results show that the proposed hybrid model is able to produce more accurate forecasting results as compared to the individual models. PMID:23766729
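The sketch below illustrates the hybrid idea in its simplest form: an ARIMA model captures the linear part of a series, and an SVR trained on lagged residuals captures the nonlinear part, with the one-step forecast formed as the sum of the two. The series, ARIMA order, lag length, and SVR settings are illustrative, and the particle swarm optimization of parameters described in the abstract is omitted.

```python
# Sketch: hybrid ARIMA + SVR forecast (linear part + nonlinear residual correction).
import numpy as np
from statsmodels.tsa.arima.model import ARIMA
from sklearn.svm import SVR

rng = np.random.default_rng(0)
t = np.arange(120)
series = 50 + 0.3 * t + 5 * np.sin(t / 6.0) + rng.standard_normal(120)

arima = ARIMA(series, order=(1, 1, 1)).fit()
resid = arima.resid                               # nonlinear residual component

lag = 3
X = np.column_stack([resid[i:len(resid) - lag + i] for i in range(lag)])
y = resid[lag:]
svr = SVR(C=10.0, epsilon=0.1).fit(X, y)

# one-step-ahead hybrid forecast = ARIMA forecast + SVR residual correction
linear_part = arima.forecast(steps=1)[0]
nonlinear_part = svr.predict(resid[-lag:][None, :])[0]
print("hybrid forecast:", linear_part + nonlinear_part)
```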
A computer program for uncertainty analysis integrating regression and Bayesian methods
Lu, Dan; Ye, Ming; Hill, Mary C.; Poeter, Eileen P.; Curtis, Gary
2014-01-01
This work develops a new functionality in UCODE_2014 to evaluate Bayesian credible intervals using the Markov Chain Monte Carlo (MCMC) method. The MCMC capability in UCODE_2014 is based on the FORTRAN version of the differential evolution adaptive Metropolis (DREAM) algorithm of Vrugt et al. (2009), which estimates the posterior probability density function of model parameters in high-dimensional and multimodal sampling problems. The UCODE MCMC capability provides eleven prior probability distributions and three ways to initialize the sampling process. It evaluates parametric and predictive uncertainties and it has parallel computing capability based on multiple chains to accelerate the sampling process. This paper tests and demonstrates the MCMC capability using a 10-dimensional multimodal mathematical function, a 100-dimensional Gaussian function, and a groundwater reactive transport model. The use of the MCMC capability is made straightforward and flexible by adopting the JUPITER API protocol. With the new MCMC capability, UCODE_2014 can be used to calculate three types of uncertainty intervals, which all can account for prior information: (1) linear confidence intervals which require linearity and Gaussian error assumptions and typically 10s–100s of highly parallelizable model runs after optimization, (2) nonlinear confidence intervals which require a smooth objective function surface and Gaussian observation error assumptions and typically 100s–1,000s of partially parallelizable model runs after optimization, and (3) MCMC Bayesian credible intervals which require few assumptions and commonly 10,000s–100,000s or more partially parallelizable model runs. Ready access allows users to select methods best suited to their work, and to compare methods in many circumstances.
NASA Astrophysics Data System (ADS)
Gong, Lunkun; Chen, Xiong; Musa, Omer; Yang, Haitao; Zhou, Changsheng
2017-12-01
A numerical and experimental investigation of the solid-fuel ramjet was carried out to study the effect of geometry on combustion characteristics. The two-dimensional axisymmetric program developed in the present study adopted finite-rate chemistry and second-order moment turbulence-chemistry models, together with the k-ω shear stress transport (SST) turbulence model. Experimental data were obtained by burning cylindrical polyethylene using a connected-pipe facility. The simulation results show that a fuel-rich zone near the solid fuel surface and an air-rich zone in the core exist in the chamber, and the chemical reactions occur mainly at the interface of these two regions. The physical reason for the effect of geometry on regression rate is the variation of turbulent viscosity with the geometry change. The port-to-inlet diameter ratio is the main parameter influencing the turbulent viscosity, and a linear relationship between the port-to-inlet diameter ratio and the regression rate was obtained. The air mass flow rate and air-fuel ratio are the main factors influencing ramjet performance. Based on the simulation results, correlations between geometry and air-fuel ratio were obtained, and the effect of geometry on ramjet performance was analyzed according to these correlations. The three-dimensional regression rate contour obtained experimentally indicates that the regression rate, which shows an axisymmetric distribution due to the symmetric structure, increases sharply and then decreases slowly in the axial direction. The radiation heat transfer in the recirculation zone cannot be ignored. Compared with the experimental results, the deviations of the calculated average regression rate and characteristic velocity are about 5%. Concerning the effect of geometry on air-fuel ratio, the deviations between experimental and theoretical results are less than 10%.
NASA Astrophysics Data System (ADS)
Metzger, Philip T.; Lane, John E.; Carilli, Robert A.; Long, Jason M.; Shawn, Kathy L.
2010-07-01
A method combining photogrammetry with ballistic analysis is demonstrated to identify flying debris in a rocket launch environment. Debris traveling near the STS-124 Space Shuttle was captured on cameras viewing the launch pad within the first few seconds after launch. One particular piece of debris caught the attention of investigators studying the release of flame trench fire bricks because its high trajectory could indicate a flight risk to the Space Shuttle. Digitized images from two pad perimeter high-speed 16-mm film cameras were processed using photogrammetry software based on a multi-parameter optimization technique. Reference points in the image were found from 3D CAD models of the launch pad and from surveyed points on the pad. The three-dimensional reference points were matched to the equivalent two-dimensional camera projections by optimizing the camera model parameters using a gradient search optimization technique. Using this method of solving the triangulation problem, the xyz position of the object's path relative to the reference point coordinate system was found for every set of synchronized images. This trajectory was then compared to a predicted trajectory while performing regression analysis on the ballistic coefficient and other parameters. This identified, with a high degree of confidence, the object's material density and thus its probable origin within the launch pad environment. Future extensions of this methodology may make it possible to diagnose the underlying causes of debris-releasing events in near-real time, thus improving flight safety.
Nagarajan, Mahesh B; Coan, Paola; Huber, Markus B; Diemoz, Paul C; Glaser, Christian; Wismüller, Axel
2014-02-01
Phase-contrast computed tomography (PCI-CT) has shown tremendous potential as an imaging modality for visualizing human cartilage with high spatial resolution. Previous studies have demonstrated the ability of PCI-CT to visualize (1) structural details of the human patellar cartilage matrix and (2) changes to chondrocyte organization induced by osteoarthritis. This study investigates the use of high-dimensional geometric features in characterizing such chondrocyte patterns in the presence or absence of osteoarthritic damage. Geometrical features derived from the scaling index method (SIM) and statistical features derived from gray-level co-occurrence matrices were extracted from 842 regions of interest (ROI) annotated on PCI-CT images of ex vivo human patellar cartilage specimens. These features were subsequently used in a machine learning task with support vector regression to classify ROIs as healthy or osteoarthritic; classification performance was evaluated using the area under the receiver-operating characteristic curve (AUC). SIM-derived geometrical features exhibited the best classification performance (AUC, 0.95 ± 0.06) and were most robust to changes in ROI size. These results suggest that such geometrical features can provide a detailed characterization of the chondrocyte organization in the cartilage matrix in an automated and non-subjective manner, while also enabling classification of cartilage as healthy or osteoarthritic with high accuracy. Such features could potentially serve as imaging markers for evaluating osteoarthritis progression and its response to different therapeutic intervention strategies.
Variable screening via quantile partial correlation
Ma, Shujie; Tsai, Chih-Ling
2016-01-01
In quantile linear regression with ultra-high dimensional data, we propose an algorithm for screening all candidate variables and subsequently selecting relevant predictors. Specifically, we first employ quantile partial correlation for screening, and then we apply the extended Bayesian information criterion (EBIC) for best subset selection. Our proposed method can successfully select predictors when the variables are highly correlated, and it can also identify variables that make a contribution to the conditional quantiles but are marginally uncorrelated or weakly correlated with the response. Theoretical results show that the proposed algorithm can yield the sure screening set. By controlling the false selection rate, model selection consistency can be achieved theoretically. In practice, we proposed using EBIC for best subset selection so that the resulting model is screening consistent. Simulation studies demonstrate that the proposed algorithm performs well, and an empirical example is presented. PMID:28943683
On-line analysis of algae in water by discrete three-dimensional fluorescence spectroscopy.
Zhao, Nanjing; Zhang, Xiaoling; Yin, Gaofang; Yang, Ruifang; Hu, Li; Chen, Shuang; Liu, Jianguo; Liu, Wenqing
2018-03-19
To address the problem of on-line measurement for algae classification, a method for algae classification and concentration determination based on discrete three-dimensional fluorescence spectra was studied in this work. The discrete three-dimensional fluorescence spectra of twelve common species of algae belonging to five categories were analyzed, discrete three-dimensional standard spectra for the five categories were built, and recognition, classification and concentration prediction of the algae categories were realized by coupling the discrete three-dimensional fluorescence spectra with non-negative weighted least squares linear regression analysis. The results show that similarities between the discrete three-dimensional standard spectra of different categories were reduced and the accuracies of recognition, classification and concentration prediction of the algae categories were significantly improved. Compared with the chlorophyll a fluorescence excitation spectra method, the recognition accuracy for pure samples by discrete three-dimensional fluorescence spectra is improved by 1.38%, and the recovery rate and classification accuracy for pure diatom samples by 34.1% and 46.8%, respectively; for mixed samples, the recognition accuracy by discrete three-dimensional fluorescence spectra is enhanced by 26.1%, the recovery rate of mixed samples with Chlorophyta by 37.8%, and the classification accuracy of mixed samples with diatoms by 54.6%.
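The decomposition step can be sketched with non-negative least squares: a measured spectrum (flattened from its excitation-emission grid) is expressed as a non-negative combination of the category standard spectra, and the fitted coefficients serve as concentration estimates. All spectra, weights, and thresholds below are simulated placeholders, not the standard spectra of the study.

    import numpy as np
    from scipy.optimize import nnls

    rng = np.random.default_rng(1)
    n_points, n_categories = 120, 5      # flattened (excitation, emission) pairs x categories

    # columns of S: hypothetical discrete 3-D standard spectra of the five algal categories
    S = np.abs(rng.standard_normal((n_points, n_categories)))
    true_weights = np.array([0.0, 0.8, 0.0, 0.3, 0.0])            # a hypothetical mixed sample
    m = S @ true_weights + 0.01 * rng.standard_normal(n_points)   # measured spectrum

    # weighted non-negative least squares; identity weights are used here for simplicity
    w = np.ones(n_points)
    coef, _ = nnls(S * w[:, None], m * w)

    print("estimated contributions:", np.round(coef, 3))
    print("categories detected:", np.flatnonzero(coef > 0.05))    # illustrative detection threshold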
Euclidean chemical spaces from molecular fingerprints: Hamming distance and Hempel's ravens.
Martin, Eric; Cao, Eddie
2015-05-01
Molecules are often characterized by sparse binary fingerprints, where 1s represent the presence of substructures and 0s represent their absence. Fingerprints are especially useful for similarity calculations, such as database searching or clustering, generally measuring similarity as the Tanimoto coefficient. In other cases, such as visualization, design of experiments, or latent variable regression, a low-dimensional Euclidean "chemical space" is more useful, where proximity between points reflects chemical similarity. A temptation is to apply principal components analysis (PCA) directly to these fingerprints to obtain a low-dimensional continuous chemical space. However, Gower has shown that distances from PCA on bit vectors are proportional to the square root of Hamming distance. Unlike Tanimoto similarity, Hamming similarity (HS) gives equal weight to shared 0s as to shared 1s; that is, HS gives as much weight to substructures that neither molecule contains as to substructures that both molecules contain. Illustrative examples show that proximity in the corresponding chemical space reflects mainly similar size and complexity rather than shared chemical substructures. These spaces are ill-suited for visualizing and optimizing coverage of chemical space, or as latent variables for regression. A more suitable alternative is shown to be multi-dimensional scaling (MDS) on the Tanimoto distance matrix, which produces a space where proximity does reflect structural similarity.
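The contrast drawn in the abstract can be reproduced in a few lines: principal components computed directly on the bit vectors (distances proportional to the square root of Hamming distance) versus metric multi-dimensional scaling on a Tanimoto distance matrix. The fingerprints below are random toy bit vectors, not real molecules.

    import numpy as np
    from sklearn.decomposition import PCA
    from sklearn.manifold import MDS

    rng = np.random.default_rng(0)
    fps = (rng.random((60, 512)) < 0.05).astype(float)    # sparse binary fingerprints (toy)

    # Tanimoto distance: 1 - |A and B| / |A or B|
    inter = fps @ fps.T
    counts = fps.sum(axis=1)
    union = counts[:, None] + counts[None, :] - inter
    tanimoto_dist = 1.0 - inter / np.maximum(union, 1.0)

    # PCA on the raw bit vectors: distances track sqrt(Hamming), so shared 0s count as much as shared 1s
    pca_coords = PCA(n_components=2).fit_transform(fps)

    # metric MDS on the Tanimoto distance matrix: proximity reflects shared substructure
    mds = MDS(n_components=2, dissimilarity="precomputed", random_state=0)
    mds_coords = mds.fit_transform(tanimoto_dist)
    print(pca_coords.shape, mds_coords.shape)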
Hwang, Jae Joon; Kim, Kee-Deog; Park, Hyok; Park, Chang Seo; Jeong, Ho-Gul
2014-01-01
Superimposition has been used to evaluate the changes produced by orthodontic or orthopedic treatment in the dental field. With the introduction of cone beam CT (CBCT), evaluating three-dimensional changes after treatment by superimposition became possible. Four-point plane orientation is one of the simplest ways to achieve superimposition of three-dimensional images. To identify factors influencing the superimposition error of cephalometric landmarks under the four-point plane orientation method, and to evaluate the reproducibility of cephalometric landmarks for analyzing that error, 20 patients were analyzed who had normal skeletal and occlusal relationships and who underwent CBCT for diagnosis of temporomandibular disorder. The nasion, sella turcica, basion, and the midpoint between the left and right most posterior points of the lesser wing of the sphenoid bone were used to define a three-dimensional (3D) anatomical reference coordinate system. Another 15 reference cephalometric points were also determined three times in the same image. The reorientation error of each landmark could be explained substantially (23%) by a linear regression model consisting of three factors describing the position of each landmark relative to the reference axes and the locating error. The four-point plane orientation system may produce a reorientation error that varies with the perpendicular distance between the landmark and the x-axis; the reorientation error also increases as the locating error and the shift of the reference axes viewed from each landmark increase. Therefore, to reduce the reorientation error, the accuracy of all landmarks, including the reference points, is important. Construction of the regression model using reference points of greater precision is required for the clinical application of this model.
NASA Astrophysics Data System (ADS)
Wang, Wei; Yang, Jiong
With the rapid growth of computational biology and e-commerce applications, high-dimensional data becomes very common. Thus, mining high-dimensional data is an urgent problem of great practical importance. However, there are some unique challenges for mining data of high dimensions, including (1) the curse of dimensionality and, more crucially, (2) the meaningfulness of the similarity measure in the high-dimension space. In this chapter, we present several state-of-the-art techniques for analyzing high-dimensional data, e.g., frequent pattern mining, clustering, and classification. We will discuss how these methods deal with the challenges of high dimensionality.
Using machine learning to replicate chaotic attractors and calculate Lyapunov exponents from data
NASA Astrophysics Data System (ADS)
Pathak, Jaideep; Lu, Zhixin; Hunt, Brian R.; Girvan, Michelle; Ott, Edward
2017-12-01
We use recent advances in the machine learning area known as "reservoir computing" to formulate a method for model-free estimation from data of the Lyapunov exponents of a chaotic process. The technique uses a limited time series of measurements as input to a high-dimensional dynamical system called a "reservoir." After the reservoir's response to the data is recorded, linear regression is used to learn a large set of parameters, called the "output weights." The learned output weights are then used to form a modified autonomous reservoir designed to be capable of producing an arbitrarily long time series whose ergodic properties approximate those of the input signal. When successful, we say that the autonomous reservoir reproduces the attractor's "climate." Since the reservoir equations and output weights are known, we can compute the derivatives needed to determine the Lyapunov exponents of the autonomous reservoir, which we then use as estimates of the Lyapunov exponents for the original input generating system. We illustrate the effectiveness of our technique with two examples, the Lorenz system and the Kuramoto-Sivashinsky (KS) equation. In the case of the KS equation, we note that the high-dimensional nature of the system and the large number of Lyapunov exponents yield a challenging test of our method, which it successfully passes.
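A minimal echo-state-network sketch of the procedure is given below, under stated assumptions: the Lorenz system stands in for the input-generating process, the reservoir size and hyperparameters are illustrative rather than those of the paper, and the final step of extracting Lyapunov exponents from the known autonomous reservoir equations is omitted.

    import numpy as np
    from scipy.integrate import solve_ivp

    # Lorenz-63 stands in for the "input generating system"
    def lorenz(t, u, sigma=10.0, rho=28.0, beta=8.0 / 3.0):
        x, y, z = u
        return [sigma * (y - x), x * (rho - z) - y, x * y - beta * z]

    dt = 0.02
    t_eval = np.arange(0.0, 120.0, dt)
    data = solve_ivp(lorenz, (0.0, 120.0), [1.0, 1.0, 1.0], t_eval=t_eval, rtol=1e-9).y.T

    # random reservoir (echo state network); all hyperparameters are illustrative
    rng = np.random.default_rng(0)
    N, spectral_radius, ridge = 400, 1.1, 1e-6
    W_in = rng.uniform(-0.5, 0.5, (N, 3))
    W = rng.uniform(-1.0, 1.0, (N, N))
    W *= spectral_radius / np.max(np.abs(np.linalg.eigvals(W)))

    def step(r, u):
        return np.tanh(W @ r + W_in @ u)

    # drive the reservoir with the measured time series and record its states
    T_train, washout = 4000, 200
    r, states = np.zeros(N), np.zeros((4000, N))
    for t in range(T_train):
        r = step(r, data[t])
        states[t] = r

    # "output weights" learned by linear (ridge) regression
    R, Y = states[washout:], data[washout + 1:T_train + 1]
    W_out = np.linalg.solve(R.T @ R + ridge * np.eye(N), R.T @ Y).T

    # autonomous reservoir: feed its own prediction back as the next input
    preds, u = np.zeros((1000, 3)), data[T_train]
    for t in range(1000):
        r = step(r, u)
        u = W_out @ r
        preds[t] = u
    err = preds[:50] - data[T_train + 1:T_train + 51]
    print("short-term RMSE:", np.sqrt(np.mean(err ** 2)))

Because the trained autonomous map (the reservoir update together with the output weights) is known in closed form, its Jacobians can be propagated with a standard QR-based algorithm to estimate the Lyapunov spectrum, which is the quantity the paper uses as a surrogate for the exponents of the original system.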
Huang, Yen-Tsung; Pan, Wen-Chi
2016-06-01
Causal mediation modeling has become a popular approach for studying the effect of an exposure on an outcome through a mediator. However, current methods are not applicable to the setting with a large number of mediators. We propose a testing procedure for mediation effects of high-dimensional continuous mediators. We characterize the marginal mediation effect, the multivariate component-wise mediation effects, and the L2 norm of the component-wise effects, and develop a Monte-Carlo procedure for evaluating their statistical significance. To accommodate the setting with a large number of mediators and a small sample size, we further propose a transformation model using the spectral decomposition. Under the transformation model, mediation effects can be estimated using a series of regression models with a univariate transformed mediator, and examined by our proposed testing procedure. Extensive simulation studies are conducted to assess the performance of our methods for continuous and dichotomous outcomes. We apply the methods to analyze genomic data investigating the effect of microRNA miR-223 on a dichotomous survival status of patients with glioblastoma multiforme (GBM). We identify nine gene ontology sets with expression values that significantly mediate the effect of miR-223 on GBM survival. © 2015, The International Biometric Society.
Quadratic Polynomial Regression using Serial Observation Processing:Implementation within DART
NASA Astrophysics Data System (ADS)
Hodyss, D.; Anderson, J. L.; Collins, N.; Campbell, W. F.; Reinecke, P. A.
2017-12-01
Many ensemble-based Kalman filtering (EBKF) algorithms process the observations serially. Serial observation processing views the data assimilation process as an iterative sequence of scalar update equations. What is useful about this data assimilation algorithm is that it has very low memory requirements and does not need complex methods to perform the typical high-dimensional inverse calculation of many other algorithms. Recently, the push has been towards the prediction, and therefore the assimilation of observations, for regions and phenomena for which high resolution is required and/or highly nonlinear physical processes are operating. For these situations, a basic hypothesis is that the use of the EBKF is sub-optimal and performance gains could be achieved by accounting for aspects of the non-Gaussianity. To this end, we develop here a new component of the Data Assimilation Research Testbed [DART] to allow a wide variety of users to test this hypothesis. This new version of DART allows one to run several variants of the EBKF as well as several variants of the quadratic polynomial filter using the same forecast model and observations. Differences between the results of the two systems will then highlight the degree of non-Gaussianity in the system being examined. We will illustrate in this work the differences between the performance of linear versus quadratic polynomial regression in a hierarchy of models from Lorenz-63 to a simple general circulation model.
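The contrast between the two regression steps can be sketched for a single scalar observation. This is an illustration of serial observation processing in general, not the DART implementation: an EAKF-style scalar update produces observation-space increments, which are then mapped to a state variable either by the usual linear regression or by a quadratic polynomial fit. All numbers below are synthetic.

    import numpy as np

    rng = np.random.default_rng(0)
    Ne = 100                                    # ensemble size
    x = rng.standard_normal(Ne)                 # one state variable (prior ensemble)
    y = x ** 2 + 0.1 * rng.standard_normal(Ne)  # observed variable, nonlinear in x

    y_obs, r_obs = 1.5, 0.1 ** 2                # scalar observation and its error variance

    # scalar (EAKF-style) update of the observed-variable ensemble
    var_y = y.var(ddof=1)
    post_var = 1.0 / (1.0 / var_y + 1.0 / r_obs)
    post_mean = post_var * (y.mean() / var_y + y_obs / r_obs)
    y_post = post_mean + np.sqrt(post_var / var_y) * (y - y.mean())
    dy = y_post - y                             # observation-space increments

    # linear regression of the state on the observed variable (classic EBKF regression step)
    b_lin = np.cov(x, y, ddof=1)[0, 1] / var_y
    dx_linear = b_lin * dy

    # quadratic polynomial regression: fit x on (1, y, y^2), apply the fitted polynomial
    A = np.column_stack([np.ones(Ne), y, y ** 2])
    coef, *_ = np.linalg.lstsq(A, x, rcond=None)
    A_post = np.column_stack([np.ones(Ne), y_post, y_post ** 2])
    dx_quadratic = A_post @ coef - A @ coef

    print("mean linear increment:   ", dx_linear.mean())
    print("mean quadratic increment:", dx_quadratic.mean())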
Prediction of epigenetically regulated genes in breast cancer cell lines.
Loss, Leandro A; Sadanandam, Anguraj; Durinck, Steffen; Nautiyal, Shivani; Flaucher, Diane; Carlton, Victoria E H; Moorhead, Martin; Lu, Yontao; Gray, Joe W; Faham, Malek; Spellman, Paul; Parvin, Bahram
2010-06-04
Methylation of CpG islands within the DNA promoter regions is one mechanism that leads to aberrant gene expression in cancer. In particular, the abnormal methylation of CpG islands may silence associated genes. Therefore, using high-throughput microarrays to measure CpG island methylation will lead to better understanding of tumor pathobiology and progression, while revealing potentially new biomarkers. We have examined a recently developed high-throughput technology for measuring genome-wide methylation patterns called mTACL. Here, we propose a computational pipeline for integrating gene expression and CpG island methylation profiles to identify epigenetically regulated genes for a panel of 45 breast cancer cell lines, which is widely used in the Integrative Cancer Biology Program (ICBP). The pipeline (i) reduces the dimensionality of the methylation data, (ii) associates the reduced methylation data with gene expression data, and (iii) ranks methylation-expression associations according to their epigenetic regulation. Dimensionality reduction is performed in two steps: (i) methylation sites are grouped across the genome to identify regions of interest, and (ii) methylation profiles are clustered within each region. Associations between the clustered methylation and the gene expression data sets generate candidate matches within a fixed neighborhood around each gene. Finally, the methylation-expression associations are ranked through a logistic regression, and their significance is quantified through permutation analysis. Our two-step dimensionality reduction compressed 90% of the original data, reducing 137,688 methylation sites to 14,505 clusters. Methylation-expression associations produced 18,312 correspondences, which were used to further analyze epigenetic regulation. Logistic regression was used to identify 58 genes from these correspondences that showed a statistically significant negative correlation between methylation profiles and gene expression in the panel of breast cancer cell lines. Subnetwork enrichment of these genes has identified 35 common regulators with 6 or more predicted markers. In addition to identifying epigenetically regulated genes, we show evidence of differential methylation patterns between the basal and luminal subtypes. Our results indicate that the proposed computational protocol is a viable platform for identifying epigenetically regulated genes. Our protocol has generated a list of predictors including COL1A2, TOP2A, TFF1, and VAV3, genes whose key roles in epigenetic regulation are documented in the literature. Subnetwork enrichment of these predicted markers further suggests that epigenetic regulation of individual genes occurs in a coordinated fashion and through common regulators.
Three-dimensional numerical and experimental studies on transient ignition of hybrid rocket motor
NASA Astrophysics Data System (ADS)
Tian, Hui; Yu, Ruipeng; Zhu, Hao; Wu, Junfeng; Cai, Guobiao
2017-11-01
This paper presents transient simulations and experimental studies of the ignition process of hybrid rocket motors (HRMs) using 90% hydrogen peroxide (HP) as the oxidizer and polymethyl methacrylate (PMMA) and polyethylene (PE) as fuels. A fluid-solid coupled numerical method is established based on the conserved form of the three-dimensional unsteady Navier-Stokes (N-S) equations, accounting for gas-phase chemical reactions and heat transfer between the fluid and solid regions. Experiments are subsequently conducted using a high-speed camera to record the ignition process. The flame propagation, chamber pressurizing process and average fuel regression rate of the numerical simulation results show good agreement with the experimental ones, which demonstrates the validity of the simulations in this study. The results also indicate that the flame propagation time is mainly affected by fluid dynamics and it increases with an increasing grain port area. The chamber pressurizing process begins when the flame propagation completes in the grain port. Furthermore, the chamber pressurizing time is about 4 times longer than the time of flame propagation.
Model-Free Feature Screening for Ultrahigh Dimensional Discriminant Analysis
Cui, Hengjian; Li, Runze
2014-01-01
This work is concerned with marginal sure independence feature screening for ultra-high dimensional discriminant analysis. The response variable is categorical in discriminant analysis. This enables us to use the conditional distribution function to construct a new index for feature screening. In this paper, we propose a marginal feature screening procedure based on the empirical conditional distribution function. We establish the sure screening and ranking consistency properties for the proposed procedure without assuming any moment condition on the predictors. The proposed procedure enjoys several appealing merits. First, it is model-free in that its implementation does not require specification of a regression model. Second, it is robust to heavy-tailed distributions of predictors and the presence of potential outliers. Third, it allows the categorical response to have a diverging number of classes in the order of O(n^κ) with some κ ≥ 0. We assess the finite sample property of the proposed procedure by Monte Carlo simulation studies and numerical comparison. We further illustrate the proposed methodology by empirical analyses of two real-life data sets. PMID:26392643
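A sketch of the type of screening index involved: for each predictor, compare its empirical conditional distribution functions across response classes with its marginal empirical distribution function, weighting by class frequencies. This follows the general construction described in the abstract; the exact estimator and threshold rule of the paper may differ, and the data below are simulated.

    import numpy as np

    def cdf_screen_index(x, y):
        """Mean-variance style index from empirical conditional CDFs.
        x: (n,) continuous predictor; y: (n,) categorical response.
        Larger values indicate stronger marginal dependence of x on y."""
        F_marg = np.array([(x <= xi).mean() for xi in x])
        score = 0.0
        for cls in np.unique(y):
            mask = y == cls
            F_cond = np.array([(x[mask] <= xi).mean() for xi in x])
            score += mask.mean() * np.mean((F_cond - F_marg) ** 2)
        return score

    # toy discriminant-analysis data: only the first 3 of 500 features are informative
    rng = np.random.default_rng(0)
    n, p = 150, 500
    y = rng.integers(0, 3, n)
    X = rng.standard_normal((n, p))
    X[:, :3] += y[:, None]                      # class-dependent shift
    scores = np.array([cdf_screen_index(X[:, j], y) for j in range(p)])
    print("top-ranked features:", np.argsort(scores)[::-1][:10])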
Zhai, Hong Lin; Zhai, Yue Yuan; Li, Pei Zhen; Tian, Yue Li
2013-01-21
A very simple approach to quantitative analysis is proposed based on digital image processing of three-dimensional (3D) spectra obtained by high-performance liquid chromatography coupled with a diode array detector (HPLC-DAD). Zernike moments, region-based shape features of a grayscale image with an inherent invariance property, were employed to establish the linear quantitative models. This approach was applied to the quantitative analysis of three compounds in mixed samples using 3D HPLC-DAD spectra, and three linear models were obtained, respectively. The correlation coefficients (R²) for the training and test sets were more than 0.999, and the statistical parameters and strict validation supported the reliability of the established models. The analytical results suggest that the Zernike moments selected by stepwise regression can be used in the quantitative analysis of target compounds. Our study provides a new idea for quantitative analysis using 3D spectra, which can be extended to the analysis of other 3D spectra obtained by different methods or instruments.
Park, Taeyoung; Krafty, Robert T; Sánchez, Alvaro I
2012-07-27
A Poisson regression model with an offset assumes a constant baseline rate after accounting for measured covariates, which may lead to biased estimates of coefficients in an inhomogeneous Poisson process. To correctly estimate the effect of time-dependent covariates, we propose a Poisson change-point regression model with an offset that allows a time-varying baseline rate. When the nonconstant pattern of a log baseline rate is modeled with a nonparametric step function, the resulting semi-parametric model involves a model component of varying dimension and thus requires a sophisticated varying-dimensional inference to obtain correct estimates of model parameters of fixed dimension. To fit the proposed varying-dimensional model, we devise a state-of-the-art MCMC-type algorithm based on partial collapse. The proposed model and methods are used to investigate an association between daily homicide rates in Cali, Colombia and policies that restrict the hours during which the legal sale of alcoholic beverages is permitted. While simultaneously identifying the latent changes in the baseline homicide rate which correspond to the incidence of sociopolitical events, we explore the effect of policies governing the sale of alcohol on homicide rates and seek a policy that balances the economic and cultural dependencies on alcohol sales to the health of the public.
Wartberg, L; Kriston, L; Kramer, M; Schwedler, A; Lincoln, T M; Kammerl, R
2017-06-01
Internet gaming disorder (IGD) has been included in the Diagnostic and Statistical Manual of Mental Disorders (DSM-5). Currently, associations between IGD in early adolescence and mental health are largely unexplained. In the present study, the relation of IGD with adolescent and parental mental health was investigated for the first time. We surveyed 1095 family dyads (an adolescent aged 12-14 years and a related parent) with a standardized questionnaire for IGD as well as for adolescent and parental mental health. We conducted linear (dimensional approach) and logistic (categorical approach) regression analyses. Both with dimensional and categorical approaches, we observed statistically significant associations between IGD and male gender, a higher degree of adolescent antisocial behavior, anger control problems, emotional distress, self-esteem problems, hyperactivity/inattention and parental anxiety (linear regression model: corrected R² = 0.41, logistic regression model: Nagelkerke's R² = 0.41). IGD appears to be associated with internalizing and externalizing problems in adolescents. Moreover, the findings of the present study provide first evidence that not only adolescent but also parental mental health is relevant to IGD in early adolescence. Adolescent and parental mental health should be considered in prevention and intervention programs for IGD in adolescence. Copyright © 2017 Elsevier Masson SAS. All rights reserved.
Liu, Chang-Fu; He, Xing-Yuan; Chen, Wei; Zhao, Gui-Ling; Xue, Wen-Duo
2008-06-01
Based on the fractal theory of forest growth, stepwise regression was employed to pursue a convenient and efficient method of measuring the three-dimensional green biomass (TGB) of urban forests in small areas. A total of thirteen simulation equations of TGB of urban forests in Shenyang City were derived, with the factors affecting the TGB analyzed. The results showed that the coefficients of determination (R²) of the 13 simulation equations ranged from 0.612 to 0.842. No evident pattern was shown in residual analysis, and the precisions were all higher than 87% (alpha = 0.05) and 83% (alpha = 0.01). The most convenient simulation equation was ln Y = 7.468 + 0.926 ln x1, where Y was the simulated TGB and x1 was basal area at breast height per hectare (SDB). The correlations between the standard regression coefficients of the simulation equations and 16 tree characteristics suggested that SDB was the main factor affecting the TGB of urban forests in Shenyang.
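For reference, the most convenient equation reported above can be evaluated directly; the interpretation of x1 (basal area at breast height per hectare, SDB) and the units follow the original study, while the example SDB values below are arbitrary.

    import numpy as np

    def tgb_estimate(sdb):
        # ln Y = 7.468 + 0.926 ln x1, with x1 the basal area at breast height per hectare (SDB)
        return np.exp(7.468 + 0.926 * np.log(sdb))

    for sdb in (5.0, 10.0, 20.0):      # illustrative SDB values
        print(sdb, round(float(tgb_estimate(sdb)), 1))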
Duvall, Susanne W.; Erickson, Sarah J.; MacLean, Peggy; Lowe, Jean R.
2014-01-01
The goal was to identify perinatal predictors of early executive dysfunction in preschoolers born very low birth weight. Fifty-seven preschoolers completed three executive function tasks (Dimensional Change Card Sort-Separated (inhibition, working memory and cognitive flexibility), Bear Dragon (inhibition and working memory) and Gift Delay Open (inhibition)). Relationships between executive function and perinatal medical severity factors (gestational age, days on ventilation, size for gestational age, maternal steroids and number of surgeries), and chronological age were investigated by multiple linear regression and logistic regression. Different perinatal medical severity factors were predictive of executive function tasks, with gestational age predicting Bear Dragon and Gift Open; and number of surgeries and maternal steroids predicting performance on Dimensional Change Card Sort-Separated. By understanding the relationship between perinatal medical severity factors and preschool executive outcomes, we may be able to identify children at highest risk for future executive dysfunction, thereby focusing targeted early intervention services. PMID:25117418
Analysis and generation of groundwater concentration time series
NASA Astrophysics Data System (ADS)
Crăciun, Maria; Vamoş, Călin; Suciu, Nicolae
2018-01-01
Concentration time series are provided by simulated concentrations of a nonreactive solute transported in groundwater, integrated over the transverse direction of a two-dimensional computational domain and recorded at the plume center of mass. The analysis of a statistical ensemble of time series reveals subtle features that are not captured by the first two moments which characterize the approximate Gaussian distribution of the two-dimensional concentration fields. The concentration time series exhibit a complex preasymptotic behavior driven by a nonstationary trend and correlated fluctuations with time-variable amplitude. Time series with almost the same statistics are generated by successively adding to a time-dependent trend a sum of linear regression terms, accounting for correlations between fluctuations around the trend and their increments in time, and terms of an amplitude modulated autoregressive noise of order one with time-varying parameter. The algorithm generalizes mixing models used in probability density function approaches. The well-known interaction by exchange with the mean mixing model is a special case consisting of a linear regression with constant coefficients.
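The generating algorithm is described only qualitatively above; the sketch below shows the general shape of such a generator, with a deterministic trend, regression terms on the previous fluctuation and its increment, and an amplitude-modulated AR(1) noise with a time-varying parameter. Every functional form and coefficient here is a placeholder, not a value from the study.

    import numpy as np

    rng = np.random.default_rng(0)
    T = 1000
    t = np.linspace(0.0, 1.0, T)

    trend = 1.0 * np.exp(-3.0 * t) + 0.2             # hypothetical nonstationary trend
    phi = 0.9 - 0.3 * t                               # time-varying AR(1) parameter
    amp = 0.05 * (1.0 + np.sin(2.0 * np.pi * 3.0 * t))  # amplitude modulation of the noise

    series = np.empty(T)
    series[0] = trend[0]
    noise = 0.0
    for k in range(1, T):
        # AR(1) innovation with time-varying parameter and modulated amplitude
        noise = phi[k] * noise + amp[k] * rng.standard_normal()
        # regression-type terms linking the new value to the previous fluctuation and increment
        fluct_prev = series[k - 1] - trend[k - 1]
        incr_prev = series[k - 1] - series[k - 2] if k > 1 else 0.0
        series[k] = trend[k] + 0.5 * fluct_prev + 0.2 * incr_prev + noise

    print(series[:5])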
Numerical investigations of hybrid rocket engines
NASA Astrophysics Data System (ADS)
Betelin, V. B.; Kushnirenko, A. G.; Smirnov, N. N.; Nikitin, V. F.; Tyurenkova, V. V.; Stamov, L. I.
2018-03-01
This paper presents the results of numerical studies of the hybrid rocket engine operating cycle, including the unsteady-state transition stage. A mathematical model is developed accounting for the peculiarities of diffusion combustion of fuel in the flow of oxidant, which is composed of an oxygen-nitrogen mixture. Three-dimensional unsteady-state simulations of a chemically reacting gas mixture above a thermochemically destructing surface are performed. The results show that diffusion combustion leads to a strongly non-uniform fuel mass regression rate in the flow direction. Diffusive deceleration of the chemical reactions leads to a decrease of the fuel regression rate in the longitudinal direction.
Peleato, Nicolas M; Legge, Raymond L; Andrews, Robert C
2018-06-01
The use of fluorescence data coupled with neural networks for improved predictability of drinking water disinfection by-products (DBPs) was investigated. Novel application of autoencoders to process high-dimensional fluorescence data was related to common dimensionality reduction techniques of parallel factors analysis (PARAFAC) and principal component analysis (PCA). The proposed method was assessed based on component interpretability as well as for prediction of organic matter reactivity to formation of DBPs. Optimal prediction accuracies on a validation dataset were observed with an autoencoder-neural network approach or by utilizing the full spectrum without pre-processing. Latent representation by an autoencoder appeared to mitigate overfitting when compared to other methods. Although DBP prediction error was minimized by other pre-processing techniques, PARAFAC yielded interpretable components which resemble fluorescence expected from individual organic fluorophores. Through analysis of the network weights, fluorescence regions associated with DBP formation can be identified, representing a potential method to distinguish reactivity between fluorophore groupings. However, distinct results due to the applied dimensionality reduction approaches were observed, dictating a need for considering the role of data pre-processing in the interpretability of the results. In comparison to common organic measures currently used for DBP formation prediction, fluorescence was shown to improve prediction accuracies, with improvements to DBP prediction best realized when appropriate pre-processing and regression techniques were applied. The results of this study show promise for the potential application of neural networks to best utilize fluorescence EEM data for prediction of organic matter reactivity. Copyright © 2018 Elsevier Ltd. All rights reserved.
Pinto, Paula Sanders Pereira; Iego, Sandro; Nunes, Samantha; Menezes, Hemanny; Mastrorosa, Rosana Sávio; Oliveira, Irismar Reis de; Rosário, Maria Conceição do
2011-03-01
This study investigates obsessive-compulsive disorder patients in terms of strategic planning and its association with specific obsessive-compulsive symptom dimensions. We evaluated 32 obsessive-compulsive disorder patients. Strategic planning was assessed by the Rey-Osterrieth Complex Figure Test, and the obsessive-compulsive dimensions were assessed by the Dimensional Yale-Brown Obsessive-Compulsive Scale. In the statistical analyses, the level of significance was set at 5%. We employed linear regression, including age, intelligence quotient, number of comorbidities, the Yale-Brown Obsessive-Compulsive Scale score, and the Dimensional Yale-Brown Obsessive-Compulsive Scale. The Dimensional Yale-Brown Obsessive-Compulsive Scale "worst-ever" score correlated significantly with the planning score on the copy portion of the Rey-Osterrieth Complex Figure Test (r = 0.4, p = 0.04) and was the only variable to show a significant association after linear regression (β = 0.55, t = 2.1, p = 0.04). Compulsive hoarding correlated positively with strategic planning (r = 0.44, p = 0.03). None of the remaining symptom dimensions presented any significant correlations with strategic planning. We found the severity of obsessive-compulsive symptoms to be associated with strategic planning. In addition, there was a significant positive association between the planning score on the copy portion of the Rey-Osterrieth Complex Figure Test and the hoarding dimension score on the Dimensional Yale-Brown Obsessive-Compulsive Scale. Our results underscore the idea that obsessive-compulsive disorder is a heterogeneous disorder and suggest that the hoarding dimension has a specific neuropsychological profile. Therefore, it is important to assess the peculiarities of each obsessive-compulsive symptom dimension.
Spatial Bayesian Latent Factor Regression Modeling of Coordinate-based Meta-analysis Data
Montagna, Silvia; Wager, Tor; Barrett, Lisa Feldman; Johnson, Timothy D.; Nichols, Thomas E.
2017-01-01
Now over 20 years old, functional MRI (fMRI) has a large and growing literature that is best synthesised with meta-analytic tools. As most authors do not share image data, only the peak activation coordinates (foci) reported in the paper are available for Coordinate-Based Meta-Analysis (CBMA). Neuroimaging meta-analysis is used to 1) identify areas of consistent activation; and 2) build a predictive model of task type or cognitive process for new studies (reverse inference). To simultaneously address these aims, we propose a Bayesian point process hierarchical model for CBMA. We model the foci from each study as a doubly stochastic Poisson process, where the study-specific log intensity function is characterised as a linear combination of a high-dimensional basis set. A sparse representation of the intensities is guaranteed through latent factor modeling of the basis coefficients. Within our framework, it is also possible to account for the effect of study-level covariates (meta-regression), significantly expanding the capabilities of the current neuroimaging meta-analysis methods available. We apply our methodology to synthetic data and neuroimaging meta-analysis datasets. PMID:28498564
Mixed kernel function support vector regression for global sensitivity analysis
NASA Astrophysics Data System (ADS)
Cheng, Kai; Lu, Zhenzhou; Wei, Yuhao; Shi, Yan; Zhou, Yicheng
2017-11-01
Global sensitivity analysis (GSA) plays an important role in exploring the respective effects of input variables on an assigned output response. Among the many sensitivity analysis methods in the literature, the Sobol indices have attracted much attention since they can provide accurate information for most models. In this paper, a mixed kernel function (MKF) based support vector regression (SVR) model is employed to evaluate the Sobol indices at low computational cost. By the proposed derivation, the estimation of the Sobol indices can be obtained by post-processing the coefficients of the SVR meta-model. The MKF is constituted by the orthogonal polynomials kernel function and the Gaussian radial basis kernel function; thus the MKF possesses both the global characteristic advantage of the polynomial kernel function and the local characteristic advantage of the Gaussian radial basis kernel function. The proposed approach is suitable for high-dimensional and non-linear problems. Performance of the proposed approach is validated by various analytical functions and compared with the popular polynomial chaos expansion (PCE). Results demonstrate that the proposed approach is an efficient method for global sensitivity analysis.
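The mixed kernel idea can be sketched with scikit-learn's support for callable kernels: a convex combination of a polynomial kernel (a plain polynomial kernel stands in here for the orthogonal-polynomial kernel of the paper) and a Gaussian RBF kernel. The test function, mixing weight, and SVR settings are illustrative, and the post-processing of coefficients into Sobol indices is not reproduced.

    import numpy as np
    from sklearn.svm import SVR

    def mixed_kernel(X, Y, weight=0.5, degree=3, gamma=0.5):
        # convex combination of a polynomial kernel (global trend)
        # and a Gaussian RBF kernel (local behaviour)
        poly = (1.0 + X @ Y.T) ** degree
        sq_dists = ((X[:, None, :] - Y[None, :, :]) ** 2).sum(-1)
        rbf = np.exp(-gamma * sq_dists)
        return weight * poly + (1.0 - weight) * rbf

    # Ishigami-like test function, a standard benchmark in global sensitivity analysis
    rng = np.random.default_rng(0)
    X = rng.uniform(-np.pi, np.pi, (300, 3))
    y = np.sin(X[:, 0]) + 7.0 * np.sin(X[:, 1]) ** 2 + 0.1 * X[:, 2] ** 4 * np.sin(X[:, 0])

    model = SVR(kernel=mixed_kernel, C=100.0, epsilon=0.01).fit(X, y)
    print("training R^2 of the SVR meta-model:", round(model.score(X, y), 3))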
NASA Astrophysics Data System (ADS)
Wang, Xuntao; Feng, Jianhu; Wang, Hu; Hong, Shidi; Zheng, Supei
2018-03-01
A three-dimensional finite element model of a box girder bridge and its asphalt concrete deck pavement was established in the ANSYS software, with the interlayer bonding of the asphalt concrete deck pavement assumed to be a contact bonding condition. An orthogonal experimental design was used to arrange the test plans for the material parameters, and the effect of the different material parameters on the mechanical response of the asphalt concrete surface layer was evaluated with a multiple linear regression model applied to the results of the finite element analysis. The results indicated that the stress regression equations can predict the stress of the asphalt concrete surface layer well, and that the elastic modulus of the waterproof layer has a significant influence on the stress values of the asphalt concrete surface layer.
Calvo, Natalia; Valero, Sergi; Sáez-Francàs, Naia; Gutiérrez, Fernando; Casas, Miguel; Ferrer, Marc
2016-10-01
Borderline personality disorder (BPD) diagnosis has been considered highly controversial. The Diagnostic and Statistical Manual of Mental Disorders 5th edition (DSM-5) proposes an alternative hybrid diagnostic model for personality disorders (PD), and the Personality Inventory for DSM-5 (PID-5) has adequate psychometric properties and has been widely used for the assessment of the dimensional component. Our aim was to analyze the utility of the personality traits presented in Section III of the DSM-5 for BPD diagnosis in an outpatient clinical sample, using the Spanish version of the PID-5. Two clinical samples were studied: a BPD sample (n=84) and a non-BPD sample (n=45). Between-sample differences in PID-5 scores were analyzed. The BPD sample obtained significantly higher scores in most PID-5 trait facets and domains. Specifically, after logistic regression analyses, the domains of Negative Affectivity and Disinhibition, and the trait facets of emotional lability, [lack of] restricted affectivity, and impulsivity were the most significantly associated with BPD. Although our findings are only partially consistent with the algorithm proposed by DSM-5, we consider that the combination of the PID-5 trait domains and facets could be useful for BPD dimensional diagnosis, and could further our understanding of BPD diagnosis complexity. Copyright © 2016 Elsevier Inc. All rights reserved.
L2-Boosting algorithm applied to high-dimensional problems in genomic selection.
González-Recio, Oscar; Weigel, Kent A; Gianola, Daniel; Naya, Hugo; Rosa, Guilherme J M
2010-06-01
The L2-Boosting algorithm is one of the most promising machine-learning techniques that has appeared in recent decades. It may be applied to high-dimensional problems such as whole-genome studies, and it is relatively simple from a computational point of view. In this study, we used this algorithm in a genomic selection context to make predictions of yet to be observed outcomes. Two data sets were used: (1) productive lifetime predicted transmitting abilities from 4702 Holstein sires genotyped for 32,611 single nucleotide polymorphisms (SNPs) derived from the Illumina BovineSNP50 BeadChip, and (2) progeny averages of food conversion rate, pre-corrected by environmental and mate effects, in 394 broilers genotyped for 3481 SNPs. Each of these data sets was split into training and testing sets, the latter comprising dairy or broiler sires whose ancestors were in the training set. Two weak learners, ordinary least squares (OLS) and non-parametric (NP) regression, were used for the L2-Boosting algorithm, to provide a stringent evaluation of the procedure. This algorithm was compared with BL [Bayesian LASSO (least absolute shrinkage and selection operator)] and BayesA regression. Learning tasks were carried out in the training set, whereas validation of the models was performed in the testing set. Pearson correlations between predicted and observed responses in the dairy cattle (broiler) data set were 0.65 (0.33), 0.53 (0.37), 0.66 (0.26) and 0.63 (0.27) for OLS-Boosting, NP-Boosting, BL and BayesA, respectively. The smallest bias and mean-squared errors (MSEs) were obtained with OLS-Boosting in both the dairy cattle (0.08 and 1.08) and broiler (-0.011 and 0.006) data sets. In the dairy cattle data set, the BL was more accurate (bias=0.10 and MSE=1.10) than BayesA (bias=1.26 and MSE=2.81), whereas no differences between these two methods were found in the broiler data set. L2-Boosting with a suitable learner was found to be a competitive alternative for genomic selection applications, providing high accuracy and low bias in genomic-assisted evaluations with a relatively short computational time.
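A minimal version of L2-Boosting with an ordinary least squares weak learner (componentwise boosting) is sketched below on simulated SNP-like data; the step size, number of iterations, and data are illustrative and not those of the study.

    import numpy as np

    def l2_boost(X, y, n_steps=200, nu=0.1):
        """Componentwise L2-Boosting with an OLS weak learner: at each step, fit the
        current residuals on the single best predictor and take a shrunken step."""
        n, p = X.shape
        beta = np.zeros(p)
        intercept = y.mean()
        resid = y - intercept
        for _ in range(n_steps):
            slopes = X.T @ resid / (X ** 2).sum(axis=0)          # per-column OLS slopes
            sse = ((resid[:, None] - X * slopes) ** 2).sum(axis=0)
            j = np.argmin(sse)                                   # best-fitting predictor
            beta[j] += nu * slopes[j]
            resid -= nu * slopes[j] * X[:, j]
        return intercept, beta

    # simulated "genotype" matrix (0/1/2 allele counts) and additive phenotype
    rng = np.random.default_rng(0)
    n, p = 300, 2000
    X = rng.integers(0, 3, (n, p)).astype(float)
    X -= X.mean(axis=0)                                          # centre predictors
    effects = np.zeros(p)
    effects[:20] = rng.normal(0.0, 0.5, 20)
    y = X @ effects + rng.standard_normal(n)

    intercept, beta = l2_boost(X[:250], y[:250])
    pred = intercept + X[250:] @ beta
    print("test-set correlation:", round(np.corrcoef(pred, y[250:])[0, 1], 3))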
Jeong, Jin-Seok; Lee, Seung-Youp; Chang, Moontaek
2016-06-01
The aim of this study was to evaluate alterations of papilla dimensions after orthodontic closure of the diastema between the maxillary central incisors. Sixty patients who had a visible diastema between the maxillary central incisors that had been closed by orthodontic approximation were selected for this study. Various papilla dimensions were assessed on clinical photographs and study models before the orthodontic treatment and at the follow-up examination after closure of the diastema. Influences of the variables assessed before orthodontic treatment on the alterations of papilla height (PH) and papilla base thickness (PBT) were evaluated by univariate regression analysis. To analyze potential influences of the 3-dimensional papilla dimensions before orthodontic treatment on the alterations of PH and PBT, a multiple regression model was formulated including the 3-dimensional papilla dimensions as predictor variables. On average, PH decreased by 0.80 mm and PBT increased after orthodontic closure of the diastema (P<0.01). Univariate regression analysis revealed that the PH (P=0.002) and PBT (P=0.047) before orthodontic treatment influenced the alteration of PH. With respect to the alteration of PBT, the diastema width (P=0.045) and PBT (P<0.001) were found to be influential factors. PBT before the orthodontic treatment significantly influenced the alteration of PBT in the multiple regression model. PH decreased but PBT increased after orthodontic closure of the diastema. The papilla dimensions before orthodontic treatment influenced the alterations of PH and PBT after closure of the diastema. The PBT increased more when the diastema width before the orthodontic treatment was larger.
Experimental study on water content detection of traditional masonry based on infrared thermal image
NASA Astrophysics Data System (ADS)
Zhang, Baoqing; Lei, Zukang
2017-10-01
Based on infrared thermal imaging tests of water seepage in two kinds of brick masonry, the relationship between the one-dimensional surface temperature distribution and the one-dimensional surface moisture content of the bricks was examined. After seepage, regression equations were established for determining the minimum-temperature zone and the point of highest moisture content of the masonry, quantitatively relating the surface temperature of the brick masonry to its moisture content. On this basis, an initial analysis method for moisture-related deterioration of masonry buildings was established, so that infrared technology can be applied to the protection of historic buildings.
imDEV: a graphical user interface to R multivariate analysis tools in Microsoft Excel.
Grapov, Dmitry; Newman, John W
2012-09-01
Interactive modules for Data Exploration and Visualization (imDEV) is a Microsoft Excel spreadsheet-embedded application providing an integrated environment for the analysis of omics data through a user-friendly interface. Individual modules enable interactive and dynamic analyses of large data sets by interfacing R's multivariate statistics and highly customizable visualizations with the spreadsheet environment, aiding robust inferences and generating information-rich data visualizations. This tool provides access to multiple comparisons with false discovery correction, hierarchical clustering, principal and independent component analyses, partial least squares regression and discriminant analysis, through an intuitive interface for creating high-quality two- and three-dimensional visualizations including scatter plot matrices, distribution plots, dendrograms, heat maps, biplots, trellis biplots and correlation networks. Freely available for download at http://sourceforge.net/projects/imdev/. Implemented in R and VBA and supported by Microsoft Excel (2003, 2007 and 2010).
Transformations based on continuous piecewise-affine velocity fields
Freifeld, Oren; Hauberg, Soren; Batmanghelich, Kayhan; ...
2017-01-11
Here, we propose novel finite-dimensional spaces of well-behaved R^n → R^n transformations. The latter are obtained by (fast and highly-accurate) integration of continuous piecewise-affine velocity fields. The proposed method is simple yet highly expressive, effortlessly handles optional constraints (e.g., volume preservation and/or boundary conditions), and supports convenient modeling choices such as smoothing priors and coarse-to-fine analysis. Importantly, the proposed approach, partly due to its rapid likelihood evaluations and partly due to its other properties, facilitates tractable inference over rich transformation spaces, including using Markov-Chain Monte-Carlo methods. Its applications include, but are not limited to: monotonic regression (more generally, optimization over monotonic functions); modeling cumulative distribution functions or histograms; time-warping; image warping; image registration; real-time diffeomorphic image editing; data augmentation for image classifiers. Our GPU-based code is publicly available.
Transformations Based on Continuous Piecewise-Affine Velocity Fields
Freifeld, Oren; Hauberg, Søren; Batmanghelich, Kayhan; Fisher, John W.
2018-01-01
We propose novel finite-dimensional spaces of well-behaved ℝⁿ → ℝⁿ transformations. The latter are obtained by (fast and highly-accurate) integration of continuous piecewise-affine velocity fields. The proposed method is simple yet highly expressive, effortlessly handles optional constraints (e.g., volume preservation and/or boundary conditions), and supports convenient modeling choices such as smoothing priors and coarse-to-fine analysis. Importantly, the proposed approach, partly due to its rapid likelihood evaluations and partly due to its other properties, facilitates tractable inference over rich transformation spaces, including using Markov-Chain Monte-Carlo methods. Its applications include, but are not limited to: monotonic regression (more generally, optimization over monotonic functions); modeling cumulative distribution functions or histograms; time-warping; image warping; image registration; real-time diffeomorphic image editing; data augmentation for image classifiers. Our GPU-based code is publicly available. PMID:28092517
Model-Averaged ℓ1 Regularization using Markov Chain Monte Carlo Model Composition
Fraley, Chris; Percival, Daniel
2014-01-01
Bayesian Model Averaging (BMA) is an effective technique for addressing model uncertainty in variable selection problems. However, current BMA approaches have computational difficulty dealing with data in which there are many more measurements (variables) than samples. This paper presents a method for combining ℓ1 regularization and Markov chain Monte Carlo model composition techniques for BMA. By treating the ℓ1 regularization path as a model space, we propose a method to resolve the model uncertainty issues arising in model averaging from solution path point selection. We show that this method is computationally and empirically effective for regression and classification in high-dimensional datasets. We apply our technique in simulations, as well as to some applications that arise in genomics. PMID:25642001
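A simplified sketch of the central idea: the distinct supports along the ℓ1 regularization path define a model space, each support is refit by OLS, and models are weighted by a BIC approximation to their posterior probability. The paper explores this space with Markov chain Monte Carlo model composition; the plain enumeration below is only an illustration, and all data are simulated.

    import numpy as np
    from sklearn.linear_model import LinearRegression, lasso_path

    rng = np.random.default_rng(0)
    n, p = 120, 60
    X = rng.standard_normal((n, p))
    beta_true = np.zeros(p)
    beta_true[:5] = [2.0, -1.5, 1.0, 0.8, -0.6]
    y = X @ beta_true + rng.standard_normal(n)

    # distinct supports along the l1 path form the candidate model space
    _, coefs, _ = lasso_path(X, y, n_alphas=100)
    supports = {tuple(np.flatnonzero(coefs[:, k])) for k in range(coefs.shape[1])}
    supports = [s for s in supports if 0 < len(s) < n]

    bics, betas = [], []
    for s in supports:
        Xs = X[:, list(s)]
        fit = LinearRegression().fit(Xs, y)
        rss = float(((y - fit.predict(Xs)) ** 2).sum())
        bics.append(n * np.log(rss / n) + (len(s) + 1) * np.log(n))
        b = np.zeros(p)
        b[list(s)] = fit.coef_
        betas.append(b)

    # exp(-BIC/2) approximates each model's posterior weight (up to a constant)
    bics = np.array(bics)
    weights = np.exp(-0.5 * (bics - bics.min()))
    weights /= weights.sum()
    beta_bma = np.average(betas, axis=0, weights=weights)
    print("model-averaged coefficients (first 8):", np.round(beta_bma[:8], 2))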
Modelling the behaviour of unemployment rates in the US over time and across space
NASA Astrophysics Data System (ADS)
Holmes, Mark J.; Otero, Jesús; Panagiotidis, Theodore
2013-11-01
This paper provides evidence that unemployment rates across US states are stationary and therefore behave according to the natural rate hypothesis. We provide new insights by considering the effect of key variables on the speed of adjustment associated with unemployment shocks. A high-dimensional VAR analysis of the half-lives associated with shocks to unemployment rates in pairs of states suggests that the distance between states and vacancy rates exert a positive and a negative influence, respectively. We find that higher homeownership rates do not lead to higher half-lives. When the symmetry assumption is relaxed through quantile regression, support for the Oswald hypothesis through a positive relationship between homeownership rates and half-lives is found at the higher quantiles.
He, Ling Yan; Wang, Tie-Jun; Wang, Chuan
2016-07-11
High-dimensional quantum systems provide a higher capacity of quantum channel, which exhibits potential applications in quantum information processing. However, high-dimensional universal quantum logic gates are difficult to achieve directly with only high-dimensional interaction between two quantum systems, and a large number of two-dimensional gates are required to build even a small high-dimensional quantum circuit. In this paper, we propose a scheme to implement a general controlled-flip (CF) gate in which a high-dimensional single photon serves as the target qudit and stationary qubits work as the control logic qudit, by employing a three-level Λ-type system coupled with a whispering-gallery-mode microresonator. In our scheme, the required number of interactions between the photon and the solid-state system is greatly reduced compared with the traditional method, which decomposes the high-dimensional Hilbert space into 2-dimensional quantum spaces, and the implementation operates on a shorter temporal scale. Moreover, we discuss the performance and feasibility of our hybrid CF gate, concluding that it can be easily extended to the 2n-dimensional case and that it is feasible with current technology.
Butelman, Eduardo Roque; Bacciardi, Silvia; Maremmani, Angelo Giovanni Icro; Darst-Campbell, Maya; Correa da Rosa, Joel; Kreek, Mary Jeanne
2017-09-01
Addictions to heroin or to cocaine are associated with substantial psychiatric comorbidity, including depression. Poly-drug self-exposure (eg, to heroin, cocaine, cannabis, or alcohol) is also common, and may further affect depression comorbidity. This case-control study examined the relationship of exposure to the above drugs and depression comorbidity. Participants were recruited from methadone maintenance clinics, and from the community. Adult male and female participants (n = 1,201) were ascertained consecutively by experienced licensed clinicians. The instruments used were the SCID-I, and Kreek-McHugh-Schluger-Kellogg (KMSK) scales, which provide a rapid dimensional measure of maximal lifetime self-exposure to each of the above drugs. This measure ranges from no exposure to high unit dose, high frequency, and long duration of exposure. A multiple logistic regression with stepwise variable selection revealed that increasing exposure to heroin or to cocaine was associated with greater odds of depression, with all cases and controls combined. In cases with an opioid dependence diagnosis, increasing cocaine exposure was associated with a further increase in odds of depression. However, in cases with a cocaine dependence diagnosis, increasing exposure to either cannabis or alcohol, as well as heroin, was associated with a further increase in odds of depression. This dimensional analysis of exposure to specific drugs provides insights on depression comorbidity with addictive diseases, and the impact of poly-drug exposure. A rapid analysis of exposure to drugs of abuse reveals how specific patterns of drug and poly-drug exposure are associated with increasing odds of depression. This approach detected quantitatively how different patterns of poly-drug exposure can result in increased odds of depression comorbidity, in cases diagnosed with opioid versus cocaine dependence. (Am J Addict 2017;26:632-639). © 2017 American Academy of Addiction Psychiatry.
Lim, Jongguk; Kim, Giyoung; Mo, Changyeun; Kim, Moon S; Chao, Kuanglin; Qin, Jianwei; Fu, Xiaping; Baek, Insuck; Cho, Byoung-Kwan
2016-05-01
Illegal use of nitrogen-rich melamine (C₃H₆N₆) to boost the perceived protein content of food products such as milk, infant formula, frozen yogurt, pet food, biscuits, and coffee drinks has caused serious food safety problems. Conventional methods to detect melamine in foods, such as enzyme-linked immunosorbent assay (ELISA), high-performance liquid chromatography (HPLC), and gas chromatography-mass spectrometry (GC-MS), are sensitive, but they are time-consuming, expensive, and labor-intensive. In this research, a near-infrared (NIR) hyperspectral imaging technique combined with regression coefficients of a partial least squares regression (PLSR) model was used to detect melamine particles in milk powders easily and quickly. NIR hyperspectral reflectance imaging data in the spectral range of 990-1700 nm were acquired from melamine-milk powder mixture samples prepared at various concentrations ranging from 0.02% to 1%. PLSR models were developed to correlate the spectral data (independent variables) with the melamine concentration (dependent variable) in melamine-milk powder mixture samples. PLSR models applying various pretreatment methods were used to reconstruct the two-dimensional PLS images. PLS images were converted to binary images to detect the suspected melamine pixels in milk powder. As the melamine concentration was increased, the number of suspected melamine pixels in the binary images also increased. These results suggest that the NIR hyperspectral imaging technique and the PLSR model can be regarded as an effective tool to detect melamine particles in milk powders. Copyright © 2016 Elsevier B.V. All rights reserved.
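The regression step can be sketched as follows: a PLSR model is trained to map spectra to melamine concentration and is then applied pixel by pixel to a hyperspectral cube, with the resulting prediction image thresholded to flag suspected melamine pixels. The spectra, band count, concentrations, and threshold below are all simulated stand-ins for the 990-1700 nm measurements of the study.

    import numpy as np
    from sklearn.cross_decomposition import PLSRegression

    rng = np.random.default_rng(0)
    n_bands = 200                                   # stands in for the 990-1700 nm range
    bands = np.arange(n_bands)
    melamine_sig = np.exp(-0.5 * ((bands - 120) / 8.0) ** 2)   # hypothetical pure-component spectra
    milk_sig = np.exp(-0.5 * ((bands - 60) / 30.0) ** 2)

    conc = rng.uniform(0.0, 1.0, 150)               # melamine fraction in training mixtures
    spectra = (conc[:, None] * melamine_sig + (1.0 - conc[:, None]) * milk_sig
               + 0.01 * rng.standard_normal((150, n_bands)))

    pls = PLSRegression(n_components=5).fit(spectra, conc)

    # apply the model pixel-wise to a hyperspectral image cube and binarize
    cube = 0.98 * milk_sig + 0.01 * rng.standard_normal((50, 50, n_bands))
    cube[20:23, 30:33] += 0.5 * melamine_sig        # a small cluster of melamine pixels
    pred_img = pls.predict(cube.reshape(-1, n_bands)).reshape(50, 50)
    suspected = pred_img > 0.2                      # illustrative threshold
    print("suspected melamine pixels:", int(suspected.sum()))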
Yang, Xiaowei; Nie, Kun
2008-03-15
Longitudinal data sets in biomedical research often consist of large numbers of repeated measures. In many cases, the trajectories do not look globally linear or polynomial, making it difficult to summarize the data or test hypotheses using standard longitudinal data analysis based on various linear models. An alternative approach is to apply the approaches of functional data analysis, which directly target the continuous nonlinear curves underlying discretely sampled repeated measures. For the purposes of data exploration, many functional data analysis strategies have been developed based on various schemes of smoothing, but fewer options are available for making causal inferences regarding predictor-outcome relationships, a common task seen in hypothesis-driven medical studies. To compare groups of curves, two testing strategies with good power have been proposed for high-dimensional analysis of variance: the Fourier-based adaptive Neyman test and the wavelet-based thresholding test. Using a smoking cessation clinical trial data set, this paper demonstrates how to extend the strategies for hypothesis testing into the framework of functional linear regression models (FLRMs) with continuous functional responses and categorical or continuous scalar predictors. The analysis procedure consists of three steps: first, apply the Fourier or wavelet transform to the original repeated measures; then fit a multivariate linear model in the transformed domain; and finally, test the regression coefficients using either adaptive Neyman or thresholding statistics. Since a FLRM can be viewed as a natural extension of the traditional multiple linear regression model, the development of this model and computational tools should enhance the capacity of medical statistics for longitudinal data.
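The three-step procedure can be sketched for a two-group comparison: transform each subject's repeated measures with the discrete Fourier transform, form standardized group contrasts coefficient by coefficient, and combine them with an adaptive Neyman statistic. The statistic below follows the usual max-over-m form, but the exact standardization and null calibration of the paper may differ; significance is assessed here by permutation, and the data are simulated.

    import numpy as np

    def adaptive_neyman(z):
        # T_AN = max_m sum_{i<=m} (z_i^2 - 1) / sqrt(2m) over ordered coefficients
        cumsum = np.cumsum(z ** 2 - 1.0)
        m = np.arange(1, len(z) + 1)
        return np.max(cumsum / np.sqrt(2.0 * m))

    def group_contrast_stat(Y, group):
        # FFT each subject's curve, then standardized two-sample contrasts per coefficient
        coefs = np.fft.rfft(Y, axis=1).real          # real parts kept for simplicity
        a, b = coefs[group == 0], coefs[group == 1]
        se = np.sqrt(a.var(axis=0, ddof=1) / len(a) + b.var(axis=0, ddof=1) / len(b))
        z = (a.mean(axis=0) - b.mean(axis=0)) / np.maximum(se, 1e-12)
        return adaptive_neyman(z)

    # simulated repeated measures: 40 subjects, 64 time points, a small smooth group effect
    rng = np.random.default_rng(0)
    t = np.linspace(0.0, 1.0, 64)
    group = np.repeat([0, 1], 20)
    Y = rng.standard_normal((40, 64)) + np.sin(2.0 * np.pi * t)
    Y[group == 1] += 0.6 * np.sin(4.0 * np.pi * t)   # treatment effect at a low frequency

    observed = group_contrast_stat(Y, group)
    perms = [group_contrast_stat(Y, rng.permutation(group)) for _ in range(500)]
    print("permutation p-value:", np.mean([p >= observed for p in perms]))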
Evolution of intracranial atherosclerotic disease under modern medical therapy.
Leung, Thomas W; Wang, Lily; Soo, Yannie O Y; Ip, Vincent H L; Chan, Anne Y Y; Au, Lisa W C; Fan, Florence S Y; Lau, Alex Y L; Leung, Howan; Abrigo, Jill; Wong, Adrian; Mok, Vincent C T; Ng, Ping Wing; Tsoi, Tak Hong; Li, Siu Hung; Man, Celeste B L; Fong, Wing Chi; Wong, Ka Sing; Yu, Simon C H
2015-03-01
Understanding how symptomatic intracranial atherosclerotic disease (ICAD) evolves with current medical therapy may inform secondary stroke prevention. In a prospective academic-initiated study, we recruited 50 patients (mean age = 63.4 ± 9.0 years) with acute strokes attributed to high-grade (≥70%) intracranial atherosclerotic stenosis for 3-dimensional rotational angiograms before and after intensive medical therapy for 12 months. Treatment targets included low-density lipoprotein ≤ 70mg/dl, glycosylated hemoglobin (HbA1c) ≤ 6.5%, and systolic blood pressure ≤ 140 mmHg. We analyzed infarct topography and monitored microembolic signal in recurrent strokes. The reference group was a published cohort of 143 ICAD patients. Overall, the stenoses regressed from 79% at baseline (interquartile range [IQR] = 71-87%) to 63% (IQR = 54-74%) in 1 year (p < 0.001). Specifically, the qualifying lesions (n = 49) regressed (stenosis reduced >10%) in 24 patients (49%), remained quiescent (stenosis same or ±10%) in 21 patients (43%), and progressed (stenosis increased >10%) in 4 patients (8%). There was no difference in intensity of risk factor control between groups of diverging clinical or angiographic outcomes. Higher HbA1c at baseline predicted plaque regression at 1 year (odds ratio = 4.4, 95% confidence interval = 1.4-14.5, p = 0.006). Among the 6 patients with recurrent strokes pertaining to the qualifying stenosis, 5 patients had solitary or rosarylike acute infarcts along the internal or anterior border zones, and 2 patients showed microembolic signals in transcranial Doppler ultrasound. A majority of symptomatic high-grade intracranial plaques had regressed or remained quiescent by 12 months under intensive medical therapy. Artery-to-artery thromboembolism with impaired washout at border zones was a common mechanism in stroke recurrence. © 2014 American Neurological Association.
Investigation on the effect of diaphragm on the combustion characteristics of solid-fuel ramjet
NASA Astrophysics Data System (ADS)
Gong, Lunkun; Chen, Xiong; Yang, Haitao; Li, Weixuan; Zhou, Changsheng
2017-10-01
The flow field characteristics and the regression rate distribution of a solid-fuel ramjet with a three-hole diaphragm were investigated by numerical and experimental methods. The experimental data were obtained by burning high-density polyethylene using a connected-pipe facility to validate the numerical model and analyze the combustion efficiency of the solid-fuel ramjet. The three-dimensional code developed in the present study adopted third-order MUSCL and central difference schemes, the AUSMPW+ flux vector splitting method, and a second-order moment turbulence-chemistry model, together with the k-ω shear stress transport (SST) turbulence model. The solid fuel surface temperature was calculated with a fluid-solid heat coupling method. The numerical results show that strong circumferential flow exists in the region upstream of the diaphragm. The diaphragm can enhance the regression rate of the solid fuel in the region downstream of the diaphragm significantly, which mainly results from the increase of turbulent viscosity. As the diaphragm port area decreases, the regression rate of the solid fuel downstream of the diaphragm increases. The diaphragm can result in more sufficient mixing between the incoming air and fuel pyrolysis gases, while inevitably producing some pressure loss. The experimental results indicate that the effect of the diaphragm on the combustion efficiency of hydrocarbon fuels is slightly negative. It is conjectured that the diaphragm may have some positive effects on the combustion efficiency of solid fuels with metal particles.
Optical critical dimension metrology for directed self-assembly assisted contact hole shrink
NASA Astrophysics Data System (ADS)
Dixit, Dhairya; Green, Avery; Hosler, Erik R.; Kamineni, Vimal; Preil, Moshe E.; Keller, Nick; Race, Joseph; Chun, Jun Sung; O'Sullivan, Michael; Khare, Prasanna; Montgomery, Warren; Diebold, Alain C.
2016-01-01
Directed self-assembly (DSA) is a potential patterning solution for future generations of integrated circuits. Its main advantages are high pattern resolution (~10 nm), high throughput, no requirement for a high-resolution mask, and compatibility with standard fab equipment and processes. The application of Mueller matrix (MM) spectroscopic ellipsometry-based scatterometry to optically characterize DSA-patterned contact hole structures fabricated with phase-separated polystyrene-b-polymethylmethacrylate (PS-b-PMMA) is described. A regression-based approach is used to calculate the guide critical dimension (CD), DSA CD, height of the PS column, thicknesses of underlying layers, and contact edge roughness of the post-PMMA-etch DSA contact hole sample. Scanning electron microscopy and imaging analysis are conducted as a comparative metric for scatterometry. In addition, optical model-based simulations are used to investigate the MM elements' sensitivity to various DSA-based contact hole structures, predict sensitivity to dimensional changes, and assess the method's limits in characterizing DSA-induced defects, such as hole placement inaccuracy, missing vias, and profile inaccuracy of the PMMA cylinder.
Lee, Martha; Brauer, Michael; Wong, Paulina; Tang, Robert; Tsui, Tsz Him; Choi, Crystal; Cheng, Wei; Lai, Poh-Chin; Tian, Linwei; Thach, Thuan-Quoc; Allen, Ryan; Barratt, Benjamin
2017-08-15
Land use regression (LUR) is a common method of predicting spatial variability of air pollution to estimate exposure. Nitrogen dioxide (NO₂), nitric oxide (NO), fine particulate matter (PM₂.₅), and black carbon (BC) concentrations were measured during two sampling campaigns (April-May and November-January) in Hong Kong (a prototypical high-density high-rise city). Along with 365 potential geospatial predictor variables, these concentrations were used to build two-dimensional land use regression (LUR) models for the territory. Summary statistics for combined measurements over both campaigns were: a) NO₂ (Mean=106 μg/m³, SD=38.5, N=95), b) NO (M=147 μg/m³, SD=88.9, N=40), c) PM₂.₅ (M=35 μg/m³, SD=6.3, N=64), and d) BC (M=10.6 μg/m³, SD=5.3, N=76). Final LUR models had the following statistics: a) NO₂ (R²=0.46, RMSE=28 μg/m³), b) NO (R²=0.50, RMSE=62 μg/m³), c) PM₂.₅ (R²=0.59, RMSE=4 μg/m³), and d) BC (R²=0.50, RMSE=4 μg/m³). Traditional LUR predictors such as road length, car park density, and land use types were included in most models. The NO₂ prediction surface values were highest in Kowloon and the northern region of Hong Kong Island (downtown Hong Kong). NO showed a similar pattern in the built-up region. Both PM₂.₅ and BC predictions exhibited a northwest-southeast gradient, with higher concentrations in the north (close to mainland China). For BC, the port was also an area of elevated predicted concentrations. The results matched with existing literature on spatial variation in concentrations of air pollutants and in relation to important emission sources in Hong Kong. The success of these models suggests LUR is appropriate in high-density, high-rise cities. Copyright © 2017 Elsevier B.V. All rights reserved.
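As a rough illustration of the LUR workflow summarized above, the sketch below regresses measured NO₂ concentrations on a handful of candidate geospatial predictors and reports fit statistics. The predictor names and all data are simulated placeholders, not the study's variables or measurements.

```python
# Minimal land use regression (LUR) sketch: regress measured NO2 on candidate
# geospatial predictors and report in-sample R^2 and RMSE. The predictors
# (road_length_100m, car_park_density, port_distance_km) are illustrative
# placeholders, not the variables used in the Hong Kong study.
import numpy as np
import pandas as pd
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error, r2_score

rng = np.random.default_rng(0)
n_sites = 95  # number of NO2 monitoring sites in the study
sites = pd.DataFrame({
    "road_length_100m": rng.gamma(2.0, 500.0, n_sites),
    "car_park_density": rng.poisson(3.0, n_sites),
    "port_distance_km": rng.uniform(0.1, 20.0, n_sites),
})
# Synthetic "measured" NO2 concentrations just to make the example runnable.
no2 = (60 + 0.03 * sites["road_length_100m"] + 4.0 * sites["car_park_density"]
       - 1.2 * sites["port_distance_km"] + rng.normal(0, 15, n_sites))

model = LinearRegression().fit(sites, no2)
pred = model.predict(sites)
rmse = float(np.sqrt(mean_squared_error(no2, pred)))
print("R^2 =", round(r2_score(no2, pred), 2), "RMSE =", round(rmse, 1), "ug/m3")
```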
Kehimkar, Benjamin; Hoggard, Jamin C; Marney, Luke C; Billingsley, Matthew C; Fraga, Carlos G; Bruno, Thomas J; Synovec, Robert E
2014-01-31
There is an increased need to more fully assess and control the composition of kerosene-based rocket propulsion fuels such as RP-1. In particular, it is critical to make better quantitative connections among the following three attributes: fuel performance (thermal stability, sooting propensity, engine specific impulse, etc.), fuel properties (such as flash point, density, kinematic viscosity, net heat of combustion, and hydrogen content), and the chemical composition of a given fuel, i.e., amounts of specific chemical compounds and compound classes present in a fuel as a result of feedstock blending and/or processing. Recent efforts in predicting fuel chemical and physical behavior through modeling put greater emphasis on attaining detailed and accurate fuel properties and fuel composition information. Often, one-dimensional gas chromatography (GC) combined with mass spectrometry (MS) is employed to provide chemical composition information. Building on approaches that used GC-MS, and to glean substantially more chemical information from these complex fuels, we recently studied the use of comprehensive two-dimensional (2D) gas chromatography combined with time-of-flight mass spectrometry (GC×GC-TOFMS) using a "reversed column" format: an RTX-wax column for the first dimension, and an RTX-1 column for the second dimension. In this report, by applying chemometric data analysis, specifically partial least-squares (PLS) regression analysis, we are able to readily model (and correlate) the chemical compositional information provided by GC×GC-TOFMS with RP-1 fuel property information such as density, kinematic viscosity, net heat of combustion, and so on. Furthermore, we readily identified compounds that contribute significantly to measured differences in fuel properties based on results from the PLS models. We anticipate this new chemical analysis strategy will have broad implications for the development of high-fidelity composition-property models, leading to an improved approach to fuel formulation and specification for advanced engine cycles. Copyright © 2014 Elsevier B.V. All rights reserved.
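A minimal sketch of the chemometric step described above: partial least-squares regression relating a (here simulated) GC×GC feature matrix to one fuel property, with leave-one-out cross-validation giving an honest error estimate for a small sample set. The data are random stand-ins, not real chromatograms.

```python
# Sketch of relating GCxGC peak-table features to a fuel property with partial
# least squares (PLS) regression. The feature matrix is random stand-in data.
import numpy as np
from sklearn.cross_decomposition import PLSRegression
from sklearn.model_selection import LeaveOneOut, cross_val_predict

rng = np.random.default_rng(1)
n_fuels, n_features = 20, 500           # e.g., 20 RP-1 samples, 500 aligned peaks
X = rng.normal(size=(n_fuels, n_features))
density = 0.80 + X[:, :5] @ rng.normal(0.002, 0.001, 5) + rng.normal(0, 0.001, n_fuels)

pls = PLSRegression(n_components=3)
# Leave-one-out cross-validated predictions give an RMSECV suited to small n.
pred = cross_val_predict(pls, X, density, cv=LeaveOneOut()).ravel()
rmsecv = float(np.sqrt(np.mean((pred - density) ** 2)))
print(f"RMSECV for density: {rmsecv:.4f} g/mL")
```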
EUCLID: an outcome analysis tool for high-dimensional clinical studies
NASA Astrophysics Data System (ADS)
Gayou, Olivier; Parda, David S.; Miften, Moyed
2007-03-01
Treatment management decisions in three-dimensional conformal radiation therapy (3DCRT) and intensity-modulated radiation therapy (IMRT) are usually made based on the dose distributions in the target and surrounding normal tissue. These decisions may include, for example, the choice of one treatment over another and the level of tumour dose escalation. Furthermore, biological predictors such as tumour control probability (TCP) and normal tissue complication probability (NTCP), whose parameters available in the literature are only population-based estimates, are often used to assess and compare plans. However, a number of other clinical, biological and physiological factors also affect the outcome of radiotherapy treatment and are often not considered in the treatment planning and evaluation process. A statistical outcome analysis tool, EUCLID, for direct use by radiation oncologists and medical physicists was developed. The tool builds a mathematical model to predict an outcome probability based on a large number of clinical, biological, physiological and dosimetric factors. EUCLID can first analyse a large set of patients, such as from a clinical trial, to derive regression correlation coefficients between these factors and a given outcome. It can then apply such a model to an individual patient at the time of treatment to derive the probability of that outcome, allowing the physician to individualize the treatment based on medical evidence that encompasses a wide range of factors. The software's flexibility allows the clinicians to explore several avenues to select the best predictors of a given outcome. Its link to record-and-verify systems and data spreadsheets allows for a rapid and practical data collection and manipulation. A wide range of statistical information about the study population, including demographics and correlations between different factors, is available. A large number of one- and two-dimensional plots, histograms and survival curves allow for an easy visual analysis of the population. Several visual and analytical methods are available to quantify the predictive power of the multivariate regression model. The EUCLID tool can be readily integrated with treatment planning and record-and-verify systems.
EUCLID: an outcome analysis tool for high-dimensional clinical studies.
Gayou, Olivier; Parda, David S; Miften, Moyed
2007-03-21
Treatment management decisions in three-dimensional conformal radiation therapy (3DCRT) and intensity-modulated radiation therapy (IMRT) are usually made based on the dose distributions in the target and surrounding normal tissue. These decisions may include, for example, the choice of one treatment over another and the level of tumour dose escalation. Furthermore, biological predictors such as tumour control probability (TCP) and normal tissue complication probability (NTCP), whose parameters available in the literature are only population-based estimates, are often used to assess and compare plans. However, a number of other clinical, biological and physiological factors also affect the outcome of radiotherapy treatment and are often not considered in the treatment planning and evaluation process. A statistical outcome analysis tool, EUCLID, for direct use by radiation oncologists and medical physicists was developed. The tool builds a mathematical model to predict an outcome probability based on a large number of clinical, biological, physiological and dosimetric factors. EUCLID can first analyse a large set of patients, such as from a clinical trial, to derive regression correlation coefficients between these factors and a given outcome. It can then apply such a model to an individual patient at the time of treatment to derive the probability of that outcome, allowing the physician to individualize the treatment based on medical evidence that encompasses a wide range of factors. The software's flexibility allows the clinicians to explore several avenues to select the best predictors of a given outcome. Its link to record-and-verify systems and data spreadsheets allows for a rapid and practical data collection and manipulation. A wide range of statistical information about the study population, including demographics and correlations between different factors, is available. A large number of one- and two-dimensional plots, histograms and survival curves allow for an easy visual analysis of the population. Several visual and analytical methods are available to quantify the predictive power of the multivariate regression model. The EUCLID tool can be readily integrated with treatment planning and record-and-verify systems.
Measurement of ventricular torsion by two-dimensional ultrasound speckle tracking imaging.
Notomi, Yuichi; Lysyansky, Peter; Setser, Randolph M; Shiota, Takahiro; Popović, Zoran B; Martin-Miklovic, Maureen G; Weaver, Joan A; Oryszak, Stephanie J; Greenberg, Neil L; White, Richard D; Thomas, James D
2005-06-21
We sought to examine the accuracy/consistency of a novel ultrasound speckle tracking imaging (STI) method for left ventricular torsion (LVtor) measurement in comparison with tagged magnetic resonance imaging (MRI) (a time-domain method similar to STI) and Doppler tissue imaging (DTI) (a velocity-based approach). Left ventricular torsion from helically oriented myofibers is a key parameter of cardiac performance but is difficult to measure. Ultrasound STI is potentially suitable for measurement of angular motion because of its angle-independence. We acquired basal and apical short-axis left ventricular (LV) images in 15 patients to estimate LVtor by STI and compare it with tagged MRI and DTI. Left ventricular torsion was defined as the net difference of LV rotation at the basal and apical planes. For the STI analysis, we used high-frame-rate (104 ± 12 frames/s) second harmonic two-dimensional images. Data on 13 of 15 patients were usable for STI analysis, and the LVtor profile estimated by STI strongly correlated with that by tagged MRI (y = 0.95x + 0.19, r = 0.93, p < 0.0001, analyzed by repeated-measures regression models). The STI torsional velocity profile also correlated well with that by the DTI method (y = 0.79x + 2.4, r = 0.76, p < 0.0001, by repeated-measures regression models) with acceptable bias. The STI estimation of LVtor is concordant with that analyzed by tagged MRI (data derived from tissue displacement) and also showed good agreement with that by DTI (data derived from tissue velocity). Ultrasound STI is a promising new method to assess LV torsional deformation and may make the assessment more available in clinical and research cardiology.
Freye, Chris E; Fitz, Brian D; Billingsley, Matthew C; Synovec, Robert E
2016-06-01
The chemical composition and several physical properties of RP-1 fuels were studied using comprehensive two-dimensional (2D) gas chromatography (GC×GC) coupled with flame ionization detection (FID). A "reversed column" GC×GC configuration was implemented with an RTX-wax column as the first dimension (¹D) and an RTX-1 column as the second dimension (²D). Modulation was achieved using a high-temperature diaphragm valve mounted directly in the oven. Using leave-one-out cross-validation (LOOCV), the summed GC×GC-FID signal of three compound-class selective 2D regions (alkanes, cycloalkanes, and aromatics) was regressed against previously measured ASTM-derived values for these compound classes, yielding root mean square errors of cross-validation (RMSECV) of 0.855, 0.734, and 0.530 mass%, respectively. For comparison, using partial least squares (PLS) analysis with LOOCV, the GC×GC-FID signal of the entire 2D separations was regressed against the same ASTM values, yielding a linear trend for the three compound classes (alkanes, cycloalkanes, and aromatics) with RMSECV values of 1.52, 2.76, and 0.945 mass%, respectively. Additionally, a more detailed PLS analysis was undertaken of the compound classes (n-alkanes, iso-alkanes, mono-, di-, and tri-cycloalkanes, and aromatics), and of physical properties previously determined by ASTM methods (such as net heat of combustion, hydrogen content, density, kinematic viscosity, sustained boiling temperature and vapor rise temperature). Results from these PLS studies using the relatively simple-to-use and inexpensive GC×GC-FID instrumental platform are compared to previously reported results using the GC×GC-TOFMS instrumental platform. Copyright © 2016 Elsevier B.V. All rights reserved.
Partitioned learning of deep Boltzmann machines for SNP data.
Hess, Moritz; Lenz, Stefan; Blätte, Tamara J; Bullinger, Lars; Binder, Harald
2017-10-15
Learning the joint distributions of measurements, and in particular identification of an appropriate low-dimensional manifold, has been found to be a powerful ingredient of deep learning approaches. Yet, such approaches have hardly been applied to single nucleotide polymorphism (SNP) data, probably due to the high number of features typically exceeding the number of studied individuals. After a brief overview of how deep Boltzmann machines (DBMs), a deep learning approach, can be adapted to SNP data in principle, we specifically present a way to alleviate the dimensionality problem by partitioned learning. We propose a sparse regression approach to coarsely screen the joint distribution of SNPs, followed by training several DBMs on SNP partitions that were identified by the screening. Aggregate features representing SNP patterns and the corresponding SNPs are extracted from the DBMs by a combination of statistical tests and sparse regression. In simulated case-control data, we show how this can uncover complex SNP patterns and augment results from univariate approaches, while maintaining type I error control. Time-to-event endpoints are considered in an application with acute myeloid leukemia patients, where SNP patterns are modeled after a pre-screening based on gene expression data. The proposed approach identified three SNPs that seem to jointly influence survival in a validation dataset. This indicates the added value of jointly investigating SNPs compared to standard univariate analyses and makes partitioned learning of DBMs an interesting complementary approach when analyzing SNP data. A Julia package is provided at 'http://github.com/binderh/BoltzmannMachines.jl'. binderh@imbi.uni-freiburg.de. Supplementary data are available at Bioinformatics online. © The Author (2017). Published by Oxford University Press. All rights reserved. For Permissions, please email: journals.permissions@oup.com
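The partitioned DBM training itself is provided by the authors' Julia package; the sketch below only illustrates the coarse screening idea in Python, using an L1-penalized logistic regression on simulated genotype data to select SNPs that would then be passed on to partitioned model training. All names and data are hypothetical.

```python
# Illustrative sketch of the coarse screening step only: an L1-penalized
# logistic regression selects a subset of SNPs whose joint distribution would
# then be modeled by partitioned deep Boltzmann machines (the DBM step is not
# reproduced here; see the authors' BoltzmannMachines.jl package).
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(2)
n_samples, n_snps = 200, 5000
snps = rng.integers(0, 3, size=(n_samples, n_snps))   # 0/1/2 genotype coding
status = rng.integers(0, 2, size=n_samples)           # simulated case/control labels

screen = LogisticRegression(penalty="l1", solver="saga", C=0.05, max_iter=5000)
screen.fit(snps, status)
selected = np.flatnonzero(screen.coef_[0])
print(f"{selected.size} SNPs retained for partitioned DBM training")
```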
Foundational Principles for Large-Scale Inference: Illustrations Through Correlation Mining
DOE Office of Scientific and Technical Information (OSTI.GOV)
Hero, Alfred O.; Rajaratnam, Bala
When can reliable inference be drawn in the "Big Data" context? This article presents a framework for answering this fundamental question in the context of correlation mining, with implications for general large-scale inference. In large-scale data applications like genomics, connectomics, and eco-informatics, the data set is often variable-rich but sample-starved: a regime where the number n of acquired samples (statistical replicates) is far fewer than the number p of observed variables (genes, neurons, voxels, or chemical constituents). Much of recent work has focused on understanding the computational complexity of proposed methods for "Big Data." Sample complexity, however, has received relatively less attention, especially in the setting when the sample size n is fixed, and the dimension p grows without bound. To address this gap, we develop a unified statistical framework that explicitly quantifies the sample complexity of various inferential tasks. Sampling regimes can be divided into several categories: 1) the classical asymptotic regime where the variable dimension is fixed and the sample size goes to infinity; 2) the mixed asymptotic regime where both variable dimension and sample size go to infinity at comparable rates; and 3) the purely high-dimensional asymptotic regime where the variable dimension goes to infinity and the sample size is fixed. Each regime has its niche but only the latter regime applies to exa-scale data dimension. We illustrate this high-dimensional framework for the problem of correlation mining, where it is the matrix of pairwise and partial correlations among the variables that are of interest. Correlation mining arises in numerous applications and subsumes the regression context as a special case. We demonstrate various regimes of correlation mining based on the unifying perspective of high-dimensional learning rates and sample complexity for different structured covariance models and different inference tasks.
Foundational Principles for Large-Scale Inference: Illustrations Through Correlation Mining
Hero, Alfred O.; Rajaratnam, Bala
2015-01-01
When can reliable inference be drawn in the "Big Data" context? This paper presents a framework for answering this fundamental question in the context of correlation mining, with implications for general large-scale inference. In large-scale data applications like genomics, connectomics, and eco-informatics, the dataset is often variable-rich but sample-starved: a regime where the number n of acquired samples (statistical replicates) is far fewer than the number p of observed variables (genes, neurons, voxels, or chemical constituents). Much of recent work has focused on understanding the computational complexity of proposed methods for "Big Data". Sample complexity, however, has received relatively less attention, especially in the setting when the sample size n is fixed, and the dimension p grows without bound. To address this gap, we develop a unified statistical framework that explicitly quantifies the sample complexity of various inferential tasks. Sampling regimes can be divided into several categories: 1) the classical asymptotic regime where the variable dimension is fixed and the sample size goes to infinity; 2) the mixed asymptotic regime where both variable dimension and sample size go to infinity at comparable rates; 3) the purely high-dimensional asymptotic regime where the variable dimension goes to infinity and the sample size is fixed. Each regime has its niche but only the latter regime applies to exa-scale data dimension. We illustrate this high-dimensional framework for the problem of correlation mining, where it is the matrix of pairwise and partial correlations among the variables that are of interest. Correlation mining arises in numerous applications and subsumes the regression context as a special case. We demonstrate various regimes of correlation mining based on the unifying perspective of high-dimensional learning rates and sample complexity for different structured covariance models and different inference tasks. PMID:27087700
Foundational Principles for Large-Scale Inference: Illustrations Through Correlation Mining
Hero, Alfred O.; Rajaratnam, Bala
2015-12-09
When can reliable inference be drawn in the "Big Data" context? This article presents a framework for answering this fundamental question in the context of correlation mining, with implications for general large-scale inference. In large-scale data applications like genomics, connectomics, and eco-informatics, the data set is often variable-rich but sample-starved: a regime where the number n of acquired samples (statistical replicates) is far fewer than the number p of observed variables (genes, neurons, voxels, or chemical constituents). Much of recent work has focused on understanding the computational complexity of proposed methods for "Big Data." Sample complexity, however, has received relatively less attention, especially in the setting when the sample size n is fixed, and the dimension p grows without bound. To address this gap, we develop a unified statistical framework that explicitly quantifies the sample complexity of various inferential tasks. Sampling regimes can be divided into several categories: 1) the classical asymptotic regime where the variable dimension is fixed and the sample size goes to infinity; 2) the mixed asymptotic regime where both variable dimension and sample size go to infinity at comparable rates; and 3) the purely high-dimensional asymptotic regime where the variable dimension goes to infinity and the sample size is fixed. Each regime has its niche but only the latter regime applies to exa-scale data dimension. We illustrate this high-dimensional framework for the problem of correlation mining, where it is the matrix of pairwise and partial correlations among the variables that are of interest. Correlation mining arises in numerous applications and subsumes the regression context as a special case. We demonstrate various regimes of correlation mining based on the unifying perspective of high-dimensional learning rates and sample complexity for different structured covariance models and different inference tasks.
An efficient surrogate-based simulation-optimization method for calibrating a regional MODFLOW model
NASA Astrophysics Data System (ADS)
Chen, Mingjie; Izady, Azizallah; Abdalla, Osman A.
2017-01-01
Simulation-optimization methods entail a large number of model simulations, which is computationally intensive or even prohibitive if each model simulation is extremely time-consuming. Statistical models have been examined as surrogates of the high-fidelity physical model during the simulation-optimization process to tackle this problem. Among them, Multivariate Adaptive Regression Splines (MARS), a non-parametric adaptive regression method, is superior in overcoming problems of high dimensionality and discontinuities in the data. Furthermore, the stability and accuracy of the MARS model can be improved by bootstrap aggregating methods, namely, bagging. In this paper, the Bagging MARS (BMARS) method is integrated into a surrogate-based simulation-optimization framework to calibrate a three-dimensional MODFLOW model, which is developed to simulate the groundwater flow in an arid hardrock-alluvium region in northwestern Oman. The physical MODFLOW model is surrogated by the statistical model developed using the BMARS algorithm. The surrogate model, which is fitted and validated using a training dataset generated by the physical model, can approximate solutions rapidly. An efficient Sobol' method is employed to calculate global sensitivities of head outputs to input parameters, which are used to analyze their importance for the model outputs spatiotemporally. Only sensitive parameters are included in the calibration process to further improve the computational efficiency. The normalized root mean square error (NRMSE) between measured and simulated heads at observation wells is used as the objective function to be minimized during optimization. The reasonable history match between the simulated and observed heads demonstrates the feasibility of this highly efficient calibration framework.
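Two ingredients of the calibration framework, Sobol' sensitivity screening and the NRMSE objective, can be sketched as follows. A cheap analytic function stands in for the (bagged MARS) surrogate of MODFLOW head outputs, the parameter names are hypothetical, and the SALib package is assumed to be available.

```python
# Sketch of Sobol' sensitivity screening and the NRMSE calibration objective.
# The "surrogate_head" function is a stand-in for the BMARS surrogate; the
# parameter names and bounds are illustrative only.
import numpy as np
from SALib.sample import saltelli
from SALib.analyze import sobol

problem = {
    "num_vars": 3,
    "names": ["hydraulic_conductivity", "recharge", "specific_yield"],
    "bounds": [[0.1, 10.0], [1e-4, 1e-2], [0.01, 0.3]],
}

def surrogate_head(params):
    """Stand-in for the surrogate model of a MODFLOW head output."""
    k, r, sy = params
    return 50.0 + 3.0 * np.log(k) + 800.0 * r - 10.0 * sy

X = saltelli.sample(problem, 256)
Y = np.array([surrogate_head(x) for x in X])
Si = sobol.analyze(problem, Y)
print("first-order Sobol' indices:", dict(zip(problem["names"], Si["S1"].round(3))))

def nrmse(simulated, observed):
    """Normalized RMSE between simulated and observed heads (calibration objective)."""
    rmse = np.sqrt(np.mean((simulated - observed) ** 2))
    return rmse / (observed.max() - observed.min())
```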
Optimizing human activity patterns using global sensitivity analysis.
Fairchild, Geoffrey; Hickmann, Kyle S; Mniszewski, Susan M; Del Valle, Sara Y; Hyman, James M
2014-12-01
Implementing realistic activity patterns for a population is crucial for modeling, for example, disease spread, supply and demand, and disaster response. Using the dynamic activity simulation engine, DASim, we generate schedules for a population that capture regular (e.g., working, eating, and sleeping) and irregular activities (e.g., shopping or going to the doctor). We use the sample entropy (SampEn) statistic to quantify a schedule's regularity for a population. We show how to tune an activity's regularity by adjusting SampEn, thereby making it possible to realistically design activities when creating a schedule. The tuning process sets up a computationally intractable high-dimensional optimization problem. To reduce the computational demand, we use Bayesian Gaussian process regression to compute global sensitivity indices and identify the parameters that have the greatest effect on the variance of SampEn. We use the harmony search (HS) global optimization algorithm to locate global optima. Our results show that HS combined with global sensitivity analysis can efficiently tune the SampEn statistic with few search iterations. We demonstrate how global sensitivity analysis can guide statistical emulation and global optimization algorithms to efficiently tune activities and generate realistic activity patterns. Though our tuning methods are applied to dynamic activity schedule generation, they are general and represent a significant step in the direction of automated tuning and optimization of high-dimensional computer simulations.
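The regularity statistic being tuned, sample entropy, is compact enough to sketch directly. The implementation below is a generic SampEn(m, r) rather than the authors' DASim code, and the two test series are synthetic.

```python
# Compact sample entropy (SampEn) implementation: lower values indicate a more
# regular series. Schedule values are encoded here as a 1-D numeric series.
import numpy as np

def sample_entropy(x, m=2, r=0.2):
    """SampEn(m, r) of series x, with tolerance r given as a fraction of the std."""
    x = np.asarray(x, dtype=float)
    tol = r * x.std()
    n_templ = len(x) - m          # same number of templates for lengths m and m+1

    def count_matches(length):
        templates = np.array([x[i:i + length] for i in range(n_templ)])
        # Chebyshev distance between all template pairs, self-matches excluded.
        dist = np.max(np.abs(templates[:, None, :] - templates[None, :, :]), axis=2)
        return (dist[np.triu_indices(n_templ, k=1)] <= tol).sum()

    b = count_matches(m)
    a = count_matches(m + 1)
    return -np.log(a / b) if a > 0 and b > 0 else np.inf

rng = np.random.default_rng(3)
regular = np.tile([1, 2, 3, 4], 50) + rng.normal(0, 0.05, 200)
irregular = rng.normal(0, 1, 200)
print("regular schedule SampEn:", round(sample_entropy(regular), 3))
print("irregular schedule SampEn:", round(sample_entropy(irregular), 3))
```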
Optimizing human activity patterns using global sensitivity analysis
Hickmann, Kyle S.; Mniszewski, Susan M.; Del Valle, Sara Y.; Hyman, James M.
2014-01-01
Implementing realistic activity patterns for a population is crucial for modeling, for example, disease spread, supply and demand, and disaster response. Using the dynamic activity simulation engine, DASim, we generate schedules for a population that capture regular (e.g., working, eating, and sleeping) and irregular activities (e.g., shopping or going to the doctor). We use the sample entropy (SampEn) statistic to quantify a schedule’s regularity for a population. We show how to tune an activity’s regularity by adjusting SampEn, thereby making it possible to realistically design activities when creating a schedule. The tuning process sets up a computationally intractable high-dimensional optimization problem. To reduce the computational demand, we use Bayesian Gaussian process regression to compute global sensitivity indices and identify the parameters that have the greatest effect on the variance of SampEn. We use the harmony search (HS) global optimization algorithm to locate global optima. Our results show that HS combined with global sensitivity analysis can efficiently tune the SampEn statistic with few search iterations. We demonstrate how global sensitivity analysis can guide statistical emulation and global optimization algorithms to efficiently tune activities and generate realistic activity patterns. Though our tuning methods are applied to dynamic activity schedule generation, they are general and represent a significant step in the direction of automated tuning and optimization of high-dimensional computer simulations. PMID:25580080
Optimizing human activity patterns using global sensitivity analysis
Fairchild, Geoffrey; Hickmann, Kyle S.; Mniszewski, Susan M.; ...
2013-12-10
Implementing realistic activity patterns for a population is crucial for modeling, for example, disease spread, supply and demand, and disaster response. Using the dynamic activity simulation engine, DASim, we generate schedules for a population that capture regular (e.g., working, eating, and sleeping) and irregular activities (e.g., shopping or going to the doctor). We use the sample entropy (SampEn) statistic to quantify a schedule’s regularity for a population. We show how to tune an activity’s regularity by adjusting SampEn, thereby making it possible to realistically design activities when creating a schedule. The tuning process sets up a computationally intractable high-dimensional optimization problem. To reduce the computational demand, we use Bayesian Gaussian process regression to compute global sensitivity indices and identify the parameters that have the greatest effect on the variance of SampEn. Here we use the harmony search (HS) global optimization algorithm to locate global optima. Our results show that HS combined with global sensitivity analysis can efficiently tune the SampEn statistic with few search iterations. We demonstrate how global sensitivity analysis can guide statistical emulation and global optimization algorithms to efficiently tune activities and generate realistic activity patterns. Finally, though our tuning methods are applied to dynamic activity schedule generation, they are general and represent a significant step in the direction of automated tuning and optimization of high-dimensional computer simulations.
Tello, Javier; Cubero, Sergio; Blasco, José; Tardaguila, Javier; Aleixos, Nuria; Ibáñez, Javier
2016-10-01
Grapevine cluster morphology influences the quality and commercial value of wine and table grapes. It is routinely evaluated by subjective and inaccurate methods that do not meet the requirements set by the food industry. Novel two-dimensional (2D) and three-dimensional (3D) machine vision technologies emerge as promising tools for its automatic and fast evaluation. The automatic evaluation of cluster length, width and elongation was successfully achieved by the analysis of 2D images, significant and strong correlations with the manual methods being found (r = 0.959, 0.861 and 0.852, respectively). The classification of clusters according to their shape can be achieved by evaluating their conicity in different sections of the cluster. The geometric reconstruction of the morphological volume of the cluster from 2D features worked better than the direct 3D laser scanning system, showing a high correlation (r = 0.956) with the manual approach (water displacement method). In addition, we constructed and validated a simple linear regression model for cluster compactness estimation. It showed a high predictive capacity for both the training and validation subsets of clusters (R² = 84.5% and 71.1%, respectively). The methodologies proposed in this work provide continuous and accurate data for the fast and objective characterisation of cluster morphology. © 2016 Society of Chemical Industry.
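A minimal sketch of the compactness model: an ordinary linear regression fit on a training subset of clusters and checked on a held-out validation subset. The image-derived predictors and all values are illustrative placeholders, not the features used in the paper.

```python
# Sketch of a cluster-compactness regression with a train/validation split.
# Feature names and the synthetic "compactness" target are placeholders.
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import train_test_split
from sklearn.metrics import r2_score

rng = np.random.default_rng(4)
n_clusters = 90
features = np.column_stack([
    rng.uniform(8, 25, n_clusters),    # cluster length (cm) from 2D image
    rng.uniform(5, 15, n_clusters),    # cluster width (cm) from 2D image
    rng.uniform(50, 400, n_clusters),  # berry count from image analysis
])
compactness = (0.1 * features[:, 2] / (features[:, 0] * features[:, 1])
               + rng.normal(0, 0.02, n_clusters))

X_tr, X_va, y_tr, y_va = train_test_split(features, compactness,
                                          test_size=0.3, random_state=0)
model = LinearRegression().fit(X_tr, y_tr)
print("training R^2:", round(r2_score(y_tr, model.predict(X_tr)), 3))
print("validation R^2:", round(r2_score(y_va, model.predict(X_va)), 3))
```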
Updated generalized biomass equations for North American tree species
David C. Chojnacky; Linda S. Heath; Jennifer C. Jenkins
2014-01-01
Historically, tree biomass at large scales has been estimated by applying dimensional analysis techniques and field measurements such as diameter at breast height (dbh) in allometric regression equations. Equations often have been developed using differing methods and applied only to certain species or isolated areas. We previously had compiled and combined (in meta-...
Kim, Yun Hak; Jeong, Dae Cheon; Pak, Kyoungjune; Goh, Tae Sik; Lee, Chi-Seung; Han, Myoung-Eun; Kim, Ji-Young; Liangwen, Liu; Kim, Chi Dae; Jang, Jeon Yeob; Cha, Wonjae; Oh, Sae-Ock
2017-09-29
Accurate prediction of prognosis is critical for therapeutic decisions regarding cancer patients. Many previously developed prognostic scoring systems have limitations in reflecting recent progress in the field of cancer biology such as microarray, next-generation sequencing, and signaling pathways. To develop a new prognostic scoring system for cancer patients, we used mRNA expression and clinical data in various independent breast cancer cohorts (n=1214) from the Molecular Taxonomy of Breast Cancer International Consortium (METABRIC) and Gene Expression Omnibus (GEO). A new prognostic score that reflects gene network inherent in genomic big data was calculated using Network-Regularized high-dimensional Cox-regression (Net-score). We compared its discriminatory power with those of two previously used statistical methods: stepwise variable selection via univariate Cox regression (Uni-score) and Cox regression via Elastic net (Enet-score). The Net scoring system showed better discriminatory power in prediction of disease-specific survival (DSS) than other statistical methods (p=0 in METABRIC training cohort, p=0.000331, 4.58e-06 in two METABRIC validation cohorts) when accuracy was examined by log-rank test. Notably, comparison of C-index and AUC values in receiver operating characteristic analysis at 5 years showed fewer differences between training and validation cohorts with the Net scoring system than other statistical methods, suggesting minimal overfitting. The Net-based scoring system also successfully predicted prognosis in various independent GEO cohorts with high discriminatory power. In conclusion, the Net-based scoring system showed better discriminative power than previous statistical methods in prognostic prediction for breast cancer patients. This new system will mark a new era in prognosis prediction for cancer patients.
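The network-regularized Cox penalty (Net-score) is the paper's own construction and is not available in standard libraries. As a rough analogue, the sketch below fits the elastic-net-penalized Cox comparator (the Enet-score) with the lifelines package on simulated expression data and evaluates discriminatory power by the C-index on a held-out cohort; all data and gene names are synthetic.

```python
# Elastic-net-penalized Cox regression (Enet-style comparator) with lifelines,
# scored by the concordance index on a held-out cohort. Simulated data only.
import numpy as np
import pandas as pd
from lifelines import CoxPHFitter
from lifelines.utils import concordance_index

rng = np.random.default_rng(5)
n_patients, n_genes = 300, 50
expr = pd.DataFrame(rng.normal(size=(n_patients, n_genes)),
                    columns=[f"gene_{i}" for i in range(n_genes)])
risk = expr["gene_0"] - 0.5 * expr["gene_1"]
expr["time"] = rng.exponential(np.exp(-risk))        # survival times tied to risk
expr["event"] = rng.integers(0, 2, n_patients)       # 1 = disease-specific death

train, valid = expr.iloc[:200], expr.iloc[200:]
cph = CoxPHFitter(penalizer=0.1, l1_ratio=0.5)       # elastic-net penalty
cph.fit(train, duration_col="time", event_col="event")

# Discriminatory power on the validation cohort (higher C-index = better).
risk_scores = cph.predict_partial_hazard(valid.drop(columns=["time", "event"]))
c_index = concordance_index(valid["time"], -risk_scores, valid["event"])
print("validation C-index:", round(c_index, 3))
```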
Wendling, T; Jung, K; Callahan, A; Schuler, A; Shah, N H; Gallego, B
2018-06-03
There is growing interest in using routinely collected data from health care databases to study the safety and effectiveness of therapies in "real-world" conditions, as it can provide complementary evidence to that of randomized controlled trials. Causal inference from health care databases is challenging because the data are typically noisy, high dimensional, and most importantly, observational. It requires methods that can estimate heterogeneous treatment effects while controlling for confounding in high dimensions. Bayesian additive regression trees, causal forests, causal boosting, and causal multivariate adaptive regression splines are off-the-shelf methods that have shown good performance for estimation of heterogeneous treatment effects in observational studies of continuous outcomes. However, it is not clear how these methods would perform in health care database studies where outcomes are often binary and rare and data structures are complex. In this study, we evaluate these methods in simulation studies that recapitulate key characteristics of comparative effectiveness studies. We focus on the conditional average effect of a binary treatment on a binary outcome using the conditional risk difference as an estimand. To emulate health care database studies, we propose a simulation design where real covariate and treatment assignment data are used and only outcomes are simulated based on nonparametric models of the real outcomes. We apply this design to 4 published observational studies that used records from 2 major health care databases in the United States. Our results suggest that Bayesian additive regression trees and causal boosting consistently provide low bias in conditional risk difference estimates in the context of health care database studies. Copyright © 2018 John Wiley & Sons, Ltd.
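A stripped-down version of the proposed simulation design: keep covariates and a confounded treatment assignment fixed (simulated stand-ins here rather than real database records), generate rare binary outcomes from a known nonparametric model, and score an estimator of the conditional risk difference against the truth. Gradient boosting in a T-learner stands in for BART or causal boosting, which are not part of scikit-learn.

```python
# Simulation-design sketch: fixed covariates X and treatment T, outcomes drawn
# from a known model, and a T-learner estimate of the conditional risk
# difference scored against the truth. Gradient boosting is a stand-in learner.
import numpy as np
from sklearn.ensemble import GradientBoostingClassifier

rng = np.random.default_rng(6)
n, p = 5000, 20
X = rng.normal(size=(n, p))                      # stand-in for real covariates
T = rng.binomial(1, 1 / (1 + np.exp(-X[:, 0])))  # confounded treatment assignment

def true_risk(X, t):
    """Known outcome model used to simulate rare binary outcomes."""
    logit = -3.0 + 0.5 * X[:, 1] + t * (0.8 * (X[:, 2] > 0))
    return 1 / (1 + np.exp(-logit))

Y = rng.binomial(1, true_risk(X, T))

# T-learner: separate outcome models for treated and control units.
m1 = GradientBoostingClassifier().fit(X[T == 1], Y[T == 1])
m0 = GradientBoostingClassifier().fit(X[T == 0], Y[T == 0])
crd_hat = m1.predict_proba(X)[:, 1] - m0.predict_proba(X)[:, 1]
crd_true = true_risk(X, 1) - true_risk(X, 0)
print("mean absolute bias in conditional risk difference:",
      round(float(np.mean(np.abs(crd_hat - crd_true))), 4))
```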
Kim, Yun Hak; Jeong, Dae Cheon; Pak, Kyoungjune; Goh, Tae Sik; Lee, Chi-Seung; Han, Myoung-Eun; Kim, Ji-Young; Liangwen, Liu; Kim, Chi Dae; Jang, Jeon Yeob; Cha, Wonjae; Oh, Sae-Ock
2017-01-01
Accurate prediction of prognosis is critical for therapeutic decisions regarding cancer patients. Many previously developed prognostic scoring systems have limitations in reflecting recent progress in the field of cancer biology such as microarray, next-generation sequencing, and signaling pathways. To develop a new prognostic scoring system for cancer patients, we used mRNA expression and clinical data in various independent breast cancer cohorts (n=1214) from the Molecular Taxonomy of Breast Cancer International Consortium (METABRIC) and Gene Expression Omnibus (GEO). A new prognostic score that reflects gene network inherent in genomic big data was calculated using Network-Regularized high-dimensional Cox-regression (Net-score). We compared its discriminatory power with those of two previously used statistical methods: stepwise variable selection via univariate Cox regression (Uni-score) and Cox regression via Elastic net (Enet-score). The Net scoring system showed better discriminatory power in prediction of disease-specific survival (DSS) than other statistical methods (p=0 in METABRIC training cohort, p=0.000331, 4.58e-06 in two METABRIC validation cohorts) when accuracy was examined by log-rank test. Notably, comparison of C-index and AUC values in receiver operating characteristic analysis at 5 years showed fewer differences between training and validation cohorts with the Net scoring system than other statistical methods, suggesting minimal overfitting. The Net-based scoring system also successfully predicted prognosis in various independent GEO cohorts with high discriminatory power. In conclusion, the Net-based scoring system showed better discriminative power than previous statistical methods in prognostic prediction for breast cancer patients. This new system will mark a new era in prognosis prediction for cancer patients. PMID:29100405
A Selective Overview of Variable Selection in High Dimensional Feature Space
Fan, Jianqing
2010-01-01
High dimensional statistical problems arise from diverse fields of scientific research and technological development. Variable selection plays a pivotal role in contemporary statistical learning and scientific discoveries. The traditional idea of best subset selection methods, which can be regarded as a specific form of penalized likelihood, is computationally too expensive for many modern statistical applications. Other forms of penalized likelihood methods have been successfully developed over the last decade to cope with high dimensionality. They have been widely applied for simultaneously selecting important variables and estimating their effects in high dimensional statistical inference. In this article, we present a brief account of the recent developments of theory, methods, and implementations for high dimensional variable selection. What limits of the dimensionality such methods can handle, what the role of penalty functions is, and what the statistical properties are rapidly drive the advances of the field. The properties of non-concave penalized likelihood and its roles in high dimensional statistical modeling are emphasized. We also review some recent advances in ultra-high dimensional variable selection, with emphasis on independence screening and two-scale methods. PMID:21572976
Sparse High Dimensional Models in Economics
Fan, Jianqing; Lv, Jinchi; Qi, Lei
2010-01-01
This paper reviews the literature on sparse high dimensional models and discusses some applications in economics and finance. Recent developments of theory, methods, and implementations in penalized least squares and penalized likelihood methods are highlighted. These variable selection methods are proved to be effective in high dimensional sparse modeling. The limits of dimensionality that regularization methods can handle, the role of penalty functions, and their statistical properties are detailed. Some recent advances in ultra-high dimensional sparse modeling are also briefly discussed. PMID:22022635
Li, Xue; Dong, Jiao
2018-01-01
The material considered in this study not only has a functionally graded characteristic but also exhibits different tensile and compressive moduli of elasticity. One-dimensional and two-dimensional mechanical models for a functionally graded beam with a bimodular effect were established first. By taking the grade function as an exponential expression, the analytical solutions of a bimodular functionally graded beam under pure bending and lateral-force bending were obtained. The regression from a two-dimensional solution to a one-dimensional solution is verified. The physical quantities in a bimodular functionally graded beam are compared with their counterparts in a classical problem and a functionally graded beam without a bimodular effect. The validity of the plane section assumption under pure bending and lateral-force bending is analyzed. Three typical cases that the tensile modulus is greater than, equal to, or less than the compressive modulus are discussed. The result indicates that due to the introduction of the bimodular functionally graded effect of the materials, the maximum tensile and compressive bending stresses may not take place at the bottom and top of the beam. The real location at which the maximum bending stress takes place is determined via the extreme condition for the analytical solution. PMID:29772835
2016-01-01
Purpose The aim of this study was to evaluate alterations of papilla dimensions after orthodontic closure of the diastema between maxillary central incisors. Methods Sixty patients who had a visible diastema between maxillary central incisors that had been closed by orthodontic approximation were selected for this study. Various papilla dimensions were assessed on clinical photographs and study models before the orthodontic treatment and at the follow-up examination after closure of the diastema. Influences of the variables assessed before orthodontic treatment on the alterations of papilla height (PH) and papilla base thickness (PBT) were evaluated by univariate regression analysis. To analyze potential influences of the 3-dimensional papilla dimensions before orthodontic treatment on the alterations of PH and PBT, a multiple regression model was formulated including the 3-dimensional papilla dimensions as predictor variables. Results On average, PH decreased by 0.80 mm and PBT increased after orthodontic closure of the diastema (P<0.01). Univariate regression analysis revealed that the PH (P=0.002) and PBT (P=0.047) before orthodontic treatment influenced the alteration of PH. With respect to the alteration of PBT, the diastema width (P=0.045) and PBT (P=0.000) were found to be influential factors. PBT before the orthodontic treatment significantly influenced the alteration of PBT in the multiple regression model. Conclusions PH decreased but PBT increased after orthodontic closure of the diastema. The papilla dimensions before orthodontic treatment influenced the alterations of PH and PBT after closure of the diastema. The PBT increased more when the diastema width before the orthodontic treatment was larger. PMID:27382507
Three-dimensional reconstruction of Roman coins from photometric image sets
NASA Astrophysics Data System (ADS)
MacDonald, Lindsay; Moitinho de Almeida, Vera; Hess, Mona
2017-01-01
A method is presented for increasing the spatial resolution of the three-dimensional (3-D) digital representation of coins by combining fine photometric detail derived from a set of photographic images with accurate geometric data from a 3-D laser scanner. 3-D reconstructions were made of the obverse and reverse sides of two ancient Roman denarii by processing sets of images captured under directional lighting in an illumination dome. Surface normal vectors were calculated by a "bounded regression" technique, excluding both shadow and specular components of reflection from the metallic surface. Because of the known difficulty in achieving geometric accuracy when integrating photometric normals to produce a digital elevation model, the low spatial frequencies were replaced by those derived from the point cloud produced by a 3-D laser scanner. The two datasets were scaled and registered by matching the outlines and correlating the surface gradients. The final result was a realistic rendering of the coins at a spatial resolution of 75 pixels/mm (13-μm spacing), in which the fine detail modulated the underlying geometric form of the surface relief. The method opens the way to obtain high quality 3-D representations of coins in collections to enable interactive online viewing.
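The photometric core of the pipeline, estimating a surface normal per pixel from intensities under known light directions, reduces to a small least-squares problem. The sketch below uses a crude intensity threshold as a stand-in for the paper's "bounded regression" exclusion of shadowed and specular samples, and runs on a single synthetic pixel.

```python
# Minimal photometric-stereo sketch: per-pixel least-squares recovery of the
# surface normal from intensities under known light directions. An intensity
# threshold stands in for the paper's exclusion of shadow and specular samples.
import numpy as np

def estimate_normal(intensities, light_dirs, lo=0.05, hi=0.95):
    """intensities: (k,) pixel values in [0, 1]; light_dirs: (k, 3) unit vectors."""
    keep = (intensities > lo) & (intensities < hi)   # drop shadows and highlights
    if keep.sum() < 3:
        return np.array([0.0, 0.0, 1.0])             # not enough usable lights
    g, *_ = np.linalg.lstsq(light_dirs[keep], intensities[keep], rcond=None)
    norm = np.linalg.norm(g)
    return g / norm if norm > 0 else np.array([0.0, 0.0, 1.0])

# Toy check: a known normal lit from 16 directions reproduces itself.
rng = np.random.default_rng(7)
true_n = np.array([0.2, -0.1, 0.97]); true_n /= np.linalg.norm(true_n)
L = rng.normal(size=(16, 3)); L[:, 2] = np.abs(L[:, 2])
L /= np.linalg.norm(L, axis=1, keepdims=True)
I = np.clip(0.8 * L @ true_n, 0, 1)                  # Lambertian pixel, albedo 0.8
print("recovered normal:", estimate_normal(I, L).round(3))
```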
DOE Office of Scientific and Technical Information (OSTI.GOV)
Mizukami, Wataru, E-mail: wataru.mizukami@bristol.ac.uk; Tew, David P., E-mail: david.tew@bristol.ac.uk; Habershon, Scott, E-mail: S.Habershon@warwick.ac.uk
2014-10-14
We present a new approach to semi-global potential energy surface fitting that uses the least absolute shrinkage and selection operator (LASSO) constrained least squares procedure to exploit an extremely flexible form for the potential function, while at the same time controlling the risk of overfitting and avoiding the introduction of unphysical features such as divergences or high-frequency oscillations. Drawing from a massively redundant set of overlapping distributed multi-dimensional Gaussian functions of inter-atomic separations we build a compact full-dimensional surface for malonaldehyde, fit to explicitly correlated coupled cluster CCSD(T)(F12*) energies with a root-mean-square deviation accuracy of 0.3%–0.5% up to 25 000 cm⁻¹ above equilibrium. Importance-sampled diffusion Monte Carlo calculations predict zero-point energies for malonaldehyde and its deuterated isotopologue of 14 715.4(2) and 13 997.9(2) cm⁻¹ and hydrogen transfer tunnelling splittings of 21.0(4) and 3.2(4) cm⁻¹, respectively, which are in excellent agreement with the experimental values of 21.583 and 2.915(4) cm⁻¹.
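A one-dimensional toy version of the fitting strategy: build an over-complete dictionary of Gaussian functions of an inter-atomic separation and let cross-validated LASSO select a sparse subset. The Morse-like reference curve stands in for the actual CCSD(T)(F12*) energies, and none of the numbers relate to malonaldehyde.

```python
# Toy illustration of LASSO-constrained least squares over a redundant set of
# Gaussian basis functions of a single inter-atomic separation.
import numpy as np
from sklearn.linear_model import LassoCV

r = np.linspace(0.8, 4.0, 400)                        # separation (arbitrary units)
energy = (1 - np.exp(-1.5 * (r - 1.2))) ** 2           # Morse-like reference curve

# Redundant dictionary: Gaussians with many centres and several widths.
centres = np.linspace(0.8, 4.0, 60)
widths = [0.1, 0.2, 0.4, 0.8]
features = np.column_stack([np.exp(-((r - c) / w) ** 2)
                            for c in centres for w in widths])

fit = LassoCV(cv=5, max_iter=50000).fit(features, energy)
n_active = int(np.count_nonzero(fit.coef_))
rms = float(np.sqrt(np.mean((fit.predict(features) - energy) ** 2)))
print(f"{n_active} of {features.shape[1]} Gaussians retained, RMS error {rms:.4f}")
```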
Li, Ziyi; Safo, Sandra E; Long, Qi
2017-07-11
Sparse principal component analysis (PCA) is a popular tool for dimensionality reduction, pattern recognition, and visualization of high dimensional data. It has been recognized that complex biological mechanisms occur through concerted relationships of multiple genes working in networks that are often represented by graphs. Recent work has shown that incorporating such biological information improves feature selection and prediction performance in regression analysis, but there has been limited work on extending this approach to PCA. In this article, we propose two new sparse PCA methods called Fused and Grouped sparse PCA that enable incorporation of prior biological information in variable selection. Our simulation studies suggest that, compared to existing sparse PCA methods, the proposed methods achieve higher sensitivity and specificity when the graph structure is correctly specified, and are fairly robust to misspecified graph structures. Application to a glioblastoma gene expression dataset identified pathways that are suggested in the literature to be related with glioblastoma. The proposed sparse PCA methods Fused and Grouped sparse PCA can effectively incorporate prior biological information in variable selection, leading to improved feature selection and more interpretable principal component loadings and potentially providing insights on molecular underpinnings of complex diseases.
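The Fused and Grouped penalties are the paper's contribution and are not implemented in standard libraries; the sketch below only shows the unstructured sparse PCA baseline with scikit-learn, on simulated expression data containing one correlated gene module, to which the graph-guided penalties would be added.

```python
# Baseline (unstructured) sparse PCA with scikit-learn on simulated expression
# data; the Fused/Grouped network penalties from the paper are not reproduced.
import numpy as np
from sklearn.decomposition import SparsePCA

rng = np.random.default_rng(8)
n_samples, n_genes = 100, 300
expr = rng.normal(size=(n_samples, n_genes))
expr[:, :10] += rng.normal(size=(n_samples, 1))      # one correlated gene module

spca = SparsePCA(n_components=5, alpha=1.0, random_state=0)
scores = spca.fit_transform(expr)
loadings = spca.components_
print("nonzero loadings per component:", np.count_nonzero(loadings, axis=1))
```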
Sparse Regression as a Sparse Eigenvalue Problem
NASA Technical Reports Server (NTRS)
Moghaddam, Baback; Gruber, Amit; Weiss, Yair; Avidan, Shai
2008-01-01
We extend the ℓ₀-norm "subspectral" algorithms for sparse-LDA [5] and sparse-PCA [6] to general quadratic costs such as MSE in linear (kernel) regression. The resulting "Sparse Least Squares" (SLS) problem is also NP-hard, by way of its equivalence to a rank-1 sparse eigenvalue problem (e.g., binary sparse-LDA [7]). Specifically, for a general quadratic cost we use a highly efficient technique for direct eigenvalue computation using partitioned matrix inverses which leads to dramatic ×10³ speed-ups over standard eigenvalue decomposition. This increased efficiency mitigates the O(n⁴) scaling behaviour that up to now has limited the previous algorithms' utility for high-dimensional learning problems. Moreover, the new computation prioritizes the role of the less-myopic backward elimination stage which becomes more efficient than forward selection. Similarly, branch-and-bound search for Exact Sparse Least Squares (ESLS) also benefits from partitioned matrix inverse techniques. Our Greedy Sparse Least Squares (GSLS) generalizes Natarajan's algorithm [9] also known as Order-Recursive Matching Pursuit (ORMP). Specifically, the forward half of GSLS is exactly equivalent to ORMP but more efficient. By including the backward pass, which only doubles the computation, we can achieve lower MSE than ORMP. Experimental comparisons to the state-of-the-art LARS algorithm [3] show forward-GSLS is faster, more accurate and more flexible in terms of choice of regularization.
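Since the forward half of GSLS is stated to be equivalent to Order-Recursive Matching Pursuit, scikit-learn's Orthogonal Matching Pursuit gives a runnable reference point for the greedy forward pass; the backward-elimination pass and the partitioned-inverse speed-ups described above are not reproduced here.

```python
# Greedy forward sparse least squares via Orthogonal Matching Pursuit on a
# synthetic k-sparse regression problem; support recovery is checked at the end.
import numpy as np
from sklearn.linear_model import OrthogonalMatchingPursuit

rng = np.random.default_rng(9)
n, p, k = 100, 1000, 5                    # n samples, p features, k-sparse truth
X = rng.normal(size=(n, p))
beta = np.zeros(p)
beta[rng.choice(p, k, replace=False)] = rng.normal(size=k)
y = X @ beta + 0.01 * rng.normal(size=n)

omp = OrthogonalMatchingPursuit(n_nonzero_coefs=k).fit(X, y)
print("true support:     ", np.flatnonzero(beta))
print("recovered support:", np.flatnonzero(omp.coef_))
```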
Stingone, Jeanette A; Pandey, Om P; Claudio, Luz; Pandey, Gaurav
2017-11-01
Data-driven machine learning methods present an opportunity to simultaneously assess the impact of multiple air pollutants on health outcomes. The goal of this study was to apply a two-stage, data-driven approach to identify associations between air pollutant exposure profiles and children's cognitive skills. Data from 6900 children enrolled in the Early Childhood Longitudinal Study, Birth Cohort, a national study of children born in 2001 and followed through kindergarten, were linked to estimated concentrations of 104 ambient air toxics in the 2002 National Air Toxics Assessment using ZIP code of residence at age 9 months. In the first stage, 100 regression trees were learned to identify ambient air pollutant exposure profiles most closely associated with scores on a standardized mathematics test administered to children in kindergarten. In the second stage, the exposure profiles frequently predicting lower math scores were included within linear regression models and adjusted for confounders in order to estimate the magnitude of their effect on math scores. This approach was applied to the full population, and then to the populations living in urban and highly-populated urban areas. Our first-stage results in the full population suggested children with low trichloroethylene exposure had significantly lower math scores. This association was not observed for children living in urban communities, suggesting that confounding related to urbanicity needs to be considered within the first stage. When restricting our analysis to populations living in urban and highly-populated urban areas, high isophorone levels were found to predict lower math scores. Within adjusted regression models of children in highly-populated urban areas, the estimated effect of higher isophorone exposure on math scores was -1.19 points (95% CI -1.94, -0.44). Similar results were observed for the overall population of urban children. This data-driven, two-stage approach can be applied to other populations, exposures and outcomes to generate hypotheses within high-dimensional exposure data. Copyright © 2017 The Authors. Published by Elsevier Ltd. All rights reserved.
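The two-stage idea can be sketched in miniature: a regression tree (standing in for the 100 trees learned in the first stage) surfaces which exposures drive splits in the outcome, and an adjusted linear model then estimates the flagged exposure's effect. Pollutant and confounder names below are placeholders, not NATA variables.

```python
# Two-stage sketch: (1) a shallow regression tree highlights influential
# exposures, (2) an adjusted linear regression estimates the flagged exposure's
# effect on the outcome. All variable names and data are illustrative.
import numpy as np
import pandas as pd
from sklearn.tree import DecisionTreeRegressor
import statsmodels.formula.api as smf

rng = np.random.default_rng(10)
n = 2000
data = pd.DataFrame(rng.lognormal(size=(n, 3)), columns=["tox_a", "tox_b", "tox_c"])
data["urban"] = rng.integers(0, 2, n)
data["math_score"] = (50 - 1.0 * np.log(data["tox_b"]) + 2.0 * data["urban"]
                      + rng.normal(0, 5, n))

# Stage 1: which exposures drive the splits?
tree = DecisionTreeRegressor(max_depth=3, random_state=0)
tree.fit(data[["tox_a", "tox_b", "tox_c"]], data["math_score"])
print("tree feature importances:", tree.feature_importances_.round(2))

# Stage 2: adjusted linear regression for the flagged exposure.
fit = smf.ols("math_score ~ np.log(tox_b) + urban", data=data).fit()
print(fit.params.round(2))
```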
Predicting residue-wise contact orders in proteins by support vector regression.
Song, Jiangning; Burrage, Kevin
2006-10-03
The residue-wise contact order (RWCO) describes the sequence separations between a residue of interest and its contacting residues in a protein sequence. It is a new kind of one-dimensional protein structure that represents the extent of long-range contacts and is considered a generalization of contact order. Together with secondary structure, accessible surface area, the B factor, and contact number, RWCO provides comprehensive and indispensable information for reconstructing the protein three-dimensional structure from a set of one-dimensional structural properties. Accurately predicting RWCO values could have many important applications in protein three-dimensional structure prediction and protein folding rate prediction, and give deep insights into protein sequence-structure relationships. We developed a novel approach to predict residue-wise contact order values in proteins based on support vector regression (SVR), starting from primary amino acid sequences. We explored seven different sequence encoding schemes to examine their effects on the prediction performance, including local sequence in the form of PSI-BLAST profiles, local sequence plus amino acid composition, local sequence plus molecular weight, local sequence plus secondary structure predicted by PSIPRED, local sequence plus molecular weight and amino acid composition, local sequence plus molecular weight and predicted secondary structure, and local sequence plus molecular weight, amino acid composition and predicted secondary structure. When using local sequences with multiple sequence alignments in the form of PSI-BLAST profiles, we could predict the RWCO distribution with a Pearson correlation coefficient (CC) between the predicted and observed RWCO values of 0.55, and a root mean square error (RMSE) of 0.82, based on a well-defined dataset with 680 protein sequences. Moreover, by incorporating global features such as molecular weight and amino acid composition we could further improve the prediction performance, raising the CC to 0.57 and reducing the RMSE to 0.79. In addition, incorporating the secondary structure predicted by PSIPRED was found to significantly improve the prediction performance and could yield the best prediction accuracy with a CC of 0.60 and RMSE of 0.78, which is at least comparable to the performance of other existing methods. The SVR method shows a prediction performance competitive with or at least comparable to the previously developed linear regression-based methods for predicting RWCO values. In contrast to support vector classification (SVC), SVR is very good at estimating the raw value profiles of the samples. The successful application of the SVR approach in this study reinforces the fact that support vector regression is a powerful tool in extracting the protein sequence-structure relationship and in estimating the protein structural profiles from amino acid sequences.
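The core regression step, mapping windowed sequence-profile features to residue-wise contact order with support vector regression and scoring by Pearson CC and RMSE, can be sketched as follows; the features are random stand-ins for PSI-BLAST profile windows rather than real protein data.

```python
# SVR regression sketch: windowed profile features -> residue-wise contact
# order, evaluated with Pearson CC and RMSE on a held-out split.
import numpy as np
from scipy.stats import pearsonr
from sklearn.svm import SVR
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(11)
n_residues, window = 1500, 15
X = rng.normal(size=(n_residues, window * 20))   # 20 profile values per window position
rwco = X[:, :3] @ np.array([0.4, -0.2, 0.1]) + rng.normal(0, 0.5, n_residues)

X_tr, X_te, y_tr, y_te = train_test_split(X, rwco, test_size=0.25, random_state=0)
svr = SVR(kernel="rbf", C=1.0, epsilon=0.1).fit(X_tr, y_tr)
pred = svr.predict(X_te)
cc, _ = pearsonr(y_te, pred)
rmse = float(np.sqrt(np.mean((pred - y_te) ** 2)))
print(f"Pearson CC = {cc:.2f}, RMSE = {rmse:.2f}")
```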
Prediction of epigenetically regulated genes in breast cancer cell lines
DOE Office of Scientific and Technical Information (OSTI.GOV)
Loss, Leandro A; Sadanandam, Anguraj; Durinck, Steffen
Methylation of CpG islands within the DNA promoter regions is one mechanism that leads to aberrant gene expression in cancer. In particular, the abnormal methylation of CpG islands may silence associated genes. Therefore, using high-throughput microarrays to measure CpG island methylation will lead to better understanding of tumor pathobiology and progression, while revealing potentially new biomarkers. We have examined a recently developed high-throughput technology for measuring genome-wide methylation patterns called mTACL. Here, we propose a computational pipeline for integrating gene expression and CpG island methylation profiles to identify epigenetically regulated genes for a panel of 45 breast cancer cell lines, which is widely used in the Integrative Cancer Biology Program (ICBP). The pipeline (i) reduces the dimensionality of the methylation data, (ii) associates the reduced methylation data with gene expression data, and (iii) ranks methylation-expression associations according to their epigenetic regulation. Dimensionality reduction is performed in two steps: (i) methylation sites are grouped across the genome to identify regions of interest, and (ii) methylation profiles are clustered within each region. Associations between the clustered methylation and the gene expression data sets generate candidate matches within a fixed neighborhood around each gene. Finally, the methylation-expression associations are ranked through a logistic regression, and their significance is quantified through permutation analysis. Our two-step dimensionality reduction compressed 90% of the original data, reducing 137,688 methylation sites to 14,505 clusters. Methylation-expression associations produced 18,312 correspondences, which were used to further analyze epigenetic regulation. Logistic regression was used to identify 58 genes from these correspondences that showed a statistically significant negative correlation between methylation profiles and gene expression in the panel of breast cancer cell lines. Subnetwork enrichment of these genes has identified 35 common regulators with 6 or more predicted markers. In addition to identifying epigenetically regulated genes, we show evidence of differentially expressed methylation patterns between the basal and luminal subtypes. Our results indicate that the proposed computational protocol is a viable platform for identifying epigenetically regulated genes. Our protocol has generated a list of predictors including COL1A2, TOP2A, TFF1, and VAV3, genes whose key roles in epigenetic regulation are documented in the literature. Subnetwork enrichment of these predicted markers further suggests that epigenetic regulation of individual genes occurs in a coordinated fashion and through common regulators.
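One step of the pipeline, quantifying the significance of a negative methylation-expression association by permutation, is easy to sketch. The beta values and expression levels below are simulated stand-ins for the 45-cell-line panel, and the logistic-regression ranking itself is not reproduced.

```python
# Permutation test sketch: how often does a correlation as negative as the
# observed methylation-expression correlation arise under random shuffling?
import numpy as np

rng = np.random.default_rng(12)
n_lines = 45
methylation = rng.uniform(0, 1, n_lines)                    # cluster beta values
expression = 2.0 - 1.5 * methylation + rng.normal(0, 0.4, n_lines)

observed = np.corrcoef(methylation, expression)[0, 1]
n_perm = 10000
perm = np.array([np.corrcoef(methylation, rng.permutation(expression))[0, 1]
                 for _ in range(n_perm)])
p_value = (np.sum(perm <= observed) + 1) / (n_perm + 1)
print(f"observed r = {observed:.2f}, permutation p = {p_value:.4f}")
```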
Dimensionality reduction of collective motion by principal manifolds
NASA Astrophysics Data System (ADS)
Gajamannage, Kelum; Butail, Sachit; Porfiri, Maurizio; Bollt, Erik M.
2015-01-01
While the existence of low-dimensional embedding manifolds has been shown in patterns of collective motion, the current battery of nonlinear dimensionality reduction methods is not amenable to the analysis of such manifolds. This is mainly due to the necessary spectral decomposition step, which limits control over the mapping from the original high-dimensional space to the embedding space. Here, we propose an alternative approach that demands a two-dimensional embedding which topologically summarizes the high-dimensional data. In this sense, our approach is closely related to the construction of one-dimensional principal curves that minimize orthogonal error to data points subject to smoothness constraints. Specifically, we construct a two-dimensional principal manifold directly in the high-dimensional space using cubic smoothing splines, and define the embedding coordinates in terms of geodesic distances. Thus, the mapping from the high-dimensional data to the manifold is defined in terms of local coordinates. Through representative examples, we show that compared to existing nonlinear dimensionality reduction methods, the principal manifold retains the original structure even in noisy and sparse datasets. The principal manifold finding algorithm is applied to configurations obtained from a dynamical system of multiple agents simulating a complex maneuver called predator mobbing, and the resulting two-dimensional embedding is compared with that of a well-established nonlinear dimensionality reduction method.
Minimizers with Bounded Action for the High-Dimensional Frenkel-Kontorova Model
NASA Astrophysics Data System (ADS)
Miao, Xue-Qing; Wang, Ya-Nan; Qin, Wen-Xin
In Aubry-Mather theory for monotone twist maps or for the one-dimensional Frenkel-Kontorova (FK) model with nearest neighbor interactions, each global minimizer (minimal energy configuration) is naturally Birkhoff. However, this is not true for the one-dimensional FK model with non-nearest neighbor interactions or for the high-dimensional FK model. In this paper, we study the Birkhoff property of minimizers with bounded action for the high-dimensional FK model.
Construction of anxiety and dimensional personality model in college students.
Abdel-Khalek, Ahmed M
2013-06-01
A sample of 402 volunteer male (n = 156) and female (n = 246) Kuwaiti undergraduates responded to the Arabic versions of the Kuwait University Anxiety Scale and the Eysenck Personality Questionnaire. The latter questionnaire has four subscales: Psychoticism, Extraversion, Neuroticism, and Lie. Women obtained a higher mean score on Kuwait University Anxiety Scale and Neuroticism than did men, while men had a higher mean score on Psychoticism than did women. Factor analysis of the intercorrelations between the five variables, separately conducted for men and women, gave rise to two orthogonal factors called Anxiety-and-Neuroticism vs Extraversion, and Psychoticism vs Lie. Stepwise regression revealed that Neuroticism was the main predictor of anxiety. It was concluded that persons with high Neuroticism scores may be more vulnerable to anxiety than those with low scores.
Estimating Mixture of Gaussian Processes by Kernel Smoothing
Huang, Mian; Li, Runze; Wang, Hansheng; Yao, Weixin
2014-01-01
When the functional data are not homogeneous, e.g., there exist multiple classes of functional curves in the dataset, traditional estimation methods may fail. In this paper, we propose a new estimation procedure for the Mixture of Gaussian Processes, to incorporate both functional and inhomogeneous properties of the data. Our method can be viewed as a natural extension of high-dimensional normal mixtures. However, the key difference is that smoothed structures are imposed for both the mean and covariance functions. The model is shown to be identifiable, and can be estimated efficiently by a combination of the ideas from EM algorithm, kernel regression, and functional principal component analysis. Our methodology is empirically justified by Monte Carlo simulations and illustrated by an analysis of a supermarket dataset. PMID:24976675
Davis, C; Claridge, G; Cerullo, D
1997-01-01
Evidence shows a high comorbidity of eating disorders and some forms of personality disorder. Adopting a dimensional approach to both, our study explored their connection among a non-clinical sample. 191 young women completed personality scales of general neuroticism, and of borderline, schizotypal, obsessive-compulsive, and narcissistic (both adjustive and maladaptive) traits. Weight preoccupation (WP), as a normal analogue of eating disorders, was assessed with scales from the Eating Disorder Inventory, and height and weight measured. The data were analysed with multiple regression techniques, with WP as the dependent variable. In low to normal weight subjects, after controlling for the significant influence of body mass, the specific predictors of WP in the regression model were borderline personality and maladaptive narcissism, in the positive direction, and adjustive narcissism and obsessive-compulsiveness in the negative direction. In heavier women, narcissism made no contribution--nor, more significantly, did body mass. Patterns of association between eating pathology and personality disorder, especially borderline and narcissism, can be clearly mapped across to personality traits in the currently non-clinical population. This finding has important implications for understanding dynamics of, and identifying individuals at risk for, eating disorders.
Some comparisons of complexity in dictionary-based and linear computational models.
Gnecco, Giorgio; Kůrková, Věra; Sanguineti, Marcello
2011-03-01
Neural networks provide a more flexible approximation of functions than traditional linear regression. In the latter, one can only adjust the coefficients in linear combinations of fixed sets of functions, such as orthogonal polynomials or Hermite functions, while for neural networks, one may also adjust the parameters of the functions which are being combined. However, some useful properties of linear approximators (such as uniqueness, homogeneity, and continuity of best approximation operators) are not satisfied by neural networks. Moreover, optimization of parameters in neural networks becomes more difficult than in linear regression. Experimental results suggest that these drawbacks of neural networks are offset by substantially lower model complexity, allowing accuracy of approximation even in high-dimensional cases. We give some theoretical results comparing requirements on model complexity for two types of approximators, the traditional linear ones and so called variable-basis types, which include neural networks, radial, and kernel models. We compare upper bounds on worst-case errors in variable-basis approximation with lower bounds on such errors for any linear approximator. Using methods from nonlinear approximation and integral representations tailored to computational units, we describe some cases where neural networks outperform any linear approximator. Copyright © 2010 Elsevier Ltd. All rights reserved.
Sparse kernel methods for high-dimensional survival data.
Evers, Ludger; Messow, Claudia-Martina
2008-07-15
Sparse kernel methods like support vector machines (SVM) have been applied with great success to classification and (standard) regression settings. Existing support vector classification and regression techniques however are not suitable for partly censored survival data, which are typically analysed using Cox's proportional hazards model. As the partial likelihood of the proportional hazards model only depends on the covariates through inner products, it can be 'kernelized'. The kernelized proportional hazards model however yields a solution that is dense, i.e. the solution depends on all observations. One of the key features of an SVM is that it yields a sparse solution, depending only on a small fraction of the training data. We propose two methods. One is based on a geometric idea, where-akin to support vector classification-the margin between the failed observation and the observations currently at risk is maximised. The other approach is based on obtaining a sparse model by adding observations one after another akin to the Import Vector Machine (IVM). Data examples studied suggest that both methods can outperform competing approaches. Software is available under the GNU Public License as an R package and can be obtained from the first author's website http://www.maths.bris.ac.uk/~maxle/software.html.
Spatial Bayesian latent factor regression modeling of coordinate-based meta-analysis data.
Montagna, Silvia; Wager, Tor; Barrett, Lisa Feldman; Johnson, Timothy D; Nichols, Thomas E
2018-03-01
Now over 20 years old, functional MRI (fMRI) has a large and growing literature that is best synthesised with meta-analytic tools. As most authors do not share image data, only the peak activation coordinates (foci) reported in the article are available for Coordinate-Based Meta-Analysis (CBMA). Neuroimaging meta-analysis is used to (i) identify areas of consistent activation; and (ii) build a predictive model of task type or cognitive process for new studies (reverse inference). To simultaneously address these aims, we propose a Bayesian point process hierarchical model for CBMA. We model the foci from each study as a doubly stochastic Poisson process, where the study-specific log intensity function is characterized as a linear combination of a high-dimensional basis set. A sparse representation of the intensities is guaranteed through latent factor modeling of the basis coefficients. Within our framework, it is also possible to account for the effect of study-level covariates (meta-regression), significantly expanding the capabilities of the current neuroimaging meta-analysis methods available. We apply our methodology to synthetic data and neuroimaging meta-analysis datasets. © 2017, The International Biometric Society.
Houseknecht, D.W.; Bird, K.J.
2004-01-01
Beaufortian strata (Jurassic-Lower Cretaceous) in the National Petroleum Reserve in Alaska (NPRA) have been a focus of exploration since the 1994 discovery of the nearby Alpine oil field (>400 MMBO). These strata include the Kingak Shale, a succession of depositional sequences influenced by rift opening of the Arctic Ocean Basin. Interpretation of sequence stratigraphy and depositional facies from a regional two-dimensional seismic grid and well data allows the definition of four sequence sets that each displays unique stratal geometries and thickness trends across NPRA. A Lower to Middle Jurassic sequence set includes numerous transgressive-regressive sequences that collectively built a clastic shelf in north-central NPRA. Along the south-facing, lobate shelf margin, condensed shales in transgressive systems tracts downlap and coalesce into a basinal condensed section that is likely an important hydrocarbon source rock. An Oxfordian-Kimmeridgian sequence set, deposited during pulses of uplift on the Barrow arch, includes multiple transgressive-regressive sequences that locally contain well-winnowed, shoreface sandstones at the base of transgressive systems tracts. These shoreface sandstones and overlying shales, deposited during maximum flooding, form stratigraphic traps that are the main objective of exploration in the Alpine play in NPRA. A Valanginian sequence set includes at least two transgressive-regressive sequences that display relatively distal characteristics, suggesting high relative sea level. An important exception is the presence of a basal transgressive systems tract that locally contains shoreface sandstones of reservoir quality. A Hauterivian sequence set includes two transgressive-regressive sequences that constitute a shelf-margin wedge developed as the result of tectonic uplift along the Barrow arch during rift opening of the Arctic Ocean Basin. This sequence set displays stratal geometries suggesting incision and synsedimentary collapse of the shelf margin. © 2004. The American Association of Petroleum Geologists. All rights reserved.
Long, Yi; Du, Zhi-Jiang; Chen, Chao-Feng; Dong, Wei; Wang, Wei-Dong
2017-07-01
The most important step for a lower extremity exoskeleton is to infer human motion intent (HMI), which contributes to achieving human-exoskeleton collaboration. Since the user is in the control loop, the relationship between human robot interaction (HRI) information and HMI is nonlinear and complicated, which is difficult to model using mathematical approaches. The nonlinear approximation can be learned by using machine learning approaches. Gaussian Process (GP) regression is suitable for high-dimensional and small-sample nonlinear regression problems, but it is restrictive for large data sets due to its computational complexity. In this paper, an online sparse GP algorithm is constructed to learn the HMI. The original training dataset is collected when the user wears the exoskeleton system with friction compensation to perform unconstrained movement as far as possible. The dataset has two kinds of data, i.e., (1) physical HRI, which is collected by torque sensors placed at the interaction cuffs for the active joints, i.e., knee joints; (2) joint angular position, which is measured by optical position sensors. To reduce the computational complexity of GP, grey relational analysis (GRA) is utilized to specify the original dataset and provide the final training dataset. The hyper-parameters are optimized offline by maximizing the marginal likelihood and are then applied in the online GP regression algorithm. The HMI, i.e., the angular position of human joints, is regarded as the reference trajectory for the mechanical legs. To verify the effectiveness of the proposed algorithm, experiments are performed on a subject at a natural speed. The experimental results show the HMI can be obtained in real time, which can be extended and employed in similar exoskeleton systems.
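The online sparse GP and the grey-relational subset selection are specific to the paper, but the core regression step can be sketched with a standard Gaussian process regressor; in the hedged example below the interaction-torque input, the toy angle relationship, and the random subsampling used in place of GRA are all assumptions.

```python
import numpy as np
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import RBF, WhiteKernel

# Hypothetical stand-ins for the paper's data: interaction torque -> joint angle.
rng = np.random.default_rng(0)
torque = rng.uniform(-5, 5, (2000, 1))                      # physical HRI signal
angle = np.sin(torque[:, 0]) + rng.normal(0, 0.05, 2000)    # joint angular position

# Crude sparsification: train on a small subset (the paper uses grey relational
# analysis to pick informative samples; random subsampling is a simple proxy).
idx = rng.choice(len(torque), 200, replace=False)
gp = GaussianProcessRegressor(kernel=RBF() + WhiteKernel(), normalize_y=True)
gp.fit(torque[idx], angle[idx])

pred_mean, pred_std = gp.predict(np.array([[1.0]]), return_std=True)
print(pred_mean, pred_std)   # predicted joint angle with uncertainty
```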
NASA Astrophysics Data System (ADS)
Rocha, Alby D.; Groen, Thomas A.; Skidmore, Andrew K.; Darvishzadeh, Roshanak; Willemen, Louise
2017-11-01
The growing number of narrow spectral bands in hyperspectral remote sensing improves the capacity to describe and predict biological processes in ecosystems. But it also poses a challenge to fit empirical models based on such high dimensional data, which often contain correlated and noisy predictors. As sample sizes for training and validating empirical models do not seem to be increasing at the same rate, overfitting has become a serious concern. Overly complex models lead to overfitting by capturing more than the underlying relationship and by fitting random noise in the data. Many regression techniques claim to overcome these problems by using different strategies to constrain complexity, such as limiting the number of terms in the model, creating latent variables or shrinking parameter coefficients. This paper proposes a new method, named Naïve Overfitting Index Selection (NOIS), which makes use of artificially generated spectra to quantify the relative model overfitting and to select an optimal model complexity supported by the data. The robustness of this new method is assessed by comparing it to a traditional model selection based on cross-validation. The optimal model complexity is determined for seven different regression techniques, such as partial least squares regression, support vector machine, artificial neural network and tree-based regressions, using five hyperspectral datasets. The NOIS method selects less complex models, which present accuracies similar to the cross-validation method. The NOIS method reduces the chance of overfitting, thereby avoiding models whose accurate predictions are valid only for the data used and that are too complex to support inferences about the underlying process.
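The published NOIS index is not reproduced here, but the underlying idea, quantifying how strongly a given model complexity can fit spectra that contain no signal, can be sketched as follows; the simulated spectra, the choice of PLS regression, and the component counts are assumptions for illustration.

```python
import numpy as np
from sklearn.cross_decomposition import PLSRegression
from sklearn.model_selection import cross_val_score

rng = np.random.default_rng(0)
n, p = 60, 200                                          # small sample, many correlated bands
real_X = np.cumsum(rng.normal(size=(n, p)), axis=1)     # smooth, band-correlated "spectra"
y = real_X[:, 50] * 0.3 + rng.normal(0, 0.5, n)

for n_comp in (2, 5, 15):
    noise_X = rng.normal(size=(n, p))                   # artificial spectra with no signal
    fit_on_noise = PLSRegression(n_components=n_comp).fit(noise_X, y).score(noise_X, y)
    cv_on_real = cross_val_score(PLSRegression(n_components=n_comp), real_X, y, cv=5).mean()
    # a large apparent R^2 on pure noise flags how much a given complexity can overfit
    print(n_comp, round(fit_on_noise, 2), round(cv_on_real, 2))
```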
Steganalysis using logistic regression
NASA Astrophysics Data System (ADS)
Lubenko, Ivans; Ker, Andrew D.
2011-02-01
We advocate Logistic Regression (LR) as an alternative to the Support Vector Machine (SVM) classifiers commonly used in steganalysis. LR offers more information than traditional SVM methods - it estimates class probabilities as well as providing a simple classification - and can be adapted more easily and efficiently for multiclass problems. Like SVM, LR can be kernelised for nonlinear classification, and it shows comparable classification accuracy to SVM methods. This work is a case study, comparing accuracy and speed of SVM and LR classifiers in detection of LSB Matching and other related spatial-domain image steganography, through the state-of-art 686-dimensional SPAM feature set, in three image sets.
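A hedged sketch of the comparison described above, using scikit-learn in place of the authors' implementation; the simulated 686-dimensional features standing in for SPAM vectors and the small class shift are assumptions.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.svm import LinearSVC
from sklearn.model_selection import train_test_split

# Hypothetical stand-in for 686-dimensional SPAM features of cover/stego images.
rng = np.random.default_rng(0)
X = rng.normal(size=(4000, 686))
y = rng.integers(0, 2, 4000)
X[y == 1] += 0.05                       # tiny embedding-induced shift in stego images

Xtr, Xte, ytr, yte = train_test_split(X, y, test_size=0.25, random_state=0)

lr = LogisticRegression(max_iter=2000).fit(Xtr, ytr)
svm = LinearSVC(dual=False).fit(Xtr, ytr)

print("LR accuracy ", lr.score(Xte, yte))
print("SVM accuracy", svm.score(Xte, yte))
# unlike the SVM, LR also provides class probabilities for each image
print("LR probabilities for first test image:", lr.predict_proba(Xte[:1]))
```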
Statistical Machine Learning for Structured and High Dimensional Data
2014-09-17
AFRL-OSR-VA-TR-2014-0234. Final report (December 2009 - August 2014), Carnegie Mellon University; principal investigator Larry Wasserman, program contact John Lafferty (773-702-3813). Research under this effort addressed statistical machine learning for structured and high-dimensional data, including resource-constrained statistical estimation.
Smoothing two-dimensional Malaysian mortality data using P-splines indexed by age and year
NASA Astrophysics Data System (ADS)
Kamaruddin, Halim Shukri; Ismail, Noriszura
2014-06-01
Nonparametric regression uses the data to derive the best-fitting coefficients of a model from a large class of flexible functions. Eilers and Marx (1996) introduced P-splines as a method of smoothing in generalized linear models (GLMs), in which ordinary B-splines with a difference roughness penalty on the coefficients are applied to one-dimensional mortality data. Modeling and forecasting mortality rates is a problem of fundamental importance in insurance calculations, in which the accuracy of models and forecasts is the main concern of the industry. The original idea of P-splines is extended here to two-dimensional mortality data, indexed by age of death and year of death, with the large data set supplied by the Department of Statistics Malaysia. The extension constructs the best fitted surface and provides sensible predictions of the underlying mortality rate in the Malaysian case.
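A minimal one-dimensional P-spline sketch in Python (the age-year extension adds a second basis and penalty per dimension); the simulated data, knot layout, and penalty weight are assumptions, and BSpline.design_matrix requires SciPy 1.8 or newer.

```python
import numpy as np
from scipy.interpolate import BSpline

# One-dimensional P-spline (Eilers & Marx): a rich cubic B-spline basis combined
# with a second-order difference penalty on adjacent coefficients.
rng = np.random.default_rng(0)
x = np.linspace(0.0, 1.0, 100)                       # e.g. scaled ages
y = np.sin(2 * np.pi * x) + rng.normal(0, 0.2, 100)  # noisy rates to be smoothed

k = 3                                                # cubic splines
grid = np.linspace(-0.05, 1.05, 24)                  # knots span slightly past the data
knots = np.r_[[grid[0]] * k, grid, [grid[-1]] * k]
B = BSpline.design_matrix(x, knots, k).toarray()     # n x n_basis design matrix

D = np.diff(np.eye(B.shape[1]), n=2, axis=0)         # second-order difference matrix
lam = 1.0                                            # roughness penalty weight
coef = np.linalg.solve(B.T @ B + lam * D.T @ D, B.T @ y)
smooth = B @ coef                                    # penalized fit; larger lam -> smoother
```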
Composites from southern pine juvenile wood. Part 3. Juvenile and mature wood furnish mixtures
A.D. Pugel; E.W. Price; Chung-Yun Hse; T.F. Shupe
2004-01-01
Composite panels made from mixtures of mature and juvenile southern pine (Pinus taeda L.) were evaluated for initial mechanical properties and dimensional stability. The effect that the proportion of juvenile wood had on panel properties was analyzed by regression and rule-of-mixtures models. The mixed furnish data: 1) highlighted the degree to which...
Differences in Student Evaluations of Limited-Term Lecturers and Full-Time Faculty
ERIC Educational Resources Information Center
Cho, Jeong-Il; Otani, Koichiro; Kim, B. Joon
2014-01-01
This study compared student evaluations of teaching (SET) for limited-term lecturers (LTLs) and full-time faculty (FTF) using a Likert-scaled survey administered to students (N = 1,410) at the end of university courses. Data were analyzed using a general linear regression model to investigate the influence of multi-dimensional evaluation items on…
Predicting Negative Discipline in Traditional Families: A Multi-Dimensional Stress Model.
ERIC Educational Resources Information Center
Fisher, Philip A.
An attempt is made to integrate existing theories of family violence by introducing the concept of family role stress. Role stressors may be defined as factors inhibiting the enactment of family roles. Multiple regression analyses were performed on data from 190 families to test a hypothesis involving the prediction of negative discipline at…
Viswanathan, M; Pearl, D L; Taboada, E N; Parmley, E J; Mutschall, S K; Jardine, C M
2017-05-01
Using data collected from a cross-sectional study of 25 farms (eight beef, eight swine and nine dairy) in 2010, we assessed clustering of molecular subtypes of C. jejuni based on a Campylobacter-specific 40 gene comparative genomic fingerprinting assay (CGF40), using unweighted pair-group method with arithmetic mean (UPGMA) analysis and multiple correspondence analysis. Exact logistic regression was used to determine which genes differentiate wildlife and livestock subtypes in our study population. A total of 33 bovine livestock (17 beef and 16 dairy) and 26 wildlife (20 raccoon (Procyon lotor), five skunk (Mephitis mephitis) and one mouse (Peromyscus spp.)) C. jejuni isolates were subtyped using CGF40. Dendrogram analysis, based on UPGMA, showed distinct branches separating bovine livestock and mammalian wildlife isolates. Furthermore, two-dimensional multiple correspondence analysis was highly concordant with dendrogram analysis, showing clear differentiation between livestock and wildlife CGF40 subtypes. Based on multilevel logistic regression models with a random intercept for farm of origin, we found that isolates in general, and raccoons more specifically, were significantly more likely to be part of the wildlife branch. Exact logistic regression conducted gene by gene revealed 15 genes that were predictive of whether an isolate was of wildlife or bovine livestock origin. Both multiple correspondence analysis and exact logistic regression revealed that in most cases the presence of a particular gene (13 of 15) was associated with an isolate being of livestock rather than wildlife origin. In conclusion, the evidence gained from dendrogram analysis, multiple correspondence analysis and exact logistic regression indicates that mammalian wildlife carry CGF40 subtypes of C. jejuni distinct from those carried by bovine livestock. Future studies focused on source attribution of C. jejuni in human infections will help determine whether wildlife transmit Campylobacter jejuni directly to humans. © 2016 Blackwell Verlag GmbH.
NASA Astrophysics Data System (ADS)
Wang, Hongjin; Hsieh, Sheng-Jen; Peng, Bo; Zhou, Xunfei
2016-07-01
A method without requirements on knowledge about the thermal properties of coatings or substrates would be of interest for industrial applications. Supervised machine learning regression may provide a possible solution to the problem. This paper compares the performance of two regression models (artificial neural networks (ANN) and support vector machines for regression (SVM)) with respect to coating thickness estimates made from surface temperature increments collected via time-resolved thermography. We describe the role of SVM in coating thickness prediction. Non-dimensional analyses are conducted to illustrate the effects of coating thickness and various factors on surface temperature increments. It is theoretically possible to correlate coating thickness with the surface temperature increment. Based on the analyses, the laser power is selected such that, during heating, the temperature increment is high enough to distinguish coating thickness variation but low enough to avoid surface melting. Sixty-one paint-coated samples with coating thicknesses varying from 63.5 μm to 571 μm are used to train the models. Hyper-parameters of the models are optimized by 10-fold cross-validation. Another 28 sets of data are then collected to test the performance of the models. The study shows that SVM can provide reliable predictions of unknown data, due to its deterministic characteristics, and it works well when used for a small input data group. The SVM model generates more accurate coating thickness estimates than the ANN model.
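A hedged sketch of the SVM-regression branch with cross-validated hyper-parameters; the synthetic temperature-increment features and the grid of C, gamma, and epsilon values are assumptions, not the paper's settings.

```python
import numpy as np
from sklearn.svm import SVR
from sklearn.model_selection import GridSearchCV
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Hypothetical stand-in: surface temperature increments at several sampling times
# as predictors, coating thickness (um) as the response.
rng = np.random.default_rng(0)
thickness = rng.uniform(63.5, 571.0, 61)
temps = np.column_stack([1000.0 / (thickness + t * 50) + rng.normal(0, 0.01, 61)
                         for t in range(1, 6)])

model = make_pipeline(StandardScaler(), SVR(kernel="rbf"))
grid = {"svr__C": [1, 10, 100],
        "svr__gamma": ["scale", 0.1, 1.0],
        "svr__epsilon": [0.01, 0.1]}
search = GridSearchCV(model, grid, cv=10, scoring="neg_mean_absolute_error")
search.fit(temps, thickness)
print(search.best_params_, -search.best_score_)   # best settings and their MAE
```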
Jovanovic, Milos; Radovanovic, Sandro; Vukicevic, Milan; Van Poucke, Sven; Delibasic, Boris
2016-09-01
Quantification and early identification of unplanned readmission risk have the potential to improve the quality of care during hospitalization and after discharge. However, high dimensionality, sparsity, and class imbalance of electronic health data and the complexity of risk quantification challenge the development of accurate predictive models. Predictive models require a certain level of interpretability in order to be applicable in real settings and create actionable insights. This paper aims to develop accurate and interpretable predictive models for readmission in a general pediatric patient population, by integrating a data-driven model (sparse logistic regression) and domain knowledge based on the international classification of diseases 9th-revision clinical modification (ICD-9-CM) hierarchy of diseases. Additionally, we propose a way to quantify the interpretability of a model and inspect the stability of alternative solutions. The analysis was conducted on >66,000 pediatric hospital discharge records from California, State Inpatient Databases, Healthcare Cost and Utilization Project between 2009 and 2011. We incorporated domain knowledge based on the ICD-9-CM hierarchy in a data driven, Tree-Lasso regularized logistic regression model, providing the framework for model interpretation. This approach was compared with traditional Lasso logistic regression, resulting in models that are easier to interpret by fewer high-level diagnoses, with comparable prediction accuracy. The results revealed that the use of a Tree-Lasso model was as competitive in terms of accuracy (measured by area under the receiver operating characteristic curve-AUC) as the traditional Lasso logistic regression, but integration with the ICD-9-CM hierarchy of diseases provided more interpretable models in terms of high-level diagnoses. Additionally, interpretations of models are in accordance with existing medical understanding of pediatric readmission. Best performing models have similar performance, reaching AUC values of 0.783 and 0.779 for traditional Lasso and Tree-Lasso, respectively. However, the information loss of the Lasso model is 0.35 bits higher compared to the Tree-Lasso model. We propose a method for building predictive models applicable for the detection of readmission risk based on electronic health records. Integration of domain knowledge (in the form of the ICD-9-CM taxonomy) and a data-driven, sparse predictive algorithm (Tree-Lasso logistic regression) resulted in an increase of interpretability of the resulting model. The models are interpreted for the readmission prediction problem in the general pediatric population in California, as well as several important subpopulations, and the interpretations of the models comply with existing medical understanding of pediatric readmission. Finally, a quantitative assessment of the interpretability of the models is given that goes beyond simple counts of selected low-level features. Copyright © 2016 Elsevier B.V. All rights reserved.
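The Tree-Lasso penalty is not available in scikit-learn, so the sketch below only reproduces the plain Lasso (L1) logistic regression baseline that the paper compares against; the simulated ICD-style indicator matrix and the regularization strength are assumptions.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score

# Hypothetical stand-in for binary ICD-9-CM diagnosis indicators per discharge.
rng = np.random.default_rng(0)
X = (rng.random((5000, 300)) < 0.05).astype(float)        # sparse diagnosis codes
logit = X[:, :5].sum(axis=1) * 1.5 - 2.5                  # a few codes drive readmission
y = (rng.random(5000) < 1 / (1 + np.exp(-logit))).astype(int)

# Plain Lasso (L1) logistic regression; the Tree-Lasso variant adds group
# penalties that follow the ICD-9-CM hierarchy and is not shown here.
lasso_lr = LogisticRegression(penalty="l1", solver="liblinear", C=0.1)
auc = cross_val_score(lasso_lr, X, y, cv=5, scoring="roc_auc").mean()
lasso_lr.fit(X, y)
print("cross-validated AUC:", round(auc, 3),
      "| selected codes:", int((lasso_lr.coef_ != 0).sum()))
```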
Hayashi, Hideaki; Shibanoki, Taro; Shima, Keisuke; Kurita, Yuichi; Tsuji, Toshio
2015-12-01
This paper proposes a probabilistic neural network (NN) developed on the basis of time-series discriminant component analysis (TSDCA) that can be used to classify high-dimensional time-series patterns. TSDCA involves the compression of high-dimensional time series into a lower dimensional space using a set of orthogonal transformations and the calculation of posterior probabilities based on a continuous-density hidden Markov model with a Gaussian mixture model expressed in the reduced-dimensional space. The analysis can be incorporated into an NN, which is named a time-series discriminant component network (TSDCN), so that parameters of dimensionality reduction and classification can be obtained simultaneously as network coefficients according to a backpropagation through time-based learning algorithm with the Lagrange multiplier method. The TSDCN is considered to enable high-accuracy classification of high-dimensional time-series patterns and to reduce the computation time taken for network training. The validity of the TSDCN is demonstrated for high-dimensional artificial data and electroencephalogram signals in the experiments conducted during the study.
Penalized spline estimation for functional coefficient regression models.
Cao, Yanrong; Lin, Haiqun; Wu, Tracy Z; Yu, Yan
2010-04-01
The functional coefficient regression models assume that the regression coefficients vary with some "threshold" variable, providing appreciable flexibility in capturing the underlying dynamics in data and avoiding the so-called "curse of dimensionality" in multivariate nonparametric estimation. We first investigate the estimation, inference, and forecasting for the functional coefficient regression models with dependent observations via penalized splines. The P-spline approach, as a direct ridge regression shrinkage type global smoothing method, is computationally efficient and stable. With established fixed-knot asymptotics, inference is readily available. Exact inference can be obtained for fixed smoothing parameter λ, which is most appealing for finite samples. Our penalized spline approach gives an explicit model expression, which also enables multi-step-ahead forecasting via simulations. Furthermore, we examine different methods of choosing the important smoothing parameter λ: modified multi-fold cross-validation (MCV), generalized cross-validation (GCV), and an extension of empirical bias bandwidth selection (EBBS) to P-splines. In addition, we implement smoothing parameter selection using mixed model framework through restricted maximum likelihood (REML) for P-spline functional coefficient regression models with independent observations. The P-spline approach also easily allows different smoothness for different functional coefficients, which is enabled by assigning different penalty λ accordingly. We demonstrate the proposed approach by both simulation examples and a real data application.
Random forests on Hadoop for genome-wide association studies of multivariate neuroimaging phenotypes
Wang, Yue; Goh, Wilson; Wong, Limsoon; Montana, Giovanni
2013-01-01
Multivariate quantitative traits arise naturally in recent neuroimaging genetics studies, in which both structural and functional variability of the human brain is measured non-invasively through techniques such as magnetic resonance imaging (MRI). There is growing interest in detecting genetic variants associated with such multivariate traits, especially in genome-wide studies. Random forests (RFs) classifiers, which are ensembles of decision trees, are amongst the best performing machine learning algorithms and have been successfully employed for the prioritisation of genetic variants in case-control studies. RFs can also be applied to produce gene rankings in association studies with multivariate quantitative traits, and to estimate genetic similarity measures that are predictive of the trait. However, in studies involving hundreds of thousands of SNPs and high-dimensional traits, a very large ensemble of trees must be inferred from the data in order to obtain reliable rankings, which makes the application of these algorithms computationally prohibitive. We have developed a parallel version of the RF algorithm for regression and genetic similarity learning tasks in large-scale population genetic association studies involving multivariate traits, called PaRFR (Parallel Random Forest Regression). Our implementation takes advantage of the MapReduce programming model and is deployed on Hadoop, an open-source software framework that supports data-intensive distributed applications. Notable speed-ups are obtained by introducing a distance-based criterion for node splitting in the tree estimation process. PaRFR has been applied to a genome-wide association study on Alzheimer's disease (AD) in which the quantitative trait consists of a high-dimensional neuroimaging phenotype describing longitudinal changes in the human brain structure. PaRFR provides a ranking of SNPs associated with this trait, and produces pair-wise measures of genetic proximity that can be directly compared to pair-wise measures of phenotypic proximity. Several known AD-related variants have been identified, including APOE4 and TOMM40. We also present experimental evidence supporting the hypothesis of a linear relationship between the number of top-ranked mutated states, or frequent mutation patterns, and an indicator of disease severity. The Java codes are freely available at http://www2.imperial.ac.uk/~gmontana. PMID:24564704
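PaRFR itself runs on Hadoop, but its single-machine core, ranking SNPs by random forest regression importance for a quantitative trait, can be sketched as follows; the simulated genotypes and the one-dimensional trait are assumptions for illustration.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor

# Hypothetical stand-in: rank SNPs by how well they predict a (here scalar)
# imaging-derived quantitative trait using random forest importances.
rng = np.random.default_rng(0)
n_subjects, n_snps = 500, 2000
snps = rng.integers(0, 3, (n_subjects, n_snps)).astype(float)   # 0/1/2 allele counts
trait = snps[:, 10] * 0.8 - snps[:, 42] * 0.5 + rng.normal(0, 1, n_subjects)

rf = RandomForestRegressor(n_estimators=500, n_jobs=-1, random_state=0)
rf.fit(snps, trait)
top = np.argsort(rf.feature_importances_)[::-1][:10]
print("top-ranked SNP indices:", top)   # the planted SNPs 10 and 42 should rank highly
```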
Forecast model applications of retrieved three dimensional liquid water fields
NASA Technical Reports Server (NTRS)
Raymond, William H.; Olson, William S.
1990-01-01
Forecasts are made for tropical storm Emily using heating rates derived from the SSM/I physical retrievals described in chapters 2 and 3. Average values of the latent heating rates from the convective and stratiform cloud simulations, used in the physical retrieval, are obtained for individual 1.1 km thick vertical layers. Then, the layer-mean latent heating rates are regressed against the slant path-integrated liquid and ice precipitation water contents to determine the best fit two parameter regression coefficients for each layer. The regression formulae and retrieved precipitation water contents are utilized to infer the vertical distribution of heating rates for forecast model applications. In the forecast model, diabatic temperature contributions are calculated and used in a diabatic initialization, or in a diabatic initialization combined with a diabatic forcing procedure. Our forecasts show that the time needed to spin-up precipitation processes in tropical storm Emily is greatly accelerated through the application of the data.
Local field potential spectral tuning in motor cortex during reaching.
Heldman, Dustin A; Wang, Wei; Chan, Sherwin S; Moran, Daniel W
2006-06-01
In this paper, intracortical local field potentials (LFPs) and single units were recorded from the motor cortices of monkeys (Macaca fascicularis) while they performed a standard three-dimensional (3-D) center-out reaching task. During the center-out task, the subjects held their hands at the location of a central target and then reached to one of eight peripheral targets forming the corners of a virtual cube. The spectral amplitudes of the recorded LFPs were calculated, with the high-frequency LFP (HF-LFP) defined as the average spectral amplitude change from baseline from 60 to 200 Hz. A 3-D linear regression across the eight center-out targets revealed that approximately 6% of the beta LFPs (18-26 Hz) and 18% of the HF-LFPs were tuned for velocity (p-value < 0.05), while 10% of the beta LFPs and 15% of the HF-LFPs were tuned for position. These results suggest that a multidegree-of-freedom brain-machine interface is possible using high-frequency LFP recordings in motor cortex.
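A hedged sketch of the 3-D linear regression used to assess directional tuning; the simulated per-trial HF-LFP amplitudes and the assumed preferred direction are illustrative stand-ins for the recorded data.

```python
import numpy as np
from sklearn.linear_model import LinearRegression

# Hypothetical stand-in: mean HF-LFP amplitude per trial regressed on the 3-D
# direction toward the eight corner targets of the virtual cube.
targets = np.array([[x, y, z] for x in (-1, 1) for y in (-1, 1) for z in (-1, 1)], float)
preferred = np.array([0.6, -0.3, 0.7])                     # assumed preferred direction
rng = np.random.default_rng(0)
trials = np.repeat(targets, 10, axis=0)                    # 10 repetitions per target
amplitude = trials @ preferred + 2.0 + rng.normal(0, 0.3, len(trials))

fit = LinearRegression().fit(trials, amplitude)
r2 = fit.score(trials, amplitude)
print("fitted preferred direction:", fit.coef_ / np.linalg.norm(fit.coef_),
      "R^2:", round(r2, 2))   # a significant fit would count this channel as tuned
```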
Biomechanical evaluation of nursing tasks in a hospital setting.
Jang, R; Karwowski, W; Quesada, P M; Rodrick, D; Sherehiy, B; Cronin, S N; Layer, J K
2007-11-01
A field study was conducted to investigate spinal kinematics and loading in the nursing profession using objective and subjective measurements of selected nursing tasks observed in a hospital setting. Spinal loading was estimated using trunk motion dynamics measured by the lumbar motion monitor (LMM) and lower back compressive and shear forces were estimated using the three-dimensional (3D) Static Strength Prediction Program. Subjective measures included the rate of perceived physical effort and the perceived risk of low back pain. A multiple logistic regression model, reported in the literature for predicting low back injury based on defined risk groups, was tested. The study results concluded that the major risk factors for low back injury in nurses were the weight of patients handled, trunk moment, and trunk axial rotation. The activities that required long time exposure to awkward postures were perceived by nurses as a high physical effort. This study also concluded that self-reported perceived exertion could be used as a tool to identify nursing activities with a high risk of low-back injury.
imDEV: a graphical user interface to R multivariate analysis tools in Microsoft Excel
Grapov, Dmitry; Newman, John W.
2012-01-01
Summary: Interactive modules for Data Exploration and Visualization (imDEV) is a Microsoft Excel spreadsheet embedded application providing an integrated environment for the analysis of omics data through a user-friendly interface. Individual modules enable interactive and dynamic analyses of large data by interfacing R's multivariate statistics and highly customizable visualizations with the spreadsheet environment, aiding robust inferences and generating information-rich data visualizations. This tool provides access to multiple comparisons with false discovery correction, hierarchical clustering, principal and independent component analyses, partial least squares regression and discriminant analysis, through an intuitive interface for creating high-quality two- and three-dimensional visualizations including scatter plot matrices, distribution plots, dendrograms, heat maps, biplots, trellis biplots and correlation networks. Availability and implementation: Freely available for download at http://sourceforge.net/projects/imdev/. Implemented in R and VBA and supported by Microsoft Excel (2003, 2007 and 2010). Contact: John.Newman@ars.usda.gov Supplementary Information: Installation instructions, tutorials and users manual are available at http://sourceforge.net/projects/imdev/. PMID:22815358
Estimating the proportion of true null hypotheses when the statistics are discrete.
Dialsingh, Isaac; Austin, Stefanie R; Altman, Naomi S
2015-07-15
In high-dimensional testing problems, π0, the proportion of null hypotheses that are true, is an important parameter. For discrete test statistics, the P values come from a discrete distribution with finite support and the null distribution may depend on an ancillary statistic such as a table margin that varies among the test statistics. Methods for estimating π0 developed for continuous test statistics, which depend on a uniform or identical null distribution of P values, may not perform well when applied to discrete testing problems. This article introduces a number of π0 estimators, the regression and 'T' methods, that perform well with discrete test statistics and also assesses how well methods developed for or adapted from continuous tests perform with discrete tests. We demonstrate the usefulness of these estimators in the analysis of high-throughput biological RNA-seq and single-nucleotide polymorphism data. The methods are implemented in R. © The Author 2015. Published by Oxford University Press. All rights reserved. For Permissions, please e-mail: journals.permissions@oup.com.
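For orientation, the sketch below implements the classical Storey estimator of π0 for continuous P values; the article's regression and 'T' estimators adapt this idea to discrete tests, and the simulated P-value mixture is an assumption.

```python
import numpy as np

def storey_pi0(pvalues, lam=0.5):
    """Storey's estimator of the proportion of true nulls: under the null,
    P values above lam are roughly uniform, so their frequency scales with pi0."""
    pvalues = np.asarray(pvalues)
    return min(1.0, np.mean(pvalues > lam) / (1.0 - lam))

# toy mixture: 80% uniform null P values, 20% small alternative P values
rng = np.random.default_rng(0)
p = np.concatenate([rng.uniform(0, 1, 800), rng.beta(0.5, 20, 200)])
print(storey_pi0(p))   # should be close to 0.8
```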
ERIC Educational Resources Information Center
Fernandez-Sainz, A.; García-Merino, J. D.; Urionabarrenetxea, S.
2016-01-01
This paper seeks to discover whether the performance of university students has improved in the wake of the changes in higher education introduced by the Bologna Declaration of 1999 and the construction of the European Higher Education Area. A principal component analysis is used to construct a multi-dimensional performance variable called the…
A Novel Multiobjective Evolutionary Algorithm Based on Regression Analysis
Song, Zhiming; Wang, Maocai; Dai, Guangming; Vasile, Massimiliano
2015-01-01
As is known, the Pareto set of a continuous multiobjective optimization problem with m objective functions is a piecewise continuous (m − 1)-dimensional manifold in the decision space under some mild conditions. However, how to utilize this regularity to design multiobjective optimization algorithms has become a research focus. In this paper, based on this regularity, a model-based multiobjective evolutionary algorithm with regression analysis (MMEA-RA) is put forward to solve continuous multiobjective optimization problems with variable linkages. In the algorithm, the optimization problem is modelled as a promising area in the decision space by a probability distribution, and the centroid of the probability distribution is an (m − 1)-dimensional piecewise continuous manifold. The least squares method is used to construct such a model. A selection strategy based on nondominated sorting is used to choose the individuals for the next generation. The new algorithm is tested and compared with NSGA-II and RM-MEDA. The results show that MMEA-RA outperforms RM-MEDA and NSGA-II on the test instances with variable linkages. At the same time, MMEA-RA has higher efficiency than the other two algorithms. A few shortcomings of MMEA-RA have also been identified and discussed in this paper. PMID:25874246
Experimental ladder proof of Hardy's nonlocality for high-dimensional quantum systems
NASA Astrophysics Data System (ADS)
Chen, Lixiang; Zhang, Wuhong; Wu, Ziwen; Wang, Jikang; Fickler, Robert; Karimi, Ebrahim
2017-08-01
Recent years have witnessed a rapidly growing interest in high-dimensional quantum entanglement for fundamental studies as well as towards novel applications. Therefore, the ability to verify entanglement between physical qudits, d -dimensional quantum systems, is of crucial importance. To show nonclassicality, Hardy's paradox represents "the best version of Bell's theorem" without using inequalities. However, so far it has only been tested experimentally for bidimensional vector spaces. Here, we formulate a theoretical framework to demonstrate the ladder proof of Hardy's paradox for arbitrary high-dimensional systems. Furthermore, we experimentally demonstrate the ladder proof by taking advantage of the orbital angular momentum of high-dimensionally entangled photon pairs. We perform the ladder proof of Hardy's paradox for dimensions 3 and 4, both with the ladder up to the third step. Our paper paves the way towards a deeper understanding of the nature of high-dimensionally entangled quantum states and may find applications in quantum information science.
From Airborne EM to Geology, some examples
NASA Astrophysics Data System (ADS)
Gunnink, Jan
2014-05-01
Introduction Airborne Electromagnetics (AEM) provides a model of the 3-dimensional distribution of resistivity of the subsurface. These resistivity models were used for delineating geological structures (e.g. buried valleys and salt domes) and for geohydrological modeling of aquifers (sandy sediments) and aquitards (clayey sediments). Most of the interpretation of AEM has been carried out manually, by translating 2- and 3-dimensional resistivity models into geological units by a skilled geologist/geophysicist. The manual interpretation is tiresome, takes a long time and is prone to subjective choices of the interpreter. Therefore, semi-automatic interpretation of AEM resistivity models into geological units is a recent research topic. Two examples are presented that show how resistivity, as obtained from AEM, can be "converted" to useful geological / geohydrological models. Statistical relation between borehole data and resistivity In the northeastern part of the Netherlands, the 3D distribution of clay deposits - formed in a glacio-lacustrine environment with buried glacial valleys - was modelled. Boreholes with descriptions of lithology were linked to AEM resistivity. First, 1D AEM resistivity models from each individual sounding were interpolated to cover the entire study area, resulting in a 3-dimensional model of resistivity. For each interval of clay and sand in the boreholes, the corresponding resistivity was extracted from the 3D resistivity model. Linear regression was used to link the clay and non-clay proportion in each borehole interval to ln(resistivity). This regression is then used to "convert" the 3D resistivity model into a proportion of clay for the entire study area. This so-called "soft information" is combined with the "hard data" (boreholes) to model the proportion of clay for the entire study area using geostatistical simulation techniques (Sequential Indicator Simulation with collocated co-kriging). 100 realizations of the 3-dimensional distribution of clay and sand were calculated, giving an appreciation of the variability of the 3-dimensional distribution of clay and sand. Each realization was input into a groundwater model to assess the protection of the clay against pollution from the surface. Artificial Neural Networks AEM resistivity models in an area in the northern part of the Netherlands were interpreted by Artificial Neural Networks (ANN) to obtain a 3-dimensional model of a glacial till deposit that is important in geohydrological modeling. The groundwater in the study area was brackish to saline, causing the AEM resistivity model to be dominated by the low resistivity of the groundwater. After conducting Electrical Cone Penetration Tests (ECPTs) it became clear that the glacial till showed a distinct, non-linear pattern of resistivity that discriminated it from the surrounding sediments. The patterns found in the ECPTs were used to train an ANN, which was subsequently applied to the resistivity model that was derived from the AEM. The result was a 3-dimensional model of the probability of having the glacial till, which was checked against boreholes and proved to be quite reasonable. Conclusion Resistivity derived from AEM can be linked to geological features in a number of ways. Besides manual interpretation, statistical techniques are used, either in the form of regression or by means of Neural Networks, to extract geologically and geohydrologically meaningful interpretations from the resistivity model.
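A minimal sketch of the regression step that links borehole clay proportion to ln(resistivity) and then "converts" resistivity voxels into soft clay estimates; the simulated borehole intervals and coefficients are assumptions, and the geostatistical simulation and ANN stages are not shown.

```python
import numpy as np
from sklearn.linear_model import LinearRegression

# Hypothetical borehole intervals: clay proportion vs. log-resistivity extracted
# from the interpolated 3-D AEM model (clay-rich intervals are less resistive).
rng = np.random.default_rng(0)
resistivity = rng.uniform(5, 120, 300)                  # ohm-m at borehole intervals
clay = np.clip(1.1 - 0.25 * np.log(resistivity) + rng.normal(0, 0.08, 300), 0, 1)

reg = LinearRegression().fit(np.log(resistivity).reshape(-1, 1), clay)

# "convert" any resistivity voxel of the 3-D model into a soft clay-proportion estimate
voxel_log_resistivity = np.array([[np.log(15.0)], [np.log(80.0)]])
print(np.clip(reg.predict(voxel_log_resistivity), 0, 1))
```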
Fu, Kin Chung Denny; Dalla Libera, Fabio; Ishiguro, Hiroshi
2015-10-08
In the field of human motor control, the motor synergy hypothesis explains how humans simplify body control dimensionality by coordinating groups of muscles, called motor synergies, instead of controlling muscles independently. In most applications of motor synergies to low-dimensional control in robotics, motor synergies are extracted from given optimal control signals. In this paper, we address the problems of how to extract motor synergies without optimal data given, and how to apply motor synergies to achieve low-dimensional task-space tracking control of a human-like robotic arm actuated by redundant muscles, without prior knowledge of the robot. We propose to extract motor synergies from a subset of randomly generated reaching-like movement data. The essence is to first approximate the corresponding optimal control signals, using estimations of the robot's forward dynamics, and to extract the motor synergies subsequently. In order to avoid modeling difficulties, a learning-based control approach is adopted such that control is accomplished via estimations of the robot's inverse dynamics. We present a kernel-based regression formulation to estimate the forward and the inverse dynamics, and a sliding controller in order to cope with estimation error. Numerical evaluations show that the proposed method enables extraction of motor synergies for low-dimensional task-space control.
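A hedged, one-degree-of-freedom sketch of the kernel-based regression used to estimate (inverse) dynamics; kernel ridge regression stands in for the paper's formulation, and the toy dynamics and hyper-parameters are assumptions.

```python
import numpy as np
from sklearn.kernel_ridge import KernelRidge

# Hypothetical 1-DOF stand-in: learn an inverse model mapping
# (angle, velocity, desired acceleration) -> required torque.
rng = np.random.default_rng(0)
q = rng.uniform(-1, 1, 1000)
dq = rng.uniform(-2, 2, 1000)
ddq = rng.uniform(-5, 5, 1000)
torque = 2.0 * ddq + 0.5 * dq + 3.0 * np.sin(q) + rng.normal(0, 0.05, 1000)  # toy dynamics

X = np.column_stack([q, dq, ddq])
inverse_model = KernelRidge(kernel="rbf", alpha=1e-2, gamma=0.5).fit(X, torque)

# control-style query: torque needed for a desired acceleration at the current state
print(inverse_model.predict([[0.2, 0.0, 1.5]]))
```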
Engineering two-photon high-dimensional states through quantum interference
Zhang, Yingwen; Roux, Filippus S.; Konrad, Thomas; Agnew, Megan; Leach, Jonathan; Forbes, Andrew
2016-01-01
Many protocols in quantum science, for example, linear optical quantum computing, require access to large-scale entangled quantum states. Such systems can be realized through many-particle qubits, but this approach often suffers from scalability problems. An alternative strategy is to consider a lesser number of particles that exist in high-dimensional states. The spatial modes of light are one such candidate that provides access to high-dimensional quantum states, and thus they increase the storage and processing potential of quantum information systems. We demonstrate the controlled engineering of two-photon high-dimensional states entangled in their orbital angular momentum through Hong-Ou-Mandel interference. We prepare a large range of high-dimensional entangled states and implement precise quantum state filtering. We characterize the full quantum state before and after the filter, and are thus able to determine that only the antisymmetric component of the initial state remains. This work paves the way for high-dimensional processing and communication of multiphoton quantum states, for example, in teleportation beyond qubits. PMID:26933685
A Comparison of Methods for Estimating the Determinant of High-Dimensional Covariance Matrix.
Hu, Zongliang; Dong, Kai; Dai, Wenlin; Tong, Tiejun
2017-09-21
The determinant of the covariance matrix for high-dimensional data plays an important role in statistical inference and decision. It has many real applications including statistical tests and information theory. Due to the statistical and computational challenges with high dimensionality, little work has been proposed in the literature for estimating the determinant of high-dimensional covariance matrix. In this paper, we estimate the determinant of the covariance matrix using some recent proposals for estimating high-dimensional covariance matrix. Specifically, we consider a total of eight covariance matrix estimation methods for comparison. Through extensive simulation studies, we explore and summarize some interesting comparison results among all compared methods. We also provide practical guidelines based on the sample size, the dimension, and the correlation of the data set for estimating the determinant of high-dimensional covariance matrix. Finally, from a perspective of the loss function, the comparison study in this paper may also serve as a proxy to assess the performance of the covariance matrix estimation.
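A small illustration of why the problem is hard: when the dimension is close to the sample size, the log-determinant of the sample covariance is badly biased, while a shrinkage estimator such as Ledoit-Wolf behaves better; the simulated covariance is an assumption, and the paper's eight estimators are not reproduced here.

```python
import numpy as np
from sklearn.covariance import LedoitWolf

# Compare log-determinants of the true, sample, and shrinkage-estimated covariance
# when the dimension p is close to the sample size n.
rng = np.random.default_rng(0)
p, n = 80, 100
true_cov = np.diag(np.linspace(0.5, 2.0, p))
X = rng.multivariate_normal(np.zeros(p), true_cov, size=n)

sample_cov = np.cov(X, rowvar=False)
lw_cov = LedoitWolf().fit(X).covariance_

def logdet(S):
    return np.linalg.slogdet(S)[1]

print("true      :", round(logdet(true_cov), 1))
print("sample    :", round(logdet(sample_cov), 1))   # strongly biased downward for p ~ n
print("Ledoit-Wolf:", round(logdet(lw_cov), 1))
```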
Torres-Valencia, Cristian A; Álvarez, Mauricio A; Orozco-Gutiérrez, Alvaro A
2014-01-01
Human emotion recognition (HER) allows the assessment of the affective state of a subject. Until recently, such emotional states were described in terms of discrete emotions, like happiness or contempt. In order to cover a high range of emotions, researchers in the field have introduced different dimensional spaces for emotion description that allow the characterization of affective states in terms of several variables or dimensions that measure distinct aspects of the emotion. One of the most common of such dimensional spaces is the bidimensional Arousal/Valence space. To the best of our knowledge, all HER systems so far have modelled the dimensions in these dimensional spaces independently. In this paper, we study the effect of modelling the output dimensions simultaneously and show experimentally the advantages of modelling them in this way. We consider a multimodal approach by including features from the Electroencephalogram and a few physiological signals. For modelling the multiple outputs, we employ a multiple output regressor based on support vector machines. We also include a stage of feature selection that is developed within an embedded approach known as Recursive Feature Elimination (RFE), proposed initially for SVM. The results show that several features can be eliminated using the multiple output support vector regressor with RFE without affecting the performance of the regressor. From the analysis of the features selected in smaller subsets via RFE, it can be observed that the signals most informative for discrimination in the arousal and valence space are the EEG, Electrooculogram/Electromiogram (EOG/EMG) and the Galvanic Skin Response (GSR).
Comparison of three-dimensional multi-segmental foot models used in clinical gait laboratories.
Nicholson, Kristen; Church, Chris; Takata, Colton; Niiler, Tim; Chen, Brian Po-Jung; Lennon, Nancy; Sees, Julie P; Henley, John; Miller, Freeman
2018-05-16
Many skin-mounted three-dimensional multi-segmented foot models are currently in use for gait analysis. Evidence regarding the repeatability of models, including between trial and between assessors, is mixed, and there are no between model comparisons of kinematic results. This study explores differences in kinematics and repeatability between five three-dimensional multi-segmented foot models. The five models include duPont, Heidelberg, Oxford Child, Leardini, and Utah. Hind foot, forefoot, and hallux angles were calculated with each model for ten individuals. Two physical therapists applied markers three times to each individual to assess within and between therapist variability. Standard deviations were used to evaluate marker placement variability. Locally weighted regression smoothing with alpha-adjusted serial T tests analysis was used to assess kinematic similarities. All five models had similar variability, however, the Leardini model showed high standard deviations in plantarflexion/dorsiflexion angles. P-value curves for the gait cycle were used to assess kinematic similarities. The duPont and Oxford models had the most similar kinematics. All models demonstrated similar marker placement variability. Lower variability was noted in the sagittal and coronal planes compared to rotation in the transverse plane, suggesting a higher minimal detectable change when clinically considering rotation and a need for additional research. Between the five models, the duPont and Oxford shared the most kinematic similarities. While patterns of movement were very similar between all models, offsets were often present and need to be considered when evaluating published data. Copyright © 2018 Elsevier B.V. All rights reserved.
Gonda, Tomoya; Yasuda, Daiisa; Ikebe, Kazunori; Maeda, Yoshinobu
2014-01-01
Although the risks of using a cantilever to treat missing teeth have been described, the mechanisms remain unclear. This study aimed to reveal these mechanisms from a biomechanical perspective. The effects of various implant sites, number of implants, and superstructural connections on stress distribution in the marginal bone were analyzed with three-dimensional finite element models based on mandibular computed tomography data. Forces from the masseter, temporalis, and internal pterygoid were applied as vectors. Two three-dimensional finite element models were created with the edentulous mandible showing severe and relatively modest residual ridge resorption. Cantilevers of the premolar and molar were simulated in the superstructures in the models. The following conditions were also included as factors in the models to investigate changes: poor bone quality, shortened dental arch, posterior occlusion, lateral occlusion, double force of the masseter, and short implant. Multiple linear regression analysis with a forced-entry method was performed with stress values as the objective variable and the factors as the explanatory variable. When bone mass was high, stress around the implant caused by differences in implantation sites was reduced. When bone mass was low, the presence of a cantilever was a possible risk factor. The stress around the implant increased significantly if bone quality was poor or if increased force (eg, bruxism) was applied. The addition of a cantilever to the superstructure increased stress around implants. When large muscle forces were applied to a superstructure with cantilevers or if bone quality was poor, stress around the implants increased.
Particle shape effect on erosion of optical glass substrates due to microparticles
NASA Astrophysics Data System (ADS)
Waxman, Rachel; Gray, Perry; Guven, Ibrahim
2018-03-01
Impact experiments using sand particles and soda lime glass spheres were performed on four distinct glass substrates. Sand particles were characterized using optical and scanning electron microscopy. High-speed video footage from impact tests was used to calculate incoming and rebound velocities of the individual impact events, as well as the particle volume and two-dimensional sphericity. Furthermore, video analysis was used in conjunction with optical and scanning electron microscopy to relate the incoming velocity and particle shape to subsequent fractures, including both radial and lateral cracks. Indentation theory [Marshall et al., J. Am. Ceram. Soc. 65, 561-566 (1982)] was applied and correlated with lateral crack lengths. Multi-variable power law regression was performed, incorporating the particle shape into the model and was shown to have better fit to damage data than the previous indentation model.
NASA Astrophysics Data System (ADS)
Tikhonov, Mikhail; Monasson, Remi
2018-01-01
Much of our understanding of ecological and evolutionary mechanisms derives from analysis of low-dimensional models: with few interacting species, or few axes defining "fitness". It is not always clear to what extent the intuition derived from low-dimensional models applies to the complex, high-dimensional reality. For instance, most naturally occurring microbial communities are strikingly diverse, harboring a large number of coexisting species, each of which contributes to shaping the environment of others. Understanding the eco-evolutionary interplay in these systems is an important challenge, and an exciting new domain for statistical physics. Recent work identified a promising new platform for investigating highly diverse ecosystems, based on the classic resource competition model of MacArthur. Here, we describe how the same analytical framework can be used to study evolutionary questions. Our analysis illustrates how, at high dimension, the intuition promoted by a one-dimensional (scalar) notion of fitness can become misleading. Specifically, while the low-dimensional picture emphasizes organism cost or efficiency, we exhibit a regime where cost becomes irrelevant for survival, and link this observation to generic properties of high-dimensional geometry.
NASA Astrophysics Data System (ADS)
Ma, Yun-Ming; Wang, Tie-Jun
2017-10-01
Higher-dimensional quantum systems are of great interest owing to the outstanding features they exhibit in the implementation of novel fundamental tests of nature and their application in various quantum information tasks. High-dimensional quantum logic gates are key elements in scalable quantum computation and quantum communication. In this paper, we propose a scheme to implement a controlled-phase gate between a 2N-dimensional photon and N three-level artificial atoms. This high-dimensional controlled-phase gate can serve as a crucial component of high-capacity, long-distance quantum communication. We use high-dimensional Bell state analysis as an example to show the application of this device. Estimates of the system requirements indicate that our protocol is realizable with existing or near-future technologies. This scheme is ideally suited to solid-state integrated optical approaches to quantum information processing, and it can be applied to various systems, such as superconducting qubits coupled to a resonator or nitrogen-vacancy centers coupled to photonic-band-gap structures.
A Hybrid Semi-Supervised Anomaly Detection Model for High-Dimensional Data.
Song, Hongchao; Jiang, Zhuqing; Men, Aidong; Yang, Bo
2017-01-01
Anomaly detection, which aims to identify observations that deviate from a nominal sample, is a challenging task for high-dimensional data. Traditional distance-based anomaly detection methods compute the neighborhood distance for each observation and suffer from the curse of dimensionality in high-dimensional space; for example, the distances between any pair of samples become similar and each sample may appear to be an outlier. In this paper, we propose a hybrid semi-supervised anomaly detection model for high-dimensional data that consists of two parts: a deep autoencoder (DAE) and an ensemble k-nearest neighbor graphs- (K-NNG-) based anomaly detector. Benefiting from the ability of nonlinear mapping, the DAE is first trained to learn the intrinsic features of a high-dimensional dataset to represent the high-dimensional data in a more compact subspace. Several nonparametric KNN-based anomaly detectors are then built from different subsets that are randomly sampled from the whole dataset. The final prediction is made by combining all the anomaly detectors. The performance of the proposed method is evaluated on several real-life datasets, and the results confirm that the proposed hybrid model improves the detection accuracy and reduces the computational complexity.
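A minimal sketch of the two-stage idea, with PCA standing in for the deep autoencoder compression step and mean k-NN distances on random subsamples acting as the ensemble of detectors; all data shapes and parameter values are illustrative, not the authors' implementation.

```python
# Stage 1: compress high-dimensional data; Stage 2: ensemble of k-NN anomaly scores.
import numpy as np
from sklearn.decomposition import PCA
from sklearn.neighbors import NearestNeighbors

rng = np.random.default_rng(0)
X = rng.normal(size=(1000, 200))                         # nominal high-dimensional data
X_test = np.vstack([rng.normal(size=(5, 200)),
                    rng.normal(loc=4.0, size=(5, 200))]) # last 5 rows are anomalies

encoder = PCA(n_components=10).fit(X)                    # stand-in for the trained DAE
Z, Z_test = encoder.transform(X), encoder.transform(X_test)

def knn_score(train, query, k=5):
    nn = NearestNeighbors(n_neighbors=k).fit(train)
    dist, _ = nn.kneighbors(query)
    return dist.mean(axis=1)                             # larger distance = more anomalous

# Ensemble: detectors built on random subsets of the compressed training data.
scores = np.mean([knn_score(Z[rng.choice(len(Z), 300, replace=False)], Z_test)
                  for _ in range(10)], axis=0)
print(scores.round(2))                                   # anomalies receive higher scores
```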
Harnessing high-dimensional hyperentanglement through a biphoton frequency comb
NASA Astrophysics Data System (ADS)
Xie, Zhenda; Zhong, Tian; Shrestha, Sajan; Xu, Xinan; Liang, Junlin; Gong, Yan-Xiao; Bienfang, Joshua C.; Restelli, Alessandro; Shapiro, Jeffrey H.; Wong, Franco N. C.; Wei Wong, Chee
2015-08-01
Quantum entanglement is a fundamental resource for secure information processing and communications, and hyperentanglement or high-dimensional entanglement has been separately proposed for its high data capacity and error resilience. The continuous-variable nature of the energy-time entanglement makes it an ideal candidate for efficient high-dimensional coding with minimal limitations. Here, we demonstrate the first simultaneous high-dimensional hyperentanglement using a biphoton frequency comb to harness the full potential in both the energy and time domain. Long-postulated Hong-Ou-Mandel quantum revival is exhibited, with up to 19 time-bins and 96.5% visibilities. We further witness the high-dimensional energy-time entanglement through Franson revivals, observed periodically at integer time-bins, with 97.8% visibility. This qudit state is observed to simultaneously violate the generalized Bell inequality by up to 10.95 standard deviations while observing recurrent Clauser-Horne-Shimony-Holt S-parameters up to 2.76. Our biphoton frequency comb provides a platform for photon-efficient quantum communications towards the ultimate channel capacity through energy-time-polarization high-dimensional encoding.
Banerjee, Arindam; Ghosh, Joydeep
2004-05-01
Competitive learning mechanisms for clustering, in general, suffer from poor performance for very high-dimensional (>1000) data because of "curse of dimensionality" effects. In applications such as document clustering, it is customary to normalize the high-dimensional input vectors to unit length, and it is sometimes also desirable to obtain balanced clusters, i.e., clusters of comparable sizes. The spherical kmeans (spkmeans) algorithm, which normalizes the cluster centers as well as the inputs, has been successfully used to cluster normalized text documents in 2000+ dimensional space. Unfortunately, like regular kmeans and its soft expectation-maximization-based version, spkmeans tends to generate extremely imbalanced clusters in high-dimensional spaces when the desired number of clusters is large (tens or more). This paper first shows that the spkmeans algorithm can be derived from a certain maximum likelihood formulation using a mixture of von Mises-Fisher distributions as the generative model, and in fact, it can be considered as a batch-mode version of (normalized) competitive learning. The proposed generative model is then adapted in a principled way to yield three frequency-sensitive competitive learning variants that are applicable to static data and produce high-quality, well-balanced clusters for high-dimensional data. Like kmeans, each iteration is linear in the number of data points and in the number of clusters for all three algorithms. A frequency-sensitive algorithm to cluster streaming data is also proposed. Experimental results on clustering of high-dimensional text data sets are provided to show the effectiveness and applicability of the proposed techniques. Index Terms: Balanced clustering, expectation maximization (EM), frequency-sensitive competitive learning (FSCL), high-dimensional clustering, kmeans, normalized data, scalable clustering, streaming data, text clustering.
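A minimal sketch of the baseline spherical k-means algorithm described above, in which both the input vectors and the cluster centroids are kept at unit length so that assignment reduces to cosine similarity; the toy data and parameters are illustrative, and the frequency-sensitive variants are not shown.

```python
import numpy as np

def spkmeans(X, k, n_iter=50, seed=0):
    rng = np.random.default_rng(seed)
    X = X / np.linalg.norm(X, axis=1, keepdims=True)       # normalize inputs to unit length
    centers = X[rng.choice(len(X), k, replace=False)]
    for _ in range(n_iter):
        labels = np.argmax(X @ centers.T, axis=1)           # assignment by cosine similarity
        for j in range(k):
            members = X[labels == j]
            if len(members):
                c = members.sum(axis=0)
                centers[j] = c / np.linalg.norm(c)           # re-normalize centroid
    return labels, centers

X = np.abs(np.random.default_rng(1).normal(size=(500, 2000)))  # toy "document" vectors
labels, centers = spkmeans(X, k=10)
print(np.bincount(labels))   # cluster sizes; imbalance tends to grow with k in high dimensions
```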
Efficient statistically accurate algorithms for the Fokker-Planck equation in large dimensions
NASA Astrophysics Data System (ADS)
Chen, Nan; Majda, Andrew J.
2018-02-01
Solving the Fokker-Planck equation for high-dimensional complex turbulent dynamical systems is an important and practical issue. However, most traditional methods suffer from the curse of dimensionality and have difficulties in capturing the fat tailed highly intermittent probability density functions (PDFs) of complex systems in turbulence, neuroscience and excitable media. In this article, efficient statistically accurate algorithms are developed for solving both the transient and the equilibrium solutions of Fokker-Planck equations associated with high-dimensional nonlinear turbulent dynamical systems with conditional Gaussian structures. The algorithms involve a hybrid strategy that requires only a small number of ensembles. Here, a conditional Gaussian mixture in a high-dimensional subspace via an extremely efficient parametric method is combined with a judicious non-parametric Gaussian kernel density estimation in the remaining low-dimensional subspace. Particularly, the parametric method provides closed analytical formulae for determining the conditional Gaussian distributions in the high-dimensional subspace and is therefore computationally efficient and accurate. The full non-Gaussian PDF of the system is then given by a Gaussian mixture. Different from traditional particle methods, each conditional Gaussian distribution here covers a significant portion of the high-dimensional PDF. Therefore a small number of ensembles is sufficient to recover the full PDF, which overcomes the curse of dimensionality. Notably, the mixture distribution has significant skill in capturing the transient behavior with fat tails of the high-dimensional non-Gaussian PDFs, and this facilitates the algorithms in accurately describing the intermittency and extreme events in complex turbulent systems. It is shown in a stringent set of test problems that the method only requires an order of O (100) ensembles to successfully recover the highly non-Gaussian transient PDFs in up to 6 dimensions with only small errors.
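A schematic of the hybrid density estimate described above, assuming each ensemble member carries a sample in the low-dimensional (kernel-density) subspace together with a closed-form conditional Gaussian in the high-dimensional subspace; the dimensions, bandwidth rule, and random conditional statistics below are illustrative placeholders, not the paper's formulae.

```python
import numpy as np
from scipy.stats import multivariate_normal

rng = np.random.default_rng(0)
N, d_low, d_high = 100, 2, 6                       # ensemble size and subspace dimensions
u_samples = rng.normal(size=(N, d_low))            # ensemble samples in the low-dim subspace
mu = rng.normal(size=(N, d_high))                  # conditional Gaussian means (closed form)
Sigma = np.array([np.eye(d_high) * s for s in rng.uniform(0.5, 2.0, N)])

H = np.eye(d_low) * N ** (-2.0 / (d_low + 4))      # simple Silverman-type kernel bandwidth

def hybrid_pdf(u, v):
    """Mixture over ensemble members: Gaussian kernel in u times conditional Gaussian in v."""
    vals = [multivariate_normal.pdf(u, mean=ui, cov=H) *
            multivariate_normal.pdf(v, mean=mi, cov=Si)
            for ui, mi, Si in zip(u_samples, mu, Sigma)]
    return np.mean(vals)

print(hybrid_pdf(np.zeros(d_low), np.zeros(d_high)))
```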
Bedload and Total Load Sediment Transport Equations for Rough Open-Channel Flow
NASA Astrophysics Data System (ADS)
Abrahams, A. D.; Gao, P.
2001-12-01
The total sediment load transported by an open-channel flow may be divided into bedload and suspended load. Bedload transport occurs by saltation at low shear stress and by sheetflow at high shear stress. Dimensional analysis is used to identify the dimensionless variables that control the transport rate of noncohesive sediments over a plane bed, and regression analysis is employed to isolate the significant variables and determine the values of the coefficients. In the general bedload transport equation (i.e. for saltation and sheetflow) the dimensionless bedload transport rate is a function of the dimensionless shear stress, the friction factor, and an efficiency coefficient. For sheetflow the last term approaches 1, so that the bedload transport rate becomes a function of just the dimensionless shear stress and the friction factor. The dimensional analysis indicates that the dimensionless total load transport rate is a function of the dimensionless bedload transport rate and the dimensionless settling velocity of the sediment. Predicted values of the transport rates are graphed against the computed values of these variables for 505 flume experiments reported in the literature. These graphs indicate that the equations developed in this study give good unbiased predictions of both the bedload transport rate and total load transport rate over a wide range of conditions.
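For orientation, the dimensionless groups usually employed in analyses of this kind are the Shields stress and an Einstein-type transport parameter (the paper's exact definitions may differ): $$\theta = \frac{\tau}{(\rho_s-\rho)\,g\,d}, \qquad \Phi = \frac{q_b}{\sqrt{(s-1)\,g\,d^{3}}}, \qquad s=\rho_s/\rho,$$ where $$\tau$$ is the bed shear stress, $$\rho_s$$ and $$\rho$$ are the sediment and fluid densities, $$d$$ is the grain diameter, and $$q_b$$ is the volumetric bedload transport rate per unit width. In this notation, the relations described above take the schematic forms $$\Phi = f(\theta, f_f, e_b)$$ for general bedload (friction factor $$f_f$$, efficiency coefficient $$e_b$$), reducing to $$\Phi = f(\theta, f_f)$$ for sheetflow, and $$\Phi_t = g(\Phi, w_*)$$ for the total load with dimensionless settling velocity $$w_*$$.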
Xie, Li-Ting; Yan, Chun-Hong; Zhao, Qi-Yu; He, Meng-Na; Jiang, Tian-An
2018-01-01
Two-dimensional shear wave elastography (2D-SWE) is a rapid, simple and novel noninvasive method that has been proposed for assessing hepatic fibrosis in patients with chronic liver diseases (CLDs) based on measurements of liver stiffness. 2D-SWE can be performed easily at the bedside or in an outpatient clinic and yields immediate results with good reproducibility. Furthermore, 2D-SWE was an efficient method for evaluating liver fibrosis in small to moderately sized clinical trials. However, the quality criteria for the staging of liver fibrosis are not yet well defined. Liver fibrosis is the main pathological basis of liver stiffness and a key step in the progression from CLD to cirrhosis; thus, the management of CLD largely depends on the extent and progression of liver fibrosis. 2D-SWE appears to be an excellent tool for the early detection of cirrhosis and may have prognostic value in this context. Because 2D-SWE has high patient acceptance, it could be useful for monitoring fibrosis progression and regression in individual cases. However, multicenter data are needed to support its use. This study reviews the current status and future perspectives of 2D-SWE for assessments of liver fibrosis and discusses the technical advantages and limitations that impact its effective and rational clinical use. PMID:29531460
A data-driven approach for modeling post-fire debris-flow volumes and their uncertainty
Friedel, Michael J.
2011-01-01
This study demonstrates the novel application of genetic programming to evolve nonlinear post-fire debris-flow volume equations from variables associated with a data-driven conceptual model of the western United States. The search space is constrained using a multi-component objective function that simultaneously minimizes root-mean squared and unit errors for the evolution of fittest equations. An optimization technique is then used to estimate the limits of nonlinear prediction uncertainty associated with the debris-flow equations. In contrast to a published multiple linear regression three-variable equation, linking basin area with slopes greater or equal to 30 percent, burn severity characterized as area burned moderate plus high, and total storm rainfall, the data-driven approach discovers many nonlinear and several dimensionally consistent equations that are unbiased and have less prediction uncertainty. Of the nonlinear equations, the best performance (lowest prediction uncertainty) is achieved when using three variables: average basin slope, total burned area, and total storm rainfall. Further reduction in uncertainty is possible for the nonlinear equations when dimensional consistency is not a priority and by subsequently applying a gradient solver to the fittest solutions. The data-driven modeling approach can be applied to nonlinear multivariate problems in all fields of study.
Ghanbari, Yasser; Smith, Alex R.; Schultz, Robert T.; Verma, Ragini
2014-01-01
Diffusion tensor imaging (DTI) offers rich insights into the physical characteristics of white matter (WM) fiber tracts and their development in the brain, facilitating a network representation of the brain's traffic pathways. Such a network representation of brain connectivity has provided a novel means of investigating brain changes arising from pathology, development or aging. The high dimensionality of these connectivity networks necessitates the development of methods that identify the connectivity building blocks or sub-network components that characterize the underlying variation in the population. In addition, the projection of the subject networks onto the basis set provides a low-dimensional representation of each network that teases apart different sources of variation in the sample, facilitating variation-specific statistical analysis. We propose a unified framework of non-negative matrix factorization and graph embedding for learning sub-network patterns of connectivity by their projective non-negative decomposition into a reconstructive basis set, as well as additional basis sets representing variational sources in the population like age and pathology. The proposed framework is applied to a study of diffusion-based connectivity in subjects with autism that shows localized sparse sub-networks which mostly capture the changes related to pathology and developmental variations. PMID:25037933
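An illustrative sketch of decomposing a subjects-by-connections matrix into a small set of non-negative sub-network components and per-subject loadings; plain scikit-learn NMF is used here as a simplified stand-in for the projective non-negative decomposition with graph embedding proposed in the paper, and all shapes are invented.

```python
import numpy as np
from sklearn.decomposition import NMF

rng = np.random.default_rng(0)
n_subjects, n_edges = 80, 4005             # e.g. upper triangle of a 90x90 connectome
X = rng.random((n_subjects, n_edges))      # non-negative connectivity strengths

model = NMF(n_components=10, init="nndsvda", max_iter=500, random_state=0)
W = model.fit_transform(X)                 # per-subject loadings (low-dimensional projection)
H = model.components_                      # sub-network "building blocks" over edges

# W can then be related to age or diagnosis to isolate variation-specific components.
print(W.shape, H.shape)
```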
Compressed sparse tensor based quadrature for vibrational quantum mechanics integrals
Rai, Prashant; Sargsyan, Khachik; Najm, Habib N.
2018-03-20
A new method for fast evaluation of high dimensional integrals arising in quantum mechanics is proposed. Here, the method is based on sparse approximation of a high dimensional function followed by a low-rank compression. In the first step, we interpret the high dimensional integrand as a tensor in a suitable tensor product space and determine its entries by a compressed sensing based algorithm using only a few function evaluations. Secondly, we implement a rank reduction strategy to compress this tensor in a suitable low-rank tensor format using standard tensor compression tools. This allows representing a high dimensional integrand function as a small sum of products of low dimensional functions. Finally, a low dimensional Gauss–Hermite quadrature rule is used to integrate this low-rank representation, thus alleviating the curse of dimensionality. Numerical tests on synthetic functions, as well as on energy correction integrals for water and formaldehyde molecules, demonstrate the efficiency of this method using very few function evaluations as compared to other integration strategies.
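A sketch of the final quadrature step: once the high-dimensional integrand is written as a small sum of products of one-dimensional functions (a low-rank format), each term factorizes into cheap one-dimensional Gauss-Hermite quadratures. The rank-2 separable integrand below is a toy example, not the paper's vibrational integrand.

```python
import numpy as np

d, rank, n_pts = 6, 2, 20
x, w = np.polynomial.hermite.hermgauss(n_pts)   # nodes/weights for weight exp(-x^2)

# Low-rank integrand: f(x_1..x_d) = sum_r prod_k g_{r,k}(x_k)  (toy 1-D factors)
g = [[(lambda x, a=r + k: np.cos(a * x)) for k in range(d)] for r in range(rank)]

integral = 0.0
for r in range(rank):
    term = 1.0
    for k in range(d):
        term *= np.sum(w * g[r][k](x))          # one 1-D quadrature per factor
    integral += term
print(integral)   # approximates the d-dimensional integral of f against exp(-|x|^2)
```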
Crevillén-García, D
2018-04-01
Time-consuming numerical simulators for solving groundwater flow and dissolution models of physico-chemical processes in deep aquifers normally require some of the model inputs to be defined in high-dimensional spaces in order to return realistic results. Sometimes, the outputs of interest are spatial fields leading to high-dimensional output spaces. Although Gaussian process emulation has been satisfactorily used for computing faithful and inexpensive approximations of complex simulators, these have been mostly applied to problems defined in low-dimensional input spaces. In this paper, we propose a method for simultaneously reducing the dimensionality of very high-dimensional input and output spaces in Gaussian process emulators for stochastic partial differential equation models while retaining the qualitative features of the original models. This allows us to build a surrogate model for the prediction of spatial fields in such time-consuming simulators. We apply the methodology to a model of convection and dissolution processes occurring during carbon capture and storage.
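A minimal sketch of emulating a field-valued simulator with dimension reduction on the output side: PCA compresses the high-dimensional output fields and an independent Gaussian process is fitted per retained component. The stand-in simulator, sizes, and kernel choice are placeholders rather than the groundwater model and reduction scheme of the paper.

```python
import numpy as np
from sklearn.decomposition import PCA
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import RBF

rng = np.random.default_rng(0)
n_runs, n_inputs, n_grid = 60, 8, 2500
X = rng.uniform(size=(n_runs, n_inputs))                    # (reduced) input samples
Y = np.sin(X @ rng.normal(size=(n_inputs, n_grid)))         # stand-in spatial output fields

pca = PCA(n_components=5).fit(Y)                            # output-space reduction
scores = pca.transform(Y)

gps = [GaussianProcessRegressor(kernel=RBF(length_scale=np.ones(n_inputs)),
                                normalize_y=True).fit(X, scores[:, j])
       for j in range(scores.shape[1])]

def emulate(x_new):
    """Predict the full spatial field by back-transforming the per-component GP predictions."""
    s = np.column_stack([gp.predict(x_new) for gp in gps])
    return pca.inverse_transform(s)

print(emulate(rng.uniform(size=(3, n_inputs))).shape)        # (3, 2500)
```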
NASA Astrophysics Data System (ADS)
Liu, Changying; Wu, Xinyuan
2017-07-01
In this paper we explore arbitrarily high-order Lagrange collocation-type time-stepping schemes for effectively solving high-dimensional nonlinear Klein-Gordon equations with different boundary conditions. We begin with one-dimensional periodic boundary problems and first formulate an abstract ordinary differential equation (ODE) on a suitable infinite-dimensional function space based on operator spectrum theory. We then introduce an operator-variation-of-constants formula which is essential for the derivation of our arbitrarily high-order Lagrange collocation-type time-stepping schemes for the nonlinear abstract ODE. The nonlinear stability and convergence are rigorously analysed once the spatial differential operator is approximated by an appropriate positive semi-definite matrix under some suitable smoothness assumptions. With regard to two-dimensional Dirichlet or Neumann boundary problems, our new time-stepping schemes coupled with the discrete Fast Sine/Cosine Transformation can be applied to simulate the two-dimensional nonlinear Klein-Gordon equations effectively. All essential features of the methodology are present in the one-dimensional and two-dimensional cases, although the schemes analysed lend themselves equally to the higher-dimensional case. The numerical simulation is implemented and the numerical results clearly demonstrate the advantage and effectiveness of our new schemes in comparison with the existing numerical methods for solving nonlinear Klein-Gordon equations in the literature.
Efficient differentially private learning improves drug sensitivity prediction.
Honkela, Antti; Das, Mrinal; Nieminen, Arttu; Dikmen, Onur; Kaski, Samuel
2018-02-06
Users of a personalised recommendation system face a dilemma: recommendations can be improved by learning from data, but only if other users are willing to share their private information. Good personalised predictions are vitally important in precision medicine, but genomic information on which the predictions are based is also particularly sensitive, as it directly identifies the patients and hence cannot easily be anonymised. Differential privacy has emerged as a potentially promising solution: privacy is considered sufficient if presence of individual patients cannot be distinguished. However, differentially private learning with current methods does not improve predictions with feasible data sizes and dimensionalities. We show that useful predictors can be learned under powerful differential privacy guarantees, and even from moderately-sized data sets, by demonstrating significant improvements in the accuracy of private drug sensitivity prediction with a new robust private regression method. Our method matches the predictive accuracy of the state-of-the-art non-private lasso regression using only 4x more samples under relatively strong differential privacy guarantees. Good performance with limited data is achieved by limiting the sharing of private information by decreasing the dimensionality and by projecting outliers to fit tighter bounds, therefore needing to add less noise for equal privacy. The proposed differentially private regression method combines theoretical appeal and asymptotic efficiency with good prediction accuracy even with moderate-sized data. As already the simple-to-implement method shows promise on the challenging genomic data, we anticipate rapid progress towards practical applications in many fields. This article was reviewed by Zoltan Gaspari and David Kreil.
Medvedofsky, Diego; Addetia, Karima; Patel, Amit R; Sedlmeier, Anke; Baumann, Rolf; Mor-Avi, Victor; Lang, Roberto M
2015-10-01
Echocardiographic assessment of the right ventricle is difficult because of its complex shape. Three-dimensional echocardiographic (3DE) imaging allows more accurate and reproducible analysis of the right ventricle than two-dimensional methodology. However, three-dimensional volumetric analysis has been hampered by difficulties obtaining consistently high-quality coronal views, required by the existing software packages. The aim of this study was to test a new approach for volumetric analysis without coronal views by using instead right ventricle-focused three-dimensional acquisition with multiple short-axis views extracted from the same data set. Transthoracic 3DE and cardiovascular magnetic resonance (CMR) images were prospectively obtained on the same day in 147 patients with wide ranges of right ventricular (RV) size and function. RV volumes and ejection fraction were measured from 3DE images using the new software and compared with CMR reference values. Comparisons included linear regression and Bland-Altman analyses. Repeated measurements were performed to assess measurement variability. Sixteen patients were excluded because of suboptimal image quality (89% feasibility). RV volumes and ejection fraction obtained with the new 3DE technique were in good agreement with CMR (end-diastolic volume, r = 0.95; end-systolic volume, r = 0.96; ejection fraction, r = 0.83). Biases were, respectively, -6 ± 11%, 0 ± 15%, and -7 ± 17% of the mean measured values. In a subset of patients with suboptimal 3DE images, the new analysis resulted in significantly improved accuracy against CMR and reproducibility, compared with previously used coronal view-based techniques. The time required for the 3DE analysis was approximately 4 min. The new software is fast, reproducible, and accurate compared with CMR over a wide range of RV size and function. Because right ventricle-focused 3DE acquisition is feasible in most patients, this approach may be applicable to a broader population of patients who can benefit from RV volumetric assessment. Copyright © 2015 American Society of Echocardiography. Published by Elsevier Inc. All rights reserved.
DD-HDS: A method for visualization and exploration of high-dimensional data.
Lespinats, Sylvain; Verleysen, Michel; Giron, Alain; Fertil, Bernard
2007-09-01
Mapping high-dimensional data in a low-dimensional space, for example, for visualization, is a problem of increasingly major concern in data analysis. This paper presents data-driven high-dimensional scaling (DD-HDS), a nonlinear mapping method that follows the line of multidimensional scaling (MDS) approach, based on the preservation of distances between pairs of data. It improves the performance of existing competitors with respect to the representation of high-dimensional data, in two ways. It introduces (1) a specific weighting of distances between data taking into account the concentration of measure phenomenon and (2) a symmetric handling of short distances in the original and output spaces, avoiding false neighbor representations while still allowing some necessary tears in the original distribution. More precisely, the weighting is set according to the effective distribution of distances in the data set, with the exception of a single user-defined parameter setting the tradeoff between local neighborhood preservation and global mapping. The optimization of the stress criterion designed for the mapping is realized by "force-directed placement" (FDP). The mappings of low- and high-dimensional data sets are presented as illustrations of the features and advantages of the proposed algorithm. The weighting function specific to high-dimensional data and the symmetric handling of short distances can be easily incorporated in most distance preservation-based nonlinear dimensionality reduction methods.
Manifold learning in machine vision and robotics
NASA Astrophysics Data System (ADS)
Bernstein, Alexander
2017-02-01
Smart algorithms are used in Machine vision and Robotics to organize or extract high-level information from the available data. Nowadays, Machine learning is an essential and ubiquitous tool for automating the extraction of patterns or regularities from data (images in Machine vision; camera, laser, and sonar sensor data in Robotics) in order to solve various subject-oriented tasks such as understanding and classification of image content, navigation of mobile autonomous robots in uncertain environments, robot manipulation in medical robotics and computer-assisted surgery, and others. Usually such data have high dimensionality; however, due to various dependencies between their components and constraints caused by physical reasons, all "feasible and usable data" occupy only a very small part of the high-dimensional "observation space" and have a smaller intrinsic dimensionality. The generally accepted model of such data is the manifold model, in accordance with which the data lie on or near an unknown manifold (surface) of lower dimensionality embedded in an ambient high-dimensional observation space; real-world high-dimensional data obtained from "natural" sources meet this model as a rule. The use of the Manifold learning technique in Machine vision and Robotics, which discovers the low-dimensional structure of high-dimensional data and results in effective algorithms for solving a large number of subject-oriented tasks, is the content of the conference plenary speech, some topics of which are presented in this paper.
NASA Astrophysics Data System (ADS)
Lorenzi, Juan M.; Stecher, Thomas; Reuter, Karsten; Matera, Sebastian
2017-10-01
Many problems in computational materials science and chemistry require the evaluation of expensive functions with locally rapid changes, such as the turn-over frequency of first principles kinetic Monte Carlo models for heterogeneous catalysis. Because of the high computational cost, it is often desirable to replace the original with a surrogate model, e.g., for use in coupled multiscale simulations. The construction of surrogates becomes particularly challenging in high-dimensions. Here, we present a novel version of the modified Shepard interpolation method which can overcome the curse of dimensionality for such functions to give faithful reconstructions even from very modest numbers of function evaluations. The introduction of local metrics allows us to take advantage of the fact that, on a local scale, rapid variation often occurs only across a small number of directions. Furthermore, we use local error estimates to weigh different local approximations, which helps avoid artificial oscillations. Finally, we test our approach on a number of challenging analytic functions as well as a realistic kinetic Monte Carlo model. Our method not only outperforms existing isotropic metric Shepard methods but also state-of-the-art Gaussian process regression.
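A classic isotropic Shepard-style inverse-distance interpolation is sketched below as a baseline for the anisotropic, local-metric variant proposed in the paper; the test function, weighting exponent, and sizes are arbitrary illustrative choices.

```python
import numpy as np

def shepard_interpolate(x_query, x_data, y_data, power=4, eps=1e-12):
    d = np.linalg.norm(x_data - x_query, axis=1)
    if d.min() < eps:                        # exact hit on a data point
        return y_data[d.argmin()]
    w = 1.0 / d ** power                     # inverse-distance weights
    return np.sum(w * y_data) / np.sum(w)

rng = np.random.default_rng(0)
X = rng.uniform(-1, 1, size=(200, 5))        # sparse samples of an expensive function
f = lambda x: np.exp(-10 * x[..., 0] ** 2) + 0.1 * x[..., 1]   # locally rapid change
y = f(X)

x0 = np.zeros(5)
print(shepard_interpolate(x0, X, y), f(x0))  # interpolant vs. true value
```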
DOE Office of Scientific and Technical Information (OSTI.GOV)
Lee, Hyun Jung; McDonnell, Kevin T.; Zelenyuk, Alla
2014-03-01
Although the Euclidean distance does well in measuring data distances within high-dimensional clusters, it does poorly when it comes to gauging inter-cluster distances. This significantly impacts the quality of global, low-dimensional space embedding procedures such as the popular multi-dimensional scaling (MDS) where one can often observe non-intuitive layouts. We were inspired by the perceptual processes evoked in the method of parallel coordinates which enables users to visually aggregate the data by the patterns the polylines exhibit across the dimension axes. We call the path of such a polyline its structure and suggest a metric that captures this structure directly in high-dimensional space. This allows us to better gauge the distances of spatially distant data constellations and so achieve data aggregations in MDS plots that are more cognizant of existing high-dimensional structure similarities. Our MDS plots also exhibit similar visual relationships as the method of parallel coordinates which is often used alongside to visualize the high-dimensional data in raw form. We then cast our metric into a bi-scale framework which distinguishes far-distances from near-distances. The coarser scale uses the structural similarity metric to separate data aggregates obtained by prior classification or clustering, while the finer scale employs the appropriate Euclidean distance.
Knopman, Debra S.; Voss, Clifford I.
1988-01-01
Sensitivities of solute concentration to parameters associated with first-order chemical decay, boundary conditions, initial conditions, and multilayer transport are examined in one-dimensional analytical models of transient solute transport in porous media. A sensitivity is a change in solute concentration resulting from a change in a model parameter. Sensitivity analysis is important because the minimum information required for the estimation of model parameters by regression on chemical data is expressed in terms of sensitivities. Nonlinear regression models of solute transport were tested on sets of noiseless observations from known models that exceeded the minimum sensitivity information requirements. Results demonstrate that the regression models consistently converged to the correct parameters when the initial sets of parameter values substantially deviated from the correct parameters. On the basis of the sensitivity analysis, several statements may be made about design of sampling for parameter estimation for the models examined: (1) estimation of parameters associated with solute transport in the individual layers of a multilayer system is possible even when solute concentrations in the individual layers are mixed in an observation well; (2) when estimating parameters in a decaying upstream boundary condition, observations are best made late in the passage of the front near a time chosen by adding the inverse of a hypothesized value of the source decay parameter to the estimated mean travel time at a given downstream location; (3) estimation of a first-order chemical decay parameter requires observations to be made late in the passage of the front, preferably near a location corresponding to a travel time of √2 times the half-life of the solute; and (4) estimation of a parameter relating to spatial variability in an initial condition requires observations to be made early in time relative to passage of the solute front.
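To make the notion of a sensitivity concrete, the sketch below computes the change in predicted concentration per unit change in a parameter by central finite differences on the standard one-dimensional advection-dispersion solution for a constant upstream source (Ogata-Banks form); the parameter values are arbitrary examples and this is not the multilayer model of the paper.

```python
import numpy as np
from scipy.special import erfc

def conc(x, t, v, D, c0=1.0):
    """Concentration for 1-D advection-dispersion with a constant source at x = 0."""
    a = erfc((x - v * t) / (2 * np.sqrt(D * t)))
    b = np.exp(v * x / D) * erfc((x + v * t) / (2 * np.sqrt(D * t)))
    return 0.5 * c0 * (a + b)

x, t, v, D = 10.0, 8.0, 1.0, 0.5
dv = 1e-6
sens_v = (conc(x, t, v + dv, D) - conc(x, t, v - dv, D)) / (2 * dv)
print(conc(x, t, v, D), sens_v)   # concentration and its sensitivity to velocity
```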
Asselbergs, Joost; Ruwaard, Jeroen; Ejdys, Michal; Schrader, Niels; Sijbrandij, Marit; Riper, Heleen
2016-03-29
Ecological momentary assessment (EMA) is a useful method to tap the dynamics of psychological and behavioral phenomena in real-world contexts. However, the response burden of (self-report) EMA limits its clinical utility. The aim was to explore mobile phone-based unobtrusive EMA, in which mobile phone usage logs are considered as proxy measures of clinically relevant user states and contexts. This was an uncontrolled explorative pilot study. Our study consisted of 6 weeks of EMA/unobtrusive EMA data collection in a Dutch student population (N=33), followed by a regression modeling analysis. Participants self-monitored their mood on their mobile phone (EMA) with a one-dimensional mood measure (1 to 10) and a two-dimensional circumplex measure (arousal/valence, -2 to 2). Meanwhile, with participants' consent, a mobile phone app unobtrusively collected (meta) data from six smartphone sensor logs (unobtrusive EMA: calls/short message service (SMS) text messages, screen time, application usage, accelerometer, and phone camera events). Through forward stepwise regression (FSR), we built personalized regression models from the unobtrusive EMA variables to predict day-to-day variation in EMA mood ratings. The predictive performance of these models (ie, cross-validated mean squared error and percentage of correct predictions) was compared to naive benchmark regression models (the mean model and a lag-2 history model). A total of 27 participants (81%) provided a mean 35.5 days (SD 3.8) of valid EMA/unobtrusive EMA data. The FSR models accurately predicted 55% to 76% of EMA mood scores. However, the predictive performance of these models was significantly inferior to that of naive benchmark models. Mobile phone-based unobtrusive EMA is a technically feasible and potentially powerful EMA variant. The method is young and positive findings may not replicate. At present, we do not recommend the application of FSR-based mood prediction in real-world clinical settings. Further psychometric studies and more advanced data mining techniques are needed to unlock unobtrusive EMA's true potential.
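A hedged sketch of a personalised forward stepwise regression of a daily mood rating on unobtrusive smartphone features; the feature set, sample size, and use of scikit-learn's generic forward selector are illustrative assumptions rather than the authors' exact procedure.

```python
import numpy as np
from sklearn.feature_selection import SequentialFeatureSelector
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import cross_val_score

rng = np.random.default_rng(0)
n_days, n_features = 36, 6                       # roughly one participant's data size
X = rng.normal(size=(n_days, n_features))        # e.g. calls, SMS, screen time, app use, ...
mood = 6 + 0.8 * X[:, 2] + rng.normal(scale=1.0, size=n_days)   # synthetic 1-10 mood rating

sfs = SequentialFeatureSelector(LinearRegression(), direction="forward",
                                n_features_to_select=2, cv=5).fit(X, mood)

# Cross-validated error of the FSR model, to be compared against naive benchmarks
# such as the mean model used in the study.
fsr_cv = cross_val_score(LinearRegression(), X[:, sfs.get_support()], mood,
                         cv=5, scoring="neg_mean_squared_error").mean()
print(sfs.get_support(), -fsr_cv)
```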
Wang, Shan-Shan; Zhang, Yu-Qi; Chen, Shu-Bao; Huang, Guo-Ying; Zhang, Hong-Yan; Zhang, Zhi-Fang; Wu, Lan-Ping; Hong, Wen-Jing; Shen, Rong; Liu, Yi-Qing; Zhu, Jun-Xue
2017-06-01
Clinical decision making in children with congenital and acquired heart disease relies on measurements of cardiac structures using two-dimensional echocardiography. We aimed to establish z-score regression equations for right heart structures in healthy Chinese Han children. Two-dimensional and M-mode echocardiography was performed in 515 patients. We measured the dimensions of the pulmonary valve annulus (PVA), main pulmonary artery (MPA), left pulmonary artery (LPA), right pulmonary artery (RPA), right ventricular outflow tract at end-diastole (RVOTd) and at end-systole (RVOTs), tricuspid valve annulus (TVA), right ventricular inflow tract at end-diastole (RVIDd) and at end-systole (RVIDs), and right atrium (RA). Regression analyses were conducted to relate the measurements of right heart structures to body surface area (BSA). Right ventricular outflow-tract fractional shortening (RVOTFS) was also calculated. Several models were used, and the best model was chosen to establish a z-score calculator. PVA, MPA, LPA, RPA, RVOTd, RVOTs, TVA, RVIDd, RVIDs, and RA (R² = 0.786, 0.705, 0.728, 0.701, 0.706, 0.824, 0.804, 0.663, 0.626, and 0.793, respectively) had a cubic polynomial relationship with BSA; specifically, measurement (M) = β0 + β1 × BSA + β2 × BSA² + β3 × BSA³. RVOTFS (0.28 ± 0.02) fell within a narrow range (0.12-0.51). Our results provide reference values for z scores and regression equations for right heart structures in Han Chinese children. These data may help in interpreting the routine clinical measurement of right heart structures in children with congenital or acquired heart disease. © 2016 Wiley Periodicals, Inc. J Clin Ultrasound 45:293-303, 2017. © 2017 Wiley Periodicals, Inc.
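A sketch of the z-score construction implied by the cubic model above: fit M = β0 + β1·BSA + β2·BSA² + β3·BSA³ to a reference sample, then express an individual measurement as (observed - predicted) divided by the residual standard deviation. The data below are synthetic placeholders, not the study's measurements.

```python
import numpy as np

rng = np.random.default_rng(0)
bsa = rng.uniform(0.3, 1.8, size=500)                       # body surface area (m^2)
pva = 8 + 9 * bsa - 2 * bsa**2 + 0.5 * bsa**3 + rng.normal(scale=1.0, size=bsa.size)

coeffs = np.polyfit(bsa, pva, deg=3)                        # cubic regression on BSA
predicted = np.polyval(coeffs, bsa)
resid_sd = np.std(pva - predicted, ddof=4)                  # residual SD (4 fitted parameters)

def z_score(measurement, bsa_value):
    return (measurement - np.polyval(coeffs, bsa_value)) / resid_sd

print(z_score(14.0, 0.8))   # z score of a hypothetical PVA measurement at BSA 0.8 m^2
```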
NASA Astrophysics Data System (ADS)
Rohmer, Jeremy
2016-04-01
Predicting the temporal evolution of landslides is typically supported by numerical modelling. Dynamic sensitivity analysis aims at assessing the influence of the landslide properties on the time-dependent predictions (e.g., time series of landslide displacements). Yet two major difficulties arise: 1. Global sensitivity analysis requires running the landslide model a large number of times (> 1000), which may become impracticable when the landslide model has a high computation time cost (> several hours); 2. Landslide model outputs are not scalar but functions of time, i.e., they are n-dimensional vectors with n usually ranging from 100 to 1000. In this article, I explore the use of a basis set expansion, such as principal component analysis, to reduce the output dimensionality to a few components, each of them being interpreted as a dominant mode of variation in the overall structure of the temporal evolution. The computationally intensive calculation of the Sobol' indices for each of these components is then achieved through meta-modelling, i.e. by replacing the landslide model by a "costless-to-evaluate" approximation (e.g., a projection pursuit regression model). The methodology combining "basis set expansion - meta-model - Sobol' indices" is then applied to the La Frasse landslide to investigate the dynamic sensitivity analysis of the surface horizontal displacements to the slip surface properties during the pore pressure changes. I show how to extract information on the sensitivity of each main mode of temporal behaviour using a limited number (a few tens) of long running simulations. In particular, I identify the parameters which trigger the occurrence of a turning point marking a shift between a regime of low values of landslide displacements and one of high values.
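A schematic of the "basis set expansion + meta-model" step: displacement time series from a few tens of simulator runs are compressed with PCA and a cheap regression surrogate is fitted to each component score (a generic regressor stands in here for the projection pursuit regression of the paper, and the Sobol' index calculation on the surrogate is omitted). All names and sizes are illustrative.

```python
import numpy as np
from sklearn.decomposition import PCA
from sklearn.ensemble import GradientBoostingRegressor

rng = np.random.default_rng(0)
n_runs, n_params, n_times = 40, 4, 500
theta = rng.uniform(size=(n_runs, n_params))              # slip-surface property samples
t = np.linspace(0, 1, n_times)
Y = theta[:, [0]] * np.sin(2 * np.pi * t) + theta[:, [1]] * t   # stand-in displacement curves

pca = PCA(n_components=3).fit(Y)                           # dominant modes of temporal variation
scores = pca.transform(Y)

surrogates = [GradientBoostingRegressor().fit(theta, scores[:, j])
              for j in range(scores.shape[1])]             # one cheap meta-model per mode

def predict_curve(theta_new):
    s = np.column_stack([m.predict(theta_new) for m in surrogates])
    return pca.inverse_transform(s)                        # full displacement time series

print(predict_curve(rng.uniform(size=(2, n_params))).shape)   # (2, 500)
```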
Kehl, Sven; Eckert, Sven; Sütterlin, Marc; Neff, K Wolfgang; Siemer, Jörn
2011-06-01
Three-dimensional (3D) sonographic volumetry is established in gynecology and obstetrics. Assessment of the fetal lung volume by magnetic resonance imaging (MRI) in congenital diaphragmatic hernias has become a routine examination. In vitro studies have shown a good correlation between 3D sonographic measurements and MRI. The aim of this study was to compare the lung volumes of healthy fetuses assessed by 3D sonography to MRI measurements and to investigate the impact of different rotation angles. A total of 126 fetuses between 20 and 40 weeks' gestation were measured by 3D sonography, and 27 of them were also assessed by MRI. The sonographic volumes were calculated by the rotational technique (virtual organ computer-aided analysis) with rotation angles of 6° and 30°. To evaluate the accuracy of 3D sonographic volumetry, percentage error and absolute percentage error values were calculated using MRI volumes as reference points. Formulas to calculate total, right, and left fetal lung volumes according to gestational age and biometric parameters were derived by stepwise regression analysis. Three-dimensional sonographic volumetry showed a high correlation compared to MRI (6° angle, R(2) = 0.971; 30° angle, R(2) = 0.917) with no systematic error for the 6° angle. Moreover, using the 6° rotation angle, the median absolute percentage error was significantly lower compared to the 30° angle (P < .001). The new formulas to calculate total lung volume in healthy fetuses only included gestational age and no biometric parameters (R(2) = 0.853). Three-dimensional sonographic volumetry of lung volumes in healthy fetuses showed a good correlation with MRI. We recommend using an angle of 6° because it assessed the lung volume more accurately. The specifically designed equations help estimate lung volumes in healthy fetuses.
The application of dimensional analysis to the problem of solar wind-magnetosphere energy coupling
NASA Technical Reports Server (NTRS)
Bargatze, L. F.; Mcpherron, R. L.; Baker, D. N.; Hones, E. W., Jr.
1984-01-01
The constraints imposed by dimensional analysis are used to find how the solar wind-magnetosphere energy transfer rate depends upon interplanetary parameters. The analyses assume that only magnetohydrodynamic processes are important in controlling the rate of energy transfer. The study utilizes ISEE-3 solar wind observations, the AE index, and UT from three 10-day intervals during the International Magnetospheric Study. Simple linear regression and histogram techniques are used to find the value of the magnetohydrodynamic coupling exponent, alpha, which is consistent with observations of magnetospheric response. Once alpha is estimated, the form of the solar wind energy transfer rate is obtained by substitution into an equation of the interplanetary variables whose exponents depend upon alpha.
Discrete Emotion Effects on Lexical Decision Response Times
Briesemeister, Benny B.; Kuchinke, Lars; Jacobs, Arthur M.
2011-01-01
Our knowledge about affective processes, especially concerning effects on cognitive demands like word processing, is increasing steadily. Several studies consistently document valence and arousal effects, and although there is some debate on possible interactions and different notions of valence, broad agreement on a two dimensional model of affective space has been achieved. Alternative models like the discrete emotion theory have received little interest in word recognition research so far. Using backward elimination and multiple regression analyses, we show that five discrete emotions (i.e., happiness, disgust, fear, anger and sadness) explain as much variance as two published dimensional models assuming continuous or categorical valence, with the variables happiness, disgust and fear significantly contributing to this account. Moreover, these effects even persist in an experiment with discrete emotion conditions when the stimuli are controlled for emotional valence and arousal levels. We interpret this result as evidence for discrete emotion effects in visual word recognition that cannot be explained by the two dimensional affective space account. PMID:21887307
Discrete emotion effects on lexical decision response times.
Briesemeister, Benny B; Kuchinke, Lars; Jacobs, Arthur M
2011-01-01
Our knowledge about affective processes, especially concerning effects on cognitive demands like word processing, is increasing steadily. Several studies consistently document valence and arousal effects, and although there is some debate on possible interactions and different notions of valence, broad agreement on a two dimensional model of affective space has been achieved. Alternative models like the discrete emotion theory have received little interest in word recognition research so far. Using backward elimination and multiple regression analyses, we show that five discrete emotions (i.e., happiness, disgust, fear, anger and sadness) explain as much variance as two published dimensional models assuming continuous or categorical valence, with the variables happiness, disgust and fear significantly contributing to this account. Moreover, these effects even persist in an experiment with discrete emotion conditions when the stimuli are controlled for emotional valence and arousal levels. We interpret this result as evidence for discrete emotion effects in visual word recognition that cannot be explained by the two dimensional affective space account.
Xu, Haoming; Moni, Mohammad Ali; Liò, Pietro
2015-12-01
In cancer genomics, gene expression levels provide important molecular signatures for all types of cancer, and this could be very useful for predicting the survival of cancer patients. However, the main challenge of gene expression data analysis is high dimensionality, and microarray data are characterised by a small number of samples and a large number of genes. To overcome this problem, a variety of penalised Cox proportional hazard models have been proposed. We introduce a novel network regularised Cox proportional hazard model and a novel multiplex network model to measure the disease comorbidities and to predict the survival of cancer patients. Our methods are applied to analyse seven microarray cancer gene expression datasets: breast cancer, ovarian cancer, lung cancer, liver cancer, renal cancer and osteosarcoma. Firstly, we applied a principal component analysis to reduce the dimensionality of the original gene expression data. Secondly, we applied a network regularised Cox regression model on the reduced gene expression datasets. By using a normalised mutual information method and the multiplex network model, we predict the comorbidities for liver cancer based on the integration of a diverse set of omics and clinical data, and we find the diseasome associations (disease-gene associations) among different cancers based on the identified common significant genes. Finally, we evaluated the precision of the approach with respect to the accuracy of survival prediction using ROC curves. We report that colon cancer, liver cancer and renal cancer share the CXCL5 gene, and breast cancer, ovarian cancer and renal cancer share the CCND2 gene. Our methods predict patient survival and disease comorbidities more accurately and are helpful for improving the care of patients with comorbidities. Software in Matlab and R is available on our GitHub page: https://github.com/ssnhcom/NetworkRegularisedCox.git. Copyright © 2015. Published by Elsevier Ltd.
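A simplified sketch of the survival-modelling pipeline: PCA reduces the expression matrix and a penalised Cox proportional hazards model is fitted on the component scores. lifelines' ridge-type penaliser stands in for the network regularisation of the paper, and all data below are synthetic.

```python
import numpy as np
import pandas as pd
from sklearn.decomposition import PCA
from lifelines import CoxPHFitter

rng = np.random.default_rng(0)
n_patients, n_genes = 200, 5000
expr = rng.normal(size=(n_patients, n_genes))                 # gene expression matrix
time = rng.exponential(scale=24, size=n_patients)             # survival time (months)
event = rng.integers(0, 2, size=n_patients)                   # 1 = death observed

Z = PCA(n_components=10).fit_transform(expr)                  # dimensionality reduction
df = pd.DataFrame(Z, columns=[f"pc{i}" for i in range(10)])
df["time"], df["event"] = time, event

cph = CoxPHFitter(penalizer=0.1)                              # penalised Cox regression
cph.fit(df, duration_col="time", event_col="event")
cph.print_summary()                                           # hazard ratios per component
```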
Tuschy, Benjamin; Berlit, Sebastian; Brade, Joachim; Sütterlin, Marc; Hornemann, Amadeus
2014-01-01
To investigate the clinical assessment of a full high-definition (HD) three-dimensional robot-assisted laparoscopic device in gynaecological surgery. This study included 70 women who underwent gynaecological laparoscopic procedures. Demographic parameters, type and duration of surgery and perioperative complications were analyzed. Fifteen surgeons were postoperatively interviewed regarding their assessment of this new system with a standardized questionnaire. The clinical assessment revealed that three-dimensional full-HD visualisation is comfortable and improves spatial orientation and hand-to-eye coordination. The majority of the surgeons stated they would prefer a three-dimensional system to a conventional two-dimensional device and stated that the robotic camera arm led to more relaxed working conditions. Three-dimensional laparoscopy is feasible, comfortable and well-accepted in daily routine. The three-dimensional visualisation improves surgeons' hand-to-eye coordination, intracorporeal suturing and fine dissection. The combination of full-HD three-dimensional visualisation with the robotic camera arm results in very high image quality and stability.
Gómez-Carracedo, M P; Andrade, J M; Rutledge, D N; Faber, N M
2007-03-07
Selecting the correct dimensionality is critical for obtaining partial least squares (PLS) regression models with good predictive ability. Although calibration and validation sets are best established using experimental designs, industrial laboratories cannot afford such an approach. Typically, samples are collected in a (formally) undesigned way, spread over time, and their measurements are included in routine measurement processes. This makes it hard to evaluate PLS model dimensionality. In this paper, classical criteria (leave-one-out cross-validation and adjusted Wold's criterion) are compared to recently proposed alternatives (smoothed PLS-PoLiSh and a randomization test) to seek out the optimum dimensionality of PLS models. Kerosene (jet fuel) samples were measured by attenuated total reflectance-mid-IR spectrometry and their spectra were used to predict eight important properties determined using reference methods that are time-consuming and prone to analytical errors. The alternative methods were shown to give reliable dimensionality predictions when compared to external validation. By contrast, the simpler methods seemed to be largely affected by the largest changes in the modeling capabilities of the first components.
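A minimal leave-one-out cross-validation scan over the number of PLS components, i.e. the simplest of the dimensionality criteria compared in the paper; the spectra and property values below are simulated placeholders for the kerosene ATR-mid-IR data.

```python
import numpy as np
from sklearn.cross_decomposition import PLSRegression
from sklearn.model_selection import LeaveOneOut, cross_val_predict

rng = np.random.default_rng(0)
n_samples, n_wavenumbers = 60, 400
X = rng.normal(size=(n_samples, n_wavenumbers))            # mid-IR spectra (simulated)
y = X[:, 50] - 0.5 * X[:, 200] + rng.normal(scale=0.1, size=n_samples)  # property value

rmsecv = []
for a in range(1, 16):                                     # candidate dimensionalities
    y_hat = cross_val_predict(PLSRegression(n_components=a), X, y, cv=LeaveOneOut())
    rmsecv.append(np.sqrt(np.mean((y - y_hat.ravel()) ** 2)))

best = int(np.argmin(rmsecv)) + 1
print(best, np.round(rmsecv, 3))   # criteria such as PoLiSh or a randomization test refine this choice
```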
NASA Astrophysics Data System (ADS)
Baydaroğlu, Özlem; Koçak, Kasım; Duran, Kemal
2018-06-01
Prediction of water amount that will enter the reservoirs in the following month is of vital importance especially for semi-arid countries like Turkey. Climate projections emphasize that water scarcity will be one of the serious problems in the future. This study presents a methodology for predicting river flow for the subsequent month based on the time series of observed monthly river flow with hybrid models of support vector regression (SVR). Monthly river flow over the period 1940-2012 observed for the Kızılırmak River in Turkey has been used for training the method, which then has been applied for predictions over a period of 3 years. SVR is a specific implementation of support vector machines (SVMs), which transforms the observed input data time series into a high-dimensional feature space (input matrix) by way of a kernel function and performs a linear regression in this space. SVR requires a special input matrix. The input matrix was produced by wavelet transforms (WT), singular spectrum analysis (SSA), and a chaotic approach (CA) applied to the input time series. WT convolutes the original time series into a series of wavelets, and SSA decomposes the time series into a trend, an oscillatory and a noise component by singular value decomposition. CA uses a phase space formed by trajectories, which represent the dynamics producing the time series. These three methods for producing the input matrix for the SVR proved successful, while the SVR-WT combination resulted in the highest coefficient of determination and the lowest mean absolute error.
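A hedged sketch of the chaotic-approach variant: a time-delay phase-space embedding of the monthly flow series forms the SVR input matrix and the model predicts the following month. The embedding dimension, delay, kernel settings, and synthetic flow series are illustrative, not the values tuned in the study.

```python
import numpy as np
from sklearn.svm import SVR

rng = np.random.default_rng(0)
flow = 50 + 20 * np.sin(np.arange(600) * 2 * np.pi / 12) + rng.normal(scale=5, size=600)

def delay_embed(series, dim=6, tau=1):
    """Rows are lagged vectors [q(t-(dim-1)tau), ..., q(t)]; the target is the next value."""
    rows = [series[i:i + dim * tau:tau] for i in range(len(series) - dim * tau)]
    X = np.array(rows)
    y = series[dim * tau:]
    return X, y

X, y = delay_embed(flow)
split = len(X) - 36                                   # hold out the last 3 years
model = SVR(kernel="rbf", C=10.0, epsilon=0.5).fit(X[:split], y[:split])
pred = model.predict(X[split:])
print(np.mean(np.abs(pred - y[split:])))              # mean absolute error on the hold-out period
```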
Zhu, Hongxiao; Morris, Jeffrey S; Wei, Fengrong; Cox, Dennis D
2017-07-01
Many scientific studies measure different types of high-dimensional signals or images from the same subject, producing multivariate functional data. These functional measurements carry different types of information about the scientific process, and a joint analysis that integrates information across them may provide new insights into the underlying mechanism for the phenomenon under study. Motivated by fluorescence spectroscopy data in a cervical pre-cancer study, a multivariate functional response regression model is proposed, which treats multivariate functional observations as responses and a common set of covariates as predictors. This novel modeling framework simultaneously accounts for correlations between functional variables and potential multi-level structures in data that are induced by experimental design. The model is fitted by performing a two-stage linear transformation: a basis expansion of each functional variable followed by principal component analysis for the concatenated basis coefficients. This transformation effectively reduces the intra- and inter-function correlations and facilitates fast and convenient calculation. A fully Bayesian approach is adopted to sample the model parameters in the transformed space, and posterior inference is performed after inverse-transforming the regression coefficients back to the original data domain. The proposed approach produces functional tests that flag local regions on the functional effects, while controlling the overall experiment-wise error rate or false discovery rate. It also enables functional discriminant analysis through posterior predictive calculation. Analysis of the fluorescence spectroscopy data reveals local regions with differential expressions across the pre-cancer and normal samples. These regions may serve as biomarkers for prognosis and disease assessment.
Fuel decomposition and boundary-layer combustion processes of hybrid rocket motors
NASA Technical Reports Server (NTRS)
Chiaverini, Martin J.; Harting, George C.; Lu, Yeu-Cherng; Kuo, Kenneth K.; Serin, Nadir; Johnson, David K.
1995-01-01
Using a high-pressure, two-dimensional hybrid motor, an experimental investigation was conducted on fundamental processes involved in hybrid rocket combustion. HTPB (Hydroxyl-terminated Polybutadiene) fuel cross-linked with diisocyanate was burned with GOX under various operating conditions. Large-amplitude pressure oscillations were encountered in earlier test runs. After identifying the source of instability and decoupling the GOX feed-line system and combustion chamber, the pressure oscillations were drastically reduced from +/-20% of the localized mean pressure to an acceptable range of +/-1.5%. Embedded fine-wire thermocouples indicated that the surface temperature of the burning fuel was around 1000 K, depending upon axial location and operating conditions. Also, except near the leading-edge region, the subsurface thermal wave profiles in the upstream locations are thicker than those in the downstream locations since the solid-fuel regression rate, in general, increases with distance along the fuel slab. The recovered solid fuel slabs in the laminar portion of the boundary layer exhibited smooth surfaces, indicating the existence of a liquid melt layer on the burning fuel surface in the upstream region. After the transition section, which displayed distinct transverse striations, the surface roughness pattern became quite random and very pronounced in the downstream turbulent boundary-layer region. Both real-time X-ray radiography and ultrasonic pulse-echo techniques were used to determine the instantaneous web thickness burned and instantaneous solid-fuel regression rates over certain portions of the fuel slabs. Globally averaged and axially dependent but time-averaged regression rates were also obtained and presented.
Hagen, Nathan; Kester, Robert T.; Gao, Liang; Tkaczyk, Tomasz S.
2012-01-01
The snapshot advantage is a large increase in light collection efficiency available to high-dimensional measurement systems that avoid filtering and scanning. After discussing this advantage in the context of imaging spectrometry, where the greatest effort towards developing snapshot systems has been made, we describe the types of measurements where it is applicable. We then generalize it to the larger context of high-dimensional measurements, where the advantage increases geometrically with measurement dimensionality. PMID:22791926
Efficient inference for genetic association studies with multiple outcomes.
Ruffieux, Helene; Davison, Anthony C; Hager, Jorg; Irincheeva, Irina
2017-10-01
Combined inference for heterogeneous high-dimensional data is critical in modern biology, where clinical and various kinds of molecular data may be available from a single study. Classical genetic association studies regress a single clinical outcome on many genetic variants one by one, but there is an increasing demand for joint analysis of many molecular outcomes and genetic variants in order to unravel functional interactions. Unfortunately, most existing approaches to joint modeling are either too simplistic to be powerful or are impracticable for computational reasons. Inspired by Richardson and others (2010, Bayesian Statistics 9), we consider a sparse multivariate regression model that allows simultaneous selection of predictors and associated responses. As Markov chain Monte Carlo (MCMC) inference on such models can be prohibitively slow when the number of genetic variants exceeds a few thousand, we propose a variational inference approach which produces posterior information very close to that of MCMC inference, at a much reduced computational cost. Extensive numerical experiments show that our approach outperforms popular variable selection methods and tailored Bayesian procedures, dealing within hours with problems involving hundreds of thousands of genetic variants and tens to hundreds of clinical or molecular outcomes. © The Author 2017. Published by Oxford University Press. All rights reserved. For permissions, please e-mail: journals.permissions@oup.com.
Are We All in the Same Boat? The Role of Perceptual Distance in Organizational Health Interventions.
Hasson, Henna; von Thiele Schwarz, Ulrica; Nielsen, Karina; Tafvelin, Susanne
2016-10-01
The study investigates how agreement between leaders' and their team's perceptions influences intervention outcomes in a leadership-training intervention aimed at improving organizational learning. Agreement, i.e., perceptual distance, was calculated for the organizational learning dimensions at baseline. Changes in the dimensions from pre-intervention to post-intervention were evaluated using polynomial regression analysis with response surface analysis. The general pattern of the results indicated that organizational learning improved when leaders and their teams agreed on the level of organizational learning prior to the intervention. The improvement was greatest when the leader's and the team's perceptions at baseline were aligned and high rather than aligned and low. The least beneficial scenario was when the leader's perceptions were higher than the team's perceptions. These results give insights into the importance of comparing leaders' and their team's perceptions in intervention research. Polynomial regression analyses with response surface methodology allow a three-dimensional examination of the relationship between two predictor variables and an outcome. This contributes knowledge on how combinations of predictor variables may affect an outcome and allows studies of potential non-linearity in relation to the outcome. Future studies could use these methods in process evaluation of interventions. Copyright © 2016 John Wiley & Sons, Ltd.
High-Dimensional Sparse Factor Modeling: Applications in Gene Expression Genomics
Carvalho, Carlos M.; Chang, Jeffrey; Lucas, Joseph E.; Nevins, Joseph R.; Wang, Quanli; West, Mike
2010-01-01
We describe studies in molecular profiling and biological pathway analysis that use sparse latent factor and regression models for microarray gene expression data. We discuss breast cancer applications and key aspects of the modeling and computational methodology. Our case studies aim to investigate and characterize heterogeneity of structure related to specific oncogenic pathways, as well as links between aggregate patterns in gene expression profiles and clinical biomarkers. Based on the metaphor of statistically derived “factors” as representing biological “subpathway” structure, we explore the decomposition of fitted sparse factor models into pathway subcomponents and investigate how these components overlay multiple aspects of known biological activity. Our methodology is based on sparsity modeling of multivariate regression, ANOVA, and latent factor models, as well as a class of models that combines all components. Hierarchical sparsity priors address questions of dimension reduction and multiple comparisons, as well as scalability of the methodology. The models include practically relevant non-Gaussian/nonparametric components for latent structure, underlying often quite complex non-Gaussianity in multivariate expression patterns. Model search and fitting are addressed through stochastic simulation and evolutionary stochastic search methods that are exemplified in the oncogenic pathway studies. Supplementary supporting material provides more details of the applications, as well as examples of the use of freely available software tools for implementing the methodology. PMID:21218139
Towards molecular design using 2D-molecular contour maps obtained from PLS regression coefficients
NASA Astrophysics Data System (ADS)
Borges, Cleber N.; Barigye, Stephen J.; Freitas, Matheus P.
2017-12-01
The multivariate image analysis (MIA) descriptors used in quantitative structure-activity relationships are direct representations of chemical structures, as they are simply numerical decodifications of the pixels forming the 2D chemical images. These MIA descriptors have found great utility in the modeling of diverse properties of organic molecules. Given the multicollinearity and high dimensionality of the data matrices generated with the MIA-QSAR approach, modeling techniques that involve the projection of the data space onto orthogonal components, e.g., Partial Least Squares (PLS), have generally been used. However, the chemical interpretation of the PLS-based MIA-QSAR models, in terms of the structural moieties affecting the modeled bioactivity, has not been straightforward. This work describes 2D-contour maps based on the PLS regression coefficients as a means of assessing the relevance of single MIA predictors to the response variable, thus allowing for the structural, electronic and physicochemical interpretation of the MIA-QSAR models. A sample study to demonstrate the utility of the 2D-contour maps to design novel drug-like molecules is performed using a dataset of some anti-HIV-1 2-amino-6-arylsulfonylbenzonitriles and derivatives, and the inferences obtained are consistent with other reports in the literature. In addition, the different schemes for encoding atomic properties in molecules are discussed and evaluated.
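A minimal sketch of turning PLS regression coefficients into a 2D contour map when the predictors are image pixels, in the spirit of the approach above; the image size, synthetic data, number of PLS components, and plotting choices are assumptions for illustration only.

    import numpy as np
    import matplotlib.pyplot as plt
    from sklearn.cross_decomposition import PLSRegression

    rng = np.random.default_rng(0)
    h, w, n = 20, 30, 80
    images = rng.random((n, h, w))                        # stand-in for 2D chemical images
    activity = images[:, 5, 10] - images[:, 12, 20] + 0.05 * rng.standard_normal(n)

    pls = PLSRegression(n_components=3).fit(images.reshape(n, -1), activity)
    coef_map = pls.coef_.reshape(h, w)                    # one regression coefficient per pixel

    plt.contourf(coef_map, levels=20, cmap="coolwarm")    # highlights pixel regions driving the response
    plt.colorbar(label="PLS regression coefficient")
    plt.savefig("pls_coefficient_contour.png")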
Liu, Yang; Chiaromonte, Francesca; Li, Bing
2017-06-01
In many scientific and engineering fields, advanced experimental and computing technologies are producing data that are not just high dimensional, but also internally structured. For instance, statistical units may have heterogeneous origins from distinct studies or subpopulations, and features may be naturally partitioned based on experimental platforms generating them, or on information available about their roles in a given phenomenon. In a regression analysis, exploiting this known structure in the predictor dimension reduction stage that precedes modeling can be an effective way to integrate diverse data. To pursue this, we propose a novel Sufficient Dimension Reduction (SDR) approach that we call structured Ordinary Least Squares (sOLS). This combines ideas from existing SDR literature to merge reductions performed within groups of samples and/or predictors. In particular, it leads to a version of OLS for grouped predictors that requires far less computation than recently proposed groupwise SDR procedures, and provides an informal yet effective variable selection tool in these settings. We demonstrate the performance of sOLS by simulation and present a first application to genomic data. The R package "sSDR," publicly available on CRAN, includes all procedures necessary to implement the sOLS approach. © 2016, The International Biometric Society.
Confounder Detection in High-Dimensional Linear Models Using First Moments of Spectral Measures.
Liu, Furui; Chan, Laiwan
2018-06-12
In this letter, we study the confounder detection problem in the linear model, where the target variable [Formula: see text] is predicted using its [Formula: see text] potential causes [Formula: see text]. Based on an assumption of a rotation-invariant generating process of the model, a recent study showed that the spectral measure induced by the regression coefficient vector with respect to the covariance matrix of [Formula: see text] is close to a uniform measure in purely causal cases, but differs from a uniform measure characteristically in the presence of a scalar confounder. Analyzing spectral measure patterns could thus help to detect confounding. In this letter, we propose to use the first moment of the spectral measure for confounder detection. We calculate the first moment of the regression vector-induced spectral measure and compare it with the first moment of a uniform spectral measure, both defined with respect to the covariance matrix of [Formula: see text]. The two moments coincide in nonconfounding cases and differ from each other in the presence of confounding. This statistical causal-confounding asymmetry can be used for confounder detection. Without the need to analyze the spectral measure pattern, our method avoids the difficulty of metric choice and multiple parameter optimization. Experiments on synthetic and real data demonstrate the performance of this method.
A probabilistic and multi-objective analysis of lexicase selection and ε-lexicase selection.
La Cava, William; Helmuth, Thomas; Spector, Lee; Moore, Jason H
2018-05-10
Lexicase selection is a parent selection method that considers training cases individually, rather than in aggregate, when performing parent selection. Whereas previous work has demonstrated the ability of lexicase selection to solve difficult problems in program synthesis and symbolic regression, the central goal of this paper is to develop the theoretical underpinnings that explain its performance. To this end, we derive an analytical formula that gives the expected probabilities of selection under lexicase selection, given a population and its behavior. In addition, we expand upon the relation of lexicase selection to many-objective optimization methods to describe the behavior of lexicase selection, which is to select individuals on the boundaries of Pareto fronts in high-dimensional space. We show analytically why lexicase selection performs more poorly for certain sizes of population and training cases, and why it performs more poorly in continuous error spaces. To address this last concern, we propose new variants of ε-lexicase selection, a method that modifies the pass condition in lexicase selection to allow near-elite individuals to pass cases, thereby improving selection performance with continuous errors. We show that ε-lexicase outperforms several diversity-maintenance strategies on a number of real-world and synthetic regression problems.
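The following is a hedged sketch of lexicase selection with the relaxed pass condition of the ε variant described above; the per-case ε here is the median absolute deviation of the case errors, one common choice, and the interface is illustrative rather than the authors' code.

    import numpy as np

    def epsilon_lexicase_select(errors, use_epsilon=True, rng=None):
        """Return the index of one selected parent.
        errors: (n_individuals, n_cases) array of non-negative case errors."""
        rng = np.random.default_rng() if rng is None else rng
        n, m = errors.shape
        eps = np.zeros(m)
        if use_epsilon:                                   # relax the pass condition per case
            med = np.median(errors, axis=0)
            eps = np.median(np.abs(errors - med), axis=0)
        pool = np.arange(n)
        for c in rng.permutation(m):                      # consider training cases one at a time
            col = errors[pool, c]
            pool = pool[col <= col.min() + eps[c]]        # keep (near-)elite individuals on this case
            if pool.size == 1:
                return int(pool[0])
        return int(rng.choice(pool))                      # still tied after all cases: pick at random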
Reduction of time-resolved space-based CCD photometry developed for MOST Fabry Imaging data
NASA Astrophysics Data System (ADS)
Reegen, P.; Kallinger, T.; Frast, D.; Gruberbauer, M.; Huber, D.; Matthews, J. M.; Punz, D.; Schraml, S.; Weiss, W. W.; Kuschnig, R.; Moffat, A. F. J.; Walker, G. A. H.; Guenther, D. B.; Rucinski, S. M.; Sasselov, D.
2006-04-01
The MOST (Microvariability and Oscillations of Stars) satellite obtains ultraprecise photometry from space with high sampling rates and duty cycles. Astronomical photometry or imaging missions in low Earth orbits, like MOST, are especially sensitive to scattered light from Earthshine, and all these missions have a common need to extract target information from voluminous data cubes. They consist of upwards of hundreds of thousands of two-dimensional CCD frames (or subrasters) containing from hundreds to millions of pixels each, where the target information, superposed on background and instrumental effects, is contained only in a subset of pixels (Fabry Images, defocused images, mini-spectra). We describe a novel reduction technique for such data cubes: resolving linear correlations of target and background pixel intensities. This step-wise multiple linear regression removes only those target variations which are also detected in the background. The advantage of regression analysis versus background subtraction is the appropriate scaling, taking into account that the amount of contamination may differ from pixel to pixel. The multivariate solution for all pairs of target/background pixels is minimally invasive of the raw photometry while being very effective in reducing contamination due to, e.g. stray light. The technique is tested and demonstrated with both simulated oscillation signals and real MOST photometry.
Jiang, Xiaoyu; Fuchs, Mathias
2017-01-01
As modern biotechnologies advance, it has become increasingly frequent that different modalities of high-dimensional molecular data (termed “omics” data in this paper), such as gene expression, methylation, and copy number, are collected from the same patient cohort to predict the clinical outcome. While prediction based on omics data has been widely studied in the last fifteen years, little has been done in the statistical literature on the integration of multiple omics modalities to select a subset of variables for prediction, which is a critical task in personalized medicine. In this paper, we propose a simple penalized regression method to address this problem by assigning different penalty factors to different data modalities for feature selection and prediction. The penalty factors can be chosen in a fully data-driven fashion by cross-validation or by taking practical considerations into account. In simulation studies, we compare the prediction performance of our approach, called IPF-LASSO (Integrative LASSO with Penalty Factors) and implemented in the R package ipflasso, with the standard LASSO and sparse group LASSO. The use of IPF-LASSO is also illustrated through applications to two real-life cancer datasets. All data and codes are available on the companion website to ensure reproducibility. PMID:28546826
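A rough sketch of the modality-specific penalty idea using the standard column-rescaling equivalence for the lasso (dividing each modality's columns by its penalty factor and rescaling the fitted coefficients back); this illustrates the principle only and is not the ipflasso package's implementation, and all data and parameter values are made up.

    import numpy as np
    from sklearn.linear_model import Lasso

    def ipf_lasso_fit(blocks, y, penalty_factors, alpha=0.1):
        """blocks: list of (n, p_m) arrays, one per omics modality.
        A larger penalty factor shrinks that modality's coefficients more strongly."""
        scaled = [B / pf for B, pf in zip(blocks, penalty_factors)]
        fit = Lasso(alpha=alpha, max_iter=50_000).fit(np.hstack(scaled), y)
        coefs, start = [], 0
        for B, pf in zip(blocks, penalty_factors):
            p = B.shape[1]
            coefs.append(fit.coef_[start:start + p] / pf)   # map back to the original scale
            start += p
        return coefs

    rng = np.random.default_rng(0)
    expr, meth = rng.standard_normal((100, 30)), rng.standard_normal((100, 50))
    y = expr[:, 0] - expr[:, 1] + 0.1 * rng.standard_normal(100)
    coef_expr, coef_meth = ipf_lasso_fit([expr, meth], y, penalty_factors=[1.0, 4.0])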
NASA Astrophysics Data System (ADS)
Ariffin, Syaiba Balqish; Midi, Habshah
2014-06-01
This article is concerned with the performance of the logistic ridge regression estimation technique in the presence of multicollinearity and high leverage points. In logistic regression, multicollinearity exists among predictors and in the information matrix. The maximum likelihood estimator suffers a huge setback in the presence of multicollinearity, which causes regression estimates to have unduly large standard errors. To remedy this problem, a logistic ridge regression estimator is put forward. It is evident that the logistic ridge regression estimator outperforms the maximum likelihood approach in handling multicollinearity. The effect of high leverage points on the performance of the logistic ridge regression estimator is then investigated through a real data set and a simulation study. The findings signify that the logistic ridge regression estimator fails to provide better parameter estimates in the presence of both high leverage points and multicollinearity.
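For illustration, a small sketch contrasting an (essentially) unpenalized logistic fit with a ridge-penalized one on collinear predictors; it uses scikit-learn's L2-penalized LogisticRegression, where C is the inverse of the ridge parameter, and is not the estimator or data from the study above.

    import numpy as np
    from sklearn.linear_model import LogisticRegression

    rng = np.random.default_rng(1)
    n, p = 200, 6
    X = rng.standard_normal((n, p))
    X[:, 1] = X[:, 0] + 0.05 * rng.standard_normal(n)               # two nearly collinear predictors
    logits = X[:, 0] - X[:, 2]
    y = (rng.random(n) < 1.0 / (1.0 + np.exp(-logits))).astype(int)

    mle_like = LogisticRegression(C=1e6, max_iter=5000).fit(X, y)   # essentially unpenalized (MLE-like)
    ridge = LogisticRegression(C=1.0, max_iter=5000).fit(X, y)      # ridge penalty; C = 1/lambda
    print("near-MLE coefficients:", np.round(mle_like.coef_, 2))
    print("ridge coefficients:   ", np.round(ridge.coef_, 2))       # shrunken, more stable under collinearity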
Zhang, Yong-Tao; Shi, Jing; Shu, Chi-Wang; Zhou, Ye
2003-10-01
A quantitative study is carried out in this paper to investigate the size of numerical viscosities and the resolution power of high-order weighted essentially nonoscillatory (WENO) schemes for solving one- and two-dimensional Navier-Stokes equations for compressible gas dynamics with high Reynolds numbers. A one-dimensional shock tube problem, a one-dimensional example with parameters motivated by supernova and laser experiments, and a two-dimensional Rayleigh-Taylor instability problem are used as numerical test problems. For the two-dimensional Rayleigh-Taylor instability problem, or similar problems with small-scale structures, the details of the small structures are determined by the physical viscosity (therefore, the Reynolds number) in the Navier-Stokes equations. Thus, to obtain faithful resolution to these small-scale structures, the numerical viscosity inherent in the scheme must be small enough so that the physical viscosity dominates. A careful mesh refinement study is performed to capture the threshold mesh for full resolution, for specific Reynolds numbers, when WENO schemes of different orders of accuracy are used. It is demonstrated that high-order WENO schemes are more CPU time efficient to reach the same resolution, both for the one-dimensional and two-dimensional test problems.
Replica analysis of overfitting in regression models for time-to-event data
NASA Astrophysics Data System (ADS)
Coolen, A. C. C.; Barrett, J. E.; Paga, P.; Perez-Vicente, C. J.
2017-09-01
Overfitting, which happens when the number of parameters in a model is too large compared to the number of data points available for determining these parameters, is a serious and growing problem in survival analysis. While modern medicine presents us with data of unprecedented dimensionality, these data cannot yet be used effectively for clinical outcome prediction. Standard error measures in maximum likelihood regression, such as p-values and z-scores, are blind to overfitting, and even for Cox’s proportional hazards model (the main tool of medical statisticians), one finds in the literature only rules of thumb on the number of samples required to avoid overfitting. In this paper we present a mathematical theory of overfitting in regression models for time-to-event data, which aims to increase our quantitative understanding of the problem and provide practical tools with which to correct regression outcomes for the impact of overfitting. It is based on the replica method, a statistical mechanical technique for the analysis of heterogeneous many-variable systems that has been used successfully for several decades in physics, biology, and computer science, but not yet in medical statistics. We develop the theory initially for arbitrary regression models for time-to-event data, and verify its predictions in detail for the popular Cox model.
Real-time model learning using Incremental Sparse Spectrum Gaussian Process Regression.
Gijsberts, Arjan; Metta, Giorgio
2013-05-01
Novel applications in unstructured and non-stationary human environments require robots that learn from experience and adapt autonomously to changing conditions. Predictive models therefore not only need to be accurate, but should also be updated incrementally in real-time and require minimal human intervention. Incremental Sparse Spectrum Gaussian Process Regression is an algorithm that is targeted specifically for use in this context. Rather than developing a novel algorithm from the ground up, the method is based on the thoroughly studied Gaussian Process Regression algorithm, therefore ensuring a solid theoretical foundation. Non-linearity and a bounded update complexity are achieved simultaneously by means of a finite dimensional random feature mapping that approximates a kernel function. As a result, the computational cost for each update remains constant over time. Finally, algorithmic simplicity and support for automated hyperparameter optimization ensures convenience when employed in practice. Empirical validation on a number of synthetic and real-life learning problems confirms that the performance of Incremental Sparse Spectrum Gaussian Process Regression is superior with respect to the popular Locally Weighted Projection Regression, while computational requirements are found to be significantly lower. The method is therefore particularly suited for learning with real-time constraints or when computational resources are limited. Copyright © 2012 Elsevier Ltd. All rights reserved.
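A compact sketch of the core idea, assuming an RBF kernel approximated by random Fourier features and an incrementally maintained ridge solution; for brevity the prediction step solves the linear system directly, whereas an incremental implementation would maintain a Cholesky factor with rank-one updates to keep the per-update cost bounded. Hyperparameters, feature count, and the toy data stream are assumptions.

    import numpy as np

    class IncrementalSSGPR:
        """Sparse-spectrum GP regression sketch: random Fourier features + incremental ridge."""
        def __init__(self, dim, n_features=100, lengthscale=1.0, noise=0.1, seed=0):
            rng = np.random.default_rng(seed)
            self.W = rng.standard_normal((n_features, dim)) / lengthscale
            self.b = rng.uniform(0.0, 2.0 * np.pi, n_features)
            self.scale = np.sqrt(2.0 / n_features)
            self.A = (noise ** 2) * np.eye(n_features)     # accumulates Phi^T Phi + sigma^2 I
            self.rhs = np.zeros(n_features)                # accumulates Phi^T y

        def _phi(self, x):
            return self.scale * np.cos(self.W @ x + self.b)

        def update(self, x, y):                            # constant cost per streamed sample
            p = self._phi(x)
            self.A += np.outer(p, p)
            self.rhs += p * y

        def predict(self, x):
            return self._phi(x) @ np.linalg.solve(self.A, self.rhs)

    model = IncrementalSSGPR(dim=1)
    rng = np.random.default_rng(1)
    for t in np.linspace(0.0, 6.0, 200):                   # stream noisy samples of a sine
        model.update(np.array([t]), np.sin(t) + 0.05 * rng.standard_normal())
    print(round(float(model.predict(np.array([1.0]))), 2)) # close to sin(1) ~ 0.84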
NASA Astrophysics Data System (ADS)
Zhan, You-Bang; Zhang, Qun-Yong; Wang, Yu-Wu; Ma, Peng-Cheng
2010-01-01
We propose a scheme to teleport an unknown single-qubit state by using a high-dimensional entangled state as the quantum channel. As a special case, a scheme for teleportation of an unknown single-qubit state via three-dimensional entangled state is investigated in detail. Also, this scheme can be directly generalized to an unknown f-dimensional state by using a d-dimensional entangled state (d > f) as the quantum channel.
Trehan, Sumeet; Carlberg, Kevin T.; Durlofsky, Louis J.
2017-07-14
A machine learning–based framework for modeling the error introduced by surrogate models of parameterized dynamical systems is proposed. The framework entails the use of high-dimensional regression techniques (e.g., random forests and LASSO) to map a large set of inexpensively computed “error indicators” (i.e., features) produced by the surrogate model at a given time instance to a prediction of the surrogate-model error in a quantity of interest (QoI). This eliminates the need for the user to hand-select a small number of informative features. The methodology requires a training set of parameter instances at which the time-dependent surrogate-model error is computed by simulating both the high-fidelity and surrogate models. Using these training data, the method first determines regression-model locality (via classification or clustering) and subsequently constructs a “local” regression model to predict the time-instantaneous error within each identified region of feature space. We consider 2 uses for the resulting error model: (1) as a correction to the surrogate-model QoI prediction at each time instance and (2) as a way to statistically model arbitrary functions of the time-dependent surrogate-model error (e.g., time-integrated errors). We then apply the proposed framework to model errors in reduced-order models of nonlinear oil-water subsurface flow simulations, with time-varying well-control (bottom-hole pressure) parameters. The reduced-order models used in this work entail application of trajectory piecewise linearization in conjunction with proper orthogonal decomposition. Moreover, when the first use of the method is considered, numerical experiments demonstrate consistent improvement in accuracy in the time-instantaneous QoI prediction relative to the original surrogate model, across a large number of test cases. When the second use is considered, results show that the proposed method provides accurate statistical predictions of the time- and well-averaged errors.
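A hedged sketch of the two-step error model described above: determine locality by clustering the error-indicator features, then fit a local regression within each region. K-means and random forests are used here as stand-ins for the clustering and regression choices mentioned in the abstract; names and settings are assumptions.

    import numpy as np
    from sklearn.cluster import KMeans
    from sklearn.ensemble import RandomForestRegressor

    class LocalErrorModel:
        """Map surrogate 'error indicator' features to the surrogate-model error in a QoI."""
        def __init__(self, n_regions=3, seed=0):
            self.clusterer = KMeans(n_clusters=n_regions, n_init=10, random_state=seed)
            self.models = {}

        def fit(self, features, errors):
            labels = self.clusterer.fit_predict(features)
            for k in np.unique(labels):                   # one local regression per feature-space region
                mask = labels == k
                self.models[k] = RandomForestRegressor(n_estimators=200, random_state=0).fit(
                    features[mask], errors[mask])
            return self

        def predict(self, features):
            labels = self.clusterer.predict(features)
            out = np.empty(len(features))
            for k in np.unique(labels):
                mask = labels == k
                out[mask] = self.models[k].predict(features[mask])
            return out

    # Usage (first use in the abstract): corrected_qoi = surrogate_qoi + model.predict(test_features)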
Multiple network-constrained regressions expand insights into influenza vaccination responses.
Avey, Stefan; Mohanty, Subhasis; Wilson, Jean; Zapata, Heidi; Joshi, Samit R; Siconolfi, Barbara; Tsang, Sui; Shaw, Albert C; Kleinstein, Steven H
2017-07-15
Systems immunology leverages recent technological advancements that enable broad profiling of the immune system to better understand the response to infection and vaccination, as well as the dysregulation that occurs in disease. An increasingly common approach to gain insights from these large-scale profiling experiments involves the application of statistical learning methods to predict disease states or the immune response to perturbations. However, the goal of many systems studies is not to maximize accuracy, but rather to gain biological insights. The predictors identified using current approaches can be biologically uninterpretable or present only one of many equally predictive models, leading to a narrow understanding of the underlying biology. Here we show that incorporating prior biological knowledge within a logistic modeling framework by using network-level constraints on transcriptional profiling data significantly improves interpretability. Moreover, incorporating different types of biological knowledge produces models that highlight distinct aspects of the underlying biology, while maintaining predictive accuracy. We propose a new framework, Logistic Multiple Network-constrained Regression (LogMiNeR), and apply it to understand the mechanisms underlying differential responses to influenza vaccination. Although standard logistic regression approaches were predictive, they were minimally interpretable. Incorporating prior knowledge using LogMiNeR led to models that were equally predictive yet highly interpretable. In this context, B cell-specific genes and mTOR signaling were associated with an effective vaccination response in young adults. Overall, our results demonstrate a new paradigm for analyzing high-dimensional immune profiling data in which multiple networks encoding prior knowledge are incorporated to improve model interpretability. The R source code described in this article is publicly available at https://bitbucket.org/kleinstein/logminer . steven.kleinstein@yale.edu or stefan.avey@yale.edu. Supplementary data are available at Bioinformatics online. © The Author 2017. Published by Oxford University Press. All rights reserved. For Permissions, please e-mail: journals.permissions@oup.com
Taslimitehrani, Vahid; Dong, Guozhu; Pereira, Naveen L; Panahiazar, Maryam; Pathak, Jyotishman
2016-04-01
Computerized survival prediction in healthcare, identifying the risk of disease mortality, helps healthcare providers to effectively manage their patients by providing appropriate treatment options. In this study, we propose to apply a classification algorithm, Contrast Pattern Aided Logistic Regression (CPXR(Log)) with the probabilistic loss function, to develop and validate prognostic risk models to predict 1-, 2-, and 5-year survival in heart failure (HF) using data from electronic health records (EHRs) at Mayo Clinic. The CPXR(Log) constructs a pattern aided logistic regression model defined by several patterns and corresponding local logistic regression models. One of the models generated by CPXR(Log) achieved an AUC and accuracy of 0.94 and 0.91, respectively, and significantly outperformed prognostic models reported in prior studies. Data extracted from EHRs allowed incorporation of patient co-morbidities into our models, which helped improve the performance of the CPXR(Log) models (15.9% AUC improvement), although it did not improve the accuracy of the models built by other classifiers. We also propose a probabilistic loss function to determine the large error and small error instances. The new loss function used in the algorithm outperforms the functions used in previous studies by a 1% improvement in AUC. This study revealed that using EHR data to build prediction models can be very challenging with existing classification methods due to the high dimensionality and complexity of EHR data. The risk models developed by CPXR(Log) also reveal that HF is a highly heterogeneous disease, i.e., different subgroups of HF patients require different types of considerations in their diagnosis and treatment. Our risk models provided two valuable insights for the application of predictive modeling techniques in biomedicine: logistic risk models often make systematic prediction errors, and it is prudent to use subgroup-based prediction models such as those given by CPXR(Log) when investigating heterogeneous diseases. Copyright © 2016 Elsevier Inc. All rights reserved.
CerebroMatic: A Versatile Toolbox for Spline-Based MRI Template Creation
Wilke, Marko; Altaye, Mekibib; Holland, Scott K.
2017-01-01
Brain image spatial normalization and tissue segmentation rely on prior tissue probability maps. Appropriately selecting these tissue maps becomes particularly important when investigating “unusual” populations, such as young children or elderly subjects. When creating such priors, the disadvantage of applying more deformation must be weighed against the benefit of achieving a crisper image. We have previously suggested that statistically modeling demographic variables, instead of simply averaging images, is advantageous. Both aspects (more vs. less deformation and modeling vs. averaging) were explored here. We used imaging data from 1914 subjects, aged 13 months to 75 years, and employed multivariate adaptive regression splines to model the effects of age, field strength, gender, and data quality. Within the spm/cat12 framework, we compared an affine-only with a low- and a high-dimensional warping approach. As expected, more deformation on the individual level results in lower group dissimilarity. Consequently, effects of age in particular are less apparent in the resulting tissue maps when using a more extensive deformation scheme. Using statistically-described parameters, high-quality tissue probability maps could be generated for the whole age range; they are consistently closer to a gold standard than conventionally-generated priors based on 25, 50, or 100 subjects. Distinct effects of field strength, gender, and data quality were seen. We conclude that an extensive matching for generating tissue priors may model much of the variability inherent in the dataset which is then not contained in the resulting priors. Further, the statistical description of relevant parameters (using regression splines) allows for the generation of high-quality tissue probability maps while controlling for known confounds. The resulting CerebroMatic toolbox is available for download at http://irc.cchmc.org/software/cerebromatic.php. PMID:28275348
Bohnert, Amy S B; German, Danielle; Knowlton, Amy R; Latkin, Carl A
2010-03-01
Social support is a multi-dimensional construct that is important to drug use cessation. The present study identified types of supportive friends among the social network members in a community-based sample and examined the relationship of supporter-type classes with supporter, recipient, and supporter-recipient relationship characteristics. We hypothesized that the most supportive network members and their support recipients would be less likely to be current heroin/cocaine users. Participants (n=1453) were recruited from low-income neighborhoods with a high prevalence of drug use. Participants identified their friends via a network inventory, and all nominated friends were included in a latent class analysis and grouped based on their probability of providing seven types of support. These latent classes were included as the dependent variable in a multi-level regression of supporter drug use, recipient drug use, and other characteristics. The best-fitting latent class model identified five support patterns: friends who provided Little/No Support, Low/Moderate Support, High Support, Socialization Support, and Financial Support. In bivariate models, friends in the High, Low/Moderate, and Financial Support were less likely to use heroin or cocaine and had less conflict with and were more trusted by the support recipient than friends in the Low/No Support class. Individuals with supporters in those same support classes compared to the Low/No Support class were less likely to use heroin or cocaine, or to be homeless or female. Multivariable models suggested similar trends. Those with current heroin/cocaine use were less likely to provide or receive comprehensive support from friends. Published by Elsevier Ireland Ltd.
NASA Astrophysics Data System (ADS)
Edwards, L. L.; Harvey, T. F.; Freis, R. P.; Pitovranov, S. E.; Chernokozhin, E. V.
1992-10-01
The accuracy associated with assessing the environmental consequences of an accidental release of radioactivity is highly dependent on our knowledge of the source term characteristics and, in the case when the radioactivity is condensed on particles, the particle size distribution, all of which are generally poorly known. This paper reports on the development of a numerical technique that integrates radiological measurements with atmospheric dispersion modeling, resulting in more accurate estimates of the particle-size distribution and particle injection height when compared with measurements of high-explosive dispersal of 239Pu. The estimation model is based on a non-linear least squares regression scheme coupled with the ARAC three-dimensional atmospheric dispersion models. The viability of the approach is evaluated by estimation of ADPIC model input parameters such as the ADPIC particle size mean aerodynamic diameter, the geometric standard deviation, and the largest size. Additionally, we estimate an optimal 'coupling coefficient' between the particles and an explosive cloud rise model. The experimental data are taken from the Clean Slate 1 field experiment conducted during 1963 at the Tonopah Test Range in Nevada. The regression technique optimizes the agreement between the measured and model-predicted concentrations of 239Pu by varying the model input parameters within their respective ranges of uncertainties. The technique generally estimated the measured concentrations within a factor of 1.5, with the worst estimate being within a factor of 5, which is very good in view of the complexity of the concentration measurements, the uncertainties associated with the meteorological data, and the limitations of the models. The best fit also suggests a smaller mean diameter and a smaller geometric standard deviation of the particle size, as well as a slightly weaker particle-to-cloud coupling, than previously reported.
Detection of Subtle Context-Dependent Model Inaccuracies in High-Dimensional Robot Domains.
Mendoza, Juan Pablo; Simmons, Reid; Veloso, Manuela
2016-12-01
Autonomous robots often rely on models of their sensing and actions for intelligent decision making. However, when operating in unconstrained environments, the complexity of the world makes it infeasible to create models that are accurate in every situation. This article addresses the problem of using potentially large and high-dimensional sets of robot execution data to detect situations in which a robot model is inaccurate, that is, detecting context-dependent model inaccuracies in a high-dimensional context space. To find inaccuracies tractably, the robot conducts an informed search through low-dimensional projections of execution data to find parametric Regions of Inaccurate Modeling (RIMs). Empirical evidence from two robot domains shows that this approach significantly enhances the detection power of existing RIM-detection algorithms in high-dimensional spaces.
Distribution of high-dimensional entanglement via an intra-city free-space link
Steinlechner, Fabian; Ecker, Sebastian; Fink, Matthias; Liu, Bo; Bavaresco, Jessica; Huber, Marcus; Scheidl, Thomas; Ursin, Rupert
2017-01-01
Quantum entanglement is a fundamental resource in quantum information processing and its distribution between distant parties is a key challenge in quantum communications. Increasing the dimensionality of entanglement has been shown to improve robustness and channel capacities in secure quantum communications. Here we report on the distribution of genuine high-dimensional entanglement via a 1.2-km-long free-space link across Vienna. We exploit hyperentanglement, that is, simultaneous entanglement in polarization and energy-time bases, to encode quantum information, and observe high-visibility interference for successive correlation measurements in each degree of freedom. These visibilities impose lower bounds on entanglement in each subspace individually and certify four-dimensional entanglement for the hyperentangled system. The high-fidelity transmission of high-dimensional entanglement under real-world atmospheric link conditions represents an important step towards long-distance quantum communications with more complex quantum systems and the implementation of advanced quantum experiments with satellite links. PMID:28737168
Approximating high-dimensional dynamics by barycentric coordinates with linear programming.
Hirata, Yoshito; Shiro, Masanori; Takahashi, Nozomu; Aihara, Kazuyuki; Suzuki, Hideyuki; Mas, Paloma
2015-01-01
The increasing development of novel methods and techniques facilitates the measurement of high-dimensional time series but challenges our ability for accurate modeling and predictions. The use of a general mathematical model requires the inclusion of many parameters, which are difficult to be fitted for relatively short high-dimensional time series observed. Here, we propose a novel method to accurately model a high-dimensional time series. Our method extends the barycentric coordinates to high-dimensional phase space by employing linear programming, and allowing the approximation errors explicitly. The extension helps to produce free-running time-series predictions that preserve typical topological, dynamical, and/or geometric characteristics of the underlying attractors more accurately than the radial basis function model that is widely used. The method can be broadly applied, from helping to improve weather forecasting, to creating electronic instruments that sound more natural, and to comprehensively understanding complex biological data.
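A minimal sketch of the central step, assuming the barycentric weights are obtained by minimizing the L1 approximation error with a linear program (scipy.optimize.linprog): non-negative weights over a library of reference states, summing to one, that best reproduce a query state. The library, successor states, and one-step prediction are toy illustrations, not the authors' implementation.

    import numpy as np
    from scipy.optimize import linprog

    def barycentric_weights(library, x):
        """library: (k, d) reference states; x: (d,) query state.
        Solve min sum(e) s.t. |library.T @ w - x| <= e, sum(w) = 1, w >= 0."""
        k, d = library.shape
        c = np.concatenate([np.zeros(k), np.ones(d)])            # objective: sum of slack errors e
        A_ub = np.block([[library.T, -np.eye(d)],
                         [-library.T, -np.eye(d)]])
        b_ub = np.concatenate([x, -x])
        A_eq = np.concatenate([np.ones(k), np.zeros(d)])[None, :]
        res = linprog(c, A_ub=A_ub, b_ub=b_ub, A_eq=A_eq, b_eq=[1.0],
                      bounds=[(0, None)] * (k + d))
        return res.x[:k]

    # Toy one-step prediction: carry the weights over to the library states' observed successors.
    rng = np.random.default_rng(0)
    states = rng.random((50, 3))              # library of past states
    successors = states + 0.01                # their observed next states (toy dynamics)
    w = barycentric_weights(states, rng.random(3))
    prediction = successors.T @ w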
Wingfield, Jenna L.; Ruane, Lauren G.; Patterson, Joshua D.
2017-01-01
Premise of the study: The three-dimensional structure of tree canopies creates environmental heterogeneity, which can differentially influence the chemistry, morphology, physiology, and/or phenology of leaves. Previous studies that subdivide canopy leaves into broad categories (i.e., “upper/lower”) fail to capture the differences in microenvironments experienced by leaves throughout the three-dimensional space of a canopy. Methods: We use a three-dimensional spatial mapping approach based on spherical polar coordinates to examine the fine-scale spatial distributions of photosynthetically active radiation (PAR) and the concentration of ultraviolet (UV)-absorbing compounds (A300) among leaves within the canopies of black mangroves (Avicennia germinans). Results: Linear regressions revealed that interior leaves received less PAR and produced fewer UV-absorbing compounds than leaves on the exterior of the canopy. By allocating more UV-absorbing compounds to the leaves on the exterior of the canopy, black mangroves may be maximizing UV-protection while minimizing biosynthesis of UV-absorbing compounds. Discussion: Three-dimensional spatial mapping provides an inexpensive and portable method to detect fine-scale differences in environmental and biological traits within canopies. We used it to understand the relationship between PAR and A300, but the same approach can also be used to identify traits associated with the spatial distribution of herbivores, pollinators, and pathogens. PMID:29188145
Efficient Statistically Accurate Algorithms for the Fokker-Planck Equation in Large Dimensions
NASA Astrophysics Data System (ADS)
Chen, N.; Majda, A.
2017-12-01
Solving the Fokker-Planck equation for high-dimensional complex turbulent dynamical systems is an important and practical issue. However, most traditional methods suffer from the curse of dimensionality and have difficulties in capturing the fat tailed highly intermittent probability density functions (PDFs) of complex systems in turbulence, neuroscience and excitable media. In this article, efficient statistically accurate algorithms are developed for solving both the transient and the equilibrium solutions of Fokker-Planck equations associated with high-dimensional nonlinear turbulent dynamical systems with conditional Gaussian structures. The algorithms involve a hybrid strategy that requires only a small number of ensembles. Here, a conditional Gaussian mixture in a high-dimensional subspace via an extremely efficient parametric method is combined with a judicious non-parametric Gaussian kernel density estimation in the remaining low-dimensional subspace. Particularly, the parametric method, which is based on an effective data assimilation framework, provides closed analytical formulae for determining the conditional Gaussian distributions in the high-dimensional subspace. Therefore, it is computationally efficient and accurate. The full non-Gaussian PDF of the system is then given by a Gaussian mixture. Different from the traditional particle methods, each conditional Gaussian distribution here covers a significant portion of the high-dimensional PDF. Therefore a small number of ensembles is sufficient to recover the full PDF, which overcomes the curse of dimensionality. Notably, the mixture distribution has a significant skill in capturing the transient behavior with fat tails of the high-dimensional non-Gaussian PDFs, and this facilitates the algorithms in accurately describing the intermittency and extreme events in complex turbulent systems. It is shown in a stringent set of test problems that the method only requires an order of O(100) ensembles to successfully recover the highly non-Gaussian transient PDFs in up to 6 dimensions with only small errors.
NASA Astrophysics Data System (ADS)
Agapov, Vladimir
2018-03-01
The necessity of new approaches to the modeling of rods in the analysis of high-rise structures is justified. The possibility of applying three-dimensional superelements of rods with rectangular cross section to the static and dynamic analysis of bar and combined structures is considered. The results of a free-vibration analysis of an eighteen-story spatial frame, using both one-dimensional and three-dimensional rod models, are presented. A comparative analysis of the obtained results is carried out, and on this basis conclusions are drawn about the applicability of three-dimensional superelements in the static and dynamic analysis of high-rise structures.
High-Dimensional Function Approximation With Neural Networks for Large Volumes of Data.
Andras, Peter
2018-02-01
Approximation of high-dimensional functions is a challenge for neural networks due to the curse of dimensionality. Often the data for which the approximated function is defined reside on a low-dimensional manifold, and in principle the approximation of the function over this manifold should improve the approximation performance. It has been shown that projecting the data manifold into a lower-dimensional space, followed by the neural network approximation of the function over this space, provides a more precise approximation of the function than approximation with neural networks in the original data space. However, if the data volume is very large, the projection into the low-dimensional space has to be based on a limited sample of the data. Here, we investigate the nature of the approximation error of neural networks trained over the projection space. We show that such neural networks should have better approximation performance than neural networks trained on high-dimensional data, even if the projection is based on a relatively sparse sample of the data manifold. We also find that it is preferable to use a uniformly distributed sparse sample of the data for generating the low-dimensional projection. We illustrate these results with the practical neural network approximation of a set of functions defined on high-dimensional data, including real-world data.
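A small sketch consistent with the comparison described above: a network trained directly on high-dimensional data versus one trained on a low-dimensional projection estimated from a sparse sample. The function, dimensions, PCA projection, and network settings are illustrative assumptions.

    import numpy as np
    from sklearn.decomposition import PCA
    from sklearn.neural_network import MLPRegressor
    from sklearn.metrics import mean_squared_error

    rng = np.random.default_rng(0)
    latent = rng.uniform(-1.0, 1.0, (4000, 3))                  # data live on a 3-D manifold
    X = latent @ rng.standard_normal((3, 50))                   # embedded in 50 dimensions
    y = np.sin(latent[:, 0]) + latent[:, 1] * latent[:, 2]      # target defined on the manifold

    sparse_idx = rng.choice(len(X), 300, replace=False)         # projection estimated from a sparse sample
    proj = PCA(n_components=3).fit(X[sparse_idx])
    Xp = proj.transform(X)

    for name, data in [("high-dimensional", X), ("projected", Xp)]:
        net = MLPRegressor(hidden_layer_sizes=(64, 64), max_iter=2000, random_state=0)
        net.fit(data[:3000], y[:3000])
        print(name, round(mean_squared_error(y[3000:], net.predict(data[3000:])), 4))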
Fault detection in reciprocating compressor valves under varying load conditions
NASA Astrophysics Data System (ADS)
Pichler, Kurt; Lughofer, Edwin; Pichler, Markus; Buchegger, Thomas; Klement, Erich Peter; Huschenbett, Matthias
2016-03-01
This paper presents a novel approach for detecting cracked or broken reciprocating compressor valves under varying load conditions. The main idea is that the time-frequency representation of vibration measurement data will show typical patterns depending on the fault state. The problem is to detect these patterns reliably. For the detection task, we make a detour via the two-dimensional autocorrelation. The autocorrelation emphasizes the patterns and reduces noise effects, which makes it easier to define appropriate features. After feature extraction, classification is done using logistic regression and support vector machines. The method's performance is validated by analyzing real-world measurement data. The results show very high detection accuracy while keeping false alarm rates at a very low level for different compressor loads, thus achieving a load-independent method. The proposed approach is, to our best knowledge, the first automated method for reciprocating compressor valve fault detection that can handle varying load conditions.
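A hedged sketch of the pipeline outlined above (time-frequency representation, its two-dimensional autocorrelation, a few summary features, then a classifier); the synthetic vibration signals, feature set, and classifier settings are assumptions, not the study's configuration.

    import numpy as np
    from scipy.signal import spectrogram, fftconvolve
    from sklearn.svm import SVC

    def autocorr_features(vibration, fs):
        """Spectrogram, its 2-D autocorrelation, and a handful of summary features."""
        _, _, S = spectrogram(vibration, fs=fs, nperseg=256)
        S = S - S.mean()
        ac = fftconvolve(S, S[::-1, ::-1], mode="same")         # 2-D autocorrelation (up to scaling)
        ac = ac / np.abs(ac).max()
        return np.array([ac.mean(), ac.std(), np.abs(ac).sum(), ac.max() - ac.min()])

    rng = np.random.default_rng(0)
    fs = 10_000

    def make_signal(faulty):
        load = rng.uniform(0.5, 1.5)                            # varying load scales the amplitude
        t = np.arange(fs) / fs
        base = load * np.sin(2 * np.pi * 50 * t)
        if faulty:
            base = base + 0.3 * np.sin(2 * np.pi * 400 * t)     # extra band mimicking a defective valve
        return base + 0.1 * rng.standard_normal(fs)

    X = np.array([autocorr_features(make_signal(k % 2 == 1), fs) for k in range(60)])
    y = np.arange(60) % 2
    clf = SVC(kernel="rbf").fit(X[:40], y[:40])
    print("held-out accuracy:", (clf.predict(X[40:]) == y[40:]).mean())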
Fukushima, Makoto; Saunders, Richard C; Fujii, Naotaka; Averbeck, Bruno B; Mishkin, Mortimer
2014-01-01
Vocal production is an example of controlled motor behavior with high temporal precision. Previous studies have decoded auditory evoked cortical activity while monkeys listened to vocalization sounds. On the other hand, there have been few attempts at decoding motor cortical activity during vocal production. Here we recorded cortical activity during vocal production in the macaque with a chronically implanted electrocorticographic (ECoG) electrode array. The array detected robust activity in motor cortex during vocal production. We used a nonlinear dynamical model of the vocal organ to reduce the dimensionality of 'Coo' calls produced by the monkey. We then used linear regression to evaluate the information in motor cortical activity for this reduced representation of calls. This simple linear model accounted for circa 65% of the variance in the reduced sound representations, supporting the feasibility of using the dynamical model of the vocal organ for decoding motor cortical activity during vocal production.
Model-free inference of direct network interactions from nonlinear collective dynamics.
Casadiego, Jose; Nitzan, Mor; Hallerberg, Sarah; Timme, Marc
2017-12-19
The topology of interactions in network dynamical systems fundamentally underlies their function. Accelerating technological progress creates massively available data about collective nonlinear dynamics in physical, biological, and technological systems. Detecting direct interaction patterns from those dynamics still constitutes a major open problem. In particular, current nonlinear dynamics approaches mostly require a priori knowledge of a model of the (often high-dimensional) system dynamics. Here we develop a model-independent framework for inferring direct interactions solely from recording the nonlinear collective dynamics generated. Introducing an explicit dependency matrix in combination with a block-orthogonal regression algorithm, the approach works reliably across many dynamical regimes, including transient dynamics toward steady states, periodic and non-periodic dynamics, and chaos. Together with its capabilities to reveal network (two point) as well as hypernetwork (e.g., three point) interactions, this framework may thus open up nonlinear dynamics options of inferring direct interaction patterns across systems where no model is known.
Evaluation of gridding procedures for air temperature over Southern Africa
NASA Astrophysics Data System (ADS)
Eiselt, Kai-Uwe; Kaspar, Frank; Mölg, Thomas; Krähenmann, Stefan; Posada, Rafael; Riede, Jens O.
2017-06-01
Africa is considered to be highly vulnerable to climate change, yet the availability of observational data and derived products is limited. As one element of the SASSCAL initiative (Southern African Science Service Centre for Climate Change and Adaptive Land Management), a cooperation of Angola, Botswana, Namibia, Zambia, South Africa and Germany, networks of automatic weather stations have been installed or improved (http://www.sasscalweathernet.org). The increased availability of meteorological observations improves the quality of gridded products for the region. Here we compare interpolation methods for monthly minimum and maximum temperatures, which were calculated from hourly measurements. Due to a lack of long-term records, we focused on data ranging from September 2014 to August 2016. The best interpolation results were achieved by combining multiple linear regression (with elevation, a continentality index and latitude as predictors) with three-dimensional inverse distance weighted interpolation.
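A rough sketch of the best-performing combination reported above: a multiple linear regression on elevation, a continentality index and latitude, with the station residuals spread over the grid by inverse-distance weighting in three dimensions. The column layout, coordinate scaling (1 km of elevation treated like 1 degree), IDW power, and synthetic data are assumptions.

    import numpy as np
    from sklearn.linear_model import LinearRegression

    def fit_predict_temperature(stations, t_obs, grid, power=2.0):
        """stations, grid: arrays with columns [lon, lat, elevation_m, continentality]."""
        pred_cols = [2, 3, 1]                                   # elevation, continentality, latitude
        reg = LinearRegression().fit(stations[:, pred_cols], t_obs)
        resid = t_obs - reg.predict(stations[:, pred_cols])

        def coords(a):                                          # 3-D coordinates for the IDW distance
            return np.column_stack([a[:, 0], a[:, 1], a[:, 2] / 1000.0])
        d = np.linalg.norm(coords(grid)[:, None, :] - coords(stations)[None, :, :], axis=2)
        w = 1.0 / np.maximum(d, 1e-6) ** power
        resid_interp = (w * resid).sum(axis=1) / w.sum(axis=1)
        return reg.predict(grid[:, pred_cols]) + resid_interp

    rng = np.random.default_rng(0)
    stations = rng.random((40, 4)) * [10, 10, 2000, 1]
    t_obs = 25.0 - 0.006 * stations[:, 2] + rng.normal(0.0, 0.3, 40)
    grid = rng.random((5, 4)) * [10, 10, 2000, 1]
    print(fit_predict_temperature(stations, t_obs, grid).round(1))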
NASA Astrophysics Data System (ADS)
Kuehndel, J.; Kerler, B.; Karcher, C.
2018-04-01
To improve the performance of heat exchangers for vehicle applications, it is necessary to increase the air-side heat transfer. Selective laser melting lends itself to fin development because of: i) independence from conventional tooling; ii) a fast way to conduct essential experimental studies; iii) high dimensional accuracy; and iv) degrees of freedom in design. Therefore, heat exchanger elements with wavy fins were examined in an experimental study. Experiments were conducted for an air-side Reynolds number range of 1400-7400, varying the wave amplitude and wavelength of the fins at a constant water flow rate of 9.0 m3/h. Heat transfer and pressure drop characteristics were evaluated with the Nusselt number Nu and the Darcy friction factor ψ as functions of Reynolds number. Heat transfer and pressure drop correlations were derived from the measurement data by regression analysis.
Hooghe, Marc
2011-06-01
In order to assess the determinants of homophobia among Belgian adolescents, a shortened version of the Homophobia scale (Wright et al., 1999) was included in a representative survey among Belgian adolescents (n = 4,870). Principal component analysis demonstrated that the scale was one-dimensional and internally coherent. The results showed that homophobia is still widespread among Belgian adolescents, despite various legal reforms in the country aiming to combat discrimination of gay women and men. A multivariate regression analysis demonstrated that boys, ethnic minorities, individuals with high levels of ethnocentrism and an instrumental worldview, Muslim minorities, and those with low levels of associational involvement scored significantly higher on the scale. While among boys an extensive friendship network was associated with higher levels of homophobia, the opposite phenomenon was found among girls. We discuss the possible relation between notions of masculinity within predominantly male adolescent friendship networks and social support for homophobia.
Quantification of pleural effusion on CT by simple measurement.
Hazlinger, Martin; Ctvrtlik, Filip; Langova, Katerina; Herman, Miroslav
2014-01-01
To find the simplest method for quantifying pleural effusion volume from CT scans. Seventy pleural effusions found on chest CT examination in 50 consecutive adult patients with the presence of free pleural effusion were included. The volume of pleural effusion was calculated from a three-dimensional reconstruction of CT scans. Planar measurements were made on CT scans and their two-dimensional reconstructions in the sagittal plane and at three levels on transversal scans. Individual planar measurements were statistically compared with the detected volume of pleural effusion. Regression equations, averaged absolute difference between observed and predicted values and determination coefficients were found for all measurements and their combinations. A tabular expression of the best single planar measurement was created. The most accurate correlation between the volume and a single planar measurement was found in the dimension measured perpendicular to the parietal pleura on the transversal scan with the greatest depth of effusion. Conversion of this measurement to the appropriate volume is possible by the regression equation: Volume = 0.365 × b^3 - 4.529 × b^2 + 159.723 × b - 88.377. We devised a simple method of conversion of a single planar measurement on a CT scan to the volume of pleural effusion. The tabular expression of our equation can be easily and effectively used in routine practice.
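As a worked example of that reported equation (the abstract does not state the units of b or of the predicted volume; centimetres in and millilitres out are assumed here for illustration):

```python
def effusion_volume(b):
    """Predicted pleural effusion volume from the single planar measurement b
    (depth perpendicular to the parietal pleura on the transversal scan with
    the greatest effusion depth), using the regression equation quoted above.
    Units (assumed cm in, mL out) are not given in the abstract."""
    return 0.365 * b**3 - 4.529 * b**2 + 159.723 * b - 88.377

# Example: effusion_volume(3.0) evaluates to roughly 359.9.
```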
Lu, Ake Tzu-Hui; Austin, Erin; Bonner, Ashley; Huang, Hsin-Hsiung; Cantor, Rita M
2014-09-01
Machine learning methods (MLMs), designed to develop models using high-dimensional predictors, have been used to analyze genome-wide genetic and genomic data to predict risks for complex traits. We summarize the results from six contributions to our Genetic Analysis Workshop 18 working group; these investigators applied MLMs and data mining to analyses of rare and common genetic variants measured in pedigrees. To develop risk profiles, group members analyzed blood pressure traits along with single-nucleotide polymorphisms and rare variant genotypes derived from sequence and imputation analyses in large Mexican American pedigrees. Supervised MLMs included penalized regression with varying penalties, support vector machines, and permanental classification. Unsupervised MLMs included sparse principal components analysis and sparse graphical models. Entropy-based components analyses were also used to mine these data. None of the investigators fully capitalized on the genetic information provided by the complete pedigrees. Their approaches either corrected for the nonindependence of the individuals within the pedigrees or analyzed only those who were independent. Some methods allowed for covariate adjustment, whereas others did not. We evaluated these methods using a variety of metrics. Four contributors conducted primary analyses on the real data, and the other two research groups used the simulated data with and without knowledge of the underlying simulation model. One group used the answers to the simulated data to assess power and type I errors. Although the MLMs applied were substantially different, each research group concluded that MLMs have advantages over standard statistical approaches with these high-dimensional data. © 2014 WILEY PERIODICALS, INC.
Hogan, R E; Wang, L; Bertrand, M E; Willmore, L J; Bucholz, R D; Nassif, A S; Csernansky, J G
2006-01-01
We objectively assessed surface structural changes of the hippocampus in mesial temporal sclerosis (MTS) and assessed the ability of large-deformation high-dimensional mapping (HDM-LD) to demonstrate hippocampal surface symmetry and predict group classification of MTS in right and left MTS groups compared with control subjects. Using eigenvector field analysis of HDM-LD segmentations of the hippocampus, we compared the symmetry of changes in the right and left MTS groups with a group of 15 matched controls. To assess the ability of HDM-LD to predict group classification, eigenvectors were selected by a logistic regression procedure when comparing the MTS group with control subjects. Multivariate analysis of variance on the coefficients from the first 9 eigenvectors accounted for 75% of the total variance between groups. The first 3 eigenvectors showed the largest differences between the control group and each of the MTS groups, but with eigenvector 2 showing the greatest difference in the MTS groups. Reconstruction of the hippocampal deformation vector fields due solely to eigenvector 2 shows symmetrical patterns in the right and left MTS groups. A "leave-one-out" (jackknife) procedure correctly predicted group classification in 14 of 15 (93.3%) left MTS subjects and all 15 right MTS subjects. Analysis of principal dimensions of hippocampal shape change suggests that MTS, after accounting for normal right-left asymmetries, affects the right and left hippocampal surface structure very symmetrically. Preliminary analysis using HDM-LD shows it can predict group classification of MTS and control hippocampi in this well-defined population of patients with MTS and mesial temporal lobe epilepsy (MTLE).
Certifying an Irreducible 1024-Dimensional Photonic State Using Refined Dimension Witnesses.
Aguilar, Edgar A; Farkas, Máté; Martínez, Daniel; Alvarado, Matías; Cariñe, Jaime; Xavier, Guilherme B; Barra, Johanna F; Cañas, Gustavo; Pawłowski, Marcin; Lima, Gustavo
2018-06-08
We report on a new class of dimension witnesses, based on quantum random access codes, which are a function of the recorded statistics and which have different bounds for all possible decompositions of a high-dimensional physical system. Thus, the witness certifies the dimension of the system and has the new distinct feature of identifying whether the high-dimensional system is decomposable in terms of lower-dimensional subsystems. To demonstrate the practicability of this technique, we used it to experimentally certify the generation of an irreducible 1024-dimensional photonic quantum state, thereby certifying that the state is neither multipartite nor encoded using noncoupled different degrees of freedom of a single photon. Our protocol should find applications in a broad class of modern quantum information experiments addressing the generation of high-dimensional quantum systems, where quantum tomography may become intractable.
Certifying an Irreducible 1024-Dimensional Photonic State Using Refined Dimension Witnesses
NASA Astrophysics Data System (ADS)
Aguilar, Edgar A.; Farkas, Máté; Martínez, Daniel; Alvarado, Matías; Cariñe, Jaime; Xavier, Guilherme B.; Barra, Johanna F.; Cañas, Gustavo; Pawłowski, Marcin; Lima, Gustavo
2018-06-01
We report on a new class of dimension witnesses, based on quantum random access codes, which are a function of the recorded statistics and which have different bounds for all possible decompositions of a high-dimensional physical system. Thus, the witness certifies the dimension of the system and has the new distinct feature of identifying whether the high-dimensional system is decomposable in terms of lower-dimensional subsystems. To demonstrate the practicability of this technique, we used it to experimentally certify the generation of an irreducible 1024-dimensional photonic quantum state, thereby certifying that the state is neither multipartite nor encoded using noncoupled different degrees of freedom of a single photon. Our protocol should find applications in a broad class of modern quantum information experiments addressing the generation of high-dimensional quantum systems, where quantum tomography may become intractable.
Models for predicting the mass of lime fruits by some engineering properties.
Miraei Ashtiani, Seyed-Hassan; Baradaran Motie, Jalal; Emadi, Bagher; Aghkhani, Mohammad-Hosein
2014-11-01
Grading fruits by mass is important in packaging; it reduces waste and increases the marketing value of agricultural produce. The aim of this study was mass modeling of two major cultivars of Iranian limes based on engineering attributes. Models were classified into three groups: 1) single and multiple variable regressions of lime mass and dimensional characteristics; 2) single and multiple variable regressions of lime mass and projected areas; 3) single regressions of lime mass based on its actual volume and on volumes calculated assuming ellipsoid and prolate spheroid shapes. All properties considered in the current study were found to be statistically significant (p < 0.01). The results indicated that mass modeling of lime based on minor diameter and first projected area are the most appropriate models in the first and the second classifications, respectively. In the third classification, the best model was obtained on the basis of the prolate spheroid volume. It was finally concluded that a suitable grading system for lime mass is one based on prolate spheroid volume.
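A small sketch of the third-group type of model, assuming the prolate spheroid volume is computed from the major and minor fruit diameters and then related to mass by a simple linear regression. The linear form and all names are illustrative assumptions; the published coefficients are not reproduced here:

```python
import numpy as np

def prolate_spheroid_volume(major_d, minor_d):
    """Volume of a prolate spheroid from its major and minor diameters:
    V = (pi/6) * major_d * minor_d**2."""
    return np.pi / 6.0 * major_d * minor_d**2

def fit_mass_model(volumes, masses):
    """Fit mass = k0 + k1 * volume by least squares (assumed linear form)."""
    A = np.column_stack([np.ones_like(volumes), volumes])
    (k0, k1), *_ = np.linalg.lstsq(A, masses, rcond=None)
    return k0, k1
```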
Quantifying prosthetic gait deviation using simple outcome measures
Kark, Lauren; Odell, Ross; McIntosh, Andrew S; Simmons, Anne
2016-01-01
AIM: To develop a subset of simple outcome measures to quantify prosthetic gait deviation without needing three-dimensional gait analysis (3DGA). METHODS: Eight unilateral, transfemoral amputees and 12 unilateral, transtibial amputees were recruited. Twenty-eight able-bodied controls were recruited. All participants underwent 3DGA, the timed-up-and-go test and the six-minute walk test (6MWT). The lower-limb amputees also completed the Prosthesis Evaluation Questionnaire. Results from 3DGA were summarised using the gait deviation index (GDI), which was subsequently regressed, using stepwise regression, against the other measures. RESULTS: Step length (SL), self-selected walking speed (SSWS) and the distance walked during the 6MWT (6MWD) were significantly correlated with the GDI. The 6MWD was the strongest single predictor of the GDI, followed by SL and SSWS. The predictive ability of the regression equations was improved following inclusion of self-report data related to mobility and prosthetic utility. CONCLUSION: This study offers a practicable alternative for quantifying kinematic deviation without the need to conduct complete 3DGA. PMID:27335814
TG study of the Li0.4Fe2.4Zn0.2O4 ferrite synthesis
NASA Astrophysics Data System (ADS)
Lysenko, E. N.; Nikolaev, E. V.; Surzhikov, A. P.
2016-02-01
In this paper, a kinetic analysis of Li-Zn ferrite synthesis was carried out using the thermogravimetric (TG) method through the simultaneous application of non-linear regression to several measurements run at different heating rates (multivariate non-linear regression). Using TG curves obtained at four heating rates and the Netzsch Thermokinetics software package, kinetic models with minimal adjustable parameters were selected to quantitatively describe the reaction of Li-Zn ferrite synthesis. It was shown that the experimental TG curves clearly suggest a two-step process for the ferrite synthesis, and therefore a model-fitting kinetic analysis based on multivariate non-linear regression was conducted. The complex reaction was described by a two-step scheme consisting of sequential reaction steps. It is established that the best results were obtained using the Jander three-dimensional diffusion model at the first stage and the Ginstling-Brounshtein model at the second step. The kinetic parameters for the lithium-zinc ferrite synthesis reaction were found and discussed.
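For reference, the integral model functions g(α) conventionally associated with these two three-dimensional diffusion models in solid-state kinetics are shown below, together with a crude single-segment fit. This is only a sketch of the model forms; the study itself used multivariate non-linear regression across four heating rates in Netzsch Thermokinetics, and the variable names here are assumptions:

```python
import numpy as np

def g_jander(alpha):
    """Jander three-dimensional diffusion model, integral form g(alpha)."""
    return (1.0 - (1.0 - alpha)**(1.0 / 3.0))**2

def g_ginstling_brounshtein(alpha):
    """Ginstling-Brounshtein three-dimensional diffusion model, integral form."""
    return 1.0 - 2.0 * alpha / 3.0 - (1.0 - alpha)**(2.0 / 3.0)

# For an isothermal segment g(alpha) = k * t, so a crude single-step check is a
# straight-line fit of g(alpha) against time t (alpha_meas and t are measured data):
# k = np.polyfit(t, g_jander(alpha_meas), 1)[0]
```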
A baseline-free procedure for transformation models under interval censorship.
Gu, Ming Gao; Sun, Liuquan; Zuo, Guoxin
2005-12-01
An important property of the Cox regression model is that the estimation of regression parameters using the partial likelihood procedure does not depend on its baseline survival function. We call such a procedure baseline-free. Using marginal likelihood, we show that a baseline-free procedure can be derived for a class of general transformation models under the interval censoring framework. The baseline-free procedure results in a simplified and stable computation algorithm for some complicated and important semiparametric models, such as frailty models and heteroscedastic hazard/rank regression models, where the estimation procedures available so far involve estimation of the infinite-dimensional baseline function. A detailed computational algorithm using Markov chain Monte Carlo stochastic approximation is presented. The proposed procedure is demonstrated through extensive simulation studies, showing the validity of asymptotic consistency and normality. We also illustrate the procedure with a real data set from a study of breast cancer. A heuristic argument showing that the score function is a mean zero martingale is provided.
NASA Astrophysics Data System (ADS)
Manstetten, Paul; Filipovic, Lado; Hössinger, Andreas; Weinbub, Josef; Selberherr, Siegfried
2017-02-01
We present a computationally efficient framework to compute the neutral flux in high aspect ratio structures during three-dimensional plasma etching simulations. The framework is based on a one-dimensional radiosity approach and is applicable to simulations of convex rotationally symmetric holes and convex symmetric trenches with a constant cross-section. The framework is intended to replace the full three-dimensional simulation step required to calculate the neutral flux during plasma etching simulations. Especially for high aspect ratio structures, the computational effort, required to perform the full three-dimensional simulation of the neutral flux at the desired spatial resolution, conflicts with practical simulation time constraints. Our results are in agreement with those obtained by three-dimensional Monte Carlo based ray tracing simulations for various aspect ratios and convex geometries. With this framework we present a comprehensive analysis of the influence of the geometrical properties of high aspect ratio structures as well as of the particle sticking probability on the neutral particle flux.
Fickler, Robert; Lapkiewicz, Radek; Huber, Marcus; Lavery, Martin P J; Padgett, Miles J; Zeilinger, Anton
2014-07-30
Photonics has become a mature field of quantum information science, where integrated optical circuits offer a way to scale the complexity of the set-up as well as the dimensionality of the quantum state. On photonic chips, paths are the natural way to encode information. To distribute those high-dimensional quantum states over large distances, transverse spatial modes, like orbital angular momentum possessing Laguerre Gauss modes, are favourable as flying information carriers. Here we demonstrate a quantum interface between these two vibrant photonic fields. We create three-dimensional path entanglement between two photons in a nonlinear crystal and use a mode sorter as the quantum interface to transfer the entanglement to the orbital angular momentum degree of freedom. Thus our results show a flexible way to create high-dimensional spatial mode entanglement. Moreover, they pave the way to implement broad complex quantum networks where high-dimensionally entangled states could be distributed over distant photonic chips.
Predictive modelling of flow in a two-dimensional intermediate-scale, heterogeneous porous media
Barth, Gilbert R.; Hill, M.C.; Illangasekare, T.H.; Rajaram, H.
2000-01-01
To better understand the role of sedimentary structures in flow through porous media, and to determine how small-scale laboratory-measured values of hydraulic conductivity relate to in situ values this work deterministically examines flow through simple, artificial structures constructed for a series of intermediate-scale (10 m long), two-dimensional, heterogeneous, laboratory experiments. Nonlinear regression was used to determine optimal values of in situ hydraulic conductivity, which were compared to laboratory-measured values. Despite explicit numerical representation of the heterogeneity, the optimized values were generally greater than the laboratory-measured values. Discrepancies between measured and optimal values varied depending on the sand sieve size, but their contribution to error in the predicted flow was fairly consistent for all sands. Results indicate that, even under these controlled circumstances, laboratory-measured values of hydraulic conductivity need to be applied to models cautiously.
DOE Office of Scientific and Technical Information (OSTI.GOV)
Jan, Nuzhat; Balik, Salim; Hugo, Geoffrey D.
Purpose: To analyze primary tumor (PT) and lymph node (LN) position changes relative to each other and relative to anatomic landmarks during conventionally fractionated radiation therapy for patients with locally advanced lung cancer. Methods and Materials: In 12 patients with locally advanced non-small cell lung cancer, PT, LN, carina, and 1 thoracic vertebra were manually contoured on weekly 4-dimensional fan-beam CT scans. Systematic and random interfraction displacements of all contoured structures were identified in the 3 cardinal directions, and resulting setup margins were calculated. Time trends and the effect of volume changes on displacements were analyzed. Results: Three-dimensional displacement vectors and systematic/random interfraction displacements were smaller for carina than for vertebra both for PT and LN. For PT, mean (SD) 3-dimensional displacement vectors with carina-based alignment were 7 (4) mm versus 9 (5) mm with bony anatomy (P<.0001). For LN, smaller displacements were found with carina- (5 [3] mm, P<.0001) and vertebra-based (6 [3] mm, P=.002) alignment compared with using PT for setup (8 [5] mm). Primary tumor and LN displacements relative to bone and carina were independent (P>.05). Displacements between PT and bone (P=.04) and between PT and LN (P=.01) were significantly correlated with PT volume regression. Displacements between LN and carina were correlated with LN volume change (P=.03). Conclusions: Carina-based setup results in a more reproducible PT and LN alignment than bony anatomy setup. Considering the independence of PT and LN displacement and the impact of volume regression on displacements over time, repeated CT imaging even with PT-based alignment is recommended in locally advanced disease.
Maxwell, James L; Rose, Chris R; Black, Marcie R; Springer, Robert W
2014-03-11
Microelectronic structures and devices, and a method of fabricating a three-dimensional microelectronic structure, are provided, comprising: passing a first precursor material for a selected three-dimensional microelectronic structure into a reaction chamber at temperatures sufficient to maintain said precursor material in a predominantly gaseous state; maintaining said reaction chamber under sufficient pressures to enhance formation of a first portion of said three-dimensional microelectronic structure; applying an electric field between an electrode and said microelectronic structure at a desired point under conditions whereat said first portion of a selected three-dimensional microelectronic structure is formed from said first precursor material; positionally adjusting either said formed three-dimensional microelectronic structure or said electrode whereby further controlled growth of said three-dimensional microelectronic structure occurs; passing a second precursor material for a selected three-dimensional microelectronic structure into a reaction chamber at temperatures sufficient to maintain said precursor material in a predominantly gaseous state; maintaining said reaction chamber under sufficient pressures whereby a second portion of said three-dimensional microelectronic structure formation is enhanced; applying an electric field between an electrode and said microelectronic structure at a desired point under conditions whereat said second portion of a selected three-dimensional microelectronic structure is formed from said second precursor material; and, positionally adjusting either said formed three-dimensional microelectronic structure or said electrode whereby further controlled growth of said three-dimensional microelectronic structure occurs.
Knopman, Debra S.; Voss, Clifford I.
1987-01-01
The spatial and temporal variability of sensitivities has a significant impact on parameter estimation and sampling design for studies of solute transport in porous media. Physical insight into the behavior of sensitivities is offered through an analysis of analytically derived sensitivities for the one-dimensional form of the advection-dispersion equation. When parameters are estimated in regression models of one-dimensional transport, the spatial and temporal variability in sensitivities influences variance and covariance of parameter estimates. Several principles account for the observed influence of sensitivities on parameter uncertainty. (1) Information about a physical parameter may be most accurately gained at points in space and time with a high sensitivity to the parameter. (2) As the distance of observation points from the upstream boundary increases, maximum sensitivity to velocity during passage of the solute front increases and the consequent estimate of velocity tends to have lower variance. (3) The frequency of sampling must be “in phase” with the S shape of the dispersion sensitivity curve to yield the most information on dispersion. (4) The sensitivity to the dispersion coefficient is usually at least an order of magnitude less than the sensitivity to velocity. (5) The assumed probability distribution of random error in observations of solute concentration determines the form of the sensitivities. (6) If variance in random error in observations is large, trends in sensitivities of observation points may be obscured by noise and thus have limited value in predicting variance in parameter estimates among designs. (7) Designs that minimize the variance of one parameter may not necessarily minimize the variance of other parameters. (8) The time and space interval over which an observation point is sensitive to a given parameter depends on the actual values of the parameters in the underlying physical system.
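The abstract does not reproduce the analytical solution it analyzes; a commonly used closed form for the one-dimensional advection-dispersion equation with a constant-concentration inlet is the Ogata-Banks expression, and the behavior of sensitivities to velocity v and dispersion coefficient D can be illustrated by differencing that solution. The sketch below is an illustration under those assumptions, not the authors' analytically derived sensitivities:

```python
import numpy as np
from scipy.special import erfc

def ogata_banks(x, t, v, D, c0=1.0):
    """Analytical solution of the 1-D advection-dispersion equation for a
    constant-concentration inlet (Ogata-Banks form)."""
    a = (x - v * t) / (2.0 * np.sqrt(D * t))
    b = (x + v * t) / (2.0 * np.sqrt(D * t))
    return 0.5 * c0 * (erfc(a) + np.exp(v * x / D) * erfc(b))

def sensitivity(x, t, v, D, param="v", rel_step=1e-4):
    """Central finite-difference approximation of dC/dv or dC/dD."""
    if param == "v":
        h = rel_step * v
        return (ogata_banks(x, t, v + h, D) - ogata_banks(x, t, v - h, D)) / (2 * h)
    h = rel_step * D
    return (ogata_banks(x, t, v, D + h) - ogata_banks(x, t, v, D - h)) / (2 * h)

# Example (hypothetical values): sensitivity of concentration at x = 5 to velocity over time
# s_v = sensitivity(x=5.0, t=np.linspace(0.1, 20, 200), v=1.0, D=0.1, param="v")
```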
Waiwijit, Uraiwan; Maturos, Thitima; Pakapongpan, Saithip; Phokharatkul, Ditsayut; Wisitsoraat, Anurat; Tuantranont, Adisorn
2016-08-01
Recently, three-dimensional interconnected graphene networks have attracted great interest as scaffold structures for tissue engineering due to their high biocompatibility, high electrical conductivity, high specific surface area and high porosity. However, free-standing three-dimensional graphene exhibits poor flexibility and stability due to ease of disintegration during processing. In this work, three-dimensional graphene is composited with polydimethylsiloxane to improve the structural flexibility and stability by a new, simple two-step process comprising dip coating of polydimethylsiloxane on chemical-vapor-deposited graphene/Ni foam and wet etching of the nickel foam. Structural characterizations confirmed an interconnected three-dimensional multi-layer graphene structure with a thin polydimethylsiloxane scaffold. The composite was employed as a substrate for culture of L929 fibroblast cells and its cytocompatibility was evaluated by cell viability (Alamar blue assay), reactive oxygen species (ROS) production and vinculin immunofluorescence imaging. The results revealed that cell viability on the three-dimensional graphene/polydimethylsiloxane composite increased with increasing culture time and was only slightly different from that on a polystyrene substrate (control). Moreover, cells cultured on the three-dimensional graphene/polydimethylsiloxane composite generated less ROS than the control at culture times of 3-6 h. The results of immunofluorescence staining demonstrated that fibroblast cells expressed the adhesion protein vinculin and adhered well on the three-dimensional graphene/polydimethylsiloxane surface. Good cell adhesion could be attributed to suitable surface properties of three-dimensional graphene/polydimethylsiloxane, with moderate contact angle and small negative zeta potential in culture solution. The results of an electrochemical study by cyclic voltammetry showed that an oxidation current signal with no apparent peak was induced by fibroblast cells and that the oxidation current at an oxidation potential of +0.9 V increased linearly with increasing cell number. Therefore, the three-dimensional graphene/polydimethylsiloxane composite exhibits high cytocompatibility and can potentially be used as a conductive substrate for cell-based electrochemical sensing. © The Author(s) 2016.
Combustion of solid fuel slabs with gaseous oxygen in a hybrid motor analog
NASA Technical Reports Server (NTRS)
Chiaverini, Martin J.; Harting, George C.; Lu, Yeu-Cherng; Kuo, Kenneth K.; Serin, Nadir; Johnson, David K.
1995-01-01
Using a high-pressure, two-dimensional hybrid motor, an experimental investigation was conducted on fundamental processes involved in hybrid rocket combustion. HTPB (Hydroxyl-terminated Polybutadiene) fuel cross-linked with diisocyanate was burned with gaseous oxygen (GOX) under various operating conditions. Large-amplitude pressure oscillations were encountered in earlier test runs. After identifying the source of instability and decoupling the GOX feed-line system and combustion chamber, the pressure oscillations were drastically reduced from plus or minus 20% of the localized mean pressure to an acceptable range of plus or minus 1.5%. Embedded fine-wire thermocouples indicated that the surface temperature of the burning fuel was around 1000 K depending upon axial locations and operating conditions. Also, except near the leading edge region, the subsurface thermal wave profiles in the upstream locations are thicker than those in the downstream locations since the solid-fuel regression rate, in general, increases with distance along the fuel slab. The recovered solid fuel slabs in the laminar portion of the boundary layer exhibited smooth surfaces, indicating the existence of a liquid melt layer on the burning fuel surface in the upstream region. After the transition section, which displayed distinct transverse striations, the surface roughness pattern became quite random and very pronounced in the downstream turbulent boundary-layer region. Both real-time X-ray radiography and ultrasonic pulse echo techniques were used to determine the instantaneous web thicknesses and instantaneous solid-fuel regression rates over certain portions of the fuel slabs. Globally averaged and axially dependent but time-averaged regression rates were also obtained and presented. Several tests were conducted using, simultaneously, one translucent fuel slab and one fuel slab processed with carbon black powder. The addition of carbon black did not affect the measured regression rates or surface temperatures in comparison to the translucent fuel slabs.
Biodynamic profiling of three-dimensional tissue growth techniques
NASA Astrophysics Data System (ADS)
Sun, Hao; Merrill, Dan; Turek, John; Nolte, David
2016-03-01
Three-dimensional tissue culture presents a more biologically relevant environment in which to perform drug development than conventional two-dimensional cell culture. However, obtaining high-content information from inside three dimensional tissue has presented an obstacle to rapid adoption of 3D tissue culture for pharmaceutical applications. Biodynamic imaging is a high-content three-dimensional optical imaging technology based on low-coherence interferometry and digital holography that uses intracellular dynamics as high-content image contrast. In this paper, we use biodynamic imaging to compare pharmaceutical responses to Taxol of three-dimensional multicellular spheroids grown by three different growth techniques: rotating bioreactor, hanging-drop and plate-grown spheroids. The three growth techniques have systematic variations among tissue cohesiveness and intracellular activity and consequently display different pharmacodynamics under identical drug dose conditions. The in vitro tissue cultures are also compared to ex vivo living biopsies. These results demonstrate that three-dimensional tissue cultures are not equivalent, and that drug-response studies must take into account the growth method.
Wagner, Brian J.; Gorelick, Steven M.
1986-01-01
A simulation nonlinear multiple-regression methodology for estimating parameters that characterize the transport of contaminants is developed and demonstrated. Finite difference contaminant transport simulation is combined with a nonlinear weighted least squares multiple-regression procedure. The technique provides optimal parameter estimates and gives statistics for assessing the reliability of these estimates under certain general assumptions about the distributions of the random measurement errors. Monte Carlo analysis is used to estimate parameter reliability for a hypothetical homogeneous soil column for which concentration data contain large random measurement errors. The value of data collected spatially versus data collected temporally was investigated for estimation of velocity, dispersion coefficient, effective porosity, first-order decay rate, and zero-order production. The use of spatial data gave estimates that were 2–3 times more reliable than estimates based on temporal data for all parameters except velocity. Comparison of estimated linear and nonlinear confidence intervals based upon Monte Carlo analysis showed that the linear approximation is poor for dispersion coefficient and zero-order production coefficient when data are collected over time. In addition, examples demonstrate transport parameter estimation for two real one-dimensional systems. First, the longitudinal dispersivity and effective porosity of an unsaturated soil are estimated using laboratory column data. We compare the reliability of estimates based upon data from individual laboratory experiments versus estimates based upon pooled data from several experiments. Second, the simulation nonlinear regression procedure is extended to include an additional governing equation that describes delayed storage during contaminant transport. The model is applied to analyze the trends, variability, and interrelationship of parameters in a mountain stream in northern California.
Cognitive flexibility correlates with gambling severity in young adults.
Leppink, Eric W; Redden, Sarah A; Chamberlain, Samuel R; Grant, Jon E
2016-10-01
Although gambling disorder (GD) is often characterized as a problem of impulsivity, compulsivity has recently been proposed as a potentially important feature of addictive disorders. The present analysis assessed the neurocognitive and clinical relationship between compulsivity and gambling behavior. A sample of 552 non-treatment-seeking gamblers aged 18-29 was recruited from the community for a study on gambling in young adults. Gambling severity levels included both casual and disordered gamblers. All participants completed the Intra/Extra-Dimensional Set Shift (IED) task, from which the total adjusted errors were correlated with gambling severity measures, and linear regression modeling was used to assess three error measures from the task. The present analysis found significant positive correlations between problems with cognitive flexibility and gambling severity (reflected by the number of DSM-5 criteria, gambling frequency, amount of money lost in the past year, and gambling urge/behavior severity). IED errors also showed a positive correlation with self-reported compulsive behavior scores. A significant correlation was also found between IED errors and non-planning impulsivity from the BIS. Linear regression models based on total IED errors, extra-dimensional (ED) shift errors, or pre-ED shift errors indicated that these factors accounted for a significant portion of the variance noted in several variables. These findings suggest that cognitive flexibility may be an important consideration in the assessment of gamblers. Results from correlational and linear regression analyses support this possibility, but the exact contributions of impulsivity and cognitive flexibility remain entangled. Future studies will ideally be able to assess the longitudinal relationships between gambling, compulsivity, and impulsivity, helping to clarify the relative contributions of both impulsive and compulsive features. Copyright © 2016 Elsevier Ltd. All rights reserved.
Modeling the Chagas’ disease after stem cell transplantation
NASA Astrophysics Data System (ADS)
Galvão, Viviane; Miranda, José Garcia Vivas
2009-04-01
A recent model for Chagas' disease after stem cell transplantation is extended to a three-dimensional multi-agent-based model. The computational model includes six different types of autonomous agents: inflammatory cell, fibrosis, cardiomyocyte, the proinflammatory cytokine tumor necrosis factor-α, Trypanosoma cruzi, and bone marrow stem cell. Only fibrosis is fixed; the other types of agents can move randomly through the empty spaces using the three-dimensional Moore neighborhood. Bone marrow stem cells can promote apoptosis in inflammatory cells and fibrosis regression, and can differentiate into cardiomyocytes. T. cruzi can increase the number of inflammatory cells. Inflammatory cells and tumor necrosis factor-α can increase the quantity of fibrosis. Our results were compared with experimental data, giving a fairly good fit, and they suggest that the inflammatory cells are important for the development of fibrosis.
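The three-dimensional Moore neighborhood used for agent movement is the set of 26 lattice cells that share a face, edge, or corner with a given cell. A minimal sketch of enumerating it and of a random move to an empty neighboring site follows; the grid representation, move rule, and names are illustrative assumptions rather than the published model code:

```python
import itertools
import random

def moore_neighbors_3d(x, y, z, shape):
    """The 26 cells of the 3-D Moore neighborhood of (x, y, z), clipped to the grid bounds."""
    nx, ny, nz = shape
    out = []
    for dx, dy, dz in itertools.product((-1, 0, 1), repeat=3):
        if (dx, dy, dz) == (0, 0, 0):
            continue
        i, j, k = x + dx, y + dy, z + dz
        if 0 <= i < nx and 0 <= j < ny and 0 <= k < nz:
            out.append((i, j, k))
    return out

def random_move(agent_pos, occupied, shape):
    """Move an agent to a randomly chosen empty Moore neighbor, if any (hypothetical rule)."""
    empty = [c for c in moore_neighbors_3d(*agent_pos, shape) if c not in occupied]
    return random.choice(empty) if empty else agent_pos
```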
NASA Astrophysics Data System (ADS)
Bo, Z.; Chen, J. H.
2010-02-01
The dimensional analysis technique is used to formulate a correlation between the ozone generation rate and various parameters that are important in the design and operation of positive wire-to-plate corona discharges in indoor air. The dimensionless relation is determined by linear regression analysis based on the results of 36 laboratory-scale experiments. The derived equation is validated against experimental data and a numerical model published in the literature. Applications of the derived equation are illustrated through an example: selecting an appropriate set of operating conditions in the design and operation of a photocopier to comply with federal regulations on ozone emission. Finally, a new current-voltage characteristic equation is proposed for positive wire-to-plate corona discharges based on the derived dimensionless equation.
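The abstract does not list the dimensionless groups. Assuming the usual product-of-power-laws form for such correlations, Π_target = C · Π_1^a1 · Π_2^a2 · ..., the constant and exponents can be obtained by linear regression in log space, as in the hypothetical sketch below:

```python
import numpy as np

def fit_dimensionless_correlation(pi_matrix, pi_target):
    """Fit pi_target = C * prod(pi_i ** a_i) by linear regression in log space.
    pi_matrix: (n_runs, n_groups) array of dimensionless groups (hypothetical names)."""
    A = np.column_stack([np.ones(len(pi_target)), np.log(pi_matrix)])
    coef, *_ = np.linalg.lstsq(A, np.log(pi_target), rcond=None)
    return np.exp(coef[0]), coef[1:]          # (C, exponents)

# With 36 experiments: pi_matrix has shape (36, n_groups), pi_target shape (36,).
```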
Efficient High Order Central Schemes for Multi-Dimensional Hamilton-Jacobi Equations: Talk Slides
NASA Technical Reports Server (NTRS)
Bryson, Steve; Levy, Doron; Biegel, Brian R. (Technical Monitor)
2002-01-01
This viewgraph presentation presents information on the attempt to produce high-order, efficient, central methods that scale well to high dimension. The central philosophy is that the equations should evolve to the point where the data is smooth. This is accomplished by a cyclic pattern of reconstruction, evolution, and re-projection. One-dimensional and two-dimensional representational methods are detailed as well.
Cultured High-Fidelity Three-Dimensional Human Urogenital Tract Carcinomas and Process
NASA Technical Reports Server (NTRS)
Goodwin, Thomas J. (Inventor); Prewett, Tacey L. (Inventor); Spaulding, Glenn F. (Inventor); Wolf, David A. (Inventor)
1998-01-01
Artificial high-fidelity three-dimensional human urogenital tract carcinomas are propagated under in vitro-microgravity conditions from carcinoma cells. Artificial high-fidelity three-dimensional human urogenital tract carcinomas are also propagated from a coculture of normal urogenital tract cells inoculated with carcinoma cells. The microgravity culture conditions may be microgravity or simulated microgravity created in a horizontal rotating wall culture vessel.
NASA Astrophysics Data System (ADS)
Zhang, Shuai; Hu, Fan; Wang, Donghui; Okolo, Patrick N.; Zhang, Weihua
2017-07-01
Numerical simulations of processes within hybrid rocket motors have been conducted in the past, with most of these simulations focusing on steady-state analysis. The solid fuel regression rate strongly depends on complicated physicochemical processes and on internal fluid dynamic behavior within the rocket motor, which change with both space and time during operation and are therefore more unsteady in character. Numerical simulations of the unsteady operational processes of an N2O/HTPB hybrid rocket motor with and without a diaphragm are conducted in this paper. A numerical model is established based on the two-dimensional axisymmetric unsteady Navier-Stokes equations with turbulence, combustion and coupled gas/solid phase formulations. A discrete phase model is used to simulate injection and vaporization of the liquid oxidizer. A dynamic mesh technique is applied to the non-uniform regression of the fuel grain, and results for the unsteady flow field, the variation of the regression rate distribution with time, the regression process of the burning surface and the internal ballistics are all obtained. Due to the presence of eddy flow, the diaphragm increases the regression rate further downstream. Peak regression rates are observed close to flow reattachment regions, while these peak values decrease gradually and their positions shift further downstream as time advances. Motor performance is analyzed accordingly, and it is found that the case with the diaphragm yields increases in combustion efficiency and specific impulse efficiency of roughly 10%, and an increase in ground thrust of 17.8%.
Supervised Classification Techniques for Hyperspectral Data
NASA Technical Reports Server (NTRS)
Jimenez, Luis O.
1997-01-01
The recent development of more sophisticated remote sensing systems enables the measurement of radiation in many more spectral intervals than previously possible. An example of this technology is the AVIRIS system, which collects image data in 220 bands. The increased dimensionality of such hyperspectral data provides a challenge to the current techniques for analyzing such data. Human experience in three-dimensional space tends to mislead one's intuition of geometrical and statistical properties in high-dimensional space, properties which must guide our choices in the data analysis process. In this paper high-dimensional space properties are discussed with their implications for high-dimensional data analysis, in order to illuminate the next steps that need to be taken for the next generation of hyperspectral data classifiers.
USDA-ARS?s Scientific Manuscript database
Recent advances in technology have led to the collection of high-dimensional data not previously encountered in many scientific environments. As a result, scientists are often faced with the challenging task of including these high-dimensional data into statistical models. For example, data from sen...
2006-09-01
Medioni [11] estimates the local dimension using tensor voting. These recent works have clearly shown the necessity to go beyond manifold learning, into... 2005. [11] P. Mordohai and G. Medioni. Unsupervised dimensionality estimation and manifold learning in high-dimensional spaces by tensor voting. In... walking, jumping, and arms waving. The whole run took 361 seconds in Matlab, while the classification time (PMM) can be neglected compared to the kNN
Hwang-Gu, Shoou-Lian; Lin, Hsiang-Yuan; Chen, Yu-Chi; Tseng, Yu-Han; Hsu, Wen-Yau; Chou, Miao-Chun; Chou, Wen-Jun; Wu, Yu-Yu; Gau, Susan Shur-Fen
2018-05-30
Increased intrasubject variability in reaction times (RT-ISV) is frequently found in individuals with autism spectrum disorder (ASD). However, how dimensional attention deficit/hyperactivity disorder (ADHD) symptoms impact RT-ISV in individuals with ASD remains elusive. We assessed 97 high-functioning youths with co-occurring ASD and ADHD (ASD+ADHD), 124 high-functioning youths with ASD only, 98 youths with ADHD only, and 249 typically developing youths, 8-18 years of age, using the Conners Continuous Performance Test (CCPT). We compared the conventional CCPT parameters (omission errors, commission errors, mean RT and RT standard error (RTSE) as well as the ex-Gaussian parameters of RT (mu, sigma, and tau) across the four groups. We also conducted regression analyses to assess the relationships between RT indices and symptoms of ADHD and ASD in the ASD group (i.e., the ASD+ADHD and ASD-only groups). The ASD+ADHD and ADHD-only groups had higher RT-ISV than the other two groups. RT-ISV, specifically RTSE and tau, was significantly associated with ADHD symptoms rather than autistic traits in the ASD group. Regression models also revealed that sex partly accounted for RT-ISV variance in the ASD group. A post hoc analysis showed girls with ASD had higher tau and RTSE values than their male counterparts. Our results suggest that RT-ISV is primarily associated with co-occurring ADHD symptoms/diagnosis in children and adolescents with ASD. These results do not support the hypothesis of response variability as a transdiagnostic phenotype for ASD and ADHD and warrant further validation at a neural level.
Wang, Shu; Lang, Jing He; Cheng, Xue Mei
2009-12-01
The aim of this study was to investigate cytologic regression in women with atypical squamous cells of unknown significance and a negative high-risk human papillomavirus test. The 45 women with atypical squamous cells of unknown significance and negative high-risk human papillomavirus at baseline were analyzed for cytologic regression during 2 years of follow-up. The cumulative rate of cytologic regression was calculated by Kaplan-Meier curves. Of the 45 women, the cumulative rates were as follows: 55.6% obtained cytologic regression before 6 months, 84.4% by 1 year, and 95.6% at 2 years. Cytologic regression was not influenced by age, menopausal status, or baseline human papillomavirus load. However, the 1-year cumulative regression rate in women with previous cervical lesions was significantly lower than in those without (P=.02), and even lower in women with high-grade intraepithelial neoplasia or worse (P=.008). Most women with atypical squamous cells of unknown significance and negative high-risk human papillomavirus could obtain cytologic regression within 2 years. Women with antecedent cervical lesions need a longer time to reach this regression.
NASA Technical Reports Server (NTRS)
Watanabe, M.; Actor, G.; Gatos, H. C.
1977-01-01
Quantitative analysis of the electron beam induced current, in conjunction with high-resolution scanning, makes it possible to evaluate the minority-carrier lifetime three-dimensionally in the bulk and the surface recombination velocity two-dimensionally, with high spatial resolution. The analysis is based on the concept of the effective excitation strength of the carriers, which takes into consideration all possible recombination sources. Two-dimensional mapping of the surface recombination velocity of phosphorus-diffused silicon diodes is presented, as well as three-dimensional mapping of the changes in the minority-carrier lifetime in ion-implanted silicon.
Reduced basis ANOVA methods for partial differential equations with high-dimensional random inputs
DOE Office of Scientific and Technical Information (OSTI.GOV)
Liao, Qifeng, E-mail: liaoqf@shanghaitech.edu.cn; Lin, Guang, E-mail: guanglin@purdue.edu
2016-07-15
In this paper we present a reduced basis ANOVA approach for partial differential equations (PDEs) with random inputs. The ANOVA method combined with stochastic collocation methods provides model reduction in high-dimensional parameter space through decomposing high-dimensional inputs into unions of low-dimensional inputs. In this work, to further reduce the computational cost, we investigate spatial low-rank structures in the ANOVA-collocation method, and develop efficient spatial model reduction techniques using hierarchically generated reduced bases. We present a general mathematical framework of the methodology, validate its accuracy and demonstrate its efficiency with numerical experiments.
Wang, Zhiping; Chen, Jinyu; Yu, Benli
2017-02-20
We investigate the two-dimensional (2D) and three-dimensional (3D) atom localization behaviors via spontaneously generated coherence in a microwave-driven four-level atomic system. Owing to the space-dependent atom-field interaction, it is found that the detecting probability and precision of 2D and 3D atom localization behaviors can be significantly improved via adjusting the system parameters, the phase, amplitude, and initial population distribution. Interestingly, the atom can be localized in volumes that are substantially smaller than a cubic optical wavelength. Our scheme opens a promising way to achieve high-precision and high-efficiency atom localization, which provides some potential applications in high-dimensional atom nanolithography.
Ruiz-Juretschke, F; Guzmán-de-Villoria, J G; García-Leal, R; Sañudo, J R
2017-05-23
Microvascular decompression (MVD) is accepted as the only aetiological surgical treatment for refractory classic trigeminal neuralgia (TN). There is therefore increasing interest in establishing the diagnostic and prognostic value of identifying neurovascular compressions (NVC) using preoperative high-resolution three-dimensional magnetic resonance imaging (MRI) in patients with classic TN who are candidates for surgery. This observational study includes a series of 74 consecutive patients with classic TN treated with MVD. All patients underwent preoperative three-dimensional high-resolution MRI with DRIVE sequences to diagnose the presence of NVC, as well as the degree, cause, and location of the compressions. MRI results were analysed by doctors blinded to the surgical findings and subsequently compared to those findings. After a minimum follow-up of six months, we assessed the surgical outcome and graded it on the Barrow Neurological Institute pain intensity score (BNI score). The prognostic value of the preoperative MRI was estimated using binary logistic regression. Preoperative DRIVE MRI sequences showed a sensitivity of 95% and a specificity of 87%, with a 98% positive predictive value and a 70% negative predictive value. Moreover, Cohen's kappa (CK) indicated a good level of agreement between radiological and surgical findings regarding the presence of NVC (CK 0.75), the type of compression (CK 0.74) and the site of compression (CK 0.72), with only moderate agreement as to the degree of compression (CK 0.48). After a mean follow-up of 29 months (range 6-100 months), 81% of the patients reported pain control with or without medication (BNI scores I-III). Patients with an excellent surgical outcome, i.e. pain-free and off medication (BNI score I), made up 66% of the total at the end of follow-up. Univariate analysis using binary logistic regression showed that a diagnosis of NVC on the preoperative MRI was a favorable prognostic factor that significantly increased the odds of obtaining an excellent outcome (OR 0.17, 95% CI 0.04-0.72; P=.02) or an acceptable outcome (OR 0.16, 95% CI 0.04-0.68; P=.01) after MVD. DRIVE MRI shows high sensitivity and specificity for diagnosing NVC in patients with refractory classic TN who are candidates for MVD. The finding of NVC on preoperative MRI is a good prognostic factor for long-term pain relief with MVD. Copyright © 2017 Sociedad Española de Neurología. Published by Elsevier España, S.L.U. All rights reserved.
Wang, Shige; Li, Kai; Chen, Yu; Chen, Hangrong; Ma, Ming; Feng, Jingwei; Zhao, Qinghua; Shi, Jianlin
2015-01-01
Two-dimensional transition metal dichalcogenides, particularly MoS2 nanosheets, have been deemed as a novel category of NIR photothermal transducing agent. Herein, an efficient and versatile one-pot solvothermal synthesis based on "bottom-up" strategy has been, for the first time, proposed for the controlled synthesis of PEGylated MoS2 nanosheets by using a novel "integrated" precursor containing both Mo and S elements. This facile but unique PEG-mediated solvothermal procedure endowed MoS2 nanosheets with controlled size, increased crystallinity and excellent colloidal stability. The photothermal performance of nanosheets was optimized via modulating the particulate size and surface PEGylation. PEGylated MoS2 nanosheets with desired photothermal conversion performance and excellent colloidal and photothermal stability were further utilized for highly efficient photothermal therapy of cancer in a tumor-bearing mouse xenograft. Without showing observable in vitro and in vivo hemolysis, coagulation and toxicity, the optimized MoS2-PEG nanosheets showed promising in vitro and in vivo anti-cancer efficacy. Copyright © 2014 Elsevier Ltd. All rights reserved.
Joyce, Christopher; Burnett, Angus; Cochrane, Jodie; Reyes, Alvaro
2016-01-01
It is unknown whether skilled golfers will modify their kinematics when using drivers of different shaft properties. This study aimed to firstly determine if golf swing kinematics and swing parameters and related launch conditions differed when using modified drivers, then secondly, determine which kinematics were associated with clubhead speed. Twenty high level amateur male golfers (M ± SD: handicap = 1.9 ± 1.9 score) had their three-dimensional (3D) trunk and wrist kinematics collected for two driver trials. Swing parameters and related launch conditions were collected using a launch monitor. A one-way repeated measures ANOVA revealed significant (p ≤ 0.003) between driver differences; specifically, faster trunk axial rotation velocity and an early wrist release for the low kick point driver. Launch angle was shown to be 2° lower for the high kick point driver. Regression models for both drivers explained a significant amount of variance (60-67%) in clubhead speed. Wrist kinematics were most associated with clubhead speed, indicating the importance of the wrists in producing clubhead speed regardless of driver shaft properties.
Can We Train Machine Learning Methods to Outperform the High-dimensional Propensity Score Algorithm?
Karim, Mohammad Ehsanul; Pang, Menglan; Platt, Robert W
2018-03-01
The use of retrospective health care claims datasets is frequently criticized for the lack of complete information on potential confounders. Utilizing patients' health status-related information from claims datasets as surrogates or proxies for mismeasured and unobserved confounders, the high-dimensional propensity score algorithm enables us to reduce bias. Using a previously published cohort study of postmyocardial infarction statin use (1998-2012), we compare the performance of the algorithm with a number of popular machine learning approaches for confounder selection in high-dimensional covariate spaces: random forest, least absolute shrinkage and selection operator, and elastic net. Our results suggest that, when the data analysis is done with epidemiologic principles in mind, machine learning methods perform as well as the high-dimensional propensity score algorithm. Using a plasmode framework that mimicked the empirical data, we also showed that a hybrid of machine learning and high-dimensional propensity score algorithms generally performs slightly better than either alone in terms of mean squared error, when a bias-based analysis is used.
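A minimal sketch of the machine-learning side of such a comparison, using scikit-learn's penalized logistic regression for lasso and elastic-net exposure models over a high-dimensional claims covariate matrix. The hyperparameters, variable names, and the selection rule are placeholders for illustration, not the published analysis:

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

def fit_exposure_models(X, exposure):
    """Penalized exposure (propensity) models over a high-dimensional covariate matrix X.
    C and l1_ratio are placeholder values, not tuned settings from the study."""
    lasso = LogisticRegression(penalty="l1", solver="saga", C=0.1, max_iter=5000)
    enet = LogisticRegression(penalty="elasticnet", solver="saga",
                              l1_ratio=0.5, C=0.1, max_iter=5000)
    lasso.fit(X, exposure)
    enet.fit(X, exposure)
    return lasso, enet

# Covariates with nonzero lasso coefficients can then be carried forward as the
# "selected" confounder proxies for the outcome or propensity-score analysis:
# selected = np.flatnonzero(lasso.coef_.ravel())
```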
[Application Progress of Three-dimensional Laser Scanning Technology in Medical Surface Mapping].
Zhang, Yonghong; Hou, He; Han, Yuchuan; Wang, Ning; Zhang, Ying; Zhu, Xianfeng; Wang, Mingshi
2016-04-01
The booming three-dimensional laser scanning technology can efficiently and effectively acquire the spatial three-dimensional coordinates of a detected object's surface and reconstruct the image at high speed, with high precision and a large capacity of information. Being non-radiative and non-contact, and offering visualization, it has become increasingly popular in three-dimensional surface medical mapping. This paper reviews the applications and developments of three-dimensional laser scanning technology in the medical field, especially in stomatology, plastic surgery and orthopedics. Furthermore, the paper also discusses future application prospects as well as the biomedical engineering problems it may encounter.
EM in high-dimensional spaces.
Draper, Bruce A; Elliott, Daniel L; Hayes, Jeremy; Baek, Kyungim
2005-06-01
This paper considers fitting a mixture of Gaussians model to high-dimensional data in scenarios where there are fewer data samples than feature dimensions. Issues that arise when using principal component analysis (PCA) to represent Gaussian distributions inside Expectation-Maximization (EM) are addressed, and a practical algorithm results. Unlike other algorithms that have been proposed, this algorithm does not try to compress the data to fit low-dimensional models. Instead, it models Gaussian distributions in the (N - 1)-dimensional space spanned by the N data samples. We are able to show that this algorithm converges on data sets where low-dimensional techniques do not.
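A minimal sketch of the underlying idea, assuming the data are first projected onto the affine span of the N samples (at most N - 1 dimensions) obtained from an SVD of the centered data, and an ordinary Gaussian mixture is then fitted in that span with scikit-learn. This illustrates the dimensionality argument only and is not the paper's exact EM algorithm:

```python
import numpy as np
from sklearn.mixture import GaussianMixture

def fit_gmm_in_span(X, n_components=2, seed=0):
    """Fit a Gaussian mixture in the (N-1)-dimensional affine span of the N samples in X
    (sketch of the idea; not the published EM variant)."""
    mean = X.mean(axis=0)
    Xc = X - mean
    # Orthonormal basis of the span of the centered samples (rank <= N - 1).
    _, s, Vt = np.linalg.svd(Xc, full_matrices=False)
    basis = Vt[s > 1e-10]                     # (r, d) with r <= N - 1
    Z = Xc @ basis.T                          # coordinates of the samples within the span
    gmm = GaussianMixture(n_components=n_components, covariance_type="full",
                          random_state=seed).fit(Z)
    return gmm, mean, basis                   # project new data with (x - mean) @ basis.T
```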
Katwal, Santosh B; Gore, John C; Marois, Rene; Rogers, Baxter P
2013-09-01
We present novel graph-based visualizations of self-organizing maps for unsupervised functional magnetic resonance imaging (fMRI) analysis. A self-organizing map is an artificial neural network model that transforms high-dimensional data into a low-dimensional (often a 2-D) map using unsupervised learning. However, a postprocessing scheme is necessary to correctly interpret similarity between neighboring node prototypes (feature vectors) on the output map and delineate clusters and features of interest in the data. In this paper, we used graph-based visualizations to capture fMRI data features based upon 1) the distribution of data across the receptive fields of the prototypes (density-based connectivity); and 2) temporal similarities (correlations) between the prototypes (correlation-based connectivity). We applied this approach to identify task-related brain areas in an fMRI reaction time experiment involving a visuo-manual response task, and we correlated the time-to-peak of the fMRI responses in these areas with reaction time. Visualization of self-organizing maps outperformed independent component analysis and voxelwise univariate linear regression analysis in identifying and classifying relevant brain regions. We conclude that the graph-based visualizations of self-organizing maps help in advanced visualization of cluster boundaries in fMRI data enabling the separation of regions with small differences in the timings of their brain responses.
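A minimal sketch of the correlation-based connectivity step, assuming a SOM has already been trained and its node prototypes are available as one time course per node; the array shape and the correlation threshold are assumptions for illustration:

```python
import numpy as np

def correlation_connectivity(prototypes, threshold=0.8):
    """Correlation-based connectivity between SOM node prototypes.
    prototypes: (n_nodes, n_timepoints) array of learned feature vectors.
    Returns a boolean adjacency matrix linking nodes whose prototype time
    courses correlate above `threshold` (the threshold value is an assumption)."""
    r = np.corrcoef(prototypes)
    np.fill_diagonal(r, 0.0)                  # ignore self-connections
    return r > threshold

# Connected components of this graph delineate candidate clusters, e.g. with
# scipy.sparse.csgraph.connected_components on the adjacency matrix.
```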
Rizvi, Sakina J; Quilty, Lena C; Sproule, Beth A; Cyriac, Anna; Michael Bagby, R; Kennedy, Sidney H
2015-09-30
Anhedonia, a core symptom of Major Depressive Disorder (MDD), is predictive of antidepressant non-response. In contrast to the definition of anhedonia as a "loss of pleasure", neuropsychological studies provide evidence for multiple facets of hedonic function. The aim of the current study was to develop and validate the Dimensional Anhedonia Rating Scale (DARS), a dynamic scale that measures desire, motivation, effort and consummatory pleasure across hedonic domains. Following item selection procedures and reliability testing using data from community participants (N=229) (Study 1), the 17-item scale was validated in an online study with community participants (N=150) (Study 2). The DARS was also validated in unipolar or bipolar depressed patients (n=52) and controls (n=50) (Study 3). Principal components analysis of the 17-item DARS revealed a 4-component structure mapping onto the domains of anhedonia: hobbies, food/drink, social activities, and sensory experience. Reliability of the DARS subscales was high across studies (Cronbach's α=0.75-0.92). The DARS also demonstrated good convergent and divergent validity. Hierarchical regression analysis revealed the DARS showed additional utility over the Snaith-Hamilton Pleasure Scale (SHAPS) in predicting reward function and distinguishing MDD subgroups. These studies provide support for the reliability and validity of the DARS. Copyright © 2015 Elsevier Ireland Ltd. All rights reserved.
Function approximation using combined unsupervised and supervised learning.
Andras, Peter
2014-03-01
Function approximation is one of the core tasks solved with neural networks in many engineering problems. However, good approximation results require good sampling of the data space, which usually demands an exponentially increasing volume of data as the dimensionality increases. At the same time, high-dimensional data are often arranged around a much lower-dimensional manifold. Here we propose breaking the function approximation task for high-dimensional data into two steps: (1) mapping the high-dimensional data onto a lower-dimensional space corresponding to the manifold on which the data reside, and (2) approximating the function using the mapped lower-dimensional data. We use over-complete self-organizing maps (SOMs) for the mapping through unsupervised learning, and single-hidden-layer neural networks for the function approximation through supervised learning. We also extend the two-step procedure by considering support vector machines and Bayesian SOMs for determining the best parameters of the nonlinear neurons in the hidden layer of the networks used for the function approximation. We compare the approximation performance of the proposed networks on a set of test functions and show that the networks combining unsupervised and supervised learning in most cases outperform networks that learn the function approximation directly from the original high-dimensional data.
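A hedged sketch of the two-step idea, assuming scikit-learn, with Isomap as a stand-in for the over-complete SOM stage and an MLP as the supervised stage; the manifold data and target function are synthetic.

```python
import numpy as np
from sklearn.manifold import Isomap
from sklearn.neural_network import MLPRegressor
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(1)

# High-dimensional inputs that actually live near a 2-D manifold.
t = rng.uniform(-1, 1, size=(2000, 2))                      # latent manifold coordinates
X = t @ rng.normal(size=(2, 50)) + 0.01 * rng.normal(size=(2000, 50))
y = np.sin(np.pi * t[:, 0]) * t[:, 1]                       # target function of the latent coords

X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)

# Step 1: unsupervised mapping onto a low-dimensional space (Isomap stand-in for the SOM).
mapper = Isomap(n_components=2).fit(X_tr)
Z_tr, Z_te = mapper.transform(X_tr), mapper.transform(X_te)

# Step 2: supervised function approximation on the mapped coordinates.
net = MLPRegressor(hidden_layer_sizes=(50,), max_iter=2000, random_state=0).fit(Z_tr, y_tr)
print("R^2 on held-out data:", net.score(Z_te, y_te))
```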
Kee, Dohyung
2002-01-01
The purpose of this study was to develop a new method for analytically generating a three-dimensional isocomfort workspace for the upper extremities using robot kinematics. Subjective perceived discomfort scores for varying postures while manipulating four types of controls were used. Fifteen healthy male subjects participated in the experiment. The subjects were asked to hold the given control-manipulation postures for 60 s in the seated position and to rate their perceived discomfort during the subsequent 60-s rest using magnitude estimation. Postures of the upper extremities, set by shoulder and elbow motions, types of controls, and left or right hand were selected as experimental variables, arranged in an L32 orthogonal array. The results showed that shoulder flexion and adduction-abduction, elbow flexion, and type of control significantly affected perceived discomfort for postures operating controls, but the hand used did not. Depending on the type of control, four regression models predicting perceived discomfort were presented. Using the models, a sweeping algorithm to generate the three-dimensional isocomfort workspace was developed, in which robot kinematics was employed to describe the translational relationships between the upper arm and the lower arm/hand. The isocomfort workspace is expected to serve as a valuable design guideline for ergonomically designing three-dimensional workplaces.
Kano, Yukiko; Kono, Toshiaki; Matsuda, Natsumi; Nonaka, Maiko; Kuwabara, Hitoshi; Shimada, Takafumi; Shishikura, Kurie; Konno, Chizue; Ohta, Masataka
2015-03-30
This study investigated the relationships between tics, obsessive-compulsive symptoms (OCS), and impulsivity, and their effects on global functioning in Japanese patients with Tourette syndrome (TS), using the dimensional approach for OCS. Fifty-three TS patients were assessed using the Yale Global Tic Severity Scale, the Dimensional Yale-Brown Obsessive-Compulsive Scale, the Impulsivity Rating Scale, and the Global Assessment of Functioning Scale. Although tic severity scores were significantly and positively correlated with OCS severity scores, impulsivity severity scores were not significantly correlated with either. The global functioning score was significantly and negatively correlated with tic and OCS severity scores. Of the 6 dimensional OCS scores, only aggression scores had a significant negative correlation with global functioning scores. A stepwise multiple regression analysis showed that only OCS severity scores were significantly associated with global functioning scores. Despite a moderate correlation between tic severity and OCS severity, the impact of OCS on global functioning was greater than that of tics. Of the OCS dimensions, only aggression had a significant impact on global functioning. Our findings suggest that it is important to examine OCS using a dimensional approach when analyzing global functioning in TS patients. Copyright © 2015 Elsevier Ireland Ltd. All rights reserved.
PREDICTION OF SOLAR FLARE SIZE AND TIME-TO-FLARE USING SUPPORT VECTOR MACHINE REGRESSION
DOE Office of Scientific and Technical Information (OSTI.GOV)
Boucheron, Laura E.; Al-Ghraibah, Amani; McAteer, R. T. James
We study the prediction of solar flare size and time-to-flare using 38 features describing magnetic complexity of the photospheric magnetic field. This work uses support vector regression to formulate a mapping from the 38-dimensional feature space to a continuous-valued label vector representing flare size or time-to-flare. When we consider flaring regions only, we find an average error in estimating flare size of approximately half a geostationary operational environmental satellite (GOES) class. When we additionally consider non-flaring regions, we find an increased average error of approximately three-fourths of a GOES class. We also consider thresholding the regressed flare size for the experiment containing both flaring and non-flaring regions and find a true positive rate of 0.69 and a true negative rate of 0.86 for flare prediction. The results for both of these size regression experiments are consistent across a wide range of predictive time windows, indicating that the magnetic complexity features may be persistent in appearance long before flare activity. This is supported by our larger error rates of some 40 hr in the time-to-flare regression problem. The 38 magnetic complexity features considered here appear to have discriminative potential for flare size, but their persistence in time makes them less discriminative for the time-to-flare problem.
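A minimal sketch of the regression step, assuming scikit-learn; the 38 features and flare-size labels below are synthetic placeholders rather than the paper's magnetic-complexity measurements.

```python
import numpy as np
from sklearn.svm import SVR
from sklearn.preprocessing import StandardScaler
from sklearn.pipeline import make_pipeline
from sklearn.model_selection import cross_val_score

rng = np.random.default_rng(2)
n_regions, n_features = 500, 38            # 38 complexity features per active region
X = rng.normal(size=(n_regions, n_features))
flare_size = X[:, :5].sum(axis=1) + 0.5 * rng.normal(size=n_regions)   # placeholder label

# Scale the features, then regress flare size with an RBF-kernel SVR.
model = make_pipeline(StandardScaler(), SVR(kernel="rbf", C=10.0, epsilon=0.1))
scores = cross_val_score(model, X, flare_size, cv=5, scoring="neg_mean_absolute_error")
print("mean absolute error:", -scores.mean())
```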
High dimensional feature reduction via projection pursuit
NASA Technical Reports Server (NTRS)
Jimenez, Luis; Landgrebe, David
1994-01-01
The recent development of more sophisticated remote sensing systems enables the measurement of radiation in many more spectral intervals than previously possible. An example of that technology is the AVIRIS system, which collects image data in 220 bands. As a result, new algorithms must be developed in order to analyze the more complex data effectively. Data in a high-dimensional space present a substantial challenge, since intuitive concepts valid in two- or three-dimensional spaces do not necessarily apply in higher-dimensional spaces. For example, high-dimensional space is mostly empty; this results from the concentration of data in the corners of hypercubes. Other examples may be cited. Such observations suggest the need to project data to a subspace of much lower dimension, on a problem-specific basis, in such a manner that information is not lost. Projection Pursuit is a technique that will accomplish such a goal. Since it processes data in lower dimensions, it should avoid many of the difficulties of high-dimensional spaces. In this paper, we begin the investigation of some of the properties of Projection Pursuit for this purpose.
Zand, Kevin A.; Shah, Amol; Heba, Elhamy; Wolfson, Tanya; Hamilton, Gavin; Lam, Jessica; Chen, Joshua; Hooker, Jonathan C.; Gamst, Anthony C.; Middleton, Michael S.; Schwimmer, Jeffrey B.; Sirlin, Claude B.
2015-01-01
Purpose To assess accuracy of magnitude-based magnetic resonance imaging (M-MRI) in children to estimate hepatic proton density fat fraction (PDFF) using two to six echoes, with magnetic resonance spectroscopy (MRS)-measured PDFF as a reference standard. Materials and Methods This was an IRB-approved, HIPAA-compliant, single-center, cross-sectional, retrospective analysis of data collected prospectively between 2008 and 2013 in children with known or suspected non-alcoholic fatty liver disease (NAFLD). Two hundred and eighty-six children (8 – 20 [mean 14.2 ± 2.5] yrs; 182 boys) underwent same-day MRS and M-MRI. Unenhanced two-dimensional axial spoiled gradient-recalled-echo images at six echo times were obtained at 3T after a single low-flip-angle (10°) excitation with ≥ 120-ms recovery time. Hepatic PDFF was estimated using the first two, three, four, five, and all six echoes. For each number of echoes, accuracy of M-MRI to estimate PDFF was assessed by linear regression with MRS-PDFF as reference standard. Accuracy metrics were regression intercept, slope, average bias, and R2. Results MRS-PDFF ranged from 0.2 – 40.4% (mean 13.1 ± 9.8%). Using three to six echoes, regression intercept, slope, and average bias were 0.46 – 0.96%, 0.99 – 1.01, and 0.57 – 0.89%, respectively. Using two echoes, these values were 2.98%, 0.97, and 2.72%, respectively. R2 ranged 0.98 – 0.99 for all methods. Conclusion Using three to six echoes, M-MRI has high accuracy for hepatic PDFF estimation in children. PMID:25847512
Epistasis analysis for quantitative traits by functional regression model.
Zhang, Futao; Boerwinkle, Eric; Xiong, Momiao
2014-06-01
The critical barrier in interaction analysis for rare variants is that most traditional statistical methods for testing interactions were originally designed for testing the interaction between common variants and are difficult to apply to rare variants because of their prohibitive computational time and poor ability. The great challenges for successful detection of interactions with next-generation sequencing (NGS) data are (1) lack of methods for interaction analysis with rare variants, (2) severe multiple testing, and (3) time-consuming computations. To meet these challenges, we shift the paradigm of interaction analysis between two loci to interaction analysis between two sets of loci or genomic regions and collectively test interactions between all possible pairs of SNPs within two genomic regions. In other words, we take a genome region as a basic unit of interaction analysis and use high-dimensional data reduction and functional data analysis techniques to develop a novel functional regression model to collectively test interactions between all possible pairs of single nucleotide polymorphisms (SNPs) within two genome regions. By intensive simulations, we demonstrate that the functional regression models for interaction analysis of the quantitative trait have the correct type 1 error rates and a much better ability to detect interactions than the current pairwise interaction analysis. The proposed method was applied to exome sequence data from the NHLBI's Exome Sequencing Project (ESP) and CHARGE-S study. We discovered 27 pairs of genes showing significant interactions after applying the Bonferroni correction (P-values < 4.58 × 10⁻¹⁰) in the ESP, and 11 were replicated in the CHARGE-S study. © 2014 Zhang et al.; Published by Cold Spring Harbor Laboratory Press.
Zand, Kevin A; Shah, Amol; Heba, Elhamy; Wolfson, Tanya; Hamilton, Gavin; Lam, Jessica; Chen, Joshua; Hooker, Jonathan C; Gamst, Anthony C; Middleton, Michael S; Schwimmer, Jeffrey B; Sirlin, Claude B
2015-11-01
To assess accuracy of magnitude-based magnetic resonance imaging (M-MRI) in children to estimate hepatic proton density fat fraction (PDFF) using two to six echoes, with magnetic resonance spectroscopy (MRS)-measured PDFF as a reference standard. This was an IRB-approved, HIPAA-compliant, single-center, cross-sectional, retrospective analysis of data collected prospectively between 2008 and 2013 in children with known or suspected nonalcoholic fatty liver disease (NAFLD). Two hundred eighty-six children (8-20 [mean 14.2 ± 2.5] years; 182 boys) underwent same-day MRS and M-MRI. Unenhanced two-dimensional axial spoiled gradient-recalled-echo images at six echo times were obtained at 3T after a single low-flip-angle (10°) excitation with ≥ 120-ms recovery time. Hepatic PDFF was estimated using the first two, three, four, five, and all six echoes. For each number of echoes, accuracy of M-MRI to estimate PDFF was assessed by linear regression with MRS-PDFF as reference standard. Accuracy metrics were regression intercept, slope, average bias, and R². MRS-PDFF ranged from 0.2-40.4% (mean 13.1 ± 9.8%). Using three to six echoes, regression intercept, slope, and average bias were 0.46-0.96%, 0.99-1.01, and 0.57-0.89%, respectively. Using two echoes, these values were 2.98%, 0.97, and 2.72%, respectively. R² ranged 0.98-0.99 for all methods. Using three to six echoes, M-MRI has high accuracy for hepatic PDFF estimation in children. © 2015 Wiley Periodicals, Inc.
Skin microrelief profiles as a cutaneous aging index.
Kim, Dai Hyun; Rhyu, Yeon Seung; Ahn, Hyo Hyun; Hwang, Eenjun; Uhm, Chang Sub
2016-10-01
An objective measurement of cutaneous topographical information is important for quantifying the degree of skin aging. Our aim was to improve methods for measuring microrelief patterns using a three-dimensional analysis based on silicone replicas and scanning electron microscope (SEM). Another objective was to compare the results with those obtained using a two-dimensional analysis method based on dermoscopy. Silicone replicas were obtained from forearms, dorsum of the hands and fingers of 51 volunteers. Cutaneous profiles obtained by SEM with silicone replicas showed more consistent correlations with age than data obtained by dermoscopy. This indicates the advantage of three-dimensional topography analysis using silicone replicas and SEM over the widely used dermoscopic assessment. The cutaneous age was calculated using stepwise linear regression, and the result was 57.40-9.47 × (number of furrows on dorsum of the hand) × (width of furrows on dorsum of the hand). © The Author 2016. Published by Oxford University Press on behalf of The Japanese Society of Microscopy. All rights reserved. For permissions, please e-mail: journals.permissions@oup.com.
Prediction of high-dimensional states subject to respiratory motion: a manifold learning approach
NASA Astrophysics Data System (ADS)
Liu, Wenyang; Sawant, Amit; Ruan, Dan
2016-07-01
The development of high-dimensional imaging systems in image-guided radiotherapy provides important pathways to the ultimate goal of real-time full volumetric motion monitoring. Effective motion management during radiation treatment usually requires prediction to account for system latency and extra signal/image processing time. It is challenging to predict high-dimensional respiratory motion due to the complexity of the motion pattern combined with the curse of dimensionality. Linear dimension reduction methods such as PCA have been used to construct a linear subspace from the high-dimensional data, followed by efficient predictions on the lower-dimensional subspace. In this study, we extend such rationale to a more general manifold and propose a framework for high-dimensional motion prediction with manifold learning, which allows one to learn more descriptive features compared to linear methods with comparable dimensions. Specifically, a kernel PCA is used to construct a proper low-dimensional feature manifold, where accurate and efficient prediction can be performed. A fixed-point iterative pre-image estimation method is used to recover the predicted value in the original state space. We evaluated and compared the proposed method with a PCA-based approach on level-set surfaces reconstructed from point clouds captured by a 3D photogrammetry system. The prediction accuracy was evaluated in terms of root-mean-squared-error. Our proposed method achieved consistent higher prediction accuracy (sub-millimeter) for both 200 ms and 600 ms lookahead lengths compared to the PCA-based approach, and the performance gain was statistically significant.
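A hedged sketch of the pipeline, assuming scikit-learn: kernel PCA builds the low-dimensional feature manifold, a simple ridge predictor (an assumption, not the paper's predictor) performs one-step-ahead prediction in feature space, and scikit-learn's built-in pre-image approximation stands in for the fixed-point iterative pre-image estimation; the high-dimensional states are synthetic.

```python
import numpy as np
from sklearn.decomposition import KernelPCA
from sklearn.linear_model import Ridge

rng = np.random.default_rng(3)
T, D = 300, 2000                      # time samples of a high-dimensional state (e.g. a surface)
phase = np.cumsum(0.05 + 0.01 * rng.normal(size=T))
states = np.sin(phase)[:, None] * rng.normal(size=(1, D))   # placeholder respiratory-like states

# Learn a low-dimensional feature manifold with kernel PCA (pre-image mapping enabled).
kpca = KernelPCA(n_components=3, kernel="rbf", gamma=1e-4, fit_inverse_transform=True)
Z = kpca.fit_transform(states)

# One-step-ahead prediction in feature space from the previous feature vector.
hist, target = Z[:-1], Z[1:]
predictor = Ridge(alpha=1.0).fit(hist, target)
z_next = predictor.predict(Z[-1:])

# Map the predicted feature vector back to the original high-dimensional state space.
state_next = kpca.inverse_transform(z_next)
print(state_next.shape)               # (1, D)
```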
DOE Office of Scientific and Technical Information (OSTI.GOV)
Hohn, M.E.; McDowell, R.R.; Matchen, D.L.
1997-06-01
Since discovery in 1924, Granny Creek field in central West Virginia has experienced several periods of renewed drilling for oil in a fluvial-deltaic sandstone in the Lower Mississippian Price Formation. Depositional and diagenetic features leading to reservoir heterogeneity include highly variable grain size, thin shale and siltstone beds, and zones containing large quantities of calcite, siderite, or quartz cement. Electrofacies defined through cluster analysis of wireline log responses corresponded approximately to facies observed in core. Three-dimensional models of porosity computed from density logs showed that zones of relatively high porosity were discontinuous across the field. The regression of core permeability on core porosity is statistically significant, and differs for each electrofacies. Zones of high permeability estimated from porosity and electrofacies tend to be discontinuous and aligned roughly north-south. Cumulative oil production varies considerably between adjacent wells, and corresponds very poorly with trends in porosity and permeability. Original oil in place, estimated for each well from reservoir thickness, porosity, water saturation, and an assumed value for drainage radius, is highly variable in the southern part of the field, which is characterized by relatively complex interfingering of electrofacies and similar variability in porosity and permeability.
Three-dimensional high-definition flow in the diagnosis of placental lakes.
Inubashiri, Eisuke; Deguchi, Keizou; Abe, Kiyotaka; Saitou, Atushi; Watanabe, Yukio; Akutagawa, Noriyuki; Kuroki, Katumaru; Sugawara, Masaki; Maeda, Nobuhiko
2014-10-01
Placental lakes are sonolucent areas often found in the normal placenta. Most of them are asymptomatic. They are sometimes related to placenta accreta or intrauterine fetal growth restriction, among other conditions. Although Doppler sonography is useful for evaluating noxious placental lakes, it is not easy to adapt Doppler studies to conventional two-dimensional color Doppler sonography because of the low-velocity blood flow and high vascularity in the placenta. Here, we demonstrate how three-dimensional high-definition imaging of flow provides a novel visual depiction of placental lakes, which helps substantially with the differential diagnosis. As far as we know, there have been no previous reports of observation of placental lakes using three-dimensional high-definition imaging of flow.
Uniform high order spectral methods for one and two dimensional Euler equations
NASA Technical Reports Server (NTRS)
Cai, Wei; Shu, Chi-Wang
1991-01-01
Uniform high order spectral methods to solve multi-dimensional Euler equations for gas dynamics are discussed. Uniform high order spectral approximations with spectral accuracy in smooth regions of solutions are constructed by introducing the idea of the Essentially Non-Oscillatory (ENO) polynomial interpolations into the spectral methods. The authors present numerical results for the inviscid Burgers' equation, and for the one dimensional Euler equations including the interactions between a shock wave and density disturbance, Sod's and Lax's shock tube problems, and the blast wave problem. The interaction between a Mach 3 two dimensional shock wave and a rotating vortex is simulated.
Accelerated High-Dimensional MR Imaging with Sparse Sampling Using Low-Rank Tensors
He, Jingfei; Liu, Qiegen; Christodoulou, Anthony G.; Ma, Chao; Lam, Fan
2017-01-01
High-dimensional MR imaging often requires long data acquisition time, thereby limiting its practical applications. This paper presents a low-rank tensor based method for accelerated high-dimensional MR imaging using sparse sampling. This method represents high-dimensional images as low-rank tensors (or partially separable functions) and uses this mathematical structure for sparse sampling of the data space and for image reconstruction from highly undersampled data. More specifically, the proposed method acquires two datasets with complementary sampling patterns, one for subspace estimation and the other for image reconstruction; image reconstruction from highly undersampled data is accomplished by fitting the measured data with a sparsity constraint on the core tensor and a group sparsity constraint on the spatial coefficients jointly using the alternating direction method of multipliers. The usefulness of the proposed method is demonstrated in MRI applications; it may also have applications beyond MRI. PMID:27093543
Kernel-based whole-genome prediction of complex traits: a review.
Morota, Gota; Gianola, Daniel
2014-01-01
Prediction of genetic values has been a focus of applied quantitative genetics since the beginning of the 20th century, with renewed interest following the advent of the era of whole genome-enabled prediction. Opportunities offered by the emergence of high-dimensional genomic data fueled by post-Sanger sequencing technologies, especially molecular markers, have driven researchers to extend Ronald Fisher and Sewall Wright's models to confront new challenges. In particular, kernel methods are gaining consideration as a regression method of choice for genome-enabled prediction. Complex traits are presumably influenced by many genomic regions working in concert with others (clearly so when considering pathways), thus generating interactions. Motivated by this view, a growing number of statistical approaches based on kernels attempt to capture non-additive effects, either parametrically or non-parametrically. This review centers on whole-genome regression using kernel methods applied to a wide range of quantitative traits of agricultural importance in animals and plants. We discuss various kernel-based approaches tailored to capturing total genetic variation, with the aim of arriving at an enhanced predictive performance in the light of available genome annotation information. Connections between prediction machines born in animal breeding, statistics, and machine learning are revisited, and their empirical prediction performance is discussed. Overall, while some encouraging results have been obtained with non-parametric kernels, recovering non-additive genetic variation in a validation dataset remains a challenge in quantitative genetics.
Brown, Steffen A.; Hall, Rebecca; Hund, Lauren; Gutierrez, Hilda L.; Hurley, Timothy; Holbrook, Bradley D.; Bakhireva, Ludmila N.
2017-01-01
Objective While prenatal 3D ultrasonography results in improved diagnostic accuracy, no data are available on biometric assessment of the fetal frontal lobe. This study was designed to assess feasibility of a standardized approach to biometric measurement of the fetal frontal lobe and to construct frontal lobe growth trajectories throughout gestation. Study Design A sonographic 3D volume set was obtained and measured in 101 patients between 16.1 and 33.7 gestational weeks. Measurements were obtained by two independent raters. To model the relationship between gestational age and each frontal lobe measurement, flexible linear regression models were fit using penalized regression splines. Results The sample contained an ethnically diverse population (7.9% Native Americans, 45.5% Hispanic/Latina). There was high inter-rater reliability (correlation coefficients: 0.95, 1.0, and 0.87 for frontal lobe length, width, and height; p-values < 0.001). Graphs of the growth trajectories and corresponding percentiles were estimated as a function of gestational age. The estimated rates of frontal lobe growth were 0.096 cm/week, 0.247 cm/week, and 0.111 cm/week for length, width, and height. Conclusion To our knowledge, this is the first study to examine fetal frontal lobe growth trajectories through 3D prenatal ultrasound examination. Such normative data will allow for future prenatal evaluation of a particular disease state by 3D ultrasound imaging. PMID:29075046
Brown, Steffen A; Hall, Rebecca; Hund, Lauren; Gutierrez, Hilda L; Hurley, Timothy; Holbrook, Bradley D; Bakhireva, Ludmila N
2017-01-01
While prenatal 3D ultrasonography results in improved diagnostic accuracy, no data are available on biometric assessment of the fetal frontal lobe. This study was designed to assess feasibility of a standardized approach to biometric measurement of the fetal frontal lobe and to construct frontal lobe growth trajectories throughout gestation. A sonographic 3D volume set was obtained and measured in 101 patients between 16.1 and 33.7 gestational weeks. Measurements were obtained by two independent raters. To model the relationship between gestational age and each frontal lobe measurement, flexible linear regression models were fit using penalized regression splines. The sample contained an ethnically diverse population (7.9% Native Americans, 45.5% Hispanic/Latina). There was high inter-rater reliability (correlation coefficients: 0.95, 1.0, and 0.87 for frontal lobe length, width, and height; p-values < 0.001). Graphs of the growth trajectories and corresponding percentiles were estimated as a function of gestational age. The estimated rates of frontal lobe growth were 0.096 cm/week, 0.247 cm/week, and 0.111 cm/week for length, width, and height. To our knowledge, this is the first study to examine fetal frontal lobe growth trajectories through 3D prenatal ultrasound examination. Such normative data will allow for future prenatal evaluation of a particular disease state by 3D ultrasound imaging.
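A minimal sketch of a penalized-spline growth-curve fit, assuming scikit-learn's SplineTransformer with a ridge penalty as a stand-in for the penalized regression splines used in the study; the gestational ages and frontal-lobe lengths below are synthetic placeholders.

```python
import numpy as np
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import SplineTransformer
from sklearn.linear_model import Ridge

rng = np.random.default_rng(4)
ga = rng.uniform(16, 34, size=101)                         # gestational age in weeks
length = 0.5 + 0.096 * ga + 0.1 * rng.normal(size=101)     # placeholder frontal-lobe length (cm)

# A cubic spline basis with a ridge penalty approximates a penalized regression spline fit.
model = make_pipeline(
    SplineTransformer(degree=3, n_knots=8),
    Ridge(alpha=1.0),
).fit(ga.reshape(-1, 1), length)

grid = np.linspace(16, 34, 50).reshape(-1, 1)
print(model.predict(grid)[:5])                             # smoothed growth trajectory
```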
How Good Are Statistical Models at Approximating Complex Fitness Landscapes?
du Plessis, Louis; Leventhal, Gabriel E.; Bonhoeffer, Sebastian
2016-01-01
Fitness landscapes determine the course of adaptation by constraining and shaping evolutionary trajectories. Knowledge of the structure of a fitness landscape can thus predict evolutionary outcomes. Empirical fitness landscapes, however, have so far only offered limited insight into real-world questions, as the high dimensionality of sequence spaces makes it impossible to exhaustively measure the fitness of all variants of biologically meaningful sequences. We must therefore revert to statistical descriptions of fitness landscapes that are based on a sparse sample of fitness measurements. It remains unclear, however, how much data are required for such statistical descriptions to be useful. Here, we assess the ability of regression models accounting for single and pairwise mutations to correctly approximate a complex quasi-empirical fitness landscape. We compare approximations based on various sampling regimes of an RNA landscape and find that the sampling regime strongly influences the quality of the regression. On the one hand it is generally impossible to generate sufficient samples to achieve a good approximation of the complete fitness landscape, and on the other hand systematic sampling schemes can only provide a good description of the immediate neighborhood of a sequence of interest. Nevertheless, we obtain a remarkably good and unbiased fit to the local landscape when using sequences from a population that has evolved under strong selection. Thus, current statistical methods can provide a good approximation to the landscape of naturally evolving populations. PMID:27189564
Tian, Yuxi; Schuemie, Martijn J; Suchard, Marc A
2018-06-22
Propensity score adjustment is a popular approach for confounding control in observational studies. Reliable frameworks are needed to determine relative propensity score performance in large-scale studies, and to establish optimal propensity score model selection methods. We detail a propensity score evaluation framework that includes synthetic and real-world data experiments. Our synthetic experimental design extends the 'plasmode' framework and simulates survival data under known effect sizes, and our real-world experiments use a set of negative control outcomes with presumed null effect sizes. In reproductions of two published cohort studies, we compare two propensity score estimation methods that contrast in their model selection approach: L1-regularized regression that conducts a penalized likelihood regression, and the 'high-dimensional propensity score' (hdPS) that employs a univariate covariate screen. We evaluate methods on a range of outcome-dependent and outcome-independent metrics. L1-regularization propensity score methods achieve superior model fit, covariate balance and negative control bias reduction compared with the hdPS. Simulation results are mixed and fluctuate with simulation parameters, revealing a limitation of simulation under the proportional hazards framework. Including regularization with the hdPS reduces commonly reported non-convergence issues but has little effect on propensity score performance. L1-regularization incorporates all covariates simultaneously into the propensity score model and offers propensity score performance superior to the hdPS marginal screen.
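A minimal sketch of the L1-regularized propensity-score step, assuming scikit-learn; the covariates and treatment assignment are synthetic, and the matching/weighting and outcome analysis are omitted.

```python
import numpy as np
from sklearn.linear_model import LogisticRegressionCV

rng = np.random.default_rng(5)
n, p = 2000, 500                          # many candidate covariates
X = rng.normal(size=(n, p))
treated = rng.binomial(1, 1 / (1 + np.exp(-(0.8 * X[:, 0] - 0.5 * X[:, 1]))))

# L1-penalized logistic regression with cross-validated penalty strength.
ps_model = LogisticRegressionCV(
    penalty="l1", solver="liblinear", Cs=10, cv=5, scoring="neg_log_loss"
).fit(X, treated)

propensity = ps_model.predict_proba(X)[:, 1]       # estimated propensity scores
print("covariates kept:", int((ps_model.coef_ != 0).sum()))
```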
Two-dimensional advective transport in ground-water flow parameter estimation
Anderman, E.R.; Hill, M.C.; Poeter, E.P.
1996-01-01
Nonlinear regression is useful in ground-water flow parameter estimation, but problems of parameter insensitivity and correlation often exist given commonly available hydraulic-head and head-dependent flow (for example, stream and lake gain or loss) observations. To address this problem, advective-transport observations are added to the ground-water flow, parameter-estimation model MODFLOWP using particle-tracking methods. The resulting model is used to investigate the importance of advective-transport observations relative to head-dependent flow observations when either or both are used in conjunction with hydraulic-head observations in a simulation of the sewage-discharge plume at Otis Air Force Base, Cape Cod, Massachusetts, USA. The analysis procedure for evaluating the probable effect of new observations on the regression results consists of two steps: (1) parameter sensitivities and correlations calculated at initial parameter values are used to assess the model parameterization and expected relative contributions of different types of observations to the regression; and (2) optimal parameter values are estimated by nonlinear regression and evaluated. In the Cape Cod parameter-estimation model, advective-transport observations did not significantly increase the overall parameter sensitivity; however: (1) inclusion of advective-transport observations decreased parameter correlation enough for more unique parameter values to be estimated by the regression; (2) realistic uncertainties in advective-transport observations had a small effect on parameter estimates relative to the precision with which the parameters were estimated; and (3) the regression results and sensitivity analysis provided insight into the dynamics of the ground-water flow system, especially the importance of accurate boundary conditions. In this work, advective-transport observations improved the calibration of the model and the estimation of ground-water flow parameters, and use of regression and related techniques produced significant insight into the physical system.
Estelles-Lopez, Lucia; Ropodi, Athina; Pavlidis, Dimitris; Fotopoulou, Jenny; Gkousari, Christina; Peyrodie, Audrey; Panagou, Efstathios; Nychas, George-John; Mohareb, Fady
2017-09-01
Over the past decade, analytical approaches based on vibrational spectroscopy, hyperspectral/multispectral imaging and biomimetic sensors have gained popularity as rapid and efficient methods for assessing food quality, safety and authenticity, offering a sensible alternative to expensive and time-consuming conventional microbiological techniques. Due to the multi-dimensional nature of the data generated by such analyses, the output needs to be coupled with a suitable statistical approach or machine-learning algorithm before the results can be interpreted. Choosing the optimum pattern recognition or machine learning approach for a given analytical platform is often challenging and involves a comparative analysis of various algorithms in order to achieve the best possible prediction accuracy. In this work, "MeatReg", a web-based application, is presented that automates the procedure of identifying the best machine learning method for data from several analytical techniques, to predict the counts of microorganisms responsible for meat spoilage regardless of the packaging system applied. In particular, up to seven regression methods were applied: ordinary least squares regression, stepwise linear regression, partial least squares regression, principal component regression, support vector regression, random forest and k-nearest neighbours. "MeatReg" was tested with minced beef samples stored under aerobic and modified atmosphere packaging and analysed with an electronic nose, HPLC, FT-IR, GC-MS and a multispectral imaging instrument. Populations of total viable count, lactic acid bacteria, pseudomonads, Enterobacteriaceae and B. thermosphacta were predicted. As a result, recommendations were obtained as to which analytical platforms are suitable for predicting each type of bacteria and which machine learning methods to use in each case. The developed system is accessible via the link: www.sorfml.com. Copyright © 2017 Elsevier Ltd. All rights reserved.
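A hedged sketch of the kind of model-comparison loop such a tool automates, assuming scikit-learn; the spectra and bacterial counts are synthetic placeholders, and only five of the seven listed methods are shown.

```python
import numpy as np
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.linear_model import LinearRegression
from sklearn.cross_decomposition import PLSRegression
from sklearn.svm import SVR
from sklearn.ensemble import RandomForestRegressor
from sklearn.neighbors import KNeighborsRegressor

rng = np.random.default_rng(6)
X = rng.normal(size=(120, 300))                     # e.g. FT-IR spectra of minced-beef samples
y = X[:, :10].mean(axis=1) * 2 + 6 + 0.3 * rng.normal(size=120)   # placeholder log10 counts

candidates = {
    "OLS": LinearRegression(),
    "PLS": PLSRegression(n_components=10),
    "SVR": make_pipeline(StandardScaler(), SVR()),
    "RF":  RandomForestRegressor(n_estimators=200, random_state=0),
    "kNN": make_pipeline(StandardScaler(), KNeighborsRegressor(n_neighbors=5)),
}
for name, model in candidates.items():
    rmse = -cross_val_score(model, X, y, cv=5,
                            scoring="neg_root_mean_squared_error").mean()
    print(f"{name:4s} cross-validated RMSE: {rmse:.3f}")
```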
Wan, Zhong; Kazakov, Aleksandr; Manfra, Michael J; Pfeiffer, Loren N; West, Ken W; Rokhinson, Leonid P
2015-06-11
Search for Majorana fermions renewed interest in semiconductor-superconductor interfaces, while a quest for higher-order non-Abelian excitations demands formation of superconducting contacts to materials with fractionalized excitations, such as a two-dimensional electron gas in a fractional quantum Hall regime. Here we report induced superconductivity in high-mobility two-dimensional electron gas in gallium arsenide heterostructures and development of highly transparent semiconductor-superconductor ohmic contacts. Supercurrent with characteristic temperature dependence of a ballistic junction has been observed across 0.6 μm, a regime previously achieved only in point contacts but essential to the formation of well separated non-Abelian states. High critical fields (>16 T) in NbN contacts enables investigation of an interplay between superconductivity and strongly correlated states in a two-dimensional electron gas at high magnetic fields.
Wan, Zhong; Kazakov, Aleksandr; Manfra, Michael J.; Pfeiffer, Loren N.; West, Ken W.; Rokhinson, Leonid P.
2015-01-01
Search for Majorana fermions renewed interest in semiconductor–superconductor interfaces, while a quest for higher-order non-Abelian excitations demands formation of superconducting contacts to materials with fractionalized excitations, such as a two-dimensional electron gas in a fractional quantum Hall regime. Here we report induced superconductivity in high-mobility two-dimensional electron gas in gallium arsenide heterostructures and development of highly transparent semiconductor–superconductor ohmic contacts. Supercurrent with characteristic temperature dependence of a ballistic junction has been observed across 0.6 μm, a regime previously achieved only in point contacts but essential to the formation of well separated non-Abelian states. High critical fields (>16 T) in NbN contacts enables investigation of an interplay between superconductivity and strongly correlated states in a two-dimensional electron gas at high magnetic fields. PMID:26067452
Feature extraction and classification algorithms for high dimensional data
NASA Technical Reports Server (NTRS)
Lee, Chulhee; Landgrebe, David
1993-01-01
Feature extraction and classification algorithms for high dimensional data are investigated. Developments with regard to sensors for Earth observation are moving in the direction of providing much higher dimensional multispectral imagery than is now possible. In analyzing such high dimensional data, processing time becomes an important factor. With large increases in dimensionality and the number of classes, processing time will increase significantly. To address this problem, a multistage classification scheme is proposed which reduces the processing time substantially by eliminating unlikely classes from further consideration at each stage. Several truncation criteria are developed and the relationship between thresholds and the error caused by the truncation is investigated. Next an approach to feature extraction for classification is proposed based directly on the decision boundaries. It is shown that all the features needed for classification can be extracted from decision boundaries. A characteristic of the proposed method arises by noting that only a portion of the decision boundary is effective in discriminating between classes, and the concept of the effective decision boundary is introduced. The proposed feature extraction algorithm has several desirable properties: it predicts the minimum number of features necessary to achieve the same classification accuracy as in the original space for a given pattern recognition problem; and it finds the necessary feature vectors. The proposed algorithm does not deteriorate under the circumstances of equal means or equal covariances as some previous algorithms do. In addition, the decision boundary feature extraction algorithm can be used both for parametric and non-parametric classifiers. Finally, some problems encountered in analyzing high dimensional data are studied and possible solutions are proposed. First, the increased importance of the second order statistics in analyzing high dimensional data is recognized. By investigating the characteristics of high dimensional data, the reason why the second order statistics must be taken into account in high dimensional data is suggested. Recognizing the importance of the second order statistics, there is a need to represent the second order statistics. A method to visualize statistics using a color code is proposed. By representing statistics using color coding, one can easily extract and compare the first and the second statistics.
Sparse learning of stochastic dynamical equations
NASA Astrophysics Data System (ADS)
Boninsegna, Lorenzo; Nüske, Feliks; Clementi, Cecilia
2018-06-01
With the rapid increase of available data for complex systems, there is great interest in the extraction of physically relevant information from massive datasets. Recently, a framework called Sparse Identification of Nonlinear Dynamics (SINDy) has been introduced to identify the governing equations of dynamical systems from simulation data. In this study, we extend SINDy to stochastic dynamical systems which are frequently used to model biophysical processes. We prove the asymptotic correctness of stochastic SINDy in the infinite data limit, both in the original and projected variables. We discuss algorithms to solve the sparse regression problem arising from the practical implementation of SINDy and show that cross validation is an essential tool to determine the right level of sparsity. We demonstrate the proposed methodology on two test systems, namely, the diffusion in a one-dimensional potential and the projected dynamics of a two-dimensional diffusion process.
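A minimal sketch of the sparse-regression core (a Lasso over a library of candidate functions recovering the drift of a one-dimensional diffusion), assuming NumPy and scikit-learn; this is an illustration, not the full stochastic SINDy estimator with cross-validated sparsity.

```python
import numpy as np
from sklearn.linear_model import Lasso

rng = np.random.default_rng(7)
dt, n_steps = 1e-3, 100_000
x = np.empty(n_steps)
x[0] = 1.0
for k in range(n_steps - 1):            # Euler-Maruyama for dx = -2 x dt + dW (harmonic potential)
    x[k + 1] = x[k] - 2.0 * x[k] * dt + np.sqrt(dt) * rng.normal()

# Library of candidate functions evaluated along the trajectory.
library = np.column_stack([np.ones(n_steps - 1), x[:-1], x[:-1] ** 2, x[:-1] ** 3])
drift_target = np.diff(x) / dt          # noisy finite-difference estimate of the drift

# Sparse regression picks out the few active terms of the drift (here only the linear one).
fit = Lasso(alpha=0.05, fit_intercept=False).fit(library, drift_target)
print(dict(zip(["1", "x", "x^2", "x^3"], np.round(fit.coef_, 2))))
```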
Dielectric Cytometry with Three-Dimensional Cellular Modeling
Katsumoto, Yoichi; Hayashi, Yoshihito; Oshige, Ikuya; Omori, Shinji; Kishii, Noriyuki; Yasuda, Akio; Asami, Koji
2008-01-01
We have developed what we believe is an efficient method to determine the electric parameters (the specific membrane capacitance Cm and the cytoplasm conductivity κi) of cells from their dielectric dispersion. First, a limited number of dispersion curves are numerically calculated for a three-dimensional cell model by changing Cm and κi, and their amplitudes Δɛ and relaxation times τ are determined by assuming a Cole-Cole function. Second, regression formulas are obtained from the values of Δɛ and τ and then used for the determination of Cm and κi from the experimental Δɛ and τ. This method was applied to the dielectric dispersion measured for rabbit erythrocytes (discocytes and echinocytes) and human erythrocytes (normocytes), and provided reasonable Cm and κi of the erythrocytes and excellent agreement between the theoretical and experimental dispersion curves. PMID:18567636
Dielectric cytometry with three-dimensional cellular modeling.
Katsumoto, Yoichi; Hayashi, Yoshihito; Oshige, Ikuya; Omori, Shinji; Kishii, Noriyuki; Yasuda, Akio; Asami, Koji
2008-09-15
We have developed what we believe is an efficient method to determine the electric parameters (the specific membrane capacitance Cm and the cytoplasm conductivity κi) of cells from their dielectric dispersion. First, a limited number of dispersion curves are numerically calculated for a three-dimensional cell model by changing Cm and κi, and their amplitudes Δɛ and relaxation times τ are determined by assuming a Cole-Cole function. Second, regression formulas are obtained from the values of Δɛ and τ and then used for the determination of Cm and κi from the experimental Δɛ and τ. This method was applied to the dielectric dispersion measured for rabbit erythrocytes (discocytes and echinocytes) and human erythrocytes (normocytes), and provided reasonable Cm and κi of the erythrocytes and excellent agreement between the theoretical and experimental dispersion curves.
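A minimal sketch of fitting a Cole-Cole dispersion to obtain the amplitude Δɛ and relaxation time τ, assuming SciPy; the "measured" permittivity data are synthetic, and the subsequent regression formulas for Cm and κi are not shown.

```python
import numpy as np
from scipy.optimize import curve_fit

def cole_cole(freq, eps_inf, d_eps, tau, alpha):
    """Complex permittivity of a Cole-Cole dispersion, returned as stacked [real, imag]."""
    omega = 2 * np.pi * freq
    eps = eps_inf + d_eps / (1.0 + (1j * omega * tau) ** (1.0 - alpha))
    return np.concatenate([eps.real, eps.imag])

# Synthetic "measured" dispersion (placeholder for a cell-suspension measurement).
freq = np.logspace(4, 8, 60)                        # 10 kHz to 100 MHz
true = cole_cole(freq, 60.0, 2000.0, 2e-7, 0.1)
data = true + np.random.default_rng(8).normal(scale=5.0, size=true.size)

popt, _ = curve_fit(cole_cole, freq, data, p0=[70.0, 1500.0, 1e-7, 0.05],
                    bounds=([1, 0, 1e-9, 0], [1e3, 1e5, 1e-5, 1]))
eps_inf, d_eps, tau, alpha = popt
print(f"Δɛ ≈ {d_eps:.0f}, τ ≈ {tau:.2e} s")          # inputs to the Cm, κi regression formulas
```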
Thermal Investigation of Three-Dimensional GaN-on-SiC High Electron Mobility Transistors
2017-07-01
AFRL-RY-WP-TR-2017-0143, final report (08 April 2015 – 10 April 2017), Qing Hao, The University of Arizona. GaN-on-SiC high electron mobility transistors are used in many DoD applications, including integrated radio frequency (RF) amplifiers and power electronics.
Supporting Dynamic Quantization for High-Dimensional Data Analytics.
Guzun, Gheorghi; Canahuate, Guadalupe
2017-05-01
Similarity searches are at the heart of exploratory data analysis tasks. Distance metrics are typically used to characterize the similarity between data objects represented as feature vectors. However, when the dimensionality of the data increases and the number of features is large, traditional distance metrics fail to distinguish between the closest and furthest data points. Localized distance functions have been proposed as an alternative to traditional distance metrics. These functions only consider dimensions close to the query to compute the distance/similarity. Furthermore, in order to enable interactive exploration of high-dimensional data, indexing support for ad-hoc queries is needed. In this work we set out to investigate whether bit-sliced indices can be used for exploratory analytics such as similarity searches and data clustering for high-dimensional big data. We also propose a novel dynamic quantization called Query-dependent Equi-Depth (QED) quantization and show its effectiveness in characterizing high-dimensional similarity. When applying QED we observe improvements in kNN classification accuracy over traditional distance functions.
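A hedged illustration of plain equi-depth (quantile) quantization with NumPy, not the query-dependent QED scheme or its bit-sliced index implementation; the data and similarity score are placeholders.

```python
import numpy as np

rng = np.random.default_rng(9)
X = rng.lognormal(size=(10_000, 64))            # skewed high-dimensional data
query_index = 0

def equi_depth_bins(values, n_bins=8):
    """Equi-depth (quantile) cut points: every bin receives the same number of points."""
    return np.quantile(values, np.linspace(0, 1, n_bins + 1)[1:-1])

# Quantize each dimension to its bin index; bin widths adapt to the data distribution.
codes = np.stack([np.searchsorted(equi_depth_bins(X[:, j]), X[:, j])
                  for j in range(X.shape[1])], axis=1)
query_codes = codes[query_index]

# A simple localized similarity: count the dimensions whose bin matches the query's bin.
matches = (codes == query_codes).sum(axis=1)
print("indices of the 5 most similar points:", np.argsort(-matches)[:5])
```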
Amir, El-ad David; Davis, Kara L; Tadmor, Michelle D; Simonds, Erin F; Levine, Jacob H; Bendall, Sean C; Shenfeld, Daniel K; Krishnaswamy, Smita; Nolan, Garry P; Pe'er, Dana
2013-06-01
New high-dimensional, single-cell technologies offer unprecedented resolution in the analysis of heterogeneous tissues. However, because these technologies can measure dozens of parameters simultaneously in individual cells, data interpretation can be challenging. Here we present viSNE, a tool that allows one to map high-dimensional cytometry data onto two dimensions, yet conserve the high-dimensional structure of the data. viSNE plots individual cells in a visual similar to a scatter plot, while using all pairwise distances in high dimension to determine each cell's location in the plot. We integrated mass cytometry with viSNE to map healthy and cancerous bone marrow samples. Healthy bone marrow automatically maps into a consistent shape, whereas leukemia samples map into malformed shapes that are distinct from healthy bone marrow and from each other. We also use viSNE and mass cytometry to compare leukemia diagnosis and relapse samples, and to identify a rare leukemia population reminiscent of minimal residual disease. viSNE can be applied to any multi-dimensional single-cell technology.
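viSNE builds on t-SNE; a minimal sketch with scikit-learn's TSNE on synthetic "marker" data, not the viSNE implementation itself.

```python
import numpy as np
from sklearn.manifold import TSNE

rng = np.random.default_rng(10)
# Synthetic stand-in for single-cell data: three populations measured on 30 markers.
centers = rng.normal(scale=3.0, size=(3, 30))
cells = np.vstack([centers[k] + rng.normal(size=(500, 30)) for k in range(3)])

# Map the 30-dimensional cells onto 2-D while preserving local structure.
embedding = TSNE(n_components=2, perplexity=30, init="pca", random_state=0).fit_transform(cells)
print(embedding.shape)        # (1500, 2): one point per cell, ready for a scatter plot
```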
Neural Connectivity Evidence for a Categorical-Dimensional Hybrid Model of Autism Spectrum Disorder.
Elton, Amanda; Di Martino, Adriana; Hazlett, Heather Cody; Gao, Wei
2016-07-15
Autism spectrum disorder (ASD) encompasses a complex manifestation of symptoms that include deficits in social interaction and repetitive or stereotyped interests and behaviors. In keeping with the increasing recognition of the dimensional characteristics of ASD symptoms and the categorical nature of a diagnosis, we sought to delineate the neural mechanisms of ASD symptoms based on the functional connectivity of four known neural networks (i.e., default mode network, dorsal attention network, salience network, and executive control network). We leveraged an open data resource (Autism Brain Imaging Data Exchange) providing resting-state functional magnetic resonance imaging data sets from 90 boys with ASD and 95 typically developing boys. This data set also included the Social Responsiveness Scale as a dimensional measure of ASD traits. Seed-based functional connectivity was paired with linear regression to identify functional connectivity abnormalities associated with categorical effects of ASD diagnosis, dimensional effects of ASD-like behaviors, and their interaction. Our results revealed the existence of dimensional mechanisms of ASD uniquely affecting each network based on the presence of connectivity-behavioral relationships; these were independent of diagnostic category. However, we also found evidence of categorical differences (i.e., diagnostic group differences) in connectivity strength for each network as well as categorical differences in connectivity-behavioral relationships (i.e., diagnosis-by-behavior interactions), supporting the coexistence of categorical mechanisms of ASD. Our findings support a hybrid model for ASD characterization that includes a combination of categorical and dimensional brain mechanisms and provide a novel understanding of the neural underpinnings of ASD. Copyright © 2016 Society of Biological Psychiatry. Published by Elsevier Inc. All rights reserved.
Dyvorne, Hadrien; Knight-Greenfield, Ashley; Jajamovich, Guido; Besa, Cecilia; Cui, Yong; Stalder, Aurélien; Markl, Michael; Taouli, Bachir
2015-04-01
To develop a highly accelerated phase-contrast cardiac-gated volume flow measurement (four-dimensional [4D] flow) magnetic resonance (MR) imaging technique based on spiral sampling and dynamic compressed sensing and to compare this technique with established phase-contrast imaging techniques for the quantification of blood flow in abdominal vessels. This single-center prospective study was compliant with HIPAA and approved by the institutional review board. Ten subjects (nine men, one woman; mean age, 51 years; age range, 30-70 years) were enrolled. Seven patients had liver disease. Written informed consent was obtained from all participants. Two 4D flow acquisitions were performed in each subject, one with use of Cartesian sampling with respiratory tracking and the other with use of spiral sampling and a breath hold. Cartesian two-dimensional (2D) cine phase-contrast images were also acquired in the portal vein. Two observers independently assessed vessel conspicuity on phase-contrast three-dimensional angiograms. Quantitative flow parameters were measured by two independent observers in major abdominal vessels. Intertechnique concordance was quantified by using Bland-Altman and logistic regression analyses. There was moderate to substantial agreement in vessel conspicuity between 4D flow acquisitions in arteries and veins (κ = 0.71 and 0.61, respectively, for observer 1; κ = 0.71 and 0.44 for observer 2), whereas more artifacts were observed with spiral 4D flow (κ = 0.30 and 0.20). Quantitative measurements in abdominal vessels showed good equivalence between spiral and Cartesian 4D flow techniques (lower bound of the 95% confidence interval: 63%, 77%, 60%, and 64% for flow, area, average velocity, and peak velocity, respectively). For portal venous flow, spiral 4D flow was in better agreement with 2D cine phase-contrast flow (95% limits of agreement: -8.8 and 9.3 mL/sec, respectively) than was Cartesian 4D flow (95% limits of agreement: -10.6 and 14.6 mL/sec). The combination of highly efficient spiral sampling with dynamic compressed sensing results in major acceleration for 4D flow MR imaging, which allows comprehensive assessment of abdominal vessel hemodynamics in a single breath hold.
Toward a Periodic Table of Niches, or Exploring the Lizard Niche Hypervolume.
Pianka, Eric R; Vitt, Laurie J; Pelegrin, Nicolás; Fitzgerald, Daniel B; Winemiller, Kirk O
2017-11-01
Widespread niche convergence suggests that species can be organized according to functional trait combinations to create a framework analogous to a periodic table. We compiled ecological data for lizards to examine patterns of global and regional niche diversification, and we used multivariate statistical approaches to develop the beginnings for a periodic table of niches. Data (50+ variables) for five major niche dimensions (habitat, diet, life history, metabolism, defense) were compiled for 134 species of lizards representing 24 of the 38 extant families. Principal coordinates analyses were performed on niche dimensional data sets, and species scores for the first three axes were used as input for a principal components analysis to ordinate species in continuous niche space and for a regression tree analysis to separate species into discrete niche categories. Three-dimensional models facilitate exploration of species positions in relation to major gradients within the niche hypervolume. The first gradient loads on body size, foraging mode, and clutch size. The second was influenced by metabolism and terrestrial versus arboreal microhabitat. The third was influenced by activity time, life history, and diet. Natural dichotomies are activity time, foraging mode, parity mode, and habitat. Regression tree analysis identified 103 cases of extreme niche conservatism within clades and 100 convergences between clades. Extending this approach to other taxa should lead to a wider understanding of niche evolution.
Wang, Jing-Jing; Wu, Hai-Feng; Sun, Tao; Li, Xia; Wang, Wei; Tao, Li-Xin; Huo, Da; Lv, Ping-Xin; He, Wen; Guo, Xiu-Hua
2013-01-01
Lung cancer, one of the leading causes of cancer-related deaths, usually appears as solitary pulmonary nodules (SPNs) which are hard to diagnose using the naked eye. In this paper, curvelet-based textural features and clinical parameters are used with three prediction models [a multilevel model, a least absolute shrinkage and selection operator (LASSO) regression method, and a support vector machine (SVM)] to improve the diagnosis of benign and malignant SPNs. Dimensionality reduction of the original curvelet-based textural features was achieved using principal component analysis. In addition, non-conditional logistical regression was used to find clinical predictors among demographic parameters and morphological features. The results showed that, combined with 11 clinical predictors, the accuracy rates using 12 principal components were higher than those using the original curvelet-based textural features. To evaluate the models, 10-fold cross validation and back substitution were applied. The results obtained, respectively, were 0.8549 and 0.9221 for the LASSO method, 0.9443 and 0.9831 for SVM, and 0.8722 and 0.9722 for the multilevel model. All in all, it was found that using curvelet-based textural features after dimensionality reduction and using clinical predictors, the highest accuracy rate was achieved with SVM. The method may be used as an auxiliary tool to differentiate between benign and malignant SPNs in CT images.
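A minimal sketch of the dimensionality-reduction-plus-classifier step (principal components of texture features combined with clinical predictors, then an SVM), assuming scikit-learn; the curvelet transform itself is omitted and all features are synthetic.

```python
import numpy as np
from sklearn.decomposition import PCA
from sklearn.svm import SVC
from sklearn.preprocessing import StandardScaler
from sklearn.pipeline import make_pipeline
from sklearn.model_selection import cross_val_score

rng = np.random.default_rng(11)
n = 300
texture = rng.normal(size=(n, 120))                   # placeholder curvelet-based texture features
clinical = rng.normal(size=(n, 11))                   # placeholder clinical predictors
y = (texture[:, 0] + clinical[:, 0] > 0).astype(int)  # benign (0) vs malignant (1), synthetic

# Reduce the texture features to 12 principal components, then append the clinical predictors.
texture_pcs = PCA(n_components=12).fit_transform(StandardScaler().fit_transform(texture))
features = np.hstack([texture_pcs, clinical])

clf = make_pipeline(StandardScaler(), SVC(kernel="rbf", C=1.0))
acc = cross_val_score(clf, features, y, cv=10).mean()
print("10-fold cross-validated accuracy:", round(acc, 3))
```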
Kondo, Osamu; Suzuki, Koichi; Aoki, Yoshinori; Ishii, Norihisa
2018-01-01
Background Facial deformation as a sequela of leprosy is caused not only by a saddle nose but also by regression of the maxilla, as well documented in paleopathological observations of excavated skeletal remains of patients with leprosy. However, maxillary changes in living patients have been evaluated only by the subjective visual grading. Here, we attempted to evaluate maxillary bone deformation in patients with leprosy using three-dimensional computed tomography (3D-CT). Methods Three-dimensional images centered on the maxilla were reconstructed using multiplanar reconstruction methods in former patients with leprosy (n = 10) and control subjects (n = 5); the anterior-posterior length of the maxilla (MA-P) was then measured. The difference between the MA-P of the patients and those of controls was evaluated after compensating for individual skull size. These findings were also compared with those from previous paleopathological studies. Findings Three former patients with lepromatous leprosy showed marked atrophy of the maxilla at the prosthion (-8.6, -11.1 and -17.9 mm) which corresponded with the visual appearance of the maxillary deformity, and these results were consistent with paleopathological findings of excavated skeletal remains. Additionally, the precise bone defects of the maxilla could be individually calculated for accurate reconstructive surgery. Interpretation We have successfully illustrated maxillary bone deformities in living patients with leprosy. This study also confirmed the maxillary regression described in paleopathological studies. PMID:29522533
Ramírez, J; Górriz, J M; Segovia, F; Chaves, R; Salas-Gonzalez, D; López, M; Alvarez, I; Padilla, P
2010-03-19
This letter presents a computer-aided diagnosis (CAD) technique for the early detection of Alzheimer's disease (AD) by means of single photon emission computed tomography (SPECT) image classification. The proposed method is based on a partial least squares (PLS) regression model and a random forest (RF) predictor. The challenge of the curse of dimensionality is addressed by reducing the large dimensionality of the input data, downscaling the SPECT images and extracting score features using PLS. An RF predictor then forms an ensemble of classification and regression tree (CART)-like classifiers, with its output determined by a majority vote of the trees in the forest. A baseline principal component analysis (PCA) system is also developed for reference. The experimental results show that the combined PLS-RF system yields a generalization error that converges to a limit as the number of trees in the forest increases. Thus, the generalization error is reduced when using PLS and depends on the strength of the individual trees in the forest and the correlation between them. Moreover, PLS feature extraction is found to be more effective than PCA for extracting discriminative information from the data, yielding peak sensitivity, specificity and accuracy values of 100%, 92.7%, and 96.9%, respectively. The proposed CAD system also outperformed several other recently developed AD CAD systems. Copyright 2010 Elsevier Ireland Ltd. All rights reserved.
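A minimal sketch of PLS score extraction followed by a random-forest classifier, assuming scikit-learn; the "SPECT" feature matrix and labels are random placeholders, not imaging data.

```python
import numpy as np
from sklearn.cross_decomposition import PLSRegression
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(12)
n, n_voxels = 97, 5000                        # placeholder for downscaled SPECT images
X = rng.normal(size=(n, n_voxels))
y = rng.integers(0, 2, size=n)                # 0 = control, 1 = Alzheimer's disease (synthetic)

X_tr, X_te, y_tr, y_te = train_test_split(X, y, stratify=y, random_state=0)

# PLS extracts a handful of score features that are maximally covariant with the labels.
pls = PLSRegression(n_components=10).fit(X_tr, y_tr)
scores_tr, scores_te = pls.transform(X_tr), pls.transform(X_te)

# A random forest then classifies subjects from the PLS scores by majority vote of its trees.
rf = RandomForestClassifier(n_estimators=500, random_state=0).fit(scores_tr, y_tr)
print("held-out accuracy:", rf.score(scores_te, y_te))
```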
Detecting influential observations in nonlinear regression modeling of groundwater flow
Yager, Richard M.
1998-01-01
Nonlinear regression is used to estimate optimal parameter values in models of groundwater flow to ensure that differences between predicted and observed heads and flows do not result from nonoptimal parameter values. Parameter estimates can be affected, however, by observations that disproportionately influence the regression, such as outliers that exert undue leverage on the objective function. Certain statistics developed for linear regression can be used to detect influential observations in nonlinear regression if the models are approximately linear. This paper discusses the application of Cook's D, which measures the effect of omitting a single observation on a set of estimated parameter values, and the statistical parameter DFBETAS, which quantifies the influence of an observation on each parameter. The influence statistics were used to (1) identify the influential observations in the calibration of a three-dimensional, groundwater flow model of a fractured-rock aquifer through nonlinear regression, and (2) quantify the effect of omitting influential observations on the set of estimated parameter values. Comparison of the spatial distribution of Cook's D with plots of model sensitivity shows that influential observations correspond to areas where the model heads are most sensitive to certain parameters, and where predicted groundwater flow rates are largest. Five of the six discharge observations were identified as influential, indicating that reliable measurements of groundwater flow rates are valuable data in model calibration. DFBETAS are computed and examined for an alternative model of the aquifer system to identify a parameterization error in the model design that resulted in overestimation of the effect of anisotropy on horizontal hydraulic conductivity.
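For the linear-regression analogue (the study above applies these statistics to a nonlinear groundwater model), a minimal sketch of computing Cook's D and DFBETAS with statsmodels on synthetic data:

```python
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(13)
n = 60
X = sm.add_constant(rng.normal(size=(n, 2)))      # intercept plus two explanatory variables
y = X @ np.array([1.0, 2.0, -1.0]) + rng.normal(scale=0.5, size=n)
y[5] += 8.0                                       # plant one influential outlier

fit = sm.OLS(y, X).fit()
influence = fit.get_influence()

cooks_d, _ = influence.cooks_distance             # one Cook's D per observation
dfbetas = influence.dfbetas                       # influence of each observation on each parameter

flagged = np.where(cooks_d > 4 / n)[0]            # a common rule-of-thumb cutoff
print("influential observations:", flagged)
print("largest |DFBETAS| per parameter:", np.abs(dfbetas).max(axis=0).round(2))
```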
Additivity Principle in High-Dimensional Deterministic Systems
NASA Astrophysics Data System (ADS)
Saito, Keiji; Dhar, Abhishek
2011-12-01
The additivity principle (AP), conjectured by Bodineau and Derrida [Phys. Rev. Lett. 92, 180601 (2004); doi:10.1103/PhysRevLett.92.180601], is discussed for the case of heat conduction in three-dimensional disordered harmonic lattices to consider the effects of deterministic dynamics, higher dimensionality, and different transport regimes, i.e., ballistic, diffusive, and anomalous transport. The cumulant generating function (CGF) for heat transfer is accurately calculated and compared with the one given by the AP. In the diffusive regime, we find clear agreement with the conjecture even though the system is high dimensional. Surprisingly, the CGF is also well fitted by the AP in the anomalous regime. Lower-dimensional systems are also studied, and the importance of three-dimensionality for the validity of the AP is stressed.
Wang, Chen; Xu, Gui-Jun; Han, Zhe; Jiang, Xuan; Zhang, Cheng-Bao; Dong, Qiang; Ma, Jian-Xiong; Ma, Xin-Long
2015-11-01
The aim of the study was to introduce a new method for measuring the residual displacement of the femoral head after internal fixation, to explore the relationship between residual displacement and osteonecrosis of the femoral head (ONFH), and to evaluate the risk factors associated with ONFH in patients with femoral neck fractures treated by closed reduction and percutaneous cannulated screw fixation. One hundred and fifty patients who sustained intracapsular femoral neck fractures between January 2011 and April 2013 were enrolled in the study. All were treated with closed reduction and percutaneous cannulated screw internal fixation. The residual displacement of the femoral head after surgery was measured by 3-dimensional reconstruction, which evaluated the quality of the reduction. Other data that might affect prognosis were also obtained from outpatient follow-up, telephone calls, or case reviews. Multivariate logistic regression analysis was applied to assess the relationship between the risk factors and ONFH. ONFH occurred in 27 patients (18%). Significant differences were observed for the residual displacement of the femoral head and the preoperative Garden classification. Moreover, with the new technique we found some residual displacement of the femoral head in all patients whose reduction had been rated high quality on x-ray, and there was a close relationship between residual displacement and ONFH. Evaluating the quality of reduction by x-ray is therefore limited, whereas three-dimensional reconstruction and digital measurement provide a more accurate assessment. Residual displacement of the femoral head and the preoperative Garden classification were risk factors for ONFH, and high-quality reduction is necessary to avoid complications.
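As a rough illustration of the multivariate logistic regression step, the sketch below fits ONFH status against candidate risk factors with statsmodels on synthetic data; the column names, coefficients, and data are hypothetical stand-ins, not the study's variables or results.

```python
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

rng = np.random.default_rng(2)
n = 150
df = pd.DataFrame({
    "residual_displacement_mm": rng.gamma(2.0, 1.5, size=n),   # from the 3-D measurement
    "garden_class": rng.integers(1, 5, size=n),                # preoperative Garden classification
    "age": rng.normal(60.0, 10.0, size=n),
})
# Synthetic outcome with a positive effect of displacement and Garden class on ONFH risk.
logit = 0.4 * df.residual_displacement_mm + 0.6 * df.garden_class - 4.0
df["onfh"] = (rng.random(n) < 1.0 / (1.0 + np.exp(-logit))).astype(int)

model = smf.logit("onfh ~ residual_displacement_mm + C(garden_class) + age", data=df).fit()
print(model.summary())
```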
Spectral Reconstruction Based on Svm for Cross Calibration
NASA Astrophysics Data System (ADS)
Gao, H.; Ma, Y.; Liu, W.; He, H.
2017-05-01
Chinese HY-1C/1D satellites will carry a 5 nm/10 nm-resolution visible-near-infrared (VNIR) hyperspectral sensor with a solar calibrator for cross-calibration with other sensors. The hyperspectral radiance data consist of average radiances over the sensor's passbands and therefore carry a spectral smoothing effect, so a transform from the hyperspectral radiance data to 1-nm-resolution apparent spectral radiance must be implemented by spectral reconstruction. To avoid the noise accumulation and deterioration that an iterative algorithm suffers after several iterations, a regression method based on support vector machines (SVM) is proposed; it can closely approximate arbitrarily complex nonlinear relationships and offers good generalization through learning. From a system point of view, the relationship between the apparent radiance and the equivalent (band-averaged) radiance is a nonlinear mapping introduced by the spectral response function (SRF); the SVM transforms this low-dimensional nonlinear problem into a high-dimensional linear one through a kernel function and obtains the global optimum by solving a quadratic program. The experiment is performed on 6S-simulated spectra that account for the SRF and SNR of the hyperspectral sensor, measured reflectance spectra of water bodies, and different atmospheric conditions. The comparison shows that, first, the proposed method reconstructs with higher accuracy, especially for high-frequency signal components; second, as the spectral resolution of the hyperspectral sensor decreases, the proposed method degrades less than the iterative method; and finally, the root mean square relative error (RMSRE), which measures the difference between the reconstructed and the real spectrum over the whole spectral range, is at least halved by the proposed method.
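A minimal sketch of this kind of kernel regression for spectral reconstruction is shown below, using scikit-learn's SVR with one regressor per 1-nm output sample; the synthetic spectra, the box-average passband model, and the hyperparameters are assumptions for illustration, not the paper's 6S-simulated data or settings.

```python
import numpy as np
from sklearn.multioutput import MultiOutputRegressor
from sklearn.svm import SVR

rng = np.random.default_rng(3)
n_spectra, n_bands, width = 300, 50, 10          # 50 passbands, each 10 nm wide -> 500 nm range

# Synthetic "true" 1-nm spectra (smooth, positive) and their band-averaged counterparts;
# the passband smoothing is approximated here by a simple box average per band.
spectra = 10.0 + np.cumsum(rng.normal(scale=0.1, size=(n_spectra, n_bands * width)), axis=1)
bands = spectra.reshape(n_spectra, n_bands, width).mean(axis=2)

# One RBF-kernel SVR per 1-nm output sample: a nonlinear regression from the 50 band
# radiances to the fine-resolution apparent radiance.
model = MultiOutputRegressor(SVR(kernel="rbf", C=10.0, epsilon=0.01))
model.fit(bands[:250], spectra[:250])

reconstructed = model.predict(bands[250:])
rmsre = np.sqrt(np.mean(((reconstructed - spectra[250:]) / spectra[250:]) ** 2))
print(rmsre)
```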
Surface-Sensitive Microwear Texture Analysis of Attrition and Erosion.
Ranjitkar, S; Turan, A; Mann, C; Gully, G A; Marsman, M; Edwards, S; Kaidonis, J A; Hall, C; Lekkas, D; Wetselaar, P; Brook, A H; Lobbezoo, F; Townsend, G C
2017-03-01
Scale-sensitive fractal analysis of high-resolution 3-dimensional surface reconstructions of wear patterns has advanced our knowledge in evolutionary biology, and has opened up opportunities for translational applications in clinical practice. To elucidate the microwear characteristics of attrition and erosion in worn natural teeth, we scanned 50 extracted human teeth using a confocal profiler at a high optical resolution (X-Y, 0.17 µm; Z < 3 nm). Our hypothesis was that microwear complexity would be greater in erosion and that anisotropy would be greater in attrition. The teeth were divided into 4 groups, including 2 wear types (attrition and erosion) and 2 locations (anterior and posterior teeth; n = 12 for each anterior group, n = 13 for each posterior group) for 2 tissue types (enamel and dentine). The raw 3-dimensional data cloud was subjected to a newly developed rigorous standardization technique to reduce interscanner variability as well as to filter anomalous scanning data. Linear mixed effects (regression) analyses conducted separately for the dependent variables, complexity and anisotropy, showed the following effects of the independent variables: significant interactions between wear type and tissue type (P = 0.0157 and P = 0.0003, respectively) and significant effects of location (P < 0.0001 and P = 0.0035, respectively). There were significant associations between complexity and anisotropy when the dependent variable was either complexity (P = 0.0003) or anisotropy (P = 0.0014). Our findings of greater complexity in erosion and greater anisotropy in attrition confirm our hypothesis. The greatest geometric means were noted in dentine erosion for complexity and dentine attrition for anisotropy. Dentine also exhibited microwear characteristics that were more consistent with wear types than enamel. Overall, our findings could complement macrowear assessment in dental clinical practice and research and could assist in the early detection and management of pathologic tooth wear.
Hierarchical Discriminant Analysis.
Lu, Di; Ding, Chuntao; Xu, Jinliang; Wang, Shangguang
2018-01-18
The Internet of Things (IoT) generates large volumes of high-dimensional sensor data. Processing high-dimensional data (e.g., for data visualization and data classification) is difficult, so effective subspace learning algorithms are needed to learn a latent subspace that preserves the intrinsic structure of the high-dimensional data while abandoning the least useful information for subsequent processing. In this context, many subspace learning algorithms have been presented. However, in the process of transforming the high-dimensional data into the low-dimensional space, the large difference between the sum of inter-class distances and the sum of intra-class distances for distinct data may cause a bias problem, in which the contribution of the intra-class distance is overwhelmed. To address this problem, we propose a novel algorithm called Hierarchical Discriminant Analysis (HDA). It minimizes the sum of intra-class distances first, and then maximizes the sum of inter-class distances, balancing the bias from the inter-class and intra-class terms to achieve better performance. Extensive experiments are conducted on several benchmark face datasets. The results reveal that HDA obtains better performance than other dimensionality reduction algorithms.
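One way to read this two-stage idea is sketched below in NumPy: project first onto the directions of smallest within-class scatter, then, inside that subspace, onto the directions of largest between-class scatter. This is an interpretation for illustration only; the paper's exact HDA objective and solution are not reproduced here.

```python
import numpy as np

def hda_like_projection(X, y, k_intra, k_inter):
    """Two-stage sketch: first shrink intra-class scatter, then spread inter-class scatter."""
    classes = np.unique(y)
    # Within-class scatter matrix.
    Sw = sum(np.cov(X[y == c].T, bias=True) * (y == c).sum() for c in classes)
    # Stage 1: keep the directions along which within-class scatter is smallest.
    _, w_vecs = np.linalg.eigh(Sw)                   # eigenvalues in ascending order
    P1 = w_vecs[:, :k_intra]
    Z = X @ P1
    # Stage 2: within that subspace, keep the directions of largest between-class scatter.
    mean_all = Z.mean(axis=0)
    Sb = sum((y == c).sum() * np.outer(Z[y == c].mean(axis=0) - mean_all,
                                       Z[y == c].mean(axis=0) - mean_all) for c in classes)
    _, b_vecs = np.linalg.eigh(Sb)
    P2 = b_vecs[:, -k_inter:]                        # columns with the largest eigenvalues
    return P1 @ P2                                   # maps the original space to k_inter dimensions

# Usage on synthetic data: project 100-dimensional samples from 3 classes to 2 dimensions.
rng = np.random.default_rng(4)
X = np.vstack([rng.normal(loc=c, size=(40, 100)) for c in range(3)])
y = np.repeat(np.arange(3), 40)
W = hda_like_projection(X, y, k_intra=20, k_inter=2)
print((X @ W).shape)
```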
High-dimensional quantum cloning and applications to quantum hacking
Bouchard, Frédéric; Fickler, Robert; Boyd, Robert W.; Karimi, Ebrahim
2017-01-01
Attempts at cloning a quantum system result in the introduction of imperfections in the state of the copies. This is a consequence of the no-cloning theorem, which is a fundamental law of quantum physics and the backbone of security for quantum communications. Although perfect copies are prohibited, a quantum state may be copied with maximal accuracy via various optimal cloning schemes. Optimal quantum cloning, which lies at the border of the physical limit imposed by the no-signaling theorem and the Heisenberg uncertainty principle, has been experimentally realized for low-dimensional photonic states. However, an increase in the dimensionality of quantum systems is greatly beneficial to quantum computation and communication protocols. Nonetheless, no experimental demonstration of optimal cloning machines has hitherto been shown for high-dimensional quantum systems. We perform optimal cloning of high-dimensional photonic states by means of the symmetrization method. We show the universality of our technique by conducting cloning of numerous arbitrary input states and fully characterize our cloning machine by performing quantum state tomography on cloned photons. In addition, a cloning attack on a Bennett and Brassard (BB84) quantum key distribution protocol is experimentally demonstrated to reveal the robustness of high-dimensional states in quantum cryptography. PMID:28168219
Inverse finite-size scaling for high-dimensional significance analysis
NASA Astrophysics Data System (ADS)
Xu, Yingying; Puranen, Santeri; Corander, Jukka; Kabashima, Yoshiyuki
2018-06-01
We propose an efficient procedure for significance determination in high-dimensional dependence learning based on surrogate data testing, termed inverse finite-size scaling (IFSS). The IFSS method is based on our discovery of a universal scaling property of random matrices which enables inference about signal behavior from much smaller scale surrogate data than the dimensionality of the original data. As a motivating example, we demonstrate the procedure for ultra-high-dimensional Potts models with on the order of 10^10 parameters. IFSS reduces the computational effort of the data-testing procedure by several orders of magnitude, making it very efficient for practical purposes. This approach thus holds considerable potential for generalization to other types of complex models.
How many stakes are required to measure the mass balance of a glacier?
Fountain, A.G.; Vecchia, A.
1999-01-01
Glacier mass balance is estimated for South Cascade Glacier and Maclure Glacier using a one-dimensional regression of mass balance with altitude as an alternative to the traditional approach of contouring mass balance values. One attractive feature of regression is that it can be applied to sparse data sets where contouring is not possible and can provide an objective error estimate for the result. Regression methods yielded mass balance values equivalent to those from contouring methods. The effect of the number of mass balance measurements on the final value for the glacier showed that sample sizes as small as five stakes provided reasonable estimates, although the error estimates were greater than for larger sample sizes. Different spatial patterns of measurement locations showed no appreciable influence on the final value as long as different surface altitudes were intermittently sampled over the altitude range of the glacier. Two different regression equations were examined, a quadratic and a piecewise linear spline, and comparison of the results showed little sensitivity to the type of equation. These results point to the dominant effect of the gradient of mass balance with altitude on alpine glaciers compared with transverse variations. The number of mass balance measurements required to determine the glacier balance appears to be scale invariant for small glaciers, and five to ten stakes are sufficient.
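The calculation implied here (regress stake balances on altitude, then integrate over the glacier's area-altitude distribution) can be sketched in a few lines; the stake values, hypsometry, and quadratic fit below are made-up numbers for illustration, not data from either glacier.

```python
import numpy as np

stake_alt = np.array([1650.0, 1750.0, 1850.0, 1950.0, 2050.0])     # m a.s.l., five stakes
stake_bal = np.array([-2.1, -1.2, -0.3, 0.4, 0.8])                 # m w.e. at each stake

coef = np.polyfit(stake_alt, stake_bal, deg=2)                      # quadratic balance profile b(z)

band_alt = np.arange(1600.0, 2101.0, 50.0)                          # altitude bands (m)
band_area = np.array([0.05, 0.10, 0.18, 0.22, 0.20, 0.12, 0.08,
                      0.03, 0.01, 0.005, 0.005])                    # km^2 per band (hypsometry)
band_bal = np.polyval(coef, band_alt)

# Glacier-wide specific balance: area-weighted mean of the regressed altitude profile.
glacier_balance = np.sum(band_bal * band_area) / band_area.sum()
print(round(glacier_balance, 2), "m w.e.")
```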
NASA Technical Reports Server (NTRS)
Chan, S. T. K.; Lee, C. H.; Brashears, M. R.
1975-01-01
A finite element algorithm for solving unsteady, three-dimensional high velocity impact problems is presented. A computer program was developed based on the Eulerian hydroelasto-viscoplastic formulation and the utilization of the theorem of weak solutions. The equations solved consist of conservation of mass, momentum, and energy, equation of state, and appropriate constitutive equations. The solution technique is a time-dependent finite element analysis utilizing three-dimensional isoparametric elements, in conjunction with a generalized two-step time integration scheme. The developed code was demonstrated by solving one-dimensional as well as three-dimensional impact problems for both the inviscid hydrodynamic model and the hydroelasto-viscoplastic model.
Kelleher, John D; Ross, Robert J; Sloan, Colm; Mac Namee, Brian
2011-02-01
Although data-driven spatial template models provide a practical and cognitively motivated mechanism for characterizing spatial term meaning, the influence of perceptual rather than solely geometric and functional properties has yet to be systematically investigated. In the light of this, in this paper, we investigate the effects of the perceptual phenomenon of object occlusion on the semantics of projective terms. We did this by conducting a study to test whether object occlusion had a noticeable effect on the acceptance values assigned to projective terms with respect to a 2.5-dimensional visual stimulus. Based on the data collected, a regression model was constructed and presented. Subsequent analysis showed that the regression model that included the occlusion factor outperformed an adaptation of Regier & Carlson's well-regarded AVS model for that same spatial configuration.
NASA Astrophysics Data System (ADS)
Hua, Yi-Lin; Zhou, Zong-Quan; Liu, Xiao; Yang, Tian-Shu; Li, Zong-Feng; Li, Pei-Yun; Chen, Geng; Xu, Xiao-Ye; Tang, Jian-Shun; Xu, Jin-Shi; Li, Chuan-Feng; Guo, Guang-Can
2018-01-01
A photon pair can be entangled in many degrees of freedom such as polarization, time bins, and orbital angular momentum (OAM). Among them, the OAM of photons can be entangled in an infinite-dimensional Hilbert space which enhances the channel capacity of sharing information in a network. Twisted photons generated by spontaneous parametric down-conversion offer an opportunity to create this high-dimensional entanglement, but a photon pair generated by this process is typically wideband, which makes it difficult to interface with the quantum memories in a network. Here we propose an annual-ring-type quasi-phase-matching (QPM) crystal for generation of the narrowband high-dimensional entanglement. The structure of the QPM crystal is designed by tracking the geometric divergences of the OAM modes that comprise the entangled state. The dimensionality and the quality of the entanglement can be greatly enhanced with the annual-ring-type QPM crystal.
Data center thermal management
Hamann, Hendrik F.; Li, Hongfei
2016-02-09
Historical high-spatial-resolution temperature data and dynamic temperature sensor measurement data may be used to predict temperature. A first formulation may be derived based on the historical high-spatial-resolution temperature data for determining a temperature at any point in 3-dimensional space. The dynamic temperature sensor measurement data may be calibrated based on the historical high-spatial-resolution temperature data at a corresponding historical time. Sensor temperature data at a plurality of sensor locations may be predicted for a future time based on the calibrated dynamic temperature sensor measurement data. A three-dimensional temperature spatial distribution associated with the future time may be generated based on the forecasted sensor temperature data and the first formulation. The three-dimensional temperature spatial distribution associated with the future time may be projected to a two-dimensional temperature distribution, and temperature in the future time for a selected space location may be forecasted dynamically based on said two-dimensional temperature distribution.
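The pipeline described in this abstract can be roughed out as follows: fit a spatial "formulation" to historical high-resolution data, calibrate and persist the sparse live sensor readings, and project the predicted 3-D field to 2-D. The sketch below uses SciPy RBF interpolation, a single mean-offset calibration, and a persistence forecast as illustrative stand-ins; the patented method itself is not reproduced.

```python
import numpy as np
from scipy.interpolate import RBFInterpolator

rng = np.random.default_rng(5)

# Historical high-spatial-resolution survey: temperatures at many (x, y, z) points.
hist_pts = rng.uniform(0, 10, size=(500, 3))
hist_temp = 20.0 + 0.8 * hist_pts[:, 2] + rng.normal(0, 0.2, 500)   # warmer near the ceiling

# "First formulation": temperature at any 3-D point, fitted to the historical survey.
field = RBFInterpolator(hist_pts, hist_temp, smoothing=1.0)

# Sparse dynamic sensors: calibrate their readings against the historical field at the
# same locations (here, one mean offset), then forecast by simple persistence.
sensor_pts = rng.uniform(0, 10, size=(20, 3))
raw = field(sensor_pts) + rng.normal(0, 0.5, 20) + 1.0              # biased live readings
forecast = raw + (field(sensor_pts) - raw).mean()                   # calibrated, persisted forward

# Re-fit the spatial formulation to the forecasted sensor values and project to 2-D
# (here, the maximum over height) for a floor-plan hot-spot map.
future_field = RBFInterpolator(sensor_pts, forecast, smoothing=1.0)
x, y, z = np.meshgrid(np.linspace(0, 10, 25), np.linspace(0, 10, 25), np.linspace(0, 10, 5))
grid = np.column_stack([x.ravel(), y.ravel(), z.ravel()])
temp_2d = future_field(grid).reshape(x.shape).max(axis=2)
print(temp_2d.shape)
```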
NASA Astrophysics Data System (ADS)
Xin, Shengchang; Yang, Na; Gao, Fei; Zhao, Jing; Li, Liang; Teng, Chao
2017-08-01
Three-dimensional carbon nanotube frameworks have been prepared via pyrolysis of polypyrrole (PPy) nanotube aerogels that are synthesized by simultaneous self-degraded template synthesis and hydrogel assembly, followed by freeze-drying. The microstructure and composition of the materials are investigated by thermal gravimetric analysis, Raman spectroscopy, X-ray photoelectron spectroscopy, transmission electron microscopy, and specific surface area analysis. The results confirm the formation of three-dimensional carbon nanotube frameworks with low density, good mechanical properties, and high specific surface area. Compared with the PPy aerogel precursor, the as-prepared three-dimensional carbon nanotube frameworks exhibit outstanding adsorption capacity towards organic dyes. Moreover, electrochemical tests show that the products possess high specific capacitance, good rate capability, and excellent cycling performance with no capacitance loss over 1000 cycles. These characteristics collectively indicate the potential of the three-dimensional polypyrrole-derived carbon nanotube framework as a promising macroscopic device for environmental and energy storage applications.
Data Visualization for ESM and ELINT: Visualizing 3D and Hyper Dimensional Data
2011-06-01
technique to present multiple 2D views was devised by D. Asimov. He assembled multiple two-dimensional scatter plot views of the hyper dimensional...Viewing Multidimensional Data”, D. Asimov, SIAM Journal on Scientific and Statistical Computing, vol. 6, pp. 128-143, 1985. [2] “High-Dimensional
High-Fidelity Real-Time Simulation on Deployed Platforms
2010-08-26
three-dimensional transient heat conduction “Swiss Cheese” problem; and a three-dimensional unsteady incompressible Navier-Stokes low-Reynolds-number...our approach with three examples: a two-dimensional Helmholtz acoustics “horn” problem; a three-dimensional transient heat conduction “Swiss Cheese...solutions; a transient linear heat conduction problem in a three-dimensional “Swiss Cheese” configuration Ω — to illustrate treatment of many
In situ detection of tree root distribution and biomass by multi-electrode resistivity imaging.
Amato, Mariana; Basso, Bruno; Celano, Giuseppe; Bitella, Giovanni; Morelli, Gianfranco; Rossi, Roberta
2008-10-01
Traditional methods for studying tree roots are destructive and labor intensive, but available nondestructive techniques are applicable only to small-scale studies or are strongly limited by soil conditions and root size. Soil electrical resistivity measured by geoelectrical methods has the potential to detect belowground plant structures, but quantitative relationships of these measurements with root traits have not been assessed. We tested the ability of two-dimensional (2-D) DC resistivity tomography to detect the spatial variability of roots and to quantify their biomass in a tree stand. A high-resolution resistivity tomogram was generated along an 11.75 m transect under an Alnus glutinosa (L.) Gaertn. stand based on an alpha-Wenner configuration with 48 electrodes spaced 0.25 m apart. Data were processed by a 2-D finite-element inversion algorithm, and corrected for soil temperature. Data acquisition, inversion and imaging were completed in the field within 60 min. Root dry mass per unit soil volume (root mass density, RMD) was measured destructively on soil samples collected to a depth of 1.05 m. Soil sand, silt, clay and organic matter contents, electrical conductivity, water content and pH were measured on a subset of samples. The spatial pattern of soil resistivity closely matched the spatial distribution of RMD. Multiple linear regression showed that only RMD and soil water content were related to soil resistivity along the transect. Regression analysis of RMD against soil resistivity revealed a highly significant logistic relationship (n = 97), which was confirmed on a separate dataset (n = 67), showing that soil resistivity was quantitatively related to belowground tree root biomass. This relationship provides a basis for developing quick nondestructive methods for detecting root distribution and quantifying root biomass, as well as for optimizing sampling strategies for studying root-driven phenomena.
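The "highly significant logistic relationship" between root mass density and resistivity can be fitted with a standard sigmoid curve fit; the sketch below uses SciPy's curve_fit on synthetic data, and the parameter values are illustrative, not the coefficients reported in the study.

```python
import numpy as np
from scipy.optimize import curve_fit

def logistic(rho, a, b, rho0):
    """Three-parameter logistic: RMD rises toward an asymptote a with resistivity."""
    return a / (1.0 + np.exp(-b * (rho - rho0)))

rng = np.random.default_rng(6)
resistivity = rng.uniform(10, 300, 97)                        # ohm m, n = 97 as in the study
rmd_true = logistic(resistivity, 2.5, 0.03, 120.0)            # kg m^-3 (synthetic)
rmd = rmd_true + rng.normal(0, 0.15, resistivity.size)

params, cov = curve_fit(logistic, resistivity, rmd, p0=[2.0, 0.02, 100.0])
print(params)   # fitted asymptote, slope, and inflection resistivity
```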
NASA Astrophysics Data System (ADS)
Salawu, Emmanuel Oluwatobi; Hesse, Evelyn; Stopford, Chris; Davey, Neil; Sun, Yi
2017-11-01
Better understanding and characterization of cloud particles, whose properties and distributions affect climate and weather, are essential for the understanding of present climate and climate change. Since imaging cloud probes have limitations of optical resolution, especially for small particles (with diameter < 25 μm), instruments like the Small Ice Detector (SID) probes, which capture high-resolution spatial light scattering patterns from individual particles down to 1 μm in size, have been developed. In this work, we have proposed a method using Machine Learning techniques to estimate simulated particles' orientation-averaged projected sizes (PAD) and aspect ratio from their 2D scattering patterns. The two-dimensional light scattering patterns (2DLSP) of hexagonal prisms are computed using the Ray Tracing with Diffraction on Facets (RTDF) model. The 2DLSP cover the same angular range as the SID probes. We generated 2DLSP for 162 hexagonal prisms at 133 orientations for each. In a first step, the 2DLSP were transformed into rotation-invariant Zernike moments (ZMs), which are particularly suitable for analyses of pattern symmetry. Then we used ZMs, summed intensities, and root mean square contrast as inputs to the advanced Machine Learning methods. We created one random forests classifier for predicting prism orientation, 133 orientation-specific (OS) support vector classification models for predicting the prism aspect-ratios, 133 OS support vector regression models for estimating prism sizes, and another 133 OS Support Vector Regression (SVR) models for estimating the size PADs. We have achieved a high accuracy of 0.99 in predicting prism aspect ratios, and a low value of normalized mean square error of 0.004 for estimating the particle's size and size PADs.
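As an illustration of the orientation-specific regression step, the sketch below trains an RBF-kernel SVR to map pattern features to particle size with scikit-learn; the features here are random stand-ins for the Zernike moments, summed intensities, and RMS contrast, and the Zernike computation from real scattering patterns is not reproduced.

```python
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVR

rng = np.random.default_rng(7)
X = rng.normal(size=(162, 30))                      # 162 prisms x 30 pattern features (stand-ins)
size = np.abs(X[:, :5]).sum(axis=1) * 3.0 + 5.0     # synthetic particle sizes (micrometres)

X_tr, X_te, y_tr, y_te = train_test_split(X, size, test_size=0.25, random_state=0)
model = make_pipeline(StandardScaler(), SVR(kernel="rbf", C=10.0, epsilon=0.1))
model.fit(X_tr, y_tr)

# Normalized mean square error on held-out prisms, analogous to the paper's error metric.
nmse = np.mean((model.predict(X_te) - y_te) ** 2) / np.var(y_te)
print(nmse)
```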
D'Iorio, Alfonsina; Vitale, Carmine; Piscopo, Fausta; Baiano, Chiara; Falanga, Anna Paola; Longo, Katia; Amboni, Marianna; Barone, Paolo; Santangelo, Gabriella
2017-10-01
Parkinson's disease (PD) is characterized by a wide spectrum of non-motor symptoms that may negatively impact the patient's activities of daily living and reduce health-related quality of life (HRQoL). The present study explored the impact of specific non-motor symptoms on HRQoL in PD. Eighty-four outpatients underwent the Montreal Cognitive Assessment (MoCA), assessing global functioning, and several questionnaires to assess depression, apathy, impulse control disorders (ICD), anxiety, anhedonia and the functional impact of cognitive impairment. Perceived QoL was assessed by the Parkinson's Disease Questionnaire (PDQ-8). The PD sample was divided into patients with high and low HRQoL around the median of the PDQ-8 and compared on clinical features and cognitive and neuropsychiatric variables. A linear regression analysis was applied in which global functioning, apathy, depression, anxiety, anhedonia, ICD and functional autonomy scores were entered as independent variables and the PDQ-8 score as the dependent variable. Patients with lower HRQoL were more depressed, apathetic and anxious and showed a more severe reduction of functional autonomy and global functioning than patients with high HRQoL. The regression analysis revealed that a higher level of anxiety, executive apathy and more reduced functional autonomy were significantly associated with a higher score on the PDQ-8. These findings indicate that anxiety, apathy associated with impaired planning, attention and organization (i.e., executive apathy evaluated by the Dimensional Apathy Scale) and reduced functional autonomy contribute significantly to reduced HRQoL in PD. Therefore, early identification and management of these neuropsychiatric symptoms is relevant to preserving HRQoL in PD. Copyright © 2017 Elsevier Ltd. All rights reserved.