Sample records for sparse regression application

  1. Deep ensemble learning of sparse regression models for brain disease diagnosis.

    PubMed

    Suk, Heung-Il; Lee, Seong-Whan; Shen, Dinggang

    2017-04-01

    Recent studies on brain imaging analysis witnessed the core roles of machine learning techniques in computer-assisted intervention for brain disease diagnosis. Of various machine-learning techniques, sparse regression models have proved their effectiveness in handling high-dimensional data but with a small number of training samples, especially in medical problems. In the meantime, deep learning methods have been making great successes by outperforming the state-of-the-art performances in various applications. In this paper, we propose a novel framework that combines the two conceptually different methods of sparse regression and deep learning for Alzheimer's disease/mild cognitive impairment diagnosis and prognosis. Specifically, we first train multiple sparse regression models, each of which is trained with different values of a regularization control parameter. Thus, our multiple sparse regression models potentially select different feature subsets from the original feature set; thereby they have different powers to predict the response values, i.e., clinical label and clinical scores in our work. By regarding the response values from our sparse regression models as target-level representations, we then build a deep convolutional neural network for clinical decision making, which thus we call 'Deep Ensemble Sparse Regression Network.' To our best knowledge, this is the first work that combines sparse regression models with deep neural network. In our experiments with the ADNI cohort, we validated the effectiveness of the proposed method by achieving the highest diagnostic accuracies in three classification tasks. We also rigorously analyzed our results and compared with the previous studies on the ADNI cohort in the literature. Copyright © 2017 Elsevier B.V. All rights reserved.

  2. Deep ensemble learning of sparse regression models for brain disease diagnosis

    PubMed Central

    Suk, Heung-Il; Lee, Seong-Whan; Shen, Dinggang

    2018-01-01

    Recent studies on brain imaging analysis witnessed the core roles of machine learning techniques in computer-assisted intervention for brain disease diagnosis. Of various machine-learning techniques, sparse regression models have proved their effectiveness in handling high-dimensional data but with a small number of training samples, especially in medical problems. In the meantime, deep learning methods have been making great successes by outperforming the state-of-the-art performances in various applications. In this paper, we propose a novel framework that combines the two conceptually different methods of sparse regression and deep learning for Alzheimer’s disease/mild cognitive impairment diagnosis and prognosis. Specifically, we first train multiple sparse regression models, each of which is trained with different values of a regularization control parameter. Thus, our multiple sparse regression models potentially select different feature subsets from the original feature set; thereby they have different powers to predict the response values, i.e., clinical label and clinical scores in our work. By regarding the response values from our sparse regression models as target-level representations, we then build a deep convolutional neural network for clinical decision making, which thus we call ‘ Deep Ensemble Sparse Regression Network.’ To our best knowledge, this is the first work that combines sparse regression models with deep neural network. In our experiments with the ADNI cohort, we validated the effectiveness of the proposed method by achieving the highest diagnostic accuracies in three classification tasks. We also rigorously analyzed our results and compared with the previous studies on the ADNI cohort in the literature. PMID:28167394

  3. Extreme Sparse Multinomial Logistic Regression: A Fast and Robust Framework for Hyperspectral Image Classification

    NASA Astrophysics Data System (ADS)

    Cao, Faxian; Yang, Zhijing; Ren, Jinchang; Ling, Wing-Kuen; Zhao, Huimin; Marshall, Stephen

    2017-12-01

    Although the sparse multinomial logistic regression (SMLR) has provided a useful tool for sparse classification, it suffers from inefficacy in dealing with high dimensional features and manually set initial regressor values. This has significantly constrained its applications for hyperspectral image (HSI) classification. In order to tackle these two drawbacks, an extreme sparse multinomial logistic regression (ESMLR) is proposed for effective classification of HSI. First, the HSI dataset is projected to a new feature space with randomly generated weight and bias. Second, an optimization model is established by the Lagrange multiplier method and the dual principle to automatically determine a good initial regressor for SMLR via minimizing the training error and the regressor value. Furthermore, the extended multi-attribute profiles (EMAPs) are utilized for extracting both the spectral and spatial features. A combinational linear multiple features learning (MFL) method is proposed to further enhance the features extracted by ESMLR and EMAPs. Finally, the logistic regression via the variable splitting and the augmented Lagrangian (LORSAL) is adopted in the proposed framework for reducing the computational time. Experiments are conducted on two well-known HSI datasets, namely the Indian Pines dataset and the Pavia University dataset, which have shown the fast and robust performance of the proposed ESMLR framework.

  4. Incorporating biological information in sparse principal component analysis with application to genomic data.

    PubMed

    Li, Ziyi; Safo, Sandra E; Long, Qi

    2017-07-11

    Sparse principal component analysis (PCA) is a popular tool for dimensionality reduction, pattern recognition, and visualization of high dimensional data. It has been recognized that complex biological mechanisms occur through concerted relationships of multiple genes working in networks that are often represented by graphs. Recent work has shown that incorporating such biological information improves feature selection and prediction performance in regression analysis, but there has been limited work on extending this approach to PCA. In this article, we propose two new sparse PCA methods called Fused and Grouped sparse PCA that enable incorporation of prior biological information in variable selection. Our simulation studies suggest that, compared to existing sparse PCA methods, the proposed methods achieve higher sensitivity and specificity when the graph structure is correctly specified, and are fairly robust to misspecified graph structures. Application to a glioblastoma gene expression dataset identified pathways that are suggested in the literature to be related with glioblastoma. The proposed sparse PCA methods Fused and Grouped sparse PCA can effectively incorporate prior biological information in variable selection, leading to improved feature selection and more interpretable principal component loadings and potentially providing insights on molecular underpinnings of complex diseases.

  5. Real-time model learning using Incremental Sparse Spectrum Gaussian Process Regression.

    PubMed

    Gijsberts, Arjan; Metta, Giorgio

    2013-05-01

    Novel applications in unstructured and non-stationary human environments require robots that learn from experience and adapt autonomously to changing conditions. Predictive models therefore not only need to be accurate, but should also be updated incrementally in real-time and require minimal human intervention. Incremental Sparse Spectrum Gaussian Process Regression is an algorithm that is targeted specifically for use in this context. Rather than developing a novel algorithm from the ground up, the method is based on the thoroughly studied Gaussian Process Regression algorithm, therefore ensuring a solid theoretical foundation. Non-linearity and a bounded update complexity are achieved simultaneously by means of a finite dimensional random feature mapping that approximates a kernel function. As a result, the computational cost for each update remains constant over time. Finally, algorithmic simplicity and support for automated hyperparameter optimization ensures convenience when employed in practice. Empirical validation on a number of synthetic and real-life learning problems confirms that the performance of Incremental Sparse Spectrum Gaussian Process Regression is superior with respect to the popular Locally Weighted Projection Regression, while computational requirements are found to be significantly lower. The method is therefore particularly suited for learning with real-time constraints or when computational resources are limited. Copyright © 2012 Elsevier Ltd. All rights reserved.

  6. High-Dimensional Sparse Factor Modeling: Applications in Gene Expression Genomics

    PubMed Central

    Carvalho, Carlos M.; Chang, Jeffrey; Lucas, Joseph E.; Nevins, Joseph R.; Wang, Quanli; West, Mike

    2010-01-01

    We describe studies in molecular profiling and biological pathway analysis that use sparse latent factor and regression models for microarray gene expression data. We discuss breast cancer applications and key aspects of the modeling and computational methodology. Our case studies aim to investigate and characterize heterogeneity of structure related to specific oncogenic pathways, as well as links between aggregate patterns in gene expression profiles and clinical biomarkers. Based on the metaphor of statistically derived “factors” as representing biological “subpathway” structure, we explore the decomposition of fitted sparse factor models into pathway subcomponents and investigate how these components overlay multiple aspects of known biological activity. Our methodology is based on sparsity modeling of multivariate regression, ANOVA, and latent factor models, as well as a class of models that combines all components. Hierarchical sparsity priors address questions of dimension reduction and multiple comparisons, as well as scalability of the methodology. The models include practically relevant non-Gaussian/nonparametric components for latent structure, underlying often quite complex non-Gaussianity in multivariate expression patterns. Model search and fitting are addressed through stochastic simulation and evolutionary stochastic search methods that are exemplified in the oncogenic pathway studies. Supplementary supporting material provides more details of the applications, as well as examples of the use of freely available software tools for implementing the methodology. PMID:21218139

  7. Sparse partial least squares regression for simultaneous dimension reduction and variable selection

    PubMed Central

    Chun, Hyonho; Keleş, Sündüz

    2010-01-01

    Partial least squares regression has been an alternative to ordinary least squares for handling multicollinearity in several areas of scientific research since the 1960s. It has recently gained much attention in the analysis of high dimensional genomic data. We show that known asymptotic consistency of the partial least squares estimator for a univariate response does not hold with the very large p and small n paradigm. We derive a similar result for a multivariate response regression with partial least squares. We then propose a sparse partial least squares formulation which aims simultaneously to achieve good predictive performance and variable selection by producing sparse linear combinations of the original predictors. We provide an efficient implementation of sparse partial least squares regression and compare it with well-known variable selection and dimension reduction approaches via simulation experiments. We illustrate the practical utility of sparse partial least squares regression in a joint analysis of gene expression and genomewide binding data. PMID:20107611

  8. Temporally-Constrained Group Sparse Learning for Longitudinal Data Analysis in Alzheimer’s Disease

    PubMed Central

    Jie, Biao; Liu, Mingxia; Liu, Jun

    2016-01-01

    Sparse learning has been widely investigated for analysis of brain images to assist the diagnosis of Alzheimer’s disease (AD) and its prodromal stage, i.e., mild cognitive impairment (MCI). However, most existing sparse learning-based studies only adopt cross-sectional analysis methods, where the sparse model is learned using data from a single time-point. Actually, multiple time-points of data are often available in brain imaging applications, which can be used in some longitudinal analysis methods to better uncover the disease progression patterns. Accordingly, in this paper we propose a novel temporally-constrained group sparse learning method aiming for longitudinal analysis with multiple time-points of data. Specifically, we learn a sparse linear regression model by using the imaging data from multiple time-points, where a group regularization term is first employed to group the weights for the same brain region across different time-points together. Furthermore, to reflect the smooth changes between data derived from adjacent time-points, we incorporate two smoothness regularization terms into the objective function, i.e., one fused smoothness term which requires that the differences between two successive weight vectors from adjacent time-points should be small, and another output smoothness term which requires the differences between outputs of two successive models from adjacent time-points should also be small. We develop an efficient optimization algorithm to solve the proposed objective function. Experimental results on ADNI database demonstrate that, compared with conventional sparse learning-based methods, our proposed method can achieve improved regression performance and also help in discovering disease-related biomarkers. PMID:27093313

  9. Identifying predictive features in drug response using machine learning: opportunities and challenges.

    PubMed

    Vidyasagar, Mathukumalli

    2015-01-01

    This article reviews several techniques from machine learning that can be used to study the problem of identifying a small number of features, from among tens of thousands of measured features, that can accurately predict a drug response. Prediction problems are divided into two categories: sparse classification and sparse regression. In classification, the clinical parameter to be predicted is binary, whereas in regression, the parameter is a real number. Well-known methods for both classes of problems are briefly discussed. These include the SVM (support vector machine) for classification and various algorithms such as ridge regression, LASSO (least absolute shrinkage and selection operator), and EN (elastic net) for regression. In addition, several well-established methods that do not directly fall into machine learning theory are also reviewed, including neural networks, PAM (pattern analysis for microarrays), SAM (significance analysis for microarrays), GSEA (gene set enrichment analysis), and k-means clustering. Several references indicative of the application of these methods to cancer biology are discussed.

  10. Regression-based adaptive sparse polynomial dimensional decomposition for sensitivity analysis

    NASA Astrophysics Data System (ADS)

    Tang, Kunkun; Congedo, Pietro; Abgrall, Remi

    2014-11-01

    Polynomial dimensional decomposition (PDD) is employed in this work for global sensitivity analysis and uncertainty quantification of stochastic systems subject to a large number of random input variables. Due to the intimate structure between PDD and Analysis-of-Variance, PDD is able to provide simpler and more direct evaluation of the Sobol' sensitivity indices, when compared to polynomial chaos (PC). Unfortunately, the number of PDD terms grows exponentially with respect to the size of the input random vector, which makes the computational cost of the standard method unaffordable for real engineering applications. In order to address this problem of curse of dimensionality, this work proposes a variance-based adaptive strategy aiming to build a cheap meta-model by sparse-PDD with PDD coefficients computed by regression. During this adaptive procedure, the model representation by PDD only contains few terms, so that the cost to resolve repeatedly the linear system of the least-square regression problem is negligible. The size of the final sparse-PDD representation is much smaller than the full PDD, since only significant terms are eventually retained. Consequently, a much less number of calls to the deterministic model is required to compute the final PDD coefficients.

  11. A robust and efficient stepwise regression method for building sparse polynomial chaos expansions

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Abraham, Simon, E-mail: Simon.Abraham@ulb.ac.be; Raisee, Mehrdad; Ghorbaniasl, Ghader

    2017-03-01

    Polynomial Chaos (PC) expansions are widely used in various engineering fields for quantifying uncertainties arising from uncertain parameters. The computational cost of classical PC solution schemes is unaffordable as the number of deterministic simulations to be calculated grows dramatically with the number of stochastic dimension. This considerably restricts the practical use of PC at the industrial level. A common approach to address such problems is to make use of sparse PC expansions. This paper presents a non-intrusive regression-based method for building sparse PC expansions. The most important PC contributions are detected sequentially through an automatic search procedure. The variable selectionmore » criterion is based on efficient tools relevant to probabilistic method. Two benchmark analytical functions are used to validate the proposed algorithm. The computational efficiency of the method is then illustrated by a more realistic CFD application, consisting of the non-deterministic flow around a transonic airfoil subject to geometrical uncertainties. To assess the performance of the developed methodology, a detailed comparison is made with the well established LAR-based selection technique. The results show that the developed sparse regression technique is able to identify the most significant PC contributions describing the problem. Moreover, the most important stochastic features are captured at a reduced computational cost compared to the LAR method. The results also demonstrate the superior robustness of the method by repeating the analyses using random experimental designs.« less

  12. STRONG ORACLE OPTIMALITY OF FOLDED CONCAVE PENALIZED ESTIMATION.

    PubMed

    Fan, Jianqing; Xue, Lingzhou; Zou, Hui

    2014-06-01

    Folded concave penalization methods have been shown to enjoy the strong oracle property for high-dimensional sparse estimation. However, a folded concave penalization problem usually has multiple local solutions and the oracle property is established only for one of the unknown local solutions. A challenging fundamental issue still remains that it is not clear whether the local optimum computed by a given optimization algorithm possesses those nice theoretical properties. To close this important theoretical gap in over a decade, we provide a unified theory to show explicitly how to obtain the oracle solution via the local linear approximation algorithm. For a folded concave penalized estimation problem, we show that as long as the problem is localizable and the oracle estimator is well behaved, we can obtain the oracle estimator by using the one-step local linear approximation. In addition, once the oracle estimator is obtained, the local linear approximation algorithm converges, namely it produces the same estimator in the next iteration. The general theory is demonstrated by using four classical sparse estimation problems, i.e., sparse linear regression, sparse logistic regression, sparse precision matrix estimation and sparse quantile regression.

  13. STRONG ORACLE OPTIMALITY OF FOLDED CONCAVE PENALIZED ESTIMATION

    PubMed Central

    Fan, Jianqing; Xue, Lingzhou; Zou, Hui

    2014-01-01

    Folded concave penalization methods have been shown to enjoy the strong oracle property for high-dimensional sparse estimation. However, a folded concave penalization problem usually has multiple local solutions and the oracle property is established only for one of the unknown local solutions. A challenging fundamental issue still remains that it is not clear whether the local optimum computed by a given optimization algorithm possesses those nice theoretical properties. To close this important theoretical gap in over a decade, we provide a unified theory to show explicitly how to obtain the oracle solution via the local linear approximation algorithm. For a folded concave penalized estimation problem, we show that as long as the problem is localizable and the oracle estimator is well behaved, we can obtain the oracle estimator by using the one-step local linear approximation. In addition, once the oracle estimator is obtained, the local linear approximation algorithm converges, namely it produces the same estimator in the next iteration. The general theory is demonstrated by using four classical sparse estimation problems, i.e., sparse linear regression, sparse logistic regression, sparse precision matrix estimation and sparse quantile regression. PMID:25598560

  14. Optimal Sparse Upstream Sensor Placement for Hydrokinetic Turbines

    NASA Astrophysics Data System (ADS)

    Cavagnaro, Robert; Strom, Benjamin; Ross, Hannah; Hill, Craig; Polagye, Brian

    2016-11-01

    Accurate measurement of the flow field incident upon a hydrokinetic turbine is critical for performance evaluation during testing and setting boundary conditions in simulation. Additionally, turbine controllers may leverage real-time flow measurements. Particle image velocimetry (PIV) is capable of rendering a flow field over a wide spatial domain in a controlled, laboratory environment. However, PIV's lack of suitability for natural marine environments, high cost, and intensive post-processing diminish its potential for control applications. Conversely, sensors such as acoustic Doppler velocimeters (ADVs), are designed for field deployment and real-time measurement, but over a small spatial domain. Sparsity-promoting regression analysis such as LASSO is utilized to improve the efficacy of point measurements for real-time applications by determining optimal spatial placement for a small number of ADVs using a training set of PIV velocity fields and turbine data. The study is conducted in a flume (0.8 m2 cross-sectional area, 1 m/s flow) with laboratory-scale axial and cross-flow turbines. Predicted turbine performance utilizing the optimal sparse sensor network and associated regression model is compared to actual performance with corresponding PIV measurements.

  15. Regression analysis of sparse asynchronous longitudinal data.

    PubMed

    Cao, Hongyuan; Zeng, Donglin; Fine, Jason P

    2015-09-01

    We consider estimation of regression models for sparse asynchronous longitudinal observations, where time-dependent responses and covariates are observed intermittently within subjects. Unlike with synchronous data, where the response and covariates are observed at the same time point, with asynchronous data, the observation times are mismatched. Simple kernel-weighted estimating equations are proposed for generalized linear models with either time invariant or time-dependent coefficients under smoothness assumptions for the covariate processes which are similar to those for synchronous data. For models with either time invariant or time-dependent coefficients, the estimators are consistent and asymptotically normal but converge at slower rates than those achieved with synchronous data. Simulation studies evidence that the methods perform well with realistic sample sizes and may be superior to a naive application of methods for synchronous data based on an ad hoc last value carried forward approach. The practical utility of the methods is illustrated on data from a study on human immunodeficiency virus.

  16. Nonconvex Sparse Logistic Regression With Weakly Convex Regularization

    NASA Astrophysics Data System (ADS)

    Shen, Xinyue; Gu, Yuantao

    2018-06-01

    In this work we propose to fit a sparse logistic regression model by a weakly convex regularized nonconvex optimization problem. The idea is based on the finding that a weakly convex function as an approximation of the $\\ell_0$ pseudo norm is able to better induce sparsity than the commonly used $\\ell_1$ norm. For a class of weakly convex sparsity inducing functions, we prove the nonconvexity of the corresponding sparse logistic regression problem, and study its local optimality conditions and the choice of the regularization parameter to exclude trivial solutions. Despite the nonconvexity, a method based on proximal gradient descent is used to solve the general weakly convex sparse logistic regression, and its convergence behavior is studied theoretically. Then the general framework is applied to a specific weakly convex function, and a necessary and sufficient local optimality condition is provided. The solution method is instantiated in this case as an iterative firm-shrinkage algorithm, and its effectiveness is demonstrated in numerical experiments by both randomly generated and real datasets.

  17. SPReM: Sparse Projection Regression Model For High-dimensional Linear Regression *

    PubMed Central

    Sun, Qiang; Zhu, Hongtu; Liu, Yufeng; Ibrahim, Joseph G.

    2014-01-01

    The aim of this paper is to develop a sparse projection regression modeling (SPReM) framework to perform multivariate regression modeling with a large number of responses and a multivariate covariate of interest. We propose two novel heritability ratios to simultaneously perform dimension reduction, response selection, estimation, and testing, while explicitly accounting for correlations among multivariate responses. Our SPReM is devised to specifically address the low statistical power issue of many standard statistical approaches, such as the Hotelling’s T2 test statistic or a mass univariate analysis, for high-dimensional data. We formulate the estimation problem of SPREM as a novel sparse unit rank projection (SURP) problem and propose a fast optimization algorithm for SURP. Furthermore, we extend SURP to the sparse multi-rank projection (SMURP) by adopting a sequential SURP approximation. Theoretically, we have systematically investigated the convergence properties of SURP and the convergence rate of SURP estimates. Our simulation results and real data analysis have shown that SPReM out-performs other state-of-the-art methods. PMID:26527844

  18. Discriminative spatial-frequency-temporal feature extraction and classification of motor imagery EEG: An sparse regression and Weighted Naïve Bayesian Classifier-based approach.

    PubMed

    Miao, Minmin; Zeng, Hong; Wang, Aimin; Zhao, Changsen; Liu, Feixiang

    2017-02-15

    Common spatial pattern (CSP) is most widely used in motor imagery based brain-computer interface (BCI) systems. In conventional CSP algorithm, pairs of the eigenvectors corresponding to both extreme eigenvalues are selected to construct the optimal spatial filter. In addition, an appropriate selection of subject-specific time segments and frequency bands plays an important role in its successful application. This study proposes to optimize spatial-frequency-temporal patterns for discriminative feature extraction. Spatial optimization is implemented by channel selection and finding discriminative spatial filters adaptively on each time-frequency segment. A novel Discernibility of Feature Sets (DFS) criteria is designed for spatial filter optimization. Besides, discriminative features located in multiple time-frequency segments are selected automatically by the proposed sparse time-frequency segment common spatial pattern (STFSCSP) method which exploits sparse regression for significant features selection. Finally, a weight determined by the sparse coefficient is assigned for each selected CSP feature and we propose a Weighted Naïve Bayesian Classifier (WNBC) for classification. Experimental results on two public EEG datasets demonstrate that optimizing spatial-frequency-temporal patterns in a data-driven manner for discriminative feature extraction greatly improves the classification performance. The proposed method gives significantly better classification accuracies in comparison with several competing methods in the literature. The proposed approach is a promising candidate for future BCI systems. Copyright © 2016 Elsevier B.V. All rights reserved.

  19. Exhaustive Search for Sparse Variable Selection in Linear Regression

    NASA Astrophysics Data System (ADS)

    Igarashi, Yasuhiko; Takenaka, Hikaru; Nakanishi-Ohno, Yoshinori; Uemura, Makoto; Ikeda, Shiro; Okada, Masato

    2018-04-01

    We propose a K-sparse exhaustive search (ES-K) method and a K-sparse approximate exhaustive search method (AES-K) for selecting variables in linear regression. With these methods, K-sparse combinations of variables are tested exhaustively assuming that the optimal combination of explanatory variables is K-sparse. By collecting the results of exhaustively computing ES-K, various approximate methods for selecting sparse variables can be summarized as density of states. With this density of states, we can compare different methods for selecting sparse variables such as relaxation and sampling. For large problems where the combinatorial explosion of explanatory variables is crucial, the AES-K method enables density of states to be effectively reconstructed by using the replica-exchange Monte Carlo method and the multiple histogram method. Applying the ES-K and AES-K methods to type Ia supernova data, we confirmed the conventional understanding in astronomy when an appropriate K is given beforehand. However, we found the difficulty to determine K from the data. Using virtual measurement and analysis, we argue that this is caused by data shortage.

  20. Sparse modeling of spatial environmental variables associated with asthma

    PubMed Central

    Chang, Timothy S.; Gangnon, Ronald E.; Page, C. David; Buckingham, William R.; Tandias, Aman; Cowan, Kelly J.; Tomasallo, Carrie D.; Arndt, Brian G.; Hanrahan, Lawrence P.; Guilbert, Theresa W.

    2014-01-01

    Geographically distributed environmental factors influence the burden of diseases such as asthma. Our objective was to identify sparse environmental variables associated with asthma diagnosis gathered from a large electronic health record (EHR) dataset while controlling for spatial variation. An EHR dataset from the University of Wisconsin’s Family Medicine, Internal Medicine and Pediatrics Departments was obtained for 199,220 patients aged 5–50 years over a three-year period. Each patient’s home address was geocoded to one of 3,456 geographic census block groups. Over one thousand block group variables were obtained from a commercial database. We developed a Sparse Spatial Environmental Analysis (SASEA). Using this method, the environmental variables were first dimensionally reduced with sparse principal component analysis. Logistic thin plate regression spline modeling was then used to identify block group variables associated with asthma from sparse principal components. The addresses of patients from the EHR dataset were distributed throughout the majority of Wisconsin’s geography. Logistic thin plate regression spline modeling captured spatial variation of asthma. Four sparse principal components identified via model selection consisted of food at home, dog ownership, household size, and disposable income variables. In rural areas, dog ownership and renter occupied housing units from significant sparse principal components were associated with asthma. Our main contribution is the incorporation of sparsity in spatial modeling. SASEA sequentially added sparse principal components to Logistic thin plate regression spline modeling. This method allowed association of geographically distributed environmental factors with asthma using EHR and environmental datasets. SASEA can be applied to other diseases with environmental risk factors. PMID:25533437

  1. Sparse modeling of spatial environmental variables associated with asthma.

    PubMed

    Chang, Timothy S; Gangnon, Ronald E; David Page, C; Buckingham, William R; Tandias, Aman; Cowan, Kelly J; Tomasallo, Carrie D; Arndt, Brian G; Hanrahan, Lawrence P; Guilbert, Theresa W

    2015-02-01

    Geographically distributed environmental factors influence the burden of diseases such as asthma. Our objective was to identify sparse environmental variables associated with asthma diagnosis gathered from a large electronic health record (EHR) dataset while controlling for spatial variation. An EHR dataset from the University of Wisconsin's Family Medicine, Internal Medicine and Pediatrics Departments was obtained for 199,220 patients aged 5-50years over a three-year period. Each patient's home address was geocoded to one of 3456 geographic census block groups. Over one thousand block group variables were obtained from a commercial database. We developed a Sparse Spatial Environmental Analysis (SASEA). Using this method, the environmental variables were first dimensionally reduced with sparse principal component analysis. Logistic thin plate regression spline modeling was then used to identify block group variables associated with asthma from sparse principal components. The addresses of patients from the EHR dataset were distributed throughout the majority of Wisconsin's geography. Logistic thin plate regression spline modeling captured spatial variation of asthma. Four sparse principal components identified via model selection consisted of food at home, dog ownership, household size, and disposable income variables. In rural areas, dog ownership and renter occupied housing units from significant sparse principal components were associated with asthma. Our main contribution is the incorporation of sparsity in spatial modeling. SASEA sequentially added sparse principal components to Logistic thin plate regression spline modeling. This method allowed association of geographically distributed environmental factors with asthma using EHR and environmental datasets. SASEA can be applied to other diseases with environmental risk factors. Copyright © 2014 Elsevier Inc. All rights reserved.

  2. Adaptive surrogate modeling by ANOVA and sparse polynomial dimensional decomposition for global sensitivity analysis in fluid simulation

    NASA Astrophysics Data System (ADS)

    Tang, Kunkun; Congedo, Pietro M.; Abgrall, Rémi

    2016-06-01

    The Polynomial Dimensional Decomposition (PDD) is employed in this work for the global sensitivity analysis and uncertainty quantification (UQ) of stochastic systems subject to a moderate to large number of input random variables. Due to the intimate connection between the PDD and the Analysis of Variance (ANOVA) approaches, PDD is able to provide a simpler and more direct evaluation of the Sobol' sensitivity indices, when compared to the Polynomial Chaos expansion (PC). Unfortunately, the number of PDD terms grows exponentially with respect to the size of the input random vector, which makes the computational cost of standard methods unaffordable for real engineering applications. In order to address the problem of the curse of dimensionality, this work proposes essentially variance-based adaptive strategies aiming to build a cheap meta-model (i.e. surrogate model) by employing the sparse PDD approach with its coefficients computed by regression. Three levels of adaptivity are carried out in this paper: 1) the truncated dimensionality for ANOVA component functions, 2) the active dimension technique especially for second- and higher-order parameter interactions, and 3) the stepwise regression approach designed to retain only the most influential polynomials in the PDD expansion. During this adaptive procedure featuring stepwise regressions, the surrogate model representation keeps containing few terms, so that the cost to resolve repeatedly the linear systems of the least-squares regression problem is negligible. The size of the finally obtained sparse PDD representation is much smaller than the one of the full expansion, since only significant terms are eventually retained. Consequently, a much smaller number of calls to the deterministic model is required to compute the final PDD coefficients.

  3. Regression analysis of sparse asynchronous longitudinal data

    PubMed Central

    Cao, Hongyuan; Zeng, Donglin; Fine, Jason P.

    2015-01-01

    Summary We consider estimation of regression models for sparse asynchronous longitudinal observations, where time-dependent responses and covariates are observed intermittently within subjects. Unlike with synchronous data, where the response and covariates are observed at the same time point, with asynchronous data, the observation times are mismatched. Simple kernel-weighted estimating equations are proposed for generalized linear models with either time invariant or time-dependent coefficients under smoothness assumptions for the covariate processes which are similar to those for synchronous data. For models with either time invariant or time-dependent coefficients, the estimators are consistent and asymptotically normal but converge at slower rates than those achieved with synchronous data. Simulation studies evidence that the methods perform well with realistic sample sizes and may be superior to a naive application of methods for synchronous data based on an ad hoc last value carried forward approach. The practical utility of the methods is illustrated on data from a study on human immunodeficiency virus. PMID:26568699

  4. Functional brain networks reconstruction using group sparsity-regularized learning.

    PubMed

    Zhao, Qinghua; Li, Will X Y; Jiang, Xi; Lv, Jinglei; Lu, Jianfeng; Liu, Tianming

    2018-06-01

    Investigating functional brain networks and patterns using sparse representation of fMRI data has received significant interests in the neuroimaging community. It has been reported that sparse representation is effective in reconstructing concurrent and interactive functional brain networks. To date, most of data-driven network reconstruction approaches rarely take consideration of anatomical structures, which are the substrate of brain function. Furthermore, it has been rarely explored whether structured sparse representation with anatomical guidance could facilitate functional networks reconstruction. To address this problem, in this paper, we propose to reconstruct brain networks utilizing the structure guided group sparse regression (S2GSR) in which 116 anatomical regions from the AAL template, as prior knowledge, are employed to guide the network reconstruction when performing sparse representation of whole-brain fMRI data. Specifically, we extract fMRI signals from standard space aligned with the AAL template. Then by learning a global over-complete dictionary, with the learned dictionary as a set of features (regressors), the group structured regression employs anatomical structures as group information to regress whole brain signals. Finally, the decomposition coefficients matrix is mapped back to the brain volume to represent functional brain networks and patterns. We use the publicly available Human Connectome Project (HCP) Q1 dataset as the test bed, and the experimental results indicate that the proposed anatomically guided structure sparse representation is effective in reconstructing concurrent functional brain networks.

  5. Two-step superresolution approach for surveillance face image through radial basis function-partial least squares regression and locality-induced sparse representation

    NASA Astrophysics Data System (ADS)

    Jiang, Junjun; Hu, Ruimin; Han, Zhen; Wang, Zhongyuan; Chen, Jun

    2013-10-01

    Face superresolution (SR), or face hallucination, refers to the technique of generating a high-resolution (HR) face image from a low-resolution (LR) one with the help of a set of training examples. It aims at transcending the limitations of electronic imaging systems. Applications of face SR include video surveillance, in which the individual of interest is often far from cameras. A two-step method is proposed to infer a high-quality and HR face image from a low-quality and LR observation. First, we establish the nonlinear relationship between LR face images and HR ones, according to radial basis function and partial least squares (RBF-PLS) regression, to transform the LR face into the global face space. Then, a locality-induced sparse representation (LiSR) approach is presented to enhance the local facial details once all the global faces for each LR training face are constructed. A comparison of some state-of-the-art SR methods shows the superiority of the proposed two-step approach, RBF-PLS global face regression followed by LiSR-based local patch reconstruction. Experiments also demonstrate the effectiveness under both simulation conditions and some real conditions.

  6. Optimizing spatial patterns with sparse filter bands for motor-imagery based brain-computer interface.

    PubMed

    Zhang, Yu; Zhou, Guoxu; Jin, Jing; Wang, Xingyu; Cichocki, Andrzej

    2015-11-30

    Common spatial pattern (CSP) has been most popularly applied to motor-imagery (MI) feature extraction for classification in brain-computer interface (BCI) application. Successful application of CSP depends on the filter band selection to a large degree. However, the most proper band is typically subject-specific and can hardly be determined manually. This study proposes a sparse filter band common spatial pattern (SFBCSP) for optimizing the spatial patterns. SFBCSP estimates CSP features on multiple signals that are filtered from raw EEG data at a set of overlapping bands. The filter bands that result in significant CSP features are then selected in a supervised way by exploiting sparse regression. A support vector machine (SVM) is implemented on the selected features for MI classification. Two public EEG datasets (BCI Competition III dataset IVa and BCI Competition IV IIb) are used to validate the proposed SFBCSP method. Experimental results demonstrate that SFBCSP help improve the classification performance of MI. The optimized spatial patterns by SFBCSP give overall better MI classification accuracy in comparison with several competing methods. The proposed SFBCSP is a potential method for improving the performance of MI-based BCI. Copyright © 2015 Elsevier B.V. All rights reserved.

  7. Sparse Regression as a Sparse Eigenvalue Problem

    NASA Technical Reports Server (NTRS)

    Moghaddam, Baback; Gruber, Amit; Weiss, Yair; Avidan, Shai

    2008-01-01

    We extend the l0-norm "subspectral" algorithms for sparse-LDA [5] and sparse-PCA [6] to general quadratic costs such as MSE in linear (kernel) regression. The resulting "Sparse Least Squares" (SLS) problem is also NP-hard, by way of its equivalence to a rank-1 sparse eigenvalue problem (e.g., binary sparse-LDA [7]). Specifically, for a general quadratic cost we use a highly-efficient technique for direct eigenvalue computation using partitioned matrix inverses which leads to dramatic x103 speed-ups over standard eigenvalue decomposition. This increased efficiency mitigates the O(n4) scaling behaviour that up to now has limited the previous algorithms' utility for high-dimensional learning problems. Moreover, the new computation prioritizes the role of the less-myopic backward elimination stage which becomes more efficient than forward selection. Similarly, branch-and-bound search for Exact Sparse Least Squares (ESLS) also benefits from partitioned matrix inverse techniques. Our Greedy Sparse Least Squares (GSLS) generalizes Natarajan's algorithm [9] also known as Order-Recursive Matching Pursuit (ORMP). Specifically, the forward half of GSLS is exactly equivalent to ORMP but more efficient. By including the backward pass, which only doubles the computation, we can achieve lower MSE than ORMP. Experimental comparisons to the state-of-the-art LARS algorithm [3] show forward-GSLS is faster, more accurate and more flexible in terms of choice of regularization

  8. Sparse multivariate factor analysis regression models and its applications to integrative genomics analysis.

    PubMed

    Zhou, Yan; Wang, Pei; Wang, Xianlong; Zhu, Ji; Song, Peter X-K

    2017-01-01

    The multivariate regression model is a useful tool to explore complex associations between two kinds of molecular markers, which enables the understanding of the biological pathways underlying disease etiology. For a set of correlated response variables, accounting for such dependency can increase statistical power. Motivated by integrative genomic data analyses, we propose a new methodology-sparse multivariate factor analysis regression model (smFARM), in which correlations of response variables are assumed to follow a factor analysis model with latent factors. This proposed method not only allows us to address the challenge that the number of association parameters is larger than the sample size, but also to adjust for unobserved genetic and/or nongenetic factors that potentially conceal the underlying response-predictor associations. The proposed smFARM is implemented by the EM algorithm and the blockwise coordinate descent algorithm. The proposed methodology is evaluated and compared to the existing methods through extensive simulation studies. Our results show that accounting for latent factors through the proposed smFARM can improve sensitivity of signal detection and accuracy of sparse association map estimation. We illustrate smFARM by two integrative genomics analysis examples, a breast cancer dataset, and an ovarian cancer dataset, to assess the relationship between DNA copy numbers and gene expression arrays to understand genetic regulatory patterns relevant to the disease. We identify two trans-hub regions: one in cytoband 17q12 whose amplification influences the RNA expression levels of important breast cancer genes, and the other in cytoband 9q21.32-33, which is associated with chemoresistance in ovarian cancer. © 2016 WILEY PERIODICALS, INC.

  9. Classical Testing in Functional Linear Models.

    PubMed

    Kong, Dehan; Staicu, Ana-Maria; Maity, Arnab

    2016-01-01

    We extend four tests common in classical regression - Wald, score, likelihood ratio and F tests - to functional linear regression, for testing the null hypothesis, that there is no association between a scalar response and a functional covariate. Using functional principal component analysis, we re-express the functional linear model as a standard linear model, where the effect of the functional covariate can be approximated by a finite linear combination of the functional principal component scores. In this setting, we consider application of the four traditional tests. The proposed testing procedures are investigated theoretically for densely observed functional covariates when the number of principal components diverges. Using the theoretical distribution of the tests under the alternative hypothesis, we develop a procedure for sample size calculation in the context of functional linear regression. The four tests are further compared numerically for both densely and sparsely observed noisy functional data in simulation experiments and using two real data applications.

  10. Classical Testing in Functional Linear Models

    PubMed Central

    Kong, Dehan; Staicu, Ana-Maria; Maity, Arnab

    2016-01-01

    We extend four tests common in classical regression - Wald, score, likelihood ratio and F tests - to functional linear regression, for testing the null hypothesis, that there is no association between a scalar response and a functional covariate. Using functional principal component analysis, we re-express the functional linear model as a standard linear model, where the effect of the functional covariate can be approximated by a finite linear combination of the functional principal component scores. In this setting, we consider application of the four traditional tests. The proposed testing procedures are investigated theoretically for densely observed functional covariates when the number of principal components diverges. Using the theoretical distribution of the tests under the alternative hypothesis, we develop a procedure for sample size calculation in the context of functional linear regression. The four tests are further compared numerically for both densely and sparsely observed noisy functional data in simulation experiments and using two real data applications. PMID:28955155

  11. HYPOTHESIS TESTING FOR HIGH-DIMENSIONAL SPARSE BINARY REGRESSION

    PubMed Central

    Mukherjee, Rajarshi; Pillai, Natesh S.; Lin, Xihong

    2015-01-01

    In this paper, we study the detection boundary for minimax hypothesis testing in the context of high-dimensional, sparse binary regression models. Motivated by genetic sequencing association studies for rare variant effects, we investigate the complexity of the hypothesis testing problem when the design matrix is sparse. We observe a new phenomenon in the behavior of detection boundary which does not occur in the case of Gaussian linear regression. We derive the detection boundary as a function of two components: a design matrix sparsity index and signal strength, each of which is a function of the sparsity of the alternative. For any alternative, if the design matrix sparsity index is too high, any test is asymptotically powerless irrespective of the magnitude of signal strength. For binary design matrices with the sparsity index that is not too high, our results are parallel to those in the Gaussian case. In this context, we derive detection boundaries for both dense and sparse regimes. For the dense regime, we show that the generalized likelihood ratio is rate optimal; for the sparse regime, we propose an extended Higher Criticism Test and show it is rate optimal and sharp. We illustrate the finite sample properties of the theoretical results using simulation studies. PMID:26246645

  12. Sparse Logistic Regression for Diagnosis of Liver Fibrosis in Rat by Using SCAD-Penalized Likelihood

    PubMed Central

    Yan, Fang-Rong; Lin, Jin-Guan; Liu, Yu

    2011-01-01

    The objective of the present study is to find out the quantitative relationship between progression of liver fibrosis and the levels of certain serum markers using mathematic model. We provide the sparse logistic regression by using smoothly clipped absolute deviation (SCAD) penalized function to diagnose the liver fibrosis in rats. Not only does it give a sparse solution with high accuracy, it also provides the users with the precise probabilities of classification with the class information. In the simulative case and the experiment case, the proposed method is comparable to the stepwise linear discriminant analysis (SLDA) and the sparse logistic regression with least absolute shrinkage and selection operator (LASSO) penalty, by using receiver operating characteristic (ROC) with bayesian bootstrap estimating area under the curve (AUC) diagnostic sensitivity for selected variable. Results show that the new approach provides a good correlation between the serum marker levels and the liver fibrosis induced by thioacetamide (TAA) in rats. Meanwhile, this approach might also be used in predicting the development of liver cirrhosis. PMID:21716672

  13. Association between Stereotactic Radiotherapy and Death from Brain Metastases of Epithelial Ovarian Cancer: a Gliwice Data Re-Analysis with Penalization

    PubMed

    Tukiendorf, Andrzej; Mansournia, Mohammad Ali; Wydmański, Jerzy; Wolny-Rokicka, Edyta

    2017-04-01

    Background: Clinical datasets for epithelial ovarian cancer brain metastatic patients are usually small in size. When adequate case numbers are lacking, resulting estimates of regression coefficients may demonstrate bias. One of the direct approaches to reduce such sparse-data bias is based on penalized estimation. Methods: A re- analysis of formerly reported hazard ratios in diagnosed patients was performed using penalized Cox regression with a popular SAS package providing additional software codes for a statistical computational procedure. Results: It was found that the penalized approach can readily diminish sparse data artefacts and radically reduce the magnitude of estimated regression coefficients. Conclusions: It was confirmed that classical statistical approaches may exaggerate regression estimates or distort study interpretations and conclusions. The results support the thesis that penalization via weak informative priors and data augmentation are the safest approaches to shrink sparse data artefacts frequently occurring in epidemiological research. Creative Commons Attribution License

  14. Improved statistical power with a sparse shape model in detecting an aging effect in the hippocampus and amygdala

    NASA Astrophysics Data System (ADS)

    Chung, Moo K.; Kim, Seung-Goo; Schaefer, Stacey M.; van Reekum, Carien M.; Peschke-Schmitz, Lara; Sutterer, Matthew J.; Davidson, Richard J.

    2014-03-01

    The sparse regression framework has been widely used in medical image processing and analysis. However, it has been rarely used in anatomical studies. We present a sparse shape modeling framework using the Laplace- Beltrami (LB) eigenfunctions of the underlying shape and show its improvement of statistical power. Tradition- ally, the LB-eigenfunctions are used as a basis for intrinsically representing surface shapes as a form of Fourier descriptors. To reduce high frequency noise, only the first few terms are used in the expansion and higher frequency terms are simply thrown away. However, some lower frequency terms may not necessarily contribute significantly in reconstructing the surfaces. Motivated by this idea, we present a LB-based method to filter out only the significant eigenfunctions by imposing a sparse penalty. For dense anatomical data such as deformation fields on a surface mesh, the sparse regression behaves like a smoothing process, which will reduce the error of incorrectly detecting false negatives. Hence the statistical power improves. The sparse shape model is then applied in investigating the influence of age on amygdala and hippocampus shapes in the normal population. The advantage of the LB sparse framework is demonstrated by showing the increased statistical power.

  15. A Fast Gradient Method for Nonnegative Sparse Regression With Self-Dictionary

    NASA Astrophysics Data System (ADS)

    Gillis, Nicolas; Luce, Robert

    2018-01-01

    A nonnegative matrix factorization (NMF) can be computed efficiently under the separability assumption, which asserts that all the columns of the given input data matrix belong to the cone generated by a (small) subset of them. The provably most robust methods to identify these conic basis columns are based on nonnegative sparse regression and self dictionaries, and require the solution of large-scale convex optimization problems. In this paper we study a particular nonnegative sparse regression model with self dictionary. As opposed to previously proposed models, this model yields a smooth optimization problem where the sparsity is enforced through linear constraints. We show that the Euclidean projection on the polyhedron defined by these constraints can be computed efficiently, and propose a fast gradient method to solve our model. We compare our algorithm with several state-of-the-art methods on synthetic data sets and real-world hyperspectral images.

  16. Sparse brain network using penalized linear regression

    NASA Astrophysics Data System (ADS)

    Lee, Hyekyoung; Lee, Dong Soo; Kang, Hyejin; Kim, Boong-Nyun; Chung, Moo K.

    2011-03-01

    Sparse partial correlation is a useful connectivity measure for brain networks when it is difficult to compute the exact partial correlation in the small-n large-p setting. In this paper, we formulate the problem of estimating partial correlation as a sparse linear regression with a l1-norm penalty. The method is applied to brain network consisting of parcellated regions of interest (ROIs), which are obtained from FDG-PET images of the autism spectrum disorder (ASD) children and the pediatric control (PedCon) subjects. To validate the results, we check their reproducibilities of the obtained brain networks by the leave-one-out cross validation and compare the clustered structures derived from the brain networks of ASD and PedCon.

  17. Application of a sparseness constraint in multivariate curve resolution - Alternating least squares.

    PubMed

    Hugelier, Siewert; Piqueras, Sara; Bedia, Carmen; de Juan, Anna; Ruckebusch, Cyril

    2018-02-13

    The use of sparseness in chemometrics is a concept that has increased in popularity. The advantage is, above all, a better interpretability of the results obtained. In this work, sparseness is implemented as a constraint in multivariate curve resolution - alternating least squares (MCR-ALS), which aims at reproducing raw (mixed) data by a bilinear model of chemically meaningful profiles. In many cases, the mixed raw data analyzed are not sparse by nature, but their decomposition profiles can be, as it is the case in some instrumental responses, such as mass spectra, or in concentration profiles linked to scattered distribution maps of powdered samples in hyperspectral images. To induce sparseness in the constrained profiles, one-dimensional and/or two-dimensional numerical arrays can be fitted using a basis of Gaussian functions with a penalty on the coefficients. In this work, a least squares regression framework with L 0 -norm penalty is applied. This L 0 -norm penalty constrains the number of non-null coefficients in the fit of the array constrained without having an a priori on the number and their positions. It has been shown that the sparseness constraint induces the suppression of values linked to uninformative channels and noise in MS spectra and improves the location of scattered compounds in distribution maps, resulting in a better interpretability of the constrained profiles. An additional benefit of the sparseness constraint is a lower ambiguity in the bilinear model, since the major presence of null coefficients in the constrained profiles also helps to limit the solutions for the profiles in the counterpart matrix of the MCR bilinear model. Copyright © 2017 Elsevier B.V. All rights reserved.

  18. Estimating the size of an open population using sparse capture-recapture data.

    PubMed

    Huggins, Richard; Stoklosa, Jakub; Roach, Cameron; Yip, Paul

    2018-03-01

    Sparse capture-recapture data from open populations are difficult to analyze using currently available frequentist statistical methods. However, in closed capture-recapture experiments, the Chao sparse estimator (Chao, 1989, Biometrics 45, 427-438) may be used to estimate population sizes when there are few recaptures. Here, we extend the Chao (1989) closed population size estimator to the open population setting by using linear regression and extrapolation techniques. We conduct a small simulation study and apply the models to several sparse capture-recapture data sets. © 2017, The International Biometric Society.

  19. Prediction of siRNA potency using sparse logistic regression.

    PubMed

    Hu, Wei; Hu, John

    2014-06-01

    RNA interference (RNAi) can modulate gene expression at post-transcriptional as well as transcriptional levels. Short interfering RNA (siRNA) serves as a trigger for the RNAi gene inhibition mechanism, and therefore is a crucial intermediate step in RNAi. There have been extensive studies to identify the sequence characteristics of potent siRNAs. One such study built a linear model using LASSO (Least Absolute Shrinkage and Selection Operator) to measure the contribution of each siRNA sequence feature. This model is simple and interpretable, but it requires a large number of nonzero weights. We have introduced a novel technique, sparse logistic regression, to build a linear model using single-position specific nucleotide compositions which has the same prediction accuracy of the linear model based on LASSO. The weights in our new model share the same general trend as those in the previous model, but have only 25 nonzero weights out of a total 84 weights, a 54% reduction compared to the previous model. Contrary to the linear model based on LASSO, our model suggests that only a few positions are influential on the efficacy of the siRNA, which are the 5' and 3' ends and the seed region of siRNA sequences. We also employed sparse logistic regression to build a linear model using dual-position specific nucleotide compositions, a task LASSO is not able to accomplish well due to its high dimensional nature. Our results demonstrate the superiority of sparse logistic regression as a technique for both feature selection and regression over LASSO in the context of siRNA design.

  20. NELasso: Group-Sparse Modeling for Characterizing Relations Among Named Entities in News Articles.

    PubMed

    Tariq, Amara; Karim, Asim; Foroosh, Hassan

    2017-10-01

    Named entities such as people, locations, and organizations play a vital role in characterizing online content. They often reflect information of interest and are frequently used in search queries. Although named entities can be detected reliably from textual content, extracting relations among them is more challenging, yet useful in various applications (e.g., news recommending systems). In this paper, we present a novel model and system for learning semantic relations among named entities from collections of news articles. We model each named entity occurrence with sparse structured logistic regression, and consider the words (predictors) to be grouped based on background semantics. This sparse group LASSO approach forces the weights of word groups that do not influence the prediction towards zero. The resulting sparse structure is utilized for defining the type and strength of relations. Our unsupervised system yields a named entities' network where each relation is typed, quantified, and characterized in context. These relations are the key to understanding news material over time and customizing newsfeeds for readers. Extensive evaluation of our system on articles from TIME magazine and BBC News shows that the learned relations correlate with static semantic relatedness measures like WLM, and capture the evolving relationships among named entities over time.

  1. Gene selection in cancer classification using sparse logistic regression with Bayesian regularization.

    PubMed

    Cawley, Gavin C; Talbot, Nicola L C

    2006-10-01

    Gene selection algorithms for cancer classification, based on the expression of a small number of biomarker genes, have been the subject of considerable research in recent years. Shevade and Keerthi propose a gene selection algorithm based on sparse logistic regression (SLogReg) incorporating a Laplace prior to promote sparsity in the model parameters, and provide a simple but efficient training procedure. The degree of sparsity obtained is determined by the value of a regularization parameter, which must be carefully tuned in order to optimize performance. This normally involves a model selection stage, based on a computationally intensive search for the minimizer of the cross-validation error. In this paper, we demonstrate that a simple Bayesian approach can be taken to eliminate this regularization parameter entirely, by integrating it out analytically using an uninformative Jeffrey's prior. The improved algorithm (BLogReg) is then typically two or three orders of magnitude faster than the original algorithm, as there is no longer a need for a model selection step. The BLogReg algorithm is also free from selection bias in performance estimation, a common pitfall in the application of machine learning algorithms in cancer classification. The SLogReg, BLogReg and Relevance Vector Machine (RVM) gene selection algorithms are evaluated over the well-studied colon cancer and leukaemia benchmark datasets. The leave-one-out estimates of the probability of test error and cross-entropy of the BLogReg and SLogReg algorithms are very similar, however the BlogReg algorithm is found to be considerably faster than the original SLogReg algorithm. Using nested cross-validation to avoid selection bias, performance estimation for SLogReg on the leukaemia dataset takes almost 48 h, whereas the corresponding result for BLogReg is obtained in only 1 min 24 s, making BLogReg by far the more practical algorithm. BLogReg also demonstrates better estimates of conditional probability than the RVM, which are of great importance in medical applications, with similar computational expense. A MATLAB implementation of the sparse logistic regression algorithm with Bayesian regularization (BLogReg) is available from http://theoval.cmp.uea.ac.uk/~gcc/cbl/blogreg/

  2. Evaluation of generalized degrees of freedom for sparse estimation by replica method

    NASA Astrophysics Data System (ADS)

    Sakata, A.

    2016-12-01

    We develop a method to evaluate the generalized degrees of freedom (GDF) for linear regression with sparse regularization. The GDF is a key factor in model selection, and thus its evaluation is useful in many modelling applications. An analytical expression for the GDF is derived using the replica method in the large-system-size limit with random Gaussian predictors. The resulting formula has a universal form that is independent of the type of regularization, providing us with a simple interpretation. Within the framework of replica symmetric (RS) analysis, GDF has a physical meaning as the effective fraction of non-zero components. The validity of our method in the RS phase is supported by the consistency of our results with previous mathematical results. The analytical results in the RS phase are calculated numerically using the belief propagation algorithm.

  3. Robust Joint Graph Sparse Coding for Unsupervised Spectral Feature Selection.

    PubMed

    Zhu, Xiaofeng; Li, Xuelong; Zhang, Shichao; Ju, Chunhua; Wu, Xindong

    2017-06-01

    In this paper, we propose a new unsupervised spectral feature selection model by embedding a graph regularizer into the framework of joint sparse regression for preserving the local structures of data. To do this, we first extract the bases of training data by previous dictionary learning methods and, then, map original data into the basis space to generate their new representations, by proposing a novel joint graph sparse coding (JGSC) model. In JGSC, we first formulate its objective function by simultaneously taking subspace learning and joint sparse regression into account, then, design a new optimization solution to solve the resulting objective function, and further prove the convergence of the proposed solution. Furthermore, we extend JGSC to a robust JGSC (RJGSC) via replacing the least square loss function with a robust loss function, for achieving the same goals and also avoiding the impact of outliers. Finally, experimental results on real data sets showed that both JGSC and RJGSC outperformed the state-of-the-art algorithms in terms of k -nearest neighbor classification performance.

  4. Computing group cardinality constraint solutions for logistic regression problems.

    PubMed

    Zhang, Yong; Kwon, Dongjin; Pohl, Kilian M

    2017-01-01

    We derive an algorithm to directly solve logistic regression based on cardinality constraint, group sparsity and use it to classify intra-subject MRI sequences (e.g. cine MRIs) of healthy from diseased subjects. Group cardinality constraint models are often applied to medical images in order to avoid overfitting of the classifier to the training data. Solutions within these models are generally determined by relaxing the cardinality constraint to a weighted feature selection scheme. However, these solutions relate to the original sparse problem only under specific assumptions, which generally do not hold for medical image applications. In addition, inferring clinical meaning from features weighted by a classifier is an ongoing topic of discussion. Avoiding weighing features, we propose to directly solve the group cardinality constraint logistic regression problem by generalizing the Penalty Decomposition method. To do so, we assume that an intra-subject series of images represents repeated samples of the same disease patterns. We model this assumption by combining series of measurements created by a feature across time into a single group. Our algorithm then derives a solution within that model by decoupling the minimization of the logistic regression function from enforcing the group sparsity constraint. The minimum to the smooth and convex logistic regression problem is determined via gradient descent while we derive a closed form solution for finding a sparse approximation of that minimum. We apply our method to cine MRI of 38 healthy controls and 44 adult patients that received reconstructive surgery of Tetralogy of Fallot (TOF) during infancy. Our method correctly identifies regions impacted by TOF and generally obtains statistically significant higher classification accuracy than alternative solutions to this model, i.e., ones relaxing group cardinality constraints. Copyright © 2016 Elsevier B.V. All rights reserved.

  5. Accelerating cross-validation with total variation and its application to super-resolution imaging

    NASA Astrophysics Data System (ADS)

    Obuchi, Tomoyuki; Ikeda, Shiro; Akiyama, Kazunori; Kabashima, Yoshiyuki

    2017-12-01

    We develop an approximation formula for the cross-validation error (CVE) of a sparse linear regression penalized by ℓ_1-norm and total variation terms, which is based on a perturbative expansion utilizing the largeness of both the data dimensionality and the model. The developed formula allows us to reduce the necessary computational cost of the CVE evaluation significantly. The practicality of the formula is tested through application to simulated black-hole image reconstruction on the event-horizon scale with super resolution. The results demonstrate that our approximation reproduces the CVE values obtained via literally conducted cross-validation with reasonably good precision.

  6. Facial Age Synthesis Using Sparse Partial Least Squares (The Case of Ben Needham).

    PubMed

    Bukar, Ali M; Ugail, Hassan

    2017-09-01

    Automatic facial age progression (AFAP) has been an active area of research in recent years. This is due to its numerous applications which include searching for missing. This study presents a new method of AFAP. Here, we use an active appearance model (AAM) to extract facial features from available images. An aging function is then modelled using sparse partial least squares regression (sPLS). Thereafter, the aging function is used to render new faces at different ages. To test the accuracy of our algorithm, extensive evaluation is conducted using a database of 500 face images with known ages. Furthermore, the algorithm is used to progress Ben Needham's facial image that was taken when he was 21 months old to the ages of 6, 14, and 22 years. The algorithm presented in this study could potentially be used to enhance the search for missing people worldwide. © 2017 American Academy of Forensic Sciences.

  7. Structured sparse linear graph embedding.

    PubMed

    Wang, Haixian

    2012-03-01

    Subspace learning is a core issue in pattern recognition and machine learning. Linear graph embedding (LGE) is a general framework for subspace learning. In this paper, we propose a structured sparse extension to LGE (SSLGE) by introducing a structured sparsity-inducing norm into LGE. Specifically, SSLGE casts the projection bases learning into a regression-type optimization problem, and then the structured sparsity regularization is applied to the regression coefficients. The regularization selects a subset of features and meanwhile encodes high-order information reflecting a priori structure information of the data. The SSLGE technique provides a unified framework for discovering structured sparse subspace. Computationally, by using a variational equality and the Procrustes transformation, SSLGE is efficiently solved with closed-form updates. Experimental results on face image show the effectiveness of the proposed method. Copyright © 2011 Elsevier Ltd. All rights reserved.

  8. DOE Office of Scientific and Technical Information (OSTI.GOV)

    Tang, Kunkun, E-mail: ktg@illinois.edu; Inria Bordeaux – Sud-Ouest, Team Cardamom, 200 avenue de la Vieille Tour, 33405 Talence; Congedo, Pietro M.

    The Polynomial Dimensional Decomposition (PDD) is employed in this work for the global sensitivity analysis and uncertainty quantification (UQ) of stochastic systems subject to a moderate to large number of input random variables. Due to the intimate connection between the PDD and the Analysis of Variance (ANOVA) approaches, PDD is able to provide a simpler and more direct evaluation of the Sobol' sensitivity indices, when compared to the Polynomial Chaos expansion (PC). Unfortunately, the number of PDD terms grows exponentially with respect to the size of the input random vector, which makes the computational cost of standard methods unaffordable formore » real engineering applications. In order to address the problem of the curse of dimensionality, this work proposes essentially variance-based adaptive strategies aiming to build a cheap meta-model (i.e. surrogate model) by employing the sparse PDD approach with its coefficients computed by regression. Three levels of adaptivity are carried out in this paper: 1) the truncated dimensionality for ANOVA component functions, 2) the active dimension technique especially for second- and higher-order parameter interactions, and 3) the stepwise regression approach designed to retain only the most influential polynomials in the PDD expansion. During this adaptive procedure featuring stepwise regressions, the surrogate model representation keeps containing few terms, so that the cost to resolve repeatedly the linear systems of the least-squares regression problem is negligible. The size of the finally obtained sparse PDD representation is much smaller than the one of the full expansion, since only significant terms are eventually retained. Consequently, a much smaller number of calls to the deterministic model is required to compute the final PDD coefficients.« less

  9. A structured sparse regression method for estimating isoform expression level from multi-sample RNA-seq data.

    PubMed

    Zhang, L; Liu, X J

    2016-06-03

    With the rapid development of next-generation high-throughput sequencing technology, RNA-seq has become a standard and important technique for transcriptome analysis. For multi-sample RNA-seq data, the existing expression estimation methods usually deal with each single-RNA-seq sample, and ignore that the read distributions are consistent across multiple samples. In the current study, we propose a structured sparse regression method, SSRSeq, to estimate isoform expression using multi-sample RNA-seq data. SSRSeq uses a non-parameter model to capture the general tendency of non-uniformity read distribution for all genes across multiple samples. Additionally, our method adds a structured sparse regularization, which not only incorporates the sparse specificity between a gene and its corresponding isoform expression levels, but also reduces the effects of noisy reads, especially for lowly expressed genes and isoforms. Four real datasets were used to evaluate our method on isoform expression estimation. Compared with other popular methods, SSRSeq reduced the variance between multiple samples, and produced more accurate isoform expression estimations, and thus more meaningful biological interpretations.

  10. A Permutation Approach for Selecting the Penalty Parameter in Penalized Model Selection

    PubMed Central

    Sabourin, Jeremy A; Valdar, William; Nobel, Andrew B

    2015-01-01

    Summary We describe a simple, computationally effcient, permutation-based procedure for selecting the penalty parameter in LASSO penalized regression. The procedure, permutation selection, is intended for applications where variable selection is the primary focus, and can be applied in a variety of structural settings, including that of generalized linear models. We briefly discuss connections between permutation selection and existing theory for the LASSO. In addition, we present a simulation study and an analysis of real biomedical data sets in which permutation selection is compared with selection based on the following: cross-validation (CV), the Bayesian information criterion (BIC), Scaled Sparse Linear Regression, and a selection method based on recently developed testing procedures for the LASSO. PMID:26243050

  11. Estimation and Selection via Absolute Penalized Convex Minimization And Its Multistage Adaptive Applications

    PubMed Central

    Huang, Jian; Zhang, Cun-Hui

    2013-01-01

    The ℓ1-penalized method, or the Lasso, has emerged as an important tool for the analysis of large data sets. Many important results have been obtained for the Lasso in linear regression which have led to a deeper understanding of high-dimensional statistical problems. In this article, we consider a class of weighted ℓ1-penalized estimators for convex loss functions of a general form, including the generalized linear models. We study the estimation, prediction, selection and sparsity properties of the weighted ℓ1-penalized estimator in sparse, high-dimensional settings where the number of predictors p can be much larger than the sample size n. Adaptive Lasso is considered as a special case. A multistage method is developed to approximate concave regularized estimation by applying an adaptive Lasso recursively. We provide prediction and estimation oracle inequalities for single- and multi-stage estimators, a general selection consistency theorem, and an upper bound for the dimension of the Lasso estimator. Important models including the linear regression, logistic regression and log-linear models are used throughout to illustrate the applications of the general results. PMID:24348100

  12. Piece-wise quadratic approximations of arbitrary error functions for fast and robust machine learning.

    PubMed

    Gorban, A N; Mirkes, E M; Zinovyev, A

    2016-12-01

    Most of machine learning approaches have stemmed from the application of minimizing the mean squared distance principle, based on the computationally efficient quadratic optimization methods. However, when faced with high-dimensional and noisy data, the quadratic error functionals demonstrated many weaknesses including high sensitivity to contaminating factors and dimensionality curse. Therefore, a lot of recent applications in machine learning exploited properties of non-quadratic error functionals based on L 1 norm or even sub-linear potentials corresponding to quasinorms L p (0

  13. Sparse gammatone signal model optimized for English speech does not match the human auditory filters.

    PubMed

    Strahl, Stefan; Mertins, Alfred

    2008-07-18

    Evidence that neurosensory systems use sparse signal representations as well as improved performance of signal processing algorithms using sparse signal models raised interest in sparse signal coding in the last years. For natural audio signals like speech and environmental sounds, gammatone atoms have been derived as expansion functions that generate a nearly optimal sparse signal model (Smith, E., Lewicki, M., 2006. Efficient auditory coding. Nature 439, 978-982). Furthermore, gammatone functions are established models for the human auditory filters. Thus far, a practical application of a sparse gammatone signal model has been prevented by the fact that deriving the sparsest representation is, in general, computationally intractable. In this paper, we applied an accelerated version of the matching pursuit algorithm for gammatone dictionaries allowing real-time and large data set applications. We show that a sparse signal model in general has advantages in audio coding and that a sparse gammatone signal model encodes speech more efficiently in terms of sparseness than a sparse modified discrete cosine transform (MDCT) signal model. We also show that the optimal gammatone parameters derived for English speech do not match the human auditory filters, suggesting for signal processing applications to derive the parameters individually for each applied signal class instead of using psychometrically derived parameters. For brain research, it means that care should be taken with directly transferring findings of optimality for technical to biological systems.

  14. Learning distance function for regression-based 4D pulmonary trunk model reconstruction estimated from sparse MRI data

    NASA Astrophysics Data System (ADS)

    Vitanovski, Dime; Tsymbal, Alexey; Ionasec, Razvan; Georgescu, Bogdan; Zhou, Shaohua K.; Hornegger, Joachim; Comaniciu, Dorin

    2011-03-01

    Congenital heart defect (CHD) is the most common birth defect and a frequent cause of death for children. Tetralogy of Fallot (ToF) is the most often occurring CHD which affects in particular the pulmonary valve and trunk. Emerging interventional methods enable percutaneous pulmonary valve implantation, which constitute an alternative to open heart surgery. While minimal invasive methods become common practice, imaging and non-invasive assessment tools become crucial components in the clinical setting. Cardiac computed tomography (CT) and cardiac magnetic resonance imaging (cMRI) are techniques with complementary properties and ability to acquire multiple non-invasive and accurate scans required for advance evaluation and therapy planning. In contrary to CT which covers the full 4D information over the cardiac cycle, cMRI often acquires partial information, for example only one 3D scan of the whole heart in the end-diastolic phase and two 2D planes (long and short axes) over the whole cardiac cycle. The data acquired in this way is called sparse cMRI. In this paper, we propose a regression-based approach for the reconstruction of the full 4D pulmonary trunk model from sparse MRI. The reconstruction approach is based on learning a distance function between the sparse MRI which needs to be completed and the 4D CT data with the full information used as the training set. The distance is based on the intrinsic Random Forest similarity which is learnt for the corresponding regression problem of predicting coordinates of unseen mesh points. Extensive experiments performed on 80 cardiac CT and MR sequences demonstrated the average speed of 10 seconds and accuracy of 0.1053mm mean absolute error for the proposed approach. Using the case retrieval workflow and local nearest neighbour regression with the learnt distance function appears to be competitive with respect to "black box" regression with immediate prediction of coordinates, while providing transparency to the predictions made.

  15. Classification of vegetation types in military region

    NASA Astrophysics Data System (ADS)

    Gonçalves, Miguel; Silva, Jose Silvestre; Bioucas-Dias, Jose

    2015-10-01

    In decision-making process regarding planning and execution of military operations, the terrain is a determining factor. Aerial photographs are a source of vital information for the success of an operation in hostile region, namely when the cartographic information behind enemy lines is scarce or non-existent. The objective of present work is the development of a tool capable of processing aerial photos. The methodology implemented starts with feature extraction, followed by the application of an automatic selector of features. The next step, using the k-fold cross validation technique, estimates the input parameters for the following classifiers: Sparse Multinomial Logist Regression (SMLR), K Nearest Neighbor (KNN), Linear Classifier using Principal Component Expansion on the Joint Data (PCLDC) and Multi-Class Support Vector Machine (MSVM). These classifiers were used in two different studies with distinct objectives: discrimination of vegetation's density and identification of vegetation's main components. It was found that the best classifier on the first approach is the Sparse Logistic Multinomial Regression (SMLR). On the second approach, the implemented methodology applied to high resolution images showed that the better performance was achieved by KNN classifier and PCLDC. Comparing the two approaches there is a multiscale issue, in which for different resolutions, the best solution to the problem requires different classifiers and the extraction of different features.

  16. A new surrogate modeling technique combining Kriging and polynomial chaos expansions – Application to uncertainty analysis in computational dosimetry

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Kersaudy, Pierric, E-mail: pierric.kersaudy@orange.com; Whist Lab, 38 avenue du Général Leclerc, 92130 Issy-les-Moulineaux; ESYCOM, Université Paris-Est Marne-la-Vallée, 5 boulevard Descartes, 77700 Marne-la-Vallée

    2015-04-01

    In numerical dosimetry, the recent advances in high performance computing led to a strong reduction of the required computational time to assess the specific absorption rate (SAR) characterizing the human exposure to electromagnetic waves. However, this procedure remains time-consuming and a single simulation can request several hours. As a consequence, the influence of uncertain input parameters on the SAR cannot be analyzed using crude Monte Carlo simulation. The solution presented here to perform such an analysis is surrogate modeling. This paper proposes a novel approach to build such a surrogate model from a design of experiments. Considering a sparse representationmore » of the polynomial chaos expansions using least-angle regression as a selection algorithm to retain the most influential polynomials, this paper proposes to use the selected polynomials as regression functions for the universal Kriging model. The leave-one-out cross validation is used to select the optimal number of polynomials in the deterministic part of the Kriging model. The proposed approach, called LARS-Kriging-PC modeling, is applied to three benchmark examples and then to a full-scale metamodeling problem involving the exposure of a numerical fetus model to a femtocell device. The performances of the LARS-Kriging-PC are compared to an ordinary Kriging model and to a classical sparse polynomial chaos expansion. The LARS-Kriging-PC appears to have better performances than the two other approaches. A significant accuracy improvement is observed compared to the ordinary Kriging or to the sparse polynomial chaos depending on the studied case. This approach seems to be an optimal solution between the two other classical approaches. A global sensitivity analysis is finally performed on the LARS-Kriging-PC model of the fetus exposure problem.« less

  17. Droplet Image Super Resolution Based on Sparse Representation and Kernel Regression

    NASA Astrophysics Data System (ADS)

    Zou, Zhenzhen; Luo, Xinghong; Yu, Qiang

    2018-02-01

    Microgravity and containerless conditions, which are produced via electrostatic levitation combined with a drop tube, are important when studying the intrinsic properties of new metastable materials. Generally, temperature and image sensors can be used to measure the changes of sample temperature, morphology and volume. Then, the specific heat, surface tension, viscosity changes and sample density can be obtained. Considering that the falling speed of the material sample droplet is approximately 31.3 m/s when it reaches the bottom of a 50-meter-high drop tube, a high-speed camera with a collection rate of up to 106 frames/s is required to image the falling droplet. However, at the high-speed mode, very few pixels, approximately 48-120, will be obtained in each exposure time, which results in low image quality. Super-resolution image reconstruction is an algorithm that provides finer details than the sampling grid of a given imaging device by increasing the number of pixels per unit area in the image. In this work, we demonstrate the application of single image-resolution reconstruction in the microgravity and electrostatic levitation for the first time. Here, using the image super-resolution method based on sparse representation, a low-resolution droplet image can be reconstructed. Employed Yang's related dictionary model, high- and low-resolution image patches were combined with dictionary training, and high- and low-resolution-related dictionaries were obtained. The online double-sparse dictionary training algorithm was used in the study of related dictionaries and overcome the shortcomings of the traditional training algorithm with small image patch. During the stage of image reconstruction, the algorithm of kernel regression is added, which effectively overcomes the shortcomings of the Yang image's edge blurs.

  18. Droplet Image Super Resolution Based on Sparse Representation and Kernel Regression

    NASA Astrophysics Data System (ADS)

    Zou, Zhenzhen; Luo, Xinghong; Yu, Qiang

    2018-05-01

    Microgravity and containerless conditions, which are produced via electrostatic levitation combined with a drop tube, are important when studying the intrinsic properties of new metastable materials. Generally, temperature and image sensors can be used to measure the changes of sample temperature, morphology and volume. Then, the specific heat, surface tension, viscosity changes and sample density can be obtained. Considering that the falling speed of the material sample droplet is approximately 31.3 m/s when it reaches the bottom of a 50-meter-high drop tube, a high-speed camera with a collection rate of up to 106 frames/s is required to image the falling droplet. However, at the high-speed mode, very few pixels, approximately 48-120, will be obtained in each exposure time, which results in low image quality. Super-resolution image reconstruction is an algorithm that provides finer details than the sampling grid of a given imaging device by increasing the number of pixels per unit area in the image. In this work, we demonstrate the application of single image-resolution reconstruction in the microgravity and electrostatic levitation for the first time. Here, using the image super-resolution method based on sparse representation, a low-resolution droplet image can be reconstructed. Employed Yang's related dictionary model, high- and low-resolution image patches were combined with dictionary training, and high- and low-resolution-related dictionaries were obtained. The online double-sparse dictionary training algorithm was used in the study of related dictionaries and overcome the shortcomings of the traditional training algorithm with small image patch. During the stage of image reconstruction, the algorithm of kernel regression is added, which effectively overcomes the shortcomings of the Yang image's edge blurs.

  19. Sparse kernel methods for high-dimensional survival data.

    PubMed

    Evers, Ludger; Messow, Claudia-Martina

    2008-07-15

    Sparse kernel methods like support vector machines (SVM) have been applied with great success to classification and (standard) regression settings. Existing support vector classification and regression techniques however are not suitable for partly censored survival data, which are typically analysed using Cox's proportional hazards model. As the partial likelihood of the proportional hazards model only depends on the covariates through inner products, it can be 'kernelized'. The kernelized proportional hazards model however yields a solution that is dense, i.e. the solution depends on all observations. One of the key features of an SVM is that it yields a sparse solution, depending only on a small fraction of the training data. We propose two methods. One is based on a geometric idea, where-akin to support vector classification-the margin between the failed observation and the observations currently at risk is maximised. The other approach is based on obtaining a sparse model by adding observations one after another akin to the Import Vector Machine (IVM). Data examples studied suggest that both methods can outperform competing approaches. Software is available under the GNU Public License as an R package and can be obtained from the first author's website http://www.maths.bris.ac.uk/~maxle/software.html.

  20. Structured functional additive regression in reproducing kernel Hilbert spaces.

    PubMed

    Zhu, Hongxiao; Yao, Fang; Zhang, Hao Helen

    2014-06-01

    Functional additive models (FAMs) provide a flexible yet simple framework for regressions involving functional predictors. The utilization of data-driven basis in an additive rather than linear structure naturally extends the classical functional linear model. However, the critical issue of selecting nonlinear additive components has been less studied. In this work, we propose a new regularization framework for the structure estimation in the context of Reproducing Kernel Hilbert Spaces. The proposed approach takes advantage of the functional principal components which greatly facilitates the implementation and the theoretical analysis. The selection and estimation are achieved by penalized least squares using a penalty which encourages the sparse structure of the additive components. Theoretical properties such as the rate of convergence are investigated. The empirical performance is demonstrated through simulation studies and a real data application.

  1. Sparse Reconstruction Techniques in MRI: Methods, Applications, and Challenges to Clinical Adoption

    PubMed Central

    Yang, Alice Chieh-Yu; Kretzler, Madison; Sudarski, Sonja; Gulani, Vikas; Seiberlich, Nicole

    2016-01-01

    The family of sparse reconstruction techniques, including the recently introduced compressed sensing framework, has been extensively explored to reduce scan times in Magnetic Resonance Imaging (MRI). While there are many different methods that fall under the general umbrella of sparse reconstructions, they all rely on the idea that a priori information about the sparsity of MR images can be employed to reconstruct full images from undersampled data. This review describes the basic ideas behind sparse reconstruction techniques, how they could be applied to improve MR imaging, and the open challenges to their general adoption in a clinical setting. The fundamental principles underlying different classes of sparse reconstructions techniques are examined, and the requirements that each make on the undersampled data outlined. Applications that could potentially benefit from the accelerations that sparse reconstructions could provide are described, and clinical studies using sparse reconstructions reviewed. Lastly, technical and clinical challenges to widespread implementation of sparse reconstruction techniques, including optimization, reconstruction times, artifact appearance, and comparison with current gold-standards, are discussed. PMID:27003227

  2. Application of L1/2 regularization logistic method in heart disease diagnosis.

    PubMed

    Zhang, Bowen; Chai, Hua; Yang, Ziyi; Liang, Yong; Chu, Gejin; Liu, Xiaoying

    2014-01-01

    Heart disease has become the number one killer of human health, and its diagnosis depends on many features, such as age, blood pressure, heart rate and other dozens of physiological indicators. Although there are so many risk factors, doctors usually diagnose the disease depending on their intuition and experience, which requires a lot of knowledge and experience for correct determination. To find the hidden medical information in the existing clinical data is a noticeable and powerful approach in the study of heart disease diagnosis. In this paper, sparse logistic regression method is introduced to detect the key risk factors using L(1/2) regularization on the real heart disease data. Experimental results show that the sparse logistic L(1/2) regularization method achieves fewer but informative key features than Lasso, SCAD, MCP and Elastic net regularization approaches. Simultaneously, the proposed method can cut down the computational complexity, save cost and time to undergo medical tests and checkups, reduce the number of attributes needed to be taken from patients.

  3. Improving the Performance of Temperature Index Snowmelt Model of SWAT by Using MODIS Land Surface Temperature Data

    PubMed Central

    Yang, Yan; Onishi, Takeo; Hiramatsu, Ken

    2014-01-01

    Simulation results of the widely used temperature index snowmelt model are greatly influenced by input air temperature data. Spatially sparse air temperature data remain the main factor inducing uncertainties and errors in that model, which limits its applications. Thus, to solve this problem, we created new air temperature data using linear regression relationships that can be formulated based on MODIS land surface temperature data. The Soil Water Assessment Tool model, which includes an improved temperature index snowmelt module, was chosen to test the newly created data. By evaluating simulation performance for daily snowmelt in three test basins of the Amur River, performance of the newly created data was assessed. The coefficient of determination (R 2) and Nash-Sutcliffe efficiency (NSE) were used for evaluation. The results indicate that MODIS land surface temperature data can be used as a new source for air temperature data creation. This will improve snow simulation using the temperature index model in an area with sparse air temperature observations. PMID:25165746

  4. A Spectral Reconstruction Algorithm of Miniature Spectrometer Based on Sparse Optimization and Dictionary Learning.

    PubMed

    Zhang, Shang; Dong, Yuhan; Fu, Hongyan; Huang, Shao-Lun; Zhang, Lin

    2018-02-22

    The miniaturization of spectrometer can broaden the application area of spectrometry, which has huge academic and industrial value. Among various miniaturization approaches, filter-based miniaturization is a promising implementation by utilizing broadband filters with distinct transmission functions. Mathematically, filter-based spectral reconstruction can be modeled as solving a system of linear equations. In this paper, we propose an algorithm of spectral reconstruction based on sparse optimization and dictionary learning. To verify the feasibility of the reconstruction algorithm, we design and implement a simple prototype of a filter-based miniature spectrometer. The experimental results demonstrate that sparse optimization is well applicable to spectral reconstruction whether the spectra are directly sparse or not. As for the non-directly sparse spectra, their sparsity can be enhanced by dictionary learning. In conclusion, the proposed approach has a bright application prospect in fabricating a practical miniature spectrometer.

  5. A Spectral Reconstruction Algorithm of Miniature Spectrometer Based on Sparse Optimization and Dictionary Learning

    PubMed Central

    Zhang, Shang; Fu, Hongyan; Huang, Shao-Lun; Zhang, Lin

    2018-01-01

    The miniaturization of spectrometer can broaden the application area of spectrometry, which has huge academic and industrial value. Among various miniaturization approaches, filter-based miniaturization is a promising implementation by utilizing broadband filters with distinct transmission functions. Mathematically, filter-based spectral reconstruction can be modeled as solving a system of linear equations. In this paper, we propose an algorithm of spectral reconstruction based on sparse optimization and dictionary learning. To verify the feasibility of the reconstruction algorithm, we design and implement a simple prototype of a filter-based miniature spectrometer. The experimental results demonstrate that sparse optimization is well applicable to spectral reconstruction whether the spectra are directly sparse or not. As for the non-directly sparse spectra, their sparsity can be enhanced by dictionary learning. In conclusion, the proposed approach has a bright application prospect in fabricating a practical miniature spectrometer. PMID:29470406

  6. Ontology Sparse Vector Learning Algorithm for Ontology Similarity Measuring and Ontology Mapping via ADAL Technology

    NASA Astrophysics Data System (ADS)

    Gao, Wei; Zhu, Linli; Wang, Kaiyun

    2015-12-01

    Ontology, a model of knowledge representation and storage, has had extensive applications in pharmaceutics, social science, chemistry and biology. In the age of “big data”, the constructed concepts are often represented as higher-dimensional data by scholars, and thus the sparse learning techniques are introduced into ontology algorithms. In this paper, based on the alternating direction augmented Lagrangian method, we present an ontology optimization algorithm for ontological sparse vector learning, and a fast version of such ontology technologies. The optimal sparse vector is obtained by an iterative procedure, and the ontology function is then obtained from the sparse vector. Four simulation experiments show that our ontological sparse vector learning model has a higher precision ratio on plant ontology, humanoid robotics ontology, biology ontology and physics education ontology data for similarity measuring and ontology mapping applications.

  7. Empirical Performance of Cross-Validation With Oracle Methods in a Genomics Context.

    PubMed

    Martinez, Josue G; Carroll, Raymond J; Müller, Samuel; Sampson, Joshua N; Chatterjee, Nilanjan

    2011-11-01

    When employing model selection methods with oracle properties such as the smoothly clipped absolute deviation (SCAD) and the Adaptive Lasso, it is typical to estimate the smoothing parameter by m-fold cross-validation, for example, m = 10. In problems where the true regression function is sparse and the signals large, such cross-validation typically works well. However, in regression modeling of genomic studies involving Single Nucleotide Polymorphisms (SNP), the true regression functions, while thought to be sparse, do not have large signals. We demonstrate empirically that in such problems, the number of selected variables using SCAD and the Adaptive Lasso, with 10-fold cross-validation, is a random variable that has considerable and surprising variation. Similar remarks apply to non-oracle methods such as the Lasso. Our study strongly questions the suitability of performing only a single run of m-fold cross-validation with any oracle method, and not just the SCAD and Adaptive Lasso.

  8. Parametric Human Body Reconstruction Based on Sparse Key Points.

    PubMed

    Cheng, Ke-Li; Tong, Ruo-Feng; Tang, Min; Qian, Jing-Ye; Sarkis, Michel

    2016-11-01

    We propose an automatic parametric human body reconstruction algorithm which can efficiently construct a model using a single Kinect sensor. A user needs to stand still in front of the sensor for a couple of seconds to measure the range data. The user's body shape and pose will then be automatically constructed in several seconds. Traditional methods optimize dense correspondences between range data and meshes. In contrast, our proposed scheme relies on sparse key points for the reconstruction. It employs regression to find the corresponding key points between the scanned range data and some annotated training data. We design two kinds of feature descriptors as well as corresponding regression stages to make the regression robust and accurate. Our scheme follows with dense refinement where a pre-factorization method is applied to improve the computational efficiency. Compared with other methods, our scheme achieves similar reconstruction accuracy but significantly reduces runtime.

  9. Structured functional additive regression in reproducing kernel Hilbert spaces

    PubMed Central

    Zhu, Hongxiao; Yao, Fang; Zhang, Hao Helen

    2013-01-01

    Summary Functional additive models (FAMs) provide a flexible yet simple framework for regressions involving functional predictors. The utilization of data-driven basis in an additive rather than linear structure naturally extends the classical functional linear model. However, the critical issue of selecting nonlinear additive components has been less studied. In this work, we propose a new regularization framework for the structure estimation in the context of Reproducing Kernel Hilbert Spaces. The proposed approach takes advantage of the functional principal components which greatly facilitates the implementation and the theoretical analysis. The selection and estimation are achieved by penalized least squares using a penalty which encourages the sparse structure of the additive components. Theoretical properties such as the rate of convergence are investigated. The empirical performance is demonstrated through simulation studies and a real data application. PMID:25013362

  10. Functional interaction-based nonlinear models with application to multiplatform genomics data.

    PubMed

    Davenport, Clemontina A; Maity, Arnab; Baladandayuthapani, Veerabhadran

    2018-05-07

    Functional regression allows for a scalar response to be dependent on a functional predictor; however, not much work has been done when a scalar exposure that interacts with the functional covariate is introduced. In this paper, we present 2 functional regression models that account for this interaction and propose 2 novel estimation procedures for the parameters in these models. These estimation methods allow for a noisy and/or sparsely observed functional covariate and are easily extended to generalized exponential family responses. We compute standard errors of our estimators, which allows for further statistical inference and hypothesis testing. We compare the performance of the proposed estimators to each other and to one found in the literature via simulation and demonstrate our methods using a real data example. Copyright © 2018 John Wiley & Sons, Ltd.

  11. Fast Sparse Coding for Range Data Denoising with Sparse Ridges Constraint.

    PubMed

    Gao, Zhi; Lao, Mingjie; Sang, Yongsheng; Wen, Fei; Ramesh, Bharath; Zhai, Ruifang

    2018-05-06

    Light detection and ranging (LiDAR) sensors have been widely deployed on intelligent systems such as unmanned ground vehicles (UGVs) and unmanned aerial vehicles (UAVs) to perform localization, obstacle detection, and navigation tasks. Thus, research into range data processing with competitive performance in terms of both accuracy and efficiency has attracted increasing attention. Sparse coding has revolutionized signal processing and led to state-of-the-art performance in a variety of applications. However, dictionary learning, which plays the central role in sparse coding techniques, is computationally demanding, resulting in its limited applicability in real-time systems. In this study, we propose sparse coding algorithms with a fixed pre-learned ridge dictionary to realize range data denoising via leveraging the regularity of laser range measurements in man-made environments. Experiments on both synthesized data and real data demonstrate that our method obtains accuracy comparable to that of sophisticated sparse coding methods, but with much higher computational efficiency.

  12. Multi-frame linear regressive filter for the measurement of infrared pixel spatial response and MTF from sparse data.

    PubMed

    Huard, Edouard; Derelle, Sophie; Jaeck, Julien; Nghiem, Jean; Haïdar, Riad; Primot, Jérôme

    2018-03-05

    A challenging point in the prediction of the image quality of infrared imaging systems is the evaluation of the detector modulation transfer function (MTF). In this paper, we present a linear method to get a 2D continuous MTF from sparse spectral data. Within the method, an object with a predictable sparse spatial spectrum is imaged by the focal plane array. The sparse data is then treated to return the 2D continuous MTF with the hypothesis that all the pixels have an identical spatial response. The linearity of the treatment is a key point to estimate directly the error bars of the resulting detector MTF. The test bench will be presented along with measurement tests on a 25 μm pitch InGaAs detector.

  13. Sparse regressions for predicting and interpreting subcellular localization of multi-label proteins.

    PubMed

    Wan, Shibiao; Mak, Man-Wai; Kung, Sun-Yuan

    2016-02-24

    Predicting protein subcellular localization is indispensable for inferring protein functions. Recent studies have been focusing on predicting not only single-location proteins, but also multi-location proteins. Almost all of the high performing predictors proposed recently use gene ontology (GO) terms to construct feature vectors for classification. Despite their high performance, their prediction decisions are difficult to interpret because of the large number of GO terms involved. This paper proposes using sparse regressions to exploit GO information for both predicting and interpreting subcellular localization of single- and multi-location proteins. Specifically, we compared two multi-label sparse regression algorithms, namely multi-label LASSO (mLASSO) and multi-label elastic net (mEN), for large-scale predictions of protein subcellular localization. Both algorithms can yield sparse and interpretable solutions. By using the one-vs-rest strategy, mLASSO and mEN identified 87 and 429 out of more than 8,000 GO terms, respectively, which play essential roles in determining subcellular localization. More interestingly, many of the GO terms selected by mEN are from the biological process and molecular function categories, suggesting that the GO terms of these categories also play vital roles in the prediction. With these essential GO terms, not only where a protein locates can be decided, but also why it resides there can be revealed. Experimental results show that the output of both mEN and mLASSO are interpretable and they perform significantly better than existing state-of-the-art predictors. Moreover, mEN selects more features and performs better than mLASSO on a stringent human benchmark dataset. For readers' convenience, an online server called SpaPredictor for both mLASSO and mEN is available at http://bioinfo.eie.polyu.edu.hk/SpaPredictorServer/.

  14. Separation in Logistic Regression: Causes, Consequences, and Control.

    PubMed

    Mansournia, Mohammad Ali; Geroldinger, Angelika; Greenland, Sander; Heinze, Georg

    2018-04-01

    Separation is encountered in regression models with a discrete outcome (such as logistic regression) where the covariates perfectly predict the outcome. It is most frequent under the same conditions that lead to small-sample and sparse-data bias, such as presence of a rare outcome, rare exposures, highly correlated covariates, or covariates with strong effects. In theory, separation will produce infinite estimates for some coefficients. In practice, however, separation may be unnoticed or mishandled because of software limits in recognizing and handling the problem and in notifying the user. We discuss causes of separation in logistic regression and describe how common software packages deal with it. We then describe methods that remove separation, focusing on the same penalized-likelihood techniques used to address more general sparse-data problems. These methods improve accuracy, avoid software problems, and allow interpretation as Bayesian analyses with weakly informative priors. We discuss likelihood penalties, including some that can be implemented easily with any software package, and their relative advantages and disadvantages. We provide an illustration of ideas and methods using data from a case-control study of contraceptive practices and urinary tract infection.

  15. Comparison of l₁-Norm SVR and Sparse Coding Algorithms for Linear Regression.

    PubMed

    Zhang, Qingtian; Hu, Xiaolin; Zhang, Bo

    2015-08-01

    Support vector regression (SVR) is a popular function estimation technique based on Vapnik's concept of support vector machine. Among many variants, the l1-norm SVR is known to be good at selecting useful features when the features are redundant. Sparse coding (SC) is a technique widely used in many areas and a number of efficient algorithms are available. Both l1-norm SVR and SC can be used for linear regression. In this brief, the close connection between the l1-norm SVR and SC is revealed and some typical algorithms are compared for linear regression. The results show that the SC algorithms outperform the Newton linear programming algorithm, an efficient l1-norm SVR algorithm, in efficiency. The algorithms are then used to design the radial basis function (RBF) neural networks. Experiments on some benchmark data sets demonstrate the high efficiency of the SC algorithms. In particular, one of the SC algorithms, the orthogonal matching pursuit is two orders of magnitude faster than a well-known RBF network designing algorithm, the orthogonal least squares algorithm.

  16. Fast Sparse Coding for Range Data Denoising with Sparse Ridges Constraint

    PubMed Central

    Lao, Mingjie; Sang, Yongsheng; Wen, Fei; Zhai, Ruifang

    2018-01-01

    Light detection and ranging (LiDAR) sensors have been widely deployed on intelligent systems such as unmanned ground vehicles (UGVs) and unmanned aerial vehicles (UAVs) to perform localization, obstacle detection, and navigation tasks. Thus, research into range data processing with competitive performance in terms of both accuracy and efficiency has attracted increasing attention. Sparse coding has revolutionized signal processing and led to state-of-the-art performance in a variety of applications. However, dictionary learning, which plays the central role in sparse coding techniques, is computationally demanding, resulting in its limited applicability in real-time systems. In this study, we propose sparse coding algorithms with a fixed pre-learned ridge dictionary to realize range data denoising via leveraging the regularity of laser range measurements in man-made environments. Experiments on both synthesized data and real data demonstrate that our method obtains accuracy comparable to that of sophisticated sparse coding methods, but with much higher computational efficiency. PMID:29734793

  17. Bayesian Scalar-on-Image Regression with Application to Association Between Intracranial DTI and Cognitive Outcomes

    PubMed Central

    Huang, Lei; Goldsmith, Jeff; Reiss, Philip T.; Reich, Daniel S.; Crainiceanu, Ciprian M.

    2013-01-01

    Diffusion tensor imaging (DTI) measures water diffusion within white matter, allowing for in vivo quantification of brain pathways. These pathways often subserve specific functions, and impairment of those functions is often associated with imaging abnormalities. As a method for predicting clinical disability from DTI images, we propose a hierarchical Bayesian “scalar-on-image” regression procedure. Our procedure introduces a latent binary map that estimates the locations of predictive voxels and penalizes the magnitude of effect sizes in these voxels, thereby resolving the ill-posed nature of the problem. By inducing a spatial prior structure, the procedure yields a sparse association map that also maintains spatial continuity of predictive regions. The method is demonstrated on a simulation study and on a study of association between fractional anisotropy and cognitive disability in a cross-sectional sample of 135 multiple sclerosis patients. PMID:23792220

  18. Potential application of machine learning in health outcomes research and some statistical cautions.

    PubMed

    Crown, William H

    2015-03-01

    Traditional analytic methods are often ill-suited to the evolving world of health care big data characterized by massive volume, complexity, and velocity. In particular, methods are needed that can estimate models efficiently using very large datasets containing healthcare utilization data, clinical data, data from personal devices, and many other sources. Although very large, such datasets can also be quite sparse (e.g., device data may only be available for a small subset of individuals), which creates problems for traditional regression models. Many machine learning methods address such limitations effectively but are still subject to the usual sources of bias that commonly arise in observational studies. Researchers using machine learning methods such as lasso or ridge regression should assess these models using conventional specification tests. Copyright © 2015 International Society for Pharmacoeconomics and Outcomes Research (ISPOR). Published by Elsevier Inc. All rights reserved.

  19. Statistics of single unit responses in the human medial temporal lobe: A sparse and overdispersed code

    NASA Astrophysics Data System (ADS)

    Magyar, Andrew

    The recent discovery of cells that respond to purely conceptual features of the environment (particular people, landmarks, objects, etc) in the human medial temporal lobe (MTL), has raised many questions about the nature of the neural code in humans. The goal of this dissertation is to develop a novel statistical method based upon maximum likelihood regression which will then be applied to these experiments in order to produce a quantitative description of the coding properties of the human MTL. In general, the method is applicable to any experiments in which a sequence of stimuli are presented to an organism while the binary responses of a large number of cells are recorded in parallel. The central concept underlying the approach is the total probability that a neuron responds to a random stimulus, called the neuronal sparsity. The model then estimates the distribution of response probabilities across the population of cells. Applying the method to single-unit recordings from the human medial temporal lobe, estimates of the sparsity distributions are acquired in four regions: the hippocampus, the entorhinal cortex, the amygdala, and the parahippocampal cortex. The resulting distributions are found to be sparse (large fraction of cells with a low response probability) and highly non-uniform, with a large proportion of ultra-sparse neurons that possess a very low response probability, and a smaller population of cells which respond much more frequently. Rammifications of the results are discussed in relation to the sparse coding hypothesis, and comparisons are made between the statistics of the human medial temporal lobe cells and place cells observed in the rodent hippocampus.

  20. Empirical Performance of Cross-Validation With Oracle Methods in a Genomics Context

    PubMed Central

    Martinez, Josue G.; Carroll, Raymond J.; Müller, Samuel; Sampson, Joshua N.; Chatterjee, Nilanjan

    2012-01-01

    When employing model selection methods with oracle properties such as the smoothly clipped absolute deviation (SCAD) and the Adaptive Lasso, it is typical to estimate the smoothing parameter by m-fold cross-validation, for example, m = 10. In problems where the true regression function is sparse and the signals large, such cross-validation typically works well. However, in regression modeling of genomic studies involving Single Nucleotide Polymorphisms (SNP), the true regression functions, while thought to be sparse, do not have large signals. We demonstrate empirically that in such problems, the number of selected variables using SCAD and the Adaptive Lasso, with 10-fold cross-validation, is a random variable that has considerable and surprising variation. Similar remarks apply to non-oracle methods such as the Lasso. Our study strongly questions the suitability of performing only a single run of m-fold cross-validation with any oracle method, and not just the SCAD and Adaptive Lasso. PMID:22347720

  1. Magnetic Resonance Super-resolution Imaging Measurement with Dictionary-optimized Sparse Learning

    NASA Astrophysics Data System (ADS)

    Li, Jun-Bao; Liu, Jing; Pan, Jeng-Shyang; Yao, Hongxun

    2017-06-01

    Magnetic Resonance Super-resolution Imaging Measurement (MRIM) is an effective way of measuring materials. MRIM has wide applications in physics, chemistry, biology, geology, medical and material science, especially in medical diagnosis. It is feasible to improve the resolution of MR imaging through increasing radiation intensity, but the high radiation intensity and the longtime of magnetic field harm the human body. Thus, in the practical applications the resolution of hardware imaging reaches the limitation of resolution. Software-based super-resolution technology is effective to improve the resolution of image. This work proposes a framework of dictionary-optimized sparse learning based MR super-resolution method. The framework is to solve the problem of sample selection for dictionary learning of sparse reconstruction. The textural complexity-based image quality representation is proposed to choose the optimal samples for dictionary learning. Comprehensive experiments show that the dictionary-optimized sparse learning improves the performance of sparse representation.

  2. Reconstruction of spatio-temporal temperature from sparse historical records using robust probabilistic principal component regression

    USGS Publications Warehouse

    Tipton, John; Hooten, Mevin B.; Goring, Simon

    2017-01-01

    Scientific records of temperature and precipitation have been kept for several hundred years, but for many areas, only a shorter record exists. To understand climate change, there is a need for rigorous statistical reconstructions of the paleoclimate using proxy data. Paleoclimate proxy data are often sparse, noisy, indirect measurements of the climate process of interest, making each proxy uniquely challenging to model statistically. We reconstruct spatially explicit temperature surfaces from sparse and noisy measurements recorded at historical United States military forts and other observer stations from 1820 to 1894. One common method for reconstructing the paleoclimate from proxy data is principal component regression (PCR). With PCR, one learns a statistical relationship between the paleoclimate proxy data and a set of climate observations that are used as patterns for potential reconstruction scenarios. We explore PCR in a Bayesian hierarchical framework, extending classical PCR in a variety of ways. First, we model the latent principal components probabilistically, accounting for measurement error in the observational data. Next, we extend our method to better accommodate outliers that occur in the proxy data. Finally, we explore alternatives to the truncation of lower-order principal components using different regularization techniques. One fundamental challenge in paleoclimate reconstruction efforts is the lack of out-of-sample data for predictive validation. Cross-validation is of potential value, but is computationally expensive and potentially sensitive to outliers in sparse data scenarios. To overcome the limitations that a lack of out-of-sample records presents, we test our methods using a simulation study, applying proper scoring rules including a computationally efficient approximation to leave-one-out cross-validation using the log score to validate model performance. The result of our analysis is a spatially explicit reconstruction of spatio-temporal temperature from a very sparse historical record.

  3. Algorithms and Application of Sparse Matrix Assembly and Equation Solvers for Aeroacoustics

    NASA Technical Reports Server (NTRS)

    Watson, W. R.; Nguyen, D. T.; Reddy, C. J.; Vatsa, V. N.; Tang, W. H.

    2001-01-01

    An algorithm for symmetric sparse equation solutions on an unstructured grid is described. Efficient, sequential sparse algorithms for degree-of-freedom reordering, supernodes, symbolic/numerical factorization, and forward backward solution phases are reviewed. Three sparse algorithms for the generation and assembly of symmetric systems of matrix equations are presented. The accuracy and numerical performance of the sequential version of the sparse algorithms are evaluated over the frequency range of interest in a three-dimensional aeroacoustics application. Results show that the solver solutions are accurate using a discretization of 12 points per wavelength. Results also show that the first assembly algorithm is impractical for high-frequency noise calculations. The second and third assembly algorithms have nearly equal performance at low values of source frequencies, but at higher values of source frequencies the third algorithm saves CPU time and RAM. The CPU time and the RAM required by the second and third assembly algorithms are two orders of magnitude smaller than that required by the sparse equation solver. A sequential version of these sparse algorithms can, therefore, be conveniently incorporated into a substructuring for domain decomposition formulation to achieve parallel computation, where different substructures are handles by different parallel processors.

  4. Sparse Regression Based Structure Learning of Stochastic Reaction Networks from Single Cell Snapshot Time Series.

    PubMed

    Klimovskaia, Anna; Ganscha, Stefan; Claassen, Manfred

    2016-12-01

    Stochastic chemical reaction networks constitute a model class to quantitatively describe dynamics and cell-to-cell variability in biological systems. The topology of these networks typically is only partially characterized due to experimental limitations. Current approaches for refining network topology are based on the explicit enumeration of alternative topologies and are therefore restricted to small problem instances with almost complete knowledge. We propose the reactionet lasso, a computational procedure that derives a stepwise sparse regression approach on the basis of the Chemical Master Equation, enabling large-scale structure learning for reaction networks by implicitly accounting for billions of topology variants. We have assessed the structure learning capabilities of the reactionet lasso on synthetic data for the complete TRAIL induced apoptosis signaling cascade comprising 70 reactions. We find that the reactionet lasso is able to efficiently recover the structure of these reaction systems, ab initio, with high sensitivity and specificity. With only < 1% false discoveries, the reactionet lasso is able to recover 45% of all true reactions ab initio among > 6000 possible reactions and over 102000 network topologies. In conjunction with information rich single cell technologies such as single cell RNA sequencing or mass cytometry, the reactionet lasso will enable large-scale structure learning, particularly in areas with partial network structure knowledge, such as cancer biology, and thereby enable the detection of pathological alterations of reaction networks. We provide software to allow for wide applicability of the reactionet lasso.

  5. The cross-validated AUC for MCP-logistic regression with high-dimensional data.

    PubMed

    Jiang, Dingfeng; Huang, Jian; Zhang, Ying

    2013-10-01

    We propose a cross-validated area under the receiving operator characteristic (ROC) curve (CV-AUC) criterion for tuning parameter selection for penalized methods in sparse, high-dimensional logistic regression models. We use this criterion in combination with the minimax concave penalty (MCP) method for variable selection. The CV-AUC criterion is specifically designed for optimizing the classification performance for binary outcome data. To implement the proposed approach, we derive an efficient coordinate descent algorithm to compute the MCP-logistic regression solution surface. Simulation studies are conducted to evaluate the finite sample performance of the proposed method and its comparison with the existing methods including the Akaike information criterion (AIC), Bayesian information criterion (BIC) or Extended BIC (EBIC). The model selected based on the CV-AUC criterion tends to have a larger predictive AUC and smaller classification error than those with tuning parameters selected using the AIC, BIC or EBIC. We illustrate the application of the MCP-logistic regression with the CV-AUC criterion on three microarray datasets from the studies that attempt to identify genes related to cancers. Our simulation studies and data examples demonstrate that the CV-AUC is an attractive method for tuning parameter selection for penalized methods in high-dimensional logistic regression models.

  6. Efficient robust doubly adaptive regularized regression with applications.

    PubMed

    Karunamuni, Rohana J; Kong, Linglong; Tu, Wei

    2018-01-01

    We consider the problem of estimation and variable selection for general linear regression models. Regularized regression procedures have been widely used for variable selection, but most existing methods perform poorly in the presence of outliers. We construct a new penalized procedure that simultaneously attains full efficiency and maximum robustness. Furthermore, the proposed procedure satisfies the oracle properties. The new procedure is designed to achieve sparse and robust solutions by imposing adaptive weights on both the decision loss and the penalty function. The proposed method of estimation and variable selection attains full efficiency when the model is correct and, at the same time, achieves maximum robustness when outliers are present. We examine the robustness properties using the finite-sample breakdown point and an influence function. We show that the proposed estimator attains the maximum breakdown point. Furthermore, there is no loss in efficiency when there are no outliers or the error distribution is normal. For practical implementation of the proposed method, we present a computational algorithm. We examine the finite-sample and robustness properties using Monte Carlo studies. Two datasets are also analyzed.

  7. Simultaneous multiple non-crossing quantile regression estimation using kernel constraints

    PubMed Central

    Liu, Yufeng; Wu, Yichao

    2011-01-01

    Quantile regression (QR) is a very useful statistical tool for learning the relationship between the response variable and covariates. For many applications, one often needs to estimate multiple conditional quantile functions of the response variable given covariates. Although one can estimate multiple quantiles separately, it is of great interest to estimate them simultaneously. One advantage of simultaneous estimation is that multiple quantiles can share strength among them to gain better estimation accuracy than individually estimated quantile functions. Another important advantage of joint estimation is the feasibility of incorporating simultaneous non-crossing constraints of QR functions. In this paper, we propose a new kernel-based multiple QR estimation technique, namely simultaneous non-crossing quantile regression (SNQR). We use kernel representations for QR functions and apply constraints on the kernel coefficients to avoid crossing. Both unregularised and regularised SNQR techniques are considered. Asymptotic properties such as asymptotic normality of linear SNQR and oracle properties of the sparse linear SNQR are developed. Our numerical results demonstrate the competitive performance of our SNQR over the original individual QR estimation. PMID:22190842

  8. Mapping soil textural fractions across a large watershed in north-east Florida.

    PubMed

    Lamsal, S; Mishra, U

    2010-08-01

    Assessment of regional scale soil spatial variation and mapping their distribution is constrained by sparse data which are collected using field surveys that are labor intensive and cost prohibitive. We explored geostatistical (ordinary kriging-OK), regression (Regression Tree-RT), and hybrid methods (RT plus residual Sequential Gaussian Simulation-SGS) to map soil textural fractions across the Santa Fe River Watershed (3585 km(2)) in north-east Florida. Soil samples collected from four depths (L1: 0-30 cm, L2: 30-60 cm, L3: 60-120 cm, and L4: 120-180 cm) at 141 locations were analyzed for soil textural fractions (sand, silt and clay contents), and combined with textural data (15 profiles) assembled under the Florida Soil Characterization program. Textural fractions in L1 and L2 were autocorrelated, and spatially mapped across the watershed. OK performance was poor, which may be attributed to the sparse sampling. RT model structure varied among textural fractions, and the model explained variations ranged from 25% for L1 silt to 61% for L2 clay content. Regression residuals were simulated using SGS, and the average of simulated residuals were used to approximate regression residual distribution map, which were added to regression trend maps. Independent validation of the prediction maps showed that regression models performed slightly better than OK, and regression combined with average of simulated regression residuals improved predictions beyond the regression model. Sand content >90% in both 0-30 and 30-60 cm covered 80.6% of the watershed area. Copyright 2010 Elsevier Ltd. All rights reserved.

  9. Predictive sparse modeling of fMRI data for improved classification, regression, and visualization using the k-support norm.

    PubMed

    Belilovsky, Eugene; Gkirtzou, Katerina; Misyrlis, Michail; Konova, Anna B; Honorio, Jean; Alia-Klein, Nelly; Goldstein, Rita Z; Samaras, Dimitris; Blaschko, Matthew B

    2015-12-01

    We explore various sparse regularization techniques for analyzing fMRI data, such as the ℓ1 norm (often called LASSO in the context of a squared loss function), elastic net, and the recently introduced k-support norm. Employing sparsity regularization allows us to handle the curse of dimensionality, a problem commonly found in fMRI analysis. In this work we consider sparse regularization in both the regression and classification settings. We perform experiments on fMRI scans from cocaine-addicted as well as healthy control subjects. We show that in many cases, use of the k-support norm leads to better predictive performance, solution stability, and interpretability as compared to other standard approaches. We additionally analyze the advantages of using the absolute loss function versus the standard squared loss which leads to significantly better predictive performance for the regularization methods tested in almost all cases. Our results support the use of the k-support norm for fMRI analysis and on the clinical side, the generalizability of the I-RISA model of cocaine addiction. Copyright © 2015 Elsevier Ltd. All rights reserved.

  10. Time-Frequency Analysis of Non-Stationary Biological Signals with Sparse Linear Regression Based Fourier Linear Combiner.

    PubMed

    Wang, Yubo; Veluvolu, Kalyana C

    2017-06-14

    It is often difficult to analyze biological signals because of their nonlinear and non-stationary characteristics. This necessitates the usage of time-frequency decomposition methods for analyzing the subtle changes in these signals that are often connected to an underlying phenomena. This paper presents a new approach to analyze the time-varying characteristics of such signals by employing a simple truncated Fourier series model, namely the band-limited multiple Fourier linear combiner (BMFLC). In contrast to the earlier designs, we first identified the sparsity imposed on the signal model in order to reformulate the model to a sparse linear regression model. The coefficients of the proposed model are then estimated by a convex optimization algorithm. The performance of the proposed method was analyzed with benchmark test signals. An energy ratio metric is employed to quantify the spectral performance and results show that the proposed method Sparse-BMFLC has high mean energy (0.9976) ratio and outperforms existing methods such as short-time Fourier transfrom (STFT), continuous Wavelet transform (CWT) and BMFLC Kalman Smoother. Furthermore, the proposed method provides an overall 6.22% in reconstruction error.

  11. Application of Multi-task Sparse Lasso Feature Extraction and Support Vector Machine Regression in the Stellar Atmospheric Parameterization

    NASA Astrophysics Data System (ADS)

    Gao, Wei; Li, Xiang-ru

    2017-07-01

    The multi-task learning takes the multiple tasks together to make analysis and calculation, so as to dig out the correlations among them, and therefore to improve the accuracy of the analyzed results. This kind of methods have been widely applied to the machine learning, pattern recognition, computer vision, and other related fields. This paper investigates the application of multi-task learning in estimating the stellar atmospheric parameters, including the surface temperature (Teff), surface gravitational acceleration (lg g), and chemical abundance ([Fe/H]). Firstly, the spectral features of the three stellar atmospheric parameters are extracted by using the multi-task sparse group Lasso algorithm, then the support vector machine is used to estimate the atmospheric physical parameters. The proposed scheme is evaluated on both the Sloan stellar spectra and the theoretical spectra computed from the Kurucz's New Opacity Distribution Function (NEWODF) model. The mean absolute errors (MAEs) on the Sloan spectra are: 0.0064 for lg (Teff /K), 0.1622 for lg (g/(cm · s-2)), and 0.1221 dex for [Fe/H]; the MAEs on the synthetic spectra are 0.0006 for lg (Teff /K), 0.0098 for lg (g/(cm · s-2)), and 0.0082 dex for [Fe/H]. Experimental results show that the proposed scheme has a rather high accuracy for the estimation of stellar atmospheric parameters.

  12. Building interpretable predictive models for pediatric hospital readmission using Tree-Lasso logistic regression.

    PubMed

    Jovanovic, Milos; Radovanovic, Sandro; Vukicevic, Milan; Van Poucke, Sven; Delibasic, Boris

    2016-09-01

    Quantification and early identification of unplanned readmission risk have the potential to improve the quality of care during hospitalization and after discharge. However, high dimensionality, sparsity, and class imbalance of electronic health data and the complexity of risk quantification, challenge the development of accurate predictive models. Predictive models require a certain level of interpretability in order to be applicable in real settings and create actionable insights. This paper aims to develop accurate and interpretable predictive models for readmission in a general pediatric patient population, by integrating a data-driven model (sparse logistic regression) and domain knowledge based on the international classification of diseases 9th-revision clinical modification (ICD-9-CM) hierarchy of diseases. Additionally, we propose a way to quantify the interpretability of a model and inspect the stability of alternative solutions. The analysis was conducted on >66,000 pediatric hospital discharge records from California, State Inpatient Databases, Healthcare Cost and Utilization Project between 2009 and 2011. We incorporated domain knowledge based on the ICD-9-CM hierarchy in a data driven, Tree-Lasso regularized logistic regression model, providing the framework for model interpretation. This approach was compared with traditional Lasso logistic regression resulting in models that are easier to interpret by fewer high-level diagnoses, with comparable prediction accuracy. The results revealed that the use of a Tree-Lasso model was as competitive in terms of accuracy (measured by area under the receiver operating characteristic curve-AUC) as the traditional Lasso logistic regression, but integration with the ICD-9-CM hierarchy of diseases provided more interpretable models in terms of high-level diagnoses. Additionally, interpretations of models are in accordance with existing medical understanding of pediatric readmission. Best performing models have similar performances reaching AUC values 0.783 and 0.779 for traditional Lasso and Tree-Lasso, respectfully. However, information loss of Lasso models is 0.35 bits higher compared to Tree-Lasso model. We propose a method for building predictive models applicable for the detection of readmission risk based on Electronic Health records. Integration of domain knowledge (in the form of ICD-9-CM taxonomy) and a data-driven, sparse predictive algorithm (Tree-Lasso Logistic Regression) resulted in an increase of interpretability of the resulting model. The models are interpreted for the readmission prediction problem in general pediatric population in California, as well as several important subpopulations, and the interpretations of models comply with existing medical understanding of pediatric readmission. Finally, quantitative assessment of the interpretability of the models is given, that is beyond simple counts of selected low-level features. Copyright © 2016 Elsevier B.V. All rights reserved.

  13. Sparse High Dimensional Models in Economics

    PubMed Central

    Fan, Jianqing; Lv, Jinchi; Qi, Lei

    2010-01-01

    This paper reviews the literature on sparse high dimensional models and discusses some applications in economics and finance. Recent developments of theory, methods, and implementations in penalized least squares and penalized likelihood methods are highlighted. These variable selection methods are proved to be effective in high dimensional sparse modeling. The limits of dimensionality that regularization methods can handle, the role of penalty functions, and their statistical properties are detailed. Some recent advances in ultra-high dimensional sparse modeling are also briefly discussed. PMID:22022635

  14. Feature Clustering for Accelerating Parallel Coordinate Descent

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Scherrer, Chad; Tewari, Ambuj; Halappanavar, Mahantesh

    2012-12-06

    We demonstrate an approach for accelerating calculation of the regularization path for L1 sparse logistic regression problems. We show the benefit of feature clustering as a preconditioning step for parallel block-greedy coordinate descent algorithms.

  15. Cross-domain expression recognition based on sparse coding and transfer learning

    NASA Astrophysics Data System (ADS)

    Yang, Yong; Zhang, Weiyi; Huang, Yong

    2017-05-01

    Traditional facial expression recognition methods usually assume that the training set and the test set are independent and identically distributed. However, in actual expression recognition applications, the conditions of independent and identical distribution are hardly satisfied for the training set and test set because of the difference of light, shade, race and so on. In order to solve this problem and improve the performance of expression recognition in the actual applications, a novel method based on transfer learning and sparse coding is applied to facial expression recognition. First of all, a common primitive model, that is, the dictionary is learnt. Then, based on the idea of transfer learning, the learned primitive pattern is transferred to facial expression and the corresponding feature representation is obtained by sparse coding. The experimental results in CK +, JAFFE and NVIE database shows that the transfer learning based on sparse coding method can effectively improve the expression recognition rate in the cross-domain expression recognition task and is suitable for the practical facial expression recognition applications.

  16. An information theoretic approach of designing sparse kernel adaptive filters.

    PubMed

    Liu, Weifeng; Park, Il; Principe, José C

    2009-12-01

    This paper discusses an information theoretic approach of designing sparse kernel adaptive filters. To determine useful data to be learned and remove redundant ones, a subjective information measure called surprise is introduced. Surprise captures the amount of information a datum contains which is transferable to a learning system. Based on this concept, we propose a systematic sparsification scheme, which can drastically reduce the time and space complexity without harming the performance of kernel adaptive filters. Nonlinear regression, short term chaotic time-series prediction, and long term time-series forecasting examples are presented.

  17. Encrypted data stream identification using randomness sparse representation and fuzzy Gaussian mixture model

    NASA Astrophysics Data System (ADS)

    Zhang, Hong; Hou, Rui; Yi, Lei; Meng, Juan; Pan, Zhisong; Zhou, Yuhuan

    2016-07-01

    The accurate identification of encrypted data stream helps to regulate illegal data, detect network attacks and protect users' information. In this paper, a novel encrypted data stream identification algorithm is introduced. The proposed method is based on randomness characteristics of encrypted data stream. We use a l1-norm regularized logistic regression to improve sparse representation of randomness features and Fuzzy Gaussian Mixture Model (FGMM) to improve identification accuracy. Experimental results demonstrate that the method can be adopted as an effective technique for encrypted data stream identification.

  18. Variable Selection for Nonparametric Quantile Regression via Smoothing Spline AN OVA

    PubMed Central

    Lin, Chen-Yen; Bondell, Howard; Zhang, Hao Helen; Zou, Hui

    2014-01-01

    Quantile regression provides a more thorough view of the effect of covariates on a response. Nonparametric quantile regression has become a viable alternative to avoid restrictive parametric assumption. The problem of variable selection for quantile regression is challenging, since important variables can influence various quantiles in different ways. We tackle the problem via regularization in the context of smoothing spline ANOVA models. The proposed sparse nonparametric quantile regression (SNQR) can identify important variables and provide flexible estimates for quantiles. Our numerical study suggests the promising performance of the new procedure in variable selection and function estimation. Supplementary materials for this article are available online. PMID:24554792

  19. New machine-learning algorithms for prediction of Parkinson's disease

    NASA Astrophysics Data System (ADS)

    Mandal, Indrajit; Sairam, N.

    2014-03-01

    This article presents an enhanced prediction accuracy of diagnosis of Parkinson's disease (PD) to prevent the delay and misdiagnosis of patients using the proposed robust inference system. New machine-learning methods are proposed and performance comparisons are based on specificity, sensitivity, accuracy and other measurable parameters. The robust methods of treating Parkinson's disease (PD) includes sparse multinomial logistic regression, rotation forest ensemble with support vector machines and principal components analysis, artificial neural networks, boosting methods. A new ensemble method comprising of the Bayesian network optimised by Tabu search algorithm as classifier and Haar wavelets as projection filter is used for relevant feature selection and ranking. The highest accuracy obtained by linear logistic regression and sparse multinomial logistic regression is 100% and sensitivity, specificity of 0.983 and 0.996, respectively. All the experiments are conducted over 95% and 99% confidence levels and establish the results with corrected t-tests. This work shows a high degree of advancement in software reliability and quality of the computer-aided diagnosis system and experimentally shows best results with supportive statistical inference.

  20. Robust extraction of basis functions for simultaneous and proportional myoelectric control via sparse non-negative matrix factorization

    NASA Astrophysics Data System (ADS)

    Lin, Chuang; Wang, Binghui; Jiang, Ning; Farina, Dario

    2018-04-01

    Objective. This paper proposes a novel simultaneous and proportional multiple degree of freedom (DOF) myoelectric control method for active prostheses. Approach. The approach is based on non-negative matrix factorization (NMF) of surface EMG signals with the inclusion of sparseness constraints. By applying a sparseness constraint to the control signal matrix, it is possible to extract the basis information from arbitrary movements (quasi-unsupervised approach) for multiple DOFs concurrently. Main Results. In online testing based on target hitting, able-bodied subjects reached a greater throughput (TP) when using sparse NMF (SNMF) than with classic NMF or with linear regression (LR). Accordingly, the completion time (CT) was shorter for SNMF than NMF or LR. The same observations were made in two patients with unilateral limb deficiencies. Significance. The addition of sparseness constraints to NMF allows for a quasi-unsupervised approach to myoelectric control with superior results with respect to previous methods for the simultaneous and proportional control of multi-DOF. The proposed factorization algorithm allows robust simultaneous and proportional control, is superior to previous supervised algorithms, and, because of minimal supervision, paves the way to online adaptation in myoelectric control.

  1. Signal processing using sparse derivatives with applications to chromatograms and ECG

    NASA Astrophysics Data System (ADS)

    Ning, Xiaoran

    In this thesis, we investigate the sparsity exist in the derivative domain. Particularly, we focus on the type of signals which posses up to Mth (M > 0) order sparse derivatives. Efforts are put on formulating proper penalty functions and optimization problems to capture properties related to sparse derivatives, searching for fast, computationally efficient solvers. Also the effectiveness of these algorithms are applied to two real world applications. In the first application, we provide an algorithm which jointly addresses the problems of chromatogram baseline correction and noise reduction. The series of chromatogram peaks are modeled as sparse with sparse derivatives, and the baseline is modeled as a low-pass signal. A convex optimization problem is formulated so as to encapsulate these non-parametric models. To account for the positivity of chromatogram peaks, an asymmetric penalty function is also utilized with symmetric penalty functions. A robust, computationally efficient, iterative algorithm is developed that is guaranteed to converge to the unique optimal solution. The approach, termed Baseline Estimation And Denoising with Sparsity (BEADS), is evaluated and compared with two state-of-the-art methods using both simulated and real chromatogram data. Promising result is obtained. In the second application, a novel Electrocardiography (ECG) enhancement algorithm is designed also based on sparse derivatives. In the real medical environment, ECG signals are often contaminated by various kinds of noise or artifacts, for example, morphological changes due to motion artifact, non-stationary noise due to muscular contraction (EMG), etc. Some of these contaminations severely affect the usefulness of ECG signals, especially when computer aided algorithms are utilized. By solving the proposed convex l1 optimization problem, artifacts are reduced by modeling the clean ECG signal as a sum of two signals whose second and third-order derivatives (differences) are sparse respectively. At the end, the algorithm is applied to a QRS detection system and validated using the MIT-BIH Arrhythmia database (109452 anotations), resulting a sensitivity of Se = 99.87%$ and a positive prediction of +P = 99.88%.

  2. L2-norm multiple kernel learning and its application to biomedical data fusion

    PubMed Central

    2010-01-01

    Background This paper introduces the notion of optimizing different norms in the dual problem of support vector machines with multiple kernels. The selection of norms yields different extensions of multiple kernel learning (MKL) such as L∞, L1, and L2 MKL. In particular, L2 MKL is a novel method that leads to non-sparse optimal kernel coefficients, which is different from the sparse kernel coefficients optimized by the existing L∞ MKL method. In real biomedical applications, L2 MKL may have more advantages over sparse integration method for thoroughly combining complementary information in heterogeneous data sources. Results We provide a theoretical analysis of the relationship between the L2 optimization of kernels in the dual problem with the L2 coefficient regularization in the primal problem. Understanding the dual L2 problem grants a unified view on MKL and enables us to extend the L2 method to a wide range of machine learning problems. We implement L2 MKL for ranking and classification problems and compare its performance with the sparse L∞ and the averaging L1 MKL methods. The experiments are carried out on six real biomedical data sets and two large scale UCI data sets. L2 MKL yields better performance on most of the benchmark data sets. In particular, we propose a novel L2 MKL least squares support vector machine (LSSVM) algorithm, which is shown to be an efficient and promising classifier for large scale data sets processing. Conclusions This paper extends the statistical framework of genomic data fusion based on MKL. Allowing non-sparse weights on the data sources is an attractive option in settings where we believe most data sources to be relevant to the problem at hand and want to avoid a "winner-takes-all" effect seen in L∞ MKL, which can be detrimental to the performance in prospective studies. The notion of optimizing L2 kernels can be straightforwardly extended to ranking, classification, regression, and clustering algorithms. To tackle the computational burden of MKL, this paper proposes several novel LSSVM based MKL algorithms. Systematic comparison on real data sets shows that LSSVM MKL has comparable performance as the conventional SVM MKL algorithms. Moreover, large scale numerical experiments indicate that when cast as semi-infinite programming, LSSVM MKL can be solved more efficiently than SVM MKL. Availability The MATLAB code of algorithms implemented in this paper is downloadable from http://homes.esat.kuleuven.be/~sistawww/bioi/syu/l2lssvm.html. PMID:20529363

  3. NITPICK: peak identification for mass spectrometry data

    PubMed Central

    Renard, Bernhard Y; Kirchner, Marc; Steen , Hanno; Steen, Judith AJ; Hamprecht , Fred A

    2008-01-01

    Background The reliable extraction of features from mass spectra is a fundamental step in the automated analysis of proteomic mass spectrometry (MS) experiments. Results This contribution proposes a sparse template regression approach to peak picking called NITPICK. NITPICK is a Non-greedy, Iterative Template-based peak PICKer that deconvolves complex overlapping isotope distributions in multicomponent mass spectra. NITPICK is based on fractional averagine, a novel extension to Senko's well-known averagine model, and on a modified version of sparse, non-negative least angle regression, for which a suitable, statistically motivated early stopping criterion has been derived. The strength of NITPICK is the deconvolution of overlapping mixture mass spectra. Conclusion Extensive comparative evaluation has been carried out and results are provided for simulated and real-world data sets. NITPICK outperforms pepex, to date the only alternate, publicly available, non-greedy feature extraction routine. NITPICK is available as software package for the R programming language and can be downloaded from . PMID:18755032

  4. FPGA implementation of sparse matrix algorithm for information retrieval

    NASA Astrophysics Data System (ADS)

    Bojanic, Slobodan; Jevtic, Ruzica; Nieto-Taladriz, Octavio

    2005-06-01

    Information text data retrieval requires a tremendous amount of processing time because of the size of the data and the complexity of information retrieval algorithms. In this paper the solution to this problem is proposed via hardware supported information retrieval algorithms. Reconfigurable computing may adopt frequent hardware modifications through its tailorable hardware and exploits parallelism for a given application through reconfigurable and flexible hardware units. The degree of the parallelism can be tuned for data. In this work we implemented standard BLAS (basic linear algebra subprogram) sparse matrix algorithm named Compressed Sparse Row (CSR) that is showed to be more efficient in terms of storage space requirement and query-processing timing over the other sparse matrix algorithms for information retrieval application. Although inverted index algorithm is treated as the de facto standard for information retrieval for years, an alternative approach to store the index of text collection in a sparse matrix structure gains more attention. This approach performs query processing using sparse matrix-vector multiplication and due to parallelization achieves a substantial efficiency over the sequential inverted index. The parallel implementations of information retrieval kernel are presented in this work targeting the Virtex II Field Programmable Gate Arrays (FPGAs) board from Xilinx. A recent development in scientific applications is the use of FPGA to achieve high performance results. Computational results are compared to implementations on other platforms. The design achieves a high level of parallelism for the overall function while retaining highly optimised hardware within processing unit.

  5. Cosinor-based rhythmometry

    PubMed Central

    2014-01-01

    A brief overview is provided of cosinor-based techniques for the analysis of time series in chronobiology. Conceived as a regression problem, the method is applicable to non-equidistant data, a major advantage. Another dividend is the feasibility of deriving confidence intervals for parameters of rhythmic components of known periods, readily drawn from the least squares procedure, stressing the importance of prior (external) information. Originally developed for the analysis of short and sparse data series, the extended cosinor has been further developed for the analysis of long time series, focusing both on rhythm detection and parameter estimation. Attention is given to the assumptions underlying the use of the cosinor and ways to determine whether they are satisfied. In particular, ways of dealing with non-stationary data are presented. Examples illustrate the use of the different cosinor-based methods, extending their application from the study of circadian rhythms to the mapping of broad time structures (chronomes). PMID:24725531

  6. Multilayer Extreme Learning Machine With Subnetwork Nodes for Representation Learning.

    PubMed

    Yang, Yimin; Wu, Q M Jonathan

    2016-11-01

    The extreme learning machine (ELM), which was originally proposed for "generalized" single-hidden layer feedforward neural networks, provides efficient unified learning solutions for the applications of clustering, regression, and classification. It presents competitive accuracy with superb efficiency in many applications. However, ELM with subnetwork nodes architecture has not attracted much research attentions. Recently, many methods have been proposed for supervised/unsupervised dimension reduction or representation learning, but these methods normally only work for one type of problem. This paper studies the general architecture of multilayer ELM (ML-ELM) with subnetwork nodes, showing that: 1) the proposed method provides a representation learning platform with unsupervised/supervised and compressed/sparse representation learning and 2) experimental results on ten image datasets and 16 classification datasets show that, compared to other conventional feature learning methods, the proposed ML-ELM with subnetwork nodes performs competitively or much better than other feature learning methods.

  7. Brief announcement: Hypergraph parititioning for parallel sparse matrix-matrix multiplication

    DOE PAGES

    Ballard, Grey; Druinsky, Alex; Knight, Nicholas; ...

    2015-01-01

    The performance of parallel algorithms for sparse matrix-matrix multiplication is typically determined by the amount of interprocessor communication performed, which in turn depends on the nonzero structure of the input matrices. In this paper, we characterize the communication cost of a sparse matrix-matrix multiplication algorithm in terms of the size of a cut of an associated hypergraph that encodes the computation for a given input nonzero structure. Obtaining an optimal algorithm corresponds to solving a hypergraph partitioning problem. Furthermore, our hypergraph model generalizes several existing models for sparse matrix-vector multiplication, and we can leverage hypergraph partitioners developed for that computationmore » to improve application-specific algorithms for multiplying sparse matrices.« less

  8. A novel structured dictionary for fast processing of 3D medical images, with application to computed tomography restoration and denoising

    NASA Astrophysics Data System (ADS)

    Karimi, Davood; Ward, Rabab K.

    2016-03-01

    Sparse representation of signals in learned overcomplete dictionaries has proven to be a powerful tool with applications in denoising, restoration, compression, reconstruction, and more. Recent research has shown that learned overcomplete dictionaries can lead to better results than analytical dictionaries such as wavelets in almost all image processing applications. However, a major disadvantage of these dictionaries is that their learning and usage is very computationally intensive. In particular, finding the sparse representation of a signal in these dictionaries requires solving an optimization problem that leads to very long computational times, especially in 3D image processing. Moreover, the sparse representation found by greedy algorithms is usually sub-optimal. In this paper, we propose a novel two-level dictionary structure that improves the performance and the speed of standard greedy sparse coding methods. The first (i.e., the top) level in our dictionary is a fixed orthonormal basis, whereas the second level includes the atoms that are learned from the training data. We explain how such a dictionary can be learned from the training data and how the sparse representation of a new signal in this dictionary can be computed. As an application, we use the proposed dictionary structure for removing the noise and artifacts in 3D computed tomography (CT) images. Our experiments with real CT images show that the proposed method achieves results that are comparable with standard dictionary-based methods while substantially reducing the computational time.

  9. Folded concave penalized sparse linear regression: sparsity, statistical performance, and algorithmic theory for local solutions.

    PubMed

    Liu, Hongcheng; Yao, Tao; Li, Runze; Ye, Yinyu

    2017-11-01

    This paper concerns the folded concave penalized sparse linear regression (FCPSLR), a class of popular sparse recovery methods. Although FCPSLR yields desirable recovery performance when solved globally, computing a global solution is NP-complete. Despite some existing statistical performance analyses on local minimizers or on specific FCPSLR-based learning algorithms, it still remains open questions whether local solutions that are known to admit fully polynomial-time approximation schemes (FPTAS) may already be sufficient to ensure the statistical performance, and whether that statistical performance can be non-contingent on the specific designs of computing procedures. To address the questions, this paper presents the following threefold results: (i) Any local solution (stationary point) is a sparse estimator, under some conditions on the parameters of the folded concave penalties. (ii) Perhaps more importantly, any local solution satisfying a significant subspace second-order necessary condition (S 3 ONC), which is weaker than the second-order KKT condition, yields a bounded error in approximating the true parameter with high probability. In addition, if the minimal signal strength is sufficient, the S 3 ONC solution likely recovers the oracle solution. This result also explicates that the goal of improving the statistical performance is consistent with the optimization criteria of minimizing the suboptimality gap in solving the non-convex programming formulation of FCPSLR. (iii) We apply (ii) to the special case of FCPSLR with minimax concave penalty (MCP) and show that under the restricted eigenvalue condition, any S 3 ONC solution with a better objective value than the Lasso solution entails the strong oracle property. In addition, such a solution generates a model error (ME) comparable to the optimal but exponential-time sparse estimator given a sufficient sample size, while the worst-case ME is comparable to the Lasso in general. Furthermore, to guarantee the S 3 ONC admits FPTAS.

  10. Sparse learning of stochastic dynamical equations

    NASA Astrophysics Data System (ADS)

    Boninsegna, Lorenzo; Nüske, Feliks; Clementi, Cecilia

    2018-06-01

    With the rapid increase of available data for complex systems, there is great interest in the extraction of physically relevant information from massive datasets. Recently, a framework called Sparse Identification of Nonlinear Dynamics (SINDy) has been introduced to identify the governing equations of dynamical systems from simulation data. In this study, we extend SINDy to stochastic dynamical systems which are frequently used to model biophysical processes. We prove the asymptotic correctness of stochastic SINDy in the infinite data limit, both in the original and projected variables. We discuss algorithms to solve the sparse regression problem arising from the practical implementation of SINDy and show that cross validation is an essential tool to determine the right level of sparsity. We demonstrate the proposed methodology on two test systems, namely, the diffusion in a one-dimensional potential and the projected dynamics of a two-dimensional diffusion process.

  11. Integrative approach for inference of gene regulatory networks using lasso-based random featuring and application to psychiatric disorders.

    PubMed

    Kim, Dongchul; Kang, Mingon; Biswas, Ashis; Liu, Chunyu; Gao, Jean

    2016-08-10

    Inferring gene regulatory networks is one of the most interesting research areas in the systems biology. Many inference methods have been developed by using a variety of computational models and approaches. However, there are two issues to solve. First, depending on the structural or computational model of inference method, the results tend to be inconsistent due to innately different advantages and limitations of the methods. Therefore the combination of dissimilar approaches is demanded as an alternative way in order to overcome the limitations of standalone methods through complementary integration. Second, sparse linear regression that is penalized by the regularization parameter (lasso) and bootstrapping-based sparse linear regression methods were suggested in state of the art methods for network inference but they are not effective for a small sample size data and also a true regulator could be missed if the target gene is strongly affected by an indirect regulator with high correlation or another true regulator. We present two novel network inference methods based on the integration of three different criteria, (i) z-score to measure the variation of gene expression from knockout data, (ii) mutual information for the dependency between two genes, and (iii) linear regression-based feature selection. Based on these criterion, we propose a lasso-based random feature selection algorithm (LARF) to achieve better performance overcoming the limitations of bootstrapping as mentioned above. In this work, there are three main contributions. First, our z score-based method to measure gene expression variations from knockout data is more effective than similar criteria of related works. Second, we confirmed that the true regulator selection can be effectively improved by LARF. Lastly, we verified that an integrative approach can clearly outperform a single method when two different methods are effectively jointed. In the experiments, our methods were validated by outperforming the state of the art methods on DREAM challenge data, and then LARF was applied to inferences of gene regulatory network associated with psychiatric disorders.

  12. Hierarchical Bayesian sparse image reconstruction with application to MRFM.

    PubMed

    Dobigeon, Nicolas; Hero, Alfred O; Tourneret, Jean-Yves

    2009-09-01

    This paper presents a hierarchical Bayesian model to reconstruct sparse images when the observations are obtained from linear transformations and corrupted by an additive white Gaussian noise. Our hierarchical Bayes model is well suited to such naturally sparse image applications as it seamlessly accounts for properties such as sparsity and positivity of the image via appropriate Bayes priors. We propose a prior that is based on a weighted mixture of a positive exponential distribution and a mass at zero. The prior has hyperparameters that are tuned automatically by marginalization over the hierarchical Bayesian model. To overcome the complexity of the posterior distribution, a Gibbs sampling strategy is proposed. The Gibbs samples can be used to estimate the image to be recovered, e.g., by maximizing the estimated posterior distribution. In our fully Bayesian approach, the posteriors of all the parameters are available. Thus, our algorithm provides more information than other previously proposed sparse reconstruction methods that only give a point estimate. The performance of the proposed hierarchical Bayesian sparse reconstruction method is illustrated on synthetic data and real data collected from a tobacco virus sample using a prototype MRFM instrument.

  13. Massively parallel sparse matrix function calculations with NTPoly

    NASA Astrophysics Data System (ADS)

    Dawson, William; Nakajima, Takahito

    2018-04-01

    We present NTPoly, a massively parallel library for computing the functions of sparse, symmetric matrices. The theory of matrix functions is a well developed framework with a wide range of applications including differential equations, graph theory, and electronic structure calculations. One particularly important application area is diagonalization free methods in quantum chemistry. When the input and output of the matrix function are sparse, methods based on polynomial expansions can be used to compute matrix functions in linear time. We present a library based on these methods that can compute a variety of matrix functions. Distributed memory parallelization is based on a communication avoiding sparse matrix multiplication algorithm. OpenMP task parallellization is utilized to implement hybrid parallelization. We describe NTPoly's interface and show how it can be integrated with programs written in many different programming languages. We demonstrate the merits of NTPoly by performing large scale calculations on the K computer.

  14. Discussion of CoSA: Clustering of Sparse Approximations

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Armstrong, Derek Elswick

    2017-03-07

    The purpose of this talk is to discuss the possible applications of CoSA (Clustering of Sparse Approximations) to the exploitation of HSI (HyperSpectral Imagery) data. CoSA is presented by Moody et al. in the Journal of Applied Remote Sensing (“Land cover classification in multispectral imagery using clustering of sparse approximations over learned feature dictionaries”, Vol. 8, 2014) and is based on machine learning techniques.

  15. Solution of matrix equations using sparse techniques

    NASA Technical Reports Server (NTRS)

    Baddourah, Majdi

    1994-01-01

    The solution of large systems of matrix equations is key to the solution of a large number of scientific and engineering problems. This talk describes the sparse matrix solver developed at Langley which can routinely solve in excess of 263,000 equations in 40 seconds on one Cray C-90 processor. It appears that for large scale structural analysis applications, sparse matrix methods have a significant performance advantage over other methods.

  16. Total variation-based method for radar coincidence imaging with model mismatch for extended target

    NASA Astrophysics Data System (ADS)

    Cao, Kaicheng; Zhou, Xiaoli; Cheng, Yongqiang; Fan, Bo; Qin, Yuliang

    2017-11-01

    Originating from traditional optical coincidence imaging, radar coincidence imaging (RCI) is a staring/forward-looking imaging technique. In RCI, the reference matrix must be computed precisely to reconstruct the image as preferred; unfortunately, such precision is almost impossible due to the existence of model mismatch in practical applications. Although some conventional sparse recovery algorithms are proposed to solve the model-mismatch problem, they are inapplicable to nonsparse targets. We therefore sought to derive the signal model of RCI with model mismatch by replacing the sparsity constraint item with total variation (TV) regularization in the sparse total least squares optimization problem; in this manner, we obtain the objective function of RCI with model mismatch for an extended target. A more robust and efficient algorithm called TV-TLS is proposed, in which the objective function is divided into two parts and the perturbation matrix and scattering coefficients are updated alternately. Moreover, due to the ability of TV regularization to recover sparse signal or image with sparse gradient, TV-TLS method is also applicable to sparse recovering. Results of numerical experiments demonstrate that, for uniform extended targets, sparse targets, and real extended targets, the algorithm can achieve preferred imaging performance both in suppressing noise and in adapting to model mismatch.

  17. A generative model of whole-brain effective connectivity.

    PubMed

    Frässle, Stefan; Lomakina, Ekaterina I; Kasper, Lars; Manjaly, Zina M; Leff, Alex; Pruessmann, Klaas P; Buhmann, Joachim M; Stephan, Klaas E

    2018-05-25

    The development of whole-brain models that can infer effective (directed) connection strengths from fMRI data represents a central challenge for computational neuroimaging. A recently introduced generative model of fMRI data, regression dynamic causal modeling (rDCM), moves towards this goal as it scales gracefully to very large networks. However, large-scale networks with thousands of connections are difficult to interpret; additionally, one typically lacks information (data points per free parameter) for precise estimation of all model parameters. This paper introduces sparsity constraints to the variational Bayesian framework of rDCM as a solution to these problems in the domain of task-based fMRI. This sparse rDCM approach enables highly efficient effective connectivity analyses in whole-brain networks and does not require a priori assumptions about the network's connectivity structure but prunes fully (all-to-all) connected networks as part of model inversion. Following the derivation of the variational Bayesian update equations for sparse rDCM, we use both simulated and empirical data to assess the face validity of the model. In particular, we show that it is feasible to infer effective connection strengths from fMRI data using a network with more than 100 regions and 10,000 connections. This demonstrates the feasibility of whole-brain inference on effective connectivity from fMRI data - in single subjects and with a run-time below 1 min when using parallelized code. We anticipate that sparse rDCM may find useful application in connectomics and clinical neuromodeling - for example, for phenotyping individual patients in terms of whole-brain network structure. Copyright © 2018. Published by Elsevier Inc.

  18. Efficient sparse matrix-matrix multiplication for computing periodic responses by shooting method on Intel Xeon Phi

    NASA Astrophysics Data System (ADS)

    Stoykov, S.; Atanassov, E.; Margenov, S.

    2016-10-01

    Many of the scientific applications involve sparse or dense matrix operations, such as solving linear systems, matrix-matrix products, eigensolvers, etc. In what concerns structural nonlinear dynamics, the computations of periodic responses and the determination of stability of the solution are of primary interest. Shooting method iswidely used for obtaining periodic responses of nonlinear systems. The method involves simultaneously operations with sparse and dense matrices. One of the computationally expensive operations in the method is multiplication of sparse by dense matrices. In the current work, a new algorithm for sparse matrix by dense matrix products is presented. The algorithm takes into account the structure of the sparse matrix, which is obtained by space discretization of the nonlinear Mindlin's plate equation of motion by the finite element method. The algorithm is developed to use the vector engine of Intel Xeon Phi coprocessors. It is compared with the standard sparse matrix by dense matrix algorithm and the one developed by Intel MKL and it is shown that by considering the properties of the sparse matrix better algorithms can be developed.

  19. Variable is better than invariable: sparse VSS-NLMS algorithms with application to adaptive MIMO channel estimation.

    PubMed

    Gui, Guan; Chen, Zhang-xin; Xu, Li; Wan, Qun; Huang, Jiyan; Adachi, Fumiyuki

    2014-01-01

    Channel estimation problem is one of the key technical issues in sparse frequency-selective fading multiple-input multiple-output (MIMO) communication systems using orthogonal frequency division multiplexing (OFDM) scheme. To estimate sparse MIMO channels, sparse invariable step-size normalized least mean square (ISS-NLMS) algorithms were applied to adaptive sparse channel estimation (ACSE). It is well known that step-size is a critical parameter which controls three aspects: algorithm stability, estimation performance, and computational cost. However, traditional methods are vulnerable to cause estimation performance loss because ISS cannot balance the three aspects simultaneously. In this paper, we propose two stable sparse variable step-size NLMS (VSS-NLMS) algorithms to improve the accuracy of MIMO channel estimators. First, ASCE is formulated in MIMO-OFDM systems. Second, different sparse penalties are introduced to VSS-NLMS algorithm for ASCE. In addition, difference between sparse ISS-NLMS algorithms and sparse VSS-NLMS ones is explained and their lower bounds are also derived. At last, to verify the effectiveness of the proposed algorithms for ASCE, several selected simulation results are shown to prove that the proposed sparse VSS-NLMS algorithms can achieve better estimation performance than the conventional methods via mean square error (MSE) and bit error rate (BER) metrics.

  20. Variable Is Better Than Invariable: Sparse VSS-NLMS Algorithms with Application to Adaptive MIMO Channel Estimation

    PubMed Central

    Gui, Guan; Chen, Zhang-xin; Xu, Li; Wan, Qun; Huang, Jiyan; Adachi, Fumiyuki

    2014-01-01

    Channel estimation problem is one of the key technical issues in sparse frequency-selective fading multiple-input multiple-output (MIMO) communication systems using orthogonal frequency division multiplexing (OFDM) scheme. To estimate sparse MIMO channels, sparse invariable step-size normalized least mean square (ISS-NLMS) algorithms were applied to adaptive sparse channel estimation (ACSE). It is well known that step-size is a critical parameter which controls three aspects: algorithm stability, estimation performance, and computational cost. However, traditional methods are vulnerable to cause estimation performance loss because ISS cannot balance the three aspects simultaneously. In this paper, we propose two stable sparse variable step-size NLMS (VSS-NLMS) algorithms to improve the accuracy of MIMO channel estimators. First, ASCE is formulated in MIMO-OFDM systems. Second, different sparse penalties are introduced to VSS-NLMS algorithm for ASCE. In addition, difference between sparse ISS-NLMS algorithms and sparse VSS-NLMS ones is explained and their lower bounds are also derived. At last, to verify the effectiveness of the proposed algorithms for ASCE, several selected simulation results are shown to prove that the proposed sparse VSS-NLMS algorithms can achieve better estimation performance than the conventional methods via mean square error (MSE) and bit error rate (BER) metrics. PMID:25089286

  1. Margin based ontology sparse vector learning algorithm and applied in biology science.

    PubMed

    Gao, Wei; Qudair Baig, Abdul; Ali, Haidar; Sajjad, Wasim; Reza Farahani, Mohammad

    2017-01-01

    In biology field, the ontology application relates to a large amount of genetic information and chemical information of molecular structure, which makes knowledge of ontology concepts convey much information. Therefore, in mathematical notation, the dimension of vector which corresponds to the ontology concept is often very large, and thus improves the higher requirements of ontology algorithm. Under this background, we consider the designing of ontology sparse vector algorithm and application in biology. In this paper, using knowledge of marginal likelihood and marginal distribution, the optimized strategy of marginal based ontology sparse vector learning algorithm is presented. Finally, the new algorithm is applied to gene ontology and plant ontology to verify its efficiency.

  2. Improving Cluster Analysis with Automatic Variable Selection Based on Trees

    DTIC Science & Technology

    2014-12-01

    regression trees Daisy DISsimilAritY PAM partitioning around medoids PMA penalized multivariate analysis SPC sparse principal components UPGMA unweighted...unweighted pair-group average method ( UPGMA ). This method measures dissimilarities between all objects in two clusters and takes the average value

  3. DOE Office of Scientific and Technical Information (OSTI.GOV)

    Clark, Darin P.; Badea, Cristian T., E-mail: cristian.badea@duke.edu; Lee, Chang-Lung

    Purpose: X-ray computed tomography (CT) is widely used, both clinically and preclinically, for fast, high-resolution anatomic imaging; however, compelling opportunities exist to expand its use in functional imaging applications. For instance, spectral information combined with nanoparticle contrast agents enables quantification of tissue perfusion levels, while temporal information details cardiac and respiratory dynamics. The authors propose and demonstrate a projection acquisition and reconstruction strategy for 5D CT (3D + dual energy + time) which recovers spectral and temporal information without substantially increasing radiation dose or sampling time relative to anatomic imaging protocols. Methods: The authors approach the 5D reconstruction problem withinmore » the framework of low-rank and sparse matrix decomposition. Unlike previous work on rank-sparsity constrained CT reconstruction, the authors establish an explicit rank-sparse signal model to describe the spectral and temporal dimensions. The spectral dimension is represented as a well-sampled time and energy averaged image plus regularly undersampled principal components describing the spectral contrast. The temporal dimension is represented as the same time and energy averaged reconstruction plus contiguous, spatially sparse, and irregularly sampled temporal contrast images. Using a nonlinear, image domain filtration approach, the authors refer to as rank-sparse kernel regression, the authors transfer image structure from the well-sampled time and energy averaged reconstruction to the spectral and temporal contrast images. This regularization strategy strictly constrains the reconstruction problem while approximately separating the temporal and spectral dimensions. Separability results in a highly compressed representation for the 5D data in which projections are shared between the temporal and spectral reconstruction subproblems, enabling substantial undersampling. The authors solved the 5D reconstruction problem using the split Bregman method and GPU-based implementations of backprojection, reprojection, and kernel regression. Using a preclinical mouse model, the authors apply the proposed algorithm to study myocardial injury following radiation treatment of breast cancer. Results: Quantitative 5D simulations are performed using the MOBY mouse phantom. Twenty data sets (ten cardiac phases, two energies) are reconstructed with 88 μm, isotropic voxels from 450 total projections acquired over a single 360° rotation. In vivo 5D myocardial injury data sets acquired in two mice injected with gold and iodine nanoparticles are also reconstructed with 20 data sets per mouse using the same acquisition parameters (dose: ∼60 mGy). For both the simulations and the in vivo data, the reconstruction quality is sufficient to perform material decomposition into gold and iodine maps to localize the extent of myocardial injury (gold accumulation) and to measure cardiac functional metrics (vascular iodine). Their 5D CT imaging protocol represents a 95% reduction in radiation dose per cardiac phase and energy and a 40-fold decrease in projection sampling time relative to their standard imaging protocol. Conclusions: Their 5D CT data acquisition and reconstruction protocol efficiently exploits the rank-sparse nature of spectral and temporal CT data to provide high-fidelity reconstruction results without increased radiation dose or sampling time.« less

  4. Spectrotemporal CT data acquisition and reconstruction at low dose

    PubMed Central

    Clark, Darin P.; Lee, Chang-Lung; Kirsch, David G.; Badea, Cristian T.

    2015-01-01

    Purpose: X-ray computed tomography (CT) is widely used, both clinically and preclinically, for fast, high-resolution anatomic imaging; however, compelling opportunities exist to expand its use in functional imaging applications. For instance, spectral information combined with nanoparticle contrast agents enables quantification of tissue perfusion levels, while temporal information details cardiac and respiratory dynamics. The authors propose and demonstrate a projection acquisition and reconstruction strategy for 5D CT (3D + dual energy + time) which recovers spectral and temporal information without substantially increasing radiation dose or sampling time relative to anatomic imaging protocols. Methods: The authors approach the 5D reconstruction problem within the framework of low-rank and sparse matrix decomposition. Unlike previous work on rank-sparsity constrained CT reconstruction, the authors establish an explicit rank-sparse signal model to describe the spectral and temporal dimensions. The spectral dimension is represented as a well-sampled time and energy averaged image plus regularly undersampled principal components describing the spectral contrast. The temporal dimension is represented as the same time and energy averaged reconstruction plus contiguous, spatially sparse, and irregularly sampled temporal contrast images. Using a nonlinear, image domain filtration approach, the authors refer to as rank-sparse kernel regression, the authors transfer image structure from the well-sampled time and energy averaged reconstruction to the spectral and temporal contrast images. This regularization strategy strictly constrains the reconstruction problem while approximately separating the temporal and spectral dimensions. Separability results in a highly compressed representation for the 5D data in which projections are shared between the temporal and spectral reconstruction subproblems, enabling substantial undersampling. The authors solved the 5D reconstruction problem using the split Bregman method and GPU-based implementations of backprojection, reprojection, and kernel regression. Using a preclinical mouse model, the authors apply the proposed algorithm to study myocardial injury following radiation treatment of breast cancer. Results: Quantitative 5D simulations are performed using the MOBY mouse phantom. Twenty data sets (ten cardiac phases, two energies) are reconstructed with 88 μm, isotropic voxels from 450 total projections acquired over a single 360° rotation. In vivo 5D myocardial injury data sets acquired in two mice injected with gold and iodine nanoparticles are also reconstructed with 20 data sets per mouse using the same acquisition parameters (dose: ∼60 mGy). For both the simulations and the in vivo data, the reconstruction quality is sufficient to perform material decomposition into gold and iodine maps to localize the extent of myocardial injury (gold accumulation) and to measure cardiac functional metrics (vascular iodine). Their 5D CT imaging protocol represents a 95% reduction in radiation dose per cardiac phase and energy and a 40-fold decrease in projection sampling time relative to their standard imaging protocol. Conclusions: Their 5D CT data acquisition and reconstruction protocol efficiently exploits the rank-sparse nature of spectral and temporal CT data to provide high-fidelity reconstruction results without increased radiation dose or sampling time. PMID:26520724

  5. Probabilistic Air Segmentation and Sparse Regression Estimated Pseudo CT for PET/MR Attenuation Correction

    PubMed Central

    Chen, Yasheng; Juttukonda, Meher; Su, Yi; Benzinger, Tammie; Rubin, Brian G.; Lee, Yueh Z.; Lin, Weili; Shen, Dinggang; Lalush, David

    2015-01-01

    Purpose To develop a positron emission tomography (PET) attenuation correction method for brain PET/magnetic resonance (MR) imaging by estimating pseudo computed tomographic (CT) images from T1-weighted MR and atlas CT images. Materials and Methods In this institutional review board–approved and HIPAA-compliant study, PET/MR/CT images were acquired in 20 subjects after obtaining written consent. A probabilistic air segmentation and sparse regression (PASSR) method was developed for pseudo CT estimation. Air segmentation was performed with assistance from a probabilistic air map. For nonair regions, the pseudo CT numbers were estimated via sparse regression by using atlas MR patches. The mean absolute percentage error (MAPE) on PET images was computed as the normalized mean absolute difference in PET signal intensity between a method and the reference standard continuous CT attenuation correction method. Friedman analysis of variance and Wilcoxon matched-pairs tests were performed for statistical comparison of MAPE between the PASSR method and Dixon segmentation, CT segmentation, and population averaged CT atlas (mean atlas) methods. Results The PASSR method yielded a mean MAPE ± standard deviation of 2.42% ± 1.0, 3.28% ± 0.93, and 2.16% ± 1.75, respectively, in the whole brain, gray matter, and white matter, which were significantly lower than the Dixon, CT segmentation, and mean atlas values (P < .01). Moreover, 68.0% ± 16.5, 85.8% ± 12.9, and 96.0% ± 2.5 of whole-brain volume had within ±2%, ±5%, and ±10% percentage error by using PASSR, respectively, which was significantly higher than other methods (P < .01). Conclusion PASSR outperformed the Dixon, CT segmentation, and mean atlas methods by reducing PET error owing to attenuation correction. © RSNA, 2014 PMID:25521778

  6. Sparse signal representation and its applications in ultrasonic NDE.

    PubMed

    Zhang, Guang-Ming; Zhang, Cheng-Zhong; Harvey, David M

    2012-03-01

    Many sparse signal representation (SSR) algorithms have been developed in the past decade. The advantages of SSR such as compact representations and super resolution lead to the state of the art performance of SSR for processing ultrasonic non-destructive evaluation (NDE) signals. Choosing a suitable SSR algorithm and designing an appropriate overcomplete dictionary is a key for success. After a brief review of sparse signal representation methods and the design of overcomplete dictionaries, this paper addresses the recent accomplishments of SSR for processing ultrasonic NDE signals. The advantages and limitations of SSR algorithms and various overcomplete dictionaries widely-used in ultrasonic NDE applications are explored in depth. Their performance improvement compared to conventional signal processing methods in many applications such as ultrasonic flaw detection and noise suppression, echo separation and echo estimation, and ultrasonic imaging is investigated. The challenging issues met in practical ultrasonic NDE applications for example the design of a good dictionary are discussed. Representative experimental results are presented for demonstration. Copyright © 2011 Elsevier B.V. All rights reserved.

  7. Sparse filtering with the generalized lp/lq norm and its applications to the condition monitoring of rotating machinery

    NASA Astrophysics Data System (ADS)

    Jia, Xiaodong; Zhao, Ming; Di, Yuan; Li, Pin; Lee, Jay

    2018-03-01

    Sparsity is becoming a more and more important topic in the area of machine learning and signal processing recently. One big family of sparse measures in current literature is the generalized lp /lq norm, which is scale invariant and is widely regarded as normalized lp norm. However, the characteristics of the generalized lp /lq norm are still less discussed and its application to the condition monitoring of rotating devices has been still unexplored. In this study, we firstly discuss the characteristics of the generalized lp /lq norm for sparse optimization and then propose a method of sparse filtering with the generalized lp /lq norm for the purpose of impulsive signature enhancement. Further driven by the trend of industrial big data and the need of reducing maintenance cost for industrial equipment, the proposed sparse filter is customized for vibration signal processing and also implemented on bearing and gearbox for the purpose of condition monitoring. Based on the results from the industrial implementations in this paper, the proposed method has been found to be a promising tool for impulsive feature enhancement, and the superiority of the proposed method over previous methods is also demonstrated.

  8. DOE Office of Scientific and Technical Information (OSTI.GOV)

    Chow, Edmond

    Solving sparse problems is at the core of many DOE computational science applications. We focus on the challenge of developing sparse algorithms that can fully exploit the parallelism in extreme-scale computing systems, in particular systems with massive numbers of cores per node. Our approach is to express a sparse matrix factorization as a large number of bilinear constraint equations, and then solving these equations via an asynchronous iterative method. The unknowns in these equations are the matrix entries of the factorization that is desired.

  9. Partitioned learning of deep Boltzmann machines for SNP data.

    PubMed

    Hess, Moritz; Lenz, Stefan; Blätte, Tamara J; Bullinger, Lars; Binder, Harald

    2017-10-15

    Learning the joint distributions of measurements, and in particular identification of an appropriate low-dimensional manifold, has been found to be a powerful ingredient of deep leaning approaches. Yet, such approaches have hardly been applied to single nucleotide polymorphism (SNP) data, probably due to the high number of features typically exceeding the number of studied individuals. After a brief overview of how deep Boltzmann machines (DBMs), a deep learning approach, can be adapted to SNP data in principle, we specifically present a way to alleviate the dimensionality problem by partitioned learning. We propose a sparse regression approach to coarsely screen the joint distribution of SNPs, followed by training several DBMs on SNP partitions that were identified by the screening. Aggregate features representing SNP patterns and the corresponding SNPs are extracted from the DBMs by a combination of statistical tests and sparse regression. In simulated case-control data, we show how this can uncover complex SNP patterns and augment results from univariate approaches, while maintaining type 1 error control. Time-to-event endpoints are considered in an application with acute myeloid leukemia patients, where SNP patterns are modeled after a pre-screening based on gene expression data. The proposed approach identified three SNPs that seem to jointly influence survival in a validation dataset. This indicates the added value of jointly investigating SNPs compared to standard univariate analyses and makes partitioned learning of DBMs an interesting complementary approach when analyzing SNP data. A Julia package is provided at 'http://github.com/binderh/BoltzmannMachines.jl'. binderh@imbi.uni-freiburg.de. Supplementary data are available at Bioinformatics online. © The Author (2017). Published by Oxford University Press. All rights reserved. For Permissions, please email: journals.permissions@oup.com

  10. Semi-implicit integration factor methods on sparse grids for high-dimensional systems

    NASA Astrophysics Data System (ADS)

    Wang, Dongyong; Chen, Weitao; Nie, Qing

    2015-07-01

    Numerical methods for partial differential equations in high-dimensional spaces are often limited by the curse of dimensionality. Though the sparse grid technique, based on a one-dimensional hierarchical basis through tensor products, is popular for handling challenges such as those associated with spatial discretization, the stability conditions on time step size due to temporal discretization, such as those associated with high-order derivatives in space and stiff reactions, remain. Here, we incorporate the sparse grids with the implicit integration factor method (IIF) that is advantageous in terms of stability conditions for systems containing stiff reactions and diffusions. We combine IIF, in which the reaction is treated implicitly and the diffusion is treated explicitly and exactly, with various sparse grid techniques based on the finite element and finite difference methods and a multi-level combination approach. The overall method is found to be efficient in terms of both storage and computational time for solving a wide range of PDEs in high dimensions. In particular, the IIF with the sparse grid combination technique is flexible and effective in solving systems that may include cross-derivatives and non-constant diffusion coefficients. Extensive numerical simulations in both linear and nonlinear systems in high dimensions, along with applications of diffusive logistic equations and Fokker-Planck equations, demonstrate the accuracy, efficiency, and robustness of the new methods, indicating potential broad applications of the sparse grid-based integration factor method.

  11. Unconditional or Conditional Logistic Regression Model for Age-Matched Case-Control Data?

    PubMed

    Kuo, Chia-Ling; Duan, Yinghui; Grady, James

    2018-01-01

    Matching on demographic variables is commonly used in case-control studies to adjust for confounding at the design stage. There is a presumption that matched data need to be analyzed by matched methods. Conditional logistic regression has become a standard for matched case-control data to tackle the sparse data problem. The sparse data problem, however, may not be a concern for loose-matching data when the matching between cases and controls is not unique, and one case can be matched to other controls without substantially changing the association. Data matched on a few demographic variables are clearly loose-matching data, and we hypothesize that unconditional logistic regression is a proper method to perform. To address the hypothesis, we compare unconditional and conditional logistic regression models by precision in estimates and hypothesis testing using simulated matched case-control data. Our results support our hypothesis; however, the unconditional model is not as robust as the conditional model to the matching distortion that the matching process not only makes cases and controls similar for matching variables but also for the exposure status. When the study design involves other complex features or the computational burden is high, matching in loose-matching data can be ignored for negligible loss in testing and estimation if the distributions of matching variables are not extremely different between cases and controls.

  12. Unconditional or Conditional Logistic Regression Model for Age-Matched Case–Control Data?

    PubMed Central

    Kuo, Chia-Ling; Duan, Yinghui; Grady, James

    2018-01-01

    Matching on demographic variables is commonly used in case–control studies to adjust for confounding at the design stage. There is a presumption that matched data need to be analyzed by matched methods. Conditional logistic regression has become a standard for matched case–control data to tackle the sparse data problem. The sparse data problem, however, may not be a concern for loose-matching data when the matching between cases and controls is not unique, and one case can be matched to other controls without substantially changing the association. Data matched on a few demographic variables are clearly loose-matching data, and we hypothesize that unconditional logistic regression is a proper method to perform. To address the hypothesis, we compare unconditional and conditional logistic regression models by precision in estimates and hypothesis testing using simulated matched case–control data. Our results support our hypothesis; however, the unconditional model is not as robust as the conditional model to the matching distortion that the matching process not only makes cases and controls similar for matching variables but also for the exposure status. When the study design involves other complex features or the computational burden is high, matching in loose-matching data can be ignored for negligible loss in testing and estimation if the distributions of matching variables are not extremely different between cases and controls. PMID:29552553

  13. Sparse representation of multi parametric DCE-MRI features using K-SVD for classifying gene expression based breast cancer recurrence risk

    NASA Astrophysics Data System (ADS)

    Mahrooghy, Majid; Ashraf, Ahmed B.; Daye, Dania; Mies, Carolyn; Rosen, Mark; Feldman, Michael; Kontos, Despina

    2014-03-01

    We evaluate the prognostic value of sparse representation-based features by applying the K-SVD algorithm on multiparametric kinetic, textural, and morphologic features in breast dynamic contrast-enhanced magnetic resonance imaging (DCE-MRI). K-SVD is an iterative dimensionality reduction method that optimally reduces the initial feature space by updating the dictionary columns jointly with the sparse representation coefficients. Therefore, by using K-SVD, we not only provide sparse representation of the features and condense the information in a few coefficients but also we reduce the dimensionality. The extracted K-SVD features are evaluated by a machine learning algorithm including a logistic regression classifier for the task of classifying high versus low breast cancer recurrence risk as determined by a validated gene expression assay. The features are evaluated using ROC curve analysis and leave one-out cross validation for different sparse representation and dimensionality reduction numbers. Optimal sparse representation is obtained when the number of dictionary elements is 4 (K=4) and maximum non-zero coefficients is 2 (L=2). We compare K-SVD with ANOVA based feature selection for the same prognostic features. The ROC results show that the AUC of the K-SVD based (K=4, L=2), the ANOVA based, and the original features (i.e., no dimensionality reduction) are 0.78, 0.71. and 0.68, respectively. From the results, it can be inferred that by using sparse representation of the originally extracted multi-parametric, high-dimensional data, we can condense the information on a few coefficients with the highest predictive value. In addition, the dimensionality reduction introduced by K-SVD can prevent models from over-fitting.

  14. Using an Android application to assess registration strategies in open hepatic procedures: a planning and simulation tool

    NASA Astrophysics Data System (ADS)

    Doss, Derek J.; Heiselman, Jon S.; Collins, Jarrod A.; Weis, Jared A.; Clements, Logan W.; Geevarghese, Sunil K.; Miga, Michael I.

    2017-03-01

    Sparse surface digitization with an optically tracked stylus for use in an organ surface-based image-to-physical registration is an established approach for image-guided open liver surgery procedures. However, variability in sparse data collections during open hepatic procedures can produce disparity in registration alignments. In part, this variability arises from inconsistencies with the patterns and fidelity of collected intraoperative data. The liver lacks distinct landmarks and experiences considerable soft tissue deformation. Furthermore, data coverage of the organ is often incomplete or unevenly distributed. While more robust feature-based registration methodologies have been developed for image-guided liver surgery, it is still unclear how variation in sparse intraoperative data affects registration. In this work, we have developed an application to allow surgeons to study the performance of surface digitization patterns on registration. Given the intrinsic nature of soft-tissue, we incorporate realistic organ deformation when assessing fidelity of a rigid registration methodology. We report the construction of our application and preliminary registration results using four participants. Our preliminary results indicate that registration quality improves as users acquire more experience selecting patterns of sparse intraoperative surface data.

  15. Porosity estimation by semi-supervised learning with sparsely available labeled samples

    NASA Astrophysics Data System (ADS)

    Lima, Luiz Alberto; Görnitz, Nico; Varella, Luiz Eduardo; Vellasco, Marley; Müller, Klaus-Robert; Nakajima, Shinichi

    2017-09-01

    This paper addresses the porosity estimation problem from seismic impedance volumes and porosity samples located in a small group of exploratory wells. Regression methods, trained on the impedance as inputs and the porosity as output labels, generally suffer from extremely expensive (and hence sparsely available) porosity samples. To optimally make use of the valuable porosity data, a semi-supervised machine learning method was proposed, Transductive Conditional Random Field Regression (TCRFR), showing good performance (Görnitz et al., 2017). TCRFR, however, still requires more labeled data than those usually available, which creates a gap when applying the method to the porosity estimation problem in realistic situations. In this paper, we aim to fill this gap by introducing two graph-based preprocessing techniques, which adapt the original TCRFR for extremely weakly supervised scenarios. Our new method outperforms the previous automatic estimation methods on synthetic data and provides a comparable result to the manual labored, time-consuming geostatistics approach on real data, proving its potential as a practical industrial tool.

  16. NITPICK: peak identification for mass spectrometry data.

    PubMed

    Renard, Bernhard Y; Kirchner, Marc; Steen, Hanno; Steen, Judith A J; Hamprecht, Fred A

    2008-08-28

    The reliable extraction of features from mass spectra is a fundamental step in the automated analysis of proteomic mass spectrometry (MS) experiments. This contribution proposes a sparse template regression approach to peak picking called NITPICK. NITPICK is a Non-greedy, Iterative Template-based peak PICKer that deconvolves complex overlapping isotope distributions in multicomponent mass spectra. NITPICK is based on fractional averaging, a novel extension to Senko's well-known averaging model, and on a modified version of sparse, non-negative least angle regression, for which a suitable, statistically motivated early stopping criterion has been derived. The strength of NITPICK is the deconvolution of overlapping mixture mass spectra. Extensive comparative evaluation has been carried out and results are provided for simulated and real-world data sets. NITPICK outperforms pepex, to date the only alternate, publicly available, non-greedy feature extraction routine. NITPICK is available as software package for the R programming language and can be downloaded from (http://hci.iwr.uni-heidelberg.de/mip/proteomics/).

  17. Accelerated High-Dimensional MR Imaging with Sparse Sampling Using Low-Rank Tensors

    PubMed Central

    He, Jingfei; Liu, Qiegen; Christodoulou, Anthony G.; Ma, Chao; Lam, Fan

    2017-01-01

    High-dimensional MR imaging often requires long data acquisition time, thereby limiting its practical applications. This paper presents a low-rank tensor based method for accelerated high-dimensional MR imaging using sparse sampling. This method represents high-dimensional images as low-rank tensors (or partially separable functions) and uses this mathematical structure for sparse sampling of the data space and for image reconstruction from highly undersampled data. More specifically, the proposed method acquires two datasets with complementary sampling patterns, one for subspace estimation and the other for image reconstruction; image reconstruction from highly undersampled data is accomplished by fitting the measured data with a sparsity constraint on the core tensor and a group sparsity constraint on the spatial coefficients jointly using the alternating direction method of multipliers. The usefulness of the proposed method is demonstrated in MRI applications; it may also have applications beyond MRI. PMID:27093543

  18. Insights from Classifying Visual Concepts with Multiple Kernel Learning

    PubMed Central

    Binder, Alexander; Nakajima, Shinichi; Kloft, Marius; Müller, Christina; Samek, Wojciech; Brefeld, Ulf; Müller, Klaus-Robert; Kawanabe, Motoaki

    2012-01-01

    Combining information from various image features has become a standard technique in concept recognition tasks. However, the optimal way of fusing the resulting kernel functions is usually unknown in practical applications. Multiple kernel learning (MKL) techniques allow to determine an optimal linear combination of such similarity matrices. Classical approaches to MKL promote sparse mixtures. Unfortunately, 1-norm regularized MKL variants are often observed to be outperformed by an unweighted sum kernel. The main contributions of this paper are the following: we apply a recently developed non-sparse MKL variant to state-of-the-art concept recognition tasks from the application domain of computer vision. We provide insights on benefits and limits of non-sparse MKL and compare it against its direct competitors, the sum-kernel SVM and sparse MKL. We report empirical results for the PASCAL VOC 2009 Classification and ImageCLEF2010 Photo Annotation challenge data sets. Data sets (kernel matrices) as well as further information are available at http://doc.ml.tu-berlin.de/image_mkl/(Accessed 2012 Jun 25). PMID:22936970

  19. Sparse Adaptive Iteratively-Weighted Thresholding Algorithm (SAITA) for Lp-Regularization Using the Multiple Sub-Dictionary Representation

    PubMed Central

    Zhang, Jie; Fan, Shangang; Xiong, Jian; Cheng, Xiefeng; Sari, Hikmet; Adachi, Fumiyuki

    2017-01-01

    Both L1/2 and L2/3 are two typical non-convex regularizations of Lp (0

  20. Sparse Adaptive Iteratively-Weighted Thresholding Algorithm (SAITA) for Lp-Regularization Using the Multiple Sub-Dictionary Representation.

    PubMed

    Li, Yunyi; Zhang, Jie; Fan, Shangang; Yang, Jie; Xiong, Jian; Cheng, Xiefeng; Sari, Hikmet; Adachi, Fumiyuki; Gui, Guan

    2017-12-15

    Both L 1/2 and L 2/3 are two typical non-convex regularizations of L p (0

  1. Probabilistic Low-Rank Multitask Learning.

    PubMed

    Kong, Yu; Shao, Ming; Li, Kang; Fu, Yun

    2018-03-01

    In this paper, we consider the problem of learning multiple related tasks simultaneously with the goal of improving the generalization performance of individual tasks. The key challenge is to effectively exploit the shared information across multiple tasks as well as preserve the discriminative information for each individual task. To address this, we propose a novel probabilistic model for multitask learning (MTL) that can automatically balance between low-rank and sparsity constraints. The former assumes a low-rank structure of the underlying predictive hypothesis space to explicitly capture the relationship of different tasks and the latter learns the incoherent sparse patterns private to each task. We derive and perform inference via variational Bayesian methods. Experimental results on both regression and classification tasks on real-world applications demonstrate the effectiveness of the proposed method in dealing with the MTL problems.

  2. Men's Alcohol Expectancies at Selected Community Colleges

    ERIC Educational Resources Information Center

    Derby, Dustin C.

    2011-01-01

    Men's alcohol expectancies are an important cognitive-behavioral component of their consumption; yet, sparse research details such behaviors for men in two-year colleges. Selected for inclusion with the current study were 563 men from seven Illinois community colleges. Logistic regression analysis indicated four significant, positive relationships…

  3. Selection of Polynomial Chaos Bases via Bayesian Model Uncertainty Methods with Applications to Sparse Approximation of PDEs with Stochastic Inputs

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Karagiannis, Georgios; Lin, Guang

    2014-02-15

    Generalized polynomial chaos (gPC) expansions allow the representation of the solution of a stochastic system as a series of polynomial terms. The number of gPC terms increases dramatically with the dimension of the random input variables. When the number of the gPC terms is larger than that of the available samples, a scenario that often occurs if the evaluations of the system are expensive, the evaluation of the gPC expansion can be inaccurate due to over-fitting. We propose a fully Bayesian approach that allows for global recovery of the stochastic solution, both in spacial and random domains, by coupling Bayesianmore » model uncertainty and regularization regression methods. It allows the evaluation of the PC coefficients on a grid of spacial points via (1) Bayesian model average or (2) medial probability model, and their construction as functions on the spacial domain via spline interpolation. The former accounts the model uncertainty and provides Bayes-optimal predictions; while the latter, additionally, provides a sparse representation of the solution by evaluating the expansion on a subset of dominating gPC bases when represented as a gPC expansion. Moreover, the method quantifies the importance of the gPC bases through inclusion probabilities. We design an MCMC sampler that evaluates all the unknown quantities without the need of ad-hoc techniques. The proposed method is suitable for, but not restricted to, problems whose stochastic solution is sparse at the stochastic level with respect to the gPC bases while the deterministic solver involved is expensive. We demonstrate the good performance of the proposed method and make comparisons with others on 1D, 14D and 40D in random space elliptic stochastic partial differential equations.« less

  4. Emotional textile image classification based on cross-domain convolutional sparse autoencoders with feature selection

    NASA Astrophysics Data System (ADS)

    Li, Zuhe; Fan, Yangyu; Liu, Weihua; Yu, Zeqi; Wang, Fengqin

    2017-01-01

    We aim to apply sparse autoencoder-based unsupervised feature learning to emotional semantic analysis for textile images. To tackle the problem of limited training data, we present a cross-domain feature learning scheme for emotional textile image classification using convolutional autoencoders. We further propose a correlation-analysis-based feature selection method for the weights learned by sparse autoencoders to reduce the number of features extracted from large size images. First, we randomly collect image patches on an unlabeled image dataset in the source domain and learn local features with a sparse autoencoder. We then conduct feature selection according to the correlation between different weight vectors corresponding to the autoencoder's hidden units. We finally adopt a convolutional neural network including a pooling layer to obtain global feature activations of textile images in the target domain and send these global feature vectors into logistic regression models for emotional image classification. The cross-domain unsupervised feature learning method achieves 65% to 78% average accuracy in the cross-validation experiments corresponding to eight emotional categories and performs better than conventional methods. Feature selection can reduce the computational cost of global feature extraction by about 50% while improving classification performance.

  5. TH-AB-202-08: A Robust Real-Time Surface Reconstruction Method On Point Clouds Captured From a 3D Surface Photogrammetry System

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Liu, W; Sawant, A; Ruan, D

    2016-06-15

    Purpose: Surface photogrammetry (e.g. VisionRT, C-Rad) provides a noninvasive way to obtain high-frequency measurement for patient motion monitoring in radiotherapy. This work aims to develop a real-time surface reconstruction method on the acquired point clouds, whose acquisitions are subject to noise and missing measurements. In contrast to existing surface reconstruction methods that are usually computationally expensive, the proposed method reconstructs continuous surfaces with comparable accuracy in real-time. Methods: The key idea in our method is to solve and propagate a sparse linear relationship from the point cloud (measurement) manifold to the surface (reconstruction) manifold, taking advantage of the similarity inmore » local geometric topology in both manifolds. With consistent point cloud acquisition, we propose a sparse regression (SR) model to directly approximate the target point cloud as a sparse linear combination from the training set, building the point correspondences by the iterative closest point (ICP) method. To accommodate changing noise levels and/or presence of inconsistent occlusions, we further propose a modified sparse regression (MSR) model to account for the large and sparse error built by ICP, with a Laplacian prior. We evaluated our method on both clinical acquired point clouds under consistent conditions and simulated point clouds with inconsistent occlusions. The reconstruction accuracy was evaluated w.r.t. root-mean-squared-error, by comparing the reconstructed surfaces against those from the variational reconstruction method. Results: On clinical point clouds, both the SR and MSR models achieved sub-millimeter accuracy, with mean reconstruction time reduced from 82.23 seconds to 0.52 seconds and 0.94 seconds, respectively. On simulated point cloud with inconsistent occlusions, the MSR model has demonstrated its advantage in achieving consistent performance despite the introduced occlusions. Conclusion: We have developed a real-time and robust surface reconstruction method on point clouds acquired by photogrammetry systems. It serves an important enabling step for real-time motion tracking in radiotherapy. This work is supported in part by NIH grant R01 CA169102-02.« less

  6. A robust real-time surface reconstruction method on point clouds captured from a 3D surface photogrammetry system.

    PubMed

    Liu, Wenyang; Cheung, Yam; Sawant, Amit; Ruan, Dan

    2016-05-01

    To develop a robust and real-time surface reconstruction method on point clouds captured from a 3D surface photogrammetry system. The authors have developed a robust and fast surface reconstruction method on point clouds acquired by the photogrammetry system, without explicitly solving the partial differential equation required by a typical variational approach. Taking advantage of the overcomplete nature of the acquired point clouds, their method solves and propagates a sparse linear relationship from the point cloud manifold to the surface manifold, assuming both manifolds share similar local geometry. With relatively consistent point cloud acquisitions, the authors propose a sparse regression (SR) model to directly approximate the target point cloud as a sparse linear combination from the training set, assuming that the point correspondences built by the iterative closest point (ICP) is reasonably accurate and have residual errors following a Gaussian distribution. To accommodate changing noise levels and/or presence of inconsistent occlusions during the acquisition, the authors further propose a modified sparse regression (MSR) model to model the potentially large and sparse error built by ICP with a Laplacian prior. The authors evaluated the proposed method on both clinical point clouds acquired under consistent acquisition conditions and on point clouds with inconsistent occlusions. The authors quantitatively evaluated the reconstruction performance with respect to root-mean-squared-error, by comparing its reconstruction results against that from the variational method. On clinical point clouds, both the SR and MSR models have achieved sub-millimeter reconstruction accuracy and reduced the reconstruction time by two orders of magnitude to a subsecond reconstruction time. On point clouds with inconsistent occlusions, the MSR model has demonstrated its advantage in achieving consistent and robust performance despite the introduced occlusions. The authors have developed a fast and robust surface reconstruction method on point clouds captured from a 3D surface photogrammetry system, with demonstrated sub-millimeter reconstruction accuracy and subsecond reconstruction time. It is suitable for real-time motion tracking in radiotherapy, with clear surface structures for better quantifications.

  7. A robust real-time surface reconstruction method on point clouds captured from a 3D surface photogrammetry system

    PubMed Central

    Liu, Wenyang; Cheung, Yam; Sawant, Amit; Ruan, Dan

    2016-01-01

    Purpose: To develop a robust and real-time surface reconstruction method on point clouds captured from a 3D surface photogrammetry system. Methods: The authors have developed a robust and fast surface reconstruction method on point clouds acquired by the photogrammetry system, without explicitly solving the partial differential equation required by a typical variational approach. Taking advantage of the overcomplete nature of the acquired point clouds, their method solves and propagates a sparse linear relationship from the point cloud manifold to the surface manifold, assuming both manifolds share similar local geometry. With relatively consistent point cloud acquisitions, the authors propose a sparse regression (SR) model to directly approximate the target point cloud as a sparse linear combination from the training set, assuming that the point correspondences built by the iterative closest point (ICP) is reasonably accurate and have residual errors following a Gaussian distribution. To accommodate changing noise levels and/or presence of inconsistent occlusions during the acquisition, the authors further propose a modified sparse regression (MSR) model to model the potentially large and sparse error built by ICP with a Laplacian prior. The authors evaluated the proposed method on both clinical point clouds acquired under consistent acquisition conditions and on point clouds with inconsistent occlusions. The authors quantitatively evaluated the reconstruction performance with respect to root-mean-squared-error, by comparing its reconstruction results against that from the variational method. Results: On clinical point clouds, both the SR and MSR models have achieved sub-millimeter reconstruction accuracy and reduced the reconstruction time by two orders of magnitude to a subsecond reconstruction time. On point clouds with inconsistent occlusions, the MSR model has demonstrated its advantage in achieving consistent and robust performance despite the introduced occlusions. Conclusions: The authors have developed a fast and robust surface reconstruction method on point clouds captured from a 3D surface photogrammetry system, with demonstrated sub-millimeter reconstruction accuracy and subsecond reconstruction time. It is suitable for real-time motion tracking in radiotherapy, with clear surface structures for better quantifications. PMID:27147347

  8. BI-sparsity pursuit for robust subspace recovery

    DOE PAGES

    Bian, Xiao; Krim, Hamid

    2015-09-01

    Here, the success of sparse models in computer vision and machine learning in many real-world applications, may be attributed in large part, to the fact that many high dimensional data are distributed in a union of low dimensional subspaces. The underlying structure may, however, be adversely affected by sparse errors, thus inducing additional complexity in recovering it. In this paper, we propose a bi-sparse model as a framework to investigate and analyze this problem, and provide as a result , a novel algorithm to recover the union of subspaces in presence of sparse corruptions. We additionally demonstrate the effectiveness ofmore » our method by experiments on real-world vision data.« less

  9. A study of machine learning regression methods for major elemental analysis of rocks using laser-induced breakdown spectroscopy

    NASA Astrophysics Data System (ADS)

    Boucher, Thomas F.; Ozanne, Marie V.; Carmosino, Marco L.; Dyar, M. Darby; Mahadevan, Sridhar; Breves, Elly A.; Lepore, Kate H.; Clegg, Samuel M.

    2015-05-01

    The ChemCam instrument on the Mars Curiosity rover is generating thousands of LIBS spectra and bringing interest in this technique to public attention. The key to interpreting Mars or any other types of LIBS data are calibrations that relate laboratory standards to unknowns examined in other settings and enable predictions of chemical composition. Here, LIBS spectral data are analyzed using linear regression methods including partial least squares (PLS-1 and PLS-2), principal component regression (PCR), least absolute shrinkage and selection operator (lasso), elastic net, and linear support vector regression (SVR-Lin). These were compared against results from nonlinear regression methods including kernel principal component regression (K-PCR), polynomial kernel support vector regression (SVR-Py) and k-nearest neighbor (kNN) regression to discern the most effective models for interpreting chemical abundances from LIBS spectra of geological samples. The results were evaluated for 100 samples analyzed with 50 laser pulses at each of five locations averaged together. Wilcoxon signed-rank tests were employed to evaluate the statistical significance of differences among the nine models using their predicted residual sum of squares (PRESS) to make comparisons. For MgO, SiO2, Fe2O3, CaO, and MnO, the sparse models outperform all the others except for linear SVR, while for Na2O, K2O, TiO2, and P2O5, the sparse methods produce inferior results, likely because their emission lines in this energy range have lower transition probabilities. The strong performance of the sparse methods in this study suggests that use of dimensionality-reduction techniques as a preprocessing step may improve the performance of the linear models. Nonlinear methods tend to overfit the data and predict less accurately, while the linear methods proved to be more generalizable with better predictive performance. These results are attributed to the high dimensionality of the data (6144 channels) relative to the small number of samples studied. The best-performing models were SVR-Lin for SiO2, MgO, Fe2O3, and Na2O, lasso for Al2O3, elastic net for MnO, and PLS-1 for CaO, TiO2, and K2O. Although these differences in model performance between methods were identified, most of the models produce comparable results when p ≤ 0.05 and all techniques except kNN produced statistically-indistinguishable results. It is likely that a combination of models could be used together to yield a lower total error of prediction, depending on the requirements of the user.

  10. Bayesian Analysis of High Dimensional Classification

    NASA Astrophysics Data System (ADS)

    Mukhopadhyay, Subhadeep; Liang, Faming

    2009-12-01

    Modern data mining and bioinformatics have presented an important playground for statistical learning techniques, where the number of input variables is possibly much larger than the sample size of the training data. In supervised learning, logistic regression or probit regression can be used to model a binary output and form perceptron classification rules based on Bayesian inference. In these cases , there is a lot of interest in searching for sparse model in High Dimensional regression(/classification) setup. we first discuss two common challenges for analyzing high dimensional data. The first one is the curse of dimensionality. The complexity of many existing algorithms scale exponentially with the dimensionality of the space and by virtue of that algorithms soon become computationally intractable and therefore inapplicable in many real applications. secondly, multicollinearities among the predictors which severely slowdown the algorithm. In order to make Bayesian analysis operational in high dimension we propose a novel 'Hierarchical stochastic approximation monte carlo algorithm' (HSAMC), which overcomes the curse of dimensionality, multicollinearity of predictors in high dimension and also it possesses the self-adjusting mechanism to avoid the local minima separated by high energy barriers. Models and methods are illustrated by simulation inspired from from the feild of genomics. Numerical results indicate that HSAMC can work as a general model selection sampler in high dimensional complex model space.

  11. A Fast and Robust Extrinsic Calibration for RGB-D Camera Networks.

    PubMed

    Su, Po-Chang; Shen, Ju; Xu, Wanxin; Cheung, Sen-Ching S; Luo, Ying

    2018-01-15

    From object tracking to 3D reconstruction, RGB-Depth (RGB-D) camera networks play an increasingly important role in many vision and graphics applications. Practical applications often use sparsely-placed cameras to maximize visibility, while using as few cameras as possible to minimize cost. In general, it is challenging to calibrate sparse camera networks due to the lack of shared scene features across different camera views. In this paper, we propose a novel algorithm that can accurately and rapidly calibrate the geometric relationships across an arbitrary number of RGB-D cameras on a network. Our work has a number of novel features. First, to cope with the wide separation between different cameras, we establish view correspondences by using a spherical calibration object. We show that this approach outperforms other techniques based on planar calibration objects. Second, instead of modeling camera extrinsic calibration using rigid transformation, which is optimal only for pinhole cameras, we systematically test different view transformation functions including rigid transformation, polynomial transformation and manifold regression to determine the most robust mapping that generalizes well to unseen data. Third, we reformulate the celebrated bundle adjustment procedure to minimize the global 3D reprojection error so as to fine-tune the initial estimates. Finally, our scalable client-server architecture is computationally efficient: the calibration of a five-camera system, including data capture, can be done in minutes using only commodity PCs. Our proposed framework is compared with other state-of-the-arts systems using both quantitative measurements and visual alignment results of the merged point clouds.

  12. A Fast and Robust Extrinsic Calibration for RGB-D Camera Networks †

    PubMed Central

    Shen, Ju; Xu, Wanxin; Luo, Ying

    2018-01-01

    From object tracking to 3D reconstruction, RGB-Depth (RGB-D) camera networks play an increasingly important role in many vision and graphics applications. Practical applications often use sparsely-placed cameras to maximize visibility, while using as few cameras as possible to minimize cost. In general, it is challenging to calibrate sparse camera networks due to the lack of shared scene features across different camera views. In this paper, we propose a novel algorithm that can accurately and rapidly calibrate the geometric relationships across an arbitrary number of RGB-D cameras on a network. Our work has a number of novel features. First, to cope with the wide separation between different cameras, we establish view correspondences by using a spherical calibration object. We show that this approach outperforms other techniques based on planar calibration objects. Second, instead of modeling camera extrinsic calibration using rigid transformation, which is optimal only for pinhole cameras, we systematically test different view transformation functions including rigid transformation, polynomial transformation and manifold regression to determine the most robust mapping that generalizes well to unseen data. Third, we reformulate the celebrated bundle adjustment procedure to minimize the global 3D reprojection error so as to fine-tune the initial estimates. Finally, our scalable client-server architecture is computationally efficient: the calibration of a five-camera system, including data capture, can be done in minutes using only commodity PCs. Our proposed framework is compared with other state-of-the-arts systems using both quantitative measurements and visual alignment results of the merged point clouds. PMID:29342968

  13. Feature Selection and Pedestrian Detection Based on Sparse Representation.

    PubMed

    Yao, Shihong; Wang, Tao; Shen, Weiming; Pan, Shaoming; Chong, Yanwen; Ding, Fei

    2015-01-01

    Pedestrian detection have been currently devoted to the extraction of effective pedestrian features, which has become one of the obstacles in pedestrian detection application according to the variety of pedestrian features and their large dimension. Based on the theoretical analysis of six frequently-used features, SIFT, SURF, Haar, HOG, LBP and LSS, and their comparison with experimental results, this paper screens out the sparse feature subsets via sparse representation to investigate whether the sparse subsets have the same description abilities and the most stable features. When any two of the six features are fused, the fusion feature is sparsely represented to obtain its important components. Sparse subsets of the fusion features can be rapidly generated by avoiding calculation of the corresponding index of dimension numbers of these feature descriptors; thus, the calculation speed of the feature dimension reduction is improved and the pedestrian detection time is reduced. Experimental results show that sparse feature subsets are capable of keeping the important components of these six feature descriptors. The sparse features of HOG and LSS possess the same description ability and consume less time compared with their full features. The ratios of the sparse feature subsets of HOG and LSS to their full sets are the highest among the six, and thus these two features can be used to best describe the characteristics of the pedestrian and the sparse feature subsets of the combination of HOG-LSS show better distinguishing ability and parsimony.

  14. Newton-based optimization for Kullback-Leibler nonnegative tensor factorizations

    DOE PAGES

    Plantenga, Todd; Kolda, Tamara G.; Hansen, Samantha

    2015-04-30

    Tensor factorizations with nonnegativity constraints have found application in analysing data from cyber traffic, social networks, and other areas. We consider application data best described as being generated by a Poisson process (e.g. count data), which leads to sparse tensors that can be modelled by sparse factor matrices. In this paper, we investigate efficient techniques for computing an appropriate canonical polyadic tensor factorization based on the Kullback–Leibler divergence function. We propose novel subproblem solvers within the standard alternating block variable approach. Our new methods exploit structure and reformulate the optimization problem as small independent subproblems. We employ bound-constrained Newton andmore » quasi-Newton methods. Finally, we compare our algorithms against other codes, demonstrating superior speed for high accuracy results and the ability to quickly find sparse solutions.« less

  15. Data Shared Lasso: A Novel Tool to Discover Uplift.

    PubMed

    Gross, Samuel M; Tibshirani, Robert

    2016-09-01

    A model is presented for the supervised learning problem where the observations come from a fixed number of pre-specified groups, and the regression coefficients may vary sparsely between groups. The model spans the continuum between individual models for each group and one model for all groups. The resulting algorithm is designed with a high dimensional framework in mind. The approach is applied to a sentiment analysis dataset to show its efficacy and interpretability. One particularly useful application is for finding sub-populations in a randomized trial for which an intervention (treatment) is beneficial, often called the uplift problem. Some new concepts are introduced that are useful for uplift analysis. The value is demonstrated in an application to a real world credit card promotion dataset. In this example, although sending the promotion has a very small average effect, by targeting a particular subgroup with the promotion one can obtain a 15% increase in the proportion of people who purchase the new credit card.

  16. Data Shared Lasso: A Novel Tool to Discover Uplift

    PubMed Central

    Gross, Samuel M.; Tibshirani, Robert

    2017-01-01

    A model is presented for the supervised learning problem where the observations come from a fixed number of pre-specified groups, and the regression coefficients may vary sparsely between groups. The model spans the continuum between individual models for each group and one model for all groups. The resulting algorithm is designed with a high dimensional framework in mind. The approach is applied to a sentiment analysis dataset to show its efficacy and interpretability. One particularly useful application is for finding sub-populations in a randomized trial for which an intervention (treatment) is beneficial, often called the uplift problem. Some new concepts are introduced that are useful for uplift analysis. The value is demonstrated in an application to a real world credit card promotion dataset. In this example, although sending the promotion has a very small average effect, by targeting a particular subgroup with the promotion one can obtain a 15% increase in the proportion of people who purchase the new credit card. PMID:29056802

  17. Application of a sparse representation method using K-SVD to data compression of experimental ambient vibration data for SHM

    NASA Astrophysics Data System (ADS)

    Noh, Hae Young; Kiremidjian, Anne S.

    2011-04-01

    This paper introduces a data compression method using the K-SVD algorithm and its application to experimental ambient vibration data for structural health monitoring purposes. Because many damage diagnosis algorithms that use system identification require vibration measurements of multiple locations, it is necessary to transmit long threads of data. In wireless sensor networks for structural health monitoring, however, data transmission is often a major source of battery consumption. Therefore, reducing the amount of data to transmit can significantly lengthen the battery life and reduce maintenance cost. The K-SVD algorithm was originally developed in information theory for sparse signal representation. This algorithm creates an optimal over-complete set of bases, referred to as a dictionary, using singular value decomposition (SVD) and represents the data as sparse linear combinations of these bases using the orthogonal matching pursuit (OMP) algorithm. Since ambient vibration data are stationary, we can segment them and represent each segment sparsely. Then only the dictionary and the sparse vectors of the coefficients need to be transmitted wirelessly for restoration of the original data. We applied this method to ambient vibration data measured from a four-story steel moment resisting frame. The results show that the method can compress the data efficiently and restore the data with very little error.

  18. The application of low-rank and sparse decomposition method in the field of climatology

    NASA Astrophysics Data System (ADS)

    Gupta, Nitika; Bhaskaran, Prasad K.

    2018-04-01

    The present study reports a low-rank and sparse decomposition method that separates the mean and the variability of a climate data field. Until now, the application of this technique was limited only in areas such as image processing, web data ranking, and bioinformatics data analysis. In climate science, this method exactly separates the original data into a set of low-rank and sparse components, wherein the low-rank components depict the linearly correlated dataset (expected or mean behavior), and the sparse component represents the variation or perturbation in the dataset from its mean behavior. The study attempts to verify the efficacy of this proposed technique in the field of climatology with two examples of real world. The first example attempts this technique on the maximum wind-speed (MWS) data for the Indian Ocean (IO) region. The study brings to light a decadal reversal pattern in the MWS for the North Indian Ocean (NIO) during the months of June, July, and August (JJA). The second example deals with the sea surface temperature (SST) data for the Bay of Bengal region that exhibits a distinct pattern in the sparse component. The study highlights the importance of the proposed technique used for interpretation and visualization of climate data.

  19. Orthogonal Procrustes Analysis for Dictionary Learning in Sparse Linear Representation.

    PubMed

    Grossi, Giuliano; Lanzarotti, Raffaella; Lin, Jianyi

    2017-01-01

    In the sparse representation model, the design of overcomplete dictionaries plays a key role for the effectiveness and applicability in different domains. Recent research has produced several dictionary learning approaches, being proven that dictionaries learnt by data examples significantly outperform structured ones, e.g. wavelet transforms. In this context, learning consists in adapting the dictionary atoms to a set of training signals in order to promote a sparse representation that minimizes the reconstruction error. Finding the best fitting dictionary remains a very difficult task, leaving the question still open. A well-established heuristic method for tackling this problem is an iterative alternating scheme, adopted for instance in the well-known K-SVD algorithm. Essentially, it consists in repeating two stages; the former promotes sparse coding of the training set and the latter adapts the dictionary to reduce the error. In this paper we present R-SVD, a new method that, while maintaining the alternating scheme, adopts the Orthogonal Procrustes analysis to update the dictionary atoms suitably arranged into groups. Comparative experiments on synthetic data prove the effectiveness of R-SVD with respect to well known dictionary learning algorithms such as K-SVD, ILS-DLA and the online method OSDL. Moreover, experiments on natural data such as ECG compression, EEG sparse representation, and image modeling confirm R-SVD's robustness and wide applicability.

  20. Tactical 3D Model Generation using Structure-From-Motion on Video from Unmanned Systems

    DTIC Science & Technology

    2015-04-01

    available SfM application known as VisualSFM .6,7 VisualSFM is an end-user, “off-the-shelf” implementation of SfM that is easy to configure and used for...most 3D model generation applications from imagery. While the usual interface with VisualSFM is through their graphical user interface (GUI), we will be...of our system.5 There are two types of 3D model generation available within VisualSFM ; sparse and dense reconstruction. Sparse reconstruction begins

  1. Implicit kernel sparse shape representation: a sparse-neighbors-based objection segmentation framework.

    PubMed

    Yao, Jincao; Yu, Huimin; Hu, Roland

    2017-01-01

    This paper introduces a new implicit-kernel-sparse-shape-representation-based object segmentation framework. Given an input object whose shape is similar to some of the elements in the training set, the proposed model can automatically find a cluster of implicit kernel sparse neighbors to approximately represent the input shape and guide the segmentation. A distance-constrained probabilistic definition together with a dualization energy term is developed to connect high-level shape representation and low-level image information. We theoretically prove that our model not only derives from two projected convex sets but is also equivalent to a sparse-reconstruction-error-based representation in the Hilbert space. Finally, a "wake-sleep"-based segmentation framework is applied to drive the evolutionary curve to recover the original shape of the object. We test our model on two public datasets. Numerical experiments on both synthetic images and real applications show the superior capabilities of the proposed framework.

  2. Multi-threaded Sparse Matrix Sparse Matrix Multiplication for Many-Core and GPU Architectures.

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Deveci, Mehmet; Trott, Christian Robert; Rajamanickam, Sivasankaran

    Sparse Matrix-Matrix multiplication is a key kernel that has applications in several domains such as scientific computing and graph analysis. Several algorithms have been studied in the past for this foundational kernel. In this paper, we develop parallel algorithms for sparse matrix- matrix multiplication with a focus on performance portability across different high performance computing architectures. The performance of these algorithms depend on the data structures used in them. We compare different types of accumulators in these algorithms and demonstrate the performance difference between these data structures. Furthermore, we develop a meta-algorithm, kkSpGEMM, to choose the right algorithm and datamore » structure based on the characteristics of the problem. We show performance comparisons on three architectures and demonstrate the need for the community to develop two phase sparse matrix-matrix multiplication implementations for efficient reuse of the data structures involved.« less

  3. Tensor Sparse Coding for Positive Definite Matrices.

    PubMed

    Sivalingam, Ravishankar; Boley, Daniel; Morellas, Vassilios; Papanikolopoulos, Nikos

    2013-08-02

    In recent years, there has been extensive research on sparse representation of vector-valued signals. In the matrix case, the data points are merely vectorized and treated as vectors thereafter (for e.g., image patches). However, this approach cannot be used for all matrices, as it may destroy the inherent structure of the data. Symmetric positive definite (SPD) matrices constitute one such class of signals, where their implicit structure of positive eigenvalues is lost upon vectorization. This paper proposes a novel sparse coding technique for positive definite matrices, which respects the structure of the Riemannian manifold and preserves the positivity of their eigenvalues, without resorting to vectorization. Synthetic and real-world computer vision experiments with region covariance descriptors demonstrate the need for and the applicability of the new sparse coding model. This work serves to bridge the gap between the sparse modeling paradigm and the space of positive definite matrices.

  4. Tensor sparse coding for positive definite matrices.

    PubMed

    Sivalingam, Ravishankar; Boley, Daniel; Morellas, Vassilios; Papanikolopoulos, Nikolaos

    2014-03-01

    In recent years, there has been extensive research on sparse representation of vector-valued signals. In the matrix case, the data points are merely vectorized and treated as vectors thereafter (for example, image patches). However, this approach cannot be used for all matrices, as it may destroy the inherent structure of the data. Symmetric positive definite (SPD) matrices constitute one such class of signals, where their implicit structure of positive eigenvalues is lost upon vectorization. This paper proposes a novel sparse coding technique for positive definite matrices, which respects the structure of the Riemannian manifold and preserves the positivity of their eigenvalues, without resorting to vectorization. Synthetic and real-world computer vision experiments with region covariance descriptors demonstrate the need for and the applicability of the new sparse coding model. This work serves to bridge the gap between the sparse modeling paradigm and the space of positive definite matrices.

  5. Sparse Matrix for ECG Identification with Two-Lead Features.

    PubMed

    Tseng, Kuo-Kun; Luo, Jiao; Hegarty, Robert; Wang, Wenmin; Haiting, Dong

    2015-01-01

    Electrocardiograph (ECG) human identification has the potential to improve biometric security. However, improvements in ECG identification and feature extraction are required. Previous work has focused on single lead ECG signals. Our work proposes a new algorithm for human identification by mapping two-lead ECG signals onto a two-dimensional matrix then employing a sparse matrix method to process the matrix. And that is the first application of sparse matrix techniques for ECG identification. Moreover, the results of our experiments demonstrate the benefits of our approach over existing methods.

  6. Application of Compressive Sensing to Gravitational Microlensing Data and Implications for Miniaturized Space Observatories

    NASA Technical Reports Server (NTRS)

    Korde-Patel, Asmita (Inventor); Barry, Richard K.; Mohsenin, Tinoosh

    2016-01-01

    Compressive Sensing is a technique for simultaneous acquisition and compression of data that is sparse or can be made sparse in some domain. It is currently under intense development and has been profitably employed for industrial and medical applications. We here describe the use of this technique for the processing of astronomical data. We outline the procedure as applied to exoplanet gravitational microlensing and analyze measurement results and uncertainty values. We describe implications for on-spacecraft data processing for space observatories. Our findings suggest that application of these techniques may yield significant, enabling benefits especially for power and volume-limited space applications such as miniaturized or micro-constellation satellites.

  7. Working with Sparse Data in Rated Language Tests: Generalizability Theory Applications

    ERIC Educational Resources Information Center

    Lin, Chih-Kai

    2017-01-01

    Sparse-rated data are common in operational performance-based language tests, as an inevitable result of assigning examinee responses to a fraction of available raters. The current study investigates the precision of two generalizability-theory methods (i.e., the rating method and the subdividing method) specifically designed to accommodate the…

  8. On A Nonlinear Generalization of Sparse Coding and Dictionary Learning.

    PubMed

    Xie, Yuchen; Ho, Jeffrey; Vemuri, Baba

    2013-01-01

    Existing dictionary learning algorithms are based on the assumption that the data are vectors in an Euclidean vector space ℝ d , and the dictionary is learned from the training data using the vector space structure of ℝ d and its Euclidean L 2 -metric. However, in many applications, features and data often originated from a Riemannian manifold that does not support a global linear (vector space) structure. Furthermore, the extrinsic viewpoint of existing dictionary learning algorithms becomes inappropriate for modeling and incorporating the intrinsic geometry of the manifold that is potentially important and critical to the application. This paper proposes a novel framework for sparse coding and dictionary learning for data on a Riemannian manifold, and it shows that the existing sparse coding and dictionary learning methods can be considered as special (Euclidean) cases of the more general framework proposed here. We show that both the dictionary and sparse coding can be effectively computed for several important classes of Riemannian manifolds, and we validate the proposed method using two well-known classification problems in computer vision and medical imaging analysis.

  9. On A Nonlinear Generalization of Sparse Coding and Dictionary Learning

    PubMed Central

    Xie, Yuchen; Ho, Jeffrey; Vemuri, Baba

    2013-01-01

    Existing dictionary learning algorithms are based on the assumption that the data are vectors in an Euclidean vector space ℝd, and the dictionary is learned from the training data using the vector space structure of ℝd and its Euclidean L2-metric. However, in many applications, features and data often originated from a Riemannian manifold that does not support a global linear (vector space) structure. Furthermore, the extrinsic viewpoint of existing dictionary learning algorithms becomes inappropriate for modeling and incorporating the intrinsic geometry of the manifold that is potentially important and critical to the application. This paper proposes a novel framework for sparse coding and dictionary learning for data on a Riemannian manifold, and it shows that the existing sparse coding and dictionary learning methods can be considered as special (Euclidean) cases of the more general framework proposed here. We show that both the dictionary and sparse coding can be effectively computed for several important classes of Riemannian manifolds, and we validate the proposed method using two well-known classification problems in computer vision and medical imaging analysis. PMID:24129583

  10. iSAP: Interactive Sparse Astronomical Data Analysis Packages

    NASA Astrophysics Data System (ADS)

    Fourt, O.; Starck, J.-L.; Sureau, F.; Bobin, J.; Moudden, Y.; Abrial, P.; Schmitt, J.

    2013-03-01

    iSAP consists of three programs, written in IDL, which together are useful for spherical data analysis. MR/S (MultiResolution on the Sphere) contains routines for wavelet, ridgelet and curvelet transform on the sphere, and applications such denoising on the sphere using wavelets and/or curvelets, Gaussianity tests and Independent Component Analysis on the Sphere. MR/S has been designed for the PLANCK project, but can be used for many other applications. SparsePol (Polarized Spherical Wavelets and Curvelets) has routines for polarized wavelet, polarized ridgelet and polarized curvelet transform on the sphere, and applications such denoising on the sphere using wavelets and/or curvelets, Gaussianity tests and blind source separation on the Sphere. SparsePol has been designed for the PLANCK project. MS-VSTS (Multi-Scale Variance Stabilizing Transform on the Sphere), designed initially for the FERMI project, is useful for spherical mono-channel and multi-channel data analysis when the data are contaminated by a Poisson noise. It contains routines for wavelet/curvelet denoising, wavelet deconvolution, multichannel wavelet denoising and deconvolution.

  11. Quality optimization of H.264/AVC video transmission over noisy environments using a sparse regression framework

    NASA Astrophysics Data System (ADS)

    Pandremmenou, K.; Tziortziotis, N.; Paluri, S.; Zhang, W.; Blekas, K.; Kondi, L. P.; Kumar, S.

    2015-03-01

    We propose the use of the Least Absolute Shrinkage and Selection Operator (LASSO) regression method in order to predict the Cumulative Mean Squared Error (CMSE), incurred by the loss of individual slices in video transmission. We extract a number of quality-relevant features from the H.264/AVC video sequences, which are given as input to the LASSO. This method has the benefit of not only keeping a subset of the features that have the strongest effects towards video quality, but also produces accurate CMSE predictions. Particularly, we study the LASSO regression through two different architectures; the Global LASSO (G.LASSO) and Local LASSO (L.LASSO). In G.LASSO, a single regression model is trained for all slice types together, while in L.LASSO, motivated by the fact that the values for some features are closely dependent on the considered slice type, each slice type has its own regression model, in an e ort to improve LASSO's prediction capability. Based on the predicted CMSE values, we group the video slices into four priority classes. Additionally, we consider a video transmission scenario over a noisy channel, where Unequal Error Protection (UEP) is applied to all prioritized slices. The provided results demonstrate the efficiency of LASSO in estimating CMSE with high accuracy, using only a few features. les that typically contain high-entropy data, producing a footprint that is far less conspicuous than existing methods. The system uses a local web server to provide a le system, user interface and applications through an web architecture.

  12. Integrative Sparse K-Means With Overlapping Group Lasso in Genomic Applications for Disease Subtype Discovery

    PubMed Central

    Huo, Zhiguang; Tseng, George

    2017-01-01

    Cancer subtypes discovery is the first step to deliver personalized medicine to cancer patients. With the accumulation of massive multi-level omics datasets and established biological knowledge databases, omics data integration with incorporation of rich existing biological knowledge is essential for deciphering a biological mechanism behind the complex diseases. In this manuscript, we propose an integrative sparse K-means (is-K means) approach to discover disease subtypes with the guidance of prior biological knowledge via sparse overlapping group lasso. An algorithm using an alternating direction method of multiplier (ADMM) will be applied for fast optimization. Simulation and three real applications in breast cancer and leukemia will be used to compare is-K means with existing methods and demonstrate its superior clustering accuracy, feature selection, functional annotation of detected molecular features and computing efficiency. PMID:28959370

  13. Integrative Sparse K-Means With Overlapping Group Lasso in Genomic Applications for Disease Subtype Discovery.

    PubMed

    Huo, Zhiguang; Tseng, George

    2017-06-01

    Cancer subtypes discovery is the first step to deliver personalized medicine to cancer patients. With the accumulation of massive multi-level omics datasets and established biological knowledge databases, omics data integration with incorporation of rich existing biological knowledge is essential for deciphering a biological mechanism behind the complex diseases. In this manuscript, we propose an integrative sparse K -means (is- K means) approach to discover disease subtypes with the guidance of prior biological knowledge via sparse overlapping group lasso. An algorithm using an alternating direction method of multiplier (ADMM) will be applied for fast optimization. Simulation and three real applications in breast cancer and leukemia will be used to compare is- K means with existing methods and demonstrate its superior clustering accuracy, feature selection, functional annotation of detected molecular features and computing efficiency.

  14. Sparse-view photoacoustic tomography using virtual parallel-projections and spatially adaptive filtering

    NASA Astrophysics Data System (ADS)

    Wang, Yihan; Lu, Tong; Wan, Wenbo; Liu, Lingling; Zhang, Songhe; Li, Jiao; Zhao, Huijuan; Gao, Feng

    2018-02-01

    To fully realize the potential of photoacoustic tomography (PAT) in preclinical and clinical applications, rapid measurements and robust reconstructions are needed. Sparse-view measurements have been adopted effectively to accelerate the data acquisition. However, since the reconstruction from the sparse-view sampling data is challenging, both of the effective measurement and the appropriate reconstruction should be taken into account. In this study, we present an iterative sparse-view PAT reconstruction scheme where a virtual parallel-projection concept matching for the proposed measurement condition is introduced to help to achieve the "compressive sensing" procedure of the reconstruction, and meanwhile the spatially adaptive filtering fully considering the a priori information of the mutually similar blocks existing in natural images is introduced to effectively recover the partial unknown coefficients in the transformed domain. Therefore, the sparse-view PAT images can be reconstructed with higher quality compared with the results obtained by the universal back-projection (UBP) algorithm in the same sparse-view cases. The proposed approach has been validated by simulation experiments, which exhibits desirable performances in image fidelity even from a small number of measuring positions.

  15. Predictors of Asian American Adolescents' Suicide Attempts: A Latent Class Regression Analysis

    ERIC Educational Resources Information Center

    Wong, Y. Joel; Maffini, Cara S.

    2011-01-01

    Although suicide-related outcomes among Asian American adolescents are a serious public health problem in the United States, research in this area has been relatively sparse. To address this gap in the empirical literature, this study examined subgroups of Asian American adolescents for whom family, school, and peer relationships exerted…

  16. Learning dictionaries of sparse codes of 3D movements of body joints for real-time human activity understanding.

    PubMed

    Qi, Jin; Yang, Zhiyong

    2014-01-01

    Real-time human activity recognition is essential for human-robot interactions for assisted healthy independent living. Most previous work in this area is performed on traditional two-dimensional (2D) videos and both global and local methods have been used. Since 2D videos are sensitive to changes of lighting condition, view angle, and scale, researchers begun to explore applications of 3D information in human activity understanding in recently years. Unfortunately, features that work well on 2D videos usually don't perform well on 3D videos and there is no consensus on what 3D features should be used. Here we propose a model of human activity recognition based on 3D movements of body joints. Our method has three steps, learning dictionaries of sparse codes of 3D movements of joints, sparse coding, and classification. In the first step, space-time volumes of 3D movements of body joints are obtained via dense sampling and independent component analysis is then performed to construct a dictionary of sparse codes for each activity. In the second step, the space-time volumes are projected to the dictionaries and a set of sparse histograms of the projection coefficients are constructed as feature representations of the activities. Finally, the sparse histograms are used as inputs to a support vector machine to recognize human activities. We tested this model on three databases of human activities and found that it outperforms the state-of-the-art algorithms. Thus, this model can be used for real-time human activity recognition in many applications.

  17. Visual saliency detection based on in-depth analysis of sparse representation

    NASA Astrophysics Data System (ADS)

    Wang, Xin; Shen, Siqiu; Ning, Chen

    2018-03-01

    Visual saliency detection has been receiving great attention in recent years since it can facilitate a wide range of applications in computer vision. A variety of saliency models have been proposed based on different assumptions within which saliency detection via sparse representation is one of the newly arisen approaches. However, most existing sparse representation-based saliency detection methods utilize partial characteristics of sparse representation, lacking of in-depth analysis. Thus, they may have limited detection performance. Motivated by this, this paper proposes an algorithm for detecting visual saliency based on in-depth analysis of sparse representation. A number of discriminative dictionaries are first learned with randomly sampled image patches by means of inner product-based dictionary atom classification. Then, the input image is partitioned into many image patches, and these patches are classified into salient and nonsalient ones based on the in-depth analysis of sparse coding coefficients. Afterward, sparse reconstruction errors are calculated for the salient and nonsalient patch sets. By investigating the sparse reconstruction errors, the most salient atoms, which tend to be from the most salient region, are screened out and taken away from the discriminative dictionaries. Finally, an effective method is exploited for saliency map generation with the reduced dictionaries. Comprehensive evaluations on publicly available datasets and comparisons with some state-of-the-art approaches demonstrate the effectiveness of the proposed algorithm.

  18. Orthogonal Procrustes Analysis for Dictionary Learning in Sparse Linear Representation

    PubMed Central

    Grossi, Giuliano; Lin, Jianyi

    2017-01-01

    In the sparse representation model, the design of overcomplete dictionaries plays a key role for the effectiveness and applicability in different domains. Recent research has produced several dictionary learning approaches, being proven that dictionaries learnt by data examples significantly outperform structured ones, e.g. wavelet transforms. In this context, learning consists in adapting the dictionary atoms to a set of training signals in order to promote a sparse representation that minimizes the reconstruction error. Finding the best fitting dictionary remains a very difficult task, leaving the question still open. A well-established heuristic method for tackling this problem is an iterative alternating scheme, adopted for instance in the well-known K-SVD algorithm. Essentially, it consists in repeating two stages; the former promotes sparse coding of the training set and the latter adapts the dictionary to reduce the error. In this paper we present R-SVD, a new method that, while maintaining the alternating scheme, adopts the Orthogonal Procrustes analysis to update the dictionary atoms suitably arranged into groups. Comparative experiments on synthetic data prove the effectiveness of R-SVD with respect to well known dictionary learning algorithms such as K-SVD, ILS-DLA and the online method OSDL. Moreover, experiments on natural data such as ECG compression, EEG sparse representation, and image modeling confirm R-SVD’s robustness and wide applicability. PMID:28103283

  19. An efficient classification method based on principal component and sparse representation.

    PubMed

    Zhai, Lin; Fu, Shujun; Zhang, Caiming; Liu, Yunxian; Wang, Lu; Liu, Guohua; Yang, Mingqiang

    2016-01-01

    As an important application in optical imaging, palmprint recognition is interfered by many unfavorable factors. An effective fusion of blockwise bi-directional two-dimensional principal component analysis and grouping sparse classification is presented. The dimension reduction and normalizing are implemented by the blockwise bi-directional two-dimensional principal component analysis for palmprint images to extract feature matrixes, which are assembled into an overcomplete dictionary in sparse classification. A subspace orthogonal matching pursuit algorithm is designed to solve the grouping sparse representation. Finally, the classification result is gained by comparing the residual between testing and reconstructed images. Experiments are carried out on a palmprint database, and the results show that this method has better robustness against position and illumination changes of palmprint images, and can get higher rate of palmprint recognition.

  20. Development of Innovative Technology to Expand Precipitation Observations in Satellite Precipitation Validation in Under-developed Data-sparse Regions

    NASA Astrophysics Data System (ADS)

    Kucera, P. A.; Steinson, M.

    2016-12-01

    Accurate and reliable real-time monitoring and dissemination of observations of precipitation and surface weather conditions in general is critical for a variety of research studies and applications. Surface precipitation observations provide important reference information for evaluating satellite (e.g., GPM) precipitation estimates. High quality surface observations of precipitation, temperature, moisture, and winds are important for applications such as agriculture, water resource monitoring, health, and hazardous weather early warning systems. In many regions of the World, surface weather station and precipitation gauge networks are sparsely located and/or of poor quality. Existing stations have often been sited incorrectly, not well-maintained, and have limited communications established at the site for real-time monitoring. The University Corporation for Atmospheric Research (UCAR)/National Center for Atmospheric Research (NCAR), with support from USAID, has started an initiative to develop and deploy low-cost weather instrumentation including tipping bucket and weighing-type precipitation gauges in sparsely observed regions of the world. The goal is to improve the number of observations (temporally and spatially) for the evaluation of satellite precipitation estimates in data-sparse regions and to improve the quality of applications for environmental monitoring and early warning alert systems on a regional to global scale. One important aspect of this initiative is to make the data open to the community. The weather station instrumentation have been developed using innovative new technologies such as 3D printers, Raspberry Pi computing systems, and wireless communications. An initial pilot project have been implemented in the country of Zambia. This effort could be expanded to other data sparse regions around the globe. The presentation will provide an overview and demonstration of 3D printed weather station development and initial evaluation of observed precipitation datasets.

  1. Functional linear models for zero-inflated count data with application to modeling hospitalizations in patients on dialysis.

    PubMed

    Sentürk, Damla; Dalrymple, Lorien S; Nguyen, Danh V

    2014-11-30

    We propose functional linear models for zero-inflated count data with a focus on the functional hurdle and functional zero-inflated Poisson (ZIP) models. Although the hurdle model assumes the counts come from a mixture of a degenerate distribution at zero and a zero-truncated Poisson distribution, the ZIP model considers a mixture of a degenerate distribution at zero and a standard Poisson distribution. We extend the generalized functional linear model framework with a functional predictor and multiple cross-sectional predictors to model counts generated by a mixture distribution. We propose an estimation procedure for functional hurdle and ZIP models, called penalized reconstruction, geared towards error-prone and sparsely observed longitudinal functional predictors. The approach relies on dimension reduction and pooling of information across subjects involving basis expansions and penalized maximum likelihood techniques. The developed functional hurdle model is applied to modeling hospitalizations within the first 2 years from initiation of dialysis, with a high percentage of zeros, in the Comprehensive Dialysis Study participants. Hospitalization counts are modeled as a function of sparse longitudinal measurements of serum albumin concentrations, patient demographics, and comorbidities. Simulation studies are used to study finite sample properties of the proposed method and include comparisons with an adaptation of standard principal components regression. Copyright © 2014 John Wiley & Sons, Ltd.

  2. Bayesian sparse channel estimation

    NASA Astrophysics Data System (ADS)

    Chen, Chulong; Zoltowski, Michael D.

    2012-05-01

    In Orthogonal Frequency Division Multiplexing (OFDM) systems, the technique used to estimate and track the time-varying multipath channel is critical to ensure reliable, high data rate communications. It is recognized that wireless channels often exhibit a sparse structure, especially for wideband and ultra-wideband systems. In order to exploit this sparse structure to reduce the number of pilot tones and increase the channel estimation quality, the application of compressed sensing to channel estimation is proposed. In this article, to make the compressed channel estimation more feasible for practical applications, it is investigated from a perspective of Bayesian learning. Under the Bayesian learning framework, the large-scale compressed sensing problem, as well as large time delay for the estimation of the doubly selective channel over multiple consecutive OFDM symbols, can be avoided. Simulation studies show a significant improvement in channel estimation MSE and less computing time compared to the conventional compressed channel estimation techniques.

  3. Accelerated Path-following Iterative Shrinkage Thresholding Algorithm with Application to Semiparametric Graph Estimation

    PubMed Central

    Zhao, Tuo; Liu, Han

    2016-01-01

    We propose an accelerated path-following iterative shrinkage thresholding algorithm (APISTA) for solving high dimensional sparse nonconvex learning problems. The main difference between APISTA and the path-following iterative shrinkage thresholding algorithm (PISTA) is that APISTA exploits an additional coordinate descent subroutine to boost the computational performance. Such a modification, though simple, has profound impact: APISTA not only enjoys the same theoretical guarantee as that of PISTA, i.e., APISTA attains a linear rate of convergence to a unique sparse local optimum with good statistical properties, but also significantly outperforms PISTA in empirical benchmarks. As an application, we apply APISTA to solve a family of nonconvex optimization problems motivated by estimating sparse semiparametric graphical models. APISTA allows us to obtain new statistical recovery results which do not exist in the existing literature. Thorough numerical results are provided to back up our theory. PMID:28133430

  4. Analysing baryon acoustic oscillations in sparse spectroscopic samples via cross-correlation with dense photometry

    NASA Astrophysics Data System (ADS)

    Patej, A.; Eisenstein, D. J.

    2018-07-01

    We develop a formalism for measuring the cosmological distance scale from baryon acoustic oscillations (BAO) using the cross-correlation of a sparse redshift survey with a denser photometric sample. This reduces the shot noise that would otherwise affect the autocorrelation of the sparse spectroscopic map. As a proof of principle, we make the first on-sky application of this method to a sparse sample defined as the z > 0.6 tail of the Sloan Digital Sky Survey's (SDSS) BOSS/CMASS sample of galaxies and a dense photometric sample from SDSS DR9. We find a 2.8σ preference for the BAO peak in the cross-correlation at an effective z = 0.64, from which we measure the angular diameter distance DM(z = 0.64) = (2418 ± 73 Mpc)(rs/rs, fid). Accordingly, we expect that using this method to combine sparse spectroscopy with the deep, high-quality imaging that is just now becoming available will enable higher precision BAO measurements than possible with the spectroscopy alone.

  5. Analyzing Baryon Acoustic Oscillations in Sparse Spectroscopic Samples via Cross-Correlation with Dense Photometry

    NASA Astrophysics Data System (ADS)

    Patej, Anna; Eisenstein, Daniel J.

    2018-04-01

    We develop a formalism for measuring the cosmological distance scale from baryon acoustic oscillations (BAO) using the cross-correlation of a sparse redshift survey with a denser photometric sample. This reduces the shot noise that would otherwise affect the auto-correlation of the sparse spectroscopic map. As a proof of principle, we make the first on-sky application of this method to a sparse sample defined as the z > 0.6 tail of the Sloan Digital Sky Survey's (SDSS) BOSS/CMASS sample of galaxies and a dense photometric sample from SDSS DR9. We find a 2.8σ preference for the BAO peak in the cross-correlation at an effective z = 0.64, from which we measure the angular diameter distance DM(z = 0.64) = (2418 ± 73 Mpc)(rs/rs, fid). Accordingly, we expect that using this method to combine sparse spectroscopy with the deep, high quality imaging that is just now becoming available will enable higher precision BAO measurements than possible with the spectroscopy alone.

  6. Application of validation data for assessing spatial interpolation methods for 8-h ozone or other sparsely monitored constituents.

    PubMed

    Joseph, John; Sharif, Hatim O; Sunil, Thankam; Alamgir, Hasanat

    2013-07-01

    The adverse health effects of high concentrations of ground-level ozone are well-known, but estimating exposure is difficult due to the sparseness of urban monitoring networks. This sparseness discourages the reservation of a portion of the monitoring stations for validation of interpolation techniques precisely when the risk of overfitting is greatest. In this study, we test a variety of simple spatial interpolation techniques for 8-h ozone with thousands of randomly selected subsets of data from two urban areas with monitoring stations sufficiently numerous to allow for true validation. Results indicate that ordinary kriging with only the range parameter calibrated in an exponential variogram is the generally superior method, and yields reliable confidence intervals. Sparse data sets may contain sufficient information for calibration of the range parameter even if the Moran I p-value is close to unity. R script is made available to apply the methodology to other sparsely monitored constituents. Copyright © 2013 Elsevier Ltd. All rights reserved.

  7. Multilevel sparse functional principal component analysis.

    PubMed

    Di, Chongzhi; Crainiceanu, Ciprian M; Jank, Wolfgang S

    2014-01-29

    We consider analysis of sparsely sampled multilevel functional data, where the basic observational unit is a function and data have a natural hierarchy of basic units. An example is when functions are recorded at multiple visits for each subject. Multilevel functional principal component analysis (MFPCA; Di et al. 2009) was proposed for such data when functions are densely recorded. Here we consider the case when functions are sparsely sampled and may contain only a few observations per function. We exploit the multilevel structure of covariance operators and achieve data reduction by principal component decompositions at both between and within subject levels. We address inherent methodological differences in the sparse sampling context to: 1) estimate the covariance operators; 2) estimate the functional principal component scores; 3) predict the underlying curves. Through simulations the proposed method is able to discover dominating modes of variations and reconstruct underlying curves well even in sparse settings. Our approach is illustrated by two applications, the Sleep Heart Health Study and eBay auctions.

  8. Classification of mislabelled microarrays using robust sparse logistic regression.

    PubMed

    Bootkrajang, Jakramate; Kabán, Ata

    2013-04-01

    Previous studies reported that labelling errors are not uncommon in microarray datasets. In such cases, the training set may become misleading, and the ability of classifiers to make reliable inferences from the data is compromised. Yet, few methods are currently available in the bioinformatics literature to deal with this problem. The few existing methods focus on data cleansing alone, without reference to classification, and their performance crucially depends on some tuning parameters. In this article, we develop a new method to detect mislabelled arrays simultaneously with learning a sparse logistic regression classifier. Our method may be seen as a label-noise robust extension of the well-known and successful Bayesian logistic regression classifier. To account for possible mislabelling, we formulate a label-flipping process as part of the classifier. The regularization parameter is automatically set using Bayesian regularization, which not only saves the computation time that cross-validation would take, but also eliminates any unwanted effects of label noise when setting the regularization parameter. Extensive experiments with both synthetic data and real microarray datasets demonstrate that our approach is able to counter the bad effects of labelling errors in terms of predictive performance, it is effective at identifying marker genes and simultaneously it detects mislabelled arrays to high accuracy. The code is available from http://cs.bham.ac.uk/∼jxb008. Supplementary data are available at Bioinformatics online.

  9. An efficient dictionary learning algorithm and its application to 3-D medical image denoising.

    PubMed

    Li, Shutao; Fang, Leyuan; Yin, Haitao

    2012-02-01

    In this paper, we propose an efficient dictionary learning algorithm for sparse representation of given data and suggest a way to apply this algorithm to 3-D medical image denoising. Our learning approach is composed of two main parts: sparse coding and dictionary updating. On the sparse coding stage, an efficient algorithm named multiple clusters pursuit (MCP) is proposed. The MCP first applies a dictionary structuring strategy to cluster the atoms with high coherence together, and then employs a multiple-selection strategy to select several competitive atoms at each iteration. These two strategies can greatly reduce the computation complexity of the MCP and assist it to obtain better sparse solution. On the dictionary updating stage, the alternating optimization that efficiently approximates the singular value decomposition is introduced. Furthermore, in the 3-D medical image denoising application, a joint 3-D operation is proposed for taking the learning capabilities of the presented algorithm to simultaneously capture the correlations within each slice and correlations across the nearby slices, thereby obtaining better denoising results. The experiments on both synthetically generated data and real 3-D medical images demonstrate that the proposed approach has superior performance compared to some well-known methods. © 2011 IEEE

  10. Forest/non-forest mapping using inventory data and satellite imagery

    Treesearch

    Ronald E. McRoberts

    2002-01-01

    For two study areas in Minnesota, USA, one heavily forested and one sparsely forested, maps of predicted proportion forest area were created using Landsat Thematic Mapper imagery, forest inventory plot data, and two prediction techniques, logistic regression and a k-Nearest Neighbours technique. The maps were used to increase the precision of forest area estimates by...

  11. The effects of forest fragmentation on forest stand attributes

    Treesearch

    Ronald E. McRoberts; Greg C. Liknes

    2002-01-01

    For two study areas in Minnesota, USA, one heavily forested and one sparsely forested, maps of predicted proportion forest area were created using Landsat Thematic Mapper imagery, forest inventory plot data, and a logistic regression model. The maps were used to estimate quantitative indices of forest fragmentation. Correlations between the values of the indices and...

  12. Sparse distributed memory overview

    NASA Technical Reports Server (NTRS)

    Raugh, Mike

    1990-01-01

    The Sparse Distributed Memory (SDM) project is investigating the theory and applications of massively parallel computing architecture, called sparse distributed memory, that will support the storage and retrieval of sensory and motor patterns characteristic of autonomous systems. The immediate objectives of the project are centered in studies of the memory itself and in the use of the memory to solve problems in speech, vision, and robotics. Investigation of methods for encoding sensory data is an important part of the research. Examples of NASA missions that may benefit from this work are Space Station, planetary rovers, and solar exploration. Sparse distributed memory offers promising technology for systems that must learn through experience and be capable of adapting to new circumstances, and for operating any large complex system requiring automatic monitoring and control. Sparse distributed memory is a massively parallel architecture motivated by efforts to understand how the human brain works. Sparse distributed memory is an associative memory, able to retrieve information from cues that only partially match patterns stored in the memory. It is able to store long temporal sequences derived from the behavior of a complex system, such as progressive records of the system's sensory data and correlated records of the system's motor controls.

  13. X-ray computed tomography using curvelet sparse regularization.

    PubMed

    Wieczorek, Matthias; Frikel, Jürgen; Vogel, Jakob; Eggl, Elena; Kopp, Felix; Noël, Peter B; Pfeiffer, Franz; Demaret, Laurent; Lasser, Tobias

    2015-04-01

    Reconstruction of x-ray computed tomography (CT) data remains a mathematically challenging problem in medical imaging. Complementing the standard analytical reconstruction methods, sparse regularization is growing in importance, as it allows inclusion of prior knowledge. The paper presents a method for sparse regularization based on the curvelet frame for the application to iterative reconstruction in x-ray computed tomography. In this work, the authors present an iterative reconstruction approach based on the alternating direction method of multipliers using curvelet sparse regularization. Evaluation of the method is performed on a specifically crafted numerical phantom dataset to highlight the method's strengths. Additional evaluation is performed on two real datasets from commercial scanners with different noise characteristics, a clinical bone sample acquired in a micro-CT and a human abdomen scanned in a diagnostic CT. The results clearly illustrate that curvelet sparse regularization has characteristic strengths. In particular, it improves the restoration and resolution of highly directional, high contrast features with smooth contrast variations. The authors also compare this approach to the popular technique of total variation and to traditional filtered backprojection. The authors conclude that curvelet sparse regularization is able to improve reconstruction quality by reducing noise while preserving highly directional features.

  14. View-interpolation of sparsely sampled sinogram using convolutional neural network

    NASA Astrophysics Data System (ADS)

    Lee, Hoyeon; Lee, Jongha; Cho, Suengryong

    2017-02-01

    Spare-view sampling and its associated iterative image reconstruction in computed tomography have actively investigated. Sparse-view CT technique is a viable option to low-dose CT, particularly in cone-beam CT (CBCT) applications, with advanced iterative image reconstructions with varying degrees of image artifacts. One of the artifacts that may occur in sparse-view CT is the streak artifact in the reconstructed images. Another approach has been investigated for sparse-view CT imaging by use of the interpolation methods to fill in the missing view data and that reconstructs the image by an analytic reconstruction algorithm. In this study, we developed an interpolation method using convolutional neural network (CNN), which is one of the widely used deep-learning methods, to find missing projection data and compared its performances with the other interpolation techniques.

  15. Efficient diagonalization of the sparse matrices produced within the framework of the UK R-matrix molecular codes

    NASA Astrophysics Data System (ADS)

    Galiatsatos, P. G.; Tennyson, J.

    2012-11-01

    The most time consuming step within the framework of the UK R-matrix molecular codes is that of the diagonalization of the inner region Hamiltonian matrix (IRHM). Here we present the method that we follow to speed up this step. We use shared memory machines (SMM), distributed memory machines (DMM), the OpenMP directive based parallel language, the MPI function based parallel language, the sparse matrix diagonalizers ARPACK and PARPACK, a variation for real symmetric matrices of the official coordinate sparse matrix format and finally a parallel sparse matrix-vector product (PSMV). The efficient application of the previous techniques rely on two important facts: the sparsity of the matrix is large enough (more than 98%) and in order to get back converged results we need a small only part of the matrix spectrum.

  16. A General Sparse Tensor Framework for Electronic Structure Theory

    DOE PAGES

    Manzer, Samuel; Epifanovsky, Evgeny; Krylov, Anna I.; ...

    2017-01-24

    Linear-scaling algorithms must be developed in order to extend the domain of applicability of electronic structure theory to molecules of any desired size. But, the increasing complexity of modern linear-scaling methods makes code development and maintenance a significant challenge. A major contributor to this difficulty is the lack of robust software abstractions for handling block-sparse tensor operations. We therefore report the development of a highly efficient symbolic block-sparse tensor library in order to provide access to high-level software constructs to treat such problems. Our implementation supports arbitrary multi-dimensional sparsity in all input and output tensors. We then avoid cumbersome machine-generatedmore » code by implementing all functionality as a high-level symbolic C++ language library and demonstrate that our implementation attains very high performance for linear-scaling sparse tensor contractions.« less

  17. Statistical aspects of genetic association testing in small samples, based on selective DNA pooling data in the arctic fox.

    PubMed

    Szyda, Joanna; Liu, Zengting; Zatoń-Dobrowolska, Magdalena; Wierzbicki, Heliodor; Rzasa, Anna

    2008-01-01

    We analysed data from a selective DNA pooling experiment with 130 individuals of the arctic fox (Alopex lagopus), which originated from 2 different types regarding body size. The association between alleles of 6 selected unlinked molecular markers and body size was tested by using univariate and multinomial logistic regression models, applying odds ratio and test statistics from the power divergence family. Due to the small sample size and the resulting sparseness of the data table, in hypothesis testing we could not rely on the asymptotic distributions of the tests. Instead, we tried to account for data sparseness by (i) modifying confidence intervals of odds ratio; (ii) using a normal approximation of the asymptotic distribution of the power divergence tests with different approaches for calculating moments of the statistics; and (iii) assessing P values empirically, based on bootstrap samples. As a result, a significant association was observed for 3 markers. Furthermore, we used simulations to assess the validity of the normal approximation of the asymptotic distribution of the test statistics under the conditions of small and sparse samples.

  18. C%2B%2B tensor toolbox user manual.

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Plantenga, Todd D.; Kolda, Tamara Gibson

    2012-04-01

    The C++ Tensor Toolbox is a software package for computing tensor decompositions. It is based on the Matlab Tensor Toolbox, and is particularly optimized for sparse data sets. This user manual briefly overviews tensor decomposition mathematics, software capabilities, and installation of the package. Tensors (also known as multidimensional arrays or N-way arrays) are used in a variety of applications ranging from chemometrics to network analysis. The Tensor Toolbox provides classes for manipulating dense, sparse, and structured tensors in C++. The Toolbox compiles into libraries and is intended for use with custom applications written by users.

  19. Sparse image reconstruction for molecular imaging.

    PubMed

    Ting, Michael; Raich, Raviv; Hero, Alfred O

    2009-06-01

    The application that motivates this paper is molecular imaging at the atomic level. When discretized at subatomic distances, the volume is inherently sparse. Noiseless measurements from an imaging technology can be modeled by convolution of the image with the system point spread function (psf). Such is the case with magnetic resonance force microscopy (MRFM), an emerging technology where imaging of an individual tobacco mosaic virus was recently demonstrated with nanometer resolution. We also consider additive white Gaussian noise (AWGN) in the measurements. Many prior works of sparse estimators have focused on the case when H has low coherence; however, the system matrix H in our application is the convolution matrix for the system psf. A typical convolution matrix has high coherence. This paper, therefore, does not assume a low coherence H. A discrete-continuous form of the Laplacian and atom at zero (LAZE) p.d.f. used by Johnstone and Silverman is formulated, and two sparse estimators derived by maximizing the joint p.d.f. of the observation and image conditioned on the hyperparameters. A thresholding rule that generalizes the hard and soft thresholding rule appears in the course of the derivation. This so-called hybrid thresholding rule, when used in the iterative thresholding framework, gives rise to the hybrid estimator, a generalization of the lasso. Estimates of the hyperparameters for the lasso and hybrid estimator are obtained via Stein's unbiased risk estimate (SURE). A numerical study with a Gaussian psf and two sparse images shows that the hybrid estimator outperforms the lasso.

  20. EPS-LASSO: Test for High-Dimensional Regression Under Extreme Phenotype Sampling of Continuous Traits.

    PubMed

    Xu, Chao; Fang, Jian; Shen, Hui; Wang, Yu-Ping; Deng, Hong-Wen

    2018-01-25

    Extreme phenotype sampling (EPS) is a broadly-used design to identify candidate genetic factors contributing to the variation of quantitative traits. By enriching the signals in extreme phenotypic samples, EPS can boost the association power compared to random sampling. Most existing statistical methods for EPS examine the genetic factors individually, despite many quantitative traits have multiple genetic factors underlying their variation. It is desirable to model the joint effects of genetic factors, which may increase the power and identify novel quantitative trait loci under EPS. The joint analysis of genetic data in high-dimensional situations requires specialized techniques, e.g., the least absolute shrinkage and selection operator (LASSO). Although there are extensive research and application related to LASSO, the statistical inference and testing for the sparse model under EPS remain unknown. We propose a novel sparse model (EPS-LASSO) with hypothesis test for high-dimensional regression under EPS based on a decorrelated score function. The comprehensive simulation shows EPS-LASSO outperforms existing methods with stable type I error and FDR control. EPS-LASSO can provide a consistent power for both low- and high-dimensional situations compared with the other methods dealing with high-dimensional situations. The power of EPS-LASSO is close to other low-dimensional methods when the causal effect sizes are small and is superior when the effects are large. Applying EPS-LASSO to a transcriptome-wide gene expression study for obesity reveals 10 significant body mass index associated genes. Our results indicate that EPS-LASSO is an effective method for EPS data analysis, which can account for correlated predictors. The source code is available at https://github.com/xu1912/EPSLASSO. hdeng2@tulane.edu. Supplementary data are available at Bioinformatics online. © The Author (2018). Published by Oxford University Press. All rights reserved. For Permissions, please email: journals.permissions@oup.com

  1. BIRD: A general interface for sparse distributed memory simulators

    NASA Technical Reports Server (NTRS)

    Rogers, David

    1990-01-01

    Kanerva's sparse distributed memory (SDM) has now been implemented for at least six different computers, including SUN3 workstations, the Apple Macintosh, and the Connection Machine. A common interface for input of commands would both aid testing of programs on a broad range of computer architectures and assist users in transferring results from research environments to applications. A common interface also allows secondary programs to generate command sequences for a sparse distributed memory, which may then be executed on the appropriate hardware. The BIRD program is an attempt to create such an interface. Simplifying access to different simulators should assist developers in finding appropriate uses for SDM.

  2. A Study on the Potential Applications of Satellite Data in Air Quality Monitoring and Forecasting

    NASA Technical Reports Server (NTRS)

    Li, Can; Hsu, N. Christina; Tsay, Si-Chee

    2011-01-01

    In this study we explore the potential applications of MODIS (Moderate Resolution Imaging Spectroradiometer) -like satellite sensors in air quality research for some Asian regions. The MODIS aerosol optical thickness (AOT), NCEP global reanalysis meteorological data, and daily surface PM(sub 10) concentrations over China and Thailand from 2001 to 2009 were analyzed using simple and multiple regression models. The AOT-PM(sub 10) correlation demonstrates substantial seasonal and regional difference, likely reflecting variations in aerosol composition and atmospheric conditions, Meteorological factors, particularly relative humidity, were found to influence the AOT-PM(sub 10) relationship. Their inclusion in regression models leads to more accurate assessment of PM(sub 10) from space borne observations. We further introduced a simple method for employing the satellite data to empirically forecast surface particulate pollution, In general, AOT from the previous day (day 0) is used as a predicator variable, along with the forecasted meteorology for the following day (day 1), to predict the PM(sub 10) level for day 1. The contribution of regional transport is represented by backward trajectories combined with AOT. This method was evaluated through PM(sub 10) hindcasts for 2008-2009, using ohservations from 2005 to 2007 as a training data set to obtain model coefficients. For five big Chinese cities, over 50% of the hindcasts have percentage error less than or equal to 30%. Similar performance was achieved for cities in northern Thailand. The MODIS AOT data are responsible for at least part of the demonstrated forecasting skill. This method can be easily adapted for other regions, but is probably most useful for those having sparse ground monitoring networks or no access to sophisticated deterministic models. We also highlight several existing issues, including some inherent to a regression-based approach as exemplified by a case study for Beijing, Further studies will be necessa1Y before satellite data can see more extensive applications in the operational air quality monitoring and forecasting.

  3. Representation-Independent Iteration of Sparse Data Arrays

    NASA Technical Reports Server (NTRS)

    James, Mark

    2007-01-01

    An approach is defined that describes a method of iterating over massively large arrays containing sparse data using an approach that is implementation independent of how the contents of the sparse arrays are laid out in memory. What is unique and important here is the decoupling of the iteration over the sparse set of array elements from how they are internally represented in memory. This enables this approach to be backward compatible with existing schemes for representing sparse arrays as well as new approaches. What is novel here is a new approach for efficiently iterating over sparse arrays that is independent of the underlying memory layout representation of the array. A functional interface is defined for implementing sparse arrays in any modern programming language with a particular focus for the Chapel programming language. Examples are provided that show the translation of a loop that computes a matrix vector product into this representation for both the distributed and not-distributed cases. This work is directly applicable to NASA and its High Productivity Computing Systems (HPCS) program that JPL and our current program are engaged in. The goal of this program is to create powerful, scalable, and economically viable high-powered computer systems suitable for use in national security and industry by 2010. This is important to NASA for its computationally intensive requirements for analyzing and understanding the volumes of science data from our returned missions.

  4. JiTTree: A Just-in-Time Compiled Sparse GPU Volume Data Structure.

    PubMed

    Labschütz, Matthias; Bruckner, Stefan; Gröller, M Eduard; Hadwiger, Markus; Rautek, Peter

    2016-01-01

    Sparse volume data structures enable the efficient representation of large but sparse volumes in GPU memory for computation and visualization. However, the choice of a specific data structure for a given data set depends on several factors, such as the memory budget, the sparsity of the data, and data access patterns. In general, there is no single optimal sparse data structure, but a set of several candidates with individual strengths and drawbacks. One solution to this problem are hybrid data structures which locally adapt themselves to the sparsity. However, they typically suffer from increased traversal overhead which limits their utility in many applications. This paper presents JiTTree, a novel sparse hybrid volume data structure that uses just-in-time compilation to overcome these problems. By combining multiple sparse data structures and reducing traversal overhead we leverage their individual advantages. We demonstrate that hybrid data structures adapt well to a large range of data sets. They are especially superior to other sparse data structures for data sets that locally vary in sparsity. Possible optimization criteria are memory, performance and a combination thereof. Through just-in-time (JIT) compilation, JiTTree reduces the traversal overhead of the resulting optimal data structure. As a result, our hybrid volume data structure enables efficient computations on the GPU, while being superior in terms of memory usage when compared to non-hybrid data structures.

  5. Influence plots for LASSO

    DOE PAGES

    Jang, Dae -Heung; Anderson-Cook, Christine Michaela

    2016-11-22

    With many predictors in regression, fitting the full model can induce multicollinearity problems. Least Absolute Shrinkage and Selection Operation (LASSO) is useful when the effects of many explanatory variables are sparse in a high-dimensional dataset. Influential points can have a disproportionate impact on the estimated values of model parameters. Here, this paper describes a new influence plot that can be used to increase understanding of the contributions of individual observations and the robustness of results. This can serve as a complement to other regression diagnostics techniques in the LASSO regression setting. Using this influence plot, we can find influential pointsmore » and their impact on shrinkage of model parameters and model selection. Lastly, we provide two examples to illustrate the methods.« less

  6. SMURC: High-Dimension Small-Sample Multivariate Regression With Covariance Estimation.

    PubMed

    Bayar, Belhassen; Bouaynaya, Nidhal; Shterenberg, Roman

    2017-03-01

    We consider a high-dimension low sample-size multivariate regression problem that accounts for correlation of the response variables. The system is underdetermined as there are more parameters than samples. We show that the maximum likelihood approach with covariance estimation is senseless because the likelihood diverges. We subsequently propose a normalization of the likelihood function that guarantees convergence. We call this method small-sample multivariate regression with covariance (SMURC) estimation. We derive an optimization problem and its convex approximation to compute SMURC. Simulation results show that the proposed algorithm outperforms the regularized likelihood estimator with known covariance matrix and the sparse conditional Gaussian graphical model. We also apply SMURC to the inference of the wing-muscle gene network of the Drosophila melanogaster (fruit fly).

  7. Influence plots for LASSO

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Jang, Dae -Heung; Anderson-Cook, Christine Michaela

    With many predictors in regression, fitting the full model can induce multicollinearity problems. Least Absolute Shrinkage and Selection Operation (LASSO) is useful when the effects of many explanatory variables are sparse in a high-dimensional dataset. Influential points can have a disproportionate impact on the estimated values of model parameters. Here, this paper describes a new influence plot that can be used to increase understanding of the contributions of individual observations and the robustness of results. This can serve as a complement to other regression diagnostics techniques in the LASSO regression setting. Using this influence plot, we can find influential pointsmore » and their impact on shrinkage of model parameters and model selection. Lastly, we provide two examples to illustrate the methods.« less

  8. Improved FastICA algorithm in fMRI data analysis using the sparsity property of the sources.

    PubMed

    Ge, Ruiyang; Wang, Yubao; Zhang, Jipeng; Yao, Li; Zhang, Hang; Long, Zhiying

    2016-04-01

    As a blind source separation technique, independent component analysis (ICA) has many applications in functional magnetic resonance imaging (fMRI). Although either temporal or spatial prior information has been introduced into the constrained ICA and semi-blind ICA methods to improve the performance of ICA in fMRI data analysis, certain types of additional prior information, such as the sparsity, has seldom been added to the ICA algorithms as constraints. In this study, we proposed a SparseFastICA method by adding the source sparsity as a constraint to the FastICA algorithm to improve the performance of the widely used FastICA. The source sparsity is estimated through a smoothed ℓ0 norm method. We performed experimental tests on both simulated data and real fMRI data to investigate the feasibility and robustness of SparseFastICA and made a performance comparison between SparseFastICA, FastICA and Infomax ICA. Results of the simulated and real fMRI data demonstrated the feasibility and robustness of SparseFastICA for the source separation in fMRI data. Both the simulated and real fMRI experimental results showed that SparseFastICA has better robustness to noise and better spatial detection power than FastICA. Although the spatial detection power of SparseFastICA and Infomax did not show significant difference, SparseFastICA had faster computation speed than Infomax. SparseFastICA was comparable to the Infomax algorithm with a faster computation speed. More importantly, SparseFastICA outperformed FastICA in robustness and spatial detection power and can be used to identify more accurate brain networks than FastICA algorithm. Copyright © 2016 Elsevier B.V. All rights reserved.

  9. Accelerating Convolutional Sparse Coding for Curvilinear Structures Segmentation by Refining SCIRD-TS Filter Banks.

    PubMed

    Annunziata, Roberto; Trucco, Emanuele

    2016-11-01

    Deep learning has shown great potential for curvilinear structure (e.g., retinal blood vessels and neurites) segmentation as demonstrated by a recent auto-context regression architecture based on filter banks learned by convolutional sparse coding. However, learning such filter banks is very time-consuming, thus limiting the amount of filters employed and the adaptation to other data sets (i.e., slow re-training). We address this limitation by proposing a novel acceleration strategy to speed-up convolutional sparse coding filter learning for curvilinear structure segmentation. Our approach is based on a novel initialisation strategy (warm start), and therefore it is different from recent methods improving the optimisation itself. Our warm-start strategy is based on carefully designed hand-crafted filters (SCIRD-TS), modelling appearance properties of curvilinear structures which are then refined by convolutional sparse coding. Experiments on four diverse data sets, including retinal blood vessels and neurites, suggest that the proposed method reduces significantly the time taken to learn convolutional filter banks (i.e., up to -82%) compared to conventional initialisation strategies. Remarkably, this speed-up does not worsen performance; in fact, filters learned with the proposed strategy often achieve a much lower reconstruction error and match or exceed the segmentation performance of random and DCT-based initialisation, when used as input to a random forest classifier.

  10. Sparse and stable Markowitz portfolios.

    PubMed

    Brodie, Joshua; Daubechies, Ingrid; De Mol, Christine; Giannone, Domenico; Loris, Ignace

    2009-07-28

    We consider the problem of portfolio selection within the classical Markowitz mean-variance framework, reformulated as a constrained least-squares regression problem. We propose to add to the objective function a penalty proportional to the sum of the absolute values of the portfolio weights. This penalty regularizes (stabilizes) the optimization problem, encourages sparse portfolios (i.e., portfolios with only few active positions), and allows accounting for transaction costs. Our approach recovers as special cases the no-short-positions portfolios, but does allow for short positions in limited number. We implement this methodology on two benchmark data sets constructed by Fama and French. Using only a modest amount of training data, we construct portfolios whose out-of-sample performance, as measured by Sharpe ratio, is consistently and significantly better than that of the naïve evenly weighted portfolio.

  11. Selection of polynomial chaos bases via Bayesian model uncertainty methods with applications to sparse approximation of PDEs with stochastic inputs

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Karagiannis, Georgios, E-mail: georgios.karagiannis@pnnl.gov; Lin, Guang, E-mail: guang.lin@pnnl.gov

    2014-02-15

    Generalized polynomial chaos (gPC) expansions allow us to represent the solution of a stochastic system using a series of polynomial chaos basis functions. The number of gPC terms increases dramatically as the dimension of the random input variables increases. When the number of the gPC terms is larger than that of the available samples, a scenario that often occurs when the corresponding deterministic solver is computationally expensive, evaluation of the gPC expansion can be inaccurate due to over-fitting. We propose a fully Bayesian approach that allows for global recovery of the stochastic solutions, in both spatial and random domains, bymore » coupling Bayesian model uncertainty and regularization regression methods. It allows the evaluation of the PC coefficients on a grid of spatial points, via (1) the Bayesian model average (BMA) or (2) the median probability model, and their construction as spatial functions on the spatial domain via spline interpolation. The former accounts for the model uncertainty and provides Bayes-optimal predictions; while the latter provides a sparse representation of the stochastic solutions by evaluating the expansion on a subset of dominating gPC bases. Moreover, the proposed methods quantify the importance of the gPC bases in the probabilistic sense through inclusion probabilities. We design a Markov chain Monte Carlo (MCMC) sampler that evaluates all the unknown quantities without the need of ad-hoc techniques. The proposed methods are suitable for, but not restricted to, problems whose stochastic solutions are sparse in the stochastic space with respect to the gPC bases while the deterministic solver involved is expensive. We demonstrate the accuracy and performance of the proposed methods and make comparisons with other approaches on solving elliptic SPDEs with 1-, 14- and 40-random dimensions.« less

  12. Kurtosis: A Forgotten Moment

    ERIC Educational Resources Information Center

    McAlevey, Lynn G.; Stent, Alan F.

    2018-01-01

    The treatment of kurtosis in textbooks is both sparse and contradictory with applications rarely discussed. To address this, an easily understood definition of kurtosis is introduced and important applications are demonstrated. Two different approaches to teaching kurtosis are presented based on a financial application.

  13. Multi-threaded Sparse Matrix-Matrix Multiplication for Many-Core and GPU Architectures.

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Deveci, Mehmet; Rajamanickam, Sivasankaran; Trott, Christian Robert

    Sparse Matrix-Matrix multiplication is a key kernel that has applications in several domains such as scienti c computing and graph analysis. Several algorithms have been studied in the past for this foundational kernel. In this paper, we develop parallel algorithms for sparse matrix-matrix multiplication with a focus on performance portability across different high performance computing architectures. The performance of these algorithms depend on the data structures used in them. We compare different types of accumulators in these algorithms and demonstrate the performance difference between these data structures. Furthermore, we develop a meta-algorithm, kkSpGEMM, to choose the right algorithm and datamore » structure based on the characteristics of the problem. We show performance comparisons on three architectures and demonstrate the need for the community to develop two phase sparse matrix-matrix multiplication implementations for efficient reuse of the data structures involved.« less

  14. Locating multiple diffusion sources in time varying networks from sparse observations.

    PubMed

    Hu, Zhao-Long; Shen, Zhesi; Cao, Shinan; Podobnik, Boris; Yang, Huijie; Wang, Wen-Xu; Lai, Ying-Cheng

    2018-02-08

    Data based source localization in complex networks has a broad range of applications. Despite recent progress, locating multiple diffusion sources in time varying networks remains to be an outstanding problem. Bridging structural observability and sparse signal reconstruction theories, we develop a general framework to locate diffusion sources in time varying networks based solely on sparse data from a small set of messenger nodes. A general finding is that large degree nodes produce more valuable information than small degree nodes, a result that contrasts that for static networks. Choosing large degree nodes as the messengers, we find that sparse observations from a few such nodes are often sufficient for any number of diffusion sources to be located for a variety of model and empirical networks. Counterintuitively, sources in more rapidly varying networks can be identified more readily with fewer required messenger nodes.

  15. Embedded sparse representation of fMRI data via group-wise dictionary optimization

    NASA Astrophysics Data System (ADS)

    Zhu, Dajiang; Lin, Binbin; Faskowitz, Joshua; Ye, Jieping; Thompson, Paul M.

    2016-03-01

    Sparse learning enables dimension reduction and efficient modeling of high dimensional signals and images, but it may need to be tailored to best suit specific applications and datasets. Here we used sparse learning to efficiently represent functional magnetic resonance imaging (fMRI) data from the human brain. We propose a novel embedded sparse representation (ESR), to identify the most consistent dictionary atoms across different brain datasets via an iterative group-wise dictionary optimization procedure. In this framework, we introduced additional criteria to make the learned dictionary atoms more consistent across different subjects. We successfully identified four common dictionary atoms that follow the external task stimuli with very high accuracy. After projecting the corresponding coefficient vectors back into the 3-D brain volume space, the spatial patterns are also consistent with traditional fMRI analysis results. Our framework reveals common features of brain activation in a population, as a new, efficient fMRI analysis method.

  16. Smooth Approximation l 0-Norm Constrained Affine Projection Algorithm and Its Applications in Sparse Channel Estimation

    PubMed Central

    2014-01-01

    We propose a smooth approximation l 0-norm constrained affine projection algorithm (SL0-APA) to improve the convergence speed and the steady-state error of affine projection algorithm (APA) for sparse channel estimation. The proposed algorithm ensures improved performance in terms of the convergence speed and the steady-state error via the combination of a smooth approximation l 0-norm (SL0) penalty on the coefficients into the standard APA cost function, which gives rise to a zero attractor that promotes the sparsity of the channel taps in the channel estimation and hence accelerates the convergence speed and reduces the steady-state error when the channel is sparse. The simulation results demonstrate that our proposed SL0-APA is superior to the standard APA and its sparsity-aware algorithms in terms of both the convergence speed and the steady-state behavior in a designated sparse channel. Furthermore, SL0-APA is shown to have smaller steady-state error than the previously proposed sparsity-aware algorithms when the number of nonzero taps in the sparse channel increases. PMID:24790588

  17. Prediction-Oriented Marker Selection (PROMISE): With Application to High-Dimensional Regression.

    PubMed

    Kim, Soyeon; Baladandayuthapani, Veerabhadran; Lee, J Jack

    2017-06-01

    In personalized medicine, biomarkers are used to select therapies with the highest likelihood of success based on an individual patient's biomarker/genomic profile. Two goals are to choose important biomarkers that accurately predict treatment outcomes and to cull unimportant biomarkers to reduce the cost of biological and clinical verifications. These goals are challenging due to the high dimensionality of genomic data. Variable selection methods based on penalized regression (e.g., the lasso and elastic net) have yielded promising results. However, selecting the right amount of penalization is critical to simultaneously achieving these two goals. Standard approaches based on cross-validation (CV) typically provide high prediction accuracy with high true positive rates but at the cost of too many false positives. Alternatively, stability selection (SS) controls the number of false positives, but at the cost of yielding too few true positives. To circumvent these issues, we propose prediction-oriented marker selection (PROMISE), which combines SS with CV to conflate the advantages of both methods. Our application of PROMISE with the lasso and elastic net in data analysis shows that, compared to CV, PROMISE produces sparse solutions, few false positives, and small type I + type II error, and maintains good prediction accuracy, with a marginal decrease in the true positive rates. Compared to SS, PROMISE offers better prediction accuracy and true positive rates. In summary, PROMISE can be applied in many fields to select regularization parameters when the goals are to minimize false positives and maximize prediction accuracy.

  18. Multi-View Multi-Instance Learning Based on Joint Sparse Representation and Multi-View Dictionary Learning.

    PubMed

    Li, Bing; Yuan, Chunfeng; Xiong, Weihua; Hu, Weiming; Peng, Houwen; Ding, Xinmiao; Maybank, Steve

    2017-12-01

    In multi-instance learning (MIL), the relations among instances in a bag convey important contextual information in many applications. Previous studies on MIL either ignore such relations or simply model them with a fixed graph structure so that the overall performance inevitably degrades in complex environments. To address this problem, this paper proposes a novel multi-view multi-instance learning algorithm (MIL) that combines multiple context structures in a bag into a unified framework. The novel aspects are: (i) we propose a sparse -graph model that can generate different graphs with different parameters to represent various context relations in a bag, (ii) we propose a multi-view joint sparse representation that integrates these graphs into a unified framework for bag classification, and (iii) we propose a multi-view dictionary learning algorithm to obtain a multi-view graph dictionary that considers cues from all views simultaneously to improve the discrimination of the MIL. Experiments and analyses in many practical applications prove the effectiveness of the M IL.

  19. Robust representation and recognition of facial emotions using extreme sparse learning.

    PubMed

    Shojaeilangari, Seyedehsamaneh; Yau, Wei-Yun; Nandakumar, Karthik; Li, Jun; Teoh, Eam Khwang

    2015-07-01

    Recognition of natural emotions from human faces is an interesting topic with a wide range of potential applications, such as human-computer interaction, automated tutoring systems, image and video retrieval, smart environments, and driver warning systems. Traditionally, facial emotion recognition systems have been evaluated on laboratory controlled data, which is not representative of the environment faced in real-world applications. To robustly recognize the facial emotions in real-world natural situations, this paper proposes an approach called extreme sparse learning, which has the ability to jointly learn a dictionary (set of basis) and a nonlinear classification model. The proposed approach combines the discriminative power of extreme learning machine with the reconstruction property of sparse representation to enable accurate classification when presented with noisy signals and imperfect data recorded in natural settings. In addition, this paper presents a new local spatio-temporal descriptor that is distinctive and pose-invariant. The proposed framework is able to achieve the state-of-the-art recognition accuracy on both acted and spontaneous facial emotion databases.

  20. Online sparse Gaussian process based human motion intent learning for an electrically actuated lower extremity exoskeleton.

    PubMed

    Long, Yi; Du, Zhi-Jiang; Chen, Chao-Feng; Dong, Wei; Wang, Wei-Dong

    2017-07-01

    The most important step for lower extremity exoskeleton is to infer human motion intent (HMI), which contributes to achieve human exoskeleton collaboration. Since the user is in the control loop, the relationship between human robot interaction (HRI) information and HMI is nonlinear and complicated, which is difficult to be modeled by using mathematical approaches. The nonlinear approximation can be learned by using machine learning approaches. Gaussian Process (GP) regression is suitable for high-dimensional and small-sample nonlinear regression problems. GP regression is restrictive for large data sets due to its computation complexity. In this paper, an online sparse GP algorithm is constructed to learn the HMI. The original training dataset is collected when the user wears the exoskeleton system with friction compensation to perform unconstrained movement as far as possible. The dataset has two kinds of data, i.e., (1) physical HRI, which is collected by torque sensors placed at the interaction cuffs for the active joints, i.e., knee joints; (2) joint angular position, which is measured by optical position sensors. To reduce the computation complexity of GP, grey relational analysis (GRA) is utilized to specify the original dataset and provide the final training dataset. Those hyper-parameters are optimized offline by maximizing marginal likelihood and will be applied into online GP regression algorithm. The HMI, i.e., angular position of human joints, will be regarded as the reference trajectory for the mechanical legs. To verify the effectiveness of the proposed algorithm, experiments are performed on a subject at a natural speed. The experimental results show the HMI can be obtained in real time, which can be extended and employed in the similar exoskeleton systems.

  1. Intelligent condition monitoring method for bearing faults from highly compressed measurements using sparse over-complete features

    NASA Astrophysics Data System (ADS)

    Ahmed, H. O. A.; Wong, M. L. D.; Nandi, A. K.

    2018-01-01

    Condition classification of rolling element bearings in rotating machines is important to prevent the breakdown of industrial machinery. A considerable amount of literature has been published on bearing faults classification. These studies aim to determine automatically the current status of a roller element bearing. Of these studies, methods based on compressed sensing (CS) have received some attention recently due to their ability to allow one to sample below the Nyquist sampling rate. This technology has many possible uses in machine condition monitoring and has been investigated as a possible approach for fault detection and classification in the compressed domain, i.e., without reconstructing the original signal. However, previous CS based methods have been found to be too weak for highly compressed data. The present paper explores computationally, for the first time, the effects of sparse autoencoder based over-complete sparse representations on the classification performance of highly compressed measurements of bearing vibration signals. For this study, the CS method was used to produce highly compressed measurements of the original bearing dataset. Then, an effective deep neural network (DNN) with unsupervised feature learning algorithm based on sparse autoencoder is used for learning over-complete sparse representations of these compressed datasets. Finally, the fault classification is achieved using two stages, namely, pre-training classification based on stacked autoencoder and softmax regression layer form the deep net stage (the first stage), and re-training classification based on backpropagation (BP) algorithm forms the fine-tuning stage (the second stage). The experimental results show that the proposed method is able to achieve high levels of accuracy even with extremely compressed measurements compared with the existing techniques.

  2. Statistical inference methods for sparse biological time series data.

    PubMed

    Ndukum, Juliet; Fonseca, Luís L; Santos, Helena; Voit, Eberhard O; Datta, Susmita

    2011-04-25

    Comparing metabolic profiles under different biological perturbations has become a powerful approach to investigating the functioning of cells. The profiles can be taken as single snapshots of a system, but more information is gained if they are measured longitudinally over time. The results are short time series consisting of relatively sparse data that cannot be analyzed effectively with standard time series techniques, such as autocorrelation and frequency domain methods. In this work, we study longitudinal time series profiles of glucose consumption in the yeast Saccharomyces cerevisiae under different temperatures and preconditioning regimens, which we obtained with methods of in vivo nuclear magnetic resonance (NMR) spectroscopy. For the statistical analysis we first fit several nonlinear mixed effect regression models to the longitudinal profiles and then used an ANOVA likelihood ratio method in order to test for significant differences between the profiles. The proposed methods are capable of distinguishing metabolic time trends resulting from different treatments and associate significance levels to these differences. Among several nonlinear mixed-effects regression models tested, a three-parameter logistic function represents the data with highest accuracy. ANOVA and likelihood ratio tests suggest that there are significant differences between the glucose consumption rate profiles for cells that had been--or had not been--preconditioned by heat during growth. Furthermore, pair-wise t-tests reveal significant differences in the longitudinal profiles for glucose consumption rates between optimal conditions and heat stress, optimal and recovery conditions, and heat stress and recovery conditions (p-values <0.0001). We have developed a nonlinear mixed effects model that is appropriate for the analysis of sparse metabolic and physiological time profiles. The model permits sound statistical inference procedures, based on ANOVA likelihood ratio tests, for testing the significance of differences between short time course data under different biological perturbations.

  3. Inverse problems with nonnegative and sparse solutions: algorithms and application to the phase retrieval problem

    NASA Astrophysics Data System (ADS)

    Quy Muoi, Pham; Nho Hào, Dinh; Sahoo, Sujit Kumar; Tang, Dongliang; Cong, Nguyen Huu; Dang, Cuong

    2018-05-01

    In this paper, we study a gradient-type method and a semismooth Newton method for minimization problems in regularizing inverse problems with nonnegative and sparse solutions. We propose a special penalty functional forcing the minimizers of regularized minimization problems to be nonnegative and sparse, and then we apply the proposed algorithms in a practical the problem. The strong convergence of the gradient-type method and the local superlinear convergence of the semismooth Newton method are proven. Then, we use these algorithms for the phase retrieval problem and illustrate their efficiency in numerical examples, particularly in the practical problem of optical imaging through scattering media where all the noises from experiment are presented.

  4. A robust real-time surface reconstruction method on point clouds captured from a 3D surface photogrammetry system

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Liu, Wenyang; Cheung, Yam; Sawant, Amit

    2016-05-15

    Purpose: To develop a robust and real-time surface reconstruction method on point clouds captured from a 3D surface photogrammetry system. Methods: The authors have developed a robust and fast surface reconstruction method on point clouds acquired by the photogrammetry system, without explicitly solving the partial differential equation required by a typical variational approach. Taking advantage of the overcomplete nature of the acquired point clouds, their method solves and propagates a sparse linear relationship from the point cloud manifold to the surface manifold, assuming both manifolds share similar local geometry. With relatively consistent point cloud acquisitions, the authors propose a sparsemore » regression (SR) model to directly approximate the target point cloud as a sparse linear combination from the training set, assuming that the point correspondences built by the iterative closest point (ICP) is reasonably accurate and have residual errors following a Gaussian distribution. To accommodate changing noise levels and/or presence of inconsistent occlusions during the acquisition, the authors further propose a modified sparse regression (MSR) model to model the potentially large and sparse error built by ICP with a Laplacian prior. The authors evaluated the proposed method on both clinical point clouds acquired under consistent acquisition conditions and on point clouds with inconsistent occlusions. The authors quantitatively evaluated the reconstruction performance with respect to root-mean-squared-error, by comparing its reconstruction results against that from the variational method. Results: On clinical point clouds, both the SR and MSR models have achieved sub-millimeter reconstruction accuracy and reduced the reconstruction time by two orders of magnitude to a subsecond reconstruction time. On point clouds with inconsistent occlusions, the MSR model has demonstrated its advantage in achieving consistent and robust performance despite the introduced occlusions. Conclusions: The authors have developed a fast and robust surface reconstruction method on point clouds captured from a 3D surface photogrammetry system, with demonstrated sub-millimeter reconstruction accuracy and subsecond reconstruction time. It is suitable for real-time motion tracking in radiotherapy, with clear surface structures for better quantifications.« less

  5. Association of childhood abuse with homeless women's social networks.

    PubMed

    Green, Harold D; Tucker, Joan S; Wenzel, Suzanne L; Golinelli, Daniela; Kennedy, David P; Ryan, Gery W; Zhou, Annie J

    2012-01-01

    Childhood abuse has been linked to negative sequelae for women later in life including drug and alcohol use and violence as victim or perpetrator and may also affect the development of women's social networks. Childhood abuse is prevalent among at-risk populations of women (such as the homeless) and thus may have a stronger impact on their social networks. We conducted a study to: (a) develop a typology of sheltered homeless women's social networks; (b) determine whether childhood abuse was associated with the social networks of sheltered homeless women; and (c) determine whether those associations remained after accounting for past-year substance abuse and recent intimate partner abuse. A probability sample of 428 homeless women from temporary shelter settings in Los Angeles County completed a personal network survey that provided respondent information as well as information about their network members' demographics and level of interaction with each other. Cluster analyses identified groups of women who shared specific social network characteristics. Multinomial logistic regressions revealed variables associated with group membership. We identified three groups of women with differing social network characteristics: low-risk networks, densely connected risky networks (dense, risky), and sparsely connected risky networks (sparse, risky). Multinomial logistic regressions indicated that membership in the sparse, risky network group, when compared to the low-risk group, was associated with history of childhood physical abuse (but not sexual or emotional abuse). Recent drug abuse was associated with membership in both risky network groups; however, the association of childhood physical abuse with sparse, risky network group membership remained. Although these findings support theories proposing that the experience of childhood abuse can shape women's social networks, they suggest that it may be childhood physical abuse that has the most impact among homeless women. The effects of childhood physical abuse should be more actively investigated in clinical settings, especially those frequented by homeless women, particularly with respect to the formation of social networks in social contexts that may expose these women to greater risks. Copyright © 2012. Published by Elsevier Ltd.

  6. Application of Sparse Linear Discriminant Analysis and Elastic Net for Diagnosis of IgA Nephropathy: Statistical and Biological Viewpoints

    PubMed

    Mohammadi Majd, Tahereh; Kalantari, Shiva; Raeisi Shahraki, Hadi; Nafar, Mohsen; Almasi, Afshin; Samavat, Shiva; Parvin, Mahmoud; Hashemian, Amirhossein

    2018-03-10

    IgA nephropathy (IgAN) is the most common primary glomerulonephritis diagnosed based on renal biopsy. Mesangial IgA deposits along with the proliferation of mesangial cells are the histologic hallmark of IgAN. Non-invasive diagnostic tools may help to prompt diagnosis and therapy. The discovery of potential and reliable urinary biomarkers for diagnosis of IgAN depends on applying robust and suitable models. Applying two multivariate modeling methods on a urine proteomic dataset obtained from IgAN patients, and comparison of the results of these methods were the purpose of this study. Two models were constructed for urinary protein profiles of 13 patients and 8 healthy individuals, based on sparse linear discriminant analysis (SLDA) and elastic net regression methods. A panel of selected biomarkers with the best coefficients were proposed and further analyzed for biological relevance using functional annotation and pathway analysis. Transferrin, α1-antitrypsin, and albumin fragments were the most important up-regulated biomarkers, while fibulin-5, YIP1 family member 3, prasoposin, and osteopontin were the most important down-regulated biomarkers. Pathway analysis revealed that complement and coagulation cascades and extracellular matrix-receptor interaction pathways impaired in the pathogenesis of IgAN. SLDA and elastic net had an equal importance for diagnosis of IgAN and were useful methods for exploring and processing proteomic data. In addition, the suggested biomarkers are reliable candidates for further validation to non-invasive diagnose of IgAN based on urine examination.

  7. Category-Specific Comparison of Univariate Alerting Methods for Biosurveillance Decision Support

    PubMed Central

    Elbert, Yevgeniy; Hung, Vivian; Burkom, Howard

    2013-01-01

    Objective For a multi-source decision support application, we sought to match univariate alerting algorithms to surveillance data types to optimize detection performance. Introduction Temporal alerting algorithms commonly used in syndromic surveillance systems are often adjusted for data features such as cyclic behavior but are subject to overfitting or misspecification errors when applied indiscriminately. In a project for the Armed Forces Health Surveillance Center to enable multivariate decision support, we obtained 4.5 years of out-patient, prescription and laboratory test records from all US military treatment facilities. A proof-of-concept project phase produced 16 events with multiple evidence corroboration for comparison of alerting algorithms for detection performance. We used the representative streams from each data source to compare sensitivity of 6 algorithms to injected spikes, and we used all data streams from 16 known events to compare them for detection timeliness. Methods The six methods compared were: Holt-Winters generalized exponential smoothing method (1)automated choice between daily methods, regression and an exponential weighted moving average (2)adaptive daily Shewhart-type chartadaptive one-sided daily CUSUMEWMA applied to 7-day means with a trend correction; and7-day temporal scan statistic Sensitivity testing: We conducted comparative sensitivity testing for categories of time series with similar scales and seasonal behavior. We added multiples of the standard deviation of each time series as single-day injects in separate algorithm runs. For each candidate method, we then used as a sensitivity measure the proportion of these runs for which the output of each algorithm was below alerting thresholds estimated empirically for each algorithm using simulated data streams. We identified the algorithm(s) whose sensitivity was most consistently high for each data category. For each syndromic query applied to each data source (outpatient, lab test orders, and prescriptions), 502 authentic time series were derived, one for each reporting treatment facility. Data categories were selected in order to group time series with similar expected algorithm performance: Median > 100 < Median ≤ 10Median = 0Lag 7 Autocorrelation Coefficient ≥ 0.2Lag 7 Autocorrelation Coefficient < 0.2 Timeliness testing: For the timeliness testing, we avoided artificiality of simulated signals by measuring alerting detection delays in the 16 corroborated outbreaks. The multiple time series from these events gave a total of 141 time series with outbreak intervals for timeliness testing. The following measures were computed to quantify timeliness of detection: Median Detection Delay – median number of days to detect the outbreak.Penalized Mean Detection Delay –mean number of days to detect the outbreak with outbreak misses penalized as 1 day plus the maximum detection time. Results Based on the injection results, the Holt-Winters algorithm was most sensitive among time series with positive medians. The adaptive CUSUM and the Shewhart methods were most sensitive for data streams with median zero. Table 1 provides timeliness results using the 141 outbreak-associated streams on sparse (Median=0) and non-sparse data categories. [Insert table #1 here] Data median Detection Delay, days Holt-winters Regression EWMA Adaptive Shewhart Adaptive CUSUM 7-day Trend-adj. EWMA 7-day Temporal Scan Median 0 Median 3 2 4 2 4.5 2 Penalized Mean 7.2 7 6.6 6.2 7.3 7.6 Median >0 Median 2 2 2.5 2 6 4 Penalized Mean 6.1 7 7.2 7.1 7.7 6.6 The gray shading in the table 1 indicates methods with shortest detection delays for sparse and non-sparse data streams. The Holt-Winters method was again superior for non-sparse data. For data with median=0, the adaptive CUSUM was superior for a daily false alarm probability of 0.01, but the Shewhart method was timelier for more liberal thresholds. Conclusions Both kinds of detection performance analysis showed the method based on Holt-Winters exponential smoothing superior on non-sparse time series with day-of-week effects. The adaptive CUSUM and She-whart methods proved optimal on sparse data and data without weekly patterns.

  8. Sparse regularization for force identification using dictionaries

    NASA Astrophysics Data System (ADS)

    Qiao, Baijie; Zhang, Xingwu; Wang, Chenxi; Zhang, Hang; Chen, Xuefeng

    2016-04-01

    The classical function expansion method based on minimizing l2-norm of the response residual employs various basis functions to represent the unknown force. Its difficulty lies in determining the optimum number of basis functions. Considering the sparsity of force in the time domain or in other basis space, we develop a general sparse regularization method based on minimizing l1-norm of the coefficient vector of basis functions. The number of basis functions is adaptively determined by minimizing the number of nonzero components in the coefficient vector during the sparse regularization process. First, according to the profile of the unknown force, the dictionary composed of basis functions is determined. Second, a sparsity convex optimization model for force identification is constructed. Third, given the transfer function and the operational response, Sparse reconstruction by separable approximation (SpaRSA) is developed to solve the sparse regularization problem of force identification. Finally, experiments including identification of impact and harmonic forces are conducted on a cantilever thin plate structure to illustrate the effectiveness and applicability of SpaRSA. Besides the Dirac dictionary, other three sparse dictionaries including Db6 wavelets, Sym4 wavelets and cubic B-spline functions can also accurately identify both the single and double impact forces from highly noisy responses in a sparse representation frame. The discrete cosine functions can also successfully reconstruct the harmonic forces including the sinusoidal, square and triangular forces. Conversely, the traditional Tikhonov regularization method with the L-curve criterion fails to identify both the impact and harmonic forces in these cases.

  9. Partitioning Rectangular and Structurally Nonsymmetric Sparse Matrices for Parallel Processing

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    B. Hendrickson; T.G. Kolda

    1998-09-01

    A common operation in scientific computing is the multiplication of a sparse, rectangular or structurally nonsymmetric matrix and a vector. In many applications the matrix- transpose-vector product is also required. This paper addresses the efficient parallelization of these operations. We show that the problem can be expressed in terms of partitioning bipartite graphs. We then introduce several algorithms for this partitioning problem and compare their performance on a set of test matrices.

  10. Interaction Models for Functional Regression.

    PubMed

    Usset, Joseph; Staicu, Ana-Maria; Maity, Arnab

    2016-02-01

    A functional regression model with a scalar response and multiple functional predictors is proposed that accommodates two-way interactions in addition to their main effects. The proposed estimation procedure models the main effects using penalized regression splines, and the interaction effect by a tensor product basis. Extensions to generalized linear models and data observed on sparse grids or with measurement error are presented. A hypothesis testing procedure for the functional interaction effect is described. The proposed method can be easily implemented through existing software. Numerical studies show that fitting an additive model in the presence of interaction leads to both poor estimation performance and lost prediction power, while fitting an interaction model where there is in fact no interaction leads to negligible losses. The methodology is illustrated on the AneuRisk65 study data.

  11. Evaluation of fast highly undersampled contrast-enhanced MR angiography (sparse CE-MRA) in intracranial applications - initial study.

    PubMed

    Gratz, Marcel; Schlamann, Marc; Goericke, Sophia; Maderwald, Stefan; Quick, Harald H

    2017-03-01

    To assess the image quality of sparsely sampled contrast-enhanced MR angiography (sparse CE-MRA) providing high spatial resolution and whole-head coverage. Twenty-three patients scheduled for contrast-enhanced MR imaging of the head, (N = 19 with intracranial pathologies, N = 9 with vascular diseases), were included. Sparse CE-MRA at 3 Tesla was conducted using a single dose of contrast agent. Two neuroradiologists independently evaluated the data regarding vascular visibility and diagnostic value of overall 24 parameters and vascular segments on a 5-point ordinary scale (5 = very good, 1 = insufficient vascular visibility). Contrast bolus timing and the resulting arterio-venous overlap was also evaluated. Where available (N = 9), sparse CE-MRA was compared to intracranial Time-of-Flight MRA. The overall rating across all patients for sparse CE-MRA was 3.50 ± 1.07. Direct influence of the contrast bolus timing on the resulting image quality was observed. Overall mean vascular visibility and image quality across different features was rated good to intermediate (3.56 ± 0.95). The average performance of intracranial Time-of-Flight was rated 3.84 ± 0.87 across all patients and 3.54 ± 0.62 across all features. Sparse CE-MRA provides high-quality 3D MRA with high spatial resolution and whole-head coverage within short acquisition time. Accurate contrast bolus timing is mandatory. • Sparse CE-MRA enables fast vascular imaging with full brain coverage. • Volumes with sub-millimetre resolution can be acquired within 10 seconds. • Reader's ratings are good to intermediate and dependent on contrast bolus timing. • The method provides an excellent overview and allows screening for vascular pathologies.

  12. Sparse and stable Markowitz portfolios

    PubMed Central

    Brodie, Joshua; Daubechies, Ingrid; De Mol, Christine; Giannone, Domenico; Loris, Ignace

    2009-01-01

    We consider the problem of portfolio selection within the classical Markowitz mean-variance framework, reformulated as a constrained least-squares regression problem. We propose to add to the objective function a penalty proportional to the sum of the absolute values of the portfolio weights. This penalty regularizes (stabilizes) the optimization problem, encourages sparse portfolios (i.e., portfolios with only few active positions), and allows accounting for transaction costs. Our approach recovers as special cases the no-short-positions portfolios, but does allow for short positions in limited number. We implement this methodology on two benchmark data sets constructed by Fama and French. Using only a modest amount of training data, we construct portfolios whose out-of-sample performance, as measured by Sharpe ratio, is consistently and significantly better than that of the naïve evenly weighted portfolio. PMID:19617537

  13. Meteorological influence on predicting surface SO2 concentration from satellite remote sensing in Shanghai, China.

    PubMed

    Xue, Dan; Yin, Jingyuan

    2014-05-01

    In this study, we explored the potential applications of the Ozone Monitoring Instrument (OMI) satellite sensor in air pollution research. The OMI planetary boundary layer sulfur dioxide (SO2_PBL) column density and daily average surface SO2 concentration of Shanghai from 2004 to 2012 were analyzed. After several consecutive years of increase, the surface SO2 concentration finally declined in 2007. It was higher in winter than in other seasons. The coefficient between daily average surface SO2 concentration and SO2_PBL was only 0.316. But SO2_PBL was found to be a highly significant predictor of the surface SO2 concentration using the simple regression model. Five meteorological factors were considered in this study, among them, temperature, dew point, relative humidity, and wind speed were negatively correlated with surface SO2 concentration, while pressure was positively correlated. Furthermore, it was found that dew point was a more effective predictor than temperature. When these meteorological factors were used in multiple regression, the determination coefficient reached 0.379. The relationship of the surface SO2 concentration and meteorological factors was seasonally dependent. In summer and autumn, the regression model performed better than in spring and winter. The surface SO2 concentration predicting method proposed in this study can be easily adapted for other regions, especially most useful for those having no operational air pollution forecasting services or having sparse ground monitoring networks.

  14. Signal Sampling for Efficient Sparse Representation of Resting State FMRI Data

    PubMed Central

    Ge, Bao; Makkie, Milad; Wang, Jin; Zhao, Shijie; Jiang, Xi; Li, Xiang; Lv, Jinglei; Zhang, Shu; Zhang, Wei; Han, Junwei; Guo, Lei; Liu, Tianming

    2015-01-01

    As the size of brain imaging data such as fMRI grows explosively, it provides us with unprecedented and abundant information about the brain. How to reduce the size of fMRI data but not lose much information becomes a more and more pressing issue. Recent literature studies tried to deal with it by dictionary learning and sparse representation methods, however, their computation complexities are still high, which hampers the wider application of sparse representation method to large scale fMRI datasets. To effectively address this problem, this work proposes to represent resting state fMRI (rs-fMRI) signals of a whole brain via a statistical sampling based sparse representation. First we sampled the whole brain’s signals via different sampling methods, then the sampled signals were aggregate into an input data matrix to learn a dictionary, finally this dictionary was used to sparsely represent the whole brain’s signals and identify the resting state networks. Comparative experiments demonstrate that the proposed signal sampling framework can speed-up by ten times in reconstructing concurrent brain networks without losing much information. The experiments on the 1000 Functional Connectomes Project further demonstrate its effectiveness and superiority. PMID:26646924

  15. Target Transformation Constrained Sparse Unmixing (ttcsu) Algorithm for Retrieving Hydrous Minerals on Mars: Application to Southwest Melas Chasma

    NASA Astrophysics Data System (ADS)

    Lin, H.; Zhang, X.; Wu, X.; Tarnas, J. D.; Mustard, J. F.

    2018-04-01

    Quantitative analysis of hydrated minerals from hyperspectral remote sensing data is fundamental for understanding Martian geologic process. Because of the difficulties for selecting endmembers from hyperspectral images, a sparse unmixing algorithm has been proposed to be applied to CRISM data on Mars. However, it's challenge when the endmember library increases dramatically. Here, we proposed a new methodology termed Target Transformation Constrained Sparse Unmixing (TTCSU) to accurately detect hydrous minerals on Mars. A new version of target transformation technique proposed in our recent work was used to obtain the potential detections from CRISM data. Sparse unmixing constrained with these detections as prior information was applied to CRISM single-scattering albedo images, which were calculated using a Hapke radiative transfer model. This methodology increases success rate of the automatic endmember selection of sparse unmixing and could get more accurate abundances. CRISM images with well analyzed in Southwest Melas Chasma was used to validate our methodology in this study. The sulfates jarosite was detected from Southwest Melas Chasma, the distribution is consistent with previous work and the abundance is comparable. More validations will be done in our future work.

  16. One-shot 3D scanning by combining sparse landmarks with dense gradient information

    NASA Astrophysics Data System (ADS)

    Di Martino, Matías; Flores, Jorge; Ferrari, José A.

    2018-06-01

    Scene understanding is one of the most challenging and popular problems in the field of robotics and computer vision and the estimation of 3D information is at the core of most of these applications. In order to retrieve the 3D structure of a test surface we propose a single shot approach that combines dense gradient information with sparse absolute measurements. To that end, we designed a colored pattern that codes fine horizontal and vertical fringes, with sparse corners landmarks. By measuring the deformation (bending) of horizontal and vertical fringes, we are able to estimate surface local variations (i.e. its gradient field). Then corner sparse landmarks are detected and matched to infer spare absolute information about the test surface height. Local gradient information is combined with the sparse absolute values which work as anchors to guide the integration process. We show that this can be mathematically done in a very compact and intuitive way by properly defining a Poisson-like partial differential equation. Then we address in detail how the problem can be formulated in a discrete domain and how it can be practically solved by straight forward linear numerical solvers. Finally, validation experiment are presented.

  17. Microstructure Images Restoration of Metallic Materials Based upon KSVD and Smoothing Penalty Sparse Representation Approach.

    PubMed

    Li, Qing; Liang, Steven Y

    2018-04-20

    Microstructure images of metallic materials play a significant role in industrial applications. To address image degradation problem of metallic materials, a novel image restoration technique based on K-means singular value decomposition (KSVD) and smoothing penalty sparse representation (SPSR) algorithm is proposed in this work, the microstructure images of aluminum alloy 7075 (AA7075) material are used as examples. To begin with, to reflect the detail structure characteristics of the damaged image, the KSVD dictionary is introduced to substitute the traditional sparse transform basis (TSTB) for sparse representation. Then, due to the image restoration, modeling belongs to a highly underdetermined equation, and traditional sparse reconstruction methods may cause instability and obvious artifacts in the reconstructed images, especially reconstructed image with many smooth regions and the noise level is strong, thus the SPSR (here, q = 0.5) algorithm is designed to reconstruct the damaged image. The results of simulation and two practical cases demonstrate that the proposed method has superior performance compared with some state-of-the-art methods in terms of restoration performance factors and visual quality. Meanwhile, the grain size parameters and grain boundaries of microstructure image are discussed before and after they are restored by proposed method.

  18. RZA-NLMF algorithm-based adaptive sparse sensing for realizing compressive sensing

    NASA Astrophysics Data System (ADS)

    Gui, Guan; Xu, Li; Adachi, Fumiyuki

    2014-12-01

    Nonlinear sparse sensing (NSS) techniques have been adopted for realizing compressive sensing in many applications such as radar imaging. Unlike the NSS, in this paper, we propose an adaptive sparse sensing (ASS) approach using the reweighted zero-attracting normalized least mean fourth (RZA-NLMF) algorithm which depends on several given parameters, i.e., reweighted factor, regularization parameter, and initial step size. First, based on the independent assumption, Cramer-Rao lower bound (CRLB) is derived as for the performance comparisons. In addition, reweighted factor selection method is proposed for achieving robust estimation performance. Finally, to verify the algorithm, Monte Carlo-based computer simulations are given to show that the ASS achieves much better mean square error (MSE) performance than the NSS.

  19. Response of selected binomial coefficients to varying degrees of matrix sparseness and to matrices with known data interrelationships

    USGS Publications Warehouse

    Archer, A.W.; Maples, C.G.

    1989-01-01

    Numerous departures from ideal relationships are revealed by Monte Carlo simulations of widely accepted binomial coefficients. For example, simulations incorporating varying levels of matrix sparseness (presence of zeros indicating lack of data) and computation of expected values reveal that not only are all common coefficients influenced by zero data, but also that some coefficients do not discriminate between sparse or dense matrices (few zero data). Such coefficients computationally merge mutually shared and mutually absent information and do not exploit all the information incorporated within the standard 2 ?? 2 contingency table; therefore, the commonly used formulae for such coefficients are more complicated than the actual range of values produced. Other coefficients do differentiate between mutual presences and absences; however, a number of these coefficients do not demonstrate a linear relationship to matrix sparseness. Finally, simulations using nonrandom matrices with known degrees of row-by-row similarities signify that several coefficients either do not display a reasonable range of values or are nonlinear with respect to known relationships within the data. Analyses with nonrandom matrices yield clues as to the utility of certain coefficients for specific applications. For example, coefficients such as Jaccard, Dice, and Baroni-Urbani and Buser are useful if correction of sparseness is desired, whereas the Russell-Rao coefficient is useful when sparseness correction is not desired. ?? 1989 International Association for Mathematical Geology.

  20. Spectrum recovery method based on sparse representation for segmented multi-Gaussian model

    NASA Astrophysics Data System (ADS)

    Teng, Yidan; Zhang, Ye; Ti, Chunli; Su, Nan

    2016-09-01

    Hyperspectral images can realize crackajack features discriminability for supplying diagnostic characteristics with high spectral resolution. However, various degradations may generate negative influence on the spectral information, including water absorption, bands-continuous noise. On the other hand, the huge data volume and strong redundancy among spectrums produced intense demand on compressing HSIs in spectral dimension, which also leads to the loss of spectral information. The reconstruction of spectral diagnostic characteristics has irreplaceable significance for the subsequent application of HSIs. This paper introduces a spectrum restoration method for HSIs making use of segmented multi-Gaussian model (SMGM) and sparse representation. A SMGM is established to indicating the unsymmetrical spectral absorption and reflection characteristics, meanwhile, its rationality and sparse property are discussed. With the application of compressed sensing (CS) theory, we implement sparse representation to the SMGM. Then, the degraded and compressed HSIs can be reconstructed utilizing the uninjured or key bands. Finally, we take low rank matrix recovery (LRMR) algorithm for post processing to restore the spatial details. The proposed method was tested on the spectral data captured on the ground with artificial water absorption condition and an AVIRIS-HSI data set. The experimental results in terms of qualitative and quantitative assessments demonstrate that the effectiveness on recovering the spectral information from both degradations and loss compression. The spectral diagnostic characteristics and the spatial geometry feature are well preserved.

  1. A bandwidth-efficient service for local information dissemination in sparse to dense roadways.

    PubMed

    Garcia-Lozano, Estrella; Campo, Celeste; Garcia-Rubio, Carlos; Cortes-Martin, Alberto; Rodriguez-Carrion, Alicia; Noriega-Vivas, Patricia

    2013-07-05

    Thanks to the research on Vehicular Ad Hoc Networks (VANETs), we will be able to deploy applications on roadways that will contribute to energy efficiency through a better planning of long trips. With this goal in mind, we have designed a gas/charging station advertising system, which takes advantage of the broadcast nature of the network. We have found that reducing the number of total sent packets is important, as it allows for a better use of the available bandwidth. We have designed improvements for a distance-based flooding scheme, so that it can support the advertising application with good results in sparse to dense roadway scenarios.

  2. A Bandwidth-Efficient Service for Local Information Dissemination in Sparse to Dense Roadways

    PubMed Central

    Garcia-Lozano, Estrella; Campo, Celeste; Garcia-Rubio, Carlos; Cortes-Martin, Alberto; Rodriguez-Carrion, Alicia; Noriega-Vivas, Patricia

    2013-01-01

    Thanks to the research on Vehicular Ad Hoc Networks (VANETs), we will be able to deploy applications on roadways that will contribute to energy efficiency through a better planning of long trips. With this goal in mind, we have designed a gas/charging station advertising system, which takes advantage of the broadcast nature of the network. We have found that reducing the number of total sent packets is important, as it allows for a better use of the available bandwidth. We have designed improvements for a distance-based flooding scheme, so that it can support the advertising application with good results in sparse to dense roadway scenarios. PMID:23881130

  3. IPF-LASSO: Integrative L 1-Penalized Regression with Penalty Factors for Prediction Based on Multi-Omics Data

    PubMed Central

    Jiang, Xiaoyu; Fuchs, Mathias

    2017-01-01

    As modern biotechnologies advance, it has become increasingly frequent that different modalities of high-dimensional molecular data (termed “omics” data in this paper), such as gene expression, methylation, and copy number, are collected from the same patient cohort to predict the clinical outcome. While prediction based on omics data has been widely studied in the last fifteen years, little has been done in the statistical literature on the integration of multiple omics modalities to select a subset of variables for prediction, which is a critical task in personalized medicine. In this paper, we propose a simple penalized regression method to address this problem by assigning different penalty factors to different data modalities for feature selection and prediction. The penalty factors can be chosen in a fully data-driven fashion by cross-validation or by taking practical considerations into account. In simulation studies, we compare the prediction performance of our approach, called IPF-LASSO (Integrative LASSO with Penalty Factors) and implemented in the R package ipflasso, with the standard LASSO and sparse group LASSO. The use of IPF-LASSO is also illustrated through applications to two real-life cancer datasets. All data and codes are available on the companion website to ensure reproducibility. PMID:28546826

  4. Framing U-Net via Deep Convolutional Framelets: Application to Sparse-View CT.

    PubMed

    Han, Yoseob; Ye, Jong Chul

    2018-06-01

    X-ray computed tomography (CT) using sparse projection views is a recent approach to reduce the radiation dose. However, due to the insufficient projection views, an analytic reconstruction approach using the filtered back projection (FBP) produces severe streaking artifacts. Recently, deep learning approaches using large receptive field neural networks such as U-Net have demonstrated impressive performance for sparse-view CT reconstruction. However, theoretical justification is still lacking. Inspired by the recent theory of deep convolutional framelets, the main goal of this paper is, therefore, to reveal the limitation of U-Net and propose new multi-resolution deep learning schemes. In particular, we show that the alternative U-Net variants such as dual frame and tight frame U-Nets satisfy the so-called frame condition which makes them better for effective recovery of high frequency edges in sparse-view CT. Using extensive experiments with real patient data set, we demonstrate that the new network architectures provide better reconstruction performance.

  5. Sparse dictionary learning for resting-state fMRI analysis

    NASA Astrophysics Data System (ADS)

    Lee, Kangjoo; Han, Paul Kyu; Ye, Jong Chul

    2011-09-01

    Recently, there has been increased interest in the usage of neuroimaging techniques to investigate what happens in the brain at rest. Functional imaging studies have revealed that the default-mode network activity is disrupted in Alzheimer's disease (AD). However, there is no consensus, as yet, on the choice of analysis method for the application of resting-state analysis for disease classification. This paper proposes a novel compressed sensing based resting-state fMRI analysis tool called Sparse-SPM. As the brain's functional systems has shown to have features of complex networks according to graph theoretical analysis, we apply a graph model to represent a sparse combination of information flows in complex network perspectives. In particular, a new concept of spatially adaptive design matrix has been proposed by implementing sparse dictionary learning based on sparsity. The proposed approach shows better performance compared to other conventional methods, such as independent component analysis (ICA) and seed-based approach, in classifying the AD patients from normal using resting-state analysis.

  6. Dimension-Factorized Range Migration Algorithm for Regularly Distributed Array Imaging

    PubMed Central

    Guo, Qijia; Wang, Jie; Chang, Tianying

    2017-01-01

    The two-dimensional planar MIMO array is a popular approach for millimeter wave imaging applications. As a promising practical alternative, sparse MIMO arrays have been devised to reduce the number of antenna elements and transmitting/receiving channels with predictable and acceptable loss in image quality. In this paper, a high precision three-dimensional imaging algorithm is proposed for MIMO arrays of the regularly distributed type, especially the sparse varieties. Termed the Dimension-Factorized Range Migration Algorithm, the new imaging approach factorizes the conventional MIMO Range Migration Algorithm into multiple operations across the sparse dimensions. The thinner the sparse dimensions of the array, the more efficient the new algorithm will be. Advantages of the proposed approach are demonstrated by comparison with the conventional MIMO Range Migration Algorithm and its non-uniform fast Fourier transform based variant in terms of all the important characteristics of the approaches, especially the anti-noise capability. The computation cost is analyzed as well to evaluate the efficiency quantitatively. PMID:29113083

  7. Labyrinth, An Abstract Model for Hypermedia Applications. Description of its Static Components.

    ERIC Educational Resources Information Center

    Diaz, Paloma; Aedo, Ignacio; Panetsos, Fivos

    1997-01-01

    The model for hypermedia applications called Labyrinth allows: (1) the design of platform-independent hypermedia applications; (2) the categorization, generalization and abstraction of sparse unstructured heterogeneous information in multiple and interconnected levels; (3) the creation of personal views in multiuser hyperdocuments for both groups…

  8. Subject-based discriminative sparse representation model for detection of concealed information.

    PubMed

    Akhavan, Amir; Moradi, Mohammad Hassan; Vand, Safa Rafiei

    2017-05-01

    The use of machine learning approaches in concealed information test (CIT) plays a key role in the progress of this neurophysiological field. In this paper, we presented a new machine learning method for CIT in which each subject is considered independent of the others. The main goal of this study is to adapt the discriminative sparse models to be applicable for subject-based concealed information test. In order to provide sufficient discriminability between guilty and innocent subjects, we introduced a novel discriminative sparse representation model and its appropriate learning methods. For evaluation of the method forty-four subjects participated in a mock crime scenario and their EEG data were recorded. As the model input, in this study the recurrence plot features were extracted from single trial data of different stimuli. Then the extracted feature vectors were reduced using statistical dependency method. The reduced feature vector went through the proposed subject-based sparse model in which the discrimination power of sparse code and reconstruction error were applied simultaneously. Experimental results showed that the proposed approach achieved better performance than other competing discriminative sparse models. The classification accuracy, sensitivity and specificity of the presented sparsity-based method were about 93%, 91% and 95% respectively. Using the EEG data of a single subject in response to different stimuli types and with the aid of the proposed discriminative sparse representation model, one can distinguish guilty subjects from innocent ones. Indeed, this property eliminates the necessity of several subject EEG data in model learning and decision making for a specific subject. Copyright © 2017 Elsevier B.V. All rights reserved.

  9. Application distribution model and related security attacks in VANET

    NASA Astrophysics Data System (ADS)

    Nikaein, Navid; Kanti Datta, Soumya; Marecar, Irshad; Bonnet, Christian

    2013-03-01

    In this paper, we present a model for application distribution and related security attacks in dense vehicular ad hoc networks (VANET) and sparse VANET which forms a delay tolerant network (DTN). We study the vulnerabilities of VANET to evaluate the attack scenarios and introduce a new attacker`s model as an extension to the work done in [6]. Then a VANET model has been proposed that supports the application distribution through proxy app stores on top of mobile platforms installed in vehicles. The steps of application distribution have been studied in detail. We have identified key attacks (e.g. malware, spamming and phishing, software attack and threat to location privacy) for dense VANET and two attack scenarios for sparse VANET. It has been shown that attacks can be launched by distributing malicious applications and injecting malicious codes to On Board Unit (OBU) by exploiting OBU software security holes. Consequences of such security attacks have been described. Finally, countermeasures including the concepts of sandbox have also been presented in depth.

  10. Sparse maps—A systematic infrastructure for reduced-scaling electronic structure methods. I. An efficient and simple linear scaling local MP2 method that uses an intermediate basis of pair natural orbitals

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Pinski, Peter; Riplinger, Christoph; Neese, Frank, E-mail: evaleev@vt.edu, E-mail: frank.neese@cec.mpg.de

    2015-07-21

    In this work, a systematic infrastructure is described that formalizes concepts implicit in previous work and greatly simplifies computer implementation of reduced-scaling electronic structure methods. The key concept is sparse representation of tensors using chains of sparse maps between two index sets. Sparse map representation can be viewed as a generalization of compressed sparse row, a common representation of a sparse matrix, to tensor data. By combining few elementary operations on sparse maps (inversion, chaining, intersection, etc.), complex algorithms can be developed, illustrated here by a linear-scaling transformation of three-center Coulomb integrals based on our compact code library that implementsmore » sparse maps and operations on them. The sparsity of the three-center integrals arises from spatial locality of the basis functions and domain density fitting approximation. A novel feature of our approach is the use of differential overlap integrals computed in linear-scaling fashion for screening products of basis functions. Finally, a robust linear scaling domain based local pair natural orbital second-order Möller-Plesset (DLPNO-MP2) method is described based on the sparse map infrastructure that only depends on a minimal number of cutoff parameters that can be systematically tightened to approach 100% of the canonical MP2 correlation energy. With default truncation thresholds, DLPNO-MP2 recovers more than 99.9% of the canonical resolution of the identity MP2 (RI-MP2) energy while still showing a very early crossover with respect to the computational effort. Based on extensive benchmark calculations, relative energies are reproduced with an error of typically <0.2 kcal/mol. The efficiency of the local MP2 (LMP2) method can be drastically improved by carrying out the LMP2 iterations in a basis of pair natural orbitals. While the present work focuses on local electron correlation, it is of much broader applicability to computation with sparse tensors in quantum chemistry and beyond.« less

  11. Sparse maps—A systematic infrastructure for reduced-scaling electronic structure methods. I. An efficient and simple linear scaling local MP2 method that uses an intermediate basis of pair natural orbitals.

    PubMed

    Pinski, Peter; Riplinger, Christoph; Valeev, Edward F; Neese, Frank

    2015-07-21

    In this work, a systematic infrastructure is described that formalizes concepts implicit in previous work and greatly simplifies computer implementation of reduced-scaling electronic structure methods. The key concept is sparse representation of tensors using chains of sparse maps between two index sets. Sparse map representation can be viewed as a generalization of compressed sparse row, a common representation of a sparse matrix, to tensor data. By combining few elementary operations on sparse maps (inversion, chaining, intersection, etc.), complex algorithms can be developed, illustrated here by a linear-scaling transformation of three-center Coulomb integrals based on our compact code library that implements sparse maps and operations on them. The sparsity of the three-center integrals arises from spatial locality of the basis functions and domain density fitting approximation. A novel feature of our approach is the use of differential overlap integrals computed in linear-scaling fashion for screening products of basis functions. Finally, a robust linear scaling domain based local pair natural orbital second-order Möller-Plesset (DLPNO-MP2) method is described based on the sparse map infrastructure that only depends on a minimal number of cutoff parameters that can be systematically tightened to approach 100% of the canonical MP2 correlation energy. With default truncation thresholds, DLPNO-MP2 recovers more than 99.9% of the canonical resolution of the identity MP2 (RI-MP2) energy while still showing a very early crossover with respect to the computational effort. Based on extensive benchmark calculations, relative energies are reproduced with an error of typically <0.2 kcal/mol. The efficiency of the local MP2 (LMP2) method can be drastically improved by carrying out the LMP2 iterations in a basis of pair natural orbitals. While the present work focuses on local electron correlation, it is of much broader applicability to computation with sparse tensors in quantum chemistry and beyond.

  12. Online estimation of lithium-ion battery capacity using sparse Bayesian learning

    NASA Astrophysics Data System (ADS)

    Hu, Chao; Jain, Gaurav; Schmidt, Craig; Strief, Carrie; Sullivan, Melani

    2015-09-01

    Lithium-ion (Li-ion) rechargeable batteries are used as one of the major energy storage components for implantable medical devices. Reliability of Li-ion batteries used in these devices has been recognized as of high importance from a broad range of stakeholders, including medical device manufacturers, regulatory agencies, patients and physicians. To ensure a Li-ion battery operates reliably, it is important to develop health monitoring techniques that accurately estimate the capacity of the battery throughout its life-time. This paper presents a sparse Bayesian learning method that utilizes the charge voltage and current measurements to estimate the capacity of a Li-ion battery used in an implantable medical device. Relevance Vector Machine (RVM) is employed as a probabilistic kernel regression method to learn the complex dependency of the battery capacity on the characteristic features that are extracted from the charge voltage and current measurements. Owing to the sparsity property of RVM, the proposed method generates a reduced-scale regression model that consumes only a small fraction of the CPU time required by a full-scale model, which makes online capacity estimation computationally efficient. 10 years' continuous cycling data and post-explant cycling data obtained from Li-ion prismatic cells are used to verify the performance of the proposed method.

  13. Resonance-Based Sparse Signal Decomposition and its Application in Mechanical Fault Diagnosis: A Review.

    PubMed

    Huang, Wentao; Sun, Hongjian; Wang, Weijie

    2017-06-03

    Mechanical equipment is the heart of industry. For this reason, mechanical fault diagnosis has drawn considerable attention. In terms of the rich information hidden in fault vibration signals, the processing and analysis techniques of vibration signals have become a crucial research issue in the field of mechanical fault diagnosis. Based on the theory of sparse decomposition, Selesnick proposed a novel nonlinear signal processing method: resonance-based sparse signal decomposition (RSSD). Since being put forward, RSSD has become widely recognized, and many RSSD-based methods have been developed to guide mechanical fault diagnosis. This paper attempts to summarize and review the theoretical developments and application advances of RSSD in mechanical fault diagnosis, and to provide a more comprehensive reference for those interested in RSSD and mechanical fault diagnosis. Followed by a brief introduction of RSSD's theoretical foundation, based on different optimization directions, applications of RSSD in mechanical fault diagnosis are categorized into five aspects: original RSSD, parameter optimized RSSD, subband optimized RSSD, integrated optimized RSSD, and RSSD combined with other methods. On this basis, outstanding issues in current RSSD study are also pointed out, as well as corresponding instructional solutions. We hope this review will provide an insightful reference for researchers and readers who are interested in RSSD and mechanical fault diagnosis.

  14. Energy scaling advantages of resistive memory crossbar based computation and its application to sparse coding

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Agarwal, Sapan; Quach, Tu -Thach; Parekh, Ojas

    In this study, the exponential increase in data over the last decade presents a significant challenge to analytics efforts that seek to process and interpret such data for various applications. Neural-inspired computing approaches are being developed in order to leverage the computational properties of the analog, low-power data processing observed in biological systems. Analog resistive memory crossbars can perform a parallel read or a vector-matrix multiplication as well as a parallel write or a rank-1 update with high computational efficiency. For an N × N crossbar, these two kernels can be O(N) more energy efficient than a conventional digital memory-basedmore » architecture. If the read operation is noise limited, the energy to read a column can be independent of the crossbar size (O(1)). These two kernels form the basis of many neuromorphic algorithms such as image, text, and speech recognition. For instance, these kernels can be applied to a neural sparse coding algorithm to give an O(N) reduction in energy for the entire algorithm when run with finite precision. Sparse coding is a rich problem with a host of applications including computer vision, object tracking, and more generally unsupervised learning.« less

  15. Resonance-Based Sparse Signal Decomposition and Its Application in Mechanical Fault Diagnosis: A Review

    PubMed Central

    Huang, Wentao; Sun, Hongjian; Wang, Weijie

    2017-01-01

    Mechanical equipment is the heart of industry. For this reason, mechanical fault diagnosis has drawn considerable attention. In terms of the rich information hidden in fault vibration signals, the processing and analysis techniques of vibration signals have become a crucial research issue in the field of mechanical fault diagnosis. Based on the theory of sparse decomposition, Selesnick proposed a novel nonlinear signal processing method: resonance-based sparse signal decomposition (RSSD). Since being put forward, RSSD has become widely recognized, and many RSSD-based methods have been developed to guide mechanical fault diagnosis. This paper attempts to summarize and review the theoretical developments and application advances of RSSD in mechanical fault diagnosis, and to provide a more comprehensive reference for those interested in RSSD and mechanical fault diagnosis. Followed by a brief introduction of RSSD’s theoretical foundation, based on different optimization directions, applications of RSSD in mechanical fault diagnosis are categorized into five aspects: original RSSD, parameter optimized RSSD, subband optimized RSSD, integrated optimized RSSD, and RSSD combined with other methods. On this basis, outstanding issues in current RSSD study are also pointed out, as well as corresponding instructional solutions. We hope this review will provide an insightful reference for researchers and readers who are interested in RSSD and mechanical fault diagnosis. PMID:28587198

  16. Efficient large-scale graph data optimization for intelligent video surveillance

    NASA Astrophysics Data System (ADS)

    Shang, Quanhong; Zhang, Shujun; Wang, Yanbo; Sun, Chen; Wang, Zepeng; Zhang, Luming

    2017-08-01

    Society is rapidly accepting the use of a wide variety of cameras Location and applications: site traffic monitoring, parking Lot surveillance, car and smart space. These ones here the camera provides data every day in an analysis Effective way. Recent advances in sensor technology Manufacturing, communications and computing are stimulating.The development of new applications that can change the traditional Vision system incorporating universal smart camera network. This Analysis of visual cues in multi camera networks makes wide Applications ranging from smart home and office automation to large area surveillance and traffic surveillance. In addition, dense Camera networks, most of which have large overlapping areas of cameras. In the view of good research, we focus on sparse camera networks. One Sparse camera network using large area surveillance. As few cameras as possible, most cameras do not overlap Each other’s field of vision. This task is challenging Lack of knowledge of topology Network, the specific changes in appearance and movement Track different opinions of the target, as well as difficulties Understanding complex events in a network. In this review in this paper, we present a comprehensive survey of recent studies Results to solve the problem of topology learning, Object appearance modeling and global activity understanding sparse camera network. In addition, some of the current open Research issues are discussed.

  17. Energy scaling advantages of resistive memory crossbar based computation and its application to sparse coding

    DOE PAGES

    Agarwal, Sapan; Quach, Tu -Thach; Parekh, Ojas; ...

    2016-01-06

    In this study, the exponential increase in data over the last decade presents a significant challenge to analytics efforts that seek to process and interpret such data for various applications. Neural-inspired computing approaches are being developed in order to leverage the computational properties of the analog, low-power data processing observed in biological systems. Analog resistive memory crossbars can perform a parallel read or a vector-matrix multiplication as well as a parallel write or a rank-1 update with high computational efficiency. For an N × N crossbar, these two kernels can be O(N) more energy efficient than a conventional digital memory-basedmore » architecture. If the read operation is noise limited, the energy to read a column can be independent of the crossbar size (O(1)). These two kernels form the basis of many neuromorphic algorithms such as image, text, and speech recognition. For instance, these kernels can be applied to a neural sparse coding algorithm to give an O(N) reduction in energy for the entire algorithm when run with finite precision. Sparse coding is a rich problem with a host of applications including computer vision, object tracking, and more generally unsupervised learning.« less

  18. Estimating multivariate similarity between neuroimaging datasets with sparse canonical correlation analysis: an application to perfusion imaging.

    PubMed

    Rosa, Maria J; Mehta, Mitul A; Pich, Emilio M; Risterucci, Celine; Zelaya, Fernando; Reinders, Antje A T S; Williams, Steve C R; Dazzan, Paola; Doyle, Orla M; Marquand, Andre F

    2015-01-01

    An increasing number of neuroimaging studies are based on either combining more than one data modality (inter-modal) or combining more than one measurement from the same modality (intra-modal). To date, most intra-modal studies using multivariate statistics have focused on differences between datasets, for instance relying on classifiers to differentiate between effects in the data. However, to fully characterize these effects, multivariate methods able to measure similarities between datasets are needed. One classical technique for estimating the relationship between two datasets is canonical correlation analysis (CCA). However, in the context of high-dimensional data the application of CCA is extremely challenging. A recent extension of CCA, sparse CCA (SCCA), overcomes this limitation, by regularizing the model parameters while yielding a sparse solution. In this work, we modify SCCA with the aim of facilitating its application to high-dimensional neuroimaging data and finding meaningful multivariate image-to-image correspondences in intra-modal studies. In particular, we show how the optimal subset of variables can be estimated independently and we look at the information encoded in more than one set of SCCA transformations. We illustrate our framework using Arterial Spin Labeling data to investigate multivariate similarities between the effects of two antipsychotic drugs on cerebral blood flow.

  19. Applications of compressed sensing image reconstruction to sparse view phase tomography

    NASA Astrophysics Data System (ADS)

    Ueda, Ryosuke; Kudo, Hiroyuki; Dong, Jian

    2017-10-01

    X-ray phase CT has a potential to give the higher contrast in soft tissue observations. To shorten the measure- ment time, sparse-view CT data acquisition has been attracting the attention. This paper applies two major compressed sensing (CS) approaches to image reconstruction in the x-ray sparse-view phase tomography. The first CS approach is the standard Total Variation (TV) regularization. The major drawbacks of TV regularization are a patchy artifact and loss in smooth intensity changes due to the piecewise constant nature of image model. The second CS method is a relatively new approach of CS which uses a nonlinear smoothing filter to design the regularization term. The nonlinear filter based CS is expected to reduce the major artifact in the TV regular- ization. The both cost functions can be minimized by the very fast iterative reconstruction method. However, in the past research activities, it is not clearly demonstrated how much image quality difference occurs between the TV regularization and the nonlinear filter based CS in x-ray phase CT applications. We clarify the issue by applying the two CS applications to the case of x-ray phase tomography. We provide results with numerically simulated data, which demonstrates that the nonlinear filter based CS outperforms the TV regularization in terms of textures and smooth intensity changes.

  20. Using Perturbed QR Factorizations To Solve Linear Least-Squares Problems

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Avron, Haim; Ng, Esmond G.; Toledo, Sivan

    2008-03-21

    We propose and analyze a new tool to help solve sparse linear least-squares problems min{sub x} {parallel}Ax-b{parallel}{sub 2}. Our method is based on a sparse QR factorization of a low-rank perturbation {cflx A} of A. More precisely, we show that the R factor of {cflx A} is an effective preconditioner for the least-squares problem min{sub x} {parallel}Ax-b{parallel}{sub 2}, when solved using LSQR. We propose applications for the new technique. When A is rank deficient we can add rows to ensure that the preconditioner is well-conditioned without column pivoting. When A is sparse except for a few dense rows we canmore » drop these dense rows from A to obtain {cflx A}. Another application is solving an updated or downdated problem. If R is a good preconditioner for the original problem A, it is a good preconditioner for the updated/downdated problem {cflx A}. We can also solve what-if scenarios, where we want to find the solution if a column of the original matrix is changed/removed. We present a spectral theory that analyzes the generalized spectrum of the pencil (A*A,R*R) and analyze the applications.« less

  1. Contemporary ultrasonic signal processing approaches for nondestructive evaluation of multilayered structures

    NASA Astrophysics Data System (ADS)

    Zhang, Guang-Ming; Harvey, David M.

    2012-03-01

    Various signal processing techniques have been used for the enhancement of defect detection and defect characterisation. Cross-correlation, filtering, autoregressive analysis, deconvolution, neural network, wavelet transform and sparse signal representations have all been applied in attempts to analyse ultrasonic signals. In ultrasonic nondestructive evaluation (NDE) applications, a large number of materials have multilayered structures. NDE of multilayered structures leads to some specific problems, such as penetration, echo overlap, high attenuation and low signal-to-noise ratio. The signals recorded from a multilayered structure are a class of very special signals comprised of limited echoes. Such signals can be assumed to have a sparse representation in a proper signal dictionary. Recently, a number of digital signal processing techniques have been developed by exploiting the sparse constraint. This paper presents a review of research to date, showing the up-to-date developments of signal processing techniques made in ultrasonic NDE. A few typical ultrasonic signal processing techniques used for NDE of multilayered structures are elaborated. The practical applications and limitations of different signal processing methods in ultrasonic NDE of multilayered structures are analysed.

  2. Curvelet-based compressive sensing for InSAR raw data

    NASA Astrophysics Data System (ADS)

    Costa, Marcello G.; da Silva Pinho, Marcelo; Fernandes, David

    2015-10-01

    The aim of this work is to evaluate the compression performance of SAR raw data for interferometry applications collected by airborne from BRADAR (Brazilian SAR System operating in X and P bands) using the new approach based on compressive sensing (CS) to achieve an effective recovery with a good phase preserving. For this framework is desirable a real-time capability, where the collected data can be compressed to reduce onboard storage and bandwidth required for transmission. In the CS theory, a sparse unknown signals can be recovered from a small number of random or pseudo-random measurements by sparsity-promoting nonlinear recovery algorithms. Therefore, the original signal can be significantly reduced. To achieve the sparse representation of SAR signal, was done a curvelet transform. The curvelets constitute a directional frame, which allows an optimal sparse representation of objects with discontinuities along smooth curves as observed in raw data and provides an advanced denoising optimization. For the tests were made available a scene of 8192 x 2048 samples in range and azimuth in X-band with 2 m of resolution. The sparse representation was compressed using low dimension measurements matrices in each curvelet subband. Thus, an iterative CS reconstruction method based on IST (iterative soft/shrinkage threshold) was adjusted to recover the curvelets coefficients and then the original signal. To evaluate the compression performance were computed the compression ratio (CR), signal to noise ratio (SNR), and because the interferometry applications require more reconstruction accuracy the phase parameters like the standard deviation of the phase (PSD) and the mean phase error (MPE) were also computed. Moreover, in the image domain, a single-look complex image was generated to evaluate the compression effects. All results were computed in terms of sparsity analysis to provides an efficient compression and quality recovering appropriated for inSAR applications, therefore, providing a feasibility for compressive sensing application.

  3. Real-Time Characterization of Aerospace Structures Using Onboard Strain Measurement Technologies and Inverse Finite Element Method

    DTIC Science & Technology

    2011-09-01

    strain data provided by in-situ strain sensors. The application focus is on the stain data obtained from FBG (Fiber Bragg Grating) sensor arrays...sparsely distributed lines to simulate strain data from FBG (Fiber Bragg Grating) arrays that provide either single-core (axial) or rosette (tri...when the measured strain data are sparse, as it is often the case when FBG sensors are used. For an inverse element without strain-sensor data, the

  4. Prediction of air pollutant concentration based on sparse response back-propagation training feedforward neural networks.

    PubMed

    Ding, Weifu; Zhang, Jiangshe; Leung, Yee

    2016-10-01

    In this paper, we predict air pollutant concentration using a feedforward artificial neural network inspired by the mechanism of the human brain as a useful alternative to traditional statistical modeling techniques. The neural network is trained based on sparse response back-propagation in which only a small number of neurons respond to the specified stimulus simultaneously and provide a high convergence rate for the trained network, in addition to low energy consumption and greater generalization. Our method is evaluated on Hong Kong air monitoring station data and corresponding meteorological variables for which five air quality parameters were gathered at four monitoring stations in Hong Kong over 4 years (2012-2015). Our results show that our training method has more advantages in terms of the precision of the prediction, effectiveness, and generalization of traditional linear regression algorithms when compared with a feedforward artificial neural network trained using traditional back-propagation.

  5. Decoding memory features from hippocampal spiking activities using sparse classification models.

    PubMed

    Dong Song; Hampson, Robert E; Robinson, Brian S; Marmarelis, Vasilis Z; Deadwyler, Sam A; Berger, Theodore W

    2016-08-01

    To understand how memory information is encoded in the hippocampus, we build classification models to decode memory features from hippocampal CA3 and CA1 spatio-temporal patterns of spikes recorded from epilepsy patients performing a memory-dependent delayed match-to-sample task. The classification model consists of a set of B-spline basis functions for extracting memory features from the spike patterns, and a sparse logistic regression classifier for generating binary categorical output of memory features. Results show that classification models can extract significant amount of memory information with respects to types of memory tasks and categories of sample images used in the task, despite the high level of variability in prediction accuracy due to the small sample size. These results support the hypothesis that memories are encoded in the hippocampal activities and have important implication to the development of hippocampal memory prostheses.

  6. Finite difference method accelerated with sparse solvers for structural analysis of the metal-organic complexes

    NASA Astrophysics Data System (ADS)

    Guda, A. A.; Guda, S. A.; Soldatov, M. A.; Lomachenko, K. A.; Bugaev, A. L.; Lamberti, C.; Gawelda, W.; Bressler, C.; Smolentsev, G.; Soldatov, A. V.; Joly, Y.

    2016-05-01

    Finite difference method (FDM) implemented in the FDMNES software [Phys. Rev. B, 2001, 63, 125120] was revised. Thorough analysis shows, that the calculated diagonal in the FDM matrix consists of about 96% zero elements. Thus a sparse solver would be more suitable for the problem instead of traditional Gaussian elimination for the diagonal neighbourhood. We have tried several iterative sparse solvers and the direct one MUMPS solver with METIS ordering turned out to be the best. Compared to the Gaussian solver present method is up to 40 times faster and allows XANES simulations for complex systems already on personal computers. We show applicability of the software for metal-organic [Fe(bpy)3]2+ complex both for low spin and high spin states populated after laser excitation.

  7. Semi-blind sparse image reconstruction with application to MRFM.

    PubMed

    Park, Se Un; Dobigeon, Nicolas; Hero, Alfred O

    2012-09-01

    We propose a solution to the image deconvolution problem where the convolution kernel or point spread function (PSF) is assumed to be only partially known. Small perturbations generated from the model are exploited to produce a few principal components explaining the PSF uncertainty in a high-dimensional space. Unlike recent developments on blind deconvolution of natural images, we assume the image is sparse in the pixel basis, a natural sparsity arising in magnetic resonance force microscopy (MRFM). Our approach adopts a Bayesian Metropolis-within-Gibbs sampling framework. The performance of our Bayesian semi-blind algorithm for sparse images is superior to previously proposed semi-blind algorithms such as the alternating minimization algorithm and blind algorithms developed for natural images. We illustrate our myopic algorithm on real MRFM tobacco virus data.

  8. Sparsity-Cognizant Algorithms with Applications to Communications, Signal Processing, and the Smart Grid

    NASA Astrophysics Data System (ADS)

    Zhu, Hao

    Sparsity plays an instrumental role in a plethora of scientific fields, including statistical inference for variable selection, parsimonious signal representations, and solving under-determined systems of linear equations - what has led to the ground-breaking result of compressive sampling (CS). This Thesis leverages exciting ideas of sparse signal reconstruction to develop sparsity-cognizant algorithms, and analyze their performance. The vision is to devise tools exploiting the 'right' form of sparsity for the 'right' application domain of multiuser communication systems, array signal processing systems, and the emerging challenges in the smart power grid. Two important power system monitoring tasks are addressed first by capitalizing on the hidden sparsity. To robustify power system state estimation, a sparse outlier model is leveraged to capture the possible corruption in every datum, while the problem nonconvexity due to nonlinear measurements is handled using the semidefinite relaxation technique. Different from existing iterative methods, the proposed algorithm approximates well the global optimum regardless of the initialization. In addition, for enhanced situational awareness, a novel sparse overcomplete representation is introduced to capture (possibly multiple) line outages, and develop real-time algorithms for solving the combinatorially complex identification problem. The proposed algorithms exhibit near-optimal performance while incurring only linear complexity in the number of lines, which makes it possible to quickly bring contingencies to attention. This Thesis also accounts for two basic issues in CS, namely fully-perturbed models and the finite alphabet property. The sparse total least-squares (S-TLS) approach is proposed to furnish CS algorithms for fully-perturbed linear models, leading to statistically optimal and computationally efficient solvers. The S-TLS framework is well motivated for grid-based sensing applications and exhibits higher accuracy than existing sparse algorithms. On the other hand, exploiting the finite alphabet of unknown signals emerges naturally in communication systems, along with sparsity coming from the low activity of each user. Compared to approaches only accounting for either one of the two, joint exploitation of both leads to statistically optimal detectors with improved error performance.

  9. Functional Additive Mixed Models

    PubMed Central

    Scheipl, Fabian; Staicu, Ana-Maria; Greven, Sonja

    2014-01-01

    We propose an extensive framework for additive regression models for correlated functional responses, allowing for multiple partially nested or crossed functional random effects with flexible correlation structures for, e.g., spatial, temporal, or longitudinal functional data. Additionally, our framework includes linear and nonlinear effects of functional and scalar covariates that may vary smoothly over the index of the functional response. It accommodates densely or sparsely observed functional responses and predictors which may be observed with additional error and includes both spline-based and functional principal component-based terms. Estimation and inference in this framework is based on standard additive mixed models, allowing us to take advantage of established methods and robust, flexible algorithms. We provide easy-to-use open source software in the pffr() function for the R-package refund. Simulations show that the proposed method recovers relevant effects reliably, handles small sample sizes well and also scales to larger data sets. Applications with spatially and longitudinally observed functional data demonstrate the flexibility in modeling and interpretability of results of our approach. PMID:26347592

  10. Functional Additive Mixed Models.

    PubMed

    Scheipl, Fabian; Staicu, Ana-Maria; Greven, Sonja

    2015-04-01

    We propose an extensive framework for additive regression models for correlated functional responses, allowing for multiple partially nested or crossed functional random effects with flexible correlation structures for, e.g., spatial, temporal, or longitudinal functional data. Additionally, our framework includes linear and nonlinear effects of functional and scalar covariates that may vary smoothly over the index of the functional response. It accommodates densely or sparsely observed functional responses and predictors which may be observed with additional error and includes both spline-based and functional principal component-based terms. Estimation and inference in this framework is based on standard additive mixed models, allowing us to take advantage of established methods and robust, flexible algorithms. We provide easy-to-use open source software in the pffr() function for the R-package refund. Simulations show that the proposed method recovers relevant effects reliably, handles small sample sizes well and also scales to larger data sets. Applications with spatially and longitudinally observed functional data demonstrate the flexibility in modeling and interpretability of results of our approach.

  11. Applications of machine learning and data mining methods to detect associations of rare and common variants with complex traits.

    PubMed

    Lu, Ake Tzu-Hui; Austin, Erin; Bonner, Ashley; Huang, Hsin-Hsiung; Cantor, Rita M

    2014-09-01

    Machine learning methods (MLMs), designed to develop models using high-dimensional predictors, have been used to analyze genome-wide genetic and genomic data to predict risks for complex traits. We summarize the results from six contributions to our Genetic Analysis Workshop 18 working group; these investigators applied MLMs and data mining to analyses of rare and common genetic variants measured in pedigrees. To develop risk profiles, group members analyzed blood pressure traits along with single-nucleotide polymorphisms and rare variant genotypes derived from sequence and imputation analyses in large Mexican American pedigrees. Supervised MLMs included penalized regression with varying penalties, support vector machines, and permanental classification. Unsupervised MLMs included sparse principal components analysis and sparse graphical models. Entropy-based components analyses were also used to mine these data. None of the investigators fully capitalized on the genetic information provided by the complete pedigrees. Their approaches either corrected for the nonindependence of the individuals within the pedigrees or analyzed only those who were independent. Some methods allowed for covariate adjustment, whereas others did not. We evaluated these methods using a variety of metrics. Four contributors conducted primary analyses on the real data, and the other two research groups used the simulated data with and without knowledge of the underlying simulation model. One group used the answers to the simulated data to assess power and type I errors. Although the MLMs applied were substantially different, each research group concluded that MLMs have advantages over standard statistical approaches with these high-dimensional data. © 2014 WILEY PERIODICALS, INC.

  12. Multiple Sparse Representations Classification

    PubMed Central

    Plenge, Esben; Klein, Stefan S.; Niessen, Wiro J.; Meijering, Erik

    2015-01-01

    Sparse representations classification (SRC) is a powerful technique for pixelwise classification of images and it is increasingly being used for a wide variety of image analysis tasks. The method uses sparse representation and learned redundant dictionaries to classify image pixels. In this empirical study we propose to further leverage the redundancy of the learned dictionaries to achieve a more accurate classifier. In conventional SRC, each image pixel is associated with a small patch surrounding it. Using these patches, a dictionary is trained for each class in a supervised fashion. Commonly, redundant/overcomplete dictionaries are trained and image patches are sparsely represented by a linear combination of only a few of the dictionary elements. Given a set of trained dictionaries, a new patch is sparse coded using each of them, and subsequently assigned to the class whose dictionary yields the minimum residual energy. We propose a generalization of this scheme. The method, which we call multiple sparse representations classification (mSRC), is based on the observation that an overcomplete, class specific dictionary is capable of generating multiple accurate and independent estimates of a patch belonging to the class. So instead of finding a single sparse representation of a patch for each dictionary, we find multiple, and the corresponding residual energies provides an enhanced statistic which is used to improve classification. We demonstrate the efficacy of mSRC for three example applications: pixelwise classification of texture images, lumen segmentation in carotid artery magnetic resonance imaging (MRI), and bifurcation point detection in carotid artery MRI. We compare our method with conventional SRC, K-nearest neighbor, and support vector machine classifiers. The results show that mSRC outperforms SRC and the other reference methods. In addition, we present an extensive evaluation of the effect of the main mSRC parameters: patch size, dictionary size, and sparsity level. PMID:26177106

  13. Detection of dual-band infrared small target based on joint dynamic sparse representation

    NASA Astrophysics Data System (ADS)

    Zhou, Jinwei; Li, Jicheng; Shi, Zhiguang; Lu, Xiaowei; Ren, Dongwei

    2015-10-01

    Infrared small target detection is a crucial and yet still is a difficult issue in aeronautic and astronautic applications. Sparse representation is an important mathematic tool and has been used extensively in image processing in recent years. Joint sparse representation is applied in dual-band infrared dim target detection in this paper. Firstly, according to the characters of dim targets in dual-band infrared images, 2-dimension Gaussian intensity model was used to construct target dictionary, then the dictionary was classified into different sub-classes according to different positions of Gaussian function's center point in image block; The fact that dual-band small targets detection can use the same dictionary and the sparsity doesn't lie in atom-level but in sub-class level was utilized, hence the detection of targets in dual-band infrared images was converted to be a joint dynamic sparse representation problem. And the dynamic active sets were used to describe the sparse constraint of coefficients. Two modified sparsity concentration index (SCI) criteria was proposed to evaluate whether targets exist in the images. In experiments, it shows that the proposed algorithm can achieve better detecting performance and dual-band detection is much more robust to noise compared with single-band detection. Moreover, the proposed method can be expanded to multi-spectrum small target detection.

  14. Microstructure Images Restoration of Metallic Materials Based upon KSVD and Smoothing Penalty Sparse Representation Approach

    PubMed Central

    Liang, Steven Y.

    2018-01-01

    Microstructure images of metallic materials play a significant role in industrial applications. To address image degradation problem of metallic materials, a novel image restoration technique based on K-means singular value decomposition (KSVD) and smoothing penalty sparse representation (SPSR) algorithm is proposed in this work, the microstructure images of aluminum alloy 7075 (AA7075) material are used as examples. To begin with, to reflect the detail structure characteristics of the damaged image, the KSVD dictionary is introduced to substitute the traditional sparse transform basis (TSTB) for sparse representation. Then, due to the image restoration, modeling belongs to a highly underdetermined equation, and traditional sparse reconstruction methods may cause instability and obvious artifacts in the reconstructed images, especially reconstructed image with many smooth regions and the noise level is strong, thus the SPSR (here, q = 0.5) algorithm is designed to reconstruct the damaged image. The results of simulation and two practical cases demonstrate that the proposed method has superior performance compared with some state-of-the-art methods in terms of restoration performance factors and visual quality. Meanwhile, the grain size parameters and grain boundaries of microstructure image are discussed before and after they are restored by proposed method. PMID:29677163

  15. Locality constrained joint dynamic sparse representation for local matching based face recognition.

    PubMed

    Wang, Jianzhong; Yi, Yugen; Zhou, Wei; Shi, Yanjiao; Qi, Miao; Zhang, Ming; Zhang, Baoxue; Kong, Jun

    2014-01-01

    Recently, Sparse Representation-based Classification (SRC) has attracted a lot of attention for its applications to various tasks, especially in biometric techniques such as face recognition. However, factors such as lighting, expression, pose and disguise variations in face images will decrease the performances of SRC and most other face recognition techniques. In order to overcome these limitations, we propose a robust face recognition method named Locality Constrained Joint Dynamic Sparse Representation-based Classification (LCJDSRC) in this paper. In our method, a face image is first partitioned into several smaller sub-images. Then, these sub-images are sparsely represented using the proposed locality constrained joint dynamic sparse representation algorithm. Finally, the representation results for all sub-images are aggregated to obtain the final recognition result. Compared with other algorithms which process each sub-image of a face image independently, the proposed algorithm regards the local matching-based face recognition as a multi-task learning problem. Thus, the latent relationships among the sub-images from the same face image are taken into account. Meanwhile, the locality information of the data is also considered in our algorithm. We evaluate our algorithm by comparing it with other state-of-the-art approaches. Extensive experiments on four benchmark face databases (ORL, Extended YaleB, AR and LFW) demonstrate the effectiveness of LCJDSRC.

  16. Compressed learning and its applications to subcellular localization.

    PubMed

    Zheng, Zhong-Long; Guo, Li; Jia, Jiong; Xie, Chen-Mao; Zeng, Wen-Cai; Yang, Jie

    2011-09-01

    One of the main challenges faced by biological applications is to predict protein subcellular localization in automatic fashion accurately. To achieve this in these applications, a wide variety of machine learning methods have been proposed in recent years. Most of them focus on finding the optimal classification scheme and less of them take the simplifying the complexity of biological systems into account. Traditionally, such bio-data are analyzed by first performing a feature selection before classification. Motivated by CS (Compressed Sensing) theory, we propose the methodology which performs compressed learning with a sparseness criterion such that feature selection and dimension reduction are merged into one analysis. The proposed methodology decreases the complexity of biological system, while increases protein subcellular localization accuracy. Experimental results are quite encouraging, indicating that the aforementioned sparse methods are quite promising in dealing with complicated biological problems, such as predicting the subcellular localization of Gram-negative bacterial proteins.

  17. A fast time-difference inverse solver for 3D EIT with application to lung imaging.

    PubMed

    Javaherian, Ashkan; Soleimani, Manuchehr; Moeller, Knut

    2016-08-01

    A class of sparse optimization techniques that require solely matrix-vector products, rather than an explicit access to the forward matrix and its transpose, has been paid much attention in the recent decade for dealing with large-scale inverse problems. This study tailors application of the so-called Gradient Projection for Sparse Reconstruction (GPSR) to large-scale time-difference three-dimensional electrical impedance tomography (3D EIT). 3D EIT typically suffers from the need for a large number of voxels to cover the whole domain, so its application to real-time imaging, for example monitoring of lung function, remains scarce since the large number of degrees of freedom of the problem extremely increases storage space and reconstruction time. This study shows the great potential of the GPSR for large-size time-difference 3D EIT. Further studies are needed to improve its accuracy for imaging small-size anomalies.

  18. Directivity of a Sparse Array in the Presence of Atmospheric-Induced Phase Fluctuations for Deep Space Communications

    NASA Technical Reports Server (NTRS)

    Nessel, James A.; Acosta, Robert J.

    2010-01-01

    Widely distributed (sparse) ground-based arrays have been utilized for decades in the radio science community for imaging celestial objects, but have only recently become an option for deep space communications applications with the advent of the proposed Next Generation Deep Space Network (DSN) array. But whereas in astronomical imaging, observations (receive-mode only) are made on the order of minutes to hours and atmospheric-induced aberrations can be mostly corrected for in post-processing, communications applications require transmit capabilities and real-time corrections over time scales as short as fractions of a second. This presents an unavoidable problem with the use of sparse arrays for deep space communications at Ka-band which has yet to be successfully resolved, particularly for uplink arraying. In this paper, an analysis of the performance of a sparse antenna array, in terms of its directivity, is performed to derive a closed form solution to the expected array loss in the presence of atmospheric-induced phase fluctuations. The theoretical derivation for array directivity degradation is validated with interferometric measurements for a two-element array taken at Goldstone, California. With the validity of the model established, an arbitrary 27-element array geometry is defined at Goldstone, California, to ascertain its performance in the presence of phase fluctuations. It is concluded that a combination of compact array geometry and atmospheric compensation is necessary to ensure high levels of availability.

  19. A Distributed Learning Method for ℓ1-Regularized Kernel Machine over Wireless Sensor Networks

    PubMed Central

    Ji, Xinrong; Hou, Cuiqin; Hou, Yibin; Gao, Fang; Wang, Shulong

    2016-01-01

    In wireless sensor networks, centralized learning methods have very high communication costs and energy consumption. These are caused by the need to transmit scattered training examples from various sensor nodes to the central fusion center where a classifier or a regression machine is trained. To reduce the communication cost, a distributed learning method for a kernel machine that incorporates ℓ1 norm regularization (ℓ1-regularized) is investigated, and a novel distributed learning algorithm for the ℓ1-regularized kernel minimum mean squared error (KMSE) machine is proposed. The proposed algorithm relies on in-network processing and a collaboration that transmits the sparse model only between single-hop neighboring nodes. This paper evaluates the proposed algorithm with respect to the prediction accuracy, the sparse rate of model, the communication cost and the number of iterations on synthetic and real datasets. The simulation results show that the proposed algorithm can obtain approximately the same prediction accuracy as that obtained by the batch learning method. Moreover, it is significantly superior in terms of the sparse rate of model and communication cost, and it can converge with fewer iterations. Finally, an experiment conducted on a wireless sensor network (WSN) test platform further shows the advantages of the proposed algorithm with respect to communication cost. PMID:27376298

  20. EIT Imaging Regularization Based on Spectral Graph Wavelets.

    PubMed

    Gong, Bo; Schullcke, Benjamin; Krueger-Ziolek, Sabine; Vauhkonen, Marko; Wolf, Gerhard; Mueller-Lisse, Ullrich; Moeller, Knut

    2017-09-01

    The objective of electrical impedance tomographic reconstruction is to identify the distribution of tissue conductivity from electrical boundary conditions. This is an ill-posed inverse problem usually solved under the finite-element method framework. In previous studies, standard sparse regularization was used for difference electrical impedance tomography to achieve a sparse solution. However, regarding elementwise sparsity, standard sparse regularization interferes with the smoothness of conductivity distribution between neighboring elements and is sensitive to noise. As an effect, the reconstructed images are spiky and depict a lack of smoothness. Such unexpected artifacts are not realistic and may lead to misinterpretation in clinical applications. To eliminate such artifacts, we present a novel sparse regularization method that uses spectral graph wavelet transforms. Single-scale or multiscale graph wavelet transforms are employed to introduce local smoothness on different scales into the reconstructed images. The proposed approach relies on viewing finite-element meshes as undirected graphs and applying wavelet transforms derived from spectral graph theory. Reconstruction results from simulations, a phantom experiment, and patient data suggest that our algorithm is more robust to noise and produces more reliable images.

  1. Summer Proceedings 2016: The Center for Computing Research at Sandia National Laboratories

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Carleton, James Brian; Parks, Michael L.

    Solving sparse linear systems from the discretization of elliptic partial differential equations (PDEs) is an important building block in many engineering applications. Sparse direct solvers can solve general linear systems, but are usually slower and use much more memory than effective iterative solvers. To overcome these two disadvantages, a hierarchical solver (LoRaSp) based on H2-matrices was introduced in [22]. Here, we have developed a parallel version of the algorithm in LoRaSp to solve large sparse matrices on distributed memory machines. On a single processor, the factorization time of our parallel solver scales almost linearly with the problem size for three-dimensionalmore » problems, as opposed to the quadratic scalability of many existing sparse direct solvers. Moreover, our solver leads to almost constant numbers of iterations, when used as a preconditioner for Poisson problems. On more than one processor, our algorithm has significant speedups compared to sequential runs. With this parallel algorithm, we are able to solve large problems much faster than many existing packages as demonstrated by the numerical experiments.« less

  2. Optical fringe-reflection deflectometry with sparse representation

    NASA Astrophysics Data System (ADS)

    Xiao, Yong-Liang; Li, Sikun; Zhang, Qican; Zhong, Jianxin; Su, Xianyu; You, Zhisheng

    2018-05-01

    Optical fringe-reflection deflectometry is a surprisingly attractive scratch detection technique for specular surfaces owing to its unparalleled local sensibility. Full-field surface topography is obtained from a measured normal field using gradient integration. However, there may not be an ideal measured gradient field for deflectometry reconstruction in practice. Both the non-integrability condition and various kinds of image noise distributions, which are present in the indirect measured gradient field, may lead to ambiguity about the scratches on specular surfaces. In order to reduce misjudgment of scratches, sparse representation is introduced into the Southwell curl equation for deflectometry. The curl can be represented as a linear combination of the given redundant dictionary for curl and the sparsest solution for gradient refinement. The non-integrability condition and noise permutation can be overcome with sparse representation for gradient refinement. Numerical simulations demonstrate that the accuracy rate of judgment of scratches can be enhanced with sparse representation compared to the standard least-squares integration. Preliminary experiments are performed with the application of practical measured deflectometric data to verify the validity of the algorithm.

  3. SU-E-J-212: Identifying Bones From MRI: A Dictionary Learnign and Sparse Regression Approach

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Ruan, D; Yang, Y; Cao, M

    2014-06-01

    Purpose: To develop an efficient and robust scheme to identify bony anatomy based on MRI-only simulation images. Methods: MRI offers important soft tissue contrast and functional information, yet its lack of correlation to electron-density has placed it as an auxiliary modality to CT in radiotherapy simulation and adaptation. An effective scheme to identify bony anatomy is an important first step towards MR-only simulation/treatment paradigm and would satisfy most practical purposes. We utilize a UTE acquisition sequence to achieve visibility of the bone. By contrast to manual + bulk or registration-to identify bones, we propose a novel learning-based approach for improvedmore » robustness to MR artefacts and environmental changes. Specifically, local information is encoded with MR image patch, and the corresponding label is extracted (during training) from simulation CT aligned to the UTE. Within each class (bone vs. nonbone), an overcomplete dictionary is learned so that typical patches within the proper class can be represented as a sparse combination of the dictionary entries. For testing, an acquired UTE-MRI is divided to patches using a sliding scheme, where each patch is sparsely regressed against both bone and nonbone dictionaries, and subsequently claimed to be associated with the class with the smaller residual. Results: The proposed method has been applied to the pilot site of brain imaging and it has showed general good performance, with dice similarity coefficient of greater than 0.9 in a crossvalidation study using 4 datasets. Importantly, it is robust towards consistent foreign objects (e.g., headset) and the artefacts relates to Gibbs and field heterogeneity. Conclusion: A learning perspective has been developed for inferring bone structures based on UTE MRI. The imaging setting is subject to minimal motion effects and the post-processing is efficient. The improved efficiency and robustness enables a first translation to MR-only routine. The scheme generalizes to multiple tissue classes.« less

  4. Using a multifrontal sparse solver in a high performance, finite element code

    NASA Technical Reports Server (NTRS)

    King, Scott D.; Lucas, Robert; Raefsky, Arthur

    1990-01-01

    We consider the performance of the finite element method on a vector supercomputer. The computationally intensive parts of the finite element method are typically the individual element forms and the solution of the global stiffness matrix both of which are vectorized in high performance codes. To further increase throughput, new algorithms are needed. We compare a multifrontal sparse solver to a traditional skyline solver in a finite element code on a vector supercomputer. The multifrontal solver uses the Multiple-Minimum Degree reordering heuristic to reduce the number of operations required to factor a sparse matrix and full matrix computational kernels (e.g., BLAS3) to enhance vector performance. The net result in an order-of-magnitude reduction in run time for a finite element application on one processor of a Cray X-MP.

  5. The application of a sparse, distributed memory to the detection, identification and manipulation of physical objects

    NASA Technical Reports Server (NTRS)

    Kanerva, P.

    1986-01-01

    To determine the relation of the sparse, distributed memory to other architectures, a broad review of the literature was made. The memory is called a pattern memory because they work with large patterns of features (high-dimensional vectors). A pattern is stored in a pattern memory by distributing it over a large number of storage elements and by superimposing it over other stored patterns. A pattern is retrieved by mathematical or statistical reconstruction from the distributed elements. Three pattern memories are discussed.

  6. A new scheduling algorithm for parallel sparse LU factorization with static pivoting

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Grigori, Laura; Li, Xiaoye S.

    2002-08-20

    In this paper we present a static scheduling algorithm for parallel sparse LU factorization with static pivoting. The algorithm is divided into mapping and scheduling phases, using the symmetric pruned graphs of L' and U to represent dependencies. The scheduling algorithm is designed for driving the parallel execution of the factorization on a distributed-memory architecture. Experimental results and comparisons with SuperLU{_}DIST are reported after applying this algorithm on real world application matrices on an IBM SP RS/6000 distributed memory machine.

  7. Subject-Specific Sparse Dictionary Learning for Atlas-Based Brain MRI Segmentation.

    PubMed

    Roy, Snehashis; He, Qing; Sweeney, Elizabeth; Carass, Aaron; Reich, Daniel S; Prince, Jerry L; Pham, Dzung L

    2015-09-01

    Quantitative measurements from segmentations of human brain magnetic resonance (MR) images provide important biomarkers for normal aging and disease progression. In this paper, we propose a patch-based tissue classification method from MR images that uses a sparse dictionary learning approach and atlas priors. Training data for the method consists of an atlas MR image, prior information maps depicting where different tissues are expected to be located, and a hard segmentation. Unlike most atlas-based classification methods that require deformable registration of the atlas priors to the subject, only affine registration is required between the subject and training atlas. A subject-specific patch dictionary is created by learning relevant patches from the atlas. Then the subject patches are modeled as sparse combinations of learned atlas patches leading to tissue memberships at each voxel. The combination of prior information in an example-based framework enables us to distinguish tissues having similar intensities but different spatial locations. We demonstrate the efficacy of the approach on the application of whole-brain tissue segmentation in subjects with healthy anatomy and normal pressure hydrocephalus, as well as lesion segmentation in multiple sclerosis patients. For each application, quantitative comparisons are made against publicly available state-of-the art approaches.

  8. Direct and simultaneous estimation of cardiac four chamber volumes by multioutput sparse regression.

    PubMed

    Zhen, Xiantong; Zhang, Heye; Islam, Ali; Bhaduri, Mousumi; Chan, Ian; Li, Shuo

    2017-02-01

    Cardiac four-chamber volume estimation serves as a fundamental and crucial role in clinical quantitative analysis of whole heart functions. It is a challenging task due to the huge complexity of the four chambers including great appearance variations, huge shape deformation and interference between chambers. Direct estimation has recently emerged as an effective and convenient tool for cardiac ventricular volume estimation. However, existing direct estimation methods were specifically developed for one single ventricle, i.e., left ventricle (LV), or bi-ventricles; they can not be directly used for four chamber volume estimation due to the great combinatorial variability and highly complex anatomical interdependency of the four chambers. In this paper, we propose a new, general framework for direct and simultaneous four chamber volume estimation. We have addressed two key issues, i.e., cardiac image representation and simultaneous four chamber volume estimation, which enables accurate and efficient four-chamber volume estimation. We generate compact and discriminative image representations by supervised descriptor learning (SDL) which can remove irrelevant information and extract discriminative features. We propose direct and simultaneous four-chamber volume estimation by the multioutput sparse latent regression (MSLR), which enables jointly modeling nonlinear input-output relationships and capturing four-chamber interdependence. The proposed method is highly generalized, independent of imaging modalities, which provides a general regression framework that can be extensively used for clinical data prediction to achieve automated diagnosis. Experiments on both MR and CT images show that our method achieves high performance with a correlation coefficient of up to 0.921 with ground truth obtained manually by human experts, which is clinically significant and enables more accurate, convenient and comprehensive assessment of cardiac functions. Copyright © 2016 Elsevier B.V. All rights reserved.

  9. Hierarchical Bayesian modelling of mobility metrics for hazard model input calibration

    NASA Astrophysics Data System (ADS)

    Calder, Eliza; Ogburn, Sarah; Spiller, Elaine; Rutarindwa, Regis; Berger, Jim

    2015-04-01

    In this work we present a method to constrain flow mobility input parameters for pyroclastic flow models using hierarchical Bayes modeling of standard mobility metrics such as H/L and flow volume etc. The advantage of hierarchical modeling is that it can leverage the information in global dataset for a particular mobility metric in order to reduce the uncertainty in modeling of an individual volcano, especially important where individual volcanoes have only sparse datasets. We use compiled pyroclastic flow runout data from Colima, Merapi, Soufriere Hills, Unzen and Semeru volcanoes, presented in an open-source database FlowDat (https://vhub.org/groups/massflowdatabase). While the exact relationship between flow volume and friction varies somewhat between volcanoes, dome collapse flows originating from the same volcano exhibit similar mobility relationships. Instead of fitting separate regression models for each volcano dataset, we use a variation of the hierarchical linear model (Kass and Steffey, 1989). The model presents a hierarchical structure with two levels; all dome collapse flows and dome collapse flows at specific volcanoes. The hierarchical model allows us to assume that the flows at specific volcanoes share a common distribution of regression slopes, then solves for that distribution. We present comparisons of the 95% confidence intervals on the individual regression lines for the data set from each volcano as well as those obtained from the hierarchical model. The results clearly demonstrate the advantage of considering global datasets using this technique. The technique developed is demonstrated here for mobility metrics, but can be applied to many other global datasets of volcanic parameters. In particular, such methods can provide a means to better contain parameters for volcanoes for which we only have sparse data, a ubiquitous problem in volcanology.

  10. Catchments as non-linear filters: evaluating data-driven approaches for spatio-temporal predictions in ungauged basins

    NASA Astrophysics Data System (ADS)

    Bellugi, D. G.; Tennant, C.; Larsen, L.

    2016-12-01

    Catchment and climate heterogeneity complicate prediction of runoff across time and space, and resulting parameter uncertainty can lead to large accumulated errors in hydrologic models, particularly in ungauged basins. Recently, data-driven modeling approaches have been shown to avoid the accumulated uncertainty associated with many physically-based models, providing an appealing alternative for hydrologic prediction. However, the effectiveness of different methods in hydrologically and geomorphically distinct catchments, and the robustness of these methods to changing climate and changing hydrologic processes remain to be tested. Here, we evaluate the use of machine learning techniques to predict daily runoff across time and space using only essential climatic forcing (e.g. precipitation, temperature, and potential evapotranspiration) time series as model input. Model training and testing was done using a high quality dataset of daily runoff and climate forcing data for 25+ years for 600+ minimally-disturbed catchments (drainage area range 5-25,000 km2, median size 336 km2) that cover a wide range of climatic and physical characteristics. Preliminary results using Support Vector Regression (SVR) suggest that in some catchments this nonlinear-based regression technique can accurately predict daily runoff, while the same approach fails in other catchments, indicating that the representation of climate inputs and/or catchment filter characteristics in the model structure need further refinement to increase performance. We bolster this analysis by using Sparse Identification of Nonlinear Dynamics (a sparse symbolic regression technique) to uncover the governing equations that describe runoff processes in catchments where SVR performed well and for ones where it performed poorly, thereby enabling inference about governing processes. This provides a robust means of examining how catchment complexity influences runoff prediction skill, and represents a contribution towards the integration of data-driven inference and physically-based models.

  11. Interhemispheric comparison of atmospheric circulation features as evaluated from NIMBUS satellite data

    NASA Technical Reports Server (NTRS)

    Reiter, E. R.; Vonderhaar, T. H.; Lovill, J. E.; Adler, R.; Srivatsangam, S.; Abbey, R.

    1971-01-01

    Findings are presented for IRIS data from NIMBUS 3 in mapping the global ozone distribution. The seasonal and regional variations of ozone, especially in the Southern Hemisphere, reveal features that were not evident from the sparse ground-based ozone observation network in this hemisphere. A regression analysis was undertaken for temperature and height fields on radiance data. Spectrum analyses of upper wind data from the North American section and Australia were completed.

  12. Semi-Supervised Clustering for High-Dimensional and Sparse Features

    ERIC Educational Resources Information Center

    Yan, Su

    2010-01-01

    Clustering is one of the most common data mining tasks, used frequently for data organization and analysis in various application domains. Traditional machine learning approaches to clustering are fully automated and unsupervised where class labels are unknown a priori. In real application domains, however, some "weak" form of side…

  13. Feature selection using probabilistic prediction of support vector regression.

    PubMed

    Yang, Jian-Bo; Ong, Chong-Jin

    2011-06-01

    This paper presents a new wrapper-based feature selection method for support vector regression (SVR) using its probabilistic predictions. The method computes the importance of a feature by aggregating the difference, over the feature space, of the conditional density functions of the SVR prediction with and without the feature. As the exact computation of this importance measure is expensive, two approximations are proposed. The effectiveness of the measure using these approximations, in comparison to several other existing feature selection methods for SVR, is evaluated on both artificial and real-world problems. The result of the experiments show that the proposed method generally performs better than, or at least as well as, the existing methods, with notable advantage when the dataset is sparse.

  14. Two-level structural sparsity regularization for identifying lattices and defects in noisy images

    DOE PAGES

    Li, Xin; Belianinov, Alex; Dyck, Ondrej E.; ...

    2018-03-09

    Here, this paper presents a regularized regression model with a two-level structural sparsity penalty applied to locate individual atoms in a noisy scanning transmission electron microscopy image (STEM). In crystals, the locations of atoms is symmetric, condensed into a few lattice groups. Therefore, by identifying the underlying lattice in a given image, individual atoms can be accurately located. We propose to formulate the identification of the lattice groups as a sparse group selection problem. Furthermore, real atomic scale images contain defects and vacancies, so atomic identification based solely on a lattice group may result in false positives and false negatives.more » To minimize error, model includes an individual sparsity regularization in addition to the group sparsity for a within-group selection, which results in a regression model with a two-level sparsity regularization. We propose a modification of the group orthogonal matching pursuit (gOMP) algorithm with a thresholding step to solve the atom finding problem. The convergence and statistical analyses of the proposed algorithm are presented. The proposed algorithm is also evaluated through numerical experiments with simulated images. The applicability of the algorithm on determination of atom structures and identification of imaging distortions and atomic defects was demonstrated using three real STEM images. In conclusion, we believe this is an important step toward automatic phase identification and assignment with the advent of genomic databases for materials.« less

  15. Two-level structural sparsity regularization for identifying lattices and defects in noisy images

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Li, Xin; Belianinov, Alex; Dyck, Ondrej E.

    Here, this paper presents a regularized regression model with a two-level structural sparsity penalty applied to locate individual atoms in a noisy scanning transmission electron microscopy image (STEM). In crystals, the locations of atoms is symmetric, condensed into a few lattice groups. Therefore, by identifying the underlying lattice in a given image, individual atoms can be accurately located. We propose to formulate the identification of the lattice groups as a sparse group selection problem. Furthermore, real atomic scale images contain defects and vacancies, so atomic identification based solely on a lattice group may result in false positives and false negatives.more » To minimize error, model includes an individual sparsity regularization in addition to the group sparsity for a within-group selection, which results in a regression model with a two-level sparsity regularization. We propose a modification of the group orthogonal matching pursuit (gOMP) algorithm with a thresholding step to solve the atom finding problem. The convergence and statistical analyses of the proposed algorithm are presented. The proposed algorithm is also evaluated through numerical experiments with simulated images. The applicability of the algorithm on determination of atom structures and identification of imaging distortions and atomic defects was demonstrated using three real STEM images. In conclusion, we believe this is an important step toward automatic phase identification and assignment with the advent of genomic databases for materials.« less

  16. Mixing in 3D Sparse Multi-Scale Grid Generated Turbulence

    NASA Astrophysics Data System (ADS)

    Usama, Syed; Kopec, Jacek; Tellez, Jackson; Kwiatkowski, Kamil; Redondo, Jose; Malik, Nadeem

    2017-04-01

    Flat 2D fractal grids are known to alter turbulence characteristics downstream of the grid as compared to the regular grids with the same blockage ratio and the same mass inflow rates [1]. This has excited interest in the turbulence community for possible exploitation for enhanced mixing and related applications. Recently, a new 3D multi-scale grid design has been proposed [2] such that each generation of length scale of turbulence grid elements is held in its own frame, the overall effect is a 3D co-planar arrangement of grid elements. This produces a 'sparse' grid system whereby each generation of grid elements produces a turbulent wake pattern that interacts with the other wake patterns downstream. A critical motivation here is that the effective blockage ratio in the 3D Sparse Grid Turbulence (3DSGT) design is significantly lower than in the flat 2D counterpart - typically the blockage ratio could be reduced from say 20% in 2D down to 4% in the 3DSGT. If this idea can be realized in practice, it could potentially greatly enhance the efficiency of turbulent mixing and transfer processes clearly having many possible applications. Work has begun on the 3DSGT experimentally using Surface Flow Image Velocimetry (SFIV) [3] at the European facility in the Max Planck Institute for Dynamics and Self-Organization located in Gottingen, Germany and also at the Technical University of Catalonia (UPC) in Spain, and numerically using Direct Numerical Simulation (DNS) at King Fahd University of Petroleum & Minerals (KFUPM) in Saudi Arabia and in University of Warsaw in Poland. DNS is the most useful method to compare the experimental results with, and we are studying different types of codes such as Imcompact3d, and OpenFoam. Many variables will eventually be investigated for optimal mixing conditions. For example, the number of scale generations, the spacing between frames, the size ratio of grid elements, inflow conditions, etc. We will report upon the first set of findings from the 3DSGT by the time of the conference. {Acknowledgements}: This work has been supported partly by the EuHIT grant, 'Turbulence Generated by Sparse 3D Multi-Scale Grid (M3SG)', 2017. {References} [1] S. Laizet, J. C. Vassilicos. DNS of Fractal-Generated Turbulence. Flow Turbulence Combust 87:673705, (2011). [2] N. A. Malik. Sparse 3D Multi-Scale Grid Turbulence Generator. USPTO Application no. 14/710,531, Patent Pending, (2015). [3] J. Tellez, M. Gomez, B. Russo, J.M. Redondo. Surface Flow Image Velocimetry (SFIV) for hydraulics applications. 18th Int. Symposium on the Application of Laser Imaging Techniques in Fluid Mechanics, Lisbon, Portugal (2016).

  17. The application of compressed sensing to long-term acoustic emission-based structural health monitoring

    NASA Astrophysics Data System (ADS)

    Cattaneo, Alessandro; Park, Gyuhae; Farrar, Charles; Mascareñas, David

    2012-04-01

    The acoustic emission (AE) phenomena generated by a rapid release in the internal stress of a material represent a promising technique for structural health monitoring (SHM) applications. AE events typically result in a discrete number of short-time, transient signals. The challenge associated with capturing these events using classical techniques is that very high sampling rates must be used over extended periods of time. The result is that a very large amount of data is collected to capture a phenomenon that rarely occurs. Furthermore, the high energy consumption associated with the required high sampling rates makes the implementation of high-endurance, low-power, embedded AE sensor nodes difficult to achieve. The relatively rare occurrence of AE events over long time scales implies that these measurements are inherently sparse in the spike domain. The sparse nature of AE measurements makes them an attractive candidate for the application of compressed sampling techniques. Collecting compressed measurements of sparse AE signals will relax the requirements on the sampling rate and memory demands. The focus of this work is to investigate the suitability of compressed sensing techniques for AE-based SHM. The work explores estimating AE signal statistics in the compressed domain for low-power classification applications. In the event compressed classification finds an event of interest, ι1 norm minimization will be used to reconstruct the measurement for further analysis. The impact of structured noise on compressive measurements is specifically addressed. The suitability of a particular algorithm, called Justice Pursuit, to increase robustness to a small amount of arbitrary measurement corruption is investigated.

  18. A prospective study of mandibular trabecular bone to predict fracture incidence in women: a low-cost screening tool in the dental clinic.

    PubMed

    Jonasson, Grethe; Sundh, Valter; Ahlqwist, Margareta; Hakeberg, Magnus; Björkelund, Cecilia; Lissner, Lauren

    2011-10-01

    Bone structure is the key to the understanding of fracture risk. The hypothesis tested in this prospective study is that dense mandibular trabeculation predicts low fracture risk, whereas sparse trabeculation is predictive of high fracture risk. Out of 731 women from the Prospective Population Study of Women in Gothenburg with dental examinations at baseline 1968, 222 had their first fracture in the follow-up period until 2006. Mandibular trabeculation was defined as dense, mixed dense plus sparse, and sparse based on panoramic radiographs from 1968 and/or 1980. Time to fracture was ascertained and used as the dependent variable in three Cox proportional hazards regression analyses. The first analysis covered 12 years of follow-up with self-reported endpoints; the second covered 26 years of follow-up with hospital verified endpoints; and the third combined the two follow-up periods, totaling 38 years. Mandibular trabeculation was the main independent variable predicting incident fractures, with age, physical activity, alcohol consumption and body mass index as covariates. The Kaplan-Meier curve indicated a graded association between trabecular density and fracture risk. During the whole period covered, the hazard ratio of future fracture for sparse trabeculation compared to mixed trabeculation was 2.9 (95% CI: 2.2-3.8, p<0.0001), and for dense versus mixed trabeculation was 0.21 (95% CI: 0.1-0.4, p<0.0001). The trabecular pattern was a highly significant predictor of future fracture risk. Our findings imply that dentists, using ordinary dental radiographs, can identify women at high risk for future fractures at 38-54 years of age, often long before the first fracture occurs. Copyright © 2011 Elsevier Inc. All rights reserved.

  19. Ultra-low dose quantitative CT myocardial perfusion imaging with sparse-view dynamic acquisition and image reconstruction: A feasibility study.

    PubMed

    Enjilela, Esmaeil; Lee, Ting-Yim; Hsieh, Jiang; Wisenberg, Gerald; Teefy, Patrick; Yadegari, Andrew; Bagur, Rodrigo; Islam, Ali; Branch, Kelley; So, Aaron

    2018-03-01

    We implemented and validated a compressed sensing (CS) based algorithm for reconstructing dynamic contrast-enhanced (DCE) CT images of the heart from sparsely sampled X-ray projections. DCE CT imaging of the heart was performed on five normal and ischemic pigs after contrast injection. DCE images were reconstructed with filtered backprojection (FBP) and CS from all projections (984-view) and 1/3 of all projections (328-view), and with CS from 1/4 of all projections (246-view). Myocardial perfusion (MP) measurements with each protocol were compared to those with the reference 984-view FBP protocol. Both the 984-view CS and 328-view CS protocols were in good agreements with the reference protocol. The Pearson correlation coefficients of 984-view CS and 328-view CS determined from linear regression analyses were 0.98 and 0.99 respectively. The corresponding mean biases of MP measurement determined from Bland-Altman analyses were 2.7 and 1.2ml/min/100g. When only 328 projections were used for image reconstruction, CS was more accurate than FBP for MP measurement with respect to 984-view FBP. However, CS failed to generate MP maps comparable to those with 984-view FBP when only 246 projections were used for image reconstruction. DCE heart images reconstructed from one-third of a full projection set with CS were minimally affected by aliasing artifacts, leading to accurate MP measurements with the effective dose reduced to just 33% of conventional full-view FBP method. The proposed CS sparse-view image reconstruction method could facilitate the implementation of sparse-view dynamic acquisition for ultra-low dose CT MP imaging. Copyright © 2017 Elsevier B.V. All rights reserved.

  20. Analysis, tuning and comparison of two general sparse solvers for distributed memory computers

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Amestoy, P.R.; Duff, I.S.; L'Excellent, J.-Y.

    2000-06-30

    We describe the work performed in the context of a Franco-Berkeley funded project between NERSC-LBNL located in Berkeley (USA) and CERFACS-ENSEEIHT located in Toulouse (France). We discuss both the tuning and performance analysis of two distributed memory sparse solvers (superlu from Berkeley and mumps from Toulouse) on the 512 processor Cray T3E from NERSC (Lawrence Berkeley National Laboratory). This project gave us the opportunity to improve the algorithms and add new features to the codes. We then quite extensively analyze and compare the two approaches on a set of large problems from real applications. We further explain the main differencesmore » in the behavior of the approaches on artificial regular grid problems. As a conclusion to this activity report, we mention a set of parallel sparse solvers on which this type of study should be extended.« less

  1. The application of nonlinear programming and collocation to optimal aeroassisted orbital transfers

    NASA Astrophysics Data System (ADS)

    Shi, Y. Y.; Nelson, R. L.; Young, D. H.; Gill, P. E.; Murray, W.; Saunders, M. A.

    1992-01-01

    Sequential quadratic programming (SQP) and collocation of the differential equations of motion were applied to optimal aeroassisted orbital transfers. The Optimal Trajectory by Implicit Simulation (OTIS) computer program codes with updated nonlinear programming code (NZSOL) were used as a testbed for the SQP nonlinear programming (NLP) algorithms. The state-of-the-art sparse SQP method is considered to be effective for solving large problems with a sparse matrix. Sparse optimizers are characterized in terms of memory requirements and computational efficiency. For the OTIS problems, less than 10 percent of the Jacobian matrix elements are nonzero. The SQP method encompasses two phases: finding an initial feasible point by minimizing the sum of infeasibilities and minimizing the quadratic objective function within the feasible region. The orbital transfer problem under consideration involves the transfer from a high energy orbit to a low energy orbit.

  2. Incoherent dictionary learning for reducing crosstalk noise in least-squares reverse time migration

    NASA Astrophysics Data System (ADS)

    Wu, Juan; Bai, Min

    2018-05-01

    We propose to apply a novel incoherent dictionary learning (IDL) algorithm for regularizing the least-squares inversion in seismic imaging. The IDL is proposed to overcome the drawback of traditional dictionary learning algorithm in losing partial texture information. Firstly, the noisy image is divided into overlapped image patches, and some random patches are extracted for dictionary learning. Then, we apply the IDL technology to minimize the coherency between atoms during dictionary learning. Finally, the sparse representation problem is solved by a sparse coding algorithm, and image is restored by those sparse coefficients. By reducing the correlation among atoms, it is possible to preserve most of the small-scale features in the image while removing much of the long-wavelength noise. The application of the IDL method to regularization of seismic images from least-squares reverse time migration shows successful performance.

  3. Multiple sclerosis lesion segmentation using dictionary learning and sparse coding.

    PubMed

    Weiss, Nick; Rueckert, Daniel; Rao, Anil

    2013-01-01

    The segmentation of lesions in the brain during the development of Multiple Sclerosis is part of the diagnostic assessment for this disease and gives information on its current severity. This laborious process is still carried out in a manual or semiautomatic fashion by clinicians because published automatic approaches have not been universal enough to be widely employed in clinical practice. Thus Multiple Sclerosis lesion segmentation remains an open problem. In this paper we present a new unsupervised approach addressing this problem with dictionary learning and sparse coding methods. We show its general applicability to the problem of lesion segmentation by evaluating our approach on synthetic and clinical image data and comparing it to state-of-the-art methods. Furthermore the potential of using dictionary learning and sparse coding for such segmentation tasks is investigated and various possibilities for further experiments are discussed.

  4. Sparse Covariance Matrix Estimation by DCA-Based Algorithms.

    PubMed

    Phan, Duy Nhat; Le Thi, Hoai An; Dinh, Tao Pham

    2017-11-01

    This letter proposes a novel approach using the [Formula: see text]-norm regularization for the sparse covariance matrix estimation (SCME) problem. The objective function of SCME problem is composed of a nonconvex part and the [Formula: see text] term, which is discontinuous and difficult to tackle. Appropriate DC (difference of convex functions) approximations of [Formula: see text]-norm are used that result in approximation SCME problems that are still nonconvex. DC programming and DCA (DC algorithm), powerful tools in nonconvex programming framework, are investigated. Two DC formulations are proposed and corresponding DCA schemes developed. Two applications of the SCME problem that are considered are classification via sparse quadratic discriminant analysis and portfolio optimization. A careful empirical experiment is performed through simulated and real data sets to study the performance of the proposed algorithms. Numerical results showed their efficiency and their superiority compared with seven state-of-the-art methods.

  5. MO-FG-204-08: Optimization-Based Image Reconstruction From Unevenly Distributed Sparse Projection Views

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Xie, Huiqiao; Yang, Yi; Tang, Xiangyang

    2015-06-15

    Purpose: Optimization-based reconstruction has been proposed and investigated for reconstructing CT images from sparse views, as such the radiation dose can be substantially reduced while maintaining acceptable image quality. The investigation has so far focused on reconstruction from evenly distributed sparse views. Recognizing the clinical situations wherein only unevenly sparse views are available, e.g., image guided radiation therapy, CT perfusion and multi-cycle cardiovascular imaging, we investigate the performance of optimization-based image reconstruction from unevenly sparse projection views in this work. Methods: The investigation is carried out using the FORBILD and an anthropomorphic head phantoms. In the study, 82 views, whichmore » are evenly sorted out from a full (360°) axial CT scan consisting of 984 views, form sub-scan I. Another 82 views are sorted out in a similar manner to form sub-scan II. As such, a CT scan with sparse (164) views at 1:6 ratio are formed. By shifting the two sub-scans relatively in view angulation, a CT scan with unevenly distributed sparse (164) views at 1:6 ratio are formed. An optimization-based method is implemented to reconstruct images from the unevenly distributed views. By taking the FBP reconstruction from the full scan (984 views) as the reference, the root mean square (RMS) between the reference and the optimization-based reconstruction is used to evaluate the performance quantitatively. Results: In visual inspection, the optimization-based method outperforms the FBP substantially in the reconstruction from unevenly distributed, which are quantitatively verified by the RMS gauged globally and in ROIs in both the FORBILD and anthropomorphic head phantoms. The RMS increases with increasing severity in the uneven angular distribution, especially in the case of anthropomorphic head phantom. Conclusion: The optimization-based image reconstruction can save radiation dose up to 12-fold while providing acceptable image quality for advanced clinical applications wherein only unevenly distributed sparse views are available. Research Grants: W81XWH-12-1-0138 (DoD), Sinovision Technologies.« less

  6. Sparse modeling applied to patient identification for safety in medical physics applications

    NASA Astrophysics Data System (ADS)

    Lewkowitz, Stephanie

    Every scheduled treatment at a radiation therapy clinic involves a series of safety protocol to ensure the utmost patient care. Despite safety protocol, on a rare occasion an entirely preventable medical event, an accident, may occur. Delivering a treatment plan to the wrong patient is preventable, yet still is a clinically documented error. This research describes a computational method to identify patients with a novel machine learning technique to combat misadministration. The patient identification program stores face and fingerprint data for each patient. New, unlabeled data from those patients are categorized according to the library. The categorization of data by this face-fingerprint detector is accomplished with new machine learning algorithms based on Sparse Modeling that have already begun transforming the foundation of Computer Vision. Previous patient recognition software required special subroutines for faces and different tailored subroutines for fingerprints. In this research, the same exact model is used for both fingerprints and faces, without any additional subroutines and even without adjusting the two hyperparameters. Sparse modeling is a powerful tool, already shown utility in the areas of super-resolution, denoising, inpainting, demosaicing, and sub-nyquist sampling, i.e. compressed sensing. Sparse Modeling is possible because natural images are inherently sparse in some bases, due to their inherent structure. This research chooses datasets of face and fingerprint images to test the patient identification model. The model stores the images of each dataset as a basis (library). One image at a time is removed from the library, and is classified by a sparse code in terms of the remaining library. The Locally Competitive Algorithm, a truly neural inspired Artificial Neural Network, solves the computationally difficult task of finding the sparse code for the test image. The components of the sparse representation vector are summed by ℓ1 pooling, and correct patient identification is consistently achieved 100% over 1000 trials, when either the face data or fingerprint data are implemented as a classification basis. The algorithm gets 100% classification when faces and fingerprints are concatenated into multimodal datasets. This suggests that 100% patient identification will be achievable in the clinal setting.

  7. Learning Low-Rank Class-Specific Dictionary and Sparse Intra-Class Variant Dictionary for Face Recognition.

    PubMed

    Tang, Xin; Feng, Guo-Can; Li, Xiao-Xin; Cai, Jia-Xin

    2015-01-01

    Face recognition is challenging especially when the images from different persons are similar to each other due to variations in illumination, expression, and occlusion. If we have sufficient training images of each person which can span the facial variations of that person under testing conditions, sparse representation based classification (SRC) achieves very promising results. However, in many applications, face recognition often encounters the small sample size problem arising from the small number of available training images for each person. In this paper, we present a novel face recognition framework by utilizing low-rank and sparse error matrix decomposition, and sparse coding techniques (LRSE+SC). Firstly, the low-rank matrix recovery technique is applied to decompose the face images per class into a low-rank matrix and a sparse error matrix. The low-rank matrix of each individual is a class-specific dictionary and it captures the discriminative feature of this individual. The sparse error matrix represents the intra-class variations, such as illumination, expression changes. Secondly, we combine the low-rank part (representative basis) of each person into a supervised dictionary and integrate all the sparse error matrix of each individual into a within-individual variant dictionary which can be applied to represent the possible variations between the testing and training images. Then these two dictionaries are used to code the query image. The within-individual variant dictionary can be shared by all the subjects and only contribute to explain the lighting conditions, expressions, and occlusions of the query image rather than discrimination. At last, a reconstruction-based scheme is adopted for face recognition. Since the within-individual dictionary is introduced, LRSE+SC can handle the problem of the corrupted training data and the situation that not all subjects have enough samples for training. Experimental results show that our method achieves the state-of-the-art results on AR, FERET, FRGC and LFW databases.

  8. Learning Low-Rank Class-Specific Dictionary and Sparse Intra-Class Variant Dictionary for Face Recognition

    PubMed Central

    Tang, Xin; Feng, Guo-can; Li, Xiao-xin; Cai, Jia-xin

    2015-01-01

    Face recognition is challenging especially when the images from different persons are similar to each other due to variations in illumination, expression, and occlusion. If we have sufficient training images of each person which can span the facial variations of that person under testing conditions, sparse representation based classification (SRC) achieves very promising results. However, in many applications, face recognition often encounters the small sample size problem arising from the small number of available training images for each person. In this paper, we present a novel face recognition framework by utilizing low-rank and sparse error matrix decomposition, and sparse coding techniques (LRSE+SC). Firstly, the low-rank matrix recovery technique is applied to decompose the face images per class into a low-rank matrix and a sparse error matrix. The low-rank matrix of each individual is a class-specific dictionary and it captures the discriminative feature of this individual. The sparse error matrix represents the intra-class variations, such as illumination, expression changes. Secondly, we combine the low-rank part (representative basis) of each person into a supervised dictionary and integrate all the sparse error matrix of each individual into a within-individual variant dictionary which can be applied to represent the possible variations between the testing and training images. Then these two dictionaries are used to code the query image. The within-individual variant dictionary can be shared by all the subjects and only contribute to explain the lighting conditions, expressions, and occlusions of the query image rather than discrimination. At last, a reconstruction-based scheme is adopted for face recognition. Since the within-individual dictionary is introduced, LRSE+SC can handle the problem of the corrupted training data and the situation that not all subjects have enough samples for training. Experimental results show that our method achieves the state-of-the-art results on AR, FERET, FRGC and LFW databases. PMID:26571112

  9. Sparse Measurement Systems: Applications, Analysis, Algorithms and Design

    ERIC Educational Resources Information Center

    Narayanaswamy, Balakrishnan

    2011-01-01

    This thesis deals with "large-scale" detection problems that arise in many real world applications such as sensor networks, mapping with mobile robots and group testing for biological screening and drug discovery. These are problems where the values of a large number of inputs need to be inferred from noisy observations and where the…

  10. Progressive sample processing of band selection for hyperspectral imagery

    NASA Astrophysics Data System (ADS)

    Liu, Keng-Hao; Chien, Hung-Chang; Chen, Shih-Yu

    2017-10-01

    Band selection (BS) is one of the most important topics in hyperspectral image (HSI) processing. The objective of BS is to find a set of representative bands that can represent the whole image with lower inter-band redundancy. Many types of BS algorithms were proposed in the past. However, most of them can be carried on in an off-line manner. It means that they can only be implemented on the pre-collected data. Those off-line based methods are sometime useless for those applications that are timeliness, particular in disaster prevention and target detection. To tackle this issue, a new concept, called progressive sample processing (PSP), was proposed recently. The PSP is an "on-line" framework where the specific type of algorithm can process the currently collected data during the data transmission under band-interleavedby-sample/pixel (BIS/BIP) protocol. This paper proposes an online BS method that integrates a sparse-based BS into PSP framework, called PSP-BS. In PSP-BS, the BS can be carried out by updating BS result recursively pixel by pixel in the same way that a Kalman filter does for updating data information in a recursive fashion. The sparse regression is solved by orthogonal matching pursuit (OMP) algorithm, and the recursive equations of PSP-BS are derived by using matrix decomposition. The experiments conducted on a real hyperspectral image show that the PSP-BS can progressively output the BS status with very low computing time. The convergence of BS results during the transmission can be quickly achieved by using a rearranged pixel transmission sequence. This significant advantage allows BS to be implemented in a real time manner when the HSI data is transmitted pixel by pixel.

  11. Jaccard distance based weighted sparse representation for coarse-to-fine plant species recognition.

    PubMed

    Zhang, Shanwen; Wu, Xiaowei; You, Zhuhong

    2017-01-01

    Leaf based plant species recognition plays an important role in ecological protection, however its application to large and modern leaf databases has been a long-standing obstacle due to the computational cost and feasibility. Recognizing such limitations, we propose a Jaccard distance based sparse representation (JDSR) method which adopts a two-stage, coarse to fine strategy for plant species recognition. In the first stage, we use the Jaccard distance between the test sample and each training sample to coarsely determine the candidate classes of the test sample. The second stage includes a Jaccard distance based weighted sparse representation based classification(WSRC), which aims to approximately represent the test sample in the training space, and classify it by the approximation residuals. Since the training model of our JDSR method involves much fewer but more informative representatives, this method is expected to overcome the limitation of high computational and memory costs in traditional sparse representation based classification. Comparative experimental results on a public leaf image database demonstrate that the proposed method outperforms other existing feature extraction and SRC based plant recognition methods in terms of both accuracy and computational speed.

  12. Task Parallel Incomplete Cholesky Factorization using 2D Partitioned-Block Layout

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Kim, Kyungjoo; Rajamanickam, Sivasankaran; Stelle, George Widgery

    We introduce a task-parallel algorithm for sparse incomplete Cholesky factorization that utilizes a 2D sparse partitioned-block layout of a matrix. Our factorization algorithm follows the idea of algorithms-by-blocks by using the block layout. The algorithm-byblocks approach induces a task graph for the factorization. These tasks are inter-related to each other through their data dependences in the factorization algorithm. To process the tasks on various manycore architectures in a portable manner, we also present a portable tasking API that incorporates different tasking backends and device-specific features using an open-source framework for manycore platforms i.e., Kokkos. A performance evaluation is presented onmore » both Intel Sandybridge and Xeon Phi platforms for matrices from the University of Florida sparse matrix collection to illustrate merits of the proposed task-based factorization. Experimental results demonstrate that our task-parallel implementation delivers about 26.6x speedup (geometric mean) over single-threaded incomplete Choleskyby- blocks and 19.2x speedup over serial Cholesky performance which does not carry tasking overhead using 56 threads on the Intel Xeon Phi processor for sparse matrices arising from various application problems.« less

  13. Classification of Clouds in Satellite Imagery Using Adaptive Fuzzy Sparse Representation.

    PubMed

    Jin, Wei; Gong, Fei; Zeng, Xingbin; Fu, Randi

    2016-12-16

    Automatic cloud detection and classification using satellite cloud imagery have various meteorological applications such as weather forecasting and climate monitoring. Cloud pattern analysis is one of the research hotspots recently. Since satellites sense the clouds remotely from space, and different cloud types often overlap and convert into each other, there must be some fuzziness and uncertainty in satellite cloud imagery. Satellite observation is susceptible to noises, while traditional cloud classification methods are sensitive to noises and outliers; it is hard for traditional cloud classification methods to achieve reliable results. To deal with these problems, a satellite cloud classification method using adaptive fuzzy sparse representation-based classification (AFSRC) is proposed. Firstly, by defining adaptive parameters related to attenuation rate and critical membership, an improved fuzzy membership is introduced to accommodate the fuzziness and uncertainty of satellite cloud imagery; secondly, by effective combination of the improved fuzzy membership function and sparse representation-based classification (SRC), atoms in training dictionary are optimized; finally, an adaptive fuzzy sparse representation classifier for cloud classification is proposed. Experiment results on FY-2G satellite cloud image show that, the proposed method not only improves the accuracy of cloud classification, but also has strong stability and adaptability with high computational efficiency.

  14. Quadrature demodulation based circuit implementation of pulse stream for ultrasonic signal FRI sparse sampling

    NASA Astrophysics Data System (ADS)

    Shoupeng, Song; Zhou, Jiang

    2017-03-01

    Converting ultrasonic signal to ultrasonic pulse stream is the key step of finite rate of innovation (FRI) sparse sampling. At present, ultrasonic pulse-stream-forming techniques are mainly based on digital algorithms. No hardware circuit that can achieve it has been reported. This paper proposes a new quadrature demodulation (QD) based circuit implementation method for forming an ultrasonic pulse stream. Elaborating on FRI sparse sampling theory, the process of ultrasonic signal is explained, followed by a discussion and analysis of ultrasonic pulse-stream-forming methods. In contrast to ultrasonic signal envelope extracting techniques, a quadrature demodulation method (QDM) is proposed. Simulation experiments were performed to determine its performance at various signal-to-noise ratios (SNRs). The circuit was then designed, with mixing module, oscillator, low pass filter (LPF), and root of square sum module. Finally, application experiments were carried out on pipeline sample ultrasonic flaw testing. The experimental results indicate that the QDM can accurately convert ultrasonic signal to ultrasonic pulse stream, and reverse the original signal information, such as pulse width, amplitude, and time of arrival. This technique lays the foundation for ultrasonic signal FRI sparse sampling directly with hardware circuitry.

  15. Power Enhancement in High Dimensional Cross-Sectional Tests

    PubMed Central

    Fan, Jianqing; Liao, Yuan; Yao, Jiawei

    2016-01-01

    We propose a novel technique to boost the power of testing a high-dimensional vector H : θ = 0 against sparse alternatives where the null hypothesis is violated only by a couple of components. Existing tests based on quadratic forms such as the Wald statistic often suffer from low powers due to the accumulation of errors in estimating high-dimensional parameters. More powerful tests for sparse alternatives such as thresholding and extreme-value tests, on the other hand, require either stringent conditions or bootstrap to derive the null distribution and often suffer from size distortions due to the slow convergence. Based on a screening technique, we introduce a “power enhancement component”, which is zero under the null hypothesis with high probability, but diverges quickly under sparse alternatives. The proposed test statistic combines the power enhancement component with an asymptotically pivotal statistic, and strengthens the power under sparse alternatives. The null distribution does not require stringent regularity conditions, and is completely determined by that of the pivotal statistic. As specific applications, the proposed methods are applied to testing the factor pricing models and validating the cross-sectional independence in panel data models. PMID:26778846

  16. A Comprehensive Validation Methodology for Sparse Experimental Data

    NASA Technical Reports Server (NTRS)

    Norman, Ryan B.; Blattnig, Steve R.

    2010-01-01

    A comprehensive program of verification and validation has been undertaken to assess the applicability of models to space radiation shielding applications and to track progress as models are developed over time. The models are placed under configuration control, and automated validation tests are used so that comparisons can readily be made as models are improved. Though direct comparisons between theoretical results and experimental data are desired for validation purposes, such comparisons are not always possible due to lack of data. In this work, two uncertainty metrics are introduced that are suitable for validating theoretical models against sparse experimental databases. The nuclear physics models, NUCFRG2 and QMSFRG, are compared to an experimental database consisting of over 3600 experimental cross sections to demonstrate the applicability of the metrics. A cumulative uncertainty metric is applied to the question of overall model accuracy, while a metric based on the median uncertainty is used to analyze the models from the perspective of model development by analyzing subsets of the model parameter space.

  17. Discovering governing equations from data by sparse identification of nonlinear dynamical systems

    PubMed Central

    Brunton, Steven L.; Proctor, Joshua L.; Kutz, J. Nathan

    2016-01-01

    Extracting governing equations from data is a central challenge in many diverse areas of science and engineering. Data are abundant whereas models often remain elusive, as in climate science, neuroscience, ecology, finance, and epidemiology, to name only a few examples. In this work, we combine sparsity-promoting techniques and machine learning with nonlinear dynamical systems to discover governing equations from noisy measurement data. The only assumption about the structure of the model is that there are only a few important terms that govern the dynamics, so that the equations are sparse in the space of possible functions; this assumption holds for many physical systems in an appropriate basis. In particular, we use sparse regression to determine the fewest terms in the dynamic governing equations required to accurately represent the data. This results in parsimonious models that balance accuracy with model complexity to avoid overfitting. We demonstrate the algorithm on a wide range of problems, from simple canonical systems, including linear and nonlinear oscillators and the chaotic Lorenz system, to the fluid vortex shedding behind an obstacle. The fluid example illustrates the ability of this method to discover the underlying dynamics of a system that took experts in the community nearly 30 years to resolve. We also show that this method generalizes to parameterized systems and systems that are time-varying or have external forcing. PMID:27035946

  18. Discovering governing equations from data by sparse identification of nonlinear dynamical systems.

    PubMed

    Brunton, Steven L; Proctor, Joshua L; Kutz, J Nathan

    2016-04-12

    Extracting governing equations from data is a central challenge in many diverse areas of science and engineering. Data are abundant whereas models often remain elusive, as in climate science, neuroscience, ecology, finance, and epidemiology, to name only a few examples. In this work, we combine sparsity-promoting techniques and machine learning with nonlinear dynamical systems to discover governing equations from noisy measurement data. The only assumption about the structure of the model is that there are only a few important terms that govern the dynamics, so that the equations are sparse in the space of possible functions; this assumption holds for many physical systems in an appropriate basis. In particular, we use sparse regression to determine the fewest terms in the dynamic governing equations required to accurately represent the data. This results in parsimonious models that balance accuracy with model complexity to avoid overfitting. We demonstrate the algorithm on a wide range of problems, from simple canonical systems, including linear and nonlinear oscillators and the chaotic Lorenz system, to the fluid vortex shedding behind an obstacle. The fluid example illustrates the ability of this method to discover the underlying dynamics of a system that took experts in the community nearly 30 years to resolve. We also show that this method generalizes to parameterized systems and systems that are time-varying or have external forcing.

  19. sparse-msrf:A package for sparse modeling and estimation of fossil-fuel CO2 emission fields

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    2014-10-06

    The software is used to fit models of emission fields (e.g., fossil-fuel CO2 emissions) to sparse measurements of gaseous concentrations. Its primary aim is to provide an implementation and a demonstration for the algorithms and models developed in J. Ray, V. Yadav, A. M. Michalak, B. van Bloemen Waanders and S. A. McKenna, "A multiresolution spatial parameterization for the estimation of fossil-fuel carbon dioxide emissions via atmospheric inversions", accepted, Geoscientific Model Development, 2014. The software can be used to estimate emissions of non-reactive gases such as fossil-fuel CO2, methane etc. The software uses a proxy of the emission field beingmore » estimated (e.g., for fossil-fuel CO2, a population density map is a good proxy) to construct a wavelet model for the emission field. It then uses a shrinkage regression algorithm called Stagewise Orthogonal Matching Pursuit (StOMP) to fit the wavelet model to concentration measurements, using an atmospheric transport model to relate emission and concentration fields. Algorithmic novelties described in the paper above (1) ensure that the estimated emission fields are non-negative, (2) allow the use of guesses for emission fields to accelerate the estimation processes and (3) ensure that under/overestimates in the guesses do not skew the estimation.« less

  20. HIGH DIMENSIONAL COVARIANCE MATRIX ESTIMATION IN APPROXIMATE FACTOR MODELS.

    PubMed

    Fan, Jianqing; Liao, Yuan; Mincheva, Martina

    2011-01-01

    The variance covariance matrix plays a central role in the inferential theories of high dimensional factor models in finance and economics. Popular regularization methods of directly exploiting sparsity are not directly applicable to many financial problems. Classical methods of estimating the covariance matrices are based on the strict factor models, assuming independent idiosyncratic components. This assumption, however, is restrictive in practical applications. By assuming sparse error covariance matrix, we allow the presence of the cross-sectional correlation even after taking out common factors, and it enables us to combine the merits of both methods. We estimate the sparse covariance using the adaptive thresholding technique as in Cai and Liu (2011), taking into account the fact that direct observations of the idiosyncratic components are unavailable. The impact of high dimensionality on the covariance matrix estimation based on the factor structure is then studied.

  1. GPU-accelerated algorithms for compressed signals recovery with application to astronomical imagery deblurring

    NASA Astrophysics Data System (ADS)

    Fiandrotti, Attilio; Fosson, Sophie M.; Ravazzi, Chiara; Magli, Enrico

    2018-04-01

    Compressive sensing promises to enable bandwidth-efficient on-board compression of astronomical data by lifting the encoding complexity from the source to the receiver. The signal is recovered off-line, exploiting GPUs parallel computation capabilities to speedup the reconstruction process. However, inherent GPU hardware constraints limit the size of the recoverable signal and the speedup practically achievable. In this work, we design parallel algorithms that exploit the properties of circulant matrices for efficient GPU-accelerated sparse signals recovery. Our approach reduces the memory requirements, allowing us to recover very large signals with limited memory. In addition, it achieves a tenfold signal recovery speedup thanks to ad-hoc parallelization of matrix-vector multiplications and matrix inversions. Finally, we practically demonstrate our algorithms in a typical application of circulant matrices: deblurring a sparse astronomical image in the compressed domain.

  2. Incremental Structured Dictionary Learning for Video Sensor-Based Object Tracking

    PubMed Central

    Xue, Ming; Yang, Hua; Zheng, Shibao; Zhou, Yi; Yu, Zhenghua

    2014-01-01

    To tackle robust object tracking for video sensor-based applications, an online discriminative algorithm based on incremental discriminative structured dictionary learning (IDSDL-VT) is presented. In our framework, a discriminative dictionary combining both positive, negative and trivial patches is designed to sparsely represent the overlapped target patches. Then, a local update (LU) strategy is proposed for sparse coefficient learning. To formulate the training and classification process, a multiple linear classifier group based on a K-combined voting (KCV) function is proposed. As the dictionary evolves, the models are also trained to timely adapt the target appearance variation. Qualitative and quantitative evaluations on challenging image sequences compared with state-of-the-art algorithms demonstrate that the proposed tracking algorithm achieves a more favorable performance. We also illustrate its relay application in visual sensor networks. PMID:24549252

  3. Effective dimension reduction for sparse functional data

    PubMed Central

    YAO, F.; LEI, E.; WU, Y.

    2015-01-01

    Summary We propose a method of effective dimension reduction for functional data, emphasizing the sparse design where one observes only a few noisy and irregular measurements for some or all of the subjects. The proposed method borrows strength across the entire sample and provides a way to characterize the effective dimension reduction space, via functional cumulative slicing. Our theoretical study reveals a bias-variance trade-off associated with the regularizing truncation and decaying structures of the predictor process and the effective dimension reduction space. A simulation study and an application illustrate the superior finite-sample performance of the method. PMID:26566293

  4. Regularization Methods for High-Dimensional Instrumental Variables Regression With an Application to Genetical Genomics

    PubMed Central

    Lin, Wei; Feng, Rui; Li, Hongzhe

    2014-01-01

    In genetical genomics studies, it is important to jointly analyze gene expression data and genetic variants in exploring their associations with complex traits, where the dimensionality of gene expressions and genetic variants can both be much larger than the sample size. Motivated by such modern applications, we consider the problem of variable selection and estimation in high-dimensional sparse instrumental variables models. To overcome the difficulty of high dimensionality and unknown optimal instruments, we propose a two-stage regularization framework for identifying and estimating important covariate effects while selecting and estimating optimal instruments. The methodology extends the classical two-stage least squares estimator to high dimensions by exploiting sparsity using sparsity-inducing penalty functions in both stages. The resulting procedure is efficiently implemented by coordinate descent optimization. For the representative L1 regularization and a class of concave regularization methods, we establish estimation, prediction, and model selection properties of the two-stage regularized estimators in the high-dimensional setting where the dimensionality of co-variates and instruments are both allowed to grow exponentially with the sample size. The practical performance of the proposed method is evaluated by simulation studies and its usefulness is illustrated by an analysis of mouse obesity data. Supplementary materials for this article are available online. PMID:26392642

  5. Sparse and redundant representations for inverse problems and recognition

    NASA Astrophysics Data System (ADS)

    Patel, Vishal M.

    Sparse and redundant representation of data enables the description of signals as linear combinations of a few atoms from a dictionary. In this dissertation, we study applications of sparse and redundant representations in inverse problems and object recognition. Furthermore, we propose two novel imaging modalities based on the recently introduced theory of Compressed Sensing (CS). This dissertation consists of four major parts. In the first part of the dissertation, we study a new type of deconvolution algorithm that is based on estimating the image from a shearlet decomposition. Shearlets provide a multi-directional and multi-scale decomposition that has been mathematically shown to represent distributed discontinuities such as edges better than traditional wavelets. We develop a deconvolution algorithm that allows for the approximation inversion operator to be controlled on a multi-scale and multi-directional basis. Furthermore, we develop a method for the automatic determination of the threshold values for the noise shrinkage for each scale and direction without explicit knowledge of the noise variance using a generalized cross validation method. In the second part of the dissertation, we study a reconstruction method that recovers highly undersampled images assumed to have a sparse representation in a gradient domain by using partial measurement samples that are collected in the Fourier domain. Our method makes use of a robust generalized Poisson solver that greatly aids in achieving a significantly improved performance over similar proposed methods. We will demonstrate by experiments that this new technique is more flexible to work with either random or restricted sampling scenarios better than its competitors. In the third part of the dissertation, we introduce a novel Synthetic Aperture Radar (SAR) imaging modality which can provide a high resolution map of the spatial distribution of targets and terrain using a significantly reduced number of needed transmitted and/or received electromagnetic waveforms. We demonstrate that this new imaging scheme, requires no new hardware components and allows the aperture to be compressed. Also, it presents many new applications and advantages which include strong resistance to countermesasures and interception, imaging much wider swaths and reduced on-board storage requirements. The last part of the dissertation deals with object recognition based on learning dictionaries for simultaneous sparse signal approximations and feature extraction. A dictionary is learned for each object class based on given training examples which minimize the representation error with a sparseness constraint. A novel test image is then projected onto the span of the atoms in each learned dictionary. The residual vectors along with the coefficients are then used for recognition. Applications to illumination robust face recognition and automatic target recognition are presented.

  6. Improved sparse decomposition based on a smoothed L0 norm using a Laplacian kernel to select features from fMRI data.

    PubMed

    Zhang, Chuncheng; Song, Sutao; Wen, Xiaotong; Yao, Li; Long, Zhiying

    2015-04-30

    Feature selection plays an important role in improving the classification accuracy of multivariate classification techniques in the context of fMRI-based decoding due to the "few samples and large features" nature of functional magnetic resonance imaging (fMRI) data. Recently, several sparse representation methods have been applied to the voxel selection of fMRI data. Despite the low computational efficiency of the sparse representation methods, they still displayed promise for applications that select features from fMRI data. In this study, we proposed the Laplacian smoothed L0 norm (LSL0) approach for feature selection of fMRI data. Based on the fast sparse decomposition using smoothed L0 norm (SL0) (Mohimani, 2007), the LSL0 method used the Laplacian function to approximate the L0 norm of sources. Results of the simulated and real fMRI data demonstrated the feasibility and robustness of LSL0 for the sparse source estimation and feature selection. Simulated results indicated that LSL0 produced more accurate source estimation than SL0 at high noise levels. The classification accuracy using voxels that were selected by LSL0 was higher than that by SL0 in both simulated and real fMRI experiment. Moreover, both LSL0 and SL0 showed higher classification accuracy and required less time than ICA and t-test for the fMRI decoding. LSL0 outperformed SL0 in sparse source estimation at high noise level and in feature selection. Moreover, LSL0 and SL0 showed better performance than ICA and t-test for feature selection. Copyright © 2015 Elsevier B.V. All rights reserved.

  7. Deformable segmentation via sparse representation and dictionary learning.

    PubMed

    Zhang, Shaoting; Zhan, Yiqiang; Metaxas, Dimitris N

    2012-10-01

    "Shape" and "appearance", the two pillars of a deformable model, complement each other in object segmentation. In many medical imaging applications, while the low-level appearance information is weak or mis-leading, shape priors play a more important role to guide a correct segmentation, thanks to the strong shape characteristics of biological structures. Recently a novel shape prior modeling method has been proposed based on sparse learning theory. Instead of learning a generative shape model, shape priors are incorporated on-the-fly through the sparse shape composition (SSC). SSC is robust to non-Gaussian errors and still preserves individual shape characteristics even when such characteristics is not statistically significant. Although it seems straightforward to incorporate SSC into a deformable segmentation framework as shape priors, the large-scale sparse optimization of SSC has low runtime efficiency, which cannot satisfy clinical requirements. In this paper, we design two strategies to decrease the computational complexity of SSC, making a robust, accurate and efficient deformable segmentation system. (1) When the shape repository contains a large number of instances, which is often the case in 2D problems, K-SVD is used to learn a more compact but still informative shape dictionary. (2) If the derived shape instance has a large number of vertices, which often appears in 3D problems, an affinity propagation method is used to partition the surface into small sub-regions, on which the sparse shape composition is performed locally. Both strategies dramatically decrease the scale of the sparse optimization problem and hence speed up the algorithm. Our method is applied on a diverse set of biomedical image analysis problems. Compared to the original SSC, these two newly-proposed modules not only significant reduce the computational complexity, but also improve the overall accuracy. Copyright © 2012 Elsevier B.V. All rights reserved.

  8. An overall strategy based on regression models to estimate relative survival and model the effects of prognostic factors in cancer survival studies.

    PubMed

    Remontet, L; Bossard, N; Belot, A; Estève, J

    2007-05-10

    Relative survival provides a measure of the proportion of patients dying from the disease under study without requiring the knowledge of the cause of death. We propose an overall strategy based on regression models to estimate the relative survival and model the effects of potential prognostic factors. The baseline hazard was modelled until 10 years follow-up using parametric continuous functions. Six models including cubic regression splines were considered and the Akaike Information Criterion was used to select the final model. This approach yielded smooth and reliable estimates of mortality hazard and allowed us to deal with sparse data taking into account all the available information. Splines were also used to model simultaneously non-linear effects of continuous covariates and time-dependent hazard ratios. This led to a graphical representation of the hazard ratio that can be useful for clinical interpretation. Estimates of these models were obtained by likelihood maximization. We showed that these estimates could be also obtained using standard algorithms for Poisson regression. Copyright 2006 John Wiley & Sons, Ltd.

  9. Screening and clustering of sparse regressions with finite non-Gaussian mixtures.

    PubMed

    Zhang, Jian

    2017-06-01

    This article proposes a method to address the problem that can arise when covariates in a regression setting are not Gaussian, which may give rise to approximately mixture-distributed errors, or when a true mixture of regressions produced the data. The method begins with non-Gaussian mixture-based marginal variable screening, followed by fitting a full but relatively smaller mixture regression model to the selected data with help of a new penalization scheme. Under certain regularity conditions, the new screening procedure is shown to possess a sure screening property even when the population is heterogeneous. We further prove that there exists an elbow point in the associated scree plot which results in a consistent estimator of the set of active covariates in the model. By simulations, we demonstrate that the new procedure can substantially improve the performance of the existing procedures in the content of variable screening and data clustering. By applying the proposed procedure to motif data analysis in molecular biology, we demonstrate that the new method holds promise in practice. © 2016, The International Biometric Society.

  10. Improved intact soil-core carbon determination applying regression shrinkage and variable selection techniques to complete spectrum laser-induced breakdown spectroscopy (LIBS).

    PubMed

    Bricklemyer, Ross S; Brown, David J; Turk, Philip J; Clegg, Sam M

    2013-10-01

    Laser-induced breakdown spectroscopy (LIBS) provides a potential method for rapid, in situ soil C measurement. In previous research on the application of LIBS to intact soil cores, we hypothesized that ultraviolet (UV) spectrum LIBS (200-300 nm) might not provide sufficient elemental information to reliably discriminate between soil organic C (SOC) and inorganic C (IC). In this study, using a custom complete spectrum (245-925 nm) core-scanning LIBS instrument, we analyzed 60 intact soil cores from six wheat fields. Predictive multi-response partial least squares (PLS2) models using full and reduced spectrum LIBS were compared for directly determining soil total C (TC), IC, and SOC. Two regression shrinkage and variable selection approaches, the least absolute shrinkage and selection operator (LASSO) and sparse multivariate regression with covariance estimation (MRCE), were tested for soil C predictions and the identification of wavelengths important for soil C prediction. Using complete spectrum LIBS for PLS2 modeling reduced the calibration standard error of prediction (SEP) 15 and 19% for TC and IC, respectively, compared to UV spectrum LIBS. The LASSO and MRCE approaches provided significantly improved calibration accuracy and reduced SEP 32-55% over UV spectrum PLS2 models. We conclude that (1) complete spectrum LIBS is superior to UV spectrum LIBS for predicting soil C for intact soil cores without pretreatment; (2) LASSO and MRCE approaches provide improved calibration prediction accuracy over PLS2 but require additional testing with increased soil and target analyte diversity; and (3) measurement errors associated with analyzing intact cores (e.g., sample density and surface roughness) require further study and quantification.

  11. On the estimation of brain signal entropy from sparse neuroimaging data

    PubMed Central

    Grandy, Thomas H.; Garrett, Douglas D.; Schmiedek, Florian; Werkle-Bergner, Markus

    2016-01-01

    Multi-scale entropy (MSE) has been recently established as a promising tool for the analysis of the moment-to-moment variability of neural signals. Appealingly, MSE provides a measure of the predictability of neural operations across the multiple time scales on which the brain operates. An important limitation in the application of the MSE to some classes of neural signals is MSE’s apparent reliance on long time series. However, this sparse-data limitation in MSE computation could potentially be overcome via MSE estimation across shorter time series that are not necessarily acquired continuously (e.g., in fMRI block-designs). In the present study, using simulated, EEG, and fMRI data, we examined the dependence of the accuracy and precision of MSE estimates on the number of data points per segment and the total number of data segments. As hypothesized, MSE estimation across discontinuous segments was comparably accurate and precise, despite segment length. A key advance of our approach is that it allows the calculation of MSE scales not previously accessible from the native segment lengths. Consequently, our results may permit a far broader range of applications of MSE when gauging moment-to-moment dynamics in sparse and/or discontinuous neurophysiological data typical of many modern cognitive neuroscience study designs. PMID:27020961

  12. Efficient Sum of Outer Products Dictionary Learning (SOUP-DIL) and Its Application to Inverse Problems.

    PubMed

    Ravishankar, Saiprasad; Nadakuditi, Raj Rao; Fessler, Jeffrey A

    2017-12-01

    The sparsity of signals in a transform domain or dictionary has been exploited in applications such as compression, denoising and inverse problems. More recently, data-driven adaptation of synthesis dictionaries has shown promise compared to analytical dictionary models. However, dictionary learning problems are typically non-convex and NP-hard, and the usual alternating minimization approaches for these problems are often computationally expensive, with the computations dominated by the NP-hard synthesis sparse coding step. This paper exploits the ideas that drive algorithms such as K-SVD, and investigates in detail efficient methods for aggregate sparsity penalized dictionary learning by first approximating the data with a sum of sparse rank-one matrices (outer products) and then using a block coordinate descent approach to estimate the unknowns. The resulting block coordinate descent algorithms involve efficient closed-form solutions. Furthermore, we consider the problem of dictionary-blind image reconstruction, and propose novel and efficient algorithms for adaptive image reconstruction using block coordinate descent and sum of outer products methodologies. We provide a convergence study of the algorithms for dictionary learning and dictionary-blind image reconstruction. Our numerical experiments show the promising performance and speedups provided by the proposed methods over previous schemes in sparse data representation and compressed sensing-based image reconstruction.

  13. Efficient Sum of Outer Products Dictionary Learning (SOUP-DIL) and Its Application to Inverse Problems

    PubMed Central

    Ravishankar, Saiprasad; Nadakuditi, Raj Rao; Fessler, Jeffrey A.

    2017-01-01

    The sparsity of signals in a transform domain or dictionary has been exploited in applications such as compression, denoising and inverse problems. More recently, data-driven adaptation of synthesis dictionaries has shown promise compared to analytical dictionary models. However, dictionary learning problems are typically non-convex and NP-hard, and the usual alternating minimization approaches for these problems are often computationally expensive, with the computations dominated by the NP-hard synthesis sparse coding step. This paper exploits the ideas that drive algorithms such as K-SVD, and investigates in detail efficient methods for aggregate sparsity penalized dictionary learning by first approximating the data with a sum of sparse rank-one matrices (outer products) and then using a block coordinate descent approach to estimate the unknowns. The resulting block coordinate descent algorithms involve efficient closed-form solutions. Furthermore, we consider the problem of dictionary-blind image reconstruction, and propose novel and efficient algorithms for adaptive image reconstruction using block coordinate descent and sum of outer products methodologies. We provide a convergence study of the algorithms for dictionary learning and dictionary-blind image reconstruction. Our numerical experiments show the promising performance and speedups provided by the proposed methods over previous schemes in sparse data representation and compressed sensing-based image reconstruction. PMID:29376111

  14. Application of global datasets for hydrological modelling of a remote, snowmelt driven catchment in the Canadian Sub-Arctic

    NASA Astrophysics Data System (ADS)

    Casson, David; Werner, Micha; Weerts, Albrecht; Schellekens, Jaap; Solomatine, Dimitri

    2017-04-01

    Hydrological modelling in the Canadian Sub-Arctic is hindered by the limited spatial and temporal coverage of local meteorological data. Local watershed modelling often relies on data from a sparse network of meteorological stations with a rough density of 3 active stations per 100,000 km2. Global datasets hold great promise for application due to more comprehensive spatial and extended temporal coverage. A key objective of this study is to demonstrate the application of global datasets and data assimilation techniques for hydrological modelling of a data sparse, Sub-Arctic watershed. Application of available datasets and modelling techniques is currently limited in practice due to a lack of local capacity and understanding of available tools. Due to the importance of snow processes in the region, this study also aims to evaluate the performance of global SWE products for snowpack modelling. The Snare Watershed is a 13,300 km2 snowmelt driven sub-basin of the Mackenzie River Basin, Northwest Territories, Canada. The Snare watershed is data sparse in terms of meteorological data, but is well gauged with consistent discharge records since the late 1970s. End of winter snowpack surveys have been conducted every year from 1978-present. The application of global re-analysis datasets from the EU FP7 eartH2Observe project are investigated in this study. Precipitation data are taken from Multi-Source Weighted-Ensemble Precipitation (MSWEP) and temperature data from Watch Forcing Data applied to European Reanalysis (ERA)-Interim data (WFDEI). GlobSnow-2 is a global Snow Water Equivalent (SWE) measurement product funded by the European Space Agency (ESA) and is also evaluated over the local watershed. Downscaled precipitation, temperature and potential evaporation datasets are used as forcing data in a distributed version of the HBV model implemented in the WFLOW framework. Results demonstrate the successful application of global datasets in local watershed modelling, but that validation of actual frozen precipitation and snowpack conditions is very difficult. The distributed hydrological model shows good streamflow simulation performance based on statistical model evaluation techniques. Results are also promising for inter-annual variability, spring snowmelt onset and time to peak flows. It is expected that data assimilation of stream flow using an Ensemble Kalman Filter will further improve model performance. This study shows that global re-analysis datasets hold great potential for understanding the hydrology and snowpack dynamics of the expansive and data sparse sub-Arctic. However, global SWE products will require further validation and algorithm improvements, particularly over boreal forest and lake-rich regions.

  15. Iterative approach of dual regression with a sparse prior enhances the performance of independent component analysis for group functional magnetic resonance imaging (fMRI) data.

    PubMed

    Kim, Yong-Hwan; Kim, Junghoe; Lee, Jong-Hwan

    2012-12-01

    This study proposes an iterative dual-regression (DR) approach with sparse prior regularization to better estimate an individual's neuronal activation using the results of an independent component analysis (ICA) method applied to a temporally concatenated group of functional magnetic resonance imaging (fMRI) data (i.e., Tc-GICA method). An ordinary DR approach estimates the spatial patterns (SPs) of neuronal activation and corresponding time courses (TCs) specific to each individual's fMRI data with two steps involving least-squares (LS) solutions. Our proposed approach employs iterative LS solutions to refine both the individual SPs and TCs with an additional a priori assumption of sparseness in the SPs (i.e., minimally overlapping SPs) based on L(1)-norm minimization. To quantitatively evaluate the performance of this approach, semi-artificial fMRI data were created from resting-state fMRI data with the following considerations: (1) an artificially designed spatial layout of neuronal activation patterns with varying overlap sizes across subjects and (2) a BOLD time series (TS) with variable parameters such as onset time, duration, and maximum BOLD levels. To systematically control the spatial layout variability of neuronal activation patterns across the "subjects" (n=12), the degree of spatial overlap across all subjects was varied from a minimum of 1 voxel (i.e., 0.5-voxel cubic radius) to a maximum of 81 voxels (i.e., 2.5-voxel radius) across the task-related SPs with a size of 100 voxels for both the block-based and event-related task paradigms. In addition, several levels of maximum percentage BOLD intensity (i.e., 0.5, 1.0, 2.0, and 3.0%) were used for each degree of spatial overlap size. From the results, the estimated individual SPs of neuronal activation obtained from the proposed iterative DR approach with a sparse prior showed an enhanced true positive rate and reduced false positive rate compared to the ordinary DR approach. The estimated TCs of the task-related SPs from our proposed approach showed greater temporal correlation coefficients with a reference hemodynamic response function than those of the ordinary DR approach. Moreover, the efficacy of the proposed DR approach was also successfully demonstrated by the results of real fMRI data acquired from left-/right-hand clenching tasks in both block-based and event-related task paradigms. Copyright © 2012 Elsevier Inc. All rights reserved.

  16. SparseCT: interrupted-beam acquisition and sparse reconstruction for radiation dose reduction

    NASA Astrophysics Data System (ADS)

    Koesters, Thomas; Knoll, Florian; Sodickson, Aaron; Sodickson, Daniel K.; Otazo, Ricardo

    2017-03-01

    State-of-the-art low-dose CT methods reduce the x-ray tube current and use iterative reconstruction methods to denoise the resulting images. However, due to compromises between denoising and image quality, only moderate dose reductions up to 30-40% are accepted in clinical practice. An alternative approach is to reduce the number of x-ray projections and use compressed sensing to reconstruct the full-tube-current undersampled data. This idea was recognized in the early days of compressed sensing and proposals for CT dose reduction appeared soon afterwards. However, no practical means of undersampling has yet been demonstrated in the challenging environment of a rapidly rotating CT gantry. In this work, we propose a moving multislit collimator as a practical incoherent undersampling scheme for compressed sensing CT and evaluate its application for radiation dose reduction. The proposed collimator is composed of narrow slits and moves linearly along the slice dimension (z), to interrupt the incident beam in different slices for each x-ray tube angle (θ). The reduced projection dataset is then reconstructed using a sparse approach, where 3D image gradients are employed to enforce sparsity. The effects of the collimator slits on the beam profile were measured and represented as a continuous slice profile. SparseCT was tested using retrospective undersampling and compared against commercial current-reduction techniques on phantoms and in vivo studies. Initial results suggest that SparseCT may enable higher performance than current-reduction, particularly for high dose reduction factors.

  17. Bayesian inference of the number of factors in gene-expression analysis: application to human virus challenge studies.

    PubMed

    Chen, Bo; Chen, Minhua; Paisley, John; Zaas, Aimee; Woods, Christopher; Ginsburg, Geoffrey S; Hero, Alfred; Lucas, Joseph; Dunson, David; Carin, Lawrence

    2010-11-09

    Nonparametric Bayesian techniques have been developed recently to extend the sophistication of factor models, allowing one to infer the number of appropriate factors from the observed data. We consider such techniques for sparse factor analysis, with application to gene-expression data from three virus challenge studies. Particular attention is placed on employing the Beta Process (BP), the Indian Buffet Process (IBP), and related sparseness-promoting techniques to infer a proper number of factors. The posterior density function on the model parameters is computed using Gibbs sampling and variational Bayesian (VB) analysis. Time-evolving gene-expression data are considered for respiratory syncytial virus (RSV), Rhino virus, and influenza, using blood samples from healthy human subjects. These data were acquired in three challenge studies, each executed after receiving institutional review board (IRB) approval from Duke University. Comparisons are made between several alternative means of per-forming nonparametric factor analysis on these data, with comparisons as well to sparse-PCA and Penalized Matrix Decomposition (PMD), closely related non-Bayesian approaches. Applying the Beta Process to the factor scores, or to the singular values of a pseudo-SVD construction, the proposed algorithms infer the number of factors in gene-expression data. For real data the "true" number of factors is unknown; in our simulations we consider a range of noise variances, and the proposed Bayesian models inferred the number of factors accurately relative to other methods in the literature, such as sparse-PCA and PMD. We have also identified a "pan-viral" factor of importance for each of the three viruses considered in this study. We have identified a set of genes associated with this pan-viral factor, of interest for early detection of such viruses based upon the host response, as quantified via gene-expression data.

  18. Harnessing data structure for recovery of randomly missing structural vibration responses time history: Sparse representation versus low-rank structure

    NASA Astrophysics Data System (ADS)

    Yang, Yongchao; Nagarajaiah, Satish

    2016-06-01

    Randomly missing data of structural vibration responses time history often occurs in structural dynamics and health monitoring. For example, structural vibration responses are often corrupted by outliers or erroneous measurements due to sensor malfunction; in wireless sensing platforms, data loss during wireless communication is a common issue. Besides, to alleviate the wireless data sampling or communication burden, certain accounts of data are often discarded during sampling or before transmission. In these and other applications, recovery of the randomly missing structural vibration responses from the available, incomplete data, is essential for system identification and structural health monitoring; it is an ill-posed inverse problem, however. This paper explicitly harnesses the data structure itself-of the structural vibration responses-to address this (inverse) problem. What is relevant is an empirical, but often practically true, observation, that is, typically there are only few modes active in the structural vibration responses; hence a sparse representation (in frequency domain) of the single-channel data vector, or, a low-rank structure (by singular value decomposition) of the multi-channel data matrix. Exploiting such prior knowledge of data structure (intra-channel sparse or inter-channel low-rank), the new theories of ℓ1-minimization sparse recovery and nuclear-norm-minimization low-rank matrix completion enable recovery of the randomly missing or corrupted structural vibration response data. The performance of these two alternatives, in terms of recovery accuracy and computational time under different data missing rates, is investigated on a few structural vibration response data sets-the seismic responses of the super high-rise Canton Tower and the structural health monitoring accelerations of a real large-scale cable-stayed bridge. Encouraging results are obtained and the applicability and limitation of the presented methods are discussed.

  19. A review and comparison of Bayesian and likelihood-based inferences in beta regression and zero-or-one-inflated beta regression.

    PubMed

    Liu, Fang; Eugenio, Evercita C

    2018-04-01

    Beta regression is an increasingly popular statistical technique in medical research for modeling of outcomes that assume values in (0, 1), such as proportions and patient reported outcomes. When outcomes take values in the intervals [0,1), (0,1], or [0,1], zero-or-one-inflated beta (zoib) regression can be used. We provide a thorough review on beta regression and zoib regression in the modeling, inferential, and computational aspects via the likelihood-based and Bayesian approaches. We demonstrate the statistical and practical importance of correctly modeling the inflation at zero/one rather than ad hoc replacing them with values close to zero/one via simulation studies; the latter approach can lead to biased estimates and invalid inferences. We show via simulation studies that the likelihood-based approach is computationally faster in general than MCMC algorithms used in the Bayesian inferences, but runs the risk of non-convergence, large biases, and sensitivity to starting values in the optimization algorithm especially with clustered/correlated data, data with sparse inflation at zero and one, and data that warrant regularization of the likelihood. The disadvantages of the regular likelihood-based approach make the Bayesian approach an attractive alternative in these cases. Software packages and tools for fitting beta and zoib regressions in both the likelihood-based and Bayesian frameworks are also reviewed.

  20. Estimation of selected flow and water-quality characteristics of Alaskan streams

    USGS Publications Warehouse

    Parks, Bruce; Madison, R.J.

    1985-01-01

    Although hydrologic data are either sparse or nonexistent for large areas of Alaska, the drainage area, area of lakes, glacier and forest cover, and average precipitation in a hydrologic basin of interest can be measured or estimated from existing maps. Application of multiple linear regression techniques indicates that statistically significant correlations exist between properties of basins determined from maps and measured streamflow characteristics. This suggests that corresponding characteristics of ungaged basins can be estimated. Streamflow frequency characteristics can be estimated from regional equations developed for southeast, south-central and Yukon regions. Statewide or modified regional equations must be used, however, for the southwest, northwest, and Arctic Slope regions where there is a paucity of data. Equations developed from basin characteristics are given to estimate suspended-sediment values for glacial streams and, with less reliability, for nonglacial streams. Equations developed from available specific conductance data are given to estimate concentrations of major dissolved inorganic constituents. Suggestions are made for expanding the existing data base and thus improving the ability to estimate hydrologic characteristics for Alaskan streams. (USGS)

  1. Data Assimilation and Propagation of Uncertainty in Multiscale Cardiovascular Simulation

    NASA Astrophysics Data System (ADS)

    Schiavazzi, Daniele; Marsden, Alison

    2015-11-01

    Cardiovascular modeling is the application of computational tools to predict hemodynamics. State-of-the-art techniques couple a 3D incompressible Navier-Stokes solver with a boundary circulation model and can predict local and peripheral hemodynamics, analyze the post-operative performance of surgical designs and complement clinical data collection minimizing invasive and risky measurement practices. The ability of these tools to make useful predictions is directly related to their accuracy in representing measured physiologies. Tuning of model parameters is therefore a topic of paramount importance and should include clinical data uncertainty, revealing how this uncertainty will affect the predictions. We propose a fully Bayesian, multi-level approach to data assimilation of uncertain clinical data in multiscale circulation models. To reduce the computational cost, we use a stable, condensed approximation of the 3D model build by linear sparse regression of the pressure/flow rate relationship at the outlets. Finally, we consider the problem of non-invasively propagating the uncertainty in model parameters to the resulting hemodynamics and compare Monte Carlo simulation with Stochastic Collocation approaches based on Polynomial or Multi-resolution Chaos expansions.

  2. Computational tools for exact conditional logistic regression.

    PubMed

    Corcoran, C; Mehta, C; Patel, N; Senchaudhuri, P

    Logistic regression analyses are often challenged by the inability of unconditional likelihood-based approximations to yield consistent, valid estimates and p-values for model parameters. This can be due to sparseness or separability in the data. Conditional logistic regression, though useful in such situations, can also be computationally unfeasible when the sample size or number of explanatory covariates is large. We review recent developments that allow efficient approximate conditional inference, including Monte Carlo sampling and saddlepoint approximations. We demonstrate through real examples that these methods enable the analysis of significantly larger and more complex data sets. We find in this investigation that for these moderately large data sets Monte Carlo seems a better alternative, as it provides unbiased estimates of the exact results and can be executed in less CPU time than can the single saddlepoint approximation. Moreover, the double saddlepoint approximation, while computationally the easiest to obtain, offers little practical advantage. It produces unreliable results and cannot be computed when a maximum likelihood solution does not exist. Copyright 2001 John Wiley & Sons, Ltd.

  3. Discriminative least squares regression for multiclass classification and feature selection.

    PubMed

    Xiang, Shiming; Nie, Feiping; Meng, Gaofeng; Pan, Chunhong; Zhang, Changshui

    2012-11-01

    This paper presents a framework of discriminative least squares regression (LSR) for multiclass classification and feature selection. The core idea is to enlarge the distance between different classes under the conceptual framework of LSR. First, a technique called ε-dragging is introduced to force the regression targets of different classes moving along opposite directions such that the distances between classes can be enlarged. Then, the ε-draggings are integrated into the LSR model for multiclass classification. Our learning framework, referred to as discriminative LSR, has a compact model form, where there is no need to train two-class machines that are independent of each other. With its compact form, this model can be naturally extended for feature selection. This goal is achieved in terms of L2,1 norm of matrix, generating a sparse learning model for feature selection. The model for multiclass classification and its extension for feature selection are finally solved elegantly and efficiently. Experimental evaluation over a range of benchmark datasets indicates the validity of our method.

  4. A soil map of a large watershed in China: applying digital soil mapping in a data sparse region

    NASA Astrophysics Data System (ADS)

    Barthold, F.; Blank, B.; Wiesmeier, M.; Breuer, L.; Frede, H.-G.

    2009-04-01

    Prediction of soil classes in data sparse regions is a major research challenge. With the advent of machine learning the possibilities to spatially predict soil classes have increased tremendously and given birth to new possibilities in soil mapping. Digital soil mapping is a research field that has been established during the last decades and has been accepted widely. We now need to develop tools to reduce the uncertainty in soil predictions. This is especially challenging in data sparse regions. One approach to do this is to implement soil taxonomic distance as a classification error criterion in classification and regression trees (CART) as suggested by Minasny et al. (Geoderma 142 (2007) 285-293). This approach assumes that the classification error should be larger between soils that are more dissimilar, i.e. differ in a larger number of soil properties, and smaller between more similar soils. Our study area is the Xilin River Basin, which is located in central Inner Mongolia in China. It is characterized by semi arid climate conditions and is representative for the natural occurring steppe ecosystem. The study area comprises 3600 km2. We applied a random, stratified sampling design after McKenzie and Ryan (Geoderma 89 (1999) 67-94) with landuse and topography as stratifying variables. We defined 10 sampling classes, from each class 14 replicates were randomly drawn and sampled. The dataset was split into 100 soil profiles for training and 40 soil profiles for validation. We then applied classification and regression trees (CART) to quantify the relationships between soil classes and environmental covariates. The classification tree explained 75.5% of the variance with land use and geology as most important predictor variables. Among the 8 soil classes that we predicted, the Kastanozems cover most of the area. They are predominantly found in steppe areas. However, even some of the soils at sand dune sites, which were thought to show only little soil formation, can be classified as Kastanozems. Besides the Kastanozems, Regosols are most common at the sand dune sites as well as at sites that are defined as bare soil which are characterized by little or no vegetation. Gleysols are mostly found at sites in the vicinity of the Xilin river, which are connected to the groundwater. They can also be found in small valleys or depressions where sub-surface waters from neighboring areas collect. The richest soils are found in mountain meadow areas. Pedogenetic conditions here are most favorable and lead to the formation of Chernozems with deep humic Ah horizons. Other soil types that occur in the study area are Arenosols, Calcisols, Cambisol and Phaeozems. In addition, soil taxonomic distance is implemented into the decision tree procedure as a measure of classification error. The results of incorporating taxonomic distance as a loss function in the decision tree will be compared with the standard application of the decision tree.

  5. Melancholic depression prediction by identifying representative features in metabolic and microarray profiles with missing values.

    PubMed

    Nie, Zhi; Yang, Tao; Liu, Yashu; Li, Qingyang; Narayan, Vaibhav A; Wittenberg, Gayle; Ye, Jieping

    2015-01-01

    Recent studies have revealed that melancholic depression, one major subtype of depression, is closely associated with the concentration of some metabolites and biological functions of certain genes and pathways. Meanwhile, recent advances in biotechnologies have allowed us to collect a large amount of genomic data, e.g., metabolites and microarray gene expression. With such a huge amount of information available, one approach that can give us new insights into the understanding of the fundamental biology underlying melancholic depression is to build disease status prediction models using classification or regression methods. However, the existence of strong empirical correlations, e.g., those exhibited by genes sharing the same biological pathway in microarray profiles, tremendously limits the performance of these methods. Furthermore, the occurrence of missing values which are ubiquitous in biomedical applications further complicates the problem. In this paper, we hypothesize that the problem of missing values might in some way benefit from the correlation between the variables and propose a method to learn a compressed set of representative features through an adapted version of sparse coding which is capable of identifying correlated variables and addressing the issue of missing values simultaneously. An efficient algorithm is also developed to solve the proposed formulation. We apply the proposed method on metabolic and microarray profiles collected from a group of subjects consisting of both patients with melancholic depression and healthy controls. Results show that the proposed method can not only produce meaningful clusters of variables but also generate a set of representative features that achieve superior classification performance over those generated by traditional clustering and data imputation techniques. In particular, on both datasets, we found that in comparison with the competing algorithms, the representative features learned by the proposed method give rise to significantly improved sensitivity scores, suggesting that the learned features allow prediction with high accuracy of disease status in those who are diagnosed with melancholic depression. To our best knowledge, this is the first work that applies sparse coding to deal with high feature correlations and missing values, which are common challenges in many biomedical applications. The proposed method can be readily adapted to other biomedical applications involving incomplete and high-dimensional data.

  6. HIGH DIMENSIONAL COVARIANCE MATRIX ESTIMATION IN APPROXIMATE FACTOR MODELS

    PubMed Central

    Fan, Jianqing; Liao, Yuan; Mincheva, Martina

    2012-01-01

    The variance covariance matrix plays a central role in the inferential theories of high dimensional factor models in finance and economics. Popular regularization methods of directly exploiting sparsity are not directly applicable to many financial problems. Classical methods of estimating the covariance matrices are based on the strict factor models, assuming independent idiosyncratic components. This assumption, however, is restrictive in practical applications. By assuming sparse error covariance matrix, we allow the presence of the cross-sectional correlation even after taking out common factors, and it enables us to combine the merits of both methods. We estimate the sparse covariance using the adaptive thresholding technique as in Cai and Liu (2011), taking into account the fact that direct observations of the idiosyncratic components are unavailable. The impact of high dimensionality on the covariance matrix estimation based on the factor structure is then studied. PMID:22661790

  7. Discrete shearlet transform: faithful digitization concept and its applications

    NASA Astrophysics Data System (ADS)

    Lim, Wang-Q.

    2011-09-01

    Over the past years, various representation systems which sparsely approximate functions governed by anisotropic features such as edges in images have been proposed. Alongside the theoretical development of these systems, algorithmic realizations of the associated transforms were provided. However, one of the most common short-comings of these frameworks is the lack of providing a unified treatment of the continuum and digital world, i.e., allowing a digital theory to be a natural digitization of the continuum theory. Shearlets were introduced as means to sparsely encode anisotropic singularities of multivariate data while providing a unified treatment of the continuous and digital realm. In this paper, we introduce a discrete framework which allows a faithful digitization of the continuum domain shearlet transform based on compactly supported shearlets. Finally, we show numerical experiments demonstrating the potential of the discrete shearlet transform in several image processing applications.

  8. Nonparametric estimation of stochastic differential equations with sparse Gaussian processes.

    PubMed

    García, Constantino A; Otero, Abraham; Félix, Paulo; Presedo, Jesús; Márquez, David G

    2017-08-01

    The application of stochastic differential equations (SDEs) to the analysis of temporal data has attracted increasing attention, due to their ability to describe complex dynamics with physically interpretable equations. In this paper, we introduce a nonparametric method for estimating the drift and diffusion terms of SDEs from a densely observed discrete time series. The use of Gaussian processes as priors permits working directly in a function-space view and thus the inference takes place directly in this space. To cope with the computational complexity that requires the use of Gaussian processes, a sparse Gaussian process approximation is provided. This approximation permits the efficient computation of predictions for the drift and diffusion terms by using a distribution over a small subset of pseudosamples. The proposed method has been validated using both simulated data and real data from economy and paleoclimatology. The application of the method to real data demonstrates its ability to capture the behavior of complex systems.

  9. Intelligent Design of Metal Oxide Gas Sensor Arrays Using Reciprocal Kernel Support Vector Regression

    NASA Astrophysics Data System (ADS)

    Dougherty, Andrew W.

    Metal oxides are a staple of the sensor industry. The combination of their sensitivity to a number of gases, and the electrical nature of their sensing mechanism, make the particularly attractive in solid state devices. The high temperature stability of the ceramic material also make them ideal for detecting combustion byproducts where exhaust temperatures can be high. However, problems do exist with metal oxide sensors. They are not very selective as they all tend to be sensitive to a number of reduction and oxidation reactions on the oxide's surface. This makes sensors with large numbers of sensors interesting to study as a method for introducing orthogonality to the system. Also, the sensors tend to suffer from long term drift for a number of reasons. In this thesis I will develop a system for intelligently modeling metal oxide sensors and determining their suitability for use in large arrays designed to analyze exhaust gas streams. It will introduce prior knowledge of the metal oxide sensors' response mechanisms in order to produce a response function for each sensor from sparse training data. The system will use the same technique to model and remove any long term drift from the sensor response. It will also provide an efficient means for determining the orthogonality of the sensor to determine whether they are useful in gas sensing arrays. The system is based on least squares support vector regression using the reciprocal kernel. The reciprocal kernel is introduced along with a method of optimizing the free parameters of the reciprocal kernel support vector machine. The reciprocal kernel is shown to be simpler and to perform better than an earlier kernel, the modified reciprocal kernel. Least squares support vector regression is chosen as it uses all of the training points and an emphasis was placed throughout this research for extracting the maximum information from very sparse data. The reciprocal kernel is shown to be effective in modeling the sensor responses in the time, gas and temperature domains, and the dual representation of the support vector regression solution is shown to provide insight into the sensor's sensitivity and potential orthogonality. Finally, the dual weights of the support vector regression solution to the sensor's response are suggested as a fitness function for a genetic algorithm, or some other method for efficiently searching large parameter spaces.

  10. Stepwise group sparse regression (SGSR): gene-set-based pharmacogenomic predictive models with stepwise selection of functional priors.

    PubMed

    Jang, In Sock; Dienstmann, Rodrigo; Margolin, Adam A; Guinney, Justin

    2015-01-01

    Complex mechanisms involving genomic aberrations in numerous proteins and pathways are believed to be a key cause of many diseases such as cancer. With recent advances in genomics, elucidating the molecular basis of cancer at a patient level is now feasible, and has led to personalized treatment strategies whereby a patient is treated according to his or her genomic profile. However, there is growing recognition that existing treatment modalities are overly simplistic, and do not fully account for the deep genomic complexity associated with sensitivity or resistance to cancer therapies. To overcome these limitations, large-scale pharmacogenomic screens of cancer cell lines--in conjunction with modern statistical learning approaches--have been used to explore the genetic underpinnings of drug response. While these analyses have demonstrated the ability to infer genetic predictors of compound sensitivity, to date most modeling approaches have been data-driven, i.e. they do not explicitly incorporate domain-specific knowledge (priors) in the process of learning a model. While a purely data-driven approach offers an unbiased perspective of the data--and may yield unexpected or novel insights--this strategy introduces challenges for both model interpretability and accuracy. In this study, we propose a novel prior-incorporated sparse regression model in which the choice of informative predictor sets is carried out by knowledge-driven priors (gene sets) in a stepwise fashion. Under regularization in a linear regression model, our algorithm is able to incorporate prior biological knowledge across the predictive variables thereby improving the interpretability of the final model with no loss--and often an improvement--in predictive performance. We evaluate the performance of our algorithm compared to well-known regularization methods such as LASSO, Ridge and Elastic net regression in the Cancer Cell Line Encyclopedia (CCLE) and Genomics of Drug Sensitivity in Cancer (Sanger) pharmacogenomics datasets, demonstrating that incorporation of the biological priors selected by our model confers improved predictability and interpretability, despite much fewer predictors, over existing state-of-the-art methods.

  11. Polynomial meta-models with canonical low-rank approximations: Numerical insights and comparison to sparse polynomial chaos expansions

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Konakli, Katerina, E-mail: konakli@ibk.baug.ethz.ch; Sudret, Bruno

    2016-09-15

    The growing need for uncertainty analysis of complex computational models has led to an expanding use of meta-models across engineering and sciences. The efficiency of meta-modeling techniques relies on their ability to provide statistically-equivalent analytical representations based on relatively few evaluations of the original model. Polynomial chaos expansions (PCE) have proven a powerful tool for developing meta-models in a wide range of applications; the key idea thereof is to expand the model response onto a basis made of multivariate polynomials obtained as tensor products of appropriate univariate polynomials. The classical PCE approach nevertheless faces the “curse of dimensionality”, namely themore » exponential increase of the basis size with increasing input dimension. To address this limitation, the sparse PCE technique has been proposed, in which the expansion is carried out on only a few relevant basis terms that are automatically selected by a suitable algorithm. An alternative for developing meta-models with polynomial functions in high-dimensional problems is offered by the newly emerged low-rank approximations (LRA) approach. By exploiting the tensor–product structure of the multivariate basis, LRA can provide polynomial representations in highly compressed formats. Through extensive numerical investigations, we herein first shed light on issues relating to the construction of canonical LRA with a particular greedy algorithm involving a sequential updating of the polynomial coefficients along separate dimensions. Specifically, we examine the selection of optimal rank, stopping criteria in the updating of the polynomial coefficients and error estimation. In the sequel, we confront canonical LRA to sparse PCE in structural-mechanics and heat-conduction applications based on finite-element solutions. Canonical LRA exhibit smaller errors than sparse PCE in cases when the number of available model evaluations is small with respect to the input dimension, a situation that is often encountered in real-life problems. By introducing the conditional generalization error, we further demonstrate that canonical LRA tend to outperform sparse PCE in the prediction of extreme model responses, which is critical in reliability analysis.« less

  12. No-reference quality assessment based on visual perception

    NASA Astrophysics Data System (ADS)

    Li, Junshan; Yang, Yawei; Hu, Shuangyan; Zhang, Jiao

    2014-11-01

    The visual quality assessment of images/videos is an ongoing hot research topic, which has become more and more important for numerous image and video processing applications with the rapid development of digital imaging and communication technologies. The goal of image quality assessment (IQA) algorithms is to automatically assess the quality of images/videos in agreement with human quality judgments. Up to now, two kinds of models have been used for IQA, namely full-reference (FR) and no-reference (NR) models. For FR models, IQA algorithms interpret image quality as fidelity or similarity with a perfect image in some perceptual space. However, the reference image is not available in many practical applications, and a NR IQA approach is desired. Considering natural vision as optimized by the millions of years of evolutionary pressure, many methods attempt to achieve consistency in quality prediction by modeling salient physiological and psychological features of the human visual system (HVS). To reach this goal, researchers try to simulate HVS with image sparsity coding and supervised machine learning, which are two main features of HVS. A typical HVS captures the scenes by sparsity coding, and uses experienced knowledge to apperceive objects. In this paper, we propose a novel IQA approach based on visual perception. Firstly, a standard model of HVS is studied and analyzed, and the sparse representation of image is accomplished with the model; and then, the mapping correlation between sparse codes and subjective quality scores is trained with the regression technique of least squaresupport vector machine (LS-SVM), which gains the regressor that can predict the image quality; the visual metric of image is predicted with the trained regressor at last. We validate the performance of proposed approach on Laboratory for Image and Video Engineering (LIVE) database, the specific contents of the type of distortions present in the database are: 227 images of JPEG2000, 233 images of JPEG, 174 images of White Noise, 174 images of Gaussian Blur, 174 images of Fast Fading. The database includes subjective differential mean opinion score (DMOS) for each image. The experimental results show that the proposed approach not only can assess many kinds of distorted images quality, but also exhibits a superior accuracy and monotonicity.

  13. Sparse polynomial space approach to dissipative quantum systems: application to the sub-ohmic spin-boson model.

    PubMed

    Alvermann, A; Fehske, H

    2009-04-17

    We propose a general numerical approach to open quantum systems with a coupling to bath degrees of freedom. The technique combines the methodology of polynomial expansions of spectral functions with the sparse grid concept from interpolation theory. Thereby we construct a Hilbert space of moderate dimension to represent the bath degrees of freedom, which allows us to perform highly accurate and efficient calculations of static, spectral, and dynamic quantities using standard exact diagonalization algorithms. The strength of the approach is demonstrated for the phase transition, critical behavior, and dissipative spin dynamics in the spin-boson model.

  14. Classification of Clouds in Satellite Imagery Using Adaptive Fuzzy Sparse Representation

    PubMed Central

    Jin, Wei; Gong, Fei; Zeng, Xingbin; Fu, Randi

    2016-01-01

    Automatic cloud detection and classification using satellite cloud imagery have various meteorological applications such as weather forecasting and climate monitoring. Cloud pattern analysis is one of the research hotspots recently. Since satellites sense the clouds remotely from space, and different cloud types often overlap and convert into each other, there must be some fuzziness and uncertainty in satellite cloud imagery. Satellite observation is susceptible to noises, while traditional cloud classification methods are sensitive to noises and outliers; it is hard for traditional cloud classification methods to achieve reliable results. To deal with these problems, a satellite cloud classification method using adaptive fuzzy sparse representation-based classification (AFSRC) is proposed. Firstly, by defining adaptive parameters related to attenuation rate and critical membership, an improved fuzzy membership is introduced to accommodate the fuzziness and uncertainty of satellite cloud imagery; secondly, by effective combination of the improved fuzzy membership function and sparse representation-based classification (SRC), atoms in training dictionary are optimized; finally, an adaptive fuzzy sparse representation classifier for cloud classification is proposed. Experiment results on FY-2G satellite cloud image show that, the proposed method not only improves the accuracy of cloud classification, but also has strong stability and adaptability with high computational efficiency. PMID:27999261

  15. Adaptive sparse grid approach for the efficient simulation of pulsed eddy current testing inspections

    NASA Astrophysics Data System (ADS)

    Miorelli, Roberto; Reboud, Christophe

    2018-04-01

    Pulsed Eddy Current Testing (PECT) is a popular NonDestructive Testing (NDT) technique for some applications like corrosion monitoring in the oil and gas industry, or rivet inspection in the aeronautic area. Its particularity is to use a transient excitation, which allows to retrieve more information from the piece than conventional harmonic ECT, in a simpler and cheaper way than multi-frequency ECT setups. Efficient modeling tools prove, as usual, very useful to optimize experimental sensors and devices or evaluate their performance, for instance. This paper proposes an efficient simulation of PECT signals based on standard time harmonic solvers and use of an Adaptive Sparse Grid (ASG) algorithm. An adaptive sampling of the ECT signal spectrum is performed with this algorithm, then the complete spectrum is interpolated from this sparse representation and PECT signals are finally synthesized by means of inverse Fourier transform. Simulation results corresponding to existing industrial configurations are presented and the performance of the strategy is discussed by comparison to reference results.

  16. Sparse approximation of currents for statistics on curves and surfaces.

    PubMed

    Durrleman, Stanley; Pennec, Xavier; Trouvé, Alain; Ayache, Nicholas

    2008-01-01

    Computing, processing, visualizing statistics on shapes like curves or surfaces is a real challenge with many applications ranging from medical image analysis to computational geometry. Modelling such geometrical primitives with currents avoids feature-based approach as well as point-correspondence method. This framework has been proved to be powerful to register brain surfaces or to measure geometrical invariants. However, if the state-of-the-art methods perform efficiently pairwise registrations, new numerical schemes are required to process groupwise statistics due to an increasing complexity when the size of the database is growing. Statistics such as mean and principal modes of a set of shapes often have a heavy and highly redundant representation. We propose therefore to find an adapted basis on which mean and principal modes have a sparse decomposition. Besides the computational improvement, this sparse representation offers a way to visualize and interpret statistics on currents. Experiments show the relevance of the approach on 34 sets of 70 sulcal lines and on 50 sets of 10 meshes of deep brain structures.

  17. Objective sea level pressure analysis for sparse data areas

    NASA Technical Reports Server (NTRS)

    Druyan, L. M.

    1972-01-01

    A computer procedure was used to analyze the pressure distribution over the North Pacific Ocean for eleven synoptic times in February, 1967. Independent knowledge of the central pressures of lows is shown to reduce the analysis errors for very sparse data coverage. The application of planned remote sensing of sea-level wind speeds is shown to make a significant contribution to the quality of the analysis especially in the high gradient mid-latitudes and for sparse coverage of conventional observations (such as over Southern Hemisphere oceans). Uniform distribution of the available observations of sea-level pressure and wind velocity yields results far superior to those derived from a random distribution. A generalization of the results indicates that the average lower limit for analysis errors is between 2 and 2.5 mb based on the perfect specification of the magnitude of the sea-level pressure gradient from a known verification analysis. A less than perfect specification will derive from wind-pressure relationships applied to satellite observed wind speeds.

  18. A Fast and Accurate Sparse Continuous Signal Reconstruction by Homotopy DCD with Non-Convex Regularization

    PubMed Central

    Wang, Tianyun; Lu, Xinfei; Yu, Xiaofei; Xi, Zhendong; Chen, Weidong

    2014-01-01

    In recent years, various applications regarding sparse continuous signal recovery such as source localization, radar imaging, communication channel estimation, etc., have been addressed from the perspective of compressive sensing (CS) theory. However, there are two major defects that need to be tackled when considering any practical utilization. The first issue is off-grid problem caused by the basis mismatch between arbitrary located unknowns and the pre-specified dictionary, which would make conventional CS reconstruction methods degrade considerably. The second important issue is the urgent demand for low-complexity algorithms, especially when faced with the requirement of real-time implementation. In this paper, to deal with these two problems, we have presented three fast and accurate sparse reconstruction algorithms, termed as HR-DCD, Hlog-DCD and Hlp-DCD, which are based on homotopy, dichotomous coordinate descent (DCD) iterations and non-convex regularizations, by combining with the grid refinement technique. Experimental results are provided to demonstrate the effectiveness of the proposed algorithms and related analysis. PMID:24675758

  19. Fractionation of a tumor-initiating UV dose introduces DNA damage-retaining cells in hairless mouse skin and renders subsequent TPA-promoted tumors non-regressing

    PubMed Central

    van de Glind, Gerline; Rebel, Heggert; van Kempen, Marika; Tensen, Kees; de Gruijl, Frank

    2016-01-01

    Sunburns and especially sub-sunburn chronic UV exposure are associated with increased risk of squamous cell carcinomas (SCCs). Here we focus on a possible difference in tumor initiation from a single severe-sunburn dose (on day 1, 21 hairless mice) and from an equal dose fractionated into very low sub-sunburn doses not causing any (growth-promoting) epidermal hyperplasia (40 days daily exposure, n=20). From day 47 all mice received 12-O-Tetradecanoylphorbol-13-acetate (TPA) applications (2x/wk) for 20 weeks to promote tumor development within the lifetime of the animals. After the sub-sunburn regimen sparse DNA damage-retaining basal cells (quiescent stem cells, QSCs) remained in the non-hyperplastic epidermis. These cells were forced to divide by TPA. After discontinuation of TPA tumors regressed and disappeared in the ‘sunburn group’ but persisted and grew in the ‘sub-sunburn group’ (0.06 vs 2.50 SCCs and precursors ≥4mm/mouse after 280 days, p=0.03). As the tumors carried no mutations in p53, H/K/N-Ras and Notch1/2, these ‘usual suspects' were not involved in the UV-driven tumor initiation. Although we could not selectively eliminate QSCs (unknown phenotype) to establish causality, our data suggest that forcing specifically DNA damage-retaining QSCs to divide – with high mutagenic risk - gives rise to persisting (mainly ‘in situ’) skin carcinomas. PMID:26797757

  20. Layout Study and Application of Mobile App Recommendation Approach Based On Spark Streaming Framework

    NASA Astrophysics Data System (ADS)

    Wang, H. T.; Chen, T. T.; Yan, C.; Pan, H.

    2018-05-01

    For App recommended areas of mobile phone software, made while using conduct App application recommended combined weighted Slope One algorithm collaborative filtering algorithm items based on further improvement of the traditional collaborative filtering algorithm in cold start, data matrix sparseness and other issues, will recommend Spark stasis parallel algorithm platform, the introduction of real-time streaming streaming real-time computing framework to improve real-time software applications recommended.

  1. Accelerated Simulation of Kinetic Transport Using Variational Principles and Sparsity

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Caflisch, Russel

    This project is centered on the development and application of techniques of sparsity and compressed sensing for variational principles, PDEs and physics problems, in particular for kinetic transport. This included derivation of sparse modes for elliptic and parabolic problems coming from variational principles. The research results of this project are on methods for sparsity in differential equations and their applications and on application of sparsity ideas to kinetic transport of plasmas.

  2. A Comparative Study of Multi-material Data Structures for Computational Physics Applications

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Garimella, Rao Veerabhadra; Robey, Robert W.

    The data structures used to represent the multi-material state of a computational physics application can have a drastic impact on the performance of the application. We look at efficient data structures for sparse applications where there may be many materials, but only one or few in most computational cells. We develop simple performance models for use in selecting possible data structures and programming patterns. We verify the analytic models of performance through a small test program of the representative cases.

  3. A sparse grid based method for generative dimensionality reduction of high-dimensional data

    NASA Astrophysics Data System (ADS)

    Bohn, Bastian; Garcke, Jochen; Griebel, Michael

    2016-03-01

    Generative dimensionality reduction methods play an important role in machine learning applications because they construct an explicit mapping from a low-dimensional space to the high-dimensional data space. We discuss a general framework to describe generative dimensionality reduction methods, where the main focus lies on a regularized principal manifold learning variant. Since most generative dimensionality reduction algorithms exploit the representer theorem for reproducing kernel Hilbert spaces, their computational costs grow at least quadratically in the number n of data. Instead, we introduce a grid-based discretization approach which automatically scales just linearly in n. To circumvent the curse of dimensionality of full tensor product grids, we use the concept of sparse grids. Furthermore, in real-world applications, some embedding directions are usually more important than others and it is reasonable to refine the underlying discretization space only in these directions. To this end, we employ a dimension-adaptive algorithm which is based on the ANOVA (analysis of variance) decomposition of a function. In particular, the reconstruction error is used to measure the quality of an embedding. As an application, the study of large simulation data from an engineering application in the automotive industry (car crash simulation) is performed.

  4. Sparse Multivariate Autoregressive Modeling for Mild Cognitive Impairment Classification

    PubMed Central

    Li, Yang; Wee, Chong-Yaw; Jie, Biao; Peng, Ziwen

    2014-01-01

    Brain connectivity network derived from functional magnetic resonance imaging (fMRI) is becoming increasingly prevalent in the researches related to cognitive and perceptual processes. The capability to detect causal or effective connectivity is highly desirable for understanding the cooperative nature of brain network, particularly when the ultimate goal is to obtain good performance of control-patient classification with biological meaningful interpretations. Understanding directed functional interactions between brain regions via brain connectivity network is a challenging task. Since many genetic and biomedical networks are intrinsically sparse, incorporating sparsity property into connectivity modeling can make the derived models more biologically plausible. Accordingly, we propose an effective connectivity modeling of resting-state fMRI data based on the multivariate autoregressive (MAR) modeling technique, which is widely used to characterize temporal information of dynamic systems. This MAR modeling technique allows for the identification of effective connectivity using the Granger causality concept and reducing the spurious causality connectivity in assessment of directed functional interaction from fMRI data. A forward orthogonal least squares (OLS) regression algorithm is further used to construct a sparse MAR model. By applying the proposed modeling to mild cognitive impairment (MCI) classification, we identify several most discriminative regions, including middle cingulate gyrus, posterior cingulate gyrus, lingual gyrus and caudate regions, in line with results reported in previous findings. A relatively high classification accuracy of 91.89 % is also achieved, with an increment of 5.4 % compared to the fully-connected, non-directional Pearson-correlation-based functional connectivity approach. PMID:24595922

  5. Deformable MR Prostate Segmentation via Deep Feature Learning and Sparse Patch Matching

    PubMed Central

    Guo, Yanrong; Gao, Yaozong

    2016-01-01

    Automatic and reliable segmentation of the prostate is an important but difficult task for various clinical applications such as prostate cancer radiotherapy. The main challenges for accurate MR prostate localization lie in two aspects: (1) inhomogeneous and inconsistent appearance around prostate boundary, and (2) the large shape variation across different patients. To tackle these two problems, we propose a new deformable MR prostate segmentation method by unifying deep feature learning with the sparse patch matching. First, instead of directly using handcrafted features, we propose to learn the latent feature representation from prostate MR images by the stacked sparse auto-encoder (SSAE). Since the deep learning algorithm learns the feature hierarchy from the data, the learned features are often more concise and effective than the handcrafted features in describing the underlying data. To improve the discriminability of learned features, we further refine the feature representation in a supervised fashion. Second, based on the learned features, a sparse patch matching method is proposed to infer a prostate likelihood map by transferring the prostate labels from multiple atlases to the new prostate MR image. Finally, a deformable segmentation is used to integrate a sparse shape model with the prostate likelihood map for achieving the final segmentation. The proposed method has been extensively evaluated on the dataset that contains 66 T2-wighted prostate MR images. Experimental results show that the deep-learned features are more effective than the handcrafted features in guiding MR prostate segmentation. Moreover, our method shows superior performance than other state-of-the-art segmentation methods. PMID:26685226

  6. Shape prior modeling using sparse representation and online dictionary learning.

    PubMed

    Zhang, Shaoting; Zhan, Yiqiang; Zhou, Yan; Uzunbas, Mustafa; Metaxas, Dimitris N

    2012-01-01

    The recently proposed sparse shape composition (SSC) opens a new avenue for shape prior modeling. Instead of assuming any parametric model of shape statistics, SSC incorporates shape priors on-the-fly by approximating a shape instance (usually derived from appearance cues) by a sparse combination of shapes in a training repository. Theoretically, one can increase the modeling capability of SSC by including as many training shapes in the repository. However, this strategy confronts two limitations in practice. First, since SSC involves an iterative sparse optimization at run-time, the more shape instances contained in the repository, the less run-time efficiency SSC has. Therefore, a compact and informative shape dictionary is preferred to a large shape repository. Second, in medical imaging applications, training shapes seldom come in one batch. It is very time consuming and sometimes infeasible to reconstruct the shape dictionary every time new training shapes appear. In this paper, we propose an online learning method to address these two limitations. Our method starts from constructing an initial shape dictionary using the K-SVD algorithm. When new training shapes come, instead of re-constructing the dictionary from the ground up, we update the existing one using a block-coordinates descent approach. Using the dynamically updated dictionary, sparse shape composition can be gracefully scaled up to model shape priors from a large number of training shapes without sacrificing run-time efficiency. Our method is validated on lung localization in X-Ray and cardiac segmentation in MRI time series. Compared to the original SSC, it shows comparable performance while being significantly more efficient.

  7. Review: The state-of-art of sparse channel models and their applicability to performance assessment of radioactive waste repositories in fractured crystalline formations

    NASA Astrophysics Data System (ADS)

    Figueiredo, Bruno; Tsang, Chin-Fu; Niemi, Auli; Lindgren, Georg

    2016-11-01

    Laboratory and field experiments done on fractured rock show that flow and solute transport often occur along flow channels. `Sparse channels' refers to the case where these channels are characterised by flow in long flow paths separated from each other by large spacings relative to the size of flow domain. A literature study is presented that brings together information useful to assess whether a sparse-channel network concept is an appropriate representation of the flow system in tight fractured rock of low transmissivity, such as that around a nuclear waste repository in deep crystalline rocks. A number of observations are made in this review. First, conventional fracture network models may lead to inaccurate results for flow and solute transport in tight fractured rocks. Secondly, a flow dimension of 1, as determined by the analysis of pressure data in well testing, may be indicative of channelised flow, but such interpretation is not unique or definitive. Thirdly, in sparse channels, the percolation may be more influenced by the fracture shape than the fracture size and orientation but further studies are needed. Fourthly, the migration of radionuclides from a waste canister in a repository to the biosphere may be strongly influenced by the type of model used (e.g. discrete fracture network, channel model). Fifthly, the determination of appropriateness of representing an in situ flow system by a sparse-channel network model needs parameters usually neglected in site characterisation, such as the density of channels or fracture intersections.

  8. Multiagent Reinforcement Learning With Sparse Interactions by Negotiation and Knowledge Transfer.

    PubMed

    Zhou, Luowei; Yang, Pei; Chen, Chunlin; Gao, Yang

    2017-05-01

    Reinforcement learning has significant applications for multiagent systems, especially in unknown dynamic environments. However, most multiagent reinforcement learning (MARL) algorithms suffer from such problems as exponential computation complexity in the joint state-action space, which makes it difficult to scale up to realistic multiagent problems. In this paper, a novel algorithm named negotiation-based MARL with sparse interactions (NegoSIs) is presented. In contrast to traditional sparse-interaction-based MARL algorithms, NegoSI adopts the equilibrium concept and makes it possible for agents to select the nonstrict equilibrium-dominating strategy profile (nonstrict EDSP) or meta equilibrium for their joint actions. The presented NegoSI algorithm consists of four parts: 1) the equilibrium-based framework for sparse interactions; 2) the negotiation for the equilibrium set; 3) the minimum variance method for selecting one joint action; and 4) the knowledge transfer of local Q -values. In this integrated algorithm, three techniques, i.e., unshared value functions, equilibrium solutions, and sparse interactions are adopted to achieve privacy protection, better coordination and lower computational complexity, respectively. To evaluate the performance of the presented NegoSI algorithm, two groups of experiments are carried out regarding three criteria: 1) steps of each episode; 2) rewards of each episode; and 3) average runtime. The first group of experiments is conducted using six grid world games and shows fast convergence and high scalability of the presented algorithm. Then in the second group of experiments NegoSI is applied to an intelligent warehouse problem and simulated results demonstrate the effectiveness of the presented NegoSI algorithm compared with other state-of-the-art MARL algorithms.

  9. Insights into the latent multinomial model through mark-resight data on female grizzly bears with cubs-of-the-year

    USGS Publications Warehouse

    Higgs, Megan D.; Link, William; White, Gary C.; Haroldson, Mark A.; Bjornlie, Daniel D.

    2013-01-01

    Mark-resight designs for estimation of population abundance are common and attractive to researchers. However, inference from such designs is very limited when faced with sparse data, either from a low number of marked animals, a low probability of detection, or both. In the Greater Yellowstone Ecosystem, yearly mark-resight data are collected for female grizzly bears with cubs-of-the-year (FCOY), and inference suffers from both limitations. To overcome difficulties due to sparseness, we assume homogeneity in sighting probabilities over 16 years of bi-annual aerial surveys. We model counts of marked and unmarked animals as multinomial random variables, using the capture frequencies of marked animals for inference about the latent multinomial frequencies for unmarked animals. We discuss undesirable behavior of the commonly used discrete uniform prior distribution on the population size parameter and provide OpenBUGS code for fitting such models. The application provides valuable insights into subtleties of implementing Bayesian inference for latent multinomial models. We tie the discussion to our application, though the insights are broadly useful for applications of the latent multinomial model.

  10. Volumetric CT with sparse detector arrays (and application to Si-strip photon counters).

    PubMed

    Sisniega, A; Zbijewski, W; Stayman, J W; Xu, J; Taguchi, K; Fredenberg, E; Lundqvist, Mats; Siewerdsen, J H

    2016-01-07

    Novel x-ray medical imaging sensors, such as photon counting detectors (PCDs) and large area CCD and CMOS cameras can involve irregular and/or sparse sampling of the detector plane. Application of such detectors to CT involves undersampling that is markedly different from the commonly considered case of sparse angular sampling. This work investigates volumetric sampling in CT systems incorporating sparsely sampled detectors with axial and helical scan orbits and evaluates performance of model-based image reconstruction (MBIR) with spatially varying regularization in mitigating artifacts due to sparse detector sampling. Volumetric metrics of sampling density and uniformity were introduced. Penalized-likelihood MBIR with a spatially varying penalty that homogenized resolution by accounting for variations in local sampling density (i.e. detector gaps) was evaluated. The proposed methodology was tested in simulations and on an imaging bench based on a Si-strip PCD (total area 5 cm  ×  25 cm) consisting of an arrangement of line sensors separated by gaps of up to 2.5 mm. The bench was equipped with translation/rotation stages allowing a variety of scanning trajectories, ranging from a simple axial acquisition to helical scans with variable pitch. Statistical (spherical clutter) and anthropomorphic (hand) phantoms were considered. Image quality was compared to that obtained with a conventional uniform penalty in terms of structural similarity index (SSIM), image uniformity, spatial resolution, contrast, and noise. Scan trajectories with intermediate helical width (~10 mm longitudinal distance per 360° rotation) demonstrated optimal tradeoff between the average sampling density and the homogeneity of sampling throughout the volume. For a scan trajectory with 10.8 mm helical width, the spatially varying penalty resulted in significant visual reduction of sampling artifacts, confirmed by a 10% reduction in minimum SSIM (from 0.88 to 0.8) and a 40% reduction in the dispersion of SSIM in the volume compared to the constant penalty (both penalties applied at optimal regularization strength). Images of the spherical clutter and wrist phantoms confirmed the advantages of the spatially varying penalty, showing a 25% improvement in image uniformity and 1.8  ×  higher CNR (at matched spatial resolution) compared to the constant penalty. The studies elucidate the relationship between sampling in the detector plane, acquisition orbit, sampling of the reconstructed volume, and the resulting image quality. They also demonstrate the benefit of spatially varying regularization in MBIR for scenarios with irregular sampling patterns. Such findings are important and integral to the incorporation of a sparsely sampled Si-strip PCD in CT imaging.

  11. Volumetric CT with sparse detector arrays (and application to Si-strip photon counters)

    NASA Astrophysics Data System (ADS)

    Sisniega, A.; Zbijewski, W.; Stayman, J. W.; Xu, J.; Taguchi, K.; Fredenberg, E.; Lundqvist, Mats; Siewerdsen, J. H.

    2016-01-01

    Novel x-ray medical imaging sensors, such as photon counting detectors (PCDs) and large area CCD and CMOS cameras can involve irregular and/or sparse sampling of the detector plane. Application of such detectors to CT involves undersampling that is markedly different from the commonly considered case of sparse angular sampling. This work investigates volumetric sampling in CT systems incorporating sparsely sampled detectors with axial and helical scan orbits and evaluates performance of model-based image reconstruction (MBIR) with spatially varying regularization in mitigating artifacts due to sparse detector sampling. Volumetric metrics of sampling density and uniformity were introduced. Penalized-likelihood MBIR with a spatially varying penalty that homogenized resolution by accounting for variations in local sampling density (i.e. detector gaps) was evaluated. The proposed methodology was tested in simulations and on an imaging bench based on a Si-strip PCD (total area 5 cm  ×  25 cm) consisting of an arrangement of line sensors separated by gaps of up to 2.5 mm. The bench was equipped with translation/rotation stages allowing a variety of scanning trajectories, ranging from a simple axial acquisition to helical scans with variable pitch. Statistical (spherical clutter) and anthropomorphic (hand) phantoms were considered. Image quality was compared to that obtained with a conventional uniform penalty in terms of structural similarity index (SSIM), image uniformity, spatial resolution, contrast, and noise. Scan trajectories with intermediate helical width (~10 mm longitudinal distance per 360° rotation) demonstrated optimal tradeoff between the average sampling density and the homogeneity of sampling throughout the volume. For a scan trajectory with 10.8 mm helical width, the spatially varying penalty resulted in significant visual reduction of sampling artifacts, confirmed by a 10% reduction in minimum SSIM (from 0.88 to 0.8) and a 40% reduction in the dispersion of SSIM in the volume compared to the constant penalty (both penalties applied at optimal regularization strength). Images of the spherical clutter and wrist phantoms confirmed the advantages of the spatially varying penalty, showing a 25% improvement in image uniformity and 1.8  ×  higher CNR (at matched spatial resolution) compared to the constant penalty. The studies elucidate the relationship between sampling in the detector plane, acquisition orbit, sampling of the reconstructed volume, and the resulting image quality. They also demonstrate the benefit of spatially varying regularization in MBIR for scenarios with irregular sampling patterns. Such findings are important and integral to the incorporation of a sparsely sampled Si-strip PCD in CT imaging.

  12. Volumetric CT with sparse detector arrays (and application to Si-strip photon counters)

    PubMed Central

    Sisniega, A; Zbijewski, W; Stayman, J W; Xu, J; Taguchi, K; Fredenberg, E; Lundqvist, Mats; Siewerdsen, J H

    2016-01-01

    Novel x-ray medical imaging sensors, such as photon counting detectors (PCDs) and large area CCD and CMOS cameras can involve irregular and/or sparse sampling of the detector plane. Application of such detectors to CT involves undersampling that is markedly different from the commonly considered case of sparse angular sampling. This work investigates volumetric sampling in CT systems incorporating sparsely sampled detectors with axial and helical scan orbits and evaluates performance of model-based image reconstruction (MBIR) with spatially varying regularization in mitigating artifacts due to sparse detector sampling. Volumetric metrics of sampling density and uniformity were introduced. Penalized-likelihood MBIR with a spatially varying penalty that homogenized resolution by accounting for variations in local sampling density (i.e. detector gaps) was evaluated. The proposed methodology was tested in simulations and on an imaging bench based on a Si-strip PCD (total area 5 cm × 25 cm) consisting of an arrangement of line sensors separated by gaps of up to 2.5 mm. The bench was equipped with translation/rotation stages allowing a variety of scanning trajectories, ranging from a simple axial acquisition to helical scans with variable pitch. Statistical (spherical clutter) and anthropomorphic (hand) phantoms were considered. Image quality was compared to that obtained with a conventional uniform penalty in terms of structural similarity index (SSIM), image uniformity, spatial resolution, contrast, and noise. Scan trajectories with intermediate helical width (~10 mm longitudinal distance per 360° rotation) demonstrated optimal tradeoff between the average sampling density and the homogeneity of sampling throughout the volume. For a scan trajectory with 10.8 mm helical width, the spatially varying penalty resulted in significant visual reduction of sampling artifacts, confirmed by a 10% reduction in minimum SSIM (from 0.88 to 0.8) and a 40% reduction in the dispersion of SSIM in the volume compared to the constant penalty (both penalties applied at optimal regularization strength). Images of the spherical clutter and wrist phantoms confirmed the advantages of the spatially varying penalty, showing a 25% improvement in image uniformity and 1.8 × higher CNR (at matched spatial resolution) compared to the constant penalty. The studies elucidate the relationship between sampling in the detector plane, acquisition orbit, sampling of the reconstructed volume, and the resulting image quality. They also demonstrate the benefit of spatially varying regularization in MBIR for scenarios with irregular sampling patterns. Such findings are important and integral to the incorporation of a sparsely sampled Si-strip PCD in CT imaging. PMID:26611740

  13. Asynchronous signal-dependent non-uniform sampler

    NASA Astrophysics Data System (ADS)

    Can-Cimino, Azime; Chaparro, Luis F.; Sejdić, Ervin

    2014-05-01

    Analog sparse signals resulting from biomedical and sensing network applications are typically non-stationary with frequency-varying spectra. By ignoring that the maximum frequency of their spectra is changing, uniform sampling of sparse signals collects unnecessary samples in quiescent segments of the signal. A more appropriate sampling approach would be signal-dependent. Moreover, in many of these applications power consumption and analog processing are issues of great importance that need to be considered. In this paper we present a signal dependent non-uniform sampler that uses a Modified Asynchronous Sigma Delta Modulator which consumes low-power and can be processed using analog procedures. Using Prolate Spheroidal Wave Functions (PSWF) interpolation of the original signal is performed, thus giving an asynchronous analog to digital and digital to analog conversion. Stable solutions are obtained by using modulated PSWFs functions. The advantage of the adapted asynchronous sampler is that range of frequencies of the sparse signal is taken into account avoiding aliasing. Moreover, it requires saving only the zero-crossing times of the non-uniform samples, or their differences, and the reconstruction can be done using their quantized values and a PSWF-based interpolation. The range of frequencies analyzed can be changed and the sampler can be implemented as a bank of filters for unknown range of frequencies. The performance of the proposed algorithm is illustrated with an electroencephalogram (EEG) signal.

  14. ORACLE INEQUALITIES FOR THE LASSO IN THE COX MODEL

    PubMed Central

    Huang, Jian; Sun, Tingni; Ying, Zhiliang; Yu, Yi; Zhang, Cun-Hui

    2013-01-01

    We study the absolute penalized maximum partial likelihood estimator in sparse, high-dimensional Cox proportional hazards regression models where the number of time-dependent covariates can be larger than the sample size. We establish oracle inequalities based on natural extensions of the compatibility and cone invertibility factors of the Hessian matrix at the true regression coefficients. Similar results based on an extension of the restricted eigenvalue can be also proved by our method. However, the presented oracle inequalities are sharper since the compatibility and cone invertibility factors are always greater than the corresponding restricted eigenvalue. In the Cox regression model, the Hessian matrix is based on time-dependent covariates in censored risk sets, so that the compatibility and cone invertibility factors, and the restricted eigenvalue as well, are random variables even when they are evaluated for the Hessian at the true regression coefficients. Under mild conditions, we prove that these quantities are bounded from below by positive constants for time-dependent covariates, including cases where the number of covariates is of greater order than the sample size. Consequently, the compatibility and cone invertibility factors can be treated as positive constants in our oracle inequalities. PMID:24086091

  15. ORACLE INEQUALITIES FOR THE LASSO IN THE COX MODEL.

    PubMed

    Huang, Jian; Sun, Tingni; Ying, Zhiliang; Yu, Yi; Zhang, Cun-Hui

    2013-06-01

    We study the absolute penalized maximum partial likelihood estimator in sparse, high-dimensional Cox proportional hazards regression models where the number of time-dependent covariates can be larger than the sample size. We establish oracle inequalities based on natural extensions of the compatibility and cone invertibility factors of the Hessian matrix at the true regression coefficients. Similar results based on an extension of the restricted eigenvalue can be also proved by our method. However, the presented oracle inequalities are sharper since the compatibility and cone invertibility factors are always greater than the corresponding restricted eigenvalue. In the Cox regression model, the Hessian matrix is based on time-dependent covariates in censored risk sets, so that the compatibility and cone invertibility factors, and the restricted eigenvalue as well, are random variables even when they are evaluated for the Hessian at the true regression coefficients. Under mild conditions, we prove that these quantities are bounded from below by positive constants for time-dependent covariates, including cases where the number of covariates is of greater order than the sample size. Consequently, the compatibility and cone invertibility factors can be treated as positive constants in our oracle inequalities.

  16. Model's sparse representation based on reduced mixed GMsFE basis methods

    NASA Astrophysics Data System (ADS)

    Jiang, Lijian; Li, Qiuqi

    2017-06-01

    In this paper, we propose a model's sparse representation based on reduced mixed generalized multiscale finite element (GMsFE) basis methods for elliptic PDEs with random inputs. A typical application for the elliptic PDEs is the flow in heterogeneous random porous media. Mixed generalized multiscale finite element method (GMsFEM) is one of the accurate and efficient approaches to solve the flow problem in a coarse grid and obtain the velocity with local mass conservation. When the inputs of the PDEs are parameterized by the random variables, the GMsFE basis functions usually depend on the random parameters. This leads to a large number degree of freedoms for the mixed GMsFEM and substantially impacts on the computation efficiency. In order to overcome the difficulty, we develop reduced mixed GMsFE basis methods such that the multiscale basis functions are independent of the random parameters and span a low-dimensional space. To this end, a greedy algorithm is used to find a set of optimal samples from a training set scattered in the parameter space. Reduced mixed GMsFE basis functions are constructed based on the optimal samples using two optimal sampling strategies: basis-oriented cross-validation and proper orthogonal decomposition. Although the dimension of the space spanned by the reduced mixed GMsFE basis functions is much smaller than the dimension of the original full order model, the online computation still depends on the number of coarse degree of freedoms. To significantly improve the online computation, we integrate the reduced mixed GMsFE basis methods with sparse tensor approximation and obtain a sparse representation for the model's outputs. The sparse representation is very efficient for evaluating the model's outputs for many instances of parameters. To illustrate the efficacy of the proposed methods, we present a few numerical examples for elliptic PDEs with multiscale and random inputs. In particular, a two-phase flow model in random porous media is simulated by the proposed sparse representation method.

  17. Model's sparse representation based on reduced mixed GMsFE basis methods

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Jiang, Lijian, E-mail: ljjiang@hnu.edu.cn; Li, Qiuqi, E-mail: qiuqili@hnu.edu.cn

    2017-06-01

    In this paper, we propose a model's sparse representation based on reduced mixed generalized multiscale finite element (GMsFE) basis methods for elliptic PDEs with random inputs. A typical application for the elliptic PDEs is the flow in heterogeneous random porous media. Mixed generalized multiscale finite element method (GMsFEM) is one of the accurate and efficient approaches to solve the flow problem in a coarse grid and obtain the velocity with local mass conservation. When the inputs of the PDEs are parameterized by the random variables, the GMsFE basis functions usually depend on the random parameters. This leads to a largemore » number degree of freedoms for the mixed GMsFEM and substantially impacts on the computation efficiency. In order to overcome the difficulty, we develop reduced mixed GMsFE basis methods such that the multiscale basis functions are independent of the random parameters and span a low-dimensional space. To this end, a greedy algorithm is used to find a set of optimal samples from a training set scattered in the parameter space. Reduced mixed GMsFE basis functions are constructed based on the optimal samples using two optimal sampling strategies: basis-oriented cross-validation and proper orthogonal decomposition. Although the dimension of the space spanned by the reduced mixed GMsFE basis functions is much smaller than the dimension of the original full order model, the online computation still depends on the number of coarse degree of freedoms. To significantly improve the online computation, we integrate the reduced mixed GMsFE basis methods with sparse tensor approximation and obtain a sparse representation for the model's outputs. The sparse representation is very efficient for evaluating the model's outputs for many instances of parameters. To illustrate the efficacy of the proposed methods, we present a few numerical examples for elliptic PDEs with multiscale and random inputs. In particular, a two-phase flow model in random porous media is simulated by the proposed sparse representation method.« less

  18. Task-driven dictionary learning.

    PubMed

    Mairal, Julien; Bach, Francis; Ponce, Jean

    2012-04-01

    Modeling data with linear combinations of a few elements from a learned dictionary has been the focus of much recent research in machine learning, neuroscience, and signal processing. For signals such as natural images that admit such sparse representations, it is now well established that these models are well suited to restoration tasks. In this context, learning the dictionary amounts to solving a large-scale matrix factorization problem, which can be done efficiently with classical optimization tools. The same approach has also been used for learning features from data for other purposes, e.g., image classification, but tuning the dictionary in a supervised way for these tasks has proven to be more difficult. In this paper, we present a general formulation for supervised dictionary learning adapted to a wide variety of tasks, and present an efficient algorithm for solving the corresponding optimization problem. Experiments on handwritten digit classification, digital art identification, nonlinear inverse image problems, and compressed sensing demonstrate that our approach is effective in large-scale settings, and is well suited to supervised and semi-supervised classification, as well as regression tasks for data that admit sparse representations.

  19. An investigation of emotion dynamics in major depressive disorder patients and healthy persons using sparse longitudinal networks.

    PubMed

    de Vos, Stijn; Wardenaar, Klaas J; Bos, Elisabeth H; Wit, Ernst C; Bouwmans, Mara E J; de Jonge, Peter

    2017-01-01

    Differences in within-person emotion dynamics may be an important source of heterogeneity in depression. To investigate these dynamics, researchers have previously combined multilevel regression analyses with network representations. However, sparse network methods, specifically developed for longitudinal network analyses, have not been applied. Therefore, this study used this approach to investigate population-level and individual-level emotion dynamics in healthy and depressed persons and compared this method with the multilevel approach. Time-series data were collected in pair-matched healthy persons and major depressive disorder (MDD) patients (n = 54). Seven positive affect (PA) and seven negative affect (NA) items were administered electronically at 90 times (30 days; thrice per day). The population-level (healthy vs. MDD) and individual-level time series were analyzed using a sparse longitudinal network model based on vector autoregression. The population-level model was also estimated with a multilevel approach. Effects of different preprocessing steps were evaluated as well. The characteristics of the longitudinal networks were investigated to gain insight into the emotion dynamics. In the population-level networks, longitudinal network connectivity was strongest in the healthy group, with nodes showing more and stronger longitudinal associations with each other. Individually estimated networks varied strongly across individuals. Individual variations in network connectivity were unrelated to baseline characteristics (depression status, neuroticism, severity). A multilevel approach applied to the same data showed higher connectivity in the MDD group, which seemed partly related to the preprocessing approach. The sparse network approach can be useful for the estimation of networks with multiple nodes, where overparameterization is an issue, and for individual-level networks. However, its current inability to model random effects makes it less useful as a population-level approach in case of large heterogeneity. Different preprocessing strategies appeared to strongly influence the results, complicating inferences about network density.

  20. Open-target sparse sensing of biological agents using DNA microarray

    PubMed Central

    2011-01-01

    Background Current biosensors are designed to target and react to specific nucleic acid sequences or structural epitopes. These 'target-specific' platforms require creation of new physical capture reagents when new organisms are targeted. An 'open-target' approach to DNA microarray biosensing is proposed and substantiated using laboratory generated data. The microarray consisted of 12,900 25 bp oligonucleotide capture probes derived from a statistical model trained on randomly selected genomic segments of pathogenic prokaryotic organisms. Open-target detection of organisms was accomplished using a reference library of hybridization patterns for three test organisms whose DNA sequences were not included in the design of the microarray probes. Results A multivariate mathematical model based on the partial least squares regression (PLSR) was developed to detect the presence of three test organisms in mixed samples. When all 12,900 probes were used, the model correctly detected the signature of three test organisms in all mixed samples (mean(R2)) = 0.76, CI = 0.95), with a 6% false positive rate. A sampling algorithm was then developed to sparsely sample the probe space for a minimal number of probes required to capture the hybridization imprints of the test organisms. The PLSR detection model was capable of correctly identifying the presence of the three test organisms in all mixed samples using only 47 probes (mean(R2)) = 0.77, CI = 0.95) with nearly 100% specificity. Conclusions We conceived an 'open-target' approach to biosensing, and hypothesized that a relatively small, non-specifically designed, DNA microarray is capable of identifying the presence of multiple organisms in mixed samples. Coupled with a mathematical model applied to laboratory generated data, and sparse sampling of capture probes, the prototype microarray platform was able to capture the signature of each organism in all mixed samples with high sensitivity and specificity. It was demonstrated that this new approach to biosensing closely follows the principles of sparse sensing. PMID:21801424

  1. Estimation for sparse vegetation information in desertification region based on Tiangong-1 hyperspectral image.

    PubMed

    Wu, Jun-Jun; Gao, Zhi-Hai; Li, Zeng-Yuan; Wang, Hong-Yan; Pang, Yong; Sun, Bin; Li, Chang-Long; Li, Xu-Zhi; Zhang, Jiu-Xing

    2014-03-01

    In order to estimate the sparse vegetation information accurately in desertification region, taking southeast of Sunite Right Banner, Inner Mongolia, as the test site and Tiangong-1 hyperspectral image as the main data, sparse vegetation coverage and biomass were retrieved based on normalized difference vegetation index (NDVI) and soil adjusted vegetation index (SAVI), combined with the field investigation data. Then the advantages and disadvantages between them were compared. Firstly, the correlation between vegetation indexes and vegetation coverage under different bands combination was analyzed, as well as the biomass. Secondly, the best bands combination was determined when the maximum correlation coefficient turned up between vegetation indexes (VI) and vegetation parameters. It showed that the maximum correlation coefficient between vegetation parameters and NDVI could reach as high as 0.7, while that of SAVI could nearly reach 0.8. The center wavelength of red band in the best bands combination for NDVI was 630nm, and that of the near infrared (NIR) band was 910 nm. Whereas, when the center wavelength was 620 and 920 nm respectively, they were the best combination for SAVI. Finally, the linear regression models were established to retrieve vegetation coverage and biomass based on Tiangong-1 VIs. R2 of all models was more than 0.5, while that of the model based on SAVI was higher than that based on NDVI, especially, the R2 of vegetation coverage retrieve model based on SAVI was as high as 0.59. By intersection validation, the standard errors RMSE based on SAVI models were lower than that of the model based on NDVI. The results showed that the abundant spectral information of Tiangong-1 hyperspectral image can reflect the actual vegetaion condition effectively, and SAVI can estimate the sparse vegetation information more accurately than NDVI in desertification region.

  2. Application of geographically-weighted regression analysis to assess risk factors for malaria hotspots in Keur Soce health and demographic surveillance site.

    PubMed

    Ndiath, Mansour M; Cisse, Badara; Ndiaye, Jean Louis; Gomis, Jules F; Bathiery, Ousmane; Dia, Anta Tal; Gaye, Oumar; Faye, Babacar

    2015-11-18

    In Senegal, considerable efforts have been made to reduce malaria morbidity and mortality during the last decade. This resulted in a marked decrease of malaria cases. With the decline of malaria cases, transmission has become sparse in most Senegalese health districts. This study investigated malaria hotspots in Keur Soce sites by using geographically-weighted regression. Because of the occurrence of hotspots, spatial modelling of malaria cases could have a considerable effect in disease surveillance. This study explored and analysed the spatial relationships between malaria occurrence and socio-economic and environmental factors in small communities in Keur Soce, Senegal, using 6 months passive surveillance. Geographically-weighted regression was used to explore the spatial variability of relationships between malaria incidence or persistence and the selected socio-economic, and human predictors. A model comparison of between ordinary least square and geographically-weighted regression was also explored. Vector dataset (spatial) of the study area by village levels and statistical data (non-spatial) on malaria confirmed cases, socio-economic status (bed net use), population data (size of the household) and environmental factors (temperature, rain fall) were used in this exploratory analysis. ArcMap 10.2 and Stata 11 were used to perform malaria hotspots analysis. From Jun to December, a total of 408 confirmed malaria cases were notified. The explanatory variables-household size, housing materials, sleeping rooms, sheep and distance to breeding site returned significant t values of -0.25, 2.3, 4.39, 1.25 and 2.36, respectively. The OLS global model revealed that it explained about 70 % (adjusted R(2) = 0.70) of the variation in malaria occurrence with AIC = 756.23. The geographically-weighted regression of malaria hotspots resulted in coefficient intercept ranging from 1.89 to 6.22 with a median of 3.5. Large positive values are distributed mainly in the southeast of the district where hotspots are more accurate while low values are mainly found in the centre and in the north. Geographically-weighted regression and OLS showed important risks factors of malaria hotspots in Keur Soce. The outputs of such models can be a useful tool to understand occurrence of malaria hotspots in Senegal. An understanding of geographical variation and determination of the core areas of the disease may provide an explanation regarding possible proximal and distal contributors to malaria elimination in Senegal.

  3. Predicting Solar Activity Using Machine-Learning Methods

    NASA Astrophysics Data System (ADS)

    Bobra, M.

    2017-12-01

    Of all the activity observed on the Sun, two of the most energetic events are flares and coronal mass ejections. However, we do not, as of yet, fully understand the physical mechanism that triggers solar eruptions. A machine-learning algorithm, which is favorable in cases where the amount of data is large, is one way to [1] empirically determine the signatures of this mechanism in solar image data and [2] use them to predict solar activity. In this talk, we discuss the application of various machine learning algorithms - specifically, a Support Vector Machine, a sparse linear regression (Lasso), and Convolutional Neural Network - to image data from the photosphere, chromosphere, transition region, and corona taken by instruments aboard the Solar Dynamics Observatory in order to predict solar activity on a variety of time scales. Such an approach may be useful since, at the present time, there are no physical models of flares available for real-time prediction. We discuss our results (Bobra and Couvidat, 2015; Bobra and Ilonidis, 2016; Jonas et al., 2017) as well as other attempts to predict flares using machine-learning (e.g. Ahmed et al., 2013; Nishizuka et al. 2017) and compare these results with the more traditional techniques used by the NOAA Space Weather Prediction Center (Crown, 2012). We also discuss some of the challenges in using machine-learning algorithms for space science applications.

  4. A comparison of adaptive sampling designs and binary spatial models: A simulation study using a census of Bromus inermis

    USGS Publications Warehouse

    Irvine, Kathryn M.; Thornton, Jamie; Backus, Vickie M.; Hohmann, Matthew G.; Lehnhoff, Erik A.; Maxwell, Bruce D.; Michels, Kurt; Rew, Lisa

    2013-01-01

    Commonly in environmental and ecological studies, species distribution data are recorded as presence or absence throughout a spatial domain of interest. Field based studies typically collect observations by sampling a subset of the spatial domain. We consider the effects of six different adaptive and two non-adaptive sampling designs and choice of three binary models on both predictions to unsampled locations and parameter estimation of the regression coefficients (species–environment relationships). Our simulation study is unique compared to others to date in that we virtually sample a true known spatial distribution of a nonindigenous plant species, Bromus inermis. The census of B. inermis provides a good example of a species distribution that is both sparsely (1.9 % prevalence) and patchily distributed. We find that modeling the spatial correlation using a random effect with an intrinsic Gaussian conditionally autoregressive prior distribution was equivalent or superior to Bayesian autologistic regression in terms of predicting to un-sampled areas when strip adaptive cluster sampling was used to survey B. inermis. However, inferences about the relationships between B. inermis presence and environmental predictors differed between the two spatial binary models. The strip adaptive cluster designs we investigate provided a significant advantage in terms of Markov chain Monte Carlo chain convergence when trying to model a sparsely distributed species across a large area. In general, there was little difference in the choice of neighborhood, although the adaptive king was preferred when transects were randomly placed throughout the spatial domain.

  5. Development of a predictive model for lead, cadmium and fluorine soil-water partition coefficients using sparse multiple linear regression analysis.

    PubMed

    Nakamura, Kengo; Yasutaka, Tetsuo; Kuwatani, Tatsu; Komai, Takeshi

    2017-11-01

    In this study, we applied sparse multiple linear regression (SMLR) analysis to clarify the relationships between soil properties and adsorption characteristics for a range of soils across Japan and identify easily-obtained physical and chemical soil properties that could be used to predict K and n values of cadmium, lead and fluorine. A model was first constructed that can easily predict the K and n values from nine soil parameters (pH, cation exchange capacity, specific surface area, total carbon, soil organic matter from loss on ignition and water holding capacity, the ratio of sand, silt and clay). The K and n values of cadmium, lead and fluorine of 17 soil samples were used to verify the SMLR models by the root mean square error values obtained from 512 combinations of soil parameters. The SMLR analysis indicated that fluorine adsorption to soil may be associated with organic matter, whereas cadmium or lead adsorption to soil is more likely to be influenced by soil pH, IL. We found that an accurate K value can be predicted from more than three soil parameters for most soils. Approximately 65% of the predicted values were between 33 and 300% of their measured values for the K value; 76% of the predicted values were within ±30% of their measured values for the n value. Our findings suggest that adsorption properties of lead, cadmium and fluorine to soil can be predicted from the soil physical and chemical properties using the presented models. Copyright © 2017 Elsevier Ltd. All rights reserved.

  6. Folded concave penalized learning in identifying multimodal MRI marker for Parkinson’s disease

    PubMed Central

    Liu, Hongcheng; Du, Guangwei; Zhang, Lijun; Lewis, Mechelle M.; Wang, Xue; Yao, Tao; Li, Runze; Huang, Xuemei

    2016-01-01

    Background Brain MRI holds promise to gauge different aspects of Parkinson’s disease (PD)-related pathological changes. Its analysis, however, is hindered by the high-dimensional nature of the data. New method This study introduces folded concave penalized (FCP) sparse logistic regression to identify biomarkers for PD from a large number of potential factors. The proposed statistical procedures target the challenges of high-dimensionality with limited data samples acquired. The maximization problem associated with the sparse logistic regression model is solved by local linear approximation. The proposed procedures then are applied to the empirical analysis of multimodal MRI data. Results From 45 features, the proposed approach identified 15 MRI markers and the UPSIT, which are known to be clinically relevant to PD. By combining the MRI and clinical markers, we can enhance substantially the specificity and sensitivity of the model, as indicated by the ROC curves. Comparison to existing methods We compare the folded concave penalized learning scheme with both the Lasso penalized scheme and the principle component analysis-based feature selection (PCA) in the Parkinson’s biomarker identification problem that takes into account both the clinical features and MRI markers. The folded concave penalty method demonstrates a substantially better clinical potential than both the Lasso and PCA in terms of specificity and sensitivity. Conclusions For the first time, we applied the FCP learning method to MRI biomarker discovery in PD. The proposed approach successfully identified MRI markers that are clinically relevant. Combining these biomarkers with clinical features can substantially enhance performance. PMID:27102045

  7. Complexity of Kronecker Operations on Sparse Matrices with Applications to the Solution of Markov Models

    NASA Technical Reports Server (NTRS)

    Buchholz, Peter; Ciardo, Gianfranco; Donatelli, Susanna; Kemper, Peter

    1997-01-01

    We present a systematic discussion of algorithms to multiply a vector by a matrix expressed as the Kronecker product of sparse matrices, extending previous work in a unified notational framework. Then, we use our results to define new algorithms for the solution of large structured Markov models. In addition to a comprehensive overview of existing approaches, we give new results with respect to: (1) managing certain types of state-dependent behavior without incurring extra cost; (2) supporting both Jacobi-style and Gauss-Seidel-style methods by appropriate multiplication algorithms; (3) speeding up algorithms that consider probability vectors of size equal to the "actual" state space instead of the "potential" state space.

  8. Sparse Zero-Sum Games as Stable Functional Feature Selection

    PubMed Central

    Sokolovska, Nataliya; Teytaud, Olivier; Rizkalla, Salwa; Clément, Karine; Zucker, Jean-Daniel

    2015-01-01

    In large-scale systems biology applications, features are structured in hidden functional categories whose predictive power is identical. Feature selection, therefore, can lead not only to a problem with a reduced dimensionality, but also reveal some knowledge on functional classes of variables. In this contribution, we propose a framework based on a sparse zero-sum game which performs a stable functional feature selection. In particular, the approach is based on feature subsets ranking by a thresholding stochastic bandit. We provide a theoretical analysis of the introduced algorithm. We illustrate by experiments on both synthetic and real complex data that the proposed method is competitive from the predictive and stability viewpoints. PMID:26325268

  9. Real-Space Analysis of Scanning Tunneling Microscopy Topography Datasets Using Sparse Modeling Approach

    NASA Astrophysics Data System (ADS)

    Miyama, Masamichi J.; Hukushima, Koji

    2018-04-01

    A sparse modeling approach is proposed for analyzing scanning tunneling microscopy topography data, which contain numerous peaks originating from the electron density of surface atoms and/or impurities. The method, based on the relevance vector machine with L1 regularization and k-means clustering, enables separation of the peaks and peak center positioning with accuracy beyond the resolution of the measurement grid. The validity and efficiency of the proposed method are demonstrated using synthetic data in comparison with the conventional least-squares method. An application of the proposed method to experimental data of a metallic oxide thin-film clearly indicates the existence of defects and corresponding local lattice distortions.

  10. Strategies for vectorizing the sparse matrix vector product on the CRAY XMP, CRAY 2, and CYBER 205

    NASA Technical Reports Server (NTRS)

    Bauschlicher, Charles W., Jr.; Partridge, Harry

    1987-01-01

    Large, randomly sparse matrix vector products are important in a number of applications in computational chemistry, such as matrix diagonalization and the solution of simultaneous equations. Vectorization of this process is considered for the CRAY XMP, CRAY 2, and CYBER 205, using a matrix of dimension of 20,000 with from 1 percent to 6 percent nonzeros. Efficient scatter/gather capabilities add coding flexibility and yield significant improvements in performance. For the CYBER 205, it is shown that minor changes in the IO can reduce the CPU time by a factor of 50. Similar changes in the CRAY codes make a far smaller improvement.

  11. Novel Spectral Representations and Sparsity-Driven Algorithms for Shape Modeling and Analysis

    NASA Astrophysics Data System (ADS)

    Zhong, Ming

    In this dissertation, we focus on extending classical spectral shape analysis by incorporating spectral graph wavelets and sparsity-seeking algorithms. Defined with the graph Laplacian eigenbasis, the spectral graph wavelets are localized both in the vertex domain and graph spectral domain, and thus are very effective in describing local geometry. With a rich dictionary of elementary vectors and forcing certain sparsity constraints, a real life signal can often be well approximated by a very sparse coefficient representation. The many successful applications of sparse signal representation in computer vision and image processing inspire us to explore the idea of employing sparse modeling techniques with dictionary of spectral basis to solve various shape modeling problems. Conventional spectral mesh compression uses the eigenfunctions of mesh Laplacian as shape bases, which are highly inefficient in representing local geometry. To ameliorate, we advocate an innovative approach to 3D mesh compression using spectral graph wavelets as dictionary to encode mesh geometry. The spectral graph wavelets are locally defined at individual vertices and can better capture local shape information than Laplacian eigenbasis. The multi-scale SGWs form a redundant dictionary as shape basis, so we formulate the compression of 3D shape as a sparse approximation problem that can be readily handled by greedy pursuit algorithms. Surface inpainting refers to the completion or recovery of missing shape geometry based on the shape information that is currently available. We devise a new surface inpainting algorithm founded upon the theory and techniques of sparse signal recovery. Instead of estimating the missing geometry directly, our novel method is to find this low-dimensional representation which describes the entire original shape. More specifically, we find that, for many shapes, the vertex coordinate function can be well approximated by a very sparse coefficient representation with respect to the dictionary comprising its Laplacian eigenbasis, and it is then possible to recover this sparse representation from partial measurements of the original shape. Taking advantage of the sparsity cue, we advocate a novel variational approach for surface inpainting, integrating data fidelity constraints on the shape domain with coefficient sparsity constraints on the transformed domain. Because of the powerful properties of Laplacian eigenbasis, the inpainting results of our method tend to be globally coherent with the remaining shape. Informative and discriminative feature descriptors are vital in qualitative and quantitative shape analysis for a large variety of graphics applications. We advocate novel strategies to define generalized, user-specified features on shapes. Our new region descriptors are primarily built upon the coefficients of spectral graph wavelets that are both multi-scale and multi-level in nature, consisting of both local and global information. Based on our novel spectral feature descriptor, we developed a user-specified feature detection framework and a tensor-based shape matching algorithm. Through various experiments, we demonstrate the competitive performance of our proposed methods and the great potential of spectral basis and sparsity-driven methods for shape modeling.

  12. Automatic Management of Parallel and Distributed System Resources

    NASA Technical Reports Server (NTRS)

    Yan, Jerry; Ngai, Tin Fook; Lundstrom, Stephen F.

    1990-01-01

    Viewgraphs on automatic management of parallel and distributed system resources are presented. Topics covered include: parallel applications; intelligent management of multiprocessing systems; performance evaluation of parallel architecture; dynamic concurrent programs; compiler-directed system approach; lattice gaseous cellular automata; and sparse matrix Cholesky factorization.

  13. Markov Chain Monte Carlo Inference of Parametric Dictionaries for Sparse Bayesian Approximations

    PubMed Central

    Chaspari, Theodora; Tsiartas, Andreas; Tsilifis, Panagiotis; Narayanan, Shrikanth

    2016-01-01

    Parametric dictionaries can increase the ability of sparse representations to meaningfully capture and interpret the underlying signal information, such as encountered in biomedical problems. Given a mapping function from the atom parameter space to the actual atoms, we propose a sparse Bayesian framework for learning the atom parameters, because of its ability to provide full posterior estimates, take uncertainty into account and generalize on unseen data. Inference is performed with Markov Chain Monte Carlo, that uses block sampling to generate the variables of the Bayesian problem. Since the parameterization of dictionary atoms results in posteriors that cannot be analytically computed, we use a Metropolis-Hastings-within-Gibbs framework, according to which variables with closed-form posteriors are generated with the Gibbs sampler, while the remaining ones with the Metropolis Hastings from appropriate candidate-generating densities. We further show that the corresponding Markov Chain is uniformly ergodic ensuring its convergence to a stationary distribution independently of the initial state. Results on synthetic data and real biomedical signals indicate that our approach offers advantages in terms of signal reconstruction compared to previously proposed Steepest Descent and Equiangular Tight Frame methods. This paper demonstrates the ability of Bayesian learning to generate parametric dictionaries that can reliably represent the exemplar data and provides the foundation towards inferring the entire variable set of the sparse approximation problem for signal denoising, adaptation and other applications. PMID:28649173

  14. Dynamic Textures Modeling via Joint Video Dictionary Learning.

    PubMed

    Wei, Xian; Li, Yuanxiang; Shen, Hao; Chen, Fang; Kleinsteuber, Martin; Wang, Zhongfeng

    2017-04-06

    Video representation is an important and challenging task in the computer vision community. In this paper, we consider the problem of modeling and classifying video sequences of dynamic scenes which could be modeled in a dynamic textures (DT) framework. At first, we assume that image frames of a moving scene can be modeled as a Markov random process. We propose a sparse coding framework, named joint video dictionary learning (JVDL), to model a video adaptively. By treating the sparse coefficients of image frames over a learned dictionary as the underlying "states", we learn an efficient and robust linear transition matrix between two adjacent frames of sparse events in time series. Hence, a dynamic scene sequence is represented by an appropriate transition matrix associated with a dictionary. In order to ensure the stability of JVDL, we impose several constraints on such transition matrix and dictionary. The developed framework is able to capture the dynamics of a moving scene by exploring both sparse properties and the temporal correlations of consecutive video frames. Moreover, such learned JVDL parameters can be used for various DT applications, such as DT synthesis and recognition. Experimental results demonstrate the strong competitiveness of the proposed JVDL approach in comparison with state-of-the-art video representation methods. Especially, it performs significantly better in dealing with DT synthesis and recognition on heavily corrupted data.

  15. Multiple Kernel Sparse Representation based Orthogonal Discriminative Projection and Its Cost-Sensitive Extension.

    PubMed

    Zhang, Guoqing; Sun, Huaijiang; Xia, Guiyu; Sun, Quansen

    2016-07-07

    Sparse representation based classification (SRC) has been developed and shown great potential for real-world application. Based on SRC, Yang et al. [10] devised a SRC steered discriminative projection (SRC-DP) method. However, as a linear algorithm, SRC-DP cannot handle the data with highly nonlinear distribution. Kernel sparse representation-based classifier (KSRC) is a non-linear extension of SRC and can remedy the drawback of SRC. KSRC requires the use of a predetermined kernel function and selection of the kernel function and its parameters is difficult. Recently, multiple kernel learning for SRC (MKL-SRC) [22] has been proposed to learn a kernel from a set of base kernels. However, MKL-SRC only considers the within-class reconstruction residual while ignoring the between-class relationship, when learning the kernel weights. In this paper, we propose a novel multiple kernel sparse representation-based classifier (MKSRC), and then we use it as a criterion to design a multiple kernel sparse representation based orthogonal discriminative projection method (MK-SR-ODP). The proposed algorithm aims at learning a projection matrix and a corresponding kernel from the given base kernels such that in the low dimension subspace the between-class reconstruction residual is maximized and the within-class reconstruction residual is minimized. Furthermore, to achieve a minimum overall loss by performing recognition in the learned low-dimensional subspace, we introduce cost information into the dimensionality reduction method. The solutions for the proposed method can be efficiently found based on trace ratio optimization method [33]. Extensive experimental results demonstrate the superiority of the proposed algorithm when compared with the state-of-the-art methods.

  16. Exploratory graphical models of functional and structural connectivity patterns for Alzheimer's Disease diagnosis.

    PubMed

    Ortiz, Andrés; Munilla, Jorge; Álvarez-Illán, Ignacio; Górriz, Juan M; Ramírez, Javier

    2015-01-01

    Alzheimer's Disease (AD) is the most common neurodegenerative disease in elderly people. Its development has been shown to be closely related to changes in the brain connectivity network and in the brain activation patterns along with structural changes caused by the neurodegenerative process. Methods to infer dependence between brain regions are usually derived from the analysis of covariance between activation levels in the different areas. However, these covariance-based methods are not able to estimate conditional independence between variables to factor out the influence of other regions. Conversely, models based on the inverse covariance, or precision matrix, such as Sparse Gaussian Graphical Models allow revealing conditional independence between regions by estimating the covariance between two variables given the rest as constant. This paper uses Sparse Inverse Covariance Estimation (SICE) methods to learn undirected graphs in order to derive functional and structural connectivity patterns from Fludeoxyglucose (18F-FDG) Position Emission Tomography (PET) data and segmented Magnetic Resonance images (MRI), drawn from the ADNI database, for Control, MCI (Mild Cognitive Impairment Subjects), and AD subjects. Sparse computation fits perfectly here as brain regions usually only interact with a few other areas. The models clearly show different metabolic covariation patters between subject groups, revealing the loss of strong connections in AD and MCI subjects when compared to Controls. Similarly, the variance between GM (Gray Matter) densities of different regions reveals different structural covariation patterns between the different groups. Thus, the different connectivity patterns for controls and AD are used in this paper to select regions of interest in PET and GM images with discriminative power for early AD diagnosis. Finally, functional an structural models are combined to leverage the classification accuracy. The results obtained in this work show the usefulness of the Sparse Gaussian Graphical models to reveal functional and structural connectivity patterns. This information provided by the sparse inverse covariance matrices is not only used in an exploratory way but we also propose a method to use it in a discriminative way. Regression coefficients are used to compute reconstruction errors for the different classes that are then introduced in a SVM for classification. Classification experiments performed using 68 Controls, 70 AD, and 111 MCI images and assessed by cross-validation show the effectiveness of the proposed method.

  17. Relaxations to Sparse Optimization Problems and Applications

    NASA Astrophysics Data System (ADS)

    Skau, Erik West

    Parsimony is a fundamental property that is applied to many characteristics in a variety of fields. Of particular interest are optimization problems that apply rank, dimensionality, or support in a parsimonious manner. In this thesis we study some optimization problems and their relaxations, and focus on properties and qualities of the solutions of these problems. The Gramian tensor decomposition problem attempts to decompose a symmetric tensor as a sum of rank one tensors.We approach the Gramian tensor decomposition problem with a relaxation to a semidefinite program. We study conditions which ensure that the solution of the relaxed semidefinite problem gives the minimal Gramian rank decomposition. Sparse representations with learned dictionaries are one of the leading image modeling techniques for image restoration. When learning these dictionaries from a set of training images, the sparsity parameter of the dictionary learning algorithm strongly influences the content of the dictionary atoms.We describe geometrically the content of trained dictionaries and how it changes with the sparsity parameter.We use statistical analysis to characterize how the different content is used in sparse representations. Finally, a method to control the structure of the dictionaries is demonstrated, allowing us to learn a dictionary which can later be tailored for specific applications. Variations of dictionary learning can be broadly applied to a variety of applications.We explore a pansharpening problem with a triple factorization variant of coupled dictionary learning. Another application of dictionary learning is computer vision. Computer vision relies heavily on object detection, which we explore with a hierarchical convolutional dictionary learning model. Data fusion of disparate modalities is a growing topic of interest.We do a case study to demonstrate the benefit of using social media data with satellite imagery to estimate hazard extents. In this case study analysis we apply a maximum entropy model, guided by the social media data, to estimate the flooded regions during a 2013 flood in Boulder, CO and show that the results are comparable to those obtained using expert information.

  18. Application of Toxicogenomics in Decision Making in Ecological and Human Health Risk Assessment

    EPA Science Inventory

    Uncertainties in risk assessment arise from sparse or inadequate data including gaps in our understanding of mode of action, the exposure-dose-response pathway, cross-species toxicokinetic and toxicodynamic information, and/or exposure data. There is an expectation that toxicogen...

  19. Holographic implementation of a binary associative memory for improved recognition

    NASA Astrophysics Data System (ADS)

    Bandyopadhyay, Somnath; Ghosh, Ajay; Datta, Asit K.

    1998-03-01

    Neural network associate memory has found wide application sin pattern recognition techniques. We propose an associative memory model for binary character recognition. The interconnection strengths of the memory are binary valued. The concept of sparse coding is sued to enhance the storage efficiency of the model. The question of imposed preconditioning of pattern vectors, which is inherent in a sparsely coded conventional memory, is eliminated by using a multistep correlation technique an the ability of correct association is enhanced in a real-time application. A potential optoelectronic implementation of the proposed associative memory is also described. The learning and recall is possible by using digital optical matrix-vector multiplication, where full use of parallelism and connectivity of optics is made. A hologram is used in the experiment as a longer memory (LTM) for storing all input information. The short-term memory or the interconnection weight matrix required during the recall process is configured by retrieving the necessary information from the holographic LTM.

  20. A sparse structure learning algorithm for Gaussian Bayesian Network identification from high-dimensional data.

    PubMed

    Huang, Shuai; Li, Jing; Ye, Jieping; Fleisher, Adam; Chen, Kewei; Wu, Teresa; Reiman, Eric

    2013-06-01

    Structure learning of Bayesian Networks (BNs) is an important topic in machine learning. Driven by modern applications in genetics and brain sciences, accurate and efficient learning of large-scale BN structures from high-dimensional data becomes a challenging problem. To tackle this challenge, we propose a Sparse Bayesian Network (SBN) structure learning algorithm that employs a novel formulation involving one L1-norm penalty term to impose sparsity and another penalty term to ensure that the learned BN is a Directed Acyclic Graph--a required property of BNs. Through both theoretical analysis and extensive experiments on 11 moderate and large benchmark networks with various sample sizes, we show that SBN leads to improved learning accuracy, scalability, and efficiency as compared with 10 existing popular BN learning algorithms. We apply SBN to a real-world application of brain connectivity modeling for Alzheimer's disease (AD) and reveal findings that could lead to advancements in AD research.

  1. A Sparse Structure Learning Algorithm for Gaussian Bayesian Network Identification from High-Dimensional Data

    PubMed Central

    Huang, Shuai; Li, Jing; Ye, Jieping; Fleisher, Adam; Chen, Kewei; Wu, Teresa; Reiman, Eric

    2014-01-01

    Structure learning of Bayesian Networks (BNs) is an important topic in machine learning. Driven by modern applications in genetics and brain sciences, accurate and efficient learning of large-scale BN structures from high-dimensional data becomes a challenging problem. To tackle this challenge, we propose a Sparse Bayesian Network (SBN) structure learning algorithm that employs a novel formulation involving one L1-norm penalty term to impose sparsity and another penalty term to ensure that the learned BN is a Directed Acyclic Graph (DAG)—a required property of BNs. Through both theoretical analysis and extensive experiments on 11 moderate and large benchmark networks with various sample sizes, we show that SBN leads to improved learning accuracy, scalability, and efficiency as compared with 10 existing popular BN learning algorithms. We apply SBN to a real-world application of brain connectivity modeling for Alzheimer’s disease (AD) and reveal findings that could lead to advancements in AD research. PMID:22665720

  2. Bayesian inference of the number of factors in gene-expression analysis: application to human virus challenge studies

    PubMed Central

    2010-01-01

    Background Nonparametric Bayesian techniques have been developed recently to extend the sophistication of factor models, allowing one to infer the number of appropriate factors from the observed data. We consider such techniques for sparse factor analysis, with application to gene-expression data from three virus challenge studies. Particular attention is placed on employing the Beta Process (BP), the Indian Buffet Process (IBP), and related sparseness-promoting techniques to infer a proper number of factors. The posterior density function on the model parameters is computed using Gibbs sampling and variational Bayesian (VB) analysis. Results Time-evolving gene-expression data are considered for respiratory syncytial virus (RSV), Rhino virus, and influenza, using blood samples from healthy human subjects. These data were acquired in three challenge studies, each executed after receiving institutional review board (IRB) approval from Duke University. Comparisons are made between several alternative means of per-forming nonparametric factor analysis on these data, with comparisons as well to sparse-PCA and Penalized Matrix Decomposition (PMD), closely related non-Bayesian approaches. Conclusions Applying the Beta Process to the factor scores, or to the singular values of a pseudo-SVD construction, the proposed algorithms infer the number of factors in gene-expression data. For real data the "true" number of factors is unknown; in our simulations we consider a range of noise variances, and the proposed Bayesian models inferred the number of factors accurately relative to other methods in the literature, such as sparse-PCA and PMD. We have also identified a "pan-viral" factor of importance for each of the three viruses considered in this study. We have identified a set of genes associated with this pan-viral factor, of interest for early detection of such viruses based upon the host response, as quantified via gene-expression data. PMID:21062443

  3. Data mining for materials design: A computational study of single molecule magnet

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Dam, Hieu Chi; Faculty of Physics, Vietnam National University, 334 Nguyen Trai, Hanoi; Pham, Tien Lam

    2014-01-28

    We develop a method that combines data mining and first principles calculation to guide the designing of distorted cubane Mn{sup 4+} Mn {sub 3}{sup 3+} single molecule magnets. The essential idea of the method is a process consisting of sparse regressions and cross-validation for analyzing calculated data of the materials. The method allows us to demonstrate that the exchange coupling between Mn{sup 4+} and Mn{sup 3+} ions can be predicted from the electronegativities of constituent ligands and the structural features of the molecule by a linear regression model with high accuracy. The relations between the structural features and magnetic propertiesmore » of the materials are quantitatively and consistently evaluated and presented by a graph. We also discuss the properties of the materials and guide the material design basing on the obtained results.« less

  4. A compressed sensing X-ray camera with a multilayer architecture

    NASA Astrophysics Data System (ADS)

    Wang, Zhehui; Iaroshenko, O.; Li, S.; Liu, T.; Parab, N.; Chen, W. W.; Chu, P.; Kenyon, G. T.; Lipton, R.; Sun, K.-X.

    2018-01-01

    Recent advances in compressed sensing theory and algorithms offer new possibilities for high-speed X-ray camera design. In many CMOS cameras, each pixel has an independent on-board circuit that includes an amplifier, noise rejection, signal shaper, an analog-to-digital converter (ADC), and optional in-pixel storage. When X-ray images are sparse, i.e., when one of the following cases is true: (a.) The number of pixels with true X-ray hits is much smaller than the total number of pixels; (b.) The X-ray information is redundant; or (c.) Some prior knowledge about the X-ray images exists, sparse sampling may be allowed. Here we first illustrate the feasibility of random on-board pixel sampling (ROPS) using an existing set of X-ray images, followed by a discussion about signal to noise as a function of pixel size. Next, we describe a possible circuit architecture to achieve random pixel access and in-pixel storage. The combination of a multilayer architecture, sparse on-chip sampling, and computational image techniques, is expected to facilitate the development and applications of high-speed X-ray camera technology.

  5. Shearlet-based regularization in sparse dynamic tomography

    NASA Astrophysics Data System (ADS)

    Bubba, T. A.; März, M.; Purisha, Z.; Lassas, M.; Siltanen, S.

    2017-08-01

    Classical tomographic imaging is soundly understood and widely employed in medicine, nondestructive testing and security applications. However, it still offers many challenges when it comes to dynamic tomography. Indeed, in classical tomography, the target is usually assumed to be stationary during the data acquisition, but this is not a realistic model. Moreover, to ensure a lower X-ray radiation dose, only a sparse collection of measurements per time step is assumed to be available. With such a set up, we deal with a sparse data, dynamic tomography problem, which clearly calls for regularization, due to the loss of information in the data and the ongoing motion. In this paper, we propose a 3D variational formulation based on 3D shearlets, where the third dimension accounts for the motion in time, to reconstruct a moving 2D object. Results are presented for real measured data and compared against a 2D static model, in the case of fan-beam geometry. Results are preliminary but show that better reconstructions can be achieved when motion is taken into account.

  6. Efficient Computation of Anharmonic Force Constants via q-space, with Application to Graphene

    NASA Astrophysics Data System (ADS)

    Kornbluth, Mordechai; Marianetti, Chris

    We present a new approach for extracting anharmonic force constants from a sparse sampling of the anharmonic dynamical tensor. We calculate the derivative of the energy with respect to q-space displacements (phonons) and strain, which guarantees the absence of supercell image errors. Central finite differences provide a well-converged quadratic error tail for each derivative, separating the contribution of each anharmonic order. These derivatives populate the anharmonic dynamical tensor in a sparse mesh that bounds the Brillouin Zone, which ensures comprehensive sampling of q-space while exploiting small-cell calculations for efficient, high-throughput computation. This produces a well-converged and precisely-defined dataset, suitable for big-data approaches. We transform this sparsely-sampled anharmonic dynamical tensor to real-space anharmonic force constants that obey full space-group symmetries by construction. Machine-learning techniques identify the range of real-space interactions. We show the entire process executed for graphene, up to and including the fifth-order anharmonic force constants. This method successfully calculates strain-based phonon renormalization in graphene, even under large strains, which solves a major shortcoming of previous potentials.

  7. Compressive sensing for single-shot two-dimensional coherent spectroscopy

    NASA Astrophysics Data System (ADS)

    Harel, E.; Spencer, A.; Spokoyny, B.

    2017-02-01

    In this work, we explore the use of compressive sensing for the rapid acquisition of two-dimensional optical spectra that encodes the electronic structure and ultrafast dynamics of condensed-phase molecular species. Specifically, we have developed a means to combine multiplexed single-element detection and single-shot and phase-resolved two-dimensional coherent spectroscopy. The method described, which we call Single Point Array Reconstruction by Spatial Encoding (SPARSE) eliminates the need for costly array detectors while speeding up acquisition by several orders of magnitude compared to scanning methods. Physical implementation of SPARSE is facilitated by combining spatiotemporal encoding of the nonlinear optical response and signal modulation by a high-speed digital micromirror device. We demonstrate the approach by investigating a well-characterized cyanine molecule and a photosynthetic pigment-protein complex. Hadamard and compressive sensing algorithms are demonstrated, with the latter achieving compression factors as high as ten. Both show good agreement with directly detected spectra. We envision a myriad of applications in nonlinear spectroscopy using SPARSE with broadband femtosecond light sources in so-far unexplored regions of the electromagnetic spectrum.

  8. Sparse imaging for fast electron microscopy

    NASA Astrophysics Data System (ADS)

    Anderson, Hyrum S.; Ilic-Helms, Jovana; Rohrer, Brandon; Wheeler, Jason; Larson, Kurt

    2013-02-01

    Scanning electron microscopes (SEMs) are used in neuroscience and materials science to image centimeters of sample area at nanometer scales. Since imaging rates are in large part SNR-limited, large collections can lead to weeks of around-the-clock imaging time. To increase data collection speed, we propose and demonstrate on an operational SEM a fast method to sparsely sample and reconstruct smooth images. To accurately localize the electron probe position at fast scan rates, we model the dynamics of the scan coils, and use the model to rapidly and accurately visit a randomly selected subset of pixel locations. Images are reconstructed from the undersampled data by compressed sensing inversion using image smoothness as a prior. We report image fidelity as a function of acquisition speed by comparing traditional raster to sparse imaging modes. Our approach is equally applicable to other domains of nanometer microscopy in which the time to position a probe is a limiting factor (e.g., atomic force microscopy), or in which excessive electron doses might otherwise alter the sample being observed (e.g., scanning transmission electron microscopy).

  9. Sparse Gaussian elimination with controlled fill-in on a shared memory multiprocessor

    NASA Technical Reports Server (NTRS)

    Alaghband, Gita; Jordan, Harry F.

    1989-01-01

    It is shown that in sparse matrices arising from electronic circuits, it is possible to do computations on many diagonal elements simultaneously. A technique for obtaining an ordered compatible set directly from the ordered incompatible table is given. The ordering is based on the Markowitz number of the pivot candidates. This technique generates a set of compatible pivots with the property of generating few fills. A novel heuristic algorithm is presented that combines the idea of an order-compatible set with a limited binary tree search to generate several sets of compatible pivots in linear time. An elimination set for reducing the matrix is generated and selected on the basis of a minimum Markowitz sum number. The parallel pivoting technique presented is a stepwise algorithm and can be applied to any submatrix of the original matrix. Thus, it is not a preordering of the sparse matrix and is applied dynamically as the decomposition proceeds. Parameters are suggested to obtain a balance between parallelism and fill-ins. Results of applying the proposed algorithms on several large application matrices using the HEP multiprocessor (Kowalik, 1985) are presented and analyzed.

  10. Chemical exchange in biomacromolecules: Past, present, and future

    PubMed Central

    Palmer, Arthur G.

    2014-01-01

    The perspective reviews quantitative investigations of chemical exchange phenomena in proteins and other biological macromolecules using NMR spectroscopy, particularly relaxation dispersion methods. The emphasis is on techniques and applications that quantify the populations, interconversion kinetics, and structural features of sparsely populated conformational states in equilibrium with a highly populated ground state. Applications to folding, mol ecular recognition, catalysis, and allostery by proteins and nucleic acids are highlighted. PMID:24656076

  11. Explorative spatial analysis of traffic accident statistics and road mortality among the provinces of Turkey.

    PubMed

    Erdogan, Saffet

    2009-10-01

    The aim of the study is to describe the inter-province differences in traffic accidents and mortality on roads of Turkey. Two different risk indicators were used to evaluate the road safety performance of the provinces in Turkey. These indicators are the ratios between the number of persons killed in road traffic accidents (1) and the number of accidents (2) (nominators) and their exposure to traffic risk (denominator). Population and the number of registered motor vehicles in the provinces were used as denominators individually. Spatial analyses were performed to the mean annual rate of deaths and to the number of fatal accidents that were calculated for the period of 2001-2006. Empirical Bayes smoothing was used to remove background noise from the raw death and accident rates because of the sparsely populated provinces and small number of accident and death rates of provinces. Global and local spatial autocorrelation analyses were performed to show whether the provinces with high rates of deaths-accidents show clustering or are located closer by chance. The spatial distribution of provinces with high rates of deaths and accidents was nonrandom and detected as clustered with significance of P<0.05 with spatial autocorrelation analyses. Regions with high concentration of fatal accidents and deaths were located in the provinces that contain the roads connecting the Istanbul, Ankara, and Antalya provinces. Accident and death rates were also modeled with some independent variables such as number of motor vehicles, length of roads, and so forth using geographically weighted regression analysis with forward step-wise elimination. The level of statistical significance was taken as P<0.05. Large differences were found between the rates of deaths and accidents according to denominators in the provinces. The geographically weighted regression analyses did significantly better predictions for both accident rates and death rates than did ordinary least regressions, as indicated by adjusted R(2) values. Geographically weighted regression provided values of 0.89-0.99 adjusted R(2) for death and accident rates, compared with 0.88-0.95, respectively, by ordinary least regressions. Geographically weighted regression has the potential to reveal local patterns in the spatial distribution of rates, which would be ignored by the ordinary least regression approach. The application of spatial analysis and modeling of accident statistics and death rates at provincial level in Turkey will help to identification of provinces with outstandingly high accident and death rates. This could help more efficient road safety management in Turkey.

  12. Environmental modeling in data-sparse regions: Mozambique demonstrator case

    NASA Astrophysics Data System (ADS)

    Schumann, G.; Niebuhr, E.; Rashid, K.; Escobar, V. M.; Andreadis, K.; Njoku, E. G.; Neal, J. C.; Voisin, N.; Pappenberger, F.; Phanthuwongpakdee, N.; Bates, P. D.; Chao, Y.; Moller, D.; Paron, P.

    2014-12-01

    Long time-series computations of seasonal and flood event inundation volumes from archived forecast rainfall events for the Lower Zambezi basin (Mozambique), using a coupled hydrology-hydrodynamic model, are correlated and regressed with satellite soil moisture observations and NWP rainfall forecasts as predictors for inundation volumes. This dynamic library of volume predictions can then be re-projected onto the topography to generate the corresponding floodplain and wetland inundation dynamics, including periods of flood and low flows. Especially for data-poor regions, the application potential of such a library of data is invaluable as the modeling chain is greatly simplified and readily available. The library is flexible, portable and transitional. Furthermore, deriving environmental indicators from this dynamic look-up catalogue would be relatively straightforward. Application fields are various and here we present conceptually a few that we plan to research in more detail and on some of which we already collaborate with other scientists and international institutions, though at the moment largely on an unfunded basis. The primary application is to implement an early warning system for flood inundation relief operations and flood inundation mitigation and resilience. Having this flood inundation warning system set up adequately would also allow looking into long-term predictions of crop productivity and consequently food security. Another potentially high-impact application is to relate flood inundation dynamics to disease modeling for public health monitoring and prediction, in particular focusing on Malaria. Last but not least, the dynamic inundation library we are building can be validated and complemented with advanced airborne radar imagery of flooding and inundated wetlands to study changes in wetland ecology and biodiversity with unprecedented detail in data-poor regions, in this case in particular the important wetlands of the Zambezi Delta.

  13. M-estimation for robust sparse unmixing of hyperspectral images

    NASA Astrophysics Data System (ADS)

    Toomik, Maria; Lu, Shijian; Nelson, James D. B.

    2016-10-01

    Hyperspectral unmixing methods often use a conventional least squares based lasso which assumes that the data follows the Gaussian distribution. The normality assumption is an approximation which is generally invalid for real imagery data. We consider a robust (non-Gaussian) approach to sparse spectral unmixing of remotely sensed imagery which reduces the sensitivity of the estimator to outliers and relaxes the linearity assumption. The method consists of several appropriate penalties. We propose to use an lp norm with 0 < p < 1 in the sparse regression problem, which induces more sparsity in the results, but makes the problem non-convex. On the other hand, the problem, though non-convex, can be solved quite straightforwardly with an extensible algorithm based on iteratively reweighted least squares. To deal with the huge size of modern spectral libraries we introduce a library reduction step, similar to the multiple signal classification (MUSIC) array processing algorithm, which not only speeds up unmixing but also yields superior results. In the hyperspectral setting we extend the traditional least squares method to the robust heavy-tailed case and propose a generalised M-lasso solution. M-estimation replaces the Gaussian likelihood with a fixed function ρ(e) that restrains outliers. The M-estimate function reduces the effect of errors with large amplitudes or even assigns the outliers zero weights. Our experimental results on real hyperspectral data show that noise with large amplitudes (outliers) often exists in the data. This ability to mitigate the influence of such outliers can therefore offer greater robustness. Qualitative hyperspectral unmixing results on real hyperspectral image data corroborate the efficacy of the proposed method.

  14. Rapid and accurate peripheral nerve detection using multipoint Raman imaging (Conference Presentation)

    NASA Astrophysics Data System (ADS)

    Kumamoto, Yasuaki; Minamikawa, Takeo; Kawamura, Akinori; Matsumura, Junichi; Tsuda, Yuichiro; Ukon, Juichiro; Harada, Yoshinori; Tanaka, Hideo; Takamatsu, Tetsuro

    2017-02-01

    Nerve-sparing surgery is essential to avoid functional deficits of the limbs and organs. Raman scattering, a label-free, minimally invasive, and accurate modality, is one of the best candidate technologies to detect nerves for nerve-sparing surgery. However, Raman scattering imaging is too time-consuming to be employed in surgery. Here we present a rapid and accurate nerve visualization method using a multipoint Raman imaging technique that has enabled simultaneous spectra measurement from different locations (n=32) of a sample. Five sec is sufficient for measuring n=32 spectra with good S/N from a given tissue. Principal component regression discriminant analysis discriminated spectra obtained from peripheral nerves (n=863 from n=161 myelinated nerves) and connective tissue (n=828 from n=121 tendons) with sensitivity and specificity of 88.3% and 94.8%, respectively. To compensate the spatial information of a multipoint-Raman-derived tissue discrimination image that is too sparse to visualize nerve arrangement, we used morphological information obtained from a bright-field image. When merged with the sparse tissue discrimination image, a morphological image of a sample shows what portion of Raman measurement points in arbitrary structure is determined as nerve. Setting a nerve detection criterion on the portion of "nerve" points in the structure as 40% or more, myelinated nerves (n=161) and tendons (n=121) were discriminated with sensitivity and specificity of 97.5%. The presented technique utilizing a sparse multipoint Raman image and a bright-field image has enabled rapid, safe, and accurate detection of peripheral nerves.

  15. Bézier B¯ projection

    NASA Astrophysics Data System (ADS)

    Miao, Di; Borden, Michael J.; Scott, Michael A.; Thomas, Derek C.

    2018-06-01

    In this paper we demonstrate the use of B\\'{e}zier projection to alleviate locking phenomena in structural mechanics applications of isogeometric analysis. Interpreting the well-known $\\bar{B}$ projection in two different ways we develop two formulations for locking problems in beams and nearly incompressible elastic solids. One formulation leads to a sparse symmetric symmetric system and the other leads to a sparse non-symmetric system. To demonstrate the utility of B\\'{e}zier projection for both geometry and material locking phenomena we focus on transverse shear locking in Timoshenko beams and volumetric locking in nearly compressible linear elasticity although the approach can be applied generally to other types of locking phenemona as well. B\\'{e}zier projection is a local projection technique with optimal approximation properties, which in many cases produces solutions that are comparable to global $L^2$ projection. In the context of $\\bar{B}$ methods, the use of B\\'ezier projection produces sparse stiffness matrices with only a slight increase in bandwidth when compared to standard displacement-based methods. Of particular importance is that the approach is applicable to any spline representation that can be written in B\\'ezier form like NURBS, T-splines, LR-splines, etc. We discuss in detail how to integrate this approach into an existing finite element framework with minimal disruption through the use of B\\'ezier extraction operators and a newly introduced dual basis for the B\\'{e}zierprojection operator. We then demonstrate the behavior of the two proposed formulations through several challenging benchmark problems.

  16. Acceleration of GPU-based Krylov solvers via data transfer reduction

    DOE PAGES

    Anzt, Hartwig; Tomov, Stanimire; Luszczek, Piotr; ...

    2015-04-08

    Krylov subspace iterative solvers are often the method of choice when solving large sparse linear systems. At the same time, hardware accelerators such as graphics processing units continue to offer significant floating point performance gains for matrix and vector computations through easy-to-use libraries of computational kernels. However, as these libraries are usually composed of a well optimized but limited set of linear algebra operations, applications that use them often fail to reduce certain data communications, and hence fail to leverage the full potential of the accelerator. In this study, we target the acceleration of Krylov subspace iterative methods for graphicsmore » processing units, and in particular the Biconjugate Gradient Stabilized solver that significant improvement can be achieved by reformulating the method to reduce data-communications through application-specific kernels instead of using the generic BLAS kernels, e.g. as provided by NVIDIA’s cuBLAS library, and by designing a graphics processing unit specific sparse matrix-vector product kernel that is able to more efficiently use the graphics processing unit’s computing power. Furthermore, we derive a model estimating the performance improvement, and use experimental data to validate the expected runtime savings. Finally, considering that the derived implementation achieves significantly higher performance, we assert that similar optimizations addressing algorithm structure, as well as sparse matrix-vector, are crucial for the subsequent development of high-performance graphics processing units accelerated Krylov subspace iterative methods.« less

  17. BAYESIAN LARGE-SCALE MULTIPLE REGRESSION WITH SUMMARY STATISTICS FROM GENOME-WIDE ASSOCIATION STUDIES1

    PubMed Central

    Zhu, Xiang; Stephens, Matthew

    2017-01-01

    Bayesian methods for large-scale multiple regression provide attractive approaches to the analysis of genome-wide association studies (GWAS). For example, they can estimate heritability of complex traits, allowing for both polygenic and sparse models; and by incorporating external genomic data into the priors, they can increase power and yield new biological insights. However, these methods require access to individual genotypes and phenotypes, which are often not easily available. Here we provide a framework for performing these analyses without individual-level data. Specifically, we introduce a “Regression with Summary Statistics” (RSS) likelihood, which relates the multiple regression coefficients to univariate regression results that are often easily available. The RSS likelihood requires estimates of correlations among covariates (SNPs), which also can be obtained from public databases. We perform Bayesian multiple regression analysis by combining the RSS likelihood with previously proposed prior distributions, sampling posteriors by Markov chain Monte Carlo. In a wide range of simulations RSS performs similarly to analyses using the individual data, both for estimating heritability and detecting associations. We apply RSS to a GWAS of human height that contains 253,288 individuals typed at 1.06 million SNPs, for which analyses of individual-level data are practically impossible. Estimates of heritability (52%) are consistent with, but more precise, than previous results using subsets of these data. We also identify many previously unreported loci that show evidence for association with height in our analyses. Software is available at https://github.com/stephenslab/rss. PMID:29399241

  18. Gene regulatory network inference using fused LASSO on multiple data sets

    PubMed Central

    Omranian, Nooshin; Eloundou-Mbebi, Jeanne M. O.; Mueller-Roeber, Bernd; Nikoloski, Zoran

    2016-01-01

    Devising computational methods to accurately reconstruct gene regulatory networks given gene expression data is key to systems biology applications. Here we propose a method for reconstructing gene regulatory networks by simultaneous consideration of data sets from different perturbation experiments and corresponding controls. The method imposes three biologically meaningful constraints: (1) expression levels of each gene should be explained by the expression levels of a small number of transcription factor coding genes, (2) networks inferred from different data sets should be similar with respect to the type and number of regulatory interactions, and (3) relationships between genes which exhibit similar differential behavior over the considered perturbations should be favored. We demonstrate that these constraints can be transformed in a fused LASSO formulation for the proposed method. The comparative analysis on transcriptomics time-series data from prokaryotic species, Escherichia coli and Mycobacterium tuberculosis, as well as a eukaryotic species, mouse, demonstrated that the proposed method has the advantages of the most recent approaches for regulatory network inference, while obtaining better performance and assigning higher scores to the true regulatory links. The study indicates that the combination of sparse regression techniques with other biologically meaningful constraints is a promising framework for gene regulatory network reconstructions. PMID:26864687

  19. Classification of Astrocytomas and Oligodendrogliomas from Mass Spectrometry Data Using Sparse Kernel Machines

    PubMed Central

    Huang, Jacob; Gholami, Behnood; Agar, Nathalie Y. R.; Norton, Isaiah; Haddad, Wassim M.; Tannenbaum, Allen R.

    2013-01-01

    Glioma histologies are the primary factor in prognostic estimates and are used in determining the proper course of treatment. Furthermore, due to the sensitivity of cranial environments, real-time tumor-cell classification and boundary detection can aid in the precision and completeness of tumor resection. A recent improvement to mass spectrometry known as desorption electrospray ionization operates in an ambient environment without the application of a preparation compound. This allows for a real-time acquisition of mass spectra during surgeries and other live operations. In this paper, we present a framework using sparse kernel machines to determine a glioma sample’s histopathological subtype by analyzing its chemical composition acquired by desorption electrospray ionization mass spectrometry. PMID:22256188

  20. National-scale aboveground biomass geostatistical mapping with FIA inventory and GLAS data: Preparation for sparsely sampled lidar assisted forest inventory

    NASA Astrophysics Data System (ADS)

    Babcock, C. R.; Finley, A. O.; Andersen, H. E.; Moskal, L. M.; Morton, D. C.; Cook, B.; Nelson, R.

    2017-12-01

    Upcoming satellite lidar missions, such as GEDI and IceSat-2, are designed to collect laser altimetry data from space for narrow bands along orbital tracts. As a result lidar metric sets derived from these sources will not be of complete spatial coverage. This lack of complete coverage, or sparsity, means traditional regression approaches that consider lidar metrics as explanatory variables (without error) cannot be used to generate wall-to-wall maps of forest inventory variables. We implement a coregionalization framework to jointly model sparsely sampled lidar information and point-referenced forest variable measurements to create wall-to-wall maps with full probabilistic uncertainty quantification of all inputs. We inform the model with USFS Forest Inventory and Analysis (FIA) in-situ forest measurements and GLAS lidar data to spatially predict aboveground forest biomass (AGB) across the contiguous US. We cast our model within a Bayesian hierarchical framework to better model complex space-varying correlation structures among the lidar metrics and FIA data, which yields improved prediction and uncertainty assessment. To circumvent computational difficulties that arise when fitting complex geostatistical models to massive datasets, we use a Nearest Neighbor Gaussian process (NNGP) prior. Results indicate that a coregionalization modeling approach to leveraging sampled lidar data to improve AGB estimation is effective. Further, fitting the coregionalization model within a Bayesian mode of inference allows for AGB quantification across scales ranging from individual pixel estimates of AGB density to total AGB for the continental US with uncertainty. The coregionalization framework examined here is directly applicable to future spaceborne lidar acquisitions from GEDI and IceSat-2. Pairing these lidar sources with the extensive FIA forest monitoring plot network using a joint prediction framework, such as the coregionalization model explored here, offers the potential to improve forest AGB accounting certainty and provide maps for post-model fitting analysis of the spatial distribution of AGB.

  1. How many stakes are required to measure the mass balance of a glacier?

    USGS Publications Warehouse

    Fountain, A.G.; Vecchia, A.

    1999-01-01

    Glacier mass balance is estimated for South Cascade Glacier and Maclure Glacier using a one-dimensional regression of mass balance with altitude as an alternative to the traditional approach of contouring mass balance values. One attractive feature of regression is that it can be applied to sparse data sets where contouring is not possible and can provide an objective error of the resulting estimate. Regression methods yielded mass balance values equivalent to contouring methods. The effect of the number of mass balance measurements on the final value for the glacier showed that sample sizes as small as five stakes provided reasonable estimates, although the error estimates were greater than for larger sample sizes. Different spatial patterns of measurement locations showed no appreciable influence on the final value as long as different surface altitudes were intermittently sampled over the altitude range of the glacier. Two different regression equations were examined, a quadratic, and a piecewise linear spline, and comparison of results showed little sensitivity to the type of equation. These results point to the dominant effect of the gradient of mass balance with altitude of alpine glaciers compared to transverse variations. The number of mass balance measurements required to determine the glacier balance appears to be scale invariant for small glaciers and five to ten stakes are sufficient.

  2. The effect of herbicide application on rangeland soil nutrient availability

    USDA-ARS?s Scientific Manuscript database

    Very sparse literature exists on the effect of soil active herbicides on nutrient availability. As part of a larger rangeland rehabilitation project, on four sites in northern Nevada, we quantified the effect of the herbicides Landmark®, Perspective®, and Plateau® relative to controls on surface soi...

  3. Fast Solution in Sparse LDA for Binary Classification

    NASA Technical Reports Server (NTRS)

    Moghaddam, Baback

    2010-01-01

    An algorithm that performs sparse linear discriminant analysis (Sparse-LDA) finds near-optimal solutions in far less time than the prior art when specialized to binary classification (of 2 classes). Sparse-LDA is a type of feature- or variable- selection problem with numerous applications in statistics, machine learning, computer vision, computational finance, operations research, and bio-informatics. Because of its combinatorial nature, feature- or variable-selection problems are NP-hard or computationally intractable in cases involving more than 30 variables or features. Therefore, one typically seeks approximate solutions by means of greedy search algorithms. The prior Sparse-LDA algorithm was a greedy algorithm that considered the best variable or feature to add/ delete to/ from its subsets in order to maximally discriminate between multiple classes of data. The present algorithm is designed for the special but prevalent case of 2-class or binary classification (e.g. 1 vs. 0, functioning vs. malfunctioning, or change versus no change). The present algorithm provides near-optimal solutions on large real-world datasets having hundreds or even thousands of variables or features (e.g. selecting the fewest wavelength bands in a hyperspectral sensor to do terrain classification) and does so in typical computation times of minutes as compared to days or weeks as taken by the prior art. Sparse LDA requires solving generalized eigenvalue problems for a large number of variable subsets (represented by the submatrices of the input within-class and between-class covariance matrices). In the general (fullrank) case, the amount of computation scales at least cubically with the number of variables and thus the size of the problems that can be solved is limited accordingly. However, in binary classification, the principal eigenvalues can be found using a special analytic formula, without resorting to costly iterative techniques. The present algorithm exploits this analytic form along with the inherent sequential nature of greedy search itself. Together this enables the use of highly-efficient partitioned-matrix-inverse techniques that result in large speedups of computation in both the forward-selection and backward-elimination stages of greedy algorithms in general.

  4. Identifying Keystone Species in the Human Gut Microbiome from Metagenomic Timeseries Using Sparse Linear Regression

    PubMed Central

    Fisher, Charles K.; Mehta, Pankaj

    2014-01-01

    Human associated microbial communities exert tremendous influence over human health and disease. With modern metagenomic sequencing methods it is now possible to follow the relative abundance of microbes in a community over time. These microbial communities exhibit rich ecological dynamics and an important goal of microbial ecology is to infer the ecological interactions between species directly from sequence data. Any algorithm for inferring ecological interactions must overcome three major obstacles: 1) a correlation between the abundances of two species does not imply that those species are interacting, 2) the sum constraint on the relative abundances obtained from metagenomic studies makes it difficult to infer the parameters in timeseries models, and 3) errors due to experimental uncertainty, or mis-assignment of sequencing reads into operational taxonomic units, bias inferences of species interactions due to a statistical problem called “errors-in-variables”. Here we introduce an approach, Learning Interactions from MIcrobial Time Series (LIMITS), that overcomes these obstacles. LIMITS uses sparse linear regression with boostrap aggregation to infer a discrete-time Lotka-Volterra model for microbial dynamics. We tested LIMITS on synthetic data and showed that it could reliably infer the topology of the inter-species ecological interactions. We then used LIMITS to characterize the species interactions in the gut microbiomes of two individuals and found that the interaction networks varied significantly between individuals. Furthermore, we found that the interaction networks of the two individuals are dominated by distinct “keystone species”, Bacteroides fragilis and Bacteroided stercosis, that have a disproportionate influence on the structure of the gut microbiome even though they are only found in moderate abundance. Based on our results, we hypothesize that the abundances of certain keystone species may be responsible for individuality in the human gut microbiome. PMID:25054627

  5. Connectivity Strength-Weighted Sparse Group Representation-Based Brain Network Construction for MCI Classification

    PubMed Central

    Yu, Renping; Zhang, Han; An, Le; Chen, Xiaobo; Wei, Zhihui; Shen, Dinggang

    2017-01-01

    Brain functional network analysis has shown great potential in understanding brain functions and also in identifying biomarkers for brain diseases, such as Alzheimer's disease (AD) and its early stage, mild cognitive impairment (MCI). In these applications, accurate construction of biologically meaningful brain network is critical. Sparse learning has been widely used for brain network construction; however, its l1-norm penalty simply penalizes each edge of a brain network equally, without considering the original connectivity strength which is one of the most important inherent linkwise characters. Besides, based on the similarity of the linkwise connectivity, brain network shows prominent group structure (i.e., a set of edges sharing similar attributes). In this article, we propose a novel brain functional network modeling framework with a “connectivity strength-weighted sparse group constraint.” In particular, the network modeling can be optimized by considering both raw connectivity strength and its group structure, without losing the merit of sparsity. Our proposed method is applied to MCI classification, a challenging task for early AD diagnosis. Experimental results based on the resting-state functional MRI, from 50 MCI patients and 49 healthy controls, show that our proposed method is more effective (i.e., achieving a significantly higher classification accuracy, 84.8%) than other competing methods (e.g., sparse representation, accuracy = 65.6%). Post hoc inspection of the informative features further shows more biologically meaningful brain functional connectivities obtained by our proposed method. PMID:28150897

  6. Low-rank and Adaptive Sparse Signal (LASSI) Models for Highly Accelerated Dynamic Imaging

    PubMed Central

    Ravishankar, Saiprasad; Moore, Brian E.; Nadakuditi, Raj Rao; Fessler, Jeffrey A.

    2017-01-01

    Sparsity-based approaches have been popular in many applications in image processing and imaging. Compressed sensing exploits the sparsity of images in a transform domain or dictionary to improve image recovery from undersampled measurements. In the context of inverse problems in dynamic imaging, recent research has demonstrated the promise of sparsity and low-rank techniques. For example, the patches of the underlying data are modeled as sparse in an adaptive dictionary domain, and the resulting image and dictionary estimation from undersampled measurements is called dictionary-blind compressed sensing, or the dynamic image sequence is modeled as a sum of low-rank and sparse (in some transform domain) components (L+S model) that are estimated from limited measurements. In this work, we investigate a data-adaptive extension of the L+S model, dubbed LASSI, where the temporal image sequence is decomposed into a low-rank component and a component whose spatiotemporal (3D) patches are sparse in some adaptive dictionary domain. We investigate various formulations and efficient methods for jointly estimating the underlying dynamic signal components and the spatiotemporal dictionary from limited measurements. We also obtain efficient sparsity penalized dictionary-blind compressed sensing methods as special cases of our LASSI approaches. Our numerical experiments demonstrate the promising performance of LASSI schemes for dynamic magnetic resonance image reconstruction from limited k-t space data compared to recent methods such as k-t SLR and L+S, and compared to the proposed dictionary-blind compressed sensing method. PMID:28092528

  7. Low-Rank and Adaptive Sparse Signal (LASSI) Models for Highly Accelerated Dynamic Imaging.

    PubMed

    Ravishankar, Saiprasad; Moore, Brian E; Nadakuditi, Raj Rao; Fessler, Jeffrey A

    2017-05-01

    Sparsity-based approaches have been popular in many applications in image processing and imaging. Compressed sensing exploits the sparsity of images in a transform domain or dictionary to improve image recovery fromundersampledmeasurements. In the context of inverse problems in dynamic imaging, recent research has demonstrated the promise of sparsity and low-rank techniques. For example, the patches of the underlying data are modeled as sparse in an adaptive dictionary domain, and the resulting image and dictionary estimation from undersampled measurements is called dictionary-blind compressed sensing, or the dynamic image sequence is modeled as a sum of low-rank and sparse (in some transform domain) components (L+S model) that are estimated from limited measurements. In this work, we investigate a data-adaptive extension of the L+S model, dubbed LASSI, where the temporal image sequence is decomposed into a low-rank component and a component whose spatiotemporal (3D) patches are sparse in some adaptive dictionary domain. We investigate various formulations and efficient methods for jointly estimating the underlying dynamic signal components and the spatiotemporal dictionary from limited measurements. We also obtain efficient sparsity penalized dictionary-blind compressed sensing methods as special cases of our LASSI approaches. Our numerical experiments demonstrate the promising performance of LASSI schemes for dynamicmagnetic resonance image reconstruction from limited k-t space data compared to recent methods such as k-t SLR and L+S, and compared to the proposed dictionary-blind compressed sensing method.

  8. Developing a regional retrospective ensemble precipitation dataset for watershed hydrology modeling, Idaho, USA

    NASA Astrophysics Data System (ADS)

    Flores, A. N.; Smith, K.; LaPorte, P.

    2011-12-01

    Applications like flood forecasting, military trafficability assessment, and slope stability analysis necessitate the use of models capable of resolving hydrologic states and fluxes at spatial scales of hillslopes (e.g., 10s to 100s m). These models typically require precipitation forcings at spatial scales of kilometers or better and time intervals of hours. Yet in especially rugged terrain that typifies much of the Western US and throughout much of the developing world, precipitation data at these spatiotemporal resolutions is difficult to come by. Ground-based weather radars have significant problems in high-relief settings and are sparsely located, leaving significant gaps in coverage and high uncertainties. Precipitation gages provide accurate data at points but are very sparsely located and their placement is often not representative, yielding significant coverage gaps in a spatial and physiographic sense. Numerical weather prediction efforts have made precipitation data, including critically important information on precipitation phase, available globally and in near real-time. However, these datasets present watershed modelers with two problems: (1) spatial scales of many of these datasets are tens of kilometers or coarser, (2) numerical weather models used to generate these datasets include a land surface parameterization that in some circumstances can significantly affect precipitation predictions. We report on the development of a regional precipitation dataset for Idaho that leverages: (1) a dataset derived from a numerical weather prediction model, (2) gages within Idaho that report hourly precipitation data, and (3) a long-term precipitation climatology dataset. Hourly precipitation estimates from the Modern Era Retrospective-analysis for Research and Applications (MERRA) are stochastically downscaled using a hybrid orographic and statistical model from their native resolution (1/2 x 2/3 degrees) to a resolution of approximately 1 km. Downscaled precipitation realizations are conditioned on hourly observations from reporting gages and then conditioned again on the Parameter-elevation Regressions on Independent Slopes Model (PRISM) at the monthly timescale to reflect orographic precipitation trends common to watersheds of the Western US. While this methodology potentially introduces cross-pollination of errors due to the re-use of precipitation gage data, it nevertheless achieves an ensemble-based precipitation estimate and appropriate measures of uncertainty at a spatiotemporal resolution appropriate for watershed modeling.

  9. High-dimensional statistical inference: From vector to matrix

    NASA Astrophysics Data System (ADS)

    Zhang, Anru

    Statistical inference for sparse signals or low-rank matrices in high-dimensional settings is of significant interest in a range of contemporary applications. It has attracted significant recent attention in many fields including statistics, applied mathematics and electrical engineering. In this thesis, we consider several problems in including sparse signal recovery (compressed sensing under restricted isometry) and low-rank matrix recovery (matrix recovery via rank-one projections and structured matrix completion). The first part of the thesis discusses compressed sensing and affine rank minimization in both noiseless and noisy cases and establishes sharp restricted isometry conditions for sparse signal and low-rank matrix recovery. The analysis relies on a key technical tool which represents points in a polytope by convex combinations of sparse vectors. The technique is elementary while leads to sharp results. It is shown that, in compressed sensing, delta kA < 1/3, deltak A+ thetak,kA < 1, or deltatkA < √( t - 1)/t for any given constant t ≥ 4/3 guarantee the exact recovery of all k sparse signals in the noiseless case through the constrained ℓ1 minimization, and similarly in affine rank minimization delta rM < 1/3, deltar M + thetar, rM < 1, or deltatrM< √( t - 1)/t ensure the exact reconstruction of all matrices with rank at most r in the noiseless case via the constrained nuclear norm minimization. Moreover, for any epsilon > 0, delta kA < 1/3 + epsilon, deltak A + thetak,kA < 1 + epsilon, or deltatkA< √(t - 1) / t + epsilon are not sufficient to guarantee the exact recovery of all k-sparse signals for large k. Similar result also holds for matrix recovery. In addition, the conditions delta kA<1/3, deltak A+ thetak,kA<1, delta tkA < √(t - 1)/t and deltarM<1/3, delta rM+ thetar,rM<1, delta trM< √(t - 1)/ t are also shown to be sufficient respectively for stable recovery of approximately sparse signals and low-rank matrices in the noisy case. For the second part of the thesis, we introduce a rank-one projection model for low-rank matrix recovery and propose a constrained nuclear norm minimization method for stable recovery of low-rank matrices in the noisy case. The procedure is adaptive to the rank and robust against small perturbations. Both upper and lower bounds for the estimation accuracy under the Frobenius norm loss are obtained. The proposed estimator is shown to be rate-optimal under certain conditions. The estimator is easy to implement via convex programming and performs well numerically. The techniques and main results developed in the chapter also have implications to other related statistical problems. An application to estimation of spiked covariance matrices from one-dimensional random projections is considered. The results demonstrate that it is still possible to accurately estimate the covariance matrix of a high-dimensional distribution based only on one-dimensional projections. For the third part of the thesis, we consider another setting of low-rank matrix completion. Current literature on matrix completion focuses primarily on independent sampling models under which the individual observed entries are sampled independently. Motivated by applications in genomic data integration, we propose a new framework of structured matrix completion (SMC) to treat structured missingness by design. Specifically, our proposed method aims at efficient matrix recovery when a subset of the rows and columns of an approximately low-rank matrix are observed. We provide theoretical justification for the proposed SMC method and derive lower bound for the estimation errors, which together establish the optimal rate of recovery over certain classes of approximately low-rank matrices. Simulation studies show that the method performs well in finite sample under a variety of configurations. The method is applied to integrate several ovarian cancer genomic studies with different extent of genomic measurements, which enables us to construct more accurate prediction rules for ovarian cancer survival.

  10. Sparse Matrices in MATLAB: Design and Implementation

    NASA Technical Reports Server (NTRS)

    Gilbert, John R.; Moler, Cleve; Schreiber, Robert

    1992-01-01

    The matrix computation language and environment MATLAB is extended to include sparse matrix storage and operations. The only change to the outward appearance of the MATLAB language is a pair of commands to create full or sparse matrices. Nearly all the operations of MATLAB now apply equally to full or sparse matrices, without any explicit action by the user. The sparse data structure represents a matrix in space proportional to the number of nonzero entries, and most of the operations compute sparse results in time proportional to the number of arithmetic operations on nonzeros.

  11. Downscaling ocean conditions: Experiments with a quasi-geostrophic model

    NASA Astrophysics Data System (ADS)

    Katavouta, A.; Thompson, K. R.

    2013-12-01

    The predictability of small-scale ocean variability, given the time history of the associated large-scales, is investigated using a quasi-geostrophic model of two wind-driven gyres separated by an unstable, mid-ocean jet. Motivated by the recent theoretical study of Henshaw et al. (2003), we propose a straightforward method for assimilating information on the large-scale in order to recover the small-scale details of the quasi-geostrophic circulation. The similarity of this method to the spectral nudging of limited area atmospheric models is discussed. Results from the spectral nudging of the quasi-geostrophic model, and an independent multivariate regression-based approach, show that important features of the ocean circulation, including the position of the meandering mid-ocean jet and the associated pinch-off eddies, can be recovered from the time history of a small number of large-scale modes. We next propose a hybrid approach for assimilating both the large-scales and additional observed time series from a limited number of locations that alone are too sparse to recover the small scales using traditional assimilation techniques. The hybrid approach improved significantly the recovery of the small-scales. The results highlight the importance of the coupling between length scales in downscaling applications, and the value of assimilating limited point observations after the large-scales have been set correctly. The application of the hybrid and spectral nudging to practical ocean forecasting, and projecting changes in ocean conditions on climate time scales, is discussed briefly.

  12. The Two-Dimensional Gabor Function Adapted to Natural Image Statistics: A Model of Simple-Cell Receptive Fields and Sparse Structure in Images.

    PubMed

    Loxley, P N

    2017-10-01

    The two-dimensional Gabor function is adapted to natural image statistics, leading to a tractable probabilistic generative model that can be used to model simple cell receptive field profiles, or generate basis functions for sparse coding applications. Learning is found to be most pronounced in three Gabor function parameters representing the size and spatial frequency of the two-dimensional Gabor function and characterized by a nonuniform probability distribution with heavy tails. All three parameters are found to be strongly correlated, resulting in a basis of multiscale Gabor functions with similar aspect ratios and size-dependent spatial frequencies. A key finding is that the distribution of receptive-field sizes is scale invariant over a wide range of values, so there is no characteristic receptive field size selected by natural image statistics. The Gabor function aspect ratio is found to be approximately conserved by the learning rules and is therefore not well determined by natural image statistics. This allows for three distinct solutions: a basis of Gabor functions with sharp orientation resolution at the expense of spatial-frequency resolution, a basis of Gabor functions with sharp spatial-frequency resolution at the expense of orientation resolution, or a basis with unit aspect ratio. Arbitrary mixtures of all three cases are also possible. Two parameters controlling the shape of the marginal distributions in a probabilistic generative model fully account for all three solutions. The best-performing probabilistic generative model for sparse coding applications is found to be a gaussian copula with Pareto marginal probability density functions.

  13. pySAPC, a python package for sparse affinity propagation clustering: Application to odontogenesis whole genome time series gene-expression data.

    PubMed

    Cao, Huojun; Amendt, Brad A

    2016-11-01

    Developmental dental anomalies are common forms of congenital defects. The molecular mechanisms of dental anomalies are poorly understood. Systematic approaches such as clustering genes based on similar expression patterns could identify novel genes involved in dental anomalies and provide a framework for understanding molecular regulatory mechanisms of these genes during tooth development (odontogenesis). A python package (pySAPC) of sparse affinity propagation clustering algorithm for large datasets was developed. Whole genome pair-wise similarity was calculated based on expression pattern similarity based on 45 microarrays of several stages during odontogenesis. pySAPC identified 743 gene clusters based on expression pattern similarity during mouse tooth development. Three clusters are significantly enriched for genes associated with dental anomalies (with FDR <0.1). The three clusters of genes have distinct expression patterns during odontogenesis. Clustering genes based on similar expression profiles recovered several known regulatory relationships for genes involved in odontogenesis, as well as many novel genes that may be involved with the same genetic pathways as genes that have already been shown to contribute to dental defects. By using sparse similarity matrix, pySAPC use much less memory and CPU time compared with the original affinity propagation program that uses a full similarity matrix. This python package will be useful for many applications where dataset(s) are too large to use full similarity matrix. This article is part of a Special Issue entitled "System Genetics" Guest Editor: Dr. Yudong Cai and Dr. Tao Huang. Copyright © 2016. Published by Elsevier B.V.

  14. Development of Innovative Technology to Provide Low-Cost Surface Atmospheric Observations in Data Sparse Regions

    NASA Astrophysics Data System (ADS)

    Kucera, Paul; Steinson, Martin

    2017-04-01

    Accurate and reliable real-time monitoring and dissemination of observations of surface weather conditions is critical for a variety of societal applications. Applications that provide local and regional information about temperature, precipitation, moisture, and winds, for example, are important for agriculture, water resource monitoring, health, and monitoring of hazard weather conditions. In many regions of the World, surface weather stations are sparsely located and/or of poor quality. Existing stations have often been sited incorrectly, not well-maintained, and have limited communications established at the site for real-time monitoring. The University Corporation for Atmospheric Research (UCAR)/National Center for Atmospheric Research (NCAR), with support from USAID, has started an initiative to develop and deploy low-cost weather instrumentation in sparsely observed regions of the world. The project is focused on improving weather observations for environmental monitoring and early warning alert systems on a regional to global scale. Instrumentation that has been developed use innovative new technologies such as 3D printers, Raspberry Pi computing systems, and wireless communications. The goal of the project is to make the weather station designs, software, and processing tools an open community resource. The weather stations can be built locally by agencies, through educational institutions, and residential communities as a citizen effort to augment existing networks to improve detection of natural hazards for disaster risk reduction. The presentation will provide an overview of the open source weather station technology and evaluation of sensor observations for the initial networks that have been deployed in Africa.

  15. Blind source separation by sparse decomposition

    NASA Astrophysics Data System (ADS)

    Zibulevsky, Michael; Pearlmutter, Barak A.

    2000-04-01

    The blind source separation problem is to extract the underlying source signals from a set of their linear mixtures, where the mixing matrix is unknown. This situation is common, eg in acoustics, radio, and medical signal processing. We exploit the property of the sources to have a sparse representation in a corresponding signal dictionary. Such a dictionary may consist of wavelets, wavelet packets, etc., or be obtained by learning from a given family of signals. Starting from the maximum a posteriori framework, which is applicable to the case of more sources than mixtures, we derive a few other categories of objective functions, which provide faster and more robust computations, when there are an equal number of sources and mixtures. Our experiments with artificial signals and with musical sounds demonstrate significantly better separation than other known techniques.

  16. Sparse Additive Ordinary Differential Equations for Dynamic Gene Regulatory Network Modeling.

    PubMed

    Wu, Hulin; Lu, Tao; Xue, Hongqi; Liang, Hua

    2014-04-02

    The gene regulation network (GRN) is a high-dimensional complex system, which can be represented by various mathematical or statistical models. The ordinary differential equation (ODE) model is one of the popular dynamic GRN models. High-dimensional linear ODE models have been proposed to identify GRNs, but with a limitation of the linear regulation effect assumption. In this article, we propose a sparse additive ODE (SA-ODE) model, coupled with ODE estimation methods and adaptive group LASSO techniques, to model dynamic GRNs that could flexibly deal with nonlinear regulation effects. The asymptotic properties of the proposed method are established and simulation studies are performed to validate the proposed approach. An application example for identifying the nonlinear dynamic GRN of T-cell activation is used to illustrate the usefulness of the proposed method.

  17. Storage of sparse files using parallel log-structured file system

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Bent, John M.; Faibish, Sorin; Grider, Gary

    A sparse file is stored without holes by storing a data portion of the sparse file using a parallel log-structured file system; and generating an index entry for the data portion, the index entry comprising a logical offset, physical offset and length of the data portion. The holes can be restored to the sparse file upon a reading of the sparse file. The data portion can be stored at a logical end of the sparse file. Additional storage efficiency can optionally be achieved by (i) detecting a write pattern for a plurality of the data portions and generating a singlemore » patterned index entry for the plurality of the patterned data portions; and/or (ii) storing the patterned index entries for a plurality of the sparse files in a single directory, wherein each entry in the single directory comprises an identifier of a corresponding sparse file.« less

  18. A compressed sensing X-ray camera with a multilayer architecture

    DOE PAGES

    Wang, Zhehui; Laroshenko, O.; Li, S.; ...

    2018-01-25

    Recent advances in compressed sensing theory and algorithms offer new possibilities for high-speed X-ray camera design. In many CMOS cameras, each pixel has an independent on-board circuit that includes an amplifier, noise rejection, signal shaper, an analog-to-digital converter (ADC), and optional in-pixel storage. When X-ray images are sparse, i.e., when one of the following cases is true: (a.) The number of pixels with true X-ray hits is much smaller than the total number of pixels; (b.) The X-ray information is redundant; or (c.) Some prior knowledge about the X-ray images exists, sparse sampling may be allowed. In this work, wemore » first illustrate the feasibility of random on-board pixel sampling (ROPS) using an existing set of X-ray images, followed by a discussion about signal to noise as a function of pixel size. Next, we describe a possible circuit architecture to achieve random pixel access and in-pixel storage. The combination of a multilayer architecture, sparse on-chip sampling, and computational image techniques, is expected to facilitate the development and applications of high-speed X-ray camera technology.« less

  19. Identification-While-Scanning of a Multi-Aircraft Formation Based on Sparse Recovery for Narrowband Radar.

    PubMed

    Jiang, Yuan; Xu, Jia; Peng, Shi-Bao; Mao, Er-Ke; Long, Teng; Peng, Ying-Ning

    2016-11-23

    It is known that the identification performance of a multi-aircraft formation (MAF) of narrowband radar mainly depends on the time on target (TOT). To realize the identification task in one rotated scan with limited TOT, the paper proposes a novel identification-while-scanning (IWS) method based on sparse recovery to maintain high rotating speed and super-resolution for MAF identification, simultaneously. First, a multiple chirp signal model is established for MAF in a single scan, where different aircraft may have different Doppler centers and Doppler rates. Second, based on the sparsity of MAF in the Doppler parameter space, a novel hierarchical basis pursuit (HBP) method is proposed to obtain satisfactory sparse recovery performance as well as high computational efficiency. Furthermore, the parameter estimation performance of the proposed IWS identification method is analyzed with respect to recovery condition, signal-to-noise ratio and TOT. It is shown that an MAF can be effectively identified via HBP with a TOT of only about one hundred microseconds for IWS applications. Finally, some numerical experiment results are provided to demonstrate the effectiveness of the proposed method based on both simulated and real measured data.

  20. Large Covariance Estimation by Thresholding Principal Orthogonal Complements

    PubMed Central

    Fan, Jianqing; Liao, Yuan; Mincheva, Martina

    2012-01-01

    This paper deals with the estimation of a high-dimensional covariance with a conditional sparsity structure and fast-diverging eigenvalues. By assuming sparse error covariance matrix in an approximate factor model, we allow for the presence of some cross-sectional correlation even after taking out common but unobservable factors. We introduce the Principal Orthogonal complEment Thresholding (POET) method to explore such an approximate factor structure with sparsity. The POET estimator includes the sample covariance matrix, the factor-based covariance matrix (Fan, Fan, and Lv, 2008), the thresholding estimator (Bickel and Levina, 2008) and the adaptive thresholding estimator (Cai and Liu, 2011) as specific examples. We provide mathematical insights when the factor analysis is approximately the same as the principal component analysis for high-dimensional data. The rates of convergence of the sparse residual covariance matrix and the conditional sparse covariance matrix are studied under various norms. It is shown that the impact of estimating the unknown factors vanishes as the dimensionality increases. The uniform rates of convergence for the unobserved factors and their factor loadings are derived. The asymptotic results are also verified by extensive simulation studies. Finally, a real data application on portfolio allocation is presented. PMID:24348088

  1. Large Covariance Estimation by Thresholding Principal Orthogonal Complements.

    PubMed

    Fan, Jianqing; Liao, Yuan; Mincheva, Martina

    2013-09-01

    This paper deals with the estimation of a high-dimensional covariance with a conditional sparsity structure and fast-diverging eigenvalues. By assuming sparse error covariance matrix in an approximate factor model, we allow for the presence of some cross-sectional correlation even after taking out common but unobservable factors. We introduce the Principal Orthogonal complEment Thresholding (POET) method to explore such an approximate factor structure with sparsity. The POET estimator includes the sample covariance matrix, the factor-based covariance matrix (Fan, Fan, and Lv, 2008), the thresholding estimator (Bickel and Levina, 2008) and the adaptive thresholding estimator (Cai and Liu, 2011) as specific examples. We provide mathematical insights when the factor analysis is approximately the same as the principal component analysis for high-dimensional data. The rates of convergence of the sparse residual covariance matrix and the conditional sparse covariance matrix are studied under various norms. It is shown that the impact of estimating the unknown factors vanishes as the dimensionality increases. The uniform rates of convergence for the unobserved factors and their factor loadings are derived. The asymptotic results are also verified by extensive simulation studies. Finally, a real data application on portfolio allocation is presented.

  2. Multi-subject hierarchical inverse covariance modelling improves estimation of functional brain networks.

    PubMed

    Colclough, Giles L; Woolrich, Mark W; Harrison, Samuel J; Rojas López, Pedro A; Valdes-Sosa, Pedro A; Smith, Stephen M

    2018-05-07

    A Bayesian model for sparse, hierarchical, inver-covariance estimation is presented, and applied to multi-subject functional connectivity estimation in the human brain. It enables simultaneous inference of the strength of connectivity between brain regions at both subject and population level, and is applicable to fMRI, MEG and EEG data. Two versions of the model can encourage sparse connectivity, either using continuous priors to suppress irrelevant connections, or using an explicit description of the network structure to estimate the connection probability between each pair of regions. A large evaluation of this model, and thirteen methods that represent the state of the art of inverse covariance modelling, is conducted using both simulated and resting-state functional imaging datasets. Our novel Bayesian approach has similar performance to the best extant alternative, Ng et al.'s Sparse Group Gaussian Graphical Model algorithm, which also is based on a hierarchical structure. Using data from the Human Connectome Project, we show that these hierarchical models are able to reduce the measurement error in MEG beta-band functional networks by 10%, producing concomitant increases in estimates of the genetic influence on functional connectivity. Copyright © 2018. Published by Elsevier Inc.

  3. Compressed sensing for energy-efficient wireless telemonitoring of noninvasive fetal ECG via block sparse Bayesian learning.

    PubMed

    Zhang, Zhilin; Jung, Tzyy-Ping; Makeig, Scott; Rao, Bhaskar D

    2013-02-01

    Fetal ECG (FECG) telemonitoring is an important branch in telemedicine. The design of a telemonitoring system via a wireless body area network with low energy consumption for ambulatory use is highly desirable. As an emerging technique, compressed sensing (CS) shows great promise in compressing/reconstructing data with low energy consumption. However, due to some specific characteristics of raw FECG recordings such as nonsparsity and strong noise contamination, current CS algorithms generally fail in this application. This paper proposes to use the block sparse Bayesian learning framework to compress/reconstruct nonsparse raw FECG recordings. Experimental results show that the framework can reconstruct the raw recordings with high quality. Especially, the reconstruction does not destroy the interdependence relation among the multichannel recordings. This ensures that the independent component analysis decomposition of the reconstructed recordings has high fidelity. Furthermore, the framework allows the use of a sparse binary sensing matrix with much fewer nonzero entries to compress recordings. Particularly, each column of the matrix can contain only two nonzero entries. This shows that the framework, compared to other algorithms such as current CS algorithms and wavelet algorithms, can greatly reduce code execution in CPU in the data compression stage.

  4. Sparse signals recovered by non-convex penalty in quasi-linear systems.

    PubMed

    Cui, Angang; Li, Haiyang; Wen, Meng; Peng, Jigen

    2018-01-01

    The goal of compressed sensing is to reconstruct a sparse signal under a few linear measurements far less than the dimension of the ambient space of the signal. However, many real-life applications in physics and biomedical sciences carry some strongly nonlinear structures, and the linear model is no longer suitable. Compared with the compressed sensing under the linear circumstance, this nonlinear compressed sensing is much more difficult, in fact also NP-hard, combinatorial problem, because of the discrete and discontinuous nature of the [Formula: see text]-norm and the nonlinearity. In order to get a convenience for sparse signal recovery, we set the nonlinear models have a smooth quasi-linear nature in this paper, and study a non-convex fraction function [Formula: see text] in this quasi-linear compressed sensing. We propose an iterative fraction thresholding algorithm to solve the regularization problem [Formula: see text] for all [Formula: see text]. With the change of parameter [Formula: see text], our algorithm could get a promising result, which is one of the advantages for our algorithm compared with some state-of-art algorithms. Numerical experiments show that our method performs much better than some state-of-the-art methods.

  5. FDD Massive MIMO Channel Estimation With Arbitrary 2D-Array Geometry

    NASA Astrophysics Data System (ADS)

    Dai, Jisheng; Liu, An; Lau, Vincent K. N.

    2018-05-01

    This paper addresses the problem of downlink channel estimation in frequency-division duplexing (FDD) massive multiple-input multiple-output (MIMO) systems. The existing methods usually exploit hidden sparsity under a discrete Fourier transform (DFT) basis to estimate the cdownlink channel. However, there are at least two shortcomings of these DFT-based methods: 1) they are applicable to uniform linear arrays (ULAs) only, since the DFT basis requires a special structure of ULAs, and 2) they always suffer from a performance loss due to the leakage of energy over some DFT bins. To deal with the above shortcomings, we introduce an off-grid model for downlink channel sparse representation with arbitrary 2D-array antenna geometry, and propose an efficient sparse Bayesian learning (SBL) approach for the sparse channel recovery and off-grid refinement. The main idea of the proposed off-grid method is to consider the sampled grid points as adjustable parameters. Utilizing an in-exact block majorization-minimization (MM) algorithm, the grid points are refined iteratively to minimize the off-grid gap. Finally, we further extend the solution to uplink-aided channel estimation by exploiting the angular reciprocity between downlink and uplink channels, which brings enhanced recovery performance.

  6. Technical note: an R package for fitting sparse neural networks with application in animal breeding.

    PubMed

    Wang, Yangfan; Mi, Xue; Rosa, Guilherme J M; Chen, Zhihui; Lin, Ping; Wang, Shi; Bao, Zhenmin

    2018-05-04

    Neural networks (NNs) have emerged as a new tool for genomic selection (GS) in animal breeding. However, the properties of NN used in GS for the prediction of phenotypic outcomes are not well characterized due to the problem of over-parameterization of NN and difficulties in using whole-genome marker sets as high-dimensional NN input. In this note, we have developed an R package called snnR that finds an optimal sparse structure of a NN by minimizing the square error subject to a penalty on the L1-norm of the parameters (weights and biases), therefore solving the problem of over-parameterization in NN. We have also tested some models fitted in the snnR package to demonstrate their feasibility and effectiveness to be used in several cases as examples. In comparison of snnR to the R package brnn (the Bayesian regularized single layer NNs), with both using the entries of a genotype matrix or a genomic relationship matrix as inputs, snnR has greatly improved the computational efficiency and the prediction ability for the GS in animal breeding because snnR implements a sparse NN with many hidden layers.

  7. A Non-destructive Terahertz Spectroscopy-Based Method for Transgenic Rice Seed Discrimination via Sparse Representation

    NASA Astrophysics Data System (ADS)

    Hu, Xiaohua; Lang, Wenhui; Liu, Wei; Xu, Xue; Yang, Jianbo; Zheng, Lei

    2017-08-01

    Terahertz (THz) spectroscopy technique has been researched and developed for rapid and non-destructive detection of food safety and quality due to its low-energy and non-ionizing characteristics. The objective of this study was to develop a flexible identification model to discriminate transgenic and non-transgenic rice seeds based on terahertz (THz) spectroscopy. To extract THz spectral features and reduce the feature dimension, sparse representation (SR) is employed in this work. A sufficient sparsity level is selected to train the sparse coding of the THz data, and the random forest (RF) method is then applied to obtain a discrimination model. The results show that there exist differences between transgenic and non-transgenic rice seeds in THz spectral band and, comparing with Least squares support vector machines (LS-SVM) method, SR-RF is a better model for discrimination (accuracy is 95% in prediction set, 100% in calibration set, respectively). The conclusion is that SR may be more useful in the application of THz spectroscopy to reduce dimension and the SR-RF provides a new, effective, and flexible method for detection and identification of transgenic and non-transgenic rice seeds with THz spectral system.

  8. Anomaly Detection in Moving-Camera Video Sequences Using Principal Subspace Analysis

    DOE PAGES

    Thomaz, Lucas A.; Jardim, Eric; da Silva, Allan F.; ...

    2017-10-16

    This study presents a family of algorithms based on sparse decompositions that detect anomalies in video sequences obtained from slow moving cameras. These algorithms start by computing the union of subspaces that best represents all the frames from a reference (anomaly free) video as a low-rank projection plus a sparse residue. Then, they perform a low-rank representation of a target (possibly anomalous) video by taking advantage of both the union of subspaces and the sparse residue computed from the reference video. Such algorithms provide good detection results while at the same time obviating the need for previous video synchronization. However,more » this is obtained at the cost of a large computational complexity, which hinders their applicability. Another contribution of this paper approaches this problem by using intrinsic properties of the obtained data representation in order to restrict the search space to the most relevant subspaces, providing computational complexity gains of up to two orders of magnitude. The developed algorithms are shown to cope well with videos acquired in challenging scenarios, as verified by the analysis of 59 videos from the VDAO database that comprises videos with abandoned objects in a cluttered industrial scenario.« less

  9. Anomaly Detection in Moving-Camera Video Sequences Using Principal Subspace Analysis

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Thomaz, Lucas A.; Jardim, Eric; da Silva, Allan F.

    This study presents a family of algorithms based on sparse decompositions that detect anomalies in video sequences obtained from slow moving cameras. These algorithms start by computing the union of subspaces that best represents all the frames from a reference (anomaly free) video as a low-rank projection plus a sparse residue. Then, they perform a low-rank representation of a target (possibly anomalous) video by taking advantage of both the union of subspaces and the sparse residue computed from the reference video. Such algorithms provide good detection results while at the same time obviating the need for previous video synchronization. However,more » this is obtained at the cost of a large computational complexity, which hinders their applicability. Another contribution of this paper approaches this problem by using intrinsic properties of the obtained data representation in order to restrict the search space to the most relevant subspaces, providing computational complexity gains of up to two orders of magnitude. The developed algorithms are shown to cope well with videos acquired in challenging scenarios, as verified by the analysis of 59 videos from the VDAO database that comprises videos with abandoned objects in a cluttered industrial scenario.« less

  10. Random On-Board Pixel Sampling (ROPS) X-Ray Camera

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Wang, Zhehui; Iaroshenko, O.; Li, S.

    Recent advances in compressed sensing theory and algorithms offer new possibilities for high-speed X-ray camera design. In many CMOS cameras, each pixel has an independent on-board circuit that includes an amplifier, noise rejection, signal shaper, an analog-to-digital converter (ADC), and optional in-pixel storage. When X-ray images are sparse, i.e., when one of the following cases is true: (a.) The number of pixels with true X-ray hits is much smaller than the total number of pixels; (b.) The X-ray information is redundant; or (c.) Some prior knowledge about the X-ray images exists, sparse sampling may be allowed. Here we first illustratemore » the feasibility of random on-board pixel sampling (ROPS) using an existing set of X-ray images, followed by a discussion about signal to noise as a function of pixel size. Next, we describe a possible circuit architecture to achieve random pixel access and in-pixel storage. The combination of a multilayer architecture, sparse on-chip sampling, and computational image techniques, is expected to facilitate the development and applications of high-speed X-ray camera technology.« less

  11. Joint seismic data denoising and interpolation with double-sparsity dictionary learning

    NASA Astrophysics Data System (ADS)

    Zhu, Lingchen; Liu, Entao; McClellan, James H.

    2017-08-01

    Seismic data quality is vital to geophysical applications, so that methods of data recovery, including denoising and interpolation, are common initial steps in the seismic data processing flow. We present a method to perform simultaneous interpolation and denoising, which is based on double-sparsity dictionary learning. This extends previous work that was for denoising only. The original double-sparsity dictionary learning algorithm is modified to track the traces with missing data by defining a masking operator that is integrated into the sparse representation of the dictionary. A weighted low-rank approximation algorithm is adopted to handle the dictionary updating as a sparse recovery optimization problem constrained by the masking operator. Compared to traditional sparse transforms with fixed dictionaries that lack the ability to adapt to complex data structures, the double-sparsity dictionary learning method learns the signal adaptively from selected patches of the corrupted seismic data, while preserving compact forward and inverse transform operators. Numerical experiments on synthetic seismic data indicate that this new method preserves more subtle features in the data set without introducing pseudo-Gibbs artifacts when compared to other directional multi-scale transform methods such as curvelets.

  12. A compressed sensing X-ray camera with a multilayer architecture

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Wang, Zhehui; Laroshenko, O.; Li, S.

    Recent advances in compressed sensing theory and algorithms offer new possibilities for high-speed X-ray camera design. In many CMOS cameras, each pixel has an independent on-board circuit that includes an amplifier, noise rejection, signal shaper, an analog-to-digital converter (ADC), and optional in-pixel storage. When X-ray images are sparse, i.e., when one of the following cases is true: (a.) The number of pixels with true X-ray hits is much smaller than the total number of pixels; (b.) The X-ray information is redundant; or (c.) Some prior knowledge about the X-ray images exists, sparse sampling may be allowed. In this work, wemore » first illustrate the feasibility of random on-board pixel sampling (ROPS) using an existing set of X-ray images, followed by a discussion about signal to noise as a function of pixel size. Next, we describe a possible circuit architecture to achieve random pixel access and in-pixel storage. The combination of a multilayer architecture, sparse on-chip sampling, and computational image techniques, is expected to facilitate the development and applications of high-speed X-ray camera technology.« less

  13. Computational efficiency improvements for image colorization

    NASA Astrophysics Data System (ADS)

    Yu, Chao; Sharma, Gaurav; Aly, Hussein

    2013-03-01

    We propose an efficient algorithm for colorization of greyscale images. As in prior work, colorization is posed as an optimization problem: a user specifies the color for a few scribbles drawn on the greyscale image and the color image is obtained by propagating color information from the scribbles to surrounding regions, while maximizing the local smoothness of colors. In this formulation, colorization is obtained by solving a large sparse linear system, which normally requires substantial computation and memory resources. Our algorithm improves the computational performance through three innovations over prior colorization implementations. First, the linear system is solved iteratively without explicitly constructing the sparse matrix, which significantly reduces the required memory. Second, we formulate each iteration in terms of integral images obtained by dynamic programming, reducing repetitive computation. Third, we use a coarseto- fine framework, where a lower resolution subsampled image is first colorized and this low resolution color image is upsampled to initialize the colorization process for the fine level. The improvements we develop provide significant speedup and memory savings compared to the conventional approach of solving the linear system directly using off-the-shelf sparse solvers, and allow us to colorize images with typical sizes encountered in realistic applications on typical commodity computing platforms.

  14. Learning Sparse Feature Representations using Probabilistic Quadtrees and Deep Belief Nets

    DTIC Science & Technology

    2015-04-24

    Feature Representations usingProbabilistic Quadtrees and Deep Belief Nets Learning sparse feature representations is a useful instru- ment for solving an...novel framework for the classifi cation of handwritten digits that learns sparse representations using probabilistic quadtrees and Deep Belief Nets... Learning Sparse Feature Representations usingProbabilistic Quadtrees and Deep Belief Nets Report Title Learning sparse feature representations is a useful

  15. Joint association discovery and diagnosis of Alzheimer's disease by supervised heterogeneous multiview learning.

    PubMed

    Zhe, Shandian; Xu, Zenglin; Qi, Yuan; Yu, Peng

    2014-01-01

    A key step for Alzheimer's disease (AD) study is to identify associations between genetic variations and intermediate phenotypes (e.g., brain structures). At the same time, it is crucial to develop a noninvasive means for AD diagnosis. Although these two tasks-association discovery and disease diagnosis-have been treated separately by a variety of approaches, they are tightly coupled due to their common biological basis. We hypothesize that the two tasks can potentially benefit each other by a joint analysis, because (i) the association study discovers correlated biomarkers from different data sources, which may help improve diagnosis accuracy, and (ii) the disease status may help identify disease-sensitive associations between genetic variations and MRI features. Based on this hypothesis, we present a new sparse Bayesian approach for joint association study and disease diagnosis. In this approach, common latent features are extracted from different data sources based on sparse projection matrices and used to predict multiple disease severity levels based on Gaussian process ordinal regression; in return, the disease status is used to guide the discovery of relationships between the data sources. The sparse projection matrices not only reveal the associations but also select groups of biomarkers related to AD. To learn the model from data, we develop an efficient variational expectation maximization algorithm. Simulation results demonstrate that our approach achieves higher accuracy in both predicting ordinal labels and discovering associations between data sources than alternative methods. We apply our approach to an imaging genetics dataset of AD. Our joint analysis approach not only identifies meaningful and interesting associations between genetic variations, brain structures, and AD status, but also achieves significantly higher accuracy for predicting ordinal AD stages than the competing methods.

  16. Application of the Envelope Difference Index to Spectrally Sparse Speech

    ERIC Educational Resources Information Center

    Souza, Pamela; Hoover, Eric; Gallun, Frederick

    2012-01-01

    Purpose: Amplitude compression is a common hearing aid processing strategy that can improve speech audibility and loudness comfort but also has the potential to alter important cues carried by the speech envelope. In previous work, a measure of envelope change, the Envelope Difference Index (EDI; Fortune, Woodruff, & Preves, 1994), was moderately…

  17. Effect of Atmospheric Conditions on Coverage of Fogger Applications in a Desert Surface Boundary Layer

    DTIC Science & Technology

    2012-01-01

    2007) in pecan orchards, among others. In this sparse shrub desert environment, Nappo et al. (2010) showed that, during sta- ble conditions, the...measuring water use in flood-irrigated pecans (Carya illinoinensis). Agric. Water Mgmt. 88(1-3): 181-191. Solanelles, F., E. Gregorio, R. Sanz, J. R

  18. APPLICATION OF THE MODELS-3 COMMUNITY MULTI-SCALE AIR QUALITY (CMAQ) MODEL SYSTEM TO SOS/NASHVILLE 1999

    EPA Science Inventory

    The Models-3 Community Multi-scale Air Quality (CMAQ) model, first released by the USEPA in 1999 (Byun and Ching. 1999), continues to be developed and evaluated. The principal components of the CMAQ system include a comprehensive emission processor known as the Sparse Matrix O...

  19. The shared and unique values of optical, fluorescence, thermal and microwave satellite data for estimating large-scale crop yields

    USDA-ARS?s Scientific Manuscript database

    Large-scale crop monitoring and yield estimation are important for both scientific research and practical applications. Satellite remote sensing provides an effective means for regional and global cropland monitoring, particularly in data-sparse regions that lack reliable ground observations and rep...

  20. Effects of Ordering Strategies and Programming Paradigms on Sparse Matrix Computations

    NASA Technical Reports Server (NTRS)

    Oliker, Leonid; Li, Xiaoye; Husbands, Parry; Biswas, Rupak; Biegel, Bryan (Technical Monitor)

    2002-01-01

    The Conjugate Gradient (CG) algorithm is perhaps the best-known iterative technique to solve sparse linear systems that are symmetric and positive definite. For systems that are ill-conditioned, it is often necessary to use a preconditioning technique. In this paper, we investigate the effects of various ordering and partitioning strategies on the performance of parallel CG and ILU(O) preconditioned CG (PCG) using different programming paradigms and architectures. Results show that for this class of applications: ordering significantly improves overall performance on both distributed and distributed shared-memory systems, that cache reuse may be more important than reducing communication, that it is possible to achieve message-passing performance using shared-memory constructs through careful data ordering and distribution, and that a hybrid MPI+OpenMP paradigm increases programming complexity with little performance gains. A implementation of CG on the Cray MTA does not require special ordering or partitioning to obtain high efficiency and scalability, giving it a distinct advantage for adaptive applications; however, it shows limited scalability for PCG due to a lack of thread level parallelism.

  1. Convolutional Sparse Coding for RGB+NIR Imaging.

    PubMed

    Hu, Xuemei; Heide, Felix; Dai, Qionghai; Wetzstein, Gordon

    2018-04-01

    Emerging sensor designs increasingly rely on novel color filter arrays (CFAs) to sample the incident spectrum in unconventional ways. In particular, capturing a near-infrared (NIR) channel along with conventional RGB color is an exciting new imaging modality. RGB+NIR sensing has broad applications in computational photography, such as low-light denoising, it has applications in computer vision, such as facial recognition and tracking, and it paves the way toward low-cost single-sensor RGB and depth imaging using structured illumination. However, cost-effective commercial CFAs suffer from severe spectral cross talk. This cross talk represents a major challenge in high-quality RGB+NIR imaging, rendering existing spatially multiplexed sensor designs impractical. In this work, we introduce a new approach to RGB+NIR image reconstruction using learned convolutional sparse priors. We demonstrate high-quality color and NIR imaging for challenging scenes, even including high-frequency structured NIR illumination. The effectiveness of the proposed method is validated on a large data set of experimental captures, and simulated benchmark results which demonstrate that this work achieves unprecedented reconstruction quality.

  2. Simple adaptive sparse representation based classification schemes for EEG based brain-computer interface applications.

    PubMed

    Shin, Younghak; Lee, Seungchan; Ahn, Minkyu; Cho, Hohyun; Jun, Sung Chan; Lee, Heung-No

    2015-11-01

    One of the main problems related to electroencephalogram (EEG) based brain-computer interface (BCI) systems is the non-stationarity of the underlying EEG signals. This results in the deterioration of the classification performance during experimental sessions. Therefore, adaptive classification techniques are required for EEG based BCI applications. In this paper, we propose simple adaptive sparse representation based classification (SRC) schemes. Supervised and unsupervised dictionary update techniques for new test data and a dictionary modification method by using the incoherence measure of the training data are investigated. The proposed methods are very simple and additional computation for the re-training of the classifier is not needed. The proposed adaptive SRC schemes are evaluated using two BCI experimental datasets. The proposed methods are assessed by comparing classification results with the conventional SRC and other adaptive classification methods. On the basis of the results, we find that the proposed adaptive schemes show relatively improved classification accuracy as compared to conventional methods without requiring additional computation. Copyright © 2015 Elsevier Ltd. All rights reserved.

  3. Optimal parallel solution of sparse triangular systems

    NASA Technical Reports Server (NTRS)

    Alvarado, Fernando L.; Schreiber, Robert

    1990-01-01

    A method for the parallel solution of triangular sets of equations is described that is appropriate when there are many right-handed sides. By preprocessing, the method can reduce the number of parallel steps required to solve Lx = b compared to parallel forward or backsolve. Applications are to iterative solvers with triangular preconditioners, to structural analysis, or to power systems applications, where there may be many right-handed sides (not all available a priori). The inverse of L is represented as a product of sparse triangular factors. The problem is to find a factored representation of this inverse of L with the smallest number of factors (or partitions), subject to the requirement that no new nonzero elements be created in the formation of these inverse factors. A method from an earlier reference is shown to solve this problem. This method is improved upon by constructing a permutation of the rows and columns of L that preserves triangularity and allow for the best possible such partition. A number of practical examples and algorithmic details are presented. The parallelism attainable is illustrated by means of elimination trees and clique trees.

  4. Sparse Reconstruction for Temperature Distribution Using DTS Fiber Optic Sensors with Applications in Electrical Generator Stator Monitoring.

    PubMed

    Bazzo, João Paulo; Pipa, Daniel Rodrigues; da Silva, Erlon Vagner; Martelli, Cicero; Cardozo da Silva, Jean Carlos

    2016-09-07

    This paper presents an image reconstruction method to monitor the temperature distribution of electric generator stators. The main objective is to identify insulation failures that may arise as hotspots in the structure. The method is based on temperature readings of fiber optic distributed sensors (DTS) and a sparse reconstruction algorithm. Thermal images of the structure are formed by appropriately combining atoms of a dictionary of hotspots, which was constructed by finite element simulation with a multi-physical model. Due to difficulties for reproducing insulation faults in real stator structure, experimental tests were performed using a prototype similar to the real structure. The results demonstrate the ability of the proposed method to reconstruct images of hotspots with dimensions down to 15 cm, representing a resolution gain of up to six times when compared to the DTS spatial resolution. In addition, satisfactory results were also obtained to detect hotspots with only 5 cm. The application of the proposed algorithm for thermal imaging of generator stators can contribute to the identification of insulation faults in early stages, thereby avoiding catastrophic damage to the structure.

  5. User's Manual for PCSMS (Parallel Complex Sparse Matrix Solver). Version 1.

    NASA Technical Reports Server (NTRS)

    Reddy, C. J.

    2000-01-01

    PCSMS (Parallel Complex Sparse Matrix Solver) is a computer code written to make use of the existing real sparse direct solvers to solve complex, sparse matrix linear equations. PCSMS converts complex matrices into real matrices and use real, sparse direct matrix solvers to factor and solve the real matrices. The solution vector is reconverted to complex numbers. Though, this utility is written for Silicon Graphics (SGI) real sparse matrix solution routines, it is general in nature and can be easily modified to work with any real sparse matrix solver. The User's Manual is written to make the user acquainted with the installation and operation of the code. Driver routines are given to aid the users to integrate PCSMS routines in their own codes.

  6. Spatial interpolation schemes of daily precipitation for hydrologic modeling

    USGS Publications Warehouse

    Hwang, Y.; Clark, M.R.; Rajagopalan, B.; Leavesley, G.

    2012-01-01

    Distributed hydrologic models typically require spatial estimates of precipitation interpolated from sparsely located observational points to the specific grid points. We compare and contrast the performance of regression-based statistical methods for the spatial estimation of precipitation in two hydrologically different basins and confirmed that widely used regression-based estimation schemes fail to describe the realistic spatial variability of daily precipitation field. The methods assessed are: (1) inverse distance weighted average; (2) multiple linear regression (MLR); (3) climatological MLR; and (4) locally weighted polynomial regression (LWP). In order to improve the performance of the interpolations, the authors propose a two-step regression technique for effective daily precipitation estimation. In this simple two-step estimation process, precipitation occurrence is first generated via a logistic regression model before estimate the amount of precipitation separately on wet days. This process generated the precipitation occurrence, amount, and spatial correlation effectively. A distributed hydrologic model (PRMS) was used for the impact analysis in daily time step simulation. Multiple simulations suggested noticeable differences between the input alternatives generated by three different interpolation schemes. Differences are shown in overall simulation error against the observations, degree of explained variability, and seasonal volumes. Simulated streamflows also showed different characteristics in mean, maximum, minimum, and peak flows. Given the same parameter optimization technique, LWP input showed least streamflow error in Alapaha basin and CMLR input showed least error (still very close to LWP) in Animas basin. All of the two-step interpolation inputs resulted in lower streamflow error compared to the directly interpolated inputs. ?? 2011 Springer-Verlag.

  7. Vegetation dynamics and responses to climate change and human activities in Central Asia.

    PubMed

    Jiang, Liangliang; Guli Jiapaer; Bao, Anming; Guo, Hao; Ndayisaba, Felix

    2017-12-01

    Knowledge of the current changes and dynamics of different types of vegetation in relation to climatic changes and anthropogenic activities is critical for developing adaptation strategies to address the challenges posed by climate change and human activities for ecosystems. Based on a regression analysis and the Hurst exponent index method, this research investigated the spatial and temporal characteristics and relationships between vegetation greenness and climatic factors in Central Asia using the Normalized Difference Vegetation Index (NDVI) and gridded high-resolution station (land) data for the period 1984-2013. Further analysis distinguished between the effects of climatic change and those of human activities on vegetation dynamics by means of a residual analysis trend method. The results show that vegetation pixels significantly decreased for shrubs and sparse vegetation compared with those for the other vegetation types and that the degradation of sparse vegetation was more serious in the Karakum and Kyzylkum Deserts, the Ustyurt Plateau and the wetland delta of the Large Aral Sea than in other regions. The Hurst exponent results indicated that forests are more sustainable than grasslands, shrubs and sparse vegetation. Precipitation is the main factor affecting vegetation growth in the Kazakhskiy Melkosopochnik. Moreover, temperature is a controlling factor that influences the seasonal variation of vegetation greenness in the mountains and the Aral Sea basin. Drought is the main factor affecting vegetation degradation as a result of both increased temperature and decreased precipitation in the Kyzylkum Desert and the northern Ustyurt Plateau. The residual analysis highlighted that sparse vegetation and the degradation of some shrubs in the southern part of the Karakum Desert, the southern Ustyurt Plateau and the wetland delta of the Large Aral Sea were mainly triggered by human activities: the excessive exploitation of water resources in the upstream areas of the Amu Darya basin and oil and natural gas extraction in the southern part of the Karakum Desert and the southern Ustyurt Plateau. The results also indicated that after the collapse of the Soviet Union, abandoned pastures gave rise to increased vegetation in eastern Kazakhstan, Kyrgyzstan and Tajikistan, and abandoned croplands reverted to grasslands in northern Kazakhstan, leading to a decrease in cropland greenness. Shrubs and sparse vegetation were extremely sensitive to short-term climatic variations, and our results demonstrated that these vegetation types were the most seriously degraded by human activities. Therefore, regional governments should strive to restore vegetation to sustain this fragile arid ecological environment. Copyright © 2017 Elsevier B.V. All rights reserved.

  8. [Population density, age distribution and urbanisation as factors influencing the frequency of home visits--an analysis for Mecklenburg-West Pomerania].

    PubMed

    Heymann, R; Weitmann, K; Weiss, S; Thierfelder, D; Flessa, S; Hoffmann, W

    2009-07-01

    This study examines and compares the frequency of home visits by general practitioners in regions with a lower population density and regions with a higher population density. The discussion centres on the hypothesis whether the number of home visits in rural and remote areas with a low population density is, in fact, higher than in urbanised areas with a higher population density. The average age of the population has been considered in both cases. The communities of Mecklenburg West-Pomerania were aggregated into postal code regions. The analysis is based on these postal code regions. The average frequency of home visits per 100 inhabitants/km2 has been calculated via a bivariate, linear regression model with the population density and the average age for the postal code region as independent variables. The results are based on billing data of the year 2006 as provided by the Association of Statutory Health Insurance Physicians of Mecklenburg-Western Pomerania. In a second step a variable which clustered the postal codes of urbanised areas was added to a multivariate model. The hypothesis of a negative correlation between the frequency of home visits and the population density of the areas examined cannot be confirmed for Mecklenburg-Western Pomerania. Following the dichotomisation of the postal code regions into sparsely and densely populated areas, only the very sparsely populated postal code regions (less than 100 inhabitants/km2) show a tendency towards a higher frequency of home visits. Overall, the frequency of home visits in sparsely populated postal code regions is 28.9% higher than in the densely populated postal code regions (more than 100 inhabitants/km2), although the number of general practitioners is approximately the same in both groups. In part this association seems to be confirmed by a positive correlation between the average age in the individual postal code regions and the number of home visits carried out in the area. As calculated on the basis of the data at hand, only the very sparsely populated areas with a still gradually decreasing population show a tendency towards a higher frequency of home visits. According to the data of 2006, the number of home visits remains high in sparsely populated areas. It may increase in the near future as the number of general practitioners in these areas will gradually decrease while the number of immobile and older inhabitants will increase.

  9. Color normalization of histology slides using graph regularized sparse NMF

    NASA Astrophysics Data System (ADS)

    Sha, Lingdao; Schonfeld, Dan; Sethi, Amit

    2017-03-01

    Computer based automatic medical image processing and quantification are becoming popular in digital pathology. However, preparation of histology slides can vary widely due to differences in staining equipment, procedures and reagents, which can reduce the accuracy of algorithms that analyze their color and texture information. To re- duce the unwanted color variations, various supervised and unsupervised color normalization methods have been proposed. Compared with supervised color normalization methods, unsupervised color normalization methods have advantages of time and cost efficient and universal applicability. Most of the unsupervised color normaliza- tion methods for histology are based on stain separation. Based on the fact that stain concentration cannot be negative and different parts of the tissue absorb different stains, nonnegative matrix factorization (NMF), and particular its sparse version (SNMF), are good candidates for stain separation. However, most of the existing unsupervised color normalization method like PCA, ICA, NMF and SNMF fail to consider important information about sparse manifolds that its pixels occupy, which could potentially result in loss of texture information during color normalization. Manifold learning methods like Graph Laplacian have proven to be very effective in interpreting high-dimensional data. In this paper, we propose a novel unsupervised stain separation method called graph regularized sparse nonnegative matrix factorization (GSNMF). By considering the sparse prior of stain concentration together with manifold information from high-dimensional image data, our method shows better performance in stain color deconvolution than existing unsupervised color deconvolution methods, especially in keeping connected texture information. To utilized the texture information, we construct a nearest neighbor graph between pixels within a spatial area of an image based on their distances using heat kernal in lαβ space. The representation of a pixel in the stain density space is constrained to follow the feature distance of the pixel to pixels in the neighborhood graph. Utilizing color matrix transfer method with the stain concentrations found using our GSNMF method, the color normalization performance was also better than existing methods.

  10. Uncertainty Analysis Based on Sparse Grid Collocation and Quasi-Monte Carlo Sampling with Application in Groundwater Modeling

    NASA Astrophysics Data System (ADS)

    Zhang, G.; Lu, D.; Ye, M.; Gunzburger, M.

    2011-12-01

    Markov Chain Monte Carlo (MCMC) methods have been widely used in many fields of uncertainty analysis to estimate the posterior distributions of parameters and credible intervals of predictions in the Bayesian framework. However, in practice, MCMC may be computationally unaffordable due to slow convergence and the excessive number of forward model executions required, especially when the forward model is expensive to compute. Both disadvantages arise from the curse of dimensionality, i.e., the posterior distribution is usually a multivariate function of parameters. Recently, sparse grid method has been demonstrated to be an effective technique for coping with high-dimensional interpolation or integration problems. Thus, in order to accelerate the forward model and avoid the slow convergence of MCMC, we propose a new method for uncertainty analysis based on sparse grid interpolation and quasi-Monte Carlo sampling. First, we construct a polynomial approximation of the forward model in the parameter space by using the sparse grid interpolation. This approximation then defines an accurate surrogate posterior distribution that can be evaluated repeatedly at minimal computational cost. Second, instead of using MCMC, a quasi-Monte Carlo method is applied to draw samples in the parameter space. Then, the desired probability density function of each prediction is approximated by accumulating the posterior density values of all the samples according to the prediction values. Our method has the following advantages: (1) the polynomial approximation of the forward model on the sparse grid provides a very efficient evaluation of the surrogate posterior distribution; (2) the quasi-Monte Carlo method retains the same accuracy in approximating the PDF of predictions but avoids all disadvantages of MCMC. The proposed method is applied to a controlled numerical experiment of groundwater flow modeling. The results show that our method attains the same accuracy much more efficiently than traditional MCMC.

  11. On Quantile Regression in Reproducing Kernel Hilbert Spaces with Data Sparsity Constraint

    PubMed Central

    Zhang, Chong; Liu, Yufeng; Wu, Yichao

    2015-01-01

    For spline regressions, it is well known that the choice of knots is crucial for the performance of the estimator. As a general learning framework covering the smoothing splines, learning in a Reproducing Kernel Hilbert Space (RKHS) has a similar issue. However, the selection of training data points for kernel functions in the RKHS representation has not been carefully studied in the literature. In this paper we study quantile regression as an example of learning in a RKHS. In this case, the regular squared norm penalty does not perform training data selection. We propose a data sparsity constraint that imposes thresholding on the kernel function coefficients to achieve a sparse kernel function representation. We demonstrate that the proposed data sparsity method can have competitive prediction performance for certain situations, and have comparable performance in other cases compared to that of the traditional squared norm penalty. Therefore, the data sparsity method can serve as a competitive alternative to the squared norm penalty method. Some theoretical properties of our proposed method using the data sparsity constraint are obtained. Both simulated and real data sets are used to demonstrate the usefulness of our data sparsity constraint. PMID:27134575

  12. Data-driven discovery of partial differential equations.

    PubMed

    Rudy, Samuel H; Brunton, Steven L; Proctor, Joshua L; Kutz, J Nathan

    2017-04-01

    We propose a sparse regression method capable of discovering the governing partial differential equation(s) of a given system by time series measurements in the spatial domain. The regression framework relies on sparsity-promoting techniques to select the nonlinear and partial derivative terms of the governing equations that most accurately represent the data, bypassing a combinatorially large search through all possible candidate models. The method balances model complexity and regression accuracy by selecting a parsimonious model via Pareto analysis. Time series measurements can be made in an Eulerian framework, where the sensors are fixed spatially, or in a Lagrangian framework, where the sensors move with the dynamics. The method is computationally efficient, robust, and demonstrated to work on a variety of canonical problems spanning a number of scientific domains including Navier-Stokes, the quantum harmonic oscillator, and the diffusion equation. Moreover, the method is capable of disambiguating between potentially nonunique dynamical terms by using multiple time series taken with different initial data. Thus, for a traveling wave, the method can distinguish between a linear wave equation and the Korteweg-de Vries equation, for instance. The method provides a promising new technique for discovering governing equations and physical laws in parameterized spatiotemporal systems, where first-principles derivations are intractable.

  13. Local structure preserving sparse coding for infrared target recognition

    PubMed Central

    Han, Jing; Yue, Jiang; Zhang, Yi; Bai, Lianfa

    2017-01-01

    Sparse coding performs well in image classification. However, robust target recognition requires a lot of comprehensive template images and the sparse learning process is complex. We incorporate sparsity into a template matching concept to construct a local sparse structure matching (LSSM) model for general infrared target recognition. A local structure preserving sparse coding (LSPSc) formulation is proposed to simultaneously preserve the local sparse and structural information of objects. By adding a spatial local structure constraint into the classical sparse coding algorithm, LSPSc can improve the stability of sparse representation for targets and inhibit background interference in infrared images. Furthermore, a kernel LSPSc (K-LSPSc) formulation is proposed, which extends LSPSc to the kernel space to weaken the influence of the linear structure constraint in nonlinear natural data. Because of the anti-interference and fault-tolerant capabilities, both LSPSc- and K-LSPSc-based LSSM can implement target identification based on a simple template set, which just needs several images containing enough local sparse structures to learn a sufficient sparse structure dictionary of a target class. Specifically, this LSSM approach has stable performance in the target detection with scene, shape and occlusions variations. High performance is demonstrated on several datasets, indicating robust infrared target recognition in diverse environments and imaging conditions. PMID:28323824

  14. Multi-level discriminative dictionary learning with application to large scale image classification.

    PubMed

    Shen, Li; Sun, Gang; Huang, Qingming; Wang, Shuhui; Lin, Zhouchen; Wu, Enhua

    2015-10-01

    The sparse coding technique has shown flexibility and capability in image representation and analysis. It is a powerful tool in many visual applications. Some recent work has shown that incorporating the properties of task (such as discrimination for classification task) into dictionary learning is effective for improving the accuracy. However, the traditional supervised dictionary learning methods suffer from high computation complexity when dealing with large number of categories, making them less satisfactory in large scale applications. In this paper, we propose a novel multi-level discriminative dictionary learning method and apply it to large scale image classification. Our method takes advantage of hierarchical category correlation to encode multi-level discriminative information. Each internal node of the category hierarchy is associated with a discriminative dictionary and a classification model. The dictionaries at different layers are learnt to capture the information of different scales. Moreover, each node at lower layers also inherits the dictionary of its parent, so that the categories at lower layers can be described with multi-scale information. The learning of dictionaries and associated classification models is jointly conducted by minimizing an overall tree loss. The experimental results on challenging data sets demonstrate that our approach achieves excellent accuracy and competitive computation cost compared with other sparse coding methods for large scale image classification.

  15. The Use of Sparse Direct Solver in Vector Finite Element Modeling for Calculating Two Dimensional (2-D) Magnetotelluric Responses in Transverse Electric (TE) Mode

    NASA Astrophysics Data System (ADS)

    Yihaa Roodhiyah, Lisa’; Tjong, Tiffany; Nurhasan; Sutarno, D.

    2018-04-01

    The late research, linear matrices of vector finite element in two dimensional(2-D) magnetotelluric (MT) responses modeling was solved by non-sparse direct solver in TE mode. Nevertheless, there is some weakness which have to be improved especially accuracy in the low frequency (10-3 Hz-10-5 Hz) which is not achieved yet and high cost computation in dense mesh. In this work, the solver which is used is sparse direct solver instead of non-sparse direct solverto overcome the weaknesses of solving linear matrices of vector finite element metod using non-sparse direct solver. Sparse direct solver will be advantageous in solving linear matrices of vector finite element method because of the matrix properties which is symmetrical and sparse. The validation of sparse direct solver in solving linear matrices of vector finite element has been done for a homogen half-space model and vertical contact model by analytical solution. Thevalidation result of sparse direct solver in solving linear matrices of vector finite element shows that sparse direct solver is more stable than non-sparse direct solver in computing linear problem of vector finite element method especially in low frequency. In the end, the accuracy of 2D MT responses modelling in low frequency (10-3 Hz-10-5 Hz) has been reached out under the efficient allocation memory of array and less computational time consuming.

  16. Sparsistency and Rates of Convergence in Large Covariance Matrix Estimation.

    PubMed

    Lam, Clifford; Fan, Jianqing

    2009-01-01

    This paper studies the sparsistency and rates of convergence for estimating sparse covariance and precision matrices based on penalized likelihood with nonconvex penalty functions. Here, sparsistency refers to the property that all parameters that are zero are actually estimated as zero with probability tending to one. Depending on the case of applications, sparsity priori may occur on the covariance matrix, its inverse or its Cholesky decomposition. We study these three sparsity exploration problems under a unified framework with a general penalty function. We show that the rates of convergence for these problems under the Frobenius norm are of order (s(n) log p(n)/n)(1/2), where s(n) is the number of nonzero elements, p(n) is the size of the covariance matrix and n is the sample size. This explicitly spells out the contribution of high-dimensionality is merely of a logarithmic factor. The conditions on the rate with which the tuning parameter λ(n) goes to 0 have been made explicit and compared under different penalties. As a result, for the L(1)-penalty, to guarantee the sparsistency and optimal rate of convergence, the number of nonzero elements should be small: sn'=O(pn) at most, among O(pn2) parameters, for estimating sparse covariance or correlation matrix, sparse precision or inverse correlation matrix or sparse Cholesky factor, where sn' is the number of the nonzero elements on the off-diagonal entries. On the other hand, using the SCAD or hard-thresholding penalty functions, there is no such a restriction.

  17. A High Performance Block Eigensolver for Nuclear Configuration Interaction Calculations

    DOE PAGES

    Aktulga, Hasan Metin; Afibuzzaman, Md.; Williams, Samuel; ...

    2017-06-01

    As on-node parallelism increases and the performance gap between the processor and the memory system widens, achieving high performance in large-scale scientific applications requires an architecture-aware design of algorithms and solvers. We focus on the eigenvalue problem arising in nuclear Configuration Interaction (CI) calculations, where a few extreme eigenpairs of a sparse symmetric matrix are needed. Here, we consider a block iterative eigensolver whose main computational kernels are the multiplication of a sparse matrix with multiple vectors (SpMM), and tall-skinny matrix operations. We then present techniques to significantly improve the SpMM and the transpose operation SpMM T by using themore » compressed sparse blocks (CSB) format. We achieve 3-4× speedup on the requisite operations over good implementations with the commonly used compressed sparse row (CSR) format. We develop a performance model that allows us to correctly estimate the performance of our SpMM kernel implementations, and we identify cache bandwidth as a potential performance bottleneck beyond DRAM. We also analyze and optimize the performance of LOBPCG kernels (inner product and linear combinations on multiple vectors) and show up to 15× speedup over using high performance BLAS libraries for these operations. The resulting high performance LOBPCG solver achieves 1.4× to 1.8× speedup over the existing Lanczos solver on a series of CI computations on high-end multicore architectures (Intel Xeons). We also analyze the performance of our techniques on an Intel Xeon Phi Knights Corner (KNC) processor.« less

  18. Connectivity strength-weighted sparse group representation-based brain network construction for MCI classification.

    PubMed

    Yu, Renping; Zhang, Han; An, Le; Chen, Xiaobo; Wei, Zhihui; Shen, Dinggang

    2017-05-01

    Brain functional network analysis has shown great potential in understanding brain functions and also in identifying biomarkers for brain diseases, such as Alzheimer's disease (AD) and its early stage, mild cognitive impairment (MCI). In these applications, accurate construction of biologically meaningful brain network is critical. Sparse learning has been widely used for brain network construction; however, its l 1 -norm penalty simply penalizes each edge of a brain network equally, without considering the original connectivity strength which is one of the most important inherent linkwise characters. Besides, based on the similarity of the linkwise connectivity, brain network shows prominent group structure (i.e., a set of edges sharing similar attributes). In this article, we propose a novel brain functional network modeling framework with a "connectivity strength-weighted sparse group constraint." In particular, the network modeling can be optimized by considering both raw connectivity strength and its group structure, without losing the merit of sparsity. Our proposed method is applied to MCI classification, a challenging task for early AD diagnosis. Experimental results based on the resting-state functional MRI, from 50 MCI patients and 49 healthy controls, show that our proposed method is more effective (i.e., achieving a significantly higher classification accuracy, 84.8%) than other competing methods (e.g., sparse representation, accuracy = 65.6%). Post hoc inspection of the informative features further shows more biologically meaningful brain functional connectivities obtained by our proposed method. Hum Brain Mapp 38:2370-2383, 2017. © 2017 Wiley Periodicals, Inc. © 2017 Wiley Periodicals, Inc.

  19. Deploying temporary networks for upscaling of sparse network stations

    NASA Astrophysics Data System (ADS)

    Coopersmith, Evan J.; Cosh, Michael H.; Bell, Jesse E.; Kelly, Victoria; Hall, Mark; Palecki, Michael A.; Temimi, Marouane

    2016-10-01

    Soil observations networks at the national scale play an integral role in hydrologic modeling, drought assessment, agricultural decision support, and our ability to understand climate change. Understanding soil moisture variability is necessary to apply these measurements to model calibration, business and consumer applications, or even human health issues. The installation of soil moisture sensors as sparse, national networks is necessitated by limited financial resources. However, this results in the incomplete sampling of the local heterogeneity of soil type, vegetation cover, topography, and the fine spatial distribution of precipitation events. To this end, temporary networks can be installed in the areas surrounding a permanent installation within a sparse network. The temporary networks deployed in this study provide a more representative average at the 3 km and 9 km scales, localized about the permanent gauge. The value of such temporary networks is demonstrated at test sites in Millbrook, New York and Crossville, Tennessee. The capacity of a single U.S. Climate Reference Network (USCRN) sensor set to approximate the average of a temporary network at the 3 km and 9 km scales using a simple linear scaling function is tested. The capacity of a temporary network to provide reliable estimates with diminishing numbers of sensors, the temporal stability of those networks, and ultimately, the relationship of the variability of those networks to soil moisture conditions at the permanent sensor are investigated. In this manner, this work demonstrates the single-season installation of a temporary network as a mechanism to characterize the soil moisture variability at a permanent gauge within a sparse network.

  20. Dictionary Pair Learning on Grassmann Manifolds for Image Denoising.

    PubMed

    Zeng, Xianhua; Bian, Wei; Liu, Wei; Shen, Jialie; Tao, Dacheng

    2015-11-01

    Image denoising is a fundamental problem in computer vision and image processing that holds considerable practical importance for real-world applications. The traditional patch-based and sparse coding-driven image denoising methods convert 2D image patches into 1D vectors for further processing. Thus, these methods inevitably break down the inherent 2D geometric structure of natural images. To overcome this limitation pertaining to the previous image denoising methods, we propose a 2D image denoising model, namely, the dictionary pair learning (DPL) model, and we design a corresponding algorithm called the DPL on the Grassmann-manifold (DPLG) algorithm. The DPLG algorithm first learns an initial dictionary pair (i.e., the left and right dictionaries) by employing a subspace partition technique on the Grassmann manifold, wherein the refined dictionary pair is obtained through a sub-dictionary pair merging. The DPLG obtains a sparse representation by encoding each image patch only with the selected sub-dictionary pair. The non-zero elements of the sparse representation are further smoothed by the graph Laplacian operator to remove the noise. Consequently, the DPLG algorithm not only preserves the inherent 2D geometric structure of natural images but also performs manifold smoothing in the 2D sparse coding space. We demonstrate that the DPLG algorithm also improves the structural SIMilarity values of the perceptual visual quality for denoised images using the experimental evaluations on the benchmark images and Berkeley segmentation data sets. Moreover, the DPLG also produces the competitive peak signal-to-noise ratio values from popular image denoising algorithms.

  1. A High Performance Block Eigensolver for Nuclear Configuration Interaction Calculations

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Aktulga, Hasan Metin; Afibuzzaman, Md.; Williams, Samuel

    As on-node parallelism increases and the performance gap between the processor and the memory system widens, achieving high performance in large-scale scientific applications requires an architecture-aware design of algorithms and solvers. We focus on the eigenvalue problem arising in nuclear Configuration Interaction (CI) calculations, where a few extreme eigenpairs of a sparse symmetric matrix are needed. Here, we consider a block iterative eigensolver whose main computational kernels are the multiplication of a sparse matrix with multiple vectors (SpMM), and tall-skinny matrix operations. We then present techniques to significantly improve the SpMM and the transpose operation SpMM T by using themore » compressed sparse blocks (CSB) format. We achieve 3-4× speedup on the requisite operations over good implementations with the commonly used compressed sparse row (CSR) format. We develop a performance model that allows us to correctly estimate the performance of our SpMM kernel implementations, and we identify cache bandwidth as a potential performance bottleneck beyond DRAM. We also analyze and optimize the performance of LOBPCG kernels (inner product and linear combinations on multiple vectors) and show up to 15× speedup over using high performance BLAS libraries for these operations. The resulting high performance LOBPCG solver achieves 1.4× to 1.8× speedup over the existing Lanczos solver on a series of CI computations on high-end multicore architectures (Intel Xeons). We also analyze the performance of our techniques on an Intel Xeon Phi Knights Corner (KNC) processor.« less

  2. A sparse reconstruction method for the estimation of multiresolution emission fields via atmospheric inversion

    DOE PAGES

    Ray, J.; Lee, J.; Yadav, V.; ...

    2014-08-20

    We present a sparse reconstruction scheme that can also be used to ensure non-negativity when fitting wavelet-based random field models to limited observations in non-rectangular geometries. The method is relevant when multiresolution fields are estimated using linear inverse problems. Examples include the estimation of emission fields for many anthropogenic pollutants using atmospheric inversion or hydraulic conductivity in aquifers from flow measurements. The scheme is based on three new developments. Firstly, we extend an existing sparse reconstruction method, Stagewise Orthogonal Matching Pursuit (StOMP), to incorporate prior information on the target field. Secondly, we develop an iterative method that uses StOMP tomore » impose non-negativity on the estimated field. Finally, we devise a method, based on compressive sensing, to limit the estimated field within an irregularly shaped domain. We demonstrate the method on the estimation of fossil-fuel CO 2 (ffCO 2) emissions in the lower 48 states of the US. The application uses a recently developed multiresolution random field model and synthetic observations of ffCO 2 concentrations from a limited set of measurement sites. We find that our method for limiting the estimated field within an irregularly shaped region is about a factor of 10 faster than conventional approaches. It also reduces the overall computational cost by a factor of two. Further, the sparse reconstruction scheme imposes non-negativity without introducing strong nonlinearities, such as those introduced by employing log-transformed fields, and thus reaps the benefits of simplicity and computational speed that are characteristic of linear inverse problems.« less

  3. WAVELET-DOMAIN REGRESSION AND PREDICTIVE INFERENCE IN PSYCHIATRIC NEUROIMAGING

    PubMed Central

    Reiss, Philip T.; Huo, Lan; Zhao, Yihong; Kelly, Clare; Ogden, R. Todd

    2016-01-01

    An increasingly important goal of psychiatry is the use of brain imaging data to develop predictive models. Here we present two contributions to statistical methodology for this purpose. First, we propose and compare a set of wavelet-domain procedures for fitting generalized linear models with scalar responses and image predictors: sparse variants of principal component regression and of partial least squares, and the elastic net. Second, we consider assessing the contribution of image predictors over and above available scalar predictors, in particular via permutation tests and an extension of the idea of confounding to the case of functional or image predictors. Using the proposed methods, we assess whether maps of a spontaneous brain activity measure, derived from functional magnetic resonance imaging, can meaningfully predict presence or absence of attention deficit/hyperactivity disorder (ADHD). Our results shed light on the role of confounding in the surprising outcome of the recent ADHD-200 Global Competition, which challenged researchers to develop algorithms for automated image-based diagnosis of the disorder. PMID:27330652

  4. A Robust Shape Reconstruction Method for Facial Feature Point Detection.

    PubMed

    Tan, Shuqiu; Chen, Dongyi; Guo, Chenggang; Huang, Zhiqi

    2017-01-01

    Facial feature point detection has been receiving great research advances in recent years. Numerous methods have been developed and applied in practical face analysis systems. However, it is still a quite challenging task because of the large variability in expression and gestures and the existence of occlusions in real-world photo shoot. In this paper, we present a robust sparse reconstruction method for the face alignment problems. Instead of a direct regression between the feature space and the shape space, the concept of shape increment reconstruction is introduced. Moreover, a set of coupled overcomplete dictionaries termed the shape increment dictionary and the local appearance dictionary are learned in a regressive manner to select robust features and fit shape increments. Additionally, to make the learned model more generalized, we select the best matched parameter set through extensive validation tests. Experimental results on three public datasets demonstrate that the proposed method achieves a better robustness over the state-of-the-art methods.

  5. Cultural beliefs and clinical breast examination in Hmong American women: the crucial role of modesty.

    PubMed

    Lee, Hee Yun; Vang, Suzanne

    2015-06-01

    Despite grave cancer disparities in Hmong American women, investigation of the group's breast cancer screening behavior is sparse. This study examined how cultural factors are associated with breast cancer screening utilization, specifically clinical breast exam (CBE), in this population. One hundred and sixty-four Hmong American women between ages 18 and 67 were recruited from a large Midwestern metropolitan area with a median age of 28.0 years. Logistic regression was used to assess the association of cultural variables with receipt of CBE. Roughly 73% of Hmong American women reported ever having had a CBE. Logistic regression revealed that endorsing more modest views was the greatest barrier to ever having had a CBE. Age and language preference were also found to be significant predictors of past CBE use. Cultural factors should be considered in developing interventions aimed at promoting breast cancer screening in this population. In particular, Hmong American women who have less English proficiency and are relatively younger should be targeted in breast cancer screening efforts.

  6. Classification of multispectral or hyperspectral satellite imagery using clustering of sparse approximations on sparse representations in learned dictionaries obtained using efficient convolutional sparse coding

    DOEpatents

    Moody, Daniela; Wohlberg, Brendt

    2018-01-02

    An approach for land cover classification, seasonal and yearly change detection and monitoring, and identification of changes in man-made features may use a clustering of sparse approximations (CoSA) on sparse representations in learned dictionaries. The learned dictionaries may be derived using efficient convolutional sparse coding to build multispectral or hyperspectral, multiresolution dictionaries that are adapted to regional satellite image data. Sparse image representations of images over the learned dictionaries may be used to perform unsupervised k-means clustering into land cover categories. The clustering process behaves as a classifier in detecting real variability. This approach may combine spectral and spatial textural characteristics to detect geologic, vegetative, hydrologic, and man-made features, as well as changes in these features over time.

  7. The Broadband Quandary for Rural America. The Main Street Economist: Commentary on the Rural Economy.

    ERIC Educational Resources Information Center

    Staihr, Brian

    High speed data services known as broadband have the potential to make rural areas less isolated and improve the rural quality of life, but physical barriers, sparse population density, and few markets present significant obstacles to their deployment in rural areas. Broadband applications such as e-commerce, distance education, and telemedicine…

  8. Tensor Toolbox for MATLAB v. 3.0

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Kola, Tamara; Bader, Brett W.; Acar Ataman, Evrim NMN

    Tensors (also known as multidimensional arrays or N-way arrays) are used in a variety of applications ranging from chemometrics to network analysis. The Tensor Toolbox provides classes for manipulating dense, sparse, and structured tensors using MATLAB's object-oriented features. It also provides algorithms for tensor decomposition and factorization, algorithms for computing tensor eigenvalues, and methods for visualization of results.

  9. Web 2.0 and Marketing Education: Explanations and Experiential Applications

    ERIC Educational Resources Information Center

    Granitz, Neil; Koernig, Stephen K.

    2011-01-01

    Although both experiential learning and Web 2.0 tools focus on creativity, sharing, and collaboration, sparse research has been published integrating a Web 2.0 paradigm with experiential learning in marketing. In this article, Web 2.0 concepts are explained. Web 2.0 is then positioned as a philosophy that can advance experiential learning through…

  10. Extending Geographic Weights of Evidence Models for Use in Location Based Services

    ERIC Educational Resources Information Center

    Sonwalkar, Mukul Dinkar

    2012-01-01

    This dissertation addresses the use and modeling of spatio-temporal data for the purposes of providing applications for location based services. One of the major issues in dealing with spatio-temporal data for location based services is the availability and sparseness of such data. Other than the hardware costs associated with collecting movement…

  11. Integrating Concept Mapping into Information Systems Education for Meaningful Learning and Assessment

    ERIC Educational Resources Information Center

    Wei, Wei; Yue, Kwok-Bun

    2017-01-01

    Concept map (CM) is a theoretically sound yet easy to learn tool and can be effectively used to represent knowledge. Even though many disciplines have adopted CM as a teaching and learning tool to improve learning effectiveness, its application in IS curriculum is sparse. Meaningful learning happens when one iteratively integrates new concepts and…

  12. Measurement of organizational culture and climate in healthcare.

    PubMed

    Gershon, Robyn R M; Stone, Patricia W; Bakken, Suzanne; Larson, Elaine

    2004-01-01

    Although there is increasing interest in the relationship between organizational constructs and health services outcomes, information on the reliability and validity of the instruments measuring these constructs is sparse. Twelve instruments were identified that may have applicability in measuring organizational constructs in the healthcare setting. The authors describe and characterize these instruments and discuss the implications for nurse administrators.

  13. The Effect of High-Fidelity Cardiopulmonary Resuscitation (CPR) Simulation on Athletic Training Student Knowledge, Confidence, Emotions, and Experiences

    ERIC Educational Resources Information Center

    Tivener, Kristin Ann; Gloe, Donna Sue

    2015-01-01

    Context: High-fidelity simulation is widely used in healthcare for the training and professional education of students though literature of its application to athletic training education remains sparse. Objective: This research attempts to address a wide-range of data. This includes athletic training student knowledge acquisition from…

  14. Sparse Coding and Dictionary Learning Based on the MDL Principle

    DTIC Science & Technology

    2010-10-01

    average bits per pixel obtained was 4.08 bits per pixel ( bpp ), with p = 250 atoms in the final dictionary. We repeated this using `2 instead of Huber...loss, obtaining 4.12 bpp and p = 245. We now show example results obtained with our framework in two very different applications. In both cases we

  15. Goat breeding in the tropics: development and application of the genomic tools in a USAID "Feed the Future" program

    USDA-ARS?s Scientific Manuscript database

    Food production systems in Africa depend heavily on the use of locally adapted animals. These animals are of agricultural, cultural, and economic importance. Goats, in particular, are critical to the small-scale farmer as they are easier to acquire and maintain. Goats act as scavengers in sparse p...

  16. Scanned-probe field-emission studies of vertically aligned carbon nanofibers

    NASA Astrophysics Data System (ADS)

    Merkulov, Vladimir I.; Lowndes, Douglas H.; Baylor, Larry R.

    2001-02-01

    Field emission properties of dense and sparse "forests" of randomly placed, vertically aligned carbon nanofibers (VACNFs) were studied using a scanned probe with a small tip diameter of ˜1 μm. The probe was scanned in directions perpendicular and parallel to the sample plane, which allowed for measuring not only the emission turn-on field at fixed locations but also the emission site density over large surface areas. The results show that dense forests of VACNFs are not good field emitters as they require high extracting (turn-on) fields. This is attributed to the screening of the local electric field by the neighboring VACNFs. In contrast, sparse forests of VACNFs exhibit moderate-to-low turn-on fields as well as high emission site and current densities, and long emission lifetime, which makes them very promising for various field emission applications.

  17. Parallel Conjugate Gradient: Effects of Ordering Strategies, Programming Paradigms, and Architectural Platforms

    NASA Technical Reports Server (NTRS)

    Oliker, Leonid; Heber, Gerd; Biswas, Rupak

    2000-01-01

    The Conjugate Gradient (CG) algorithm is perhaps the best-known iterative technique to solve sparse linear systems that are symmetric and positive definite. A sparse matrix-vector multiply (SPMV) usually accounts for most of the floating-point operations within a CG iteration. In this paper, we investigate the effects of various ordering and partitioning strategies on the performance of parallel CG and SPMV using different programming paradigms and architectures. Results show that for this class of applications, ordering significantly improves overall performance, that cache reuse may be more important than reducing communication, and that it is possible to achieve message passing performance using shared memory constructs through careful data ordering and distribution. However, a multi-threaded implementation of CG on the Tera MTA does not require special ordering or partitioning to obtain high efficiency and scalability.

  18. Improved parallel data partitioning by nested dissection with applications to information retrieval.

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Wolf, Michael M.; Chevalier, Cedric; Boman, Erik Gunnar

    The computational work in many information retrieval and analysis algorithms is based on sparse linear algebra. Sparse matrix-vector multiplication is a common kernel in many of these computations. Thus, an important related combinatorial problem in parallel computing is how to distribute the matrix and the vectors among processors so as to minimize the communication cost. We focus on minimizing the total communication volume while keeping the computation balanced across processes. In [1], the first two authors presented a new 2D partitioning method, the nested dissection partitioning algorithm. In this paper, we improve on that algorithm and show that it ismore » a good option for data partitioning in information retrieval. We also show partitioning time can be substantially reduced by using the SCOTCH software, and quality improves in some cases, too.« less

  19. An Efficient Scheme for Updating Sparse Cholesky Factors

    NASA Technical Reports Server (NTRS)

    Raghavan, Padma

    2002-01-01

    Raghavan had earlier developed the software package DCSPACK which can be used for solving sparse linear systems where the coefficient matrix is symmetric and positive definite (this project was not funded by NASA but by agencies such as NSF). DSCPACK-S is the serial code and DSCPACK-P is a parallel implementation suitable for multiprocessors or networks-of-workstations with message passing using MCI. The main algorithm used is the Cholesky factorization of a sparse symmetric positive positive definite matrix A = LL(T). The code can also compute the factorization A = LDL(T). The complexity of the software arises from several factors relating to the sparsity of the matrix A. A sparse N x N matrix A has typically less that cN nonzeroes where c is a small constant. If the matrix were dense, it would have O(N2) nonzeroes. The most complicated part of such sparse Cholesky factorization relates to fill-in, i.e., zeroes in the original matrix that become nonzeroes in the factor L. An efficient implementation depends to a large extent on complex data structures and on techniques from graph theory to reduce, identify, and manage fill. DSCPACK is based on an efficient multifrontal implementation with fill-managing algorithms and implementation arising from earlier research by Raghavan and others. Sparse Cholesky factorization is typically a four step process: (1) ordering to compute a fill-reducing numbering, (2) symbolic factorization to determine the nonzero structure of L, (3) numeric factorization to compute L, and, (4) triangular solution to solve L(T)x = y and Ly = b. The first two steps are symbolic and are performed using the graph of the matrix. The numeric factorization step is of dominant cost and there are several schemes for improving performance by exploiting the nested and dense structure of groups of columns in the factor. The latter are aimed at better utilization of the cache-memory hierarchy on modem processors to prevent cache-misses and provide execution rates (operations/second) that are close to the peak rates for dense matrix computations. Currently, EPISCOPACY is being used in an application at NASA directed by J. Newman and M. James. We propose the implementation of efficient schemes for updating the LL(T) or LDL(T) factors computed in DSCPACK-S to meet the computational requirements of their project. A brief description is provided in the next section.

  20. Kanerva's sparse distributed memory: An associative memory algorithm well-suited to the Connection Machine

    NASA Technical Reports Server (NTRS)

    Rogers, David

    1988-01-01

    The advent of the Connection Machine profoundly changes the world of supercomputers. The highly nontraditional architecture makes possible the exploration of algorithms that were impractical for standard Von Neumann architectures. Sparse distributed memory (SDM) is an example of such an algorithm. Sparse distributed memory is a particularly simple and elegant formulation for an associative memory. The foundations for sparse distributed memory are described, and some simple examples of using the memory are presented. The relationship of sparse distributed memory to three important computational systems is shown: random-access memory, neural networks, and the cerebellum of the brain. Finally, the implementation of the algorithm for sparse distributed memory on the Connection Machine is discussed.

Top