Sample records for high-dimensional sparse models

  1. Sparse High Dimensional Models in Economics

    PubMed Central

    Fan, Jianqing; Lv, Jinchi; Qi, Lei

    2010-01-01

    This paper reviews the literature on sparse high dimensional models and discusses some applications in economics and finance. Recent developments of theory, methods, and implementations in penalized least squares and penalized likelihood methods are highlighted. These variable selection methods are proved to be effective in high dimensional sparse modeling. The limits of dimensionality that regularization methods can handle, the role of penalty functions, and their statistical properties are detailed. Some recent advances in ultra-high dimensional sparse modeling are also briefly discussed. PMID:22022635
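
    As a quick illustration of the penalized least squares methods this survey reviews, the sketch below fits an l1-penalized (Lasso) regression to synthetic data with far more predictors than observations; the dimensions and penalty level are arbitrary choices for the example, not values from the paper.

    ```python
    # Minimal sketch of penalized least squares (Lasso) for sparse variable selection.
    import numpy as np
    from sklearn.linear_model import Lasso

    rng = np.random.default_rng(0)
    n, p, s = 100, 500, 5                  # n << p, only s coefficients are nonzero
    X = rng.standard_normal((n, p))
    beta = np.zeros(p)
    beta[:s] = 3.0
    y = X @ beta + rng.standard_normal(n)

    fit = Lasso(alpha=0.3).fit(X, y)       # alpha is the penalty (tuning) parameter
    selected = np.flatnonzero(fit.coef_)   # indices of selected variables
    print(selected)
    ```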

  2. Bi Sparsity Pursuit: A Paradigm for Robust Subspace Recovery

    DTIC Science & Technology

    2016-09-27

    The success of sparse models in computer vision and machine learning is due to the fact that high dimensional data is distributed in a union of low dimensional subspaces in many real-world applications. Keywords: signal recovery, sparse learning, subspace modeling.

  3. BI-sparsity pursuit for robust subspace recovery

    DOE PAGES

    Bian, Xiao; Krim, Hamid

    2015-09-01

    Here, the success of sparse models in computer vision and machine learning in many real-world applications may be attributed, in large part, to the fact that many high dimensional data are distributed in a union of low dimensional subspaces. The underlying structure may, however, be adversely affected by sparse errors, thus inducing additional complexity in recovering it. In this paper, we propose a bi-sparse model as a framework to investigate and analyze this problem, and provide, as a result, a novel algorithm to recover the union of subspaces in the presence of sparse corruptions. We additionally demonstrate the effectiveness of our method by experiments on real-world vision data.

  4. Power Enhancement in High Dimensional Cross-Sectional Tests

    PubMed Central

    Fan, Jianqing; Liao, Yuan; Yao, Jiawei

    2016-01-01

    We propose a novel technique to boost the power of testing a high-dimensional vector H0 : θ = 0 against sparse alternatives where the null hypothesis is violated only by a couple of components. Existing tests based on quadratic forms such as the Wald statistic often suffer from low powers due to the accumulation of errors in estimating high-dimensional parameters. More powerful tests for sparse alternatives such as thresholding and extreme-value tests, on the other hand, require either stringent conditions or bootstrap to derive the null distribution and often suffer from size distortions due to the slow convergence. Based on a screening technique, we introduce a “power enhancement component”, which is zero under the null hypothesis with high probability, but diverges quickly under sparse alternatives. The proposed test statistic combines the power enhancement component with an asymptotically pivotal statistic, and strengthens the power under sparse alternatives. The null distribution does not require stringent regularity conditions, and is completely determined by that of the pivotal statistic. As specific applications, the proposed methods are applied to testing the factor pricing models and validating the cross-sectional independence in panel data models. PMID:26778846
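
    A schematic sketch of the idea behind the power enhancement component: a screening step keeps only components whose standardized estimates exceed a slowly growing threshold, and the resulting (usually zero) term is added to an asymptotically pivotal quadratic statistic. The threshold and normalizations below are illustrative assumptions, not the exact expressions from the paper.

    ```python
    import numpy as np

    def power_enhanced_statistic(theta_hat, se, n):
        """Schematic version: pivotal quadratic statistic plus a screening-based
        power enhancement term that is zero with high probability under the null
        but diverges under sparse alternatives (illustrative normalizations)."""
        z = theta_hat / se                              # standardized components
        p = len(z)
        delta = np.sqrt(np.log(np.log(n)) * np.log(p))  # slowly growing screening threshold
        screened = np.abs(z) > delta
        J0 = np.sqrt(p) * np.sum(z[screened] ** 2)      # power enhancement component
        J1 = (np.sum(z ** 2) - p) / np.sqrt(2.0 * p)    # standardized quadratic form
        return J0 + J1

    rng = np.random.default_rng(0)
    theta_hat = rng.normal(0.0, 0.1, size=500)          # null-like estimates
    print(power_enhanced_statistic(theta_hat, se=np.full(500, 0.1), n=200))
    ```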

  5. The Use of Sparse Direct Solver in Vector Finite Element Modeling for Calculating Two Dimensional (2-D) Magnetotelluric Responses in Transverse Electric (TE) Mode

    NASA Astrophysics Data System (ADS)

    Yihaa Roodhiyah, Lisa’; Tjong, Tiffany; Nurhasan; Sutarno, D.

    2018-04-01

    In recent research, the linear matrices arising from vector finite element modeling of two-dimensional (2-D) magnetotelluric (MT) responses were solved with a non-sparse direct solver in TE mode. That approach has weaknesses that need to be addressed, in particular poor accuracy at low frequencies (10^-3 Hz to 10^-5 Hz) and high computational cost on dense meshes. In this work, a sparse direct solver is used instead of a non-sparse direct solver to overcome these weaknesses. A sparse direct solver is advantageous for the linear matrices of the vector finite element method because the matrices are symmetric and sparse. The sparse direct solver was validated against analytical solutions for a homogeneous half-space model and a vertical contact model. The validation results show that the sparse direct solver is more stable than the non-sparse direct solver for the linear problems of the vector finite element method, especially at low frequencies. In the end, accurate 2-D MT response modeling at low frequencies (10^-3 Hz to 10^-5 Hz) was achieved with efficient array memory allocation and reduced computational time.
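
    The computational point here, reusing a sparse direct factorization of a symmetric sparse system such as those produced by finite element assembly, can be illustrated with SciPy's sparse LU solver; the tridiagonal matrix below is only a stand-in for an actual vector finite element matrix.

    ```python
    import numpy as np
    import scipy.sparse as sp
    from scipy.sparse.linalg import splu

    # Toy symmetric sparse system standing in for an FE stiffness matrix.
    n = 2000
    main = 2.0 * np.ones(n)
    off = -1.0 * np.ones(n - 1)
    A = sp.diags([off, main, off], offsets=[-1, 0, 1], format="csc")
    b = np.ones(n)

    lu = splu(A)      # sparse LU factorization (direct solver)
    x = lu.solve(b)   # cheap triangular solves; the factorization is reusable
    print(np.allclose(A @ x, b))
    ```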

  6. HIGH DIMENSIONAL COVARIANCE MATRIX ESTIMATION IN APPROXIMATE FACTOR MODELS.

    PubMed

    Fan, Jianqing; Liao, Yuan; Mincheva, Martina

    2011-01-01

    The variance covariance matrix plays a central role in the inferential theories of high dimensional factor models in finance and economics. Popular regularization methods of directly exploiting sparsity are not directly applicable to many financial problems. Classical methods of estimating the covariance matrices are based on the strict factor models, assuming independent idiosyncratic components. This assumption, however, is restrictive in practical applications. By assuming sparse error covariance matrix, we allow the presence of the cross-sectional correlation even after taking out common factors, and it enables us to combine the merits of both methods. We estimate the sparse covariance using the adaptive thresholding technique as in Cai and Liu (2011), taking into account the fact that direct observations of the idiosyncratic components are unavailable. The impact of high dimensionality on the covariance matrix estimation based on the factor structure is then studied.

  7. Sparse representation of multi parametric DCE-MRI features using K-SVD for classifying gene expression based breast cancer recurrence risk

    NASA Astrophysics Data System (ADS)

    Mahrooghy, Majid; Ashraf, Ahmed B.; Daye, Dania; Mies, Carolyn; Rosen, Mark; Feldman, Michael; Kontos, Despina

    2014-03-01

    We evaluate the prognostic value of sparse representation-based features by applying the K-SVD algorithm to multi-parametric kinetic, textural, and morphologic features in breast dynamic contrast-enhanced magnetic resonance imaging (DCE-MRI). K-SVD is an iterative dimensionality reduction method that optimally reduces the initial feature space by updating the dictionary columns jointly with the sparse representation coefficients. Therefore, by using K-SVD, we not only provide a sparse representation of the features and condense the information into a few coefficients, but also reduce the dimensionality. The extracted K-SVD features are evaluated by a machine learning algorithm including a logistic regression classifier for the task of classifying high versus low breast cancer recurrence risk as determined by a validated gene expression assay. The features are evaluated using ROC curve analysis and leave-one-out cross-validation for different sparse representation and dimensionality reduction numbers. Optimal sparse representation is obtained when the number of dictionary elements is 4 (K=4) and the maximum number of non-zero coefficients is 2 (L=2). We compare K-SVD with ANOVA-based feature selection for the same prognostic features. The ROC results show that the AUCs of the K-SVD based (K=4, L=2), the ANOVA based, and the original features (i.e., no dimensionality reduction) are 0.78, 0.71, and 0.68, respectively. From the results, it can be inferred that by using sparse representation of the originally extracted multi-parametric, high-dimensional data, we can condense the information into a few coefficients with the highest predictive value. In addition, the dimensionality reduction introduced by K-SVD can prevent models from over-fitting.
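
    A rough scikit-learn sketch of this pipeline: dictionary learning with K = 4 atoms and at most L = 2 nonzero coefficients per sample (scikit-learn's dictionary learner is not exactly K-SVD, but plays the same role here), followed by a logistic regression evaluated with leave-one-out cross-validation. The data are random placeholders for the DCE-MRI features.

    ```python
    import numpy as np
    from sklearn.decomposition import DictionaryLearning
    from sklearn.linear_model import LogisticRegression
    from sklearn.model_selection import LeaveOneOut, cross_val_score

    rng = np.random.default_rng(0)
    X = rng.standard_normal((60, 50))   # 60 lesions x 50 kinetic/textural/morphologic features (placeholder)
    y = rng.integers(0, 2, size=60)     # high vs. low recurrence risk (placeholder labels)

    # Sparse coding with K = 4 dictionary atoms and at most L = 2 nonzero coefficients,
    # mirroring the settings reported as optimal in the abstract.
    dico = DictionaryLearning(n_components=4, transform_algorithm="omp",
                              transform_n_nonzero_coefs=2, random_state=0)
    codes = dico.fit_transform(X)

    acc = cross_val_score(LogisticRegression(), codes, y,
                          cv=LeaveOneOut(), scoring="accuracy").mean()
    print(acc)
    ```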

  8. HIGH DIMENSIONAL COVARIANCE MATRIX ESTIMATION IN APPROXIMATE FACTOR MODELS

    PubMed Central

    Fan, Jianqing; Liao, Yuan; Mincheva, Martina

    2012-01-01

    The variance covariance matrix plays a central role in the inferential theories of high dimensional factor models in finance and economics. Popular regularization methods of directly exploiting sparsity are not directly applicable to many financial problems. Classical methods of estimating the covariance matrices are based on the strict factor models, assuming independent idiosyncratic components. This assumption, however, is restrictive in practical applications. By assuming sparse error covariance matrix, we allow the presence of the cross-sectional correlation even after taking out common factors, and it enables us to combine the merits of both methods. We estimate the sparse covariance using the adaptive thresholding technique as in Cai and Liu (2011), taking into account the fact that direct observations of the idiosyncratic components are unavailable. The impact of high dimensionality on the covariance matrix estimation based on the factor structure is then studied. PMID:22661790

  9. SPReM: Sparse Projection Regression Model For High-dimensional Linear Regression *

    PubMed Central

    Sun, Qiang; Zhu, Hongtu; Liu, Yufeng; Ibrahim, Joseph G.

    2014-01-01

    The aim of this paper is to develop a sparse projection regression modeling (SPReM) framework to perform multivariate regression modeling with a large number of responses and a multivariate covariate of interest. We propose two novel heritability ratios to simultaneously perform dimension reduction, response selection, estimation, and testing, while explicitly accounting for correlations among multivariate responses. Our SPReM is devised to specifically address the low statistical power issue of many standard statistical approaches, such as the Hotelling’s T² test statistic or a mass univariate analysis, for high-dimensional data. We formulate the estimation problem of SPReM as a novel sparse unit rank projection (SURP) problem and propose a fast optimization algorithm for SURP. Furthermore, we extend SURP to the sparse multi-rank projection (SMURP) by adopting a sequential SURP approximation. Theoretically, we have systematically investigated the convergence properties of SURP and the convergence rate of SURP estimates. Our simulation results and real data analysis have shown that SPReM outperforms other state-of-the-art methods. PMID:26527844

  10. Sparse Additive Ordinary Differential Equations for Dynamic Gene Regulatory Network Modeling.

    PubMed

    Wu, Hulin; Lu, Tao; Xue, Hongqi; Liang, Hua

    2014-04-02

    The gene regulation network (GRN) is a high-dimensional complex system, which can be represented by various mathematical or statistical models. The ordinary differential equation (ODE) model is one of the popular dynamic GRN models. High-dimensional linear ODE models have been proposed to identify GRNs, but with a limitation of the linear regulation effect assumption. In this article, we propose a sparse additive ODE (SA-ODE) model, coupled with ODE estimation methods and adaptive group LASSO techniques, to model dynamic GRNs that could flexibly deal with nonlinear regulation effects. The asymptotic properties of the proposed method are established and simulation studies are performed to validate the proposed approach. An application example for identifying the nonlinear dynamic GRN of T-cell activation is used to illustrate the usefulness of the proposed method.

  11. Network Data: Statistical Theory and New Models

    DTIC Science & Technology

    2016-02-17

    During this period of review, Bin Yu worked on many thrusts of high-dimensional statistical theory and methodologies. Her research covered a wide range of topics in statistics, including analysis and methods for spectral clustering for sparse and structured networks, sparse modeling (e.g., the Lasso), statistical guarantees for the EM algorithm, and statistical analysis of algorithm leveraging.

  12. High-frame-rate full-vocal-tract 3D dynamic speech imaging.

    PubMed

    Fu, Maojing; Barlaz, Marissa S; Holtrop, Joseph L; Perry, Jamie L; Kuehn, David P; Shosted, Ryan K; Liang, Zhi-Pei; Sutton, Bradley P

    2017-04-01

    To achieve high temporal frame rate, high spatial resolution, and full-vocal-tract coverage for three-dimensional dynamic speech MRI by using low-rank modeling and sparse sampling. Three-dimensional dynamic speech MRI is enabled by integrating a novel data acquisition strategy and an image reconstruction method with the partial separability model: (a) a self-navigated sparse sampling strategy that accelerates data acquisition by collecting high-nominal-frame-rate cone navigators and imaging data within a single repetition time, and (b) a reconstruction method that recovers high-quality speech dynamics from sparse (k,t)-space data by enforcing joint low-rank and spatiotemporal total variation constraints. The proposed method has been evaluated through in vivo experiments. A nominal temporal frame rate of 166 frames per second (defined based on a repetition time of 5.99 ms) was achieved for an imaging volume covering the entire vocal tract with a spatial resolution of 2.2 × 2.2 × 5.0 mm³. Practical utility of the proposed method was demonstrated via both validation experiments and a phonetics investigation. Three-dimensional dynamic speech imaging is possible with full-vocal-tract coverage, high spatial resolution, and high nominal frame rate to provide dynamic speech data useful for phonetic studies. Magn Reson Med 77:1619-1629, 2017. © 2016 International Society for Magnetic Resonance in Medicine.

  13. HYPOTHESIS TESTING FOR HIGH-DIMENSIONAL SPARSE BINARY REGRESSION

    PubMed Central

    Mukherjee, Rajarshi; Pillai, Natesh S.; Lin, Xihong

    2015-01-01

    In this paper, we study the detection boundary for minimax hypothesis testing in the context of high-dimensional, sparse binary regression models. Motivated by genetic sequencing association studies for rare variant effects, we investigate the complexity of the hypothesis testing problem when the design matrix is sparse. We observe a new phenomenon in the behavior of detection boundary which does not occur in the case of Gaussian linear regression. We derive the detection boundary as a function of two components: a design matrix sparsity index and signal strength, each of which is a function of the sparsity of the alternative. For any alternative, if the design matrix sparsity index is too high, any test is asymptotically powerless irrespective of the magnitude of signal strength. For binary design matrices with the sparsity index that is not too high, our results are parallel to those in the Gaussian case. In this context, we derive detection boundaries for both dense and sparse regimes. For the dense regime, we show that the generalized likelihood ratio is rate optimal; for the sparse regime, we propose an extended Higher Criticism Test and show it is rate optimal and sharp. We illustrate the finite sample properties of the theoretical results using simulation studies. PMID:26246645
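
    For reference, the classical Higher Criticism statistic that the paper extends can be computed from a vector of p-values as in the sketch below; the sparse-design extension proposed in the paper is more involved and is not reproduced here.

    ```python
    import numpy as np

    def higher_criticism(pvals, alpha0=0.5):
        """Classical Higher Criticism statistic from a vector of p-values
        (the paper's extended version for sparse binary designs differs)."""
        p = np.sort(pvals)
        n = len(p)
        i = np.arange(1, n + 1)
        hc = np.sqrt(n) * (i / n - p) / np.sqrt(p * (1 - p))
        return hc[: int(alpha0 * n)].max()     # maximize over the smallest p-values

    rng = np.random.default_rng(0)
    null_p = rng.uniform(size=1000)                  # global null: uniform p-values
    sparse_alt = null_p.copy()
    sparse_alt[:5] = rng.uniform(0, 1e-4, size=5)    # a few strong signals
    print(higher_criticism(null_p), higher_criticism(sparse_alt))
    ```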

  14. Semi-implicit integration factor methods on sparse grids for high-dimensional systems

    NASA Astrophysics Data System (ADS)

    Wang, Dongyong; Chen, Weitao; Nie, Qing

    2015-07-01

    Numerical methods for partial differential equations in high-dimensional spaces are often limited by the curse of dimensionality. Though the sparse grid technique, based on a one-dimensional hierarchical basis through tensor products, is popular for handling challenges such as those associated with spatial discretization, the stability conditions on time step size due to temporal discretization, such as those associated with high-order derivatives in space and stiff reactions, remain. Here, we incorporate the sparse grids with the implicit integration factor method (IIF) that is advantageous in terms of stability conditions for systems containing stiff reactions and diffusions. We combine IIF, in which the reaction is treated implicitly and the diffusion is treated explicitly and exactly, with various sparse grid techniques based on the finite element and finite difference methods and a multi-level combination approach. The overall method is found to be efficient in terms of both storage and computational time for solving a wide range of PDEs in high dimensions. In particular, the IIF with the sparse grid combination technique is flexible and effective in solving systems that may include cross-derivatives and non-constant diffusion coefficients. Extensive numerical simulations in both linear and nonlinear systems in high dimensions, along with applications of diffusive logistic equations and Fokker-Planck equations, demonstrate the accuracy, efficiency, and robustness of the new methods, indicating potential broad applications of the sparse grid-based integration factor method.

  15. Sparse PCA with Oracle Property.

    PubMed

    Gu, Quanquan; Wang, Zhaoran; Liu, Han

    In this paper, we study the estimation of the k-dimensional sparse principal subspace of covariance matrix Σ in the high-dimensional setting. We aim to recover the oracle principal subspace solution, i.e., the principal subspace estimator obtained assuming the true support is known a priori. To this end, we propose a family of estimators based on the semidefinite relaxation of sparse PCA with novel regularizations. In particular, under a weak assumption on the magnitude of the population projection matrix, one estimator within this family exactly recovers the true support with high probability, has exact rank-k, and attains a [Formula: see text] statistical rate of convergence with s being the subspace sparsity level and n the sample size. Compared to existing support recovery results for sparse PCA, our approach does not hinge on the spiked covariance model or the limited correlation condition. As a complement to the first estimator that enjoys the oracle property, we prove that another estimator within the family achieves a sharper statistical rate of convergence than the standard semidefinite relaxation of sparse PCA, even when the previous assumption on the magnitude of the projection matrix is violated. We validate the theoretical results by numerical experiments on synthetic datasets.

  16. Sparse PCA with Oracle Property

    PubMed Central

    Gu, Quanquan; Wang, Zhaoran; Liu, Han

    2014-01-01

    In this paper, we study the estimation of the k-dimensional sparse principal subspace of covariance matrix Σ in the high-dimensional setting. We aim to recover the oracle principal subspace solution, i.e., the principal subspace estimator obtained assuming the true support is known a priori. To this end, we propose a family of estimators based on the semidefinite relaxation of sparse PCA with novel regularizations. In particular, under a weak assumption on the magnitude of the population projection matrix, one estimator within this family exactly recovers the true support with high probability, has exact rank-k, and attains a s/n statistical rate of convergence with s being the subspace sparsity level and n the sample size. Compared to existing support recovery results for sparse PCA, our approach does not hinge on the spiked covariance model or the limited correlation condition. As a complement to the first estimator that enjoys the oracle property, we prove that another estimator within the family achieves a sharper statistical rate of convergence than the standard semidefinite relaxation of sparse PCA, even when the previous assumption on the magnitude of the projection matrix is violated. We validate the theoretical results by numerical experiments on synthetic datasets. PMID:25684971
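
    For a hands-on impression of sparse principal subspace estimation, the sketch below uses scikit-learn's SparsePCA, which is based on an l1-penalized dictionary-learning formulation rather than the semidefinite relaxation with novel regularizations studied in this paper; the synthetic data have a two-dimensional principal subspace spanned by sparse directions.

    ```python
    import numpy as np
    from sklearn.decomposition import SparsePCA

    rng = np.random.default_rng(0)
    n, p = 200, 50
    # Data whose leading principal subspace is spanned by two sparse directions.
    z = rng.standard_normal((n, 2))
    W = np.zeros((2, p))
    W[0, :5], W[1, 5:10] = 1.0, 1.0
    X = z @ W + 0.1 * rng.standard_normal((n, p))

    spca = SparsePCA(n_components=2, alpha=1.0, random_state=0).fit(X)
    support = [np.flatnonzero(c) for c in spca.components_]
    print(support)   # ideally the two sparse blocks of variables
    ```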

  17. Accelerated High-Dimensional MR Imaging with Sparse Sampling Using Low-Rank Tensors

    PubMed Central

    He, Jingfei; Liu, Qiegen; Christodoulou, Anthony G.; Ma, Chao; Lam, Fan

    2017-01-01

    High-dimensional MR imaging often requires long data acquisition time, thereby limiting its practical applications. This paper presents a low-rank tensor based method for accelerated high-dimensional MR imaging using sparse sampling. This method represents high-dimensional images as low-rank tensors (or partially separable functions) and uses this mathematical structure for sparse sampling of the data space and for image reconstruction from highly undersampled data. More specifically, the proposed method acquires two datasets with complementary sampling patterns, one for subspace estimation and the other for image reconstruction; image reconstruction from highly undersampled data is accomplished by fitting the measured data with a sparsity constraint on the core tensor and a group sparsity constraint on the spatial coefficients jointly using the alternating direction method of multipliers. The usefulness of the proposed method is demonstrated in MRI applications; it may also have applications beyond MRI. PMID:27093543

  18. Sparse models for correlative and integrative analysis of imaging and genetic data

    PubMed Central

    Lin, Dongdong; Cao, Hongbao; Calhoun, Vince D.

    2014-01-01

    The development of advanced medical imaging technologies and high-throughput genomic measurements has enhanced our ability to understand their interplay as well as their relationship with human behavior by integrating these two types of datasets. However, the high dimensionality and heterogeneity of these datasets presents a challenge to conventional statistical methods; there is a high demand for the development of both correlative and integrative analysis approaches. Here, we review our recent work on developing sparse representation based approaches to address this challenge. We show how sparse models are applied to the correlation and integration of imaging and genetic data for biomarker identification. We present examples on how these approaches are used for the detection of risk genes and classification of complex diseases such as schizophrenia. Finally, we discuss future directions on the integration of multiple imaging and genomic datasets including their interactions such as epistasis. PMID:25218561

  19. A Hyperspherical Adaptive Sparse-Grid Method for High-Dimensional Discontinuity Detection

    DOE PAGES

    Zhang, Guannan; Webster, Clayton G.; Gunzburger, Max D.; ...

    2015-06-24

    This study proposes and analyzes a hyperspherical adaptive hierarchical sparse-grid method for detecting jump discontinuities of functions in high-dimensional spaces. The method is motivated by the theoretical and computational inefficiencies of well-known adaptive sparse-grid methods for discontinuity detection. Our novel approach constructs a function representation of the discontinuity hypersurface of an N-dimensional discontinuous quantity of interest, by virtue of a hyperspherical transformation. Then, a sparse-grid approximation of the transformed function is built in the hyperspherical coordinate system, whose value at each point is estimated by solving a one-dimensional discontinuity detection problem. Due to the smoothness of the hypersurface, the new technique can identify jump discontinuities with significantly reduced computational cost, compared to existing methods. In addition, hierarchical acceleration techniques are also incorporated to further reduce the overall complexity. Rigorous complexity analyses of the new method are provided as are several numerical examples that illustrate the effectiveness of the approach.
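
    The core reduction in this method, turning discontinuity detection into one-dimensional problems in the radial coordinate, can be sketched as follows: for each direction on the unit hypersphere, the jump radius is located by bisection. The adaptive sparse-grid approximation of the resulting jump-radius function over the angular coordinates, which is where the paper's efficiency gains come from, is omitted in this toy sketch.

    ```python
    import numpy as np

    def f(x):
        # Discontinuous quantity of interest: jumps across the sphere |x| = 1.
        return 1.0 if np.linalg.norm(x) < 1.0 else 0.0

    def jump_radius(direction, r_max=2.0, tol=1e-6):
        """1-D discontinuity detection along one ray (bisection on the radius)."""
        lo, hi = 0.0, r_max
        f_lo = f(lo * direction)
        while hi - lo > tol:
            mid = 0.5 * (lo + hi)
            if f(mid * direction) == f_lo:
                lo = mid
            else:
                hi = mid
        return 0.5 * (lo + hi)

    rng = np.random.default_rng(0)
    dims = 6
    dirs = rng.standard_normal((5, dims))
    dirs /= np.linalg.norm(dirs, axis=1, keepdims=True)   # random unit directions
    print([round(jump_radius(d), 4) for d in dirs])       # all close to 1.0
    ```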

  20. A hyper-spherical adaptive sparse-grid method for high-dimensional discontinuity detection

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Zhang, Guannan; Webster, Clayton G.; Gunzburger, Max D.

    This work proposes and analyzes a hyper-spherical adaptive hierarchical sparse-grid method for detecting jump discontinuities of functions in high-dimensional spaces. The method is motivated by the theoretical and computational inefficiencies of well-known adaptive sparse-grid methods for discontinuity detection. Our novel approach constructs a function representation of the discontinuity hyper-surface of an N-dimensional discontinuous quantity of interest, by virtue of a hyper-spherical transformation. Then, a sparse-grid approximation of the transformed function is built in the hyper-spherical coordinate system, whose value at each point is estimated by solving a one-dimensional discontinuity detection problem. Due to the smoothness of the hyper-surface, the new technique can identify jump discontinuities with significantly reduced computational cost, compared to existing methods. Moreover, hierarchical acceleration techniques are also incorporated to further reduce the overall complexity. Rigorous error estimates and complexity analyses of the new method are provided as are several numerical examples that illustrate the effectiveness of the approach.

  1. Large Covariance Estimation by Thresholding Principal Orthogonal Complements

    PubMed Central

    Fan, Jianqing; Liao, Yuan; Mincheva, Martina

    2012-01-01

    This paper deals with the estimation of a high-dimensional covariance with a conditional sparsity structure and fast-diverging eigenvalues. By assuming sparse error covariance matrix in an approximate factor model, we allow for the presence of some cross-sectional correlation even after taking out common but unobservable factors. We introduce the Principal Orthogonal complEment Thresholding (POET) method to explore such an approximate factor structure with sparsity. The POET estimator includes the sample covariance matrix, the factor-based covariance matrix (Fan, Fan, and Lv, 2008), the thresholding estimator (Bickel and Levina, 2008) and the adaptive thresholding estimator (Cai and Liu, 2011) as specific examples. We provide mathematical insights when the factor analysis is approximately the same as the principal component analysis for high-dimensional data. The rates of convergence of the sparse residual covariance matrix and the conditional sparse covariance matrix are studied under various norms. It is shown that the impact of estimating the unknown factors vanishes as the dimensionality increases. The uniform rates of convergence for the unobserved factors and their factor loadings are derived. The asymptotic results are also verified by extensive simulation studies. Finally, a real data application on portfolio allocation is presented. PMID:24348088
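
    A bare-bones sketch of the POET idea: take the leading principal components of the sample covariance as the factor part and threshold the principal orthogonal complement. For simplicity this sketch uses a single universal soft threshold rather than the adaptive, entry-dependent thresholding of Cai and Liu (2011) used in the paper.

    ```python
    import numpy as np

    def poet(X, K, C=0.5):
        """Schematic POET: principal orthogonal complement thresholding.
        X is n x p, K is the number of factors, C scales the (simplified,
        universal) soft threshold. A sketch of the idea, not the full estimator."""
        n, p = X.shape
        S = np.cov(X, rowvar=False)
        vals, vecs = np.linalg.eigh(S)
        idx = np.argsort(vals)[::-1][:K]
        lam, V = vals[idx], vecs[:, idx]
        low_rank = (V * lam) @ V.T                   # factor (principal component) part
        R = S - low_rank                             # principal orthogonal complement
        tau = C * np.sqrt(np.log(p) / n)             # universal threshold level
        R_thr = np.sign(R) * np.maximum(np.abs(R) - tau, 0.0)
        np.fill_diagonal(R_thr, np.diag(R))          # keep the diagonal unthresholded
        return low_rank + R_thr

    X = np.random.default_rng(0).standard_normal((200, 100))
    Sigma_hat = poet(X, K=3)
    print(Sigma_hat.shape)
    ```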

  2. Large Covariance Estimation by Thresholding Principal Orthogonal Complements.

    PubMed

    Fan, Jianqing; Liao, Yuan; Mincheva, Martina

    2013-09-01

    This paper deals with the estimation of a high-dimensional covariance with a conditional sparsity structure and fast-diverging eigenvalues. By assuming sparse error covariance matrix in an approximate factor model, we allow for the presence of some cross-sectional correlation even after taking out common but unobservable factors. We introduce the Principal Orthogonal complEment Thresholding (POET) method to explore such an approximate factor structure with sparsity. The POET estimator includes the sample covariance matrix, the factor-based covariance matrix (Fan, Fan, and Lv, 2008), the thresholding estimator (Bickel and Levina, 2008) and the adaptive thresholding estimator (Cai and Liu, 2011) as specific examples. We provide mathematical insights when the factor analysis is approximately the same as the principal component analysis for high-dimensional data. The rates of convergence of the sparse residual covariance matrix and the conditional sparse covariance matrix are studied under various norms. It is shown that the impact of estimating the unknown factors vanishes as the dimensionality increases. The uniform rates of convergence for the unobserved factors and their factor loadings are derived. The asymptotic results are also verified by extensive simulation studies. Finally, a real data application on portfolio allocation is presented.

  3. A new approach for solving seismic tomography problems and assessing the uncertainty through the use of graph theory and direct methods

    NASA Astrophysics Data System (ADS)

    Bogiatzis, P.; Ishii, M.; Davis, T. A.

    2016-12-01

    Seismic tomography inverse problems are among the largest high-dimensional parameter estimation tasks in Earth science. We show how combinatorics and graph theory can be used to analyze the structure of such problems, and to effectively decompose them into smaller ones that can be solved efficiently by means of the least squares method. In combination with recent high performance direct sparse algorithms, this reduction in dimensionality allows for an efficient computation of the model resolution and covariance matrices using limited resources. Furthermore, we show that a new sparse singular value decomposition method can be used to obtain the complete spectrum of the singular values. This procedure provides the means for more objective regularization and further dimensionality reduction of the problem. We apply this methodology to a moderate size, non-linear seismic tomography problem to image the structure of the crust and the upper mantle beneath Japan using local deep earthquakes recorded by the High Sensitivity Seismograph Network stations.

  4. Mapping High Dimensional Sparse Customer Requirements into Product Configurations

    NASA Astrophysics Data System (ADS)

    Jiao, Yao; Yang, Yu; Zhang, Hongshan

    2017-10-01

    Mapping customer requirements into product configurations is a crucial step in product design, but customers express their needs ambiguously and locally due to a lack of domain knowledge. The data mining process applied to customer requirements can therefore yield fragmentary information with high dimensional sparsity, leaving the mapping procedure exposed to uncertainty and complexity. Expert judgment is widely applied in this setting since it imposes no formal requirements for systematic or structured data; however, there are concerns about its repeatability and bias. In this study, an integrated method combining an adjusted Locally Linear Embedding (LLE) and a Naïve Bayes (NB) classifier is proposed to map high dimensional sparse customer requirements to product configurations. The integrated method adjusts classical LLE to preprocess the high dimensional sparse dataset so that it satisfies the prerequisites of NB for classifying different customer requirements into the corresponding product configurations. Compared with expert judgment, the adjusted LLE with NB performs much better, in both accuracy and robustness, in a real-world Tablet PC design case.
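
    A minimal scikit-learn sketch of this mapping pipeline, using the library's standard locally linear embedding (not the adjusted LLE proposed in the paper) followed by a Gaussian naive Bayes classifier; the requirement matrix and configuration labels are synthetic placeholders.

    ```python
    import numpy as np
    from sklearn.manifold import LocallyLinearEmbedding
    from sklearn.naive_bayes import GaussianNB
    from sklearn.pipeline import make_pipeline
    from sklearn.model_selection import cross_val_score

    rng = np.random.default_rng(0)
    X = rng.standard_normal((120, 300))    # high-dimensional requirement features (placeholder)
    X[X < 1.0] = 0.0                       # make the feature matrix sparse
    y = rng.integers(0, 3, size=120)       # three candidate product configurations (placeholder)

    model = make_pipeline(
        LocallyLinearEmbedding(n_neighbors=10, n_components=5),  # dimensionality reduction
        GaussianNB(),                                            # classification
    )
    print(cross_val_score(model, X, y, cv=5).mean())
    ```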

  5. Regularized Embedded Multiple Kernel Dimensionality Reduction for Mine Signal Processing.

    PubMed

    Li, Shuang; Liu, Bing; Zhang, Chen

    2016-01-01

    Traditional multiple kernel dimensionality reduction models are generally based on graph embedding and manifold assumption. But such assumption might be invalid for some high-dimensional or sparse data due to the curse of dimensionality, which has a negative influence on the performance of multiple kernel learning. In addition, some models might be ill-posed if the rank of matrices in their objective functions was not high enough. To address these issues, we extend the traditional graph embedding framework and propose a novel regularized embedded multiple kernel dimensionality reduction method. Different from the conventional convex relaxation technique, the proposed algorithm directly takes advantage of a binary search and an alternative optimization scheme to obtain optimal solutions efficiently. The experimental results demonstrate the effectiveness of the proposed method for supervised, unsupervised, and semisupervised scenarios.

  6. Embedded sparse representation of fMRI data via group-wise dictionary optimization

    NASA Astrophysics Data System (ADS)

    Zhu, Dajiang; Lin, Binbin; Faskowitz, Joshua; Ye, Jieping; Thompson, Paul M.

    2016-03-01

    Sparse learning enables dimension reduction and efficient modeling of high dimensional signals and images, but it may need to be tailored to best suit specific applications and datasets. Here we used sparse learning to efficiently represent functional magnetic resonance imaging (fMRI) data from the human brain. We propose a novel embedded sparse representation (ESR), to identify the most consistent dictionary atoms across different brain datasets via an iterative group-wise dictionary optimization procedure. In this framework, we introduced additional criteria to make the learned dictionary atoms more consistent across different subjects. We successfully identified four common dictionary atoms that follow the external task stimuli with very high accuracy. After projecting the corresponding coefficient vectors back into the 3-D brain volume space, the spatial patterns are also consistent with traditional fMRI analysis results. Our framework reveals common features of brain activation in a population, as a new, efficient fMRI analysis method.

  7. Hyperspherical Sparse Approximation Techniques for High-Dimensional Discontinuity Detection

    DOE PAGES

    Zhang, Guannan; Webster, Clayton G.; Gunzburger, Max; ...

    2016-08-04

    This work proposes a hyperspherical sparse approximation framework for detecting jump discontinuities in functions in high-dimensional spaces. The need for a novel approach results from the theoretical and computational inefficiencies of well-known approaches, such as adaptive sparse grids, for discontinuity detection. Our approach constructs the hyperspherical coordinate representation of the discontinuity surface of a function. Then sparse approximations of the transformed function are built in the hyperspherical coordinate system, with values at each point estimated by solving a one-dimensional discontinuity detection problem. Due to the smoothness of the hypersurface, the new technique can identify jump discontinuities with significantly reduced computational cost, compared to existing methods. Several approaches are used to approximate the transformed discontinuity surface in the hyperspherical system, including adaptive sparse grid and radial basis function interpolation, discrete least squares projection, and compressed sensing approximation. Moreover, hierarchical acceleration techniques are also incorporated to further reduce the overall complexity. In conclusion, rigorous complexity analyses of the new methods are provided, as are several numerical examples that illustrate the effectiveness of our approach.

  8. Non-convex Statistical Optimization for Sparse Tensor Graphical Model

    PubMed Central

    Sun, Wei; Wang, Zhaoran; Liu, Han; Cheng, Guang

    2016-01-01

    We consider the estimation of sparse graphical models that characterize the dependency structure of high-dimensional tensor-valued data. To facilitate the estimation of the precision matrix corresponding to each way of the tensor, we assume the data follow a tensor normal distribution whose covariance has a Kronecker product structure. The penalized maximum likelihood estimation of this model involves minimizing a non-convex objective function. In spite of the non-convexity of this estimation problem, we prove that an alternating minimization algorithm, which iteratively estimates each sparse precision matrix while fixing the others, attains an estimator with the optimal statistical rate of convergence as well as consistent graph recovery. Notably, such an estimator achieves estimation consistency with only one tensor sample, which is unobserved in previous work. Our theoretical results are backed by thorough numerical studies. PMID:28316459

  9. Bayesian Analysis of High Dimensional Classification

    NASA Astrophysics Data System (ADS)

    Mukhopadhyay, Subhadeep; Liang, Faming

    2009-12-01

    Modern data mining and bioinformatics have presented an important playground for statistical learning techniques, where the number of input variables is possibly much larger than the sample size of the training data. In supervised learning, logistic regression or probit regression can be used to model a binary output and form perceptron classification rules based on Bayesian inference. In these cases, there is considerable interest in searching for sparse models in the high-dimensional regression or classification setup. We first discuss two common challenges for analyzing high-dimensional data. The first is the curse of dimensionality: the complexity of many existing algorithms scales exponentially with the dimensionality of the space, so these algorithms quickly become computationally intractable and therefore inapplicable in many real applications. The second is multicollinearity among the predictors, which severely slows down the algorithms. In order to make Bayesian analysis operational in high dimensions, we propose a novel Hierarchical Stochastic Approximation Monte Carlo (HSAMC) algorithm, which overcomes the curse of dimensionality and the multicollinearity of predictors in high dimensions, and also possesses a self-adjusting mechanism to avoid local minima separated by high energy barriers. Models and methods are illustrated by simulations inspired by the field of genomics. Numerical results indicate that HSAMC can work as a general model selection sampler in high-dimensional complex model spaces.

  10. Yielding physically-interpretable emulators - A Sparse PCA approach

    NASA Astrophysics Data System (ADS)

    Galelli, S.; Alsahaf, A.; Giuliani, M.; Castelletti, A.

    2015-12-01

    Projection-based techniques, such as Principal Orthogonal Decomposition (POD), are a common approach to surrogate high-fidelity process-based models by lower order dynamic emulators. With POD, the dimensionality reduction is achieved by using observations, or 'snapshots', generated with the high-fidelity model, to project the entire set of input and state variables of this model onto a smaller set of basis functions that account for most of the variability in the data. While reduction efficiency and variance control of POD techniques are usually very high, the resulting emulators are structurally highly complex and can hardly be given a physically meaningful interpretation, as each basis is a projection of the entire set of inputs and states. In this work, we propose a novel approach based on Sparse Principal Component Analysis (SPCA) that combines the assets of POD methods with the potential for ex-post interpretation of the emulator structure. SPCA reduces the number of non-zero coefficients in the basis functions by identifying a sparse matrix of coefficients. While the resulting set of basis functions may retain less variance of the snapshots, the presence of a few non-zero coefficients assists in the interpretation of the underlying physical processes. The SPCA approach is tested on the reduction of a 1D hydro-ecological model (DYRESM-CAEDYM) used to describe the main ecological and hydrodynamic processes in Tono Dam, Japan. An experimental comparison against a standard POD approach shows that SPCA achieves the same accuracy in emulating a given output variable, for the same level of dimensionality reduction, while yielding better insight into the main process dynamics.

  11. Deep ensemble learning of sparse regression models for brain disease diagnosis.

    PubMed

    Suk, Heung-Il; Lee, Seong-Whan; Shen, Dinggang

    2017-04-01

    Recent studies on brain imaging analysis witnessed the core roles of machine learning techniques in computer-assisted intervention for brain disease diagnosis. Of various machine-learning techniques, sparse regression models have proved their effectiveness in handling high-dimensional data but with a small number of training samples, especially in medical problems. In the meantime, deep learning methods have been making great successes by outperforming the state-of-the-art performances in various applications. In this paper, we propose a novel framework that combines the two conceptually different methods of sparse regression and deep learning for Alzheimer's disease/mild cognitive impairment diagnosis and prognosis. Specifically, we first train multiple sparse regression models, each of which is trained with different values of a regularization control parameter. Thus, our multiple sparse regression models potentially select different feature subsets from the original feature set; thereby they have different powers to predict the response values, i.e., clinical label and clinical scores in our work. By regarding the response values from our sparse regression models as target-level representations, we then build a deep convolutional neural network for clinical decision making, which thus we call 'Deep Ensemble Sparse Regression Network.' To our best knowledge, this is the first work that combines sparse regression models with deep neural network. In our experiments with the ADNI cohort, we validated the effectiveness of the proposed method by achieving the highest diagnostic accuracies in three classification tasks. We also rigorously analyzed our results and compared with the previous studies on the ADNI cohort in the literature. Copyright © 2017 Elsevier B.V. All rights reserved.
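
    A simplified sketch of the two-stage structure described above: several Lasso models trained with different regularization levels produce target-level representations, which a second-stage classifier then combines. A plain logistic regression stands in for the paper's deep convolutional network, and the data are synthetic.

    ```python
    import numpy as np
    from sklearn.linear_model import Lasso, LogisticRegression
    from sklearn.model_selection import train_test_split

    rng = np.random.default_rng(0)
    X = rng.standard_normal((200, 1000))    # high-dimensional features, few samples (placeholder)
    y = (X[:, :5].sum(axis=1) + rng.standard_normal(200) > 0).astype(int)

    X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)

    # Stage 1: several sparse regression models, one per regularization level,
    # each potentially selecting a different feature subset.
    alphas = [0.01, 0.03, 0.1, 0.3]
    models = [Lasso(alpha=a).fit(X_tr, y_tr) for a in alphas]

    # Stage 2: their predicted responses act as target-level representations for a
    # second-stage classifier (logistic regression in place of the paper's CNN).
    Z_tr = np.column_stack([m.predict(X_tr) for m in models])
    Z_te = np.column_stack([m.predict(X_te) for m in models])
    clf = LogisticRegression().fit(Z_tr, y_tr)
    print(clf.score(Z_te, y_te))
    ```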

  12. Deep ensemble learning of sparse regression models for brain disease diagnosis

    PubMed Central

    Suk, Heung-Il; Lee, Seong-Whan; Shen, Dinggang

    2018-01-01

    Recent studies on brain imaging analysis witnessed the core roles of machine learning techniques in computer-assisted intervention for brain disease diagnosis. Of various machine-learning techniques, sparse regression models have proved their effectiveness in handling high-dimensional data but with a small number of training samples, especially in medical problems. In the meantime, deep learning methods have been making great successes by outperforming the state-of-the-art performances in various applications. In this paper, we propose a novel framework that combines the two conceptually different methods of sparse regression and deep learning for Alzheimer’s disease/mild cognitive impairment diagnosis and prognosis. Specifically, we first train multiple sparse regression models, each of which is trained with different values of a regularization control parameter. Thus, our multiple sparse regression models potentially select different feature subsets from the original feature set; thereby they have different powers to predict the response values, i.e., clinical label and clinical scores in our work. By regarding the response values from our sparse regression models as target-level representations, we then build a deep convolutional neural network for clinical decision making, which thus we call ‘ Deep Ensemble Sparse Regression Network.’ To our best knowledge, this is the first work that combines sparse regression models with deep neural network. In our experiments with the ADNI cohort, we validated the effectiveness of the proposed method by achieving the highest diagnostic accuracies in three classification tasks. We also rigorously analyzed our results and compared with the previous studies on the ADNI cohort in the literature. PMID:28167394

  13. Compressive Sensing with Cross-Validation and Stop-Sampling for Sparse Polynomial Chaos Expansions

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Huan, Xun; Safta, Cosmin; Sargsyan, Khachik

    Compressive sensing is a powerful technique for recovering sparse solutions of underdetermined linear systems, which are often encountered in uncertainty quantification analysis of expensive and high-dimensional physical models. We perform numerical investigations employing several compressive sensing solvers that target the unconstrained LASSO formulation, with a focus on linear systems that arise in the construction of polynomial chaos expansions. With core solvers of l1_ls, SpaRSA, CGIST, FPC_AS, and ADMM, we develop techniques to mitigate overfitting through an automated selection of the regularization constant based on cross-validation, and a heuristic strategy to guide the stop-sampling decision. Practical recommendations on parameter settings for these techniques are provided and discussed. The overall method is applied to a series of numerical examples of increasing complexity, including large eddy simulations of supersonic turbulent jet-in-crossflow involving a 24-dimensional input. Through empirical phase-transition diagrams and convergence plots, we illustrate sparse recovery performance under structures induced by polynomial chaos, accuracy and computational tradeoffs between polynomial bases of different degrees, and the practicability of conducting compressive sensing for a realistic, high-dimensional physical application. Across the test cases studied in this paper, we find ADMM to have demonstrated empirical advantages through consistently lower errors and faster computational times.
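
    The regularization-selection step described above can be illustrated with a cross-validated Lasso on a synthetic underdetermined system; scikit-learn's coordinate-descent solver stands in for the l1_ls/SpaRSA/CGIST/FPC_AS/ADMM solvers compared in the report, and the problem sizes are arbitrary.

    ```python
    import numpy as np
    from sklearn.linear_model import LassoCV

    rng = np.random.default_rng(0)
    m, p, s = 80, 400, 6                     # underdetermined: m measurements, p unknowns
    A = rng.standard_normal((m, p))          # measurement (or polynomial chaos design) matrix
    x_true = np.zeros(p)
    x_true[rng.choice(p, s, replace=False)] = rng.standard_normal(s)
    b = A @ x_true + 0.01 * rng.standard_normal(m)

    # Cross-validation over a grid of regularization constants, mirroring the
    # automated selection described in the abstract.
    fit = LassoCV(cv=5, n_alphas=50).fit(A, b)
    print(fit.alpha_, np.count_nonzero(fit.coef_))
    ```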

  14. Competition in high dimensional spaces using a sparse approximation of neural fields.

    PubMed

    Quinton, Jean-Charles; Girau, Bernard; Lefort, Mathieu

    2011-01-01

    The Continuum Neural Field Theory implements competition within topologically organized neural networks with lateral inhibitory connections. However, due to the polynomial complexity of matrix-based implementations, updating dense representations of the activity becomes computationally intractable when an adaptive resolution or an arbitrary number of input dimensions is required. This paper proposes an alternative to self-organizing maps with a sparse implementation based on Gaussian mixture models, promoting a trade-off in redundancy for higher computational efficiency and alleviating constraints on the underlying substrate. This version reproduces the emergent attentional properties of the original equations, by directly applying them within a continuous approximation of a high dimensional neural field. The model is compatible with preprocessed sensory flows but can also be interfaced with artificial systems. This is particularly important for sensorimotor systems, where decisions and motor actions must be taken and updated in real-time. Preliminary tests are performed on a reactive color tracking application, using spatially distributed color features.

  15. Sparse non-negative matrix factorizations via alternating non-negativity-constrained least squares for microarray data analysis.

    PubMed

    Kim, Hyunsoo; Park, Haesun

    2007-06-15

    Many practical pattern recognition problems require non-negativity constraints. For example, pixels in digital images and chemical concentrations in bioinformatics are non-negative. Sparse non-negative matrix factorizations (NMFs) are useful when the degree of sparseness in the non-negative basis matrix or the non-negative coefficient matrix in an NMF needs to be controlled in approximating high-dimensional data in a lower dimensional space. In this article, we introduce a novel formulation of sparse NMF and show how the new formulation leads to a convergent sparse NMF algorithm via alternating non-negativity-constrained least squares. We apply our sparse NMF algorithm to cancer-class discovery and gene expression data analysis and offer biological analysis of the results obtained. Our experimental results illustrate that the proposed sparse NMF algorithm often achieves better clustering performance with shorter computing time compared to other existing NMF algorithms. The software is available as supplementary material.
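
    A compact sketch of sparse NMF via alternating non-negativity-constrained least squares, following the general shape of the formulation above: an l1-type penalty on the coefficient matrix H is imposed by augmenting W with a row of ones, and each subproblem is solved column by column with SciPy's NNLS routine. The penalty weight and iteration count are illustrative choices, and this is a simplification of the paper's algorithm.

    ```python
    import numpy as np
    from scipy.optimize import nnls

    def sparse_nmf(A, k, beta=0.1, n_iter=50, seed=0):
        """Alternating non-negativity-constrained least squares for NMF, with a
        sparsity penalty on H imposed by row augmentation (simplified sketch)."""
        rng = np.random.default_rng(seed)
        m, n = A.shape
        W = rng.random((m, k))
        H = rng.random((k, n))
        for _ in range(n_iter):
            # Solve for each column of H with W fixed; the appended row of ones with
            # a zero target penalizes the l1 norm of that column (encourages sparsity).
            W_aug = np.vstack([W, np.sqrt(beta) * np.ones((1, k))])
            for j in range(n):
                H[:, j], _ = nnls(W_aug, np.append(A[:, j], 0.0))
            # Solve for each row of W with H fixed (no sparsity penalty on W here).
            for i in range(m):
                W[i, :], _ = nnls(H.T, A[i, :])
        return W, H

    A = np.abs(np.random.default_rng(1).standard_normal((30, 20)))
    W, H = sparse_nmf(A, k=4)
    print(np.linalg.norm(A - W @ H) / np.linalg.norm(A))   # relative reconstruction error
    ```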

  16. Two-dimensional sparse wavenumber recovery for guided wavefields

    NASA Astrophysics Data System (ADS)

    Sabeti, Soroosh; Harley, Joel B.

    2018-04-01

    The multi-modal and dispersive behavior of guided waves is often characterized by their dispersion curves, which describe their frequency-wavenumber behavior. In prior work, compressive sensing based techniques, such as sparse wavenumber analysis (SWA), have been capable of recovering dispersion curves from limited data samples. A major limitation of SWA, however, is the assumption that the structure is isotropic. As a result, SWA fails when applied to composites and other anisotropic structures. There have been efforts to address this issue in the literature, but they either are not easily generalizable or do not sufficiently express the data. In this paper, we enhance the existing approaches by employing a two-dimensional wavenumber model to account for direction-dependent velocities in anisotropic media. We integrate this model with tools from compressive sensing to reconstruct a wavefield from incomplete data. Specifically, we create a modified two-dimensional orthogonal matching pursuit algorithm that takes an undersampled wavefield image, with specified unknown elements, and determines its sparse wavenumber characteristics. We then recover the entire wavefield from the sparse representations obtained with our small number of data samples.
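
    A one-dimensional simplification of sparse wavenumber recovery: a wavefield sampled at a few positions is expanded in a dictionary of sinusoids over candidate wavenumbers, and orthogonal matching pursuit selects the few wavenumbers present. The paper's two-dimensional, anisotropy-aware version extends this idea; the wavenumbers and sampling below are made up for the example.

    ```python
    import numpy as np
    from sklearn.linear_model import OrthogonalMatchingPursuit

    rng = np.random.default_rng(0)
    x = np.sort(rng.uniform(0, 1, 40))        # 40 sparse spatial samples
    k_true = [60.0, 110.0]                    # wavenumbers present in the field (made up)
    field = sum(np.cos(k * x) for k in k_true)

    k_grid = np.arange(10.0, 200.0, 1.0)      # candidate wavenumbers
    D = np.cos(np.outer(x, k_grid))           # dictionary of sampled sinusoids

    omp = OrthogonalMatchingPursuit(n_nonzero_coefs=2).fit(D, field)
    print(k_grid[np.flatnonzero(omp.coef_)])  # wavenumbers selected by OMP (near k_true)
    ```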

  17. Data-assisted reduced-order modeling of extreme events in complex dynamical systems

    PubMed Central

    Koumoutsakos, Petros

    2018-01-01

    The prediction of extreme events, from avalanches and droughts to tsunamis and epidemics, depends on the formulation and analysis of relevant, complex dynamical systems. Such dynamical systems are characterized by high intrinsic dimensionality with extreme events having the form of rare transitions that are several standard deviations away from the mean. Such systems are not amenable to classical order-reduction methods through projection of the governing equations due to the large intrinsic dimensionality of the underlying attractor as well as the complexity of the transient events. Alternatively, data-driven techniques aim to quantify the dynamics of specific, critical modes by utilizing data-streams and by expanding the dimensionality of the reduced-order model using delayed coordinates. In turn, these methods have major limitations in regions of the phase space with sparse data, which is the case for extreme events. In this work, we develop a novel hybrid framework that complements an imperfect reduced order model, with data-streams that are integrated though a recurrent neural network (RNN) architecture. The reduced order model has the form of projected equations into a low-dimensional subspace that still contains important dynamical information about the system and it is expanded by a long short-term memory (LSTM) regularization. The LSTM-RNN is trained by analyzing the mismatch between the imperfect model and the data-streams, projected to the reduced-order space. The data-driven model assists the imperfect model in regions where data is available, while for locations where data is sparse the imperfect model still provides a baseline for the prediction of the system state. We assess the developed framework on two challenging prototype systems exhibiting extreme events. We show that the blended approach has improved performance compared with methods that use either data streams or the imperfect model alone. Notably the improvement is more significant in regions associated with extreme events, where data is sparse. PMID:29795631

  18. Data-assisted reduced-order modeling of extreme events in complex dynamical systems.

    PubMed

    Wan, Zhong Yi; Vlachas, Pantelis; Koumoutsakos, Petros; Sapsis, Themistoklis

    2018-01-01

    The prediction of extreme events, from avalanches and droughts to tsunamis and epidemics, depends on the formulation and analysis of relevant, complex dynamical systems. Such dynamical systems are characterized by high intrinsic dimensionality with extreme events having the form of rare transitions that are several standard deviations away from the mean. Such systems are not amenable to classical order-reduction methods through projection of the governing equations due to the large intrinsic dimensionality of the underlying attractor as well as the complexity of the transient events. Alternatively, data-driven techniques aim to quantify the dynamics of specific, critical modes by utilizing data-streams and by expanding the dimensionality of the reduced-order model using delayed coordinates. In turn, these methods have major limitations in regions of the phase space with sparse data, which is the case for extreme events. In this work, we develop a novel hybrid framework that complements an imperfect reduced order model, with data-streams that are integrated though a recurrent neural network (RNN) architecture. The reduced order model has the form of projected equations into a low-dimensional subspace that still contains important dynamical information about the system and it is expanded by a long short-term memory (LSTM) regularization. The LSTM-RNN is trained by analyzing the mismatch between the imperfect model and the data-streams, projected to the reduced-order space. The data-driven model assists the imperfect model in regions where data is available, while for locations where data is sparse the imperfect model still provides a baseline for the prediction of the system state. We assess the developed framework on two challenging prototype systems exhibiting extreme events. We show that the blended approach has improved performance compared with methods that use either data streams or the imperfect model alone. Notably the improvement is more significant in regions associated with extreme events, where data is sparse.

  19. EPS-LASSO: Test for High-Dimensional Regression Under Extreme Phenotype Sampling of Continuous Traits.

    PubMed

    Xu, Chao; Fang, Jian; Shen, Hui; Wang, Yu-Ping; Deng, Hong-Wen

    2018-01-25

    Extreme phenotype sampling (EPS) is a broadly-used design to identify candidate genetic factors contributing to the variation of quantitative traits. By enriching the signals in extreme phenotypic samples, EPS can boost the association power compared to random sampling. Most existing statistical methods for EPS examine the genetic factors individually, although many quantitative traits have multiple genetic factors underlying their variation. It is desirable to model the joint effects of genetic factors, which may increase the power and identify novel quantitative trait loci under EPS. The joint analysis of genetic data in high-dimensional situations requires specialized techniques, e.g., the least absolute shrinkage and selection operator (LASSO). Although there is extensive research and application related to LASSO, the statistical inference and testing for the sparse model under EPS remain unknown. We propose a novel sparse model (EPS-LASSO) with hypothesis test for high-dimensional regression under EPS based on a decorrelated score function. The comprehensive simulation shows EPS-LASSO outperforms existing methods with stable type I error and FDR control. EPS-LASSO can provide a consistent power for both low- and high-dimensional situations compared with the other methods dealing with high-dimensional situations. The power of EPS-LASSO is close to other low-dimensional methods when the causal effect sizes are small and is superior when the effects are large. Applying EPS-LASSO to a transcriptome-wide gene expression study for obesity reveals 10 significant body mass index associated genes. Our results indicate that EPS-LASSO is an effective method for EPS data analysis, which can account for correlated predictors. The source code is available at https://github.com/xu1912/EPSLASSO. hdeng2@tulane.edu. Supplementary data are available at Bioinformatics online. © The Author (2018). Published by Oxford University Press. All rights reserved. For Permissions, please email: journals.permissions@oup.com

  20. Uncertainty Analysis Based on Sparse Grid Collocation and Quasi-Monte Carlo Sampling with Application in Groundwater Modeling

    NASA Astrophysics Data System (ADS)

    Zhang, G.; Lu, D.; Ye, M.; Gunzburger, M.

    2011-12-01

    Markov Chain Monte Carlo (MCMC) methods have been widely used in many fields of uncertainty analysis to estimate the posterior distributions of parameters and credible intervals of predictions in the Bayesian framework. However, in practice, MCMC may be computationally unaffordable due to slow convergence and the excessive number of forward model executions required, especially when the forward model is expensive to compute. Both disadvantages arise from the curse of dimensionality, i.e., the posterior distribution is usually a multivariate function of parameters. Recently, the sparse grid method has been demonstrated to be an effective technique for coping with high-dimensional interpolation or integration problems. Thus, in order to accelerate the forward model and avoid the slow convergence of MCMC, we propose a new method for uncertainty analysis based on sparse grid interpolation and quasi-Monte Carlo sampling. First, we construct a polynomial approximation of the forward model in the parameter space by using the sparse grid interpolation. This approximation then defines an accurate surrogate posterior distribution that can be evaluated repeatedly at minimal computational cost. Second, instead of using MCMC, a quasi-Monte Carlo method is applied to draw samples in the parameter space. Then, the desired probability density function of each prediction is approximated by accumulating the posterior density values of all the samples according to the prediction values. Our method has the following advantages: (1) the polynomial approximation of the forward model on the sparse grid provides a very efficient evaluation of the surrogate posterior distribution; (2) the quasi-Monte Carlo method retains the same accuracy in approximating the PDF of predictions while avoiding the slow convergence of MCMC. The proposed method is applied to a controlled numerical experiment of groundwater flow modeling. The results show that our method attains the same accuracy much more efficiently than traditional MCMC.
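
    The second step of the workflow, replacing MCMC with quasi-Monte Carlo sampling of a cheap surrogate posterior, can be sketched as below. A simple quadratic polynomial surrogate stands in for the sparse grid interpolant, and the forward model, prior, and parameter names are invented for illustration.

    # Illustrative sketch of the QMC step: draw Sobol points in parameter space and
    # weight them by a cheap surrogate posterior.  A quadratic surrogate of the
    # forward model stands in for the sparse grid interpolant; names are invented.
    import numpy as np
    from scipy.stats import qmc

    def forward_model(theta):                    # placeholder "expensive" forward model
        return theta[0] ** 2 + np.sin(theta[1])

    obs, noise_sd = 1.2, 0.1

    # Fit a quadratic polynomial surrogate of the forward model from a few evaluations.
    train = qmc.Sobol(d=2, scramble=True, seed=0).random(64) * 2 - 1        # [-1, 1]^2
    feats = lambda t: np.stack([np.ones(len(t)), t[:, 0], t[:, 1],
                                t[:, 0] ** 2, t[:, 1] ** 2, t[:, 0] * t[:, 1]], axis=1)
    coef, *_ = np.linalg.lstsq(feats(train),
                               np.array([forward_model(t) for t in train]), rcond=None)

    # Quasi-Monte Carlo sampling of the surrogate posterior (uniform prior on [-1, 1]^2).
    samples = qmc.Sobol(d=2, scramble=True, seed=1).random(2 ** 12) * 2 - 1
    g = feats(samples) @ coef                                   # cheap surrogate predictions
    weights = np.exp(-0.5 * ((g - obs) / noise_sd) ** 2)        # unnormalised posterior density
    weights /= weights.sum()
    print("posterior mean estimate:", weights @ samples)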

  1. Compressed sparse tensor based quadrature for vibrational quantum mechanics integrals

    DOE PAGES

    Rai, Prashant; Sargsyan, Khachik; Najm, Habib N.

    2018-03-20

    A new method for fast evaluation of high dimensional integrals arising in quantum mechanics is proposed. Here, the method is based on sparse approximation of a high dimensional function followed by a low-rank compression. In the first step, we interpret the high dimensional integrand as a tensor in a suitable tensor product space and determine its entries by a compressed sensing based algorithm using only a few function evaluations. Secondly, we implement a rank reduction strategy to compress this tensor in a suitable low-rank tensor format using standard tensor compression tools. This allows representing a high dimensional integrand function as a small sum of products of low dimensional functions. Finally, a low dimensional Gauss–Hermite quadrature rule is used to integrate this low-rank representation, thus alleviating the curse of dimensionality. Numerical tests on synthetic functions, as well as on energy correction integrals for water and formaldehyde molecules, demonstrate the efficiency of this method, which requires very few function evaluations compared to other integration strategies.
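
    The final quadrature step, integrating a sum of products of one-dimensional functions with Gauss-Hermite rules so that the d-dimensional integral collapses into products of one-dimensional quadratures, is easy to illustrate. The rank-2 separable integrand below is synthetic and the helper names are ours, not the paper's.

    # Sketch of the last step: a Gauss-Hermite rule applied to a low-rank
    # (sum-of-products) representation of a high dimensional integrand.
    # The rank-2 separable integrand is synthetic; helper names are ours.
    import numpy as np

    d = 6                                                   # nominal dimension
    nodes, weights = np.polynomial.hermite.hermgauss(20)    # for integrals against exp(-x^2)

    # Integrand f(x) = sum_r prod_k g_{r,k}(x_k); here two rank terms with simple 1-D factors.
    factors = [
        [lambda x: np.cos(0.3 * x)] * d,                    # rank-1 term
        [lambda x: 1.0 / (1.0 + x ** 2)] * d,               # rank-2 term
    ]

    total = 0.0
    for term in factors:
        prod = 1.0
        for g in term:                                      # each dimension is a 1-D quadrature
            prod *= np.sum(weights * g(nodes))
        total += prod
    print("approximate integral of f against exp(-|x|^2):", total)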

  2. Compressed sparse tensor based quadrature for vibrational quantum mechanics integrals

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Rai, Prashant; Sargsyan, Khachik; Najm, Habib N.

    A new method for fast evaluation of high dimensional integrals arising in quantum mechanics is proposed. Here, the method is based on sparse approximation of a high dimensional function followed by a low-rank compression. In the first step, we interpret the high dimensional integrand as a tensor in a suitable tensor product space and determine its entries by a compressed sensing based algorithm using only a few function evaluations. Secondly, we implement a rank reduction strategy to compress this tensor in a suitable low-rank tensor format using standard tensor compression tools. This allows representing a high dimensional integrand function as a small sum of products of low dimensional functions. Finally, a low dimensional Gauss–Hermite quadrature rule is used to integrate this low-rank representation, thus alleviating the curse of dimensionality. Numerical tests on synthetic functions, as well as on energy correction integrals for water and formaldehyde molecules, demonstrate the efficiency of this method, which requires very few function evaluations compared to other integration strategies.

  3. Compressive hyperspectral and multispectral imaging fusion

    NASA Astrophysics Data System (ADS)

    Espitia, Óscar; Castillo, Sergio; Arguello, Henry

    2016-05-01

    Image fusion is a valuable framework that combines two or more images of the same scene from one or multiple sensors, making it possible to improve the resolution of the images and increase the interpretable content. In remote sensing, a common fusion problem consists of merging hyperspectral (HS) and multispectral (MS) images, which involve a large amount of redundant data and ignore the highly correlated structure of the datacube along the spatial and spectral dimensions. Compressive HS and MS systems compress the spectral data in the acquisition step, reducing data redundancy by using different sampling patterns. This work presents a compressed HS and MS image fusion approach that uses a high dimensional joint sparse model. The joint sparse model is formulated by combining the HS and MS compressive acquisition models. The high spectral and spatial resolution image is reconstructed by using sparse optimization algorithms. Different fusion spectral image scenarios are used to explore the performance of the proposed scheme. Several simulations with synthetic and real datacubes show promising results, as reliable reconstruction of a high spectral and spatial resolution image can be achieved by using as little as 50% of the datacube.
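
    Recovering a sparse coefficient vector from compressive measurements, the reconstruction step of the fusion scheme described above, is often done with a proximal-gradient (ISTA-type) solver. The toy problem below uses a random sensing matrix as a stand-in for the paper's HS/MS acquisition models.

    # Minimal ISTA (iterative soft-thresholding) sketch for sparse reconstruction:
    # recover a sparse vector x from compressive measurements y = A x.
    # The random sensing matrix is illustrative only.
    import numpy as np

    rng = np.random.default_rng(0)
    n_meas, n_coef, k = 120, 400, 10
    A = rng.normal(size=(n_meas, n_coef)) / np.sqrt(n_meas)
    x_true = np.zeros(n_coef)
    x_true[rng.choice(n_coef, k, replace=False)] = rng.normal(size=k)
    y = A @ x_true

    lam = 0.01
    step = 1.0 / np.linalg.norm(A, 2) ** 2           # 1 / Lipschitz constant of the gradient
    x = np.zeros(n_coef)
    for _ in range(500):
        grad = A.T @ (A @ x - y)
        z = x - step * grad
        x = np.sign(z) * np.maximum(np.abs(z) - step * lam, 0.0)   # soft-thresholding

    print("relative error:", np.linalg.norm(x - x_true) / np.linalg.norm(x_true))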

  4. Weather prediction using a genetic memory

    NASA Technical Reports Server (NTRS)

    Rogers, David

    1990-01-01

    Kanerva's sparse distributed memory (SDM) is an associative memory model based on the mathematical properties of high dimensional binary address spaces. Holland's genetic algorithms are a search technique for high dimensional spaces inspired by the evolutionary processes of DNA. Genetic Memory is a hybrid of the above two systems, in which the memory uses a genetic algorithm to dynamically reconfigure its physical storage locations to reflect correlations between the stored addresses and data. This architecture is designed to maximize the ability of the system to scale up to handle real-world problems.
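
    The basic read/write mechanics of a Kanerva-style sparse distributed memory, before any genetic reconfiguration of the storage addresses, can be sketched in a few lines. The sizes, activation radius, and names below are arbitrary illustrative choices.

    # Minimal sketch of Kanerva-style sparse distributed memory (without the genetic
    # reconfiguration of storage addresses).  Sizes and names are arbitrary.
    import numpy as np

    rng = np.random.default_rng(0)
    n_bits, n_locations, radius = 256, 2000, 112      # Hamming activation radius

    hard_addresses = rng.integers(0, 2, size=(n_locations, n_bits))
    counters = np.zeros((n_locations, n_bits), dtype=int)

    def active(addr):
        # Locations whose hard address lies within the Hamming radius of `addr`.
        return np.sum(hard_addresses != addr, axis=1) <= radius

    def write(addr, data):
        counters[active(addr)] += np.where(data == 1, 1, -1)

    def read(addr):
        return (counters[active(addr)].sum(axis=0) > 0).astype(int)

    pattern = rng.integers(0, 2, size=n_bits)
    write(pattern, pattern)                            # autoassociative store
    noisy = pattern.copy(); noisy[:20] ^= 1            # flip 20 bits of the cue
    print("bits recovered correctly:", np.sum(read(noisy) == pattern), "/", n_bits)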

  5. Locality-preserving sparse representation-based classification in hyperspectral imagery

    NASA Astrophysics Data System (ADS)

    Gao, Lianru; Yu, Haoyang; Zhang, Bing; Li, Qingting

    2016-10-01

    This paper proposes to combine locality-preserving projections (LPP) and sparse representation (SR) for hyperspectral image classification. The LPP is first used to reduce the dimensionality of all the training and testing data by finding the optimal linear approximations to the eigenfunctions of the Laplace Beltrami operator on the manifold, where the high-dimensional data lies. Then, SR codes the projected testing pixels as sparse linear combinations of all the training samples to classify the testing pixels by evaluating which class leads to the minimum approximation error. The integration of LPP and SR represents an innovative contribution to the literature. The proposed approach, called locality-preserving SR-based classification, addresses the imbalance between high dimensionality of hyperspectral data and the limited number of training samples. Experimental results on three real hyperspectral data sets demonstrate that the proposed approach outperforms the original counterpart, i.e., SR-based classification.
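
    The sparse-representation classification rule itself, coding a test pixel over the pooled training samples and assigning the class whose atoms give the smallest reconstruction error, can be sketched as follows. The tiny random data set and the orthogonal matching pursuit coder are stand-ins for the hyperspectral data and the paper's solver.

    # Sketch of sparse-representation classification: code a test sample over all
    # training samples and pick the class with the smallest residual.
    # Random data and an OMP coder stand in for hyperspectral pixels.
    import numpy as np
    from sklearn.linear_model import OrthogonalMatchingPursuit

    rng = np.random.default_rng(0)
    n_per_class, n_classes, dim = 30, 3, 50
    centers = rng.normal(scale=3.0, size=(n_classes, dim))
    train = np.vstack([c + rng.normal(size=(n_per_class, dim)) for c in centers])
    labels = np.repeat(np.arange(n_classes), n_per_class)
    test = centers[1] + rng.normal(size=dim)                     # a sample from class 1

    D = (train / np.linalg.norm(train, axis=1, keepdims=True)).T   # dictionary: columns = samples
    omp = OrthogonalMatchingPursuit(n_nonzero_coefs=10, fit_intercept=False).fit(D, test)
    coef = omp.coef_

    residuals = []
    for c in range(n_classes):
        coef_c = np.where(labels == c, coef, 0.0)                # keep only class-c coefficients
        residuals.append(np.linalg.norm(test - D @ coef_c))
    print("predicted class:", int(np.argmin(residuals)))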

  6. Multi-Scale Three-Dimensional Variational Data Assimilation System for Coastal Ocean Prediction

    NASA Technical Reports Server (NTRS)

    Li, Zhijin; Chao, Yi; Li, P. Peggy

    2012-01-01

    A multi-scale three-dimensional variational data assimilation system (MS-3DVAR) has been formulated, and the associated software system developed, for improving high-resolution coastal ocean prediction. This system helps improve coastal ocean prediction skill and has been used in support of operational coastal ocean forecasting systems and field experiments. The system has been developed to improve the capability of data assimilation for assimilating, simultaneously and effectively, sparse vertical profiles and high-resolution remote sensing surface measurements into coastal ocean models, as well as for constraining model biases. In this system, the cost function is decomposed into two separate units for the large- and small-scale components, respectively. As such, data assimilation is implemented sequentially from large to small scales, the background error covariance is constructed to be scale-dependent, and a scale-dependent dynamic balance is incorporated. This scheme allows large scales and model bias to be constrained effectively by assimilating sparse vertical profiles, and small scales by assimilating high-resolution surface measurements. The MS-3DVAR enhances the capability of traditional 3DVAR to assimilate highly heterogeneously distributed observations, such as along-track satellite altimetry data, and in particular maximizes the extraction of information from a limited number of vertical profile observations.
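
    The sequential large-to-small-scale analysis can be illustrated with the standard optimal-interpolation form of the 3DVAR update applied twice with different background error covariances. The one-dimensional toy state, correlation length scales, and observation layout below are invented for illustration and are not the MS-3DVAR configuration.

    # Toy illustration of a sequential, two-scale 3DVAR analysis using the standard
    # optimal-interpolation form  x_a = x_b + B H^T (H B H^T + R)^(-1) (y - H x_b).
    # The 1-D state, length scales, and observation layout are invented.
    import numpy as np

    def gaussian_cov(n, length, variance):
        idx = np.arange(n)
        return variance * np.exp(-0.5 * ((idx[:, None] - idx[None, :]) / length) ** 2)

    def analyse(xb, y, H, B, R):
        K = B @ H.T @ np.linalg.inv(H @ B @ H.T + R)     # gain matrix
        return xb + K @ (y - H @ xb)

    n = 100
    truth = np.sin(np.arange(n) / 8.0) + 0.3 * np.sin(np.arange(n) / 2.0)
    xb = np.zeros(n)                                     # imperfect background

    # Step 1: constrain large scales with a few sparse "profile" observations.
    obs_idx = np.array([10, 40, 70, 95])
    H1 = np.eye(n)[obs_idx]
    x1 = analyse(xb, truth[obs_idx], H1, gaussian_cov(n, length=20.0, variance=1.0),
                 0.01 * np.eye(len(obs_idx)))

    # Step 2: constrain small scales with dense "surface" observations.
    H2 = np.eye(n)
    x2 = analyse(x1, truth + 0.05 * np.random.default_rng(0).normal(size=n), H2,
                 gaussian_cov(n, length=3.0, variance=0.2), 0.05 * np.eye(n))

    print("background RMSE:", np.sqrt(np.mean((xb - truth) ** 2)),
          "analysis RMSE:", np.sqrt(np.mean((x2 - truth) ** 2)))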

  7. Sparse kernel methods for high-dimensional survival data.

    PubMed

    Evers, Ludger; Messow, Claudia-Martina

    2008-07-15

    Sparse kernel methods like support vector machines (SVM) have been applied with great success to classification and (standard) regression settings. Existing support vector classification and regression techniques however are not suitable for partly censored survival data, which are typically analysed using Cox's proportional hazards model. As the partial likelihood of the proportional hazards model only depends on the covariates through inner products, it can be 'kernelized'. The kernelized proportional hazards model however yields a solution that is dense, i.e. the solution depends on all observations. One of the key features of an SVM is that it yields a sparse solution, depending only on a small fraction of the training data. We propose two methods. One is based on a geometric idea, where, akin to support vector classification, the margin between the failed observation and the observations currently at risk is maximised. The other approach is based on obtaining a sparse model by adding observations one after another, akin to the Import Vector Machine (IVM). Data examples studied suggest that both methods can outperform competing approaches. Software is available under the GNU Public License as an R package and can be obtained from the first author's website http://www.maths.bris.ac.uk/~maxle/software.html.

  8. Semi-blind sparse image reconstruction with application to MRFM.

    PubMed

    Park, Se Un; Dobigeon, Nicolas; Hero, Alfred O

    2012-09-01

    We propose a solution to the image deconvolution problem where the convolution kernel or point spread function (PSF) is assumed to be only partially known. Small perturbations generated from the model are exploited to produce a few principal components explaining the PSF uncertainty in a high-dimensional space. Unlike recent developments on blind deconvolution of natural images, we assume the image is sparse in the pixel basis, a natural sparsity arising in magnetic resonance force microscopy (MRFM). Our approach adopts a Bayesian Metropolis-within-Gibbs sampling framework. The performance of our Bayesian semi-blind algorithm for sparse images is superior to previously proposed semi-blind algorithms such as the alternating minimization algorithm and blind algorithms developed for natural images. We illustrate our myopic algorithm on real MRFM tobacco virus data.

  9. Ontology Sparse Vector Learning Algorithm for Ontology Similarity Measuring and Ontology Mapping via ADAL Technology

    NASA Astrophysics Data System (ADS)

    Gao, Wei; Zhu, Linli; Wang, Kaiyun

    2015-12-01

    Ontology, a model of knowledge representation and storage, has had extensive applications in pharmaceutics, social science, chemistry and biology. In the age of "big data", the constructed concepts are often represented as higher-dimensional data, and thus sparse learning techniques are introduced into ontology algorithms. In this paper, based on the alternating direction augmented Lagrangian method, we present an ontology optimization algorithm for ontological sparse vector learning, together with a fast version of this ontology technology. The optimal sparse vector is obtained by an iterative procedure, and the ontology function is then obtained from the sparse vector. Four simulation experiments show that our ontological sparse vector learning model achieves a higher precision ratio on plant ontology, humanoid robotics ontology, biology ontology and physics education ontology data for similarity measuring and ontology mapping applications.

  10. Blended particle filters for large-dimensional chaotic dynamical systems

    PubMed Central

    Majda, Andrew J.; Qi, Di; Sapsis, Themistoklis P.

    2014-01-01

    A major challenge in contemporary data science is the development of statistically accurate particle filters to capture non-Gaussian features in large-dimensional chaotic dynamical systems. Blended particle filters that capture non-Gaussian features in an adaptively evolving low-dimensional subspace through particles interacting with evolving Gaussian statistics on the remaining portion of phase space are introduced here. These blended particle filters are constructed in this paper through a mathematical formalism involving conditional Gaussian mixtures combined with statistically nonlinear forecast models compatible with this structure developed recently with high skill for uncertainty quantification. Stringent test cases for filtering involving the 40-dimensional Lorenz 96 model with a 5-dimensional adaptive subspace for nonlinear blended filtering in various turbulent regimes with at least nine positive Lyapunov exponents are used here. These cases demonstrate the high skill of the blended particle filter algorithms in capturing both highly non-Gaussian dynamical features as well as crucial nonlinear statistics for accurate filtering in extreme filtering regimes with sparse infrequent high-quality observations. The formalism developed here is also useful for multiscale filtering of turbulent systems and a simple application is sketched below. PMID:24825886

  11. A sparse structure learning algorithm for Gaussian Bayesian Network identification from high-dimensional data.

    PubMed

    Huang, Shuai; Li, Jing; Ye, Jieping; Fleisher, Adam; Chen, Kewei; Wu, Teresa; Reiman, Eric

    2013-06-01

    Structure learning of Bayesian Networks (BNs) is an important topic in machine learning. Driven by modern applications in genetics and brain sciences, accurate and efficient learning of large-scale BN structures from high-dimensional data becomes a challenging problem. To tackle this challenge, we propose a Sparse Bayesian Network (SBN) structure learning algorithm that employs a novel formulation involving one L1-norm penalty term to impose sparsity and another penalty term to ensure that the learned BN is a Directed Acyclic Graph (DAG), a required property of BNs. Through both theoretical analysis and extensive experiments on 11 moderate and large benchmark networks with various sample sizes, we show that SBN leads to improved learning accuracy, scalability, and efficiency as compared with 10 existing popular BN learning algorithms. We apply SBN to a real-world application of brain connectivity modeling for Alzheimer's disease (AD) and reveal findings that could lead to advancements in AD research.

  12. A Sparse Structure Learning Algorithm for Gaussian Bayesian Network Identification from High-Dimensional Data

    PubMed Central

    Huang, Shuai; Li, Jing; Ye, Jieping; Fleisher, Adam; Chen, Kewei; Wu, Teresa; Reiman, Eric

    2014-01-01

    Structure learning of Bayesian Networks (BNs) is an important topic in machine learning. Driven by modern applications in genetics and brain sciences, accurate and efficient learning of large-scale BN structures from high-dimensional data becomes a challenging problem. To tackle this challenge, we propose a Sparse Bayesian Network (SBN) structure learning algorithm that employs a novel formulation involving one L1-norm penalty term to impose sparsity and another penalty term to ensure that the learned BN is a Directed Acyclic Graph (DAG)—a required property of BNs. Through both theoretical analysis and extensive experiments on 11 moderate and large benchmark networks with various sample sizes, we show that SBN leads to improved learning accuracy, scalability, and efficiency as compared with 10 existing popular BN learning algorithms. We apply SBN to a real-world application of brain connectivity modeling for Alzheimer’s disease (AD) and reveal findings that could lead to advancements in AD research. PMID:22665720

  13. Automatic segmentation of brain MRI in high-dimensional local and non-local feature space based on sparse representation.

    PubMed

    Khalilzadeh, Mohammad Mahdi; Fatemizadeh, Emad; Behnam, Hamid

    2013-06-01

    Automatic extraction of the varying regions of magnetic resonance images is required as a preliminary step in an intelligent diagnostic system. The sparsest representation and a high-dimensional feature are obtained from a learned dictionary. Classification is performed by a technique that computes the reconstruction error of each pixel both locally and non-locally. The results obtained on real and simulated images are superior to the best MRI segmentation methods with regard to stability. In addition, segmentation is carried out exactly through a formula based on the distance and sparsity factors, and it is performed automatically by using the sparsity factor in unsupervised clustering methods, whose results are thereby improved.

  14. Estimating multivariate similarity between neuroimaging datasets with sparse canonical correlation analysis: an application to perfusion imaging.

    PubMed

    Rosa, Maria J; Mehta, Mitul A; Pich, Emilio M; Risterucci, Celine; Zelaya, Fernando; Reinders, Antje A T S; Williams, Steve C R; Dazzan, Paola; Doyle, Orla M; Marquand, Andre F

    2015-01-01

    An increasing number of neuroimaging studies are based on either combining more than one data modality (inter-modal) or combining more than one measurement from the same modality (intra-modal). To date, most intra-modal studies using multivariate statistics have focused on differences between datasets, for instance relying on classifiers to differentiate between effects in the data. However, to fully characterize these effects, multivariate methods able to measure similarities between datasets are needed. One classical technique for estimating the relationship between two datasets is canonical correlation analysis (CCA). However, in the context of high-dimensional data the application of CCA is extremely challenging. A recent extension of CCA, sparse CCA (SCCA), overcomes this limitation, by regularizing the model parameters while yielding a sparse solution. In this work, we modify SCCA with the aim of facilitating its application to high-dimensional neuroimaging data and finding meaningful multivariate image-to-image correspondences in intra-modal studies. In particular, we show how the optimal subset of variables can be estimated independently and we look at the information encoded in more than one set of SCCA transformations. We illustrate our framework using Arterial Spin Labeling data to investigate multivariate similarities between the effects of two antipsychotic drugs on cerebral blood flow.

  15. Regression-based adaptive sparse polynomial dimensional decomposition for sensitivity analysis

    NASA Astrophysics Data System (ADS)

    Tang, Kunkun; Congedo, Pietro; Abgrall, Remi

    2014-11-01

    Polynomial dimensional decomposition (PDD) is employed in this work for global sensitivity analysis and uncertainty quantification of stochastic systems subject to a large number of random input variables. Due to the close relationship between PDD and analysis of variance, PDD is able to provide a simpler and more direct evaluation of the Sobol' sensitivity indices when compared to polynomial chaos (PC). Unfortunately, the number of PDD terms grows exponentially with respect to the size of the input random vector, which makes the computational cost of the standard method unaffordable for real engineering applications. To address this curse of dimensionality, this work proposes a variance-based adaptive strategy aiming to build a cheap meta-model by sparse PDD, with the PDD coefficients computed by regression. During this adaptive procedure, the model representation by PDD contains only a few terms, so that the cost of repeatedly solving the linear system of the least-squares regression problem is negligible. The size of the final sparse-PDD representation is much smaller than the full PDD, since only significant terms are eventually retained. Consequently, far fewer calls to the deterministic model are required to compute the final PDD coefficients.

  16. Joint sparse learning for 3-D facial expression generation.

    PubMed

    Song, Mingli; Tao, Dacheng; Sun, Shengpeng; Chen, Chun; Bu, Jiajun

    2013-08-01

    3-D facial expression generation, including synthesis and retargeting, has received intensive attention in recent years, because it is important to produce realistic 3-D faces with specific expressions in modern film production and computer games. In this paper, we present joint sparse learning (JSL) to learn mapping functions and their respective inverses to model the relationship between the high-dimensional 3-D faces (of different expressions and identities) and their corresponding low-dimensional representations. Based on JSL, we can effectively and efficiently generate various expressions of a 3-D face by either synthesizing or retargeting. Furthermore, JSL is able to restore 3-D faces with holes by learning a mapping function between incomplete and intact data. Experimental results on a wide range of 3-D faces demonstrate the effectiveness of the proposed approach by comparing with representative ones in terms of quality, time cost, and robustness.

  17. A Multicriteria Approach to Find Predictive and Sparse Models with Stable Feature Selection for High-Dimensional Data.

    PubMed

    Bommert, Andrea; Rahnenführer, Jörg; Lang, Michel

    2017-01-01

    Finding a good predictive model for a high-dimensional data set can be challenging. For genetic data, it is not only important to find a model with high predictive accuracy, but it is also important that this model uses only few features and that the selection of these features is stable. This is because, in bioinformatics, the models are used not only for prediction but also for drawing biological conclusions, which makes the interpretability and reliability of the model crucial. We suggest using three target criteria when fitting a predictive model to a high-dimensional data set: the classification accuracy, the stability of the feature selection, and the number of chosen features. As it is unclear which measure is best for evaluating the stability, we first compare a variety of stability measures. We conclude that the Pearson correlation has the best theoretical and empirical properties. Also, we find that for assessing stability it is most important that a measure contains a correction for chance or for large numbers of chosen features. Then, we analyse Pareto fronts and conclude that it is possible to find models with a stable selection of few features without losing much predictive accuracy.
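
    Measuring selection stability with the Pearson correlation of binary selection indicators across resamples, the measure recommended above, is straightforward to sketch. The bootstrap loop and LASSO selector below are generic illustrations, not the authors' benchmark code.

    # Generic sketch of feature-selection stability measured as the mean pairwise
    # Pearson correlation between binary selection vectors over bootstrap resamples.
    import numpy as np
    from itertools import combinations
    from sklearn.linear_model import Lasso

    rng = np.random.default_rng(0)
    n, p = 150, 200
    X = rng.normal(size=(n, p))
    y = X[:, :4] @ np.array([1.0, -1.0, 0.8, -0.8]) + rng.normal(size=n)

    selections = []
    for _ in range(20):                                   # bootstrap resamples
        idx = rng.integers(0, n, size=n)
        coef = Lasso(alpha=0.1).fit(X[idx], y[idx]).coef_
        selections.append((coef != 0).astype(float))      # binary selection indicator

    pairs = [np.corrcoef(a, b)[0, 1] for a, b in combinations(selections, 2)]
    print("selected per resample (mean):", np.mean([s.sum() for s in selections]))
    print("stability (mean pairwise Pearson correlation):", np.mean(pairs))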

  18. The theoretical relationship between foliage temperature and canopy resistance in sparse crops

    NASA Technical Reports Server (NTRS)

    Shuttleworth, W. James; Gurney, Robert J.

    1990-01-01

    One-dimensional, sparse-crop interaction theory is reformulated to allow calculation of the canopy resistance from measurements of foliage temperature. A submodel is introduced to describe eddy diffusion within the canopy which provides a simple, empirical simulation of the reported behavior obtained from a second-order closure model. The sensitivity of the calculated canopy resistance to the parameters and formulas assumed in the model is investigated. The calculation is shown to exhibit a significant but acceptable sensitivity to extreme changes in canopy aerodynamics, and to changes in the surface resistance of the substrate beneath the canopy at high and intermediate values of leaf area index. In very sparse crops changes in the surface resistance of the substrate are shown to contaminate the calculated canopy resistance, tending to amplify the apparent response to changes in water availability. The theory is developed to allow the use of a measurement of substrate temperature as an option to mitigate this contamination.

  19. Incorporating biological information in sparse principal component analysis with application to genomic data.

    PubMed

    Li, Ziyi; Safo, Sandra E; Long, Qi

    2017-07-11

    Sparse principal component analysis (PCA) is a popular tool for dimensionality reduction, pattern recognition, and visualization of high dimensional data. It has been recognized that complex biological mechanisms occur through concerted relationships of multiple genes working in networks that are often represented by graphs. Recent work has shown that incorporating such biological information improves feature selection and prediction performance in regression analysis, but there has been limited work on extending this approach to PCA. In this article, we propose two new sparse PCA methods called Fused and Grouped sparse PCA that enable incorporation of prior biological information in variable selection. Our simulation studies suggest that, compared to existing sparse PCA methods, the proposed methods achieve higher sensitivity and specificity when the graph structure is correctly specified, and are fairly robust to misspecified graph structures. Application to a glioblastoma gene expression dataset identified pathways that are suggested in the literature to be related to glioblastoma. The proposed Fused and Grouped sparse PCA methods can effectively incorporate prior biological information in variable selection, leading to improved feature selection and more interpretable principal component loadings and potentially providing insights on molecular underpinnings of complex diseases.
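
    As a point of reference for the methods described above, ordinary sparse PCA (without graph information) is available in scikit-learn. The snippet below is that generic baseline, not the Fused or Grouped variants proposed in the paper, and the synthetic "gene" block is invented.

    # Baseline sparse PCA with scikit-learn, the generic starting point that the
    # Fused/Grouped graph-informed variants extend; data here are synthetic.
    import numpy as np
    from sklearn.decomposition import SparsePCA

    rng = np.random.default_rng(0)
    n_samples, n_genes = 100, 500
    signal = rng.normal(size=(n_samples, 1)) @ rng.normal(size=(1, 20))   # 20 co-varying "genes"
    X = rng.normal(scale=0.5, size=(n_samples, n_genes))
    X[:, :20] += signal

    spca = SparsePCA(n_components=2, alpha=1.0, random_state=0).fit(X)
    loadings = spca.components_                           # sparse loading vectors
    print("nonzero loadings per component:", (loadings != 0).sum(axis=1))
    print("top variables in component 1:", np.argsort(-np.abs(loadings[0]))[:5])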

  20. InSAR analysis of surface deformation over permafrost to estimate active layer thickness based on one-dimensional heat transfer model of soils

    PubMed Central

    Li, Zhiwei; Zhao, Rong; Hu, Jun; Wen, Lianxing; Feng, Guangcai; Zhang, Zeyu; Wang, Qijie

    2015-01-01

    This paper presents a novel method to estimate active layer thickness (ALT) over permafrost based on InSAR (Interferometric Synthetic Aperture Radar) observations and the heat transfer model of soils. The time lags between the periodic feature of the InSAR-observed surface deformation over permafrost and the meteorologically recorded temperatures are assumed to be the time intervals needed for the temperature maximum to diffuse from the ground surface down to the bottom of the active layer. By exploiting these time lags and the one-dimensional heat transfer model of soils, we estimate the ALTs. Using the frozen soil region in the southern Qinghai-Tibet Plateau (QTP) as an example, we provide a conceptual demonstration of the estimation of InSAR pixel-wise ALTs. In the case study, the ALTs range from 1.02 to 3.14 m, with an average of 1.95 m. The results are compatible with the sparse ALT observations/estimates obtained by traditional methods, while offering extraordinarily high spatial resolution at the pixel level (~40 m). The presented method is simple and can potentially be used for deriving high-resolution ALTs in other remote areas similar to the QTP, where only sparse observations are available now. PMID:26480892
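
    The one-dimensional heat-conduction relation linking the observed time lag to depth can be written in a few lines. The formula below assumes a homogeneous half-space with a purely sinusoidal annual surface temperature, a simplification of the soil model used in the paper, and the thermal diffusivity value is only an assumed illustrative number.

    # Sketch of estimating active layer thickness from the InSAR-observed time lag
    # using the 1-D heat conduction solution for a homogeneous half-space with
    # sinusoidal surface forcing: z = lag * sqrt(2 * kappa * omega).
    # The diffusivity is an assumed value; this simplifies the paper's soil model.
    import numpy as np

    kappa = 1.0e-6                       # soil thermal diffusivity, m^2/s (assumed)
    period = 365.25 * 86400.0            # annual temperature cycle, s
    omega = 2.0 * np.pi / period

    def alt_from_lag(lag_days):
        """Depth reached by the annual temperature maximum after a lag of `lag_days`."""
        lag = np.asarray(lag_days) * 86400.0
        return lag * np.sqrt(2.0 * kappa * omega)

    print("ALT for 30/60/90-day lags (m):", np.round(alt_from_lag([30, 60, 90]), 2))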

  1. InSAR analysis of surface deformation over permafrost to estimate active layer thickness based on one-dimensional heat transfer model of soils.

    PubMed

    Li, Zhiwei; Zhao, Rong; Hu, Jun; Wen, Lianxing; Feng, Guangcai; Zhang, Zeyu; Wang, Qijie

    2015-10-20

    This paper presents a novel method to estimate active layer thickness (ALT) over permafrost based on InSAR (Interferometric Synthetic Aperture Radar) observations and the heat transfer model of soils. The time lags between the periodic feature of the InSAR-observed surface deformation over permafrost and the meteorologically recorded temperatures are assumed to be the time intervals needed for the temperature maximum to diffuse from the ground surface down to the bottom of the active layer. By exploiting these time lags and the one-dimensional heat transfer model of soils, we estimate the ALTs. Using the frozen soil region in the southern Qinghai-Tibet Plateau (QTP) as an example, we provide a conceptual demonstration of the estimation of InSAR pixel-wise ALTs. In the case study, the ALTs range from 1.02 to 3.14 m, with an average of 1.95 m. The results are compatible with the sparse ALT observations/estimates obtained by traditional methods, while offering extraordinarily high spatial resolution at the pixel level (~40 m). The presented method is simple and can potentially be used for deriving high-resolution ALTs in other remote areas similar to the QTP, where only sparse observations are available now.

  2. Removal of nuisance signals from limited and sparse 1H MRSI data using a union-of-subspaces model.

    PubMed

    Ma, Chao; Lam, Fan; Johnson, Curtis L; Liang, Zhi-Pei

    2016-02-01

    The aim is to remove nuisance signals (e.g., water and lipid signals) from 1H MRSI data collected from the brain with limited and/or sparse (k, t)-space coverage. A union-of-subspaces model is proposed for removing nuisance signals. The model exploits the partial separability of both the nuisance signals and the metabolite signal, and decomposes an MRSI dataset into several sets of generalized voxels that share the same spectral distributions. This model enables the estimation of the nuisance signals from an MRSI dataset that has limited and/or sparse (k, t)-space coverage. The proposed method has been evaluated using in vivo MRSI data. For conventional chemical shift imaging data with limited k-space coverage, the proposed method produced "lipid-free" spectra without lipid suppression during data acquisition at 130 ms echo time. For sparse (k, t)-space data acquired with conventional pulses for water and lipid suppression, the proposed method was also able to remove the remaining water and lipid signals with negligible residuals. Nuisance signals in 1H MRSI data reside in low-dimensional subspaces. This property can be utilized for estimation and removal of nuisance signals from 1H MRSI data even when they have limited and/or sparse coverage of (k, t)-space. The proposed method should prove useful especially for accelerated high-resolution 1H MRSI of the brain.

  3. Application of a sparseness constraint in multivariate curve resolution - Alternating least squares.

    PubMed

    Hugelier, Siewert; Piqueras, Sara; Bedia, Carmen; de Juan, Anna; Ruckebusch, Cyril

    2018-02-13

    The use of sparseness in chemometrics is a concept that has increased in popularity. The advantage is, above all, a better interpretability of the results obtained. In this work, sparseness is implemented as a constraint in multivariate curve resolution - alternating least squares (MCR-ALS), which aims at reproducing raw (mixed) data by a bilinear model of chemically meaningful profiles. In many cases, the mixed raw data analyzed are not sparse by nature, but their decomposition profiles can be, as is the case in some instrumental responses, such as mass spectra, or in concentration profiles linked to scattered distribution maps of powdered samples in hyperspectral images. To induce sparseness in the constrained profiles, one-dimensional and/or two-dimensional numerical arrays can be fitted using a basis of Gaussian functions with a penalty on the coefficients. In this work, a least squares regression framework with an L0-norm penalty is applied. This L0-norm penalty constrains the number of non-null coefficients in the fit of the constrained array without requiring a priori knowledge of their number or positions. It has been shown that the sparseness constraint induces the suppression of values linked to uninformative channels and noise in MS spectra and improves the location of scattered compounds in distribution maps, resulting in a better interpretability of the constrained profiles. An additional benefit of the sparseness constraint is a lower ambiguity in the bilinear model, since the predominance of null coefficients in the constrained profiles also helps to limit the solutions for the profiles in the counterpart matrix of the MCR bilinear model.

  4. Social Collaborative Filtering by Trust.

    PubMed

    Yang, Bo; Lei, Yu; Liu, Jiming; Li, Wenjie

    2017-08-01

    Recommender systems are used to accurately and actively provide users with potentially interesting information or services. Collaborative filtering is a widely adopted approach to recommendation, but sparse data and cold-start users are often barriers to providing high quality recommendations. To address such issues, we propose a novel method that improves the performance of collaborative filtering recommendations by integrating sparse rating data given by users and the sparse social trust network among these same users. This is a model-based method that adopts a matrix factorization technique mapping users into low-dimensional latent feature spaces in terms of their trust relationships, and it aims to more accurately reflect the users' reciprocal influence on the formation of their own opinions and to learn better preferential patterns of users for high-quality recommendations. We use four large-scale datasets to show that the proposed method performs much better, especially for cold-start users, than state-of-the-art recommendation algorithms for social collaborative filtering based on trust.

  5. Ensemble of sparse classifiers for high-dimensional biological data.

    PubMed

    Kim, Sunghan; Scalzo, Fabien; Telesca, Donatello; Hu, Xiao

    2015-01-01

    Biological data are often high in dimension while the number of samples is small. In such cases, the performance of classification can be improved by reducing the dimension of data, which is referred to as feature selection. Recently, a novel feature selection method has been proposed utilising the sparsity of high-dimensional biological data where a small subset of features accounts for most variance of the dataset. In this study we propose a new classification method for high-dimensional biological data, which performs both feature selection and classification within a single framework. Our proposed method utilises a sparse linear solution technique and the bootstrap aggregating algorithm. We tested its performance on four public mass spectrometry cancer datasets along with two other conventional classification techniques such as Support Vector Machines and Adaptive Boosting. The results demonstrate that our proposed method performs more accurate classification across various cancer datasets than those conventional classification techniques.
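
    The combination sketched above, sparse (L1-penalized) base classifiers aggregated by bootstrap bagging, can be put together directly with scikit-learn. This generic pipeline is not the authors' implementation, and the synthetic data stand in for the mass spectrometry datasets.

    # Generic bagging ensemble of sparse (L1-penalised) logistic classifiers, the kind
    # of combination described above; not the authors' implementation, synthetic data.
    import numpy as np
    from sklearn.ensemble import BaggingClassifier
    from sklearn.linear_model import LogisticRegression
    from sklearn.model_selection import cross_val_score

    rng = np.random.default_rng(0)
    n, p = 120, 2000                                    # few samples, many features
    X = rng.normal(size=(n, p))
    labels = (X[:, :10] @ rng.normal(size=10) + 0.5 * rng.normal(size=n) > 0).astype(int)

    sparse_base = LogisticRegression(penalty="l1", solver="liblinear", C=0.1)
    ensemble = BaggingClassifier(sparse_base, n_estimators=25, random_state=0)

    print("CV accuracy, single sparse classifier:",
          cross_val_score(sparse_base, X, labels, cv=5).mean())
    print("CV accuracy, bagged ensemble:        ",
          cross_val_score(ensemble, X, labels, cv=5).mean())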

  6. Face recognition based on two-dimensional discriminant sparse preserving projection

    NASA Astrophysics Data System (ADS)

    Zhang, Dawei; Zhu, Shanan

    2018-04-01

    In this paper, a supervised dimensionality reduction algorithm named two-dimensional discriminant sparse preserving projection (2DDSPP) is proposed for face recognition. In order to accurately model manifold structure of data, 2DDSPP constructs within-class affinity graph and between-class affinity graph by the constrained least squares (LS) and l1 norm minimization problem, respectively. Based on directly operating on image matrix, 2DDSPP integrates graph embedding (GE) with Fisher criterion. The obtained projection subspace preserves within-class neighborhood geometry structure of samples, while keeping away samples from different classes. The experimental results on the PIE and AR face databases show that 2DDSPP can achieve better recognition performance.

  7. Regularization Methods for High-Dimensional Instrumental Variables Regression With an Application to Genetical Genomics

    PubMed Central

    Lin, Wei; Feng, Rui; Li, Hongzhe

    2014-01-01

    In genetical genomics studies, it is important to jointly analyze gene expression data and genetic variants in exploring their associations with complex traits, where the dimensionality of gene expressions and genetic variants can both be much larger than the sample size. Motivated by such modern applications, we consider the problem of variable selection and estimation in high-dimensional sparse instrumental variables models. To overcome the difficulty of high dimensionality and unknown optimal instruments, we propose a two-stage regularization framework for identifying and estimating important covariate effects while selecting and estimating optimal instruments. The methodology extends the classical two-stage least squares estimator to high dimensions by exploiting sparsity using sparsity-inducing penalty functions in both stages. The resulting procedure is efficiently implemented by coordinate descent optimization. For the representative L1 regularization and a class of concave regularization methods, we establish estimation, prediction, and model selection properties of the two-stage regularized estimators in the high-dimensional setting where the dimensionalities of covariates and instruments are both allowed to grow exponentially with the sample size. The practical performance of the proposed method is evaluated by simulation studies and its usefulness is illustrated by an analysis of mouse obesity data. Supplementary materials for this article are available online. PMID:26392642
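
    The two-stage structure described above, regularized first-stage regressions of each endogenous covariate on the instruments followed by a penalized second stage on the fitted values, can be sketched generically. The Lasso penalties and simulated data here are illustrative rather than the paper's tuned procedure, and plain L1 stands in for the broader class of penalties studied.

    # Generic sketch of two-stage regularised instrumental-variables estimation:
    # stage 1 regresses each endogenous covariate on the instruments with a Lasso,
    # stage 2 runs a Lasso of the outcome on the fitted covariates.
    import numpy as np
    from sklearn.linear_model import Lasso

    rng = np.random.default_rng(0)
    n, p_x, p_z = 300, 50, 200
    Z = rng.normal(size=(n, p_z))                       # many candidate instruments
    Pi = np.zeros((p_z, p_x)); Pi[:5, :3] = rng.normal(size=(5, 3))
    confound = rng.normal(size=n)
    X = Z @ Pi + 0.5 * np.outer(confound, np.ones(p_x)) + rng.normal(size=(n, p_x))
    y = X[:, :3] @ np.array([1.0, -1.0, 0.5]) + confound + rng.normal(size=n)

    # Stage 1: sparse first-stage fits, one per covariate.
    X_hat = np.column_stack([Lasso(alpha=0.1).fit(Z, X[:, j]).predict(Z)
                             for j in range(p_x)])

    # Stage 2: sparse regression of the outcome on the fitted covariates.
    stage2 = Lasso(alpha=0.05).fit(X_hat, y)
    print("covariates selected in stage 2:", np.flatnonzero(stage2.coef_))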

  8. Tree-Structured Infinite Sparse Factor Model

    PubMed Central

    Zhang, XianXing; Dunson, David B.; Carin, Lawrence

    2013-01-01

    A tree-structured multiplicative gamma process (TMGP) is developed, for inferring the depth of a tree-based factor-analysis model. This new model is coupled with the nested Chinese restaurant process, to nonparametrically infer the depth and width (structure) of the tree. In addition to developing the model, theoretical properties of the TMGP are addressed, and a novel MCMC sampler is developed. The structure of the inferred tree is used to learn relationships between high-dimensional data, and the model is also applied to compressive sensing and interpolation of incomplete images. PMID:25279389

  9. Compressive sensing for single-shot two-dimensional coherent spectroscopy

    NASA Astrophysics Data System (ADS)

    Harel, E.; Spencer, A.; Spokoyny, B.

    2017-02-01

    In this work, we explore the use of compressive sensing for the rapid acquisition of two-dimensional optical spectra that encodes the electronic structure and ultrafast dynamics of condensed-phase molecular species. Specifically, we have developed a means to combine multiplexed single-element detection and single-shot and phase-resolved two-dimensional coherent spectroscopy. The method described, which we call Single Point Array Reconstruction by Spatial Encoding (SPARSE) eliminates the need for costly array detectors while speeding up acquisition by several orders of magnitude compared to scanning methods. Physical implementation of SPARSE is facilitated by combining spatiotemporal encoding of the nonlinear optical response and signal modulation by a high-speed digital micromirror device. We demonstrate the approach by investigating a well-characterized cyanine molecule and a photosynthetic pigment-protein complex. Hadamard and compressive sensing algorithms are demonstrated, with the latter achieving compression factors as high as ten. Both show good agreement with directly detected spectra. We envision a myriad of applications in nonlinear spectroscopy using SPARSE with broadband femtosecond light sources in so-far unexplored regions of the electromagnetic spectrum.

  10. High-Dimensional Sparse Factor Modeling: Applications in Gene Expression Genomics

    PubMed Central

    Carvalho, Carlos M.; Chang, Jeffrey; Lucas, Joseph E.; Nevins, Joseph R.; Wang, Quanli; West, Mike

    2010-01-01

    We describe studies in molecular profiling and biological pathway analysis that use sparse latent factor and regression models for microarray gene expression data. We discuss breast cancer applications and key aspects of the modeling and computational methodology. Our case studies aim to investigate and characterize heterogeneity of structure related to specific oncogenic pathways, as well as links between aggregate patterns in gene expression profiles and clinical biomarkers. Based on the metaphor of statistically derived “factors” as representing biological “subpathway” structure, we explore the decomposition of fitted sparse factor models into pathway subcomponents and investigate how these components overlay multiple aspects of known biological activity. Our methodology is based on sparsity modeling of multivariate regression, ANOVA, and latent factor models, as well as a class of models that combines all components. Hierarchical sparsity priors address questions of dimension reduction and multiple comparisons, as well as scalability of the methodology. The models include practically relevant non-Gaussian/nonparametric components for latent structure, underlying often quite complex non-Gaussianity in multivariate expression patterns. Model search and fitting are addressed through stochastic simulation and evolutionary stochastic search methods that are exemplified in the oncogenic pathway studies. Supplementary supporting material provides more details of the applications, as well as examples of the use of freely available software tools for implementing the methodology. PMID:21218139

  11. GPU-powered Shotgun Stochastic Search for Dirichlet process mixtures of Gaussian Graphical Models

    PubMed Central

    Mukherjee, Chiranjit; Rodriguez, Abel

    2016-01-01

    Gaussian graphical models are popular for modeling high-dimensional multivariate data with sparse conditional dependencies. A mixture of Gaussian graphical models extends this model to the more realistic scenario where observations come from a heterogeneous population composed of a small number of homogeneous sub-groups. In this paper we present a novel stochastic search algorithm for finding the posterior mode of high-dimensional Dirichlet process mixtures of decomposable Gaussian graphical models. Further, we investigate how to harness the massive thread-parallelization capabilities of graphical processing units to accelerate computation. The computational advantages of our algorithms are demonstrated with various simulated data examples in which we compare our stochastic search with a Markov chain Monte Carlo algorithm in moderate dimensional data examples. These experiments show that our stochastic search largely outperforms the Markov chain Monte Carlo algorithm in terms of computing-times and in terms of the quality of the posterior mode discovered. Finally, we analyze a gene expression dataset in which Markov chain Monte Carlo algorithms are too slow to be practically useful. PMID:28626348

  12. GPU-powered Shotgun Stochastic Search for Dirichlet process mixtures of Gaussian Graphical Models.

    PubMed

    Mukherjee, Chiranjit; Rodriguez, Abel

    2016-01-01

    Gaussian graphical models are popular for modeling high-dimensional multivariate data with sparse conditional dependencies. A mixture of Gaussian graphical models extends this model to the more realistic scenario where observations come from a heterogeneous population composed of a small number of homogeneous sub-groups. In this paper we present a novel stochastic search algorithm for finding the posterior mode of high-dimensional Dirichlet process mixtures of decomposable Gaussian graphical models. Further, we investigate how to harness the massive thread-parallelization capabilities of graphical processing units to accelerate computation. The computational advantages of our algorithms are demonstrated with various simulated data examples in which we compare our stochastic search with a Markov chain Monte Carlo algorithm in moderate dimensional data examples. These experiments show that our stochastic search largely outperforms the Markov chain Monte Carlo algorithm in terms of computing-times and in terms of the quality of the posterior mode discovered. Finally, we analyze a gene expression dataset in which Markov chain Monte Carlo algorithms are too slow to be practically useful.

  13. Accelerated Path-following Iterative Shrinkage Thresholding Algorithm with Application to Semiparametric Graph Estimation

    PubMed Central

    Zhao, Tuo; Liu, Han

    2016-01-01

    We propose an accelerated path-following iterative shrinkage thresholding algorithm (APISTA) for solving high dimensional sparse nonconvex learning problems. The main difference between APISTA and the path-following iterative shrinkage thresholding algorithm (PISTA) is that APISTA exploits an additional coordinate descent subroutine to boost the computational performance. Such a modification, though simple, has profound impact: APISTA not only enjoys the same theoretical guarantee as that of PISTA, i.e., APISTA attains a linear rate of convergence to a unique sparse local optimum with good statistical properties, but also significantly outperforms PISTA in empirical benchmarks. As an application, we apply APISTA to solve a family of nonconvex optimization problems motivated by estimating sparse semiparametric graphical models. APISTA allows us to obtain new statistical recovery results which do not exist in the existing literature. Thorough numerical results are provided to back up our theory. PMID:28133430

  14. High-Dimensional Function Approximation With Neural Networks for Large Volumes of Data.

    PubMed

    Andras, Peter

    2018-02-01

    Approximation of high-dimensional functions is a challenge for neural networks due to the curse of dimensionality. Often the data for which the approximated function is defined resides on a low-dimensional manifold, and in principle the approximation of the function over this manifold should improve the approximation performance. It has been shown that projecting the data manifold into a lower dimensional space, followed by the neural network approximation of the function over this space, provides a more precise approximation of the function than the approximation of the function with neural networks in the original data space. However, if the data volume is very large, the projection into the low-dimensional space has to be based on a limited sample of the data. Here, we investigate the nature of the approximation error of neural networks trained over the projection space. We show that such neural networks should have better approximation performance than neural networks trained on high-dimensional data even if the projection is based on a relatively sparse sample of the data manifold. We also find that it is preferable to use a uniformly distributed sparse sample of the data for the purpose of generating the low-dimensional projection. We illustrate these results by considering the practical neural network approximation of a set of functions defined on high-dimensional data, including real-world data.
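
    The two-step scheme discussed above, learning a low-dimensional projection from a sparse sample of the data and then training the network on the projected coordinates, can be sketched with standard tools. The PCA projection, network size, and target function below are illustrative choices, not the experimental setup of the paper.

    # Sketch of the two-step scheme: learn a low-dimensional projection from a sparse,
    # uniformly drawn sample of the data, then train the network on projected coordinates.
    import numpy as np
    from sklearn.decomposition import PCA
    from sklearn.neural_network import MLPRegressor

    rng = np.random.default_rng(0)
    n, ambient_dim, manifold_dim = 20000, 100, 3
    latent = rng.uniform(-1, 1, size=(n, manifold_dim))
    embed = rng.normal(size=(manifold_dim, ambient_dim))
    X = latent @ embed                                  # data lying on a 3-D linear manifold
    y = np.sin(latent[:, 0]) + latent[:, 1] * latent[:, 2]

    # Learn the projection from a sparse subsample only, as discussed above.
    subsample = rng.choice(n, size=500, replace=False)
    proj = PCA(n_components=manifold_dim).fit(X[subsample])

    Xp = proj.transform(X)
    net = MLPRegressor(hidden_layer_sizes=(64, 64), max_iter=300, random_state=0)
    net.fit(Xp[:15000], y[:15000])
    print("test R^2 on projected data:", net.score(Xp[15000:], y[15000:]))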

  15. Parallel solution of sparse one-dimensional dynamic programming problems

    NASA Technical Reports Server (NTRS)

    Nicol, David M.

    1989-01-01

    Parallel computation offers the potential for quickly solving large computational problems. However, it is often a non-trivial task to effectively use parallel computers. Solution methods must sometimes be reformulated to exploit parallelism; the reformulations are often more complex than their slower serial counterparts. We illustrate these points by studying the parallelization of sparse one-dimensional dynamic programming problems, those which do not obviously admit substantial parallelization. We propose a new method for parallelizing such problems, develop analytic models which help us to identify problems which parallelize well, and compare the performance of our algorithm with existing algorithms on a multiprocessor.

  16. A General Sparse Tensor Framework for Electronic Structure Theory

    DOE PAGES

    Manzer, Samuel; Epifanovsky, Evgeny; Krylov, Anna I.; ...

    2017-01-24

    Linear-scaling algorithms must be developed in order to extend the domain of applicability of electronic structure theory to molecules of any desired size. However, the increasing complexity of modern linear-scaling methods makes code development and maintenance a significant challenge. A major contributor to this difficulty is the lack of robust software abstractions for handling block-sparse tensor operations. We therefore report the development of a highly efficient symbolic block-sparse tensor library in order to provide access to high-level software constructs to treat such problems. Our implementation supports arbitrary multi-dimensional sparsity in all input and output tensors. We avoid cumbersome machine-generated code by implementing all functionality as a high-level symbolic C++ language library and demonstrate that our implementation attains very high performance for linear-scaling sparse tensor contractions.

  17. Prediction of siRNA potency using sparse logistic regression.

    PubMed

    Hu, Wei; Hu, John

    2014-06-01

    RNA interference (RNAi) can modulate gene expression at post-transcriptional as well as transcriptional levels. Short interfering RNA (siRNA) serves as a trigger for the RNAi gene inhibition mechanism, and therefore is a crucial intermediate step in RNAi. There have been extensive studies to identify the sequence characteristics of potent siRNAs. One such study built a linear model using LASSO (Least Absolute Shrinkage and Selection Operator) to measure the contribution of each siRNA sequence feature. This model is simple and interpretable, but it requires a large number of nonzero weights. We have introduced a novel technique, sparse logistic regression, to build a linear model using single-position specific nucleotide compositions which has the same prediction accuracy as the linear model based on LASSO. The weights in our new model share the same general trend as those in the previous model, but have only 25 nonzero weights out of a total of 84 weights, a 54% reduction compared to the previous model. Contrary to the linear model based on LASSO, our model suggests that only a few positions are influential on the efficacy of the siRNA, which are the 5' and 3' ends and the seed region of siRNA sequences. We also employed sparse logistic regression to build a linear model using dual-position specific nucleotide compositions, a task LASSO is not able to accomplish well due to its high dimensional nature. Our results demonstrate the superiority of sparse logistic regression as a technique for both feature selection and regression over LASSO in the context of siRNA design.
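
    The sparse logistic regression itself, an L1-penalized fit that zeroes most position-specific composition weights, can be reproduced generically. The random sequences, one-hot encoding, synthetic potency labels, and penalty strength below are illustrative and are not tied to the siRNA data used in the paper.

    # Generic sparse (L1-penalised) logistic regression on one-hot position-specific
    # nucleotide features, the kind of model described above.  Data are synthetic.
    import numpy as np
    from sklearn.linear_model import LogisticRegression

    rng = np.random.default_rng(0)
    n_seqs, length = 500, 21
    seqs = rng.integers(0, 4, size=(n_seqs, length))        # 0..3 encode A, C, G, U

    # One-hot encode single-position nucleotide composition: 21 positions x 4 bases.
    X = np.zeros((n_seqs, length * 4))
    rows = np.repeat(np.arange(n_seqs), length)
    cols = np.tile(np.arange(length), n_seqs) * 4 + seqs.ravel()
    X[rows, cols] = 1

    # Toy labels: potency driven by a "U" at position 0 and an "A" at position 18.
    potency = ((seqs[:, 0] == 3).astype(float) + (seqs[:, 18] == 0).astype(float)
               + 0.3 * rng.normal(size=n_seqs) > 0.9).astype(int)

    model = LogisticRegression(penalty="l1", solver="liblinear", C=0.5).fit(X, potency)
    weights = model.coef_.ravel()
    print("nonzero weights:", np.count_nonzero(weights), "of", weights.size)
    print("positions with nonzero weights:", sorted(set(np.flatnonzero(weights) // 4)))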

  18. A sparse grid based method for generative dimensionality reduction of high-dimensional data

    NASA Astrophysics Data System (ADS)

    Bohn, Bastian; Garcke, Jochen; Griebel, Michael

    2016-03-01

    Generative dimensionality reduction methods play an important role in machine learning applications because they construct an explicit mapping from a low-dimensional space to the high-dimensional data space. We discuss a general framework to describe generative dimensionality reduction methods, where the main focus lies on a regularized principal manifold learning variant. Since most generative dimensionality reduction algorithms exploit the representer theorem for reproducing kernel Hilbert spaces, their computational costs grow at least quadratically in the number n of data. Instead, we introduce a grid-based discretization approach which automatically scales just linearly in n. To circumvent the curse of dimensionality of full tensor product grids, we use the concept of sparse grids. Furthermore, in real-world applications, some embedding directions are usually more important than others and it is reasonable to refine the underlying discretization space only in these directions. To this end, we employ a dimension-adaptive algorithm which is based on the ANOVA (analysis of variance) decomposition of a function. In particular, the reconstruction error is used to measure the quality of an embedding. As an application, the study of large simulation data from an engineering application in the automotive industry (car crash simulation) is performed.

  19. Sparse graph regularization for robust crop mapping using hyperspectral remotely sensed imagery with very few in situ data

    NASA Astrophysics Data System (ADS)

    Xue, Zhaohui; Du, Peijun; Li, Jun; Su, Hongjun

    2017-02-01

    The generally limited availability of training data relative to the usually high data dimension poses a great challenge to accurate classification of hyperspectral imagery, especially for identifying crops characterized by highly correlated spectra. Traditional parametric classification models are also problematic due to the need for non-singular class-specific covariance matrices. In this research, a novel sparse graph regularization (SGR) method is presented, aiming at robust crop mapping using hyperspectral imagery with very few in situ data. The core of SGR lies in propagating labels from known data to unknown data, which is driven by: (1) the fraction matrix generated for the large amount of unknown data by using an effective sparse representation algorithm with respect to the few training data serving as the dictionary; (2) the prediction function estimated for the few training data by formulating a regularization model based on a sparse graph. The labels of the unknown data can then be obtained by maximizing the posterior probability distribution based on these two ingredients. SGR is discriminative, data-adaptive, robust to noise, and efficient, which sets it apart from previously proposed approaches, and it has high potential for discriminating crops, especially when facing insufficient training data and a high-dimensional spectral space. The study area is located at the Zhangye basin in the middle reaches of the Heihe watershed, Gansu, China, where eight crop types were mapped with Compact Airborne Spectrographic Imager (CASI) and Shortwave Infrared Airborne Spectrographic Imager (SASI) hyperspectral data. Experimental results demonstrate that the proposed method significantly outperforms other traditional and state-of-the-art methods.

  20. Sparse learning of stochastic dynamical equations

    NASA Astrophysics Data System (ADS)

    Boninsegna, Lorenzo; Nüske, Feliks; Clementi, Cecilia

    2018-06-01

    With the rapid increase of available data for complex systems, there is great interest in the extraction of physically relevant information from massive datasets. Recently, a framework called Sparse Identification of Nonlinear Dynamics (SINDy) has been introduced to identify the governing equations of dynamical systems from simulation data. In this study, we extend SINDy to stochastic dynamical systems which are frequently used to model biophysical processes. We prove the asymptotic correctness of stochastic SINDy in the infinite data limit, both in the original and projected variables. We discuss algorithms to solve the sparse regression problem arising from the practical implementation of SINDy and show that cross validation is an essential tool to determine the right level of sparsity. We demonstrate the proposed methodology on two test systems, namely, the diffusion in a one-dimensional potential and the projected dynamics of a two-dimensional diffusion process.
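
    The core regression in SINDy, sequentially thresholded least squares over a library of candidate terms, is short enough to sketch. The one-dimensional Ornstein-Uhlenbeck-style example, the candidate library, and the threshold are illustrative; the diffusion estimation and the cross-validation step advocated in the paper are not reproduced here.

    # Sketch of the core SINDy regression (sequentially thresholded least squares)
    # identifying a drift from trajectory data of a 1-D Ornstein-Uhlenbeck process.
    import numpy as np

    rng = np.random.default_rng(0)
    dt, n_steps, theta, sigma = 0.01, 100000, 1.0, 1.0
    noise = rng.normal(size=n_steps - 1)
    x = np.empty(n_steps); x[0] = 1.0
    for t in range(n_steps - 1):                          # Euler-Maruyama simulation
        x[t + 1] = x[t] - theta * x[t] * dt + sigma * np.sqrt(dt) * noise[t]

    target = np.diff(x) / dt                              # finite-difference drift estimate
    library = np.column_stack([np.ones(n_steps - 1), x[:-1], x[:-1] ** 2, x[:-1] ** 3])
    names = ["1", "x", "x^2", "x^3"]

    def stlsq(A, b, threshold=0.3, n_iter=10):
        coef, *_ = np.linalg.lstsq(A, b, rcond=None)
        for _ in range(n_iter):
            small = np.abs(coef) < threshold
            coef[small] = 0.0
            big = ~small
            if big.any():
                coef[big], *_ = np.linalg.lstsq(A[:, big], b, rcond=None)
        return coef

    coef = stlsq(library, target)
    # Expect only the x term to survive, with coefficient near -theta.
    print({n: round(c, 3) for n, c in zip(names, coef)})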

  1. High-dimensional statistical inference: From vector to matrix

    NASA Astrophysics Data System (ADS)

    Zhang, Anru

    Statistical inference for sparse signals or low-rank matrices in high-dimensional settings is of significant interest in a range of contemporary applications and has attracted considerable recent attention in fields including statistics, applied mathematics, and electrical engineering. In this thesis, we consider several problems, including sparse signal recovery (compressed sensing under restricted isometry) and low-rank matrix recovery (matrix recovery via rank-one projections and structured matrix completion). The first part of the thesis discusses compressed sensing and affine rank minimization in both the noiseless and noisy cases and establishes sharp restricted isometry conditions for sparse signal and low-rank matrix recovery. The analysis relies on a key technical tool which represents points in a polytope by convex combinations of sparse vectors. The technique is elementary yet leads to sharp results. It is shown that, in compressed sensing, δ_k^A < 1/3, δ_k^A + θ_{k,k}^A < 1, or δ_{tk}^A < √((t-1)/t) for any given constant t ≥ 4/3 guarantees the exact recovery of all k-sparse signals in the noiseless case through constrained ℓ1 minimization; similarly, in affine rank minimization, δ_r^M < 1/3, δ_r^M + θ_{r,r}^M < 1, or δ_{tr}^M < √((t-1)/t) ensures the exact reconstruction of all matrices of rank at most r in the noiseless case via constrained nuclear norm minimization. Moreover, for any ε > 0, the conditions δ_k^A < 1/3 + ε, δ_k^A + θ_{k,k}^A < 1 + ε, or δ_{tk}^A < √((t-1)/t) + ε are not sufficient to guarantee the exact recovery of all k-sparse signals for large k. A similar result also holds for matrix recovery. In addition, the conditions δ_k^A < 1/3, δ_k^A + θ_{k,k}^A < 1, δ_{tk}^A < √((t-1)/t) and δ_r^M < 1/3, δ_r^M + θ_{r,r}^M < 1, δ_{tr}^M < √((t-1)/t) are also shown to be sufficient for the stable recovery of approximately sparse signals and low-rank matrices, respectively, in the noisy case. In the second part of the thesis, we introduce a rank-one projection model for low-rank matrix recovery and propose a constrained nuclear norm minimization method for stable recovery of low-rank matrices in the noisy case. The procedure is adaptive to the rank and robust against small perturbations. Both upper and lower bounds for the estimation accuracy under the Frobenius norm loss are obtained, and the proposed estimator is shown to be rate-optimal under certain conditions. The estimator is easy to implement via convex programming and performs well numerically. The techniques and main results developed in this chapter also have implications for other related statistical problems. An application to the estimation of spiked covariance matrices from one-dimensional random projections is considered; the results demonstrate that it is still possible to accurately estimate the covariance matrix of a high-dimensional distribution based only on one-dimensional projections. In the third part of the thesis, we consider another setting of low-rank matrix completion. The current literature on matrix completion focuses primarily on independent sampling models under which the individual observed entries are sampled independently. Motivated by applications in genomic data integration, we propose a new framework of structured matrix completion (SMC) to treat structured missingness by design. Specifically, the proposed method aims at efficient matrix recovery when a subset of the rows and columns of an approximately low-rank matrix is observed. We provide theoretical justification for the proposed SMC method and derive lower bounds for the estimation errors, which together establish the optimal rate of recovery over certain classes of approximately low-rank matrices. Simulation studies show that the method performs well in finite samples under a variety of configurations. The method is applied to integrate several ovarian cancer genomic studies with different extents of genomic measurements, which enables us to construct more accurate prediction rules for ovarian cancer survival.

  2. Precipitation Data Merging over Mountainous Areas Using Satellite Estimates and Sparse Gauge Observations (PDMMA-USESGO) for Hydrological Modeling — A Case Study over the Tibetan Plateau

    NASA Astrophysics Data System (ADS)

    Yang, Z.; Hsu, K. L.; Sorooshian, S.; Xu, X.

    2017-12-01

    Precipitation in mountain regions generally occurs with high frequency and intensity, yet it is not well captured by sparsely distributed rain gauges, which poses a great challenge for water management. Satellite-based Precipitation Estimation (SPE) provides a global, high-resolution alternative for hydro-climatic studies, but is subject to considerable biases. In this study, a model named PDMMA-USESGO, for Precipitation Data Merging over Mountainous Areas Using Satellite Estimates and Sparse Gauge Observations, is developed to support precipitation mapping and hydrological modeling in mountainous catchments. The PDMMA-USESGO framework includes two steps, adjusting SPE biases and merging satellite and gauge estimates, using the quantile mapping approach, a two-dimensional Gaussian weighting scheme (accounting for elevation effects), and an inverse root mean square error weighting method. The model is applied and evaluated over the Tibetan Plateau (TP) with the PERSIANN-CCS precipitation retrievals (daily, 0.04°×0.04°) and sparse observations from 89 gauges, for the 11-year period 2003-2013. To assess the effect of the data merging on streamflow modeling, a hydrological evaluation is conducted over a watershed in the southeast TP based on the Soil and Water Assessment Tool (SWAT). Evaluation results indicate the effectiveness of the model in generating high-resolution, high-accuracy precipitation estimates over mountainous terrain, with the merged estimates (Mer-SG) showing consistently improved correlation coefficients, root mean square errors, and absolute mean biases relative to the original satellite estimates (Ori-CCS). Streamflow simulations forced with Mer-SG exhibit substantial improvements over those using Ori-CCS, with the coefficient of determination (R2) and Nash-Sutcliffe efficiency reaching 0.8 and 0.65, respectively. The presented model and case study serve as valuable references for hydro-climatic applications combining remote sensing and gauge information in other mountainous areas of the world.
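
    A sketch of the quantile-mapping bias-adjustment step only, with hypothetical variable names; the elevation-aware Gaussian weighting and the inverse-RMSE satellite-gauge merging of PDMMA-USESGO are not shown.

```python
import numpy as np

def quantile_map(sat, gauge, sat_new):
    """Adjust new satellite values (sat_new) so their distribution matches the gauge
    distribution, using collocated historical satellite (sat) and gauge (gauge) data."""
    sat_sorted = np.sort(sat)
    gauge_sorted = np.sort(gauge)
    # Empirical non-exceedance probabilities of the new satellite values
    probs = np.searchsorted(sat_sorted, sat_new, side="right") / len(sat_sorted)
    probs = np.clip(probs, 0.0, 1.0)
    # Map those probabilities onto the empirical gauge distribution
    quantiles = np.linspace(0.0, 1.0, len(gauge_sorted))
    return np.interp(probs, quantiles, gauge_sorted)
```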

  3. Adaptive surrogate modeling by ANOVA and sparse polynomial dimensional decomposition for global sensitivity analysis in fluid simulation

    NASA Astrophysics Data System (ADS)

    Tang, Kunkun; Congedo, Pietro M.; Abgrall, Rémi

    2016-06-01

    The Polynomial Dimensional Decomposition (PDD) is employed in this work for the global sensitivity analysis and uncertainty quantification (UQ) of stochastic systems subject to a moderate to large number of input random variables. Due to the intimate connection between the PDD and the Analysis of Variance (ANOVA) approaches, PDD provides a simpler and more direct evaluation of the Sobol' sensitivity indices than the Polynomial Chaos expansion (PC). Unfortunately, the number of PDD terms grows exponentially with the size of the input random vector, which makes the computational cost of standard methods unaffordable for real engineering applications. To address the curse of dimensionality, this work proposes variance-based adaptive strategies for building a cheap meta-model (i.e., surrogate model) using the sparse PDD approach with its coefficients computed by regression. Three levels of adaptivity are carried out in this paper: 1) the truncated dimensionality of the ANOVA component functions, 2) the active dimension technique, especially for second- and higher-order parameter interactions, and 3) a stepwise regression approach designed to retain only the most influential polynomials in the PDD expansion. During this adaptive procedure featuring stepwise regressions, the surrogate model representation contains only a few terms at any time, so the cost of repeatedly solving the least-squares regression problems is negligible. The size of the final sparse PDD representation is much smaller than that of the full expansion, since only significant terms are eventually retained. Consequently, a much smaller number of calls to the deterministic model is required to compute the final PDD coefficients.
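
    A toy illustration of the third adaptivity level (stepwise retention of influential basis terms) under simplified assumptions: a generic design matrix `Phi` whose columns are candidate polynomial basis evaluations, greedy forward selection by residual reduction, and no PDD-specific bookkeeping; the paper's actual algorithm and stopping criteria differ.

```python
import numpy as np

def forward_stepwise(Phi, y, max_terms=10, tol=1e-8):
    """Greedily add the basis column most correlated with the current residual,
    refit by least squares, and stop when the residual no longer decreases."""
    selected, residual = [], y.astype(float).copy()
    for _ in range(max_terms):
        scores = np.abs(Phi.T @ residual)
        if selected:
            scores[selected] = -np.inf          # do not re-select a term
        trial = selected + [int(np.argmax(scores))]
        coef, *_ = np.linalg.lstsq(Phi[:, trial], y, rcond=None)
        new_residual = y - Phi[:, trial] @ coef
        if np.linalg.norm(residual) - np.linalg.norm(new_residual) < tol:
            break
        selected, residual = trial, new_residual
    if selected:
        coef, *_ = np.linalg.lstsq(Phi[:, selected], y, rcond=None)
    else:
        coef = np.zeros(0)
    return selected, coef
```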

  4. Polynomial meta-models with canonical low-rank approximations: Numerical insights and comparison to sparse polynomial chaos expansions

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Konakli, Katerina, E-mail: konakli@ibk.baug.ethz.ch; Sudret, Bruno

    2016-09-15

    The growing need for uncertainty analysis of complex computational models has led to an expanding use of meta-models across engineering and sciences. The efficiency of meta-modeling techniques relies on their ability to provide statistically-equivalent analytical representations based on relatively few evaluations of the original model. Polynomial chaos expansions (PCE) have proven a powerful tool for developing meta-models in a wide range of applications; the key idea thereof is to expand the model response onto a basis made of multivariate polynomials obtained as tensor products of appropriate univariate polynomials. The classical PCE approach nevertheless faces the “curse of dimensionality”, namely the exponential increase of the basis size with increasing input dimension. To address this limitation, the sparse PCE technique has been proposed, in which the expansion is carried out over only a few relevant basis terms that are automatically selected by a suitable algorithm. An alternative for developing meta-models with polynomial functions in high-dimensional problems is offered by the newly emerged low-rank approximations (LRA) approach. By exploiting the tensor-product structure of the multivariate basis, LRA can provide polynomial representations in highly compressed formats. Through extensive numerical investigations, we first shed light on issues relating to the construction of canonical LRA with a particular greedy algorithm involving a sequential updating of the polynomial coefficients along separate dimensions. Specifically, we examine the selection of the optimal rank, the stopping criteria in the updating of the polynomial coefficients, and error estimation. We then compare canonical LRA with sparse PCE in structural-mechanics and heat-conduction applications based on finite-element solutions. Canonical LRA exhibit smaller errors than sparse PCE when the number of available model evaluations is small with respect to the input dimension, a situation that is often encountered in real-life problems. By introducing the conditional generalization error, we further demonstrate that canonical LRA tend to outperform sparse PCE in the prediction of extreme model responses, which is critical in reliability analysis.

  5. JDINAC: joint density-based non-parametric differential interaction network analysis and classification using high-dimensional sparse omics data.

    PubMed

    Ji, Jiadong; He, Di; Feng, Yang; He, Yong; Xue, Fuzhong; Xie, Lei

    2017-10-01

    A complex disease is usually driven by a number of genes interwoven into networks, rather than by a single gene product. Network comparison, or differential network analysis, has become an important means of revealing the underlying mechanisms of pathogenesis and identifying clinical biomarkers for disease classification. Most studies, however, are limited to network correlations that mainly capture the linear relationships among genes, or rely on the assumption of a parametric probability distribution of the gene measurements, which restricts their use in real applications. We propose a new Joint density based non-parametric Differential Interaction Network Analysis and Classification (JDINAC) method to identify differential interaction patterns of network activation between two groups. At the same time, JDINAC uses the network biomarkers to build a classification model. The novelty of JDINAC lies in its potential to capture non-linear relations between molecular interactions using high-dimensional sparse data, as well as to adjust for confounding factors, without assuming a parametric probability distribution of the gene measurements. Simulation studies demonstrate that JDINAC provides more accurate differential network estimation and lower classification error than other state-of-the-art methods. We apply JDINAC to a Breast Invasive Carcinoma dataset, which includes 114 patients with both tumor and matched normal samples. The hub genes and differential interaction patterns identified are consistent with existing experimental studies. Furthermore, JDINAC discriminated tumor from normal samples with high accuracy by virtue of the identified biomarkers. JDINAC provides a general framework for feature selection and classification using high-dimensional sparse omics data. R scripts are available at https://github.com/jijiadong/JDINAC. Supplementary data are available at Bioinformatics online. © The Author (2017). Published by Oxford University Press. All rights reserved. For Permissions, please email: journals.permissions@oup.com

  6. Minimax Rate-optimal Estimation of High-dimensional Covariance Matrices with Incomplete Data*

    PubMed Central

    Cai, T. Tony; Zhang, Anru

    2016-01-01

    Missing data occur frequently in a wide range of applications. In this paper, we consider estimation of high-dimensional covariance matrices in the presence of missing observations under a general missing completely at random model in the sense that the missingness is not dependent on the values of the data. Based on incomplete data, estimators for bandable and sparse covariance matrices are proposed and their theoretical and numerical properties are investigated. Minimax rates of convergence are established under the spectral norm loss and the proposed estimators are shown to be rate-optimal under mild regularity conditions. Simulation studies demonstrate that the estimators perform well numerically. The methods are also illustrated through an application to data from four ovarian cancer studies. The key technical tools developed in this paper are of independent interest and potentially useful for a range of related problems in high-dimensional statistical inference with missing data. PMID:27777471

  7. Minimax Rate-optimal Estimation of High-dimensional Covariance Matrices with Incomplete Data.

    PubMed

    Cai, T Tony; Zhang, Anru

    2016-09-01

    Missing data occur frequently in a wide range of applications. In this paper, we consider estimation of high-dimensional covariance matrices in the presence of missing observations under a general missing completely at random model in the sense that the missingness is not dependent on the values of the data. Based on incomplete data, estimators for bandable and sparse covariance matrices are proposed and their theoretical and numerical properties are investigated. Minimax rates of convergence are established under the spectral norm loss and the proposed estimators are shown to be rate-optimal under mild regularity conditions. Simulation studies demonstrate that the estimators perform well numerically. The methods are also illustrated through an application to data from four ovarian cancer studies. The key technical tools developed in this paper are of independent interest and potentially useful for a range of related problems in high-dimensional statistical inference with missing data.
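
    A minimal sketch of the two basic ingredients described above, under the missing-completely-at-random assumption and with hypothetical function names: a generalized sample covariance built from pairwise-complete observations, followed by entrywise hard thresholding for the sparse case. The papers' bandable estimator, rate analysis, and data-driven tuning are not reproduced.

```python
import numpy as np

def pairwise_covariance(X):
    """Generalized sample covariance for X of shape (n, p) with np.nan marking
    missing entries: entry (j, k) uses only rows where both variables are observed."""
    obs = ~np.isnan(X)
    means = np.where(obs, X, 0.0).sum(axis=0) / obs.sum(axis=0)
    Xc = np.where(obs, X - means, 0.0)
    counts = obs.T.astype(float) @ obs.astype(float)   # pairwise sample sizes
    return (Xc.T @ Xc) / np.maximum(counts, 1.0)

def hard_threshold(sigma, lam):
    """Sparse covariance estimate: keep the diagonal, zero small off-diagonal entries."""
    out = np.where(np.abs(sigma) >= lam, sigma, 0.0)
    np.fill_diagonal(out, np.diag(sigma))
    return out
```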

  8. Dimension-Factorized Range Migration Algorithm for Regularly Distributed Array Imaging

    PubMed Central

    Guo, Qijia; Wang, Jie; Chang, Tianying

    2017-01-01

    The two-dimensional planar MIMO array is a popular approach for millimeter wave imaging applications. As a promising practical alternative, sparse MIMO arrays have been devised to reduce the number of antenna elements and transmitting/receiving channels with predictable and acceptable loss in image quality. In this paper, a high precision three-dimensional imaging algorithm is proposed for MIMO arrays of the regularly distributed type, especially the sparse varieties. Termed the Dimension-Factorized Range Migration Algorithm, the new imaging approach factorizes the conventional MIMO Range Migration Algorithm into multiple operations across the sparse dimensions. The thinner the sparse dimensions of the array, the more efficient the new algorithm will be. Advantages of the proposed approach are demonstrated by comparison with the conventional MIMO Range Migration Algorithm and its non-uniform fast Fourier transform based variant in terms of all the important characteristics of the approaches, especially the anti-noise capability. The computation cost is analyzed as well to evaluate the efficiency quantitatively. PMID:29113083

  9. TESTING HIGH-DIMENSIONAL COVARIANCE MATRICES, WITH APPLICATION TO DETECTING SCHIZOPHRENIA RISK GENES

    PubMed Central

    Zhu, Lingxue; Lei, Jing; Devlin, Bernie; Roeder, Kathryn

    2017-01-01

    Scientists routinely compare gene expression levels in cases versus controls in part to determine genes associated with a disease. Similarly, detecting case-control differences in co-expression among genes can be critical to understanding complex human diseases; however statistical methods have been limited by the high dimensional nature of this problem. In this paper, we construct a sparse-Leading-Eigenvalue-Driven (sLED) test for comparing two high-dimensional covariance matrices. By focusing on the spectrum of the differential matrix, sLED provides a novel perspective that accommodates what we assume to be common, namely sparse and weak signals in gene expression data, and it is closely related with Sparse Principal Component Analysis. We prove that sLED achieves full power asymptotically under mild assumptions, and simulation studies verify that it outperforms other existing procedures under many biologically plausible scenarios. Applying sLED to the largest gene-expression dataset obtained from post-mortem brain tissue from Schizophrenia patients and controls, we provide a novel list of genes implicated in Schizophrenia and reveal intriguing patterns in gene co-expression change for Schizophrenia subjects. We also illustrate that sLED can be generalized to compare other gene-gene “relationship” matrices that are of practical interest, such as the weighted adjacency matrices. PMID:29081874

  10. TESTING HIGH-DIMENSIONAL COVARIANCE MATRICES, WITH APPLICATION TO DETECTING SCHIZOPHRENIA RISK GENES.

    PubMed

    Zhu, Lingxue; Lei, Jing; Devlin, Bernie; Roeder, Kathryn

    2017-09-01

    Scientists routinely compare gene expression levels in cases versus controls in part to determine genes associated with a disease. Similarly, detecting case-control differences in co-expression among genes can be critical to understanding complex human diseases; however statistical methods have been limited by the high dimensional nature of this problem. In this paper, we construct a sparse-Leading-Eigenvalue-Driven (sLED) test for comparing two high-dimensional covariance matrices. By focusing on the spectrum of the differential matrix, sLED provides a novel perspective that accommodates what we assume to be common, namely sparse and weak signals in gene expression data, and it is closely related with Sparse Principal Component Analysis. We prove that sLED achieves full power asymptotically under mild assumptions, and simulation studies verify that it outperforms other existing procedures under many biologically plausible scenarios. Applying sLED to the largest gene-expression dataset obtained from post-mortem brain tissue from Schizophrenia patients and controls, we provide a novel list of genes implicated in Schizophrenia and reveal intriguing patterns in gene co-expression change for Schizophrenia subjects. We also illustrate that sLED can be generalized to compare other gene-gene "relationship" matrices that are of practical interest, such as the weighted adjacency matrices.
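
    A rough sketch in the spirit of the test described above, with simplifying assumptions: the differential matrix is soft-thresholded at a user-chosen level, its largest absolute eigenvalue serves as the statistic, and a permutation null replaces the asymptotic theory. This is not the exact sLED statistic or its published calibration.

```python
import numpy as np

def sled_like_statistic(X1, X2, lam):
    """Leading absolute eigenvalue of the soft-thresholded covariance difference."""
    D = np.cov(X1, rowvar=False) - np.cov(X2, rowvar=False)
    D_thr = np.sign(D) * np.maximum(np.abs(D) - lam, 0.0)   # soft-threshold
    return np.max(np.abs(np.linalg.eigvalsh(D_thr)))

def permutation_pvalue(X1, X2, lam, n_perm=200, seed=0):
    """Permutation p-value obtained by reshuffling group labels."""
    rng = np.random.default_rng(seed)
    obs = sled_like_statistic(X1, X2, lam)
    pooled, n1 = np.vstack([X1, X2]), X1.shape[0]
    null = []
    for _ in range(n_perm):
        idx = rng.permutation(pooled.shape[0])
        null.append(sled_like_statistic(pooled[idx[:n1]], pooled[idx[n1:]], lam))
    return (1 + sum(s >= obs for s in null)) / (1 + n_perm)
```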

  11. Sparse modeling of spatial environmental variables associated with asthma

    PubMed Central

    Chang, Timothy S.; Gangnon, Ronald E.; Page, C. David; Buckingham, William R.; Tandias, Aman; Cowan, Kelly J.; Tomasallo, Carrie D.; Arndt, Brian G.; Hanrahan, Lawrence P.; Guilbert, Theresa W.

    2014-01-01

    Geographically distributed environmental factors influence the burden of diseases such as asthma. Our objective was to identify sparse environmental variables associated with asthma diagnosis gathered from a large electronic health record (EHR) dataset while controlling for spatial variation. An EHR dataset from the University of Wisconsin’s Family Medicine, Internal Medicine and Pediatrics Departments was obtained for 199,220 patients aged 5–50 years over a three-year period. Each patient’s home address was geocoded to one of 3,456 geographic census block groups. Over one thousand block group variables were obtained from a commercial database. We developed a Sparse Spatial Environmental Analysis (SASEA). Using this method, the environmental variables were first dimensionally reduced with sparse principal component analysis. Logistic thin plate regression spline modeling was then used to identify block group variables associated with asthma from sparse principal components. The addresses of patients from the EHR dataset were distributed throughout the majority of Wisconsin’s geography. Logistic thin plate regression spline modeling captured spatial variation of asthma. Four sparse principal components identified via model selection consisted of food at home, dog ownership, household size, and disposable income variables. In rural areas, dog ownership and renter occupied housing units from significant sparse principal components were associated with asthma. Our main contribution is the incorporation of sparsity in spatial modeling. SASEA sequentially added sparse principal components to Logistic thin plate regression spline modeling. This method allowed association of geographically distributed environmental factors with asthma using EHR and environmental datasets. SASEA can be applied to other diseases with environmental risk factors. PMID:25533437

  12. Sparse modeling of spatial environmental variables associated with asthma.

    PubMed

    Chang, Timothy S; Gangnon, Ronald E; David Page, C; Buckingham, William R; Tandias, Aman; Cowan, Kelly J; Tomasallo, Carrie D; Arndt, Brian G; Hanrahan, Lawrence P; Guilbert, Theresa W

    2015-02-01

    Geographically distributed environmental factors influence the burden of diseases such as asthma. Our objective was to identify sparse environmental variables associated with asthma diagnosis gathered from a large electronic health record (EHR) dataset while controlling for spatial variation. An EHR dataset from the University of Wisconsin's Family Medicine, Internal Medicine and Pediatrics Departments was obtained for 199,220 patients aged 5-50 years over a three-year period. Each patient's home address was geocoded to one of 3456 geographic census block groups. Over one thousand block group variables were obtained from a commercial database. We developed a Sparse Spatial Environmental Analysis (SASEA). Using this method, the environmental variables were first dimensionally reduced with sparse principal component analysis. Logistic thin plate regression spline modeling was then used to identify block group variables associated with asthma from sparse principal components. The addresses of patients from the EHR dataset were distributed throughout the majority of Wisconsin's geography. Logistic thin plate regression spline modeling captured spatial variation of asthma. Four sparse principal components identified via model selection consisted of food at home, dog ownership, household size, and disposable income variables. In rural areas, dog ownership and renter occupied housing units from significant sparse principal components were associated with asthma. Our main contribution is the incorporation of sparsity in spatial modeling. SASEA sequentially added sparse principal components to Logistic thin plate regression spline modeling. This method allowed association of geographically distributed environmental factors with asthma using EHR and environmental datasets. SASEA can be applied to other diseases with environmental risk factors. Copyright © 2014 Elsevier Inc. All rights reserved.

  13. Inference for High-dimensional Differential Correlation Matrices.

    PubMed

    Cai, T Tony; Zhang, Anru

    2016-01-01

    Motivated by differential co-expression analysis in genomics, we consider in this paper estimation and testing of high-dimensional differential correlation matrices. An adaptive thresholding procedure is introduced and theoretical guarantees are given. Minimax rate of convergence is established and the proposed estimator is shown to be adaptively rate-optimal over collections of paired correlation matrices with approximately sparse differences. Simulation results show that the procedure significantly outperforms two other natural methods that are based on separate estimation of the individual correlation matrices. The procedure is also illustrated through an analysis of a breast cancer dataset, which provides evidence at the gene co-expression level that several genes, of which a subset has been previously verified, are associated with the breast cancer. Hypothesis testing on the differential correlation matrices is also considered. A test, which is particularly well suited for testing against sparse alternatives, is introduced. In addition, other related problems, including estimation of a single sparse correlation matrix, estimation of the differential covariance matrices, and estimation of the differential cross-correlation matrices, are also discussed.

  14. Elastic SCAD as a novel penalization method for SVM classification tasks in high-dimensional data.

    PubMed

    Becker, Natalia; Toedt, Grischa; Lichter, Peter; Benner, Axel

    2011-05-09

    Classification and variable selection play an important role in knowledge discovery in high-dimensional data. Although Support Vector Machine (SVM) algorithms are among the most powerful classification and prediction methods with a wide range of scientific applications, the SVM does not include automatic feature selection, and therefore a number of feature selection procedures have been developed. Regularisation approaches extend SVM to a feature selection method in a flexible way using penalty functions like LASSO, SCAD and Elastic Net. We propose a novel penalty function for SVM classification tasks, Elastic SCAD, a combination of SCAD and ridge penalties which overcomes the limitations of each penalty alone. Since SVM models are extremely sensitive to the choice of tuning parameters, we adopted an interval search algorithm, which in comparison to a fixed grid search finds a global optimal solution more rapidly and more precisely. Feature selection methods with combined penalties (Elastic Net and Elastic SCAD SVMs) are more robust to a change of the model complexity than methods using single penalties. Our simulation study showed that Elastic SCAD SVM outperformed LASSO (L1) and SCAD SVMs. Moreover, Elastic SCAD SVM provided sparser classifiers in terms of the median number of features selected than Elastic Net SVM and often predicted better than Elastic Net in terms of misclassification error. Finally, we applied the penalization methods described above to four publicly available breast cancer data sets. Elastic SCAD SVM was the only method providing robust classifiers in both sparse and non-sparse situations. The proposed Elastic SCAD SVM algorithm provides the advantages of the SCAD penalty and at the same time avoids sparsity limitations for non-sparse data. We were the first to demonstrate that the integration of the interval search algorithm and penalized SVM classification techniques provides fast solutions for the optimization of tuning parameters. The penalized SVM classification algorithms as well as fixed grid and interval search for finding appropriate tuning parameters were implemented in our freely available R package 'penalizedSVM'. We conclude that the Elastic SCAD SVM is a flexible and robust tool for classification and feature selection tasks for high-dimensional data such as microarray data sets.

  15. Elastic SCAD as a novel penalization method for SVM classification tasks in high-dimensional data

    PubMed Central

    2011-01-01

    Background Classification and variable selection play an important role in knowledge discovery in high-dimensional data. Although Support Vector Machine (SVM) algorithms are among the most powerful classification and prediction methods with a wide range of scientific applications, the SVM does not include automatic feature selection and therefore a number of feature selection procedures have been developed. Regularisation approaches extend SVM to a feature selection method in a flexible way using penalty functions like LASSO, SCAD and Elastic Net. We propose a novel penalty function for SVM classification tasks, Elastic SCAD, a combination of SCAD and ridge penalties which overcomes the limitations of each penalty alone. Since SVM models are extremely sensitive to the choice of tuning parameters, we adopted an interval search algorithm, which in comparison to a fixed grid search finds rapidly and more precisely a global optimal solution. Results Feature selection methods with combined penalties (Elastic Net and Elastic SCAD SVMs) are more robust to a change of the model complexity than methods using single penalties. Our simulation study showed that Elastic SCAD SVM outperformed LASSO (L1) and SCAD SVMs. Moreover, Elastic SCAD SVM provided sparser classifiers in terms of median number of features selected than Elastic Net SVM and often better predicted than Elastic Net in terms of misclassification error. Finally, we applied the penalization methods described above on four publicly available breast cancer data sets. Elastic SCAD SVM was the only method providing robust classifiers in sparse and non-sparse situations. Conclusions The proposed Elastic SCAD SVM algorithm provides the advantages of the SCAD penalty and at the same time avoids sparsity limitations for non-sparse data. We were first to demonstrate that the integration of the interval search algorithm and penalized SVM classification techniques provides fast solutions on the optimization of tuning parameters. The penalized SVM classification algorithms as well as fixed grid and interval search for finding appropriate tuning parameters were implemented in our freely available R package 'penalizedSVM'. We conclude that the Elastic SCAD SVM is a flexible and robust tool for classification and feature selection tasks for high-dimensional data such as microarray data sets. PMID:21554689

  16. An adaptive sparse-grid high-order stochastic collocation method for Bayesian inference in groundwater reactive transport modeling

    NASA Astrophysics Data System (ADS)

    Zhang, Guannan; Lu, Dan; Ye, Ming; Gunzburger, Max; Webster, Clayton

    2013-10-01

    Bayesian analysis has become vital to uncertainty quantification in groundwater modeling, but its application has been hindered by the computational cost associated with the numerous model executions required to explore the posterior probability density function (PPDF) of the model parameters. This is particularly the case when the PPDF is estimated using Markov Chain Monte Carlo (MCMC) sampling. In this study, a new approach is developed to improve the computational efficiency of Bayesian inference by constructing a surrogate of the PPDF, using an adaptive sparse-grid high-order stochastic collocation (aSG-hSC) method. Unlike previous works using a first-order hierarchical basis, this paper utilizes a compactly supported higher-order hierarchical basis to construct the surrogate system, resulting in a significant reduction in the number of required model executions. In addition, using the hierarchical surplus as an error indicator allows locally adaptive refinement of sparse grids in the parameter space, which further improves computational efficiency. To efficiently build the surrogate system for a PPDF with multiple significant modes, optimization techniques are used to identify the modes, for which high-probability regions are defined and components of the aSG-hSC approximation are constructed. After the surrogate is determined, the PPDF can be evaluated by sampling the surrogate system directly without model execution, resulting in improved efficiency of the surrogate-based MCMC compared with conventional MCMC. The developed method is evaluated using two synthetic groundwater reactive transport models. The first example involves coupled linear reactions and demonstrates the accuracy of our high-order hierarchical basis approach in approximating a high-dimensional posterior distribution. The second example is highly nonlinear because of the reactions of uranium surface complexation, and demonstrates how the iterative aSG-hSC method is able to capture multimodal and non-Gaussian features of the PPDF caused by model nonlinearity. Both experiments show that aSG-hSC is an effective and efficient tool for Bayesian inference.
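
    To make the final step concrete, here is a minimal random-walk Metropolis sampler that draws from a surrogate of the (unnormalized) log-posterior without any forward-model executions; `log_post_surrogate`, the step size, and the chain length are placeholders, and the adaptive sparse-grid construction of the surrogate itself is not reproduced.

```python
import numpy as np

def metropolis_on_surrogate(log_post_surrogate, x0, n_steps=5000, step=0.1, seed=0):
    """Random-walk Metropolis targeting a cheap surrogate of the log-posterior."""
    rng = np.random.default_rng(seed)
    x = np.asarray(x0, dtype=float)
    lp = log_post_surrogate(x)
    samples = []
    for _ in range(n_steps):
        prop = x + step * rng.standard_normal(x.shape)
        lp_prop = log_post_surrogate(prop)     # cheap surrogate evaluation, no model run
        if np.log(rng.uniform()) < lp_prop - lp:
            x, lp = prop, lp_prop
        samples.append(x.copy())
    return np.array(samples)
```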

  17. Image statistics underlying natural texture selectivity of neurons in macaque V4

    PubMed Central

    Okazawa, Gouki; Tajima, Satohiro; Komatsu, Hidehiko

    2015-01-01

    Our daily visual experiences are inevitably linked to recognizing the rich variety of textures. However, how the brain encodes and differentiates a plethora of natural textures remains poorly understood. Here, we show that many neurons in macaque V4 selectively encode sparse combinations of higher-order image statistics to represent natural textures. We systematically explored neural selectivity in a high-dimensional texture space by combining texture synthesis and efficient-sampling techniques. This yielded parameterized models for individual texture-selective neurons. The models provided parsimonious but powerful predictors for each neuron’s preferred textures using a sparse combination of image statistics. As a whole population, the neuronal tuning was distributed in a way suitable for categorizing textures and quantitatively predicts human ability to discriminate textures. Together, we suggest that the collective representation of visual image statistics in V4 plays a key role in organizing the natural texture perception. PMID:25535362

  18. The Two-Dimensional Gabor Function Adapted to Natural Image Statistics: A Model of Simple-Cell Receptive Fields and Sparse Structure in Images.

    PubMed

    Loxley, P N

    2017-10-01

    The two-dimensional Gabor function is adapted to natural image statistics, leading to a tractable probabilistic generative model that can be used to model simple cell receptive field profiles, or generate basis functions for sparse coding applications. Learning is found to be most pronounced in three Gabor function parameters representing the size and spatial frequency of the two-dimensional Gabor function and characterized by a nonuniform probability distribution with heavy tails. All three parameters are found to be strongly correlated, resulting in a basis of multiscale Gabor functions with similar aspect ratios and size-dependent spatial frequencies. A key finding is that the distribution of receptive-field sizes is scale invariant over a wide range of values, so there is no characteristic receptive field size selected by natural image statistics. The Gabor function aspect ratio is found to be approximately conserved by the learning rules and is therefore not well determined by natural image statistics. This allows for three distinct solutions: a basis of Gabor functions with sharp orientation resolution at the expense of spatial-frequency resolution, a basis of Gabor functions with sharp spatial-frequency resolution at the expense of orientation resolution, or a basis with unit aspect ratio. Arbitrary mixtures of all three cases are also possible. Two parameters controlling the shape of the marginal distributions in a probabilistic generative model fully account for all three solutions. The best-performing probabilistic generative model for sparse coding applications is found to be a gaussian copula with Pareto marginal probability density functions.

  19. Structural zeros in high-dimensional data with applications to microbiome studies.

    PubMed

    Kaul, Abhishek; Davidov, Ori; Peddada, Shyamal D

    2017-07-01

    This paper is motivated by the recent interest in the analysis of high-dimensional microbiome data. A key feature of these data is the presence of "structural zeros" which are microbes missing from an observation vector due to an underlying biological process and not due to error in measurement. Typical notions of missingness are unable to model these structural zeros. We define a general framework which allows for structural zeros in the model and propose methods of estimating sparse high-dimensional covariance and precision matrices under this setup. We establish error bounds in the spectral and Frobenius norms for the proposed estimators and empirically verify them with a simulation study. The proposed methodology is illustrated by applying it to the global gut microbiome data of Yatsunenko and others (2012. Human gut microbiome viewed across age and geography. Nature 486, 222-227). Using our methodology we classify subjects according to the geographical location on the basis of their gut microbiome. © The Author 2017. Published by Oxford University Press. All rights reserved. For permissions, please e-mail: journals.permissions@oup.com.

  20. Saliency Detection for Stereoscopic 3D Images in the Quaternion Frequency Domain

    NASA Astrophysics Data System (ADS)

    Cai, Xingyu; Zhou, Wujie; Cen, Gang; Qiu, Weiwei

    2018-06-01

    Recent studies have shown that a remarkable distinction exists between human binocular and monocular viewing behaviors. Compared with two-dimensional (2D) saliency detection models, stereoscopic three-dimensional (S3D) image saliency detection is a more challenging task. In this paper, we propose a saliency detection model for S3D images. The final saliency map of this model is constructed from the local quaternion Fourier transform (QFT) sparse feature and global QFT log-Gabor feature. More specifically, the local QFT feature measures the saliency map of an S3D image by analyzing the location of a similar patch. The similar patch is chosen using a sparse representation method. The global saliency map is generated by applying the wake edge-enhanced gradient QFT map through a band-pass filter. The results of experiments on two public datasets show that the proposed model outperforms existing computational saliency models for estimating S3D image saliency.

  1. Sparse Recovery via Differential Inclusions

    DTIC Science & Technology

    2014-07-01

    2242. [Wai09] Martin J. Wainwright, Sharp thresholds for high-dimensional and noisy sparsity recovery using ℓ1-constrained quadratic programming...solution, (1.11) β_t = 0 if t < 1/y, and β_t = y(1 - e^{-κ(t - 1/y)}) otherwise, which converges to the unbiased Bregman ISS estimator exponentially fast. Let us ...are not given the support set S, so the following two properties are used to evaluate the performance of an estimator β̂. 1. Model selection

  2. Interferometric synthetic aperture radar phase unwrapping based on sparse Markov random fields by graph cuts

    NASA Astrophysics Data System (ADS)

    Zhou, Lifan; Chai, Dengfeng; Xia, Yu; Ma, Peifeng; Lin, Hui

    2018-01-01

    Phase unwrapping (PU) is one of the key processes in reconstructing the digital elevation model of a scene from its interferometric synthetic aperture radar (InSAR) data. It is known that two-dimensional (2-D) PU problems can be formulated as maximum a posteriori estimation of Markov random fields (MRFs). However, because the traditional MRF algorithm is usually defined on a rectangular grid, it fails easily if large parts of the wrapped data are dominated by noise caused by large low-coherence areas or rapid topography variation. A PU solution based on a sparse MRF is presented to extend the traditional MRF algorithm to sparse data, which allows the unwrapping of InSAR data dominated by high phase noise. To speed up the graph cuts algorithm for the sparse MRF, we designed dual elementary graphs and merged them to obtain the Delaunay triangle graph, which is used to minimize the energy function efficiently. Experiments on simulated and real data, together with comparisons against existing algorithms, confirm the effectiveness of the proposed MRF approach, which suffers less from decorrelation effects caused by large low-coherence areas or rapid topography variation.

  3. Decomposition and model selection for large contingency tables.

    PubMed

    Dahinden, Corinne; Kalisch, Markus; Bühlmann, Peter

    2010-04-01

    Large contingency tables summarizing categorical variables arise in many areas. One example is in biology, where large numbers of biomarkers are cross-tabulated according to their discrete expression level. Interactions of the variables are of great interest and are generally studied with log-linear models. The structure of a log-linear model can be visually represented by a graph from which the conditional independence structure can then be easily read off. However, since the number of parameters in a saturated model grows exponentially in the number of variables, this generally comes with a heavy computational burden. Even if we restrict ourselves to models of lower-order interactions or other sparse structures, we are faced with the problem of a large number of cells which play the role of sample size. This is in sharp contrast to high-dimensional regression or classification procedures because, in addition to a high-dimensional parameter, we also have to deal with the analogue of a huge sample size. Furthermore, high-dimensional tables naturally feature a large number of sampling zeros which often leads to the nonexistence of the maximum likelihood estimate. We therefore present a decomposition approach, where we first divide the problem into several lower-dimensional problems and then combine these to form a global solution. Our methodology is computationally feasible for log-linear interaction models with many categorical variables each or some of them having many levels. We demonstrate the proposed method on simulated data and apply it to a bio-medical problem in cancer research.

  4. A weighted ℓ1-minimization approach for sparse polynomial chaos expansions

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Peng, Ji; Hampton, Jerrad; Doostan, Alireza, E-mail: alireza.doostan@colorado.edu

    2014-06-15

    This work proposes a method for sparse polynomial chaos (PC) approximation of high-dimensional stochastic functions based on non-adapted random sampling. We modify the standard ℓ1-minimization algorithm, originally proposed in the context of compressive sampling, using a priori information about the decay of the PC coefficients, when available, and refer to the resulting algorithm as weighted ℓ1-minimization. We provide conditions under which we may guarantee recovery using this weighted scheme. Numerical tests are used to compare the weighted and non-weighted methods for the recovery of solutions to two differential equations with high-dimensional random inputs: a boundary value problem with a random elliptic operator and a 2-D thermally driven cavity flow with random boundary condition.
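
    An illustrative sketch of a weighted ℓ1 solve, not the paper's solver: weighted ℓ1-regularized least squares minimized by ISTA, where a larger weight w_j (encoding a priori faster decay of the j-th PC coefficient) shrinks that coefficient more strongly. The measurement matrix `Psi`, data `u`, weights `w`, and penalty `lam` are assumed placeholders.

```python
import numpy as np

def weighted_l1_ista(Psi, u, w, lam=1e-2, n_iter=500):
    """min_c 0.5*||Psi c - u||_2^2 + lam * sum_j w_j |c_j|, solved by ISTA."""
    L = np.linalg.norm(Psi, 2) ** 2            # Lipschitz constant of the gradient
    c = np.zeros(Psi.shape[1])
    for _ in range(n_iter):
        grad = Psi.T @ (Psi @ c - u)           # gradient of the quadratic data-fit term
        z = c - grad / L
        thr = lam * np.asarray(w) / L
        c = np.sign(z) * np.maximum(np.abs(z) - thr, 0.0)   # weighted soft-threshold
    return c
```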

  5. Inference for High-dimensional Differential Correlation Matrices *

    PubMed Central

    Cai, T. Tony; Zhang, Anru

    2015-01-01

    Motivated by differential co-expression analysis in genomics, we consider in this paper estimation and testing of high-dimensional differential correlation matrices. An adaptive thresholding procedure is introduced and theoretical guarantees are given. Minimax rate of convergence is established and the proposed estimator is shown to be adaptively rate-optimal over collections of paired correlation matrices with approximately sparse differences. Simulation results show that the procedure significantly outperforms two other natural methods that are based on separate estimation of the individual correlation matrices. The procedure is also illustrated through an analysis of a breast cancer dataset, which provides evidence at the gene co-expression level that several genes, of which a subset has been previously verified, are associated with the breast cancer. Hypothesis testing on the differential correlation matrices is also considered. A test, which is particularly well suited for testing against sparse alternatives, is introduced. In addition, other related problems, including estimation of a single sparse correlation matrix, estimation of the differential covariance matrices, and estimation of the differential cross-correlation matrices, are also discussed. PMID:26500380

  6. Nonlinear spike-and-slab sparse coding for interpretable image encoding.

    PubMed

    Shelton, Jacquelyn A; Sheikh, Abdul-Saboor; Bornschein, Jörg; Sterne, Philip; Lücke, Jörg

    2015-01-01

    Sparse coding is a popular approach to model natural images but has faced two main challenges: modelling low-level image components (such as edge-like structures and their occlusions) and modelling varying pixel intensities. Traditionally, images are modelled as a sparse linear superposition of dictionary elements, where the probabilistic view of this problem is that the coefficients follow a Laplace or Cauchy prior distribution. We propose a novel model that instead uses a spike-and-slab prior and nonlinear combination of components. With the prior, our model can easily represent exact zeros for e.g. the absence of an image component, such as an edge, and a distribution over non-zero pixel intensities. With the nonlinearity (the nonlinear max combination rule), the idea is to target occlusions; dictionary elements correspond to image components that can occlude each other. There are major consequences of the model assumptions made by both (non)linear approaches, thus the main goal of this paper is to isolate and highlight differences between them. Parameter optimization is analytically and computationally intractable in our model, thus as a main contribution we design an exact Gibbs sampler for efficient inference which we can apply to higher dimensional data using latent variable preselection. Results on natural and artificial occlusion-rich data with controlled forms of sparse structure show that our model can extract a sparse set of edge-like components that closely match the generating process, which we refer to as interpretable components. Furthermore, the sparseness of the solution closely follows the ground-truth number of components/edges in the images. The linear model did not learn such edge-like components with any level of sparsity. This suggests that our model can adaptively well-approximate and characterize the meaningful generation process.

  7. Nonlinear Spike-And-Slab Sparse Coding for Interpretable Image Encoding

    PubMed Central

    Shelton, Jacquelyn A.; Sheikh, Abdul-Saboor; Bornschein, Jörg; Sterne, Philip; Lücke, Jörg

    2015-01-01

    Sparse coding is a popular approach to model natural images but has faced two main challenges: modelling low-level image components (such as edge-like structures and their occlusions) and modelling varying pixel intensities. Traditionally, images are modelled as a sparse linear superposition of dictionary elements, where the probabilistic view of this problem is that the coefficients follow a Laplace or Cauchy prior distribution. We propose a novel model that instead uses a spike-and-slab prior and nonlinear combination of components. With the prior, our model can easily represent exact zeros for e.g. the absence of an image component, such as an edge, and a distribution over non-zero pixel intensities. With the nonlinearity (the nonlinear max combination rule), the idea is to target occlusions; dictionary elements correspond to image components that can occlude each other. There are major consequences of the model assumptions made by both (non)linear approaches, thus the main goal of this paper is to isolate and highlight differences between them. Parameter optimization is analytically and computationally intractable in our model, thus as a main contribution we design an exact Gibbs sampler for efficient inference which we can apply to higher dimensional data using latent variable preselection. Results on natural and artificial occlusion-rich data with controlled forms of sparse structure show that our model can extract a sparse set of edge-like components that closely match the generating process, which we refer to as interpretable components. Furthermore, the sparseness of the solution closely follows the ground-truth number of components/edges in the images. The linear model did not learn such edge-like components with any level of sparsity. This suggests that our model can adaptively well-approximate and characterize the meaningful generation process. PMID:25954947

  8. Harnessing Sparse and Low-Dimensional Structures for Robust Clustering of Imagery Data

    ERIC Educational Resources Information Center

    Rao, Shankar Ramamohan

    2009-01-01

    We propose a robust framework for clustering data. In practice, data obtained from real measurement devices can be incomplete, corrupted by gross errors, or not correspond to any assumed model. We show that, by properly harnessing the intrinsic low-dimensional structure of the data, these kinds of practical problems can be dealt with in a uniform…

  9. Sparse Regression as a Sparse Eigenvalue Problem

    NASA Technical Reports Server (NTRS)

    Moghaddam, Baback; Gruber, Amit; Weiss, Yair; Avidan, Shai

    2008-01-01

    We extend the ℓ0-norm "subspectral" algorithms for sparse-LDA [5] and sparse-PCA [6] to general quadratic costs such as MSE in linear (kernel) regression. The resulting "Sparse Least Squares" (SLS) problem is also NP-hard, by way of its equivalence to a rank-1 sparse eigenvalue problem (e.g., binary sparse-LDA [7]). Specifically, for a general quadratic cost we use a highly-efficient technique for direct eigenvalue computation using partitioned matrix inverses which leads to dramatic ×10³ speed-ups over standard eigenvalue decomposition. This increased efficiency mitigates the O(n⁴) scaling behaviour that up to now has limited the previous algorithms' utility for high-dimensional learning problems. Moreover, the new computation prioritizes the role of the less-myopic backward elimination stage which becomes more efficient than forward selection. Similarly, branch-and-bound search for Exact Sparse Least Squares (ESLS) also benefits from partitioned matrix inverse techniques. Our Greedy Sparse Least Squares (GSLS) generalizes Natarajan's algorithm [9], also known as Order-Recursive Matching Pursuit (ORMP). Specifically, the forward half of GSLS is exactly equivalent to ORMP but more efficient. By including the backward pass, which only doubles the computation, we can achieve lower MSE than ORMP. Experimental comparisons to the state-of-the-art LARS algorithm [3] show forward-GSLS is faster, more accurate and more flexible in terms of choice of regularization
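
    For orientation, the sketch below is plain orthogonal matching pursuit, i.e., the greedy forward selection that the forward pass above is said to be equivalent to; the partitioned-matrix-inverse speed-ups and the backward-elimination pass that distinguish GSLS are not reproduced, and column normalization of `A` is assumed.

```python
import numpy as np

def omp(A, y, k):
    """Greedy forward selection of k columns of A (assumed column-normalized)
    by maximum correlation with the residual, refitting by least squares."""
    residual, support, coef = y.astype(float).copy(), [], np.zeros(0)
    for _ in range(k):
        j = int(np.argmax(np.abs(A.T @ residual)))
        if j in support:            # residual already explained by selected atoms
            break
        support.append(j)
        coef, *_ = np.linalg.lstsq(A[:, support], y, rcond=None)
        residual = y - A[:, support] @ coef
    return support, coef
```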

  10. Fast and Adaptive Sparse Precision Matrix Estimation in High Dimensions

    PubMed Central

    Liu, Weidong; Luo, Xi

    2014-01-01

    This paper proposes a new method for estimating sparse precision matrices in the high dimensional setting. It has been popular to study fast computation and adaptive procedures for this problem. We propose a novel approach, called Sparse Column-wise Inverse Operator, to address these two issues. We analyze an adaptive procedure based on cross validation, and establish its convergence rate under the Frobenius norm. The convergence rates under other matrix norms are also established. This method also enjoys the advantage of fast computation for large-scale problems, via a coordinate descent algorithm. Numerical merits are illustrated using both simulated and real datasets. In particular, it performs favorably on an HIV brain tissue dataset and an ADHD resting-state fMRI dataset. PMID:25750463

  11. Duke Workshop on High-Dimensional Data Sensing and Analysis

    DTIC Science & Technology

    2015-05-06

    Bayesian sparse factor analysis formulation of Chen et al. (2011) this work develops multi-label PCA (MLPCA), a generative dimension reduction...version of this problem was recently treated by Banerjee et al. [1], Ravikumar et al. [2], Kolar and Xing [3], and Höfling and Tibshirani [4]. As...Not applicable. Final Report Duke Workshop on High-Dimensional Data Sensing and Analysis Workshop Dates: July 26-28, 2011

  12. Highly undersampled contrast-enhanced MRA with iterative reconstruction: Integration in a clinical setting.

    PubMed

    Stalder, Aurelien F; Schmidt, Michaela; Quick, Harald H; Schlamann, Marc; Maderwald, Stefan; Schmitt, Peter; Wang, Qiu; Nadar, Mariappan S; Zenge, Michael O

    2015-12-01

    To integrate, optimize, and evaluate a three-dimensional (3D) contrast-enhanced sparse MRA technique with iterative reconstruction on a standard clinical MR system. Data were acquired using a highly undersampled Cartesian spiral phyllotaxis sampling pattern and reconstructed directly on the MR system with an iterative SENSE technique. Undersampling, regularization, and number of iterations of the reconstruction were optimized and validated based on phantom experiments and patient data. Sparse MRA of the whole head (field of view: 265 × 232 × 179 mm³) was investigated in 10 patient examinations. High-quality images with 30-fold undersampling, resulting in 0.7 mm isotropic resolution within 10 s acquisition, were obtained. After optimization of the regularization factor and of the number of iterations of the reconstruction, it was possible to reconstruct images with excellent quality within six minutes per 3D volume. Initial results of sparse contrast-enhanced MRA (CEMRA) in 10 patients demonstrated high-quality whole-head first-pass MRA for both the arterial and venous contrast phases. While sparse MRI techniques have not yet reached clinical routine, this study demonstrates the technical feasibility of high-quality sparse CEMRA of the whole head in a clinical setting. Sparse CEMRA has the potential to become a viable alternative where conventional CEMRA is too slow or does not provide sufficient spatial resolution. © 2014 Wiley Periodicals, Inc.

  13. Parallel iterative methods for sparse linear and nonlinear equations

    NASA Technical Reports Server (NTRS)

    Saad, Youcef

    1989-01-01

    As three-dimensional models gain importance, iterative methods will become almost mandatory. Among these, preconditioned Krylov subspace methods have been viewed as the most efficient and reliable for solving linear as well as nonlinear systems of equations. Several different approaches have been taken to adapt iterative methods to supercomputers. Some of these approaches are discussed, with emphasis on methods that deal more specifically with general unstructured sparse matrices, such as those arising from finite element methods.
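
    As a small, purely serial illustration of the class of methods surveyed (not the report's supercomputer implementations), the following solves a sparse symmetric positive-definite system with a Jacobi-preconditioned conjugate gradient iteration in SciPy.

```python
import numpy as np
import scipy.sparse as sp
from scipy.sparse.linalg import cg

n = 1000
# 1-D Laplacian: a standard sparse, symmetric positive-definite test matrix
A = sp.diags([-1.0, 2.0, -1.0], [-1, 0, 1], shape=(n, n), format="csr")
b = np.ones(n)

# Jacobi (diagonal) preconditioner: approximate A^{-1} by the inverse of its diagonal
M = sp.diags(1.0 / A.diagonal())

x, info = cg(A, b, M=M)
print("converged" if info == 0 else f"cg stopped with info={info}")
```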

  14. Neural network analysis of Charpy transition temperature of irradiated low-activation martensitic steels

    NASA Astrophysics Data System (ADS)

    Cottrell, G. A.; Kemp, R.; Bhadeshia, H. K. D. H.; Odette, G. R.; Yamamoto, T.

    2007-08-01

    We have constructed a Bayesian neural network model that predicts the change, due to neutron irradiation, of the Charpy ductile-brittle transition temperature (ΔDBTT) of low-activation martensitic steels, given a set of multi-dimensional published data with doses <100 displacements per atom (dpa). Results show the high significance of irradiation temperature and (dpa)^1/2 in determining ΔDBTT. Sparse data regions were identified by the size of the modelling uncertainties, indicating areas where further experimental data are needed. The method has promise for selecting and ranking experiments on future irradiation materials test facilities.

  15. Parameter Estimation for a Turbulent Buoyant Jet Using Approximate Bayesian Computation

    NASA Astrophysics Data System (ADS)

    Christopher, Jason D.; Wimer, Nicholas T.; Hayden, Torrey R. S.; Lapointe, Caelan; Grooms, Ian; Rieker, Gregory B.; Hamlington, Peter E.

    2016-11-01

    Approximate Bayesian Computation (ABC) is a powerful tool that allows sparse experimental or other "truth" data to be used for the prediction of unknown model parameters in numerical simulations of real-world engineering systems. In this presentation, we introduce the ABC approach and then use ABC to predict unknown inflow conditions in simulations of a two-dimensional (2D) turbulent, high-temperature buoyant jet. For this test case, truth data are obtained from a simulation with known boundary conditions and problem parameters. Using spatially-sparse temperature statistics from the 2D buoyant jet truth simulation, we show that the ABC method provides accurate predictions of the true jet inflow temperature. The success of the ABC approach in the present test suggests that ABC is a useful and versatile tool for engineering fluid dynamics research.

  16. Estimation of High-Dimensional Graphical Models Using Regularized Score Matching

    PubMed Central

    Lin, Lina; Drton, Mathias; Shojaie, Ali

    2017-01-01

    Graphical models are widely used to model stochastic dependences among large collections of variables. We introduce a new method of estimating undirected conditional independence graphs based on the score matching loss, introduced by Hyvärinen (2005), and subsequently extended in Hyvärinen (2007). The regularized score matching method we propose applies to settings with continuous observations and allows for computationally efficient treatment of possibly non-Gaussian exponential family models. In the well-explored Gaussian setting, regularized score matching avoids issues of asymmetry that arise when applying the technique of neighborhood selection, and compared to existing methods that directly yield symmetric estimates, the score matching approach has the advantage that the considered loss is quadratic and gives piecewise linear solution paths under ℓ1 regularization. Under suitable irrepresentability conditions, we show that ℓ1-regularized score matching is consistent for graph estimation in sparse high-dimensional settings. Through numerical experiments and an application to RNAseq data, we confirm that regularized score matching achieves state-of-the-art performance in the Gaussian case and provides a valuable tool for computationally efficient estimation in non-Gaussian graphical models. PMID:28638498
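
    A minimal numerical sketch of the quadratic-loss idea described above, assuming a Gaussian model: proximal-gradient minimization of the l1-regularized Gaussian score matching loss J(K) = 0.5 tr(K'KS) - tr(K) + lambda*||offdiag(K)||_1, where S is the sample covariance and K the precision matrix. The step size, symmetrization, and stopping rule are illustrative choices of this sketch, not the estimator's reference implementation.

      import numpy as np

      def soft(x, t):
          return np.sign(x) * np.maximum(np.abs(x) - t, 0.0)

      def score_matching_graph(S, lam=0.2, n_iter=500):
          """Proximal-gradient sketch for the l1-regularized Gaussian score matching loss."""
          p = S.shape[0]
          step = 1.0 / np.linalg.eigvalsh(S).max()              # 1/L for the quadratic part
          K = np.eye(p)
          for _ in range(n_iter):
              grad = K @ S - np.eye(p)                          # gradient of 0.5*tr(K'KS) - tr(K)
              grad = 0.5 * (grad + grad.T)                      # keep the iterate symmetric
              K = K - step * grad
              off = soft(K - np.diag(np.diag(K)), step * lam)   # shrink off-diagonals only
              K = off + np.diag(np.diag(K))
          return K

      rng = np.random.default_rng(7)
      p = 15                                                    # sparse chain-graph precision matrix
      K_true = np.eye(p) + 0.4 * (np.eye(p, k=1) + np.eye(p, k=-1))
      X = rng.multivariate_normal(np.zeros(p), np.linalg.inv(K_true), size=400)
      S = np.cov(X, rowvar=False)
      K_hat = score_matching_graph(S, lam=0.15)
      print("estimated edges:", int(np.sum(np.abs(K_hat[np.triu_indices(p, 1)]) > 1e-6)))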

  17. Three-dimensional reconstruction of highly complex microscopic samples using scanning electron microscopy and optical flow estimation.

    PubMed

    Baghaie, Ahmadreza; Pahlavan Tafti, Ahmad; Owen, Heather A; D'Souza, Roshan M; Yu, Zeyun

    2017-01-01

    The Scanning Electron Microscope (SEM), as one of the major research and industrial instruments for imaging micro-scale samples and surfaces, has gained extensive attention since its emergence. However, the acquired micrographs still remain two-dimensional (2D). In the current work a novel and highly accurate approach is proposed to recover the hidden third dimension by use of multi-view image acquisition of the microscopic samples combined with pre/post-processing steps including sparse feature-based stereo rectification, nonlocal-based optical flow estimation for dense matching and finally depth estimation. Employing the proposed approach, three-dimensional (3D) reconstructions of highly complex microscopic samples were achieved to facilitate the interpretation of topology and geometry of surface/shape attributes of the samples. As a byproduct of the proposed approach, high-definition 3D printed models of the samples can be generated as a tangible means of physical understanding. Extensive comparisons with the state-of-the-art reveal the strength and superiority of the proposed method in uncovering the details of the highly complex microscopic samples.

  18. Applications of machine learning and data mining methods to detect associations of rare and common variants with complex traits.

    PubMed

    Lu, Ake Tzu-Hui; Austin, Erin; Bonner, Ashley; Huang, Hsin-Hsiung; Cantor, Rita M

    2014-09-01

    Machine learning methods (MLMs), designed to develop models using high-dimensional predictors, have been used to analyze genome-wide genetic and genomic data to predict risks for complex traits. We summarize the results from six contributions to our Genetic Analysis Workshop 18 working group; these investigators applied MLMs and data mining to analyses of rare and common genetic variants measured in pedigrees. To develop risk profiles, group members analyzed blood pressure traits along with single-nucleotide polymorphisms and rare variant genotypes derived from sequence and imputation analyses in large Mexican American pedigrees. Supervised MLMs included penalized regression with varying penalties, support vector machines, and permanental classification. Unsupervised MLMs included sparse principal components analysis and sparse graphical models. Entropy-based components analyses were also used to mine these data. None of the investigators fully capitalized on the genetic information provided by the complete pedigrees. Their approaches either corrected for the nonindependence of the individuals within the pedigrees or analyzed only those who were independent. Some methods allowed for covariate adjustment, whereas others did not. We evaluated these methods using a variety of metrics. Four contributors conducted primary analyses on the real data, and the other two research groups used the simulated data with and without knowledge of the underlying simulation model. One group used the answers to the simulated data to assess power and type I errors. Although the MLMs applied were substantially different, each research group concluded that MLMs have advantages over standard statistical approaches with these high-dimensional data. © 2014 WILEY PERIODICALS, INC.

  19. Sparse 4D TomoSAR imaging in the presence of non-linear deformation

    NASA Astrophysics Data System (ADS)

    Khwaja, Ahmed Shaharyar; ćetin, Müjdat

    2018-04-01

    In this paper, we present a sparse four-dimensional tomographic synthetic aperture radar (4D TomoSAR) imaging scheme that can estimate elevation and linear as well as non-linear seasonal deformation rates of scatterers using the interferometric phase. Unlike existing sparse processing techniques that use fixed dictionaries based on a linear deformation model, we use a variable dictionary for the non-linear deformation in the form of seasonal sinusoidal deformation, in addition to the fixed dictionary for the linear deformation. We estimate the amplitude of the sinusoidal deformation using an optimization method and create the variable dictionary using the estimated amplitude. We show preliminary results using simulated data that demonstrate the soundness of our proposed technique for sparse 4D TomoSAR imaging in the presence of non-linear deformation.

  20. The cross-validated AUC for MCP-logistic regression with high-dimensional data.

    PubMed

    Jiang, Dingfeng; Huang, Jian; Zhang, Ying

    2013-10-01

    We propose a cross-validated area under the receiver operating characteristic (ROC) curve (CV-AUC) criterion for tuning parameter selection for penalized methods in sparse, high-dimensional logistic regression models. We use this criterion in combination with the minimax concave penalty (MCP) method for variable selection. The CV-AUC criterion is specifically designed for optimizing the classification performance for binary outcome data. To implement the proposed approach, we derive an efficient coordinate descent algorithm to compute the MCP-logistic regression solution surface. Simulation studies are conducted to evaluate the finite sample performance of the proposed method and to compare it with existing methods, including the Akaike information criterion (AIC), the Bayesian information criterion (BIC), and the extended BIC (EBIC). The model selected based on the CV-AUC criterion tends to have a larger predictive AUC and smaller classification error than those with tuning parameters selected using the AIC, BIC or EBIC. We illustrate the application of the MCP-logistic regression with the CV-AUC criterion on three microarray datasets from the studies that attempt to identify genes related to cancers. Our simulation studies and data examples demonstrate that the CV-AUC is an attractive method for tuning parameter selection for penalized methods in high-dimensional logistic regression models.
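
    A hedged sketch of the tuning idea described above. Scikit-learn does not ship an MCP penalty, so an L1-penalized logistic regression stands in for the MCP fit here; only the selection of the penalty level by cross-validated AUC mirrors the CV-AUC criterion. The data, grid, and fold choices are illustrative assumptions.

      import numpy as np
      from sklearn.datasets import make_classification
      from sklearn.linear_model import LogisticRegression
      from sklearn.model_selection import StratifiedKFold, cross_val_score

      # Toy sparse, high-dimensional binary outcome data
      X, y = make_classification(n_samples=200, n_features=500, n_informative=10, random_state=0)

      grid = np.logspace(-2, 1, 20)                     # candidate inverse-penalty strengths
      cv = StratifiedKFold(n_splits=5, shuffle=True, random_state=0)
      cv_auc = []
      for C in grid:
          model = LogisticRegression(penalty="l1", solver="liblinear", C=C)
          cv_auc.append(cross_val_score(model, X, y, cv=cv, scoring="roc_auc").mean())

      best_C = grid[int(np.argmax(cv_auc))]             # penalty level maximizing the CV-AUC
      print(best_C, max(cv_auc))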

  1. A sparse reconstruction method for the estimation of multi-resolution emission fields via atmospheric inversion

    DOE PAGES

    Ray, J.; Lee, J.; Yadav, V.; ...

    2015-04-29

    Atmospheric inversions are frequently used to estimate fluxes of atmospheric greenhouse gases (e.g., biospheric CO2 flux fields) at Earth's surface. These inversions typically assume that flux departures from a prior model are spatially smoothly varying, which are then modeled using a multi-variate Gaussian. When the field being estimated is spatially rough, multi-variate Gaussian models are difficult to construct and a wavelet-based field model may be more suitable. Unfortunately, such models are very high dimensional and are most conveniently used when the estimation method can simultaneously perform data-driven model simplification (removal of model parameters that cannot be reliably estimated) and fitting. Such sparse reconstruction methods are typically not used in atmospheric inversions. In this work, we devise a sparse reconstruction method, and illustrate it in an idealized atmospheric inversion problem for the estimation of fossil fuel CO2 (ffCO2) emissions in the lower 48 states of the USA. Our new method is based on stagewise orthogonal matching pursuit (StOMP), a method used to reconstruct compressively sensed images. Our adaptations bestow three properties to the sparse reconstruction procedure which are useful in atmospheric inversions. We have modified StOMP to incorporate prior information on the emission field being estimated and to enforce non-negativity on the estimated field. Finally, though based on wavelets, our method allows for the estimation of fields in non-rectangular geometries, e.g., emission fields inside geographical and political boundaries. Our idealized inversions use a recently developed multi-resolution (i.e., wavelet-based) random field model developed for ffCO2 emissions and synthetic observations of ffCO2 concentrations from a limited set of measurement sites. We find that our method for limiting the estimated field within an irregularly shaped region is about a factor of 10 faster than conventional approaches. It also reduces the overall computational cost by a factor of 2. Further, the sparse reconstruction scheme imposes non-negativity without introducing strong nonlinearities, such as those introduced by employing log-transformed fields, and thus reaps the benefits of simplicity and computational speed that are characteristic of linear inverse problems.
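
    A hedged sketch of a greedy sparse reconstruction with a non-negativity constraint: a basic (not stagewise) matching-pursuit loop whose selected coefficients are refit by non-negative least squares at each step. The wavelet dictionary, prior information, and irregular-domain handling of the StOMP variant described above are not reproduced; the random sensing matrix and the toy emission vector are purely illustrative.

      import numpy as np
      from scipy.optimize import nnls

      def nonnegative_omp(A, y, n_atoms=10):
          """Greedy atom selection with a non-negative least-squares refit at each step."""
          residual, support, coef = y.copy(), [], np.zeros(0)
          for _ in range(n_atoms):
              corr = A.T @ residual
              j = int(np.argmax(corr))                  # best positively correlated atom
              if corr[j] <= 0 or j in support:
                  break
              support.append(j)
              coef, _ = nnls(A[:, support], y)          # refit selected atoms with x >= 0
              residual = y - A[:, support] @ coef
          x = np.zeros(A.shape[1])
          x[support] = coef
          return x

      rng = np.random.default_rng(3)
      A = rng.standard_normal((80, 300))
      x_true = np.zeros(300)
      x_true[[5, 40, 120]] = [2.0, 1.0, 3.0]            # non-negative sparse field to recover
      y = A @ x_true + 0.05 * rng.standard_normal(80)
      print(np.nonzero(nonnegative_omp(A, y))[0])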

  2. A sparse reconstruction method for the estimation of multi-resolution emission fields via atmospheric inversion

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Ray, J.; Lee, J.; Yadav, V.

    Atmospheric inversions are frequently used to estimate fluxes of atmospheric greenhouse gases (e.g., biospheric CO2 flux fields) at Earth's surface. These inversions typically assume that flux departures from a prior model are spatially smoothly varying, which are then modeled using a multi-variate Gaussian. When the field being estimated is spatially rough, multi-variate Gaussian models are difficult to construct and a wavelet-based field model may be more suitable. Unfortunately, such models are very high dimensional and are most conveniently used when the estimation method can simultaneously perform data-driven model simplification (removal of model parameters that cannot be reliably estimated) and fitting. Such sparse reconstruction methods are typically not used in atmospheric inversions. In this work, we devise a sparse reconstruction method, and illustrate it in an idealized atmospheric inversion problem for the estimation of fossil fuel CO2 (ffCO2) emissions in the lower 48 states of the USA. Our new method is based on stagewise orthogonal matching pursuit (StOMP), a method used to reconstruct compressively sensed images. Our adaptations bestow three properties to the sparse reconstruction procedure which are useful in atmospheric inversions. We have modified StOMP to incorporate prior information on the emission field being estimated and to enforce non-negativity on the estimated field. Finally, though based on wavelets, our method allows for the estimation of fields in non-rectangular geometries, e.g., emission fields inside geographical and political boundaries. Our idealized inversions use a recently developed multi-resolution (i.e., wavelet-based) random field model developed for ffCO2 emissions and synthetic observations of ffCO2 concentrations from a limited set of measurement sites. We find that our method for limiting the estimated field within an irregularly shaped region is about a factor of 10 faster than conventional approaches. It also reduces the overall computational cost by a factor of 2. Further, the sparse reconstruction scheme imposes non-negativity without introducing strong nonlinearities, such as those introduced by employing log-transformed fields, and thus reaps the benefits of simplicity and computational speed that are characteristic of linear inverse problems.

  3. Extreme Sparse Multinomial Logistic Regression: A Fast and Robust Framework for Hyperspectral Image Classification

    NASA Astrophysics Data System (ADS)

    Cao, Faxian; Yang, Zhijing; Ren, Jinchang; Ling, Wing-Kuen; Zhao, Huimin; Marshall, Stephen

    2017-12-01

    Although the sparse multinomial logistic regression (SMLR) has provided a useful tool for sparse classification, it is inefficient in dealing with high-dimensional features and relies on manually set initial regressor values. This has significantly constrained its applications for hyperspectral image (HSI) classification. In order to tackle these two drawbacks, an extreme sparse multinomial logistic regression (ESMLR) is proposed for effective classification of HSI. First, the HSI dataset is projected to a new feature space with randomly generated weight and bias. Second, an optimization model is established by the Lagrange multiplier method and the dual principle to automatically determine a good initial regressor for SMLR via minimizing the training error and the regressor value. Furthermore, the extended multi-attribute profiles (EMAPs) are utilized for extracting both the spectral and spatial features. A combinational linear multiple features learning (MFL) method is proposed to further enhance the features extracted by ESMLR and EMAPs. Finally, the logistic regression via the variable splitting and the augmented Lagrangian (LORSAL) is adopted in the proposed framework for reducing the computational time. Experiments are conducted on two well-known HSI datasets, namely the Indian Pines dataset and the Pavia University dataset, which have shown the fast and robust performance of the proposed ESMLR framework.

  4. Detecting Shielded Special Nuclear Materials Using Multi-Dimensional Neutron Source and Detector Geometries

    NASA Astrophysics Data System (ADS)

    Santarius, John; Navarro, Marcos; Michalak, Matthew; Fancher, Aaron; Kulcinski, Gerald; Bonomo, Richard

    2016-10-01

    A newly initiated research project will be described that investigates methods for detecting shielded special nuclear materials by combining multi-dimensional neutron sources, forward/adjoint calculations modeling neutron and gamma transport, and sparse data analysis of detector signals. The key tasks for this project are: (1) developing a radiation transport capability for use in optimizing adaptive-geometry, inertial-electrostatic confinement (IEC) neutron source/detector configurations for neutron pulses distributed in space and/or phased in time; (2) creating distributed-geometry, gas-target, IEC fusion neutron sources; (3) applying sparse data and noise reduction algorithms, such as principal component analysis (PCA) and wavelet transform analysis, to enhance detection fidelity; and (4) educating graduate and undergraduate students. Funded by DHS DNDO Project 2015-DN-077-ARI095.

  5. Addressing global uncertainty and sensitivity in first-principles based microkinetic models by an adaptive sparse grid approach

    NASA Astrophysics Data System (ADS)

    Döpking, Sandra; Plaisance, Craig P.; Strobusch, Daniel; Reuter, Karsten; Scheurer, Christoph; Matera, Sebastian

    2018-01-01

    In the last decade, first-principles-based microkinetic modeling has been developed into an important tool for a mechanistic understanding of heterogeneous catalysis. A commonly known, but hitherto barely analyzed issue in this kind of modeling is the presence of sizable errors from the use of approximate Density Functional Theory (DFT). We here address the propagation of these errors to the catalytic turnover frequency (TOF) by global sensitivity and uncertainty analysis. Both analyses require the numerical quadrature of high-dimensional integrals. To achieve this efficiently, we utilize and extend an adaptive sparse grid approach and exploit the confinement of the strongly non-linear behavior of the TOF to local regions of the parameter space. We demonstrate the methodology on a model of the oxygen evolution reaction at the Co3O4 (110)-A surface, using a maximum entropy error model that imposes nothing but reasonable bounds on the errors. For this setting, the DFT errors lead to an absolute uncertainty of several orders of magnitude in the TOF. We nevertheless find that it is still possible to draw conclusions from such uncertain models about the atomistic aspects controlling the reactivity. A comparison with derivative-based local sensitivity analysis instead reveals that this more established approach provides incomplete information. Since the adaptive sparse grids allow for the evaluation of the integrals with only a modest number of function evaluations, this approach opens the way for a global sensitivity analysis of more complex models, for instance, models based on kinetic Monte Carlo simulations.

  6. DEM generation from contours and a low-resolution DEM

    NASA Astrophysics Data System (ADS)

    Li, Xinghua; Shen, Huanfeng; Feng, Ruitao; Li, Jie; Zhang, Liangpei

    2017-12-01

    A digital elevation model (DEM) is a virtual representation of topography, where the terrain is established by the three-dimensional co-ordinates. In the framework of sparse representation, this paper investigates DEM generation from contours. Since contours are usually sparsely distributed and closely related in space, sparse spatial regularization (SSR) is enforced on them. In order to make up for the lack of spatial information, another lower spatial resolution DEM from the same geographical area is introduced. In this way, the sparse representation implements the spatial constraints in the contours and extracts the complementary information from the auxiliary DEM. Furthermore, the proposed method integrates the advantage of the unbiased estimation of kriging. For brevity, the proposed method is called the kriging and sparse spatial regularization (KSSR) method. The performance of the proposed KSSR method is demonstrated by experiments in Shuttle Radar Topography Mission (SRTM) 30 m DEM and Advanced Spaceborne Thermal Emission and Reflection Radiometer (ASTER) 30 m global digital elevation model (GDEM) generation from the corresponding contours and a 90 m DEM. The experiments confirm that the proposed KSSR method outperforms the traditional kriging and SSR methods, and it can be successfully used for DEM generation from contours.

  7. Compressive sensing for sparse time-frequency representation of nonstationary signals in the presence of impulsive noise

    NASA Astrophysics Data System (ADS)

    Orović, Irena; Stanković, Srdjan; Amin, Moeness

    2013-05-01

    A modified robust two-dimensional compressive sensing algorithm for reconstruction of sparse time-frequency representation (TFR) is proposed. The ambiguity function domain is assumed to be the domain of observations. The two-dimensional Fourier bases are used to linearly relate the observations to the sparse TFR, in lieu of the Wigner distribution. We assume that a set of available samples in the ambiguity domain is heavily corrupted by an impulsive type of noise. Consequently, the problem of sparse TFR reconstruction cannot be tackled using standard compressive sensing optimization algorithms. We introduce a two-dimensional L-statistics based modification into the transform domain representation. It provides suitable initial conditions that will produce efficient convergence of the reconstruction algorithm. This approach applies sorting and weighting operations to discard an expected amount of samples corrupted by noise. The remaining samples serve as observations used in sparse reconstruction of the time-frequency signal representation. The efficiency of the proposed approach is demonstrated on numerical examples that comprise both cases of monocomponent and multicomponent signals.
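
    A crude one-dimensional sketch of the sorting-and-discarding step described above, under the assumption that impulsive outliers dominate the largest-magnitude observations: samples are sorted by magnitude and a fixed fraction of the largest ones is dropped before any sparse reconstruction would be attempted. The two-dimensional ambiguity-domain setup and the weighting scheme of the paper are not reproduced; the signal, corruption model, and trimming fraction are assumptions of the sketch.

      import numpy as np

      rng = np.random.default_rng(5)
      clean = np.sin(2 * np.pi * 0.05 * np.arange(256))            # toy signal samples
      samples = clean + 0.1 * rng.standard_normal(256)
      hit = rng.choice(256, size=20, replace=False)
      samples[hit] += 20.0 * rng.standard_normal(20)               # impulsive corruption

      trim_fraction = 0.15
      order = np.argsort(np.abs(samples))                          # sort by magnitude
      keep = np.sort(order[: int((1 - trim_fraction) * 256)])      # discard the largest samples
      print("retained", keep.size, "of 256 samples;",
            "corrupted samples still kept:", np.intersect1d(keep, hit).size)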

  8. Assimilation of spatially sparse in situ soil moisture networks into a continuous model domain

    USDA-ARS?s Scientific Manuscript database

    Growth in the availability of near-real-time soil moisture observations from ground-based networks has spurred interest in the assimilation of these observations into land surface models via a two-dimensional data assimilation system. However, the design of such systems is currently hampered by our ...

  9. Gradient-based optimization with B-splines on sparse grids for solving forward-dynamics simulations of three-dimensional, continuum-mechanical musculoskeletal system models.

    PubMed

    Valentin, J; Sprenger, M; Pflüger, D; Röhrle, O

    2018-05-01

    Investigating the interplay between muscular activity and motion is the basis for improving our understanding of healthy or diseased musculoskeletal systems. To be able to analyze musculoskeletal systems, computational models are used. Despite some severe modeling assumptions, almost all existing musculoskeletal system simulations rely on multibody simulation frameworks. Although continuum-mechanical musculoskeletal system models can compensate for some of these limitations, they are rarely considered because of their computational complexity and cost. The proposed framework is the first activation-driven musculoskeletal system model, in which the exerted skeletal muscle forces are computed using 3-dimensional, continuum-mechanical skeletal muscle models and in which muscle activations are determined based on a constraint optimization problem. Numerical feasibility is achieved by computing sparse grid surrogates with hierarchical B-splines, and adaptive sparse grid refinement further reduces the computational effort. The choice of B-splines allows the use of all existing gradient-based optimization techniques without further numerical approximation. This paper demonstrates that the resulting surrogates have low relative errors (less than 0.76%) and can be used within forward simulations that are subject to constraint optimization. To demonstrate this, we set up several different test scenarios in which an upper limb model consisting of the elbow joint, the biceps and triceps brachii, and an external load is subjected to different optimization criteria. Even though this novel method has only been demonstrated for a 2-muscle system, it can easily be extended to musculoskeletal systems with 3 or more muscles. Copyright © 2018 John Wiley & Sons, Ltd.

  10. The Impact of Parametric Uncertainties on Biogeochemistry in the E3SM Land Model

    NASA Astrophysics Data System (ADS)

    Ricciuto, Daniel; Sargsyan, Khachik; Thornton, Peter

    2018-02-01

    We conduct a global sensitivity analysis (GSA) of the Energy Exascale Earth System Model (E3SM) land model (ELM) to calculate the sensitivity of five key carbon cycle outputs to 68 model parameters. This GSA is conducted by first constructing a Polynomial Chaos (PC) surrogate via a new Weighted Iterative Bayesian Compressive Sensing (WIBCS) algorithm for adaptive basis growth, leading to a sparse, high-dimensional PC surrogate with 3,000 model evaluations. The PC surrogate allows efficient extraction of GSA information leading to further dimensionality reduction. The GSA is performed at 96 FLUXNET sites covering multiple plant functional types (PFTs) and climate conditions. About 20 of the model parameters are identified as sensitive with the rest being relatively insensitive across all outputs and PFTs. These sensitivities are dependent on PFT, and are relatively consistent among sites within the same PFT. The five model outputs have a majority of their highly sensitive parameters in common. A common subset of sensitive parameters is also shared among PFTs, but some parameters are specific to certain types (e.g., deciduous phenology). The relative importance of these parameters shifts significantly among PFTs and with climatic variables such as mean annual temperature.
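
    A toy sketch of the surrogate-then-sensitivity workflow described above: a degree-2 Hermite polynomial chaos surrogate is fit by least squares to a 3-parameter model, and main-effect sensitivities are read directly off the PC coefficients. The WIBCS sparse basis construction and the 68-parameter ELM setting are far richer than this illustration; the toy model and index set are assumptions of the sketch.

      import numpy as np
      from math import factorial

      def toy_model(t):
          # stand-in "land model output" as a smooth function of three N(0,1) parameters
          return np.sin(t[:, 0]) + 0.7 * t[:, 1] ** 2 + 0.1 * t[:, 0] * t[:, 2]

      def hermite_basis(x):
          # probabilists' Hermite polynomials He_0, He_1, He_2 evaluated elementwise
          return np.stack([np.ones_like(x), x, x ** 2 - 1.0], axis=-1)

      rng = np.random.default_rng(6)
      theta = rng.standard_normal((500, 3))
      y = toy_model(theta)
      H = hermite_basis(theta)                                  # shape (n_samples, 3 params, 3 degrees)

      index_set = [(0, 0, 0), (1, 0, 0), (0, 1, 0), (0, 0, 1),  # tensor-product basis, total degree <= 2
                   (2, 0, 0), (0, 2, 0), (0, 0, 2),
                   (1, 1, 0), (1, 0, 1), (0, 1, 1)]
      Phi = np.stack([H[:, 0, i] * H[:, 1, j] * H[:, 2, k] for i, j, k in index_set], axis=1)
      coef, *_ = np.linalg.lstsq(Phi, y, rcond=None)

      norms = np.array([factorial(i) * factorial(j) * factorial(k) for i, j, k in index_set])
      var_terms = coef ** 2 * norms                             # variance contribution of each PC term
      total_var = var_terms[1:].sum()
      for d in range(3):
          main = [m for m, idx in enumerate(index_set) if idx[d] > 0 and sum(idx) == idx[d]]
          print(f"main-effect sensitivity of parameter {d + 1}: {var_terms[main].sum() / total_var:.2f}")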

  11. Accessing Multi-Dimensional Images and Data Cubes in the Virtual Observatory

    NASA Astrophysics Data System (ADS)

    Tody, Douglas; Plante, R. L.; Berriman, G. B.; Cresitello-Dittmar, M.; Good, J.; Graham, M.; Greene, G.; Hanisch, R. J.; Jenness, T.; Lazio, J.; Norris, P.; Pevunova, O.; Rots, A. H.

    2014-01-01

    Telescopes across the spectrum are routinely producing multi-dimensional images and datasets, such as Doppler velocity cubes, polarization datasets, and time-resolved “movies.” Examples of current telescopes producing such multi-dimensional images include the JVLA, ALMA, and the IFU instruments on large optical and near-infrared wavelength telescopes. In the near future, both the LSST and JWST will also produce such multi-dimensional images routinely. High-energy instruments such as Chandra produce event datasets that are also a form of multi-dimensional data, in effect being a very sparse multi-dimensional image. Ensuring that the data sets produced by these telescopes can be both discovered and accessed by the community is essential and is part of the mission of the Virtual Observatory (VO). The Virtual Astronomical Observatory (VAO, http://www.usvao.org/), in conjunction with its international partners in the International Virtual Observatory Alliance (IVOA), has developed a protocol and an initial demonstration service designed for the publication, discovery, and access of arbitrarily large multi-dimensional images. The protocol describing multi-dimensional images is the Simple Image Access Protocol, version 2, which provides the minimal set of metadata required to characterize a multi-dimensional image for its discovery and access. A companion Image Data Model formally defines the semantics and structure of multi-dimensional images independently of how they are serialized, while providing capabilities such as support for sparse data that are essential to deal effectively with large cubes. A prototype data access service has been deployed and tested, using a suite of multi-dimensional images from a variety of telescopes. The prototype has demonstrated the capability to discover and remotely access multi-dimensional data via standard VO protocols. The prototype informs the specification of a protocol that will be submitted to the IVOA for approval, with an operational data cube service to be delivered in mid-2014. An associated user-installable VO data service framework will provide the capabilities required to publish VO-compatible multi-dimensional images or data cubes.

  12. Sampling design for groundwater solute transport: Tests of methods and analysis of Cape Cod tracer test data

    USGS Publications Warehouse

    Knopman, Debra S.; Voss, Clifford I.; Garabedian, Stephen P.

    1991-01-01

    Tests of a one-dimensional sampling design methodology on measurements of bromide concentration collected during the natural gradient tracer test conducted by the U.S. Geological Survey on Cape Cod, Massachusetts, demonstrate its efficacy for field studies of solute transport in groundwater and the utility of one-dimensional analysis. The methodology was applied to design of sparse two-dimensional networks of fully screened wells typical of those often used in engineering practice. In one-dimensional analysis, designs consist of the downstream distances to rows of wells oriented perpendicular to the groundwater flow direction and the timing of sampling to be carried out on each row. The power of a sampling design is measured by its effectiveness in simultaneously meeting objectives of model discrimination, parameter estimation, and cost minimization. One-dimensional models of solute transport, differing in processes affecting the solute and assumptions about the structure of the flow field, were considered for description of tracer cloud migration. When fitting each model using nonlinear regression, additive and multiplicative error forms were allowed for the residuals which consist of both random and model errors. The one-dimensional single-layer model of a nonreactive solute with multiplicative error was judged to be the best of those tested. Results show the efficacy of the methodology in designing sparse but powerful sampling networks. Designs that sample five rows of wells at five or fewer times in any given row performed as well for model discrimination as the full set of samples taken up to eight times in a given row from as many as 89 rows. Also, designs for parameter estimation judged to be good by the methodology were as effective in reducing the variance of parameter estimates as arbitrary designs with many more samples. Results further showed that estimates of velocity and longitudinal dispersivity in one-dimensional models based on data from only five rows of fully screened wells each sampled five or fewer times were practically equivalent to values determined from moments analysis of the complete three-dimensional set of 29,285 samples taken during 16 sampling times.

  13. Dictionary-Based Tensor Canonical Polyadic Decomposition

    NASA Astrophysics Data System (ADS)

    Cohen, Jeremy Emile; Gillis, Nicolas

    2018-04-01

    To ensure interpretability of extracted sources in tensor decomposition, we introduce in this paper a dictionary-based tensor canonical polyadic decomposition which enforces one factor to belong exactly to a known dictionary. A new formulation of sparse coding is proposed which enables dictionary-based canonical polyadic decomposition of high-dimensional tensors. The benefits of using a dictionary in tensor decomposition models are explored both in terms of parameter identifiability and estimation accuracy. The performance of the proposed algorithms is evaluated on the decomposition of simulated data and the unmixing of hyperspectral images.

  14. State estimation and prediction using clustered particle filters.

    PubMed

    Lee, Yoonsang; Majda, Andrew J

    2016-12-20

    Particle filtering is an essential tool to improve uncertain model predictions by incorporating noisy observational data from complex systems including non-Gaussian features. A class of particle filters, clustered particle filters, is introduced for high-dimensional nonlinear systems, which uses relatively few particles compared with the standard particle filter. The clustered particle filter captures non-Gaussian features of the true signal, which are typical in complex nonlinear dynamical systems such as geophysical systems. The method is also robust in the difficult regime of high-quality sparse and infrequent observations. The key features of the clustered particle filtering are coarse-grained localization through the clustering of the state variables and particle adjustment to stabilize the method; each observation affects only neighbor state variables through clustering and particles are adjusted to prevent particle collapse due to high-quality observations. The clustered particle filter is tested for the 40-dimensional Lorenz 96 model with several dynamical regimes including strongly non-Gaussian statistics. The clustered particle filter shows robust skill in both achieving accurate filter results and capturing non-Gaussian statistics of the true signal. It is further extended to multiscale data assimilation, which provides the large-scale estimation by combining a cheap reduced-order forecast model and mixed observations of the large- and small-scale variables. This approach enables the use of a larger number of particles due to the computational savings in the forecast model. The multiscale clustered particle filter is tested for one-dimensional dispersive wave turbulence using a forecast model with model errors.

  15. State estimation and prediction using clustered particle filters

    PubMed Central

    Lee, Yoonsang; Majda, Andrew J.

    2016-01-01

    Particle filtering is an essential tool to improve uncertain model predictions by incorporating noisy observational data from complex systems including non-Gaussian features. A class of particle filters, clustered particle filters, is introduced for high-dimensional nonlinear systems, which uses relatively few particles compared with the standard particle filter. The clustered particle filter captures non-Gaussian features of the true signal, which are typical in complex nonlinear dynamical systems such as geophysical systems. The method is also robust in the difficult regime of high-quality sparse and infrequent observations. The key features of the clustered particle filtering are coarse-grained localization through the clustering of the state variables and particle adjustment to stabilize the method; each observation affects only neighbor state variables through clustering and particles are adjusted to prevent particle collapse due to high-quality observations. The clustered particle filter is tested for the 40-dimensional Lorenz 96 model with several dynamical regimes including strongly non-Gaussian statistics. The clustered particle filter shows robust skill in both achieving accurate filter results and capturing non-Gaussian statistics of the true signal. It is further extended to multiscale data assimilation, which provides the large-scale estimation by combining a cheap reduced-order forecast model and mixed observations of the large- and small-scale variables. This approach enables the use of a larger number of particles due to the computational savings in the forecast model. The multiscale clustered particle filter is tested for one-dimensional dispersive wave turbulence using a forecast model with model errors. PMID:27930332

  16. Energy budgets and resistances to energy transport in sparsely vegetated rangeland

    USGS Publications Warehouse

    Nichols, W.D.

    1992-01-01

    Partitioning available energy between plants and bare soil in sparsely vegetated rangelands will allow hydrologists and others to gain a greater understanding of water use by native vegetation, especially phreatophytes. Standard methods of conducting energy budget studies result in measurements of latent and sensible heat fluxes above the plant canopy which therefore include the energy fluxes from both the canopy and the soil. One-dimensional theoretical numerical models have been proposed recently for the partitioning of energy in sparse crops. Bowen ratio and other micrometeorological data collected over phreatophytes growing in areas of shallow ground water in central Nevada were used to evaluate the feasibility of using these models, which are based on surface and within-canopy aerodynamic resistances, to determine heat and water vapor transport in sparsely vegetated rangelands. The models appear to provide reasonably good estimates of sensible heat flux from the soil and latent heat flux from the canopy. Estimates of latent heat flux from the soil were less satisfactory. Sensible heat flux from the canopy was not well predicted by the present resistance formulations. Also, estimates of total above-canopy fluxes were not satisfactory when using a single value for above-canopy bulk aerodynamic resistance. © 1992.

  17. SPARSE-A subgrid particle averaged Reynolds stress equivalent model: testing with a priori closure.

    PubMed

    Davis, Sean L; Jacobs, Gustaaf B; Sen, Oishik; Udaykumar, H S

    2017-03-01

    A Lagrangian particle cloud model is proposed that accounts for the effects of Reynolds-averaged particle and turbulent stresses and the averaged carrier-phase velocity of the subparticle cloud scale on the averaged motion and velocity of the cloud. The SPARSE (subgrid particle averaged Reynolds stress equivalent) model is based on a combination of a truncated Taylor expansion of a drag correction function and Reynolds averaging. It reduces the required number of computational parcels to trace a cloud of particles in Eulerian-Lagrangian methods for the simulation of particle-laden flow. Closure is performed in an a priori manner using a reference simulation where all particles in the cloud are traced individually with a point-particle model. Comparison of a first-order model and SPARSE with the reference simulation in one dimension shows that both the stress and the averaging of the carrier-phase velocity on the cloud subscale affect the averaged motion of the particle. A three-dimensional isotropic turbulence computation shows that only one computational parcel is sufficient to accurately trace a cloud of tens of thousands of particles.

  18. Estimation and Selection via Absolute Penalized Convex Minimization And Its Multistage Adaptive Applications

    PubMed Central

    Huang, Jian; Zhang, Cun-Hui

    2013-01-01

    The ℓ1-penalized method, or the Lasso, has emerged as an important tool for the analysis of large data sets. Many important results have been obtained for the Lasso in linear regression which have led to a deeper understanding of high-dimensional statistical problems. In this article, we consider a class of weighted ℓ1-penalized estimators for convex loss functions of a general form, including the generalized linear models. We study the estimation, prediction, selection and sparsity properties of the weighted ℓ1-penalized estimator in sparse, high-dimensional settings where the number of predictors p can be much larger than the sample size n. Adaptive Lasso is considered as a special case. A multistage method is developed to approximate concave regularized estimation by applying an adaptive Lasso recursively. We provide prediction and estimation oracle inequalities for single- and multi-stage estimators, a general selection consistency theorem, and an upper bound for the dimension of the Lasso estimator. Important models including the linear regression, logistic regression and log-linear models are used throughout to illustrate the applications of the general results. PMID:24348100

  19. STRONG ORACLE OPTIMALITY OF FOLDED CONCAVE PENALIZED ESTIMATION.

    PubMed

    Fan, Jianqing; Xue, Lingzhou; Zou, Hui

    2014-06-01

    Folded concave penalization methods have been shown to enjoy the strong oracle property for high-dimensional sparse estimation. However, a folded concave penalization problem usually has multiple local solutions and the oracle property is established only for one of the unknown local solutions. A challenging fundamental issue still remains that it is not clear whether the local optimum computed by a given optimization algorithm possesses those nice theoretical properties. To close this important theoretical gap in over a decade, we provide a unified theory to show explicitly how to obtain the oracle solution via the local linear approximation algorithm. For a folded concave penalized estimation problem, we show that as long as the problem is localizable and the oracle estimator is well behaved, we can obtain the oracle estimator by using the one-step local linear approximation. In addition, once the oracle estimator is obtained, the local linear approximation algorithm converges, namely it produces the same estimator in the next iteration. The general theory is demonstrated by using four classical sparse estimation problems, i.e., sparse linear regression, sparse logistic regression, sparse precision matrix estimation and sparse quantile regression.
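
    A minimal sketch of the one-step local linear approximation under a SCAD-type penalty, assuming a Lasso pilot fit: the concave penalty is replaced by its tangent weights p'_lambda(|beta_initial|), and the resulting weighted Lasso is solved by rescaling columns. The column-rescaling trick and all parameter values are illustrative assumptions of this sketch, not the paper's reference implementation.

      import numpy as np
      from sklearn.linear_model import Lasso

      def scad_derivative(beta, lam, a=3.7):
          """Derivative of the SCAD penalty evaluated at |beta| (elementwise)."""
          b = np.abs(beta)
          return np.where(b <= lam, lam, np.maximum(a * lam - b, 0.0) / (a - 1.0))

      def one_step_lla(X, y, lam):
          pilot = Lasso(alpha=lam).fit(X, y)               # initial (Lasso) estimate
          w = scad_derivative(pilot.coef_, lam) / lam      # tangent weights, scaled to [0, 1]
          w = np.maximum(w, 1e-8)                          # guard against exactly unpenalized coords
          fit = Lasso(alpha=lam).fit(X / w, y)             # weighted Lasso via column rescaling
          return fit.coef_ / w                             # map back to the original scale

      rng = np.random.default_rng(0)
      X = rng.standard_normal((100, 50))
      beta = np.zeros(50)
      beta[:3] = [3.0, -2.0, 1.5]
      y = X @ beta + 0.5 * rng.standard_normal(100)
      print(np.round(one_step_lla(X, y, lam=0.1)[:5], 2))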

  20. STRONG ORACLE OPTIMALITY OF FOLDED CONCAVE PENALIZED ESTIMATION

    PubMed Central

    Fan, Jianqing; Xue, Lingzhou; Zou, Hui

    2014-01-01

    Folded concave penalization methods have been shown to enjoy the strong oracle property for high-dimensional sparse estimation. However, a folded concave penalization problem usually has multiple local solutions and the oracle property is established only for one of the unknown local solutions. A challenging fundamental issue still remains that it is not clear whether the local optimum computed by a given optimization algorithm possesses those nice theoretical properties. To close this important theoretical gap in over a decade, we provide a unified theory to show explicitly how to obtain the oracle solution via the local linear approximation algorithm. For a folded concave penalized estimation problem, we show that as long as the problem is localizable and the oracle estimator is well behaved, we can obtain the oracle estimator by using the one-step local linear approximation. In addition, once the oracle estimator is obtained, the local linear approximation algorithm converges, namely it produces the same estimator in the next iteration. The general theory is demonstrated by using four classical sparse estimation problems, i.e., sparse linear regression, sparse logistic regression, sparse precision matrix estimation and sparse quantile regression. PMID:25598560

  1. Sparse partial least squares regression for simultaneous dimension reduction and variable selection

    PubMed Central

    Chun, Hyonho; Keleş, Sündüz

    2010-01-01

    Partial least squares regression has been an alternative to ordinary least squares for handling multicollinearity in several areas of scientific research since the 1960s. It has recently gained much attention in the analysis of high dimensional genomic data. We show that known asymptotic consistency of the partial least squares estimator for a univariate response does not hold with the very large p and small n paradigm. We derive a similar result for a multivariate response regression with partial least squares. We then propose a sparse partial least squares formulation which aims simultaneously to achieve good predictive performance and variable selection by producing sparse linear combinations of the original predictors. We provide an efficient implementation of sparse partial least squares regression and compare it with well-known variable selection and dimension reduction approaches via simulation experiments. We illustrate the practical utility of sparse partial least squares regression in a joint analysis of gene expression and genomewide binding data. PMID:20107611

  2. Nested sparse grid collocation method with delay and transformation for subsurface flow and transport problems

    NASA Astrophysics Data System (ADS)

    Liao, Qinzhuo; Zhang, Dongxiao; Tchelepi, Hamdi

    2017-06-01

    In numerical modeling of subsurface flow and transport problems, formation properties may not be deterministically characterized, which leads to uncertainty in simulation results. In this study, we propose a sparse grid collocation method, which adopts nested quadrature rules with delay and transformation to quantify the uncertainty of model solutions. We show that the nested Kronrod-Patterson-Hermite quadrature is more efficient than the unnested Gauss-Hermite quadrature. We compare the convergence rates of various quadrature rules including the domain truncation and domain mapping approaches. To further improve accuracy and efficiency, we present a delayed process in selecting quadrature nodes and a transformed process for approximating unsmooth or discontinuous solutions. The proposed method is tested by an analytical function and in one-dimensional single-phase and two-phase flow problems with different spatial variances and correlation lengths. An additional example is given to demonstrate its applicability to three-dimensional black-oil models. It is found from these examples that the proposed method provides a promising approach for obtaining satisfactory estimation of the solution statistics and is much more efficient than the Monte-Carlo simulations.
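
    A one-dimensional sketch of the basic trade-off behind the collocation approach described above, assuming a single Gaussian input: the mean of a smooth model output is computed by Gauss-Hermite quadrature and compared against plain Monte Carlo. The nested Kronrod-Patterson rule and the delay/transformation steps of the paper are not implemented; the toy model is an assumption of the sketch.

      import numpy as np

      def model(xi):
          # toy "flow response": smooth nonlinear function of a standard normal parameter
          return np.exp(0.3 * xi) / (1.0 + xi ** 2)

      # Gauss-Hermite quadrature for E[model(Z)], Z ~ N(0, 1)
      nodes, weights = np.polynomial.hermite.hermgauss(9)
      gh_estimate = np.sum(weights * model(np.sqrt(2.0) * nodes)) / np.sqrt(np.pi)

      # Monte Carlo reference with many more model evaluations
      rng = np.random.default_rng(0)
      mc_estimate = model(rng.standard_normal(200_000)).mean()

      print(gh_estimate, mc_estimate)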

  3. Algorithms and Application of Sparse Matrix Assembly and Equation Solvers for Aeroacoustics

    NASA Technical Reports Server (NTRS)

    Watson, W. R.; Nguyen, D. T.; Reddy, C. J.; Vatsa, V. N.; Tang, W. H.

    2001-01-01

    An algorithm for symmetric sparse equation solutions on an unstructured grid is described. Efficient, sequential sparse algorithms for degree-of-freedom reordering, supernodes, symbolic/numerical factorization, and forward-backward solution phases are reviewed. Three sparse algorithms for the generation and assembly of symmetric systems of matrix equations are presented. The accuracy and numerical performance of the sequential version of the sparse algorithms are evaluated over the frequency range of interest in a three-dimensional aeroacoustics application. Results show that the solver solutions are accurate using a discretization of 12 points per wavelength. Results also show that the first assembly algorithm is impractical for high-frequency noise calculations. The second and third assembly algorithms have nearly equal performance at low values of source frequencies, but at higher values of source frequencies the third algorithm saves CPU time and RAM. The CPU time and the RAM required by the second and third assembly algorithms are two orders of magnitude smaller than those required by the sparse equation solver. A sequential version of these sparse algorithms can, therefore, be conveniently incorporated into a substructuring formulation for domain decomposition to achieve parallel computation, where different substructures are handled by different parallel processors.

  4. Joint analysis of multiple high-dimensional data types using sparse matrix approximations of rank-1 with applications to ovarian and liver cancer.

    PubMed

    Okimoto, Gordon; Zeinalzadeh, Ashkan; Wenska, Tom; Loomis, Michael; Nation, James B; Fabre, Tiphaine; Tiirikainen, Maarit; Hernandez, Brenda; Chan, Owen; Wong, Linda; Kwee, Sandi

    2016-01-01

    Technological advances enable the cost-effective acquisition of Multi-Modal Data Sets (MMDS) composed of measurements for multiple, high-dimensional data types obtained from a common set of bio-samples. The joint analysis of the data matrices associated with the different data types of an MMDS should provide a more focused view of the biology underlying complex diseases such as cancer that would not be apparent from the analysis of a single data type alone. As multi-modal data rapidly accumulate in research laboratories and public databases such as The Cancer Genome Atlas (TCGA), the translation of such data into clinically actionable knowledge has been slowed by the lack of computational tools capable of analyzing MMDSs. Here, we describe the Joint Analysis of Many Matrices by ITeration (JAMMIT) algorithm that jointly analyzes the data matrices of an MMDS using sparse matrix approximations of rank-1. The JAMMIT algorithm jointly approximates an arbitrary number of data matrices by rank-1 outer-products composed of "sparse" left-singular vectors (eigen-arrays) that are unique to each matrix and a right-singular vector (eigen-signal) that is common to all the matrices. The non-zero coefficients of the eigen-arrays identify small subsets of variables for each data type (i.e., signatures) that in aggregate, or individually, best explain a dominant eigen-signal defined on the columns of the data matrices. The approximation is specified by a single "sparsity" parameter that is selected based on false discovery rate estimated by permutation testing. Multiple signals of interest in a given MMDS are sequentially detected and modeled by iterating JAMMIT on "residual" data matrices that result from a given sparse approximation. We show that JAMMIT outperforms other joint analysis algorithms in the detection of multiple signatures embedded in simulated MMDSs. On real multimodal data for ovarian and liver cancer we show that JAMMIT identified multi-modal signatures that were clinically informative and enriched for cancer-related biology. Sparse matrix approximations of rank-1 provide a simple yet effective means of jointly reducing multiple, big data types to a small subset of variables that characterize important clinical and/or biological attributes of the bio-samples from which the data were acquired.
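
    A small sketch of the joint sparse rank-1 idea described above, for two stacked data matrices: alternate between a shared right-singular vector (the common eigen-signal) and soft-thresholded left vectors (the sparse eigen-arrays). The threshold rule, fixed iteration count, and synthetic data are illustrative assumptions, not the published JAMMIT specification.

      import numpy as np

      def soft_threshold(v, t):
          return np.sign(v) * np.maximum(np.abs(v) - t, 0.0)

      def joint_sparse_rank1(mats, sparsity=0.1, n_iter=50):
          """Alternating updates: sparse left vectors per matrix, one shared right vector."""
          v = np.random.default_rng(0).standard_normal(mats[0].shape[1])
          v /= np.linalg.norm(v)
          for _ in range(n_iter):
              us = []
              for M in mats:
                  u = M @ v
                  us.append(soft_threshold(u, sparsity * np.max(np.abs(u))))
              combined = sum(M.T @ u for M, u in zip(mats, us))   # shared eigen-signal update
              v = combined / np.linalg.norm(combined)
          return us, v

      rng = np.random.default_rng(2)
      signal = rng.standard_normal(40)                            # common signal across samples
      M1 = np.outer(np.r_[np.ones(5), np.zeros(95)], signal) + 0.1 * rng.standard_normal((100, 40))
      M2 = np.outer(np.r_[np.zeros(190), np.ones(10)], signal) + 0.1 * rng.standard_normal((200, 40))
      us, v = joint_sparse_rank1([M1, M2])
      print([int(np.sum(u != 0)) for u in us])                    # size of each sparse signature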

  5. Low-rank factorization of electron integral tensors and its application in electronic structure theory

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Peng, Bo; Kowalski, Karol

    In this letter, we introduce the reverse Cuthill-McKee (RCM) algorithm, which is often used for the bandwidth reduction of sparse tensors, to transform the two-electron integral tensors to their block diagonal forms. By further applying the pivoted Cholesky decomposition (CD) on each of the diagonal blocks, we are able to represent the high-dimensional two-electron integral tensors in terms of permutation matrices and low-rank Cholesky vectors. This representation facilitates the low-rank factorization of the high-dimensional tensor contractions that are usually encountered in post-Hartree-Fock calculations. In this letter, we discuss the second-order Møller-Plesset (MP2) method and linear coupled-cluster model with doubles (L-CCD) as two simple examples to demonstrate the efficiency of the RCM-CD technique in representing two-electron integrals in a compact form.
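
    A hedged sketch of the two numerical building blocks named above, applied to a random sparse symmetric positive-definite matrix standing in for one diagonal block: reverse Cuthill-McKee reordering from SciPy followed by a simple greedy pivoted Cholesky factorization. The electron-integral tensors, the blocking step, and the MP2/L-CCD contractions are not reproduced here.

      import numpy as np
      import scipy.sparse as sp
      from scipy.sparse.csgraph import reverse_cuthill_mckee

      def pivoted_cholesky(A, tol=1e-8, max_rank=None):
          """Return L with A ~= L @ L.T, built greedily from the largest diagonal pivots."""
          A = np.asarray(A, dtype=float)
          n = A.shape[0]
          max_rank = max_rank or n
          d = np.diag(A).copy()
          L = np.zeros((n, 0))
          for _ in range(max_rank):
              j = int(np.argmax(d))
              if d[j] <= tol:
                  break
              col = (A[:, j] - L @ L[j, :]) / np.sqrt(d[j])
              L = np.column_stack([L, col])
              d -= col ** 2
          return L

      # Random sparse SPD matrix as a stand-in for one symmetric block
      B = sp.random(60, 60, density=0.05, format="csr", random_state=0)
      A = (B @ B.T + sp.identity(60)).tocsr()
      perm = reverse_cuthill_mckee(A, symmetric_mode=True)        # bandwidth-reducing permutation
      A_perm = A[perm][:, perm].toarray()
      L = pivoted_cholesky(A_perm, tol=1e-10)
      print(L.shape, np.linalg.norm(A_perm - L @ L.T))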

  6. Influence plots for LASSO

    DOE PAGES

    Jang, Dae -Heung; Anderson-Cook, Christine Michaela

    2016-11-22

    With many predictors in regression, fitting the full model can induce multicollinearity problems. The Least Absolute Shrinkage and Selection Operator (LASSO) is useful when the effects of many explanatory variables are sparse in a high-dimensional dataset. Influential points can have a disproportionate impact on the estimated values of model parameters. This paper describes a new influence plot that can be used to increase understanding of the contributions of individual observations and the robustness of results. This can serve as a complement to other regression diagnostics techniques in the LASSO regression setting. Using this influence plot, we can find influential points and their impact on shrinkage of model parameters and model selection. Lastly, we provide two examples to illustrate the methods.
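
    A minimal sketch of one way to quantify observation influence in a LASSO fit, assuming a leave-one-out refitting scheme: drop each observation in turn and record how far the coefficient vector (and the selected support) moves. The distance used here is an illustrative choice of this sketch, not the plot statistic defined in the paper.

      import numpy as np
      from sklearn.linear_model import Lasso

      rng = np.random.default_rng(1)
      n, p = 60, 200
      X = rng.standard_normal((n, p))
      beta = np.zeros(p)
      beta[:4] = [2.0, -1.5, 1.0, 0.5]
      y = X @ beta + rng.standard_normal(n)

      alpha = 0.1
      full = Lasso(alpha=alpha).fit(X, y)
      influence = np.empty(n)
      support_change = np.empty(n, dtype=int)
      for i in range(n):
          keep = np.delete(np.arange(n), i)                      # leave observation i out
          fit = Lasso(alpha=alpha).fit(X[keep], y[keep])
          influence[i] = np.linalg.norm(fit.coef_ - full.coef_)  # shift of the coefficient vector
          support_change[i] = np.sum((fit.coef_ != 0) != (full.coef_ != 0))

      print("most influential observations:", np.argsort(influence)[-3:])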

  7. Influence plots for LASSO

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Jang, Dae -Heung; Anderson-Cook, Christine Michaela

    With many predictors in regression, fitting the full model can induce multicollinearity problems. The Least Absolute Shrinkage and Selection Operator (LASSO) is useful when the effects of many explanatory variables are sparse in a high-dimensional dataset. Influential points can have a disproportionate impact on the estimated values of model parameters. This paper describes a new influence plot that can be used to increase understanding of the contributions of individual observations and the robustness of results. This can serve as a complement to other regression diagnostics techniques in the LASSO regression setting. Using this influence plot, we can find influential points and their impact on shrinkage of model parameters and model selection. Lastly, we provide two examples to illustrate the methods.

  8. High dimensional linear regression models under long memory dependence and measurement error

    NASA Astrophysics Data System (ADS)

    Kaul, Abhishek

    This dissertation consists of three chapters. The first chapter introduces the models under consideration and motivates problems of interest. A brief literature review is also provided in this chapter. The second chapter investigates the properties of Lasso under long range dependent model errors. Lasso is a computationally efficient approach to model selection and estimation, and its properties are well studied when the regression errors are independent and identically distributed. We study the case where the regression errors form a long memory moving average process. We establish a finite sample oracle inequality for the Lasso solution. We then show the asymptotic sign consistency in this setup. These results are established in the high dimensional setup (p > n) where p can be increasing exponentially with n. Finally, we show the n^(1/2-d)-consistency of Lasso, along with the oracle property of adaptive Lasso, in the case where p is fixed. Here d is the memory parameter of the stationary error sequence. The performance of Lasso is also analysed in the present setup with a simulation study. The third chapter proposes and investigates the properties of a penalized quantile based estimator for measurement error models. Standard formulations of prediction problems in high dimension regression models assume the availability of fully observed covariates and sub-Gaussian and homogeneous model errors. This makes these methods inapplicable to measurement error models where covariates are unobservable and observations are possibly non sub-Gaussian and heterogeneous. We propose weighted penalized corrected quantile estimators for the regression parameter vector in linear regression models with additive measurement errors, where unobservable covariates are nonrandom. The proposed estimators forgo the need for the above mentioned model assumptions. We study these estimators in both the fixed dimension and high dimensional sparse setups; in the latter, the dimensionality can grow exponentially with the sample size. In the fixed dimensional setting we provide the oracle properties associated with the proposed estimators. In the high dimensional setting, we provide bounds for the statistical error associated with the estimation that hold with asymptotic probability 1, thereby providing the ℓ1-consistency of the proposed estimator. We also establish the model selection consistency in terms of the correctly estimated zero components of the parameter vector. A simulation study that investigates the finite sample accuracy of the proposed estimator is also included in this chapter.
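
    A hedged sketch of the setting studied in the second chapter: a long-memory moving-average error process is simulated from truncated fractional-differencing MA weights, and a Lasso is fit to the resulting high-dimensional regression. The memory parameter, truncation length, and penalty level are illustrative assumptions only.

      import numpy as np
      from scipy.special import gammaln
      from sklearn.linear_model import Lasso

      def long_memory_noise(n, d=0.3, n_lags=500, rng=None):
          """Truncated MA(inf) representation of (1 - B)^(-d) applied to white noise."""
          if rng is None:
              rng = np.random.default_rng(0)
          k = np.arange(n_lags)
          psi = np.exp(gammaln(k + d) - gammaln(d) - gammaln(k + 1))   # fractional MA weights
          eps = rng.standard_normal(n + n_lags)
          return np.array([psi @ eps[t:t + n_lags][::-1] for t in range(n)])

      rng = np.random.default_rng(4)
      n, p = 200, 400
      X = rng.standard_normal((n, p))
      beta = np.zeros(p)
      beta[:3] = [1.5, -1.0, 2.0]
      y = X @ beta + long_memory_noise(n, d=0.3, rng=rng)
      fit = Lasso(alpha=0.2).fit(X, y)
      print("selected coordinates:", np.nonzero(fit.coef_)[0][:10])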

  9. Folded concave penalized learning in identifying multimodal MRI marker for Parkinson’s disease

    PubMed Central

    Liu, Hongcheng; Du, Guangwei; Zhang, Lijun; Lewis, Mechelle M.; Wang, Xue; Yao, Tao; Li, Runze; Huang, Xuemei

    2016-01-01

    Background Brain MRI holds promise to gauge different aspects of Parkinson’s disease (PD)-related pathological changes. Its analysis, however, is hindered by the high-dimensional nature of the data. New method This study introduces folded concave penalized (FCP) sparse logistic regression to identify biomarkers for PD from a large number of potential factors. The proposed statistical procedures target the challenges of high-dimensionality with limited data samples acquired. The maximization problem associated with the sparse logistic regression model is solved by local linear approximation. The proposed procedures are then applied to the empirical analysis of multimodal MRI data. Results From 45 features, the proposed approach identified 15 MRI markers and the UPSIT, which are known to be clinically relevant to PD. By combining the MRI and clinical markers, we can enhance substantially the specificity and sensitivity of the model, as indicated by the ROC curves. Comparison to existing methods We compare the folded concave penalized learning scheme with both the Lasso penalized scheme and the principal component analysis-based feature selection (PCA) in the Parkinson’s biomarker identification problem that takes into account both the clinical features and MRI markers. The folded concave penalty method demonstrates a substantially better clinical potential than both the Lasso and PCA in terms of specificity and sensitivity. Conclusions For the first time, we applied the FCP learning method to MRI biomarker discovery in PD. The proposed approach successfully identified MRI markers that are clinically relevant. Combining these biomarkers with clinical features can substantially enhance performance.

  10. Rapid determination of biogenic amines in cooked beef using hyperspectral imaging with sparse representation algorithm

    NASA Astrophysics Data System (ADS)

    Yang, Dong; Lu, Anxiang; Ren, Dong; Wang, Jihua

    2017-11-01

    This study explored the feasibility of rapid detection of biogenic amines (BAs) in cooked beef during storage using the hyperspectral imaging technique combined with a sparse representation (SR) algorithm. The hyperspectral images of samples were collected in two spectral ranges, 400-1000 nm and 1000-1800 nm, separately. The dimensionality of the spectral data was reduced by the SR and principal component analysis (PCA) algorithms, and the reduced features were then combined with the least squares support vector machine (LS-SVM) to build SR-LS-SVM and PC-LS-SVM models for the prediction of BA values in cooked beef. The results showed that the SR-LS-SVM model exhibited the best predictive ability, with a determination coefficient (RP2) of 0.943 and a root mean square error of prediction (RMSEP) of 1.206 in the 400-1000 nm range of the prediction set. The SR and PCA algorithms were further combined to establish the best SR-PC-LS-SVM model for BA prediction, which had a high RP2 of 0.969 and a low RMSEP of 1.039 in the 400-1000 nm region. The visual map of the BAs was generated using the best SR-PC-LS-SVM model with imaging processing algorithms, which could be used to observe the changes of BAs in cooked beef more intuitively. The study demonstrated that the hyperspectral imaging technique combined with sparse representation can effectively detect BA values in cooked beef during storage, and that the SR-PC-LS-SVM model has potential for rapid and accurate determination of freshness indexes in other meat and meat products.

  11. Diabetic retinopathy risk prediction for fundus examination using sparse learning: a cross-sectional study.

    PubMed

    Oh, Ein; Yoo, Tae Keun; Park, Eun-Cheol

    2013-09-13

    Blindness due to diabetic retinopathy (DR) is the major disability in diabetic patients. Although early management has been shown to prevent vision loss, diabetic patients have a low rate of routine ophthalmologic examination. Hence, we developed and validated sparse learning models with the aim of identifying the risk of DR in diabetic patients. Health records from the Korea National Health and Nutrition Examination Surveys (KNHANES) V-1 were used. The prediction models for DR were constructed using data from 327 diabetic patients, and were validated internally on 163 patients in the KNHANES V-1. External validation was performed using 562 diabetic patients in the KNHANES V-2. The learning models, including ridge, elastic net, and LASSO, were compared to the traditional indicators of DR. Considering the Bayesian information criterion, LASSO predicted DR most efficiently. In the internal and external validation, LASSO was significantly superior to the traditional indicators in terms of the area under the receiver operating characteristic curve (AUC). LASSO showed an AUC of 0.81 and an accuracy of 73.6% in the internal validation, and an AUC of 0.82 and an accuracy of 75.2% in the external validation. The sparse learning model using LASSO was effective in analyzing the epidemiological underlying patterns of DR. This is the first study to develop a machine learning model to predict DR risk using health records. LASSO can be an excellent choice when both discriminative power and variable selection are important in the analysis of high-dimensional electronic health records.
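
    A minimal sketch of this kind of analysis, using synthetic data in place of the KNHANES records: an L1-penalized (LASSO) logistic regression fitted with cross-validated regularization and evaluated by AUC on a held-out set (scikit-learn assumed).

```python
# Sketch: L1-penalized (LASSO) logistic regression with AUC evaluation, as a
# stand-in for the DR risk model; synthetic data only.
import numpy as np
from sklearn.linear_model import LogisticRegressionCV
from sklearn.model_selection import train_test_split
from sklearn.metrics import roc_auc_score

rng = np.random.default_rng(0)
X = rng.standard_normal((490, 30))                     # ~327 training + 163 validation records
coef = np.zeros(30)
coef[[0, 3, 7]] = [1.2, -0.8, 1.5]                     # only a few informative variables
y = (rng.random(490) < 1.0 / (1.0 + np.exp(-(X @ coef - 0.5)))).astype(int)

X_tr, X_va, y_tr, y_va = train_test_split(X, y, test_size=1/3, random_state=0)
model = LogisticRegressionCV(penalty="l1", solver="liblinear", Cs=10, cv=5).fit(X_tr, y_tr)
auc = roc_auc_score(y_va, model.predict_proba(X_va)[:, 1])
print(f"validation AUC: {auc:.2f}, nonzero coefficients: {np.count_nonzero(model.coef_)}")
```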

  12. Technical note: an R package for fitting sparse neural networks with application in animal breeding.

    PubMed

    Wang, Yangfan; Mi, Xue; Rosa, Guilherme J M; Chen, Zhihui; Lin, Ping; Wang, Shi; Bao, Zhenmin

    2018-05-04

    Neural networks (NNs) have emerged as a new tool for genomic selection (GS) in animal breeding. However, the properties of NNs used in GS for the prediction of phenotypic outcomes are not well characterized, due to the over-parameterization of NNs and the difficulty of using whole-genome marker sets as high-dimensional NN input. In this note, we have developed an R package called snnR that finds an optimal sparse structure of an NN by minimizing the squared error subject to a penalty on the L1-norm of the parameters (weights and biases), thereby solving the over-parameterization problem. We have also tested several models fitted with the snnR package to demonstrate their feasibility and effectiveness in a number of example cases. In comparison with the R package brnn (Bayesian regularized single-layer NNs), with both using the entries of a genotype matrix or a genomic relationship matrix as inputs, snnR greatly improves the computational efficiency and the prediction ability for GS in animal breeding because it implements a sparse NN with many hidden layers.
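
    snnR itself is an R package; the following Python/PyTorch sketch only illustrates its core idea of training a small network on squared error plus an L1 penalty on all weights and biases, which drives many parameters toward zero. The data, architecture, and penalty strength are illustrative.

```python
# Python/PyTorch sketch of the snnR idea: squared-error loss plus an L1
# penalty on all weights and biases of a small feed-forward network.
import torch
import torch.nn as nn

torch.manual_seed(0)
n, p = 300, 100                                    # animals x markers (synthetic stand-in)
X = torch.randn(n, p)
true_w = torch.tensor([1.0, -0.5, 0.8, 0.3, -1.2])
y = (X[:, :5] @ true_w).unsqueeze(1) + 0.1 * torch.randn(n, 1)

net = nn.Sequential(nn.Linear(p, 16), nn.Tanh(), nn.Linear(16, 1))
opt = torch.optim.Adam(net.parameters(), lr=1e-2)
lam = 1e-3                                         # L1 strength (illustrative)

for epoch in range(500):
    opt.zero_grad()
    loss = nn.functional.mse_loss(net(X), y)
    l1 = sum(param.abs().sum() for param in net.parameters())
    (loss + lam * l1).backward()
    opt.step()

n_total = sum(param.numel() for param in net.parameters())
n_small = sum(int((param.abs() < 1e-3).sum()) for param in net.parameters())
print(f"parameters effectively zero: {n_small} / {n_total}")
```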

  13. Compressed digital holography: from micro towards macro

    NASA Astrophysics Data System (ADS)

    Schretter, Colas; Bettens, Stijn; Blinder, David; Pesquet-Popescu, Béatrice; Cagnazzo, Marco; Dufaux, Frédéric; Schelkens, Peter

    2016-09-01

    signal processing methods from software-driven computer engineering and applied mathematics. The compressed sensing theory in particular established a practical framework for reconstructing the scene content using few linear combinations of complex measurements and a sparse prior for regularizing the solution. Compressed sensing found direct applications in digital holography for microscopy. Indeed, the wave propagation phenomenon in free space mixes in a natural way the spatial distribution of point sources from the 3-dimensional scene. As the 3-dimensional scene is mapped to a 2-dimensional hologram, the hologram samples form a compressed representation of the scene as well. This overview paper discusses contributions in the field of compressed digital holography at the micro scale. Then, an outreach on future extensions towards the real-size macro scale is discussed. Thanks to advances in sensor technologies, increasing computing power and the recent improvements in sparse digital signal processing, holographic modalities are on the verge of practical high-quality visualization at a macroscopic scale where much higher resolution holograms must be acquired and processed on the computer.
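
    The compressed sensing principle referred to here can be illustrated with a generic toy example (not holography-specific): a sparse signal is recovered from far fewer random linear measurements than unknowns by l1-regularized least squares, here via scikit-learn's Lasso.

```python
# Toy compressed-sensing reconstruction: a sparse signal is recovered from
# far fewer random linear measurements than signal samples by l1-regularized
# least squares.
import numpy as np
from sklearn.linear_model import Lasso

rng = np.random.default_rng(0)
n, m, k = 400, 120, 8                          # signal length, measurements, nonzeros
x = np.zeros(n)
x[rng.choice(n, k, replace=False)] = 3 * rng.standard_normal(k)

A = rng.standard_normal((m, n)) / np.sqrt(m)   # random sensing matrix
y = A @ x                                      # compressed measurements

x_hat = Lasso(alpha=1e-3, max_iter=50000).fit(A, y).coef_
print("relative error:", np.linalg.norm(x_hat - x) / np.linalg.norm(x))
```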

  14. Weighted Iterative Bayesian Compressive Sensing (WIBCS) for High Dimensional Polynomial Surrogate Construction

    NASA Astrophysics Data System (ADS)

    Sargsyan, K.; Ricciuto, D. M.; Safta, C.; Debusschere, B.; Najm, H. N.; Thornton, P. E.

    2016-12-01

    Surrogate construction has become a routine procedure when facing computationally intensive studies requiring multiple evaluations of complex models. In particular, surrogate models, otherwise called emulators or response surfaces, replace complex models in uncertainty quantification (UQ) studies, including uncertainty propagation (forward UQ) and parameter estimation (inverse UQ). Further, surrogates based on Polynomial Chaos (PC) expansions are especially convenient for forward UQ and global sensitivity analysis, also known as variance-based decomposition. However, the PC surrogate construction strongly suffers from the curse of dimensionality. With a large number of input parameters, the number of model simulations required for accurate surrogate construction is prohibitively large. Relatedly, non-adaptive PC expansions typically include an infeasibly large number of basis terms, far exceeding the number of available model evaluations. We develop the Weighted Iterative Bayesian Compressive Sensing (WIBCS) algorithm for adaptive basis growth and PC surrogate construction, leading to a sparse, high-dimensional PC surrogate built from very few model evaluations. The surrogate is then readily employed for global sensitivity analysis leading to further dimensionality reduction. Besides numerical tests, we demonstrate the construction on the example of the Accelerated Climate Model for Energy (ACME) Land Model for several output QoIs at nearly 100 FLUXNET sites covering multiple plant functional types and climates, varying 65 input parameters over broad ranges of possible values. This work is supported by the U.S. Department of Energy, Office of Science, Biological and Environmental Research, Accelerated Climate Modeling for Energy (ACME) project. Sandia National Laboratories is a multi-program laboratory managed and operated by Sandia Corporation, a wholly owned subsidiary of Lockheed Martin Corporation, for the U.S. Department of Energy's National Nuclear Security Administration under contract DE-AC04-94AL85000.
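
    WIBCS itself is a Bayesian, adaptive algorithm; the simplified stand-in below only illustrates why sparsity helps in PC surrogate construction: an l1-regularized fit of a multivariate Legendre (polynomial chaos) basis whose size exceeds the number of model runs. The toy model and dimensions are illustrative.

```python
# Sketch: a sparse polynomial chaos (PC) surrogate fit by l1 regression on
# multivariate Legendre features, as a simplified stand-in for Bayesian
# compressive sensing approaches such as WIBCS.
import itertools
import numpy as np
from numpy.polynomial.legendre import legval
from sklearn.linear_model import LassoCV

rng = np.random.default_rng(0)
dim, deg, n_runs = 10, 2, 60                       # inputs, PC order, model evaluations

# Multi-indices of total degree <= deg.
midx = [a for a in itertools.product(range(deg + 1), repeat=dim) if sum(a) <= deg]

def pc_features(xi):
    """Evaluate the tensorized Legendre basis at points xi in [-1, 1]^dim."""
    cols = []
    for alpha in midx:
        col = np.ones(len(xi))
        for j, aj in enumerate(alpha):
            c = np.zeros(aj + 1)
            c[aj] = 1.0                            # coefficients selecting P_aj
            col *= legval(xi[:, j], c)
        cols.append(col)
    return np.column_stack(cols)

def model(xi):                                     # toy "complex model": few active inputs
    return np.sin(np.pi * xi[:, 0]) + 0.5 * xi[:, 1] ** 2 + 0.1 * xi[:, 2]

xi = rng.uniform(-1, 1, size=(n_runs, dim))
Phi = pc_features(xi)
fit = LassoCV(cv=5).fit(Phi, model(xi))
print("basis size:", Phi.shape[1], "| retained terms:", np.count_nonzero(fit.coef_))
```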

  15. The application of a sparse, distributed memory to the detection, identification and manipulation of physical objects

    NASA Technical Reports Server (NTRS)

    Kanerva, P.

    1986-01-01

    To determine the relation of the sparse, distributed memory to other architectures, a broad review of the literature was made. The memory is called a pattern memory because it works with large patterns of features (high-dimensional vectors). A pattern is stored in a pattern memory by distributing it over a large number of storage elements and by superimposing it over other stored patterns. A pattern is retrieved by mathematical or statistical reconstruction from the distributed elements. Three pattern memories are discussed.
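
    A tiny numpy sketch of the Kanerva-style memory: random binary hard locations, activation of all locations within a Hamming radius of the address, counter-based writes, and majority-vote reads. The sizes and the radius are illustrative, not taken from the report.

```python
# Tiny sketch of a Kanerva-style sparse distributed memory: a pattern is
# written into every hard location within a Hamming radius of the address
# and read back by summing counters and thresholding.
import numpy as np

rng = np.random.default_rng(0)
n_bits, n_locations, radius = 256, 2000, 115

hard = rng.integers(0, 2, size=(n_locations, n_bits))       # random hard addresses
counters = np.zeros((n_locations, n_bits), dtype=int)

def activate(addr):
    return np.flatnonzero((hard != addr).sum(axis=1) <= radius)

def write(addr, data):
    counters[activate(addr)] += 2 * data - 1                 # +1 for 1-bits, -1 for 0-bits

def read(addr):
    return (counters[activate(addr)].sum(axis=0) > 0).astype(int)

pattern = rng.integers(0, 2, n_bits)
write(pattern, pattern)                                      # autoassociative write
noisy = pattern.copy()
noisy[rng.choice(n_bits, 20, replace=False)] ^= 1            # corrupt 20 bits
print("bits recovered correctly:", int((read(noisy) == pattern).sum()), "of", n_bits)
```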

  16. Learning dictionaries of sparse codes of 3D movements of body joints for real-time human activity understanding.

    PubMed

    Qi, Jin; Yang, Zhiyong

    2014-01-01

    Real-time human activity recognition is essential for human-robot interactions for assisted healthy independent living. Most previous work in this area is performed on traditional two-dimensional (2D) videos and both global and local methods have been used. Since 2D videos are sensitive to changes in lighting conditions, view angle, and scale, researchers have begun to explore applications of 3D information in human activity understanding in recent years. Unfortunately, features that work well on 2D videos usually don't perform well on 3D videos and there is no consensus on what 3D features should be used. Here we propose a model of human activity recognition based on 3D movements of body joints. Our method has three steps: learning dictionaries of sparse codes of 3D movements of joints, sparse coding, and classification. In the first step, space-time volumes of 3D movements of body joints are obtained via dense sampling and independent component analysis is then performed to construct a dictionary of sparse codes for each activity. In the second step, the space-time volumes are projected to the dictionaries and a set of sparse histograms of the projection coefficients are constructed as feature representations of the activities. Finally, the sparse histograms are used as inputs to a support vector machine to recognize human activities. We tested this model on three databases of human activities and found that it outperforms the state-of-the-art algorithms. Thus, this model can be used for real-time human activity recognition in many applications.
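
    The three-step pipeline (dictionary learning, sparse coding, classification) can be sketched with scikit-learn on synthetic feature vectors; the paper itself builds ICA-based dictionaries from space-time volumes of 3D joint movements, which this stand-in does not reproduce.

```python
# Sketch of the dictionary-learning / sparse-coding / SVM pipeline on
# synthetic feature vectors standing in for 3D joint-movement descriptors.
import numpy as np
from sklearn.decomposition import DictionaryLearning
from sklearn.svm import SVC
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)
# Two synthetic "activities", each a noisy mixture of its own atoms.
atoms = rng.standard_normal((2, 4, 30))                      # 2 classes x 4 atoms x 30 dims
X, y = [], []
for c in range(2):
    for _ in range(100):
        X.append(rng.standard_normal(4) @ atoms[c] + 0.1 * rng.standard_normal(30))
        y.append(c)
X, y = np.array(X), np.array(y)

X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)
dico = DictionaryLearning(n_components=16, transform_algorithm="lasso_lars",
                          transform_alpha=0.1, random_state=0).fit(X_tr)
Z_tr, Z_te = dico.transform(X_tr), dico.transform(X_te)      # sparse codes as features
clf = SVC(kernel="linear").fit(Z_tr, y_tr)
print("test accuracy:", clf.score(Z_te, y_te))
```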

  17. An efficient classification method based on principal component and sparse representation.

    PubMed

    Zhai, Lin; Fu, Shujun; Zhang, Caiming; Liu, Yunxian; Wang, Lu; Liu, Guohua; Yang, Mingqiang

    2016-01-01

    As an important application in optical imaging, palmprint recognition is affected by many unfavorable factors. An effective fusion of blockwise bi-directional two-dimensional principal component analysis and grouping sparse classification is presented. Dimension reduction and normalization are implemented by blockwise bi-directional two-dimensional principal component analysis of palmprint images to extract feature matrices, which are assembled into an overcomplete dictionary for sparse classification. A subspace orthogonal matching pursuit algorithm is designed to solve the grouping sparse representation. Finally, the classification result is obtained by comparing the residual between testing and reconstructed images. Experiments are carried out on a palmprint database, and the results show that this method has better robustness against position and illumination changes of palmprint images, and achieves a higher palmprint recognition rate.

  18. Assimilation of Spatially Sparse In Situ Soil Moisture Networks into a Continuous Model Domain

    NASA Astrophysics Data System (ADS)

    Gruber, A.; Crow, W. T.; Dorigo, W. A.

    2018-02-01

    Growth in the availability of near-real-time soil moisture observations from ground-based networks has spurred interest in the assimilation of these observations into land surface models via a two-dimensional data assimilation system. However, the design of such systems is currently hampered by our ignorance concerning the spatial structure of error afflicting ground and model-based soil moisture estimates. Here we apply newly developed triple collocation techniques to provide the spatial error information required to fully parameterize a two-dimensional (2-D) data assimilation system designed to assimilate spatially sparse observations acquired from existing ground-based soil moisture networks into a spatially continuous Antecedent Precipitation Index (API) model for operational agricultural drought monitoring. Over the contiguous United States (CONUS), the posterior uncertainty of surface soil moisture estimates associated with this 2-D system is compared to that obtained from the 1-D assimilation of remote sensing retrievals to assess the value of ground-based observations to constrain a surface soil moisture analysis. Results demonstrate that a fourfold increase in existing CONUS ground station density is needed for ground network observations to provide a level of skill comparable to that provided by existing satellite-based surface soil moisture retrievals.
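
    The covariance-based form of triple collocation used to obtain such error information can be sketched in a few lines: given three collocated estimates with mutually independent errors, each error variance follows from the sample covariances. The synthetic data below are illustrative; the spatial error structure that is the focus of the paper is not modeled.

```python
# Minimal covariance-based triple collocation: estimate the error variance of
# each of three collocated soil moisture estimates (e.g., ground, model,
# satellite) assuming independent zero-mean errors.
import numpy as np

rng = np.random.default_rng(0)
truth = 0.25 + 0.06 * rng.standard_normal(5000)              # "true" soil moisture
ground    = truth + 0.02 * rng.standard_normal(5000)
model     = 0.8 * truth + 0.05 + 0.04 * rng.standard_normal(5000)
satellite = 1.1 * truth - 0.02 + 0.03 * rng.standard_normal(5000)

def tc_error_variances(x, y, z):
    C = np.cov(np.vstack([x, y, z]))
    return (C[0, 0] - C[0, 1] * C[0, 2] / C[1, 2],
            C[1, 1] - C[0, 1] * C[1, 2] / C[0, 2],
            C[2, 2] - C[0, 2] * C[1, 2] / C[0, 1])

for name, var in zip(("ground", "model", "satellite"),
                     tc_error_variances(ground, model, satellite)):
    print(f"{name}: estimated error std = {np.sqrt(max(var, 0)):.3f}")
```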

  19. Artificial neural network does better spatiotemporal compressive sampling

    NASA Astrophysics Data System (ADS)

    Lee, Soo-Young; Hsu, Charles; Szu, Harold

    2012-06-01

    Spatiotemporal sparseness is generated naturally by the human visual system, modeled here as an artificial neural network implementing associative memory. In this view, sparseness is nothing more and nothing less than the information concentration that compressive sensing achieves. To concentrate the information, one can use spatial correlation, a spatial FFT or DWT, or, best of all, an adaptive wavelet transform (cf. NUS, Shen Shawei). For higher-dimensional spatiotemporal information concentration, however, the mathematics cannot be as flexible as a living human sensory system, presumably for reasons of survival. The rest of the story is given in the paper.

  20. Brain abnormality segmentation based on l1-norm minimization

    NASA Astrophysics Data System (ADS)

    Zeng, Ke; Erus, Guray; Tanwar, Manoj; Davatzikos, Christos

    2014-03-01

    We present a method that uses sparse representations to model the inter-individual variability of healthy anatomy from a limited number of normal medical images. Abnormalities in MR images are then defined as deviations from the normal variation. More precisely, we model an abnormal (pathological) signal y as the superposition of a normal part ~y that can be sparsely represented under an example-based dictionary, and an abnormal part r. Motivated by a dense error correction scheme recently proposed for sparse signal recovery, we use l1-norm minimization to separate ~y and r. We extend the existing framework, which was mainly used on robust face recognition in a discriminative setting, to address challenges of brain image analysis, particularly the high dimensionality and low sample size problem. The dictionary is constructed from local image patches extracted from training images aligned using smooth transformations, together with minor perturbations of those patches. A multi-scale sliding-window scheme is applied to capture anatomical variations ranging from fine and localized to coarser and more global. The statistical significance of the abnormality term r is obtained by comparison to its empirical distribution through cross-validation, and is used to assign an abnormality score to each voxel. In our validation experiments the method is applied for segmenting abnormalities on 2-D slices of FLAIR images, and we obtain segmentation results consistent with the expert-defined masks.
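
    The decomposition y ≈ D a + r can be illustrated with the dense-error-correction trick mentioned above: stack the dictionary with an identity matrix and penalize the l1 norm of the combined coefficients, so the signal splits into a dictionary-sparse normal part and a sparse abnormal residual. The toy 1-D example below uses scikit-learn's Lasso as the l1 solver and synthetic data.

```python
# Toy version of the decomposition y ~ D a + r: the augmented dictionary
# [D, I] plus an l1 penalty on [a; r] separates a dictionary-explained
# "normal" part from a sparse "abnormal" residual r.
import numpy as np
from sklearn.linear_model import Lasso

rng = np.random.default_rng(0)
dim, n_atoms = 120, 40
D = rng.standard_normal((dim, n_atoms))
D /= np.linalg.norm(D, axis=0)                       # unit-norm atoms

a_true = np.zeros(n_atoms)
a_true[[3, 17, 25]] = [1.0, -0.7, 0.5]
r_true = np.zeros(dim)
r_true[60:66] = 2.0                                  # localized "abnormality"
y = D @ a_true + r_true + 0.01 * rng.standard_normal(dim)

B = np.hstack([D, np.eye(dim)])                      # augmented dictionary [D, I]
coef = Lasso(alpha=0.005, fit_intercept=False, max_iter=50000).fit(B, y).coef_
a_hat, r_hat = coef[:n_atoms], coef[n_atoms:]
print("abnormal indices:", np.flatnonzero(np.abs(r_hat) > 0.5))
```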

  1. Inferring network structure in non-normal and mixed discrete-continuous genomic data.

    PubMed

    Bhadra, Anindya; Rao, Arvind; Baladandayuthapani, Veerabhadran

    2018-03-01

    Inferring dependence structure through undirected graphs is crucial for uncovering the major modes of multivariate interaction among high-dimensional genomic markers that are potentially associated with cancer. Traditionally, conditional independence has been studied using sparse Gaussian graphical models for continuous data and sparse Ising models for discrete data. However, there are two clear situations when these approaches are inadequate. The first occurs when the data are continuous but display non-normal marginal behavior such as heavy tails or skewness, rendering an assumption of normality inappropriate. The second occurs when a part of the data is ordinal or discrete (e.g., presence or absence of a mutation) and the other part is continuous (e.g., expression levels of genes or proteins). In this case, the existing Bayesian approaches typically employ a latent variable framework for the discrete part that precludes inferring conditional independence among the data that are actually observed. The current article overcomes these two challenges in a unified framework using Gaussian scale mixtures. Our framework is able to handle continuous data that are not normal and data that are of mixed continuous and discrete nature, while still being able to infer a sparse conditional sign independence structure among the observed data. Extensive performance comparison in simulations with alternative techniques and an analysis of a real cancer genomics data set demonstrate the effectiveness of the proposed approach. © 2017, The International Biometric Society.
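
    As a point of reference for the standard continuous-data case mentioned above (not the paper's Gaussian scale-mixture method), a sparse conditional-independence graph can be estimated with the graphical lasso in scikit-learn:

```python
# Baseline illustration: sparse undirected graph (precision matrix) estimation
# for continuous Gaussian data via the graphical lasso.
import numpy as np
from sklearn.covariance import GraphicalLasso

rng = np.random.default_rng(0)
# Sparse precision matrix on 6 "markers": a chain graph.
prec = np.eye(6)
for i in range(5):
    prec[i, i + 1] = prec[i + 1, i] = 0.4
cov = np.linalg.inv(prec)
X = rng.multivariate_normal(np.zeros(6), cov, size=400)

model = GraphicalLasso(alpha=0.05).fit(X)
edges = (np.abs(model.precision_) > 1e-3) & ~np.eye(6, dtype=bool)
print("estimated conditional-dependence edges:\n", np.argwhere(np.triu(edges)))
```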

  2. Inferring network structure in non-normal and mixed discrete-continuous genomic data

    PubMed Central

    Bhadra, Anindya; Rao, Arvind; Baladandayuthapani, Veerabhadran

    2017-01-01

    Inferring dependence structure through undirected graphs is crucial for uncovering the major modes of multivariate interaction among high-dimensional genomic markers that are potentially associated with cancer. Traditionally, conditional independence has been studied using sparse Gaussian graphical models for continuous data and sparse Ising models for discrete data. However, there are two clear situations when these approaches are inadequate. The first occurs when the data are continuous but display non-normal marginal behavior such as heavy tails or skewness, rendering an assumption of normality inappropriate. The second occurs when a part of the data is ordinal or discrete (e.g., presence or absence of a mutation) and the other part is continuous (e.g., expression levels of genes or proteins). In this case, the existing Bayesian approaches typically employ a latent variable framework for the discrete part that precludes inferring conditional independence among the data that are actually observed. The current article overcomes these two challenges in a unified framework using Gaussian scale mixtures. Our framework is able to handle continuous data that are not normal and data that are of mixed continuous and discrete nature, while still being able to infer a sparse conditional sign independence structure among the observed data. Extensive performance comparison in simulations with alternative techniques and an analysis of a real cancer genomics data set demonstrate the effectiveness of the proposed approach. PMID:28437848

  3. A modified sparse reconstruction method for three-dimensional synthetic aperture radar image

    NASA Astrophysics Data System (ADS)

    Zhang, Ziqiang; Ji, Kefeng; Song, Haibo; Zou, Huanxin

    2018-03-01

    There is an increasing interest in three-dimensional Synthetic Aperture Radar (3-D SAR) imaging from observed sparse scattering data. However, existing 3-D sparse imaging methods require long computing times and large storage capacity. In this paper, we propose a modified method for sparse 3-D SAR imaging. The method processes the collection of noisy SAR measurements, usually collected over nonlinear flight paths, and outputs 3-D SAR imagery. First, the 3-D sparse reconstruction problem is transformed into a series of 2-D slice reconstruction problems by range compression. Then the slices are reconstructed by the modified SL0 (smoothed l0 norm) reconstruction algorithm. The improved algorithm uses a hyperbolic tangent function instead of the Gaussian function to approximate the l0 norm and uses the Newton direction instead of the steepest descent direction, which speeds up the convergence of the SL0 algorithm. Finally, numerical simulation results are given to demonstrate the effectiveness of the proposed algorithm. It is shown that our method, compared with the existing 3-D sparse imaging method, performs better in both reconstruction quality and reconstruction time.
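
    For orientation, the baseline SL0 iteration (Gaussian surrogate of the l0 norm, steepest-descent steps plus projection) can be sketched as follows; the paper's modifications, the hyperbolic tangent surrogate and the Newton direction, are not reproduced here, and the problem sizes are illustrative.

```python
# Baseline SL0 sketch: gradient steps on a smooth surrogate of the l0 norm,
# followed by projection back onto the measurement constraint A x = y, with
# the smoothing width sigma gradually decreased.
import numpy as np

rng = np.random.default_rng(0)
n, m, k = 256, 80, 10
x_true = np.zeros(n)
x_true[rng.choice(n, k, replace=False)] = rng.uniform(1.0, 2.0, k) * rng.choice([-1, 1], k)
A = rng.standard_normal((m, n)) / np.sqrt(m)
y = A @ x_true

A_pinv = np.linalg.pinv(A)
x = A_pinv @ y                                   # minimum-l2-norm starting point
sigma, mu = 2.0 * np.max(np.abs(x)), 2.0
while sigma > 1e-3:
    for _ in range(3):                           # steps on the smoothed-l0 surrogate
        x = x - mu * x * np.exp(-x**2 / sigma**2)
        x = x - A_pinv @ (A @ x - y)             # project back onto A x = y
    sigma *= 0.5

support_err = np.setxor1d(np.flatnonzero(np.abs(x) > 0.1), np.flatnonzero(x_true)).size
print("support errors:", support_err)
```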

  4. Fracture size and transmissivity correlations: Implications for transport simulations in sparse three-dimensional discrete fracture networks following a truncated power law distribution of fracture size

    NASA Astrophysics Data System (ADS)

    Hyman, J. D.; Aldrich, G.; Viswanathan, H.; Makedonska, N.; Karra, S.

    2016-08-01

    We characterize how different fracture size-transmissivity relationships influence flow and transport simulations through sparse three-dimensional discrete fracture networks. Although it is generally accepted that there is a positive correlation between a fracture's size and its transmissivity/aperture, the functional form of that relationship remains a matter of debate. Relationships that assume perfect correlation, semicorrelation, and noncorrelation between the two have been proposed. To study the impact that adopting one of these relationships has on transport properties, we generate multiple sparse fracture networks composed of circular fractures whose radii follow a truncated power law distribution. The distributions of transmissivities are selected so that the mean transmissivity of the fracture networks is the same and the distributions of aperture and transmissivity in models that include a stochastic term are also the same. We observe that adopting a correlation between a fracture's size and its transmissivity leads to earlier breakthrough times and higher effective permeability when compared to networks where no correlation is used. While fracture network geometry plays the principal role in determining where transport occurs within the network, the relationship between size and transmissivity controls the flow speed. These observations indicate that DFN modelers should be aware that breakthrough times and effective permeabilities can be strongly influenced by such a relationship in addition to fracture and network statistics.

  5. Fracture size and transmissivity correlations: Implications for transport simulations in sparse three-dimensional discrete fracture networks following a truncated power law distribution of fracture size

    NASA Astrophysics Data System (ADS)

    Hyman, J.; Aldrich, G. A.; Viswanathan, H. S.; Makedonska, N.; Karra, S.

    2016-12-01

    We characterize how different fracture size-transmissivity relationships influence flow and transport simulations through sparse three-dimensional discrete fracture networks. Although it is generally accepted that there is a positive correlation between a fracture's size and its transmissivity/aperture, the functional form of that relationship remains a matter of debate. Relationships that assume perfect correlation, semi-correlation, and non-correlation between the two have been proposed. To study the impact that adopting one of these relationships has on transport properties, we generate multiple sparse fracture networks composed of circular fractures whose radii follow a truncated power law distribution. The distributions of transmissivities are selected so that the mean transmissivity of the fracture networks is the same and the distributions of aperture and transmissivity in models that include a stochastic term are also the same. We observe that adopting a correlation between a fracture's size and its transmissivity leads to earlier breakthrough times and higher effective permeability when compared to networks where no correlation is used. While fracture network geometry plays the principal role in determining where transport occurs within the network, the relationship between size and transmissivity controls the flow speed. These observations indicate that DFN modelers should be aware that breakthrough times and effective permeabilities can be strongly influenced by such a relationship in addition to fracture and network statistics.

  6. Defect-Repairable Latent Feature Extraction of Driving Behavior via a Deep Sparse Autoencoder

    PubMed Central

    Taniguchi, Tadahiro; Takenaka, Kazuhito; Bando, Takashi

    2018-01-01

    Data representing driving behavior, as measured by various sensors installed in a vehicle, are collected as multi-dimensional sensor time-series data. These data often include redundant information, e.g., both the speed of wheels and the engine speed represent the velocity of the vehicle. Redundant information can be expected to complicate the data analysis, e.g., more factors need to be analyzed; even varying the levels of redundancy can influence the results of the analysis. We assume that the measured multi-dimensional sensor time-series data of driving behavior are generated from low-dimensional data shared by the many types of one-dimensional data of which multi-dimensional time-series data are composed. Meanwhile, sensor time-series data may be defective because of sensor failure. Therefore, another important function is to reduce the negative effect of defective data when extracting low-dimensional time-series data. This study proposes a defect-repairable feature extraction method based on a deep sparse autoencoder (DSAE) to extract low-dimensional time-series data. In the experiments, we show that DSAE provides high-performance latent feature extraction for driving behavior, even for defective sensor time-series data. In addition, we show that the negative effect of defects on the driving behavior segmentation task could be reduced using the latent features extracted by DSAE. PMID:29462931

  7. The Impact of Parametric Uncertainties on Biogeochemistry in the E3SM Land Model

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Ricciuto, Daniel; Sargsyan, Khachik; Thornton, Peter

    We conduct a global sensitivity analysis (GSA) of the Energy Exascale Earth System Model (E3SM) land model (ELM) to calculate the sensitivity of five key carbon cycle outputs to 68 model parameters. This GSA is conducted by first constructing a Polynomial Chaos (PC) surrogate via a new Weighted Iterative Bayesian Compressive Sensing (WIBCS) algorithm for adaptive basis growth, leading to a sparse, high-dimensional PC surrogate with 3,000 model evaluations. The PC surrogate allows efficient extraction of GSA information leading to further dimensionality reduction. The GSA is performed at 96 FLUXNET sites covering multiple plant functional types (PFTs) and climate conditions. About 20 of the model parameters are identified as sensitive, with the rest being relatively insensitive across all outputs and PFTs. These sensitivities are dependent on PFT, and are relatively consistent among sites within the same PFT. The five model outputs have a majority of their highly sensitive parameters in common. A common subset of sensitive parameters is also shared among PFTs, but some parameters are specific to certain types (e.g., deciduous phenology). In conclusion, the relative importance of these parameters shifts significantly among PFTs and with climatic variables such as mean annual temperature.

  8. The Impact of Parametric Uncertainties on Biogeochemistry in the E3SM Land Model

    DOE PAGES

    Ricciuto, Daniel; Sargsyan, Khachik; Thornton, Peter

    2018-02-27

    We conduct a global sensitivity analysis (GSA) of the Energy Exascale Earth System Model (E3SM) land model (ELM) to calculate the sensitivity of five key carbon cycle outputs to 68 model parameters. This GSA is conducted by first constructing a Polynomial Chaos (PC) surrogate via a new Weighted Iterative Bayesian Compressive Sensing (WIBCS) algorithm for adaptive basis growth, leading to a sparse, high-dimensional PC surrogate with 3,000 model evaluations. The PC surrogate allows efficient extraction of GSA information leading to further dimensionality reduction. The GSA is performed at 96 FLUXNET sites covering multiple plant functional types (PFTs) and climate conditions. About 20 of the model parameters are identified as sensitive, with the rest being relatively insensitive across all outputs and PFTs. These sensitivities are dependent on PFT, and are relatively consistent among sites within the same PFT. The five model outputs have a majority of their highly sensitive parameters in common. A common subset of sensitive parameters is also shared among PFTs, but some parameters are specific to certain types (e.g., deciduous phenology). In conclusion, the relative importance of these parameters shifts significantly among PFTs and with climatic variables such as mean annual temperature.

  9. Controls/CFD Interdisciplinary Research Software Generates Low-Order Linear Models for Control Design From Steady-State CFD Results

    NASA Technical Reports Server (NTRS)

    Melcher, Kevin J.

    1997-01-01

    The NASA Lewis Research Center is developing analytical methods and software tools to create a bridge between the controls and computational fluid dynamics (CFD) disciplines. Traditionally, control design engineers have used coarse nonlinear simulations to generate information for the design of new propulsion system controls. However, such traditional methods are not adequate for modeling the propulsion systems of complex, high-speed vehicles like the High Speed Civil Transport. To properly model the relevant flow physics of high-speed propulsion systems, one must use simulations based on CFD methods. Such CFD simulations have become useful tools for engineers that are designing propulsion system components. The analysis techniques and software being developed as part of this effort are an attempt to evolve CFD into a useful tool for control design as well. One major aspect of this research is the generation of linear models from steady-state CFD results. CFD simulations, often used during the design of high-speed inlets, yield high resolution operating point data. Under a NASA grant, the University of Akron has developed analytical techniques and software tools that use these data to generate linear models for control design. The resulting linear models have the same number of states as the original CFD simulation, so they are still very large and computationally cumbersome. Model reduction techniques have been successfully applied to reduce these large linear models by several orders of magnitude without significantly changing the dynamic response. The result is an accurate, easy to use, low-order linear model that takes less time to generate than those generated by traditional means. The development of methods for generating low-order linear models from steady-state CFD is most complete at the one-dimensional level, where software is available to generate models with different kinds of input and output variables. One-dimensional methods have been extended somewhat so that linear models can also be generated from two- and three-dimensional steady-state results. Standard techniques are adequate for reducing the order of one-dimensional CFD-based linear models. However, reduction of linear models based on two- and three-dimensional CFD results is complicated by very sparse, ill-conditioned matrices. Some novel approaches are being investigated to solve this problem.

  10. A sparse representation of gravitational waves from precessing compact binaries

    NASA Astrophysics Data System (ADS)

    Blackman, Jonathan; Szilagyi, Bela; Galley, Chad; Tiglio, Manuel

    2014-03-01

    With the advanced generation of gravitational wave detectors coming online in the near future, there is a need for accurate models of gravitational waveforms emitted by binary neutron stars and/or black holes. Post-Newtonian approximations work well for the early inspiral and there are models covering the late inspiral as well as merger and ringdown for the non-precessing case. While numerical relativity simulations have no difficulty with precession and can now provide accurate waveforms for a broad range of parameters, covering the 7-dimensional precessing parameter space with ~10^7 simulations is not feasible. There is still hope, as reduced order modelling techniques have been highly successful in reducing the impact of the curse of dimensionality for lower dimensional cases. We construct a reduced basis of Post-Newtonian waveforms for the full parameter space with mass ratios up to 10 and spins up to 0.9, and find that for the last 100 orbits only ~50 waveforms are needed. The huge compression relies heavily on a reparametrization which seeks to reduce the non-linearity of the waveforms. We also show that the addition of merger and ringdown only mildly increases the size of the basis.

  11. A three-dimensional model of solar radiation transfer in a non-uniform plant canopy

    NASA Astrophysics Data System (ADS)

    Levashova, N. T.; Mukhartova, Yu V.

    2018-01-01

    A three-dimensional (3D) model of solar radiation transfer in a non-uniform plant canopy was developed. It is based on radiative transfer equations and a so-called turbid medium assumption. The model takes into account the multiple scattering contributions of plant elements in radiation fluxes. This enables more accurate descriptions of plant canopy reflectance and transmission in different spectral bands. The model was applied to assess the effects of plant canopy heterogeneity on solar radiation transmission and to quantify the difference in radiation transfer between photosynthetically active radiation PAR (Δλ = 0.39-0.72 μm) and near infrared solar radiation NIR (Δλ = 0.72-3.00 μm). Comparisons of the radiative transfer fluxes simulated by the 3D model within a plant canopy consisting of sparsely planted fruit trees (plant area index, PAI = 0.96 m2 m-2) with radiation fluxes simulated by a one-dimensional (1D) approach, which assumes horizontal homogeneity of the plant and leaf area distributions, showed that, for sunny weather conditions with a high solar elevation angle, an application of a simplified 1D approach can result in an underestimation of transmitted solar radiation by about 22% for PAR, and by about 26% for NIR.

  12. Model's sparse representation based on reduced mixed GMsFE basis methods

    NASA Astrophysics Data System (ADS)

    Jiang, Lijian; Li, Qiuqi

    2017-06-01

    In this paper, we propose a model's sparse representation based on reduced mixed generalized multiscale finite element (GMsFE) basis methods for elliptic PDEs with random inputs. A typical application for the elliptic PDEs is the flow in heterogeneous random porous media. The mixed generalized multiscale finite element method (GMsFEM) is one of the accurate and efficient approaches to solve the flow problem in a coarse grid and obtain the velocity with local mass conservation. When the inputs of the PDEs are parameterized by the random variables, the GMsFE basis functions usually depend on the random parameters. This leads to a large number of degrees of freedom for the mixed GMsFEM and substantially impacts the computational efficiency. In order to overcome the difficulty, we develop reduced mixed GMsFE basis methods such that the multiscale basis functions are independent of the random parameters and span a low-dimensional space. To this end, a greedy algorithm is used to find a set of optimal samples from a training set scattered in the parameter space. Reduced mixed GMsFE basis functions are constructed based on the optimal samples using two optimal sampling strategies: basis-oriented cross-validation and proper orthogonal decomposition. Although the dimension of the space spanned by the reduced mixed GMsFE basis functions is much smaller than the dimension of the original full order model, the online computation still depends on the number of coarse degrees of freedom. To significantly improve the online computation, we integrate the reduced mixed GMsFE basis methods with sparse tensor approximation and obtain a sparse representation for the model's outputs. The sparse representation is very efficient for evaluating the model's outputs for many instances of parameters. To illustrate the efficacy of the proposed methods, we present a few numerical examples for elliptic PDEs with multiscale and random inputs. In particular, a two-phase flow model in random porous media is simulated by the proposed sparse representation method.
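
    The proper orthogonal decomposition step mentioned above can be illustrated generically (outside the mixed GMsFE setting): collect solution snapshots at sampled parameters, keep the leading left singular vectors as a reduced basis, and project new solutions onto it. The toy "solver" below is a stand-in for an expensive model.

```python
# Generic proper orthogonal decomposition (POD): build a low-dimensional basis
# from solution snapshots at sampled parameters, then approximate a new
# solution in that basis.
import numpy as np

rng = np.random.default_rng(0)
grid = np.linspace(0, 1, 200)

def solve(theta):                                 # stand-in for an expensive solver
    return np.sin(np.pi * theta[0] * grid) * np.exp(-theta[1] * grid)

snapshots = np.column_stack([solve(rng.uniform(0.5, 2.0, 2)) for _ in range(50)])
U, s, _ = np.linalg.svd(snapshots, full_matrices=False)
r = int(np.searchsorted(np.cumsum(s**2) / np.sum(s**2), 0.9999)) + 1
basis = U[:, :r]                                  # reduced basis

u_new = solve(np.array([1.3, 0.7]))
u_rb = basis @ (basis.T @ u_new)                  # projection onto the reduced basis
print(f"reduced dimension: {r}, relative error: "
      f"{np.linalg.norm(u_new - u_rb) / np.linalg.norm(u_new):.2e}")
```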

  13. Model's sparse representation based on reduced mixed GMsFE basis methods

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Jiang, Lijian, E-mail: ljjiang@hnu.edu.cn; Li, Qiuqi, E-mail: qiuqili@hnu.edu.cn

    2017-06-01

    In this paper, we propose a model's sparse representation based on reduced mixed generalized multiscale finite element (GMsFE) basis methods for elliptic PDEs with random inputs. A typical application for the elliptic PDEs is the flow in heterogeneous random porous media. The mixed generalized multiscale finite element method (GMsFEM) is one of the accurate and efficient approaches to solve the flow problem in a coarse grid and obtain the velocity with local mass conservation. When the inputs of the PDEs are parameterized by the random variables, the GMsFE basis functions usually depend on the random parameters. This leads to a large number of degrees of freedom for the mixed GMsFEM and substantially impacts the computational efficiency. In order to overcome the difficulty, we develop reduced mixed GMsFE basis methods such that the multiscale basis functions are independent of the random parameters and span a low-dimensional space. To this end, a greedy algorithm is used to find a set of optimal samples from a training set scattered in the parameter space. Reduced mixed GMsFE basis functions are constructed based on the optimal samples using two optimal sampling strategies: basis-oriented cross-validation and proper orthogonal decomposition. Although the dimension of the space spanned by the reduced mixed GMsFE basis functions is much smaller than the dimension of the original full order model, the online computation still depends on the number of coarse degrees of freedom. To significantly improve the online computation, we integrate the reduced mixed GMsFE basis methods with sparse tensor approximation and obtain a sparse representation for the model's outputs. The sparse representation is very efficient for evaluating the model's outputs for many instances of parameters. To illustrate the efficacy of the proposed methods, we present a few numerical examples for elliptic PDEs with multiscale and random inputs. In particular, a two-phase flow model in random porous media is simulated by the proposed sparse representation method.

  14. High-Dimensional Bayesian Geostatistics

    PubMed Central

    Banerjee, Sudipto

    2017-01-01

    With the growing capabilities of Geographic Information Systems (GIS) and user-friendly software, statisticians today routinely encounter geographically referenced data containing observations from a large number of spatial locations and time points. Over the last decade, hierarchical spatiotemporal process models have become widely deployed statistical tools for researchers to better understand the complex nature of spatial and temporal variability. However, fitting hierarchical spatiotemporal models often involves expensive matrix computations with complexity increasing in cubic order for the number of spatial locations and temporal points. This renders such models unfeasible for large data sets. This article offers a focused review of two methods for constructing well-defined highly scalable spatiotemporal stochastic processes. Both these processes can be used as “priors” for spatiotemporal random fields. The first approach constructs a low-rank process operating on a lower-dimensional subspace. The second approach constructs a Nearest-Neighbor Gaussian Process (NNGP) that ensures sparse precision matrices for its finite realizations. Both processes can be exploited as a scalable prior embedded within a rich hierarchical modeling framework to deliver full Bayesian inference. These approaches can be described as model-based solutions for big spatiotemporal datasets. The models ensure that the algorithmic complexity is ~n floating point operations (flops), where n is the number of spatial locations (per iteration). We compare these methods and provide some insight into their methodological underpinnings. PMID:29391920

  15. High-Dimensional Bayesian Geostatistics.

    PubMed

    Banerjee, Sudipto

    2017-06-01

    With the growing capabilities of Geographic Information Systems (GIS) and user-friendly software, statisticians today routinely encounter geographically referenced data containing observations from a large number of spatial locations and time points. Over the last decade, hierarchical spatiotemporal process models have become widely deployed statistical tools for researchers to better understand the complex nature of spatial and temporal variability. However, fitting hierarchical spatiotemporal models often involves expensive matrix computations with complexity increasing in cubic order for the number of spatial locations and temporal points. This renders such models unfeasible for large data sets. This article offers a focused review of two methods for constructing well-defined highly scalable spatiotemporal stochastic processes. Both these processes can be used as "priors" for spatiotemporal random fields. The first approach constructs a low-rank process operating on a lower-dimensional subspace. The second approach constructs a Nearest-Neighbor Gaussian Process (NNGP) that ensures sparse precision matrices for its finite realizations. Both processes can be exploited as a scalable prior embedded within a rich hierarchical modeling framework to deliver full Bayesian inference. These approaches can be described as model-based solutions for big spatiotemporal datasets. The models ensure that the algorithmic complexity is ~n floating point operations (flops), where n is the number of spatial locations (per iteration). We compare these methods and provide some insight into their methodological underpinnings.

  16. Marginally specified priors for non-parametric Bayesian estimation

    PubMed Central

    Kessler, David C.; Hoff, Peter D.; Dunson, David B.

    2014-01-01

    Summary Prior specification for non-parametric Bayesian inference involves the difficult task of quantifying prior knowledge about a parameter of high, often infinite, dimension. A statistician is unlikely to have informed opinions about all aspects of such a parameter but will have real information about functionals of the parameter, such as the population mean or variance. The paper proposes a new framework for non-parametric Bayes inference in which the prior distribution for a possibly infinite dimensional parameter is decomposed into two parts: an informative prior on a finite set of functionals, and a non-parametric conditional prior for the parameter given the functionals. Such priors can be easily constructed from standard non-parametric prior distributions in common use and inherit the large support of the standard priors on which they are based. Additionally, posterior approximations under these informative priors can generally be made via minor adjustments to existing Markov chain approximation algorithms for standard non-parametric prior distributions. We illustrate the use of such priors in the context of multivariate density estimation using Dirichlet process mixture models, and in the modelling of high dimensional sparse contingency tables. PMID:25663813

  17. Integrative analysis of transcriptomic and metabolomic data via sparse canonical correlation analysis with incorporation of biological information.

    PubMed

    Safo, Sandra E; Li, Shuzhao; Long, Qi

    2018-03-01

    Integrative analysis of high dimensional omics data is becoming increasingly popular. At the same time, incorporating known functional relationships among variables in analysis of omics data has been shown to help elucidate underlying mechanisms for complex diseases. In this article, our goal is to assess association between transcriptomic and metabolomic data from a Predictive Health Institute (PHI) study that includes healthy adults at a high risk of developing cardiovascular diseases. Adopting a strategy that is both data-driven and knowledge-based, we develop statistical methods for sparse canonical correlation analysis (CCA) with incorporation of known biological information. Our proposed methods use prior network structural information among genes and among metabolites to guide selection of relevant genes and metabolites in sparse CCA, providing insight on the molecular underpinning of cardiovascular disease. Our simulations demonstrate that the structured sparse CCA methods outperform several existing sparse CCA methods in selecting relevant genes and metabolites when structural information is informative and are robust to mis-specified structural information. Our analysis of the PHI study reveals that a number of gene and metabolic pathways including some known to be associated with cardiovascular diseases are enriched in the set of genes and metabolites selected by our proposed approach. © 2017, The International Biometric Society.
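
    A plain (unstructured) sparse CCA in the penalized-matrix-decomposition style can be sketched with alternating soft-thresholded updates of the two canonical vectors; the structured sparse CCA of the paper additionally incorporates prior network information, which this sketch omits. The data and penalty levels are synthetic and illustrative.

```python
# Plain sparse CCA sketch: alternating soft-thresholded, normalized updates of
# the two canonical vectors applied to the cross-covariance matrix.
import numpy as np

def soft(v, t):
    return np.sign(v) * np.maximum(np.abs(v) - t, 0.0)

def sparse_cca(X, Y, lam_u=0.2, lam_v=0.2, n_iter=100):
    Xc, Yc = X - X.mean(0), Y - Y.mean(0)
    C = Xc.T @ Yc / len(X)                        # cross-covariance
    v = np.linalg.svd(C)[2][0]                    # initialize with leading right singular vector
    for _ in range(n_iter):
        u = soft(C @ v, lam_u)
        u /= np.linalg.norm(u) + 1e-12
        v = soft(C.T @ u, lam_v)
        v /= np.linalg.norm(v) + 1e-12
    return u, v

rng = np.random.default_rng(0)
z = rng.standard_normal(300)                      # shared latent factor
X = np.outer(z, np.r_[np.ones(3), np.zeros(47)]) + rng.standard_normal((300, 50))
Y = np.outer(z, np.r_[np.zeros(37), np.ones(3)]) + rng.standard_normal((300, 40))
u, v = sparse_cca(X, Y)
print("genes selected:", np.flatnonzero(u), "| metabolites selected:", np.flatnonzero(v))
```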

  18. Beyond union of subspaces: Subspace pursuit on Grassmann manifold for data representation

    DOE PAGES

    Shen, Xinyue; Krim, Hamid; Gu, Yuantao

    2016-03-01

    Discovering the underlying structure of a high-dimensional signal or big data has always been a challenging topic, and has become harder to tackle especially when the observations are exposed to arbitrary sparse perturbations. Here in this paper, built on the model of a union of subspaces (UoS) with sparse outliers and inspired by a basis pursuit strategy, we exploit the fundamental structure of a Grassmann manifold, and propose a new technique of pursuing the subspaces systematically by solving a non-convex optimization problem using the alternating direction method of multipliers. This problem as noted is further complicated by non-convex constraints on the Grassmann manifold, as well as the bilinearity in the penalty caused by the subspace bases and coefficients. Nevertheless, numerical experiments verify that the proposed algorithm, which provides elegant solutions to the sub-problems in each step, is able to de-couple the subspaces and pursue each of them under time-efficient parallel computation.

  19. Nested Expression Domains for Odorant Receptors in Zebrafish Olfactory Epithelium

    NASA Astrophysics Data System (ADS)

    Weth, Franco; Nadler, Walter; Korsching, Sigrun

    1996-11-01

    The mapping of high-dimensional olfactory stimuli onto the two-dimensional surface of the nasal sensory epithelium constitutes the first step in the neuronal encoding of olfactory input. We have used zebrafish as a model system to analyze the spatial distribution of odorant receptor molecules in the olfactory epithelium by quantitative in situ hybridization. To this end, we have cloned 10 very divergent zebrafish odorant receptor molecules by PCR. Individual genes are expressed in sparse olfactory receptor neurons. Analysis of the position of labeled cells in a simplified coordinate system revealed three concentric, albeit overlapping, expression domains for the four odorant receptors analyzed in detail. Such regionalized expression should result in a corresponding segregation of functional response properties. This might represent the first step of spatial encoding of olfactory input or be essential for the development of the olfactory system.

  20. DOE Office of Scientific and Technical Information (OSTI.GOV)

    Lei, Huan; Yang, Xiu; Zheng, Bin

    Biomolecules exhibit conformational fluctuations near equilibrium states, inducing uncertainty in various biological properties in a dynamic way. We have developed a general method to quantify the uncertainty of target properties induced by conformational fluctuations. Using a generalized polynomial chaos (gPC) expansion, we construct a surrogate model of the target property with respect to varying conformational states. We also propose a method to increase the sparsity of the gPC expansion by defining a set of conformational “active space” random variables. With the increased sparsity, we employ the compressive sensing method to accurately construct the surrogate model. We demonstrate the performance of the surrogate model by evaluating fluctuation-induced uncertainty in solvent-accessible surface area for the bovine trypsin inhibitor protein system and show that the new approach offers more accurate statistical information than standard Monte Carlo approaches. Furthermore, the constructed surrogate model also enables us to directly evaluate the target property under various conformational states, yielding a more accurate response surface than standard sparse grid collocation methods. In particular, the new method provides higher accuracy in high-dimensional systems, such as biomolecules, where sparse grid performance is limited by the accuracy of the computed quantity of interest. Finally, our new framework is generalizable and can be used to investigate the uncertainty of a wide variety of target properties in biomolecular systems.

  1. DOE Office of Scientific and Technical Information (OSTI.GOV)

    Lei, Huan; Yang, Xiu; Zheng, Bin

    Biomolecules exhibit conformational fluctuations near equilibrium states, inducing uncertainty in various biological properties in a dynamic way. We have developed a general method to quantify the uncertainty of target properties induced by conformational fluctuations. Using a generalized polynomial chaos (gPC) expansion, we construct a surrogate model of the target property with respect to varying conformational states. We also propose a method to increase the sparsity of the gPC expansion by defining a set of conformational “active space” random variables. With the increased sparsity, we employ the compressive sensing method to accurately construct the surrogate model. We demonstrate the performance of the surrogate model by evaluating fluctuation-induced uncertainty in solvent-accessible surface area for the bovine trypsin inhibitor protein system and show that the new approach offers more accurate statistical information than standard Monte Carlo approaches. Furthermore, the constructed surrogate model also enables us to directly evaluate the target property under various conformational states, yielding a more accurate response surface than standard sparse grid collocation methods. In particular, the new method provides higher accuracy in high-dimensional systems, such as biomolecules, where sparse grid performance is limited by the accuracy of the computed quantity of interest. Our new framework is generalizable and can be used to investigate the uncertainty of a wide variety of target properties in biomolecular systems.

  2. Semi-Supervised Clustering for High-Dimensional and Sparse Features

    ERIC Educational Resources Information Center

    Yan, Su

    2010-01-01

    Clustering is one of the most common data mining tasks, used frequently for data organization and analysis in various application domains. Traditional machine learning approaches to clustering are fully automated and unsupervised where class labels are unknown a priori. In real application domains, however, some "weak" form of side…

  3. Color normalization of histology slides using graph regularized sparse NMF

    NASA Astrophysics Data System (ADS)

    Sha, Lingdao; Schonfeld, Dan; Sethi, Amit

    2017-03-01

    Computer-based automatic medical image processing and quantification are becoming popular in digital pathology. However, preparation of histology slides can vary widely due to differences in staining equipment, procedures and reagents, which can reduce the accuracy of algorithms that analyze their color and texture information. To reduce the unwanted color variations, various supervised and unsupervised color normalization methods have been proposed. Compared with supervised color normalization methods, unsupervised color normalization methods have the advantages of time and cost efficiency and universal applicability. Most of the unsupervised color normalization methods for histology are based on stain separation. Based on the fact that stain concentration cannot be negative and different parts of the tissue absorb different stains, nonnegative matrix factorization (NMF), and in particular its sparse version (SNMF), are good candidates for stain separation. However, most of the existing unsupervised color normalization methods, like PCA, ICA, NMF and SNMF, fail to consider important information about the sparse manifolds that the pixels occupy, which could potentially result in loss of texture information during color normalization. Manifold learning methods like the Graph Laplacian have proven to be very effective in interpreting high-dimensional data. In this paper, we propose a novel unsupervised stain separation method called graph regularized sparse nonnegative matrix factorization (GSNMF). By considering the sparse prior of stain concentration together with manifold information from high-dimensional image data, our method shows better performance in stain color deconvolution than existing unsupervised color deconvolution methods, especially in keeping connected texture information. To utilize the texture information, we construct a nearest neighbor graph between pixels within a spatial area of an image based on their distances using a heat kernel in lαβ space. The representation of a pixel in the stain density space is constrained to follow the feature distance of the pixel to pixels in the neighborhood graph. Using a color matrix transfer method with the stain concentrations found by our GSNMF method, the color normalization performance was also better than that of existing methods.
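
    A simplified stand-in for the stain-separation step, plain sparse NMF on optical-density pixel values of a synthetic two-stain mixture, is sketched below using scikit-learn (the alpha_W/l1_ratio parameter names assume scikit-learn 1.0 or later); the graph regularization that is the contribution of GSNMF is not included.

```python
# Simplified stand-in: sparse NMF separates two stains from synthetic
# optical-density pixel values; the graph regularization of GSNMF, which
# preserves local texture structure, is omitted.
import numpy as np
from sklearn.decomposition import NMF

rng = np.random.default_rng(0)
# Reference stain vectors (rows: stains; columns: R, G, B optical density).
stains_true = np.array([[0.65, 0.70, 0.29],    # hematoxylin-like
                        [0.07, 0.99, 0.11]])   # eosin-like
conc = rng.gamma(shape=1.0, scale=0.5, size=(5000, 2))   # per-pixel concentrations
od = conc @ stains_true + 0.01 * rng.random((5000, 3))   # nonnegative optical densities

nmf = NMF(n_components=2, init="nndsvda", alpha_W=0.05, l1_ratio=1.0,
          max_iter=1000, random_state=0)
W = nmf.fit_transform(od)                      # estimated per-pixel stain concentrations
H = nmf.components_                            # estimated stain color vectors
print("estimated stain vectors (rows normalized):")
print(H / np.linalg.norm(H, axis=1, keepdims=True))
```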

  4. Dimensionality reduction of collective motion by principal manifolds

    NASA Astrophysics Data System (ADS)

    Gajamannage, Kelum; Butail, Sachit; Porfiri, Maurizio; Bollt, Erik M.

    2015-01-01

    While the existence of low-dimensional embedding manifolds has been shown in patterns of collective motion, the current battery of nonlinear dimensionality reduction methods is not amenable to the analysis of such manifolds. This is mainly due to the necessary spectral decomposition step, which limits control over the mapping from the original high-dimensional space to the embedding space. Here, we propose an alternative approach that demands a two-dimensional embedding which topologically summarizes the high-dimensional data. In this sense, our approach is closely related to the construction of one-dimensional principal curves that minimize orthogonal error to data points subject to smoothness constraints. Specifically, we construct a two-dimensional principal manifold directly in the high-dimensional space using cubic smoothing splines, and define the embedding coordinates in terms of geodesic distances. Thus, the mapping from the high-dimensional data to the manifold is defined in terms of local coordinates. Through representative examples, we show that compared to existing nonlinear dimensionality reduction methods, the principal manifold retains the original structure even in noisy and sparse datasets. The principal manifold finding algorithm is applied to configurations obtained from a dynamical system of multiple agents simulating a complex maneuver called predator mobbing, and the resulting two-dimensional embedding is compared with that of a well-established nonlinear dimensionality reduction method.

  5. Efficient Optimization of Stimuli for Model-Based Design of Experiments to Resolve Dynamical Uncertainty.

    PubMed

    Mdluli, Thembi; Buzzard, Gregery T; Rundell, Ann E

    2015-09-01

    This model-based design of experiments (MBDOE) method determines the input magnitudes of the experimental stimuli to apply and the associated measurements that should be taken to optimally constrain the uncertain dynamics of a biological system under study. The ideal global solution for this experiment design problem is generally computationally intractable because of parametric uncertainties in the mathematical model of the biological system. Others have addressed this issue by limiting the solution to a local estimate of the model parameters. Here we present an approach that is independent of the local parameter constraint. This approach is made computationally efficient and tractable by the use of: (1) sparse grid interpolation that approximates the biological system dynamics, (2) representative parameters that uniformly represent the data-consistent dynamical space, and (3) probability weights of the represented experimentally distinguishable dynamics. Our approach identifies data-consistent representative parameters using sparse grid interpolants, constructs the optimal input sequence from a greedy search, and defines the associated optimal measurements using a scenario tree. We explore the optimality of this MBDOE algorithm using a 3-dimensional Hes1 model and a 19-dimensional T-cell receptor model. The 19-dimensional T-cell model also demonstrates the MBDOE algorithm's scalability to higher dimensions. In both cases, the dynamical uncertainty region that bounds the trajectories of the target system states was reduced by as much as 86% and 99%, respectively, after completing the designed experiments in silico. Our results suggest that for resolving dynamical uncertainty, the ability to design an input sequence paired with its associated measurements is particularly important when limited by the number of measurements.

  6. Efficient Optimization of Stimuli for Model-Based Design of Experiments to Resolve Dynamical Uncertainty

    PubMed Central

    Mdluli, Thembi; Buzzard, Gregery T.; Rundell, Ann E.

    2015-01-01

    This model-based design of experiments (MBDOE) method determines the input magnitudes of the experimental stimuli to apply and the associated measurements that should be taken to optimally constrain the uncertain dynamics of a biological system under study. The ideal global solution for this experiment design problem is generally computationally intractable because of parametric uncertainties in the mathematical model of the biological system. Others have addressed this issue by limiting the solution to a local estimate of the model parameters. Here we present an approach that is independent of the local parameter constraint. This approach is made computationally efficient and tractable by the use of: (1) sparse grid interpolation that approximates the biological system dynamics, (2) representative parameters that uniformly represent the data-consistent dynamical space, and (3) probability weights of the represented experimentally distinguishable dynamics. Our approach identifies data-consistent representative parameters using sparse grid interpolants, constructs the optimal input sequence from a greedy search, and defines the associated optimal measurements using a scenario tree. We explore the optimality of this MBDOE algorithm using a 3-dimensional Hes1 model and a 19-dimensional T-cell receptor model. The 19-dimensional T-cell model also demonstrates the MBDOE algorithm’s scalability to higher dimensions. In both cases, the dynamical uncertainty region that bounds the trajectories of the target system states was reduced by as much as 86% and 99%, respectively, after completing the designed experiments in silico. Our results suggest that for resolving dynamical uncertainty, the ability to design an input sequence paired with its associated measurements is particularly important when limited by the number of measurements. PMID:26379275

  7. Fracture size and transmissivity correlations: Implications for transport simulations in sparse three-dimensional discrete fracture networks following a truncated power law distribution of fracture size

    DOE PAGES

    Hyman, Jeffrey De'Haven; Aldrich, Garrett Allen; Viswanathan, Hari S.; ...

    2016-08-01

    We characterize how different fracture size-transmissivity relationships influence flow and transport simulations through sparse three-dimensional discrete fracture networks. Although it is generally accepted that there is a positive correlation between a fracture's size and its transmissivity/aperture, the functional form of that relationship remains a matter of debate. Relationships that assume perfect correlation, semicorrelation, and noncorrelation between the two have been proposed. To study the impact that adopting one of these relationships has on transport properties, we generate multiple sparse fracture networks composed of circular fractures whose radii follow a truncated power law distribution. The distributions of transmissivity are selected so that the mean transmissivity of the fracture networks is the same and the distributions of aperture and transmissivity in models that include a stochastic term are also the same. We observe that adopting a correlation between a fracture's size and its transmissivity leads to earlier breakthrough times and higher effective permeability when compared to networks where no correlation is used. While fracture network geometry plays the principal role in determining where transport occurs within the network, the relationship between size and transmissivity controls the flow speed. Lastly, these observations indicate that DFN modelers should be aware that breakthrough times and effective permeabilities can be strongly influenced by such a relationship in addition to fracture and network statistics.
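
    For readers who want to reproduce the setup qualitatively, the sketch below draws fracture radii from a truncated power law (inverse-CDF sampling) and assigns transmissivities under the three relationships named in the abstract. The exponents and scaling constants are illustrative placeholders, not the values used in the paper.

```python
import numpy as np

rng = np.random.default_rng(42)

def truncated_power_law(n, r_min, r_max, alpha):
    """Sample fracture radii from p(r) ~ r^-alpha truncated to [r_min, r_max] (alpha != 1)."""
    u = rng.random(n)
    a = 1.0 - alpha
    return (r_min**a + u * (r_max**a - r_min**a)) ** (1.0 / a)

# illustrative parameters (not the values used in the paper)
r = truncated_power_law(n=10_000, r_min=1.0, r_max=100.0, alpha=2.6)

log_c, b, sigma = -9.0, 1.5, 0.5   # hypothetical scaling constants

# the three size-transmissivity relationships discussed in the abstract
T_correlated     = 10.0**log_c * r**b                                                 # perfect correlation
T_semicorrelated = 10.0**log_c * r**b * 10.0**(sigma * rng.standard_normal(r.size))   # power law + stochastic term
T_uncorrelated   = 10.0**(log_c + sigma * rng.standard_normal(r.size))                # independent of size

for name, T in [("correlated", T_correlated),
                ("semicorrelated", T_semicorrelated),
                ("uncorrelated", T_uncorrelated)]:
    print(f"{name:15s} mean log10(T) = {np.log10(T).mean():.2f}")
```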

  8. Fracture size and transmissivity correlations: Implications for transport simulations in sparse three-dimensional discrete fracture networks following a truncated power law distribution of fracture size

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Hyman, Jeffrey De'Haven; Aldrich, Garrett Allen; Viswanathan, Hari S.

    We characterize how different fracture size-transmissivity relationships influence flow and transport simulations through sparse three-dimensional discrete fracture networks. Although it is generally accepted that there is a positive correlation between a fracture's size and its transmissivity/aperture, the functional form of that relationship remains a matter of debate. Relationships that assume perfect correlation, semicorrelation, and noncorrelation between the two have been proposed. To study the impact that adopting one of these relationships has on transport properties, we generate multiple sparse fracture networks composed of circular fractures whose radii follow a truncated power law distribution. The distributions of transmissivity are selected so that the mean transmissivity of the fracture networks is the same and the distributions of aperture and transmissivity in models that include a stochastic term are also the same. We observe that adopting a correlation between a fracture's size and its transmissivity leads to earlier breakthrough times and higher effective permeability when compared to networks where no correlation is used. While fracture network geometry plays the principal role in determining where transport occurs within the network, the relationship between size and transmissivity controls the flow speed. Lastly, these observations indicate that DFN modelers should be aware that breakthrough times and effective permeabilities can be strongly influenced by such a relationship in addition to fracture and network statistics.

  9. Iteration and superposition encryption scheme for image sequences based on multi-dimensional keys

    NASA Astrophysics Data System (ADS)

    Han, Chao; Shen, Yuzhen; Ma, Wenlin

    2017-12-01

    An iteration and superposition encryption scheme for image sequences based on multi-dimensional keys is proposed for high-security, large-capacity and low-noise information transmission. Multiple images to be encrypted are transformed into phase-only images with the iterative algorithm and are then encrypted by different random phases, respectively. The encrypted phase-only images are then inverse Fourier transformed, generating new object functions. The new functions are placed in different blocks and zero-padded for a sparse distribution; they then propagate to a specific region over different distances by angular spectrum diffraction and are superposed to form a single image. The single image is multiplied by a random phase in the frequency domain, and then the phase part of the frequency spectrum is truncated and the amplitude information is retained. The random phase, the propagation distances, and the truncated phase information in the frequency domain are employed as multi-dimensional keys. The iteration processing and sparse distribution greatly reduce the crosstalk among the multiple encrypted images. The superposition of image sequences greatly improves the capacity of encrypted information. Several numerical experiments based on a designed optical system demonstrate that the proposed scheme can enhance encrypted information capacity and enable image transmission at a highly desired security level.
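
    The propagation step named in the abstract, angular spectrum diffraction over a chosen distance, is standard and can be sketched as below. This is only the generic propagator, not the authors' full iterative phase retrieval, zero-padding and superposition pipeline; the wavelength, pixel pitch and distance are placeholder values.

```python
import numpy as np

def angular_spectrum_propagate(u0, wavelength, dx, z):
    """Propagate a complex field u0 (N x N, sample spacing dx) over a distance z
    using the standard angular spectrum method."""
    n = u0.shape[0]
    fx = np.fft.fftfreq(n, d=dx)
    FX, FY = np.meshgrid(fx, fx)
    # transfer function; evanescent components (negative argument) are suppressed
    arg = 1.0 - (wavelength * FX) ** 2 - (wavelength * FY) ** 2
    H = np.exp(1j * 2 * np.pi / wavelength * z * np.sqrt(np.maximum(arg, 0.0)))
    H[arg < 0] = 0.0
    return np.fft.ifft2(np.fft.fft2(u0) * H)

# toy example: a phase-only object propagated 50 mm (placeholder parameters)
rng = np.random.default_rng(0)
obj = np.exp(1j * 2 * np.pi * rng.random((256, 256)))     # phase-only image
field = angular_spectrum_propagate(obj, wavelength=532e-9, dx=8e-6, z=0.05)
print(field.shape, np.abs(field).mean())
```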

  10. Multiple Kernel Sparse Representation based Orthogonal Discriminative Projection and Its Cost-Sensitive Extension.

    PubMed

    Zhang, Guoqing; Sun, Huaijiang; Xia, Guiyu; Sun, Quansen

    2016-07-07

    Sparse representation based classification (SRC) has been developed and has shown great potential for real-world applications. Based on SRC, Yang et al. [10] devised an SRC-steered discriminative projection (SRC-DP) method. However, as a linear algorithm, SRC-DP cannot handle data with a highly nonlinear distribution. The kernel sparse representation-based classifier (KSRC) is a nonlinear extension of SRC and can remedy this drawback of SRC. KSRC requires the use of a predetermined kernel function, and selection of the kernel function and its parameters is difficult. Recently, multiple kernel learning for SRC (MKL-SRC) [22] has been proposed to learn a kernel from a set of base kernels. However, MKL-SRC only considers the within-class reconstruction residual while ignoring the between-class relationship when learning the kernel weights. In this paper, we propose a novel multiple kernel sparse representation-based classifier (MKSRC), and then we use it as a criterion to design a multiple kernel sparse representation based orthogonal discriminative projection method (MK-SR-ODP). The proposed algorithm aims at learning a projection matrix and a corresponding kernel from the given base kernels such that in the low-dimensional subspace the between-class reconstruction residual is maximized and the within-class reconstruction residual is minimized. Furthermore, to achieve a minimum overall loss by performing recognition in the learned low-dimensional subspace, we introduce cost information into the dimensionality reduction method. The solutions for the proposed method can be efficiently found based on the trace ratio optimization method [33]. Extensive experimental results demonstrate the superiority of the proposed algorithm when compared with the state-of-the-art methods.

  11. Stereo Science Results at Solar Minimum

    NASA Technical Reports Server (NTRS)

    Christian, Eric R.; Kaiser, Michael L.; Kucera Therese A.; St. Cyr, O. C.; van Driel-Gesztelyi, Lidia; Mandrini, Cristina H.

    2009-01-01

    The magnetic fields that drive solar activity are complex and inherently three-dimensional structures. Twisted flux ropes, magnetic reconnection and the initiation of solar storms, as well as space weather propagation through the heliosphere, are just a few of the topics that cannot properly be observed or modeled in only two dimensions. Examination of this three-dimensional complex has been hampered by the fact that solar remote sensing observations have occurred only from the Earth-Sun line, and in situ observations, while available from a greater variety of locations, have been sparse throughout the heliosphere.

  12. Machine Learning Techniques for Global Sensitivity Analysis in Climate Models

    NASA Astrophysics Data System (ADS)

    Safta, C.; Sargsyan, K.; Ricciuto, D. M.

    2017-12-01

    Climate model studies are challenged not only by the compute-intensive nature of these models but also by the high dimensionality of the input parameter space. In our previous work with the land model components (Sargsyan et al., 2014) we identified subsets of 10 to 20 parameters relevant for each QoI via Bayesian compressive sensing and variance-based decomposition. Nevertheless, the algorithms were challenged by the nonlinear input-output dependencies for some of the relevant QoIs. In this work we will explore a combination of techniques to extract relevant parameters for each QoI and subsequently construct surrogate models with quantified uncertainty necessary for future developments, e.g., model calibration and prediction studies. In the first step, we will compare the skill of machine-learning models (e.g., neural networks, support vector machines) to identify the optimal number of classes in selected QoIs and construct robust multi-class classifiers that will partition the parameter space into regions with smooth input-output dependencies. These classifiers will be coupled with techniques aimed at building sparse and/or low-rank surrogate models tailored to each class. Specifically, we will explore and compare sparse learning techniques with low-rank tensor decompositions. These models will be used to identify parameters that are important for each QoI. Surrogate accuracy requirements are higher for subsequent model calibration studies, and we will ascertain the performance of this workflow for multi-site ALM simulation ensembles.

  13. Constructing Surrogate Models of Complex Systems with Enhanced Sparsity: Quantifying the Influence of Conformational Uncertainty in Biomolecular Solvation

    DOE PAGES

    Lei, Huan; Yang, Xiu; Zheng, Bin; ...

    2015-11-05

    Biomolecules exhibit conformational fluctuations near equilibrium states, inducing uncertainty in various biological properties in a dynamic way. We have developed a general method to quantify the uncertainty of target properties induced by conformational fluctuations. Using a generalized polynomial chaos (gPC) expansion, we construct a surrogate model of the target property with respect to varying conformational states. We also propose a method to increase the sparsity of the gPC expansion by defining a set of conformational “active space” random variables. With the increased sparsity, we employ the compressive sensing method to accurately construct the surrogate model. We demonstrate the performance of the surrogate model by evaluating fluctuation-induced uncertainty in solvent-accessible surface area for the bovine trypsin inhibitor protein system and show that the new approach offers more accurate statistical information than standard Monte Carlo approaches. Furthermore, the constructed surrogate model also enables us to directly evaluate the target property under various conformational states, yielding a more accurate response surface than standard sparse grid collocation methods. In particular, the new method provides higher accuracy in high-dimensional systems, such as biomolecules, where sparse grid performance is limited by the accuracy of the computed quantity of interest. Finally, our new framework is generalizable and can be used to investigate the uncertainty of a wide variety of target properties in biomolecular systems.
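
    A minimal sketch of the general recipe, recovering a sparse polynomial chaos surrogate from fewer samples than basis terms with an l1 solver, is shown below for uniform inputs and a Legendre basis. It illustrates the compressive-sensing idea only; the authors' gPC construction, the "active space" variables and the biomolecular target property are not reproduced, and the test function and parameters are assumptions.

```python
import numpy as np
from itertools import product
from numpy.polynomial.legendre import legval
from sklearn.linear_model import Lasso

def legendre_design(X, max_degree):
    """Tensor-product Legendre basis (total degree <= max_degree) evaluated at X in [-1, 1]^d."""
    n, d = X.shape
    multi_indices = [m for m in product(range(max_degree + 1), repeat=d) if sum(m) <= max_degree]
    Phi = np.ones((n, len(multi_indices)))
    for j, m in enumerate(multi_indices):
        for k, deg in enumerate(m):
            c = np.zeros(deg + 1)
            c[deg] = 1.0
            Phi[:, j] *= legval(X[:, k], c)        # univariate Legendre polynomial of degree deg
    return Phi, multi_indices

# surrogate target: a function of 6 uniform inputs with only a few active terms (illustrative)
rng = np.random.default_rng(0)
d, n_samples = 6, 120
X = rng.uniform(-1, 1, size=(n_samples, d))
y = 1.0 + 2.0 * X[:, 0] + 0.8 * X[:, 1] * X[:, 2] + 0.3 * X[:, 3] ** 2

# 210 candidate basis terms recovered from only 120 samples via an l1 (compressive sensing) solver
Phi, mi = legendre_design(X, max_degree=4)
coef = Lasso(alpha=1e-3, fit_intercept=False, max_iter=50_000).fit(Phi, y).coef_

active = [(m, round(c, 3)) for m, c in zip(mi, coef) if abs(c) > 1e-2]
print(f"{Phi.shape[1]} candidate terms, {len(active)} retained:", active[:6])
```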

  14. TENSOR DECOMPOSITIONS AND SPARSE LOG-LINEAR MODELS

    PubMed Central

    Johndrow, James E.; Bhattacharya, Anirban; Dunson, David B.

    2017-01-01

    Contingency table analysis routinely relies on log-linear models, with latent structure analysis providing a common alternative. Latent structure models lead to a reduced rank tensor factorization of the probability mass function for multivariate categorical data, while log-linear models achieve dimensionality reduction through sparsity. Little is known about the relationship between these notions of dimensionality reduction in the two paradigms. We derive several results relating the support of a log-linear model to nonnegative ranks of the associated probability tensor. Motivated by these findings, we propose a new collapsed Tucker class of tensor decompositions, which bridge existing PARAFAC and Tucker decompositions, providing a more flexible framework for parsimoniously characterizing multivariate categorical data. Taking a Bayesian approach to inference, we illustrate empirical advantages of the new decompositions. PMID:29332971

  15. DOE Office of Scientific and Technical Information (OSTI.GOV)

    Tang, Kunkun, E-mail: ktg@illinois.edu; Inria Bordeaux – Sud-Ouest, Team Cardamom, 200 avenue de la Vieille Tour, 33405 Talence; Congedo, Pietro M.

    The Polynomial Dimensional Decomposition (PDD) is employed in this work for the global sensitivity analysis and uncertainty quantification (UQ) of stochastic systems subject to a moderate to large number of input random variables. Due to the intimate connection between the PDD and the Analysis of Variance (ANOVA) approaches, PDD is able to provide a simpler and more direct evaluation of the Sobol' sensitivity indices, when compared to the Polynomial Chaos expansion (PC). Unfortunately, the number of PDD terms grows exponentially with respect to the size of the input random vector, which makes the computational cost of standard methods unaffordable for real engineering applications. In order to address the problem of the curse of dimensionality, this work proposes essentially variance-based adaptive strategies aiming to build a cheap meta-model (i.e. surrogate model) by employing the sparse PDD approach with its coefficients computed by regression. Three levels of adaptivity are carried out in this paper: 1) the truncated dimensionality for ANOVA component functions, 2) the active dimension technique especially for second- and higher-order parameter interactions, and 3) the stepwise regression approach designed to retain only the most influential polynomials in the PDD expansion. During this adaptive procedure featuring stepwise regressions, the surrogate model representation keeps containing few terms, so that the cost to resolve repeatedly the linear systems of the least-squares regression problem is negligible. The size of the finally obtained sparse PDD representation is much smaller than the one of the full expansion, since only significant terms are eventually retained. Consequently, a much smaller number of calls to the deterministic model is required to compute the final PDD coefficients.

  16. Multiclass classification of microarray data samples with a reduced number of genes

    PubMed Central

    2011-01-01

    Background Multiclass classification of microarray data samples with a reduced number of genes is a rich and challenging problem in Bioinformatics research. The problem gets harder as the number of classes is increased. In addition, the performance of most classifiers is tightly linked to the effectiveness of mandatory gene selection methods. Critical to gene selection is the availability of estimates about the maximum number of genes that can be handled by any classification algorithm. Lack of such estimates may lead to either computationally demanding explorations of a search space with thousands of dimensions or classification models based on gene sets of unrestricted size. In the former case, unbiased but possibly overfitted classification models may arise. In the latter case, biased classification models unable to support statistically significant findings may be obtained. Results A novel bound on the maximum number of genes that can be handled by binary classifiers in binary mediated multiclass classification algorithms of microarray data samples is presented. The bound suggests that high-dimensional binary output domains might favor the existence of accurate and sparse binary mediated multiclass classifiers for microarray data samples. Conclusions A comprehensive experimental work shows that the bound is indeed useful to induce accurate and sparse multiclass classifiers for microarray data samples. PMID:21342522

  17. Spatio-temporal Event Classification using Time-series Kernel based Structured Sparsity

    PubMed Central

    Jeni, László A.; Lőrincz, András; Szabó, Zoltán; Cohn, Jeffrey F.; Kanade, Takeo

    2016-01-01

    In many behavioral domains, such as facial expression and gesture, sparse structure is prevalent. This sparsity would be well suited for event detection but for one problem. Features typically are confounded by alignment error in space and time. As a consequence, high-dimensional representations such as SIFT and Gabor features have been favored despite their much greater computational cost and potential loss of information. We propose a Kernel Structured Sparsity (KSS) method that can handle both the temporal alignment problem and the structured sparse reconstruction within a common framework, and it can rely on simple features. We characterize spatio-temporal events as time-series of motion patterns and by utilizing time-series kernels we apply standard structured-sparse coding techniques to tackle this important problem. We evaluated the KSS method using both gesture and facial expression datasets that include spontaneous behavior and differ in degree of difficulty and type of ground truth coding. KSS outperformed both sparse and non-sparse methods that utilize complex image features and their temporal extensions. In the case of early facial event classification, KSS had 10% higher accuracy as measured by F1 score over kernel SVM methods. PMID:27830214

  18. Quantile Regression for Analyzing Heterogeneity in Ultra-high Dimension

    PubMed Central

    Wang, Lan; Wu, Yichao

    2012-01-01

    Ultra-high dimensional data often display heterogeneity due to either heteroscedastic variance or other forms of non-location-scale covariate effects. To accommodate heterogeneity, we advocate a more general interpretation of sparsity which assumes that only a small number of covariates influence the conditional distribution of the response variable given all candidate covariates; however, the sets of relevant covariates may differ when we consider different segments of the conditional distribution. In this framework, we investigate the methodology and theory of nonconvex penalized quantile regression in ultra-high dimension. The proposed approach has two distinctive features: (1) it enables us to explore the entire conditional distribution of the response variable given the ultra-high dimensional covariates and provides a more realistic picture of the sparsity pattern; (2) it requires substantially weaker conditions compared with alternative methods in the literature; thus, it greatly alleviates the difficulty of model checking in the ultra-high dimension. In the theoretical development, it is challenging to deal with both the nonsmooth loss function and the nonconvex penalty function in ultra-high dimensional parameter space. We introduce a novel sufficient optimality condition which relies on a convex differencing representation of the penalized loss function and the subdifferential calculus. Exploring this optimality condition enables us to establish the oracle property for sparse quantile regression in the ultra-high dimension under relaxed conditions. The proposed method greatly enhances existing tools for ultra-high dimensional data analysis. Monte Carlo simulations demonstrate the usefulness of the proposed procedure. The real data example we analyzed demonstrates that the new approach reveals substantially more information compared with alternative methods. PMID:23082036

  19. Spatial radiation environment in a heterogeneous oak woodland using a three-dimensional radiative transfer model and multiple constraints from observations

    NASA Astrophysics Data System (ADS)

    Kobayashi, H.; Ryu, Y.; Ustin, S.; Baldocchi, D. D.

    2009-12-01

    Accurate evaluations of radiation environments in the visible, near infrared, and thermal infrared wavebands in forest canopies are important to estimate energy, water, and carbon fluxes. Californian oak woodlands are sparse and highly clumped, so that radiation environments are extremely heterogeneous spatially. The heterogeneity of radiation environments also varies with waveband, depending on scattering and emission properties. So far, most modeling studies have been performed with one-dimensional radiative transfer models with (or without) a clumping effect in the forest canopies. While some studies have been performed using three-dimensional radiative transfer models, several issues are still unresolved. For example, some 3D models calculate the radiation field on an individual-tree basis, and radiation interactions among trees are not considered. This interaction could be important in highly scattering wavebands such as the near infrared. The objective of this study is to quantify the radiation field in the oak woodland. We developed a three-dimensional radiative transfer model, which includes the thermal waveband. Soil/canopy energy balance and canopy physiology models (CANOAK) are incorporated in the radiative transfer model to simulate the diurnal patterns of thermal radiation fields and canopy physiology. Airborne LiDAR and canopy gap data measured by several methods (digital photographs and a plant canopy analyzer) were used to constrain forest structures such as tree positions, crown sizes and leaf area density. Modeling results were tested by a traversing radiometer system that measured incoming photosynthetically active radiation and net radiation at the forest floor, and by spatial variations in canopy reflectances taken by the Airborne Visible/Infrared Imaging Spectrometer (AVIRIS). In this study, we show how the model with available measurements can reproduce the spatially heterogeneous radiation environments in the oak woodland.

  20. SAMBA: Sparse Approximation of Moment-Based Arbitrary Polynomial Chaos

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Ahlfeld, R., E-mail: r.ahlfeld14@imperial.ac.uk; Belkouchi, B.; Montomoli, F.

    2016-09-01

    A new arbitrary Polynomial Chaos (aPC) method is presented for moderately high-dimensional problems characterised by limited input data availability. The proposed methodology improves the algorithm of aPC and extends the method, that was previously only introduced as tensor product expansion, to moderately high-dimensional stochastic problems. The fundamental idea of aPC is to use the statistical moments of the input random variables to develop the polynomial chaos expansion. This approach provides the possibility to propagate continuous or discrete probability density functions and also histograms (data sets) as long as their moments exist, are finite and the determinant of the moment matrix is strictly positive. For cases with limited data availability, this approach avoids bias and fitting errors caused by wrong assumptions. In this work, an alternative way to calculate the aPC is suggested, which provides the optimal polynomials, Gaussian quadrature collocation points and weights from the moments using only a handful of matrix operations on the Hankel matrix of moments. It can therefore be implemented without requiring prior knowledge about statistical data analysis or a detailed understanding of the mathematics of polynomial chaos expansions. The extension to more input variables suggested in this work, is an anisotropic and adaptive version of Smolyak's algorithm that is solely based on the moments of the input probability distributions. It is referred to as SAMBA (PC), which is short for Sparse Approximation of Moment-Based Arbitrary Polynomial Chaos. It is illustrated that for moderately high-dimensional problems (up to 20 different input variables or histograms) SAMBA can significantly simplify the calculation of sparse Gaussian quadrature rules. SAMBA's efficiency for multivariate functions with regard to data availability is further demonstrated by analysing higher order convergence and accuracy for a set of nonlinear test functions with 2, 5 and 10 different input distributions or histograms.

  1. SAMBA: Sparse Approximation of Moment-Based Arbitrary Polynomial Chaos

    NASA Astrophysics Data System (ADS)

    Ahlfeld, R.; Belkouchi, B.; Montomoli, F.

    2016-09-01

    A new arbitrary Polynomial Chaos (aPC) method is presented for moderately high-dimensional problems characterised by limited input data availability. The proposed methodology improves the algorithm of aPC and extends the method, that was previously only introduced as tensor product expansion, to moderately high-dimensional stochastic problems. The fundamental idea of aPC is to use the statistical moments of the input random variables to develop the polynomial chaos expansion. This approach provides the possibility to propagate continuous or discrete probability density functions and also histograms (data sets) as long as their moments exist, are finite and the determinant of the moment matrix is strictly positive. For cases with limited data availability, this approach avoids bias and fitting errors caused by wrong assumptions. In this work, an alternative way to calculate the aPC is suggested, which provides the optimal polynomials, Gaussian quadrature collocation points and weights from the moments using only a handful of matrix operations on the Hankel matrix of moments. It can therefore be implemented without requiring prior knowledge about statistical data analysis or a detailed understanding of the mathematics of polynomial chaos expansions. The extension to more input variables suggested in this work, is an anisotropic and adaptive version of Smolyak's algorithm that is solely based on the moments of the input probability distributions. It is referred to as SAMBA (PC), which is short for Sparse Approximation of Moment-Based Arbitrary Polynomial Chaos. It is illustrated that for moderately high-dimensional problems (up to 20 different input variables or histograms) SAMBA can significantly simplify the calculation of sparse Gaussian quadrature rules. SAMBA's efficiency for multivariate functions with regard to data availability is further demonstrated by analysing higher order convergence and accuracy for a set of nonlinear test functions with 2, 5 and 10 different input distributions or histograms.
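
    The core of the moment-based construction, turning raw moments into Gaussian quadrature nodes and weights via the Hankel matrix, can be sketched with the classical Cholesky/Golub-Welsch route shown below (checked against Gauss-Legendre for a uniform density). This is only the one-dimensional building block, not SAMBA's anisotropic, adaptive Smolyak extension, and the function name and example are illustrative.

```python
import numpy as np

def gauss_from_moments(moments, n_points):
    """Gaussian quadrature nodes/weights from raw moments m_0..m_{2n} of a measure,
    via Cholesky of the Hankel moment matrix and the Golub-Welsch eigenvalue step."""
    n = n_points
    H = np.array([[moments[i + j] for j in range(n + 1)] for i in range(n + 1)])
    R = np.linalg.cholesky(H).T                      # upper-triangular factor, H = R^T R
    # three-term recurrence coefficients of the monic orthogonal polynomials
    alpha = np.zeros(n)
    beta = np.zeros(n - 1)
    alpha[0] = R[0, 1] / R[0, 0]
    for j in range(1, n):
        alpha[j] = R[j, j + 1] / R[j, j] - R[j - 1, j] / R[j - 1, j - 1]
    for j in range(1, n):
        beta[j - 1] = R[j, j] / R[j - 1, j - 1]
    J = np.diag(alpha) + np.diag(beta, 1) + np.diag(beta, -1)   # Jacobi matrix
    nodes, vecs = np.linalg.eigh(J)
    weights = moments[0] * vecs[0, :] ** 2
    return nodes, weights

# check against Gauss-Legendre for the uniform density 1/2 on [-1, 1]:
# raw moments are m_k = 1/(k+1) for even k and 0 for odd k
m = [1.0 / (k + 1) if k % 2 == 0 else 0.0 for k in range(2 * 4 + 1)]
nodes, weights = gauss_from_moments(m, n_points=4)
print(nodes)          # should match numpy.polynomial.legendre.leggauss(4) nodes
print(2 * weights)    # should match leggauss weights after undoing the 1/2 density normalization
```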

  2. Enhanced spectral resolution by high-dimensional NMR using the filter diagonalization method and "hidden" dimensions.

    PubMed

    Meng, Xi; Nguyen, Bao D; Ridge, Clark; Shaka, A J

    2009-01-01

    High-dimensional (HD) NMR spectra have poorer digital resolution than low-dimensional (LD) spectra, for a fixed amount of experiment time. This has led to "reduced-dimensionality" strategies, in which several LD projections of the HD NMR spectrum are acquired, each with higher digital resolution; an approximate HD spectrum is then inferred by some means. We propose a strategy that moves in the opposite direction, by adding more time dimensions to increase the information content of the data set, even if only a very sparse time grid is used in each dimension. The full HD time-domain data can be analyzed by the filter diagonalization method (FDM), yielding very narrow resonances along all of the frequency axes, even those with sparse sampling. Integrating over the added dimensions of HD FDM NMR spectra reconstitutes LD spectra with enhanced resolution, often more quickly than direct acquisition of the LD spectrum with a larger number of grid points in each of the fewer dimensions. If the extra-dimensions do not appear in the final spectrum, and are used solely to boost information content, we propose the moniker hidden-dimension NMR. This work shows that HD peaks have unmistakable frequency signatures that can be detected as single HD objects by an appropriate algorithm, even though their patterns would be tricky for a human operator to visualize or recognize, and even if digital resolution in an HD FT spectrum is very coarse compared with natural line widths.

  3. Methods of Sparse Modeling and Dimensionality Reduction to Deal with Big Data

    DTIC Science & Technology

    2015-04-01

    supervised learning (c). Our framework consists of two separate phases: (a) first find an initial space in an unsupervised manner; then (b) utilize label...model that can learn thousands of topics from a large set of documents and infer the topic mixture of each document, 2) a supervised dimension reduction...model that can learn thousands of topics from a large set of documents and infer the topic mixture of each document, (i) a method of supervised

  4. The Research on Denoising of SAR Image Based on Improved K-SVD Algorithm

    NASA Astrophysics Data System (ADS)

    Tan, Linglong; Li, Changkai; Wang, Yueqin

    2018-04-01

    SAR images often receive noise interference in the process of acquisition and transmission, which can greatly reduce the quality of the images and cause great difficulties for image processing. The existing complete DCT dictionary algorithm is fast in processing speed, but its denoising effect is poor. To address this poor denoising performance, this paper applies the proposed K-SVD (K-means and singular value decomposition) algorithm to image noise suppression. Firstly, the sparse dictionary structure is introduced in detail. The dictionary has a compact representation and can effectively train the image signal. Then, the sparse dictionary is trained by the K-SVD algorithm according to the sparse representation of the dictionary. The algorithm has more advantages in high-dimensional data processing. Experimental results show that the proposed algorithm can remove speckle noise more effectively than the complete DCT dictionary and better retain the edge details.
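
    For orientation, a plain K-SVD iteration (OMP sparse coding followed by per-atom SVD updates) is sketched below on random patches. The paper's improved K-SVD variant and its SAR-specific processing are not reproduced; the patch size, dictionary size and sparsity level are illustrative assumptions.

```python
import numpy as np
from sklearn.linear_model import orthogonal_mp

def ksvd(Y, n_atoms=64, sparsity=5, n_iter=10, seed=0):
    """Plain K-SVD dictionary learning on column-stacked patches Y (dim x n_samples)."""
    rng = np.random.default_rng(seed)
    D = Y[:, rng.choice(Y.shape[1], n_atoms, replace=False)].astype(float)
    D /= np.linalg.norm(D, axis=0, keepdims=True) + 1e-12
    for _ in range(n_iter):
        # sparse coding stage: OMP with a fixed number of nonzeros per patch
        X = orthogonal_mp(D, Y, n_nonzero_coefs=sparsity)
        # dictionary update stage: one SVD per atom, restricted to the patches that use it
        for k in range(n_atoms):
            users = np.flatnonzero(X[k, :])
            if users.size == 0:
                continue
            X[k, users] = 0.0
            E = Y[:, users] - D @ X[:, users]        # residual without atom k
            U, s, Vt = np.linalg.svd(E, full_matrices=False)
            D[:, k] = U[:, 0]
            X[k, users] = s[0] * Vt[0, :]
    return D, X

# toy usage: 8x8 "patches" drawn at random (a stand-in for SAR image patches)
rng = np.random.default_rng(1)
patches = rng.standard_normal((64, 2000))
D, X = ksvd(patches, n_atoms=128, sparsity=4, n_iter=5)
print(D.shape, np.mean(np.count_nonzero(X, axis=0)))
```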

  5. A Sparse Bayesian Approach for Forward-Looking Superresolution Radar Imaging

    PubMed Central

    Zhang, Yin; Zhang, Yongchao; Huang, Yulin; Yang, Jianyu

    2017-01-01

    This paper presents a sparse superresolution approach for high cross-range resolution imaging of forward-looking scanning radar based on the Bayesian criterion. First, a novel forward-looking signal model is established as the product of the measurement matrix and the cross-range target distribution, which is more accurate than the conventional convolution model. Then, based on the Bayesian criterion, the widely-used sparse regularization is considered as the penalty term to recover the target distribution. The derivation of the cost function is described, and finally, an iterative expression for minimizing this function is presented. Alternatively, this paper discusses how to estimate the single parameter of Gaussian noise. With the advantage of a more accurate model, the proposed sparse Bayesian approach enjoys a lower model error. Meanwhile, when compared with the conventional superresolution methods, the proposed approach shows high cross-range resolution and small location error. The superresolution results for the simulated point target, scene data, and real measured data are presented to demonstrate the superior performance of the proposed approach. PMID:28604583
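
    As a stand-in for the paper's Bayesian iteration, the sketch below solves the generic sparse deconvolution problem behind the same signal model, a cross-range target distribution smeared by a beam-pattern measurement matrix, with plain iterative soft thresholding (ISTA). The beam pattern, regularization weight and scene are assumptions used only to make the example run; this is not the authors' algorithm.

```python
import numpy as np

def ista(A, y, lam=0.05, n_iter=2000):
    """Iterative soft-thresholding for min_x 0.5 * ||y - A x||^2 + lam * ||x||_1."""
    L = np.linalg.norm(A, 2) ** 2                 # Lipschitz constant of the smooth part
    x = np.zeros(A.shape[1])
    for _ in range(n_iter):
        g = x + (A.T @ (y - A @ x)) / L           # gradient step
        x = np.sign(g) * np.maximum(np.abs(g) - lam / L, 0.0)   # soft threshold
    return x

# toy forward-looking model: a sparse cross-range profile smeared by a beam-pattern matrix
rng = np.random.default_rng(0)
n = 200
beam = np.exp(-0.5 * (np.arange(-15, 16) / 5.0) ** 2)          # illustrative antenna pattern
A = np.array([np.convolve(np.eye(n)[i], beam, mode="same") for i in range(n)]).T
x_true = np.zeros(n)
x_true[[50, 60, 140]] = [1.0, 0.7, 1.2]                         # point targets
y = A @ x_true + 0.01 * rng.standard_normal(n)

x_hat = ista(A, y, lam=0.02)
print("estimated support (values above 0.3):", np.flatnonzero(x_hat > 0.3))
```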

  6. Direct coupling of tomography and ptychography

    DOE PAGES

    Gürsoy, Doğa

    2017-08-09

    We present a generalization of the ptychographic phase problem for recovering refractive properties of a three-dimensional object in a tomography setting. Our approach, which ignores the lateral overlapping probe requirements in existing ptychography algorithms, can enable the reconstruction of objects using highly flexible acquisition patterns and pave the way for sparse and rapid data collection with lower radiation exposure.

  7. QUADRO: A SUPERVISED DIMENSION REDUCTION METHOD VIA RAYLEIGH QUOTIENT OPTIMIZATION.

    PubMed

    Fan, Jianqing; Ke, Zheng Tracy; Liu, Han; Xia, Lucy

    We propose a novel Rayleigh quotient based sparse quadratic dimension reduction method-named QUADRO (Quadratic Dimension Reduction via Rayleigh Optimization)-for analyzing high-dimensional data. Unlike in the linear setting where Rayleigh quotient optimization coincides with classification, these two problems are very different under nonlinear settings. In this paper, we clarify this difference and show that Rayleigh quotient optimization may be of independent scientific interests. One major challenge of Rayleigh quotient optimization is that the variance of quadratic statistics involves all fourth cross-moments of predictors, which are infeasible to compute for high-dimensional applications and may accumulate too many stochastic errors. This issue is resolved by considering a family of elliptical models. Moreover, for heavy-tail distributions, robust estimates of mean vectors and covariance matrices are employed to guarantee uniform convergence in estimating non-polynomially many parameters, even though only the fourth moments are assumed. Methodologically, QUADRO is based on elliptical models which allow us to formulate the Rayleigh quotient maximization as a convex optimization problem. Computationally, we propose an efficient linearized augmented Lagrangian method to solve the constrained optimization problem. Theoretically, we provide explicit rates of convergence in terms of Rayleigh quotient under both Gaussian and general elliptical models. Thorough numerical results on both synthetic and real datasets are also provided to back up our theoretical results.

  8. Estimating the Uncertainty and Predictive Capabilities of Three-Dimensional Earth Models (Postprint)

    DTIC Science & Technology

    2012-03-22

    www.isc.ac.uk). This global database includes more than 7,000 events whose epicentral location accuracy is known to at least 5 km. GT events with...region, which illustrates the difficulty of validating a model with travel times alone. However, the IASPEI REL database is currently the highest...S (right) paths in the IASPEI REL ground-truth database . Stations are represented by purple triangles and events by gray circles. Note the sparse

  9. Data-driven cluster reinforcement and visualization in sparsely-matched self-organizing maps.

    PubMed

    Manukyan, Narine; Eppstein, Margaret J; Rizzo, Donna M

    2012-05-01

    A self-organizing map (SOM) is a self-organized projection of high-dimensional data onto a typically 2-dimensional (2-D) feature map, wherein vector similarity is implicitly translated into topological closeness in the 2-D projection. However, when there are more neurons than input patterns, it can be challenging to interpret the results, due to diffuse cluster boundaries and limitations of current methods for displaying interneuron distances. In this brief, we introduce a new cluster reinforcement (CR) phase for sparsely-matched SOMs. The CR phase amplifies within-cluster similarity in an unsupervised, data-driven manner. Discontinuities in the resulting map correspond to between-cluster distances and are stored in a boundary (B) matrix. We describe a new hierarchical visualization of cluster boundaries displayed directly on feature maps, which requires no further clustering beyond what was implicitly accomplished during self-organization in SOM training. We use a synthetic benchmark problem and previously published microbial community profile data to demonstrate the benefits of the proposed methods.

  10. A non-linear dimension reduction methodology for generating data-driven stochastic input models

    NASA Astrophysics Data System (ADS)

    Ganapathysubramanian, Baskar; Zabaras, Nicholas

    2008-06-01

    Stochastic analysis of random heterogeneous media (polycrystalline materials, porous media, functionally graded materials) provides information of significance only if realistic input models of the topology and property variations are used. This paper proposes a framework to construct such input stochastic models for the topology and thermal diffusivity variations in heterogeneous media using a data-driven strategy. Given a set of microstructure realizations (input samples) generated from given statistical information about the medium topology, the framework constructs a reduced-order stochastic representation of the thermal diffusivity. This problem of constructing a low-dimensional stochastic representation of property variations is analogous to the problem of manifold learning and parametric fitting of hyper-surfaces encountered in image processing and psychology. Denote by M the set of microstructures that satisfy the given experimental statistics. A non-linear dimension reduction strategy is utilized to map M to a low-dimensional region, A. We first show that M is a compact manifold embedded in a high-dimensional input space R^n. An isometric mapping F from M to a low-dimensional, compact, connected set A ⊂ R^d (d ≪ n) is constructed. Given only a finite set of samples of the data, the methodology uses arguments from graph theory and differential geometry to construct the isometric transformation F: M → A. Asymptotic convergence of the representation of M by A is shown. This mapping F serves as an accurate, low-dimensional, data-driven representation of the property variations. The reduced-order model of the material topology and thermal diffusivity variations is subsequently used as an input in the solution of stochastic partial differential equations that describe the evolution of dependent variables. A sparse grid collocation strategy (Smolyak algorithm) is utilized to solve these stochastic equations efficiently. We showcase the methodology by constructing low-dimensional input stochastic models to represent thermal diffusivity in two-phase microstructures. This model is used in analyzing the effect of topological variations of two-phase microstructures on the evolution of temperature in heat conduction processes.
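
    The graph-theoretic isometric mapping described here follows the familiar Isomap construction (kNN graph, geodesic distances, classical MDS); a compact sketch on a synthetic manifold embedded in a higher-dimensional space is given below. It is not the authors' implementation, and the swiss-roll stand-in for microstructure samples is an assumption.

```python
import numpy as np
from sklearn.neighbors import kneighbors_graph
from scipy.sparse.csgraph import shortest_path

def isomap_embed(X, n_neighbors=10, d=2):
    """Isomap-style embedding: kNN graph -> geodesic distances -> classical MDS."""
    G = kneighbors_graph(X, n_neighbors=n_neighbors, mode="distance")
    D = shortest_path(G, method="D", directed=False)          # geodesic distance matrix
    n = D.shape[0]
    J = np.eye(n) - np.ones((n, n)) / n                       # centering matrix
    B = -0.5 * J @ (D ** 2) @ J                               # double-centered Gram matrix
    w, V = np.linalg.eigh(B)
    idx = np.argsort(w)[::-1][:d]                             # top d eigenpairs
    return V[:, idx] * np.sqrt(np.maximum(w[idx], 0.0))

# toy samples living near a curved 2-D manifold, embedded in a 50-dimensional input space
rng = np.random.default_rng(0)
t = rng.uniform(0, 3 * np.pi, 400)
h = rng.uniform(0, 5, 400)
swiss = np.c_[t * np.cos(t), h, t * np.sin(t)]
X = swiss @ rng.standard_normal((3, 50))                      # lift to a high-dimensional space
A = isomap_embed(X, n_neighbors=12, d=2)
print(A.shape)
```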

  11. Modeling of the gain distribution for diode pumping of a solid-state laser rod with nonimaging optics.

    PubMed

    Koshel, R J; Walmsley, I A

    1993-03-20

    We investigate the absorption distribution in a cylindrical gain medium that is pumped by a source of distributed laser diodes by means of a pump cavity developed from the edge-ray principle of nonimaging optics. The performance of this pumping arrangement is studied by using a nonsequential, numerical, three-dimensional ray-tracing scheme. A figure of merit is defined for the pump cavities that takes into account the coupling efficiency and uniformity of the absorption distribution. It is found that the nonimaging pump cavity maintains a high coupling efficiency with extended two-dimensional diode arrays and obtains a fairly uniform absorption distribution. The nonimaging cavity is compared with two other designs: a close-coupled side-pumped cavity and an imaging design in the form of an elliptical cavity. The nonimaging cavity has a better figure of merit per diode than these two designs. It also permits the use of an extended, sparse, two-dimensional diode array, which reduces thermal loading of the source and eliminates all cavity optics other than the main reflector.

  12. Implicit kernel sparse shape representation: a sparse-neighbors-based object segmentation framework.

    PubMed

    Yao, Jincao; Yu, Huimin; Hu, Roland

    2017-01-01

    This paper introduces a new implicit-kernel-sparse-shape-representation-based object segmentation framework. Given an input object whose shape is similar to some of the elements in the training set, the proposed model can automatically find a cluster of implicit kernel sparse neighbors to approximately represent the input shape and guide the segmentation. A distance-constrained probabilistic definition together with a dualization energy term is developed to connect high-level shape representation and low-level image information. We theoretically prove that our model not only derives from two projected convex sets but is also equivalent to a sparse-reconstruction-error-based representation in the Hilbert space. Finally, a "wake-sleep"-based segmentation framework is applied to drive the evolutionary curve to recover the original shape of the object. We test our model on two public datasets. Numerical experiments on both synthetic images and real applications show the superior capabilities of the proposed framework.

  13. ESTIMATION OF FUNCTIONALS OF SPARSE COVARIANCE MATRICES.

    PubMed

    Fan, Jianqing; Rigollet, Philippe; Wang, Weichen

    High-dimensional statistical tests often ignore correlations to gain simplicity and stability, leading to null distributions that depend on functionals of correlation matrices such as their Frobenius norm and other ℓr norms. Motivated by the computation of critical values of such tests, we investigate the difficulty of estimating functionals of sparse correlation matrices. Specifically, we show that simple plug-in procedures based on thresholded estimators of correlation matrices are sparsity-adaptive and minimax optimal over a large class of correlation matrices. Akin to previous results on functional estimation, the minimax rates exhibit an elbow phenomenon. Our results are further illustrated in simulated data as well as an empirical study of data arising in financial econometrics.

  14. ESTIMATION OF FUNCTIONALS OF SPARSE COVARIANCE MATRICES

    PubMed Central

    Fan, Jianqing; Rigollet, Philippe; Wang, Weichen

    2016-01-01

    High-dimensional statistical tests often ignore correlations to gain simplicity and stability, leading to null distributions that depend on functionals of correlation matrices such as their Frobenius norm and other ℓr norms. Motivated by the computation of critical values of such tests, we investigate the difficulty of estimating functionals of sparse correlation matrices. Specifically, we show that simple plug-in procedures based on thresholded estimators of correlation matrices are sparsity-adaptive and minimax optimal over a large class of correlation matrices. Akin to previous results on functional estimation, the minimax rates exhibit an elbow phenomenon. Our results are further illustrated in simulated data as well as an empirical study of data arising in financial econometrics. PMID:26806986

  15. SD-SEM: sparse-dense correspondence for 3D reconstruction of microscopic samples.

    PubMed

    Baghaie, Ahmadreza; Tafti, Ahmad P; Owen, Heather A; D'Souza, Roshan M; Yu, Zeyun

    2017-06-01

    Scanning electron microscopy (SEM) imaging has been a principal component of many studies in biomedical, mechanical, and materials sciences since its emergence. Despite the high resolution of captured images, they remain two-dimensional (2D). In this work, a novel framework using sparse-dense correspondence is introduced and investigated for 3D reconstruction of stereo SEM images. SEM micrographs from microscopic samples are captured by tilting the specimen stage by a known angle. The pair of SEM micrographs is then rectified using sparse scale invariant feature transform (SIFT) features/descriptors and a contrario RANSAC for matching outlier removal to ensure a gross horizontal displacement between corresponding points. This is followed by dense correspondence estimation using dense SIFT descriptors and employing a factor graph representation of the energy minimization functional and loopy belief propagation (LBP) as means of optimization. Given the pixel-by-pixel correspondence and the tilt angle of the specimen stage during the acquisition of micrographs, depth can be recovered. Extensive tests reveal the strength of the proposed method for high-quality reconstruction of microscopic samples. Copyright © 2017 Elsevier Ltd. All rights reserved.

  16. Three-dimensional image authentication scheme using sparse phase information in double random phase encoded integral imaging.

    PubMed

    Yi, Faliu; Jeoung, Yousun; Moon, Inkyu

    2017-05-20

    In recent years, many studies have focused on authentication of two-dimensional (2D) images using double random phase encryption techniques. However, there has been little research on three-dimensional (3D) imaging systems, such as integral imaging, for 3D image authentication. We propose a 3D image authentication scheme based on a double random phase integral imaging method. All of the 2D elemental images captured through integral imaging are encrypted with a double random phase encoding algorithm and only partial phase information is reserved. All the amplitude and other miscellaneous phase information in the encrypted elemental images is discarded. Nevertheless, we demonstrate that 3D images from integral imaging can be authenticated at different depths using a nonlinear correlation method. The proposed 3D image authentication algorithm can provide enhanced information security because the decrypted 2D elemental images from the sparse phase cannot be easily observed by the naked eye. Additionally, using sparse phase images without any amplitude information can greatly reduce data storage costs and aid in image compression and data transmission.
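
    The classical Fourier-domain double random phase encoding step, retention of a sparse subset of the encrypted phase, and a kth-law nonlinear correlation check can be sketched as follows. This is a generic 2D illustration of those ingredients, not the authors' integral-imaging pipeline; the mask sizes, the retained-phase fraction and the exponent k are placeholder choices.

```python
import numpy as np

rng = np.random.default_rng(0)

def drpe_encrypt(img, m1, m2):
    """Double random phase encoding: input-plane mask m1, Fourier-plane mask m2."""
    return np.fft.ifft2(np.fft.fft2(img * np.exp(2j * np.pi * m1)) * np.exp(2j * np.pi * m2))

def drpe_decrypt(enc, m1, m2):
    return np.fft.ifft2(np.fft.fft2(enc) * np.exp(-2j * np.pi * m2)) * np.exp(-2j * np.pi * m1)

def nonlinear_correlation(a, b, k=0.3):
    """kth-law nonlinear correlation used to authenticate noisy decryptions."""
    Fa, Fb = np.fft.fft2(a), np.fft.fft2(b)
    cross = Fa * np.conj(Fb)
    return np.abs(np.fft.ifft2(np.abs(cross) ** k * np.exp(1j * np.angle(cross)))) ** 2

# toy elemental image and the two random phase masks (keys)
img = np.zeros((128, 128))
img[40:90, 50:80] = 1.0
m1, m2 = rng.random((128, 128)), rng.random((128, 128))

enc = drpe_encrypt(img, m1, m2)

# keep only a small fraction of the phase information (sparse phase), discard the amplitude
keep = rng.random(enc.shape) < 0.1
sparse_phase = np.where(keep, np.exp(1j * np.angle(enc)), 0.0)

dec = drpe_decrypt(sparse_phase, m1, m2)          # visually unrecognizable but correlatable
c = nonlinear_correlation(np.abs(dec), img)
print("correlation peak-to-mean ratio:", c.max() / c.mean())
```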

  17. Sparse generalized linear model with L0 approximation for feature selection and prediction with big omics data.

    PubMed

    Liu, Zhenqiu; Sun, Fengzhu; McGovern, Dermot P

    2017-01-01

    Feature selection and prediction are the most important tasks for big data mining. The common strategies for feature selection in big data mining are L1, SCAD and MC+. However, none of the existing algorithms optimizes L0, which penalizes the number of nonzero features directly. In this paper, we develop a novel sparse generalized linear model (GLM) with L0 approximation for feature selection and prediction with big omics data. The proposed approach approximates the L0 optimization directly. Even though the original L0 problem is non-convex, the problem is approximated by sequential convex optimizations with the proposed algorithm. The proposed method is easy to implement with only several lines of code. Novel adaptive ridge algorithms (L0ADRIDGE) for L0-penalized GLM with ultra-high dimensional big data are developed. The proposed approach outperforms the other cutting-edge regularization methods, including SCAD and MC+, in simulations. When it is applied to integrated analysis of mRNA, microRNA, and methylation data from TCGA ovarian cancer, multilevel gene signatures associated with suboptimal debulking are identified simultaneously. The biological significance and potential clinical importance of those genes are further explored. The developed software L0ADRIDGE in MATLAB is available at https://github.com/liuzqx/L0adridge.
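
    The L0ADRIDGE software itself is at the repository linked above; purely for intuition, the sketch below shows the generic adaptive-ridge idea the abstract alludes to, iteratively reweighted ridge whose weights 1/(b_j^2 + delta^2) approximate an L0 penalty, for a Gaussian-response model. It is not the authors' code, and the problem sizes, lambda and delta are assumptions.

```python
import numpy as np

def adaptive_ridge_l0(X, y, lam=2.0, delta=1e-4, n_iter=50):
    """Approximate min ||y - X b||^2 + lam * ||b||_0 by iteratively reweighted ridge
    with weights w_j = 1 / (b_j^2 + delta^2), so lam * sum_j w_j * b_j^2 ~ lam * ||b||_0."""
    n, p = X.shape
    b = np.linalg.solve(X.T @ X + np.eye(p), X.T @ y)      # plain ridge warm start
    for _ in range(n_iter):
        w = 1.0 / (b ** 2 + delta ** 2)
        b = np.linalg.solve(X.T @ X + lam * np.diag(w), X.T @ y)
    b[np.abs(b) < 10 * delta] = 0.0                         # prune numerically dead coefficients
    return b

# toy "omics-like" setting: many features, few truly relevant (illustrative only)
rng = np.random.default_rng(0)
n, p = 200, 1000
X = rng.standard_normal((n, p))
beta_true = np.zeros(p)
beta_true[:5] = [3.0, -2.0, 1.5, 2.5, -1.0]
y = X @ beta_true + 0.5 * rng.standard_normal(n)

b_hat = adaptive_ridge_l0(X, y)
print("number selected:", np.count_nonzero(b_hat),
      " first selected indices:", np.flatnonzero(b_hat)[:10])
```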

  18. Predictive sparse modeling of fMRI data for improved classification, regression, and visualization using the k-support norm.

    PubMed

    Belilovsky, Eugene; Gkirtzou, Katerina; Misyrlis, Michail; Konova, Anna B; Honorio, Jean; Alia-Klein, Nelly; Goldstein, Rita Z; Samaras, Dimitris; Blaschko, Matthew B

    2015-12-01

    We explore various sparse regularization techniques for analyzing fMRI data, such as the ℓ1 norm (often called LASSO in the context of a squared loss function), elastic net, and the recently introduced k-support norm. Employing sparsity regularization allows us to handle the curse of dimensionality, a problem commonly found in fMRI analysis. In this work we consider sparse regularization in both the regression and classification settings. We perform experiments on fMRI scans from cocaine-addicted as well as healthy control subjects. We show that in many cases, use of the k-support norm leads to better predictive performance, solution stability, and interpretability as compared to other standard approaches. We additionally analyze the advantages of using the absolute loss function versus the standard squared loss which leads to significantly better predictive performance for the regularization methods tested in almost all cases. Our results support the use of the k-support norm for fMRI analysis and on the clinical side, the generalizability of the I-RISA model of cocaine addiction. Copyright © 2015 Elsevier Ltd. All rights reserved.

  19. Novel Spectral Representations and Sparsity-Driven Algorithms for Shape Modeling and Analysis

    NASA Astrophysics Data System (ADS)

    Zhong, Ming

    In this dissertation, we focus on extending classical spectral shape analysis by incorporating spectral graph wavelets and sparsity-seeking algorithms. Defined with the graph Laplacian eigenbasis, the spectral graph wavelets are localized both in the vertex domain and graph spectral domain, and thus are very effective in describing local geometry. With a rich dictionary of elementary vectors and forcing certain sparsity constraints, a real life signal can often be well approximated by a very sparse coefficient representation. The many successful applications of sparse signal representation in computer vision and image processing inspire us to explore the idea of employing sparse modeling techniques with dictionary of spectral basis to solve various shape modeling problems. Conventional spectral mesh compression uses the eigenfunctions of mesh Laplacian as shape bases, which are highly inefficient in representing local geometry. To ameliorate, we advocate an innovative approach to 3D mesh compression using spectral graph wavelets as dictionary to encode mesh geometry. The spectral graph wavelets are locally defined at individual vertices and can better capture local shape information than Laplacian eigenbasis. The multi-scale SGWs form a redundant dictionary as shape basis, so we formulate the compression of 3D shape as a sparse approximation problem that can be readily handled by greedy pursuit algorithms. Surface inpainting refers to the completion or recovery of missing shape geometry based on the shape information that is currently available. We devise a new surface inpainting algorithm founded upon the theory and techniques of sparse signal recovery. Instead of estimating the missing geometry directly, our novel method is to find this low-dimensional representation which describes the entire original shape. More specifically, we find that, for many shapes, the vertex coordinate function can be well approximated by a very sparse coefficient representation with respect to the dictionary comprising its Laplacian eigenbasis, and it is then possible to recover this sparse representation from partial measurements of the original shape. Taking advantage of the sparsity cue, we advocate a novel variational approach for surface inpainting, integrating data fidelity constraints on the shape domain with coefficient sparsity constraints on the transformed domain. Because of the powerful properties of Laplacian eigenbasis, the inpainting results of our method tend to be globally coherent with the remaining shape. Informative and discriminative feature descriptors are vital in qualitative and quantitative shape analysis for a large variety of graphics applications. We advocate novel strategies to define generalized, user-specified features on shapes. Our new region descriptors are primarily built upon the coefficients of spectral graph wavelets that are both multi-scale and multi-level in nature, consisting of both local and global information. Based on our novel spectral feature descriptor, we developed a user-specified feature detection framework and a tensor-based shape matching algorithm. Through various experiments, we demonstrate the competitive performance of our proposed methods and the great potential of spectral basis and sparsity-driven methods for shape modeling.
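
    The baseline that the dissertation improves upon, conventional spectral compression in the mesh-Laplacian eigenbasis, is easy to sketch: reconstruct the vertex coordinate functions from a truncated set of Laplacian eigenvectors. The ring graph below stands in for a mesh and is purely illustrative; the spectral graph wavelet dictionary and sparse pursuit steps are not shown.

```python
import numpy as np

# a small ring-graph "mesh": n vertices on a noisy circle, each connected to its neighbors
n = 200
t = np.linspace(0, 2 * np.pi, n, endpoint=False)
coords = np.c_[np.cos(t), np.sin(t)] + 0.02 * np.random.default_rng(0).standard_normal((n, 2))

A = np.zeros((n, n))
for i in range(n):
    A[i, (i + 1) % n] = A[(i + 1) % n, i] = 1.0
L = np.diag(A.sum(axis=1)) - A                    # combinatorial graph Laplacian

# Laplacian eigenbasis; reconstruct the coordinate functions from the k lowest-frequency modes
w, V = np.linalg.eigh(L)
k = 20
coeff = V[:, :k].T @ coords                       # spectral coefficients of the coordinate functions
recon = V[:, :k] @ coeff
err = np.linalg.norm(recon - coords) / np.linalg.norm(coords)
print(f"relative reconstruction error with {k} of {n} basis functions: {err:.4f}")
```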

  20. A high-capacity model for one shot association learning in the brain

    PubMed Central

    Einarsson, Hafsteinn; Lengler, Johannes; Steger, Angelika

    2014-01-01

    We present a high-capacity model for one-shot association learning (hetero-associative memory) in sparse networks. We assume that basic patterns are pre-learned in networks and associations between two patterns are presented only once and have to be learned immediately. The model is a combination of an Amit-Fusi-like network sparsely connected to a Willshaw-type network. The learning procedure is palimpsest and comes from earlier work on one-shot pattern learning. However, in our setup we can enhance the capacity of the network by iterative retrieval. This yields a model for sparse brain-like networks in which populations of a few thousand neurons are capable of learning hundreds of associations even if they are presented only once. The analysis of the model is based on a novel result by Janson et al. on bootstrap percolation in random graphs. PMID:25426060

  1. A high-capacity model for one shot association learning in the brain.

    PubMed

    Einarsson, Hafsteinn; Lengler, Johannes; Steger, Angelika

    2014-01-01

    We present a high-capacity model for one-shot association learning (hetero-associative memory) in sparse networks. We assume that basic patterns are pre-learned in networks and associations between two patterns are presented only once and have to be learned immediately. The model is a combination of an Amit-Fusi-like network sparsely connected to a Willshaw-type network. The learning procedure is palimpsest and comes from earlier work on one-shot pattern learning. However, in our setup we can enhance the capacity of the network by iterative retrieval. This yields a model for sparse brain-like networks in which populations of a few thousand neurons are capable of learning hundreds of associations even if they are presented only once. The analysis of the model is based on a novel result by Janson et al. on bootstrap percolation in random graphs.

  2. SPARSE—A subgrid particle averaged Reynolds stress equivalent model: testing with a priori closure

    PubMed Central

    Davis, Sean L.; Sen, Oishik; Udaykumar, H. S.

    2017-01-01

    A Lagrangian particle cloud model is proposed that accounts for the effects of Reynolds-averaged particle and turbulent stresses and the averaged carrier-phase velocity of the subparticle cloud scale on the averaged motion and velocity of the cloud. The SPARSE (subgrid particle averaged Reynolds stress equivalent) model is based on a combination of a truncated Taylor expansion of a drag correction function and Reynolds averaging. It reduces the required number of computational parcels to trace a cloud of particles in Eulerian–Lagrangian methods for the simulation of particle-laden flow. Closure is performed in an a priori manner using a reference simulation where all particles in the cloud are traced individually with a point-particle model. Comparison of a first-order model and SPARSE with the reference simulation in one dimension shows that both the stress and the averaging of the carrier-phase velocity on the cloud subscale affect the averaged motion of the particle. A three-dimensional isotropic turbulence computation shows that only one computational parcel is sufficient to accurately trace a cloud of tens of thousands of particles. PMID:28413341

  3. A novel approach to internal crown characterization for coniferous tree species classification

    NASA Astrophysics Data System (ADS)

    Harikumar, A.; Bovolo, F.; Bruzzone, L.

    2016-10-01

    Knowledge about individual trees in a forest is highly beneficial for forest management. High-density, small-footprint, multi-return airborne Light Detection and Ranging (LiDAR) data can provide very accurate information about the structural properties of individual trees in forests. Every tree species has a unique set of crown structural characteristics that can be used for tree species classification. In this paper, we use both the internal and external crown structural information of a conifer tree crown, derived from a high-density small-footprint multi-return LiDAR acquisition, for species classification. Considering the fact that branches are the major building blocks of a conifer tree crown, we obtain the internal crown structural information using a branch-level analysis. The structure of each conifer branch is represented using clusters in the LiDAR point cloud. We propose the joint use of k-means clustering and geometric shape fitting, on the LiDAR data projected onto a novel 3-dimensional space, to identify branch clusters. After mapping the identified clusters back to the original space, six internal geometric features are estimated using a branch-level analysis. The external crown characteristics are modeled using six least-correlated features based on cone fitting and the convex hull. Species classification is performed using a sparse Support Vector Machine (sparse SVM) classifier.
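
    The following toy sketch pairs the two ingredients named above, with synthetic point clouds in place of real LiDAR: k-means groups crown points into branch-like clusters, and an l1-penalized linear SVM ("sparse SVM") classifies per-tree feature vectors. The feature choices and labels are illustrative assumptions, not those of the paper (Python with numpy and scikit-learn).

      import numpy as np
      from sklearn.cluster import KMeans
      from sklearn.svm import LinearSVC

      rng = np.random.default_rng(1)

      def crown_features(points, k=5):
          """Cluster crown points into k branch-like clusters and summarize each."""
          km = KMeans(n_clusters=k, n_init=10, random_state=0).fit(points)
          feats = []
          for c in range(k):
              cluster = points[km.labels_ == c]
              feats += [cluster.std(axis=0).mean(), np.ptp(cluster[:, 2])]  # spread, height range
          return np.sort(feats)  # crude order-invariance across clusters

      scales = rng.uniform(0.5, 2.0, 60)                    # per-tree crown "size"
      X = np.array([crown_features(rng.normal(scale=s, size=(300, 3))) for s in scales])
      y = (scales > 1.2).astype(int)                        # stand-in species labels
      clf = LinearSVC(penalty="l1", dual=False, C=1.0, max_iter=5000).fit(X, y)
      print("nonzero sparse-SVM weights:", np.count_nonzero(clf.coef_))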

  4. Image super-resolution via sparse representation.

    PubMed

    Yang, Jianchao; Wright, John; Huang, Thomas S; Ma, Yi

    2010-11-01

    This paper presents a new approach to single-image super-resolution, based on sparse signal representation. Research on image statistics suggests that image patches can be well-represented as a sparse linear combination of elements from an appropriately chosen over-complete dictionary. Inspired by this observation, we seek a sparse representation for each patch of the low-resolution input, and then use the coefficients of this representation to generate the high-resolution output. Theoretical results from compressed sensing suggest that under mild conditions, the sparse representation can be correctly recovered from the downsampled signals. By jointly training two dictionaries for the low- and high-resolution image patches, we can enforce the similarity of sparse representations between the low resolution and high resolution image patch pair with respect to their own dictionaries. Therefore, the sparse representation of a low resolution image patch can be applied with the high resolution image patch dictionary to generate a high resolution image patch. The learned dictionary pair is a more compact representation of the patch pairs, compared to previous approaches, which simply sample a large amount of image patch pairs, reducing the computational cost substantially. The effectiveness of such a sparsity prior is demonstrated for both general image super-resolution and the special case of face hallucination. In both cases, our algorithm generates high-resolution images that are competitive or even superior in quality to images produced by other similar SR methods. In addition, the local sparse modeling of our approach is naturally robust to noise, and therefore the proposed algorithm can handle super-resolution with noisy inputs in a more unified framework.
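
    A toy sketch of the coupled-dictionary idea summarized above (not the authors' trained dictionaries): the same sparse code is assumed to represent a low-resolution patch in D_lo and its high-resolution counterpart in D_hi, so the code is estimated from the low-resolution patch and applied to D_hi. The lasso is used here as a generic sparse solver, and the random dictionaries are stand-ins (Python with numpy and scikit-learn).

      import numpy as np
      from sklearn.linear_model import Lasso

      rng = np.random.default_rng(0)
      n_atoms = 256
      D_hi = rng.normal(size=(64, n_atoms))   # 8x8 high-resolution patch dictionary
      D_lo = rng.normal(size=(16, n_atoms))   # 4x4 low-resolution patch dictionary (coupled)
      D_hi /= np.linalg.norm(D_hi, axis=0)
      D_lo /= np.linalg.norm(D_lo, axis=0)

      alpha_true = np.zeros(n_atoms)
      alpha_true[[10, 77, 200]] = [1.5, -0.8, 0.6]
      patch_lo = D_lo @ alpha_true            # observed low-resolution patch

      alpha = Lasso(alpha=0.01, fit_intercept=False, max_iter=10000).fit(D_lo, patch_lo).coef_
      patch_hi = D_hi @ alpha                 # super-resolved high-resolution patch
      print("high-res reconstruction error:", np.linalg.norm(patch_hi - D_hi @ alpha_true))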

  5. The Fisher-Markov selector: fast selecting maximally separable feature subset for multiclass classification with applications to high-dimensional data.

    PubMed

    Cheng, Qiang; Zhou, Hongbo; Cheng, Jie

    2011-06-01

    Selecting features for multiclass classification is a critically important task for pattern recognition and machine learning applications. Especially challenging is selecting an optimal subset of features from high-dimensional data, which typically have many more variables than observations and contain significant noise, missing components, or outliers. Existing methods either cannot handle high-dimensional data efficiently or scalably, or can only obtain a local optimum instead of the global optimum. Toward selecting the globally optimal subset of features efficiently, we introduce a new selector, which we call the Fisher-Markov selector, to identify those features that are the most useful in describing essential differences among the possible groups. In particular, in this paper we present a way to represent essential discriminating characteristics together with sparsity as an optimization objective. With properly identified measures for the sparseness and discriminativeness in possibly high-dimensional settings, we take a systematic approach for optimizing the measures to choose the best feature subset. We use Markov random field optimization techniques to solve the formulated objective functions for simultaneous feature selection. Our results are noncombinatorial, and they can achieve the exact global optimum of the objective function for some special kernels. The method is fast; in particular, it can be linear in the number of features and quadratic in the number of observations. We apply our procedure to a variety of real-world data, including a mid-dimensional optical handwritten digit data set and high-dimensional microarray gene expression data sets. The effectiveness of our method is confirmed by experimental results. From a pattern recognition and model selection viewpoint, our procedure shows that it is possible to select the most discriminating subset of variables by solving a very simple unconstrained objective function, which in fact can be obtained with an explicit expression.

  6. Spline curve matching with sparse knot sets

    Treesearch

    Sang-Mook Lee; A. Lynn Abbott; Neil A. Clark; Philip A. Araman

    2004-01-01

    This paper presents a new curve matching method for deformable shapes using two-dimensional splines. In contrast to the residual error criterion, which is based on relative locations of corresponding knot points and is therefore reliable primarily for dense point sets, we use the deformation energy of a thin-plate-spline mapping between sparse knot points and normalized local...

  7. Adaptive regulation of sparseness by feedforward inhibition

    PubMed Central

    Assisi, Collins; Stopfer, Mark; Laurent, Gilles; Bazhenov, Maxim

    2014-01-01

    In the mushroom body of insects, odors are represented by very few spikes in a small number of neurons, a highly efficient strategy known as sparse coding. Physiological studies of these neurons have shown that sparseness is maintained across thousand-fold changes in odor concentration. Using a realistic computational model, we propose that sparseness in the olfactory system is regulated by adaptive feedforward inhibition. When odor concentration changes, feedforward inhibition modulates the duration of the temporal window over which the mushroom body neurons may integrate excitatory presynaptic input. This simple adaptive mechanism could maintain the sparseness of sensory representations across wide ranges of stimulus conditions. PMID:17660812

  8. SkyFACT: high-dimensional modeling of gamma-ray emission with adaptive templates and penalized likelihoods

    NASA Astrophysics Data System (ADS)

    Storm, Emma; Weniger, Christoph; Calore, Francesca

    2017-08-01

    We present SkyFACT (Sky Factorization with Adaptive Constrained Templates), a new approach for studying, modeling and decomposing diffuse gamma-ray emission. Like most previous analyses, the approach relies on predictions from cosmic-ray propagation codes like GALPROP and DRAGON. However, in contrast to previous approaches, we account for the fact that models are not perfect and allow for a very large number (≳ 10^5) of nuisance parameters to parameterize these imperfections. We combine methods of image reconstruction and adaptive spatio-spectral template regression in one coherent hybrid approach. To this end, we use penalized Poisson likelihood regression, with regularization functions that are motivated by the maximum entropy method. We introduce methods to efficiently handle the high dimensionality of the convex optimization problem as well as the associated semi-sparse covariance matrix, using the L-BFGS-B algorithm and Cholesky factorization. We test the method both on synthetic data and on gamma-ray emission from the inner Galaxy, |l| < 90° and |b| < 20°, as observed by the Fermi Large Area Telescope. We finally define a simple reference model that removes most of the residual emission from the inner Galaxy, based on conventional diffuse emission components as well as components for the Fermi bubbles, the Fermi Galactic center excess, and extended sources along the Galactic disk. Variants of this reference model can serve as a basis for future studies of diffuse emission in and outside the Galactic disk.
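
    A minimal sketch of the optimization machinery named above (a penalized Poisson likelihood minimized with L-BFGS-B), using a toy two-template model rather than SkyFACT's adaptive templates and maximum-entropy regularizers; the penalty form and all parameter values are illustrative assumptions (Python with numpy and scipy).

      import numpy as np
      from scipy.optimize import minimize

      rng = np.random.default_rng(0)
      T = np.abs(rng.normal(size=(2, 500)))        # two spatial emission templates
      theta_true = np.array([3.0, 1.0])
      counts = rng.poisson(theta_true @ T)         # observed photon counts per pixel

      lam = 0.1                                    # regularization strength (illustrative)

      def objective(theta):
          mu = theta @ T + 1e-9
          nll = np.sum(mu - counts * np.log(mu))   # Poisson negative log-likelihood
          return nll + lam * np.sum((theta - 1.0) ** 2)  # stand-in penalty on deviations

      res = minimize(objective, x0=np.ones(2), method="L-BFGS-B",
                     bounds=[(0, None)] * 2)       # non-negative template normalizations
      print("fitted template normalizations:", res.x)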

  9. A study of machine learning regression methods for major elemental analysis of rocks using laser-induced breakdown spectroscopy

    NASA Astrophysics Data System (ADS)

    Boucher, Thomas F.; Ozanne, Marie V.; Carmosino, Marco L.; Dyar, M. Darby; Mahadevan, Sridhar; Breves, Elly A.; Lepore, Kate H.; Clegg, Samuel M.

    2015-05-01

    The ChemCam instrument on the Mars Curiosity rover is generating thousands of LIBS spectra and bringing interest in this technique to public attention. The key to interpreting LIBS data from Mars or any other setting is calibrations that relate laboratory standards to unknowns examined in other settings and enable predictions of chemical composition. Here, LIBS spectral data are analyzed using linear regression methods including partial least squares (PLS-1 and PLS-2), principal component regression (PCR), least absolute shrinkage and selection operator (lasso), elastic net, and linear support vector regression (SVR-Lin). These were compared against results from nonlinear regression methods including kernel principal component regression (K-PCR), polynomial kernel support vector regression (SVR-Py) and k-nearest neighbor (kNN) regression to discern the most effective models for interpreting chemical abundances from LIBS spectra of geological samples. The results were evaluated for 100 samples analyzed with 50 laser pulses at each of five locations averaged together. Wilcoxon signed-rank tests were employed to evaluate the statistical significance of differences among the nine models using their predicted residual sum of squares (PRESS) to make comparisons. For MgO, SiO2, Fe2O3, CaO, and MnO, the sparse models outperform all the others except for linear SVR, while for Na2O, K2O, TiO2, and P2O5, the sparse methods produce inferior results, likely because their emission lines in this energy range have lower transition probabilities. The strong performance of the sparse methods in this study suggests that use of dimensionality-reduction techniques as a preprocessing step may improve the performance of the linear models. Nonlinear methods tend to overfit the data and predict less accurately, while the linear methods proved to be more generalizable with better predictive performance. These results are attributed to the high dimensionality of the data (6144 channels) relative to the small number of samples studied. The best-performing models were SVR-Lin for SiO2, MgO, Fe2O3, and Na2O, lasso for Al2O3, elastic net for MnO, and PLS-1 for CaO, TiO2, and K2O. Although these differences in model performance between methods were identified, most of the models produce comparable results at p ≤ 0.05, and all techniques except kNN produced statistically indistinguishable results. It is likely that a combination of models could be used together to yield a lower total error of prediction, depending on the requirements of the user.
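
    A schematic comparison in the spirit of the study above, with synthetic "spectra" standing in for LIBS data: a sparse linear model (lasso) and PLS regression are scored by cross-validated error. Channel counts, sample sizes, and hyperparameters are illustrative only (Python with numpy and scikit-learn).

      import numpy as np
      from sklearn.linear_model import LassoCV
      from sklearn.cross_decomposition import PLSRegression
      from sklearn.model_selection import cross_val_score

      rng = np.random.default_rng(0)
      n_samples, n_channels = 100, 2000            # few samples, many spectral channels
      X = rng.normal(size=(n_samples, n_channels))
      beta = np.zeros(n_channels)
      beta[rng.choice(n_channels, 10, replace=False)] = 1.0
      y = X @ beta + 0.1 * rng.normal(size=n_samples)   # abundance driven by a few lines

      for name, model in [("lasso", LassoCV(cv=5, n_alphas=30, max_iter=20000)),
                          ("PLS", PLSRegression(n_components=10))]:
          rmse = -cross_val_score(model, X, y, cv=5,
                                  scoring="neg_root_mean_squared_error").mean()
          print(f"{name}: cross-validated RMSE = {rmse:.3f}")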

  10. Improved statistical power with a sparse shape model in detecting an aging effect in the hippocampus and amygdala

    NASA Astrophysics Data System (ADS)

    Chung, Moo K.; Kim, Seung-Goo; Schaefer, Stacey M.; van Reekum, Carien M.; Peschke-Schmitz, Lara; Sutterer, Matthew J.; Davidson, Richard J.

    2014-03-01

    The sparse regression framework has been widely used in medical image processing and analysis. However, it has rarely been used in anatomical studies. We present a sparse shape modeling framework using the Laplace-Beltrami (LB) eigenfunctions of the underlying shape and show how it improves statistical power. Traditionally, the LB-eigenfunctions are used as a basis for intrinsically representing surface shapes as a form of Fourier descriptors. To reduce high-frequency noise, only the first few terms are used in the expansion and higher-frequency terms are simply thrown away. However, some lower-frequency terms may not necessarily contribute significantly to reconstructing the surfaces. Motivated by this idea, we present an LB-based method that selects only the significant eigenfunctions by imposing a sparse penalty. For dense anatomical data such as deformation fields on a surface mesh, the sparse regression behaves like a smoothing process, which reduces the chance of false negatives. Hence the statistical power improves. The sparse shape model is then applied to investigating the influence of age on amygdala and hippocampus shapes in the normal population. The advantage of the LB sparse framework is demonstrated by showing the increased statistical power.
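
    A small sketch of the idea of retaining only significant basis coefficients through a sparse penalty; generic smooth basis functions on [0, 1] stand in for the Laplace-Beltrami eigenfunctions of a surface mesh, and the lasso stands in for the paper's sparse regression (Python with numpy and scikit-learn).

      import numpy as np
      from sklearn.linear_model import Lasso

      rng = np.random.default_rng(0)
      t = np.linspace(0, 1, 400)
      basis = np.column_stack([np.cos(np.pi * k * t) for k in range(50)])  # stand-in eigenbasis
      signal = 2.0 * basis[:, 1] - 0.7 * basis[:, 5] + 0.05 * rng.normal(size=t.size)

      coef = Lasso(alpha=0.01, fit_intercept=False).fit(basis, signal).coef_
      kept = np.flatnonzero(np.abs(coef) > 1e-8)
      print("basis terms retained by the sparse penalty:", kept)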

  11. An active seismic experiment at Tenerife Island (Canary Island, Spain): Imaging an active volcano edifice

    NASA Astrophysics Data System (ADS)

    Garcia-Yeguas, A.; Ibañez, J. M.; Rietbrock, A.; Tom-Teidevs, G.

    2008-12-01

    An active seismic experiment to study the internal structure of Teide Volcano was carried out on Tenerife, a volcanic island in Spain's Canary Islands. The main objective of the TOM-TEIDEVS experiment is to obtain a 3-dimensional structural image of Teide Volcano using seismic tomography and seismic reflection/refraction imaging techniques. At present, knowledge of the deeper structure of Teide and Tenerife is very limited, with proposed structural models mainly based on sparse geophysical and geological data. This multinational experiment, which involves institutes from Spain, Italy, the United Kingdom, Ireland, and Mexico, will generate a unique high-resolution structural image of the active volcano edifice and will further our understanding of volcanic processes.

  12. Imaging an Active Volcano Edifice at Tenerife Island, Spain

    NASA Astrophysics Data System (ADS)

    Ibáñez, Jesús M.; Rietbrock, Andreas; García-Yeguas, Araceli

    2008-08-01

    An active seismic experiment to study the internal structure of Teide volcano is being carried out on Tenerife, a volcanic island in Spain's Canary Islands archipelago. The main objective of the Tomography at Teide Volcano Spain (TOM-TEIDEVS) experiment, begun in January 2007, is to obtain a three-dimensional (3-D) structural image of Teide volcano using seismic tomography and seismic reflection/refraction imaging techniques. At present, knowledge of the deeper structure of Teide and Tenerife is very limited, with proposed structural models based mainly on sparse geophysical and geological data. The multinational experiment, involving institutes from Spain, the United Kingdom, Italy, Ireland, and Mexico, will generate a unique high-resolution structural image of the active volcano edifice and will further our understanding of volcanic processes.

  13. Enhanced spectral resolution by high-dimensional NMR using the filter diagonalization method and “hidden” dimensions

    PubMed Central

    Meng, Xi; Nguyen, Bao D.; Ridge, Clark; Shaka, A. J.

    2009-01-01

    High-dimensional (HD) NMR spectra have poorer digital resolution than low-dimensional (LD) spectra, for a fixed amount of experiment time. This has led to “reduced-dimensionality” strategies, in which several LD projections of the HD NMR spectrum are acquired, each with higher digital resolution; an approximate HD spectrum is then inferred by some means. We propose a strategy that moves in the opposite direction, by adding more time dimensions to increase the information content of the data set, even if only a very sparse time grid is used in each dimension. The full HD time-domain data can be analyzed by the Filter Diagonalization Method (FDM), yielding very narrow resonances along all of the frequency axes, even those with sparse sampling. Integrating over the added dimensions of HD FDM NMR spectra reconstitutes LD spectra with enhanced resolution, often more quickly than direct acquisition of the LD spectrum with a larger number of grid points in each of the fewer dimensions. If the extra dimensions do not appear in the final spectrum, and are used solely to boost information content, we propose the moniker hidden-dimension NMR. This work shows that HD peaks have unmistakable frequency signatures that can be detected as single HD objects by an appropriate algorithm, even though their patterns would be tricky for a human operator to visualize or recognize, and even if digital resolution in an HD FT spectrum is very coarse compared with natural line widths. PMID:18926747

  14. Experimental and Analytical Determinations of Spiral Bevel Gear-Tooth Bending Stress Compared

    NASA Technical Reports Server (NTRS)

    Handschuh, Robert F.

    2000-01-01

    Spiral bevel gears are currently used in all main-rotor drive systems for rotorcraft produced in the United States. Applications such as these need spiral bevel gears to turn the corner from the horizontal gas turbine engine to the vertical rotor shaft. These gears must typically operate at extremely high rotational speeds and carry high power levels. With these difficult operating conditions, an improved analytical capability is paramount to increasing aircraft safety and reliability. Also, literature on the analysis and testing of spiral bevel gears has been very sparse in comparison to that for parallel axis gears. This is due to the complex geometry of this type of gear and to the specialized test equipment necessary to test these components. To develop an analytical model of spiral bevel gears, researchers use differential geometry methods to model the manufacturing kinematics. A three-dimensional spiral bevel gear modeling method was developed that uses finite elements for the structural analysis. This method was used to analyze the three-dimensional contact pattern between the test pinion and gear used in the Spiral Bevel Gear Test Facility at the NASA Glenn Research Center at Lewis Field. Results of this analysis are illustrated in the preceding figure. The development of the analytical method was a joint endeavor between NASA Glenn, the U.S. Army Research Laboratory, and the University of North Dakota.

  15. QUADRO: A SUPERVISED DIMENSION REDUCTION METHOD VIA RAYLEIGH QUOTIENT OPTIMIZATION

    PubMed Central

    Fan, Jianqing; Ke, Zheng Tracy; Liu, Han; Xia, Lucy

    2016-01-01

    We propose a novel Rayleigh quotient-based sparse quadratic dimension reduction method, named QUADRO (Quadratic Dimension Reduction via Rayleigh Optimization), for analyzing high-dimensional data. Unlike in the linear setting where Rayleigh quotient optimization coincides with classification, these two problems are very different under nonlinear settings. In this paper, we clarify this difference and show that Rayleigh quotient optimization may be of independent scientific interest. One major challenge of Rayleigh quotient optimization is that the variance of quadratic statistics involves all fourth cross-moments of predictors, which are infeasible to compute for high-dimensional applications and may accumulate too many stochastic errors. This issue is resolved by considering a family of elliptical models. Moreover, for heavy-tailed distributions, robust estimates of mean vectors and covariance matrices are employed to guarantee uniform convergence in estimating non-polynomially many parameters, even though only the fourth moments are assumed. Methodologically, QUADRO is based on elliptical models which allow us to formulate the Rayleigh quotient maximization as a convex optimization problem. Computationally, we propose an efficient linearized augmented Lagrangian method to solve the constrained optimization problem. Theoretically, we provide explicit rates of convergence in terms of Rayleigh quotient under both Gaussian and general elliptical models. Thorough numerical results on both synthetic and real datasets are also provided to back up our theoretical results. PMID:26778864

  16. Sparse Matrix for ECG Identification with Two-Lead Features.

    PubMed

    Tseng, Kuo-Kun; Luo, Jiao; Hegarty, Robert; Wang, Wenmin; Haiting, Dong

    2015-01-01

    Electrocardiograph (ECG) human identification has the potential to improve biometric security. However, improvements in ECG identification and feature extraction are required. Previous work has focused on single-lead ECG signals. Our work proposes a new algorithm for human identification by mapping two-lead ECG signals onto a two-dimensional matrix and then employing a sparse matrix method to process the matrix. This is the first application of sparse matrix techniques to ECG identification. Moreover, the results of our experiments demonstrate the benefits of our approach over existing methods.

  17. Hierarchical Nearest-Neighbor Gaussian Process Models for Large Geostatistical Datasets.

    PubMed

    Datta, Abhirup; Banerjee, Sudipto; Finley, Andrew O; Gelfand, Alan E

    2016-01-01

    Spatial process models for analyzing geostatistical data entail computations that become prohibitive as the number of spatial locations becomes large. This article develops a class of highly scalable nearest-neighbor Gaussian process (NNGP) models to provide fully model-based inference for large geostatistical datasets. We establish that the NNGP is a well-defined spatial process providing legitimate finite-dimensional Gaussian densities with sparse precision matrices. We embed the NNGP as a sparsity-inducing prior within a rich hierarchical modeling framework and outline how computationally efficient Markov chain Monte Carlo (MCMC) algorithms can be executed without storing or decomposing large matrices. The number of floating point operations (flops) per iteration of this algorithm is linear in the number of spatial locations, thereby rendering substantial scalability. We illustrate the computational and inferential benefits of the NNGP over competing methods using simulation studies and also analyze forest biomass from a massive U.S. Forest Inventory dataset at a scale that precludes alternative dimension-reducing methods. Supplementary materials for this article are available online.

  18. Hierarchical Nearest-Neighbor Gaussian Process Models for Large Geostatistical Datasets

    PubMed Central

    Datta, Abhirup; Banerjee, Sudipto; Finley, Andrew O.; Gelfand, Alan E.

    2018-01-01

    Spatial process models for analyzing geostatistical data entail computations that become prohibitive as the number of spatial locations becomes large. This article develops a class of highly scalable nearest-neighbor Gaussian process (NNGP) models to provide fully model-based inference for large geostatistical datasets. We establish that the NNGP is a well-defined spatial process providing legitimate finite-dimensional Gaussian densities with sparse precision matrices. We embed the NNGP as a sparsity-inducing prior within a rich hierarchical modeling framework and outline how computationally efficient Markov chain Monte Carlo (MCMC) algorithms can be executed without storing or decomposing large matrices. The number of floating point operations (flops) per iteration of this algorithm is linear in the number of spatial locations, thereby rendering substantial scalability. We illustrate the computational and inferential benefits of the NNGP over competing methods using simulation studies and also analyze forest biomass from a massive U.S. Forest Inventory dataset at a scale that precludes alternative dimension-reducing methods. Supplementary materials for this article are available online. PMID:29720777
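
    A compact sketch of the nearest-neighbor conditioning that gives the NNGP its sparse precision structure: each ordered location is conditioned only on its m nearest earlier neighbors under an exponential covariance. This toy construction of the conditioning factors is not the authors' full MCMC implementation, and all settings are illustrative (Python with numpy).

      import numpy as np

      rng = np.random.default_rng(0)
      n, m = 200, 10
      s = np.sort(rng.uniform(0, 100, n))                    # ordered 1-D locations
      cov = lambda a, b: np.exp(-0.1 * np.abs(a[:, None] - b[None, :]))  # exponential covariance

      A = np.zeros((n, n))                                   # sparse conditioning weights
      F = np.zeros(n)                                        # conditional variances
      F[0] = 1.0
      for i in range(1, n):
          nbrs = np.argsort(np.abs(s[:i] - s[i]))[:m]        # m nearest earlier neighbors
          C_nn = cov(s[nbrs], s[nbrs])
          C_in = cov(s[i:i + 1], s[nbrs]).ravel()
          A[i, nbrs] = np.linalg.solve(C_nn, C_in)
          F[i] = 1.0 - A[i, nbrs] @ C_in

      # Joint precision of the NNGP: (I - A)' diag(1/F) (I - A), sparse by construction
      B = np.eye(n) - A
      precision = B.T @ np.diag(1.0 / F) @ B
      print("average nonzeros per row of the precision:",
            np.count_nonzero(np.abs(precision) > 1e-10) / n)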

  19. A three-dimensional wide-angle BPM for optical waveguide structures.

    PubMed

    Ma, Changbao; Van Keuren, Edward

    2007-01-22

    Algorithms for effective modeling of optical propagation in three-dimensional waveguide structures are critical for the design of photonic devices. We present a three-dimensional (3-D) wide-angle beam propagation method (WA-BPM) using Hoekstra's scheme. A sparse matrix algebraic equation is formed and solved using iterative methods. The applicability, accuracy and effectiveness of our method are demonstrated by applying it to simulations of wide-angle beam propagation, along with a technique for shifting the simulation window to reduce the dimension of the numerical equation and a threshold technique to further ensure its convergence. These techniques can ensure the implementation of iterative methods for waveguide structures by alleviating the convergence problem, which will further enable us to develop higher-order 3-D WA-BPMs based on Padé approximant operators.

  20. A three-dimensional wide-angle BPM for optical waveguide structures

    NASA Astrophysics Data System (ADS)

    Ma, Changbao; van Keuren, Edward

    2007-01-01

    Algorithms for effective modeling of optical propagation in three-dimensional waveguide structures are critical for the design of photonic devices. We present a three-dimensional (3-D) wide-angle beam propagation method (WA-BPM) using Hoekstra’s scheme. A sparse matrix algebraic equation is formed and solved using iterative methods. The applicability, accuracy and effectiveness of our method are demonstrated by applying it to simulations of wide-angle beam propagation, along with a technique for shifting the simulation window to reduce the dimension of the numerical equation and a threshold technique to further ensure its convergence. These techniques can ensure the implementation of iterative methods for waveguide structures by alleviating the convergence problem, which will further enable us to develop higher-order 3-D WA-BPMs based on Padé approximant operators.

  1. Convex Banding of the Covariance Matrix

    PubMed Central

    Bien, Jacob; Bunea, Florentina; Xiao, Luo

    2016-01-01

    We introduce a new sparse estimator of the covariance matrix for high-dimensional models in which the variables have a known ordering. Our estimator, which is the solution to a convex optimization problem, is equivalently expressed as an estimator which tapers the sample covariance matrix by a Toeplitz, sparsely-banded, data-adaptive matrix. As a result of this adaptivity, the convex banding estimator enjoys theoretical optimality properties not attained by previous banding or tapered estimators. In particular, our convex banding estimator is minimax rate adaptive in Frobenius and operator norms, up to log factors, over commonly-studied classes of covariance matrices, and over more general classes. Furthermore, it correctly recovers the bandwidth when the true covariance is exactly banded. Our convex formulation admits a simple and efficient algorithm. Empirical studies demonstrate its practical effectiveness and illustrate that our exactly-banded estimator works well even when the true covariance matrix is only close to a banded matrix, confirming our theoretical results. Our method compares favorably with all existing methods, in terms of accuracy and speed. We illustrate the practical merits of the convex banding estimator by showing that it can be used to improve the performance of discriminant analysis for classifying sound recordings. PMID:28042189

  2. Convex Banding of the Covariance Matrix.

    PubMed

    Bien, Jacob; Bunea, Florentina; Xiao, Luo

    2016-01-01

    We introduce a new sparse estimator of the covariance matrix for high-dimensional models in which the variables have a known ordering. Our estimator, which is the solution to a convex optimization problem, is equivalently expressed as an estimator which tapers the sample covariance matrix by a Toeplitz, sparsely-banded, data-adaptive matrix. As a result of this adaptivity, the convex banding estimator enjoys theoretical optimality properties not attained by previous banding or tapered estimators. In particular, our convex banding estimator is minimax rate adaptive in Frobenius and operator norms, up to log factors, over commonly-studied classes of covariance matrices, and over more general classes. Furthermore, it correctly recovers the bandwidth when the true covariance is exactly banded. Our convex formulation admits a simple and efficient algorithm. Empirical studies demonstrate its practical effectiveness and illustrate that our exactly-banded estimator works well even when the true covariance matrix is only close to a banded matrix, confirming our theoretical results. Our method compares favorably with all existing methods, in terms of accuracy and speed. We illustrate the practical merits of the convex banding estimator by showing that it can be used to improve the performance of discriminant analysis for classifying sound recordings.

  3. Accelerating the reconstruction of magnetic resonance imaging by three-dimensional dual-dictionary learning using CUDA.

    PubMed

    Jiansen Li; Jianqi Sun; Ying Song; Yanran Xu; Jun Zhao

    2014-01-01

    An effective way to improve the data acquisition speed of magnetic resonance imaging (MRI) is to use under-sampled k-space data, and a dictionary learning method can be used to maintain the reconstruction quality. A three-dimensional dictionary trains its atoms in the form of blocks, which can exploit the spatial correlation among slices. The dual-dictionary learning method includes a low-resolution dictionary and a high-resolution dictionary, for sparse coding and image updating, respectively. However, the amount of data is huge for three-dimensional reconstruction, especially when the number of slices is large, so the procedure is time-consuming. In this paper, we first utilize the NVIDIA Corporation's compute unified device architecture (CUDA) programming model to design parallel algorithms on the graphics processing unit (GPU) to accelerate the reconstruction procedure. The main optimizations operate in the dictionary learning algorithm and the image updating part, such as the orthogonal matching pursuit (OMP) algorithm and the k-singular value decomposition (K-SVD) algorithm. Then we develop another version of the CUDA code with algorithmic optimization. Experimental results show that a speedup of more than 324 times is achieved compared with the CPU-only code when the number of MRI slices is 24.

  4. Localization with Sparse Acoustic Sensor Network Using UAVs as Information-Seeking Data Mules

    DTIC Science & Technology

    2013-05-01

    technique to differentiate among several sources. 2.2. AoA Estimation. AoA Models: the kth of N_AOA AoA sensors produces an angular measurement modeled... squares sense: $\hat{\theta} = \arg\min_{\phi} \sum_{i=1}^{3} \left( \hat{\tau}_{i0} - e_{\phi}^{T} r_{i} \right)^{2}$ (9). The minimization was done by gridding the one-dimensional angular space and finding the optimum... Latitude E5500 laptop running FreeBSD and custom Java applications to process and store the raw audio signals. Power Source: The laptop was powered for an

  5. Optimized Design and Analysis of Sparse-Sampling fMRI Experiments

    PubMed Central

    Perrachione, Tyler K.; Ghosh, Satrajit S.

    2013-01-01

    Sparse-sampling is an important methodological advance in functional magnetic resonance imaging (fMRI), in which silent delays are introduced between MR volume acquisitions, allowing for the presentation of auditory stimuli without contamination by acoustic scanner noise and for overt vocal responses without motion-induced artifacts in the functional time series. As such, the sparse-sampling technique has become a mainstay of principled fMRI research into the cognitive and systems neuroscience of speech, language, hearing, and music. Despite being in use for over a decade, there has been little systematic investigation of the acquisition parameters, experimental design considerations, and statistical analysis approaches that bear on the results and interpretation of sparse-sampling fMRI experiments. In this report, we examined how design and analysis choices related to the duration of repetition time (TR) delay (an acquisition parameter), stimulation rate (an experimental design parameter), and model basis function (an analysis parameter) act independently and interactively to affect the neural activation profiles observed in fMRI. First, we conducted a series of computational simulations to explore the parameter space of sparse design and analysis with respect to these variables; second, we validated the results of these simulations in a series of sparse-sampling fMRI experiments. Overall, these experiments suggest the employment of three methodological approaches that can, in many situations, substantially improve the detection of neurophysiological response in sparse fMRI: (1) Sparse analyses should utilize a physiologically informed model that incorporates hemodynamic response convolution to reduce model error. (2) The design of sparse fMRI experiments should maintain a high rate of stimulus presentation to maximize effect size. (3) TR delays of short to intermediate length can be used between acquisitions of sparse-sampled functional image volumes to increase the number of samples and improve statistical power. PMID:23616742
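
    The short simulation below illustrates the interplay of the parameters discussed above: an event train is convolved with a canonical double-gamma HRF (the "physiologically informed model"), and the response is then sampled only at sparse acquisition times. The stimulus rate, TR, and HRF parameters are illustrative assumptions, not the paper's values (Python with numpy and scipy).

      import numpy as np
      from scipy.stats import gamma

      dt = 0.1                                              # simulation resolution (s)
      t = np.arange(0, 30, dt)
      hrf = gamma.pdf(t, 6) - 0.35 * gamma.pdf(t, 16)       # double-gamma HRF
      hrf /= hrf.max()

      duration, tr, isi = 300.0, 8.0, 4.0                   # sparse TR of 8 s, stimulus every 4 s
      time = np.arange(0, duration, dt)
      stim = np.zeros_like(time)
      stim[np.round(np.arange(0, duration, isi) / dt).astype(int)] = 1.0

      bold = np.convolve(stim, hrf)[:time.size]             # continuous hemodynamic response
      samples = bold[np.round(np.arange(0, duration, tr) / dt).astype(int)]  # scanner samples
      print("number of sparse samples acquired:", samples.size)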

  6. Optimized design and analysis of sparse-sampling FMRI experiments.

    PubMed

    Perrachione, Tyler K; Ghosh, Satrajit S

    2013-01-01

    Sparse-sampling is an important methodological advance in functional magnetic resonance imaging (fMRI), in which silent delays are introduced between MR volume acquisitions, allowing for the presentation of auditory stimuli without contamination by acoustic scanner noise and for overt vocal responses without motion-induced artifacts in the functional time series. As such, the sparse-sampling technique has become a mainstay of principled fMRI research into the cognitive and systems neuroscience of speech, language, hearing, and music. Despite being in use for over a decade, there has been little systematic investigation of the acquisition parameters, experimental design considerations, and statistical analysis approaches that bear on the results and interpretation of sparse-sampling fMRI experiments. In this report, we examined how design and analysis choices related to the duration of repetition time (TR) delay (an acquisition parameter), stimulation rate (an experimental design parameter), and model basis function (an analysis parameter) act independently and interactively to affect the neural activation profiles observed in fMRI. First, we conducted a series of computational simulations to explore the parameter space of sparse design and analysis with respect to these variables; second, we validated the results of these simulations in a series of sparse-sampling fMRI experiments. Overall, these experiments suggest the employment of three methodological approaches that can, in many situations, substantially improve the detection of neurophysiological response in sparse fMRI: (1) Sparse analyses should utilize a physiologically informed model that incorporates hemodynamic response convolution to reduce model error. (2) The design of sparse fMRI experiments should maintain a high rate of stimulus presentation to maximize effect size. (3) TR delays of short to intermediate length can be used between acquisitions of sparse-sampled functional image volumes to increase the number of samples and improve statistical power.

  7. High- and low-level hierarchical classification algorithm based on source separation process

    NASA Astrophysics Data System (ADS)

    Loghmari, Mohamed Anis; Karray, Emna; Naceur, Mohamed Saber

    2016-10-01

    High-dimensional data applications have gained considerable attention in recent years. We focus on remote sensing data analysis in high-dimensional spaces, such as hyperspectral data. From a methodological viewpoint, remote sensing data analysis is not a trivial task. Its complexity is caused by many factors, such as large spectral or spatial variability as well as the curse of dimensionality. The latter describes the problem of data sparseness. In this particular ill-posed problem, a reliable classification approach requires appropriate modeling of the classification process. The proposed approach is based on a hierarchical clustering algorithm in order to deal with remote sensing data in high-dimensional space. Indeed, one obvious method to perform dimensionality reduction is to use the independent component analysis process as a preprocessing step. The first particularity of our method is the special structure of its cluster tree. Most hierarchical algorithms associate leaves with individual clusters and start from a large number of individual classes equal to the number of pixels; however, in our approach, leaves are associated with the most relevant sources, which are represented along mutually independent axes and specifically represent certain land covers associated with a limited number of clusters. These sources contribute to the refinement of the clustering by providing complementary rather than redundant information. The second particularity of our approach is that at each level of the cluster tree, we combine both a high-level divisive clustering and a low-level agglomerative clustering. This approach reduces the computational cost, since the high-level divisive clustering is controlled by a simple Boolean operator, and optimizes the clustering results, since the low-level agglomerative clustering is guided by the most relevant independent sources. At each new step we thus obtain a finer partition that participates in the clustering process, enhancing semantic capabilities and giving good identification rates.

  8. Compressive sampling of polynomial chaos expansions: Convergence analysis and sampling strategies

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Hampton, Jerrad; Doostan, Alireza, E-mail: alireza.doostan@colorado.edu

    2015-01-01

    Sampling orthogonal polynomial bases via Monte Carlo is of interest for uncertainty quantification of models with random inputs, using Polynomial Chaos (PC) expansions. It is known that bounding a probabilistic parameter, referred to as coherence, yields a bound on the number of samples necessary to identify coefficients in a sparse PC expansion via solution to an ℓ1-minimization problem. Utilizing results for orthogonal polynomials, we bound the coherence parameter for polynomials of Hermite and Legendre type under their respective natural sampling distribution. In both polynomial bases we identify an importance sampling distribution which yields a bound with weaker dependence on the order of the approximation. For more general orthonormal bases, we propose the coherence-optimal sampling: a Markov Chain Monte Carlo sampling, which directly uses the basis functions under consideration to achieve a statistical optimality among all sampling schemes with identical support. We demonstrate these different sampling strategies numerically in both high-order and high-dimensional, manufactured PC expansions. In addition, the quality of each sampling method is compared in the identification of solutions to two differential equations, one with a high-dimensional random input and the other with a high-order PC expansion. In both cases, the coherence-optimal sampling scheme leads to similar or considerably improved accuracy.
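
    An illustrative sketch of the recovery problem described above: a sparse set of Hermite polynomial chaos coefficients is identified from random Gaussian samples by l1-regularized regression, with the lasso standing in for a dedicated l1-minimization solver; the order, sample count, and sparsity pattern are assumptions for the toy example (Python with numpy and scikit-learn).

      import numpy as np
      from numpy.polynomial.hermite_e import hermevander
      from sklearn.linear_model import Lasso

      rng = np.random.default_rng(0)
      order, n_samples = 12, 60
      xi = rng.normal(size=n_samples)                    # Gaussian germ (natural sampling)
      Psi = hermevander(xi, order)                       # probabilists' Hermite basis, (n, order+1)

      c_true = np.zeros(order + 1)
      c_true[[1, 4, 9]] = [1.0, 0.5, -0.25]              # sparse PC coefficients
      u = Psi @ c_true + 0.01 * rng.normal(size=n_samples)  # noisy model evaluations

      c_hat = Lasso(alpha=1e-3, fit_intercept=False, max_iter=100000).fit(Psi, u).coef_
      print("recovered sparse PC coefficients:", np.round(c_hat, 3))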

  9. Alternatively Constrained Dictionary Learning For Image Superresolution.

    PubMed

    Lu, Xiaoqiang; Yuan, Yuan; Yan, Pingkun

    2014-03-01

    Dictionaries are crucial in sparse coding-based algorithms for image superresolution. Sparse coding is a typical unsupervised learning method to study the relationship between the patches of high- and low-resolution images. However, most of the sparse coding methods for image superresolution fail to simultaneously consider the geometrical structure of the dictionary and the corresponding coefficients, which may result in noticeable superresolution reconstruction artifacts. In other words, when a low-resolution image and its corresponding high-resolution image are represented in their feature spaces, the two sets of dictionaries and the obtained coefficients have intrinsic links, which has not yet been well studied. Motivated by the development of nonlocal self-similarity and manifold learning, a novel sparse coding method is reported to preserve the geometrical structure of the dictionary and the sparse coefficients of the data. Moreover, the proposed method can preserve the incoherence of dictionary entries and provide the sparse coefficients and learned dictionary from a new perspective, which have both reconstruction and discrimination properties to enhance the learning performance. Furthermore, to utilize the model of the proposed method more effectively for single-image superresolution, this paper also proposes a novel dictionary-pair learning method, which is named two-stage dictionary training. Extensive experiments are carried out on a large set of images comparing with other popular algorithms for the same purpose, and the results clearly demonstrate the effectiveness of the proposed sparse representation model and the corresponding dictionary learning algorithm.

  10. Sparse Polynomial Chaos Surrogate for ACME Land Model via Iterative Bayesian Compressive Sensing

    NASA Astrophysics Data System (ADS)

    Sargsyan, K.; Ricciuto, D. M.; Safta, C.; Debusschere, B.; Najm, H. N.; Thornton, P. E.

    2015-12-01

    For computationally expensive climate models, Monte-Carlo approaches of exploring the input parameter space are often prohibitive due to slow convergence with respect to ensemble size. To alleviate this, we build inexpensive surrogates using uncertainty quantification (UQ) methods employing Polynomial Chaos (PC) expansions that approximate the input-output relationships using as few model evaluations as possible. However, when many uncertain input parameters are present, such UQ studies suffer from the curse of dimensionality. In particular, for 50-100 input parameters non-adaptive PC representations have infeasible numbers of basis terms. To this end, we develop and employ Weighted Iterative Bayesian Compressive Sensing to learn the most important input parameter relationships for efficient, sparse PC surrogate construction with posterior uncertainty quantified due to insufficient data. Besides drastic dimensionality reduction, the uncertain surrogate can efficiently replace the model in computationally intensive studies such as forward uncertainty propagation and variance-based sensitivity analysis, as well as design optimization and parameter estimation using observational data. We applied the surrogate construction and variance-based uncertainty decomposition to Accelerated Climate Model for Energy (ACME) Land Model for several output QoIs at nearly 100 FLUXNET sites covering multiple plant functional types and climates, varying 65 input parameters over broad ranges of possible values. This work is supported by the U.S. Department of Energy, Office of Science, Biological and Environmental Research, Accelerated Climate Modeling for Energy (ACME) project. Sandia National Laboratories is a multi-program laboratory managed and operated by Sandia Corporation, a wholly owned subsidiary of Lockheed Martin Corporation, for the U.S. Department of Energy's National Nuclear Security Administration under contract DE-AC04-94AL85000.

  11. A non-linear dimension reduction methodology for generating data-driven stochastic input models

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Ganapathysubramanian, Baskar; Zabaras, Nicholas

    Stochastic analysis of random heterogeneous media (polycrystalline materials, porous media, functionally graded materials) provides information of significance only if realistic input models of the topology and property variations are used. This paper proposes a framework to construct such input stochastic models for the topology and thermal diffusivity variations in heterogeneous media using a data-driven strategy. Given a set of microstructure realizations (input samples) generated from given statistical information about the medium topology, the framework constructs a reduced-order stochastic representation of the thermal diffusivity. This problem of constructing a low-dimensional stochastic representation of property variations is analogous to the problem of manifold learning and parametric fitting of hyper-surfaces encountered in image processing and psychology. Denote by M the set of microstructures that satisfy the given experimental statistics. A non-linear dimension reduction strategy is utilized to map M to a low-dimensional region, A. We first show that M is a compact manifold embedded in a high-dimensional input space R^n. An isometric mapping F from M to a low-dimensional, compact, connected set A is contained in R^d (d <

  12. Data Shared Lasso: A Novel Tool to Discover Uplift.

    PubMed

    Gross, Samuel M; Tibshirani, Robert

    2016-09-01

    A model is presented for the supervised learning problem where the observations come from a fixed number of pre-specified groups, and the regression coefficients may vary sparsely between groups. The model spans the continuum between individual models for each group and one model for all groups. The resulting algorithm is designed with a high dimensional framework in mind. The approach is applied to a sentiment analysis dataset to show its efficacy and interpretability. One particularly useful application is for finding sub-populations in a randomized trial for which an intervention (treatment) is beneficial, often called the uplift problem. Some new concepts are introduced that are useful for uplift analysis. The value is demonstrated in an application to a real world credit card promotion dataset. In this example, although sending the promotion has a very small average effect, by targeting a particular subgroup with the promotion one can obtain a 15% increase in the proportion of people who purchase the new credit card.
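
    A brief sketch of the design-matrix view of the model described above: each group receives its own copy of the feature block alongside a shared block, and a single lasso fit decides where group-specific deviations earn nonzero coefficients. The group structure, effect sizes, and penalty level are illustrative assumptions (Python with numpy and scikit-learn).

      import numpy as np
      from sklearn.linear_model import Lasso

      rng = np.random.default_rng(0)
      n, p, groups = 300, 5, 3
      X = rng.normal(size=(n, p))
      g = rng.integers(groups, size=n)
      beta_shared = np.array([1.0, -1.0, 0.0, 0.0, 0.5])
      beta_dev = np.zeros((groups, p))
      beta_dev[2, 0] = 2.0                                 # group 2 deviates on feature 0
      y = X @ beta_shared + np.einsum("ij,ij->i", X, beta_dev[g]) + 0.1 * rng.normal(size=n)

      # Shared block plus one masked copy of X per group
      Z = np.hstack([X] + [X * (g == k)[:, None] for k in range(groups)])
      fit = Lasso(alpha=0.05, fit_intercept=False).fit(Z, y)
      print("shared coefficients:", np.round(fit.coef_[:p], 2))
      print("group-2 deviations: ", np.round(fit.coef_[p + 2 * p:p + 3 * p], 2))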

  13. Template-based protein structure modeling using the RaptorX web server.

    PubMed

    Källberg, Morten; Wang, Haipeng; Wang, Sheng; Peng, Jian; Wang, Zhiyong; Lu, Hui; Xu, Jinbo

    2012-07-19

    A key challenge of modern biology is to uncover the functional role of the protein entities that compose cellular proteomes. To this end, the availability of reliable three-dimensional atomic models of proteins is often crucial. This protocol presents a community-wide web-based method using RaptorX (http://raptorx.uchicago.edu/) for protein secondary structure prediction, template-based tertiary structure modeling, alignment quality assessment and sophisticated probabilistic alignment sampling. RaptorX distinguishes itself from other servers by the quality of the alignment between a target sequence and one or multiple distantly related template proteins (especially those with sparse sequence profiles) and by a novel nonlinear scoring function and a probabilistic-consistency algorithm. Consequently, RaptorX delivers high-quality structural models for many targets with only remote templates. At present, it takes RaptorX ~35 min to finish processing a sequence of 200 amino acids. Since its official release in August 2011, RaptorX has processed ~6,000 sequences submitted by ~1,600 users from around the world.

  14. Data Shared Lasso: A Novel Tool to Discover Uplift

    PubMed Central

    Gross, Samuel M.; Tibshirani, Robert

    2017-01-01

    A model is presented for the supervised learning problem where the observations come from a fixed number of pre-specified groups, and the regression coefficients may vary sparsely between groups. The model spans the continuum between individual models for each group and one model for all groups. The resulting algorithm is designed with a high dimensional framework in mind. The approach is applied to a sentiment analysis dataset to show its efficacy and interpretability. One particularly useful application is for finding sub-populations in a randomized trial for which an intervention (treatment) is beneficial, often called the uplift problem. Some new concepts are introduced that are useful for uplift analysis. The value is demonstrated in an application to a real world credit card promotion dataset. In this example, although sending the promotion has a very small average effect, by targeting a particular subgroup with the promotion one can obtain a 15% increase in the proportion of people who purchase the new credit card. PMID:29056802

  15. Template-based protein structure modeling using the RaptorX web server

    PubMed Central

    Källberg, Morten; Wang, Haipeng; Wang, Sheng; Peng, Jian; Wang, Zhiyong; Lu, Hui; Xu, Jinbo

    2016-01-01

    A key challenge of modern biology is to uncover the functional role of the protein entities that compose cellular proteomes. To this end, the availability of reliable three-dimensional atomic models of proteins is often crucial. This protocol presents a community-wide web-based method using RaptorX (http://raptorx.uchicago.edu/) for protein secondary structure prediction, template-based tertiary structure modeling, alignment quality assessment and sophisticated probabilistic alignment sampling. RaptorX distinguishes itself from other servers by the quality of the alignment between a target sequence and one or multiple distantly related template proteins (especially those with sparse sequence profiles) and by a novel nonlinear scoring function and a probabilistic-consistency algorithm. Consequently, RaptorX delivers high-quality structural models for many targets with only remote templates. At present, it takes RaptorX ~35 min to finish processing a sequence of 200 amino acids. Since its official release in August 2011, RaptorX has processed ~6,000 sequences submitted by ~1,600 users from around the world. PMID:22814390

  16. Real-time model learning using Incremental Sparse Spectrum Gaussian Process Regression.

    PubMed

    Gijsberts, Arjan; Metta, Giorgio

    2013-05-01

    Novel applications in unstructured and non-stationary human environments require robots that learn from experience and adapt autonomously to changing conditions. Predictive models therefore not only need to be accurate, but should also be updated incrementally in real-time and require minimal human intervention. Incremental Sparse Spectrum Gaussian Process Regression is an algorithm that is targeted specifically for use in this context. Rather than developing a novel algorithm from the ground up, the method is based on the thoroughly studied Gaussian Process Regression algorithm, therefore ensuring a solid theoretical foundation. Non-linearity and a bounded update complexity are achieved simultaneously by means of a finite dimensional random feature mapping that approximates a kernel function. As a result, the computational cost for each update remains constant over time. Finally, algorithmic simplicity and support for automated hyperparameter optimization ensures convenience when employed in practice. Empirical validation on a number of synthetic and real-life learning problems confirms that the performance of Incremental Sparse Spectrum Gaussian Process Regression is superior with respect to the popular Locally Weighted Projection Regression, while computational requirements are found to be significantly lower. The method is therefore particularly suited for learning with real-time constraints or when computational resources are limited. Copyright © 2012 Elsevier Ltd. All rights reserved.
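
    A condensed sketch of the two ingredients named above: a finite random-feature map approximating an RBF kernel (the "sparse spectrum"), and per-sample updates of fixed cost. The plain normal-equations refresh below replaces the rank-one Cholesky update of the published algorithm, and all hyperparameters are illustrative (Python with numpy).

      import numpy as np

      rng = np.random.default_rng(0)
      d, D, lengthscale, noise = 1, 100, 0.5, 0.1
      W = rng.normal(scale=1.0 / lengthscale, size=(D, d))   # random spectral frequencies
      b = rng.uniform(0, 2 * np.pi, D)
      phi = lambda x: np.sqrt(2.0 / D) * np.cos(x @ W.T + b) # sparse-spectrum feature map

      A = noise ** 2 * np.eye(D)                             # running sufficient statistics
      r = np.zeros(D)
      for _ in range(500):                                   # stream of incoming samples
          x = rng.uniform(-3, 3, size=(1, d))
          y = np.sin(3 * x[0, 0]) + noise * rng.normal()
          f = phi(x)[0]
          A += np.outer(f, f)                                # constant-cost update per sample
          r += f * y

      w = np.linalg.solve(A, r)                              # posterior mean weights
      x_test = np.array([[0.5]])
      print("prediction at 0.5:", (phi(x_test) @ w)[0], "truth:", np.sin(1.5))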

  17. Three-dimensional forward modeling of DC resistivity using the aggregation-based algebraic multigrid method

    NASA Astrophysics Data System (ADS)

    Chen, Hui; Deng, Ju-Zhi; Yin, Min; Yin, Chang-Chun; Tang, Wen-Wu

    2017-03-01

    To speed up three-dimensional (3D) DC resistivity modeling, we present a new multigrid method, the aggregation-based algebraic multigrid method (AGMG). We first discretize the differential equation of the secondary potential field with mixed boundary conditions by using a seven-point finite-difference method to obtain a large sparse system of linear equations. Then, we introduce the theory behind the pairwise aggregation algorithms for AGMG and use the conjugate-gradient method with the V-cycle AGMG preconditioner (AGMG-CG) to solve the linear equations. We use typical geoelectrical models to test the proposed AGMG-CG method and compare the results with analytical solutions and the 3DDCXH algorithm for 3D DC modeling (3DDCXH). In addition, we apply the AGMG-CG method to different grid sizes and geoelectrical models and compare it to different iterative methods, such as ILU-BICGSTAB, ILU-GCR, and SSOR-CG. The AGMG-CG method yields nearly linearly decreasing errors, whereas the number of iterations increases slowly with increasing grid size. The AGMG-CG method is precise and converges fast, and thus can improve the computational efficiency in forward modeling of three-dimensional DC resistivity.
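
    A compact stand-in for the solver setup described above: a sparse finite-difference system solved by conjugate gradients with an algebraic multigrid V-cycle as preconditioner. pyamg's smoothed-aggregation solver is used here in place of the paper's pairwise-aggregation AGMG, and the Poisson test matrix stands in for the seven-point resistivity system (Python with numpy, scipy, and pyamg assumed).

      import numpy as np
      import pyamg
      from scipy.sparse.linalg import cg

      A = pyamg.gallery.poisson((80, 80), format="csr")  # stand-in for the 7-point FD system
      b = np.ones(A.shape[0])

      ml = pyamg.smoothed_aggregation_solver(A)          # build the AMG hierarchy
      M = ml.aspreconditioner(cycle="V")                 # V-cycle used as a preconditioner

      iterations = []
      x, info = cg(A, b, M=M, callback=lambda xk: iterations.append(1))
      print("converged" if info == 0 else "not converged",
            "after", len(iterations), "CG iterations")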

  18. Combined Uncertainty and A-Posteriori Error Bound Estimates for General CFD Calculations: Theory and Software Implementation

    NASA Technical Reports Server (NTRS)

    Barth, Timothy J.

    2014-01-01

This workshop presentation discusses the design and implementation of numerical methods for the quantification of statistical uncertainty, including a-posteriori error bounds, for output quantities computed using CFD methods. Hydrodynamic realizations often contain numerical error arising from finite-dimensional approximation (e.g. numerical methods using grids, basis functions, particles) and statistical uncertainty arising from incomplete information and/or statistical characterization of model parameters and random fields. The first task at hand is to derive formal error bounds for statistics given realizations containing finite-dimensional numerical error [1]. The error in computed output statistics contains contributions from both realization error and the error resulting from the calculation of statistics integrals using a numerical method. A second task is to devise computable a-posteriori error bounds by numerically approximating all terms arising in the error bound estimates. For the same reason that CFD calculations including error bounds but omitting uncertainty modeling are only of limited value, CFD calculations including uncertainty modeling but omitting error bounds are only of limited value. To gain maximum value from CFD calculations, a general software package for uncertainty quantification with quantified error bounds has been developed at NASA. The package provides implementations for a suite of numerical methods used in uncertainty quantification: dense tensorization basis methods [3] and a subscale recovery variant [1] for non-smooth data; sparse tensorization methods [2] utilizing node-nested hierarchies; and sampling methods [4] for high-dimensional random variable spaces.

  19. A latent class distance association model for cross-classified data with a categorical response variable.

    PubMed

    Vera, José Fernando; de Rooij, Mark; Heiser, Willem J

    2014-11-01

    In this paper we propose a latent class distance association model for clustering in the predictor space of large contingency tables with a categorical response variable. The rows of such a table are characterized as profiles of a set of explanatory variables, while the columns represent a single outcome variable. In many cases such tables are sparse, with many zero entries, which makes traditional models problematic. By clustering the row profiles into a few specific classes and representing these together with the categories of the response variable in a low-dimensional Euclidean space using a distance association model, a parsimonious prediction model can be obtained. A generalized EM algorithm is proposed to estimate the model parameters and the adjusted Bayesian information criterion statistic is employed to test the number of mixture components and the dimensionality of the representation. An empirical example highlighting the advantages of the new approach and comparing it with traditional approaches is presented. © 2014 The British Psychological Society.

  20. Communication requirements of sparse Cholesky factorization with nested dissection ordering

    NASA Technical Reports Server (NTRS)

    Naik, Vijay K.; Patrick, Merrell L.

    1989-01-01

Load distribution schemes for minimizing the communication requirements of the Cholesky factorization of dense and sparse, symmetric, positive definite matrices on multiprocessor systems are presented. The total data traffic in factoring an n x n sparse symmetric positive definite matrix representing an n-vertex regular two-dimensional grid graph using n^α (α ≤ 1) processors is shown to be O(n^(1 + α/2)). It is O(n) when n^α (α ≥ 1) processors are used. Under the conditions of uniform load distribution, these results are shown to be asymptotically optimal.

  1. Principles for problem aggregation and assignment in medium scale multiprocessors

    NASA Technical Reports Server (NTRS)

    Nicol, David M.; Saltz, Joel H.

    1987-01-01

    One of the most important issues in parallel processing is the mapping of workload to processors. This paper considers a large class of problems having a high degree of potential fine grained parallelism, and execution requirements that are either not predictable, or are too costly to predict. The main issues in mapping such a problem onto medium scale multiprocessors are those of aggregation and assignment. We study a method of parameterized aggregation that makes few assumptions about the workload. The mapping of aggregate units of work onto processors is uniform, and exploits locality of workload intensity to balance the unknown workload. In general, a finer aggregate granularity leads to a better balance at the price of increased communication/synchronization costs; the aggregation parameters can be adjusted to find a reasonable granularity. The effectiveness of this scheme is demonstrated on three model problems: an adaptive one-dimensional fluid dynamics problem with message passing, a sparse triangular linear system solver on both a shared memory and a message-passing machine, and a two-dimensional time-driven battlefield simulation employing message passing. Using the model problems, the tradeoffs are studied between balanced workload and the communication/synchronization costs. Finally, an analytical model is used to explain why the method balances workload and minimizes the variance in system behavior.

  2. A cosmic web filament revealed in Lyman-α emission around a luminous high-redshift quasar.

    PubMed

    Cantalupo, Sebastiano; Arrigoni-Battaia, Fabrizio; Prochaska, J Xavier; Hennawi, Joseph F; Madau, Piero

    2014-02-06

Simulations of structure formation in the Universe predict that galaxies are embedded in a 'cosmic web', where most baryons reside as rarefied and highly ionized gas. This material has been studied for decades in absorption against background sources, but the sparseness of these inherently one-dimensional probes precludes direct constraints on the three-dimensional morphology of the underlying web. Here we report observations of a cosmic web filament in Lyman-α emission, discovered during a survey for cosmic gas fluorescently illuminated by bright quasars at redshift z ≈ 2.3. With a linear projected size of approximately 460 physical kiloparsecs, the Lyman-α emission surrounding the radio-quiet quasar UM 287 extends well beyond the virial radius of any plausible associated dark-matter halo and therefore traces intergalactic gas. The estimated cold gas mass of the filament from the observed emission, about 10^(12.0 ± 0.5)/C^(1/2) solar masses (where C is the gas clumping factor), is more than ten times larger than what is typically found in cosmological simulations, suggesting that a population of intergalactic gas clumps with subkiloparsec sizes may be missing in current numerical models.

  3. Online Least Squares One-Class Support Vector Machines-Based Abnormal Visual Event Detection

    PubMed Central

    Wang, Tian; Chen, Jie; Zhou, Yi; Snoussi, Hichem

    2013-01-01

    The abnormal event detection problem is an important subject in real-time video surveillance. In this paper, we propose a novel online one-class classification algorithm, online least squares one-class support vector machine (online LS-OC-SVM), combined with its sparsified version (sparse online LS-OC-SVM). LS-OC-SVM extracts a hyperplane as an optimal description of training objects in a regularized least squares sense. The online LS-OC-SVM learns a training set with a limited number of samples to provide a basic normal model, then updates the model through remaining data. In the sparse online scheme, the model complexity is controlled by the coherence criterion. The online LS-OC-SVM is adopted to handle the abnormal event detection problem. Each frame of the video is characterized by the covariance matrix descriptor encoding the moving information, then is classified into a normal or an abnormal frame. Experiments are conducted, on a two-dimensional synthetic distribution dataset and a benchmark video surveillance dataset, to demonstrate the promising results of the proposed online LS-OC-SVM method. PMID:24351629

  4. Online least squares one-class support vector machines-based abnormal visual event detection.

    PubMed

    Wang, Tian; Chen, Jie; Zhou, Yi; Snoussi, Hichem

    2013-12-12

    The abnormal event detection problem is an important subject in real-time video surveillance. In this paper, we propose a novel online one-class classification algorithm, online least squares one-class support vector machine (online LS-OC-SVM), combined with its sparsified version (sparse online LS-OC-SVM). LS-OC-SVM extracts a hyperplane as an optimal description of training objects in a regularized least squares sense. The online LS-OC-SVM learns a training set with a limited number of samples to provide a basic normal model, then updates the model through remaining data. In the sparse online scheme, the model complexity is controlled by the coherence criterion. The online LS-OC-SVM is adopted to handle the abnormal event detection problem. Each frame of the video is characterized by the covariance matrix descriptor encoding the moving information, then is classified into a normal or an abnormal frame. Experiments are conducted, on a two-dimensional synthetic distribution dataset and a benchmark video surveillance dataset, to demonstrate the promising results of the proposed online LS-OC-SVM method.

  5. DOE Office of Scientific and Technical Information (OSTI.GOV)

    Dayman, Ken J; Ade, Brian J; Weber, Charles F

High-dimensional, nonlinear function estimation using large datasets is a current area of interest in the machine learning community, and applications may be found throughout the analytical sciences, where ever-growing datasets are making more information available to the analyst. In this paper, we leverage the existing relevance vector machine, a sparse Bayesian version of the well-studied support vector machine, and expand the method to include integrated feature selection and automatic function shaping. These innovations produce an algorithm that is able to distinguish variables that are useful for making predictions of a response from variables that are unrelated or confusing. We test the technology using synthetic data, conduct initial performance studies, and develop a model capable of making position-independent predictions of the core-averaged burnup using a single specimen drawn randomly from a nuclear reactor core.
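
    As a rough illustration of this class of model (not the authors' algorithm), scikit-learn's ARDRegression implements a closely related sparse Bayesian linear model that drives the weights of uninformative inputs toward zero, which is the feature-selection behaviour described above. The data below are synthetic.

        import numpy as np
        from sklearn.linear_model import ARDRegression

        rng = np.random.default_rng(0)
        n, p = 300, 20
        X = rng.normal(size=(n, p))
        # only 3 of the 20 candidate variables actually drive the response
        y = 2.0 * X[:, 0] - 1.5 * X[:, 3] + 0.5 * X[:, 7] + 0.1 * rng.normal(size=n)

        model = ARDRegression().fit(X, y)
        relevant = np.flatnonzero(np.abs(model.coef_) > 1e-3)
        print("selected variables:", relevant)      # expected: indices 0, 3, 7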

  6. A combinatorial model for dentate gyrus sparse coding

    DOE PAGES

    Severa, William; Parekh, Ojas; James, Conrad D.; ...

    2016-12-29

The dentate gyrus forms a critical link between the entorhinal cortex and CA3 by providing a sparse version of the signal. Concurrent with this increase in sparsity, a widely accepted theory suggests the dentate gyrus performs pattern separation—similar inputs yield decorrelated outputs. Although an active region of study and theory, few logically rigorous arguments detail the dentate gyrus’s (DG) coding. We suggest a theoretically tractable, combinatorial model for this action. The model provides formal methods for a highly redundant, arbitrarily sparse, and decorrelated output signal. To explore the value of this model framework, we assess how suitable it is for two notable aspects of DG coding: how it can handle the highly structured grid cell representation in the input entorhinal cortex region and the presence of adult neurogenesis, which has been proposed to produce a heterogeneous code in the DG. We find tailoring the model to grid cell input yields expansion parameters consistent with the literature. In addition, the heterogeneous coding reflects activity gradation observed experimentally. Lastly, we connect this approach with more conventional binary threshold neural circuit models via a formal embedding.

  7. SkyFACT: high-dimensional modeling of gamma-ray emission with adaptive templates and penalized likelihoods

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Storm, Emma; Weniger, Christoph; Calore, Francesca, E-mail: e.m.storm@uva.nl, E-mail: c.weniger@uva.nl, E-mail: francesca.calore@lapth.cnrs.fr

We present SkyFACT (Sky Factorization with Adaptive Constrained Templates), a new approach for studying, modeling and decomposing diffuse gamma-ray emission. Like most previous analyses, the approach relies on predictions from cosmic-ray propagation codes like GALPROP and DRAGON. However, in contrast to previous approaches, we account for the fact that models are not perfect and allow for a very large number (≳ 10^5) of nuisance parameters to parameterize these imperfections. We combine methods of image reconstruction and adaptive spatio-spectral template regression in one coherent hybrid approach. To this end, we use penalized Poisson likelihood regression, with regularization functions that are motivated by the maximum entropy method. We introduce methods to efficiently handle the high dimensionality of the convex optimization problem as well as the associated semi-sparse covariance matrix, using the L-BFGS-B algorithm and Cholesky factorization. We test the method both on synthetic data as well as on gamma-ray emission from the inner Galaxy, |ℓ| < 90° and |b| < 20°, as observed by the Fermi Large Area Telescope. We finally define a simple reference model that removes most of the residual emission from the inner Galaxy, based on conventional diffuse emission components as well as components for the Fermi bubbles, the Fermi Galactic center excess, and extended sources along the Galactic disk. Variants of this reference model can serve as basis for future studies of diffuse emission in and outside the Galactic disk.
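
    A toy sketch of the optimization strategy only (tiny problem, made-up templates and penalty): fit positive template normalizations to Poisson counts by minimizing a penalized Poisson negative log-likelihood with the bound-constrained L-BFGS-B routine, as SkyFACT does for a vastly larger nuisance-parameter space.

        import numpy as np
        from scipy.optimize import minimize

        rng = np.random.default_rng(1)
        npix, ncomp = 500, 3
        T = rng.uniform(0.5, 2.0, size=(npix, ncomp))     # spatial templates (made up)
        t_true = np.array([1.0, 0.3, 2.0])
        k = rng.poisson(T @ t_true)                       # observed counts
        lam = 1.0                                         # regularization strength

        def objective(t):
            mu = T @ t
            nll = np.sum(mu - k * np.log(mu))             # Poisson NLL (up to a constant)
            penalty = lam * np.sum((t - 1.0) ** 2)        # keeps parameters near the prior
            grad = T.T @ (1.0 - k / mu) + 2.0 * lam * (t - 1.0)
            return nll + penalty, grad

        res = minimize(objective, x0=np.ones(ncomp), jac=True, method="L-BFGS-B",
                       bounds=[(1e-6, None)] * ncomp)     # positivity constraints
        print(res.x)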

  8. GALAPAGOS-C: analysis of galaxy morphologies using high-performance computing methods

    NASA Astrophysics Data System (ADS)

    Hiemer, Andreas; Barden, Marco; Kelvin, Lee S.; Häußler, Boris; Schindler, Sabine

    2014-11-01

We present GALAPAGOS-C, a code designed to process a complete set of survey images through automation of source detection (via SEXTRACTOR), postage stamp cutting, object mask preparation, sky background estimation and complex two-dimensional light profile Sérsic modelling (via GALFIT). GALAPAGOS-C is designed around the concept of MPI-parallelization, allowing the processing of large data sets in a quick and efficient manner. Further, GALAPAGOS-C is capable of fitting multiple-Sérsic profiles to each galaxy, each representing distinct galaxy components (e.g. bulge, disc, bar), in addition to the option to fit asymmetric Fourier mode distortions. The modelling reliability of our core single-Sérsic fitting capability is tested thoroughly using image simulations. We apply GALAPAGOS-C to the Space Telescope A901/902 Galaxy Evolution Survey to investigate the evolution of galaxy structure with cosmic time and the dependence on environment. We measure the distribution of Sérsic indices as a function of local object density in the A901/902 cluster sample to provide one of the first measures of the Sérsic index-density relation. We find that the fraction of galaxies with a high Sérsic index (2.5 < n < 7.0) is higher in denser environments (˜35 per cent), halving towards sparsely populated regions (˜15 per cent). The population of low Sérsic index galaxies (0.4 < n < 1.6) is lower in denser environments (˜35 per cent), increasing towards sparsely populated regions (˜60 per cent). The population of intermediate Sérsic index galaxies (1.6 < n < 2.5) approximately follows the trend of the high Sérsic index types.
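
    For reference, the Sérsic surface-brightness profile fitted by GALFIT inside GALAPAGOS-C has a simple closed form; the sketch below uses the common approximation b_n ≈ 2n - 1/3 and is only an illustration, not the survey pipeline.

        import numpy as np

        def sersic(r, I_e, r_e, n):
            """Sérsic surface-brightness profile I(r).

            I_e: intensity at the effective radius r_e; n: Sérsic index.
            b_n uses the common approximation b_n ≈ 2n - 1/3 (valid for roughly
            0.5 < n < 10); this is a sketch, not the exact GALFIT implementation.
            """
            b_n = 2.0 * n - 1.0 / 3.0
            return I_e * np.exp(-b_n * ((r / r_e) ** (1.0 / n) - 1.0))

        r = np.linspace(0.1, 10.0, 5)
        print(sersic(r, I_e=1.0, r_e=3.0, n=1.0))   # n = 1 is an exponential disc
        print(sersic(r, I_e=1.0, r_e=3.0, n=4.0))   # n = 4 is a de Vaucouleurs bulge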

  9. Multiphysics Simulations of Hot-Spot Initiation in Shocked Insensitive High-Explosive

    NASA Astrophysics Data System (ADS)

    Najjar, Fady; Howard, W. M.; Fried, L. E.

    2010-11-01

Solid plastic-bonded high-explosive materials consist of crystals with micron-sized pores embedded. Under mechanical or thermal insults, these voids increase the ease of shock initiation by generating high-temperature regions during their collapse that might lead to ignition. Understanding the mechanisms of hot-spot initiation is of significant research interest for the safety, reliability and development of new insensitive munitions. Multi-dimensional high-resolution meso-scale simulations are performed using the multiphysics software ALE3D to understand hot-spot initiation. The Cheetah code is coupled to ALE3D, creating multi-dimensional sparse tables for the HE properties. The reaction rates were obtained from MD Quantum computations. Our current predictions showcase several interesting features regarding hot-spot dynamics, including the formation of a "secondary" jet. We will discuss the results obtained with hydro-thermo-chemical processes leading to ignition growth for various pore sizes and different shock pressures.

  10. Leveraging EAP-Sparsity for Compressed Sensing of MS-HARDI in (k, q)-Space.

    PubMed

    Sun, Jiaqi; Sakhaee, Elham; Entezari, Alireza; Vemuri, Baba C

    2015-01-01

Compressed Sensing (CS) for the acceleration of MR scans has been widely investigated in the past decade. Lately, considerable progress has been made in achieving similar speed-ups in acquiring multi-shell high angular resolution diffusion imaging (MS-HARDI) scans. Existing approaches in this context were primarily concerned with sparse reconstruction of the diffusion MR signal S(q) in the q-space. More recently, methods have been developed to apply the compressed sensing framework to the 6-dimensional joint (k, q)-space, thereby exploiting the redundancy in this 6D space. To guarantee accurate reconstruction from partial MS-HARDI data, the key ingredients of compressed sensing that need to be brought together are: (1) the function to be reconstructed needs to have a sparse representation, (2) the data for reconstruction ought to be acquired in the dual domain (i.e., incoherent sensing), and (3) the reconstruction process involves a (convex) optimization. In this paper, we present a novel approach that uses partial Fourier sensing in the 6D space of (k, q) for the reconstruction of P(x, r). The distinct feature of our approach is a sparsity model that leverages surfacelets in conjunction with total variation for the joint sparse representation of P(x, r). Thus, our method stands to benefit from the practical guarantees for accurate reconstruction from partial (k, q)-space data. Further, we demonstrate significant savings in acquisition time over diffusion spectral imaging (DSI), which is commonly used as the benchmark for comparisons in reported literature. To demonstrate the benefits of this approach, we present several synthetic and real data examples.
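
    A one-dimensional toy analogue of the compressed-sensing ingredients listed above (sparsity, incoherent partial Fourier sampling, and a convex reconstruction), far simpler than the 6D (k, q)-space problem: recover a signal that is sparse in the sample domain from a random subset of its Fourier coefficients using iterative soft thresholding. Sizes and the threshold are arbitrary assumptions.

        import numpy as np

        rng = np.random.default_rng(0)
        n, m, k = 256, 80, 6
        x_true = np.zeros(n)
        x_true[rng.choice(n, k, replace=False)] = 3.0 * rng.normal(size=k)

        rows = rng.choice(n, m, replace=False)           # which k-space samples are kept
        def A(x):
            return np.fft.fft(x)[rows] / np.sqrt(n)      # undersampled, normalized DFT
        def At(y):
            full = np.zeros(n, dtype=complex)
            full[rows] = y
            return np.fft.ifft(full).real * np.sqrt(n)   # adjoint operator (real part)

        y = A(x_true)                                    # undersampled measurements
        x, lam = np.zeros(n), 0.05
        for _ in range(300):                             # ISTA iterations
            g = x + At(y - A(x))                         # gradient step (step size 1)
            x = np.sign(g) * np.maximum(np.abs(g) - lam, 0.0)   # soft threshold
        print(np.linalg.norm(x - x_true) / np.linalg.norm(x_true))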

  11. An ensemble predictive modeling framework for breast cancer classification.

    PubMed

    Nagarajan, Radhakrishnan; Upreti, Meenakshi

    2017-12-01

Molecular changes often precede clinical presentation of diseases and can be useful surrogates with potential to assist in informed clinical decision making. Recent studies have demonstrated the usefulness of modeling approaches such as classification that can predict the clinical outcomes from molecular expression profiles. While useful, a majority of these approaches implicitly use all molecular markers as features in the classification process, often resulting in a sparse, high-dimensional representation of the samples whose dimensionality is comparable to the sample size. In this study, a variant of the recently proposed ensemble classification approach is used for predicting good and poor-prognosis breast cancer samples from their molecular expression profiles. In contrast to traditional single and ensemble classifiers, the proposed approach uses multiple base classifiers with varying feature sets obtained from two-dimensional projection of the samples in conjunction with a majority voting strategy for predicting the class labels. In contrast to our earlier implementation, base classifiers in the ensembles are selected for maximal sensitivity and minimal redundancy by retaining only those with low average cosine distance. The resulting ensemble sets are subsequently modeled as undirected graphs. Performance of four different classification algorithms is shown to be better within the proposed ensemble framework in contrast to using them as traditional single classifier systems. Significance of a subset of genes with high-degree centrality in the network abstractions across the poor-prognosis samples is also discussed. Copyright © 2017 Elsevier Inc. All rights reserved.
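
    The core ensemble idea, many base classifiers trained on different low-dimensional feature subsets and combined by majority vote, can be sketched as follows; this is a generic illustration on synthetic data, not the authors' pipeline (which selects members by sensitivity and cosine-distance redundancy).

        import numpy as np
        from sklearn.datasets import make_classification
        from sklearn.linear_model import LogisticRegression
        from sklearn.model_selection import train_test_split

        rng = np.random.default_rng(0)
        X, y = make_classification(n_samples=400, n_features=50, n_informative=8,
                                   random_state=0)
        Xtr, Xte, ytr, yte = train_test_split(X, y, test_size=0.3, random_state=0)

        n_members = 25
        members = []
        for _ in range(n_members):
            pair = rng.choice(X.shape[1], size=2, replace=False)  # 2-D feature subset
            clf = LogisticRegression().fit(Xtr[:, pair], ytr)
            members.append((pair, clf))

        votes = np.array([clf.predict(Xte[:, pair]) for pair, clf in members])
        majority = (votes.mean(axis=0) > 0.5).astype(int)          # majority vote
        print("ensemble accuracy:", (majority == yte).mean())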

  12. Liver segmentation from CT images using a sparse priori statistical shape model (SP-SSM).

    PubMed

    Wang, Xuehu; Zheng, Yongchang; Gan, Lan; Wang, Xuan; Sang, Xinting; Kong, Xiangfeng; Zhao, Jie

    2017-01-01

    This study proposes a new liver segmentation method based on a sparse a priori statistical shape model (SP-SSM). First, mark points are selected in the liver a priori model and the original image. Then, the a priori shape and its mark points are used to obtain a dictionary for the liver boundary information. Second, the sparse coefficient is calculated based on the correspondence between mark points in the original image and those in the a priori model, and then the sparse statistical model is established by combining the sparse coefficients and the dictionary. Finally, the intensity energy and boundary energy models are built based on the intensity information and the specific boundary information of the original image. Then, the sparse matching constraint model is established based on the sparse coding theory. These models jointly drive the iterative deformation of the sparse statistical model to approximate and accurately extract the liver boundaries. This method can solve the problems of deformation model initialization and a priori method accuracy using the sparse dictionary. The SP-SSM can achieve a mean overlap error of 4.8% and a mean volume difference of 1.8%, whereas the average symmetric surface distance and the root mean square symmetric surface distance can reach 0.8 mm and 1.4 mm, respectively.

  13. Virtual screening of inorganic materials synthesis parameters with deep learning

    NASA Astrophysics Data System (ADS)

    Kim, Edward; Huang, Kevin; Jegelka, Stefanie; Olivetti, Elsa

    2017-12-01

    Virtual materials screening approaches have proliferated in the past decade, driven by rapid advances in first-principles computational techniques, and machine-learning algorithms. By comparison, computationally driven materials synthesis screening is still in its infancy, and is mired by the challenges of data sparsity and data scarcity: Synthesis routes exist in a sparse, high-dimensional parameter space that is difficult to optimize over directly, and, for some materials of interest, only scarce volumes of literature-reported syntheses are available. In this article, we present a framework for suggesting quantitative synthesis parameters and potential driving factors for synthesis outcomes. We use a variational autoencoder to compress sparse synthesis representations into a lower dimensional space, which is found to improve the performance of machine-learning tasks. To realize this screening framework even in cases where there are few literature data, we devise a novel data augmentation methodology that incorporates literature synthesis data from related materials systems. We apply this variational autoencoder framework to generate potential SrTiO3 synthesis parameter sets, propose driving factors for brookite TiO2 formation, and identify correlations between alkali-ion intercalation and MnO2 polymorph selection.

  14. Doubly Nonparametric Sparse Nonnegative Matrix Factorization Based on Dependent Indian Buffet Processes.

    PubMed

    Xuan, Junyu; Lu, Jie; Zhang, Guangquan; Xu, Richard Yi Da; Luo, Xiangfeng

    2018-05-01

Sparse nonnegative matrix factorization (SNMF) aims to factorize a data matrix into two optimized nonnegative sparse factor matrices, which could benefit many tasks, such as document-word co-clustering. However, the traditional SNMF typically assumes the number of latent factors (i.e., dimensionality of the factor matrices) to be fixed. This assumption makes it inflexible in practice. In this paper, we propose a doubly sparse nonparametric NMF framework to mitigate this issue by using dependent Indian buffet processes (dIBP). We apply a correlation function for the generation of two stick weights associated with each column pair of factor matrices while still maintaining their respective marginal distribution specified by IBP. As a consequence, the generation of two factor matrices will be columnwise correlated. Under this framework, two classes of correlation function are proposed: 1) using bivariate Beta distribution and 2) using Copula function. Compared with the single IBP-based NMF, this paper jointly makes two factor matrices nonparametric and sparse, which could be applied to broader scenarios, such as co-clustering. This paper is seen to be much more flexible than Gaussian process-based and hierarchical Beta process-based dIBPs in terms of allowing the two corresponding binary matrix columns to have greater variations in their nonzero entries. Our experiments on synthetic data show the merits of this paper compared with the state-of-the-art models with respect to factorization efficiency, sparsity, and flexibility. Experiments on real-world data sets demonstrate the efficiency of this paper in document-word co-clustering tasks.

  15. Sparse grid techniques for particle-in-cell schemes

    NASA Astrophysics Data System (ADS)

    Ricketson, L. F.; Cerfon, A. J.

    2017-02-01

    We propose the use of sparse grids to accelerate particle-in-cell (PIC) schemes. By using the so-called ‘combination technique’ from the sparse grids literature, we are able to dramatically increase the size of the spatial cells in multi-dimensional PIC schemes while paying only a slight penalty in grid-based error. The resulting increase in cell size allows us to reduce the statistical noise in the simulation without increasing total particle number. We present initial proof-of-principle results from test cases in two and three dimensions that demonstrate the new scheme’s efficiency, both in terms of computation time and memory usage.
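
    The 'combination technique' referred to above combines coarse anisotropic-grid results with +1/-1 weights, u_c = Σ_{i+j=n} u_{i,j} - Σ_{i+j=n-1} u_{i,j}. The sketch below applies it to a simple grid-based quantity (a trapezoidal integral of a test function) rather than a PIC field solve; the function and level are arbitrary assumptions.

        import numpy as np

        def f(x, y):
            return np.sin(np.pi * x) * np.cos(np.pi * y) ** 2

        def trapz(v, x, axis=-1):
            """Small trapezoid helper (kept local to avoid NumPy version differences)."""
            v = np.moveaxis(v, axis, -1)
            return np.sum((v[..., 1:] + v[..., :-1]) * np.diff(x) / 2.0, axis=-1)

        def grid_quantity(i, j):
            """Quantity computed on a (2^i + 1) x (2^j + 1) anisotropic grid;
            here a trapezoidal integral, in a PIC code it would be the field solve."""
            x = np.linspace(0.0, 1.0, 2 ** i + 1)
            y = np.linspace(0.0, 1.0, 2 ** j + 1)
            F = f(x[:, None], y[None, :])
            return trapz(trapz(F, y, axis=1), x)

        level = 8
        combined = (sum(grid_quantity(i, level - i) for i in range(1, level))
                    - sum(grid_quantity(i, level - 1 - i) for i in range(1, level - 1)))
        full = grid_quantity(level, level)     # the full tensor grid is far more costly
        exact = (2.0 / np.pi) * 0.5
        print(combined, full, exact)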

  16. Elastic-Waveform Inversion with Compressive Sensing for Sparse Seismic Data

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Lin, Youzuo; Huang, Lianjie

    2015-01-28

Accurate velocity models of compressional- and shear-waves are essential for geothermal reservoir characterization and microseismic imaging. Elastic-waveform inversion of multi-component seismic data can provide high-resolution inversion results of subsurface geophysical properties. However, the method requires seismic data acquired using dense source and receiver arrays. In practice, seismic sources and/or geophones are often sparsely distributed on the surface and/or in a borehole, such as 3D vertical seismic profiling (VSP) surveys. We develop a novel elastic-waveform inversion method with compressive sensing for inversion of sparse seismic data. We employ an alternating-minimization algorithm to solve the optimization problem of our new waveform inversion method. We validate our new method using synthetic VSP data for a geophysical model built using geologic features found at the Raft River enhanced-geothermal-system (EGS) field. We apply our method to synthetic VSP data with a sparse source array and compare the results with those obtained with a dense source array. Our numerical results demonstrate that the velocity models produced with our new method using a sparse source array are almost as accurate as those obtained using a dense source array.

  17. Sparse gammatone signal model optimized for English speech does not match the human auditory filters.

    PubMed

    Strahl, Stefan; Mertins, Alfred

    2008-07-18

Evidence that neurosensory systems use sparse signal representations, together with the improved performance of signal processing algorithms based on sparse signal models, has raised interest in sparse signal coding in recent years. For natural audio signals like speech and environmental sounds, gammatone atoms have been derived as expansion functions that generate a nearly optimal sparse signal model (Smith, E., Lewicki, M., 2006. Efficient auditory coding. Nature 439, 978-982). Furthermore, gammatone functions are established models for the human auditory filters. Thus far, a practical application of a sparse gammatone signal model has been prevented by the fact that deriving the sparsest representation is, in general, computationally intractable. In this paper, we applied an accelerated version of the matching pursuit algorithm for gammatone dictionaries allowing real-time and large data set applications. We show that a sparse signal model in general has advantages in audio coding and that a sparse gammatone signal model encodes speech more efficiently in terms of sparseness than a sparse modified discrete cosine transform (MDCT) signal model. We also show that the optimal gammatone parameters derived for English speech do not match the human auditory filters, suggesting that signal processing applications should derive the parameters individually for each applied signal class instead of using psychometrically derived parameters. For brain research, it means that care should be taken with directly transferring findings of optimality from technical to biological systems.
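
    A toy sketch of matching pursuit over a small gammatone dictionary (nothing like the accelerated large-scale algorithm in the record): atoms are 4th-order gammatones t^3 exp(-2*pi*b*t) cos(2*pi*f*t) at a few assumed centre frequencies and bandwidths, shifted to every sample position, and the two strongest atoms are extracted greedily.

        import numpy as np

        fs, dur = 8000, 0.05
        t = np.arange(0, dur, 1.0 / fs)

        def gammatone(f, b, length):
            tt = np.arange(length) / fs
            g = tt ** 3 * np.exp(-2 * np.pi * b * tt) * np.cos(2 * np.pi * f * tt)
            return g / np.linalg.norm(g)

        # assumed centre frequencies; b is a rough, made-up bandwidth rule
        atoms = [gammatone(f, b=0.1 * f + 25.0, length=128)
                 for f in (300.0, 600.0, 1200.0, 2400.0)]

        # synthetic signal containing two known atoms
        signal = np.zeros_like(t)
        signal[50:50 + 128] += 1.5 * atoms[1]
        signal[200:200 + 128] -= 0.8 * atoms[3]

        residual, decomposition = signal.copy(), []
        for _ in range(2):                                  # two matching-pursuit steps
            # correlate every atom with the residual at every shift
            corr = np.array([np.correlate(residual, a, mode="valid") for a in atoms])
            k, shift = np.unravel_index(np.argmax(np.abs(corr)), corr.shape)
            coef = corr[k, shift]
            residual[shift:shift + 128] -= coef * atoms[k]  # subtract the best atom
            decomposition.append((k, shift, coef))
        print(decomposition)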

  18. Three-dimensional unstructured grid Euler computations using a fully-implicit, upwind method

    NASA Technical Reports Server (NTRS)

    Whitaker, David L.

    1993-01-01

    A method has been developed to solve the Euler equations on a three-dimensional unstructured grid composed of tetrahedra. The method uses an upwind flow solver with a linearized, backward-Euler time integration scheme. Each time step results in a sparse linear system of equations which is solved by an iterative, sparse matrix solver. Local-time stepping, switched evolution relaxation (SER), preconditioning and reuse of the Jacobian are employed to accelerate the convergence rate. Implicit boundary conditions were found to be extremely important for fast convergence. Numerical experiments have shown that convergence rates comparable to that of a multigrid, central-difference scheme are achievable on the same mesh. Results are presented for several grids about an ONERA M6 wing.

  19. Spatial Bayesian Latent Factor Regression Modeling of Coordinate-based Meta-analysis Data

    PubMed Central

    Montagna, Silvia; Wager, Tor; Barrett, Lisa Feldman; Johnson, Timothy D.; Nichols, Thomas E.

    2017-01-01

Now over 20 years old, functional MRI (fMRI) has a large and growing literature that is best synthesised with meta-analytic tools. As most authors do not share image data, only the peak activation coordinates (foci) reported in the paper are available for Coordinate-Based Meta-Analysis (CBMA). Neuroimaging meta-analysis is used to 1) identify areas of consistent activation; and 2) build a predictive model of task type or cognitive process for new studies (reverse inference). To simultaneously address these aims, we propose a Bayesian point process hierarchical model for CBMA. We model the foci from each study as a doubly stochastic Poisson process, where the study-specific log intensity function is characterised as a linear combination of a high-dimensional basis set. A sparse representation of the intensities is guaranteed through latent factor modeling of the basis coefficients. Within our framework, it is also possible to account for the effect of study-level covariates (meta-regression), significantly expanding the capabilities of the current neuroimaging meta-analysis methods available. We apply our methodology to synthetic data and neuroimaging meta-analysis datasets. PMID:28498564

  20. Two alternate proofs of Wang's lune formula for sparse distributed memory and an integral approximation

    NASA Technical Reports Server (NTRS)

    Jaeckel, Louis A.

    1988-01-01

    In Kanerva's Sparse Distributed Memory, writing to and reading from the memory are done in relation to spheres in an n-dimensional binary vector space. Thus it is important to know how many points are in the intersection of two spheres in this space. Two proofs are given of Wang's formula for spheres of unequal radii, and an integral approximation for the intersection in this case.
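
    Wang's formula counts the points common to two spheres in Hamming space; the brute-force combinatorial sum below computes the same quantity for small cases and is handy for checking any closed-form expression (it is not Wang's lune formula itself).

        from math import comb

        def ball_intersection(n, d, r1, r2):
            """Points z in {0,1}^n with dist(z, x) <= r1 and dist(z, y) <= r2,
            where the centres x and y are Hamming distance d apart."""
            total = 0
            for i in range(d + 1):          # i: differing coords where z disagrees with x
                for j in range(n - d + 1):  # j: agreeing coords that z flips
                    if i + j <= r1 and (d - i) + j <= r2:
                        total += comb(d, i) * comb(n - d, j)
            return total

        # e.g. two radius-4 balls in a 10-dimensional binary space, centres 3 apart
        print(ball_intersection(n=10, d=3, r1=4, r2=4))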

  1. Sparse network modeling and metscape-based visualization methods for the analysis of large-scale metabolomics data.

    PubMed

    Basu, Sumanta; Duren, William; Evans, Charles R; Burant, Charles F; Michailidis, George; Karnovsky, Alla

    2017-05-15

Recent technological advances in mass spectrometry, development of richer mass spectral libraries and data processing tools have enabled large scale metabolic profiling. Biological interpretation of metabolomics studies heavily relies on knowledge-based tools that contain information about metabolic pathways. Incomplete coverage of different areas of metabolism and lack of information about non-canonical connections between metabolites limit the scope of applications of such tools. Furthermore, the presence of a large number of unknown features, which cannot be readily identified, but nonetheless can represent bona fide compounds, also considerably complicates biological interpretation of the data. Leveraging recent developments in the statistical analysis of high-dimensional data, we developed a new Debiased Sparse Partial Correlation algorithm (DSPC) for estimating partial correlation networks and implemented it as a Java-based CorrelationCalculator program. We also introduce a new version of our previously developed tool Metscape that enables building and visualization of correlation networks. We demonstrate the utility of these tools by constructing biologically relevant networks and by aiding the identification of unknown compounds. http://metscape.med.umich.edu. Supplementary data are available at Bioinformatics online. © The Author 2017. Published by Oxford University Press. All rights reserved. For Permissions, please e-mail: journals.permissions@oup.com
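
    The basic partial-correlation-network construction can be sketched with standard tools: estimate a sparse inverse covariance (here with scikit-learn's graphical lasso) and convert it to partial correlations. The debiasing step that distinguishes DSPC is not reproduced here, and the data are synthetic.

        import numpy as np
        from sklearn.covariance import GraphicalLassoCV

        rng = np.random.default_rng(0)
        n, p = 200, 15
        X = rng.normal(size=(n, p))
        X[:, 1] += 0.8 * X[:, 0]                       # induce one direct association
        X = (X - X.mean(0)) / X.std(0)

        prec = GraphicalLassoCV().fit(X).precision_    # sparse precision matrix
        d = np.sqrt(np.diag(prec))
        partial_corr = -prec / np.outer(d, d)          # standard conversion to partial corr.
        np.fill_diagonal(partial_corr, 1.0)
        print(np.round(partial_corr[:3, :3], 2))       # edge weights for a network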

  2. Modeling of Sensor Placement Strategy for Shape Sensing and Structural Health Monitoring of a Wing-Shaped Sandwich Panel Using Inverse Finite Element Method.

    PubMed

    Kefal, Adnan; Yildiz, Mehmet

    2017-11-30

This paper investigated the effect of sensor density and alignment for three-dimensional shape sensing of an airplane-wing-shaped thick panel subjected to three different loading conditions, i.e., bending, torsion, and membrane loads. For shape sensing analysis of the panel, the Inverse Finite Element Method (iFEM) was used together with the Refined Zigzag Theory (RZT), in order to enable accurate predictions for transverse deflection and through-the-thickness variation of interfacial displacements. In this study, the iFEM-RZT algorithm is implemented by utilizing a novel three-node C0-continuous inverse-shell element, known as i3-RZT. The discrete strain data are generated numerically by performing a high-fidelity finite element analysis on the wing-shaped panel. This numerical strain data represents experimental strain readings obtained from surface-patched strain gauges or embedded fiber Bragg grating (FBG) sensors. Three different sensor placement configurations with varying density and alignment of strain data were examined and their corresponding displacement contours were compared with those of reference solutions. The results indicate that a sparse distribution of FBG sensors (uniaxial strain measurements), aligned in only the longitudinal direction, is sufficient for predicting accurate full-field membrane and bending responses (deformed shapes) of the panel, including a true zigzag representation of interfacial displacements. On the other hand, a sparse deployment of strain rosettes (triaxial strain measurements) is essentially enough to produce torsion shapes that are as accurate as those predicted by a dense sensor placement configuration. Hence, the potential applicability and practical aspects of the i3-RZT/iFEM methodology are demonstrated for three-dimensional shape sensing of future aerospace structures.

  3. Partitioned learning of deep Boltzmann machines for SNP data.

    PubMed

    Hess, Moritz; Lenz, Stefan; Blätte, Tamara J; Bullinger, Lars; Binder, Harald

    2017-10-15

Learning the joint distributions of measurements, and in particular identification of an appropriate low-dimensional manifold, has been found to be a powerful ingredient of deep learning approaches. Yet, such approaches have hardly been applied to single nucleotide polymorphism (SNP) data, probably due to the high number of features typically exceeding the number of studied individuals. After a brief overview of how deep Boltzmann machines (DBMs), a deep learning approach, can be adapted to SNP data in principle, we specifically present a way to alleviate the dimensionality problem by partitioned learning. We propose a sparse regression approach to coarsely screen the joint distribution of SNPs, followed by training several DBMs on SNP partitions that were identified by the screening. Aggregate features representing SNP patterns and the corresponding SNPs are extracted from the DBMs by a combination of statistical tests and sparse regression. In simulated case-control data, we show how this can uncover complex SNP patterns and augment results from univariate approaches, while maintaining type 1 error control. Time-to-event endpoints are considered in an application with acute myeloid leukemia patients, where SNP patterns are modeled after a pre-screening based on gene expression data. The proposed approach identified three SNPs that seem to jointly influence survival in a validation dataset. This indicates the added value of jointly investigating SNPs compared to standard univariate analyses and makes partitioned learning of DBMs an interesting complementary approach when analyzing SNP data. A Julia package is provided at 'http://github.com/binderh/BoltzmannMachines.jl'. binderh@imbi.uni-freiburg.de. Supplementary data are available at Bioinformatics online. © The Author (2017). Published by Oxford University Press. All rights reserved. For Permissions, please email: journals.permissions@oup.com

  4. Sparse representations via learned dictionaries for x-ray angiogram image denoising

    NASA Astrophysics Data System (ADS)

    Shang, Jingfan; Huang, Zhenghua; Li, Qian; Zhang, Tianxu

    2018-03-01

X-ray angiogram image denoising has long been an active research topic in the field of computer vision. In particular, the denoising performance of many existing methods has been greatly improved by the widespread use of nonlocal similar patches. However, existing nonlocal self-similar (NSS) patch-based methods can still be improved and extended. In this paper, we propose an image denoising model based on the sparsity of the NSS patches to obtain high denoising performance and high-quality images. In order to represent the sparse NSS patches at every location of the image well and to solve the image denoising model more efficiently, we learn dictionaries as a global image prior with the K-SVD algorithm over the processed image; the simple and effective alternating direction method of multipliers (ADMM) is then used to solve the image denoising model. The results of extensive synthetic experiments demonstrate that, owing to the dictionaries learned by the K-SVD algorithm, the proposed sparsely augmented Lagrangian image denoising (SALID) model performs effectively and obtains state-of-the-art denoising performance and high-quality images. Moreover, we also give some denoising results of clinical X-ray angiogram images.
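
    A rough sketch of dictionary-based sparse-coding denoising using off-the-shelf components (scikit-learn's mini-batch dictionary learner instead of K-SVD, and OMP sparse coding instead of the ADMM solver described above); the image is a synthetic toy, not angiogram data.

        import numpy as np
        from sklearn.decomposition import MiniBatchDictionaryLearning
        from sklearn.feature_extraction.image import extract_patches_2d, \
            reconstruct_from_patches_2d

        rng = np.random.default_rng(0)
        clean = np.kron(rng.random((8, 8)), np.ones((8, 8)))     # toy 64x64 image
        noisy = clean + 0.1 * rng.normal(size=clean.shape)

        patch_size = (6, 6)
        patches = extract_patches_2d(noisy, patch_size)
        P = patches.reshape(len(patches), -1)
        means = P.mean(axis=1, keepdims=True)
        P = P - means                                            # work on zero-mean patches

        dico = MiniBatchDictionaryLearning(n_components=64, alpha=0.5,
                                           transform_algorithm="omp",
                                           transform_n_nonzero_coefs=3,
                                           random_state=0).fit(P)
        codes = dico.transform(P)                                # sparse codes per patch
        denoised_patches = (codes @ dico.components_ + means).reshape(patches.shape)
        denoised = reconstruct_from_patches_2d(denoised_patches, noisy.shape)
        print(np.linalg.norm(denoised - clean) / np.linalg.norm(noisy - clean))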

  5. Local structure preserving sparse coding for infrared target recognition

    PubMed Central

    Han, Jing; Yue, Jiang; Zhang, Yi; Bai, Lianfa

    2017-01-01

    Sparse coding performs well in image classification. However, robust target recognition requires a lot of comprehensive template images and the sparse learning process is complex. We incorporate sparsity into a template matching concept to construct a local sparse structure matching (LSSM) model for general infrared target recognition. A local structure preserving sparse coding (LSPSc) formulation is proposed to simultaneously preserve the local sparse and structural information of objects. By adding a spatial local structure constraint into the classical sparse coding algorithm, LSPSc can improve the stability of sparse representation for targets and inhibit background interference in infrared images. Furthermore, a kernel LSPSc (K-LSPSc) formulation is proposed, which extends LSPSc to the kernel space to weaken the influence of the linear structure constraint in nonlinear natural data. Because of the anti-interference and fault-tolerant capabilities, both LSPSc- and K-LSPSc-based LSSM can implement target identification based on a simple template set, which just needs several images containing enough local sparse structures to learn a sufficient sparse structure dictionary of a target class. Specifically, this LSSM approach has stable performance in the target detection with scene, shape and occlusions variations. High performance is demonstrated on several datasets, indicating robust infrared target recognition in diverse environments and imaging conditions. PMID:28323824

  6. Sparse dictionary learning of resting state fMRI networks.

    PubMed

    Eavani, Harini; Filipovych, Roman; Davatzikos, Christos; Satterthwaite, Theodore D; Gur, Raquel E; Gur, Ruben C

    2012-07-02

    Research in resting state fMRI (rsfMRI) has revealed the presence of stable, anti-correlated functional subnetworks in the brain. Task-positive networks are active during a cognitive process and are anti-correlated with task-negative networks, which are active during rest. In this paper, based on the assumption that the structure of the resting state functional brain connectivity is sparse, we utilize sparse dictionary modeling to identify distinct functional sub-networks. We propose two ways of formulating the sparse functional network learning problem that characterize the underlying functional connectivity from different perspectives. Our results show that the whole-brain functional connectivity can be concisely represented with highly modular, overlapping task-positive/negative pairs of sub-networks.

  7. Efficient convolutional sparse coding

    DOEpatents

    Wohlberg, Brendt

    2017-06-20

Computationally efficient algorithms may be applied for fast dictionary learning solving the convolutional sparse coding problem in the Fourier domain. More specifically, efficient convolutional sparse coding may be derived within an alternating direction method of multipliers (ADMM) framework that utilizes fast Fourier transforms (FFT) to solve the main linear system in the frequency domain. Such algorithms may enable a significant reduction in computational cost over conventional approaches by implementing a linear solver for the most critical and computationally expensive component of the conventional iterative algorithm. The theoretical computational cost of the algorithm may be reduced from O(M^3 N) to O(M N log N), where N is the dimensionality of the data and M is the number of elements in the dictionary. This significant improvement in efficiency may greatly increase the range of problems that can practically be addressed via convolutional sparse representations.

  8. Face recognition from unconstrained three-dimensional face images using multitask sparse representation

    NASA Astrophysics Data System (ADS)

    Bentaieb, Samia; Ouamri, Abdelaziz; Nait-Ali, Amine; Keche, Mokhtar

    2018-01-01

    We propose and evaluate a three-dimensional (3D) face recognition approach that applies the speeded up robust feature (SURF) algorithm to the depth representation of shape index map, under real-world conditions, using only a single gallery sample for each subject. First, the 3D scans are preprocessed, then SURF is applied on the shape index map to find interest points and their descriptors. Each 3D face scan is represented by keypoints descriptors, and a large dictionary is built from all the gallery descriptors. At the recognition step, descriptors of a probe face scan are sparsely represented by the dictionary. A multitask sparse representation classification is used to determine the identity of each probe face. The feasibility of the approach that uses the SURF algorithm on the shape index map for face identification/authentication is checked through an experimental investigation conducted on Bosphorus, University of Milano Bicocca, and CASIA 3D datasets. It achieves an overall rank one recognition rate of 97.75%, 80.85%, and 95.12%, respectively, on these datasets.

  9. Sparse short-distance connections enhance calcium wave propagation in a 3D model of astrocyte networks

    PubMed Central

    Lallouette, Jules; De Pittà, Maurizio; Ben-Jacob, Eshel; Berry, Hugues

    2014-01-01

    Traditionally, astrocytes have been considered to couple via gap-junctions into a syncytium with only rudimentary spatial organization. However, this view is challenged by growing experimental evidence that astrocytes organize as a proper gap-junction mediated network with more complex region-dependent properties. On the other hand, the propagation range of intercellular calcium waves (ICW) within astrocyte populations is as well highly variable, depending on the brain region considered. This suggests that the variability of the topology of gap-junction couplings could play a role in the variability of the ICW propagation range. Since this hypothesis is very difficult to investigate with current experimental approaches, we explore it here using a biophysically realistic model of three-dimensional astrocyte networks in which we varied the topology of the astrocyte network, while keeping intracellular properties and spatial cell distribution and density constant. Computer simulations of the model suggest that changing the topology of the network is indeed sufficient to reproduce the distinct ranges of ICW propagation reported experimentally. Unexpectedly, our simulations also predict that sparse connectivity and restriction of gap-junction couplings to short distances should favor propagation while long–distance or dense connectivity should impair it. Altogether, our results provide support to recent experimental findings that point toward a significant functional role of the organization of gap-junction couplings into proper astroglial networks. Dynamic control of this topology by neurons and signaling molecules could thus constitute a new type of regulation of neuron-glia and glia-glia interactions. PMID:24795613

  10. Application of High-resolution Aerial LiDAR Data in Calibration of a Two-dimensional Urban Flood Simulation

    NASA Astrophysics Data System (ADS)

    Piotrowski, J.; Goska, R.; Chen, B.; Krajewski, W. F.; Young, N.; Weber, L.

    2009-12-01

In June 2008, the state of Iowa experienced an unprecedented flood event, which resulted in an economic loss of approximately $2.88 billion. Flooding in the Iowa River corridor, which exceeded the previous flood of record by 3 feet, devastated several communities, including Coralville and Iowa City, home to the University of Iowa. Recognizing an opportunity to capture a unique dataset detailing the impacts of the historic flood, the investigators contacted the National Center for Airborne Laser Mapping (NCALM), which performed an aerial Light Detection and Ranging (LiDAR) survey along the Iowa River. The survey, conducted immediately following the flood peak, provided coverage of a 60-mile reach. The goal of the present research is to develop a process by which flood extents and water surface elevations can be accurately extracted from the LiDAR data set and to evaluate the benefit of such data in calibrating one- and two-dimensional hydraulic models. Whereas data typically available for model calibration include sparsely distributed point observations and high water marks, the LiDAR data used in the present study provide broad-scale, detailed, and continuous information describing the spatial extent and depth of flooding. Initial efforts were focused on a 10-mile, primarily urban reach of the Iowa River extending from Coralville Reservoir, a United States Army Corps of Engineers flood control project, downstream through Coralville and Iowa City. Spatial extent and depth of flooding were estimated from the LiDAR data. At a given cross-sectional location, river channel and floodplain measurements were compared. When differences between floodplain and river channel measurements were less than a standard deviation of the vertical uncertainty in the LiDAR survey, floodplain measurements were classified as flooded. A flood water surface DEM was created using measurements classified as flooded. A two-dimensional, depth-averaged numerical model of a 10-mile reach of the Iowa River corridor was developed using the United States Bureau of Reclamation SRH-2D hydraulic modeling software. The numerical model uses an unstructured numerical mesh and variable surface roughness, assigned according to observed land use and cover. The numerical model was calibrated using inundation extents and water surface elevations derived from the LiDAR data. It was also calibrated using high water marks and land survey data collected daily during the 2008 flood. The investigators compared the two calibrations to evaluate the benefit of high-resolution LiDAR data in improving the accuracy of a two-dimensional urban flood simulation.

  11. A sparse autoencoder-based deep neural network for protein solvent accessibility and contact number prediction.

    PubMed

    Deng, Lei; Fan, Chao; Zeng, Zhiwen

    2017-12-28

Direct prediction of the three-dimensional (3D) structures of proteins from one-dimensional (1D) sequences is a challenging problem. Significant structural characteristics such as solvent accessibility and contact number are essential for deriving restraints in modeling protein folding and protein 3D structure. Thus, accurately predicting these features is a critical step for 3D protein structure building. In this study, we present DeepSacon, a computational method that can effectively predict protein solvent accessibility and contact number by using a deep neural network, which is built based on a stacked autoencoder and a dropout method. The results demonstrate that our proposed DeepSacon achieves a significant improvement in the prediction quality compared with the state-of-the-art methods. We obtain 0.70 three-state accuracy for solvent accessibility, 0.33 15-state accuracy and 0.74 Pearson Correlation Coefficient (PCC) for the contact number on the 5729 monomeric soluble globular protein dataset. We also evaluate the performance on the CASP11 benchmark dataset, where DeepSacon achieves 0.68 three-state accuracy and 0.69 PCC for solvent accessibility and contact number, respectively. We have shown that DeepSacon can reliably predict solvent accessibility and contact number with a stacked sparse autoencoder and a dropout approach.
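
    A schematic Keras sketch of a stacked sparse autoencoder with dropout, in the spirit of the architecture described above; the layer sizes, regularization strengths and random training data are placeholders, not the DeepSacon configuration.

        import numpy as np
        from tensorflow import keras
        from tensorflow.keras import layers, regularizers

        input_dim = 400                      # e.g. a window of sequence-derived features (assumed)
        inputs = keras.Input(shape=(input_dim,))
        x = layers.Dense(128, activation="relu",
                         activity_regularizer=regularizers.l1(1e-4))(inputs)   # sparse layer 1
        x = layers.Dropout(0.3)(x)
        x = layers.Dense(64, activation="relu",
                         activity_regularizer=regularizers.l1(1e-4))(x)        # sparse layer 2
        x = layers.Dropout(0.3)(x)
        outputs = layers.Dense(input_dim, activation="linear")(x)              # reconstruction

        autoencoder = keras.Model(inputs, outputs)
        autoencoder.compile(optimizer="adam", loss="mse")

        X = np.random.default_rng(0).normal(size=(1000, input_dim)).astype("float32")
        autoencoder.fit(X, X, epochs=2, batch_size=64, verbose=0)   # unsupervised pre-training
        # the encoder would then be topped with a softmax layer and fine-tuned to
        # predict solvent-accessibility / contact-number classes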

  12. Time integration algorithms for the two-dimensional Euler equations on unstructured meshes

    NASA Technical Reports Server (NTRS)

    Slack, David C.; Whitaker, D. L.; Walters, Robert W.

    1994-01-01

Explicit and implicit time integration algorithms for the two-dimensional Euler equations on unstructured grids are presented. Both cell-centered and cell-vertex finite volume upwind schemes utilizing Roe's approximate Riemann solver are developed. For the cell-vertex scheme, a four-stage Runge-Kutta time integration, a four-stage Runge-Kutta time integration with implicit residual averaging, a point Jacobi method, a symmetric point Gauss-Seidel method and two methods utilizing preconditioned sparse matrix solvers are presented. For the cell-centered scheme, a Runge-Kutta scheme, an implicit tridiagonal relaxation scheme modeled after line Gauss-Seidel, a fully implicit lower-upper (LU) decomposition, and a hybrid scheme utilizing both Runge-Kutta and LU methods are presented. A reverse Cuthill-McKee renumbering scheme is employed for the direct solver to decrease CPU time by reducing the fill of the Jacobian matrix. A comparison of the various time integration schemes is made for both first-order and higher-order accurate solutions using several mesh sizes; higher-order accuracy is achieved by using multidimensional monotone linear reconstruction procedures. The results obtained for a transonic flow over a circular arc suggest that the preconditioned sparse matrix solvers perform better than the other methods as the number of elements in the mesh increases.
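
    The four-stage Runge-Kutta update used for explicit time integration of a semi-discrete system du/dt = R(u) can be sketched in a few lines; the stage coefficients below are the classic Jameson-style values, an assumption rather than a detail taken from the paper.

        import numpy as np

        alphas = (1.0 / 4.0, 1.0 / 3.0, 1.0 / 2.0, 1.0)   # assumed stage coefficients

        def rk4_step(u, residual, dt):
            """One four-stage update: u_k = u_0 + alpha_k * dt * R(u_{k-1})."""
            u0 = u.copy()
            for a in alphas:
                u = u0 + a * dt * residual(u)
            return u

        # toy usage: decay equation du/dt = -u per "cell"
        u = np.ones(5)
        for _ in range(10):
            u = rk4_step(u, lambda v: -v, dt=0.1)
        print(u)    # should approach exp(-1) ≈ 0.368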

  13. Structure and stability of genetic variance-covariance matrices: A Bayesian sparse factor analysis of transcriptional variation in the three-spined stickleback.

    PubMed

    Siren, J; Ovaskainen, O; Merilä, J

    2017-10-01

The genetic variance-covariance matrix (G) is a quantity of central importance in evolutionary biology due to its influence on the rate and direction of multivariate evolution. However, the predictive power of empirically estimated G-matrices is limited for two reasons. First, phenotypes are high-dimensional, whereas traditional statistical methods are tuned to estimate and analyse low-dimensional matrices. Second, the stability of G to environmental effects and over time remains poorly understood. Using Bayesian sparse factor analysis (BSFG) designed to estimate high-dimensional G-matrices, we analysed levels of variation and covariation in 10,527 expressed genes in a large (n = 563) half-sib breeding design of three-spined sticklebacks subject to two temperature treatments. We found significant differences in the structure of G between the treatments: heritabilities and evolvabilities were higher in the warm than in the low-temperature treatment, suggesting greater opportunity for faster evolution in warm (stressful) conditions. Furthermore, comparison of G and its phenotypic equivalent P revealed that the latter is a poor substitute of the former. Most strikingly, the results suggest that the expected impact of G on evolvability-as well as the similarity among G-matrices-may depend strongly on the number of traits included in the analyses. In our results, the inclusion of only a few traits in the analyses leads to underestimation in the differences between the G-matrices and their predicted impacts on evolution. While the results highlight the challenges involved in estimating G, they also illustrate that by enabling the estimation of large G-matrices, the BSFG method can improve predicted evolutionary responses to selection. © 2017 John Wiley & Sons Ltd.

  14. Population coding in sparsely connected networks of noisy neurons.

    PubMed

    Tripp, Bryan P; Orchard, Jeff

    2012-01-01

    This study examines the relationship between population coding and spatial connection statistics in networks of noisy neurons. Encoding of sensory information in the neocortex is thought to require coordinated neural populations, because individual cortical neurons respond to a wide range of stimuli, and exhibit highly variable spiking in response to repeated stimuli. Population coding is rooted in network structure, because cortical neurons receive information only from other neurons, and because the information they encode must be decoded by other neurons, if it is to affect behavior. However, population coding theory has often ignored network structure, or assumed discrete, fully connected populations (in contrast with the sparsely connected, continuous sheet of the cortex). In this study, we modeled a sheet of cortical neurons with sparse, primarily local connections, and found that a network with this structure could encode multiple internal state variables with high signal-to-noise ratio. However, we were unable to create high-fidelity networks by instantiating connections at random according to spatial connection probabilities. In our models, high-fidelity networks required additional structure, with higher cluster factors and correlations between the inputs to nearby neurons.

  15. Modeling multivariate time series on manifolds with skew radial basis functions.

    PubMed

    Jamshidi, Arta A; Kirby, Michael J

    2011-01-01

    We present an approach for constructing nonlinear empirical mappings from high-dimensional domains to multivariate ranges. We employ radial basis functions and skew radial basis functions for constructing a model using data that are potentially scattered or sparse. The algorithm progresses iteratively, adding a new function at each step to refine the model. The placement of the functions is driven by a statistical hypothesis test that accounts for correlation in the multivariate range variables. The test is applied on training and validation data and reveals nonstatistical or geometric structure when it fails. At each step, the added function is fit to data contained in a spatiotemporally defined local region to determine the parameters, in particular the scale of the local model. The scale of the function is determined by the zero crossings of the autocorrelation function of the residuals. The model parameters and the number of basis functions are determined automatically from the given data, and there is no need to initialize any ad hoc parameters save for the selection of the skew radial basis functions. Compactly supported skew radial basis functions are employed to improve model accuracy, order, and convergence properties. The extension of the algorithm to higher-dimensional ranges produces reduced-order models by exploiting the existence of correlation in the range variable data. Structure is tested not just in a single time series but between all pairs of time series. We demonstrate the new methodologies on several problems, including modeling data on manifolds and the prediction of chaotic time series.
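
    To make the basis-expansion step concrete, here is a small sketch that fits a fixed-width Gaussian RBF model by linear least squares (assuming NumPy). The paper's iterative addition of skew RBFs, the hypothesis-test-driven placement and the autocorrelation-based scale selection are not reproduced; centers, scales and data are illustrative.

    ```python
    import numpy as np

    def rbf_design(X, centers, scale):
        """Gaussian RBF design matrix for inputs X and the given centers."""
        d2 = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(axis=-1)
        return np.exp(-d2 / (2.0 * scale ** 2))

    rng = np.random.default_rng(0)
    X = rng.uniform(-1.0, 1.0, size=(200, 3))                  # scattered inputs
    y = np.sin(np.pi * X[:, 0]) + 0.1 * rng.standard_normal(200)

    centers = X[rng.choice(len(X), size=20, replace=False)]    # centers picked from the data
    Phi = rbf_design(X, centers, scale=0.5)
    weights, *_ = np.linalg.lstsq(Phi, y, rcond=None)          # least-squares fit of the RBF weights
    y_hat = Phi @ weights
    ```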

  16. Porosimetry and packing morphology of vertically-aligned carbon nanotube arrays via impedance spectroscopy.

    PubMed

    Mutha, Heena K; Lu, Yuan; Stein, Itai; Cho, H Jeremy; Suss, Matthew; Laoui, Tahar; Thompson, Carl; Wardle, Brian; Wang, Evelyn

    2016-12-13

    Vertically aligned one-dimensional nanostructure arrays are promising in many applications such as electrochemical systems, solar cells, and electronics, taking advantage of high surface area per unit volume, nanometer length scale packing, and alignment leading to high conductivity. However, many devices need to optimize arrays for device performance by selecting an appropriate morphology. Developing a simple, non-invasive tool for understanding the role of pore volume distribution and interspacing would aid in the optimization of nanostructure morphologies in electrodes. In this work, we combined electrochemical impedance spectroscopy (EIS) with capacitance measurements and porous electrode theory to conduct in situ porosimetry of vertically-aligned carbon nanotubes (VA-CNTs) non-destructively. We utilized the EIS measurements with a pore size distribution model to quantify the average and dispersion of inter-CNT spacing (Γ), stochastically, in carpets that were mechanically densified from 1.7 × 10¹⁰ tubes/cm² to 4.5 × 10¹¹ tubes/cm². Our analysis predicts that the inter-CNT spacing ranges from over 100 ± 50 nm in sparse carpets to sub 10 ± 5 nm in packed carpets. Our results suggest that waviness of CNTs leads to variations in the inter-CNT spacing, which can be significant in sparse carpets. This methodology can be used to predict the performance of many nanostructured devices, including supercapacitors, batteries, solar cells, and semiconductor electronics. Copyright 2016 IOP Publishing Ltd.

  17. An Improved Nested Sampling Algorithm for Model Selection and Assessment

    NASA Astrophysics Data System (ADS)

    Zeng, X.; Ye, M.; Wu, J.; Wang, D.

    2017-12-01

    The multimodel strategy is a general approach for treating model structure uncertainty in recent research. The unknown groundwater system is represented by several plausible conceptual models, and each alternative conceptual model is assigned a weight that represents its plausibility. In the Bayesian framework, the posterior model weight is computed as the product of the model prior weight and the marginal likelihood (also termed the model evidence). As a result, estimating marginal likelihoods is crucial for reliable model selection and assessment in multimodel analysis. The nested sampling estimator (NSE) is a recently proposed algorithm for marginal likelihood estimation. NSE searches the parameter space gradually from low-likelihood to high-likelihood regions, and this evolution is carried out iteratively via a local sampling procedure. Thus, the efficiency of NSE is dominated by the strength of the local sampling procedure. Currently, the Metropolis-Hastings (M-H) algorithm and its variants are often used for local sampling in NSE. However, M-H is not an efficient sampling algorithm for high-dimensional or complex likelihood functions. To improve the performance of NSE, a more efficient and elaborate sampling algorithm, DREAMzs, can be integrated into the local sampling step. In addition, to overcome the computational burden of the large number of repeated model executions required for marginal likelihood estimation, an adaptive sparse grid stochastic collocation method is used to build surrogates for the original groundwater model.
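
    A toy sketch of the nested sampling estimator described above, assuming a uniform prior on the unit hypercube and a user-supplied log-likelihood. The local step is a plain Metropolis random walk, i.e. exactly the component the abstract proposes to replace with DREAMzs; the final live-point correction to the evidence is omitted.

    ```python
    import numpy as np

    def nested_sampling(log_likelihood, ndim, n_live=100, n_iter=2000, step=0.05, seed=0):
        """Toy evidence (marginal likelihood) estimate with a uniform prior on [0, 1]^ndim."""
        rng = np.random.default_rng(seed)
        live = rng.uniform(size=(n_live, ndim))               # live points drawn from the prior
        logL = np.array([log_likelihood(p) for p in live])
        logZ, logX_prev = -np.inf, 0.0
        for i in range(1, n_iter + 1):
            worst = np.argmin(logL)                           # current likelihood threshold
            logX = -i / n_live                                # remaining prior mass shrinks geometrically
            logw = np.log(np.exp(logX_prev) - np.exp(logX))   # weight of the discarded shell
            logZ = np.logaddexp(logZ, logL[worst] + logw)
            # replace the worst point by a prior sample constrained to L > threshold,
            # using a short Metropolis random walk started from a random live point
            cur = live[rng.integers(n_live)].copy()
            for _ in range(20):
                prop = np.clip(cur + step * rng.standard_normal(ndim), 0.0, 1.0)
                if log_likelihood(prop) > logL[worst]:
                    cur = prop
            live[worst], logL[worst] = cur, log_likelihood(cur)
            logX_prev = logX
        return logZ

    # example: a narrow Gaussian "posterior bump" centred in the unit square
    logZ = nested_sampling(lambda x: -0.5 * np.sum(((x - 0.5) / 0.05) ** 2), ndim=2)
    ```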

  18. Fast Acquisition and Reconstruction of Optical Coherence Tomography Images via Sparse Representation

    PubMed Central

    Li, Shutao; McNabb, Ryan P.; Nie, Qing; Kuo, Anthony N.; Toth, Cynthia A.; Izatt, Joseph A.; Farsiu, Sina

    2014-01-01

    In this paper, we present a novel technique, based on compressive sensing principles, for reconstruction and enhancement of multi-dimensional image data. Our method is a major improvement and generalization of the multi-scale sparsity based tomographic denoising (MSBTD) algorithm we recently introduced for reducing speckle noise. Our new technique exhibits several advantages over MSBTD, including its capability to simultaneously reduce noise and interpolate missing data. Unlike MSBTD, our new method does not require an a priori high-quality image from the target imaging subject and thus offers the potential to shorten clinical imaging sessions. This novel image restoration method, which we termed sparsity based simultaneous denoising and interpolation (SBSDI), utilizes sparse representation dictionaries constructed from previously collected datasets. We tested the SBSDI algorithm on retinal spectral domain optical coherence tomography images captured in the clinic. Experiments showed that the SBSDI algorithm qualitatively and quantitatively outperforms other state-of-the-art methods. PMID:23846467

  19. Self-energy-modified Poisson-Nernst-Planck equations: WKB approximation and finite-difference approaches.

    PubMed

    Xu, Zhenli; Ma, Manman; Liu, Pei

    2014-07-01

    We propose a modified Poisson-Nernst-Planck (PNP) model to investigate charge transport in electrolytes with an inhomogeneous dielectric environment. The model includes the ionic polarization due to the dielectric inhomogeneity and the ion-ion correlation. This is achieved through the self energy of test ions, obtained by solving a generalized Debye-Hückel (DH) equation. We develop numerical methods for the system composed of the PNP and DH equations. In particular, to address the numerical challenge of solving the high-dimensional DH equation, we develop an analytical WKB approximation and a numerical approach based on the selective inversion of sparse matrices. The model and numerical methods are validated by simulating the charge diffusion in electrolytes between two electrodes, for which the effects of dielectrics and correlation are investigated by comparing the results with the prediction of the classical PNP theory. We find that, at interface separations comparable to the Bjerrum length, the results of the modified equations differ significantly from the classical PNP predictions, mostly due to the dielectric effect. It is also shown that when the ion self energy is of weak or moderate strength, the WKB approximation attains high accuracy compared with precise finite-difference results.

  20. Action Recognition Using Nonnegative Action Component Representation and Sparse Basis Selection.

    PubMed

    Wang, Haoran; Yuan, Chunfeng; Hu, Weiming; Ling, Haibin; Yang, Wankou; Sun, Changyin

    2014-02-01

    In this paper, we propose using high-level action units to represent human actions in videos and, based on such units, a novel sparse model is developed for human action recognition. There are three interconnected components in our approach. First, we propose a new context-aware spatial-temporal descriptor, named locally weighted word context, to improve the discriminability of the traditionally used local spatial-temporal descriptors. Second, from the statistics of the context-aware descriptors, we learn action units using the graph regularized nonnegative matrix factorization, which leads to a part-based representation and encodes the geometrical information. These units effectively bridge the semantic gap in action recognition. Third, we propose a sparse model based on a joint l2,1-norm to preserve the representative items and suppress noise in the action units. Intuitively, when learning the dictionary for action representation, the sparse model captures the fact that actions from the same class share similar units. The proposed approach is evaluated on several publicly available data sets. The experimental results and analysis clearly demonstrate the effectiveness of the proposed approach.

  1. Integrative sparse principal component analysis of gene expression data.

    PubMed

    Liu, Mengque; Fan, Xinyan; Fang, Kuangnan; Zhang, Qingzhao; Ma, Shuangge

    2017-12-01

    In the analysis of gene expression data, dimension reduction techniques have been extensively adopted. The most popular one is perhaps PCA (principal component analysis). To generate more reliable and more interpretable results, the SPCA (sparse PCA) technique has been developed. With the "small sample size, high dimensionality" characteristic of gene expression data, the analysis results generated from a single dataset are often unsatisfactory. In contexts other than dimension reduction, integrative analysis techniques, which jointly analyze the raw data of multiple independent datasets, have been developed and shown to outperform "classic" meta-analysis, other multidataset techniques and single-dataset analysis. In this study, we conduct integrative analysis by developing the iSPCA (integrative SPCA) method. iSPCA achieves the selection and estimation of sparse loadings using a group penalty. To take advantage of the similarity across datasets and generate more accurate results, we further impose contrasted penalties. Different penalties are proposed to accommodate different data conditions. Extensive simulations show that iSPCA outperforms the alternatives under a wide spectrum of settings. The analysis of breast cancer and pancreatic cancer data further shows iSPCA's satisfactory performance. © 2017 WILEY PERIODICALS, INC.
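
    For orientation, the sketch below computes one sparse principal component for a single dataset via soft-thresholded power iterations (an L1 penalty on the loadings). The group and contrasted penalties that define iSPCA across multiple datasets are not shown; the penalty value and data are illustrative.

    ```python
    import numpy as np

    def sparse_pc1(X, lam=2.0, n_iter=200):
        """First sparse principal component via soft-thresholded power iterations."""
        X = X - X.mean(axis=0)
        v = np.linalg.svd(X, full_matrices=False)[2][0]        # initialise with ordinary PC1
        for _ in range(n_iter):
            u = X @ v
            u /= np.linalg.norm(u) + 1e-12
            z = X.T @ u
            v = np.sign(z) * np.maximum(np.abs(z) - lam, 0.0)  # soft-threshold -> sparse loadings
            nrm = np.linalg.norm(v)
            if nrm == 0.0:
                break
            v /= nrm
        return v

    rng = np.random.default_rng(1)
    X = rng.standard_normal((50, 2000))                        # "small n, large p" expression data
    loadings = sparse_pc1(X)
    print(int((loadings != 0).sum()), "non-zero loadings")
    ```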

  2. Towards the low-dose characterization of beam sensitive nanostructures via implementation of sparse image acquisition in scanning transmission electron microscopy

    NASA Astrophysics Data System (ADS)

    Hwang, Sunghwan; Han, Chang Wan; Venkatakrishnan, Singanallur V.; Bouman, Charles A.; Ortalan, Volkan

    2017-04-01

    Scanning transmission electron microscopy (STEM) has been successfully utilized to investigate the atomic structure and chemistry of materials with atomic resolution. However, STEM's focused electron probe with a high current density causes electron-beam damage, including radiolysis and knock-on damage, when the focused probe is exposed to electron-beam-sensitive materials. Therefore, it is highly desirable to decrease the electron dose used in STEM for the investigation of biological/organic molecules, soft materials and nanomaterials in general. With the recent emergence of novel sparse signal processing theories, such as compressive sensing and model-based iterative reconstruction, possibilities of operating STEM under a sparse acquisition scheme to reduce the electron dose have opened up. In this paper, we report our recent approach to implement sparse acquisition in STEM mode, executed by a random sparse scan and a signal processing algorithm called model-based iterative reconstruction (MBIR). In this method, a small portion, such as 5%, of randomly chosen unit sampling areas (i.e. electron probe positions, which correspond to pixels of a STEM image) within the region of interest (ROI) of the specimen is scanned with an electron probe to obtain a sparse image. Sparse images are then reconstructed using the MBIR inpainting algorithm to produce an image of the specimen at the original resolution that is consistent with an image obtained using conventional scanning methods. Experimental results for down to 5% sampling show consistency with the full STEM image acquired by the conventional scanning method. Although practical limitations of conventional STEM instruments, such as internal delays of the STEM control electronics and the continuous electron gun emission, currently hinder achieving the full potential of sparse-acquisition STEM in realizing the low-dose imaging conditions required for the investigation of beam-sensitive materials, the results obtained in our experiments demonstrate that sparse-acquisition STEM imaging is potentially capable of reducing the electron dose by at least 20 times, expanding the frontiers of our characterization capabilities for the investigation of biological/organic molecules, polymers, soft materials and nanostructures in general.
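
    The acquisition idea can be mimicked in a few lines: sample roughly 5% of probe positions at random and inpaint the rest. Plain linear/nearest interpolation from SciPy stands in for the MBIR inpainting algorithm used in the paper; the test image is synthetic.

    ```python
    import numpy as np
    from scipy.interpolate import griddata

    rng = np.random.default_rng(0)
    h, w = 256, 256
    truth = np.sin(np.linspace(0, 8 * np.pi, h))[:, None] * np.cos(np.linspace(0, 8 * np.pi, w))

    mask = rng.random((h, w)) < 0.05                       # ~5% of probe positions visited
    ys, xs = np.nonzero(mask)
    samples = truth[ys, xs]                                # the sparse image

    grid_y, grid_x = np.mgrid[0:h, 0:w]
    recon = griddata((ys, xs), samples, (grid_y, grid_x), method="linear")
    nearest = griddata((ys, xs), samples, (grid_y, grid_x), method="nearest")
    recon = np.where(np.isnan(recon), nearest, recon)      # fill pixels outside the convex hull
    ```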

  3. Sparse approximation problem: how rapid simulated annealing succeeds and fails

    NASA Astrophysics Data System (ADS)

    Obuchi, Tomoyuki; Kabashima, Yoshiyuki

    2016-03-01

    Information processing techniques based on sparseness have been actively studied in several disciplines. Among them, a mathematical framework to approximately express a given dataset by a combination of a small number of basis vectors of an overcomplete basis is termed the sparse approximation. In this paper, we apply simulated annealing, a metaheuristic algorithm for general optimization problems, to sparse approximation in the situation where the given data have a planted sparse representation and noise is present. The result in the noiseless case shows that our simulated annealing works well in a reasonable parameter region: the planted solution is found fairly rapidly. This is true even in the case where a common relaxation of the sparse approximation problem, the ℓ1-relaxation, is ineffective. On the other hand, when the dimensionality of the data is close to the number of non-zero components, another metastable state emerges, and our algorithm fails to find the planted solution. This phenomenon is associated with a first-order phase transition. In the case of very strong noise, it is no longer meaningful to search for the planted solution. In this situation, our algorithm determines a solution with close-to-minimum distortion fairly quickly.
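
    A toy sketch of simulated annealing on the support set of a k-sparse solution, assuming NumPy: moves swap one active column for an inactive one, the energy is the least-squares residual on the active set, and a linear cooling schedule is used. This is only a schematic stand-in for the annealing protocol analyzed in the paper.

    ```python
    import numpy as np

    def sa_sparse_approx(A, y, k, n_steps=5000, T0=1.0, seed=0):
        """Search for the k-sparse support minimizing the residual ||y - A_S x_S||^2."""
        rng = np.random.default_rng(seed)
        n = A.shape[1]

        def energy(S):
            coef, *_ = np.linalg.lstsq(A[:, S], y, rcond=None)
            return np.sum((y - A[:, S] @ coef) ** 2)

        support = rng.choice(n, size=k, replace=False)
        E = energy(support)
        for t in range(n_steps):
            T = T0 * (1.0 - t / n_steps) + 1e-6                # linear cooling schedule
            cand = support.copy()
            swap_in = rng.integers(n)
            if swap_in in cand:                                # only swap in inactive columns
                continue
            cand[rng.integers(k)] = swap_in
            E_new = energy(cand)
            if E_new < E or rng.random() < np.exp((E - E_new) / T):
                support, E = cand, E_new
        return support, E
    ```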

  4. Modeling the impacts of dryland agricultural reclamation on groundwater resources in Northern Egypt using sparse data

    NASA Astrophysics Data System (ADS)

    Switzman, Harris; Coulibaly, Paulin; Adeel, Zafar

    2015-01-01

    Demand for freshwater in many dryland environments is exerting negative impacts on the quality and availability of groundwater resources, particularly in areas where demand is high due to irrigation or industrial water requirements to support dryland agricultural reclamation. Often, however, the information available to diagnose the drivers of groundwater degradation and assess management options through modeling is sparse, particularly in low- and middle-income countries. This study presents an approach for generating transient groundwater model inputs to assess the long-term impacts of dryland agricultural land reclamation on groundwater resources in a highly data-sparse context. The approach was applied to the area of Wadi El Natrun in Northern Egypt, where dryland reclamation and the associated water use have been aggressive since the 1960s. Statistical distributions of water use information were constructed from a variety of sparse field and literature estimates and then combined with remote sensing data in a spatio-temporal infilling model to produce the groundwater model inputs of well pumping and surface recharge. An ensemble of groundwater model inputs was generated and used in a 3D groundwater flow model (MODFLOW) of Wadi El Natrun's multi-layer aquifer system to analyze trends in water levels and water budgets over time. Validation of results against monitoring records and model performance statistics demonstrated that, despite the extremely sparse data, the approach used in this study was capable of simulating the cumulative impacts of agricultural land reclamation reasonably well. The uncertainty associated with the groundwater model itself was greater than that associated with the ensemble of well-pumping and surface recharge estimates. Water budget analysis of the groundwater model output revealed that groundwater recharge has not changed significantly over time, while pumping has. As a result of these trends, groundwater was estimated to be in a deficit of approximately 24 billion m³ (±15%) in 2011, compared to 1957. A significant trend of water level declines beginning in the 1990s that has been observed in monitoring records was evident in the model results and is directly attributed to abstraction.

  5. Cloud-In-Cell modeling of shocked particle-laden flows at a "SPARSE" cost

    NASA Astrophysics Data System (ADS)

    Taverniers, Soren; Jacobs, Gustaaf; Sen, Oishik; Udaykumar, H. S.

    2017-11-01

    A common tool for enabling process-scale simulations of shocked particle-laden flows is Eulerian-Lagrangian Particle-Source-In-Cell (PSIC) modeling, where each particle is traced in its Lagrangian frame and treated as a mathematical point whose dynamics are governed by Stokes drag corrected for high Reynolds and Mach numbers. The computational burden is often reduced further through a "Cloud-In-Cell" (CIC) approach which amalgamates groups of physical particles into computational "macro-particles". CIC does not account for subgrid particle fluctuations, leading to erroneous predictions of cloud dynamics. A Subgrid Particle-Averaged Reynolds-Stress Equivalent (SPARSE) model is proposed that incorporates subgrid interphase velocity and temperature perturbations. A bivariate Gaussian source distribution, whose covariance captures the cloud's deformation to first order, accounts for the particles' momentum and energy influence on the carrier gas. SPARSE is validated by conducting tests on the interaction of a particle cloud with the accelerated flow behind a shock. The cloud's average dynamics and its deformation over time predicted with SPARSE converge to their counterparts computed with reference PSIC models as the number of Gaussians is increased from 1 to 16. This work was supported by AFOSR Grant No. FA9550-16-1-0008.

  6. EHR-based phenotyping: Bulk learning and evaluation.

    PubMed

    Chiu, Po-Hsiang; Hripcsak, George

    2017-06-01

    In data-driven phenotyping, a core computational task is to identify medical concepts and their variations from sources of electronic health records (EHR) to stratify phenotypic cohorts. A conventional analytic framework for phenotyping largely uses a manual knowledge engineering approach or a supervised learning approach where clinical cases are represented by variables encompassing diagnoses, medicinal treatments and laboratory tests, among others. In such a framework, tasks associated with feature engineering and data annotation remain a tedious and expensive exercise, resulting in poor scalability. In addition, certain clinical conditions, such as those that are rare and acute in nature, may never accumulate sufficient data over time, which poses a challenge to establishing accurate and informative statistical models. In this paper, we use infectious diseases as the domain of study to demonstrate a hierarchical learning method based on ensemble learning that attempts to address these issues through feature abstraction. We use a sparse annotation set to train and evaluate many phenotypes at once, which we call bulk learning. In this batch-phenotyping framework, disease cohort definitions can be learned from within the abstract feature space established by using multiple diseases as a substrate and diagnostic codes as surrogates. In particular, using surrogate labels for model training renders possible its subsequent evaluation using only a sparse annotated sample. Moreover, statistical models can be trained and evaluated, using the same sparse annotation, from within the abstract feature space of low dimensionality that encapsulates the shared clinical traits of these target diseases, collectively referred to as the bulk learning set. Copyright © 2017 Elsevier Inc. All rights reserved.

  7. Measurement of Detonation Velocity for a Nonideal Heterogeneous Explosive in Axisymmetric and Two-Dimensional Geometries

    NASA Astrophysics Data System (ADS)

    Higgins, Andrew

    2009-12-01

    Detonation in a heterogeneous explosive with a relatively sparse concentration of reaction centers ("hot spots") is investigated experimentally. The explosive system considered is nitromethane gelled with PMMA and with glass microballoons (GMB's) in suspension. The detonation velocity is measured as a function of the characteristic charge dimension (diameter or thickness) in both axisymmetric and two-dimensional geometries. The use of a unique, annular charge geometry (with the diameter of the annulus much greater than the annular gap thickness) permits quasi-two-dimensional detonations to be observed without undesirable lateral rarefactions that result from a finite aspect ratio. The results confirm the prior findings of Gois et al. (1996) which show that, for a low concentration of GMB's, detonation propagation does not exhibit the expected 2:1 scaling from axisymmetric to planar geometries. This reinforces the idea that detonation in highly nonideal explosives is not governed exclusively by global front curvature.

  8. Characteristics of voxel prediction power in full-brain Granger causality analysis of fMRI data

    NASA Astrophysics Data System (ADS)

    Garg, Rahul; Cecchi, Guillermo A.; Rao, A. Ravishankar

    2011-03-01

    Functional neuroimaging research is moving from the study of "activations" to the study of "interactions" among brain regions. Granger causality analysis provides a powerful technique to model spatio-temporal interactions among brain regions. We apply this technique to full-brain fMRI data without aggregating any voxel data into regions of interest (ROIs). We circumvent the problem of dimensionality using sparse regression from machine learning. On a simple finger-tapping experiment we found that (1) a small number of voxels in the brain have very high prediction power, explaining the future time course of other voxels in the brain; (2) these voxels occur in small sized clusters (of size 1-4 voxels) distributed throughout the brain; (3) albeit small, these clusters overlap with most of the clusters identified with the non-temporal General Linear Model (GLM); and (4) the method identifies clusters which, while not determined by the task and not detectable by GLM, still influence brain activity.
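
    The sparse-regression idea can be sketched as follows, assuming scikit-learn: each voxel's next time point is regressed on the lagged signals of all voxels with an L1 (lasso) penalty, and a voxel's prediction power is read off from how often it enters other voxels' models. Function and parameter names are illustrative, not the authors' implementation.

    ```python
    import numpy as np
    from sklearn.linear_model import Lasso

    def sparse_granger_power(X, alpha=0.01):
        """X: (time, voxels). Returns, per voxel, how many other voxels' futures it predicts."""
        past, future = X[:-1], X[1:]
        n_vox = X.shape[1]
        coefs = np.zeros((n_vox, n_vox))                   # coefs[target, source]
        for v in range(n_vox):
            model = Lasso(alpha=alpha, max_iter=5000).fit(past, future[:, v])
            coefs[v] = model.coef_
        return (np.abs(coefs) > 0).sum(axis=0)             # prediction power per source voxel
    ```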

  9. Relating the ac complex resistivity of the pinned vortex lattice to its shear modulus

    NASA Astrophysics Data System (ADS)

    Ong, N. P.; Wu, Hui

    1997-07-01

    We propose a way to determine the shear rigidity of the pinned vortex lattice in high-purity crystals from the dependence of its complex resistivity ρ̂ on frequency (ω). The lattice is modeled as an elastic medium pinned by a sparse, random distribution of defects. We relate ρ̂ to the velocity of the small subset of pinned vortices via the lattice propagator G(R,ω). Measuring ρ̂ versus ω is equivalent to determining G(R,ω) versus R. The range of G(R,ω) depends sensitively on the shear and tilt moduli. We describe the evaluation of G(R,ω) in two-dimensional (2D) and 3D lattices. The 2D analysis provides a close fit to the frequency dependence of Re ρ̂ measured in an untwinned crystal of YBa2Cu3O7 at 89 K in a field of 0.5 and 1.0 T. We compare our results with earlier models.

  10. Hawking radiation of five-dimensional charged black holes with scalar fields

    NASA Astrophysics Data System (ADS)

    Miao, Yan-Gang; Xu, Zhen-Ming

    2017-09-01

    We investigate the Hawking radiation cascade from the five-dimensional charged black hole with a scalar field coupled to higher-order Euler densities in a conformally invariant manner. We give the semi-analytic calculation of greybody factors for the Hawking radiation. Our analysis shows that the Hawking radiation cascade from this five-dimensional black hole is extremely sparse. The charge enhances the sparsity of the Hawking radiation, while the conformally coupled scalar field reduces this sparsity.

  11. Geographic patterns and dynamics of Alaskan climate interpolated from a sparse station record

    USGS Publications Warehouse

    Fleming, Michael D.; Chapin, F. Stuart; Cramer, W.; Hufford, Gary L.; Serreze, Mark C.

    2000-01-01

    Data from a sparse network of climate stations in Alaska were interpolated to provide 1-km resolution maps of mean monthly temperature and precipitation, variables that are required at high spatial resolution for input into regional models of ecological processes and resource management. The interpolation model is based on thin-plate smoothing splines and uses the spatial data along with a digital elevation model to incorporate local topography. The model provides maps that are consistent with regional climatology and with patterns recognized by experienced weather forecasters. The broad patterns of Alaskan climate are well represented and include latitudinal and altitudinal trends in temperature and precipitation and gradients in continentality. Variations within these broad patterns reflect both the weakening and reduction in frequency of low-pressure centres in their eastward movement across southern Alaska during the summer, and the shift of the storm tracks into central and northern Alaska in late summer. Not surprisingly, apparent artifacts of the interpolated climate occur primarily in regions with few or no stations. The interpolation model did not accurately represent low-level winter temperature inversions that occur within large valleys and basins. Along with well-recognized climate patterns, the model captures local topographic effects that would not be depicted using standard interpolation techniques. This suggests that similar procedures could be used to generate high-resolution maps for other high-latitude regions with a sparse density of data.
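
    A rough sketch of thin-plate-spline interpolation with an elevation covariate, assuming SciPy's RBFInterpolator; the station coordinates, temperatures and the constant stand-in for the DEM are invented for illustration.

    ```python
    import numpy as np
    from scipy.interpolate import RBFInterpolator

    # hypothetical stations: (longitude, latitude, elevation in m) and mean monthly temperature (°C)
    stations = np.array([[-150.0, 61.2, 40.0], [-147.7, 64.8, 135.0],
                         [-156.8, 71.3, 10.0], [-149.9, 66.6, 300.0],
                         [-145.4, 62.1, 480.0], [-152.5, 68.5, 90.0]])
    temps = np.array([-2.1, -6.5, -12.3, -9.0, -7.4, -10.8])

    spline = RBFInterpolator(stations, temps, kernel="thin_plate_spline", smoothing=1.0)

    # interpolate onto a coarse grid; elevations would normally come from the 1-km DEM
    lon, lat = np.meshgrid(np.linspace(-158, -146, 50), np.linspace(60, 72, 50))
    elev = np.full_like(lon, 100.0)                     # constant stand-in for the DEM
    grid = np.column_stack([lon.ravel(), lat.ravel(), elev.ravel()])
    temp_map = spline(grid).reshape(lon.shape)
    ```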

  12. Robust and sparse correlation matrix estimation for the analysis of high-dimensional genomics data.

    PubMed

    Serra, Angela; Coretto, Pietro; Fratello, Michele; Tagliaferri, Roberto; Stegle, Oliver

    2018-02-15

    Microarray technology can be used to study the expression of thousands of genes across a number of different experimental conditions, usually hundreds. The underlying principle is that genes sharing similar expression patterns across different samples can be part of the same co-expression system, or they may share the same biological functions. Groups of genes are usually identified based on cluster analysis. Clustering methods rely on the similarity matrix between genes. A common choice to measure similarity is to compute the sample correlation matrix. Dimensionality reduction is another popular data analysis task which is also based on covariance/correlation matrix estimates. Unfortunately, covariance/correlation matrix estimation suffers from the intrinsic noise present in high-dimensional data. Sources of noise are: sampling variations, the presence of outlying sample units, and the fact that in most cases the number of units is much smaller than the number of genes. In this paper, we propose a robust correlation matrix estimator that is regularized based on adaptive thresholding. The resulting method jointly tames the effects of high-dimensionality and data contamination. Computations are easy to implement and do not require hand tuning. Both simulated and real data are analyzed. A Monte Carlo experiment shows that the proposed method is capable of remarkable performance. Our correlation metric is more robust to outliers compared with the existing alternatives in two gene expression datasets. It is also shown how the regularization allows spurious correlations to be automatically detected and filtered. The same regularization is also extended to other less robust correlation measures. Finally, we apply the ARACNE algorithm on the SyNTreN gene expression data. Sensitivity and specificity of the reconstructed network are compared with the gold standard. We show that ARACNE performs better when it takes the proposed correlation matrix estimator as input. The R software is available at https://github.com/angy89/RobustSparseCorrelation. aserra@unisa.it or robtag@unisa.it. Supplementary data are available at Bioinformatics online. © The Author (2017). Published by Oxford University Press. All rights reserved. For Permissions, please email: journals.permissions@oup.com
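
    A simplified sketch of the two ingredients, robustness and adaptive sparsification, assuming NumPy/SciPy: a rank-based (Spearman) correlation is thresholded at a rate proportional to sqrt(log p / n). The paper's estimator uses a different robust kernel and a data-driven choice of the threshold, neither of which is reproduced here.

    ```python
    import numpy as np
    from scipy.stats import spearmanr

    def robust_sparse_corr(X, c=2.0):
        """X: (samples, genes). Robust correlation matrix, sparsified by thresholding."""
        n, p = X.shape
        R, _ = spearmanr(X)                             # rank-based, outlier-resistant correlations
        thr = c * np.sqrt(np.log(p) / n)                # threshold shrinking with sample size
        R_sparse = np.where(np.abs(R) >= thr, R, 0.0)
        np.fill_diagonal(R_sparse, 1.0)
        return R_sparse

    rng = np.random.default_rng(0)
    X = rng.standard_normal((120, 500))                 # 120 samples, 500 genes
    R_hat = robust_sparse_corr(X)
    ```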

  13. Finite-time scaling at the Anderson transition for vibrations in solids

    NASA Astrophysics Data System (ADS)

    Beltukov, Y. M.; Skipetrov, S. E.

    2017-11-01

    A model in which a three-dimensional elastic medium is represented by a network of identical masses connected by springs of random strengths and allowed to vibrate only along a selected axis of the reference frame exhibits an Anderson localization transition. To study this transition, we assume that the dynamical matrix of the network is given by a product of a sparse random matrix with real, independent, Gaussian-distributed nonzero entries and its transpose. A finite-time scaling analysis of the system's response to an initial excitation allows us to estimate the critical parameters of the localization transition. The critical exponent is found to be ν = 1.57 ± 0.02, in agreement with previous studies of the Anderson transition belonging to the three-dimensional orthogonal universality class.
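
    A minimal sketch of the dynamical-matrix construction described above, assuming SciPy: M = A Aᵀ with A a sparse random matrix of Gaussian entries, which is positive semi-definite by construction, so the vibrational frequencies are real. Matrix size and density are illustrative.

    ```python
    import numpy as np
    import scipy.sparse as sp

    rng = np.random.default_rng(0)
    n = 1000
    A = sp.random(n, n, density=0.005, random_state=rng,
                  data_rvs=rng.standard_normal)        # sparse matrix with Gaussian entries
    M = (A @ A.T).toarray()                            # dynamical matrix, positive semi-definite
    eigvals = np.linalg.eigvalsh(M)
    omega = np.sqrt(np.clip(eigvals, 0.0, None))       # vibrational eigenfrequencies
    ```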

  14. Highly parallel sparse Cholesky factorization

    NASA Technical Reports Server (NTRS)

    Gilbert, John R.; Schreiber, Robert

    1990-01-01

    Several fine grained parallel algorithms were developed and compared to compute the Cholesky factorization of a sparse matrix. The experimental implementations are on the Connection Machine, a distributed memory SIMD machine whose programming model conceptually supplies one processor per data element. In contrast to special purpose algorithms in which the matrix structure conforms to the connection structure of the machine, the focus is on matrices with arbitrary sparsity structure. The most promising algorithm is one whose inner loop performs several dense factorizations simultaneously on a 2-D grid of processors. Virtually any massively parallel dense factorization algorithm can be used as the key subroutine. The sparse code attains execution rates comparable to those of the dense subroutine. Although at present architectural limitations prevent the dense factorization from realizing its potential efficiency, it is concluded that a regular data parallel architecture can be used efficiently to solve arbitrarily structured sparse problems. A performance model is also presented and it is used to analyze the algorithms.

  15. Rational Variety Mapping for Contrast-Enhanced Nonlinear Unsupervised Segmentation of Multispectral Images of Unstained Specimen

    PubMed Central

    Kopriva, Ivica; Hadžija, Mirko; Popović Hadžija, Marijana; Korolija, Marina; Cichocki, Andrzej

    2011-01-01

    A methodology is proposed for nonlinear contrast-enhanced unsupervised segmentation of multispectral (color) microscopy images of principally unstained specimens. The methodology exploits spectral diversity and spatial sparseness to find anatomical differences between materials (cells, nuclei, and background) present in the image. It consists of rth-order rational variety mapping (RVM) followed by matrix/tensor factorization. Sparseness constraint implies duality between nonlinear unsupervised segmentation and multiclass pattern assignment problems. Classes not linearly separable in the original input space become separable with high probability in the higher-dimensional mapped space. Hence, RVM mapping has two advantages: it takes implicitly into account nonlinearities present in the image (ie, they are not required to be known) and it increases spectral diversity (ie, contrast) between materials, due to increased dimensionality of the mapped space. This is expected to improve performance of systems for automated classification and analysis of microscopic histopathological images. The methodology was validated using RVM of the second and third orders of the experimental multispectral microscopy images of unstained sciatic nerve fibers (nervus ischiadicus) and of unstained white pulp in the spleen tissue, compared with a manually defined ground truth labeled by two trained pathophysiologists. The methodology can also be useful for additional contrast enhancement of images of stained specimens. PMID:21708116

  16. Concurrent Tumor Segmentation and Registration with Uncertainty-based Sparse non-Uniform Graphs

    PubMed Central

    Parisot, Sarah; Wells, William; Chemouny, Stéphane; Duffau, Hugues; Paragios, Nikos

    2014-01-01

    In this paper, we present a graph-based concurrent brain tumor segmentation and atlas to diseased patient registration framework. Both segmentation and registration problems are modeled using a unified pairwise discrete Markov Random Field model on a sparse grid superimposed to the image domain. Segmentation is addressed based on pattern classification techniques, while registration is performed by maximizing the similarity between volumes and is modular with respect to the matching criterion. The two problems are coupled by relaxing the registration term in the tumor area, corresponding to areas of high classification score and high dissimilarity between volumes. In order to overcome the main shortcomings of discrete approaches regarding appropriate sampling of the solution space as well as important memory requirements, content driven samplings of the discrete displacement set and the sparse grid are considered, based on the local segmentation and registration uncertainties recovered by the min marginal energies. State of the art results on a substantial low-grade glioma database demonstrate the potential of our method, while our proposed approach shows maintained performance and strongly reduced complexity of the model. PMID:24717540

  17. A modified dual-level algorithm for large-scale three-dimensional Laplace and Helmholtz equation

    NASA Astrophysics Data System (ADS)

    Li, Junpu; Chen, Wen; Fu, Zhuojia

    2018-01-01

    A modified dual-level algorithm is proposed in this article. With the help of the dual-level structure, the fully populated interpolation matrix on the fine level is transformed into a locally supported sparse matrix, which overcomes the severe ill-conditioning and excessive storage requirements resulting from a fully populated interpolation matrix. The kernel-independent fast multipole method is adopted to expedite the solution of the linear equations on the coarse level. Numerical experiments with up to 2 million fine-level nodes have been carried out successfully. It is noted that the proposed algorithm merely needs to place 2-3 coarse-level nodes per wavelength in each direction to obtain a reasonable solution, which is close to the minimum requirement allowed by Shannon's sampling theorem. In the real human head model example, it is observed that the proposed algorithm can simulate the computationally very challenging exterior high-frequency harmonic acoustic wave propagation up to 20,000 Hz.

  18. Self-organized Evaluation of Dynamic Hand Gestures for Sign Language Recognition

    NASA Astrophysics Data System (ADS)

    Buciu, Ioan; Pitas, Ioannis

    Two main theories exist with respect to face encoding and representation in the human visual system (HVS). The first one refers to the dense (holistic) representation of the face, where faces have a "holon"-like appearance. The second one claims that a more appropriate face representation is given by a sparse code, where only a small fraction of the neural cells corresponding to face encoding is activated. Theoretical and experimental evidence suggests that the HVS performs face analysis (encoding, storing, face recognition, facial expression recognition) in a structured and hierarchical way, where both representations have their own contribution and goal. According to neuropsychological experiments, it seems that encoding for face recognition relies on a holistic image representation, while a sparse image representation is used for facial expression analysis and classification. From the computer vision perspective, the techniques developed for automatic face and facial expression recognition fall into the same two representation types. As in Neuroscience, the techniques which perform better for face recognition yield a holistic image representation, while those techniques suitable for facial expression recognition use a sparse or local image representation. The proposed mathematical models of image formation and encoding try to simulate the efficient storing, organization and coding of data in the human cortex. This is equivalent to embedding constraints in the model design regarding dimensionality reduction, redundant information minimization, mutual information minimization, non-negativity constraints, class information, etc. The presented techniques are applied as a feature extraction step followed by a classification method, which also heavily influences the recognition results.

  19. LRSSLMDA: Laplacian Regularized Sparse Subspace Learning for MiRNA-Disease Association prediction

    PubMed Central

    Huang, Li

    2017-01-01

    Predicting novel microRNA (miRNA)-disease associations is clinically significant due to miRNAs' potential roles as diagnostic biomarkers and therapeutic targets for various human diseases. Previous studies have demonstrated the viability of utilizing different types of biological data to computationally infer new disease-related miRNAs. Yet researchers face the challenge of how to effectively integrate diverse datasets and make reliable predictions. In this study, we presented a computational model named Laplacian Regularized Sparse Subspace Learning for MiRNA-Disease Association prediction (LRSSLMDA), which projected miRNAs'/diseases' statistical feature profile and graph theoretical feature profile to a common subspace. It used Laplacian regularization to preserve the local structures of the training data and an L1-norm constraint to select important miRNA/disease features for prediction. The strength of dimensionality reduction enabled the model to be easily extended to much higher dimensional datasets than those exploited in this study. Experimental results showed that LRSSLMDA outperformed ten previous models: the AUC of 0.9178 in global leave-one-out cross validation (LOOCV) and the AUC of 0.8418 in local LOOCV indicated the model's superior prediction accuracy; and the average AUC of 0.9181 ± 0.0004 in 5-fold cross validation justified its accuracy and stability. In addition, three types of case studies further demonstrated its predictive power. Potential miRNAs related to Colon Neoplasms, Lymphoma, Kidney Neoplasms, Esophageal Neoplasms and Breast Neoplasms were predicted by LRSSLMDA. Respectively, 98%, 88%, 96%, 98% and 98% of the top 50 predictions were validated by experimental evidence. Therefore, we conclude that LRSSLMDA would be a valuable computational tool for miRNA-disease association prediction. PMID:29253885

  20. Fast integration-based prediction bands for ordinary differential equation models.

    PubMed

    Hass, Helge; Kreutz, Clemens; Timmer, Jens; Kaschek, Daniel

    2016-04-15

    To gain a deeper understanding of biological processes and their relevance in disease, mathematical models are built upon experimental data. Uncertainty in the data leads to uncertainties of the model's parameters and in turn to uncertainties of predictions. Mechanistic dynamic models of biochemical networks are frequently based on nonlinear differential equation systems and feature a large number of parameters, sparse observations of the model components and lack of information in the available data. Due to the curse of dimensionality, classical and sampling approaches propagating parameter uncertainties to predictions are hardly feasible and insufficient. However, for experimental design and to discriminate between competing models, prediction and confidence bands are essential. To circumvent the hurdles of the former methods, an approach to calculate a profile likelihood on arbitrary observations for a specific time point has been introduced, which provides accurate confidence and prediction intervals for nonlinear models and is computationally feasible for high-dimensional models. In this article, reliable and smooth point-wise prediction and confidence bands to assess the model's uncertainty on the whole time-course are achieved via explicit integration with elaborate correction mechanisms. The corresponding system of ordinary differential equations is derived and tested on three established models for cellular signalling. An efficiency analysis is performed to illustrate the computational benefit compared with repeated profile likelihood calculations at multiple time points. The integration framework and the examples used in this article are provided with the software package Data2Dynamics, which is based on MATLAB and freely available at http://www.data2dynamics.org. Contact: helge.hass@fdm.uni-freiburg.de. Supplementary data are available at Bioinformatics online. © The Author 2015. Published by Oxford University Press. All rights reserved. For Permissions, please e-mail: journals.permissions@oup.com.

  1. Mapping eQTL Networks with Mixed Graphical Markov Models

    PubMed Central

    Tur, Inma; Roverato, Alberto; Castelo, Robert

    2014-01-01

    Expression quantitative trait loci (eQTL) mapping constitutes a challenging problem due to, among other reasons, the high-dimensional multivariate nature of gene-expression traits. Next to the expression heterogeneity produced by confounding factors and other sources of unwanted variation, indirect effects spread throughout genes as a result of genetic, molecular, and environmental perturbations. From a multivariate perspective one would like to adjust for the effect of all of these factors to end up with a network of direct associations connecting the path from genotype to phenotype. In this article we approach this challenge with mixed graphical Markov models, higher-order conditional independences, and q-order correlation graphs. These models show that additive genetic effects propagate through the network as a function of gene–gene correlations. Our estimation of the eQTL network underlying a well-studied yeast data set leads to a sparse structure with more direct genetic and regulatory associations that enable a straightforward comparison of the genetic control of gene expression across chromosomes. Interestingly, it also reveals that eQTLs explain most of the expression variability of network hub genes. PMID:25271303

  2. Improving Low-Dose Blood-Brain Barrier Permeability Quantification Using Sparse High-Dose Induced Prior for Patlak Model

    PubMed Central

    Fang, Ruogu; Karlsson, Kolbeinn; Chen, Tsuhan; Sanelli, Pina C.

    2014-01-01

    Blood-brain-barrier permeability (BBBP) measurements extracted from perfusion computed tomography (PCT) using the Patlak model can be a valuable indicator to predict hemorrhagic transformation in patients with acute stroke. Unfortunately, the standard Patlak-model-based PCT requires excessive radiation exposure, which has raised concerns about radiation safety. Minimizing the radiation dose is of high value in clinical practice but can degrade image quality due to the severe noise introduced. The purpose of this work is to construct high-quality BBBP maps from low-dose PCT data by using the brain structural similarity between different individuals and the relations between the high- and low-dose maps. The proposed sparse high-dose induced (shd-Patlak) model works by building a high-dose induced prior for the Patlak model with a set of location-adaptive dictionaries, followed by an optimized estimation of the BBBP map with the prior-regularized Patlak model. Evaluation with simulated low-dose clinical brain PCT datasets clearly demonstrates that the shd-Patlak model can achieve more significant gains than the standard Patlak model, with improved visual quality, higher fidelity to the gold standard and more accurate details for clinical analysis. PMID:24200529

  3. Vision based obstacle detection and grouping for helicopter guidance

    NASA Technical Reports Server (NTRS)

    Sridhar, Banavar; Chatterji, Gano

    1993-01-01

    Electro-optical sensors can be used to compute range to objects in the flight path of a helicopter. The computation is based on the optical flow/motion at different points in the image. The motion algorithms provide a sparse set of ranges to discrete features in the image sequence as a function of azimuth and elevation. For obstacle avoidance guidance and display purposes, this discrete set of ranges, varying from a few hundred to several thousand points, needs to be grouped into sets which correspond to objects in the real world. This paper presents a new method for object segmentation based on clustering the sparse range information provided by motion algorithms together with the spatial relation provided by the static image. The range values are initially grouped into clusters based on depth. Subsequently, the clusters are modified by using the K-means algorithm in the inertial horizontal plane and the minimum spanning tree algorithm in the image plane. The object grouping allows interpolation within a group and enables the creation of dense range maps. Researchers in robotics have used densely scanned sequences of laser range images to build three-dimensional representations of the outside world. Thus, modeling techniques developed for dense range images can be extended to sparse range images. The paper presents object segmentation results for a sequence of flight images.
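
    The grouping step can be sketched as follows, assuming scikit-learn: an initial binning by depth followed by K-means refinement in the horizontal plane. The minimum-spanning-tree refinement in the image plane is omitted, and the bin width and cluster counts are illustrative.

    ```python
    import numpy as np
    from sklearn.cluster import KMeans

    def group_ranges(points, depth_bin=15.0, k_per_bin=2):
        """points: (N, 3) array of (x, y, depth) features returned by the motion algorithm."""
        groups = []
        bins = np.floor(points[:, 2] / depth_bin)            # initial grouping by depth
        for b in np.unique(bins):
            cluster = points[bins == b]
            k = min(k_per_bin, len(cluster))
            labels = KMeans(n_clusters=k, n_init=10).fit_predict(cluster[:, :2])
            for lbl in range(k):
                groups.append(cluster[labels == lbl])        # one group per object candidate
        return groups
    ```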

  4. A comparison between modeled and measured permafrost temperatures at Ritigraben borehole, Switzerland

    NASA Astrophysics Data System (ADS)

    Mitterer-Hoinkes, Susanna; Lehning, Michael; Phillips, Marcia; Sailer, Rudolf

    2013-04-01

    The area-wide distribution of permafrost is only sparsely known in mountainous terrain (e.g. the Alps). Permafrost monitoring can only be based on point or small-scale measurements such as boreholes, active rock glaciers, BTS measurements or geophysical measurements. To get a better understanding of permafrost distribution, it is necessary to focus on modeling permafrost temperatures and permafrost distribution patterns. Considerable effort has already been devoted to these topics using different kinds of models. In this study, the evolution of subsurface temperatures over successive years has been modeled at the Ritigraben borehole location (Mattertal, Switzerland) using the one-dimensional snow cover model SNOWPACK. The model requires meteorological input and, in our case, information on subsurface properties. We used meteorological input variables of the automatic weather station Ritigraben (2630 m) in combination with the automatic weather station Saas Seetal (2480 m). Meteorological data between 2006 and 2011 on an hourly basis were used to drive the model. As previous studies have shown, the snow amount and the snow cover duration have a great influence on the thermal regime. Low snow heights allow for deeper penetration of low winter temperatures into the ground; winters with a large amount of snow attenuate this effect. In addition, variations in subsurface conditions highly influence the temperature regime. Therefore, we conducted sensitivity runs by defining a series of different subsurface properties. The modeled subsurface temperature profiles of Ritigraben were then compared to the measured temperatures in the Ritigraben borehole. This allows a validation of the influence of subsurface properties on the temperature regime. As expected, the influence of the snow cover is stronger than the influence of subsurface material properties, which are nonetheless significant. The validation presented here serves to prepare a larger spatial simulation with the complex hydro-meteorological 3-dimensional model Alpine 3D, which is based on a distributed application of SNOWPACK.

  5. A 3D finite-difference BiCG iterative solver with the Fourier-Jacobi preconditioner for the anisotropic EIT/EEG forward problem.

    PubMed

    Turovets, Sergei; Volkov, Vasily; Zherdetsky, Aleksej; Prakonina, Alena; Malony, Allen D

    2014-01-01

    The Electrical Impedance Tomography (EIT) and electroencephalography (EEG) forward problems in anisotropic inhomogeneous media like the human head belong to the class of three-dimensional boundary value problems for elliptic equations with mixed derivatives. We introduce and explore the performance of several new promising numerical techniques, which seem to be well suited for solving these problems. The proposed numerical schemes combine the fictitious domain approach with the finite-difference method and an optimally preconditioned Conjugate Gradient- (CG-) type iterative method for treatment of the discrete model. The numerical scheme requires only standard operations, summation and multiplication of sparse matrices and vectors as well as FFTs, making it easy to implement and amenable to effective parallel implementation. Some typical use cases for the EIT/EEG problems are considered, demonstrating the high efficiency of the proposed numerical technique.
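
    The solver scaffolding can be illustrated with SciPy: a sparse finite-difference operator solved by a preconditioned BiCGSTAB iteration. A simple Jacobi preconditioner stands in for the paper's FFT-based (Fourier-Jacobi) preconditioner, and the 5-point Laplacian stands in for the anisotropic EIT/EEG operator.

    ```python
    import numpy as np
    import scipy.sparse as sp
    from scipy.sparse.linalg import LinearOperator, bicgstab

    n = 50
    lap1d = sp.diags([-1.0, 2.0, -1.0], [-1, 0, 1], shape=(n, n))
    A = (sp.kron(sp.eye(n), lap1d) + sp.kron(lap1d, sp.eye(n))).tocsr()   # 2-D 5-point Laplacian
    b = np.ones(A.shape[0])

    d = A.diagonal()
    M = LinearOperator(A.shape, matvec=lambda v: v / d)   # Jacobi (diagonal) preconditioner

    x, info = bicgstab(A, b, M=M)                         # info == 0 signals convergence
    ```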

  6. Parameter Estimation and Model Selection for Indoor Environments Based on Sparse Observations

    NASA Astrophysics Data System (ADS)

    Dehbi, Y.; Loch-Dehbi, S.; Plümer, L.

    2017-09-01

    This paper presents a novel method for the parameter estimation and model selection for the reconstruction of indoor environments based on sparse observations. While most approaches for the reconstruction of indoor models rely on dense observations, we predict scenes of the interior with high accuracy in the absence of indoor measurements. We use a model-based top-down approach and incorporate strong but profound prior knowledge. The latter includes probability density functions for model parameters and sparse observations such as room areas and the building footprint. The floorplan model is characterized by linear and bi-linear relations with discrete and continuous parameters. We focus on the stochastic estimation of model parameters based on a topological model derived by combinatorial reasoning in a first step. A Gauss-Markov model is applied for estimation and simulation of the model parameters. Symmetries are represented and exploited during the estimation process. Background knowledge as well as observations are incorporated in a maximum likelihood estimation and model selection is performed with AIC/BIC. The likelihood is also used for the detection and correction of potential errors in the topological model. Estimation results are presented and discussed.

  7. German National Proficiency Scales in Biology: Internal Structure, Relations to General Cognitive Abilities and Verbal Skills

    PubMed Central

    Köller, Olaf

    2016-01-01

    National and international large-scale assessments (LSA) have a major impact on educational systems, which raises fundamental questions about the validity of the measures regarding their internal structure and their relations to relevant covariates. Given its importance, research on the validity of instruments specifically developed for LSA is still sparse, especially in science and its subdomains biology, chemistry, and physics. However, policy decisions for the improvement of educational quality based on LSA can only be helpful if valid information on students' achievement levels is provided. In the present study, the nature of the measurement instruments based on the German Educational Standards in Biology is examined. On the basis of data from 3,165 students in Grade 10, we present dimensional analyses and report the relationship between different subdimensions of biology literacy and cognitive covariates such as general cognitive abilities and verbal skills. A theory-driven two-dimensional model fitted the data best. Content knowledge and scientific inquiry, two subdimensions of biology literacy, are highly correlated and show differential correlational patterns to the covariates. We argue that the underlying structure of biology should be incorporated into curricula, teacher training and future assessments. PMID:27818532

  8. German National Proficiency Scales in Biology: Internal Structure, Relations to General Cognitive Abilities and Verbal Skills.

    PubMed

    Kampa, Nele; Köller, Olaf

    2016-09-01

    National and international large-scale assessments (LSA) have a major impact on educational systems, which raises fundamental questions about the validity of the measures regarding their internal structure and their relations to relevant covariates. Given its importance, research on the validity of instruments specifically developed for LSA is still sparse, especially in science and its subdomains biology, chemistry, and physics. However, policy decisions for the improvement of educational quality based on LSA can only be helpful if valid information on students' achievement levels is provided. In the present study, the nature of the measurement instruments based on the German Educational Standards in Biology is examined. On the basis of data from 3,165 students in Grade 10, we present dimensional analyses and report the relationship between different subdimensions of biology literacy and cognitive covariates such as general cognitive abilities and verbal skills. A theory-driven two-dimensional model fitted the data best. Content knowledge and scientific inquiry, two subdimensions of biology literacy, are highly correlated and show differential correlational patterns to the covariates. We argue that the underlying structure of biology should be incorporated into curricula, teacher training and future assessments.

  9. SPARSE: quadratic time simultaneous alignment and folding of RNAs without sequence-based heuristics

    PubMed Central

    Will, Sebastian; Otto, Christina; Miladi, Milad; Möhl, Mathias; Backofen, Rolf

    2015-01-01

    Motivation: RNA-Seq experiments have revealed a multitude of novel ncRNAs. The gold standard for their analysis based on simultaneous alignment and folding suffers from extreme time complexity of O(n⁶). Subsequently, numerous faster ‘Sankoff-style’ approaches have been suggested. Commonly, the performance of such methods relies on sequence-based heuristics that restrict the search space to optimal or near-optimal sequence alignments; however, the accuracy of sequence-based methods breaks down for RNAs with sequence identities below 60%. Alignment approaches like LocARNA that do not require sequence-based heuristics have been limited to high complexity (≥ quartic time). Results: Breaking this barrier, we introduce the novel Sankoff-style algorithm ‘sparsified prediction and alignment of RNAs based on their structure ensembles (SPARSE)’, which runs in quadratic time without sequence-based heuristics. To achieve this low complexity, on par with sequence alignment algorithms, SPARSE features strong sparsification based on structural properties of the RNA ensembles. Following PMcomp, SPARSE gains further speed-up from lightweight energy computation. Although all existing lightweight Sankoff-style methods restrict Sankoff's original model by disallowing loop deletions and insertions, SPARSE transfers the Sankoff algorithm to the lightweight energy model completely for the first time. Compared with LocARNA, SPARSE achieves similar alignment and better folding quality in significantly less time (speedup: 3.7). At similar run-time, it aligns low sequence identity instances substantially more accurately than RAF, which uses sequence-based heuristics. Availability and implementation: SPARSE is freely available at http://www.bioinf.uni-freiburg.de/Software/SPARSE. Contact: backofen@informatik.uni-freiburg.de Supplementary information: Supplementary data are available at Bioinformatics online. PMID:25838465

  10. A Subspace Pursuit–based Iterative Greedy Hierarchical Solution to the Neuromagnetic Inverse Problem

    PubMed Central

    Babadi, Behtash; Obregon-Henao, Gabriel; Lamus, Camilo; Hämäläinen, Matti S.; Brown, Emery N.; Purdon, Patrick L.

    2013-01-01

    Magnetoencephalography (MEG) is an important non-invasive method for studying activity within the human brain. Source localization methods can be used to estimate spatiotemporal activity from MEG measurements with high temporal resolution, but the spatial resolution of these estimates is poor due to the ill-posed nature of the MEG inverse problem. Recent developments in source localization methodology have emphasized temporal as well as spatial constraints to improve source localization accuracy, but these methods can be computationally intense. Solutions emphasizing spatial sparsity hold tremendous promise, since the underlying neurophysiological processes generating MEG signals are often sparse in nature, whether in the form of focal sources, or distributed sources representing large-scale functional networks. Recent developments in the theory of compressed sensing (CS) provide a rigorous framework to estimate signals with sparse structure. In particular, a class of CS algorithms referred to as greedy pursuit algorithms can provide both high recovery accuracy and low computational complexity. Greedy pursuit algorithms are difficult to apply directly to the MEG inverse problem because of the high-dimensional structure of the MEG source space and the high spatial correlation in MEG measurements. In this paper, we develop a novel greedy pursuit algorithm for sparse MEG source localization that overcomes these fundamental problems. This algorithm, which we refer to as the Subspace Pursuit-based Iterative Greedy Hierarchical (SPIGH) inverse solution, exhibits very low computational complexity while achieving very high localization accuracy. We evaluate the performance of the proposed algorithm using comprehensive simulations, as well as the analysis of human MEG data during spontaneous brain activity and somatosensory stimuli. These studies reveal substantial performance gains provided by the SPIGH algorithm in terms of computational complexity, localization accuracy, and robustness. PMID:24055554

  11. Feature extraction based on extended multi-attribute profiles and sparse autoencoder for remote sensing image classification

    NASA Astrophysics Data System (ADS)

    Teffahi, Hanane; Yao, Hongxun; Belabid, Nasreddine; Chaib, Souleyman

    2018-02-01

    Satellite images with very high spatial resolution have recently been widely used for image classification, which has become a challenging task in the remote sensing field. Due to a number of limitations, such as the redundancy of features and the high dimensionality of the data, different classification methods have been proposed for remote sensing image classification, particularly methods using feature extraction techniques. This paper proposes a simple and efficient method exploiting the capability of extended multi-attribute profiles (EMAP) with a sparse autoencoder (SAE) for remote sensing image classification. The proposed method is used to classify various remote sensing datasets, including hyperspectral and multispectral images, by extracting spatial and spectral features based on the combination of EMAP and SAE and linking them to a kernel support vector machine (SVM) for classification. Experiments on the new hyperspectral image "Houston data" and the multispectral image "Washington DC data" show that this new scheme achieves better feature-learning performance than the primitive features, traditional classifiers, and an ordinary autoencoder, and has great potential to achieve higher classification accuracy in short running time.

  12. The extraction of spot signal in Shack-Hartmann wavefront sensor based on sparse representation

    NASA Astrophysics Data System (ADS)

    Zhang, Yanyan; Xu, Wentao; Chen, Suting; Ge, Junxiang; Wan, Fayu

    2016-07-01

    Several techniques have been used with Shack-Hartmann wavefront sensors to determine the local wavefront gradient across each lenslet. However, the centroid error of the Shack-Hartmann wavefront sensor is relatively large because of the skylight background and detector noise. In this paper, we introduce a new method based on sparse representation to extract the target signal from the background and the noise. First, an overcomplete dictionary of the spot signal is constructed based on a two-dimensional Gaussian model. Then the Shack-Hartmann image is divided into sub-blocks, and the corresponding coefficients of each block are computed in the overcomplete dictionary. Since the coefficients of the noise and the target differ greatly, the target is extracted by applying a threshold to the coefficients. Experimental results show that the target can be well extracted and that the deviation, RMS, and PV of the centroid are all smaller than those obtained with the threshold-subtraction method.
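
    The following is a minimal sketch of the idea described above: sparse coding of one sub-block over an overcomplete dictionary of shifted 2D Gaussian spot templates, with a threshold on the coefficients separating the spot from background and noise. Block size, spot width, the use of sklearn's orthogonal matching pursuit, and the threshold value are illustrative assumptions, not the paper's implementation.

    import numpy as np
    from sklearn.linear_model import OrthogonalMatchingPursuit

    def gaussian_spot(size, cx, cy, sigma=1.5):
        # Normalized 2D Gaussian atom centred at (cx, cy), flattened to a column.
        y, x = np.mgrid[0:size, 0:size]
        g = np.exp(-((x - cx) ** 2 + (y - cy) ** 2) / (2 * sigma ** 2))
        return g.ravel() / np.linalg.norm(g)

    size = 16
    # Overcomplete dictionary: one Gaussian atom per candidate spot centre.
    D = np.column_stack([gaussian_spot(size, cx, cy)
                         for cx in range(size) for cy in range(size)])

    # Synthetic sub-block: one spot plus a sky background level and detector noise.
    rng = np.random.default_rng(0)
    block = 5.0 * gaussian_spot(size, 6, 9) + 0.2 + 0.05 * rng.standard_normal(size * size)

    # Sparse coefficients: the spot concentrates in a few large coefficients,
    # while background and noise spread over many weak ones.
    omp = OrthogonalMatchingPursuit(n_nonzero_coefs=5)
    omp.fit(D, block)
    coef = omp.coef_

    # Keep only the large coefficients (the target) and reconstruct the clean spot.
    coef_target = np.where(np.abs(coef) > 0.5 * np.abs(coef).max(), coef, 0.0)
    spot = (D @ coef_target).reshape(size, size)
    iy, ix = np.unravel_index(np.argmax(spot), spot.shape)
    print("estimated spot centre (x, y):", ix, iy)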

  13. Group sparse multiview patch alignment framework with view consistency for image classification.

    PubMed

    Gui, Jie; Tao, Dacheng; Sun, Zhenan; Luo, Yong; You, Xinge; Tang, Yuan Yan

    2014-07-01

    No single feature can satisfactorily characterize the semantic concepts of an image. Multiview learning aims to unify different kinds of features to produce a consensual and efficient representation. This paper redefines part optimization in the patch alignment framework (PAF) and develops a group sparse multiview patch alignment framework (GSM-PAF). The new part optimization considers not only the complementary properties of different views, but also view consistency. In particular, view consistency models the correlations between all possible combinations of any two kinds of view. In contrast to conventional dimensionality reduction algorithms that perform feature extraction and feature selection independently, GSM-PAF enjoys joint feature extraction and feature selection by exploiting the l2,1-norm on the projection matrix to achieve row sparsity, which leads to the simultaneous selection of relevant features and learning of the transformation, and thus makes the algorithm more discriminative. Experiments on two real-world image data sets demonstrate the effectiveness of GSM-PAF for image classification.
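
    As a generic illustration of the row-sparsity mechanism mentioned above (not GSM-PAF itself), the sketch below computes the l2,1-norm of a projection matrix and its proximal operator, which is row-wise soft-thresholding: rows whose norm falls below the threshold vanish, so the corresponding features are dropped jointly across all projection directions.

    import numpy as np

    def l21_norm(W):
        # Sum of the Euclidean norms of the rows; encourages entire rows to vanish.
        return np.sum(np.linalg.norm(W, axis=1))

    def prox_l21(W, t):
        # Row-wise soft-thresholding: rows with small norm are zeroed out,
        # which removes the corresponding features from every view at once.
        norms = np.linalg.norm(W, axis=1, keepdims=True)
        scale = np.maximum(1.0 - t / np.maximum(norms, 1e-12), 0.0)
        return W * scale

    rng = np.random.default_rng(0)
    W = rng.standard_normal((10, 3))          # 10 features, 3 projection directions
    W_sparse = prox_l21(W, t=1.5)
    print("l2,1 norm before/after:", l21_norm(W), l21_norm(W_sparse))
    print("selected feature rows:", np.flatnonzero(np.linalg.norm(W_sparse, axis=1)))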

  14. Three dimensional iterative beam propagation method for optical waveguide devices

    NASA Astrophysics Data System (ADS)

    Ma, Changbao; Van Keuren, Edward

    2006-10-01

    The finite difference beam propagation method (FD-BPM) is an effective model for simulating a wide range of optical waveguide structures. Classical FD-BPMs are based on the Crank-Nicolson scheme and, in tridiagonal form, can be solved using the Thomas method. We present a different type of algorithm for 3-D structures. In this algorithm, the wave equation is formulated as a large sparse matrix equation which can be solved using iterative methods. The simulation-window shifting scheme and threshold technique introduced in our earlier work are utilized to overcome the convergence problems of iterative methods for large sparse matrix equations and wide-angle simulations. This method enables us to develop higher-order 3-D wide-angle (WA-) BPMs based on Padé approximant operators and the multistep method, which are commonly used in WA-BPMs for 2-D structures. Simulations using the new methods are compared with analytical results to confirm their effectiveness and applicability.
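
    Below is a minimal sketch of the numerical step described above: one propagation step is posed as a large sparse matrix equation and solved with an iterative Krylov method instead of the tridiagonal Thomas algorithm. The 2D transverse Laplacian, the Crank-Nicolson-style step, and the BiCGSTAB solver are illustrative stand-ins for the paper's 3-D wide-angle BPM operators.

    import numpy as np
    import scipy.sparse as sp
    import scipy.sparse.linalg as spla

    nx = ny = 64
    h = 1.0
    I = sp.identity(nx)
    # 1D second-difference operator with Dirichlet boundaries.
    d2 = sp.diags([1.0, -2.0, 1.0], [-1, 0, 1], shape=(nx, nx)) / h**2
    # 2D transverse Laplacian via Kronecker sums (the large sparse matrix of the text).
    lap = sp.kron(I, d2) + sp.kron(d2, I)

    # Crank-Nicolson-style step: (I - a*L) u_new = (I + a*L) u_old, with a complex.
    a = 0.5j * 0.1
    A = (sp.identity(nx * ny) - a * lap).tocsr()
    rng = np.random.default_rng(0)
    u_old = rng.standard_normal(nx * ny) + 0j
    b = (sp.identity(nx * ny) + a * lap) @ u_old

    # Iterative solve of the sparse system.
    u_new, info = spla.bicgstab(A, b)
    print("converged:", info == 0)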

  15. Model methodology for estimating pesticide concentration extremes based on sparse monitoring data

    USGS Publications Warehouse

    Vecchia, Aldo V.

    2018-03-22

    This report describes a new methodology for using sparse (weekly or less frequent observations) and potentially highly censored pesticide monitoring data to simulate daily pesticide concentrations and associated quantities used for acute and chronic exposure assessments, such as the annual maximum daily concentration. The new methodology is based on a statistical model that expresses log-transformed daily pesticide concentration in terms of a seasonal wave, flow-related variability, long-term trend, and serially correlated errors. Methods are described for estimating the model parameters, generating conditional simulations of daily pesticide concentration given sparse (weekly or less frequent) and potentially highly censored observations, and estimating concentration extremes based on the conditional simulations. The model can be applied to datasets with as few as 3 years of record, as few as 30 total observations, and as few as 10 uncensored observations. The model was applied to atrazine, carbaryl, chlorpyrifos, and fipronil data for U.S. Geological Survey pesticide sampling sites with sufficient data for applying the model. A total of 112 sites were analyzed for atrazine, 38 for carbaryl, 34 for chlorpyrifos, and 33 for fipronil. The results are summarized in this report, and R functions, described in the report and provided in an accompanying model archive, can be used to fit the model parameters and generate conditional simulations of daily concentrations for use in investigations involving pesticide exposure risk and uncertainty.
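
    The sketch below only illustrates the structure of the model described above: log-concentration as a seasonal wave plus flow-related variability, a long-term trend, and serially correlated (AR(1)) errors, from which an annual maximum daily concentration can be simulated. The coefficients and the single-harmonic seasonal wave are invented for illustration; this is not the fitted USGS model or its accompanying R functions.

    import numpy as np

    rng = np.random.default_rng(0)
    ndays = 3 * 365
    t = np.arange(ndays)

    seasonal = 1.2 * np.sin(2 * np.pi * t / 365.25 - 1.0)      # seasonal wave
    flow_anomaly = rng.standard_normal(ndays) * 0.3            # stand-in flow-related variability
    trend = -0.0005 * t                                        # slow long-term trend

    # AR(1) serially correlated errors.
    phi, sigma = 0.8, 0.4
    err = np.zeros(ndays)
    for i in range(1, ndays):
        err[i] = phi * err[i - 1] + sigma * rng.standard_normal()

    log_conc = -2.0 + seasonal + 0.5 * flow_anomaly + trend + err
    conc = np.exp(log_conc)                                    # daily concentration

    annual_max = [conc[y * 365:(y + 1) * 365].max() for y in range(3)]
    print("simulated annual maximum daily concentrations:", np.round(annual_max, 3))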

  16. A highly efficient approach to protein interactome mapping based on collaborative filtering framework.

    PubMed

    Luo, Xin; You, Zhuhong; Zhou, Mengchu; Li, Shuai; Leung, Hareton; Xia, Yunni; Zhu, Qingsheng

    2015-01-09

    The comprehensive mapping of protein-protein interactions (PPIs) is highly desired for one to gain deep insights into both fundamental cell biology processes and the pathology of diseases. Carefully designed small-scale experiments are not only very expensive but also inefficient for identifying numerous interactomes, despite their high accuracy. High-throughput screening techniques enable efficient identification of PPIs; yet the desire to further extract useful knowledge from these data leads to the problem of binary interactome mapping. Network topology-based approaches prove to be highly efficient in addressing this problem; however, their performance deteriorates significantly on sparse putative PPI networks. Motivated by the success of collaborative filtering (CF)-based approaches to the problem of personalized recommendation on large, sparse rating matrices, this work aims at implementing a highly efficient CF-based approach to binary interactome mapping. To achieve this, we first propose a CF framework for it. Under this framework, we model the given data as an interactome weight matrix, from which the feature vectors of the involved proteins are extracted. With them, we design the rescaled cosine coefficient to model the inter-neighborhood similarity among the involved proteins and carry out the mapping process. Experimental results on three large, sparse datasets demonstrate that the proposed approach outperforms several sophisticated topology-based approaches significantly.
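
    A minimal sketch of the collaborative-filtering idea described above: proteins are rows of a sparse interactome weight matrix, a neighbourhood similarity is computed between protein feature vectors, and a missing interaction is scored from the most similar neighbours. Plain cosine similarity is used here as a stand-in for the paper's rescaled cosine coefficient, which is not reproduced; the matrix and neighbourhood size are invented for illustration.

    import numpy as np

    rng = np.random.default_rng(0)
    n_proteins = 8
    # Sparse, binary putative PPI weight matrix (symmetric, no self-interactions).
    W = (rng.random((n_proteins, n_proteins)) > 0.75).astype(float)
    W = np.triu(W, 1)
    W = W + W.T

    def cosine_sim(a, b):
        denom = np.linalg.norm(a) * np.linalg.norm(b)
        return a @ b / denom if denom > 0 else 0.0

    def predict_score(W, i, j, k=3):
        # Score the candidate interaction (i, j) from the k neighbours most
        # similar to protein i, weighted by their similarity to i.
        sims = np.array([cosine_sim(W[i], W[u]) if u not in (i, j) else -np.inf
                         for u in range(len(W))])
        top = np.argsort(sims)[-k:]
        weights = np.clip(sims[top], 0, None)
        return float(weights @ W[top, j] / weights.sum()) if weights.sum() > 0 else 0.0

    print("predicted score for pair (0, 5):", predict_score(W, 0, 5))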

  17. A Highly Efficient Approach to Protein Interactome Mapping Based on Collaborative Filtering Framework

    PubMed Central

    Luo, Xin; You, Zhuhong; Zhou, Mengchu; Li, Shuai; Leung, Hareton; Xia, Yunni; Zhu, Qingsheng

    2015-01-01

    The comprehensive mapping of protein-protein interactions (PPIs) is highly desired for one to gain deep insights into both fundamental cell biology processes and the pathology of diseases. Carefully designed small-scale experiments are not only very expensive but also inefficient for identifying numerous interactomes, despite their high accuracy. High-throughput screening techniques enable efficient identification of PPIs; yet the desire to further extract useful knowledge from these data leads to the problem of binary interactome mapping. Network topology-based approaches prove to be highly efficient in addressing this problem; however, their performance deteriorates significantly on sparse putative PPI networks. Motivated by the success of collaborative filtering (CF)-based approaches to the problem of personalized recommendation on large, sparse rating matrices, this work aims at implementing a highly efficient CF-based approach to binary interactome mapping. To achieve this, we first propose a CF framework for it. Under this framework, we model the given data as an interactome weight matrix, from which the feature vectors of the involved proteins are extracted. With them, we design the rescaled cosine coefficient to model the inter-neighborhood similarity among the involved proteins and carry out the mapping process. Experimental results on three large, sparse datasets demonstrate that the proposed approach outperforms several sophisticated topology-based approaches significantly. PMID:25572661

  18. A Highly Efficient Approach to Protein Interactome Mapping Based on Collaborative Filtering Framework

    NASA Astrophysics Data System (ADS)

    Luo, Xin; You, Zhuhong; Zhou, Mengchu; Li, Shuai; Leung, Hareton; Xia, Yunni; Zhu, Qingsheng

    2015-01-01

    The comprehensive mapping of protein-protein interactions (PPIs) is highly desired for one to gain deep insights into both fundamental cell biology processes and the pathology of diseases. Carefully designed small-scale experiments are not only very expensive but also inefficient for identifying numerous interactomes, despite their high accuracy. High-throughput screening techniques enable efficient identification of PPIs; yet the desire to further extract useful knowledge from these data leads to the problem of binary interactome mapping. Network topology-based approaches prove to be highly efficient in addressing this problem; however, their performance deteriorates significantly on sparse putative PPI networks. Motivated by the success of collaborative filtering (CF)-based approaches to the problem of personalized recommendation on large, sparse rating matrices, this work aims at implementing a highly efficient CF-based approach to binary interactome mapping. To achieve this, we first propose a CF framework for it. Under this framework, we model the given data as an interactome weight matrix, from which the feature vectors of the involved proteins are extracted. With them, we design the rescaled cosine coefficient to model the inter-neighborhood similarity among the involved proteins and carry out the mapping process. Experimental results on three large, sparse datasets demonstrate that the proposed approach outperforms several sophisticated topology-based approaches significantly.

  19. On the development of efficient algorithms for three dimensional fluid flow

    NASA Technical Reports Server (NTRS)

    Maccormack, R. W.

    1988-01-01

    The difficulties of constructing efficient algorithms for three-dimensional flow are discussed. Reasonable candidates are analyzed and tested, and most are found to have obvious shortcomings. Yet, there is promise that an efficient class of algorithms exists between the severely time-step-size-limited explicit or approximately factored algorithms and the computationally intensive direct inversion of large sparse matrices by Gaussian elimination.

  20. Spatial Bayesian latent factor regression modeling of coordinate-based meta-analysis data.

    PubMed

    Montagna, Silvia; Wager, Tor; Barrett, Lisa Feldman; Johnson, Timothy D; Nichols, Thomas E

    2018-03-01

    Now over 20 years old, functional MRI (fMRI) has a large and growing literature that is best synthesised with meta-analytic tools. As most authors do not share image data, only the peak activation coordinates (foci) reported in the article are available for Coordinate-Based Meta-Analysis (CBMA). Neuroimaging meta-analysis is used to (i) identify areas of consistent activation; and (ii) build a predictive model of task type or cognitive process for new studies (reverse inference). To simultaneously address these aims, we propose a Bayesian point process hierarchical model for CBMA. We model the foci from each study as a doubly stochastic Poisson process, where the study-specific log intensity function is characterized as a linear combination of a high-dimensional basis set. A sparse representation of the intensities is guaranteed through latent factor modeling of the basis coefficients. Within our framework, it is also possible to account for the effect of study-level covariates (meta-regression), significantly expanding the capabilities of the current neuroimaging meta-analysis methods available. We apply our methodology to synthetic data and neuroimaging meta-analysis datasets. © 2017, The International Biometric Society.

  1. Sparse coding can predict primary visual cortex receptive field changes induced by abnormal visual input.

    PubMed

    Hunt, Jonathan J; Dayan, Peter; Goodhill, Geoffrey J

    2013-01-01

    Receptive fields acquired through unsupervised learning of sparse representations of natural scenes have similar properties to primary visual cortex (V1) simple cell receptive fields. However, what drives in vivo development of receptive fields remains controversial. The strongest evidence for the importance of sensory experience in visual development comes from receptive field changes in animals reared with abnormal visual input. However, most sparse coding accounts have considered only normal visual input and the development of monocular receptive fields. Here, we applied three sparse coding models to binocular receptive field development across six abnormal rearing conditions. In every condition, the changes in receptive field properties previously observed experimentally were matched to a similar and highly faithful degree by all the models, suggesting that early sensory development can indeed be understood in terms of an impetus towards sparsity. As previously predicted in the literature, we found that asymmetries in inter-ocular correlation across orientations lead to orientation-specific binocular receptive fields. Finally we used our models to design a novel stimulus that, if present during rearing, is predicted by the sparsity principle to lead robustly to radically abnormal receptive fields.
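
    For a concrete feel for the sparse-coding setting referred to above, the sketch below learns a dictionary for image patches under a sparsity penalty and reports the learned "receptive fields" (dictionary atoms). The sample image, patch size, and hyper-parameters are illustrative choices; the paper's binocular rearing-condition models are not reproduced here.

    import numpy as np
    from sklearn.datasets import load_sample_image
    from sklearn.decomposition import MiniBatchDictionaryLearning
    from sklearn.feature_extraction.image import extract_patches_2d

    # Grayscale image and normalized 8x8 patches as training data.
    img = load_sample_image("china.jpg").mean(axis=2) / 255.0
    patches = extract_patches_2d(img, (8, 8), max_patches=5000, random_state=0)
    X = patches.reshape(len(patches), -1)
    X -= X.mean(axis=1, keepdims=True)
    X /= X.std(axis=1, keepdims=True) + 1e-8

    # Sparse coding: each patch is approximated by a few dictionary atoms.
    dico = MiniBatchDictionaryLearning(n_components=64, alpha=1.0,
                                       transform_algorithm="omp",
                                       transform_n_nonzero_coefs=5,
                                       random_state=0)
    codes = dico.fit(X).transform(X)

    print("dictionary shape (atoms x pixels):", dico.components_.shape)
    print("mean nonzeros per patch:", np.count_nonzero(codes, axis=1).mean())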

  2. Sparse Coding Can Predict Primary Visual Cortex Receptive Field Changes Induced by Abnormal Visual Input

    PubMed Central

    Hunt, Jonathan J.; Dayan, Peter; Goodhill, Geoffrey J.

    2013-01-01

    Receptive fields acquired through unsupervised learning of sparse representations of natural scenes have similar properties to primary visual cortex (V1) simple cell receptive fields. However, what drives in vivo development of receptive fields remains controversial. The strongest evidence for the importance of sensory experience in visual development comes from receptive field changes in animals reared with abnormal visual input. However, most sparse coding accounts have considered only normal visual input and the development of monocular receptive fields. Here, we applied three sparse coding models to binocular receptive field development across six abnormal rearing conditions. In every condition, the changes in receptive field properties previously observed experimentally were matched to a similar and highly faithful degree by all the models, suggesting that early sensory development can indeed be understood in terms of an impetus towards sparsity. As previously predicted in the literature, we found that asymmetries in inter-ocular correlation across orientations lead to orientation-specific binocular receptive fields. Finally we used our models to design a novel stimulus that, if present during rearing, is predicted by the sparsity principle to lead robustly to radically abnormal receptive fields. PMID:23675290

  3. Very high order discontinuous Galerkin method in elliptic problems

    NASA Astrophysics Data System (ADS)

    Jaśkowiec, Jan

    2017-09-01

    The paper deals with a high-order discontinuous Galerkin (DG) method whose approximation order exceeds 20 and reaches 100, and even 1000 in the one-dimensional case. To achieve such a high-order solution, the DG method together with a finite difference method has to be applied. The basis functions of this method are high-order orthogonal Legendre or Chebyshev polynomials. These polynomials are defined in one-dimensional space (1D), but they can easily be adapted to two-dimensional space (2D) by cross products. There are no nodes in the elements, and the degrees of freedom are the coefficients of the linear combination of basis functions. In this sort of analysis reference elements are needed, so transformations of the reference element into the real one are required, as well as the transformations connected with the mesh skeleton. Due to the orthogonality of the basis functions, the obtained matrices are sparse even for finite elements with more than a thousand degrees of freedom. In consequence, the truncation errors are limited and very high-order analysis can be performed. The paper is illustrated with a set of 1D and 2D benchmark examples for elliptic problems. The examples demonstrate the great effectiveness of the method, which can shorten the calculation time by a factor of several hundred.

  4. Very high order discontinuous Galerkin method in elliptic problems

    NASA Astrophysics Data System (ADS)

    Jaśkowiec, Jan

    2018-07-01

    The paper deals with a high-order discontinuous Galerkin (DG) method whose approximation order exceeds 20 and reaches 100, and even 1000 in the one-dimensional case. To achieve such a high-order solution, the DG method together with a finite difference method has to be applied. The basis functions of this method are high-order orthogonal Legendre or Chebyshev polynomials. These polynomials are defined in one-dimensional space (1D), but they can easily be adapted to two-dimensional space (2D) by cross products. There are no nodes in the elements, and the degrees of freedom are the coefficients of the linear combination of basis functions. In this sort of analysis reference elements are needed, so transformations of the reference element into the real one are required, as well as the transformations connected with the mesh skeleton. Due to the orthogonality of the basis functions, the obtained matrices are sparse even for finite elements with more than a thousand degrees of freedom. In consequence, the truncation errors are limited and very high-order analysis can be performed. The paper is illustrated with a set of 1D and 2D benchmark examples for elliptic problems. The examples demonstrate the great effectiveness of the method, which can shorten the calculation time by a factor of several hundred.

  5. Three-Dimensional Inverse Transport Solver Based on Compressive Sensing Technique

    NASA Astrophysics Data System (ADS)

    Cheng, Yuxiong; Wu, Hongchun; Cao, Liangzhi; Zheng, Youqi

    2013-09-01

    Based on direct exposure measurements from flash radiographic images, a compressive sensing-based method for the three-dimensional inverse transport problem is presented. The linear absorption coefficients and interface locations of objects are reconstructed directly at the same time. It is always very expensive to obtain enough measurements. With limited measurements, the compressive sensing sparse reconstruction technique orthogonal matching pursuit is applied to obtain the sparse coefficients by solving an optimization problem. A three-dimensional inverse transport solver is developed based on this compressive sensing technique. There are three features in this solver: (1) AutoCAD is employed as a geometry preprocessor due to its powerful graphics capabilities. (2) The forward projection matrix, rather than a Gauss matrix, is constructed by the visualization tool generator. (3) Fourier transform and Daubechies wavelet transform are adopted to convert an underdetermined system into a well-posed system in the algorithm. Simulations are performed, and numerical results for a pseudo-sine absorption problem, a two-cube problem, and a two-cylinder problem using the compressive sensing-based solver agree well with the reference values.

  6. Robust Single Image Super-Resolution via Deep Networks With Sparse Prior.

    PubMed

    Liu, Ding; Wang, Zhaowen; Wen, Bihan; Yang, Jianchao; Han, Wei; Huang, Thomas S

    2016-07-01

    Single image super-resolution (SR) is an ill-posed problem, which tries to recover a high-resolution image from its low-resolution observation. To regularize the solution of the problem, previous methods have focused on designing good priors for natural images, such as sparse representation, or directly learning the priors from a large data set with models, such as deep neural networks. In this paper, we argue that domain expertise from the conventional sparse coding model can be combined with the key ingredients of deep learning to achieve further improved results. We demonstrate that a sparse coding model particularly designed for SR can be incarnated as a neural network with the merit of end-to-end optimization over training data. The network has a cascaded structure, which boosts the SR performance for both fixed and incremental scaling factors. The proposed training and testing schemes can be extended for robust handling of images with additional degradation, such as noise and blurring. A subjective assessment is conducted and analyzed in order to thoroughly evaluate various SR techniques. Our proposed model is tested on a wide range of images, and it significantly outperforms the existing state-of-the-art methods for various scaling factors both quantitatively and perceptually.

  7. Inverse modeling of hydraulic tests in fractured crystalline rock based on a transition probability geostatistical approach

    NASA Astrophysics Data System (ADS)

    Blessent, Daniela; Therrien, René; Lemieux, Jean-Michel

    2011-12-01

    This paper presents numerical simulations of a series of hydraulic interference tests conducted in crystalline bedrock at Olkiluoto (Finland), a potential site for the disposal of the Finnish high-level nuclear waste. The tests are in a block of crystalline bedrock of about 0.03 km³ that contains low-transmissivity fractures. Fracture density, orientation, and fracture transmissivity are estimated from Posiva Flow Log (PFL) measurements in boreholes drilled in the rock block. On the basis of those data, a geostatistical approach relying on transition probability and Markov chain models is used to define a conceptual model based on stochastic fractured rock facies. Four facies are defined, from sparsely fractured bedrock to highly fractured bedrock. Using this conceptual model, three-dimensional groundwater flow is then simulated to reproduce interference pumping tests in either open or packed-off boreholes. Hydraulic conductivities of the fracture facies are estimated through automatic calibration using either hydraulic heads or both hydraulic heads and PFL flow rates as targets for calibration. The latter option produces a narrower confidence interval for the calibrated hydraulic conductivities, therefore reducing the associated uncertainty and demonstrating the usefulness of the measured PFL flow rates. Furthermore, the stochastic facies conceptual model is a suitable alternative to discrete fracture network models to simulate fluid flow in fractured geological media.

  8. Sparse Covariance Matrix Estimation With Eigenvalue Constraints

    PubMed Central

    LIU, Han; WANG, Lie; ZHAO, Tuo

    2014-01-01

    We propose a new approach for estimating high-dimensional, positive-definite covariance matrices. Our method extends the generalized thresholding operator by adding an explicit eigenvalue constraint. The estimated covariance matrix simultaneously achieves sparsity and positive definiteness. The estimator is rate optimal in the minimax sense and we develop an efficient iterative soft-thresholding and projection algorithm based on the alternating direction method of multipliers. Empirically, we conduct thorough numerical experiments on simulated datasets as well as real data examples to illustrate the usefulness of our method. Supplementary materials for the article are available online. PMID:25620866
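
    A minimal sketch of the two ingredients described above: generalized (soft) thresholding of the sample covariance for sparsity, followed by an eigenvalue projection to enforce positive definiteness. The paper alternates these steps within an ADMM scheme; the one-pass version below, with invented dimensions and threshold, is only an illustration.

    import numpy as np

    def soft_threshold_offdiag(S, lam):
        # Soft-threshold the off-diagonal entries; keep the diagonal untouched.
        T = np.sign(S) * np.maximum(np.abs(S) - lam, 0.0)
        np.fill_diagonal(T, np.diag(S))
        return T

    def project_min_eigenvalue(S, eps=1e-3):
        # Clip eigenvalues from below so the estimate is positive definite.
        vals, vecs = np.linalg.eigh(S)
        return (vecs * np.maximum(vals, eps)) @ vecs.T

    rng = np.random.default_rng(0)
    n, p = 50, 20
    X = rng.standard_normal((n, p))
    S = np.cov(X, rowvar=False)

    T = soft_threshold_offdiag(S, lam=0.2)
    print("zeroed off-diagonal entries:", int((T[~np.eye(p, dtype=bool)] == 0).sum()))
    sigma_hat = project_min_eigenvalue(T)
    print("smallest eigenvalue of the estimate:", np.linalg.eigvalsh(sigma_hat).min())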

  9. Multimodal sparse reconstruction in guided wave imaging of defects in plates

    NASA Astrophysics Data System (ADS)

    Golato, Andrew; Santhanam, Sridhar; Ahmad, Fauzia; Amin, Moeness G.

    2016-07-01

    A multimodal sparse reconstruction approach is proposed for localizing defects in thin plates in Lamb wave-based structural health monitoring. The proposed approach exploits both the sparsity of the defects and the multimodal nature of Lamb wave propagation in plates. It takes into account the variation of the defects' aspect angles across the various transducer pairs. At low operating frequencies, only the fundamental symmetric and antisymmetric Lamb modes emanate from a transmitting transducer. Asymmetric defects scatter these modes and spawn additional converted fundamental modes. Propagation models are developed for each of these scattered and spawned modes arriving at the various receiving transducers. This enables the construction of modal dictionary matrices spanning a two-dimensional array of pixels representing potential defect locations in the region of interest. Reconstruction of the region of interest is achieved by inverting the resulting linear model using the group sparsity constraint, where the groups extend across the various transducer pairs and the different modes. The effectiveness of the proposed approach is established with finite-element scattering simulations of the fundamental Lamb wave modes by crack-like defects in a plate. The approach is subsequently validated with experimental results obtained from an aluminum plate with asymmetric defects.

  10. A robust sparse-modeling framework for estimating schizophrenia biomarkers from fMRI.

    PubMed

    Dillon, Keith; Calhoun, Vince; Wang, Yu-Ping

    2017-01-30

    Our goal is to identify the brain regions most relevant to mental illness using neuroimaging. State of the art machine learning methods commonly suffer from repeatability difficulties in this application, particularly when using large and heterogeneous populations for samples. We revisit both dimensionality reduction and sparse modeling, and recast them in a common optimization-based framework. This allows us to combine the benefits of both types of methods in an approach which we call unambiguous components. We use this to estimate the image component with a constrained variability, which is best correlated with the unknown disease mechanism. We apply the method to the estimation of neuroimaging biomarkers for schizophrenia, using task fMRI data from a large multi-site study. The proposed approach yields an improvement in both robustness of the estimate and classification accuracy. We find that unambiguous components incorporate roughly two thirds of the same brain regions as sparsity-based methods LASSO and elastic net, while roughly one third of the selected regions differ. Further, unambiguous components achieve superior classification accuracy in differentiating cases from controls. Unambiguous components provide a robust way to estimate important regions of imaging data. Copyright © 2016 Elsevier B.V. All rights reserved.

  11. An Optimal Bahadur-Efficient Method in Detection of Sparse Signals with Applications to Pathway Analysis in Sequencing Association Studies.

    PubMed

    Dai, Hongying; Wu, Guodong; Wu, Michael; Zhi, Degui

    2016-01-01

    Next-generation sequencing data pose a severe curse of dimensionality, complicating traditional "single marker-single trait" analysis. We propose a two-stage combined p-value method for pathway analysis. The first stage is at the gene level, where we integrate effects within a gene using the Sequence Kernel Association Test (SKAT). The second stage is at the pathway level, where we perform a correlated Lancaster procedure to detect joint effects from multiple genes within a pathway. We show that the Lancaster procedure is optimal in Bahadur efficiency among all combined p-value methods. The Bahadur efficiency, [Formula: see text], compares sample sizes among different statistical tests when signals become sparse in sequencing data, i.e., ε → 0. The optimal Bahadur efficiency ensures that the Lancaster procedure asymptotically requires a minimal sample size to detect sparse signals ([Formula: see text]). The Lancaster procedure can also be applied to meta-analysis. Extensive empirical assessments of exome sequencing data show that the proposed method outperforms Gene Set Enrichment Analysis (GSEA). We applied the competitive Lancaster procedure to meta-analysis data generated by the Global Lipids Genetics Consortium to identify pathways significantly associated with high-density lipoprotein cholesterol, low-density lipoprotein cholesterol, triglycerides, and total cholesterol.
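
    A minimal sketch of a Lancaster-style combination of gene-level p-values under an independence assumption (the paper's correlated version is not reproduced here): each p-value is converted to a chi-square quantile with gene-specific degrees of freedom acting as weights, the quantiles are summed, and the sum is referred to a chi-square distribution with the summed degrees of freedom. The example p-values and weights are invented.

    import numpy as np
    from scipy.stats import chi2

    def lancaster_combine(pvals, dofs):
        pvals, dofs = np.asarray(pvals, float), np.asarray(dofs, float)
        # Upper-tail chi-square quantile of each p-value with its own dof.
        stats = chi2.isf(pvals, dofs)
        # Under independence the sum is chi-square with the summed dof.
        return chi2.sf(stats.sum(), dofs.sum())

    # Example: five genes in a pathway, weighted by (say) the number of variants.
    gene_pvals = [0.02, 0.40, 0.008, 0.65, 0.12]
    gene_dofs = [4, 2, 6, 2, 3]
    print("pathway p-value:", lancaster_combine(gene_pvals, gene_dofs))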

  12. Sparse deconvolution for the large-scale ill-posed inverse problem of impact force reconstruction

    NASA Astrophysics Data System (ADS)

    Qiao, Baijie; Zhang, Xingwu; Gao, Jiawei; Liu, Ruonan; Chen, Xuefeng

    2017-01-01

    Most previous regularization methods for solving the inverse problem of force reconstruction minimize the l2-norm of the desired force. However, these traditional regularization methods, such as Tikhonov regularization and truncated singular value decomposition, commonly fail to solve large-scale ill-posed inverse problems at moderate computational cost. In this paper, taking into account the sparse characteristic of impact forces, the idea of sparse deconvolution is first introduced to the field of impact force reconstruction and a general sparse deconvolution model of impact force is constructed. Second, a novel impact force reconstruction method based on the primal-dual interior point method (PDIPM) is proposed to solve such a large-scale sparse deconvolution model, where minimizing the l2-norm is replaced by minimizing the l1-norm. Meanwhile, the preconditioned conjugate gradient algorithm is used to compute the search direction of PDIPM with high computational efficiency. Finally, two experiments, covering small- to medium-scale single impact force reconstruction and relatively large-scale consecutive impact force reconstruction, are conducted on a composite wind turbine blade and a shell structure to illustrate the advantages of PDIPM. Compared with Tikhonov regularization, PDIPM is more efficient, accurate and robust in both single and consecutive impact force reconstruction.
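
    The sketch below illustrates the sparse-deconvolution model described above: the measured response is the convolution of an impulse-response kernel with a sparse impact-force history, and the force is recovered by minimizing a least-squares data-fit term plus an l1 penalty. For brevity, ISTA is used here instead of the paper's primal-dual interior point method, and the damped-oscillation kernel and impact amplitudes are toy assumptions.

    import numpy as np
    from scipy.linalg import toeplitz

    n = 200
    t = np.arange(n)
    # Toy impulse response (damped oscillation) and its convolution (Toeplitz) matrix.
    h = np.exp(-t / 20.0) * np.sin(2 * np.pi * t / 25.0)
    H = toeplitz(h, np.zeros(n))

    # Sparse ground-truth impact-force history: two short impacts.
    f_true = np.zeros(n)
    f_true[[30, 120]] = [5.0, 3.0]
    rng = np.random.default_rng(0)
    y = H @ f_true + 0.05 * rng.standard_normal(n)

    # ISTA: gradient step on the data-fit term, then soft-thresholding (l1 prox).
    lam = 0.5
    L = np.linalg.norm(H, 2) ** 2            # Lipschitz constant of the gradient
    f = np.zeros(n)
    for _ in range(500):
        grad = H.T @ (H @ f - y)
        f = f - grad / L
        f = np.sign(f) * np.maximum(np.abs(f) - lam / L, 0.0)

    print("recovered impact locations:", np.flatnonzero(np.abs(f) > 0.5))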

  13. Fire adaptation in Neblinaria celiae (Theaceae), a high-elevation rosette shrub endemic to a wet equatorial tepui

    USGS Publications Warehouse

    Givnish, T.J.; McDiarmid, R.W.; Buck, W.R.

    1986-01-01

    Neblinaria celiae (Theaceae), a rosette shrub endemic to the exceedingly rainy summit of remote Cerro de la Neblina in southern Venezuela, has a previously undescribed set of adaptations to fire. Its growth form entails sparse branching, massive terminal leaf rosettes, and thick bark. It is highly fire-tolerant, with a survival rate of 93% in a stand recently ignited by lightning, vs. 0% in seven co-occurring woody species. Survival increases sharply with rosette height, favoring a sparsely branched habit that would maximize the rate of upward growth through the sparse fuel layer supported by a sterile substrate. Thick bark and massive rosettes help protect cambial and foliar meristems from brief exposure to high temperatures. Rosettes on shorter plants are exposed to greater damage from fire near the ground and, as expected, are bigger and impound more rainwater; the greater number of leaves nearly balances the greater leaf mortality caused by fire. We relate Neblinaria's growth form to its dominance atop Neblina, to a general model for the evolution of sparse branching, and to the evolution of growth form in other tepui plants.

  14. Concurrent tumor segmentation and registration with uncertainty-based sparse non-uniform graphs.

    PubMed

    Parisot, Sarah; Wells, William; Chemouny, Stéphane; Duffau, Hugues; Paragios, Nikos

    2014-05-01

    In this paper, we present a graph-based concurrent brain tumor segmentation and atlas to diseased patient registration framework. Both segmentation and registration problems are modeled using a unified pairwise discrete Markov Random Field model on a sparse grid superimposed to the image domain. Segmentation is addressed based on pattern classification techniques, while registration is performed by maximizing the similarity between volumes and is modular with respect to the matching criterion. The two problems are coupled by relaxing the registration term in the tumor area, corresponding to areas of high classification score and high dissimilarity between volumes. In order to overcome the main shortcomings of discrete approaches regarding appropriate sampling of the solution space as well as important memory requirements, content driven samplings of the discrete displacement set and the sparse grid are considered, based on the local segmentation and registration uncertainties recovered by the min marginal energies. State of the art results on a substantial low-grade glioma database demonstrate the potential of our method, while our proposed approach shows maintained performance and strongly reduced complexity of the model. Copyright © 2014 Elsevier B.V. All rights reserved.

  15. High-dimensional inference with the generalized Hopfield model: principal component analysis and corrections.

    PubMed

    Cocco, S; Monasson, R; Sessak, V

    2011-05-01

    We consider the problem of inferring the interactions between a set of N binary variables from the knowledge of their frequencies and pairwise correlations. The inference framework is based on the Hopfield model, a special case of the Ising model where the interaction matrix is defined through a set of patterns in the variable space, and is of rank much smaller than N. We show that maximum likelihood inference is deeply related to principal component analysis when the amplitude of the pattern components ξ is negligible compared to √N. Using techniques from statistical mechanics, we calculate the corrections to the patterns to the first order in ξ/√N. We stress the need to generalize the Hopfield model and include both attractive and repulsive patterns in order to correctly infer networks with sparse and strong interactions. We present a simple geometrical criterion to decide how many attractive and repulsive patterns should be considered as a function of the sampling noise. We moreover discuss how many sampled configurations are required for a good inference, as a function of the system size N and of the amplitude ξ. The inference approach is illustrated on synthetic and biological data.

  16. Practical recipes for the model order reduction, dynamical simulation and compressive sampling of large-scale open quantum systems

    NASA Astrophysics Data System (ADS)

    Sidles, John A.; Garbini, Joseph L.; Harrell, Lee E.; Hero, Alfred O.; Jacky, Jonathan P.; Malcomb, Joseph R.; Norman, Anthony G.; Williamson, Austin M.

    2009-06-01

    Practical recipes are presented for simulating high-temperature and nonequilibrium quantum spin systems that are continuously measured and controlled. The notion of a spin system is broadly conceived, in order to encompass macroscopic test masses as the limiting case of large-j spins. The simulation technique has three stages: first the deliberate introduction of noise into the simulation, then the conversion of that noise into an equivalent continuous measurement and control process, and finally, projection of the trajectory onto state-space manifolds having reduced dimensionality and possessing a Kähler potential of multilinear algebraic form. These state-spaces can be regarded as ruled algebraic varieties upon which a projective quantum model order reduction (MOR) is performed. The Riemannian sectional curvature of ruled Kählerian varieties is analyzed, and proved to be non-positive upon all sections that contain a rule. These manifolds are shown to contain Slater determinants as a special case and their identity with Grassmannian varieties is demonstrated. The resulting simulation formalism is used to construct a positive P-representation for the thermal density matrix. Single-spin detection by magnetic resonance force microscopy (MRFM) is simulated, and the data statistics are shown to be those of a random telegraph signal with additive white noise. Larger-scale spin-dust models are simulated, having no spatial symmetry and no spatial ordering; the high-fidelity projection of numerically computed quantum trajectories onto low dimensionality Kähler state-space manifolds is demonstrated. The reconstruction of quantum trajectories from sparse random projections is demonstrated, the onset of Donoho-Stodden breakdown at the Candès-Tao sparsity limit is observed, a deterministic construction for sampling matrices is given and methods for quantum state optimization by Dantzig selection are given.

  17. A Comparison between Model Base Hardconstrain, Bandlimited, and Sparse-Spike Seismic Inversion: New Insights for CBM Reservoir Modelling on Muara Enim Formation, South Sumatra

    NASA Astrophysics Data System (ADS)

    Mohamad Noor, Faris; Adipta, Agra

    2018-03-01

    Coal Bed Methane (CBM), a newly developed resource in Indonesia, is one of the alternatives for relieving Indonesia's dependence on conventional energy sources. The coal resource of the Muara Enim Formation is known as one of the prolific reservoirs in the South Sumatra Basin. Seismic inversion and well analysis are performed to determine the coal seam characteristics of the Muara Enim Formation. This research uses three inversion methods: model-based hard-constraint, band-limited, and sparse-spike inversion. Each type of seismic inversion has its own advantages in displaying the coal seam and its characteristics. Interpretation of the analysis results shows that the Muara Enim coal seam has a gamma-ray value of about 20 API, a density of 1.0-1.4 g/cc from the density log, and a low acoustic impedance (AI) cutoff range of 5000-6400 (m/s)·(g/cc). The coal seam distribution thins laterally from northwest to southeast. The coal seam appears biased in the model-based hard-constraint inversion and discontinuous in the band-limited inversion, which is inconsistent with the geological model. The most appropriate AI inversion is sparse-spike inversion, which attains the best correlation among the chosen inversion methods (a cross-plot value of 0.884757). Sparse-spike inversion itself has high amplitude resolution, making it a proper tool for identifying the continuity of coal seams, which commonly appear as thin layers. The sparse-spike inversion cross-sections suggest possible new boreholes at CDP 3662-3722, CDP 3586-3622, and CDP 4004-4148, where the seismic data show a thick coal seam.

  18. SPARSE: quadratic time simultaneous alignment and folding of RNAs without sequence-based heuristics.

    PubMed

    Will, Sebastian; Otto, Christina; Miladi, Milad; Möhl, Mathias; Backofen, Rolf

    2015-08-01

    RNA-Seq experiments have revealed a multitude of novel ncRNAs. The gold standard for their analysis based on simultaneous alignment and folding suffers from extreme time complexity of O(n⁶). Subsequently, numerous faster 'Sankoff-style' approaches have been suggested. Commonly, the performance of such methods relies on sequence-based heuristics that restrict the search space to optimal or near-optimal sequence alignments; however, the accuracy of sequence-based methods breaks down for RNAs with sequence identities below 60%. Alignment approaches like LocARNA, which do not require sequence-based heuristics, have been limited to high complexity (≥ quartic time). Breaking this barrier, we introduce the novel Sankoff-style algorithm 'sparsified prediction and alignment of RNAs based on their structure ensembles (SPARSE)', which runs in quadratic time without sequence-based heuristics. To achieve this low complexity, on par with sequence alignment algorithms, SPARSE features strong sparsification based on structural properties of the RNA ensembles. Following PMcomp, SPARSE gains further speed-up from lightweight energy computation. Although all existing lightweight Sankoff-style methods restrict Sankoff's original model by disallowing loop deletions and insertions, SPARSE transfers the Sankoff algorithm to the lightweight energy model completely for the first time. Compared with LocARNA, SPARSE achieves similar alignment and better folding quality in significantly less time (speedup: 3.7). At similar run-time, it aligns low sequence identity instances substantially more accurately than RAF, which uses sequence-based heuristics. © The Author 2015. Published by Oxford University Press.

  19. Variational Assimilation of Sparse and Uncertain Satellite Data For 1D Saint-Venant River Models

    NASA Astrophysics Data System (ADS)

    Garambois, P. A.; Brisset, P.; Monnier, J.; Roux, H.

    2016-12-01

    A profusion of satellites is providing increasingly accurate measurements of the continental water cycle and of water body variations, while in situ observability is declining. The future Surface Water and Ocean Topography (SWOT) mission will provide maps of river surface elevations, widths and slopes with almost global coverage and temporal revisits. This will offer the possibility to address a larger variety of inverse problems in surface hydrology. Data assimilation techniques, which are broadly used in several scientific fields, aim to optimally combine models, system observations and prior information. Variational assimilation consists of the iterative minimization of a discrepancy measure between model outputs and observations, here for retrieving boundary conditions and parameters of a 1D Saint-Venant model. Nevertheless, inferring river discharge and hydraulic parameters from observations of the river surface is not straightforward. This is particularly true in the case of sparse and uncertain observations of flow state variables, since they are governed by nonlinear physical processes. This paper investigates the identifiability of hydraulic controls given sparse and uncertain satellite observations of a river. The identifiability of river discharge, alone and together with roughness, is tested for several spatio-temporal patterns of river observations, including SWOT-like observations. A new 1D shallow water model with variational data assimilation, within the DassFlow chain, is presented, along with postprocessing and observation operators dedicated to the future SWOT data and the SWOT simulator. To decrease the dimensionality of the inverse problem, discharge is represented in a reduced basis. Moreover, we introduce an original, reduced parameterization of the flow resistance that can account for various flow regimes, along with a cross-section design dedicated to remote sensing. We show which discharge temporal frequencies can be identified with respect to the observation frequencies, and at which accuracy. Eventually, the important question of the discharge identifiability between observation times, and its dependence on the spatio-temporal sampling, is addressed with respect to the wavelengths of the hydrological signals.

  20. A Comparison of Compressed Sensing and Sparse Recovery Algorithms Applied to Simulation Data

    DOE PAGES

    Fan, Ya Ju; Kamath, Chandrika

    2016-09-01

    The move toward exascale computing for scientific simulations is placing new demands on compression techniques. It is expected that the I/O system will not be able to support the volume of data that is expected to be written out. To enable quantitative analysis and scientific discovery, we are interested in techniques that compress high-dimensional simulation data and can provide perfect or near-perfect reconstruction. In this paper, we explore the use of compressed sensing (CS) techniques to reduce the size of the data before they are written out. Using large-scale simulation data, we investigate how the sufficient sparsity condition and the contrast in the data affect the quality of reconstruction and the degree of compression. Also, we provide suggestions for the practical implementation of CS techniques and compare them with other sparse recovery methods. Finally, our results show that despite longer times for reconstruction, compressed sensing techniques can provide near perfect reconstruction over a range of data with varying sparsity.
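
    A minimal sketch of the compressed-sensing workflow discussed above: a sparse "simulation" signal is compressed with a random Gaussian measurement matrix and then reconstructed by basis pursuit (l1 minimization) posed as a linear program. The dimensions, sparsity level, and LP formulation are illustrative choices, not the paper's experimental setup.

    import numpy as np
    from scipy.optimize import linprog

    rng = np.random.default_rng(0)
    n, m, k = 128, 48, 6                           # signal length, measurements, nonzeros

    x_true = np.zeros(n)
    x_true[rng.choice(n, k, replace=False)] = rng.standard_normal(k)

    A = rng.standard_normal((m, n)) / np.sqrt(m)   # compression: keep m << n numbers
    y = A @ x_true

    # Basis pursuit: min ||x||_1 s.t. Ax = y, with x = u - v and u, v >= 0.
    c = np.ones(2 * n)
    A_eq = np.hstack([A, -A])
    res = linprog(c, A_eq=A_eq, b_eq=y, bounds=(0, None), method="highs")
    x_hat = res.x[:n] - res.x[n:]

    print("relative reconstruction error:",
          np.linalg.norm(x_hat - x_true) / np.linalg.norm(x_true))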

  1. A Comparison of Compressed Sensing and Sparse Recovery Algorithms Applied to Simulation Data

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Fan, Ya Ju; Kamath, Chandrika

    The move toward exascale computing for scientific simulations is placing new demands on compression techniques. It is expected that the I/O system will not be able to support the volume of data that is expected to be written out. To enable quantitative analysis and scientific discovery, we are interested in techniques that compress high-dimensional simulation data and can provide perfect or near-perfect reconstruction. In this paper, we explore the use of compressed sensing (CS) techniques to reduce the size of the data before they are written out. Using large-scale simulation data, we investigate how the sufficient sparsity condition and the contrast in the data affect the quality of reconstruction and the degree of compression. Also, we provide suggestions for the practical implementation of CS techniques and compare them with other sparse recovery methods. Finally, our results show that despite longer times for reconstruction, compressed sensing techniques can provide near perfect reconstruction over a range of data with varying sparsity.

  2. Parameter Estimation for a Pulsating Turbulent Buoyant Jet Using Approximate Bayesian Computation

    NASA Astrophysics Data System (ADS)

    Christopher, Jason; Wimer, Nicholas; Lapointe, Caelan; Hayden, Torrey; Grooms, Ian; Rieker, Greg; Hamlington, Peter

    2017-11-01

    Approximate Bayesian Computation (ABC) is a powerful tool that allows sparse experimental or other "truth" data to be used for the prediction of unknown parameters, such as flow properties and boundary conditions, in numerical simulations of real-world engineering systems. Here we introduce the ABC approach and then use ABC to predict unknown inflow conditions in simulations of a two-dimensional (2D) turbulent, high-temperature buoyant jet. For this test case, truth data are obtained from a direct numerical simulation (DNS) with known boundary conditions and problem parameters, while the ABC procedure utilizes lower fidelity large eddy simulations. Using spatially-sparse statistics from the 2D buoyant jet DNS, we show that the ABC method provides accurate predictions of true jet inflow parameters. The success of the ABC approach in the present test suggests that ABC is a useful and versatile tool for predicting flow information, such as boundary conditions, that can be difficult to determine experimentally.
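
    A minimal sketch of the ABC idea described above: candidate inflow parameters are drawn from a prior, a forward model is run for each, and only the candidates whose summary statistics fall within a tolerance of the sparse "truth" data are kept. The toy forward model, the parameter name inflow_temp, and the tolerance are illustrative stand-ins for the large eddy simulations and DNS statistics.

    import numpy as np

    rng = np.random.default_rng(0)

    def forward_model(inflow_temp, n_obs=10):
        # Toy stand-in for a jet simulation: noisy statistics that grow with
        # the (unknown) inflow temperature.
        return inflow_temp * np.linspace(0.5, 1.5, n_obs) + rng.normal(0, 0.5, n_obs)

    true_temp = 4.0
    truth_stats = forward_model(true_temp)          # sparse "DNS" statistics

    # ABC rejection sampling over a uniform prior on the inflow temperature.
    prior_samples = rng.uniform(0.0, 10.0, 20000)
    tol = 3.0
    accepted = [theta for theta in prior_samples
                if np.linalg.norm(forward_model(theta) - truth_stats) < tol]

    print("posterior mean estimate of inflow temperature:", np.mean(accepted))
    print("acceptance rate:", len(accepted) / len(prior_samples))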

  3. Improving and Evaluating Nested Sampling Algorithm for Marginal Likelihood Estimation

    NASA Astrophysics Data System (ADS)

    Ye, M.; Zeng, X.; Wu, J.; Wang, D.; Liu, J.

    2016-12-01

    With the growing impacts of climate change and human activities on the cycle of water resources, an increasing amount of research focuses on the quantification of modeling uncertainty. Bayesian model averaging (BMA) provides a popular framework for quantifying conceptual model and parameter uncertainty. The ensemble prediction is generated by combining the predictions of the plausible models, and each model is assigned a weight determined by its prior weight and marginal likelihood. Thus, the estimation of a model's marginal likelihood is crucial for reliable and accurate BMA prediction. The nested sampling estimator (NSE) is a newly proposed method for marginal likelihood estimation. NSE searches the parameter space gradually from low-likelihood to high-likelihood regions, and this evolution is carried out iteratively via a local sampling procedure. Thus, the efficiency of NSE is dominated by the strength of the local sampling procedure. Currently, the Metropolis-Hastings (M-H) algorithm is often used for local sampling. However, M-H is not an efficient sampling algorithm for high-dimensional or complicated parameter spaces. To improve the efficiency of NSE, the robust and efficient sampling algorithm DREAMzs is incorporated into the local sampling step of NSE. The comparison results demonstrate that the improved NSE increases the efficiency of marginal likelihood estimation significantly. However, both the improved and the original NSE suffer from heavy instability. In addition, the heavy computational cost of the huge number of model executions is overcome by using adaptive sparse grid surrogates.

  4. Rational variety mapping for contrast-enhanced nonlinear unsupervised segmentation of multispectral images of unstained specimen.

    PubMed

    Kopriva, Ivica; Hadžija, Mirko; Popović Hadžija, Marijana; Korolija, Marina; Cichocki, Andrzej

    2011-08-01

    A methodology is proposed for nonlinear contrast-enhanced unsupervised segmentation of multispectral (color) microscopy images of principally unstained specimens. The methodology exploits spectral diversity and spatial sparseness to find anatomical differences between materials (cells, nuclei, and background) present in the image. It consists of rth-order rational variety mapping (RVM) followed by matrix/tensor factorization. Sparseness constraint implies duality between nonlinear unsupervised segmentation and multiclass pattern assignment problems. Classes not linearly separable in the original input space become separable with high probability in the higher-dimensional mapped space. Hence, RVM mapping has two advantages: it takes implicitly into account nonlinearities present in the image (i.e., they are not required to be known) and it increases spectral diversity (i.e., contrast) between materials, due to increased dimensionality of the mapped space. This is expected to improve performance of systems for automated classification and analysis of microscopic histopathological images. The methodology was validated using RVM of the second and third orders of the experimental multispectral microscopy images of unstained sciatic nerve fibers (nervus ischiadicus) and of unstained white pulp in the spleen tissue, compared with a manually defined ground truth labeled by two trained pathophysiologists. The methodology can also be useful for additional contrast enhancement of images of stained specimens. Copyright © 2011 American Society for Investigative Pathology. Published by Elsevier Inc. All rights reserved.

  5. Sparse Multivariate Autoregressive Modeling for Mild Cognitive Impairment Classification

    PubMed Central

    Li, Yang; Wee, Chong-Yaw; Jie, Biao; Peng, Ziwen

    2014-01-01

    Brain connectivity networks derived from functional magnetic resonance imaging (fMRI) are becoming increasingly prevalent in research related to cognitive and perceptual processes. The capability to detect causal or effective connectivity is highly desirable for understanding the cooperative nature of the brain network, particularly when the ultimate goal is to obtain good performance of control-patient classification with biologically meaningful interpretations. Understanding directed functional interactions between brain regions via a brain connectivity network is a challenging task. Since many genetic and biomedical networks are intrinsically sparse, incorporating the sparsity property into connectivity modeling can make the derived models more biologically plausible. Accordingly, we propose an effective connectivity modeling of resting-state fMRI data based on the multivariate autoregressive (MAR) modeling technique, which is widely used to characterize temporal information of dynamic systems. This MAR modeling technique allows for the identification of effective connectivity using the Granger causality concept and reduces spurious causal connectivity in the assessment of directed functional interaction from fMRI data. A forward orthogonal least squares (OLS) regression algorithm is further used to construct a sparse MAR model. By applying the proposed modeling to mild cognitive impairment (MCI) classification, we identify several most discriminative regions, including middle cingulate gyrus, posterior cingulate gyrus, lingual gyrus and caudate regions, in line with results reported in previous findings. A relatively high classification accuracy of 91.89% is also achieved, an improvement of 5.4% compared to the fully-connected, non-directional Pearson-correlation-based functional connectivity approach. PMID:24595922
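
    A minimal sketch of a sparse multivariate autoregressive (MAR) model of order 1, as described above: each region's signal is regressed on the lagged signals of all regions with a sparsity-inducing penalty, so that only a few directed influences survive. Lasso is used here as a stand-in for the paper's forward orthogonal least squares selection, and the synthetic time series and coefficients are invented for illustration.

    import numpy as np
    from sklearn.linear_model import Lasso

    rng = np.random.default_rng(0)
    n_regions, n_time = 6, 300

    # Ground-truth sparse MAR(1) coefficients and simulated "fMRI" time series.
    A_true = np.zeros((n_regions, n_regions))
    A_true[0, 1], A_true[2, 0], A_true[4, 3] = 0.6, -0.5, 0.7
    X = np.zeros((n_time, n_regions))
    for t in range(1, n_time):
        X[t] = A_true @ X[t - 1] + 0.5 * rng.standard_normal(n_regions)

    # Sparse MAR estimation: one penalized regression per target region.
    A_hat = np.zeros_like(A_true)
    for i in range(n_regions):
        model = Lasso(alpha=0.05, fit_intercept=False).fit(X[:-1], X[1:, i])
        A_hat[i] = model.coef_

    print("estimated directed influences (row <- column):")
    print(np.round(A_hat, 2))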

  6. Universal Priors for Sparse Modeling (PREPRINT)

    DTIC Science & Technology

    2009-08-01

    Sparse data models, where … ‖A_j‖_0 = |A_j| denotes its cardinality. The goal of sparse modeling is to design a dictionary D such that X = DA with ‖A_j‖_0 sufficiently small (usually below …

  7. Reconstructing cortical current density by exploring sparseness in the transform domain

    NASA Astrophysics Data System (ADS)

    Ding, Lei

    2009-05-01

    In the present study, we have developed a novel electromagnetic source imaging approach to reconstruct extended cortical sources by means of cortical current density (CCD) modeling and a novel EEG imaging algorithm which explores sparseness in cortical source representations through the use of L1-norm in objective functions. The new sparse cortical current density (SCCD) imaging algorithm is unique since it reconstructs cortical sources by attaining sparseness in a transform domain (the variation map of cortical source distributions). While large variations are expected to occur along boundaries (sparseness) between active and inactive cortical regions, cortical sources can be reconstructed and their spatial extents can be estimated by locating these boundaries. We studied the SCCD algorithm using numerous simulations to investigate its capability in reconstructing cortical sources with different extents and in reconstructing multiple cortical sources with different extent contrasts. The SCCD algorithm was compared with two L2-norm solutions, i.e. weighted minimum norm estimate (wMNE) and cortical LORETA. Our simulation data from the comparison study show that the proposed sparse source imaging algorithm is able to accurately and efficiently recover extended cortical sources and is promising to provide high-accuracy estimation of cortical source extents.
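
    A minimal one-dimensional analogue of the sparse-variation idea, assuming the cvxpy convex optimization package is available: a piecewise-constant source is recovered from an underdetermined linear model by penalizing the L1 norm of its first differences, so that the recovered jumps mark the source boundaries. The forward matrix and problem sizes are toy placeholders, not an actual EEG lead field.

        import numpy as np
        import cvxpy as cp   # assumed available convex optimization package

        rng = np.random.default_rng(1)
        n, m = 100, 40
        x_true = np.zeros(n)
        x_true[30:55] = 1.0                             # one "extended source"
        L = rng.standard_normal((m, n))                 # toy forward (lead-field) matrix
        y = L @ x_true + 0.01 * rng.standard_normal(m)  # noisy measurements

        x = cp.Variable(n)
        lam = 0.5
        objective = cp.Minimize(cp.sum_squares(L @ x - y) + lam * cp.norm1(cp.diff(x)))
        cp.Problem(objective).solve()

        x_hat = x.value   # piecewise-constant estimate; its jumps locate the source boundaries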

  8. Measurement of Front Curvature and Detonation Velocity for a Nonideal Heterogeneous Explosive in Axisymmetric and Two-Dimensional Geometries

    NASA Astrophysics Data System (ADS)

    Higgins, Andrew

    2009-06-01

    Detonation in a heterogeneous explosive with a relatively sparse concentration of reaction centers ("hot spots") is investigated experimentally. The explosive system considered is nitromethane gelled with PMMA and with glass microballoons (GMB's) in suspension. The detonation velocity is measured as a function of the characteristic charge dimension (diameter or thickness) in both axisymmetric and two-dimensional planar geometries. The use of a unique, annular charge geometry (with the diameter of the annulus much greater than the annular gap thickness) permits quasi-two-dimensional detonations to be observed without the undesirable lateral rarefactions that result from a finite aspect ratio. The detonation front curvature is also measured directly using an electronic streak camera. The results confirm the prior findings of Gois et al. (1996), which showed that, for a low concentration of GMB's, detonation propagation does not exhibit the expected 2:1 scaling from axisymmetric to planar geometries. This reinforces the idea that detonation in highly nonideal explosives is not governed exclusively by front curvature.

  9. Improving EEG-Based Driver Fatigue Classification Using Sparse-Deep Belief Networks.

    PubMed

    Chai, Rifai; Ling, Sai Ho; San, Phyo Phyo; Naik, Ganesh R; Nguyen, Tuan N; Tran, Yvonne; Craig, Ashley; Nguyen, Hung T

    2017-01-01

    This paper presents an improvement in classification performance for electroencephalography (EEG)-based driver fatigue classification between fatigue and alert states, with data collected from 43 participants. The system employs autoregressive (AR) modeling as the feature extraction algorithm and sparse-deep belief networks (sparse-DBN) as the classification algorithm. Compared to other classifiers, sparse-DBN is a semi-supervised learning method that combines unsupervised learning for modeling features in the pre-training layer and supervised learning for classification in the following layer. The sparsity in sparse-DBN is achieved with a regularization term that penalizes a deviation of the expected activation of hidden units from a fixed low level; this prevents the network from overfitting and enables it to learn low-level as well as high-level structures. For comparison, artificial neural network (ANN), Bayesian neural network (BNN), and original deep belief network (DBN) classifiers are used. The classification results show that using the AR feature extractor and the DBN classifier, the system achieves a sensitivity of 90.8%, a specificity of 90.4%, an accuracy of 90.6%, and an area under the receiver operating characteristic curve (AUROC) of 0.94, compared to the ANN (sensitivity of 80.8%, specificity of 77.8%, accuracy of 79.3%, AUROC of 0.83) and BNN classifiers (sensitivity of 84.3%, specificity of 83%, accuracy of 83.6%, AUROC of 0.87). Using the sparse-DBN classifier, the classification performance improves further, with a sensitivity of 93.9%, a specificity of 92.3%, and an accuracy of 93.1% with an AUROC of 0.96. Overall, the sparse-DBN classifier improves accuracy by 13.8, 9.5, and 2.5% over the ANN, BNN, and DBN classifiers, respectively.
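
    The sparsity term described above (penalizing deviation of each hidden unit's expected activation from a fixed low target) is commonly implemented as a KL-divergence penalty in sparse autoencoders and related models; the sketch below shows that penalty in isolation. The target value, array sizes, and activations are illustrative, not taken from the paper.

        import numpy as np

        def kl_sparsity_penalty(hidden_activations, rho=0.05, eps=1e-8):
            """hidden_activations: (n_samples, n_hidden) array of activations in (0, 1)."""
            rho_hat = hidden_activations.mean(axis=0).clip(eps, 1 - eps)  # expected activation per unit
            kl = rho * np.log(rho / rho_hat) + (1 - rho) * np.log((1 - rho) / (1 - rho_hat))
            return kl.sum()   # added (times a weight) to the training objective

        acts = np.random.default_rng(0).uniform(0.0, 1.0, size=(64, 100))
        print(kl_sparsity_penalty(acts))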

  10. Improving EEG-Based Driver Fatigue Classification Using Sparse-Deep Belief Networks

    PubMed Central

    Chai, Rifai; Ling, Sai Ho; San, Phyo Phyo; Naik, Ganesh R.; Nguyen, Tuan N.; Tran, Yvonne; Craig, Ashley; Nguyen, Hung T.

    2017-01-01

    This paper presents an improvement in classification performance for electroencephalography (EEG)-based driver fatigue classification between fatigue and alert states, with data collected from 43 participants. The system employs autoregressive (AR) modeling as the feature extraction algorithm and sparse-deep belief networks (sparse-DBN) as the classification algorithm. Compared to other classifiers, sparse-DBN is a semi-supervised learning method that combines unsupervised learning for modeling features in the pre-training layer and supervised learning for classification in the following layer. The sparsity in sparse-DBN is achieved with a regularization term that penalizes a deviation of the expected activation of hidden units from a fixed low level; this prevents the network from overfitting and enables it to learn low-level as well as high-level structures. For comparison, artificial neural network (ANN), Bayesian neural network (BNN), and original deep belief network (DBN) classifiers are used. The classification results show that using the AR feature extractor and the DBN classifier, the system achieves a sensitivity of 90.8%, a specificity of 90.4%, an accuracy of 90.6%, and an area under the receiver operating characteristic curve (AUROC) of 0.94, compared to the ANN (sensitivity of 80.8%, specificity of 77.8%, accuracy of 79.3%, AUROC of 0.83) and BNN classifiers (sensitivity of 84.3%, specificity of 83%, accuracy of 83.6%, AUROC of 0.87). Using the sparse-DBN classifier, the classification performance improves further, with a sensitivity of 93.9%, a specificity of 92.3%, and an accuracy of 93.1% with an AUROC of 0.96. Overall, the sparse-DBN classifier improves accuracy by 13.8, 9.5, and 2.5% over the ANN, BNN, and DBN classifiers, respectively. PMID:28326009

  11. Sparsistency and Rates of Convergence in Large Covariance Matrix Estimation.

    PubMed

    Lam, Clifford; Fan, Jianqing

    2009-01-01

    This paper studies the sparsistency and rates of convergence for estimating sparse covariance and precision matrices based on penalized likelihood with nonconvex penalty functions. Here, sparsistency refers to the property that all parameters that are zero are actually estimated as zero with probability tending to one. Depending on the application, sparsity may be assumed a priori on the covariance matrix, its inverse, or its Cholesky decomposition. We study these three sparsity exploration problems under a unified framework with a general penalty function. We show that the rates of convergence for these problems under the Frobenius norm are of order (s_n log p_n / n)^{1/2}, where s_n is the number of nonzero elements, p_n is the size of the covariance matrix, and n is the sample size. This explicitly spells out that the contribution of high dimensionality is merely a logarithmic factor. The conditions on the rate at which the tuning parameter λ_n goes to 0 are made explicit and compared under different penalties. As a result, for the L_1 penalty, to guarantee sparsistency and the optimal rate of convergence, the number of nonzero elements should be small: s_n' = O(p_n) at most, among the O(p_n^2) parameters, for estimating a sparse covariance or correlation matrix, a sparse precision or inverse correlation matrix, or a sparse Cholesky factor, where s_n' is the number of nonzero off-diagonal elements. On the other hand, using the SCAD or hard-thresholding penalty functions, there is no such restriction.

  12. Extrapolating cetacean densities to quantitatively assess human impacts on populations in the high seas.

    PubMed

    Mannocci, Laura; Roberts, Jason J; Miller, David L; Halpin, Patrick N

    2017-06-01

    As human activities expand beyond national jurisdictions to the high seas, there is an increasing need to consider anthropogenic impacts to species inhabiting these waters. The current scarcity of scientific observations of cetaceans in the high seas impedes the assessment of population-level impacts of these activities. We developed plausible density estimates to facilitate a quantitative assessment of anthropogenic impacts on cetacean populations in these waters. Our study region extended from a well-surveyed region within the U.S. Exclusive Economic Zone into a large region of the western North Atlantic sparsely surveyed for cetaceans. We modeled densities of 15 cetacean taxa with available line transect survey data and habitat covariates and extrapolated predictions to sparsely surveyed regions. We formulated models to reduce the extent of extrapolation beyond covariate ranges, and constrained them to model simple and generalizable relationships. To evaluate confidence in the predictions, we mapped where predictions were made outside sampled covariate ranges, examined alternate models, and compared predicted densities with maps of sightings from sources that could not be integrated into our models. Confidence levels in model results depended on the taxon and geographic area and highlighted the need for additional surveying in environmentally distinct areas. With application of necessary caution, our density estimates can inform management needs in the high seas, such as the quantification of potential cetacean interactions with military training exercises, shipping, fisheries, and deep-sea mining and be used to delineate areas of special biological significance in international waters. Our approach is generally applicable to other marine taxa and geographic regions for which management will be implemented but data are sparse. © 2016 The Authors. Conservation Biology published by Wiley Periodicals, Inc. on behalf of Society for Conservation Biology.

  13. EPR oximetry in three spatial dimensions using sparse spin distribution

    NASA Astrophysics Data System (ADS)

    Som, Subhojit; Potter, Lee C.; Ahmad, Rizwan; Vikram, Deepti S.; Kuppusamy, Periannan

    2008-08-01

    A method is presented to use continuous wave electron paramagnetic resonance imaging for rapid measurement of oxygen partial pressure in three spatial dimensions. A particulate paramagnetic probe is employed to create a sparse distribution of spins in a volume of interest. Information encoding location and spectral linewidth is collected by varying the spatial orientation and strength of an applied magnetic gradient field. Data processing exploits the spatial sparseness of spins to detect voxels with nonzero spin and to estimate the spectral linewidth for those voxels. The parsimonious representation of spin locations and linewidths permits an order of magnitude reduction in data acquisition time, compared to four-dimensional tomographic reconstruction using traditional spectral-spatial imaging. The proposed oximetry method is experimentally demonstrated for a lithium octa-n-butoxy naphthalocyanine (LiNc-BuO) probe using an L-band EPR spectrometer.

  14. Efficient Bayesian parameter estimation with implicit sampling and surrogate modeling for a vadose zone hydrological problem

    NASA Astrophysics Data System (ADS)

    Liu, Y.; Pau, G. S. H.; Finsterle, S.

    2015-12-01

    Parameter inversion involves inferring the model parameter values based on sparse observations of some observables. To infer the posterior probability distributions of the parameters, Markov chain Monte Carlo (MCMC) methods are typically used. However, the large number of forward simulations needed and limited computational resources limit the complexity of the hydrological model we can use in these methods. In view of this, we studied the implicit sampling (IS) method, an efficient importance sampling technique that generates samples in the high-probability region of the posterior distribution and thus reduces the number of forward simulations that we need to run. For a pilot-point inversion of a heterogeneous permeability field based on a synthetic ponded infiltration experiment simulated with TOUGH2 (a subsurface modeling code), we showed that IS with a linear map provides an accurate Bayesian description of the parameterized permeability field at the pilot points with just approximately 500 forward simulations. We further studied the use of surrogate models to improve the computational efficiency of parameter inversion. We implemented two reduced-order models (ROMs) for the TOUGH2 forward model. One is based on polynomial chaos expansion (PCE), of which the coefficients are obtained using the sparse Bayesian learning technique to mitigate the "curse of dimensionality" of the PCE terms. The other model is Gaussian process regression (GPR), for which different covariance, likelihood and inference models are considered. Preliminary results indicate that ROMs constructed based on the prior parameter space perform poorly. It is thus impractical to replace this hydrological model by a ROM directly in an MCMC method. However, the IS method can work with a ROM constructed for parameters in the close vicinity of the maximum a posteriori probability (MAP) estimate. We will discuss the accuracy and computational efficiency of using ROMs in the implicit sampling procedure for the hydrological problem considered. This work was supported, in part, by the U.S. Dept. of Energy under Contract No. DE-AC02-05CH11231.
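
    A minimal sketch of one of the reduced-order models mentioned, a Gaussian-process-regression surrogate, built with scikit-learn from a handful of forward-model runs. The forward_model function, parameter ranges, and kernel choice are illustrative placeholders; the actual TOUGH2 model and its parameterization are not reproduced.

        import numpy as np
        from sklearn.gaussian_process import GaussianProcessRegressor
        from sklearn.gaussian_process.kernels import RBF, WhiteKernel

        def forward_model(theta):                       # stand-in for one expensive simulation
            return np.sin(theta).sum(axis=1)

        rng = np.random.default_rng(0)
        theta_train = rng.uniform(-1, 1, size=(50, 3))  # sampled parameter sets
        y_train = forward_model(theta_train)            # corresponding model outputs

        gpr = GaussianProcessRegressor(kernel=RBF() + WhiteKernel(), normalize_y=True)
        gpr.fit(theta_train, y_train)

        theta_new = rng.uniform(-1, 1, size=(5, 3))
        y_pred, y_std = gpr.predict(theta_new, return_std=True)  # cheap surrogate evaluations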

  15. An approach to predict water quality in data-sparse catchments using hydrological catchment similarity

    NASA Astrophysics Data System (ADS)

    Pohle, Ina; Glendell, Miriam; Stutter, Marc I.; Helliwell, Rachel C.

    2017-04-01

    An understanding of catchment response to climate and land use change at a regional scale is necessary for the assessment of mitigation and adaptation options addressing diffuse nutrient pollution. It is well documented that the physicochemical properties of a river ecosystem respond to change in a non-linear fashion. This is particularly important when threshold water concentrations, relevant to national and EU legislation, are exceeded. Large scale (regional) model assessments required for regulatory purposes must represent the key processes and mechanisms that are more readily understood in catchments with water quantity and water quality data monitored at high spatial and temporal resolution. While daily discharge data are available for most catchments in Scotland, nitrate and phosphorus are mostly available on a monthly basis only, as typified by regulatory monitoring. However, high resolution (hourly to daily) water quantity and water quality data exist for a limited number of research catchments. To successfully implement adaptation measures across Scotland, an upscaling from data-rich to data-sparse catchments is required. In addition, the widespread availability of spatial datasets affecting hydrological and biogeochemical responses (e.g. soils, topography/geomorphology, land use, vegetation etc.) provide an opportunity to transfer predictions between data-rich and data-sparse areas by linking processes and responses to catchment attributes. Here, we develop a framework of catchment typologies as a prerequisite for transferring information from data-rich to data-sparse catchments by focusing on how hydrological catchment similarity can be used as an indicator of grouped behaviours in water quality response. As indicators of hydrological catchment similarity we use flow indices derived from observed discharge data across Scotland as well as hydrological model parameters. For the latter, we calibrated the lumped rainfall-runoff model TUWModel using multiple objective functions. The relationships between indicators of hydrological catchment similarity, physical catchment characteristics and nitrate and phosphorus concentrations in rivers are then investigated using multivariate statistics. This understanding of the relationship between catchment characteristics, hydrological processes and water quality will allow us to implement more efficient regulatory water quality monitoring strategies, to improve existing water quality models and to model mitigation and adaptation scenarios to global change in data-sparse catchments.

  16. Low-dimensional spike rate models derived from networks of adaptive integrate-and-fire neurons: Comparison and implementation.

    PubMed

    Augustin, Moritz; Ladenbauer, Josef; Baumann, Fabian; Obermayer, Klaus

    2017-06-01

    The spiking activity of single neurons can be well described by a nonlinear integrate-and-fire model that includes somatic adaptation. When exposed to fluctuating inputs, sparsely coupled populations of these model neurons exhibit stochastic collective dynamics that can be effectively characterized using the Fokker-Planck equation. This approach, however, leads to a model with an infinite-dimensional state space and non-standard boundary conditions. Here we derive from that description four simple models for the spike rate dynamics in terms of low-dimensional ordinary differential equations, using two different reduction techniques: one uses the spectral decomposition of the Fokker-Planck operator, the other is based on a cascade of two linear filters and a nonlinearity, which are determined from the Fokker-Planck equation and semi-analytically approximated. We evaluate the reduced models for a wide range of biologically plausible input statistics and find that both approximation approaches lead to spike rate models that accurately reproduce the spiking behavior of the underlying adaptive integrate-and-fire population. In particular, the cascade-based models are overall the most accurate and robust, especially in the sensitive region of rapidly changing input. For the mean-driven regime, when input fluctuations are not too strong and fast, however, the best performing model is based on the spectral decomposition. The low-dimensional models also reproduce well the stable oscillatory spike rate dynamics that are generated either by recurrent synaptic excitation and neuronal adaptation or through delayed inhibitory synaptic feedback. The computational demands of the reduced models are very low, but the implementation complexity differs between the different model variants. Therefore we have made available implementations that allow one to numerically integrate the low-dimensional spike rate models, as well as the Fokker-Planck partial differential equation, in efficient ways for arbitrary model parametrizations as open source software. The derived spike rate descriptions retain a direct link to the properties of single neurons, allow for convenient mathematical analyses of network states, and are well suited for application in neural mass/mean-field based brain network models.

  17. Low-dimensional spike rate models derived from networks of adaptive integrate-and-fire neurons: Comparison and implementation

    PubMed Central

    Baumann, Fabian; Obermayer, Klaus

    2017-01-01

    The spiking activity of single neurons can be well described by a nonlinear integrate-and-fire model that includes somatic adaptation. When exposed to fluctuating inputs, sparsely coupled populations of these model neurons exhibit stochastic collective dynamics that can be effectively characterized using the Fokker-Planck equation. This approach, however, leads to a model with an infinite-dimensional state space and non-standard boundary conditions. Here we derive from that description four simple models for the spike rate dynamics in terms of low-dimensional ordinary differential equations, using two different reduction techniques: one uses the spectral decomposition of the Fokker-Planck operator, the other is based on a cascade of two linear filters and a nonlinearity, which are determined from the Fokker-Planck equation and semi-analytically approximated. We evaluate the reduced models for a wide range of biologically plausible input statistics and find that both approximation approaches lead to spike rate models that accurately reproduce the spiking behavior of the underlying adaptive integrate-and-fire population. In particular, the cascade-based models are overall the most accurate and robust, especially in the sensitive region of rapidly changing input. For the mean-driven regime, when input fluctuations are not too strong and fast, however, the best performing model is based on the spectral decomposition. The low-dimensional models also reproduce well the stable oscillatory spike rate dynamics that are generated either by recurrent synaptic excitation and neuronal adaptation or through delayed inhibitory synaptic feedback. The computational demands of the reduced models are very low, but the implementation complexity differs between the different model variants. Therefore we have made available implementations that allow one to numerically integrate the low-dimensional spike rate models, as well as the Fokker-Planck partial differential equation, in efficient ways for arbitrary model parametrizations as open source software. The derived spike rate descriptions retain a direct link to the properties of single neurons, allow for convenient mathematical analyses of network states, and are well suited for application in neural mass/mean-field based brain network models. PMID:28644841

  18. Estimating patient-specific and anatomically correct reference model for craniomaxillofacial deformity via sparse representation

    PubMed Central

    Wang, Li; Ren, Yi; Gao, Yaozong; Tang, Zhen; Chen, Ken-Chung; Li, Jianfu; Shen, Steve G. F.; Yan, Jin; Lee, Philip K. M.; Chow, Ben; Xia, James J.; Shen, Dinggang

    2015-01-01

    Purpose: A significant number of patients in the United States suffer from craniomaxillofacial (CMF) deformity and require CMF surgery. The success of CMF surgery depends not only on the surgical techniques but also on an accurate surgical plan. However, surgical planning for CMF surgery is challenging due to the absence of a patient-specific reference model. Currently, the outcome of the surgery is often subjective and highly dependent on the surgeon's experience. In this paper, the authors present an automatic method to estimate an anatomically correct reference shape of the jaws for orthognathic surgery, a common type of CMF surgery. Methods: To estimate a patient-specific jaw reference model, the authors use a data-driven method based on sparse shape composition. Given a dictionary of normal subjects, the authors first use sparse representation to represent the midface of a patient by the midfaces of the normal subjects in the dictionary. Then, the derived sparse coefficients are used to reconstruct a patient-specific reference jaw shape. Results: The authors have validated the proposed method on both synthetic and real patient data. Experimental results show that the authors' method can effectively reconstruct the normal jaw shape for patients. Conclusions: The authors have presented a novel method to automatically estimate a patient-specific reference model for patients suffering from CMF deformity. PMID:26429255

  19. A new sampling scheme for developing metamodels with the zeros of Chebyshev polynomials

    NASA Astrophysics Data System (ADS)

    Wu, Jinglai; Luo, Zhen; Zhang, Nong; Zhang, Yunqing

    2015-09-01

    The accuracy of metamodelling is determined by both the sampling and approximation. This article proposes a new sampling method based on the zeros of Chebyshev polynomials to capture the sampling information effectively. First, the zeros of one-dimensional Chebyshev polynomials are applied to construct Chebyshev tensor product (CTP) sampling, and the CTP is then used to construct high-order multi-dimensional metamodels using the 'hypercube' polynomials. Secondly, the CTP sampling is further enhanced to develop Chebyshev collocation method (CCM) sampling, to construct the 'simplex' polynomials. The samples of CCM are randomly and directly chosen from the CTP samples. Two widely studied sampling methods, namely the Smolyak sparse grid and Hammersley, are used to demonstrate the effectiveness of the proposed sampling method. Several numerical examples are utilized to validate the approximation accuracy of the proposed metamodel under different dimensions.
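
    A short sketch of the building block described above: the zeros of the degree-n Chebyshev polynomial of the first kind on [-1, 1], and a two-dimensional Chebyshev tensor-product (CTP) grid formed from them. The dimension and polynomial degree are illustrative choices.

        import numpy as np

        def chebyshev_zeros(n):
            """Zeros of the Chebyshev polynomial T_n(x) on [-1, 1]."""
            k = np.arange(1, n + 1)
            return np.cos((2 * k - 1) * np.pi / (2 * n))

        x1 = chebyshev_zeros(5)
        x2 = chebyshev_zeros(5)
        ctp_grid = np.array([(a, b) for a in x1 for b in x2])   # 25 tensor-product samples
        print(ctp_grid.shape)                                   # (25, 2)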

  20. A new global approach to obtain three-dimensional displacement maps by integrating GPS and DInSAR data

    NASA Astrophysics Data System (ADS)

    Guglielmino, F.; Nunnari, G.; Puglisi, G.; Spata, A.

    2009-04-01

    We propose a new technique, based on elastic theory, to efficiently produce an estimate of three-dimensional surface displacement maps by integrating sparse Global Positioning System (GPS) measurements of deformation and Differential Interferometric Synthetic Aperture Radar (DInSAR) maps of movements of the Earth's surface. The previous methodologies known in the literature for combining data from GPS and DInSAR surveys require two steps: in the first, sparse GPS measurements are interpolated in order to fill in GPS displacements on the DInSAR grid; in the second, the three-dimensional surface displacement maps are estimated using a suitable optimization technique. One of the advantages of the proposed approach is that both steps are unified. We propose a linear matrix equation which accounts for both GPS and DInSAR data and whose solution simultaneously provides the strain tensor, the displacement field and the rigid body rotation tensor throughout the entire investigated area. This linear matrix equation is solved using Weighted Least Squares (WLS), thus assuring both numerical robustness and high computational efficiency. The proposed methodology was tested on both synthetic and experimental data, the latter from GPS and DInSAR measurements carried out on Mt. Etna. The goodness of the results has been evaluated using standard errors. These tests also allow optimising the choice of specific parameters of the algorithm. The "open" structure of the method will allow other available data sets, such as additional interferograms or other geodetic data (e.g. levelling, tilt, etc.), to be taken into account in the near future, in order to achieve even higher accuracy.
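
    A generic weighted-least-squares solve of the kind used for the joint inversion, sketched with numpy. The design matrix, the unknowns, and the two noise levels (standing in for GPS versus DInSAR uncertainties) are illustrative placeholders; the actual matrix encoding the strain tensor, displacement field and rotation tensor is not reproduced.

        import numpy as np

        rng = np.random.default_rng(0)
        A = rng.standard_normal((30, 6))     # stacked observation equations (GPS + DInSAR)
        x_true = rng.standard_normal(6)      # unknowns (e.g. strain/displacement terms)
        sigma = np.r_[0.005 * np.ones(15), 0.02 * np.ones(15)]   # per-observation std devs
        b = A @ x_true + sigma * rng.standard_normal(30)

        W = np.diag(1.0 / sigma**2)          # weights = inverse variances
        x_hat = np.linalg.solve(A.T @ W @ A, A.T @ W @ b)        # WLS estimate
        print(np.abs(x_hat - x_true).max())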

  1. Bidirectional Elastic Image Registration Using B-Spline Affine Transformation

    PubMed Central

    Gu, Suicheng; Meng, Xin; Sciurba, Frank C.; Wang, Chen; Kaminski, Naftali; Pu, Jiantao

    2014-01-01

    A registration scheme termed as B-spline affine transformation (BSAT) is presented in this study to elastically align two images. We define an affine transformation instead of the traditional translation at each control point. Mathematically, BSAT is a generalized form of the affine transformation and the traditional B-Spline transformation (BST). In order to improve the performance of the iterative closest point (ICP) method in registering two homologous shapes but with large deformation, a bi-directional instead of the traditional unidirectional objective / cost function is proposed. In implementation, the objective function is formulated as a sparse linear equation problem, and a sub-division strategy is used to achieve a reasonable efficiency in registration. The performance of the developed scheme was assessed using both two-dimensional (2D) synthesized dataset and three-dimensional (3D) volumetric computed tomography (CT) data. Our experiments showed that the proposed B-spline affine model could obtain reasonable registration accuracy. PMID:24530210

  2. A Multiobjective Sparse Feature Learning Model for Deep Neural Networks.

    PubMed

    Gong, Maoguo; Liu, Jia; Li, Hao; Cai, Qing; Su, Linzhi

    2015-12-01

    Hierarchical deep neural networks are currently popular learning models for imitating the hierarchical architecture of the human brain. Single-layer feature extractors are the bricks used to build deep networks. Sparse feature learning models are popular models that can learn useful representations, but most of them need a user-defined constant to control the sparsity of the representations. In this paper, we propose a multiobjective sparse feature learning model based on the autoencoder. The parameters of the model are learnt by optimizing two objectives, the reconstruction error and the sparsity of the hidden units, simultaneously, to find a reasonable compromise between them automatically. We design a multiobjective induced learning procedure for this model based on a multiobjective evolutionary algorithm. In the experiments, we demonstrate that the learning procedure is effective, and that the proposed multiobjective model can learn useful sparse features.

  3. Low-rank and Adaptive Sparse Signal (LASSI) Models for Highly Accelerated Dynamic Imaging

    PubMed Central

    Ravishankar, Saiprasad; Moore, Brian E.; Nadakuditi, Raj Rao; Fessler, Jeffrey A.

    2017-01-01

    Sparsity-based approaches have been popular in many applications in image processing and imaging. Compressed sensing exploits the sparsity of images in a transform domain or dictionary to improve image recovery from undersampled measurements. In the context of inverse problems in dynamic imaging, recent research has demonstrated the promise of sparsity and low-rank techniques. For example, the patches of the underlying data are modeled as sparse in an adaptive dictionary domain, and the resulting image and dictionary estimation from undersampled measurements is called dictionary-blind compressed sensing, or the dynamic image sequence is modeled as a sum of low-rank and sparse (in some transform domain) components (L+S model) that are estimated from limited measurements. In this work, we investigate a data-adaptive extension of the L+S model, dubbed LASSI, where the temporal image sequence is decomposed into a low-rank component and a component whose spatiotemporal (3D) patches are sparse in some adaptive dictionary domain. We investigate various formulations and efficient methods for jointly estimating the underlying dynamic signal components and the spatiotemporal dictionary from limited measurements. We also obtain efficient sparsity penalized dictionary-blind compressed sensing methods as special cases of our LASSI approaches. Our numerical experiments demonstrate the promising performance of LASSI schemes for dynamic magnetic resonance image reconstruction from limited k-t space data compared to recent methods such as k-t SLR and L+S, and compared to the proposed dictionary-blind compressed sensing method. PMID:28092528

  4. Low-Rank and Adaptive Sparse Signal (LASSI) Models for Highly Accelerated Dynamic Imaging.

    PubMed

    Ravishankar, Saiprasad; Moore, Brian E; Nadakuditi, Raj Rao; Fessler, Jeffrey A

    2017-05-01

    Sparsity-based approaches have been popular in many applications in image processing and imaging. Compressed sensing exploits the sparsity of images in a transform domain or dictionary to improve image recovery from undersampled measurements. In the context of inverse problems in dynamic imaging, recent research has demonstrated the promise of sparsity and low-rank techniques. For example, the patches of the underlying data are modeled as sparse in an adaptive dictionary domain, and the resulting image and dictionary estimation from undersampled measurements is called dictionary-blind compressed sensing, or the dynamic image sequence is modeled as a sum of low-rank and sparse (in some transform domain) components (L+S model) that are estimated from limited measurements. In this work, we investigate a data-adaptive extension of the L+S model, dubbed LASSI, where the temporal image sequence is decomposed into a low-rank component and a component whose spatiotemporal (3D) patches are sparse in some adaptive dictionary domain. We investigate various formulations and efficient methods for jointly estimating the underlying dynamic signal components and the spatiotemporal dictionary from limited measurements. We also obtain efficient sparsity penalized dictionary-blind compressed sensing methods as special cases of our LASSI approaches. Our numerical experiments demonstrate the promising performance of LASSI schemes for dynamic magnetic resonance image reconstruction from limited k-t space data compared to recent methods such as k-t SLR and L+S, and compared to the proposed dictionary-blind compressed sensing method.

  5. Exploring nonlinear feature space dimension reduction and data representation in breast Cadx with Laplacian eigenmaps and t-SNE.

    PubMed

    Jamieson, Andrew R; Giger, Maryellen L; Drukker, Karen; Li, Hui; Yuan, Yading; Bhooshan, Neha

    2010-01-01

    In this preliminary study, recently developed unsupervised nonlinear dimension reduction (DR) and data representation techniques were applied to computer-extracted breast lesion feature spaces across three separate imaging modalities: ultrasound (U.S.) with 1126 cases, dynamic contrast-enhanced magnetic resonance imaging with 356 cases, and full-field digital mammography with 245 cases. Two methods for nonlinear DR were explored: Laplacian eigenmaps [M. Belkin and P. Niyogi, "Laplacian eigenmaps for dimensionality reduction and data representation," Neural Comput. 15, 1373-1396 (2003)] and t-distributed stochastic neighbor embedding (t-SNE) [L. van der Maaten and G. Hinton, "Visualizing data using t-SNE," J. Mach. Learn. Res. 9, 2579-2605 (2008)]. These methods attempt to map originally high-dimensional feature spaces to more human-interpretable lower-dimensional spaces while preserving both local and global information. The properties of these methods as applied to breast computer-aided diagnosis (CADx) were evaluated in the context of malignancy classification performance as well as in the visual inspection of the sparseness within the two-dimensional and three-dimensional mappings. Classification performance was estimated by using the reduced-dimension mapped feature output as input into both linear and nonlinear classifiers: a Markov chain Monte Carlo based Bayesian artificial neural network (MCMC-BANN) and linear discriminant analysis. The new techniques were compared to previously developed breast CADx methodologies, including automatic relevance determination and linear stepwise (LSW) feature selection, as well as a linear DR method based on principal component analysis. Using ROC analysis and 0.632+ bootstrap validation, 95% empirical confidence intervals were computed for each classifier's AUC performance. In the large U.S. data set, representative high-performance results include AUC_0.632+ = 0.88 with 95% empirical bootstrap interval [0.787;0.895] for 13 ARD-selected features and AUC_0.632+ = 0.87 with interval [0.817;0.906] for four LSW-selected features, compared to a 4D t-SNE mapping (from the original 81D feature space) giving AUC_0.632+ = 0.90 with interval [0.847;0.919], all using the MCMC-BANN. Preliminary results appear to indicate that the new methods can match or exceed the classification performance of current advanced breast lesion CADx algorithms. While not appropriate as a complete replacement of feature selection in CADx problems, DR techniques offer a complementary approach, which can aid elucidation of additional properties associated with the data. Specifically, the new techniques were shown to possess the added benefit of delivering sparse lower-dimensional representations for visual interpretation, revealing intricate data structure of the feature space.
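
    A minimal scikit-learn sketch of the two nonlinear DR methods compared above (SpectralEmbedding implements Laplacian eigenmaps). The feature matrix is a random placeholder for the computer-extracted lesion features, and the 81-feature, 3-component setup is only illustrative.

        import numpy as np
        from sklearn.manifold import TSNE, SpectralEmbedding

        rng = np.random.default_rng(0)
        features = rng.standard_normal((300, 81))            # placeholder CADx feature vectors

        low_d_tsne = TSNE(n_components=3, random_state=0).fit_transform(features)
        low_d_lapl = SpectralEmbedding(n_components=3).fit_transform(features)

        # The mapped coordinates would then be fed to a classifier (e.g. LDA or a BANN).
        print(low_d_tsne.shape, low_d_lapl.shape)             # (300, 3) (300, 3)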

  6. Total variation-based method for radar coincidence imaging with model mismatch for extended target

    NASA Astrophysics Data System (ADS)

    Cao, Kaicheng; Zhou, Xiaoli; Cheng, Yongqiang; Fan, Bo; Qin, Yuliang

    2017-11-01

    Originating from traditional optical coincidence imaging, radar coincidence imaging (RCI) is a staring/forward-looking imaging technique. In RCI, the reference matrix must be computed precisely to reconstruct the image as desired; unfortunately, such precision is almost impossible due to the existence of model mismatch in practical applications. Although some conventional sparse recovery algorithms have been proposed to solve the model-mismatch problem, they are inapplicable to nonsparse targets. We therefore derive the signal model of RCI with model mismatch by replacing the sparsity constraint term with total variation (TV) regularization in the sparse total least squares optimization problem; in this manner, we obtain the objective function of RCI with model mismatch for an extended target. A more robust and efficient algorithm called TV-TLS is proposed, in which the objective function is divided into two parts and the perturbation matrix and scattering coefficients are updated alternately. Moreover, due to the ability of TV regularization to recover sparse signals or images with sparse gradients, the TV-TLS method is also applicable to sparse recovery. Results of numerical experiments demonstrate that, for uniform extended targets, sparse targets, and real extended targets, the algorithm achieves favorable imaging performance both in suppressing noise and in adapting to model mismatch.

  7. Exponential series approaches for nonparametric graphical models

    NASA Astrophysics Data System (ADS)

    Janofsky, Eric

    Markov Random Fields (MRFs) or undirected graphical models are parsimonious representations of joint probability distributions. This thesis studies high-dimensional, continuous-valued pairwise Markov Random Fields. We are particularly interested in approximating pairwise densities whose logarithm belongs to a Sobolev space. For this problem we propose the method of exponential series which approximates the log density by a finite-dimensional exponential family with the number of sufficient statistics increasing with the sample size. We consider two approaches to estimating these models. The first is regularized maximum likelihood. This involves optimizing the sum of the log-likelihood of the data and a sparsity-inducing regularizer. We then propose a variational approximation to the likelihood based on tree-reweighted, nonparametric message passing. This approximation allows for upper bounds on risk estimates, leverages parallelization and is scalable to densities on hundreds of nodes. We show how the regularized variational MLE may be estimated using a proximal gradient algorithm. We then consider estimation using regularized score matching. This approach uses an alternative scoring rule to the log-likelihood, which obviates the need to compute the normalizing constant of the distribution. For general continuous-valued exponential families, we provide parameter and edge consistency results. As a special case we detail a new approach to sparse precision matrix estimation which has statistical performance competitive with the graphical lasso and computational performance competitive with the state-of-the-art glasso algorithm. We then describe results for model selection in the nonparametric pairwise model using exponential series. The regularized score matching problem is shown to be a convex program; we provide scalable algorithms based on consensus alternating direction method of multipliers (ADMM) and coordinate-wise descent. We use simulations to compare our method to others in the literature as well as the aforementioned TRW estimator.
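
    For reference, the sparse precision-matrix baseline mentioned above (the graphical lasso) can be sketched in a few lines with scikit-learn; the data, dimension, and penalty value are illustrative, and the thesis's nonparametric exponential-series estimator itself is not implemented here.

        import numpy as np
        from sklearn.covariance import GraphicalLasso

        rng = np.random.default_rng(0)
        X = rng.multivariate_normal(mean=np.zeros(5), cov=np.eye(5), size=500)

        gl = GraphicalLasso(alpha=0.1).fit(X)
        precision = gl.precision_                        # sparse inverse covariance estimate
        print(np.count_nonzero(np.abs(precision) > 1e-8))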

  8. Design principles of the sparse coding network and the role of “sister cells” in the olfactory system of Drosophila

    PubMed Central

    Zhang, Danke; Li, Yuanqing; Wu, Si; Rasch, Malte J.

    2013-01-01

    Sensory systems face the challenge of representing sensory inputs in a way that allows easy readout of sensory information by higher brain areas. In the olfactory system of the fly Drosophila melanogaster, projection neurons (PNs) of the antennal lobe (AL) convert a dense activation of glomeruli into a sparse, high-dimensional firing pattern of Kenyon cells (KCs) in the mushroom body (MB). Here we investigate the design principles of the olfactory system of Drosophila with regard to its capability to discriminate odor quality from the MB representation and its robustness to different types of noise. We focus on understanding the role of highly correlated homotypic projection neurons (“sister cells”) found in the glomeruli of flies. These cells are coupled by gap junctions and receive almost identical sensory inputs, but randomly target different KCs in the MB. We show that sister cells might play a crucial role in increasing the robustness of the MB odor representation to noise. Computationally, sister cells thus might help the system to improve its generalization capabilities in the face of noise without impairing the discriminability of odor quality at the same time. PMID:24167488

  9. Simulation-based hypothesis testing of high dimensional means under covariance heterogeneity.

    PubMed

    Chang, Jinyuan; Zheng, Chao; Zhou, Wen-Xin; Zhou, Wen

    2017-12-01

    In this article, we study the problem of testing the mean vectors of high dimensional data in both one-sample and two-sample cases. The proposed testing procedures employ maximum-type statistics and parametric bootstrap techniques to compute the critical values. Different from existing tests that rely heavily on structural conditions on the unknown covariance matrices, the proposed tests allow general covariance structures of the data and therefore enjoy a wide scope of applicability in practice. To enhance the powers of the tests against sparse alternatives, we further propose two-step procedures with a preliminary feature screening step. Theoretical properties of the proposed tests are investigated. Through extensive numerical experiments on synthetic data sets and a human acute lymphoblastic leukemia gene expression data set, we illustrate the performance of the new tests and how they may provide assistance in detecting disease-associated gene sets. The proposed methods have been implemented in the R package HDtest and are available on CRAN. © 2017, The International Biometric Society.
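
    A toy one-sample version of the max-type idea, sketched in numpy. The critical value here is calibrated with a Gaussian multiplier bootstrap, used only as a simple stand-in for the parametric bootstrap calibration described in the paper; the data, dimensions, and signal strength are illustrative.

        import numpy as np

        rng = np.random.default_rng(0)
        n, p = 80, 500
        X = rng.standard_normal((n, p))                # null data: mean vector = 0
        X[:, 0] += 0.6                                 # sparse alternative: one shifted coordinate

        xbar = X.mean(axis=0)
        s = X.std(axis=0, ddof=1)
        T_obs = np.sqrt(n) * np.max(np.abs(xbar) / s)  # max-type test statistic

        B = 1000
        Xc = X - xbar                                  # centered data
        T_boot = np.empty(B)
        for b in range(B):
            w = rng.standard_normal(n)                 # multiplier weights
            T_boot[b] = np.max(np.abs(Xc.T @ w) / (np.sqrt(n) * s))

        p_value = np.mean(T_boot >= T_obs)
        print(T_obs, p_value)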

  10. Building interpretable predictive models for pediatric hospital readmission using Tree-Lasso logistic regression.

    PubMed

    Jovanovic, Milos; Radovanovic, Sandro; Vukicevic, Milan; Van Poucke, Sven; Delibasic, Boris

    2016-09-01

    Quantification and early identification of unplanned readmission risk have the potential to improve the quality of care during hospitalization and after discharge. However, the high dimensionality, sparsity, and class imbalance of electronic health data and the complexity of risk quantification challenge the development of accurate predictive models. Predictive models require a certain level of interpretability in order to be applicable in real settings and create actionable insights. This paper aims to develop accurate and interpretable predictive models for readmission in a general pediatric patient population by integrating a data-driven model (sparse logistic regression) and domain knowledge based on the International Classification of Diseases, 9th Revision, Clinical Modification (ICD-9-CM) hierarchy of diseases. Additionally, we propose a way to quantify the interpretability of a model and inspect the stability of alternative solutions. The analysis was conducted on >66,000 pediatric hospital discharge records from the California State Inpatient Databases, Healthcare Cost and Utilization Project, between 2009 and 2011. We incorporated domain knowledge based on the ICD-9-CM hierarchy in a data-driven, Tree-Lasso regularized logistic regression model, providing the framework for model interpretation. This approach was compared with traditional Lasso logistic regression, resulting in models that are easier to interpret through fewer high-level diagnoses, with comparable prediction accuracy. The results revealed that the Tree-Lasso model was as competitive in terms of accuracy (measured by the area under the receiver operating characteristic curve, AUC) as the traditional Lasso logistic regression, but integration with the ICD-9-CM hierarchy of diseases provided more interpretable models in terms of high-level diagnoses. Additionally, the interpretations of the models are in accordance with existing medical understanding of pediatric readmission. The best performing models have similar performance, reaching AUC values of 0.783 and 0.779 for traditional Lasso and Tree-Lasso, respectively. However, the information loss of the Lasso models is 0.35 bits higher than that of the Tree-Lasso model. We propose a method for building predictive models applicable to the detection of readmission risk based on electronic health records. Integration of domain knowledge (in the form of the ICD-9-CM taxonomy) and a data-driven, sparse predictive algorithm (Tree-Lasso logistic regression) resulted in an increase in the interpretability of the resulting model. The models are interpreted for the readmission prediction problem in the general pediatric population in California, as well as in several important subpopulations, and the interpretations of the models comply with existing medical understanding of pediatric readmission. Finally, a quantitative assessment of the interpretability of the models is given that goes beyond simple counts of selected low-level features. Copyright © 2016 Elsevier B.V. All rights reserved.
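
    The hierarchy-aware Tree-Lasso penalty is not part of standard libraries, but the plain Lasso (L1-penalized) logistic regression baseline it is compared against can be sketched with scikit-learn. The binary diagnosis indicators and outcome labels below are random placeholders, and the penalty strength is an arbitrary illustrative value.

        import numpy as np
        from sklearn.linear_model import LogisticRegression

        rng = np.random.default_rng(0)
        X = rng.integers(0, 2, size=(1000, 200)).astype(float)   # binary diagnosis indicators
        y = rng.integers(0, 2, size=1000)                        # readmitted within 30 days?

        clf = LogisticRegression(penalty="l1", solver="liblinear", C=0.1).fit(X, y)
        selected = np.flatnonzero(clf.coef_.ravel())             # diagnoses kept by the L1 penalty
        print(len(selected), "features selected")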

  11. Lognormal Kalman filter for assimilating phase space density data in the radiation belts

    NASA Astrophysics Data System (ADS)

    Kondrashov, D.; Ghil, M.; Shprits, Y.

    2011-11-01

    Data assimilation combines a physical model with sparse observations and has become an increasingly important tool for scientists and engineers in the design, operation, and use of satellites and other high-technology systems in the near-Earth space environment. Of particular importance is predicting fluxes of high-energy particles in the Van Allen radiation belts, since these fluxes can damage spaceborne platforms and instruments during strong geomagnetic storms. In transiting from a research setting to operational prediction of these fluxes, improved data assimilation is of the essence. The present study is motivated by the fact that phase space densities (PSDs) of high-energy electrons in the outer radiation belt, both simulated and observed, are subject to spatiotemporal variations that span several orders of magnitude. Standard data assimilation methods that are based on least squares minimization of normally distributed errors may not be adequate for handling the range of these variations. We propose herein a modification of Kalman filtering that uses a log-transformed, one-dimensional radial diffusion model for the PSDs and includes parameterized losses. The proposed methodology is first verified on model-simulated, synthetic data and then applied to actual satellite measurements. When the model errors are sufficiently smaller than the observational errors, our methodology can significantly improve analysis and prediction skill for the PSDs compared to the standard Kalman filter formulation. This improvement is documented by monitoring the variance of the innovation sequence.
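
    A toy scalar illustration of the log-transform idea: a positive quantity with a wide dynamic range (standing in for a PSD) is filtered in log space with a simple persistence forecast and then mapped back to physical units. The dynamics, noise levels, and data are all illustrative placeholders; the radial diffusion model with parameterized losses is not reproduced.

        import numpy as np

        rng = np.random.default_rng(0)
        n_steps = 50
        truth = np.exp(np.cumsum(0.2 * rng.standard_normal(n_steps)) + 2.0)  # positive "PSD"
        obs = truth * np.exp(0.3 * rng.standard_normal(n_steps))             # multiplicative noise

        q, r = 0.2**2, 0.3**2          # model and observation error variances (log space)
        x, P = np.log(obs[0]), 1.0     # initial log-state estimate and its variance
        estimates = []
        for k in range(n_steps):
            P = P + q                                  # forecast: persistence in log space
            K = P / (P + r)                            # Kalman gain
            x = x + K * (np.log(obs[k]) - x)           # update with log-transformed observation
            P = (1 - K) * P
            estimates.append(np.exp(x))                # back-transform to physical units

        print(estimates[-1], truth[-1])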

  12. Precession missile feature extraction using sparse component analysis of radar measurements

    NASA Astrophysics Data System (ADS)

    Liu, Lihua; Du, Xiaoyong; Ghogho, Mounir; Hu, Weidong; McLernon, Des

    2012-12-01

    According to the working mode of the ballistic missile warning radar (BMWR), the radar return received by the BMWR is usually sparse. To recognize and identify the warhead, it is necessary to extract the precession frequency and the locations of the scattering centers of the missile. This article first analyzes the radar signal model of the precessing conical missile during flight and develops a sparse dictionary parameterized by the unknown precession frequency. Based on this sparse dictionary, the sparse signal model is then established. A nonlinear least squares estimation is first applied to roughly extract the precession frequency in the sparse dictionary. Based on the time-segmented radar signal, a sparse component analysis method using the orthogonal matching pursuit algorithm is then proposed to jointly estimate the precession frequency and the scattering centers of the missile. Simulation results illustrate the validity of the proposed method.
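
    A generic orthogonal matching pursuit sketch with scikit-learn. The random dictionary below stands in for the precession-frequency-parameterized dictionary described in the paper; sizes and the number of active atoms are illustrative.

        import numpy as np
        from sklearn.linear_model import OrthogonalMatchingPursuit

        rng = np.random.default_rng(0)
        n_samples, n_atoms, n_active = 100, 400, 5
        D = rng.standard_normal((n_samples, n_atoms))        # dictionary (columns = atoms)
        coef_true = np.zeros(n_atoms)
        support = rng.choice(n_atoms, n_active, replace=False)
        coef_true[support] = rng.standard_normal(n_active)
        y = D @ coef_true + 0.01 * rng.standard_normal(n_samples)

        omp = OrthogonalMatchingPursuit(n_nonzero_coefs=n_active).fit(D, y)
        print(sorted(np.flatnonzero(omp.coef_)), sorted(support))   # recovered vs true atoms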

  13. Enhancing sparsity of Hermite polynomial expansions by iterative rotations

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Yang, Xiu; Lei, Huan; Baker, Nathan A.

    2016-02-01

    Compressive sensing has become a powerful addition to uncertainty quantification in recent years. This paper identifies new bases for random variables through linear mappings such that the representation of the quantity of interest is more sparse with new basis functions associated with the new random variables. This sparsity increases both the efficiency and accuracy of the compressive sensing-based uncertainty quantification method. Specifically, we consider rotation-based linear mappings which are determined iteratively for Hermite polynomial expansions. We demonstrate the effectiveness of the new method with applications in solving stochastic partial differential equations and high-dimensional (O(100)) problems.

  14. Subject-based discriminative sparse representation model for detection of concealed information.

    PubMed

    Akhavan, Amir; Moradi, Mohammad Hassan; Vand, Safa Rafiei

    2017-05-01

    The use of machine learning approaches in the concealed information test (CIT) plays a key role in the progress of this neurophysiological field. In this paper, we present a new machine learning method for CIT in which each subject is considered independently of the others. The main goal of this study is to adapt discriminative sparse models to be applicable to subject-based concealed information tests. In order to provide sufficient discriminability between guilty and innocent subjects, we introduce a novel discriminative sparse representation model and its appropriate learning methods. For evaluation of the method, forty-four subjects participated in a mock crime scenario and their EEG data were recorded. As the model input, recurrence plot features were extracted from single-trial data of different stimuli. The extracted feature vectors were then reduced using a statistical dependency method. The reduced feature vector went through the proposed subject-based sparse model, in which the discrimination power of the sparse code and the reconstruction error were applied simultaneously. Experimental results showed that the proposed approach achieved better performance than other competing discriminative sparse models. The classification accuracy, sensitivity and specificity of the presented sparsity-based method were about 93%, 91% and 95%, respectively. Using the EEG data of a single subject in response to different stimulus types, and with the aid of the proposed discriminative sparse representation model, one can distinguish guilty subjects from innocent ones. Indeed, this property eliminates the need for EEG data from several subjects in model learning and decision making for a specific subject. Copyright © 2017 Elsevier B.V. All rights reserved.

  15. Big Data Challenges of High-Dimensional Continuous-Time Mean-Variance Portfolio Selection and a Remedy.

    PubMed

    Chiu, Mei Choi; Pun, Chi Seng; Wong, Hoi Ying

    2017-08-01

    Investors interested in the global financial market must analyze financial securities internationally. Making an optimal global investment decision involves processing a huge amount of data for a high-dimensional portfolio. This article investigates the big data challenges of two mean-variance optimal portfolios: continuous-time precommitment and constant-rebalancing strategies. We show that both optimized portfolios implemented with the traditional sample estimates converge to the worst performing portfolio when the portfolio size becomes large. The crux of the problem is the estimation error accumulated from the huge dimension of stock data. We then propose a linear programming optimal (LPO) portfolio framework, which applies a constrained ℓ1 minimization to the theoretical optimal control to mitigate the risk associated with the dimensionality issue. The resulting portfolio becomes a sparse portfolio that selects stocks with a data-driven procedure and hence offers a stable mean-variance portfolio in practice. When the number of observations becomes large, the LPO portfolio converges to the oracle optimal portfolio, which is free of estimation error, even though the number of stocks grows faster than the number of observations. Our numerical and empirical studies demonstrate the superiority of the proposed approach. © 2017 Society for Risk Analysis.

  16. Estimation of Dynamic Sparse Connectivity Patterns From Resting State fMRI.

    PubMed

    Cai, Biao; Zille, Pascal; Stephen, Julia M; Wilson, Tony W; Calhoun, Vince D; Wang, Yu Ping

    2018-05-01

    Functional connectivity (FC) estimated from functional magnetic resonance imaging (fMRI) time series, especially during resting state periods, provides a powerful tool to assess human brain functional architecture in health, disease, and developmental states. Recently, the focus of connectivity analysis has shifted toward the subnetworks of the brain, which reveals co-activating patterns over time. Most prior works produced a dense set of high-dimensional vectors, which are hard to interpret. In addition, their estimations to a large extent were based on an implicit assumption of spatial and temporal stationarity throughout the fMRI scanning session. In this paper, we propose an approach called dynamic sparse connectivity patterns (dSCPs), which takes advantage of both matrix factorization and time-varying fMRI time series to improve the estimation power of FC. The feasibility of analyzing dynamic FC with our model is first validated through simulated experiments. Then, we use our framework to measure the difference between young adults and children with real fMRI data set from the Philadelphia Neurodevelopmental Cohort (PNC). The results from the PNC data set showed significant FC differences between young adults and children in four different states. For instance, young adults had reduced connectivity between the default mode network and other subnetworks, as well as hyperconnectivity within the visual system in states 1 and 3, and hypoconnectivity in state 2. Meanwhile, they exhibited temporal correlation patterns that changed over time within functional subnetworks. In addition, the dSCPs model indicated that older people tend to spend more time within a relatively connected FC pattern. Overall, the proposed method provides a valid means to assess dynamic FC, which could facilitate the study of brain networks.

  17. Notes on implementation of sparsely distributed memory

    NASA Technical Reports Server (NTRS)

    Keeler, J. D.; Denning, P. J.

    1986-01-01

    The Sparsely Distributed Memory (SDM) developed by Kanerva is an unconventional memory design with very interesting and desirable properties. The memory works in a manner that is closely related to modern theories of human memory. The SDM model is discussed in terms of its implementation in hardware. Two appendices discuss the unconventional approaches of the SDM: Appendix A treats a resistive circuit for fast, parallel address decoding; and Appendix B treats a systolic array for high throughput read and write operations.

  18. Estimation for sparse vegetation information in desertification region based on Tiangong-1 hyperspectral image.

    PubMed

    Wu, Jun-Jun; Gao, Zhi-Hai; Li, Zeng-Yuan; Wang, Hong-Yan; Pang, Yong; Sun, Bin; Li, Chang-Long; Li, Xu-Zhi; Zhang, Jiu-Xing

    2014-03-01

    In order to estimate sparse vegetation information accurately in a desertification region, taking the southeast of Sunite Right Banner, Inner Mongolia, as the test site and a Tiangong-1 hyperspectral image as the main data, sparse vegetation coverage and biomass were retrieved based on the normalized difference vegetation index (NDVI) and the soil adjusted vegetation index (SAVI), combined with field investigation data. The advantages and disadvantages of the two indices were then compared. Firstly, the correlation between vegetation indexes and vegetation coverage under different band combinations was analyzed, as well as that with biomass. Secondly, the best band combination was determined as the one for which the maximum correlation coefficient between the vegetation indexes (VI) and the vegetation parameters occurred. It showed that the maximum correlation coefficient between the vegetation parameters and NDVI could reach as high as 0.7, while that of SAVI could nearly reach 0.8. The center wavelength of the red band in the best band combination for NDVI was 630 nm, and that of the near infrared (NIR) band was 910 nm; whereas, when the center wavelengths were 620 and 920 nm, respectively, they formed the best combination for SAVI. Finally, linear regression models were established to retrieve vegetation coverage and biomass based on the Tiangong-1 VIs. The R2 of all models was more than 0.5, and that of the models based on SAVI was higher than that based on NDVI; in particular, the R2 of the vegetation coverage retrieval model based on SAVI was as high as 0.59. By cross-validation, the RMSE of the SAVI-based models was lower than that of the NDVI-based models. The results showed that the abundant spectral information of the Tiangong-1 hyperspectral image can reflect the actual vegetation condition effectively, and that SAVI can estimate sparse vegetation information more accurately than NDVI in desertification regions.
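
    A short sketch of the retrieval chain described here, computing NDVI and SAVI from the selected red/NIR bands and regressing field-measured vegetation coverage on the index; the reflectance values, the SAVI soil factor L = 0.5, and the tiny sample below are placeholders for the Tiangong-1 data and field plots:

      import numpy as np

      def ndvi(nir, red):
          return (nir - red) / (nir + red)

      def savi(nir, red, L=0.5):
          return (1.0 + L) * (nir - red) / (nir + red + L)

      # Placeholder reflectances at the selected band centers and measured coverage fractions.
      red_620 = np.array([0.18, 0.15, 0.21, 0.12])
      nir_920 = np.array([0.22, 0.24, 0.23, 0.26])
      coverage = np.array([0.08, 0.21, 0.05, 0.33])

      vi = savi(nir_920, red_620)
      slope, intercept = np.polyfit(vi, coverage, deg=1)       # linear retrieval model
      pred = slope * vi + intercept
      r2 = 1 - np.sum((coverage - pred) ** 2) / np.sum((coverage - coverage.mean()) ** 2)
      rmse = np.sqrt(np.mean((coverage - pred) ** 2))
      print(f"R^2 = {r2:.2f}, RMSE = {rmse:.3f}")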

  19. ORACLE INEQUALITIES FOR THE LASSO IN THE COX MODEL

    PubMed Central

    Huang, Jian; Sun, Tingni; Ying, Zhiliang; Yu, Yi; Zhang, Cun-Hui

    2013-01-01

    We study the absolute penalized maximum partial likelihood estimator in sparse, high-dimensional Cox proportional hazards regression models where the number of time-dependent covariates can be larger than the sample size. We establish oracle inequalities based on natural extensions of the compatibility and cone invertibility factors of the Hessian matrix at the true regression coefficients. Similar results based on an extension of the restricted eigenvalue can be also proved by our method. However, the presented oracle inequalities are sharper since the compatibility and cone invertibility factors are always greater than the corresponding restricted eigenvalue. In the Cox regression model, the Hessian matrix is based on time-dependent covariates in censored risk sets, so that the compatibility and cone invertibility factors, and the restricted eigenvalue as well, are random variables even when they are evaluated for the Hessian at the true regression coefficients. Under mild conditions, we prove that these quantities are bounded from below by positive constants for time-dependent covariates, including cases where the number of covariates is of greater order than the sample size. Consequently, the compatibility and cone invertibility factors can be treated as positive constants in our oracle inequalities. PMID:24086091
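
    To make the estimator concrete, here is a small numpy sketch of an ℓ1-penalized Cox partial likelihood fitted by proximal gradient descent (ISTA-style soft-thresholding); the Breslow-style risk sets without tie handling, the fixed step size, and the synthetic data are simplifying assumptions, and the sketch illustrates the estimator rather than the paper's oracle-inequality analysis:

      import numpy as np

      def cox_l1(X, time, event, lam=0.1, step=0.01, iters=2000):
          """L1-penalized Cox regression via proximal gradient (no tie handling)."""
          order = np.argsort(time)                  # ascending times: risk set of i is i..n-1
          X, event = X[order], event[order]
          n, p = X.shape
          beta = np.zeros(p)
          for _ in range(iters):
              w = np.exp(X @ beta)
              # Reverse cumulative sums over the risk sets {j : t_j >= t_i}.
              rcum_w = np.cumsum(w[::-1])[::-1]
              rcum_xw = np.cumsum((w[:, None] * X)[::-1], axis=0)[::-1]
              grad = -np.sum(event[:, None] * (X - rcum_xw / rcum_w[:, None]), axis=0) / n
              beta = beta - step * grad
              beta = np.sign(beta) * np.maximum(np.abs(beta) - step * lam, 0.0)  # soft-threshold
          return beta

      rng = np.random.default_rng(0)
      n, p = 200, 50
      X = rng.standard_normal((n, p))
      true = np.zeros(p); true[:3] = [1.0, -1.0, 0.5]
      t = rng.exponential(scale=np.exp(-X @ true))
      c = rng.exponential(scale=np.median(t), size=n)
      time, event = np.minimum(t, c), (t <= c).astype(float)
      print("selected covariates:", np.flatnonzero(np.abs(cox_l1(X, time, event)) > 1e-6))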

  1. 3-D architecture modeling using high-resolution seismic data and sparse well control: Example from the Mars "Pink" reservoir, Mississippi Canyon Area, Gulf of Mexico

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Chapin, M.A.; Tiller, G.M.; Mahaffie, M.J.

    1996-12-31

    Economic considerations of the deep-water turbidite play, in the Gulf of Mexico and elsewhere, require large reservoir volumes to be drained by relatively few, very expensive wells. Deep-water development projects to date have been planned on the basis of high-quality 3-D seismic data and sparse well control. The link between 3-D seismic, well control, and the 3-D geological and reservoir architecture model is demonstrated here for Pliocene turbidite sands of the "Pink" reservoir, Prospect Mars, Mississippi Canyon Areas 763 and 807, Gulf of Mexico. This information was used to better understand potential reservoir compartments for development well planning.

  2. Time-Frequency Analysis of Non-Stationary Biological Signals with Sparse Linear Regression Based Fourier Linear Combiner.

    PubMed

    Wang, Yubo; Veluvolu, Kalyana C

    2017-06-14

    It is often difficult to analyze biological signals because of their nonlinear and non-stationary characteristics. This necessitates the usage of time-frequency decomposition methods for analyzing the subtle changes in these signals that are often connected to an underlying phenomenon. This paper presents a new approach to analyze the time-varying characteristics of such signals by employing a simple truncated Fourier series model, namely the band-limited multiple Fourier linear combiner (BMFLC). In contrast to the earlier designs, we first identified the sparsity imposed on the signal model in order to reformulate the model to a sparse linear regression model. The coefficients of the proposed model are then estimated by a convex optimization algorithm. The performance of the proposed method was analyzed with benchmark test signals. An energy ratio metric is employed to quantify the spectral performance, and results show that the proposed method Sparse-BMFLC has a high mean energy ratio (0.9976) and outperforms existing methods such as the short-time Fourier transform (STFT), the continuous wavelet transform (CWT) and the BMFLC Kalman Smoother. Furthermore, the proposed method provides an overall improvement of 6.22% in reconstruction error.

  3. Challenges in Ocean Data Assimilation for the US West Coast

    NASA Astrophysics Data System (ADS)

    Li, Z.; Chao, Y.; Farrara, J.; Wang, X.

    2006-12-01

    A three-dimensional variational data assimilation (3DVAR) system has been developed for the Regional Ocean Modeling System (ROMS), and it is called ROMS-DAS. This system provides a capability of predicting meso- to small-scale variations with temporal scales from hours to days in the coastal oceans. To cope with the particular difficulties that result from complex coastlines and bottom topography, unbalanced flows and sparse observations, ROMS-DAS utilizes several novel strategies. These strategies include the implementation of three-dimensional anisotropic and inhomogeneous error correlations, application of particular weak dynamic constraints, and implementation of efficient and reliable algorithms for minimizing the cost function. The ROMS-DAS system was applied in field experiments for Monterey Bay during both 2003 (Autonomous Ocean Sampling Network - AOSN) and 2006 (MB06). These two experiments included intensive data collection from a variety of observational platforms, including satellites, airplanes, High Frequency radars, Acoustic Doppler Current Profilers, ships, drifters, buoys, autonomous underwater vehicles (AUV), and particularly a fleet of undersea gliders. Using these data sets, various data assimilation experiments were performed to address several major data assimilation challenges that arise from multi-scale structures, inhomogeneous properties, dynamical imbalance of the flow, and tides. Based on these experiments, a set of strategies was formulated to deal with those challenges.

  4. Sparse polynomial space approach to dissipative quantum systems: application to the sub-ohmic spin-boson model.

    PubMed

    Alvermann, A; Fehske, H

    2009-04-17

    We propose a general numerical approach to open quantum systems with a coupling to bath degrees of freedom. The technique combines the methodology of polynomial expansions of spectral functions with the sparse grid concept from interpolation theory. Thereby we construct a Hilbert space of moderate dimension to represent the bath degrees of freedom, which allows us to perform highly accurate and efficient calculations of static, spectral, and dynamic quantities using standard exact diagonalization algorithms. The strength of the approach is demonstrated for the phase transition, critical behavior, and dissipative spin dynamics in the spin-boson model.

  5. Peculiar spectral statistics of ensembles of trees and star-like graphs

    NASA Astrophysics Data System (ADS)

    Kovaleva, V.; Maximov, Yu; Nechaev, S.; Valba, O.

    2017-07-01

    In this paper we investigate the eigenvalue statistics of exponentially weighted ensembles of full binary trees and p-branching star graphs. We show that spectral densities of corresponding adjacency matrices demonstrate peculiar ultrametric structure inherent to sparse systems. In particular, the tails of the distribution for binary trees share the ‘Lifshitz singularity’ emerging in the one-dimensional localization, while the spectral statistics of p-branching star-like graphs is less universal, being strongly dependent on p. The hierarchical structure of spectra of adjacency matrices is interpreted as sets of resonance frequencies that emerge in ensembles of fully branched tree-like systems, known as dendrimers. However, the relaxational spectrum is not determined by the cluster topology, but rather has a number-theoretic origin, reflecting the peculiarities of the rare-event statistics typical for one-dimensional systems with a quenched structural disorder. The similarity of the spectral densities of an individual dendrimer and of an ensemble of linear chains with an exponential distribution of lengths demonstrates that dendrimers could serve as simple disorder-less toy models of one-dimensional systems with quenched disorder.

  6. Solution of the three-dimensional Helmholtz equation with nonlocal boundary conditions

    NASA Technical Reports Server (NTRS)

    Hodge, Steve L.; Zorumski, William E.; Watson, Willie R.

    1995-01-01

    The Helmholtz equation is solved within a three-dimensional rectangular duct with a nonlocal radiation boundary condition at the duct exit plane. This condition accurately models the acoustic admittance at an arbitrarily-located computational boundary plane. A linear system of equations is constructed with second-order central differences for the Helmholtz operator and second-order backward differences for both local admittance conditions and the gradient term in the nonlocal radiation boundary condition. The resulting matrix equation is large, sparse, and non-Hermitian. The size and structure of the matrix makes direct solution techniques impractical; as a result, a nonstationary iterative technique is used for its solution. The theory behind the nonstationary technique is reviewed, and numerical results are presented for radiation from both a point source and a planar acoustic source. The solutions with the nonlocal boundary conditions are invariant to the location of the computational boundary, and the same nonlocal conditions are valid for all solutions. The nonlocal conditions thus provide a means of minimizing the size of three-dimensional computational domains.

  7. Improvement of Disease Prediction and Modeling through the Use of Meteorological Ensembles: Human Plague in Uganda

    PubMed Central

    Moore, Sean M.; Monaghan, Andrew; Griffith, Kevin S.; Apangu, Titus; Mead, Paul S.; Eisen, Rebecca J.

    2012-01-01

    Climate and weather influence the occurrence, distribution, and incidence of infectious diseases, particularly those caused by vector-borne or zoonotic pathogens. Thus, models based on meteorological data have helped predict when and where human cases are most likely to occur. Such knowledge aids in targeting limited prevention and control resources and may ultimately reduce the burden of diseases. Paradoxically, localities where such models could yield the greatest benefits, such as tropical regions where morbidity and mortality caused by vector-borne diseases are greatest, often lack high-quality in situ local meteorological data. Satellite- and model-based gridded climate datasets can be used to approximate local meteorological conditions in data-sparse regions; however, their accuracy varies. Here we investigate how the selection of a particular dataset can influence the outcomes of disease forecasting models. Our model system focuses on plague (Yersinia pestis infection) in the West Nile region of Uganda. The majority of recent human cases have been reported from East Africa and Madagascar, where meteorological observations are sparse and topography yields complex weather patterns. Using an ensemble of meteorological datasets and model-averaging techniques, we find that the number of suspected cases in the West Nile region was negatively associated with dry season rainfall (December-February) and positively with rainfall prior to the plague season. We demonstrate that ensembles of available meteorological datasets can be used to quantify climatic uncertainty and minimize its impacts on infectious disease models. These methods are particularly valuable in regions with sparse observational networks and high morbidity and mortality from vector-borne diseases. PMID:23024750

  8. Accelerating cross-validation with total variation and its application to super-resolution imaging

    NASA Astrophysics Data System (ADS)

    Obuchi, Tomoyuki; Ikeda, Shiro; Akiyama, Kazunori; Kabashima, Yoshiyuki

    2017-12-01

    We develop an approximation formula for the cross-validation error (CVE) of a sparse linear regression penalized by ℓ_1-norm and total variation terms, which is based on a perturbative expansion utilizing the largeness of both the data dimensionality and the model. The developed formula allows us to reduce the necessary computational cost of the CVE evaluation significantly. The practicality of the formula is tested through application to simulated black-hole image reconstruction on the event-horizon scale with super resolution. The results demonstrate that our approximation reproduces the CVE values obtained via literally conducted cross-validation with reasonably good precision.
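
    For contrast with the approximation formula, a brief sketch of the literally conducted K-fold cross-validation it is meant to replace, here for a plain ℓ1-penalized regression using scikit-learn's Lasso; the total-variation term and the black-hole imaging setup are not reproduced, and the data are synthetic:

      import numpy as np
      from sklearn.linear_model import Lasso
      from sklearn.model_selection import KFold

      rng = np.random.default_rng(0)
      n, p = 120, 500
      A = rng.standard_normal((n, p))
      x_true = np.zeros(p); x_true[rng.choice(p, 10, replace=False)] = rng.standard_normal(10)
      y = A @ x_true + 0.05 * rng.standard_normal(n)

      def cve(alpha, k=10):
          """Literal K-fold cross-validation error for one penalty value."""
          errs = []
          for train, test in KFold(n_splits=k, shuffle=True, random_state=0).split(A):
              model = Lasso(alpha=alpha, max_iter=10000).fit(A[train], y[train])
              errs.append(np.mean((y[test] - model.predict(A[test])) ** 2))
          return np.mean(errs)

      for alpha in [0.001, 0.01, 0.1]:
          print(alpha, cve(alpha))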

  9. DOLPHIn—Dictionary Learning for Phase Retrieval

    NASA Astrophysics Data System (ADS)

    Tillmann, Andreas M.; Eldar, Yonina C.; Mairal, Julien

    2016-12-01

    We propose a new algorithm to learn a dictionary for reconstructing and sparsely encoding signals from measurements without phase. Specifically, we consider the task of estimating a two-dimensional image from squared-magnitude measurements of a complex-valued linear transformation of the original image. Several recent phase retrieval algorithms exploit underlying sparsity of the unknown signal in order to improve recovery performance. In this work, we consider such a sparse signal prior in the context of phase retrieval, when the sparsifying dictionary is not known in advance. Our algorithm jointly reconstructs the unknown signal - possibly corrupted by noise - and learns a dictionary such that each patch of the estimated image can be sparsely represented. Numerical experiments demonstrate that our approach can obtain significantly better reconstructions for phase retrieval problems with noise than methods that cannot exploit such "hidden" sparsity. Moreover, on the theoretical side, we provide a convergence result for our method.

  10. Rectified factor networks for biclustering of omics data.

    PubMed

    Clevert, Djork-Arné; Unterthiner, Thomas; Povysil, Gundula; Hochreiter, Sepp

    2017-07-15

    Biclustering has become a major tool for analyzing large datasets given as a matrix of samples times features and has been successfully applied in life sciences and e-commerce for drug design and recommender systems, respectively. Factor Analysis for BIcluster Acquisition (FABIA), one of the most successful biclustering methods, is a generative model that represents each bicluster by two sparse membership vectors: one for the samples and one for the features. However, FABIA is restricted to about 20 code units because of the high computational complexity of computing the posterior. Furthermore, code units are sometimes insufficiently decorrelated and sample membership is difficult to determine. We propose to use the recently introduced unsupervised Deep Learning approach Rectified Factor Networks (RFNs) to overcome the drawbacks of existing biclustering methods. RFNs efficiently construct very sparse, non-linear, high-dimensional representations of the input via their posterior means. RFN learning is a generalized alternating minimization algorithm based on the posterior regularization method which enforces non-negative and normalized posterior means. Each code unit represents a bicluster, where samples for which the code unit is active belong to the bicluster and features that have activating weights to the code unit belong to the bicluster. On 400 benchmark datasets and on three gene expression datasets with known clusters, RFN outperformed 13 other biclustering methods including FABIA. On data of the 1000 Genomes Project, RFN could identify DNA segments which indicate that interbreeding with other hominins had already started before the ancestors of modern humans left Africa. Software is available at https://github.com/bioinf-jku/librfn (contact: djork-arne.clevert@bayer.com or hochreit@bioinf.jku.at). © The Author 2017. Published by Oxford University Press. All rights reserved. For permissions, please e-mail: journals.permissions@oup.com

  11. Improving subjective pattern recognition in chemical senses through reduction of nonlinear effects in evaluation of sparse data

    NASA Astrophysics Data System (ADS)

    Assadi, Amir H.; Rasouli, Firooz; Wrenn, Susan E.; Subbiah, M.

    2002-11-01

    Artificial neural network models are typically useful in pattern recognition and extraction of important features in large data sets. These models are implemented in a wide variety of contexts and with diverse types of input-output data. The underlying mathematics of supervised training of neural networks is ultimately tied to the ability to approximate the nonlinearities that are inherent in the network's generalization ability. The quality and availability of sufficient data points for training and validation play a key role in the generalization ability of the network. A potential domain of applications of neural networks is in analysis of subjective data, such as in consumer science, affective neuroscience and perception of chemical senses. In applications of ANN to subjective data, it is common to rely on knowledge of the science and context for data acquisition, for instance as a priori probabilities in the Bayesian framework. In this paper, we discuss the circumstances that create challenges for success of neural network models for subjective data analysis, such as sparseness of data and cost of acquisition of additional samples. In particular, in the case of affect and perception of chemical senses, we suggest that the inherent ambiguity of subjective responses could be offset by a human-machine expert combination. We propose a method of pre- and post-processing for blind analysis of data that relies on heuristics from human performance in interpretation of data. In particular, we offer an information-theoretic smoothing (ITS) algorithm that optimizes the geometric visualization of multi-dimensional data and improves human interpretation of the input-output view of neural network implementations. The pre- and post-processing algorithms and ITS are unsupervised. Finally, we discuss the details of an example of blind data analysis from actual taste-smell subjective data, and demonstrate the usefulness of PCA in reduction of dimensionality, as well as ITS.

  12. Three-Dimensional Stratification of Bacterial Biofilm Populations in a Moving Bed Biofilm Reactor for Nitritation-Anammox

    PubMed Central

    Almstrand, Robert; Persson, Frank; Daims, Holger; Ekenberg, Maria; Christensson, Magnus; Wilén, Britt-Marie; Sörensson, Fred; Hermansson, Malte

    2014-01-01

    Moving bed biofilm reactors (MBBRs) are increasingly used for nitrogen removal with nitritation-anaerobic ammonium oxidation (anammox) processes in wastewater treatment. Carriers provide protected surfaces where ammonia oxidizing bacteria (AOB) and anammox bacteria form complex biofilms. However, the knowledge about the organization of microbial communities in MBBR biofilms is sparse. We used new cryosectioning and imaging methods for fluorescence in situ hybridization (FISH) to study the structure of biofilms retrieved from carriers in a nitritation-anammox MBBR. The dimensions of the carrier compartments and the biofilm cryosections after FISH showed good correlation, indicating little disturbance of biofilm samples by the treatment. FISH showed that Nitrosomonas europaea/eutropha-related cells dominated the AOB and Candidatus Brocadia fulgida-related cells dominated the anammox guild. New carriers were initially colonized by AOB, followed by anammox bacteria proliferating in the deeper biofilm layers, probably in anaerobic microhabitats created by AOB activity. Mature biofilms showed a pronounced three-dimensional stratification where AOB dominated closer to the biofilm-water interface, whereas anammox were dominant deeper into the carrier space and towards the walls. Our results suggest that current mathematical models may be oversimplifying these three-dimensional systems and unless the multidimensionality of these systems is considered, models may result in suboptimal design of MBBR carriers. PMID:24481066

  13. Optimal Couple Projections for Domain Adaptive Sparse Representation-based Classification.

    PubMed

    Zhang, Guoqing; Sun, Huaijiang; Porikli, Fatih; Liu, Yazhou; Sun, Quansen

    2017-08-29

    In recent years, sparse representation based classification (SRC) has become one of the most successful methods and has shown impressive performance in various classification tasks. However, when the training data has a different distribution than the testing data, the learned sparse representation may not be optimal, and the performance of SRC will be degraded significantly. To address this problem, in this paper, we propose an optimal couple projections for domain-adaptive sparse representation-based classification (OCPD-SRC) method, in which the discriminative features of data in the two domains are simultaneously learned with the dictionary that can succinctly represent the training and testing data in the projected space. OCPD-SRC is designed based on the decision rule of SRC, with the objective to learn coupled projection matrices and a common discriminative dictionary such that the between-class sparse reconstruction residuals of data from both domains are maximized, and the within-class sparse reconstruction residuals of data are minimized in the projected low-dimensional space. Thus, the resulting representations can well fit SRC and simultaneously have a better discriminant ability. In addition, our method can be easily extended to multiple domains and can be kernelized to deal with the nonlinear structure of data. The optimal solution for the proposed method can be efficiently obtained following an alternating optimization method. Extensive experimental results on a series of benchmark databases show that our method is better than or comparable to many state-of-the-art methods.

  14. Estimation of signal-dependent noise level function in transform domain via a sparse recovery model.

    PubMed

    Yang, Jingyu; Gan, Ziqiao; Wu, Zhaoyang; Hou, Chunping

    2015-05-01

    This paper proposes a novel algorithm to estimate the noise level function (NLF) of signal-dependent noise (SDN) from a single image based on the sparse representation of NLFs. Noise level samples are estimated from the high-frequency discrete cosine transform (DCT) coefficients of nonlocal-grouped low-variation image patches. Then, an NLF recovery model based on the sparse representation of NLFs under a trained basis is constructed to recover NLF from the incomplete noise level samples. Confidence levels of the NLF samples are incorporated into the proposed model to promote reliable samples and weaken unreliable ones. We investigate the behavior of the estimation performance with respect to the block size, sampling rate, and confidence weighting. Simulation results on synthetic noisy images show that our method outperforms existing state-of-the-art schemes. The proposed method is evaluated on real noisy images captured by three types of commodity imaging devices, and shows consistently excellent SDN estimation performance. The estimated NLFs are incorporated into two well-known denoising schemes, nonlocal means and BM3D, and show significant improvements in denoising SDN-polluted images.

  15. Brief announcement: Hypergraph partitioning for parallel sparse matrix-matrix multiplication

    DOE PAGES

    Ballard, Grey; Druinsky, Alex; Knight, Nicholas; ...

    2015-01-01

    The performance of parallel algorithms for sparse matrix-matrix multiplication is typically determined by the amount of interprocessor communication performed, which in turn depends on the nonzero structure of the input matrices. In this paper, we characterize the communication cost of a sparse matrix-matrix multiplication algorithm in terms of the size of a cut of an associated hypergraph that encodes the computation for a given input nonzero structure. Obtaining an optimal algorithm corresponds to solving a hypergraph partitioning problem. Furthermore, our hypergraph model generalizes several existing models for sparse matrix-vector multiplication, and we can leverage hypergraph partitioners developed for that computation to improve application-specific algorithms for multiplying sparse matrices.

  16. Superresolving Black Hole Images with Full-Closure Sparse Modeling

    NASA Astrophysics Data System (ADS)

    Crowley, Chelsea; Akiyama, Kazunori; Fish, Vincent

    2018-01-01

    It is believed that almost all galaxies have black holes at their centers. Imaging a black hole is a primary objective to answer scientific questions relating to relativistic accretion and jet formation. The Event Horizon Telescope (EHT) is set to capture images of two nearby black holes: Sagittarius A*, at the center of the Milky Way galaxy roughly 26,000 light years away, and the black hole at the center of M87 (Virgo A), a large elliptical galaxy that is 50 million light years away. Sparse imaging techniques have shown great promise for reconstructing high-fidelity superresolved images of black holes from simulated data. Previous work has included the effects of atmospheric phase errors and thermal noise, but not systematic amplitude errors that arise due to miscalibration. We explore a full-closure imaging technique with sparse modeling that uses closure amplitudes and closure phases to improve the imaging process. This new technique can successfully handle data with systematic amplitude errors. Applying our technique to synthetic EHT data of M87, we find that full-closure sparse modeling can reconstruct images better than traditional methods and recover key structural information on the source, such as the shape and size of the predicted photon ring. These results suggest that our new approach will provide superior imaging performance for data from the EHT and other interferometric arrays.

  17. Piece-wise quadratic approximations of arbitrary error functions for fast and robust machine learning.

    PubMed

    Gorban, A N; Mirkes, E M; Zinovyev, A

    2016-12-01

    Most machine learning approaches have stemmed from the principle of minimizing the mean squared distance, based on computationally efficient quadratic optimization methods. However, when faced with high-dimensional and noisy data, quadratic error functionals demonstrate many weaknesses, including high sensitivity to contaminating factors and the curse of dimensionality. Therefore, many recent applications in machine learning have exploited properties of non-quadratic error functionals based on the L1 norm or even sub-linear potentials corresponding to quasinorms Lp (0 < p < 1).

  18. Minimizers with Bounded Action for the High-Dimensional Frenkel-Kontorova Model

    NASA Astrophysics Data System (ADS)

    Miao, Xue-Qing; Wang, Ya-Nan; Qin, Wen-Xin

    In Aubry-Mather theory for monotone twist maps or for one-dimensional Frenkel-Kontorova (FK) model with nearest neighbor interactions, each global minimizer (minimal energy configuration) is naturally Birkhoff. However, this is not true for the one-dimensional FK model with non-nearest neighbor interactions or for the high-dimensional FK model. In this paper, we study the Birkhoff property of minimizers with bounded action for the high-dimensional FK model.

  19. Investigating the Performance of One- and Two-dimensional Flood Models in a Channelized River Network: A Case Study of the Obion River System

    NASA Astrophysics Data System (ADS)

    Kalyanapu, A. J.; Dullo, T. T.; Thornton, J. C.; Auld, L. A.

    2015-12-01

    The Obion River is located in the northwestern Tennessee region and discharges into the Mississippi River. In the past, the river system was largely channelized for agricultural purposes, which resulted in increased erosion, loss of wildlife habitat and downstream flood risks. These impacts are now being slowly reversed, mainly due to wetland restoration. The river system is characterized by a large network of "loops" around the main channels that hold water either from excess flows or due to flow diversions. Without data on each individual channel, levee, canal, or pond, it is not known where the water flows from or to. In some segments along the river, the natural channel has been altered and rerouted by the farmers for their irrigation purposes. Satellite imagery can aid in identifying these features, but its coverage is temporally sparse. All the alterations that have been done to the watershed make it difficult to develop hydraulic models, which could predict flooding and droughts. This is especially true when building one-dimensional (1D) hydraulic models compared to two-dimensional (2D) models, as the former cannot adequately simulate lateral flows in the floodplain and in complex terrains. The objective of this study, therefore, is to evaluate the performance of 1D and 2D flood models in this complex river system, assess the limitations of 1D models, and highlight the advantages of 2D models. The study presents the application of HEC-RAS and HEC-2D models developed by the Hydrologic Engineering Center (HEC), a division of the US Army Corps of Engineers. The broader impact of this study is the development of best practices for developing flood models in channelized river systems and in agricultural watersheds.

  20. Comparison of an algebraic multigrid algorithm to two iterative solvers used for modeling ground water flow and transport

    USGS Publications Warehouse

    Detwiler, R.L.; Mehl, S.; Rajaram, H.; Cheung, W.W.

    2002-01-01

    Numerical solution of large-scale ground water flow and transport problems is often constrained by the convergence behavior of the iterative solvers used to solve the resulting systems of equations. We demonstrate the ability of an algebraic multigrid algorithm (AMG) to efficiently solve the large, sparse systems of equations that result from computational models of ground water flow and transport in large and complex domains. Unlike geometric multigrid methods, this algorithm is applicable to problems in complex flow geometries, such as those encountered in pore-scale modeling of two-phase flow and transport. We integrated AMG into MODFLOW 2000 to compare two- and three-dimensional flow simulations using AMG to simulations using PCG2, a preconditioned conjugate gradient solver that uses the modified incomplete Cholesky preconditioner and is included with MODFLOW 2000. CPU times required for convergence with AMG were up to 140 times faster than those for PCG2. The cost of this increased speed was up to a nine-fold increase in required random access memory (RAM) for the three-dimensional problems and up to a four-fold increase in required RAM for the two-dimensional problems. We also compared two-dimensional numerical simulations of steady-state transport using AMG and the generalized minimum residual method with an incomplete LU-decomposition preconditioner. For these transport simulations, AMG yielded increased speeds of up to 17 times with only a 20% increase in required RAM. The ability of AMG to solve flow and transport problems in large, complex flow systems and its ready availability make it an ideal solver for use in both field-scale and pore-scale modeling.
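
    A small sketch of the kind of comparison reported here, solving a sparse Poisson-type system with algebraic multigrid versus (preconditioned) conjugate gradients; the pyamg library, the gallery Poisson problem, and the problem size are illustrative stand-ins for MODFLOW/AMG/PCG2, and both pyamg and SciPy are assumed to be installed:

      import numpy as np
      import pyamg
      from scipy.sparse.linalg import cg

      A = pyamg.gallery.poisson((300, 300), format="csr")   # 2-D Laplacian, ~90,000 unknowns
      b = np.ones(A.shape[0])

      # Algebraic multigrid (Ruge-Stuben) solve.
      ml = pyamg.ruge_stuben_solver(A)
      x_amg = ml.solve(b, tol=1e-8)

      # Conjugate gradients, preconditioned with one AMG V-cycle per iteration.
      x_cg, info = cg(A, b, M=ml.aspreconditioner(), maxiter=500)

      print("AMG residual:", np.linalg.norm(b - A @ x_amg))
      print("CG  residual:", np.linalg.norm(b - A @ x_cg), "info:", info)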

  1. Digital Correlation In Laser-Speckle Velocimetry

    NASA Technical Reports Server (NTRS)

    Gilbert, John A.; Mathys, Donald R.

    1992-01-01

    Periodic recording helps to eliminate spurious results. An improved digital-correlation process extracts the velocity field of a two-dimensional flow from laser-speckle images of seed particles distributed sparsely in the flow. The method, which involves digital correlation of images recorded at unequal intervals, is completely automated and has the potential to be the fastest yet.

  2. Blind compressed sensing image reconstruction based on alternating direction method

    NASA Astrophysics Data System (ADS)

    Liu, Qinan; Guo, Shuxu

    2018-04-01

    To solve the problem of reconstructing the original image when the sparse basis is unknown, this paper proposes an image reconstruction method based on a blind compressed sensing model. In this model, the image signal is regarded as the product of a sparse coefficient matrix and a dictionary matrix. Based on existing blind compressed sensing theory, the optimal solution is obtained by an alternating minimization method. The proposed method addresses the difficulty of specifying the sparse basis in compressed sensing, suppresses noise, and improves the quality of the reconstructed image. It also ensures that the blind compressed sensing problem has a unique solution and can recover the original image signal from a complex environment with strong adaptability. The experimental results show that the proposed blind compressed sensing reconstruction algorithm can recover high-quality image signals under under-sampling conditions.
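
    A stripped-down numpy sketch of the alternating-minimization idea for blind compressed sensing, switching between a soft-thresholded sparse-coding step and a gradient-based dictionary update under a random measurement operator; the measurement model, step sizes, and initialization are illustrative assumptions rather than the paper's exact algorithm:

      import numpy as np

      rng = np.random.default_rng(0)
      n, m, k, n_sig, sparsity = 64, 32, 96, 200, 4

      # Ground truth: signals sparse in an unknown dictionary, observed through random projections.
      D_true = rng.standard_normal((n, k)); D_true /= np.linalg.norm(D_true, axis=0)
      S_true = np.zeros((k, n_sig))
      for j in range(n_sig):
          S_true[rng.choice(k, sparsity, replace=False), j] = rng.standard_normal(sparsity)
      Phi = rng.standard_normal((m, n)) / np.sqrt(m)
      Y = Phi @ D_true @ S_true

      # Alternating minimization: ISTA for the sparse codes, gradient steps for the dictionary.
      D = rng.standard_normal((n, k)); D /= np.linalg.norm(D, axis=0)
      S = np.zeros((k, n_sig))
      lam = 0.02
      for _ in range(50):
          A = Phi @ D
          step = 1.0 / np.linalg.norm(A, 2) ** 2
          for _ in range(30):                                   # sparse-coding step (ISTA)
              S = S + step * A.T @ (Y - A @ S)
              S = np.sign(S) * np.maximum(np.abs(S) - step * lam, 0.0)
          G = Phi.T @ (Y - Phi @ D @ S) @ S.T                   # dictionary step (descent direction)
          D = D + G / (np.linalg.norm(Phi, 2) ** 2 * np.linalg.norm(S, 2) ** 2 + 1e-12)
          D /= np.maximum(np.linalg.norm(D, axis=0), 1e-12)     # renormalize atoms
      print("relative residual:", np.linalg.norm(Y - Phi @ D @ S) / np.linalg.norm(Y))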

  3. Empirical extensions of the lasso penalty to reduce the false discovery rate in high-dimensional Cox regression models.

    PubMed

    Ternès, Nils; Rotolo, Federico; Michiels, Stefan

    2016-07-10

    Correct selection of prognostic biomarkers among multiple candidates is becoming increasingly challenging as the dimensionality of biological data becomes higher. Therefore, minimizing the false discovery rate (FDR) is of primary importance, while a low false negative rate (FNR) is a complementary measure. The lasso is a popular selection method in Cox regression, but its results depend heavily on the penalty parameter λ. Usually, λ is chosen using maximum cross-validated log-likelihood (max-cvl). However, this method often has a very high FDR. We review methods for a more conservative choice of λ. We propose an empirical extension of the cvl by adding a penalization term, which trades off between the goodness-of-fit and the parsimony of the model, leading to the selection of fewer biomarkers and, as we show, to the reduction of the FDR without a large increase in FNR. We conducted a simulation study considering null and moderately sparse alternative scenarios and compared our approach with the standard lasso and 10 other competitors: Akaike information criterion (AIC), corrected AIC, Bayesian information criterion (BIC), extended BIC, Hannan and Quinn information criterion (HQIC), risk information criterion (RIC), one-standard-error rule, adaptive lasso, stability selection, and percentile lasso. Our extension achieved the best compromise across all the scenarios between a reduction of the FDR and a limited rise in the FNR, followed by the AIC, the RIC, and the adaptive lasso, which performed well in some settings. We illustrate the methods using gene expression data of 523 breast cancer patients. In conclusion, we propose to apply our extension to the lasso whenever a stringent FDR with a limited FNR is targeted. Copyright © 2016 John Wiley & Sons, Ltd.
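
    A small illustration of the general idea of moving from the max-cvl choice of λ toward a more conservative one, here using the one-standard-error rule (one of the competitors listed above) on a cross-validated linear lasso as a stand-in; the authors' penalized cvl extension for the Cox model is not reproduced, and the data are synthetic:

      import numpy as np
      from sklearn.linear_model import Lasso, LassoCV

      rng = np.random.default_rng(0)
      n, p = 150, 300
      X = rng.standard_normal((n, p))
      beta = np.zeros(p); beta[:5] = 1.0
      y = X @ beta + rng.standard_normal(n)

      cv = LassoCV(cv=10, n_alphas=100, random_state=0).fit(X, y)
      mse = cv.mse_path_.mean(axis=1)                    # mean CV error per alpha
      se = cv.mse_path_.std(axis=1) / np.sqrt(cv.mse_path_.shape[1])
      best = np.argmin(mse)
      # One-standard-error rule: the largest alpha whose CV error is within 1 SE of the minimum.
      alpha_1se = cv.alphas_[mse <= mse[best] + se[best]].max()

      n_min = np.count_nonzero(Lasso(alpha=cv.alphas_[best], max_iter=10000).fit(X, y).coef_)
      n_1se = np.count_nonzero(Lasso(alpha=alpha_1se, max_iter=10000).fit(X, y).coef_)
      print(f"selected features: min-CV {n_min}, 1-SE rule {n_1se}")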

  4. The terminal area simulation system. Volume 2: Verification cases

    NASA Technical Reports Server (NTRS)

    Proctor, F. H.

    1987-01-01

    Numerical simulations of five case studies are presented and compared with available data in order to verify the three-dimensional version of the Terminal Area Simulation System (TASS). A spectrum of convective storm types is selected for the case studies. Included are: a High-Plains supercell hailstorm, a small and relatively short-lived High-Plains cumulonimbus, a convective storm which produced the 2 August 1985 DFW microburst, a South Florida convective complex, and a tornadic Oklahoma thunderstorm. For each of the cases the model results compared reasonably well with observed data. In the simulations of the supercell storms many of their characteristic features were modeled, such as the hook echo, BWER, mesocyclone, gust fronts, giant persistent updraft, wall cloud, flanking-line towers, anvil and radar reflectivity overhang, and rightward veering in the storm propagation. In the simulation of the tornadic storm a horseshoe-shaped updraft configuration and cyclic changes in storm intensity and structure were noted. The simulation of the DFW microburst agreed remarkably well with sparse observed data. The simulated outflow rapidly expanded in a nearly symmetrical pattern and was associated with a ring vortex. A South Florida convective complex was simulated and contained updrafts and downdrafts in the form of discrete bubbles. The numerical simulations, in all cases, always remained stable and bounded with no anomalous trends.

  5. Sparse regularization for force identification using dictionaries

    NASA Astrophysics Data System (ADS)

    Qiao, Baijie; Zhang, Xingwu; Wang, Chenxi; Zhang, Hang; Chen, Xuefeng

    2016-04-01

    The classical function expansion method based on minimizing the l2-norm of the response residual employs various basis functions to represent the unknown force. Its difficulty lies in determining the optimum number of basis functions. Considering the sparsity of the force in the time domain or in other basis spaces, we develop a general sparse regularization method based on minimizing the l1-norm of the coefficient vector of basis functions. The number of basis functions is adaptively determined by minimizing the number of nonzero components in the coefficient vector during the sparse regularization process. First, according to the profile of the unknown force, the dictionary composed of basis functions is determined. Second, a sparsity convex optimization model for force identification is constructed. Third, given the transfer function and the operational response, Sparse reconstruction by separable approximation (SpaRSA) is developed to solve the sparse regularization problem of force identification. Finally, experiments including identification of impact and harmonic forces are conducted on a cantilever thin plate structure to illustrate the effectiveness and applicability of SpaRSA. Besides the Dirac dictionary, three other sparse dictionaries, including Db6 wavelets, Sym4 wavelets and cubic B-spline functions, can also accurately identify both the single and double impact forces from highly noisy responses in a sparse representation frame. The discrete cosine functions can also successfully reconstruct the harmonic forces including the sinusoidal, square and triangular forces. Conversely, the traditional Tikhonov regularization method with the L-curve criterion fails to identify both the impact and harmonic forces in these cases.
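
    A compact sketch of the ℓ1-regularized formulation described here, min ||y − H D c||² + λ||c||₁, solved with a simple ISTA loop in place of SpaRSA; the transfer matrix, the Dirac (identity) dictionary, and the impact-like force below are synthetic placeholders rather than the cantilever-plate experiment:

      import numpy as np
      from scipy.linalg import toeplitz

      # Synthetic transfer matrix H (impulse-response convolution) and a sparse impact force.
      n = 400
      t = np.arange(n) / 1000.0
      h = np.exp(-40 * t) * np.sin(2 * np.pi * 60 * t)          # decaying oscillatory impulse response
      H = toeplitz(h, np.zeros(n))
      f_true = np.zeros(n); f_true[[60, 220]] = [3.0, 1.5]       # two impacts
      rng = np.random.default_rng(0)
      y = H @ f_true + 0.01 * rng.standard_normal(n)

      D = np.eye(n)                                              # Dirac dictionary: coefficients = force samples
      A = H @ D
      lam, step = 0.02, 1.0 / np.linalg.norm(A, 2) ** 2
      c = np.zeros(n)
      for _ in range(500):                                       # ISTA iterations
          c = c + step * A.T @ (y - A @ c)
          c = np.sign(c) * np.maximum(np.abs(c) - step * lam, 0.0)
      print("recovered impact locations:", np.flatnonzero(np.abs(c) > 0.5))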

  6. Newton-based optimization for Kullback-Leibler nonnegative tensor factorizations

    DOE PAGES

    Plantenga, Todd; Kolda, Tamara G.; Hansen, Samantha

    2015-04-30

    Tensor factorizations with nonnegativity constraints have found application in analysing data from cyber traffic, social networks, and other areas. We consider application data best described as being generated by a Poisson process (e.g. count data), which leads to sparse tensors that can be modelled by sparse factor matrices. In this paper, we investigate efficient techniques for computing an appropriate canonical polyadic tensor factorization based on the Kullback–Leibler divergence function. We propose novel subproblem solvers within the standard alternating block variable approach. Our new methods exploit structure and reformulate the optimization problem as small independent subproblems. We employ bound-constrained Newton and quasi-Newton methods. Finally, we compare our algorithms against other codes, demonstrating superior speed for high accuracy results and the ability to quickly find sparse solutions.

  7. An adaptive sparse deconvolution method for distinguishing the overlapping echoes of ultrasonic guided waves for pipeline crack inspection

    NASA Astrophysics Data System (ADS)

    Chang, Yong; Zi, Yanyang; Zhao, Jiyuan; Yang, Zhe; He, Wangpeng; Sun, Hailiang

    2017-03-01

    In guided wave pipeline inspection, echoes reflected from closely spaced reflectors generally overlap, meaning useful information is lost. To solve the overlapping problem, sparse deconvolution methods have been developed in the past decade. However, conventional sparse deconvolution methods have limitations in handling guided wave signals, because the input signal is directly used as the prototype of the convolution matrix, without considering the waveform change caused by the dispersion properties of the guided wave. In this paper, an adaptive sparse deconvolution (ASD) method is proposed to overcome these limitations. First, the Gaussian echo model is employed to adaptively estimate the column prototype of the convolution matrix instead of directly using the input signal as the prototype. Then, the convolution matrix is constructed upon the estimated results. Third, the split augmented Lagrangian shrinkage (SALSA) algorithm is introduced to solve the deconvolution problem with high computational efficiency. To verify the effectiveness of the proposed method, guided wave signals obtained from pipeline inspection are investigated numerically and experimentally. Compared to conventional sparse deconvolution methods, e.g. the {{l}1} -norm deconvolution method, the proposed method shows better performance in handling the echo overlap problem in the guided wave signal.
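
    A brief sketch of the adaptive step that distinguishes this method, estimating a Gaussian echo prototype from the data and using it as the column of the convolution matrix; the envelope/FFT heuristics and the synthetic reference echo below are stand-ins for the authors' parameter estimation, and the subsequent ℓ1 solve (e.g. the ISTA loop sketched in the previous entry, in place of SALSA) is not repeated:

      import numpy as np
      from scipy.linalg import toeplitz
      from scipy.signal import hilbert

      fs = 1e6
      t = np.arange(512) / fs
      rng = np.random.default_rng(0)

      def gaussian_echo(t, amp, tau, fc, bw, phase=0.0):
          """Gaussian-windowed tone burst, a common model for an ultrasonic guided-wave echo."""
          return amp * np.exp(-bw * (t - tau) ** 2) * np.cos(2 * np.pi * fc * (t - tau) + phase)

      # A measured signal containing one isolated reference echo (synthetic here).
      signal = gaussian_echo(t, 1.0, 80e-6, 60e3, 4e8) + 0.02 * rng.standard_normal(t.size)

      # Adaptively estimate the prototype parameters instead of reusing the raw input signal.
      envelope = np.abs(hilbert(signal))
      tau_hat = t[np.argmax(envelope)]                              # arrival time from the envelope peak
      spectrum = np.abs(np.fft.rfft(signal))
      fc_hat = np.fft.rfftfreq(t.size, 1 / fs)[np.argmax(spectrum)] # centre frequency from the spectrum peak
      width = np.sum(envelope > envelope.max() / 2) / fs            # envelope full width at half maximum
      bw_hat = np.log(2) / (width / 2) ** 2                         # Gaussian bandwidth from the FWHM

      # Column prototype of the convolution matrix, centred at t = 0 for deconvolution.
      proto = gaussian_echo(t, 1.0, 0.0, fc_hat, bw_hat)
      H = toeplitz(proto, np.zeros(t.size))
      print(f"tau = {tau_hat*1e6:.1f} us, fc = {fc_hat/1e3:.1f} kHz, H shape {H.shape}")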

  8. Microstructure Images Restoration of Metallic Materials Based upon KSVD and Smoothing Penalty Sparse Representation Approach.

    PubMed

    Li, Qing; Liang, Steven Y

    2018-04-20

    Microstructure images of metallic materials play a significant role in industrial applications. To address the image degradation problem of metallic materials, a novel image restoration technique based on K-means singular value decomposition (KSVD) and a smoothing penalty sparse representation (SPSR) algorithm is proposed in this work; the microstructure images of aluminum alloy 7075 (AA7075) are used as examples. To begin with, to reflect the detailed structural characteristics of the damaged image, the KSVD dictionary is introduced to substitute the traditional sparse transform basis (TSTB) for sparse representation. Then, because image restoration modeling is a highly underdetermined problem, and traditional sparse reconstruction methods may cause instability and obvious artifacts in the reconstructed images, especially when the reconstructed image has many smooth regions and the noise level is strong, the SPSR (here, q = 0.5) algorithm is designed to reconstruct the damaged image. The results of simulation and two practical cases demonstrate that the proposed method has superior performance compared with some state-of-the-art methods in terms of restoration performance factors and visual quality. Meanwhile, the grain size parameters and grain boundaries of the microstructure images are discussed before and after restoration by the proposed method.

  9. Medical image classification based on multi-scale non-negative sparse coding.

    PubMed

    Zhang, Ruijie; Shen, Jian; Wei, Fushan; Li, Xiong; Sangaiah, Arun Kumar

    2017-11-01

    With the rapid development of modern medical imaging technology, medical image classification has become more and more important in medical diagnosis and clinical practice. Conventional medical image classification algorithms usually neglect the semantic gap problem between low-level features and high-level image semantics, which will largely degrade the classification performance. To solve this problem, we propose a multi-scale non-negative sparse coding based medical image classification algorithm. Firstly, medical images are decomposed into multiple scale layers, so that diverse visual details can be extracted from different scale layers. Secondly, for each scale layer, a non-negative sparse coding model with Fisher discriminative analysis is constructed to obtain the discriminative sparse representation of medical images. Then, the obtained multi-scale non-negative sparse coding features are combined to form a multi-scale feature histogram as the final representation for a medical image. Finally, an SVM classifier is applied to conduct medical image classification. The experimental results demonstrate that our proposed algorithm can effectively utilize multi-scale and contextual spatial information of medical images, reduce the semantic gap to a large degree, and improve medical image classification performance. Copyright © 2017 Elsevier B.V. All rights reserved.

  10. Modeling of 2D diffusion processes based on microscopy data: parameter estimation and practical identifiability analysis.

    PubMed

    Hock, Sabrina; Hasenauer, Jan; Theis, Fabian J

    2013-01-01

    Diffusion is a key component of many biological processes such as chemotaxis, developmental differentiation and tissue morphogenesis. Recently, it has become possible to assess the spatial gradients caused by diffusion in vitro and in vivo using microscopy-based imaging techniques. The resulting time series of two-dimensional, high-resolution images, in combination with mechanistic models, enable the quantitative analysis of the underlying mechanisms. However, such a model-based analysis is still challenging due to measurement noise and sparse observations, which result in uncertainties of the model parameters. We introduce a likelihood function for image-based measurements with log-normally distributed noise. Based upon this likelihood function we formulate the maximum likelihood estimation problem, which is solved using PDE-constrained optimization methods. To assess the uncertainty and practical identifiability of the parameters we introduce profile likelihoods for diffusion processes. As proof of concept, we model certain aspects of the guidance of dendritic cells towards lymphatic vessels, an example of haptotaxis. Using a realistic set of artificial measurement data, we estimate the five kinetic parameters of this model and compute profile likelihoods. Our novel approach for the estimation of model parameters from image data, as well as the proposed identifiability analysis approach, is widely applicable to diffusion processes. The profile-likelihood-based method provides more rigorous uncertainty bounds than local approximation methods.
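
    A reduced sketch of the estimation idea, fitting a diffusion coefficient to noisy data by maximum likelihood under log-normal noise and scanning the likelihood over the parameter; the 1-D finite-difference model, the closed-form profiling of the noise scale, and the synthetic data stand in for the paper's PDE-constrained 2-D setting:

      import numpy as np
      from scipy.optimize import minimize_scalar

      def simulate(D, n=100, steps=200, dt=1e-4, dx=0.01):
          """Explicit finite-difference solution of u_t = D u_xx with a Gaussian initial condition.
          With r = D*dt/dx**2 <= 0.5 the scheme is stable and stays positive."""
          x = np.linspace(0, 1, n)
          u = np.exp(-((x - 0.5) ** 2) / 0.005)
          r = D * dt / dx ** 2
          snapshots = []
          for s in range(steps):
              u[1:-1] += r * (u[2:] - 2 * u[1:-1] + u[:-2])
              if (s + 1) % 50 == 0:
                  snapshots.append(u.copy())
          return np.concatenate(snapshots)

      rng = np.random.default_rng(0)
      D_true = 0.2
      data = simulate(D_true) * np.exp(0.05 * rng.standard_normal(400))   # log-normal measurement noise

      def neg_log_lik(D):
          """Negative log-likelihood with the log-normal noise scale profiled out in closed form."""
          resid = np.log(data) - np.log(simulate(D))
          return 0.5 * data.size * np.log(np.mean(resid ** 2))

      fit = minimize_scalar(neg_log_lik, bounds=(0.05, 0.45), method="bounded")
      profile = [(D, neg_log_lik(D)) for D in np.linspace(0.1, 0.35, 11)]   # likelihood scan over D
      print("D_hat =", round(fit.x, 3), "; scan minimum near D =", min(profile, key=lambda p: p[1])[0])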

  11. Application of high performance computing for studying cyclic variability in dilute internal combustion engines

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    FINNEY, Charles E A; Edwards, Kevin Dean; Stoyanov, Miroslav K

    2015-01-01

    Combustion instabilities in dilute internal combustion engines are manifest in cyclic variability (CV) in engine performance measures such as integrated heat release or shaft work. Understanding the factors leading to CV is important in model-based control, especially with high dilution where experimental studies have demonstrated that deterministic effects can become more prominent. Observation of enough consecutive engine cycles for significant statistical analysis is standard in experimental studies but is largely wanting in numerical simulations because of the computational time required to compute hundreds or thousands of consecutive cycles. We have proposed and begun implementation of an alternative approach to allow rapid simulation of long series of engine dynamics based on a low-dimensional mapping of ensembles of single-cycle simulations which map input parameters to output engine performance. This paper details the use of Titan at the Oak Ridge Leadership Computing Facility to investigate CV in a gasoline direct-injected spark-ignited engine with a moderately high rate of dilution achieved through external exhaust gas recirculation. The CONVERGE CFD software was used to perform single-cycle simulations with imposed variations of operating parameters and boundary conditions selected according to a sparse grid sampling of the parameter space. Using an uncertainty quantification technique, the sampling scheme is chosen to be similar to a design-of-experiments grid but uses functions designed to minimize the number of samples required to achieve a desired degree of accuracy. The simulations map input parameters to output metrics of engine performance for a single cycle, and by mapping over a large parameter space, results can be interpolated from within that space. This interpolation scheme forms the basis for a low-dimensional metamodel which can be used to mimic the dynamical behavior of corresponding high-dimensional simulations. Simulations of high-EGR spark-ignition combustion cycles within a parametric sampling grid were performed and analyzed statistically, and sensitivities of the physical factors leading to high CV are presented. With these results, the prospect of producing low-dimensional metamodels to describe engine dynamics at any point in the parameter space will be discussed. Additionally, modifications to the methodology to account for nondeterministic effects in the numerical solution environment are proposed.
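
    A toy sketch of the final metamodel step, mapping sampled single-cycle inputs to an output metric with a cheap surrogate that can then be evaluated over long cycle sequences; the quadratic-feature regression below stands in for the sparse-grid interpolation used on Titan, and the input names and heat_release response are synthetic placeholders:

      import numpy as np
      from sklearn.pipeline import make_pipeline
      from sklearn.preprocessing import PolynomialFeatures
      from sklearn.linear_model import Ridge

      rng = np.random.default_rng(0)

      # Sampled single-cycle inputs, e.g. (EGR fraction, intake temperature, residual fuel energy).
      X = rng.uniform([0.0, 300.0, 0.0], [0.3, 360.0, 1.0], size=(200, 3))

      def heat_release(x):
          """Synthetic stand-in for a CONVERGE single-cycle simulation output."""
          egr, temp, resid = x[:, 0], x[:, 1], x[:, 2]
          return 1000 * (1 - 1.5 * egr) * (temp / 330.0) ** 0.5 + 200 * resid * (1 - egr)

      y = heat_release(X)

      # Low-dimensional metamodel: cheap to evaluate, so long cycle series can be iterated.
      surrogate = make_pipeline(PolynomialFeatures(degree=2), Ridge(alpha=1e-3)).fit(X, y)
      x_next = np.array([[0.18, 345.0, 0.4]])
      print("predicted heat release:", surrogate.predict(x_next)[0])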

  12. Coulomb disorder in three-dimensional Dirac materials

    NASA Astrophysics Data System (ADS)

    Skinner, Brian

    2015-03-01

    In three-dimensional materials with a Dirac spectrum, weak short-ranged disorder is essentially irrelevant near the Dirac point. This is manifestly not the case for Coulomb disorder, where the long-ranged nature of the potential produced by charged impurities implies large fluctuations of the disorder potential even when impurities are sparse, and these fluctuations are screened by the formation of electron/hole puddles. Here I outline a theory of such nonlinear screening of Coulomb disorder in three-dimensional Dirac systems, and present results for the typical magnitude of the disorder potential, the corresponding density of states, and the size and density of electron/hole puddles. The resulting conductivity is also discussed.

  13. Tensor Dictionary Learning for Positive Definite Matrices.

    PubMed

    Sivalingam, Ravishankar; Boley, Daniel; Morellas, Vassilios; Papanikolopoulos, Nikolaos

    2015-11-01

    Sparse models have proven to be extremely successful in image processing and computer vision. However, a majority of the effort has been focused on sparse representation of vectors and low-rank models for general matrices. The success of sparse modeling, along with popularity of region covariances, has inspired the development of sparse coding approaches for these positive definite descriptors. While in earlier work, the dictionary was formed from all, or a random subset of, the training signals, it is clearly advantageous to learn a concise dictionary from the entire training set. In this paper, we propose a novel approach for dictionary learning over positive definite matrices. The dictionary is learned by alternating minimization between sparse coding and dictionary update stages, and different atom update methods are described. A discriminative version of the dictionary learning approach is also proposed, which simultaneously learns dictionaries for different classes in classification or clustering. Experimental results demonstrate the advantage of learning dictionaries from data both from reconstruction and classification viewpoints. Finally, a software library is presented comprising C++ binaries for all the positive definite sparse coding and dictionary learning approaches presented here.

  14. Simulating land surface energy fluxes using a microscopic root water uptake approach in a northern temperate forest

    NASA Astrophysics Data System (ADS)

    He, L.; Ivanov, V. Y.; Schneider, C.

    2012-12-01

    The predictive accuracy of current land surface models has been limited by uncertainties in modeling transpiration and its sensitivity to the plant-available water in the root zone. Models usually distribute vegetation transpiration demand as sink terms in a one-dimensional soil-water accounting model, according to the vertical root density profile. During water-limited situations, the sink terms are constrained using a heuristic "Feddes-type" water stress function. This approach significantly simplifies the actual three-dimensional physical process of root water uptake and may predict an early onset of water-limited transpiration. Recently, a microscopic root water uptake approach was proposed to simulate the three-dimensional radial moisture fluxes from the soil to roots, and water flux transfer processes along the root systems. During dry conditions, this approach permits the compensation of decreased root water uptake in water-stressed regions by increasing uptake density in moister regions. This effect cannot be captured by the Feddes heuristic function. This study "loosely" incorporates the microscopic root water uptake approach based on the aRoot model into the ecohydrological model tRIBS+VEGGIE. The ecohydrological model provides boundary conditions for the microscopic root water uptake model (e.g., potential transpiration, soil evaporation, and precipitation influx), and the latter computes the actual transpiration and profiles of sink terms. Based on the departure of the actual latent heat flux from the potential value, the other energy budget components are adjusted. The study is conducted for a northern temperate mixed forest near the University of Michigan Biological Station. Observational evidence for this site suggests little-to-no control of transpiration by soil moisture, yet the commonly used Feddes-type approach implies severe water limitation on transpiration during dry episodes. The study addresses two species: oak and aspen. The effects of differences in root architecture on actual transpiration are explored. The energy components simulated with the microscopic modeling approach are tested against observational data. Through the improved spatiotemporal representation of the small-scale root water uptake process, the microscopic modeling framework leads to better agreement with the observational data than the Feddes-type approach. During dry periods, relatively high transpiration is sustained, as water uptake regions shift from densely to sparsely rooted layers, or from drier to moister soil areas. Implications and approaches for incorporating microscopic modeling methodologies within large-scale land-surface parameterizations are discussed.

  15. Single shot, three-dimensional fluorescence microscopy with a spatially rotating point spread function

    PubMed Central

    Wang, Zhaojun; Cai, Yanan; Liang, Yansheng; Zhou, Xing; Yan, Shaohui; Dan, Dan; Bianco, Piero R.; Lei, Ming; Yao, Baoli

    2017-01-01

    A wide-field fluorescence microscope with a double-helix point spread function (PSF) is constructed to obtain the specimen’s three-dimensional distribution with a single snapshot. Spiral-phase-based computer-generated holograms (CGHs) are adopted to make the depth-of-field of the microscope adjustable. The impact of system aberrations on the double-helix PSF at high numerical aperture is analyzed to reveal the necessity of aberration correction. A modified cepstrum-based reconstruction scheme is proposed in accordance with the properties of the new double-helix PSF. The extended depth-of-field images and the corresponding depth maps for both a simulated sample and a tilted section slice of bovine pulmonary artery endothelial (BPAE) cells are recovered, verifying that the depth-of-field is properly extended and that the depth of the specimen can be estimated with a precision of 23.4 nm. This three-dimensional fluorescence microscope with frame-rate time resolution is suitable for studying the fast developing processes of thin and sparsely distributed micron-scale cells over an extended depth-of-field. PMID:29296483

  16. Two-dimensional antimonene single crystals grown by van der Waals epitaxy.

    PubMed

    Ji, Jianping; Song, Xiufeng; Liu, Jizi; Yan, Zhong; Huo, Chengxue; Zhang, Shengli; Su, Meng; Liao, Lei; Wang, Wenhui; Ni, Zhenhua; Hao, Yufeng; Zeng, Haibo

    2016-11-15

    Unlike the unstable black phosphorus, another two-dimensional group-VA material, antimonene, was recently predicted to exhibit good stability and remarkable physical properties. However, the synthesis of high-quality monolayer or few-layer antimonenes, sparsely reported, has greatly hindered the development of this new field. Here, we report the van der Waals epitaxy growth of few-layer antimonene monocrystalline polygons, their atomic microstructure, and their stability in ambient conditions. The high-quality, few-layer antimonene monocrystalline polygons can be synthesized on various substrates, including flexible ones, via van der Waals epitaxy growth. Raman spectroscopy and transmission electron microscopy reveal that the obtained antimonene polygons have a buckled rhombohedral atomic structure, consistent with the theoretically predicted most stable β-phase allotrope. Very high stability of the antimonenes was observed after aging in air for 30 days. First-principles and molecular dynamics simulation results confirmed that, compared with phosphorene, antimonene is less likely to be oxidized and possesses higher thermodynamic stability in an oxygen atmosphere at room temperature. Moreover, antimonene polygons show high electrical conductivity, up to 10⁴ S m⁻¹, and good optical transparency in the visible light range, promising for transparent conductive electrode applications.

  17. Two-dimensional antimonene single crystals grown by van der Waals epitaxy

    PubMed Central

    Ji, Jianping; Song, Xiufeng; Liu, Jizi; Yan, Zhong; Huo, Chengxue; Zhang, Shengli; Su, Meng; Liao, Lei; Wang, Wenhui; Ni, Zhenhua; Hao, Yufeng; Zeng, Haibo

    2016-01-01

    Unlike the unstable black phosphorus, another two-dimensional group-VA material, antimonene, was recently predicted to exhibit good stability and remarkable physical properties. However, the synthesis of high-quality monolayer or few-layer antimonenes, sparsely reported, has greatly hindered the development of this new field. Here, we report the van der Waals epitaxy growth of few-layer antimonene monocrystalline polygons, their atomic microstructure, and their stability in ambient conditions. The high-quality, few-layer antimonene monocrystalline polygons can be synthesized on various substrates, including flexible ones, via van der Waals epitaxy growth. Raman spectroscopy and transmission electron microscopy reveal that the obtained antimonene polygons have a buckled rhombohedral atomic structure, consistent with the theoretically predicted most stable β-phase allotrope. Very high stability of the antimonenes was observed after aging in air for 30 days. First-principles and molecular dynamics simulation results confirmed that, compared with phosphorene, antimonene is less likely to be oxidized and possesses higher thermodynamic stability in an oxygen atmosphere at room temperature. Moreover, antimonene polygons show high electrical conductivity, up to 10⁴ S m⁻¹, and good optical transparency in the visible light range, promising for transparent conductive electrode applications. PMID:27845327

  18. Relaxations to Sparse Optimization Problems and Applications

    NASA Astrophysics Data System (ADS)

    Skau, Erik West

    Parsimony is a fundamental property that is applied to many characteristics in a variety of fields. Of particular interest are optimization problems that apply rank, dimensionality, or support in a parsimonious manner. In this thesis we study some optimization problems and their relaxations, and focus on properties and qualities of the solutions of these problems. The Gramian tensor decomposition problem attempts to decompose a symmetric tensor as a sum of rank one tensors. We approach the Gramian tensor decomposition problem with a relaxation to a semidefinite program. We study conditions which ensure that the solution of the relaxed semidefinite problem gives the minimal Gramian rank decomposition. Sparse representations with learned dictionaries are one of the leading image modeling techniques for image restoration. When learning these dictionaries from a set of training images, the sparsity parameter of the dictionary learning algorithm strongly influences the content of the dictionary atoms. We describe geometrically the content of trained dictionaries and how it changes with the sparsity parameter. We use statistical analysis to characterize how the different content is used in sparse representations. Finally, a method to control the structure of the dictionaries is demonstrated, allowing us to learn a dictionary which can later be tailored for specific applications. Variations of dictionary learning can be broadly applied to a variety of applications. We explore a pansharpening problem with a triple factorization variant of coupled dictionary learning. Another application of dictionary learning is computer vision. Computer vision relies heavily on object detection, which we explore with a hierarchical convolutional dictionary learning model. Data fusion of disparate modalities is a growing topic of interest. We do a case study to demonstrate the benefit of using social media data with satellite imagery to estimate hazard extents. In this case study analysis we apply a maximum entropy model, guided by the social media data, to estimate the flooded regions during a 2013 flood in Boulder, CO and show that the results are comparable to those obtained using expert information.

  19. Towards seismic waveform inversion of long-offset Ocean-Bottom Seismic data for deep crustal imaging offshore Western Australia

    NASA Astrophysics Data System (ADS)

    Monnier, S.; Lumley, D. E.; Kamei, R.; Goncharov, A.; Shragge, J. C.

    2016-12-01

    Ocean Bottom Seismic datasets have become increasingly used in recent years to develop high-resolution, wavelength-scale P-wave velocity models of the lithosphere from waveform inversion, due to their recording of long-offset transmitted phases. New OBS surveys evolve towards novel acquisition geometries involving longer offsets (several hundreds of km) and broader frequency content (1-100 Hz), while receiver sampling often remains sparse (several km). Therefore, it is critical to assess the effects of such geometries on the eventual success and resolution of waveform inversion velocity models. In this study, we investigate the feasibility of waveform inversion on the Bart 2D OBS profile acquired offshore Western Australia, to investigate regional crustal and Moho structures. The dataset features 14 broadband seismometers (0.01-100 Hz) from AuScope's national OBS fleet, offsets in excess of 280 km, and a sparse receiver sampling (18 km). We perform our analysis in four stages: (1) field data analysis, (2) 2D P-wave velocity model building, (3) synthetic data modelling, and (4) waveform inversion. Data exploration shows high-quality active-source signal down to 2 Hz, and usable first arrivals to offsets greater than 100 km. The background velocity model is constructed by combining crustal and Moho information in continental reference models (e.g., AuSREM, AusMoho). These low-resolution studies suggest a crustal thickness of 20-25 km along our seismic line and constitute a starting point for synthetic modelling and inversion. We perform synthetic 2D time-domain modelling to: (1) evaluate the misfit between synthetic and field data within the usable frequency band (2-10 Hz); (2) validate our velocity model; and (3) observe the effects of sparse OBS spacing on data quality. Finally, we apply 2D acoustic frequency-domain waveform inversion to the synthetic data to generate velocity model updates. The inverted model is compared to the reference model to investigate the improved crustal resolution and Moho boundary delineation that could be realized using waveform inversion, and to evaluate the effects of the acquisition parameters. The inversion strategies developed through the synthetic tests will help the subsequent inversion of sparse, long-offset OBS field data.

  20. Energy Balanced Strategies for Maximizing the Lifetime of Sparsely Deployed Underwater Acoustic Sensor Networks

    PubMed Central

    Luo, Hanjiang; Guo, Zhongwen; Wu, Kaishun; Hong, Feng; Feng, Yuan

    2009-01-01

    Underwater acoustic sensor networks (UWA-SNs) are envisioned to perform monitoring tasks over the large portion of the world covered by oceans. Due to economics and the large area of the ocean, UWA-SNs are mainly sparsely deployed networks nowadays. Limited battery resources are a big challenge for the deployment of such long-term sensor networks. Unbalanced battery energy consumption will lead to early energy depletion of nodes, which partitions the whole network and impairs the integrity of the monitoring datasets, or even results in the collapse of the entire network. On the contrary, balanced energy dissipation of nodes can prolong the lifetime of such networks. In this paper, we focus on the energy balance dissipation problem of two types of sparsely deployed UWA-SNs: underwater moored monitoring systems and sparsely deployed two-dimensional UWA-SNs. We first analyze the reasons for unbalanced energy consumption in such networks, then we propose two energy balanced strategies to maximize the lifetime of networks both in shallow and deep water. Finally, we evaluate our methods by simulations and the results show that the two strategies can achieve balanced energy consumption per node while at the same time prolonging the network lifetime. PMID:22399970

  1. Automatic Reconstruction of Spacecraft 3D Shape from Imagery

    NASA Astrophysics Data System (ADS)

    Poelman, C.; Radtke, R.; Voorhees, H.

    We describe a system that computes the three-dimensional (3D) shape of a spacecraft from a sequence of uncalibrated, two-dimensional images. While the mathematics of multi-view geometry is well understood, building a system that accurately recovers 3D shape from real imagery remains an art. A novel aspect of our approach is the combination of algorithms from computer vision, photogrammetry, and computer graphics. We demonstrate our system by computing spacecraft models from imagery taken by the Air Force Research Laboratory's XSS-10 satellite and DARPA's Orbital Express satellite. Using feature tie points (each identified in two or more images), we compute the relative motion of each frame and the 3D location of each feature using iterative linear factorization followed by non-linear bundle adjustment. The "point cloud" that results from this traditional shape-from-motion approach is typically too sparse to generate a detailed 3D model. Therefore, we use the computed motion solution as input to a volumetric silhouette-carving algorithm, which constructs a solid 3D model based on viewpoint consistency with the image frames. The resulting voxel model is then converted to a facet-based surface representation and is texture-mapped, yielding realistic images from arbitrary viewpoints. We also illustrate other applications of the algorithm, including 3D mensuration and stereoscopic 3D movie generation.
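
    As a rough sketch of the factorization step mentioned above (the full pipeline adds non-linear bundle adjustment and silhouette carving, which are not reproduced here), the code below recovers affine motion and shape from a centred measurement matrix of tracked tie points via a rank-3 SVD; the measurement matrix is synthetic and all dimensions are illustrative.

```python
import numpy as np

def factorize_measurements(W):
    """Affine shape-from-motion factorization (Tomasi-Kanade style, no metric upgrade).

    W: (2F, P) matrix of image coordinates for F frames and P tracked tie points.
    Returns M (2F, 3), the motion matrix, and S (3, P), the 3D shape, up to an
    affine ambiguity.
    """
    W = W - W.mean(axis=1, keepdims=True)            # remove per-row translation
    U, s, Vt = np.linalg.svd(W, full_matrices=False)
    # Under the affine camera model the centred measurement matrix is (nearly) rank 3.
    M = U[:, :3] * np.sqrt(s[:3])
    S = np.sqrt(s[:3])[:, None] * Vt[:3]
    return M, S

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    S_true = rng.standard_normal((3, 50))                        # 50 3D feature points
    M_true = rng.standard_normal((12, 3))                        # 6 frames, 2 rows each
    W = M_true @ S_true + 0.01 * rng.standard_normal((12, 50))   # noisy measurements
    M, S = factorize_measurements(W)
    W_centred = W - W.mean(axis=1, keepdims=True)
    print("rank-3 fit error:", np.linalg.norm(W_centred - M @ S))
```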

  2. The effects of missing data on global ozone estimates

    NASA Technical Reports Server (NTRS)

    Drewry, J. W.; Robbins, J. L.

    1981-01-01

    The effects of missing data and model truncation on estimates of the global mean, zonal distribution, and global distribution of ozone are considered. It is shown that missing data can introduce biased estimates with errors that are not accounted for in the accuracy calculations of empirical modeling techniques. Data-fill techniques are introduced and used for evaluating error bounds and constraining the estimate in areas of sparse and missing data. It is found that the accuracy of the global mean estimate is more dependent on data distribution than model size. Zonal features can be accurately described by 7th order models over regions of adequate data distribution. Data variance accounted for by higher order models appears to represent climatological features of columnar ozone rather than pure error. Data-fill techniques can prevent artificial feature generation in regions of sparse or missing data without degrading high order estimates over dense data regions.

  3. [Analysis of vegetation spatial and temporal variations in Qinghai Province based on remote sensing].

    PubMed

    Wang, Li-wen; Wei, Ya-xing; Niu, Zheng

    2008-06-01

    1 km MODIS NDVI time series data, combined with decision tree classification, supervised classification, and unsupervised classification, were used to classify the land cover of Qinghai Province into 14 classes. In our classification system, sparse grassland and sparse shrub were emphasized, and their spatial distribution locations were labeled. From the digital elevation model (DEM) of Qinghai Province, five elevation belts were derived, and geographic information system (GIS) software was used to analyze vegetation cover variation across the elevation belts. Our results show that vegetation cover in Qinghai Province improved over the five-year study period. The vegetated area increased from 370,047 km2 in 2001 to 374,576 km2 in 2006, and the vegetation cover rate increased by 0.63%. Among the five elevation belts, the vegetation cover ratio of the high mountain belt is the highest (67.92%). Middle-density grassland in the high mountain belt occupies the largest area (94,003 km2), and the increase in dense grassland area is also greatest in the high mountain belt (1,280 km2). Over the five years, the largest change was the conversion of sparse grassland to middle-density grassland in the high mountain belt, covering 15,931 km2.

  4. Reweighted mass center based object-oriented sparse subspace clustering for hyperspectral images

    NASA Astrophysics Data System (ADS)

    Zhai, Han; Zhang, Hongyan; Zhang, Liangpei; Li, Pingxiang

    2016-10-01

    Considering the inevitable obstacles faced by the pixel-based clustering methods, such as salt-and-pepper noise, high computational complexity, and the lack of spatial information, a reweighted mass center based object-oriented sparse subspace clustering (RMC-OOSSC) algorithm for hyperspectral images (HSIs) is proposed. First, the mean-shift segmentation method is utilized to oversegment the HSI to obtain meaningful objects. Second, a distance reweighted mass center learning model is presented to extract the representative and discriminative features for each object. Third, assuming that all the objects are sampled from a union of subspaces, it is natural to apply the SSC algorithm to the HSI. Faced with the high correlation among the hyperspectral objects, a weighting scheme is adopted to ensure that the highly correlated objects are preferred in the procedure of sparse representation, to reduce the representation errors. Two widely used hyperspectral datasets were utilized to test the performance of the proposed RMC-OOSSC algorithm, obtaining high clustering accuracies (overall accuracy) of 71.98% and 89.57%, respectively. The experimental results show that the proposed method clearly improves the clustering performance with respect to the other state-of-the-art clustering methods, and it significantly reduces the computational time.
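
    A minimal sketch of the underlying sparse subspace clustering step assumed by this approach: each object-level feature vector is expressed as a sparse combination of the others, and the resulting coefficients define an affinity that is spectrally clustered. The distance-reweighted mass-center features and weighting scheme of RMC-OOSSC are not included, and the regularization value is illustrative.

```python
import numpy as np
from sklearn.linear_model import Lasso
from sklearn.cluster import SpectralClustering

def sparse_subspace_clustering(X, n_clusters, alpha=0.01):
    """Basic SSC: sparse-code each sample over the others, then cluster the affinity.

    X: (n_samples, n_features) matrix of object-level feature vectors.
    """
    n = X.shape[0]
    C = np.zeros((n, n))
    for i in range(n):
        idx = np.arange(n) != i                       # exclude the sample itself
        lasso = Lasso(alpha=alpha, fit_intercept=False, max_iter=5000)
        lasso.fit(X[idx].T, X[i])                     # X[i] ~ X[others].T @ c, c sparse
        C[i, idx] = lasso.coef_
    A = np.abs(C) + np.abs(C).T                       # symmetric affinity matrix
    return SpectralClustering(n_clusters=n_clusters, affinity="precomputed",
                              random_state=0).fit_predict(A)

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    # Two toy 2D subspaces embedded in 20 dimensions, 40 objects each.
    B1, B2 = rng.standard_normal((20, 2)), rng.standard_normal((20, 2))
    X = np.vstack([(B1 @ rng.standard_normal((2, 40))).T,
                   (B2 @ rng.standard_normal((2, 40))).T])
    print(sparse_subspace_clustering(X, n_clusters=2))
```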

  5. A High Performance Block Eigensolver for Nuclear Configuration Interaction Calculations

    DOE PAGES

    Aktulga, Hasan Metin; Afibuzzaman, Md.; Williams, Samuel; ...

    2017-06-01

    As on-node parallelism increases and the performance gap between the processor and the memory system widens, achieving high performance in large-scale scientific applications requires an architecture-aware design of algorithms and solvers. We focus on the eigenvalue problem arising in nuclear Configuration Interaction (CI) calculations, where a few extreme eigenpairs of a sparse symmetric matrix are needed. Here, we consider a block iterative eigensolver whose main computational kernels are the multiplication of a sparse matrix with multiple vectors (SpMM), and tall-skinny matrix operations. We then present techniques to significantly improve the SpMM and the transpose operation SpMM^T by using the compressed sparse blocks (CSB) format. We achieve 3-4× speedup on the requisite operations over good implementations with the commonly used compressed sparse row (CSR) format. We develop a performance model that allows us to correctly estimate the performance of our SpMM kernel implementations, and we identify cache bandwidth as a potential performance bottleneck beyond DRAM. We also analyze and optimize the performance of LOBPCG kernels (inner product and linear combinations on multiple vectors) and show up to 15× speedup over using high performance BLAS libraries for these operations. The resulting high performance LOBPCG solver achieves 1.4× to 1.8× speedup over the existing Lanczos solver on a series of CI computations on high-end multicore architectures (Intel Xeons). We also analyze the performance of our techniques on an Intel Xeon Phi Knights Corner (KNC) processor.
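
    The kernel being optimized here, SpMM, multiplies a sparse matrix by a tall-skinny block of vectors in one pass instead of issuing many separate sparse matrix-vector products. A small SciPy sketch of that contrast, using the stock CSR format rather than the CSB format discussed above; matrix size, density, and block width are illustrative.

```python
import numpy as np
import scipy.sparse as sp

n, nev, density = 200_000, 8, 1e-5
A = sp.random(n, n, density=density, format="csr", random_state=0)
A = (A + A.T) * 0.5                        # symmetrize, as for a CI Hamiltonian
V = np.random.default_rng(0).standard_normal((n, nev))         # tall-skinny block

Y_block = A @ V                            # one SpMM: a single sweep over A
Y_loop = np.column_stack([A @ V[:, j] for j in range(nev)])    # nev separate SpMVs
assert np.allclose(Y_block, Y_loop)
print(Y_block.shape)
```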

  6. A High Performance Block Eigensolver for Nuclear Configuration Interaction Calculations

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Aktulga, Hasan Metin; Afibuzzaman, Md.; Williams, Samuel

    As on-node parallelism increases and the performance gap between the processor and the memory system widens, achieving high performance in large-scale scientific applications requires an architecture-aware design of algorithms and solvers. We focus on the eigenvalue problem arising in nuclear Configuration Interaction (CI) calculations, where a few extreme eigenpairs of a sparse symmetric matrix are needed. Here, we consider a block iterative eigensolver whose main computational kernels are the multiplication of a sparse matrix with multiple vectors (SpMM), and tall-skinny matrix operations. We then present techniques to significantly improve the SpMM and the transpose operation SpMM^T by using the compressed sparse blocks (CSB) format. We achieve 3-4× speedup on the requisite operations over good implementations with the commonly used compressed sparse row (CSR) format. We develop a performance model that allows us to correctly estimate the performance of our SpMM kernel implementations, and we identify cache bandwidth as a potential performance bottleneck beyond DRAM. We also analyze and optimize the performance of LOBPCG kernels (inner product and linear combinations on multiple vectors) and show up to 15× speedup over using high performance BLAS libraries for these operations. The resulting high performance LOBPCG solver achieves 1.4× to 1.8× speedup over the existing Lanczos solver on a series of CI computations on high-end multicore architectures (Intel Xeons). We also analyze the performance of our techniques on an Intel Xeon Phi Knights Corner (KNC) processor.

  7. Separable projection integrals for higher-order correlators of the cosmic microwave sky: Acceleration by factors exceeding 100

    NASA Astrophysics Data System (ADS)

    Briggs, J. P.; Pennycook, S. J.; Fergusson, J. R.; Jäykkä, J.; Shellard, E. P. S.

    2016-04-01

    We present a case study describing efforts to optimise and modernise "Modal", the simulation and analysis pipeline used by the Planck satellite experiment for constraining general non-Gaussian models of the early universe via the bispectrum (or three-point correlator) of the cosmic microwave background radiation. We focus on one particular element of the code: the projection of bispectra from the end of inflation to the spherical shell at decoupling, which defines the CMB we observe today. This code involves a three-dimensional inner product between two functions, one of which requires an integral, on a non-rectangular domain containing a sparse grid. We show that by employing separable methods this calculation can be reduced to a one-dimensional summation plus two integrations, reducing the overall dimensionality from four to three. The introduction of separable functions also solves the issue of the non-rectangular sparse grid. This separable method can become unstable in certain scenarios and so the slower non-separable integral must be calculated instead. We present a discussion of the optimisation of both approaches. We demonstrate significant speed-ups of ≈100×, arising from a combination of algorithmic improvements and architecture-aware optimisations targeted at improving thread and vectorisation behaviour. The resulting MPI/OpenMP hybrid code is capable of executing on clusters containing processors and/or coprocessors, with strong-scaling efficiency of 98.6% on up to 16 nodes. We find that a single coprocessor outperforms two processor sockets by a factor of 1.3× and that running the same code across a combination of both microarchitectures improves performance-per-node by a factor of 3.38×. By making bispectrum calculations competitive with those for the power spectrum (or two-point correlator) we are now able to consider joint analysis for cosmological science exploitation of new data.

  8. Variational Bayesian Learning for Wavelet Independent Component Analysis

    NASA Astrophysics Data System (ADS)

    Roussos, E.; Roberts, S.; Daubechies, I.

    2005-11-01

    In an exploratory approach to data analysis, it is often useful to consider the observations as generated from a set of latent generators or "sources" via a generally unknown mapping. For the noisy overcomplete case, where we have more sources than observations, the problem becomes extremely ill-posed. Solutions to such inverse problems can, in many cases, be achieved by incorporating prior knowledge about the problem, captured in the form of constraints. This setting is a natural candidate for the application of the Bayesian methodology, allowing us to incorporate "soft" constraints in a natural manner. The work described in this paper is mainly driven by problems in functional magnetic resonance imaging of the brain, for the neuro-scientific goal of extracting relevant "maps" from the data. This can be stated as a `blind' source separation problem. Recent experiments in the field of neuroscience show that these maps are sparse, in some appropriate sense. The separation problem can be solved by independent component analysis (ICA), viewed as a technique for seeking sparse components, assuming appropriate distributions for the sources. We derive a hybrid wavelet-ICA model, transforming the signals into a domain where the modeling assumption of sparsity of the coefficients with respect to a dictionary is natural. We follow a graphical modeling formalism, viewing ICA as a probabilistic generative model. We use hierarchical source and mixing models and apply Bayesian inference to the problem. This allows us to perform model selection in order to infer the complexity of the representation, as well as automatic denoising. Since exact inference and learning in such a model is intractable, we follow a variational Bayesian mean-field approach in the conjugate-exponential family of distributions, for efficient unsupervised learning in multi-dimensional settings. The performance of the proposed algorithm is demonstrated on some representative experiments.

  9. Exploring nonlinear feature space dimension reduction and data representation in breast CADx with Laplacian eigenmaps and t-SNE

    PubMed Central

    Jamieson, Andrew R.; Giger, Maryellen L.; Drukker, Karen; Li, Hui; Yuan, Yading; Bhooshan, Neha

    2010-01-01

    Purpose: In this preliminary study, recently developed unsupervised nonlinear dimension reduction (DR) and data representation techniques were applied to computer-extracted breast lesion feature spaces across three separate imaging modalities: Ultrasound (U.S.) with 1126 cases, dynamic contrast enhanced magnetic resonance imaging with 356 cases, and full-field digital mammography with 245 cases. Two methods for nonlinear DR were explored: Laplacian eigenmaps [M. Belkin and P. Niyogi, “Laplacian eigenmaps for dimensionality reduction and data representation,” Neural Comput. 15, 1373–1396 (2003)] and t-distributed stochastic neighbor embedding (t-SNE) [L. van der Maaten and G. Hinton, “Visualizing data using t-SNE,” J. Mach. Learn. Res. 9, 2579–2605 (2008)]. Methods: These methods attempt to map originally high dimensional feature spaces to more human interpretable lower dimensional spaces while preserving both local and global information. The properties of these methods as applied to breast computer-aided diagnosis (CADx) were evaluated in the context of malignancy classification performance as well as in the visual inspection of the sparseness within the two-dimensional and three-dimensional mappings. Classification performance was estimated by using the reduced dimension mapped feature output as input into both linear and nonlinear classifiers: Markov chain Monte Carlo based Bayesian artificial neural network (MCMC-BANN) and linear discriminant analysis. The new techniques were compared to previously developed breast CADx methodologies, including automatic relevance determination and linear stepwise (LSW) feature selection, as well as a linear DR method based on principal component analysis. Using ROC analysis and 0.632+ bootstrap validation, 95% empirical confidence intervals were computed for each classifier’s AUC performance. Results: In the large U.S. data set, sample high-performance results include AUC_0.632+ = 0.88 with 95% empirical bootstrap interval [0.787;0.895] for 13 ARD selected features and AUC_0.632+ = 0.87 with interval [0.817;0.906] for four LSW selected features, compared to 4D t-SNE mapping (from the original 81D feature space) giving AUC_0.632+ = 0.90 with interval [0.847;0.919], all using the MCMC-BANN. Conclusions: Preliminary results appear to indicate capability for the new methods to match or exceed classification performance of current advanced breast lesion CADx algorithms. While not appropriate as a complete replacement of feature selection in CADx problems, DR techniques offer a complementary approach, which can aid elucidation of additional properties associated with the data. Specifically, the new techniques were shown to possess the added benefit of delivering sparse lower dimensional representations for visual interpretation, revealing intricate data structure of the feature space. PMID:20175497
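
    For orientation, both nonlinear DR methods evaluated here are available off the shelf; the sketch below maps a hypothetical feature matrix (random values standing in for the computer-extracted lesion features) to two dimensions with Laplacian eigenmaps (scikit-learn's SpectralEmbedding) and with t-SNE. The neighborhood and perplexity settings are illustrative, not those used in the study.

```python
import numpy as np
from sklearn.manifold import SpectralEmbedding, TSNE

rng = np.random.default_rng(0)
X = rng.standard_normal((500, 81))     # 500 cases, 81 original CADx-style features

# Laplacian eigenmaps: eigenvectors of the graph Laplacian of a kNN graph.
le_2d = SpectralEmbedding(n_components=2, n_neighbors=15).fit_transform(X)

# t-SNE: perplexity controls the effective neighborhood size.
tsne_2d = TSNE(n_components=2, perplexity=30, init="pca",
               random_state=0).fit_transform(X)

# Either low-dimensional mapping can replace the original feature vector as
# input to a downstream classifier such as LDA or a Bayesian neural network.
print(le_2d.shape, tsne_2d.shape)
```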

  10. Sparse recovery of undersampled intensity patterns for coherent diffraction imaging at high X-ray energies

    DOE PAGES

    Maddali, S.; Calvo-Almazan, I.; Almer, J.; ...

    2018-03-21

    Coherent X-ray photons with energies higher than 50 keV offer new possibilities for imaging nanoscale lattice distortions in bulk crystalline materials using Bragg peak phase retrieval methods. However, the compression of reciprocal space at high energies typically results in poorly resolved fringes on an area detector, rendering the diffraction data unsuitable for the three-dimensional reconstruction of compact crystals. To address this problem, we propose a method by which to recover fine fringe detail in the scattered intensity. This recovery is achieved in two steps: multiple undersampled measurements are made by in-plane sub-pixel motion of the area detector, then this data set is passed to a sparsity-based numerical solver that recovers fringe detail suitable for standard Bragg coherent diffraction imaging (BCDI) reconstruction methods of compact single crystals. The key insight of this paper is that sparsity in a BCDI data set can be enforced by recognising that the signal in the detector, though poorly resolved, is band-limited. This requires fewer in-plane detector translations for complete signal recovery, while adhering to information theory limits. Lastly, we use simulated BCDI data sets to demonstrate the approach, outline our sparse recovery strategy, and comment on future opportunities.

  11. Sparse recovery of undersampled intensity patterns for coherent diffraction imaging at high X-ray energies

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Maddali, S.; Calvo-Almazan, I.; Almer, J.

    Coherent X-ray photons with energies higher than 50 keV offer new possibilities for imaging nanoscale lattice distortions in bulk crystalline materials using Bragg peak phase retrieval methods. However, the compression of reciprocal space at high energies typically results in poorly resolved fringes on an area detector, rendering the diffraction data unsuitable for the three-dimensional reconstruction of compact crystals. To address this problem, we propose a method by which to recover fine fringe detail in the scattered intensity. This recovery is achieved in two steps: multiple undersampled measurements are made by in-plane sub-pixel motion of the area detector, then this data set is passed to a sparsity-based numerical solver that recovers fringe detail suitable for standard Bragg coherent diffraction imaging (BCDI) reconstruction methods of compact single crystals. The key insight of this paper is that sparsity in a BCDI data set can be enforced by recognising that the signal in the detector, though poorly resolved, is band-limited. This requires fewer in-plane detector translations for complete signal recovery, while adhering to information theory limits. Lastly, we use simulated BCDI data sets to demonstrate the approach, outline our sparse recovery strategy, and comment on future opportunities.

  12. Sparse recovery of undersampled intensity patterns for coherent diffraction imaging at high X-ray energies.

    PubMed

    Maddali, S; Calvo-Almazan, I; Almer, J; Kenesei, P; Park, J-S; Harder, R; Nashed, Y; Hruszkewycz, S O

    2018-03-21

    Coherent X-ray photons with energies higher than 50 keV offer new possibilities for imaging nanoscale lattice distortions in bulk crystalline materials using Bragg peak phase retrieval methods. However, the compression of reciprocal space at high energies typically results in poorly resolved fringes on an area detector, rendering the diffraction data unsuitable for the three-dimensional reconstruction of compact crystals. To address this problem, we propose a method by which to recover fine fringe detail in the scattered intensity. This recovery is achieved in two steps: multiple undersampled measurements are made by in-plane sub-pixel motion of the area detector, then this data set is passed to a sparsity-based numerical solver that recovers fringe detail suitable for standard Bragg coherent diffraction imaging (BCDI) reconstruction methods of compact single crystals. The key insight of this paper is that sparsity in a BCDI data set can be enforced by recognising that the signal in the detector, though poorly resolved, is band-limited. This requires fewer in-plane detector translations for complete signal recovery, while adhering to information theory limits. We use simulated BCDI data sets to demonstrate the approach, outline our sparse recovery strategy, and comment on future opportunities.

  13. Optical coherence tomography retinal image reconstruction via nonlocal weighted sparse representation

    NASA Astrophysics Data System (ADS)

    Abbasi, Ashkan; Monadjemi, Amirhassan; Fang, Leyuan; Rabbani, Hossein

    2018-03-01

    We present a nonlocal weighted sparse representation (NWSR) method for reconstruction of retinal optical coherence tomography (OCT) images. To reconstruct high signal-to-noise ratio, high-resolution OCT images, efficient denoising and interpolation algorithms are necessary, especially when the original data were subsampled during acquisition. However, OCT images suffer from the presence of a high level of noise, which makes the estimation of sparse representations a difficult task. Thus, the proposed NWSR method merges sparse representations of multiple similar noisy and denoised patches to better estimate a sparse representation for each patch. First, the sparse representation of each patch is independently computed over an overcomplete dictionary, and then a nonlocal weighted sparse coefficient is computed by averaging representations of similar patches. Since the sparsity can reveal relevant information from noisy patches, combining noisy and denoised patches' representations is beneficial to obtain a more robust estimate of the unknown sparse representation. The denoised patches are obtained by applying an off-the-shelf image denoising method, and our method provides an efficient way to exploit information from noisy and denoised patches' representations. The experimental results on denoising and interpolation of spectral domain OCT images demonstrated the effectiveness of the proposed NWSR method over existing state-of-the-art methods.
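
    A simplified sketch of the nonlocal weighting idea, assuming vectorized patches and a given overcomplete dictionary: each patch is sparse-coded independently (here with plain OMP rather than the paper's exact coding scheme) and its code is then replaced by a similarity-weighted average over the codes of the other patches. The Gaussian weighting and all parameter values are illustrative.

```python
import numpy as np
from sklearn.linear_model import OrthogonalMatchingPursuit

def nonlocal_weighted_codes(patches, D, n_nonzero=8, sigma=0.1):
    """Merge per-patch sparse codes into nonlocal weighted estimates.

    patches: (n_patches, patch_dim) vectorized (noisy and/or denoised) patches.
    D:       (patch_dim, n_atoms) overcomplete dictionary.
    """
    omp = OrthogonalMatchingPursuit(n_nonzero_coefs=n_nonzero, fit_intercept=False)
    codes = np.array([omp.fit(D, p).coef_ for p in patches])   # independent sparse codes

    merged = np.empty_like(codes)
    for i, p in enumerate(patches):
        d2 = np.sum((patches - p) ** 2, axis=1)                # squared patch distances
        w = np.exp(-d2 / (2 * sigma ** 2))
        w /= w.sum()
        merged[i] = w @ codes                                  # weighted average of codes
    return merged                     # reconstruct patch i as D @ merged[i]
```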

  14. On the feasibility of real-time mapping of the geoelectric field across North America

    USGS Publications Warehouse

    Love, Jeffrey J.; Rigler, E. Joshua; Kelbert, Anna; Finn, Carol A.; Bedrosian, Paul A.; Balch, Christopher C.

    2018-06-08

    A review is given of the present feasibility for accurately mapping geoelectric fields across North America in near-realtime by modeling geomagnetic monitoring and magnetotelluric survey data. Should this capability be successfully developed, it could inform utility companies of magnetic-storm interference on electric-power-grid systems. That real-time mapping of geoelectric fields is a challenge is reflective of (1) the spatiotemporal complexity of geomagnetic variation, especially during magnetic storms, (2) the sparse distribution of ground-based geomagnetic monitoring stations that report data in realtime, (3) the spatial complexity of three-dimensional solid-Earth impedance, and (4) the geographically incomplete state of continental-scale magnetotelluric surveys.

  15. Disentangling giant component and finite cluster contributions in sparse random matrix spectra.

    PubMed

    Kühn, Reimer

    2016-04-01

    We describe a method for disentangling giant component and finite cluster contributions to sparse random matrix spectra, using sparse symmetric random matrices defined on Erdős-Rényi graphs as an example and test bed. Our methods apply to sparse matrices defined in terms of arbitrary graphs in the configuration model class, as long as they have finite mean degree.
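
    A small sketch of the test bed described above: a sparse symmetric random matrix on an Erdős-Rényi graph with finite mean degree, whose spectrum can be separated empirically into the giant-component and finite-cluster contributions by first labelling connected components. The analytical disentangling method of the paper is not reproduced; size and mean degree are illustrative.

```python
import numpy as np
import scipy.sparse as sp
from scipy.sparse.csgraph import connected_components

n, c = 2000, 2.0                                   # size and approximate mean degree
A = sp.random(n, n, density=c / n, format="csr", random_state=0)
A = sp.triu(A, k=1, format="csr")                  # keep one triangle, empty diagonal
A = A + A.T                                        # sparse symmetric matrix on an ER-like graph

n_comp, labels = connected_components(A, directed=False)
sizes = np.bincount(labels)
giant = np.flatnonzero(labels == np.argmax(sizes))
rest = np.flatnonzero(labels != np.argmax(sizes))

# Diagonalize the giant component and the union of finite clusters separately.
eig_giant = np.linalg.eigvalsh(A[giant][:, giant].toarray())
eig_finite = np.linalg.eigvalsh(A[rest][:, rest].toarray())
print(f"{n_comp} components; giant component holds {sizes.max()} of {n} nodes")
```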

  16. Intermediate Scale Experimental Design to Validate a Subsurface Inverse Theory Applicable to Data-sparse Conditions

    NASA Astrophysics Data System (ADS)

    Jiao, J.; Trautz, A.; Zhang, Y.; Illangasekera, T.

    2017-12-01

    Subsurface flow and transport characterization under data-sparse conditions is addressed by a new and computationally efficient inverse theory that simultaneously estimates parameters, state variables, and boundary conditions. Uncertainty in static data can be accounted for while parameter structure can be complex due to process uncertainty. The approach has been successfully extended to inverting transient and unsaturated flows as well as contaminant source identification under unknown initial and boundary conditions. In one example, by sampling numerical experiments simulating two-dimensional steady-state flow in which a tracer migrates, a sequential inversion scheme first estimates the flow field and permeability structure before the evolution of the tracer plume and the dispersivities are jointly estimated. Compared to traditional inversion techniques, the theory does not use forward simulations to assess model-data misfits, thus knowledge of the difficult-to-determine site boundary condition is not required. To test the general applicability of the theory, data generated during high-precision intermediate-scale experiments (i.e., at a scale intermediate between the field and column scales) in large synthetic aquifers can be used. The design of such experiments is not trivial, as laboratory conditions have to be selected to mimic natural systems in order to provide useful data, thus requiring a variety of sensors and data collection strategies. This paper presents the design of such an experiment in a synthetic, multi-layered aquifer with dimensions of 242.7 x 119.3 x 7.7 cm. Different experimental scenarios that will generate data to validate the theory are presented.

  17. GMove: Group-Level Mobility Modeling Using Geo-Tagged Social Media.

    PubMed

    Zhang, Chao; Zhang, Keyang; Yuan, Quan; Zhang, Luming; Hanratty, Tim; Han, Jiawei

    2016-08-01

    Understanding human mobility is of great importance to various applications, such as urban planning, traffic scheduling, and location prediction. While there has been fruitful research on modeling human mobility using tracking data ( e.g. , GPS traces), the recent growth of geo-tagged social media (GeoSM) brings new opportunities to this task because of its sheer size and multi-dimensional nature. Nevertheless, how to obtain quality mobility models from the highly sparse and complex GeoSM data remains a challenge that cannot be readily addressed by existing techniques. We propose GMove, a group-level mobility modeling method using GeoSM data. Our insight is that the GeoSM data usually contains multiple user groups, where the users within the same group share significant movement regularity. Meanwhile, user grouping and mobility modeling are two intertwined tasks: (1) better user grouping offers better within-group data consistency and thus leads to more reliable mobility models; and (2) better mobility models serve as useful guidance that helps infer the group a user belongs to. GMove thus alternates between user grouping and mobility modeling, and generates an ensemble of Hidden Markov Models (HMMs) to characterize group-level movement regularity. Furthermore, to reduce text sparsity of GeoSM data, GMove also features a text augmenter. The augmenter computes keyword correlations by examining their spatiotemporal distributions. With such correlations as auxiliary knowledge, it performs sampling-based augmentation to alleviate text sparsity and produce high-quality HMMs. Our extensive experiments on two real-life data sets demonstrate that GMove can effectively generate meaningful group-level mobility models. Moreover, with context-aware location prediction as an example application, we find that GMove significantly outperforms baseline mobility models in terms of prediction accuracy.

  18. GMove: Group-Level Mobility Modeling Using Geo-Tagged Social Media

    PubMed Central

    Zhang, Chao; Zhang, Keyang; Yuan, Quan; Zhang, Luming; Hanratty, Tim; Han, Jiawei

    2017-01-01

    Understanding human mobility is of great importance to various applications, such as urban planning, traffic scheduling, and location prediction. While there has been fruitful research on modeling human mobility using tracking data (e.g., GPS traces), the recent growth of geo-tagged social media (GeoSM) brings new opportunities to this task because of its sheer size and multi-dimensional nature. Nevertheless, how to obtain quality mobility models from the highly sparse and complex GeoSM data remains a challenge that cannot be readily addressed by existing techniques. We propose GMove, a group-level mobility modeling method using GeoSM data. Our insight is that the GeoSM data usually contains multiple user groups, where the users within the same group share significant movement regularity. Meanwhile, user grouping and mobility modeling are two intertwined tasks: (1) better user grouping offers better within-group data consistency and thus leads to more reliable mobility models; and (2) better mobility models serve as useful guidance that helps infer the group a user belongs to. GMove thus alternates between user grouping and mobility modeling, and generates an ensemble of Hidden Markov Models (HMMs) to characterize group-level movement regularity. Furthermore, to reduce text sparsity of GeoSM data, GMove also features a text augmenter. The augmenter computes keyword correlations by examining their spatiotemporal distributions. With such correlations as auxiliary knowledge, it performs sampling-based augmentation to alleviate text sparsity and produce high-quality HMMs. Our extensive experiments on two real-life data sets demonstrate that GMove can effectively generate meaningful group-level mobility models. Moreover, with context-aware location prediction as an example application, we find that GMove significantly outperforms baseline mobility models in terms of prediction accuracy. PMID:28163978

  19. A one- and two-layer model for estimating evapotranspiration with remotely sensed surface temperature and ground-based meteorological data over partial canopy cover

    NASA Technical Reports Server (NTRS)

    Kustas, William P.; Choudhury, Bhaskar J.; Kunkel, Kenneth E.

    1989-01-01

    Surface-air temperature differences are commonly used in a bulk resistance equation for estimating sensible heat flux (H), which is inserted in the one-dimensional energy balance equation to solve for the latent heat flux (LE) as a residual. Serious discrepancies between estimated and measured LE have been observed for partial-canopy-cover conditions, which are mainly attributed to inappropriate estimates of H. To improve the estimates of H over sparse canopies, one- and two-layer resistance models that account for some of the factors causing poor agreement are developed. The utility of the two models is tested with remotely sensed and micrometeorological data for a furrowed cotton field with 20 percent cover and a dry soil surface. It is found that the one-layer model performs better than the two-layer model when a theoretical bluff-body correction for heat transfer is used instead of an empirical adjustment; otherwise, the two-layer model is better.

  20. Sparse Zero-Sum Games as Stable Functional Feature Selection

    PubMed Central

    Sokolovska, Nataliya; Teytaud, Olivier; Rizkalla, Salwa; Clément, Karine; Zucker, Jean-Daniel

    2015-01-01

    In large-scale systems biology applications, features are structured in hidden functional categories whose predictive power is identical. Feature selection, therefore, can lead not only to a problem with a reduced dimensionality, but also reveal some knowledge on functional classes of variables. In this contribution, we propose a framework based on a sparse zero-sum game which performs a stable functional feature selection. In particular, the approach is based on feature subsets ranking by a thresholding stochastic bandit. We provide a theoretical analysis of the introduced algorithm. We illustrate by experiments on both synthetic and real complex data that the proposed method is competitive from the predictive and stability viewpoints. PMID:26325268

  1. A progress report on estuary modeling by the finite-element method

    USGS Publications Warehouse

    Gray, William G.

    1978-01-01

    Various schemes are investigated for finite-element modeling of two-dimensional surface-water flows. The first schemes investigated combine finite-element spatial discretization with split-step time stepping schemes that have been found useful in finite-difference computations. Because of the large number of numerical integrations performed in space and the large sparse matrices solved, these finite-element schemes were found to be economically uncompetitive with finite-difference schemes. A very promising leapfrog scheme is proposed which, when combined with a novel very fast spatial integration procedure, eliminates the need to solve any matrices at all. Additional problems attacked included proper propagation of waves and proper specification of the normal flow-boundary condition. This report indicates work in progress and does not come to a definitive conclusion as to the best approach for finite-element modeling of surface-water problems. The results presented represent findings obtained between September 1973 and July 1976. (Woodard-USGS)

  2. Three-dimensional modeling, estimation, and fault diagnosis of spacecraft air contaminants.

    PubMed

    Narayan, A P; Ramirez, W F

    1998-01-01

    A description is given of the design and implementation of a method to track the presence of air contaminants aboard a spacecraft using an accurate physical model and of a procedure that would raise alarms when certain tolerance levels are exceeded. Because our objective is to monitor the contaminants in real time, we make use of a state estimation procedure that filters measurements from a sensor system and arrives at an optimal estimate of the state of the system. The model essentially consists of a convection-diffusion equation in three dimensions, solved implicitly using the principle of operator splitting, and uses a flowfield obtained by the solution of the Navier-Stokes equations for the cabin geometry, assuming steady-state conditions. A novel implicit Kalman filter has been used for fault detection, a procedure that is an efficient way to track the state of the system and that uses the sparse nature of the state transition matrices.
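
    For reference, the estimation machinery behind this approach is a Kalman filter driven by a discretized transport model; a minimal sketch of one generic linear predict/update cycle is shown below (the paper's implicit variant and the operator-split convection-diffusion dynamics are not reproduced, and all matrices are assumed to be given).

```python
import numpy as np

def kalman_step(x, P, z, F, H, Q, R):
    """One predict/update cycle of a standard linear Kalman filter.

    x, P : prior state estimate (e.g., gridded contaminant concentrations) and covariance
    z    : new vector of sensor measurements
    F, H : state-transition and observation matrices
    Q, R : process and measurement noise covariances
    """
    # Predict: propagate the state with the (discretized) transport dynamics.
    x_pred = F @ x
    P_pred = F @ P @ F.T + Q
    # Update: blend the prediction with the sensor data via the Kalman gain.
    S = H @ P_pred @ H.T + R
    K = P_pred @ H.T @ np.linalg.inv(S)
    x_new = x_pred + K @ (z - H @ x_pred)
    P_new = (np.eye(len(x)) - K @ H) @ P_pred
    return x_new, P_new
```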

  3. FragFit: a web-application for interactive modeling of protein segments into cryo-EM density maps.

    PubMed

    Tiemann, Johanna K S; Rose, Alexander S; Ismer, Jochen; Darvish, Mitra D; Hilal, Tarek; Spahn, Christian M T; Hildebrand, Peter W

    2018-05-21

    Cryo-electron microscopy (cryo-EM) is a standard method to determine the three-dimensional structures of molecular complexes. However, easy to use tools for modeling of protein segments into cryo-EM maps are sparse. Here, we present the FragFit web-application, a web server for interactive modeling of segments of up to 35 amino acids length into cryo-EM density maps. The fragments are provided by a regularly updated database containing at the moment about 1 billion entries extracted from PDB structures and can be readily integrated into a protein structure. Fragments are selected based on geometric criteria, sequence similarity and fit into a given cryo-EM density map. Web-based molecular visualization with the NGL Viewer allows interactive selection of fragments. The FragFit web-application, accessible at http://proteinformatics.de/FragFit, is free and open to all users, without any login requirements.

  4. Efficient Characterization of Parametric Uncertainty of Complex (Bio)chemical Networks.

    PubMed

    Schillings, Claudia; Sunnåker, Mikael; Stelling, Jörg; Schwab, Christoph

    2015-08-01

    Parametric uncertainty is a particularly challenging and relevant aspect of systems analysis in domains such as systems biology where, both for inference and for assessing prediction uncertainties, it is essential to characterize the system behavior globally in the parameter space. However, current methods based on local approximations or on Monte-Carlo sampling cope only insufficiently with high-dimensional parameter spaces associated with complex network models. Here, we propose an alternative deterministic methodology that relies on sparse polynomial approximations. We propose a deterministic computational interpolation scheme which adaptively identifies the most significant expansion coefficients. We present its performance in kinetic model equations from computational systems biology with several hundred parameters and state variables, leading to numerical approximations of the parametric solution on the entire parameter space. The scheme is based on adaptive Smolyak interpolation of the parametric solution at judiciously and adaptively chosen points in parameter space. Like Monte-Carlo sampling, it is "non-intrusive" and well-suited for massively parallel implementation, but affords higher convergence rates. This opens up new avenues for large-scale dynamic network analysis by enabling scaling for many applications, including parameter estimation, uncertainty quantification, and systems design.

  5. Efficient Characterization of Parametric Uncertainty of Complex (Bio)chemical Networks

    PubMed Central

    Schillings, Claudia; Sunnåker, Mikael; Stelling, Jörg; Schwab, Christoph

    2015-01-01

    Parametric uncertainty is a particularly challenging and relevant aspect of systems analysis in domains such as systems biology where, both for inference and for assessing prediction uncertainties, it is essential to characterize the system behavior globally in the parameter space. However, current methods based on local approximations or on Monte-Carlo sampling cope only insufficiently with high-dimensional parameter spaces associated with complex network models. Here, we propose an alternative deterministic methodology that relies on sparse polynomial approximations. We propose a deterministic computational interpolation scheme which adaptively identifies the most significant expansion coefficients. We present its performance in kinetic model equations from computational systems biology with several hundred parameters and state variables, leading to numerical approximations of the parametric solution on the entire parameter space. The scheme is based on adaptive Smolyak interpolation of the parametric solution at judiciously and adaptively chosen points in parameter space. Like Monte-Carlo sampling, it is “non-intrusive” and well-suited for massively parallel implementation, but affords higher convergence rates. This opens up new avenues for large-scale dynamic network analysis by enabling scaling for many applications, including parameter estimation, uncertainty quantification, and systems design. PMID:26317784

  6. LiDAR point classification based on sparse representation

    NASA Astrophysics Data System (ADS)

    Li, Nan; Pfeifer, Norbert; Liu, Chun

    2017-04-01

    To combine the initial spatial structure and features of LiDAR data for accurate classification, the LiDAR data are represented as a 4th-order tensor, and a sparse representation for classification (SRC) method is proposed for LiDAR tensor classification. It turns out that SRC needs only a few training samples from each class while still achieving good classification results. Multiple features are extracted from the raw LiDAR points to generate a high-dimensional vector at each point. The LiDAR tensor is then built from the spatial distribution and feature vectors of the point neighborhood. The entries of the LiDAR tensor are accessed via four indexes; each index is called a mode: three spatial modes in the directions X, Y, Z and one feature mode. The sparsity algorithm seeks the best representation of a test sample as a sparse linear combination of training samples from a dictionary. To explore the sparsity of the LiDAR tensor, the Tucker decomposition is used. It decomposes a tensor into a core tensor multiplied by a matrix along each mode; those matrices can be considered the principal components of each mode, and the entries of the core tensor show the level of interaction between the different components. Therefore, the LiDAR tensor can be approximately represented by a sparse tensor multiplied by a matrix selected from a dictionary along each mode. The matrices decomposed from training samples are arranged as initial elements in the dictionary. By dictionary learning, a reconstructive and discriminative structured dictionary along each mode is built; the overall structured dictionary is composed of class-specific sub-dictionaries. The sparse core tensor is then calculated by a tensor OMP (Orthogonal Matching Pursuit) method based on the dictionaries along each mode. The original tensor is expected to be well recovered by the sub-dictionary associated with the relevant class, while the entries in the sparse tensor associated with the other classes should be nearly zero. SRC therefore uses the reconstruction error associated with each class to classify the data. A section of airborne LiDAR points over the city of Vienna is classified into 6 classes: ground, roofs, vegetation, covered ground, walls, and other points. Only 6 training samples are taken from each class. For the final classification result, ground and covered ground are merged into a single class (ground). The classification accuracy is 94.60% for ground, 95.47% for roofs, 85.55% for vegetation, 76.17% for walls, and 20.39% for other objects.
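
    To make the Tucker structure concrete, the sketch below computes a plain truncated higher-order SVD of a hypothetical 4-mode LiDAR tensor (three spatial modes and one feature mode): a core tensor multiplied by a factor matrix along each mode. The dictionary learning and tensor OMP coding described above are not included; ranks and sizes are illustrative.

```python
import numpy as np

def unfold(T, mode):
    """Mode-n unfolding of a tensor into a matrix."""
    return np.moveaxis(T, mode, 0).reshape(T.shape[mode], -1)

def hosvd(T, ranks):
    """Truncated higher-order SVD: a simple Tucker decomposition.

    Returns one factor matrix per mode and the core tensor G, so that
    T is approximated by G multiplied by U[mode] along each mode.
    """
    U = []
    for mode, r in enumerate(ranks):
        u, _, _ = np.linalg.svd(unfold(T, mode), full_matrices=False)
        U.append(u[:, :r])
    G = T
    for mode, u in enumerate(U):
        # Mode-n product of the current core with u.T along this mode.
        G = np.moveaxis(np.tensordot(u.T, np.moveaxis(G, mode, 0), axes=1), 0, mode)
    return G, U

# Hypothetical 4th-order LiDAR tensor: spatial modes X, Y, Z plus one feature mode.
T = np.random.default_rng(0).standard_normal((8, 8, 8, 12))
G, U = hosvd(T, ranks=(4, 4, 4, 6))
print(G.shape, [u.shape for u in U])
```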

  7. Bayesian nonlinear structural FE model and seismic input identification for damage assessment of civil structures

    NASA Astrophysics Data System (ADS)

    Astroza, Rodrigo; Ebrahimian, Hamed; Li, Yong; Conte, Joel P.

    2017-09-01

    A methodology is proposed to update mechanics-based nonlinear finite element (FE) models of civil structures subjected to unknown input excitation. The approach allows the joint estimation of unknown time-invariant model parameters of a nonlinear FE model of the structure and the unknown time histories of the input excitations, using spatially sparse output response measurements recorded during an earthquake event. The unscented Kalman filter, which circumvents the computation of FE response sensitivities with respect to the unknown model parameters and unknown input excitations by using a deterministic sampling approach, is employed as the estimation tool. The use of measurement data obtained from arrays of heterogeneous sensors, including accelerometers, displacement sensors, and strain gauges, is investigated. Based on the estimated FE model parameters and input excitations, the updated nonlinear FE model can be interrogated to detect, localize, classify, and assess damage in the structure. Numerically simulated response data of a three-dimensional 4-story 2-by-1 bay steel frame structure with six unknown model parameters subjected to unknown bi-directional horizontal seismic excitation, and a three-dimensional 5-story 2-by-1 bay reinforced concrete frame structure with nine unknown model parameters subjected to unknown bi-directional horizontal seismic excitation, are used to illustrate and validate the proposed methodology. The results of the validation studies show the excellent performance and robustness of the proposed algorithm in jointly estimating the unknown FE model parameters and the unknown input excitations.

  8. Local sparse bump hunting reveals molecular heterogeneity of colon tumors‡

    PubMed Central

    Dazard, Jean-Eudes; Rao, J. Sunil; Markowitz, Sanford

    2013-01-01

    The question of molecular heterogeneity and of tumoral phenotype in cancer remains unresolved. To understand the underlying molecular basis of this phenomenon, we analyzed genome-wide expression data of colon cancer metastasis samples, as these tumors are the most advanced and hence would be anticipated to be the most likely heterogeneous group of tumors, potentially exhibiting the maximum amount of genetic heterogeneity. Casting a statistical net around such a complex problem proves difficult because of the high dimensionality and multi-collinearity of the gene expression space, combined with the fact that genes act in concert with one another and that not all genes surveyed might be involved. We devise a strategy to identify distinct subgroups of samples and determine the genetic/molecular signature that defines them. This involves use of the local sparse bump hunting algorithm, which provides a much more optimal and biologically faithful transformed space within which to search for bumps. In addition, thanks to the variable selection feature of the algorithm, we derived a novel sparse gene expression signature, which appears to divide all colon cancer patients into two populations: a population whose expression pattern can be molecularly encompassed within the bump and an outlier population that cannot be. Although all patients within any given stage of the disease, including the metastatic group, appear clinically homogeneous, our procedure revealed two subgroups in each stage with distinct genetic/molecular profiles. We also discuss implications of such a finding in terms of early detection, diagnosis and prognosis. PMID:22052459

  9. Sparse modeling applied to patient identification for safety in medical physics applications

    NASA Astrophysics Data System (ADS)

    Lewkowitz, Stephanie

    Every scheduled treatment at a radiation therapy clinic involves a series of safety protocol to ensure the utmost patient care. Despite safety protocol, on a rare occasion an entirely preventable medical event, an accident, may occur. Delivering a treatment plan to the wrong patient is preventable, yet still is a clinically documented error. This research describes a computational method to identify patients with a novel machine learning technique to combat misadministration. The patient identification program stores face and fingerprint data for each patient. New, unlabeled data from those patients are categorized according to the library. The categorization of data by this face-fingerprint detector is accomplished with new machine learning algorithms based on Sparse Modeling that have already begun transforming the foundation of Computer Vision. Previous patient recognition software required special subroutines for faces and different tailored subroutines for fingerprints. In this research, the same exact model is used for both fingerprints and faces, without any additional subroutines and even without adjusting the two hyperparameters. Sparse modeling is a powerful tool, already shown utility in the areas of super-resolution, denoising, inpainting, demosaicing, and sub-nyquist sampling, i.e. compressed sensing. Sparse Modeling is possible because natural images are inherently sparse in some bases, due to their inherent structure. This research chooses datasets of face and fingerprint images to test the patient identification model. The model stores the images of each dataset as a basis (library). One image at a time is removed from the library, and is classified by a sparse code in terms of the remaining library. The Locally Competitive Algorithm, a truly neural inspired Artificial Neural Network, solves the computationally difficult task of finding the sparse code for the test image. The components of the sparse representation vector are summed by ℓ1 pooling, and correct patient identification is consistently achieved 100% over 1000 trials, when either the face data or fingerprint data are implemented as a classification basis. The algorithm gets 100% classification when faces and fingerprints are concatenated into multimodal datasets. This suggests that 100% patient identification will be achievable in the clinal setting.
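
    As a rough sketch of the library-based sparse-code classification with ℓ1 pooling described above: the original work solves the sparse code with a Locally Competitive Algorithm, whereas the snippet below substitutes scikit-learn's SparseCoder (LARS-based lasso) as the sparse solver; the synthetic "library" and regularization value are illustrative assumptions.

        # Minimal sketch of library-based sparse-code classification with l1 pooling.
        # scikit-learn's SparseCoder stands in for the Locally Competitive Algorithm;
        # the data, class structure, and transform_alpha are synthetic placeholders.
        import numpy as np
        from sklearn.decomposition import SparseCoder

        rng = np.random.default_rng(0)
        n_classes, per_class, dim = 5, 8, 256
        # Library: one prototype per class plus small per-image variations
        prototypes = rng.normal(size=(n_classes, dim))
        library = np.vstack([p + 0.1 * rng.normal(size=(per_class, dim)) for p in prototypes])
        labels = np.repeat(np.arange(n_classes), per_class)
        library /= np.linalg.norm(library, axis=1, keepdims=True)   # unit-norm atoms

        def classify(test_image):
            coder = SparseCoder(dictionary=library, transform_algorithm="lasso_lars",
                                transform_alpha=0.05)
            code = coder.transform(test_image[None, :])[0]
            # l1 pooling: sum absolute coefficients per class, pick the largest
            pooled = np.array([np.abs(code[labels == c]).sum() for c in range(n_classes)])
            return int(pooled.argmax())

        query = prototypes[3] + 0.1 * rng.normal(size=dim)
        print("predicted class:", classify(query / np.linalg.norm(query)))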

  10. Sloped terrain segmentation for autonomous drive using sparse 3D point cloud.

    PubMed

    Cho, Seoungjae; Kim, Jonghyun; Ikram, Warda; Cho, Kyungeun; Jeong, Young-Sik; Um, Kyhyun; Sim, Sungdae

    2014-01-01

    A ubiquitous environment for road travel that uses wireless networks requires the minimization of data exchange between vehicles. An algorithm that can segment the ground in real time is necessary to obtain location data between vehicles simultaneously executing autonomous driving. This paper proposes a framework for segmenting the ground in real time using a sparse three-dimensional (3D) point cloud acquired from undulating terrain. A sparse 3D point cloud can be acquired by scanning the geography using light detection and ranging (LiDAR) sensors. For efficient ground segmentation, 3D point clouds are quantized in units of volume pixels (voxels) and overlapping data are eliminated. We reduce nonoverlapping voxels to two dimensions by implementing a lowermost heightmap. The ground area is determined on the basis of the number of voxels in each voxel group. We execute ground segmentation in real time by proposing an approach that minimizes the comparisons between neighboring voxels. Furthermore, we experimentally verify that ground segmentation can be executed in about 19.31 ms per frame.
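
    A minimal sketch of the voxel quantization and lowermost-heightmap step follows, on a synthetic point cloud; the voxel size, height tolerance, and ground/obstacle geometry are placeholders rather than the paper's values, and the per-column rule is only a simple heuristic stand-in for the full voxel-group analysis.

        # Minimal sketch: quantize a sparse 3D point cloud into voxels, keep the lowest
        # height per (x, y) column as a candidate ground surface, and label points near
        # that surface as ground. Sizes and thresholds are illustrative placeholders.
        import numpy as np

        def lowermost_heightmap(points, voxel=0.5):
            """points: (N, 3) LiDAR returns. Returns {(ix, iy): lowest z in that column}."""
            idx = np.floor(points / voxel).astype(int)
            heightmap = {}
            for (ix, iy, iz), z in zip(idx, points[:, 2]):
                key = (ix, iy)
                if key not in heightmap or z < heightmap[key]:
                    heightmap[key] = z
            return heightmap

        def ground_mask(points, voxel=0.5, tol=0.3):
            """Heuristic: a point is ground if it lies within `tol` of its column's lowest z."""
            hm = lowermost_heightmap(points, voxel)
            idx = np.floor(points / voxel).astype(int)
            return np.array([points[i, 2] - hm[(idx[i, 0], idx[i, 1])] < tol
                             for i in range(len(points))])

        rng = np.random.default_rng(1)
        ground = rng.uniform([-10, -10, 0.0], [10, 10, 0.2], size=(400, 3))
        obstacles = rng.uniform([-2, -2, 0.5], [2, 2, 2.0], size=(100, 3))
        pts = np.vstack([ground, obstacles])
        print("ground-labelled points:", int(ground_mask(pts).sum()), "of", len(pts))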

  11. Online Hierarchical Sparse Representation of Multifeature for Robust Object Tracking

    PubMed Central

    Qu, Shiru

    2016-01-01

    Object tracking based on sparse representation has given promising tracking results in recent years. However, the trackers under the framework of sparse representation always overemphasize the sparse representation and ignore the correlation of visual information. In addition, the sparse coding methods only encode the local region independently and ignore the spatial neighborhood information of the image. In this paper, we propose a robust tracking algorithm. Firstly, multiple complementary features are used to describe the object appearance; the appearance model of the tracked target is modeled by instantaneous and stable appearance features simultaneously. A two-stage sparse coding method, which takes the spatial neighborhood information of the image patch and the computational burden into consideration, is used to compute the reconstructed object appearance. Then, the reliability of each tracker is measured by the tracking likelihood function of the transient and reconstructed appearance models. Finally, the most reliable tracker is obtained within a well-established particle filter framework; the training set and the template library are incrementally updated based on the current tracking results. Experimental results on different challenging video sequences show that the proposed algorithm performs well with superior tracking accuracy and robustness. PMID:27630710

  12. Sparse estimation of model-based diffuse thermal dust emission

    NASA Astrophysics Data System (ADS)

    Irfan, Melis O.; Bobin, Jérôme

    2018-03-01

    Component separation for the Planck High Frequency Instrument (HFI) data is primarily concerned with the estimation of thermal dust emission, which requires the separation of thermal dust from the cosmic infrared background (CIB). For that purpose, current estimation methods rely on filtering techniques to decouple thermal dust emission from CIB anisotropies, which tend to yield a smooth, low-resolution, estimation of the dust emission. In this paper, we present a new parameter estimation method, premise: Parameter Recovery Exploiting Model Informed Sparse Estimates. This method exploits the sparse nature of thermal dust emission to calculate all-sky maps of thermal dust temperature, spectral index, and optical depth at 353 GHz. premise is evaluated and validated on full-sky simulated data. We find the percentage difference between the premise results and the true values to be 2.8, 5.7, and 7.2 per cent at the 1σ level across the full sky for thermal dust temperature, spectral index, and optical depth at 353 GHz, respectively. A comparison between premise and a GNILC-like method over selected regions of our sky simulation reveals that both methods perform comparably within high signal-to-noise regions. However, outside of the Galactic plane, premise is seen to outperform the GNILC-like method with increasing success as the signal-to-noise ratio worsens.

  13. Three dimensional surface slip partitioning of the Sichuan earthquake from Synthetic Aperture Radar

    NASA Astrophysics Data System (ADS)

    de Michele, M.; Raucoules, D.; de Sigoyer, J.; Pubellier, M.; Lasserre, C.; Pathier, E.; Klinger, Y.; van der Woerd, J.

    2009-12-01

    The Sichuan earthquake, Mw 7.9, struck the Longmen Shan range front, in the western Sichuan province, China, on 12 May 2008. It severely affected an area where little historical seismicity and little or no significant active shortening were reported before the earthquake (e.g. Gu et al., 1989; Chen et al., 1994; Gan et al., 2007). The Longmen Shan thrust system bounds the eastern margin of the Tibetan plateau and has been considered a transpressive zone since Triassic time, reactivated during the India-Asia collision (e.g., Tapponnier and Molnar, 1977; Chen and Wilson, 1996; Arne et al., 1997; Godard et al., 2009). However, contrasting geological evidence of sparse thrusting and marked dextral strike-slip faulting during the Quaternary, along with high topography (Burchfiel et al., 1995; Densmore et al., 2007), has led to models of dynamically driven and sustained topography (Royden et al., 1997), limiting the role of earthquakes in relief building and leaving the mechanism of long-term strain distribution in this area an open question. Here we combine C- and L-band Synthetic Aperture Radar (SAR) offset data from ascending and descending paths to retrieve the three-dimensional surface slip distribution along the earthquake ruptures of the Sichuan earthquake. We present a quantitative assessment of the amount of co-seismic slip and its partitioning at the surface.

  14. Regional model-based computerized ionospheric tomography using GPS measurements: IONOLAB-CIT

    NASA Astrophysics Data System (ADS)

    Tuna, Hakan; Arikan, Orhan; Arikan, Feza

    2015-10-01

    Three-dimensional imaging of the electron density distribution in the ionosphere is a crucial task for investigating the ionospheric effects. Dual-frequency Global Positioning System (GPS) satellite signals can be used to estimate the slant total electron content (STEC) along the propagation path between a GPS satellite and a ground-based receiver station. However, the estimated GPS-STEC is too sparse and too nonuniformly distributed to obtain reliable 3-D electron density distributions from the measurements alone. Standard tomographic reconstruction techniques are not accurate or reliable enough to represent the full complexity of the variable ionosphere. On the other hand, model-based electron density distributions are produced according to the general trends of the ionosphere, and these distributions do not agree with measurements, especially during geomagnetically active hours. In this study, a regional 3-D electron density distribution reconstruction method, namely, IONOLAB-CIT, is proposed to assimilate GPS-STEC into physical ionospheric models. The proposed method is based on an iterative optimization framework that tracks the deviations from the ionospheric model in terms of F2 layer critical frequency and maximum ionization height resulting from the comparison of International Reference Ionosphere extended to Plasmasphere (IRI-Plas) model-generated STEC and GPS-STEC. The proposed tomography algorithm is applied successfully to the reconstruction of electron density profiles over Turkey, during quiet and disturbed hours of the ionosphere, using the Turkish National Permanent GPS Network.

  15. Interpretation of FTIR spectra of polymers and Raman spectra of car paints by means of likelihood ratio approach supported by wavelet transform for reducing data dimensionality.

    PubMed

    Martyna, Agnieszka; Michalska, Aleksandra; Zadora, Grzegorz

    2015-05-01

    The problem under investigation was the interpretation of common provenance of samples within an infrared spectra database of polypropylene from car body parts and plastic containers, as well as Raman spectra databases of blue solid and metallic automotive paints. The research involved statistical tools such as the likelihood ratio (LR) approach for expressing the evidential value of observed similarities and differences in the recorded spectra. Since LR models can easily be proposed for databases described by a few variables, the research focused on reducing the dimensionality of spectra characterised by more than a thousand variables. The objective of the studies was to combine chemometric tools that easily deal with multidimensionality with an LR approach. The final variables used for LR models' construction were derived from the discrete wavelet transform (DWT) as a data dimensionality reduction technique supported by methods for variance analysis, and corresponded to chemical information, i.e. typical absorption bands for polypropylene and peaks associated with pigments present in the car paints. Univariate and multivariate LR models were proposed, aiming at obtaining more information about the chemical structure of the samples. Their performance was controlled by estimating the levels of false positive and false negative answers and using the empirical cross entropy approach. The results for most of the LR models were satisfactory and enabled solving the stated comparison problems. The results show that the variables generated from the DWT preserve the signal characteristics, providing a sparse representation of the original signal that keeps its shape and the relevant chemical information.
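
    A rough sketch of the DWT-based dimensionality reduction step is given below, assuming the PyWavelets package and synthetic spectra; the wavelet choice, decomposition level, and variance-based coefficient selection rule are placeholders, not the study's settings.

        # Minimal sketch of DWT-based dimensionality reduction for spectra: decompose
        # each >1000-point spectrum with a discrete wavelet transform and keep the few
        # coefficients with the highest between-sample variance. Wavelet, level, and
        # the selection rule are illustrative placeholders only.
        import numpy as np
        import pywt

        rng = np.random.default_rng(0)
        n_samples, n_points = 40, 1024
        spectra = rng.normal(size=(n_samples, n_points))
        spectra[:, 300:320] += np.linspace(0, 3, n_samples)[:, None]   # a varying "band"

        def dwt_features(X, wavelet="db4", level=4, n_keep=10):
            coeffs = [np.concatenate(pywt.wavedec(x, wavelet, level=level)) for x in X]
            C = np.vstack(coeffs)
            keep = np.argsort(C.var(axis=0))[::-1][:n_keep]   # highest-variance coefficients
            return C[:, keep], keep

        features, kept_idx = dwt_features(spectra)
        print(features.shape)   # (40, 10): a low-dimensional input for LR models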

  16. CMOS Active-Pixel Image Sensor With Intensity-Driven Readout

    NASA Technical Reports Server (NTRS)

    Langenbacher, Harry T.; Fossum, Eric R.; Kemeny, Sabrina

    1996-01-01

    Proposed complementary metal oxide/semiconductor (CMOS) integrated-circuit image sensor automatically provides readouts from pixels in order of decreasing illumination intensity. Sensor operated in integration mode. Particularly useful in a number of image-sensing tasks, including diffractive laser range-finding, three-dimensional imaging, event-driven readout of sparse sensor arrays, and star tracking.

  17. Turbulent flows over sparse canopies

    NASA Astrophysics Data System (ADS)

    Sharma, Akshath; García-Mayoral, Ricardo

    2018-04-01

    Turbulent flows over sparse and dense canopies exerting a similar drag force on the flow are investigated using Direct Numerical Simulations. The dense canopies are modelled using a homogeneous drag force, while for the sparse canopy, the geometry of the canopy elements is represented. It is found that, when using the friction velocity based on the local shear at each height, the streamwise velocity fluctuations and the Reynolds stress within the sparse canopy are similar to those from a comparable smooth-wall case. In addition, when scaled with the local friction velocity, the intensity of the off-wall peak in the streamwise vorticity for sparse canopies also recovers a value similar to that of a smooth wall. This indicates that the sparse canopy does not significantly disturb the near-wall turbulence cycle, but causes its rescaling to an intensity consistent with a lower friction velocity within the canopy. In comparison, the dense canopy is found to have a higher damping effect on the turbulent fluctuations. For the case of the sparse canopy, a peak in the spectral energy densities of the wall-normal velocity and the Reynolds stress is observed, which may indicate the formation of Kelvin-Helmholtz-like instabilities. It is also found that a sparse canopy is better modelled by a homogeneous drag applied on the mean flow alone, and not the turbulent fluctuations.

  18. Sparse representation based image interpolation with nonlocal autoregressive modeling.

    PubMed

    Dong, Weisheng; Zhang, Lei; Lukac, Rastislav; Shi, Guangming

    2013-04-01

    Sparse representation has proven to be a promising approach to image super-resolution, where the low-resolution (LR) image is usually modeled as the down-sampled version of its high-resolution (HR) counterpart after blurring. When the blurring kernel is the Dirac delta function, i.e., the LR image is directly down-sampled from its HR counterpart without blurring, the super-resolution problem becomes an image interpolation problem. In such cases, however, the conventional sparse representation models (SRM) become less effective, because the data fidelity term fails to constrain the image local structures. Fortunately, in natural images, the many nonlocal patches similar to a given patch can provide nonlocal constraints on the local structure. In this paper, we incorporate the image nonlocal self-similarity into SRM for image interpolation. More specifically, a nonlocal autoregressive model (NARM) is proposed and taken as the data fidelity term in SRM. We show that the NARM-induced sampling matrix is less coherent with the representation dictionary, and consequently makes SRM more effective for image interpolation. Our extensive experimental results demonstrate that the proposed NARM-based image interpolation method can effectively reconstruct the edge structures and suppress the jaggy/ringing artifacts, achieving the best image interpolation results so far in terms of PSNR as well as perceptual quality metrics such as SSIM and FSIM.

  19. Salient Object Detection via Structured Matrix Decomposition.

    PubMed

    Peng, Houwen; Li, Bing; Ling, Haibin; Hu, Weiming; Xiong, Weihua; Maybank, Stephen J

    2016-05-04

    Low-rank recovery models have shown potential for salient object detection, where a matrix is decomposed into a low-rank matrix representing image background and a sparse matrix identifying salient objects. Two deficiencies, however, still exist. First, previous work typically assumes the elements in the sparse matrix are mutually independent, ignoring the spatial and pattern relations of image regions. Second, when the low-rank and sparse matrices are relatively coherent, e.g., when there are similarities between the salient objects and background or when the background is complicated, it is difficult for previous models to disentangle them. To address these problems, we propose a novel structured matrix decomposition model with two structural regularizations: (1) a tree-structured sparsity-inducing regularization that captures the image structure and enforces patches from the same object to have similar saliency values, and (2) a Laplacian regularization that enlarges the gaps between salient objects and the background in feature space. Furthermore, high-level priors are integrated to guide the matrix decomposition and boost the detection. We evaluate our model for salient object detection on five challenging datasets including single object, multiple objects and complex scene images, and show competitive results as compared with 24 state-of-the-art methods in terms of seven performance metrics.
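
    To illustrate only the plain low-rank (background) plus sparse (salient) split that this structured model builds on, the sketch below runs a simple alternating shrinkage scheme on a synthetic feature matrix; the paper's tree-structured sparsity and Laplacian regularizers are not included, and the penalty weights are arbitrary.

        # Heuristic sketch of a low-rank + sparse decomposition via block coordinate
        # descent on 0.5*||F - L - S||_F^2 + tau*||L||_* + lam*||S||_1. This is NOT the
        # paper's structured model; regularizers and weights are illustrative only.
        import numpy as np

        def shrink(X, tau):                      # entrywise soft-thresholding
            return np.sign(X) * np.maximum(np.abs(X) - tau, 0.0)

        def svt(X, tau):                         # singular value thresholding
            U, s, Vt = np.linalg.svd(X, full_matrices=False)
            return (U * np.maximum(s - tau, 0.0)) @ Vt

        def lowrank_sparse(F, lam=0.3, tau=2.0, n_iter=100):
            """Decompose a feature matrix F ≈ L (background) + S (salient regions)."""
            L = np.zeros_like(F)
            S = np.zeros_like(F)
            for _ in range(n_iter):
                L = svt(F - S, tau)              # low-rank update
                S = shrink(F - L, lam)           # sparse update
            return L, S

        rng = np.random.default_rng(0)
        background = np.outer(rng.normal(size=50), rng.normal(size=80))  # rank-1 "background"
        F = background + 0.05 * rng.normal(size=(50, 80))
        F[10:15, 20:25] += 3.0                                           # a "salient" block
        L, S = lowrank_sparse(F)
        print("nonzeros in S:", int((np.abs(S) > 1e-8).sum()))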

  20. Two-dimensional shape recognition using sparse distributed memory

    NASA Technical Reports Server (NTRS)

    Kanerva, Pentti; Olshausen, Bruno

    1990-01-01

    Researchers propose a method for recognizing two-dimensional shapes (hand-drawn characters, for example) with an associative memory. The method consists of two stages: first, the image is preprocessed to extract tangents to the contour of the shape; second, the set of tangents is converted to a long bit string for recognition with sparse distributed memory (SDM). SDM provides a simple, massively parallel architecture for an associative memory. Long bit vectors (256 to 1000 bits, for example) serve as both data and addresses to the memory, and patterns are grouped or classified according to similarity in Hamming distance. At the moment, tangents are extracted in a simple manner by progressively blurring the image and then using a Canny-type edge detector (Canny, 1986) to find edges at each stage of blurring. This results in a grid of tangents. While the technique used for obtaining the tangents is at present rather ad hoc, researchers plan to adopt an existing framework for extracting edge orientation information over a variety of resolutions, such as suggested by Watson (1987, 1983), Marr and Hildreth (1980), or Canny (1986).

  1. Efficient inference for genetic association studies with multiple outcomes.

    PubMed

    Ruffieux, Helene; Davison, Anthony C; Hager, Jorg; Irincheeva, Irina

    2017-10-01

    Combined inference for heterogeneous high-dimensional data is critical in modern biology, where clinical and various kinds of molecular data may be available from a single study. Classical genetic association studies regress a single clinical outcome on many genetic variants one by one, but there is an increasing demand for joint analysis of many molecular outcomes and genetic variants in order to unravel functional interactions. Unfortunately, most existing approaches to joint modeling are either too simplistic to be powerful or are impracticable for computational reasons. Inspired by Richardson and others (2010, Bayesian Statistics 9), we consider a sparse multivariate regression model that allows simultaneous selection of predictors and associated responses. As Markov chain Monte Carlo (MCMC) inference on such models can be prohibitively slow when the number of genetic variants exceeds a few thousand, we propose a variational inference approach which produces posterior information very close to that of MCMC inference, at a much reduced computational cost. Extensive numerical experiments show that our approach outperforms popular variable selection methods and tailored Bayesian procedures, dealing within hours with problems involving hundreds of thousands of genetic variants and tens to hundreds of clinical or molecular outcomes.

  2. Acceleration of Linear Finite-Difference Poisson-Boltzmann Methods on Graphics Processing Units.

    PubMed

    Qi, Ruxi; Botello-Smith, Wesley M; Luo, Ray

    2017-07-11

    Electrostatic interactions play crucial roles in biophysical processes such as protein folding and molecular recognition. Poisson-Boltzmann equation (PBE)-based models have emerged as widely used tools for modeling these important processes. Though great efforts have been put into developing efficient PBE numerical models, challenges still remain due to the high dimensionality of typical biomolecular systems. In this study, we implemented and analyzed commonly used linear PBE solvers on ever-improving graphics processing units (GPUs) for biomolecular simulations, including both standard and preconditioned conjugate gradient (CG) solvers with several alternative preconditioners. Our implementation utilizes the standard Nvidia CUDA libraries cuSPARSE, cuBLAS, and CUSP. Extensive tests show that good numerical accuracy can be achieved given that single precision is often used for numerical applications on GPU platforms. The optimal GPU performance was observed with the Jacobi-preconditioned CG solver, with a significant speedup over the standard CG solver on CPU in our diversified test cases. Our analysis further shows that different matrix storage formats also considerably affect the efficiency of different linear PBE solvers on GPU, with the diagonal format best suited for our standard finite-difference linear systems. Further efficiency may be possible with matrix-free operations and integrated grid stencil setup specifically tailored for the banded matrices in PBE-specific linear systems.
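
    As a small CPU-side illustration of the Jacobi (diagonal) preconditioner benchmarked in the study, the sketch below solves a finite-difference Laplacian with preconditioned CG using SciPy; SciPy stands in for the cuSPARSE/cuBLAS/CUSP GPU libraries, and the grid size and right-hand side are toy choices.

        # Minimal CPU sketch of Jacobi-preconditioned CG on a finite-difference
        # Laplacian; SciPy stands in for the GPU libraries used in the study.
        import numpy as np
        import scipy.sparse as sp
        from scipy.sparse.linalg import cg, LinearOperator

        n = 50                                          # 50 x 50 grid
        lap1d = sp.diags([-1, 2, -1], [-1, 0, 1], shape=(n, n))
        A = sp.kronsum(lap1d, lap1d).tocsr()            # 2D Laplacian (2500 x 2500)
        b = np.ones(A.shape[0])

        # Jacobi preconditioner: M^{-1} = diag(A)^{-1}
        inv_diag = 1.0 / A.diagonal()
        M = LinearOperator(A.shape, matvec=lambda x: inv_diag * x)

        x, info = cg(A, b, M=M)
        print("converged:", info == 0, "residual:", np.linalg.norm(A @ x - b))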

  3. Decoding memory features from hippocampal spiking activities using sparse classification models.

    PubMed

    Dong Song; Hampson, Robert E; Robinson, Brian S; Marmarelis, Vasilis Z; Deadwyler, Sam A; Berger, Theodore W

    2016-08-01

    To understand how memory information is encoded in the hippocampus, we build classification models to decode memory features from hippocampal CA3 and CA1 spatio-temporal patterns of spikes recorded from epilepsy patients performing a memory-dependent delayed match-to-sample task. The classification model consists of a set of B-spline basis functions for extracting memory features from the spike patterns, and a sparse logistic regression classifier for generating binary categorical output of memory features. Results show that the classification models can extract a significant amount of memory information with respect to the types of memory tasks and the categories of sample images used in the task, despite the high level of variability in prediction accuracy due to the small sample size. These results support the hypothesis that memories are encoded in hippocampal activities and have important implications for the development of hippocampal memory prostheses.
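
    A minimal sketch of the sparse (ℓ1-penalized) logistic regression step is shown below, assuming scikit-learn; synthetic binned spike counts stand in for the paper's B-spline features, and the penalty strength is arbitrary.

        # Minimal sketch of sparse (l1) logistic regression on spike-derived features.
        # Synthetic binned spike counts stand in for the paper's B-spline features.
        import numpy as np
        from sklearn.linear_model import LogisticRegression

        rng = np.random.default_rng(0)
        n_trials, n_neurons, n_bins = 120, 30, 20
        X = rng.poisson(2.0, size=(n_trials, n_neurons * n_bins)).astype(float)
        y = rng.integers(0, 2, size=n_trials)            # binary memory-feature label
        X[y == 1, :40] += 1.5                            # a few informative features

        clf = LogisticRegression(penalty="l1", solver="liblinear", C=0.1)
        clf.fit(X, y)
        print("nonzero weights:", int((clf.coef_ != 0).sum()), "of", X.shape[1])
        print("training accuracy:", clf.score(X, y))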

  4. Using Tensor Completion Method to Achieving Better Coverage of Traffic State Estimation from Sparse Floating Car Data

    PubMed Central

    Ran, Bin; Song, Li; Cheng, Yang; Tan, Huachun

    2016-01-01

    Traffic state estimation from the floating car system is a challenging problem. The low penetration rate and random distribution mean that the available floating car samples usually cover only part of the space and time points of the road network. To obtain a wide range of traffic state from the floating car system, many methods have been proposed to estimate the traffic state for the uncovered links. However, these methods cannot provide the traffic state of the entire road network. In this paper, the traffic state estimation is transformed into a missing data imputation problem, and the tensor completion framework is proposed to estimate missing traffic state. A tensor is constructed to model the traffic state, in which observed entries are directly derived from the floating car system and unobserved traffic states are modeled as missing entries of the constructed tensor. The constructed traffic state tensor can represent spatial and temporal correlations of traffic data and encode the multi-way properties of traffic state. The advantage of the proposed approach is that it can fully mine and utilize the multi-dimensional inherent correlations of traffic state. We tested the proposed approach on a well calibrated simulation network. Experimental results demonstrated that the proposed approach yields reliable traffic state estimation from very sparse floating car data, particularly when the floating car penetration rate is below 1%. PMID:27448326

  5. Using Tensor Completion Method to Achieving Better Coverage of Traffic State Estimation from Sparse Floating Car Data.

    PubMed

    Ran, Bin; Song, Li; Zhang, Jian; Cheng, Yang; Tan, Huachun

    2016-01-01

    Traffic state estimation from the floating car system is a challenging problem. The low penetration rate and random distribution mean that the available floating car samples usually cover only part of the space and time points of the road network. To obtain a wide range of traffic state from the floating car system, many methods have been proposed to estimate the traffic state for the uncovered links. However, these methods cannot provide the traffic state of the entire road network. In this paper, the traffic state estimation is transformed into a missing data imputation problem, and the tensor completion framework is proposed to estimate missing traffic state. A tensor is constructed to model the traffic state, in which observed entries are directly derived from the floating car system and unobserved traffic states are modeled as missing entries of the constructed tensor. The constructed traffic state tensor can represent spatial and temporal correlations of traffic data and encode the multi-way properties of traffic state. The advantage of the proposed approach is that it can fully mine and utilize the multi-dimensional inherent correlations of traffic state. We tested the proposed approach on a well calibrated simulation network. Experimental results demonstrated that the proposed approach yields reliable traffic state estimation from very sparse floating car data, particularly when the floating car penetration rate is below 1%.
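
    A toy sketch of tensor completion in this spirit is given below: a simplified multi-mode singular-value-thresholding scheme applied to a synthetic day × period × link speed tensor with about 1% observed entries. It is not the exact algorithm of the paper; the tensor shape, threshold, and iteration count are placeholder assumptions.

        # Toy sketch of tensor completion by averaging SVT reconstructions of the three
        # mode unfoldings while keeping observed entries fixed. Not the paper's exact
        # algorithm; sizes, threshold, and iterations are illustrative assumptions.
        import numpy as np

        def unfold(T, mode):
            return np.moveaxis(T, mode, 0).reshape(T.shape[mode], -1)

        def fold(M, mode, shape):
            full = [shape[mode]] + [s for i, s in enumerate(shape) if i != mode]
            return np.moveaxis(M.reshape(full), 0, mode)

        def svt(M, tau):
            U, s, Vt = np.linalg.svd(M, full_matrices=False)
            return (U * np.maximum(s - tau, 0.0)) @ Vt

        def complete(T_obs, mask, tau=2.0, n_iter=200):
            X = np.where(mask, T_obs, 0.0)
            for _ in range(n_iter):
                # average low-rank reconstructions of the three mode unfoldings
                X = np.mean([fold(svt(unfold(X, m), tau), m, X.shape) for m in range(3)], axis=0)
                X[mask] = T_obs[mask]            # observed speeds stay fixed
            return X

        # Synthetic low-rank "day x period x link" speed tensor, about 1% observed
        rng = np.random.default_rng(0)
        T = np.einsum("i,j,k->ijk", rng.random(30), rng.random(96), rng.random(50)) * 60
        mask = rng.random(T.shape) < 0.01
        est = complete(T, mask)
        print("relative error on missing entries:",
              np.linalg.norm((est - T)[~mask]) / np.linalg.norm(T[~mask]))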

  6. Symmetrized density matrix renormalization group algorithm for low-lying excited states of conjugated carbon systems: Application to 1,12-benzoperylene and polychrysene

    NASA Astrophysics Data System (ADS)

    Prodhan, Suryoday; Ramasesha, S.

    2018-05-01

    The symmetry adapted density matrix renormalization group (SDMRG) technique has been an efficient method for studying low-lying eigenstates in one- and quasi-one-dimensional electronic systems. However, the SDMRG method had bottlenecks involving the construction of linearly independent symmetry adapted basis states, as the symmetry matrices in the DMRG basis were not sparse. We have developed a modified algorithm to overcome this bottleneck. The new method incorporates end-to-end interchange symmetry (C2), electron-hole symmetry (J), and parity or spin-flip symmetry (P) in these calculations. The one-to-one correspondence between direct-product basis states in the DMRG Hilbert space for these symmetry operations renders the symmetry matrices in the new basis maximally sparse, with just one nonzero matrix element per row. Using methods similar to those employed in the exact diagonalization technique for Pariser-Parr-Pople (PPP) models, developed in the 1980s, it is possible to construct orthogonal SDMRG basis states while bypassing the slow step of the Gram-Schmidt orthonormalization procedure. The method, together with the PPP model, which incorporates long-range electronic correlations, is employed to study the correlated excited-state spectra of 1,12-benzoperylene and a narrow mixed graphene nanoribbon with a chrysene molecule as the building unit, comprising both zigzag and cove-edge structures.

  7. A deconvolution extraction method for 2D multi-object fibre spectroscopy based on the regularized least-squares QR-factorization algorithm

    NASA Astrophysics Data System (ADS)

    Yu, Jian; Yin, Qian; Guo, Ping; Luo, A.-li

    2014-09-01

    This paper presents an efficient method for the extraction of astronomical spectra from two-dimensional (2D) multifibre spectrographs based on the regularized least-squares QR-factorization (LSQR) algorithm. We address two issues: we propose a modified Gaussian point spread function (PSF) for modelling the 2D PSF from multi-emission-line gas-discharge lamp images (arc images), and we develop an efficient deconvolution method to extract spectra in real circumstances. The proposed modified 2D Gaussian PSF model can fit various types of 2D PSFs, including different radial distortion angles and ellipticities. We adopt the regularized LSQR algorithm to solve the sparse linear equations constructed from the sparse convolution matrix, which we designate the deconvolution spectrum extraction method. Furthermore, we implement a parallelized LSQR algorithm based on graphics processing unit programming in the Compute Unified Device Architecture to accelerate the computational processing. Experimental results illustrate that the proposed extraction method can greatly reduce the computational cost and memory use of the deconvolution method and, consequently, increase its efficiency and practicability. In addition, the proposed extraction method has a stronger noise tolerance than other methods, such as the boxcar (aperture) extraction and profile extraction methods. Finally, we present an analysis of the sensitivity of the extraction results to the radius and full width at half-maximum of the 2D PSF.
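
    A small sketch of the regularized (damped) LSQR solve underlying the extraction is given below, using SciPy; the sparse "PSF" matrix here is a toy tridiagonal blur rather than the modified 2D Gaussian PSF model of the paper, and the noise level and damping factor are illustrative.

        # Minimal sketch of damped (Tikhonov-regularized) LSQR on a sparse linear
        # system; the toy blur matrix stands in for the paper's 2D PSF convolution matrix.
        import numpy as np
        import scipy.sparse as sp
        from scipy.sparse.linalg import lsqr

        n = 400
        # Toy sparse "convolution" matrix: each row blurs a few neighbouring fluxes
        A = sp.diags([0.2, 0.6, 0.2], [-1, 0, 1], shape=(n, n), format="csr")
        rng = np.random.default_rng(0)
        true_flux = np.zeros(n)
        true_flux[::20] = rng.uniform(50, 100, size=n // 20)
        data = A @ true_flux + rng.normal(scale=0.5, size=n)

        # damp > 0 minimizes ||A x - b||^2 + damp^2 ||x||^2 (regularized least squares)
        flux = lsqr(A, data, damp=0.1)[0]
        print("max abs error:", np.abs(flux - true_flux).max())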

  8. A structured sparse regression method for estimating isoform expression level from multi-sample RNA-seq data.

    PubMed

    Zhang, L; Liu, X J

    2016-06-03

    With the rapid development of next-generation high-throughput sequencing technology, RNA-seq has become a standard and important technique for transcriptome analysis. For multi-sample RNA-seq data, the existing expression estimation methods usually deal with each RNA-seq sample separately and ignore the fact that the read distributions are consistent across multiple samples. In the current study, we propose a structured sparse regression method, SSRSeq, to estimate isoform expression using multi-sample RNA-seq data. SSRSeq uses a non-parametric model to capture the general tendency of non-uniform read distribution for all genes across multiple samples. Additionally, our method adds a structured sparse regularization, which not only incorporates the sparse specificity between a gene and its corresponding isoform expression levels, but also reduces the effects of noisy reads, especially for lowly expressed genes and isoforms. Four real datasets were used to evaluate our method on isoform expression estimation. Compared with other popular methods, SSRSeq reduced the variance between multiple samples and produced more accurate isoform expression estimates, and thus more meaningful biological interpretations.

  9. Sparse Logistic Regression for Diagnosis of Liver Fibrosis in Rat by Using SCAD-Penalized Likelihood

    PubMed Central

    Yan, Fang-Rong; Lin, Jin-Guan; Liu, Yu

    2011-01-01

    The objective of the present study is to find out the quantitative relationship between the progression of liver fibrosis and the levels of certain serum markers using a mathematical model. We apply sparse logistic regression with the smoothly clipped absolute deviation (SCAD) penalty function to diagnose liver fibrosis in rats. Not only does it give a sparse solution with high accuracy, it also provides users with the precise classification probabilities along with the class information. In both the simulation case and the experimental case, the proposed method is comparable to stepwise linear discriminant analysis (SLDA) and to sparse logistic regression with the least absolute shrinkage and selection operator (LASSO) penalty, as assessed by receiver operating characteristic (ROC) analysis with Bayesian bootstrap estimation of the area under the curve (AUC) for diagnostic sensitivity on the selected variables. Results show that the new approach provides a good correlation between the serum marker levels and the liver fibrosis induced by thioacetamide (TAA) in rats. Meanwhile, this approach might also be used in predicting the development of liver cirrhosis. PMID:21716672

  10. Microstructure Images Restoration of Metallic Materials Based upon KSVD and Smoothing Penalty Sparse Representation Approach

    PubMed Central

    Liang, Steven Y.

    2018-01-01

    Microstructure images of metallic materials play a significant role in industrial applications. To address the image degradation problem of metallic materials, a novel image restoration technique based on K-means singular value decomposition (KSVD) and a smoothing penalty sparse representation (SPSR) algorithm is proposed in this work; microstructure images of aluminum alloy 7075 (AA7075) are used as examples. To begin with, to reflect the detailed structural characteristics of the damaged image, the KSVD dictionary is introduced to substitute the traditional sparse transform basis (TSTB) for sparse representation. Then, because image restoration modeling amounts to a highly underdetermined system of equations, traditional sparse reconstruction methods may cause instability and obvious artifacts in the reconstructed images, especially when the image contains many smooth regions and the noise level is high; thus, the SPSR (here, q = 0.5) algorithm is designed to reconstruct the damaged image. The results of simulation and two practical cases demonstrate that the proposed method has superior performance compared with some state-of-the-art methods in terms of restoration performance factors and visual quality. Meanwhile, the grain size parameters and grain boundaries of the microstructure image are discussed before and after restoration by the proposed method. PMID:29677163

  11. Modeling of Microplasmas with Nano-Engineered Electrodes

    NASA Astrophysics Data System (ADS)

    Macheret, Sergey; Tholeti, Siva Shashank; Alexeenko, Alina

    2015-09-01

    Microplasmas can potentially be used as unique tunable dielectrics for reconfigurable radio-frequency systems, if electron densities of 10¹⁰-10¹² cm⁻³ can be sustained in cavities smaller than 100 μm. However, for low loss tangent, gas pressures below 10 mTorr would be required, whereas the physics of electron impact ionization dictates the pd scaling, so that microplasmas must operate at high gas pressures, hundreds of Torr, and also high voltages. We analyze a new principle of plasma generation that goes well beyond the pd scaling by eliminating electron impact ionization. In the new concept, electrons are generated at the cathode by field emission from nanotubes, and ions are independently produced by field ionization at atomically sharp tips on the anode. The electrons and ions then move in opposite directions, mix, and create a plasma. The low pressure results in collisionless motion with no electron-impact ionization. One-dimensional PIC/MCC calculations show that emitters such as carbon nanotubes placed sparsely on the cathode, combined with field ionization nanorods at the anode, can indeed ensure steady-state electron densities of up to 10¹² cm⁻³ at gas pressures lower than 10 mTorr with only 50-100 volts applied across a 40-50 μm gap.

  12. Period Estimation for Sparsely-sampled Quasi-periodic Light Curves Applied to Miras

    NASA Astrophysics Data System (ADS)

    He, Shiyuan; Yuan, Wenlong; Huang, Jianhua Z.; Long, James; Macri, Lucas M.

    2016-12-01

    We develop a nonlinear semi-parametric Gaussian process model to estimate periods of Miras with sparsely sampled light curves. The model uses a sinusoidal basis for the periodic variation and a Gaussian process for the stochastic changes. We use maximum likelihood to estimate the period and the parameters of the Gaussian process, while integrating out the effects of other nuisance parameters in the model with respect to a suitable prior distribution obtained from earlier studies. Since the likelihood is highly multimodal in period, we implement a hybrid method that applies the quasi-Newton algorithm for the Gaussian process parameters and searches the period/frequency parameter space over a dense grid. A large-scale, high-fidelity simulation is conducted to mimic the sampling quality of Mira light curves obtained by the M33 Synoptic Stellar Survey. The simulated data set is publicly available and can serve as a testbed for future evaluation of different period estimation methods. The semi-parametric model outperforms an existing algorithm on this simulated test data set as measured by period recovery rate and quality of the resulting period-luminosity relations.

  13. Surrogate modelling for the prediction of spatial fields based on simultaneous dimensionality reduction of high-dimensional input/output spaces.

    PubMed

    Crevillén-García, D

    2018-04-01

    Time-consuming numerical simulators for solving groundwater flow and dissolution models of physico-chemical processes in deep aquifers normally require some of the model inputs to be defined in high-dimensional spaces in order to return realistic results. Sometimes, the outputs of interest are spatial fields leading to high-dimensional output spaces. Although Gaussian process emulation has been satisfactorily used for computing faithful and inexpensive approximations of complex simulators, these have been mostly applied to problems defined in low-dimensional input spaces. In this paper, we propose a method for simultaneously reducing the dimensionality of very high-dimensional input and output spaces in Gaussian process emulators for stochastic partial differential equation models while retaining the qualitative features of the original models. This allows us to build a surrogate model for the prediction of spatial fields in such time-consuming simulators. We apply the methodology to a model of convection and dissolution processes occurring during carbon capture and storage.
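
    As a rough illustration of the output-space reduction idea, the sketch below projects high-dimensional simulator outputs onto a few principal components and fits an independent Gaussian process emulator to each component score (scikit-learn assumed). The toy "simulator", grid size, and kernel settings are placeholders, and the paper's simultaneous reduction of very high-dimensional input spaces is not reproduced here.

        # Minimal sketch of GP emulation with output dimensionality reduction: PCA on
        # the spatial output fields, then one GP per retained component. The simulator,
        # sizes, and kernel are illustrative assumptions, not the paper's model.
        import numpy as np
        from sklearn.decomposition import PCA
        from sklearn.gaussian_process import GaussianProcessRegressor
        from sklearn.gaussian_process.kernels import RBF

        rng = np.random.default_rng(0)
        n_train, n_input, n_grid = 60, 4, 2500          # flattened 50 x 50 output field

        def simulator(theta):                            # stand-in for the flow solver
            x = np.linspace(0, 1, n_grid)
            return np.sin(2 * np.pi * theta.sum() * x) + theta[0] * x

        X = rng.uniform(size=(n_train, n_input))
        Y = np.vstack([simulator(t) for t in X])         # (n_train, n_grid)

        pca = PCA(n_components=5).fit(Y)
        Z = pca.transform(Y)                             # low-dimensional output scores
        gps = [GaussianProcessRegressor(kernel=RBF(length_scale=0.3), alpha=1e-6).fit(X, Z[:, j])
               for j in range(Z.shape[1])]

        def emulate(theta):
            z = np.array([gp.predict(theta[None, :])[0] for gp in gps])
            return pca.inverse_transform(z[None, :])[0]  # back to the full spatial field

        print("mean abs error on a training point:", np.abs(emulate(X[0]) - Y[0]).mean())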

  14. Adaptive kernel regression for freehand 3D ultrasound reconstruction

    NASA Astrophysics Data System (ADS)

    Alshalalfah, Abdel-Latif; Daoud, Mohammad I.; Al-Najar, Mahasen

    2017-03-01

    Freehand three-dimensional (3D) ultrasound imaging enables low-cost and flexible 3D scanning of arbitrary-shaped organs, where the operator can freely move a two-dimensional (2D) ultrasound probe to acquire a sequence of tracked cross-sectional images of the anatomy. Often, the acquired 2D ultrasound images are irregularly and sparsely distributed in the 3D space. Several 3D reconstruction algorithms have been proposed to synthesize 3D ultrasound volumes based on the acquired 2D images. A challenging task during the reconstruction process is to preserve the texture patterns in the synthesized volume and ensure that all gaps in the volume are correctly filled. This paper presents an adaptive kernel regression algorithm that can effectively reconstruct high-quality freehand 3D ultrasound volumes. The algorithm employs a kernel regression model that enables nonparametric interpolation of the voxel gray-level values. The kernel size of the regression model is adaptively adjusted based on the characteristics of the voxel that is being interpolated. In particular, when the algorithm is employed to interpolate a voxel located in a region with dense ultrasound data samples, the size of the kernel is reduced to preserve the texture patterns. On the other hand, the size of the kernel is increased in areas that include large gaps to enable effective gap filling. The performance of the proposed algorithm was compared with seven previous interpolation approaches by synthesizing freehand 3D ultrasound volumes of a benign breast tumor. The experimental results show that the proposed algorithm outperforms the other interpolation approaches.

  15. The storage capacity of Potts models for semantic memory retrieval

    NASA Astrophysics Data System (ADS)

    Kropff, Emilio; Treves, Alessandro

    2005-08-01

    We introduce and analyse a minimal network model of semantic memory in the human brain. The model is a global associative memory structured as a collection of N local modules, each coding a feature, which can take S possible values, with a global sparseness a (the average fraction of features describing a concept). We show that, under optimal conditions, the number c_M of modules connected on average to a module can range widely between very sparse connectivity (high dilution, c_M/N → 0) and full connectivity (c_M → N), maintaining a global network storage capacity (the maximum number p_c of stored and retrievable concepts) that scales like p_c ~ c_M S²/a, with logarithmic corrections consistent with the constraint that each synapse may store up to a fraction of a bit.

  16. Exploiting sparsity and low-rank structure for the recovery of multi-slice breast MRIs with reduced sampling error.

    PubMed

    Yin, X X; Ng, B W-H; Ramamohanarao, K; Baghai-Wadji, A; Abbott, D

    2012-09-01

    It has been shown that magnetic resonance images (MRIs) with a sparse representation in a transformed domain, e.g. spatial finite differences (FD) or the discrete cosine transform (DCT), can be restored from undersampled k-space by applying current compressive sampling theory. The paper presents a model-based method for the restoration of MRIs. The reduced-order model, in which a full system response is projected onto a subspace of lower dimensionality, has been used to accelerate image reconstruction by reducing the size of the involved linear system. In this paper, the singular value threshold (SVT) technique is applied as a denoising scheme to reduce and select the model order of the inverse Fourier transform image, and to restore multi-slice breast MRIs that have been compressively sampled in k-space. The restored MRIs with SVT denoising show reduced sampling errors compared to direct MRI restoration via spatial FD or DCT. Compressive sampling is a technique for finding sparse solutions to underdetermined linear systems. The sparsity implicit in MRIs is exploited to reconstruct images from significantly undersampled k-space after transformation. The challenge, however, is that, since some incoherent artifacts result from the random undersampling, noise-like interference is added to the image with sparse representation. The recovery algorithms in the literature are not capable of fully removing these artifacts. It is necessary to introduce a denoising procedure to improve the quality of image recovery. This paper applies a singular value threshold algorithm to reduce the model order of the image basis functions, which allows further improvement of the quality of image reconstruction with removal of noise artifacts. The principle of the denoising scheme is to reconstruct the sparse MRI matrices optimally with a lower rank by selecting a smaller number of dominant singular values. The singular value threshold algorithm is performed by minimizing the nuclear norm of the difference between the sampled image and the recovered image. It is illustrated that this algorithm improves the ability of previous image reconstruction algorithms to remove noise artifacts while significantly improving the quality of MRI recovery.
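
    A minimal sketch of the singular value thresholding step is shown below on a synthetic low-rank matrix plus noise (not MRI data); the threshold value and matrix sizes are illustrative assumptions.

        # Minimal sketch of singular value thresholding (SVT) as a denoising /
        # model-order-reduction step: keep only singular values above a threshold.
        # The toy "image" is a synthetic low-rank matrix plus noise, not MRI data.
        import numpy as np

        def svt_denoise(M, tau):
            U, s, Vt = np.linalg.svd(M, full_matrices=False)
            s_thr = np.maximum(s - tau, 0.0)             # soft-threshold singular values
            rank = int((s_thr > 0).sum())                # retained model order
            return (U[:, :rank] * s_thr[:rank]) @ Vt[:rank], rank

        rng = np.random.default_rng(0)
        clean = rng.normal(size=(128, 8)) @ rng.normal(size=(8, 128))   # rank-8 "image"
        noisy = clean + 0.5 * rng.normal(size=clean.shape)

        denoised, rank = svt_denoise(noisy, tau=12.0)
        print("retained rank:", rank)
        print("error before:", np.linalg.norm(noisy - clean),
              "after:", np.linalg.norm(denoised - clean))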

  17. Classification of motor imagery tasks for BCI with multiresolution analysis and multiobjective feature selection.

    PubMed

    Ortega, Julio; Asensio-Cubero, Javier; Gan, John Q; Ortiz, Andrés

    2016-07-15

    Brain-computer interfacing (BCI) applications based on the classification of electroencephalographic (EEG) signals require solving high-dimensional pattern classification problems with such a relatively small number of training patterns that curse-of-dimensionality problems usually arise. Multiresolution analysis (MRA) has useful properties for signal analysis in both the temporal and spectral domains, and has been broadly used in the BCI field. However, MRA usually increases the dimensionality of the input data. Therefore, some approaches to feature selection or feature dimensionality reduction should be considered for improving the performance of the MRA-based BCI. This paper investigates feature selection in the MRA-based frameworks for BCI. Several wrapper approaches to evolutionary multiobjective feature selection are proposed with different structures of classifiers. They are evaluated by comparing with baseline methods using sparse representation of features or without feature selection. The statistical analysis, by applying the Kolmogorov-Smirnov and Kruskal-Wallis tests to the means of the Kappa values evaluated by using the test patterns in each approach, has demonstrated some advantages of the proposed approaches. In comparison with the baseline MRA approach used in previous studies, the proposed evolutionary multiobjective feature selection approaches provide similar or even better classification performances, with a significant reduction in the number of features that need to be computed.

  18. Automated bone segmentation from dental CBCT images using patch-based sparse representation and convex optimization

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Wang, Li; Gao, Yaozong; Shi, Feng

    Purpose: Cone-beam computed tomography (CBCT) is an increasingly utilized imaging modality for the diagnosis and treatment planning of patients with craniomaxillofacial (CMF) deformities. Accurate segmentation of CBCT images is an essential step to generate three-dimensional (3D) models for the diagnosis and treatment planning of patients with CMF deformities. However, due to the poor image quality, including very low signal-to-noise ratio and widespread image artifacts such as noise, beam hardening, and inhomogeneity, it is challenging to segment the CBCT images. In this paper, the authors present a new automatic segmentation method to address these problems. Methods: To segment CBCT images, the authors propose a new method for fully automated CBCT segmentation by using patch-based sparse representation to (1) segment bony structures from the soft tissues and (2) further separate the mandible from the maxilla. Specifically, a region-specific registration strategy is first proposed to warp all the atlases to the current testing subject, and then a sparse-based label propagation strategy is employed to estimate a patient-specific atlas from all aligned atlases. Finally, the patient-specific atlas is integrated into a maximum a posteriori probability-based convex segmentation framework for accurate segmentation. Results: The proposed method has been evaluated on a dataset with 15 CBCT images. The effectiveness of the proposed region-specific registration strategy and patient-specific atlas has been validated by comparing with the traditional registration strategy and population-based atlas. The experimental results show that the proposed method achieves the best segmentation accuracy by comparison with other state-of-the-art segmentation methods. Conclusions: The authors have proposed a new CBCT segmentation method by using patch-based sparse representation and convex optimization, which can achieve accurate segmentation results, as demonstrated on 15 patients.

  19. Direct measurement of the 3-dimensional DNA lesion distribution induced by energetic charged particles in a mouse model tissue

    PubMed Central

    Mirsch, Johanna; Tommasino, Francesco; Frohns, Antonia; Conrad, Sandro; Durante, Marco; Scholz, Michael; Friedrich, Thomas; Löbrich, Markus

    2015-01-01

    Charged particles are increasingly used in cancer radiotherapy and contribute significantly to the natural radiation risk. The difference in the biological effects of high-energy charged particles compared with X-rays or γ-rays is determined largely by the spatial distribution of their energy deposition events. Part of the energy is deposited in a densely ionizing manner in the inner part of the track, with the remainder spread out more sparsely over the outer track region. Our knowledge about the dose distribution is derived solely from modeling approaches and physical measurements in inorganic material. Here we exploited the exceptional sensitivity of γH2AX foci technology and quantified the spatial distribution of DNA lesions induced by charged particles in a mouse model tissue. We observed that charged particles damage tissue nonhomogeneously, with single cells receiving high doses and many other cells exposed to isolated damage resulting from high-energy secondary electrons. Using calibration experiments, we transformed the 3D lesion distribution into a dose distribution and compared it with predictions from modeling approaches. We obtained a radial dose distribution with sub-micrometer resolution that decreased with increasing distance to the particle path following a 1/r² dependence. The analysis further revealed the existence of a background dose at larger distances from the particle path arising from overlapping dose deposition events from independent particles. Our study provides, to our knowledge, the first quantification of the spatial dose distribution of charged particles in biologically relevant material, and will serve as a benchmark for biophysical models that predict the biological effects of these particles. PMID:26392532

  20. Systematic parameter inference in stochastic mesoscopic modeling

    NASA Astrophysics Data System (ADS)

    Lei, Huan; Yang, Xiu; Li, Zhen; Karniadakis, George Em

    2017-02-01

    We propose a method to efficiently determine the optimal coarse-grained force field in mesoscopic stochastic simulations of Newtonian fluid and polymer melt systems modeled by dissipative particle dynamics (DPD) and energy conserving dissipative particle dynamics (eDPD). The response surfaces of various target properties (viscosity, diffusivity, pressure, etc.) with respect to model parameters are constructed based on the generalized polynomial chaos (gPC) expansion using simulation results on sampling points (e.g., individual parameter sets). To alleviate the computational cost of evaluating the target properties, we employ the compressive sensing method to compute the coefficients of the dominant gPC terms, given the prior knowledge that the coefficients are "sparse". The proposed method shows accuracy comparable to the standard probabilistic collocation method (PCM) while imposing a much weaker restriction on the number of simulation samples, especially for systems with a high-dimensional parametric space. Full access to the response surfaces within the confidence range enables us to infer the optimal force parameters given the desired values of the target properties at the macroscopic scale. Moreover, it enables us to investigate the intrinsic relationship between the model parameters, identify possible degeneracies in the parameter space, and optimize the model by eliminating model redundancies. The proposed method provides an efficient alternative approach for constructing mesoscopic models by inferring model parameters to recover target properties of the physical systems (e.g., from experimental measurements), where the force field parameters and formulation cannot be derived from the microscopic level in a straightforward way.
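
    A toy sketch of the compressive sensing step is given below: recovering a sparse set of polynomial-chaos coefficients from fewer samples than basis terms via ℓ1 minimization (Lasso, scikit-learn assumed). The one-dimensional Legendre basis, target coefficients, and sample counts are illustrative assumptions, not the paper's DPD/eDPD setup.

        # Minimal sketch of compressive sensing for gPC: recover sparse polynomial-chaos
        # coefficients from fewer samples than unknowns via l1 minimization (Lasso).
        # The 1D Legendre basis and target function are toy assumptions.
        import numpy as np
        from numpy.polynomial import legendre
        from sklearn.linear_model import Lasso

        rng = np.random.default_rng(0)
        n_samples, n_terms = 25, 60                      # fewer samples than unknowns

        xi = rng.uniform(-1, 1, size=n_samples)          # sampled input parameters
        # True response depends on only a few basis terms (sparse coefficient vector)
        c_true = np.zeros(n_terms)
        c_true[[0, 3, 7]] = [1.0, 0.8, -0.5]
        y = legendre.legval(xi, c_true)

        # Measurement matrix: Legendre polynomials evaluated at the sample points
        Psi = np.column_stack([legendre.legval(xi, np.eye(n_terms)[k]) for k in range(n_terms)])

        lasso = Lasso(alpha=1e-3, fit_intercept=False, max_iter=50000).fit(Psi, y)
        recovered = np.flatnonzero(np.abs(lasso.coef_) > 1e-2)
        print("recovered dominant terms:", recovered)    # expect indices 0, 3, 7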
