Sample records for high-dimensional clustering problems

  1. High-dimensional cluster analysis with the Masked EM Algorithm

    PubMed Central

    Kadir, Shabnam N.; Goodman, Dan F. M.; Harris, Kenneth D.

    2014-01-01

    Cluster analysis faces two problems in high dimensions: first, the “curse of dimensionality” that can lead to overfitting and poor generalization performance; and second, the sheer time taken for conventional algorithms to process large amounts of high-dimensional data. We describe a solution to these problems, designed for the application of “spike sorting” for next-generation high channel-count neural probes. In this problem, only a small subset of features provide information about the cluster membership of any one data vector, but this informative feature subset is not the same for all data points, rendering classical feature selection ineffective. We introduce a “Masked EM” algorithm that allows accurate and time-efficient clustering of up to millions of points in thousands of dimensions. We demonstrate its applicability to synthetic data, and to real-world high-channel-count spike sorting data. PMID:25149694

  2. Clustering high dimensional data using RIA

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Aziz, Nazrina

    2015-05-15

    Clustering may simply represent a convenient method for organizing a large data set so that it can easily be understood and information can efficiently be retrieved. However, identifying clusters in high-dimensional data sets is a difficult task because of the curse of dimensionality. Another challenge in clustering is that some traditional dissimilarity functions cannot capture the pattern dissimilarity among objects. In this article, we use an alternative dissimilarity measurement called Robust Influence Angle (RIA) in the partitioning method. RIA is developed using the eigenstructure of the covariance matrix and robust principal component scores. We notice that it can obtain clusters easily and hence avoid the curse of dimensionality. It also manages to cluster large data sets with mixed numeric and categorical values.

  3. Machine-learned cluster identification in high-dimensional data.

    PubMed

    Ultsch, Alfred; Lötsch, Jörn

    2017-02-01

    High-dimensional biomedical data are frequently clustered to identify subgroup structures pointing at distinct disease subtypes. It is crucial that the cluster algorithm used works correctly. However, by imposing a predefined shape on the clusters, classical algorithms occasionally suggest a cluster structure in homogeneously distributed data or assign data points to incorrect clusters. We analyzed whether this can be avoided by using emergent self-organizing feature maps (ESOM). Data sets with different degrees of complexity were submitted to ESOM analysis with large numbers of neurons, using an interactive R-based bioinformatics tool. On top of the trained ESOM, the distance structure in the high-dimensional feature space was visualized in the form of a so-called U-matrix. Clustering results were compared with those provided by classical common cluster algorithms including single linkage, Ward and k-means. Ward clustering imposed cluster structures on cluster-less "golf ball", "cuboid" and "S-shaped" data sets that contained no structure at all (random data). Ward clustering also imposed structures on permuted real-world data sets. By contrast, the ESOM/U-matrix approach correctly found that these data contain no cluster structure. However, ESOM/U-matrix was correct in identifying clusters in biomedical data truly containing subgroups. It was always correct in cluster structure identification in further canonical artificial data. Using intentionally simple data sets, it is shown that popular clustering algorithms typically used for biomedical data sets may fail to cluster data correctly, suggesting that they are also likely to perform erroneously on high-dimensional biomedical data. The present analyses emphasized that generally established classical hierarchical clustering algorithms carry a considerable tendency to produce erroneous results. By contrast, unsupervised machine-learned analysis of cluster structures, applied using the ESOM/U-matrix method, is a preferable alternative.

  4. Frequency-sensitive competitive learning for scalable balanced clustering on high-dimensional hyperspheres.

    PubMed

    Banerjee, Arindam; Ghosh, Joydeep

    2004-05-01

    Competitive learning mechanisms for clustering, in general, suffer from poor performance for very high-dimensional (>1000) data because of "curse of dimensionality" effects. In applications such as document clustering, it is customary to normalize the high-dimensional input vectors to unit length, and it is sometimes also desirable to obtain balanced clusters, i.e., clusters of comparable sizes. The spherical kmeans (spkmeans) algorithm, which normalizes the cluster centers as well as the inputs, has been successfully used to cluster normalized text documents in 2000+ dimensional space. Unfortunately, like regular kmeans and its soft expectation-maximization-based version, spkmeans tends to generate extremely imbalanced clusters in high-dimensional spaces when the desired number of clusters is large (tens or more). This paper first shows that the spkmeans algorithm can be derived from a certain maximum likelihood formulation using a mixture of von Mises-Fisher distributions as the generative model, and in fact, it can be considered as a batch-mode version of (normalized) competitive learning. The proposed generative model is then adapted in a principled way to yield three frequency-sensitive competitive learning variants that are applicable to static data and produce high-quality and well-balanced clusters for high-dimensional data. Like kmeans, each iteration is linear in the number of data points and in the number of clusters for all three algorithms. A frequency-sensitive algorithm to cluster streaming data is also proposed. Experimental results on clustering of high-dimensional text data sets are provided to show the effectiveness and applicability of the proposed techniques. Index terms: balanced clustering, expectation maximization (EM), frequency-sensitive competitive learning (FSCL), high-dimensional clustering, kmeans, normalized data, scalable clustering, streaming data, text clustering.

  5. Model-based Clustering of High-Dimensional Data in Astrophysics

    NASA Astrophysics Data System (ADS)

    Bouveyron, C.

    2016-05-01

    The nature of data in Astrophysics has changed, as in other scientific fields, in the past decades due to the increase of measurement capabilities. As a consequence, data are nowadays frequently of high dimensionality and available in bulk or as streams. Model-based techniques for clustering are popular tools which are renowned for their probabilistic foundations and their flexibility. However, classical model-based techniques show a disappointing behavior in high-dimensional spaces, which is mainly due to their dramatic over-parametrization. The recent developments in model-based classification overcome these drawbacks and make it possible to efficiently classify high-dimensional data, even in the "small n / large p" situation. This work presents a comprehensive review of these recent approaches, including regularization-based techniques, parsimonious modeling, subspace classification methods and classification methods based on variable selection. The use of these model-based methods is also illustrated on real-world classification problems in Astrophysics using R packages.

  6. Mining High-Dimensional Data

    NASA Astrophysics Data System (ADS)

    Wang, Wei; Yang, Jiong

    With the rapid growth of computational biology and e-commerce applications, high-dimensional data have become very common. Thus, mining high-dimensional data is an urgent problem of great practical importance. However, there are some unique challenges for mining data of high dimensions, including (1) the curse of dimensionality and, more crucially, (2) the meaningfulness of the similarity measure in the high-dimensional space. In this chapter, we present several state-of-the-art techniques for analyzing high-dimensional data, e.g., frequent pattern mining, clustering, and classification. We will discuss how these methods deal with the challenges of high dimensionality.

  7. A Dissimilarity Measure for Clustering High- and Infinite Dimensional Data that Satisfies the Triangle Inequality

    NASA Technical Reports Server (NTRS)

    Socolovsky, Eduardo A.; Bushnell, Dennis M. (Technical Monitor)

    2002-01-01

    The cosine or correlation measures of similarity used to cluster high dimensional data are interpreted as projections, and the orthogonal components are used to define a complementary dissimilarity measure to form a similarity-dissimilarity measure pair. Using a geometrical approach, a number of properties of this pair are established. This approach is also extended to general inner-product spaces of any dimension. These properties include the triangle inequality for the defined dissimilarity measure, error estimates for the triangle inequality and bounds on both measures that can be obtained with a few floating-point operations from previously computed values of the measures. The bounds and error estimates for the similarity and dissimilarity measures can be used to reduce the computational complexity of clustering algorithms and enhance their scalability, and the triangle inequality allows the design of clustering algorithms for high dimensional distributed data.

  8. Nuclear Potential Clustering As a New Tool to Detect Patterns in High Dimensional Datasets

    NASA Astrophysics Data System (ADS)

    Tonkova, V.; Paulus, D.; Neeb, H.

    2013-02-01

    We present a new approach for the clustering of high dimensional data without prior assumptions about the structure of the underlying distribution. The proposed algorithm is based on a concept adapted from nuclear physics. To partition the data, we model the dynamic behaviour of nucleons interacting in an N-dimensional space. An adaptive nuclear potential, comprised of a short-range attractive (strong interaction) and a long-range repulsive term (Coulomb force) is assigned to each data point. By modelling the dynamics, nucleons that are densely distributed in space fuse to build nuclei (clusters) whereas single point clusters repel each other. The formation of clusters is completed when the system reaches the state of minimal potential energy. The data are then grouped according to the particles' final effective potential energy level. The performance of the algorithm is tested with several synthetic datasets showing that the proposed method can robustly identify clusters even when complex configurations are present. Furthermore, quantitative MRI data from 43 multiple sclerosis patients were analyzed, showing a reasonable splitting into subgroups according to the individual patients' disease grade. The good performance of the algorithm on such highly correlated non-spherical datasets, which are typical for MRI derived image features, shows that Nuclear Potential Clustering is a valuable tool for automated data analysis, not only in the MRI domain.

  9. Enabling the Discovery of Recurring Anomalies in Aerospace System Problem Reports using High-Dimensional Clustering Techniques

    NASA Technical Reports Server (NTRS)

    Srivastava, Ashok, N.; Akella, Ram; Diev, Vesselin; Kumaresan, Sakthi Preethi; McIntosh, Dawn M.; Pontikakis, Emmanuel D.; Xu, Zuobing; Zhang, Yi

    2006-01-01

    This paper describes the results of a significant research and development effort conducted at NASA Ames Research Center to develop new text mining techniques to discover anomalies in free-text reports regarding system health and safety of two aerospace systems. We discuss two problems of significant importance in the aviation industry. The first problem is that of automatic anomaly discovery about an aerospace system through the analysis of tens of thousands of free-text problem reports that are written about the system. The second problem that we address is that of automatic discovery of recurring anomalies, i.e., anomalies that may be described in different ways by different authors, at varying times and under varying conditions, but that are truly about the same part of the system. The intent of recurring anomaly identification is to determine project or system weaknesses or high-risk issues. The discovery of recurring anomalies is a key goal in building safe, reliable, and cost-effective aerospace systems. We address the anomaly discovery problem on thousands of free-text reports using two strategies: (1) as an unsupervised learning problem where an algorithm takes free-text reports as input and automatically groups them into different bins, where each bin corresponds to a different unknown anomaly category; and (2) as a supervised learning problem where the algorithm classifies the free-text reports into one of a number of known anomaly categories. We then discuss the application of these methods to the problem of discovering recurring anomalies. In fact the special nature of recurring anomalies (very small cluster sizes) requires incorporating new methods and measures to enhance the original approach for anomaly detection.

  10. Classification of holter registers by dynamic clustering using multi-dimensional particle swarm optimization.

    PubMed

    Kiranyaz, Serkan; Ince, Turker; Pulkkinen, Jenni; Gabbouj, Moncef

    2010-01-01

    In this paper, we address dynamic clustering in high-dimensional data or feature spaces as an optimization problem where multi-dimensional particle swarm optimization (MD PSO) is used to find the true number of clusters, while fractional global best formation (FGBF) is applied to avoid local optima. Based on these techniques, we then present a novel and personalized long-term ECG classification system, which addresses the problem of labeling the beats within a long-term ECG signal, known as a Holter register, recorded from an individual patient. Due to the massive number of ECG beats in a Holter register, visual inspection is quite difficult and cumbersome, if not impossible. Therefore, the proposed system helps professionals to quickly and accurately diagnose any latent heart disease by examining only the representative beats (the so-called master key-beats), each of which represents a cluster of homogeneous (similar) beats. We tested the system on a benchmark database where the beats of each Holter register have been manually labeled by cardiologists. The selection of the right master key-beats is the key factor for achieving a highly accurate classification, and the proposed systematic approach produced results that were consistent with the manual labels with 99.5% average accuracy, which demonstrates the efficiency of the system.

  11. Efficient computation of k-Nearest Neighbour Graphs for large high-dimensional data sets on GPU clusters.

    PubMed

    Dashti, Ali; Komarov, Ivan; D'Souza, Roshan M

    2013-01-01

    This paper presents an implementation of brute-force exact k-Nearest Neighbor Graph (k-NNG) construction for ultra-large high-dimensional data clouds. The proposed method uses Graphics Processing Units (GPUs) and is scalable with multiple levels of parallelism (between nodes of a cluster, between different GPUs on a single node, and within a GPU). The method is applicable to homogeneous computing clusters with a varying number of nodes and GPUs per node. We achieve a 6-fold speedup in data processing as compared with an optimized method running on a cluster of CPUs and bring a hitherto impossible k-NNG generation for a dataset of twenty million images with 15,000 dimensions into the realm of practical possibility.

  12. Visualization and unsupervised predictive clustering of high-dimensional multimodal neuroimaging data.

    PubMed

    Mwangi, Benson; Soares, Jair C; Hasan, Khader M

    2014-10-30

    Neuroimaging machine learning studies have largely utilized supervised algorithms - meaning they require both neuroimaging scan data and corresponding target variables (e.g. healthy vs. diseased) to be successfully 'trained' for a prediction task. Noticeably, this approach may not be optimal or possible when the global structure of the data is not well known and the researcher does not have an a priori model to fit the data. We set out to investigate the utility of an unsupervised machine learning technique, t-distributed stochastic neighbour embedding (t-SNE), in identifying 'unseen' sample population patterns that may exist in high-dimensional neuroimaging data. Multimodal neuroimaging scans from 92 healthy subjects were pre-processed using atlas-based methods, integrated and input into the t-SNE algorithm. Patterns and clusters discovered by the algorithm were visualized using a 2D scatter plot and further analyzed using the K-means clustering algorithm. t-SNE was evaluated against classical principal component analysis. Remarkably, based on unlabelled multimodal scan data, t-SNE separated study subjects into two very distinct clusters which corresponded to subjects' gender labels (cluster silhouette index value = 0.79). The resulting clusters were used to develop an unsupervised minimum-distance clustering model which correctly identified the gender of 93.5% of subjects. Notably, from a neuropsychiatric perspective this method may allow discovery of data-driven disease phenotypes or sub-types of treatment responders.

  13. Two-and three-dimensional unsteady lift problems in high-speed flight

    NASA Technical Reports Server (NTRS)

    Lomax, Harvard; Heaslet, Max A; Fuller, Franklyn B; Sluder, Loma

    1952-01-01

    The problem of transient lift on two- and three-dimensional wings flying at high speeds is discussed as a boundary-value problem for the classical wave equation. Kirchhoff's formula is applied so that the analysis is reduced, just as in the steady state, to an investigation of sources and doublets. The applications include the evaluation of indicial lift and pitching-moment curves for two-dimensional sinking and pitching wings flying at Mach numbers equal to 0, 0.8, 1.0, 1.2 and 2.0. Results for the sinking case are also given for a Mach number of 0.5. In addition, the indicial functions for supersonic-edged triangular wings in both forward and reverse flow are presented and compared with the two-dimensional values.

  14. Efficient implementation of parallel three-dimensional FFT on clusters of PCs

    NASA Astrophysics Data System (ADS)

    Takahashi, Daisuke

    2003-05-01

    In this paper, we propose a high-performance parallel three-dimensional fast Fourier transform (FFT) algorithm on clusters of PCs. The three-dimensional FFT algorithm can be altered into a block three-dimensional FFT algorithm to reduce the number of cache misses. We show that the block three-dimensional FFT algorithm improves performance by utilizing the cache memory effectively. We use the block three-dimensional FFT algorithm to implement the parallel three-dimensional FFT algorithm. We succeeded in obtaining performance of over 1.3 GFLOPS on an 8-node dual Pentium III 1 GHz PC SMP cluster.

  15. Molecular heterogeneity at the network level: high-dimensional testing, clustering and a TCGA case study.

    PubMed

    Städler, Nicolas; Dondelinger, Frank; Hill, Steven M; Akbani, Rehan; Lu, Yiling; Mills, Gordon B; Mukherjee, Sach

    2017-09-15

    Molecular pathways and networks play a key role in basic and disease biology. An emerging notion is that networks encoding patterns of molecular interplay may themselves differ between contexts, such as cell type, tissue or disease (sub)type. However, while statistical testing of differences in mean expression levels has been extensively studied, testing of network differences remains challenging. Furthermore, since network differences could provide important and biologically interpretable information to identify molecular subgroups, there is a need to consider the unsupervised task of learning subgroups and networks that define them. This is a nontrivial clustering problem, with neither subgroups nor subgroup-specific networks known at the outset. We leverage recent ideas from high-dimensional statistics for testing and clustering in the network biology setting. The methods we describe can be applied directly to most continuous molecular measurements and networks do not need to be specified beforehand. We illustrate the ideas and methods in a case study using protein data from The Cancer Genome Atlas (TCGA). This provides evidence that patterns of interplay between signalling proteins differ significantly between cancer types. Furthermore, we show how the proposed approaches can be used to learn subtypes and the molecular networks that define them. Availability and implementation: the Bioconductor package nethet. Contact: staedler.n@gmail.com or sach.mukherjee@dzne.de. Supplementary data are available at Bioinformatics online.

  16. A Dimensionally Reduced Clustering Methodology for Heterogeneous Occupational Medicine Data Mining.

    PubMed

    Saâdaoui, Foued; Bertrand, Pierre R; Boudet, Gil; Rouffiac, Karine; Dutheil, Frédéric; Chamoux, Alain

    2015-10-01

    Clustering is a set of statistical learning techniques aimed at finding structure in heterogeneous data by partitioning it into homogeneous groups called clusters. There are several fields in which clustering has been successfully applied, such as medicine, biology, finance, economics, etc. In this paper, we introduce the notion of clustering in multifactorial data analysis problems. A case study is conducted for an occupational medicine problem with the purpose of analyzing patterns in a population of 813 individuals. To reduce the data set dimensionality, we base our approach on Principal Component Analysis (PCA), which is the statistical tool most commonly used in factorial analysis. However, problems in nature, especially in medicine, are often based on heterogeneous qualitative-quantitative measurements, whereas PCA only processes quantitative ones. Besides, qualitative data are originally unobservable quantitative responses that are usually binary-coded. Hence, we propose a new set of strategies allowing quantitative and qualitative data to be handled simultaneously. The principle of this approach is to perform a projection of the qualitative variables on the subspaces spanned by quantitative ones. Subsequently, an optimal model is allocated to the resulting PCA-regressed subspaces.

  17. Finite-volume application of high order ENO schemes to multi-dimensional boundary-value problems

    NASA Technical Reports Server (NTRS)

    Casper, Jay; Dorrepaal, J. Mark

    1990-01-01

    The finite volume approach in developing multi-dimensional, high-order accurate essentially non-oscillatory (ENO) schemes is considered. In particular, a two dimensional extension is proposed for the Euler equation of gas dynamics. This requires a spatial reconstruction operator that attains formal high order of accuracy in two dimensions by taking account of cross gradients. Given a set of cell averages in two spatial variables, polynomial interpolation of a two dimensional primitive function is employed in order to extract high-order pointwise values on cell interfaces. These points are appropriately chosen so that correspondingly high-order flux integrals are obtained through each interface by quadrature, at each point having calculated a flux contribution in an upwind fashion. The solution-in-the-small of Riemann's initial value problem (IVP) that is required for this pointwise flux computation is achieved using Roe's approximate Riemann solver. Issues to be considered in this two dimensional extension include the implementation of boundary conditions and application to general curvilinear coordinates. Results of numerical experiments are presented for qualitative and quantitative examination. These results contain the first successful application of ENO schemes to boundary value problems with solid walls.

  18. A Spatial Division Clustering Method and Low Dimensional Feature Extraction Technique Based Indoor Positioning System

    PubMed Central

    Mo, Yun; Zhang, Zhongzhao; Meng, Weixiao; Ma, Lin; Wang, Yao

    2014-01-01

    Indoor positioning systems based on the fingerprint method are widely used due to the large number of existing devices with a wide range of coverage. However, extensive positioning regions with a massive fingerprint database may cause high computational complexity and error margins; therefore, clustering methods are widely applied as a solution. However, traditional clustering methods in positioning systems can only measure the similarity of the Received Signal Strength without being concerned with the continuity of physical coordinates. In addition, outages of access points can result in asymmetric matching problems which severely affect the fine positioning procedure. To solve these issues, in this paper we propose a positioning system based on the Spatial Division Clustering (SDC) method for clustering the fingerprint dataset subject to physical distance constraints. With the Genetic Algorithm and Support Vector Machine techniques, SDC can achieve higher coarse positioning accuracy than traditional clustering algorithms. In terms of fine localization, based on the Kernel Principal Component Analysis method, the proposed positioning system outperforms its counterparts based on other feature extraction methods in low dimensionality. Apart from balancing the online matching computational burden, the new positioning system exhibits advantageous performance on radio map clustering, and also shows better robustness and adaptability in the asymmetric matching problem aspect. PMID:24451470

  19. Multimodal, high-dimensional, model-based, Bayesian inverse problems with applications in biomechanics

    NASA Astrophysics Data System (ADS)

    Franck, I. M.; Koutsourelakis, P. S.

    2017-01-01

    This paper is concerned with the numerical solution of model-based, Bayesian inverse problems. We are particularly interested in cases where the cost of each likelihood evaluation (forward-model call) is expensive and the number of unknown (latent) variables is high. This is the setting in many problems in computational physics where forward models with nonlinear PDEs are used and the parameters to be calibrated involve spatio-temporally varying coefficients, which upon discretization give rise to a high-dimensional vector of unknowns. One of the consequences of the well-documented ill-posedness of inverse problems is the possibility of multiple solutions. While such information is contained in the posterior density in Bayesian formulations, the discovery of a single mode, let alone multiple, poses a formidable computational task. The goal of the present paper is two-fold. On one hand, we propose approximate, adaptive inference strategies using mixture densities to capture multi-modal posteriors. On the other, we extend our work in [1] with regard to effective dimensionality reduction techniques that reveal low-dimensional subspaces where the posterior variance is mostly concentrated. We validate the proposed model by employing Importance Sampling which confirms that the bias introduced is small and can be efficiently corrected if the analyst wishes to do so. We demonstrate the performance of the proposed strategy in nonlinear elastography where the identification of the mechanical properties of biological materials can inform non-invasive, medical diagnosis. The discovery of multiple modes (solutions) in such problems is critical in achieving the diagnostic objectives.

  20. Fast Multipole Methods for Three-Dimensional N-body Problems

    NASA Technical Reports Server (NTRS)

    Koumoutsakos, P.

    1995-01-01

    We are developing computational tools for the simulations of three-dimensional flows past bodies undergoing arbitrary motions. High resolution viscous vortex methods have been developed that allow for extended simulations of two-dimensional configurations such as vortex generators. Our objective is to extend this methodology to three dimensions and develop a robust computational scheme for the simulation of such flows. A fundamental issue in the use of vortex methods is the ability of employing efficiently large numbers of computational elements to resolve the large range of scales that exist in complex flows. The traditional cost of the method scales as O(N²) as the N computational elements/particles induce velocities at each other, making the method unacceptable for simulations involving more than a few tens of thousands of particles. In the last decade fast methods have been developed that have operation counts of O(N log N) or O(N) (referred to as BH and GR respectively) depending on the details of the algorithm. These methods are based on the observation that the effect of a cluster of particles at a certain distance may be approximated by a finite series expansion. In order to exploit this observation we need to decompose the element population spatially into clusters of particles and build a hierarchy of clusters (a tree data structure) - smaller neighboring clusters combine to form a cluster of the next size up in the hierarchy and so on. This hierarchy of clusters allows one to determine efficiently when the approximation is valid. This algorithm is an N-body solver that appears in many fields of engineering and science. Some examples of its diverse use are in astrophysics, molecular dynamics, micro-magnetics, boundary element simulations of electromagnetic problems, and computer animation. More recently these N-body solvers have been implemented and applied in simulations involving vortex methods, e.g. Koumoutsakos and Leonard (1995).

  1. A Structure-Based Distance Metric for High-Dimensional Space Exploration with Multi-Dimensional Scaling

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Lee, Hyun Jung; McDonnell, Kevin T.; Zelenyuk, Alla

    2014-03-01

    Although the Euclidean distance does well in measuring data distances within high-dimensional clusters, it does poorly when it comes to gauging inter-cluster distances. This significantly impacts the quality of global, low-dimensional space embedding procedures such as the popular multi-dimensional scaling (MDS) where one can often observe non-intuitive layouts. We were inspired by the perceptual processes evoked in the method of parallel coordinates which enables users to visually aggregate the data by the patterns the polylines exhibit across the dimension axes. We call the path of such a polyline its structure and suggest a metric that captures this structure directly in high-dimensional space. This allows us to better gauge the distances of spatially distant data constellations and so achieve data aggregations in MDS plots that are more cognizant of existing high-dimensional structure similarities. Our MDS plots also exhibit similar visual relationships as the method of parallel coordinates which is often used alongside to visualize the high-dimensional data in raw form. We then cast our metric into a bi-scale framework which distinguishes far-distances from near-distances. The coarser scale uses the structural similarity metric to separate data aggregates obtained by prior classification or clustering, while the finer scale employs the appropriate Euclidean distance.

  2. Clustering high-dimensional mixed data to uncover sub-phenotypes: joint analysis of phenotypic and genotypic data.

    PubMed

    McParland, D; Phillips, C M; Brennan, L; Roche, H M; Gormley, I C

    2017-12-10

    The LIPGENE-SU.VI.MAX study, like many others, recorded high-dimensional continuous phenotypic data and categorical genotypic data. LIPGENE-SU.VI.MAX focuses on the need to account for both phenotypic and genetic factors when studying the metabolic syndrome (MetS), a complex disorder that can lead to higher risk of type 2 diabetes and cardiovascular disease. Interest lies in clustering the LIPGENE-SU.VI.MAX participants into homogeneous groups or sub-phenotypes, by jointly considering their phenotypic and genotypic data, and in determining which variables are discriminatory. A novel latent variable model that elegantly accommodates high dimensional, mixed data is developed to cluster LIPGENE-SU.VI.MAX participants using a Bayesian finite mixture model. A computationally efficient variable selection algorithm is incorporated, estimation is via a Gibbs sampling algorithm and an approximate BIC-MCMC criterion is developed to select the optimal model. Two clusters or sub-phenotypes ('healthy' and 'at risk') are uncovered. A small subset of variables is deemed discriminatory, which notably includes phenotypic and genotypic variables, highlighting the need to jointly consider both factors. Further, 7 years after the LIPGENE-SU.VI.MAX data were collected, participants underwent further analysis to diagnose presence or absence of the MetS. The two uncovered sub-phenotypes strongly correspond to the 7-year follow-up disease classification, highlighting the role of phenotypic and genotypic factors in the MetS and emphasising the potential utility of the clustering approach in early screening. Additionally, the ability of the proposed approach to define the uncertainty in sub-phenotype membership at the participant level is synonymous with the concepts of precision medicine and nutrition.

  3. Linear solver performance in elastoplastic problem solution on GPU cluster

    NASA Astrophysics Data System (ADS)

    Khalevitsky, Yu. V.; Konovalov, A. V.; Burmasheva, N. V.; Partin, A. S.

    2017-12-01

    Applying the finite element method to severe plastic deformation problems involves solving linear equation systems. While the solution procedure is relatively hard to parallelize and computationally intensive by itself, a long series of large-scale systems needs to be solved for each problem. When dealing with fine computational meshes, such as in the simulations of three-dimensional metal matrix composite microvolume deformation, tens to hundreds of hours may be needed to complete the whole solution procedure, even using modern supercomputers. In general, one of the preconditioned Krylov subspace methods is used in a linear solver for such problems. The convergence of these methods depends strongly on the spectrum of the problem's stiffness matrix. In order to choose the appropriate method, a series of computational experiments is used. Different methods may be preferable for different computational systems for the same problem. In this paper, we present experimental data obtained by solving linear equation systems from an elastoplastic problem on a GPU cluster. The data can be used to substantiate the choice of the appropriate method for a linear solver to use in severe plastic deformation simulations.

  4. Harnessing Sparse and Low-Dimensional Structures for Robust Clustering of Imagery Data

    ERIC Educational Resources Information Center

    Rao, Shankar Ramamohan

    2009-01-01

    We propose a robust framework for clustering data. In practice, data obtained from real measurement devices can be incomplete, corrupted by gross errors, or not correspond to any assumed model. We show that, by properly harnessing the intrinsic low-dimensional structure of the data, these kinds of practical problems can be dealt with in a uniform…

  5. Exponents of non-linear clustering in scale-free one-dimensional cosmological simulations

    NASA Astrophysics Data System (ADS)

    Benhaiem, David; Joyce, Michael; Sicard, François

    2013-03-01

    One-dimensional versions of dissipationless cosmological N-body simulations have been shown to share many qualitative behaviours of the three-dimensional problem. Their interest lies in the fact that they can resolve a much greater range of time and length scales, and admit exact numerical integration. We use such models here to study how non-linear clustering depends on initial conditions and cosmology. More specifically, we consider a family of models which, like the three-dimensional Einstein-de Sitter (EdS) model, lead for power-law initial conditions to self-similar clustering characterized in the strongly non-linear regime by power-law behaviour of the two-point correlation function. We study how the corresponding exponent γ depends on the initial conditions, characterized by the exponent n of the power spectrum of initial fluctuations, and on a single parameter κ controlling the rate of expansion. The space of initial conditions/cosmology divides very clearly into two parts: (1) a region in which γ depends strongly on both n and κ and where it agrees very well with a simple generalization of the so-called stable clustering hypothesis in three dimensions; and (2) a region in which γ is more or less independent of both the spectrum and the expansion of the universe. The boundary in (n, κ) space dividing the `stable clustering' region from the `universal' region is very well approximated by a `critical' value of the predicted stable clustering exponent itself. We explain how this division of the (n, κ) space can be understood as a simple physical criterion which might indeed be expected to control the validity of the stable clustering hypothesis. We compare and contrast our findings to results in three dimensions, and discuss in particular the light they may throw on the question of `universality' of non-linear clustering in this context.

  6. Online Time Series Clustering for Demand Response: A Theory to Break the ‘Curse of Dimensionality’

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Pal, Ranjan; Chelmis, Charalampos; Aman, Saima

    The advent of smart meters and advanced communication infrastructures catalyzes numerous smart grid applications such as dynamic demand response, and paves the way to solve challenging research problems in sustainable energy consumption. The space of solution possibilities is restricted primarily by the huge amount of generated data requiring considerable computational resources and efficient algorithms. To overcome this Big Data challenge, data clustering techniques have been proposed. Current approaches however do not scale in the face of the “increasing dimensionality” problem where a cluster point is represented by the entire customer consumption time series. To overcome this, we first rethink the way cluster points are created and designed, and then design an efficient online clustering technique for demand response (DR) in order to analyze high volume, high dimensional energy consumption time series data at scale, and on the fly. Our online algorithm is randomized in nature, and provides optimal performance guarantees in a computationally efficient manner. Unlike prior work we (i) study the consumption properties of the whole population simultaneously rather than developing individual models for each customer separately, claiming it to be a ‘killer’ approach that breaks the “curse of dimensionality” in online time series clustering, and (ii) provide tight performance guarantees in theory to validate our approach. Our insights are driven by the field of sociology, where collective behavior often emerges as the result of individual patterns and lifestyles.

  7. Distributed Computation of the knn Graph for Large High-Dimensional Point Sets

    PubMed Central

    Plaku, Erion; Kavraki, Lydia E.

    2009-01-01

    High-dimensional problems arising from robot motion planning, biology, data mining, and geographic information systems often require the computation of k nearest neighbor (knn) graphs. The knn graph of a data set is obtained by connecting each point to its k closest points. As the research in the above-mentioned fields progressively addresses problems of unprecedented complexity, the demand for computing knn graphs based on arbitrary distance metrics and large high-dimensional data sets increases, exceeding resources available to a single machine. In this work we efficiently distribute the computation of knn graphs for clusters of processors with message passing. Extensions to our distributed framework include the computation of graphs based on other proximity queries, such as approximate knn or range queries. Our experiments show nearly linear speedup with over one hundred processors and indicate that similar speedup can be obtained with several hundred processors. PMID:19847318

  8. Hyper-spectral image segmentation using spectral clustering with covariance descriptors

    NASA Astrophysics Data System (ADS)

    Kursun, Olcay; Karabiber, Fethullah; Koc, Cemalettin; Bal, Abdullah

    2009-02-01

    Image segmentation is an important and difficult computer vision problem. Hyper-spectral images pose even more difficulty due to their high dimensionality. Spectral clustering (SC) is a recently popular clustering/segmentation algorithm. In general, SC lifts the data to a high-dimensional space, also known as the kernel trick, then derives eigenvectors in this new space, and finally partitions the data into clusters using these new dimensions. We demonstrate that SC works efficiently when combined with covariance descriptors, which can be used to assess pixelwise similarities rather than working in the high-dimensional Euclidean space. We present the formulations and some preliminary results of the proposed hybrid image segmentation method for hyper-spectral images.

  9. Modified Cheeger and Ratio Cut Methods Using the Ginzburg-Landau Functional for Classification of High-Dimensional Data

    DTIC Science & Technology

    2016-02-01

    Recent advances in clustering have included continuous relaxations of the Cheeger cut ... fully nonlinear Cheeger cut problem, as well as the ratio cut optimization task. Both problems are connected to total variation minimization, and the

  10. Two-dimensional and three-dimensional Coulomb clusters in parabolic traps

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    D'yachkov, L. G., E-mail: dyachk@mail.ru; Myasnikov, M. I., E-mail: miasnikovmi@mail.ru; Petrov, O. F.

    2014-09-15

    We consider the shell structure of Coulomb clusters in an axially symmetric parabolic trap exhibiting a confining potential U_c(ρ,z) = (mω²/2)(ρ² + αz²). Assuming an anisotropic parameter α = 4 (corresponding to experiments employing a cusp magnetic trap under microgravity conditions), we have calculated cluster configurations for particle numbers N = 3 to 30. We have shown that clusters with N ≤ 12 initially remain flat, transitioning to three-dimensional configurations as N increases. For N = 8, we have calculated the configurations of minimal potential energy for all values of α and found the points of configuration transitions. For N = 13 and 23, we discuss the influence of both the shielding and anisotropic parameter on potential energy, cluster size, and shell structure.

  11. A local search for a graph clustering problem

    NASA Astrophysics Data System (ADS)

    Navrotskaya, Anna; Il'ev, Victor

    2016-10-01

    In clustering problems one has to partition a given set of objects (a data set) into subsets (called clusters) taking into consideration only the similarity of the objects. One of the most visual formalizations of clustering is graph clustering, that is, grouping the vertices of a graph into clusters taking into consideration the edge structure of the graph, whose vertices are objects and whose edges represent similarities between the objects. In the graph k-clustering problem the number of clusters does not exceed k and the goal is to minimize the number of edges between clusters and the number of missing edges within clusters. This problem is NP-hard for any k ≥ 2. We propose a polynomial-time (2k-1)-approximation algorithm for graph k-clustering. Then we apply a local search procedure to the feasible solution found by this algorithm and conduct an experimental study of the resulting heuristics.

  12. Understanding 3D human torso shape via manifold clustering

    NASA Astrophysics Data System (ADS)

    Li, Sheng; Li, Peng; Fu, Yun

    2013-05-01

    Discovering the variations in human torso shape plays a key role in many design-oriented applications, such as suit designing. With recent advances in 3D surface imaging technologies, people can obtain 3D human torso data that provide more information than traditional measurements. However, how to find different human shapes from 3D torso data is still an open problem. In this paper, we propose to use spectral clustering approach on torso manifold to address this problem. We first represent high-dimensional torso data in a low-dimensional space using manifold learning algorithm. Then the spectral clustering method is performed to get several disjoint clusters. Experimental results show that the clusters discovered by our approach can describe the discrepancies in both genders and human shapes, and our approach achieves better performance than the compared clustering method.

  13. One-dimensional Gromov minimal filling problem

    NASA Astrophysics Data System (ADS)

    Ivanov, Alexandr O.; Tuzhilin, Alexey A.

    2012-05-01

    The paper is devoted to a new branch in the theory of one-dimensional variational problems with branching extremals, the investigation of one-dimensional minimal fillings introduced by the authors. On the one hand, this problem is a one-dimensional version of a generalization of Gromov's minimal fillings problem to the case of stratified manifolds. On the other hand, this problem is interesting in itself and also can be considered as a generalization of another classical problem, the Steiner problem on the construction of a shortest network connecting a given set of terminals. Besides the statement of the problem, we discuss several properties of the minimal fillings and state several conjectures. Bibliography: 38 titles.

  14. A Fast Exact k-Nearest Neighbors Algorithm for High Dimensional Search Using k-Means Clustering and Triangle Inequality.

    PubMed

    Wang, Xueyi

    2012-02-08

    The k-nearest neighbors (k-NN) algorithm is a widely used machine learning method that finds nearest neighbors of a test object in a feature space. We present a new exact k-NN algorithm called kMkNN (k-Means for k-Nearest Neighbors) that uses the k-means clustering and the triangle inequality to accelerate the searching for nearest neighbors in a high dimensional space. The kMkNN algorithm has two stages. In the buildup stage, instead of using complex tree structures such as metric trees, kd-trees, or ball-trees, kMkNN uses a simple k-means clustering method to preprocess the training dataset. In the searching stage, given a query object, kMkNN finds nearest training objects starting from the nearest cluster to the query object and uses the triangle inequality to reduce the distance calculations. Experiments show that the performance of kMkNN is surprisingly good compared to the traditional k-NN algorithm and tree-based k-NN algorithms such as kd-trees and ball-trees. On a collection of 20 datasets with up to 10⁶ records and 10⁴ dimensions, kMkNN shows a 2- to 80-fold reduction of distance calculations and a 2- to 60-fold speedup over the traditional k-NN algorithm for 16 datasets. Furthermore, kMkNN performs significantly better than a kd-tree based k-NN algorithm for all datasets and performs better than a ball-tree based k-NN algorithm for most datasets. The results show that kMkNN is effective for searching nearest neighbors in high dimensional spaces.

  15. An agglomerative hierarchical clustering approach to visualisation in Bayesian clustering problems

    PubMed Central

    Dawson, Kevin J.; Belkhir, Khalid

    2009-01-01

    Clustering problems (including the clustering of individuals into outcrossing populations, hybrid generations, full-sib families and selfing lines) have recently received much attention in population genetics. In these clustering problems, the parameter of interest is a partition of the set of sampled individuals: the sample partition. In a fully Bayesian approach to clustering problems of this type, our knowledge about the sample partition is represented by a probability distribution on the space of possible sample partitions. Since the number of possible partitions grows very rapidly with the sample size, we cannot visualise this probability distribution in its entirety, unless the sample is very small. As a solution to this visualisation problem, we recommend using an agglomerative hierarchical clustering algorithm, which we call the exact linkage algorithm. This algorithm is a special case of the maximin clustering algorithm that we introduced previously. The exact linkage algorithm is now implemented in our software package Partition View. The exact linkage algorithm takes the posterior co-assignment probabilities as input, and yields as output a rooted binary tree or, more generally, a forest of such trees. Each node of this forest defines a set of individuals, and the node height is the posterior co-assignment probability of this set. This provides a useful visual representation of the uncertainty associated with the assignment of individuals to categories. It is also a useful starting point for a more detailed exploration of the posterior distribution in terms of the co-assignment probabilities. PMID:19337306

  16. Clustering by reordering of similarity and Laplacian matrices: Application to galaxy clusters

    NASA Astrophysics Data System (ADS)

    Mahmoud, E.; Shoukry, A.; Takey, A.

    2018-04-01

    Similarity metrics, kernels and similarity-based algorithms have gained much attention due to their increasing applications in information retrieval, data mining, pattern recognition and machine learning. Similarity graphs are often adopted as the underlying representation of similarity matrices and are at the origin of known clustering algorithms such as spectral clustering. Similarity matrices offer the advantage of working in object-object (two-dimensional) space, where visualization of cluster similarities is available, instead of object-features (multi-dimensional) space. In this paper, sparse ɛ-similarity graphs are constructed and decomposed into strong components using appropriate methods such as the Dulmage-Mendelsohn permutation (DMperm) and/or the Reverse Cuthill-McKee (RCM) algorithm. The obtained strong components correspond to groups (clusters) in the input (feature) space. The parameter ɛi is estimated locally at each data point i from a corresponding narrow range of the number of nearest neighbors. Although more advanced clustering techniques are available, our method has the advantages of simplicity, better complexity and direct visualization of the cluster similarities in a two-dimensional space. Also, no prior information about the number of clusters is needed. We conducted our experiments on two- and three-dimensional, small and large synthetic datasets as well as on a real astronomical dataset. The results are verified graphically and analyzed using gap statistics over a range of neighbors to verify the robustness of the algorithm and the stability of the results. Combining the proposed algorithm with gap statistics provides a promising tool for solving clustering problems. An astronomical application is conducted for confirming the existence of 45 galaxy clusters around the X-ray positions of galaxy clusters in the redshift range [0.1..0.8]. We re-estimate the photometric redshifts of the identified galaxy clusters and obtain acceptable values.

  17. On the complexity of some quadratic Euclidean 2-clustering problems

    NASA Astrophysics Data System (ADS)

    Kel'manov, A. V.; Pyatkin, A. V.

    2016-03-01

    Some problems of partitioning a finite set of points of Euclidean space into two clusters are considered. In these problems, the following criteria are minimized: (1) the sum over both clusters of the sums of squared pairwise distances between the elements of the cluster and (2) the sum of the (multiplied by the cardinalities of the clusters) sums of squared distances from the elements of the cluster to its geometric center, where the geometric center (or centroid) of a cluster is defined as the mean value of the elements in that cluster. Additionally, another problem close to (2) is considered, where the desired center of one of the clusters is given as input, while the center of the other cluster is unknown (is the variable to be optimized) as in problem (2). Two variants of the problems are analyzed, in which the cardinalities of the clusters are (1) parts of the input or (2) optimization variables. It is proved that all the considered problems are strongly NP-hard and that, in general, there is no fully polynomial-time approximation scheme for them (unless P = NP).

  18. CyTOF workflow: differential discovery in high-throughput high-dimensional cytometry datasets

    PubMed Central

    Nowicka, Malgorzata; Krieg, Carsten; Weber, Lukas M.; Hartmann, Felix J.; Guglietta, Silvia; Becher, Burkhard; Levesque, Mitchell P.; Robinson, Mark D.

    2017-01-01

    High dimensional mass and flow cytometry (HDCyto) experiments have become a method of choice for high throughput interrogation and characterization of cell populations. Here, we present an R-based pipeline for differential analyses of HDCyto data, largely based on Bioconductor packages. We computationally define cell populations using FlowSOM clustering, and facilitate an optional but reproducible strategy for manual merging of algorithm-generated clusters. Our workflow offers different analysis paths, including association of cell type abundance with a phenotype or changes in signaling markers within specific subpopulations, or differential analyses of aggregated signals. Importantly, the differential analyses we show are based on regression frameworks where the HDCyto data is the response; thus, we are able to model arbitrary experimental designs, such as those with batch effects, paired designs and so on. In particular, we apply generalized linear mixed models to analyses of cell population abundance or cell-population-specific analyses of signaling markers, allowing overdispersion in cell count or aggregated signals across samples to be appropriately modeled. To support the formal statistical analyses, we encourage exploratory data analysis at every step, including quality control (e.g. multi-dimensional scaling plots), reporting of clustering results (dimensionality reduction, heatmaps with dendrograms) and differential analyses (e.g. plots of aggregated signals). PMID:28663787

  19. Membership determination of open clusters based on a spectral clustering method

    NASA Astrophysics Data System (ADS)

    Gao, Xin-Hua

    2018-06-01

    We present a spectral clustering (SC) method aimed at segregating reliable members of open clusters in multi-dimensional space. The SC method is a non-parametric clustering technique that performs cluster division using eigenvectors of the similarity matrix; no prior knowledge of the clusters is required. This method is more flexible in dealing with multi-dimensional data compared to other methods of membership determination. We use this method to segregate the cluster members of five open clusters (Hyades, Coma Ber, Pleiades, Praesepe, and NGC 188) in five-dimensional space; fairly clean cluster members are obtained. We find that the SC method can capture a small number of cluster members (weak signal) from a large number of field stars (heavy noise). Based on these cluster members, we compute the mean proper motions and distances for the Hyades, Coma Ber, Pleiades, and Praesepe clusters, and our results are in general quite consistent with the results derived by other authors. The test results indicate that the SC method is highly suitable for segregating cluster members of open clusters based on high-precision multi-dimensional astrometric data such as Gaia data.

  20. Statistical Significance for Hierarchical Clustering

    PubMed Central

    Kimes, Patrick K.; Liu, Yufeng; Hayes, D. Neil; Marron, J. S.

    2017-01-01

    Summary Cluster analysis has proved to be an invaluable tool for the exploratory and unsupervised analysis of high dimensional datasets. Among methods for clustering, hierarchical approaches have enjoyed substantial popularity in genomics and other fields for their ability to simultaneously uncover multiple layers of clustering structure. A critical and challenging question in cluster analysis is whether the identified clusters represent important underlying structure or are artifacts of natural sampling variation. Few approaches have been proposed for addressing this problem in the context of hierarchical clustering, for which the problem is further complicated by the natural tree structure of the partition, and the multiplicity of tests required to parse the layers of nested clusters. In this paper, we propose a Monte Carlo based approach for testing statistical significance in hierarchical clustering which addresses these issues. The approach is implemented as a sequential testing procedure guaranteeing control of the family-wise error rate. Theoretical justification is provided for our approach, and its power to detect true clustering structure is illustrated through several simulation studies and applications to two cancer gene expression datasets. PMID:28099990
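
    In the same spirit (though not the authors' sequential, family-wise-error-controlled procedure), a single Monte Carlo significance test for one two-way split can be sketched as follows: compare a cluster index on the data against its distribution under a fitted single-Gaussian null. All settings here are illustrative.

    ```python
    import numpy as np
    from sklearn.cluster import KMeans

    def cluster_index(X):
        """Within-cluster SS of a 2-means split divided by total SS."""
        km = KMeans(n_clusters=2, n_init=10, random_state=0).fit(X)
        return km.inertia_ / np.sum((X - X.mean(axis=0)) ** 2)

    rng = np.random.default_rng(0)
    X = np.vstack([rng.normal(-2, 1, (50, 10)), rng.normal(2, 1, (50, 10))])

    obs = cluster_index(X)
    mu, cov = X.mean(axis=0), np.cov(X.T)      # single-Gaussian null
    null = [cluster_index(rng.multivariate_normal(mu, cov, size=len(X)))
            for _ in range(200)]
    p = np.mean([v <= obs for v in null])      # smaller index = tighter split
    print(f"cluster index {obs:.3f}, Monte Carlo p = {p:.3f}")
    ```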

  1. Creating Quasi Two-Dimensional Cluster-Assembled Materials through Self-Assembly of a Janus Polyoxometalate-Silsesquioxane Co-Cluster.

    PubMed

    Wu, Han; Zhang, Yu-Qi; Hu, Min-Biao; Ren, Li-Jun; Lin, Yue; Wang, Wei

    2017-05-30

    Clusters are an important class of nanoscale molecules or superatoms that exhibit an amazing diversity in structure, chemical composition, shape, and functionality. Assembling two types of clusters creates an emerging class of cluster-assembled materials (CAMs). In this paper, we report an effective approach to produce quasi two-dimensional (2D) CAMs from two types of sphere-like clusters: polyhedral oligomeric silsesquioxanes (POSS) and polyoxometalates (POM). To avoid macrophase separation between the two clusters, they are covalently linked to form a POM-POSS cocluster with Janus characteristics and a dumbbell shape. This Janus character enables the cocluster to self-assemble into diverse nanoaggregates in selective solvents, as conventional amphiphilic molecules and macromolecules do. In our study, we obtained micelles, vesicles, nanosheets, and nanoribbons by tuning the n-hexane content in mixed solvents of acetone and n-hexane. Ordered packing of the clusters in the nanosheets and nanoribbons was directly visualized using the high-angle annular dark-field scanning transmission electron microscopy (HAADF-STEM) technique. We infer that an increase in packing order drives the vesicle-to-sheet transition and that a change in packing mode causes the sheet-to-ribbon transition. Our findings verify the effectiveness of cocluster self-assembly as a new approach to producing novel quasi-2D CAMs.

  2. Clustering and assembly dynamics of a one-dimensional microphase former.

    PubMed

    Hu, Yi; Charbonneau, Patrick

    2018-05-23

    Both ordered and disordered microphases ubiquitously form in suspensions of particles that interact through competing short-range attraction and long-range repulsion (SALR). While ordered microphases are more appealing materials targets, understanding the rich structural and dynamical properties of their disordered counterparts is essential to controlling their mesoscale assembly. Here, we study the disordered regime of a one-dimensional (1D) SALR model, whose simplicity enables detailed analysis by transfer matrices and Monte Carlo simulations. We first characterize the signature of the clustering process on macroscopic observables, and then assess the equilibration dynamics of various simulation algorithms. We notably find that cluster moves markedly accelerate the mixing time, but that event chains are of limited help in the clustering regime. These insights will inspire further study of three-dimensional microphase formers.

  3. High-dimensional neural network potentials for solvation: The case of protonated water clusters in helium.

    PubMed

    Schran, Christoph; Uhl, Felix; Behler, Jörg; Marx, Dominik

    2018-03-14

    The design of accurate helium-solute interaction potentials for the simulation of chemically complex molecules solvated in superfluid helium has long been a cumbersome task due to the rather weak but strongly anisotropic nature of the interactions. We show that this challenge can be met by using a combination of an effective pair potential for the He-He interactions and a flexible high-dimensional neural network potential (NNP) for describing the complex interaction between helium and the solute in a pairwise additive manner. This approach yields an excellent agreement with a mean absolute deviation as small as 0.04 kJ mol-1 for the interaction energy between helium and both hydronium and Zundel cations compared with coupled cluster reference calculations with an energetically converged basis set. The construction and improvement of the potential can be performed in a highly automated way, which opens the door for applications to a variety of reactive molecules to study the effect of solvation on the solute as well as the solute-induced structuring of the solvent. Furthermore, we show that this NNP approach yields very convincing agreement with the coupled cluster reference for properties like many-body spatial and radial distribution functions. This holds for the microsolvation of the protonated water monomer and dimer by a few helium atoms up to their solvation in bulk helium as obtained from path integral simulations at about 1 K.
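
    The essential structure, a total interaction energy assembled pairwise from a learned function of each He-solute pair, can be caricatured as below. The tiny random-weight network and the inverse-distance descriptor are placeholders; the actual NNP uses trained weights and proper symmetry-function descriptors.

    ```python
    import numpy as np

    def pair_net(d, W1, b1, W2, b2):
        """Tiny feed-forward net mapping a pair descriptor to a pair energy."""
        h = np.tanh(d @ W1 + b1)
        return float(h @ W2 + b2)

    rng = np.random.default_rng(0)
    W1, b1 = rng.normal(size=(1, 8)), rng.normal(size=8)
    W2, b2 = rng.normal(size=8), rng.normal()

    solute = np.zeros(3)                           # solute site at the origin
    helium = rng.normal(scale=4.0, size=(20, 3))   # 20 He atom positions

    # Pairwise-additive He-solute energy; descriptor is 1/r for brevity
    E = sum(pair_net(np.array([1.0 / np.linalg.norm(r - solute)]),
                     W1, b1, W2, b2) for r in helium)
    print(E)
    ```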

  4. High-dimensional neural network potentials for solvation: The case of protonated water clusters in helium

    NASA Astrophysics Data System (ADS)

    Schran, Christoph; Uhl, Felix; Behler, Jörg; Marx, Dominik

    2018-03-01

    The design of accurate helium-solute interaction potentials for the simulation of chemically complex molecules solvated in superfluid helium has long been a cumbersome task due to the rather weak but strongly anisotropic nature of the interactions. We show that this challenge can be met by using a combination of an effective pair potential for the He-He interactions and a flexible high-dimensional neural network potential (NNP) for describing the complex interaction between helium and the solute in a pairwise additive manner. This approach yields an excellent agreement with a mean absolute deviation as small as 0.04 kJ mol-1 for the interaction energy between helium and both hydronium and Zundel cations compared with coupled cluster reference calculations with an energetically converged basis set. The construction and improvement of the potential can be performed in a highly automated way, which opens the door for applications to a variety of reactive molecules to study the effect of solvation on the solute as well as the solute-induced structuring of the solvent. Furthermore, we show that this NNP approach yields very convincing agreement with the coupled cluster reference for properties like many-body spatial and radial distribution functions. This holds for the microsolvation of the protonated water monomer and dimer by a few helium atoms up to their solvation in bulk helium as obtained from path integral simulations at about 1 K.

  5. Multi-Dimensional, Non-Pyrolyzing Ablation Test Problems

    NASA Technical Reports Server (NTRS)

    Risch, Tim; Kostyk, Chris

    2016-01-01

    Non-pyrolyzing carbonaceous materials represent a class of candidate material for hypersonic vehicle components providing both structural and thermal protection system capabilities. Two problems relevant to this technology are presented. The first considers the one-dimensional ablation of a carbon material subject to convective heating. The second considers two-dimensional conduction in a rectangular block subject to radiative heating. Surface thermochemistry for both problems includes finite-rate surface kinetics at low temperatures, diffusion limited ablation at intermediate temperatures, and vaporization at high temperatures. The first problem requires the solution of both the steady-state thermal profile with respect to the ablating surface and the transient thermal history for a one-dimensional ablating planar slab with temperature-dependent material properties. The slab front face is convectively heated and also reradiates to a room temperature environment. The back face is adiabatic. The steady-state temperature profile and steady-state mass loss rate should be predicted. Time-dependent front and back face temperature, surface recession and recession rate along with the final temperature profile should be predicted for the time-dependent solution. The second problem requires the solution for the transient temperature history for an ablating, two-dimensional rectangular solid with anisotropic, temperature-dependent thermal properties. The front face is radiatively heated, convectively cooled, and also reradiates to a room temperature environment. The back face and sidewalls are adiabatic. The solution should include the following 9 items: final surface recession profile, time-dependent temperature history of both the front face and back face at both the centerline and sidewall, as well as the time-dependent surface recession and recession rate on the front face at both the centerline and sidewall. The results of the problems from all submitters will be
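
    For orientation, a minimal explicit finite-difference sketch of the transient, non-ablating limit of the first problem (convectively heated, reradiating front face; adiabatic back face; constant properties). Every number below is a hypothetical placeholder, not a value from the test problem definition.

    ```python
    import numpy as np

    L, N = 0.05, 51                        # slab thickness [m], grid points
    k, rho, cp = 50.0, 1800.0, 1000.0      # conductivity, density, heat capacity
    h, T_rec = 500.0, 2000.0               # film coefficient, recovery temp [K]
    eps, sigma = 0.9, 5.670e-8             # emissivity, Stefan-Boltzmann
    dx = L / (N - 1)
    alpha = k / (rho * cp)
    dt = 0.4 * dx * dx / alpha             # stable explicit time step

    T = np.full(N, 300.0)
    for step in range(20000):
        Tn = T.copy()
        T[1:-1] = Tn[1:-1] + alpha * dt / dx**2 * (Tn[2:] - 2 * Tn[1:-1] + Tn[:-2])
        # Front face: convection in, reradiation out, conduction into the slab
        q = h * (T_rec - Tn[0]) - eps * sigma * (Tn[0] ** 4 - 300.0 ** 4)
        T[0] = Tn[0] + dt / (rho * cp * dx / 2) * (q + k * (Tn[1] - Tn[0]) / dx)
        T[-1] = T[-2]                      # adiabatic back face
    print(f"front {T[0]:.1f} K, back {T[-1]:.1f} K")
    ```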

  6. Addressing Curse of Dimensionality in Sensitivity Analysis: How Can We Handle High-Dimensional Problems?

    NASA Astrophysics Data System (ADS)

    Safaei, S.; Haghnegahdar, A.; Razavi, S.

    2016-12-01

    Complex environmental models are now the primary tool for informing decision makers about the current or future management of environmental resources under climate and environmental change. These complex models often contain a large number of parameters that need to be determined by a computationally intensive calibration procedure. Sensitivity analysis (SA) is a very useful tool that not only allows for understanding model behavior, but also helps reduce the number of calibration parameters by identifying unimportant ones. The issue is that most global sensitivity techniques are themselves highly computationally demanding when generating robust and stable sensitivity metrics over the entire model response surface. Recently, a novel global sensitivity analysis method, Variogram Analysis of Response Surfaces (VARS), was introduced that can efficiently provide a comprehensive assessment of global sensitivity using the variogram concept. In this work, we aim to evaluate the effectiveness of this highly efficient GSA method in reducing computational burden when applied to systems with an extra-large number of input factors (~100). We use a test function and a hydrological modelling case study to demonstrate the capability of the VARS method in reducing problem dimensionality by identifying important vs. unimportant input factors.
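
    The variogram idea can be illustrated in a few lines: a directional variogram at a small lag measures how strongly the response changes along each input factor. The toy function below (only 3 of 100 factors active) is an assumption for illustration; the full VARS framework integrates variograms across scales rather than using a single lag.

    ```python
    import numpy as np

    def model(x):
        """Hypothetical response: only the first 3 of 100 factors matter."""
        return np.sin(x[..., 0]) + 5.0 * x[..., 1] ** 2 + 0.5 * x[..., 2]

    rng = np.random.default_rng(0)
    D, n, h = 100, 2000, 0.05
    X = rng.uniform(size=(n, D))
    fX = model(X)

    # gamma_i(h) = 0.5 * E[(f(x + h e_i) - f(x))^2] for each factor i
    gamma = np.empty(D)
    for i in range(D):
        Xh = X.copy()
        Xh[:, i] = np.clip(Xh[:, i] + h, 0.0, 1.0)
        gamma[i] = 0.5 * np.mean((model(Xh) - fX) ** 2)

    print(np.argsort(gamma)[::-1][:5])     # expect factors 1, 0, 2 on top
    ```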

  7. Effects of cluster location and cluster distribution on performance on the traveling salesman problem.

    PubMed

    MacGregor, James N

    2015-10-01

    Research on human performance in solving traveling salesman problems typically uses point sets as stimuli, and most models have proposed a processing stage at which stimulus dots are clustered. However, few empirical studies have investigated the effects of clustering on performance. In one recent study, researchers compared the effects of clustered, random, and regular stimuli, and concluded that clustering facilitates performance (Dry, Preiss, & Wagemans, 2012). Another study suggested that these results may have been influenced by the location rather than the degree of clustering (MacGregor, 2013). Two experiments are reported that mark an attempt to disentangle these factors. The first experiment tested several combinations of degree of clustering and cluster location, and revealed mixed evidence that clustering influences performance. In a second experiment, both factors were varied independently, showing that they interact. The results are discussed in terms of the importance of clustering effects, in particular, and perceptual factors, in general, during performance of the traveling salesman problem.

  8. ICM: a web server for integrated clustering of multi-dimensional biomedical data.

    PubMed

    He, Song; He, Haochen; Xu, Wenjian; Huang, Xin; Jiang, Shuai; Li, Fei; He, Fuchu; Bo, Xiaochen

    2016-07-08

    Large-scale efforts for parallel acquisition of multi-omics profiling continue to generate extensive amounts of multi-dimensional biomedical data. Thus, integrated clustering of multiple types of omics data is essential for developing individual-based treatments and precision medicine. However, while rapid progress has been made, methods for integrated clustering lack an intuitive web interface that serves biomedical researchers without extensive programming skills. Here, we present a web tool, named Integrated Clustering of Multi-dimensional biomedical data (ICM), that provides an interface from which to fuse, cluster and visualize multi-dimensional biomedical data and knowledge. With ICM, users can explore the heterogeneity of a disease or a biological process by identifying subgroups of patients. The results obtained can then be interactively modified through an intuitive user interface. Researchers can also exchange results from ICM with collaborators via a web link containing a Project ID number that directly pulls up the analysis results being shared. ICM also supports incremental clustering, allowing users to add new sample data to the data of a previous study and obtain an updated clustering result. Currently, the ICM web server is available with no login requirement and at no cost at http://biotech.bmi.ac.cn/icm/. © The Author(s) 2016. Published by Oxford University Press on behalf of Nucleic Acids Research.

  9. Computational genetic neuroanatomy of the developing mouse brain: dimensionality reduction, visualization, and clustering

    PubMed Central

    2013-01-01

    Background The structured organization of cells in the brain plays a key role in its functional efficiency. This delicate organization is the consequence of unique molecular identity of each cell gradually established by precise spatiotemporal gene expression control during development. Currently, studies on the molecular-structural association are beginning to reveal how the spatiotemporal gene expression patterns are related to cellular differentiation and structural development. Results In this article, we aim at a global, data-driven study of the relationship between gene expressions and neuroanatomy in the developing mouse brain. To enable visual explorations of the high-dimensional data, we map the in situ hybridization gene expression data to a two-dimensional space by preserving both the global and the local structures. Our results show that the developing brain anatomy is largely preserved in the reduced gene expression space. To provide a quantitative analysis, we cluster the reduced data into groups and measure the consistency with neuroanatomy at multiple levels. Our results show that the clusters in the low-dimensional space are more consistent with neuroanatomy than those in the original space. Conclusions Gene expression patterns and developing brain anatomy are closely related. Dimensionality reduction and visual exploration facilitate the study of this relationship. PMID:23845024
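
    A compact stand-in for the analysis pipeline (the paper uses a structure-preserving embedding of in situ hybridization data; PCA and synthetic "anatomy" labels are assumptions here): embed, cluster in both spaces, and score consistency with anatomy.

    ```python
    import numpy as np
    from sklearn.decomposition import PCA
    from sklearn.cluster import KMeans
    from sklearn.metrics import adjusted_rand_score

    rng = np.random.default_rng(0)
    anatomy = np.repeat(np.arange(4), 250)            # 4 hypothetical regions
    X = rng.normal(size=(1000, 200)) + anatomy[:, None]

    emb = PCA(n_components=2).fit_transform(X)        # 2-D embedding
    lab_hi = KMeans(n_clusters=4, n_init=10, random_state=0).fit_predict(X)
    lab_lo = KMeans(n_clusters=4, n_init=10, random_state=0).fit_predict(emb)
    print(adjusted_rand_score(anatomy, lab_hi),       # consistency with
          adjusted_rand_score(anatomy, lab_lo))       # anatomy, both spaces
    ```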

  10. Computational genetic neuroanatomy of the developing mouse brain: dimensionality reduction, visualization, and clustering.

    PubMed

    Ji, Shuiwang

    2013-07-11

    The structured organization of cells in the brain plays a key role in its functional efficiency. This delicate organization is the consequence of unique molecular identity of each cell gradually established by precise spatiotemporal gene expression control during development. Currently, studies on the molecular-structural association are beginning to reveal how the spatiotemporal gene expression patterns are related to cellular differentiation and structural development. In this article, we aim at a global, data-driven study of the relationship between gene expressions and neuroanatomy in the developing mouse brain. To enable visual explorations of the high-dimensional data, we map the in situ hybridization gene expression data to a two-dimensional space by preserving both the global and the local structures. Our results show that the developing brain anatomy is largely preserved in the reduced gene expression space. To provide a quantitative analysis, we cluster the reduced data into groups and measure the consistency with neuroanatomy at multiple levels. Our results show that the clusters in the low-dimensional space are more consistent with neuroanatomy than those in the original space. Gene expression patterns and developing brain anatomy are closely related. Dimensionality reduction and visual exploration facilitate the study of this relationship.

  11. High-dimensional vector semantics

    NASA Astrophysics Data System (ADS)

    Andrecut, M.

    In this paper we explore the “vector semantics” problem from the perspective of “almost orthogonal” property of high-dimensional random vectors. We show that this intriguing property can be used to “memorize” random vectors by simply adding them, and we provide an efficient probabilistic solution to the set membership problem. Also, we discuss several applications to word context vector embeddings, document sentences similarity, and spam filtering.
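
    The "memorize by adding" trick is easy to reproduce: random high-dimensional ±1 vectors are almost orthogonal, so a member of a stored sum has a clearly nonzero normalized dot product with it while a non-member scores near zero. The vocabulary and dimensionality below are arbitrary choices.

    ```python
    import numpy as np

    rng = np.random.default_rng(0)
    D = 10_000
    words = {w: rng.choice([-1, 1], size=D)
             for w in ["cluster", "vector", "memory", "spam", "filter"]}

    # "Memorize" a set by summing its member vectors
    memory = words["cluster"] + words["vector"] + words["memory"]

    # Membership test: members score near 1/sqrt(3), non-members near 0
    for w, v in words.items():
        score = v @ memory / (np.linalg.norm(v) * np.linalg.norm(memory))
        print(f"{w:8s} {score:+.3f}")
    ```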

  12. Metal-superconductor transition in low-dimensional superconducting clusters embedded in two-dimensional electron systems

    NASA Astrophysics Data System (ADS)

    Bucheli, D.; Caprara, S.; Castellani, C.; Grilli, M.

    2013-02-01

    Motivated by recent experimental data on thin film superconductors and oxide interfaces, we propose a random-resistor network apt to describe the occurrence of a metal-superconductor transition in a two-dimensional electron system with disorder on the mesoscopic scale. We consider low-dimensional (e.g. filamentary) structures of a superconducting cluster embedded in the two-dimensional network and we explore the separate effects and the interplay of the superconducting structure and of the statistical distribution of local critical temperatures. The thermal evolution of the resistivity is determined by a numerical calculation of the random-resistor network and, for comparison, a mean-field approach called effective medium theory (EMT). Our calculations reveal the relevance of the distribution of critical temperatures for clusters with low connectivity. In addition, we show that the presence of spatial correlations requires a modification of standard EMT to give qualitative agreement with the numerical results. Applying the present approach to an LaTiO3/SrTiO3 oxide interface, we find that the measured resistivity curves are compatible with a network of spatially dense but loosely connected superconducting islands.

  13. HSTLBO: A hybrid algorithm based on Harmony Search and Teaching-Learning-Based Optimization for complex high-dimensional optimization problems.

    PubMed

    Tuo, Shouheng; Yong, Longquan; Deng, Fang'an; Li, Yanhai; Lin, Yong; Lu, Qiuju

    2017-01-01

    Harmony Search (HS) and Teaching-Learning-Based Optimization (TLBO), as relatively new swarm-intelligence optimization algorithms, have received much attention in recent years. Both have shown outstanding performance in solving NP-hard optimization problems, but they also suffer dramatic performance degradation on some complex high-dimensional optimization problems. Through extensive experiments, we find that HS and TLBO are strongly complementary: HS has strong global exploration power but low convergence speed, whereas TLBO converges much faster but is easily trapped in local search. In this work, we propose a hybrid search algorithm named HSTLBO that merges the two algorithms for synergistically solving complex optimization problems using a self-adaptive selection strategy. In HSTLBO, both HS and TLBO are modified with the aim of balancing global exploration and exploitation: HS mainly explores unknown regions, while TLBO rapidly exploits high-precision solutions in known regions. Our experimental results demonstrate better performance and faster speed than five state-of-the-art HS variants, and better exploration power than five good TLBO variants with similar run time, which illustrates that our method is promising for solving complex high-dimensional optimization problems. Experiments on portfolio optimization problems also demonstrate that HSTLBO is effective in solving complex real-world applications.
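
    One half of the hybrid, the TLBO component, is compact enough to sketch; the minimization target and all parameters below are illustrative, and the HS component plus the self-adaptive switching of HSTLBO are omitted.

    ```python
    import numpy as np

    def sphere(X):                           # toy objective to minimize
        return np.sum(X ** 2, axis=1)

    rng = np.random.default_rng(0)
    n, d = 30, 20
    X = rng.uniform(-5, 5, size=(n, d))      # the class of learners

    for it in range(200):
        f = sphere(X)
        teacher = X[np.argmin(f)]
        TF = rng.integers(1, 3)              # teaching factor: 1 or 2
        # Teacher phase: move toward the teacher, away from the class mean
        Xt = X + rng.random((n, d)) * (teacher - TF * X.mean(axis=0))
        better = sphere(Xt) < f
        X[better] = Xt[better]
        # Learner phase: learn from a randomly paired peer
        f = sphere(X)
        j = rng.permutation(n)
        step = np.where((f < f[j])[:, None], X - X[j], X[j] - X)
        Xl = X + rng.random((n, d)) * step
        better = sphere(Xl) < f
        X[better] = Xl[better]

    print(sphere(X).min())                   # best value found
    ```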

  14. High-Performance Computing and Four-Dimensional Data Assimilation: The Impact on Future and Current Problems

    NASA Technical Reports Server (NTRS)

    Makivic, Miloje S.

    1996-01-01

    This is the final technical report for the project entitled: "High-Performance Computing and Four-Dimensional Data Assimilation: The Impact on Future and Current Problems", funded at NPAC by the DAO at NASA/GSFC. First, the motivation for the project is given in the introductory section, followed by the executive summary of major accomplishments and the list of project-related publications. Detailed analysis and description of research results is given in subsequent chapters and in the Appendix.

  15. Clustering cancer gene expression data by projective clustering ensemble

    PubMed Central

    Yu, Xianxue; Yu, Guoxian

    2017-01-01

    Gene expression data analysis has paramount implications for gene treatments, cancer diagnosis, and other domains. Clustering is an important and promising tool for analyzing gene expression data. Gene expression data are often characterized by a large number of genes but limited samples; thus, various projective clustering techniques and ensemble techniques have been suggested to combat these challenges. However, it is rather challenging to synergize these two kinds of techniques to avoid the curse of dimensionality and to boost the performance of gene expression data clustering. In this paper, we employ a projective clustering ensemble (PCE) to integrate the advantages of projective clustering and ensemble clustering, and to avoid the dilemma of combining multiple projective clusterings. Our experimental results on publicly available cancer gene expression data show that PCE improves the quality of clustering gene expression data by at least 4.5% (on average) over other related techniques, including dimensionality-reduction-based single clustering and ensemble approaches. The empirical study demonstrates that, to further boost the performance of clustering cancer gene expression data, it is necessary and promising to synergize projective clustering with ensemble clustering. PCE can serve as an effective alternative technique for clustering gene expression data. PMID:28234920

  16. Prescribed nanoparticle cluster architectures and low-dimensional arrays built using octahedral DNA origami frames

    NASA Astrophysics Data System (ADS)

    Tian, Ye; Wang, Tong; Liu, Wenyan; Xin, Huolin L.; Li, Huilin; Ke, Yonggang; Shih, William M.; Gang, Oleg

    2015-07-01

    Three-dimensional mesoscale clusters that are formed from nanoparticles spatially arranged in pre-determined positions can be thought of as mesoscale analogues of molecules. These nanoparticle architectures could offer tailored properties due to collective effects, but developing a general platform for fabricating such clusters is a significant challenge. Here, we report a strategy for assembling three-dimensional nanoparticle clusters that uses a molecular frame designed with encoded vertices for particle placement. The frame is a DNA origami octahedron and can be used to fabricate clusters with various symmetries and particle compositions. Cryo-electron microscopy is used to uncover the structure of the DNA frame and to reveal that the nanoparticles are spatially coordinated in the prescribed manner. We show that the DNA frame and one set of nanoparticles can be used to create nanoclusters with different chiroptical activities. We also show that the octahedra can serve as programmable interparticle linkers, allowing one- and two-dimensional arrays to be assembled with designed particle arrangements.

  17. HSTLBO: A hybrid algorithm based on Harmony Search and Teaching-Learning-Based Optimization for complex high-dimensional optimization problems

    PubMed Central

    Tuo, Shouheng; Yong, Longquan; Deng, Fang’an; Li, Yanhai; Lin, Yong; Lu, Qiuju

    2017-01-01

    Harmony Search (HS) and Teaching-Learning-Based Optimization (TLBO), as relatively new swarm-intelligence optimization algorithms, have received much attention in recent years. Both have shown outstanding performance in solving NP-hard optimization problems, but they also suffer dramatic performance degradation on some complex high-dimensional optimization problems. Through extensive experiments, we find that HS and TLBO are strongly complementary: HS has strong global exploration power but low convergence speed, whereas TLBO converges much faster but is easily trapped in local search. In this work, we propose a hybrid search algorithm named HSTLBO that merges the two algorithms for synergistically solving complex optimization problems using a self-adaptive selection strategy. In HSTLBO, both HS and TLBO are modified with the aim of balancing global exploration and exploitation: HS mainly explores unknown regions, while TLBO rapidly exploits high-precision solutions in known regions. Our experimental results demonstrate better performance and faster speed than five state-of-the-art HS variants, and better exploration power than five good TLBO variants with similar run time, which illustrates that our method is promising for solving complex high-dimensional optimization problems. Experiments on portfolio optimization problems also demonstrate that HSTLBO is effective in solving complex real-world applications. PMID:28403224

  18. Approximate cluster analysis method and three-dimensional diagram of optical characteristics of lunar surface

    NASA Astrophysics Data System (ADS)

    Yevsyukov, N. N.

    1985-09-01

    An approximate algorithm for the isolation of multidimensional clusters is developed and applied in the construction of a three-dimensional diagram of the optical characteristics of the lunar surface. The method is somewhat analogous to that of Koontz and Fukunaga (1972) and involves isolating two-dimensional clusters, adding a new characteristic, and linearizing, a cycle that is repeated a limited number of times. The lunar-surface parameters analyzed are the 620-nm albedo, the 620/380-nm color index, and the 950/620-nm index. The results are presented graphically; the reliability of the cluster-isolation process is discussed; and some correspondences between known lunar morphology and the cluster maps are indicated.

  19. Single exposure three-dimensional imaging of dusty plasma clusters.

    PubMed

    Hartmann, Peter; Donkó, István; Donkó, Zoltán

    2013-02-01

    We have worked out the details of a single camera, single exposure method to perform three-dimensional imaging of a finite particle cluster. The procedure is based on the plenoptic imaging principle and utilizes a commercial Lytro light field still camera. We demonstrate the capabilities of our technique on a single layer particle cluster in a dusty plasma, where the camera is aligned and inclined at a small angle to the particle layer. The reconstruction of the third coordinate (depth) is found to be accurate and even shadowing particles can be identified.

  20. Semi-Supervised Clustering for High-Dimensional and Sparse Features

    ERIC Educational Resources Information Center

    Yan, Su

    2010-01-01

    Clustering is one of the most common data mining tasks, used frequently for data organization and analysis in various application domains. Traditional machine learning approaches to clustering are fully automated and unsupervised where class labels are unknown a priori. In real application domains, however, some "weak" form of side…

  1. The Ordered Clustered Travelling Salesman Problem: A Hybrid Genetic Algorithm

    PubMed Central

    Ahmed, Zakir Hussain

    2014-01-01

    The ordered clustered travelling salesman problem is a variation of the usual travelling salesman problem in which a set of vertices (except the starting vertex) of the network is divided into some prespecified clusters. The objective is to find the least cost Hamiltonian tour in which vertices of any cluster are visited contiguously and the clusters are visited in the prespecified order. The problem is NP-hard, and it arises in practical transportation and sequencing problems. This paper develops a hybrid genetic algorithm using sequential constructive crossover, 2-opt search, and a local search for obtaining a heuristic solution to the problem. The efficiency of the algorithm has been examined against two existing algorithms for some asymmetric and symmetric TSPLIB instances of various sizes. The computational results show that the proposed algorithm is very effective in terms of solution quality and computational time. Finally, we present solutions to some more symmetric TSPLIB instances. PMID:24701148
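
    Not the paper's hybrid GA, but a useful baseline it could seed: a greedy construction that respects the prescribed cluster order and picks the nearest unvisited vertex within the current cluster. Points and cluster assignments below are hypothetical.

    ```python
    import numpy as np

    def octsp_greedy(points, clusters, start):
        """Greedy tour for the ordered clustered TSP: visit clusters in the
        given order, nearest unvisited vertex first, then return to start."""
        tour, cur = [start], points[start]
        for cl in clusters:                  # prescribed cluster order
            todo = set(cl)
            while todo:
                nxt = min(todo, key=lambda v: np.linalg.norm(points[v] - cur))
                todo.remove(nxt)
                tour.append(nxt)
                cur = points[nxt]
        tour.append(start)
        length = sum(np.linalg.norm(points[a] - points[b])
                     for a, b in zip(tour, tour[1:]))
        return tour, length

    rng = np.random.default_rng(0)
    pts = rng.uniform(size=(10, 2))          # vertex 0 is the start
    tour, length = octsp_greedy(pts, [[1, 2, 3], [4, 5, 6], [7, 8, 9]], 0)
    print(tour, round(length, 3))
    ```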

  2. A cluster-analytic study of substance problems and mental health among street youths.

    PubMed

    Adlaf, E M; Zdanowicz, Y M

    1999-11-01

    Based on a cluster analysis of 211 street youths aged 13-24 years interviewed in 1992 in Toronto, Ontario, Canada, we describe the configuration of mental health and substance use outcomes. Eight clusters were suggested: Entrepreneurs (n = 19) were frequently involved in delinquent activity and were highly entrenched in the street lifestyle; Drifters (n = 35) had infrequent social contact, displayed lower than average family dysfunction, and were not highly entrenched in the street lifestyle; Partiers (n = 40) were distinguished by their recreational motivation for alcohol and drug use and their below average entrenchment in the street lifestyle; Retreatists (n = 32) were distinguished by their high coping motivation for substance use; Fringers (n = 48) were involved marginally in the street lifestyle and showed lower than average family dysfunction; Transcenders (n = 21), despite above average physical and sexual abuse, reported below average mental health or substance use problems; Vulnerables (n = 12) were characterized by high family dysfunction (including physical and sexual abuse), elevated mental health outcomes, and use of alcohol and other drugs motivated by coping and escapism; Sex Workers (n = 4) were highly entrenched in the street lifestyle and reported frequent commercial sexual work, above average sexual abuse, and extensive use of crack cocaine. The results showed that distress, self-esteem, psychotic thoughts, attempted suicide, alcohol problems, drug problems, dual substance problems, and dual disorders varied significantly among the eight clusters. Overall, the findings suggest the need for differential programming. The data showed that risk factors, mental health, and substance use outcomes vary among this population. Also, for some the web of mental health and substance use problems is inseparable.

  3. Supporting Dynamic Quantization for High-Dimensional Data Analytics.

    PubMed

    Guzun, Gheorghi; Canahuate, Guadalupe

    2017-05-01

    Similarity searches are at the heart of exploratory data analysis tasks. Distance metrics are typically used to characterize the similarity between data objects represented as feature vectors. However, when the dimensionality of the data increases and the number of features is large, traditional distance metrics fail to distinguish between the closest and furthest data points. Localized distance functions have been proposed as an alternative to traditional distance metrics. These functions only consider dimensions close to the query to compute the distance/similarity. Furthermore, in order to enable interactive exploration of high-dimensional data, indexing support for ad-hoc queries is needed. In this work we set out to investigate whether bit-sliced indices can be used for exploratory analytics such as similarity searches and data clustering for high-dimensional big data. We also propose a novel dynamic quantization called Query-dependent Equi-Depth (QED) quantization and show its effectiveness in characterizing high-dimensional similarity. When applying QED we observe improvements in kNN classification accuracy over traditional distance functions.
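
    As background for the quantization component (QED itself makes the bin boundaries query-dependent, which is not shown here), static equi-depth quantization places per-dimension bin edges at quantiles so every bin holds roughly the same number of points:

    ```python
    import numpy as np

    rng = np.random.default_rng(0)
    X = rng.lognormal(size=(10_000, 8))      # skewed high-dimensional features

    B = 16                                   # bins per dimension
    edges = np.quantile(X, np.linspace(0, 1, B + 1)[1:-1], axis=0)  # (B-1, d)
    codes = np.array([np.searchsorted(edges[:, j], X[:, j])
                      for j in range(X.shape[1])]).T
    print(np.bincount(codes[:, 0], minlength=B))   # near-uniform occupancy
    ```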

  4. NP-hardness of the cluster minimization problem revisited

    NASA Astrophysics Data System (ADS)

    Adib, Artur B.

    2005-10-01

    The computational complexity of the 'cluster minimization problem' is revisited (Wille and Vennik 1985 J. Phys. A: Math. Gen. 18 L419). It is argued that the original NP-hardness proof does not apply to pairwise potentials of physical interest, such as those that depend on the geometric distance between the particles. A geometric analogue of the original problem is formulated, and a new proof for such potentials is provided by polynomial time transformation from the independent set problem for unit disk graphs. Limitations of this formulation are pointed out, and new subproblems that bear more direct consequences to the numerical study of clusters are suggested.

  5. Penalized gaussian process regression and classification for high-dimensional nonlinear data.

    PubMed

    Yi, G; Shi, J Q; Choi, T

    2011-12-01

    The model based on a Gaussian process (GP) prior and a kernel covariance function can be used to fit nonlinear data with multidimensional covariates. It has been used as a flexible nonparametric approach for curve fitting, classification, clustering, and other statistical problems, and has been widely applied to complex nonlinear systems in many different areas, particularly in machine learning. However, fitting such a model is challenging for large-scale and high-dimensional data, for example, the meat data discussed in this article, which have 100 highly correlated covariates. For such data the model suffers from large variance in parameter estimation and high predictive error, and the computation is numerically unstable. In this article, a penalized likelihood framework is applied to the GP-based model. Different penalties are investigated, and their suitability for the characteristics of GP models is discussed. The asymptotic properties are also discussed, with the relevant proofs. Several applications to real biomechanical and bioinformatics data sets are reported. © 2011, The International Biometric Society. No claim to original US government works.
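
    A plain (unpenalized) GP fit gives the baseline the paper starts from; with an anisotropic RBF kernel, maximizing the marginal likelihood already performs a soft form of variable relevance determination, which penalization then makes explicit. Data and kernel settings below are assumptions.

    ```python
    import numpy as np
    from sklearn.gaussian_process import GaussianProcessRegressor
    from sklearn.gaussian_process.kernels import RBF, WhiteKernel

    rng = np.random.default_rng(0)
    n, d = 80, 10                       # small n, several correlated covariates
    X = rng.normal(size=(n, d))
    y = np.sin(X[:, 0]) + 0.2 * rng.normal(size=n)   # one relevant covariate

    # One length-scale per covariate (ARD-style): irrelevant dimensions are
    # driven to long length-scales during marginal-likelihood optimization
    kernel = RBF(length_scale=np.ones(d)) + WhiteKernel(noise_level=0.1)
    gp = GaussianProcessRegressor(kernel=kernel, normalize_y=True).fit(X, y)
    print(gp.kernel_.k1.length_scale.round(2))       # relevant dim stays short
    ```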

  6. A numerical algorithm for optimal feedback gains in high dimensional LQR problems

    NASA Technical Reports Server (NTRS)

    Banks, H. T.; Ito, K.

    1986-01-01

    A hybrid method for computing the feedback gains in linear quadratic regulator problems is proposed. The method, which combines the use of a Chandrasekhar type system with an iteration of the Newton-Kleinman form with variable acceleration parameter Smith schemes, is formulated so as to efficiently compute the feedback gains directly rather than solutions of an associated Riccati equation. The hybrid method is particularly appropriate when used with large dimensional systems such as those arising in approximating infinite dimensional (distributed parameter) control systems (e.g., those governed by delay-differential and partial differential equations). Computational advantages of the proposed algorithm over the standard eigenvector (Potter, Laub-Schur) based techniques are discussed, and numerical evidence of the efficacy of our ideas is presented.
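
    For contrast with the direct-gain approach, the textbook route goes through the algebraic Riccati equation: solve for P, then form K = R^{-1} B^T P. The sketch below uses SciPy on a random, roughly stable system (all matrices assumed for illustration); the paper's point is precisely that this route becomes costly as the state dimension grows.

    ```python
    import numpy as np
    from scipy.linalg import solve_continuous_are

    rng = np.random.default_rng(0)
    n, m = 50, 2
    A = rng.normal(size=(n, n)) / np.sqrt(n) - np.eye(n)   # stable-ish drift
    B = rng.normal(size=(n, m))
    Q, R = np.eye(n), np.eye(m)

    P = solve_continuous_are(A, B, Q, R)
    K = np.linalg.solve(R, B.T @ P)          # feedback gain, u = -K x
    # Closed-loop spectrum should sit in the left half-plane
    print(np.max(np.linalg.eigvals(A - B @ K).real))
    ```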

  7. Scalable Nearest Neighbor Algorithms for High Dimensional Data.

    PubMed

    Muja, Marius; Lowe, David G

    2014-11-01

    For many computer vision and machine learning problems, large training sets are key for good performance. However, the most computationally expensive part of many computer vision and machine learning algorithms consists of finding nearest neighbor matches to high dimensional vectors that represent the training data. We propose new algorithms for approximate nearest neighbor matching and evaluate and compare them with previous algorithms. For matching high dimensional features, we find two algorithms to be the most efficient: the randomized k-d forest and a new algorithm proposed in this paper, the priority search k-means tree. We also propose a new algorithm for matching binary features by searching multiple hierarchical clustering trees and show it outperforms methods typically used in the literature. We show that the optimal nearest neighbor algorithm and its parameters depend on the data set characteristics and describe an automated configuration procedure for finding the best algorithm to search a particular data set. In order to scale to very large data sets that would otherwise not fit in the memory of a single machine, we propose a distributed nearest neighbor matching framework that can be used with any of the algorithms described in the paper. All this research has been released as an open source library called fast library for approximate nearest neighbors (FLANN), which has been incorporated into OpenCV and is now one of the most popular libraries for nearest neighbor matching.
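
    FLANN itself is a C++ library with bindings (including via OpenCV); as a library-agnostic sketch of the nearest-neighbor task it accelerates, here is an exact k-d tree search with scikit-learn on assumed random features. In high dimensions exact trees degrade, which is exactly where FLANN's randomized k-d forests and priority search k-means trees earn their keep.

    ```python
    import numpy as np
    from sklearn.neighbors import NearestNeighbors

    rng = np.random.default_rng(0)
    data = rng.normal(size=(50_000, 16)).astype(np.float32)   # feature vectors
    queries = rng.normal(size=(5, 16)).astype(np.float32)

    nn = NearestNeighbors(n_neighbors=3, algorithm="kd_tree").fit(data)
    dist, idx = nn.kneighbors(queries)       # exact 3-NN for each query
    print(idx)
    ```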

  8. The GALAH survey: chemical tagging of star clusters and new members in the Pleiades

    NASA Astrophysics Data System (ADS)

    Kos, Janez; Bland-Hawthorn, Joss; Freeman, Ken; Buder, Sven; Traven, Gregor; De Silva, Gayandhi M.; Sharma, Sanjib; Asplund, Martin; Duong, Ly; Lin, Jane; Lind, Karin; Martell, Sarah; Simpson, Jeffrey D.; Stello, Dennis; Zucker, Daniel B.; Zwitter, Tomaž; Anguiano, Borja; Da Costa, Gary; D'Orazi, Valentina; Horner, Jonathan; Kafle, Prajwal R.; Lewis, Geraint; Munari, Ulisse; Nataf, David M.; Ness, Melissa; Reid, Warren; Schlesinger, Katie; Ting, Yuan-Sen; Wyse, Rosemary

    2018-02-01

    The technique of chemical tagging uses the elemental abundances of stellar atmospheres to 'reconstruct' chemically homogeneous star clusters that have long since dispersed. The GALAH spectroscopic survey - which aims to observe one million stars using the Anglo-Australian Telescope - allows us to measure up to 30 elements or dimensions in the stellar chemical abundance space, many of which are not independent. How to find clustering reliably in a noisy high-dimensional space is a difficult problem that remains largely unsolved. Here, we explore t-distributed stochastic neighbour embedding (t-SNE) - which identifies an optimal mapping of a high-dimensional space into fewer dimensions - whilst conserving the original clustering information. Typically, the projection is made to a 2D space to aid recognition of clusters by eye. We show that this method is a reliable tool for chemical tagging because it can: (i) resolve clustering in chemical space alone, (ii) recover known open and globular clusters with high efficiency and low contamination, and (iii) relate field stars to known clusters. t-SNE also provides a useful visualization of a high-dimensional space. We demonstrate the method on a data set of 13 abundances measured in the spectra of 187 000 stars by the GALAH survey. We recover seven of the nine observed clusters (six globular and three open clusters) in chemical space with minimal contamination from field stars and low numbers of outliers. With chemical tagging, we also identify two Pleiades supercluster members (which we confirm kinematically), one as far as 6° (one tidal radius) away from the cluster centre.
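
    The embed-then-look-for-overdensities pattern is easy to reproduce on synthetic abundances (the cluster offsets, t-SNE settings and DBSCAN parameters below are illustrative assumptions, not GALAH values):

    ```python
    import numpy as np
    from sklearn.manifold import TSNE
    from sklearn.cluster import DBSCAN

    rng = np.random.default_rng(0)
    field = rng.normal(0.0, 1.0, size=(2000, 13))   # field-star abundances
    oc1 = rng.normal(+1.5, 0.1, size=(60, 13))      # chemically homogeneous
    oc2 = rng.normal(-1.5, 0.1, size=(60, 13))      # "clusters"
    X = np.vstack([field, oc1, oc2])

    emb = TSNE(n_components=2, perplexity=30, random_state=0).fit_transform(X)
    labels = DBSCAN(eps=2.0, min_samples=10).fit_predict(emb)
    print(np.unique(labels, return_counts=True))    # -1 = unclustered field
    ```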

  9. Convex Clustering: An Attractive Alternative to Hierarchical Clustering

    PubMed Central

    Chen, Gary K.; Chi, Eric C.; Ranola, John Michael O.; Lange, Kenneth

    2015-01-01

    The primary goal in cluster analysis is to discover natural groupings of objects. The field of cluster analysis is crowded with diverse methods that make special assumptions about data and address different scientific aims. Despite its shortcomings in accuracy, hierarchical clustering is the dominant clustering method in bioinformatics. Biologists find the trees constructed by hierarchical clustering visually appealing and in tune with their evolutionary perspective. Hierarchical clustering operates on multiple scales simultaneously. This is essential, for instance, in transcriptome data, where one may be interested in making qualitative inferences about how lower-order relationships like gene modules lead to higher-order relationships like pathways or biological processes. The recently developed method of convex clustering preserves the visual appeal of hierarchical clustering while ameliorating its propensity to make false inferences in the presence of outliers and noise. The solution paths generated by convex clustering reveal relationships between clusters that are hidden by static methods such as k-means clustering. The current paper derives and tests a novel proximal distance algorithm for minimizing the objective function of convex clustering. The algorithm separates parameters, accommodates missing data, and supports prior information on relationships. Our program CONVEXCLUSTER incorporating the algorithm is implemented on ATI and nVidia graphics processing units (GPUs) for maximal speed. Several biological examples illustrate the strengths of convex clustering and the ability of the proximal distance algorithm to handle high-dimensional problems. CONVEXCLUSTER can be freely downloaded from the UCLA Human Genetics web site at http://www.genetics.ucla.edu/software/ PMID:25965340
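
    The convex clustering objective itself fits in a few lines; the sketch below minimizes the unweighted fused form by plain (sub)gradient descent, which is far cruder than the paper's proximal distance algorithm but shows how the fusion penalty gamma draws the cluster centroids u_i together. All settings are illustrative.

    ```python
    import numpy as np

    rng = np.random.default_rng(0)
    X = np.vstack([rng.normal(-2, 0.3, (15, 2)), rng.normal(2, 0.3, (15, 2))])
    gamma, lr = 0.05, 0.05
    U = X.copy()

    # Minimize 0.5*sum_i ||u_i - x_i||^2 + gamma * sum_{i<j} ||u_i - u_j||
    for it in range(2000):
        grad = U - X
        diff = U[:, None, :] - U[None, :, :]
        norm = np.linalg.norm(diff, axis=2, keepdims=True)
        norm[norm == 0] = 1.0                # avoid division by zero on i == j
        grad += gamma * np.sum(diff / norm, axis=1)
        U -= lr * grad

    # Centroids that have (nearly) fused indicate cluster membership
    print(np.round(U[:3], 2), np.round(U[-3:], 2))
    ```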

  10. Convex clustering: an attractive alternative to hierarchical clustering.

    PubMed

    Chen, Gary K; Chi, Eric C; Ranola, John Michael O; Lange, Kenneth

    2015-05-01

    The primary goal in cluster analysis is to discover natural groupings of objects. The field of cluster analysis is crowded with diverse methods that make special assumptions about data and address different scientific aims. Despite its shortcomings in accuracy, hierarchical clustering is the dominant clustering method in bioinformatics. Biologists find the trees constructed by hierarchical clustering visually appealing and in tune with their evolutionary perspective. Hierarchical clustering operates on multiple scales simultaneously. This is essential, for instance, in transcriptome data, where one may be interested in making qualitative inferences about how lower-order relationships like gene modules lead to higher-order relationships like pathways or biological processes. The recently developed method of convex clustering preserves the visual appeal of hierarchical clustering while ameliorating its propensity to make false inferences in the presence of outliers and noise. The solution paths generated by convex clustering reveal relationships between clusters that are hidden by static methods such as k-means clustering. The current paper derives and tests a novel proximal distance algorithm for minimizing the objective function of convex clustering. The algorithm separates parameters, accommodates missing data, and supports prior information on relationships. Our program CONVEXCLUSTER incorporating the algorithm is implemented on ATI and nVidia graphics processing units (GPUs) for maximal speed. Several biological examples illustrate the strengths of convex clustering and the ability of the proximal distance algorithm to handle high-dimensional problems. CONVEXCLUSTER can be freely downloaded from the UCLA Human Genetics web site at http://www.genetics.ucla.edu/software/.

  11. High- and low-level hierarchical classification algorithm based on source separation process

    NASA Astrophysics Data System (ADS)

    Loghmari, Mohamed Anis; Karray, Emna; Naceur, Mohamed Saber

    2016-10-01

    High-dimensional data applications have earned great attention in recent years. We focus on remote sensing data analysis in high-dimensional spaces, such as hyperspectral data. From a methodological viewpoint, remote sensing data analysis is not a trivial task. Its complexity is caused by many factors, such as large spectral or spatial variability as well as the curse of dimensionality, the latter describing the problem of data sparseness. In this ill-posed setting, a reliable classification approach requires appropriate modeling of the classification process. The proposed approach is based on a hierarchical clustering algorithm designed to deal with remote sensing data in high-dimensional space; one obvious way to perform dimensionality reduction is to use independent component analysis as a preprocessing step. The first particularity of our method is the special structure of its cluster tree. Most hierarchical algorithms associate leaves with individual clusters and start from a large number of individual classes equal to the number of pixels; in our approach, however, leaves are associated with the most relevant sources, represented along mutually independent axes, so that a limited number of clusters represent particular land covers. These sources contribute to the refinement of the clustering by providing complementary rather than redundant information. The second particularity of our approach is that at each level of the cluster tree we combine a high-level divisive clustering with a low-level agglomerative clustering. This reduces the computational cost, since the high-level divisive clustering is controlled by a simple Boolean operator, and optimizes the clustering results, since the low-level agglomerative clustering is guided by the most relevant independent sources. At each new step we then obtain a finer partition that will participate in the clustering process to enhance
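
    Stripped of the high/low-level tree construction, the skeleton of the approach (ICA as the source-separation step, then clustering on the recovered sources) looks as follows; the mixing model and cluster count are assumptions for illustration.

    ```python
    import numpy as np
    from sklearn.decomposition import FastICA
    from sklearn.cluster import AgglomerativeClustering

    rng = np.random.default_rng(0)
    S = rng.laplace(size=(3000, 3))          # 3 latent, non-Gaussian sources
    A = rng.normal(size=(3, 50))             # mixed into 50 "spectral bands"
    X = S @ A + 0.05 * rng.normal(size=(3000, 50))

    sources = FastICA(n_components=3, random_state=0).fit_transform(X)
    labels = AgglomerativeClustering(n_clusters=4).fit_predict(sources)
    print(np.bincount(labels))
    ```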

  12. Greedy subspace clustering.

    DOT National Transportation Integrated Search

    2016-09-01

    We consider the problem of subspace clustering: given points that lie on or near the union of many low-dimensional linear subspaces, recover the subspaces. To this end, one first identifies sets of points close to the same subspace and uses the sets ...

  13. Three-Dimensional Modeling of Fracture Clusters in Geothermal Reservoirs

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Ghassemi, Ahmad

    The objective of this project is to develop a 3-D numerical model for simulating mode I, II, and III (tensile, shear, and out-of-plane) propagation of multiple fractures and fracture clusters to accurately predict geothermal reservoir stimulation using the virtual multi-dimensional internal bond (VMIB). Effective development of enhanced geothermal systems can significantly benefit from improved modeling of hydraulic fracturing. In geothermal reservoirs, where the temperature can reach or exceed 350 °C, thermal and poro-mechanical processes play an important role in fracture initiation and propagation. In this project hydraulic fracturing of hot subsurface rock mass will be numerically modeled by extending the virtual multiple internal bond theory and implementing it in WARP3D, a three-dimensional finite element code for solid mechanics. The new constitutive model along with the poro-thermoelastic computational algorithms will allow modeling the initiation and propagation of clusters of fractures, and extension of pre-existing fractures. The work will enable the industry to realistically model stimulation of geothermal reservoirs. The project addresses the Geothermal Technologies Office objective of accurately predicting geothermal reservoir stimulation (GTO technology priority item). The project goal will be attained by: (i) development of the VMIB method for application to 3D analysis of fracture clusters; (ii) development of poro- and thermoelastic material sub-routines for use in the 3D finite element code WARP3D; (iii) implementation of VMIB and the new material routines in WARP3D to enable simulation of clusters of fractures while accounting for the effects of the pore pressure, thermal stress and inelastic deformation; (iv) simulation of 3D fracture propagation and coalescence and formation of clusters, and comparison with laboratory compression tests; and (v) application of the model to interpretation of injection experiments (planned by

  14. Approximation algorithm for the problem of partitioning a sequence into clusters

    NASA Astrophysics Data System (ADS)

    Kel'manov, A. V.; Mikhailova, L. V.; Khamidullin, S. A.; Khandeev, V. I.

    2017-08-01

    We consider the problem of partitioning a finite sequence of Euclidean points into a given number of clusters (subsequences) using the criterion of the minimal sum (over all clusters) of intracluster sums of squared distances from the elements of the clusters to their centers. It is assumed that the center of one of the desired clusters is at the origin, while the center of each of the other clusters is unknown and determined as the mean value over all elements in this cluster. Additionally, the partition obeys two structural constraints on the indices of sequence elements contained in the clusters with unknown centers: (1) the concatenation of the indices of elements in these clusters is an increasing sequence, and (2) the difference between an index and the preceding one is bounded above and below by prescribed constants. It is shown that this problem is strongly NP-hard. A 2-approximation algorithm is constructed that is polynomial-time for a fixed number of clusters.

  15. Data-driven cluster reinforcement and visualization in sparsely-matched self-organizing maps.

    PubMed

    Manukyan, Narine; Eppstein, Margaret J; Rizzo, Donna M

    2012-05-01

    A self-organizing map (SOM) is a self-organized projection of high-dimensional data onto a typically 2-dimensional (2-D) feature map, wherein vector similarity is implicitly translated into topological closeness in the 2-D projection. However, when there are more neurons than input patterns, it can be challenging to interpret the results, due to diffuse cluster boundaries and limitations of current methods for displaying interneuron distances. In this brief, we introduce a new cluster reinforcement (CR) phase for sparsely-matched SOMs. The CR phase amplifies within-cluster similarity in an unsupervised, data-driven manner. Discontinuities in the resulting map correspond to between-cluster distances and are stored in a boundary (B) matrix. We describe a new hierarchical visualization of cluster boundaries displayed directly on feature maps, which requires no further clustering beyond what was implicitly accomplished during self-organization in SOM training. We use a synthetic benchmark problem and previously published microbial community profile data to demonstrate the benefits of the proposed methods.
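
    A toy version of the setting (plain online SOM, then a simple boundary matrix from inter-neuron weight distances; the authors' CR phase and hierarchical boundary visualization are not reproduced here):

    ```python
    import numpy as np

    rng = np.random.default_rng(0)
    data = np.vstack([rng.normal(-1, 0.2, (200, 4)),
                      rng.normal(+1, 0.2, (200, 4))])

    gx = gy = 10                             # 100 neurons for 400 patterns
    W = rng.normal(size=(gx, gy, 4))
    grid = np.dstack(np.meshgrid(np.arange(gx), np.arange(gy), indexing="ij"))
    n_iter = 3000
    for t in range(n_iter):
        x = data[rng.integers(len(data))]
        bmu = np.unravel_index(np.argmin(np.linalg.norm(W - x, axis=2)),
                               (gx, gy))     # best-matching unit
        lr = 0.5 * (1 - t / n_iter)
        sig = 3.0 * (1 - t / n_iter) + 0.5
        h = np.exp(-np.sum((grid - np.array(bmu)) ** 2, axis=2) / (2 * sig**2))
        W += lr * h[:, :, None] * (x - W)

    # Boundary-matrix-like view: distances between neighboring neurons;
    # large entries mark cluster boundaries on the map
    b = (np.linalg.norm(np.diff(W, axis=0), axis=2)[:, :-1]
         + np.linalg.norm(np.diff(W, axis=1), axis=2)[:-1, :]) / 2
    print(np.round(b, 2))
    ```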

  16. The void spectrum in two-dimensional numerical simulations of gravitational clustering

    NASA Technical Reports Server (NTRS)

    Kauffmann, Guinevere; Melott, Adrian L.

    1992-01-01

    An algorithm for deriving a spectrum of void sizes from two-dimensional high-resolution numerical simulations of gravitational clustering is tested, and it is verified that it produces the correct results where those results can be anticipated. The method is used to study the growth of voids as clustering proceeds. It is found that the most stable indicator of the characteristic void 'size' in the simulations is the mean fractional area covered by voids of diameter d, in a density field smoothed at its correlation length. Very accurate scaling behavior is found in power-law numerical models as they evolve. Eventually, this scaling breaks down as the nonlinearity reaches larger scales. It is shown that this breakdown is a manifestation of the undesirable effect of boundary conditions on simulations, even with the very large dynamic range possible here. A simple criterion is suggested for deciding when simulations with modest large-scale power may systematically underestimate the frequency of larger voids.

  17. a Probabilistic Embedding Clustering Method for Urban Structure Detection

    NASA Astrophysics Data System (ADS)

    Lin, X.; Li, H.; Zhang, Y.; Gao, L.; Zhao, L.; Deng, M.

    2017-09-01

    Urban structure detection is a basic task in urban geography. Clustering is a core technology for detecting patterns in urban spatial structure, urban functional regions, and so on. In the big-data era, diverse urban sensing datasets recording information such as human behaviour and social activity suffer from both high dimensionality and high noise, and unfortunately the state-of-the-art clustering methods do not handle high dimension and high noise concurrently. In this paper, a probabilistic embedding clustering method is proposed. First, we introduce a Probabilistic Embedding Model (PEM) to learn latent features from high-dimensional urban sensing data. The latent features capture essential patterns hidden in the high-dimensional data, and the probabilistic model reduces the uncertainty caused by high noise. Second, by tuning the parameters, our model can discover two kinds of urban structure, homophily and structural equivalence, that is, communities with intensive interaction and communities playing the same roles in the urban structure. We evaluated the performance of our model on real-world data; experiments with data from Shanghai (China) showed that our method can discover both kinds of structure.

  18. Cluster ensemble based on Random Forests for genetic data.

    PubMed

    Alhusain, Luluah; Hafez, Alaaeldin M

    2017-01-01

    Clustering plays a crucial role in several application domains, such as bioinformatics. In bioinformatics, clustering has been extensively used as an approach for detecting interesting patterns in genetic data. One application is population structure analysis, which aims to group individuals into subpopulations based on shared genetic variations, such as single nucleotide polymorphisms. Advances in DNA sequencing technology have facilitated the acquisition of genetic datasets of exceptional size. Genetic data usually contain hundreds of thousands of genetic markers genotyped for thousands of individuals, making efficient means of handling such data desirable. Random Forests (RFs) has emerged as an efficient algorithm capable of handling high-dimensional data, and it provides a proximity measure that can capture different levels of co-occurring relationships between variables. RFs has been widely considered a supervised learning method, although it can be converted into an unsupervised learning method; therefore, an RF-derived proximity measure combined with a clustering technique may be well suited for determining the underlying structure of unlabeled data. This paper proposes RFcluE, a cluster ensemble approach for determining the underlying structure of genetic data based on RFs. The approach comprises a cluster ensemble framework to combine multiple runs of RF clustering. Experiments were conducted on a high-dimensional, real genetic dataset to evaluate the proposed approach. The experiments included an examination of the impact of parameter changes, comparing RFcluE performance against other clustering methods, and an assessment of the relationship between the diversity and quality of the ensemble and its effect on RFcluE performance. This paper proposes RFcluE, a cluster ensemble approach based on RF clustering, to address the problem of population structure analysis and demonstrates the effectiveness of the approach. The paper also illustrates that applying a
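
    The unsupervised-RF trick can be sketched directly: train a forest to separate real rows from column-permuted synthetic rows, define proximity as the fraction of trees in which two samples share a leaf, and cluster the resulting dissimilarity. This is a single RF run without the paper's ensemble layer, and the genotype matrix is simulated.

    ```python
    import numpy as np
    from sklearn.ensemble import RandomForestClassifier
    from scipy.cluster.hierarchy import linkage, fcluster

    rng = np.random.default_rng(0)
    X = np.vstack([rng.binomial(2, 0.2, (60, 200)),      # two simulated
                   rng.binomial(2, 0.7, (60, 200))]).astype(float)  # subpops

    # Synthetic class: permute each column independently, destroying structure
    X_syn = np.column_stack([rng.permutation(col) for col in X.T])
    Z = np.vstack([X, X_syn])
    y = np.r_[np.ones(len(X)), np.zeros(len(X_syn))]
    rf = RandomForestClassifier(n_estimators=300, random_state=0).fit(Z, y)

    # Proximity = fraction of trees where two real samples share a leaf
    leaves = rf.apply(X)                                 # (n_samples, n_trees)
    prox = np.mean(leaves[:, None, :] == leaves[None, :, :], axis=2)
    d = 1 - prox[np.triu_indices(len(X), 1)]             # condensed distances
    labels = fcluster(linkage(d, "average"), 2, criterion="maxclust")
    print(np.bincount(labels)[1:])                       # two subpopulations
    ```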

  19. MULTI-K: accurate classification of microarray subtypes using ensemble k-means clustering

    PubMed Central

    Kim, Eun-Youn; Kim, Seon-Young; Ashlock, Daniel; Nam, Dougu

    2009-01-01

    Background Uncovering subtypes of disease from microarray samples has important clinical implications such as survival time and sensitivity of individual patients to specific therapies. Unsupervised clustering methods have been used to classify this type of data. However, most existing methods focus on clusters with compact shapes and do not reflect the geometric complexity of the high dimensional microarray clusters, which limits their performance. Results We present a cluster-number-based ensemble clustering algorithm, called MULTI-K, for microarray sample classification, which demonstrates remarkable accuracy. The method amalgamates multiple k-means runs by varying the number of clusters and identifies clusters that manifest the most robust co-memberships of elements. In addition to the original algorithm, we devised an entropy plot to control the separation of singletons or small clusters. MULTI-K, unlike the simple k-means or other widely used methods, was able to capture clusters with complex and high-dimensional structures accurately. MULTI-K outperformed other methods including a recently developed ensemble clustering algorithm in tests with five simulated and eight real gene-expression data sets. Conclusion The geometric complexity of clusters should be taken into account for accurate classification of microarray data, and ensemble clustering applied to the number of clusters tackles the problem very well. The C++ code and the data sets tested are available from the authors. PMID:19698124
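
    The co-membership idea behind MULTI-K can be sketched as the generic consensus-clustering recipe: accumulate co-cluster counts over k-means runs with varying k, then extract the most robust groups from the co-association matrix. This is not MULTI-K's exact algorithm (the entropy-plot control is omitted), and all data and parameters below are illustrative:

        import numpy as np
        from sklearn.cluster import AgglomerativeClustering, KMeans
        from sklearn.datasets import make_blobs

        X, _ = make_blobs(n_samples=200, centers=4, random_state=0)
        n = len(X)

        # Co-membership counts accumulated over k-means runs with varying k.
        C = np.zeros((n, n))
        ks = range(2, 11)
        for k in ks:
            labels = KMeans(n_clusters=k, n_init=10, random_state=k).fit_predict(X)
            C += labels[:, None] == labels[None, :]
        C /= len(ks)

        # Robust co-memberships -> final partition ("metric" is "affinity"
        # on older scikit-learn versions).
        final = AgglomerativeClustering(
            n_clusters=4, metric="precomputed", linkage="average"
        ).fit_predict(1.0 - C)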

  20. Joint Adaptive Mean-Variance Regularization and Variance Stabilization of High Dimensional Data.

    PubMed

    Dazard, Jean-Eudes; Rao, J Sunil

    2012-07-01

    The paper addresses a common problem in the analysis of high-dimensional high-throughput "omics" data, which is parameter estimation across multiple variables in a set of data where the number of variables is much larger than the sample size. Among the problems posed by this type of data are that variable-specific estimators of variances are not reliable and variable-wise test statistics have low power, both due to a lack of degrees of freedom. In addition, it has been observed in this type of data that the variance increases as a function of the mean. We introduce a non-parametric adaptive regularization procedure that is innovative in that: (i) it employs a novel "similarity statistic"-based clustering technique to generate local-pooled or regularized shrinkage estimators of population parameters, (ii) the regularization is done jointly on population moments, benefiting from C. Stein's result on inadmissibility, which implies that the usual sample variance estimator is improved by a shrinkage estimator using information contained in the sample mean. From these joint regularized shrinkage estimators, we derive regularized t-like statistics and show in simulation studies that they offer more statistical power in hypothesis testing than their standard sample counterparts, or regular common-value shrinkage estimators, or when the information contained in the sample mean is simply ignored. Finally, we show that these estimators feature interesting properties of variance stabilization and normalization that can be used for preprocessing high-dimensional multivariate data. The method is available as an R package, called 'MVR' ('Mean-Variance Regularization'), downloadable from the CRAN website.
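
    To see why shrinking variances toward a pooled value buys power, here is a toy James-Stein-flavored sketch; a fixed shrinkage weight w is assumed for simplicity, whereas MVR pools data-adaptively and regularizes means and variances jointly:

        import numpy as np

        rng = np.random.default_rng(0)
        g1 = rng.normal(0.0, 1.0, size=(2000, 6))   # 2000 variables, 6 samples/group
        g2 = rng.normal(0.2, 1.0, size=(2000, 6))

        def shrunk_var(x, w=0.5):
            """Shrink each variable's sample variance toward the pooled variance."""
            s2 = x.var(axis=1, ddof=1)
            return w * s2 + (1.0 - w) * s2.mean()

        n1, n2 = g1.shape[1], g2.shape[1]
        se = np.sqrt(shrunk_var(g1) / n1 + shrunk_var(g2) / n2)
        t_like = (g1.mean(axis=1) - g2.mean(axis=1)) / se   # stabilized statistic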

  1. Ligand combination strategy for the preparation of novel low-dimensional and open-framework metal cluster materials

    NASA Astrophysics Data System (ADS)

    Anokhina, Ekaterina V.

    Low-dimensional and open-framework materials containing transition metals have a wide range of applications in redox catalysis, solid-state batteries, and electronic and magnetic devices. This dissertation reports on research carried out with the goal of developing a strategy for the preparation of low-dimensional and open-framework materials using octahedral metal clusters as building blocks. Our approach takes its roots from crystal engineering principles where the desired framework topologies are achieved through building block design. The key idea of this work is to induce directional bonding preferences in the cluster units using a combination of ligands with a large difference in charge density. This investigation led to the preparation and characterization of a new family of niobium oxychloride cluster compounds with original structure types exhibiting low-dimensional or open-framework character. Most of these materials have framework topologies unprecedented in compounds containing octahedral clusters. Comparative analysis of their structural features indicates that the novel cluster connectivity patterns in these systems are the result of complex interplay between the effects of anisotropic ligand arrangement in the cluster unit and optimization of ligand-counterion electrostatic interactions. The important role played by these factors sets niobium oxychloride systems apart from cluster compounds with one ligand type or statistical ligand distribution where the main structure-determining factor is the total number of ligands. These results provide a blueprint for expanding the ligand combination strategy to other transition metal cluster systems and for the future rational design of cluster-based materials.

  2. Inverse regression-based uncertainty quantification algorithms for high-dimensional models: Theory and practice

    NASA Astrophysics Data System (ADS)

    Li, Weixuan; Lin, Guang; Li, Bing

    2016-09-01

    Many uncertainty quantification (UQ) approaches suffer from the curse of dimensionality, that is, their computational costs become intractable for problems involving a large number of uncertainty parameters. In these situations, the classic Monte Carlo (MC) method often remains the method of choice because its convergence rate O(n^{-1/2}), where n is the required number of model simulations, does not depend on the dimension of the problem. However, many high-dimensional UQ problems are intrinsically low-dimensional, because the variation of the quantity of interest (QoI) is often caused by only a few latent parameters varying within a low-dimensional subspace, known as the sufficient dimension reduction (SDR) subspace in the statistics literature. Motivated by this observation, we propose two inverse regression-based UQ algorithms (IRUQ) for high-dimensional problems. Both algorithms use inverse regression to convert the original high-dimensional problem to a low-dimensional one, which is then efficiently solved by building a response surface for the reduced model, for example via the polynomial chaos expansion. The first algorithm, which is for the situations where an exact SDR subspace exists, is proved to converge at rate O(n^{-1}), hence much faster than MC. The second algorithm, which does not require an exact SDR, employs the reduced model as a control variate to reduce the error of the MC estimate. The accuracy gain could still be significant, depending on how well the reduced model approximates the original high-dimensional one. IRUQ also provides several additional practical advantages: it is non-intrusive; it does not require computing the high-dimensional gradient of the QoI; and it reports an error bar so the user knows how reliable the result is.
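
    A compact illustration of the pipeline under simplifying assumptions (a one-dimensional SDR subspace, synthetic data): a textbook sliced-inverse-regression implementation estimates the active direction, then a cheap polynomial response surface is fit in the reduced coordinate. This is not the authors' code:

        import numpy as np

        rng = np.random.default_rng(0)
        n, d = 2000, 20
        X = rng.normal(size=(n, d))
        y = np.sin(X[:, 0]) + 0.01 * rng.normal(size=n)   # 1-D active subspace

        # Sliced inverse regression: weighted covariance of slice means.
        Xs = (X - X.mean(0)) / X.std(0)
        slices = np.array_split(np.argsort(y), 10)
        means = np.vstack([Xs[s].mean(axis=0) for s in slices])
        p = np.array([len(s) for s in slices]) / n
        M = (means * p[:, None]).T @ means
        w = np.linalg.eigh(M)[1][:, -1]                   # leading SDR direction

        # Cheap response surface in the reduced coordinate.
        t = Xs @ w
        y_hat = np.polyval(np.polyfit(t, y, deg=7), t)
        print("R^2:", 1.0 - np.var(y - y_hat) / np.var(y))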

  3. Feature extraction and classification algorithms for high dimensional data

    NASA Technical Reports Server (NTRS)

    Lee, Chulhee; Landgrebe, David

    1993-01-01

    Feature extraction and classification algorithms for high dimensional data are investigated. Developments with regard to sensors for Earth observation are moving in the direction of providing much higher dimensional multispectral imagery than is now possible. In analyzing such high dimensional data, processing time becomes an important factor. With large increases in dimensionality and the number of classes, processing time will increase significantly. To address this problem, a multistage classification scheme is proposed which reduces the processing time substantially by eliminating unlikely classes from further consideration at each stage. Several truncation criteria are developed and the relationship between thresholds and the error caused by the truncation is investigated. Next an approach to feature extraction for classification is proposed based directly on the decision boundaries. It is shown that all the features needed for classification can be extracted from decision boundaries. A characteristic of the proposed method arises by noting that only a portion of the decision boundary is effective in discriminating between classes, and the concept of the effective decision boundary is introduced. The proposed feature extraction algorithm has several desirable properties: it predicts the minimum number of features necessary to achieve the same classification accuracy as in the original space for a given pattern recognition problem; and it finds the necessary feature vectors. The proposed algorithm does not deteriorate under the circumstances of equal means or equal covariances as some previous algorithms do. In addition, the decision boundary feature extraction algorithm can be used both for parametric and non-parametric classifiers. Finally, some problems encountered in analyzing high dimensional data are studied and possible solutions are proposed. First, the increased importance of the second order statistics in analyzing high dimensional data is recognized

  4. Solving time-dependent two-dimensional eddy current problems

    NASA Technical Reports Server (NTRS)

    Lee, Min Eig; Hariharan, S. I.; Ida, Nathan

    1988-01-01

    Results of transient eddy current calculations are reported. For simplicity, a two-dimensional transverse magnetic field which is incident on an infinitely long conductor is considered. The conductor is assumed to be a good but not perfect conductor. The resulting problem is an interface initial boundary value problem with the boundary of the conductor being the interface. A finite difference method is used to march the solution explicitly in time, and special consideration is given to the treatment of appropriate radiation conditions. Results are validated with approximate analytic solutions. Two stringent test cases of high and low frequency incident waves are considered to validate the results.

  5. Hypergraph-based anomaly detection of high-dimensional co-occurrences.

    PubMed

    Silva, Jorge; Willett, Rebecca

    2009-03-01

    This paper addresses the problem of detecting anomalous multivariate co-occurrences using a limited number of unlabeled training observations. A novel method based on using a hypergraph representation of the data is proposed to deal with this very high-dimensional problem. Hypergraphs constitute an important extension of graphs which allow edges to connect more than two vertices simultaneously. A variational Expectation-Maximization algorithm for detecting anomalies directly on the hypergraph domain without any feature selection or dimensionality reduction is presented. The resulting estimate can be used to calculate a measure of anomalousness based on the False Discovery Rate. The algorithm has O(np) computational complexity, where n is the number of training observations and p is the number of potential participants in each co-occurrence event. This efficiency makes the method ideally suited for very high-dimensional settings; moreover, it requires no tuning, bandwidth, or regularization parameters. The proposed approach is validated on both high-dimensional synthetic data and the Enron email database, where p > 75,000, and it is shown that it can outperform other state-of-the-art methods.

  6. The Effects of Cumulative Violence Clusters on Young Mothers' School Participation: Examining Attention and Behavior Problems as Mediators.

    PubMed

    Kennedy, Angie C; Adams, Adrienne E

    2016-04-01

    Using a cluster analysis approach with a sample of 205 young mothers recruited from community sites in an urban Midwestern setting, we examined the effects of cumulative violence exposure (community violence exposure, witnessing intimate partner violence, physical abuse by a caregiver, and sexual victimization, all with onset prior to age 13) on school participation, as mediated by attention and behavior problems in school. We identified five clusters of cumulative exposure, and found that the HiAll cluster (high levels of exposure to all four types) consistently fared the worst, with significantly higher attention and behavior problems, and lower school participation, in comparison with the LoAll cluster (low levels of exposure to all types). Behavior problems were a significant mediator of the effects of cumulative violence exposure on school participation, but attention problems were not.

  7. Constrained optimization by radial basis function interpolation for high-dimensional expensive black-box problems with infeasible initial points

    NASA Astrophysics Data System (ADS)

    Regis, Rommel G.

    2014-02-01

    This article develops two new algorithms for constrained expensive black-box optimization that use radial basis function surrogates for the objective and constraint functions. These algorithms are called COBRA and Extended ConstrLMSRBF and, unlike previous surrogate-based approaches, they can be used for high-dimensional problems where all initial points are infeasible. They both follow a two-phase approach where the first phase finds a feasible point while the second phase improves this feasible point. COBRA and Extended ConstrLMSRBF are compared with alternative methods on 20 test problems and on the MOPTA08 benchmark automotive problem (D.R. Jones, Presented at MOPTA 2008), which has 124 decision variables and 68 black-box inequality constraints. The alternatives include a sequential penalty derivative-free algorithm, a direct search method with kriging surrogates, and two multistart methods. Numerical results show that COBRA algorithms are competitive with Extended ConstrLMSRBF and they generally outperform the alternatives on the MOPTA08 problem and most of the test problems.
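
    The core surrogate machinery is easy to sketch with SciPy's RBFInterpolator: fit RBF models to the evaluated objective and constraint values, then search cheap candidate points on the surrogates to pick the next expensive evaluation. COBRA's distance requirements and two-phase logic are omitted; the functions and bounds below are mock stand-ins:

        import numpy as np
        from scipy.interpolate import RBFInterpolator

        rng = np.random.default_rng(0)
        f = lambda x: np.sum(x**2, axis=-1)        # mock expensive objective
        g = lambda x: 1.0 - np.sum(x, axis=-1)     # mock constraint, g(x) <= 0

        X = rng.uniform(-2.0, 2.0, size=(40, 5))   # points already evaluated
        f_surr = RBFInterpolator(X, f(X))
        g_surr = RBFInterpolator(X, g(X))

        # Candidate search on the cheap surrogates: best predicted objective
        # among candidates the constraint surrogate predicts feasible.
        cand = rng.uniform(-2.0, 2.0, size=(5000, 5))
        feas = g_surr(cand) <= 0.0
        x_next = cand[feas][np.argmin(f_surr(cand[feas]))]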

  8. Three-Dimensional Computer-Aided Detection of Microcalcification Clusters in Digital Breast Tomosynthesis.

    PubMed

    Jeong, Ji-Wook; Chae, Seung-Hoon; Chae, Eun Young; Kim, Hak Hee; Choi, Young-Wook; Lee, Sooyeul

    2016-01-01

    We propose a computer-aided detection (CADe) algorithm for microcalcification (MC) clusters in reconstructed digital breast tomosynthesis (DBT) images. The algorithm consists of prescreening, MC detection, clustering, and false-positive (FP) reduction steps. The DBT images containing the MC-like objects were enhanced by a multiscale Hessian-based three-dimensional (3D) objectness response function, and a connected-component segmentation method was applied to extract the cluster seed objects as potential clustering centers of MCs. In addition, a signal-to-noise ratio (SNR) enhanced image was generated to detect the individual MC candidates and prescreen the MC-like objects. Each cluster seed candidate was prescreened by counting the neighboring individual MC candidates near the cluster seed object according to several microcalcification clustering criteria. Next, we introduced bounding boxes for the accepted seed candidates, clustered all the overlapping cubes, and examined them. After the FP reduction step, the average number of FPs per case was estimated to be 2.47 per DBT volume with a sensitivity of 83.3%.

  9. Inverse regression-based uncertainty quantification algorithms for high-dimensional models: Theory and practice

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Li, Weixuan; Lin, Guang; Li, Bing

    2016-09-01

    A well-known challenge in uncertainty quantification (UQ) is the "curse of dimensionality". However, many high-dimensional UQ problems are essentially low-dimensional, because the randomness of the quantity of interest (QoI) is caused only by uncertain parameters varying within a low-dimensional subspace, known as the sufficient dimension reduction (SDR) subspace. Motivated by this observation, we propose and demonstrate in this paper an inverse regression-based UQ approach (IRUQ) for high-dimensional problems. Specifically, we use an inverse regression procedure to estimate the SDR subspace and then convert the original problem to a low-dimensional one, which can be efficiently solved by building a response surface model such as a polynomial chaos expansion. The novelty and advantages of the proposed approach are seen in its computational efficiency and practicality. Compared with Monte Carlo, traditionally the preferred approach for high-dimensional UQ, IRUQ with a comparable cost generally gives much more accurate solutions even for high-dimensional problems, and even when the dimension reduction is not exactly sufficient. Theoretically, IRUQ is proved to converge twice as fast as the approach it uses to seek the SDR subspace. For example, while a sliced inverse regression method converges to the SDR subspace at the rate of O(n^{-1/2}), the corresponding IRUQ converges at O(n^{-1}). IRUQ also provides several desired conveniences in practice. It is non-intrusive, requiring only a simulator to generate realizations of the QoI, and there is no need to compute the high-dimensional gradient of the QoI. Finally, error bars can be derived for the estimation results reported by IRUQ.

  10. Locating landmarks on high-dimensional free energy surfaces

    PubMed Central

    Chen, Ming; Yu, Tang-Qing; Tuckerman, Mark E.

    2015-01-01

    Coarse graining of complex systems possessing many degrees of freedom can often be a useful approach for analyzing and understanding key features of these systems in terms of just a few variables. The relevant energy landscape in a coarse-grained description is the free energy surface as a function of the coarse-grained variables, which, despite the dimensional reduction, can still be an object of high dimension. Consequently, navigating and exploring this high-dimensional free energy surface is a nontrivial task. In this paper, we use techniques from multiscale modeling, stochastic optimization, and machine learning to devise a strategy for locating minima and saddle points (termed “landmarks”) on a high-dimensional free energy surface “on the fly” and without requiring prior knowledge of or an explicit form for the surface. In addition, we propose a compact graph representation of the landmarks and connections between them, and we show that the graph nodes can be subsequently analyzed and clustered based on key attributes that elucidate important properties of the system. Finally, we show that knowledge of landmark locations allows for the efficient determination of their relative free energies via enhanced sampling techniques. PMID:25737545

  11. High-resolution two dimensional advective transport

    USGS Publications Warehouse

    Smith, P.E.; Larock, B.E.

    1989-01-01

    The paper describes a two-dimensional high-resolution scheme for advective transport that is based on a Eulerian-Lagrangian method with a flux limiter. The scheme is applied to the problem of pure-advection of a rotated Gaussian hill and shown to preserve the monotonicity property of the governing conservation law.

  12. Accelerating three-dimensional FDTD calculations on GPU clusters for electromagnetic field simulation.

    PubMed

    Nagaoka, Tomoaki; Watanabe, Soichi

    2012-01-01

    Electromagnetic simulation with an anatomically realistic computational human model using the finite-difference time-domain (FDTD) method has recently been performed in a number of fields in biomedical engineering. To improve the method's calculation speed and realize large-scale computing with the computational human model, we adapted three-dimensional FDTD code to a multi-GPU cluster environment with Compute Unified Device Architecture and Message Passing Interface. Our multi-GPU cluster system consists of three nodes, with seven GPU boards (NVIDIA Tesla C2070) mounted on each node. We examined the performance of the FDTD calculation in this multi-GPU cluster environment. We confirmed that the FDTD calculation on the multi-GPU cluster is faster than that on a single multi-GPU workstation, and we also found that the GPU cluster system calculates faster than a vector supercomputer. In addition, our GPU cluster system allowed us to perform large-scale FDTD calculations because we were able to use over 100 GB of GPU memory.

  13. High dimensional feature reduction via projection pursuit

    NASA Technical Reports Server (NTRS)

    Jimenez, Luis; Landgrebe, David

    1994-01-01

    The recent development of more sophisticated remote sensing systems enables the measurement of radiation in many more spectral intervals than previously possible. An example of that technology is the AVIRIS system, which collects image data in 220 bands. As a result of this, new algorithms must be developed in order to analyze the more complex data effectively. Data in a high dimensional space presents a substantial challenge, since intuitive concepts valid in a 2-3 dimensional space do not necessarily apply in higher dimensional spaces. For example, high dimensional space is mostly empty. This results from the concentration of data in the corners of hypercubes. Other examples may be cited. Such observations suggest the need to project data to a subspace of a much lower dimension on a problem specific basis in such a manner that information is not lost. Projection Pursuit is a technique that will accomplish such a goal. Since it processes data in lower dimensions, it should avoid many of the difficulties of high dimensional spaces. In this paper, we begin the investigation of some of the properties of Projection Pursuit for this purpose.
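
    A minimal projection pursuit sketch: numerically maximize a non-Gaussianity index (here absolute excess kurtosis) over unit-norm projection directions. The index, the crude BFGS optimizer, and the data are illustrative choices, not the authors':

        import numpy as np
        from scipy.optimize import minimize
        from scipy.stats import kurtosis

        rng = np.random.default_rng(0)
        X = rng.normal(size=(500, 20))
        X[:250, 0] += 4.0                      # hidden two-cluster structure

        def neg_index(w):
            w = w / np.linalg.norm(w)
            return -abs(kurtosis(X @ w))       # deviation from Gaussianity

        res = minimize(neg_index, rng.normal(size=X.shape[1]), method="BFGS")
        w = res.x / np.linalg.norm(res.x)
        t = X @ w                              # 1-D view exposing the clusters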

  14. Prescribed nanoparticle cluster architectures and low-dimensional arrays built using octahedral DNA origami frames

    DOE PAGES

    Tian, Ye; Wang, Tong; Liu, Wenyan; ...

    2015-05-25

    Three-dimensional mesoscale clusters that are formed from nanoparticles spatially arranged in pre-determined positions can be thought of as mesoscale analogues of molecules. These nanoparticle architectures could offer tailored properties due to collective effects, but developing a general platform for fabricating such clusters is a significant challenge. Here, we report a strategy for assembling 3D nanoparticle clusters that uses a molecular frame designed with encoded vertices for particle placement. The frame is a DNA origami octahedron and can be used to fabricate clusters with various symmetries and particle compositions. Cryo-electron microscopy is used to uncover the structure of the DNA frame and to reveal that the nanoparticles are spatially coordinated in the prescribed manner. We show that the DNA frame and one set of nanoparticles can be used to create nanoclusters with different chiroptical activities. We also show that the octahedra can serve as programmable interparticle linkers, allowing one- and two-dimensional arrays to be assembled that have designed particle arrangements.

  15. Joint Adaptive Mean-Variance Regularization and Variance Stabilization of High Dimensional Data

    PubMed Central

    Dazard, Jean-Eudes; Rao, J. Sunil

    2012-01-01

    The paper addresses a common problem in the analysis of high-dimensional high-throughput “omics” data, which is parameter estimation across multiple variables in a set of data where the number of variables is much larger than the sample size. Among the problems posed by this type of data are that variable-specific estimators of variances are not reliable and variable-wise test statistics have low power, both due to a lack of degrees of freedom. In addition, it has been observed in this type of data that the variance increases as a function of the mean. We introduce a non-parametric adaptive regularization procedure that is innovative in that: (i) it employs a novel “similarity statistic”-based clustering technique to generate local-pooled or regularized shrinkage estimators of population parameters, (ii) the regularization is done jointly on population moments, benefiting from C. Stein's result on inadmissibility, which implies that the usual sample variance estimator is improved by a shrinkage estimator using information contained in the sample mean. From these joint regularized shrinkage estimators, we derive regularized t-like statistics and show in simulation studies that they offer more statistical power in hypothesis testing than their standard sample counterparts, or regular common-value shrinkage estimators, or when the information contained in the sample mean is simply ignored. Finally, we show that these estimators feature interesting properties of variance stabilization and normalization that can be used for preprocessing high-dimensional multivariate data. The method is available as an R package, called ‘MVR’ (‘Mean-Variance Regularization’), downloadable from the CRAN website. PMID:22711950

  16. Solution of the two-dimensional spectral factorization problem

    NASA Technical Reports Server (NTRS)

    Lawton, W. M.

    1985-01-01

    An approximation theorem is proven which solves a classic problem in two-dimensional (2-D) filter theory. The theorem shows that any continuous two-dimensional spectrum can be uniformly approximated by the squared modulus of a recursively stable finite trigonometric polynomial supported on a nonsymmetric half-plane.

  17. Solution methods for one-dimensional viscoelastic problems

    NASA Technical Reports Server (NTRS)

    Stubstad, John M.; Simitses, George J.

    1987-01-01

    A recently developed differential methodology for solution of one-dimensional nonlinear viscoelastic problems is presented. Using the example of an eccentrically loaded cantilever beam-column, the results from the differential formulation are compared to results generated using a previously published integral solution technique. It is shown that the results obtained from these distinct methodologies exhibit a surprisingly high degree of correlation with one another. A discussion of the various factors affecting the numerical accuracy and rate of convergence of these two procedures is also included. Finally, the influences of some 'higher order' effects, such as straining along the centroidal axis, are discussed.

  18. A numerical algorithm for optimal feedback gains in high dimensional linear quadratic regulator problems

    NASA Technical Reports Server (NTRS)

    Banks, H. T.; Ito, K.

    1991-01-01

    A hybrid method for computing the feedback gains in the linear quadratic regulator problem is proposed. The method, which combines use of a Chandrasekhar type system with an iteration of the Newton-Kleinman form with variable acceleration parameter Smith schemes, is formulated to efficiently compute directly the feedback gains rather than solutions of an associated Riccati equation. The hybrid method is particularly appropriate when used with large dimensional systems such as those arising in approximating infinite-dimensional (distributed parameter) control systems (e.g., those governed by delay-differential and partial differential equations). Computational advantages of the proposed algorithm over the standard eigenvector (Potter, Laub-Schur) based techniques are discussed, and numerical evidence of the efficacy of these ideas is presented.
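
    The Newton-Kleinman iteration mentioned above reduces the Riccati equation to a sequence of Lyapunov solves; a minimal sketch (with a stable A so that K = 0 is a stabilizing initial gain, and toy matrices of my choosing):

        import numpy as np
        from scipy.linalg import solve_continuous_lyapunov

        A = np.array([[-1.0, 2.0], [0.0, -3.0]])   # stable, so K = 0 is stabilizing
        B = np.array([[0.0], [1.0]])
        Q = np.eye(2)
        R = np.array([[1.0]])

        K = np.zeros((1, 2))
        for _ in range(20):
            Acl = A - B @ K
            # Lyapunov step:  Acl' P + P Acl = -(Q + K' R K)
            P = solve_continuous_lyapunov(Acl.T, -(Q + K.T @ R @ K))
            K_new = np.linalg.solve(R, B.T @ P)    # next gain, R^{-1} B' P
            if np.linalg.norm(K_new - K) < 1e-12:
                break
            K = K_new
        # K now approximates the optimal LQR feedback gain.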

  19. Swarm v2: highly-scalable and high-resolution amplicon clustering

    PubMed Central

    Mahé, Frédéric; Rognes, Torbjørn; Quince, Christopher; de Vargas, Colomban; Dunthorn, Micah

    2015-01-01

    Previously we presented Swarm v1, a novel and open source amplicon clustering program that produced fine-scale molecular operational taxonomic units (OTUs), free of arbitrary global clustering thresholds and input-order dependency. Swarm v1 worked with an initial phase that used iterative single-linkage with a local clustering threshold (d), followed by a phase that used the internal abundance structures of clusters to break chained OTUs. Here we present Swarm v2, which has two important novel features: (1) a new algorithm for d = 1 that allows the computation time of the program to scale linearly with increasing amounts of data; and (2) the new fastidious option that reduces under-grouping by grafting low abundant OTUs (e.g., singletons and doubletons) onto larger ones. Swarm v2 also directly integrates the clustering and breaking phases, dereplicates sequencing reads with d = 0, outputs OTU representatives in fasta format, and plots individual OTUs as two-dimensional networks. PMID:26713226

  20. Swarm v2: highly-scalable and high-resolution amplicon clustering.

    PubMed

    Mahé, Frédéric; Rognes, Torbjørn; Quince, Christopher; de Vargas, Colomban; Dunthorn, Micah

    2015-01-01

    Previously we presented Swarm v1, a novel and open source amplicon clustering program that produced fine-scale molecular operational taxonomic units (OTUs), free of arbitrary global clustering thresholds and input-order dependency. Swarm v1 worked with an initial phase that used iterative single-linkage with a local clustering threshold (d), followed by a phase that used the internal abundance structures of clusters to break chained OTUs. Here we present Swarm v2, which has two important novel features: (1) a new algorithm for d = 1 that allows the computation time of the program to scale linearly with increasing amounts of data; and (2) the new fastidious option that reduces under-grouping by grafting low abundant OTUs (e.g., singletons and doubletons) onto larger ones. Swarm v2 also directly integrates the clustering and breaking phases, dereplicates sequencing reads with d = 0, outputs OTU representatives in fasta format, and plots individual OTUs as two-dimensional networks.
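
    Swarm's linear-time behavior for d = 1 rests on generating single-edit neighborhoods instead of doing all-pairs comparisons. The toy sketch below uses the symmetric-deletion trick (two sequences within edit distance 1 share a single-deletion variant) with an exact verification step to reject false matches; Swarm's production algorithm differs in detail, and the reads are invented:

        from collections import defaultdict
        from itertools import combinations

        def variants(s):
            """s itself plus every single-deletion variant of s."""
            return {s} | {s[:i] + s[i + 1:] for i in range(len(s))}

        def within1(a, b):
            """Exact check that the edit distance between a and b is <= 1."""
            if a == b:
                return True
            if abs(len(a) - len(b)) > 1:
                return False
            if len(a) == len(b):                       # one substitution
                return sum(x != y for x, y in zip(a, b)) == 1
            if len(a) > len(b):
                a, b = b, a                            # now len(b) == len(a) + 1
            i = 0
            while i < len(a) and a[i] == b[i]:
                i += 1
            return a[i:] == b[i + 1:]                  # one insertion/deletion

        reads = ["ACGT", "ACGA", "ACG", "TTTT", "TTTA", "GGGG"]
        parent = list(range(len(reads)))

        def find(i):                                   # union-find, path halving
            while parent[i] != i:
                parent[i] = parent[parent[i]]
                i = parent[i]
            return i

        buckets = defaultdict(list)                    # variant -> read ids
        for i, s in enumerate(reads):
            for v in variants(s):
                buckets[v].append(i)

        for ids in buckets.values():                   # verify, then merge
            for i, j in combinations(ids, 2):
                if within1(reads[i], reads[j]):
                    parent[find(i)] = find(j)

        swarms = defaultdict(list)
        for i, s in enumerate(reads):
            swarms[find(i)].append(s)
        print(list(swarms.values()))                   # three swarms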

  1. Three-dimensional reconstruction of clustered microcalcifications from two digitized mammograms

    NASA Astrophysics Data System (ADS)

    Stotzka, Rainer; Mueller, Tim O.; Epper, Wolfgang; Gemmeke, Hartmut

    1998-06-01

    X-ray mammography is one of the most significant diagnosis methods in early detection of breast cancer. Usually two X-ray images from different angles are taken of each mamma to make even overlapping structures visible. X-ray mammography has a very high spatial resolution and can show microcalcifications of 50-200 microns in size. Clusters of microcalcifications are one of the most important and often the only indicator for malignant tumors. These calcifications are in some cases extremely difficult to detect. Computer-assisted diagnosis of digitized mammograms may improve the detection and interpretation of microcalcifications and lead to more reliable diagnostic findings. We built a low-cost mammography workstation to detect and classify clusters of microcalcifications and tissue densities automatically. New in this approach is the estimation of the 3D formation of segmented microcalcifications and its visualization, which will put additional diagnostic information at the radiologist's disposal. The main problem in using only two or three projections for reconstruction is the large loss of volume information. Therefore the arrangement of a cluster is estimated using only the positions of the segmented microcalcifications. The arrangement of microcalcifications is visualized for the physician by rotation.

  2. Discovering biclusters in gene expression data based on high-dimensional linear geometries

    PubMed Central

    Gan, Xiangchao; Liew, Alan Wee-Chung; Yan, Hong

    2008-01-01

    Background In DNA microarray experiments, discovering groups of genes that share similar transcriptional characteristics is instrumental in functional annotation, tissue classification and motif identification. However, in many situations a subset of genes only exhibits consistent pattern over a subset of conditions. Conventional clustering algorithms that deal with the entire row or column in an expression matrix would therefore fail to detect these useful patterns in the data. Recently, biclustering has been proposed to detect a subset of genes exhibiting consistent pattern over a subset of conditions. However, most existing biclustering algorithms are based on searching for sub-matrices within a data matrix by optimizing certain heuristically defined merit functions. Moreover, most of these algorithms can only detect a restricted set of bicluster patterns. Results In this paper, we present a novel geometric perspective for the biclustering problem. The biclustering process is interpreted as the detection of linear geometries in a high dimensional data space. Such a new perspective views biclusters with different patterns as hyperplanes in a high dimensional space, and allows us to handle different types of linear patterns simultaneously by matching a specific set of linear geometries. This geometric viewpoint also inspires us to propose a generic bicluster pattern, i.e. the linear coherent model that unifies the seemingly incompatible additive and multiplicative bicluster models. As a particular realization of our framework, we have implemented a Hough transform-based hyperplane detection algorithm. The experimental results on human lymphoma gene expression dataset show that our algorithm can find biologically significant subsets of genes. Conclusion We have proposed a novel geometric interpretation of the biclustering problem. We have shown that many common types of bicluster are just different spatial arrangements of hyperplanes in a high dimensional data space.

  3. Uniform high order spectral methods for one and two dimensional Euler equations

    NASA Technical Reports Server (NTRS)

    Cai, Wei; Shu, Chi-Wang

    1991-01-01

    Uniform high order spectral methods to solve multi-dimensional Euler equations for gas dynamics are discussed. Uniform high order spectral approximations with spectral accuracy in smooth regions of solutions are constructed by introducing the idea of the Essentially Non-Oscillatory (ENO) polynomial interpolations into the spectral methods. The authors present numerical results for the inviscid Burgers' equation, and for the one dimensional Euler equations including the interactions between a shock wave and density disturbance, Sod's and Lax's shock tube problems, and the blast wave problem. The interaction between a Mach 3 two dimensional shock wave and a rotating vortex is simulated.

  4. A Simple Algebraic Grid Adaptation Scheme with Applications to Two- and Three-dimensional Flow Problems

    NASA Technical Reports Server (NTRS)

    Hsu, Andrew T.; Lytle, John K.

    1989-01-01

    An algebraic adaptive grid scheme based on the concept of arc equidistribution is presented. The scheme locally adjusts the grid density based on gradients of selected flow variables from either finite difference or finite volume calculations. A user-prescribed grid stretching can be specified such that control of the grid spacing can be maintained in areas of known flowfield behavior. For example, the grid can be clustered near a wall for boundary layer resolution and made coarse near the outer boundary of an external flow. A grid smoothing technique is incorporated into the adaptive grid routine, which is found to be more robust and efficient than the weight function filtering technique employed by other researchers. Since the present algebraic scheme requires no iteration or solution of differential equations, the computer time needed for grid adaptation is trivial, making the scheme useful for three-dimensional flow problems. Applications to two- and three-dimensional flow problems show that a considerable improvement in flowfield resolution can be achieved by using the proposed adaptive grid scheme. Although the scheme was developed with steady flow in mind, it is a good candidate for unsteady flow computations because of its efficiency.
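
    Arc equidistribution itself fits in a few lines: form a monitor (arc-length) weight from the solution gradient, integrate it along the current grid, and invert the cumulative distribution to place the new points. A 1-D numpy sketch with an illustrative tanh layer (the scheme in the paper is algebraic and multi-dimensional; this only shows the equidistribution principle):

        import numpy as np

        def equidistribute(x, u, n_new=41, alpha=100.0):
            """Place n_new points so the arc-length weight w = sqrt(1 + alpha*u_x^2)
            is equidistributed over the new cells."""
            w = np.sqrt(1.0 + alpha * np.gradient(u, x) ** 2)
            # Cumulative weighted "arc length" along the current grid.
            s = np.concatenate(([0.0],
                                np.cumsum(0.5 * (w[1:] + w[:-1]) * np.diff(x))))
            return np.interp(np.linspace(0.0, s[-1], n_new), s, x)

        x = np.linspace(0.0, 1.0, 201)
        u = np.tanh(50.0 * (x - 0.5))        # steep interior layer
        x_new = equidistribute(x, u)         # points cluster around x = 0.5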

  5. Engineering two-photon high-dimensional states through quantum interference

    PubMed Central

    Zhang, Yingwen; Roux, Filippus S.; Konrad, Thomas; Agnew, Megan; Leach, Jonathan; Forbes, Andrew

    2016-01-01

    Many protocols in quantum science, for example, linear optical quantum computing, require access to large-scale entangled quantum states. Such systems can be realized through many-particle qubits, but this approach often suffers from scalability problems. An alternative strategy is to consider a lesser number of particles that exist in high-dimensional states. The spatial modes of light are one such candidate that provides access to high-dimensional quantum states, and thus they increase the storage and processing potential of quantum information systems. We demonstrate the controlled engineering of two-photon high-dimensional states entangled in their orbital angular momentum through Hong-Ou-Mandel interference. We prepare a large range of high-dimensional entangled states and implement precise quantum state filtering. We characterize the full quantum state before and after the filter, and are thus able to determine that only the antisymmetric component of the initial state remains. This work paves the way for high-dimensional processing and communication of multiphoton quantum states, for example, in teleportation beyond qubits. PMID:26933685

  6. Pattern of clustering of menopausal problems: A study with a Bengali Hindu ethnic group.

    PubMed

    Dasgupta, Doyel; Pal, Baidyanath; Ray, Subha

    2016-01-01

    We attempted to find out how menopausal problems cluster with each other. The study was conducted among a group of women belonging to a Bengali-speaking Hindu ethnic group of West Bengal, a state located in Eastern India. We recruited 1,400 participants for the study. Information on sociodemographic aspects and menopausal problems were collected from these participants with the help of a pretested questionnaire. Results of cluster analysis showed that vasomotor, vaginal, and urinary problems cluster together, separately from physical and psychosomatic problems.

  7. Effects of Cluster Location on Human Performance on the Traveling Salesperson Problem

    ERIC Educational Resources Information Center

    MacGregor, James N.

    2013-01-01

    Most models of human performance on the traveling salesperson problem involve clustering of nodes, but few empirical studies have examined effects of clustering in the stimulus array. A recent exception varied degree of clustering and concluded that the more clustered a stimulus array, the easier a TSP is to solve (Dry, Preiss, & Wagemans,…

  8. The method of approximate cluster analysis and the three-dimensional diagram of optical characteristics of the lunar surface

    NASA Astrophysics Data System (ADS)

    Evsyukov, N. N.

    1984-12-01

    An approximate algorithm for the isolation of multidimensional clusters is developed and applied in the construction of a three-dimensional diagram of the optical characteristics of the lunar surface. The method is somewhat analogous to that of Koontz and Fukunaga (1972) and involves isolating two-dimensional clusters, adding a new characteristic, and linearizing, a cycle which is repeated a limited number of times. The lunar-surface parameters analyzed are the 620-nm albedo, the 620/380-nm color index, and the 950/620-nm index. The results are presented graphically; the reliability of the cluster-isolation process is discussed; and some correspondences between known lunar morphology and the cluster maps are indicated.

  9. Extension of modified power method to two-dimensional problems

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Zhang, Peng; Lee, Hyunsuk

    2016-09-01

    In this study, the generalized modified power method was extended to two-dimensional problems. A direct application of the method to two-dimensional problems was shown to be unstable when the number of requested eigenmodes is larger than a certain problem dependent number. The root cause of this instability has been identified as the degeneracy of the transfer matrix. In order to resolve this instability, the number of sub-regions for the transfer matrix was increased to be larger than the number of requested eigenmodes, and a new transfer matrix was introduced accordingly which can be calculated by the least square method. The stability of the new method has been successfully demonstrated with a neutron diffusion eigenvalue problem and the 2D C5G7 benchmark problem.

  10. Teaching the Falling Ball Problem with Dimensional Analysis

    ERIC Educational Resources Information Center

    Sznitman, Josué; Stone, Howard A.; Smits, Alexander J.; Grotberg, James B.

    2013-01-01

    Dimensional analysis is often a subject reserved for students of fluid mechanics. However, the principles of scaling and dimensional analysis are applicable to various physical problems, many of which can be introduced early on in a university physics curriculum. Here, we revisit one of the best-known examples from a first course in classic…
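
    As a concrete instance of the kind of example the article advocates (our choice of setup, not necessarily the authors'), consider a ball of mass m falling under gravity g against a linear drag force F = -bv:

        % Parameters: mass m [kg], gravity g [m s^-2], drag coefficient b [kg s^-1].
        % They admit exactly one velocity scale and one time scale:
        \[
          v_t = \frac{mg}{b}, \qquad \tau = \frac{m}{b},
        \]
        % so dimensional analysis alone forces the solution into the form
        \[
          \frac{v(t)}{v_t} = f\!\left(\frac{t}{\tau}\right),
        \]
        % and solving the ODE m\,\dot{v} = mg - bv merely pins down f(s) = 1 - e^{-s}.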

  11. Deep linear autoencoder and patch clustering-based unified one-dimensional coding of image and video

    NASA Astrophysics Data System (ADS)

    Li, Honggui

    2017-09-01

    This paper proposes a unified one-dimensional (1-D) coding framework for image and video, which depends on a deep learning neural network and image patch clustering. First, an improved K-means clustering algorithm for image patches is employed to obtain the compact inputs of the deep artificial neural network. Second, for the purpose of best reconstructing the original image patches, deep linear autoencoder (DLA), a linear version of the classical deep nonlinear autoencoder, is introduced to achieve the 1-D representation of image blocks. With the 1-D representation, DLA can attain zero reconstruction error, which is impossible for classical nonlinear dimensionality reduction methods. Third, a unified 1-D coding infrastructure for image, intraframe, interframe, multiview video, three-dimensional (3-D) video, and multiview 3-D video is built by incorporating different categories of videos into the inputs of the patch clustering algorithm. Finally, simulation experiments show that the proposed methods simultaneously achieve a higher compression ratio and a higher peak signal-to-noise ratio than state-of-the-art methods in low-bitrate transmission settings.
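
    The zero-reconstruction-error claim for a linear autoencoder can be made concrete with SVD, since the optimal linear autoencoder is spanned by the top principal directions: when the code dimension matches the data's intrinsic rank, reconstruction is exact. Synthetic rank-10 data below; this illustrates the linear-algebra fact, not the paper's DLA training procedure:

        import numpy as np

        rng = np.random.default_rng(0)
        # 64 flattened 8x8 patches with intrinsic rank 10.
        P = rng.normal(size=(64, 10)) @ rng.normal(size=(10, 64))

        mu = P.mean(axis=0)
        U, s, Vt = np.linalg.svd(P - mu, full_matrices=False)
        Vk = Vt[:10].T                       # optimal linear encoder/decoder
        Z = (P - mu) @ Vk                    # compact codes, one vector per patch
        P_hat = Z @ Vk.T + mu

        print(np.max(np.abs(P - P_hat)))     # ~1e-13: exact reconstruction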

  12. Using High-Dimensional Image Models to Perform Highly Undetectable Steganography

    NASA Astrophysics Data System (ADS)

    Pevný, Tomáš; Filler, Tomáš; Bas, Patrick

    This paper presents a complete methodology for designing practical and highly undetectable stegosystems for real digital media. The main design principle is to minimize a suitably defined distortion by means of an efficient coding algorithm. The distortion is defined as a weighted difference of extended state-of-the-art feature vectors already used in steganalysis. This allows us to "preserve" the model used by the steganalyst and thus remain undetectable even for large payloads. This framework can be efficiently implemented even when the dimensionality of the feature set used by the embedder is larger than 10^7. The high-dimensional model is necessary to avoid known security weaknesses. Although high-dimensional models might be a problem in steganalysis, we explain why they are acceptable in steganography. As an example, we introduce HUGO, a new embedding algorithm for spatial-domain digital images, and we contrast its performance with LSB matching. On the BOWS2 image database, and in contrast with LSB matching, HUGO allows the embedder to hide a 7× longer message at the same level of security.

  13. Clustering methods for the optimization of atomic cluster structure

    NASA Astrophysics Data System (ADS)

    Bagattini, Francesco; Schoen, Fabio; Tigli, Luca

    2018-04-01

    In this paper, we propose a revised global optimization method and apply it to large scale cluster conformation problems. In the 1990s, the so-called clustering methods were considered among the most efficient general purpose global optimization techniques; however, their usage has quickly declined in recent years, mainly due to the inherent difficulties of clustering approaches in large dimensional spaces. Inspired from the machine learning literature, we redesigned clustering methods in order to deal with molecular structures in a reduced feature space. Our aim is to show that by suitably choosing a good set of geometrical features coupled with a very efficient descent method, an effective optimization tool is obtained which is capable of finding, with a very high success rate, all known putative optima for medium size clusters without any prior information, both for Lennard-Jones and Morse potentials. The main result is that, beyond being a reliable approach, the proposed method, based on the idea of starting a computationally expensive deep local search only when it seems worth doing so, is capable of saving a huge amount of searches with respect to an analogous algorithm which does not employ a clustering phase. In this paper, we are not claiming the superiority of the proposed method compared to specific, refined, state-of-the-art procedures, but rather indicating a quite straightforward way to save local searches by means of a clustering scheme working in a reduced variable space, which might prove useful when included in many modern methods.

  14. Entropy-based consensus clustering for patient stratification.

    PubMed

    Liu, Hongfu; Zhao, Rui; Fang, Hongsheng; Cheng, Feixiong; Fu, Yun; Liu, Yang-Yu

    2017-09-01

    Patient stratification or disease subtyping is crucial for precision medicine and personalized treatment of complex diseases. The increasing availability of high-throughput molecular data provides a great opportunity for patient stratification. Many clustering methods have been employed to tackle this problem in a purely data-driven manner. Yet, existing methods leveraging high-throughput molecular data often suffer from various limitations, e.g., noise, data heterogeneity, high dimensionality or poor interpretability. Here we introduce an Entropy-based Consensus Clustering (ECC) method that overcomes these limitations. Our ECC method employs an entropy-based utility function to fuse many basic partitions into a consensus one that agrees with the basic ones as much as possible. Maximizing the utility function in ECC has a much more meaningful interpretation than in other consensus clustering methods. Moreover, we exactly map the complex utility maximization problem to the classic K-means clustering problem, which can then be efficiently solved with linear time and space complexity. Our ECC method can also naturally integrate multiple molecular data types measured from the same set of subjects, and easily handle missing values without any imputation. We applied ECC to 110 synthetic and 48 real datasets, including 35 cancer gene expression benchmark datasets and 13 cancer types with four molecular data types from The Cancer Genome Atlas. We found that ECC shows superior performance over existing clustering methods. Our results clearly demonstrate the power of ECC in clinically relevant patient stratification. The Matlab package is available at http://scholar.harvard.edu/yyl/ecc. Supplementary data are available at Bioinformatics online.
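
    The reduction the abstract describes, consensus clustering solved as K-means, can be sketched directly: one-hot encode the basic partitions and run K-means on the binary matrix. Data and parameters below are illustrative, and ECC's specific utility function fixes the exact encoding and weighting:

        import numpy as np
        from sklearn.cluster import KMeans
        from sklearn.datasets import make_blobs
        from sklearn.preprocessing import OneHotEncoder

        X, _ = make_blobs(n_samples=300, centers=3, random_state=0)

        # Basic partitions: many cheap k-means runs with varying k and seeds.
        basics = np.column_stack([
            KMeans(n_clusters=k, n_init=2, random_state=r).fit_predict(X)
            for k in (2, 3, 4, 5) for r in range(5)
        ])

        # One-hot encode the basic partitions and solve consensus as K-means
        # ("sparse_output" is "sparse" on older scikit-learn versions).
        B = OneHotEncoder(sparse_output=False).fit_transform(basics)
        consensus = KMeans(n_clusters=3, n_init=10, random_state=0).fit_predict(B)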

  15. Cluster analysis based on dimensional information with applications to feature selection and classification

    NASA Technical Reports Server (NTRS)

    Eigen, D. J.; Fromm, F. R.; Northouse, R. A.

    1974-01-01

    A new clustering algorithm is presented that is based on dimensional information. The algorithm includes an inherent feature selection criterion, which is discussed. Further, a heuristic method for choosing the proper number of intervals for a frequency distribution histogram, a feature necessary for the algorithm, is presented. The algorithm, although usable as a stand-alone clustering technique, is then utilized as a global approximator. Local clustering techniques and configuration of a global-local scheme are discussed, and finally the complete global-local and feature selector configuration is shown in application to a real-time adaptive classification scheme for the analysis of remote sensed multispectral scanner data.

  16. Theory of the vortex-clustering transition in a confined two-dimensional quantum fluid

    NASA Astrophysics Data System (ADS)

    Yu, Xiaoquan; Billam, Thomas P.; Nian, Jun; Reeves, Matthew T.; Bradley, Ashton S.

    2016-08-01

    Clustering of like-sign vortices in a planar bounded domain is known to occur at negative temperature, a phenomenon that Onsager demonstrated to be a consequence of bounded phase space. In a confined superfluid, quantized vortices can support such an ordered phase, provided they evolve as an almost isolated subsystem containing sufficient energy. A detailed theoretical understanding of the statistical mechanics of such states thus requires a microcanonical approach. Here we develop an analytical theory of the vortex clustering transition in a neutral system of quantum vortices confined to a two-dimensional disk geometry, within the microcanonical ensemble. The choice of ensemble is essential for identifying the correct thermodynamic limit of the system, enabling a rigorous description of clustering in the language of critical phenomena. As the system energy increases above a critical value, the system develops global order via the emergence of a macroscopic dipole structure from the homogeneous phase of vortices, spontaneously breaking the Z2 symmetry associated with invariance under vortex circulation exchange, and the rotational SO(2) symmetry due to the disk geometry. The dipole structure emerges with the continuous growth of the macroscopic dipole moment, which serves as a global order parameter, resembling a continuous phase transition. The critical temperature of the transition, and the critical exponent associated with the dipole moment, are obtained exactly within mean-field theory. The clustering transition is shown to be distinct from the final state reached at high energy, known as supercondensation. The dipole moment develops via two macroscopic vortex clusters and the cluster locations are found analytically, both near the clustering transition and in the supercondensation limit. The microcanonical theory shows excellent agreement with Monte Carlo simulations, and signatures of the transition are apparent even for a modest system of 100 vortices.

  17. Sparsity enabled cluster reduced-order models for control

    NASA Astrophysics Data System (ADS)

    Kaiser, Eurika; Morzyński, Marek; Daviller, Guillaume; Kutz, J. Nathan; Brunton, Bingni W.; Brunton, Steven L.

    2018-01-01

    Characterizing and controlling nonlinear, multi-scale phenomena are central goals in science and engineering. Cluster-based reduced-order modeling (CROM) was introduced to exploit the underlying low-dimensional dynamics of complex systems. CROM builds a data-driven discretization of the Perron-Frobenius operator, resulting in a probabilistic model for ensembles of trajectories. A key advantage of CROM is that it embeds nonlinear dynamics in a linear framework, which enables the application of standard linear techniques to the nonlinear system. CROM is typically computed on high-dimensional data; however, access to and computations on this full-state data limit the online implementation of CROM for prediction and control. Here, we address this key challenge by identifying a small subset of critical measurements to learn an efficient CROM, referred to as sparsity-enabled CROM. In particular, we leverage compressive measurements to faithfully embed the cluster geometry and preserve the probabilistic dynamics. Further, we show how to identify fewer optimized sensor locations tailored to a specific problem that outperform random measurements. Both of these sparsity-enabled sensing strategies significantly reduce the burden of data acquisition and processing for low-latency in-time estimation and control. We illustrate this unsupervised learning approach on three different high-dimensional nonlinear dynamical systems from fluids with increasing complexity, with one application in flow control. Sparsity-enabled CROM is a critical facilitator for real-time implementation on high-dimensional systems where full-state information may be inaccessible.
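
    A bare-bones CROM construction, illustrating the two steps the abstract names (cluster the snapshots, then estimate a transition matrix as a discretized Perron-Frobenius operator); the trajectory here is a mock random walk, and the sparsity-enabled sensing step is omitted:

        import numpy as np
        from sklearn.cluster import KMeans

        rng = np.random.default_rng(0)
        # Mock trajectory of a high-dimensional system: 2000 snapshots, 100 states.
        snapshots = np.cumsum(rng.normal(size=(2000, 100)), axis=0)

        # Step 1: coarse-grain state space by clustering the snapshots.
        k = 10
        labels = KMeans(n_clusters=k, n_init=10, random_state=0).fit_predict(snapshots)

        # Step 2: count cluster-to-cluster transitions -> row-stochastic matrix,
        # a discretization of the Perron-Frobenius operator.
        P = np.zeros((k, k))
        for a, b in zip(labels[:-1], labels[1:]):
            P[a, b] += 1.0
        P /= np.maximum(P.sum(axis=1, keepdims=True), 1.0)

        # Probabilistic forecast: cluster occupancy p evolves as p @ P.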

  18. Geometric MCMC for infinite-dimensional inverse problems

    NASA Astrophysics Data System (ADS)

    Beskos, Alexandros; Girolami, Mark; Lan, Shiwei; Farrell, Patrick E.; Stuart, Andrew M.

    2017-04-01

    Bayesian inverse problems often involve sampling posterior distributions on infinite-dimensional function spaces. Traditional Markov chain Monte Carlo (MCMC) algorithms are characterized by deteriorating mixing times upon mesh-refinement, when the finite-dimensional approximations become more accurate. Such methods are typically forced to reduce step-sizes as the discretization gets finer, and thus are expensive as a function of dimension. Recently, a new class of MCMC methods with mesh-independent convergence times has emerged. However, few of them take into account the geometry of the posterior informed by the data. At the same time, recently developed geometric MCMC algorithms have been found to be powerful in exploring complicated distributions that deviate significantly from elliptic Gaussian laws, but are in general computationally intractable for models defined in infinite dimensions. In this work, we combine geometric methods on a finite-dimensional subspace with mesh-independent infinite-dimensional approaches. Our objective is to speed up MCMC mixing times, without significantly increasing the computational cost per step (for instance, in comparison with the vanilla preconditioned Crank-Nicolson (pCN) method). This is achieved by using ideas from geometric MCMC to probe the complex structure of an intrinsic finite-dimensional subspace where most data information concentrates, while retaining robust mixing times as the dimension grows by using pCN-like methods in the complementary subspace. The resulting algorithms are demonstrated in the context of three challenging inverse problems arising in subsurface flow, heat conduction and incompressible flow control. The algorithms exhibit up to two orders of magnitude improvement in sampling efficiency when compared with the pCN method.
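
    The pCN proposal that these methods build on is only a few lines; a self-contained sketch for a standard Gaussian prior, with a toy likelihood of my choosing:

        import numpy as np

        def pcn(log_like, n_dim, n_steps=5000, beta=0.2, seed=0):
            """pCN MCMC for a N(0, I) prior: the proposal is prior-reversible,
            so the acceptance ratio involves only the log-likelihood, which is
            what makes the mixing rate mesh-independent."""
            rng = np.random.default_rng(seed)
            u = rng.normal(size=n_dim)
            out = []
            for _ in range(n_steps):
                v = np.sqrt(1.0 - beta**2) * u + beta * rng.normal(size=n_dim)
                if np.log(rng.uniform()) < log_like(v) - log_like(u):
                    u = v
                out.append(u.copy())
            return np.array(out)

        # Toy likelihood: noisy observation of the field average.
        chain = pcn(lambda u: -0.5 * (u.mean() - 1.0) ** 2 / 0.01, n_dim=50)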

  19. Variables separation and superintegrability of the nine-dimensional MICZ-Kepler problem

    NASA Astrophysics Data System (ADS)

    Phan, Ngoc-Hung; Le, Dai-Nam; Thoi, Tuan-Quoc N.; Le, Van-Hoang

    2018-03-01

    The nine-dimensional MICZ-Kepler problem has attracted recent interest. This is a system describing a charged particle moving in the Coulomb field plus the field of a SO(8) monopole in a nine-dimensional space. Interestingly, this problem is equivalent to a 16-dimensional harmonic oscillator via the Hurwitz transformation. In the present paper, we report on the multiseparability, a common property of superintegrable systems, and the superintegrability of the problem. First, we show the solvability of the Schrödinger equation of the problem by the variables separation method in different coordinates. Second, based on the SO(10) symmetry algebra of the system, we construct explicitly a set of seventeen invariant operators, all of second order in the momentum components, satisfying the condition of superintegrability. The number 17 coincides with the prediction of the (2n - 1) law of maximal superintegrability order for n = 9. Until now, this law has been accepted to apply only to scalar Hamiltonian eigenvalue equations in n-dimensional space; therefore, our results can be treated as evidence that this definition of superintegrability may also apply to some vector equations such as the Schrödinger equation for the nine-dimensional MICZ-Kepler problem.

  20. Partially supervised speaker clustering.

    PubMed

    Tang, Hao; Chu, Stephen Mingyu; Hasegawa-Johnson, Mark; Huang, Thomas S

    2012-05-01

    Content-based multimedia indexing, retrieval, and processing as well as multimedia databases demand the structuring of the media content (image, audio, video, text, etc.), one significant goal being to associate the identity of the content to the individual segments of the signals. In this paper, we specifically address the problem of speaker clustering, the task of assigning every speech utterance in an audio stream to its speaker. We offer a complete treatment of the idea of partially supervised speaker clustering, which refers to the use of our prior knowledge of speakers in general to assist the unsupervised speaker clustering process. By means of an independent training data set, we encode the prior knowledge at the various stages of the speaker clustering pipeline via 1) learning a speaker-discriminative acoustic feature transformation, 2) learning a universal speaker prior model, and 3) learning a discriminative speaker subspace, or equivalently, a speaker-discriminative distance metric. We study the directional scattering property of the Gaussian mixture model (GMM) mean supervector representation of utterances in the high-dimensional space, and advocate exploiting this property by using the cosine distance metric instead of the Euclidean distance metric for speaker clustering in the GMM mean supervector space. We propose to perform discriminant analysis based on the cosine distance metric, which leads to a novel distance metric learning algorithm, linear spherical discriminant analysis (LSDA). We show that the proposed LSDA formulation can be systematically solved within the elegant graph embedding general dimensionality reduction framework. Our speaker clustering experiments on the GALE database clearly indicate that 1) our speaker clustering methods based on the GMM mean supervector representation and vector-based distance metrics outperform traditional speaker clustering methods based on the “bag of acoustic features” representation and statistical
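
    Clustering with the cosine distance in supervector space amounts to k-means on the unit hypersphere. The Python sketch below is a generic illustration of that step, not the paper's full pipeline; the supervectors argument is a hypothetical array of GMM mean supervectors, one row per utterance.

      import numpy as np

      def cosine_kmeans(supervectors, k, n_iter=100, seed=0):
          """Spherical k-means: cluster unit-normalized supervectors so that
          assignments maximize cosine similarity with re-normalized centroids,
          i.e. minimize cosine distance rather than Euclidean distance.
          """
          rng = np.random.default_rng(seed)
          X = supervectors / np.linalg.norm(supervectors, axis=1, keepdims=True)
          C = X[rng.choice(len(X), size=k, replace=False)]
          for _ in range(n_iter):
              labels = np.argmax(X @ C.T, axis=1)          # nearest centroid by cosine
              for j in range(k):
                  members = X[labels == j]
                  if len(members):
                      c = members.sum(axis=0)
                      C[j] = c / np.linalg.norm(c)         # re-normalize centroid to the sphere
          return labels, C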

  1. Vortex clustering and universal scaling laws in two-dimensional quantum turbulence.

    PubMed

    Skaugen, Audun; Angheluta, Luiza

    2016-03-01

    We investigate numerically the statistics of quantized vortices in two-dimensional quantum turbulence using the Gross-Pitaevskii equation. We find that a universal -5/3 scaling law in the turbulent energy spectrum is intimately connected with the vortex statistics, such as number fluctuations and vortex velocity, which is also characterized by a similar scaling behavior. The -5/3 scaling law appearing in the power spectrum of vortex number fluctuations is consistent with the scenario of passive advection of isolated vortices by a turbulent superfluid velocity generated by like-signed vortex clusters. The velocity probability distribution of clustered vortices is also sensitive to spatial configurations, and exhibits a power-law tail distribution with a -5/3 exponent.

  2. Exploratory Item Classification Via Spectral Graph Clustering

    PubMed Central

    Chen, Yunxiao; Li, Xiaoou; Liu, Jingchen; Xu, Gongjun; Ying, Zhiliang

    2017-01-01

    Large-scale assessments are supported by a large item pool. An important task in test development is to assign items into scales that measure different characteristics of individuals, and a popular approach is cluster analysis of items. Classical methods in cluster analysis, such as the hierarchical clustering, K-means method, and latent-class analysis, often induce a high computational overhead and have difficulty handling missing data, especially in the presence of high-dimensional responses. In this article, the authors propose a spectral clustering algorithm for exploratory item cluster analysis. The method is computationally efficient, effective for data with missing or incomplete responses, easy to implement, and often outperforms traditional clustering algorithms in the context of high dimensionality. The spectral clustering algorithm is based on graph theory, a branch of mathematics that studies the properties of graphs. The algorithm first constructs a graph of items, characterizing the similarity structure among items. It then extracts item clusters based on the graphical structure, grouping similar items together. The proposed method is evaluated through simulations and an application to the revised Eysenck Personality Questionnaire. PMID:29033476
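
    The generic pipeline the authors build on (similarity graph, graph Laplacian, spectral embedding, k-means) can be sketched in a few lines of Python. The similarity matrix S here is a placeholder, e.g. absolute inter-item correlations computed over the examinees who answered both items, which is one simple way to accommodate missing responses.

      import numpy as np
      from scipy.linalg import eigh
      from sklearn.cluster import KMeans

      def spectral_item_clusters(S, k, seed=0):
          """Spectral clustering sketch for a symmetric, nonnegative
          item-similarity matrix S: embed items with the bottom eigenvectors
          of the normalized Laplacian, then run k-means on the embedding.
          """
          d = S.sum(axis=1)
          Dinv = np.diag(1.0 / np.sqrt(np.maximum(d, 1e-12)))
          L = np.eye(len(S)) - Dinv @ S @ Dinv       # symmetric normalized Laplacian
          _, U = eigh(L)                              # eigenvectors, ascending eigenvalues
          emb = U[:, :k]                              # bottom-k eigenspace embedding
          emb /= np.maximum(np.linalg.norm(emb, axis=1, keepdims=True), 1e-12)
          return KMeans(n_clusters=k, n_init=10, random_state=seed).fit_predict(emb)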

  3. Repair of clustered DNA damage caused by high LET radiation in human fibroblasts

    NASA Technical Reports Server (NTRS)

    Rydberg, B.; Lobrich, M.; Cooper, P. K.; Chatterjee, A. (Principal Investigator)

    1998-01-01

    It has recently been demonstrated experimentally that DNA damage induced by high LET radiation in mammalian cells is non-randomly distributed along the DNA molecule in the form of clusters of various sizes. The sizes of such clusters range from a few base-pairs to at least 200 kilobase-pairs. The high biological efficiency of high LET radiation for induction of relevant biological endpoints is probably a consequence of this clustering, although the exact mechanisms by which the clustering affects the biological outcome are not known. We discuss here results for induction and repair of base damage, single-strand breaks and double-strand breaks for low and high LET radiations. These results are discussed in the context of clustering. Of particular interest is determining how clustering at different scales affects overall rejoining and fidelity of rejoining of DNA double-strand breaks. However, existing methods for measuring repair of DNA strand breaks are unable to resolve breaks that are close together in a cluster. This causes problems in interpretation of current results from high LET radiation and will require new methods to be developed.

  4. Diffusion maps for high-dimensional single-cell analysis of differentiation data.

    PubMed

    Haghverdi, Laleh; Buettner, Florian; Theis, Fabian J

    2015-09-15

    Single-cell technologies have recently gained popularity in cellular differentiation studies owing to their ability to resolve potential heterogeneities in cell populations. Analyzing such high-dimensional single-cell data has its own statistical and computational challenges. Popular multivariate approaches are based on data normalization, followed by dimension reduction and clustering to identify subgroups. However, in the case of cellular differentiation, we would not expect clear clusters to be present but instead expect the cells to follow continuous branching lineages. Here, we propose the use of diffusion maps to deal with the problem of defining differentiation trajectories. We adapt this method to single-cell data by adequate choice of kernel width and inclusion of uncertainties or missing measurement values, which enables the establishment of a pseudotemporal ordering of single cells in a high-dimensional gene expression space. We expect this output to reflect cell differentiation trajectories, where the data originates from intrinsic diffusion-like dynamics. Starting from a pluripotent stage, cells move smoothly within the transcriptional landscape towards more differentiated states with some stochasticity along their path. We demonstrate the robustness of our method with respect to extrinsic noise (e.g. measurement noise) and sampling density heterogeneities on simulated toy data as well as two single-cell quantitative polymerase chain reaction datasets (i.e. mouse haematopoietic stem cells and mouse embryonic stem cells) and an RNA-Seq dataset of human pre-implantation embryos. We show that diffusion maps perform considerably better than Principal Component Analysis and are advantageous over other techniques for non-linear dimension reduction such as t-distributed Stochastic Neighbour Embedding for preserving the global structures and pseudotemporal ordering of cells. The Matlab implementation of diffusion maps for single-cell data is available at https
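
    A bare-bones diffusion map conveys the core computation; the paper's contributions (adaptive kernel width, handling of uncertain and missing values) are omitted here. In this Python sketch X is a cells-by-genes expression matrix and sigma a fixed kernel width, both illustrative assumptions.

      import numpy as np
      from scipy.spatial.distance import pdist, squareform
      from scipy.linalg import eigh

      def diffusion_map(X, sigma=1.0, n_components=2):
          """Basic diffusion map: a Gaussian kernel defines a random walk on
          cells; the top nontrivial eigenvectors give diffusion components
          that order cells along continuous trajectories.
          """
          K = np.exp(-squareform(pdist(X, 'sqeuclidean')) / (2.0 * sigma**2))
          np.fill_diagonal(K, 0.0)                     # ignore self-transitions
          d = K.sum(axis=1)
          A = K / np.sqrt(np.outer(d, d))              # symmetric conjugate of the Markov matrix
          w, v = eigh(A)
          idx = np.argsort(w)[::-1][1:n_components + 1]  # skip the trivial top eigenvector
          psi = v[:, idx] / np.sqrt(d)[:, None]        # back to right eigenvectors of the walk
          return psi * w[idx]                          # scale by eigenvalues: diffusion coordinates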

  5. Anomaly Detection in Large Sets of High-Dimensional Symbol Sequences

    NASA Technical Reports Server (NTRS)

    Budalakoti, Suratna; Srivastava, Ashok N.; Akella, Ram; Turkov, Eugene

    2006-01-01

    This paper addresses the problem of detecting and describing anomalies in large sets of high-dimensional symbol sequences. The approach taken uses unsupervised clustering of sequences using the normalized longest common subsequence (LCS) as a similarity measure, followed by detailed analysis of outliers to detect anomalies. As the LCS measure is expensive to compute, the first part of the paper discusses existing algorithms, such as the Hunt-Szymanski algorithm, that have low time-complexity. We then discuss why these algorithms often do not work well in practice and present a new hybrid algorithm for computing the LCS that, in our tests, outperforms the Hunt-Szymanski algorithm by a factor of five. The second part of the paper presents new algorithms for outlier analysis that provide comprehensible indicators as to why a particular sequence was deemed to be an outlier. The algorithms provide a coherent description to an analyst of the anomalies in the sequence, compared to more normal sequences. The algorithms we present are general and domain-independent, so we discuss applications in related areas such as anomaly detection.
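
    The normalized LCS similarity itself is a short dynamic program. This Python version is the plain O(mn) algorithm, not the faster hybrid the paper develops, and the sqrt(m*n) normalization is one common convention, assumed here.

      import numpy as np

      def normalized_lcs(a, b):
          """Normalized longest-common-subsequence similarity between two
          symbol sequences, the measure used to cluster sequences before
          outlier analysis. Returns a value in [0, 1].
          """
          m, n = len(a), len(b)
          dp = np.zeros((m + 1, n + 1), dtype=int)
          for i in range(1, m + 1):
              for j in range(1, n + 1):
                  if a[i - 1] == b[j - 1]:
                      dp[i, j] = dp[i - 1, j - 1] + 1      # extend a common subsequence
                  else:
                      dp[i, j] = max(dp[i - 1, j], dp[i, j - 1])
          return dp[m, n] / np.sqrt(m * n)                 # normalization choice (an assumption)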

  6. High-resolution Self-Organizing Maps for advanced visualization and dimension reduction.

    PubMed

    Saraswati, Ayu; Nguyen, Van Tuc; Hagenbuchner, Markus; Tsoi, Ah Chung

    2018-05-04

    Kohonen's Self Organizing feature Map (SOM) provides an effective way to project high dimensional input features onto a low dimensional display space while preserving the topological relationships among the input features. Recent advances in algorithms that take advantage of modern computing hardware have introduced the concept of high resolution SOMs (HRSOMs). This paper investigates the capabilities and applicability of the HRSOM as a visualization tool for cluster analysis and its suitability to serve as a pre-processor in ensemble learning models. The evaluation is conducted on a number of established benchmarks and real-world learning problems, namely, the policeman benchmark, two web spam detection problems, a network intrusion detection problem, and a malware detection problem. It is found that the visualization resulting from an HRSOM provides new insights concerning these learning problems. It is furthermore shown empirically that broad benefits can be expected from the use of HRSOMs in both clustering and classification problems. Copyright © 2018 Elsevier Ltd. All rights reserved.

  7. Clustervision: Visual Supervision of Unsupervised Clustering.

    PubMed

    Kwon, Bum Chul; Eysenbach, Ben; Verma, Janu; Ng, Kenney; De Filippi, Christopher; Stewart, Walter F; Perer, Adam

    2018-01-01

    Clustering, the process of grouping together similar items into distinct partitions, is a common type of unsupervised machine learning that can be useful for summarizing and aggregating complex multi-dimensional data. However, data can be clustered in many ways, and there exists a large body of algorithms designed to reveal different patterns. While having access to a wide variety of algorithms is helpful, in practice, it is quite difficult for data scientists to choose and parameterize algorithms to get the clustering results relevant for their dataset and analytical tasks. To alleviate this problem, we built Clustervision, a visual analytics tool that helps ensure data scientists find the right clustering among the large number of techniques and parameters available. Our system clusters data using a variety of clustering techniques and parameters and then ranks clustering results utilizing five quality metrics. In addition, users can guide the system to produce more relevant results by providing task-relevant constraints on the data. Our visual user interface allows users to find high quality clustering results, explore the clusters using several coordinated visualization techniques, and select the cluster result that best suits their task. We demonstrate this novel approach using a case study with a team of researchers in the medical domain and showcase that our system empowers users to choose an effective representation of their complex data.

  8. Gaussian processes with built-in dimensionality reduction: Applications to high-dimensional uncertainty propagation

    NASA Astrophysics Data System (ADS)

    Tripathy, Rohit; Bilionis, Ilias; Gonzalez, Marcial

    2016-09-01

    Uncertainty quantification (UQ) tasks, such as model calibration, uncertainty propagation, and optimization under uncertainty, typically require several thousand evaluations of the underlying computer codes. To cope with the cost of simulations, one replaces the real response surface with a cheap surrogate based, e.g., on polynomial chaos expansions, neural networks, support vector machines, or Gaussian processes (GP). However, the number of simulations required to learn a generic multivariate response grows exponentially as the input dimension increases. This curse of dimensionality can only be addressed if the response exhibits some special structure that can be discovered and exploited. A wide range of physical responses exhibit a special structure known as an active subspace (AS). An AS is a linear manifold of the stochastic space characterized by maximal response variation. The idea is that one should first identify this low dimensional manifold, project the high-dimensional input onto it, and then link the projection to the output. If the dimensionality of the AS is low enough, then learning the link function is a much easier problem than the original problem of learning a high-dimensional function. The classic approach to discovering the AS requires gradient information, a fact that severely limits its applicability. Furthermore, and partly because of its reliance on gradients, it is not able to handle noisy observations. The latter is an essential trait if one wants to be able to propagate uncertainty through stochastic simulators, e.g., through molecular dynamics codes. In this work, we develop a probabilistic version of AS which is gradient-free and robust to observational noise. Our approach relies on a novel Gaussian process regression with built-in dimensionality reduction. In particular, the AS is represented as an orthogonal projection matrix that serves as yet another covariance function hyper-parameter to be estimated from the data. To train the
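
    The essential structure, regressing on a low-dimensional linear projection of the inputs, can be sketched as follows in Python. In the paper the projection matrix is learned jointly as a kernel hyper-parameter; this sketch assumes an orthonormal W is already available (e.g. from some AS discovery step), which is a simplification.

      import numpy as np
      from sklearn.gaussian_process import GaussianProcessRegressor
      from sklearn.gaussian_process.kernels import RBF

      def fit_as_gp(X, y, W):
          """GP surrogate on an active subspace: project high-dimensional
          inputs X (n_samples, D) onto the subspace spanned by the columns of
          an orthonormal matrix W (D, d), then regress on the projections.
          """
          Z = X @ W                                    # low-dimensional active variables
          gp = GaussianProcessRegressor(
              kernel=RBF(length_scale=np.ones(W.shape[1])),  # anisotropic RBF in the AS
              normalize_y=True)
          return gp.fit(Z, y)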

  9. Gaussian processes with built-in dimensionality reduction: Applications to high-dimensional uncertainty propagation

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Tripathy, Rohit, E-mail: rtripath@purdue.edu; Bilionis, Ilias, E-mail: ibilion@purdue.edu; Gonzalez, Marcial, E-mail: marcial-gonzalez@purdue.edu

    2016-09-15

    Uncertainty quantification (UQ) tasks, such as model calibration, uncertainty propagation, and optimization under uncertainty, typically require several thousand evaluations of the underlying computer codes. To cope with the cost of simulations, one replaces the real response surface with a cheap surrogate based, e.g., on polynomial chaos expansions, neural networks, support vector machines, or Gaussian processes (GP). However, the number of simulations required to learn a generic multivariate response grows exponentially as the input dimension increases. This curse of dimensionality can only be addressed if the response exhibits some special structure that can be discovered and exploited. A wide range of physical responses exhibit a special structure known as an active subspace (AS). An AS is a linear manifold of the stochastic space characterized by maximal response variation. The idea is that one should first identify this low dimensional manifold, project the high-dimensional input onto it, and then link the projection to the output. If the dimensionality of the AS is low enough, then learning the link function is a much easier problem than the original problem of learning a high-dimensional function. The classic approach to discovering the AS requires gradient information, a fact that severely limits its applicability. Furthermore, and partly because of its reliance on gradients, it is not able to handle noisy observations. The latter is an essential trait if one wants to be able to propagate uncertainty through stochastic simulators, e.g., through molecular dynamics codes. In this work, we develop a probabilistic version of AS which is gradient-free and robust to observational noise. Our approach relies on a novel Gaussian process regression with built-in dimensionality reduction. In particular, the AS is represented as an orthogonal projection matrix that serves as yet another covariance function hyper-parameter to be estimated from the data. To

  10. A Selective Overview of Variable Selection in High Dimensional Feature Space

    PubMed Central

    Fan, Jianqing

    2010-01-01

    High dimensional statistical problems arise from diverse fields of scientific research and technological development. Variable selection plays a pivotal role in contemporary statistical learning and scientific discoveries. The traditional idea of best subset selection methods, which can be regarded as a specific form of penalized likelihood, is computationally too expensive for many modern statistical applications. Other forms of penalized likelihood methods have been successfully developed over the last decade to cope with high dimensionality. They have been widely applied for simultaneously selecting important variables and estimating their effects in high dimensional statistical inference. In this article, we present a brief account of the recent developments of theory, methods, and implementations for high dimensional variable selection. Questions of what limits of dimensionality such methods can handle, what role penalty functions play, and what statistical properties they possess rapidly drive the advances of the field. The properties of non-concave penalized likelihood and its roles in high dimensional statistical modeling are emphasized. We also review some recent advances in ultra-high dimensional variable selection, with emphasis on independence screening and two-scale methods. PMID:21572976
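
    A toy version of the independence-screening-plus-penalization recipe mentioned at the end of the abstract might look as follows in Python; the screening score, cutoff n_keep, and lasso penalty alpha are all illustrative choices, not prescriptions from the article.

      import numpy as np
      from sklearn.linear_model import Lasso

      def sis_then_lasso(X, y, n_keep=50, alpha=0.1):
          """Two-scale sketch in the spirit of sure independence screening:
          rank features by marginal correlation with the response, keep the
          top n_keep, then run a penalized (L1/lasso) fit on the survivors.
          """
          Xc = (X - X.mean(0)) / X.std(0)
          score = np.abs(Xc.T @ (y - y.mean())) / len(y)   # marginal correlation screening
          keep = np.argsort(score)[::-1][:n_keep]          # indices of surviving features
          model = Lasso(alpha=alpha).fit(X[:, keep], y)
          return keep, model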

  11. Local polynomial chaos expansion for linear differential equations with high dimensional random inputs

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Chen, Yi; Jakeman, John; Gittelson, Claude

    2015-01-08

    In this paper we present a localized polynomial chaos expansion for partial differential equations (PDE) with random inputs. In particular, we focus on time-independent linear stochastic problems with high dimensional random inputs, where the traditional polynomial chaos methods, and most of the existing methods, incur prohibitively high simulation cost. Furthermore, the local polynomial chaos method employs a domain decomposition technique to approximate the stochastic solution locally. In each subdomain, a subdomain problem is solved independently and, more importantly, in a much lower dimensional random space. In a postprocessing stage, accurate samples of the original stochastic problems are obtained from the samples of the local solutions by enforcing the correct stochastic structure of the random inputs and the coupling conditions at the interfaces of the subdomains. Overall, the method is able to solve stochastic PDEs in very large dimensions by solving a collection of low dimensional local problems and can be highly efficient. In our paper we present the general mathematical framework of the methodology and use numerical examples to demonstrate the properties of the method.

  12. Manifold Learning in MR spectroscopy using nonlinear dimensionality reduction and unsupervised clustering.

    PubMed

    Yang, Guang; Raschke, Felix; Barrick, Thomas R; Howe, Franklyn A

    2015-09-01

    To investigate whether nonlinear dimensionality reduction improves unsupervised classification of ¹H MRS brain tumor data compared with a linear method. In vivo single-voxel ¹H magnetic resonance spectroscopy (55 patients) and ¹H magnetic resonance spectroscopic imaging (MRSI) (29 patients) data were acquired from histopathologically diagnosed gliomas. Data reduction using Laplacian eigenmaps (LE) or independent component analysis (ICA) was followed by k-means clustering or agglomerative hierarchical clustering (AHC) for unsupervised learning to assess tumor grade and for tissue type segmentation of MRSI data. An accuracy of 93% in classification of glioma grade II and grade IV, with 100% accuracy in distinguishing tumor and normal spectra, was obtained by LE with unsupervised clustering, but not with the combination of k-means and ICA. With ¹H MRSI data, LE provided a more linear distribution of data for cluster analysis and better cluster stability than ICA. LE combined with k-means or AHC provided 91% accuracy for classifying tumor grade and 100% accuracy for identifying normal tissue voxels. Color-coded visualization of normal brain, tumor core, and infiltration regions was achieved with LE combined with AHC. The LE method is promising for unsupervised clustering to separate brain and tumor tissue with automated color-coding for visualization of ¹H MRSI data after cluster analysis. © 2014 Wiley Periodicals, Inc.
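
    The LE-plus-clustering pipeline compared in the paper can be approximated with off-the-shelf tools; scikit-learn's SpectralEmbedding implements Laplacian eigenmaps on a nearest-neighbour graph. The spectra array and all parameter values below are hypothetical stand-ins, not the study's settings.

      from sklearn.manifold import SpectralEmbedding
      from sklearn.cluster import KMeans

      def le_cluster(spectra, n_dims=3, n_clusters=3, seed=0):
          """Laplacian-eigenmap reduction followed by k-means, mirroring the
          LE + k-means pipeline evaluated in the paper.

          spectra : array (n_voxels, n_spectral_points) of MRS(I) spectra.
          """
          emb = SpectralEmbedding(n_components=n_dims, n_neighbors=10,
                                  random_state=seed).fit_transform(spectra)
          return KMeans(n_clusters=n_clusters, n_init=10,
                        random_state=seed).fit_predict(emb)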

  13. High-resolution three-dimensional imaging radar

    NASA Technical Reports Server (NTRS)

    Cooper, Ken B. (Inventor); Chattopadhyay, Goutam (Inventor); Siegel, Peter H. (Inventor); Dengler, Robert J. (Inventor); Schlecht, Erich T. (Inventor); Mehdi, Imran (Inventor); Skalare, Anders J. (Inventor)

    2010-01-01

    A three-dimensional imaging radar operating at a high frequency, e.g., 670 GHz, is disclosed. The active target illumination inherent in radar solves the problem of low signal power and narrow-band detection by using submillimeter heterodyne mixer receivers. A submillimeter imaging radar may use low phase-noise synthesizers and a fast chirper to generate a frequency-modulated continuous-wave (FMCW) waveform. Three-dimensional images are generated through range information derived for each pixel scanned over a target. A peak finding algorithm may be used in processing for each pixel to differentiate material layers of the target. Improved focusing is achieved through a compensation signal sampled from a point source calibration target and applied to received signals from active targets prior to FFT-based range compression to extract and display high-resolution target images. Such an imaging radar has particular application in detecting concealed weapons or contraband.
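
    The FFT-based range compression underlying such a radar is straightforward to sketch: for an FMCW chirp of bandwidth B and duration T, a beat frequency f maps to range R = c f T / (2B). The Python function below assumes an already de-chirped, sampled beat signal; it is a generic illustration, not the patented processing chain.

      import numpy as np

      def fmcw_range_profile(beat_signal, fs, bandwidth, chirp_time, c=3e8):
          """FFT-based range compression for one FMCW chirp.

          beat_signal : real samples of the de-chirped return, at rate fs.
          Peaks in the returned profile correspond to reflecting layers,
          which is what the per-pixel peak-finding step operates on.
          """
          n = len(beat_signal)
          spec = np.abs(np.fft.rfft(beat_signal * np.hanning(n)))  # windowed range FFT
          f_beat = np.fft.rfftfreq(n, d=1.0 / fs)                  # beat-frequency axis
          ranges = c * f_beat * chirp_time / (2.0 * bandwidth)     # map frequency to range
          return ranges, spec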

  14. Computational Performance of a Parallelized Three-Dimensional High-Order Spectral Element Toolbox

    NASA Astrophysics Data System (ADS)

    Bosshard, Christoph; Bouffanais, Roland; Clémençon, Christian; Deville, Michel O.; Fiétier, Nicolas; Gruber, Ralf; Kehtari, Sohrab; Keller, Vincent; Latt, Jonas

    In this paper, a comprehensive performance review of an MPI-based high-order three-dimensional spectral element method C++ toolbox is presented. The focus is on the evaluation of several performance aspects, with particular emphasis on parallel efficiency. The performance evaluation is analyzed with the help of a time prediction model based on a parameterization of the application and the hardware resources. A tailor-made CFD computation benchmark case is introduced and used to carry out this review, with particular attention to clusters with up to 8192 cores. Some problems in the parallel implementation have been detected and corrected. The theoretical complexities with respect to the number of elements, to the polynomial degree, and to communication needs are correctly reproduced. It is concluded that this type of code has a nearly perfect speed-up on machines with thousands of cores, and is ready to make the step to next-generation petaflop machines.

  15. Arbitrarily high-order time-stepping schemes based on the operator spectrum theory for high-dimensional nonlinear Klein-Gordon equations

    NASA Astrophysics Data System (ADS)

    Liu, Changying; Wu, Xinyuan

    2017-07-01

    In this paper we explore arbitrarily high-order Lagrange collocation-type time-stepping schemes for effectively solving high-dimensional nonlinear Klein-Gordon equations with different boundary conditions. We begin with one-dimensional periodic boundary problems and first formulate an abstract ordinary differential equation (ODE) on a suitable infinite-dimensional function space based on the operator spectrum theory. We then introduce an operator-variation-of-constants formula which is essential for the derivation of our arbitrarily high-order Lagrange collocation-type time-stepping schemes for the nonlinear abstract ODE. The nonlinear stability and convergence are rigorously analysed once the spatial differential operator is approximated by an appropriate positive semi-definite matrix under some suitable smoothness assumptions. With regard to the two-dimensional Dirichlet or Neumann boundary problems, our new time-stepping schemes coupled with discrete Fast Sine / Cosine Transformation can be applied to simulate the two-dimensional nonlinear Klein-Gordon equations effectively. All essential features of the methodology are present in the one- and two-dimensional cases, although the schemes analysed here extend equally well to the higher-dimensional case. The numerical simulation is implemented and the numerical results clearly demonstrate the advantage and effectiveness of our new schemes in comparison with the existing numerical methods for solving nonlinear Klein-Gordon equations in the literature.

  16. Interacting star clusters in the Large Magellanic Cloud. Overmerging problem solved by cluster group formation

    NASA Astrophysics Data System (ADS)

    Leon, Stéphane; Bergond, Gilles; Vallenari, Antonella

    1999-04-01

    We present the tidal tail distributions of a sample of candidate binary clusters located in the bar of the Large Magellanic Cloud (LMC). One isolated cluster, SL 268, is presented in order to study the effect of the LMC tidal field. All the candidate binary clusters show tidal tails, confirming that the pairs are formed by physically linked objects. The stellar mass in the tails covers a large range, from 1.8x10^3 to 3x10^4 solar masses. We derive a total mass estimate for SL 268 and SL 356. At large radii, the projected density profiles of SL 268 and SL 356 fall off as r^(-gamma), with gamma = 2.27 and gamma = 3.44, respectively. Out of 4 pairs or multiple systems, 2 are older than the theoretical survival time of binary clusters (ranging from a few 10^6 years to 10^8 years). One pair shows too large an age difference between the components to be consistent with classical theoretical models of binary cluster formation (Fujimoto & Kumai 1997). We refer to this as the "overmerging" problem. A different scenario is proposed: the formation proceeds in large molecular complexes giving birth to groups of clusters over a few 10^7 years. In these groups the expected cluster encounter rate is larger, and tidal capture has a higher probability. Cluster pairs are not born together through the splitting of the parent cloud, but are formed later by tidal capture. For 3 pairs, we tentatively identify the star cluster group (SCG) memberships. SCG formation, through the recent cluster starburst triggered by the LMC-SMC encounter, in contrast with the quiescent open cluster formation in the Milky Way, can be an explanation for the paucity of binary clusters observed in our Galaxy. Based on observations collected at the European Southern Observatory, La Silla, Chile.

  17. External Boundary Conditions for Three-Dimensional Problems of Computational Aerodynamics

    NASA Technical Reports Server (NTRS)

    Tsynkov, Semyon V.

    1997-01-01

    We consider an unbounded steady-state flow of viscous fluid over a three-dimensional finite body or configuration of bodies. For the purpose of solving this flow problem numerically, we discretize the governing equations (Navier-Stokes) on a finite-difference grid. The grid obviously cannot stretch from the body up to infinity, because the number of the discrete variables in that case would not be finite. Therefore, prior to the discretization we truncate the original unbounded flow domain by introducing some artificial computational boundary at a finite distance from the body. Typically, the artificial boundary is introduced in a natural way as the external boundary of the domain covered by the grid. The flow problem formulated only on the finite computational domain rather than on the original infinite domain is clearly subdefinite unless some artificial boundary conditions (ABC's) are specified at the external computational boundary. Similarly, the discretized flow problem is subdefinite (i.e., lacks equations with respect to unknowns) unless a special closing procedure is implemented at this artificial boundary. The closing procedure in the discrete case is called the ABC's as well. In this paper, we present an innovative approach to constructing highly accurate ABC's for three-dimensional flow computations. The approach extends our previous technique developed for the two-dimensional case; it employs the finite-difference counterparts to Calderon's pseudodifferential boundary projections calculated in the framework of the difference potentials method (DPM) by Ryaben'kii. The resulting ABC's appear spatially nonlocal but particularly easy to implement along with the existing solvers. The new boundary conditions have been successfully combined with the NASA-developed production code TLNS3D and used for the analysis of wing-shaped configurations in subsonic (including incompressible limit) and transonic flow regimes. As demonstrated by the computational experiments

  18. Charge carrier localised in zero-dimensional (CH3NH3)3Bi2I9 clusters.

    PubMed

    Ni, Chengsheng; Hedley, Gordon; Payne, Julia; Svrcek, Vladimir; McDonald, Calum; Jagadamma, Lethy Krishnan; Edwards, Paul; Martin, Robert; Jain, Gunisha; Carolan, Darragh; Mariotti, Davide; Maguire, Paul; Samuel, Ifor; Irvine, John

    2017-08-01

    A metal-organic hybrid perovskite (CH3NH3PbI3) with a three-dimensional framework of metal-halide octahedra has been reported as a low-cost, solution-processable absorber for a thin-film solar cell with a power-conversion efficiency over 20%. Low-dimensional layered perovskites with metal halide slabs separated by insulating organic layers are reported to show higher stability, but the efficiencies of the solar cells are limited by the confinement of excitons. In order to explore the confinement and transport of excitons in zero-dimensional metal-organic hybrid materials, a highly orientated film of (CH3NH3)3Bi2I9 with nanometre-sized core clusters of Bi2I9^3- surrounded by insulating CH3NH3+ was prepared via solution processing. The (CH3NH3)3Bi2I9 film shows highly anisotropic photoluminescence emission and excitation due to the large proportion of localised excitons coupled with delocalised excitons from intercluster energy transfer. The abrupt increase in photoluminescence quantum yield at excitation energies above twice the band gap could indicate quantum cutting due to the low dimensionality. Understanding the confinement and transport of excitons in low-dimensional systems will aid the development of next-generation photovoltaics. Via photophysical studies, Ni et al. observe 'quantum cutting' in 0D metal-organic hybrid materials based on the methylammonium bismuth halide (CH3NH3)3Bi2I9.

  19. High Dimensional Classification Using Features Annealed Independence Rules.

    PubMed

    Fan, Jianqing; Fan, Yingying

    2008-01-01

    Classification using high-dimensional features arises frequently in many contemporary statistical studies such as tumor classification using microarray or other high-throughput data. The impact of dimensionality on classification is poorly understood. In a seminal paper, Bickel and Levina (2004) show that the Fisher discriminant performs poorly due to diverging spectra and they propose to use the independence rule to overcome the problem. We first demonstrate that even for the independence classification rule, classification using all the features can be as bad as random guessing due to noise accumulation in estimating population centroids in high-dimensional feature space. In fact, we demonstrate further that almost all linear discriminants can perform as badly as random guessing. Thus, it is of paramount importance to select a subset of important features for high-dimensional classification, resulting in Features Annealed Independence Rules (FAIR). The conditions under which all the important features can be selected by the two-sample t-statistic are established. The choice of the optimal number of features, or equivalently, the threshold value of the test statistics is proposed based on an upper bound of the classification error. Simulation studies and real data analysis support our theoretical results and demonstrate convincingly the advantage of our new classification procedure.
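
    A compact Python sketch of the FAIR recipe for two classes: rank features by the absolute two-sample t-statistic, keep the strongest, and classify with a diagonal (independence) centroid rule. The paper's data-driven choice of the number of features from an error bound is not reproduced; here n_features is simply passed in.

      import numpy as np

      def fair_select(X, y, n_features):
          """Features Annealed Independence Rule sketch for y in {0, 1}."""
          X0, X1 = X[y == 0], X[y == 1]
          se = np.sqrt(X0.var(0, ddof=1) / len(X0) + X1.var(0, ddof=1) / len(X1))
          t = (X1.mean(0) - X0.mean(0)) / np.maximum(se, 1e-12)  # two-sample t-statistics
          keep = np.argsort(np.abs(t))[::-1][:n_features]        # strongest features survive

          def classify(Xnew):
              m0, m1 = X0[:, keep].mean(0), X1[:, keep].mean(0)
              s2 = X[:, keep].var(0, ddof=1)                     # pooled per-feature variance
              d0 = (((Xnew[:, keep] - m0) ** 2) / s2).sum(1)     # diagonal Mahalanobis distances
              d1 = (((Xnew[:, keep] - m1) ** 2) / s2).sum(1)
              return (d1 < d0).astype(int)

          return keep, classify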

  20. The dimension split element-free Galerkin method for three-dimensional potential problems

    NASA Astrophysics Data System (ADS)

    Meng, Z. J.; Cheng, H.; Ma, L. D.; Cheng, Y. M.

    2018-06-01

    This paper presents the dimension split element-free Galerkin (DSEFG) method for three-dimensional potential problems, and the corresponding formulae are obtained. The main idea of the DSEFG method is that a three-dimensional potential problem can be transformed into a series of two-dimensional problems. For these two-dimensional problems, the improved moving least-squares (IMLS) approximation is applied to construct the shape function, which uses an orthogonal function system with a weight function as the basis functions. The Galerkin weak form is applied to obtain a discretized system equation, and the penalty method is employed to impose the essential boundary condition. The finite difference method is selected in the splitting direction. For the purposes of demonstration, some selected numerical examples are solved using the DSEFG method. The convergence study and error analysis of the DSEFG method are presented. The numerical examples show that the DSEFG method has greater computational precision and computational efficiency than the IEFG method.

  1. Exact solution of three-dimensional transport problems using one-dimensional models. [in semiconductor devices

    NASA Technical Reports Server (NTRS)

    Misiakos, K.; Lindholm, F. A.

    1986-01-01

    Several parameters of certain three-dimensional semiconductor devices including diodes, transistors, and solar cells can be determined without solving the actual boundary-value problem. The recombination current, transit time, and open-circuit voltage of planar diodes are emphasized here. The resulting analytical expressions enable determination of the surface recombination velocity of shallow planar diodes. The method involves introducing corresponding one-dimensional models having the same values of these parameters.

  2. Finite-dimensional integrable systems: A collection of research problems

    NASA Astrophysics Data System (ADS)

    Bolsinov, A. V.; Izosimov, A. M.; Tsonev, D. M.

    2017-05-01

    This article suggests a series of problems related to various algebraic and geometric aspects of integrability. They reflect some recent developments in the theory of finite-dimensional integrable systems such as bi-Poisson linear algebra, Jordan-Kronecker invariants of finite dimensional Lie algebras, the interplay between singularities of Lagrangian fibrations and compatible Poisson brackets, and new techniques in projective geometry.

  3. Robust continuous clustering

    PubMed Central

    Shah, Sohil Atul

    2017-01-01

    Clustering is a fundamental procedure in the analysis of scientific data. It is used ubiquitously across the sciences. Despite decades of research, existing clustering algorithms have limited effectiveness in high dimensions and often require tuning parameters for different domains and datasets. We present a clustering algorithm that achieves high accuracy across multiple domains and scales efficiently to high dimensions and large datasets. The presented algorithm optimizes a smooth continuous objective, which is based on robust statistics and allows heavily mixed clusters to be untangled. The continuous nature of the objective also allows clustering to be integrated as a module in end-to-end feature learning pipelines. We demonstrate this by extending the algorithm to perform joint clustering and dimensionality reduction by efficiently optimizing a continuous global objective. The presented approach is evaluated on large datasets of faces, hand-written digits, objects, newswire articles, sensor readings from the Space Shuttle, and protein expression levels. Our method achieves high accuracy across all datasets, outperforming the best prior algorithm by a factor of 3 in average rank. PMID:28851838

  4. Asymptotics of empirical eigenstructure for high dimensional spiked covariance.

    PubMed

    Wang, Weichen; Fan, Jianqing

    2017-06-01

    We derive the asymptotic distributions of the spiked eigenvalues and eigenvectors under a generalized and unified asymptotic regime, which takes into account the magnitude of spiked eigenvalues, sample size, and dimensionality. This regime allows high dimensionality and diverging eigenvalues and provides new insights into the roles that the leading eigenvalues, sample size, and dimensionality play in principal component analysis. Our results are a natural extension of those in Paul (2007) to a more general setting and solve the rates of convergence problems in Shen et al. (2013). They also reveal the biases of estimating leading eigenvalues and eigenvectors by using principal component analysis, and lead to a new covariance estimator for the approximate factor model, called shrinkage principal orthogonal complement thresholding (S-POET), that corrects the biases. Our results are successfully applied to outstanding problems in estimation of risks of large portfolios and false discovery proportions for dependent test statistics and are illustrated by simulation studies.

  5. Asymptotics of empirical eigenstructure for high dimensional spiked covariance

    PubMed Central

    Wang, Weichen

    2017-01-01

    We derive the asymptotic distributions of the spiked eigenvalues and eigenvectors under a generalized and unified asymptotic regime, which takes into account the magnitude of spiked eigenvalues, sample size, and dimensionality. This regime allows high dimensionality and diverging eigenvalues and provides new insights into the roles that the leading eigenvalues, sample size, and dimensionality play in principal component analysis. Our results are a natural extension of those in Paul (2007) to a more general setting and solve the rates of convergence problems in Shen et al. (2013). They also reveal the biases of estimating leading eigenvalues and eigenvectors by using principal component analysis, and lead to a new covariance estimator for the approximate factor model, called shrinkage principal orthogonal complement thresholding (S-POET), that corrects the biases. Our results are successfully applied to outstanding problems in estimation of risks of large portfolios and false discovery proportions for dependent test statistics and are illustrated by simulation studies. PMID:28835726

  6. AMOEBA clustering revisited. [cluster analysis, classification, and image display program

    NASA Technical Reports Server (NTRS)

    Bryant, Jack

    1990-01-01

    A description of the clustering, classification, and image display program AMOEBA is presented. Using a difficult high resolution aircraft-acquired MSS image, the steps the program takes in forming clusters are traced. A number of new features are described here for the first time. Usage of the program is discussed. The theoretical foundation (the underlying mathematical model) is briefly presented. The program can handle images of any size and dimensionality.

  7. Three-dimensional discrete-time Lotka-Volterra models with an application to industrial clusters

    NASA Astrophysics Data System (ADS)

    Bischi, G. I.; Tramontana, F.

    2010-10-01

    We consider a three-dimensional discrete dynamical system that describes an application to economics of a generalization of the Lotka-Volterra prey-predator model. The dynamic model proposed is used to describe the interactions among industrial clusters (or districts), following a suggestion given by [23]. After studying some local and global properties and bifurcations in bidimensional Lotka-Volterra maps, by numerical explorations we show how some of them can be extended to their three-dimensional counterparts, even if their analytic and geometric characterization becomes much more difficult and challenging. We also show a global bifurcation of the three-dimensional system that has no two-dimensional analogue. Besides the particular economic application considered, the study of the discrete version of Lotka-Volterra dynamical systems turns out to be a quite rich and interesting topic by itself, i.e. from a purely mathematical point of view.
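
    For concreteness, a generic three-dimensional discrete Lotka-Volterra-type map can be iterated as below; the quadratic form and the interaction matrix A are illustrative assumptions rather than the paper's exact model.

      import numpy as np

      def iterate_lv3(x0, r, A, n_steps=1000):
          """Iterate x_{t+1,i} = x_{t,i} * (1 + r_i + (A x_t)_i), a generic
          3D discrete Lotka-Volterra-type map. The sign pattern of the 3x3
          matrix A encodes competition or predation among the three clusters.
          """
          x = np.asarray(x0, dtype=float)
          orbit = np.empty((n_steps + 1, 3))
          orbit[0] = x
          for t in range(n_steps):
              x = x * (1.0 + r + A @ x)        # one step of the quadratic map
              orbit[t + 1] = x
          return orbit

    Plotting such orbits for different A is one way to reproduce the kind of numerical exploration of bifurcations the abstract describes.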

  8. An Adaptive ANOVA-based PCKF for High-Dimensional Nonlinear Inverse Modeling

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    LI, Weixuan; Lin, Guang; Zhang, Dongxiao

    2014-02-01

    The probabilistic collocation-based Kalman filter (PCKF) is a recently developed approach for solving inverse problems. It resembles the ensemble Kalman filter (EnKF) in every aspect, except that it represents and propagates model uncertainty by polynomial chaos expansion (PCE) instead of an ensemble of model realizations. Previous studies have shown PCKF is a more efficient alternative to EnKF for many data assimilation problems. However, the accuracy and efficiency of PCKF depends on an appropriate truncation of the PCE series. Having more polynomial chaos bases in the expansion helps to capture uncertainty more accurately but increases computational cost. Basis selection is particularly important for high-dimensional stochastic problems because the number of polynomial chaos bases required to represent model uncertainty grows dramatically as the number of input parameters (random dimensions) increases. In classic PCKF algorithms, the PCE bases are pre-set based on users’ experience. Also, for sequential data assimilation problems, the bases kept in PCE expression remain unchanged in different Kalman filter loops, which could limit the accuracy and computational efficiency of classic PCKF algorithms. To address this issue, we present a new algorithm that adaptively selects PCE bases for different problems and automatically adjusts the number of bases in different Kalman filter loops. The algorithm is based on adaptive functional ANOVA (analysis of variance) decomposition, which approximates a high-dimensional function with the summation of a set of low-dimensional functions. Thus, instead of expanding the original model into PCE, we implement the PCE expansion on these low-dimensional functions, which is much less costly. We also propose a new adaptive criterion for ANOVA that is more suited for solving inverse problems. The new algorithm was tested with different examples and demonstrated great effectiveness in comparison with non-adaptive PCKF and EnKF.

  9. An adaptive ANOVA-based PCKF for high-dimensional nonlinear inverse modeling

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Li, Weixuan, E-mail: weixuan.li@usc.edu; Lin, Guang, E-mail: guang.lin@pnnl.gov; Zhang, Dongxiao, E-mail: dxz@pku.edu.cn

    2014-02-01

    The probabilistic collocation-based Kalman filter (PCKF) is a recently developed approach for solving inverse problems. It resembles the ensemble Kalman filter (EnKF) in every aspect, except that it represents and propagates model uncertainty by polynomial chaos expansion (PCE) instead of an ensemble of model realizations. Previous studies have shown PCKF is a more efficient alternative to EnKF for many data assimilation problems. However, the accuracy and efficiency of PCKF depends on an appropriate truncation of the PCE series. Having more polynomial chaos basis functions in the expansion helps to capture uncertainty more accurately but increases computational cost. Selection of basis functions is particularly important for high-dimensional stochastic problems because the number of polynomial chaos basis functions required to represent model uncertainty grows dramatically as the number of input parameters (random dimensions) increases. In classic PCKF algorithms, the PCE basis functions are pre-set based on users' experience. Also, for sequential data assimilation problems, the basis functions kept in PCE expression remain unchanged in different Kalman filter loops, which could limit the accuracy and computational efficiency of classic PCKF algorithms. To address this issue, we present a new algorithm that adaptively selects PCE basis functions for different problems and automatically adjusts the number of basis functions in different Kalman filter loops. The algorithm is based on adaptive functional ANOVA (analysis of variance) decomposition, which approximates a high-dimensional function with the summation of a set of low-dimensional functions. Thus, instead of expanding the original model into PCE, we implement the PCE expansion on these low-dimensional functions, which is much less costly. We also propose a new adaptive criterion for ANOVA that is more suited for solving inverse problems. The new algorithm was tested with different examples and demonstrated great effectiveness in comparison with non-adaptive PCKF and EnKF.

  10. Correlation buildup during recrystallization in three-dimensional dusty plasma clusters

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Schella, André; Mulsow, Matthias; Melzer, André

    2014-05-15

    The recrystallization process of finite three-dimensional dust clouds after laser heating is studied experimentally. The time-dependent Coulomb coupling parameter is presented, showing that the recrystallization starts with an exponential cooling phase where cooling is slower than damping by the neutral gas friction. At later times, the coupling parameter oscillates into equilibrium. It is found that after recrystallization experiments a large fraction of the clusters are in metastable states. The temporal evolution of the correlation buildup shows that correlation occurs on an even slower time scale than cooling.

  11. Parallel Simulation of Three-Dimensional Free Surface Fluid Flow Problems

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    BAER,THOMAS A.; SACKINGER,PHILIP A.; SUBIA,SAMUEL R.

    1999-10-14

    Simulation of viscous three-dimensional fluid flow typically involves a large number of unknowns. When free surfaces are included, the number of unknowns increases dramatically. Consequently, this class of problem is an obvious application of parallel high performance computing. We describe parallel computation of viscous, incompressible, free surface, Newtonian fluid flow problems that include dynamic contact lines. The Galerkin finite element method was used to discretize the fully-coupled governing conservation equations and a ''pseudo-solid'' mesh mapping approach was used to determine the shape of the free surface. In this approach, the finite element mesh is allowed to deform to satisfy quasi-static solid mechanics equations subject to geometric or kinematic constraints on the boundaries. As a result, nodal displacements must be included in the set of unknowns. Other issues discussed are the proper constraints appearing along the dynamic contact line in three dimensions. Issues affecting efficient parallel simulations include problem decomposition to distribute computational work equally across an SPMD computer and determination of robust, scalable preconditioners for the distributed matrix systems that must be solved. Solution continuation strategies important for serial simulations have an enhanced relevance in a parallel computing environment due to the difficulty of solving large scale systems. Parallel computations will be demonstrated on an example taken from the coating flow industry: flow in the vicinity of a slot coater edge. This is a three dimensional free surface problem possessing a contact line that advances at the web speed in one region but transitions to static behavior in another region. As such, a significant fraction of the computational time is devoted to processing boundary data. Discussion focuses on parallel speed ups for fixed problem size, a class of problems of immediate practical importance.

  12. Surrogate modelling for the prediction of spatial fields based on simultaneous dimensionality reduction of high-dimensional input/output spaces.

    PubMed

    Crevillén-García, D

    2018-04-01

    Time-consuming numerical simulators for solving groundwater flow and dissolution models of physico-chemical processes in deep aquifers normally require some of the model inputs to be defined in high-dimensional spaces in order to return realistic results. Sometimes, the outputs of interest are spatial fields leading to high-dimensional output spaces. Although Gaussian process emulation has been satisfactorily used for computing faithful and inexpensive approximations of complex simulators, these have been mostly applied to problems defined in low-dimensional input spaces. In this paper, we propose a method for simultaneously reducing the dimensionality of very high-dimensional input and output spaces in Gaussian process emulators for stochastic partial differential equation models while retaining the qualitative features of the original models. This allows us to build a surrogate model for the prediction of spatial fields in such time-consuming simulators. We apply the methodology to a model of convection and dissolution processes occurring during carbon capture and storage.
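
    The overall shape of such a surrogate, compress the inputs, compress the outputs, and emulate the map between the reduced spaces, can be sketched with PCA standing in for whatever reduction the paper actually uses; all dimensions and names below are placeholders.

      import numpy as np
      from sklearn.decomposition import PCA
      from sklearn.gaussian_process import GaussianProcessRegressor
      from sklearn.gaussian_process.kernels import RBF

      def fit_field_emulator(X, Y, d_in=10, d_out=5):
          """Emulator sketch for spatial-field outputs: reduce input fields X
          (n, D_in) and output fields Y (n, D_out) with PCA, fit one GP per
          retained output component, and reconstruct predictions by inverting
          the output PCA.
          """
          pin, pout = PCA(n_components=d_in).fit(X), PCA(n_components=d_out).fit(Y)
          Z, W = pin.transform(X), pout.transform(Y)
          gps = [GaussianProcessRegressor(kernel=RBF(), normalize_y=True).fit(Z, W[:, j])
                 for j in range(d_out)]

          def predict(Xnew):
              Znew = pin.transform(Xnew)
              Wnew = np.column_stack([gp.predict(Znew) for gp in gps])
              return pout.inverse_transform(Wnew)     # back to the full spatial field

          return predict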

  13. Visual exploration of high-dimensional data through subspace analysis and dynamic projections

    DOE PAGES

    Liu, S.; Wang, B.; Thiagarajan, J. J.; ...

    2015-06-01

    Here, we introduce a novel interactive framework for visualizing and exploring high-dimensional datasets based on subspace analysis and dynamic projections. We assume the high-dimensional dataset can be represented by a mixture of low-dimensional linear subspaces with mixed dimensions, and provide a method to reliably estimate the intrinsic dimension and linear basis of each subspace extracted from the subspace clustering. Subsequently, we use these bases to define unique 2D linear projections as viewpoints from which to visualize the data. To understand the relationships among the different projections and to discover hidden patterns, we connect these projections through dynamic projections that create smooth animated transitions between pairs of projections. We introduce the view transition graph, which provides flexible navigation among these projections to facilitate an intuitive exploration. Finally, we provide detailed comparisons with related systems, and use real-world examples to demonstrate the novelty and usability of our proposed framework.

  14. Visual Exploration of High-Dimensional Data through Subspace Analysis and Dynamic Projections

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Liu, S.; Wang, B.; Thiagarajan, Jayaraman J.

    2015-06-01

    We introduce a novel interactive framework for visualizing and exploring high-dimensional datasets based on subspace analysis and dynamic projections. We assume the high-dimensional dataset can be represented by a mixture of low-dimensional linear subspaces with mixed dimensions, and provide a method to reliably estimate the intrinsic dimension and linear basis of each subspace extracted from the subspace clustering. Subsequently, we use these bases to define unique 2D linear projections as viewpoints from which to visualize the data. To understand the relationships among the different projections and to discover hidden patterns, we connect these projections through dynamic projections that create smooth animated transitions between pairs of projections. We introduce the view transition graph, which provides flexible navigation among these projections to facilitate an intuitive exploration. Finally, we provide detailed comparisons with related systems, and use real-world examples to demonstrate the novelty and usability of our proposed framework.

  15. High dimensional biological data retrieval optimization with NoSQL technology.

    PubMed

    Wang, Shicai; Pandis, Ioannis; Wu, Chao; He, Sijin; Johnson, David; Emam, Ibrahim; Guitton, Florian; Guo, Yike

    2014-01-01

    High-throughput transcriptomic data generated by microarray experiments is the most abundant and frequently stored kind of data currently used in translational medicine studies. Although microarray data is supported in data warehouses such as tranSMART, queries against relational databases for hundreds of different patient gene expression records are slow. Non-relational data models, such as the key-value model implemented in NoSQL databases, hold promise to be more performant solutions. Our motivation is to improve the performance of the tranSMART data warehouse with a view to supporting Next Generation Sequencing data. In this paper we introduce a new data model better suited for high-dimensional data storage and querying, optimized for database scalability and performance. We have designed a key-value pair data model to support faster queries over large-scale microarray data and implemented the model using HBase, an implementation of Google's BigTable storage system. An experimental performance comparison was carried out against the traditional relational data model implemented in both MySQL Cluster and MongoDB, using a large publicly available transcriptomic data set taken from NCBI GEO concerning Multiple Myeloma. Our new key-value data model implemented on HBase exhibits an average 5.24-fold increase in high-dimensional biological data query performance compared to the relational model implemented on MySQL Cluster, and an average 6.47-fold increase in query performance compared to MongoDB. The performance evaluation found that the new key-value data model, in particular its implementation in HBase, outperforms the relational model currently implemented in tranSMART. We propose that NoSQL technology holds great promise for large-scale data management, in particular for high-dimensional biological data such as that demonstrated in the performance evaluation described in this paper. We aim to use this new data model as a basis for migrating
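
    The key idea of the key-value model is easiest to see in a tiny sketch. A composite row key of the form gene|patient lets gene-major scans pull every expression value for a gene without relational joins; in this Python sketch a plain dict stands in for the HBase table, and the key layout is an illustrative assumption, not the paper's exact schema.

      def expression_key(gene, patient):
          """Illustrative composite row key: one row per (gene, patient) pair."""
          return f"{gene}|{patient}"

      table = {}                                         # stand-in for the HBase table
      table[expression_key("TP53", "patient-042")] = 7.81  # normalized expression value
      table[expression_key("TP53", "patient-043")] = 6.12

      prefix = "TP53|"
      tp53_rows = {k: v for k, v in table.items() if k.startswith(prefix)}  # gene-major scan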

  16. High dimensional biological data retrieval optimization with NoSQL technology

    PubMed Central

    2014-01-01

    Background High-throughput transcriptomic data generated by microarray experiments is the most abundant and frequently stored kind of data currently used in translational medicine studies. Although microarray data is supported in data warehouses such as tranSMART, queries against relational databases for hundreds of different patient gene expression records are slow. Non-relational data models, such as the key-value model implemented in NoSQL databases, hold promise to be more performant solutions. Our motivation is to improve the performance of the tranSMART data warehouse with a view to supporting Next Generation Sequencing data. Results In this paper we introduce a new data model better suited for high-dimensional data storage and querying, optimized for database scalability and performance. We have designed a key-value pair data model to support faster queries over large-scale microarray data and implemented the model using HBase, an implementation of Google's BigTable storage system. An experimental performance comparison was carried out against the traditional relational data model implemented in both MySQL Cluster and MongoDB, using a large publicly available transcriptomic data set taken from NCBI GEO concerning Multiple Myeloma. Our new key-value data model implemented on HBase exhibits an average 5.24-fold increase in high-dimensional biological data query performance compared to the relational model implemented on MySQL Cluster, and an average 6.47-fold increase in query performance compared to MongoDB. Conclusions The performance evaluation found that the new key-value data model, in particular its implementation in HBase, outperforms the relational model currently implemented in tranSMART. We propose that NoSQL technology holds great promise for large-scale data management, in particular for high-dimensional biological data such as that demonstrated in the performance evaluation described in this paper. We aim to use this new data

  17. Children's Strategies for Solving Two- and Three-Dimensional Combinatorial Problems.

    ERIC Educational Resources Information Center

    English, Lyn D.

    1993-01-01

    Investigated strategies that 7- to 12-year-old children (n=96) spontaneously applied in solving novel combinatorial problems. With experience in solving two-dimensional problems, children were able to refine their strategies and adapt them to three dimensions. Results on some problems indicated significant effects of age. (Contains 32 references.)…

  18. Detection of Subtle Context-Dependent Model Inaccuracies in High-Dimensional Robot Domains.

    PubMed

    Mendoza, Juan Pablo; Simmons, Reid; Veloso, Manuela

    2016-12-01

    Autonomous robots often rely on models of their sensing and actions for intelligent decision making. However, when operating in unconstrained environments, the complexity of the world makes it infeasible to create models that are accurate in every situation. This article addresses the problem of using potentially large and high-dimensional sets of robot execution data to detect situations in which a robot model is inaccurate; that is, detecting context-dependent model inaccuracies in a high-dimensional context space. To find inaccuracies tractably, the robot conducts an informed search through low-dimensional projections of execution data to find parametric Regions of Inaccurate Modeling (RIMs). Empirical evidence from two robot domains shows that this approach significantly enhances the detection power of existing RIM-detection algorithms in high-dimensional spaces.
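
    A minimal sketch of the underlying idea: scan low-dimensional projections of execution data for regions where the model's residuals deviate significantly. The grid partition and z-score test below are assumptions standing in for the authors' RIM parameterization and informed search.

```python
# Sketch: scan 2-D projections of execution contexts for grid cells
# whose mean model residual deviates from zero. The grid partition and
# z-score test are assumptions standing in for the RIM search.
import itertools
import numpy as np

def find_inaccurate_regions(X, residuals, bins=5, z_thresh=3.0, min_pts=5):
    """X: (n, d) execution contexts; residuals: (n,) model errors."""
    sigma = residuals.std() + 1e-12
    flagged = []
    for i, j in itertools.combinations(range(X.shape[1]), 2):
        xi = np.digitize(X[:, i], np.linspace(X[:, i].min(), X[:, i].max(), bins))
        xj = np.digitize(X[:, j], np.linspace(X[:, j].min(), X[:, j].max(), bins))
        for a, b in set(zip(xi, xj)):
            mask = (xi == a) & (xj == b)
            if mask.sum() < min_pts:
                continue
            # z-score of the cell's mean residual under the global spread
            z = residuals[mask].mean() / (sigma / np.sqrt(mask.sum()))
            if abs(z) > z_thresh:
                flagged.append(((i, j), (a, b), round(float(z), 1)))
    return flagged

rng = np.random.default_rng(0)
X = rng.uniform(-1.0, 1.0, size=(2000, 6))
res = rng.normal(0.0, 0.1, size=2000)
res[(X[:, 0] > 0.5) & (X[:, 2] > 0.5)] += 0.8  # hidden context-dependent error
print(find_inaccurate_regions(X, res)[:3])      # cells in the (0, 2) projection
```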

  19. The Heterogeneous P-Median Problem for Categorization Based Clustering

    ERIC Educational Resources Information Center

    Blanchard, Simon J.; Aloise, Daniel; DeSarbo, Wayne S.

    2012-01-01

    The p-median offers an alternative to centroid-based clustering algorithms for identifying unobserved categories. However, existing p-median formulations typically require data aggregation into a single proximity matrix, resulting in masked respondent heterogeneity. A proposed three-way formulation of the p-median problem explicitly considers…
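
    For reference, the classical p-median objective that the article generalizes selects p objects as medians so that the summed distance of every object to its nearest median is minimal. A greedy heuristic sketch of that base problem (not the proposed three-way formulation):

```python
# Greedy heuristic for the classical p-median objective: choose p
# objects (medians) minimizing the summed distance of every object to
# its nearest median. This is the base problem only, not the article's
# heterogeneous three-way formulation.
import numpy as np

def greedy_p_median(D, p):
    """D: (n, n) symmetric proximity matrix. Returns median indices."""
    n = D.shape[0]
    medians = []
    nearest = np.full(n, np.inf)  # distance of each object to its nearest median
    for _ in range(p):
        # Total assignment cost if candidate j were added next.
        costs = [np.minimum(nearest, D[:, j]).sum() for j in range(n)]
        j_star = int(np.argmin(costs))
        medians.append(j_star)
        nearest = np.minimum(nearest, D[:, j_star])
    return medians, nearest

rng = np.random.default_rng(1)
pts = rng.normal(size=(60, 2))
D = np.linalg.norm(pts[:, None] - pts[None, :], axis=-1)
medians, nearest = greedy_p_median(D, p=3)
print(medians, round(float(nearest.sum()), 2))  # total assignment distance
```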

  20. Cooling and clusters: when is heating needed?

    PubMed

    Bryan, Greg; Voit, Mark

    2005-03-15

    There are (at least) two unsolved problems concerning the current state of the thermal gas in clusters of galaxies. The first is to identify the source of the heating which offsets cooling in the centres of clusters with short cooling times (the 'cooling-flow' problem). The second is to understand the mechanism which boosts the entropy in cluster and group gas. Since both of these problems involve an unknown source of heating, it is tempting to identify them with the same process, particularly since active galactic nuclei heating is observed to be operating at some level in a sample of well-observed 'cooling-flow' clusters. Here we show, using numerical simulations of cluster formation, that much of the gas ending up in clusters cools at high redshift, and so the heating is also needed at high redshift, well before the cluster forms. This indicates that the same process operating to solve the cooling-flow problem may not also resolve the cluster-entropy problem.

  1. Beyond Low-Rank Representations: Orthogonal clustering basis reconstruction with optimized graph structure for multi-view spectral clustering.

    PubMed

    Wang, Yang; Wu, Lin

    2018-07-01

    Low-Rank Representation (LRR) is arguably one of the most powerful paradigms for multi-view spectral clustering: it elegantly encodes the multi-view local graph/manifold structures into an intrinsic low-rank self-expressive data similarity embedded in high-dimensional space, to yield a better graph partition than single-view counterparts. In this paper we revisit it from a fundamentally different perspective by discovering that LRR is essentially a latent clustered orthogonal projection based representation combined with an optimized local graph structure for spectral clustering; each column of the representation is fundamentally a cluster basis orthogonal to the others to indicate its members, which intuitively projects the view-specific feature representation onto the space spanned by all orthogonal basis vectors to characterize the cluster structures. Upon this finding, we propose our technique with the following: (1) we decompose LRR into a latent clustered orthogonal representation via low-rank matrix factorization, to encode more flexible cluster structures than LRR over primal data objects; (2) we convert the problem of LRR into that of simultaneously learning the orthogonal clustered representation and an optimized local graph structure for each view; (3) the learned orthogonal clustered representations and local graph structures enjoy the same magnitude for multi-view, so that the ideal multi-view consensus can be readily achieved. The experiments over multi-view datasets validate its superiority, especially over recent state-of-the-art LRR models. Copyright © 2018 Elsevier Ltd. All rights reserved.

  2. A Dimensionality Reduction-Based Multi-Step Clustering Method for Robust Vessel Trajectory Analysis

    PubMed Central

    Liu, Jingxian; Wu, Kefeng

    2017-01-01

    The Shipboard Automatic Identification System (AIS) is crucial for navigation safety and maritime surveillance, and data mining and pattern analysis of AIS information have attracted considerable attention in terms of both basic research and practical applications. Clustering of spatio-temporal AIS trajectories can be used to identify abnormal patterns and mine customary route data for transportation safety; the capacities of navigation safety and maritime traffic monitoring could thereby be enhanced. However, trajectory clustering is often sensitive to undesirable outliers and is essentially more complex than traditional point clustering. To overcome this limitation, a multi-step trajectory clustering method is proposed in this paper for robust AIS trajectory clustering. In particular, Dynamic Time Warping (DTW), a similarity measurement method, is introduced in the first step to measure the distances between different trajectories. The calculated distances, inversely proportional to the similarities, constitute a distance matrix in the second step. Furthermore, Principal Component Analysis (PCA), a widely used dimensionality reduction method, is exploited to decompose the obtained distance matrix; the top k principal components with a cumulative contribution rate above 95% are extracted, which fixes the number of centers k. The k centers are found by an improved automatic center selection algorithm. In the last step, the improved center clustering algorithm with k clusters is applied to the distance matrix to obtain the final AIS trajectory clustering results. To improve the accuracy of the proposed multi-step clustering algorithm, an automatic algorithm for choosing the k clusters is developed according to the similarity distance. Numerous experiments on realistic AIS trajectory datasets in the bridge area waterway and Mississippi River have been implemented to compare our proposed method with
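
    A compact sketch of this pipeline, with a k-medoids-style update standing in for the paper's improved center clustering algorithm; the 95% cumulative-variance rule for choosing k follows the abstract, while the rest of the details are assumptions.

```python
import numpy as np

def dtw(a, b):
    """Classic O(len(a) * len(b)) dynamic-time-warping distance."""
    D = np.full((len(a) + 1, len(b) + 1), np.inf)
    D[0, 0] = 0.0
    for i in range(1, len(a) + 1):
        for j in range(1, len(b) + 1):
            cost = np.linalg.norm(a[i - 1] - b[j - 1])
            D[i, j] = cost + min(D[i - 1, j], D[i, j - 1], D[i - 1, j - 1])
    return D[-1, -1]

def cluster_trajectories(trajs, max_iter=20, seed=0):
    n = len(trajs)
    D = np.array([[dtw(ti, tj) for tj in trajs] for ti in trajs])
    # k = number of leading principal components of the centered distance
    # matrix explaining at least 95% of the variance (per the abstract).
    C = D - D.mean(axis=0)
    ev = np.sort(np.linalg.eigvalsh(C.T @ C))[::-1]
    k = int(np.searchsorted(np.cumsum(ev) / ev.sum(), 0.95)) + 1
    # k-medoids-style refinement on the distance matrix (an assumption).
    medoids = list(np.random.default_rng(seed).choice(n, k, replace=False))
    for _ in range(max_iter):
        labels = np.argmin(D[:, medoids], axis=1)
        new = list(medoids)
        for c in range(k):
            members = np.where(labels == c)[0]
            if len(members):
                new[c] = int(members[np.argmin(D[np.ix_(members, members)].sum(axis=0))])
        if new == medoids:
            break
        medoids = new
    return labels, medoids

rng = np.random.default_rng(2)
trajs = [np.cumsum(rng.normal(0, 0.1, (30, 2)), axis=0) + (0 if i < 5 else 5)
         for i in range(10)]  # two synthetic route families
labels, medoids = cluster_trajectories(trajs)
print(labels, medoids)
```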

  3. Multigrid one shot methods for optimal control problems: Infinite dimensional control

    NASA Technical Reports Server (NTRS)

    Arian, Eyal; Taasan, Shlomo

    1994-01-01

    The multigrid one shot method for optimal control problems, governed by elliptic systems, is introduced for the infinite dimensional control space. In this case, the control variable is a function whose discrete representation involves an increasing number of variables with grid refinement. The minimization algorithm uses Lagrange multipliers to calculate sensitivity gradients. A preconditioned gradient descent algorithm is accelerated by a set of coarse grids. It optimizes for different scales in the representation of the control variable on different discretization levels. An analysis which reduces the problem to the boundary is introduced. It is used to approximate the two level asymptotic convergence rate, to determine the amplitude of the minimization steps, and to choose a high pass filter to be used when necessary. The effectiveness of the method is demonstrated on a series of test problems. The new method enables the solution of optimal control problems at a cost of only a few times that of solving the corresponding analysis problems.

  4. CLUMP-3D: Three-dimensional Shape and Structure of 20 CLASH Galaxy Clusters from Combined Weak and Strong Lensing

    NASA Astrophysics Data System (ADS)

    Chiu, I.-Non; Umetsu, Keiichi; Sereno, Mauro; Ettori, Stefano; Meneghetti, Massimo; Merten, Julian; Sayers, Jack; Zitrin, Adi

    2018-06-01

    We perform a three-dimensional triaxial analysis of 16 X-ray regular and 4 high-magnification galaxy clusters selected from the CLASH survey by combining two-dimensional weak-lensing and central strong-lensing constraints. In a Bayesian framework, we constrain the intrinsic structure and geometry of each individual cluster assuming a triaxial Navarro-Frenk-White halo with arbitrary orientations, characterized by the mass $M_{200c}$, halo concentration $c_{200c}$, and triaxial axis ratios ($q_a \le q_b$), and investigate scaling relations between these halo structural parameters. From triaxial modeling of the X-ray-selected subsample, we find that the halo concentration decreases with increasing cluster mass, with a mean concentration of $c_{200c} = 4.82 \pm 0.30$ at the pivot mass $M_{200c} = 10^{15} M_\odot h^{-1}$. This is consistent with the result from spherical modeling, $c_{200c} = 4.51 \pm 0.14$. Independently of the priors, the minor-to-major axis ratio $q_a$ of our full sample exhibits a clear deviation from the spherical configuration ($q_a = 0.52 \pm 0.04$ at $10^{15} M_\odot h^{-1}$ with uniform priors), with a weak dependence on the cluster mass. Combining all 20 clusters, we obtain a joint ensemble constraint on the minor-to-major axis ratio of $q_a = 0.652^{+0.162}_{-0.078}$ and a lower bound on the intermediate-to-major axis ratio of $q_b > 0.63$ at the 2σ level from an analysis with uniform priors. Assuming priors on the axis ratios derived from numerical simulations, we constrain the degree of triaxiality for the full sample to be $\mathcal{T} = 0.79 \pm 0.03$ at $10^{15} M_\odot h^{-1}$, indicating a preference for a prolate geometry of cluster halos. We find no statistical evidence for an orientation bias ($f_{\rm geo} = 0.93 \pm 0.07$), which is insensitive to the priors and in agreement with the theoretical expectation for the CLASH clusters.

  5. Clustering and Dimensionality Reduction to Discover Interesting Patterns in Binary Data

    NASA Astrophysics Data System (ADS)

    Palumbo, Francesco; D'Enza, Alfonso Iodice

    The attention towards binary data coding has increased consistently over the last decade for several reasons. The analysis of binary data characterizes several fields of application, such as market basket analysis, DNA microarray data, image mining, text mining and web-clickstream mining. The paper illustrates two different approaches exploiting a profitable combination of clustering and dimensionality reduction for the identification of non-trivial association structures in binary data. An application in the Association Rules framework supports the theory with empirical evidence.

  6. Three-dimensional finite element analysis for high velocity impact. [of projectiles from space debris

    NASA Technical Reports Server (NTRS)

    Chan, S. T. K.; Lee, C. H.; Brashears, M. R.

    1975-01-01

    A finite element algorithm for solving unsteady, three-dimensional high velocity impact problems is presented. A computer program was developed based on the Eulerian hydroelasto-viscoplastic formulation and the utilization of the theorem of weak solutions. The equations solved consist of conservation of mass, momentum, and energy, equation of state, and appropriate constitutive equations. The solution technique is a time-dependent finite element analysis utilizing three-dimensional isoparametric elements, in conjunction with a generalized two-step time integration scheme. The developed code was demonstrated by solving one-dimensional as well as three-dimensional impact problems for both the inviscid hydrodynamic model and the hydroelasto-viscoplastic model.

  7. Three-dimensional cluster formation and structure in heterogeneous dose distribution of intensity modulated radiation therapy.

    PubMed

    Chao, Ming; Wei, Jie; Narayanasamy, Ganesh; Yuan, Yading; Lo, Yeh-Chi; Peñagarícano, José A

    2018-05-01

    To investigate three-dimensional cluster structure and its correlation to clinical endpoint in heterogeneous dose distributions from intensity modulated radiation therapy. Twenty-five clinical plans from twenty-one head and neck (HN) patients were used for a phenomenological study of the cluster structure formed from the dose distributions of organs at risk (OARs) close to the planning target volumes (PTVs). Initially, OAR clusters were searched to examine the pattern consistency among ten HN patients and five clinically similar plans from another HN patient. Second, clusters of the esophagus from another ten HN patients were scrutinized to correlate their sizes to radiobiological parameters. Finally, an extensive Monte Carlo (MC) procedure was implemented to gain deeper insights into the behavioral properties of the cluster formation. Clinical studies showed that OAR clusters had drastic differences despite similar PTV coverage among different patients, and the radiobiological parameters failed to positively correlate with the cluster sizes. The MC study demonstrated the inverse relationship between the cluster size and the cluster connectivity, and the nonlinear changes in cluster size with dose thresholds. In addition, the clusters were insensitive to the shape of OARs. The results demonstrated that the cluster size could serve as an insightful index of normal tissue damage, and that plans delivering the same dose-volume metrics could potentially lead to different clinical outcomes. Copyright © 2018 Elsevier B.V. All rights reserved.
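
    The basic construction behind such a cluster index, connected components of voxels above a dose threshold, can be sketched as follows; the threshold choice and the default 6-connectivity are illustrative assumptions.

```python
# Sketch: dose clusters as connected components of voxels above a
# threshold; cluster size is the voxel count per component. The
# threshold and default 6-connectivity are illustrative choices.
import numpy as np
from scipy import ndimage

def dose_clusters(dose, threshold):
    """dose: 3-D array of voxel doses. Returns labeled array and sizes."""
    mask = dose >= threshold
    labeled, n_clusters = ndimage.label(mask)
    sizes = np.bincount(labeled.ravel())[1:]  # voxels per cluster, labels 1..n
    return labeled, sizes

rng = np.random.default_rng(0)
dose = ndimage.gaussian_filter(rng.normal(50.0, 10.0, (40, 40, 40)), sigma=3)
labeled, sizes = dose_clusters(dose, threshold=np.percentile(dose, 90))
print(len(sizes), sizes.max())  # number of clusters and largest cluster size
```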

  8. Random Walk Method for Potential Problems

    NASA Technical Reports Server (NTRS)

    Krishnamurthy, T.; Raju, I. S.

    2002-01-01

    A local Random Walk Method (RWM) for potential problems governed by Laplace's and Poisson's equations is developed for two- and three-dimensional problems. The RWM is implemented and demonstrated in a multiprocessor parallel environment on a Beowulf cluster of computers. A speed gain of 16 is achieved as the number of processors is increased from 1 to 23.
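
    A minimal sketch of the random-walk idea for Laplace's equation on a square grid: by the discrete mean-value property, the potential at an interior point equals the expected boundary value reached by a symmetric random walk started there. The grid size, Dirichlet data, and walk count below are illustrative.

```python
# Monte Carlo estimate of the potential for Laplace's equation on the
# unit square: average the boundary values hit by random walks (the
# discrete mean-value property). Grid and Dirichlet data are illustrative.
import random

def walk_potential(start, n, boundary_value, n_walks=4000):
    """Interior grid points have 0 < i < n and 0 < j < n."""
    total = 0.0
    for _ in range(n_walks):
        i, j = start
        while 0 < i < n and 0 < j < n:
            di, dj = random.choice([(1, 0), (-1, 0), (0, 1), (0, -1)])
            i, j = i + di, j + dj
        total += boundary_value(i, j)  # walk has reached the boundary
    return total / n_walks

random.seed(0)
n = 16
bv = lambda i, j: 1.0 if j == n else 0.0  # potential 1 on top edge, 0 elsewhere
print(walk_potential((n // 2, n // 2), n, bv))  # exact value at the center: 0.25
```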

  9. Polynomial-Time Approximation Algorithm for the Problem of Cardinality-Weighted Variance-Based 2-Clustering with a Given Center

    NASA Astrophysics Data System (ADS)

    Kel'manov, A. V.; Motkova, A. V.

    2018-01-01

    A strongly NP-hard problem of partitioning a finite set of points of Euclidean space into two clusters is considered. The solution criterion is the minimum of the sum (over both clusters) of weighted sums of squared distances from the elements of each cluster to its geometric center. The weights of the sums are equal to the cardinalities of the desired clusters. The center of one cluster is given as input, while the center of the other is unknown and is determined as the point of space equal to the mean of the cluster elements. A version of the problem is analyzed in which the cardinalities of the clusters are given as input. A polynomial-time 2-approximation algorithm for solving the problem is constructed.
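
    The objective of this problem is easy to state in code. A sketch evaluating it for a candidate partition, with the given center placed at the origin for illustration (the 2-approximation algorithm itself is not reproduced here):

```python
# The problem's objective, written out: weighted sums of squared
# distances, with weights equal to the cluster cardinalities; one
# center is given (taken here at the origin), the other is the mean
# of its cluster.
import numpy as np

def objective(X, in_cluster1):
    """X: (n, d) points; in_cluster1: boolean mask of the free cluster."""
    c1, c2 = X[in_cluster1], X[~in_cluster1]
    y = c1.mean(axis=0)                   # unknown center: the cluster mean
    f1 = len(c1) * ((c1 - y) ** 2).sum()  # weight = cluster cardinality
    f2 = len(c2) * (c2 ** 2).sum()        # given center: the origin
    return f1 + f2

rng = np.random.default_rng(0)
X = np.vstack([rng.normal(3, 1, (20, 2)), rng.normal(0, 1, (30, 2))])
mask = np.arange(len(X)) < 20             # candidate partition of size 20
print(objective(X, mask))
```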

  10. Finding SDSS Galaxy Clusters in 4-dimensional Color Space Using the False Discovery Rate

    NASA Astrophysics Data System (ADS)

    Nichol, R. C.; Miller, C. J.; Reichart, D.; Wasserman, L.; Genovese, C.; SDSS Collaboration

    2000-12-01

    We describe a recently developed statistical technique that provides a meaningful cut-off in probability-based decision making. We are concerned with multiple testing, where each test produces a well-defined probability (or p-value). By well-defined, we mean that the null hypothesis used to determine the p-value is fully understood and appropriate. The method is entitled False Discovery Rate (FDR) and its largest advantage over other measures is that it allows one to specify a maximal amount of acceptable error. As an example of this tool, we apply FDR to a four-dimensional clustering algorithm using SDSS data. For each galaxy (or test galaxy), we count the number of neighbors that fit within one standard deviation of a four-dimensional Gaussian centered on that test galaxy. The mean and standard deviation of that Gaussian are determined from the colors and errors of the test galaxy. We then take that same Gaussian and place it on a random selection of n galaxies and make a similar count. In the limit of large n, we expect the median count around these random galaxies to represent a typical field galaxy. For every test galaxy we determine the probability (or p-value) that it is a field galaxy based on these counts. A low p-value implies that the test galaxy is in a cluster environment. Once we have a p-value for every galaxy, we use FDR to determine at what level we should make our probability cut-off. Once this cut-off is made, we have a final sample of galaxies that are cluster-like galaxies. Using FDR, we also know the maximum amount of field contamination in our cluster galaxy sample. We present our preliminary galaxy clustering results using these methods.
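
    The standard FDR-controlling rule behind this kind of cut-off is the Benjamini-Hochberg step-up procedure (assumed here; the abstract does not name the exact variant):

```python
# Benjamini-Hochberg step-up rule: reject the k smallest p-values for
# the largest k with p_(k) <= alpha * k / m, controlling the FDR at alpha.
import numpy as np

def fdr_cutoff(pvalues, alpha=0.05):
    p = np.sort(np.asarray(pvalues))
    m = len(p)
    below = np.where(p <= alpha * np.arange(1, m + 1) / m)[0]
    return p[below[-1]] if len(below) else 0.0  # reject all p-values <= cutoff

rng = np.random.default_rng(3)
p = np.concatenate([rng.uniform(size=900),              # "field" tests (null)
                    rng.uniform(0.0, 1e-3, size=100)])  # "cluster" tests (signal)
cut = fdr_cutoff(p)
print(cut, int((p <= cut).sum()))  # at most ~alpha of rejections are false
```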

  11. State estimation and prediction using clustered particle filters.

    PubMed

    Lee, Yoonsang; Majda, Andrew J

    2016-12-20

    Particle filtering is an essential tool to improve uncertain model predictions by incorporating noisy observational data from complex systems including non-Gaussian features. A class of particle filters, clustered particle filters, is introduced for high-dimensional nonlinear systems, which uses relatively few particles compared with the standard particle filter. The clustered particle filter captures non-Gaussian features of the true signal, which are typical in complex nonlinear dynamical systems such as geophysical systems. The method is also robust in the difficult regime of high-quality sparse and infrequent observations. The key features of the clustered particle filtering are coarse-grained localization through the clustering of the state variables and particle adjustment to stabilize the method; each observation affects only neighbor state variables through clustering and particles are adjusted to prevent particle collapse due to high-quality observations. The clustered particle filter is tested for the 40-dimensional Lorenz 96 model with several dynamical regimes including strongly non-Gaussian statistics. The clustered particle filter shows robust skill in both achieving accurate filter results and capturing non-Gaussian statistics of the true signal. It is further extended to multiscale data assimilation, which provides the large-scale estimation by combining a cheap reduced-order forecast model and mixed observations of the large- and small-scale variables. This approach enables the use of a larger number of particles due to the computational savings in the forecast model. The multiscale clustered particle filter is tested for one-dimensional dispersive wave turbulence using a forecast model with model errors.

  13. TripAdvisor^{N-D}: A Tourism-Inspired High-Dimensional Space Exploration Framework with Overview and Detail.

    PubMed

    Nam, Julia EunJu; Mueller, Klaus

    2013-02-01

    Gaining a true appreciation of high-dimensional space remains difficult since all of the existing high-dimensional space exploration techniques serialize the space travel in some way. This is not so foreign to us since we, when traveling, also experience the world in a serial fashion. But we typically have access to a map to help with positioning, orientation, navigation, and trip planning. Here, we propose a multivariate data exploration tool that compares high-dimensional space navigation with a sightseeing trip. It decomposes this activity into five major tasks: 1) Identify the sights: use a map to identify the sights of interest and their location; 2) Plan the trip: connect the sights of interest along a specifiable path; 3) Go on the trip: travel along the route; 4) Hop off the bus: experience the location, look around, zoom into detail; and 5) Orient and localize: regain bearings in the map. We describe intuitive and interactive tools for all of these tasks, both global navigation within the map and local exploration of the data distributions. For the latter, we describe a polygonal touchpad interface which enables users to smoothly tilt the projection plane in high-dimensional space to produce multivariate scatterplots that best convey the data relationships under investigation. Motion parallax and illustrative motion trails aid in the perception of these transient patterns. We describe the use of our system within two applications: 1) the exploratory discovery of data configurations that best fit a personal preference in the presence of tradeoffs and 2) interactive cluster analysis via cluster sculpting in N-D.

  14. FPGA cluster for high-performance AO real-time control system

    NASA Astrophysics Data System (ADS)

    Geng, Deli; Goodsell, Stephen J.; Basden, Alastair G.; Dipper, Nigel A.; Myers, Richard M.; Saunter, Chris D.

    2006-06-01

    Whilst the high throughput and low latency requirements for the next generation AO real-time control systems have posed a significant challenge to von Neumann architecture processor systems, the Field Programmable Gate Array (FPGA) has emerged as a long term solution with high performance on throughput and excellent predictability on latency. Moreover, FPGA devices have highly capable programmable interfacing, which leads to more highly integrated systems. Nevertheless, a single FPGA is still not enough: multiple FPGA devices need to be clustered to perform the required subaperture processing and the reconstruction computation. In an AO real-time control system, the memory bandwidth is often the bottleneck of the system, simply because a vast amount of supporting data, e.g. pixel calibration maps and the reconstruction matrix, needs to be accessed within a short period. The cluster, as a general computing architecture, has excellent scalability in processing throughput, memory bandwidth, memory capacity, and communication bandwidth. Problems such as task distribution, node communication, and system verification are discussed.

  15. Impulsivity and negative emotionality associated with substance use problems and Cluster B personality in college students.

    PubMed

    James, Lisa M; Taylor, Jeanette

    2007-04-01

    The co-occurrence of personality disorders (PDs) and substance use disorders (SUDs) can be partially attributed to shared underlying personality traits. This study examined the role of negative emotionality (NEM) and impulsivity in 617 university students with self-reported substance use problems and Cluster B PD symptoms. Results indicated that NEM was significantly associated with drug and alcohol use problems, antisocial PD, borderline PD, and narcissistic PD. Impulsivity was significantly associated with drug use problems, antisocial PD, and histrionic PD. Only NEM mediated the relationship between alcohol use problems and symptoms of each of the Cluster B PDs while impulsivity mediated only the relationship between drug use problems and histrionic PD. These results suggest that NEM may be more relevant than impulsivity to our understanding of the co-occurrence between substance use problems and Cluster B PD features.

  16. Development of a Three-Dimensional PSE Code for Compressible Flows: Stability of Three-Dimensional Compressible Boundary Layers

    NASA Technical Reports Server (NTRS)

    Balakumar, P.; Jeyasingham, Samarasingham

    1999-01-01

    A program is developed to investigate the linear stability of three-dimensional compressible boundary layer flows over bodies of revolution. The problem is formulated as a two-dimensional (2D) eigenvalue problem incorporating the meanflow variations in the normal and azimuthal directions. Normal mode solutions are sought in the whole plane rather than in a line normal to the wall, as is done in the classical one-dimensional (1D) stability theory. The stability characteristics of a supersonic boundary layer over a sharp cone with a 5° half-angle at 2 degrees angle of attack are investigated. The 1D eigenvalue computations showed that the most amplified disturbances occur around x_2 = 90 degrees and that the azimuthal mode number of the most amplified disturbances ranges between m = -30 and m = -40. The frequencies of the most amplified waves are smaller in the middle region, where the crossflow dominates the instability, than near the windward and leeward planes. The 2D eigenvalue computations showed that, due to the variations in the azimuthal direction, the eigenmodes are clustered into isolated confined regions; for some eigenvalues, the eigenfunctions are clustered in two regions. Due to the nonparallel effect in the azimuthal direction, the most amplified disturbances are shifted to 120 degrees, compared to 90 degrees for the parallel theory. It is also observed that the nonparallel amplification rates are smaller than those obtained from the parallel theory.

  17. Three-dimensional study of grain boundary engineering effects on intergranular stress corrosion cracking of 316 stainless steel in high temperature water

    NASA Astrophysics Data System (ADS)

    Liu, Tingguang; Xia, Shuang; Bai, Qin; Zhou, Bangxin; Zhang, Lefu; Lu, Yonghao; Shoji, Tetsuo

    2018-01-01

    The intergranular cracks and grain boundary (GB) network of a GB-engineered 316 stainless steel after a stress corrosion cracking (SCC) test in the high temperature, high pressure water of a reactor environment were investigated by two-dimensional and three-dimensional (3D) characterization, in order to expose the mechanism by which GB-engineering mitigates intergranular SCC. The 3D microstructure showed that the essential characteristic of the GB-engineered microstructure is the formation of many large twin boundaries as a result of multiple twinning, which results in the formation of large grain-clusters. The large grain-clusters played a key role in the improvement of intergranular SCC resistance by GB-engineering. The main intergranular cracks propagated in a zigzag along the outer boundaries of these large grain-clusters, because all inner boundaries of the grain-clusters were twin boundaries (Σ3) or twin-related boundaries (Σ3n), which have much lower susceptibility to SCC than random boundaries. These large grain-clusters had a tree-ring-shaped topology and very complex morphology. They were tangled together and thus difficult to separate during SCC, so that some large crack-bridges remained in the crack surface.

  18. On the Partitioning of Squared Euclidean Distance and Its Applications in Cluster Analysis.

    ERIC Educational Resources Information Center

    Carter, Randy L.; And Others

    1989-01-01

    The partitioning of squared Euclidean (E²) distance between two vectors in M-dimensional space into the sum of squared lengths of vectors in mutually orthogonal subspaces is discussed. Applications to specific cluster analysis problems are provided (i.e., to design Monte Carlo studies for performance comparisons of several clustering methods…

  19. Parallel solution of sparse one-dimensional dynamic programming problems

    NASA Technical Reports Server (NTRS)

    Nicol, David M.

    1989-01-01

    Parallel computation offers the potential for quickly solving large computational problems. However, it is often a non-trivial task to effectively use parallel computers. Solution methods must sometimes be reformulated to exploit parallelism; the reformulations are often more complex than their slower serial counterparts. We illustrate these points by studying the parallelization of sparse one-dimensional dynamic programming problems, those which do not obviously admit substantial parallelization. We propose a new method for parallelizing such problems, develop analytic models which help us to identify problems which parallelize well, and compare the performance of our algorithm with existing algorithms on a multiprocessor.

  20. A reduced-order model from high-dimensional frictional hysteresis

    PubMed Central

    Biswas, Saurabh; Chatterjee, Anindya

    2014-01-01

    Hysteresis in material behaviour includes both signum nonlinearities and high dimensionality. Available models for component-level hysteretic behaviour are empirical. Here, we derive a low-order model for rate-independent hysteresis from a high-dimensional massless frictional system. The original system, being given in terms of signs of velocities, is first solved incrementally using a linear complementarity problem formulation. From this numerical solution, to develop a reduced-order model, basis vectors are chosen using the singular value decomposition. The slip direction in generalized coordinates is identified as the minimizer of a dissipation-related function. That function includes terms for frictional dissipation through signum nonlinearities at many friction sites. Luckily, it allows a convenient analytical approximation. Upon solution of the approximated minimization problem, the slip direction is found. A final evolution equation for a few states is then obtained that gives a good match with the full solution. The model obtained here may lead to new insights into hysteresis as well as better empirical modelling thereof. PMID:24910522

  1. The relationship between two-dimensional self-esteem and problem solving style in an anorexic inpatient sample.

    PubMed

    Paterson, Gillian; Power, Kevin; Yellowlees, Alex; Park, Katy; Taylor, Louise

    2007-01-01

    Research examining cognitive and behavioural determinants of anorexia is currently lacking. This has implications for the success of treatment programmes for anorexics, particularly, given the high reported dropout rates. This study examines two-dimensional self-esteem (comprising of self-competence and self-liking) and social problem-solving in an anorexic population and predicts that self-esteem will mediate the relationship between problem-solving and eating pathology by facilitating/inhibiting use of faulty/effective strategies. Twenty-seven anorexic inpatients and 62 controls completed measures of social problem solving and two-dimensional self-esteem. Anorexics scored significantly higher than the non-clinical group on measures of eating pathology, negative problem orientation, impulsivity/carelessness and avoidance and significantly lower on positive problem orientation and both self-esteem components. In the clinical sample, disordered eating correlated significantly with self-competence, negative problem-orientation and avoidance. Associations between disordered eating and problem solving lost significance when self-esteem was controlled in the clinical group only. Self-competence was found to be the main predictor of eating pathology in the clinical sample while self-liking, impulsivity and negative and positive problem orientation were main predictors in the non-clinical sample. Findings support the two-dimensional self-esteem theory with self-competence only being relevant to the anorexic population and support the hypothesis that self-esteem mediates the relationship between disordered eating and problem solving ability in an anorexic sample. Treatment implications include support for programmes emphasising increasing self-appraisal and self-efficacy. 2006 John Wiley & Sons, Ltd and Eating Disorders Association

  2. High Performance Computing of Meshless Time Domain Method on Multi-GPU Cluster

    NASA Astrophysics Data System (ADS)

    Ikuno, Soichiro; Nakata, Susumu; Hirokawa, Yuta; Itoh, Taku

    2015-01-01

    High performance computing of the Meshless Time Domain Method (MTDM) on multiple GPUs using the supercomputer HA-PACS (Highly Accelerated Parallel Advanced system for Computational Sciences) at the University of Tsukuba is investigated. Generally, the finite difference time domain (FDTD) method is adopted for the numerical simulation of electromagnetic wave propagation phenomena. However, the numerical domain must be divided into rectangular meshes, and it is difficult to apply the method to problems with complex domains. On the other hand, MTDM can easily be adapted to such problems because it does not require meshes. In the present study, we implement MTDM on a multi-GPU cluster to speed up the method, and numerically investigate its performance. To reduce the computation time, the communication between the decomposed subdomains is hidden behind the perfectly matched layer (PML) calculation procedure. The results show that MTDM on 128 GPUs runs 173 times faster than a single-CPU calculation.

  3. One dimensional motion of interstitial clusters and void growth in Ni and Ni alloys

    NASA Astrophysics Data System (ADS)

    Yoshiie, T.; Ishizaki, T.; Xu, Q.; Satoh, Y.; Kiritani, M.

    2002-12-01

    One dimensional (1-D) motion of interstitial clusters is important for the microstructural evolution in metals. In this paper, the effect of 2 at.% alloying with elements Si (volume size factor to Ni: -5.81%), Cu (7.18%), Ge (14.76%) and Sn (74.08%) in Ni on 1-D motion of interstitial clusters and void growth was studied. In neutron irradiated pure Ni, Ni-Cu and Ni-Ge, well developed dislocation networks and voids in the matrix, and no defects near grain boundaries, were observed at 573 K to a dose of 0.4 dpa by transmission electron microscopy. No voids were formed and only interstitial type dislocation loops were observed near grain boundaries in Ni-Si and Ni-Sn. The reaction kinetics analysis, which included the point defect flow into planar sinks, revealed the existence of 1-D motion of interstitial clusters in Ni, Ni-Cu and Ni-Ge, and the lack of such motion in Ni-Si and Ni-Sn. In Ni-Sn and Ni-Si, the alloying elements trap interstitial clusters and thereby reduce the cluster mobility, leading to the reduction in void growth.

  4. Experimental witness of genuine high-dimensional entanglement

    NASA Astrophysics Data System (ADS)

    Guo, Yu; Hu, Xiao-Min; Liu, Bi-Heng; Huang, Yun-Feng; Li, Chuan-Feng; Guo, Guang-Can

    2018-06-01

    Growing interest has been devoted to exploring high-dimensional quantum systems, for their promising perspectives in certain quantum tasks. How to characterize a high-dimensional entanglement structure is one of the basic questions to be answered before taking full advantage of such systems. However, it is not easy to capture the key feature of high-dimensional entanglement, since the correlations derived from high-dimensional entangled states can possibly be simulated with copies of lower-dimensional systems. Here, we follow the work of Kraft et al. [Phys. Rev. Lett. 120, 060502 (2018), 10.1103/PhysRevLett.120.060502] and present the experimental realization of the creation and detection, via a normalized witness operation, of genuine high-dimensional entanglement, which cannot be decomposed into lower-dimensional Hilbert spaces and thus forms entanglement structures that exist only in high-dimensional systems. Our experiment leads to further exploration of high-dimensional quantum systems.

  5. DD-HDS: A method for visualization and exploration of high-dimensional data.

    PubMed

    Lespinats, Sylvain; Verleysen, Michel; Giron, Alain; Fertil, Bernard

    2007-09-01

    Mapping high-dimensional data in a low-dimensional space, for example, for visualization, is a problem of increasingly major concern in data analysis. This paper presents data-driven high-dimensional scaling (DD-HDS), a nonlinear mapping method that follows the line of multidimensional scaling (MDS) approach, based on the preservation of distances between pairs of data. It improves the performance of existing competitors with respect to the representation of high-dimensional data, in two ways. It introduces (1) a specific weighting of distances between data taking into account the concentration of measure phenomenon and (2) a symmetric handling of short distances in the original and output spaces, avoiding false neighbor representations while still allowing some necessary tears in the original distribution. More precisely, the weighting is set according to the effective distribution of distances in the data set, with the exception of a single user-defined parameter setting the tradeoff between local neighborhood preservation and global mapping. The optimization of the stress criterion designed for the mapping is realized by "force-directed placement" (FDP). The mappings of low- and high-dimensional data sets are presented as illustrations of the features and advantages of the proposed algorithm. The weighting function specific to high-dimensional data and the symmetric handling of short distances can be easily incorporated in most distance preservation-based nonlinear dimensionality reduction methods.

  6. Sparse subspace clustering for data with missing entries and high-rank matrix completion.

    PubMed

    Fan, Jicong; Chow, Tommy W S

    2017-09-01

    Many methods have recently been proposed for subspace clustering, but they are often unable to handle incomplete data because of missing entries. Using matrix completion methods to recover missing entries is a common way to solve the problem. Conventional matrix completion methods require that the matrix should be of low-rank intrinsically, but most matrices are of high-rank or even full-rank in practice, especially when the number of subspaces is large. In this paper, a new method called Sparse Representation with Missing Entries and Matrix Completion is proposed to solve the problems of incomplete-data subspace clustering and high-rank matrix completion. The proposed algorithm alternately computes the matrix of sparse representation coefficients and recovers the missing entries of a data matrix. The proposed algorithm recovers missing entries through minimizing the representation coefficients, representation errors, and matrix rank. Thorough experimental study and comparative analysis based on synthetic data and natural images were conducted. The presented results demonstrate that the proposed algorithm is more effective in subspace clustering and matrix completion compared with other existing methods. Copyright © 2017 Elsevier Ltd. All rights reserved.

  7. Fully polynomial-time approximation scheme for a special case of a quadratic Euclidean 2-clustering problem

    NASA Astrophysics Data System (ADS)

    Kel'manov, A. V.; Khandeev, V. I.

    2016-02-01

    The strongly NP-hard problem of partitioning a finite set of points of Euclidean space into two clusters of given sizes (cardinalities) minimizing the sum (over both clusters) of the intracluster sums of squared distances from the elements of the clusters to their centers is considered. It is assumed that the center of one of the sought clusters is specified at the desired (arbitrary) point of space (without loss of generality, at the origin), while the center of the other one is unknown and determined as the mean value over all elements of this cluster. It is shown that unless P = NP, there is no fully polynomial-time approximation scheme for this problem, and such a scheme is substantiated in the case of a fixed space dimension.

  8. A highly efficient multi-core algorithm for clustering extremely large datasets

    PubMed Central

    2010-01-01

    Background In recent years, the demand for computational power in computational biology has increased due to rapidly growing data sets from microarray and other high-throughput technologies. This demand is likely to increase. Standard algorithms for analyzing data, such as cluster algorithms, need to be parallelized for fast processing. Unfortunately, most approaches for parallelizing algorithms largely rely on network communication protocols connecting and requiring multiple computers. One answer to this problem is to utilize the intrinsic capabilities in current multi-core hardware to distribute the tasks among the different cores of one computer. Results We introduce a multi-core parallelization of the k-means and k-modes cluster algorithms based on the design principles of transactional memory for clustering gene expression microarray type data and categorical SNP data. Our new shared memory parallel algorithms prove to be highly efficient. We demonstrate their computational power and show their utility in cluster stability and sensitivity analysis employing repeated runs with slightly changed parameters. Computation speed of our Java based algorithm was increased by a factor of 10 for large data sets while preserving computational accuracy compared to single-core implementations and a recently published network based parallelization. Conclusions Most desktop computers and even notebooks provide at least dual-core processors. Our multi-core algorithms show that using modern algorithmic concepts, parallelization makes it possible to perform even such laborious tasks as cluster sensitivity and cluster number estimation on the laboratory computer. PMID:20370922
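
    The core of such a shared-memory parallelization is distributing the assignment step over cores and reducing partial sums into new centroids. A sketch in Python, with multiprocessing standing in for the paper's Java, transactional-memory-based design:

```python
# Sketch: parallelize the k-means assignment step across worker
# processes, then reduce partial sums/counts into new centroids.
import numpy as np
from multiprocessing import Pool

def assign_chunk(args):
    chunk, centers = args
    labels = np.argmin(((chunk[:, None] - centers[None]) ** 2).sum(-1), axis=1)
    sums = np.zeros_like(centers)
    counts = np.zeros(len(centers))
    for c in range(len(centers)):
        sums[c] = chunk[labels == c].sum(axis=0)
        counts[c] = (labels == c).sum()
    return sums, counts

def parallel_kmeans(X, k, n_iter=20, workers=4, seed=0):
    centers = X[np.random.default_rng(seed).choice(len(X), k, replace=False)]
    chunks = np.array_split(X, workers)
    with Pool(workers) as pool:
        for _ in range(n_iter):
            parts = pool.map(assign_chunk, [(c, centers) for c in chunks])
            sums = sum(p[0] for p in parts)       # reduce partial results
            counts = sum(p[1] for p in parts)
            centers = sums / np.maximum(counts, 1)[:, None]
    return centers

if __name__ == "__main__":
    X = np.random.default_rng(1).normal(size=(20000, 10))
    X[:10000] += 4.0                              # two well-separated groups
    print(parallel_kmeans(X, k=2).round(2))
```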

  9. Interlaced coarse-graining for the dynamical cluster approximation

    NASA Astrophysics Data System (ADS)

    Haehner, Urs; Staar, Peter; Jiang, Mi; Maier, Thomas; Schulthess, Thomas

    The negative sign problem remains a challenging limiting factor in quantum Monte Carlo simulations of strongly correlated fermionic many-body systems. The dynamical cluster approximation (DCA) makes this problem less severe by coarse-graining the momentum space to map the bulk lattice to a cluster embedded in a dynamical mean-field host. Here, we introduce a new form of an interlaced coarse-graining and compare it with the traditional coarse-graining. We show that it leads to more controlled results with weaker cluster shape and smoother cluster size dependence, which with increasing cluster size converge to the results obtained using the standard coarse-graining. In addition, the new coarse-graining reduces the severity of the fermionic sign problem. Therefore, it enables calculations on much larger clusters and can allow the evaluation of the exact infinite cluster size result via finite size scaling. To demonstrate this, we study the hole-doped two-dimensional Hubbard model and show that the interlaced coarse-graining in combination with the DCA+ algorithm permits the determination of the superconducting Tc on cluster sizes, for which the results can be fitted with the Kosterlitz-Thouless scaling law. This research used resources of the Oak Ridge Leadership Computing Facility (OLCF) awarded by the INCITE program, and of the Swiss National Supercomputing Center. OLCF is a DOE Office of Science User Facility supported under Contract DE-AC05-00OR22725.

  10. Weighted Distance Functions Improve Analysis of High-Dimensional Data: Application to Molecular Dynamics Simulations.

    PubMed

    Blöchliger, Nicolas; Caflisch, Amedeo; Vitalis, Andreas

    2015-11-10

    Data mining techniques depend strongly on how the data are represented and how distance between samples is measured. High-dimensional data often contain a large number of irrelevant dimensions (features) for a given query. These features act as noise and obfuscate relevant information. Unsupervised approaches to mine such data require distance measures that can account for feature relevance. Molecular dynamics simulations produce high-dimensional data sets describing molecules observed in time. Here, we propose to globally or locally weight simulation features based on effective rates. This emphasizes, in a data-driven manner, slow degrees of freedom that often report on the metastable states sampled by the molecular system. We couple this idea to several unsupervised learning protocols. Our approach unmasks slow side chain dynamics within the native state of a miniprotein and reveals additional metastable conformations of a protein. The approach can be combined with most algorithms for clustering or dimensionality reduction.
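
    A sketch of the idea of weighting features by their effective time scales, assuming a lag-1 autocorrelation as the slowness estimate (the authors' exact rate estimator is not reproduced here). The resulting weights can be plugged into any distance-based clustering.

```python
# Sketch of rate-based feature weighting: weight each feature by how
# slowly it decorrelates in time, so slow degrees of freedom dominate
# the distance. The lag-1 autocorrelation weight is an assumption.
import numpy as np

def slowness_weights(traj):
    """traj: (T, d) time series; weight = lag-1 autocorrelation, clipped at 0."""
    x = traj - traj.mean(axis=0)
    ac1 = (x[1:] * x[:-1]).sum(axis=0) / (x * x).sum(axis=0)
    return np.clip(ac1, 0.0, None)

def weighted_dist(a, b, w):
    return np.sqrt((w * (a - b) ** 2).sum())

rng = np.random.default_rng(0)
slow = np.cumsum(rng.normal(size=(500, 1)), axis=0) * 0.05  # slow coordinate
fast = rng.normal(size=(500, 9))                            # fast noise
traj = np.hstack([slow, fast])
w = slowness_weights(traj)
print(w.round(2))  # the slow feature receives by far the largest weight
print(weighted_dist(traj[0], traj[250], w))
```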

  11. Study on Data Clustering and Intelligent Decision Algorithm of Indoor Localization

    NASA Astrophysics Data System (ADS)

    Liu, Zexi

    2018-01-01

    Indoor positioning technology enables the human beings to have the ability of positional perception in architectural space, and there is a shortage of single network coverage and the problem of location data redundancy. So this article puts forward the indoor positioning data clustering algorithm and intelligent decision-making research, design the basic ideas of multi-source indoor positioning technology, analyzes the fingerprint localization algorithm based on distance measurement, position and orientation of inertial device integration. By optimizing the clustering processing of massive indoor location data, the data normalization pretreatment, multi-dimensional controllable clustering center and multi-factor clustering are realized, and the redundancy of locating data is reduced. In addition, the path is proposed based on neural network inference and decision, design the sparse data input layer, the dynamic feedback hidden layer and output layer, low dimensional results improve the intelligent navigation path planning.

  12. Hierarchical Aligned Cluster Analysis for Temporal Clustering of Human Motion.

    PubMed

    Zhou, Feng; De la Torre, Fernando; Hodgins, Jessica K

    2013-03-01

    Temporal segmentation of human motion into plausible motion primitives is central to understanding and building computational models of human motion. Several issues contribute to the challenge of discovering motion primitives: the exponential nature of all possible movement combinations, the variability in the temporal scale of human actions, and the complexity of representing articulated motion. We pose the problem of learning motion primitives as one of temporal clustering, and derive an unsupervised hierarchical bottom-up framework called hierarchical aligned cluster analysis (HACA). HACA finds a partition of a given multidimensional time series into m disjoint segments such that each segment belongs to one of k clusters. HACA combines kernel k-means with the generalized dynamic time alignment kernel to cluster time series data. Moreover, it provides a natural framework to find a low-dimensional embedding for time series. HACA is efficiently optimized with a coordinate descent strategy and dynamic programming. Experimental results on motion capture and video data demonstrate the effectiveness of HACA for segmenting complex motions and as a visualization tool. We also compare the performance of HACA to state-of-the-art algorithms for temporal clustering on data of a honey bee dance. The HACA code is available online.

  13. Inference of Vohradský's Models of Genetic Networks by Solving Two-Dimensional Function Optimization Problems

    PubMed Central

    Kimura, Shuhei; Sato, Masanao; Okada-Hatakeyama, Mariko

    2013-01-01

    The inference of a genetic network is a problem in which mutual interactions among genes are inferred from time-series of gene expression levels. While a number of models have been proposed to describe genetic networks, this study focuses on a mathematical model proposed by Vohradský. Because of its advantageous features, several researchers have proposed the inference methods based on Vohradský's model. When trying to analyze large-scale networks consisting of dozens of genes, however, these methods must solve high-dimensional non-linear function optimization problems. In order to resolve the difficulty of estimating the parameters of the Vohradský's model, this study proposes a new method that defines the problem as several two-dimensional function optimization problems. Through numerical experiments on artificial genetic network inference problems, we showed that, although the computation time of the proposed method is not the shortest, the method has the ability to estimate parameters of Vohradský's models more effectively with sufficiently short computation times. This study then applied the proposed method to an actual inference problem of the bacterial SOS DNA repair system, and succeeded in finding several reasonable regulations. PMID:24386175

  14. Fuzzy cluster analysis of high-field functional MRI data.

    PubMed

    Windischberger, Christian; Barth, Markus; Lamm, Claus; Schroeder, Lee; Bauer, Herbert; Gur, Ruben C; Moser, Ewald

    2003-11-01

    Functional magnetic resonance imaging (fMRI) based on blood-oxygen level dependent (BOLD) contrast is today an established brain research method and quickly gains acceptance for complementary clinical diagnosis. However, neither the basic mechanisms, like the coupling between neuronal activation and haemodynamic response, are known exactly, nor can the various artifacts be predicted or controlled. Thus, modeling functional signal changes is non-trivial and exploratory data analysis (EDA) may be rather useful. In particular, the identification and separation of artifacts as well as the quantification of expected, i.e. stimulus correlated, and novel information on brain activity are important both for new insights in neuroscience and for future developments in functional MRI of the human brain. After an introduction to fuzzy clustering and very high-field fMRI, we present several examples where fuzzy cluster analysis (FCA) of fMRI time series helps to identify and locally separate various artifacts. We also present and discuss applications and limitations of fuzzy cluster analysis in very high-field functional MRI: differentiating temporal patterns in MRI using (a) a test object with static and dynamic parts and (b) artifacts due to gross head motion. Using a synthetic fMRI data set, we quantitatively examine the influences of relevant FCA parameters on clustering results in terms of receiver-operator characteristics (ROC) and compare them with a commonly used model-based correlation analysis (CA) approach. The application of FCA in analyzing in vivo fMRI data is shown for (a) a motor paradigm, (b) data from multi-echo imaging, and (c) an fMRI study using mental rotation of three-dimensional cubes. We found that differentiation of true "neural" from false "vascular" activation is possible based on echo time dependence and specific activation levels, as well as based on their signal time-course. Exploratory data analysis methods in general and fuzzy cluster analysis in particular may
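
    Fuzzy c-means, the algorithm family behind FCA, assigns each time series a graded membership in every cluster rather than a hard label. A minimal numpy sketch with a synthetic task-like time course (illustrative data, not the study's):

```python
# Minimal fuzzy c-means: each sample receives a graded membership in
# every cluster (rows of U sum to one).
import numpy as np

def fuzzy_cmeans(X, c, m=2.0, n_iter=100, seed=0):
    rng = np.random.default_rng(seed)
    U = rng.dirichlet(np.ones(c), size=len(X))      # random initial memberships
    for _ in range(n_iter):
        W = U ** m                                  # fuzzified memberships
        centers = (W.T @ X) / W.sum(axis=0)[:, None]
        d = np.linalg.norm(X[:, None] - centers[None], axis=2) + 1e-12
        inv = d ** (-2.0 / (m - 1.0))
        U = inv / inv.sum(axis=1, keepdims=True)    # standard FCM update
    return U, centers

rng = np.random.default_rng(1)
t = np.arange(40)
boxcar = (t % 20 >= 10).astype(float)               # idealized on/off response
active = boxcar + rng.normal(0.0, 0.3, (50, 40))    # "activated" time courses
rest = rng.normal(0.0, 0.3, (50, 40))               # baseline time courses
U, centers = fuzzy_cmeans(np.vstack([active, rest]), c=2)
print(U[:2].round(2), U[-2:].round(2))              # graded memberships
```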

  15. Sparse High Dimensional Models in Economics

    PubMed Central

    Fan, Jianqing; Lv, Jinchi; Qi, Lei

    2010-01-01

    This paper reviews the literature on sparse high dimensional models and discusses some applications in economics and finance. Recent developments of theory, methods, and implementations in penalized least squares and penalized likelihood methods are highlighted. These variable selection methods are proved to be effective in high dimensional sparse modeling. The limits of dimensionality that regularization methods can handle, the role of penalty functions, and their statistical properties are detailed. Some recent advances in ultra-high dimensional sparse modeling are also briefly discussed. PMID:22022635
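
    A small example of penalized least squares variable selection in the p >> n regime, using the lasso from scikit-learn (illustrative data; the regularization strength is an arbitrary choice):

```python
# Penalized least squares in action: the lasso recovers a sparse signal
# when predictors far outnumber observations (p >> n).
import numpy as np
from sklearn.linear_model import Lasso

rng = np.random.default_rng(0)
n, p = 100, 1000
X = rng.normal(size=(n, p))
beta = np.zeros(p)
beta[:5] = [3.0, -2.0, 1.5, 2.5, -1.0]   # only 5 true signals among 1000
y = X @ beta + rng.normal(0.0, 0.5, size=n)

fit = Lasso(alpha=0.1).fit(X, y)
selected = np.flatnonzero(fit.coef_)
print(selected)                           # concentrates on the first five
print(fit.coef_[selected].round(2))
```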

  16. Bit-Table Based Biclustering and Frequent Closed Itemset Mining in High-Dimensional Binary Data

    PubMed Central

    Király, András; Abonyi, János

    2014-01-01

    During the last decade various algorithms have been developed and proposed for discovering overlapping clusters in high-dimensional data. The two most prominent application fields in this research, proposed independently, are frequent itemset mining (developed for market basket data) and biclustering (applied to gene expression data analysis). The common limitation of both methodologies is the limited applicability to very large binary data sets. In this paper we propose a novel and efficient method to find both frequent closed itemsets and biclusters in high-dimensional binary data. The method is based on simple but very powerful matrix and vector multiplication approaches that ensure that all patterns can be discovered in a fast manner. The proposed algorithm has been implemented in the commonly used MATLAB environment and is freely available to researchers. PMID:24616651
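
    The bit-table trick is that, given a boolean item-by-transaction matrix, itemset supports reduce to vectorized AND-and-sum operations, and all pairwise co-occurrence counts come from a single matrix product. A minimal sketch (data and thresholds are illustrative):

```python
# Sketch of the bit-table idea: supports via vectorized AND + sum,
# pairwise co-occurrence counts via one matrix product.
import numpy as np

rng = np.random.default_rng(0)
B = rng.random((8, 1000)) < 0.3          # items x transactions bit-table
B[1] |= B[0]                             # make item 1 co-occur with item 0

def support(itemset):
    # Support = number of transactions containing every item in the set.
    return int(np.logical_and.reduce(B[list(itemset)]).sum())

co = B.astype(int) @ B.T.astype(int)     # pairwise co-occurrence counts
pairs = np.argwhere(np.triu(co, 1) >= 250)  # frequent pairs, min support 250
print(support({0, 1}), pairs)
```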

  17. An ensemble framework for clustering protein-protein interaction networks.

    PubMed

    Asur, Sitaram; Ucar, Duygu; Parthasarathy, Srinivasan

    2007-07-01

    Protein-Protein Interaction (PPI) networks are believed to be important sources of information related to biological processes and complex metabolic functions of the cell. The presence of biologically relevant functional modules in these networks has been theorized by many researchers. However, the application of traditional clustering algorithms for extracting these modules has not been successful, largely due to the presence of noisy false positive interactions as well as specific topological challenges in the network. In this article, we propose an ensemble clustering framework to address this problem. For base clustering, we introduce two topology-based distance metrics to counteract the effects of noise. We develop a PCA-based consensus clustering technique, designed to reduce the dimensionality of the consensus problem and yield informative clusters. We also develop a soft consensus clustering variant to assign multifaceted proteins to multiple functional groups. We conduct an empirical evaluation of different consensus techniques using topology-based, information theoretic and domain-specific validation metrics and show that our approaches can provide significant benefits over other state-of-the-art approaches. Our analysis of the consensus clusters obtained demonstrates that ensemble clustering can (a) produce improved biologically significant functional groupings; and (b) facilitate soft clustering by discovering multiple functional associations for proteins. Supplementary data are available at Bioinformatics online.

  18. An algebraic homotopy method for generating quasi-three-dimensional grids for high-speed configurations

    NASA Technical Reports Server (NTRS)

    Moitra, Anutosh

    1989-01-01

    A fast and versatile procedure for algebraically generating boundary conforming computational grids for use with finite-volume Euler flow solvers is presented. A semi-analytic homotopic procedure is used to generate the grids. Grids generated in two-dimensional planes are stacked to produce quasi-three-dimensional grid systems. The body surface and outer boundary are described in terms of surface parameters. An interpolation scheme is used to blend between the body surface and the outer boundary in order to determine the field points. The method, albeit developed for analytically generated body geometries, is equally applicable to other classes of geometries. The method can be used for both internal and external flow configurations, the only constraint being that the body geometries be specified in two-dimensional cross-sections stationed along the longitudinal axis of the configuration. Techniques for controlling various grid parameters, e.g., clustering and orthogonality, are described. Techniques for treating problems arising in algebraic grid generation for geometries with sharp corners are addressed. A set of representative grid systems generated by this method is included. Results of flow computations using these grids are presented for validation of the effectiveness of the method.
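
    The blending step can be illustrated with a much simpler linear homotopy than the semi-analytic scheme of the paper: field points are interpolated between the body surface and the outer boundary, with an exponential stretching function providing clustering near the body. All curves and parameters below are invented for illustration.

        import numpy as np

        theta = np.linspace(0.0, 2.0 * np.pi, 65)          # surface parameter
        body  = np.stack([np.cos(theta), 0.5 * np.sin(theta)], axis=-1)
        outer = np.stack([5.0 * np.cos(theta), 5.0 * np.sin(theta)], axis=-1)

        # Stretching function clusters grid layers near the body (beta controls it).
        j = np.linspace(0.0, 1.0, 33)
        beta = 3.0
        s = (np.exp(beta * j) - 1.0) / (np.exp(beta) - 1.0)

        # Linear homotopy between the two boundaries gives the field points.
        grid = ((1.0 - s)[:, None, None] * body[None, :, :]
                + s[:, None, None] * outer[None, :, :])
        print(grid.shape)                                  # (33, 65, 2)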

  19. Action-minimizing solutions of the one-dimensional N-body problem

    NASA Astrophysics Data System (ADS)

    Yu, Xiang; Zhang, Shiqing

    2018-05-01

    We supplement the following result of C. Marchal on the Newtonian N-body problem: A path minimizing the Lagrangian action functional between two given configurations is always a true (collision-free) solution when the dimension d of the physical space R^d satisfies d≥2. The focus of this paper is on the fixed-ends problem for the one-dimensional Newtonian N-body problem. We prove that a path minimizing the action functional in the set of paths joining two given configurations and maintaining the same order at all times is always a true (collision-free) solution. Considering the one-dimensional N-body problem with equal masses, we prove that (i) collision instants are isolated for a path minimizing the action functional between two given configurations, (ii) if the particles at the two endpoints have the same order, then the path minimizing the action functional is always a true (collision-free) solution and (iii) when the particles at the two endpoints have different orders, although there must be collisions for any path, we can prove that there are at most N! - 1 collisions for any action-minimizing path.

  20. Reconstructing high-dimensional two-photon entangled states via compressive sensing

    PubMed Central

    Tonolini, Francesco; Chan, Susan; Agnew, Megan; Lindsay, Alan; Leach, Jonathan

    2014-01-01

    Accurately establishing the state of large-scale quantum systems is an important tool in quantum information science; however, the large number of unknown parameters hinders the rapid characterisation of such states, and reconstruction procedures can become prohibitively time-consuming. Compressive sensing, a procedure for solving inverse problems by incorporating prior knowledge about the form of the solution, provides an attractive alternative to the problem of high-dimensional quantum state characterisation. Using a modified version of compressive sensing that incorporates the principles of singular value thresholding, we reconstruct the density matrix of a high-dimensional two-photon entangled system. The dimension of each photon is equal to d = 17, corresponding to a system of 83521 unknown real parameters. Accurate reconstruction is achieved with approximately 2500 measurements, only 3% of the total number of unknown parameters in the state. The algorithm we develop is fast, computationally inexpensive, and applicable to a wide range of quantum states, thus demonstrating compressive sensing as an effective technique for measuring the state of large-scale quantum systems. PMID:25306850
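
    The singular value thresholding principle can be sketched on a generic low-rank recovery task (not the authors' quantum-state reconstruction): iteratively soft-threshold the singular values and correct on the observed entries. The parameters tau and step, and the synthetic matrix, are arbitrary.

        import numpy as np

        def svt_complete(M, mask, tau=5.0, step=1.2, iters=300):
            """Minimal singular value thresholding loop for low-rank recovery.
            M: observed matrix (zero where unobserved); mask: observed entries."""
            Y = np.zeros_like(M)
            for _ in range(iters):
                U, s, Vt = np.linalg.svd(Y, full_matrices=False)
                X = (U * np.maximum(s - tau, 0.0)) @ Vt    # shrink singular values
                Y += step * mask * (M - X)                 # enforce observed data
            return X

        rng = np.random.default_rng(1)
        L = rng.standard_normal((30, 4)) @ rng.standard_normal((4, 30))  # rank 4
        mask = rng.random((30, 30)) < 0.5                                # half seen
        X = svt_complete(L * mask, mask)
        # Relative error on the unobserved entries (illustrative check only).
        print(np.linalg.norm((X - L)[~mask]) / np.linalg.norm(L[~mask]))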

  1. High-Dimensional Intrinsic Interpolation Using Gaussian Process Regression and Diffusion Maps

    DOE PAGES

    Thimmisetty, Charanraj A.; Ghanem, Roger G.; White, Joshua A.; ...

    2017-10-10

    This article considers the challenging task of estimating geologic properties of interest using a suite of proxy measurements. The current work recasts this task as a manifold learning problem. In this process, this article introduces a novel regression procedure for intrinsic variables constrained onto a manifold embedded in an ambient space. The procedure is meant to sharpen high-dimensional interpolation by inferring non-linear correlations from the data being interpolated. The proposed approach augments manifold learning procedures with a Gaussian process regression. It first identifies, using diffusion maps, a low-dimensional manifold embedded in an ambient high-dimensional space associated with the data. It relies on the diffusion distance associated with this construction to define a distance function with which the data model is equipped. This distance metric function is then used to compute the correlation structure of a Gaussian process that describes the statistical dependence of quantities of interest in the high-dimensional ambient space. The proposed method is applicable to arbitrarily high-dimensional data sets. Here, it is applied to subsurface characterization using a suite of well log measurements. The predictions obtained in original, principal component, and diffusion space are compared using both qualitative and quantitative metrics. Considerable improvement in the prediction of the geological structural properties is observed with the proposed method.
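
    A stripped-down version of the pipeline, with the paper's diffusion-distance kernel simplified to an ordinary RBF Gaussian process fitted on diffusion-map coordinates, might look as follows; the synthetic manifold data and all parameters are stand-ins for the well-log setting, not the authors' implementation.

        import numpy as np
        from sklearn.gaussian_process import GaussianProcessRegressor
        from sklearn.gaussian_process.kernels import RBF

        def diffusion_map(X, n_coords=2, t=1):
            """Basic diffusion-map embedding (median-bandwidth Gaussian kernel)."""
            D2 = ((X[:, None, :] - X[None, :, :]) ** 2).sum(-1)
            K = np.exp(-D2 / np.median(D2))
            P = K / K.sum(axis=1, keepdims=True)            # row-stochastic operator
            vals, vecs = np.linalg.eig(P)
            order = np.argsort(-vals.real)[1:n_coords + 1]  # skip the trivial mode
            return vecs.real[:, order] * vals.real[order] ** t

        rng = np.random.default_rng(2)
        u = rng.uniform(0.0, 2.0 * np.pi, 200)              # latent intrinsic variable
        X = np.stack([np.cos(u), np.sin(u), np.cos(2 * u), np.sin(2 * u)], axis=1)
        X = np.hstack([X, 0.05 * rng.standard_normal((200, 6))])   # 10-D ambient space
        y = np.sin(u) + 0.05 * rng.standard_normal(200)

        Z = diffusion_map(X)
        gp = GaussianProcessRegressor(kernel=RBF(1.0), alpha=1e-2).fit(Z[:150], y[:150])
        print("test R^2:", gp.score(Z[150:], y[150:]))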

  3. HIGH DIMENSIONAL COVARIANCE MATRIX ESTIMATION IN APPROXIMATE FACTOR MODELS.

    PubMed

    Fan, Jianqing; Liao, Yuan; Mincheva, Martina

    2011-01-01

    The variance covariance matrix plays a central role in the inferential theories of high dimensional factor models in finance and economics. Popular regularization methods that directly exploit sparsity are not directly applicable to many financial problems. Classical methods of estimating the covariance matrices are based on strict factor models, which assume independent idiosyncratic components. This assumption, however, is restrictive in practical applications. By assuming a sparse error covariance matrix, we allow for cross-sectional correlation even after taking out common factors, which enables us to combine the merits of both methods. We estimate the sparse covariance using the adaptive thresholding technique as in Cai and Liu (2011), taking into account the fact that direct observations of the idiosyncratic components are unavailable. The impact of high dimensionality on the covariance matrix estimation based on the factor structure is then studied.
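
    The structure of such an estimator can be sketched as follows: extract principal-component factors, then threshold the residual covariance. This sketch simplifies the adaptive entry-wise threshold of Cai and Liu (2011) to a single universal cutoff; dimensions and constants are illustrative.

        import numpy as np

        def factor_threshold_cov(X, n_factors=3, thresh=0.1):
            """Low-rank factor part plus thresholded sparse residual covariance."""
            X = X - X.mean(axis=0)
            S = np.cov(X, rowvar=False)
            vals, vecs = np.linalg.eigh(S)
            top = np.argsort(vals)[::-1][:n_factors]
            B = vecs[:, top] * np.sqrt(vals[top])     # estimated factor loadings
            resid = S - B @ B.T
            keep = np.abs(resid) >= thresh            # universal (not adaptive) cut
            np.fill_diagonal(keep, True)
            return B @ B.T + resid * keep

        rng = np.random.default_rng(3)
        F, L = rng.standard_normal((500, 3)), rng.standard_normal((50, 3))
        X = F @ L.T + rng.standard_normal((500, 50))  # 3-factor synthetic returns
        print(np.linalg.cond(factor_threshold_cov(X)))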

  4. One-dimensional Coulomb problem in Dirac materials

    NASA Astrophysics Data System (ADS)

    Downing, C. A.; Portnoi, M. E.

    2014-11-01

    We investigate the one-dimensional Coulomb potential with application to a class of quasirelativistic systems, so-called Dirac-Weyl materials, described by matrix Hamiltonians. We obtain the exact solution of the shifted and truncated Coulomb problems, with the wave functions expressed in terms of special functions (namely, Whittaker functions), while the energy spectrum must be determined via solutions to transcendental equations. Most notably, there are critical band gaps below which certain low-lying quantum states are missing in a manifestation of atomic collapse.

  5. An iterative bidirectional heuristic placement algorithm for solving the two-dimensional knapsack packing problem

    NASA Astrophysics Data System (ADS)

    Shiangjen, Kanokwatt; Chaijaruwanich, Jeerayut; Srisujjalertwaja, Wijak; Unachak, Prakarn; Somhom, Samerkae

    2018-02-01

    This article presents an efficient heuristic placement algorithm, namely, a bidirectional heuristic placement, for solving the two-dimensional rectangular knapsack packing problem. The heuristic demonstrates ways to maximize space utilization by fitting the appropriate rectangle from both sides of the wall of the current residual space, layer by layer. An iterative local search along with a shift strategy is developed and applied to the heuristic to balance the exploitation and exploration tasks in the solution space without tuning any parameters. The experimental results on many scales of packing problems show that this approach can produce high-quality solutions for most of the benchmark datasets, especially for large-scale problems, within a reasonable duration of computational time.

  6. Computing and visualizing time-varying merge trees for high-dimensional data

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Oesterling, Patrick; Heine, Christian; Weber, Gunther H.

    2017-06-03

    We introduce a new method that identifies and tracks features in arbitrary dimensions using the merge tree -- a structure for identifying topological features based on thresholding in scalar fields. This method analyzes the evolution of features of the function by tracking changes in the merge tree and relates features by matching subtrees between consecutive time steps. Using the time-varying merge tree, we present a structural visualization of the changing function that illustrates both features and their temporal evolution. We demonstrate the utility of our approach by applying it to temporal cluster analysis of high-dimensional point clouds.

  7. Numerical computations on one-dimensional inverse scattering problems

    NASA Technical Reports Server (NTRS)

    Dunn, M. H.; Hariharan, S. I.

    1983-01-01

    An approximate method to determine the index of refraction of a dielectric obstacle is presented. For simplicity, one dimensional models of electromagnetic scattering are treated. The governing equations yield a second order boundary value problem, in which the index of refraction appears as a functional parameter. The availability of reflection coefficients yields two additional boundary conditions. The index of refraction is approximated by a k-th order spline, which can be written as a linear combination of B-splines. For N distinct reflection coefficients, the resulting N boundary value problems yield a system of N nonlinear equations in N unknowns, which are the coefficients of the B-splines.

  8. Carbohydrate Cluster Microarrays Fabricated on 3-Dimensional Dendrimeric Platforms for Functional Glycomics Exploration

    PubMed Central

    Zhou, Xichun; Turchi, Craig; Wang, Denong

    2009-01-01

    We report here a novel, ready-to-use bioarray platform and methodology for the construction of sensitive carbohydrate cluster microarrays. This technology utilizes a 3-dimensional (3-D) poly(amidoamine) starburst dendrimer monolayer assembled on a glass surface, which is functionalized with terminal aminooxy and hydrazide groups for site-specific coupling of carbohydrates. A wide range of saccharides, including monosaccharides, oligosaccharides and polysaccharides of diverse structures, are applicable to the 3-D bioarray platform without prior chemical derivatization. The process of carbohydrate coupling is effectively accelerated by microwave radiation energy. The carbohydrate concentration required for microarray fabrication is substantially reduced using this technology. Importantly, this bioarray platform presents sugar chains in defined orientation and cluster configurations. It is, thus, uniquely useful for exploration of the structural and conformational diversities of glyco-epitopes and their functional properties. PMID:19791771

  9. The Cluster Environment of Two High-mass Protostars

    NASA Astrophysics Data System (ADS)

    Montes, Virginie; Hofner, Peter

    2017-06-01

    Characterizing the environment and stellar population in which high-mass stars form is an important step in deciding between the main massive star formation theories. In the monolithic collapse model, the mass of the core determines the final stellar mass (e.g., McKee & Tan 2003). In contrast, in the competitive accretion model (e.g., Bonnell & Bate 2006), the mass of the high-mass star is related to the properties of the cluster. As dynamical processes substantially affect the appearance of a cluster, we study early stages of high-mass star formation. These regions often show extended emission from hot dust at infrared wavelengths, which can cause difficulties in defining the cluster. We use a multi-wavelength technique to study nearby high-mass star clusters, based on X-ray observations with the Chandra X-Ray Telescope in conjunction with infrared data and VLA data. The technique relies on the fact that YSOs are particularly bright in X-rays and that contamination is relatively small. X-ray observations allow us to determine the cluster size. Cluster membership and YSO classification are established using infrared identification of the X-ray sources, and color-color and color-magnitude diagrams. In this talk, I will present our findings on the cluster study of two high-mass star forming regions: IRAS 20126+4104 and IRAS 16562-3959. While most massive stars appear to be formed in a rich cluster environment, these two sources are candidates for the formation of massive stars in a relatively poor cluster. In contrast to what was found in previous studies (Qiu et al. 2008), the dominant B0-type protostar in IRAS 20126+4104 is associated with a small cluster of low-mass stars. I will also show our current work on IRAS 16562-3959, which contains one of the most luminous O-type protostars in the Galaxy. In the vicinity of this particularly interesting region there is a multitude of small clusters, for which I will present how their stellar population differs from the

  10. Duke Workshop on High-Dimensional Data Sensing and Analysis

    DTIC Science & Technology

    2015-05-06

    Bayesian sparse factor analysis formulation of Chen et al. (2011); this work develops multi-label PCA (MLPCA), a generative dimension reduction...version of this problem was recently treated by Banerjee et al. [1], Ravikumar et al. [2], Kolar and Xing [3], and Höfling and Tibshirani [4]. As...Not applicable. Final Report: Duke Workshop on High-Dimensional Data Sensing and Analysis. Workshop Dates: July 26-28, 2011

  11. Optical signatures of high-redshift galaxy clusters

    NASA Technical Reports Server (NTRS)

    Evrard, August E.; Charlot, Stephane

    1994-01-01

    We combine an N-body and gasdynamic simulation of structure formation with an updated population synthesis code to explore the expected optical characteristics of a high-redshift cluster of galaxies. We examine a poor (2 keV) cluster formed in a biased, cold dark matter cosmology and employ simple, but plausible, threshold criteria to convert gas into stars. At z = 2, the forming cluster appears as a linear chain of very blue (g-r approximately equals 0) galaxies, with 15 objects brighter than r = 25 within a 1 square arcmin field of view. After 2 Gyr of evolution, the cluster viewed at z = 1 displays both freshly infalling blue galaxies and red galaxies robbed of recent accretion by interaction with the hot intracluster medium. The range in G-R colors is approximately 3 mag at z = 1, with the reddest objects lying at sites of highest galaxy density. We suggest that red, high-redshift galaxies lie in the cores of forming clusters and that their existence indicates the presence of a hot intracluster medium at redshifts z approximately equals 2. The simulated cluster viewed at z = 2 has several characteristics similar to the collection of faint, blue objects identified by Dressler et al. in a deep Hubble Space Telescope observation. The similarities provide some support for the interpretation of this collection as a high-redshift cluster of galaxies.

  12. Finite dimensional approximation of a class of constrained nonlinear optimal control problems

    NASA Technical Reports Server (NTRS)

    Gunzburger, Max D.; Hou, L. S.

    1994-01-01

    An abstract framework for the analysis and approximation of a class of nonlinear optimal control and optimization problems is constructed. Nonlinearities occur in both the objective functional and in the constraints. The framework includes an abstract nonlinear optimization problem posed on infinite dimensional spaces, an approximate problem posed on finite dimensional spaces, and a number of hypotheses concerning the two problems. The framework is used to show that optimal solutions exist, to show that Lagrange multipliers may be used to enforce the constraints, to derive an optimality system from which optimal states and controls may be deduced, and to derive existence results and error estimates for solutions of the approximate problem. The abstract framework and the results derived from that framework are then applied to three concrete control or optimization problems and their approximation by finite element methods. The first involves the von Karman plate equations of nonlinear elasticity, the second, the Ginzburg-Landau equations of superconductivity, and the third, the Navier-Stokes equations for incompressible, viscous flows.

  13. Problem decomposition by mutual information and force-based clustering

    NASA Astrophysics Data System (ADS)

    Otero, Richard Edward

    The scale of engineering problems has sharply increased over the last twenty years. Larger coupled systems, increasing complexity, and limited resources create a need for methods that automatically decompose problems into manageable sub-problems by discovering and leveraging problem structure. The ability to learn the coupling (inter-dependence) structure and reorganize the original problem could lead to large reductions in the time to analyze complex problems. Such decomposition methods could also provide engineering insight on the fundamental physics driving problem solution. This work forwards the current state of the art in engineering decomposition through the application of techniques originally developed within computer science and information theory. The work describes the current state of automatic problem decomposition in engineering and utilizes several promising ideas to advance the state of the practice. Mutual information is a novel metric for data dependence and works on both continuous and discrete data. Mutual information can measure both the linear and non-linear dependence between variables without the limitations of linear dependence measured through covariance. Mutual information is also able to handle data that does not have derivative information, unlike other metrics that require it. The value of mutual information to engineering design work is demonstrated on a planetary entry problem. This study utilizes a novel tool developed in this work for planetary entry system synthesis. A graphical method, force-based clustering, is used to discover related sub-graph structure as a function of problem structure and links ranked by their mutual information. This method does not require the stochastic use of neural networks and could be used with any link ranking method currently utilized in the field. Application of this method is demonstrated on a large, coupled low-thrust trajectory problem. Mutual information also serves as the basis for an
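
    The dependence measure at the heart of this decomposition approach is easy to state concretely: a histogram estimate of mutual information detects the quadratic dependence that covariance misses. The bin count and sample sizes below are arbitrary.

        import numpy as np

        def mutual_information(x, y, bins=16):
            """Histogram estimate of I(X;Y) in nats."""
            pxy, _, _ = np.histogram2d(x, y, bins=bins)
            pxy /= pxy.sum()
            px = pxy.sum(axis=1, keepdims=True)
            py = pxy.sum(axis=0, keepdims=True)
            nz = pxy > 0
            return float((pxy[nz] * np.log(pxy[nz] / (px @ py)[nz])).sum())

        rng = np.random.default_rng(4)
        x = rng.uniform(-1.0, 1.0, 10_000)
        print(mutual_information(x, x ** 2))          # strong nonlinear dependence
        print(mutual_information(x, rng.uniform(-1.0, 1.0, 10_000)))  # near zero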

  14. Coronal Mass Ejection Data Clustering and Visualization of Decision Trees

    NASA Astrophysics Data System (ADS)

    Ma, Ruizhe; Angryk, Rafal A.; Riley, Pete; Filali Boubrahimi, Soukaina

    2018-05-01

    Coronal mass ejections (CMEs) can be categorized as either “magnetic clouds” (MCs) or non-MCs. Features such as a large magnetic field, low plasma-beta, and low proton temperature suggest that a CME event is also an MC event; however, so far there is neither a definitive method nor an automatic process to distinguish the two. Human labeling is time-consuming, and results can fluctuate owing to the imprecise definition of such events. In this study, we approach the problem of MC and non-MC distinction from a time series data analysis perspective and show how clustering can shed some light on this problem. Although many algorithms exist for traditional data clustering in the Euclidean space, they are not well suited for time series data. Problems such as inadequate distance measures, inaccurate cluster center description, and lack of intuitive cluster representations need to be addressed for effective time series clustering. Our data analysis in this work is twofold: clustering and visualization. For clustering, we compared the results from the popular hierarchical agglomerative clustering technique to a distance density clustering heuristic we developed previously for time series data clustering. In both cases, dynamic time warping is used as the similarity measure. For classification as well as visualization, we use decision trees to aggregate single-dimensional clustering results to form a multidimensional time series decision tree, with averaged time series to present each decision. In this study, we achieved modest accuracy and, more importantly, an intuitive interpretation of how different parameters contribute to an MC event.
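
    The baseline combination mentioned here (hierarchical agglomerative clustering with dynamic time warping as the similarity measure, not the authors' distance density heuristic) can be sketched in a few lines; the toy series are synthetic.

        import numpy as np
        from scipy.cluster.hierarchy import linkage, fcluster
        from scipy.spatial.distance import squareform

        def dtw(a, b):
            """Classic O(len(a)*len(b)) dynamic time warping distance."""
            D = np.full((len(a) + 1, len(b) + 1), np.inf)
            D[0, 0] = 0.0
            for i in range(1, len(a) + 1):
                for j in range(1, len(b) + 1):
                    cost = abs(a[i - 1] - b[j - 1])
                    D[i, j] = cost + min(D[i - 1, j], D[i, j - 1], D[i - 1, j - 1])
            return D[-1, -1]

        rng = np.random.default_rng(5)
        t = np.linspace(0.0, 6.0, 50)
        series = ([np.sin(2 * t + rng.uniform(0, 1)) for _ in range(6)]
                  + [np.sign(np.sin(2 * t)) + 0.1 * rng.standard_normal(50)
                     for _ in range(6)])
        n = len(series)
        dist = np.zeros((n, n))
        for i in range(n):
            for j in range(i + 1, n):
                dist[i, j] = dist[j, i] = dtw(series[i], series[j])
        labels = fcluster(linkage(squareform(dist), method="average"),
                          t=2, criterion="maxclust")
        print(labels)   # the two wave families should separate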

  15. Performance of Extended Local Clustering Organization (LCO) for Large Scale Job-Shop Scheduling Problem (JSP)

    NASA Astrophysics Data System (ADS)

    Konno, Yohko; Suzuki, Keiji

    This paper describes an approach to developing a general-purpose solution algorithm for large-scale problems, using Local Clustering Organization (LCO) as a new solution method for the Job-shop scheduling problem (JSP). Building on the effective performance of LCO for large-scale scheduling reported in earlier studies, we examine whether solving JSP with LCO stably induces better solutions. To improve the performance of the solution for JSP, the optimization process of LCO is examined, and the scheduling solution structure is extended to a new structure based on machine division. A solving method that introduces effective local clustering for this solution structure is proposed as an extended LCO. The extended LCO algorithm improves the scheduling evaluation efficiently by clustering within a parallel search that extends over plural machines. Results of applying the extended LCO to problems of various scales verify that it minimizes the make-span and improves the stability of performance.

  16. L2-Boosting algorithm applied to high-dimensional problems in genomic selection.

    PubMed

    González-Recio, Oscar; Weigel, Kent A; Gianola, Daniel; Naya, Hugo; Rosa, Guilherme J M

    2010-06-01

    The L2-Boosting algorithm is one of the most promising machine-learning techniques that has appeared in recent decades. It may be applied to high-dimensional problems such as whole-genome studies, and it is relatively simple from a computational point of view. In this study, we used this algorithm in a genomic selection context to make predictions of yet-to-be-observed outcomes. Two data sets were used: (1) productive lifetime predicted transmitting abilities from 4702 Holstein sires genotyped for 32 611 single nucleotide polymorphisms (SNPs) derived from the Illumina BovineSNP50 BeadChip, and (2) progeny averages of food conversion rate, pre-corrected by environmental and mate effects, in 394 broilers genotyped for 3481 SNPs. Each of these data sets was split into training and testing sets, the latter comprising dairy or broiler sires whose ancestors were in the training set. Two weak learners, ordinary least squares (OLS) and non-parametric (NP) regression, were used for the L2-Boosting algorithm, to provide a stringent evaluation of the procedure. This algorithm was compared with BL [Bayesian LASSO (least absolute shrinkage and selection operator)] and BayesA regression. Learning tasks were carried out in the training set, whereas validation of the models was performed in the testing set. Pearson correlations between predicted and observed responses in the dairy cattle (broiler) data set were 0.65 (0.33), 0.53 (0.37), 0.66 (0.26) and 0.63 (0.27) for OLS-Boosting, NP-Boosting, BL and BayesA, respectively. The smallest bias and mean-squared errors (MSEs) were obtained with OLS-Boosting in both the dairy cattle (bias 0.08, MSE 1.08) and broiler (bias -0.011, MSE 0.006) data sets. In the dairy cattle data set, the BL was more accurate (bias=0.10 and MSE=1.10) than BayesA (bias=1.26 and MSE=2.81), whereas no differences between these two methods were found in the broiler data set. L2-Boosting with a suitable learner was found to be a competitive
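
    A bare-bones componentwise L2-Boosting loop with an OLS weak learner might look like this (one predictor refit to the residuals per step); the shrinkage nu, step count, and SNP-like synthetic design are illustrative only.

        import numpy as np

        def l2_boost(X, y, steps=200, nu=0.1):
            """Componentwise L2-Boosting with an OLS weak learner."""
            coef = np.zeros(X.shape[1])
            intercept = y.mean()
            resid = y - intercept
            for _ in range(steps):
                b = X.T @ resid / (X ** 2).sum(axis=0)     # per-column OLS fits
                sse = ((resid[:, None] - X * b) ** 2).sum(axis=0)
                j = int(np.argmin(sse))                    # best single predictor
                coef[j] += nu * b[j]
                resid -= nu * b[j] * X[:, j]
            return intercept, coef

        rng = np.random.default_rng(6)
        X = rng.standard_normal((300, 1000))               # p >> n, SNP-like design
        y = X[:, :4] @ np.array([1.0, -1.0, 0.5, 0.5]) + rng.standard_normal(300)
        b0, b = l2_boost(X, y)
        print(np.flatnonzero(np.abs(b) > 0.05))            # mostly the first 4 columns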

  17. Multiple co-clustering based on nonparametric mixture models with heterogeneous marginal distributions

    PubMed Central

    Yoshimoto, Junichiro; Shimizu, Yu; Okada, Go; Takamura, Masahiro; Okamoto, Yasumasa; Yamawaki, Shigeto; Doya, Kenji

    2017-01-01

    We propose a novel method for multiple clustering, which is useful for analysis of high-dimensional data containing heterogeneous types of features. Our method is based on nonparametric Bayesian mixture models in which features are automatically partitioned (into views) for each clustering solution. This feature partition works as feature selection for a particular clustering solution, which screens out irrelevant features. To make our method applicable to high-dimensional data, a co-clustering structure is newly introduced for each view. Further, the outstanding novelty of our method is that we simultaneously model different distribution families, such as Gaussian, Poisson, and multinomial distributions in each cluster block, which widens areas of application to real data. We apply the proposed method to synthetic and real data, and show that our method outperforms other multiple clustering methods both in recovering true cluster structures and in computation time. Finally, we apply our method to a depression dataset with no true cluster structure available, from which useful inferences are drawn about possible clustering structures of the data. PMID:29049392

  18. HIGH DIMENSIONAL COVARIANCE MATRIX ESTIMATION IN APPROXIMATE FACTOR MODELS

    PubMed Central

    Fan, Jianqing; Liao, Yuan; Mincheva, Martina

    2012-01-01

    The variance covariance matrix plays a central role in the inferential theories of high dimensional factor models in finance and economics. Popular regularization methods that directly exploit sparsity are not directly applicable to many financial problems. Classical methods of estimating the covariance matrices are based on strict factor models, which assume independent idiosyncratic components. This assumption, however, is restrictive in practical applications. By assuming a sparse error covariance matrix, we allow for cross-sectional correlation even after taking out common factors, which enables us to combine the merits of both methods. We estimate the sparse covariance using the adaptive thresholding technique as in Cai and Liu (2011), taking into account the fact that direct observations of the idiosyncratic components are unavailable. The impact of high dimensionality on the covariance matrix estimation based on the factor structure is then studied. PMID:22661790

  19. Bayesian Analysis of High Dimensional Classification

    NASA Astrophysics Data System (ADS)

    Mukhopadhyay, Subhadeep; Liang, Faming

    2009-12-01

    Modern data mining and bioinformatics have presented an important playground for statistical learning techniques, where the number of input variables is possibly much larger than the sample size of the training data. In supervised learning, logistic regression or probit regression can be used to model a binary output and form perceptron classification rules based on Bayesian inference. In these cases, there is much interest in searching for sparse models in the high dimensional regression/classification setup. We first discuss two common challenges for analyzing high dimensional data. The first one is the curse of dimensionality: the complexity of many existing algorithms scales exponentially with the dimensionality of the space, and by virtue of that, algorithms soon become computationally intractable and therefore inapplicable in many real applications. The second is multicollinearity among the predictors, which severely slows down the algorithm. In order to make Bayesian analysis operational in high dimensions, we propose a novel 'Hierarchical Stochastic Approximation Monte Carlo' algorithm (HSAMC), which overcomes the curse of dimensionality and the multicollinearity of predictors in high dimensions, and also possesses a self-adjusting mechanism to avoid the local minima separated by high energy barriers. Models and methods are illustrated by simulations inspired by the field of genomics. Numerical results indicate that HSAMC can work as a general model selection sampler in high dimensional complex model spaces.

  20. Local matrix learning in clustering and applications for manifold visualization.

    PubMed

    Arnonkijpanich, Banchar; Hasenfuss, Alexander; Hammer, Barbara

    2010-05-01

    Electronic data sets are increasing rapidly with respect to both the size of the data sets and the data resolution, i.e., dimensionality, such that adequate data inspection and data visualization have become central issues of data mining. In this article, we present an extension of classical clustering schemes by local matrix adaptation, which allows a better representation of data by means of clusters with an arbitrary spherical shape. Unlike previous proposals, the method is derived from a global cost function. The focus of this article is to demonstrate the applicability of this matrix clustering scheme to low-dimensional data embedding for data inspection. The proposed method is based on matrix learning for neural gas and manifold charting. This provides an explicit mapping of a given high-dimensional data space to low dimensionality. We demonstrate the usefulness of this method for data inspection and manifold visualization. 2009 Elsevier Ltd. All rights reserved.

  1. Information mining over heterogeneous and high-dimensional time-series data in clinical trials databases.

    PubMed

    Altiparmak, Fatih; Ferhatosmanoglu, Hakan; Erdal, Selnur; Trost, Donald C

    2006-04-01

    An effective analysis of clinical trials data involves analyzing different types of data, such as heterogeneous and high dimensional time series data. The current time series analysis methods generally assume that the series at hand have sufficient length to apply statistical techniques to them. Other ideal case assumptions are that data are collected in equal length intervals, and while comparing time series, the lengths are usually expected to be equal to each other. However, these assumptions are not valid for many real data sets, especially for the clinical trials data sets. In addition, the data sources are different from each other, the data are heterogeneous, and the sensitivity of the experiments varies by the source. Approaches for mining time series data need to be revisited, keeping the wide range of requirements in mind. In this paper, we propose a novel approach for information mining that involves two major steps: applying a data mining algorithm over homogeneous subsets of data, and identifying common or distinct patterns over the information gathered in the first step. Our approach is implemented specifically for heterogeneous and high dimensional time series clinical trials data. Using this framework, we propose a new way of utilizing frequent itemset mining, as well as clustering and declustering techniques with novel distance metrics for measuring similarity between time series data. By clustering the data, we find groups of analytes (substances in blood) that are most strongly correlated. Most of these relationships are already known and are verified by the clinical panels, and, in addition, we identify novel groups that need further biomedical analysis. A slight modification to our algorithm results in an effective declustering of high dimensional time series data, which is then used for "feature selection." Using industry-sponsored clinical trials data sets, we are able to identify a small set of analytes that effectively models the state of normal health.

  2. A heuristic approach to handle capacitated facility location problem evaluated using clustering internal evaluation

    NASA Astrophysics Data System (ADS)

    Sutanto, G. R.; Kim, S.; Kim, D.; Sutanto, H.

    2018-03-01

    One of the problems in dealing with the capacitated facility location problem (CFLP) arises from the mismatch between the capacities of facilities and the number of customers that need to be served. A facility with small capacity may leave some customers uncovered. These customers need to be re-allocated to another facility that still has available capacity. Therefore, an approach is proposed to handle CFLP by using the k-means clustering algorithm to handle customers' allocation. Whether customers' re-allocation is needed is then decided by the overall average distance between customers and the facilities. This new approach is benchmarked against the existing approach by Liao and Guo, which also uses the k-means clustering algorithm as a base idea to decide the facility locations and customers' allocation. Both approaches are evaluated using three clustering evaluation methods based on connectedness, compactness, and separation factors.
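
    The two-stage idea (k-means for siting, then re-allocation of overflow customers) can be sketched as below. This is a simplified greedy re-allocation of our own, not the exact heuristic of the paper or of Liao and Guo; it assumes total capacity exceeds the number of customers.

        import numpy as np
        from sklearn.cluster import KMeans

        def capacitated_assign(points, k, capacity, seed=0):
            """k-means siting, then greedy re-allocation of overflow customers."""
            km = KMeans(n_clusters=k, n_init=10, random_state=seed).fit(points)
            centers, labels = km.cluster_centers_, km.labels_.copy()
            load = np.bincount(labels, minlength=k)
            d = np.linalg.norm(points[:, None, :] - centers[None, :, :], axis=2)
            for f in range(k):
                while load[f] > capacity:
                    members = np.flatnonzero(labels == f)
                    alt = d[members].copy()
                    alt[:, load >= capacity] = np.inf      # only facilities with room
                    m, g = np.unravel_index(np.argmin(alt), alt.shape)
                    labels[members[m]] = g                 # move cheapest customer
                    load[f] -= 1
                    load[g] += 1
            return centers, labels

        rng = np.random.default_rng(7)
        pts = rng.uniform(0.0, 10.0, (60, 2))
        centers, labels = capacitated_assign(pts, k=4, capacity=18)
        print(np.bincount(labels, minlength=4))            # all loads <= 18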

  3. Cluster state generation in one-dimensional Kitaev honeycomb model via shortcut to adiabaticity

    NASA Astrophysics Data System (ADS)

    Kyaw, Thi Ha; Kwek, Leong-Chuan

    2018-04-01

    We propose a means to obtain computationally useful resource states, also known as cluster states, for measurement-based quantum computation via the transitionless quantum driving algorithm. The idea is to cool the system to its unique ground state and tune some control parameters to arrive at a computationally useful resource state, which is one of the degenerate ground states. Even though there is a set of conserved quantities already present in the model Hamiltonian, which prevents the instantaneous state from moving to any other eigenstate subspace, one cannot simply quench the control parameters to get the desired state; in that case, the state will not evolve. With the involvement of the shortcut Hamiltonian, we obtain cluster states in a fast-forward manner. We elaborate our proposal in the one-dimensional Kitaev honeycomb model, and show that the auxiliary Hamiltonian needed for the counterdiabatic driving is of M-body interaction.

  4. Progeny Clustering: A Method to Identify Biological Phenotypes

    PubMed Central

    Hu, Chenyue W.; Kornblau, Steven M.; Slater, John H.; Qutub, Amina A.

    2015-01-01

    Estimating the optimal number of clusters is a major challenge in applying cluster analysis to any type of dataset, especially to biomedical datasets, which are high-dimensional and complex. Here, we introduce an improved method, Progeny Clustering, which is stability-based and exceptionally efficient in computing, to find the ideal number of clusters. The algorithm employs a novel Progeny Sampling method to reconstruct cluster identity, a co-occurrence probability matrix to assess the clustering stability, and a set of reference datasets to overcome inherent biases in the algorithm and data space. Our method was shown to be successful and robust when applied to two synthetic datasets (a two-dimensional dataset and a ten-dimensional dataset containing eight dimensions of pure noise), two standard biological datasets (the Iris dataset and Rat CNS dataset) and two biological datasets (a cell phenotype dataset and an acute myeloid leukemia (AML) reverse phase protein array (RPPA) dataset). Progeny Clustering outperformed some popular clustering evaluation methods in the ten-dimensional synthetic dataset as well as in the cell phenotype dataset, and it was the only method that successfully discovered clinically meaningful patient groupings in the AML RPPA dataset. PMID:26267476

  5. Segmentation of High Angular Resolution Diffusion MRI using Sparse Riemannian Manifold Clustering

    PubMed Central

    Wright, Margaret J.; Thompson, Paul M.; Vidal, René

    2015-01-01

    We address the problem of segmenting high angular resolution diffusion imaging (HARDI) data into multiple regions (or fiber tracts) with distinct diffusion properties. We use the orientation distribution function (ODF) to represent HARDI data and cast the problem as a clustering problem in the space of ODFs. Our approach integrates tools from sparse representation theory and Riemannian geometry into a graph theoretic segmentation framework. By exploiting the Riemannian properties of the space of ODFs, we learn a sparse representation for each ODF and infer the segmentation by applying spectral clustering to a similarity matrix built from these representations. In cases where regions with similar (resp. distinct) diffusion properties belong to different (resp. same) fiber tracts, we obtain the segmentation by incorporating spatial and user-specified pairwise relationships into the formulation. Experiments on synthetic data evaluate the sensitivity of our method to image noise and the presence of complex fiber configurations, and show its superior performance compared to alternative segmentation methods. Experiments on phantom and real data demonstrate the accuracy of the proposed method in segmenting simulated fibers, as well as white matter fiber tracts of clinical importance in the human brain. PMID:24108748

  6. Three-dimensional visualization of cultural clusters in the 1878 yellow fever epidemic of New Orleans

    PubMed Central

    Curtis, Andrew J

    2008-01-01

    Background An epidemic may exhibit different spatial patterns with a change in geographic scale, with each scale having different conduits and impediments to disease spread. Mapping disease at each of these scales often reveals different cluster patterns. This paper will consider this change of geographic scale in an analysis of yellow fever deaths for New Orleans in 1878. Global clustering for the whole city will be followed by a focus on the French Quarter, then clusters of that area, and finally street-level patterns of a single cluster. The three-dimensional visualization capabilities of a GIS will be used as part of a cluster creation process that incorporates physical buildings in calculating mortality-to-mortality distance. Including nativity of the deceased will also capture cultural connection. Results Twenty-two yellow fever clusters were identified for the French Quarter. These generally mirror the results of other global cluster and density surfaces created for the entire epidemic in New Orleans. However, the addition of building-distance and a disease-specific time frame between deaths reveals that disease spread contains a cultural component. Same-nativity mortality clusters emerge in a similar time frame irrespective of proximity. Italian nativity mortalities were far more densely grouped than any of the other cohorts. A final examination of mortalities for one of the nativity clusters reveals that further sub-division is present, and that this pattern would only be revealed at this scale (street level) of investigation. Conclusion Disease spread in an epidemic is complex, resulting from a combination of geographic distance, geographic distance with specific connection to the built environment, disease-specific time frame between deaths, impediments such as herd immunity, and social or cultural connection. This research has shown that cultural connection may be more important than simple proximity, which in turn might mean traditional

  7. Three-dimensional visualization of cultural clusters in the 1878 yellow fever epidemic of New Orleans.

    PubMed

    Curtis, Andrew J

    2008-08-22

    An epidemic may exhibit different spatial patterns with a change in geographic scale, with each scale having different conduits and impediments to disease spread. Mapping disease at each of these scales often reveals different cluster patterns. This paper will consider this change of geographic scale in an analysis of yellow fever deaths for New Orleans in 1878. Global clustering for the whole city will be followed by a focus on the French Quarter, then clusters of that area, and finally street-level patterns of a single cluster. The three-dimensional visualization capabilities of a GIS will be used as part of a cluster creation process that incorporates physical buildings in calculating mortality-to-mortality distance. Including nativity of the deceased will also capture cultural connection. Twenty-two yellow fever clusters were identified for the French Quarter. These generally mirror the results of other global cluster and density surfaces created for the entire epidemic in New Orleans. However, the addition of building-distance and a disease-specific time frame between deaths reveals that disease spread contains a cultural component. Same-nativity mortality clusters emerge in a similar time frame irrespective of proximity. Italian nativity mortalities were far more densely grouped than any of the other cohorts. A final examination of mortalities for one of the nativity clusters reveals that further sub-division is present, and that this pattern would only be revealed at this scale (street level) of investigation. Disease spread in an epidemic is complex, resulting from a combination of geographic distance, geographic distance with specific connection to the built environment, disease-specific time frame between deaths, impediments such as herd immunity, and social or cultural connection. This research has shown that cultural connection may be more important than simple proximity, which in turn might mean traditional quarantine measures should be

  8. Adaptive finite element methods for two-dimensional problems in computational fracture mechanics

    NASA Technical Reports Server (NTRS)

    Min, J. B.; Bass, J. M.; Spradley, L. W.

    1994-01-01

    Some recent results obtained using solution-adaptive finite element methods in two-dimensional problems in linear elastic fracture mechanics are presented. The focus is on the basic issue of adaptive finite element methods for validating the new methodology by computing demonstration problems and comparing the stress intensity factors to analytical results.

  9. BOOK REVIEW: The Gravitational Million-Body Problem: A Multidisciplinary Approach to Star Cluster Dynamics

    NASA Astrophysics Data System (ADS)

    Heggie, D.; Hut, P.

    2003-10-01

    focus on N = 10^6 for two main reasons: first, direct numerical integrations of N-body systems are beginning to approach this threshold, and second, globular star clusters provide remarkably accurate physical instantiations of the idealized N-body problem with N = 10^5 - 10^6. The authors are distinguished contributors to the study of star-cluster dynamics and the gravitational N-body problem. The book contains lucid and concise descriptions of most of the important tools in the subject, with only a modest bias towards the authors' own interests. These tools include the two-body relaxation approximation, the Vlasov and Fokker-Planck equations, regularization of close encounters, conducting fluid models, Hill's approximation, Heggie's law for binary star evolution, symplectic integration algorithms, Liapunov exponents, and so on. The book also provides an up-to-date description of the principal processes that drive the evolution of idealized N-body systems - two-body relaxation, mass segregation, escape, core collapse and core bounce, binary star hardening, gravothermal oscillations - as well as additional processes such as stellar collisions and tidal shocks that affect real star clusters but not idealized N-body systems. In a relatively short (300 pages plus appendices) book such as this, many topics have to be omitted. The reader who is hoping to learn about the phenomenology of star clusters will be disappointed, as the description of their properties is limited to only a page of text; there is also almost no discussion of other, equally interesting N-body systems such as galaxies (N ≈ 10^6 - 10^12), open clusters (N ≈ 10^2 - 10^4), planetary systems, or the star clusters surrounding black holes that are found in the centres of most galaxies. All of these omissions are defensible decisions. Less defensible is the uneven set of references in the text; for example, nowhere is the reader informed that the classic predecessor to this work was Spitzer's 1987 monograph

  10. Peak clustering in two-dimensional gas chromatography with mass spectrometric detection based on theoretical calculation of two-dimensional peak shapes: the 2DAid approach.

    PubMed

    van Stee, Leo L P; Brinkman, Udo A Th

    2011-10-28

    A method is presented to facilitate the non-target analysis of data obtained in temperature-programmed comprehensive two-dimensional (2D) gas chromatography coupled to time-of-flight mass spectrometry (GC×GC-ToF-MS). One main difficulty of GC×GC data analysis is that each peak is usually modulated several times and therefore appears as a series of peaks (or peaklets) in the one-dimensionally recorded data. The proposed method, 2DAid, uses basic chromatographic laws to calculate the theoretical shape of a 2D peak (a cluster of peaklets originating from the same analyte) in order to define the area in which the peaklets of each individual compound can be expected to show up. Based on analyte-identity information obtained by means of mass spectral library searching, the individual peaklets are then combined into a single 2D peak. The method is applied, amongst others, to a complex mixture containing 362 analytes. It is demonstrated that the 2D peak shapes can be accurately predicted and that clustering and further processing can reduce the final peak list to a manageable size. Copyright © 2011 Elsevier B.V. All rights reserved.

  11. On the modification Highly Connected Subgraphs (HCS) algorithm in graph clustering for weighted graph

    NASA Astrophysics Data System (ADS)

    Albirri, E. R.; Sugeng, K. A.; Aldila, D.

    2018-04-01

    Nowadays, as technology and human civilization progress, almost all cities in the world are connected, and the various places of this world are easier to visit; this is an effect of transportation technology and highway construction. Cities that are connected in this way can be represented by a graph. Graph clustering is one way to answer problems represented by graphs, and several graph clustering methods exist for solving such problems specifically. One of them is the Highly Connected Subgraphs (HCS) method. HCS identifies clusters based on the edge connectivity k(G) of a graph G: a subgraph with n vertices is called highly connected, and taken as a cluster, when k(G) > n/2. This research used a literature review and is completed with a program simulation in software. We modified the HCS algorithm to work on weighted graphs; the modification is located in the Process Phase, which cuts the connected graph G into two subgraphs H and H̄. We also implemented the program in the software Octave 4.0.1 and applied it to flight-route mapping data from one of the airlines in Indonesia.
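
    For reference, the unweighted HCS recursion is compact when written with networkx (the paper's contribution, modifying the cut phase for weighted graphs, is not reproduced here); the barbell test graph and minimum cluster size are our own choices.

        import networkx as nx

        def hcs(G, min_size=3):
            """Highly Connected Subgraphs: a subgraph is a cluster when its
            edge connectivity exceeds half its number of vertices."""
            if len(G) < min_size or nx.edge_connectivity(G) > len(G) / 2:
                return [set(G)]
            H = G.copy()
            H.remove_edges_from(nx.minimum_edge_cut(G))    # split at a minimum cut
            clusters = []
            for comp in nx.connected_components(H):
                clusters += hcs(G.subgraph(comp).copy(), min_size)
            return clusters

        G = nx.barbell_graph(5, 0)      # two K5 cliques joined by one bridge edge
        print(hcs(G))                   # expected: the two cliques as clusters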

  12. HIGH-ENERGY NEUTRINOS FROM SOURCES IN CLUSTERS OF GALAXIES

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Fang, Ke; Olinto, Angela V.

    2016-09-01

    High-energy cosmic rays can be accelerated in clusters of galaxies, by mega-parsec scale shocks induced by the accretion of gas during the formation of large-scale structures, or by powerful sources harbored in clusters. Once accelerated, the highest energy particles leave the cluster via almost rectilinear trajectories, while lower energy ones can be confined by the cluster magnetic field up to cosmological time and interact with the intracluster gas. Using a realistic model of the baryon distribution and the turbulent magnetic field in clusters, we studied the propagation and hadronic interaction of high-energy protons in the intracluster medium. We report the cumulative cosmic-ray and neutrino spectra generated by galaxy clusters, including embedded sources, and demonstrate that clusters can contribute a significant fraction of the observed IceCube neutrinos above 30 TeV while remaining undetected in high-energy cosmic rays and γ rays for reasonable choices of parameters and source scenarios.

  13. How to cluster in parallel with neural networks

    NASA Technical Reports Server (NTRS)

    Kamgar-Parsi, Behzad; Gualtieri, J. A.; Devaney, Judy E.; Kamgar-Parsi, Behrooz

    1988-01-01

    Partitioning a set of N patterns in a d-dimensional metric space into K clusters - in a way that those in a given cluster are more similar to each other than the rest - is a problem of interest in astrophysics, image analysis and other fields. As there are approximately K^N/K! possible ways of partitioning the patterns among K clusters, finding the best solution is beyond exhaustive search when N is large. Researchers show that this problem can be formulated as an optimization problem for which very good, but not necessarily optimal, solutions can be found by using a neural network. To do this the network must start from many randomly selected initial states. The network is simulated on the MPP (a 128 x 128 SIMD array machine), where researchers use the massive parallelism not only in solving the differential equations that govern the evolution of the network, but also by starting the network from many initial states at once, thus obtaining many solutions in one run. Researchers obtain speedups of two to three orders of magnitude over serial implementations and, through analog VLSI implementations, the promise of speedups commensurate with human perceptual abilities.

  14. Cluster-based control of a separating flow over a smoothly contoured ramp

    NASA Astrophysics Data System (ADS)

    Kaiser, Eurika; Noack, Bernd R.; Spohn, Andreas; Cattafesta, Louis N.; Morzyński, Marek

    2017-12-01

    The ability to manipulate and control fluid flows is of great importance in many scientific and engineering applications. The proposed closed-loop control framework addresses a key issue of model-based control: The actuation effect often results from slow dynamics of strongly nonlinear interactions which the flow reveals at timescales much longer than the prediction horizon of any model. Hence, we employ a probabilistic approach based on a cluster-based discretization of the Liouville equation for the evolution of the probability distribution. The proposed methodology frames high-dimensional, nonlinear dynamics into low-dimensional, probabilistic, linear dynamics which considerably simplifies the optimal control problem while preserving nonlinear actuation mechanisms. The data-driven approach builds upon a state space discretization using a clustering algorithm which groups kinematically similar flow states into a low number of clusters. The temporal evolution of the probability distribution on this set of clusters is then described by a control-dependent Markov model. This Markov model can be used as predictor for the ergodic probability distribution for a particular control law. This probability distribution approximates the long-term behavior of the original system on which basis the optimal control law is determined. We examine how the approach can be used to improve the open-loop actuation in a separating flow dominated by Kelvin-Helmholtz shedding. For this purpose, the feature space, in which the model is learned, and the admissible control inputs are tailored to strongly oscillatory flows.
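
    The discretization step of such a framework (cluster the snapshots, then describe their evolution with a Markov transition matrix) can be illustrated on toy data; everything below, including the oscillatory stand-in snapshots and the cluster count, is invented for illustration and is not the authors' flow data.

        import numpy as np
        from sklearn.cluster import KMeans

        rng = np.random.default_rng(8)
        t = np.linspace(0.0, 40.0 * np.pi, 4000)
        snaps = np.stack([np.sin(t), np.cos(t),
                          0.1 * rng.standard_normal(4000)], axis=1)

        k = 6
        labels = KMeans(n_clusters=k, n_init=10, random_state=0).fit_predict(snaps)

        # Transition matrix between clusters from consecutive time steps.
        P = np.zeros((k, k))
        for a, b in zip(labels[:-1], labels[1:]):
            P[a, b] += 1.0
        P /= P.sum(axis=1, keepdims=True)

        # Ergodic (stationary) distribution: leading left eigenvector of P.
        vals, vecs = np.linalg.eig(P.T)
        pi = np.real(vecs[:, np.argmax(np.real(vals))])
        print(pi / pi.sum())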

  15. Cluster Analysis and Gaussian Mixture Estimation of Correlated Time-Series by Means of Multi-dimensional Scaling

    NASA Astrophysics Data System (ADS)

    Ibuki, Takero; Suzuki, Sei; Inoue, Jun-ichi

    We investigate cross-correlations between typical Japanese stocks collected through the Yahoo!Japan website ( http://finance.yahoo.co.jp/ ). By making use of multi-dimensional scaling (MDS) for the cross-correlation matrices, we draw two-dimensional scattered plots in which each point corresponds to each stock. To cluster these data plots, we utilize a mixture of Gaussians, fitting the data set to several Gaussian densities. By minimizing the so-called Akaike Information Criterion (AIC) with respect to the parameters in the mixture, we attempt to specify the best possible mixture of Gaussians. It might naturally be assumed that all the two-dimensional data points of stocks shrink into a single small region when some economic crisis takes place. The justification of this assumption is numerically checked for the empirical Japanese stock data, for instance, those around 11 March 2011.
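
    The analysis pipeline (correlation distance, MDS to two dimensions, Gaussian mixture chosen by minimizing AIC) is easy to reproduce in outline with scikit-learn; the random stand-in returns below replace the Yahoo!Japan stock data.

        import numpy as np
        from sklearn.manifold import MDS
        from sklearn.mixture import GaussianMixture

        rng = np.random.default_rng(9)
        R = rng.standard_normal((30, 250))     # 30 stocks x 250 daily returns
        C = np.corrcoef(R)
        D = np.sqrt(2.0 * (1.0 - C))           # standard correlation distance

        xy = MDS(n_components=2, dissimilarity="precomputed",
                 random_state=0).fit_transform(D)

        # Choose the number of Gaussian components by minimizing AIC.
        aic = {m: GaussianMixture(n_components=m, random_state=0).fit(xy).aic(xy)
               for m in range(1, 7)}
        best = min(aic, key=aic.get)
        print("components:", best)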

  16. Three-Dimensional Electromagnetic High Frequency Axisymmetric Cavity Scars.

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Warne, Larry Kevin; Jorgenson, Roy Eberhardt

    This report examines the localization of high frequency electromagnetic fields in three-dimensional axisymmetric cavities along periodic paths between opposing sides of the cavity. The cases where these orbits lead to unstable localized modes are known as scars. This report treats both the case where the opposing sides, or mirrors, are convex, where there are no interior foci, and the case where they are concave, leading to interior foci. The scalar problem is treated first, but the approximations required to treat the vector field components are also examined. Particular attention is focused on the normalization through the electromagnetic energy theorem. Both projections of the field along the scarred orbit as well as point statistics are examined. Statistical comparisons are made with a numerical calculation of the scars run with an axisymmetric simulation. This axisymmetric case forms the opposite extreme (where the two mirror radii at each end of the ray orbit are equal) from the two-dimensional solution examined previously (where one mirror radius is vastly different from the other). The enhancement of the field on the orbit axis can be larger here than in the two-dimensional case.

  17. Probabilistic classifiers with high-dimensional data

    PubMed Central

    Kim, Kyung In; Simon, Richard

    2011-01-01

    For medical classification problems, it is often desirable to have a probability associated with each class. Probabilistic classifiers have received relatively little attention for small n, large p classification problems, despite their importance in medical decision making. In this paper, we introduce 2 criteria for the assessment of probabilistic classifiers, well-calibratedness and refinement, and develop corresponding evaluation measures. We evaluated several published high-dimensional probabilistic classifiers and developed 2 extensions of the Bayesian compound covariate classifier. Based on simulation studies and analysis of gene expression microarray data, we found that proper probabilistic classification is more difficult than deterministic classification. It is important to ensure that a probabilistic classifier is well calibrated, or at least not “anticonservative”, using the methods developed here. We provide this evaluation for several probabilistic classifiers and also evaluate their refinement as a function of sample size under weak and strong signal conditions. We also present a cross-validation method for evaluating the calibration and refinement of any probabilistic classifier on any data set. PMID:21087946

  18. An efficient three-dimensional Poisson solver for SIMD high-performance-computing architectures

    NASA Technical Reports Server (NTRS)

    Cohl, H.

    1994-01-01

    We present an algorithm that solves the three-dimensional Poisson equation on a cylindrical grid. The technique uses a finite-difference scheme with operator splitting. This splitting maps the banded structure of the operator matrix into a two-dimensional set of tridiagonal matrices, which are then solved in parallel. Our algorithm couples FFT techniques with the well-known ADI (Alternating Direction Implicit) method for solving elliptic PDEs, and the implementation is extremely well suited for a massively parallel environment like the SIMD architecture of the MasPar MP-1. Due to the highly recursive nature of our problem, we believe that our method is highly efficient, as it avoids excessive interprocessor communication.
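
    Each split direction reduces to a set of independent tridiagonal systems, which the paper solves in parallel across grid lines on the SIMD machine. For reference, a minimal serial kernel for one such system (the classical Thomas algorithm; array names are ours):

    ```python
    import numpy as np

    def thomas(a, b, c, d):
        """Solve one tridiagonal system Ax = d, where a, b and c hold the
        sub-, main and super-diagonals of A (a[0] and c[-1] are unused)."""
        n = len(b)
        cp, dp = np.empty(n), np.empty(n)
        cp[0], dp[0] = c[0] / b[0], d[0] / b[0]
        for i in range(1, n):                   # forward elimination
            m = b[i] - a[i] * cp[i - 1]
            cp[i] = c[i] / m
            dp[i] = (d[i] - a[i] * dp[i - 1]) / m
        x = np.empty(n)
        x[-1] = dp[-1]
        for i in range(n - 2, -1, -1):          # back substitution
            x[i] = dp[i] - cp[i] * x[i + 1]
        return x
    ```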

  19. Toeplitz Inverse Covariance-Based Clustering of Multivariate Time Series Data

    PubMed Central

    Hallac, David; Vare, Sagar; Boyd, Stephen; Leskovec, Jure

    2018-01-01

    Subsequence clustering of multivariate time series is a useful tool for discovering repeated patterns in temporal data. Once these patterns have been discovered, seemingly complicated datasets can be interpreted as a temporal sequence of only a small number of states, or clusters. For example, raw sensor data from a fitness-tracking application can be expressed as a timeline of a select few actions (e.g., walking, sitting, running). However, discovering these patterns is challenging because it requires simultaneous segmentation and clustering of the time series. Furthermore, interpreting the resulting clusters is difficult, especially when the data is high-dimensional. Here we propose a new method of model-based clustering, which we call Toeplitz Inverse Covariance-based Clustering (TICC). Each cluster in the TICC method is defined by a correlation network, or Markov random field (MRF), characterizing the interdependencies between different observations in a typical subsequence of that cluster. Based on this graphical representation, TICC simultaneously segments and clusters the time series data. We solve the TICC problem through alternating minimization, using a variation of the expectation maximization (EM) algorithm. We derive closed-form solutions to efficiently solve the two resulting subproblems in a scalable way, through dynamic programming and the alternating direction method of multipliers (ADMM), respectively. We validate our approach by comparing TICC to several state-of-the-art baselines in a series of synthetic experiments, and we then demonstrate on an automobile sensor dataset how TICC can be used to learn interpretable clusters in real-world scenarios. PMID:29770257
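
    The alternation at the heart of TICC can be conveyed by a deliberately stripped-down sketch: sliding windows are assigned to the best-fitting Gaussian model, and the models are then refit. This omits TICC's Toeplitz-constrained ADMM covariance estimation and its temporal-consistency penalty, so it only illustrates the alternating-minimization skeleton; all names and parameters are ours:

    ```python
    import numpy as np

    def simplified_ticc(X, k, window=5, iters=10, seed=0):
        """Crude alternation over sliding windows of a (T, n) series X.

        Not the TICC estimator: no Toeplitz constraint, no ADMM, no
        switching penalty -- illustration of the E/M-style loop only.
        """
        rng = np.random.default_rng(seed)
        # Stack each length-`window` subsequence into one feature vector.
        W = np.stack([X[t:t + window].ravel()
                      for t in range(len(X) - window + 1)])
        z = rng.integers(0, k, size=len(W))     # random initial assignment
        for _ in range(iters):
            mus, precs, logdets = [], [], []
            for j in range(k):
                Wj = W[z == j] if np.any(z == j) else W
                cov = np.cov(Wj.T) + 1e-3 * np.eye(W.shape[1])  # ridge
                mus.append(Wj.mean(axis=0))
                precs.append(np.linalg.inv(cov))
                logdets.append(np.linalg.slogdet(cov)[1])
            # Reassign each window to its highest-likelihood cluster.
            ll = np.stack([
                -0.5 * (np.einsum("ij,jk,ik->i",
                                  W - mus[j], precs[j], W - mus[j])
                        + logdets[j])
                for j in range(k)
            ])
            z = ll.argmax(axis=0)
        return z
    ```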

  20. Comment on "Calculations for the one-dimensional soft Coulomb problem and the hard Coulomb limit".

    PubMed

    Carrillo-Bernal, M A; Núñez-Yépez, H N; Salas-Brito, A L; Solis, Didier A

    2015-02-01

    In the referred paper, the authors use a numerical method for solving ordinary differential equations and a softened Coulomb potential −1/√(x² + β²) to study the one-dimensional Coulomb problem by letting the parameter β approach zero. We note that even though their numerical findings in the soft-potential scenario are correct, their conclusions do not extend to the one-dimensional Coulomb problem (β = 0). Their claims regarding the possible existence of an even ground state with energy −∞ with a Dirac-δ eigenfunction, and of well-defined parity eigenfunctions in the one-dimensional hydrogen atom, are questioned.

  1. A boundary element alternating method for two-dimensional mixed-mode fracture problems

    NASA Technical Reports Server (NTRS)

    Raju, I. S.; Krishnamurthy, T.

    1992-01-01

    A boundary element alternating method, denoted herein as BEAM, is presented for two-dimensional fracture problems. This is an iterative method which alternates between two solutions. An analytical solution for arbitrary polynomial normal and tangential pressure distributions applied to the crack faces of an embedded crack in an infinite plate is used as the fundamental solution in the alternating method. A boundary element method for an uncracked finite plate is the second solution. For problems of edge cracks, a technique of utilizing finite elements with BEAM is presented to overcome the inherent singularity in boundary element stress calculation near the boundaries. Several computational aspects that make the algorithm efficient are presented. Finally, the BEAM is applied to a variety of two-dimensional crack problems with different configurations and loadings to assess the validity of the method. The method gives accurate stress intensity factors with minimal computing effort.

  2. Numerical approximation for the infinite-dimensional discrete-time optimal linear-quadratic regulator problem

    NASA Technical Reports Server (NTRS)

    Gibson, J. S.; Rosen, I. G.

    1986-01-01

    An abstract approximation framework is developed for the finite and infinite time horizon discrete-time linear-quadratic regulator problem for systems whose state dynamics are described by a linear semigroup of operators on an infinite dimensional Hilbert space. The schemes included in the framework yield finite dimensional approximations to the linear state feedback gains which determine the optimal control law. Convergence arguments are given. Examples involving hereditary and parabolic systems and the vibration of a flexible beam are considered. Spline-based finite element schemes for these classes of problems, together with numerical results, are presented and discussed.

  3. A Speculative Study on Negative-Dimensional Potential and Wave Problems by Implicit Calculus Modeling Approach

    NASA Astrophysics Data System (ADS)

    Chen, Wen; Wang, Fajie

    Based on the implicit calculus equation modeling approach, this paper proposes a speculative concept of the potential and wave operators on negative dimensionality. Unlike standard partial differential equation (PDE) modeling, the implicit calculus modeling approach does not require an explicit expression of the governing PDE. Instead, the fundamental solution of the physical problem is used to implicitly define the differential operator and to carry out the simulation in conjunction with the appropriate boundary conditions. In this study, we conjecture an extension of the fundamental solutions of the standard Laplace and Helmholtz equations to negative dimensionality. Then, using the singular boundary method, a recent boundary discretization technique, we investigate potential and wave problems with the fundamental solution on negative dimensionality. Numerical experiments reveal that the physical behaviors on negative dimensionality may differ from those on positive dimensionality. This speculative study might open an unexplored territory for research.

  4. Cluster size dependence of high-order harmonic generation

    NASA Astrophysics Data System (ADS)

    Tao, Y.; Hagmeijer, R.; Bastiaens, H. M. J.; Goh, S. J.; van der Slot, P. J. M.; Biedron, S. G.; Milton, S. V.; Boller, K.-J.

    2017-08-01

    We investigate high-order harmonic generation (HHG) from noble gas clusters in a supersonic gas jet. To identify the contribution of harmonic generation from clusters versus that from gas monomers, we measure the high-order harmonic output over a broad range of the total atomic number density in the jet (from 3×10¹⁶ to 3×10¹⁸ cm⁻³) at two different reservoir temperatures (303 and 363 K). For the first time in the evaluation of the harmonic yield in such measurements, the variation of the liquid mass fraction, g, versus pressure and temperature is taken into consideration, which we determine, reliably and consistently, to be below 20% within our range of experimental parameters. By comparing the measured harmonic yield from a thin jet with the calculated corresponding yield from monomers alone, we find an increased emission of the harmonics when the average cluster size is less than 3000. Using g, under the assumption that the emission from monomers and clusters add up coherently, we calculate the ratio of the average single-atom response of an atom within a cluster to that of a monomer and find an enhancement of around 100 for very small average cluster size (∼200). We do not find any dependence of the cut-off frequency on the composition of the cluster jet. This implies that HHG in clusters is based on electrons that return to their parent ions and not to neighboring ions in the cluster. To fully employ the enhanced average single-atom response found for small average cluster sizes (∼200), the nozzle producing the cluster jet must provide a large liquid mass fraction at these small cluster sizes for increasing the harmonic yield. Moreover, cluster jets may allow for quasi-phase matching, as the higher mass of clusters allows for a higher density contrast in spatially structuring the nonlinear medium.

  5. High Performance Parallel Analysis of Coupled Problems for Aircraft Propulsion

    NASA Technical Reports Server (NTRS)

    Felippa, C. A.; Farhat, C.; Lanteri, S.; Maman, N.; Piperno, S.; Gumaste, U.

    1994-01-01

    In order to predict the dynamic response of a flexible structure in a fluid flow, the equations of motion of the structure and the fluid must be solved simultaneously. In this paper, we present several partitioned procedures for time-integrating this coupled fluid/structure problem and discuss their merits in terms of accuracy, stability, heterogeneous computing, I/O transfers, subcycling, and parallel processing. All theoretical results are derived for a one-dimensional piston model problem with a compressible flow, because the complete three-dimensional aeroelastic problem is difficult to analyze mathematically. However, the insight gained from the analysis of the coupled piston problem and the conclusions drawn from its numerical investigation are confirmed with the numerical simulation of the two-dimensional transient aeroelastic response of a flexible panel in a transonic nonlinear Euler flow regime.

  6. Identification of crystalline structures in jet-cooled acetylene large clusters studied by two-dimensional correlation infrared spectroscopy

    NASA Astrophysics Data System (ADS)

    Matsumoto, Yoshiteru; Yoshiura, Ryuto; Honma, Kenji

    2017-07-01

    We investigated the crystalline structures of jet-cooled acetylene (C2H2) large clusters by laser spectroscopy and chemometrics. The CH stretching vibrations of the C2H2 large clusters were observed by infrared (IR) cavity ringdown spectroscopy. The IR spectra of C2H2 clusters were measured under the conditions of various concentrations of C2H2/He mixture gas for supersonic jets. Upon increasing the gas concentration from 1% to 10%, we observed a rapid intensity enhancement for a band in the IR spectra. The strong dependence of the intensity on the gas concentration indicates that the band was assigned to CH stretching vibrations of the large clusters. An analysis of the IR spectra by two-dimensional correlation spectroscopy revealed that the IR absorption due to the C2H2 large cluster is decomposed into two CH stretching vibrations. The vibrational frequencies of the two bands are almost equivalent to the IR absorption of the pure- and poly-crystalline orthorhombic structures in the aerosol particles. The characteristic temperature behavior of the IR spectra implies the existence of the other large cluster, which is discussed in terms of the phase transition of a bulk crystal.

  7. On solving three-dimensional open-dimension rectangular packing problems

    NASA Astrophysics Data System (ADS)

    Junqueira, Leonardo; Morabito, Reinaldo

    2017-05-01

    In this article, a recently proposed three-dimensional open-dimension rectangular packing problem is considered, in which the objective is to find a minimal-volume rectangular container that packs a set of rectangular boxes. The literature has tackled small-sized instances of this problem by means of optimization solvers, position-free mixed-integer programming (MIP) formulations and piecewise linearization approaches. In this study, the problem is alternatively addressed by means of grid-based position MIP formulations, while still considering optimization solvers and the same piecewise linearization techniques. A comparison of the computational performance of both models is then presented, when tested with benchmark problem instances and with new instances, and it is shown that the grid-based position MIP formulation can be competitive, depending on the characteristics of the instances. The grid-based position MIP formulation is also embedded with real-world practical constraints, such as cargo stability, and results are additionally presented.

  8. EM in high-dimensional spaces.

    PubMed

    Draper, Bruce A; Elliott, Daniel L; Hayes, Jeremy; Baek, Kyungim

    2005-06-01

    This paper considers fitting a mixture of Gaussians model to high-dimensional data in scenarios where there are fewer data samples than feature dimensions. Issues that arise when using principal component analysis (PCA) to represent Gaussian distributions inside Expectation-Maximization (EM) are addressed, and a practical algorithm results. Unlike other algorithms that have been proposed, this algorithm does not try to compress the data to fit low-dimensional models. Instead, it models Gaussian distributions in the (N - 1)-dimensional space spanned by the N data samples. We are able to show that this algorithm converges on data sets where low-dimensional techniques do not.
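
    The subspace observation is easy to demonstrate: with N samples in p >> N dimensions, everything a Gaussian mixture can use lives in the (N - 1)-dimensional affine span of the data. A toy sketch of fitting in that span (this is not the authors' algorithm; the sizes and the reg_covar ridge are our choices):

    ```python
    import numpy as np
    from sklearn.mixture import GaussianMixture

    # Fewer samples (N = 40) than feature dimensions (p = 1000).
    rng = np.random.default_rng(1)
    X = rng.normal(size=(40, 1000))

    # Coordinates in the (N-1)-dimensional affine subspace spanned by
    # the samples, via an SVD of the centered data matrix.
    Xc = X - X.mean(axis=0)
    U, s, Vt = np.linalg.svd(Xc, full_matrices=False)
    Z = U[:, :-1] * s[:-1]

    gmm = GaussianMixture(n_components=2, covariance_type="full",
                          reg_covar=1e-3, random_state=0).fit(Z)
    labels = gmm.predict(Z)
    ```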

  9. Cluster analysis of sputum cytokine-high profiles reveals diversity in T(h)2-high asthma patients.

    PubMed

    Seys, Sven F; Scheers, Hans; Van den Brande, Paul; Marijsse, Gudrun; Dilissen, Ellen; Van Den Bergh, Annelies; Goeminne, Pieter C; Hellings, Peter W; Ceuppens, Jan L; Dupont, Lieven J; Bullens, Dominique M A

    2017-02-23

    Asthma is characterized by a heterogeneous inflammatory profile and can be subdivided into T(h)2-high and T(h)2-low airway inflammation. Profiling of a broader panel of airway cytokines in large unselected patient cohorts is lacking. Patients (n = 205) were defined as being "cytokine-low/high" if sputum mRNA expression of a particular cytokine was outside the respective 10th/90th percentile range of the control group (n = 80). Unsupervised hierarchical clustering was used to determine clusters based on sputum cytokine profiles. Half of the patients (n = 108; 52.6%) had a classical T(h)2-high ("IL-4-, IL-5- and/or IL-13-high") sputum cytokine profile. Unsupervised cluster analysis revealed 5 clusters. Patients with an "IL-4- and/or IL-13-high" pattern surprisingly did not cluster but were equally distributed among the 5 clusters. Patients with an "IL-5-, IL-17A-/F- and IL-25-high" profile were restricted to cluster 1 (n = 24), with increased sputum eosinophil as well as neutrophil counts and poor lung function parameters at baseline and 2 years later. Four other clusters were identified: "IL-5-high or IL-10-high" (n = 16), "IL-6-high" (n = 8), "IL-22-high" (n = 25), and cluster 5 (n = 132), which consists of patients without a "cytokine-high" pattern or with only high IL-4 and/or IL-13. We identified 5 unique asthma molecular phenotypes by biological clustering. Type 2 cytokines cluster with non-type 2 cytokines in 4 out of 5 clusters. Unsupervised analysis thus does not support a priori type 2 versus non-type 2 molecular phenotypes. www.clinicaltrials.gov NCT01224938. Registered 18 October 2010.

  10. Benzoate-Induced High-Nuclearity Silver Thiolate Clusters.

    PubMed

    Su, Yan-Min; Liu, Wei; Wang, Zhi; Wang, Shu-Ao; Li, Yan-An; Yu, Fei; Zhao, Quan-Qin; Wang, Xing-Po; Tung, Chen-Ho; Sun, Di

    2018-04-03

    Compared with the well-known anion-templated effects in shaping silver thiolate clusters, the influence of the organic ligands in the outer shell is still poorly understood. Herein, three new benzoate-functionalized high-nuclearity silver(I) thiolate clusters are isolated and characterized for the first time in the presence of diverse anion templates such as S²⁻, α-[Mo₅O₁₈]⁶⁻, and MoO₄²⁻. Single-crystal X-ray analysis reveals that the nuclearities of the three silver clusters (SD/Ag28, SD/Ag29, SD/Ag30) vary from 32 to 38 to 78, with co-capped tBuS⁻ and benzoate ligands on the surface. SD/Ag28 is a turtle-like cluster comprising an Ag₂₉ shell caging an Ag₃S₃ trigon in the center, whereas SD/Ag29 is a prolate Ag₃₈ sphere templated by the α-[Mo₅O₁₈]⁶⁻ anion. Upon changing from benzoate to methoxyl-substituted benzoate, SD/Ag30 is isolated as a very complicated core-shell spherical cluster composed of an Ag₅₇ shell and a vase-like Ag₂₁S₁₃ core. Four MoO₄²⁻ anions are arranged in a supertetrahedron and located in the interstice between the core and shell. Introduction of the bulky benzoate elaborately changes the nuclearity and arrangement of silver polygons on the shell of the silver clusters, which is exemplified by comparing SD/Ag28 and a known similar silver thiolate cluster. The three new clusters emit luminescence in the near-infrared (NIR) region and show different thermochromic luminescence properties. This work presents a flexible approach to synthetic studies of high-nuclearity silver clusters decorated by different benzoates, and structural modulations are also achieved.

  11. Very high order discontinuous Galerkin method in elliptic problems

    NASA Astrophysics Data System (ADS)

    Jaśkowiec, Jan

    2017-09-01

    The paper deals with the high-order discontinuous Galerkin (DG) method with an approximation order that exceeds 20 and reaches 100, and even 1000 in the one-dimensional case. To achieve such a high-order solution, the DG method has to be combined with the finite difference method. The basis functions of this method are high-order orthogonal Legendre or Chebyshev polynomials. These polynomials are defined in one-dimensional space (1D), but they can easily be adapted to two-dimensional space (2D) by cross products. There are no nodes in the elements, and the degrees of freedom are the coefficients of the linear combination of basis functions. In this sort of analysis reference elements are needed, so transformations of the reference element into the real one are required, as well as the transformations connected with the mesh skeleton. Due to the orthogonality of the basis functions, the obtained matrices are sparse even for finite elements with more than a thousand degrees of freedom. In consequence, the truncation errors are limited and very high-order analysis can be performed. The paper is illustrated with a set of 1D and 2D benchmark examples for elliptic problems. The examples demonstrate the great effectiveness of the method, which can shorten the calculation time by a factor of several hundred.
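
    The sparsity claim rests on orthogonality: on [-1, 1] the Legendre mass matrix is exactly diagonal. A short numerical check of that property (the order and quadrature size are our choices):

    ```python
    import numpy as np
    from numpy.polynomial import legendre

    order = 30
    # Gauss-Legendre quadrature with order+1 nodes is exact for
    # polynomials up to degree 2*order + 1.
    nodes, weights = legendre.leggauss(order + 1)

    # Rows of V: the first order+1 Legendre polynomials at the nodes.
    V = np.stack([legendre.Legendre.basis(k)(nodes)
                  for k in range(order + 1)])

    # Mass matrix M_ij = integral of P_i * P_j over [-1, 1]:
    # orthogonality makes it diagonal with entries 2 / (2k + 1).
    M = (V * weights) @ V.T
    assert np.allclose(M, np.diag(2.0 / (2 * np.arange(order + 1) + 1)))
    ```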

  13. High-Dimensional Heteroscedastic Regression with an Application to eQTL Data Analysis

    PubMed Central

    Daye, Z. John; Chen, Jinbo; Li, Hongzhe

    2011-01-01

    Summary We consider the problem of high-dimensional regression under non-constant error variances. Despite being a common phenomenon in biological applications, heteroscedasticity has so far been largely ignored in high-dimensional analysis of genomic data sets. We propose a new methodology that allows non-constant error variances for high-dimensional estimation and model selection. Our method incorporates heteroscedasticity by simultaneously modeling both the mean and variance components via a novel doubly regularized approach. Extensive Monte Carlo simulations indicate that our proposed procedure can result in better estimation and variable selection than existing methods when heteroscedasticity arises from the presence of predictors explaining error variances and outliers. Further, we demonstrate the presence of heteroscedasticity in an expression quantitative trait loci (eQTL) study of 112 yeast segregants and apply our method to it. The new procedure can automatically account for heteroscedasticity in identifying the eQTLs that are associated with gene expression variations and leads to smaller prediction errors. These results demonstrate the importance of considering heteroscedasticity in eQTL data analysis. PMID:22547833

  14. Galaxy Clusters

    NASA Astrophysics Data System (ADS)

    Miller, Christopher J.

    2012-03-01

    of galaxy clusters will be at locations of the peaks in the true underlying (mostly) dark matter density field. Kaiser (1984) [19] called this the high-peak model, which we demonstrate in Figure 16.1. We show a two-dimensional representation of a density field created by summing plane-waves with a predetermined power and with random wave-vector directions. In the left panel, we plot only the largest modes, where we see the density peaks (black) and valleys (white) in the combined field. In the right panel, we allow for smaller modes. You can see that the highest density peaks in the left panel contain smaller-scale, but still high-density peaks. These are the locations of future galaxy clusters. The bottom panel shows just these cluster-scale peaks. As you can see, the peaks themselves are clustered, and instead of just one large high-density peak in the original density field (see the left panel), the smaller modes show that six peaks are "born" within the broader, underlying large-scale density modes. This exemplifies the "bias" or amplified structure that is traced by galaxy clusters [19]. Clusters are rare, easy to find, and their member galaxies provide good distance estimates. In combination with their amplified clustering signal described above, galaxy clusters are considered an efficient and precise tracer of the large-scale matter density field in the Universe. Galaxy clusters can also be used to measure the baryon content of the Universe [43]. They can be used to identify gravitational lenses [38] and map the distribution of matter in clusters. The number and spatial distribution of galaxy clusters can be used to constrain cosmological parameters, like the fraction of the energy density in the Universe due to matter (Ω_matter) or the variation in the density field on fixed physical scales (σ_8) [26,33]. The individual clusters act as “Island Universes” and as such are laboratories where we can study the evolution of the properties of the cluster

  15. Graph Based Models for Unsupervised High Dimensional Data Clustering and Network Analysis

    DTIC Science & Technology

    2015-01-01

    … algorithms we proposed improve the time efficiency significantly for large-scale datasets. In the last chapter, we also propose an incremental reseeding … plume detection in hyper-spectral video data.

  16. A spiking neural network model of model-free reinforcement learning with high-dimensional sensory input and perceptual ambiguity.

    PubMed

    Nakano, Takashi; Otsuka, Makoto; Yoshimoto, Junichiro; Doya, Kenji

    2015-01-01

    A theoretical framework of reinforcement learning plays an important role in understanding action selection in animals. Spiking neural networks provide a theoretically grounded means to test computational hypotheses on neurally plausible algorithms of reinforcement learning through numerical simulation. However, most of these models cannot handle observations that are noisy or that occurred in the past, even though these are inevitable and constraining features of learning in real environments. This class of problems is formally known as partially observable reinforcement learning (PORL); it generalizes reinforcement learning to partially observable domains. In addition, observations in the real world tend to be rich and high-dimensional. In this work, we use a spiking neural network model to approximate the free energy of a restricted Boltzmann machine and apply it to the solution of PORL problems with high-dimensional observations. Our spiking network model solves maze tasks with perceptually ambiguous high-dimensional observations without knowledge of the true environment. An extended model with working memory also solves history-dependent tasks. The way spiking neural networks handle PORL problems may provide a glimpse into the underlying laws of neural information processing which can only be discovered through such a top-down approach.

  17. A Spiking Neural Network Model of Model-Free Reinforcement Learning with High-Dimensional Sensory Input and Perceptual Ambiguity

    PubMed Central

    Nakano, Takashi; Otsuka, Makoto; Yoshimoto, Junichiro; Doya, Kenji

    2015-01-01

    A theoretical framework of reinforcement learning plays an important role in understanding action selection in animals. Spiking neural networks provide a theoretically grounded means to test computational hypotheses on neurally plausible algorithms of reinforcement learning through numerical simulation. However, most of these models cannot handle observations that are noisy or that occurred in the past, even though these are inevitable and constraining features of learning in real environments. This class of problems is formally known as partially observable reinforcement learning (PORL); it generalizes reinforcement learning to partially observable domains. In addition, observations in the real world tend to be rich and high-dimensional. In this work, we use a spiking neural network model to approximate the free energy of a restricted Boltzmann machine and apply it to the solution of PORL problems with high-dimensional observations. Our spiking network model solves maze tasks with perceptually ambiguous high-dimensional observations without knowledge of the true environment. An extended model with working memory also solves history-dependent tasks. The way spiking neural networks handle PORL problems may provide a glimpse into the underlying laws of neural information processing which can only be discovered through such a top-down approach. PMID:25734662

  18. The problem of dimensional instability in airfoil models for cryogenic wind tunnels

    NASA Technical Reports Server (NTRS)

    Wigley, D. A.

    1982-01-01

    The problem of dimensional instability in airfoil models for cryogenic wind tunnels is discussed in terms of the various mechanisms that can be responsible. The interrelationship between metallurgical structure and possible dimensional instability in cryogenic usage is discussed for those steel alloys of most interest for wind tunnel model construction at this time. Other basic mechanisms responsible for setting up residual stress systems are discussed, together with ways in which their magnitude may be reduced by various elevated or low temperature thermal cycles. A standard specimen configuration is proposed for use in experimental investigations into the effects of machining, heat treatment, and other variables that influence the dimensional stability of the materials of interest. A brief classification of various materials in terms of their metallurgical structure and susceptibility to dimensional instability is presented.

  19. Regularization Methods for High-Dimensional Instrumental Variables Regression With an Application to Genetical Genomics

    PubMed Central

    Lin, Wei; Feng, Rui; Li, Hongzhe

    2014-01-01

    In genetical genomics studies, it is important to jointly analyze gene expression data and genetic variants in exploring their associations with complex traits, where the dimensionality of gene expressions and genetic variants can both be much larger than the sample size. Motivated by such modern applications, we consider the problem of variable selection and estimation in high-dimensional sparse instrumental variables models. To overcome the difficulty of high dimensionality and unknown optimal instruments, we propose a two-stage regularization framework for identifying and estimating important covariate effects while selecting and estimating optimal instruments. The methodology extends the classical two-stage least squares estimator to high dimensions by exploiting sparsity using sparsity-inducing penalty functions in both stages. The resulting procedure is efficiently implemented by coordinate descent optimization. For the representative L1 regularization and a class of concave regularization methods, we establish estimation, prediction, and model selection properties of the two-stage regularized estimators in the high-dimensional setting where the dimensionalities of covariates and instruments are both allowed to grow exponentially with the sample size. The practical performance of the proposed method is evaluated by simulation studies and its usefulness is illustrated by an analysis of mouse obesity data. Supplementary materials for this article are available online. PMID:26392642
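
    The two-stage structure is easy to prototype with off-the-shelf L1 solvers. The toy sketch below shows the skeleton only, on synthetic data; the authors' estimator, penalty choices and tuning are more involved:

    ```python
    import numpy as np
    from sklearn.linear_model import Lasso

    rng = np.random.default_rng(2)
    n, p_z, p_x = 100, 200, 150     # samples, instruments, covariates
    Z = rng.normal(size=(n, p_z))                       # e.g. genotypes
    X = Z[:, :p_x] + 0.5 * rng.normal(size=(n, p_x))    # covariates
    y = X[:, 0] - 2.0 * X[:, 1] + rng.normal(size=n)    # outcome

    # Stage 1: L1-penalized regression of each covariate on the
    # instruments, yielding fitted (instrumented) covariates.
    X_hat = np.column_stack([
        Lasso(alpha=0.1, max_iter=5000).fit(Z, X[:, j]).predict(Z)
        for j in range(p_x)
    ])

    # Stage 2: L1-penalized regression of the outcome on the fits.
    stage2 = Lasso(alpha=0.1, max_iter=5000).fit(X_hat, y)
    selected = np.flatnonzero(stage2.coef_)  # selected covariates
    ```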

  20. Stimuli Reduce the Dimensionality of Cortical Activity

    PubMed Central

    Mazzucato, Luca; Fontanini, Alfredo; La Camera, Giancarlo

    2016-01-01

    The activity of ensembles of simultaneously recorded neurons can be represented as a set of points in the space of firing rates. Even though the dimension of this space is equal to the ensemble size, neural activity can be effectively localized on smaller subspaces. The dimensionality of the neural space is an important determinant of the computational tasks supported by the neural activity. Here, we investigate the dimensionality of neural ensembles from the sensory cortex of alert rats during periods of ongoing (inter-trial) and stimulus-evoked activity. We find that dimensionality grows linearly with ensemble size, and grows significantly faster during ongoing activity compared to evoked activity. We explain these results using a spiking network model based on a clustered architecture. The model captures the difference in growth rate between ongoing and evoked activity and predicts a characteristic scaling with ensemble size that could be tested in high-density multi-electrode recordings. Moreover, we present a simple theory that predicts the existence of an upper bound on dimensionality. This upper bound is inversely proportional to the amount of pair-wise correlations and, compared to a homogeneous network without clusters, it is larger by a factor equal to the number of clusters. The empirical estimation of such bounds depends on the number and duration of trials and is well predicted by the theory. Together, these results provide a framework to analyze neural dimensionality in alert animals, its behavior under stimulus presentation, and its theoretical dependence on ensemble size, number of clusters, and correlations in spiking network models. PMID:26924968

  1. Stimuli Reduce the Dimensionality of Cortical Activity.

    PubMed

    Mazzucato, Luca; Fontanini, Alfredo; La Camera, Giancarlo

    2016-01-01

    The activity of ensembles of simultaneously recorded neurons can be represented as a set of points in the space of firing rates. Even though the dimension of this space is equal to the ensemble size, neural activity can be effectively localized on smaller subspaces. The dimensionality of the neural space is an important determinant of the computational tasks supported by the neural activity. Here, we investigate the dimensionality of neural ensembles from the sensory cortex of alert rats during periods of ongoing (inter-trial) and stimulus-evoked activity. We find that dimensionality grows linearly with ensemble size, and grows significantly faster during ongoing activity compared to evoked activity. We explain these results using a spiking network model based on a clustered architecture. The model captures the difference in growth rate between ongoing and evoked activity and predicts a characteristic scaling with ensemble size that could be tested in high-density multi-electrode recordings. Moreover, we present a simple theory that predicts the existence of an upper bound on dimensionality. This upper bound is inversely proportional to the amount of pair-wise correlations and, compared to a homogeneous network without clusters, it is larger by a factor equal to the number of clusters. The empirical estimation of such bounds depends on the number and duration of trials and is well predicted by the theory. Together, these results provide a framework to analyze neural dimensionality in alert animals, its behavior under stimulus presentation, and its theoretical dependence on ensemble size, number of clusters, and correlations in spiking network models.

  2. Similarity-dissimilarity plot for visualization of high dimensional data in biomedical pattern classification.

    PubMed

    Arif, Muhammad

    2012-06-01

    In pattern classification problems, feature extraction is an important step. The quality of features in discriminating different classes plays an important role in pattern classification problems. In real life, pattern classification may require a high dimensional feature space, and it is impossible to visualize the feature space if its dimension is greater than four. In this paper, we have proposed a Similarity-Dissimilarity plot which can project a high dimensional space onto a two dimensional space while retaining the important characteristics required to assess the discrimination quality of the features. The Similarity-Dissimilarity plot can reveal information about the amount of overlap between the features of different classes. Separable data points of different classes, which can be classified correctly using an appropriate classifier, will also be visible on the plot; hence, approximate classification accuracy can be predicted. Moreover, it is possible to know with which class the classifier will confuse the misclassified data points. Outlier data points can also be located on the Similarity-Dissimilarity plot. Various examples of synthetic data are used to highlight important characteristics of the proposed plot. Some real life examples from biomedical data are also used for the analysis. The proposed plot is independent of the number of dimensions of the feature space.
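
    One plausible reading of such a plot (the paper's exact construction may differ) places each sample by its distance to the nearest same-class neighbour against its distance to the nearest other-class neighbour, which makes overlap, separability and outliers directly visible. A sketch under that assumption:

    ```python
    import numpy as np
    from scipy.spatial.distance import cdist

    def similarity_dissimilarity(X, y):
        """Per sample: (distance to nearest same-class neighbour,
        distance to nearest other-class neighbour).

        Assumes every class has at least two samples."""
        D = cdist(X, X)
        np.fill_diagonal(D, np.inf)          # exclude self-distances
        same = np.where(y[:, None] == y[None, :], D, np.inf).min(axis=1)
        diff = np.where(y[:, None] != y[None, :], D, np.inf).min(axis=1)
        # Samples with diff < same sit in another class's territory
        # and are the likely misclassifications.
        return same, diff
    ```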

  3. Resolving the problem of galaxy clustering on small scales: any new physics needed?

    NASA Astrophysics Data System (ADS)

    Kang, X.

    2014-02-01

    Galaxy clustering sets strong constraints on the physics governing galaxy formation and evolution. However, most current models fail to reproduce the clustering of low-mass galaxies on small scales (r < 1 Mpc h⁻¹). In this paper, we study the galaxy clustering predicted by a few semi-analytical models. We first compare two Munich versions, Guo et al. and De Lucia & Blaizot. The Guo11 model reproduces the galaxy stellar mass function well, but overpredicts the clustering of low-mass galaxies on small scales. The DLB07 model provides a better fit to the clustering on small scales, but overpredicts the stellar mass function. This seems puzzling. The clustering on small scales is dominated by galaxies in the same dark matter halo, and a slightly larger fraction of satellite galaxies resides in massive haloes in the Guo11 model, which is the dominant contribution to the clustering discrepancy between the two models. However, both models still overpredict the clustering at 0.1 < r < 10 Mpc h⁻¹ for low-mass galaxies. This is because both models overpredict the number of satellites in massive haloes by 30 per cent relative to the data. We show that the Guo11 model could be slightly modified to simultaneously fit the stellar mass function and clustering, but that cannot easily be achieved in the DLB07 model. The better agreement of the DLB07 model with the data actually comes as a coincidence, as it predicts too many low-mass central galaxies which are less clustered and thus bring down the total clustering. Finally, we show the predictions from the semi-analytical model of Kang et al. We find that this model can simultaneously fit the stellar mass function and galaxy clustering if the supernova feedback in satellite galaxies is stronger. We conclude that semi-analytical models are now able to solve the small-scale clustering problem without invoking any new physics or changing the dark matter properties, such as the recently favoured warm dark matter.

  4. Solving time-dependent two-dimensional eddy current problems

    NASA Technical Reports Server (NTRS)

    Lee, Min Eig; Hariharan, S. I.; Ida, Nathan

    1990-01-01

    Transient eddy current calculations are presented for an EM wave-scattering and field-penetrating case in which a two-dimensional transverse magnetic field is incident on a good (i.e., not perfect) and infinitely long conductor. The problem thus posed is of initial boundary-value interface type, where the boundary of the conductor constitutes the interface. A potential function is used for time-domain modeling of the situation, and finite-difference time-domain techniques are used to march the potential function explicitly in time. Attention is given to the case of LF radiation conditions.

  5. Thermodynamics of confined gallium clusters.

    PubMed

    Chandrachud, Prachi

    2015-11-11

    We report the results of ab initio molecular dynamics simulations of Ga13 and Ga17 clusters confined inside carbon nanotubes with different diameters. The cluster-tube interaction is simulated by the Lennard-Jones (LJ) potential. We discuss the geometries, the nature of the bonding and the thermodynamics under confinement. The geometries as well as the isomer spectra of both clusters are significantly affected. The degree of confinement decides the dimensionality of the clusters. We observe that a number of low-energy isomers appear under moderate confinement, while some isomers seen in free space disappear. Our finite-temperature simulations bring out interesting aspects, namely that the heat capacity curve is flat, even though the ground state is symmetric. Such a flat nature indicates that the phase change is continuous. This effect is due to the restricted phase space available to the system. These observations are supported by the mean square displacements of individual atoms, which are significantly smaller than in free space. The nature of the bonding is found to be approximately jellium-like. Finally, we note the relevance of the work to the problem of single file diffusion for the case of the highest confinement.

  6. A one-dimensional nonlinear problem of thermoelasticity in extended thermodynamics

    NASA Astrophysics Data System (ADS)

    Rawy, E. K.

    2018-06-01

    We solve a nonlinear, one-dimensional initial boundary-value problem of thermoelasticity in generalized thermodynamics. A Cattaneo-type evolution equation for the heat flux is used, which differs from the one used extensively in the literature. The hyperbolic nature of the associated linear system is clarified through a study of the characteristic curves. Progressive wave solutions with two finite speeds are noted. A numerical treatment is presented for the nonlinear system using a three-step, quasi-linearization, iterative finite-difference scheme for which the linear system of equations is the initial step in the iteration. The obtained results are discussed in detail. They clearly show the hyperbolic nature of the system, and may be of interest in investigating thermoelastic materials, not only at low temperatures, but also during high temperature processes involving rapid changes in temperature as in laser treatment of surfaces.

  7. Four-dimensional reconstruction of cultural heritage sites based on photogrammetry and clustering

    NASA Astrophysics Data System (ADS)

    Voulodimos, Athanasios; Doulamis, Nikolaos; Fritsch, Dieter; Makantasis, Konstantinos; Doulamis, Anastasios; Klein, Michael

    2017-01-01

    A system designed and developed for the three-dimensional (3-D) reconstruction of cultural heritage (CH) assets is presented. Two basic approaches are presented. The first one, resulting in an "approximate" 3-D model, uses images retrieved in online multimedia collections; it employs a clustering-based technique to perform content-based filtering and eliminate outliers that significantly reduce the performance of 3-D reconstruction frameworks. The second one is based on input image data acquired through terrestrial laser scanning, as well as close range and airborne photogrammetry; it follows a sophisticated multistep strategy, which leads to a "precise" 3-D model. Furthermore, the concept of change history maps is proposed to address the computational limitations involved in four-dimensional (4-D) modeling, i.e., capturing 3-D models of a CH landmark or site at different time instances. The system also comprises a presentation viewer, which manages the display of the multifaceted CH content collected and created. The described methods have been successfully applied and evaluated in challenging real-world scenarios, including the 4-D reconstruction of the historic Market Square of the German city of Calw in the context of the 4-D-CH-World EU project.

  8. High and low neurobehavior disinhibition clusters within locales: implications for community efforts to prevent substance use disorder.

    PubMed

    Ridenour, Ty A; Reynolds, Maureen; Ahlqvist, Ola; Zhai, Zu Wei; Kirisci, Levent; Vanyukov, Michael M; Tarter, Ralph E

    2013-05-01

    Knowledge of where substance use and other such behavioral problems frequently occur has aided policing, public health, and urban planning strategies to reduce such behaviors. Identifying locales characterized by high childhood neurobehavioral disinhibition (ND), a strong predictor of substance use and consequent disorder (SUD), may likewise improve prevention efforts. The distribution of ND in 10- to 12-year-olds was mapped to metropolitan Pittsburgh, PA, and tested for clustering within locales. The 738 participating families represented the population in terms of economic status, race, and population distribution. ND was measured using indicators of executive cognitive function, emotion regulation, and behavior control. Innovative geospatial analyses statistically tested clustering of ND within locales while accounting for geographic barriers (large rivers, major highways), parental SUD severity, and neighborhood quality. Clustering of youth with high and low ND occurred in specific locales. Accounting for geographic barriers better delineated where high ND is concentrated, areas which also tended to be characterized by greater parental SUD severity and poorer neighborhood quality. Offering programs that have been demonstrated to improve inhibitory control in locales where youth have high ND on average may reduce youth risk for SUD and other problem behaviors. As demonstrated by the present results, geospatial analysis of youth risk factors, frequently used in community coalition strategies, may be improved with greater statistical and measurement rigor.

  9. Cosmology and the large-mass problem of the five-dimensional Kaluza-Klein theory

    NASA Astrophysics Data System (ADS)

    Lukács, B.; Pacher, T.

    1985-12-01

    It is shown that in five-dimensional Kaluza-Klein theories the large-mass problem leads to a circulus vitiosus: the huge present value of e²/G produces the large-mass problem, which restricts the ratio e²/Gm² to the order of unity, in contradiction with the present value of 10⁴⁰ for elementary particles.

  10. Interval data clustering using self-organizing maps based on adaptive Mahalanobis distances.

    PubMed

    Hajjar, Chantal; Hamdan, Hani

    2013-10-01

    The self-organizing map is a kind of artificial neural network used to map high dimensional data into a low dimensional space. This paper presents a self-organizing map for interval-valued data based on adaptive Mahalanobis distances, in order to cluster interval data with topology preservation. Two methods based on the batch training algorithm for self-organizing maps are proposed. The first method uses a common Mahalanobis distance for all clusters. In the second method, the algorithm starts with a common Mahalanobis distance per cluster and then switches to a different distance per cluster. This process allows a clustering better adapted to the given data set. The performances of the proposed methods are compared and discussed using artificial and real interval data sets.
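
    The cluster-wise adaptive distance of the second method can be illustrated in ordinary point coordinates: each prototype carries its own covariance matrix, and points go to the prototype with the smallest Mahalanobis distance. A minimal sketch under those simplifications (it ignores the interval-valued coding and the map's neighbourhood updates):

    ```python
    import numpy as np

    def mahalanobis_assign(X, centers, covs):
        """Assign each row of X to the prototype with the smallest
        Mahalanobis distance, one covariance matrix per cluster."""
        dists = []
        for c, S in zip(centers, covs):
            P = np.linalg.inv(S)             # cluster-specific metric
            diff = X - c
            dists.append(np.einsum("ij,jk,ik->i", diff, P, diff))
        return np.stack(dists).argmin(axis=0)
    ```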

  11. Somatotyping using 3D anthropometry: a cluster analysis.

    PubMed

    Olds, Tim; Daniell, Nathan; Petkov, John; David Stewart, Arthur

    2013-01-01

    Somatotyping is the quantification of human body shape, independent of body size. Hitherto, somatotyping (including the most popular method, the Heath-Carter system) has been based on subjective visual ratings, sometimes supported by surface anthropometry. This study used data derived from three-dimensional (3D) whole-body scans as inputs for cluster analysis to objectively derive clusters of similar body shapes. Twenty-nine dimensions normalised for body size were measured on a purposive sample of 301 adults aged 17-56 years who had been scanned using a Vitus Smart laser scanner. K-means cluster analysis with v-fold cross-validation was used to determine shape clusters. Three male and three female clusters emerged, and were visualised using those scans closest to the cluster centroid and a caricature defined by doubling the difference between the average scan and the cluster centroid. The male clusters were decidedly endomorphic (high fatness), ectomorphic (high linearity), and endo-mesomorphic (a mixture of fatness and muscularity). The female clusters were clearly endomorphic, ectomorphic, and ecto-mesomorphic (a mixture of linearity and muscularity). An objective shape quantification procedure combining 3D scanning and cluster analysis yielded shape clusters strikingly similar to traditional somatotyping.
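
    The clustering step itself is standard. A toy sketch with scikit-learn, assuming a matrix of size-normalised scan dimensions (synthetic here); note that the study selected the number of clusters by v-fold cross-validation, whereas this sketch merely reports inertia for candidate counts:

    ```python
    import numpy as np
    from sklearn.cluster import KMeans
    from sklearn.preprocessing import StandardScaler

    # Hypothetical matrix: 29 size-normalised dimensions x 301 adults.
    X = np.random.default_rng(3).normal(size=(301, 29))
    Z = StandardScaler().fit_transform(X)

    for k in (2, 3, 4, 5):
        km = KMeans(n_clusters=k, n_init=10, random_state=0).fit(Z)
        print(k, km.inertia_)   # within-cluster sum of squares
    ```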

  12. Universal dynamical properties preclude standard clustering in a large class of biochemical data.

    PubMed

    Gomez, Florian; Stoop, Ralph L; Stoop, Ruedi

    2014-09-01

    Clustering of chemical and biochemical data based on observed features is a central cognitive step in the analysis of chemical substances, in particular in combinatorial chemistry, or of complex biochemical reaction networks. Often, for reasons unknown to the researcher, this step produces disappointing results. Once the sources of the problem are known, improved clustering methods might revitalize the statistical approach of compound and reaction search and analysis. Here, we present a generic mechanism that may be at the origin of many clustering difficulties. The variety of dynamical behaviors that can be exhibited by complex biochemical reactions on variation of the system parameters is a fundamental system fingerprint. In parameter space, shrimp-like or swallow-tail structures separate parameter sets that lead to stable periodic dynamical behavior from those leading to irregular behavior. We work out the genericity of this phenomenon and demonstrate novel examples of its occurrence in realistic models of biophysics. Although we elucidate the phenomenon by considering the emergence of periodicity in dependence on system parameters in a low-dimensional parameter space, the conclusions from our simple setting are shown to remain valid for features in a higher-dimensional feature space, as long as the feature-generating mechanism is not too extreme and the dimension of this space is not too high compared with the amount of available data. For online versions of super-paramagnetic clustering see http://stoop.ini.uzh.ch/research/clustering. Supplementary data are available at Bioinformatics online.

  13. Cluster redshifts in five suspected superclusters

    NASA Technical Reports Server (NTRS)

    Ciardullo, R.; Ford, H.; Harms, R.

    1985-01-01

    Redshift surveys for rich superclusters were carried out in five regions of the sky containing surface-density enhancements of Abell clusters. While several superclusters are identified, projection effects dominate each field, and no system contains more than five rich clusters. Two systems are found to be especially interesting. The first, field 0136 10, is shown to contain a superposition of at least four distinct superclusters, with the richest system possessing a small velocity dispersion. The second system, 2206 - 22, though a region of exceedingly high Abell cluster surface density, appears to be a remarkable superposition of 23 rich clusters almost uniformly distributed in redshift space between 0.08 and 0.24. The new redshifts significantly increase the three-dimensional information available for the distance class 5 and 6 Abell clusters and allow the spatial correlation function around rich superclusters to be estimated.

  14. Exploring high dimensional data with Butterfly: a novel classification algorithm based on discrete dynamical systems.

    PubMed

    Geraci, Joseph; Dharsee, Moyez; Nuin, Paulo; Haslehurst, Alexandria; Koti, Madhuri; Feilotter, Harriet E; Evans, Ken

    2014-03-01

    We introduce a novel method for visualizing high dimensional data via a discrete dynamical system. This method provides a 2D representation of the relationship between subjects according to a set of variables without geometric projections, transformed axes or principal components. The algorithm exploits a memory-type mechanism inherent in a certain class of discrete dynamical systems collectively referred to as the chaos game, which are closely related to iterated function systems. The goal of the algorithm was to create a human-readable representation of high dimensional patient data that was capable of detecting unrevealed subclusters of patients from within anticipated classifications. This provides a mechanism to further pursue a more personalized exploration of pathology when used with medical data. For clustering and classification protocols, the dynamical system portion of the algorithm is designed to come after some feature selection filter and before some model evaluation (e.g., clustering accuracy) protocol. In the version given here, a univariate feature selection step is performed (in practice more complex feature selection methods are used), a discrete dynamical system is driven by this reduced set of variables (which results in a set of 2D cluster models), these models are evaluated for their accuracy (according to a user-defined binary classification) and finally a visual representation of the top classification models are returned. Thus, in addition to the visualization component, this methodology can be used for both supervised and unsupervised machine learning, as the top-performing models are returned in the protocol we describe here. Butterfly, the algorithm we introduce and provide working code for, uses a discrete dynamical system to classify high dimensional data and provide a 2D representation of the relationship between subjects. We report results on three datasets (two in the article; one in the appendix) including a public lung cancer
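
    The chaos-game mechanism referred to here can be sketched generically: each subject's discretized feature vector drives an iterated map over the corners of a polygon, and the final state yields a 2-D coordinate without any geometric projection. This shows the generic mechanism only, not Butterfly's specific dynamical system; all names and sizes are illustrative:

    ```python
    import numpy as np

    def chaos_game_embedding(features, n_corners=4):
        """Map each row of integer-valued features to one 2-D point by
        iterating the chaos game over a regular polygon's corners."""
        angles = 2 * np.pi * np.arange(n_corners) / n_corners
        corners = np.stack([np.cos(angles), np.sin(angles)], axis=1)
        out = []
        for row in features:
            z = np.zeros(2)
            for v in row:                    # each value picks a corner;
                z = (z + corners[int(v) % n_corners]) / 2.0  # move halfway
            out.append(z)
        return np.array(out)

    # Hypothetical expression matrix discretized into quartiles (0-3),
    # one row per subject.
    X = np.random.default_rng(0).integers(0, 4, size=(30, 20))
    pts = chaos_game_embedding(X)            # one 2-D point per subject
    ```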

  15. Reversible Electrochemical Lithium-Ion Insertion into the Rhenium Cluster Chalcogenide-Halide Re6Se8Cl2.

    PubMed

    Bruck, Andrea M; Yin, Jiefu; Tong, Xiao; Takeuchi, Esther S; Takeuchi, Kenneth J; Szczepura, Lisa F; Marschilok, Amy C

    2018-05-07

    The cluster-based material Re₆Se₈Cl₂ is a two-dimensional ternary material with cluster-cluster bonding across the a and b axes, capable of multiple electron transfer accompanied by ion insertion across the c axis. The Li/Re₆Se₈Cl₂ system showed reversible electron transfer from 1 to 3 electron equivalents (ee) at high current densities (88 mA/g). Upon cycling to 4 ee, there was evidence of capacity degradation over 50 cycles associated with the formation of an organic solid-electrolyte interface (between 1.45 and 1 V vs Li/Li⁺). This investigation highlights the ability of cluster-based materials with two-dimensional cluster bonding to be used in applications such as energy storage, showing structural stability and high rate capability.

  16. Plurigon: three dimensional visualization and classification of high-dimensionality data

    PubMed Central

    Martin, Bronwen; Chen, Hongyu; Daimon, Caitlin M.; Chadwick, Wayne; Siddiqui, Sana; Maudsley, Stuart

    2013-01-01

    High-dimensionality data is rapidly becoming the norm for biomedical sciences and many other analytical disciplines. Not only is the collection and processing time for such data becoming problematic, but it has become increasingly difficult to form a comprehensive appreciation of high-dimensionality data. Though data analysis methods for coping with multivariate data are well-documented in technical fields such as computer science, little effort is currently being expended to condense data vectors that exist beyond the realm of physical space into an easily interpretable and aesthetic form. To address this important need, we have developed Plurigon, a data visualization and classification tool for the integration of high-dimensionality visualization algorithms with a user-friendly, interactive graphical interface. Unlike existing data visualization methods, which are focused on an ensemble of data points, Plurigon places a strong emphasis upon the visualization of a single data point and its determining characteristics. Multivariate data vectors are represented in the form of a deformed sphere with a distinct topology of hills, valleys, plateaus, peaks, and crevices. The gestalt structure of the resultant Plurigon object generates an easily-appreciable model. User interaction with the Plurigon is extensive; zoom, rotation, axial and vector display, feature extraction, and anaglyph stereoscopy are currently supported. With Plurigon and its ability to analyze high-complexity data, we hope to see a unification of biomedical and computational sciences as well as practical applications in a wide array of scientific disciplines. Increased accessibility to the analysis of high-dimensionality data may increase the number of new discoveries and breakthroughs, ranging from drug screening to disease diagnosis to medical literature mining. PMID:23885241

  17. Experimental and theoretical investigation of three-dimensional nitrogen-doped aluminum clusters Al₈N⁻ and Al₈N

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Wang, Leiming; Huang, Wei; Wang, Lai S.

    The structure and electronic properties of the Al₈N⁻ and Al₈N clusters were investigated by combined photoelectron spectroscopy and ab initio studies. Congested photoelectron spectra were observed, and experimental evidence was obtained for the presence of multiple isomers of Al₈N⁻. Global minimum searches revealed several structures for Al₈N⁻ with close energies. The calculated vertical detachment energies of the two lowest-lying isomers, which are of C2v and Cs symmetry, respectively, were shown to agree well with the experimental data. Unlike the three-dimensional structures of Al₆N⁻ and Al₇N⁻, in which the dopant N atom has a high coordination number of 6, the dopant N atom in the two low-lying isomers of Al₈N⁻ has a lower coordination number of 4 and 5, respectively. The competition between the Al–Al and Al–N interactions is shown to determine the global minimum structures of the doped aluminum clusters and results in the structural diversity of both Al₈N⁻ and Al₈N.

  18. Two-dimensional unsteady lift problems in supersonic flight

    NASA Technical Reports Server (NTRS)

    Heaslet, Max A; Lomax, Harvard

    1949-01-01

    The variation of pressure distribution is calculated for a two-dimensional supersonic airfoil either experiencing a sudden angle-of-attack change or entering a sharp-edge gust. From these pressure distributions the indicial lift functions applicable to unsteady lift problems are determined for two cases. Results are presented which permit the determination of maximum increment in lift coefficient attained by an unrestrained airfoil during its flight through a gust. As an application of these results, the minimum altitude for safe flight through a specific gust is calculated for a particular supersonic wing of given strength and wing loading.

  19. One-dimensional high-order compact method for solving Euler's equations

    NASA Astrophysics Data System (ADS)

    Mohamad, M. A. H.; Basri, S.; Basuno, B.

    2012-06-01

    In the field of computational fluid dynamics, many numerical algorithms have been developed to simulate inviscid, compressible flows. Among the most widely used are those based on flux-vector splitting and Godunov-type schemes. A solver of this kind was previously developed in the computational studies of Mawlood [1]; however, new test cases for compressible flows, namely the receding-flow and shock-wave shock-tube problems, were not investigated in that work. The objective of this study is therefore to develop a high-order compact (HOC) finite difference solver for the one-dimensional Euler equations. Before developing the solver, a detailed investigation was conducted to assess the performance of the basic third-order compact central discretization schemes. Spatial discretization of the Euler equations is based on flux-vector splitting; specifically, the convective flux terms are discretized with a hybrid flux-vector splitting known as the advection upstream splitting method (AUSM), which combines the accuracy of flux-difference splitting with the robustness of flux-vector splitting. The third-order compact approximation of the resulting finite difference equations was then analyzed in full. For the first-order schemes in the one-dimensional problem, an explicit time-integration method is adopted. The developed and modified source code for one-dimensional flow is validated against four test cases: the unsteady shock tube, quasi-one-dimensional supersonic-subsonic nozzle flow, receding flow, and shock waves in shock tubes. These results also confirm that the underlying Riemann problem is correctly identified. Further analysis compared the characteristics of the AUSM scheme against experimental results obtained from previous works, together with a comparative analysis of the computational results
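
    The flux construction described above can be illustrated compactly. The sketch below is a minimal, hedged rendering of the basic AUSM interface flux for the one-dimensional Euler equations (Liou-Steffen Mach-number and pressure splittings), not the solver developed in the study; the per-side sound speeds and the ideal-gas ratio γ = 1.4 are assumptions.

        import numpy as np

        GAMMA = 1.4  # ratio of specific heats (assumed ideal gas)

        def ausm_flux(rho_L, u_L, p_L, rho_R, u_R, p_R):
            """Basic AUSM interface flux for the 1D Euler equations."""
            a_L = np.sqrt(GAMMA * p_L / rho_L)  # left/right sound speeds
            a_R = np.sqrt(GAMMA * p_R / rho_R)
            M_L, M_R = u_L / a_L, u_R / a_R

            # Split Mach numbers: quadratic in the subsonic range, linear outside.
            def M_plus(M):
                return 0.25 * (M + 1.0) ** 2 if abs(M) <= 1.0 else 0.5 * (M + abs(M))

            def M_minus(M):
                return -0.25 * (M - 1.0) ** 2 if abs(M) <= 1.0 else 0.5 * (M - abs(M))

            # Matching pressure splittings.
            def p_plus(M, p):
                if abs(M) <= 1.0:
                    return 0.25 * p * (M + 1.0) ** 2 * (2.0 - M)
                return 0.5 * p * (M + abs(M)) / M

            def p_minus(M, p):
                if abs(M) <= 1.0:
                    return 0.25 * p * (M - 1.0) ** 2 * (2.0 + M)
                return 0.5 * p * (M - abs(M)) / M

            m_half = M_plus(M_L) + M_minus(M_R)            # interface Mach number
            p_half = p_plus(M_L, p_L) + p_minus(M_R, p_R)  # interface pressure

            # Convected state (rho, rho*u, rho*H), upwinded by the sign of m_half.
            H_L = GAMMA / (GAMMA - 1.0) * p_L / rho_L + 0.5 * u_L ** 2
            H_R = GAMMA / (GAMMA - 1.0) * p_R / rho_R + 0.5 * u_R ** 2
            if m_half >= 0.0:
                phi = a_L * np.array([rho_L, rho_L * u_L, rho_L * H_L])
            else:
                phi = a_R * np.array([rho_R, rho_R * u_R, rho_R * H_R])
            return m_half * phi + np.array([0.0, p_half, 0.0])

        # Sod shock-tube left/right states as a quick smoke test.
        print(ausm_flux(1.0, 0.0, 1.0, 0.125, 0.0, 0.1))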

  20. Elitist Binary Wolf Search Algorithm for Heuristic Feature Selection in High-Dimensional Bioinformatics Datasets.

    PubMed

    Li, Jinyan; Fong, Simon; Wong, Raymond K; Millham, Richard; Wong, Kelvin K L

    2017-06-28

    Due to the high-dimensional characteristics of such datasets, we propose a new method based on the Wolf Search Algorithm (WSA) for optimising the feature selection problem. The proposed approach uses the natural-selection strategy articulated by Charles Darwin: 'It is not the strongest of the species that survives, but the most adaptable'. This means that in the evolution of a swarm, the elitists are motivated to quickly obtain more and better resources. The memory function helps the proposed method avoid repeated searches of previously visited worst positions, enhancing the effectiveness of the search, while the binary strategy recasts feature selection as a comparable function-optimisation problem. Furthermore, the wrapper strategy couples these strengthened wolves with an extreme learning machine classifier to find a sub-dataset with a reasonable number of features that offers the maximum correctness of global classification models. The experimental results from the six public high-dimensional bioinformatics datasets tested demonstrate that the proposed method can outperform some conventional feature selection methods by up to 29% in classification accuracy, and previous WSAs by up to 99.81% in computational time.
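
    The wrapper loop is easy to caricature. The sketch below substitutes a generic elitist bit-flip search with a visited-set memory for the full WSA, and a k-nearest-neighbour classifier for the extreme learning machine, on a public scikit-learn dataset; the move size, iteration budget, and classifier are all illustrative assumptions.

        import numpy as np
        from sklearn.datasets import load_breast_cancer
        from sklearn.model_selection import cross_val_score
        from sklearn.neighbors import KNeighborsClassifier

        rng = np.random.default_rng(0)
        X, y = load_breast_cancer(return_X_y=True)
        n_features = X.shape[1]

        def fitness(mask):
            """Wrapper objective: CV accuracy of a classifier on selected columns."""
            if not mask.any():
                return 0.0
            clf = KNeighborsClassifier(n_neighbors=5)
            return cross_val_score(clf, X[:, mask], y, cv=3).mean()

        best = rng.random(n_features) < 0.5     # random initial feature mask
        best_fit = fitness(best)
        visited = {best.tobytes()}              # memory: never re-evaluate a mask

        for _ in range(200):
            cand = best.copy()
            for j in rng.choice(n_features, size=3, replace=False):
                cand[j] = ~cand[j]              # flip a few bits (a "move")
            key = cand.tobytes()
            if key in visited:
                continue
            visited.add(key)
            f = fitness(cand)
            if f > best_fit:                    # elitism: keep only improving moves
                best, best_fit = cand, f

        print(best.sum(), "features selected, CV accuracy", round(best_fit, 3))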

  1. Statistical mechanics of complex neural systems and high dimensional data

    NASA Astrophysics Data System (ADS)

    Advani, Madhu; Lahiri, Subhaneil; Ganguli, Surya

    2013-03-01

    Recent experimental advances in neuroscience have opened new vistas into the immense complexity of neuronal networks. This proliferation of data challenges us on two parallel fronts. First, how can we form adequate theoretical frameworks for understanding how dynamical network processes cooperate across widely disparate spatiotemporal scales to solve important computational problems? Second, how can we extract meaningful models of neuronal systems from high dimensional datasets? To aid in these challenges, we give a pedagogical review of a collection of ideas and theoretical methods arising at the intersection of statistical physics, computer science and neurobiology. We introduce the interrelated replica and cavity methods, which originated in statistical physics as powerful ways to quantitatively analyze large highly heterogeneous systems of many interacting degrees of freedom. We also introduce the closely related notion of message passing in graphical models, which originated in computer science as a distributed algorithm capable of solving large inference and optimization problems involving many coupled variables. We then show how both the statistical physics and computer science perspectives can be applied in a wide diversity of contexts to problems arising in theoretical neuroscience and data analysis. Along the way we discuss spin glasses, learning theory, illusions of structure in noise, random matrices, dimensionality reduction and compressed sensing, all within the unified formalism of the replica method. Moreover, we review recent conceptual connections between message passing in graphical models, and neural computation and learning. Overall, these ideas illustrate how statistical physics and computer science might provide a lens through which we can uncover emergent computational functions buried deep within the dynamical complexities of neuronal networks.

  2. A Hyperspherical Adaptive Sparse-Grid Method for High-Dimensional Discontinuity Detection

    DOE PAGES

    Zhang, Guannan; Webster, Clayton G.; Gunzburger, Max D.; ...

    2015-06-24

    This study proposes and analyzes a hyperspherical adaptive hierarchical sparse-grid method for detecting jump discontinuities of functions in high-dimensional spaces. The method is motivated by the theoretical and computational inefficiencies of well-known adaptive sparse-grid methods for discontinuity detection. Our novel approach constructs a function representation of the discontinuity hypersurface of an N-dimensional discontinuous quantity of interest, by virtue of a hyperspherical transformation. Then, a sparse-grid approximation of the transformed function is built in the hyperspherical coordinate system, whose value at each point is estimated by solving a one-dimensional discontinuity detection problem. Due to the smoothness of the hypersurface, the new technique can identify jump discontinuities with significantly reduced computational cost, compared to existing methods. In addition, hierarchical acceleration techniques are also incorporated to further reduce the overall complexity. Rigorous complexity analyses of the new method are provided as are several numerical examples that illustrate the effectiveness of the approach.
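
    The one-dimensional subproblem at the heart of the method, locating the jump along a single ray from a reference point inside the discontinuity surface, can be sketched with a bisection; the indicator function, tolerance, and maximum radius below are illustrative assumptions, not the paper's adaptive hierarchical machinery.

        import numpy as np

        def locate_jump(f, center, direction, r_max=2.0, tol=1e-6):
            """Bisect along a ray to find the radius at which the indicator f jumps."""
            direction = direction / np.linalg.norm(direction)
            lo, hi = 0.0, r_max
            inside = f(center)                  # the discontinuity flips this label
            while hi - lo > tol:
                mid = 0.5 * (lo + hi)
                if f(center + mid * direction) == inside:
                    lo = mid
                else:
                    hi = mid
            return 0.5 * (lo + hi)

        # Toy 3D case: the indicator of the unit ball, whose discontinuity surface
        # is simply r = 1 in hyperspherical coordinates around the origin.
        f = lambda x: np.linalg.norm(x) < 1.0
        rng = np.random.default_rng(1)
        for _ in range(3):
            print(locate_jump(f, np.zeros(3), rng.normal(size=3)))  # ~1.0 per ray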

  3. Cluster analysis of bone microarchitecture from high resolution peripheral quantitative computed tomography demonstrates two separate phenotypes associated with high fracture risk in men and women.

    PubMed

    Edwards, M H; Robinson, D E; Ward, K A; Javaid, M K; Walker-Bone, K; Cooper, C; Dennison, E M

    2016-07-01

    Osteoporosis is a major healthcare problem which is conventionally assessed by dual energy X-ray absorptiometry (DXA). New technologies such as high resolution peripheral quantitative computed tomography (HRpQCT) also predict fracture risk. HRpQCT measures a number of bone characteristics that may inform specific patterns of bone deficits. We used cluster analysis to define different bone phenotypes and their relationships to fracture prevalence and areal bone mineral density (BMD). 177 men and 159 women, in whom fracture history was determined by self-report and vertebral fracture assessment, underwent HRpQCT of the distal radius and DXA of the femoral neck. Five clusters were derived, with two associated with elevated fracture risk. Cluster 1 contained 26 women (50.0% fractured) and 30 men (50.0% fractured) with a lower mean cortical thickness and cortical volumetric BMD, and, in men only, a mean total and trabecular area greater than the sex-specific cohort mean. Cluster 2 contained 20 women (50.0% fractured) and 14 men (35.7% fractured) with a lower mean trabecular density and trabecular number than the sex-specific cohort mean. Logistic regression showed fracture rates in these clusters to be significantly higher than in the lowest-fracture-risk cluster, cluster 5 (p < 0.05). Mean femoral neck areal BMD was significantly lower than in cluster 5 for women in clusters 1 and 2 (p < 0.001 for both) and, in men, in cluster 2 (p < 0.001) but not cluster 1 (p = 0.220). In conclusion, this study demonstrates two distinct high-risk clusters in both men and women which may differ in etiology and response to treatment. As cluster 1 in men does not have low areal BMD, these men may not be identified as high risk by conventional DXA alone.
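
    A hedged sketch of this style of analysis (Ward clustering of standardized bone features, then per-cluster fracture odds by logistic regression) is given below on synthetic stand-ins for the HRpQCT measurements; the feature matrix, fracture labels, and cluster count are illustrative only.

        import numpy as np
        from scipy.cluster.hierarchy import fcluster, linkage
        from sklearn.preprocessing import StandardScaler
        import statsmodels.api as sm

        rng = np.random.default_rng(0)
        # Stand-ins for HRpQCT features (cortical thickness, trabecular density, ...).
        X = rng.normal(size=(336, 6))
        fractured = (rng.random(336) < 0.3).astype(float)

        Z = linkage(StandardScaler().fit_transform(X), method="ward")
        cluster = fcluster(Z, t=5, criterion="maxclust")   # five clusters, as above

        # Fracture log-odds for clusters 1-4 relative to the reference cluster 5.
        dummies = np.column_stack([(cluster == k).astype(float) for k in range(1, 5)])
        model = sm.Logit(fractured, sm.add_constant(dummies)).fit(disp=0)
        print(model.params)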

  4. Galaxy Merger Candidates in High-redshift Cluster Environments

    NASA Astrophysics Data System (ADS)

    Delahaye, A. G.; Webb, T. M. A.; Nantais, J.; DeGroot, A.; Wilson, G.; Muzzin, A.; Yee, H. K. C.; Foltz, R.; Noble, A. G.; Demarco, R.; Tudorica, A.; Cooper, M. C.; Lidman, C.; Perlmutter, S.; Hayden, B.; Boone, K.; Surace, J.

    2017-07-01

    We compile a sample of spectroscopically and photometrically selected cluster galaxies from four high-redshift galaxy clusters (1.59 < z < 1.71) from the Spitzer Adaptation of the Red-Sequence Cluster Survey (SpARCS), and a comparison field sample selected from the UKIDSS Deep Survey. Using near-infrared imaging from the Hubble Space Telescope, we classify potential mergers involving massive (M* ≥ 3 × 10¹⁰ M⊙) cluster members by eye, based on morphological properties such as tidal distortions, double nuclei, and projected near neighbors within 20 kpc. With a catalog of 23 spectroscopic and 32 photometric massive cluster members across the four clusters and 65 spectroscopic and 26 photometric comparable field galaxies, we find that after taking into account contamination from interlopers, 11.0 (+7.0/−5.6)% of the cluster members are involved in potential mergers, compared to 24.7 (+5.3/−4.6)% of the field galaxies. We see no evidence of merger enhancement in the central cluster environment with respect to the field, suggesting that galaxy-galaxy merging is not a stronger source of galaxy evolution in cluster environments compared to the field at these redshifts.

  5. On the Unusually High Temperature of the Cluster of Galaxies 1E 0657-56

    NASA Technical Reports Server (NTRS)

    Yaqoob, Tahir

    1999-01-01

    A recent X-ray observation of the cluster 1E 0657-56 (z = 0.296) with ASCA implied an unusually high temperature of approximately 17 keV. Such a high temperature would make it the hottest known cluster and severely constrain cosmological models since, in a Universe with critical density (Omega = 1), the probability of observing such a cluster is only approximately 4 × 10⁻⁵. Here we test the robustness of this observational result since it has such important implications. We analysed the data using a variety of different data analysis methods and spectral analysis assumptions and find a temperature of approximately 11-12 keV in all cases, except for one class of spectral fits: those in which the absorbing column density is fixed at the Galactic value. Using simulated data for a 12 keV cluster, we show that a high temperature of approximately 17 keV is artificially obtained if the true spectrum has a stronger low-energy cut-off than that for Galactic absorption only. The apparent extra absorption may be astrophysical in origin (either intrinsic or line-of-sight), or it may be a problem with the low-energy CCD efficiency. Although significantly lower than previous measurements, this temperature of kT ≈ 11-12 keV is still relatively high since only a few clusters have been found to have temperatures higher than 10 keV, and the data therefore still present some difficulty for an Omega = 1 Universe. Our results will also be useful to anyone who wants to estimate the systematic errors involved in different methods of background subtraction of ASCA data for sources with similar signal-to-noise to that of the 1E 0657-56 data reported here.

  6. GLOBALLY ADAPTIVE QUANTILE REGRESSION WITH ULTRA-HIGH DIMENSIONAL DATA

    PubMed Central

    Zheng, Qi; Peng, Limin; He, Xuming

    2015-01-01

    Quantile regression has become a valuable tool to analyze heterogeneous covariate-response associations that are often encountered in practice. The development of quantile regression methodology for high dimensional covariates primarily focuses on examination of model sparsity at a single or multiple quantile levels, which are typically prespecified ad hoc by the users. The resulting models may be sensitive to the specific choices of the quantile levels, leading to difficulties in interpretation and erosion of confidence in the results. In this article, we propose a new penalization framework for quantile regression in the high dimensional setting. We employ adaptive L1 penalties, and more importantly, propose a uniform selector of the tuning parameter for a set of quantile levels to avoid some of the potential problems with model selection at individual quantile levels. Our proposed approach achieves consistent shrinkage of regression quantile estimates across a continuous range of quantile levels, enhancing the flexibility and robustness of the existing penalized quantile regression methods. Our theoretical results include the oracle rate of uniform convergence and weak convergence of the parameter estimators. We also use numerical studies to confirm our theoretical findings and illustrate the practical utility of our proposal. PMID:26604424
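
    The shared-tuning idea can be imitated with off-the-shelf tools: scikit-learn's QuantileRegressor minimizes an L1-penalized pinball loss, and fitting it over a grid of quantile levels with one common penalty gives a rough analogue of selecting models across a range of quantiles. This is a hedged stand-in, not the authors' adaptive, uniformly tuned procedure; the data, penalty, and grid are illustrative.

        import numpy as np
        from sklearn.linear_model import QuantileRegressor

        rng = np.random.default_rng(0)
        n, p = 200, 50
        X = rng.normal(size=(n, p))
        beta = np.zeros(p)
        beta[:3] = [2.0, -1.5, 1.0]                       # sparse truth
        y = X @ beta + rng.standard_normal(n) * (1 + 0.5 * np.abs(X[:, 0]))

        taus = np.linspace(0.2, 0.8, 7)                   # a range of quantile levels
        alpha = 0.05                                      # one shared penalty
        support = set()
        for tau in taus:
            fit = QuantileRegressor(quantile=tau, alpha=alpha, solver="highs").fit(X, y)
            support |= {j for j in range(p) if abs(fit.coef_[j]) > 1e-8}

        print(sorted(support))    # features selected at any of the quantile levels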

  7. Enhanced, targeted sampling of high-dimensional free-energy landscapes using variationally enhanced sampling, with an application to chignolin

    PubMed Central

    Shaffer, Patrick; Valsson, Omar; Parrinello, Michele

    2016-01-01

    The capabilities of molecular simulations have been greatly extended by a number of widely used enhanced sampling methods that facilitate escaping from metastable states and crossing large barriers. Despite these developments, many problems remain out of reach for these methods, which has led to a vigorous effort in this area. One of the most important unsolved problems is sampling high-dimensional free-energy landscapes and systems that are not easily described by a small number of collective variables. In this work we demonstrate a new way to compute free-energy landscapes of high dimensionality based on the previously introduced variationally enhanced sampling, and we apply it to the miniprotein chignolin. PMID:26787868

  8. Million city traveling salesman problem solution by divide and conquer clustering with adaptive resonance neural networks.

    PubMed

    Mulder, Samuel A; Wunsch, Donald C

    2003-01-01

    The Traveling Salesman Problem (TSP) is a very hard optimization problem in the field of operations research. It has been shown to be NP-complete, and is an often-used benchmark for new optimization techniques. One of the main challenges with this problem is that standard, non-AI heuristic approaches such as the Lin-Kernighan algorithm (LK) and the chained LK variant are currently very effective and in wide use for the common fully connected, Euclidean variant considered here. This paper presents an algorithm that uses adaptive resonance theory (ART) in combination with a variation of the Lin-Kernighan local optimization algorithm to solve very large instances of the TSP. The primary advantage of this algorithm over traditional LK and chained-LK approaches is the increased scalability and parallelism allowed by the divide-and-conquer clustering paradigm. Tours obtained by the algorithm are of lower quality, but scaling is much better and there is high potential for increasing performance using parallel hardware.
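
    The divide-and-conquer pattern itself is simple to sketch: partition the cities, tour each partition, and stitch the partitions together. Below, k-means stands in for the ART clustering and a greedy nearest-neighbour tour stands in for Lin-Kernighan; the instance size, cluster count, and stitching rule are illustrative assumptions.

        import numpy as np
        from sklearn.cluster import KMeans

        def greedy_tour(pts):
            """Nearest-neighbour tour: a cheap stand-in for Lin-Kernighan."""
            unvisited, tour = list(range(1, len(pts))), [0]
            while unvisited:
                last = pts[tour[-1]]
                nxt = min(unvisited, key=lambda j: np.linalg.norm(pts[j] - last))
                unvisited.remove(nxt)
                tour.append(nxt)
            return tour

        rng = np.random.default_rng(0)
        cities = rng.random((2000, 2))
        k = 20
        labels = KMeans(n_clusters=k, n_init=10, random_state=0).fit_predict(cities)

        # Tour each cluster independently, then visit clusters in centroid order.
        centroids = np.array([cities[labels == c].mean(axis=0) for c in range(k)])
        full_tour = []
        for c in greedy_tour(centroids):
            idx = np.where(labels == c)[0]
            full_tour.extend(idx[greedy_tour(cities[idx])])

        length = sum(np.linalg.norm(cities[full_tour[i]] - cities[full_tour[i - 1]])
                     for i in range(len(full_tour)))
        print("tour length:", round(length, 1))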

  9. A phase cell cluster expansion for Euclidean field theories

    NASA Astrophysics Data System (ADS)

    Battle, Guy A., III; Federbush, Paul

    1982-08-01

    We adapt the cluster expansion first used to treat infrared problems for lattice models (a mass zero cluster expansion) to the usual field theory situation. The field is expanded in terms of special block spin functions and the cluster expansion given in terms of the expansion coefficients (phase cell variables); the cluster expansion expresses correlation functions in terms of contributions from finite coupled subsets of these variables. Most of the present work is carried through in d space-time dimensions (for φ₂⁴ the details of the cluster expansion are pursued and convergence is proven). Thus most of the results in the present work will apply to a treatment of φ₃⁴, to which we hope to return in a succeeding paper. Of particular interest in this paper is a substitute for the stability of the vacuum bound appropriate to this cluster expansion (for d = 2 and d = 3), and a new method for performing estimates with tree graphs. The phase cell cluster expansions have the renormalization group incorporated intimately into their structure. We hope they will be useful ultimately in treating four dimensional field theories.

  10. Unsupervised universal steganalyzer for high-dimensional steganalytic features

    NASA Astrophysics Data System (ADS)

    Hou, Xiaodan; Zhang, Tao

    2016-11-01

    The research in developing steganalytic features has been highly successful. These features are extremely powerful when applied to supervised binary classification problems. However, they are incompatible with unsupervised universal steganalysis because the unsupervised method cannot distinguish embedding distortion from varying levels of noise caused by cover variation. This study attempts to alleviate the problem by introducing similarity retrieval of image statistical properties (SRISP), with the specific aim of mitigating the effect of cover variation on the existing steganalytic features. First, cover images with some statistical properties similar to those of a given test image are searched from a retrieval cover database to establish an aided sample set. Then, unsupervised outlier detection is performed on a test set composed of the given test image and its aided sample set to determine the type (cover or stego) of the given test image. Our proposed framework, called SRISP-aided unsupervised outlier detection, requires no training. Thus, it does not suffer from the model-mismatch problem. Compared with prior unsupervised outlier detectors that do not consider SRISP, the proposed framework not only retains the universality but also exhibits superior performance when applied to high-dimensional steganalytic features.
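
    The framework reduces to two steps that can be caricatured in a few lines: retrieve covers whose simple statistics match the test image, then run an off-the-shelf outlier detector on the resulting aided set. Everything below is an illustrative assumption: the feature matrices are random stand-ins for steganalytic features, the retrieval properties are just a few leading feature columns, and LocalOutlierFactor stands in for the paper's detector.

        import numpy as np
        from sklearn.neighbors import LocalOutlierFactor

        rng = np.random.default_rng(0)
        cover_feats = rng.normal(size=(5000, 100))       # retrieval cover database
        cover_props = cover_feats[:, :5] + rng.normal(scale=0.1, size=(5000, 5))

        def srisp_test(test_feat, test_prop, k_aid=200):
            """Flag the test image as stego if it is an outlier within the set of
            covers retrieved by similarity of statistical properties (no training)."""
            d = np.linalg.norm(cover_props - test_prop, axis=1)
            aided = cover_feats[np.argsort(d)[:k_aid]]   # aided sample set
            X = np.vstack([aided, test_feat[None, :]])
            labels = LocalOutlierFactor(n_neighbors=20).fit_predict(X)
            return labels[-1] == -1                      # True -> likely stego

        stego = cover_feats[0] + 0.8                     # crude "embedding distortion"
        print(srisp_test(stego, cover_props[0]))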

  11. Joint principal trend analysis for longitudinal high-dimensional data.

    PubMed

    Zhang, Yuping; Ouyang, Zhengqing

    2018-06-01

    We consider a research scenario motivated by integrating multiple sources of information for better knowledge discovery in diverse dynamic biological processes. Given two longitudinal high-dimensional datasets for a group of subjects, we want to extract shared latent trends and identify relevant features. To solve this problem, we present a new statistical method named joint principal trend analysis (JPTA). We demonstrate the utility of JPTA through simulations and applications to gene expression data of the mammalian cell cycle and longitudinal transcriptional profiling data in response to influenza viral infections.

  12. Clustering Millions of Faces by Identity.

    PubMed

    Otto, Charles; Wang, Dayong; Jain, Anil K

    2018-02-01

    Given a large collection of unlabeled face images, we address the problem of clustering faces into an unknown number of identities. This problem is of interest in social media, law enforcement, and other applications, where the number of faces can be of the order of hundreds of millions, while the number of identities (clusters) can range from a few thousand to millions. To address the challenges of run-time complexity and cluster quality, we present an approximate Rank-Order clustering algorithm that performs better than popular clustering algorithms (k-means and spectral). Our experiments include clustering up to 123 million face images into over 10 million clusters. Clustering results are analyzed in terms of external (known face labels) and internal (unknown face labels) quality measures, and run-time. Our algorithm achieves an F-measure of 0.87 on the LFW benchmark (13K faces of 5,749 individuals), which drops to 0.27 on the largest dataset considered (13K faces in LFW plus 123M distractor images). Additionally, we show that frames in the YouTube benchmark can be clustered with an F-measure of 0.71. An internal per-cluster quality measure is developed to rank individual clusters for manual exploration of high quality clusters that are compact and isolated.
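
    A miniature of the rank-order idea: the distance between two points is computed from the positions of shared neighbours in their top-k lists (an absent item is charged the full list length k), and mutually close pairs are merged transitively. The data, threshold, and union-find merge below are illustrative, not the paper's optimized implementation.

        import numpy as np
        from sklearn.neighbors import NearestNeighbors

        rng = np.random.default_rng(0)
        X = np.vstack([rng.normal(m, 0.3, size=(100, 16)) for m in (0.0, 3.0, 6.0)])

        k = 20
        _, order = NearestNeighbors(n_neighbors=k).fit(X).kneighbors(X)

        def rank_of(a, b):
            """Position of b in a's top-k list; k if absent (the approximation)."""
            hits = np.where(order[a] == b)[0]
            return int(hits[0]) if hits.size else k

        def rank_order_dist(a, b):
            d_ab = sum(rank_of(b, order[a][i]) for i in range(rank_of(a, b)))
            d_ba = sum(rank_of(a, order[b][i]) for i in range(rank_of(b, a)))
            return (d_ab + d_ba) / max(1, min(rank_of(a, b), rank_of(b, a)))

        parent = list(range(len(X)))                     # union-find for merging
        def find(i):
            while parent[i] != i:
                parent[i] = parent[parent[i]]
                i = parent[i]
            return i

        for a in range(len(X)):
            for b in order[a][1:]:
                if rank_order_dist(a, int(b)) < 10:
                    parent[find(a)] = find(int(b))

        print(len({find(i) for i in range(len(X))}), "identities found")  # ~3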

  13. Online clustering algorithms for radar emitter classification.

    PubMed

    Liu, Jun; Lee, Jim P Y; Li, Lingjie; Luo, Zhi-Quan; Wong, K Max

    2005-08-01

    Radar emitter classification is a special application of data clustering for classifying unknown radar emitters from received radar pulse samples. The main challenges of this task are the high dimensionality of radar pulse samples, small sample group size, and closely located radar pulse clusters. In this paper, two new online clustering algorithms are developed for radar emitter classification: One is model-based using the Minimum Description Length (MDL) criterion and the other is based on competitive learning. Computational complexity is analyzed for each algorithm and then compared. Simulation results show the superior performance of the model-based algorithm over competitive learning in terms of better classification accuracy, flexibility, and stability.
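
    The competitive-learning variant can be sketched as a leader-style online updater: the winning prototype moves toward each arriving pulse, and a new cluster is spawned whenever no prototype is close enough. The distance threshold, learning rate, and pulse features below are assumed parameters for illustration, not those of the paper.

        import numpy as np

        class OnlineCompetitiveClusterer:
            def __init__(self, threshold=2.0, lr=0.05):
                self.threshold, self.lr = threshold, lr
                self.prototypes = []

            def update(self, x):
                """Assign x online; move the winner or spawn a new prototype."""
                if self.prototypes:
                    d = [np.linalg.norm(p - x) for p in self.prototypes]
                    win = int(np.argmin(d))
                    if d[win] < self.threshold:
                        self.prototypes[win] += self.lr * (x - self.prototypes[win])
                        return win
                self.prototypes.append(np.array(x, dtype=float))
                return len(self.prototypes) - 1

        # Stream of pulse feature vectors from three hypothetical emitters.
        rng = np.random.default_rng(0)
        emitters = np.array([[1.0, 9.0], [5.0, 2.0], [8.0, 8.0]])
        clf = OnlineCompetitiveClusterer()
        for _ in range(500):
            clf.update(emitters[rng.integers(3)] + rng.normal(scale=0.3, size=2))
        print(len(clf.prototypes), "emitters discovered")   # expect 3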

  14. B =5 Skyrmion as a two-cluster system

    NASA Astrophysics Data System (ADS)

    Gudnason, Sven Bjarke; Halcrow, Chris

    2018-06-01

    The classical B = 5 Skyrmion can be approximated by a two-cluster system in which a B = 1 Skyrmion is attached to a core B = 4 Skyrmion. We quantize this system, allowing the B = 1 Skyrmion to freely orbit the core. The configuration space is 11-dimensional but simplifies significantly after factoring out the overall spin and isospin degrees of freedom. We exactly solve the free quantum problem and then include an interaction potential between the Skyrmions numerically. The resulting energy spectrum is compared to the corresponding nuclei, the helium-5/lithium-5 isodoublet. We find approximate parity doubling not seen in the experimental data. In addition, we fail to obtain the correct ground-state spin. The framework laid out for this two-cluster system can readily be modified for other clusters and in particular for other B = 4n + 1 nuclei, of which B = 5 is the simplest example.

  15. High dimensional linear regression models under long memory dependence and measurement error

    NASA Astrophysics Data System (ADS)

    Kaul, Abhishek

    This dissertation consists of three chapters. The first chapter introduces the models under consideration and motivates the problems of interest; a brief literature review is also provided. The second chapter investigates the properties of Lasso under long range dependent model errors. Lasso is a computationally efficient approach to model selection and estimation, and its properties are well studied when the regression errors are independent and identically distributed. We study the case where the regression errors form a long memory moving average process. We establish a finite sample oracle inequality for the Lasso solution. We then show the asymptotic sign consistency in this setup. These results are established in the high dimensional setup (p > n), where p can increase exponentially with n. Finally, we show the n^(1/2−d)-consistency of Lasso, along with the oracle property of adaptive Lasso, in the case where p is fixed; here d is the memory parameter of the stationary error sequence. The performance of Lasso is also analysed in the present setup with a simulation study. The third chapter proposes and investigates the properties of a penalized quantile based estimator for measurement error models. Standard formulations of prediction problems in high dimension regression models assume the availability of fully observed covariates and sub-Gaussian and homogeneous model errors. This makes these methods inapplicable to measurement error models where covariates are unobservable and observations are possibly non sub-Gaussian and heterogeneous. We propose weighted penalized corrected quantile estimators for the regression parameter vector in linear regression models with additive measurement errors, where unobservable covariates are nonrandom. The proposed estimators forgo the need for the above mentioned model assumptions. We study these estimators in both the fixed dimension and high dimensional sparse setups, in the latter setup, the

  16. Dark matter phenomenology of high-speed galaxy cluster collisions

    DOE PAGES

    Mishchenko, Yuriy; Ji, Chueng-Ryong

    2017-07-29

    Here, we perform a general computational analysis of possible post-collision mass distributions in high-speed galaxy cluster collisions in the presence of self-interacting dark matter. Using this analysis, we show that astrophysically weakly self-interacting dark matter can impart subtle yet measurable features in the mass distributions of colliding galaxy clusters even without significant disruptions to the dark matter halos of the colliding galaxy clusters themselves. Most profound such evidence is found to reside in the tails of dark matter halos' distributions, in the space between the colliding galaxy clusters. Such features appear in our simulations as shells of scattered dark matter expanding in alignment with the outgoing original galaxy clusters, contributing significant densities to projected mass distributions at large distances from collision centers and large scattering angles of up to 90°. Our simulations indicate that as much as 20% of the total collision's mass may be deposited into such structures without noticeable disruptions to the main galaxy clusters. Such structures at large scattering angles are forbidden in purely gravitational high-speed galaxy cluster collisions. Convincing identification of such structures in real colliding galaxy clusters would be a clear indication of the self-interacting nature of dark matter. Our findings may offer an explanation for the ring-like dark matter feature recently identified in the long-range reconstructions of the mass distribution of the colliding galaxy cluster CL0024+017.

  17. Dark matter phenomenology of high-speed galaxy cluster collisions

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Mishchenko, Yuriy; Ji, Chueng-Ryong

    Here, we perform a general computational analysis of possible post-collision mass distributions in high-speed galaxy cluster collisions in the presence of self-interacting dark matter. Using this analysis, we show that astrophysically weakly self-interacting dark matter can impart subtle yet measurable features in the mass distributions of colliding galaxy clusters even without significant disruptions to the dark matter halos of the colliding galaxy clusters themselves. Most profound such evidence is found to reside in the tails of dark matter halos' distributions, in the space between the colliding galaxy clusters. Such features appear in our simulations as shells of scattered dark matter expanding in alignment with the outgoing original galaxy clusters, contributing significant densities to projected mass distributions at large distances from collision centers and large scattering angles of up to 90°. Our simulations indicate that as much as 20% of the total collision's mass may be deposited into such structures without noticeable disruptions to the main galaxy clusters. Such structures at large scattering angles are forbidden in purely gravitational high-speed galaxy cluster collisions. Convincing identification of such structures in real colliding galaxy clusters would be a clear indication of the self-interacting nature of dark matter. Our findings may offer an explanation for the ring-like dark matter feature recently identified in the long-range reconstructions of the mass distribution of the colliding galaxy cluster CL0024+017.

  18. Clustering PPI data by combining FA and SHC method.

    PubMed

    Lei, Xiujuan; Ying, Chao; Wu, Fang-Xiang; Xu, Jin

    2015-01-01

    Clustering is one of the main methods to identify functional modules from protein-protein interaction (PPI) data. Nevertheless, traditional clustering methods may not be effective for PPI data. In this paper, we propose a novel method for clustering PPI data by combining the firefly algorithm (FA) and the synchronization-based hierarchical clustering (SHC) algorithm. First, the PPI data are preprocessed via spectral clustering (SC), which transforms the high-dimensional similarity matrix into a low-dimensional one. Then the SHC algorithm is used to perform clustering. In the SHC algorithm, hierarchical clustering is achieved by continuously enlarging the neighborhood radius of synchronized objects; however, the hierarchical search for the optimal neighborhood radius of synchronization is difficult and inefficient. We therefore adopt the firefly algorithm to determine the optimal threshold of the neighborhood radius of synchronization automatically. The proposed algorithm is tested on the MIPS PPI dataset. The results show that it outperforms the traditional algorithms in precision, recall and F-measure.

  19. Clustering PPI data by combining FA and SHC method

    PubMed Central

    2015-01-01

    Clustering is one of the main methods to identify functional modules from protein-protein interaction (PPI) data. Nevertheless, traditional clustering methods may not be effective for PPI data. In this paper, we propose a novel method for clustering PPI data by combining the firefly algorithm (FA) and the synchronization-based hierarchical clustering (SHC) algorithm. First, the PPI data are preprocessed via spectral clustering (SC), which transforms the high-dimensional similarity matrix into a low-dimensional one. Then the SHC algorithm is used to perform clustering. In the SHC algorithm, hierarchical clustering is achieved by continuously enlarging the neighborhood radius of synchronized objects; however, the hierarchical search for the optimal neighborhood radius of synchronization is difficult and inefficient. We therefore adopt the firefly algorithm to determine the optimal threshold of the neighborhood radius of synchronization automatically. The proposed algorithm is tested on the MIPS PPI dataset. The results show that it outperforms the traditional algorithms in precision, recall and F-measure. PMID:25707632

  20. A clustering algorithm for determining community structure in complex networks

    NASA Astrophysics Data System (ADS)

    Jin, Hong; Yu, Wei; Li, ShiJun

    2018-02-01

    Clustering algorithms are attractive for the task of community detection in complex networks. DENCLUE is a representative density-based clustering algorithm with a firm mathematical basis and good clustering properties, allowing for arbitrarily shaped clusters in high-dimensional datasets. However, this method cannot be directly applied to community discovery because it cannot handle network data, and it requires careful selection of the density parameter and the noise threshold. To solve these issues, a new community detection method is proposed in this paper. First, we use a spectral analysis technique to map the network data into a low-dimensional Euclidean space that preserves node structural characteristics. Then, DENCLUE is applied to detect the communities in the network. A mathematical method named the Sheather-Jones plug-in is chosen to select the density parameter, which describes the intrinsic clustering structure accurately. Moreover, since every node in the network is meaningful, there are no noise nodes, and the noise threshold can be ignored. We test our algorithm on both benchmark and real-life networks, and the results demonstrate its effectiveness over other popular density-based clustering algorithms adapted to community detection.
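
    A hedged sketch of the two-stage pipeline follows, with a hand-built normalized-Laplacian embedding for the spectral step and scikit-learn's MeanShift standing in for DENCLUE (a fixed bandwidth replaces the Sheather-Jones plug-in selector); the toy graph is illustrative.

        import numpy as np
        from sklearn.cluster import MeanShift

        # Toy network: two 20-node cliques joined by a single bridge edge.
        n = 40
        A = np.zeros((n, n))
        A[:20, :20] = 1.0
        A[20:, 20:] = 1.0
        A[19, 20] = A[20, 19] = 1.0
        np.fill_diagonal(A, 0.0)

        # Spectral step: the Fiedler vector of the normalized Laplacian embeds the
        # nodes in a low-dimensional Euclidean space preserving community structure.
        d = A.sum(axis=1)
        L = np.eye(n) - A / np.sqrt(np.outer(d, d))
        vals, vecs = np.linalg.eigh(L)
        embedding = vecs[:, 1:2]          # one nontrivial eigenvector suffices here

        # Density step: MeanShift as a stand-in for DENCLUE.
        labels = MeanShift(bandwidth=0.1).fit_predict(embedding)
        print(labels)                     # the two cliques emerge as two communities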

  1. {lambda} elements for one-dimensional singular problems with known strength of singularity

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Wong, K.K.; Surana, K.S.

    1996-10-01

    This paper presents a new and general procedure for designing special elements called λ elements for one-dimensional singular problems where the strength of the singularity is known. The λ elements presented here are of type C⁰. These elements also provide inter-element C⁰ continuity with p-version elements. The λ elements do not require precise knowledge of the extent of the singular zone, i.e., their use may be extended beyond the singular zone. When λ elements are used at the singularity, a singular problem behaves like a smooth problem, thereby eliminating the need for h- and p-adaptive processes altogether. One-dimensional steady-state radial flow of an upper convected Maxwell fluid is considered as a sample problem. A least squares approach (the least squares finite element formulation, LSFEF) is used to construct the integral form (error functional I) from the differential equations. Numerical results presented for radially inward flow with inner radius rᵢ = 0.1, 0.01, 0.001, 0.0001, 0.00001, and Deborah number of 2 (De = 2) demonstrate the accuracy, the faster convergence of the iterative solution procedure, the faster convergence rate of the error functional, and the mesh-independent characteristics of the λ elements regardless of the severity of the singularity.

  2. Formation and structure of stable aggregates in binary diffusion-limited cluster-cluster aggregation processes

    NASA Astrophysics Data System (ADS)

    López-López, J. M.; Moncho-Jordá, A.; Schmitt, A.; Hidalgo-Álvarez, R.

    2005-09-01

    Binary diffusion-limited cluster-cluster aggregation processes are studied as a function of the relative concentration of the two species. Both short- and long-time behaviors are investigated by means of three-dimensional off-lattice Brownian dynamics simulations. At short aggregation times, the validity of the Hogg-Healy-Fuerstenau approximation is shown. At long times, a single large cluster containing all initial particles is found to form when the relative concentration of the minority particles lies above a critical value. Below that value, stable aggregates remain in the system. These stable aggregates are composed of a few minority particles that are highly covered by majority ones. Our off-lattice simulations reveal a value of approximately 0.15 for the critical relative concentration. A qualitative explanation scheme for the formation and growth of the stable aggregates is developed. The simulations also explain the phenomenon of monomer discrimination that was observed recently in single cluster light scattering experiments.

  3. Automatic reconstruction of fault networks from seismicity catalogs: Three-dimensional optimal anisotropic dynamic clustering

    NASA Astrophysics Data System (ADS)

    Ouillon, G.; Ducorbier, C.; Sornette, D.

    2008-01-01

    We propose a new pattern recognition method that is able to reconstruct the three-dimensional structure of the active part of a fault network using the spatial locations of earthquakes. The method is a generalization of the so-called dynamic clustering (or k-means) method, which partitions a set of data points into clusters using a global minimization criterion on the variance of the hypocenter locations about their centers of mass. The new method improves on the original k-means method by taking into account the full spatial covariance tensor of each cluster in order to partition the data set into fault-like, anisotropic clusters. Given a catalog of seismic events, the output is the optimal set of plane segments that fits the spatial structure of the data. Each plane segment is fully characterized by its location, size, and orientation. The main tunable parameter is the accuracy of the earthquake locations, which fixes the resolution, i.e., the residual variance of the fit. The resolution determines the number of fault segments needed to describe the earthquake catalog: the better the resolution, the finer the structure of the reconstructed fault segments. The algorithm successfully reconstructs the fault segments of synthetic earthquake catalogs. Applied to the real catalog constituted of a subset of the aftershock sequence of the 28 June 1992 Landers earthquake in southern California, the reconstructed plane segments fully agree with faults already known on geological maps or with blind faults that appear quite obvious in longer-term catalogs. Future improvements of the method are discussed, as well as its potential use in the multiscale study of the inner structure of fault zones.
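
    The anisotropic generalization can be sketched as a k-means-style loop in which each cluster is a plane segment: events are assigned by orthogonal point-to-plane distance, and each plane is refit by PCA, taking the smallest-variance eigenvector of the cluster covariance as its normal. The synthetic catalog and cluster count below are illustrative assumptions.

        import numpy as np

        rng = np.random.default_rng(0)
        n = 400
        # Synthetic "hypocenters" scattered about two planes: z ~ 0 and x ~ 2.
        p1 = np.c_[rng.uniform(0, 4, n), rng.uniform(0, 4, n), rng.normal(0, 0.05, n)]
        p2 = np.c_[rng.normal(2, 0.05, n), rng.uniform(0, 4, n), rng.uniform(-2, 2, n)]
        pts = np.vstack([p1, p2])

        K = 2
        centers = pts[rng.choice(len(pts), K, replace=False)]
        normals = rng.normal(size=(K, 3))
        normals /= np.linalg.norm(normals, axis=1, keepdims=True)

        for _ in range(20):
            # Assignment: each event goes to the plane at smallest distance.
            diff = pts[:, None, :] - centers[None]
            dist = np.abs(np.einsum('nkd,kd->nk', diff, normals))
            labels = dist.argmin(axis=1)
            # Refit: plane through the centroid, normal from the covariance.
            for k in range(K):
                sub = pts[labels == k]
                if len(sub) < 3:
                    continue              # keep the previous fit if a plane starves
                centers[k] = sub.mean(axis=0)
                w, v = np.linalg.eigh(np.cov(sub.T))
                normals[k] = v[:, 0]      # eigenvector of the smallest eigenvalue

        print(np.round(normals, 2))       # expect normals near (0,0,±1) and (±1,0,0)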

  4. Resolvent approach for two-dimensional scattering problems. Application to the nonstationary Schrödinger problem and the KPI equation

    NASA Astrophysics Data System (ADS)

    Boiti, M.; Pempinelli, F.; Pogrebkov, A. K.; Polivanov, M. C.

    1992-11-01

    The resolvent operator of the linear problem is determined as the full Green function continued in the complex domain in two variables. An analog of the known Hilbert identity is derived. We demonstrate the role of this identity in the study of two-dimensional scattering. Considering the nonstationary Schrödinger equation as an example, we show that all types of solutions of the linear problems, as well as spectral data known in the literature, are given as specific values of this unique function — the resolvent function. A new form of the inverse problem is formulated.

  5. High Performance Computer Cluster for Theoretical Studies of Roaming in Chemical Reactions

    DTIC Science & Technology

    2016-08-30

    A dedicated high-performance computer cluster was acquired for theoretical studies of roaming in chemical reactions; results were reported in peer-reviewed journal publications (final report; sponsoring agency: U.S. Army Research Office, Research Triangle Park, NC).

  6. A latent modeling approach to genotype-phenotype relationships: maternal problem behavior clusters, prenatal smoking, and MAOA genotype.

    PubMed

    McGrath, L M; Mustanski, B; Metzger, A; Pine, D S; Kistner-Griffin, E; Cook, E; Wakschlag, L S

    2012-08-01

    This study illustrates the application of a latent modeling approach to genotype-phenotype relationships and gene × environment interactions, using a novel, multidimensional model of adult female problem behavior, including maternal prenatal smoking. The gene of interest is the monoamine oxidase A (MAOA) gene which has been well studied in relation to antisocial behavior. Participants were adult women (N = 192) who were sampled from a prospective pregnancy cohort of non-Hispanic, white individuals recruited from a neighborhood health clinic. Structural equation modeling was used to model a female problem behavior phenotype, which included conduct problems, substance use, impulsive-sensation seeking, interpersonal aggression, and prenatal smoking. All of the female problem behavior dimensions clustered together strongly, with the exception of prenatal smoking. A main effect of MAOA genotype and a MAOA × physical maltreatment interaction were detected with the Conduct Problems factor. Our phenotypic model showed that prenatal smoking is not simply a marker of other maternal problem behaviors. The risk variant in the MAOA main effect and interaction analyses was the high activity MAOA genotype, which is discrepant from consensus findings in male samples. This result contributes to an emerging literature on sex-specific interaction effects for MAOA.

  7. Self consistency grouping: a stringent clustering method

    PubMed Central

    2012-01-01

    Background: Numerous types of clustering, such as single linkage and K-means, have been widely studied and applied to a variety of scientific problems. However, the existing methods are not readily applicable to problems that demand high stringency. Methods: Our method, self consistency grouping (SCG), yields clusters whose members are closer in rank to each other than to any member outside the cluster. We do not define a distance metric; we use the best known distance metric and presume that it measures the correct distance. SCG does not impose any restriction on the size or the number of the clusters that it finds. The boundaries of clusters are determined by the inconsistencies in the ranks. In addition to the direct implementation that finds the complete structure of the (sub)clusters, we implemented two faster versions. The fastest version is guaranteed to find only the clusters that are not subclusters of any other clusters, and the other version yields the same output as the direct implementation but does so more efficiently. Results: Our tests, in which errors were deliberately introduced into the distance measurements, demonstrated that SCG yields very few false positives. Clustering of protein domain representatives by structural similarity showed that SCG could recover homologous groups with high precision. Conclusions: SCG has potential for finding biological relationships under stringent conditions. PMID:23320864

  8. Modeling change from large-scale high-dimensional spatio-temporal array data

    NASA Astrophysics Data System (ADS)

    Lu, Meng; Pebesma, Edzer

    2014-05-01

    The massive data that come from Earth observation satellites and other sensors provide significant information for modeling global change. At the same time, the high dimensionality of the data has brought challenges in data acquisition, management, effective querying and processing. In addition, the output of earth system modeling tends to be data intensive and needs methodologies for storage, validation, analysis and visualization, e.g. as maps. An important proportion of earth system observations and simulated data can be represented as multi-dimensional array data, which has received increasing attention in big data management and spatial-temporal analysis. Case studies will be developed in the natural sciences, such as climate change, hydrological modeling and sediment dynamics, in which big data problems must be addressed. Multi-dimensional array-based database management and analytics systems such as Rasdaman, SciDB, and R will be applied to these cases. From these studies we hope to learn the strengths and weaknesses of these systems, how they might work together, and how the semantics of array operations differ, through addressing the problems associated with big data. Research questions include:
    • How can we reduce dimensions spatially and temporally, or thematically?
    • How can we extend existing GIS functions to work on multidimensional arrays?
    • How can we combine data sets of different dimensionality or different resolutions?
    • Can map algebra be extended to an intelligible array algebra?
    • What are effective semantics for array programming of dynamic data driven applications?
    • In which sense are space and time special, as dimensions, compared to other properties?
    • How can we make the analysis of multi-spectral, multi-temporal and multi-sensor earth observation data easy?

  9. Inorganic material profiling using Arₙ⁺ clusters: Can we achieve high quality profiles?

    NASA Astrophysics Data System (ADS)

    Conard, T.; Fleischmann, C.; Havelund, R.; Franquet, A.; Poleunis, C.; Delcorte, A.; Vandervorst, W.

    2018-06-01

    Retrieving molecular information by sputtering organic systems has become practical in recent years with the introduction of sputtering by large gas clusters, which drastically reduces compound degradation during analysis and has led to strong improvements in depth resolution. A limitation quickly became apparent, however, for heterogeneous systems in which inorganic layers or structures must be profiled concurrently. As opposed to organic material, erosion of the inorganic layer appears very difficult and prone to many artefacts. To shed some light on these problems, we investigated a simple system consisting of aluminum delta layer(s) buried in a silicon matrix in order to define the most favorable beam conditions for practical analysis. We show that, counterintuitively given the small energy per atom used, and unlike monoatomic ion sputtering, the information depth obtained with large cluster ions is typically very large (∼10 nm), and that this can be caused both by large roughness developing at early stages of the sputtering process and by a large mixing zone. As a consequence, a large deformation of the Al intensity profile is observed. Using sample rotation during profiling significantly improves the depth resolution, while sample temperature has no significant effect. The determining parameter for high depth resolution remains the total energy of the cluster rather than the energy per atom in the cluster.

  10. GATE: software for the analysis and visualization of high-dimensional time series expression data.

    PubMed

    MacArthur, Ben D; Lachmann, Alexander; Lemischka, Ihor R; Ma'ayan, Avi

    2010-01-01

    We present Grid Analysis of Time series Expression (GATE), an integrated computational software platform for the analysis and visualization of high-dimensional biomolecular time series. GATE uses a correlation-based clustering algorithm to arrange molecular time series on a two-dimensional hexagonal array and dynamically colors individual hexagons according to the expression level of the molecular component to which they are assigned, to create animated movies of systems-level molecular regulatory dynamics. In order to infer potential regulatory control mechanisms from patterns of correlation, GATE also allows interactive interrogation of movies against a wide variety of prior knowledge datasets. GATE movies can be paused and are interactive, allowing users to reconstruct networks and perform functional enrichment analyses. Movies created with GATE can be saved in Flash format and can be inserted directly into PDF manuscript files as interactive figures. GATE is available for download and is free for academic use from http://amp.pharm.mssm.edu/maayan-lab/gate.htm

  11. Structures of undecagold clusters: Ligand effect

    NASA Astrophysics Data System (ADS)

    Spivey, Kasi; Williams, Joseph I.; Wang, Lichang

    2006-12-01

    The most stable structure of undecagold, or Au₁₁, clusters was predicted from our DFT calculations to be planar [L. Xiao, L. Wang, Chem. Phys. Lett. 392 (2004) 452; L. Xiao, B. Tollberg, X. Hu, L. Wang, J. Chem. Phys. 124 (2005) 114309]. The structures of ligand-protected undecagold clusters were shown experimentally to be three-dimensional. In this work, we used DFT calculations to study the ligand effect on the structures of Au₁₁ clusters. Our results show that the most stable structure of Au₁₁ is in fact three-dimensional when SCH₃ ligands are attached. This indicates that the structures of small gold clusters are altered substantially in the presence of ligands.

  12. Effective one-dimensional approach to the source reconstruction problem of three-dimensional inverse optoacoustics

    NASA Astrophysics Data System (ADS)

    Stritzel, J.; Melchert, O.; Wollweber, M.; Roth, B.

    2017-09-01

    The direct problem of optoacoustic signal generation in biological media consists of solving an inhomogeneous three-dimensional (3D) wave equation for an initial acoustic stress profile. In contrast, the more demanding inverse problem requires the reconstruction of the initial stress profile from a proper set of observed signals. In this article, we consider an effectively 1D approach, based on the assumption of a Gaussian transverse irradiation source profile and plane acoustic waves, in which the effects of acoustic diffraction are described in terms of a linear integral equation. The respective inverse problem along the beam axis can be cast into a Volterra integral equation of the second kind, for which we explore efficient numerical schemes for reconstructing initial stress profiles from observed signals, constituting methodical progress on the computational aspects of optoacoustics. In this regard, we explore the validity as well as the limits of the inversion scheme via numerical experiments, with parameters geared toward actual optoacoustic problem instances. The considered inversion input consists of synthetic data, obtained in terms of the effectively 1D approach, and, more generally, a solution of the 3D optoacoustic wave equation. Finally, we also analyze the effect of noise and different detector-to-sample distances on the optoacoustic signal and the reconstructed pressure profiles.
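
    The numerical core, marching a quadrature rule through a Volterra equation of the second kind, f(t) = g(t) + ∫₀ᵗ K(t, s) f(s) ds, can be sketched with the trapezoidal rule. The kernel and inhomogeneity below form a textbook test case with known solution exp(t), not the optoacoustic diffraction kernel.

        import numpy as np

        def solve_volterra2(g, K, t):
            """Trapezoidal marching for f(t) = g(t) + int_0^t K(t,s) f(s) ds."""
            h = t[1] - t[0]                    # uniform grid spacing assumed
            f = np.empty_like(t)
            f[0] = g(t[0])
            for i in range(1, len(t)):
                s = 0.5 * K(t[i], t[0]) * f[0]
                s += sum(K(t[i], t[j]) * f[j] for j in range(1, i))
                f[i] = (g(t[i]) + h * s) / (1.0 - 0.5 * h * K(t[i], t[i]))
            return f

        # Test: K = 1 and g = 1 give f' = f with f(0) = 1, i.e. f(t) = exp(t).
        t = np.linspace(0.0, 1.0, 201)
        f = solve_volterra2(lambda x: 1.0, lambda ti, s: 1.0, t)
        print(np.max(np.abs(f - np.exp(t))))   # small discretization error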

  13. Eigenvalue problems for Beltrami fields arising in a three-dimensional toroidal magnetohydrodynamic equilibrium problem

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Hudson, S. R.; Hole, M. J.; Dewar, R. L.

    2007-05-15

    A generalized energy principle for finite-pressure, toroidal magnetohydrodynamic (MHD) equilibria in general three-dimensional configurations is proposed. The full set of ideal-MHD constraints is applied only on a discrete set of toroidal magnetic surfaces (invariant tori), which act as barriers against leakage of magnetic flux, helicity, and pressure through chaotic field-line transport. It is argued that a necessary condition for such invariant tori to exist is that they have fixed, irrational rotational transforms. In the toroidal domains bounded by these surfaces, full Taylor relaxation is assumed, thus leading to Beltrami fields ∇ × B = λB, where λ is constant within each domain. Two distinct eigenvalue problems for λ arise in this formulation, depending on whether fluxes and helicity are fixed, or boundary rotational transforms. These are studied in cylindrical geometry and in a three-dimensional toroidal region of annular cross section. In the latter case, an application of a residue criterion is used to determine the threshold for connected chaos.

  14. Parallel Simulation of Three-Dimensional Free-Surface Fluid Flow Problems

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    BAER,THOMAS A.; SUBIA,SAMUEL R.; SACKINGER,PHILIP A.

    2000-01-18

    We describe parallel simulations of viscous, incompressible, free surface, Newtonian fluid flow problems that include dynamic contact lines. The Galerkin finite element method was used to discretize the fully-coupled governing conservation equations, and a "pseudo-solid" mesh mapping approach was used to determine the shape of the free surface. In this approach, the finite element mesh is allowed to deform to satisfy quasi-static solid mechanics equations subject to geometric or kinematic constraints on the boundaries. As a result, nodal displacements must be included in the set of problem unknowns. Issues concerning the proper constraints along the solid-fluid dynamic contact line in three dimensions are discussed. Parallel computations are carried out for an example taken from the coating flow industry: flow in the vicinity of a slot coater edge. This is a three-dimensional free-surface problem possessing a contact line that advances at the web speed in one region but transitions to static behavior in another part of the flow domain. Discussion focuses on parallel speedups for fixed problem size, a class of problems of immediate practical importance.

  15. Minimax Rate-optimal Estimation of High-dimensional Covariance Matrices with Incomplete Data*

    PubMed Central

    Cai, T. Tony; Zhang, Anru

    2016-01-01

    Missing data occur frequently in a wide range of applications. In this paper, we consider estimation of high-dimensional covariance matrices in the presence of missing observations under a general missing completely at random model in the sense that the missingness is not dependent on the values of the data. Based on incomplete data, estimators for bandable and sparse covariance matrices are proposed and their theoretical and numerical properties are investigated. Minimax rates of convergence are established under the spectral norm loss and the proposed estimators are shown to be rate-optimal under mild regularity conditions. Simulation studies demonstrate that the estimators perform well numerically. The methods are also illustrated through an application to data from four ovarian cancer studies. The key technical tools developed in this paper are of independent interest and potentially useful for a range of related problems in high-dimensional statistical inference with missing data. PMID:27777471
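
    Two of the ingredients are easy to sketch: a generalized sample covariance whose (j, k) entry uses only the samples in which both coordinates are observed, followed by a banding step. The decay rate, missingness rate, and fixed bandwidth below are illustrative assumptions; in practice the bandwidth would be chosen data-adaptively.

        import numpy as np

        rng = np.random.default_rng(0)
        p, n = 30, 200
        idx = np.arange(p)
        Sigma = 0.6 ** np.abs(idx[:, None] - idx[None, :])   # bandable truth
        X = rng.multivariate_normal(np.zeros(p), Sigma, size=n)
        X[rng.random(X.shape) < 0.2] = np.nan                # missing at random

        # Generalized sample covariance from pairwise-complete observations.
        obs = ~np.isnan(X)
        Xc = np.where(obs, X - np.nanmean(X, axis=0), 0.0)
        counts = obs.T.astype(float) @ obs.astype(float)
        S = (Xc.T @ Xc) / np.maximum(counts - 1.0, 1.0)

        # Banding: zero out entries more than k positions off the diagonal.
        k = 5
        S_band = np.where(np.abs(idx[:, None] - idx[None, :]) <= k, S, 0.0)
        print(np.linalg.norm(S_band - Sigma, 2))             # spectral-norm loss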

  16. Minimax Rate-optimal Estimation of High-dimensional Covariance Matrices with Incomplete Data.

    PubMed

    Cai, T Tony; Zhang, Anru

    2016-09-01

    Missing data occur frequently in a wide range of applications. In this paper, we consider estimation of high-dimensional covariance matrices in the presence of missing observations under a general missing completely at random model in the sense that the missingness is not dependent on the values of the data. Based on incomplete data, estimators for bandable and sparse covariance matrices are proposed and their theoretical and numerical properties are investigated. Minimax rates of convergence are established under the spectral norm loss and the proposed estimators are shown to be rate-optimal under mild regularity conditions. Simulation studies demonstrate that the estimators perform well numerically. The methods are also illustrated through an application to data from four ovarian cancer studies. The key technical tools developed in this paper are of independent interest and potentially useful for a range of related problems in high-dimensional statistical inference with missing data.

  17. Interactions of small platinum clusters with the TiC(001) surface

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Mao, Jianjun; Li, Shasha; Chu, Xingli

    2015-11-14

    Density functional theory calculations are used to elucidate the interactions of small platinum clusters (Ptₙ, n = 1-5) with the TiC(001) surface. The results are analyzed in terms of geometric, energetic, and electronic properties. It is found that a single Pt atom prefers to be adsorbed at the C-top site, while a Pt₂ cluster prefers dimerization and a Pt₃ cluster forms a linear structure on TiC(001). For the Pt₄ cluster, the three-dimensional distorted tetrahedral structure and the two-dimensional square structure have almost equal stability. In contrast with the two-dimensional isolated Pt₅ cluster, the adsorbed Pt₅ cluster prefers a three-dimensional structure on TiC(001). Substantial charge transfer takes place from the TiC(001) surface to the adsorbed Ptₙ clusters, resulting in negatively charged Ptₙ clusters. Finally, the d-band centers of the adsorbed Pt atoms and their implications for catalytic activity are discussed.

  18. Swarm Intelligence in Text Document Clustering

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Cui, Xiaohui; Potok, Thomas E

    2008-01-01

    Social animals or insects in nature often exhibit a form of emergent collective behavior. The research field that attempts to design algorithms or distributed problem-solving devices inspired by the collective behavior of social insect colonies is called Swarm Intelligence. Compared to traditional algorithms, swarm algorithms are usually flexible, robust, decentralized, and self-organized. These characteristics make swarm algorithms suitable for solving complex problems, such as document collection clustering. A major challenge of today's information society is that users are overwhelmed with information on any topic they search for. Fast and high-quality document clustering algorithms play an important role in helping users effectively navigate, summarize, and organize this flood of information. In this chapter, we introduce three nature-inspired swarm intelligence approaches for document clustering analysis. These clustering algorithms use stochastic and heuristic principles discovered from observing bird flocks, fish schools, and ant food foraging.

  19. SAIL: Summation-bAsed Incremental Learning for Information-Theoretic Text Clustering.

    PubMed

    Cao, Jie; Wu, Zhiang; Wu, Junjie; Xiong, Hui

    2013-04-01

    Information-theoretic clustering aims to exploit information-theoretic measures as the clustering criteria. A common practice on this topic is the so-called Info-Kmeans, which performs K-means clustering with KL-divergence as the proximity function. While expert efforts on Info-Kmeans have shown promising results, a remaining challenge is to deal with high-dimensional sparse data such as text corpora. Indeed, it is possible that the centroids contain many zero-value features for high-dimensional text vectors, which leads to infinite KL-divergence values and creates a dilemma in assigning objects to centroids during the iteration process of Info-Kmeans. To meet this challenge, in this paper, we propose a Summation-bAsed Incremental Learning (SAIL) algorithm for Info-Kmeans clustering. Specifically, by using an equivalent objective function, SAIL replaces the computation of KL-divergence by the incremental computation of Shannon entropy. This can avoid the zero-feature dilemma caused by the use of KL-divergence. To improve the clustering quality, we further introduce the variable neighborhood search scheme and propose the V-SAIL algorithm, which is then accelerated by a multithreaded scheme in PV-SAIL. Our experimental results on various real-world text collections have shown that, with SAIL as a booster, the clustering performance of Info-Kmeans can be significantly improved. Also, V-SAIL and PV-SAIL indeed help improve the clustering quality at a lower cost of computation.
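
    To make the zero-feature dilemma concrete, the sketch below shows how KL(x || c) blows up when a centroid coordinate is zero, together with one plausible entropy-based surrogate in the spirit of SAIL: score a partition by the mass-weighted Shannon entropy of each cluster's summed term vector, which can be maintained incrementally as documents move. Function names are illustrative, not the authors' code.

      import numpy as np

      def kl_divergence(x, c):
          # KL(x || c) is infinite whenever some c_j == 0 but x_j > 0 --
          # the centroid zero-feature dilemma on sparse text vectors.
          m = x > 0
          return float(np.sum(x[m] * np.log(x[m] / c[m])))  # inf if c[m] has zeros

      def shannon_entropy(p):
          p = p[p > 0]
          return float(-np.sum(p * np.log(p)))

      def partition_score(clusters):
          # Mass-weighted entropy of each cluster's *summed* term vector.
          # Moving one document only adds/subtracts its counts from a sum,
          # so the score updates incrementally and no division by a
          # zero-valued centroid feature ever occurs.
          score = 0.0
          for docs in clusters:              # docs: (n_docs, vocab) count matrix
              s = docs.sum(axis=0)
              score += s.sum() * shannon_entropy(s / s.sum())
          return score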

  20. Implementation of a computationally efficient least-squares algorithm for highly under-determined three-dimensional diffuse optical tomography problems.

    PubMed

    Yalavarthy, Phaneendra K; Lynch, Daniel R; Pogue, Brian W; Dehghani, Hamid; Paulsen, Keith D

    2008-05-01

    Three-dimensional (3D) diffuse optical tomography is known to be a nonlinear, ill-posed and sometimes under-determined problem, where regularization is added to the minimization to allow convergence to a unique solution. In this work, a generalized least-squares (GLS) minimization method was implemented, which employs weight matrices for both data-model misfit and optical properties to include their variances and covariances, using a computationally efficient scheme. This allows inversion of a matrix whose dimension is dictated by the number of measurements, instead of by the number of imaging parameters, increasing the computation speed by up to a factor of four per iteration in most under-determined 3D imaging problems. An analytic derivation, using the Sherman-Morrison-Woodbury identity, is shown for this efficient alternative form, and it is proven to be equivalent, not only analytically but also numerically. Equivalent alternative forms for other minimization methods, such as Levenberg-Marquardt (LM) and Tikhonov, are also derived. Three-dimensional reconstruction results indicate that the poor recovery of quantitatively accurate values in 3D optical images can also be a characteristic of the reconstruction algorithm, along with the target size. Interestingly, use of GLS reconstruction methods reduces error in the periphery of the image, as expected, and improves by 20% the ability to quantify local interior regions in terms of the recovered optical contrast, as compared to LM methods. Characterization of detector photomultiplier-tube noise has enabled the use of the GLS method for reconstructing experimental data and showed promise for better quantification of the target in 3D optical imaging. These new alternative forms become effective when the number of imaging property parameters exceeds the number of measurements by a factor greater than 2.
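
    The computational trick can be sketched directly: for a regularized update with Jacobian J of size m x p and p >> m, the Sherman-Morrison-Woodbury identity lets one solve an m x m system instead of a p x p one. A minimal numpy sketch, assuming a simple lam * I regularizer rather than the paper's full GLS weight matrices:

      import numpy as np

      def regularized_update(J, r, lam):
          # Solve (J^T J + lam I) dx = J^T r for J of shape (m, p), p >> m.
          m, p = J.shape
          # Direct form: a p x p solve -- expensive when p >> m.
          dx_direct = np.linalg.solve(J.T @ J + lam * np.eye(p), J.T @ r)
          # Woodbury form: (J^T J + lam I)^-1 J^T = J^T (J J^T + lam I)^-1,
          # so only an m x m system is solved.
          dx_fast = J.T @ np.linalg.solve(J @ J.T + lam * np.eye(m), r)
          assert np.allclose(dx_direct, dx_fast)   # equivalent forms
          return dx_fast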

  1. Robust MST-Based Clustering Algorithm.

    PubMed

    Liu, Qidong; Zhang, Ruisheng; Zhao, Zhili; Wang, Zhenghai; Jiao, Mengyao; Wang, Guangjing

    2018-06-01

    Minimax similarity stresses the connectedness of points via mediating elements rather than favoring high mutual similarity. This grouping principle yields superior clustering results when mining arbitrarily shaped clusters in data. However, it is not robust against noise and outliers in the data. There are two main problems with the grouping principle: first, a single object that is far away from all other objects defines a separate cluster, and second, two clusters that are connected (for example, by a chain of intermediate points) would be regarded as parts of a single cluster. To solve these problems, we propose a robust minimum spanning tree (MST)-based clustering algorithm in this letter. First, we separate the connected objects by applying a density-based coarsening phase, resulting in a low-rank matrix in which each element denotes a supernode formed by combining a set of nodes. Then a greedy method is presented to partition those supernodes by working on the low-rank matrix. Instead of removing the longest edges from the MST, our algorithm groups the data set based on the minimax similarity. Finally, the assignment of all data points can be achieved through their corresponding supernodes. Experimental results on many synthetic and real-world data sets show that our algorithm consistently outperforms the compared clustering algorithms.
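
    For contrast with the proposed method, the baseline it improves on (cut the k-1 longest MST edges and take connected components) can be sketched in a few lines of scipy; the letter's density-based coarsening and minimax-similarity grouping are not reproduced here.

      import numpy as np
      from scipy.sparse.csgraph import connected_components, minimum_spanning_tree
      from scipy.spatial.distance import pdist, squareform

      def mst_cut_clustering(X, k):
          # Build the MST of pairwise distances, remove the k-1 longest
          # edges, and return the connected components as cluster labels.
          D = squareform(pdist(X))
          mst = minimum_spanning_tree(D).toarray()
          edges = np.argwhere(mst > 0)
          weights = mst[mst > 0]                 # same row-major order as edges
          keep = weights.argsort()[: len(weights) - (k - 1)]
          pruned = np.zeros_like(mst)
          for i, j in edges[keep]:
              pruned[i, j] = mst[i, j]
          _, labels = connected_components(pruned, directed=False)
          return labels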

  2. Fast dimension reduction and integrative clustering of multi-omics data using low-rank approximation: application to cancer molecular classification.

    PubMed

    Wu, Dingming; Wang, Dongfang; Zhang, Michael Q; Gu, Jin

    2015-12-01

    One major goal of large-scale cancer omics studies is to identify molecular subtypes for more accurate cancer diagnoses and treatments. To deal with high-dimensional cancer multi-omics data, a promising strategy is to find an effective low-dimensional subspace of the original data and then cluster cancer samples in the reduced subspace. However, due to data-type diversity and big data volume, few methods can integratively and efficiently find the principal low-dimensional manifold of high-dimensional cancer multi-omics data. In this study, we propose a novel low-rank-approximation-based integrative probabilistic model to quickly find the shared principal subspace across multiple data types; the convexity of the low-rank regularized likelihood function of the probabilistic model ensures efficient and stable model fitting. Candidate molecular subtypes can be identified by unsupervised clustering of hundreds of cancer samples in the reduced low-dimensional subspace. On testing datasets, our method LRAcluster (low-rank approximation based multi-omics data clustering) runs much faster with better clustering performance than existing methods. We then applied LRAcluster to large-scale cancer multi-omics data from TCGA. The pan-cancer analysis shows that cancers of different tissue origins are generally grouped as independent clusters, except squamous-like carcinomas, while the single-cancer-type analyses suggest that the omics data have different subtyping abilities for different cancer types. LRAcluster is a very useful method for fast dimension reduction and unsupervised clustering of large-scale multi-omics data. LRAcluster is implemented in R and freely available via http://bioinfo.au.tsinghua.edu.cn/software/lracluster/.
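
    The overall strategy (reduce stacked multi-omics data to a shared low-dimensional subspace, then cluster samples there) can be sketched with a truncated SVD standing in for the paper's probabilistic low-rank model, which is available only as the R package. Block layout and parameters below are illustrative.

      import numpy as np
      from sklearn.cluster import KMeans
      from sklearn.decomposition import TruncatedSVD
      from sklearn.preprocessing import StandardScaler

      def reduce_then_cluster(omics_blocks, rank=10, n_clusters=5, seed=0):
          # omics_blocks: list of (features_i x samples) matrices over the
          # same samples. Standardize each feature, stack all data types,
          # project samples onto a rank-r subspace, and cluster there.
          stacked = np.vstack([StandardScaler().fit_transform(B.T).T
                               for B in omics_blocks])
          coords = TruncatedSVD(n_components=rank,
                                random_state=seed).fit_transform(stacked.T)
          labels = KMeans(n_clusters=n_clusters, n_init=10,
                          random_state=seed).fit_predict(coords)
          return labels, coords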

  3. High Performance Computing Based Parallel Hierarchical Modal Association Clustering (HPAR HMAC)

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Patlolla, Dilip R; Surendran Nair, Sujithkumar; Graves, Daniel A.

    For many applications, clustering is a crucial step in order to gain insight into the makeup of a dataset. The best approach to a given problem often depends on a variety of factors, such as the size of the dataset, time restrictions, and soft clustering requirements. The HMAC algorithm seeks to combine the strengths of two particular clustering approaches: model-based and linkage-based clustering. One particular weakness of HMAC is its computational complexity, which makes it impractical for mega-scale data clustering. For high-definition imagery, a user would have to wait months or years for a result; for a 16-megapixel image, the estimated runtime skyrockets to over a decade. To improve the execution time of HMAC, it is reasonable to consider a multi-core implementation that utilizes available system resources. An existing implementation (Ray and Cheng 2014) divides the dataset into N partitions, one for each thread, prior to executing the HMAC algorithm. This implementation benefits from two types of optimization: parallelization and divide-and-conquer. By running each partition in parallel, the program is able to accelerate computation by utilizing more system resources. Although the parallel implementation provides considerable improvement over serial HMAC, it still suffers from poor computational complexity, O(N^2): once the maximum number of cores on a system is exhausted, the program exhibits slower behavior. We now consider a modification to HMAC that involves a recursive partitioning scheme, aiming to exploit the divide-and-conquer benefits seen in the parallel HMAC implementation. At each level of the recursion tree, partitions are divided into two sub-partitions until a threshold size is reached. When a partition can no longer be divided without falling below the threshold size, the base HMAC algorithm is applied. This results in a significant speedup over the parallel HMAC.
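
    A generic sketch of the recursive partitioning scheme (not the HPAR HMAC source): split until each leaf is below a threshold, then run the base clustering on the leaves in parallel. base_cluster is a hypothetical stand-in for serial HMAC, and the step that merges leaf results into a global clustering is omitted.

      from concurrent.futures import ProcessPoolExecutor

      import numpy as np

      THRESHOLD = 50_000        # largest partition the base algorithm sees

      def split(data):
          # Halve a partition, here by the median of its first coordinate.
          cut = np.median(data[:, 0])
          return data[data[:, 0] <= cut], data[data[:, 0] > cut]

      def leaves(data):
          # Recurse until every leaf falls below the threshold size.
          if len(data) <= THRESHOLD:
              return [data]
          left, right = split(data)
          if len(left) == 0 or len(right) == 0:   # degenerate split guard
              return [data]
          return leaves(left) + leaves(right)

      def parallel_cluster(data, base_cluster):
          # base_cluster stands in for serial HMAC on one leaf; it must be
          # a top-level (picklable) function for ProcessPoolExecutor.
          with ProcessPoolExecutor() as pool:
              return list(pool.map(base_cluster, leaves(data)))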

  4. NLSEmagic: Nonlinear Schrödinger equation multi-dimensional Matlab-based GPU-accelerated integrators using compact high-order schemes

    NASA Astrophysics Data System (ADS)

    Caplan, R. M.

    2013-04-01

    We present a simple to use, yet powerful code package called NLSEmagic to numerically integrate the nonlinear Schrödinger equation in one, two, and three dimensions. NLSEmagic is a high-order finite-difference code package which utilizes graphic processing unit (GPU) parallel architectures. The codes running on the GPU are many times faster than their serial counterparts, and are much cheaper to run than on standard parallel clusters. The codes are developed with usability and portability in mind, and therefore are written to interface with MATLAB utilizing custom GPU-enabled C codes with the MEX-compiler interface. The packages are freely distributed, including user manuals and set-up files. Catalogue identifier: AEOJ_v1_0 Program summary URL: http://cpc.cs.qub.ac.uk/summaries/AEOJ_v1_0.html Program obtainable from: CPC Program Library, Queen’s University, Belfast, N. Ireland Licensing provisions: Standard CPC licence, http://cpc.cs.qub.ac.uk/licence/licence.html No. of lines in distributed program, including test data, etc.: 124453 No. of bytes in distributed program, including test data, etc.: 4728604 Distribution format: tar.gz Programming language: C, CUDA, MATLAB. Computer: PC, MAC. Operating system: Windows, MacOS, Linux. Has the code been vectorized or parallelized?: Yes. Number of processors used: Single CPU, number of GPU processors dependent on chosen GPU card (max is currently 3072 cores on GeForce GTX 690). Supplementary material: Setup guide, Installation guide. RAM: Highly dependent on dimensionality and grid size. For typical medium-large problem size in three dimensions, 4GB is sufficient. Keywords: Nonlinear Schrödinger Equation, GPU, high-order finite difference, Bose-Einstein condensates. Classification: 4.3, 7.7. Nature of problem: Integrate solutions of the time-dependent one-, two-, and three-dimensional cubic nonlinear Schrödinger equation. Solution method: The integrators utilize a fully-explicit fourth-order Runge-Kutta scheme in time

  5. Statistical Machine Learning for Structured and High Dimensional Data

    DTIC Science & Technology

    2014-09-17

    AFRL-OSR-VA-TR-2014-0234. Statistical Machine Learning for Structured and High Dimensional Data. Larry Wasserman, Carnegie Mellon University. Final report, Dec 2009 - Aug 2014. Research under this grant addressed resource-constrained statistical estimation, machine learning, and high-dimensional statistics. Program contact: John Lafferty.

  6. Hyperspherical Sparse Approximation Techniques for High-Dimensional Discontinuity Detection

    DOE PAGES

    Zhang, Guannan; Webster, Clayton G.; Gunzburger, Max; ...

    2016-08-04

    This work proposes a hyperspherical sparse approximation framework for detecting jump discontinuities in functions in high-dimensional spaces. The need for a novel approach results from the theoretical and computational inefficiencies of well-known approaches, such as adaptive sparse grids, for discontinuity detection. Our approach constructs the hyperspherical coordinate representation of the discontinuity surface of a function. Then sparse approximations of the transformed function are built in the hyperspherical coordinate system, with values at each point estimated by solving a one-dimensional discontinuity detection problem. Due to the smoothness of the hypersurface, the new technique can identify jump discontinuities with significantly reduced computational cost, compared to existing methods. Several approaches are used to approximate the transformed discontinuity surface in the hyperspherical system, including adaptive sparse grid and radial basis function interpolation, discrete least squares projection, and compressed sensing approximation. Moreover, hierarchical acceleration techniques are also incorporated to further reduce the overall complexity. In conclusion, rigorous complexity analyses of the new methods are provided, as are several numerical examples that illustrate the effectiveness of our approach.
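
    The coordinate transformation at the heart of the construction is standard: each point x in R^N maps to a radius and N-1 angles, and the sparse approximation then represents the radius of the discontinuity surface as a function of the angles. A minimal numpy version of the forward map:

      import numpy as np

      def to_hyperspherical(x):
          # Map x in R^N to (r, phi_1..phi_{N-1}) with the standard
          # convention: phi_k = atan2(||x_{k+1:}||, x_k) for k < N-1,
          # and phi_{N-1} = atan2(x_N, x_{N-1}).
          x = np.asarray(x, dtype=float)
          r = np.linalg.norm(x)
          tail = np.sqrt(np.cumsum(x[::-1] ** 2)[::-1])   # ||x_{k:}|| for each k
          body = np.arctan2(tail[1:-1], x[:-2]) if x.size > 2 else np.empty(0)
          return r, np.append(body, np.arctan2(x[-1], x[-2]))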

  7. Clustering of High-Redshift Quasars

    NASA Astrophysics Data System (ADS)

    Timlin, John D., III

    In this work, we investigate the clustering of faint quasars in the early Universe and use the clustering strength to gain a better understanding of quasar feedback mechanisms and the growth of central supermassive black holes at early times in the history of the Universe. It has long been understood (e.g., Hopkins et al. 2007a) that the clustering of distant quasars can be used as a probe of different feedback models; however, until now, there was no sample of faint, high-redshift quasars with sufficient density to accurately measure the clustering strength. Therefore we conducted a new survey to increase the number density of these objects. Here, we describe the Spitzer-IRAC Equatorial Survey (SpIES), a moderately deep, large-area Spitzer survey designed to discover faint, high-redshift (2.9 ≤ z ≤ 5.1) quasars. SpIES spans 115 deg² in the equatorial "Stripe 82" region of the Sloan Digital Sky Survey (SDSS) and probes to 5σ depths of 6.13 microJy (21.93 AB magnitude) and 5.75 microJy (22.0 AB magnitude) at 3.6 and 4.5 microns. At these depths, SpIES is able to observe faint quasars, and we show that SpIES recovers 94% of the high-redshift (z ≥ 3.5), spectroscopically-confirmed quasars that lie within its footprint. SpIES is also ideally located on Stripe 82 for two reasons: It surrounds existing infrared data from the Spitzer-HETDEX Exploratory Large-area (SHELA) survey which increases the area of infrared coverage, and there is a wide range of multi-wavelength, multi-epoch ancillary data on Stripe 82 which we can use together to select high-redshift quasar candidates. To photometrically identify quasar candidates, we combined the optical data from the Sloan Digital Sky Survey and the infrared data from SpIES and SHELA and employed three machine learning algorithms. These algorithms were trained on the optical/infrared colors of known, high-redshift quasars. Using this method, we generate a sample of 1378 objects that are both faint

  8. Boundary shape identification problems in two-dimensional domains related to thermal testing of materials

    NASA Technical Reports Server (NTRS)

    Banks, H. T.; Kojima, Fumio

    1988-01-01

    The identification of the geometrical structure of the system boundary for a two-dimensional diffusion system is reported. The domain identification problem treated here is converted into an optimization problem based on a fit-to-data criterion and theoretical convergence results for approximate identification techniques are discussed. Results of numerical experiments to demonstrate the efficacy of the theoretical ideas are reported.

  9. Harnessing high-dimensional hyperentanglement through a biphoton frequency comb

    NASA Astrophysics Data System (ADS)

    Xie, Zhenda; Zhong, Tian; Shrestha, Sajan; Xu, Xinan; Liang, Junlin; Gong, Yan-Xiao; Bienfang, Joshua C.; Restelli, Alessandro; Shapiro, Jeffrey H.; Wong, Franco N. C.; Wei Wong, Chee

    2015-08-01

    Quantum entanglement is a fundamental resource for secure information processing and communications, and hyperentanglement or high-dimensional entanglement has been separately proposed for its high data capacity and error resilience. The continuous-variable nature of the energy-time entanglement makes it an ideal candidate for efficient high-dimensional coding with minimal limitations. Here, we demonstrate the first simultaneous high-dimensional hyperentanglement using a biphoton frequency comb to harness the full potential in both the energy and time domain. Long-postulated Hong-Ou-Mandel quantum revival is exhibited, with up to 19 time-bins and 96.5% visibilities. We further witness the high-dimensional energy-time entanglement through Franson revivals, observed periodically at integer time-bins, with 97.8% visibility. This qudit state is observed to simultaneously violate the generalized Bell inequality by up to 10.95 standard deviations while observing recurrent Clauser-Horne-Shimony-Holt S-parameters up to 2.76. Our biphoton frequency comb provides a platform for photon-efficient quantum communications towards the ultimate channel capacity through energy-time-polarization high-dimensional encoding.

  10. High β effects on cosmic ray streaming in galaxy clusters

    NASA Astrophysics Data System (ADS)

    Wiener, Joshua; Zweibel, Ellen G.; Oh, S. Peng

    2018-01-01

    Diffuse, extended radio emission in galaxy clusters, commonly referred to as radio haloes, indicates the presence of high-energy cosmic ray (CR) electrons and cluster-wide magnetic fields. We can predict from theory the expected surface brightness of a radio halo, given magnetic field and CR density profiles. Previous studies have shown that the nature of CR transport can radically affect the expected radio halo emission from clusters (Wiener, Oh & Guo 2013). Reasonable levels of magnetohydrodynamic (MHD) wave damping can lead to significant CR streaming speeds, but a careful treatment of MHD waves in a high-β plasma, as expected in cluster environments, reveals that damping rates may be enhanced by a factor of β^(1/2). This leads to faster CR streaming and lower surface brightnesses than without this effect. In this work, we re-examine the simplified, 1D Coma cluster simulations (with radial magnetic fields) of Wiener et al. (2013) and discuss observable consequences of this high-β damping. Future work is required to study this effect in more realistic simulations.

  11. Extending Beowulf Clusters

    USGS Publications Warehouse

    Steinwand, Daniel R.; Maddox, Brian; Beckmann, Tim; Hamer, George

    2003-01-01

    Beowulf clusters can provide a cost-effective way to compute numerical models and process large amounts of remote sensing image data. Usually a Beowulf cluster is designed to accomplish a specific set of processing goals, and processing is very efficient when the problem remains inside the constraints of the original design. There are cases, however, when one might wish to compute a problem that is beyond the capacity of the local Beowulf system. In these cases, spreading the problem to multiple clusters or to other machines on the network may provide a cost-effective solution.

  12. Self-organizing neural networks--an alternative way of cluster analysis in clinical chemistry.

    PubMed

    Reibnegger, G; Wachter, H

    1996-04-15

    Supervised learning schemes have been employed by several workers for training neural networks designed to solve clinical problems. We demonstrate that unsupervised techniques can also produce interesting and meaningful results. Using a data set on the chemical composition of milk from 22 different mammals, we demonstrate that self-organizing feature maps (Kohonen networks) as well as a modified version of the error backpropagation technique yield results mimicking conventional cluster analysis. Both techniques are able to project a potentially multi-dimensional input vector onto a two-dimensional space while conserving neighborhood relationships. Thus, these techniques can be used to reduce the dimensionality of complicated data sets and to enhance the comprehensibility of features hidden in the data matrix.
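
    A minimal Kohonen SOM training loop (numpy) illustrates the mechanism: the winning unit and its grid neighbours are pulled toward each input, so proximity on the 2D grid comes to mirror similarity in the input space. Grid size, decay schedules and initialization below are illustrative defaults, not those of the original study.

      import numpy as np

      def train_som(X, grid=(10, 10), epochs=20, lr0=0.5, sigma0=3.0, seed=0):
          # Minimal Kohonen self-organizing map on a rectangular 2D grid.
          rng = np.random.default_rng(seed)
          h, w = grid
          W = rng.normal(size=(h * w, X.shape[1]))           # codebook vectors
          gy, gx = np.divmod(np.arange(h * w), w)
          coords = np.column_stack([gy, gx]).astype(float)   # unit grid positions
          T, t = epochs * len(X), 0
          for _ in range(epochs):
              for x in rng.permutation(X):
                  decay = 1.0 - t / T
                  bmu = np.argmin(((W - x) ** 2).sum(axis=1))      # winner unit
                  d2 = ((coords - coords[bmu]) ** 2).sum(axis=1)   # grid dist^2
                  nbh = np.exp(-d2 / (2 * (sigma0 * decay + 1e-9) ** 2))
                  W += (lr0 * decay) * nbh[:, None] * (x - W)
                  t += 1
          return W.reshape(h, w, -1)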

  13. Three-dimensional printing in pharmaceutics: promises and problems.

    PubMed

    Yu, Deng Guang; Zhu, Li-Min; Branford-White, Christopher J; Yang, Xiang Liang

    2008-09-01

    Three-dimensional printing (3DP) is a rapid prototyping (RP) technology. Prototyping involves constructing specific layers using powder processing and liquid binding materials. Reports in the literature have highlighted the many advantages of the 3DP system over other processes in enhancing pharmaceutical applications; these include new methods for the design, development, manufacture, and commercialization of various types of solid dosage forms. For example, 3DP technology is flexible in that it can be used in applications linked to linear drug delivery systems (DDS), colon-targeted DDS, oral fast-disintegrating DDS, floating DDS, time-controlled and pulse-release DDS, as well as dosage forms with multiphase release properties and implantable DDS. In addition, 3DP can also provide solutions for resolving difficulties relating to the delivery of poorly water-soluble drugs, peptides and proteins, the preparation of DDS for highly toxic and potent drugs, and the controlled release of multiple drugs in a single dosage form. Due to its flexible and highly reproducible manufacturing process, 3DP has some advantages over conventional compression and other RP technologies in fabricating solid DDS. This enables 3DP to be further developed for use in pharmaceutics applications. However, there are some problems that limit the further application of the system, such as the selection of suitable excipients and the pharmacotechnical properties of 3DP products. Further developments are therefore needed to overcome these issues so that 3DP systems can be successfully combined with conventional pharmaceutics. Here we present an overview and the potential of 3DP in the development of new drug delivery systems.

  14. Interfacial energetics of two-dimensional colloidal clusters generated with a tunable anharmonic interaction potential

    NASA Astrophysics Data System (ADS)

    Hilou, Elaa; Du, Di; Kuei, Steve; Biswal, Sibani Lisa

    2018-02-01

    Interfacial characteristics are critical to various properties of two-dimensional (2D) materials such as band alignment at a heterojunction and nucleation kinetics in a 2D crystal. Despite the desire to harness these enhanced interfacial properties for engineering new materials, unexpected phase transitions and defects, unique to the 2D morphology, have left a number of open questions. In particular, the effects of configurational anisotropy, which are difficult to isolate experimentally, and their influence on interfacial properties are not well understood. In this work, we begin to probe this structure-thermodynamic relationship, using a rotating magnetic field to generate an anharmonic interaction potential in a 2D system of paramagnetic particles. At low magnetic field strengths, weakly interacting colloidal particles form non-close-packed, fluidlike droplets, whereas, at higher field strengths, crystallites with hexagonal ordering are observed. We examine spatial and interfacial properties of these 2D colloidal clusters by measuring the local bond orientation order parameter and interfacial stiffness as a function of the interaction strength. To our knowledge, this is the first study to measure the tunable interfacial stiffness of a 2D colloidal cluster by controlling particle interactions using external fields.

  15. A new extrapolation cascadic multigrid method for three dimensional elliptic boundary value problems

    NASA Astrophysics Data System (ADS)

    Pan, Kejia; He, Dongdong; Hu, Hongling; Ren, Zhengyong

    2017-09-01

    In this paper, we develop a new extrapolation cascadic multigrid method, which makes it possible to solve three dimensional elliptic boundary value problems with over 100 million unknowns on a desktop computer in half a minute. First, by combining Richardson extrapolation and quadratic finite element (FE) interpolation of the numerical solutions on two levels of grids (the current and previous grids), we provide a quite good initial guess for the iterative solution on the next finer grid, which is a third-order approximation to the FE solution. The resulting large linear system from the FE discretization is then solved by the Jacobi-preconditioned conjugate gradient (JCG) method with the obtained initial guess. Additionally, instead of performing a fixed number of iterations as in existing cascadic multigrid methods, a relative residual tolerance is introduced in the JCG solver, which enables us to conveniently obtain the numerical solution with the desired accuracy. Moreover, a simple method based on the midpoint extrapolation formula is proposed to achieve higher-order accuracy on the finest grid cheaply and directly. Test results from four examples, including two smooth problems with both constant and variable coefficients, an H^3-regular problem, and an anisotropic problem, are reported to show that the proposed method has much better efficiency than the classical V-cycle and W-cycle multigrid methods. Finally, we present the reason why our method is highly efficient for solving these elliptic problems.
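
    The inner solver can be sketched in a few lines: Jacobi-preconditioned CG with a relative-residual stopping rule, warm-started from the extrapolated coarse-grid guess. A generic numpy sketch (A symmetric positive definite, dense or scipy-sparse; not the authors' implementation):

      import numpy as np

      def jacobi_pcg(A, b, x0, rtol=1e-8, maxiter=500):
          # Jacobi-preconditioned CG, warm-started from x0 (e.g. the
          # extrapolated coarse-grid guess), stopping on the relative
          # residual rather than after a fixed iteration count.
          d = np.asarray(A.diagonal()).ravel()   # Jacobi preconditioner diag
          x = np.array(x0, dtype=float)
          r = b - A @ x
          z = r / d                              # preconditioner solve
          p = z.copy()
          rz = r @ z
          bnorm = np.linalg.norm(b)
          for _ in range(maxiter):
              if np.linalg.norm(r) <= rtol * bnorm:
                  break
              Ap = A @ p
              alpha = rz / (p @ Ap)
              x += alpha * p
              r -= alpha * Ap
              z = r / d
              rz, rz_old = r @ z, rz
              p = z + (rz / rz_old) * p
          return x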

  16. A cluster analysis investigation of workaholism as a syndrome.

    PubMed

    Aziz, Shahnaz; Zickar, Michael J

    2006-01-01

    Workaholism has been conceptualized as a syndrome although there have been few tests that explicitly consider its syndrome status. The authors analyzed a three-dimensional scale of workaholism developed by Spence and Robbins (1992) using cluster analysis. The authors identified three clusters of individuals, one of which corresponded to Spence and Robbins's profile of the workaholic (high work involvement, high drive to work, low work enjoyment). Consistent with previously conjectured relations with workaholism, individuals in the workaholic cluster were more likely to label themselves as workaholics, more likely to have acquaintances label them as workaholics, and more likely to have lower life satisfaction and higher work-life imbalance. The importance of considering workaholism as a syndrome and the implications for effective interventions are discussed. Copyright 2006 APA.

  17. A latent modeling approach to genotype–phenotype relationships: maternal problem behavior clusters, prenatal smoking, and MAOA genotype

    PubMed Central

    Mustanski, B.; Metzger, A.; Pine, D. S.; Kistner-Griffin, E.; Cook, E.; Wakschlag, L. S.

    2013-01-01

    This study illustrates the application of a latent modeling approach to genotype–phenotype relationships and gene × environment interactions, using a novel, multidimensional model of adult female problem behavior, including maternal prenatal smoking. The gene of interest is the monoamine oxidase A (MAOA) gene, which has been well studied in relation to antisocial behavior. Participants were adult women (N=192) sampled from a prospective pregnancy cohort of non-Hispanic, white individuals recruited from a neighborhood health clinic. Structural equation modeling was used to model a female problem behavior phenotype, which included conduct problems, substance use, impulsive sensation seeking, interpersonal aggression, and prenatal smoking. All of the female problem behavior dimensions clustered together strongly, with the exception of prenatal smoking. A main effect of MAOA genotype and a MAOA × physical maltreatment interaction were detected for the Conduct Problems factor. Our phenotypic model showed that prenatal smoking is not simply a marker of other maternal problem behaviors. The risk variant in the MAOA main-effect and interaction analyses was the high-activity MAOA genotype, which is discrepant from consensus findings in male samples. This result contributes to an emerging literature on sex-specific interaction effects for MAOA. PMID:22610759

  18. Cooperative simulation of lithography and topography for three-dimensional high-aspect-ratio etching

    NASA Astrophysics Data System (ADS)

    Ichikawa, Takashi; Yagisawa, Takashi; Furukawa, Shinichi; Taguchi, Takafumi; Nojima, Shigeki; Murakami, Sadatoshi; Tamaoki, Naoki

    2018-06-01

    A topography simulation of high-aspect-ratio etching considering the transport of ions and neutrals is performed, and the mechanism of reactive ion etching (RIE) residues in three-dimensional corner patterns is revealed. Limited ion flux and CF2 diffusion from the wide space of the corner are found to contribute to the RIE residues. Cooperative simulation of lithography and topography is used to solve the RIE residue problem.

  19. TimesVector: a vectorized clustering approach to the analysis of time series transcriptome data from multiple phenotypes.

    PubMed

    Jung, Inuk; Jo, Kyuri; Kang, Hyejin; Ahn, Hongryul; Yu, Youngjae; Kim, Sun

    2017-12-01

    Identifying biologically meaningful gene expression patterns from time series gene expression data is important for understanding the underlying biological mechanisms. To identify significantly perturbed gene sets between different phenotypes, analysis of time series transcriptome data requires consideration of both the time and sample dimensions. Thus, the analysis of such data seeks gene sets that exhibit similar or different expression patterns between two or more sample conditions, constituting three-dimensional data, i.e. gene-time-condition. The computational complexity of analyzing such data is very high, even compared to the already difficult NP-hard two-dimensional biclustering problem. Because of this challenge, traditional time series clustering algorithms are designed to capture co-expressed genes with similar expression patterns in two sample conditions. We present a triclustering algorithm, TimesVector, specifically designed for clustering three-dimensional time series data to capture distinctively similar or different gene expression patterns between two or more sample conditions. TimesVector identifies clusters with distinctive expression patterns in three steps: (i) dimension reduction and clustering of time-condition concatenated vectors, (ii) post-processing clusters to detect similar and distinct expression patterns and (iii) rescuing genes from unclassified clusters. Using four sets of time series gene expression data, generated by both microarray and high-throughput sequencing platforms, we demonstrated that TimesVector successfully detected biologically meaningful clusters of high quality. TimesVector improved the clustering quality compared to existing triclustering tools, and only TimesVector successfully detected clusters with differential expression patterns across conditions. The TimesVector software is available at http://biohealth.snu.ac.kr/software/TimesVector/. sunkim.bioinfo@snu.ac.kr. Supplementary data are available at
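
    Step (i) can be sketched loosely: flatten each gene's condition-by-time matrix into one concatenated vector, normalize, and cluster. The sketch below uses K-means as a stand-in for whatever clustering TimesVector applies at this step; shapes and parameters are illustrative.

      import numpy as np
      from sklearn.cluster import KMeans

      def cluster_time_condition(data, n_clusters=20, seed=0):
          # data: (genes, conditions, timepoints). Concatenate each gene's
          # per-condition time courses into one vector, z-normalize, cluster.
          g = data.shape[0]
          vecs = data.reshape(g, -1)
          vecs = (vecs - vecs.mean(axis=1, keepdims=True)) / \
                 (vecs.std(axis=1, keepdims=True) + 1e-12)
          return KMeans(n_clusters=n_clusters, n_init=10,
                        random_state=seed).fit_predict(vecs)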

  20. Principal Cluster Axes: A Projection Pursuit Index for the Preservation of Cluster Structures in the Presence of Data Reduction

    ERIC Educational Resources Information Center

    Steinley, Douglas; Brusco, Michael J.; Henson, Robert

    2012-01-01

    A measure of "clusterability" serves as the basis of a new methodology designed to preserve cluster structure in a reduced dimensional space. Similar to principal component analysis, which finds the direction of maximal variance in multivariate space, principal cluster axes find the direction of maximum clusterability in multivariate space.…

  1. SU-G-TeP3-14: Three-Dimensional Cluster Model in Inhomogeneous Dose Distribution

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Wei, J; Penagaricano, J; Narayanasamy, G

    2016-06-15

    Purpose: We aim to investigate 3D cluster formation in inhomogeneous dose distributions to search for new models predicting radiation tissue damage, potentially leading to a new optimization paradigm for radiotherapy planning. Methods: The aggregation of dose higher than a preset threshold in the organ at risk (OAR) was chosen as the cluster, whose connectivity dictates the cluster structure. Upon selection of the dose threshold, the fractional density, defined as the fraction of voxels in the organ eligible to be part of the cluster, was determined from the dose volume histogram (DVH). A Monte Carlo method was implemented to establish a case pertinent to the corresponding DVH. Ones and zeros were randomly assigned to each OAR voxel with the sampling probability equal to the fractional density. Ten thousand samples were randomly generated to ensure a sufficient number of cluster sets. A recursive cluster-searching algorithm was developed to analyze the clusters under various connectivity choices, such as 1-, 2-, and 3-connectivity. The mean size of the largest cluster (MSLC) from the Monte Carlo samples was taken to be a function of the fractional density. Various OARs from clinical plans were included in the study. Results: The intensive Monte Carlo study demonstrates the anticipated inverse relationship between the MSLC and the cluster connectivity, and the cluster size does not change linearly with fractional density regardless of the connectivity type. A transition of the MSLC from an initially slow increase to exponential growth was observed from low to high density. The cluster sizes were found to vary within a large range and are relatively independent of the OARs. Conclusion: The Monte Carlo study revealed that the cluster size could serve as a suitable index of tissue damage (percolation cluster) and that the clinical outcomes of plans with the same DVH might potentially differ.
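
    The core Monte Carlo step reads naturally in a few lines: occupy voxels independently with probability equal to the fractional density, label connected components under the chosen connectivity, and average the largest cluster size over samples. A sketch using scipy.ndimage (library labeling replaces the recursive search described in the abstract):

      import numpy as np
      from scipy import ndimage

      def mean_largest_cluster(shape, density, n_samples=1000,
                               connectivity=1, seed=0):
          # Occupy voxels independently with probability `density`, label
          # connected components (connectivity 1/2/3 = face/edge/corner
          # adjacency in 3D), and average the largest component size.
          rng = np.random.default_rng(seed)
          structure = ndimage.generate_binary_structure(3, connectivity)
          sizes = np.empty(n_samples)
          for s in range(n_samples):
              occupied = rng.random(shape) < density
              labels, n = ndimage.label(occupied, structure=structure)
              sizes[s] = np.bincount(labels.ravel())[1:].max() if n else 0.0
          return sizes.mean()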

  2. Cluster management.

    PubMed

    Katz, R

    1992-11-01

    Cluster management is a management model that fosters decentralization of management, develops the leadership potential of staff, and creates ownership of unit-based goals. Unlike shared governance models, there is no formal structure created by committees, and it is less threatening for managers. There are two parts to the cluster management model. One is the formation of cluster groups, consisting of all staff and facilitated by a cluster leader; the cluster groups serve for communication and problem-solving. The second part is the creation of task forces, which are designed to work on short-term goals, usually in response to one of the unit's goals; sometimes the task forces are used for quality improvement or system problems. Clusters are groups of not more than five or six staff members, facilitated by a cluster leader, and made up of individuals who work the same shift. For example, staff of various job titles who work days would form one cluster: registered nurses, licensed practical nurses, nursing assistants, and unit clerks. The cluster leader is chosen by the manager based on certain criteria and is trained for this specialized role. The concept of cluster management, criteria for choosing leaders, training for leaders, the use of cluster groups to solve quality improvement issues, and the learning process necessary for manager support are described.

  3. Moving boundary problems for a rarefied gas: Spatially one-dimensional case

    NASA Astrophysics Data System (ADS)

    Tsuji, Tetsuro; Aoki, Kazuo

    2013-10-01

    Unsteady flows of a rarefied gas in a full space caused by an oscillation of an infinitely wide plate in its normal direction are investigated numerically on the basis of the Bhatnagar-Gross-Krook (BGK) model of the Boltzmann equation. The paper aims to show properties and difficulties inherent in moving boundary problems in the kinetic theory of gases using a simple one-dimensional setting. More specifically, the following two problems are considered: (Problem I) the plate starts a forced harmonic oscillation (forced motion); (Problem II) the plate, which is subject to an external restoring force obeying Hooke's law, is displaced from its equilibrium position and released (free motion). The physical interest in Problem I lies in the propagation of nonlinear acoustic waves in a rarefied gas, whereas that in Problem II lies in the decay rate of the oscillation of the plate. An accurate numerical method, capable of describing the singularities caused by the oscillating plate, is developed on the basis of the method of characteristics and is applied to the two problems mentioned above. As a result, the unsteady behavior of the solution, such as the propagation of discontinuities and some weaker singularities in the molecular velocity distribution function, is clarified. Some results are also compared with those based on the existing method.

  4. Necessary optimality conditions for infinite dimensional state constrained control problems

    NASA Astrophysics Data System (ADS)

    Frankowska, H.; Marchini, E. M.; Mazzola, M.

    2018-06-01

    This paper is concerned with first order necessary optimality conditions for state constrained control problems in separable Banach spaces. Assuming inward pointing conditions on the constraint, we give a simple proof of Pontryagin maximum principle, relying on infinite dimensional neighboring feasible trajectories theorems proved in [20]. Further, we provide sufficient conditions guaranteeing normality of the maximum principle. We work in the abstract semigroup setting, but nevertheless we apply our results to several concrete models involving controlled PDEs. Pointwise state constraints (as positivity of the solutions) are allowed.

  5. Some problems of the calculation of three-dimensional boundary layer flows on general configurations

    NASA Technical Reports Server (NTRS)

    Cebeci, T.; Kaups, K.; Mosinskis, G. J.; Rehn, J. A.

    1973-01-01

    An accurate solution of the three-dimensional boundary layer equations over general configurations, such as those encountered in aircraft and space shuttle design, requires a very efficient, fast, and accurate numerical method with suitable turbulence models for the Reynolds stresses. The efficiency, speed, and accuracy of a three-dimensional numerical method, together with turbulence models for the Reynolds stresses, are examined. The numerical method is the implicit two-point finite difference approach (Box Method) developed by Keller and applied to the boundary layer equations by Keller and Cebeci. In addition, some of the problems that may arise in the solution of these equations for three-dimensional boundary layer flows over general configurations are studied.

  6. Free boundary problems in shock reflection/diffraction and related transonic flow problems

    PubMed Central

    Chen, Gui-Qiang; Feldman, Mikhail

    2015-01-01

    Shock waves are steep wavefronts that are fundamental in nature, especially in high-speed fluid flows. When a shock hits an obstacle, or a flying body meets a shock, shock reflection/diffraction phenomena occur. In this paper, we show how several long-standing shock reflection/diffraction problems can be formulated as free boundary problems, discuss some recent progress in developing mathematical ideas, approaches and techniques for solving these problems, and present some further open problems in this direction. In particular, these shock problems include von Neumann's problem for shock reflection–diffraction by two-dimensional wedges with concave corner, Lighthill's problem for shock diffraction by two-dimensional wedges with convex corner, and Prandtl-Meyer's problem for supersonic flow impinging onto solid wedges, which are also fundamental in the mathematical theory of multidimensional conservation laws. PMID:26261363

  7. Independence screening for high dimensional nonlinear additive ODE models with applications to dynamic gene regulatory networks.

    PubMed

    Xue, Hongqi; Wu, Shuang; Wu, Yichao; Ramirez Idarraga, Juan C; Wu, Hulin

    2018-05-02

    Mechanism-driven low-dimensional ordinary differential equation (ODE) models are often used to model viral dynamics at cellular levels and epidemics of infectious diseases. However, low-dimensional mechanism-based ODE models are limited for modeling infectious diseases at molecular levels such as transcriptomic or proteomic levels, which is critical to understand pathogenesis of diseases. Although linear ODE models have been proposed for gene regulatory networks (GRNs), nonlinear regulations are common in GRNs. The reconstruction of large-scale nonlinear networks from time-course gene expression data remains an unresolved issue. Here, we use high-dimensional nonlinear additive ODEs to model GRNs and propose a 4-step procedure to efficiently perform variable selection for nonlinear ODEs. To tackle the challenge of high dimensionality, we couple the 2-stage smoothing-based estimation method for ODEs and a nonlinear independence screening method to perform variable selection for the nonlinear ODE models. We have shown that our method possesses the sure screening property and it can handle problems with non-polynomial dimensionality. Numerical performance of the proposed method is illustrated with simulated data and a real data example for identifying the dynamic GRN of Saccharomyces cerevisiae. Copyright © 2018 John Wiley & Sons, Ltd.

  8. Robust and sparse correlation matrix estimation for the analysis of high-dimensional genomics data.

    PubMed

    Serra, Angela; Coretto, Pietro; Fratello, Michele; Tagliaferri, Roberto; Stegle, Oliver

    2018-02-15

    Microarray technology can be used to study the expression of thousands of genes across a number of different experimental conditions, usually hundreds. The underlying principle is that genes sharing similar expression patterns across different samples can be part of the same co-expression system, or may share the same biological functions. Groups of genes are usually identified by cluster analysis, and clustering methods rely on the similarity matrix between genes; a common choice is to compute the sample correlation matrix. Dimensionality reduction is another popular data analysis task that is also based on covariance/correlation matrix estimates. Unfortunately, covariance/correlation matrix estimation suffers from the intrinsic noise present in high-dimensional data. Sources of noise include sampling variation, the presence of outlying sample units, and the fact that in most cases the number of sample units is much smaller than the number of genes. In this paper, we propose a robust correlation matrix estimator that is regularized based on adaptive thresholding. The resulting method jointly tames the effects of high dimensionality and data contamination. Computations are easy to implement and do not require hand tuning. Both simulated and real data are analyzed. A Monte Carlo experiment shows that the proposed method is capable of remarkable performance. Our correlation metric is more robust to outliers than the existing alternatives on two gene expression datasets. It is also shown how the regularization automatically detects and filters spurious correlations. The same regularization is also extended to other, less robust correlation measures. Finally, we apply the ARACNE algorithm to the SyNTreN gene expression data. The sensitivity and specificity of the reconstructed network are compared with the gold standard. We show that ARACNE performs better when it takes the proposed correlation matrix estimator as input. The R
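
    The flavor of the estimator can be sketched with a rank-based correlation plus hard thresholding. Note this is illustrative only: the paper couples a specific robust estimator with adaptive, entry-wise thresholds rather than the single global cutoff lam assumed here.

      import numpy as np
      from scipy.stats import spearmanr

      def thresholded_robust_corr(X, lam=0.2):
          # X: samples x genes. Rank-based correlation resists outliers;
          # hard thresholding zeroes weak entries to tame high-dimensional
          # noise while keeping the diagonal intact.
          R = np.atleast_2d(spearmanr(X).correlation)
          R_thr = np.where(np.abs(R) >= lam, R, 0.0)
          np.fill_diagonal(R_thr, 1.0)
          return R_thr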

  9. Additivity Principle in High-Dimensional Deterministic Systems

    NASA Astrophysics Data System (ADS)

    Saito, Keiji; Dhar, Abhishek

    2011-12-01

    The additivity principle (AP), conjectured by Bodineau and Derrida [Phys. Rev. Lett. 92, 180601 (2004)], is discussed for the case of heat conduction in three-dimensional disordered harmonic lattices to consider the effects of deterministic dynamics, higher dimensionality, and different transport regimes, i.e., ballistic, diffusive, and anomalous transport. The cumulant generating function (CGF) for heat transfer is accurately calculated and compared with the one given by the AP. In the diffusive regime, we find a clear agreement with the conjecture even if the system is high dimensional. Surprisingly, even in the anomalous regime the CGF is also well fitted by the AP. Lower-dimensional systems are also studied and the importance of three dimensionality for the validity is stressed.

  10. Uncertainty quantification for complex systems with very high dimensional response using Grassmann manifold variations

    NASA Astrophysics Data System (ADS)

    Giovanis, D. G.; Shields, M. D.

    2018-07-01

    This paper addresses uncertainty quantification (UQ) for problems where scalar (or low-dimensional vector) response quantities are insufficient and, instead, full-field (very high-dimensional) responses are of interest. To do so, an adaptive stochastic simulation-based methodology is introduced that refines the probability space based on Grassmann manifold variations. The proposed method has a multi-element character, discretizing the probability space into simplex elements using a Delaunay triangulation. For every simplex, the high-dimensional solutions corresponding to its vertices (sample points) are projected onto the Grassmann manifold. The pairwise distances between these points are calculated using appropriately defined metrics, and the elements with large total distance are sub-sampled and refined. As a result, regions of the probability space that produce significant changes in the full-field solution are accurately resolved. An added benefit is that an approximation of the solution within each element can be obtained by interpolation on the Grassmann manifold. The method is applied to study the probability of shear band formation in a bulk metallic glass using the shear transformation zone theory.
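
    The distance computation underlying the refinement criterion can be sketched directly: the geodesic distance between two subspaces on the Grassmann manifold is the norm of the vector of principal angles, obtained from the SVD of the product of their orthonormal bases. (The paper considers several such metrics; this is one common choice.)

      import numpy as np

      def grassmann_distance(U, V):
          # Geodesic distance between the subspaces spanned by the columns
          # of U and V (orthonormal, same shape): the 2-norm of the vector
          # of principal angles, read off the SVD of U^T V.
          s = np.linalg.svd(U.T @ V, compute_uv=False)
          return np.linalg.norm(np.arccos(np.clip(s, -1.0, 1.0)))

      # Solution snapshots are orthonormalized first, e.g. U, _ = np.linalg.qr(S)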

  11. Gas expulsion in highly substructured embedded star clusters

    NASA Astrophysics Data System (ADS)

    Farias, J. P.; Fellhauer, M.; Smith, R.; Domínguez, R.; Dabringhausen, J.

    2018-06-01

    We investigate the response of initially substructured, young, embedded star clusters to the instantaneous expulsion of their natal gas. We introduce primordial substructure to the stars and the gas by simplistically modelling the star formation process so as to obtain a variety of substructure distributed within our modelled star-forming regions. We show that, by measuring the virial ratio of the stars alone (disregarding the gas completely), we can estimate how much mass a star cluster will retain after gas expulsion to within 10 per cent accuracy, no matter how complex the background structure of the gas is, and we present a simple analytical recipe describing this behaviour. We show that the evolution of the star cluster while still embedded in the natal gas, and the behaviour of the gas before being expelled, are crucial processes that affect the time-scale on which the cluster can evolve into a virialized spherical system. Embedded star clusters that have high levels of substructure are subvirial for longer times, enabling them to survive gas expulsion better than a virialized and spherical system. By using a more realistic treatment of the background gas than in our previous studies, we find it very difficult to destroy the young clusters with instantaneous gas expulsion. We conclude that gas removal may not be the main culprit in the dissolution of young star clusters.

  12. Spatial model of the gecko foot hair: functional significance of highly specialized non-uniform geometry.

    PubMed

    Filippov, Alexander E; Gorb, Stanislav N

    2015-02-06

    One of the important problems appearing in experimental realizations of artificial adhesives inspired by gecko foot hair is so-called clusterization. If an artificially produced structure is flexible enough to allow efficient contact with natural rough surfaces, after a few attachment-detachment cycles the fibres of the structure tend to adhere to one another and form clusters. Normally, such clusters are much larger than the original fibres and, because they are less flexible, form much worse adhesive contacts, especially with rough surfaces. The main problem here is that the forces responsible for clusterization are the same intermolecular forces that attract the fibres to the fractal surface of the substrate. However, arrays of real gecko setae are much less susceptible to this problem. One possible reason is that the ends of the setae have a more sophisticated, non-uniformly distributed three-dimensional structure than that of existing artificial systems. In this paper, we numerically simulated the three-dimensional spatial geometry of the non-uniformly distributed branches of the nanofibres of the setal tip, studied its attachment-detachment dynamics and discussed its advantages over a uniformly distributed geometry.

  13. Thematic clustering of text documents using an EM-based approach

    PubMed Central

    2012-01-01

    Clustering textual contents is an important step in mining useful information on the web or other text-based resources. The common task in text clustering is to handle text in a multi-dimensional space, and to partition documents into groups, where each group contains documents that are similar to each other. However, this strategy lacks a comprehensive view for humans in general since it cannot explain the main subject of each cluster. Utilizing semantic information can solve this problem, but it needs a well-defined ontology or pre-labeled gold standard set. In this paper, we present a thematic clustering algorithm for text documents. Given text, subject terms are extracted and used for clustering documents in a probabilistic framework. An EM approach is used to ensure documents are assigned to correct subjects, hence it converges to a locally optimal solution. The proposed method is distinctive because its results are sufficiently explanatory for human understanding as well as efficient for clustering performance. The experimental results show that the proposed method provides a competitive performance compared to other state-of-the-art approaches. We also show that the extracted themes from the MEDLINE® dataset represent the subjects of clusters reasonably well. PMID:23046528

  14. Applications of FEM and BEM in two-dimensional fracture mechanics problems

    NASA Technical Reports Server (NTRS)

    Min, J. B.; Steeve, B. E.; Swanson, G. R.

    1992-01-01

    A comparison of the finite element method (FEM) and boundary element method (BEM) for the solution of two-dimensional plane strain problems in fracture mechanics is presented in this paper. Stress intensity factors (SIF's) were calculated using both methods for elastic plates with either a single-edge crack or an inclined-edge crack. In particular, two currently available programs, ANSYS for finite element analysis and BEASY for boundary element analysis, were used.

  15. Analysis of a Two-Dimensional Thermal Cloaking Problem on the Basis of Optimization

    NASA Astrophysics Data System (ADS)

    Alekseev, G. V.

    2018-04-01

    For a two-dimensional model of thermal scattering, inverse problems arising in the development of tools for cloaking material bodies on the basis of a mixed thermal cloaking strategy are considered. By applying the optimization approach, these problems are reduced to optimization ones in which the role of controls is played by variable parameters of the medium occupying the cloaking shell and by the heat flux through a boundary segment of the basic domain. The solvability of the direct and optimization problems is proved, and an optimality system is derived. Based on its analysis, sufficient conditions on the input data are established that ensure the uniqueness and stability of optimal solutions.

  16. A hyper-spherical adaptive sparse-grid method for high-dimensional discontinuity detection

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Zhang, Guannan; Webster, Clayton G.; Gunzburger, Max D.

    This work proposes and analyzes a hyper-spherical adaptive hierarchical sparse-grid method for detecting jump discontinuities of functions in high-dimensional spaces. The method is motivated by the theoretical and computational inefficiencies of well-known adaptive sparse-grid methods for discontinuity detection. Our novel approach constructs a function representation of the discontinuity hyper-surface of an N-dimensional discontinuous quantity of interest, by virtue of a hyper-spherical transformation. Then, a sparse-grid approximation of the transformed function is built in the hyper-spherical coordinate system, whose value at each point is estimated by solving a one-dimensional discontinuity detection problem. Due to the smoothness of the hyper-surface, the new technique can identify jump discontinuities with significantly reduced computational cost, compared to existing methods. Moreover, hierarchical acceleration techniques are also incorporated to further reduce the overall complexity. Rigorous error estimates and complexity analyses of the new method are provided, as are several numerical examples that illustrate the effectiveness of the approach.

  17. Discriminative clustering on manifold for adaptive transductive classification.

    PubMed

    Zhang, Zhao; Jia, Lei; Zhang, Min; Li, Bing; Zhang, Li; Li, Fanzhang

    2017-10-01

    In this paper, we propose a novel adaptive transductive label propagation approach based on joint discriminative clustering on manifolds for representing and classifying high-dimensional data. Our framework seamlessly combines unsupervised manifold learning, discriminative clustering and adaptive classification into a unified model, and incorporates adaptive graph weight construction with label propagation. Specifically, our method is capable of propagating label information using adaptive weights over low-dimensional manifold features, which differs from most existing studies that predict the labels and construct the weights in the original Euclidean space. For transductive classification, we first perform joint discriminative K-means clustering and manifold learning to capture the low-dimensional nonlinear manifolds. Then, we construct adaptive weights over the learnt manifold features, calculated through joint minimization of the reconstruction errors over features and soft labels, so that the graph weights can be jointly optimal for data representation and classification. Using the adaptive weights, we can easily estimate the unknown labels of samples; our method then returns the updated weights for further updating the manifold features. Extensive simulations on image classification and segmentation show that the proposed algorithm delivers state-of-the-art performance on several public datasets. Copyright © 2017 Elsevier Ltd. All rights reserved.

  18. The two-dimensional Stefan problem with slightly varying heat flux

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Gammon, J.; Howarth, J.A.

    1995-09-01

    The authors solve the two-dimensional Stefan problem of solidification in a half-space, where the heat flux at the wall is a slightly varying function of position along the wall, by means of a large Stefan number approximation (which turns out to be equivalent to a small time solution), and then by means of the Heat Balance Integral Method, which is valid for all time, and which agrees with the large Stefan number solution for small times. A representative solution is given for a particular form of the heat flux perturbation.

  19. Self-assembled three-dimensional chiral colloidal architecture

    NASA Astrophysics Data System (ADS)

    Ben Zion, Matan Yah; He, Xiaojin; Maass, Corinna C.; Sha, Ruojie; Seeman, Nadrian C.; Chaikin, Paul M.

    2017-11-01

    Although stereochemistry has been a central focus of the molecular sciences since Pasteur, its province has previously been restricted to the nanometric scale. We have programmed the self-assembly of micron-sized colloidal clusters with structural information stemming from a nanometric arrangement. This was done by combining DNA nanotechnology with colloidal science. Using the functional flexibility of DNA origami in conjunction with the structural rigidity of colloidal particles, we demonstrate the parallel self-assembly of three-dimensional microconstructs, evincing highly specific geometry that includes control over position, dihedral angles, and cluster chirality.

  20. Hierarchical trie packet classification algorithm based on expectation-maximization clustering

    PubMed Central

    Bi, Xia-an; Zhao, Junxia

    2017-01-01

    With the development of computer network bandwidth, packet classification algorithms which are able to deal with large-scale rule sets are in urgent need. Among the existing algorithms, researches on packet classification algorithms based on hierarchical trie have become an important packet classification research branch because of their widely practical use. Although hierarchical trie is beneficial to save large storage space, it has several shortcomings such as the existence of backtracking and empty nodes. This paper proposes a new packet classification algorithm, Hierarchical Trie Algorithm Based on Expectation-Maximization Clustering (HTEMC). Firstly, this paper uses the formalization method to deal with the packet classification problem by means of mapping the rules and data packets into a two-dimensional space. Secondly, this paper uses expectation-maximization algorithm to cluster the rules based on their aggregate characteristics, and thereby diversified clusters are formed. Thirdly, this paper proposes a hierarchical trie based on the results of expectation-maximization clustering. Finally, this paper respectively conducts simulation experiments and real-environment experiments to compare the performances of our algorithm with other typical algorithms, and analyzes the results of the experiments. The hierarchical trie structure in our algorithm not only adopts trie path compression to eliminate backtracking, but also solves the problem of low efficiency of trie updates, which greatly improves the performance of the algorithm. PMID:28704476
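
    A minimal, hypothetical sketch of the clustering stage described above (not the published HTEMC implementation): toy "rules", assumed to be already mapped into a two-dimensional space, are grouped with an EM-fitted Gaussian mixture, after which each cluster would receive its own compressed hierarchical trie. The data and parameters below are invented.

```python
# Toy stand-in for the EM clustering stage: "rules" already mapped to 2D
# points are grouped by a Gaussian mixture fitted with EM; each cluster
# would then get its own compressed hierarchical trie (not shown).
import numpy as np
from sklearn.mixture import GaussianMixture

rng = np.random.default_rng(0)
rules_2d = np.vstack([rng.normal((0.2, 0.3), 0.05, (100, 2)),   # fake rule group A
                      rng.normal((0.7, 0.8), 0.05, (100, 2))])  # fake rule group B

gmm = GaussianMixture(n_components=2, covariance_type="full", random_state=0)
labels = gmm.fit_predict(rules_2d)

for k in range(gmm.n_components):
    print(f"cluster {k}: {np.sum(labels == k)} rules")
```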

  2. Re-estimating sample size in cluster randomised trials with active recruitment within clusters.

    PubMed

    van Schie, S; Moerbeek, M

    2014-08-30

    Often only a limited number of clusters can be obtained in cluster randomised trials, although many potential participants can be recruited within each cluster. Thus, active recruitment is feasible within the clusters. To obtain an efficient sample size in a cluster randomised trial, the cluster level and individual level variance should be known before the study starts, but this is often not the case. We suggest using an internal pilot study design to address this problem of unknown variances. A pilot can be useful to re-estimate the variances and re-calculate the sample size during the trial. Using simulated data, it is shown that an initially low or high power can be adjusted using an internal pilot with the type I error rate remaining within an acceptable range. The intracluster correlation coefficient can be re-estimated with more precision, which has a positive effect on the sample size. We conclude that an internal pilot study design may be used if active recruitment is feasible within a limited number of clusters. Copyright © 2014 John Wiley & Sons, Ltd.
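
    The re-estimation step rests on the usual design-effect inflation: a cluster trial needs roughly 1 + (m − 1)ρ times as many individuals as an individually randomized trial, where m is the cluster size and ρ the intracluster correlation coefficient. The sketch below is a plain illustration, not the authors' procedure; the function names and the normal-approximation formula are our own choices.

```python
# Plain illustration (not the authors' procedure): recompute the required
# cluster size m from a pilot-based ICC estimate via the design effect
# 1 + (m - 1) * icc. Function names and the normal approximation are ours.
from math import ceil
from scipy.stats import norm

def n_per_arm_individual(delta, sigma, alpha=0.05, power=0.80):
    """Two-sample normal-approximation sample size per arm."""
    z_a, z_b = norm.ppf(1 - alpha / 2), norm.ppf(power)
    return 2 * (z_a + z_b) ** 2 * sigma ** 2 / delta ** 2

def cluster_size_after_pilot(delta, sigma_hat, icc_hat, n_clusters_per_arm):
    """Smallest m with c*m >= n_ind * (1 + (m - 1) * icc)."""
    n_ind = n_per_arm_individual(delta, sigma_hat)
    denom = n_clusters_per_arm - n_ind * icc_hat
    if denom <= 0:
        raise ValueError("too few clusters for this ICC; power unattainable")
    return ceil(n_ind * (1 - icc_hat) / denom)

# Pilot yields sigma_hat = 1.1 and icc_hat = 0.04 with 15 clusters per arm:
print(cluster_size_after_pilot(0.4, 1.1, 0.04, 15))
```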

  3. Effect of Dimensional Salience and Salience of Variability on Problem Solving: A Developmental Study

    ERIC Educational Resources Information Center

    Zelniker, Tamar; And Others

    1975-01-01

    A matching task was presented to 120 subjects from 6 to 20 years of age to investigate the relative influence of dimensional salience and salience of variability on problem solving. The task included four dimensions: form, color, number, and position. (LLK)

  4. Bayesian propensity scores for high-dimensional causal inference: A comparison of drug-eluting to bare-metal coronary stents.

    PubMed

    Spertus, Jacob V; Normand, Sharon-Lise T

    2018-04-23

    High-dimensional data provide many potential confounders that may bolster the plausibility of the ignorability assumption in causal inference problems. Propensity score methods are powerful causal inference tools, which are popular in health care research and are particularly useful for high-dimensional data. Recent interest has surrounded a Bayesian treatment of propensity scores in order to flexibly model the treatment assignment mechanism and summarize posterior quantities while incorporating variance from the treatment model. We discuss methods for Bayesian propensity score analysis of binary treatments, focusing on modern methods for high-dimensional Bayesian regression and the propagation of uncertainty. We introduce a novel and simple estimator for the average treatment effect that capitalizes on conjugacy of the beta and binomial distributions. Through simulations, we show the utility of horseshoe priors and Bayesian additive regression trees paired with our new estimator, while demonstrating the importance of including variance from the treatment regression model. An application to cardiac stent data with almost 500 confounders and 9000 patients illustrates approaches and facilitates comparison with existing alternatives. As measured by a falsifiability endpoint, we improved confounder adjustment compared with past observational research of the same problem. © 2018 WILEY-VCH Verlag GmbH & Co. KGaA, Weinheim.

  6. Cluster-level statistical inference in fMRI datasets: The unexpected behavior of random fields in high dimensions.

    PubMed

    Bansal, Ravi; Peterson, Bradley S

    2018-06-01

    Identifying regional effects of interest in MRI datasets usually entails testing a priori hypotheses across many thousands of brain voxels, requiring control for false positive findings in these multiple hypothesis tests. Recent studies have suggested that parametric statistical methods may have incorrectly modeled functional MRI data, thereby leading to higher false positive rates than their nominal rates. Nonparametric methods for statistical inference when conducting multiple statistical tests, in contrast, are thought to produce false positives at the nominal rate, which has thus led to the suggestion that previously reported studies should reanalyze their fMRI data using nonparametric tools. To understand better why parametric methods may yield excessive false positives, we assessed their performance when applied both to simulated datasets of 1D, 2D, and 3D Gaussian Random Fields (GRFs) and to 710 real-world, resting-state fMRI datasets. We showed that both the simulated 2D and 3D GRFs and the real-world data contain a small percentage (<6%) of very large clusters (on average 60 times larger than the average cluster size), which were not present in 1D GRFs. These unexpectedly large clusters were deemed statistically significant using parametric methods, leading to empirical familywise error rates (FWERs) as high as 65%: the high empirical FWERs were not a consequence of parametric methods failing to model spatial smoothness accurately, but rather of these very large clusters that are inherently present in smooth, high-dimensional random fields. In fact, when discounting these very large clusters, the empirical FWER for parametric methods was 3.24%. Furthermore, even an empirical FWER of 65% would yield on average less than one of those very large clusters in each brain-wide analysis. Nonparametric methods, in contrast, estimated distributions from those large clusters and therefore, by construction, rejected the large clusters as false positives at the nominal rate.
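
    The key observation can be reproduced in miniature: simulate smooth 2D Gaussian random fields, threshold them, and inspect the tail of the suprathreshold cluster-size distribution. The sketch below is an illustrative toy with arbitrary field size, smoothing kernel and threshold, not the paper's simulation code.

```python
# Toy reproduction of the key observation: smooth 2D Gaussian random fields
# occasionally contain very large suprathreshold clusters. Field size,
# smoothing and threshold are arbitrary choices.
import numpy as np
from scipy.ndimage import gaussian_filter, label

rng = np.random.default_rng(1)
max_sizes = []
for _ in range(200):
    field = gaussian_filter(rng.standard_normal((256, 256)), sigma=4.0)
    field /= field.std()                 # unit-variance smooth field
    clusters, n = label(field > 2.3)     # suprathreshold clusters
    if n:
        max_sizes.append(np.bincount(clusters.ravel())[1:].max())

# The long upper tail of the max-cluster-size distribution drives the FWER.
print(np.percentile(max_sizes, [50, 95, 99]))
```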

  7. High-dimensional Controlled-phase Gate Between a 2 N -dimensional Photon and N Three-level Artificial Atoms

    NASA Astrophysics Data System (ADS)

    Ma, Yun-Ming; Wang, Tie-Jun

    2017-10-01

    Higher-dimensional quantum systems are of great interest owing to the outstanding features exhibited in the implementation of novel fundamental tests of nature and application in various quantum information tasks. High-dimensional quantum logic gates are a key element in scalable quantum computation and quantum communication. In this paper, we propose a scheme to implement a controlled-phase gate between a 2 N -dimensional photon and N three-level artificial atoms. This high-dimensional controlled-phase gate can serve as a crucial component of high-capacity, long-distance quantum communication. We use high-dimensional Bell state analysis as an example to show the application of this device. Estimates of the system requirements indicate that our protocol is realizable with existing or near-future technologies. This scheme is ideally suited to solid-state integrated optical approaches to quantum information processing, and it can be applied to various systems, such as superconducting qubits coupled to a resonator or nitrogen-vacancy centers coupled to photonic-band-gap structures.

  8. Detection of one-dimensional migration of single self-interstitial atoms in tungsten using high-voltage electron microscopy

    PubMed Central

    Amino, T.; Arakawa, K.; Mori, H.

    2016-01-01

    The dynamic behaviour of atomic-size disarrangements of atoms—point defects (self-interstitial atoms (SIAs) and vacancies)—often governs the macroscopic properties of crystalline materials. However, the dynamics of SIAs have not been fully uncovered because of their rapid migration. Using a combination of high-voltage transmission electron microscopy and exhaustive kinetic Monte Carlo simulations, we determine the dynamics of the rapidly migrating SIAs from the formation process of the nanoscale SIA clusters in tungsten as a typical body-centred cubic (BCC) structure metal under the constant-rate production of both types of point defects with high-energy electron irradiation, which must reflect the dynamics of individual SIAs. We reveal that the migration dimension of SIAs is not three-dimensional (3D) but one-dimensional (1D). This result overturns the long-standing and well-accepted view of SIAs in BCC metals and supports recent results obtained by ab-initio simulations. The SIA dynamics clarified here will be one of the key factors to accurately predict the lifetimes of nuclear fission and fusion materials. PMID:27185352

  9. An Improved Ensemble Learning Method for Classifying High-Dimensional and Imbalanced Biomedicine Data.

    PubMed

    Yu, Hualong; Ni, Jun

    2014-01-01

    Training classifiers on skewed data is a technically challenging task, and it becomes more difficult when the data are also high-dimensional. Skewed data of this kind often appear in the biomedical field. In this study, we address this problem by combining the asymmetric bagging ensemble classifier (asBagging) presented in previous work with an improved random subspace (RS) generation strategy called feature subspace (FSS). Specifically, FSS is a novel method to promote the balance between accuracy and diversity of base classifiers in asBagging. In view of the strong generalization capability of support vector machines (SVM), we adopt them as base classifiers. Extensive experiments on four benchmark biomedicine data sets indicate that the proposed ensemble learning method outperforms many baseline approaches in terms of Accuracy, F-measure, G-mean and AUC evaluation criteria; thus it can be regarded as an effective and efficient tool for dealing with high-dimensional and imbalanced biomedical data.
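
    A rough sketch of the asBagging-with-feature-subspace idea under simplifying assumptions: each base SVM sees all minority samples, an equal-sized bootstrap of the majority class, and a random subset of features, and prediction is a majority vote. Function names and parameters are hypothetical, and the published FSS balancing strategy is reduced here to plain uniform feature sampling.

```python
# Rough sketch of asBagging with random feature subspaces: each SVM sees all
# minority samples (label 1), an equal-size bootstrap of the majority class,
# and a random feature subset; prediction is a majority vote.
import numpy as np
from sklearn.svm import SVC

def asbagging_fss(X, y, n_estimators=15, subspace_frac=0.5, seed=0):
    rng = np.random.default_rng(seed)
    minority, majority = np.where(y == 1)[0], np.where(y == 0)[0]
    models = []
    for _ in range(n_estimators):
        maj = rng.choice(majority, size=minority.size, replace=True)
        feats = rng.choice(X.shape[1], size=max(1, int(subspace_frac * X.shape[1])),
                           replace=False)
        idx = np.concatenate([minority, maj])
        models.append((SVC(kernel="rbf", gamma="scale")
                       .fit(X[np.ix_(idx, feats)], y[idx]), feats))
    return models

def vote(models, X):
    return (np.mean([m.predict(X[:, f]) for m, f in models], axis=0) >= 0.5).astype(int)
```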

  10. Properties of highly clustered networks

    NASA Astrophysics Data System (ADS)

    Newman, M. E.

    2003-08-01

    We propose and solve exactly a model of a network that has both a tunable degree distribution and a tunable clustering coefficient. Among other things, our results indicate that increased clustering leads to a decrease in the size of the giant component of the network. We also study susceptible/infective/recovered type epidemic processes within the model and find that clustering decreases the size of epidemics, but also decreases the epidemic threshold, making it easier for diseases to spread. In addition, clustering causes epidemics to saturate sooner, meaning that they infect a near-maximal fraction of the network for quite low transmission rates.

  11. Electrodeposited three-dimensional Ni-Si nanocable arrays as high performance anodes for lithium ion batteries.

    PubMed

    Liu, Hao; Hu, Liangbin; Meng, Ying Shirley; Li, Quan

    2013-11-07

    A configuration of three-dimensional Ni-Si nanocable array anodes is proposed to overcome the severe volume change problem of Si during the charging-discharging process. In the fabrication process, a simple and low cost electrodeposition is employed to deposit Si instead of the commonly used, expensive vapor-phase deposition methods. The optimum composite nanocable array electrode achieves a high specific capacity ~1900 mA h g(-1) at 0.05 C. After 100 cycles at 0.5 C, 88% of the initial capacity (~1300 mA h g(-1)) remains, suggesting its good capacity retention ability. The high performance of the composite nanocable electrode is attributed to its excellent adhesion of the active material on the three-dimensional current collector and short ionic/electronic transport pathways during cycling.

  12. Effect of palladium doping on the stability and fragmentation patterns of cationic gold clusters

    NASA Astrophysics Data System (ADS)

    Ferrari, P.; Hussein, H. A.; Heard, C. J.; Vanbuel, J.; Johnston, R. L.; Lievens, P.; Janssens, E.

    2018-05-01

    We analyze in detail how the interplay between electronic structure and cluster geometry determines the stability and the fragmentation channels of single Pd-doped cationic Au clusters, PdAu(N-1)+ (N = 2-20). For this purpose, a combination of photofragmentation experiments and density functional theory calculations was employed. A remarkable agreement between the experiment and the calculations is obtained. Pd doping is found to modify the structure of the Au clusters, in particular altering the two-dimensional to three-dimensional transition size, with direct consequences on the stability of the clusters. Analysis of the electronic density of states of the clusters shows that, depending on cluster size, Pd delocalizes one 4d electron, giving an enhanced stability to PdAu6+, or remains with all 4d10 electrons localized, closing an electronic shell in PdAu9+. Furthermore, it is observed that for most clusters, Au evaporation is the lowest-energy decay channel, although for some sizes Pd evaporation competes. In particular, PdAu7+ and PdAu9+ decay by Pd evaporation due to the high stability of the Au7+ and Au9+ fragmentation products.

  13. The formation of magnetic silicide Fe3Si clusters during ion implantation

    NASA Astrophysics Data System (ADS)

    Balakirev, N.; Zhikharev, V.; Gumarov, G.

    2014-05-01

    A simple two-dimensional model of the formation of magnetic silicide Fe3Si clusters during high-dose Fe ion implantation into silicon has been proposed and the cluster growth process has been computer simulated. The model takes into account the interaction between the cluster magnetization and magnetic moments of Fe atoms random walking in the implanted layer. If the clusters are formed in the presence of the external magnetic field parallel to the implanted layer, the model predicts the elongation of the growing cluster in the field direction. It has been proposed that the cluster elongation results in the uniaxial magnetic anisotropy in the plane of the implanted layer, which is observed in iron silicide films ion-beam synthesized in the external magnetic field.

  14. A hybrid intelligent method for three-dimensional short-term prediction of dissolved oxygen content in aquaculture.

    PubMed

    Chen, Yingyi; Yu, Huihui; Cheng, Yanjun; Cheng, Qianqian; Li, Daoliang

    2018-01-01

    A precise predictive model is important for obtaining a clear understanding of the changes in dissolved oxygen content in crab ponds. Highly accurate interval forecasting of dissolved oxygen content is fundamental to reduce risk, and three-dimensional prediction can provide more accurate results and overall guidance. In this study, a hybrid three-dimensional (3D) dissolved oxygen content prediction model based on a radial basis function (RBF) neural network, K-means and subtractive clustering was developed and named the subtractive clustering (SC)-K-means-RBF model. In this modeling process, K-means and subtractive clustering methods were employed to enhance the hyperparameters required in the RBF neural network model. The comparison of the predicted results of different traditional models validated the effectiveness and accuracy of the proposed hybrid SC-K-means-RBF model for three-dimensional prediction of dissolved oxygen content. Consequently, the proposed model can effectively display the three-dimensional distribution of dissolved oxygen content and serve as a guide for feeding and future studies.
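
    A minimal sketch of the K-means-to-RBF part of such a pipeline: cluster centres become RBF centres, a shared width is set by a common spacing heuristic, and output weights are solved by least squares. The subtractive-clustering refinement and the authors' exact hyperparameter scheme are not reproduced; everything here is an assumption-laden stand-in.

```python
# Assumption-laden stand-in for the K-means -> RBF part: cluster centres
# become RBF centres, a shared width comes from a spacing heuristic, and
# output weights are solved by least squares.
import numpy as np
from sklearn.cluster import KMeans

def fit_rbf(X, y, n_centers=10, seed=0):
    C = KMeans(n_clusters=n_centers, n_init=10, random_state=seed).fit(X).cluster_centers_
    d = np.linalg.norm(C[:, None, :] - C[None, :, :], axis=-1)
    sigma = d[d > 0].mean() / np.sqrt(2 * n_centers)        # common width heuristic
    Phi = np.exp(-np.linalg.norm(X[:, None, :] - C[None, :, :], axis=-1) ** 2
                 / (2 * sigma ** 2))
    w, *_ = np.linalg.lstsq(Phi, y, rcond=None)             # linear output layer
    return C, sigma, w

def predict_rbf(X, C, sigma, w):
    Phi = np.exp(-np.linalg.norm(X[:, None, :] - C[None, :, :], axis=-1) ** 2
                 / (2 * sigma ** 2))
    return Phi @ w
```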

  15. Big Data Analytics for Demand Response: Clustering Over Space and Time

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Chelmis, Charalampos; Kolte, Jahanvi; Prasanna, Viktor K.

    The pervasive deployment of advanced sensing infrastructure in Cyber-Physical systems, such as the Smart Grid, has resulted in an unprecedented data explosion. Such data exhibit both large volumes and high velocity characteristics, two of the three pillars of Big Data, and have a time-series notion as datasets in this context typically consist of successive measurements made over a time interval. Time-series data can be valuable for data mining and analytics tasks such as identifying the "right" customers among a diverse population, to target for Demand Response programs. However, time series are challenging to mine due to their high dimensionality. In this paper, we motivate this problem using a real application from the smart grid domain. We explore novel representations of time-series data for Big Data analytics, and propose a clustering technique for determining natural segmentation of customers and identification of temporal consumption patterns. Our method is generalizable to large-scale, real-world scenarios, without making any assumptions about the data. We evaluate our technique using real datasets from smart meters, totaling ~18,200,000 data points, and show the efficacy of our technique in efficiently detecting the optimal number of clusters.
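
    As a toy stand-in for this kind of consumption-pattern segmentation (not the paper's representation or algorithm), the sketch below clusters normalized daily load profiles with k-means and picks the number of clusters by a simple silhouette sweep; the synthetic profiles and all parameters are placeholders.

```python
# Placeholder demonstration: cluster normalized daily load profiles with
# k-means, choosing k by a silhouette sweep. The synthetic profiles and all
# parameters stand in for real smart-meter data and the paper's method.
import numpy as np
from sklearn.cluster import KMeans
from sklearn.metrics import silhouette_score

rng = np.random.default_rng(2)
profiles = rng.random((500, 24))                     # 500 customers x 24 hours
profiles /= profiles.sum(axis=1, keepdims=True)      # compare shape, not magnitude

best_k = max(range(2, 9),
             key=lambda k: silhouette_score(
                 profiles,
                 KMeans(n_clusters=k, n_init=10, random_state=0).fit_predict(profiles)))
labels = KMeans(n_clusters=best_k, n_init=10, random_state=0).fit_predict(profiles)
print("chosen k:", best_k, "cluster sizes:", np.bincount(labels))
```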

  16. Sequential updating of multimodal hydrogeologic parameter fields using localization and clustering techniques

    NASA Astrophysics Data System (ADS)

    Sun, Alexander Y.; Morris, Alan P.; Mohanty, Sitakanta

    2009-07-01

    Estimated parameter distributions in groundwater models may contain significant uncertainties because of data insufficiency. Therefore, adaptive uncertainty reduction strategies are needed to continuously improve model accuracy by fusing new observations. In recent years, various ensemble Kalman filters have been introduced as viable tools for updating high-dimensional model parameters. However, their usefulness is largely limited by the inherent assumption of Gaussian error statistics. Hydraulic conductivity distributions in alluvial aquifers, for example, are usually non-Gaussian as a result of complex depositional and diagenetic processes. In this study, we combine an ensemble Kalman filter with grid-based localization and Gaussian mixture model (GMM) clustering techniques for updating high-dimensional, multimodal parameter distributions via dynamic data assimilation. We introduce innovative strategies (e.g., block updating and dimension reduction) to effectively reduce the computational costs associated with these modified ensemble Kalman filter schemes. The developed data assimilation schemes are demonstrated numerically for identifying the multimodal heterogeneous hydraulic conductivity distributions in a binary facies alluvial aquifer. Our results show that localization and GMM clustering are very promising techniques for assimilating high-dimensional, multimodal parameter distributions, and they outperform the corresponding global ensemble Kalman filter analysis scheme in all scenarios considered.
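
    A schematic stochastic EnKF analysis step with a simple distance-based localization taper, sketched under strong simplifications (1D coordinates, point observations, a Gaussian taper rather than the Gaspari-Cohn function, and no GMM clustering); it is meant only to show where localization enters the update, not to reproduce the authors' schemes.

```python
# Schematic stochastic EnKF analysis step with a Gaussian distance taper as
# a simple localization stand-in (1D coordinates, point observations, no
# GMM clustering); it only shows where localization enters the update.
import numpy as np

def enkf_update(ens, obs, obs_idx, obs_err, coords, loc_radius, seed=3):
    """ens: (n_state, n_ens); obs: values observed at state indices obs_idx."""
    n, m = ens.shape
    p = len(obs_idx)
    Xp = ens - ens.mean(axis=1, keepdims=True)
    Pf = Xp @ Xp.T / (m - 1)                              # sample forecast covariance
    H = np.zeros((p, n)); H[np.arange(p), obs_idx] = 1.0  # point observation operator
    taper = np.exp(-((coords[:, None] - coords[None, obs_idx]) / loc_radius) ** 2)
    PHt = (Pf @ H.T) * taper                              # localized cross-covariance
    K = PHt @ np.linalg.inv(H @ PHt + obs_err ** 2 * np.eye(p))
    pert_obs = obs[:, None] + obs_err * np.random.default_rng(seed).standard_normal((p, m))
    return ens + K @ (pert_obs - H @ ens)
```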

  17. Application of 2D and 3D image technologies to characterise morphological attributes of grapevine clusters.

    PubMed

    Tello, Javier; Cubero, Sergio; Blasco, José; Tardaguila, Javier; Aleixos, Nuria; Ibáñez, Javier

    2016-10-01

    Grapevine cluster morphology influences the quality and commercial value of wine and table grapes. It is routinely evaluated by subjective and inaccurate methods that do not meet the requirements set by the food industry. Novel two-dimensional (2D) and three-dimensional (3D) machine vision technologies emerge as promising tools for its automatic and fast evaluation. The automatic evaluation of cluster length, width and elongation was successfully achieved by the analysis of 2D images, with significant and strong correlations with the manual methods being found (r = 0.959, 0.861 and 0.852, respectively). The classification of clusters according to their shape can be achieved by evaluating their conicity in different sections of the cluster. The geometric reconstruction of the morphological volume of the cluster from 2D features worked better than the direct 3D laser scanning system, showing a high correlation (r = 0.956) with the manual approach (water displacement method). In addition, we constructed and validated a simple linear regression model for cluster compactness estimation. It showed a high predictive capacity for both the training and validation subsets of clusters (R² = 84.5% and 71.1%, respectively). The methodologies proposed in this work provide continuous and accurate data for the fast and objective characterisation of cluster morphology. © 2016 Society of Chemical Industry.

  18. Symptom clusters in patients with high-grade glioma.

    PubMed

    Fox, Sherry W; Lyon, Debra; Farace, Elana

    2007-01-01

    To describe the co-occurring symptoms (depression, fatigue, pain, sleep disturbance, and cognitive impairment), quality of life (QoL), and functional status in patients with high-grade glioma. Correlational, descriptive study of 73 participants with high-grade glioma in the U.S. Nine brief measures were obtained with a mailed survey. Participants were recruited from the online message board of The Healing Exchange BRAIN TRUST, a nonprofit organization dedicated to improving quality of life for people with brain tumors. Two symptom cluster models were examined. Four co-occurring symptoms were significantly correlated with each other and explained 29% of the variance in QoL: depression, fatigue, sleep disturbance, and cognitive impairment. Depression, fatigue, sleep disturbance, cognitive impairment, and pain were significantly correlated with each other and explained 62% of the variance in functional status. The interrelationships of the symptoms examined in this study and their relationships with QoL and functional status meet the criteria for defining a symptom cluster. The differences in the models of QoL and functional status indicate that symptom clusters may have unique characteristics in patients with gliomas.

  19. Energy Aware Cluster-Based Routing in Flying Ad-Hoc Networks.

    PubMed

    Aadil, Farhan; Raza, Ali; Khan, Muhammad Fahad; Maqsood, Muazzam; Mehmood, Irfan; Rho, Seungmin

    2018-05-03

    Flying ad-hoc networks (FANETs) are a very vibrant research area nowadays. They have many military and civil applications. Limited battery energy and the high mobility of micro unmanned aerial vehicles (UAVs) represent their two main problems, i.e., short flight time and inefficient routing. In this paper, we try to address both of these problems by means of efficient clustering. First, we adjust the transmission power of the UAVs by anticipating their operational requirements. An optimal transmission range will have a minimum packet loss ratio (PLR) and better link quality, which ultimately saves the energy consumed during communication. Second, we use a variant of the K-Means Density clustering algorithm for selection of cluster heads. Optimal cluster heads enhance the cluster lifetime and reduce the routing overhead. The proposed model outperforms state-of-the-art artificial intelligence techniques such as the Ant Colony Optimization-based clustering algorithm and the Grey Wolf Optimization-based clustering algorithm. The performance of the proposed algorithm is evaluated in terms of the number of clusters, cluster building time, cluster lifetime and energy consumption.

  20. Two-dimensional problem of two Coulomb centers at small intercenter distances

    NASA Astrophysics Data System (ADS)

    Bondar, D. I.; Hnatich, M.; Lazur, V. Yu.

    2006-08-01

    We use analytic methods to analyze the discrete spectrum for the problem (Z1 e Z2)2 in the united-atom limit (R ≪ 1) and obtain asymptotic expansions for the quantum defect and energy terms of the system (Z1 e Z2)2 at small intercenter distances R, up to terms of order O(R^6). We investigate the effect of the dimensionality factor on the energy spectrum of the hydrogen molecular ion H2+.

  1. Spectral-clustering approach to Lagrangian vortex detection.

    PubMed

    Hadjighasem, Alireza; Karrasch, Daniel; Teramoto, Hiroshi; Haller, George

    2016-06-01

    One of the ubiquitous features of real-life turbulent flows is the existence and persistence of coherent vortices. Here we show that such coherent vortices can be extracted as clusters of Lagrangian trajectories. We carry out the clustering on a weighted graph, with the weights measuring pairwise distances of fluid trajectories in the extended phase space of positions and time. We then extract coherent vortices from the graph using tools from spectral graph theory. Our method locates all coherent vortices in the flow simultaneously, thereby showing high potential for automated vortex tracking. We illustrate the performance of this technique by identifying coherent Lagrangian vortices in several two- and three-dimensional flows.
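
    The construction can be condensed to a few lines: trajectories become graph nodes, edge weights come from time-averaged pairwise distances, and spectral clustering extracts the coherent sets. The sketch below uses a made-up flow of two rotating trajectory bundles and arbitrary parameters, not the authors' weighting or real flow data.

```python
# Condensed toy version: trajectories are graph nodes, weights come from
# time-averaged pairwise distances, and spectral clustering extracts the
# coherent sets. The two rotating trajectory bundles are made-up data.
import numpy as np
from sklearn.cluster import SpectralClustering

rng = np.random.default_rng(4)
T, n = 50, 60
t = np.linspace(0, 2 * np.pi, T)
core = np.stack([np.cos(t), np.sin(t)], axis=1)              # (T, 2) rotating centre
traj = np.concatenate([                                      # (n, T, 2) trajectories
    core[None] + 0.1 * rng.standard_normal((n // 2, T, 2)),
    -core[None] + 0.1 * rng.standard_normal((n // 2, T, 2))])

D = np.linalg.norm(traj[:, None] - traj[None, :], axis=-1).mean(axis=-1)
W = np.exp(-(D / D.mean()) ** 2)                             # similarity weights
labels = SpectralClustering(n_clusters=2, affinity="precomputed",
                            random_state=0).fit_predict(W)
print(labels)
```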

  2. A simple new filter for nonlinear high-dimensional data assimilation

    NASA Astrophysics Data System (ADS)

    Tödter, Julian; Kirchgessner, Paul; Ahrens, Bodo

    2015-04-01

    performance with a realistic ensemble size. The results confirm that, in principle, it can be applied successfully and as simple as the ETKF in high-dimensional problems without further modifications of the algorithm, even though it is only based on the particle weights. This proves that the suggested method constitutes a useful filter for nonlinear, high-dimensional data assimilation, and is able to overcome the curse of dimensionality even in deterministic systems.

  3. Fast Constrained Spectral Clustering and Cluster Ensemble with Random Projection

    PubMed Central

    Liu, Wenfen

    2017-01-01

    Constrained spectral clustering (CSC) can greatly improve clustering accuracy by incorporating constraint information into spectral clustering, and it has therefore received wide academic attention. In this paper, we propose a fast CSC algorithm that encodes landmark-based graph construction into a new CSC model and applies random sampling to decrease the data size after spectral embedding. Compared with the original model, the new algorithm yields similar results as its model size increases asymptotically; compared with the most efficient CSC algorithm known, the new algorithm runs faster and suits a wider range of data sets. Meanwhile, a scalable semisupervised cluster ensemble algorithm is also proposed by combining our fast CSC algorithm with random-projection dimensionality reduction in the process of spectral ensemble clustering. We demonstrate, through theoretical analysis and empirical results, that the new cluster ensemble algorithm has advantages in terms of efficiency and effectiveness. Furthermore, the approximate preservation of clustering accuracy under random projection, proved in the consensus clustering stage, also holds for weighted k-means clustering and thus gives a theoretical guarantee for this special kind of k-means clustering in which each point has its corresponding weight. PMID:29312447

  4. Cluster properties of the one-dimensional lattice gas: the microscopic meaning of grand potential.

    PubMed

    Fronczak, Agata

    2013-02-01

    Using a concrete example, we demonstrate how the combinatorial approach to a general system of particles, which was introduced in detail in an earlier paper [Fronczak, Phys. Rev. E 86, 041139 (2012)], works and where this approach provides a genuine extension of results obtained through more traditional methods of statistical mechanics. We study the cluster properties of a one-dimensional lattice gas with nearest-neighbor interactions. Three cases (the infinite temperature limit, the range of finite temperatures, and the zero temperature limit) are discussed separately, yielding interesting results and providing alternative proof of known results. In particular, the closed-form expression for the grand partition function in the zero temperature limit is obtained, which results in the nonanalytic behavior of the grand potential, in accordance with the Yang-Lee theory.

  5. Optimization and uncertainty assessment of strongly nonlinear groundwater models with high parameter dimensionality

    NASA Astrophysics Data System (ADS)

    Keating, Elizabeth H.; Doherty, John; Vrugt, Jasper A.; Kang, Qinjun

    2010-10-01

    Highly parameterized and CPU-intensive groundwater models are increasingly being used to understand and predict flow and transport through aquifers. Despite their frequent use, these models pose significant challenges for parameter estimation and predictive uncertainty analysis algorithms, particularly global methods which usually require very large numbers of forward runs. Here we present a general methodology for parameter estimation and uncertainty analysis that can be utilized in these situations. Our proposed method includes extraction of a surrogate model that mimics key characteristics of a full process model, followed by testing and implementation of a pragmatic uncertainty analysis technique, called null-space Monte Carlo (NSMC), that merges the strengths of gradient-based search and parameter dimensionality reduction. As part of the surrogate model analysis, the results of NSMC are compared with a formal Bayesian approach using the DiffeRential Evolution Adaptive Metropolis (DREAM) algorithm. Such a comparison has never been accomplished before, especially in the context of high parameter dimensionality. Despite the highly nonlinear nature of the inverse problem, the existence of multiple local minima, and the relatively large parameter dimensionality, both methods performed well and results compare favorably with each other. Experiences gained from the surrogate model analysis are then transferred to calibrate the full highly parameterized and CPU intensive groundwater model and to explore predictive uncertainty of predictions made by that model. The methodology presented here is generally applicable to any highly parameterized and CPU-intensive environmental model, where efficient methods such as NSMC provide the only practical means for conducting predictive uncertainty analysis.

  6. Patterns of victimization between and within peer clusters in a high school social network.

    PubMed

    Swartz, Kristin; Reyns, Bradford W; Wilcox, Pamela; Dunham, Jessica R

    2012-01-01

    This study presents a descriptive analysis of patterns of violent victimization between and within the various cohesive clusters of peers comprising a sample of more than 500 9th-12th grade students from one high school. Social network analysis techniques provide a visualization of the overall friendship network structure and allow for the examination of variation in victimization across the various peer clusters within the larger network. Social relationships among clusters with varying levels of victimization are also illustrated so as to provide a sense of possible spatial clustering or diffusion of victimization across proximal peer clusters. Additionally, to provide a sense of the sorts of peer clusters that support (or do not support) victimization, characteristics of clusters at both the high and low ends of the victimization scale are discussed. Finally, several of the peer clusters at both the high and low ends of the victimization continuum are "unpacked", allowing examination of within-network individual-level differences in victimization for these select clusters.

  7. Chiral Silver-Lanthanide Metal-Organic Frameworks Comprised of One-Dimensional Triple Right-Handed Helical Chains Based on [Ln7(μ3-OH)8]13+ Clusters.

    PubMed

    Guo, Yan; Zhang, Lijuan; Muhammad, Nadeem; Xu, Yan; Zhou, Yunshan; Tang, Fang; Yang, Shaowei

    2018-02-05

    Three new isostructural chiral silver-lanthanide heterometal-organic frameworks [Ag3Ln7(μ3-OH)8(bpdc)6(NO3)3(H2O)6](NO3)·2H2O [Ln = Eu (1), Tb (2), Sm (3); H2bpdc = 2,2'-bipyridine-3,3'-dicarboxylic acid] based on heptanuclear lanthanide clusters [Ln7(μ3-OH)8]13+ comprised of one-dimensional triple right-handed helical chains were hydrothermally synthesized. Various means such as UV-vis spectroscopy, IR spectroscopy, elemental analysis, powder X-ray diffraction, and thermogravimetric/differential thermal analysis were used to characterize the compounds, wherein compound 3 was crystallographically characterized. In the structure of compound 3, eight μ3-OH- groups link seven Sm3+ ions, forming a heptanuclear cluster, [Sm7(μ3-OH)8]13+, and the adjacent [Sm7(μ3-OH)8]13+ clusters are linked by the carboxylic groups of bpdc2- ligands, leading to the formation of a one-dimensional triple right-handed helical chain. The adjacent triple right-handed helical chains are further joined together by coordinating the pyridyl N atoms of the bpdc2- ligands with Ag+, resulting in a chiral three-dimensional silver(I)-lanthanide(III) heterometal-organic framework with one-dimensional channels wherein NO3- anions and crystal lattice H2O molecules are trapped. The compounds were studied systematically with respect to their photoluminescence properties and energy-transfer mechanism, and it was found that H2bpdc (the energy level of the triplet states of the ligand H2bpdc is 21505 cm-1) can sensitize Eu3+ luminescence more effectively than Tb3+ and Sm3+ luminescence because of effective energy transfer from bpdc2- to Eu3+ under excitation in compound 1.

  8. High-resolution three-dimensional imaging with compress sensing

    NASA Astrophysics Data System (ADS)

    Wang, Jingyi; Ke, Jun

    2016-10-01

    LIDAR three-dimensional imaging technology has been used in many fields, such as military detection. However, LIDAR requires an extremely fast data acquisition speed, which makes the manufacture of detector arrays for LIDAR systems very difficult. To solve this problem, we consider using compressive sensing, which can greatly decrease the data acquisition and relax the requirements on the detection device. To use the compressive sensing idea, a spatial light modulator (SLM) is used to modulate the pulsed light source. Then a photodetector is used to receive the reflected light. A convex optimization problem is solved to reconstruct the 2D depth map of the object. To improve the resolution in the transversal direction, we use multiframe image restoration technology. For each 2D piecewise-planar scene, we move the SLM half a pixel each time. Then the position where the modulated light illuminates changes accordingly. We repeat moving the SLM in four different directions. Then we can get four low-resolution depth maps with different details of the same planar scene. If we use all of the measurements obtained by the subpixel movements, we can reconstruct a high-resolution depth map of the scene. A linear minimum-mean-square-error algorithm is used for the reconstruction. By combining compressive sensing and multiframe image restoration technology, we reduce the burden of data analysis and improve the efficiency of detection. More importantly, we obtain high-resolution depth maps of a 3D scene.
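
    The reconstruction step can be illustrated, in one dimension, with the standard compressive-sensing recovery recipe: random measurement patterns followed by an l1-type sparse solver. The sketch below uses iterative soft thresholding (ISTA) as a simple stand-in for the convex optimization the abstract mentions; the signal, pattern matrix and all parameters are synthetic.

```python
# One-dimensional stand-in for the reconstruction step: random measurement
# patterns plus an l1-style solver. ISTA replaces the convex optimizer the
# abstract mentions; signal, matrix and parameters are synthetic.
import numpy as np

rng = np.random.default_rng(5)
n, m, k = 256, 80, 8                              # signal length, measurements, sparsity
x_true = np.zeros(n)
x_true[rng.choice(n, k, replace=False)] = rng.normal(size=k)
A = rng.standard_normal((m, n)) / np.sqrt(m)      # SLM-like random patterns
y = A @ x_true

lam, step = 0.01, 1.0 / np.linalg.norm(A, 2) ** 2
x = np.zeros(n)
for _ in range(500):                              # ISTA: gradient step + soft threshold
    z = x - step * A.T @ (A @ x - y)
    x = np.sign(z) * np.maximum(np.abs(z) - lam * step, 0.0)
print("relative error:", np.linalg.norm(x - x_true) / np.linalg.norm(x_true))
```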

  9. The Use of Signal Dimensionality for Automatic QC of Seismic Array Data

    NASA Astrophysics Data System (ADS)

    Rowe, C. A.; Stead, R. J.; Begnaud, M. L.; Draganov, D.; Maceira, M.; Gomez, M.

    2014-12-01

    A significant problem in seismic array analysis is the inclusion of bad sensor channels in the beam-forming process. We are testing an approach to automated, on-the-fly quality control (QC) to aid in the identification of poorly performing sensor channels prior to beam-forming in routine event detection or location processing. The idea stems from methods used for large computer servers, where monitoring traffic at enormous numbers of nodes is impractical on a node-by-node basis, so the dimensionality of the node traffic is instead monitored for anomalies that could represent malware, cyber-attacks or other problems. The technique relies upon the use of subspace dimensionality or principal components of the overall system traffic. The subspace technique is not new to seismology, but its most common application has been limited to comparing waveforms to an a priori collection of templates for detecting highly similar events in a swarm or seismic cluster. We examine the signal dimension in a similar way to the method addressing node traffic anomalies in large computer systems. We explore the effects of malfunctioning channels on the dimension of the data and its derivatives, and how to leverage this effect for identifying bad array elements. We show preliminary results applied to arrays in Kazakhstan (Makanchi) and Argentina (Malargue).
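
    A hypothetical miniature of the subspace idea (not the authors' system): learn a low-dimensional signal subspace from a window of array data via an SVD, then flag channels whose energy falls mostly outside that subspace. Thresholds, window handling and the synthetic data are placeholders.

```python
# Hypothetical miniature of the subspace QC idea: learn a low-dimensional
# signal subspace by SVD and flag channels whose energy falls mostly outside
# it. Thresholds, windowing and the synthetic data are placeholders.
import numpy as np

def flag_bad_channels(X, n_components=3, thresh=3.0):
    """X: (n_channels, n_samples) window of array data."""
    Xc = X - X.mean(axis=1, keepdims=True)
    U, _, _ = np.linalg.svd(Xc, full_matrices=False)
    P = U[:, :n_components] @ U[:, :n_components].T        # subspace projector
    resid = np.linalg.norm(Xc - P @ Xc, axis=1) / np.linalg.norm(Xc, axis=1)
    z = (resid - resid.mean()) / resid.std()
    return np.where(z > thresh)[0]

rng = np.random.default_rng(6)
common = np.sin(np.linspace(0, 40, 2000))                  # coherent array signal
data = common + 0.1 * rng.standard_normal((16, 2000))
data[5] = rng.standard_normal(2000)                        # one bad channel
print(flag_bad_channels(data))                             # expect [5]
```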

  10. Interpretable dimensionality reduction of single cell transcriptome data with deep generative models.

    PubMed

    Ding, Jiarui; Condon, Anne; Shah, Sohrab P

    2018-05-21

    Single-cell RNA-sequencing has great potential to discover cell types, identify cell states, trace development lineages, and reconstruct the spatial organization of cells. However, dimension reduction to interpret structure in single-cell sequencing data remains a challenge. Existing algorithms are either not able to uncover the clustering structures in the data or lose global information such as groups of clusters that are close to each other. We present a robust statistical model, scvis, to capture and visualize the low-dimensional structures in single-cell gene expression data. Simulation results demonstrate that low-dimensional representations learned by scvis preserve both the local and global neighbor structures in the data. In addition, scvis is robust to the number of data points and learns a probabilistic parametric mapping function to add new data points to an existing embedding. We then use scvis to analyze four single-cell RNA-sequencing datasets, exemplifying interpretable two-dimensional representations of the high-dimensional single-cell RNA-sequencing data.

  11. Synthesis of borophenes: Anisotropic, two-dimensional boron polymorphs.

    PubMed

    Mannix, Andrew J; Zhou, Xiang-Feng; Kiraly, Brian; Wood, Joshua D; Alducin, Diego; Myers, Benjamin D; Liu, Xiaolong; Fisher, Brandon L; Santiago, Ulises; Guest, Jeffrey R; Yacaman, Miguel Jose; Ponce, Arturo; Oganov, Artem R; Hersam, Mark C; Guisinger, Nathan P

    2015-12-18

    At the atomic-cluster scale, pure boron is markedly similar to carbon, forming simple planar molecules and cage-like fullerenes. Theoretical studies predict that two-dimensional (2D) boron sheets will adopt an atomic configuration similar to that of boron atomic clusters. We synthesized atomically thin, crystalline 2D boron sheets (i.e., borophene) on silver surfaces under ultrahigh-vacuum conditions. Atomic-scale characterization, supported by theoretical calculations, revealed structures reminiscent of fused boron clusters with multiple scales of anisotropic, out-of-plane buckling. Unlike bulk boron allotropes, borophene shows metallic characteristics that are consistent with predictions of a highly anisotropic, 2D metal. Copyright © 2015, American Association for the Advancement of Science.

  12. Large-scale three-dimensional phase-field simulations for phase coarsening at ultrahigh volume fraction on high-performance architectures

    NASA Astrophysics Data System (ADS)

    Yan, Hui; Wang, K. G.; Jones, Jim E.

    2016-06-01

    A parallel algorithm for large-scale three-dimensional phase-field simulations of phase coarsening is developed and implemented on high-performance architectures. From the large-scale simulations, a new kinetics of phase coarsening in the region of ultrahigh volume fraction is found. The parallel implementation is capable of harnessing the greater computer power available from high-performance architectures. The parallelized code enables an increase in the three-dimensional simulation system size up to a 512^3 grid cube. Through the parallelized code, practical runtimes can be achieved for three-dimensional large-scale simulations, and the statistical significance of the results from these high-resolution parallel simulations is greatly improved over that obtainable from serial simulations. A detailed performance analysis on speed-up and scalability is presented, showing good scalability which improves with increasing problem size. In addition, a model for prediction of runtime is developed, which shows good agreement with actual run times from numerical tests.

  13. Generalizing MOND to explain the missing mass in galaxy clusters

    NASA Astrophysics Data System (ADS)

    Hodson, Alistair O.; Zhao, Hongsheng

    2017-02-01

    Context. MOdified Newtonian Dynamics (MOND) is a gravitational framework designed to explain the astronomical observations in the Universe without the inclusion of particle dark matter. MOND, in its current form, cannot explain the missing mass in galaxy clusters without the inclusion of some extra mass, be it in the form of neutrinos or non-luminous baryonic matter. We investigate whether the MOND framework can be generalized to account for the missing mass in galaxy clusters by boosting gravity in high gravitational potential regions. We examine and review Extended MOND (EMOND), which was designed to increase the MOND scale acceleration in high potential regions, thereby boosting the gravity in clusters. Aims: We seek to investigate galaxy cluster mass profiles in the context of MOND with the primary aim at explaining the missing mass problem fully without the need for dark matter. Methods: Using the assumption that the clusters are in hydrostatic equilibrium, we can compute the dynamical mass of each cluster and compare the result to the predicted mass of the EMOND formalism. Results: We find that EMOND has some success in fitting some clusters, but overall has issues when trying to explain the mass deficit fully. We also investigate an empirical relation to solve the cluster problem, which is found by analysing the cluster data and is based on the MOND paradigm. We discuss the limitations in the text.

  14. Rigidity of transmembrane proteins determines their cluster shape

    NASA Astrophysics Data System (ADS)

    Jafarinia, Hamidreza; Khoshnood, Atefeh; Jalali, Mir Abbas

    2016-01-01

    Protein aggregation in cell membrane is vital for the majority of biological functions. Recent experimental results suggest that transmembrane domains of proteins such as α-helices and β-sheets have different structural rigidities. We use molecular dynamics simulation of a coarse-grained model of protein-embedded lipid membranes to investigate the mechanisms of protein clustering. For a variety of protein concentrations, our simulations under thermal equilibrium conditions reveal that the structural rigidity of transmembrane domains dramatically affects interactions and changes the shape of the cluster. We have observed stable large aggregates even in the absence of hydrophobic mismatch, which has been previously proposed as the mechanism of protein aggregation. According to our results, semiflexible proteins aggregate to form two-dimensional clusters, while rigid proteins, by contrast, form one-dimensional string-like structures. By assuming two probable scenarios for the formation of a two-dimensional triangular structure, we calculate the lipid density around protein clusters and find that the difference in lipid distribution around rigid and semiflexible proteins determines the one- or two-dimensional nature of aggregates. It is found that lipids move faster around semiflexible proteins than rigid ones. The aggregation mechanism suggested in this paper can be tested by current state-of-the-art experimental facilities.

  15. Effective dimensional reduction algorithm for eigenvalue problems for thin elastic structures: A paradigm in three dimensions

    PubMed Central

    Ovtchinnikov, Evgueni E.; Xanthis, Leonidas S.

    2000-01-01

    We present a methodology for the efficient numerical solution of eigenvalue problems of full three-dimensional elasticity for thin elastic structures, such as shells, plates and rods of arbitrary geometry, discretized by the finite element method. Such problems are solved by iterative methods, which, however, are known to suffer from slow convergence or even convergence failure, when the thickness is small. In this paper we show an effective way of resolving this difficulty by invoking a special preconditioning technique associated with the effective dimensional reduction algorithm (EDRA). As an example, we present an algorithm for computing the minimal eigenvalue of a thin elastic plate and we show both theoretically and numerically that it is robust with respect to both the thickness and discretization parameters, i.e. the convergence does not deteriorate with diminishing thickness or mesh refinement. This robustness is sine qua non for the efficient computation of large-scale eigenvalue problems for thin elastic structures. PMID:10655469

  16. A sparse grid based method for generative dimensionality reduction of high-dimensional data

    NASA Astrophysics Data System (ADS)

    Bohn, Bastian; Garcke, Jochen; Griebel, Michael

    2016-03-01

    Generative dimensionality reduction methods play an important role in machine learning applications because they construct an explicit mapping from a low-dimensional space to the high-dimensional data space. We discuss a general framework to describe generative dimensionality reduction methods, where the main focus lies on a regularized principal manifold learning variant. Since most generative dimensionality reduction algorithms exploit the representer theorem for reproducing kernel Hilbert spaces, their computational costs grow at least quadratically in the number n of data. Instead, we introduce a grid-based discretization approach which automatically scales just linearly in n. To circumvent the curse of dimensionality of full tensor product grids, we use the concept of sparse grids. Furthermore, in real-world applications, some embedding directions are usually more important than others and it is reasonable to refine the underlying discretization space only in these directions. To this end, we employ a dimension-adaptive algorithm which is based on the ANOVA (analysis of variance) decomposition of a function. In particular, the reconstruction error is used to measure the quality of an embedding. As an application, the study of large simulation data from an engineering application in the automotive industry (car crash simulation) is performed.

  17. Reconstruction of the two-dimensional gravitational potential of galaxy clusters from X-ray and Sunyaev-Zel'dovich measurements

    NASA Astrophysics Data System (ADS)

    Tchernin, C.; Bartelmann, M.; Huber, K.; Dekel, A.; Hurier, G.; Majer, C. L.; Meyer, S.; Zinger, E.; Eckert, D.; Meneghetti, M.; Merten, J.

    2018-06-01

    Context. The mass of galaxy clusters is not a direct observable, nonetheless it is commonly used to probe cosmological models. Based on the combination of all main cluster observables, that is, the X-ray emission, the thermal Sunyaev-Zel'dovich (SZ) signal, the velocity dispersion of the cluster galaxies, and gravitational lensing, the gravitational potential of galaxy clusters can be jointly reconstructed. Aims: We derive the two main ingredients required for this joint reconstruction: the potentials individually reconstructed from the observables and their covariance matrices, which act as a weight in the joint reconstruction. We show here the method to derive these quantities. The result of the joint reconstruction applied to a real cluster will be discussed in a forthcoming paper. Methods: We apply the Richardson-Lucy deprojection algorithm to data on a two-dimensional (2D) grid. We first test the 2D deprojection algorithm on a β-profile. Assuming hydrostatic equilibrium, we further reconstruct the gravitational potential of a simulated galaxy cluster based on synthetic SZ and X-ray data. We then reconstruct the projected gravitational potential of the massive and dynamically active cluster Abell 2142, based on the X-ray observations collected with XMM-Newton and the SZ observations from the Planck satellite. Finally, we compute the covariance matrix of the projected reconstructed potential of the cluster Abell 2142 based on the X-ray measurements collected with XMM-Newton. Results: The gravitational potentials of the simulated cluster recovered from synthetic X-ray and SZ data are consistent, even though the potential reconstructed from X-rays shows larger deviations from the true potential. Regarding Abell 2142, the projected gravitational cluster potentials recovered from SZ and X-ray data reproduce well the projected potential inferred from gravitational-lensing observations. We also observe that the covariance matrix of the potential for Abell 2142

  18. Pattern-set generation algorithm for the one-dimensional multiple stock sizes cutting stock problem

    NASA Astrophysics Data System (ADS)

    Cui, Yaodong; Cui, Yi-Ping; Zhao, Zhigang

    2015-09-01

    A pattern-set generation algorithm (PSG) for the one-dimensional multiple stock sizes cutting stock problem (1DMSSCSP) is presented. The solution process contains two stages. In the first stage, the PSG solves the residual problems repeatedly to generate the patterns in the pattern set, where each residual problem is solved by the column-generation approach, and each pattern is generated by solving a single large object placement problem. In the second stage, the integer linear programming model of the 1DMSSCSP is solved using a commercial solver, where only the patterns in the pattern set are considered. The computational results of benchmark instances indicate that the PSG outperforms existing heuristic algorithms and rivals the exact algorithm in solution quality.

  19. Cluster-cluster clustering

    NASA Technical Reports Server (NTRS)

    Barnes, J.; Dekel, A.; Efstathiou, G.; Frenk, C. S.

    1985-01-01

    The cluster correlation function ξ_c(r) is compared with the particle correlation function, ξ(r), in cosmological N-body simulations with a wide range of initial conditions. The experiments include scale-free initial conditions, pancake models with a coherence length in the initial density field, and hybrid models. Three N-body techniques and two cluster-finding algorithms are used. In scale-free models with white noise initial conditions, ξ_c and ξ are essentially identical. In scale-free models with more power on large scales, it is found that the amplitude of ξ_c increases with cluster richness; in this case the clusters give a biased estimate of the particle correlations. In the pancake and hybrid models (with n = 0 or 1), ξ_c is steeper than ξ, but the cluster correlation length exceeds that of the points by less than a factor of 2, independent of cluster richness. Thus the high amplitude of ξ_c found in studies of rich clusters of galaxies is inconsistent with white noise and pancake models and may indicate a primordial fluctuation spectrum with substantial power on large scales.
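
    The underlying measurement can be sketched with a toy pair-count estimator: compare data-data pair separations against a random catalogue of equal size and take ξ(r) ≈ DD(r)/RR(r) − 1. The box size, binning and clustered toy sample below are arbitrary, and real analyses use more careful estimators.

```python
# Toy pair-count estimate xi(r) ~ DD(r)/RR(r) - 1 against a random catalogue
# of equal size; box size, binning and the clustered sample are arbitrary,
# and production analyses use more careful estimators.
import numpy as np
from scipy.spatial.distance import pdist

rng = np.random.default_rng(7)
box, n = 100.0, 400
centres = rng.random((20, 3)) * box
data = (centres[rng.integers(0, 20, n)] + rng.normal(0, 2.0, (n, 3))) % box
rand = rng.random((n, 3)) * box                   # unclustered comparison catalogue

bins = np.linspace(1.0, 25.0, 13)
dd, _ = np.histogram(pdist(data), bins=bins)
rr, _ = np.histogram(pdist(rand), bins=bins)
xi = dd / np.maximum(rr, 1) - 1.0
for r, x in zip(0.5 * (bins[1:] + bins[:-1]), xi):
    print(f"r = {r:5.1f}   xi = {x:6.2f}")
```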

  20. On mixed derivatives type high dimensional multi-term fractional partial differential equations approximate solutions

    NASA Astrophysics Data System (ADS)

    Talib, Imran; Belgacem, Fethi Bin Muhammad; Asif, Naseer Ahmad; Khalil, Hammad

    2017-01-01

In this research article, we derive and analyze an efficient spectral method based on the operational matrices of three-dimensional orthogonal Jacobi polynomials to numerically solve a generalized class of high-dimensional, multi-term fractional-order partial differential equations with mixed partial derivatives. With the aid of the operational matrices, we transform the considered fractional-order problem into an easily solvable system of algebraic equations, whose solution yields the solution of the original problem. Some test problems are considered to confirm the accuracy and validity of the proposed numerical method. The convergence of the method is verified by comparing results from our Matlab simulations with exact solutions from the literature, yielding negligible errors. Moreover, comparative results discussed in the literature are extended and improved in this study.

  1. Highly Parallel Alternating Directions Algorithm for Time Dependent Problems

    NASA Astrophysics Data System (ADS)

    Ganzha, M.; Georgiev, K.; Lirkov, I.; Margenov, S.; Paprzycki, M.

    2011-11-01

In our work, we consider the time dependent Stokes equation on a finite time interval and on a uniform rectangular mesh, written in terms of velocity and pressure. For this problem, a parallel algorithm based on a novel direction splitting approach is developed. Here, the pressure equation is derived from a perturbed form of the continuity equation, in which the incompressibility constraint is penalized in a negative norm induced by the direction splitting. The scheme used in the algorithm is composed of two parts: (i) velocity prediction, and (ii) pressure correction. This is a Crank-Nicolson-type two-stage time integration scheme for two- and three-dimensional parabolic problems in which the second-order derivative with respect to one space variable is treated implicitly while the derivatives in the other variables are treated explicitly at each time sub-step. In order to achieve good parallel performance, the solution of the Poisson problem for the pressure correction is replaced by solving a sequence of one-dimensional second-order elliptic boundary value problems in each spatial direction. The parallel code is implemented using standard MPI functions and tested on two modern parallel computer systems. The numerical tests performed demonstrate a good level of parallel efficiency and scalability of the studied direction-splitting-based algorithm.
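
    The building block of such a direction-splitting scheme is a tridiagonal one-dimensional solve per grid line; the sketch below (an illustration under assumed discretization and boundary conditions, not the authors' MPI code) shows one such solve for (I - τ ∂²/∂x²) u = f with homogeneous Dirichlet boundaries.

      import numpy as np
      from scipy.linalg import solve_banded

      def solve_1d_step(f, dx, tau):
          # One implicit 1D sub-step: (I - tau * d2/dx2) u = f, discretized
          # by second-order central differences on a uniform grid.
          n = len(f)
          r = tau / dx**2
          ab = np.zeros((3, n))
          ab[0, 1:] = -r                  # super-diagonal
          ab[1, :] = 1.0 + 2.0 * r        # main diagonal
          ab[2, :-1] = -r                 # sub-diagonal
          return solve_banded((1, 1), ab, f)

      # Sweeping solve_1d_step along every row, then every column, of a 2D
      # field replaces one global Poisson-type solve by many independent
      # tridiagonal solves, which is what makes the scheme easy to
      # parallelize.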

  2. Topic modeling for cluster analysis of large biological and medical datasets

    PubMed Central

    2014-01-01

Background: The big data moniker is nowhere better deserved than to describe the ever-increasing prodigiousness and complexity of biological and medical datasets. New methods are needed to generate and test hypotheses, foster biological interpretation, and build validated predictors. Although multivariate techniques such as cluster analysis may allow researchers to identify groups, or clusters, of related variables, the accuracies and effectiveness of traditional clustering methods diminish for large and hyper-dimensional datasets. Topic modeling is an active research field in machine learning and has been mainly used as an analytical tool to structure large textual corpora for data mining. Its ability to reduce high dimensionality to a small number of latent variables makes it suitable as a means for clustering or overcoming clustering difficulties in large biological and medical datasets. Results: In this study, three topic model-derived clustering methods, highest probable topic assignment, feature selection and feature extraction, are proposed and tested on the cluster analysis of three large datasets: a Salmonella pulsed-field gel electrophoresis (PFGE) dataset, a lung cancer dataset, and a breast cancer dataset, which represent various types of large biological or medical datasets. All three methods are shown to improve the efficacy/effectiveness of clustering results on the three datasets in comparison to traditional methods. A preferable cluster analysis method emerged for each of the three datasets on the basis of replicating known biological truths. Conclusion: Topic modeling could be advantageously applied to the large datasets of biological or medical research. The three proposed topic model-derived clustering methods, highest probable topic assignment, feature selection and feature extraction, yield clustering improvements for the three different data types. Clusters more efficaciously represent truthful groupings and subgroupings in the data than

  3. Topic modeling for cluster analysis of large biological and medical datasets.

    PubMed

    Zhao, Weizhong; Zou, Wen; Chen, James J

    2014-01-01

The big data moniker is nowhere better deserved than to describe the ever-increasing prodigiousness and complexity of biological and medical datasets. New methods are needed to generate and test hypotheses, foster biological interpretation, and build validated predictors. Although multivariate techniques such as cluster analysis may allow researchers to identify groups, or clusters, of related variables, the accuracies and effectiveness of traditional clustering methods diminish for large and hyper-dimensional datasets. Topic modeling is an active research field in machine learning and has been mainly used as an analytical tool to structure large textual corpora for data mining. Its ability to reduce high dimensionality to a small number of latent variables makes it suitable as a means for clustering or overcoming clustering difficulties in large biological and medical datasets. In this study, three topic model-derived clustering methods, highest probable topic assignment, feature selection and feature extraction, are proposed and tested on the cluster analysis of three large datasets: a Salmonella pulsed-field gel electrophoresis (PFGE) dataset, a lung cancer dataset, and a breast cancer dataset, which represent various types of large biological or medical datasets. All three methods are shown to improve the efficacy/effectiveness of clustering results on the three datasets in comparison to traditional methods. A preferable cluster analysis method emerged for each of the three datasets on the basis of replicating known biological truths. Topic modeling could be advantageously applied to the large datasets of biological or medical research. The three proposed topic model-derived clustering methods, highest probable topic assignment, feature selection and feature extraction, yield clustering improvements for the three different data types. Clusters more efficaciously represent truthful groupings and subgroupings in the data than traditional methods, suggesting
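
    A minimal sketch of the first of the three methods, highest probable topic assignment, under the assumption of a nonnegative (samples x features) count matrix; the feature selection and feature extraction variants, and the paper's datasets, are not reproduced.

      import numpy as np
      from sklearn.decomposition import LatentDirichletAllocation

      def topic_clusters(counts, n_topics=10, seed=0):
          # Fit a topic model, then label each sample by its dominant topic.
          lda = LatentDirichletAllocation(n_components=n_topics,
                                          random_state=seed)
          doc_topic = lda.fit_transform(counts)   # rows ~ topic proportions
          return doc_topic.argmax(axis=1)         # cluster = top topic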

  4. Stratification Learning: Detecting Mixed Density and Dimensionality in High Dimensional Point Clouds (PREPRINT)

    DTIC Science & Technology

    2006-09-01

[Fragmentary record excerpt; only snippets of the original text are recoverable.] The work builds on unsupervised local dimension estimation via tensor voting (P. Mordohai and G. Medioni, "Unsupervised dimensionality estimation and manifold learning in high-dimensional spaces by tensor voting") and argues that such recent results show the necessity of going beyond manifold learning. A reported motion experiment (walking, jumping, and arms waving) took 361 seconds in Matlab, with the classification time (PMM) negligible compared to the kNN step.

  5. Major cluster mergers and the location of the brightest cluster galaxy

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Martel, Hugo; Robichaud, Fidèle; Barai, Paramita, E-mail: Hugo.Martel@phy.ulaval.ca

Using a large N-body cosmological simulation combined with a subgrid treatment of galaxy formation, merging, and tidal destruction, we study the formation and evolution of the galaxy and cluster population in a comoving volume (100 Mpc)^3 in a ΛCDM universe. At z = 0, our computational volume contains 1788 clusters with mass M_cl > 1.1 × 10^12 M_☉, including 18 massive clusters with M_cl > 10^14 M_☉. It also contains 1,088,797 galaxies with mass M_gal ≥ 2 × 10^9 M_☉ and luminosity L > 9.5 × 10^5 L_☉. For each cluster, we identified the brightest cluster galaxy (BCG). We then computed two separate statistics: the fraction f_BNC of clusters in which the BCG is not the closest galaxy to the center of the cluster in projection, and the ratio Δv/σ, where Δv is the difference in radial velocity between the BCG and the whole cluster and σ is the radial velocity dispersion of the cluster. We found that f_BNC increases from 0.05 for low-mass clusters (M_cl ∼ 10^12 M_☉) to 0.5 for high-mass clusters (M_cl > 10^14 M_☉) with very little dependence on cluster redshift. Most of this result turns out to be a projection effect, and when we consider three-dimensional distances instead of projected distances, f_BNC increases only to 0.2 at high cluster mass. The values of Δv/σ vary from 0 to 1.8, with median values in the range 0.03-0.15 when considering all clusters, and 0.12-0.31 when considering only massive clusters. These results are consistent with previous observational studies and indicate that the central galaxy paradigm, which states that the BCG should be at rest at the center of the cluster, is usually valid, but exceptions are too common to be ignored. We built merger trees for the 18 most massive clusters in the simulation. Analysis of these trees reveals that 16 of these clusters have experienced one or several major or
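
    A minimal sketch of the two statistics defined above, for hypothetical arrays of projected positions, radial velocities, and luminosities of one cluster's member galaxies; the paper's merger trees and subgrid galaxy model are not reproduced.

      import numpy as np

      def bcg_statistics(xy, v, lum, centre):
          bcg = np.argmax(lum)                     # brightest cluster galaxy
          d = np.linalg.norm(xy - centre, axis=1)  # projected distances
          bcg_not_closest = d[bcg] > d.min()       # contributes to f_BNC
          sigma = np.std(v, ddof=1)                # radial velocity dispersion
          dv_over_sigma = abs(v[bcg] - np.mean(v)) / sigma
          return bcg_not_closest, dv_over_sigma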

  6. Big Data Clustering via Community Detection and Hyperbolic Network Embedding in IoT Applications.

    PubMed

    Karyotis, Vasileios; Tsitseklis, Konstantinos; Sotiropoulos, Konstantinos; Papavassiliou, Symeon

    2018-04-15

    In this paper, we present a novel data clustering framework for big sensory data produced by IoT applications. Based on a network representation of the relations among multi-dimensional data, data clustering is mapped to node clustering over the produced data graphs. To address the potential very large scale of such datasets/graphs that test the limits of state-of-the-art approaches, we map the problem of data clustering to a community detection one over the corresponding data graphs. Specifically, we propose a novel computational approach for enhancing the traditional Girvan-Newman (GN) community detection algorithm via hyperbolic network embedding. The data dependency graph is embedded in the hyperbolic space via Rigel embedding, allowing more efficient computation of edge-betweenness centrality needed in the GN algorithm. This allows for more efficient clustering of the nodes of the data graph in terms of modularity, without sacrificing considerable accuracy. In order to study the operation of our approach with respect to enhancing GN community detection, we employ various representative types of artificial complex networks, such as scale-free, small-world and random geometric topologies, and frequently-employed benchmark datasets for demonstrating its efficacy in terms of data clustering via community detection. Furthermore, we provide a proof-of-concept evaluation by applying the proposed framework over multi-dimensional datasets obtained from an operational smart-city/building IoT infrastructure provided by the Federated Interoperable Semantic IoT/cloud Testbeds and Applications (FIESTA-IoT) testbed federation. It is shown that the proposed framework can be indeed used for community detection/data clustering and exploited in various other IoT applications, such as performing more energy-efficient smart-city/building sensing.
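
    For orientation, here is the unmodified Girvan-Newman baseline that the framework accelerates, shown on a standard toy graph; the hyperbolic (Rigel) embedding used in the paper to cheapen the edge-betweenness computation is not reproduced here.

      import networkx as nx
      from networkx.algorithms.community import girvan_newman

      # Plain GN: repeatedly remove the highest-betweenness edge and read
      # off communities. The karate-club graph is a stand-in for the data
      # dependency graph of an IoT dataset.
      G = nx.karate_club_graph()
      first_split = next(girvan_newman(G))        # first community partition
      clusters = [sorted(c) for c in first_split]
      print(clusters)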

  7. Global solvability and asymptotic behavior of a free boundary problem for the one-dimensional viscous radiative and reactive gas

    NASA Astrophysics Data System (ADS)

    Jiang, Jie; Zheng, Songmu

    2012-12-01

    In this paper, we study a Neumann and free boundary problem for the one-dimensional viscous radiative and reactive gas. We prove that under rather general assumptions on the heat conductivity κ, for any arbitrary large smooth initial data, the problem admits a unique global classical solution. Our global existence results improve those results by Umehara and Tani ["Global solution to the one-dimensional equations for a self-gravitating viscous radiative and reactive gas," J. Differ. Equations 234(2), 439-463 (2007), 10.1016/j.jde.2006.09.023; Umehara and Tani "Global solvability of the free-boundary problem for one-dimensional motion of a self-gravitating viscous radiative and reactive gas," Proc. Jpn. Acad., Ser. A: Math. Sci. 84(7), 123-128 (2008)], 10.3792/pjaa.84.123 and by Qin, Hu, and Wang ["Global smooth solutions for the compressible viscous and heat-conductive gas," Q. Appl. Math. 69(3), 509-528 (2011)]., 10.1090/S0033-569X-2011-01218-0 Moreover, we analyze the asymptotic behavior of the global solutions to our problem, and we prove that the global solution will converge to an equilibrium as time goes to infinity. This is the result obtained for this problem in the literature for the first time.

  8. Minimizers with Bounded Action for the High-Dimensional Frenkel-Kontorova Model

    NASA Astrophysics Data System (ADS)

    Miao, Xue-Qing; Wang, Ya-Nan; Qin, Wen-Xin

In Aubry-Mather theory for monotone twist maps or for the one-dimensional Frenkel-Kontorova (FK) model with nearest-neighbor interactions, each global minimizer (minimal energy configuration) is naturally Birkhoff. However, this is not true for the one-dimensional FK model with non-nearest-neighbor interactions or for the high-dimensional FK model. In this paper, we study the Birkhoff property of minimizers with bounded action for the high-dimensional FK model.

  9. Decomposition method for zonal resource allocation problems in telecommunication networks

    NASA Astrophysics Data System (ADS)

    Konnov, I. V.; Kashuba, A. Yu

    2016-11-01

We consider problems of optimal resource allocation in telecommunication networks. We first give an optimization formulation for the case where the network manager aims to distribute some homogeneous resource (bandwidth) among users of one region with quadratic charge and fee functions, and present simple and efficient solution methods. Next, we consider a more general problem for a provider of a wireless communication network divided into zones (clusters) with common capacity constraints. We obtain a convex quadratic optimization problem involving capacity and balance constraints. By using the dual Lagrangian method with respect to the capacity constraint, we reduce the initial problem to a one-dimensional optimization problem; each evaluation of the dual cost function then decomposes into independently solvable zonal problems, each of which coincides with the single-region problem above. Results of computational experiments confirm the applicability of the new methods.
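
    A toy version of the dual scheme sketched above, under assumed quadratic zonal costs 0.5·a_i·x_i² − b_i·x_i and a single shared capacity C: dualizing the capacity constraint makes the zonal problems independent, and the scalar dual variable can be found by bisection.

      import numpy as np

      def allocate(a, b, C, tol=1e-10):
          # Zonal minimizers for a given dual price lam (price of capacity).
          def x_of(lam):
              return np.maximum((b - lam) / a, 0.0)
          if x_of(0.0).sum() <= C:              # capacity not binding
              return x_of(0.0)
          lo, hi = 0.0, float(b.max())          # sum(x) decreases in lam
          while hi - lo > tol:
              mid = 0.5 * (lo + hi)
              lo, hi = (mid, hi) if x_of(mid).sum() > C else (lo, mid)
          return x_of(0.5 * (lo + hi))

      x = allocate(np.array([1.0, 2.0, 4.0]), np.array([4.0, 5.0, 6.0]), C=5.0)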

  10. Joint spatial-spectral hyperspectral image clustering using block-diagonal amplified affinity matrix

    NASA Astrophysics Data System (ADS)

    Fan, Lei; Messinger, David W.

    2018-03-01

    The large number of spectral channels in a hyperspectral image (HSI) produces a fine spectral resolution to differentiate between materials in a scene. However, difficult classes that have similar spectral signatures are often confused while merely exploiting information in the spectral domain. Therefore, in addition to spectral characteristics, the spatial relationships inherent in HSIs should also be considered for incorporation into classifiers. The growing availability of high spectral and spatial resolution of remote sensors provides rich information for image clustering. Besides the discriminating power in the rich spectrum, contextual information can be extracted from the spatial domain, such as the size and the shape of the structure to which one pixel belongs. In recent years, spectral clustering has gained popularity compared to other clustering methods due to the difficulty of accurate statistical modeling of data in high dimensional space. The joint spatial-spectral information could be effectively incorporated into the proximity graph for spectral clustering approach, which provides a better data representation by discovering the inherent lower dimensionality from the input space. We embedded both spectral and spatial information into our proposed local density adaptive affinity matrix, which is able to handle multiscale data by automatically selecting the scale of analysis for every pixel according to its neighborhood of the correlated pixels. Furthermore, we explored the "conductivity method," which aims at amplifying the block diagonal structure of the affinity matrix to further improve the performance of spectral clustering on HSI datasets.
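
    A hedged sketch of a locally adaptive affinity in the spirit described above: each sample gets its own scale σ_i (its distance to a k-th neighbor), giving A_ij = exp(−d_ij²/(σ_i σ_j)), followed by standard normalized spectral clustering. The joint spatial-spectral feature construction and the "conductivity" amplification from the paper are not reproduced.

      import numpy as np
      from scipy.spatial.distance import cdist
      from sklearn.cluster import KMeans

      def adaptive_spectral_clustering(X, n_clusters, k=7):
          d = cdist(X, X)
          sigma = np.sort(d, axis=1)[:, k]          # local scale per sample
          A = np.exp(-d**2 / np.outer(sigma, sigma))
          np.fill_diagonal(A, 0.0)
          deg = A.sum(axis=1)
          L = np.diag(deg**-0.5) @ A @ np.diag(deg**-0.5)
          vals, vecs = np.linalg.eigh(L)            # symmetric eigensolve
          U = vecs[:, -n_clusters:]                 # top eigenvectors
          U /= np.linalg.norm(U, axis=1, keepdims=True)
          return KMeans(n_clusters, n_init=10).fit_predict(U)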

  11. Atlas-Guided Cluster Analysis of Large Tractography Datasets

    PubMed Central

    Ros, Christian; Güllmar, Daniel; Stenzel, Martin; Mentzel, Hans-Joachim; Reichenbach, Jürgen Rainer

    2013-01-01

    Diffusion Tensor Imaging (DTI) and fiber tractography are important tools to map the cerebral white matter microstructure in vivo and to model the underlying axonal pathways in the brain with three-dimensional fiber tracts. As the fast and consistent extraction of anatomically correct fiber bundles for multiple datasets is still challenging, we present a novel atlas-guided clustering framework for exploratory data analysis of large tractography datasets. The framework uses an hierarchical cluster analysis approach that exploits the inherent redundancy in large datasets to time-efficiently group fiber tracts. Structural information of a white matter atlas can be incorporated into the clustering to achieve an anatomically correct and reproducible grouping of fiber tracts. This approach facilitates not only the identification of the bundles corresponding to the classes of the atlas; it also enables the extraction of bundles that are not present in the atlas. The new technique was applied to cluster datasets of 46 healthy subjects. Prospects of automatic and anatomically correct as well as reproducible clustering are explored. Reconstructed clusters were well separated and showed good correspondence to anatomical bundles. Using the atlas-guided cluster approach, we observed consistent results across subjects with high reproducibility. In order to investigate the outlier elimination performance of the clustering algorithm, scenarios with varying amounts of noise were simulated and clustered with three different outlier elimination strategies. By exploiting the multithreading capabilities of modern multiprocessor systems in combination with novel algorithms, our toolkit clusters large datasets in a couple of minutes. Experiments were conducted to investigate the achievable speedup and to demonstrate the high performance of the clustering framework in a multiprocessing environment. PMID:24386292

  12. Atlas-guided cluster analysis of large tractography datasets.

    PubMed

    Ros, Christian; Güllmar, Daniel; Stenzel, Martin; Mentzel, Hans-Joachim; Reichenbach, Jürgen Rainer

    2013-01-01

    Diffusion Tensor Imaging (DTI) and fiber tractography are important tools to map the cerebral white matter microstructure in vivo and to model the underlying axonal pathways in the brain with three-dimensional fiber tracts. As the fast and consistent extraction of anatomically correct fiber bundles for multiple datasets is still challenging, we present a novel atlas-guided clustering framework for exploratory data analysis of large tractography datasets. The framework uses an hierarchical cluster analysis approach that exploits the inherent redundancy in large datasets to time-efficiently group fiber tracts. Structural information of a white matter atlas can be incorporated into the clustering to achieve an anatomically correct and reproducible grouping of fiber tracts. This approach facilitates not only the identification of the bundles corresponding to the classes of the atlas; it also enables the extraction of bundles that are not present in the atlas. The new technique was applied to cluster datasets of 46 healthy subjects. Prospects of automatic and anatomically correct as well as reproducible clustering are explored. Reconstructed clusters were well separated and showed good correspondence to anatomical bundles. Using the atlas-guided cluster approach, we observed consistent results across subjects with high reproducibility. In order to investigate the outlier elimination performance of the clustering algorithm, scenarios with varying amounts of noise were simulated and clustered with three different outlier elimination strategies. By exploiting the multithreading capabilities of modern multiprocessor systems in combination with novel algorithms, our toolkit clusters large datasets in a couple of minutes. Experiments were conducted to investigate the achievable speedup and to demonstrate the high performance of the clustering framework in a multiprocessing environment.

  13. High-performance parallel analysis of coupled problems for aircraft propulsion

    NASA Technical Reports Server (NTRS)

    Felippa, C. A.; Farhat, C.; Lanteri, S.; Maman, N.; Piperno, S.; Gumaste, U.

    1994-01-01

This research program deals with the application of high-performance computing methods to the analysis of complete jet engines. We initiated this program by applying the two-dimensional parallel aeroelastic codes to the interior gas flow problem of a bypass jet engine. The fluid mesh generation, domain decomposition, and solution capabilities were successfully tested. We then focused attention on methodology for the partitioned analysis of the interaction of the gas flow with a flexible structure and with the fluid mesh motion that results from these structural displacements. This is treated by a new arbitrary Lagrangian-Eulerian (ALE) technique that models the fluid mesh motion as that of a fictitious mass-spring network. New partitioned analysis procedures to treat this coupled three-component problem are developed. These procedures involve delayed corrections and subcycling. Preliminary results on the stability, accuracy, and MPP computational efficiency are reported.

  14. A critical analysis of high-redshift, massive, galaxy clusters. Part I

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Hoyle, Ben; Jimenez, Raul; Verde, Licia

    2012-02-01

We critically investigate current statistical tests applied to high redshift clusters of galaxies in order to test the standard cosmological model and describe their range of validity. We carefully compare a sample of high-redshift, massive, galaxy clusters with realistic Poisson sample simulations of the theoretical mass function, which include the effect of Eddington bias. We compare the observations and simulations using the following statistical tests: the distributions of ensemble and individual existence probabilities (in the > M, > z sense), the redshift distributions, and the 2d Kolmogorov-Smirnov test. Using seemingly rare clusters from Hoyle et al. (2011) and Jee et al. (2011), and assuming the same survey geometry as in Jee et al. (2011, which is less conservative than Hoyle et al. 2011), we find that the (> M, > z) existence probabilities of all clusters are fully consistent with ΛCDM. However, assuming the same survey geometry, we use the 2d K-S test probability to show that the observed clusters are not consistent with being the least probable clusters from simulations at > 95% confidence, and are also not consistent with being a random selection of clusters, which may be caused by the non-trivial selection function and survey geometry. Tension can be removed if we examine only an X-ray-selected subsample, with simulations performed assuming a modified survey geometry.

  15. Inference for High-dimensional Differential Correlation Matrices *

    PubMed Central

    Cai, T. Tony; Zhang, Anru

    2015-01-01

Motivated by differential co-expression analysis in genomics, we consider in this paper estimation and testing of high-dimensional differential correlation matrices. An adaptive thresholding procedure is introduced and theoretical guarantees are given. A minimax rate of convergence is established, and the proposed estimator is shown to be adaptively rate-optimal over collections of paired correlation matrices with approximately sparse differences. Simulation results show that the procedure significantly outperforms two other natural methods that are based on separate estimation of the individual correlation matrices. The procedure is also illustrated through an analysis of a breast cancer dataset, which provides evidence at the gene co-expression level that several genes, of which a subset has been previously verified, are associated with breast cancer. Hypothesis testing on the differential correlation matrices is also considered. A test, which is particularly well suited for testing against sparse alternatives, is introduced. In addition, other related problems, including estimation of a single sparse correlation matrix, estimation of the differential covariance matrices, and estimation of the differential cross-correlation matrices, are also discussed. PMID:26500380

  16. Inference for High-dimensional Differential Correlation Matrices.

    PubMed

    Cai, T Tony; Zhang, Anru

    2016-01-01

Motivated by differential co-expression analysis in genomics, we consider in this paper estimation and testing of high-dimensional differential correlation matrices. An adaptive thresholding procedure is introduced and theoretical guarantees are given. A minimax rate of convergence is established, and the proposed estimator is shown to be adaptively rate-optimal over collections of paired correlation matrices with approximately sparse differences. Simulation results show that the procedure significantly outperforms two other natural methods that are based on separate estimation of the individual correlation matrices. The procedure is also illustrated through an analysis of a breast cancer dataset, which provides evidence at the gene co-expression level that several genes, of which a subset has been previously verified, are associated with breast cancer. Hypothesis testing on the differential correlation matrices is also considered. A test, which is particularly well suited for testing against sparse alternatives, is introduced. In addition, other related problems, including estimation of a single sparse correlation matrix, estimation of the differential covariance matrices, and estimation of the differential cross-correlation matrices, are also discussed.

  17. Restricted random search method based on taboo search in the multiple minima problem

    NASA Astrophysics Data System (ADS)

    Hong, Seung Do; Jhon, Mu Shik

    1997-03-01

The restricted random search method is proposed as a simple Monte Carlo sampling method for quickly locating minima in multiple-minima problems. The method is based on taboo search, recently applied to continuous test functions. The concept of a taboo region is used instead of a taboo list, so that sampling of the region near a previously visited configuration is restricted. The method is applied to 2-dimensional test functions and to argon clusters, and it is found to be a practical and efficient way of finding near-global configurations of both.
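
    A minimal sketch of the taboo-region idea, with illustrative parameter values: trial points inside a ball of radius r_taboo around any accepted point are rejected, pushing samples toward unexplored regions.

      import numpy as np

      def restricted_random_search(f, lo, hi, n_samples=500,
                                   r_taboo=0.25, seed=0):
          rng = np.random.default_rng(seed)
          visited, best_x, best_f = [], None, np.inf
          attempts = 0
          while len(visited) < n_samples and attempts < 100 * n_samples:
              attempts += 1
              x = rng.uniform(lo, hi)
              if any(np.linalg.norm(x - v) < r_taboo for v in visited):
                  continue                        # inside a taboo region
              visited.append(x)
              fx = f(x)
              if fx < best_f:
                  best_x, best_f = x, fx
          return best_x, best_f

      # 2-D Rastrigin function, a standard multiple-minima test case.
      rastrigin = lambda x: np.sum(x * x - 10 * np.cos(2 * np.pi * x) + 10)
      print(restricted_random_search(rastrigin,
                                     np.full(2, -5.0), np.full(2, 5.0)))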

  18. Cognitive-affective depression and somatic symptoms clusters are differentially associated with maternal parenting and coparenting.

    PubMed

    Lamela, Diogo; Jongenelen, Inês; Morais, Ana; Figueiredo, Bárbara

    2017-09-01

Both depressive and somatic symptoms are significant predictors of parenting and coparenting problems. However, despite clear evidence of their co-occurrence, no study to date has examined the association between depressive-somatic symptom clusters and parenting and coparenting. The current research sought to identify and cross-validate clusters of cognitive-affective depressive symptoms and nonspecific somatic symptoms, as well as to test whether the clusters would differ on parenting and coparenting problems across three independent samples of mothers. Participants in Studies 1 and 3 consisted of 409 and 652 community mothers, respectively. Participants in Study 2 consisted of 162 mothers exposed to intimate partner violence. All participants prospectively completed self-report measures of depressive and nonspecific somatic symptoms and parenting (Studies 1 and 2) or coparenting (Study 3). Across studies, three depression-somatic symptom clusters were identified: no symptoms, high depression and low nonspecific somatic symptoms, and high depression and nonspecific somatic symptoms. The high depression-somatic symptoms cluster was associated with the highest levels of child physical maltreatment risk (Study 1) and overt-conflict coparenting (Study 3). No differences in perceived maternal competence (Study 2) or cooperative and undermining coparenting (Study 3) were found between the high depression and low somatic symptoms cluster and the high depression-somatic symptoms cluster. The results provide novel evidence for the strong associations between clusters of depression and nonspecific somatic symptoms and specific parenting and coparenting problems. Cluster stability across three independent samples suggests that the clusters may be generalizable. The results inform preventive approaches and evidence-based psychotherapeutic treatments.

  19. Explorations on High Dimensional Landscapes: Spin Glasses and Deep Learning

    NASA Astrophysics Data System (ADS)

    Sagun, Levent

    This thesis deals with understanding the structure of high-dimensional and non-convex energy landscapes. In particular, its focus is on the optimization of two classes of functions: homogeneous polynomials and loss functions that arise in machine learning. In the first part, the notion of complexity of a smooth, real-valued function is studied through its critical points. Existing theoretical results predict that certain random functions that are defined on high dimensional domains have a narrow band of values whose pre-image contains the bulk of its critical points. This section provides empirical evidence for convergence of gradient descent to local minima whose energies are near the predicted threshold justifying the existing asymptotic theory. Moreover, it is empirically shown that a similar phenomenon may hold for deep learning loss functions. Furthermore, there is a comparative analysis of gradient descent and its stochastic version showing that in high dimensional regimes the latter is a mere speedup. The next study focuses on the halting time of an algorithm at a given stopping condition. Given an algorithm, the normalized fluctuations of the halting time follow a distribution that remains unchanged even when the input data is sampled from a new distribution. Two qualitative classes are observed: a Gumbel-like distribution that appears in Google searches, human decision times, and spin glasses and a Gaussian-like distribution that appears in conjugate gradient method, deep learning with MNIST and random input data. Following the universality phenomenon, the Hessian of the loss functions of deep learning is studied. The spectrum is seen to be composed of two parts, the bulk which is concentrated around zero, and the edges which are scattered away from zero. Empirical evidence is presented for the bulk indicating how over-parametrized the system is, and for the edges that depend on the input data. Furthermore, an algorithm is proposed such that it would

  20. Dimensional assessment of personality pathology in patients with eating disorders.

    PubMed

    Goldner, E M; Srikameswaran, S; Schroeder, M L; Livesley, W J; Birmingham, C L

    1999-02-22

    This study examined patients with eating disorders on personality pathology using a dimensional method. Female subjects who met DSM-IV diagnostic criteria for eating disorder (n = 136) were evaluated and compared to an age-controlled general population sample (n = 68). We assessed 18 features of personality disorder with the Dimensional Assessment of Personality Pathology - Basic Questionnaire (DAPP-BQ). Factor analysis and cluster analysis were used to derive three clusters of patients. A five-factor solution was obtained with limited intercorrelation between factors. Cluster analysis produced three clusters with the following characteristics: Cluster 1 members (constituting 49.3% of the sample and labelled 'rigid') had higher mean scores on factors denoting compulsivity and interpersonal difficulties; Cluster 2 (18.4% of the sample) showed highest scores in factors denoting psychopathy, neuroticism and impulsive features, and appeared to constitute a borderline psychopathology group; Cluster 3 (32.4% of the sample) was characterized by few differences in personality pathology in comparison to the normal population sample. Cluster membership was associated with DSM-IV diagnosis -- a large proportion of patients with anorexia nervosa were members of Cluster 1. An empirical classification of eating-disordered patients derived from dimensional assessment of personality pathology identified three groups with clinical relevance.

  1. Approximating high-dimensional dynamics by barycentric coordinates with linear programming.

    PubMed

    Hirata, Yoshito; Shiro, Masanori; Takahashi, Nozomu; Aihara, Kazuyuki; Suzuki, Hideyuki; Mas, Paloma

    2015-01-01

The increasing development of novel methods and techniques facilitates the measurement of high-dimensional time series but challenges our ability for accurate modeling and predictions. The use of a general mathematical model requires the inclusion of many parameters, which are difficult to fit from the relatively short high-dimensional time series typically observed. Here, we propose a novel method to accurately model a high-dimensional time series. Our method extends barycentric coordinates to high-dimensional phase space by employing linear programming, while allowing for the approximation errors explicitly. The extension helps to produce free-running time-series predictions that preserve typical topological, dynamical, and/or geometric characteristics of the underlying attractors more accurately than the widely used radial basis function model. The method can be broadly applied, from helping to improve weather forecasting, to creating electronic instruments that sound more natural, and to comprehensively understanding complex biological data.
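
    A minimal sketch of the core step, assuming a library of previously observed states (rows) and a query state: the barycentric weights minimizing the L1 approximation error are found with one linear program. The paper's full free-running prediction loop is not reproduced.

      import numpy as np
      from scipy.optimize import linprog

      def barycentric_weights(library, query):
          m, d = library.shape
          # Variables: m convex weights w, then d slack errors e >= 0.
          c = np.concatenate([np.zeros(m), np.ones(d)])   # minimize sum(e)
          A_ub = np.block([[library.T, -np.eye(d)],       #  Xw - x <= e
                           [-library.T, -np.eye(d)]])     # -(Xw - x) <= e
          b_ub = np.concatenate([query, -query])
          A_eq = np.concatenate([np.ones(m), np.zeros(d)])[None, :]
          res = linprog(c, A_ub=A_ub, b_ub=b_ub, A_eq=A_eq, b_eq=[1.0],
                        method="highs")
          return res.x[:m]                                # weights over library

      # A one-step prediction then carries the weights forward:
      # x_next ~ weights @ library_next.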

  2. High-mass stars in Milky Way clusters

    NASA Astrophysics Data System (ADS)

    Negueruela, Ignacio

    2017-11-01

    Young open clusters are our laboratories for studying high-mass star formation and evolution. Unfortunately, the information that they provide is difficult to interpret, and sometimes contradictory. In this contribution, I present a few examples of the uncertainties that we face when confronting observations with theoretical models and our own assumptions.

  3. A hybrid intelligent method for three-dimensional short-term prediction of dissolved oxygen content in aquaculture

    PubMed Central

    Yu, Huihui; Cheng, Yanjun; Cheng, Qianqian; Li, Daoliang

    2018-01-01

A precise predictive model is important for obtaining a clear understanding of the changes in dissolved oxygen content in crab ponds. Highly accurate interval forecasting of dissolved oxygen content is fundamental to reducing risk, and three-dimensional prediction can provide more accurate results and overall guidance. In this study, a hybrid three-dimensional (3D) dissolved oxygen content prediction model based on a radial basis function (RBF) neural network, K-means and subtractive clustering was developed and named the subtractive clustering (SC)-K-means-RBF model. In this modeling process, the K-means and subtractive clustering methods were employed to determine the hyperparameters required in the RBF neural network model. The comparison of the predicted results of different traditional models validated the effectiveness and accuracy of the proposed hybrid SC-K-means-RBF model for three-dimensional prediction of dissolved oxygen content. Consequently, the proposed model can effectively display the three-dimensional distribution of dissolved oxygen content and serve as a guide for feeding and future studies. PMID:29466394

  4. Clustering Qualitative Data Based on Binary Equivalence Relations: Neighborhood Search Heuristics for the Clique Partitioning Problem

    ERIC Educational Resources Information Center

    Brusco, Michael J.; Kohn, Hans-Friedrich

    2009-01-01

    The clique partitioning problem (CPP) requires the establishment of an equivalence relation for the vertices of a graph such that the sum of the edge costs associated with the relation is minimized. The CPP has important applications for the social sciences because it provides a framework for clustering objects measured on a collection of nominal…

  5. An equivalent domain integral for analysis of two-dimensional mixed mode problems

    NASA Technical Reports Server (NTRS)

    Raju, I. S.; Shivakumar, K. N.

    1989-01-01

    An equivalent domain integral (EDI) method for calculating J-integrals for two-dimensional cracked elastic bodies subjected to mixed mode loading is presented. The total and product integrals consist of the sum of an area or domain integral and line integrals on the crack faces. The EDI method gave accurate values of the J-integrals for two mode I and two mixed mode problems. Numerical studies showed that domains consisting of one layer of elements are sufficient to obtain accurate J-integral values. Two procedures for separating the individual modes from the domain integrals are presented. The procedure that uses the symmetric and antisymmetric components of the stress and displacement fields to calculate the individual modes gave accurate values of the integrals for all the problems analyzed.

  6. A Hybrid Semi-Supervised Anomaly Detection Model for High-Dimensional Data.

    PubMed

    Song, Hongchao; Jiang, Zhuqing; Men, Aidong; Yang, Bo

    2017-01-01

Anomaly detection, which aims to identify observations that deviate from a nominal sample, is a challenging task for high-dimensional data. Traditional distance-based anomaly detection methods compute the neighborhood distance between each observation and suffer from the curse of dimensionality in high-dimensional space; for example, the distances between any pair of samples are similar and each sample may perform like an outlier. In this paper, we propose a hybrid semi-supervised anomaly detection model for high-dimensional data that consists of two parts: a deep autoencoder (DAE) and an ensemble k-nearest neighbor graphs- (K-NNG-) based anomaly detector. Benefiting from the ability of nonlinear mapping, the DAE is first trained to learn the intrinsic features of a high-dimensional dataset to represent the high-dimensional data in a more compact subspace. Several nonparametric KNN-based anomaly detectors are then built from different subsets that are randomly sampled from the whole dataset. The final prediction is made by all the anomaly detectors. The performance of the proposed method is evaluated on several real-life datasets, and the results confirm that the proposed hybrid model improves the detection accuracy and reduces the computational complexity.

  7. A Hybrid Semi-Supervised Anomaly Detection Model for High-Dimensional Data

    PubMed Central

    Jiang, Zhuqing; Men, Aidong; Yang, Bo

    2017-01-01

    Anomaly detection, which aims to identify observations that deviate from a nominal sample, is a challenging task for high-dimensional data. Traditional distance-based anomaly detection methods compute the neighborhood distance between each observation and suffer from the curse of dimensionality in high-dimensional space; for example, the distances between any pair of samples are similar and each sample may perform like an outlier. In this paper, we propose a hybrid semi-supervised anomaly detection model for high-dimensional data that consists of two parts: a deep autoencoder (DAE) and an ensemble k-nearest neighbor graphs- (K-NNG-) based anomaly detector. Benefiting from the ability of nonlinear mapping, the DAE is first trained to learn the intrinsic features of a high-dimensional dataset to represent the high-dimensional data in a more compact subspace. Several nonparametric KNN-based anomaly detectors are then built from different subsets that are randomly sampled from the whole dataset. The final prediction is made by all the anomaly detectors. The performance of the proposed method is evaluated on several real-life datasets, and the results confirm that the proposed hybrid model improves the detection accuracy and reduces the computational complexity. PMID:29270197
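
    A hedged sketch of the two-part structure: dimensionality reduction followed by an ensemble of k-NN detectors on random subsets. PCA stands in here for the paper's deep autoencoder (which requires a neural-network library), so this is an analogy to, not a reproduction of, the DAE-plus-K-NNG model.

      import numpy as np
      from sklearn.decomposition import PCA
      from sklearn.neighbors import NearestNeighbors

      def ensemble_knn_scores(X, dim=10, n_detectors=5, subset=0.5,
                              k=5, seed=0):
          rng = np.random.default_rng(seed)
          Z = PCA(n_components=dim).fit_transform(X)   # compact subspace
          scores = np.zeros(len(Z))
          for _ in range(n_detectors):
              idx = rng.choice(len(Z), int(subset * len(Z)), replace=False)
              nn = NearestNeighbors(n_neighbors=k).fit(Z[idx])
              dist, _ = nn.kneighbors(Z)               # k nearest in subset
              scores += dist[:, -1]                    # k-th neighbor distance
          return scores / n_detectors                  # larger = more anomalous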

  8. Improving local clustering based top-L link prediction methods via asymmetric link clustering information

    NASA Astrophysics Data System (ADS)

    Wu, Zhihao; Lin, Youfang; Zhao, Yiji; Yan, Hongyan

    2018-02-01

Networks can represent a wide range of complex systems, such as social, biological and technological systems. Link prediction is one of the most important problems in network analysis and has attracted much research interest recently. Many link prediction methods have been proposed to solve this problem with various techniques. We note that clustering information plays an important role in solving the link prediction problem. In the previous literature, the node clustering coefficient appears frequently in many link prediction methods. However, the node clustering coefficient is limited in describing the role of a common neighbor in different local networks, because it cannot distinguish the different clustering abilities of a node with respect to different node pairs. In this paper, we shift our focus from nodes to links and propose the concept of an asymmetric link clustering (ALC) coefficient. Further, we improve three node-clustering-based link prediction methods via the concept of ALC. The experimental results demonstrate that ALC-based methods outperform node-clustering-based methods, especially achieving remarkable improvements on food web, hamster friendship and Internet networks. Besides, compared with other methods, the performance of ALC-based methods is very stable in both globalized and personalized top-L link prediction tasks.

  9. HYPOTHESIS TESTING FOR HIGH-DIMENSIONAL SPARSE BINARY REGRESSION

    PubMed Central

    Mukherjee, Rajarshi; Pillai, Natesh S.; Lin, Xihong

    2015-01-01

    In this paper, we study the detection boundary for minimax hypothesis testing in the context of high-dimensional, sparse binary regression models. Motivated by genetic sequencing association studies for rare variant effects, we investigate the complexity of the hypothesis testing problem when the design matrix is sparse. We observe a new phenomenon in the behavior of detection boundary which does not occur in the case of Gaussian linear regression. We derive the detection boundary as a function of two components: a design matrix sparsity index and signal strength, each of which is a function of the sparsity of the alternative. For any alternative, if the design matrix sparsity index is too high, any test is asymptotically powerless irrespective of the magnitude of signal strength. For binary design matrices with the sparsity index that is not too high, our results are parallel to those in the Gaussian case. In this context, we derive detection boundaries for both dense and sparse regimes. For the dense regime, we show that the generalized likelihood ratio is rate optimal; for the sparse regime, we propose an extended Higher Criticism Test and show it is rate optimal and sharp. We illustrate the finite sample properties of the theoretical results using simulation studies. PMID:26246645
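
    For reference, a minimal sketch of the classical Higher Criticism statistic computed from n p-values; the extended HC test proposed in the paper adds modifications for sparse binary designs that are not reproduced here.

      import numpy as np

      def higher_criticism(pvals, alpha0=0.5):
          p = np.clip(np.sort(np.asarray(pvals)), 1e-12, 1 - 1e-12)
          n = len(p)
          i = np.arange(1, n + 1)
          hc = np.sqrt(n) * (i / n - p) / np.sqrt(p * (1.0 - p))
          return hc[i <= alpha0 * n].max()    # maximize over smallest p-values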

  10. Stochastic clustering of material surface under high-heat plasma load

    NASA Astrophysics Data System (ADS)

    Budaev, Viacheslav P.

    2017-11-01

The results of a study of surfaces formed under high-temperature plasma loads on various materials, such as tungsten, carbon and stainless steel, are presented. High-temperature plasma irradiation leads to inhomogeneous stochastic clustering of the surface with self-similar granularity (fractality) on scales from the nanoscale to macroscales. Cauliflower-like structures of tungsten and carbon materials are formed under high heat plasma loads in fusion devices. The statistical characteristics of hierarchical granularity and scale invariance are estimated. They differ qualitatively from the roughness of an ordinary Brownian surface, which is possibly due to universal mechanisms of stochastic clustering of material surfaces under the influence of high-temperature plasma.

  11. High-dimensional quantum cloning and applications to quantum hacking

    PubMed Central

    Bouchard, Frédéric; Fickler, Robert; Boyd, Robert W.; Karimi, Ebrahim

    2017-01-01

    Attempts at cloning a quantum system result in the introduction of imperfections in the state of the copies. This is a consequence of the no-cloning theorem, which is a fundamental law of quantum physics and the backbone of security for quantum communications. Although perfect copies are prohibited, a quantum state may be copied with maximal accuracy via various optimal cloning schemes. Optimal quantum cloning, which lies at the border of the physical limit imposed by the no-signaling theorem and the Heisenberg uncertainty principle, has been experimentally realized for low-dimensional photonic states. However, an increase in the dimensionality of quantum systems is greatly beneficial to quantum computation and communication protocols. Nonetheless, no experimental demonstration of optimal cloning machines has hitherto been shown for high-dimensional quantum systems. We perform optimal cloning of high-dimensional photonic states by means of the symmetrization method. We show the universality of our technique by conducting cloning of numerous arbitrary input states and fully characterize our cloning machine by performing quantum state tomography on cloned photons. In addition, a cloning attack on a Bennett and Brassard (BB84) quantum key distribution protocol is experimentally demonstrated to reveal the robustness of high-dimensional states in quantum cryptography. PMID:28168219

  12. High-dimensional quantum cloning and applications to quantum hacking.

    PubMed

    Bouchard, Frédéric; Fickler, Robert; Boyd, Robert W; Karimi, Ebrahim

    2017-02-01

    Attempts at cloning a quantum system result in the introduction of imperfections in the state of the copies. This is a consequence of the no-cloning theorem, which is a fundamental law of quantum physics and the backbone of security for quantum communications. Although perfect copies are prohibited, a quantum state may be copied with maximal accuracy via various optimal cloning schemes. Optimal quantum cloning, which lies at the border of the physical limit imposed by the no-signaling theorem and the Heisenberg uncertainty principle, has been experimentally realized for low-dimensional photonic states. However, an increase in the dimensionality of quantum systems is greatly beneficial to quantum computation and communication protocols. Nonetheless, no experimental demonstration of optimal cloning machines has hitherto been shown for high-dimensional quantum systems. We perform optimal cloning of high-dimensional photonic states by means of the symmetrization method. We show the universality of our technique by conducting cloning of numerous arbitrary input states and fully characterize our cloning machine by performing quantum state tomography on cloned photons. In addition, a cloning attack on a Bennett and Brassard (BB84) quantum key distribution protocol is experimentally demonstrated to reveal the robustness of high-dimensional states in quantum cryptography.

  13. Extracting Galaxy Cluster Gas Inhomogeneity from X-Ray Surface Brightness: A Statistical Approach and Application to Abell 3667

    NASA Astrophysics Data System (ADS)

    Kawahara, Hajime; Reese, Erik D.; Kitayama, Tetsu; Sasaki, Shin; Suto, Yasushi

    2008-11-01

    Our previous analysis indicates that small-scale fluctuations in the intracluster medium (ICM) from cosmological hydrodynamic simulations follow the lognormal probability density function. In order to test the lognormal nature of the ICM directly against X-ray observations of galaxy clusters, we develop a method of extracting statistical information about the three-dimensional properties of the fluctuations from the two-dimensional X-ray surface brightness. We first create a set of synthetic clusters with lognormal fluctuations around their mean profile given by spherical isothermal β-models, later considering polytropic temperature profiles as well. Performing mock observations of these synthetic clusters, we find that the resulting X-ray surface brightness fluctuations also follow the lognormal distribution fairly well. Systematic analysis of the synthetic clusters provides an empirical relation between the three-dimensional density fluctuations and the two-dimensional X-ray surface brightness. We analyze Chandra observations of the galaxy cluster Abell 3667, and find that its X-ray surface brightness fluctuations follow the lognormal distribution. While the lognormal model was originally motivated by cosmological hydrodynamic simulations, this is the first observational confirmation of the lognormal signature in a real cluster. Finally we check the synthetic cluster results against clusters from cosmological hydrodynamic simulations. As a result of the complex structure exhibited by simulated clusters, the empirical relation between the two- and three-dimensional fluctuation properties calibrated with synthetic clusters when applied to simulated clusters shows large scatter. Nevertheless we are able to reproduce the true value of the fluctuation amplitude of simulated clusters within a factor of 2 from their two-dimensional X-ray surface brightness alone. Our current methodology combined with existing observational data is useful in describing and inferring the
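
    A minimal sketch of the basic lognormality check, for hypothetical arrays of surface brightness and an azimuthally averaged mean profile; the paper's synthetic-cluster calibration and the empirical 2D-to-3D relation are not reproduced.

      import numpy as np
      from scipy import stats

      def lognormal_check(surface_brightness, mean_profile):
          delta = surface_brightness / mean_profile   # fluctuation about mean
          log_delta = np.log(delta[delta > 0])
          stat, pval = stats.normaltest(log_delta)    # Gaussianity of the log
          return stat, pval, log_delta.std()          # std sets the amplitude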

  14. Experiences modeling ocean circulation problems on a 30 node commodity cluster with 3840 GPU processor cores.

    NASA Astrophysics Data System (ADS)

    Hill, C.

    2008-12-01

Low cost graphics cards today use many relatively simple compute cores to deliver memory bandwidth of more than 100 GB/s and theoretical floating point performance of more than 500 GFlop/s. Right now this performance is, however, only accessible to highly parallel algorithm implementations that (i) can use a hundred or more 32-bit floating point cores executing concurrently, (ii) can work with graphics memory that resides on the graphics card side of the graphics bus, and (iii) can be partially expressed in a language that can be compiled by a graphics programming tool. In this talk we describe our experiences implementing a complete, but relatively simple, time dependent shallow-water equations simulation targeting a cluster of 30 computers, each hosting one graphics card. The implementation takes into account the considerations (i), (ii) and (iii) listed previously. We code our algorithm as a series of numerical kernels. Each kernel is designed to be executed by multiple threads of a single process. Kernels are passed memory blocks to compute over, which can be persistent blocks of memory on a graphics card. Each kernel is individually implemented using the NVIDIA CUDA language but driven from a higher level supervisory code that is almost identical to a standard model driver. The supervisory code controls the overall simulation timestepping, but is written to minimize data transfer between main memory and graphics memory (a massive performance bottleneck on current systems). Using the recipe outlined, we can boost the performance of our cluster by nearly an order of magnitude relative to the same algorithm executing only on the cluster CPUs. Achieving this performance boost requires that many threads are available to each graphics processor for execution within each numerical kernel and that the simulation's working set of data can fit into the graphics card memory. As we describe, this puts interesting upper and lower bounds on the problem sizes

  15. The escape of high explosive products: An exact-solution problem for verification of hydrodynamics codes

    DOE PAGES

    Doebling, Scott William

    2016-10-22

This paper documents the escape of high explosive (HE) products problem. The problem, first presented by Fickett & Rivard, tests the implementation and numerical behavior of a high explosive detonation and energy release model and its interaction with an associated compressible hydrodynamics simulation code. The problem simulates the detonation of a finite-length, one-dimensional piece of HE that is driven by a piston from one end and adjacent to a void at the other end. The HE equation of state is modeled as a polytropic ideal gas. The HE detonation is assumed to be instantaneous with an infinitesimal reaction zone. Via judicious selection of the material specific heat ratio, the problem has an exact solution with linear characteristics, enabling a straightforward calculation of the physical variables as a function of time and space. Lastly, implementation of the exact solution in the Python code ExactPack is discussed, as are verification cases for the exact solution code.

  16. Spectral-element simulation of two-dimensional elastic wave propagation in fully heterogeneous media on a GPU cluster

    NASA Astrophysics Data System (ADS)

    Rudianto, Indra; Sudarmaji

    2018-04-01

We present an implementation of the spectral-element method for simulation of two-dimensional elastic wave propagation in fully heterogeneous media. We have incorporated most realistic geological features into the model, including surface topography, curved layer interfaces, and 2-D wave-speed heterogeneity. To accommodate such complexity, we use an unstructured quadrilateral meshing technique. Simulations were performed on a GPU cluster consisting of 24 Intel Xeon CPU cores and 4 NVIDIA Quadro graphics cards, using a CUDA and MPI implementation. We speed up the computation by a factor of about 5 compared to MPI only, and by a factor of about 40 compared to a serial implementation.

  17. Approximating high-dimensional dynamics by barycentric coordinates with linear programming

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Hirata, Yoshito, E-mail: yoshito@sat.t.u-tokyo.ac.jp; Aihara, Kazuyuki; Suzuki, Hideyuki

The increasing development of novel methods and techniques facilitates the measurement of high-dimensional time series but challenges our ability for accurate modeling and predictions. The use of a general mathematical model requires the inclusion of many parameters, which are difficult to fit from the relatively short high-dimensional time series typically observed. Here, we propose a novel method to accurately model a high-dimensional time series. Our method extends barycentric coordinates to high-dimensional phase space by employing linear programming, while allowing for the approximation errors explicitly. The extension helps to produce free-running time-series predictions that preserve typical topological, dynamical, and/or geometric characteristics of the underlying attractors more accurately than the widely used radial basis function model. The method can be broadly applied, from helping to improve weather forecasting, to creating electronic instruments that sound more natural, and to comprehensively understanding complex biological data.

  18. Hierarchical and non-hierarchical {lambda} elements for one dimensional problems with unknown strength of singularity

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Wong, K.K.; Surana, K.S.

    1996-10-01

This paper presents a new and general procedure for designing hierarchical and non-hierarchical special elements called λ elements for one-dimensional singular problems where the strength of the singularity is unknown. The λ element formulations presented here permit correct numerical simulation of linear as well as non-linear singular problems without a priori knowledge of the strength of the singularity. A procedure is also presented for determining the exact strength of the singularity using the converged solution. It is shown that in special instances, the general formulation of λ elements can also be made hierarchical. The λ elements presented here are of type C^0 and provide C^0 inter-element continuity with p-version elements. One-dimensional steady state radial flow of an upper convected Maxwell fluid is considered as a sample problem. Since in this case the λ_i are known, this problem provides a good example for investigating the performance of the formulation proposed here. A least squares approach (or Least Squares Finite Element Formulation: LSFEF) is used to construct the integral form (error functional I) from the differential equations. Numerical studies are presented for radially inward flow of an upper convected Maxwell fluid with inner radius r_i = 0.1, 0.01, etc., and Deborah number De = 2.

  19. Statistical Issues in Galaxy Cluster Cosmology

    NASA Technical Reports Server (NTRS)

    Mantz, Adam

    2013-01-01

    The number and growth of massive galaxy clusters are sensitive probes of cosmological structure formation. Surveys at various wavelengths can detect clusters to high redshift, but the fact that cluster mass is not directly observable complicates matters, requiring us to simultaneously constrain scaling relations of observable signals with mass. The problem can be cast as one of regression, in which the data set is truncated, the (cosmology-dependent) underlying population must be modeled, and strong, complex correlations between measurements often exist. Simulations of cosmological structure formation provide a robust prediction for the number of clusters in the Universe as a function of mass and redshift (the mass function), but they cannot reliably predict the observables used to detect clusters in sky surveys (e.g. X-ray luminosity). Consequently, observers must constrain observable-mass scaling relations using additional data, and use the scaling relation model in conjunction with the mass function to predict the number of clusters as a function of redshift and luminosity.
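
    As a toy illustration of the last step described above (combining a mass function with an observable-mass scaling relation to predict cluster counts), the following sketch integrates an assumed power-law mass function against a lognormal luminosity-mass relation. All functional forms and numbers here are invented placeholders, not values from this work.

      import numpy as np
      from scipy.stats import norm

      masses = np.logspace(13.5, 15.5, 400)          # cluster mass grid [Msun]
      dn_dlnM = 1e-5 * (masses / 1e14) ** -1.9       # toy mass function [Mpc^-3]

      def p_detect(L_lim, alpha=1.6, scatter=0.4):
          """P(L > L_lim | M) under a toy lognormal L-M scaling relation."""
          mean_lnL = alpha * np.log(masses / 1e14)   # ln L, in units of L(1e14 Msun)
          return norm.sf(np.log(L_lim), loc=mean_lnL, scale=scatter)

      # Expected detectable counts: mass function weighted by detection probability.
      counts = np.trapz(dn_dlnM * p_detect(L_lim=2.0), np.log(masses))
      print(f"{counts:.2e} detectable clusters per unit comoving volume")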

  20. The Effect of Mergers on Galaxy Cluster Mass Estimates

    NASA Astrophysics Data System (ADS)

    Johnson, Ryan E.; Zuhone, John A.; Thorsen, Tessa; Hinds, Andre

    2015-08-01

    At vertices within the filamentary structure that describes the universal matter distribution, clusters of galaxies grow hierarchically through merging with other clusters. As such, the most massive galaxy clusters should have experienced many such mergers in their histories. Though we cannot see them evolve over time, these mergers leave lasting, measurable effects in the cluster galaxies' phase space. By simulating several different galaxy cluster mergers here, we examine how the cluster galaxies' kinematics are altered as a result of these mergers. Further, we also examine the effect of our line-of-sight viewing angle with respect to the merger axis. In projecting the 6-dimensional galaxy phase space onto a 3-dimensional plane, we are able to simulate how these clusters might actually appear to optical redshift surveys. We find that for those optical cluster statistics most often used as a proxy for the cluster mass (variants of σv), the uncertainty due to an imprecise or unknown line of sight may alter the derived cluster masses more than the kinematic disturbance of the merger itself. Finally, by examining these and several other clustering statistics, we find that significant events (such as pericentric crossings) are identifiable over a range of merger initial conditions and from many different lines of sight.
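
    The σv-based mass proxies discussed above can be illustrated in a few lines of numpy: project member velocities onto different sightlines and apply a virial-type M ∝ σ³ scaling. The velocity distribution and normalization constant below are toy assumptions, not values from these simulations.

      import numpy as np

      def los_dispersion(velocities, sightline):
          """Velocity dispersion of cluster members along a unit sightline."""
          u = np.asarray(sightline, float)
          u /= np.linalg.norm(u)
          return (velocities @ u).std(ddof=1)

      rng = np.random.default_rng(1)
      vel = rng.normal(scale=900.0, size=(500, 3))   # toy 3D member velocities [km/s]
      for axis in [(1, 0, 0), (0, 0, 1), (1, 1, 1)]:
          sigma = los_dispersion(vel, axis)
          mass = 1e15 * (sigma / 1000.0) ** 3        # illustrative M ~ sigma^3 [Msun]
          print(axis, f"sigma = {sigma:.0f} km/s, M ~ {mass:.2e} Msun")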

  1. The degree-related clustering coefficient and its application to link prediction

    NASA Astrophysics Data System (ADS)

    Liu, Yangyang; Zhao, Chengli; Wang, Xiaojie; Huang, Qiangjuan; Zhang, Xue; Yi, Dongyun

    2016-07-01

    Link prediction plays a significant role in explaining the evolution of networks. However, it remains a challenging problem, and in recent years it has been addressed using topological information alone. Based on the belief that network nodes with a great number of common neighbors are more likely to be connected, many similarity indices have achieved considerable accuracy and efficiency. Motivated by the natural assumption that the effect of missing links on the estimation of a node's clustering ability could be related to node degree, in this paper we propose a degree-related clustering coefficient index to quantify the clustering ability of nodes. Unlike the classical clustering coefficient, our new coefficient is highly robust when the observed bias of links is considered. Furthermore, we propose a degree-related clustering ability path (DCP) index, which applies the proposed coefficient to the link prediction problem. Experiments on 12 real-world networks show that our proposed method is highly accurate and robust compared with four common-neighbor-based similarity indices (Common Neighbors (CN), Adamic-Adar (AA), Resource Allocation (RA), and Preferential Attachment (PA)) and the recently introduced clustering ability (CA) index.
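
    The abstract does not give the closed form of the degree-related coefficient or the DCP index, so the sketch below only reproduces the classical baselines it is compared against (CN-style indices and the classical clustering coefficient), using networkx; the example graph and candidate node pairs are arbitrary.

      import networkx as nx

      G = nx.karate_club_graph()
      candidates = [(0, 9), (1, 33), (5, 24)]   # arbitrary non-edges to score

      # Common-neighbor-based baselines mentioned in the abstract:
      print("CN:", [(u, v, len(list(nx.common_neighbors(G, u, v))))
                    for u, v in candidates])
      for name, gen in [("AA", nx.adamic_adar_index(G, candidates)),
                        ("RA", nx.resource_allocation_index(G, candidates)),
                        ("PA", nx.preferential_attachment(G, candidates))]:
          print(name + ":", [(u, v, round(s, 3)) for u, v, s in gen])

      # Classical clustering coefficient; the paper's degree-related variant
      # reweights this quantity by node degree (exact form not stated above).
      cc = nx.clustering(G)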

  2. High-performance parallel analysis of coupled problems for aircraft propulsion

    NASA Technical Reports Server (NTRS)

    Felippa, C. A.; Farhat, C.; Chen, P.-S.; Gumaste, U.; Leoinne, M.; Stern, P.

    1995-01-01

    This research program deals with the application of high-performance computing methods to the numerical simulation of complete jet engines. The program was initiated in 1993 by applying two-dimensional parallel aeroelastic codes to the interior gas flow problem of a by-pass jet engine. The fluid mesh generation, domain decomposition and solution capabilities were successfully tested. Attention was then focused on methodology for the partitioned analysis of the interaction of the gas flow with a flexible structure and with the fluid mesh motion driven by these structural displacements. The latter is treated by an ALE technique that models the fluid mesh motion as that of a fictitious mechanical network laid along the edges of near-field fluid elements. New partitioned analysis procedures to treat this coupled 3-component problem were developed in 1994. These procedures involved delayed corrections and subcycling, and have been successfully tested on several massively parallel computers. For the global steady-state axisymmetric analysis of a complete engine we have decided to use the NASA-sponsored ENG10 program, which uses a regular FV-multiblock-grid discretization in conjunction with circumferential averaging to include effects of blade forces, loss, combustor heat addition, blockage, bleeds and convective mixing. A load-balancing preprocessor for parallel versions of ENG10 has been developed. It is planned to use the steady-state global solution provided by ENG10 as input to a localized three-dimensional FSI analysis for engine regions where aeroelastic effects may be important.

  3. Grid-Enabled High Energy Physics Research using a Beowulf Cluster

    NASA Astrophysics Data System (ADS)

    Mahmood, Akhtar

    2005-04-01

    At Edinboro University of Pennsylvania, we have built an 8-node, 25-Gflops Beowulf cluster with 2.5 TB of disk storage space to carry out grid-enabled, data-intensive high energy physics research for the ATLAS experiment via Grid3. We will describe how we built and configured our cluster, which we have named the Sphinx Beowulf Cluster. We will describe the results of our cluster benchmark studies and the run-time plots of several parallel application codes. Once fully functional, the cluster will be part of Grid3 [www.ivdgl.org/grid3]. The current ATLAS simulation grid application models the entire physical process, from the proton collisions and the detector's response to the collision debris through the complete reconstruction of the event from analyses of these responses. The end result is a detailed set of data that simulates the real physical collision event inside a particle detector. The Grid is the new IT infrastructure for 21st-century science -- a new computing paradigm that is poised to transform the practice of large-scale data-intensive research in science and engineering. The Grid will allow scientists worldwide to view and analyze the huge amounts of data flowing from the large-scale experiments in high energy physics. The Grid is expected to bring together geographically and organizationally dispersed computational resources, such as CPUs, storage systems, communication systems, and data sources.

  4. Experimental ladder proof of Hardy's nonlocality for high-dimensional quantum systems

    NASA Astrophysics Data System (ADS)

    Chen, Lixiang; Zhang, Wuhong; Wu, Ziwen; Wang, Jikang; Fickler, Robert; Karimi, Ebrahim

    2017-08-01

    Recent years have witnessed a rapidly growing interest in high-dimensional quantum entanglement for fundamental studies as well as towards novel applications. Therefore, the ability to verify entanglement between physical qudits, d -dimensional quantum systems, is of crucial importance. To show nonclassicality, Hardy's paradox represents "the best version of Bell's theorem" without using inequalities. However, so far it has only been tested experimentally for bidimensional vector spaces. Here, we formulate a theoretical framework to demonstrate the ladder proof of Hardy's paradox for arbitrary high-dimensional systems. Furthermore, we experimentally demonstrate the ladder proof by taking advantage of the orbital angular momentum of high-dimensionally entangled photon pairs. We perform the ladder proof of Hardy's paradox for dimensions 3 and 4, both with the ladder up to the third step. Our paper paves the way towards a deeper understanding of the nature of high-dimensionally entangled quantum states and may find applications in quantum information science.

  5. 2DRMP: A suite of two-dimensional R-matrix propagation codes

    NASA Astrophysics Data System (ADS)

    Scott, N. S.; Scott, M. P.; Burke, P. G.; Stitt, T.; Faro-Maza, V.; Denis, C.; Maniopoulou, A.

    2009-12-01

    The R-matrix method has proved to be a remarkably stable, robust and efficient technique for solving the close-coupling equations that arise in electron and photon collisions with atoms, ions and molecules. During the last thirty-four years a series of related R-matrix program packages have been published periodically in CPC. These packages are primarily concerned with low-energy scattering where the incident energy is insufficient to ionise the target. In this paper we describe 2DRMP, a suite of two-dimensional R-matrix propagation programs aimed at creating virtual experiments on high performance and grid architectures to enable the study of electron scattering from H-like atoms and ions at intermediate energies.
    Program summary:
    Program title: 2DRMP
    Catalogue identifier: AEEA_v1_0
    Program summary URL: http://cpc.cs.qub.ac.uk/summaries/AEEA_v1_0.html
    Program obtainable from: CPC Program Library, Queen's University, Belfast, N. Ireland
    Licensing provisions: Standard CPC licence, http://cpc.cs.qub.ac.uk/licence/licence.html
    No. of lines in distributed program, including test data, etc.: 196 717
    No. of bytes in distributed program, including test data, etc.: 3 819 727
    Distribution format: tar.gz
    Programming language: Fortran 95, MPI
    Computer: Tested on CRAY XT4 [1]; IBM eServer 575 [2]; Itanium II cluster [3]
    Operating system: Tested on UNICOS/lc [1]; IBM AIX [2]; Red Hat Linux Enterprise AS [3]
    Has the code been vectorised or parallelised?: Yes. 16 cores were used for the small test run
    Classification: 2.4
    External routines: BLAS, LAPACK, PBLAS, ScaLAPACK
    Subprograms used: ADAZ_v1_1
    Nature of problem: 2DRMP is a suite of programs aimed at creating virtual experiments on high performance architectures to enable the study of electron scattering from H-like atoms and ions at intermediate energies.
    Solution method: Two-dimensional R-matrix propagation theory. The (r,r) space of the internal region is subdivided into a number of subregions. Local R-matrices are constructed

  6. Implementation of pattern generation algorithm in forming Gilmore and Gomory model for two dimensional cutting stock problem

    NASA Astrophysics Data System (ADS)

    Octarina, Sisca; Radiana, Mutia; Bangun, Putra B. J.

    2018-01-01

    The two-dimensional cutting stock problem (CSP) is the problem of determining cutting patterns from a set of stock pieces of standard length and width to fulfill the demand for items. Cutting patterns are determined so as to minimize the usage of stock. This research implemented a pattern generation algorithm to formulate the Gilmore and Gomory model of the two-dimensional CSP. The constraints of the Gilmore and Gomory model ensure that the strips cut in the first stage are used in the second stage. The Branch and Cut method was used to obtain the optimal solution. The results show that many pattern combinations arise when the optimal cutting patterns of the first stage are combined with those of the second stage.
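
    As an illustration of the pattern-generation step, the sketch below enumerates the maximal one-dimensional cutting patterns for a single stage (item widths cut from a strip of fixed width); such pattern columns are what enter a Gilmore-Gomory-style formulation. The widths and capacity are made-up numbers, and this is not the paper's own algorithm.

      def maximal_patterns(widths, capacity):
          """Yield maximal cutting patterns: item counts that fit in `capacity`
          and leave no room for even the smallest item."""
          smallest = min(widths)

          def rec(i, remaining, counts):
              if i == len(widths):
                  if remaining < smallest:       # maximal: nothing else fits
                      yield tuple(counts)
                  return
              for k in range(remaining // widths[i], -1, -1):
                  counts.append(k)
                  yield from rec(i + 1, remaining - k * widths[i], counts)
                  counts.pop()

          yield from rec(0, capacity, [])

      # Example: items of width 7, 5 and 3 cut from a strip of width 16.
      for pattern in maximal_patterns([7, 5, 3], 16):
          print(pattern)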

  7. Abundances of Local Group Globular Clusters Using High Resolution Integrated Light Spectroscopy

    NASA Astrophysics Data System (ADS)

    Sakari, Charli; McWilliam, A.; Venn, K.; Shetrone, M. D.; Dotter, A. L.; Mackey, D.

    2014-01-01

    Abundances and kinematics of extragalactic globular clusters provide valuable clues about galaxy and globular cluster formation in a wide variety of environments. In order to obtain such information about distant, unresolved systems, specific observational techniques are required. An Integrated Light Spectrum (ILS) provides a single spectrum from an entire stellar population, and can therefore be used to determine integrated cluster abundances. This dissertation investigates the accuracy of high-resolution ILS analysis methods, using ILSs (taken with the Hobby-Eberly Telescope) of globular clusters associated with the Milky Way (47 Tuc, M3, M13, NGC 7006, and M15), and then applies the method to globular clusters in the outer halo of M31 (from the Pan-Andromeda Archaeological Survey, or PAndAS). Results show that: a) as expected, the high-resolution method reproduces individual stellar abundances for elements that do not vary within a cluster; b) the presence of multiple populations does affect the abundances of elements that vary within the cluster; c) certain abundance ratios are very sensitive to systematic effects, while others are not; and d) certain abundance ratios (e.g. [Ca/Fe]) can be accurately obtained from unresolved systems. Applications of ILABUNDS to the PAndAS clusters reveal that accretion may have played an important role in the formation of M31's outer halo.

  8. InCHlib - interactive cluster heatmap for web applications.

    PubMed

    Skuta, Ctibor; Bartůněk, Petr; Svozil, Daniel

    2014-12-01

    Hierarchical clustering is an exploratory data analysis method that reveals the groups (clusters) of similar objects. The result of the hierarchical clustering is a tree structure called dendrogram that shows the arrangement of individual clusters. To investigate the row/column hierarchical cluster structure of a data matrix, a visualization tool called 'cluster heatmap' is commonly employed. In the cluster heatmap, the data matrix is displayed as a heatmap, a 2-dimensional array in which the colour of each element corresponds to its value. The rows/columns of the matrix are ordered such that similar rows/columns are near each other. The ordering is given by the dendrogram which is displayed on the side of the heatmap. We developed InCHlib (Interactive Cluster Heatmap Library), a highly interactive and lightweight JavaScript library for cluster heatmap visualization and exploration. InCHlib enables the user to select individual or clustered heatmap rows, to zoom in and out of clusters or to flexibly modify heatmap appearance. The cluster heatmap can be augmented with additional metadata displayed in a different colour scale. In addition, to further enhance the visualization, the cluster heatmap can be interconnected with external data sources or analysis tools. Data clustering and the preparation of the input file for InCHlib is facilitated by the Python utility script inchlib_clust. The cluster heatmap is one of the most popular visualizations of large chemical and biomedical data sets originating, e.g., in high-throughput screening, genomics or transcriptomics experiments. The presented JavaScript library InCHlib is a client-side solution for cluster heatmap exploration. InCHlib can be easily deployed into any modern web application and configured to cooperate with external tools and data sources. Though InCHlib is primarily intended for the analysis of chemical or biological data, it is a versatile tool whose application domain is not limited to the life sciences.
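
    The server-side step that InCHlib delegates to inchlib_clust (hierarchical clustering plus the dendrogram-given row ordering) can be sketched generically with scipy; this is not InCHlib's own code, just the standard ordering computation behind any cluster heatmap, on invented toy data.

      import numpy as np
      from scipy.cluster.hierarchy import linkage, leaves_list

      rng = np.random.default_rng(2)
      data = rng.normal(size=(30, 8))        # toy data matrix (rows = objects)
      Z = linkage(data, method="ward")       # row dendrogram
      order = leaves_list(Z)                 # leaf order given by the dendrogram
      heatmap_rows = data[order]             # similar rows now sit next to each other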

  9. Using three-dimensional silicone "boots" to solve complex remedial design problems in curtain walls

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Chen, Y.J.

    1998-12-31

    Stick system curtain wall leak problems are frequently caused by water entry at the splice joints of the curtain wall frame and failure of the internal metal joinery seals. Remedial solutions involving occupied buildings inevitably face the multiple constraints of existing construction and business operations not present during the original curtain wall construction. In most cases, even partial disassembly of the curtain wall for internal seal repairs is not feasible. Remedial solutions which must be executed from the exterior of the curtain wall often involve wet-applied or preformed sealant tape bridge joints. However, some of the more complex joints cannot be repaired effectively or economically with the conventional bridge joint. Fortunately, custom fabricated three-dimensional preformed sealant boots are becoming available to address these situations. This paper discusses the design considerations and the selective use of three-dimensional preformed boots in sealing complex joint geometry that would not be effective with the conventional two-dimensional bridge joint.

  10. High Altitude Medical Problems

    PubMed Central

    Hultgren, Herbert N.

    1979-01-01

    Increased travel to high altitude areas by mountaineers and nonclimbing tourists has emphasized the clinical problems associated with rapid ascent. Acute mountain sickness affects most sojourners at elevations above 10,000 feet. Symptoms are usually worse on the second or third day after arrival. Gradual ascent, spending one to three days at an intermediate altitude, and the use of acetazolamide (Diamox) will prevent or ameliorate symptoms in most instances. Serious and potentially fatal problems, such as high altitude pulmonary edema or cerebral edema, occur in approximately 0.5 percent to 1.0 percent of visitors to elevations above 10,000 feet—especially with heavy physical exertion on arrival, such as climbing or skiing. Early recognition, high flow oxygen therapy and prompt descent are crucially important in management. Our knowledge of the causes of these and other high altitude problems, such as retinal hemorrhage, systemic edema and pulmonary hypertension, is still incomplete. Even less is known of the effect of high altitudes on medical conditions common at sea level or on the action of commonly used drugs. PMID:483805

  11. On the Measure and the Structure of the Free Boundary of the Lower Dimensional Obstacle Problem

    NASA Astrophysics Data System (ADS)

    Focardi, Matteo; Spadaro, Emanuele

    2018-04-01

    We provide a thorough description of the free boundary for the lower dimensional obstacle problem in R^{n+1} up to sets of null H^{n-1} measure. In particular, we prove (i) local finiteness of the (n-1)-dimensional Hausdorff measure of the free boundary, (ii) H^{n-1}-rectifiability of the free boundary, (iii) classification of the frequencies up to a set of Hausdorff dimension at most (n-2) and classification of the blow-ups at H^{n-1} almost every free boundary point.

  12. Big Data Clustering via Community Detection and Hyperbolic Network Embedding in IoT Applications

    PubMed Central

    Sotiropoulos, Konstantinos

    2018-01-01

    In this paper, we present a novel data clustering framework for big sensory data produced by IoT applications. Based on a network representation of the relations among multi-dimensional data, data clustering is mapped to node clustering over the produced data graphs. To address the potentially very large scale of such datasets/graphs, which tests the limits of state-of-the-art approaches, we map the problem of data clustering to a community detection one over the corresponding data graphs. Specifically, we propose a novel computational approach for enhancing the traditional Girvan–Newman (GN) community detection algorithm via hyperbolic network embedding. The data dependency graph is embedded in the hyperbolic space via Rigel embedding, allowing more efficient computation of the edge-betweenness centrality needed in the GN algorithm. This allows for more efficient clustering of the nodes of the data graph in terms of modularity, without sacrificing considerable accuracy. In order to study the operation of our approach with respect to enhancing GN community detection, we employ various representative types of artificial complex networks, such as scale-free, small-world and random geometric topologies, and frequently-employed benchmark datasets for demonstrating its efficacy in terms of data clustering via community detection. Furthermore, we provide a proof-of-concept evaluation by applying the proposed framework over multi-dimensional datasets obtained from an operational smart-city/building IoT infrastructure provided by the Federated Interoperable Semantic IoT/cloud Testbeds and Applications (FIESTA-IoT) testbed federation. It is shown that the proposed framework can be indeed used for community detection/data clustering and exploited in various other IoT applications, such as performing more energy-efficient smart-city/building sensing. PMID:29662043
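
    For reference, the baseline Girvan-Newman step that the paper accelerates (it replaces the exact edge-betweenness computation with values obtained in a hyperbolic embedding, which is not reproduced here) looks as follows with networkx, selecting the partition that maximizes modularity on a toy graph.

      import networkx as nx
      from itertools import islice
      from networkx.algorithms.community import girvan_newman, modularity

      G = nx.karate_club_graph()
      best, best_q = None, -1.0
      # Each iteration removes the highest-betweenness edges and yields a partition.
      for communities in islice(girvan_newman(G), 15):
          q = modularity(G, communities)
          if q > best_q:
              best, best_q = list(communities), q
      print(len(best), "communities, modularity", round(best_q, 3))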

  13. [Autism Spectrum Disorder and DSM-5: Spectrum or Cluster?].

    PubMed

    Kienle, Xaver; Freiberger, Verena; Greulich, Heide; Blank, Rainer

    2015-01-01

    Within the new DSM-5, the currently differentiated subgroups of "Autistic Disorder" (299.0), "Asperger's Disorder" (299.80) and "Pervasive Developmental Disorder" (299.80) are replaced by the more general "Autism Spectrum Disorder". With regard to patient-oriented counselling and expedient therapy planning, however, the question of an empirically reproducible and clinically feasible differentiation into subgroups must still be raised. Based on two autism rating scales (ASDS and FSK), an exploratory two-step cluster analysis was conducted with N=103 children (age: 5-18) seen in our social-pediatric health care centre for assessment of potentially autistic symptoms. In the two-cluster solution of both rating scales, it was mainly problems in social communication that grouped the children into a cluster "with communication problems" (51% and 41%) and a cluster "without communication problems". Within the three-cluster solution of the ASDS, sensory hypersensitivity, cleaving to routines and social-communicative problems generated an "autistic" subgroup (22%). The children of the second cluster ("communication problems", 35%) were described only by social-communicative problems, and the third group did not show any problems (38%). In the three-cluster solution of the FSK, the "autistic cluster" of the two-cluster solution split into a subgroup with mainly social-communicative problems (cluster 1) and a second subgroup described by restrictive, repetitive behavior. The different cluster solutions are discussed with a view to the new DSM-5 diagnostic criteria; for subsequent studies, a further specification of some of the ASDS and FSK items could be helpful.

  14. Surfactant 1-Hexadecyl-3-methylimidazolium Chloride Can Convert One-Dimensional Viologen Bromoplumbate into Zero-Dimensional.

    PubMed

    Liu, Guangfeng; Liu, Jie; Nie, Lina; Ban, Rui; Armatas, Gerasimos S; Tao, Xutang; Zhang, Qichun

    2017-05-15

    A zero-dimensional N,N'-dibutyl-4,4'-dipyridinium bromoplumbate, [BV]6[Pb9Br30], with unusual discrete [Pb9Br30]12- anionic clusters, was prepared via a facile surfactant-mediated solvothermal process. This bromoplumbate exhibits a narrower optical band gap relative to the congeneric one-dimensional viologen bromoplumbates.

  15. Do High School Students in India Gamble? A Study of Problem Gambling and Its Correlates.

    PubMed

    Jaisoorya, T S; Beena, K V; Beena, M; Ellangovan, K; Thennarassu, K; Bowden-Jones, Henrietta; Benegal, Vivek; George, Sanju

    2017-06-01

    Studies from the West suggest that significant numbers of high school students gamble, despite it being illegal in this age group. To date, there have been no studies on the prevalence of gambling among senior high school and higher secondary school students in India. This study reports point prevalence of gambling and its psychosocial correlates among high school students in the State of Kerala, India. 5043 high school students in the age group 15-19 years, from 73 schools, were selected by cluster random sampling from the district of Ernakulam, Kerala, South India. They completed questionnaires that assessed gambling, substance use, psychological distress, suicidality, and symptoms of Attention Deficit Hyperactivity Disorder (ADHD). Of a total of 4989 completed questionnaires, 1400 (27.9%) high school students reported to have ever gambled and 353 (7.1%) were problem gamblers. Of those who had ever gambled, 25.2% were problem gamblers. Sports betting (betting on cricket and football) was the most popular form of gambling followed by the lottery. Problem gamblers when compared with non-problem gamblers and non-gamblers were significantly more likely to be male, have academic failures, have higher rates of lifetime alcohol and tobacco use, psychological distress, suicidality, history of sexual abuse and higher ADHD symptom scores. Gambling among adolescents in India deserves greater attention, as one in four students who ever gambled was a problem gambler and because of its association with a range of psychosocial variables.

  16. Inverse finite-size scaling for high-dimensional significance analysis

    NASA Astrophysics Data System (ADS)

    Xu, Yingying; Puranen, Santeri; Corander, Jukka; Kabashima, Yoshiyuki

    2018-06-01

    We propose an efficient procedure for significance determination in high-dimensional dependence learning based on surrogate data testing, termed inverse finite-size scaling (IFSS). The IFSS method is based on our discovery of a universal scaling property of random matrices which enables inference about signal behavior from much smaller scale surrogate data than the dimensionality of the original data. As a motivating example, we demonstrate the procedure for ultra-high-dimensional Potts models with on the order of 10^10 parameters. IFSS reduces the computational effort of the data-testing procedure by several orders of magnitude, making it very efficient for practical purposes. This approach thus holds considerable potential for generalization to other types of complex models.

  17. Portuguese Lexical Clusters and CVC Sequences in Speech Perception and Production.

    PubMed

    Cunha, Conceição

    2015-01-01

    This paper investigates similarities between lexical consonant clusters and CVC sequences differing in the presence or absence of a lexical vowel in speech perception and production in two Portuguese varieties. The frequent high vowel deletion in the European variety (EP) and the realization of intervening vocalic elements between lexical clusters in Brazilian Portuguese (BP) may minimize the contrast between lexical clusters and CVC sequences in the two Portuguese varieties. In order to test this hypothesis we present a perception experiment with 72 participants and a physiological analysis of 3-dimensional movement data from 5 EP and 4 BP speakers. The perceptual results confirmed a gradual confusion of lexical clusters and CVC sequences in EP, which corresponded roughly to the gradient consonantal overlap found in production. © 2015 S. Karger AG, Basel.

  18. Adapted managerial mathematical model to study the functions and interactions between enterprises in high-tech cluster

    NASA Astrophysics Data System (ADS)

    Anguelov, Kiril P.; Kaynakchieva, Vesela G.

    2017-12-01

    The aim of the current study is to develop and analyze an adapted managerial mathematical model for studying the functions and interactions between enterprises in a high-tech cluster, and to test it on a given high-tech cluster; and to design a high-tech cluster that takes into account the impact of relationships between the individual units in the cluster: leading enterprises, a network of subcontractor enterprises, and the economic infrastructure.

  19. The Clustering of High-Redshift (2.9 < z < 5.4) Quasars in SDSS Stripe 82

    NASA Astrophysics Data System (ADS)

    Timlin, John; Ross, Nicolas; Richards, Gordon; Myers, Adam; Bauer, Franz Erik; Lacy, Mark; Schneider, Donald; Wollack, Edward; Zakamska, Nadia

    2018-01-01

    We present the data from the Spitzer IRAC Equatorial Survey (SpIES) along with our first high-redshift (2.9 < z < 5.4) quasar clustering results using these data. SpIES is a mid-infrared survey covering ~100 square degrees of the Sloan Digital Sky Survey (SDSS) Stripe 82 (S82) field. The SpIES field is optimally located to overlap with the optical data from SDSS and to complement the area of the pre-existing Spitzer data from the Spitzer-HETDEX Exploratory Large-area (SHELA) survey, which adds ~20 square degrees of infrared coverage on S82. SpIES probes magnitudes significantly fainter than WISE, a depth that is crucial for detecting faint, high-redshift quasars. Using the infrared data from SpIES and SHELA, and the deep optical data from SDSS, we employ multi-dimensional empirical selection algorithms to identify high-redshift quasar candidates in this field. We then combine these candidates with spectroscopically confirmed high-redshift quasars and measure the angular correlation function. Using these results, we compute the linear bias to try to constrain quasar feedback models akin to those in Hopkins et al. 2007.

  20. Genetic and environmental influences on dimensional representations of DSM-IV cluster C personality disorders: a population-based multivariate twin study.

    PubMed

    Reichborn-Kjennerud, Ted; Czajkowski, Nikolai; Neale, Michael C; Ørstavik, Ragnhild E; Torgersen, Svenn; Tambs, Kristian; Røysamb, Espen; Harris, Jennifer R; Kendler, Kenneth S

    2007-05-01

    The DSM-IV cluster C Axis II disorders include avoidant (AVPD), dependent (DEPD) and obsessive-compulsive (OCPD) personality disorders. We aimed to estimate the genetic and environmental influences on dimensional representations of these disorders and examine the validity of the cluster C construct by determining to what extent common familial factors influence the individual PDs. PDs were assessed using the Structured Interview for DSM-IV Personality (SIDP-IV) in a sample of 1386 young adult twin pairs from the Norwegian Institute of Public Health Twin Panel (NIPHTP). A single-factor independent pathway multivariate model was applied to the number of endorsed criteria for the three cluster C disorders, using the statistical modeling program Mx. The best-fitting model included genetic and unique environmental factors only, and equated parameters for males and females. Heritability ranged from 27% to 35%. The proportion of genetic variance explained by a common factor was 83, 48 and 15% respectively for AVPD, DEPD and OCPD. Common genetic and environmental factors accounted for 54% and 64% respectively of the variance in AVPD and DEPD but only 11% of the variance in OCPD. Cluster C PDs are moderately heritable. No evidence was found for shared environmental or sex effects. Common genetic and individual environmental factors account for a substantial proportion of the variance in AVPD and DEPD. However, OCPD appears to be largely etiologically distinct from the other two PDs. The results do not support the validity of the DSM-IV cluster C construct in its present form.

  1. Synaptic Bistability Due to Nucleation and Evaporation of Receptor Clusters

    NASA Astrophysics Data System (ADS)

    Burlakov, V. M.; Emptage, N.; Goriely, A.; Bressloff, P. C.

    2012-01-01

    We introduce a bistability mechanism for long-term synaptic plasticity based on switching between two metastable states that contain significantly different numbers of synaptic receptors. One state is characterized by a two-dimensional gas of mobile interacting receptors and is stabilized against clustering by a high nucleation barrier. The other state contains a receptor gas in equilibrium with a large cluster of immobile receptors, which is stabilized by the turnover rate of receptors into and out of the synapse. Transitions between the two states can be initiated by either an increase (potentiation) or a decrease (depotentiation) of the net receptor flux into the synapse. This changes the saturation level of the receptor gas and triggers nucleation or evaporation of receptor clusters.

  2. Probing high-redshift clusters with HST/ACS gravitational weak-lensing and Chandra x-ray observations

    NASA Astrophysics Data System (ADS)

    Jee, Myungkook James

    2006-06-01

    Clusters of galaxies, the largest gravitationally bound objects in the Universe, are useful tracers of cosmic evolution, and particularly detailed studies of still-forming clusters at high redshifts can considerably enhance our understanding of structure formation. We use two powerful methods that have recently become available for the study of these distant clusters: space-based gravitational weak-lensing and high-resolution X-ray observations. Detailed analyses of five high-redshift (0.8 < z < 1.3) clusters are presented based on the deep Advanced Camera for Surveys (ACS) and Chandra X-ray images. We show that, when the instrumental characteristics are properly understood, the newly installed ACS on the Hubble Space Telescope (HST) can detect subtle shape distortions of background galaxies down to the limiting magnitudes of the observations, which enables the mapping of the cluster dark matter at unprecedentedly high resolution. The cluster masses derived from this HST/ACS weak-lensing study have been compared with those from the re-analyses of the archival Chandra X-ray data. We find that there are interesting offsets between the cluster galaxy, intracluster medium (ICM), and dark matter centroids, and possible scenarios are discussed. If the offset is confirmed to be ubiquitous in other clusters, the explanation may necessitate major refinements in our current understanding of the nature of dark matter, as well as of cluster galaxy dynamics. CL0848+4452, the highest-redshift (z = 1.27) cluster yet detected in weak-lensing, has a significant discrepancy between the weak-lensing and X-ray masses. If this trend is found to be severe and common also for other X-ray weak clusters at redshifts beyond unity, the conventional X-ray determination of cluster mass functions, often inferred from their immediate X-ray properties such as the X-ray luminosity and temperature via the so-called mass-luminosity (M-L) and mass-temperature (M-T) relations, will become

  3. Synchronous parallel spatially resolved stochastic cluster dynamics

    DOE PAGES

    Dunn, Aaron; Dingreville, Rémi; Martínez, Enrique; ...

    2016-04-23

    In this work, a spatially resolved stochastic cluster dynamics (SRSCD) model for radiation damage accumulation in metals is implemented using a synchronous parallel kinetic Monte Carlo algorithm. The parallel algorithm is shown to significantly increase the size of representative volumes achievable in SRSCD simulations of radiation damage accumulation. Additionally, weak scaling performance of the method is tested in two cases: (1) an idealized case of Frenkel pair diffusion and annihilation, and (2) a characteristic example problem including defect cluster formation and growth in α-Fe. For the latter case, weak scaling is tested using both Frenkel pair and displacement cascade damage. To improve scaling of simulations with cascade damage, an explicit cascade implantation scheme is developed for cases in which fast-moving defects are created in displacement cascades. For the first time, simulation of radiation damage accumulation in nanopolycrystals can be achieved with a three dimensional rendition of the microstructure, allowing demonstration of the effect of grain size on defect accumulation in Frenkel pair-irradiated α-Fe.

  4. Solid State Digital Propulsion "Cluster Thrusters" For Small Satellites Using High Performance Electrically Controlled Extinguishable Solid Propellants (ECESP)

    NASA Technical Reports Server (NTRS)

    Sawka, Wayne N.; Katzakian, Arthur; Grix, Charles

    2005-01-01

    Electrically controlled extinguishable solid propellants (ECESP) are capable of multiple ignitions, extinguishments and throttle control by the application of electrical power. Both core-burning and end-burning ECESP grains/motors with no moving parts, up to three inches in diameter, have now been tested. Ongoing research has led to a newer family of ECESP providing up to 10% higher performance, manufacturing ease, and significantly higher electrical conduction. The high conductivity was not found to be desirable for larger motors; however, it is ideal for downward scaling to micro- and pico-propulsion applications with a web thickness of less than 0.125 inch/diameter. As a solid-solution propellant, this ECESP is molecularly uniform, having no granular structure. Because of this homogeneity and workable viscosity it can be cast directly into thin layers or vacuum cast into complex geometries. Both coaxial grains and grain stacks have been demonstrated. Combining individual coaxial propellant grains and/or grain stacks into three-dimensional arrays yields modular cluster thrusters. Adoption of fabless manufacturing methods and standards from the electronics industry will provide custom, highly reproducible micro-propulsion arrays and clusters at low cost. These stack and cluster thruster designs have a small footprint, saving spacecraft surface area for solar panels and/or experiments. The simplicity of these thrusters will enable their broad use on micro/pico satellites for primary propulsion, ACS and formation-flying applications. Larger spacecraft may find uses for ECESP thrusters on extended booms, for on-orbit refueling, pneumatic actuators, and gas generators.

  5. Percolation analyses of observed and simulated galaxy clustering

    NASA Astrophysics Data System (ADS)

    Bhavsar, S. P.; Barrow, J. D.

    1983-11-01

    A percolation cluster analysis is performed on equivalent regions of the CFA redshift survey of galaxies and the 4000-body simulations of gravitational clustering made by Aarseth, Gott and Turner (1979). The observed and simulated percolation properties are compared and, unlike correlation and multiplicity function analyses, favour high density (Omega = 1) models with n = -1 initial data. The present results show that the three-dimensional data are consistent with the degree of filamentary structure present in isothermal models of galaxy formation at the level of percolation analysis. It is also found that the percolation structure of the CFA data is a function of depth. Percolation structure does not appear to be a sensitive probe of intrinsic filamentary structure.
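
    Percolation analysis of a point sample reduces to friends-of-friends linking: join all pairs closer than a linking length and track the largest connected group as that length grows. The sketch below uses scipy on random points; the point set and linking lengths are placeholders, not the CFA data or the original analysis.

      import numpy as np
      from scipy.spatial import cKDTree
      from scipy.sparse import csr_matrix
      from scipy.sparse.csgraph import connected_components

      rng = np.random.default_rng(3)
      pts = rng.uniform(size=(2000, 3))                 # toy galaxy positions
      for b in (0.03, 0.05, 0.08):                      # trial linking lengths
          pairs = cKDTree(pts).query_pairs(r=b, output_type="ndarray")
          adj = csr_matrix((np.ones(len(pairs)), (pairs[:, 0], pairs[:, 1])),
                           shape=(len(pts), len(pts)))
          n_groups, labels = connected_components(adj, directed=False)
          frac = np.bincount(labels).max() / len(pts)   # largest group's fraction
          print(f"b = {b}: {n_groups} groups, largest holds {frac:.1%} of points")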

  6. Initiation of pharmacotherapy for post-traumatic stress disorder among veterans from Iraq and Afghanistan: a dimensional, symptom cluster approach

    PubMed Central

    Rosenheck, Robert; Mohamed, Somaia; Pietrzak, Robert; Hoff, Rani

    2016-01-01

    Background The pharmacological treatment of post-traumatic stress disorder (PTSD) is extremely challenging, as no specific agent has been developed exclusively to treat this disorder. Thus, there are growing concerns among the public, providers and consumers associated with its use, as the efficacy of some agents is still in question. Aims We applied a dimensional and symptom cluster-based approach to better understand how the heterogeneous phenotypic presentation of PTSD may relate to the initiation of pharmacotherapy for an initial episode of PTSD. Method US veterans who served in the conflicts in Iraq and Afghanistan and received an initial PTSD diagnosis at the US Veterans Health Administration between 2008 and 2011 were included in this study. Veterans were followed for 365 days from the initial PTSD diagnosis to identify initiation of antidepressants, anxiolytics/sedatives/hypnotics, antipsychotics and prazosin. Multivariable analyses were used to assess the relationship between the severity of unique PTSD symptom clusters and receiving prescriptions from each medication class, as well as the time from diagnosis to first prescription. Results Increased severity of emotional numbing symptoms was independently associated with the prescription of antidepressants, and they were prescribed after a substantially shorter period of time than other medications. Prescription of anxiolytics/sedatives/hypnotics was associated with heightened re-experiencing symptoms and sleep difficulties. Antipsychotics were associated with elevated re-experiencing and numbing symptoms, and prazosin with reported nightmares. Conclusions Prescribing practices for military-related PTSD appear to follow US VA/DoD clinical guidelines. Results of this study suggest that a novel dimensional and symptom cluster-based approach to classifying the phenotypic presentation of military-related PTSD symptoms may help inform prescribing patterns for PTSD. Declaration of interest None.

  7. A Conserving Discretization for the Free Boundary in a Two-Dimensional Stefan Problem

    NASA Astrophysics Data System (ADS)

    Segal, Guus; Vuik, Kees; Vermolen, Fred

    1998-03-01

    The dissolution of a disk-like Al2Cu particle is considered. A characteristic property is that initially the particle has a nonsmooth boundary. The mathematical model of this dissolution process contains a description of the particle interface, of which the position varies in time. Such a model is called a Stefan problem. It is impossible to obtain an analytical solution for a general two-dimensional Stefan problem, so we use the finite element method to solve this problem numerically. First, we apply a classical moving mesh method. Computations show that after some time steps the predicted particle interface becomes very unrealistic. Therefore, we derive a new method for the displacement of the free boundary based on the balance of atoms. This method leads to good results, also for nonsmooth boundaries. Some numerical experiments are given for the dissolution of an Al2Cu particle in an Al-Cu alloy.

  8. TESTING HIGH-DIMENSIONAL COVARIANCE MATRICES, WITH APPLICATION TO DETECTING SCHIZOPHRENIA RISK GENES

    PubMed Central

    Zhu, Lingxue; Lei, Jing; Devlin, Bernie; Roeder, Kathryn

    2017-01-01

    Scientists routinely compare gene expression levels in cases versus controls in part to determine genes associated with a disease. Similarly, detecting case-control differences in co-expression among genes can be critical to understanding complex human diseases; however statistical methods have been limited by the high dimensional nature of this problem. In this paper, we construct a sparse-Leading-Eigenvalue-Driven (sLED) test for comparing two high-dimensional covariance matrices. By focusing on the spectrum of the differential matrix, sLED provides a novel perspective that accommodates what we assume to be common, namely sparse and weak signals in gene expression data, and it is closely related with Sparse Principal Component Analysis. We prove that sLED achieves full power asymptotically under mild assumptions, and simulation studies verify that it outperforms other existing procedures under many biologically plausible scenarios. Applying sLED to the largest gene-expression dataset obtained from post-mortem brain tissue from Schizophrenia patients and controls, we provide a novel list of genes implicated in Schizophrenia and reveal intriguing patterns in gene co-expression change for Schizophrenia subjects. We also illustrate that sLED can be generalized to compare other gene-gene “relationship” matrices that are of practical interest, such as the weighted adjacency matrices. PMID:29081874
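
    The full sLED procedure involves sparse leading-eigenvector estimation and asymptotic theory; a much-simplified permutation sketch of the same idea (leading eigenvalue of a soft-thresholded covariance difference as the test statistic) is shown below. The threshold and permutation count are arbitrary choices for illustration, not the paper's tuning.

      import numpy as np

      def sled_like_stat(X, Y, thresh=0.1):
          """Leading eigenvalue of the soft-thresholded covariance difference."""
          D = np.cov(X, rowvar=False) - np.cov(Y, rowvar=False)
          D = np.sign(D) * np.maximum(np.abs(D) - thresh, 0.0)   # sparsify
          return np.linalg.eigvalsh(D)[-1]

      def permutation_pvalue(X, Y, n_perm=500, seed=0):
          rng = np.random.default_rng(seed)
          pooled, n_x = np.vstack([X, Y]), len(X)
          observed = sled_like_stat(X, Y)
          hits = 0
          for _ in range(n_perm):                  # relabel cases/controls
              idx = rng.permutation(len(pooled))
              hits += sled_like_stat(pooled[idx[:n_x]], pooled[idx[n_x:]]) >= observed
          return (hits + 1) / (n_perm + 1)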

  9. TESTING HIGH-DIMENSIONAL COVARIANCE MATRICES, WITH APPLICATION TO DETECTING SCHIZOPHRENIA RISK GENES.

    PubMed

    Zhu, Lingxue; Lei, Jing; Devlin, Bernie; Roeder, Kathryn

    2017-09-01

    Scientists routinely compare gene expression levels in cases versus controls in part to determine genes associated with a disease. Similarly, detecting case-control differences in co-expression among genes can be critical to understanding complex human diseases; however statistical methods have been limited by the high dimensional nature of this problem. In this paper, we construct a sparse-Leading-Eigenvalue-Driven (sLED) test for comparing two high-dimensional covariance matrices. By focusing on the spectrum of the differential matrix, sLED provides a novel perspective that accommodates what we assume to be common, namely sparse and weak signals in gene expression data, and it is closely related with Sparse Principal Component Analysis. We prove that sLED achieves full power asymptotically under mild assumptions, and simulation studies verify that it outperforms other existing procedures under many biologically plausible scenarios. Applying sLED to the largest gene-expression dataset obtained from post-mortem brain tissue from Schizophrenia patients and controls, we provide a novel list of genes implicated in Schizophrenia and reveal intriguing patterns in gene co-expression change for Schizophrenia subjects. We also illustrate that sLED can be generalized to compare other gene-gene "relationship" matrices that are of practical interest, such as the weighted adjacency matrices.

  10. Morphology of size-selected Ptn clusters on CeO2(111)

    NASA Astrophysics Data System (ADS)

    Shahed, Syed Mohammad Fakruddin; Beniya, Atsushi; Hirata, Hirohito; Watanabe, Yoshihide

    2018-03-01

    Supported Pt catalysts and ceria are well known for their application in automotive exhaust catalysts. Size-selected Pt clusters supported on a CeO2(111) surface exhibit distinct physical and chemical properties. We investigated the morphology of the size-selected Ptn (n = 5-13) clusters on a CeO2(111) surface using scanning tunneling microscopy at room temperature. Ptn clusters prefer a two-dimensional morphology for n = 5 and a three-dimensional (3D) morphology for n ≥ 6. We further observed the preference for a 3D tri-layer structure when n ≥ 10. For each cluster size, we quantitatively estimated the relative fraction of the clusters for each type of morphology. Size-dependent morphology of the Ptn clusters on the CeO2(111) surface was attributed to the Pt-Pt interaction in the cluster and the Pt-O interaction between the cluster and CeO2(111) surface. The results obtained herein provide a clear understanding of the size-dependent morphology of the Ptn clusters on a CeO2(111) surface.

  11. Morphology of size-selected Ptn clusters on CeO2(111).

    PubMed

    Shahed, Syed Mohammad Fakruddin; Beniya, Atsushi; Hirata, Hirohito; Watanabe, Yoshihide

    2018-03-21

    Supported Pt catalysts and ceria are well known for their application in automotive exhaust catalysts. Size-selected Pt clusters supported on a CeO2(111) surface exhibit distinct physical and chemical properties. We investigated the morphology of the size-selected Ptn (n = 5-13) clusters on a CeO2(111) surface using scanning tunneling microscopy at room temperature. Ptn clusters prefer a two-dimensional morphology for n = 5 and a three-dimensional (3D) morphology for n ≥ 6. We further observed the preference for a 3D tri-layer structure when n ≥ 10. For each cluster size, we quantitatively estimated the relative fraction of the clusters for each type of morphology. Size-dependent morphology of the Ptn clusters on the CeO2(111) surface was attributed to the Pt-Pt interaction in the cluster and the Pt-O interaction between the cluster and CeO2(111) surface. The results obtained herein provide a clear understanding of the size-dependent morphology of the Ptn clusters on a CeO2(111) surface.

  12. Stable dissipative optical vortex clusters by inhomogeneous effective diffusion.

    PubMed

    Li, Huishan; Lai, Shiquan; Qui, Yunli; Zhu, Xing; Xie, Jianing; Mihalache, Dumitru; He, Yingji

    2017-10-30

    We numerically show the generation of robust vortex clusters embedded in a two-dimensional beam propagating in a dissipative medium described by the generic cubic-quintic complex Ginzburg-Landau equation with an inhomogeneous effective diffusion term, which is asymmetrical in the two transverse directions and periodically modulated in the longitudinal direction. We show the generation of stable optical vortex clusters for different values of the winding number (topological charge) of the input optical beam. We have found that the number of individual vortex solitons that form the robust vortex cluster is equal to the winding number of the input beam. We have obtained the relationships between the amplitudes and oscillation periods of the inhomogeneous effective diffusion and the cubic gain and diffusion (viscosity) parameters, which depict the regions of existence and stability of vortex clusters. The obtained results offer a method to form robust vortex clusters embedded in two-dimensional optical beams, and we envisage potential applications in the area of structured light.

  13. Intra-cluster Globular Clusters in a Simulated Galaxy Cluster

    NASA Astrophysics Data System (ADS)

    Ramos-Almendares, Felipe; Abadi, Mario; Muriel, Hernán; Coenda, Valeria

    2018-01-01

    Using a cosmological dark matter simulation of a galaxy-cluster halo, we follow the temporal evolution of its globular cluster population. To mimic the red and blue globular cluster populations, we select at high redshift (z∼ 1) two sets of particles from individual galactic halos constrained by the fact that, at redshift z = 0, they have density profiles similar to observed ones. At redshift z = 0, approximately 60% of our selected globular clusters were removed from their original halos building up the intra-cluster globular cluster population, while the remaining 40% are still gravitationally bound to their original galactic halos. As the blue population is more extended than the red one, the intra-cluster globular cluster population is dominated by blue globular clusters, with a relative fraction that grows from 60% at redshift z = 0 up to 83% for redshift z∼ 2. In agreement with observational results for the Virgo galaxy cluster, the blue intra-cluster globular cluster population is more spatially extended than the red one, pointing to a tidally disrupted origin.

  14. High-frequency modes in a two-dimensional rectangular room with windows

    NASA Astrophysics Data System (ADS)

    Shabalina, E. D.; Shirgina, N. V.; Shanin, A. V.

    2010-07-01

    We examine a two-dimensional model problem of architectural acoustics concerning sound propagation in a rectangular room with windows. The walls are assumed to be ideally flat and hard; the windows absorb all energy that falls upon them. We search for the modes of such a room that have minimal attenuation indices; these modes have a pronounced structure of billiard trajectories. The main attenuation mechanism for such modes is diffraction at the edges of the windows. We construct estimates for the attenuation indices of the given modes based on the solution to the Weinstein problem. We formulate diffraction problems, similar in statement to the Weinstein problem, that describe the attenuation of billiard modes in more complex situations.

  15. Self-assembled three-dimensional chiral colloidal architecture.

    PubMed

    Ben Zion, Matan Yah; He, Xiaojin; Maass, Corinna C; Sha, Ruojie; Seeman, Nadrian C; Chaikin, Paul M

    2017-11-03

    Although stereochemistry has been a central focus of the molecular sciences since Pasteur, its province has previously been restricted to the nanometric scale. We have programmed the self-assembly of micron-sized colloidal clusters with structural information stemming from a nanometric arrangement. This was done by combining DNA nanotechnology with colloidal science. Using the functional flexibility of DNA origami in conjunction with the structural rigidity of colloidal particles, we demonstrate the parallel self-assembly of three-dimensional microconstructs, evincing highly specific geometry that includes control over position, dihedral angles, and cluster chirality. Copyright © 2017 The Authors, some rights reserved; exclusive licensee American Association for the Advancement of Science. No claim to original U.S. Government Works.

  16. Three-body problem in d-dimensional space: Ground state, (quasi)-exact-solvability

    NASA Astrophysics Data System (ADS)

    Turbiner, Alexander V.; Miller, Willard; Escobar-Ruiz, M. A.

    2018-02-01

    As a straightforward generalization and extension of our previous paper [A. V. Turbiner et al., "Three-body problem in 3D space: Ground state, (quasi)-exact-solvability," J. Phys. A: Math. Theor. 50, 215201 (2017)], we study the aspects of the quantum and classical dynamics of a 3-body system with equal masses, each body with d degrees of freedom, with interaction depending only on mutual (relative) distances. The study is restricted to solutions in the space of relative motion which are functions of mutual (relative) distances only. It is shown that the ground state (and some other states) in the quantum case and the planar trajectories (which are in the interaction plane) in the classical case are of this type. The quantum (and classical) Hamiltonian for which these states are eigenfunctions is derived. It corresponds to a three-dimensional quantum particle moving in a curved space with special d-dimension-independent metric in a certain d-dependent singular potential, while at d = 1, it elegantly degenerates to a two-dimensional particle moving in flat space. It admits a description in terms of pure geometrical characteristics of the interaction triangle which is defined by the three relative distances. The kinetic energy of the system is d-independent; it has a hidden sl(4, R) Lie (Poisson) algebra structure, alternatively, the hidden algebra h(3) typical for the H3 Calogero model as in the d = 3 case. We find an exactly solvable three-body S3-permutationally invariant, generalized harmonic oscillator-type potential as well as a quasi-exactly solvable three-body sextic polynomial type potential with singular terms. For both models, an extra first order integral exists. For d = 1, the whole family of 3-body (two-dimensional) Calogero-Moser-Sutherland systems as well as the Tremblay-Turbiner-Winternitz model is reproduced. It is shown that a straightforward generalization of the 3-body (rational) Calogero model to d > 1 leads to two primitive quasi

  17. Clusters of Galaxies at High Redshift

    NASA Astrophysics Data System (ADS)

    Fort, Bernard

    For a long time, the small number of clusters at z > 0.3 in the Abell survey catalogue and simulations of the standard CDM formation of large scale structures provided a paradigm where clusters were considered as young merging structures. At earlier times, loose concentrations of galaxy clumps were mostly anticipated. Recent observations broke the taboo. Progressively we became convinced that compact and massive clusters at z = 1 or possibly beyond exist and should be searched for.

  18. Robust and Efficient Biomolecular Clustering of Tumor Based on p-Norm Singular Value Decomposition.

    PubMed

    Kong, Xiang-Zhen; Liu, Jin-Xing; Zheng, Chun-Hou; Hou, Mi-Xiao; Wang, Juan

    2017-07-01

    High dimensionality has become a typical feature of biomolecular data. In this paper, a novel dimension reduction method named p-norm singular value decomposition (PSVD) is proposed to seek the low-rank approximation matrix of the biomolecular data. To enhance robustness to outliers, the Lp-norm is taken as the error function and the Schatten p-norm is used as the regularization function in the optimization model. To evaluate the performance of PSVD, the K-means clustering method is then employed for tumor clustering based on the low-rank approximation matrix. Extensive experiments are carried out on five gene expression data sets, including two benchmark data sets and three higher-dimensional data sets from The Cancer Genome Atlas. The experimental results demonstrate that the PSVD-based method outperforms many existing methods. In particular, experiments show that the proposed method is more efficient for processing higher-dimensional data, with good robustness, stability, and superior time performance.
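
    PSVD itself replaces the least-squares objective with an Lp error plus a Schatten p-norm penalty; the pipeline it plugs into, however, is the standard one below (low-rank projection followed by K-means), sketched here with an ordinary SVD as a stand-in. The shapes, rank and cluster count are arbitrary toy values.

      import numpy as np
      from sklearn.cluster import KMeans

      rng = np.random.default_rng(4)
      X = rng.normal(size=(100, 2000))          # samples x genes (toy matrix)
      U, s, Vt = np.linalg.svd(X, full_matrices=False)
      rank = 5
      X_lowrank = U[:, :rank] * s[:rank]        # rank-5 coordinates per sample
      labels = KMeans(n_clusters=3, n_init=10, random_state=0).fit_predict(X_lowrank)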

  19. On Multi-Dimensional Unstructured Mesh Adaption

    NASA Technical Reports Server (NTRS)

    Wood, William A.; Kleb, William L.

    1999-01-01

    Anisotropic unstructured mesh adaption is developed for a truly multi-dimensional upwind fluctuation splitting scheme, as applied to scalar advection-diffusion. The adaption is performed locally using edge swapping, point insertion/deletion, and nodal displacements. Comparisons are made against the current state of the art for aggressive anisotropic unstructured adaption, which is based on a posteriori error estimates. Demonstrations of both schemes on model problems, with features representative of compressible gas dynamics, show the present method to be superior to the a posteriori adaption for linear advection. The performance of the two methods is more similar when applied to nonlinear advection, with a difference in the treatment of shocks: the a posteriori adaption can excessively cluster points at a shock, while the present multi-dimensional scheme tends to merely align with a shock, using fewer nodes. As a consequence of this alignment tendency, an implementation of eigenvalue limiting for the suppression of expansion shocks is developed for the multi-dimensional distribution scheme. The differences in the treatment of shocks by the adaption schemes, along with the inherently low levels of artificial dissipation in the fluctuation splitting solver, suggest the present method is a strong candidate for applications to compressible gas dynamics.

  20. Analysis of chaos in high-dimensional wind power system.

    PubMed

    Wang, Cong; Zhang, Hongli; Fan, Wenhui; Ma, Ping

    2018-01-01

    A comprehensive analysis of chaos in a high-dimensional wind power system is performed in this study. A high-dimensional wind power system is more complex than most power systems; here, an 11-dimensional wind power system proposed by Huang, which has not been analyzed in previous studies, is investigated. The chaotic dynamics of the system are analyzed when its parameters change or when it is affected by external disturbances, including single-parameter and periodic disturbances, and the parameter ranges in which chaos occurs are obtained. The existence of chaos is confirmed by calculating and analyzing the Lyapunov exponents of all state variables and the state-variable sequence diagrams. Theoretical analysis and numerical simulations show that chaos occurs in the wind power system once parameter variations or external disturbances reach a certain degree.
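    The standard numerical test for chaos mentioned here is a positive largest Lyapunov exponent. The sketch below estimates it with Benettin's two-trajectory renormalization method; since the 11-dimensional wind power model is not reproduced in this record, the Lorenz system stands in as a small chaotic test case, and the step sizes are illustrative.

    ```python
    import numpy as np

    def lorenz(state, sigma=10.0, rho=28.0, beta=8.0 / 3.0):
        # Small chaotic stand-in for the (unavailable) 11-D wind power model.
        x, y, z = state
        return np.array([sigma * (y - x), x * (rho - z) - y, x * y - beta * z])

    def largest_lyapunov(f, x0, dt=1e-3, steps=200_000, d0=1e-8):
        # Benettin's method: evolve a reference and a perturbed trajectory,
        # renormalize their separation each step, average the log stretching.
        x = np.asarray(x0, dtype=float)
        xp = x + np.array([d0, 0.0, 0.0])
        acc = 0.0
        for _ in range(steps):
            x = x + dt * f(x)                # explicit Euler, adequate for a sketch
            xp = xp + dt * f(xp)
            d = np.linalg.norm(xp - x)
            acc += np.log(d / d0)
            xp = x + (xp - x) * (d0 / d)     # rescale separation back to d0
        return acc / (steps * dt)

    # A clearly positive exponent (about 0.9 for Lorenz) indicates chaos.
    print(largest_lyapunov(lorenz, np.array([1.0, 1.0, 1.0])))
    ```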

  1. High-resolution observations of the globular cluster NGC 7099

    NASA Astrophysics Data System (ADS)

    Sams, Bruce Jones, III

    The globular cluster NGC 7099 is a prototypical collapsed-core cluster. Through a series of instrumental, observational, and theoretical investigations, I have resolved its core structure using a ground-based telescope. The core has a radius of 2.15 arcsec when imaged with a V-band spatial resolution of 0.35 arcsec. Initial attempts at speckle imaging produced images of inadequate signal to noise and resolution. To explain these results, a new, fully general signal-to-noise model has been developed. It properly accounts for all sources of noise in a speckle observation, including aliasing of high spatial frequencies by inadequate sampling of the image plane. The model, called Full Speckle Noise (FSN), can be used to predict the outcome of any speckle imaging experiment. A new high-resolution imaging technique called ACT (Atmospheric Correlation with a Template) was developed to create sharper astronomical images. ACT compensates for image motion due to atmospheric turbulence. It is similar to the Shift-and-Add algorithm, but uses a priori spatial knowledge about the image to further constrain the shifts. In this instance, the final images of NGC 7099 have resolutions of 0.35 arcsec from data taken in 1 arcsec seeing. The PAPA (Precision Analog Photon Address) camera was used to record the data. It is subject to errors when imaging cluster cores in a large field of view; the origin of these errors is explained, and several ways to avoid them are proposed. New software was created for the PAPA camera to properly flat-field images taken in a large field of view. Absolute photometry measurements of NGC 7099 made with the PAPA camera are accurate to 0.1 magnitude. Luminosity sampling errors dominate surface brightness profiles of the central few arcsec in a collapsed-core cluster. These errors set limits on the ultimate spatial accuracy of surface brightness profiles at high resolution, even for a perfectly functioning Hubble.
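    The registration-and-stack step that ACT shares with Shift-and-Add can be sketched as below, assuming integer-pixel shifts found by FFT cross-correlation against the template; the a priori constraints on the shifts that distinguish ACT, along with sub-pixel refinement and windowing, are omitted, and all names are illustrative.

    ```python
    import numpy as np

    def act_stack(frames, template):
        # Cross-correlate each short exposure with the template, shift the
        # frame to the correlation peak, and co-add the aligned frames.
        acc = np.zeros_like(template, dtype=float)
        T = np.conj(np.fft.fft2(template))
        for frame in frames:
            corr = np.fft.ifft2(np.fft.fft2(frame) * T).real
            dy, dx = np.unravel_index(np.argmax(corr), corr.shape)
            acc += np.roll(frame, (-dy, -dx), axis=(0, 1))
        return acc / len(frames)

    # Toy demo: a point source randomly displaced by "seeing" in each frame.
    rng = np.random.default_rng(0)
    template = np.zeros((64, 64)); template[30:34, 30:34] = 1.0
    frames = [np.roll(template, (rng.integers(-5, 6), rng.integers(-5, 6)), (0, 1))
              for _ in range(20)]
    print(np.unravel_index(np.argmax(act_stack(frames, template)), (64, 64)))
    ```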

  2. Optimization of High-Dimensional Functions through Hypercube Evaluation

    PubMed Central

    Abiyev, Rahib H.; Tunay, Mustafa

    2015-01-01

    A novel learning algorithm for solving global numerical optimization problems is proposed. The proposed algorithm is an intensive stochastic search method based on the evaluation and optimization of a hypercube, called the hypercube optimization (HO) algorithm. The HO algorithm comprises an initialization and evaluation process, a displacement-shrink process, and a searching space process. The initialization and evaluation process initializes an initial solution and evaluates the solutions in a given hypercube. The displacement-shrink process determines displacements and evaluates objective functions using new points, and the searching space process determines the next hypercube using certain rules and evaluates the new solutions. The algorithms for these processes are designed and presented in the paper. The designed HO algorithm is tested on specific benchmark functions. Simulations of the HO algorithm have been performed for the optimization of functions of 1000, 5000, or even 10000 dimensions. Comparative simulation results with other approaches demonstrate that the proposed algorithm is a potential candidate for the optimization of both low- and high-dimensional functions. PMID:26339237
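    A minimal sketch of a hypercube-style search is given below, assuming the three processes roughly amount to: sample points in the current hypercube, displace its center toward the best point found, and shrink the cube. The actual HO displacement and shrink rules are more elaborate than this; every name and parameter here is illustrative.

    ```python
    import numpy as np

    def hypercube_opt(f, center, half_width, n_pts=64, iters=200, shrink=0.95, seed=0):
        # Sample the hypercube, move the center to the best point found
        # (displacement), then shrink the cube around it.
        rng = np.random.default_rng(seed)
        best_x, best_f = center, f(center)
        for _ in range(iters):
            pts = center + rng.uniform(-half_width, half_width, (n_pts, center.size))
            vals = np.apply_along_axis(f, 1, pts)
            i = np.argmin(vals)
            if vals[i] < best_f:            # displacement toward the improvement
                best_x, best_f = pts[i], vals[i]
                center = pts[i]
            half_width *= shrink            # shrink the search hypercube
        return best_x, best_f

    sphere = lambda x: float(np.sum(x**2))  # toy high-dimensional benchmark
    x, fx = hypercube_opt(sphere, center=np.full(50, 3.0), half_width=5.0)
    print(fx)
    ```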

  3. SPReM: Sparse Projection Regression Model For High-dimensional Linear Regression *

    PubMed Central

    Sun, Qiang; Zhu, Hongtu; Liu, Yufeng; Ibrahim, Joseph G.

    2014-01-01

    The aim of this paper is to develop a sparse projection regression modeling (SPReM) framework to perform multivariate regression modeling with a large number of responses and a multivariate covariate of interest. We propose two novel heritability ratios to simultaneously perform dimension reduction, response selection, estimation, and testing, while explicitly accounting for correlations among multivariate responses. Our SPReM is devised specifically to address the low statistical power of many standard statistical approaches, such as Hotelling's T² test statistic or a mass univariate analysis, for high-dimensional data. We formulate the estimation problem of SPReM as a novel sparse unit rank projection (SURP) problem and propose a fast optimization algorithm for SURP. Furthermore, we extend SURP to the sparse multi-rank projection (SMURP) by adopting a sequential SURP approximation. Theoretically, we have systematically investigated the convergence properties of SURP and the convergence rate of SURP estimates. Our simulation results and real data analysis show that SPReM outperforms other state-of-the-art methods. PMID:26527844

  4. Numerical viscosity and resolution of high-order weighted essentially nonoscillatory schemes for compressible flows with high Reynolds numbers.

    PubMed

    Zhang, Yong-Tao; Shi, Jing; Shu, Chi-Wang; Zhou, Ye

    2003-10-01

    A quantitative study is carried out in this paper to investigate the size of numerical viscosities and the resolution power of high-order weighted essentially nonoscillatory (WENO) schemes for solving one- and two-dimensional Navier-Stokes equations for compressible gas dynamics with high Reynolds numbers. A one-dimensional shock tube problem, a one-dimensional example with parameters motivated by supernova and laser experiments, and a two-dimensional Rayleigh-Taylor instability problem are used as numerical test problems. For the two-dimensional Rayleigh-Taylor instability problem, or similar problems with small-scale structures, the details of the small structures are determined by the physical viscosity (therefore, the Reynolds number) in the Navier-Stokes equations. Thus, to obtain faithful resolution of these small-scale structures, the numerical viscosity inherent in the scheme must be small enough that the physical viscosity dominates. A careful mesh refinement study is performed to capture the threshold mesh for full resolution, for specific Reynolds numbers, when WENO schemes of different orders of accuracy are used. It is demonstrated that high-order WENO schemes are more CPU-time efficient at reaching the same resolution, for both the one-dimensional and two-dimensional test problems.
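    For reference, the core of a fifth-order WENO scheme is the nonlinear reconstruction sketched below (the classic Jiang-Shu formulation); it is the mechanism that keeps the scheme's numerical viscosity low in smooth regions while suppressing oscillations at shocks. This is only the reconstruction kernel, not the full solvers studied in the paper.

    ```python
    import numpy as np

    def weno5_reconstruct(v):
        # Fifth-order WENO reconstruction of the left state at interface
        # i+1/2 from five cell averages v[i-2..i+2].
        eps = 1e-6
        # Three third-order candidate stencils
        p0 = (2*v[0] - 7*v[1] + 11*v[2]) / 6.0
        p1 = (-v[1] + 5*v[2] + 2*v[3]) / 6.0
        p2 = (2*v[2] + 5*v[3] - v[4]) / 6.0
        # Smoothness indicators (large where the stencil crosses a shock)
        b0 = 13/12*(v[0] - 2*v[1] + v[2])**2 + 0.25*(v[0] - 4*v[1] + 3*v[2])**2
        b1 = 13/12*(v[1] - 2*v[2] + v[3])**2 + 0.25*(v[1] - v[3])**2
        b2 = 13/12*(v[2] - 2*v[3] + v[4])**2 + 0.25*(3*v[2] - 4*v[3] + v[4])**2
        # Nonlinear weights biased away from non-smooth stencils
        a = np.array([0.1, 0.6, 0.3]) / (eps + np.array([b0, b1, b2]))**2
        w = a / a.sum()
        return w @ np.array([p0, p1, p2])

    # Near a jump, the weights pick the smooth stencil: the result stays ~1.
    print(weno5_reconstruct(np.array([1.0, 1.0, 1.0, 0.0, 0.0])))
    ```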

  5. Formation of Clustered DNA Damage after High-LET Irradiation: A Review

    NASA Technical Reports Server (NTRS)

    Hada, Megumi; Georgakilas, Alexandros G.

    2008-01-01

    Radiation can cause as well as cure cancer. The risk of developing radiation-induced cancer has traditionally been estimated from cancer incidence among survivors of the atomic bombs in Hiroshima and Nagasaki. These data provide the best estimate of human cancer risk over the dose range for low linear energy transfer (LET) radiations, such as X- or gamma-rays. Estimating the real biological effects becomes even more difficult in the case of the high-LET particles encountered in space or resulting from domestic exposure to particles from radon gas emitters or other radioactive emitters like uranium-238. Complex DNA damage, i.e., the signature of high-LET radiations, comprises closely spaced DNA lesions forming a cluster of DNA damage. The two basic groups of complex DNA damage are double strand breaks (DSBs) and non-DSB oxidative clustered DNA lesions (OCDL). Theoretical analysis and experimental evidence suggest that the complexity and severity of complex DNA damage increase with increasing LET, together with a high mutagenic or carcinogenic potential. Data available on the formation of clustered DNA damage (DSBs and OCDL) by high-LET radiations are often controversial, suggesting a variable response to dose and type of radiation. The chemical nature and cellular repair mechanisms of complex DNA damage have been much less characterized than those of isolated DNA lesions, like an oxidized base or a single strand break, especially in the case of high-LET radiation. This review focuses on the induction of clustered DNA damage by high-LET radiations, presenting earlier and recent relevant data.

  6. Highly dynamically evolved intermediate-age open clusters

    NASA Astrophysics Data System (ADS)

    Piatti, Andrés E.; Dias, Wilton S.; Sampedro, Laura M.

    2017-04-01

    We present a comprehensive UBVRI and Washington CT1T2 photometric analysis of seven catalogued open clusters, namely Ruprecht 3, 9, 37, 74, 150, ESO 324-15 and 436-2. The multiband photometric data sets, in combination with 2MASS photometry and Gaia astrometry for the brighter stars, were used to estimate their structural parameters and fundamental astrophysical properties. We found that Ruprecht 3 and ESO 436-2 do not show self-consistent evidence of being physical systems. The remaining studied objects are open clusters of intermediate age (9.0 ≤ log(t yr⁻¹) ≤ 9.6), of relatively small size (rcls ∼ 0.4-1.3 pc), placed between 0.6 and 2.9 kpc from the Sun. We analysed the relationships between core, half-mass, tidal and Jacobi radii as well as half-mass relaxation times, and conclude that the studied clusters are in an evolved dynamical stage. The total cluster masses obtained by summing those of the observed cluster stars turned out to be ∼10-15 per cent of the masses of open clusters of similar age located closer than 2 kpc from the Sun. We found that cluster stars occupy volumes as large as those of tidally filled clusters.

  7. Clustering on Magnesium Surfaces - Formation and Diffusion Energies.

    PubMed

    Chu, Haijian; Huang, Hanchen; Wang, Jian

    2017-07-12

    The formation and diffusion energies of atomic clusters on Mg surfaces determine the surface roughness and the formation of faulted structures, which in turn affect the mechanical deformation of Mg. This paper reports first-principles density functional theory (DFT) calculations of atomic clustering on the low-energy surfaces {0001} and [Formula: see text]. In parallel, molecular statics calculations serve to test the validity of two interatomic potentials and to extend the scope of the DFT studies. On a {0001} surface, a compact cluster consisting of fewer than three atoms energetically prefers face-centered-cubic stacking, serving as a nucleus of a stacking fault. On a [Formula: see text] surface, clusters of any size always prefer hexagonal-close-packed stacking. Adatom diffusion on the [Formula: see text] surface is highly anisotropic, while it is isotropic on the (0001) surface. Three-dimensional Ehrlich-Schwoebel barriers converge once the step height is three atomic layers or more. Adatom diffusion along steps proceeds via a hopping mechanism, while diffusion down steps proceeds via an exchange mechanism.

  8. Clustered DNA damages induced by high and low LET radiation, including heavy ions

    NASA Technical Reports Server (NTRS)

    Sutherland, B. M.; Bennett, P. V.; Schenk, H.; Sidorkina, O.; Laval, J.; Trunk, J.; Monteleone, D.; Sutherland, J.; Lowenstein, D. I. (Principal Investigator)

    2001-01-01

    Clustered DNA damages--here defined as two or more lesions (strand breaks, oxidized purines, oxidized pyrimidines or abasic sites) within a few helical turns--have been postulated as difficult to repair accurately, and thus highly significant biological lesions. Further, attempted repair of clusters may produce double strand breaks (DSBs). However, until recently, there was no way to measure ionizing radiation-induced clustered damages, except DSB. We recently described an approach for measuring classes of clustered damages (oxidized purine clusters, oxidized pyrimidine clusters, abasic clusters, along with DSB). We showed that ionizing radiation (gamma rays and Fe ions, 1 GeV/amu) does induce such clusters in genomic DNA in solution and in human cells. These studies also showed that each damage cluster results from one radiation hit (and its track), thus indicating that they can be induced by very low doses of radiation, i.e. two independent hits are not required for cluster induction. Further, among all complex damages, double strand breaks comprise--at most-- 20%, with the other clustered damages being at least 80%.

  9. Cluster Analysis of Junior High School Students' Cognitive Structures

    ERIC Educational Resources Information Center

    Dan, Youngjun; Geng, Leisha; Li, Meng

    2017-01-01

    This study aimed to explore students' cognitive patterns based on their knowledge and levels. Participants were seventh graders from a junior high school in China. Three relatively distinct groups were specified by Cluster Analysis: high knowledge and low ability, low knowledge and low ability, and high knowledge and high ability. The group of low…

  10. Parsimonious description for predicting high-dimensional dynamics

    PubMed Central

    Hirata, Yoshito; Takeuchi, Tomoya; Horai, Shunsuke; Suzuki, Hideyuki; Aihara, Kazuyuki

    2015-01-01

    When we observe a system, we often cannot observe all its variables and may have only a limited set of measurements. Under such circumstances, delay coordinates, vectors made of successive measurements, are useful for reconstructing the states of the whole system. Although the method of delay coordinates is theoretically supported for high-dimensional dynamical systems, in practice there is a limitation because the calculation for higher-dimensional delay coordinates becomes more expensive. Here, we propose a parsimonious description of virtually infinite-dimensional delay coordinates obtained by evaluating their distances with exponentially decaying weights. This description enables us to predict future values of the measurements faster, because we can reuse the calculated distances, and more accurately, because the description naturally reduces the bias of the classical delay coordinates toward the stable directions. We demonstrate the proposed method with toy models of the atmosphere and real datasets related to renewable energy. PMID:26510518
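    The computational idea, as I read the abstract, is that distances between delay vectors with exponentially decaying weights satisfy a one-step recursion, D²[t,s] = (x[t]-x[s])² + λ·D²[t-1,s-1], so each distance reuses the one computed a step earlier. A small sketch under that assumption, with a k-nearest-neighbour predictor chosen here purely for illustration:

    ```python
    import numpy as np

    def weighted_delay_distances(x, lam=0.9):
        # Recursive weighted distances between (virtually infinite) delay
        # vectors; the boundary rows assume no history before t = 0.
        n = len(x)
        diff2 = (x[:, None] - x[None, :])**2
        D2 = np.zeros((n, n))
        D2[0, :], D2[:, 0] = diff2[0, :], diff2[:, 0]
        for t in range(1, n):
            for s in range(1, n):
                D2[t, s] = diff2[t, s] + lam * D2[t - 1, s - 1]
        return np.sqrt(D2)

    def predict_next(x, k=3, lam=0.9):
        # Average the successors of the k past states nearest to the
        # current state under the weighted distance.
        D = weighted_delay_distances(x, lam)
        nn = np.argsort(D[-1, :-1])[:k]
        return x[nn + 1].mean()

    t = np.arange(300)
    x = np.sin(0.3 * t)
    print(predict_next(x), np.sin(0.3 * 300))   # prediction vs. true next value
    ```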

  11. OpenCluster: A Flexible Distributed Computing Framework for Astronomical Data Processing

    NASA Astrophysics Data System (ADS)

    Wei, Shoulin; Wang, Feng; Deng, Hui; Liu, Cuiyin; Dai, Wei; Liang, Bo; Mei, Ying; Shi, Congming; Liu, Yingbo; Wu, Jingping

    2017-02-01

    The volume of data generated by modern astronomical telescopes is extremely large and rapidly growing. However, current high-performance data processing architectures/frameworks are not well suited for astronomers because of their limitations and programming difficulties. In this paper, we therefore present OpenCluster, an open-source distributed computing framework to support rapidly developing high-performance processing pipelines of astronomical big data. We first detail the OpenCluster design principles and implementations and present the APIs facilitated by the framework. We then demonstrate a case in which OpenCluster is used to resolve complex data processing problems for developing a pipeline for the Mingantu Ultrawide Spectral Radioheliograph. Finally, we present our OpenCluster performance evaluation. Overall, OpenCluster provides not only high fault tolerance and simple programming interfaces, but also a flexible means of scaling up the number of interacting entities. OpenCluster thereby provides an easily integrated distributed computing framework for quickly developing a high-performance data processing system of astronomical telescopes and for significantly reducing software development expenses.

  12. Horizontal transfer of a large and highly toxic secondary metabolic gene cluster between fungi.

    PubMed

    Slot, Jason C; Rokas, Antonis

    2011-01-25

    Genes involved in intermediary and secondary metabolism in fungi are frequently physically linked or clustered. For example, in Aspergillus nidulans the entire pathway for the production of sterigmatocystin (ST), a highly toxic secondary metabolite and a precursor to the aflatoxins (AF), is located in a ∼54 kb, 23-gene cluster. We discovered that a complete ST gene cluster in Podospora anserina was horizontally transferred from Aspergillus. Phylogenetic analysis shows that most Podospora cluster genes are adjacent to or nested within Aspergillus cluster genes, although the two genera belong to different taxonomic classes. Furthermore, the Podospora cluster is highly conserved in content, sequence, and microsynteny with the Aspergillus ST/AF clusters, and its intergenic regions contain 14 putative binding sites for AflR, the transcription factor required for activation of the ST/AF biosynthetic genes. Examination of ∼52,000 Podospora expressed sequence tags identified transcripts for 14 genes in the cluster, with several expressed at multiple life cycle stages. The presence of putative AflR-binding sites and the expression evidence for several cluster genes, coupled with the recent independent discovery of ST production in Podospora [1], suggest that this horizontal gene transfer event probably resulted in a functional cluster. Given the abundance of metabolic gene clusters in fungi, our finding that one of the largest known metabolic gene clusters moved intact between species suggests that such transfers might have contributed significantly to fungal metabolic diversity. Copyright © 2011 Elsevier Ltd. All rights reserved.

  13. Enhancing PC Cluster-Based Parallel Branch-and-Bound Algorithms for the Graph Coloring Problem

    NASA Astrophysics Data System (ADS)

    Taoka, Satoshi; Takafuji, Daisuke; Watanabe, Toshimasa

    A branch-and-bound algorithm (BB for short) is among the most general techniques for dealing with combinatorial optimization problems, but even when it is used, computation time is likely to increase exponentially, so we consider parallelization to reduce it. It has been reported that the computation time of a parallel BB depends heavily upon node-variable selection strategies, and in the case of a parallel BB it is also necessary to prevent communication time from growing. It is therefore important to pay attention to how many and what kind of nodes are to be transferred (the sending-node selection strategy). In this paper, for the graph coloring problem, we propose several sending-node selection strategies for a parallel BB algorithm, adopting MPI for parallelization, and experimentally evaluate how these strategies affect the computation time of a parallel BB on a PC cluster network.
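    To make the search tree being distributed concrete, here is a minimal sequential BB for graph coloring: vertices are colored in order, and a branch is pruned as soon as the number of colors in use reaches the best complete coloring found so far. The MPI distribution and the paper's sending-node selection strategies are omitted; this is only the underlying search.

    ```python
    def chromatic_bb(adj, n):
        # adj: dict vertex -> list of neighbours; returns the chromatic number.
        best = [n]                              # incumbent: trivial n-coloring

        def extend(colors, v, used):
            if used >= best[0]:                 # bound: cannot beat incumbent
                return
            if v == n:
                best[0] = used                  # complete coloring found
                return
            for c in range(used):               # branch: reuse an existing color
                if all(colors[u] != c for u in adj[v] if u < v):
                    colors[v] = c
                    extend(colors, v + 1, used)
            colors[v] = used                    # branch: open a new color
            extend(colors, v + 1, used + 1)
            colors[v] = -1

        extend([-1] * n, 0, 0)
        return best[0]

    # 5-cycle: chromatic number 3
    adj = {0: [1, 4], 1: [0, 2], 2: [1, 3], 3: [2, 4], 4: [3, 0]}
    print(chromatic_bb(adj, 5))
    ```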

  14. Alternative Parameterizations for Cluster Editing

    NASA Astrophysics Data System (ADS)

    Komusiewicz, Christian; Uhlmann, Johannes

    Given an undirected graph G and a nonnegative integer k, the NP-hard Cluster Editing problem asks whether G can be transformed into a disjoint union of cliques by applying at most k edge modifications. In the field of parameterized algorithmics, Cluster Editing has almost exclusively been studied parameterized by the solution size k. In contrast, in many real-world instances it can be observed that the parameter k is not really small. This observation motivates our investigation of parameterizations of Cluster Editing different from the solution size k. Our results are as follows. Cluster Editing is fixed-parameter tractable with respect to the parameter "size of a minimum cluster vertex deletion set of G", a typically much smaller parameter than k. Cluster Editing remains NP-hard on graphs with maximum degree six. A restricted but practically relevant version of Cluster Editing is fixed-parameter tractable with respect to the combined parameter "number of clusters in the target graph" and "maximum number of modified edges incident to any vertex in G". Many of our results also transfer to the NP-hard Cluster Deletion problem, where only edge deletions are allowed.
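    For contrast with the alternative parameterizations studied in the paper, the classic search-tree algorithm parameterized by the solution size k can be sketched as follows: while some conflict triple u-v-w exists (uv and vw edges, uw a non-edge), branch on the three ways of destroying it; a graph with no conflict triple is already a disjoint union of cliques. This gives the well-known O(3^k) branching, shown here naively.

    ```python
    def cluster_editing(edges, n, k):
        # Decide whether at most k edge edits turn the graph into a cluster graph.
        def conflict(E):
            for v in range(n):
                nb = [u for u in range(n) if frozenset((u, v)) in E]
                for i, u in enumerate(nb):
                    for w in nb[i + 1:]:
                        if frozenset((u, w)) not in E:   # u-v-w is an induced path
                            return frozenset((u, v)), frozenset((v, w)), frozenset((u, w))
            return None

        def solve(E, k):
            t = conflict(E)
            if t is None:
                return True                      # already a disjoint union of cliques
            if k == 0:
                return False                     # budget exhausted, conflict remains
            uv, vw, uw = t
            return (solve(E - {uv}, k - 1) or    # delete uv
                    solve(E - {vw}, k - 1) or    # delete vw
                    solve(E | {uw}, k - 1))      # insert uw

        return solve({frozenset(e) for e in edges}, k)

    # A path on 3 vertices needs exactly one edit.
    print(cluster_editing([(0, 1), (1, 2)], 3, 1))  # True
    print(cluster_editing([(0, 1), (1, 2)], 3, 0))  # False
    ```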

  15. Application of fuzzy c-means clustering to PRTR chemicals uncovering their release and toxicity characteristics.

    PubMed

    Xue, Mianqiang; Zhou, Liang; Kojima, Naoya; Dos Muchangos, Leticia Sarmento; Machimura, Takashi; Tokai, Akihiro

    2018-05-01

    The increasing manufacture and usage of chemicals has not been matched by an increase in our understanding of their risks. The pollutant release and transfer register (PRTR) is becoming a popular measure for collecting chemical data and enhancing the public right to know. However, these data are usually high-dimensional, which restricts their wider use. The present study partitions Japanese PRTR chemicals into five fuzzy clusters by fuzzy c-means clustering (FCM) to explore the implicit information. Each chemical belongs to every cluster with a degree of membership. Cluster I features high releases from non-listed industries and the household sector and high environmental toxicity. Cluster II is characterized by high reported releases and transfers from the 24 listed industries above the threshold, mutagenicity, and high environmental toxicity. Chemicals in cluster III are characterized by high releases from non-listed industries and low toxicity. Cluster IV is characterized by high reported releases and transfers from the 24 listed industries above the threshold and extremely high environmental toxicity. Cluster V is characterized by low releases yet mutagenicity and high carcinogenicity. Chemicals with the highest membership degree were identified as representatives of each cluster. Considering the highest membership degree, half of the chemicals have a value higher than 0.74; looking at the highest and second-highest membership degrees together, about 94% of the chemicals have a value higher than 0.5. FCM can serve as an approach to uncovering the implicit information in a highly complex chemical dataset, which subsequently supports strategy development for efficient and effective chemical management. Copyright © 2017 Elsevier B.V. All rights reserved.
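    To make the membership-degree idea concrete, here is a minimal numpy sketch of standard fuzzy c-means, in which every point receives a degree of membership in every cluster (rows of U sum to 1), unlike the hard assignments of k-means. The PRTR feature construction and the study's choice of five clusters are not reproduced; the demo data are synthetic.

    ```python
    import numpy as np

    def fuzzy_c_means(X, c=3, m=2.0, iters=100, seed=0):
        # Alternate between computing weighted centers and updating the
        # membership matrix U with the standard FCM formula (fuzzifier m).
        rng = np.random.default_rng(seed)
        U = rng.dirichlet(np.ones(c), size=len(X))      # random memberships
        for _ in range(iters):
            W = U**m
            centers = (W.T @ X) / W.sum(axis=0)[:, None]
            d = np.linalg.norm(X[:, None, :] - centers[None, :, :], axis=2) + 1e-12
            inv = d ** (-2.0 / (m - 1.0))
            U = inv / inv.sum(axis=1, keepdims=True)    # rows sum to 1
        return centers, U

    rng = np.random.default_rng(1)
    X = np.vstack([rng.normal(loc, 0.3, (50, 2)) for loc in (0.0, 3.0, 6.0)])
    centers, U = fuzzy_c_means(X, c=3)
    print(U.max(axis=1).mean())   # average highest membership degree
    ```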

  16. Modes of self-organization of diluted bubbly liquids in acoustic fields: One-dimensional theory.

    PubMed

    Gumerov, Nail A; Akhatov, Iskander S

    2017-02-01

    This paper is dedicated to the mathematical modeling of the self-organization of bubbly liquids in acoustic fields. A continuum model describing the two-way interaction of diluted polydisperse bubbly liquids and acoustic fields in the weakly nonlinear approximation is studied analytically and numerically in the one-dimensional case. It is shown that the regimes of self-organization of monodisperse bubbly liquids can be controlled by only a few dimensionless parameters. Two basic modes, clustering and propagating shock waves of void fraction (acoustically induced transparency), are identified, and criteria for their realization in the space of parameters are proposed. A numerical method for solving one-dimensional self-organization problems is developed. Computational results for mono- and polydisperse systems are discussed.

  17. A 2-dimensional optical architecture for solving Hamiltonian path problem based on micro ring resonators

    NASA Astrophysics Data System (ADS)

    Shakeri, Nadim; Jalili, Saeed; Ahmadi, Vahid; Rasoulzadeh Zali, Aref; Goliaei, Sama

    2015-01-01

    The problem of finding a Hamiltonian path in a graph, or deciding whether a graph has a Hamiltonian path, is NP-complete, and no algorithm is known that solves it using a polynomial amount of time and space. In this paper, we propose a two-dimensional (2-D) optical architecture based on optoelectronic devices such as micro ring resonators, optical circulators and a MEMS-based mirror (MEMS-M) to solve the Hamiltonian path problem for undirected graphs in linear time. It uses a heuristic algorithm and employs n+1 different wavelengths of a light ray to check whether a Hamiltonian path exists on a graph with n vertices; if a Hamiltonian path exists, it reports the path. The device complexity of the proposed architecture is O(n²).

  18. Electron scattering in large water clusters from photoelectron imaging with high harmonic radiation.

    PubMed

    Gartmann, Thomas E; Hartweg, Sebastian; Ban, Loren; Chasovskikh, Egor; Yoder, Bruce L; Signorell, Ruth

    2018-06-06

    Low-energy electron scattering in water clusters (H2O)n with average cluster sizes of n < 700 is investigated by angle-resolved photoelectron spectroscopy using high harmonic radiation at photon energies of 14.0, 20.3, and 26.5 eV for ionization from the three outermost valence orbitals. The measurements probe the evolution of the photoelectron anisotropy parameter β as a function of cluster size. A remarkably steep decrease of β with increasing cluster size is observed, which for the largest clusters reaches liquid bulk values. Detailed electron scattering calculations reveal that neither gas nor condensed phase scattering can explain the cluster data. Qualitative agreement between experiment and simulations is obtained with scattering calculations that treat cluster scattering as an intermediate case between gas and condensed phase scattering.

  19. Automatic segmentation of brain MRI in high-dimensional local and non-local feature space based on sparse representation.

    PubMed

    Khalilzadeh, Mohammad Mahdi; Fatemizadeh, Emad; Behnam, Hamid

    2013-06-01

    Automatic extraction of the varying regions of magnetic resonance images is required as a prior step in an intelligent diagnostic system. Here, the sparsest representation and a high-dimensional feature are provided based on a learned dictionary, and classification is done by a technique that computes the reconstruction error of each pixel both locally and non-locally. The results acquired on real and simulated images are superior to those of the best MRI segmentation methods with regard to stability. In addition, segmentation is performed exactly, through a formula derived from the distance and sparsity factors, and it is done automatically by feeding the sparsity factor into unsupervised clustering methods, whose results have thereby been improved. Copyright © 2013 Elsevier Inc. All rights reserved.
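    The classification-by-reconstruction-error idea can be sketched as below, assuming one learned dictionary per tissue class and OMP sparse coding; the paper's local/non-local error computation and feature construction are omitted, and all names and shapes here are illustrative.

    ```python
    import numpy as np
    from sklearn.decomposition import sparse_encode

    def classify_by_reconstruction_error(patches, dictionaries, n_nonzero=5):
        # Encode each patch against every per-class dictionary and assign
        # the class whose dictionary reconstructs it with smallest error.
        errs = []
        for D in dictionaries:                    # D: (n_atoms, n_features)
            codes = sparse_encode(patches, D, algorithm="omp",
                                  n_nonzero_coefs=n_nonzero)
            errs.append(np.linalg.norm(patches - codes @ D, axis=1))
        return np.argmin(np.vstack(errs), axis=0)

    rng = np.random.default_rng(0)
    dicts = [rng.normal(size=(32, 49)) for _ in range(3)]  # toy class dictionaries
    patches = rng.normal(size=(10, 49))                    # flattened 7x7 patches
    print(classify_by_reconstruction_error(patches, dicts))
    ```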

  20. A BDDC Algorithm with Deluxe Scaling for Three-Dimensional H(curl) Problems

    DOE PAGES

    Dohrmann, Clark R.; Widlund, Olof B.

    2015-04-28

    In our paper, we present and analyze a BDDC algorithm for a class of elliptic problems in the three-dimensional H(curl) space. Compared with existing results, our condition number estimate requires fewer assumptions and also involves two fewer powers of log(H/h), making it consistent with optimal estimates for other elliptic problems. Here, H/h is the maximum of H_i/h_i over all subdomains, where H_i and h_i are the diameter and the smallest element diameter for the subdomain Ω_i. The analysis makes use of two recent developments: the first is our new approach to averaging across the subdomain interfaces, while the second is a new technical tool that allows arguments involving trace classes to be avoided. Furthermore, numerical examples are presented to confirm the theory and demonstrate the importance of the new averaging approach in certain cases.